首页 > 代码库 > pig进阶

pig进阶

一个练习:

1、student表
1:zhangsan:boy:12:compute
3:lisi:boy:32:math
2:xiaoli:girl:23:meish
模式:
A = load ‘student‘ using PigStorage(‘:‘) as (sno:chararray,sname:chararray,ssex:chararray,sage:int,sdept:chararray);
2、课程表(课程名,课程号,先修课程号,学分)
01,English,,4
02,Data structure,05,2
03,DataBase,02,2
04,DB design,03,3
05,C languange,,3
06,Pricinples of NetWork,07,3
07,OS,05,3

模式:
Course(cno:chararray,cname:chararray,cpno:chararray,ccredit:int)
3、学生和课程表对应的选课表(学号,课程号,成绩)
1,01,92
1,03,84
2,01,23
3,05,23
2,05,89
1,04,90
模式:
SC(sno:chararray,cno:chararray,grade:int)

---------------
1、计算每个学生的平均成绩
 1、加载学生表和选课表
 2、对学生表和选课表通过学号进行连接
 3、基于学号对连接生成的表进行分组
 4、计算平均成绩

-- load
A = load ‘/input/student‘ using PigStorage(‘:‘) as (sno:chararray,sname:chararray,ssex:chararray,sage:int,sdept:chararray);
B = load ‘/input/course‘ using PigStorage(‘,‘) as(cno:chararray,cname:chararray,ccredit:int);
C = load ‘/input/SC‘ using PigStorage(‘,‘) as(sno:chararray,cno:chararray,grade:int);
-- join
D = join A by sno,C by sno;
-- group
E = foreach D generate A::sno,sname,grade;
F = group E by (sno,sname);
-- avg
G = foreach F generate group.sname,(SUM(E.grade)/COUNT(E)); -- 此处COUNT可以运行

2、找出不及格成绩的学生(姓名,课程名,成绩)
-- load
A = load ‘/input/student‘ using PigStorage(‘:‘) as (sno:chararray,sname:chararray,ssex:chararray,sage:int,sdept:chararray);
B = load ‘/input/course‘ using PigStorage(‘,‘) as(cno:chararray,cname:chararray,ccredit:int);
C = load ‘/input/SC‘ using PigStorage(‘,‘) as(sno:chararray,cno:chararray,grade:int);
-- 过滤C选课表(不及格的)
D = filter C by grade < 60;
-- join
E = join D by sno,A by sno;
-- join
F = join E by cno,B by cno;
-- output
G = foreach F generate sname,cname,grade;

3、找出修了C language课程的学生(姓名即可)
-- load
A = load ‘/input/student‘ using PigStorage(‘:‘) as (sno:chararray,sname:chararray,ssex:chararray,sage:int,sdept:chararray);
B = load ‘/input/course‘ using PigStorage(‘,‘) as(cno:chararray,cname:chararray,ccredit:int);
C = load ‘/input/SC‘ using PigStorage(‘,‘) as(sno:chararray,cno:chararray,grade:int);

-- 过滤
D = filter B by cname matches ‘C.+‘;
E = foreach D generate cno;
F = join E by cno,C by cno;
G = join F by C::sno,A by sno;
H = foreach G generate A::sname;

-- 嵌套操作(这个下面是有错误的,嵌套的时候对字段的名称需要确定好,对于cno就丢失了)
E = foreach (filter B by cname matches ‘C.+‘) generate cno;
F = join (foreach (filter B by cname matches ‘C.+‘) generate cno) by cno,C by cno;
G = join (join (foreach (filter B by cname matches ‘C.+‘) generate cno) by cno,C by cno) by C::sno,A by sno;
H = foreach (join (join (foreach (filter B by cname matches ‘C.+‘) generate cno) by cno,C by cno) by C::sno,A by sno) generate A::sname;