
Case 3: MapReduce single-table join, deriving a grandchild-grandparent table from a child-parent table

1. Sample data (one "child parent" pair per line):

Tom Lucy
Tom Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Ben
Jack Alice
Jack Jesse
Terry Alice
Terry Jesse
Philip Terry
Philip Alma
Mark Terry
Mark Alma
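To know in advance what the job should produce, here is a minimal plain-Java sketch (no Hadoop involved; the class name NaiveSelfJoin is illustrative) that joins the sample table with itself on left.parent = right.child. On the 14 sample rows it yields 12 grandchild-grandparent pairs, which the MapReduce output can be checked against.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (assumption: illustrative reference check, not part of the Hadoop job):
// a naive in-memory self-join of the child-parent sample.
final class NaiveSelfJoin {

    // Each row is {child, parent}, copied from the sample data.
    static final String[][] ROWS = {
        {"Tom", "Lucy"}, {"Tom", "Jack"}, {"Jone", "Lucy"}, {"Jone", "Jack"},
        {"Lucy", "Mary"}, {"Lucy", "Ben"}, {"Jack", "Alice"}, {"Jack", "Jesse"},
        {"Terry", "Alice"}, {"Terry", "Jesse"}, {"Philip", "Terry"},
        {"Philip", "Alma"}, {"Mark", "Terry"}, {"Mark", "Alma"}
    };

    // Join the table with itself on left.parent == right.child:
    // left.child is then a grandchild of right.parent.
    static List<String> grandPairs(String[][] rows) {
        List<String> pairs = new ArrayList<>();
        for (String[] left : rows) {
            for (String[] right : rows) {
                if (left[1].equals(right[0])) {
                    pairs.add(left[0] + " " + right[1]);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        grandPairs(ROWS).forEach(System.out::println);
    }
}
```

This nested loop is O(n²) and only works when the whole table fits in memory, which is exactly what the MapReduce version avoids by grouping on the join key.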

2. The map code:

    public static class ChildParentMapper extends MapReduceBase implements Mapper<Object, Text, Text, Text> {

        private static Logger logger = Logger.getLogger(ChildParentMapper.class);

        String childname = new String();

        String parientname = new String();

        String flag = new String(); // left/right table tag

        @Override
        public void map(Object ikey, Text ivalue, OutputCollector<Text, Text> output, Reporter arg3)
                throws IOException {
            // Split the line into child and parent names (tolerates extra spaces or tabs)
            String[] str = ivalue.toString().trim().split("\\s+");
            // Skip blank or malformed lines and the header row, if there is one
            if (str.length < 2 || str[0].equals("child")) {
                return;
            }
            childname = str[0];   // child name
            parientname = str[1]; // parent name

            // Left table: key = parent, value = "1+child+parent"
            flag = "1";
            logger.info(parientname + "," + flag + "+" + childname + "+" + parientname);
            output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));

            // Right table: key = child, value = "2+child+parent"
            flag = "2";
            logger.info(childname + "," + flag + "+" + childname + "+" + parientname);
            output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));
        }
    }


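Code walkthrough:

Step 1: define the following three fields:

1. the child's name (childname)

2. the parent's name (parientname)

3. a tag that distinguishes the left table from the right table (flag)

  String childname = new String();
  String parientname = new String();
  String flag = new String(); // left/right table tag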


Step 2: split the line into the child's name and the parent's name:

  String[] str = ivalue.toString().trim().split("\\s+"); // split on spaces or tabs
  childname = str[0];   // child name
  parientname = str[1]; // parent name
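The split call is where malformed input bites first. The sketch below (hypothetical class LineSplitDemo, assuming names are separated by spaces or tabs) contrasts split(" ") with trim().split("\\s+") and returns null for blank lines, which would otherwise make str[1] throw ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;

// Sketch (assumption: fields are separated by one or more spaces or tabs).
final class LineSplitDemo {

    // Returns {child, parent, ...}, or null for blank or one-field lines,
    // so a caller can skip them instead of failing on fields[1].
    static String[] parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return null; // "".split(...) would give [""], a single empty field
        }
        String[] fields = trimmed.split("\\s+");
        return fields.length >= 2 ? fields : null;
    }

    public static void main(String[] args) {
        // split(" ") drops trailing empty strings, so a trailing space is harmless...
        System.out.println(Arrays.toString("Tom Lucy ".split(" ")));  // [Tom, Lucy]
        // ...but a double space produces an empty middle field.
        System.out.println(Arrays.toString("Tom  Lucy".split(" ")));  // [Tom, , Lucy]
        System.out.println(Arrays.toString(parse("Tom  Lucy")));      // [Tom, Lucy]
        System.out.println(parse(""));                                 // null
    }
}
```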


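Step 3: emit two key/value pairs per line, one tagged as the left table and one as the right table.

First: <parent name, left-table tag + child name + parent name>

  flag = "1";
  output.collect(new Text(parientname), new Text(flag + "+" + childname + "+" + parientname));

Second: <child name, right-table tag + child name + parent name>

  flag = "2";
  output.collect(new Text(childname), new Text(flag + "+" + childname + "+" + parientname));

Because the left copy is keyed by the parent and the right copy by the child, each person X receives, in a single reduce call, both the rows where X is the parent (X's children, i.e. the grandchildren) and the rows where X is the child (X's parents, i.e. the grandparents). This is how the self-join condition left.parent = right.child is expressed in MapReduce.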
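Stripped of Hadoop types, the tagging in Step 3 is a small pure function. This sketch (hypothetical MapTagging class; plain Java strings stand in for Text and OutputCollector) returns the two pairs emitted for one row:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (assumption: plain strings replace Hadoop's Text/OutputCollector).
// Each input row yields two key/value pairs, as in the mapper:
//   <parent, "1+child+parent">  (left table: joined on the parent)
//   <child,  "2+child+parent">  (right table: joined on the child)
final class MapTagging {

    static List<String[]> map(String child, String parent) {
        String joined = child + "+" + parent;
        List<String[]> out = new ArrayList<>();
        out.add(new String[] {parent, "1+" + joined});
        out.add(new String[] {child, "2+" + joined});
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : map("Tom", "Lucy")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
        // prints:
        // Lucy -> 1+Tom+Lucy
        // Tom -> 2+Tom+Lucy
    }
}
```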

Step 4: the mapper output. The listing shows it as the reducers receive it, after the shuffle has sorted and grouped it by key:

Alice  1+Terry+Alice
Alice  1+Jack+Alice
Alma   1+Mark+Alma
Alma   1+Philip+Alma
Ben    1+Lucy+Ben
Jack   2+Jack+Alice
Jack   1+Tom+Jack
Jack   1+Jone+Jack
Jack   2+Jack+Jesse
Jesse  1+Jack+Jesse
Jesse  1+Terry+Jesse
Jone   2+Jone+Lucy
Jone   2+Jone+Jack
Lucy   1+Tom+Lucy
Lucy   2+Lucy+Ben
Lucy   2+Lucy+Mary
Lucy   1+Jone+Lucy
Mark   2+Mark+Alma
Mark   2+Mark+Terry
Mary   1+Lucy+Mary
Philip 2+Philip+Terry
Philip 2+Philip+Alma
Terry  1+Philip+Terry
Terry  1+Mark+Terry
Terry  2+Terry+Alice
Terry  2+Terry+Jesse
Tom    2+Tom+Lucy
Tom    2+Tom+Jack
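The grouping above is the work of the shuffle/sort phase, not of the mapper itself. A rough local stand-in is sketched below (hypothetical ShuffleDemo class), assuming a TreeMap is an acceptable model of Hadoop's key sort: Text compares raw bytes, which matches String order for these ASCII names. Hadoop does not guarantee the order of values within one key, so the reducer must not rely on it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch (assumption: TreeMap models the shuffle's sort-and-group step).
final class ShuffleDemo {

    // rows: {child, parent}. Returns key -> tagged values, keys in sorted order.
    static Map<String, List<String>> group(String[][] rows) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] row : rows) {
            String child = row[0];
            String parent = row[1];
            String joined = child + "+" + parent;
            // The same two pairs the mapper emits, appended to their key's group
            groups.computeIfAbsent(parent, k -> new ArrayList<>()).add("1+" + joined);
            groups.computeIfAbsent(child, k -> new ArrayList<>()).add("2+" + joined);
        }
        return groups;
    }

    public static void main(String[] args) {
        String[][] rows = {{"Tom", "Lucy"}, {"Jone", "Lucy"}, {"Lucy", "Mary"}, {"Lucy", "Ben"}};
        group(rows).forEach((key, values) -> System.out.println(key + "\t" + values));
    }
}
```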

3. The reduce code:

    public static class ChildParentReduce extends MapReduceBase implements Reducer<Text, Text, Text, Text> {

        private static Logger logger = Logger.getLogger(ChildParentReduce.class);

        private int num = 0; // header flag: the header is written once per reduce task

        @Override
        public void reduce(Text ikey, Iterator<Text> ivalue, OutputCollector<Text, Text> output, Reporter arg3)
                throws IOException {
            if (num == 0) { // write the output header
                output.collect(new Text("grandchild"), new Text("grandparient"));
                num++;
            }
            int grandchildnum = 0;   // number of grandchildren
            int grandparientnum = 0; // number of grandparents
            // Fixed capacity: at most 100 of each per key (a List would remove this limit)
            String[] grandchild = new String[100];
            String[] grandparient = new String[100];
            while (ivalue.hasNext()) {
                // Split "flag+child+parent" into its three parts on "+"
                String[] record = ivalue.next().toString().split("\\+");
                if (record[0].equals("1")) {
                    // Left table: the key person's child is a grandchild
                    grandchild[grandchildnum] = record[1];
                    grandchildnum++;
                } else if (record[0].equals("2")) {
                    // Right table: the key person's parent is a grandparent
                    grandparient[grandparientnum] = record[2];
                    grandparientnum++;
                }
            }
            if (grandchildnum != 0 && grandparientnum != 0) {
                // Cartesian product: i indexes grandchild, j indexes grandparient
                for (int i = 0; i < grandchildnum; i++) {
                    for (int j = 0; j < grandparientnum; j++) {
                        logger.info(grandchild[i] + "," + grandparient[j]);
                        output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
                    }
                }
            }
        }
    }
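For comparison, here is a minimal Hadoop-free sketch of the same reduce logic (hypothetical ReduceJoin class; a List of strings stands in for Iterator<Text>). It collects names into ArrayLists instead of fixed String[100] arrays, so a person with more than 100 children or parents cannot overflow a buffer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch (assumption: plain strings replace Hadoop's Text and OutputCollector).
final class ReduceJoin {

    // values: the tagged strings received for one key, e.g. for key "Lucy".
    // Returns "grandchild grandparent" pairs, space-separated for display.
    static List<String> reduce(List<String> values) {
        List<String> grandchildren = new ArrayList<>();
        List<String> grandparents = new ArrayList<>();
        for (String value : values) {
            String[] record = value.split("\\+"); // {flag, child, parent}
            if (record[0].equals("1")) {
                grandchildren.add(record[1]);      // left table: the key's child
            } else if (record[0].equals("2")) {
                grandparents.add(record[2]);       // right table: the key's parent
            }
        }
        // Cartesian product of the two lists (empty if either side is empty)
        List<String> pairs = new ArrayList<>();
        for (String gc : grandchildren) {
            for (String gp : grandparents) {
                pairs.add(gc + " " + gp);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> lucy = Arrays.asList("1+Tom+Lucy", "2+Lucy+Ben", "2+Lucy+Mary", "1+Jone+Lucy");
        reduce(lucy).forEach(System.out::println);
        // prints Tom Ben, Tom Mary, Jone Ben, Jone Mary (one per line)
    }
}
```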


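Code walkthrough:

Step 1: if a header is wanted, write it as the first output line. Because num is an instance field, this happens once per reduce task; with a single reduce task (the Hadoop default) the header appears once, at the top of the output.

  if (num == 0) { // write the output header
      output.collect(new Text("grandchild"), new Text("grandparient"));
      num++;
  }

Step 2: define four variables: the number of grandchildren, the number of grandparents, and an array to hold each.

  int grandchildnum = 0;   // number of grandchildren
  int grandparientnum = 0; // number of grandparents
  String[] grandchild = new String[100];
  String[] grandparient = new String[100];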

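Step 3: parse the value list produced by the map phase.

Taking the key Lucy from the mapper output as an example, the reducer receives:

<Lucy, [1+Tom+Lucy, 2+Lucy+Ben, 2+Lucy+Mary, 1+Jone+Lucy]>

Loop over the values. Left-table records (tag 1) have Lucy as the parent, so their child is a grandchild:

  // Left table
  if (record[0].equals("1")) {
      grandchild[grandchildnum] = record[1]; // take the child's name
      grandchildnum++;
  }

Grandchildren: Tom, Jone

Right-table records (tag 2) have Lucy as the child, so their parent is a grandparent:

  // Right table
  else if (record[0].equals("2")) {
      grandparient[grandparientnum] = record[2]; // take the parent's name
      grandparientnum++;
  }

Grandparents: Ben, Mary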


Take the Cartesian product of the two arrays to get the grandchild-grandparent pairs:

  if (grandchildnum != 0 && grandparientnum != 0) {
      // Cartesian product: i indexes grandchild, j indexes grandparient
      for (int i = 0; i < grandchildnum; i++) {
          for (int j = 0; j < grandparientnum; j++) {
              logger.info(grandchild[i] + "," + grandparient[j]);
              output.collect(new Text(grandchild[i]), new Text(grandparient[j]));
          }
      }
  }

For key Lucy this gives:

Tom Ben
Tom Mary
Jone Ben
Jone Mary
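The sample data gives every joining key (Jack, Lucy, Terry) exactly two grandchildren and two grandparents, so a loop whose bounds are swapped relative to the arrays it indexes would still print correct results here. A hypothetical key with one grandchild and two grandparents (not in the sample data) exposes the difference; this sketch (illustrative class CartesianBounds) runs both versions side by side:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch (assumption: hypothetical data, one grandchild and two grandparents).
final class CartesianBounds {

    // Correct: i is bounded by grandchildnum and indexes grandchild.
    static List<String> product(String[] grandchild, int grandchildnum,
                                String[] grandparient, int grandparientnum) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < grandchildnum; i++) {
            for (int j = 0; j < grandparientnum; j++) {
                pairs.add(grandchild[i] + " " + grandparient[j]);
            }
        }
        return pairs;
    }

    // Swapped bounds: i is bounded by grandparientnum but indexes grandchild.
    static List<String> swapped(String[] grandchild, int grandchildnum,
                                String[] grandparient, int grandparientnum) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < grandparientnum; i++) {
            for (int j = 0; j < grandchildnum; j++) {
                pairs.add(grandchild[i] + " " + grandparient[j]);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] gc = new String[100];
        String[] gp = new String[100];
        gc[0] = "Tom";
        gp[0] = "Ben";
        gp[1] = "Mary";
        System.out.println(product(gc, 1, gp, 2)); // [Tom Ben, Tom Mary]
        System.out.println(swapped(gc, 1, gp, 2)); // [Tom Ben, null Ben]
    }
}
```

The swapped version reads an unfilled slot (null) and misses "Tom Mary"; with differently sized arrays it could also run past the filled region entirely.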



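Finally, the main method. It belongs to the enclosing class ChildParent2, which also holds the mapper and reducer above. Assuming the Logger is log4j's (Logger.getLogger(Class) matches its API), the class needs these imports from the old org.apache.hadoop.mapred API:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.log4j.Logger;

    public static void main(String[] args) {
        try {
            String inputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/input";
            // The output directory must not exist yet, or the job refuses to start
            String outputDir = "hdfs://192.168.1.61:9000/home/zhongml/childparent/output";
            JobConf con = new JobConf(ChildParent2.class);
            con.setJobName("childparent");
            con.setMapOutputKeyClass(Text.class);
            con.setMapOutputValueClass(Text.class);
            con.setOutputKeyClass(Text.class);
            con.setOutputValueClass(Text.class);
            con.setMapperClass(ChildParentMapper.class);
            con.setReducerClass(ChildParentReduce.class);
            con.setInputFormat(TextInputFormat.class);
            con.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(con, new Path(inputDir));
            FileOutputFormat.setOutputPath(con, new Path(outputDir));
            JobClient.runJob(con); // blocks until the job ends; throws IOException if it fails
            System.exit(0);
        } catch (IllegalArgumentException e) {
            e.printStackTrace();
            System.exit(1);
        } catch (IOException e) {
            e.printStackTrace();
            System.exit(1);
        }
    }

} // end of class ChildParent2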
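To try the logic without a cluster, the three phases can be simulated in memory. This is a sketch under stated assumptions (hypothetical class LocalChildParentJob; a TreeMap models the shuffle; a tab separator mimics TextOutputFormat's default), not a replacement for the Hadoop job:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch (assumption: local simulation, no Hadoop):
// map (tag each row twice) -> shuffle (group by key, sorted) -> reduce (Cartesian product).
final class LocalChildParentJob {

    static List<String> run(List<String> lines) {
        // Map + shuffle: tag every row twice and group the tagged values by key.
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 2 || f[0].equals("child")) {
                continue; // skip blank lines and an optional header row
            }
            String joined = f[0] + "+" + f[1];
            groups.computeIfAbsent(f[1], k -> new ArrayList<>()).add("1+" + joined);
            groups.computeIfAbsent(f[0], k -> new ArrayList<>()).add("2+" + joined);
        }
        // Reduce: header first, then one Cartesian product per key.
        List<String> out = new ArrayList<>();
        out.add("grandchild\tgrandparient");
        for (List<String> values : groups.values()) {
            List<String> grandchildren = new ArrayList<>();
            List<String> grandparents = new ArrayList<>();
            for (String v : values) {
                String[] r = v.split("\\+"); // {flag, child, parent}
                if (r[0].equals("1")) {
                    grandchildren.add(r[1]);
                } else {
                    grandparents.add(r[2]);
                }
            }
            for (String gc : grandchildren) {
                for (String gp : grandparents) {
                    out.add(gc + "\t" + gp);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "Tom Lucy", "Tom Jack", "Jone Lucy", "Jone Jack", "Lucy Mary", "Lucy Ben",
            "Jack Alice", "Jack Jesse", "Terry Alice", "Terry Jesse",
            "Philip Terry", "Philip Alma", "Mark Terry", "Mark Alma");
        run(sample).forEach(System.out::println);
    }
}
```

Run on the sample data, it prints the header followed by 12 pairs: Tom and Jone with Alice and Jesse (via Jack) and with Mary and Ben (via Lucy), and Philip and Mark with Alice and Jesse (via Terry).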


This article is from Zhong Maolin's blog (“钟茂霖博客”); please keep this attribution: http://zhongml.blog.51cto.com/4808277/1877330
