The Map/Reduce Class Hierarchy


2024-07-19 21:21:01


A Map/Reduce example, explained:

We start with the simple WordCount program to show how a Map/Reduce job is described.

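import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

// main method of the WordCount class
public static void main(String[] args) throws Exception {
    // Create the Configuration that describes the Map/Reduce execution environment
    Configuration conf = new Configuration();
    // Parse the command-line arguments
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
        System.err.println("Usage: wordcount <in> <out>");
        System.exit(2);
    }
    // Create the Job instance
    // (this constructor is deprecated in Hadoop 2.x; Job.getInstance(conf, "word count") replaces it)
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    // Set the Mapper class
    job.setMapperClass(TokenizerMapper.class);
    // Set the Combiner class
    job.setCombinerClass(IntSumReducer.class);
    // Set the Reducer class
    job.setReducerClass(IntSumReducer.class);
    // Set the output key type to Text
    job.setOutputKeyClass(Text.class);
    // Set the output value type to IntWritable
    job.setOutputValueClass(IntWritable.class);
    // Set the HDFS input path and output path
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    // Wait for the Map/Reduce job to finish
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}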

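Note: a concrete Job must be given a Mapper class and a Reducer class, which decide how the data is processed. The InputFormat and OutputFormat decide where the data is read from and where the results are written.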
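The driver above only names TokenizerMapper and IntSumReducer; their bodies are not shown in this article. As a rough illustration of the data flow the job sets up, here is a plain-JDK sketch. It assumes, going by the class names, that the mapper splits each line on whitespace and emits (word, 1), and that the reducer sums the values for each word. The class WordCountFlowSketch and its methods are hypothetical stand-ins and do not use the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-JDK stand-in for the WordCount data flow; none of these types are Hadoop classes.
final class WordCountFlowSketch {

    // Map step (the role assumed for TokenizerMapper): emit a (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(Map.entry(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce step (the role assumed for IntSumReducer): sum the values of each key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Run the map step over every input line, then reduce all emitted pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(List.of("hello world", "hello hadoop")));
    }
}
```

Because summation is associative and commutative, partial sums can be computed before the final sum without changing the result, which is why the driver can register the same IntSumReducer class as both Combiner and Reducer.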

The Mapper class
The Mapper class declares an inner abstract class, Context, and is built around the template method design pattern.

public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {

    public abstract class Context
            implements MapContext<KEYIN, VALUEIN, KEYOUT, VALUEOUT> {
    }

    // Called once before any key/value pair is processed
    protected void setup(Context context)
            throws IOException, InterruptedException {
    }

    // Called once per key/value pair; the default implementation passes the pair through unchanged
    protected void map(KEYIN key, VALUEIN value, Context context)
            throws IOException, InterruptedException {
        context.write((KEYOUT) key, (VALUEOUT) value);
    }

    // Called once after the last key/value pair has been processed
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
    }

    // Implemented as a template method (body omitted here)
    public void run(Context context)
            throws IOException, InterruptedException {
    }
}

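Note: setup performs the initialization for a map task, cleanup performs the work that follows the end of the map task, and map is the function that handles each individual key/value pair.
The core of the Mapper class is the run function, whose body is: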

// Initialization for the map phase
setup(context);
try {
    // Iterate over the key/value pairs
    while (context.nextKeyValue()) {
        // Invoke the map callback
        map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
} finally {
    // Cleanup for the map phase
    cleanup(context);
}

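Note: this is the template method pattern. The run function fixes the order of the steps (setup, map, cleanup), and subclasses override the individual steps.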
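To make the template method structure concrete, here is a small plain-JDK sketch of the same shape. TemplateMapperSketch and UpperCaseMapper are hypothetical names invented for this illustration; they only mimic the setup/map/cleanup/run layout and are not the Hadoop classes. An in-memory list replaces the Context.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for the template-method layout of Mapper; not the Hadoop class.
class TemplateMapperSketch<IN, OUT> {
    // Collected output, standing in for context.write(...)
    protected final List<OUT> output = new ArrayList<>();

    // Hook: runs once before the first record
    protected void setup() { }

    // Hook: runs once per record; the default passes the record through unchanged
    @SuppressWarnings("unchecked")
    protected void map(IN record) {
        output.add((OUT) record);
    }

    // Hook: runs once after the last record
    protected void cleanup() { }

    // Template method: fixes the order of the hooks and cannot be overridden here
    public final List<OUT> run(Iterator<IN> records) {
        setup();
        try {
            while (records.hasNext()) {
                map(records.next());
            }
        } finally {
            cleanup();
        }
        return output;
    }

    public static void main(String[] args) {
        // prints [<begin>, A, B, <end>]
        System.out.println(new UpperCaseMapper().run(List.of("a", "b").iterator()));
    }
}

// A subclass customizes the steps without touching the control flow in run.
class UpperCaseMapper extends TemplateMapperSketch<String, String> {
    @Override
    protected void setup() {
        output.add("<begin>");
    }

    @Override
    protected void map(String record) {
        output.add(record.toUpperCase(Locale.ROOT));
    }

    @Override
    protected void cleanup() {
        output.add("<end>");
    }
}
```

The sketch marks run as final to stress that the control flow belongs to the base class. In the Mapper listing above, run is an ordinary public method, so subclasses there are free to override it as well.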

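The structure of InputFormat
The two most important classes associated with InputFormat are InputSplit and RecordReader.
*) InputSplit: one slice of the map input, corresponding to one concrete map task.
*) RecordReader: a record reader that wraps one concrete InputSplit.
The code looks like this: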

public abstract class InputFormat<K, V> {

    // Compute the InputSplits that determine how the map input is divided
    public abstract List<InputSplit> getSplits(JobContext context)
            throws IOException, InterruptedException;

    // Create the RecordReader instance for a given InputSplit
    public abstract RecordReader<K, V> createRecordReader(
            InputSplit split, TaskAttemptContext context)
            throws IOException, InterruptedException;
}

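Note: the number of InputSplits determines the number of map tasks, and with it how the data is partitioned and how large each partition is. The RecordReader determines the format and the concrete values of the key/value pairs. These concepts are essential to how the map input is produced.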
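The division of labor between splits and record readers can be sketched with an in-memory list of lines. Everything in this example is hypothetical: LineInputSketch, Split, and Reader are invented for illustration, the key is simply the line index, and a fixed number of lines per split is assumed. The method names getSplits, nextKeyValue, getCurrentKey, and getCurrentValue are borrowed from the listings above only to make the correspondence visible.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory analogue of the InputFormat roles; not the Hadoop classes.
final class LineInputSketch {

    // One slice of the input: the half-open range [start, end) of line indices.
    static final class Split {
        final int start;
        final int end;

        Split(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Analogue of getSplits: one split per map task, each covering at most linesPerSplit lines.
    static List<Split> getSplits(int totalLines, int linesPerSplit) {
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < totalLines; start += linesPerSplit) {
            splits.add(new Split(start, Math.min(start + linesPerSplit, totalLines)));
        }
        return splits;
    }

    // Analogue of a RecordReader: walks one split and exposes (line index, line text) pairs.
    static final class Reader {
        private final List<String> lines;
        private final Split split;
        private int next;
        private int currentKey = -1;
        private String currentValue;

        Reader(List<String> lines, Split split) {
            this.lines = lines;
            this.split = split;
            this.next = split.start;
        }

        // Advance to the next record; returns false once the split is exhausted.
        boolean nextKeyValue() {
            if (next >= split.end) {
                return false;
            }
            currentKey = next;
            currentValue = lines.get(next);
            next++;
            return true;
        }

        int getCurrentKey() {
            return currentKey;
        }

        String getCurrentValue() {
            return currentValue;
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a", "b", "c", "d", "e");
        List<Split> splits = getSplits(lines.size(), 2);
        for (int task = 0; task < splits.size(); task++) {
            Reader reader = new Reader(lines, splits.get(task));
            while (reader.nextKeyValue()) {
                System.out.println("task " + task + ": "
                        + reader.getCurrentKey() + " -> " + reader.getCurrentValue());
            }
        }
    }
}
```

With five lines and two lines per split, getSplits yields three splits, so three map tasks would run, and each reader sees only the records inside its own split.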

Reducer/OutputFormat
The Reducer class is defined much like the Mapper class, and OutputFormat is analogous to InputFormat, so both are skipped here.
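Since the Reducer is only described by analogy, the following plain-JDK sketch shows what the reduce-side counterpart of the run loop might look like. The main difference from the map side is that each call receives one key together with all of the values grouped under it. ReducerSketch and SumReducer are hypothetical names, and the in-memory grouping with a TreeMap merely stands in for the framework's grouping of map output by key; none of this is Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for the reduce-side template method; not the Hadoop Reducer class.
abstract class ReducerSketch<K extends Comparable<K>, V, R> {

    // Hook: runs once before the first key
    protected void setup() { }

    // Hook: runs once per key, with every value grouped under that key
    protected abstract R reduce(K key, Iterable<V> values);

    // Hook: runs once after the last key
    protected void cleanup() { }

    // Template method: group the pairs by key, then drive setup, reduce, cleanup in order.
    public final Map<K, R> run(List<Map.Entry<K, V>> pairs) {
        // Group values by key in sorted key order (stand-in for the framework's grouping)
        Map<K, List<V>> grouped = new TreeMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        Map<K, R> results = new TreeMap<>();
        setup();
        try {
            for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
                results.put(group.getKey(), reduce(group.getKey(), group.getValue()));
            }
        } finally {
            cleanup();
        }
        return results;
    }

    public static void main(String[] args) {
        // prints {a=4, b=2}
        System.out.println(new SumReducer().run(
                List.of(Map.entry("a", 1), Map.entry("b", 2), Map.entry("a", 3))));
    }
}

// Sums the integer values of each key, in the spirit of a word-count reducer.
class SumReducer extends ReducerSketch<String, Integer, Integer> {
    @Override
    protected Integer reduce(String key, Iterable<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }
}
```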

Summary:

This article is not finished yet; this section is a placeholder for now.
