Hadoop Input Format (InputFormat)


2024-07-24 19:05:53

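InputFormat declares two methods, getSplits() and createRecordReader(). The first defines how the job's input is divided into splits; the second creates the reader that turns a single split into key/value records. (In the newer org.apache.hadoop.mapreduce API, InputFormat is an abstract class; the older org.apache.hadoop.mapred API declares it as an interface.)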

```java
package org.apache.hadoop.mapreduce;

import java.io.IOException;
import java.util.List;

public abstract class InputFormat<K, V> {

  /**
   * Logically split the set of input files for the job.
   *
   * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.</p>
   *
   * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be <i>&lt;input-file-path, start, offset&gt;</i> tuple. The InputFormat
   * also creates the {@link RecordReader} to read the {@link InputSplit}.
   *
   * @param context job configuration.
   * @return an array of {@link InputSplit}s for the job.
   */
  public abstract
    List<InputSplit> getSplits(JobContext context
                               ) throws IOException, InterruptedException;

  /**
   * Create a record reader for a given split. The framework will call
   * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
   * the split is used.
   * @param split the split to be read
   * @param context the information about the task
   * @return a new record reader
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract
    RecordReader<K,V> createRecordReader(InputSplit split,
                                         TaskAttemptContext context
                                        ) throws IOException,
                                                 InterruptedException;

}
```
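As the Javadoc stresses, getSplits() only computes logical boundaries and never moves data. The following is a minimal, Hadoop-free sketch (plain Java, so it runs without Hadoop on the classpath) of how file-based splitting is commonly described for FileInputFormat: the split size is the block size clamped to [minSize, maxSize], and a trailing chunk up to 10% larger than the split size is folded into the last split instead of becoming a tiny extra split. The names SplitSketch, Split and computeSplits, and the example path, are hypothetical; the sketch ignores block locations, non-splittable (e.g. compressed) files and multi-file inputs.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Hypothetical stand-in for a file split: an <input-file-path, start, length> tuple.
    public static final class Split {
        public final String path;
        public final long start;
        public final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + " [start=" + start + ", length=" + length + "]";
        }
    }

    // Modeled on FileInputFormat's SPLIT_SLOP: the last split may be up to
    // 10% larger than splitSize rather than leaving a tiny trailing split.
    static final double SPLIT_SLOP = 1.1;

    // Block size clamped to the range [minSize, maxSize].
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Logically divides one file into splits; no data is read or copied.
    public static List<Split> computeSplits(String path, long fileLength,
                                            long blockSize, long minSize, long maxSize) {
        List<Split> splits = new ArrayList<>();
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long bytesRemaining = fileLength;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits.add(new Split(path, fileLength - bytesRemaining, splitSize));
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            // Whatever is left (at most 1.1 * splitSize) becomes the last split.
            splits.add(new Split(path, fileLength - bytesRemaining, bytesRemaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        // Hypothetical 300 MB file stored in 128 MB blocks, default min/max sizes.
        for (Split s : computeSplits("/data/input.txt", 300 * mb, 128 * mb, 1, Long.MAX_VALUE)) {
            System.out.println(s);
        }
    }
}
```

For the 300 MB file this yields three splits starting at 0, 128 MB and 256 MB, the last one 44 MB long. A 140 MB file, by contrast, stays a single split, because 140/128 is below the 1.1 slop factor.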
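The reader returned by createRecordReader() must cope with the fact that split boundaries are byte offsets, which usually fall in the middle of a line. The following minimal sketch, again in plain Java without Hadoop, assumes the convention used by Hadoop's line-oriented LineRecordReader: every split except the first discards everything up to and including its first newline, and each reader keeps emitting lines as long as a line starts at or before its split's end, even if that line runs past the boundary. The names LineReaderSketch, Record and readSplit are hypothetical; the sketch reads an in-memory byte array and ignores compression and custom delimiters.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LineReaderSketch {

    // Hypothetical (byte offset, line) record, analogous to the
    // (LongWritable, Text) pairs that TextInputFormat hands to a Mapper.
    public static final class Record {
        public final long offset;
        public final String line;

        Record(long offset, String line) {
            this.offset = offset;
            this.line = line;
        }

        @Override
        public String toString() {
            return "(" + offset + ", " + line + ")";
        }
    }

    // Returns the lines owned by the split [start, start + length).
    public static List<Record> readSplit(byte[] data, int start, int length) {
        List<Record> records = new ArrayList<>();
        int end = start + length;
        int pos = start;
        if (start != 0) {
            // Not the first split: skip up to and including the first newline,
            // because the previous split's reader emits that line.
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        // Emit every line that starts at or before `end`; the last one may
        // extend past the split boundary into the next split's bytes.
        while (pos <= end && pos < data.length) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new Record(pos, new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8)));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] data = "aaa\nbb\ncccc\n".getBytes(StandardCharsets.UTF_8);
        // Two splits whose boundary (byte 5) cuts through the line "bb".
        System.out.println(readSplit(data, 0, 5)); // [(0, aaa), (4, bb)]
        System.out.println(readSplit(data, 5, 7)); // [(7, cccc)]
    }
}
```

Although the boundary cuts through "bb", every line is emitted exactly once: the first reader finishes "bb" past its boundary, and the second reader skips the tail of "bb" before emitting "cccc".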

