
Hadoop Input Format (InputFormat)

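  The InputFormat abstract class declares two methods: getSplits() and createRecordReader(). The first defines how a job's input is divided into logical splits, and the second creates the RecordReader that reads the records of a given split.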

public abstract class InputFormat<K, V> {

  /**
   * Logically split the set of input files for the job.
   *
   * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.</p>
   *
   * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be <i>&lt;input-file-path, start, offset&gt;</i> tuple. The InputFormat
   * also creates the {@link RecordReader} to read the {@link InputSplit}.
   *
   * @param context job configuration.
   * @return an array of {@link InputSplit}s for the job.
   */
  public abstract
    List<InputSplit> getSplits(JobContext context
                               ) throws IOException, InterruptedException;

  /**
   * Create a record reader for a given split. The framework will call
   * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
   * the split is used.
   * @param split the split to be read
   * @param context the information about the task
   * @return a new record reader
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract
    RecordReader<K,V> createRecordReader(InputSplit split,
                                         TaskAttemptContext context
                                        ) throws IOException,
                                                 InterruptedException;

}
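To make the idea of a logical split concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It is not Hadoop's implementation. `SplitSketch`, `Split`, and `computeSplits` are hypothetical names introduced for illustration, and the fixed `splitSize` parameter is an assumption. The sketch stores a byte length as the third field of each tuple. It only shows that a file can be described as a list of byte ranges without the data being physically cut.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of the Hadoop API.
final class SplitSketch {

    /** Stand-in for an InputSplit: a logical <path, start, length> tuple. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return path + ":" + start + "+" + length;
        }
    }

    /**
     * Describes a file of fileLength bytes as consecutive byte ranges of at
     * most splitSize bytes each. No data is read or copied.
     */
    static List<Split> computeSplits(String path, long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Split> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            // The last range may be shorter than splitSize.
            long length = Math.min(splitSize, fileLength - start);
            splits.add(new Split(path, start, length));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 250-byte file with 100-byte splits yields ranges of 100, 100 and 50 bytes.
        for (Split s : computeSplits("input.txt", 250, 100)) {
            System.out.println(s);
        }
    }
}
```

In a real job each such range would be handed to one Mapper, and the RecordReader returned by createRecordReader() would turn the bytes of that range into key/value records. Hadoop's own FileInputFormat also takes the block size and the configured minimum and maximum split sizes into account, which this sketch ignores.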

 

 

 

 
