Hadoop input formats (InputFormat)
The InputFormat abstract class declares two methods, getSplits() and createRecordReader(). The first defines how the job's input is divided into splits; the second defines how the records in each split are read.
public abstract class InputFormat<K, V> {

  /**
   * Logically split the set of input files for the job.
   *
   * <p>Each {@link InputSplit} is then assigned to an individual {@link Mapper}
   * for processing.</p>
   *
   * <p><i>Note</i>: The split is a <i>logical</i> split of the inputs and the
   * input files are not physically split into chunks. For e.g. a split could
   * be <i>&lt;input-file-path, start, offset&gt;</i> tuple. The InputFormat
   * also creates the {@link RecordReader} to read the {@link InputSplit}.
   *
   * @param context job configuration.
   * @return an array of {@link InputSplit}s for the job.
   */
  public abstract
    List<InputSplit> getSplits(JobContext context
                               ) throws IOException, InterruptedException;

  /**
   * Create a record reader for a given split. The framework will call
   * {@link RecordReader#initialize(InputSplit, TaskAttemptContext)} before
   * the split is used.
   * @param split the split to be read
   * @param context the information about the task
   * @return a new record reader
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract
    RecordReader<K, V> createRecordReader(InputSplit split,
                                          TaskAttemptContext context
                                          ) throws IOException,
                                                   InterruptedException;

}
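In practice, file-based formats rarely implement InputFormat from scratch; they usually extend FileInputFormat, which already provides a getSplits() implementation, and only override createRecordReader(). The sketch below is a minimal, hypothetical example (the class name MyLineInputFormat is made up for illustration) that delegates record reading to Hadoop's built-in LineRecordReader, similar in spirit to TextInputFormat but simplified.

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

// A minimal sketch of a file-based input format: FileInputFormat supplies
// getSplits(), so only createRecordReader() needs to be implemented here.
public class MyLineInputFormat extends FileInputFormat<LongWritable, Text> {

  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    // Reuse the built-in LineRecordReader: key = byte offset in the file,
    // value = the line's text. The framework calls initialize(split, context)
    // on the returned reader before any records are read.
    return new LineRecordReader();
  }

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    // Plain text can be split at arbitrary byte offsets; compressed input
    // would need extra handling, which this sketch omits.
    return true;
  }
}

A job would select this format in its driver with job.setInputFormatClass(MyLineInputFormat.class). The framework then calls getSplits() when planning map tasks and createRecordReader() inside each task to produce the mapper's key/value pairs.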