
Running wordcount with Hadoop's bundled example program

1. Start the Hadoop daemons

   bin/start-all.sh

2. Create an input folder in the Hadoop installation directory (the directory from which the bin/hadoop commands are run)

   mkdir input

3. Change into the input directory and create two text files there, writing some content into each

  echo "hello excuse me fuck thank you">test1.txt

  echo "hello how do you do thank you">test2.txt
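Before involving Hadoop at all, it can help to know roughly what counts to expect. The following is a minimal sketch that counts words locally with standard Unix tools; the function name local_wordcount is made up for this illustration and is not part of Hadoop. It splits on whitespace only, which should be close to, but is not guaranteed to match, how the bundled example tokenizes its input.

```shell
# Count words in all .txt files of a directory using only standard Unix tools.
# Prints one "word<TAB>count" line per distinct word, sorted by word.
# local_wordcount is a hypothetical helper name, not a Hadoop command.
local_wordcount() {
  cat "$1"/*.txt |
    tr -s '[:space:]' '\n' |   # put every word on its own line
    grep -v '^$' |             # drop empty lines left by leading whitespace
    LC_ALL=C sort |            # group identical words together
    uniq -c |                  # prefix each word with its number of occurrences
    awk '{ print $2 "\t" $1 }' # reorder to "word<TAB>count"
}
```

From the directory that contains input, it would be called as local_wordcount input.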

4. Return to the Hadoop installation directory and run the jps command to confirm that the Hadoop daemons are running

2195 SecondaryNameNode
2245 JobTracker
2055 NameNode
2664 Jps
2314 TaskTracker
2123 DataNode
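Checking the jps listing by eye works, but the check can also be scripted. The sketch below reads jps output on standard input and prints any of the five daemons shown above that are absent; missing_daemons is a hypothetical helper name, and the daemon list assumes a single-node Hadoop 0.20.x setup like the one in this walkthrough.

```shell
# Print the name of every expected Hadoop 0.20.x daemon that is absent
# from jps-style output ("<pid> <name>" per line) read on stdin.
# missing_daemons is a hypothetical helper name, not a Hadoop command.
missing_daemons() {
  jps_output=$(cat)
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Compare the second column exactly, so that "NameNode" is not
    # satisfied by a "SecondaryNameNode" line.
    printf '%s\n' "$jps_output" |
      awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
      echo "$daemon"
  done
}
```

It would be used as jps | missing_daemons; no output means all five daemons were found.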

5. Upload the input folder to HDFS

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop dfs -put input in

6. List the uploaded items on HDFS

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop dfs -ls in
Found 3 items
-rw-r--r--   1 jia supergroup       6148 2014-07-16 22:56 /user/jia/in/.DS_Store
-rw-r--r--   1 jia supergroup         18 2014-07-16 22:56 /user/jia/in/tex1.txt
-rw-r--r--   1 jia supergroup         22 2014-07-16 22:56 /user/jia/in/tex2.txt
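The .DS_Store entry in the listing is a macOS Finder metadata file that was uploaded along with the text files. Hadoop's FileInputFormat skips files whose names start with "." or "_", which is consistent with the job log reporting only 2 input paths, so the file is harmless here. To keep it out of HDFS in the first place, the local folder can be cleaned before uploading. A minimal sketch, with clean_input_dir as a made-up helper name:

```shell
# Delete Finder metadata files from a local directory tree before upload.
# clean_input_dir is a hypothetical helper name, not a Hadoop command.
clean_input_dir() {
  find "$1" -type f -name '.DS_Store' -exec rm -f {} +
}
```

Running clean_input_dir input before the upload in step 5 would leave only the text files in the folder.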

7. Run the bundled wordcount example

JIAS-MacBook-Pro:hadoop-0.20.2 jia$ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount in put
14/07/16 23:06:52 INFO input.FileInputFormat: Total input paths to process : 2
14/07/16 23:06:52 INFO mapred.JobClient: Running job: job_201407162246_0001
14/07/16 23:06:53 INFO mapred.JobClient:  map 0% reduce 0%
14/07/16 23:07:03 INFO mapred.JobClient:  map 100% reduce 0%
14/07/16 23:07:15 INFO mapred.JobClient:  map 100% reduce 100%
14/07/16 23:07:17 INFO mapred.JobClient: Job complete: job_201407162246_0001
14/07/16 23:07:17 INFO mapred.JobClient: Counters: 17
14/07/16 23:07:17 INFO mapred.JobClient:   Map-Reduce Framework
14/07/16 23:07:17 INFO mapred.JobClient:     Combine output records=7
14/07/16 23:07:17 INFO mapred.JobClient:     Spilled Records=14
14/07/16 23:07:17 INFO mapred.JobClient:     Reduce input records=7
14/07/16 23:07:17 INFO mapred.JobClient:     Reduce output records=4
14/07/16 23:07:17 INFO mapred.JobClient:     Map input records=2
14/07/16 23:07:17 INFO mapred.JobClient:     Map output records=7
14/07/16 23:07:17 INFO mapred.JobClient:     Map output bytes=68
14/07/16 23:07:17 INFO mapred.JobClient:     Reduce shuffle bytes=52
14/07/16 23:07:17 INFO mapred.JobClient:     Combine input records=7
14/07/16 23:07:17 INFO mapred.JobClient:     Reduce input groups=4
14/07/16 23:07:17 INFO mapred.JobClient:   FileSystemCounters
14/07/16 23:07:17 INFO mapred.JobClient:     HDFS_BYTES_READ=40
14/07/16 23:07:17 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=246
14/07/16 23:07:17 INFO mapred.JobClient:     FILE_BYTES_READ=88
14/07/16 23:07:17 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=30
14/07/16 23:07:17 INFO mapred.JobClient:   Job Counters 
14/07/16 23:07:17 INFO mapred.JobClient:     Launched map tasks=2
14/07/16 23:07:17 INFO mapred.JobClient:     Launched reduce tasks=1
14/07/16 23:07:17 INFO mapred.JobClient:     Data-local map tasks=2
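The log only reports counters; the word counts themselves are written to the output directory on HDFS (put in the command above). The sketch below prints them. It assumes the reducer output files are named with a part- prefix, as is usual for MapReduce jobs, and that the launcher script is at bin/hadoop relative to the current directory; HADOOP and show_job_output are names introduced for this illustration.

```shell
# HADOOP is an assumed variable holding the path to the launcher script.
HADOOP="${HADOOP:-bin/hadoop}"

# Print every reducer output file of a finished job.
# $1 is the HDFS output directory, for example "put".
# show_job_output is a hypothetical helper name.
show_job_output() {
  # The pattern is quoted so that the HDFS shell, not the local shell, expands it.
  "$HADOOP" dfs -cat "$1/part-*"
}
```

It would be called as show_job_output put. Note that a job will not start if its output directory already exists, so before repeating step 7 with the same name, the old directory has to be removed, for example with bin/hadoop dfs -rmr put.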