Hello Giraph
Apache Giraph
http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/
http://blog.cloudera.com/blog/2014/05/how-to-manage-time-dependent-multilayer-networks-in-apache-hadoop/
Large-scale graph data column: http://blog.csdn.net/column/details/big-data.html
Binary package
$ vi .bashrc
GIRAPH_HOME=/home/hadoop/soft/giraph-1.1.0-hadoop-2.5.1
$ cd ~/soft/giraph-1.1.0-hadoop-2.5.1
$ hadoop jar $GIRAPH_HOME/giraph-examples-1.1.0-hadoop2.jar \
org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yj giraph-core-1.1.0-hadoop2.jar \
-ca giraph.SplitMasterWorker=false
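The command above expects /input/giraph/tiny_graph.txt to already exist in HDFS, and the post does not show how it got there. As a rough sketch, assuming the file is the five-vertex sample from the Giraph quick start guide (each line is [vertex id, vertex value, [[target id, edge weight], ...]], the layout JsonLongDoubleFloatDoubleVertexInputFormat reads), it could be created and uploaded like this. The local path is arbitrary, and the upload step is skipped when no hadoop command is on the PATH.

```shell
#!/bin/sh
# Assumption: tiny_graph.txt is the sample graph from the Giraph quick start guide.
# Line format: [vertex_id, vertex_value, [[target_id, edge_weight], ...]]
GRAPH_FILE="${TMPDIR:-/tmp}/tiny_graph.txt"

cat > "$GRAPH_FILE" <<'EOF'
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
EOF

# Upload to the HDFS path used by -vip, only if a Hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p /input/giraph
  hadoop fs -put -f "$GRAPH_FILE" /input/giraph/tiny_graph.txt
fi
```

This sample has 5 vertices and 12 edges, which agrees with the "Aggregate vertices" and "Aggregate edges" counters in the successful run later in this post.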
Error 1: master/worker mode setting
Note: this is a single-machine pseudo-distributed setup. Without -ca giraph.SplitMasterWorker=false, the job fails because it cannot run in split master/worker mode:
Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run in split master / worker mode since there is only 1 task at a time!
at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:168)
at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:236)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Solution: http://stackoverflow.com/questions/26175116/apache-giraph-cannot-run-in-split-master-worker-mode-since-there-is-only-1-t
Add the -ca giraph.SplitMasterWorker=false parameter to the command.
Error 2: GiraphRunner not loaded
Running the command above without the -yj giraph-core-1.1.0-hadoop2.jar parameter fails because the Giraph jars are not loaded:
Exception in thread "main" java.lang.ClassNotFoundException: org.apache.giraph.GiraphRunner
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.main(RunJar.java:205)
Error 3: Netty not found
Failed fix 1: write an example.sh script
#!/bin/sh
LIB=./lib
for jar in $LIB/*.*
do
  HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$jar
done
GLIB=./
for jar in $GLIB/*.*
do
  HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$jar
done
echo $HADOOP_CLASSPATH
HADOOP_CLASSPATH=$HADOOP_CLASSPATH hadoop jar giraph-examples-1.1.0-hadoop2.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /input/giraph/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /output/giraph/shortestpaths -w 1 -ca giraph.SplitMasterWorker=false
Run example.sh:
15/01/15 16:43:50 INFO mapreduce.Job: Job job_1421308355863_0002 running in uber mode : false
15/01/15 16:43:50 INFO mapreduce.Job: map 0% reduce 0%
15/01/15 16:43:50 INFO mapreduce.Job: Job job_1421308355863_0002 failed with state FAILED due to: Application application_1421308355863_0002 failed 2 times due to AM Container for appattempt_1421308355863_0002_000002 exited with exitCode: 1 due to: Exception from container-launch.
Container id: container_1421308355863_0002_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
15/01/15 16:43:50 INFO mapreduce.Job: Counters: 0
Check the backend log for the actual error:
2015-01-15 16:43:49,911 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
java.lang.NoClassDefFoundError: io/netty/buffer/ByteBufAllocator
at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:62)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:465)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:368)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1477)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1474)
at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407)
Caused by: java.lang.ClassNotFoundException: io.netty.buffer.ByteBufAllocator
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 10 more
2015-01-15 16:43:49,916 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
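The loop in example.sh globs `*.*`, so any file with a dot in its name lands on the classpath, and the value starts with a stray colon whenever HADOOP_CLASSPATH was empty. A slightly tidier sketch, using a hypothetical helper named build_classpath, joins only .jar files. This only cleans up the client-side classpath. The MRAppMaster log above shows the class missing inside the YARN container, so this helper should not be expected to cure the error by itself.

```shell
#!/bin/sh
# build_classpath DIR [DIR...]: print the .jar files found in the given
# directories, joined by ':' (hypothetical helper, not part of Hadoop or Giraph).
build_classpath() {
  cp=""
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      [ -f "$jar" ] || continue        # skip the unexpanded glob when nothing matches
      cp="${cp:+$cp:}$jar"             # put ':' only between entries
    done
  done
  printf '%s\n' "$cp"
}

# Usage (directories as in example.sh):
#   HADOOP_CLASSPATH="$(build_classpath ./lib .)" hadoop jar giraph-examples-1.1.0-hadoop2.jar ...
```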
Failed fix 2: prefix the command with HADOOP_CLASSPATH
HADOOP_CLASSPATH=giraph-core-1.1.0-hadoop2.jar hadoop jar $GIRAPH_HOME/giraph-examples-1.1.0-hadoop2.jar \
org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-ca giraph.SplitMasterWorker=false
Error:
Exception in thread "main" java.lang.NoClassDefFoundError: io/netty/buffer/ByteBufAllocator
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:73)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Caused by: java.lang.ClassNotFoundException: io.netty.buffer.ByteBufAllocator
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 9 more
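Both attempts fail on the same class, io.netty.buffer.ByteBufAllocator. One way to narrow this down is to check which jars actually ship it. The sketch below converts a class name to its entry path inside a jar and lists every jar in a directory that contains that entry. The helper names are made up for illustration, and the sketch assumes unzip is installed (jar tf would serve equally well).

```shell
#!/bin/sh
# class_to_entry CLASS: turn a fully qualified class name into its path inside a jar.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# jars_with_class CLASS DIR: print each jar in DIR whose listing contains CLASS.
# Assumes unzip is available; both helper names are hypothetical.
jars_with_class() {
  entry="$(class_to_entry "$1")"
  for jar in "$2"/*.jar; do
    [ -f "$jar" ] || continue
    if unzip -l "$jar" 2>/dev/null | grep -qF "$entry"; then
      printf '%s\n' "$jar"
    fi
  done
}

# Usage, with the directories that example.sh scans:
#   jars_with_class io.netty.buffer.ByteBufAllocator ./lib
#   jars_with_class org.apache.giraph.GiraphRunner .
```

An empty result means no jar in that directory provides the class, which at least tells you the classpath entry alone cannot help.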
Building from source
$ cd ~/github/apache/giraph-release-1.1
$ mvn -Phadoop_yarn -Dhadoop.version=2.5.0 -DskipTests clean package
or: mvn -Phadoop_2 -Dhadoop.version=2.5.0 -DskipTests clean package
Error 1: SASL in newer Hadoop versions
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.0:compile
(default-compile) on project giraph-core: Compilation failure: Compilation failure:
[ERROR] /home/hadoop/github/apache/giraph/giraph-core/target/munged/main/org/apache/giraph/comm/netty/SaslNettyServer.java:[105,62] cannot find symbol
[ERROR] symbol: variable SASL_PROPS
[ERROR] location: class org.apache.hadoop.security.SaslRpcServer
[ERROR] /home/hadoop/github/apache/giraph/giraph-core/target/munged/main/org/apache/giraph/comm/netty/SaslNettyClient.java:[84,68] cannot find symbol
[ERROR] symbol: variable SASL_PROPS
[ERROR] location: class org.apache.hadoop.security.SaslRpcServer
[ERROR] -> [Help 1]
[ERROR]
Fix: remove the SASL flag from the hadoop_yarn profile: ,STATIC_SASL_SYMBOL
http://mail-archives.apache.org/mod_mbox/giraph-user/201501.mbox/%3CCAFJOoJdMdQmCFX-M-L-XiCY_M3FsvtEGS20C9huO6xhrUmYp8w@mail.gmail.com%3E
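As a sed one-liner, that edit could look like the sketch below. It assumes the flag sits in pom.xml somewhere between the line containing <id>hadoop_yarn</id> and that profile's closing </profile> tag, which is an assumption about the file layout, so inspect the file first. The helper name is hypothetical, and a .bak copy of the original is kept.

```shell
#!/bin/sh
# strip_sasl_flag POM: remove ",STATIC_SASL_SYMBOL" inside the hadoop_yarn
# profile only, leaving a backup in POM.bak (hypothetical helper).
strip_sasl_flag() {
  sed -i.bak '/<id>hadoop_yarn<\/id>/,/<\/profile>/ s/,STATIC_SASL_SYMBOL//' "$1"
}

# Usage, from the Giraph source root:
#   strip_sasl_flag pom.xml
#   mvn -Phadoop_yarn -Dhadoop.version=2.5.0 -DskipTests clean package
```

Restricting the substitution to the hadoop_yarn range leaves any other profile that uses the same flag untouched.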
Error 2: the Giraph 1.2.0-SNAPSHOT jar does not exist
[ERROR] Failed to execute goal on project giraph-dist: Could not resolve dependencies for project org.apache.giraph:giraph-dist:pom:1.2.0-SNAPSHOT: Could not find artifact org.apache.giraph:giraph-rexster-io:jar:1.2.0-SNAPSHOT in central (http://repo1.maven.org/maven2) -> [Help 1]
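Fix: this is the latest snapshot version from GitHub, and the Maven repository does not have this jar.
For now, download the stable giraph-1.1.0 release instead and rebuild.
Build succeeded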
[INFO] Copying files to /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-bin
[INFO] Building tar: /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-bin.tar.gz
[INFO] Copying files to /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-src
[INFO] Building tar: /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-src.tar.gz
[INFO] Building tar: /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-src.tar.bz2
[INFO] Building zip: /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-src.zip
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Giraph Parent .............................. SUCCESS [2.582s]
[INFO] Apache Giraph Core ................................ SUCCESS [21.932s]
[INFO] Apache Giraph Examples ............................ SUCCESS [9.894s]
[INFO] Apache Giraph Distribution ........................ SUCCESS [15.915s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 50.560s
[INFO] Finished at: Fri Jan 16 08:32:47 CST 2015
[INFO] Final Memory: 92M/1430M
[INFO] ------------------------------------------------------------------------
Error 3: cluster YARN heap size
Run the test case:
$ hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1
15/01/16 08:53:25 INFO yarn.GiraphYarnClient: Final output path is: hdfs://localhost:9000/output/giraph/shortestpaths
15/01/16 08:53:25 INFO yarn.GiraphYarnClient: Running Client
15/01/16 08:53:25 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
15/01/16 08:53:25 INFO yarn.GiraphYarnClient: Defaulting per-task heap size to 1024MB.
Exception in thread "main" java.lang.IllegalStateException: Giraph's estimated cluster heap 2048MB ask is greater than the current available cluster heap of 0MB. Aborting Job.
at org.apache.giraph.yarn.GiraphYarnClient.checkPerNodeResourcesAvailable(GiraphYarnClient.java:230)
at org.apache.giraph.yarn.GiraphYarnClient.run(GiraphYarnClient.java:124)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:96)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:126)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Fix: add the parameter -yh 2048
Reference: https://www.mail-archive.com/user@giraph.apache.org/msg02192.html
$ hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yh 2048
Error 4: GiraphApplicationMaster not found
15/01/16 08:57:41 ERROR yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation reports FAILED state, diagnostics show: Application application_1421369592714_0002 failed 2 times due to AM Container for appattempt_1421369592714_0002_000002 exited with exitCode: 1 due to: Exception from container-launch.
Container id: container_1421369592714_0002_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
For the actual error, open http://localhost:8088/cluster/app/application_1421369592714_0002 and click logs,
or look in the YARN log directory configured for Hadoop: /home/hadoop/data/cdh520/yarn/logs/application_1421369592714_0002/
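The client only prints the generic container-launch failure, so the container logs have to be searched by hand. A small sketch for that, assuming the logs sit as plain text files under the application's log directory (the helper name and the search pattern are illustrative only):

```shell
#!/bin/sh
# show_container_errors LOG_DIR: print file name, line number and text of every
# log line under LOG_DIR that mentions an exception or an error (hypothetical helper).
show_container_errors() {
  grep -rnE 'Exception|Error' "$1" 2>/dev/null
}

# Usage with the log directory from this run:
#   show_container_errors /home/hadoop/data/cdh520/yarn/logs/application_1421369592714_0002/
```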
Error: Could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster
The underlying error is:
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/giraph/yarn/GiraphApplicationMaster
Caused by: java.lang.ClassNotFoundException:
org.apache.giraph.yarn.GiraphApplicationMaster
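Yet giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar already contains all the dependency jars.
The post below suggests adding the -yj parameter, but that still does not work. It then goes on to suggest using an external ZooKeeper: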
http://mail-archives.apache.org/mod_mbox/giraph-user/201311.mbox/%3CCEAF98C2.2CC44%25rvesse@dotnetrdf.org%3E
hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yh 2048 \
-yj giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar
Error 5: external ZooKeeper
To use an external ZooKeeper, add -Dgiraph.zkList=localhost:2181 to the command line.
Error: Exception in thread "main" org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -Dgiraph.zkList=localhost:2181
As with -ca giraph.SplitMasterWorker=false earlier, the setting should be passed with -ca:
hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yh 2048 \
-yj giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
-ca giraph.zkList=localhost:2181
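Since Giraph settings go through -ca rather than -D here, a command that needs several of them (say the ZooKeeper list together with giraph.SplitMasterWorker from earlier) ends up with a growing tail of -ca flags. As far as I recall, GiraphRunner accepts the option more than once, but treat that as an assumption and check the runner's help output. The hypothetical helper below expands key=value pairs into that form and rejects anything else:

```shell
#!/bin/sh
# ca_args KEY=VALUE [KEY=VALUE...]: print "-ca KEY=VALUE" for each argument,
# failing on anything that is not in key=value form (hypothetical helper).
ca_args() {
  out=""
  for kv in "$@"; do
    case "$kv" in
      ?*=?*) out="${out:+$out }-ca $kv" ;;
      *) echo "ca_args: expected key=value, got '$kv'" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$out"
}

# Usage: append the result, unquoted, to the GiraphRunner command line:
#   hadoop jar ... -w 1 -yh 2048 \
#     $(ca_args giraph.zkList=localhost:2181 giraph.SplitMasterWorker=false)
```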
Error 6: HADOOP_CLASSPATH
This post: http://mail-archives.apache.org/mod_mbox/giraph-user/201408.mbox/%3C53EA4CA8.1070005@web.de%3E
suggests adding the jars under HADOOP_HOME. Since changing hadoop-env may require a restart, prefix the command with HADOOP_CLASSPATH by hand instead:
HADOOP_CLASSPATH=giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yh 2048 \
-yj giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
-ca giraph.SplitMasterWorker=false
Still no luck. Stop Hadoop, edit etc/hadoop/hadoop-env.sh as follows, and restart Hadoop... The error is the same as with the binary package: Netty cannot be found.
for f in $HADOOP_HOME/share/giraph/*.jar; do
if [ "$HADOOP_CLASSPATH" ]; then
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
else
export HADOOP_CLASSPATH=$f
fi
done
Stop Hadoop, remove the hadoop-env.sh configuration above, and restart Hadoop and ZooKeeper.
Test case
Build with -Phadoop_2 rather than the earlier -Phadoop_yarn:
mvn -Phadoop_2 -Dhadoop.version=2.5.0 -DskipTests clean package
hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yh 2048 \
-yj giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar,giraph-core/target/giraph-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
-ca giraph.SplitMasterWorker=false
hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths2 \
-w 1 \
-yh 2048 \
-ca giraph.SplitMasterWorker=false
Successful run:
15/01/16 10:34:18 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers
15/01/16 10:34:32 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer localhost:22181 --zkNode /_hadoopBsp/job_1421375482928_0002/_haltComputation'
15/01/16 10:34:32 INFO mapreduce.Job: Running job: job_1421375482928_0002
15/01/16 10:34:33 INFO mapreduce.Job: Job job_1421375482928_0002 running in uber mode : false
15/01/16 10:34:33 INFO mapreduce.Job: map 100% reduce 0%
15/01/16 10:34:41 INFO mapreduce.Job: Job job_1421375482928_0002 completed successfully
15/01/16 10:34:41 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=102546
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=156
HDFS: Number of bytes written=30
HDFS: Number of read operations=17
HDFS: Number of large read operations=0
HDFS: Number of write operations=9
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=15315
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=15315
Total vcore-seconds taken by all map tasks=15315
Total megabyte-seconds taken by all map tasks=7841280
Map-Reduce Framework
Map input records=1
Map output records=0
Input split bytes=44
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=38
CPU time spent (ms)=2460
Physical memory (bytes) snapshot=177041408
Virtual memory (bytes) snapshot=744656896
Total committed heap usage (bytes)=191365120
Giraph Stats
Aggregate edges=12
Aggregate finished vertices=5
Aggregate sent message message bytes=267
Aggregate sent messages=12
Aggregate vertices=5
Current master task partition=0
Current workers=1
Last checkpointed superstep=0
Sent message bytes=0
Sent messages=0
Superstep=4
Giraph Timers
Initialize (ms)=142
Input superstep (ms)=108
Setup (ms)=24
Shutdown (ms)=8846
Superstep 0 SimpleShortestPathsComputation (ms)=47
Superstep 1 SimpleShortestPathsComputation (ms)=52
Superstep 2 SimpleShortestPathsComputation (ms)=33
Superstep 3 SimpleShortestPathsComputation (ms)=29
Total (ms)=9142
Zookeeper base path
/_hadoopBsp/job_1421375482928_0002=0
Zookeeper halt node
/_hadoopBsp/job_1421375482928_0002/_haltComputation=0
Zookeeper server:port
localhost:22181=0
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
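Of the following two jobs, the first, which ran successfully, was built with: mvn -Phadoop_2 -Dhadoop.version=2.5.0 -DskipTests clean package
The second, which failed, was built with: mvn -Phadoop_yarn -Dhadoop.version=2.5.0 -DskipTests clean package
The ApplicationType of the two jobs is MapReduce and YARN respectively.
The notes on hadoop_2 and hadoop_yarn in pom.xml: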
hadoop_2: Help keep future Hadoop versions munge-free:
All profiles below are munge-free: avoid introducing any munge flags on any of the following profiles.
hadoop_yarn: This profile runs on Hadoop-2.0.3-alpha by default, but does not use Hadoop MapReduce v2 to set up the Giraph job.
This means the Giraph worker/master tasks are not Mappers. Tasks are run in YARN-managed execution
containers. Internally, the Giraph framework continues to depend on many Hadoop MapReduce classes to perform work.
Check the results
hadoop@hadoop:~/github/apache/giraph-release-1.1$ hadoop fs -ls /output/giraph/shortestpaths2
-rw-r--r-- 3 hadoop supergroup 0 2015-01-16 11:08 /output/giraph/shortestpaths2/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 30 2015-01-16 11:08 /output/giraph/shortestpaths2/part-m-00000
hadoop@hadoop:~/github/apache/giraph-release-1.1$ hadoop fs -cat /output/giraph/shortestpaths2/part-m-00000
15/01/16 11:19:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
0	1.0
2	2.0
1	0.0
3	1.0
4	5.0
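IdWithValueTextOutputFormat writes one vertex per line, the vertex ID and its value separated by a tab by default, in no particular order. For SimpleShortestPathsComputation the value is the distance from the source vertex, which is why vertex 1 shows 0.0. A sketch for printing the result sorted by vertex ID (the helper name is made up for this example):

```shell
#!/bin/sh
# format_distances: read "id<TAB>value" lines on stdin and print them
# sorted numerically by vertex ID, with a short label (hypothetical helper).
format_distances() {
  sort -n -k1,1 | awk -F'\t' '{ printf "vertex %s: distance %s\n", $1, $2 }'
}

# Usage against the HDFS output of this post:
#   hadoop fs -cat /output/giraph/shortestpaths2/part-m-00000 | format_distances
```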