Hello Giraph

Apache Giraph

http://blog.cloudera.com/blog/2014/02/how-to-write-and-run-giraph-jobs-on-hadoop/

http://blog.cloudera.com/blog/2014/05/how-to-manage-time-dependent-multilayer-networks-in-apache-hadoop/

Large-scale graph data column: http://blog.csdn.net/column/details/big-data.html

 

Binary package

$ vi .bashrc

GIRAPH_HOME=/home/hadoop/soft/giraph-1.1.0-hadoop-2.5.1

$ cd ~/soft/giraph-1.1.0-hadoop-2.5.1

hadoop jar $GIRAPH_HOME/giraph-examples-1.1.0-hadoop2.jar \
org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yj giraph-core-1.1.0-hadoop2.jar \
-ca giraph.SplitMasterWorker=false
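
The command above assumes that /input/giraph/tiny_graph.txt already exists in HDFS. The post does not show the file. The sketch below assumes it is the five-vertex tiny graph from the Giraph quick start, where each line is a JSON array of the form [vertexId, vertexValue, [[targetId, edgeWeight], ...]], the layout read by JsonLongDoubleFloatDoubleVertexInputFormat. The local path is a placeholder, and the upload step is skipped when no hadoop command is on the PATH.

```shell
#!/bin/sh
# Assumed input: the tiny graph from the Giraph quick start (not shown in this post).
GRAPH_FILE="${TMPDIR:-/tmp}/tiny_graph.txt"   # hypothetical local path

cat > "$GRAPH_FILE" <<'EOF'
[0,0,[[1,1],[3,3]]]
[1,0,[[0,1],[2,2],[3,1]]]
[2,0,[[1,2],[4,4]]]
[3,0,[[0,3],[1,1],[4,4]]]
[4,0,[[3,4],[2,4]]]
EOF

# Upload to the HDFS path used by -vip, only if a Hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -mkdir -p /input/giraph
  hadoop fs -put -f "$GRAPH_FILE" /input/giraph/tiny_graph.txt
fi
```

This graph has 5 vertices and 12 directed edges.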

 

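Error 1: master/worker mode parameter setting

Note: this is a local pseudo-distributed setup, which does not run in split master/worker mode, so omitting -ca giraph.SplitMasterWorker=false produces the following error: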

Exception in thread "main" java.lang.IllegalArgumentException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, you cannot run in split master / worker mode since there is only 1 task at a time!

at org.apache.giraph.job.GiraphJob.checkLocalJobRunnerConfiguration(GiraphJob.java:168)

at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:236)

at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Solution: http://stackoverflow.com/questions/26175116/apache-giraph-cannot-run-in-split-master-worker-mode-since-there-is-only-1-t

Add the -ca giraph.SplitMasterWorker=false argument to the command.

 

Error 2: GiraphRunner is not loaded

Running the command above without the -yj giraph-core-1.1.0-hadoop2.jar argument fails because the Giraph jars are not loaded:

Exception in thread "main" java.lang.ClassNotFoundException: org.apache.giraph.GiraphRunner

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

at java.lang.Class.forName0(Native Method)

at java.lang.Class.forName(Class.java:270)

at org.apache.hadoop.util.RunJar.main(RunJar.java:205)

 

Error 3: Netty not found

Failed attempt 1: write an example.sh script

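#!/bin/sh
# Append everything under ./lib to the classpath.
LIB=./lib
for jar in $LIB/*.*; do
  HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$jar
done
# Append everything in the current directory as well.
GLIB=.
for jar in $GLIB/*.*; do
  HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$jar
done
echo "$HADOOP_CLASSPATH"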

 

HADOOP_CLASSPATH=$HADOOP_CLASSPATH hadoop jar giraph-examples-1.1.0-hadoop2.jar org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip /input/giraph/tiny_graph.txt -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /output/giraph/shortestpaths -w 1 -ca giraph.SplitMasterWorker=false
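
Both loops in example.sh do the same thing: they append every file matching a glob to a colon-separated classpath. As a sketch, the loop could be factored into a helper. `join_classpath` is a hypothetical name, and this version matches only *.jar files and avoids the leading colon that appears when HADOOP_CLASSPATH starts out empty. It only tidies the script; it does not by itself make the Netty classes reach the YARN containers.

```shell
#!/bin/sh
# join_classpath DIR: print the *.jar files in DIR joined with ':'.
# Hypothetical helper, not part of Hadoop or Giraph.
join_classpath() {
  cp=""
  for jar in "$1"/*.jar; do
    [ -e "$jar" ] || continue          # the glob matched nothing
    cp="${cp:+$cp:}$jar"
  done
  printf '%s\n' "$cp"
}
```

A possible use would be HADOOP_CLASSPATH="$(join_classpath ./lib):$(join_classpath .)" in place of the two loops.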

 

Run example.sh:

15/01/15 16:43:50 INFO mapreduce.Job: Job job_1421308355863_0002 running in uber mode : false

15/01/15 16:43:50 INFO mapreduce.Job:  map 0% reduce 0%

15/01/15 16:43:50 INFO mapreduce.Job: Job job_1421308355863_0002 failed with state FAILED due to: Application application_1421308355863_0002 failed 2 times due to AM Container for appattempt_1421308355863_0002_000002 exited with  exitCode: 1 due to: Exception from container-launch.

Container id: container_1421308355863_0002_02_000001

Exit code: 1

Stack trace: ExitCodeException exitCode=1: 

at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)

at org.apache.hadoop.util.Shell.run(Shell.java:455)

at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702)

at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:196)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:299)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)

at java.util.concurrent.FutureTask.run(FutureTask.java:262)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

Container exited with a non-zero exit code 1

.Failing this attempt.. Failing the application.

15/01/15 16:43:50 INFO mapreduce.Job: Counters: 0

The actual error, from the backend log:

2015-01-15 16:43:49,911 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster

java.lang.NoClassDefFoundError: io/netty/buffer/ByteBufAllocator

at org.apache.giraph.bsp.BspOutputFormat.getOutputCommitter(BspOutputFormat.java:62)

at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.createOutputCommitter(MRAppMaster.java:465)

at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceInit(MRAppMaster.java:368)

at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)

at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$1.run(MRAppMaster.java:1477)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:415)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)

at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1474)

at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1407)

Caused by: java.lang.ClassNotFoundException: io.netty.buffer.ByteBufAllocator

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

... 10 more

2015-01-15 16:43:49,916 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1
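
A NoClassDefFoundError like the one above names the missing class in slash form. One way to narrow the problem down is to check which jar, if any, actually contains that class. The sketch below converts a class name into the entry name it would have inside a jar. `class_entry` is a hypothetical helper, and the `unzip -l` line in the comment assumes that `unzip` is installed and that the jar path is the one used in this post.

```shell
#!/bin/sh
# class_entry NAME: map a class name in dotted (io.netty.X) or slash
# (io/netty/X) form to its entry path inside a jar.
class_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Example use (requires unzip and the jar to be present):
#   unzip -l "$GIRAPH_HOME/giraph-core-1.1.0-hadoop2.jar" \
#     | grep -F "$(class_entry io.netty.buffer.ByteBufAllocator)"
```

If the grep prints nothing, the jar does not bundle the class, and it has to come from another jar on the classpath.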

 

Failed attempt 2: prefix the command with HADOOP_CLASSPATH

HADOOP_CLASSPATH=giraph-core-1.1.0-hadoop2.jar hadoop jar $GIRAPH_HOME/giraph-examples-1.1.0-hadoop2.jar \
org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-ca giraph.SplitMasterWorker=false

It fails with:

Exception in thread "main" java.lang.NoClassDefFoundError: io/netty/buffer/ByteBufAllocator

at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:73)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:124)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Caused by: java.lang.ClassNotFoundException: io.netty.buffer.ByteBufAllocator

at java.net.URLClassLoader$1.run(URLClassLoader.java:366)

at java.net.URLClassLoader$1.run(URLClassLoader.java:355)

at java.security.AccessController.doPrivileged(Native Method)

at java.net.URLClassLoader.findClass(URLClassLoader.java:354)

at java.lang.ClassLoader.loadClass(ClassLoader.java:425)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)

at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

... 9 more

 

 

 

Building from source

$ cd ~/github/apache/giraph-release-1.1

mvn -Phadoop_yarn -Dhadoop.version=2.5.0 -DskipTests clean package

or

mvn -Phadoop_2 -Dhadoop.version=2.5.0 -DskipTests clean package

 

Error 1: SASL in newer Hadoop versions

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.0:compile 

(default-compile) on project giraph-core: Compilation failure: Compilation failure:

[ERROR] /home/hadoop/github/apache/giraph/giraph-core/target/munged/main/org/apache/giraph/comm/netty/SaslNettyServer.java:[105,62] cannot find symbol

[ERROR] symbol:   variable SASL_PROPS

[ERROR] location: class org.apache.hadoop.security.SaslRpcServer

[ERROR] /home/hadoop/github/apache/giraph/giraph-core/target/munged/main/org/apache/giraph/comm/netty/SaslNettyClient.java:[84,68] cannot find symbol

[ERROR] symbol:   variable SASL_PROPS

[ERROR] location: class org.apache.hadoop.security.SaslRpcServer

[ERROR] -> [Help 1]

[ERROR] 

Fix: remove the SASL flag STATIC_SASL_SYMBOL from the hadoop_yarn profile.

http://mail-archives.apache.org/mod_mbox/giraph-user/201501.mbox/%3CCAFJOoJdMdQmCFX-M-L-XiCY_M3FsvtEGS20C9huO6xhrUmYp8w@mail.gmail.com%3E  
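
As a rough sketch of that edit, the helper below removes one symbol from a comma-separated list read on stdin. `strip_symbol` is a hypothetical name, and the sample line is only an illustration of what the hadoop_yarn profile's munge property might look like; check the real pom.xml before changing anything. The flag may also appear in other profiles, so the edit should be limited to the hadoop_yarn profile rather than applied to the whole file.

```shell
#!/bin/sh
# strip_symbol SYMBOL: read text on stdin and remove SYMBOL from
# comma-separated lists, dropping the adjacent comma as well.
strip_symbol() {
  sed -e "s/,$1//" -e "s/$1,//" -e "s/$1//"
}

# Illustrative line only; the real property content may differ.
echo '<munge.symbols>PURE_YARN,STATIC_SASL_SYMBOL</munge.symbols>' \
  | strip_symbol STATIC_SASL_SYMBOL
```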

 

Error 2: the 1.2.0-SNAPSHOT Giraph jar does not exist

[ERROR] Failed to execute goal on project giraph-dist: Could not resolve dependencies for project org.apache.giraph:giraph-dist:pom:1.2.0-SNAPSHOT: Could not find artifact org.apache.giraph:giraph-rexster-io:jar:1.2.0-SNAPSHOT in central (http://repo1.maven.org/maven2) -> [Help 1]

Cause: this is the latest snapshot version from GitHub, and the Maven repository does not have this jar.

For now, download the stable giraph-1.1.0 release and rebuild.

 

Build succeeded:

[INFO] Copying files to /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-bin

[INFO] Building tar: /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-bin.tar.gz

[INFO] Copying files to /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-src

[INFO] Building tar: /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-src.tar.gz

[INFO] Building tar: /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-src.tar.bz2

[INFO] Building zip: /home/hadoop/github/apache/giraph-release-1.1/giraph-dist/target/giraph-1.1.0-for-hadoop-2.5.0-src.zip

[INFO] ------------------------------------------------------------------------

[INFO] Reactor Summary:

[INFO] 

[INFO] Apache Giraph Parent .............................. SUCCESS [2.582s]

[INFO] Apache Giraph Core ................................ SUCCESS [21.932s]

[INFO] Apache Giraph Examples ............................ SUCCESS [9.894s]

[INFO] Apache Giraph Distribution ........................ SUCCESS [15.915s]

[INFO] ------------------------------------------------------------------------

[INFO] BUILD SUCCESS

[INFO] ------------------------------------------------------------------------

[INFO] Total time: 50.560s

[INFO] Finished at: Fri Jan 16 08:32:47 CST 2015

[INFO] Final Memory: 92M/1430M

[INFO] ------------------------------------------------------------------------

 

Error 3: YARN cluster heap size

Run the test case:

hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1

15/01/16 08:53:25 INFO yarn.GiraphYarnClient: Final output path is: hdfs://localhost:9000/output/giraph/shortestpaths

15/01/16 08:53:25 INFO yarn.GiraphYarnClient: Running Client

15/01/16 08:53:25 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032

15/01/16 08:53:25 INFO yarn.GiraphYarnClient: Defaulting per-task heap size to 1024MB.

Exception in thread "main" java.lang.IllegalStateException: Giraph's estimated cluster heap 2048MB ask is greater than the current available cluster heap of 0MB. Aborting Job.

at org.apache.giraph.yarn.GiraphYarnClient.checkPerNodeResourcesAvailable(GiraphYarnClient.java:230)

at org.apache.giraph.yarn.GiraphYarnClient.run(GiraphYarnClient.java:124)

at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:96)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)

at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)

at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:126)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:606)

at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

 

Fix: add the argument -yh 2048

Reference: https://www.mail-archive.com/user@giraph.apache.org/msg02192.html

hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yh 2048
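
The 2048MB figure in the error message is consistent with a simple estimate: one worker (-w 1) plus one master container, each at the default 1024MB per task. This reading is inferred from the log, not taken from the Giraph source, so the formula below is an assumption, and `estimated_heap_mb` is a hypothetical helper.

```shell
#!/bin/sh
# estimated_heap_mb WORKERS PER_TASK_MB
# Assumed formula: (workers + 1 master) * per-task heap, in MB.
estimated_heap_mb() {
  echo $(( ($1 + 1) * $2 ))
}

estimated_heap_mb 1 1024   # the -w 1 run with the 1024MB default
```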

 

Error 4: GiraphApplicationMaster not found

15/01/16 08:57:41 ERROR yarn.GiraphYarnClient: Giraph: org.apache.giraph.examples.SimpleShortestPathsComputation reports FAILED state, diagnostics show: Application application_1421369592714_0002 failed 2 times due to AM Container for appattempt_1421369592714_0002_000002 exited with  exitCode: 1 due to: Exception from container-launch.

Container id: container_1421369592714_0002_02_000001

Exit code: 1

Stack trace: ExitCodeException exitCode=1: 

 

To see the actual error, open http://localhost:8088/cluster/app/application_1421369592714_0002 and click "logs",

or look in the YARN log directory configured for Hadoop: /home/hadoop/data/cdh520/yarn/logs/application_1421369592714_0002/

The log reports: could not find or load main class org.apache.giraph.yarn.GiraphApplicationMaster

The underlying error is:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/giraph/yarn/GiraphApplicationMaster

Caused by: java.lang.ClassNotFoundException: org.apache.giraph.yarn.GiraphApplicationMaster

Yet giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar already bundles all of its dependencies.
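
The application ID needed for the web UI or the log directory can be pulled straight out of the client-side diagnostics. The sketch below extracts it from a log line. `extract_app_id` is a hypothetical helper, and the `yarn logs` call in the comment only returns output when YARN log aggregation is enabled, which is an assumption about the cluster.

```shell
#!/bin/sh
# extract_app_id: read log text on stdin and print the first
# application_<clusterTimestamp>_<sequence> ID found.
extract_app_id() {
  sed -n 's/.*\(application_[0-9][0-9]*_[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example use (needs log aggregation enabled):
#   yarn logs -applicationId "$(extract_app_id < client.log)"
```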

 

This thread suggests adding the -yj argument, but that still did not work; it then mentions using an external ZooKeeper:

http://mail-archives.apache.org/mod_mbox/giraph-user/201311.mbox/%3CCEAF98C2.2CC44%25rvesse@dotnetrdf.org%3E

hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yh 2048 \
-yj giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar

 

Error 5: external ZooKeeper

To use an external ZooKeeper, I added -Dgiraph.zkList=localhost:2181 to the command line.

It fails with: Exception in thread "main" org.apache.commons.cli.UnrecognizedOptionException: Unrecognized option: -Dgiraph.zkList=localhost:2181

As with the earlier -ca giraph.SplitMasterWorker=false, configuration parameters should be passed with -ca:

hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yh 2048 \
-yj giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
-ca giraph.zkList=localhost:2181
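
The -ca option takes key=value pairs and, as far as I know, can be given more than once, so several Giraph settings can be passed in one run. A small sketch that turns a list of settings into the matching arguments follows. `ca_args` is a hypothetical helper, and it assumes the values contain no whitespace.

```shell
#!/bin/sh
# ca_args KEY=VALUE...: print one "-ca KEY=VALUE" pair per setting,
# separated by spaces, ready to append to a GiraphRunner command line.
ca_args() {
  out=""
  for kv in "$@"; do
    out="${out:+$out }-ca $kv"
  done
  printf '%s\n' "$out"
}

ca_args giraph.zkList=localhost:2181 giraph.SplitMasterWorker=false
```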

Error 6: HADOOP_CLASSPATH

I followed this post: http://mail-archives.apache.org/mod_mbox/giraph-user/201408.mbox/%3C53EA4CA8.1070005@web.de%3E

which adds the jars under HADOOP_HOME. Since editing hadoop-env may require a restart, I first set HADOOP_CLASSPATH manually in front of the command:

HADOOP_CLASSPATH=giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yh 2048 \
-yj giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
-ca giraph.SplitMasterWorker=false

 

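That still did not work. I stopped Hadoop, edited etc/hadoop/hadoop-env.sh as shown here, and restarted Hadoop. The error was the same as with the binary package: Netty could not be found.

for f in $HADOOP_HOME/share/giraph/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

Stop Hadoop, remove the hadoop-env.sh configuration above, then restart Hadoop and ZooKeeper.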

 

Test case

Build with -Phadoop_2 instead of the earlier -Phadoop_yarn:

mvn -Phadoop_2 -Dhadoop.version=2.5.0 -DskipTests clean package

 

hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths \
-w 1 \
-yh 2048 \
-yj giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar,giraph-core/target/giraph-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
-ca giraph.SplitMasterWorker=false

 

hadoop jar giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.5.0-jar-with-dependencies.jar \
org.apache.giraph.GiraphRunner \
org.apache.giraph.examples.SimpleShortestPathsComputation \
-vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
-vip /input/giraph/tiny_graph.txt \
-vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
-op /output/giraph/shortestpaths2 \
-w 1 \
-yh 2048 \
-ca giraph.SplitMasterWorker=false

 

Successful run:

15/01/16 10:34:18 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 2 mappers

15/01/16 10:34:32 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer localhost:22181 --zkNode /_hadoopBsp/job_1421375482928_0002/_haltComputation'

15/01/16 10:34:32 INFO mapreduce.Job: Running job: job_1421375482928_0002

15/01/16 10:34:33 INFO mapreduce.Job: Job job_1421375482928_0002 running in uber mode : false

15/01/16 10:34:33 INFO mapreduce.Job:  map 100% reduce 0%

15/01/16 10:34:41 INFO mapreduce.Job: Job job_1421375482928_0002 completed successfully

15/01/16 10:34:41 INFO mapreduce.Job: Counters: 53

File System Counters

FILE: Number of bytes read=0

FILE: Number of bytes written=102546

FILE: Number of read operations=0

FILE: Number of large read operations=0

FILE: Number of write operations=0

HDFS: Number of bytes read=156

HDFS: Number of bytes written=30

HDFS: Number of read operations=17

HDFS: Number of large read operations=0

HDFS: Number of write operations=9

Job Counters 

Launched map tasks=1

Other local map tasks=1

Total time spent by all maps in occupied slots (ms)=15315

Total time spent by all reduces in occupied slots (ms)=0

Total time spent by all map tasks (ms)=15315

Total vcore-seconds taken by all map tasks=15315

Total megabyte-seconds taken by all map tasks=7841280

Map-Reduce Framework

Map input records=1

Map output records=0

Input split bytes=44

Spilled Records=0

Failed Shuffles=0

Merged Map outputs=0

GC time elapsed (ms)=38

CPU time spent (ms)=2460

Physical memory (bytes) snapshot=177041408

Virtual memory (bytes) snapshot=744656896

Total committed heap usage (bytes)=191365120

Giraph Stats

Aggregate edges=12

Aggregate finished vertices=5

Aggregate sent message message bytes=267

Aggregate sent messages=12

Aggregate vertices=5

Current master task partition=0

Current workers=1

Last checkpointed superstep=0

Sent message bytes=0

Sent messages=0

Superstep=4

Giraph Timers

Initialize (ms)=142

Input superstep (ms)=108

Setup (ms)=24

Shutdown (ms)=8846

Superstep 0 SimpleShortestPathsComputation (ms)=47

Superstep 1 SimpleShortestPathsComputation (ms)=52

Superstep 2 SimpleShortestPathsComputation (ms)=33

Superstep 3 SimpleShortestPathsComputation (ms)=29

Total (ms)=9142

Zookeeper base path

/_hadoopBsp/job_1421375482928_0002=0

Zookeeper halt node

/_hadoopBsp/job_1421375482928_0002/_haltComputation=0

Zookeeper server:port

localhost:22181=0

File Input Format Counters 

Bytes Read=0

File Output Format Counters 

Bytes Written=0

 

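Of the two jobs below, the first, which ran successfully, was built with mvn -Phadoop_2 -Dhadoop.version=2.5.0 -DskipTests clean package

The second, which failed, was built with mvn -Phadoop_yarn -Dhadoop.version=2.5.0 -DskipTests clean package

The Application Types of the two jobs are MapReduce and YARN respectively.

[Screenshot of the two jobs in the YARN application list]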


The comments in pom.xml describe hadoop_2 and hadoop_yarn as follows:

hadoop_2: Help keep future Hadoop versions munge-free:

         All profiles below are munge-free: avoid introducing any munge flags on any of the following profiles.

hadoop_yarn: This profile runs on Hadoop-2.0.3-alpha by default, but does not use Hadoop MapReduce v2 to set up the Giraph job. 

      This means the Giraph worker/master tasks are not Mappers. Tasks are run in YARN-managed execution

      containers. Internally, the Giraph framework continues to depend on many Hadoop MapReduce classes to perform work. 

 

Check the results:

hadoop@hadoop:~/github/apache/giraph-release-1.1$ hadoop fs -ls /output/giraph/shortestpaths2

-rw-r--r--   3 hadoop supergroup          0 2015-01-16 11:08 /output/giraph/shortestpaths2/_SUCCESS

-rw-r--r--   3 hadoop supergroup         30 2015-01-16 11:08 /output/giraph/shortestpaths2/part-m-00000

hadoop@hadoop:~/github/apache/giraph-release-1.1$ hadoop fs -cat /output/giraph/shortestpaths2/part-m-00000

15/01/16 11:19:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

0	1.0

2	2.0

1	0.0

3	1.0

4	5.0
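
Each output line from IdWithValueTextOutputFormat is a vertex ID followed by its value, which here is the shortest distance from the source vertex. A sketch for pulling one vertex's distance out of the output follows. `distance_of` is a hypothetical helper, and it assumes the two fields are separated by whitespace.

```shell
#!/bin/sh
# distance_of ID: read "id value" lines on stdin and print the value
# recorded for the given vertex ID.
distance_of() {
  awk -v id="$1" '$1 == id { print $2 }'
}

# Example use:
#   hadoop fs -cat /output/giraph/shortestpaths2/part-m-00000 | distance_of 4
```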

