Hello Tez

Tez

http://www.infoq.com/cn/articles/apache-tez-saha-murthy  

http://hortonworks.com/blog/apache-tez-a-new-chapter-in-hadoop-data-processing/ 

http://www.cnblogs.com/fxjwind/p/3377695.html 

http://zcdeng.iteye.com/blog/1897208 

http://blog.sequenceiq.com/blog/2014/09/23/topn-on-apache-tez/ 

 

1) Build Protocol Buffers

$ tar zxf protobuf-2.5.0.tar.gz

$ cd protobuf-2.5.0

$ ./configure && make && sudo make install

$ protoc --version

libprotoc 2.5.0
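
Since the later Maven build depends on the protoc found on the PATH, it can help to fail early when the version is not the one just installed. A minimal sketch, assuming 2.5.0 is the version wanted; the function name check_protoc_version is made up for this example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a `protoc --version` line with the wanted version.
# Usage: check_protoc_version "<output of protoc --version>" [wanted version]
check_protoc_version() {
  local line="$1" want="${2:-2.5.0}"
  local have="${line#libprotoc }"   # strip the "libprotoc " prefix
  if [ "$have" = "$want" ]; then
    echo "ok: protoc $have"
  else
    echo "mismatch: found '$have', want '$want'" >&2
    return 1
  fi
}

# Example with the output shown above; on a real machine pass "$(protoc --version)".
check_protoc_version "libprotoc 2.5.0"
```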

2) Adjust the Node.js/npm versions

Node.js and npm are already installed on this machine:

hadoop@hadoop:~$ node --version

v0.10.33

hadoop@hadoop:~$ npm --version

1.4.28

To match the installed versions, edit tez-ui/pom.xml. (I tried commenting out the <!--NPM Install--> section instead, but the build then fails.)

  <properties>

    <webappDir>src/main/webapp</webappDir>

    <!-- /usr/local/bin/node -->

    <node.executable>${basedir}/src/main/webapp/node/node</node.executable>

    <fileName>${artifactId}-${parent.version}</fileName>

    <nodeVersion>v0.10.33</nodeVersion> <!--v0.10.18  1.3.8-->

    <npmVersion>1.4.28</npmVersion>

  </properties>
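
The same edit can be scripted rather than done by hand. A rough sketch, assuming the two properties appear in the pom exactly as shown above; set_ui_versions is a hypothetical helper name, and a .bak copy of the file is left next to the original:

```shell
#!/usr/bin/env bash
# Hypothetical helper: rewrite <nodeVersion> and <npmVersion> in a pom file.
# Usage: set_ui_versions <pom file> <node version, e.g. v0.10.33> <npm version>
set_ui_versions() {
  local pom="$1" node_v="$2" npm_v="$3"
  # -i.bak edits in place and keeps a backup; this form works with GNU and BSD sed.
  sed -i.bak \
    -e "s|<nodeVersion>[^<]*</nodeVersion>|<nodeVersion>${node_v}</nodeVersion>|" \
    -e "s|<npmVersion>[^<]*</npmVersion>|<npmVersion>${npm_v}</npmVersion>|" \
    "$pom"
}

# On a real checkout (run from the Tez source root):
#   set_ui_versions tez-ui/pom.xml "$(node --version)" "$(npm --version)"
```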

3) Build Tez

hadoop@hadoop:~/github/apache/tez$ mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true

Tez on YARN

http://blog.woopi.org/wordpress/?p=96 

http://hadooptutorial.info/apache-tez-successor-mapreduce-framework/

http://blog.csdn.net/teddeyang/article/details/19564603 

$ cd /home/hadoop/github/apache/tez/tez-dist/target/tez-0.7.0-SNAPSHOT

$ mkdir conf 

$ hadoop fs -mkdir /apps

$ hadoop fs -mkdir /apps/tez

$ hadoop fs -put /home/hadoop/github/apache/tez/tez-dist/target/tez-0.7.0-SNAPSHOT /apps/tez

$ vi conf/tez-site.xml

<?xml version="1.0"?>

<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <property>

    <name>tez.version</name>

    <value>tez-0.7.0-SNAPSHOT</value>

  </property>

  <property>

    <name>tez.lib.uris</name>

    <value>${fs.default.name}/apps/tez/${tez.version},${fs.default.name}/apps/tez/${tez.version}/lib/</value>

  </property>

</configuration>

$ vi ~/.bashrc

export TEZ_HOME=/home/hadoop/github/apache/tez/tez-dist/target/tez-0.7.0-SNAPSHOT

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${TEZ_HOME}/conf:${TEZ_HOME}/*:${TEZ_HOME}/lib/*
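
After reloading the shell, one way to confirm the Tez entries were picked up is to split a classpath string on ':' and keep the entries that mention tez. A small sketch; tez_entries is a hypothetical helper, not something shipped with Hadoop or Tez:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the classpath entries that mention "tez", one per line.
# Returns non-zero when there are none (grep's exit status).
# Usage: tez_entries <classpath string>
tez_entries() {
  printf '%s\n' "$1" | tr ':' '\n' | grep -i 'tez'
}

# On a real machine, either of these could be checked:
#   tez_entries "$HADOOP_CLASSPATH"
#   tez_entries "$(hadoop classpath)"
```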

An alternative is to copy tez-site.xml into $HADOOP_HOME/etc/hadoop and modify hadoop-env.sh. This approach is more invasive and is not recommended.

$ vi ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh

export HADOOP_CLASSPATH=$HADOOP_HOME:$HADOOP_HOME/etc/hadoop

for f in /home/hadoop/github/apache/tez/tez-dist/target/tez-0.7.0-SNAPSHOT/*.jar; do

  if [ "$HADOOP_CLASSPATH" ]; then

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f

  else

    export HADOOP_CLASSPATH=$f

  fi

done

for f in /home/hadoop/github/apache/tez/tez-dist/target/tez-0.7.0-SNAPSHOT/lib/*.jar; do

  if [ "$HADOOP_CLASSPATH" ]; then

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f

  else

    export HADOOP_CLASSPATH=$f

  fi

done
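
The two loops differ only in the directory they scan, so they could be folded into one function. A sketch under that assumption; append_jars and TEZ_DIR are names invented here:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the given classpath with every *.jar found in the
# listed directories appended, in directory order.
# Usage: append_jars <current classpath> <dir>...
append_jars() {
  local cp="$1" dir f
  shift
  for dir in "$@"; do
    for f in "$dir"/*.jar; do
      [ -e "$f" ] || continue      # skip the literal pattern when no jar matches
      cp="${cp:+$cp:}$f"           # add ':' only when cp is already non-empty
    done
  done
  printf '%s\n' "$cp"
}

# In hadoop-env.sh this could stand in for both loops:
#   TEZ_DIR=/home/hadoop/github/apache/tez/tez-dist/target/tez-0.7.0-SNAPSHOT
#   export HADOOP_CLASSPATH=$(append_jars "$HADOOP_CLASSPATH" "$TEZ_DIR" "$TEZ_DIR/lib")
```

Unlike the original loops, this version does not add a literal `*.jar` entry when a directory contains no jars.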

Optional step:

$ vi ${HADOOP_HOME}/etc/hadoop/mapred-site.xml

  <property>

    <name>mapreduce.framework.name</name>

    <value>yarn-tez</value>

  </property>

 

Tez WordCount

$ stop-yarn.sh

$ start-yarn.sh

$ cd ~/github/apache/tez/tez-dist/target/tez-0.7.0-SNAPSHOT

$ hadoop fs -put ~/data/helloworld.txt /input/tez

$ hadoop jar tez-examples-0.7.0-SNAPSHOT.jar orderedwordcount /input/tez/helloworld.txt /output/tez/helloworld

hadoop@hadoop:~/github/apache/tez/tez-dist/target/tez-0.7.0-SNAPSHOT$ hadoop fs -ls /output/tez/helloworld

-rw-r--r--   3 hadoop supergroup          0 2015-01-20 19:33 /output/tez/helloworld/_SUCCESS

-rw-r--r--   3 hadoop supergroup        130 2015-01-20 19:33 /output/tez/helloworld/part-v002-o000-00000

hadoop@hadoop:~/github/apache/tez/tez-dist/target/tez-0.7.0-SNAPSHOT$ hadoop fs -cat /output/tez/helloworld/part-v002-o000-00000

again	1
system.	1
Spark	1
system,	1
process	1
Now	1
hot	1
today	1
batch	1
also	1
..	2
bigdata	2
a	2
Hadoop	2
Hello	3
world	3
is	3

As the output shows, the results are sorted by word count in ascending order. The job can also be seen at http://localhost:8088/cluster
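
The ordering can also be checked mechanically rather than by eye. A minimal sketch, assuming each line of the part file ends with the count as its last whitespace-separated field; counts_ascending is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the last field (the count) never decreases
# from one non-empty line to the next. Reads lines from stdin.
counts_ascending() {
  awk 'NF == 0 { next }
       { n = $NF + 0; if (seen && n < prev) exit 1; prev = n; seen = 1 }'
}

# Small self-contained example using three lines taken from the output above.
printf 'again\t1\nHadoop\t2\nHello\t3\n' | counts_ascending && echo "sorted"

# Against the real output it would be fed from HDFS:
#   hadoop fs -cat /output/tez/helloworld/part-v002-o000-00000 | counts_ascending
```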

$ hadoop jar tez-tests-0.7.0-SNAPSHOT.jar testorderedwordcount -DUSE_TEZ_SESSION=true \

/input/tez/helloworld.txt /output/tez/helloworld2 /input/tez/helloworld2.txt /output/tez/helloworld3

15/01/20 20:01:04 INFO examples.TestOrderedWordCount: Creating Tez Session

15/01/20 20:01:04 INFO client.TezClient: Tez Client Version: [ component=tez-api, version=0.7.0-SNAPSHOT, revision=83261659809f7904b786c9c81def4451dca27078, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=20150120-1554 ]

15/01/20 20:01:04 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032

15/01/20 20:01:04 INFO client.TezClient: Session mode. Starting session.

15/01/20 20:01:04 INFO Configuration.deprecation: fs.default.name is deprecated. Instead, use fs.defaultFS

15/01/20 20:01:04 INFO client.TezClientUtils: Using tez.lib.uris value from configuration: 

hdfs://localhost:9000/apps/tez/tez-0.7.0-SNAPSHOT,hdfs://localhost:9000/apps/tez/tez-0.7.0-SNAPSHOT/lib/

15/01/20 20:01:04 INFO client.TezClient: Tez system stage directory 

hdfs://localhost:9000/tmp/hadoop/tez/staging/1421755263939/.tez/application_1421753603786_0002 doesn't exist and is created

15/01/20 20:01:04 INFO impl.YarnClientImpl: Submitted application application_1421753603786_0002

15/01/20 20:01:04 INFO client.TezClient: The url to track the Tez Session: http://localhost:8088/proxy/application_1421753603786_0002/

15/01/20 20:01:04 INFO examples.TestOrderedWordCount: Running OrderedWordCount DAG, 

dagIndex=1, inputPath=/input/tez/helloworld.txt, outputPath=/output/tez/helloworld2

15/01/20 20:01:05 INFO examples.TestOrderedWordCount: Checking DAG specific ACLS

15/01/20 20:01:05 INFO examples.TestOrderedWordCount: Waiting for TezSession to get into ready state

15/01/20 20:01:08 INFO examples.TestOrderedWordCount: Submitting DAG to Tez Session, dagIndex=1

15/01/20 20:01:08 INFO client.TezClient: Submitting dag to TezSession, 

sessionName=OrderedWordCountSession, applicationId=application_1421753603786_0002, dagName=OrderedWordCount1

15/01/20 20:01:08 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032

15/01/20 20:01:08 INFO examples.TestOrderedWordCount: Submitted DAG to Tez Session, dagIndex=1

15/01/20 20:01:15 INFO examples.TestOrderedWordCount: DAG 1 completed. FinalState=SUCCEEDED

examples.TestOrderedWordCount: Running OrderedWordCount DAG, dagIndex=2, inputPath=/input/tez/helloworld2.txt, outputPath=/output/tez/helloworld3

15/01/20 20:01:15 INFO examples.TestOrderedWordCount: Checking DAG specific ACLS

15/01/20 20:01:15 INFO examples.TestOrderedWordCount: Waiting for TezSession to get into ready state

15/01/20 20:01:15 INFO examples.TestOrderedWordCount: Submitting DAG to Tez Session, dagIndex=2

15/01/20 20:01:15 INFO client.TezClient: Submitting dag to TezSession, 

sessionName=OrderedWordCountSession, applicationId=application_1421753603786_0002, dagName=OrderedWordCount2

15/01/20 20:01:15 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032

15/01/20 20:01:15 INFO examples.TestOrderedWordCount: Submitted DAG to Tez Session, dagIndex=2

15/01/20 20:01:16 INFO examples.TestOrderedWordCount: DAG 2 completed. FinalState=SUCCEEDED

15/01/20 20:01:16 INFO examples.TestOrderedWordCount: Shutting down session

client.TezClient: Shutting down Tez Session, sessionName=OrderedWordCountSession, applicationId=application_1421753603786_0002

Finally, check the YARN web UI (http://localhost:8088/cluster).
