Deploying a distributed environment with Ubuntu 12.04 + Hadoop 2.2.0 + ZooKeeper 3.4.5 + HBase 0.96.2 + Hive 0.13.1

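2024-07-15 22:42:36, 223 views

Author: 迦壹

Blog: http://idoall.org/home.php?mod=space&uid=1&do=blog&id=542

Repost notice: Reposting is permitted, but the original source, the author information and this copyright notice must be credited with a hyperlink. Thank you for your cooperation!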

---------------------------------------

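Contents:

　　I. What are Hadoop 2.2.0, ZooKeeper 3.4.5, HBase 0.96.2 and Hive 0.13.1?

　　II. Where to download the software

　　III. Installation

　　　　1. Install the JDK

　　　　2. Clone 3 machines with Parallels

　　　　3. Install ZooKeeper 3.4.5

　　　　4. Install Hadoop 2.2.0

　　　　5. Start ZooKeeper

　　　　6. Start the JournalNode cluster

　　　　7. HBase-0.96.2-hadoop2 (dual-HMaster configuration: m1 is the primary HMaster, m2 is the backup HMaster)

　　　　8. Install MySQL 5.5.x on m1 (Ubuntu 12.04)

　　　　9. Install Hive 0.13.1

　　　　10. Hive to HBase (import Hive table data into HBase)

　　　　11. HBase to Hive (import HBase table data into Hive)

　　IV. Common problems

　　V. References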

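I. What are Hadoop 2.2.0, ZooKeeper 3.4.5, HBase 0.96.2 and Hive 0.13.1?

　　For an introduction to Hadoop 2.2.0 and its features, see: http://blog.yidooo.net/archives/hadoop-2-2-0-new-features.html

　　For an introduction to ZooKeeper, see: http://baike.baidu.com/view/3061646.htm

　　For an introduction to HBase, see: http://baike.baidu.com/view/1993870.htm

　　For an introduction to Hive 0.13 and its features, see: http://www.csdn.net/article/2014-04-22/2819438-Cloud-Hive

　　I have uploaded the four packages here: http://pan.baidu.com/s/1i35PlI1

　　Anyone reading this article presumably has the basics already, so I will not go into more detail here.

　　BTW: I used Mac OS X 10.9 + Parallels 9 to run four Ubuntu virtual machines: two masters (m1, m2) and two slaves (s1, s2).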

II. Where to download the software

　　Hadoop 2.2.0: http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz

　　ZooKeeper 3.4.5: http://apache.dataguru.cn/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz

　　HBase 0.96.2: http://mirrors.hust.edu.cn/apache/hbase/hbase-0.96.2/hbase-0.96.2-hadoop2-bin.tar.gz

　　Hive 0.13.1: http://mirrors.cnnic.cn/apache/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz

　　JDK 1.7.0_65: installed with apt-get

　　The Hadoop 2.2.0 download above is the source package. I run 64-bit Ubuntu, and the official binary release only works on 32-bit systems. Running it on a 64-bit system reports "util.NativeCodeLoader - Unable to load native-hadoop library for your platform..", so Hadoop has to be recompiled on a 64-bit machine. I will cover how to do that in a separate article.

III. Installation

　　1. Install the JDK (the current hostname is m1)

　　　　1) Run the following command

#sudo apt-get install oracle-java7-installer

　　

　　　　2) Configure the Java environment variables

#sudo vi /etc/environment

　　　　Append the JDK bin directory to the end of PATH on the first line.

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/lib/jvm/java-7-oracle/bin"

　　　　Add the following three lines after PATH

CLASSPATH="/usr/lib/jvm/java-7-oracle/lib"

JAVA_HOME="/usr/lib/jvm/java-7-oracle"

JRE_HOME="/usr/lib/jvm/java-7-oracle/jre"

　　　　Tell the system to use the Oracle (formerly Sun) JDK rather than OpenJDK

#sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java-7-oracle/bin/java 300
#sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/java-7-oracle/bin/javac 300
#sudo update-alternatives --config java

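　　　　Several alternatives are listed at this point. Choose the Oracle JDK entry (number 2 in the screenshot of the original post), then run java -version to confirm the installed version.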
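After logging in again, it can help to confirm that the JDK bin directory really ended up on PATH, since a single mistyped character in /etc/environment goes unnoticed until a later step fails. A minimal sketch; path_has_dir is a hypothetical helper written for this article, not part of any tool:

```shell
# Hypothetical helper: succeed if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current PATH).
path_has_dir() {
  case ":${2-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# JDK location used throughout this article.
JDK_BIN=/usr/lib/jvm/java-7-oracle/bin

if path_has_dir "$JDK_BIN"; then
  echo "JDK bin directory is on PATH"
else
  echo "JDK bin directory is missing from PATH; re-check /etc/environment"
fi
```

The match is on whole entries, so a near miss such as java-7oracle is reported as missing.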

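　　2. Clone 3 machines with Parallels
　　　　1) In the Parallels hardware network settings, choose the option shown in the screenshot of the original post. After that, ping www.163.com should succeed.

　　　　2) In Parallels, click File => Clone in the top-left menu and create three clones named m2, s1 and s2 (stop the virtual machine before cloning).
　　　　Run sudo vi /etc/hostname on each machine to set its hostname. A reboot is needed for the change to take effect.
　　　　Run ifconfig on m1, m2, s1 and s2 to see the IP address assigned to each, then run sudo vi /etc/hosts and add an entry for every host (the original post showed the author's file in a screenshot). Finally run "sudo /etc/init.d/networking restart" to apply the change.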
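Since the screenshot of /etc/hosts is not available, here is a rough sketch of the entries it needs. Only m1's address (192.168.1.50) appears in the logs later in this article; the other three addresses are placeholders and must be replaced with the values ifconfig reports. hosts_entries is a hypothetical helper:

```shell
# Hypothetical helper: print /etc/hosts lines from "ip hostname" argument pairs.
hosts_entries() {
  while [ "$#" -ge 2 ]; do
    printf '%s\t%s\n' "$1" "$2"
    shift 2
  done
}

# 192.168.1.50 is taken from this article's logs; .51 to .53 are placeholders.
hosts_entries \
  192.168.1.50 m1 \
  192.168.1.51 m2 \
  192.168.1.52 s1 \
  192.168.1.53 s2
```

The same four lines belong in /etc/hosts on every machine, so that each host can resolve the other three by name.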

　　　　3) Configure passwordless sshd login (I use the root account)
　　　　Install the SSH tools

#sudo apt-get install ssh openssh-server
(If ssh is already installed by default, skip this step.)

　　　　Run ssh-keygen on every machine and press Enter at each prompt. This generates id_rsa and id_rsa.pub in the user's .ssh directory.
　　　　Run on m1:

#scp -r root@m2:/root/.ssh/id_rsa.pub ~/.ssh/m2.pub
#scp -r root@s1:/root/.ssh/id_rsa.pub ~/.ssh/s1.pub
#scp -r root@s2:/root/.ssh/id_rsa.pub ~/.ssh/s2.pub
#cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#cat ~/.ssh/m2.pub >> ~/.ssh/authorized_keys
#cat ~/.ssh/s1.pub >> ~/.ssh/authorized_keys
#cat ~/.ssh/s2.pub >> ~/.ssh/authorized_keys
#scp -r ~/.ssh/authorized_keys root@m2:~/.ssh/
#scp -r ~/.ssh/authorized_keys root@s1:~/.ssh/
#scp -r ~/.ssh/authorized_keys root@s2:~/.ssh/
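Re-running the cat lines above appends the same keys again. As a hedged alternative, the merge step can be written so that it is safe to repeat; merge_pubkeys below is a hypothetical helper, and the file names in the usage comment are the ones created by the scp commands above:

```shell
# Hypothetical helper: append every key in the given .pub files to an
# authorized_keys file ($1), skipping keys that are already present, then
# restrict permissions (sshd's default StrictModes check rejects key files
# that other users can write to).
merge_pubkeys() (
  auth=$1; shift
  mkdir -p "$(dirname "$auth")"
  touch "$auth"
  for pub in "$@"; do
    while IFS= read -r key; do
      [ -n "$key" ] || continue
      grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
    done < "$pub"
  done
  chmod 600 "$auth"
)

# Usage on m1, after the three scp commands that fetch the remote keys:
# merge_pubkeys ~/.ssh/authorized_keys ~/.ssh/id_rsa.pub ~/.ssh/m2.pub ~/.ssh/s1.pub ~/.ssh/s2.pub
```

Once the merged file has been copied to the other machines, ssh root@m2 from m1 should log in without asking for a password.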

　　3. Install ZooKeeper 3.4.5
　　　　1) Configure zoo.cfg (there is no zoo.cfg by default; copy zoo_sample.cfg and name the copy zoo.cfg)

root@m1:/home/hadoop/zookeeper-3.4.5/conf# vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/zookeeper-3.4.5/data
dataLogDir=/home/hadoop/zookeeper-3.4.5/logs
server.1=m1:2888:3888
server.2=m2:2888:3888
server.3=s1:2888:3888
server.4=s2:2888:3888
# the port at which the clients will connect
clientPort=2181
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
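The configuration above lists four servers. ZooKeeper stays available only while a majority of the ensemble is up, so it is worth checking what that means for different ensemble sizes. The arithmetic, as a small sketch (the function names are made up for illustration):

```shell
# Majority needed by an ensemble of N voting servers.
quorum_size() { echo $(( $1 / 2 + 1 )); }

# Number of servers that can fail while a majority is still reachable.
tolerated_failures() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 3 4 5; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

A four-server ensemble tolerates one failure, the same as a three-server one, which is why odd ensemble sizes are the usual recommendation. The four-node setup in this article still works.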

　　　　2) Copy ZooKeeper from m1 to m2, s1 and s2

root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@m2:/home/hadoop
root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@s1:/home/hadoop
root@m1:/home/hadoop/zookeeper-3.4.5/conf# scp -r /home/hadoop/zookeeper-3.4.5 root@s2:/home/hadoop

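　　　　3) On each of m1, m2, s1 and s2, create a myid file in /home/hadoop/zookeeper-3.4.5/data (the dataDir configured in zoo.cfg). Its content is the number that follows "server." for that host in zoo.cfg, and it must contain nothing but that number.

　　　　m1 is 1

　　　　m2 is 2

　　　　s1 is 3

　　　　s2 is 4

　　　　This completes the ZooKeeper configuration.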
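The myid value can also be derived from zoo.cfg instead of being typed by hand on each machine. A minimal sketch, assuming the hostname in the server.N line is exactly what the hostname command prints on that machine; myid_for_host is a hypothetical helper:

```shell
# Hypothetical helper: print N from the "server.N=<host>:<port>:<port>" line
# of the zoo.cfg file $1 whose host part equals $2. Prints nothing if the
# host is not listed.
myid_for_host() {
  awk -v h="$2" '
    /^server\.[0-9]+=/ {
      split($0, kv, "=")          # kv[1] = "server.N", kv[2] = "host:port:port"
      split(kv[2], addr, ":")     # addr[1] = host
      if (addr[1] == h) {
        sub(/^server\./, "", kv[1])
        print kv[1]
      }
    }' "$1"
}

# Usage on each machine (paths as configured in this article):
# myid_for_host /home/hadoop/zookeeper-3.4.5/conf/zoo.cfg "$(hostname)" \
#   > /home/hadoop/zookeeper-3.4.5/data/myid
```

Check that the resulting file is not empty before starting ZooKeeper; an empty file means the hostname did not match any server line.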

　　4. Install Hadoop 2.2.0
　　　　Edit the following 7 configuration files:

　　　　1)/home/hadoop/hadoop-2.2.0/etc/hadoop/hadoop-env.sh (mainly to set the Java path)

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
#export JAVA_HOME=${JAVA_HOME}

　　　　2)/home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-env.sh (mainly to set the Java path)

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi yarn-env.sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# User for YARN daemons
export HADOOP_YARN_USER=${HADOOP_YARN_USER:-yarn}
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

　　　　3)/home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>




<configuration>
       <property>
              <name>dfs.nameservices</name>
              <value>mycluster</value>
       </property>
       <property>
              <name>dfs.ha.namenodes.mycluster</name>
              <value>m1,m2</value>
       </property>
       <property>
              <name>dfs.namenode.rpc-address.mycluster.m1</name>
              <value>m1:9000</value>
       </property>
       <property>
              <name>dfs.namenode.rpc-address.mycluster.m2</name>
              <value>m2:9000</value>
       </property>
       <property>
              <name>dfs.namenode.http-address.mycluster.m1</name>
              <value>m1:50070</value>
       </property>
       <property>
              <name>dfs.namenode.http-address.mycluster.m2</name>
              <value>m2:50070</value>
       </property>
       <property>
              <name>dfs.namenode.shared.edits.dir</name>
              <value>qjournal://m1:8485;m2:8485/mycluster</value>
       </property>
       <property>
              <name>dfs.ha.automatic-failover.enabled.mycluster</name>
              <value>true</value>
       </property>
       <property>
              <name>dfs.client.failover.proxy.provider.mycluster</name>
              <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
       </property>
       <property>
              <name>dfs.ha.fencing.methods</name>
              <value>sshfence</value>
       </property>
       <property>
              <name>dfs.ha.fencing.ssh.private-key-files</name>
              <value>/root/.ssh/id_rsa</value>
       </property>
       <property>
              <name>dfs.journalnode.edits.dir</name>
              <value>/home/hadoop/hadoop-2.2.0/tmp/journal</value>
       </property>
       <property>
              <name>dfs.replication</name>
              <value>3</value>
       </property>
       <property>
              <name>dfs.webhdfs.enabled</name>
              <value>true</value>
       </property>
       <property>
              <name>dfs.permissions</name>
              <value>false</value>
       </property>
       <property>
              <name>dfs.permissions.enabled</name>
              <value>false</value>
       </property>
</configuration>

　　　　4)/home/hadoop/hadoop-2.2.0/etc/hadoop/mapred-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
     <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
          <description>Execution framework set to Hadoop YARN.</description>
     </property>
</configuration>

　　　　5)/home/hadoop/hadoop-2.2.0/etc/hadoop/core-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>




<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://mycluster</value>
        </property>
        <property>
                <name>dfs.nameservices</name>
                <value>mycluster</value>
        </property>
        <property>
                <name>ha.zookeeper.quorum</name>
                <value>m1:2181,m2:2181,s1:2181,s2:2181</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/home/hadoop/hadoop-2.2.0/tmp</value>
                <description></description>
        </property>
</configuration>

　　　　6)/home/hadoop/hadoop-2.2.0/etc/hadoop/yarn-site.xml

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi yarn-site.xml
<?xml version="1.0"?>

<configuration>

       <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
       </property>
       <property>
              <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
              <value>org.apache.hadoop.mapred.ShuffleHandler</value>
       </property>
       <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>m1</value>
       </property>
</configuration>

　　　　7)/home/hadoop/hadoop-2.2.0/etc/hadoop/slaves

root@m1:/home/hadoop/hadoop-2.2.0/etc/hadoop# vi slaves
s1
s2

　　　　This completes the Hadoop configuration.
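The article does not show how the configured Hadoop directory reaches the other nodes. Presumably it is copied from m1 in the same way as ZooKeeper was. The sketch below only prints the scp commands for review rather than running them; print_copy_cmds is a hypothetical helper, and it assumes the same /home/hadoop layout on every machine:

```shell
# Hypothetical helper: print (do not run) the scp commands that would copy
# directory $1 to the same parent directory on each remaining host argument.
# Assumption: root login and identical paths on all nodes, as in the
# ZooKeeper copy step of this article.
print_copy_cmds() (
  src=$1; shift
  for host in "$@"; do
    printf 'scp -r %s root@%s:%s\n' "$src" "$host" "$(dirname "$src")"
  done
)

print_copy_cmds /home/hadoop/hadoop-2.2.0 m2 s1 s2
```

If the printed commands look right, they can be run one by one, or the output can be piped to sh.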

　　5. Start ZooKeeper

　　　　1) Run the following on all of m1, m2, s1 and s2. The transcript below is from m1:

root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: follower
root@m1:/home/hadoop#

　　　　2) Run the status command on every machine to check its role. Here s1 is the leader and the other machines are followers.

root@s1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh start
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
root@s1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh status
JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg
Mode: leader
root@s1:/home/hadoop#
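Checking four machines by hand gets tedious, so the role can be pulled out of the status output with a one-line filter. A sketch that assumes the output format shown in the transcripts above; zk_mode is a hypothetical helper, and the ssh loop in the comment relies on the passwordless root login set up earlier:

```shell
# Hypothetical helper: read `zkServer.sh status` output on stdin and print
# only the role (leader or follower). Prints nothing if no Mode line is found.
zk_mode() {
  sed -n 's/^Mode: *//p'
}

# Sample taken from the m1 transcript in this article.
printf 'JMX enabled by default\nUsing config: /home/hadoop/zookeeper-3.4.5/bin/../conf/zoo.cfg\nMode: follower\n' | zk_mode

# Against the real cluster, something like:
# for h in m1 m2 s1 s2; do
#   echo "$h: $(ssh root@$h /home/hadoop/zookeeper-3.4.5/bin/zkServer.sh status 2>/dev/null | zk_mode)"
# done
```

Exactly one host should report leader. An empty role for a host usually means its server is not running or cannot reach a quorum.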

　　　　3) Test whether ZooKeeper started successfully

root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkCli.sh

Connecting to localhost:2181
2014-07-27 00:27:16,621 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
2014-07-27 00:27:16,628 [myid:] - INFO [main:Environment@100] - Client environment:host.name=m1
2014-07-27 00:27:16,628 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_65
2014-07-27 00:27:16,629 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2014-07-27 00:27:16,629 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
2014-07-27 00:27:16,630 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/home/hadoop/zookeeper-3.4.5/bin/../build/classes:/home/hadoop/zookeeper-3.4.5/bin/../build/lib/*.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/slf4j-log4j12-1.6.1.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/slf4j-api-1.6.1.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/netty-3.2.2.Final.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/log4j-1.2.15.jar:/home/hadoop/zookeeper-3.4.5/bin/../lib/jline-0.9.94.jar:/home/hadoop/zookeeper-3.4.5/bin/../zookeeper-3.4.5.jar:/home/hadoop/zookeeper-3.4.5/bin/../src/java/lib/*.jar:/home/hadoop/zookeeper-3.4.5/bin/../conf:/usr/lib/jvm/java-7-oracle/lib
2014-07-27 00:27:16,630 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=:/usr/local/lib:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2014-07-27 00:27:16,631 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2014-07-27 00:27:16,631 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2014-07-27 00:27:16,632 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2014-07-27 00:27:16,632 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2014-07-27 00:27:16,632 [myid:] - INFO [main:Environment@100] - Client environment:os.version=3.11.0-15-generic
2014-07-27 00:27:16,633 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root
2014-07-27 00:27:16,633 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2014-07-27 00:27:16,634 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/home/hadoop
2014-07-27 00:27:16,636 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@19b1ebe5
Welcome to ZooKeeper!
2014-07-27 00:27:16,672 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@966] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2014-07-27 00:27:16,685 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@849] - Socket connection established to localhost/127.0.0.1:2181, initiating session
JLine support is enabled
2014-07-27 00:27:16,719 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1207] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x147737cd5d30000, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1]

　　　　4) On m1, format the HA state stored in ZooKeeper

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs zkfc -formatZK
14/07/27 00:31:59 INFO tools.DFSZKFailoverController: Failover controller configured for NameNode NameNode at m1/192.168.1.50:9000
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:host.name=m1
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_65
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-7-oracle/jre
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/home/hadoop/hadoop-2.2.0/etc/hadoop:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/guava-11.0.2.jar:[... remaining Hadoop jar entries omitted ...]:/home/hadoop/hadoop-2.2.0/contrib/capacity-scheduler/*.jar
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/home/hadoop/hadoop-2.2.0/lib/native
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:os.version=3.11.0-15-generic
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.name=root
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.home=/root
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/hadoop
14/07/27 00:32:00 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=m1:2181,m2:2181,s1:2181,s2:2181 sessionTimeout=5000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@5990054a
14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Opening socket connection to server m1/192.168.1.50:2181. Will not attempt to authenticate using SASL (unknown error)
14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Socket connection established to m1/192.168.1.50:2181, initiating session
14/07/27 00:32:00 INFO zookeeper.ClientCnxn: Session establishment complete on server m1/192.168.1.50:2181, sessionid = 0x147737cd5d30001, negotiated timeout = 5000
===============================================
The configured parent znode /hadoop-ha/mycluster already exists.
Are you sure you want to clear all failover information from
ZooKeeper?
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
===============================================
Proceed formatting /hadoop-ha/mycluster? (Y or N) 14/07/27 00:32:00 INFO ha.ActiveStandbyElector: Session connected.
y
14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Recursively deleting /hadoop-ha/mycluster from ZK...
14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Successfully deleted /hadoop-ha/mycluster from ZK.
14/07/27 00:32:13 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
14/07/27 00:32:13 INFO zookeeper.ClientCnxn: EventThread shut down
14/07/27 00:32:13 INFO zookeeper.ZooKeeper: Session: 0x147737cd5d30001 closed
root@m1:/home/hadoop#

　　　　5) Verify that the zkfc format succeeded: a new hadoop-ha znode under / means it worked.

root@m1:/home/hadoop# /home/hadoop/zookeeper-3.4.5/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[hadoop-ha, zookeeper]
[zk: localhost:2181(CONNECTED) 1]

　　6. Start the JournalNode cluster

　　　　1) Run the following on m1, m2, s1 and s2 in turn

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start journalnode
starting journalnode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-journalnode-m1.out
root@m1:/home/hadoop# jps
2884 JournalNode
2553 QuorumPeerMain
2922 Jps
root@m1:/home/hadoop#
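The jps listing is the main health check used from here on. A small filter can compare it against the daemons expected at each stage. This is only a sketch: missing_daemons is a hypothetical helper, and the expected names are the ones that appear in the jps output above:

```shell
# Hypothetical helper: read `jps` output on stdin and print "missing: <name>"
# for every expected daemon ($@) that is not listed. Returns 0 only if all
# expected daemons are present.
missing_daemons() (
  out=$(cat)
  rc=0
  for name in "$@"; do
    printf '%s\n' "$out" \
      | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
      || { echo "missing: $name"; rc=1; }
  done
  exit $rc
)

# Sample taken from the m1 transcript in this article; prints nothing
# because both daemons are present.
printf '2884 JournalNode\n2553 QuorumPeerMain\n2922 Jps\n' \
  | missing_daemons JournalNode QuorumPeerMain

# On a live node: jps | missing_daemons JournalNode QuorumPeerMain
```

The expected list grows as the later steps start more daemons on each node.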

　　　　2) Format one NameNode of the cluster (m1). There are two ways to do it; I used the first.

　　　　Method 1

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode -format

　　　　Method 2

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode -format -clusterId m1

　　　　3) Start the newly formatted NameNode on m1. After running the command, browse http://m1:50070/dfshealth.jsp to see m1's status.

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode

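　　　　4) Copy m1's metadata to m2 by running the following on m2

root@m2:/home/hadoop# /home/hadoop/hadoop-2.2.0/bin/hdfs namenode -bootstrapStandby

　　　　5) Start the NameNode on m2. After running the command, browse http://m2:50070/dfshealth.jsp to see m2's status. At this point the web pages show both m1 and m2 in standby state.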

root@m2:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start namenode

　　　　6) Start all DataNodes by running the following on m1

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemons.sh start datanode
s2: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-datanode-s2.out
s1: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-datanode-s1.out
root@m1:/home/hadoop#

　　　　7) Start YARN by running the following on m1, then browse http://m1:8088/cluster to see the result

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-resourcemanager-m1.out
s1: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-nodemanager-s1.out
s2: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-root-nodemanager-s2.out
root@m1:/home/hadoop#

　　　　8) Start the ZooKeeperFailoverController (zkfc) by running the following command on m1 and then on m2. Browsing port 50070 again now shows m1 as active while m2 is still standby.

root@m1:/home/hadoop# /home/hadoop/hadoop-2.2.0/sbin/hadoop-daemon.sh start zkfc
starting zkfc, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-root-zkfc-m1.out
root@m1:/home/hadoop#
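Besides the web page on port 50070, the HA state can be queried from the command line with hdfs haadmin -getServiceState, which takes a NameNode ID (m1 or m2 in this configuration). A hedged sketch that loops over both; the HAADMIN variable and the ha_states function are made up for illustration, and the variable exists only so the loop can be tried without a cluster:

```shell
# Command used to query a NameNode's HA state. The default is the path used
# in this article; override HAADMIN to try the loop without a cluster.
HAADMIN=${HAADMIN:-/home/hadoop/hadoop-2.2.0/bin/hdfs haadmin}

# Hypothetical helper: print "<namenode-id>=<state>" for each NameNode ID,
# or "<namenode-id>=unknown" if the query fails.
ha_states() (
  for nn in "$@"; do
    state=$($HAADMIN -getServiceState "$nn" 2>/dev/null) || state=unknown
    printf '%s=%s\n' "$nn" "$state"
  done
)

# Usage on a cluster node:
# ha_states m1 m2
```

After zkfc has been started on both masters, one ID should report active and the other standby.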

　　　　9) Test whether HDFS is usable

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /
Found 2 items
drwx------   - root supergroup          0 2014-07-17 23:54 /tmp
drwxr-xr-x   - lion supergroup          0 2014-07-21 00:40 /user
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -mkdir /input
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /
Found 3 items
drwxr-xr-x   - root supergroup          0 2014-07-27 01:20 /input
drwx------   - root supergroup          0 2014-07-17 23:54 /tmp
drwxr-xr-x   - lion supergroup          0 2014-07-21 00:40 /user
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /input
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -put hadoop.cmd /input
root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -ls /input
Found 1 items
-rw-r--r--   3 root supergroup       7530 2014-07-27 01:20 /input/hadoop.cmd
root@m1:/home/hadoop/hadoop-2.2.0/bin#

　　　　10) Test that YARN is usable with the classic example: count the word frequencies in the hadoop.cmd file that was just put under /input

root@m1:/home/hadoop/hadoop-2.2.0/bin# /home/hadoop/hadoop-2.2.0/bin/hadoop jar /home/hadoop/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output
14/07/27 01:22:41 INFO client.RMProxy: Connecting to ResourceManager at m1/192.168.1.50:8032
14/07/27 01:22:43 INFO input.FileInputFormat: Total input paths to process : 1
14/07/27 01:22:44 INFO mapreduce.JobSubmitter: number of splits:1
14/07/27 01:22:44 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/07/27 01:22:44 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/07/27 01:22:44 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/07/27 01:22:45 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406394452186_0001
14/07/27 01:22:46 INFO impl.YarnClientImpl: Submitted application application_1406394452186_0001 to ResourceManager at m1/192.168.1.50:8032
14/07/27 01:22:46 INFO mapreduce.Job: The url to track the job: http://m1:8088/proxy/application_1406394452186_0001/
14/07/27 01:22:46 INFO mapreduce.Job: Running job: job_1406394452186_0001
14/07/27 01:23:10 INFO mapreduce.Job: Job job_1406394452186_0001 running in uber mode : false
14/07/27 01:23:10 INFO mapreduce.Job: map 0% reduce 0%
14/07/27 01:23:31 INFO mapreduce.Job: map 100% reduce 0%
14/07/27 01:23:48 INFO mapreduce.Job: map 100% reduce 100%
14/07/27 01:23:48 INFO mapreduce.Job: Job job_1406394452186_0001 completed successfully
14/07/27 01:23:49 INFO mapreduce.Job: Counters: 43
        File System Counters
                FILE: Number of bytes read=6574
                FILE: Number of bytes written=175057
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=7628
                HDFS: Number of bytes written=5088
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=18062
                Total time spent by all reduces in occupied slots (ms)=14807
        Map-Reduce Framework
                Map input records=240
                Map output records=827
                Map output bytes=9965
                Map output materialized bytes=6574
                Input split bytes=98
                Combine input records=827
                Combine output records=373
                Reduce input groups=373
                Reduce shuffle bytes=6574
                Reduce input records=373
                Reduce output records=373
                Spilled Records=746
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=335
                CPU time spent (ms)=2960
                Physical memory (bytes) snapshot=270057472
                Virtual memory (bytes) snapshot=1990762496
                Total committed heap usage (bytes)=136450048
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=7530
        File Output Format Counters
                Bytes Written=5088
root@m1:/home/hadoop/hadoop-2.2.0/bin#
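
The job writes its result to /output on HDFS. To look at the most frequent words, here is a small sketch. It assumes the reducer output file is named part-r-00000 (the usual name for a job with one reducer) and that each line has the form word, tab, count. top_words is a helper name invented for this example:

```shell
# Read wordcount output ("word<TAB>count" per line) on stdin and print the
# N most frequent words, highest count first. N defaults to 10.
top_words() {
  sort -k2,2nr | head -n "${1:-10}"
}

# Usage on m1 (part-r-00000 is an assumed file name):
#   /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -cat /output/part-r-00000 | top_words 5
```

MapReduce refuses to write into an existing output directory, so before rerunning the example remove it first with /home/hadoop/hadoop-2.2.0/bin/hdfs dfs -rm -r /output.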

　　　　11) Verify HA failover. Opening port 50070 on m1 and m2 in the browser has already shown that m1 is active and m2 is standby.

　　　　　　　a) Kill the NameNode process on m1

root@m1:/home/hadoop/hadoop-2.2.0/bin# jps
5492 Jps
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
3898 NameNode
4075 ResourceManager
root@m1:/home/hadoop/hadoop-2.2.0/bin# kill -9 3898
root@m1:/home/hadoop/hadoop-2.2.0/bin# jps
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
4075 ResourceManager
5627 Jps
root@m1:/home/hadoop/hadoop-2.2.0/bin#

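　　　　　　　b) Browse port 50070 on m1 and m2 again: m1 no longer opens, while m2 is now active.

　　　　HDFS and MapReduce keep running normally through m2 at this point. Although the NameNode process on m1 has been killed, the cluster stays usable, which is the benefit of failover.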
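
Step a) reads the NameNode PID off the jps listing by hand. The same lookup can be scripted. A minimal sketch, where jps_pid is a hypothetical helper name and not a Hadoop tool:

```shell
# Read `jps` output ("<pid> <class name>" per line) on stdin and print the
# PID of every JVM whose name equals the first argument.
jps_pid() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# Usage on m1: kill the NameNode to trigger a failover.
#   kill -9 "$(jps | jps_pid NameNode)"
```

Once the test is done, the killed NameNode can be started again with the hadoop-daemon.sh start namenode command from step 3); it should then come back as standby.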

　　7. HBase-0.96.2-hadoop2 (dual-HMaster setup: m1 is the primary HMaster, m2 the backup HMaster)

　　　　1) Edit hbase-env.sh. The main changes are the JAVA_HOME directory and HBASE_MANAGES_ZK

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi hbase-env.sh
#
#/**
# * Copyright 2007 The Apache Software Foundation
# *
# * Licensed to the Apache Software Foundation (ASF) under one
# * or more contributor license agreements.  See the NOTICE file
# * distributed with this work for additional information
# * regarding copyright ownership.  The ASF licenses this file
# * to you under the Apache License, Version 2.0 (the
# * "License"); you may not use this file except in compliance
# * with the License.  You may obtain a copy of the License at
# *
# *     http://www.apache.org/licenses/LICENSE-2.0
# *
# * Unless required by applicable law or agreed to in writing, software
# * distributed under the License is distributed on an "AS IS" BASIS,
# * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# * See the License for the specific language governing permissions and
# * limitations under the License.
# */

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.6 required.
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Extra Java CLASSPATH elements.  Optional.
# export HBASE_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000

# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.
# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="
# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.

# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
# export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored.  $HBASE_HOME/logs by default.
# export HBASE_LOG_DIR=${HBASE_HOME}/logs

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
# export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage its own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
# false means a separately started ZooKeeper is used; true means HBase manages its bundled ZooKeeper.
# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.
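
Because almost everything in hbase-env.sh is commented out, it is easy to lose track of what is actually set. A small sketch that lists only the active lines (the helper name active_settings is made up for this example):

```shell
# Print the lines of a shell config file (read from stdin) that are neither
# blank nor comments.
active_settings() {
  grep -Ev '^[[:space:]]*(#|$)'
}

# Usage:
#   active_settings < /home/hadoop/hbase-0.96.2-hadoop2/conf/hbase-env.sh
```

For the file shown above this should leave just the three export lines for JAVA_HOME, HBASE_OPTS and HBASE_MANAGES_ZK.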

　　　　2) Edit hbase-site.xml

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi hbase-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
       <property>
                
               <name>hbase.rootdir</name>
               <value>hdfs://mycluster/hbase</value>
       </property>
       <property>
                
               <name>hbase.cluster.distributed</name>
               <value>true</value>
       </property>
       <property>
               <name>hbase.tmp.dir</name>
               <value>/home/hadoop/hbase-0.96.2-hadoop2/tmp</value>
       </property>
       <property>
                
               <name>hbase.master</name>
               <value>60000</value>
        </property>
       <property>
                
               <name>hbase.zookeeper.quorum</name>
               <value>m1,m2,s1,s2</value>
       </property>
       <property>
                
               <name>hbase.zookeeper.property.clientPort</name>
                <value>2181</value>
       </property>
       <property>
               <name>hbase.zookeeper.property.dataDir</name>
               <value>/home/hadoop/zookeeper-3.4.5/data</value>
       </property>
</configuration>
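
Note that hbase.rootdir points at the HA nameservice mycluster rather than at a single NameNode host. To double-check a value after editing, here is a rough sketch that pulls one property out of a Hadoop-style XML file. It assumes each name and value element sits on its own line, as in the file above, and it is not a real XML parser. xml_prop is a hypothetical helper name:

```shell
# Read a Hadoop-style configuration XML on stdin and print the <value> of
# the property whose <name> equals the first argument.
# Assumes <name>...</name> and <value>...</value> each fit on one line.
xml_prop() {
  awk -v key="$1" '
    /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
    /<value>/ { v = $0; gsub(/.*<value>|<\/value>.*/, "", v); if (n == key) print v }
  '
}

# Usage:
#   xml_prop hbase.rootdir < /home/hadoop/hbase-0.96.2-hadoop2/conf/hbase-site.xml
```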

　　　　2) Edit the regionservers file
　　　　　　　Machines that run a master normally do not also run a slave, so the two slave machines s1 and s2 act as the HBase region servers

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# vi regionservers
s1
s2

　　　　3) Create a symbolic link to Hadoop's hdfs-site.xml in the HBase configuration directory

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ll
total 40
drwxr-xr-x 2 root root  4096 Jul 27 09:15 ./
drwxr-xr-x 9 root root  4096 Jul 20 21:40 ../
-rw-r--r-- 1 root staff 1026 Mar 25 06:29 hadoop-metrics2-hbase.properties
-rw-r--r-- 1 root staff 4023 Mar 25 06:29 hbase-env.cmd
-rw-r--r-- 1 root staff 7129 Jul 27 08:58 hbase-env.sh
-rw-r--r-- 1 root staff 2257 Mar 25 06:29 hbase-policy.xml
-rw-r--r-- 1 root staff 2550 Jul 27 09:10 hbase-site.xml
-rw-r--r-- 1 root staff 3451 Mar 25 06:29 log4j.properties
-rw-r--r-- 1 root staff    6 Jul 20 21:38 regionservers
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ln -s /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml hdfs-site.xml
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf# ll
total 40
drwxr-xr-x 2 root root  4096 Jul 27 09:16 ./
drwxr-xr-x 9 root root  4096 Jul 20 21:40 ../
-rw-r--r-- 1 root staff 1026 Mar 25 06:29 hadoop-metrics2-hbase.properties
-rw-r--r-- 1 root staff 4023 Mar 25 06:29 hbase-env.cmd
-rw-r--r-- 1 root staff 7129 Jul 27 08:58 hbase-env.sh
-rw-r--r-- 1 root staff 2257 Mar 25 06:29 hbase-policy.xml
-rw-r--r-- 1 root staff 2550 Jul 27 09:10 hbase-site.xml
lrwxrwxrwx 1 root root    50 Jul 27 09:16 hdfs-site.xml -> /home/hadoop/hadoop-2.2.0/etc/hadoop/hdfs-site.xml*
-rw-r--r-- 1 root staff 3451 Mar 25 06:29 log4j.properties
-rw-r--r-- 1 root staff    6 Jul 20 21:38 regionservers
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/conf#

　　　　3) With HBase 0.96.2 no Hadoop jars need to be copied; the official package already bundles them

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# ls | grep hadoop
hadoop-annotations-2.2.0.jar
hadoop-auth-2.2.0.jar
hadoop-client-2.2.0.jar
hadoop-common-2.2.0.jar
hadoop-hdfs-2.2.0.jar
hadoop-hdfs-2.2.0-tests.jar
hadoop-mapreduce-client-app-2.2.0.jar
hadoop-mapreduce-client-common-2.2.0.jar
hadoop-mapreduce-client-core-2.2.0.jar
hadoop-mapreduce-client-jobclient-2.2.0.jar
hadoop-mapreduce-client-jobclient-2.2.0-tests.jar
hadoop-mapreduce-client-shuffle-2.2.0.jar
hadoop-yarn-api-2.2.0.jar
hadoop-yarn-client-2.2.0.jar
hadoop-yarn-common-2.2.0.jar
hadoop-yarn-server-common-2.2.0.jar
hadoop-yarn-server-nodemanager-2.2.0.jar
hbase-client-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2.jar
hbase-common-0.96.2-hadoop2-tests.jar
hbase-examples-0.96.2-hadoop2.jar
hbase-hadoop2-compat-0.96.2-hadoop2.jar
hbase-hadoop-compat-0.96.2-hadoop2.jar
hbase-it-0.96.2-hadoop2.jar
hbase-it-0.96.2-hadoop2-tests.jar
hbase-prefix-tree-0.96.2-hadoop2.jar
hbase-protocol-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2.jar
hbase-server-0.96.2-hadoop2-tests.jar
hbase-shell-0.96.2-hadoop2.jar
hbase-testing-util-0.96.2-hadoop2.jar
hbase-thrift-0.96.2-hadoop2.jar
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib#

　　　　4) Copy HBase 0.96.2 from m1 to the same directory on m2, s1 and s2

root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@m2:/home/hadoop
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@s1:/home/hadoop
root@m1:/home/hadoop/hbase-0.96.2-hadoop2/lib# scp -r /home/hadoop/hbase-0.96.2-hadoop2 root@s2:/home/hadoop
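
The three scp commands differ only in the host name, so they can be written as a loop. A minimal sketch; sync_hbase and the RUN variable are names invented for this example:

```shell
# Copy the HBase directory to each host given as an argument.
# Set RUN=echo to print the scp commands instead of running them.
sync_hbase() {
  for host in "$@"; do
    ${RUN:-} scp -r /home/hadoop/hbase-0.96.2-hadoop2 "root@${host}:/home/hadoop"
  done
}

# Usage on m1:
#   sync_hbase m2 s1 s2
```

scp follows symbolic links while copying recursively, so the other hosts receive a regular copy of hdfs-site.xml rather than a link. If hdfs-site.xml changes later, the copies in the HBase conf directory on m2, s1 and s2 need to be refreshed as well.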

　　　　5) Start HBase 0.96.2 on m1. After running the command, browse http://m1:60010/master-status to see the result

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/start-hbase.sh
starting master, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-master-m1.out
s1: starting regionserver, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-regionserver-s1.out
s2: starting regionserver, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-regionserver-s2.out
root@m1:/home/hadoop# jps
6688 NameNode
7540 HMaster
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
7769 Jps
4075 ResourceManager
root@m1:/home/hadoop#

　　　　6) Test the connection to HBase from the shell on m1

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-07-27 09:31:07,601 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014
hbase(main):001:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
0 row(s) in 2.8030 seconds
=> []
hbase(main):002:0> version
0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014
hbase(main):003:0> status
2 servers, 0 dead, 1.0000 average load
hbase(main):004:0> create 'test_idoall_org','uid','name'
0 row(s) in 0.5800 seconds
=> Hbase::Table - test_idoall_org
hbase(main):005:0> list
TABLE
test_idoall_org
1 row(s) in 0.0320 seconds
=> ["test_idoall_org"]
hbase(main):006:0> put 'test_idoall_org','10086','name:idoall','idoallvalue'
0 row(s) in 0.1090 seconds
hbase(main):009:0> get 'test_idoall_org','10086'
COLUMN                                                 CELL
name:idoall                                           timestamp=1406424831473, value=idoallvalue
1 row(s) in 0.0450 seconds
hbase(main):010:0> scan 'test_idoall_org'
ROW                                                    COLUMN+CELL
10086                                                 column=name:idoall, timestamp=1406424831473, value=idoallvalue
1 row(s) in 0.0620 seconds
hbase(main):011:0>
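
The same kind of smoke test can be run without typing into the interactive prompt, by piping commands into hbase shell. A sketch under assumptions: hbase_smoke_cmds is a hypothetical helper that only prints the commands, and it works on a scratch table that it disables and drops at the end, so the test_idoall_org table created above is left alone:

```shell
# Print HBase shell commands that create a scratch table, write and read one
# cell, and then remove the table again. $1 is the scratch table name.
hbase_smoke_cmds() {
  t=$1
  cat <<EOF
create '$t','name'
put '$t','10086','name:idoall','idoallvalue'
get '$t','10086'
disable '$t'
drop '$t'
EOF
}

# Usage on m1 (smoke_test is an arbitrary scratch table name):
#   hbase_smoke_cmds smoke_test | /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
```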

　　　　7) Start the HBase master on m2. After running the command, the HBase status on m2 can likewise be viewed in a browser at http://m2:60010/master-status

root@m2:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase-daemon.sh start master
starting master, logging to /home/hadoop/hbase-0.96.2-hadoop2/bin/../logs/hbase-root-master-m2.out
root@m2:/home/hadoop#

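　　　　8) Test master failover between m1 and m2
　　　　　　　a) Open http://m1:60010/master-status and http://m2:60010/master-status in a browser to see the current state of each master

　　　　　　　b) Kill the HMaster process on m1 and open the pages again: m1 no longer opens, and the cluster state shown on m2 has changed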

root@m1:/home/hadoop# jps
6688 NameNode
7540 HMaster
2884 JournalNode
8645 Jps
4375 DFSZKFailoverController
2553 QuorumPeerMain
4075 ResourceManager
root@m1:/home/hadoop# kill -9 7540
root@m1:/home/hadoop# jps
6688 NameNode
2884 JournalNode
4375 DFSZKFailoverController
2553 QuorumPeerMain
4075 ResourceManager
8655 HMaster
8719 Jps
root@m1:/home/hadoop#

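　　　　HBase is now fully configured, and master failover works.

　　8. Install MySQL 5.5.x on m1 (Ubuntu 12.04)

　　　　1) apt-get install mysql-server mysql-client mysql-common
　　　　During installation a dialog asks for the MySQL root password. I set it to 123456.
　　　　After installation, test the connection with: mysql -uroot -p123456
　　　　Use service mysql stop and service mysql start to stop and start MySQL.

　　　　2) Grant remote access to MySQL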

root@m1:/home/hadoop# mysql -uroot -p123456
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 36
Server version: 5.5.22-0ubuntu1 (Ubuntu)

Copyright (c) 2000, 2011, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> grant all on *.* to 'root'@'%' identified by '123456' WITH GRANT OPTION;
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> quit
Bye
root@m1:/home/hadoop#

　　　　3) If remote connections still fail, open the configuration with vi /etc/mysql/my.cnf, change bind-address=127.0.0.1 to this machine's IP address, and restart MySQL
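
The bind-address change can also be made without opening an editor. A minimal sketch; set_bind_address is a hypothetical helper name, and 192.168.1.50 is the address of m1 that appears in the job log earlier in this article:

```shell
# Rewrite the bind-address line of a my.cnf read from stdin so that it uses
# the address given as the first argument. All other lines pass through.
set_bind_address() {
  sed "s/^\([[:space:]]*bind-address[[:space:]]*=[[:space:]]*\).*/\1$1/"
}

# Usage on m1 (keeps a backup of the original file):
#   cp /etc/mysql/my.cnf /etc/mysql/my.cnf.bak
#   set_bind_address 192.168.1.50 < /etc/mysql/my.cnf.bak > /etc/mysql/my.cnf
#   service mysql stop
#   service mysql start
```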

　　9. Install Hive 0.13.1 (on m1)

　　　　1) Extract apache-hive-0.13.1-bin.tar.gz to /home/hadoop/hive-0.13.1

　　　　2) Go to Hive's conf directory and copy the template files to the corresponding configuration files

root@m1:/home/hadoop/hive-0.13.1/conf# cp hive-env.sh.template hive-env.sh
root@m1:/home/hadoop/hive-0.13.1/conf# cp hive-default.xml.template hive-site.xml

　　　　3) Edit hive-env.sh; the main setting is the Hadoop directory

root@m1:/home/hadoop/hive-0.13.1/conf# vi hive-env.sh
HADOOP_HOME=/home/hadoop/hadoop-2.2.0

　　　　4) Edit hive-site.xml

root@m1:/home/hadoop/hive-0.13.1/conf# vi hive-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
       <property>
               <name>hive.metastore.warehouse.dir</name>
               <value>hdfs://mycluster/user/hive/warehouse</value>
       </property>
       <property>
               <name>hive.exec.scratchdir</name>
               <value>hdfs://mycluster/user/hive/scratchdir</value>
       </property>
       <property>
               <name>hive.querylog.location</name>
               <value>/home/hadoop/hive-0.13.1/logs</value>
       </property>
       <property>
               <name>javax.jdo.option.ConnectionURL</name>
               <value>jdbc:mysql://m1:3306/hiveMeta?createDatabaseIfNotExist=true</value>
       </property>
       <property>
               <name>javax.jdo.option.ConnectionDriverName</name>
               <value>com.mysql.jdbc.Driver</value>
       </property>
       <property>
               <name>javax.jdo.option.ConnectionUserName</name>
               <value>root</value>
       </property>
       <property>
               <name>javax.jdo.option.ConnectionPassword</name>
               <value>123456</value>
       </property>
       <property>
               <name>hive.aux.jars.path</name>
               <value>file:///home/hadoop/hive-0.13.1/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hive-hbase-handler-0.13.1.jar,file:///home/hadoop/hive-0.13.1/lib/protobuf-java-2.5.0.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-client-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-common-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-protocol-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/hbase-server-0.96.2-hadoop2.jar,file:///home/hadoop/hive-0.13.1/lib/zookeeper-3.4.5.jar,file:///home/hadoop/hive-0.13.1/lib/guava-11.0.2.jar,file:///home/hadoop/hive-0.13.1/lib/htrace-core-2.04.jar</value>
       </property>
       <property>
               <name>hive.zookeeper.quorum</name>
               <value>m1,m2,s1,s2</value>
               <description>The list of zookeeper servers to talk to. This is only needed for read/write locks.</description>
       </property>
</configuration>
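
The hive.aux.jars.path value is one long comma-separated list of file:// URLs, which is easy to mistype. A small sketch that builds such a list from a directory and jar names; aux_jars_path is a helper name invented for this example:

```shell
# Print a comma-separated list of file:// URLs for the jars named in the
# remaining arguments, all located in the directory given as the first one.
aux_jars_path() {
  dir=$1; shift
  out=
  for jar in "$@"; do
    out="${out:+$out,}file://$dir/$jar"
  done
  printf '%s\n' "$out"
}

# Usage (shortened to three of the jars from the configuration above):
#   aux_jars_path /home/hadoop/hive-0.13.1/lib \
#     hive-hbase-handler-0.13.1.jar zookeeper-3.4.5.jar guava-11.0.2.jar
```

The printed line can then be pasted as the value element of hive.aux.jars.path.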

　　　　5) Of the jars listed in hive.aux.jars.path in hive-site.xml, hive-hbase-handler-0.13.1.jar and guava-11.0.2.jar already ship with Hive. The rest only need to be copied from HBase and ZooKeeper with the following commands

root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/protobuf-java-2.5.0.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-client-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-common-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-protocol-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-server-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop2-compat-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/hbase-hadoop-compat-0.96.2-hadoop2.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/hbase-0.96.2-hadoop2/lib/htrace-core-2.04.jar /home/hadoop/hive-0.13.1/lib
root@m1:/home/hadoop# cp /home/hadoop/zookeeper-3.4.5/dist-maven/zookeeper-3.4.5.jar /home/hadoop/hive-0.13.1/lib
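
After copying, it is worth confirming that every jar named in hive.aux.jars.path really exists, since a single missing file tends to surface only later as a class loading error. A minimal sketch; missing_aux_jars is a hypothetical helper name:

```shell
# Take a hive.aux.jars.path value (comma-separated file:// URLs) as the
# first argument and print the path of every jar that does not exist.
missing_aux_jars() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's|^file://||' | while IFS= read -r jar; do
    [ -e "$jar" ] || printf '%s\n' "$jar"
  done
}

# Usage: paste the <value> of hive.aux.jars.path between the quotes.
#   missing_aux_jars "file:///home/hadoop/hive-0.13.1/lib/zookeeper-3.4.5.jar,file:///home/hadoop/hive-0.13.1/lib/guava-11.0.2.jar"
```

No output means all listed jars were found.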

　　　　6) The MySQL JDBC driver (Connector/J) can be downloaded from http://dev.mysql.com/downloads/connector/j/. After extracting it, copy mysql-connector-java-5.1.31-bin.jar from the extracted directory to /home/hadoop/hive-0.13.1/lib

　　　　7) Create the test data and the warehouse directory

root@m1:/home/hadoop/hive-0.13.1/conf# vi /home/hadoop/hive-0.13.1/testdata001.dat
12306,mname,yname
10086,myidoall,youidoall
root@m1:/home/hadoop/hive-0.13.1/conf# /home/hadoop/hadoop-2.2.0/bin/hadoop fs -mkdir -p /user/hive/warehouse
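
The test file can also be written without vi, which helps when the setup is scripted. A minimal sketch using a here-document; write_testdata is a helper name made up for this example:

```shell
# Write the two test rows used in this article to the file named by $1.
write_testdata() {
  cat > "$1" <<'EOF'
12306,mname,yname
10086,myidoall,youidoall
EOF
}

# Usage on m1:
#   write_testdata /home/hadoop/hive-0.13.1/testdata001.dat
```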

　　　　8) Test Hive from its command-line shell

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/27 11:17:35 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
Time taken: 0.464 seconds, Fetched: 1 row(s)
hive> create database testidoall;
OK
Time taken: 0.279 seconds
hive> show databases;
OK
default
testidoall
Time taken: 0.021 seconds, Fetched: 2 row(s)
hive> use testidoall;
OK
Time taken: 0.039 seconds
hive> create external table testtable(uid int,myname string,youname string) row format delimited fields terminated by ',' location '/user/hive/warehouse/testtable';
OK
Time taken: 0.205 seconds
hive> LOAD DATA LOCAL INPATH '/home/hadoop/hive-0.13.1/testdata001.dat' OVERWRITE INTO TABLE testtable;
Copying data from file:/home/hadoop/hive-0.13.1/testdata001.dat
Copying file: file:/home/hadoop/hive-0.13.1/testdata001.dat
Loading data to table testidoall.testtable
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://mycluster/user/hive/warehouse/testtable
Table testidoall.testtable stats: [numFiles=0, numRows=0, totalSize=0, rawDataSize=0]
OK
Time taken: 0.77 seconds
hive> select * from testtable;
OK
12306 mname yname
10086 myidoall youidoall
Time taken: 0.279 seconds, Fetched: 2 row(s)
hive>

　　　　Hive is now installed.

　　10. Hive to HBase (import Hive table data into HBase)

　　　　1) Create a table that HBase can recognize

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/27 11:33:53 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
testidoall
Time taken: 0.45 seconds, Fetched: 2 row(s)
hive> use testidoall;
OK
Time taken: 0.021 seconds
hive> show tables;
OK
testtable
Time taken: 0.032 seconds, Fetched: 1 row(s)
hive> CREATE TABLE hive2hbase_idoall(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "hive2hbase_idoall");
OK
Time taken: 2.332 seconds
hive> show tables;
OK
hive2hbase_idoall
testtable
Time taken: 0.036 seconds, Fetched: 2 row(s)
hive>

　　　　2) Create a native Hive table to hold the data before it is inserted into HBase; it serves as an intermediate table. Then load the earlier test data into it.

hive> create table hive2hbase_idoall_middle(foo int,bar string)row format delimited fields terminated by ',';
OK
Time taken: 0.086 seconds
hive> show tables;
OK
hive2hbase_idoall
hive2hbase_idoall_middle
testtable
Time taken: 0.03 seconds, Fetched: 3 row(s)
hive> load data local inpath ‘/home/hadoop/hive-0.13.1/testdata001.dat‘ overwrite into table hive2hbase_idoall_middle;
Copying data from file:/home/hadoop/hive-0.13.1/testdata001.dat
Copying file: file:/home/hadoop/hive-0.13.1/testdata001.dat
Loading data to table testidoall.hive2hbase_idoall_middle
rmr: DEPRECATED: Please use 'rm -r' instead.
Deleted hdfs://mycluster/user/hive/warehouse/testidoall.db/hive2hbase_idoall_middle
Table testidoall.hive2hbase_idoall_middle stats: [numFiles=1, numRows=0, totalSize=43, rawDataSize=0]
OK
Time taken: 0.683 seconds
hive>
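
Hive does not validate file contents during `load data`; fields that do not fit the declared columns only show up as NULL when the table is queried. A minimal sketch of a pre-load check, assuming the file is meant to match `(foo int, bar string)` with `,` as the delimiter. The function name `check_two_column_csv` is made up for this illustration.

```shell
#!/bin/sh
# Return 0 if every line of the file ($1) has exactly two comma-separated
# fields and the first field is an integer, matching (foo int, bar string).
# Malformed lines are reported on stdout.
check_two_column_csv() {
    awk -F, '
        NF != 2 || $1 !~ /^-?[0-9]+$/ {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

It could be run before the load, for example as `check_two_column_csv /home/hadoop/hive-0.13.1/testdata001.dat && echo "format looks fine"`.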

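　　　　3) Insert the intermediate table (hive2hbase_idoall_middle) into hive2hbase_idoall. Because that table is stored through the HBase storage handler, the rows are written to HBase automatically.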

hive> insert overwrite table hive2hbase_idoall select * from hive2hbase_idoall_middle;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1406394452186_0002, Tracking URL = http://m1:8088/proxy/application_1406394452186_0002/
Kill Command = /home/hadoop/hadoop-2.2.0/bin/hadoop job  -kill job_1406394452186_0002
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2014-07-27 11:44:11,491 Stage-0 map = 0%,  reduce = 0%
2014-07-27 11:44:22,684 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 1.51 sec
MapReduce Total cumulative CPU time: 1 seconds 510 msec
Ended Job = job_1406394452186_0002
MapReduce Jobs Launched:
Job 0: Map: 1   Cumulative CPU: 1.51 sec   HDFS Read: 288 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 510 msec
OK
Time taken: 25.613 seconds
hive> select * from hive2hbase_idoall;
OK
10086   myidoall
12306   mname
Time taken: 0.179 seconds, Fetched: 2 row(s)
hive> select * from hive2hbase_idoall_middle;
OK
12306   mname
10086   myidoall
Time taken: 0.088 seconds, Fetched: 2 row(s)
hive>

　　　　4) Connect to HBase with the shell and check that the data written from Hive is there

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-07-27 11:47:14,454 INFO  [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> list
TABLE
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
hive2hbase_idoall
test_idoall_org
2 row(s) in 2.9480 seconds

=> ["hive2hbase_idoall", "test_idoall_org"]
hbase(main):002:0> scan "hive2hbase_idoall"
ROW                                                    COLUMN+CELL
10086                                                 column=cf1:val, timestamp=1406432660860, value=myidoall
12306                                                 column=cf1:val, timestamp=1406432660860, value=mname
2 row(s) in 0.0540 seconds

hbase(main):003:0> get "hive2hbase_idoall",'12306'
COLUMN                                                 CELL
cf1:val                                               timestamp=1406432660860, value=mname
1 row(s) in 0.0110 seconds

hbase(main):004:0>

　　　　At this point, the Hive-to-HBase import has been tested and works as expected.
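
The manual check above can also be scripted. The following is only a sketch under a few assumptions: the install paths are the ones used in this article, row keys contain no whitespace, and `hive -S -e` prints just the result rows on stdout. The helper names `hive_keys` and `hbase_keys` are made up for this illustration.

```shell
#!/bin/sh
# Paths as used in this article; adjust them for another installation.
HIVE_BIN=/home/hadoop/hive-0.13.1/bin/hive
HBASE_BIN=/home/hadoop/hbase-0.96.2-hadoop2/bin/hbase

# Read query output (one row per line) and print the sorted, distinct first column.
hive_keys() {
    awk 'NF { print $1 }' | sort -u
}

# Read `scan` output and print the sorted, distinct row keys.
# Data lines look like: <rowkey>  column=cf1:val, timestamp=..., value=...
hbase_keys() {
    awk '$2 ~ /^column=/ { print $1 }' | sort -u
}

# Only talk to the cluster when both launchers are present.
if [ -x "$HIVE_BIN" ] && [ -x "$HBASE_BIN" ]; then
    hive_side=$("$HIVE_BIN" -S -e 'select * from testidoall.hive2hbase_idoall_middle' 2>/dev/null | hive_keys)
    hbase_side=$(echo "scan 'hive2hbase_idoall'" | "$HBASE_BIN" shell 2>/dev/null | hbase_keys)
    if [ "$hive_side" = "$hbase_side" ]; then
        echo "row keys match"
    else
        echo "row keys differ"
    fi
fi
```

Comparing only the row keys keeps the parsing simple; comparing values as well would need a stricter parser for the `scan` output.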

　　11. HBase to Hive (importing HBase table data into Hive)

　　　　1) Create the table hbase2hive_idoall in HBase

root@m1:/home/hadoop# /home/hadoop/hbase-0.96.2-hadoop2/bin/hbase shell
2014-07-27 11:54:25,844 INFO [main] Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.96.2-hadoop2, r1581096, Mon Mar 24 16:03:18 PDT 2014

hbase(main):001:0> create 'hbase2hive_idoall','gid','info'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-0.96.2-hadoop2/lib/slf4j-log4j12-1.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
0 row(s) in 3.4970 seconds

=> Hbase::Table - hbase2hive_idoall
hbase(main):002:0> put 'hbase2hive_idoall','3344520','info:time','20140704'
0 row(s) in 0.1020 seconds

hbase(main):003:0> put 'hbase2hive_idoall','3344520','info:address','HK'
0 row(s) in 0.0090 seconds

hbase(main):004:0> scan 'hbase2hive_idoall'
ROW                                                    COLUMN+CELL
3344520                                               column=info:address, timestamp=1406433302317, value=HK
3344520                                               column=info:time, timestamp=1406433297567, value=20140704
1 row(s) in 0.0330 seconds

hbase(main):005:0>

　　　　2) Create a table in Hive that maps to the HBase table

root@m1:/home/hadoop# /home/hadoop/hive-0.13.1/bin/hive
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size.per.node is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.node
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.min.split.size.per.rack is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize.per.rack
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.max.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.maxsize
14/07/27 11:57:20 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed

Logging initialized using configuration in jar:file:/home/hadoop/hive-0.13.1/lib/hive-common-0.13.1.jar!/hive-log4j.properties
hive> show databases;
OK
default
testidoall
Time taken: 0.449 seconds, Fetched: 2 row(s)
hive> use testidoall;
OK
Time taken: 0.02 seconds
hive> show tables;
OK
hive2hbase_idoall
hive2hbase_idoall_middle
testtable
Time taken: 0.026 seconds, Fetched: 3 row(s)
hive> create external table hbase2hive_idoall (key string,gid map<string,string>)STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" ="info:") TBLPROPERTIES ("hbase.table.name" = "hbase2hive_idoall");
OK
Time taken: 1.696 seconds
hive> show tables;
OK
hbase2hive_idoall
hive2hbase_idoall
hive2hbase_idoall_middle
testtable
Time taken: 0.034 seconds, Fetched: 4 row(s)
hive> select * from hbase2hive_idoall;
OK
3344520 {"address":"HK","time":"20140704"}
Time taken: 0.701 seconds, Fetched: 1 row(s)
hive>
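
In the mapping used above, `info:` maps the whole `info` column family to a Hive map column, and the first Hive column is treated as the row key without being named in the mapping. A variant that spells out the row key with `:key` might look like the sketch below. The function `hbase_external_ddl`, the Hive column name `cols`, and the Hive table name `hbase2hive_idoall_explicit` are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Print a CREATE EXTERNAL TABLE statement that maps an HBase table into Hive.
# $1 = Hive table name, $2 = HBase table name, $3 = HBase column family.
# The whole family is exposed as one map<string,string> column.
hbase_external_ddl() {
    hive_table=$1
    hbase_table=$2
    family=$3
    cat <<EOF
CREATE EXTERNAL TABLE ${hive_table} (key string, cols map<string,string>)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,${family}:")
TBLPROPERTIES ("hbase.table.name" = "${hbase_table}");
EOF
}

# Example: print the statement for the HBase table created in step 1).
hbase_external_ddl hbase2hive_idoall_explicit hbase2hive_idoall info
```

The printed statement could then be passed to Hive, for example with `hive -e "use testidoall; $(hbase_external_ddl hbase2hive_idoall_explicit hbase2hive_idoall info)"`.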

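　　　　This completes the testing of the ubuntu12.04+hadoop2.2.0+zookeeper3.4.5+hbase0.96.2+hive0.13.1 distributed deployment described in the title. I ran into a few pitfalls along the way, which are covered under Common problems. I hope these test notes help others.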

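　　IV. Common problems

　　1. If hadoop (namenode/datanode/yarn), hbase, or hive fails to start, always read the relevant log carefully with tail -n 100 ***.log; it usually contains a lot of useful information. The following commands also help trace errors from the command line.

　　　　　　1) Print hadoop debug output to the console. After running the command below, start the namenode, datanode, or yarn to see the effect.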

export HADOOP_ROOT_LOGGER=DEBUG,console

　　　　　　2) Print hive debug output to the console

/home/hadoop/hive-0.13.1/bin/hive --hiveconf hive.root.logger=DEBUG,console
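
The `tail -n 100` tip from item 1 can be wrapped in a small helper that looks at several logs at once and keeps only the suspicious lines. This is a minimal sketch: the function name `scan_logs` and the pattern `ERROR|FATAL|Exception` are choices made for this example, not something the tools prescribe.

```shell
#!/bin/sh
# Print the suspicious lines among the last N lines of each given log file.
# Usage: scan_logs [-n LINES] file...   (default: 100 lines, as in the tip above)
scan_logs() {
    lines=100
    if [ "$1" = "-n" ]; then
        lines=$2
        shift 2
    fi
    for f in "$@"; do
        # Skip arguments that are not readable files (e.g. an unmatched glob).
        [ -r "$f" ] || continue
        # Prefix every hit with the file name so mixed output stays readable.
        tail -n "$lines" "$f" | awk -v name="$f" '/ERROR|FATAL|Exception/ { print name ": " $0 }'
    done
    return 0
}
```

Assuming the hadoop logs are in the default logs directory under the install path, it could be called as `scan_logs /home/hadoop/hadoop-2.2.0/logs/*.log`.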

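　　2. If mysql fails to start with "job failed to start", removing and reinstalling it with the following commands resolves the problem. Note that these commands delete all existing MySQL data and configuration.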

rm /var/lib/mysql/ -R
rm /etc/mysql/ -R
apt-get autoremove mysql* --purge
apt-get remove apparmor
apt-get install mysql-server mysql-client mysql-common

　　3. If you see "dpkg was interrupted, you must manually run 'sudo dpkg --configure -a' to correct the problem", the following commands resolve it

sudo rm /var/lib/dpkg/updates/*
sudo apt-get update
sudo apt-get upgrade

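　　V. References
　　_00018 Hadoop-2.2.0 + Hbase-0.96.2 + Hive-0.13.1 distributed environment integration, with Hadoop-2.X in HA mode

　　Compiling the Hadoop2.2.0 source code

　　hadoop2.1.0 build and installation tutorial

　　Building Hadoop2.2.0 on CentOS6.4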
