Hadoop 2.2.0 Cluster Configuration Guide for Linux
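1. Install the Sun (Oracle) JDK
(1). Download the JDK from Oracle's website; the latest version at the time of writing is 7u51
Installer package:
http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
or the RPM package:
http://www.oracle.com/technetwork/cn/java/javase/downloads/jdk7-downloads-1880260-zhs.html
(2). Uninstall any other JDK version already present on the deployment machine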
# rpm -qa | grep jdk
ldapjdk-4.18-2jpp.3.el5
jdk-1.7.0_51-fcs
# rpm -e --nodeps jdk-1.7.0_51-fcs
(3). Install the JDK
# chmod +x jdk-7u51-linux-x64.rpm
# rpm -ivh jdk-7u51-linux-x64.rpm
(4). Add the environment variables to /etc/profile
JAVA_HOME=/usr/java/jdk1.7.0_51
JRE_HOME=/usr/java/jdk1.7.0_51/jre
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME JRE_HOME PATH CLASSPATH
(5). Apply the environment variables
# source /etc/profile
(6). Verify the installation
# java -version
java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)
# javac -version
javac 1.7.0_51
(7). Uninstall the JDK, if necessary
# rpm -qa | grep jdk
ldapjdk-4.18-2jpp.3.el5
jdk-1.7.0_51-fcs
# rpm -e --nodeps jdk-1.7.0_51-fcs
2. Install other base libraries
(1). On yum-based systems (e.g. CentOS):
# yum install openssh-clients
# yum install openssh-server
(1'). On Ubuntu:
# sudo apt-get install g++ autoconf automake libtool make cmake zlib1g-dev pkg-config libssl-dev
# sudo apt-get install openssh-client
# sudo apt-get install openssh-server
(2). Install protobuf
Download the package from the page below; the latest release at the time of writing is protobuf-2.5.0
https://code.google.com/p/protobuf/downloads/list
# tar -xzvf protobuf-2.5.0.tar.gz
# cd protobuf-2.5.0
# ./configure && make && make check && make install
# protoc --version
libprotoc 2.5.0
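The version check above can be scripted so that a mismatched protoc is caught before the long Hadoop build starts. This is a minimal sketch, assuming the Hadoop 2.2.0 build expects protoc 2.5.0; REQUIRED_PROTOC and check_protoc_version are hypothetical names, not part of protobuf or Hadoop.

```shell
# Assumption: the build expects exactly this protoc version.
REQUIRED_PROTOC="2.5.0"

# check_protoc_version is a hypothetical helper.
# $1 is the full line printed by `protoc --version`, e.g. "libprotoc 2.5.0".
check_protoc_version() {
  found=${1#libprotoc }          # strip the "libprotoc " prefix
  if [ "$found" = "$REQUIRED_PROTOC" ]; then
    echo "protoc $found OK"
  else
    echo "protoc $found found, $REQUIRED_PROTOC expected"
    return 1
  fi
}

# Usage on a machine where protoc is installed:
#   check_protoc_version "$(protoc --version)"
```

The function takes the version line as an argument rather than calling protoc itself, so it can also be tried against a pasted line of output.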
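3. Install Maven
(1). Go to the Maven website ( http://maven.apache.org/download.cgi ) and
download the latest binary package (apache-maven-3.2.1-bin.tar.gz). Extract it and move the resulting directory into /usr/local/apache-maven
(2). Edit /etc/profile to configure the environment variables
MAVEN_HOME=/usr/local/apache-maven/apache-maven-3.2.1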
PATH=$PATH:$MAVEN_HOME/bin
export MAVEN_HOME PATH
(3). Apply the environment variables
# source /etc/profile
$ echo $MAVEN_HOME
$ mvn -v
Maven home: /usr/local/apache-maven/apache-maven-3.2.1
Java version: 1.7.0_51, vendor: Oracle Corporation
Java home: /usr/java/jdk1.7.0_51/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "2.6.18-194.el5", arch: "amd64", family: "unix"
Mirror and profile entries for Maven's settings.xml (the <mirror> element goes inside <mirrors>, the <profile> element inside <profiles>):
<mirror>
  <id>nexus-osc</id>
  <mirrorOf>*</mirrorOf>
  <name>Nexusosc</name>
  <url>http://maven.oschina.net/content/groups/public/</url>
</mirror>
<profile>
  <id>jdk-1.7</id>
  <activation>
    <jdk>1.7</jdk>
  </activation>
  <repositories>
    <repository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases><enabled>true</enabled></releases>
      <snapshots><enabled>false</enabled></snapshots>
    </repository>
  </repositories>
  <pluginRepositories>
    <pluginRepository>
      <id>nexus</id>
      <name>local private nexus</name>
      <url>http://maven.oschina.net/content/groups/public/</url>
      <releases><enabled>true</enabled></releases>
      <snapshots><enabled>false</enabled></snapshots>
    </pluginRepository>
  </pluginRepositories>
</profile>
4. Build and install Hadoop
# wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
<dependency>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty-util</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Hadoop Main ................................ SUCCESS [3.709s]
[INFO] Apache Hadoop Project POM ......................... SUCCESS [2.229s]
[INFO] Apache Hadoop Annotations ......................... SUCCESS [5.270s]
[INFO] Apache Hadoop Assemblies .......................... SUCCESS [0.388s]
[INFO] Apache Hadoop Project Dist POM .................... SUCCESS [3.485s]
[INFO] Apache Hadoop Maven Plugins ....................... SUCCESS [8.655s]
[INFO] Apache Hadoop Auth ................................ SUCCESS [7.782s]
[INFO] Apache Hadoop Auth Examples ....................... SUCCESS [5.731s]
[INFO] Apache Hadoop Common .............................. SUCCESS [1:52.476s]
[INFO] Apache Hadoop NFS ................................. SUCCESS [9.935s]
[INFO] Apache Hadoop Common Project ...................... SUCCESS [0.110s]
[INFO] Apache Hadoop HDFS ................................ SUCCESS [1:58.347s]
[INFO] Apache Hadoop HttpFS .............................. SUCCESS [26.915s]
[INFO] Apache Hadoop HDFS BookKeeper Journal ............. SUCCESS [17.002s]
[INFO] Apache Hadoop HDFS-NFS ............................ SUCCESS [5.292s]
[INFO] Apache Hadoop HDFS Project ........................ SUCCESS [0.073s]
[INFO] hadoop-yarn ....................................... SUCCESS [0.335s]
[INFO] hadoop-yarn-api ................................... SUCCESS [54.478s]
[INFO] hadoop-yarn-common ................................ SUCCESS [39.215s]
[INFO] hadoop-yarn-server ................................ SUCCESS [0.241s]
[INFO] hadoop-yarn-server-common ......................... SUCCESS [15.601s]
[INFO] hadoop-yarn-server-nodemanager .................... SUCCESS [21.566s]
[INFO] hadoop-yarn-server-web-proxy ...................... SUCCESS [4.754s]
[INFO] hadoop-yarn-server-resourcemanager ................ SUCCESS [20.625s]
[INFO] hadoop-yarn-server-tests .......................... SUCCESS [0.755s]
[INFO] hadoop-yarn-client ................................ SUCCESS [6.748s]
[INFO] hadoop-yarn-applications .......................... SUCCESS [0.155s]
[INFO] hadoop-yarn-applications-distributedshell ......... SUCCESS [4.661s]
[INFO] hadoop-mapreduce-client ........................... SUCCESS [0.160s]
[INFO] hadoop-mapreduce-client-core ...................... SUCCESS [36.090s]
[INFO] hadoop-yarn-applications-unmanaged-am-launcher .... SUCCESS [2.753s]
[INFO] hadoop-yarn-site .................................. SUCCESS [0.151s]
[INFO] hadoop-yarn-project ............................... SUCCESS [4.771s]
[INFO] hadoop-mapreduce-client-common .................... SUCCESS [24.870s]
[INFO] hadoop-mapreduce-client-shuffle ................... SUCCESS [3.812s]
[INFO] hadoop-mapreduce-client-app ....................... SUCCESS [15.759s]
[INFO] hadoop-mapreduce-client-hs ........................ SUCCESS [6.831s]
[INFO] hadoop-mapreduce-client-jobclient ................. SUCCESS [8.126s]
[INFO] hadoop-mapreduce-client-hs-plugins ................ SUCCESS [2.320s]
[INFO] Apache Hadoop MapReduce Examples .................. SUCCESS [9.596s]
[INFO] hadoop-mapreduce .................................. SUCCESS [3.905s]
[INFO] Apache Hadoop MapReduce Streaming ................. SUCCESS [7.118s]
[INFO] Apache Hadoop Distributed Copy .................... SUCCESS [11.651s]
[INFO] Apache Hadoop Archives ............................ SUCCESS [2.671s]
[INFO] Apache Hadoop Rumen ............................... SUCCESS [10.038s]
[INFO] Apache Hadoop Gridmix ............................. SUCCESS [6.062s]
[INFO] Apache Hadoop Data Join ........................... SUCCESS [4.104s]
[INFO] Apache Hadoop Extras .............................. SUCCESS [4.210s]
[INFO] Apache Hadoop Pipes ............................... SUCCESS [9.419s]
[INFO] Apache Hadoop Tools Dist .......................... SUCCESS [2.306s]
[INFO] Apache Hadoop Tools ............................... SUCCESS [0.037s]
[INFO] Apache Hadoop Distribution ........................ SUCCESS [21.579s]
[INFO] Apache Hadoop Client .............................. SUCCESS [7.299s]
[INFO] Apache Hadoop Mini-Cluster ........................ SUCCESS [7.347s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:53.144s
[INFO] Finished at: Fri Nov 22 16:58:32 CST 2013
[INFO] Final Memory: 70M/239M
[INFO] ------------------------------------------------------------------------
export HADOOP_HOME=/usr/local/hadoop-2.2.0
(7). Apply the environment variables
# source /etc/profile
(8). Verify the built version and the configuration
# hadoop version
Hadoop 2.2.0
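5. Linux configuration for distributed deployment
For a single-node deployment, skip ahead to section 7, Hadoop single-node deployment configuration
(1). Decide which Linux machines are the master and the slaves, and set the hostname in /etc/sysconfig/network
For example, with three machines: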
# vim /etc/sysconfig/network
# hostname xxx
61.129.82.157(had-master)
61.129.82.221(had-slave1)
61.129.82.222(had-slave2)
(2). Configure hosts on every machine
Edit /etc/hosts on every machine (both the NameNode and the DataNodes)
127.0.0.1 localhost.localdomain localhost
61.129.82.157 had-master
61.129.82.221 had-slave1
61.129.82.222 had-slave2
::1 localhost6.localdomain6 localhost6
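Because every node needs the same three entries, it can help to check each machine's hosts file for them. The following is a minimal sketch; check_hosts is a hypothetical helper, and it only looks for the names, not for whether the addresses are correct.

```shell
# check_hosts is a hypothetical helper: report cluster names missing from a hosts file.
# $1 = hosts file, remaining arguments = hostnames that must be listed.
check_hosts() {
  hosts_file=$1
  shift
  missing=0
  for name in "$@"; do
    # Look for the name as a whole field (after the address) on a non-comment line.
    if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$hosts_file"; then
      echo "ok: $name"
    else
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage on each node:
#   check_hosts /etc/hosts had-master had-slave1 had-slave2
```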
(3). Create the same account on the master and slave machines and set its password
# useradd hadoop
# passwd hadoop
# su hadoop
$ cd ~
(4). Generate a public/private key pair on the NameNode
hadoop@61.129.82.157$ ssh-keygen -t rsa -P ''
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase): (skip)
Enter same passphrase again: (skip)
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
(5). Copy the public key to the servers to be controlled
hadoop@61.129.82.221$ cd ~; mkdir .ssh
hadoop@61.129.82.222$ cd ~; mkdir .ssh
hadoop@61.129.82.157$ cd ~/.ssh
hadoop@61.129.82.157$ scp -P22 ~/.ssh/id_rsa.pub hadoop@61.129.82.221:~/.ssh
hadoop@61.129.82.157$ scp -P22 ~/.ssh/id_rsa.pub hadoop@61.129.82.222:~/.ssh
(6). Add the public key to the authorized keys on each server, using 221 as an example
hadoop@61.129.82.221$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(7). Set the permissions of the .ssh directory on each server, using 221 as an example
hadoop@61.129.82.221$ cd ~/
hadoop@61.129.82.221$ chmod -R 700 .ssh
hadoop@61.129.82.221$ chmod 600 .ssh/authorized_keys
(8). Test passwordless SSH login
hadoop@61.129.82.157$ ssh 61.129.82.221
The first login asks you to confirm the host key; log in a second time to test:
hadoop@61.129.82.157$ ssh 61.129.82.221
The login should print "Last login" without asking for a password.
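The manual login test can be repeated for every slave in one pass. This is a sketch under stated assumptions: check_passwordless and the SSH_BIN override are hypothetical names introduced here, and the check relies on the OpenSSH options BatchMode (fail instead of prompting for a password) and ConnectTimeout.

```shell
# check_passwordless is a hypothetical helper.
# $1 = login user, remaining arguments = hosts to test.
# SSH_BIN (an assumption of this sketch) lets you substitute another ssh command.
check_passwordless() {
  user=$1
  shift
  failed=0
  for host in "$@"; do
    # BatchMode=yes makes ssh fail rather than ask for a password.
    if "${SSH_BIN:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true; then
      echo "ok: $host"
    else
      echo "failed: $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage from the master, after the host keys have been accepted once:
#   check_passwordless hadoop 61.129.82.221 61.129.82.222
```

A host reported as failed usually points back to steps (5) to (7): a missing authorized_keys entry or permissions that are too open on .ssh.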
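6. Hadoop distributed deployment configuration
(1). Edit core-site.xml
This file mainly sets the NameNode IP and port. HDFS has two important directories: one on the NameNode that stores the namespace, and one on each DataNode that stores the data blocks. These and several other storage locations are all derived from hadoop.tmp.dir.
For example:
the NameNode namespace is stored in ${hadoop.tmp.dir}/dfs/name,
DataNode blocks are stored in ${hadoop.tmp.dir}/dfs/data
So once hadoop.tmp.dir is set, the other important directories all live beneath it; it acts as a root directory. Here it is set to /home/hadoop/tmp/hadoop, and this directory must exist.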
# vim $HADOOP_HOME/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://61.129.82.157:9000/</value>
<description>The name of the default file system.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp/hadoop</value>
<description>A base for other temporary directories.</description>
</property>
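The relationship between hadoop.tmp.dir and the two HDFS storage directories described above can be sketched in a few lines of shell. hdfs_storage_dirs is a hypothetical helper; it only prints the derived paths and assumes the name and data directories have not been overridden elsewhere in the configuration.

```shell
# hdfs_storage_dirs is a hypothetical helper.
# $1 = the value configured for hadoop.tmp.dir.
hdfs_storage_dirs() {
  base=${1%/}                 # drop one trailing slash, if present
  echo "$base/dfs/name"       # NameNode namespace
  echo "$base/dfs/data"       # DataNode blocks
}

# Usage: create the base directory as the Hadoop user, then list what will live under it.
#   mkdir -p /home/hadoop/tmp/hadoop
#   hdfs_storage_dirs /home/hadoop/tmp/hadoop
```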
(2). Edit hdfs-site.xml
# vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
(3). Edit mapred-site.xml
# cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
# vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
(4). Edit the slaves file
# vim $HADOOP_HOME/etc/hadoop/slaves
had-slave1
had-slave2
(5). Edit hadoop-env.sh
# vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_51
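The files edited in this section live on one machine, while the slaves need the same settings. One way to propagate them, assuming Hadoop is installed at the same path on every node and passwordless SSH from section 5 is in place, is to generate one scp command per entry in the slaves file. print_sync_commands is a hypothetical helper; it prints the commands rather than running them, so they can be reviewed first.

```shell
# print_sync_commands is a hypothetical helper.
# $1 = slaves file (one hostname per line), $2 = configuration directory.
# Assumption: the directory has the same path on every node.
print_sync_commands() {
  slaves_file=$1
  conf_dir=${2%/}
  parent=$(dirname "$conf_dir")
  while IFS= read -r host || [ -n "$host" ]; do
    case $host in
      ''|\#*) continue ;;     # skip blank lines and comments
    esac
    echo "scp -r $conf_dir $host:$parent/"
  done < "$slaves_file"
}

# Usage: review the output, then pipe it to sh to run it.
#   print_sync_commands "$HADOOP_HOME/etc/hadoop/slaves" "$HADOOP_HOME/etc/hadoop"
#   print_sync_commands "$HADOOP_HOME/etc/hadoop/slaves" "$HADOOP_HOME/etc/hadoop" | sh
```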
7. Hadoop single-node deployment configuration
(1). Edit core-site.xml
# vim $HADOOP_HOME/etc/hadoop/core-site.xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
<description>The name of the default file system.</description>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/haduser/tmp/hadoop</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>io.native.lib.available</name>
<value>false</value>
<description>default value is true:Should native hadoop libraries, if present, be used.</description>
</property>
(2). Edit hdfs-site.xml
# vim $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
(3). Edit yarn-site.xml
# vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
(4). Edit mapred-site.xml
# cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml
# vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<final>true</final>
</property>
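8. Start Hadoop and run an example
(1) Add the haduser account and set up passwordless login to localhost
Section 5 covers the distributed deployment; for a single-node deployment, set up passwordless SSH to localhost instead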
# useradd haduser
# passwd haduser
# su haduser
$ cd ~
$ ssh-keygen -t rsa -P ''
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Verify that the SSH setup works:
$ ssh localhost
Last login: xxxxx
(2) Start Hadoop
$ hdfs namenode -format
$ start-dfs.sh
$ jps
1522 NameNode
1651 DataNode
1794 SecondaryNameNode
1863 Jps
$ start-yarn.sh
$ jps
2033 NodeManager
1900 ResourceManager
1522 NameNode
1651 DataNode
2058 Jps
1794 SecondaryNameNode
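Comparing the jps listing against the expected daemons by eye is easy to get wrong, so the check can be sketched as a small filter. missing_daemons is a hypothetical helper, and the daemon list is the one shown above for a single-node setup; on a multi-node cluster each machine would run only a subset of these.

```shell
# missing_daemons is a hypothetical helper.
# It reads `jps` output (lines of "<pid> <name>") on stdin and reports
# which of the single-node daemons are absent.
missing_daemons() {
  jps_output=$(cat)
  missing=""
  for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    # Compare the second field exactly, so "NameNode" does not match "SecondaryNameNode".
    if ! printf '%s\n' "$jps_output" | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      missing="$missing $daemon"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing"
    return 1
  fi
  echo "all daemons running"
}

# Usage after start-dfs.sh and start-yarn.sh:
#   jps | missing_daemons
```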
(3) Test run
$ hdfs dfs -ls /
$ hdfs dfs -mkdir /hadfile
$ hdfs dfs -ls /
$ hdfs dfs -mkdir -p /hadfile/ot/otds
$ hdfs dfs -ls /hadfile/ot/otds
$ hdfs dfs -put XXX.log /hadfile/ot/otds
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /hadfile/ot/otds /hadout
$ hdfs dfs -ls /hadout
$ hdfs dfs -cat /hadout/part-r-00000
(4) Check the web UIs
http://10.0.18.31:8088
http://10.0.18.31:50070
Monitoring page:
http://10.0.18.31:8088/cluster/nodes
HDFS cluster status:
http://10.0.18.31:50070/dfshealth.jsp
Monitoring page on the cluster master (had-master):
http://61.129.82.157:8088/cluster/nodes
HDFS cluster status on the cluster master (had-master):
http://61.129.82.157:50070/dfshealth.jsp