Hadoop from Scratch: Installation and Configuration (reposted) (Part 1)
I had been looking for an installation tutorial for a long time, but everything I found covered version 0.20 and I could not get it to work; the new release differs significantly from the old one. Today I finally found an installation and configuration guide for the new version, so I am sharing it here.
Installation environment:
- OS: Ubuntu 12.10
- Hadoop: 0.23.6
- JDK: Oracle (Sun) 1.7.0_21
Installation steps:
I. Install the JDK
Install the Oracle JDK, configure the environment variables, and set it as the system default (details omitted).
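The omitted JDK steps look roughly like the following. This is only a minimal sketch: it assumes the Oracle JDK tarball (here jdk-7u21-linux-i586.tar.gz) has already been downloaded, and the file name and the /usr/lib/jvm/java-7-sun symlink are assumptions chosen to match the JAVA_HOME used later in this guide.
# unpack the JDK under /usr/lib/jvm and create a stable symlink
sudo mkdir -p /usr/lib/jvm
sudo tar -zxvf jdk-7u21-linux-i586.tar.gz -C /usr/lib/jvm
sudo ln -s /usr/lib/jvm/jdk1.7.0_21 /usr/lib/jvm/java-7-sun
# register java/javac with the alternatives system and select them as default
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/java-7-sun/bin/java 1
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/java-7-sun/bin/javac 1
sudo update-alternatives --config java
sudo update-alternatives --config javac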
To check that the JDK is installed and configured correctly, run java -version from your home directory. If you see output similar to the following, everything is fine:
hadoop@ubuntu:~$ java -version
java version "1.7.0_21"
Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
Java HotSpot(TM) Server VM (build 23.21-b01, mixed mode)
II. Install Hadoop
1. Download:
http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-0.23.6/hadoop-0.23.6.tar.gz
2. Install
tar -zxvf hadoop-0.23.6.tar.gz
sudo mv hadoop-0.23.6 /opt/
cd /opt
sudo ln -s /opt/hadoop-0.23.6 /opt/hadoop
III. Install the SSH server
sudo apt-get install openssh-server
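Before moving on it is worth confirming that the SSH daemon is actually running; on Ubuntu 12.10 the openssh-server package registers the service under the name ssh:
sudo service ssh status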
IV. Add a hadoop user
To make Hadoop easier to manage, it is best to create a dedicated user for it, for example a user named hadoop.
Run the following command:
sudo adduser hadoop
You will be prompted for a password; set one and the account is ready.
From then on, run
su hadoop
and enter the password to switch to the hadoop user.
Note:
To allow the hadoop account to use sudo, it is recommended to edit /etc/sudoers (run sudo visudo) and add
hadoop ALL=(ALL:ALL) ALL
after the existing line
%sudo ALL=(ALL:ALL) ALL
V. Configure passwordless SSH login to localhost
hadoop@ubuntu:~$ ssh-keygen -t rsa -P ""
hadoop@ubuntu:~$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Test it:
hadoop@ubuntu:~$ ssh localhost
Welcome to Ubuntu 12.10 (GNU/Linux 3.5.0-17-generic i686)
* Documentation: https://help.ubuntu.com/
340 packages can be updated.
105 updates are security updates.
Last login: Thu Apr 18 07:18:03 2013 from localhost
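If ssh localhost still asks for a password instead of showing a banner like the one above, the cause is usually file permissions: sshd ignores keys when ~/.ssh or authorized_keys is writable by others. A commonly needed fix for the layout created in the previous step:
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys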
VI. Configure Hadoop
sudo chown -R hadoop:hadoop /opt/hadoop
sudo chown -R hadoop:hadoop /opt/hadoop-0.23.6
su hadoop
1. Configure the JDK and Hadoop environment variables
Append the following to ~/.bashrc:
export JAVA_HOME=/usr/lib/jvm/java-7-sun
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_HOME=/opt/hadoop
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$HADOOP_HOME/bin:$PATH
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
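Reload the shell configuration (or open a new terminal) so these variables take effect, then make sure the hadoop command is found on the PATH; hadoop version is just a quick sanity check:
source ~/.bashrc
hadoop version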
2. Hadoop configuration files
hadoop@ubuntu:~$ cd /opt/hadoop/etc/hadoop/
hadoop@ubuntu:/opt/hadoop/etc/hadoop$ vi yarn-env.sh
Append the following configuration:
export HADOOP_PREFIX=/opt/hadoop
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
Next, edit the following configuration files, all under the /opt/hadoop/etc/hadoop/ directory.
vi core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:12200</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/hadoop-root</value>
  </property>
  <property>
    <name>fs.arionfs.impl</name>
    <value>org.apache.hadoop.fs.pvfs2.Pvfs2FileSystem</value>
    <description>The FileSystem for arionfs.</description>
  </property>
</configuration>
vi hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/data/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/data/dfs/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
</configuration>
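The name and data directories referenced above (and the hadoop.tmp.dir from core-site.xml) do not exist yet. HDFS normally creates them itself on format/startup, but creating them in advance as the hadoop user is a simple way to avoid permission problems later:
mkdir -p /opt/hadoop/data/dfs/name /opt/hadoop/data/dfs/data
mkdir -p /opt/hadoop/hadoop-root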
vi mapred-site.xml (this is a new file)
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.job.tracker</name>
    <value>hdfs://localhost:9001</value>
    <final>true</final>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1536</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024M</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>3072</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2560M</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>512</value>
  </property>
  <property>
    <name>mapreduce.task.io.sort.factor</name>
    <value>100</value>
  </property>
  <property>
    <name>mapreduce.reduce.shuffle.parallelcopies</name>
    <value>50</value>
  </property>
  <property>
    <name>mapreduce.system.dir</name>
    <value>file:/opt/hadoop/data/mapred/system</value>
  </property>
  <property>
    <name>mapreduce.local.dir</name>
    <value>file:/opt/hadoop/data/mapred/local</value>
    <final>true</final>
  </property>
</configuration>
vi yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>user.name</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:54311</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:54312</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>localhost:54313</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:54314</value>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>localhost:54315</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost</value>
  </property>
</configuration>
That completes the configuration. The localhost entries above can be replaced with the machine's IP address or hostname.
VII. Start Hadoop and run the wordcount example
1. Set JAVA_HOME
hadoop@ubuntu:/opt/hadoop$ vi libexec/hadoop-config.sh
Locate the following block:
if [[ -z $JAVA_HOME ]]; then
  # On OSX use java_home (or /Library for older versions)
  if [ "Darwin" == "$(uname -s)" ]; then
    if [ -x /usr/libexec/java_home ]; then
      export JAVA_HOME=($(/usr/libexec/java_home))
    else
      export JAVA_HOME=(/Library/Java/Home)
    fi
  fi
  # Bail if we did not detect it
  if [[ -z $JAVA_HOME ]]; then
    echo "Error: JAVA_HOME is not set and could not be found." 1>&2
    exit 1
  fi
fi
and add the following line just before it:
export JAVA_HOME=/usr/lib/jvm/java-7-sun (use the JDK path on your own machine)
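An alternative that usually works, and keeps the change inside the configuration directory rather than in libexec, is to put the same line in /opt/hadoop/etc/hadoop/hadoop-env.sh, which the start-up scripts normally source before checking JAVA_HOME; if that does not take effect on your build, fall back to the edit above:
export JAVA_HOME=/usr/lib/jvm/java-7-sun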
2. Format the namenode. Change into /opt/hadoop/
and run bin/hadoop namenode -format
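Note that in 0.23 the hadoop namenode entry point is deprecated in favour of the hdfs command; the equivalent invocation is:
bin/hdfs namenode -format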
3. Start the daemons
From /opt/hadoop/sbin run (start-yarn.sh is also needed, otherwise the ResourceManager/NodeManager processes checked in the next step will not come up):
$ ./start-dfs.sh
$ ./start-yarn.sh
4. Check that the daemons started successfully
hadoop@ubuntu:/opt/hadoop/sbin$ jps
5036 DataNode
5246 SecondaryNameNode
5543 NodeManager
5369 ResourceManager
4852 NameNode
5816 Jps
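If one of these processes is missing, check the daemon logs under /opt/hadoop/logs; the file names follow the hadoop-&lt;user&gt;-&lt;daemon&gt;-&lt;hostname&gt;.log (or yarn-...) pattern, so the name below is only an example for this machine:
ls /opt/hadoop/logs/
tail -n 50 /opt/hadoop/logs/hadoop-hadoop-namenode-ubuntu.log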
5. Run the wordcount example
1) Prepare the input data
Create a plain text file:
hadoop@ubuntu:/opt/hadoop$ cat tmp/test.txt
a c b a b d f f e b a c c d g i s a b c d e a b f g e i k m m n a b d g h i j a k j e
2) Upload it to HDFS
hadoop@ubuntu:/opt/hadoop$ hadoop fs -mkdir /test
hadoop@ubuntu:/opt/hadoop$ hadoop fs -copyFromLocal tmp/test.txt /test
hadoop@ubuntu:/opt/hadoop$ hadoop fs -ls /test
Found 1 items
-rw-r--r-- 1 hadoop supergroup 86 2013-04-18 07:47 /test/test.txt
3) Run the job
hadoop@ubuntu:/opt/hadoop$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.6.jar wordcount /test/test.txt /test/out # /test/out is the output directory
13/04/18 22:41:11 INFO input.FileInputFormat: Total input paths to process : 1
13/04/18 22:41:11 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/04/18 22:41:11 WARN snappy.LoadSnappy: Snappy native library not loaded
13/04/18 22:41:12 INFO mapreduce.JobSubmitter: number of splits:1
13/04/18 22:41:12 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar
13/04/18 22:41:12 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
13/04/18 22:41:12 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
13/04/18 22:41:12 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
13/04/18 22:41:12 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name
13/04/18 22:41:12 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
13/04/18 22:41:12 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
13/04/18 22:41:12 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
13/04/18 22:41:12 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
13/04/18 22:41:12 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
13/04/18 22:41:12 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
13/04/18 22:41:13 INFO mapred.ResourceMgrDelegate: Submitted application application_1366295287642_0001 to ResourceManager at localhost/127.0.0.1:54311
13/04/18 22:41:13 INFO mapreduce.Job: The url to track the job: http://localhost:54315/proxy/application_1366295287642_0001/
13/04/18 22:41:13 INFO mapreduce.Job: Running job: job_1366295287642_0001
13/04/18 22:41:21 INFO mapreduce.Job: Job job_1366295287642_0001 running in uber mode : false
13/04/18 22:41:21 INFO mapreduce.Job: map 0% reduce 0%
13/04/18 22:41:36 INFO mapreduce.Job: map 100% reduce 0%
13/04/18 22:41:36 INFO mapreduce.Job: Task Id : attempt_1366295287642_0001_m_000000_0, Status : FAILED
Killed by external signal
13/04/18 22:41:37 INFO mapreduce.Job: map 0% reduce 0%
13/04/18 22:42:11 INFO mapreduce.Job: map 100% reduce 0%
13/04/18 22:42:26 INFO mapreduce.Job: map 100% reduce 100%
13/04/18 22:42:26 INFO mapreduce.Job: Job job_1366295287642_0001 completed successfully
13/04/18 22:42:27 INFO mapreduce.Job: Counters: 45
4) View the results
hadoop@ubuntu:/opt/hadoop$ hadoop fs -ls /test
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2013-04-18 22:42 /test/out
-rw-r--r-- 1 hadoop supergroup 86 2013-04-18 07:47 /test/test.txt
hadoop@ubuntu:/opt/hadoop$ hadoop fs -ls /test/out
Found 2 items
-rw-r--r-- 1 hadoop supergroup 0 2013-04-18 22:42 /test/out/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 56 2013-04-18 22:42 /test/out/part-r-00000
hadoop@ubuntu:/opt/hadoop$ hadoop fs -cat /test/out/part-r-00000
13/04/18 22:45:25 INFO util.NativeCodeLoader: Loaded the native-hadoop library
a 7
b 6
c 4
d 4
e 4
f 3
g 3
h 1
i 3
j 2
k 2
m 2
n 1
s 1
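One last note that is easy to trip over when experimenting: the job will refuse to run if the output directory already exists, so delete /test/out before re-running wordcount:
hadoop fs -rm -r /test/out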