Hadoop Installation and Usage
1. Installation
1.1. Download hadoop-2.5.1.tar.gz
1.2. Extract it into the installation directory
tar -zxv -f hadoop-2.5.1.tar.gz -C ../soft/
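The extraction step can be wrapped in a small helper that fails early with a readable message. This is only a sketch: the function name `extract_hadoop` is made up for illustration, and it assumes the tarball has already been downloaded.

```shell
# extract_hadoop TARBALL DEST: unpack a gzip-compressed tarball into DEST.
# Hypothetical helper; the name and argument order are illustrative only.
extract_hadoop() {
    local tarball="$1"   # e.g. hadoop-2.5.1.tar.gz
    local dest="$2"      # e.g. ../soft
    if [ ! -f "$tarball" ]; then
        echo "tarball not found: $tarball" >&2
        return 1
    fi
    # tar -C fails if the target directory is missing, so create it first
    mkdir -p "$dest"
    tar -zxf "$tarball" -C "$dest"
}
```

Usage would look like `extract_hadoop hadoop-2.5.1.tar.gz ../soft`, which performs the same extraction as the command above but also creates `../soft` if it does not exist yet.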
1.3. Configure the Hadoop-related files
vim .bashrc
## Add the Java configuration
export JAVA_HOME=/usr/xuelu/java
export PATH=$PATH:$JAVA_HOME/bin
vim .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
# Set the Hadoop environment variable
export HADOOP_HOME=/home/xuelul/soft/hadoop251
# Set the Maven environment variable
export MAVEN_HOME=/usr/xuelul/maven
export ZOOKEEPER_HOME=/home/xuelu/soft/zoo346
PATH=$PATH:$HADOOP_HOME/bin:$MAVEN_HOME/bin:$ZOOKEEPER_HOME/bin
export PATH
Run source .bash_profile so that the changes above take effect.
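After sourcing the profile, it may be worth confirming that each variable points at a directory that actually exists. The helper below is a sketch; `check_home` is a hypothetical name, not part of Hadoop or the shell.

```shell
# check_home NAME DIR: succeed only if DIR is non-empty and is an existing directory.
# Hypothetical helper for sanity-checking the exported *_HOME variables.
check_home() {
    local name="$1" dir="$2"
    if [ -z "$dir" ]; then
        echo "$name is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "$name points to a missing directory: $dir" >&2
        return 1
    fi
    echo "$name=$dir"
}
```

A typical call would be `check_home JAVA_HOME "$JAVA_HOME" && check_home HADOOP_HOME "$HADOOP_HOME"`. The profile above uses both `xuelul` and `xuelu` in its paths, so a check like this would flag whichever of them does not exist on the machine.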
Modify the configuration files that ship with Hadoop:
etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
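Both files follow the same one-property pattern, so they could also be generated from the shell instead of edited by hand. The following is a minimal sketch; `site_xml` is a hypothetical helper, and it does not escape XML special characters in its arguments.

```shell
# site_xml NAME VALUE: print a minimal Hadoop configuration document
# containing a single property. Hypothetical helper; no XML escaping is done.
site_xml() {
    cat <<EOF
<configuration>
    <property>
        <name>$1</name>
        <value>$2</value>
    </property>
</configuration>
EOF
}
```

Run from the Hadoop installation directory, usage would be `site_xml fs.defaultFS hdfs://localhost:9000 > etc/hadoop/core-site.xml` and `site_xml dfs.replication 1 > etc/hadoop/hdfs-site.xml`. Redirecting with `>` overwrites the existing file, so any other properties already present would need to be merged by hand.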
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
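The manual `ssh localhost` check can be made non-interactive, which is convenient in a setup script. The sketch below uses the standard OpenSSH options `BatchMode` and `ConnectTimeout`; the function name `check_passwordless_ssh` is an assumption made for this example.

```shell
# check_passwordless_ssh [HOST]: succeed only if ssh can log in without prompting.
# Hypothetical helper; HOST defaults to localhost.
check_passwordless_ssh() {
    local host="${1:-localhost}"
    # BatchMode=yes makes ssh fail instead of asking for a password or passphrase;
    # ConnectTimeout bounds the wait if nothing is listening on the ssh port.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
        echo "passwordless ssh to $host works"
    else
        echo "passwordless ssh to $host failed; generate and authorize a key first" >&2
        return 1
    fi
}
```

A failure here does not always mean the key is missing: in batch mode ssh also refuses to connect when the host key for localhost has not been accepted yet. Recent OpenSSH releases disable DSA keys by default as well, so on such systems an RSA key (`-t rsa`) may be needed in place of the DSA key generated above.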
The commands for running Hadoop are as follows:
1. Format the filesystem:
$ bin/hdfs namenode -format
2. Start the NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
The Hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
3. Browse the web interface for the NameNode; by default it is available at:
NameNode - http://localhost:50070/
4. Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
5. Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
6. Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar grep input output 'dfs[a-z.]+'
7. Examine the output files. Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
or view the output files on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
8. When you're done, stop the daemons with:
$ sbin/stop-dfs.sh
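The sequence above can be collected into one script that stops at the first failing command. This is a sketch under stated assumptions: `DRY_RUN`, `HDFS_USER`, `run`, and `hdfs_example` are names invented for this example, and the script is meant to be run from the Hadoop installation directory. By default it only prints the commands.

```shell
# DRY_RUN and HDFS_USER are illustrative variables, not Hadoop settings.
DRY_RUN="${DRY_RUN:-1}"              # 1 = only print commands, 0 = execute them
HDFS_USER="${HDFS_USER:-$(id -un)}"  # stands in for <username> in the steps above

# run CMD ARGS...: print the command, and execute it unless DRY_RUN is on.
run() {
    printf '%s\n' "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# hdfs_example: the HDFS walkthrough in order; && stops at the first failure.
# Warning: formatting the NameNode discards any existing HDFS metadata.
hdfs_example() {
    run bin/hdfs namenode -format &&
    run sbin/start-dfs.sh &&
    run bin/hdfs dfs -mkdir /user &&
    run bin/hdfs dfs -mkdir "/user/$HDFS_USER" &&
    run bin/hdfs dfs -put etc/hadoop input &&
    run bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.1.jar \
        grep input output 'dfs[a-z.]+' &&
    run bin/hdfs dfs -cat 'output/*' &&
    run sbin/stop-dfs.sh
}
```

Calling `hdfs_example` prints the eight commands; setting `DRY_RUN=0` first would execute them. The glob in `'output/*'` is quoted so that HDFS expands it rather than the local shell. If a middle step fails, the daemons started earlier keep running and still have to be stopped with `sbin/stop-dfs.sh`.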
YARN on Single Node
You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and running ResourceManager daemon and NodeManager daemon in addition.
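The following instructions assume that the first four steps above (formatting the filesystem, starting the NameNode and DataNode daemons, browsing the NameNode web interface, and creating the HDFS user directories) have already been executed.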
- Configure parameters as follows:
etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
- Start ResourceManager daemon and NodeManager daemon:
$ sbin/start-yarn.sh
- Browse the web interface for the ResourceManager; by default it is available at:
- ResourceManager - http://localhost:8088/
- Run a MapReduce job.
- When you're done, stop the daemons with:
$ sbin/stop-yarn.sh
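Besides the web interfaces, the JDK's `jps` tool lists running Java processes, which gives a quick way to see whether the HDFS and YARN daemons are up. The sketch below parses that listing; `missing_daemons` is a hypothetical helper, and the list of expected daemons is an assumption based on the four named in this walkthrough.

```shell
# missing_daemons: read `jps` output on stdin and print every expected
# daemon that does not appear in it. Hypothetical helper.
missing_daemons() {
    local listing daemon
    listing=$(cat)
    for daemon in NameNode DataNode ResourceManager NodeManager; do
        # jps prints "<pid> <main class name>" per line; compare the second
        # field exactly so that SecondaryNameNode does not count as NameNode
        if ! printf '%s\n' "$listing" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Usage would be `jps | missing_daemons`, run after `sbin/start-dfs.sh` and `sbin/start-yarn.sh`. Empty output means all four daemons were found; any name printed points to a daemon whose log under `$HADOOP_HOME/logs` is worth checking.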