Hadoop day 1 - Setup ENV

Environment:

Xshell: 5

Xftp: 4

Virtual Box: 5.16

Linux: CentOS-7-x86_64-Minimal-1511

Vim: yum -y install vim-enhanced

JDK: 8

Hadoop: 2.7.3.tar.gz


After installing Linux in VirtualBox, configure the network interface to come up automatically at boot.

List the machine's network interfaces:

nmcli d

The output shows one interface: enp0s3


Open the interface's configuration file with vi:

vi /etc/sysconfig/network-scripts/ifcfg-enp0s3

Change the last line: ONBOOT=no -> ONBOOT=yes
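
The same edit can be scripted instead of made by hand in vi. The following is a minimal sketch, assuming the file already exists and is writable; the function name enable_onboot is only illustrative and not part of CentOS.

```shell
# Set ONBOOT=yes in an ifcfg file; append the line if the key is missing.
# enable_onboot is a hypothetical helper name used for this sketch.
enable_onboot() {
    cfg="$1"
    if grep -q '^ONBOOT=' "$cfg"; then
        # Rewrite through a temporary file, then copy back so the original
        # file keeps its owner and permissions.
        sed 's/^ONBOOT=.*/ONBOOT=yes/' "$cfg" > "$cfg.tmp" &&
            cat "$cfg.tmp" > "$cfg" &&
            rm -f "$cfg.tmp"
    else
        echo 'ONBOOT=yes' >> "$cfg"
    fi
}

# Example (run as root on the VM):
# enable_onboot /etc/sysconfig/network-scripts/ifcfg-enp0s3
# systemctl restart network
```

Restarting the network service (or rebooting) is needed before the change takes effect.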


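For reference, the common fields in an ifcfg file are:

DEVICE=eth0

Device name of the interface; for example, in the file ifcfg-eth0 it is eth0.

BOOTPROTO=static

How the interface obtains its IP address. Possible values are static, dhcp, and bootp: a statically assigned address, an address obtained via DHCP, or an address obtained via BOOTP.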

BROADCAST=192.168.1.255

Broadcast address of the subnet.

HWADDR=00:07:E9:05:E8:B4

Hardware (MAC) address of the interface.

IPADDR=192.168.1.2

IP address of the interface; used when BOOTPROTO is static.

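IPV6INIT=no

Enables or disables IPv6: no to disable, yes to enable.

IPV6_AUTOCONF=no

Enables or disables IPv6 autoconfiguration: no to disable, yes to enable.

NETMASK=255.255.255.0

Netmask of the interface.

NETWORK=192.168.1.0

Network address of the interface.

ONBOOT=yes

Whether to bring up this interface at boot; with yes, the device is activated when the system starts.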
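
NETWORK and BROADCAST are not independent settings: both follow from IPADDR and NETMASK. As a rough sketch of that arithmetic, the helper below (net_fields is a made-up name, and the address 192.168.1.2 with mask 255.255.255.0 is only an illustrative input) derives the two values for an IPv4 address.

```shell
# Derive NETWORK and BROADCAST from an IPv4 address and netmask.
# net_fields is a hypothetical helper name used for this sketch.
net_fields() {
    # Split both dotted quads on "." into positional parameters:
    # $1..$4 become the address octets, $5..$8 the mask octets.
    old_ifs=$IFS
    IFS=.
    set -- $1 $2
    IFS=$old_ifs
    # Network address: address AND mask, octet by octet.
    echo "NETWORK=$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
    # Broadcast address: address OR inverted mask (255 - m inverts an octet).
    echo "BROADCAST=$(( $1 | (255 - $5) )).$(( $2 | (255 - $6) )).$(( $3 | (255 - $7) )).$(( $4 | (255 - $8) ))"
}

net_fields 192.168.1.2 255.255.255.0
```

If the values in an ifcfg file disagree with what this arithmetic gives, one of the fields is probably mistyped.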


Install Hadoop

[root@centosmaster opt]# tar zxf hadoop-2.7.3.tar.gz
[root@centosmaster opt]# cd hadoop-2.7.3
[root@centosmaster opt]# cd /opt/hadoop-2.7.3/etc/hadoop

core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://centosmaster:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop-2.7.3/current/tmp</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>4320</value>
  </property>
</configuration>
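
After editing a site file it is easy to read a value back and confirm what was saved. The sketch below assumes each <name> and <value> sits on its own line, as in the listings here; get_prop is a hypothetical helper, not a Hadoop command.

```shell
# Print the <value> that follows a given <name> in a Hadoop *-site.xml file.
# Assumes <name> and <value> are each on their own line.
# get_prop is a hypothetical helper name used for this sketch.
get_prop() {
    awk -v n="$2" '
        # Remember that the wanted <name> line was seen.
        index($0, "<name>" n "</name>") { found = 1; next }
        # The first <value> line after it holds the setting.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Example:
# get_prop /opt/hadoop-2.7.3/etc/hadoop/core-site.xml fs.defaultFS
```

On a working installation, hdfs getconf -confKey fs.defaultFS reports the value Hadoop actually resolved, which is the more reliable check once the hdfs command runs.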

hdfs-site.xml

<configuration>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/opt/hadoop-2.7.3/current/dfs/name</value>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/opt/hadoop-2.7.3/current/data</value>
	</property>
	<!-- replication factor -->
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
	<property>
		<name>dfs.webhdfs.enabled</name>
		<value>true</value>
	</property>
	<property>
		<name>dfs.permissions.superusergroup</name>
		<value>staff</value>
	</property>
	<property>
		<name>dfs.permissions.enabled</name>
		<value>false</value>
	</property>
</configuration>

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
  <property>
     <name>yarn.resourcemanager.hostname</name>
     <value>centosmaster</value>
   </property>
  <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
   </property>
  <property>
     <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
     <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
  <property>
     <name>yarn.resourcemanager.address</name>
     <value>centosmaster:18040</value>
   </property>
  <property>
     <name>yarn.resourcemanager.scheduler.address</name>
     <value>centosmaster:18030</value>
   </property>
  <property>
     <name>yarn.resourcemanager.resource-tracker.address</name>
     <value>centosmaster:18025</value>
   </property>
  <property>
     <name>yarn.resourcemanager.admin.address</name>
     <value>centosmaster:18141</value>
   </property>
  <property>
     <name>yarn.resourcemanager.webapp.address</name>
     <value>centosmaster:18088</value>
   </property>
  <property>
     <name>yarn.log-aggregation-enable</name>
     <value>true</value>
   </property>
  <property>
     <name>yarn.log-aggregation.retain-seconds</name>
     <value>86400</value>
   </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
     <value>86400</value>
   </property>
  <property>
     <name>yarn.nodemanager.remote-app-log-dir</name>
     <value>/tmp/logs</value>
   </property>
  <property>
     <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
     <value>logs</value>
   </property>
</configuration>

mapred-site.xml

<configuration>
  <property>
     <name>mapreduce.framework.name</name>
     <value>yarn</value>
   </property>
  <property>
     <name>mapreduce.jobtracker.http.address</name>
     <value>centosmaster:50030</value>
   </property>
  <property>
     <name>mapreduce.jobhistory.address</name>
     <value>centosmaster:10020</value>
   </property>
  <property>
     <name>mapreduce.jobhistory.webapp.address</name>
     <value>centosmaster:19888</value>
   </property>
  <property>
     <name>mapreduce.jobhistory.done-dir</name>
     <value>/jobhistory/done</value>
  </property>
  <property>
     <name>mapreduce.jobhistory.intermediate-done-dir</name>
     <value>/jobhistory/one_intermediate</value>
   </property>
  <property>
     <name>mapreduce.job.ubertask.enable</name>
     <value>true</value>
   </property>
</configuration>
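
The Hadoop 2.7.3 tarball ships mapred-site.xml.template rather than mapred-site.xml, so the file may need to be created before the settings above can go into it. A small sketch of that step follows; ensure_mapred_site is a made-up helper name.

```shell
# Create mapred-site.xml from the bundled template if it does not exist yet.
# An existing mapred-site.xml is left untouched.
# ensure_mapred_site is a hypothetical helper name used for this sketch.
ensure_mapred_site() {
    conf_dir="$1"
    if [ ! -f "$conf_dir/mapred-site.xml" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}

# Example:
# ensure_mapred_site /opt/hadoop-2.7.3/etc/hadoop
```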

Add this machine's hostname to the slaves file so that it also runs as a slave:

centosmaster
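
The name in the slaves file has to resolve to an address. One way to arrange that on a single VM is an /etc/hosts entry. The sketch below assumes the address 192.168.0.105 that appears in the format log further down; add_host_entry is a hypothetical helper name.

```shell
# Append "ip name" to a hosts file unless the name is already listed.
# add_host_entry is a hypothetical helper name used for this sketch.
add_host_entry() {
    hosts_file="$1"
    ip="$2"
    name="$3"
    # Look for the name in any field after the address column.
    if ! awk -v n="$name" '
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$hosts_file"; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Example (run as root; the address is an assumption taken from the log):
# add_host_entry /etc/hosts 192.168.0.105 centosmaster
```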


Point Hadoop at the JDK:

vim hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_111/
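
Before going further, it can save time to confirm that the JAVA_HOME path really contains an executable java binary. This is a minimal sketch; check_java_home is a hypothetical helper name.

```shell
# Verify that <JAVA_HOME>/bin/java exists and is executable.
# check_java_home is a hypothetical helper name used for this sketch.
check_java_home() {
    # Drop a trailing slash, as in /usr/java/jdk1.8.0_111/
    home="${1%/}"
    if [ ! -x "$home/bin/java" ]; then
        echo "not executable: $home/bin/java" >&2
        return 1
    fi
    echo "ok: $home/bin/java"
}

# Example:
# check_java_home /usr/java/jdk1.8.0_111/
```

A failure here usually means either a wrong path or a missing execute bit on bin/java.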

Format the HDFS file system:

[root@centosmaster ~]# hdfs namenode -format
************************************************************/
16/10/23 08:58:31 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
16/10/23 08:58:31 INFO namenode.NameNode: createNameNode [-format]
16/10/23 08:58:31 WARN common.Util: Path /opt/hadoop-2.7.3/current/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
16/10/23 08:58:31 WARN common.Util: Path /opt/hadoop-2.7.3/current/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-1294bdbb-d45c-49f3-b5c5-3d26934e084f
16/10/23 08:58:32 INFO namenode.FSNamesystem: No KeyProvider found.
16/10/23 08:58:32 INFO namenode.FSNamesystem: fsLock is fair:true
16/10/23 08:58:32 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
16/10/23 08:58:32 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
16/10/23 08:58:32 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
16/10/23 08:58:32 INFO blockmanagement.BlockManager: The block deletion will start around 2016 Oct 23 08:58:32
16/10/23 08:58:32 INFO util.GSet: Computing capacity for map BlocksMap
16/10/23 08:58:32 INFO util.GSet: VM type       = 64-bit
16/10/23 08:58:32 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
16/10/23 08:58:32 INFO util.GSet: capacity      = 2^21 = 2097152 entries
16/10/23 08:58:32 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
16/10/23 08:58:32 INFO blockmanagement.BlockManager: defaultReplication         = 1
16/10/23 08:58:32 INFO blockmanagement.BlockManager: maxReplication             = 512
16/10/23 08:58:32 INFO blockmanagement.BlockManager: minReplication             = 1
16/10/23 08:58:32 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
16/10/23 08:58:32 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
16/10/23 08:58:32 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
16/10/23 08:58:32 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
16/10/23 08:58:32 INFO namenode.FSNamesystem: fsOwner             = root (auth:SIMPLE)
16/10/23 08:58:32 INFO namenode.FSNamesystem: supergroup          = staff
16/10/23 08:58:32 INFO namenode.FSNamesystem: isPermissionEnabled = false
16/10/23 08:58:32 INFO namenode.FSNamesystem: HA Enabled: false
16/10/23 08:58:32 INFO namenode.FSNamesystem: Append Enabled: true
16/10/23 08:58:32 INFO util.GSet: Computing capacity for map INodeMap
16/10/23 08:58:32 INFO util.GSet: VM type       = 64-bit
16/10/23 08:58:32 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
16/10/23 08:58:32 INFO util.GSet: capacity      = 2^20 = 1048576 entries
16/10/23 08:58:32 INFO namenode.FSDirectory: ACLs enabled? false
16/10/23 08:58:32 INFO namenode.FSDirectory: XAttrs enabled? true
16/10/23 08:58:32 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
16/10/23 08:58:32 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/10/23 08:58:32 INFO util.GSet: Computing capacity for map cachedBlocks
16/10/23 08:58:32 INFO util.GSet: VM type       = 64-bit
16/10/23 08:58:32 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
16/10/23 08:58:32 INFO util.GSet: capacity      = 2^18 = 262144 entries
16/10/23 08:58:32 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
16/10/23 08:58:32 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
16/10/23 08:58:32 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
16/10/23 08:58:32 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
16/10/23 08:58:32 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
16/10/23 08:58:32 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
16/10/23 08:58:32 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
16/10/23 08:58:32 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
16/10/23 08:58:32 INFO util.GSet: Computing capacity for map NameNodeRetryCache
16/10/23 08:58:32 INFO util.GSet: VM type       = 64-bit
16/10/23 08:58:32 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
16/10/23 08:58:32 INFO util.GSet: capacity      = 2^15 = 32768 entries
16/10/23 08:58:32 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1532573559-192.168.0.105-1477184312651
16/10/23 08:58:32 INFO common.Storage: Storage directory /opt/hadoop-2.7.3/current/dfs/name has been successfully formatted.
16/10/23 08:58:32 INFO namenode.FSImageFormatProtobuf: Saving image file /opt/hadoop-2.7.3/current/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
16/10/23 08:58:32 INFO namenode.FSImageFormatProtobuf: Image file /opt/hadoop-2.7.3/current/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 346 bytes saved in 0 seconds.
16/10/23 08:58:32 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
16/10/23 08:58:32 INFO util.ExitUtil: Exiting with status 0
16/10/23 08:58:32 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at CentOS_105/192.168.0.105

************************************************************/

The log output shows that the format succeeded:

INFO common.Storage: Storage directory /opt/hadoop-2.7.3/current/dfs/name has been successfully formatted.
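
Because the success message is buried in a long log, a quick text check is handy. The sketch below assumes the command output was saved to a file (format.log is an arbitrary name); format_succeeded is a hypothetical helper that reads the log on standard input.

```shell
# Succeed if the text on stdin contains the NameNode format success message.
# format_succeeded is a hypothetical helper name used for this sketch.
format_succeeded() {
    grep -q 'has been successfully formatted'
}

# Example, assuming the output was saved to format.log:
# format_succeeded < format.log && echo "format OK"
```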

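The log also contains a warning about the HDFS path: it should be given as a URI. Fix it in hdfs-site.xml by replacing the plain path with a file:// URI: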

<property>
  <name>dfs.namenode.name.dir</name>
  <!-- was: <value>/opt/hadoop-2.7.3/current/dfs/name</value> -->
  <value>file:///opt/hadoop-2.7.3/current/dfs/name</value>
</property>


Format again:

hdfs namenode -format

Check the hostname:

hostnamectl

Change the hostname:

[root@centosmaster ~]# hostnamectl set-hostname "centosmaster"


Start Hadoop:

[root@centosmaster hadoop-2.7.3]# sbin/start-all.sh 
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /opt/hadoop-2.7.3/logs/hadoop-root-namenode-centosmaster.out
centosmaster: starting datanode, logging to /opt/hadoop-2.7.3/logs/hadoop-root-datanode-centosmaster.out
Starting secondary namenodes [Centosmaster]
Centosmaster: starting secondarynamenode, logging to /opt/hadoop-2.7.3/logs/hadoop-root-secondarynamenode-centosmaster.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.7.3/logs/yarn-root-resourcemanager-centosmaster.out
centosmaster: starting nodemanager, logging to /opt/hadoop-2.7.3/logs/yarn-root-nodemanager-centosmaster.out
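
The first line of the output says start-all.sh is deprecated in favor of start-dfs.sh and start-yarn.sh. A minimal sketch of the replacement follows; start_hadoop is a hypothetical wrapper name, and the install path in the example is the one used in this post.

```shell
# Start HDFS first, then YARN, using the two scripts that replace start-all.sh.
# YARN is started only if the HDFS script exits successfully.
# start_hadoop is a hypothetical wrapper name used for this sketch.
start_hadoop() {
    hadoop_home="$1"
    "$hadoop_home/sbin/start-dfs.sh" && "$hadoop_home/sbin/start-yarn.sh"
}

# Example:
# start_hadoop /opt/hadoop-2.7.3
```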

Use jps to see which daemons have started:

[root@centosmaster hadoop]# jps
2546 NodeManager
3090 SecondaryNameNode
3348 Jps
2201 DataNode
2109 NameNode
2447 ResourceManager
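
All five daemons listed above should be present on a single-node setup like this one. As a rough sketch, the check can be automated by comparing the jps output against that list; missing_daemons is a hypothetical helper name.

```shell
# Print the name of every expected Hadoop daemon that is absent from jps output.
# The jps output is passed as the first argument; no output means all are running.
# missing_daemons is a hypothetical helper name used for this sketch.
missing_daemons() {
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare the second column exactly, so that NameNode does not
        # match the SecondaryNameNode line.
        if ! printf '%s\n' "$1" |
                awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}

# Example:
# missing_daemons "$(jps)"
```

If a daemon is reported missing, its log file under /opt/hadoop-2.7.3/logs is the first place to look.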

Stop Hadoop:

sbin/stop-all.sh


Verify:

[Screenshot]


Problem 1 - Permissions

[root@CentOS_105 jdk1.8.0_111]# java -version
bash: /usr/java/jdk1.8.0_111//bin/java: Permission denied

Fix: chmod 777 /usr/java/jdk1.8.0_111/bin/java (755 is enough to restore the execute permission)


Problem 2 - Configuration

[root@centos_1 hadoop-2.7.3]# sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []

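Fix: add the following to etc/hadoop/core-site.xml (fs.default.name is the deprecated name of fs.defaultFS; Hadoop 2.7.3 accepts both):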

<property>
  <name>fs.default.name</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>


Problem 3 - Hostname

Does not contain a valid host:port authority:

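Cause: Hadoop fails to parse host:port values in its XML configuration when the hostname contains certain special characters.

Fix: the hostname in use is invalid. Change it to a name without illegal characters such as '/' or '_'; a plain name like centosmaster, without '.', is the simplest choice.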
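
A candidate hostname can be checked before it is applied. The sketch below accepts a single label made of letters, digits, and hyphens that neither starts nor ends with a hyphen and is at most 63 characters long; it deliberately rejects dotted names to stay with the plain names used in this post. is_valid_hostname is a hypothetical helper name.

```shell
# Succeed if the argument is a plain single-label hostname:
# letters, digits and hyphens only, no leading or trailing hyphen, 1 to 63 chars.
# is_valid_hostname is a hypothetical helper name used for this sketch.
is_valid_hostname() {
    case "$1" in
        # Reject empty names, leading/trailing hyphens, and any other character.
        '' | -* | *- | *[!A-Za-z0-9-]*) return 1 ;;
    esac
    [ "${#1}" -le 63 ]
}

# Example:
# is_valid_hostname CentOS_105 || echo "CentOS_105 is not a valid hostname"
# is_valid_hostname centosmaster && hostnamectl set-hostname centosmaster
```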


See also

Network interface configuration: http://www.krizna.com/centos/setup-network-centos-7/

JDK installation guide: http://www.cnblogs.com/wangfajun/p/5257899.html


This article comes from the "lybing" blog; please keep this attribution: http://lybing.blog.51cto.com/3286625/1894409
