
CDH 5.3 Cluster Installation Notes: Environment Preparation (1)

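Hadoop is a complex combination of systems, and building a production-ready Hadoop environment is a tedious job. Fortunately, there are always capable people who end up solving painful problems like this for you; if they haven't yet, it is only a matter of time. CDH is Cloudera's Hadoop distribution; for background on CDH, please see www.cloudera.com, so I won't go into it here. This series mainly covers using CDH 5.3 to install a Hadoop environment that can be used in production. Cloudera has solved the Hadoop installation problem for you, but a new one comes with it: installing Cloudera Manager is no simpler than installing Hadoop, and it has plenty of pitfalls. We will step through them one by one in the following articles.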

Part 1: Environment Preparation

I. Server preparation:

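We will set up a small 12-node cluster, with Red Hat 6.4 Server x64 installed on every server. The hostnames follow the pattern server[1-12].cdhwork.org, and the internal IP addresses are 192.168.10.[1-12]. Every server must have a DNS server configured (202.96.209.5 or 8.8.8.8 will do), and the root password must be the same on all servers.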

Server role assignment table

Server (cdhwork.org)  IP address      Installed roles
server1               192.168.10.1    CDH local mirror, Cloudera Manager, time server
server2               192.168.10.2    Cloudera Management Service Host Monitor
                                      Cloudera Management Service Service Monitor
server3               192.168.10.3    HDFS NameNode
                                      Hive Gateway
                                      Impala Catalog Server
                                      Cloudera Management Service Alert Publisher
                                      Spark Gateway
                                      ZooKeeper Server
server4               192.168.10.4    HDFS SecondaryNameNode
                                      Hive Gateway
                                      Impala StateStore
                                      Solr Server
                                      Spark Gateway
                                      YARN (MR2 Included) ResourceManager
                                      ZooKeeper Server
server5               192.168.10.5    HDFS Balancer
                                      Hive Gateway
                                      Hue Server
                                      Cloudera Management Service Activity Monitor
                                      Oozie Server
                                      Spark Gateway
                                      Sqoop 2 Server
                                      ZooKeeper Server
server6               192.168.10.6    HBase Master
                                      Hive Gateway
                                      MapReduce JobTracker
                                      Solr Server
                                      Spark Gateway
                                      YARN (MR2 Included) JobHistory Server
                                      ZooKeeper Server
server7               192.168.10.7    HBase REST Server
                                      HBase Thrift Server
                                      Hive Metastore Server
                                      HiveServer2
                                      Key-Value Store Indexer Lily HBase Indexer
                                      Cloudera Management Service Event Server
                                      Spark History Server
server8               192.168.10.8    HBase RegionServer
                                      HDFS DataNode
                                      Impala Daemon
                                      MapReduce TaskTracker
                                      YARN (MR2 Included) NodeManager
server9               192.168.10.9    HBase RegionServer
                                      HDFS DataNode
                                      Impala Daemon
                                      MapReduce TaskTracker
                                      YARN (MR2 Included) NodeManager
server10              192.168.10.10   HBase RegionServer
                                      HDFS DataNode
                                      Impala Daemon
                                      MapReduce TaskTracker
                                      YARN (MR2 Included) NodeManager
server11              192.168.10.11   HBase RegionServer
                                      HDFS DataNode
                                      Impala Daemon
                                      MapReduce TaskTracker
                                      YARN (MR2 Included) NodeManager
server12              192.168.10.12   HBase RegionServer
                                      HDFS DataNode
                                      Impala Daemon
                                      MapReduce TaskTracker
                                      YARN (MR2 Included) NodeManager

Run the following steps as root, identically on every server.

1. Disable the firewall

/etc/init.d/iptables stop #stop the firewall now
chkconfig iptables off    #keep the firewall service disabled at boot
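
To confirm that the change sticks, the boot-time setting can be checked afterwards. The following is only a minimal sketch: the helper name all_runlevels_off is my own, and it assumes the usual `chkconfig --list` output in which each runlevel is shown as N:on or N:off.

```shell
# Succeeds when a `chkconfig --list <service>` line read from stdin shows
# no runlevel set to "on". Empty input (service not registered) also passes.
all_runlevels_off() {
    ! grep -q ':on'
}

# Usage as root on each server:
#   service iptables status
#   chkconfig --list iptables | all_runlevels_off && echo "iptables is off at boot"
```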

2. Disable SELinux
Run on the command line:
setenforce 0
Edit the configuration file so the setting persists after a reboot:

vi /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=disabled
Change the line to SELINUX=disabled, then save and exit.
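
With 12 servers, editing the file by hand in vi gets repetitive. As a rough sketch, the same edit can be made non-interactively; the function name set_selinux_disabled is made up for illustration, and `sed -i` assumes GNU sed, which is what Red Hat ships.

```shell
# Rewrite the SELINUX= line in the given config file (default:
# /etc/selinux/config). SELINUXTYPE= and comment lines are not matched.
set_selinux_disabled() {
    config_file="${1:-/etc/selinux/config}"
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config_file"
}

# Usage as root:
#   set_selinux_disabled
#   getenforce    # prints the mode currently in effect
```

Note that setenforce 0 only switches the running system to permissive mode; the disabled setting from the file takes effect at the next boot.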

3. Reduce swapping (vm.swappiness)

Run:
sysctl vm.swappiness=0

Edit the configuration file so the setting persists after a reboot:

vi /etc/sysctl.conf

# Controls the maximum shared segment size, in bytes
kernel.shmmax = 68719476736

# Controls the maximum number of shared memory segments, in pages
kernel.shmall = 4294967296

vm.swappiness = 0
Add the line vm.swappiness = 0, then save and exit.
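
If you would rather script this than edit the file on every server, something like the following could work. It is a sketch only: persist_sysctl is a hypothetical helper, it assumes GNU sed for `sed -i`, and it replaces an existing entry for the key instead of appending a duplicate.

```shell
# Set "key = value" in a sysctl config file (default: /etc/sysctl.conf),
# replacing an existing entry for the key or appending a new one.
persist_sysctl() {
    key="$1"; value="$2"; conf="${3:-/etc/sysctl.conf}"
    # Escape dots so the key is matched literally in the regular expression.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$conf"; then
        sed -i "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$conf"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}

# Usage as root:
#   persist_sysctl vm.swappiness 0
#   sysctl -p     # reload /etc/sysctl.conf without rebooting
```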

4. Disable Red Hat transparent hugepage defragmentation

Run:
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
Edit the configuration file so the setting persists after a reboot:
vi /etc/rc.local

#!/bin/sh
#
# This script will be executed *after* all the other init scripts.
# You can put your own initialization stuff in here if you don't
# want to do the full Sys V style init stuff.
echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag

touch /var/lock/subsys/local
Add the line echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag, then save and exit.
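
To check that the setting took effect, you can read the file back. As far as I know, the kernel lists all possible values and marks the active one in square brackets; the small helper below (active_thp_setting, a name I made up) relies on that assumption to pull out the active value.

```shell
# Print the active setting, i.e. the word shown in [brackets] on stdin.
active_thp_setting() {
    sed -n 's/.*\[\([a-z+]*\)\].*/\1/p'
}

# Usage:
#   active_thp_setting < /sys/kernel/mm/redhat_transparent_hugepage/defrag
```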

5. Edit the hosts file (/etc/hosts)

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

# CDH local mirror
192.168.10.1	archive.cloudera.com

# ClouderaManager
192.168.10.1 	server1.cdhwork.org

# Cloudera Management Service Host Monitor,Cloudera Management Service Service Monitor
192.168.10.2	server2.cdhwork.org

# HDFS NameNode,Hive Gateway,Impala Catalog Server,Cloudera Management Service Alert Publisher,Spark Gateway,ZooKeeper Server
192.168.10.3	server3.cdhwork.org

# HDFS SecondaryNameNode,Hive Gateway,Impala StateStore,Solr Server,Spark Gateway,YARN (MR2 Included) ResourceManager,ZooKeeper Server
192.168.10.4	server4.cdhwork.org

# HDFS Balancer,Hive Gateway,Hue Server,Cloudera Management Service Activity Monitor,Oozie Server,Spark Gateway,Sqoop 2 Server,ZooKeeper Server
192.168.10.5	server5.cdhwork.org

# HBase Master,Hive Gateway,MapReduce JobTracker,Solr Server,Spark Gateway,YARN (MR2 Included) JobHistory Server,ZooKeeper Server,Postgresql-9.2
192.168.10.6	server6.cdhwork.org

# HBase REST Server,HBase Thrift Server,Hive Metastore Server,HiveServer2,Key-Value Store Indexer Lily HBase Indexer,Cloudera Management Service Event Server,Spark History Server
192.168.10.7	server7.cdhwork.org

# HBase RegionServer,HDFS DataNode,Impala Daemon,MapReduce TaskTracker,YARN (MR2 Included) NodeManager
192.168.10.8	server8.cdhwork.org
192.168.10.9	server9.cdhwork.org
192.168.10.10	server10.cdhwork.org
192.168.10.11	server11.cdhwork.org
192.168.10.12	server12.cdhwork.org
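
Because the hostnames and addresses follow a regular pattern, the server entries do not have to be typed by hand. Here is a minimal sketch that prints them; gen_cluster_hosts is a hypothetical helper, the role comments are left out, and the archive.cloudera.com line still has to be added separately.

```shell
# Print one hosts-file entry per cluster node:
# 192.168.10.N<TAB>serverN.cdhwork.org for N = 1..12.
gen_cluster_hosts() {
    i=1
    while [ "$i" -le 12 ]; do
        printf '192.168.10.%d\tserver%d.cdhwork.org\n' "$i" "$i"
        i=$((i + 1))
    done
}

# Usage as root:
#   gen_cluster_hosts >> /etc/hosts
#   getent hosts server3.cdhwork.org    # check that the name now resolves
```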

6. Configure the yum repositories

cd /etc/yum.repos.d/
mv rhel-source.repo rhel-source.repo.bak
vi rhel-source.repo

[base]
name=CentOS-6.6 - Base
baseurl=http://mirrors.163.com/centos/6.6/os/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
exclude=postgresql*

#released updates
[updates]
name=CentOS-$releasever - Updates
baseurl=http://mirrors.163.com/centos/6.6/updates/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
exclude=postgresql*

#packages used/produced in the build but not released
#[addons]
#name=CentOS-$releasever - Addons
#baseurl=http://mirrors.163.com/centos/6.6/addons/x86_64/
#gpgcheck=1
#gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras
baseurl=http://mirrors.163.com/centos/6.6/extras/x86_64/
gpgcheck=1
gpgkey=http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6
#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus
baseurl=http://mirrors.163.com/centos/6.6/centosplus/x86_64/
gpgcheck=1
enabled=0

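Save the changes and exit. Changing the yum repositories makes the later installation steps faster. This assumes that all of your servers can reach the Internet; if they cannot, either set up a proxy server for outbound access or build your own yum mirror to serve the packages. I recommend building your own yum mirror: it is convenient and saves trouble, and the only drawback is that it takes up some disk space.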
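
A quick sanity check on the repo file is to list which sections are actually active. The sketch below uses a hypothetical helper, enabled_repos, and treats a section as enabled unless it contains enabled=0, which is yum's default behavior; commented-out sections such as #[addons] are ignored.

```shell
# Print the ids of the repositories in a .repo file that are not disabled.
enabled_repos() {
    awk '
        function emit() { if (id != "" && enabled) print id }
        /^\[/ { emit(); id = substr($0, 2, index($0, "]") - 2); enabled = 1; next }
        /^[ \t]*enabled[ \t]*=[ \t]*0/ { enabled = 0 }
        END { emit() }
    ' "$1"
}

# Usage as root:
#   enabled_repos /etc/yum.repos.d/rhel-source.repo
#   yum clean all && yum makecache    # drop old metadata and fetch the new
```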

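To build a mirror site, you can copy everything under the target site with wget -r <target> and serve it over the web with httpd. If the copied site is in /usr/site, you can simply create a symbolic link under /var/www/html/: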

ln -s /usr/site /var/www/html/site

You can then access the mirror site at http://ip/site. Linux really does have plenty of very practical tools, and wget and ln are two of them.
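
Put together, the mirror setup might look roughly like this. Treat it as a sketch under assumptions: mirror_tree and use_local_mirror are names I invented, hosting the mirror on server1 (192.168.10.1) is only an example, and `sed -i` assumes GNU sed. The wget options used are -np (do not climb to parent directories), -nH (do not create a directory named after the host) and -P (download into the given directory).

```shell
# Copy a remote directory tree into dest.
mirror_tree() {
    url="$1"; dest="$2"
    wget -r -np -nH -P "$dest" "$url"
}

# Rewrite every mirrors.163.com URL in a .repo file to the local mirror.
use_local_mirror() {
    repo_file="$1"; mirror_base="$2"
    sed -i "s|http://mirrors.163.com|${mirror_base}|g" "$repo_file"
}

# Usage (hypothetical: server1 serves /usr/site as http://192.168.10.1/site):
#   mirror_tree http://mirrors.163.com/centos/6.6/os/x86_64/ /usr/site
#   service httpd start && chkconfig httpd on
#   use_local_mirror /etc/yum.repos.d/rhel-source.repo http://192.168.10.1/site
```

The gpgkey URLs are rewritten along with the baseurl lines, so the key file has to be copied to the mirror as well.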

7. Update the servers to the latest packages

yum update

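The point of this is to keep the later Cloudera installation as error-free as possible: Cloudera often reports, for no obvious reason, that a dependent rpm package is missing or that its version is too old. At first I had no idea how to fix this; eventually I bit the bullet and did a full system update, and that solved it. I still don't know why, but the problem went away, so take the trouble to run the update. With a fast network it doesn't take long.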
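
When running the update on many servers, it helps to see whether a machine still has pending updates. yum check-update exits with 0 when nothing is pending, 100 when updates are available, and another code on error; the helper below (describe_check_update, a made-up name) just turns that exit code into a message.

```shell
# Translate the exit code of `yum check-update` into a readable message.
describe_check_update() {
    case "$1" in
        0)   echo "up to date" ;;
        100) echo "updates available" ;;
        *)   echo "yum reported an error" ;;
    esac
}

# Usage as root:
#   yum -q check-update >/dev/null 2>&1; describe_check_update $?
#   yum -y update    # -y answers the confirmation prompt automatically
```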
