Hadoop+Spark: Building a Cluster Environment
- Environment preparation:
Under virtual machines, set up three Linux Ubuntu 14.04 Server x64 systems (download: http://releases.ubuntu.com/14.04.2/ubuntu-14.04.2-server-amd64.iso):
192.168.1.200 master
192.168.1.201 node1
192.168.1.202 node2
- Set up the Hadoop environment on the cluster first:
For details, see my article "Hadoop: Building a Hadoop Cluster".
Building the Spark cluster environment:
The Hadoop cluster uses Hadoop 2.6.4.
Spark here is version 1.6.2 (spark-1.6.2-bin-hadoop2.6.tgz).
1. Download the installation package onto the master server:
Download online:
hadoop@master:~$ wget http://mirror.bit.edu.cn/apache/spark/spark-1.6.2/spark-1.6.2-bin-hadoop2.6.tgz
Or upload the package to the cluster offline.
2. Extract the Spark package to /usr/local/spark on the master server and assign ownership:
# Extract to /usr/local/
hadoop@master:~$ sudo tar -zxvf spark-1.6.2-bin-hadoop2.6.tgz -C /usr/local/
hadoop@master:~$ cd /usr/local
hadoop@master:/usr/local$ ls
bin  etc  games  hadoop  include  lib  man  sbin  share  spark-1.6.2-bin-hadoop2.6  src
# Rename to spark
hadoop@master:/usr/local$ sudo mv spark-1.6.2-bin-hadoop2.6/ spark/
hadoop@master:/usr/local$ ls
bin  etc  games  hadoop  include  lib  man  sbin  share  spark  src
# Assign ownership
hadoop@master:/usr/local$ sudo chown -R hadoop:hadoop spark
3. Add the Spark environment variables to /etc/profile on the master server:
Edit /etc/profile:
sudo vim /etc/profile
Append the $SPARK_HOME variable at the end; after the change, the tail of my /etc/profile looks like this:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JRE_HOME=/usr/lib/jvm/java-8-oracle
export SCALA_HOME=/opt/scala/scala-2.10.5
# add hadoop bin/ directory to PATH
export HADOOP_HOME=/usr/local/hadoop
export SPARK_HOME=/usr/local/spark
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
Apply the changes:
source /etc/profile
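Order matters in the PATH line above: entries listed first win the lookup. A minimal, self-contained sketch (using only the /usr/local/spark path from this tutorial) of what prepending $SPARK_HOME/bin does:

```shell
# Prepending $SPARK_HOME/bin makes its tools shadow any older copies on PATH
SPARK_HOME=/usr/local/spark
PATH="$SPARK_HOME/bin:$PATH"
# the first PATH entry is now the Spark bin directory
echo "$PATH" | cut -d: -f1    # prints /usr/local/spark/bin
```

After sourcing /etc/profile, `which spark-submit` should therefore resolve to /usr/local/spark/bin/spark-submit.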
- Configure Spark on the master:
1. Configure the spark-env.sh file on the master server:
sudo vim /usr/local/spark/conf/spark-env.sh
Note: by default there is no spark-env.sh or slaves file, only *.template files; rename them first:
hadoop@master:/usr/local/spark/conf$ ls
docker.properties.template   metrics.properties.template  spark-env.sh
fairscheduler.xml.template   slaves.template
log4j.properties.template    spark-defaults.conf.template
hadoop@master:/usr/local/spark/conf$ sudo vim spark-env.sh
hadoop@master:/usr/local/spark/conf$ mv slaves.template slaves
Append the following at the end of the file:
export STANDALONE_SPARK_MASTER_HOST=192.168.1.200
export SPARK_MASTER_IP=192.168.1.200
export SPARK_WORKER_CORES=1
# number of worker instances to start on every slave node
export SPARK_WORKER_INSTANCES=1
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_MEMORY=1g
export MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}
export SCALA_HOME=/opt/scala/scala-2.10.5
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=hdfs://172.21.7.10:9000/SparkEventLog"
export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
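The MASTER variable is assembled from the IP and port variables set just above it. This stand-alone sketch only reproduces that expansion, using the same values as spark-env.sh:

```shell
# Reproduce the MASTER URL expansion from spark-env.sh
SPARK_MASTER_IP=192.168.1.200
SPARK_MASTER_PORT=7077
MASTER=spark://${SPARK_MASTER_IP}:${SPARK_MASTER_PORT}
echo "$MASTER"    # prints spark://192.168.1.200:7077
```

This is the URL workers connect to, and the one you would pass to spark-submit --master for this cluster.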
2. Configure the slaves file on the master server:
sudo vim /usr/local/spark/conf/slaves
The slaves file contains:
192.168.1.201
192.168.1.202
Note: one machine IP per line.
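If you prefer not to edit the file by hand, a slaves file with one IP per line can also be generated with printf; this sketch writes to a temporary path so it is safe to try anywhere (point it at /usr/local/spark/conf/slaves on the real master):

```shell
# Generate a slaves file with one worker IP per line (temp path for illustration)
slaves_file=$(mktemp)
printf '%s\n' 192.168.1.201 192.168.1.202 > "$slaves_file"
cat "$slaves_file"
```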
3. On the master, create a logs folder under /usr/local/spark/ and give it 777 permissions:
hadoop@master:/usr/local/spark$ mkdir logs
hadoop@master:/usr/local/spark$ chmod 777 logs
- Copy /usr/local/spark from the master server to all slave nodes (node1, node2):
1. Copy the Spark installation files under /usr/local/spark/ on the master to each slave (node1, node2):
Note: before copying, SSH to each slave node (node1, node2), create the /usr/local/spark/ directory there, and give it 777 permissions.
hadoop@master:/usr/local/spark/conf$ cd ~/
hadoop@master:~$ sudo chmod 777 /usr/local/spark
hadoop@master:~$ scp -r /usr/local/spark hadoop@node1:/usr/local
scp: /usr/local/spark: Permission denied
hadoop@master:~$ sudo scp -r /usr/local/spark hadoop@node1:/usr/local
hadoop@node1's password:
scp: /usr/local/spark: Permission denied
hadoop@master:~$ ssh node1
hadoop@node1:~$ cd /usr/local/
hadoop@node1:/usr/local$ sudo mkdir spark
hadoop@node1:/usr/local$ sudo chmod 777 ./spark
hadoop@node1:/usr/local$ exit
hadoop@master:~$ scp -r /usr/local/spark hadoop@node1:/usr/local
...........
hadoop@master:~$ ssh node2
hadoop@node2:~$ cd /usr/local
hadoop@node2:/usr/local$ sudo mkdir spark
hadoop@node2:/usr/local$ sudo chmod 777 ./spark
hadoop@node2:/usr/local$ exit
hadoop@master:~$ scp -r /usr/local/spark hadoop@node2:/usr/local
...........
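The per-node back-and-forth above can be scripted as a loop. This is a hedged sketch that only echoes the commands so it can be dry-run anywhere; it assumes the hadoop user can sudo on each slave:

```shell
# Dry-run: print the prepare-and-copy commands for each slave node
for host in node1 node2; do
  echo "ssh hadoop@$host 'sudo mkdir -p /usr/local/spark && sudo chmod 777 /usr/local/spark'"
  echo "scp -r /usr/local/spark hadoop@$host:/usr/local"
done
```

Remove the echo wrappers to actually run the commands against the cluster.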
2. On every slave node (node1, node2), edit /etc/profile and append the $SPARK_HOME environment variables:
Note: you will usually hit permission problems here; it is easiest to log in to each slave node (node1, node2) and edit /etc/profile by hand.
hadoop@master:~$ ssh node1
hadoop@node1:~$ sudo vim /etc/profile
hadoop@node1:~$ exit
hadoop@master:~$ ssh node2
hadoop@node2:~$ sudo vim /etc/profile
hadoop@node2:~$ exit
After the edits, /etc/profile on every slave matches /etc/profile on the master.
- Start Spark on the master and verify the setup:
1. Start command:
Make sure Hadoop is already running before starting Spark.
hadoop@master:~$ cd /usr/local/spark/
hadoop@master:/usr/local/spark$ ./sbin/start-all.sh
2. Verify the startup:
Method 1: jps
hadoop@master:/usr/local/spark$ ./sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.master.Master-1-master.out
192.168.1.201: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node1.out
192.168.1.202: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-node2.out
hadoop@master:/usr/local/spark$ jps
1650 NameNode
1875 SecondaryNameNode
3494 Jps
2025 ResourceManager
3423 Master
hadoop@master:~$ ssh node1
hadoop@node1:~$ jps
1392 DataNode
2449 Jps
2330 Worker
2079 NodeManager
hadoop@node1:~$ exit
hadoop@master:~$ ssh node2
hadoop@node2:~$ jps
2264 Worker
2090 NodeManager
1402 DataNode
2383 Jps
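Scanning jps output by eye works, but grep can automate the check. This sketch stubs the jps output with the values above so the logic can be tried offline; on a real node, replace the stub with `jps_output=$(jps)`:

```shell
# Stubbed jps output from the master (on a real node: jps_output=$(jps))
jps_output="1650 NameNode
1875 SecondaryNameNode
2025 ResourceManager
3423 Master"
if printf '%s\n' "$jps_output" | grep -q 'Master$'; then
  echo "Spark Master is running"
else
  echo "Spark Master NOT found"
fi
```

The same pattern with 'Worker$' verifies the slave nodes.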
Method 2: open http://192.168.1.200:8080 in a browser and check that the Spark web UI comes up.