Spark Installation and Deployment (local and standalone modes)
Spark has four run modes:
Local
Standalone
YARN
Mesos
I. Preparation before installing Spark
1. Install Java
$ sudo tar -zxvf jdk-7u67-linux-x64.tar.gz -C /opt/service/

export JAVA_HOME=/opt/service/jdk1.7.0_67
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH

alternatives --config java
alternatives --install /usr/bin/java java /opt/java/jdk1.7.0_67/bin/java 3
alternatives --config java
# If this is not set, errors may occur later when installing Spark components.
2. Install Scala
tar -zxvf scala-2.10.4.tgz -C /opt/

Once the environment variables are configured, the Scala installation is complete.
3. Install Hadoop
See: http://www.cnblogs.com/wcwen1990/p/6739151.html
4. Install Spark
1) Local mode installation
tar -zxvf spark-1.3.0-bin-2.5.0-cdh5.3.6.tgz -C /opt/cdh-5.3.6/
cd /opt/cdh-5.3.6/
mv spark-1.3.0-bin-2.5.0-cdh5.3.6/ spark-1.3.0

This completes the local-mode installation; basic Spark operations can now be run with bin/spark-shell.
A basic test of Spark in local mode:
bin/spark-shell
scala> sc.textFile("/opt/datas/wc.input")
scala> res0.collect
scala> sc.stop()
scala> exit
2) Standalone mode installation
tar -zxvf spark-1.3.0-bin-2.5.0-cdh5.3.6.tgz -C /opt/cdh-5.3.6/
cd /opt/cdh-5.3.6/
mv spark-1.3.0-bin-2.5.0-cdh5.3.6/ spark-1.3.0
Edit the slaves file and add the worker node:
db02
Set up the log4j configuration; the default contents can be kept.
Configure environment variables in spark-env.sh:
JAVA_HOME=/opt/java/jdk1.7.0_67
SCALA_HOME=/opt/scala-2.10.4
HADOOP_CONF_DIR=/opt/cdh-5.3.6/hadoop-2.5.0/etc/hadoop
SPARK_MASTER_IP=db02
SPARK_MASTER_PORT=7077
SPARK_MASTER_WEBUI_PORT=8080
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=5g
SPARK_WORKER_PORT=7078
SPARK_WORKER_WEBUI_PORT=8081
SPARK_WORKER_INSTANCES=1
SPARK_WORKER_DIR=/opt/cdh-5.3.6/spark-1.3.0/data/tmp
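The SPARK_WORKER_* settings determine how many cores and how much memory the cluster offers to applications: each worker host runs SPARK_WORKER_INSTANCES workers, and each worker contributes SPARK_WORKER_CORES and SPARK_WORKER_MEMORY. A minimal Python sketch of that arithmetic (the helper name is illustrative, not part of Spark):

```python
# Hypothetical helper: total cores/memory a standalone cluster offers,
# computed from the per-worker settings in spark-env.sh.
def cluster_capacity(worker_hosts, instances_per_host, cores_per_worker, mem_gb_per_worker):
    workers = worker_hosts * instances_per_host       # SPARK_WORKER_INSTANCES per host
    total_cores = workers * cores_per_worker          # SPARK_WORKER_CORES
    total_mem_gb = workers * mem_gb_per_worker        # SPARK_WORKER_MEMORY (in GB)
    return total_cores, total_mem_gb

# This setup: one worker host (db02), SPARK_WORKER_INSTANCES=1,
# SPARK_WORKER_CORES=2, SPARK_WORKER_MEMORY=5g:
print(cluster_capacity(1, 1, 2, 5))  # (2, 5)
```

Raising SPARK_WORKER_INSTANCES adds more worker processes per host, each taking its own cores and memory slice.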
Configure spark-defaults.conf (without this option, the Spark shell still runs in local mode):
spark.master spark://db02:7077

If this option is not set, you can instead pass the --master flag to bin/spark-shell to choose the mode, for example:
# bin/spark-shell --master spark://db02:7077
or
# bin/spark-shell --master local
Start Spark:
sbin/start-master.sh
sbin/start-slaves.sh
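Besides the HTML page, the standalone master web UI serves a machine-readable status document (in this setup it would be at http://db02:8080/json), which is handy for scripted health checks. A small Python sketch that summarizes such a response; the sample payload below is trimmed and its exact field names are an assumption:

```python
import json

def summarize_master(payload: str) -> str:
    # Count ALIVE workers in the master's JSON status document.
    info = json.loads(payload)
    workers = info.get("workers", [])
    alive = sum(1 for w in workers if w.get("state") == "ALIVE")
    return "%s: %d/%d workers alive" % (info.get("status"), alive, len(workers))

# Trimmed sample shaped like the master's JSON status (fields assumed):
sample = '{"status": "ALIVE", "workers": [{"state": "ALIVE"}]}'
print(summarize_master(sample))  # ALIVE: 1/1 workers alive
```

With the single worker configured in slaves (db02), a healthy cluster should report one ALIVE worker.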
At this point the master web UI can be opened in a browser at http://db02:8080/.
Run bin/spark-shell, and the shell application appears in the web UI. It shows up because spark-defaults.conf was configured; without that setting no application would appear here.
Testing Spark in standalone mode:
bin/hdfs dfs -mkdir -p /user/hadoop/wordcount/input/
bin/hdfs dfs -ls /user/hadoop/wordcount/
Found 1 items
drwxr-xr-x   - root supergroup          0 2017-05-22 14:47 /user/hadoop/wordcount/input
bin/hdfs dfs -put /opt/datas/wc.input /user/hadoop/wordcount/input
bin/hdfs dfs -ls /user/hadoop/wordcount/input
Found 1 items
-rw-r--r--   3 root supergroup         63 2017-05-22 14:48 /user/hadoop/wordcount/input/wc.input

scala> sc.textFile("hdfs://db02:8020/user/hadoop/wordcount/input/wc.input")
scala> res0.collect
scala> sc.stop()
scala> exit
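The spark-shell snippet above only reads the file and collects its lines; the word count that the wordcount directory is leading up to can be sketched in plain Python to show what the usual RDD chain (flatMap -> map -> reduceByKey) computes. The sample lines are made up, since the contents of wc.input are not shown:

```python
# Plain-Python sketch of the classic Spark word count:
# flatMap(split) -> map(word, 1) -> reduceByKey(+).
lines = ["hadoop spark", "spark hadoop", "hive"]      # stand-in for wc.input

words = [w for line in lines for w in line.split()]   # flatMap: split lines into words
pairs = [(w, 1) for w in words]                       # map: pair each word with 1
counts = {}
for w, n in pairs:                                    # reduceByKey: sum counts per word
    counts[w] = counts.get(w, 0) + n
print(counts)  # {'hadoop': 2, 'spark': 2, 'hive': 1}
```

In spark-shell the same chain would be written against the RDD returned by sc.textFile, with Spark distributing each stage across the workers.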