
Installing and Configuring a Spark 1.1.0 Cluster

Compared with setting up a distributed file system or a NoSQL database, installing and configuring a Spark cluster is fairly straightforward:

  1. Install the JDK (this hardly needs an introduction, since so much software depends on it). Note that the AuthParam token in Oracle's download URL is time-limited, so the link below may need to be refreshed from Oracle's site:
    wget http://download.oracle.com/otn-pub/java/jdk/7u71-b14/jdk-7u71-linux-x64.tar.gz?AuthParam=1416666050_dca8969bfc01e3d8d42d04040f76ff1
    tar -zxvf jdk-7u71-linux-x64.tar.gz
  2. Install Scala. Some online guides suggest the 2.9 series, but note that Spark 1.1.0 is built against Scala 2.10; since the prebuilt Spark tarball bundles its own Scala libraries, the system Scala mainly matters if you compile your own applications:
    wget http://www.scala-lang.org/files/archive/scala-2.9.1.final.tgz
    tar -zxvf scala-2.9.1.final.tgz
    ln -s scala-2.9.1.final scala
  3. Set the environment variables with vi /etc/profile (these paths assume the directories extracted above end up under /usr/local, as in the check after this step):
    export JAVA_HOME=/usr/local/java
    export SCALA_HOME=/usr/local/scala
    export PATH=$JAVA_HOME/bin:$SCALA_HOME/bin:$PATH
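    As a sanity check (a minimal sketch, assuming the JDK tarball extracted to jdk1.7.0_71 and Scala to scala-2.9.1.final in the current directory):
    mv jdk1.7.0_71 /usr/local/java
    mv scala-2.9.1.final /usr/local/
    ln -sfn /usr/local/scala-2.9.1.final /usr/local/scala
    source /etc/profile
    java -version     # should report java version "1.7.0_71"
    scala -version    # should report version 2.9.1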
  4. Install Spark (here the prebuilt binary for Hadoop 2.3; a standalone-cluster configuration sketch follows this step):
    wget http://mirror.bit.edu.cn/apache/spark/spark-1.1.0/spark-1.1.0-bin-hadoop2.3.tgz
    tar -zxvf spark-1.1.0-bin-hadoop2.3.tgz
    ln -s spark-1.1.0-bin-hadoop2.3 spark
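    The steps so far set up a single node. For an actual standalone cluster, here is a minimal sketch (the hostnames master, worker1 and worker2 are placeholders; it assumes identical install paths on every node and passwordless SSH from the master to each worker):
    cd /usr/local/spark
    cp conf/spark-env.sh.template conf/spark-env.sh
    echo "export JAVA_HOME=/usr/local/java" >> conf/spark-env.sh
    echo "export SPARK_MASTER_IP=master" >> conf/spark-env.sh
    printf "worker1\nworker2\n" > conf/slaves
    sbin/start-all.sh    # starts the master here and, over SSH, one worker per host in conf/slaves
    Once it is up, the standalone master's own web UI listens on port 8080.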
  5. Run a quick test (a cluster variant follows the session below):
    cd /usr/local/spark/bin
    ./spark-shell
    At the scala> prompt, enter:
    scala> val data = Array(1, 2, 3, 4, 5)
    data: Array[Int] = Array(1, 2, 3, 4, 5)

    scala> val distData = sc.parallelize(data)
    distData: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:14

    scala> distData.reduce(_ + _)
    res0: Int = 15
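    By default spark-shell runs in local mode. To run against the standalone cluster sketched under step 4 (again using the placeholder hostname master), point the shell at the master URL, or try one of the bundled examples; both lines below assume that sketch:
    ./spark-shell --master spark://master:7077
    MASTER=spark://master:7077 ./run-example SparkPi 10    # approximates pi on the cluster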
  6. While the shell is running, you can watch the application web UI on port 4040 of the driver machine (http://<driver-host>:4040); it shows the running application's jobs, stages and storage. A quick check follows:
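    One way to confirm the UI is reachable, run from the machine where spark-shell is still open (curl -I only requests the HTTP headers):
    curl -I http://localhost:4040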

