
SparkR: error report

> sc <- sparkR.init()
Re-using existing Spark Context. Please stop SparkR with sparkR.stop() or restart R to create a new Spark Context
> sqlContext <- sparkRSQL.init(sc)
> df <- createDataFrame(sqlContext, faithful)
17/03/01 15:05:56 INFO SparkContext: Starting job: collectPartitions at NativeMethodAccessorImpl.java:-2
17/03/01 15:05:56 INFO DAGScheduler: Got job 0 (collectPartitions at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/03/01 15:05:56 INFO DAGScheduler: Final stage: ResultStage 0 (collectPartitions at NativeMethodAccessorImpl.java:-2)
17/03/01 15:05:56 INFO DAGScheduler: Parents of final stage: List()
17/03/01 15:05:56 INFO DAGScheduler: Missing parents: List()
17/03/01 15:05:56 INFO DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at parallelize at RRDD.scala:460), which has no missing parents
17/03/01 15:05:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1280.0 B, free 1280.0 B)
17/03/01 15:05:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 854.0 B, free 2.1 KB)
17/03/01 15:05:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.16.31.137:49150 (size: 854.0 B, free: 511.5 MB)
17/03/01 15:05:56 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
17/03/01 15:05:56 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (ParallelCollectionRDD[0] at parallelize at RRDD.scala:460)
17/03/01 15:05:56 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
17/03/01 15:05:56 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, test3, partition 0,PROCESS_LOCAL, 12976 bytes)
17/03/01 15:05:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on test3:50531 (size: 854.0 B, free: 511.5 MB)
17/03/01 15:05:56 INFO DAGScheduler: ResultStage 0 (collectPartitions at NativeMethodAccessorImpl.java:-2) finished in 0.396 s
17/03/01 15:05:56 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 389 ms on test3 (1/1)
17/03/01 15:05:56 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
17/03/01 15:05:56 INFO DAGScheduler: Job 0 finished: collectPartitions at NativeMethodAccessorImpl.java:-2, took 0.526915 s
> showDF(df)
17/03/01 15:06:02 INFO SparkContext: Starting job: showString at NativeMethodAccessorImpl.java:-2
17/03/01 15:06:02 INFO DAGScheduler: Got job 1 (showString at NativeMethodAccessorImpl.java:-2) with 1 output partitions
17/03/01 15:06:02 INFO DAGScheduler: Final stage: ResultStage 1 (showString at NativeMethodAccessorImpl.java:-2)
17/03/01 15:06:02 INFO DAGScheduler: Parents of final stage: List()
17/03/01 15:06:02 INFO DAGScheduler: Missing parents: List()
17/03/01 15:06:02 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[4] at showString at NativeMethodAccessorImpl.java:-2), which has no missing parents
17/03/01 15:06:02 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 8.7 KB, free 10.8 KB)
17/03/01 15:06:02 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.5 KB, free 14.4 KB)
17/03/01 15:06:02 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.16.31.137:49150 (size: 3.5 KB, free: 511.5 MB)
17/03/01 15:06:02 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
17/03/01 15:06:02 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[4] at showString at NativeMethodAccessorImpl.java:-2)
17/03/01 15:06:02 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
17/03/01 15:06:02 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, test2, partition 0,PROCESS_LOCAL, 12976 bytes)
17/03/01 15:06:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on test2:57552 (size: 3.5 KB, free: 511.5 MB)
17/03/01 15:06:04 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, test2): java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at org.apache.spark.api.r.RRDD$.createRProcess(RRDD.scala:413)
    at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:429)
    at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:187)
    at java.lang.ProcessImpl.start(ProcessImpl.java:130)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
    ... 20 more

17/03/01 15:06:04 INFO TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, test2, partition 0,PROCESS_LOCAL, 12976 bytes)
17/03/01 15:06:04 INFO TaskSetManager: Lost task 0.1 in stage 1.0 (TID 2) on executor test2: java.io.IOException (Cannot run program "Rscript": error=2, No such file or directory) [duplicate 1]
17/03/01 15:06:04 INFO TaskSetManager: Starting task 0.2 in stage 1.0 (TID 3, test3, partition 0,PROCESS_LOCAL, 12976 bytes)
17/03/01 15:06:04 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on test3:50531 (size: 3.5 KB, free: 511.5 MB)
17/03/01 15:06:04 INFO TaskSetManager: Lost task 0.2 in stage 1.0 (TID 3) on executor test3: java.io.IOException (Cannot run program "Rscript": error=2, No such file or directory) [duplicate 2]
17/03/01 15:06:04 INFO TaskSetManager: Starting task 0.3 in stage 1.0 (TID 4, test3, partition 0,PROCESS_LOCAL, 12976 bytes)
17/03/01 15:06:04 INFO TaskSetManager: Lost task 0.3 in stage 1.0 (TID 4) on executor test3: java.io.IOException (Cannot run program "Rscript": error=2, No such file or directory) [duplicate 3]
17/03/01 15:06:04 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
17/03/01 15:06:04 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
17/03/01 15:06:04 INFO TaskSchedulerImpl: Cancelling stage 1
17/03/01 15:06:04 INFO DAGScheduler: ResultStage 1 (showString at NativeMethodAccessorImpl.java:-2) failed in 2.007 s
17/03/01 15:06:04 INFO DAGScheduler: Job 1 failed: showString at NativeMethodAccessorImpl.java:-2, took 2.027519 s
17/03/01 15:06:04 ERROR RBackendHandler: showString on 15 failed
Error in invokeJava(isStatic = FALSE, objId$id, methodName, ...) : 
  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, test3): java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:1047)
    at org.apache.spark.api.r.RRDD$.createRProcess(RRDD.scala:413)
    at org.apache.spark.api.r.RRDD$.createRWorker(RRDD.scala:429)
    at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.R

 

 

  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, test3): java.io.IOException: Cannot run program "Rscript": error=2, No such file or directory

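The line above is the key one.
Because of this error, in SparkR, once you have an object whose class is

class(df)
[1] "DataFrame"
attr(,"package")
[1] "SparkR"

class, names, and show still work on it,

but showDF and head fail with the error above: the data itself cannot be read.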

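The key error line shows that the other nodes do not have Rscript. showDF and head run a job on the executors (test2 and test3 in the log), and each executor has to start an R worker process through Rscript, as the createRWorker frames in the stack trace show.

The fix is to log in to each of the other machines and copy Rscript into /usr/bin (or install R there so that Rscript is on the PATH).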
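Before rerunning the job, it can help to confirm which nodes actually lack Rscript. The following is a minimal sketch, not part of the original setup: the function names has_program and check_workers are hypothetical, and it assumes passwordless ssh to every worker. The PATH seen by a non-interactive ssh session may also differ from the PATH the Spark executor uses, so treat the result as a hint rather than proof.

```shell
#!/bin/sh
# Succeeds (exit status 0) if the named program can be found on PATH.
has_program() {
    command -v "$1" >/dev/null 2>&1
}

# Report, for each host passed as an argument, whether Rscript is visible there.
# Assumption: passwordless ssh to each host is already configured.
check_workers() {
    for host in "$@"; do
        if ssh "$host" 'command -v Rscript >/dev/null 2>&1'; then
            echo "$host: Rscript found"
        else
            echo "$host: Rscript missing"
        fi
    done
}
```

With the host names that appear in the log, this would be called as `check_workers test2 test3`; `has_program Rscript` performs the same check on the local machine.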

Alternatively, switch to single-node mode:

that is, drop --master when launching:

sparkR --driver-class-path /data1/mysql-connector-java-5.1.18.jar
