
Running a Spark jar with java -jar: Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

A Spark problem I ran into today that took quite a while to track down.

First, some context: my Spark cluster was deployed from the official binary package

spark-1.0.2-bin-hadoop2.tgz

on top of an existing Hadoop cluster.

I launched the application jar with the command

java -jar chinahadoop-1.0-SNAPSHOT.jar  chinahadoop-1.0-SNAPSHOT.jar  hdfs://node1:8020/user/ning/data.txt /user/ning/output

and hit the following error:

14/08/23 23:18:55 INFO AppClient$ClientActor: Executor updated: app-20140823231852-0000/1 is now RUNNING
before count:MappedRDD[1] at textFile at Analysis.scala:35
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
 at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
 at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
 at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
 at org.apache.spark.rdd.RDD.count(RDD.scala:861)
 at cn.chinahadoop.spark.Analysis$.main(Analysis.scala:39)
 at cn.chinahadoop.spark.Analysis.main(Analysis.scala)
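
For background: Hadoop 2.x discovers FileSystem implementations through java.util.ServiceLoader, reading every META-INF/services/org.apache.hadoop.fs.FileSystem file on the classpath, and getFileSystemClass raises "No FileSystem for scheme: hdfs" when no registered implementation (and no fs.hdfs.impl configuration entry) covers the hdfs scheme. A quick way to see what a fat jar actually registers is a throwaway check like the sketch below (my own illustration, not part of the original project; the object name is made up, and it assumes the shaded jar is on the classpath):

import java.util.ServiceLoader
import org.apache.hadoop.fs.FileSystem
import scala.collection.JavaConverters._

object ListFileSystems {
  def main(args: Array[String]): Unit = {
    // Enumerate every FileSystem implementation advertised through
    // META-INF/services on the classpath; if DistributedFileSystem is
    // missing from the output, the hdfs scheme cannot be resolved.
    for (fs <- ServiceLoader.load(classOf[FileSystem]).asScala) {
      println(fs.getClass.getName)
    }
  }
}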


I searched online for a long time without finding an answer. The root cause turned out to be the fat-jar build: hadoop-common and hadoop-hdfs each ship their own META-INF/services/org.apache.hadoop.fs.FileSystem service file, and when maven-shade-plugin merges the dependencies into a single jar, one file overwrites the other, so the entry mapping the hdfs scheme to DistributedFileSystem is lost. Adding the following transformer to the maven-shade-plugin configuration in my pom.xml makes it concatenate those service files instead of overwriting them, and the job finally ran:

<transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
    <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
</transformer>
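
As a side note, maven-shade-plugin also ships a transformer dedicated to exactly this job: org.apache.maven.plugins.shade.resource.ServicesResourceTransformer merges every META-INF/services file across all dependencies in one go, so declaring it should have the same effect here without naming each resource individually:

<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>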

The complete Maven configuration is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>chinahadoop</groupId>
    <artifactId>chinahadoop</artifactId>
    <version>1.0-SNAPSHOT</version>
    <repositories>
        <repository>
            <id>Akka repository</id>
            <url>http://repo.akka.io/releases</url>
        </repository>
    </repositories>
    <build>
        <sourceDirectory>src/main/scala/</sourceDirectory>
        <testSourceDirectory>src/test/scala/</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>2.10.3</scalaVersion>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.2</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <manifestEntries>
                                        <Main-Class>cn.chinahadoop.spark.Analysis</Main-Class>
                                    </manifestEntries>
                                </transformer>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.0.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.4.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>1.0.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.4.1</version>
        </dependency>
    </dependencies>
</project>
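
As an aside, if rebuilding the jar is inconvenient, the same scheme-to-class mapping can also be restored at runtime by setting fs.hdfs.impl on the job's Hadoop configuration before the first HDFS access. A minimal sketch under that assumption (the object name, master URL, and paths are illustrative, not taken from the original Analysis.scala):

import org.apache.spark.{SparkConf, SparkContext}

object HdfsSchemeWorkaround {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("hdfs-scheme-workaround")
      .setMaster("spark://node1:7077") // illustrative master URL
    val sc = new SparkContext(conf)
    // Map the hdfs:// scheme to its implementation class explicitly,
    // compensating for the service file lost during shading.
    sc.hadoopConfiguration.set("fs.hdfs.impl",
      "org.apache.hadoop.hdfs.DistributedFileSystem")
    val data = sc.textFile("hdfs://node1:8020/user/ning/data.txt")
    println("line count: " + data.count())
    sc.stop()
  }
}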
