Running a MapReduce Example

I picked a small example from "Hadoop: The Definitive Guide" and ran it on a Hadoop cluster.

1. Create a Java class and save the source code from the book.

[huser@master bin]$ vi URLCat.java
import java.io.InputStream;
import java.net.URL;

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;

public class URLCat {

        static {
                URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
        }

        public static void main(String[] args) throws Exception {
                InputStream in = null;
                try {
                        in = new URL(args[0]).openStream();
                        IOUtils.copyBytes(in, System.out, 4096, false);
                } finally {
                        IOUtils.closeStream(in);
                }
        }
}

~
"URLCat.java" [New] 23L, 481C written
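
To make the call `IOUtils.copyBytes(in, System.out, 4096, false)` more concrete, here is a minimal sketch of a buffered stream copy using only the JDK. It is not Hadoop's implementation; the class `StreamCopySketch` and its `copy` method are hypothetical names used for illustration. The 4096 corresponds to the buffer size in URLCat, and, like passing `false` as the last argument there, the sketch leaves both streams open after copying.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration; not part of Hadoop.
class StreamCopySketch {

    // Copies everything from in to out through a fixed-size buffer and
    // returns the number of bytes copied. Neither stream is closed here.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the HDFS file opened in URLCat.
        InputStream in = new ByteArrayInputStream(
                "hello hadoop\n".getBytes(StandardCharsets.UTF_8));
        copy(in, System.out, 4096);
    }
}
```

In URLCat the input stream is closed separately in the `finally` block through `IOUtils.closeStream(in)`, which is why the copy itself does not need to close anything.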

2. Compile the Java class.

[huser@master bin]$ javac URLCat.java 
URLCat.java:4: error: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
                           ^
URLCat.java:5: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.IOUtils;
                           ^
URLCat.java:10: error: cannot find symbol
                URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
                                                   ^
  symbol:   class FsUrlStreamHandlerFactory
  location: class URLCat
URLCat.java:17: error: cannot find symbol
                        IOUtils.copyBytes(in, System.out, 4096, false);
                        ^
  symbol:   variable IOUtils
  location: class URLCat
URLCat.java:19: error: cannot find symbol
                        IOUtils.closeStream(in);
                        ^
  symbol:   variable IOUtils
  location: class URLCat
5 errors

The compiler cannot find the Hadoop libraries that the class depends on, so pass the library path explicitly with -classpath.

[huser@master bin]$ javac -classpath ../hadoop-core-1.2.1.jar URLCat.java 
[huser@master bin]$ ll
total 152
-rwxr-xr-x 1 huser huser 15147 Jul 23  2013 hadoop
-rwxr-xr-x 1 huser huser  2643 Jul 23  2013 hadoop-config.sh
-rwxr-xr-x 1 huser huser  5064 Jul 23  2013 hadoop-daemon.sh
-rwxr-xr-x 1 huser huser  1329 Jul 23  2013 hadoop-daemons.sh
-rwxr-xr-x 1 huser huser  2810 Jul 23  2013 rcc
-rwxr-xr-x 1 huser huser  2050 Jul 23  2013 slaves.sh
-rwxr-xr-x 1 huser huser  1166 Jul 23  2013 start-all.sh
-rwxr-xr-x 1 huser huser  1065 Jul 23  2013 start-balancer.sh
-rwxr-xr-x 1 huser huser  1745 Jul 23  2013 start-dfs.sh
-rwxr-xr-x 1 huser huser  1145 Jul 23  2013 start-jobhistoryserver.sh
-rwxr-xr-x 1 huser huser  1259 Jul 23  2013 start-mapred.sh
-rwxr-xr-x 1 huser huser  1119 Jul 23  2013 stop-all.sh
-rwxr-xr-x 1 huser huser  1116 Jul 23  2013 stop-balancer.sh
-rwxr-xr-x 1 huser huser  1246 Jul 23  2013 stop-dfs.sh
-rwxr-xr-x 1 huser huser  1131 Jul 23  2013 stop-jobhistoryserver.sh
-rwxr-xr-x 1 huser huser  1168 Jul 23  2013 stop-mapred.sh
-rwxr-xr-x 1 huser huser 63598 Jul 23  2013 task-controller
-rw-rw-r-- 1 huser huser  1021 Apr 17 23:09 URLCat.class
-rw-rw-r-- 1 huser huser   481 Apr 17 23:04 URLCat.java

Compilation succeeded and produced URLCat.class.
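
The "package does not exist" errors came down to whether the Hadoop classes were visible on the classpath. As a rough illustration of the same lookup at run time, the sketch below asks the JVM whether a given class can be found. `ClasspathProbe` and `isOnClasspath` are hypothetical names, not part of Hadoop or the book's code, and a run-time lookup is only an analogy for what javac does at compile time.

```java
// Hypothetical helper for illustration; not part of Hadoop or the book's code.
class ClasspathProbe {

    // Returns true if the named class can be found on the current classpath.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.hadoop.fs.FsUrlStreamHandlerFactory";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Running it once with plain `java ClasspathProbe` and once with `java -classpath ../hadoop-core-1.2.1.jar:. ClasspathProbe` should show the difference the -classpath option makes, assuming the jar sits at the same relative path used for javac.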

3. Run the program.

[huser@master bin]$ ../bin/hadoop URLCat hdfs://master/user/huser/in/test2.txt
Warning: $HADOOP_HOME is deprecated.

14/04/17 23:34:37 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:38 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:39 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:40 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:41 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:42 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:43 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:44 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:45 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/04/17 23:34:46 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
Exception in thread "main" java.net.ConnectException: Call to master/192.168.1.115:8020 failed on connection exception: java.net.ConnectException: Connection refused
        at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
        at org.apache.hadoop.ipc.Client.call(Client.java:1118)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
        at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
        at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
        at org.apache.hadoop.fs.FsUrlConnection.connect(FsUrlConnection.java:45)
        at org.apache.hadoop.fs.FsUrlConnection.getInputStream(FsUrlConnection.java:56)
        at java.net.URL.openStream(URL.java:1037)
        at URLCat.main(URLCat.java:16)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
        at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:457)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
        at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
        at org.apache.hadoop.ipc.Client.call(Client.java:1093)
        ... 22 more

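The connection failed: the URL names no port, so the client tried master:8020 and was refused. The HDFS configuration needs to be checked.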
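
Before digging into the configuration, it can help to confirm which port actually accepts connections. The following is a minimal sketch using only the JDK; `PortCheck` and `canConnect` are hypothetical names, and the default host and port in `main` are simply the ones from the log above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for illustration; not part of Hadoop.
class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts and unresolved host names.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "master";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        System.out.println(host + ":" + port + " reachable: "
                + canConnect(host, port, 2000));
    }
}
```

A successful TCP connection only shows that something is listening on that port; it does not prove that the listener is the NameNode.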

[huser@master conf]$ cat core-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
 <name>fs.default.name</name>
 <value>hdfs://master:9000</value>
</property>

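The NameNode is configured on port 9000, not the default port 8020 that the client fell back to when the URL omitted the port.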
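
The port fallback can be illustrated with `java.net.URI` alone: a URI without an explicit port reports -1, and the client then has to substitute a default. This is only a sketch of that idea, not Hadoop's own resolution code; `HdfsUriPort`, `effectivePort` and `ASSUMED_DEFAULT_PORT` are hypothetical names, and the value 8020 is taken from the retry log above.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of Hadoop.
class HdfsUriPort {

    // Port the client tried when the URI had none, as seen in the retry log
    // (master/192.168.1.115:8020). Treated here as an assumed default.
    static final int ASSUMED_DEFAULT_PORT = 8020;

    // Returns the port named in the URI, or the assumed default if none is given.
    static int effectivePort(String uri) {
        int port = URI.create(uri).getPort(); // -1 when the URI has no port
        return port == -1 ? ASSUMED_DEFAULT_PORT : port;
    }

    public static void main(String[] args) {
        // prints 8020
        System.out.println(effectivePort("hdfs://master/user/huser/in/test2.txt"));
        // prints 9000
        System.out.println(effectivePort("hdfs://master:9000/user/huser/in/test2.txt"));
    }
}
```

Since fs.default.name points at hdfs://master:9000, the URL passed to URLCat has to carry the same port explicitly.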

[huser@master bin]$ ../bin/hadoop URLCat hdfs://master:9000/user/huser/in/test2.txt
Warning: $HADOOP_HOME is deprecated.

hello hadoop

The program ran successfully and printed the contents of the file.