Accessing HDFS in Hadoop through the C API

Hadoop provides a C API for accessing HDFS. A brief introduction follows.

Environment: Ubuntu 14.04, Hadoop 1.0.1, JDK 1.7.0_51

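The functions for accessing HDFS are declared mainly in hdfs.h, which is located in hadoop-1.0.1/src/c++/libhdfs/. The corresponding library is libhdfs.so, in hadoop-1.0.1/c++/Linux-amd64-64/lib/. Accessing HDFS also depends on the JDK: the header directories are jdk1.7.0_51/include/ and jdk1.7.0_51/include/linux/, and the library is libjvm.so, in jdk1.7.0_51/jre/lib/amd64/server/. All of these library and include directories must be given when compiling and linking. Below is a simple source program, main.c: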

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "hdfs.h"

int main(int argc, char **argv)
{
    /*
     * Connect to HDFS.
     */
    hdfsFS fs = hdfsConnect("127.0.0.1", 9000);
    if (!fs)
    {
        fprintf(stderr, "Failed to connect to hdfs.\n");
        exit(-1);
    }

    /*
     * Create and open a file in HDFS.
     */
    const char* writePath = "/user/root/output/testfile.txt";
    hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
    if (!writeFile)
    {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
    }

    /*
     * Write data to the file (strlen+1 also writes the terminating NUL).
     */
    const char* buffer = "Hello, World!";
    tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, (tSize)(strlen(buffer)+1));
    if (num_written_bytes < 0)
    {
        fprintf(stderr, "Failed to write to %s\n", writePath);
        exit(-1);
    }

    /*
     * Flush buffer.
     */
    if (hdfsFlush(fs, writeFile))
    {
        fprintf(stderr, "Failed to 'flush' %s\n", writePath);
        exit(-1);
    }

    /*
     * Close the file.
     */
    hdfsCloseFile(fs, writeFile);

    /*
     * Reopen the same file for reading.
     */
    tSize bufferSize = 1024;
    const char* readPath = "/user/root/output/testfile.txt";
    hdfsFile readFile = hdfsOpenFile(fs, readPath, O_RDONLY, bufferSize, 0, 0);
    if (!readFile) {
        fprintf(stderr, "couldn't open file %s for reading\n", readPath);
        exit(-2);
    }

    /* Buffer for data read from the file (+1 for the terminating NUL). */
    char* rbuffer = (char*)malloc(sizeof(char) * (bufferSize+1));
    if (rbuffer == NULL) {
        return -2;
    }

    /* Read until hdfsRead reports end of file (0) or an error (-1). */
    tSize curSize;
    while ((curSize = hdfsRead(fs, readFile, (void*)rbuffer, bufferSize)) > 0) {
        rbuffer[curSize] = '\0';
        fprintf(stdout, "read '%s' from file!\n", rbuffer);
    }
    if (curSize < 0) {
        fprintf(stderr, "Failed to read %s\n", readPath);
    }

    free(rbuffer);
    hdfsCloseFile(fs, readFile);

    /*
     * Disconnect from HDFS.
     */
    hdfsDisconnect(fs);

    return 0;
}

 

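The program is fairly simple and the important parts are commented, so I will not explain it line by line. It creates a file named testfile.txt under the HDFS directory /user/root/output/, writes Hello, World! to it, then reads Hello, World! back from the file and prints it. If your HDFS has no /user/root/output/ directory, create it or change the path to one that exists.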
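
The read step depends on hdfsRead returning the number of bytes read, 0 at end of file, or -1 on error, and a single call may return fewer bytes than requested. The loop pattern can be tried without a cluster by putting the read call behind a callback. The following is only a sketch under that assumption: read_fn, read_fully, mem_src and mem_read are hypothetical names introduced here for illustration and are not part of libhdfs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 1024

/* Hypothetical callback with hdfsRead-like return values:
 * > 0 bytes read, 0 at end of file, -1 on error. */
typedef int (*read_fn)(void *ctx, void *buf, int len);

/* Read until end of file or until cap bytes are stored.
 * Returns the total number of bytes read, or -1 on error. */
long read_fully(read_fn rd, void *ctx, char *out, size_t cap)
{
    size_t total = 0;

    assert(rd != NULL && out != NULL);
    while (total < cap) {
        size_t want = cap - total;
        int n;

        if (want > CHUNK)
            want = CHUNK;
        n = rd(ctx, out + total, (int)want);
        if (n < 0)
            return -1;          /* error reported by the reader */
        if (n == 0)
            break;              /* end of file */
        total += (size_t)n;     /* short reads just continue the loop */
    }
    return (long)total;
}

/* In-memory stand-in for a file (illustration only). */
struct mem_src {
    const char *data;
    size_t len;
    size_t pos;
};

/* Returns at most 5 bytes per call to simulate short reads. */
int mem_read(void *ctx, void *buf, int len)
{
    struct mem_src *s = (struct mem_src *)ctx;
    size_t left = s->len - s->pos;
    size_t n = left < (size_t)len ? left : (size_t)len;

    if (n > 5)
        n = 5;
    memcpy(buf, s->data + s->pos, n);
    s->pos += n;
    return (int)n;
}
```

With libhdfs, the callback would be a thin wrapper around hdfsRead(fs, file, buf, len). The caller should leave one spare byte in the output buffer if the result is to be NUL-terminated and printed as a string.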

Here is the compile-and-link command used on my system:

g++ main.c -I /root/hadoop-1.0.1/src/c++/libhdfs/ -I /usr/java/jdk1.7.0_51/include/ -I /usr/java/jdk1.7.0_51/include/linux/ -L /root/hadoop-1.0.1/c++/Linux-amd64-64/lib/ -lhdfs -L /usr/java/jdk1.7.0_51/jre/lib/amd64/server/ -ljvm -o hdfs-test

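Here g++ is the compiler, each -I is followed by a header search path, each -L is followed by a library search path, and -lhdfs and -ljvm name the libraries to link against. Replace the paths with the corresponding paths on your system. Compilation should now succeed. At run time, however, the program reports that it cannot find libhdfs.so.0 and libjvm.so. The fix is to append the directories containing those libraries to /etc/ld.so.conf and then run ldconfig. This registers the libraries with the system's dynamic linker, so they are found at run time.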