
Sqoop Setup

Sqoop version: 1.99.7. # Note: this particular setup failed at the very last step, when starting the job; the differences between versions are substantial.

Package download link: http://pan.baidu.com/s/1pKYrusz password: 7ib5

Before setting up Sqoop, the Hadoop and Java environments were already in place.

After the first startup, killing the Hadoop processes caused all kinds of problems; rebooting the machine fixed them:

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

Error: Could not find or load main class org.apache.hadoop.hdfs.tools.GetConf

Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode


/usr/local/hadoop/sbin/hadoop-daemon.sh start namenode ## once it is up, jps should show NameNode and DataNode

/usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager

## bring the daemons up service by service

Whatever warnings get printed can be ignored; this tripped me up once.
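For reference, a minimal sketch of bringing every daemon up one service at a time and checking the result (the script paths assume the same /usr/local/hadoop layout used above):

/usr/local/hadoop/sbin/hadoop-daemon.sh start namenode
/usr/local/hadoop/sbin/hadoop-daemon.sh start datanode
/usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
/usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager
jps    # expect NameNode, DataNode, ResourceManager and NodeManager in the output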

1. Download the package

2. Extract the package

tar -xzvf sqoop-1.99.7-bin-hadoop200.tar.gz -C /usr/local/

cd /usr/local/

mv sqoop-1.99.7-bin-hadoop200 sqoop ## the rest of this article assumes /usr/local/sqoop

3. Configure environment variables

export SQOOP_HOME=/usr/local/sqoop

export PATH=$PATH:$SQOOP_HOME/bin

Hadoop's environment variables are already set; for Sqoop 2 the following also need to be configured, otherwise after a restart you will hit:

Can't load the Hadoop related java lib, please check the setting for the following environment variables:

    HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME



export HADOOP_COMMON_HOME=$HADOOP_HOME/share/hadoop/common

export HADOOP_HDFS_HOME=$HADOOP_HOME/share/hadoop/hdfs

export HADOOP_MAPRED_HOME=$HADOOP_HOME/share/hadoop/mapreduce

export HADOOP_YARN_HOME=$HADOOP_HOME/share/hadoop/yarn
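A sketch of making all of these persistent, assuming a bash login shell and that HADOOP_HOME is already exported:

cat >> /etc/profile <<'EOF'
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export HADOOP_COMMON_HOME=$HADOOP_HOME/share/hadoop/common
export HADOOP_HDFS_HOME=$HADOOP_HOME/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=$HADOOP_HOME/share/hadoop/mapreduce
export HADOOP_YARN_HOME=$HADOOP_HOME/share/hadoop/yarn
EOF
source /etc/profile    # apply to the current shell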

4. Modify the Sqoop configuration file

/usr/local/sqoop/conf/sqoop.properties, at line 144:

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/etc/hadoop/

# This points Sqoop at Hadoop's configuration directory; getting it wrong leads to:

not a directory or permission issues
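A non-interactive way to set it (a sketch; the sed pattern assumes the property already sits on its own line in the stock sqoop.properties, as the line-144 reference above suggests):

sed -i 's|^org.apache.sqoop.submission.engine.mapreduce.configuration.directory=.*|org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/etc/hadoop/|' /usr/local/sqoop/conf/sqoop.properties
grep 'configuration.directory' /usr/local/sqoop/conf/sqoop.properties    # confirm the change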

5. Copy the JDBC driver into the sqoop/lib directory

Copy mysql-connector-java-5.1.6-bin.jar there.

Download link: http://pan.baidu.com/s/1qXIGeSG password: iykt
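For example (assuming the jar was downloaded to the current directory; the target path follows this step's sqoop/lib convention):

mkdir -p /usr/local/sqoop/lib    # create it if this layout doesn't ship one
cp mysql-connector-java-5.1.6-bin.jar /usr/local/sqoop/lib/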

6. Configure Hadoop proxy access

Sqoop submits to Hadoop's MapReduce as a proxy, so Hadoop must be configured with the proxy users and groups it will accept.

First allow the hadoop user in container-executor.cfg:

vi /usr/local/hadoop/etc/hadoop/container-executor.cfg

allowed.system.users=hadoop

Then add the proxy-user properties to Hadoop's core-site.xml:

<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
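core-site.xml changes only take effect once the daemons are restarted; a sketch using the per-daemon scripts from earlier:

/usr/local/hadoop/sbin/hadoop-daemon.sh stop namenode && /usr/local/hadoop/sbin/hadoop-daemon.sh start namenode
/usr/local/hadoop/sbin/yarn-daemon.sh stop resourcemanager && /usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager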


7. Fix "Tool class org.apache.sqoop.tools.tool.VerifyTool has failed" (caused here by a missing mapreduce.framework.name setting, which belongs in mapred-site.xml):

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
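If mapred-site.xml does not exist yet, Hadoop 2.x ships a template it can be created from (a sketch):

cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
vi /usr/local/hadoop/etc/hadoop/mapred-site.xml    # add the property above inside <configuration>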

8. Verify the configuration

bin/sqoop2-tool verify should return:

Verification was successful.

Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.

9. Start the Sqoop server

sqoop2-server start

Check with the JDK's jps tool that it came up correctly; normally there will be a SqoopJettyServer process, since the Sqoop server is built on Jetty.
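A quick check:

jps | grep SqoopJettyServer    # prints one line when the server is up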

10. Enter the Sqoop shell

Enter the client shell: ./sqoop2-shell

Problem 1: Exception in thread "main" java.lang.UnsatisfiedLinkError:

/usr/local/jdk/jre/lib/i386/xawt/libmawt.so: libXext.so.6: cannot open shared object file: No such file or directory


yum install glibc.i686 libXtst.i686


Problem 2:

/usr/local/jdk/jre/lib/i386/xawt/libmawt.so: libXrender.so.1:

This needs an RPM package matching the system version (642.4.2.el6.x86_64 here).

Package download: http://down.51cto.com/data/2260998

Then: rpm -ivh libXrender-0.9.10-1.fc26.i686.rpm

Missing system packages are a real pitfall.

Finally, run sqoop2-shell again:

[root@nod2 bin]# ./sqoop2-shell

Setting conf dir: /usr/local/sqoop/bin/../conf

Sqoop home directory: /usr/local/sqoop

Sqoop Shell: Type 'help' or '\h' for help.


sqoop:000> 
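If the shell cannot reach the server (it assumes localhost by default), point it at the server explicitly first; a sketch assuming the default port 12000:

sqoop:000> set server --host localhost --port 12000 --webapp sqoop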


Testing a data import from MySQL to Hadoop

1. Create a user in MySQL

grant all privileges on *.* to sqoop@'192.168.%' identified by 'sqoop' with grant option;
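The new account can be sanity-checked from the Sqoop host before going further (a sketch; 192.168.1.107 is the MySQL server address used later in this article):

mysql -h 192.168.1.107 -u sqoop -psqoop -e 'show databases;'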


mysql> create database sqoop;

Query OK, 1 row affected (0.02 sec)


mysql> use sqoop

Database changed

mysql> create table sqoop(id int, c_time timestamp, name varchar(20));

Query OK, 0 rows affected (0.10 sec)


mysql> insert into sqoop(id,name) values(1,'sqoop');

Query OK, 1 row affected (0.02 sec)


mysql> insert into sqoop(id,name) values(2,'hadoop');

Query OK, 1 row affected (0.02 sec)


mysql> insert into sqoop(id,name) values(3,'hive');

Query OK, 1 row affected (0.02 sec)


mysql> select * from sqoop;

+------+---------------------+--------+

| id   | c_time              | name   |

+------+---------------------+--------+

|    1 | 2016-11-22 15:04:04 | sqoop  |

|    2 | 2016-11-22 15:04:13 | hadoop |

|    3 | 2016-11-22 15:04:21 | hive   |

+------+---------------------+--------+

3 rows in set (0.00 sec)


2. Test the connection from Sqoop

First, understand that the 1.99.7 shell commands differ from other Sqoop versions:

# show version information

show version / show version --all

# list all connectors Sqoop supports; this is the important one

sqoop:000> show connector

+------------------------+---------+------------------------------------------------------------+----------------------+

|          Name          | Version |                           Class                            | Supported Directions |

+------------------------+---------+------------------------------------------------------------+----------------------+

| oracle-jdbc-connector  | 1.99.7  | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO              |

| sftp-connector         | 1.99.7  | org.apache.sqoop.connector.sftp.SftpConnector              | TO                   |

| kafka-connector        | 1.99.7  | org.apache.sqoop.connector.kafka.KafkaConnector            | TO                   |

| kite-connector         | 1.99.7  | org.apache.sqoop.connector.kite.KiteConnector              | FROM/TO              |

| ftp-connector          | 1.99.7  | org.apache.sqoop.connector.ftp.FtpConnector                | TO                   |

| hdfs-connector         | 1.99.7  | org.apache.sqoop.connector.hdfs.HdfsConnector              | FROM/TO              |

| generic-jdbc-connector | 1.99.7  | org.apache.sqoop.connector.jdbc.GenericJdbcConnector       | FROM/TO              |

+------------------------+---------+------------------------------------------------------------+----------------------+


# list the current links

show link

# list jobs

show job

# Create an HDFS link; this differs from other versions:

## in this version a link must name its connector type, as listed by show connector.

sqoop:000> create link -c hdfs-connector

Creating link for connector with name hdfs-connector

Please fill following values to create new link object

Name: hdfs_link


HDFS cluster


URI: hdfs://mycat:9000

Conf directory: /usr/local/hadoop/etc/hadoop

Additional configs:: 

There are currently 0 values in the map:

entry# 

New link was successfully created with validation status OK and name hdfs_link

sqoop:000> show link

+-----------+----------------+---------+

|   Name    | Connector Name | Enabled |

+-----------+----------------+---------+

| show link | hdfs-connector | true    |

| hdfs_link | hdfs-connector | true    |

+-----------+----------------+---------+

## Create the MySQL link

sqoop:000> create link -connector generic-jdbc-connector

Creating link for connector with name generic-jdbc-connector

Please fill following values to create new link object

Name: mysql


Database connection


Driver class: com.mysql.jdbc.Driver

Connection String: jdbc:mysql://192.168.1.107/sqoop

Username: sqoop

Password: *******

Fetch Size: 

Connection Properties: 

There are currently 0 values in the map:

entry# protocol=tcp

There are currently 1 values in the map:

protocol = tcp

entry# 

Identifier enclose: 

New link was successfully created with validation status OK and name mysql

sqoop:000> show link

+-----------+------------------------+---------+

|   Name    |     Connector Name     | Enabled |

+-----------+------------------------+---------+

| show link | hdfs-connector         | true    |

| hdfs_link | hdfs-connector         | true    |

| mysql     | generic-jdbc-connector | true    |

+-----------+------------------------+---------+

With both links configured, define the transfer job that gets submitted to MapReduce (in 1.99.7 the links are referenced by name, e.g. create job -f mysql -t hdfs_link; the prompts then look like this):

Name: mysql-hdfs


Database source


Schema name: sqoop

Table name: sqoop

SQL statement: 

Column names: 

There are currently 1 values in the list:

1

element# 

Partition column: 

Partition column nullable: 

Boundary query: 


Incremental read


Check column: 

Last value: 


Target configuration


Override null value: false

Null value: 

File format: 

  0 : TEXT_FILE

  1 : SEQUENCE_FILE

  2 : PARQUET_FILE

Choose: 2

Compression codec: 

  0 : NONE

  1 : DEFAULT

  2 : DEFLATE

  3 : GZIP

  4 : BZIP2

  5 : LZO

  6 : LZ4

  7 : SNAPPY

  8 : CUSTOM

Choose: 0

Custom codec: 

Output directory: hdfs:/home/sqoop

Append mode: false


Throttling resources


Extractors: 2

Loaders: 2


Classpath configuration


Extra mapper jars: 

There are currently 1 values in the list:

1

element# 

New job was successfully created with validation status OK  and name mysql-hdfs

sqoop:000> 


For comparison, here is an id-based transcript in the style of older 1.99.x releases, which referenced links by numeric id:

sqoop:000> create job -f 2 -t 1

Creating job for links with from id 1 and to id 6

Please fill following values to create new job object

Name: mysql_openfire -- set the job name

FromJob configuration

Schema name:(Required) sqoop -- schema name: required

Table name:(Required) sqoop -- table name: required

Table SQL statement:(Optional) -- optional

Table column names:(Optional) -- optional

Partition column name:(Optional) id -- optional

Null value allowed for the partition column:(Optional) -- optional

Boundary query:(Optional) -- optional

ToJob configuration

Output format:

0 : TEXT_FILE

1 : SEQUENCE_FILE

Choose: 0 -- choose the output file format

Compression format:

0 : NONE

1 : DEFAULT

2 : DEFLATE

3 : GZIP

4 : BZIP2

5 : LZO

6 : LZ4

7 : SNAPPY

8 : CUSTOM

Choose: 0 -- choose the compression type

Custom compression format:(Optional) -- optional

Output directory: hdfs:/ns1/sqoop -- HDFS target directory (the destination)

Driver Config

Extractors: 2 -- number of extractors

Loaders: 2 -- number of loaders

New job was successfully created with validation status OK and persistent id 1

sqoop:000> show job

+----+------------+--------------------------------+----------------------------+---------+

| Id |    Name    |         From Connector         |        To Connector        | Enabled |

+----+------------+--------------------------------+----------------------------+---------+

| 1  | mysql-hdfs | mysql (generic-jdbc-connector) | hdfs_link (hdfs-connector) | true    |

+----+------------+--------------------------------+----------------------------+---------+

sqoop:000> 
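To actually run and monitor the job created above (a sketch; 1.99.7 references jobs by name, and this start step is the one that failed in this particular environment, per the note at the top):

sqoop:000> start job -name mysql-hdfs
sqoop:000> status job -name mysql-hdfs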

Common command list (note: the id-based --cid/--lid/--jid options below come from older 1.99.x releases; 1.99.7 references links and jobs by name):

sqoop:001> show link -- list all links

sqoop:001> create link --cid 1 -- create a link

sqoop:000> delete link --lid 1 -- delete a link

sqoop:001> show job -- list all jobs

sqoop:001> create job --f 2 --t 1 -- create a job (import data from link 2 to link 1)

sqoop:000> start job --jid 1 -- start a job

sqoop:000> status job --jid 1 -- check the import status

sqoop:000> delete job --jid 1 -- delete a job




Common errors:

ERROR tool.ImportTool: Encountered IOException running import job:

java.io.IOException: No columns to generate for ClassWriter

A connector/driver problem; try a different one.

Exception in thread "main" java.lang.IncompatibleClassChangeError:

Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected


The Hadoop and Sqoop versions are mismatched.


Exception has occurred during processing command

Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception

On its own this message says nothing useful; turn on verbose output to see the details:

set option --name verbose --value true
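Then re-run the failing command in the same shell session; the full server-side stack trace is now printed. For example:

sqoop:000> set option --name verbose --value true
sqoop:000> show link    # or whichever command failed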


This article originally appeared on the "DBSpace" blog; please keep this attribution: http://dbspace.blog.51cto.com/6873717/1875955
