Sqoop setup
Sqoop version 1.99.7. # Note: this walkthrough ultimately failed at the very last step, when starting the job; the differences between Sqoop versions are considerable.
Download link: http://pan.baidu.com/s/1pKYrusz  password: 7ib5
Before setting up Sqoop, the Hadoop and Java environments had already been configured.
After the first startup, killing the Hadoop processes caused all sorts of problems; rebooting the machine resolved them.
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Error: Could not find or load main class org.apache.hadoop.hdfs.tools.GetConf
Error: Could not find or load main class org.apache.hadoop.hdfs.server.namenode.NameNode
/usr/local/hadoop/sbin/hadoop-daemon.sh start namenode ## once it is up, jps should show NameNode and DataNode
/usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
## bring the daemons up service by service
Whatever warnings it prints can be ignored; this tripped me up once.
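If HDFS refuses to come up cleanly after a crash, bringing the daemons up one at a time and checking with jps makes it easier to see which one is failing. A minimal sketch, paths assuming the /usr/local/hadoop install used throughout this article:
/usr/local/hadoop/sbin/hadoop-daemon.sh start namenode
/usr/local/hadoop/sbin/hadoop-daemon.sh start datanode
/usr/local/hadoop/sbin/yarn-daemon.sh start resourcemanager
/usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager
jps    # NameNode, DataNode, ResourceManager and NodeManager should all be listed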
1. Download the package
2. Extract the package
tar -xzvf sqoop-1.99.7-bin-hadoop200.tar.gz -C /usr/local/
cd /usr/local/
mv sqoop-1.99.7-bin-hadoop200 sqoop    # rename so the path matches SQOOP_HOME below
3. Configure environment variables
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
Besides the Hadoop environment variables, Sqoop 2 also needs the following ones set; otherwise, after a reboot you will see:
Can't load the Hadoop related java lib, please check the setting for the following environment variables:
HADOOP_COMMON_HOME, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME/share/hadoop/common
export HADOOP_HDFS_HOME=$HADOOP_HOME/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=$HADOOP_HOME/share/hadoop/mapreduce
export HADOOP_YARN_HOME=$HADOOP_HOME/share/hadoop/yarn
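To make these variables survive a reboot, append them to a profile file and re-source it. A minimal sketch, assuming a bash shell and HADOOP_HOME=/usr/local/hadoop:
cat >> /etc/profile <<'EOF'
export HADOOP_HOME=/usr/local/hadoop
export SQOOP_HOME=/usr/local/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export HADOOP_COMMON_HOME=$HADOOP_HOME/share/hadoop/common
export HADOOP_HDFS_HOME=$HADOOP_HOME/share/hadoop/hdfs
export HADOOP_MAPRED_HOME=$HADOOP_HOME/share/hadoop/mapreduce
export HADOOP_YARN_HOME=$HADOOP_HOME/share/hadoop/yarn
EOF
source /etc/profile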
4. Modify the Sqoop configuration file
/usr/local/sqoop/conf/sqoop.properties, around line 144:
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/usr/local/hadoop/etc/hadoop/
# This must point at the Hadoop configuration directory; if it is wrong you will get errors such as
# "not a directory or permission issues"
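A quick sanity check that the property points at a real, readable Hadoop configuration directory:
grep "configuration.directory" /usr/local/sqoop/conf/sqoop.properties
ls -ld /usr/local/hadoop/etc/hadoop/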
5. Copy the JDBC driver into sqoop/lib
Copy mysql-connector-java-5.1.6-bin.jar into that directory.
Download link: http://pan.baidu.com/s/1qXIGeSG  password: iykt
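The copy itself is a single command; a sketch assuming the jar sits in the current directory and the lib path named above (some 1.99.x layouts keep third-party jars under the server directory instead, so adjust the target if verify still cannot find the driver):
cp mysql-connector-java-5.1.6-bin.jar /usr/local/sqoop/lib/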
6. Configure Hadoop proxy (impersonation) access
Sqoop submits MapReduce work through impersonation, so Hadoop must be configured with the proxy users and groups it will accept.
Also whitelist the service user in the container executor config:
vi /usr/local/hadoop/etc/hadoop/container-executor.cfg
allowed.system.users=hadoop
Then add the proxyuser properties to Hadoop's core-site.xml:
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
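Proxy-user changes only take effect once HDFS and YARN reload them; either restart the daemons or, on a running cluster, refresh the settings. A hedged sketch using the standard refresh commands:
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration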
7. Fixing "Tool class org.apache.sqoop.tools.tool.VerifyTool has failed" (mapreduce.framework.name was not configured; it belongs in mapred-site.xml):
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
8. Verify the configuration
bin/sqoop2-tool verify should return:
Verification was successful.
Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
9. Start the Sqoop server
sqoop2-server start
Use the JDK's jps tool to check that it started correctly; normally there will be a SqoopJettyServer process, since the Sqoop server is built on Jetty.
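A quick check, as a sketch:
jps | grep SqoopJettyServer    # should print exactly one SqoopJettyServer process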
10. Enter the Sqoop shell
Start the client shell: ./sqoop2-shell
Problem 1: Exception in thread "main" java.lang.UnsatisfiedLinkError:
/usr/local/jdk/jre/lib/i386/xawt/libmawt.so: libXext.so.6: cannot open shared object file: No such file or directory
yum install glibc.i686 libXtst.i686
Problem 2:
/usr/local/jdk/jre/lib/i386/xawt/libmawt.so: libXrender.so.1:
An RPM package matching the system version (642.4.2.el6.x86_64) needs to be installed.
Package download: http://down.51cto.com/data/2260998
Then install it: rpm -ivh libXrender-0.9.10-1.fc26.i686.rpm
Missing 32-bit system libraries are a real pitfall here.
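Rather than chasing the missing 32-bit libraries one error at a time, you can list every unresolved dependency of the JDK's native library in one go. A diagnostic sketch, assuming the /usr/local/jdk path from the errors above:
ldd /usr/local/jdk/jre/lib/i386/xawt/libmawt.so | grep "not found"    # show what is still missing
yum install -y glibc.i686 libXext.i686 libXrender.i686 libXtst.i686   # 32-bit X libraries on CentOS/RHEL 6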
Finally, run sqoop2-shell again:
[root@nod2 bin]# ./sqoop2-shell
Setting conf dir: /usr/local/sqoop/bin/../conf
Sqoop home directory: /usr/local/sqoop
Sqoop Shell: Type 'help' or '\h' for help.
sqoop:000>
Testing a MySQL-to-Hadoop data import
1. Add a user in MySQL
grant all privileges on *.* to sqoop@'192.168.%' identified by 'sqoop' with grant option;
mysql> create database sqoop;
Query OK, 1 row affected (0.02 sec)
mysql> use sqoop
Database changed
mysql> create table sqoop(id int,c_time timestamp,name varchar(20))
-> ;
Query OK, 0 rows affected (0.10 sec)
mysql> insert into sqoop(id,name)value(1,'sqoop')
-> ;
Query OK, 1 row affected (0.02 sec)
mysql> insert into sqoop(id,name)value(2,'hadoop')
-> ;
Query OK, 1 row affected (0.02 sec)
mysql> insert into sqoop(id,name)value(3,'hive')
-> ;
Query OK, 1 row affected (0.02 sec)
mysql> select * from sqoop;
+------+---------------------+--------+
| id | c_time | name |
+------+---------------------+--------+
| 1 | 2016-11-22 15:04:04 | sqoop |
| 2 | 2016-11-22 15:04:13 | hadoop |
| 3 | 2016-11-22 15:04:21 | hive |
+------+---------------------+--------+
3 rows in set (0.00 sec)
2. Test the connections in Sqoop
First, note that the shell commands in 1.99.7 differ from those in other versions:
# show version information
show version / show version --all
# list all available connectors (this is the key command)
sqoop:000> show connector
+------------------------+---------+------------------------------------------------------------+----------------------+
| Name | Version | Class | Supported Directions |
+------------------------+---------+------------------------------------------------------------+----------------------+
| oracle-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.oracle.OracleJdbcConnector | FROM/TO |
| sftp-connector | 1.99.7 | org.apache.sqoop.connector.sftp.SftpConnector | TO |
| kafka-connector | 1.99.7 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
| kite-connector | 1.99.7 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| ftp-connector | 1.99.7 | org.apache.sqoop.connector.ftp.FtpConnector | TO |
| hdfs-connector | 1.99.7 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| generic-jdbc-connector | 1.99.7 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
+------------------------+---------+------------------------------------------------------------+----------------------+
# list the current links
show link
# list jobs
show job
# Create an HDFS link; this differs from other versions:
## in this version a link must be created against a connector name taken from show connector
sqoop:000> create link -c hdfs-connector
Creating link for connector with name hdfs-connector
Please fill following values to create new link object
Name: hdfs_link
HDFS cluster
URI: hdfs://mycat:9000
Conf directory: /usr/local/hadoop/etc/hadoop
Additional configs::
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and name hdfs_link
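The URI entered above has to match the cluster's default filesystem; a quick way to confirm it, as a sketch:
hdfs getconf -confKey fs.defaultFS    # should print hdfs://mycat:9000 for this setup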
sqoop:000> show link
+-----------+----------------+---------+
| Name | Connector Name | Enabled |
+-----------+----------------+---------+
| show link | hdfs-connector | true |
| hdfs_link | hdfs-connector | true |
+-----------+----------------+---------+
## Create the MySQL link
sqoop:000> create link -connector generic-jdbc-connector
Creating link for connector with name generic-jdbc-connector
Please fill following values to create new link object
Name: mysql
Database connection
Driver class: com.mysql.jdbc.Driver
Connection String: jdbc:mysql://192.168.1.107/sqoop
Username: sqoop
Password: *******
Fetch Size:
Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
entry#
Identifier enclose:
New link was successfully created with validation status OK and name mysql
sqoop:000> show link
+-----------+------------------------+---------+
| Name | Connector Name | Enabled |
+-----------+------------------------+---------+
| show link | hdfs-connector | true |
| hdfs_link | hdfs-connector | true |
| mysql | generic-jdbc-connector | true |
+-----------+------------------------+---------+
With the links configured, the next step is to create the transfer job that will be submitted to MapReduce; the job goes from one link to another (here from the mysql link to hdfs_link, e.g. create job -f mysql -t hdfs_link):
Name: mysql-hdfs
Database source
Schema name: sqoop
Table name: sqoop
SQL statement:
Column names:
There are currently 1 values in the list:
1
element#
Partition column:
Partition column nullable:
Boundary query:
Incremental read
Check column:
Last value:
Target configuration
Override null value: false
Null value:
File format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
2 : PARQUET_FILE
Choose: 2
Compression codec:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0
Custom codec:
Output directory: hdfs:/home/sqoop
Append mode: false
Throttling resources
Extractors: 2
Loaders: 2
Classpath configuration
Extra mapper jars:
There are currently 1 values in the list:
1
element#
New job was successfully created with validation status OK and name mysql-hdfs
sqoop:000>
For comparison, here is an annotated version of the same job-creation dialog, from a session where links are still referenced by numeric id:
sqoop:000> create job -f 2 -t 1
Creating job for links with from id 1 and to id 6
Please fill following values to create new job object
Name: mysql_openfire    -- job name
FromJob configuration
Schema name:(Required) sqoop    -- database/schema name, required
Table name:(Required) sqoop    -- table name, required
Table SQL statement:(Optional)    -- optional
Table column names:(Optional)    -- optional
Partition column name:(Optional) id    -- optional
Null value allowed for the partition column:(Optional)    -- optional
Boundary query:(Optional)    -- optional
ToJob configuration
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0    -- choose the output file format
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0    -- choose the compression codec
Custom compression format:(Optional)    -- optional
Output directory:hdfs:/ns1/sqoop    -- HDFS output directory (the destination)
Driver Config
Extractors: 2    -- number of extractors
Loaders: 2    -- number of loaders
New job was successfully created with validation status OK and persistent id 1
sqoop:000> show job
+----+------------+--------------------------------+----------------------------+---------+
| Id | Name | From Connector | To Connector | Enabled |
+----+------------+--------------------------------+----------------------------+---------+
| 1 | mysql-hdfs | mysql (generic-jdbc-connector) | hdfs_link (hdfs-connector) | true |
+----+------------+--------------------------------+----------------------------+---------+
sqoop:000>
Common command reference (the options below use the older id-based form; in 1.99.7 links and jobs are usually referenced by name):
sqoop:001> show link                 list all links
sqoop:001> create link --cid 1       create a link (for connector id 1)
sqoop:000> delete link --lid 1       delete a link
sqoop:001> show job                  list all jobs
sqoop:001> create job --f 2 --t 1    create a job (import data from link 2 into link 1)
sqoop:000> start job --jid 1         start a job
sqoop:000> status job --jid 1        check the import status
sqoop:000> delete job --jid 1        delete a job
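After starting a job, the easiest confirmation is to list the job's output directory in HDFS. A sketch using the example output directory configured earlier (for TEXT_FILE output the files are plain text and can be read directly; PARQUET files are binary):
hdfs dfs -ls /home/sqoop
hdfs dfs -cat /home/sqoop/*    # only meaningful for TEXT_FILE output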
ERROR tool.ImportTool: Encountered IOException running import job:
java.io.IOException: No columns to generate for ClassWriter
This is a connector/driver problem; switch to a different one.
Exception in thread "main" java.lang.IncompatibleClassChangeError:
Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
The Hadoop and Sqoop versions do not match.
Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception
On its own this message says nothing about what went wrong; enable verbose output to see the real error:
set option --name verbose --value true
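A hedged tip: verbose mode only shows the client-side trace; the full server-side stack trace goes to the Sqoop server log. The path below is an assumption, check the log4j settings in sqoop.properties for the actual location:
tail -n 100 /usr/local/sqoop/server/log/sqoop.log    # assumed log location, verify against sqoop.properties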
This article is from the "DBSpace" blog; please keep the original source: http://dbspace.blog.51cto.com/6873717/1875955