Using SparkSQL: the Thrift JDBC Server

Thrift JDBC Server overview

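The Thrift JDBC Server is built on the HiveServer2 implementation from Hive 0.12. You can interact with it using the beeline script shipped with Spark or the one from Hive 0.12. By default, the Thrift JDBC Server listens on port 10000.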
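
Clients reach the server through a HiveServer2 JDBC URL of the form jdbc:hive2://host:port/database, with port 10000 unless it has been changed. Before opening beeline it can help to confirm that something is listening on that port. The following is a minimal Python sketch, not part of Spark or Hive; the helper names hive2_url and is_listening are hypothetical, and a successful TCP connection only shows that the port is open, not that the process behind it is the Thrift JDBC Server.

```python
import socket

DEFAULT_THRIFT_PORT = 10000  # default listening port of the Thrift JDBC Server


def hive2_url(host: str, port: int = DEFAULT_THRIFT_PORT, database: str = "default") -> str:
    """Build a HiveServer2-style JDBC URL: jdbc:hive2://host:port/database."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def is_listening(host: str, port: int = DEFAULT_THRIFT_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only proves that the port accepts connections; it does not check
    that the process behind it is actually the Thrift JDBC Server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False
```

For example, is_listening("hadoop000") checks the default port on the host used in the beeline example of this post.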

Before using the Thrift JDBC Server, note the following:

1. Copy the hive-site.xml configuration file into the $SPARK_HOME/conf directory;

2. Add the JDBC driver jar for the metastore database (MySQL in this example) to SPARK_CLASSPATH in $SPARK_HOME/conf/spark-env.sh:

export SPARK_CLASSPATH=$SPARK_CLASSPATH:/home/hadoop/software/mysql-connector-java-5.1.27-bin.jar
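
Both preparation steps can be checked, and step 2 applied, from a small script. This is a hypothetical Python helper rather than a Spark tool: it assumes SPARK_HOME points at the Spark installation and does only a plain text search of spark-env.sh, so it will not notice a driver jar supplied some other way (for example with --driver-class-path).

```python
from pathlib import Path


def missing_prerequisites(spark_home: str, driver_jar: str) -> list:
    """Return a list of problems; an empty list means both steps look done."""
    conf = Path(spark_home) / "conf"
    problems = []
    # Step 1: hive-site.xml must sit in $SPARK_HOME/conf.
    if not (conf / "hive-site.xml").is_file():
        problems.append(f"missing {conf / 'hive-site.xml'}")
    # Step 2: spark-env.sh must mention the JDBC driver jar (naive text search).
    env = conf / "spark-env.sh"
    if driver_jar not in (env.read_text() if env.is_file() else ""):
        problems.append(f"{driver_jar} not found in {env}")
    return problems


def add_driver_to_spark_env(spark_home: str, driver_jar: str) -> bool:
    """Append the SPARK_CLASSPATH export line from step 2 if it is not there yet.

    Returns True if spark-env.sh was modified.
    """
    env = Path(spark_home) / "conf" / "spark-env.sh"
    text = env.read_text() if env.is_file() else ""
    if driver_jar in text:
        return False  # already present; keeps the operation idempotent
    line = f"export SPARK_CLASSPATH=$SPARK_CLASSPATH:{driver_jar}\n"
    if text and not text.endswith("\n"):
        line = "\n" + line  # do not glue onto an unterminated last line
    with env.open("a") as f:
        f.write(line)
    return True
```

Running add_driver_to_spark_env twice leaves a single export line, so the helper can be called safely from a provisioning script.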

Command-line help for the Thrift JDBC Server:

cd $SPARK_HOME/sbin
./start-thriftserver.sh --help
Usage: ./sbin/start-thriftserver [options] [thrift server options]
Spark assembly has been built with Hive, including Datanucleus jars on classpath
Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your applications main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  --py-files PY_FILES         Comma-separated list of .zip, .egg, or .py files to place
                              on the PYTHONPATH for Python apps.
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor.
  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.
  --driver-memory MEM         Memory for driver (e.g. 1000M, 2G) (Default: 512M).
  --driver-java-options       Extra Java options to pass to the driver.
  --driver-library-path       Extra library path entries to pass to the driver.
  --driver-class-path         Extra class path entries to pass to the driver. Note that
                              jars added with --jars are automatically included in the
                              classpath.
  --executor-memory MEM       Memory per executor (e.g. 1000M, 2G) (Default: 1G).
  --help, -h                  Show this help message and exit
  --verbose, -v               Print additional debug output

 Spark standalone with cluster deploy mode only:
  --driver-cores NUM          Cores for driver (Default: 1).
  --supervise                 If given, restarts the driver on failure.

 Spark standalone and Mesos only:
  --total-executor-cores NUM  Total cores for all executors.

 YARN-only:
  --executor-cores NUM        Number of cores per executor (Default: 1).
  --queue QUEUE_NAME          The YARN queue to submit to (Default: "default").
  --num-executors NUM         Number of executors to launch (Default: 2).
  --archives ARCHIVES         Comma separated list of archives to be extracted into the
                              working directory of each executor.

Thrift server options:
    --hiveconf <property=value>   Use value for given property

The --master option has the same meaning as in the Spark SQL CLI.

Command-line help for beeline:

cd $SPARK_HOME/bin
./beeline --help
Usage: java org.apache.hive.cli.beeline.BeeLine
   -u <database url>               the JDBC URL to connect to
   -n <username>                   the username to connect as
   -p <password>                   the password to connect as
   -d <driver class>               the driver class to use
   -e <query>                      query that should be executed
   -f <file>                       script file that should be executed
   --color=[true/false]            control whether color is used for display
   --showHeader=[true/false]       show column names in query results
   --headerInterval=ROWS;          the interval between which heades are displayed
   --fastConnect=[true/false]      skip building table/column list for tab-completion
   --autoCommit=[true/false]       enable/disable automatic transaction commit
   --verbose=[true/false]          show verbose error messages and debug info
   --showWarnings=[true/false]     display connection warnings
   --showNestedErrs=[true/false]   display nested errors
   --numberFormat=[pattern]        format numbers using DecimalFormat pattern
   --force=[true/false]            continue running script even after errors
   --maxWidth=MAXWIDTH             the maximum width of the terminal
   --maxColumnWidth=MAXCOLWIDTH    the maximum width to use when displaying columns
   --silent=[true/false]           be more silent
   --autosave=[true/false]         automatically save preferences
   --outputformat=[table/vertical/csv/tsv]   format mode for result display
   --isolation=LEVEL               set the transaction isolation level
   --help                          display this message

Starting the Thrift JDBC Server and beeline

Start the Thrift JDBC Server (default port: 10000):

cd $SPARK_HOME/sbin
./start-thriftserver.sh

How do you change the port the Thrift JDBC Server listens on? Set hive.server2.thrift.port through --hiveconf:

./start-thriftserver.sh --hiveconf hive.server2.thrift.port=14000
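
The same --hiveconf mechanism can be driven from a script, for instance to start the server on a non-default port. A minimal Python sketch; thriftserver_command and start_thriftserver are hypothetical helpers that only assemble and run the command line from options listed in the start script's help output ([options] such as --master first, then the Thrift server's --hiveconf entries).

```python
import os
import subprocess


def thriftserver_command(spark_home, port=None, master=None, hiveconf=None):
    """Build the argv for $SPARK_HOME/sbin/start-thriftserver.sh.

    port     -> passed as --hiveconf hive.server2.thrift.port=<port>
    master   -> passed as --master (same meaning as in the Spark SQL CLI)
    hiveconf -> dict of extra Hive properties, each passed via --hiveconf
    """
    cmd = [os.path.join(spark_home, "sbin", "start-thriftserver.sh")]
    if master is not None:
        cmd += ["--master", master]  # Spark options come before Thrift server options
    props = dict(hiveconf or {})
    if port is not None:
        props["hive.server2.thrift.port"] = str(port)
    for key, value in props.items():
        cmd += ["--hiveconf", f"{key}={value}"]
    return cmd


def start_thriftserver(spark_home, **kwargs):
    """Run the start script; subprocess.run waits for the script process to exit."""
    return subprocess.run(thriftserver_command(spark_home, **kwargs), check=True)
```

thriftserver_command("/opt/spark", port=14000) yields the argument list for the command above, where /opt/spark stands in for your SPARK_HOME.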

For details on HiveServer2 clients, see: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients

Start beeline

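cd $SPARK_HOME/bin
./beeline -u jdbc:hive2://hadoop000:10000/default -n hadoop

Here hadoop000 is the host running the Thrift JDBC Server, 10000 is its port, default is the database to connect to, and -n hadoop sets the user name.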
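
When beeline is launched from another program, the same connection can be described as an argument list. A minimal Python sketch; beeline_command is a hypothetical helper that uses only the -u, -n, -p and -e options from the beeline help output, with host, port and database folded into the jdbc:hive2:// URL.

```python
import os


def beeline_command(spark_home, host, port=10000, database="default",
                    user=None, password=None, query=None):
    """Build the argv for $SPARK_HOME/bin/beeline from its documented options."""
    cmd = [os.path.join(spark_home, "bin", "beeline"),
           "-u", f"jdbc:hive2://{host}:{port}/{database}"]
    if user is not None:
        cmd += ["-n", user]
    if password is not None:
        cmd += ["-p", password]  # note: visible to other users in the process list
    if query is not None:
        cmd += ["-e", query]  # execute this statement instead of typing it at the prompt
    return cmd
```

Without query, beeline_command("/opt/spark", "hadoop000", user="hadoop") reproduces the interactive command above (with /opt/spark standing in for SPARK_HOME); passing query="SHOW TABLES" adds -e so beeline executes that statement.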

Test SQL queries (run at the beeline prompt):

SELECT track_time, url, session_id, referer, ip, end_user_id, city_id FROM page_views WHERE city_id = -1000 LIMIT 10;
SELECT session_id, count(*) c FROM page_views GROUP BY session_id ORDER BY c DESC LIMIT 10;
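
These test queries can also be run non-interactively by saving them to a file and passing it to beeline with -f. A hedged Python sketch; the helper names are hypothetical, and the URL, user name and page_views table are just the values used in this post.

```python
import os
import subprocess
import tempfile

# The two test queries from this post; page_views is the post's example table.
TEST_SQL = """\
SELECT track_time, url, session_id, referer, ip, end_user_id, city_id
FROM page_views WHERE city_id = -1000 LIMIT 10;
SELECT session_id, count(*) c FROM page_views GROUP BY session_id ORDER BY c DESC LIMIT 10;
"""


def write_sql_script(sql, directory=None):
    """Write sql to a temporary .sql file and return its path (caller deletes it)."""
    fd, path = tempfile.mkstemp(suffix=".sql", dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(sql)
    return path


def beeline_script_command(spark_home, url, user, script_path):
    """Argv that runs every statement in script_path through beeline -f."""
    return [os.path.join(spark_home, "bin", "beeline"),
            "-u", url, "-n", user, "-f", script_path]


def run_test_queries(spark_home, url="jdbc:hive2://hadoop000:10000/default", user="hadoop"):
    """Write TEST_SQL to a temp file, run it with beeline, then remove the file."""
    path = write_sql_script(TEST_SQL)
    try:
        return subprocess.run(beeline_script_command(spark_home, url, user, path))
    finally:
        os.remove(path)
```

Per the beeline help output, adding --force=true to the argument list would keep the script running after a failing statement.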
