The Underwhelming JdbcRDD
Today I set out to move data from MySQL into an RDD. I had long known about JdbcRDD and was eager to try it, only to find it something of a letdown.
First, take a look at the definition of JdbcRDD:
/**
 * An RDD that executes an SQL query on a JDBC connection and reads results.
 * For usage example, see test case JdbcRDDSuite.
 *
 * @param getConnection a function that returns an open Connection.
 *   The RDD takes care of closing the connection.
 * @param sql the text of the query.
 *   The query must contain two ? placeholders for parameters used to partition the results.
 *   E.g. "select title, author from books where ? <= id and id <= ?"
 * @param lowerBound the minimum value of the first placeholder
 * @param upperBound the maximum value of the second placeholder
 *   The lower and upper bounds are inclusive.
 * @param numPartitions the number of partitions.
 *   Given a lowerBound of 1, an upperBound of 20, and a numPartitions of 2,
 *   the query would be executed twice, once with (1, 10) and once with (11, 20)
 * @param mapRow a function from a ResultSet to a single row of the desired result type(s).
 *   This should only call getInt, getString, etc; the RDD takes care of calling next.
 *   The default maps a ResultSet to an array of Object.
 */
class JdbcRDD[T: ClassTag](
    sc: SparkContext,
    getConnection: () => Connection,
    sql: String,
    lowerBound: Long,
    upperBound: Long,
    numPartitions: Int,
    mapRow: (ResultSet) => T = JdbcRDD.resultSetToObjectArray _)
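As a quick aside, the partitioning arithmetic the Scaladoc describes is easy to reproduce. The helper below is my own sketch of that arithmetic, not the actual Spark source, but it yields the (1, 10) and (11, 20) split from the example above:

// A small sketch (assumed helper, not Spark API) of how the inclusive range
// [lowerBound, upperBound] can be split into numPartitions (start, end) pairs,
// each pair filling the two ? placeholders of one partition's query.
def splitBounds(lowerBound: Long, upperBound: Long, numPartitions: Int): Seq[(Long, Long)] = {
  val length = 1 + upperBound - lowerBound
  (0 until numPartitions).map { i =>
    val start = lowerBound + (i * length) / numPartitions
    val end   = lowerBound + ((i + 1) * length) / numPartitions - 1
    (start, end)
  }
}

splitBounds(1, 20, 2)   // Vector((1,10), (11,20)), matching the Scaladoc example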
Here is an example:
package test

import java.sql.{Connection, DriverManager, ResultSet}

import org.apache.spark.rdd.JdbcRDD
import org.apache.spark.{SparkConf, SparkContext}

object spark_mysql {
  def main(args: Array[String]) {
    //val conf = new SparkConf().setAppName("spark_mysql").setMaster("local")
    val sc = new SparkContext("local", "spark_mysql")

    // Returns an open connection; JdbcRDD takes care of closing it.
    def createConnection() = {
      Class.forName("com.mysql.jdbc.Driver").newInstance()
      DriverManager.getConnection("jdbc:mysql://192.168.0.15:3306/wsmall", "root", "passwd")
    }

    // Maps one row of the ResultSet to a tuple of its first two columns.
    def extractValues(r: ResultSet) = {
      (r.getString(1), r.getString(2))
    }

    val data = new JdbcRDD(sc,
      createConnection,
      "SELECT id,aa FROM bbb where ? <= ID AND ID <= ?",
      lowerBound = 3,
      upperBound = 5,
      numPartitions = 1,
      mapRow = extractValues)

    // Print the fetched rows; the original post showed the output as a screenshot.
    data.collect().foreach(println)

    sc.stop()
  }
}
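With lowerBound = 3, upperBound = 5 and numPartitions = 1 as above, the two placeholders are filled just once, so the query effectively runs a single time as SELECT id,aa FROM bbb where 3 <= ID AND ID <= 5.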
The MySQL table data and the run output were shown as screenshots in the original post (not preserved here).
As you can see, JdbcRDD's sql parameter must contain two ? placeholders, which exist so that lowerBound and upperBound can set the boundaries of the WHERE clause. If that were all, it would be acceptable; the sad part is that both lowerBound and upperBound are typed as Long. How many key or query columns in real-world schemas are actually of type Long? Still, with the JdbcRDD source as a reference, you can write a JdbcRDD variant that fits your own needs (see the sketch below), which is some consolation.
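As a concrete illustration of that escape hatch, here is a minimal sketch, not from the original post, of a JdbcRDD-style RDD that partitions by a caller-supplied list of WHERE predicates instead of two Long bounds, so String (or any other) key columns work too. The names CustomJdbcRDD and PredicatePartition and the $COND marker are my own illustrative choices.

import java.sql.{Connection, ResultSet}

import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD
import org.apache.spark.{Partition, SparkContext, TaskContext}

// One partition per predicate string.
class PredicatePartition(val idx: Int, val predicate: String) extends Partition {
  override def index: Int = idx
}

class CustomJdbcRDD[T: ClassTag](
    sc: SparkContext,
    getConnection: () => Connection,
    sql: String,              // query text containing the marker $COND
    predicates: Seq[String],  // e.g. Seq("id <= 'm'", "id > 'm'")
    mapRow: ResultSet => T)
  extends RDD[T](sc, Nil) {

  override def getPartitions: Array[Partition] =
    predicates.zipWithIndex.map { case (p, i) => new PredicatePartition(i, p) }.toArray

  override def compute(split: Partition, context: TaskContext): Iterator[T] = {
    val part = split.asInstanceOf[PredicatePartition]
    val conn = getConnection()
    try {
      val stmt = conn.createStatement()
      // Substitute this partition's predicate into the query text.
      val rs = stmt.executeQuery(sql.replace("$COND", part.predicate))
      // Materialize the partition eagerly so the connection can be closed here;
      // the real JdbcRDD streams rows lazily instead.
      val rows = ArrayBuffer.empty[T]
      while (rs.next()) rows += mapRow(rs)
      rows.iterator
    } finally {
      conn.close()
    }
  }
}

Reusing createConnection and extractValues from the example above, the table could then be read with, say, two string-keyed partitions:

val data = new CustomJdbcRDD(sc, createConnection,
  "SELECT id, aa FROM bbb WHERE $COND",
  Seq("id <= 'm'", "id > 'm'"),
  extractValues)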
Lately I have been busy with the Spark course at 炼数成金 and have not had much time to maintain the blog. For readers who want a deeper understanding of Spark, I recommend a friend's blog, http://www.cnblogs.com/cenyuhai/ , which has quite a few source-code posts that help in understanding Spark's internals.