
Spark DataFrame type conversion

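I read a table and wanted to apply a binarization feature transform to it. Binarization, however, requires the input column to be of type double, so how do you convert the column type?

You can do the conversion directly with a Spark Column: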


 

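import org.apache.spark.sql.DataFrame;
import static org.apache.spark.sql.types.DataTypes.DoubleType;

// "hive" is an existing HiveContext.
DataFrame dataset = hive.sql("select age,sex,race from hive_race_sex_bucktizer");

/**
 * Type conversion: cast "age" to double and keep the other columns unchanged.
 */
dataset = dataset.select(
        dataset.col("age").cast(DoubleType).as("age"),
        dataset.col("sex"),
        dataset.col("race"));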
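To see why the double type matters, here is a minimal plain-Java sketch of the two steps involved: widening integer ages to double, then thresholding them. It does not use Spark. The class `BinarizeSketch`, its helper names, the sample ages, and the threshold of 30.0 are all hypothetical, and the rule "strictly greater than the threshold becomes 1.0, everything else 0.0" is my assumption about how the binarization step behaves.

```java
import java.util.Arrays;

public class BinarizeSketch {
    // Hypothetical helper: widen long values to double,
    // which is what the cast does for the "age" column.
    public static double[] toDouble(long[] values) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (double) values[i];
        }
        return out;
    }

    // Assumed rule: values strictly above the threshold map to 1.0, all others to 0.0.
    public static double[] binarize(double[] values, double threshold) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] > threshold ? 1.0 : 0.0;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] ages = {17L, 30L, 45L};
        double[] binarized = binarize(toDouble(ages), 30.0);
        System.out.println(Arrays.toString(binarized)); // prints [0.0, 0.0, 1.0]
    }
}
```

Note that an age equal to the threshold (30) falls on the 0.0 side under this rule.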

 

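The cast approach really is that simple. Compare it with how I used to convert types: iterate over the rows to build another RDD that satisfies the type requirements, then create a DataFrame from that RDD. So complicated!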

 

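import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.types.DataTypes.*;

// Map every row to a new row whose values already have the target types.
JavaRDD<Row> parseDataset = dataset.toJavaRDD().map(new Function<Row, Row>() {
    @Override
    public Row call(Row row) throws Exception {
        System.out.println(row); // debug output, printed for every row
        long age = row.getLong(row.fieldIndex("age"));
        String sex = row.getAs("sex");
        String race = row.getAs("race");
        // Encode race as a numeric code; unknown values end up as 0.
        double raceV = -1;
        if ("white".equalsIgnoreCase(race)) {
            raceV = 1;
        } else if ("black".equalsIgnoreCase(race)) {
            raceV = 2;
        } else if ("yellow".equalsIgnoreCase(race)) {
            raceV = 3;
        } else if ("Asian-Pac-Islander".equalsIgnoreCase(race)) {
            raceV = 4;
        } else if ("Amer-Indian-Eskimo".equalsIgnoreCase(race)) {
            raceV = 3; // shares code 3 with "yellow", as in the original
        } else {
            raceV = 0;
        }
        // Encode sex as 1 for "male" and 0 otherwise.
        return RowFactory.create(age, ("male".equalsIgnoreCase(sex) ? 1 : 0), raceV);
    }
});

// Schema describing the rows produced above.
StructType schema = new StructType(new StructField[]{
        createStructField("_age", LongType, false),
        createStructField("_sex", IntegerType, false),
        createStructField("_race", DoubleType, false)
});

DataFrame df = hive.createDataFrame(parseDataset, schema);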
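Part of what makes the RDD version long is the if/else chain that encodes the strings. As a minimal sketch, the same encoding could be kept in a case-insensitive lookup table and called from inside `call(...)`. The class `RaceEncoderSketch` and its method names are hypothetical; the codes are copied from the chain above, including the value 3 that "Amer-Indian-Eskimo" shares with "yellow".

```java
import java.util.Map;
import java.util.TreeMap;

public class RaceEncoderSketch {
    // Case-insensitive lookup table holding the same codes as the if/else chain.
    private static final Map<String, Double> RACE_CODES =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    static {
        RACE_CODES.put("white", 1.0);
        RACE_CODES.put("black", 2.0);
        RACE_CODES.put("yellow", 3.0);
        RACE_CODES.put("Asian-Pac-Islander", 4.0);
        RACE_CODES.put("Amer-Indian-Eskimo", 3.0); // same code as "yellow", as in the chain
    }

    // Returns the numeric code for a race label, or 0.0 for null and unknown labels.
    public static double encodeRace(String race) {
        if (race == null) {
            return 0.0; // a TreeMap with a comparator cannot look up null keys
        }
        Double code = RACE_CODES.get(race);
        return code == null ? 0.0 : code;
    }

    // Returns 1 for "male" (any letter case) and 0 for everything else.
    public static int encodeSex(String sex) {
        return "male".equalsIgnoreCase(sex) ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(encodeRace("White")); // prints 1.0
        System.out.println(encodeSex("female")); // prints 0
    }
}
```

This only shortens the encoding step. Converting a column's type still needs nothing more than the `cast` call.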

Keep exploring, keep experimenting!

 
