Hive: cleaning data with a script (timestamp to date)
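Create weekday_mapper.py:

import sys
import datetime

# Read tab-separated rows from stdin, one row per line.
for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    # Convert the Unix timestamp to an ISO weekday (1 = Monday ... 7 = Sunday).
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    # Emit the row with the timestamp replaced by the weekday.
    print('\t'.join([userid, movieid, rating, str(weekday)]))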
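The conversion step in the mapper can be checked locally before the script is handed to Hive. The following is only a sketch: the function names `to_weekday` and `transform_line` are hypothetical, and it assumes Python 3. Unlike the mapper above, which uses the local time zone of the machine running it, this variant pins the conversion to UTC so the result does not depend on where it runs. That is an assumption made for this example, not something the original script does.

```python
import datetime


def to_weekday(unixtime):
    """Return the ISO weekday (1 = Monday ... 7 = Sunday) of a Unix timestamp, in UTC."""
    ts = float(unixtime)
    moment = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
    return moment.isoweekday()


def transform_line(line):
    """Replace the trailing unixtime field of one tab-separated row with its weekday."""
    userid, movieid, rating, unixtime = line.strip().split('\t')
    return '\t'.join([userid, movieid, rating, str(to_weekday(unixtime))])


# Example row in the (userid, movieid, rating, unixtime) layout used above.
print(transform_line("196\t242\t3\t881250949\n"))
```

Timestamps close to midnight can map to different weekdays in UTC and in local time, so the two variants may disagree on a small share of rows.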
Use the mapper script:
CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';

add FILE weekday_mapper.py;

INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;

SELECT weekday, COUNT(*)
FROM u_data_new
GROUP BY weekday;
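The final query counts rows per weekday. For a quick sanity check on a small sample, the same aggregation can be approximated in plain Python over the mapper's output. This is a rough sketch, not part of the Hive job: `count_by_weekday` is a hypothetical helper, and it assumes each row has the four tab-separated fields produced by the mapper.

```python
from collections import Counter


def count_by_weekday(rows):
    """Count rows per weekday, like SELECT weekday, COUNT(*) ... GROUP BY weekday."""
    counts = Counter()
    for row in rows:
        fields = row.rstrip('\n').split('\t')
        counts[int(fields[3])] += 1  # the weekday is the fourth column
    return sorted(counts.items())


# Made-up sample rows in the (userid, movieid, rating, weekday) layout.
sample = ["1\t2\t3\t4", "1\t5\t3\t4", "2\t2\t1\t1"]
print(count_by_weekday(sample))
```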
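The same script-based pattern can also be written with Hive's MAP and REDUCE keywords, as in this word-count example:

FROM (
  MAP doctext USING 'python wc_mapper.py' AS (word, cnt)
  FROM docs
  CLUSTER BY word
) a
REDUCE word, cnt USING 'python wc_reduce.py';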
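The query refers to wc_mapper.py and wc_reduce.py without showing them. The sketch below illustrates what such a pair might do, written as two plain functions so it can run without Hive. The function names and the whitespace-based tokenization are assumptions. The reduce step relies on its input being grouped by word, which is what CLUSTER BY word provides to each reducer.

```python
import itertools


def map_words(lines):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reduce_counts(pairs):
    """Reduce step: sum counts per word; pairs must arrive grouped by word."""
    for word, group in itertools.groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(cnt) for _, cnt in group)


# sorted() stands in for CLUSTER BY word in this local run.
pairs = sorted(map_words(["a b a", "b c"]))
result = list(reduce_counts(pairs))
print(result)
```

In a real job each function would be wrapped in a script that reads tab-separated rows from stdin and prints tab-separated rows to stdout, as weekday_mapper.py does.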