首页 > 代码库 > hive:子查询
hive:子查询
hive本身支持的子查询非常有限,Hive不支持where子句中的子查询,只允许子查询在from中出现
错误写法:
insert into table branch_atmzc_sumSelect XT_OP_TRL, SA_TX_DT,"取款-存款",b.cr_tx_amt- a.cr_tx_amt as cr_tx_amt from branch_atmzc a join branch_atmzc b on (a.XT_OP_TRL = b.XT_OP_TRL and a.SA_TX_DT = b.SA_TX_DT and a.tran_cd = ‘ATM存款‘ and b.tran_cd = ‘ATM取款‘), counts from branch_atmzcgroup by XT_OP_TRL, SA_TX_DT,cr_tx_amt,counts;
正确写法:
insert into table branch_atmzc_sumSelect a.XT_OP_TRL, a.SA_TX_DT,"取款-存款",b.cr_tx_amt- a.cr_tx_amt ,b.counts+a.counts from branch_atmzc a join branch_atmzc b on (a.XT_OP_TRL = b.XT_OP_TRL and a.SA_TX_DT = b.SA_TX_DT and a.tran_cd = ‘ATM存款‘ and b.tran_cd = ‘ATM取款‘)
另外一种:考虑效率
SELECT t1.产品类型, COUNT(DISTINCT (IF(t2.用户ID IS NULL, NULL, t1.用户ID))) AS KEEP_UVFROM( SELECT 产品类型, 用户ID FROM 事实表 WHERE (`DATE` >= 20140201 AND `DATE` <= 20140228)) t1LEFT OUTER JOIN( SELECT 产品类型, 用户ID FROM 事实表 WHERE (`DATE` >= 20140101 AND `DATE` <= 20140131)) t2 ON (t1.产品类型 = t2.产品类型 AND t1.用户ID = t2.用户ID)GROUP BY t1.产品类型
本身表包含的字段信息多,时间跨度大。对于这种对于IN / EXISTS子查询(准确地说,这里是非相关子查询)有一种高效的实现,就是LEFT SEMI JOIN:
LEFT SEMI JOIN implements the uncorrelated IN/EXISTS subquery semantics in an efficient way.
left join
SELECT 产品类型, COUNT(DISTINCT t1.用户ID) AS KEEP_UVFROM( SELECT 产品类型, 用户ID FROM 事实表 WHERE (`DATE` >= 20140201 AND `DATE` <= 20140228)) t1LEFT SEMI JOIN( SELECT 产品类型, 用户ID FROM 事实表 WHERE (`DATE` >= 20140101 AND `DATE` <= 20140131)) t2 ON (t1.产品类型 = t2.产品类型 AND t1.用户ID = t2.用户ID)GROUP BY 产品类型
hive:子查询
声明:以上内容来自用户投稿及互联网公开渠道收集整理发布,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任,若内容有误或涉及侵权可进行投诉: 投诉/举报 工作人员会在5个工作日内联系你,一经查实,本站将立刻删除涉嫌侵权内容。