首页 > 代码库 > Hive自定义函数UDAF开发
Hive自定义函数UDAF开发
Hive自定义函数UDAF开发
Hive支持自定义函数,UDAF是接受多行,输出一行。
通常是group by时用到这种函数。
其实最好的学习资料就是官方自带的examples了。
我这里用的是0.10版本hive,所以对于的examples在
https://github.com/apache/hive/tree/branch-0.10/contrib/src/java/org/apache/hadoop/hive/contrib/udaf/example
我这里的功能需求是:
actionCount(act_code,act_times,‘1‘)
如果act_code==‘1‘,则将在一个group里面的act_times加起来。
package hive.udaf; import org.apache.hadoop.hive.ql.exec.UDAF; import org.apache.hadoop.hive.ql.exec.UDAFEvaluator; /** * * It should be very easy to follow and can be used as an example for writing * new UDAFs. * * Note that Hive internally uses a different mechanism (called GenericUDAF) to * implement built-in aggregation functions, which are harder to program but * more efficient. * */ public final class ActionCount extends UDAF { /** * The internal state of an aggregation for average. * * Note that this is only needed if the internal state cannot be represented * by a primitive. * * The internal state can also contains fields with types like * ArrayList<String> and HashMap<String,Double> if needed. */ public static class UDAFState { private long mCount; private long mSum; } /** * The actual class for doing the aggregation. Hive will automatically look * for all internal classes of the UDAF that implements UDAFEvaluator. */ public static class UDAFExampleAvgEvaluator implements UDAFEvaluator { UDAFState state; public UDAFExampleAvgEvaluator() { super(); state = new UDAFState(); init(); } /** * Reset the state of the aggregation. */ public void init() { state.mSum = 0; state.mCount = 0; } /** * Iterate through one row of original data. * * The number and type of arguments need to the same as we call this UDAF * from Hive command line. * * This function should always return true. */ public boolean iterate(String act_code,long act_times,String act_type) // 来了一行 { if (act_code .equals(act_type)) { state.mSum += act_times; state.mCount++; } return true; } /** * Terminate a partial aggregation and return the state. If the state is a * primitive, just return primitive Java classes like Integer or String. */ public UDAFState terminatePartial() {//状态传递 // This is SQL standard - average of zero items should be null. return state.mCount == 0 ? null : state; } /** * Merge with a partial aggregation. * * This function should always have a single argument which has the same * type as the return value of terminatePartial(). */ public boolean merge(UDAFState o) {//子任务合并 if (o != null) { state.mSum += o.mSum; state.mCount += o.mCount; } return true; } /** * Terminates the aggregation and return the final result. */ public long terminate() {//返回最终结果 // This is SQL standard - average of zero items should be null. return state.mCount == 0 ? 0 : state.mSum; } } private ActionCount() { // prevent instantiation } }
关键还是要深刻理解map-reduce工作模型,才能更好驾驭hive。
本文作者:linger
本文链接:http://blog.csdn.net/lingerlanlan/article/details/41920151
Hive自定义函数UDAF开发
声明:以上内容来自用户投稿及互联网公开渠道收集整理发布,本网站不拥有所有权,未作人工编辑处理,也不承担相关法律责任,若内容有误或涉及侵权可进行投诉: 投诉/举报 工作人员会在5个工作日内联系你,一经查实,本站将立刻删除涉嫌侵权内容。