首页 > 代码库 > Pentaho的Mondrian对Hive的支持
Pentaho的Mondrian对Hive的支持
需求描述
考虑直接在Hive或者Impala等Big Data方案,能够支持MDX查询,现调研一下Mondrian对hive的支持情况。
环境准备
hive环境,采用hive-0.10-cdh4.2.1 客户端程序使用的类库:mondrian-3.6.0、olap4j-1.2.0-SNAPSHOT
数据准备
来源于网上一个数据源,准备四张表 Customer - 客户信息维表 Product - 产品维表 ProductType - 产品类表维表 Sale - 销售记录表 为了方便测试数据与MDX正确性,将数据导入到MySQL中一份,用来与Hive查询结果进行对比。
在MySQL创建对应表及数据
具体SQL语句:
create database hive_test; use hive_test; /**用户信息表*/ create table Customer ( cusId int not null, gender char(1) null, constraint PK_CUSTOMER primary key(cusId) ); /**产品表*/ create table Product ( proId int not null, proTypeId int null, proName varchar(32) null, constraint PK_PRODUCT primary key(proId) ); /**产品类别表*/ create table ProductType ( proTypeId int not null, proTypeName varchar(32) null, constraint PK_PRODUCTTYPE primary key (proTypeId) ); /**销售记录表/ create table Sale ( saleId int not null, proId int null, cusId int null, unitPrice float null, number int null, constraint PK_SALE primary key(saleId) ); insert into Customer(cusId,gender) values(1,‘F‘); insert into Customer(cusId,gender) values(2,‘M‘); insert into Customer(cusId,gender) values(3,‘M‘); insert into Customer(cusId,gender) values(4,‘F‘); insert into ProductType(proTypeId,proTypeName) values(1,‘electrical‘); insert into ProductType(proTypeId,proTypeName) values(2,‘digital‘); insert into ProductType(proTypeId,proTypeName) values(3,‘furniture‘); insert into Product(proId,proTypeId,proName) values(1,1,‘washing machine‘); insert into Product(proId,proTypeId,proName) values(2,1,‘television‘); insert into Product(proId,proTypeId,proName) values(3,2,‘mp3‘); insert into Product(proId,proTypeId,proName) values(4,2,‘mp4‘); insert into Product(proId,proTypeId,proName) values(5,2,‘camera‘); insert into Product(proId,proTypeId,proName) values(6,3,‘chair‘); insert into Product(proId,proTypeId,proName) values(7,3,‘desk‘); insert into sale(saleId,proId,cusId,unitPrice,number) values(1,1,1,340.34,2); insert into sale(saleId,proId,cusId,unitPrice,number) values(2,1,2,140.34,1); insert into sale(saleId,proId,cusId,unitPrice,number) values(3,2,3,240.34,3); insert into sale(saleId,proId,cusId,unitPrice,number) values(4,3,4,540.34,4); insert into sale(saleId,proId,cusId,unitPrice,number) values(5,4,1,80.34,5); insert into sale(saleId,proId,cusId,unitPrice,number) values(6,5,2,90.34,26); insert into sale(saleId,proId,cusId,unitPrice,number) values(7,6,3,140.34,7); insert into sale(saleId,proId,cusId,unitPrice,number) values(8,7,4,640.34,28); insert into sale(saleId,proId,cusId,unitPrice,number) values(9,6,1,140.34,29); insert into sale(saleId,proId,cusId,unitPrice,number) values(10,7,2,740.34,29); insert into sale(saleId,proId,cusId,unitPrice,number) values(11,5,3,30.34,28); insert into sale(saleId,proId,cusId,unitPrice,number) values(12,4,4,1240.34,72); insert into sale(saleId,proId,cusId,unitPrice,number) values(13,3,1,314.34,27); insert into sale(saleId,proId,cusId,unitPrice,number) values(14,3,2,45.34,27);
hive中测试数据准备
在虚拟机准备好hive测试环境,采用hive-0.10-cdh4.2.1版本 具体语句:
create database mondrian; use mondrian; create table Sale (saleId INT, proId INT, cusId INT, unitPrice FLOAT, number INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; create table Product (proId INT, proTypeId INT, proName STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; create table ProductType (proTypeId INT, proTypeName STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; create table Customer (cusId INT, gender STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ","; # Customer文件 1,F 2,M 3,M 4,F load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Customer" OVERWRITE into table Customer; # ProductType文件 1,electrical 2,digital 3,furniture load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/ProductType" into table ProductType; # Product数据文件 1,1,washing machine 2,1,television 3,2,mp3 4,2,mp4 5,2,camera 6,3,chair 7,3,desk load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Product" into table Product; # Sale数据文件 1,1,1,340.34,2 2,1,2,140.34,1 3,2,3,240.34,3 4,3,4,540.34,4 5,4,1,80.34,5 6,5,2,90.34,26 7,6,3,140.34,7 8,7,4,640.34,28 9,6,1,140.34,29 10,7,2,740.34,29 11,5,3,30.34,28 12,4,4,1240.34,72 13,3,1,314.34,27 14,3,2,45.34,27 load data local inpath "/home/hzwangxx/cdh4/hive/myTmp/Sale" into table Sale;
元数据定义
Cube、Measure等元数据定义见:
<Schema name="hello"> <Cube name="Sales"> <!-- 事实表(fact table) --> <Table name="Sale"/> <!-- 客户维 --> <Dimension name="cusGender" foreignKey="cusId"> <Hierarchy hasAll="true" allMemberName="allGender" primaryKey="cusId"> <Table name="Customer"/> <Level name="gender" column="gender"/> </Hierarchy> </Dimension> <!-- 产品类别维 --> <Dimension name="proType" foreignKey="proId"> <Hierarchy hasAll="true" allMemberName="allPro" primaryKey="proId" primaryKeyTable="Product"> <join leftKey="proTypeId" rightKey="proTypeId"> <Table name="Product"/> <Table name="ProductType"/> </join> <Level name="proTypeId" column="proTypeId" nameColumn="proTypeName" uniqueMembers="true" table="ProductType"/> <Level name="proId" column="proId" nameColumn="proName" uniqueMembers="true" table="Product"/> </Hierarchy> </Dimension> <Measure name="numb" column="number" aggregator="sum" datatype="Numeric"/> <Measure name="totalSale" aggregator="sum" formatString="$ #,##0.00"> <!-- unitPrice*number所得值的列 --> <MeasureExpression> <SQL dialect="generic">unitPrice*number</SQL> </MeasureExpression> </Measure> <CalculatedMember name="averPri" dimension="Measures"> <Formula>[Measures].[totalSale] / [Measures].[numb]</Formula> <CalculatedMemberProperty name="FORMAT_STRING" value="$ #,##0.00"/> </CalculatedMember> </Cube> </Schema>
测试MDX
1. 查询所有类别产品销售总件数、平均价格和总销售额
"select " + "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns," + "{([proType].[allPro],[cusGender].[allGender])} " + "on rows " + "from [Sales]"
对Hive支持情况
建立Connection连接方式
建立Connection连接方式有两种:
使用mondrian自带的DriverManager获取Connection实例
mondrian中自带的API
# 这里的Connection、DriverManager、Query、Result等都是mondrian提供的API接口 Connection connection = DriverManager.getConnection( "Provider=mondrian;" + "Jdbc=jdbc:hive2://node02:10000/mondrian;" + "JdbcUser=;JdbcPassword=;" + "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" + "JdbcDrivers=org.apache.hive.jdbc.HiveDriver", null); Query query = connection.parseQuery( "select \n" + "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns,\n" + "{([proType].[allPro],[cusGender].[allGender])} \n" + "on rows\n" + "from [Sales]\n"); @SuppressWarnings("deprecation") Result result = connection.execute(query); PrintWriter pw = new PrintWriter(System.out); result.print(pw); pw.flush();
对应的连接MySQL,只需要将getConnection中的connectString换成如下即可:
Connection connection = DriverManager.getConnection( "Provider=mondrian;" + "Jdbc=jdbc:mysql://localhost:3306/hive_test; JdbcUser=root;" + "JdbcPassword=123;" + "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" + "JdbcDrivers=com.mysql.jdbc.Driver", null);
测试的时候连接MySQL时,没什么问题,在使用相同的API连Hive的时候,有点问题。down了一下源码发现它的过程是这样的:先去连接池中取一个Connection实例,没有的话通过Factory创建一个Connection放入池里。而在Mondrian创建Factory的时候指定了两个属性:autoCommit和readOnly,RDBMS的Driver都没什么问题,Hive的JDBC提供的HiveConnection中对这两个属性的set方法实现得很诡异,都是直接抛异常了:
public void setReadOnly(boolean readOnly) throws SQLException { // TODO Auto-generated method stub throw new SQLException("Method not supported"); } public void setAutoCommit(boolean autoCommit) throws SQLException { if (autoCommit) { throw new SQLException("enabling autocommit is not supported"); } }
将这两行抛出异常的地方注释掉,rebuild一下jar包,MDX就可以顺利执行完了。
使用JDK原生的DriverManager获取Connection实例
可以使用JDK原生的DriverManager获取Connection然后再使用Olap4j的封装成OLapConnection然后再去执行MDX 具体连接示例如下:
Class.forName("mondrian.olap4j.MondrianOlap4jDriver"); Connection nativeConn = DriverManager.getConnection("jdbc:mondrian:Jdbc=jdbc:hive2://node02:10000/mondrian; JdbcUser=;" + "JdbcPassword=;" + "Catalog=/Users/apple/IdeaProjects/hbase-manage/src/main/resources/MiniMart.xml;" + "JdbcDrivers=org.apache.hive.jdbc.HiveDriver"); OlapConnection olapConn = nativeConn.unwrap(OlapConnection.class); if (olapConn == null) { throw new IllegalStateException("Connection is null"); } OlapStatement statement = olapConn.createStatement(); CellSet cellSet = statement.executeOlapQuery("select " + "{[Measures].[numb],[Measures].[averPri],[Measures].[totalSale]} on columns," + "{([proType].[allPro],[cusGender].[allGender])} " + "on rows " + "from [Sales]") ; //formatter. RectangularCellSetFormatter formatter = new RectangularCellSetFormatter(false); // Print out. PrintWriter writer = new PrintWriter(System.out); formatter.format(cellSet, writer); writer.flush(); statement.close(); olapConn.close(); nativeConn.close();
指定Database
hive也有类似RDBMS一样有database的概念,在Hive提供的普通Java API中虽然在连接字符串中指定了database,但是它默认的并非你指定的database而是上一次当前客户端或线程使用的database(注:并非default),所以一般使用Hive 客户端必须先执行一下use database。而在OlapConnection和Mondrian提供的Connection都不支持"use database"操作。暂时的解决办法,每次去进行MDX查询的时候先通过普通的Java Api执行一下use database,指定到当前需要查询的数据库中。