首页 > 代码库 > 【Hadoop】HIVE 小结概览

【Hadoop】HIVE 小结概览

一、HIVE概览小结

技术分享

 

二、HIVE安装

Hive只在一个节点上安装即可1.上传tar包2.解压    tar -zxvf hive-0.9.0.tar.gz -C /cloud/3.配置mysql metastore(切换到root用户)    配置HIVE_HOME环境变量    rpm -qa | grep mysql    rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps    rpm -ivh MySQL-server-5.1.73-1.glibc23.i386.rpm     rpm -ivh MySQL-client-5.1.73-1.glibc23.i386.rpm     修改mysql的密码    /usr/bin/mysql_secure_installation    (注意:删除匿名用户,允许用户远程连接)    登陆mysql    mysql -u root -p4.配置hive    cp hive-default.xml.template hive-site.xml     修改hive-site.xml(删除所有内容,只留一个<property></property>)    添加如下内容:    <property>      <name>javax.jdo.option.ConnectionURL</name>      <value>jdbc:mysql://weekend01:3306/hive?createDatabaseIfNotExist=true</value>      <description>JDBC connect string for a JDBC metastore</description>    </property>    <property>      <name>javax.jdo.option.ConnectionDriverName</name>      <value>com.mysql.jdbc.Driver</value>      <description>Driver class name for a JDBC metastore</description>    </property>    <property>      <name>javax.jdo.option.ConnectionUserName</name>      <value>root</value>      <description>username to use against metastore database</description>    </property>    <property>      <name>javax.jdo.option.ConnectionPassword</name>      <value>root</value>      <description>password to use against metastore database</description>    </property>    5.安装hive和mysq完成后,将mysql的连接jar包拷贝到$HIVE_HOME/lib目录下    如果出现没有权限的问题,在mysql授权(在安装mysql的机器上执行)    mysql -uroot -p    #(执行下面的语句  *.*:所有库下的所有表   %:任何IP地址或主机都可以连接)    GRANT ALL PRIVILEGES ON *.* TO root@% IDENTIFIED BY 123 WITH GRANT OPTION;    FLUSH PRIVILEGES;    6.建表(默认是内部表)    create table trade_detail(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by \t;    建分区表    create table td_part(id bigint, account string, income double, expenses double, time string) partitioned by (logdate string) row format delimited fields terminated by \t;    建外部表    create external table td_ext(id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by \t location /td_ext;7.创建分区表    普通表和分区表区别:有大量数据增加的需要建分区表    create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by \t;     分区表加载数据    load data local inpath ./book.txt overwrite into table book partition (pubdate=2010-08-22);        load data local inpath /root/data.am into table beauty partition (nation="USA");        select nation, avg(size) from beauties group by nation order by avg(size);

 

三、HIVE基础

1.上传hive安装包2.解压3.配置    3.1安装mysql        查询以前安装的mysql相关包        rpm -qa | grep mysql        暴力删除这个包        rpm -e mysql-libs-5.1.66-2.el6_3.i686 --nodeps                rpm -ivh MySQL-server-5.1.73-1.glibc23.i386.rpm         rpm -ivh MySQL-client-5.1.73-1.glibc23.i386.rpm                执行命令设置mysql        /usr/bin/mysql_secure_installation                将hive添加到环境变量当中                GRANT ALL PRIVILEGES ON hive.* TO root@% IDENTIFIED BY 123 WITH GRANT OPTION;        FLUSH PRIVILEGES        在hive当中创建两张表        create table trade_detail (id bigint, account string, income double, expenses double, time string) row format delimited fields terminated by \t;        create table user_info (id bigint, account string, name  string, age int) row format delimited fields terminated by \t;                将mysq当中的数据直接导入到hive当中        sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table trade_detail --hive-import --hive-overwrite --hive-table trade_detail --fields-terminated-by ‘\t‘        sqoop import --connect jdbc:mysql://192.168.1.10:3306/itcast --username root --password 123 --table user_info --hive-import --hive-overwrite --hive-table user_info --fields-terminated-by ‘\t‘        创建一个result表保存前一个sql执行的结果        create table result row format delimited fields terminated by \t as select t2.account, t2.name, t1.income, t1.expenses, t1.surplus from user_info t2 join (select account, sum(income) as income, sum(expenses) as expenses, sum(income-expenses) as surplus from trade_detail group by account) t1 on (t1.account = t2.account);                create table user (id int, name string) row format delimited fields terminated by \t        将本地文件系统上的数据导入到HIVE当中        load data local inpath /root/user.txt into table user;                创建外部表        create external table stubak (id int, name string) row format delimited fields terminated by \t location /stubak;                创建分区表        普通表和分区表区别:有大量数据增加的需要建分区表        create table book (id bigint, name string) partitioned by (pubdate string) row format delimited fields terminated by \t;         分区表加载数据        load data local inpath ./book.txt overwrite into table book partition (pubdate=2010-08-22);

 

四、HIVE SQL

set hive.cli.print.header=true;CREATE TABLE page_view(viewTime INT, userid BIGINT,     page_url STRING, referrer_url STRING,     ip STRING COMMENT IP Address of the User) COMMENT This is the page view table PARTITIONED BY(dt STRING, country STRING) ROW FORMAT DELIMITED   FIELDS TERMINATED BY \001STORED AS SEQUENCEFILE;   TEXTFILE//sequencefilecreate table tab_ip_seq(id int,name string,ip string,country string)     row format delimited    fields terminated by ,    stored as sequencefile;insert overwrite table tab_ip_seq select * from tab_ext;//create & loadcreate table tab_ip(id int,name string,ip string,country string)     row format delimited    fields terminated by ,    stored as textfile;load data local inpath /home/hadoop/ip.txt into table tab_ext;//externalCREATE EXTERNAL TABLE tab_ip_ext(id int, name string,     ip STRING,     country STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY , STORED AS TEXTFILE LOCATION /external/hive; // CTAS  用于创建一些临时表存储中间结果CREATE TABLE tab_ip_ctas   ASSELECT id new_id, name new_name, ip new_ip,country new_countryFROM tab_ip_extSORT BY new_id;//insert from select   用于向临时表中追加中间结果数据create table tab_ip_like like tab_ip;insert overwrite table tab_ip_like    select * from tab_ip;//CLUSTER <--相对高级一点,你可以放在有精力的时候才去学习>create table tab_ip_cluster(id int,name string,ip string,country string)clustered by(id) into 3 buckets;load data local inpath /home/hadoop/ip.txt overwrite into table tab_ip_cluster;set hive.enforce.bucketing=true;insert into table tab_ip_cluster select * from tab_ip;select * from tab_ip_cluster tablesample(bucket 2 out of 3 on id); //PARTITIONcreate table tab_ip_part(id int,name string,ip string,country string)     partitioned by (part_flag string)    row format delimited fields terminated by ,;    load data local inpath /home/hadoop/ip.txt overwrite into table tab_ip_part     partition(part_flag=part1);        load data local inpath /home/hadoop/ip_part2.txt overwrite into table tab_ip_part     partition(part_flag=part2);select * from tab_ip_part;select * from tab_ip_part  where part_flag=part2;select count(*) from tab_ip_part  where part_flag=part2;alter table tab_ip change id id_alter string;ALTER TABLE tab_cts ADD PARTITION (partCol = dt) location /external/hive/dt;show partitions tab_ip_part;   //write to hdfsinsert overwrite local directory /home/hadoop/hivetemp/test.txt select * from tab_ip_part where part_flag=part1;    insert overwrite directory /hiveout.txt select * from tab_ip_part where part_flag=part1;//array create table tab_array(a array<int>,b array<string>)row format delimitedfields terminated by \tcollection items terminated by ,;示例数据tobenbrone,laihama,woshishui     13866987898,13287654321abc,iloveyou,itcast     13866987898,13287654321select a[0] from tab_array;select * from tab_array where array_contains(b,word);insert into table tab_array select array(0),array(name,ip) from tab_ext t; //mapcreate table tab_map(name string,info map<string,string>)row format delimitedfields terminated by \tcollection items terminated by ;map keys terminated by :;示例数据:fengjie            age:18;size:36A;addr:usafurong        age:28;size:39C;addr:beijing;weight:180KGload data local inpath /home/hadoop/hivetemp/tab_map.txt overwrite into table tab_map;insert into table tab_map select name,map(name,name,ip,ip) from tab_ext; //structcreate table tab_struct(name string,info struct<age:int,tel:string,addr:string>)row format delimitedfields terminated by \tcollection items terminated by ,load data local inpath /home/hadoop/hivetemp/tab_st.txt overwrite into table tab_struct;insert into table tab_struct select name,named_struct(age,id,tel,name,addr,country) from tab_ext;//cli shellhive -S -e select country,count(*) from tab_ext > /home/hadoop/hivetemp/e.txt  有了这种执行机制,就使得我们可以利用脚本语言(bash shell,python)进行hql语句的批量执行select * from tab_ext sort by id desc limit 5;select a.ip,b.book from tab_ext a join tab_ip_book b on(a.name=b.name);//UDFselect if(id=1,first,no-first),name from tab_ext;hive>add jar /home/hadoop/myudf.jar;hive>CREATE TEMPORARY FUNCTION my_lower AS org.dht.Lower;select my_upper(name) from tab_ext;  

五、HIVE 自定义函数

0.要继承org.apache.hadoop.hive.ql.exec.UDF类实现evaluate自定义函数调用过程:1.添加jar包(在hive命令行里面执行)hive> add jar /root/NUDF.jar;2.创建临时函数hive> create temporary function getNation as cn.itcast.hive.udf.NationUDF;3.调用hive> select id, name, getNation(nation) from beauty;4.将查询结果保存到HDFS中hive> create table result row format delimited fields terminated by \t as select * from beauty order by id desc;    hive> select id, getAreaName(id) as name from tel_rec;create table result row format delimited fields terminated by \t as select id, getNation(nation) from beauties;

 

【Hadoop】HIVE 小结概览