首页 > 代码库 > Spark+Hadoop+Hive集群上数据操作记录

Spark+Hadoop+Hive集群上数据操作记录

[rc@vq18ptkh01 ~]$ hadoop fs -ls /drwxr-xr-x+  - jc_rc      supergroup                 0 2016-11-03 11:46 /dt[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1030.csv /dt[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1031.csv /dt[rc@vq18ptkh01 ~]$ hadoop fs -copyFromLocal wifi_phone_list_1101.csv /dt[rc@vq18ptkh01 ~]$ hadoop fs -ls /dt16/11/03 11:53:16 INFO hdfs.PeerCache: SocketCache disabled.Found 3 items-rw-r--r--+  3 jc_rc supergroup    1548749 2016-11-03 11:48 /dt/wifi_phone_list_1030.csv-rw-r--r--+  3 jc_rc supergroup    1262964 2016-11-03 11:52 /dt/wifi_phone_list_1031.csv-rw-r--r--+  3 jc_rc supergroup     979619 2016-11-03 11:52 /dt/wifi_phone_list_1101.csv[rc@vq18ptkh01 ~]$ beelineConnecting to jdbc:hive2://1.8.15.1:24002,10.78.152.24:24002,1.8.15.2:24002,1.8.12.42:24002,1.8.15.62:24002/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2;sasl.qop=auth-conf;auth=KERBEROS;principal=hive/hadoop.hadoop.com@HADOOP.COMDebug is  true storeKey false useTicketCache true useKeyTab false doNotPrompt false ticketCache is null isInitiator true KeyTab is null refreshKrb5Config is false principal is null tryFirstPass is false useFirstPass is false storePass is false clearPass is falseAcquire TGT from CachePrincipal is jc_rc@HADOOP.COMCommit Succeeded Connected to: Apache Hive (version 1.3.0)Driver: Hive JDBC (version 1.3.0)Transaction isolation: TRANSACTION_REPEATABLE_READBeeline version 1.3.0 by Apache Hive0: jdbc:hive2://1.8.15.2:21066/> use r_hive_db;No rows affected (0.547 seconds)0: jdbc:hive2://1.8.15.2:21066/> create table tmp_wifi1030(imisdn string,starttime string,endtime string) row format delimited fields terminated by ‘,‘ stored as textfile;0: jdbc:hive2://1.8.15.2:21066/> show tables;[rc@vq18ptkh01 ~]$ wc wifi_phone_list_1030.csv -l25390 wifi_phone_list_1030.csv+---------------+--+|   tab_name    |+---------------+--+| tmp_wifi1030  |+---------------+--+1 row selected (0.401 seconds)0: jdbc:hive2://1.8.15.2:21066/> load data inpath ‘hdfs:/dt/wifi_phone_list_1030.csv‘ into table tmp_wifi1030;0: jdbc:hive2://1.8.15.2:21066/> select * from tmp_wifi1030;| tmp_wifi1030.imisdn  |  tmp_wifi1030.starttime  |   tmp_wifi1030.endtime   |+----------------------+--------------------------+--------------------------+--+| 18806503523          | 2016-10-30 23:58:56.000  | 2016-10-31 00:01:07.000  || 15700125216          | 2016-10-30 23:58:57.000  | 2016-10-31 00:01:49.000  |+----------------------+--------------------------+--------------------------+--+25,390 rows selected (5.649 seconds)0: jdbc:hive2://1.8.15.2:21066/> select count(*) from tmp_wifi1030;INFO  : Number of reduce tasks determined at compile time: 1INFO  : In order to change the average load for a reducer (in bytes):INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>INFO  : In order to limit the maximum number of reducers:INFO  :   set hive.exec.reducers.max=<number>INFO  : In order to set a constant number of reducers:INFO  :   set mapreduce.job.reduces=<number>INFO  : number of splits:1INFO  : Submitting tokens for job: job_1475071482566_2471703INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19416140 for jc_rc)INFO  : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 57 df 96 8a 01 58 4c 64 63 96 8d 0d 65 ff 8e 03 97INFO  : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2471703/INFO  : Starting Job = job_1475071482566_2471703, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2471703/INFO  : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job  -kill job_1475071482566_2471703INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1INFO  : 2016-11-03 12:04:58,351 Stage-1 map = 0%,  reduce = 0%INFO  : 2016-11-03 12:05:04,702 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.72 secINFO  : 2016-11-03 12:05:12,096 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.86 secINFO  : MapReduce Total cumulative CPU time: 4 seconds 860 msecINFO  : Ended Job = job_1475071482566_2471703+--------+--+|  _c0   |+--------+--+| 25390  |+--------+--+1 row selected (25.595 seconds)0: jdbc:hive2://1.8.15.62:21066/> select * from default.d_s1mme limit 10;+----------------------+--------------------+-------------------------+----------------------+-------------------+--------------------+--------------------+----------------------+------------------------------+------------------------------------+----------------------------------+--------------------------------+-----------------------------+-----------------------------+-----------------------+------------------------------+----------------------------+------------------------+----------------------+--------------------+------------------------------------+------------------------------------+--------------------------+--------------------------+------------------------+------------------------+-------------------+-----------------------+-------------------------+-------------------------+-------------------+---------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+----------------------+--+| d_s1mme .length  | d_s1mme .city  。。。。。。。。。。。。                           |                               | 2016101714           || NULL                 | 579                | 5                       | 130980097fb8c900     | 6                 | 460006791248581    | 352093070081343    | 88888888888888888    | 20                           | 2016-10-17 13:30:23.0              | 2016-10-17 13:30:23.0            | 0                              | 20                          | NULL                        | 0                     | 209743848                    | 419                        | 32                     | D5095073             | NULL               | NULL                               | NULL                               | 100.67.254.45            | 100.111.211.166          | 36412                  | 36412                  | 589D              | BAE6802               | NULL                    | NULL                    | NULL              | 0                               | NULL                      | NULL                        | NULL                       | NULL                          | NULL                                | NULL                                |                           |                             |                            |                               |                                     |                                     |                               |                               |                               |                               | 2016101714           |+----------------------+--------------------+-------------------------+----------------------+-------------------+--------------------+--------------------+----------------------+------------------------------+------------------------------------+----------------------------------+--------------------------------+-----------------------------+-----------------------------+-----------------------+------------------------------+----------------------------+------------------------+----------------------+--------------------+------------------------------------+------------------------------------+--------------------------+--------------------------+------------------------+------------------------+-------------------+-----------------------+-------------------------+-------------------------+-------------------+---------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+---------------------------+-----------------------------+----------------------------+-------------------------------+-------------------------------------+-------------------------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+----------------------+--+10 rows selected (0.6 seconds)0: jdbc:hive2://1.8.15.62:21066/> create table tmp_mr_s1_mme1030 as 0: jdbc:hive2://1.8.15.62:21066/> select 0: jdbc:hive2://1.8.15.62:21066/> a.length,0: jdbc:hive2://1.8.15.62:21066/> a.city,0: jdbc:hive2://1.8.15.62:21066/> a.interface,0: jdbc:hive2://1.8.15.62:21066/> a.xdr_id,0: jdbc:hive2://1.8.15.62:21066/> a.rat,0: jdbc:hive2://1.8.15.62:21066/> a.imsi,0: jdbc:hive2://1.8.15.62:21066/> a.imei,0: jdbc:hive2://1.8.15.62:21066/> a.msisdn,0: jdbc:hive2://1.8.15.62:21066/> a.procedure_start_time,0: jdbc:hive2://1.8.15.62:21066/> a.procedure_end_time,0: jdbc:hive2://1.8.15.62:21066/> a.mme_ue_s1ap_id,0: jdbc:hive2://1.8.15.62:21066/> a.mme_group_id,0: jdbc:hive2://1.8.15.62:21066/> a.mme_code,0: jdbc:hive2://1.8.15.62:21066/> a.user_ipv4,0: jdbc:hive2://1.8.15.62:21066/> a.tac,0: jdbc:hive2://1.8.15.62:21066/> a.cell_id,0: jdbc:hive2://1.8.15.62:21066/> a.other_tac,0: jdbc:hive2://1.8.15.62:21066/> a.other_eci0: jdbc:hive2://1.8.15.62:21066/> from  default.d_s1mme a0: jdbc:hive2://1.8.15.62:21066/> join r_hive_db.tmp_wifi1030 b0: jdbc:hive2://1.8.15.62:21066/> on a.msisdn=b.imisdn0: jdbc:hive2://1.8.15.62:21066/> and a.p_hour>=‘201610300: jdbc:hive2://1.8.15.62:21066/> and a.p_hour<‘20161031‘;create table tmp_mr_s1_mme1030 as select a.length,a.city,a.interface,a.xdr_id,a.rat,a.imsi,a.imei,a.msisdn,a.procedure_start_time,a.procedure_end_time,a.mme_ue_s1ap_id,a.mme_group_id,a.mme_code,a.user_ipv4,a.tac,a.cell_id,a.other_tac,a.other_ecifrom  default.d_s1mme   a join r_hive_db.tmp_wifi1030 b on a.msisdn=b.imisdn and a.p_hour>=‘20161030‘ and a.p_hour<‘20161031‘;0: jdbc:hive2://1.8.15.2:21066/> create table tmp_mr_s1_mme_enbs1030 as 0: jdbc:hive2://1.8.15.2:21066/> select cell_id/256 from tmp_mr_s1_mme1030;0: jdbc:hive2://1.8.15.62:21066/> create table tmp_mr_s1_mme_cellids1030 as select distinct cast(cell_id as bigint) as cellid from tmp_mr_s1_mme1030;0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.mapfiles;+---------------------------+--+|            set            |+---------------------------+--+| hive.merge.mapfiles=true  |+---------------------------+--+1 row selected (0.022 seconds)0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.mapredfileds;+---------------------------------------+--+|                  set                  |+---------------------------------------+--+| hive.merge.mapredfileds is undefined  |+---------------------------------------+--+1 row selected (0.022 seconds)0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.size.per.task=1024000000;No rows affected (0.012 seconds)0: jdbc:hive2://1.8.15.62:21066/> set hive.merge.smallfiles.avgsize=1024000000;No rows affected (0.012 seconds)0: jdbc:hive2://1.8.15.62:21066/> use r_hive_db;No rows affected (0.031 seconds)0: jdbc:hive2://1.8.15.62:21066/> insert overwrite directory ‘/dt/‘ row format delimited fields terminated by ‘|‘ select * from tmp_mr_s1_mme_cellids1030;INFO  : Number of reduce tasks is set to 0 since there‘s no reduce operatorINFO  : number of splits:17INFO  : Submitting tokens for job: job_1475071482566_2477152INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19422634 for jc_rc)INFO  : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 d2 8f 0b 8a 01 58 4c df 13 0b 8d 0d 6c 4b 8e 03 98INFO  : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2477152/INFO  : Starting Job = job_1475071482566_2477152, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2477152/INFO  : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job  -kill job_1475071482566_2477152INFO  : Hadoop job information for Stage-1: number of mappers: 17; number of reducers: 0INFO  : 2016-11-03 14:40:52,492 Stage-1 map = 0%,  reduce = 0%INFO  : 2016-11-03 14:40:58,835 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 28.78 secINFO  : 2016-11-03 14:40:59,892 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 33.55 secINFO  : 2016-11-03 14:41:10,486 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 37.13 secINFO  : 2016-11-03 14:41:11,549 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 41.13 secINFO  : MapReduce Total cumulative CPU time: 41 seconds 130 msecINFO  : Ended Job = job_1475071482566_2477152INFO  : Stage-3 is filtered out by condition resolver.INFO  : Stage-2 is selected by condition resolver.INFO  : Stage-4 is filtered out by condition resolver.INFO  : Number of reduce tasks is set to 0 since there‘s no reduce operatorINFO  : number of splits:1INFO  : Submitting tokens for job: job_1475071482566_2477181INFO  : Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:hacluster, Ident: (HDFS_DELEGATION_TOKEN token 19422663 for jc_rc)INFO  : Kind: HIVE_DELEGATION_TOKEN, Service: HiveServer2ImpersonationToken, Ident: 00 05 6a 63 5f 72 63 05 6a 63 5f 72 63 21 68 69 76 65 2f 68 61 64 6f 6f 70 2e 68 61 64 6f 6f 70 2e 63 6f 6d 40 48 41 44 4f 4f 50 2e 43 4f 4d 8a 01 58 28 d2 8f 0b 8a 01 58 4c df 13 0b 8d 0d 6c 4b 8e 03 98INFO  : The url to track the job: https://pc-z1:26001/proxy/application_1475071482566_2477181/INFO  : Starting Job = job_1475071482566_2477181, Tracking URL = https://pc-z1:26001/proxy/application_1475071482566_2477181/INFO  : Kill Command = /opt/huawei/Bigdata/FusionInsight_V100R002C60SPC200/FusionInsight-Hive-1.3.0/hive-1.3.0/bin/..//../hadoop/bin/hadoop job  -kill job_1475071482566_2477181INFO  : Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0INFO  : 2016-11-03 14:41:22,190 Stage-2 map = 0%,  reduce = 0%INFO  : 2016-11-03 14:41:28,571 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 2.2 secINFO  : MapReduce Total cumulative CPU time: 2 seconds 200 msecINFO  : Ended Job = job_1475071482566_2477181INFO  : Moving data to directory /dt from hdfs://hacluster/dt/.hive-staging_hive_2016-11-03_14-40-43_774_4317869403646242426-140183/-ext-10000No rows affected (46.604 seconds)[rc@vq18ptkh01 dt]$ hadoop fs -ls /dt16/11/03 14:46:18 INFO hdfs.PeerCache: SocketCache disabled.Found 1 items-rwxrwxrwx+  3 jc_rc supergroup      26819 2016-11-03 14:41 /dt/000000_0[rc@vq18ptkh01 dt]$ hadoop fs -copyToLocal /dt/000000_0 16/11/03 14:46:33 INFO hdfs.PeerCache: SocketCache disabled.[rc@vq18ptkh01 dt]$ ls000000_0[rc@vq18ptkh01 dt]$ [rc@vq18ptkh01 dt]$ ls000000_0  000001_0  000002_0  000003_0  000004_0  000005_0[rc@vq18ptkh01 dt]$ ftp 10.70.41.126 21Connected to 10.70.41.126 (10.70.41.126).220 10.70.41.126 FTP server readyName (10.70.41.126:rc): joy331 Password required for joy.Password:230 User joy logged in.Remote system type is UNIX.Using binary mode to transfer files.ftp> put 000000_0 /Temp/a_dt/local: 000000_0 remote: /Temp/a_dt/227 Entering Passive Mode (10,70,41,126,168,163).550 /Temp/a_dt/: Not a regular fileftp> put (local-file) 000000_0(remote-file) /Temp/a_dt/000000_0local: 000000_0 remote: /Temp/a_dt/000000_0227 Entering Passive Mode (10,70,41,126,168,207).150 Opening BINARY mode data connection for /Temp/a_dt/000000_0226 Transfer complete.1049905992 bytes sent in 33 secs (31787.20 Kbytes/sec)ftp> put 000001_0 /Temp/a_dt/000001_0local: 000001_0 remote: /Temp/a_dt/000001_0227 Entering Passive Mode (10,70,41,126,168,255).150 Opening BINARY mode data connection for /Temp/a_dt/000001_0452 Transfer aborted.  No space left on deviceftp> put 000002_0 /Temp/a_dt/000002_0local: 000002_0 remote: /Temp/a_dt/000002_0227 Entering Passive Mode (10,70,41,126,169,20).150 Opening BINARY mode data connection for /Temp/a_dt/000002_0452 Transfer aborted.  No space left on deviceftp> put 000003_0 /Temp/a_dt/000003_0local: 000003_0 remote: /Temp/a_dt/000003_0227 Entering Passive Mode (10,70,41,126,169,40).150 Opening BINARY mode data connection for /Temp/a_dt/000003_0452 Transfer aborted.  No space left on deviceftp> put 000004_0 /Temp/a_dt/000004_0local: 000004_0 remote: /Temp/a_dt/000004_0227 Entering Passive Mode (10,70,41,126,169,66).150 Opening BINARY mode data connection for /Temp/a_dt/000004_0452 Transfer aborted.  No space left on deviceftp> put 000005_0 /Temp/a_dt/000005_0local: 000005_0 remote: /Temp/a_dt/000005_0227 Entering Passive Mode (10,70,41,126,169,85).150 Opening BINARY mode data connection for /Temp/a_dt/000005_0226 Transfer complete.23465237 bytes sent in 0.747 secs (31391.79 Kbytes/sec)ftp> 

 

Spark+Hadoop+Hive集群上数据操作记录