Comparing Hive's Tez and MR Execution Engines

2019-02-17 19:37

A comparison of Hive on Tez versus Hive on MR.

Hive install path ==> /usr/local/service/hive/bin

[hadoop@10 ~]$ cd /usr/local/service/hive/bin

[hadoop@10 bin]$ hive --hiveconf hive.execution.engine=tez   # start Hive with the Tez execution engine
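Besides passing --hiveconf at startup, the engine can also be switched per session from inside the Hive CLI. The two statements below are standard Hive settings rather than anything specific to this cluster; the first prints the engine currently in use, the second switches the session to Tez (=mr switches it back to MapReduce):

hive> set hive.execution.engine;
hive> set hive.execution.engine=tez;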

The Hive table 'product_info' has already been mapped to the HBase table 'gizwits_product'. The mapping procedure is in "product-info.sql" and "product-info-exec.sh" under 'emr-hive'; a rough sketch follows.
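For reference, a minimal sketch of such an HBase-backed mapping is shown below. The column list and the column family 'f' are assumptions made purely for illustration; the actual DDL is whatever product-info.sql defines.

-- Sketch only: the real table definition lives in product-info.sql.
-- Column names and the HBase column family 'f' are assumed for illustration.
CREATE EXTERNAL TABLE product_info (
  rowkey       string,
  product_key  string,
  product_name string
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,f:product_key,f:product_name")
TBLPROPERTIES ("hbase.table.name" = "gizwits_product");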

Using count(*) returns a row count of 0:

hive> select count(*) from product_info;
OK
0
Time taken: 2.687 seconds, Fetched: 1 row(s)
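A likely explanation, offered as an assumption rather than something verified in the post, is that Hive answers count(*) from metastore statistics when hive.compute.query.using.stats is enabled; the HBase-mapped table carries no row-count statistics, so 0 is returned without launching any job (notice that no YARN application appears above). Forcing a real scan can be done per session:

hive> set hive.compute.query.using.stats=false;
hive> select count(*) from product_info;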

First run:

hive> select count(product_key) from product_info;
Query ID = hadoop_20190110144242_585f9dc0-8635-46e7-8739-55f6cdceff46
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1546417190707_0003)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 7.51 s     
----------------------------------------------------------------------------------------------
OK
95243
Time taken: 13.497 seconds, Fetched: 1 row(s)

Second run:

hive> select count(product_key) from product_info;
Query ID = hadoop_20190110144357_7e5ba026-6006-409d-b829-fa1df7ca2106
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1546417190707_0003)

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED  
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED      1          1        0        0       0       0  
Reducer 2 ...... container     SUCCEEDED      1          1        0        0       0       0  
----------------------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 5.73 s     
----------------------------------------------------------------------------------------------
OK
95243
Time taken: 8.109 seconds, Fetched: 1 row(s)

With Hive on MR, the two runs produce the output below.
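The post does not show how the MR session was started; one explicit way to do it, mirroring the Tez invocation above, would be:

[hadoop@10 bin]$ hive --hiveconf hive.execution.engine=mr   # run this session on the MapReduce engine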

# First run
hive> select count(product_key) from product_info;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190110144002_26033397-acb6-4745-a896-bd5d56a574a2
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1546417190707_0001, Tracking URL = http://10.8.1.14:5004/proxy/application_1546417190707_0001/
Kill Command = /usr/local/service/hadoop/bin/hadoop job  -kill job_1546417190707_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-01-10 14:40:12,599 Stage-1 map = 0%,  reduce = 0%
2019-01-10 14:40:21,018 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.45 sec
2019-01-10 14:40:27,325 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.28 sec
MapReduce Total cumulative CPU time: 6 seconds 280 msec
Ended Job = job_1546417190707_0001
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.28 sec   HDFS Read: 11819 HDFS Write: 105 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 280 msec
OK
95243
Time taken: 26.359 seconds, Fetched: 1 row(s)
# Second run
hive> select count(product_key) from product_info;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20190110144033_88b03a83-68d3-4e26-b675-972b0ff540ea
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1546417190707_0002, Tracking URL = http://10.8.1.14:5004/proxy/application_1546417190707_0002/
Kill Command = /usr/local/service/hadoop/bin/hadoop job  -kill job_1546417190707_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-01-10 14:40:41,076 Stage-1 map = 0%,  reduce = 0%
2019-01-10 14:40:48,460 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 5.12 sec
2019-01-10 14:40:53,669 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.79 sec
MapReduce Total cumulative CPU time: 6 seconds 790 msec
Ended Job = job_1546417190707_0002
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.79 sec   HDFS Read: 11819 HDFS Write: 105 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 790 msec
OK
95243
Time taken: 20.815 seconds, Fetched: 1 row(s)

Test results:

             Hive on Tez        Hive on MR
First run    13.497 seconds     26.359 seconds
Second run   8.109 seconds      20.815 seconds

Conclusion: Hive on Tez runs these queries noticeably faster than Hive on MR. This is consistent with Tez reusing its YARN application and containers across queries (note that the second Tez run, at 8.109 seconds, skips most of the startup cost), whereas each MR run launches and tears down a fresh MapReduce job.
