Big Data Development, Hive Optimization Part 4: Data Sampling in Hive (Sampling)

2021-01-22 09:29:04  Source: Internet


Note:
Hive version 2.1.1

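Sampling overview

When the data volume is very large and processing the full data set is impractical, sampling becomes especially important. A sample lets you estimate and infer the characteristics of the whole population from the rows that were drawn, which is why sampling is a cost-effective working and research method widely used in scientific experiments, quality inspection, and social surveys.

Hive offers three kinds of data sampling:

  1. Random sampling
  2. Bucket table sampling
  3. Block sampling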

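I. Random sampling

Hive provides a random function, rand(). We can sample a table by ordering its rows on rand() and then capping the number of returned rows with a LIMIT clause.
The DISTRIBUTE BY and SORT BY keywords placed in front of rand() ensure that the data is randomly distributed in both the mapper and the reducer stages.

Code: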

select * from ods_fact_sale order by rand() limit 20;
select * from ods_fact_sale where sale_date = '2011-08-16 00:00:00.0'  distribute by rand() sort by rand() limit 10;
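
To make the difference between the two queries easier to picture, here is a minimal Python sketch of the idea. It is a toy model under stated assumptions, not Hive's actual execution plan: the function names, the number of reducers, and the use of Python's random module are all illustrative choices that do not come from Hive.

```python
import random


def distribute_sort_limit(rows, num_reducers, limit, seed=None):
    """Toy model of: DISTRIBUTE BY rand() SORT BY rand() LIMIT limit."""
    rng = random.Random(seed)

    # DISTRIBUTE BY rand(): every row is sent to a randomly chosen reducer.
    partitions = [[] for _ in range(num_reducers)]
    for row in rows:
        partitions[rng.randrange(num_reducers)].append(row)

    # SORT BY rand(): each reducer orders only its own rows by a random key
    # and keeps at most `limit` of them.
    candidates = []
    for part in partitions:
        shuffled = sorted(part, key=lambda _row: rng.random())
        candidates.extend(shuffled[:limit])

    # A final single step applies the overall LIMIT (assumed simplification).
    return candidates[:limit]


def order_by_rand_limit(rows, limit, seed=None):
    """Toy model of: ORDER BY rand() LIMIT limit (one global sort)."""
    rng = random.Random(seed)
    return sorted(rows, key=lambda _row: rng.random())[:limit]


sample = distribute_sort_limit(range(1000), num_reducers=4, limit=10, seed=42)
print(sample)
```

In this model the global sort touches every row in one place, while the distributed variant spreads the sorting across several reducers. In both cases every input row still has to be read.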

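Test log:
The test log shows that random sampling performs poorly because it requires a sort. Both queries still scan the whole table (117 mappers and roughly 31.4 GB read from HDFS in each run) and take 814 and 1,517 seconds respectively; the only saving compared with a full query is the small result set that is returned.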

hive> 
    > select * from ods_fact_sale order by rand() limit 20;
Query ID = root_20201231105936_75f9fb76-9149-4884-8faf-4254fd1e3b30
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
20/12/31 10:59:37 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0022, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0022/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0022
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 1
2020-12-31 10:59:46,944 Stage-1 map = 0%,  reduce = 0%
2020-12-31 11:00:01,475 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 13.27 sec
2020-12-31 11:00:02,506 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 27.08 sec
2020-12-31 11:00:13,893 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 50.64 sec
2020-12-31 11:00:25,199 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 74.57 sec
2020-12-31 11:00:37,526 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 99.39 sec
2020-12-31 11:00:49,832 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 123.39 sec
2020-12-31 11:01:01,139 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 147.16 sec
2020-12-31 11:01:12,412 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 170.94 sec
2020-12-31 11:01:24,721 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 194.79 sec
2020-12-31 11:01:35,987 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 206.61 sec
2020-12-31 11:01:47,263 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 230.32 sec
2020-12-31 11:01:49,314 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 242.44 sec
2020-12-31 11:01:58,542 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 254.01 sec
2020-12-31 11:02:00,591 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 266.02 sec
2020-12-31 11:02:09,819 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 277.88 sec
2020-12-31 11:02:12,895 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 289.49 sec
2020-12-31 11:02:24,167 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 312.52 sec
2020-12-31 11:02:31,327 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 324.24 sec
2020-12-31 11:02:34,390 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 336.04 sec
2020-12-31 11:02:42,588 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 348.11 sec
2020-12-31 11:02:45,663 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 359.64 sec
2020-12-31 11:02:56,917 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 383.53 sec
2020-12-31 11:03:06,149 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 395.32 sec
2020-12-31 11:03:09,227 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 407.11 sec
2020-12-31 11:03:16,393 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 418.82 sec
2020-12-31 11:03:19,467 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 430.2 sec
2020-12-31 11:03:27,645 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 441.85 sec
2020-12-31 11:03:38,914 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 465.44 sec
2020-12-31 11:03:43,008 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 477.26 sec
2020-12-31 11:03:51,199 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 489.1 sec
2020-12-31 11:03:55,286 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 500.86 sec
2020-12-31 11:04:02,462 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 512.68 sec
2020-12-31 11:04:06,560 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 524.44 sec
2020-12-31 11:04:17,815 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 548.46 sec
2020-12-31 11:04:23,958 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 560.07 sec
2020-12-31 11:04:30,092 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 572.02 sec
2020-12-31 11:04:35,194 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 584.2 sec
2020-12-31 11:04:42,337 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 595.69 sec
2020-12-31 11:04:47,456 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 607.62 sec
2020-12-31 11:04:58,717 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 631.07 sec
2020-12-31 11:05:03,819 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 642.81 sec
2020-12-31 11:05:09,966 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 654.43 sec
2020-12-31 11:05:15,077 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 665.79 sec
2020-12-31 11:05:21,220 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 677.29 sec
2020-12-31 11:05:26,334 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 688.58 sec
2020-12-31 11:05:38,617 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 710.6 sec
2020-12-31 11:05:41,688 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 723.47 sec
2020-12-31 11:05:49,869 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 735.39 sec
2020-12-31 11:05:52,936 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 747.02 sec
2020-12-31 11:06:01,119 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 759.4 sec
2020-12-31 11:06:05,217 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 771.34 sec
2020-12-31 11:06:16,493 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 795.58 sec
2020-12-31 11:06:25,733 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 807.6 sec
2020-12-31 11:06:28,797 Stage-1 map = 59%,  reduce = 0%, Cumulative CPU 819.42 sec
2020-12-31 11:06:38,003 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 831.37 sec
2020-12-31 11:06:39,030 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 842.84 sec
2020-12-31 11:06:49,244 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 854.85 sec
2020-12-31 11:07:00,504 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 879.03 sec
2020-12-31 11:07:01,528 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 891.09 sec
2020-12-31 11:07:11,764 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 902.4 sec
2020-12-31 11:07:13,809 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 913.91 sec
2020-12-31 11:07:24,033 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 937.48 sec
2020-12-31 11:07:36,294 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 961.98 sec
2020-12-31 11:07:47,557 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 973.78 sec
2020-12-31 11:07:48,577 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 986.12 sec
2020-12-31 11:07:58,802 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 997.56 sec
2020-12-31 11:07:59,822 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 1009.51 sec
2020-12-31 11:08:10,088 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 1021.66 sec
2020-12-31 11:08:21,359 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 1045.31 sec
2020-12-31 11:08:22,387 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 1057.46 sec
2020-12-31 11:08:32,659 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 1069.52 sec
2020-12-31 11:08:34,714 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 1081.69 sec
2020-12-31 11:08:44,962 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 1093.68 sec
2020-12-31 11:08:57,248 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 1117.85 sec
2020-12-31 11:08:59,303 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 1129.42 sec
2020-12-31 11:09:08,562 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 1141.26 sec
2020-12-31 11:09:14,718 Stage-1 map = 82%,  reduce = 27%, Cumulative CPU 1142.0 sec
2020-12-31 11:09:19,840 Stage-1 map = 83%,  reduce = 27%, Cumulative CPU 1153.68 sec
2020-12-31 11:09:25,991 Stage-1 map = 83%,  reduce = 28%, Cumulative CPU 1153.9 sec
2020-12-31 11:09:31,094 Stage-1 map = 84%,  reduce = 28%, Cumulative CPU 1165.59 sec
2020-12-31 11:09:42,338 Stage-1 map = 85%,  reduce = 28%, Cumulative CPU 1177.7 sec
2020-12-31 11:10:04,855 Stage-1 map = 86%,  reduce = 28%, Cumulative CPU 1201.17 sec
2020-12-31 11:10:08,950 Stage-1 map = 86%,  reduce = 29%, Cumulative CPU 1201.22 sec
2020-12-31 11:10:16,121 Stage-1 map = 87%,  reduce = 29%, Cumulative CPU 1212.59 sec
2020-12-31 11:10:27,370 Stage-1 map = 88%,  reduce = 29%, Cumulative CPU 1224.54 sec
2020-12-31 11:10:37,625 Stage-1 map = 89%,  reduce = 29%, Cumulative CPU 1236.28 sec
2020-12-31 11:10:38,646 Stage-1 map = 89%,  reduce = 30%, Cumulative CPU 1236.32 sec
2020-12-31 11:10:48,884 Stage-1 map = 90%,  reduce = 30%, Cumulative CPU 1248.28 sec
2020-12-31 11:11:00,139 Stage-1 map = 91%,  reduce = 30%, Cumulative CPU 1260.32 sec
2020-12-31 11:11:23,680 Stage-1 map = 92%,  reduce = 30%, Cumulative CPU 1283.88 sec
2020-12-31 11:11:26,757 Stage-1 map = 92%,  reduce = 31%, Cumulative CPU 1283.92 sec
2020-12-31 11:11:34,927 Stage-1 map = 93%,  reduce = 31%, Cumulative CPU 1295.65 sec
2020-12-31 11:11:46,182 Stage-1 map = 94%,  reduce = 31%, Cumulative CPU 1308.17 sec
2020-12-31 11:11:58,462 Stage-1 map = 95%,  reduce = 31%, Cumulative CPU 1320.31 sec
2020-12-31 11:12:02,563 Stage-1 map = 95%,  reduce = 32%, Cumulative CPU 1320.36 sec
2020-12-31 11:12:08,713 Stage-1 map = 96%,  reduce = 32%, Cumulative CPU 1331.91 sec
2020-12-31 11:12:19,990 Stage-1 map = 97%,  reduce = 32%, Cumulative CPU 1343.48 sec
2020-12-31 11:12:43,556 Stage-1 map = 98%,  reduce = 32%, Cumulative CPU 1367.18 sec
2020-12-31 11:12:45,603 Stage-1 map = 98%,  reduce = 33%, Cumulative CPU 1367.24 sec
2020-12-31 11:12:55,854 Stage-1 map = 99%,  reduce = 33%, Cumulative CPU 1378.97 sec
2020-12-31 11:13:07,114 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 1390.76 sec
2020-12-31 11:13:09,162 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 1392.76 sec
MapReduce Total cumulative CPU time: 23 minutes 12 seconds 760 msec
Ended Job = job_1609141291605_0022
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117  Reduce: 1   Cumulative CPU: 1392.76 sec   HDFS Read: 31436905540 HDFS Write: 1147 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 23 minutes 12 seconds 760 msec
OK
654691105       2011-05-25 00:00:00.0   PROD10  53
752859493       2011-11-08 00:00:00.0   PROD4   92
620442730       2010-06-11 00:00:00.0   PROD5   22
524983813       2011-04-11 00:00:00.0   PROD6   31
89887602        2010-08-18 00:00:00.0   PROD7   45
93701058        2011-10-31 00:00:00.0   PROD4   62
739459682       2011-01-15 00:00:00.0   PROD4   93
480818608       2010-07-12 00:00:00.0   PROD2   87
457915153       2011-09-09 00:00:00.0   PROD9   85
405422684       2011-11-23 00:00:00.0   PROD10  86
322983965       2012-04-06 00:00:00.0   PROD8   7
588940412       2010-08-15 00:00:00.0   PROD8   51
421954935       2012-01-24 00:00:00.0   PROD4   17
749374812       2010-12-12 00:00:00.0   PROD4   62
298315594       2010-06-13 00:00:00.0   PROD5   75
723116860       2011-01-17 00:00:00.0   PROD10  89
167011022       2011-01-20 00:00:00.0   PROD4   69
430667509       2011-07-07 00:00:00.0   PROD6   63
665176804       2012-08-25 00:00:00.0   PROD7   77
648219864       2012-05-15 00:00:00.0   PROD7   74
Time taken: 814.055 seconds, Fetched: 20 row(s)

hive> 
    > select * from ods_fact_sale where sale_date = '2011-08-16 00:00:00.0'  distribute by rand() sort by rand() limit 10;
Query ID = root_20201231135813_71f7d916-8e6f-4c7a-846f-49b78194da8d
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 469
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
20/12/31 13:58:14 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0023, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0023/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0023
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 469
2020-12-31 13:58:21,609 Stage-1 map = 0%,  reduce = 0%
2020-12-31 13:58:31,907 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 7.78 sec
2020-12-31 13:58:32,938 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 17.16 sec
2020-12-31 13:58:39,109 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 24.81 sec
2020-12-31 13:58:46,309 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 41.49 sec
2020-12-31 13:58:49,396 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 50.29 sec
2020-12-31 13:58:52,477 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 57.84 sec
2020-12-31 13:58:56,588 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 66.39 sec
2020-12-31 13:58:59,672 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 73.76 sec
2020-12-31 13:59:04,828 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 81.84 sec
2020-12-31 13:59:12,006 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 97.09 sec
2020-12-31 13:59:14,061 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 104.49 sec
2020-12-31 13:59:20,215 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 112.37 sec
2020-12-31 13:59:21,246 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 119.89 sec
2020-12-31 13:59:28,440 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 135.55 sec
2020-12-31 13:59:36,643 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 151.18 sec
2020-12-31 13:59:41,772 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 158.62 sec
2020-12-31 13:59:44,854 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 166.61 sec
2020-12-31 13:59:48,955 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 173.93 sec
2020-12-31 13:59:52,032 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 182.11 sec
2020-12-31 13:59:56,135 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 189.52 sec
2020-12-31 14:00:03,337 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 204.95 sec
2020-12-31 14:00:08,474 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 212.83 sec
2020-12-31 14:00:10,529 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 220.27 sec
2020-12-31 14:00:16,678 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 235.53 sec
2020-12-31 14:00:24,875 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 251.0 sec
2020-12-31 14:00:31,029 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 258.48 sec
2020-12-31 14:00:33,084 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 266.53 sec
2020-12-31 14:00:38,214 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 273.81 sec
2020-12-31 14:00:40,263 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 281.73 sec
2020-12-31 14:00:45,380 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 289.15 sec
2020-12-31 14:00:52,560 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 304.46 sec
2020-12-31 14:00:56,667 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 312.72 sec
2020-12-31 14:00:59,737 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 320.25 sec
2020-12-31 14:01:04,867 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 328.38 sec
2020-12-31 14:01:05,893 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 335.74 sec
2020-12-31 14:01:13,071 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 350.86 sec
2020-12-31 14:01:20,251 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 366.17 sec
2020-12-31 14:01:27,416 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 373.71 sec
2020-12-31 14:01:28,442 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 381.35 sec
2020-12-31 14:01:34,585 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 388.67 sec
2020-12-31 14:01:35,607 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 396.63 sec
2020-12-31 14:01:43,802 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 412.13 sec
2020-12-31 14:01:48,929 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 419.64 sec
2020-12-31 14:01:52,005 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 427.68 sec
2020-12-31 14:01:54,056 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 433.61 sec
2020-12-31 14:02:00,199 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 441.53 sec
2020-12-31 14:02:02,274 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 449.88 sec
2020-12-31 14:02:09,465 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 465.79 sec
2020-12-31 14:02:15,630 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 473.67 sec
2020-12-31 14:02:17,687 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 481.54 sec
2020-12-31 14:02:23,854 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 489.38 sec
2020-12-31 14:02:25,905 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 497.17 sec
2020-12-31 14:02:33,084 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 506.49 sec
2020-12-31 14:02:41,293 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 523.33 sec
2020-12-31 14:02:43,344 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 532.39 sec
2020-12-31 14:02:49,496 Stage-1 map = 59%,  reduce = 0%, Cumulative CPU 540.88 sec
2020-12-31 14:02:51,541 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 548.98 sec
2020-12-31 14:02:57,686 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 557.3 sec
2020-12-31 14:03:00,757 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 567.09 sec
2020-12-31 14:03:08,949 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 585.16 sec
2020-12-31 14:03:13,044 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 593.36 sec
2020-12-31 14:03:18,168 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 602.73 sec
2020-12-31 14:03:21,233 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 610.98 sec
2020-12-31 14:03:27,362 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 620.83 sec
2020-12-31 14:03:29,405 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 629.08 sec
2020-12-31 14:03:37,585 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 646.57 sec
2020-12-31 14:03:43,726 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 654.23 sec
2020-12-31 14:03:45,778 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 662.3 sec
2020-12-31 14:03:50,901 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 670.0 sec
2020-12-31 14:03:52,949 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 678.12 sec
2020-12-31 14:03:58,069 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 685.59 sec
2020-12-31 14:04:05,244 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 701.02 sec
2020-12-31 14:04:09,343 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 709.68 sec
2020-12-31 14:04:12,416 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 717.45 sec
2020-12-31 14:04:17,540 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 717.45 sec
2020-12-31 14:04:19,590 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 734.92 sec
2020-12-31 14:04:28,796 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 752.8 sec
2020-12-31 14:04:33,910 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 761.43 sec
2020-12-31 14:04:38,010 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 771.05 sec
2020-12-31 14:04:46,214 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 780.66 sec
2020-12-31 14:04:55,479 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 791.47 sec
2020-12-31 14:05:04,703 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 801.23 sec
2020-12-31 14:05:22,125 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 820.39 sec
2020-12-31 14:05:31,339 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 830.03 sec
2020-12-31 14:05:40,555 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 839.37 sec
2020-12-31 14:05:49,765 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 848.73 sec
2020-12-31 14:05:57,956 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 848.87 sec
2020-12-31 14:06:07,193 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 866.8 sec
2020-12-31 14:06:21,535 Stage-1 map = 92%,  reduce = 0%, Cumulative CPU 882.17 sec
2020-12-31 14:06:28,722 Stage-1 map = 93%,  reduce = 0%, Cumulative CPU 890.1 sec
2020-12-31 14:06:34,871 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 897.74 sec
2020-12-31 14:06:42,042 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 905.22 sec
2020-12-31 14:06:49,212 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 912.75 sec
2020-12-31 14:06:56,368 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 920.31 sec
2020-12-31 14:07:10,714 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 935.37 sec
2020-12-31 14:07:16,855 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 943.06 sec
2020-12-31 14:07:24,004 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 950.73 sec
2020-12-31 14:07:30,153 Stage-1 map = 100%,  reduce = 1%, Cumulative CPU 957.08 sec
2020-12-31 14:07:40,370 Stage-1 map = 100%,  reduce = 2%, Cumulative CPU 968.59 sec
2020-12-31 14:07:47,548 Stage-1 map = 100%,  reduce = 3%, Cumulative CPU 977.62 sec
2020-12-31 14:07:57,780 Stage-1 map = 100%,  reduce = 4%, Cumulative CPU 989.98 sec
2020-12-31 14:08:08,025 Stage-1 map = 100%,  reduce = 5%, Cumulative CPU 1002.35 sec
2020-12-31 14:08:16,225 Stage-1 map = 100%,  reduce = 6%, Cumulative CPU 1012.48 sec
2020-12-31 14:08:26,474 Stage-1 map = 100%,  reduce = 7%, Cumulative CPU 1023.83 sec
2020-12-31 14:08:35,686 Stage-1 map = 100%,  reduce = 8%, Cumulative CPU 1035.25 sec
2020-12-31 14:08:43,876 Stage-1 map = 100%,  reduce = 9%, Cumulative CPU 1044.42 sec
2020-12-31 14:08:54,122 Stage-1 map = 100%,  reduce = 10%, Cumulative CPU 1056.78 sec
2020-12-31 14:09:04,381 Stage-1 map = 100%,  reduce = 11%, Cumulative CPU 1068.06 sec
2020-12-31 14:09:12,577 Stage-1 map = 100%,  reduce = 12%, Cumulative CPU 1077.07 sec
2020-12-31 14:09:22,814 Stage-1 map = 100%,  reduce = 13%, Cumulative CPU 1089.5 sec
2020-12-31 14:09:32,020 Stage-1 map = 100%,  reduce = 14%, Cumulative CPU 1100.92 sec
2020-12-31 14:09:42,265 Stage-1 map = 100%,  reduce = 15%, Cumulative CPU 1112.77 sec
2020-12-31 14:09:50,446 Stage-1 map = 100%,  reduce = 16%, Cumulative CPU 1122.02 sec
2020-12-31 14:10:00,697 Stage-1 map = 100%,  reduce = 17%, Cumulative CPU 1134.19 sec
2020-12-31 14:10:10,947 Stage-1 map = 100%,  reduce = 18%, Cumulative CPU 1145.76 sec
2020-12-31 14:10:18,126 Stage-1 map = 100%,  reduce = 19%, Cumulative CPU 1154.87 sec
2020-12-31 14:10:28,387 Stage-1 map = 100%,  reduce = 20%, Cumulative CPU 1166.3 sec
2020-12-31 14:10:38,627 Stage-1 map = 100%,  reduce = 21%, Cumulative CPU 1178.67 sec
2020-12-31 14:10:46,829 Stage-1 map = 100%,  reduce = 22%, Cumulative CPU 1188.38 sec
2020-12-31 14:10:56,045 Stage-1 map = 100%,  reduce = 23%, Cumulative CPU 1199.71 sec
2020-12-31 14:11:06,291 Stage-1 map = 100%,  reduce = 24%, Cumulative CPU 1211.25 sec
2020-12-31 14:11:14,480 Stage-1 map = 100%,  reduce = 25%, Cumulative CPU 1220.47 sec
2020-12-31 14:11:24,728 Stage-1 map = 100%,  reduce = 26%, Cumulative CPU 1231.71 sec
2020-12-31 14:11:34,956 Stage-1 map = 100%,  reduce = 27%, Cumulative CPU 1243.07 sec
2020-12-31 14:11:43,155 Stage-1 map = 100%,  reduce = 28%, Cumulative CPU 1252.2 sec
2020-12-31 14:11:52,379 Stage-1 map = 100%,  reduce = 29%, Cumulative CPU 1263.42 sec
2020-12-31 14:12:02,628 Stage-1 map = 100%,  reduce = 30%, Cumulative CPU 1274.71 sec
2020-12-31 14:12:12,877 Stage-1 map = 100%,  reduce = 31%, Cumulative CPU 1285.68 sec
2020-12-31 14:12:21,081 Stage-1 map = 100%,  reduce = 32%, Cumulative CPU 1294.61 sec
2020-12-31 14:12:32,371 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 1308.38 sec
2020-12-31 14:12:40,558 Stage-1 map = 100%,  reduce = 34%, Cumulative CPU 1317.37 sec
2020-12-31 14:12:48,765 Stage-1 map = 100%,  reduce = 35%, Cumulative CPU 1326.48 sec
2020-12-31 14:13:00,064 Stage-1 map = 100%,  reduce = 36%, Cumulative CPU 1337.97 sec
2020-12-31 14:13:08,279 Stage-1 map = 100%,  reduce = 37%, Cumulative CPU 1348.92 sec
2020-12-31 14:13:16,478 Stage-1 map = 100%,  reduce = 38%, Cumulative CPU 1358.16 sec
2020-12-31 14:13:27,748 Stage-1 map = 100%,  reduce = 39%, Cumulative CPU 1369.61 sec
2020-12-31 14:13:36,954 Stage-1 map = 100%,  reduce = 40%, Cumulative CPU 1381.52 sec
2020-12-31 14:13:45,162 Stage-1 map = 100%,  reduce = 41%, Cumulative CPU 1390.5 sec
2020-12-31 14:13:56,426 Stage-1 map = 100%,  reduce = 42%, Cumulative CPU 1404.04 sec
2020-12-31 14:14:04,630 Stage-1 map = 100%,  reduce = 43%, Cumulative CPU 1413.17 sec
2020-12-31 14:14:15,907 Stage-1 map = 100%,  reduce = 44%, Cumulative CPU 1424.15 sec
2020-12-31 14:14:24,108 Stage-1 map = 100%,  reduce = 45%, Cumulative CPU 1433.07 sec
2020-12-31 14:14:33,324 Stage-1 map = 100%,  reduce = 46%, Cumulative CPU 1444.27 sec
2020-12-31 14:14:44,594 Stage-1 map = 100%,  reduce = 47%, Cumulative CPU 1457.57 sec
2020-12-31 14:14:51,765 Stage-1 map = 100%,  reduce = 48%, Cumulative CPU 1464.38 sec
2020-12-31 14:15:00,997 Stage-1 map = 100%,  reduce = 49%, Cumulative CPU 1475.56 sec
2020-12-31 14:15:12,284 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 1486.63 sec
2020-12-31 14:15:20,479 Stage-1 map = 100%,  reduce = 51%, Cumulative CPU 1495.52 sec
2020-12-31 14:15:28,690 Stage-1 map = 100%,  reduce = 52%, Cumulative CPU 1506.78 sec
2020-12-31 14:15:39,970 Stage-1 map = 100%,  reduce = 53%, Cumulative CPU 1518.01 sec
2020-12-31 14:15:48,182 Stage-1 map = 100%,  reduce = 54%, Cumulative CPU 1527.01 sec
2020-12-31 14:15:57,401 Stage-1 map = 100%,  reduce = 55%, Cumulative CPU 1538.2 sec
2020-12-31 14:16:08,678 Stage-1 map = 100%,  reduce = 56%, Cumulative CPU 1551.7 sec
2020-12-31 14:16:16,895 Stage-1 map = 100%,  reduce = 57%, Cumulative CPU 1560.42 sec
2020-12-31 14:16:25,107 Stage-1 map = 100%,  reduce = 58%, Cumulative CPU 1569.44 sec
2020-12-31 14:16:36,370 Stage-1 map = 100%,  reduce = 59%, Cumulative CPU 1580.69 sec
2020-12-31 14:16:45,596 Stage-1 map = 100%,  reduce = 60%, Cumulative CPU 1592.01 sec
2020-12-31 14:16:52,776 Stage-1 map = 100%,  reduce = 61%, Cumulative CPU 1601.0 sec
2020-12-31 14:17:05,078 Stage-1 map = 100%,  reduce = 62%, Cumulative CPU 1614.51 sec
2020-12-31 14:17:13,284 Stage-1 map = 100%,  reduce = 63%, Cumulative CPU 1623.31 sec
2020-12-31 14:17:21,491 Stage-1 map = 100%,  reduce = 64%, Cumulative CPU 1632.19 sec
2020-12-31 14:17:32,762 Stage-1 map = 100%,  reduce = 65%, Cumulative CPU 1643.45 sec
2020-12-31 14:17:40,961 Stage-1 map = 100%,  reduce = 66%, Cumulative CPU 1654.75 sec
2020-12-31 14:17:49,164 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 1663.65 sec
2020-12-31 14:18:00,460 Stage-1 map = 100%,  reduce = 68%, Cumulative CPU 1674.82 sec
2020-12-31 14:18:09,686 Stage-1 map = 100%,  reduce = 69%, Cumulative CPU 1685.96 sec
2020-12-31 14:18:18,922 Stage-1 map = 100%,  reduce = 70%, Cumulative CPU 1694.86 sec
2020-12-31 14:18:29,166 Stage-1 map = 100%,  reduce = 71%, Cumulative CPU 1706.34 sec
2020-12-31 14:18:38,369 Stage-1 map = 100%,  reduce = 72%, Cumulative CPU 1717.78 sec
2020-12-31 14:18:48,641 Stage-1 map = 100%,  reduce = 73%, Cumulative CPU 1728.87 sec
2020-12-31 14:18:56,848 Stage-1 map = 100%,  reduce = 74%, Cumulative CPU 1737.66 sec
2020-12-31 14:19:06,087 Stage-1 map = 100%,  reduce = 75%, Cumulative CPU 1748.69 sec
2020-12-31 14:19:16,359 Stage-1 map = 100%,  reduce = 76%, Cumulative CPU 1759.78 sec
2020-12-31 14:19:24,564 Stage-1 map = 100%,  reduce = 77%, Cumulative CPU 1768.65 sec
2020-12-31 14:19:34,806 Stage-1 map = 100%,  reduce = 78%, Cumulative CPU 1779.88 sec
2020-12-31 14:19:45,062 Stage-1 map = 100%,  reduce = 79%, Cumulative CPU 1791.24 sec
2020-12-31 14:19:53,273 Stage-1 map = 100%,  reduce = 80%, Cumulative CPU 1800.31 sec
2020-12-31 14:20:02,499 Stage-1 map = 100%,  reduce = 81%, Cumulative CPU 1811.28 sec
2020-12-31 14:20:12,750 Stage-1 map = 100%,  reduce = 82%, Cumulative CPU 1822.45 sec
2020-12-31 14:20:20,981 Stage-1 map = 100%,  reduce = 83%, Cumulative CPU 1831.51 sec
2020-12-31 14:20:31,219 Stage-1 map = 100%,  reduce = 84%, Cumulative CPU 1843.78 sec
2020-12-31 14:20:41,474 Stage-1 map = 100%,  reduce = 85%, Cumulative CPU 1855.11 sec
2020-12-31 14:20:48,653 Stage-1 map = 100%,  reduce = 86%, Cumulative CPU 1864.26 sec
2020-12-31 14:20:58,906 Stage-1 map = 100%,  reduce = 87%, Cumulative CPU 1875.36 sec
2020-12-31 14:21:09,160 Stage-1 map = 100%,  reduce = 88%, Cumulative CPU 1886.85 sec
2020-12-31 14:21:19,417 Stage-1 map = 100%,  reduce = 89%, Cumulative CPU 1897.92 sec
2020-12-31 14:21:26,596 Stage-1 map = 100%,  reduce = 90%, Cumulative CPU 1907.18 sec
2020-12-31 14:21:36,846 Stage-1 map = 100%,  reduce = 91%, Cumulative CPU 1918.57 sec
2020-12-31 14:21:47,109 Stage-1 map = 100%,  reduce = 92%, Cumulative CPU 1929.52 sec
2020-12-31 14:21:55,303 Stage-1 map = 100%,  reduce = 93%, Cumulative CPU 1938.42 sec
2020-12-31 14:22:05,571 Stage-1 map = 100%,  reduce = 94%, Cumulative CPU 1949.77 sec
2020-12-31 14:22:14,793 Stage-1 map = 100%,  reduce = 95%, Cumulative CPU 1960.81 sec
2020-12-31 14:22:23,001 Stage-1 map = 100%,  reduce = 96%, Cumulative CPU 1969.72 sec
2020-12-31 14:22:33,270 Stage-1 map = 100%,  reduce = 97%, Cumulative CPU 1981.0 sec
2020-12-31 14:22:43,503 Stage-1 map = 100%,  reduce = 98%, Cumulative CPU 1992.01 sec
2020-12-31 14:22:50,683 Stage-1 map = 100%,  reduce = 99%, Cumulative CPU 2000.89 sec
2020-12-31 14:23:05,030 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2016.56 sec
MapReduce Total cumulative CPU time: 33 minutes 36 seconds 560 msec
Ended Job = job_1609141291605_0023
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
20/12/31 14:23:06 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0024, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0024/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0024
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
2020-12-31 14:23:17,212 Stage-2 map = 0%,  reduce = 0%
2020-12-31 14:23:24,475 Stage-2 map = 50%,  reduce = 0%, Cumulative CPU 5.37 sec
2020-12-31 14:23:25,505 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 11.9 sec
2020-12-31 14:23:30,651 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 14.14 sec
MapReduce Total cumulative CPU time: 14 seconds 140 msec
Ended Job = job_1609141291605_0024
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117  Reduce: 469   Cumulative CPU: 2016.56 sec   HDFS Read: 31438766866 HDFS Write: 79070 HDFS EC Read: 0 SUCCESS
Stage-Stage-2: Map: 2  Reduce: 1   Cumulative CPU: 14.14 sec   HDFS Read: 207188 HDFS Write: 614 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 33 minutes 50 seconds 700 msec
OK
601096637       2011-08-16 00:00:00.0   PROD10  28
7504198 2011-08-16 00:00:00.0   PROD7   22
7666912 2011-08-16 00:00:00.0   PROD7   70
393337914       2011-08-16 00:00:00.0   PROD5   55
98814403        2011-08-16 00:00:00.0   PROD4   45
744615937       2011-08-16 00:00:00.0   PROD7   73
124859277       2011-08-16 00:00:00.0   PROD3   69
212317100       2011-08-16 00:00:00.0   PROD10  48
504809117       2011-08-16 00:00:00.0   PROD3   33
268235827       2011-08-16 00:00:00.0   PROD9   91
Time taken: 1517.782 seconds, Fetched: 10 row(s)
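
The figures reported in the two job summaries above can be compared with a few lines of arithmetic. The sketch below only restates numbers copied from the logs (HDFS Read, Cumulative CPU, Time taken); the variable and function names are illustrative.

```python
GIB = 1024 ** 3

# (query, HDFS bytes read, cumulative CPU seconds, elapsed seconds),
# copied from the "MapReduce Jobs Launched" and "Time taken" lines above.
runs = [
    ("order by rand() limit 20",
     31436905540, 1392.76, 814.055),
    ("distribute by rand() sort by rand() limit 10",
     31438766866 + 207188, 2016.56 + 14.14, 1517.782),
]


def summarize(name, hdfs_read_bytes, cpu_seconds, elapsed_seconds):
    """Convert the raw log figures into comparable units."""
    return {
        "query": name,
        "read_gib": hdfs_read_bytes / GIB,
        "cpu_seconds": cpu_seconds,
        "elapsed_seconds": elapsed_seconds,
    }


summaries = [summarize(*run) for run in runs]
for s in summaries:
    print(f"{s['query']}: {s['read_gib']:.2f} GiB read, "
          f"{s['cpu_seconds']:.2f} s CPU, {s['elapsed_seconds']:.3f} s elapsed")

elapsed_ratio = summaries[1]["elapsed_seconds"] / summaries[0]["elapsed_seconds"]
print(f"elapsed ratio (second / first): {elapsed_ratio:.2f}")
```

Both runs read essentially the same amount of data, which is consistent with each query scanning the entire table before any rows are discarded.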

II. Bucket table sampling

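Hive supports bucket table sampling and block sampling. A bucket table is a table whose buckets were defined with a CLUSTERED BY clause when the table was created. The syntax for bucket table sampling is: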

table_sample: TABLESAMPLE (BUCKET x OUT OF y [ON colname])

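The TABLESAMPLE clause lets you write a query against a sample of the data instead of the whole table. It appears in the FROM clause and can be used on any table. Bucket numbering starts at 1. colname names the column to sample on: it can be any non-partition column, or rand() to sample over entire rows rather than a single column. The rows are bucketed on colname into y buckets numbered 1 through y, and the rows that belong to bucket x are returned. The example below returns the rows in bucket 1 out of 100 buckets: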

Code:

-- Randomly sample about one percent of the data (the LIMIT caps the output at 100 rows)
select * from ods_fact_sale tablesample(bucket 1 out of 100 on rand()) limit 100;
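
The bucket assignment described above can be sketched in Python. This is a simplified model, not Hive's implementation: zlib.crc32 stands in for Hive's internal hash function, and the function names and the generated rows are hypothetical.

```python
import random
import zlib


def bucket_of(value, y):
    """Return the 1-based bucket number of `value` among y buckets."""
    # crc32 is a stand-in for Hive's internal hash (an assumption).
    return zlib.crc32(str(value).encode("utf-8")) % y + 1


def tablesample_on_column(rows, x, y, key):
    """Model of TABLESAMPLE(BUCKET x OUT OF y ON colname)."""
    return [row for row in rows if bucket_of(key(row), y) == x]


def tablesample_on_rand(rows, x, y, seed=None):
    """Model of TABLESAMPLE(BUCKET x OUT OF y ON rand())."""
    rng = random.Random(seed)
    # Each row draws a random bucket in 1..y; only bucket x is kept.
    return [row for row in rows if rng.randint(1, y) == x]


# Hypothetical rows, loosely shaped like the sales table in this article.
rows = [{"id": i, "prod": f"PROD{i % 10 + 1}"} for i in range(100_000)]

by_id = tablesample_on_column(rows, x=1, y=100, key=lambda r: r["id"])
by_rand = tablesample_on_rand(rows, x=1, y=100, seed=7)
print(len(by_id), len(by_rand))
```

Sampling on a column is repeatable, because the same value always hashes to the same bucket, whereas sampling on rand() draws a different subset on each run unless the seed is fixed. In this model every row is still examined to decide which bucket it falls in.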

Test log:

hive> select * from ods_fact_sale tablesample(bucket 1 out of 100 on rand()) limit 100;
Query ID = root_20210106102309_b7fd3c38-74f3-4877-bf44-d5bb24a62a93
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
21/01/06 10:23:11 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0029, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0029/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0029
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 0
2021-01-06 10:23:18,751 Stage-1 map = 0%,  reduce = 0%
2021-01-06 10:23:26,042 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 7.34 sec
2021-01-06 10:23:30,196 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 14.66 sec
2021-01-06 10:23:34,325 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 21.83 sec
2021-01-06 10:23:38,447 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 29.09 sec
2021-01-06 10:23:42,571 Stage-1 map = 8%,  reduce = 0%, Cumulative CPU 32.67 sec
2021-01-06 10:23:43,616 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 36.22 sec
2021-01-06 10:23:46,695 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 43.32 sec
2021-01-06 10:23:49,779 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 46.9 sec
2021-01-06 10:23:50,812 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 50.38 sec
2021-01-06 10:23:53,928 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 53.95 sec
2021-01-06 10:23:54,958 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 57.4 sec
2021-01-06 10:23:58,044 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 60.99 sec
2021-01-06 10:24:02,163 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 68.22 sec
2021-01-06 10:24:03,201 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 71.85 sec
2021-01-06 10:24:06,282 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 75.38 sec
2021-01-06 10:24:07,315 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 78.96 sec
2021-01-06 10:24:10,398 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 82.35 sec
2021-01-06 10:24:11,427 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 85.78 sec
2021-01-06 10:24:15,539 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 92.86 sec
2021-01-06 10:24:18,621 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 96.44 sec
2021-01-06 10:24:19,646 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 100.07 sec
2021-01-06 10:24:22,724 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 103.7 sec
2021-01-06 10:24:23,749 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 107.35 sec
2021-01-06 10:24:26,827 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 114.56 sec
2021-01-06 10:24:29,950 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 118.18 sec
2021-01-06 10:24:30,972 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 121.63 sec
2021-01-06 10:24:34,056 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 125.19 sec
2021-01-06 10:24:35,083 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 128.71 sec
2021-01-06 10:24:38,153 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 132.27 sec
2021-01-06 10:24:42,256 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 139.28 sec
2021-01-06 10:24:43,284 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 142.67 sec
2021-01-06 10:24:46,354 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 146.25 sec
2021-01-06 10:24:47,379 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 149.89 sec
2021-01-06 10:24:50,443 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 153.59 sec
2021-01-06 10:24:51,469 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 157.12 sec
2021-01-06 10:24:55,594 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 164.4 sec
2021-01-06 10:24:58,671 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 168.07 sec
2021-01-06 10:24:59,695 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 171.6 sec
2021-01-06 10:25:03,822 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 179.17 sec
2021-01-06 10:25:07,927 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 186.31 sec
2021-01-06 10:25:12,024 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 193.51 sec
2021-01-06 10:25:16,114 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 200.55 sec
2021-01-06 10:25:20,199 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 207.57 sec
2021-01-06 10:25:24,294 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 214.76 sec
2021-01-06 10:25:28,387 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 221.97 sec
2021-01-06 10:25:32,493 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 228.8 sec
2021-01-06 10:25:36,581 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 235.82 sec
2021-01-06 10:25:40,678 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 243.01 sec
2021-01-06 10:25:44,771 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 250.35 sec
2021-01-06 10:25:48,863 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 257.45 sec
2021-01-06 10:25:52,971 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 264.79 sec
2021-01-06 10:25:56,067 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 268.38 sec
2021-01-06 10:25:57,091 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 272.06 sec
2021-01-06 10:26:00,161 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 275.58 sec
2021-01-06 10:26:01,181 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 279.07 sec
2021-01-06 10:26:04,288 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 282.68 sec
2021-01-06 10:26:08,389 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 289.73 sec
2021-01-06 10:26:09,413 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 293.31 sec
2021-01-06 10:26:12,491 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 296.94 sec
2021-01-06 10:26:13,517 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 300.47 sec
2021-01-06 10:26:16,646 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 304.17 sec
2021-01-06 10:26:17,667 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 307.78 sec
2021-01-06 10:26:21,775 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 314.72 sec
2021-01-06 10:26:24,842 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 318.24 sec
2021-01-06 10:26:25,864 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 321.84 sec
2021-01-06 10:26:28,990 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 325.54 sec
2021-01-06 10:26:30,010 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 328.98 sec
2021-01-06 10:26:33,100 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 335.9 sec
2021-01-06 10:26:36,182 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 339.46 sec
2021-01-06 10:26:37,208 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 342.94 sec
2021-01-06 10:26:40,278 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 346.48 sec
2021-01-06 10:26:41,299 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 350.03 sec
2021-01-06 10:26:44,368 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 353.74 sec
2021-01-06 10:26:48,470 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 360.97 sec
2021-01-06 10:26:49,519 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 364.47 sec
2021-01-06 10:26:52,587 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 367.97 sec
2021-01-06 10:26:53,612 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 371.48 sec
2021-01-06 10:26:56,714 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 375.17 sec
2021-01-06 10:26:57,740 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 378.75 sec
2021-01-06 10:27:01,849 Stage-1 map = 92%,  reduce = 0%, Cumulative CPU 386.16 sec
2021-01-06 10:27:04,955 Stage-1 map = 93%,  reduce = 0%, Cumulative CPU 389.82 sec
2021-01-06 10:27:05,978 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 393.33 sec
2021-01-06 10:27:09,048 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 397.01 sec
2021-01-06 10:27:10,079 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 400.49 sec
2021-01-06 10:27:13,200 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 404.31 sec
2021-01-06 10:27:16,273 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 411.48 sec
2021-01-06 10:27:18,312 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 414.97 sec
2021-01-06 10:27:20,354 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 418.35 sec
MapReduce Total cumulative CPU time: 6 minutes 58 seconds 350 msec
Ended Job = job_1609141291605_0029
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117   Cumulative CPU: 418.35 sec   HDFS Read: 62555036 HDFS Write: 629015 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 minutes 58 seconds 350 msec
OK
169387977       2011-01-30 00:00:00.0   PROD10  87
169387995       2011-05-10 00:00:00.0   PROD6   86
169388013       2011-04-14 00:00:00.0   PROD10  46
169388092       2010-06-07 00:00:00.0   PROD3   34
169388149       2010-06-21 00:00:00.0   PROD7   16
169388210       2011-10-05 00:00:00.0   PROD3   85
169388272       2012-02-27 00:00:00.0   PROD6   65
169388359       2012-08-30 00:00:00.0   PROD10  10
169388383       2011-11-09 00:00:00.0   PROD5   95
169388414       2011-07-25 00:00:00.0   PROD2   35
169388433       2011-05-18 00:00:00.0   PROD9   85
169388697       2010-12-25 00:00:00.0   PROD9   20
169388811       2012-04-03 00:00:00.0   PROD9   49
169388872       2010-11-23 00:00:00.0   PROD7   71
169388935       2012-04-18 00:00:00.0   PROD6   62
169389026       2011-03-21 00:00:00.0   PROD10  80
169389070       2010-09-09 00:00:00.0   PROD3   90
169389083       2010-05-20 00:00:00.0   PROD3   41
169389370       2011-01-28 00:00:00.0   PROD6   39
169389409       2012-08-09 00:00:00.0   PROD2   20
169389430       2012-08-23 00:00:00.0   PROD3   47
169389517       2011-10-25 00:00:00.0   PROD8   33
169389759       2010-09-03 00:00:00.0   PROD3   14
169389802       2010-08-22 00:00:00.0   PROD3   55
169389899       2012-01-14 00:00:00.0   PROD5   80
169389935       2010-06-10 00:00:00.0   PROD9   25
169390249       2010-09-05 00:00:00.0   PROD7   89
169390332       2012-07-28 00:00:00.0   PROD9   24
169390405       2011-09-30 00:00:00.0   PROD6   82
169390432       2010-09-04 00:00:00.0   PROD6   3
169390525       2011-04-24 00:00:00.0   PROD6   50
169390529       2012-06-29 00:00:00.0   PROD4   36
169390596       2011-09-29 00:00:00.0   PROD2   69
169390726       2011-01-09 00:00:00.0   PROD4   20
169390784       2011-08-20 00:00:00.0   PROD7   19
169390821       2010-07-14 00:00:00.0   PROD4   44
169390835       2010-09-24 00:00:00.0   PROD2   15
169390858       2012-08-08 00:00:00.0   PROD5   3
169391297       2011-03-24 00:00:00.0   PROD10  75
169391461       2012-03-14 00:00:00.0   PROD4   32
169391509       2010-11-23 00:00:00.0   PROD3   28
169391526       2012-03-28 00:00:00.0   PROD6   35
169391558       2011-02-21 00:00:00.0   PROD2   79
169391632       2010-10-09 00:00:00.0   PROD9   37
169391649       2012-09-22 00:00:00.0   PROD8   80
169391761       2011-03-15 00:00:00.0   PROD7   45
169391765       2011-01-23 00:00:00.0   PROD4   71
169391951       2012-03-08 00:00:00.0   PROD3   97
169392051       2011-05-13 00:00:00.0   PROD9   27
169392357       2010-05-22 00:00:00.0   PROD4   8
169392408       2011-01-06 00:00:00.0   PROD7   31
169392481       2012-07-25 00:00:00.0   PROD10  81
169392709       2012-08-12 00:00:00.0   PROD3   75
169392782       2012-07-28 00:00:00.0   PROD2   8
169392825       2011-03-14 00:00:00.0   PROD7   89
169392843       2010-10-31 00:00:00.0   PROD3   19
169392864       2011-05-19 00:00:00.0   PROD4   88
169392979       2012-05-11 00:00:00.0   PROD4   65
169393180       2011-05-02 00:00:00.0   PROD4   99
169393214       2011-10-27 00:00:00.0   PROD7   31
169393460       2012-07-27 00:00:00.0   PROD8   63
169393613       2011-03-03 00:00:00.0   PROD9   55
169393624       2010-04-24 00:00:00.0   PROD7   80
169393740       2011-08-17 00:00:00.0   PROD8   71
169394026       2012-06-07 00:00:00.0   PROD9   76
169394117       2012-02-29 00:00:00.0   PROD4   72
169394147       2011-12-23 00:00:00.0   PROD7   53
169394177       2011-01-07 00:00:00.0   PROD7   35
169394508       2012-05-24 00:00:00.0   PROD3   88
169394552       2011-07-16 00:00:00.0   PROD4   41
169394614       2010-08-17 00:00:00.0   PROD6   98
169394631       2010-09-23 00:00:00.0   PROD10  45
169394679       2011-01-22 00:00:00.0   PROD6   57
169394778       2011-09-03 00:00:00.0   PROD10  45
169394824       2011-06-04 00:00:00.0   PROD8   82
169394827       2010-07-14 00:00:00.0   PROD9   42
169394830       2012-03-09 00:00:00.0   PROD10  36
169394864       2010-09-17 00:00:00.0   PROD9   56
169394881       2011-07-01 00:00:00.0   PROD6   7
169395019       2011-11-17 00:00:00.0   PROD6   66
169395142       2012-01-21 00:00:00.0   PROD6   54
169395197       2012-08-10 00:00:00.0   PROD5   72
169395226       2010-09-20 00:00:00.0   PROD3   88
169395253       2011-12-31 00:00:00.0   PROD4   56
169395358       2010-07-16 00:00:00.0   PROD2   75
169395367       2010-12-16 00:00:00.0   PROD4   86
169395398       2012-01-07 00:00:00.0   PROD5   18
169395418       2011-05-08 00:00:00.0   PROD7   82
169395463       2011-08-23 00:00:00.0   PROD9   44
169395636       2011-01-16 00:00:00.0   PROD8   11
169395766       2012-06-05 00:00:00.0   PROD4   43
169395909       2011-12-10 00:00:00.0   PROD5   79
169395943       2012-05-11 00:00:00.0   PROD4   27
169395960       2012-01-17 00:00:00.0   PROD7   43
169396093       2011-08-28 00:00:00.0   PROD8   60
169396142       2010-11-13 00:00:00.0   PROD7   46
169396183       2011-06-16 00:00:00.0   PROD8   88
169396195       2010-10-06 00:00:00.0   PROD3   60
169396279       2012-06-18 00:00:00.0   PROD2   65
169396328       2011-05-14 00:00:00.0   PROD5   21
Time taken: 251.799 seconds, Fetched: 100 row(s)
hive> 
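
The run above launched 117 mappers, the same number as the full-table job summarized earlier (Stage-1: Map: 117), which suggests that sampling with rand() on a table that is not bucketed still has to visit every split. According to the Hive language manual, when a table is created with CLUSTERED BY and the sample is taken on that same column, only the matching buckets need to be scanned. A minimal sketch follows; the table name ods_fact_sale_bucketed, its column names and types, and the bucket count of 32 are assumptions for illustration and do not come from the test environment above.

```sql
-- Hypothetical bucketed copy of ods_fact_sale (column names and types are assumed).
create table ods_fact_sale_bucketed (
  id        bigint,
  sale_date string,
  prod_name string,
  sale_nums int
)
clustered by (id) into 32 buckets;

-- Populate the bucketed table; rows are hashed on id into the 32 buckets.
insert overwrite table ods_fact_sale_bucketed
select * from ods_fact_sale;

-- Sample on the clustering column, so only bucket 3 of the 32 has to be read.
select * from ods_fact_sale_bucketed tablesample(bucket 3 out of 32 on id) limit 100;
```

Unlike rand(), sampling on id is hash-based, so the same bucket returns the same rows on every run as long as the data does not change.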

III. Block Sampling

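1) tablesample(n percent): samples data in proportion to the size of the Hive table's data. Combined with create table ... as select, the sample can be saved to a new Hive table. For example, to extract 10% of the data in the original Hive table:
(Note: testing showed that the select statement cannot carry a where condition and that subqueries are not supported; a workaround is to create an intermediate table or to use random sampling instead.)
create table xxx_new as select * from xxx tablesample(10 percent);
2) tablesample(nM): specifies the size of the sampled data in megabytes, for example tablesample(100M).
3) tablesample(n rows): specifies the number of rows to sample, where n is the number of rows taken by each map task, so the total is roughly n times the number of mappers. The number of mappers can be read from the output of a simple query against the table (look for "number of mappers: x").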
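
The three forms listed above might be written as follows. This is a sketch against the ods_fact_sale table used throughout this article; the literal values (10 percent, 100M, 10 rows, seed 7) are arbitrary, and the hive.sample.seednumber setting comes from the Hive documentation rather than from the tests recorded here.

```sql
-- Optional: change the seed so that forms 1) and 2) pick a different subset of blocks.
set hive.sample.seednumber=7;

-- 1) About 10 percent of the table's data size; sampling works on whole blocks,
--    so the amount actually read may be larger than requested.
select * from ods_fact_sale tablesample(10 percent);

-- 2) About 100 MB of input data; the size literal is written without a space.
select * from ods_fact_sale tablesample(100M);

-- 3) 10 rows from each input split, so the total depends on the number of mappers.
select * from ods_fact_sale tablesample(10 rows);
```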

Code:

create table sample_test1 as select * from ods_fact_sale tablesample(10000 rows);

Test log:

hive> 
    > create table sample_test1 as select * from ods_fact_sale tablesample(10000 rows);
Query ID = root_20210106103549_9aaeea0b-6414-40ea-af0b-2942c80ad3a4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
21/01/06 10:35:50 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0031, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0031/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0031
Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 0
2021-01-06 10:43:18,970 Stage-1 map = 0%,  reduce = 0%
2021-01-06 10:43:25,150 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 2.23 sec
2021-01-06 10:43:26,183 Stage-1 map = 2%,  reduce = 0%, Cumulative CPU 4.48 sec
2021-01-06 10:43:29,274 Stage-1 map = 3%,  reduce = 0%, Cumulative CPU 6.63 sec
2021-01-06 10:43:33,375 Stage-1 map = 4%,  reduce = 0%, Cumulative CPU 10.81 sec
2021-01-06 10:43:34,415 Stage-1 map = 5%,  reduce = 0%, Cumulative CPU 13.01 sec
2021-01-06 10:43:37,513 Stage-1 map = 6%,  reduce = 0%, Cumulative CPU 15.14 sec
2021-01-06 10:43:38,545 Stage-1 map = 7%,  reduce = 0%, Cumulative CPU 17.34 sec
2021-01-06 10:43:41,660 Stage-1 map = 9%,  reduce = 0%, Cumulative CPU 22.66 sec
2021-01-06 10:43:45,757 Stage-1 map = 10%,  reduce = 0%, Cumulative CPU 27.05 sec
2021-01-06 10:43:48,836 Stage-1 map = 11%,  reduce = 0%, Cumulative CPU 29.23 sec
2021-01-06 10:43:49,866 Stage-1 map = 12%,  reduce = 0%, Cumulative CPU 31.3 sec
2021-01-06 10:43:53,953 Stage-1 map = 14%,  reduce = 0%, Cumulative CPU 36.41 sec
2021-01-06 10:43:57,029 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 38.51 sec
2021-01-06 10:44:01,131 Stage-1 map = 16%,  reduce = 0%, Cumulative CPU 42.67 sec
2021-01-06 10:44:02,159 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 44.8 sec
2021-01-06 10:44:05,239 Stage-1 map = 18%,  reduce = 0%, Cumulative CPU 47.68 sec
2021-01-06 10:44:06,263 Stage-1 map = 19%,  reduce = 0%, Cumulative CPU 49.82 sec
2021-01-06 10:44:09,337 Stage-1 map = 20%,  reduce = 0%, Cumulative CPU 51.91 sec
2021-01-06 10:44:10,363 Stage-1 map = 21%,  reduce = 0%, Cumulative CPU 54.01 sec
2021-01-06 10:44:14,485 Stage-1 map = 22%,  reduce = 0%, Cumulative CPU 58.3 sec
2021-01-06 10:44:17,602 Stage-1 map = 23%,  reduce = 0%, Cumulative CPU 60.69 sec
2021-01-06 10:44:18,629 Stage-1 map = 24%,  reduce = 0%, Cumulative CPU 62.8 sec
2021-01-06 10:44:20,695 Stage-1 map = 25%,  reduce = 0%, Cumulative CPU 62.8 sec
2021-01-06 10:44:21,722 Stage-1 map = 26%,  reduce = 0%, Cumulative CPU 67.1 sec
2021-01-06 10:44:25,807 Stage-1 map = 27%,  reduce = 0%, Cumulative CPU 72.28 sec
2021-01-06 10:44:28,901 Stage-1 map = 28%,  reduce = 0%, Cumulative CPU 74.58 sec
2021-01-06 10:44:29,928 Stage-1 map = 29%,  reduce = 0%, Cumulative CPU 76.67 sec
2021-01-06 10:44:33,007 Stage-1 map = 30%,  reduce = 0%, Cumulative CPU 78.79 sec
2021-01-06 10:44:34,028 Stage-1 map = 31%,  reduce = 0%, Cumulative CPU 80.96 sec
2021-01-06 10:44:37,102 Stage-1 map = 32%,  reduce = 0%, Cumulative CPU 83.07 sec
2021-01-06 10:44:41,245 Stage-1 map = 33%,  reduce = 0%, Cumulative CPU 87.27 sec
2021-01-06 10:44:42,273 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 89.43 sec
2021-01-06 10:44:45,358 Stage-1 map = 35%,  reduce = 0%, Cumulative CPU 91.67 sec
2021-01-06 10:44:46,384 Stage-1 map = 36%,  reduce = 0%, Cumulative CPU 93.74 sec
2021-01-06 10:44:49,455 Stage-1 map = 37%,  reduce = 0%, Cumulative CPU 95.87 sec
2021-01-06 10:44:50,475 Stage-1 map = 38%,  reduce = 0%, Cumulative CPU 97.96 sec
2021-01-06 10:44:54,573 Stage-1 map = 39%,  reduce = 0%, Cumulative CPU 102.2 sec
2021-01-06 10:44:57,641 Stage-1 map = 40%,  reduce = 0%, Cumulative CPU 104.33 sec
2021-01-06 10:44:58,664 Stage-1 map = 41%,  reduce = 0%, Cumulative CPU 106.43 sec
2021-01-06 10:45:01,731 Stage-1 map = 42%,  reduce = 0%, Cumulative CPU 108.6 sec
2021-01-06 10:45:02,748 Stage-1 map = 43%,  reduce = 0%, Cumulative CPU 110.65 sec
2021-01-06 10:45:05,815 Stage-1 map = 44%,  reduce = 0%, Cumulative CPU 112.77 sec
2021-01-06 10:45:09,914 Stage-1 map = 45%,  reduce = 0%, Cumulative CPU 117.8 sec
2021-01-06 10:45:11,961 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 120.7 sec
2021-01-06 10:45:13,062 Stage-1 map = 47%,  reduce = 0%, Cumulative CPU 122.89 sec
2021-01-06 10:45:15,114 Stage-1 map = 48%,  reduce = 0%, Cumulative CPU 125.05 sec
2021-01-06 10:45:17,165 Stage-1 map = 49%,  reduce = 0%, Cumulative CPU 127.48 sec
2021-01-06 10:45:19,206 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 129.7 sec
2021-01-06 10:45:23,292 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 134.0 sec
2021-01-06 10:45:25,332 Stage-1 map = 52%,  reduce = 0%, Cumulative CPU 136.29 sec
2021-01-06 10:45:27,388 Stage-1 map = 53%,  reduce = 0%, Cumulative CPU 138.46 sec
2021-01-06 10:45:29,446 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 140.55 sec
2021-01-06 10:45:31,492 Stage-1 map = 55%,  reduce = 0%, Cumulative CPU 142.66 sec
2021-01-06 10:45:33,543 Stage-1 map = 56%,  reduce = 0%, Cumulative CPU 144.88 sec
2021-01-06 10:45:37,635 Stage-1 map = 57%,  reduce = 0%, Cumulative CPU 149.09 sec
2021-01-06 10:45:39,684 Stage-1 map = 58%,  reduce = 0%, Cumulative CPU 151.19 sec
2021-01-06 10:45:41,722 Stage-1 map = 59%,  reduce = 0%, Cumulative CPU 153.36 sec
2021-01-06 10:45:43,772 Stage-1 map = 60%,  reduce = 0%, Cumulative CPU 155.63 sec
2021-01-06 10:45:45,845 Stage-1 map = 61%,  reduce = 0%, Cumulative CPU 157.83 sec
2021-01-06 10:45:47,898 Stage-1 map = 62%,  reduce = 0%, Cumulative CPU 160.0 sec
2021-01-06 10:45:50,964 Stage-1 map = 63%,  reduce = 0%, Cumulative CPU 164.4 sec
2021-01-06 10:45:53,011 Stage-1 map = 64%,  reduce = 0%, Cumulative CPU 166.52 sec
2021-01-06 10:45:56,082 Stage-1 map = 65%,  reduce = 0%, Cumulative CPU 169.38 sec
2021-01-06 10:45:57,100 Stage-1 map = 66%,  reduce = 0%, Cumulative CPU 171.54 sec
2021-01-06 10:45:59,150 Stage-1 map = 67%,  reduce = 0%, Cumulative CPU 173.78 sec
2021-01-06 10:46:01,196 Stage-1 map = 68%,  reduce = 0%, Cumulative CPU 175.86 sec
2021-01-06 10:46:05,279 Stage-1 map = 69%,  reduce = 0%, Cumulative CPU 180.23 sec
2021-01-06 10:46:07,323 Stage-1 map = 70%,  reduce = 0%, Cumulative CPU 182.35 sec
2021-01-06 10:46:09,367 Stage-1 map = 71%,  reduce = 0%, Cumulative CPU 184.54 sec
2021-01-06 10:46:11,417 Stage-1 map = 72%,  reduce = 0%, Cumulative CPU 186.79 sec
2021-01-06 10:46:13,466 Stage-1 map = 73%,  reduce = 0%, Cumulative CPU 189.05 sec
2021-01-06 10:46:15,512 Stage-1 map = 74%,  reduce = 0%, Cumulative CPU 191.37 sec
2021-01-06 10:46:19,604 Stage-1 map = 75%,  reduce = 0%, Cumulative CPU 196.46 sec
2021-01-06 10:46:21,656 Stage-1 map = 76%,  reduce = 0%, Cumulative CPU 198.58 sec
2021-01-06 10:46:23,700 Stage-1 map = 77%,  reduce = 0%, Cumulative CPU 200.71 sec
2021-01-06 10:46:25,743 Stage-1 map = 78%,  reduce = 0%, Cumulative CPU 202.83 sec
2021-01-06 10:46:27,790 Stage-1 map = 79%,  reduce = 0%, Cumulative CPU 205.01 sec
2021-01-06 10:46:31,884 Stage-1 map = 80%,  reduce = 0%, Cumulative CPU 210.06 sec
2021-01-06 10:46:33,933 Stage-1 map = 81%,  reduce = 0%, Cumulative CPU 212.22 sec
2021-01-06 10:46:36,001 Stage-1 map = 82%,  reduce = 0%, Cumulative CPU 215.35 sec
2021-01-06 10:46:38,047 Stage-1 map = 83%,  reduce = 0%, Cumulative CPU 217.44 sec
2021-01-06 10:46:40,097 Stage-1 map = 84%,  reduce = 0%, Cumulative CPU 219.61 sec
2021-01-06 10:46:42,146 Stage-1 map = 85%,  reduce = 0%, Cumulative CPU 221.86 sec
2021-01-06 10:46:45,215 Stage-1 map = 86%,  reduce = 0%, Cumulative CPU 226.88 sec
2021-01-06 10:46:47,259 Stage-1 map = 87%,  reduce = 0%, Cumulative CPU 229.08 sec
2021-01-06 10:46:50,350 Stage-1 map = 88%,  reduce = 0%, Cumulative CPU 231.42 sec
2021-01-06 10:46:51,376 Stage-1 map = 89%,  reduce = 0%, Cumulative CPU 233.67 sec
2021-01-06 10:46:53,421 Stage-1 map = 90%,  reduce = 0%, Cumulative CPU 235.9 sec
2021-01-06 10:46:55,456 Stage-1 map = 91%,  reduce = 0%, Cumulative CPU 238.13 sec
2021-01-06 10:46:59,543 Stage-1 map = 92%,  reduce = 0%, Cumulative CPU 242.35 sec
2021-01-06 10:47:01,588 Stage-1 map = 93%,  reduce = 0%, Cumulative CPU 244.55 sec
2021-01-06 10:47:03,636 Stage-1 map = 94%,  reduce = 0%, Cumulative CPU 246.69 sec
2021-01-06 10:47:05,701 Stage-1 map = 95%,  reduce = 0%, Cumulative CPU 248.95 sec
2021-01-06 10:47:07,755 Stage-1 map = 96%,  reduce = 0%, Cumulative CPU 251.08 sec
2021-01-06 10:47:09,798 Stage-1 map = 97%,  reduce = 0%, Cumulative CPU 253.23 sec
2021-01-06 10:47:13,877 Stage-1 map = 98%,  reduce = 0%, Cumulative CPU 257.48 sec
2021-01-06 10:47:15,930 Stage-1 map = 99%,  reduce = 0%, Cumulative CPU 259.56 sec
2021-01-06 10:47:17,973 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 262.3 sec
MapReduce Total cumulative CPU time: 4 minutes 22 seconds 300 msec
Ended Job = job_1609141291605_0031
Stage-4 is filtered out by condition resolver.
Stage-3 is selected by condition resolver.
Stage-5 is filtered out by condition resolver.
Launching Job 3 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
21/01/06 10:47:19 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0032, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0032/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0032
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2021-01-06 10:47:29,293 Stage-3 map = 0%,  reduce = 0%
2021-01-06 10:47:37,518 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 5.67 sec
MapReduce Total cumulative CPU time: 5 seconds 670 msec
Ended Job = job_1609141291605_0032
Moving data to directory hdfs://nameservice1/user/hive/warehouse/test.db/sample_test1
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 117   Cumulative CPU: 262.3 sec   HDFS Read: 61853025 HDFS Write: 47856547 HDFS EC Read: 0 SUCCESS
Stage-Stage-3: Map: 1   Cumulative CPU: 5.67 sec   HDFS Read: 47866656 HDFS Write: 47847187 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 4 minutes 27 seconds 970 msec
OK
Time taken: 709.589 seconds
hive> 
    > select count(*) from sample_test1;
Query ID = root_20210106105110_0c94562d-021f-45ac-bf4e-d0fa98dcf849
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
21/01/06 10:51:10 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1609141291605_0033, Tracking URL = http://hp3:8088/proxy/application_1609141291605_0033/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1609141291605_0033
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2021-01-06 10:51:17,757 Stage-1 map = 0%,  reduce = 0%
2021-01-06 10:51:25,012 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.8 sec
2021-01-06 10:51:30,170 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.23 sec
MapReduce Total cumulative CPU time: 6 seconds 230 msec
Ended Job = job_1609141291605_0033
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.23 sec   HDFS Read: 47855329 HDFS Write: 107 HDFS EC Read: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 230 msec
OK
1170000
Time taken: 20.625 seconds, Fetched: 1 row(s)
hive> 
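
The count of 1170000 is exactly 117 × 10000: each of the 117 mappers reported in the log contributed its own 10000 rows. If a fixed total is wanted instead, one option is to cap the result with limit. A sketch under that assumption, where the table name sample_test2 is hypothetical:

```sql
-- Each split still yields up to 10000 rows, but limit caps the new table at 10000 rows in total.
create table sample_test2 as
select * from ods_fact_sale tablesample(10000 rows) limit 10000;

-- Should report at most 10000 rows.
select count(*) from sample_test2;
```

The rows kept by limit are not guaranteed to be spread evenly across the splits, so this trades some randomness for a predictable sample size.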

References

1. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Sampling
2. https://blog.csdn.net/baidu_20183817/article/details/84099049

Source: https://blog.csdn.net/u010520724/article/details/112977280
