ICode9

精准搜索请尝试: 精确搜索
首页 > 其他分享> 文章详细

hive 窗口函数(一)

2021-05-22 10:02:22  阅读:206  来源: 互联网

标签:pv 函数 over partition hive cookieid between 窗口 createtime


窗口函数可以对数据进行分组,将数据按组分成一个个窗口。利用窗口函数可以对窗口内的数据进行排序,聚合操作,还可以通过window子句让窗口上下滑动,非常灵活。

数据准备

cookie1,2015-04-10,1
cookie1,2015-04-11,5
cookie1,2015-04-12,7
cookie1,2015-04-13,3
cookie1,2015-04-14,2
cookie1,2015-04-15,4
cookie1,2015-04-16,4

create table  cookie1_sum
(
    cookieid   string,
    createtime string,
    pv         int
) row format delimited fields terminated by ',';

select * from   cookie1_sum;

 一、sum

select cookieid,
       createtime,
       pv,
       sum(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row ) p1,
       sum(pv) over (partition by cookieid order by createtime )                                                  p2,
       sum(pv) over (partition by cookieid )                                                                      p3,
       sum(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following )         p4,
       sum(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row )         p5,
       sum(pv) over (partition by cookieid order by createtime rows between current row and unbounded following)  p6
from hive_zb_serv_hq.cookie1_sum;

 

 说明:        

pv1--rows between unbounded preceding and current row :
 分组内从起点到当前行的pv累积,如,11号的pv1=10号的pv+11号的pv, 12号=10号+11号+12号
pv2: 分组内从起点到当前行的pv累积  同上
pv3: 分组内(cookie1)所有的pv累加
pv4 rows between 3 preceding and current row : 
分组内当前行+往前3行,如,11号=10号+11号, 12号=10号+11号+12号, 13号=10号+11号+12号+13号, 14号=11号+12号+13号+14号
pv5 rows between 3 preceding and 1 following: 
分组内当前行+往前3行+往后1行,如,14号=11号+12号+13号+14号+15号=5+7+3+2+4=21
pv6 rows between current row and unbounded following:
 分组内当前行+往后所有行,如,13号=13号+14号+15号+16号=3+2+4+4=13,14号=14号+15号+16号=2+4+4=10

如果不指定ROWS BETWEEN,默认为从起点到当前行;如果不指定ORDER BY,则将分组内所有值累加;关键是理解ROWS BETWEEN含义,也叫做WINDOW子句:
PRECEDING:往前
FOLLOWING:往后
CURRENT ROW:当前行
UNBOUNDED:起点, 

UNBOUNDED PRECEDING 表示从前面的起点,

UNBOUNDED FOLLOWING:表示到后面的终点

–其他AVG,MIN,MAX,和SUM用法一样。

二、avg

set mapreduce.map.memory.mb=8192;
set mapreduce.reduce.memory.mb=8192;
select
   cookieid,
   createtime,
   pv,
   avg(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, -- 默认为从起点到当前行
   avg(pv) over (partition by cookieid order by createtime) as pv2, --从起点到当前行,结果同pv1
   avg(pv) over (partition by cookieid) as pv3, --分组内所有行
   avg(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, --当前行+往前3行
   avg(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5, --当前行+往前3行+往后1行
   avg(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6  --当前行+往后所有行
 from hive_zb_serv_hq.cookie1_sum;

 

 

 三、max

set mapreduce.map.memory.mb=8192;
set mapreduce.reduce.memory.mb=8192;
select
   cookieid,
   createtime,
   pv,
   max(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, -- 默认为从起点到当前行
   max(pv) over (partition by cookieid order by createtime) as pv2, --从起点到当前行,结果同pv1
   max(pv) over (partition by cookieid) as pv3, --分组内所有行
   max(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, --当前行+往前3行
   max(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5, --当前行+往前3行+往后1行
   max(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6  --当前行+往后所有行
 from hive_zb_serv_hq.cookie1_sum;

 

 

 四、min

set mapreduce.map.memory.mb=8192;
set mapreduce.reduce.memory.mb=8192;
select
   cookieid,
   createtime,
   pv,
   min(pv) over (partition by cookieid order by createtime rows between unbounded preceding and current row) as pv1, -- 默认为从起点到当前行
   min(pv) over (partition by cookieid order by createtime) as pv2, --从起点到当前行,结果同pv1
   min(pv) over (partition by cookieid) as pv3, --分组内所有行
   min(pv) over (partition by cookieid order by createtime rows between 3 preceding and current row) as pv4, --当前行+往前3行
   min(pv) over (partition by cookieid order by createtime rows between 3 preceding and 1 following) as pv5, --当前行+往前3行+往后1行
   min(pv) over (partition by cookieid order by createtime rows between current row and unbounded following) as pv6  --当前行+往后所有行
 from hive_zb_serv_hq.cookie1_sum;

 

标签:pv,函数,over,partition,hive,cookieid,between,窗口,createtime
来源: https://www.cnblogs.com/wdh01/p/14783407.html

本站声明: 1. iCode9 技术分享网(下文简称本站)提供的所有内容,仅供技术学习、探讨和分享;
2. 关于本站的所有留言、评论、转载及引用,纯属内容发起人的个人观点,与本站观点和立场无关;
3. 关于本站的所有言论和文字,纯属内容发起人的个人观点,与本站观点和立场无关;
4. 本站文章均是网友提供,不完全保证技术分享内容的完整性、准确性、时效性、风险性和版权归属;如您发现该文章侵犯了您的权益,可联系我们第一时间进行删除;
5. 本站为非盈利性的个人网站,所有内容不会用来进行牟利,也不会利用任何形式的广告来间接获益,纯粹是为了广大技术爱好者提供技术内容和技术思想的分享性交流网站。

专注分享技术,共同学习,共同进步。侵权联系[81616952@qq.com]

Copyright (C)ICode9.com, All Rights Reserved.

ICode9版权所有