lag(cloumn,n) 前n行数数据
例如: lag(ctime , 1)
店铺名,日期,订单金额
a,2020-02-10,600 a,2020-03-01,200 a,2020-03-02,300 a,2020-03-03,200 a,2020-03-04,400 a,2020-03-05,600 a,2020-02-05,200 a,2020-02-06,300 a,2020-02-07,200 a,2020-02-08,400 b,2020-02-05,200 b,2020-02-06,300 b,2020-02-08,200 b,2020-02-09,400 b,2020-02-10,600 c,2020-01-31,200 c,2020-02-01,300 c,2020-02-02,200 c,2020-02-03,400 c,2020-02-10,600
统计连续三天有订单的店铺,列出店铺名
1. 创建表---加载数据
---创建表 create table tb_shop( name string, ctime string, cost double ) row format delimited fields terminated by ','; ---加载本地数据到表中 load data local inpath '/doit17/orders.txt' into table tb_shop;2. 为了判断数据是连续三天的---按照店铺名分区---对日期进行排序并编号
为了得到结果更快 , 本地运行mapreduce程序 , 不提交Yarn
set mapreduce.framework.name=local;
查看所有函数
show functions;
查看具体函数的用法
desc function 函数;
select *, row_number() over(partition by name order by ctime) from tb_shop; ---row_number() 给每行数据编号 ---partition by name 按照店铺名分区 ---order by ctime 相同店铺名按照日期排序3. 使用函数date_sub将日期与编号相减 , 连续几个结果相同即为连续几天有订单
date_sub(start_date, num_days) - Returns the date that is num_days before start_date.
select *, date_sub(ctime , rn) diff from ( select *, row_number() over(partition by name order by ctime) rn ---对编号起别名 from tb_shop ) t1 ---对子查询起别名 ;4. 按照店铺名和diff进行分组,并统计组内个数---店铺名相同,diff相同为一组---统计连续天数>3的店铺名和天数
select name, diff, count(1) days from ( select *, date_sub(ctime , rn) diff from ( select *, row_number() over(partition by name order by ctime) rn from tb_shop ) t1 ) t2 group by name,diff having days > 3 ;5. 去除重复数据--得出最终结果
select distinct name from ( select name, diff, count(1) days from ( select *, date_sub(ctime , rn) diff from ( select *, row_number() over(partition by name order by ctime) rn from tb_shop ) t1 ) t2 group by name,diff having days > 3 ) t3 ;
人名,打地鼠次数编号,1代表击中
u01,1,1 u01,2,0 u01,3,1 u01,4,1 u01,5,0 u01,6,1 u02,1,1 u02,2,1 u02,3,0 u02,4,1 u02,5,1 u02,6,0 u02,7,0 u02,8,1 u02,9,1 u03,1,1 u03,2,1 u03,3,1 u03,4,1 u03,5,1 u03,6,0