已知有用户账户表,包含年份,用户ID和值,请按照年份分组,取出值前两小和前两大对应的用户ID。需要保持值最小和最大的用户ID排首位。
样例数据
+-------+----------+--------+ | year | user_id | value | +-------+----------+--------+ | 2022 | A | 30 | | 2022 | B | 10 | | 2022 | C | 20 | | 2023 | A | 40 | | 2023 | B | 50 | | 2023 | C | 20 | | 2023 | D | 30 | +-------+----------+--------+
期望结果
+-------+-----------------+-----------------+ | year | min_user_list | max_user_list | +-------+-----------------+-----------------+ | 2022 | B,C | A,C | | 2023 | C,D | B,A | +-------+-----------------+-----------------+
分析
属于取最大最小记录的升级版,最大难点在于拼接用户要保证有序。
关于 CONCAT_WS
函数的详细用法,详见 MaxCompute 官方文档。
SQL
使用 ROW_NUMBER() 函数根据年份分组,VALUE 字段正排和倒排得到两个序列
使用 row_number()
函数根据年份分组,根据 value
字段正序得到 asc_rn 用于取出 value
最小的两行记录;根据 value
倒序得到 desc_rn 用于取出最大的两行记录。
- 执行 SQL
select year ,user_id ,value ,row_number() over(partition by year order by value asc) as asc_rn ,row_number() over(partition by year order by value desc) as desc_rn from user_account;
- 返回结果
取出 VALUE 值最大的 USER_ID、第二大 USER_ID、最小 USER_ID、第二小 USER_ID
使用 if
函数对 asc_rn 和 desc_rn 字段进行判断,对符合条件的数据取出 user_id,不满足则取 null。
- 执行 SQL
select year ,user_id ,value ,if(asc_rn = 1, user_id, null) as min_user_id_1 ,if(asc_rn = 2, user_id, null) as min_user_id_2 ,if(desc_rn = 1, user_id, null) as max_user_id_1 ,if(desc_rn = 2, user_id, null) as max_user_id_2 from ( select year ,user_id ,value ,row_number() over(partition by year order by value asc) as asc_rn ,row_number() over(partition by year order by value desc) as desc_rn from user_account );
- 返回结果
根据 YEAR 字段分组聚合
根据 year
字段分组聚合,将多行记录合并为单行记录。
- 执行 SQL
select year ,max(if(asc_rn = 1, user_id, null)) as min_user_id_1 ,max(if(asc_rn = 2, user_id, null)) as min_user_id_2 ,max(if(desc_rn = 1, user_id, null)) as max_user_id_1 ,max(if(desc_rn = 2, user_id, null)) as max_user_id_2 from ( select year ,user_id ,value ,row_number() over(partition by year order by value asc) as asc_rn ,row_number() over(partition by year order by value desc) as desc_rn from user_account ) group by year;
- 返回结果
拼接字符串,得到最终结果
拼接 min1_user_id、min2_user_id 字段为 min_user_list;拼接 max1_user_id、max2_user_id 字段为 max_user_list。
- 执行 SQL
select year ,concat_ws(',', min_user_id_1, min_user_id_2) as min_user_list ,concat_ws(',', max_user_id_1, max_user_id_2) as max_user_list from ( select year ,max(if(asc_rn = 1, user_id, null)) as min_user_id_1 ,max(if(asc_rn = 2, user_id, null)) as min_user_id_2 ,max(if(desc_rn = 1, user_id, null)) as max_user_id_1 ,max(if(desc_rn = 2, user_id, null)) as max_user_id_2 from ( select year ,user_id ,value ,row_number() over(partition by year order by value asc) as asc_rn ,row_number() over(partition by year order by value desc) as desc_rn from user_account ) group by year );
- 返回结果
建表语句
-- drop table user_account; create table if not exists user_account ( year string, user_id string, value bigint ) comment '用户账户表';
示例数据
-- 样例数据 insert into user_account(year, user_id, value) values ('2022', 'A', 30), ('2022', 'B', 10), ('2022', 'C', 20), ('2023', 'A', 40), ('2023', 'B', 50), ('2023', 'C', 20), ('2023', 'D', 30);
参考资料
原创文章,转载请注明出处:http://www.opcoder.cn/article/76/