分析函数用来计算一组行的聚合值,与聚集函数不同的是分析函数为每组行返回多条记录。由分析函数语法定义的行集就叫做窗口,窗口决定了对于当前行而言进行计算的行的范围,窗口的范围则可能基于物理的行数或者逻辑间隔。
函数语法
analytic_function([ arguments ]) OVER (analytic_clause)
- OVER() 是开窗函数,这是开启分析函数的起点
- PARTITION BY 是窗口
- ORDER BY 是窗口规则(排序)
- ROW|RANGE BETWEEN 是窗口范围
analytic_function
analytic_function 指定分析函数的名称,如 RANK 、LAG 等 ...
analytic_clause
analytic_clause 用来确定分析函数的操作规则,包括 query_partition_clause 、order_by_clause 和 windowing_clause 三个子句。
# analytic_clause 语法: [ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
-
query_partition_clause 记录集分组子句,类似于 GROUP BY 分组
-
order_by_clause 记录集排序子句, 类似于 ORDER BY 排序
-
windowing_clause 记录集窗口范围子句
函数目录
序号 | 函数分类 | 函数名称 | 函数简介 |
1 | 常规统计函数 | MAX | 在一个分组的数据窗口中查找表达式的最大值 |
2 | MIN | 在一个分组的数据窗口中查找表达式的最小值 | |
3 | AVG | 计算分组中表达式的平均值 | |
4 | SUM | 计算分组中表达式的累积和 | |
5 | COUNT | 对一组内发生的事情进行累计计数 | |
6 | 数据排序函数 | RANK | 根据 order by 排序结果,计算组间相对位置,有跳号 |
7 | DENSE_RANK | 根据 order by 排序结果,计算组间相对位置,无跳号 | |
8 | ROW_NUMBER | 返回有序组中一行的偏移量,按特定的顺序分配行号 | |
9 | FIRST | 从 DENSE_RANK 结果中返回最前面的一行 | |
10 | FIRST_VALUE | 返回分组中数据窗口的第一个值 | |
11 | LAST | 从 DENSE_RANK 结果中返回最后面的一行 | |
12 | LAST_VALUE | 返回分组中数据窗口的最后一个值 | |
13 | LAG | 访问结果集中的其他行,即当前行之前的行 | |
14 | LEAD | 访问结果集中的其他行,即当前行之后的行 | |
15 | 数据分布函数 | NTILE | 将一个分组分为"表达式"的散列表示 |
16 | CUME_DIST | 计算一行在分组中的相对位置 | |
17 | PERCENT_RANK | 与 CUME_DIST 类似,计算规则稍有差别 | |
18 | PERCENTILE_CONT | 返回一个与输入的百分比值相对应的数据值 | |
19 | PERCENTILE_DISC | 返回一个与输入的百分比值相对应的数据值 | |
20 | RATIO_TO_REPORT | 计算当前值对应于总数据的百分比 | |
21 | REGR_SLOPE | 线性回归函数(适合最小二乘法回归线) | |
22 | REGR_INTERCEPT | ||
23 | REGR_COUNT | ||
24 | REGR_R2 | ||
25 | REGR_AVGX | ||
26 | REGR_AVGY | ||
27 | REGR_SXX | ||
28 | REGR_SYY | ||
29 | REGR_SXY | ||
30 | 统计分析函数 | CORR | 返回一堆表达式的相对系数 |
31 | COVAR_POP | 返回一堆表达式的总体协方差 | |
32 | COVAR_SAMP | 返回一堆表达式的样本协方差 | |
33 | STDDEV | 计算当前行关于分组的标准偏离 | |
34 | STDDEV_POP | 计算总体标准偏离,并返回总体变量的平方根 | |
35 | STDDEV_SAMP | 计算累计样本标准偏离,并返回总体变量的平方根 | |
36 | VAR_POP | 返回非空集合的总体方差 | |
37 | VAR_SAMP | 返回非空集合的样本方差 | |
38 | VARIANCE | 返回表达式的变量 | |
39 | 其它函数 | LISTAGG | 字符串拼接 |
40 | MEDIAN | 返回一组数据的中间值(中位数) | |
41 | NTH_VALUE | 返回排序后结果集中任意一行 |
窗口范围
windowing_clause
用来指定分组中当前行的计算范围,无论是 rows
还是 range
窗口,窗口总是在分组中从上至下滑动的,窗口范围由 betwen ... and ...
限定,而不用的话都表示窗口到当前行结束。
rows
关键字,指定窗口由 物理行 构成,即物理座位排数;range
关键字,指定窗口由 逻辑偏移量 构成,即符合指定的逻辑条件的范围;between ... and ...
关键字,用来指定窗口的起始点和终结点;unbonded preceding
关键字,指明窗口开始于分组的第一行;current row
如果作为起始点,用来指明窗口开始于当前行或当前值;如果作为终结点,用来指明窗口结束于当前行或当前值;unbonded following
关键字,指明窗口结束于分组的最后一行;value_expr
为物理或逻辑偏移量表达式
默认窗口范围
select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date) sum_sal1, --分组第一行和当前行 sum(t.salary) over(partition by t.dept_id) sum_sal2, --分组第一行和分组最后一行 sum(t.salary) over(order by t.hire_date) sum_sal3 --未分组,结果集第一行和最后一行 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY SUM_SAL1 SUM_SAL2 SUM_SAL3 ---------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- 101 Kochhar Neena 90 1989-09-21 17000 17000 34000 17000 102 De Haan Lex 90 1993-01-13 17000 34000 34000 34000 122 Kaufling Payam 50 1995-05-01 7900 7900 35400 41900 120 Weiss Matthew 50 1996-07-18 8000 15900 35400 49900 121 Fripp Adam 50 1997-04-10 8200 24100 35400 58100 123 Vollman Shanta 50 1997-10-10 6500 30600 35400 64600 124 Mourgos Kevin 50 1999-11-16 4800 35400 35400 69400
rows
rows 窗口,是由分组排序后分组中若干连续的行构成的窗口
rows 窗口目录树:
rows between unbounded preceding and unbounded following
select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between unbounded preceding and unbounded following) rows01 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS01 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 35400 120 Weiss Matthew 50 1996-07-18 8000 35400 121 Fripp Adam 50 1997-04-10 8200 35400 123 Vollman Shanta 50 1997-10-10 6500 35400 124 Mourgos Kevin 50 1999-11-16 4800 35400 101 Kochhar Neena 90 1989-09-21 17000 34000 102 De Haan Lex 90 1993-01-13 17000 34000
rows between unbounded preceding and current row
/rows unbounded preceding
select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between unbounded preceding and current row) rows02, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows unbounded preceding) rows02 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS02 ROWS02 ---------- --------------- ---------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 7900 7900 120 Weiss Matthew 50 1996-07-18 8000 15900 15900 121 Fripp Adam 50 1997-04-10 8200 24100 24100 123 Vollman Shanta 50 1997-10-10 6500 30600 30600 124 Mourgos Kevin 50 1999-11-16 4800 35400 35400 101 Kochhar Neena 90 1989-09-21 17000 17000 17000 102 De Haan Lex 90 1993-01-13 17000 34000 34000
rows between unbounded preceding and value_expr preceding
# 该分组的第一行和当前行的前 1 行 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between unbounded preceding and 1 preceding) rows03 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS03 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 120 Weiss Matthew 50 1996-07-18 8000 7900 121 Fripp Adam 50 1997-04-10 8200 15900 123 Vollman Shanta 50 1997-10-10 6500 24100 124 Mourgos Kevin 50 1999-11-16 4800 30600 101 Kochhar Neena 90 1989-09-21 17000 102 De Haan Lex 90 1993-01-13 17000 17000
rows between current row and value_expr following
# 该分组的第一行和当前行的后 1 行 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between current row and 1 following) rows04 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS04 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 15900 120 Weiss Matthew 50 1996-07-18 8000 16200 121 Fripp Adam 50 1997-04-10 8200 14700 123 Vollman Shanta 50 1997-10-10 6500 11300 124 Mourgos Kevin 50 1999-11-16 4800 4800 101 Kochhar Neena 90 1989-09-21 17000 34000 102 De Haan Lex 90 1993-01-13 17000 17000
rows between current row and unbounded following
select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between current row and unbounded following) rows05 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS05 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 35400 120 Weiss Matthew 50 1996-07-18 8000 27500 121 Fripp Adam 50 1997-04-10 8200 19500 123 Vollman Shanta 50 1997-10-10 6500 11300 124 Mourgos Kevin 50 1999-11-16 4800 4800 101 Kochhar Neena 90 1989-09-21 17000 34000 102 De Haan Lex 90 1993-01-13 17000 17000
rows between current row and current row
/rows current row
select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between current row and current row) rows06, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows current row) rows06 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS06 ROWS06 ---------- --------------- ---------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 7900 7900 120 Weiss Matthew 50 1996-07-18 8000 8000 8000 121 Fripp Adam 50 1997-04-10 8200 8200 8200 123 Vollman Shanta 50 1997-10-10 6500 6500 6500 124 Mourgos Kevin 50 1999-11-16 4800 4800 4800 101 Kochhar Neena 90 1989-09-21 17000 17000 17000 102 De Haan Lex 90 1993-01-13 17000 17000 17000
rows between current row and value_expr following
# 当前行和当前行的后 1 行 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between current row and 1 following) rows07 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS07 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 15900 120 Weiss Matthew 50 1996-07-18 8000 16200 121 Fripp Adam 50 1997-04-10 8200 14700 123 Vollman Shanta 50 1997-10-10 6500 11300 124 Mourgos Kevin 50 1999-11-16 4800 4800 101 Kochhar Neena 90 1989-09-21 17000 34000 102 De Haan Lex 90 1993-01-13 17000 17000
rows between value_expr preceding and unbounded following
# 当前行的前 1 行和分组最后一行 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 1 preceding and unbounded following) rows08 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS08 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 35400 120 Weiss Matthew 50 1996-07-18 8000 35400 121 Fripp Adam 50 1997-04-10 8200 27500 123 Vollman Shanta 50 1997-10-10 6500 19500 124 Mourgos Kevin 50 1999-11-16 4800 11300 101 Kochhar Neena 90 1989-09-21 17000 34000 102 De Haan Lex 90 1993-01-13 17000 34000
rows between value_expr preceding and current row
/rows value_expr preceding
# 当前行的前 1 行和当前行 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 1 preceding and current row) rows09 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS09 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 7900 120 Weiss Matthew 50 1996-07-18 8000 15900 121 Fripp Adam 50 1997-04-10 8200 16200 123 Vollman Shanta 50 1997-10-10 6500 14700 124 Mourgos Kevin 50 1999-11-16 4800 11300 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000 34000
rows between value_expr1 preceding and value_expr2 preceding
# 当前行的前 2 行和当前行的前 1 行(value_expr1 >= value_expr2) select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 2 preceding and 1 preceding) rows10 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS10 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 120 Weiss Matthew 50 1996-07-18 8000 7900 121 Fripp Adam 50 1997-04-10 8200 15900 123 Vollman Shanta 50 1997-10-10 6500 16200 124 Mourgos Kevin 50 1999-11-16 4800 14700 101 Kochhar Neena 90 1989-09-21 17000 102 De Haan Lex 90 1993-01-13 17000 17000
rows between value_expr1 preceding and value_expr2 following
# 当前行的前 1 行和当前行的后 1 行 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 1 preceding and 1 following) rows11 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS11 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 15900 120 Weiss Matthew 50 1996-07-18 8000 24100 121 Fripp Adam 50 1997-04-10 8200 22700 123 Vollman Shanta 50 1997-10-10 6500 19500 124 Mourgos Kevin 50 1999-11-16 4800 11300 101 Kochhar Neena 90 1989-09-21 17000 34000 102 De Haan Lex 90 1993-01-13 17000 34000
rows between value_expr following and unbounded following
# 当前行的后 1 行和分组最后一行 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 1 following and unbounded following) rows12 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS12 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 27500 120 Weiss Matthew 50 1996-07-18 8000 19500 121 Fripp Adam 50 1997-04-10 8200 11300 123 Vollman Shanta 50 1997-10-10 6500 4800 124 Mourgos Kevin 50 1999-11-16 4800 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000
rows between value_expr1 following and value_expr2 following
# 当前行的后 1 行和当前行的后 2 行(value_expr1 <= value_expr2) select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date rows between 1 following and 2 following) rows13 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROWS13 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 16200 120 Weiss Matthew 50 1996-07-18 8000 14700 121 Fripp Adam 50 1997-04-10 8200 11300 123 Vollman Shanta 50 1997-10-10 6500 4800 124 Mourgos Kevin 50 1999-11-16 4800 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000
range
range 窗口,相当于给 order_by_clause 中的 expr 加一个 where 限定条件,分组中满足条件:
- 当 order by expr asc 时,where expr between a and b
- 当 order by expr desc 时, where expr between b and a
满足以上条件的所有行构成一个逻辑窗口,其中 a 由分组中第 ra 行的值计算而来,b 由分组中第 rb 行的值计算而来,且 ra <= rb
range 窗口目录树(默认升序排序):
range between unbounded preceding and unbounded following
select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between unbounded preceding and unbounded following) range01 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE01 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 35400 120 Weiss Matthew 50 1996-07-18 8000 35400 121 Fripp Adam 50 1997-04-10 8200 35400 123 Vollman Shanta 50 1997-10-10 6500 35400 124 Mourgos Kevin 50 1999-11-16 4800 35400 101 Kochhar Neena 90 1989-09-21 17000 34000 102 De Haan Lex 90 1993-01-13 17000 34000
range between unbounded preceding and current row
/range unbounded preceding
select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between unbounded preceding and current row) range02 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE02 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 7900 120 Weiss Matthew 50 1996-07-18 8000 15900 121 Fripp Adam 50 1997-04-10 8200 24100 123 Vollman Shanta 50 1997-10-10 6500 30600 124 Mourgos Kevin 50 1999-11-16 4800 35400 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000 34000
range between unbounded preceding and value_expr preceding
# 该分组的第一行和当前行hire_date-365天 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between unbounded preceding and 365 preceding) range03 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE03 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 120 Weiss Matthew 50 1996-07-18 8000 7900 121 Fripp Adam 50 1997-04-10 8200 7900 123 Vollman Shanta 50 1997-10-10 6500 15900 124 Mourgos Kevin 50 1999-11-16 4800 30600 101 Kochhar Neena 90 1989-09-21 17000 102 De Haan Lex 90 1993-01-13 17000 17000
range between unbounded preceding and value_expr following
# 该分组的第一行和当前行hire_date+365天 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between unbounded preceding and 365 following) range04 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE04 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 7900 120 Weiss Matthew 50 1996-07-18 8000 24100 121 Fripp Adam 50 1997-04-10 8200 30600 123 Vollman Shanta 50 1997-10-10 6500 30600 124 Mourgos Kevin 50 1999-11-16 4800 35400 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000 34000
range between current row and unbounded following
5
range between current row and current row
/range current row
select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between current row and current row) range06 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE06 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 7900 120 Weiss Matthew 50 1996-07-18 8000 8000 121 Fripp Adam 50 1997-04-10 8200 8200 123 Vollman Shanta 50 1997-10-10 6500 6500 124 Mourgos Kevin 50 1999-11-16 4800 4800 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000 17000
range between current row and value_expr following
# 该分组当前行和当前行hire_date+365天 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between current row and 365 following) range07 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE07 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 7900 120 Weiss Matthew 50 1996-07-18 8000 16200 121 Fripp Adam 50 1997-04-10 8200 14700 123 Vollman Shanta 50 1997-10-10 6500 6500 124 Mourgos Kevin 50 1999-11-16 4800 4800 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000 17000
range between value_expr preceding and unbounded following
# 该分组当前行的值-365天和最后一个值 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 365 preceding and unbounded following) range08 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE08 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 35400 120 Weiss Matthew 50 1996-07-18 8000 27500 121 Fripp Adam 50 1997-04-10 8200 27500 123 Vollman Shanta 50 1997-10-10 6500 19500 124 Mourgos Kevin 50 1999-11-16 4800 4800 101 Kochhar Neena 90 1989-09-21 17000 34000 102 De Haan Lex 90 1993-01-13 17000 17000
range between value_expr preceding and current row
/range value_expr preceding
# 该分组当前行的值-365天和当前行的值 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 365 preceding and current row) range09 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE09 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 7900 120 Weiss Matthew 50 1996-07-18 8000 8000 121 Fripp Adam 50 1997-04-10 8200 16200 123 Vollman Shanta 50 1997-10-10 6500 14700 124 Mourgos Kevin 50 1999-11-16 4800 4800 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000 17000
range between value_expr1 preceding and value_expr2 preceding
# 该分组当前行的值-365天和当前行的值-30天(value_expr1 >= value_expr2) select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 365 preceding and 30 preceding) range10 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE10 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 120 Weiss Matthew 50 1996-07-18 8000 121 Fripp Adam 50 1997-04-10 8200 8000 123 Vollman Shanta 50 1997-10-10 6500 8200 124 Mourgos Kevin 50 1999-11-16 4800 101 Kochhar Neena 90 1989-09-21 17000 102 De Haan Lex 90 1993-01-13 17000
range between value_expr1 preceding and value_expr2 following
# 该分组当前行的值hire_date-365天和当前行的值hire_date+365天 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 365 preceding and 365 following) range11 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE11 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 7900 120 Weiss Matthew 50 1996-07-18 8000 16200 121 Fripp Adam 50 1997-04-10 8200 22700 123 Vollman Shanta 50 1997-10-10 6500 14700 124 Mourgos Kevin 50 1999-11-16 4800 4800 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000 17000
range between value_expr following and unbounded following
# 该分组当前行的值hire_date+365天和最后一个值 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 365 following and unbounded following) range12 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE12 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 27500 120 Weiss Matthew 50 1996-07-18 8000 11300 121 Fripp Adam 50 1997-04-10 8200 4800 123 Vollman Shanta 50 1997-10-10 6500 4800 124 Mourgos Kevin 50 1999-11-16 4800 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000
range between value_expr1 following and value_expr2 following
# 该分组当前行的值hire_date+30天和当前行的值hire_date+365天 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by t.hire_date range between 30 following and 365 following) range13 from EMPLOYEE t where t.dept_id in (50, 90); EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANGE13 ---------- --------------- ---------- ---------- ---------- ---------- 122 Kaufling Payam 50 1995-05-01 7900 120 Weiss Matthew 50 1996-07-18 8000 8200 121 Fripp Adam 50 1997-04-10 8200 6500 123 Vollman Shanta 50 1997-10-10 6500 124 Mourgos Kevin 50 1999-11-16 4800 101 Kochhar Neena 90 1989-09-21 17000 102 De Haan Lex 90 1993-01-13 17000
数据准备
# employee.sql DROP TABLE EMPLOYEE PURGE; CREATE TABLE employee ( EMP_ID NUMBER CONSTRAINT PK_EMPLOYEE PRIMARY KEY, EMP_NAME VARCHAR2(20) NOT NULL, DEPT_ID NUMBER NOT NULL, HIRE_DATE DATE, SALARY NUMBER ); ALTER SESSION SET nls_date_format='YYYY-MM-DD'; INSERT INTO employee VALUES (100,'Wang John',10,'1990-01-01',20000); INSERT INTO employee VALUES (101,'Kochhar Neena',90,'1989-09-21',17000); INSERT INTO employee VALUES (102,'De Haan Lex',90,'1993-01-13',17000); INSERT INTO employee VALUES (103,'Hunold Alexander',60,'1990-01-03',9000); INSERT INTO employee VALUES (104,'Ernst Bruce',60,'1991-05-21',6000); INSERT INTO employee VALUES (105,'Austin David',60,'1997-06-25',4800); INSERT INTO employee VALUES (106,'Pataballa',60,'1998-02-05',4800); INSERT INTO employee VALUES (107,'Lorentz Diana',60,'1999-02-07',4200); INSERT INTO employee VALUES (108,'Greenberg Nancy',100,'1994-08-17',12000); INSERT INTO employee VALUES (109,'Faviet Daniel',100,'1994-08-16',9000); INSERT INTO employee VALUES (110,'Chen John',100,'1997-09-28',8200); INSERT INTO employee VALUES (111,'Sciarra Ismael',100,'1997-09-30',7700); INSERT INTO employee VALUES (112,'Urman Jose Manuel',100,'1998-03-07',7800); INSERT INTO employee VALUES (113,'Popp Luis',100,'1999-12-07',6900); INSERT INTO employee VALUES (114,'Raphaely Den',30,'1994-12-07',11000); INSERT INTO employee VALUES (120,'Weiss Matthew',50,'1996-07-18',8000); INSERT INTO employee VALUES (121,'Fripp Adam',50,'1997-04-10',8200); INSERT INTO employee VALUES (122,'Kaufling Payam',50,'1995-05-01',7900); INSERT INTO employee VALUES (123,'Vollman Shanta',50,'1997-10-10',6500); INSERT INTO employee VALUES (124,'Mourgos Kevin',50,'1999-11-16',4800); INSERT INTO employee VALUES (145,'Russell John',80,'1996-10-01',14000); INSERT INTO employee VALUES (146,'Russell John',80,'1997-01-05',13500); INSERT INTO employee VALUES (147,'Russell John',80,'1997-03-10',12000); INSERT INTO employee VALUES (148,'Russell John',80,'1999-10-15',11000); INSERT INTO employee VALUES (149,'Russell John',80,'2000-01-19',10500); COMMIT;
# sales.sql DROP TABLE sales PURGE; CREATE TABLE sales ( country varchar2(5), sales_month date, sales_number integer, sales_value number(8,0) ); ALTER SESSION SET nls_date_format='YYYY-MM-DD'; insert into sales values ('USA','2008-1-1',1200,'500000.00'); insert into sales values ('USA','2008-2-1',1150,'450000.00'); insert into sales values ('USA','2008-3-1',1300,'520000.00'); insert into sales values ('USA','2008-4-1',1280,'510000.00'); insert into sales values ('USA','2008-5-1',1350,'530000.00'); insert into sales values ('USA','2008-6-1',1400,'535000.00'); insert into sales values ('USA','2008-7-1',1300,'510000.00'); insert into sales values ('USA','2008-8-1',1250,'460000.00'); insert into sales values ('USA','2008-9-1',1400,'530000.00'); insert into sales values ('USA','2008-10-1',1380,'520000.00'); insert into sales values ('USA','2008-11-1',1450,'540000.00'); insert into sales values ('USA','2008-12-1',1500,'545000.00'); insert into sales values ('USA','2009-1-1',1600,'550000.00'); insert into sales values ('USA','2009-2-1',1390,'532000.00'); insert into sales values ('USA','2009-3-1',1730,'570000.00'); insert into sales values ('USA','2009-4-1',1900,'600000.00'); insert into sales values ('USA','2009-5-1',1850,'585000.00'); insert into sales values ('USA','2009-6-1',3800,'780000.00'); insert into sales values ('USA','2009-7-1',1700,'560000.00'); insert into sales values ('USA','2009-8-1',1490,'542000.00'); insert into sales values ('USA','2009-9-1',1830,'580000.00'); insert into sales values ('USA','2009-10-1',2000,'610000.00'); insert into sales values ('USA','2009-11-1',1950,'595000.00'); insert into sales values ('USA','2009-12-1',1900,'590000.00'); commit;
函数用法
常规统计函数
- MAX
# 获取每个部门中工资最高的员工 select * from ( select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, max(t.salary) over(partition by t.dept_id) max_sal from employee t ) where salary = max_sal order by dept_id, emp_id; EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY MAX_SAL --------- -------------------- ---------- ---------- ---------- ---------- 100 Wang John 10 1990-01-01 20000 20000 114 Raphaely Den 30 1994-12-07 11000 11000 121 Fripp Adam 50 1997-04-10 8200 8200 103 Hunold Alexander 60 1990-01-03 9000 9000 145 Russell John 80 1996-10-01 14000 14000 101 Kochhar Neena 90 1989-09-21 17000 17000 102 De Haan Lex 90 1993-01-13 17000 17000 108 Greenberg Nancy 100 1994-08-17 12000 12000
- MIN
# 分析函数中使用 ORDER BY 子句 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, min(t.salary) over(partition by t.dept_id order by t.hire_date) min_sal from employee t order by t.dept_id, t.hire_date; EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY MIN_SAL ---------- -------------------- ---------- ---------- ---------- ---------- 100 Wang John 10 1990-01-01 20000 20000 114 Raphaely Den 30 1994-12-07 11000 11000 122 Kaufling Payam 50 1995-05-01 7900 7900 120 Weiss Matthew 50 1996-07-18 8000 7900 121 Fripp Adam 50 1997-04-10 8200 7900 123 Vollman Shanta 50 1997-10-10 6500 6500 124 Mourgos Kevin 50 1999-11-16 4800 4800 103 Hunold Alexander 60 1990-01-03 9000 9000 104 Ernst Bruce 60 1991-05-21 6000 6000 105 Austin David 60 1997-06-25 4800 4800 106 Pataballa 60 1998-02-05 4800 4800 107 Lorentz Diana 60 1999-02-07 4200 4200
# 分析函数中不使用 ORDER BY 子句 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, min(t.salary) over(partition by t.dept_id) min_sal from employee t order by t.dept_id, t.hire_date; EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY MIN_SAL ---------- -------------------- ---------- ---------- ---------- ---------- 100 Wang John 10 1990-01-01 20000 20000 114 Raphaely Den 30 1994-12-07 11000 11000 122 Kaufling Payam 50 1995-05-01 7900 4800 120 Weiss Matthew 50 1996-07-18 8000 4800 121 Fripp Adam 50 1997-04-10 8200 4800 123 Vollman Shanta 50 1997-10-10 6500 4800 124 Mourgos Kevin 50 1999-11-16 4800 4800 103 Hunold Alexander 60 1990-01-03 9000 4200 104 Ernst Bruce 60 1991-05-21 6000 4200 105 Austin David 60 1997-06-25 4800 4200 106 Pataballa 60 1998-02-05 4800 4200 107 Lorentz Diana 60 1999-02-07 4200 4200
差异原因: 1. 不使用 ORDER BY 子句,默认窗口为该分组的第一行和该分组的最后一行,所以 MIN(salary) 会等于分组中的最小值; 2. 使用 ORDER BY 子句,会根据 hire_date 排序,然后获取小于等于当前行的 hire_date 的所有 salary,然后再取出其中的 min(salary)。
- AVG
# 获取部门平均薪水 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, avg(t.salary) over(partition by t.dept_id) avg_sal from employee t order by t.dept_id; EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY AVG_SAL ---------- -------------------- ---------- ---------- ---------- ---------- 100 Wang John 10 1990-01-01 20000 20000 114 Raphaely Den 30 1994-12-07 11000 11000 121 Fripp Adam 50 1997-04-10 8200 7080 122 Kaufling Payam 50 1995-05-01 7900 7080 124 Mourgos Kevin 50 1999-11-16 4800 7080 120 Weiss Matthew 50 1996-07-18 8000 7080 123 Vollman Shanta 50 1997-10-10 6500 7080 107 Lorentz Diana 60 1999-02-07 4200 5760 106 Pataballa 60 1998-02-05 4800 5760 103 Hunold Alexander 60 1990-01-03 9000 5760 104 Ernst Bruce 60 1991-05-21 6000 5760 105 Austin David 60 1997-06-25 4800 5760
- SUM
# 获取分组内的薪水和(窗口范围为该分区的第一行和当前行) select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, sum(t.salary) over(partition by t.dept_id order by salary range between unbounded preceding and current row) sum_sal from employee t order by t.dept_id, t.salary; EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY SUM_SAL ---------- -------------------- ---------- ---------- ---------- ---------- 100 Wang John 10 1990-01-01 20000 20000 114 Raphaely Den 30 1994-12-07 11000 11000 124 Mourgos Kevin 50 1999-11-16 4800 4800 123 Vollman Shanta 50 1997-10-10 6500 11300 122 Kaufling Payam 50 1995-05-01 7900 19200 120 Weiss Matthew 50 1996-07-18 8000 27200 121 Fripp Adam 50 1997-04-10 8200 35400 107 Lorentz Diana 60 1999-02-07 4200 4200 106 Pataballa 60 1998-02-05 4800 13800 105 Austin David 60 1997-06-25 4800 13800 104 Ernst Bruce 60 1991-05-21 6000 19800 103 Hunold Alexander 60 1990-01-03 9000 28800
- COUNT
# 窗口范围为该分区当前行的 hire_date-365*2 和 当前行的 hire_date+30*2 之间的累计计数 即: 统计当前行2年前至当前行3个月后这段时间内的累计计数。 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, count(*) over(partition by t.dept_id order by t.hire_date range between 365*2 preceding and 30*3 following) sum_cnt from employee t order by t.dept_id, t.hire_date; EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY SUM_CNT ---------- -------------------- ---------- ---------- ---------- ---------- 100 Wang John 10 1990-01-01 20000 1 114 Raphaely Den 30 1994-12-07 11000 1 122 Kaufling Payam 50 1995-05-01 7900 1 120 Weiss Matthew 50 1996-07-18 8000 2 121 Fripp Adam 50 1997-04-10 8200 3 123 Vollman Shanta 50 1997-10-10 6500 3 124 Mourgos Kevin 50 1999-11-16 4800 1 103 Hunold Alexander 60 1990-01-03 9000 1 104 Ernst Bruce 60 1991-05-21 6000 2 105 Austin David 60 1997-06-25 4800 1 106 Pataballa 60 1998-02-05 4800 2 107 Lorentz Diana 60 1999-02-07 4200 3 说明: 1. preceding("-") 2. following("+") 如 "107 Lorentz Diana" 统计的就是 "1997-02-07 ~ 1999-05-07" 之间的累计计数。
数据排序函数
- RANK
# 根据 ORDER BY 子句中表达式的值,从查询返回的每一行,计算它们与其它行的相对位置(排名跳号不连续) select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, rank() over(partition by t.dept_id order by t.salary) rank from employee t where t.dept_id = 60 order by rank; EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY RANK ---------- -------------------- ---------- ---------- ---------- ---------- 107 Lorentz Diana 60 1999-02-07 4200 1 105 Austin David 60 1997-06-25 4800 2 106 Pataballa 60 1998-02-05 4800 2 104 Ernst Bruce 60 1991-05-21 6000 4 103 Hunold Alexander 60 1990-01-03 9000 5
- DENSE_RANK
# 根据 ORDER BY 子句中表达式的值,从查询返回的每一行,计算它们与其它行的相对位置(排名不跳号连续) select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, dense_rank() over(partition by t.dept_id order by t.salary) dense_rank from employee t where t.dept_id = 60 order by dense_rank; EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY DENSE_RANK ---------- -------------------- ---------- ---------- ---------- ---------- 107 Lorentz Diana 60 1999-02-07 4200 1 105 Austin David 60 1997-06-25 4800 2 106 Pataballa 60 1998-02-05 4800 2 104 Ernst Bruce 60 1991-05-21 6000 3 103 Hunold Alexander 60 1990-01-03 9000 4
- ROW_NUMBER
# 返回有序分组中一行的偏移量 select t.emp_id, t.emp_name, t.dept_id, t.hire_date, t.salary, row_number() over(partition by t.dept_id order by t.salary) row_number from employee t where t.dept_id = 60 order by row_number; EMP_ID EMP_NAME DEPT_ID HIRE_DATE SALARY ROW_NUMBER ---------- -------------------- ---------- ---------- ---------- ---------- 107 Lorentz Diana 60 1999-02-07 4200 1 105 Austin David 60 1997-06-25 4800 2 106 Pataballa 60 1998-02-05 4800 3 104 Ernst Bruce 60 1991-05-21 6000 4 103 Hunold Alexander 60 1990-01-03 9000 5
- FIRST
# 从 DENSE_RANK 返回的结果集中取出排在最前面的一个值的行(可能多行,因为值可能相等) select t.dept_id, min(t.salary) keep(dense_rank first order by t.salary) first, max(t.salary) keep(dense_rank last order by t.salary) last from employee t group by t.dept_id; DEPT_ID FIRST LAST ---------- ---------- ---------- 10 20000 20000 30 11000 11000 50 4800 8200 60 4200 9000 80 10500 14000 90 17000 17000
- FIRST_VALUE
# 返回分组中数据窗口的第一个值 select t.emp_id, t.emp_name, t.dept_id, t.salary, first_value(t.salary) over(partition by t.dept_id order by t.salary) first_val, first_value(t.salary) over(partition by t.dept_id order by t.salary desc) last_val from employee t; EMP_ID EMP_NAME DEPT_ID SALARY FIRST_VAL LAST_VAL ---------- -------------------- ---------- ---------- ---------- ---------- 100 Wang John 10 20000 20000 20000 114 Raphaely Den 30 11000 11000 11000 124 Mourgos Kevin 50 4800 4800 8200 123 Vollman Shanta 50 6500 4800 8200 122 Kaufling Payam 50 7900 4800 8200 120 Weiss Matthew 50 8000 4800 8200 121 Fripp Adam 50 8200 4800 8200 107 Lorentz Diana 60 4200 4200 9000 105 Austin David 60 4800 4200 9000 106 Pataballa 60 4800 4200 9000 104 Ernst Bruce 60 6000 4200 9000 103 Hunold Alexander 60 9000 4200 9000
- LAST
# 从 DENSE_RANK 返回的结果集中取出排在最后面的一个值的行(可能多行,因为值可能相等)
- LAST_VALUE
# 返回分组中数据窗口的最后个值
- LAG
# 在给定的分组中可以访问当前行之前的行,默认偏移量为1 select t.emp_id, t.salary, t.hire_date, lag(salary, 1, 0) over(order by t.hire_date) prev_lag, lag(salary, 2, -1) over(order by t.hire_date) prev_lag2 from employee t where t.dept_id = 60 order by t.hire_date; EMP_ID SALARY HIRE_DATE PREV_LAG PREV_LAG2 ---------- ---------- ---------- ---------- ---------- 103 9000 1990-01-03 0 -1 104 6000 1991-05-21 9000 -1 105 4800 1997-06-25 6000 9000 106 4800 1998-02-05 4800 6000 107 4200 1999-02-07 4800 4800
- LEAD
# 在给定的分组中可以访问当前行之后的行,默认偏移量为1 select t.emp_id, t.salary, t.hire_date, lead(salary, 1, 0) over(order by t.hire_date) next_lead from employee t where t.dept_id = 60 order by t.hire_date; EMP_ID SALARY HIRE_DATE NEXT_LEAD ---------- ---------- ---------- ---------- 103 9000 1990-01-03 6000 104 6000 1991-05-21 4800 105 4800 1997-06-25 4800 106 4800 1998-02-05 4200 107 4200 1999-02-07 0
数据分布函数
- NTILE
# 将一个分组分为 "表达式" 的散列显示。例如表达式为=4,则给分组中的每一行分配一个数(1~4),如果分组中有20行,则给前5行分配1,给下5行分配2等 select t.emp_id, t.emp_name, t.dept_id, t.salary, ntile(4) over(partition by t.dept_id order by t.salary) quartile from employee t where t.dept_id in (60, 100) order by t.dept_id, t.salary, quartile; EMP_ID EMP_NAME DEPT_ID SALARY QUARTILE ---------- -------------------- ---------- ---------- ---------- 107 Lorentz Diana 60 4200 1 105 Austin David 60 4800 1 106 Pataballa 60 4800 2 104 Ernst Bruce 60 6000 3 103 Hunold Alexander 60 9000 4 113 Popp Luis 100 6900 1 111 Sciarra Ismael 100 7700 1 112 Urman Jose Manuel 100 7800 2 110 Chen John 100 8200 2 109 Faviet Daniel 100 9000 3 108 Greenberg Nancy 100 12000 4 说明: 1. 将数据分为部门60和部门100两个分组,部门60有5条数据,因此将另外一个值(5/4余数)分配给 "buckets 1"; 部门100有6条数据,因此将另外两个值(6/4余数)分配给 "buckets 1" 和 "buckets 2"。
- CUME_DIST
# 计算一行在分组中的相对位置,CUME_DIST 返回的值范围是>0到<=1 select t.emp_id, t.emp_name, t.dept_id, t.salary, cume_dist() over(partition by t.dept_id order by t.salary) cume_dist from employee t where t.dept_id = 50 order by t.salary; EMP_ID EMP_NAME DEPT_ID SALARY CUME_DIST ---------- -------------------- ---------- ---------- ---------- 124 Mourgos Kevin 50 4800 .2 123 Vollman Shanta 50 6500 .4 122 Kaufling Payam 50 7900 .6 120 Weiss Matthew 50 8000 .8 121 Fripp Adam 50 8200 1 select t.emp_id, t.emp_name, t.dept_id, t.salary from employee t where t.dept_id = 50 order by t.salary; EMP_ID EMP_NAME DEPT_ID SALARY ---------- -------------------- ---------- ---------- 124 Mourgos Kevin 50 4800 123 Vollman Shanta 50 6500 122 Kaufling Payam 50 7900 120 Weiss Matthew 50 8000 121 Fripp Adam 50 8200 说明: 以 "123 Vollman Shanta" 为例,部门50的员工,有40%的工资低于或等于 Vollman Shanta(含自己)的工资。
- PERCENT_RANK
# 对于一个给定的分组来说,在计算那行的序号时,先减1,然后除以n-1(n为分组中所有的行数),该函数总是返回0~1(包括1)之间的数 select t.emp_id, t.emp_name, t.dept_id, t.salary, percent_rank() over(partition by t.dept_id order by t.salary) pr from employee t where t.dept_id in (10, 50) order by t.dept_id, pr, t.salary; EMP_ID EMP_NAME DEPT_ID SALARY PR ---------- -------------------- ---------- ---------- ---------- 100 Wang John 10 20000 0 124 Mourgos Kevin 50 4800 0 123 Vollman Shanta 50 6500 .25 122 Kaufling Payam 50 7900 .5 120 Weiss Matthew 50 8000 .75 121 Fripp Adam 50 8200 1 说明: 计算每个员工在部门内的工资等级百分比。
- PERCENTILE_CONT
# 返回一个与输入的分布百分比相对应的数据值,根据 PERCENT_RANK 可以获的分布百分比 select t.emp_id, t.emp_name, t.dept_id, t.salary, percent_rank() over(partition by t.dept_id order by t.salary) pr, percentile_cont(0.5) within group (order by t.salary) over(partition by t.dept_id) pd from employee t where t.dept_id = 50 order by t.dept_id, pr, t.salary; EMP_ID EMP_NAME DEPT_ID SALARY PR PD ---------- -------------------- ---------- ---------- ---------- ---------- 124 Mourgos Kevin 50 4800 0 7900 123 Vollman Shanta 50 6500 .25 7900 122 Kaufling Payam 50 7900 .5 7900 120 Weiss Matthew 50 8000 .75 7900 121 Fripp Adam 50 8200 1 7900
- PERCENTILE_DISC
# 返回一个与输入的分布百分比值相对应的数据值,根据 CUME_DIST 可以获取分布百分比,如果找不到就取大于该分布值的下一个值 select t.emp_id, t.dept_id, t.salary, cume_dist() over(partition by t.dept_id order by t.salary) cd, percentile_disc(0.6) within group (order by t.salary) over(partition by t.dept_id) pd, percentile_disc(0.1) within group (order by t.salary) over(partition by t.dept_id) pd2 from employee t where t.dept_id = 50 order by t.dept_id, pd, t.salary; EMP_ID DEPT_ID SALARY CD PD PD2 ---------- ---------- ---------- ---------- ---------- ---------- 124 50 4800 .2 7900 4800 123 50 6500 .4 7900 4800 122 50 7900 .6 7900 4800 120 50 8000 .8 7900 4800 121 50 8200 1 7900 4800 说明: 1. 找到0.6这个分布值,则取值 2. 找不到0.1这个分布在,则取大于分布值的下一个值(0.2)
- RATIO_TO_REPORT
# 该函数计算 expression/(sum(expression)) 的值,它给出相对于总数的百分比 select * from ( select t.emp_id, t.dept_id, t.salary, sum(salary) over() sum_sal, sum(salary) over(partition by t.dept_id) sum_d_sal, round(ratio_to_report(salary) over(), 4) rtr, round(ratio_to_report(salary) over(partition by t.dept_id), 4) rtr2 from employee t ) where dept_id = 60 order by salary; EMP_ID DEPT_ID SALARY SUM_SAL SUM_D_SAL RTR RTR2 ---------- ---------- ---------- ---------- ---------- ---------- ---------- 107 60 4200 241800 28800 .0174 .1458 106 60 4800 241800 28800 .0199 .1667 105 60 4800 241800 28800 .0199 .1667 104 60 6000 241800 28800 .0248 .2083 103 60 9000 241800 28800 .0372 .3125 说明: 1. 计算部门60下每个员工的工资占所有员工工资总额的百分比; 2. 计算部门60下每个员工的工资占部门员工工资总额的百分比。
- REGR_ (Linear Regression) Functions
REGR_ (Linear Regression) Functions 这些线性回归函数适合最小二乘法回归线,有9个不同的函数可使用
统计分析函数
- CORR
# 返回一对表达式的相关系数,通过返回-1~1之间的一个数,相关系数给出了关联的强度,0表示不相关 SELECT t.emp_id, t.dept_id, TO_CHAR((SYSDATE - hire_date) YEAR TO MONTH ) "Yrs-Mns", t.salary, CORR(SYSDATE-hire_date, salary) OVER(PARTITION BY t.dept_id) Correlation FROM employee t WHERE t.dept_id in (50, 80) ORDER BY t.dept_id, t.emp_id; EMP_ID DEPT_ID Yrs-Mns SALARY CORRELATION ---------- ---------- ------- ---------- ----------- 120 50 +22-01 8000 .876156086 121 50 +21-04 8200 .876156086 122 50 +23-03 7900 .876156086 123 50 +20-10 6500 .876156086 124 50 +18-09 4800 .876156086 145 80 +21-10 14000 .912198993 146 80 +21-07 13500 .912198993 147 80 +21-05 12000 .912198993 148 80 +18-10 11000 .912198993 149 80 +18-07 10500 .912198993 说明: 结果集展示了员工在公司工作的时间长短和员工所属部门的薪水之间的相关关系。
- COVAR_POP
# 返回一对表达式的总体协方差,用于表示的是两个变量的总体的误差 select p.product_id, p.supplier_id, round(covar_pop(p.list_price, p.min_price) over(order by p.product_id, p.supplier_id), 4) covar_pop from oe.product_information p where p.category_id = 29 order by p.product_id, p.supplier_id; PRODUCT_ID SUPPLIER_ID COVAR_POP ---------- ----------- ---------- 1774 103088 0 1775 103087 1473.25 1794 103096 1702.7778 1825 103093 1926.25 2004 103086 1591.4 2005 103086 1512.5 2416 103088 1475.9796 2417 103088 1478.7031 2449 103088 1326.8642 3101 103086 1195.2 3170 103089 1590.0744 3171 103089 1718.25
- COVAR_SAMP
# 返回一对表达式的样本协方差,用于表示的是两个变量的总体的误差 select p.product_id, p.supplier_id, round(covar_samp(p.list_price, p.min_price) over(order by p.product_id, p.supplier_id), 4) covar_samp from oe.product_information p where p.category_id = 29 order by p.product_id, p.supplier_id; PRODUCT_ID SUPPLIER_ID COVAR_SAMP ---------- ----------- ---------- 1774 103088 1775 103087 2946.5 1794 103096 2554.1667 1825 103093 2568.3333 2004 103086 1989.25 2005 103086 1815 2416 103088 1721.9762 2417 103088 1689.9464 2449 103088 1492.7222 3101 103086 1328 3170 103089 1749.0818 3171 103089 1874.4545
- STDDEV
# 计算当前行关于分组的标准偏离(标准差) select t.last_name, t.salary, round(stddev(t.salary) over (order by t.hire_date), 4) StdDev from hr.employees t where t.department_id = 30 order by t.last_name, t.salary, StdDev; LAST_NAME SALARY STDDEV ------------ ---------- ---------- Baida 2900 4035.2612 Colmenares 2500 3362.5883 Himuro 2600 3649.2465 Khoo 3100 5586.1436 Raphaely 11000 0 Tobias 2800 4650.0896
- STDDEV_POP
# 该函数计算累积总体标准偏离,并返回总体变量的平方根,其返回值与VAR_POP函数的平方根相同 select t.department_id, t.last_name, t.salary, stddev_pop(t.salary) over (partition by t.department_id) as pop_std from hr.employees t order by t.department_id, t.last_name, t.salary, pop_std; DEPARTMENT_ID LAST_NAME SALARY POP_STD ------------- ------------ ---------- ---------- 10 Whalen 4400 0 20 Fay 6000 3500 20 Hartstein 13000 3500 30 Baida 2900 3069.6091 30 Colmenares 2500 3069.6091 30 Himuro 2600 3069.6091 30 Khoo 3100 3069.6091 30 Raphaely 11000 3069.6091 30 Tobias 2800 3069.6091 40 Mavris 6500 0 说明: 返回样本中员工工资的总体标准差。
- STDDEV_SAMP
# 该函数计算累积样本标准偏离,并返回总体变量的平方根,其返回值与VAR_POP函数的平方根相同 select department_id, last_name, hire_date, salary, stddev_samp(salary) over (partition by department_id order by hire_date rows between unbounded preceding and current row) as cum_sdev from hr.employees order by department_id, last_name, hire_date, salary, cum_sdev; DEPARTMENT_ID LAST_NAME HIRE_DATE SALARY CUM_SDEV ------------- ------------ ---------- ---------- ---------- 10 Whalen 2003-09-17 4400 20 Fay 2005-08-17 6000 4949.74747 20 Hartstein 2004-02-17 13000 30 Baida 2005-12-24 2900 4035.26125 30 Colmenares 2007-08-10 2500 3362.58829 30 Himuro 2006-11-15 2600 3649.2465 30 Khoo 2003-05-18 3100 5586.14357 30 Raphaely 2002-12-07 11000 30 Tobias 2005-07-24 2800 4650.0896 40 Mavris 2002-06-07 6500 说明: 返回员工表中工资的样本标准差。
- VAR_POP
# 该函数返回非空集合的总体方差(忽略null) SUM((expr - (SUM(expr) / COUNT(expr)))2) / COUNT(expr) select t.calendar_month_desc, var_pop(sum(s.amount_sold)) over (order by t.calendar_month_desc) var_pop from sh.sales s, sh.times t where s.time_id = t.time_id and t.calendar_year = 1998 group by t.calendar_month_desc order by t.calendar_month_desc, var_pop; CALENDAR VAR_POP -------- ---------- 1998-01 0 1998-02 2269111326 1998-03 5.5849E+10 1998-04 4.8252E+10 1998-05 6.0020E+10 1998-06 5.4091E+10 1998-07 4.7150E+10 说明: 计算1998年月销售额的累积总体方差。
- VAR_SAMP
# 该函数返回非空集合的样本方差(忽略null) (SUM(expr - (SUM(expr) / COUNT(expr)))2) / (COUNT(expr) - 1) select t.calendar_month_desc, var_samp(sum(s.amount_sold)) over (order by t.calendar_month_desc) var_samp from sh.sales s, sh.times t where s.time_id = t.time_id and t.calendar_year = 1998 group by t.calendar_month_desc order by t.calendar_month_desc, var_samp; CALENDAR VAR_SAMP -------- ---------- 1998-01 1998-02 4538222653 1998-03 8.3774E+10 1998-04 6.4336E+10 1998-05 7.5025E+10 1998-06 6.4909E+10 1998-07 5.5009E+10 说明: 计算1998年月销售额的样本方差。
- VARIANCE
# 该函数返回表达式的变量,如果表达式中行数为1,则返回0,如果表达式中行数大于1,则返回 VAR_SAMP select last_name, salary, round(variance(salary) over (order by hire_date), 4) variance from hr.employees where department_id = 30 order by last_name, salary, variance; LAST_NAME SALARY VARIANCE ------------ ---------- ---------- Baida 2900 16283333.3 Colmenares 2500 11307000 Himuro 2600 13317000 Khoo 3100 31205000 Raphaely 11000 0 Tobias 2800 21623333.3 说明: 返回按雇佣日期排序的部门30的员工工资的累积方差。
其它函数
- LISTAGG
# 在每个分组内,LISTAGG 根据 order by 子句对列值进行排序,将排序后的结果拼接起来(多行合并一行) select department_id dept_id, hire_date, last_name, listagg(last_name, '; ') within group (order by hire_date, last_name) over (partition by department_id) Emp_list from hr.employees where hire_date < '2003-09-01' order by 1, 2, 3; DEPT_ID HIRE_DATE LAST_NAME EMP_LIST ---------- ---------- --------------- -------------------- 30 2002-12-07 Raphaely Raphaely; Khoo 30 2003-05-18 Khoo Raphaely; Khoo 40 2002-06-07 Mavris Mavris 50 2003-05-01 Kaufling Kaufling; Ladwig 50 2003-07-14 Ladwig Kaufling; Ladwig 70 2002-06-07 Baer Baer 90 2001-01-13 De Haan De Haan; King 90 2003-06-17 King De Haan; King 100 2002-08-16 Faviet Faviet; Greenberg 100 2002-08-17 Greenberg Faviet; Greenberg 110 2002-06-07 Gietz Gietz; Higgins 110 2002-06-07 Higgins Gietz; Higgins 说明: 按部门分组, 找出 hire_date < '2003-09-01' 的所有员工,并将其 last_name 拼接起来。
- MEDIAN
# 获取一个数值或日期时间值,并返回中间值或一个内插值,该值在值排序后将成为中间值(忽略null) select t.emp_id, t.dept_id, salary, median(t.salary) over (partition by t.dept_id) "Median by Dept" from employee t where t.dept_id > 60 order by t.dept_id, t.salary; EMP_ID DEPT_ID SALARY Median by Dept ---------- ---------- ---------- -------------- 149 80 10500 12000 148 80 11000 12000 147 80 12000 12000 146 80 13500 12000 145 80 14000 12000 101 90 17000 17000 102 90 17000 17000 113 100 6900 8000 111 100 7700 8000 112 100 7800 8000 110 100 8200 8000 109 100 9000 8000 108 100 12000 8000 说明: 查询结果返回部门中的每个员工的工资中值。 1. 部门80有奇数个员工,所以根据salary排序后,中值=(n/2+1)的数值 2. 部门100有偶数个员工,所以根据salary排序后,中值=((n/2的数值)+(n/2+1的数值))/2
- NTH_VALUE
# 获取排序后结果集中任意一行,根据偏移量获取,而不会仅仅第一行或最后一行 select t.emp_id, t.dept_id, t.salary, nth_value(t.salary, 1) over() "first", nth_value(t.salary, 5) from first over() "fifth", nth_value(t.salary, 2) from last over() "second_desc" from employee t where t.dept_id in (50, 60) order by t.salary; EMP_ID DEPT_ID SALARY first fifth second_desc ---------- ---------- ---------- ---------- ---------- ----------- 107 60 4200 4200 6000 8200 105 60 4800 4200 6000 8200 106 60 4800 4200 6000 8200 124 50 4800 4200 6000 8200 104 60 6000 4200 6000 8200 123 50 6500 4200 6000 8200 122 50 7900 4200 6000 8200 120 50 8000 4200 6000 8200 121 50 8200 4200 6000 8200 103 60 9000 4200 6000 8200 说明(未分组): 1. NTH_VALUE 中的第二个参数是指这个函数取排名第几的记录 2. from first 表示从排名第一往后取(如第五) 3. from last 表示从排名最末往前取(如倒数第二)
# 分组 select t.emp_id, t.dept_id, t.salary, nth_value(t.salary, 1) over(partition by t.dept_id) "first", nth_value(t.salary, 5) from first over(partition by t.dept_id) "fifth", nth_value(t.salary, 2) from last over(partition by t.dept_id) "second_desc" from employee t where t.dept_id in (50, 60) order by t.dept_id, t.salary; EMP_ID DEPT_ID SALARY first fifth second_desc ---------- ---------- ---------- ---------- ---------- ----------- 124 50 4800 4800 8200 8000 123 50 6500 4800 8200 8000 122 50 7900 4800 8200 8000 120 50 8000 4800 8200 8000 121 50 8200 4800 8200 8000 107 60 4200 4200 9000 6000 105 60 4800 4200 9000 6000 106 60 4800 4200 9000 6000 104 60 6000 4200 9000 6000 103 60 9000 4200 9000 6000
分析函数应用
目录
序号 | 应用分类 | 明细分类 |
1 | 现状分析 | 众数分析 |
2 | 80/20 分析 | |
3 | 比重分析 | |
5 | 离散趋势分析 | |
6 | TOPN 分析 | |
7 | 强度分析 | |
8 | 发展分析 | 环比分析 |
9 | 同比分析 | |
10 | 基比分析 |
现状分析
- 众数分析
众数是一组数据中出现次数最多的数值,在统计分布上具有明显集中趋势点的数值,代表数据的一般水平
select salary from ( select salary, rank() over(order by repeat_num desc) rk_repeat_num from ( select t.salary, count(*) repeat_num from employee t group by t.salary ) ) where rk_repeat_num = 1; SALARY ---------- 4800
- 80/20 分析
80/20集合分析源自于帕累托二八定律,也就是80/20法则
select ename, sal, deptno, sal_ratio, commu_ratio, case when commu_ratio >= 0.8 then 0 else 1 end as "80/20分析" from ( select ename, sal, deptno, sal_ratio, sum(sal_ratio) over(order by sal_ratio desc) commu_ratio from ( select t.ename, t.sal, t.deptno, ratio_to_report(t.sal) over() sal_ratio from scott.emp t ) ); ENAME SAL DEPTNO SAL_RATIO COMMU_RATIO 80/20分析 ---------- ---------- ---------- ---------- ----------- ---------- KING 5000 10 .172265289 .172265289 1 FORD 3000 20 .103359173 .378983635 1 SCOTT 3000 20 .103359173 .378983635 1 JONES 2975 20 .102497847 .481481481 1 BLAKE 2850 30 .098191214 .579672696 1 CLARK 2450 10 .084409991 .664082687 1 ALLEN 1600 30 .055124892 .71920758 1 TURNER 1500 30 .051679587 .770887166 1 MILLER 1300 10 .044788975 .815676141 0 WARD 1250 30 .043066322 .901808786 0 MARTIN 1250 30 .043066322 .901808786 0 ADAMS 1100 20 .037898363 .939707149 0 JAMES 950 30 .032730405 .972437554 0 SMITH 800 20 .027562446 1 0 说明: 结果集中 "80/20分析=1" 字段反映了,80%的薪水是由他们构成的。
- 比重分析
比重分析主要用于分析业务整体中各要素在整体中所占的比重,并对这些比重进行排序,`便于决策者了解业务的整体结构
select t.emp_id, t.dept_id, t.hire_date, t.salary, ratio_to_report(t.salary) over() sal_ratio, row_number() over(order by t.salary desc) row_num from employee t; EMP_ID DEPT_ID HIRE_DATE SALARY SAL_RATIO ROW_NUM ---------- ---------- ---------- ---------- ---------- ---------- 100 10 1990-01-01 20000 .082712986 1 101 90 1989-09-21 17000 .070306038 2 102 90 1993-01-13 17000 .070306038 3 145 80 1996-10-01 14000 .05789909 4 146 80 1997-01-05 13500 .055831266 5 147 80 1997-03-10 12000 .049627792 6 108 100 1994-08-17 12000 .049627792 7 148 80 1999-10-15 11000 .045492142 8 114 30 1994-12-07 11000 .045492142 9 149 80 2000-01-19 10500 .043424318 10
- 离散趋势分析
离散趋势分析主要是进行标准差系数(变异系数)的分析,离散分析的主要作用在于分析样本指标的离散程度
在统计学中,我们通常是用标准差系数来体现指标样本的波动情况,标准差系数越大,表示平均值数的代表性越小,反之,则代表性越大
select round(stddev_samp, 4) AS "标准差", round(stddev_samp/avg_sales_number, 4) AS "标准差系数", max_sales_number AS "最大值", min_sales_number AS "最小值", max_sales_number-min_sales_number AS "全距", total_sales_number AS "样本记录数" from ( select distinct stddev_samp(s.sales_number) over() stddev_samp, --累积样本标准偏离 avg(s.sales_number) over() avg_sales_number, max(s.sales_number) over() max_sales_number, min(s.sales_number) over() min_sales_number, count(s.sales_number) over() total_sales_number from sales s ); 标准差 标准差系数 最大值 最小值 全距 样本记录数 --------- ---------- ---------- ---------- ---------- ---------- 530.438 .3256 3800 1150 2650 24
- TOPN 分析
TOPN 分析能快速抽取指标中前N和后N名的数据,满足用户只想看报表中某个指标的最大和最小的几个值的需求
select * from ( select s.country, s.sales_month, s.sales_value, row_number() over(order by s.sales_value) row_num from sales s ) where row_num <= 3 or row_num >= (select count(*) from sales) - 2; COUNT SALES_MONT SALES_VALUE ROW_NUM ----- ---------- ----------- ---------- USA 2008-02-01 450000 1 USA 2008-08-01 460000 2 USA 2008-01-01 500000 3 USA 2009-04-01 600000 22 USA 2009-10-01 610000 23 USA 2009-06-01 780000 24
- 强度分析
用一行指标数据作为基数,计算其它各个指标与基数的比值,用这个值来反映它们之间的差距
强度分析可以用于在不同时间或地区同类数字之间进行比较,产生比较相对指标,用以揭示不同单位之间的差异
select emp_id||'--'||first_emp_id, salary, first_salary, round(salary/first_salary, 2) salary_diff from ( select t.emp_id, t.emp_name, t.dept_id, t.salary, first_value(t.emp_id) over(order by t.emp_id) first_emp_id, --基数数据 first_value(t.salary) over(order by t.emp_id) first_salary --基数数据 from employee t ); EMP_ID||'--'||FIRST_EMP_ID SALARY FIRST_SALARY SALARY_DIFF -------------------------- ---------- ------------ ----------- 100--100 20000 20000 1 101--100 17000 20000 .85 102--100 17000 20000 .85 103--100 9000 20000 .45 104--100 6000 20000 .3 105--100 4800 20000 .24 106--100 4800 20000 .24 107--100 4200 20000 .21 108--100 12000 20000 .6 109--100 9000 20000 .45 110--100 8200 20000 .41 说明: 以 "EMP_ID=100" 作为基数,每一个 "EMP_ID"的 "salary" 和它进行对比,得出比值。
发展分析
- 环比分析
环比就是现在的统计周期和上一个统计周期比较,例如,2018年7月份与2018年6月份相比较
select country, to_char(sales_month, 'YYYYMM') ||' ~ '|| to_char(add_months(sales_month, -1), 'YYYYMM') "环比周期", sales_number "当期销售量", prev_sales_number "环比销售量", round(sales_number/prev_sales_number, 4) as "环比分析" from ( select country, sales_month, sales_number, sales_value, --获取第一个月日期、第一个月的销售量 first_value(sales_month) over(order by sales_month) first_sales_month, first_value(sales_number) over(order by sales_month) first_sales_number, prev_sales_number, prev12_sales_number from ( select s.country, s.sales_month, s.sales_number, s.sales_value, --获取上个月、上年销售量 lag(s.sales_number, 1, 0) over(order by s.sales_month) prev_sales_number, lag(s.sales_number, 12, 0) over(order by s.sales_month) prev12_sales_number from sales s ) where sales_month >= to_date('2009-01-01', 'YYYY-MM-DD') ); COUNTRY 环比周期 当期销售量 环比销售量 环比分析 ---------- ----------------- ---------- ---------- ---------- USA 200901 ~ 200812 1600 1500 1.0667 USA 200902 ~ 200901 1390 1600 .8688 USA 200903 ~ 200902 1730 1390 1.2446 USA 200904 ~ 200903 1900 1730 1.0983 USA 200905 ~ 200904 1850 1900 .9737 USA 200906 ~ 200905 3800 1850 2.0541 USA 200907 ~ 200906 1700 3800 .4474 USA 200908 ~ 200907 1490 1700 .8765 USA 200909 ~ 200908 1830 1490 1.2282 USA 200910 ~ 200909 2000 1830 1.0929 USA 200911 ~ 200910 1950 2000 .975 USA 200912 ~ 200911 1900 1950 .9744 说明: 以 "200912 ~ 200911" 为例, "200912" 为当前统计周期, "200911" 为上一个统计周期(环比周期)。
- 同比分析
同比就是与历史同期相比,例如,2018年8月份与2017年8月份相比较
select country, to_char(sales_month, 'YYYYMM') ||' ~ '|| to_char(add_months(sales_month, -12), 'YYYYMM') "同比周期", sales_number "当期销售量", prev12_sales_number "同比销售量", round(sales_number/prev12_sales_number, 4) as "同比分析" from ( select country, sales_month, sales_number, sales_value, --获取第一个月日期、第一个月的销售量 first_value(sales_month) over(order by sales_month) first_sales_month, first_value(sales_number) over(order by sales_month) first_sales_number, prev_sales_number, prev12_sales_number from ( select s.country, s.sales_month, s.sales_number, s.sales_value, --获取上个月、上年销售量 lag(s.sales_number, 1, 0) over(order by s.sales_month) prev_sales_number, lag(s.sales_number, 12, 0) over(order by s.sales_month) prev12_sales_number from sales s ) where sales_month >= to_date('2009-01-01', 'YYYY-MM-DD') ); COUNTRY 同比周期 当期销售量 同比销售量 同比分析 ---------- --------------- ---------- ---------- ---------- USA 200901 ~ 200801 1600 1200 1.3333 USA 200902 ~ 200802 1390 1150 1.2087 USA 200903 ~ 200803 1730 1300 1.3308 USA 200904 ~ 200804 1900 1280 1.4844 USA 200905 ~ 200805 1850 1350 1.3704 USA 200906 ~ 200806 3800 1400 2.7143 USA 200907 ~ 200807 1700 1300 1.3077 USA 200908 ~ 200808 1490 1250 1.192 USA 200909 ~ 200809 1830 1400 1.3071 USA 200910 ~ 200810 2000 1380 1.4493 USA 200911 ~ 200811 1950 1450 1.3448 USA 200912 ~ 200812 1900 1500 1.2667
- 基比分析(定基比分析)
基比发展速度也叫总速度,是报告期水平与某一个固定时期水平之比,表明这种现象在较长时期内的发展速度
select country, to_char(sales_month, 'YYYYMM') ||' ~ '|| to_char(first_sales_month, 'YYYYMM') "基比周期", sales_number "当期销售量", first_sales_number "基比初始值", round(sales_number/first_sales_number, 4) as "基比分析" from ( select country, sales_month, sales_number, sales_value, --获取第一个月日期、第一个月的销售量 first_value(sales_month) over(order by sales_month) first_sales_month, first_value(sales_number) over(order by sales_month) first_sales_number, prev_sales_number, prev12_sales_number from ( select s.country, s.sales_month, s.sales_number, s.sales_value, --获取上个月、上年销售量 lag(s.sales_number, 1, 0) over(order by s.sales_month) prev_sales_number, lag(s.sales_number, 12, 0) over(order by s.sales_month) prev12_sales_number from sales s ) where sales_month >= to_date('2009-01-01', 'YYYY-MM-DD') ); COUNTRY 基比周期 当期销售量 基比初始值 基比分析 ---------- --------------- ---------- ---------- ---------- USA 200901 ~ 200901 1600 1600 1 USA 200902 ~ 200901 1390 1600 .8688 USA 200903 ~ 200901 1730 1600 1.0813 USA 200904 ~ 200901 1900 1600 1.1875 USA 200905 ~ 200901 1850 1600 1.1563 USA 200906 ~ 200901 3800 1600 2.375 USA 200907 ~ 200901 1700 1600 1.0625 USA 200908 ~ 200901 1490 1600 .9313 USA 200909 ~ 200901 1830 1600 1.1438 USA 200910 ~ 200901 2000 1600 1.25 USA 200911 ~ 200901 1950 1600 1.2188 USA 200912 ~ 200901 1900 1600 1.1875
自定义聚集函数
用户可以通过实现 Oracle 的 Extensibility Framework 中的 ODCIAggregate interface 来创建自定义聚集函数
接口函数
- static function ODCIAggregateInitialize(sctx IN OUT str_conn_type) return number
自定义聚集函数初始化操作,从这儿开始一个聚集函数
- member function ODCIAggregateIterate(self IN OUT str_conn_type, value IN varchar2) return number
自定义聚集函数的最主要步骤,这个函数定义我们聚集函数做什么操作(如取最大值,最小值,平均值),self 为当前聚集函数的指针,用来与前面的计算结果进行关联
- member function ODCIAggregateMerge(self IN OUT str_conn_type, ctx2 str_conn_type) return number
用来合并两个聚集函数的两个不同指针对应的结果
- member function ODCIAggregateTerminate(self IN str_conn_type, returnValue OUT varchar2, flags IN number) return number
终止聚集函数的处理,返回聚集函数处理的结果
创建自定义聚集函数
- 创建 OBJECT TYPE
create or replace type secmax_context as object( firmax number, -- 保存最大值 secmax number, -- 保存第二大值 static function odciaggregateinitialize(sctx in out secmax_context) return number, --初始化函数 member function odciaggregateiterate(self in out secmax_context,value in number) return number, --迭代运算函数 member function odciaggregatemerge(self in out secmax_context, ctx2 in secmax_context)return number, --合并两个上下文到一个上下文中 member function odciaggregateterminate(self in secmax_context, returnvalue out number, flags in number) return number --对结果进行处理并返回处理结果 );
- 实现该 OBJECT TYPE
create or replace type body secmax_context is static function ODCIAggregateInitialize(sctx IN OUT secmax_context) return number is begin sctx := secmax_context(0, 0); return ODCIConst.Success; end; member function ODCIAggregateIterate(self IN OUT secmax_context, value IN number) return number is begin if value > self.firmax then self.secmax := self.firmax; self.firmax := value; elsif value > self.secmax then self.secmax := value; end if; return ODCIConst.Success; end; member function ODCIAggregateTerminate(self IN secmax_context, returnValue OUT number, flags IN number) return number is begin returnValue := self.secmax; return ODCIConst.Success; end; member function ODCIAggregateMerge(self IN OUT secmax_context, ctx2 IN secmax_context) return number is begin if ctx2.firmax > self.firmax then if ctx2.secmax > self.firmax then self.secmax := ctx2.secmax; else self.secmax := self.firmax; end if; self.firmax := ctx2.firmax; elsif ctx2.firmax > self.secmax then self.secmax := ctx2.firmax; end if; return ODCIConst.Success; end; end;
- 创建聚合函数
create or replace function f_secmax (input number) return number parallel_enable aggregate using secmax_context;
- 数据测试
select f_secmax(sal) from scott.emp; F_SECMAX(SAL) ------------- 3000 select * from ( select t.emp_id, t.dept_id, t.hire_date, t.salary, f_secmax(t.salary) over(partition by t.dept_id) max_sal from employee t ) where salary = max_sal order by dept_id, emp_id; EMP_ID DEPT_ID HIRE_DATE SALARY MAX_SAL ---------- ---------- ---------- ---------- ---------- 120 50 1996-07-18 8000 8000 104 60 1991-05-21 6000 6000 146 80 1997-01-05 13500 13500 101 90 1989-09-21 17000 17000 102 90 1993-01-13 17000 17000 109 100 1994-08-16 9000 9000
原创文章,转载请注明出处:http://www.opcoder.cn/article/3/
qingshan
1楼 - 6 年,2 月 之前