什么是model子句
model子句提供了一种很好的替代电子表格的方法。model子句可以使用sql语句中一些很强大的功能,如聚合,并行,以及多维,多变量分析。
model子句可以建立一定维度数的数据矩阵或模型。模型使用了from子句中列出的表中可以列的子集,至少具有一个维度,一个量值,或者还可以有一个或多个分区。可以将模型看作是每个计算值具有单独工作表的电子表格文件。工作表有一个x轴和一个y轴(两个维度)。
定义好模型之后,就可以创建修改量值的规则了。这些规则是model子句的关键所在。通过几种规则,就可以在数据上进行复杂计算,甚至创建新的数据行。量值列现在就是通过维列进行索引的数组,其中的规则应用于数组中所有分区。在所有规则都应用后,模型就重新转换为传统的数据行。
model是sql语言应用的一个扩展,近似于oracle数据库的可扩展性。超过几百万行数据的多维,多变量计算,如果不是10亿级数据量,都可以很容易地通过model子句来实现。同时,很多数据库特性如对象分区以及并行执行都可以在model子句中高效地应用,从而进一步提高了可扩展性。
准备测试数据
--非规范化sales_fact表
drop table sales_fact;
create table sales_fact AS
select c.country_name country,c.country_subRegion region,p.prod_name product,
t.calendar_year year,t.calendar_week_number week,
sum(s.amount_sold) sale,
sum(s.amount_sold*
(
case
when mod(rownum,10)=0 then 1.4
when mod(rownum,5)=0 then 0.6
when mod(rownum,2)=0 then 0.9
when mod(rownum,2)=1 then 1.2
else 1
end
)
) receipts
from sh.sales s,sh.times t,sh.customers cu,sh.countries c, sh.products p
where s.time_id=t.time_id
and s.prod_id=p.prod_id
and s.cust_id=cu.cust_id
and cu.country_id=c.country_id
group by c.country_name,c.country_subregion,p.prod_name,t.calendar_year,t.calendar_week_number;
select * from sales_fact where rownum<50;
下面的sql可以生成一个计算sales_fact表中每年各周的库存的电子表格。这个使用model子句的sql,是对前面电子表格功能进行仿真。
--使用model子句进行库存公式计算
col product fommat A30
col country fommat A10
col region fommat A10
col year fommat 9999
col week fommat 99
col sale fommat 999999
lines 120 pages 100
select product,
country,
year,
week,
inventory,
sale,
receipts
from sales_fact sf
where sf.country in ('Australia')
and sf.product = 'Xtend Memory' model return updated rows
partition by(product, country) dimension by(year, week)
measures(0 inventory, sale, receipts) rules automatic
order(inventory [ year, week ] = nvl(inventory [ cv(year), cv(week) - 1 ], 0) - sale [ cv(year), cv(week) ] + receipts [ cv(year), cv(week) ])
order by product, country, year, week;
PRODUCT COUNTRY YEAR WEEK INVENTORY SALE RECEIPTS
-------------------------------------------------- ---------------------------------------- ---------- ---------- ---------- ---------- ----------
Xtend Memory Australia 1998 1 8.88 58.15 67.03
Xtend Memory Australia 1998 2 14.758 29.39 35.268
Xtend Memory Australia 1998 3 20.656 29.49 35.388
Xtend Memory Australia 1998 4 8.86 29.49 17.694
Xtend Memory Australia 1998 5 14.82 29.8 35.76
Xtend Memory Australia 1998 6 8.942 58.78 52.902
Xtend Memory Australia 1998 9 2.939 58.78 61.719
Xtend Memory Australia 1998 10 0.01 117.76 114.831
Xtend Memory Australia 1998 12 -14.9 59.6 44.7
Xtend Memory Australia 1998 14 11.756 58.78 70.536
Xtend Memory Australia 1998 15 5.878 58.78 52.902
Xtend Memory Australia 1998 17 11.756 58.78 70.536
Xtend Memory Australia 1998 18 8.817 117.56 114.621
Xtend Memory Australia 1998 19 2.919 58.98 53.082
Xtend Memory Australia 1998 21 2.98 59.6 62.58
Xtend Memory Australia 1998 23 -11.756 117.56 105.804
Xtend Memory Australia 1998 26 11.756 117.56 129.316
Xtend Memory Australia 1998 27 14.632 57.52 60.396
Xtend Memory Australia 1998 28 0.202 57.72 43.29
Xtend Memory Australia 1998 29 -14.228 57.72 43.29
关键字model return updated rows
声明使用model子句。在使用model子句的sql中,有3组列:分区列,维度列,度量会值列。分区列类似于电子表格中的一张工作表。维度列类似于行标签和列标签。度量值类似于含有公式的单元格。
子句 | 说明 |
---|---|
partition by(product, country) |
将product和country这两列指定为分区列。 |
dimension by(year, week) |
列指定为维度列。 |
measures(0 inventory, sale, receipts) |
将inventory,sales,receipts列指定为度量值列。 |
order(inventory [ year, week ] = nvl(inventory [ cv(year), cv(week) - 1 ], 0) - sale [ cv(year), cv(week) ] + receipts [ cv(year), cv(week) ]) |
规则类似于一个公式 |
model子句实现了分区数组,维度列是指向数组元素的索引。每个数组元素,也称为单元格,也就是一个度量列值。
分区列值相同的所有行被认为是在同一个分区中。这个例子中,所有产品和国家值相同的行在一个分区中,在一个分区中,维度列唯一辩识每一行。
cv表示现值,可以用来表示从规则左侧计算得来的规则右侧的列值。
如规则左侧的year和week列的值为(2001,3),规则右侧的cv(year)子句所指的值为规则左侧year值的值,也就是2001,类似地,cv(week)子句指的是规则左侧week列的值也就是3。因此,inventory [ cv(year), cv(week) - 1 ]
子句将返回2001年中前1周也就是第2周的库存度量值。类似,sale [ cv(year), cv(week) ]
和 receipts [ cv(year), cv(week) ]
指的是使用c函数计算的2001年第3周的sale列和receipts列的值。在规则中并未声明分区列product和country,规则隐式地为product和country列引用当前分区中的值。
位置标记
--使用位置引用初始华2002年的值 --- upsert
select product,
country,
year,
week,
inventory,
sale,
receipts
from sales_fact sf
where sf.country in ('Australia')
and sf.product = 'Xtend Memory'
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(0 inventory, sale, receipts) rules automatic
order(
inventory [ year, week ] = nvl(inventory [ cv(year), cv(week) - 1 ], 0) - sale [ cv(year), cv(week) ] + receipts [ cv(year), cv(week) ],
sale[2002,1]=0,
receipts[2002,1]=0
)
order by product, country, year, week;
PRODUCT COUNTRY YEAR WEEK INVENTORY SALE RECEIPTS
-------------------------------------------------- ---------------------------------------- ---------- ---------- ---------- ---------- ----------
Xtend Memory Australia 2001 38 -7.795 139 143.384
Xtend Memory Australia 2001 39 5.687 115.57 129.052
Xtend Memory Australia 2001 40 12.174 45.18 51.667
Xtend Memory Australia 2001 41 12.058 67.19 67.074
Xtend Memory Australia 2001 42 6.426 136.98 131.348
Xtend Memory Australia 2001 43 4.053 139.58 137.207
Xtend Memory Australia 2001 44 8.711 23.29 27.948
Xtend Memory Australia 2001 46 2.357 93.58 95.937
Xtend Memory Australia 2001 48 2.314 182.96 185.274
Xtend Memory Australia 2001 49 4.772 45.26 47.718
Xtend Memory Australia 2001 50 9.4 23.14 27.768
Xtend Memory Australia 2001 51 4.86 114.82 110.28
Xtend Memory Australia 2001 52 14.116 23.14 32.396
Xtend Memory Australia 2002 1 0 0 0
位置标记能够在结果集中插入一个新单元格或更新一个己有单元格。如果所引用的单元格在结果集中存在,则会更新单元格的值;如果不存在,则会增加一个新的单元格。这种存在则更新,不存在则插入的概念被称为upsert特性,是update和insert功能的融合版本,位置标记提供了upsert的能力。
符号标记
符号标记能够在规则左侧声明一定的范围值。
--符号引用 --- upsert
select product,
country,
year,
week,
sale
from sales_fact sf
where sf.country in ('Australia')
and sf.product = 'Xtend Memory'
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(sale)
rules(
sale[year in (2000,2001), week in (1,52,53)] order by year,week
=sale[cv(year),cv(week)]*1.10
)
order by product, country, year, week;
PRODUCT COUNTRY YEAR WEEK SALE
-------------------------------------------------- ---------------------------------------- ---------- ---------- ----------
Xtend Memory Australia 2000 1 51.37
Xtend Memory Australia 2000 52 74.195
Xtend Memory Australia 2001 1 101.486
Xtend Memory Australia 2001 52 25.454
将2001年和2002年第1,52,53周的sale列的值更新为实际值的110%。year in (2000,2001)
使用in运算符指定year列的值列表。类似地,week in (1,52,53)
子句指定week列的值列表。
没有数据行满足week列为53周的要求,并且在结果集中对于week=53也没有新行被加入或更新。生成新行的能力是符号标记和位置标记的最主要的区别。符号标记权提供了update的能力,而位置标记提供了upsert的功能。
for循环
--符号引用 model与for循环
select product,country, year,week,inventory, sale,receipts
from sales_fact sf
where sf.country in ('Australia') and sf.product = 'Xtend Memory'
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(0 inventory, sale, receipts)
rules automatic order(
inventory[year,week]=nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)],
sale[2002, for week from 1 to 53 increment 1]=0,
receipts[2002, for week from 1 to 53 increment 1]=0
)
order by product, country, year, week;
PRODUCT COUNTRY YEAR WEEK INVENTORY SALE RECEIPTS
-------------------------------------------------- ---------------------------------------- ---------- ---------- ---------- ---------- ----------
Xtend Memory Australia 2001 38 -7.795 139 143.384
Xtend Memory Australia 2001 39 5.687 115.57 129.052
Xtend Memory Australia 2001 40 12.174 45.18 51.667
Xtend Memory Australia 2001 41 12.058 67.19 67.074
Xtend Memory Australia 2001 42 6.426 136.98 131.348
Xtend Memory Australia 2001 43 4.053 139.58 137.207
Xtend Memory Australia 2001 44 8.711 23.29 27.948
Xtend Memory Australia 2001 46 2.357 93.58 95.937
Xtend Memory Australia 2001 48 2.314 182.96 185.274
Xtend Memory Australia 2001 49 4.772 45.26 47.718
Xtend Memory Australia 2001 50 9.4 23.14 27.768
Xtend Memory Australia 2001 51 4.86 114.82 110.28
Xtend Memory Australia 2001 52 14.116 23.14 32.396
Xtend Memory Australia 2002 1 0 0 0
Xtend Memory Australia 2002 2 0 0 0
Xtend Memory Australia 2002 3 0 0 0
Xtend Memory Australia 2002 4 0 0 0
Xtend Memory Australia 2002 5 0 0 0
Xtend Memory Australia 2002 6 0 0 0
Xtend Memory Australia 2002 7 0 0 0
Xtend Memory Australia 2002 8 0 0 0
for循环允许指定规则左侧的值列表。for循环只可以定义在规则的左侧,用来将新的单元格加入到输出中,不能在规则的右侧使用。
语法:
for dimension for to
[increment | decrement]
返回更新后的行
--没有return updated rows的sql语句
select product,country,year,week,sale
from sales_fact sf
where sf.country in ('Australia') and sf.product = 'Xtend Memory'
model --return updated rows
partition by(product, country)
dimension by(year, week)
measures(sale)
rules(
sale[year in (2000,2001), week in (1,52,53)] order by year,week
=sale[cv(year),cv(week)]*1.10
)
order by product, country, year, week;
PRODUCT COUNTRY YEAR WEEK INVENTORY SALE RECEIPTS
-------------------------------------------------- ---------------------------------------- ---------- ---------- ---------- ---------- ----------
Xtend Memory Australia 1998 1 8.88 58.15 67.03
Xtend Memory Australia 1998 2 14.758 29.39 35.268
Xtend Memory Australia 1998 3 20.656 29.49 35.388
Xtend Memory Australia 1998 4 8.86 29.49 17.694
Xtend Memory Australia 1998 5 14.82 29.8 35.76
Xtend Memory Australia 1998 6 8.942 58.78 52.902
Xtend Memory Australia 1998 9 2.939 58.78 61.719
Xtend Memory Australia 1998 10 0.01 117.76 114.831
Xtend Memory Australia 1998 12 -14.9 59.6 44.7
Xtend Memory Australia 1998 14 11.756 58.78 70.536
Xtend Memory Australia 1998 15 5.878 58.78 52.902
Xtend Memory Australia 1998 17 11.756 58.78 70.536
Xtend Memory Australia 1998 18 8.817 117.56 114.621
Xtend Memory Australia 1998 19 2.919 58.98 53.082
Xtend Memory Australia 1998 21 2.98 59.6 62.58
Xtend Memory Australia 1998 23 -11.756 117.56 105.804
Xtend Memory Australia 1998 26 11.756 117.56 129.316
Xtend Memory Australia 1998 27 14.632 57.52 60.396
Xtend Memory Australia 1998 28 0.202 57.72 43.29
Xtend Memory Australia 1998 29 -14.228 57.72 43.29
上面的sql返回了159行,而之前的例子仅返回了4行,return updated rows
控制了这一行为并能够限制sql语句所返回的单元格。如果没有这个子句,不管规则有没有更新这些行,所有的数据行都会返回。
return updated rows
子句同样适用于使用位置标记的语句,如下面的例子:
--return updated rows与upsert
select product,country,year,week,sale
from sales_fact sf
where sf.country in ('Australia') and sf.product = 'Xtend Memory'
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(sale)
rules(
sale[2002,1]=0
)
order by product, country, year, week;
运行结果
PRODUCT COUNTRY YEAR WEEK SALE
-------------------------------------------------- ---------------------------------------- ---------- ---------- ----------
Xtend Memory Australia 2002 1 0
求解顺序
在规则部分可以声明多个规则,并且规则可以声明相互之间的依赖关系。不仅如此,即使在一个单独的规则中,规则的求解也必须要按照一定的逻辑顺序进行。
--产生错误ORA-32637的顺序
select product,country,year,week,inventory,sale,receipts
from sales_fact sf
where sf.country in ('Australia')
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(0 inventory,sale, receipts)
rules -- automatic order
(
inventory[year,week]=nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)]
)
order by product, country, year, week;
ORA-32637: 顺序排序 MODEL 中的自循环规则
将automatic order
注释掉,强制使用了sequential order的默认行为。该规则通过inventory[cv(year),cv(week)-1]
子句进行了跨行引用。库存列的值必须按照周的升序进行计算。前一周的库存规则必须在当前周的库存规则之前求解。通过automatic order
,数据库引擎确定了行依赖关系并严格按照依赖关系的顺序对行进行求解。如果没有automatic order
,行求解顺序就不能确定,这将会导致ORA-32637错误。
显式声明行求解顺序以避免这个错误是一种更好的实践方式。
--单元格级的求值顺序
select product,country,year,week,inventory,sale,receipts
from sales_fact sf
where sf.country in ('Australia') and product in ('Xtend Memory')
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(0 inventory,sale, receipts)
rules -- automatic order
(
inventory[year,week] order by year,week =nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)]
)
order by product, country, year, week;
PRODUCT COUNTRY YEAR WEEK INVENTORY SALE RECEIPTS
-------------------------------------------------- ---------------------------------------- ---------- ---------- ---------- ---------- ----------
Xtend Memory Australia 1998 1 8.88 58.15 67.03
Xtend Memory Australia 1998 2 14.758 29.39 35.268
Xtend Memory Australia 1998 3 20.656 29.49 35.388
Xtend Memory Australia 1998 4 8.86 29.49 17.694
Xtend Memory Australia 1998 5 14.82 29.8 35.76
Xtend Memory Australia 1998 6 8.942 58.78 52.902
Xtend Memory Australia 1998 9 2.939 58.78 61.719
Xtend Memory Australia 1998 10 0.01 117.76 114.831
Xtend Memory Australia 1998 12 -14.9 59.6 44.7
Xtend Memory Australia 1998 14 11.756 58.78 70.536
Xtend Memory Australia 1998 15 5.878 58.78 52.902
Xtend Memory Australia 1998 17 11.756 58.78 70.536
Xtend Memory Australia 1998 18 8.817 117.56 114.621
Xtend Memory Australia 1998 19 2.919 58.98 53.082
Xtend Memory Australia 1998 21 2.98 59.6 62.58
Xtend Memory Australia 1998 23 -11.756 117.56 105.804
Xtend Memory Australia 1998 26 11.756 117.56 129.316
Xtend Memory Australia 1998 27 14.632 57.52 60.396
Xtend Memory Australia 1998 28 0.202 57.72 43.29
Xtend Memory Australia 1998 29 -14.228 57.72 43.29
在规则部分,通过order by year,week
子句显式声明了行求解顺,表示必须按照year和week列值的升序求解。
--使用desc关键字的求解顺序
select product,country,year,week,inventory,sale,receipts
from sales_fact sf
where sf.country in ('Australia') and product in ('Xtend memory')
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(0 inventory,sale, receipts)
rules
(
inventory[year,week] order by year,week desc =nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)]
)
order by product, country, year, week;
上面的尽管语法是对的,但与需求不一致。
规则求解顺序
除了行求解顺序外,还需要面对所应用的规则求解顺序问题。
--规则求值顺序--顺序求值
select * from (
select product,country,year,week,inventory,sale,receipts
from sales_fact sf
where sf.country in ('Australia') and product in ('Xtend memory')
model return updated rows
partition by (product, country)
dimension by (year, week)
measures (0 inventory,sale, receipts)
rules sequential order
(
inventory[year,week] order by year,week =nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)],
receipts[year in (2000,2001),week in (51,52,53)] order by year,week =receipts[cv(year),cv(week)]*10
)
order by product, country, year, week
) where week>50;
PRODUCT COUNTRY YEAR WEEK INVENTORY SALE RECEIPTS
-------------------------------------------------- ---------------------------------------- ---------- ---------- ---------- ---------- ----------
Xtend Memory Australia 1998 1 8.88 58.15 67.03
Xtend Memory Australia 1998 2 14.758 29.39 35.268
Xtend Memory Australia 1998 3 20.656 29.49 35.388
Xtend Memory Australia 1998 4 8.86 29.49 17.694
Xtend Memory Australia 1998 5 14.82 29.8 35.76
Xtend Memory Australia 1998 6 8.942 58.78 52.902
Xtend Memory Australia 1998 9 2.939 58.78 61.719
Xtend Memory Australia 1998 10 0.01 117.76 114.831
Xtend Memory Australia 1998 12 -14.9 59.6 44.7
Xtend Memory Australia 1998 14 11.756 58.78 70.536
Xtend Memory Australia 1998 15 5.878 58.78 52.902
Xtend Memory Australia 1998 17 11.756 58.78 70.536
Xtend Memory Australia 1998 18 8.817 117.56 114.621
Xtend Memory Australia 1998 19 2.919 58.98 53.082
Xtend Memory Australia 1998 21 2.98 59.6 62.58
Xtend Memory Australia 1998 23 -11.756 117.56 105.804
Xtend Memory Australia 1998 26 11.756 117.56 129.316
Xtend Memory Australia 1998 27 14.632 57.52 60.396
Xtend Memory Australia 1998 28 0.202 57.72 43.29
Xtend Memory Australia 1998 29 -14.228 57.72 43.29
sequential order
指定了规则按照其在列表中的先后顺序进行求解。
--规则求值顺序--自动求值
select * from (
select product,country,year,week,inventory,sale,receipts
from sales_fact sf
where sf.country in ('Australia') and product in ('Xtend Memory')
model return updated rows
partition by (product, country)
dimension by (year, week)
measures (0 inventory,sale, receipts)
rules sequential order
(
inventory[year,week] order by year,week =nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)],
receipts[year in (2000,2001),week in (51,52,53)] order by year,week =receipts[cv(year),cv(week)]*10
)
order by product, country, year, week
) where week>50;
PRODUCT COUNTRY YEAR WEEK INVENTORY SALE RECEIPTS
-------------------------------------------------- ---------------------------------------- ---------- ---------- ---------- ---------- ----------
Xtend Memory Australia 1998 51 0.04 58.32 61.236
Xtend Memory Australia 1998 52 5.812 86.38 92.152
Xtend Memory Australia 1999 53 -2.705 27.05 24.345
Xtend Memory Australia 2000 52 -1.383 67.45 660.67
Xtend Memory Australia 2001 51 4.86 114.82 1102.8
Xtend Memory Australia 2001 52 14.116 23.14 323.96
上面两个sql的结果是不匹配的。automatic order
允许数据库引擎自动识别规则之间的依赖关系。因此,数据库引擎首先对receipts规则求解,然后是inventory规则。规则的求解顺序是非常重要的,如果存在很复杂的相互依赖性,需要指定automatic order
并按照严格求解顺序依次列出规则。
聚合
在数据仓库的查询中经常会用到数据的聚合运算。model子句可以在一定范围的维度列上使用聚合函数从而实现数据聚合。许多不同的聚合函数调用如sum,max,avg,stddev以及olap函数调用都可以用来进行规则中的数据的聚合。
--聚合
select product,country,year,week,inventory,avg_inventory,max_sale
from sales_fact sf
where sf.country in ('Australia') and sf.product ='Xtend Memory'
model return updated rows
partition by (product, country)
dimension by (year, week)
measures (0 inventory,0 avg_inventory,0 max_sale, sale, receipts)
rules automatic order(
inventory[year,week] =nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)],
avg_inventory[year,ANY]= avg(inventory)[cv(year),week],
max_sale[year,ANY]= avg(sale)[cv(year),week]
)
order by product, country, year, week;
PRODUCT COUNTRY YEAR WEEK INVENTORY AVG_INVENTORY MAX_SALE
-------------------------------------------------- ---------------------------------------- ---------- ---------- ---------- ------------- ----------
Xtend Memory Australia 1998 1 8.88 -0.7254166666 71.165
Xtend Memory Australia 1998 2 14.758 -0.7254166666 71.165
Xtend Memory Australia 1998 3 20.656 -0.7254166666 71.165
Xtend Memory Australia 1998 4 8.86 -0.7254166666 71.165
Xtend Memory Australia 1998 5 14.82 -0.7254166666 71.165
Xtend Memory Australia 1998 6 8.942 -0.7254166666 71.165
Xtend Memory Australia 1998 9 2.939 -0.7254166666 71.165
Xtend Memory Australia 1998 10 0.01 -0.7254166666 71.165
Xtend Memory Australia 1998 12 -14.9 -0.7254166666 71.165
Xtend Memory Australia 1998 14 11.756 -0.7254166666 71.165
Xtend Memory Australia 1998 15 5.878 -0.7254166666 71.165
Xtend Memory Australia 1998 17 11.756 -0.7254166666 71.165
Xtend Memory Australia 1998 18 8.817 -0.7254166666 71.165
Xtend Memory Australia 1998 19 2.919 -0.7254166666 71.165
Xtend Memory Australia 1998 21 2.98 -0.7254166666 71.165
Xtend Memory Australia 1998 23 -11.756 -0.7254166666 71.165
Xtend Memory Australia 1998 26 11.756 -0.7254166666 71.165
Xtend Memory Australia 1998 27 14.632 -0.7254166666 71.165
Xtend Memory Australia 1998 28 0.202 -0.7254166666 71.165
Xtend Memory Australia 1998 29 -14.228 -0.7254166666 71.165
迭代
迭代是另一种使用简洁的model sql语句来实现复杂业务的功能。迭代意味着一段规则代码能够在循环中执行一定的次数或者当条件保持为真时执行。
语法:
[iterate (n) [until ] ]
( = ... )
--迭代
select year,week,sale,sale_list
from sales_fact sf
where sf.country in ('Australia') and sf.product ='Xtend Memory'
model return updated rows
partition by (product, country)
dimension by (year, week)
measures ( cast(' ' as varchar2(50) ) sale_list, sale )
rules iterate(5)(
sale_list[year,week] order by year,week =sale[cv(year),cv(week)-iteration_number+2] ||
case when iteration_number=0 then '' else ',' end ||
sale_list [cv(year),cv(week)]
)
order by year, week;
YEAR WEEK SALE SALE_LIST
---------- ---------- ---------- --------------------------------------------------
1998 1 58.15 ,,58.15,29.39,29.49
1998 2 29.39 ,58.15,29.39,29.49,29.49
1998 3 29.49 58.15,29.39,29.49,29.49,29.8
1998 4 29.49 29.39,29.49,29.49,29.8,58.78
1998 5 29.8 29.49,29.49,29.8,58.78,
1998 6 58.78 29.49,29.8,58.78,,
1998 9 58.78 ,,58.78,117.76,
1998 10 117.76 ,58.78,117.76,,59.6
1998 12 59.6 117.76,,59.6,,58.78
1998 14 58.78 59.6,,58.78,58.78,
1998 15 58.78 ,58.78,58.78,,58.78
1998 17 58.78 58.78,,58.78,117.56,58.98
1998 18 117.56 ,58.78,117.56,58.98,
1998 19 58.98 58.78,117.56,58.98,,59.6
1998 21 59.6 58.98,,59.6,,117.56
1998 23 117.56 59.6,,117.56,,
1998 26 117.56 ,,117.56,57.52,57.72
1998 27 57.52 ,117.56,57.52,57.72,57.72
1998 28 57.72 117.56,57.52,57.72,57.72,
1998 29 57.72 57.52,57.72,57.72,,
目标为以逗号分隔列表的形式展示5周sale列的值。
子句 | 描述 |
---|---|
rules iterate(5) |
规则程序段进行5次循环 |
iteration_number |
当前循环次数的变量从第一次0开始,结束于n-1,其中n为iterate (n) 子句中指定的循环次数。 |
sale[cv(year),cv(week)-iteration_number+2] |
访问前两周以及后两周的值。 |
case when iteration_number=0 then '' else ',' end |
为列表中除了第一个成员以外的每个成员加上了一个逗号。 |
presentv与空值
如果规则访问一个不存在的行,将会返回一个空值。
--迭代和presntv
select year,week,sale,sale_list
from sales_fact sf
where sf.country in ('Australia') and sf.product ='Xtend Memory'
model return updated rows
partition by (product, country)
dimension by (year, week)
measures ( cast(' ' as varchar2(50) ) sale_list, sale )
rules iterate(5)(
sale_list[year,week] order by year,week =
presentv(sale[cv(year),cv(week)-iteration_number+2],
sale[cv(year),cv(week)-iteration_number+2] ||
case when iteration_number=0 then '' else ',' end ||
sale_list [cv(year),cv(week)],
sale_list [cv(year),cv(week)])
)
order by year, week;
YEAR WEEK SALE SALE_LIST
---------- ---------- ---------- --------------------------------------------------
1998 1 58.15 ,,58.15,29.39,29.49
1998 2 29.39 ,58.15,29.39,29.49,29.49
1998 3 29.49 58.15,29.39,29.49,29.49,29.8
1998 4 29.49 29.39,29.49,29.49,29.8,58.78
1998 5 29.8 29.49,29.49,29.8,58.78,
1998 6 58.78 29.49,29.8,58.78,,
1998 9 58.78 ,,58.78,117.76,
1998 10 117.76 ,58.78,117.76,,59.6
1998 12 59.6 117.76,,59.6,,58.78
1998 14 58.78 59.6,,58.78,58.78,
1998 15 58.78 ,58.78,58.78,,58.78
1998 17 58.78 58.78,,58.78,117.56,58.98
1998 18 117.56 ,58.78,117.56,58.98,
1998 19 58.98 58.78,117.56,58.98,,59.6
1998 21 59.6 58.98,,59.6,,117.56
1998 23 117.56 59.6,,117.56,,
1998 26 117.56 ,,117.56,57.52,57.72
1998 27 57.52 ,117.56,57.52,57.72,57.72
1998 28 57.72 117.56,57.52,57.72,57.72,
1998 29 57.72 57.52,57.72,57.72,,
presentnnv函数与presentv函数类似,但它可以进一步区分所引用的是不存在的单元格还是存在的值为空的单元格。
presentnnv函数的语法:
presentnnv (cell_reference, expr1, expr2)
如果第1个参数cell_reference引用了存在的单元格并且单元格不含空值,那么返回第1个参数expr1,否则返回第2个参数expr2。
presentv与presentnnv的比较
单元格存 | 是否为空格 | presentv | presentnnv |
---|---|---|---|
是 | 非空 | expr1 | expr1 |
是 | 空 | expr1 | expr2 |
否 | 非空 | expr12 | expr2 |
否 | 空 | expr12 | expr2 |
查找表
可以定义一个查找表并在规则部分进行引用。这样的一个查找表有时也称为参考表。
--参考model
select year,week,sale,prod_list_price
from sales_fact
where country in ('Australia') and product in ('Xtend Memory')
model return updated rows
reference ref_prod on
(select prod_name, max(prod_list_price) prod_list_price
from sh.products group by prod_name)
dimension by (prod_name)
measures (prod_list_price)
MAIN main_section
partition by(product, country)
dimension by(year, week)
measures(sale, receipts, 0 prod_list_price)
rules(
prod_list_price[year, week] order by year, week =ref_prod.prod_list_price[cv(product)]
)
order by year,week;
YEAR WEEK SALE PROD_LIST_PRICE
---------- ---------- ---------- ---------------
2001 38 139 20.99
2001 39 115.57 20.99
2001 40 45.18 20.99
2001 41 67.19 20.99
2001 42 136.98 20.99
2001 43 139.58 20.99
2001 44 23.29 20.99
2001 46 93.58 20.99
2001 48 182.96 20.99
2001 49 45.26 20.99
2001 50 23.14 20.99
2001 51 114.82 20.99
2001 52 23.14 20.99
reference ref_prod on
(select prod_name, max(prod_list_price) prod_list_price
from products group by prod_name)
dimension by (prod_name)
measures (prod_list_price)
使用reference
子句定义了一个查找表ref_prod.
reference ref_prod
指定ref_prod为查找表。
prod_name
指定了维度列
prod_list_price
为度量值列
注意引用表的维度列必须唯一,并且针对维度列中的每个值只会取出一行。
MAIN main_section
partition by(product, country)
dimension by(year, week)
measures(sale, receipts, 0 prod_list_price)
rules(
prod_list_price[year, week] order by year, week =ref_prod.prod_list_price[cv(product)]
)
以MAIN关键字开头声明了main model部分。
prod_list_price[year, week] order by year, week =ref_prod.prod_list_price[cv(product)]
子句读取。product列的当前值通过cv(product)子句传递过来作为查找表中的查找键值。
--更多查找表
select year,week,sale,prod_list_price,iso_code
from sales_fact
where country in ('Australia') and product in ('Xtend Memory')
model return updated rows
reference ref_prod on
(select prod_name, max(prod_list_price) prod_list_price from sh.products group by prod_name)
dimension by (prod_name)
measures (prod_list_price)
reference ref_country on
(select c.country_name, c.country_iso_code from sh.countries c)
dimension by (country_name)
measures (country_iso_code)
MAIN main_section
partition by(product, country)
dimension by(year, week)
measures(sale, receipts, 0 prod_list_price, cast(' ' as varchar2(5)) iso_code)
rules(
prod_list_price[year, week] order by year, week =ref_prod.prod_list_price[cv(product)],
iso_code[year, week] order by year, week =ref_country.country_iso_code[cv(country)]
)
order by year,week;
YEAR WEEK SALE PROD_LIST_PRICE ISO_CODE
---------- ---------- ---------- --------------- --------
2001 38 139 20.99 AU
2001 39 115.57 20.99 AU
2001 40 45.18 20.99 AU
2001 41 67.19 20.99 AU
2001 42 136.98 20.99 AU
2001 43 139.58 20.99 AU
2001 44 23.29 20.99 AU
2001 46 93.58 20.99 AU
2001 48 182.96 20.99 AU
2001 49 45.26 20.99 AU
2001 50 23.14 20.99 AU
2001 51 114.82 20.99 AU
2001 52 23.14 20.99 AU
空值
在使用model的sql语句中,有两个原因使得值为空:己经存在的单元格值为空值或引用了不存在的单元格。
--keep nav的例子
select product,country, year,week,sale
from sales_fact
where country in ('Australia') and product in ('Xtend Memory')
model keep nav return updated rows
partition by(product, country)
dimension by(year, week)
measures(sale)
rules sequential order(
sale[2001, 1] order by year, week =sale[2001,1],
sale[2002, 1] order by year, week =sale[2001,1]+sale[2002, 1]
)
order by year,week;
PRODUCT COUNTRY YEAR WEEK SALE
-------------------------------------------------- ---------------------------------------- ---------- ---------- ----------
Xtend Memory Australia 2001 1 92.26
Xtend Memory Australia 2002 1
sale[2002, 1]
子句访问的是2002年第1周的sale列的值。在sales_fact表中没有2002年的数据,因此sale[2002, 1]
访问的是一个不存在的单元格。由于与空值进行数学运算,这个表中的输出为空值。
NAV non available values,表示没有可用值,引用不存在的单元格默认会返回空值。
--忽略nav
select product,country, year,week,sale
from sales_fact
where country in ('Australia') and product in ('Xtend Memory')
model ignore nav return updated rows
partition by(product, country)
dimension by(year, week)
measures(sale)
rules sequential order(
sale[2001, 1] order by year, week =sale[2001,1],
sale[2002, 1] order by year, week =sale[2001,1]+sale[2002, 1]
)
order by year,week;
PRODUCT COUNTRY YEAR WEEK SALE
-------------------------------------------------- ---------------------------------------- ---------- ---------- ----------
Xtend Memory Australia 2001 1 92.26
Xtend Memory Australia 2002 1 92.26
这个默认行为可以使用ignore nav
子句修改。如果访问不存在的单元格,则数值列将会返回0,文本列会返回一个空字符串而不返回空值。sale[2001,1]+sale[2002, 1]
返回了 92.26,因为不存在单元格sale[2002, 1]的返回值为0。
使用model子句进行性能调优
1. acyclic
SQL> set autotrace traceonly
SQL> --自动排序与acyclic
SQL> select product,country, year,week,inventory,sale,receipts
2 from sales_fact
3 where country in ('Australia') and product in ('Xtend Memory')
4 model return updated rows
5 partition by(product, country)
6 dimension by(year, week)
7 measures(0 inventory, sale,receipts)
8 rules automatic order(
9 inventory[year,week] order by year,week=nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)]
10 )
11 order by product,country,year,week;
执行计划
----------------------------------------------------------
Plan hash value: 612713790
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 251 | 25351 | 311 (1)| 00:00:04 |
| 1 | SORT ORDER BY | | 251 | 25351 | 311 (1)| 00:00:04 |
| 2 | SQL MODEL ACYCLIC | | 251 | 25351 | 311 (1)| 00:00:04 |
|* 3 | TABLE ACCESS FULL| SALES_FACT | 251 | 25351 | 310 (1)| 00:00:04 |
关键字acyclic 表明规则之间没有可能的cyclic依赖关系
这儿使用了order by year,week
来控制规则之间的依赖关系,避免循环依赖性。
2. acyclic fast
如果规则是只访问某一个单元格的简单规则,那么可以使用acyclic fast算法。
--自动排序与acyclic fast
SELECT DISTINCT product,country,year,week, sale_first_week
FROM sales_fact
WHERE country IN ('Australia') AND product='Xtend Memory'
MODEL RETURN UPDATED ROWS
PARTITION BY (product, country)
DIMENSION BY (year,week)
MEASURES (0 sale_first_week, sale)
RULES AUTOMATIC ORDER(
sale_first_week[2000,1] = 0.12*sale[2000,1]
)
ORDER BY product,country,year,week;
执行计划
----------------------------------------------------------
Plan hash value: 2162534578
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 251 | 22088 | 312 (2)| 00:00:04 |
| 1 | SORT ORDER BY | | 251 | 22088 | 312 (2)| 00:00:04 |
| 2 | SQL MODEL ACYCLIC FAST| | 251 | 22088 | 312 (2)| 00:00:04 |
|* 3 | TABLE ACCESS FULL | SALES_FACT | 251 | 22088 | 310 (1)| 00:00:04 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory')
3. cyclic
下面使用cyclic算法来进行规则的求解。
--自动排序与CYCLIC
select product,country, year,week,inventory,sale,receipts
from sales_fact
where country in ('Australia') and product in ('Xtend Memory')
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(0 inventory, sale,receipts)
rules automatic order(
inventory[year,week]=nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)]
)
order by product,country,year,week;
执行计划
----------------------------------------------------------
Plan hash value: 1486878524
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 251 | 25351 | 311 (1)| 00:00:04 |
| 1 | SORT ORDER BY | | 251 | 25351 | 311 (1)| 00:00:04 |
| 2 | SQL MODEL CYCLIC | | 251 | 25351 | 311 (1)| 00:00:04 |
|* 3 | TABLE ACCESS FULL| SALES_FACT | 251 | 25351 | 310 (1)| 00:00:04 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory')
4. sequential
如果规则声明了sequential顺序,那么规则的求解算法将会是ordered。
--sequential顺序
select product,country, year,week,inventory,sale,receipts
from sales_fact
where country in ('Australia') and product in ('Xtend Memory')
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(0 inventory, sale,receipts)
rules sequential order(
inventory[year,week] order by year, week=nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)]
)
order by product,country,year,week;
执行计划
----------------------------------------------------------
Plan hash value: 3753083011
----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 251 | 25351 | 311 (1)| 00:00:04 |
| 1 | SORT ORDER BY | | 251 | 25351 | 311 (1)| 00:00:04 |
| 2 | SQL MODEL ORDERED | | 251 | 25351 | 311 (1)| 00:00:04 |
|* 3 | TABLE ACCESS FULL| SALES_FACT | 251 | 25351 | 310 (1)| 00:00:04 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory')
简单来说,规则的复杂度和互相依赖性在求解算法中扮演了关键的角色。acyclic fast和ordered fast算法可扩展性更强,这在数据量不断增加时,它俩的作用尤其明显。
谓词推进
从概念上来说,model子句是分析型sql的一个变体,典型的是在一个视图或内嵌视图中实现。谓词是在视图之外声明的,为了获得可以授受的性能,这些谓词必须被推进到视图中去。事实上,谓词推进对于model子句的性能是非常关键的。如果没有推进,那么model子句将会在更大的行数据集上执行并且可能导致较差的性能。
--谓词推进
SELECT * FROM (
select product,country, year,week,inventory,sale,receipts
from sales_fact
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(0 inventory, sale,receipts)
rules automatic order(
inventory[year,week] order by year, week=nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)]
)
) where country in ('Australia') and product in ('Xtend Memory')
order by product,country,year,week;
执行计划
----------------------------------------------------------
Plan hash value: 3432178194
-----------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 251 | 28614 | 311 (1)| 00:00:04 |
| 1 | SORT ORDER BY | | 251 | 28614 | 311 (1)| 00:00:04 |
| 2 | VIEW | | 251 | 28614 | 310 (1)| 00:00:04 |
| 3 | SQL MODEL ACYCLIC | | 251 | 25351 | | |
|* 4 | TABLE ACCESS FULL| SALES_FACT | 251 | 25351 | 310 (1)| 00:00:04 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("COUNTRY"='Australia' AND "PRODUCT"='Xtend Memory')
上面定义了一个内嵌视图,然后在country和product列上加上了谓词。执行计划中的第4步显示两个谓词都被推进到了视图中,数据行经过了这两个谓词的筛选,然后在其结果集上应用model子句。
--未进行谓词推进
SELECT * FROM (
select product,country, year,week,inventory,sale,receipts
from sales_fact
model return updated rows
partition by(product, country)
dimension by(year, week)
measures(0 inventory, sale,receipts)
rules automatic order(
inventory[year,week] order by year, week=nvl(inventory[cv(year),cv(week)-1],0)-sale[cv(year),cv(week)]+receipts[cv(year),cv(week)]
)
) where year=2000
order by product,country,year,week;
执行计划
----------------------------------------------------------
Plan hash value: 3432178194
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 132K| 14M| | 3711 (1)| 00:00:45 |
| 1 | SORT ORDER BY | | 132K| 14M| 16M| 3711 (1)| 00:00:45 |
|* 2 | VIEW | | 132K| 14M| | 311 (1)| 00:00:04 |
| 3 | SQL MODEL ACYCLIC | | 132K| 12M| | | |
| 4 | TABLE ACCESS FULL| SALES_FACT | 132K| 12M| | 311 (1)| 00:00:04 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("YEAR"=2000)
上面这个例子是一个谓词没有推进到视图中的例子。这儿指定year=2000
,但没有推进到内嵌视图中。优化器估计显示model需要处理近132000行数据。只有在安全的情况下才可以将谓词推进到视图中。上面将year和week列作为维度列。分区列上的谓词可以很安全地推进到视图中,但并不是所有维度列上的谓词都可以进行推进的。