DBSeer-2016-sigmod

Features:
performance prediction, performance diagnosis, bottleneck explanation, workload insight, optimal admission control, and what-if analysis
Metrics:
Aggregated OS statistics
Workload statistics from the DBMS, including the number of SELECT, UPDATE, DELETE and INSERT commands executed, number of flushed and dirty pages, and the total lock wait-time.
Timestamped query logs, containing start-time, duration, and the SQL statements executed by the system.

1. Query classification:

Similar transactions are grouped together, e.g. those that perform similar operations on each table or exhibit similar patterns of resource usage. Features: table, lock mode, rows, time between statements. Clustering algorithm: DBSCAN.
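The clustering step can be illustrated with a minimal pure-Python DBSCAN. This is only a sketch: the feature encoding (rows accessed, lock mode as 0/1, seconds between statements), the sample values, and the `eps` and `min_pts` settings are assumptions made up for illustration, not values from the paper.

```python
from math import dist

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # All points (including i itself) within eps of point i.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reached from a core point: border
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nearby = neighbors(j)
            if len(nearby) >= min_pts: # j is a core point, keep expanding
                queue.extend(nearby)
    return labels

# Hypothetical feature vectors: (rows accessed, lock mode, seconds between statements)
txns = [
    (1.0, 0.0, 0.10), (1.1, 0.0, 0.12), (0.9, 0.0, 0.11),  # read-like
    (5.0, 1.0, 0.50), (5.2, 1.0, 0.52), (4.9, 1.0, 0.49),  # write-like
    (20.0, 1.0, 3.0),                                      # outlier
]
labels = dbscan(txns, eps=0.5, min_pts=2)
print(labels)
```

In practice the features would need scaling before clustering, since DBSCAN uses a single distance threshold across all dimensions.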

2. Resource prediction (Barzan Mozafari, SIGMOD 2013):

2.1 CPU, network, and log writes

Black-box models (make minimal assumptions about the underlying system, and hence are not specific to a particular DBMS): linear models of the load.
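A linear black-box model of this kind can be sketched as an ordinary least-squares fit of a resource metric against the load. The single-feature form, the helper name `fit_line`, and the sample numbers below are illustrative assumptions; the actual model may use one rate per transaction class.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up training data: transactions per second vs. CPU utilization (%)
tps = [100, 200, 300, 400]
cpu = [12.0, 22.0, 32.0, 42.0]

slope, intercept = fit_line(tps, cpu)
predicted_cpu = slope * 800 + intercept  # extrapolate to a higher load
print(slope, intercept, predicted_cpu)
```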

White-box models are needed for other resources (e.g., locking, page flushes due to log recycling) when making predictions about a drastically different workload than the one observed during training.
Cache model [D. N. Tran, P. C. Huynh, Y. C. Tay, and A. K. H. Tung. A new approach to dynamic self-tuning of database buffers. Trans. Storage, 4(1), 2008.]
dirty rate ~ flush rate
Flush rate: e.g. Monte Carlo simulation [B. Mozafari, C. Curino, and S. Madden. Performance and resource modeling in highly-concurrent OLTP workloads. Technical report, MIT, February 2012.]

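2.2 Lock/IO conflicts

For each request, find the primary keys of the rows it touches by rewriting the statement into one that selects the primary keys. With a clustered index the primary key is stored together with the row data, so the set of primary keys collected for a batch of requests, and how adjacent or scattered those keys are, gives the access distribution.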
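A minimal sketch of turning collected primary keys into a page access distribution. It assumes integer primary keys, a clustered index, and a fixed number of rows per page (`rows_per_page` is a hypothetical parameter); real page boundaries depend on row sizes and fill factor.

```python
from collections import Counter

def page_access_distribution(primary_keys, rows_per_page):
    """Map each accessed key to a page and return page -> access probability."""
    pages = Counter(pk // rows_per_page for pk in primary_keys)
    total = sum(pages.values())
    return {page: count / total for page, count in sorted(pages.items())}

# Made-up keys collected from the rewritten statements
keys = [1, 2, 3, 5, 8, 1001, 1002, 50000]
page_probs = page_access_distribution(keys, rows_per_page=100)
print(page_probs)
```

Adjacent keys collapse onto the same page, while scattered keys spread the probability mass over many pages.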

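2.3 Disk and RAM

Log writes, dirty-page flushes, cache misses.
Adaptive flushing: a heuristic that balances the dirty-page flush frequency against the time spent blocked during redo log recycling. Flushing does not wait until recycling; the flush pressure is paid off in installments.
Let p_{write,i} be the probability that a transaction writes page i, and D the set of all pages. Assuming each transaction writes exactly one page, sum_{i in D} p_{write,i} = 1.
After n transactions, T_{n,i} = 1 - (1 - p_{write,i})^n is the probability that page i has been written at least once, and the expected number of distinct dirty pages is T_n = sum_{i in D} T_{n,i}.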
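The dirty-page estimate can be evaluated directly from the per-page write probabilities. The function name and the three-page distribution below are illustrative assumptions.

```python
def expected_dirty_pages(p_write, n):
    """T_n = sum_i (1 - (1 - p_write_i)^n): expected distinct dirty pages after n transactions."""
    return sum(1 - (1 - p) ** n for p in p_write)

# Made-up write distribution over three pages (sums to 1)
p_write = [0.5, 0.25, 0.25]

for n in (1, 2, 10):
    print(n, expected_dirty_pages(p_write, n))
```

With n = 1 the result equals the sum of the probabilities (one page is dirtied), and as n grows it approaches the number of pages with nonzero write probability.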
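Log rotation classifies pages into three classes: (1) the page is dirty and the first transaction that dirtied it is in the old log; (2) the page is dirty and the first transaction that dirtied it is in the new log; (3) the page is not dirty. Page i belongs to the three classes with probabilities P_{1,i}, P_{2,i}, P_{3,i}; the number of class-1 pages at time t is d_{1,t}; the total log length is L.
Log rotation can only happen at a moment when d_{1,t} = 0, after which d_{2,t+1} = 0. Assuming n transactions per second, a suitable dirty-page flush rate follows from these quantities.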

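2.4 RAM misses

Monte Carlo simulation of the buffer pool, to estimate the miss rate for a database with N pages of RAM. Page accesses are drawn from the distribution obtained in 2.2. The simulation models LRU or LRU-2 replacement and accumulates the dirty-page writes C_{write,t,n} and the miss-induced reads C_{read,t,n}.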
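A minimal sketch of such a simulation for plain LRU (LRU-2 is not covered here). The function name, the `p_dirty` parameter (probability that an access dirties the page), and counting a write only when a dirty page is evicted are simplifying assumptions, not the paper's exact bookkeeping.

```python
import random
from collections import OrderedDict

def simulate_lru(page_probs, capacity, n_accesses, p_dirty=0.0, seed=0):
    """Return (miss rate, number of dirty-page write-backs) for an LRU pool."""
    rng = random.Random(seed)
    pages = list(range(len(page_probs)))
    pool = OrderedDict()  # page -> dirty flag, least recently used first
    misses = writes = 0
    for page in rng.choices(pages, weights=page_probs, k=n_accesses):
        dirty = rng.random() < p_dirty
        if page in pool:
            pool[page] = pool[page] or dirty
            pool.move_to_end(page)            # mark as most recently used
        else:
            misses += 1                       # page must be read from disk
            if len(pool) >= capacity:
                _, was_dirty = pool.popitem(last=False)  # evict LRU page
                writes += was_dirty           # dirty victim is written back
            pool[page] = dirty
    return misses / n_accesses, writes

# Made-up skewed access distribution over 8 pages
probs = [0.4, 0.2, 0.1, 0.1, 0.05, 0.05, 0.05, 0.05]
small_rate, small_writes = simulate_lru(probs, capacity=2, n_accesses=5000, p_dirty=0.3)
big_rate, big_writes = simulate_lru(probs, capacity=8, n_accesses=5000, p_dirty=0.3)
print(small_rate, small_writes, big_rate, big_writes)
```

Running the same seeded trace with several capacities gives a miss-rate curve as a function of the RAM size N.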

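2.5 Lock model: estimating transaction delay

From 2.2, obtain g (the probability that a transaction of class C_i accesses data in the i-th region at step n). Also from 2.2, obtain S and the LR model for m transactions. S is the average processing time of step n (exponentially distributed); a new S is estimated as (number of CPU cores * S) / (number M of queued transactions). U is the average delay caused by lock waits at step n. The average latency is S + U.
These and similar adjustments are used to evaluate locking. Based on Thomasian's 2PL analysis.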

2.6 Black box

DBSherlock

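Input: the abnormal time interval and the attribute values during that interval
Output: the generated predicates on attributes, or a known causal model
1. Optimization objective
Find the predicate Pred that maximizes the separation power between the abnormal and normal regions.
Separation power (SP) is defined as: (fraction of abnormal tuples satisfying Pred) - (fraction of normal tuples satisfying Pred)
2. Problems to solve: noisy data; the accuracy of the user-defined abnormal and normal regions; correlation between metrics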
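The separation power defined above can be computed as follows. The predicate `cpu > 90` and the sample values are made up for illustration.

```python
def separation_power(pred, abnormal, normal):
    """Fraction of abnormal tuples satisfying pred minus the fraction of normal ones."""
    frac_abnormal = sum(1 for v in abnormal if pred(v)) / len(abnormal)
    frac_normal = sum(1 for v in normal if pred(v)) / len(normal)
    return frac_abnormal - frac_normal

# Made-up CPU utilization samples from the two regions
abnormal_cpu = [95, 97, 92, 60]
normal_cpu = [30, 45, 50, 91]

sp = separation_power(lambda v: v > 90, abnormal_cpu, normal_cpu)
print(sp)
```

SP ranges from -1 to 1; a value near 1 means the predicate holds almost only in the abnormal region.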

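Preprocessing

1. Discretize each attribute into partitions.
2. Label each partition as normal or abnormal according to the user-defined regions; a partition containing both is labeled empty.
3. Filter: keep a partition's label only if its left and right neighbors carry the same label; mark the others empty.
4. For each empty partition, compute the distance to the labeled partitions and assign the label of the nearest one (distances to abnormal and normal partitions are weighted differently).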
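Steps 1 and 2 of the preprocessing can be sketched as below; the filtering and gap-filling steps are not covered. Equal-width partitions, the function name, and the sample data are assumptions for illustration.

```python
def label_partitions(values, is_abnormal, n_partitions):
    """Discretize one attribute and label each partition normal/abnormal/empty."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_partitions or 1.0  # avoid zero width on constant data
    seen = [set() for _ in range(n_partitions)]
    for v, abnormal in zip(values, is_abnormal):
        idx = min(int((v - lo) / width), n_partitions - 1)
        seen[idx].add("abnormal" if abnormal else "normal")
    # Mixed partitions and partitions without any tuple are both left empty.
    return [next(iter(s)) if len(s) == 1 else "empty" for s in seen]

# Made-up attribute values with the user's abnormal/normal marking
values = [1, 2, 3, 5, 6, 9, 10]
flags = [False, False, False, False, True, True, True]

partition_labels = label_partitions(values, flags, n_partitions=3)
print(partition_labels)
```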

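Computation

5. Normalize the data.
6. Compute the difference of the means, mu_a - mu_n (abnormal vs. normal), and select attributes according to a threshold.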
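The normalization and mean-difference test can be sketched as follows. Min-max normalization, taking the absolute value of the difference, the threshold of 0.5, and the sample data are all assumptions; the notes do not say which normalization or threshold is used.

```python
def normalized_mean_gap(abnormal, normal):
    """|mu_a - mu_n| after min-max normalizing the attribute to [0, 1]."""
    values = abnormal + normal
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero on constant attributes
    mu_a = sum((x - lo) / span for x in abnormal) / len(abnormal)
    mu_n = sum((x - lo) / span for x in normal) / len(normal)
    return abs(mu_a - mu_n)

def select_attributes(data, threshold):
    """Keep attributes whose normalized mean gap exceeds the threshold."""
    return [name for name, (abnormal, normal) in data.items()
            if normalized_mean_gap(abnormal, normal) > threshold]

# Made-up data: attribute -> (abnormal values, normal values)
data = {
    "cpu": ([90, 100], [0, 10]),
    "net": ([5, 6], [5, 7]),
}
selected = select_attributes(data, threshold=0.5)
print(selected)
```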

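Domain knowledge

Mutual information is computed to verify that the domain knowledge is correct.

User-confirmed models

When the confidence between the rules and a model exceeds a threshold, the model is output directly.
Causal models are merged and minimized.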

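Automatic anomaly detection

Guards against the user not specifying the anomaly, or specifying it incorrectly.
1. Normalize the data.
2. Consider only attributes whose PP(Attr_i) exceeds a threshold. PP is the maximum difference between the overall median and the value within a window. The window size is fixed.
3. DBSCAN: does it cluster points that share the same abnormal interval? (open question)
4. Select the ones with size < 20%.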
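Steps 1 and 2 can be sketched as below; the DBSCAN and size-filter steps are not covered. Using the window median as the window value, min-max normalization, and the sample series are assumptions, and the series must be at least as long as the window.

```python
from statistics import median

def potential_power(series, window):
    """Max |overall median - window median| over fixed-size sliding windows."""
    lo, hi = min(series), max(series)
    span = (hi - lo) or 1.0  # avoid division by zero on flat series
    norm = [(x - lo) / span for x in series]
    overall = median(norm)
    return max(abs(overall - median(norm[i:i + window]))
               for i in range(len(norm) - window + 1))

# Made-up series: a short spike inside an otherwise flat attribute
spiky = [10] * 8 + [50] * 3 + [10] * 8
flat = [5] * 10

print(potential_power(spiky, window=3), potential_power(flat, window=3))
```

Attributes whose value stays close to the overall median in every window get a low score and are dropped before clustering.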