See: https://towardsdatascience.com/6-ways-to-test-for-a-normal-distribution-which-one-to-use-9dcf47d8fa93
The plotting-based checks are not of interest here; the quantitative (statistical) test sections are excerpted below:
4. Kolmogorov Smirnov test
4.1. Introduction
If the QQ Plot and other visualization techniques are not conclusive, statistical inference (Hypothesis Testing) can give a more objective answer to whether our variable deviates significantly from a normal distribution.
The Kolmogorov Smirnov test computes the distances between the empirical distribution and the theoretical distribution and defines the test statistic as the supremum of the set of those distances.
The advantage of this is that the same approach can be used for comparing against any distribution, not necessarily only the normal distribution.
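In symbols, if F_n denotes the empirical CDF of the sample and F the CDF of the reference (here normal) distribution, the test statistic is D_n = sup_x |F_n(x) - F(x)|.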
4.2. Interpretation
The Test Statistic of the KS Test is the Kolmogorov Smirnov Statistic, which follows a Kolmogorov distribution if the null hypothesis is true.
If the observed data perfectly follow a normal distribution, the value of the KS statistic will be 0. The P-Value is used to decide whether the difference is large enough to reject the null hypothesis:
If the P-Value of the KS Test is larger than 0.05, we assume a normal distribution
If the P-Value of the KS Test is smaller than 0.05, we do not assume a normal distribution
4.3. Implementation
The KS Test in Python using Scipy can be implemented as follows. It returns the KS statistic and its P-Value.
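A minimal sketch in Python with scipy (the array data is a placeholder for your own sample; note that the plain KS test expects the parameters of the reference normal to be specified in advance, since estimating them from the same data is exactly what the Lilliefors test below corrects for):

import numpy as np
from scipy import stats

# placeholder sample; replace with your own data
data = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=200)

# KS test against a fully specified normal distribution, here N(5, 2)
ks_stat, p_value = stats.kstest(data, 'norm', args=(5.0, 2.0))
print(f"KS statistic = {ks_stat:.4f}, p-value = {p_value:.4f}")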
4.4. Conclusion
The KS test is well known but has relatively low power. This means that a large number of observations is needed to reject the null hypothesis. It is also sensitive to outliers. On the other hand, it can be used for other types of distributions.
5. Lilliefors test
5.1. Introduction
The Lilliefors test is strongly based on the KS test. The difference is that in the Lilliefors test, the mean and variance of the population distribution are estimated from the sample rather than pre-specified by the user.
Because of this, the Lilliefors test uses the Lilliefors distribution rather than the Kolmogorov distribution.
5.2. Interpretation
If the P-Value of the Lilliefors Test is larger than 0.05, we assume a normal distribution
If the P-Value of the Lilliefors Test is smaller than 0.05, we do not assume a normal distribution
5.3. Implementation
The Lilliefors test implementation in statsmodels will return the value of the Lilliefors test statistic and the P-Value as follows.
Attention: in the statsmodels implementation, P-Values lower than 0.001 are reported as 0.001 and P-Values higher than 0.2 are reported as 0.2.
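A minimal sketch of the statsmodels call (the array data is a placeholder for your own sample):

import numpy as np
from statsmodels.stats.diagnostic import lilliefors

# placeholder sample; replace with your own data
data = np.random.default_rng(0).normal(size=200)

# Lilliefors test for normality; mean and variance are estimated from the sample
stat, p_value = lilliefors(data, dist='norm')
print(f"Lilliefors statistic = {stat:.4f}, p-value = {p_value:.4f}")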
5.4. Conclusion
Although the Lilliefors test is an improvement over the KS test, its power is still lower than that of the Shapiro Wilk test.
6. Shapiro Wilk test
6.1. Introduction
The Shapiro Wilk test is the most powerful test when testing for a normal distribution. It was developed specifically for the normal distribution and, unlike the KS test for example, cannot be used for testing against other distributions.
6.2. Interpretation
If the P-Value of the Shapiro Wilk Test is larger than 0.05, we assume a normal distribution
If the P-Value of the Shapiro Wilk Test is smaller than 0.05, we do not assume a normal distribution
6.3. Implementation
The Shapiro Wilk test can be implemented as follows. It will return the test statistic called W and the P-Value.
Attention: for N > 5000 the W test statistic is accurate but the p-value may not be.
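A minimal sketch in Python with scipy (the array data is a placeholder for your own sample):

import numpy as np
from scipy import stats

# placeholder sample; replace with your own data
data = np.random.default_rng(0).normal(size=200)

# Shapiro Wilk test; returns the W statistic and its p-value
w_stat, p_value = stats.shapiro(data)
print(f"W = {w_stat:.4f}, p-value = {p_value:.4f}")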
MATLAB has a dedicated Lilliefors test function, lillietest(); for example, [h,p] = lillietest(x) returns the rejection decision h (1 = reject normality at the 5% level) and the p-value p.
For details on lillietest, see https://www.mathworks.com/help/stats/lillietest.html
Following Probability Theory and Mathematical Statistics (Zhejiang University, 4th edition), p. 185, "(3) Tests based on paired data (t-test)", a paired t-test is used for the verification.
Multiple samples are each measured on the two machines, giving two paired groups of measurements. Assuming the pairwise differences follow a normal distribution, this reduces to a mean test for a single population with unknown variance: test whether the mean of the differences is equal to / greater than / less than zero.
In general, the following two steps are needed:
How to plot the probability distribution of the raw data?
The MATLAB commands are as follows:
ymin=fix(min(RMSE_2nd)/0.1)*0.1;   % lower limit of the data, rounded down to the 0.1 grid
ymax=ceil(max(RMSE_2nd)/0.1)*0.1;  % upper limit of the data, rounded up to the 0.1 grid
x=2.2:0.1:3.4;                     % bin edges (hard-coded here; equals ymin:0.1:ymax for this data set)
yy=histogram(RMSE_2nd,x,'Normalization','probability'); % fraction of samples falling in each bin
How to estimate the parameters of the normal probability density function?
MATLAB function for fitting normal distribution parameters: normfit().
See https://www.mathworks.com/help/stats/normfit.html
Note that the sample standard deviation divides by n-1, not n (normfit's sigma is the square root of the unbiased, n-1 variance estimate).
See: https://blog.csdn.net/wanghanjiett/article/details/105295034
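For reference, the sample standard deviation with the n-1 correction is s = sqrt( (1/(n-1)) * sum_{i=1..n} (x_i - xbar)^2 ), where xbar is the sample mean; dividing by n instead would underestimate the spread on average.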
Given the fitted probability density function, how to plot its probability distribution (for comparison with the histogram)?
[mu,sigma]=normfit(RMSE_2nd);             % fit the normal parameters to the data
x = 2.2:0.1:3.4;                          % left edges of the 0.1-wide bins
% probability mass of the fitted normal in each bin [x, x+0.1]
y4 = normcdf(x+0.1, mu, sigma) - normcdf(x, mu, sigma);
hold on
plot(x,y4,'LineWidth',2);                 % overlay the fitted bin probabilities on the histogram
grid on;
About the t-test:
MATLAB's ttest supports two-sided (the "no tail" case in these notes), left-tailed, and right-tailed alternatives via the 'Tail' option ('both', 'left', 'right').
It also returns a confidence interval for the population mean, here the mean of the paired differences.
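A minimal sketch of this paired procedure in Python with scipy (rmse_machine_a and rmse_machine_b are placeholder names for the paired measurements from the two machines; in MATLAB the equivalent call is [h,p,ci] = ttest(a, b, 'Tail', ...)):

import numpy as np
from scipy import stats

# placeholder paired measurements of the same samples on two machines
rng = np.random.default_rng(0)
rmse_machine_a = rng.normal(2.8, 0.2, size=30)
rmse_machine_b = rmse_machine_a + rng.normal(0.05, 0.1, size=30)

d = rmse_machine_a - rmse_machine_b   # pairwise differences

# paired t-test, H0: mean difference == 0
# alternative = 'two-sided' (both tails), 'less' (left tail) or 'greater' (right tail)
t_stat, p_value = stats.ttest_rel(rmse_machine_a, rmse_machine_b, alternative='two-sided')
print(f"t = {t_stat:.4f}, p-value = {p_value:.4f}")

# 95% confidence interval for the mean of the differences
n = len(d)
ci = stats.t.interval(0.95, df=n - 1, loc=d.mean(), scale=stats.sem(d))
print("95% CI for the mean difference:", ci)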