One-way ANOVA with scipy

Classic example

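A school has four classes in its second year of senior high school (grade 11), and each class is taught mathematics with a different teaching method. To see whether the four methods differ noticeably in effectiveness, 8 students' scores were sampled from each class after the end-of-term unified exam. The scores are shown below.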

Student  Class 1  Class 2  Class 3  Class 4
1        75       93       65       72
2        77       80       67       70
3        70       85       77       71
4        88       90       68       65
5        72       84       65       81
6        80       86       64       72
7        79       85       62       68
8        81       81       68       74

Do the four teaching methods differ significantly in effectiveness (α = 0.05)?
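For reference, here is a sketch of the standard one-way ANOVA quantities that the steps that follow compute (textbook notation, not taken from the original post): k groups, n_i scores in group i, N scores in total, group means x̄_i, and grand mean x̄ taken over all N scores. In this problem k = 4 and N = 32, so the degrees of freedom are (3, 28).

```latex
H_0:\ \mu_1 = \mu_2 = \mu_3 = \mu_4, \qquad H_1:\ \text{not all } \mu_i \text{ are equal}

SS_B = \sum_{i=1}^{k} n_i\,(\bar{x}_i - \bar{x})^2, \qquad
SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (x_{ij} - \bar{x}_i)^2, \qquad
SS_T = SS_B + SS_W

MS_B = \frac{SS_B}{k-1}, \qquad
MS_W = \frac{SS_W}{N-k}, \qquad
F = \frac{MS_B}{MS_W} \sim F(k-1,\, N-k) \ \text{under } H_0
```

H0 is rejected at level α when F exceeds the critical value F_{1-α}(k-1, N-k), or equivalently when the p-value is below α.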

1. Compute the F value

import numpy as np
from scipy.stats import f_oneway

# Data for the four classes
class1 = [75, 77, 70, 88, 72, 80, 79, 81]
class2 = [93, 80, 85, 90, 84, 86, 85, 81]
class3 = [65, 67, 77, 68, 65, 64, 62, 68]
class4 = [72, 70, 71, 65, 81, 72, 68, 74]

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(class1, class2, class3, class4)

# Output the results
print("F-statistic:", f_statistic)
print("P-value:", p_value)

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("There is a significant difference in the effectiveness of the teaching methods.")
else:
    print("There is no significant difference in the effectiveness of the teaching methods.")

Output:

F-statistic: 22.045992451864645
P-value: 1.5622062333927252e-07
There is a significant difference in the effectiveness of the teaching methods.
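The p-value reported by f_oneway is the upper-tail probability of the F distribution with (k - 1, N - k) degrees of freedom, evaluated at the observed statistic. The minimal sketch below checks this with the survival function scipy.stats.f.sf and also compares F against the critical value from f.ppf. The variable names (groups, dfn, dfd, p_from_sf) are illustrative only.

```python
from scipy.stats import f, f_oneway

# Scores for the four classes (same data as above)
groups = [
    [75, 77, 70, 88, 72, 80, 79, 81],
    [93, 80, 85, 90, 84, 86, 85, 81],
    [65, 67, 77, 68, 65, 64, 62, 68],
    [72, 70, 71, 65, 81, 72, 68, 74],
]

f_statistic, p_value = f_oneway(*groups)

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
dfn, dfd = k - 1, n_total - k          # (3, 28) for this data

# Upper-tail probability P(F >= f_statistic) under H0
p_from_sf = f.sf(f_statistic, dfn, dfd)

# Critical value at alpha: reject H0 when the F statistic exceeds it
alpha = 0.05
f_crit = f.ppf(1 - alpha, dfn, dfd)

print("p-value via f.sf:", p_from_sf)
print("F > F_crit:", f_statistic > f_crit)
```

Both decision rules, p < α and F > F_crit, lead to the same conclusion because f.sf is decreasing in F.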

2. Compute SS, df and the F value by hand

import numpy as np
import pandas as pd
from scipy.stats import f_oneway, f

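# Data for the four classes, stored as NumPy arrays so that
# element-wise expressions such as class1 - np.mean(class1) work directly
class1 = np.array([75, 77, 70, 88, 72, 80, 79, 81])
class2 = np.array([93, 80, 85, 90, 84, 86, 85, 81])
class3 = np.array([65, 67, 77, 68, 65, 64, 62, 68])
class4 = np.array([72, 70, 71, 65, 81, 72, 68, 74])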

# Perform one-way ANOVA
f_statistic, p_value = f_oneway(class1, class2, class3, class4)

# Degrees of freedom
num_groups = 4
num_samples = len(class1) + len(class2) + len(class3) + len(class4)
df_between = num_groups - 1
df_within = num_samples - num_groups

# Calculate sum of squares (SS)
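# Grand mean over all observations. The mean of the four group means
# only coincides with it when every group has the same size.
mean_total = np.mean(np.concatenate([class1, class2, class3, class4]))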
ss_total = np.sum((np.concatenate([class1, class2, class3, class4]) - mean_total) ** 2)
ss_between = np.sum([len(class1) * (np.mean(class1) - mean_total) ** 2,
                     len(class2) * (np.mean(class2) - mean_total) ** 2,
                     len(class3) * (np.mean(class3) - mean_total) ** 2,
                     len(class4) * (np.mean(class4) - mean_total) ** 2])
ss_within = np.sum((class1 - np.mean(class1)) ** 2) + \
            np.sum((class2 - np.mean(class2)) ** 2) + \
            np.sum((class3 - np.mean(class3)) ** 2) + \
            np.sum((class4 - np.mean(class4)) ** 2)

# Calculate mean squares (MS)
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# Calculate F-statistic
f_statistic_manual = ms_between / ms_within

# Critical F-value
alpha = 0.05
f_crit = f.ppf(1 - alpha, df_between, df_within)

# Create a DataFrame for better tabular representation
data = {
    'Class 1': class1,
    'Class 2': class2,
    'Class 3': class3,
    'Class 4': class4,
}

df = pd.DataFrame(data)

# Output the ANOVA results
print("Analysis of Variance (ANOVA):")
print("F-statistic (from scipy.stats):", f_statistic)
print("P-value (from scipy.stats):", p_value)
print("\nManual Calculation:")
print("SS Between:", ss_between)
print("SS Within:", ss_within)
print("DF Between:", df_between)
print("DF Within:", df_within)
print("MS Between:", ms_between)
print("MS Within:", ms_within)
print("F-statistic (manual calculation):", f_statistic_manual)
print("Critical F-value:", f_crit)

# Interpret the results
if p_value < alpha:
    print("\nThere is a significant difference in the effectiveness of the teaching methods.")
else:
    print("\nThere is no significant difference in the effectiveness of the teaching methods.")

Output (excerpt):

Manual Calculation:
SS Between: 1538.59375
SS Within: 651.375
DF Between: 3
DF Within: 28
MS Between: 512.8645833333334
MS Within: 23.263392857142858
F-statistic (manual calculation): 22.045992451864645
Critical F-value: 2.9466852660172655
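The manual steps above can be wrapped in a small reusable function. The sketch below assumes each group is passed as a sequence of numbers. The function name one_way_anova is hypothetical and not part of scipy. It takes the grand mean over all observations, so it also handles groups of unequal size. It then checks the decomposition SS_T = SS_B + SS_W and compares its F statistic with scipy's f_oneway.

```python
import numpy as np
from scipy.stats import f, f_oneway


def one_way_anova(*groups, alpha=0.05):
    """Hypothetical helper: returns one-way ANOVA table values for the groups."""
    arrays = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(arrays)
    k, n_total = len(arrays), all_values.size
    grand_mean = all_values.mean()  # mean of all observations, not of group means

    ss_between = sum(a.size * (a.mean() - grand_mean) ** 2 for a in arrays)
    ss_within = sum(((a - a.mean()) ** 2).sum() for a in arrays)
    ss_total = ((all_values - grand_mean) ** 2).sum()

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    f_stat = ms_between / ms_within
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_total,
        "df": (df_between, df_within),
        "F": f_stat,
        "p": f.sf(f_stat, df_between, df_within),
        "F_crit": f.ppf(1 - alpha, df_between, df_within),
    }


classes = [
    [75, 77, 70, 88, 72, 80, 79, 81],
    [93, 80, 85, 90, 84, 86, 85, 81],
    [65, 67, 77, 68, 65, 64, 62, 68],
    [72, 70, 71, 65, 81, 72, 68, 74],
]
result = one_way_anova(*classes)

# Sum-of-squares decomposition: SS_T = SS_B + SS_W
assert np.isclose(result["ss_total"], result["ss_between"] + result["ss_within"])
# The hand-computed F statistic agrees with scipy's
assert np.isclose(result["F"], f_oneway(*classes).statistic)
print(result)
```

With this data, F ≈ 22.05 is well above the critical value F_0.95(3, 28) ≈ 2.947. Equivalently, p ≈ 1.6 × 10^-7 < 0.05. So H0 is rejected: at α = 0.05, the four teaching methods differ significantly in effectiveness.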
