[Frontiers of Computer Science] Chapter 3 Answers, 2022 - Machine Learning

Chapter 3

3.1 Predicting sheepdog weight with linear regression

3.1.1 Data entry

train_x = [27, 29, 34, 40, 42, 47, 48, 49, 50, 52, 52, 52, 54]

train_y = [6, 7.5, 9, 10.7, 12.8, 15.1, 16, 18.5, 19.4, 18.4, 19.7, 21.8, 21.7]

print_shape(train_x)
print_shape(train_y)

3.1.2 Defining the linear regression model

model = linear_regressor()

3.1.3 Training the linear regression model

model.train(train_x, train_y)

3.1.4 Visualizing the model

model.show()

3.1.5 Model prediction: the predict() function

x = 40
pred_y = model.predict(x)
print(pred_y)

3.1.6 Model parameters

weights = model.get_weights()
print(weights)
k = weights[0]
b = weights[1]
print("k=",k)
print("b=",b)
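
The internals of linear_regressor() are not shown in these answers. As a rough sketch, assuming it performs an ordinary least-squares fit of y = k * x + b, the two parameters could be computed by hand as follows. The name fit_line is hypothetical and is not part of the course library.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = k * x + b (assumed method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    k = sxy / sxx
    # Intercept: the fitted line passes through the point (mean_x, mean_y).
    b = mean_y - k * mean_x
    return k, b


# Same training data as in 3.1.1.
train_x = [27, 29, 34, 40, 42, 47, 48, 49, 50, 52, 52, 52, 54]
train_y = [6, 7.5, 9, 10.7, 12.8, 15.1, 16, 18.5, 19.4, 18.4, 19.7, 21.8, 21.7]

k, b = fit_line(train_x, train_y)
print("k =", k)
print("b =", b)
print(k * 40 + b)  # prediction for x = 40
```

If the library does use least squares, these values should be close to the ones returned by model.get_weights(); a different training method may give slightly different numbers.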

3.1.7 Defining a linear function

def linear_function(x, k, b):
    y = k * x + b
    return y

3.1.8 Predicting with the linear function

x = 40
pred_y = linear_function(x, k, b)
print(pred_y)

3.1.9 Model analysis: the effect of an outlier on the fit and the results

train_y[5] = 30
model = linear_regressor()
model.train(train_x, train_y)
model.show()

3.2 Predicting sheepdog weight with polynomial regression

3.2.1 Data entry

train_x = [27, 29, 34, 40, 42, 47, 48, 49, 50, 52, 52, 52, 54]

train_y = [6, 7.5, 9, 10.7, 12.8, 15.1, 16, 18.5, 19.4, 18.4, 19.7, 21.8, 21.7]

3.2.2 Defining the polynomial regression model

model = poly_regressor(2)

3.2.3 Training the polynomial regression model

model.train(train_x,train_y)

3.2.4 Visualizing the model

model.show()

3.2.5 Model prediction

x = 40
pred_y = model.predict(x)
print(pred_y)

3.2.6 Effect of the polynomial degree on the fit

model = poly_regressor(3)
model.train(train_x,train_y)
model.show()

model = poly_regressor(10)
model.train(train_x,train_y)
model.show()

model = poly_regressor(30)
model.train(train_x,train_y)
model.show()

3.2.7 Effect of an outlier on the fit

train_y[5] = 30
model = poly_regressor(2)
model.train(train_x, train_y)
model.show()

model = poly_regressor(30)
model.train(train_x,train_y)
model.show()

3.3 Evaluating linear regression models and the test set

3.3.1 Training the linear model

train_x = [27, 29, 34, 40, 42, 47, 48, 49, 50, 52, 52, 52, 54]

train_y = [6, 7.5, 9, 10.7, 12.8, 15.1, 16, 18.5, 19.4, 18.4, 19.7, 21.8, 21.7]

model = linear_regressor()
model.train(train_x, train_y)
model.show()

3.3.2 Defining the error function

def mse_error(pred, y):
    error = 0
    for i in range(len(pred)):
        error = error + (y[i] - pred[i]) ** 2
    # Divide once, after the loop, to get the mean of the squared errors.
    error = error / len(pred)
    return error

print(mse_error(train_x, train_y))

3.3.3 Computing the fitting error

pred_y = model.predict(train_x)
error = mse_error(pred_y, train_y)
print(error)
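
The error computed here is the mean squared error: the average of the squared differences between predictions and targets. A small worked example makes the arithmetic easy to check by hand. The numbers below are hypothetical and are not taken from the sheepdog data.

```python
def mse(pred, y):
    """Mean squared error: average of the squared differences."""
    return sum((yi - pi) ** 2 for pi, yi in zip(pred, y)) / len(pred)


# Hypothetical predictions and targets, chosen so the sum is easy to follow.
pred = [1.0, 2.0, 3.0]
target = [1.0, 4.0, 5.0]

# Squared differences: 0, 4, 4 -> sum 8 -> mean 8 / 3
print(mse(pred, target))
```

The division happens once, after all squared differences have been summed. Dividing inside the loop would shrink the running total at every step and give a different, incorrect value.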

3.3.4 Wrapping the error computation in a function

def compute_error(model, x, y):
    pred = model.predict(x)
    error = mse_error(pred, y)
    return error

3.3.5 Comparing models

model2 = poly_regressor(3)
model2.train(train_x, train_y)
model2.show()
print(compute_error(model2, train_x, train_y))

model3 = poly_regressor(30)
model3.train(train_x, train_y)
model3.show()
print(compute_error(model3, train_x, train_y))

3.3.6 Training set, test set, and overfitting

test_x = [23, 31, 32, 38, 40, 45, 49, 50, 50, 51, 51, 53, 55]

test_y = [6.3, 7.2, 9.1, 10.5, 12.9, 15.5, 15.9, 18.3, 19.7, 18.9, 19.3, 21.3, 22.1]

print("Linear regression error:", compute_error(model, test_x, test_y))
print("Degree-3 polynomial error:", compute_error(model2, test_x, test_y))
print("Degree-30 polynomial error:", compute_error(model3, test_x, test_y))

3.4 Prediction performance of linear classification

3.4.1 Reading the data

train_x = [60, 56, 60, 55, 60, 57, 65, 60, 62, 59, 43, 52, 41, 45, 43, 50, 46, 52, 56, 56]

train_y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1]

3.4.2 Defining the linear classification model

model = linear_classifier()

3.4.3 Training the linear classification model

model.train(train_x, train_y)

3.4.4 Visualizing the model

model.show()

3.4.5 Applying the model: the predict() function

x = 60
pred_y = model.predict(x)
print(pred_y)

3.4.6 Model parameters

weights = model.get_weights()
print(weights)
k = weights[0]
b = weights[1]
print("k=", k)
print("b=", b)

3.4.7 Defining the linear classification function

def decision_function(x, k, b):
    if k * x + b > 0:
        return 1
    else:
        return -1

pred = decision_function(3, 2, -5.5)
print(pred)

pred = decision_function(3, 2, -6.5)
print(pred)

3.4.8 Predicting with the linear classification function

k, b = model.get_weights()

x = 60
pred_y = decision_function(x, k, b)
print(pred_y)

3.4.9 Computing accuracy

def accuracy(pred, y):
    right = 0
    total = 0
    for i in range(len(pred)):
        if pred[i] == y[i]:
            right += 1
        total += 1
    # Compute the ratio once, after every prediction has been counted.
    acc = right / total
    return acc

pred_y = model.predict(train_x)
acc = accuracy(pred_y, train_y)
print(acc)

3.4.10 Linear classification vs. linear regression

model2 = linear_regressor()
model2.train(train_x, train_y)
model2.show()

3.5 Predicting sex from height and weight

3.5.1 Reading the data

train_x_m = [[163, 60], [164, 56], [165, 60], [168, 55], [169, 60],[170, 57], [170, 65], [171, 60], [170, 62], [169, 59],[153, 43], [158, 52], [156, 41], [158, 45], [159, 43],[160, 50], [159, 46], [158, 52], [157, 56], [158, 55],[167, 53], [168, 52], [163, 65], [171, 52], [169, 52],[170, 57], [170, 60], [168, 52], [166, 60], [165, 51],[153, 43], [158, 55], [156, 41], [156, 57], [159, 43],[163, 41], [162, 56], [155, 52], [152, 56], [153, 55]]
train_x_s = [60, 56, 60, 55, 60, 57, 65, 60, 62, 59, 43, 52, 41, 45, 43, 50, 46, 52, 56, 55, 53, 52, 65, 52, 52, 57, 60, 52, 60, 51, 43, 55, 41, 57, 43, 41, 56, 52, 56, 55]
train_y = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1]
test_x_m = [[166, 58], [162, 56], [178, 66], [153, 50], [140, 60], [160, 55]]
test_x_s = [58, 56, 66, 50, 60, 55]
test_y = [1, 1, 1, -1, -1, -1]

3.5.2 Visualizing the data

visualize_data2D(train_x_s, train_y)
visualize_data3D(train_x_m, train_y)

3.5.3 Training and validating a multivariate classifier

model = linear_classifier()
model.train(train_x_m, train_y)
pred_y = model.predict(test_x_m)
acc = accuracy(pred_y, test_y)
print(acc)

3.5.4 Multivariate classifier vs. univariate classifier

model2 = linear_classifier()
model2.train(train_x_s, train_y)
pred_y = model2.predict(test_x_s)
acc = accuracy(pred_y, test_y)
print(acc)

3.6 Classifying irises with a perceptron

3.6.1 Loading the dataset

iris = data.get('iris-simple')

3.6.2 Displaying the dataset

fig() + plot(iris)

3.6.3 Creating the classifier

blc = binary_linear_classifier()

3.6.4 Training the classifier

blc.train(iris, alg=Perceptron())

3.6.5 Setting the perceptron's learning rate

blc1 = binary_linear_classifier()
blc1.train(iris, alg=Perceptron(lr=0.4))
blc2 = binary_linear_classifier()
blc2.train(iris, alg=Perceptron(lr=0.05))

blc3 = binary_linear_classifier()
blc3.train(iris, alg=Perceptron(w=[1 ,1], b=1))
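
What Perceptron(lr=..., w=..., b=...) does internally is not shown here. The sketch below assumes the classic perceptron rule: for every misclassified point, shift the weights and the bias by lr times the label times the input. The names train_perceptron and predict, and the toy data, are hypothetical and are not part of the course library or the iris-simple dataset.

```python
def train_perceptron(points, labels, lr=0.2, w=None, b=0.0, max_epochs=1000):
    """Train w and b so that the sign of w . x + b matches the labels (+1 / -1)."""
    w = list(w) if w is not None else [0.0] * len(points[0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified, or exactly on the boundary
                # Move the boundary towards the correct side of this point.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b = b + lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass without errors: training is done
            break
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on which side of the boundary x lies."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Hypothetical, linearly separable toy data with two features per point.
points = [[1.0, 0.2], [1.5, 0.3], [1.3, 0.2], [4.5, 1.5], [4.0, 1.3], [4.7, 1.4]]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_perceptron(points, labels, lr=0.4)
print(w, b)
print([predict(w, b, p) for p in points])
```

In this sketch the learning rate only scales the size of each correction, and the optional w and b arguments set the starting point of the search, which mirrors how the lr, w, and b arguments are used in the calls above.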

3.6.6 Comparing the training results

fig() + plot(iris) + plot(blc) + plot(blc1) + plot(blc2)

3.7 Classifying irises with a support vector machine

3.7.1 Loading the dataset

iris=data.get('iris-simple')
fig() + plot(iris)

3.7.2 Creating the classifier

blc = binary_linear_classifier() 

3.7.3 Training the classifier with a support vector machine

blc.train(iris, alg=SVM())

fig() + plot(iris) + plot(blc)

3.8 Testing and applying classifiers

3.8.1 Loading the dataset

iris=data.get('iris-simple')
fig() + plot(iris)

3.8.2 Splitting the data

iris_train, iris_test = iris.split(7,3)

fig() + plot(iris_train)
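
The call iris.split(7,3) presumably divides the dataset into training and test parts in a 7:3 ratio; whether the library shuffles the data first is not stated. A minimal sketch of such a split, assuming a shuffle with a fixed seed, could look like this. The name split_dataset is hypothetical.

```python
import random


def split_dataset(samples, train_parts=7, test_parts=3, seed=0):
    """Shuffle the samples, then cut them in a train_parts : test_parts ratio."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    return shuffled[:cut], shuffled[cut:]


samples = list(range(20))  # hypothetical stand-ins for 20 data points
train, test = split_dataset(samples)
print(len(train), len(test))  # 14 6
```

Keeping the test part out of training is what makes the accuracy measured on it a fair estimate of how the classifier behaves on unseen data.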

3.8.3 Obtaining the classifiers

blc1=binary_linear_classifier() 
blc2=binary_linear_classifier() 

blc1.train(iris_train, alg=Perceptron(lr=0.2))
blc2.train(iris_train, alg=SVM())

fig() + plot(iris_train) + plot(blc1) + plot(blc2)

3.8.4 Testing the classifiers

acc1 = blc1.accuracy(iris_test)
acc2 = blc2.accuracy(iris_test)
print('Perceptron Accuracy:', acc1)
print('SVM Accuracy:', acc2)

3.8.5 Applying the classifiers

point = [2, 0.7]
fig() + plot(iris) + plot(blc1) + plot(blc2) + plot([point])
label1 = blc1.predict(point)
label2 = blc2.predict(point)
print('Perceptron Prediction: ', label1)
print('SVM Prediction: ', label2)

3.9 Understanding the K-means algorithm

3.9.1 Getting the dataset

iris = data.get('iris')
feature, label = iris[0]
print("Feature : ", feature)

3.9.2 Selecting features from the dataset

def select_features(feature):
    return feature[2:4]

iris2 = iris.map(select_features, on_field=0)

fig() + plot(iris2, type='scatter')

3.9.3 Creating the K-means clustering model

model = KMeans(K=3)

3.9.4 Training the model and observing the result

def select_features(feature):
    return feature[2:4]

iris2 = iris.map(select_features, on_field=0)

model.train(iris2.field(0))

fig() + plot(model, iris2, type='cluster_statistics')
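
The internals of KMeans(K=3) are not shown in these answers. The sketch below assumes the standard Lloyd iteration: assign each point to its nearest centroid, then move each centroid to the mean of its points, and repeat until the assignment stops changing. The function name kmeans, the initialization from the first k points, and the toy data are all assumptions made for illustration.

```python
def kmeans(points, k, max_iters=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = [list(p) for p in points[:k]]  # simple init: the first k points
    assignment = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - m) ** 2 for a, m in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:  # nothing changed: converged
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


# Hypothetical two-feature points forming three loose groups.
points = [[1.4, 0.2], [4.5, 1.5], [6.0, 2.5], [1.5, 0.2], [4.7, 1.4], [5.9, 2.1]]
centroids, assignment = kmeans(points, k=3)
print(assignment)  # [0, 1, 2, 0, 1, 2]
print(centroids)
```

The result of this kind of iteration depends on the starting centroids, which is one reason why repeating the training can give different clusters.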

3.9.5 Repeating and comparing

model2 = KMeans(K=3)
model2.train(iris.field(0))
fig() + plot(model2, iris, type='cluster_statistics')
