Machine Learning: A Human Motion State Prediction Example

A machine learning lab report, recorded here for reference.
Algorithm workflow:

  • Load all data from the feature and label files into memory; since the data contains missing values, this step also performs simple preprocessing.
  • Create the corresponding classifiers and train them on the training data.
  • Predict on the test set, and evaluate each model by comparing its predictions against the ground truth to compute overall precision and recall.
  • Flow chart:
    Start -> Load data & preprocess -> Create classifier -> Train classifier -> Predict on test set -> Compute precision and recall -> End
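The mean imputation done in the preprocessing step can be tried in isolation first; the small array below is made up purely for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy 3x2 array with one missing value per column (values are hypothetical)
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [7.0, np.nan]])

# Replace each NaN with the mean of its column
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
X_filled = imp.fit_transform(X)

# Column 0 mean is (1+7)/2 = 4, column 1 mean is (2+4)/2 = 3
print(X_filled)
```

`fit_transform` combines the separate `fit` and `transform` calls used in the experiment code below.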

The experimental data can be downloaded here. Files A~D form the training set, and E is the test set.

Experiment code:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
# Import three classifiers from sklearn:
# k-nearest neighbors (KNeighborsClassifier),
# decision tree (DecisionTreeClassifier),
# and Gaussian naive Bayes (GaussianNB).

def load_dataset(feature_paths, label_paths):
    feature = np.ndarray(shape=(0, 41))
    label = np.ndarray(shape=(0, 1))
    for file in feature_paths:
        df = pd.read_table(file, delimiter=',', na_values='?', header=None)
        # Impute missing values with the column mean
        imp = SimpleImputer(missing_values=np.nan, strategy='mean')
        imp.fit(df)
        df = imp.transform(df)
        feature = np.concatenate((feature, df))
    for file in label_paths:
        df = pd.read_table(file, header=None)
        label = np.concatenate((label, df))
    label = np.ravel(label)
    return feature, label

if __name__ == '__main__':
    feature_paths = ['A/A.feature', 'B/B.feature', 'C/C.feature', 'D/D.feature', 'E/E.feature']
    label_paths = ['A/A.label', 'B/B.label', 'C/C.label', 'D/D.label', 'E/E.label']
    # A~D as the training set, E as the test set
    x_train, y_train = load_dataset(feature_paths[:4], label_paths[:4])
    x_test, y_test = load_dataset(feature_paths[4:], label_paths[4:])
    # test_size is deliberately tiny: the split is used only to shuffle the training data
    x_train, x_, y_train, y_ = train_test_split(x_train, y_train, test_size=0.0001)
    print("Start training knn")
    knn = KNeighborsClassifier().fit(x_train, y_train)
    print("Training done!")
    answer_knn = knn.predict(x_test)
    print("Predicting done!")
    print("Start training DT")
    dt = DecisionTreeClassifier().fit(x_train, y_train)
    print("Training done!")
    answer_dt = dt.predict(x_test)
    print("Predicting done!")
    print("Start training Bayes")
    gnb = GaussianNB().fit(x_train, y_train)
    print("Training done!")
    answer_gnb = gnb.predict(x_test)
    print("Predicting done!")
    print("\n\nThe classification report for knn:")
    print(classification_report(y_test, answer_knn))
    print("\n\nThe classification report for dt:")
    print(classification_report(y_test, answer_dt))
    print("\n\nThe classification report for gnb:")
    print(classification_report(y_test, answer_gnb))

Conclusions:

  1. Measured by precision, the naive Bayes classifier performs best.
  2. Measured by recall and F1 score, the k-nearest neighbors classifier performs best.
  3. Both naive Bayes and k-nearest neighbors outperform the decision tree.
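The precision, recall, and F1 numbers behind these conclusions all come from classification_report. To see how each metric is derived, here is a toy two-class example with hypothetical labels:

```python
from sklearn.metrics import classification_report

# Hypothetical ground-truth and predicted labels for a two-class problem
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# Per class: precision = TP / (TP + FP), recall = TP / (TP + FN),
# f1-score is their harmonic mean, and "support" counts true samples
print(classification_report(y_true, y_pred))
```

For class 1 here, two of the three predicted 1s are correct (precision 2/3), and two of the three true 1s are found (recall 2/3).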

Strengths and weaknesses of the algorithms:

01 KNN:
Pros: high accuracy, insensitive to outliers, no assumptions about the input data.
Cons: high computational and memory cost.
02 Decision tree:
Pros: low computational complexity, output that is easy to interpret, insensitive to missing intermediate values, able to handle irrelevant features.
Cons: prone to overfitting.
03 Naive Bayes:
Pros: still effective with little data, handles multi-class problems.
Cons: sensitive to how the input data is prepared.
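The claim that naive Bayes works with little data and handles multiple classes can be illustrated with a tiny three-class example; the feature values below are made up for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Only two samples per class, one feature, three classes (hypothetical data)
X = np.array([[0.0], [0.2], [5.0], [5.2], [10.0], [10.1]])
y = np.array([0, 0, 1, 1, 2, 2])

# GaussianNB fits a per-class mean and variance, so even this tiny
# dataset is enough to separate three well-spaced classes
gnb = GaussianNB().fit(X, y)
pred = gnb.predict(np.array([[0.1], [5.1], [9.9]]))
print(pred)
```

Each query point is assigned to the class whose fitted Gaussian gives it the highest likelihood.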
