Nominal (Categorical) Features: Imputing Missing Values, Explained with a Hands-On Example


Core functions and methods covered:

KNeighborsClassifier()

np.hstack()

np.vstack()
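As a quick illustration of the two NumPy stacking functions listed above, here is a minimal sketch using made-up toy arrays (the variable names are illustrative only). np.hstack() joins arrays column-wise, so a 1-D vector of predicted labels has to be reshaped into an (n, 1) column first; np.vstack() then appends rows, which requires the same number of columns.

```python
import numpy as np

# Toy values (hypothetical): two predicted class labels and the
# remaining numeric features of the same two rows
predicted = np.array([0.0, 1.0])          # shape (2,)
features = np.array([[0.87, 1.31],
                     [-0.67, -0.22]])     # shape (2, 2)

# hstack joins along columns; reshape(-1, 1) turns the 1-D vector
# into a (2, 1) column so both inputs are 2-D with the same row count
rows_filled = np.hstack((predicted.reshape(-1, 1), features))
print(rows_filled.shape)   # (2, 3)

# vstack joins along rows; both inputs need the same number of columns
other_rows = np.array([[1.0, 1.18, 1.33]])
combined = np.vstack((rows_filled, other_rows))
print(combined.shape)      # (3, 3)
```

Without the reshape, np.hstack() would receive one 1-D and one 2-D array and raise a ValueError about mismatched dimensions.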

 

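Suppose a categorical (nominal) feature contains missing values that should be replaced with predicted values. A good approach is to train a machine learning classifier that predicts the missing category from the other features; a k-nearest neighbors (KNN) classifier is a common choice for this kind of imputation.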
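For reference, the example code below uses weights='distance', which in scikit-learn's KNeighborsClassifier means each of the k nearest neighbours votes for its own class with weight equal to the inverse of its distance to the query point. Written as a formula, where N_k(x) is the set of the k training rows nearest to x, y_i is the known category of row x_i, and d is the distance metric (Euclidean by default):

```latex
\hat{y}(x) = \arg\max_{c} \sum_{i \in N_k(x)} \frac{1}{d(x, x_i)} \, \mathbb{1}[y_i = c]
```

Worked by hand for the first incomplete row of X_with_nan, with numeric features (0.87, 1.31): its three nearest neighbours are (1.18, 1.33) of class 1 at distance ≈ 0.311, (1.22, 1.27) of class 0 at ≈ 0.352 and (2.10, 1.45) of class 0 at ≈ 1.238. Class 0 therefore scores ≈ 2.84 + 0.81 = 3.65 against ≈ 3.22 for class 1, so the imputed category is 0 even though the single closest row belongs to class 1.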

 

Imputing missing values with a KNN classifier:

# Load libraries
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Create feature matrix with categorical feature
X = np.array([[0, 2.10, 1.45],
              [1, 1.18, 1.33],
              [0, 1.22, 1.27],
              [1, -0.21, -1.19]])

# Create feature matrix with missing values in the categorical feature
X_with_nan = np.array([[np.nan, 0.87, 1.31],
                       [np.nan, -0.67, -0.22]])

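# Train KNN learner: predict the categorical column (column 0)
# from the two numeric columns
clf = KNeighborsClassifier(n_neighbors=3, weights='distance')
trained_model = clf.fit(X[:,1:], X[:,0])

# Predict the missing class values from the rows' numeric features
imputed_values = trained_model.predict(X_with_nan[:,1:])

# Join the column of predicted classes with the rows' other features
X_with_imputed = np.hstack((imputed_values.reshape(-1,1), X_with_nan[:,1:]))

# Join the imputed rows and the complete rows into one feature matrix
X_complete = np.vstack((X_with_imputed, X))
print(X_complete)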
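In practice the rows with known and missing categories usually sit in one matrix rather than in two separate arrays. The following is a minimal sketch under that assumption: impute_categorical_knn is a hypothetical helper (not a scikit-learn function) that splits the rows with a boolean NaN mask, trains the same distance-weighted KNN classifier on the complete rows, and writes the predictions back into a copy. It assumes only the target column has missing values and all other columns are numeric.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def impute_categorical_knn(data, col=0, n_neighbors=3):
    """Hypothetical helper: fill NaNs in column `col` with KNN predictions.

    Assumes every other column is numeric and has no missing values.
    """
    data = np.asarray(data, dtype=float)
    result = data.copy()

    missing = np.isnan(data[:, col])              # rows whose category is unknown
    other_cols = np.arange(data.shape[1]) != col  # boolean mask of feature columns

    if not missing.any():
        return result  # nothing to impute

    clf = KNeighborsClassifier(n_neighbors=n_neighbors, weights='distance')
    # Train only on rows where the category is known
    clf.fit(data[~missing][:, other_cols], data[~missing, col])

    # Predict the category for the incomplete rows and write it back
    result[missing, col] = clf.predict(data[missing][:, other_cols])
    return result


# Same toy data as in the example above, combined into one matrix
data = np.array([[0, 2.10, 1.45],
                 [1, 1.18, 1.33],
                 [0, 1.22, 1.27],
                 [1, -0.21, -1.19],
                 [np.nan, 0.87, 1.31],
                 [np.nan, -0.67, -0.22]])

filled = impute_categorical_knn(data, col=0, n_neighbors=3)
print(filled[:, 0])
```

On this toy data the last two rows are imputed as 0 and 1, the same result as the two-array version. Note that there must be at least n_neighbors complete rows, otherwise predict() raises an error.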
