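Linear regression is a fundamental machine learning algorithm for regression problems. It predicts a continuous target variable from one or more input features by fitting a linear function whose coefficients minimize the sum of squared errors (ordinary least squares). Let's implement linear regression in Python with the scikit-learn library: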
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import numpy as np
# Sample data
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])
# Split the data into training and testing sets (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
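As a cross-check on what LinearRegression fits, the sketch below solves the same ordinary least-squares problem directly with NumPy's np.linalg.lstsq. It assumes the full five-point toy dataset is used (no train/test split) so the numbers are easy to verify by hand: for this data the best-fit line is roughly y = 2.2 + 0.6x, and both approaches should agree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same toy data as above, fit on all five points (no split)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([2, 4, 5, 4, 5])

# Design matrix: a column of ones (for the intercept) next to the feature column
X_design = np.hstack([np.ones((X.shape[0], 1)), X])

# Least squares: find beta minimizing ||X_design @ beta - y||^2
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slope = beta

model = LinearRegression().fit(X, y)

print(intercept, slope)                  # approximately 2.2 0.6
print(model.intercept_, model.coef_[0])  # approximately 2.2 0.6
```

The column of ones is what lets a plain least-squares solver learn the intercept; LinearRegression handles this internally through its fit_intercept option.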
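Logistic regression is a widely used algorithm for binary classification. It models the probability that an instance belongs to a particular class by applying the logistic (sigmoid) function to a linear combination of the features. Here is a code example using scikit-learn: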
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# Sample data
X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [0, 0, 1, 1, 1]
# Split the data into training and testing sets (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
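The probability model can be checked directly on a fitted estimator. This sketch, which fits on all five toy points for simplicity (an assumption, not part of the example above), computes the linear score z = w·x + b from coef_ and intercept_, applies the sigmoid 1 / (1 + e^(-z)), and compares the result with predict_proba. For a binary problem, predict returns class 1 exactly when that probability exceeds 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same toy data as above, fit on all five points for simplicity
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y = np.array([0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Linear score z = w . x + b for every sample
z = X @ model.coef_.ravel() + model.intercept_[0]

# Sigmoid turns the score into P(y = 1 | x)
p_manual = 1.0 / (1.0 + np.exp(-z))
p_sklearn = model.predict_proba(X)[:, 1]
print(np.allclose(p_manual, p_sklearn))  # True

# For a binary problem, predict() returns class 1 when P(y = 1 | x) > 0.5
print(np.array_equal((p_manual > 0.5).astype(int), model.predict(X)))  # True
```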
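Decision trees are versatile algorithms for both classification and regression. They recursively split the dataset, choosing at each node the feature and threshold that best separate the target values. Here is a code example that builds a decision tree for classification with scikit-learn: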
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
# Sample data
X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [0, 0, 1, 1, 1]
# Split the data into training and testing sets (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the decision tree classifier
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
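To make "best split" concrete: DecisionTreeClassifier scores candidate splits by Gini impurity by default, 1 - sum(p_k^2), where p_k is the fraction of samples in class k. The sketch below uses a hypothetical gini helper to compute the root impurity of the toy labels and the weighted impurity after one candidate split, then prints the split the fitted tree actually chose with export_text. Fitting on all five points is an assumption made for readability.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Same toy data as above, fit on all five points for simplicity
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y = np.array([0, 0, 1, 1, 1])

def gini(labels):
    """Hypothetical helper: Gini impurity 1 - sum(p_k^2) over class proportions p_k."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# Root node: proportions 0.4 and 0.6, so 1 - (0.16 + 0.36) = 0.48
print(gini(y))

# Candidate split on feature_0 <= 2.5: both children are pure
left, right = y[X[:, 0] <= 2.5], y[X[:, 0] > 2.5]
weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
print(weighted)  # 0.0, the largest possible impurity decrease here

# The fitted tree stores the same root impurity and shows its chosen split
model = DecisionTreeClassifier(random_state=0).fit(X, y)
print(model.tree_.impurity[0])
print(export_text(model, feature_names=["feature_0", "feature_1"]))
```

Because the two features move together in this toy data, a split on feature_1 <= 3.5 is just as pure, so the printed tree may use either feature; in both cases it needs only one split.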
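A random forest is an ensemble learning method that combines many decision trees, each trained on a bootstrap sample of the data with a random subset of features considered at each split, to improve predictive accuracy and reduce overfitting. Let's implement a random forest classifier with scikit-learn: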
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Sample data
X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [0, 0, 1, 1, 1]
# Split the data into training and testing sets (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the Random Forest classifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
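The "combining" step can be inspected on a fitted forest. In scikit-learn, RandomForestClassifier.predict_proba averages the class probabilities of the individual trees stored in estimators_, and predict picks the class with the highest average. The sketch below recomputes that average by hand; 10 trees, random_state=0, and fitting on all five toy points are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Same toy data as above, fit on all five points for simplicity
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y = np.array([0, 0, 1, 1, 1])

# 10 trees; each is grown on a bootstrap sample with random feature subsets
forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# Class probabilities from every tree: shape (n_trees, n_samples, n_classes)
tree_probas = np.array([tree.predict_proba(X) for tree in forest.estimators_])

# The forest's probability is the mean over trees
manual = tree_probas.mean(axis=0)
print(np.allclose(manual, forest.predict_proba(X)))  # True
```

Averaging probabilities (soft voting) rather than counting hard votes lets trees that are unsure contribute less to the final decision.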
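Support Vector Machines (SVM)
Support vector machines are powerful algorithms for classification and regression. They aim to find the hyperplane that separates the classes with the largest possible margin. Let's create an SVM classifier with scikit-learn: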
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
# Sample data
X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [0, 0, 1, 1, 1]
# Split the data into training and testing sets (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the SVM classifier
model = SVC()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
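In this example, we train an SVM classifier for a classification task. Note that SVC() uses the RBF kernel by default, so the separating hyperplane lives in an implicit feature space and the resulting decision boundary can be nonlinear in the original features.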
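To make the hyperplane concrete, the sketch below switches to kernel="linear" (an assumption; the example above uses the default RBF kernel), where the fitted hyperplane w·x + b = 0 is exposed through coef_ and intercept_. It prints the support vectors, checks that decision_function equals w·x + b, and computes the margin width 2 / ||w||. C=10.0 is an arbitrary choice, large enough that on this separable toy data the soft-margin solution should coincide with the hard-margin one.

```python
import numpy as np
from sklearn.svm import SVC

# Same toy data as above, fit on all five points for simplicity
X = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y = np.array([0, 0, 1, 1, 1])

# Linear kernel (an assumption) so the hyperplane w . x + b = 0 is explicit
model = SVC(kernel="linear", C=10.0).fit(X, y)
w, b = model.coef_[0], model.intercept_[0]

# Only the support vectors (points on or inside the margin) determine w and b
print(model.support_vectors_)

# decision_function is the signed score w . x + b; positive means class 1
scores = X @ w + b
print(np.allclose(scores, model.decision_function(X)))  # True

# Distance between the two margin hyperplanes w . x + b = +1 and -1
margin = 2 / np.linalg.norm(w)
print(margin)
```

For this data the closest opposite-class points are (2, 3) and (3, 4), which are sqrt(2) apart, so the printed margin should come out close to 1.41, up to solver tolerance.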
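k-Nearest Neighbors (KNN)
K-nearest neighbors is a simple yet effective algorithm for classification and regression. For classification, it assigns a data point to the majority class among its k nearest neighbors; for regression, it averages their target values. Here is a code example using scikit-learn: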
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
# Sample data
X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [0, 0, 1, 1, 1]
# Split the data into training and testing sets (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the KNN classifier
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
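This code shows how to use the k-nearest neighbors algorithm for classification and how to set the number of neighbors, k, with the n_neighbors parameter.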
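The majority-vote rule fits in a few lines. The sketch below uses a hypothetical helper, knn_predict, with plain Euclidean distance (matching KNeighborsClassifier's default Minkowski metric with p=2) and uniform votes. It treats all five toy points as training data and classifies a hypothetical query point (2.2, 3.2), whose three nearest neighbors carry labels 0, 1 and 0.

```python
from collections import Counter

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Same toy data as above, all five points used as training data
X_train = np.array([[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]])
y_train = np.array([0, 0, 1, 1, 1])

def knn_predict(X_train, y_train, x, k=3):
    """Hypothetical helper: label x by majority vote among its k nearest neighbors."""
    # Euclidean distance from x to every training point
    distances = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbors
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]

x_new = np.array([2.2, 3.2])  # hypothetical query point
print(knn_predict(X_train, y_train, x_new, k=3))  # 0

model = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(model.predict([x_new])[0])  # 0
```

Choosing an odd k for a two-class problem, as here, avoids tied votes.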
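Naive Bayes is a probabilistic algorithm, commonly used for text classification and spam filtering, that applies Bayes' theorem under the assumption that features are conditionally independent given the class. Here is a code example that builds a Naive Bayes classifier with scikit-learn: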
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
# Sample data
X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [0, 0, 1, 1, 1]
# Split the data into training and testing sets (random_state makes the split reproducible)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Create and train the Naive Bayes classifier
model = GaussianNB()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
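The example above uses GaussianNB, which assumes each feature is normally distributed within each class and therefore suits continuous inputs. For text classification and spam filtering, where features are word counts, scikit-learn's MultinomialNB is the usual variant. The sketch below pairs it with CountVectorizer on a tiny corpus; the messages and labels are hypothetical, made up only to illustrate the workflow.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: 1 = spam, 0 = not spam
messages = [
    "win a free prize now",
    "free money claim your prize",
    "meeting moved to tomorrow",
    "can we review the report tomorrow",
]
labels = [1, 1, 0, 0]

# Turn each message into a vector of word counts
vectorizer = CountVectorizer()
X_counts = vectorizer.fit_transform(messages)

# MultinomialNB models word counts per class; alpha=1.0 is Laplace smoothing
model = MultinomialNB(alpha=1.0)
model.fit(X_counts, labels)

new_messages = ["claim your free prize", "review the meeting report"]
print(model.predict(vectorizer.transform(new_messages)))  # [1 0]
```

The smoothing term alpha keeps a word that never appeared in one class from forcing that class's probability to zero.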