Multiclass Classification Algorithm from Scratch with a Project in Python: Step-by-Step Guide
Logistic regression is a very popular machine learning technique. We use logistic regression when the dependent variable is categorical. This article will focus on the implementation of logistic regression for multiclass classification problems. I am assuming that you already know how to implement binary classification with logistic regression.
If you haven't worked on binary classification with logistic regression yet, I suggest you go through that article before you dive into this one.
Because multiclass classification is built on top of binary classification.
You will learn the concepts, formulas, and a working example of binary classification in that article:
Multiclass Classification
The implementation of multiclass classification follows the same ideas as binary classification. As you know, in binary classification we solve a yes-or-no problem. In the example from the article mentioned above, the output answered the question of whether a person has heart disease or not. We had only two classes: heart disease and no heart disease.
If the output is 1, the person has heart disease, and if the output is 0, the person does not have heart disease.
In multiclass classification, we have more than two classes. Here is an example: say we have the different features and characteristics of cars, trucks, bikes, and boats as input features. Our job is to predict the label (car, truck, bike, or boat).
How do we solve this?
We will treat each class as a separate binary classification problem, the same way we solved the heart disease or no heart disease problem.
This approach is called the one-vs-all method.
In the one-vs-all method, when we work on one class, that class is denoted by 1 and the rest of the classes become 0.
For example, suppose we have four classes: cars, trucks, bikes, and boats. When we work on the car class, we label the cars as 1 and the rest of the classes as zeros. Again, when we work on the truck class, the truck elements will be ones, and the rest of the classes will be zeros.
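As a minimal sketch of this labeling trick (using a hypothetical list of toy labels, not the dataset we will use later), here is how one multiclass label column turns into one binary target vector per class:

import numpy as np

# Hypothetical toy labels, just to illustrate one-vs-all
labels = np.array(['car', 'truck', 'bike', 'car', 'boat', 'truck'])

# Build one binary target vector per class
for cl in np.unique(labels):
    binary_target = (labels == cl).astype(int)  # 1 for this class, 0 for the rest
    print(cl, binary_target)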
It will be easier to understand once you implement it. I suggest you keep coding and running the code as you read.
Here, I will implement this algorithm in two different ways:
1. The gradient descent approach.
2. The optimization function approach.
Important equations and how they work:
Logistic regression uses the sigmoid function to predict the output. The sigmoid function returns a value from 0 to 1:

$$h = \frac{1}{1 + e^{-z}}$$

Generally, we take a threshold, such as 0.5. If the sigmoid function returns a value greater than or equal to 0.5, we take it as 1, and if it returns a value less than 0.5, we take it as 0.
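Here is a quick standalone sketch of the sigmoid and the 0.5 threshold (the sample z values below are arbitrary picks for illustration):

import numpy as np

def sigmoid(z):
    # Maps any real number into the (0, 1) interval
    return 1 / (1 + np.exp(-z))

z = np.array([-2.0, -0.3, 0.0, 0.8, 3.5])
probs = sigmoid(z)
preds = (probs >= 0.5).astype(int)  # apply the 0.5 threshold
print(probs)  # values strictly between 0 and 1
print(preds)  # [0 0 1 1 1]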
Here, z is the input features multiplied by a randomly initialized parameter vector denoted as theta:

$$z = \theta X$$
X is the input feature. In most cases, there are several input features, so this formula becomes bigger:

$$z = \theta_0 + \theta_1 X_1 + \theta_2 X_2 + \theta_3 X_3 + \ldots$$
X1, X2, and X3 are input features, and one theta is randomly initialized for each input feature. The Theta0 at the beginning is the bias term.
The goal of the algorithm is to update theta with each iteration so that it can establish a relationship between the input features and the output label.
Cost Function and Gradient Descent
The cost function gives an idea of how far our prediction is from the actual output. Here is the formula for that:

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - h^{(i)}\right)\right]$$
Here,

m is the number of training examples,

y is the original output label,

h is the hypothesis or the predicted output.
This is the equation for gradient descent. Using this formula, we update the theta values in each iteration:

$$\theta_j := \theta_j - \frac{\alpha}{m}\sum_{i=1}^{m}\left(h^{(i)} - y^{(i)}\right)X_j^{(i)}$$
Implementation With the Gradient Descent Method
Prerequisites:

a. You need to be able to read and write Python code comfortably.

b. Basic knowledge of the NumPy and Pandas libraries.
Here, I am going to show the implementation step by step.
Import the necessary packages and the dataset. I took the dataset from Andrew Ng's Machine Learning course on Coursera. This is a handwriting recognition dataset with digits from 1 to 10.
From the dataset of pixels, we need to recognize the digits. In this dataset, the input variables and output variables are organized in different sheets of an Excel file. Please feel free to download the dataset from the link at the end of this page.
If you are reading this to learn the algorithm, please run each piece of code as you go.
1. Let's import the necessary packages and the dataset:
import pandas as pd
import numpy as np
xl = pd.ExcelFile('ex3d1.xlsx')
df = pd.read_excel(xl, 'X', header=None)
2. Import y, which is the output variable:
y = pd.read_excel(xl, 'y', header=None)
3. Define the hypothesis function that takes the input variables and theta. It returns the calculated output variable:
def hypothesis(theta, X):
    # Sigmoid of theta·X; the tiny offset keeps the output strictly below 1,
    # so log(1 - h) in the cost function never hits log(0)
    return 1 / (1 + np.exp(-(np.dot(theta, X.T)))) - 0.0000001
4. Build the cost function that takes the input variables, output variable, and theta. It returns the cost of the hypothesis, that is, an idea of how far the predictions are from the original outputs:
def cost(X, y, theta):
    y1 = hypothesis(theta, X)
    return -(1/len(X)) * np.sum(y*np.log(y1) + (1-y)*np.log(1-y1))
5. Now, it's time for data preprocessing.
The data is clean, so not much preprocessing is required. We need to add a bias column to the input variables. Please check the lengths of df and y. If the lengths are different, the model will not work.
print(len(df))
print(len(y))

# Prepend a column of ones (the bias term) to the input features
X = pd.concat([pd.Series(1, index=df.index, name='00'), df], axis=1)
6. The y column has the digits from 1 to 10, which means we have 10 classes.
We will make one column for each of the classes, with the same length as y. When the class is 5, the corresponding column has 1 in the rows where y is 5, and 0 otherwise. We will do it programmatically with some simple code:
# y was read as a one-column DataFrame; take that column as a Series
y = y[0]

# Start from all zeros: one row per example, one column per class
y1 = np.zeros([len(y), len(y.unique())])
y1 = pd.DataFrame(y1)

for i in range(0, len(y.unique())):
    for j in range(0, len(y1)):
        if y[j] == y.unique()[i]:
            y1.iloc[j, i] = 1
        else:
            y1.iloc[j, i] = 0
y1.head()
7. Now, define the function 'gradient_descent'. This function takes the input variables, output variable, theta, alpha, and the number of epochs as parameters. Here, alpha is the learning rate.
You should choose alpha per your requirement. A learning rate that is too small or too big can make your algorithm slow. I like to run the algorithm for different learning rates to get an idea of the right one; it may take a few tries to select a good learning rate.
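For instance, once the 'gradient_descent' function from the next step is defined, a rough sweep like the sketch below (the candidate rates are arbitrary picks for illustration) shows which rate drives the cost down fastest after a handful of epochs:

# Compare a few candidate learning rates on a short training run
for alpha in [0.001, 0.01, 0.02, 0.05]:
    theta_try = np.zeros([X.shape[1], y1.shape[1]])
    theta_try = gradient_descent(X, y1, theta_try, alpha, 5)  # 5 epochs only
    # Cost of the first one-vs-all classifier as a quick proxy
    print(alpha, cost(X, y1.iloc[:, 0], theta_try.iloc[:, 0]))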
For each of the columns in y1, we will implement a binary classification.
For example, when I am considering the digit 2, the classifier should return 1 for digit 2 and 0 for the rest of the digits. So, as we have 10 classes, we run the update 10 times in each epoch (iteration). That is why we have a nested for loop here:
def gradient_descent(X, y, theta, alpha, epochs):
    m = len(X)
    theta = pd.DataFrame(theta)
    for i in range(0, epochs):
        for j in range(0, 10):
            # Binary classifier for class j: column j of y1 and column j of theta
            h = hypothesis(theta.iloc[:, j], X)
            for k in range(0, theta.shape[0]):
                theta.iloc[k, j] -= (alpha/m) * np.sum((h - y.iloc[:, j]) * X.iloc[:, k])
    return theta
8. Initialize the theta. Remember, we implement logistic regression for each class, so there will be a series of thetas for each class as well.
I am running this for 1500 epochs. I am sure the accuracy rate will be higher with more epochs.
theta = np.zeros([df.shape[1]+1, y1.shape[1]])  # one row per feature (plus bias), one column per class
theta = gradient_descent(X, y1, theta, 0.02, 1500)
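As an optional sanity check, you can print the cost of each of the 10 binary classifiers with the learned theta; every value should be well below the cost at initialization:

# Cost of each one-vs-all classifier after training
for j in range(y1.shape[1]):
    print(j, cost(X, y1.iloc[:, j], theta.iloc[:, j]))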
9. With this updated theta, calculate the output variable:
output = []
for i in range(0, 10):
    theta1 = pd.DataFrame(theta)
    h = hypothesis(theta1.iloc[:, i], X)
    output.append(h)
output = pd.DataFrame(output)  # 10 rows (one per class) by m columns (one per example)
10. Compare the calculated output with the original output variable to calculate the accuracy of the model:
accuracy = 0
for col in range(0, 10):
    for row in range(len(y1)):
        # Count a hit when the true class column is 1 and its classifier fires
        if y1.iloc[row, col] == 1 and output.iloc[col, row] >= 0.5:
            accuracy += 1
accuracy = accuracy/len(X)
The accuracy is 72%. I am sure the accuracy would be better with more epochs, but because it takes so much time, I did not rerun the algorithm.
If you are running this, feel free to try more epochs and let me know in the comment section how much accuracy you got.
Instead of the gradient descent approach, you can also use an optimization function that is already built in for you.
In this approach, you use an optimization function to optimize the theta for the algorithm. It is a much faster approach.
Implementation With an Optimization Function
1. We are going to use the same dataset as before. Import the dataset with a different name if you are using the same notebook:
xls = pd.ExcelFile('ex3d1.xlsx')
df = pd.read_excel(xls, 'X', header=None)
2. We still need to add a column of all ones to df for the bias term:
X = np.c_[np.ones((df.shape[0], 1)), df]
3. Import the data for y:
y = pd.read_excel(xls, 'y', header=None)
As this is a DataFrame, take column zero as a series and make it two-dimensional, so its dimension matches the dimension of X:
y = y[0]
y = y.values[:, np.newaxis]  # NumPy column vector of shape (m, 1)
Here, y has one column only. Make it 10 columns for 10 classes, where each column deals with one class. For example, when we deal with class 10, we put 1 in the rows where y is 10 and replace the rest of the values with zeros. Here is the function y_change, which takes y itself and a class (such as 3). It then replaces 3 with 1 and all the other classes with 0. This function will be used soon in the later steps.
def y_change(y, cl):
    # Binarize y for the one-vs-all classifier of class cl
    y_pr = []
    for i in range(0, len(y)):
        if y[i] == cl:
            y_pr.append(1)
        else:
            y_pr.append(0)
    return y_pr
Data preparation is complete. Now develop the model:
4. Define the hypothesis function. This is the same as in the previous method:
def hypothesis(X, theta):
    z = np.dot(X, theta)
    return 1/(1+np.exp(-(z)))
5. Develop the cost function. This one is also the same as in the previous method:
def cost_function(theta, X, y):
    m = X.shape[0]
    y1 = hypothesis(X, theta)
    return -(1/m) * np.sum(y*np.log(y1) + (1-y)*np.log(1-y1))
6. Define the gradient. This one is different: this function defines how to update theta:
def gradient(theta, X, y):
    m = X.shape[0]
    y1 = hypothesis(X, theta)
    return (1/m) * np.dot(X.T, y1 - y)
7. Now, import the optimization function and initialize theta. I am taking zeros as the initial theta values. Any other values should work as well:
from scipy.optimize import fmin_tnc
theta = np.zeros((X.shape[1], 1))
8. Let's make a fit function that takes X, y, and theta as input. It will use the optimization function and output the optimized theta for us.
It takes these three parameters:

i. a function that needs to be minimized,

ii. a parameter to be optimized, and

iii. arguments to use for the optimization.
In this example, the cost function should be minimized, and theta needs to be optimized for that. The input and output variables X and y are the arguments to use.
This optimization function takes another parameter: the gradient. It is optional, but here we have a function for the gradient, so we pass it along:
def fit(X, y, theta):
    # Minimize cost_function starting from theta, using the analytic gradient;
    # (X, y) are passed through to both functions via args
    opt_weights = fmin_tnc(func=cost_function, x0=theta,
                           fprime=gradient, args=(X, y.flatten()))
    return opt_weights[0]
9. Use this fit method to find the optimized theta. We have to optimize the theta for each class separately. Let's develop a function where, for each class, y is modified accordingly using the y_change method from step 3:
def find_param(X, y, theta):
    y_uniq = list(set(y.flatten()))
    theta_list = []
    for i in y_uniq:
        # Binarize y for class i, as a column vector
        y_tr = np.array(y_change(y, i))[:, np.newaxis]
        theta1 = fit(X, y_tr, theta)
        theta_list.append(theta1)
    return theta_list
Use this method to find the final theta:
theta_list = find_param(X, y, theta)
10. It's time to predict the output. We have to predict the classes individually as well:
def predict(theta_list, x, y):
    y_uniq = list(set(y.flatten()))
    y_hat = [0]*len(y)
    for i in range(0, len(y_uniq)):
        y_tr = y_change(y, y_uniq[i])
        y1 = hypothesis(x, theta_list[i])
        for k in range(0, len(y)):
            # Assign class i where y is class i and its classifier fires
            if y_tr[k] == 1 and y1[k] >= 0.5:
                y_hat[k] = y_uniq[i]
    return y_hat
Use the predict method above to calculate the predicted output y_hat:
y_hat = predict(theta_list, X, y)
11. Calculate the accuracy:
accuracy = 0
for i in range(0, len(y)):
    if y_hat[i] == y.flatten()[i]:
        accuracy += 1
print(accuracy/len(df)*100)
This process gives 100% accuracy. Now you can decide for yourself which method of logistic regression you want to use for your projects.
This same problem is also solved with a neural network in this article, which shows how to develop a neural network from scratch:
Please ask me if you have any questions in the comment section. Check this GitHub page for the dataset:
Here is the link to the code for the gradient descent method:
Here is the GitHub link for the optimization function method:
Recommended Reading:
Translated from: https://towardsdatascience.com/multiclass-classification-algorithm-from-scratch-with-a-project-in-python-step-by-step-guide-485a83c79992