Spearman correlation
This article is about Spearman's correlation and its implication in machine learning. In my previous article, I discussed Pearson's correlation coefficient, and we wrote code to show how useful it is. You may be wondering why we need Spearman's correlation when we already have Pearson's correlation for measuring the correlation between the feature values and the target values. The answer is that Pearson's correlation only captures linear relationships, whereas Spearman's correlation also works for non-linear relationships, as long as they are monotonic (the target keeps rising, or keeps falling, as the feature grows).
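Here is a minimal sketch of the linear-versus-monotonic distinction. It uses hypothetical toy data, not the headbrain dataset. The relationship y = x³ always rises but is not a straight line. The sketch compares pandas' `DataFrame.corr` with `method="pearson"` against `method="spearman"`.

```python
import pandas as pd

# Hypothetical toy data: y increases monotonically, but not linearly, with x
x = list(range(1, 11))
df = pd.DataFrame({"x": x, "y": [v ** 3 for v in x]})

# Pearson measures how well a straight line fits the raw values
pearson = df.corr(method="pearson").loc["x", "y"]
# Spearman works on ranks, so any strictly increasing relationship scores 1
spearman = df.corr(method="spearman").loc["x", "y"]

print(f"Pearson:  {pearson:.3f}")
print(f"Spearman: {spearman:.3f}")
```

On this data the script prints `Pearson:  0.928` and `Spearman: 1.000`. Pearson penalises the curvature, while Spearman only checks that the order of y follows the order of x.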
Another advantage of Spearman's correlation is that it works on ranks rather than raw values. This makes it well suited to continuous as well as discrete (ordinal) datasets.
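Ranking is what lets the coefficient cope with discrete data: tied values share the average of the ranks they occupy. The sketch below uses hypothetical ratings and scores. It relies on pandas' `Series.rank()`, whose default `method="average"` gives tied values their mean rank. It also checks that Spearman's rho equals Pearson's r computed on the ranks.

```python
import pandas as pd

# Hypothetical discrete data with ties, e.g. ratings on a 1-5 scale
ratings = pd.Series([3, 1, 4, 1, 5])
ranks = ratings.rank()  # the two 1s share ranks 1 and 2, so both get 1.5
print(ranks.tolist())   # [3.0, 1.5, 4.0, 1.5, 5.0]

# Spearman's rho is Pearson's r applied to the ranks of both variables
df = pd.DataFrame({"rating": ratings, "score": [30, 12, 41, 15, 38]})
rho_builtin = df.corr(method="spearman").loc["rating", "score"]
rho_by_hand = df["rating"].rank().corr(df["score"].rank())  # Pearson on ranks
print(round(rho_builtin, 4), round(rho_by_hand, 4))
```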
Image source: https://digensia.files.wordpress.com/2012/04/s1.png
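Here, d_i is the difference between the rank of the i-th feature value and the rank of the i-th target value (X = feature values, Y = target values). It is the difference of the ranks, not of the raw values X - Y.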
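The formula image is not reproduced in this text. The standard textbook form, which is presumably what it shows, is ρ = 1 − 6·Σd_i² / (n(n² − 1)), where n is the number of observations. This shortcut is exact only when there are no tied values; with ties, compute Pearson's r on the ranks instead. A minimal pure-Python sketch on hypothetical data follows. The helper names `ranks` and `spearman_no_ties` are made up for this illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_no_ties(x, y):
    """Spearman's rho via rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    n = len(x)
    d = [rx - ry for rx, ry in zip(ranks(x), ranks(y))]  # rank differences d_i
    return 1 - 6 * sum(di * di for di in d) / (n * (n * n - 1))


# Hypothetical feature values X and target values Y (no ties)
X = [1.2, 2.5, 3.1, 4.8, 5.0]
Y = [10.0, 8.5, 20.3, 15.1, 30.7]
rho = spearman_no_ties(X, Y)
print(rho)  # 0.8
```

Here the ranks are [1, 2, 3, 4, 5] and [2, 1, 4, 3, 5], so d = [-1, 1, -1, 1, 0]. That gives Σd² = 4 and ρ = 1 − 24/120 = 0.8.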
The dataset used can be downloaded from here: headbrain4.csv
Since we are using a continuous dataset (the same one we used for Pearson's correlation), you will not see much difference between the Pearson and Spearman coefficients here. The two diverge more in three situations: the relationship is monotonic but non-linear, the data contain outliers, or the data are discrete with many tied values. Try such a dataset to see the difference.
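To compare the two coefficients on any dataset, you can put them side by side. The helper `compare_correlations` below is hypothetical (it is not part of pandas or the original code). It is a minimal sketch built on `DataFrame.corr` and uses made-up toy data: one column is linear in the target and one is only monotonic.

```python
import pandas as pd


def compare_correlations(df, target):
    """Return Pearson and Spearman correlations of each numeric column with `target`."""
    numeric = df.select_dtypes("number")
    table = pd.DataFrame({
        "pearson": numeric.corr(method="pearson")[target],
        "spearman": numeric.corr(method="spearman")[target],
    })
    return table.drop(index=target)  # drop the trivial self-correlation row


# Hypothetical toy data: "linear" is linear in the target, "cubic" only monotonic
toy = pd.DataFrame({
    "target": [1, 2, 3, 4, 5, 6],
    "linear": [2, 4, 6, 8, 10, 12],
    "cubic": [1, 8, 27, 64, 125, 216],
})
result = compare_correlations(toy, "target")
print(result)
```

Both coefficients are 1 for the linear column. For the cubic column only Spearman stays at 1. With the article's data, you could call `compare_correlations(pd.read_csv('headbrain4.csv'), 'Brain Weight(grams)')`.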
So now, let us see how to use Spearman's correlation in a machine learning program written in Python:
```python
# -*- coding: utf-8 -*-
"""
Created on Sun Jul 29 22:21:12 2018
@author: Raunak Goswami
"""
import pandas as pd
import matplotlib.pyplot as plt

# Reading the data.
# Here the code and headbrain4.csv are in the same directory; make sure
# both files are stored in the same folder (or pass the full path).
data = pd.read_csv('headbrain4.csv')

# Show the first five records of the data
print(data.head())

# w holds the feature values, i.e. Gender
w = data.iloc[:, 0:1].values
# y holds the feature values, i.e. Age Range
y = data.iloc[:, 1:2].values
# x holds the feature values, i.e. head size
x = data.iloc[:, 2:3].values
# z holds the target values, i.e. brain weight
z = data.iloc[:, 3:4].values

# Spearman correlation between Gender and brain weight, rounded to 3 decimals
# (a bare round() would collapse every coefficient to -1, 0 or 1).
# Series.corr with method='spearman' needs SciPy installed (Anaconda includes it).
print(round(data['Gender'].corr(data['Brain Weight(grams)'], method='spearman'), 3))
plt.scatter(w, z, c='red')
plt.title('Scatter graph for Spearman correlation between gender and brain weight')
plt.xlabel('Gender')
plt.ylabel('brain weight')
plt.show()

# Spearman correlation between Age Range and brain weight
print(round(data['Age Range'].corr(data['Brain Weight(grams)'], method='spearman'), 3))
# Plot y (Age Range), not x (head size), against brain weight
plt.scatter(y, z, c='red')
plt.title('Scatter graph for Spearman correlation between age range and brain weight')
plt.xlabel('age range')
plt.ylabel('brain weight')
plt.show()

# Spearman correlation between head size and brain weight
print(round(data['Head Size(cm^3)'].corr(data['Brain Weight(grams)'], method='spearman'), 3))
plt.scatter(x, z, c='red')
plt.title('Scatter graph for Spearman correlation between head size and brain weight')
plt.xlabel('head size')
plt.ylabel('brain weight')
plt.show()

# Column names, dtypes and non-null counts
data.info()

# Pearson correlation (the default method), for comparison with Spearman
print(data['Head Size(cm^3)'].corr(data['Brain Weight(grams)']))

# Spearman coefficients for every pair of columns
k1 = data.corr(method='spearman')
print("The table for all possible values of Spearman's coefficients is as follows")
print(k1)
```
After you run the code in Spyder (the IDE that ships with the Anaconda distribution), go to the Variable Explorer and find the variable named k1. Double-click it to see its values. The same table is also printed to the console by the script's last line.
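Here, 1 signifies a perfect positive (monotonic) correlation, 0 signifies no monotonic correlation, and -1 signifies a perfect negative correlation. The diagonal entries of the table are always 1, because every column is perfectly correlated with itself.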
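To see these reference values, here is a minimal sketch with hypothetical toy columns (not taken from the dataset). One column always rises with x, one always falls, and one has no monotonic trend.

```python
import pandas as pd

# Hypothetical toy columns illustrating the three reference values
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "up": [1, 4, 9, 16, 25],      # always increases with x  -> +1
    "down": [50, 40, 35, 5, 1],   # always decreases with x  -> -1
    "mixed": [2, 5, 3, 1, 4],     # no monotonic trend       ->  0
})
rho = df.corr(method="spearman")["x"]
print(rho.round(3))
```

Note that "up" is not linear in x, yet it still scores +1, because only the ordering matters.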
Look carefully and you will see that the correlation between brain weight and head size is strongly positive, and it rounds to 1. If you remember, we got a similar value with Pearson's correlation.
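A note on reading printed coefficients: Python's built-in `round()` with no second argument rounds to the nearest integer. That is why a strong but imperfect correlation can print as exactly 1. The value 0.79 below is hypothetical, chosen only for illustration, and is not the dataset's actual coefficient.

```python
coefficient = 0.79  # hypothetical correlation value, for illustration only

print(round(coefficient))     # 1    -> rounded to the nearest integer
print(round(coefficient, 3))  # 0.79 -> rounded to 3 decimal places
```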
Now go to the IPython console, where you will see some fairly self-explanatory scatter graphs. If you have any trouble understanding them, have a look at my previous article about Pearson's correlation and its implication in machine learning.
That's all for today, guys. I hope you liked it. If you have any queries, just drop a comment below and I will be happy to help you.
Translated from: https://www.includehelp.com/ml-ai/spearmans-correlation-and-its-implication-in-machine-learning.aspx