Kendall Correlation Analysis: Kendall's Tau Correlation in Machine Learning

Before we begin, I hope you guys have a basic understanding of Pearson's and Spearman's correlation. As the name suggests, this correlation is named after Maurice Kendall, who introduced it in 1938.

This type of correlation is best suited to ordinal data, i.e. data whose values can be ranked. Unlike Spearman's correlation, which works with the differences between the ranks of the observations, Kendall's tau only asks whether each pair of observations is ordered the same way in both columns. In other words, it is concerned with concordant pairs and discordant pairs.

1. Concordant pairs

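For a given set of data, a pair of observations (x1, y1) and (x2, y2) is concordant if the two observations are in the same order in both columns: either x1 < x2 and y1 < y2, or x1 > x2 and y1 > y2. Equivalently, (x1 - x2)(y1 - y2) > 0. Here x1 and x2 are values of the attribute column and y1 and y2 are the corresponding values in the target column.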

2. Discordant pairs

A pair of observations is discordant if the two observations are in opposite order in the two columns: either x1 < x2 and y1 > y2, or x1 > x2 and y1 < y2. Equivalently, (x1 - x2)(y1 - y2) < 0. As before, x1 and x2 are attribute values and y1 and y2 are target values. A pair that is tied in either column (x1 = x2 or y1 = y2) is neither concordant nor discordant.
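Both definitions reduce to the sign of (x1 - x2)(y1 - y2), so the pairs can be counted mechanically. The minimal Python sketch below compares every pair of observations exactly once and counts the concordant and discordant ones. The function name `count_pairs` and the five (experience, salary) values are made up for illustration; they are not taken from the dataset used later.

```python
from itertools import combinations


def count_pairs(xs, ys):
    """Return (concordant, discordant) by comparing every pair once.

    Pairs tied in x or in y give a product of 0 and are counted as neither.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return concordant, discordant


# Made-up example values (not from Salary_Data.csv)
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 33, 50, 48]
print(count_pairs(experience, salary))  # (8, 2)
```

Of the 10 possible pairs, only (35, 33) and (50, 48) are out of order in salary, which gives 8 concordant and 2 discordant pairs. This brute-force loop does O(N²) comparisons, which is fine for small datasets.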

After counting the concordant and discordant pairs, we take the difference between the two counts and divide it by the total number of pairs that can be formed from the observations. Dividing by the number of pairs keeps Kendall's coefficient, tau, between -1 and 1, which makes it easy to judge whether a given attribute is useful for predicting the target value. As with other correlation coefficients, 0 indicates no correlation, 1 indicates perfect positive correlation (every pair is concordant) and -1 indicates perfect negative correlation (every pair is discordant).

The formula for Kendall's tau is given below:

$$\tau = \frac{N_c - N_d}{N(N-1)/2}$$

where N_c is the number of concordant pairs and N_d is the number of discordant pairs.

Here, N is the number of observations, and N(N-1)/2 is the number of distinct pairs that can be formed from them.
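To make the formula concrete, here is a minimal Python sketch that computes tau exactly as written above: the number of concordant pairs minus the number of discordant pairs, divided by N(N-1)/2. This version makes no adjustment for tied values and is commonly called tau-a. The helper name `kendall_tau_a` and the input values are made up for illustration.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / (N(N-1)/2), no tie adjustment."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    # product > 0: concordant, product < 0: discordant, 0: tie in x or y
    products = [
        (xs[i] - xs[j]) * (ys[i] - ys[j]) for i, j in combinations(range(n), 2)
    ]
    concordant = sum(1 for p in products if p > 0)
    discordant = sum(1 for p in products if p < 0)
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Made-up example values, for illustration only
print(kendall_tau_a([1, 2, 3, 4, 5], [30, 35, 33, 50, 48]))  # (8 - 2) / 10 -> 0.6
print(kendall_tau_a([1, 2, 3, 4], [10, 20, 30, 40]))  # every pair concordant -> 1.0
print(kendall_tau_a([1, 2, 3, 4], [40, 30, 20, 10]))  # every pair discordant -> -1.0
```

The last two calls show why dividing by N(N-1)/2 bounds the result: the numerator can be at most the total number of pairs (all concordant) and at least its negative (all discordant).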

Dataset description:

The dataset has two columns:

  1. YearsExperience

  2. Salary

The dataset records the salaries of different employees along with their years of experience in their field, so we will use correlation to find out the relationship between years of experience and salary.

The data set can be downloaded from here: Salary_Data.csv

Now, without wasting any time, let us write the Python code to compute this correlation.

Code:

# -*- coding: utf-8 -*-
"""
Created on Sun Jul 29 22:21:12 2018

@author: Raunak Goswami
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# reading the data
"""
Here the code and the Salary_Data.csv file are in the
same directory; make sure both files are stored in
the same folder.
"""
data=pd.read_csv('Salary_Data.csv')

# this will show the first five records of the data
print(data.head())

# w holds the feature values, i.e. years of experience
w = data.iloc[:, 0:1].values
# y holds the target values, i.e. salary
y = data.iloc[:, 1:2].values

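# Kendall's tau between years of experience and salary, rounded to 6 decimals
# (round() without a digits argument would round the value to an integer)
tau = data['YearsExperience'].corr(data['Salary'], method='kendall')
print(round(tau, 6))

# scatter plot of the two columns
plt.scatter(w, y, c='red')
plt.title('Scatter plot for Kendall correlation between years of experience and salary')
plt.xlabel('Years of experience')
plt.ylabel('Salary')
plt.show()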

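# column types and non-null counts
data.info()

# Pearson correlation (the default method), for comparison
print(data['YearsExperience'].corr(data['Salary']))

# Kendall's tau for every pair of columns
k1 = data.corr(method='kendall')
print("The table of Kendall's tau coefficients for all column pairs is as follows:")
print(k1)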
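
One caveat when comparing the printed value with the formula given earlier: pandas' `corr(method='kendall')` delegates to SciPy's `scipy.stats.kendalltau`, whose default is the tau-b variant. Tau-b replaces the denominator N(N-1)/2 with a tie-corrected one, sqrt((P - T_x)(P - T_y)), where P = N(N-1)/2 and T_x and T_y are the numbers of pairs tied in x and in y. If a column contains repeated values (for example, two employees with the same years of experience), the two versions can differ. The pure-Python sketch below illustrates the tie correction on made-up data; the helper name `kendall_tau_b` is hypothetical, and the O(N²) loop is meant for illustration, not as a replacement for the library routine.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b: (C - D) / sqrt(pairs untied in x * pairs untied in y)."""
    concordant = discordant = 0
    untied_x = untied_y = 0  # pairs with x_i != x_j, and pairs with y_i != y_j
    for i, j in combinations(range(len(xs)), 2):
        dx = xs[i] - xs[j]
        dy = ys[i] - ys[j]
        if dx != 0:
            untied_x += 1
        if dy != 0:
            untied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    denominator = sqrt(untied_x * untied_y)
    if denominator == 0:
        return float("nan")  # a constant column has no defined correlation
    return (concordant - discordant) / denominator


# Made-up data with one tie in x (two entries equal to 2)
experience = [1, 2, 2, 3]
salary = [10, 20, 30, 40]
print(round(kendall_tau_b(experience, salary), 4))  # 5 / sqrt(5 * 6) -> 0.9129
```

For the same made-up data, the tie-free formula gives 5/6 ≈ 0.8333, because the tied pair still counts in its denominator. When neither column has ties, both untied counts equal N(N-1)/2 and tau-b reduces to the earlier formula. If SciPy is installed, `scipy.stats.kendalltau(experience, salary)` should report the same tau-b value for this example.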

Output

[Figures 1 and 2: program output (images not reproduced)]

From the output, Kendall's tau correlation coefficient between years of experience and salary comes out to be 0.841016, which indicates a strong positive correlation: employees with more experience tend to have higher salaries. That's all for today, guys; I hope you liked this article. Keep learning.

Source: https://www.includehelp.com/ml-ai/kendalls-tau-correlation-in-machine-learning.aspx