LDA Python code (1)

 

#!/usr/bin/env python
# coding: utf-8

# Linear Discriminant Analysis (LDA) is most commonly used as a dimensionality reduction technique in the pre-processing step for pattern-classification and machine learning applications. The goal is to project a dataset onto a lower-dimensional space with good class separability, in order to avoid overfitting (the “curse of dimensionality”) and to reduce computational costs.

# Listed below are the 5 general steps for performing a linear discriminant analysis; we will explore them in more detail in the following sections.
# 
# 1. Compute the d-dimensional mean vectors for the different classes from the dataset.
# 2. Compute the scatter matrices (in-between-class and within-class scatter matrix).
# 3. Compute the eigenvectors and corresponding eigenvalues for the scatter matrices.
# 4. Choose the k eigenvectors corresponding to the k largest eigenvalues to form a d×k-dimensional matrix W (where every column represents an eigenvector).
# 5. Use this d×k eigenvector matrix to transform the samples onto the new subspace.
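# 
# For reference, scikit-learn's LinearDiscriminantAnalysis bundles all five steps into one estimator. The short sketch below (which assumes scikit-learn is installed; the names `url`, `banknotes`, and `projected` are ours, introduced for illustration) can serve as a cross-check for the manual computation in the rest of this notebook.

# In[ ]:


# optional cross-check of the manual LDA steps using scikit-learn
import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

url = "http://archive.ics.uci.edu/ml/machine-learning-databases/00267/data_banknote_authentication.txt"
banknotes = pd.read_csv(url, names=["var", "skewness", "curtosis", "entropy", "class"])
lda = LinearDiscriminantAnalysis(n_components=1)  # k = 1: two classes give at most one discriminant
projected = lda.fit_transform(banknotes.iloc[:, :4].values, banknotes["class"].values)
print(projected.shape)  # every sample mapped onto a single discriminant axis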

# In[1]:


# import all dependencies
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
get_ipython().run_line_magic('matplotlib', 'inline')


# Data description: The data were extracted from images taken from genuine and forged banknote-like specimens. An industrial camera usually used for print inspection was used for digitization. The final images have 400x400 pixels. Due to the object lens and the distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. A Wavelet Transform tool was used to extract features from the images.
# 
# Attribute Information:
# 1. variance of Wavelet Transformed image (continuous)
# 2. skewness of Wavelet Transformed image (continuous)
# 3. curtosis of Wavelet Transformed image (continuous)
# 4. entropy of image (continuous)
# 5. class (integer) (0 - not authentic, 1 - authentic)
# 

# In[2]:


columns = ["var","skewness","curtosis","entropy","class"]
df = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/00267/data_banknote_authentication.txt",index_col=False, names = columns)


# In[3]:


df.head()


# In[4]:


df.describe()


# In[5]:


# now look at the class distribution
from collections import Counter
Counter(df["class"])


# In[6]:


# split the data table into feature matrix X and class label vector y
# (DataFrame.ix was removed from pandas; iloc is the positional equivalent)
X = df.iloc[:, 0:4].values
y = df.iloc[:, 4].values


# In[7]:


f, ax = plt.subplots(1, 4, figsize=(10, 3))
# seaborn's distplot is deprecated; histplot with kde=True is the modern equivalent
sns.histplot(df["var"], bins=10, kde=True, ax=ax[0])
sns.histplot(df["skewness"], bins=10, kde=True, ax=ax[1])
sns.histplot(df["curtosis"], bins=10, kde=True, ax=ax[2])
sns.histplot(df["entropy"], bins=10, kde=True, ax=ax[3])
f.savefig('subplot.png')


# In[8]:


vis1 = sns.pairplot(df, hue="class")
vis1.savefig("lda.png")  # PairGrid objects expose savefig directly


# It should be mentioned that LDA assumes normally distributed data, statistically independent features, and identical covariance matrices for every class. However, these assumptions strictly matter only when LDA is used as a classifier; LDA for dimensionality reduction can work reasonably well even if they are violated, and even for classification tasks LDA can be quite robust to the distribution of the data.
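# 
# As a quick, optional sanity check of the identical-covariance assumption, we can compare the per-class covariance matrices of the four features (this check is illustrative and not part of the LDA computation itself):

# In[ ]:


# per-class 4x4 covariance matrices; rowvar=False tells np.cov that
# columns are variables and rows are observations
for c in df["class"].unique():
    cov_c = np.cov(df[df["class"] == c].iloc[:, 0:4].values, rowvar=False)
    print("class", c, "covariance matrix:\n", cov_c)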
# Step 1: Computing the d-dimensional mean vectors (here d = 4, i.e. the number of features)
# In[44]:


np.set_printoptions(precision=5)


# In[45]:


mean_vec = []
for c in df["class"].unique():  # classes in order of appearance
    mean_vec.append(df[df["class"] == c].iloc[:, 0:4].mean().values)
print(mean_vec)

# Step 2: Computing the Scatter Matrices
# The within-class scatter matrix S_W is computed by the following equation:
#
#     S_W = sum_i S_i
#
# where
#
#     S_i = sum_{x in class i} (x - m_i)(x - m_i)^T
#
# is the scatter matrix for class i and m_i is its mean vector.

# In[46]:


SW = np.zeros((4, 4))
for idx, c in enumerate(df["class"].unique()):  # iterate over the 2 classes
    per_class_sc_mat = np.zeros((4, 4))
    mv = mean_vec[idx].reshape(4, 1)
    for row in df[df["class"] == c].iloc[:, 0:4].values:  # only rows of class c
        row = row.reshape(4, 1)
        per_class_sc_mat += (row - mv).dot((row - mv).T)
    SW += per_class_sc_mat


# In[47]:


print('within-class Scatter Matrix:\n', SW)
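# The outline above also calls for the between-class scatter matrix and the eigendecomposition (steps 2-5). The sketch below continues from SW and mean_vec; the variable names (overall_mean, SB, W, X_lda) are ours, introduced for illustration.

# In[ ]:


# between-class scatter matrix S_B = sum_i N_i (m_i - m)(m_i - m)^T,
# where m is the overall mean and N_i the sample count of class i
overall_mean = df.iloc[:, 0:4].mean().values.reshape(4, 1)
SB = np.zeros((4, 4))
for idx, c in enumerate(df["class"].unique()):
    n_c = df[df["class"] == c].shape[0]
    mv = mean_vec[idx].reshape(4, 1)
    SB += n_c * (mv - overall_mean).dot((mv - overall_mean).T)

# Step 3: eigenvectors and eigenvalues of inv(S_W) * S_B
eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(SW).dot(SB))

# Step 4: keep the k = 1 eigenvector with the largest eigenvalue
# (with 2 classes there is at most 1 meaningful discriminant direction)
order = np.argsort(eig_vals.real)[::-1]
W = eig_vecs[:, order[:1]].real

# Step 5: project the samples onto the new 1-dimensional subspace
X_lda = X.dot(W)
print('projected shape:', X_lda.shape)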

 
