Data standardization, normalization, and regularization

(1)R

#data<-data.frame(x=c(1,2,3),y=c(2,3,4),z=c(3,4,5))
data<-read.csv(file.choose(),header=T, sep=",")
head(data)

# 1. z-score standardization=======================================
# Select the columns to standardize (center + scale)
data_z<-scale(data[,c(4:dim(data)[2])], center = T, scale = T) # both center and scale must be TRUE for unscale() to work
summary(data_z)
head(data_z)

# Restore the original values
library(DMwR)
unz_dat<-unscale(data_z, data_z) # one step: vals and norm.data are the same object
head(unz_dat)

# 2. Normalization=================================================

(2) Python

References:

  • https://www.cnblogs.com/chaosimple/p/4153167.html
  • https://www.jianshu.com/p/bb7f3d51d7f0
  • https://blog.csdn.net/jinping_shi/article/details/52433975 (explanation of L1 and L2)
  • https://blog.csdn.net/zyf89531/article/details/45922151 (how normalization, standardization, and regularization relate)
  • https://www.iteye.com/blog/lbingkuai-1632032 (how normalization, standardization, and regularization relate)
# -*- coding: utf-8 -*-
"""
Created on Tue Sep 10 11:16:35 2019

@author: xllix
"""
# Static data: standardize the whole array in one pass
# 1.1 Standardization (z-score): remove the mean and scale to unit variance------
from sklearn import preprocessing
import numpy as np
x = np.array([[1,2,3],
              [2,3,4],
              [3,4,5]])
print(x)
x_scaled = preprocessing.scale(x)
print('--------------------new data-----------------------')
print(x_scaled)
print('--------------------mean and standard deviation---------------------')
print(x_scaled.mean(axis=0)) # each column now has mean 0
print(x_scaled.std(axis=0))  # and standard deviation 1

# Extension (dynamic data, with more samples added later)-------------------------------------------------
from sklearn import preprocessing
import numpy as np
x = np.array([[1,2,3],
             [2,3,4],
             [3,4,5]])
print(x)

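# Use the sklearn.preprocessing.StandardScaler class:
scaler = preprocessing.StandardScaler().fit(x) # the fitted object can then be used directly to transform test data
print('--------------------new method-----------------------')

x_scaled = scaler.transform(x) # same result as preprocessing.scale(x)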
print(x_scaled)

new_data = np.array([[2,5,1],
                     [3,6,2],
                     [4,7,3]])

new_scale = scaler.transform(new_data) # scale the test data with the mean and standard deviation learned from the training data
print(new_scale)
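
The R section restores the original values with unscale(). As a rough Python counterpart, the sketch below inspects the statistics a fitted StandardScaler stores and undoes the transformation with inverse_transform. It assumes the same 3x3 training array as above; x_restored is just an illustrative name.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[1, 2, 3],
              [2, 3, 4],
              [3, 4, 5]], dtype=float)

scaler = StandardScaler().fit(x)

# Statistics learned from the training data
print(scaler.mean_)   # per-column mean
print(scaler.scale_)  # per-column standard deviation (ddof=0)

x_scaled = scaler.transform(x)

# Map the standardized values back to the original scale
x_restored = scaler.inverse_transform(x_scaled)
print(np.allclose(x_restored, x))
```

Because the mean and standard deviation live on the scaler object, the same object can restore new data that was transformed with it, not just the training set.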

#--------------------------------------------------------------------------------
# 1.2 Min-max scaling: scale each feature to a given range, i.e. (x-min)/(max-min)------
# Relies on the MinMaxScaler class in preprocessing
x_train = np.array([[1.,-1.,2.],
            [2.,0.,0.],
            [0.,1.,-1.]])
 
min_max_scaler = preprocessing.MinMaxScaler()
x_train_minmax = min_max_scaler.fit_transform(x_train)
print(x_train_minmax)
# The target range can also be set when constructing the object, with feature_range=(min, max); the formula then becomes:
# X_std = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
# X_scaled = X_std * (max - min) + min
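
The range formula is easy to get wrong by hand: the scaled value is X_std * (max - min) + min. The sketch below computes it manually and compares the result with MinMaxScaler, assuming the same x_train array and a target range of (-1, 1) picked purely for illustration. The names lo, hi, and x_manual are not from the original post.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])

lo, hi = -1.0, 1.0  # target range, chosen only for illustration

# Step 1: scale each column to [0, 1]
col_min = x_train.min(axis=0)
col_max = x_train.max(axis=0)
x_std = (x_train - col_min) / (col_max - col_min)

# Step 2: stretch and shift to [lo, hi]
x_manual = x_std * (hi - lo) + lo

x_sklearn = MinMaxScaler(feature_range=(lo, hi)).fit_transform(x_train)

print(x_manual)
print(np.allclose(x_manual, x_sklearn))
```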
 

# 2. Normalization--------------------------------------------------------------
# Static data
#https://blog.csdn.net/jinping_shi/article/details/52433975
x = np.array([[1.,-1.,2.],
            [2.,0.,0.],
            [0.,1.,-1.]])
x_normalized = preprocessing.normalize(x,norm='l2') # scales each row to unit L2 norm; this is not the L2 regularization that prevents overfitting
print(x_normalized)
 
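# Extension (dynamic data)
# The preprocessing.Normalizer() class can be fitted on the training set and then used to transform both training and test sets
normalizer = preprocessing.Normalizer().fit(x) # fit() learns nothing here; each sample is normalized on its own
print(normalizer)
# prints the estimator's repr, e.g. Normalizer(copy=True, norm='l2') (the exact form depends on the scikit-learn version)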

# Normalize the training data
print(normalizer.transform(x))

# Normalize new test data
print(normalizer.transform([[-1., 1., 0.]]))
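
What normalize and Normalizer do with norm='l2' is divide every row (sample) by its own Euclidean length. A minimal NumPy-only sketch, assuming the same example array and no all-zero rows (row_norms and x_unit are illustrative names):

```python
import numpy as np

x = np.array([[1., -1., 2.],
              [2., 0., 0.],
              [0., 1., -1.]])

# L2 norm of each row, kept as a column vector so it broadcasts over the row
row_norms = np.linalg.norm(x, ord=2, axis=1, keepdims=True)
x_unit = x / row_norms

print(x_unit)
print(np.linalg.norm(x_unit, axis=1))  # length of each row after normalization
```

Since each row is handled independently, nothing is learned from the training set. That is why normalizing a single new sample gives the same result whether or not the normalizer was fitted on other data first.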
