Summer Study Summary

Linear discriminant analysis (LDA), summary of the principles: https://www.cnblogs.com/pinard/p/6244265.html

Usage reference: http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html

https://blog.csdn.net/lHz76ttw1U/article/details/60768981
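
As a rough illustration of the scikit-learn class referenced above, the following sketch fits LDA on synthetic data (the two Gaussian classes and all variable names are assumptions made for the example, not taken from the linked articles). It shows that the same fitted model can both reduce dimensionality and classify.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Synthetic data (an assumption for illustration): two well-separated Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 4)),
               rng.normal(5.0, 1.0, size=(50, 4))])
y = np.array([0] * 50 + [1] * 50)

# With 2 classes, LDA can project onto at most n_classes - 1 = 1 dimension.
lda = LinearDiscriminantAnalysis(n_components=1)
X_proj = lda.fit_transform(X, y)   # supervised dimensionality reduction
y_pred = lda.predict(X)            # the same fitted model also classifies
```

The limit of n_classes - 1 components is what distinguishes LDA from PCA when it is used for projection.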

Random sampling of a dataset: https://blog.csdn.net/qq_22238533/article/details/71080942
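
One possible way to draw a random sample with pandas is sketched below. The toy DataFrame is an assumption for illustration, and the linked article may use a different method.

```python
import pandas as pd

df = pd.DataFrame({"x": range(10), "label": [0, 1] * 5})  # toy data

# Draw 30% of the rows without replacement; random_state makes it repeatable.
sample = df.sample(frac=0.3, random_state=0)

# The rows that were not drawn can serve as the complementary split.
rest = df.drop(sample.index)
```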

Data preprocessing: https://blog.csdn.net/csmqq/article/details/51461696

https://blog.csdn.net/pipisorry/article/details/52247679
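
Standardization is one common preprocessing step. The sketch below uses scikit-learn's StandardScaler on made-up numbers (the arrays are assumptions for illustration). The point it tries to show is that the statistics are learned on the training data and reused on the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])  # toy training data
X_test = np.array([[2.0, 250.0]])                               # toy test data

scaler = StandardScaler()
X_train_std = scaler.fit_transform(X_train)  # learn mean/std from the training data only
X_test_std = scaler.transform(X_test)        # reuse the training statistics
```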

Handling missing values: https://blog.csdn.net/w352986331qq/article/details/78639233
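
Two basic options in pandas, dropping incomplete rows or filling with the column mean, might look like this (the toy DataFrame is an assumption, and which option is appropriate depends on the data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, 6.0]})  # toy data

dropped = df.dropna()                            # keep only fully observed rows
filled = df.fillna(df.mean(numeric_only=True))   # replace NaN with the column mean
```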

Principal component analysis (PCA), maximum-variance interpretation: http://www.cnblogs.com/jerrylead/archive/2011/04/18/2020209.html
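
The maximum-variance view can be sketched directly in NumPy: the principal directions are the eigenvectors of the covariance matrix, and the variance of the data projected onto each direction equals the corresponding eigenvalue. The data and variable names below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: the variance is much larger along the first axis.
X = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.2])

Xc = X - X.mean(axis=0)                  # center each feature
cov = Xc.T @ Xc / (len(Xc) - 1)          # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]        # sort directions by variance, largest first
components = eigvecs[:, order[:2]]       # top-2 principal directions (as columns)
Z = Xc @ components                      # projected data
```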

Reproducing kernel Hilbert space (RKHS):

https://blog.csdn.net/haolexiao/article/details/72171523?utm_source=itdadao&utm_medium=referral

MMD (maximum mean discrepancy): https://blog.csdn.net/he_min/article/details/69397975
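
A minimal sketch of the biased estimate of squared MMD with a Gaussian kernel follows. The function names, the kernel width gamma, and the test data are assumptions for illustration. The estimate should be close to zero when both samples come from the same distribution and clearly larger when one is shifted.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2) for all row pairs."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def mmd2_biased(X, Y, gamma=1.0):
    """Biased estimate of squared MMD: mean k(X,X) + mean k(Y,Y) - 2 * mean k(X,Y)."""
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
same = mmd2_biased(rng.normal(size=(100, 2)), rng.normal(size=(100, 2)))
shifted = mmd2_biased(rng.normal(size=(100, 2)), rng.normal(3.0, 1.0, size=(100, 2)))
```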

Converting time strings to timestamps in pandas:

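import time

# Convert each COLLECTTIME string to integer Unix seconds (time.mktime uses the local time zone).
# Assigning the whole column avoids rc.loc[i, ...], which only works when the index is 0..n-1,
# and already yields an integer column, so no pd.to_numeric pass is needed.
rc['COLLECTTIME'] = [int(time.mktime(time.strptime(s, "%Y-%m-%d %H:%M:%S")))
                     for s in rc['COLLECTTIME']]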
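
A vectorized alternative is sketched below, assuming the column holds strings in the same format (the toy DataFrame is made up for the example). Note one difference: naive datetimes are treated as UTC here, whereas time.mktime interprets them in the local time zone.

```python
import pandas as pd

rc = pd.DataFrame({"COLLECTTIME": ["1970-01-01 00:00:10", "2018-07-01 12:00:00"]})  # toy data

# Parse the whole column at once, then convert to whole seconds since the epoch.
parsed = pd.to_datetime(rc["COLLECTTIME"], format="%Y-%m-%d %H:%M:%S")
rc["COLLECTTIME"] = (parsed - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
```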

 

Handling class imbalance in the training set for classification:

https://blog.csdn.net/heyongluoyao8/article/details/49408131

https://blog.csdn.net/nlpuser/article/details/81265614

https://blog.csdn.net/qq_31813549/article/details/79964973

SMOTE algorithm:

http://contrib.scikit-learn.org/imbalanced-learn/stable/generated/imblearn.over_sampling.SMOTE.html
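
The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, can be sketched in NumPy as follows. This is a simplified illustration, not the imbalanced-learn implementation. The function name smote_sketch and its parameters are hypothetical, and it assumes k is smaller than the number of minority samples.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbours."""
    rng = np.random.default_rng(seed)
    # Pairwise squared distances within the minority class.
    d = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    base = rng.integers(0, len(X_min), size=n_new)          # random base points
    nb = neighbours[base, rng.integers(0, k, size=n_new)]   # one random neighbour each
    gap = rng.random((n_new, 1))                 # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # toy minority class
X_new = smote_sketch(X_min, n_new=6)
```

Because every synthetic point lies on a segment between two existing minority points, the new samples stay inside the region the minority class already occupies.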

Naive random oversampling:

http://contrib.scikit-learn.org/imbalanced-learn/stable/generated/imblearn.over_sampling.RandomOverSampler.html
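
Naive random oversampling simply duplicates existing minority rows until the classes are balanced. A minimal NumPy sketch of that idea follows. It is not the imbalanced-learn implementation, and the function name random_oversample and the toy data are assumptions.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Resample every class with replacement up to the size of the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        # Extra rows are duplicates of existing rows of the same class.
        extra = rng.choice(members, size=target - n, replace=True)
        idx.append(np.concatenate([members, extra]))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

X = np.arange(12, dtype=float).reshape(6, 2)   # toy features
y = np.array([0, 0, 0, 0, 1, 1])               # 4 majority rows, 2 minority rows
X_res, y_res = random_oversample(X, y)
```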

Random forest:

http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
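
A short usage sketch of the scikit-learn classifier referenced above, on synthetic data (the data and the parameter values are assumptions for illustration, not tuned settings):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 3)),
               rng.normal(4.0, 1.0, size=(60, 3))])   # toy features
y = np.array([0] * 60 + [1] * 60)

# class_weight="balanced" reweights classes inversely to their frequency,
# which is one option when the training set is imbalanced.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(X, y)
y_pred = clf.predict(X)
importances = clf.feature_importances_   # one non-negative weight per feature, summing to 1
```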

Rounding the predicted labels to the nearest integer:

rc = rc.round(0).astype(int)  # round() returns a new object, so assign the result

Mapping back to the original labels:

rc = rc.replace([0,1,2,3,4,5,6,7,8,9,10,11,12,13],[0,1122,1141,1145,1168,1182,1206,1209,1211,1215,1216,1239,1246,1341])  # replace() is not in-place by default
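
The rounding and label-restoring steps can also be written with an explicit dictionary, which makes the code-to-label correspondence easier to check. The sketch below uses the label list from above; the raw predictions and the clipping to the valid code range 0..13 are assumptions added for the example.

```python
import pandas as pd

# Original class labels in encoded order: position i holds the label for code i.
labels = [0, 1122, 1141, 1145, 1168, 1182, 1206, 1209, 1211, 1215, 1216, 1239, 1246, 1341]
code_to_label = dict(enumerate(labels))

pred = pd.Series([0.2, 0.9, 12.6, 3.4])         # hypothetical raw model outputs
codes = pred.round(0).astype(int).clip(0, 13)   # nearest valid code
restored = codes.map(code_to_label)             # codes missing from the dict would become NaN
```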

Merging two single-column DataFrames (whether the column labels can be set this way under Python 3.6 is uncertain):

pred = pd.merge(df_2, df_3, left_index=True, right_index=True, how='outer')
# Assigning columns directly works on every pandas version; set_axis(..., inplace=False) fails on pandas 2.0+.
pred.columns = ['idx', 'Pred']
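
To illustrate what the outer join on the index does when the two frames do not cover the same rows, here is a small sketch. The contents of df_2 and df_3 are hypothetical.

```python
import pandas as pd

df_2 = pd.DataFrame({"a": [10, 11, 12]}, index=[0, 1, 2])   # hypothetical ids
df_3 = pd.DataFrame({"b": [1122, 1341]}, index=[1, 2])      # hypothetical predictions

pred = pd.merge(df_2, df_3, left_index=True, right_index=True, how="outer")
pred.columns = ["idx", "Pred"]   # label assignment that does not depend on set_axis

# Row 0 has no prediction, so the outer join keeps it with NaN in "Pred".
missing = pred["Pred"].isna()
```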

 
