Linear discriminant analysis (LDA), summary of the principles: https://www.cnblogs.com/pinard/p/6244265.html
Usage reference: http://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html
https://blog.csdn.net/lHz76ttw1U/article/details/60768981
Random sampling of a dataset: https://blog.csdn.net/qq_22238533/article/details/71080942
Data preprocessing: https://blog.csdn.net/csmqq/article/details/51461696
https://blog.csdn.net/pipisorry/article/details/52247679
Handling missing values: https://blog.csdn.net/w352986331qq/article/details/78639233
Principal component analysis (PCA), maximum-variance interpretation: http://www.cnblogs.com/jerrylead/archive/2011/04/18/2020209.html
Reproducing kernel Hilbert space (RKHS):
https://blog.csdn.net/haolexiao/article/details/72171523?utm_source=itdadao&utm_medium=referral
MMD (maximum mean discrepancy): https://blog.csdn.net/he_min/article/details/69397975
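Converting a pandas time column to Unix timestamps (time.mktime interprets the strings as local time):
import time
import pandas as pd
for i in range(len(rc)):
    # Parse row i of 'COLLECTTIME'; write back via rc.index[i] so a non-default (unique) index still works
    timeArray = time.strptime(rc.iloc[i]['COLLECTTIME'], "%Y-%m-%d %H:%M:%S")
    rc.loc[rc.index[i], 'COLLECTTIME'] = int(time.mktime(timeArray))
# Make the column numeric
rc[['COLLECTTIME']] = rc[['COLLECTTIME']].apply(pd.to_numeric)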
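The row-by-row loop above can probably be replaced by a vectorized conversion. The following is a minimal sketch, not the original method: the helper name to_unix_seconds and its tz parameter are made up for illustration, and unlike time.mktime (which uses the machine's local time zone) it requires the time zone to be stated explicitly.

```python
import pandas as pd

def to_unix_seconds(col, tz="UTC"):
    """Convert a column of 'YYYY-MM-DD HH:MM:SS' strings to Unix seconds.

    `tz` (an assumption of this sketch) names the time zone the strings
    were recorded in.
    """
    # Parse all strings at once instead of looping over rows
    parsed = pd.to_datetime(col, format="%Y-%m-%d %H:%M:%S")
    # Attach the source time zone, then express everything in UTC
    in_utc = parsed.dt.tz_localize(tz).dt.tz_convert("UTC")
    # Seconds elapsed since the Unix epoch, as integers
    epoch = pd.Timestamp("1970-01-01", tz="UTC")
    return (in_utc - epoch) // pd.Timedelta(seconds=1)

# Stand-in data; in the notes above this would be rc['COLLECTTIME']
sample = pd.Series(["1970-01-01 00:00:10", "2018-01-01 00:00:00"])
seconds = to_unix_seconds(sample)
```

With the frame from the notes, the call would look like rc['COLLECTTIME'] = to_unix_seconds(rc['COLLECTTIME'], tz=...), with tz set to wherever the data was collected.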
Handling class imbalance in the training set for classification:
https://blog.csdn.net/heyongluoyao8/article/details/49408131
https://blog.csdn.net/nlpuser/article/details/81265614
https://blog.csdn.net/qq_31813549/article/details/79964973
SMOTE algorithm:
http://contrib.scikit-learn.org/imbalanced-learn/stable/generated/imblearn.over_sampling.SMOTE.html
Naive random oversampling:
http://contrib.scikit-learn.org/imbalanced-learn/stable/generated/imblearn.over_sampling.RandomOverSampler.html
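To make the idea of naive random oversampling concrete, here is a hand-rolled sketch using only pandas. It is not imblearn's RandomOverSampler, just an illustration of the same idea: rows of each minority class are duplicated at random until every class is as large as the biggest one. The function name random_oversample and the column name 'label' are hypothetical.

```python
import pandas as pd

def random_oversample(df, label_col, random_state=0):
    """Duplicate minority-class rows at random until all classes are equal in size."""
    counts = df[label_col].value_counts()
    target = counts.max()  # size of the largest class
    parts = []
    for label, n in counts.items():
        group = df[df[label_col] == label]
        # Draw the missing rows with replacement from the class itself
        extra = group.sample(n=target - n, replace=True, random_state=random_state)
        parts.append(pd.concat([group, extra]))
    return pd.concat(parts, ignore_index=True)

# Made-up toy data: three rows of class 0, one row of class 1
toy = pd.DataFrame({"x": [1.0, 2.0, 3.0, 9.0], "label": [0, 0, 0, 1]})
balanced = random_oversample(toy, "label")
```

Oversampling should be applied to the training split only, otherwise duplicated rows can leak into the evaluation data.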
Random forest:
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
Round the predicted labels to the nearest integer:
rc = rc.round(0).astype(int)  # round() returns a new object, so assign the result
Map the integer labels back to the original labels:
rc = rc.replace([0,1,2,3,4,5,6,7,8,9,10,11,12,13],[0,1122,1141,1145,1168,1182,1206,1209,1211,1215,1216,1239,1246,1341])  # replace() is not in-place by default
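The rounding and label-replacement steps above can also be combined into one lookup. This is a sketch under assumptions: the predictions are taken to be a pandas Series of floats, and the name restore_labels is invented for the example. The label list is the one from the notes.

```python
import pandas as pd

# Original class labels, in the order of their encoded indices 0..13 (from the notes above)
ORIGINAL_LABELS = [0, 1122, 1141, 1145, 1168, 1182, 1206,
                   1209, 1211, 1215, 1216, 1239, 1246, 1341]
INDEX_TO_LABEL = dict(enumerate(ORIGINAL_LABELS))

def restore_labels(pred):
    """Round float predictions to class indices, then look up the original labels."""
    idx = pred.round(0).astype(int)
    # Indices outside 0..13 are not in the mapping and would come back as NaN
    return idx.map(INDEX_TO_LABEL)

# Made-up predictions for illustration
restored = restore_labels(pd.Series([0.2, 1.4, 12.6]))
```

A dict lookup keeps the index-to-label pairing in one place, which may be easier to check than two parallel lists.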
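Merge two single-column DataFrames on their index and set the column labels (it was unclear whether set_axis could set them under Python 3.6; that depends on the pandas version, and its inplace argument has been removed in recent pandas releases, so assigning to .columns is the safer route):
pred = pd.merge(df_2, df_3, left_index=True, right_index=True, how='outer')
pred.columns = ['idx', 'Pred']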
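A small runnable sketch of the index-based merge, to show what how='outer' does when the two frames do not share every index value. The frames df_2 and df_3 here are stand-ins with made-up values, and the columns are renamed by plain assignment, which does not depend on the set_axis signature.

```python
import pandas as pd

# Stand-in inputs (hypothetical values): one column of ids, one of predictions
df_2 = pd.DataFrame({"a": [10, 11, 12]}, index=[0, 1, 2])
df_3 = pd.DataFrame({"b": [0, 1122, 1141]}, index=[0, 1, 3])

# An outer join on the index keeps rows that appear in either frame
pred = pd.merge(df_2, df_3, left_index=True, right_index=True, how="outer")
pred.columns = ["idx", "Pred"]
```

Rows present in only one frame get NaN in the other column, which also turns that column into floats; if both frames are known to share the same index, how='inner' avoids this.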