Notes on problems encountered in SVM-based feature selection

E:\Project_CAD\venv\lib\site-packages\sklearn\svm\base.py:922: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)

Translation: ConvergenceWarning: Liblinear failed to converge; increase the number of iterations.
Solution from Stack Overflow:
https://stackoverflow.com/questions/52670012/convergencewarning-liblinear-failed-to-converge-increase-the-number-of-iterati

Normally when an optimization algorithm does not converge, it is usually because the problem is not well-conditioned, perhaps due to a poor scaling of the decision variables. There are a few things you can try.

  1. Normalize your training data so that the problem hopefully becomes more well conditioned, which in turn can speed up convergence. One possibility is to scale your data to 0 mean, unit standard deviation using Scikit-Learn’s StandardScaler for an example. Note that you have to apply the StandardScaler fitted on the training data to the test data.
  2. Related to 1), make sure the other arguments, such as the regularization weight C, are set appropriately.
  3. Set max_iter to a larger value. The default is 1000.
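The three suggestions above can be sketched as follows; the toy dataset and LinearSVC stand in for the actual data and model:

```python
# Sketch of points 1-3: standardize the features (fitting the scaler on
# the training data only), keep C explicit, and raise max_iter above the
# default of 1000.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)   # fit on training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)      # apply the SAME scaler to test data

clf = LinearSVC(C=1.0, max_iter=10000)   # larger max_iter, per point 3
clf.fit(X_train_s, y_train)
print('test accuracy:', clf.score(X_test_s, y_test))
```

Note that `scaler` is fitted only on `X_train`, exactly as the quoted answer warns; fitting it on the full dataset would leak test-set statistics into training.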

How to print the selected features after RFE feature selection: see the official documentation and the translated docs.
http://scikit-learn.org/stable/modules/generated/sklearn.feature_selection.RFE.html#sklearn.feature_selection.RFE.get_support (official documentation)
https://blog.csdn.net/kancy110/article/details/72835050 (this blog translates many sklearn articles)
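A minimal sketch of printing the features RFE kept, using `get_support()` from the documentation linked above (the dataset and feature names here are made up):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, n_features=8, random_state=0)
names = ['feat%d' % i for i in range(8)]        # hypothetical feature names

selector = RFE(LinearSVC(max_iter=10000), n_features_to_select=3)
selector.fit(X, y)

mask = selector.get_support()                   # boolean mask over columns
print([n for n, keep in zip(names, mask) if keep])
print(selector.ranking_)                        # rank 1 = selected feature
```

`get_support()` returns a boolean mask aligned with the original columns, so zipping it against a list of feature names recovers the selected names; `ranking_` additionally shows the elimination order of the dropped features.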

  1. Normalization
  2. Problem: note that two features such as house area and number of bedrooms differ hugely in magnitude. If such samples are fed into training directly, the contours of the cost function become elongated, and gradient descent not only zigzags on its way to the optimum but is also very slow:
    The problem arises because the features are not treated on an equal footing, i.e. they have not been scaled to a common range.
    https://blog.csdn.net/leiting_imecas/article/details/54986045?utm_source=blogxgwz4 (this blog explains the issue thoroughly, with a clear approach to writing the function)
  3. Data normalization and how it is handled in sklearn
    https://blog.csdn.net/Gamer_gyt/article/details/77761884 (a comprehensive summary)

2 Problems encountered while writing the normalization function
1) Improved the z-score normalization function (written in Python) following the two blogs mentioned under "1. Normalization" above.
2) Python lists have no shape attribute; convert to a numpy.array to use the shape attribute.
3) Python 2 and Python 3 differ in division, rounding to an integer, and modulo:
a. In Python 3, / is true division, // is floor division, and % is modulo.
b. In Python 3, % follows the floor-division rule.
c. In Python 3, round() rounds to the nearest value, int() truncates toward zero, and math.floor / math.ceil round down / up.
d. // and math.floor differ in CPython: on floats, // returns a float while math.floor returns an int.
e. In Python 2, / is floor division for integer operands.
f. In C, % is a modulo based on truncation toward zero.
4) Array indices cannot be floats: write features[m//2] so that m//2 is an integer.
5) **Pay special attention to data types:** an array initialized from 1.0 gets a float dtype, and float results are stored as floats; an array initialized from 1 gets an integer dtype, and assigning a float result into it keeps the integer dtype, silently truncating the float to an integer. (Plain Python variables, by contrast, simply adopt the type of whatever value is rebound to them.)
6) Binary 0/1 features can be normalized together with the other continuous-valued features; 0 and 1 are simply mapped to two other values.
7) The new_scale function cannot be applied to a DataFrame slice directly (TypeError: unhashable type: 'slice'); fix: normalize the slice's underlying values. See https://stackoverflow.com/questions/43290202/python-typeerror-unhashable-type-slice-for-encoding-categorical-data : data.iloc[:,:55]=new_scale(data.values[:,:55])
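The points above can be sketched in one place. This `new_scale` reuses the name from point 7), but its body is an assumption about what the original function did (standard z-score):

```python
import numpy as np

# Python 3 semantics from point 3: / is true division, // floors,
# and % follows the floor rule (so -7 % 2 is 1, not -1 as in C).
assert 7 / 2 == 3.5 and 7 // 2 == 3 and -7 // 2 == -4 and -7 % 2 == 1

def new_scale(X):
    """Z-score normalization sketch: zero mean, unit std per column."""
    X = np.asarray(X, dtype=float)   # lists have no .shape (point 2)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0              # guard constant columns
    return (X - mean) / std

data = np.array([[1, 100], [2, 200], [3, 300]], dtype=float)
scaled = new_scale(data)
m = scaled.shape[0]
middle_row = scaled[m // 2]          # index must be an int (point 4)
```

With pandas, `data.iloc[:, :55] = new_scale(data.values[:, :55])` applies the function to the first 55 columns while avoiding the unhashable-slice TypeError from point 7).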

3. Even after fixing the training and test sets, accuracy still differs from run to run. The cause: the dataset used for feature selection is not fixed, so the selected features differ. How can the train/test split be fixed, and how can the feature-selection dataset be fixed as well?
4. Model training still sometimes fails to converge; how can this be solved?
5. Plot the ROC curve and compute the three model-evaluation metrics.
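One way to address this question (a sketch; the toy data stands in for the real pipeline) is to pass a fixed random_state to every randomized step, so that the split, any feature-selection subsample, and the model itself are identical across runs:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Same random_state -> the exact same split on every run
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=42)

# Same random_state on the estimator -> reproducible training
acc1 = LinearSVC(max_iter=10000, random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
acc2 = LinearSVC(max_iter=10000, random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
print(acc1 == acc2)
```

The same idea applies to the feature-selection step: if it samples the data or uses a randomized estimator, give that step a fixed random_state too, and the selected features stop changing between runs.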
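A sketch for point 5: compute the ROC points and three common metrics (accuracy, F1, and AUC; which three the note intends is an assumption) from a fitted classifier's decision scores:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LinearSVC(max_iter=10000).fit(X_tr, y_tr)
scores = clf.decision_function(X_te)   # LinearSVC has no predict_proba
pred = clf.predict(X_te)

print('accuracy:', accuracy_score(y_te, pred))
print('F1      :', f1_score(y_te, pred))
print('AUC     :', roc_auc_score(y_te, scores))

fpr, tpr, thresholds = roc_curve(y_te, scores)  # points of the ROC curve
```

Plotting `fpr` against `tpr` (e.g. `matplotlib.pyplot.plot(fpr, tpr)`) then draws the ROC curve; AUC is exactly the area under that line.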
