pycaret
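pycaret is the lazy person's toolkit for machine learning. Compared with other open-source machine learning libraries, pycaret is an alternative low-code library that can replace hundreds of lines of code with just a few words. At its core it is a wrapper around several machine learning libraries and frameworks, such as scikit-learn, XGBoost, Microsoft LightGBM, and spaCy.
For example, a few years ago, comparing several sklearn estimators took code like the following: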
# Regression problem
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from sklearn import model_selection
from sklearn.metrics import make_scorer, mean_squared_error
from sklearn.svm import SVR, LinearSVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression,Ridge,Lasso,ElasticNet,BayesianRidge,SGDRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.ensemble import GradientBoostingRegressor,RandomForestRegressor,ExtraTreesRegressor
from sklearn.kernel_ridge import KernelRidge
models=[]
models.append(('DecisionTree', DecisionTreeRegressor()))
models.append(('Ridge', Ridge()))
models.append(('Lasso', Lasso()))
models.append(('EN', ElasticNet(alpha=0.001,max_iter=10000)))
models.append(('BayesianRidge',BayesianRidge()))
models.append(('SVM',SVR()))
models.append(('KNeighbors',KNeighborsRegressor()))
models.append(('NN',MLPRegressor()))
models.append(('GBoosting',GradientBoostingRegressor()))
models.append(('RF',RandomForestRegressor()))
models.append(('ExtraTrees',ExtraTreesRegressor()))
models.append(('SGD',SGDRegressor(max_iter=1000,tol=1e-3)))
models.append(('Kernel_Ridge',KernelRidge(alpha=0.6, kernel='polynomial', degree=2, coef0=2.5)))
models.append(('LR_SVR',LinearSVR()))
models.append(('LR',LinearRegression()))
def compare_scores_mae(models, X, y):
    """Cross-validate every (name, model) pair and plot the mean negative MAE."""
    cv_means = []
    cv_std = []
    names = []
    for name, model in models:
        # 10-fold cross-validation, scored by negative mean absolute error
        kfold = model_selection.KFold(n_splits=10)
        cv_results = model_selection.cross_val_score(model, X, y, cv=kfold, scoring='neg_mean_absolute_error', n_jobs=10)
        cv_means.append(cv_results.mean())
        cv_std.append(cv_results.std())
        names.append(name)
        msg = "%s: %f (%f)" % (name, cv_results.mean(), cv_results.std())
        print(msg)
    cv_res = pd.DataFrame({"CrossValMeans": cv_means, "CrossValerrors": cv_std, "Algorithm": names})
    # Recent seaborn releases require x and y to be passed as keywords
    g = sns.barplot(x="CrossValMeans", y="Algorithm", data=cv_res, palette="Set3", orient="h", **{'xerr': cv_std})
    g.set_xlabel("negative MAE")
    g.set_title("Cross validation scores")
    return cv_res
Yes, that is exactly what I did before I came across pycaret.
pycaret wraps all of this code into a single function: compare_models()
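For comparison, here is a minimal sketch of what the pycaret version of that comparison might look like, assuming a pandas DataFrame whose label column name is passed in as target. The helper name compare_regressors is my own and not part of pycaret; setup() and compare_models() come from pycaret.regression. Depending on the pycaret version, setup() may pause and ask you to confirm the inferred column types, and as far as I know older releases of compare_models() return only a score table rather than a model.

```python
def compare_regressors(data, target):
    """Run pycaret's model comparison on `data` and return its result.

    `data` is expected to be a pandas DataFrame and `target` the name of
    the label column. `compare_regressors` is a hypothetical helper name.
    """
    # Imported lazily so that defining this helper does not require pycaret.
    from pycaret.regression import setup, compare_models

    # setup() infers column types and prepares the preprocessing pipeline.
    setup(data=data, target=target)

    # compare_models() cross-validates the available estimators and ranks them.
    return compare_models()
```

Calling it would look like best = compare_regressors(df, "price"), where df and "price" stand in for your own data and label column.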
Installation
The official documentation describes installation via pip:
#installing for the first time
pip install pycaret
#if you have installed beta version in past, run the below code to upgrade
pip install --upgrade pycaret
#Run the below code in your notebook to check the installed version
from pycaret.utils import version
version()
or inside a conda environment:
#create a conda environment
conda create --name yourenvname python=3.6
#activate environment
conda activate yourenvname
#install pycaret
pip install pycaret
#create notebook kernel connected with the conda environment
python -m ipykernel install --user --name yourenvname --display-name "display-name-here"
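On a Colab or Kaggle instance, just use !pip. However, as anyone who has installed Python or R packages knows, things may not be that simple: on macOS, installing llvmlite and LightGBM throws all sorts of unexpected errors that make the installation fail. It took me about an hour to get pycaret installed on macOS (between you and me, installing on a Kaggle instance works without any trouble).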
llvmlite
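Installing with pip kept failing with errors related to Python's setuptools, and along the way I found that the llvmlite package needs cmake. I installed cmake with brew, but it still did not work. Finally, in a corner of GitHub, I found that the easy_install command easily solves the problem of llvmlite not installing on Python 3.8, so I gave it a try (my own conda environment runs Python 3.5) and the problem went away. I do not know why: pip fails, but easy_install works.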
brew install cmake
easy_install llvmlite
LightGBM
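LightGBM is one of the strongest tree-based models, and since it is one of the models pycaret includes, installing pycaret also requires installing LightGBM. Installing LightGBM on Windows is simple (it is developed by Microsoft): just use pip, the installer that ships with Python. On a Mac, installing with pip runs into errors, so you need to build the native LightGBM library from source instead.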
pip uninstall lightgbm
git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM
export CXX=g++-8 CC=gcc-8
mkdir build ; cd build
cmake ..
make -j4
If you find you do not have gcc-8, install it with brew. As I recall, cmake is needed here as well.
brew install gcc@8
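After fighting with both packages, it helps to confirm what actually ended up installed before retrying pip install pycaret. The following is a small sketch using only the standard library; installed_versions is a name I made up, and importlib.metadata needs Python 3.8 or newer, so it will not run on the older interpreters mentioned above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The two troublesome dependencies, plus pycaret itself.
    report = installed_versions(["llvmlite", "lightgbm", "pycaret"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter (or conda environment) you plan to use for pycaret, since each environment has its own set of installed packages.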
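Final advice
It is best not to mix conda and pip installs.
Do not upgrade pip. After upgrading, you will feel like you need to reinstall Python.
After the upgrade, running pip produces the following: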
File "F:\anaconda\envs\emotion\lib\site-packages\pkg_resources\__init__.py", line 2331, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "F:\anaconda\envs\emotion\lib\site-packages\pip\_internal\__init__.py", line 42, in
from pip._internal import cmdoptions
File "F:\anaconda\envs\emotion\lib\site-packages\pip\_internal\cmdoptions.py", line 16, in
from pip._internal.index import (
ImportError: cannot import name 'FormatControl'
As a bonus, here is how to downgrade:
https://pypi.org/project/pip/19.1.1/#files
Manually download the second file, unpack it, and run the following in its directory:
python setup.py install
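If pip can still be started as a module, pinning the version directly may save the manual download. This is only a sketch of that alternative, not the route I took above: pip_pin_command and downgrade_pip are hypothetical helper names, and if pip itself is broken as in the traceback, the command may well fail and the setup.py route remains the fallback.

```python
import subprocess
import sys


def pip_pin_command(version):
    """Build the command that reinstalls pip at an exact version."""
    # Using sys.executable targets the pip that belongs to this interpreter.
    return [sys.executable, "-m", "pip", "install", f"pip=={version}"]


def downgrade_pip(version="19.1.1"):
    """Run the pin command; raises CalledProcessError if pip exits with an error."""
    subprocess.run(pip_pin_command(version), check=True)
```

Calling downgrade_pip() with no arguments targets 19.1.1, the same release as the download link above.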
End