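Pitfalls when packaging machine learning libraries with pyinstaller

Reference: Recipe Multiprocessing

Background

The plan I investigated earlier, packaging a Python project into a binary with pyinstaller, has now reached the rollout stage. The earlier write-up is 利用pyinstaller打包python项目发布到线上 (Using pyinstaller to package a Python project for production release). That experiment used a very simple web service with few dependencies on other packages. The project being rolled out this time uses many machine learning libraries, so the process turned out to be somewhat more troublesome.

Problems

  • pyd files not being included
  • .so files not being included
  • the conflict between multiprocessing and pyinstaller

Let's go through them one by one.

pyd files not being included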

pipenv run pyinstaller -F main.py -n scscore

The build succeeded and generated a spec file, but running the program fails:

[doctorq@gz-inf-development01 scscore]$ ./dist/scscore
/tmp/_MEINtWbir/sklearn/externals/joblib/__init__.py:15: DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and will be removed in 0.23. Please import this functionality directly from joblib, which can be installed with: pip install joblib. If this warning is raised when loading pickled models, you may need to re-serialize those models with scikit-learn 0.21+.
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    from src.route import load_route
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/route.py", line 6, in <module>
    from src.view.forecast_view.feature_importance_view import FeatureImportanceView
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/view/forecast_view/feature_importance_view.py", line 7, in <module>
    from src.importance.feature_importance import FeatureImportance
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/importance/feature_importance.py", line 7, in <module>
    from src.forecasting.trainer import Trainer
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/forecasting/trainer.py", line 13, in <module>
    from src.Models.collect_models import ModelCollector
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/Models/collect_models.py", line 7, in <module>
    from src.Models.statistic_model.ARIMA import ARIMA
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "src/Models/statistic_model/ARIMA.py", line 2, in <module>
    from pmdarima.arima import auto_arima
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/pmdarima/__init__.py", line 29, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/pmdarima/arima/__init__.py", line 6, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/pmdarima/arima/arima.py", line 10, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/__init__.py", line 36, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/cluster/__init__.py", line 20, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/cluster/unsupervised.py", line 16, in <module>
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/sklearn/metrics/pairwise.py", line 32, in <module>
  File "sklearn/metrics/pairwise_fast.pyx", line 1, in init sklearn.metrics.pairwise_fast
ModuleNotFoundError: No module named 'sklearn.utils._cython_blas'
[6625] Failed to execute script main

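The missing modules are Python extension modules compiled from C/C++. Here the import is issued from inside compiled code (pairwise_fast.pyx), where pyinstaller's import analysis cannot see it, so these modules need extra handling: add them one by one to the hiddenimports attribute in scscore.spec. I added every file whose name contains cpython under each of the libraries involved.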

[doctorq@gz-inf-development01 utils]$ ll|grep cpython
-rwxrwxr-x 1 doctorq doctorq 221256 7月  16 15:41 arrayfuncs.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 426280 7月  16 15:41 _cython_blas.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 238344 7月  16 15:41 fast_dict.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  95824 7月  16 15:41 graph_shortest_path.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  28512 7月  16 15:41 lgamma.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 179592 7月  16 15:41 _logistic_sigmoid.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  88856 7月  16 15:41 murmurhash.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  99864 7月  16 15:41 _random.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 140992 7月  16 15:41 seq_dataset.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq 643648 7月  16 15:41 sparsefuncs_fast.cpython-37m-x86_64-linux-gnu.so
-rwxrwxr-x 1 doctorq doctorq  62880 7月  16 15:41 weight_vector.cpython-37m-x86_64-linux-gnu.so

The spec file after the additions:

                         hiddenimports=['cython','sklearn','sklearn.utils._cython_blas','statsmodels','statsmodels.tsa',
                         'statsmodels.tsa.statespace._kalman_smoother',
                         'statsmodels.tsa.statespace._representation',
                         'statsmodels.tsa.statespace._simulation_smoother',
                         'statsmodels.tsa.statespace._statespace',
                         'statsmodels.tsa.statespace._tools',
                         'statsmodels.tsa.statespace._filters._conventional',
                         'statsmodels.tsa.statespace._filters._inversions',
                         'statsmodels.tsa.statespace._filters._univariate',
                         'statsmodels.tsa.statespace._smoothers._alternative',
                         'statsmodels.tsa.statespace._smoothers._classical',
                         'statsmodels.tsa.statespace._smoothers._conventional',
                         'statsmodels.tsa.statespace._smoothers._univariate',
                         'sklearn.neighbors.typedefs',
                         'sklearn.neighbors.quad_tree',
                         'sklearn.neighbors.ball_tree',
                         'sklearn.neighbors.dist_metrics',
                         'sklearn.neighbors.kd_tree',
                         'sklearn.tree._utils',
                         'sklearn.tree._criterion',
                         'sklearn.tree._splitter',
                         ],
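Collecting these names by hand is tedious. As a minimal sketch (not what I ran for this build), the dotted module names could be generated by walking a package directory for .so files whose names contain cpython, mirroring the grep above. The function name cpython_extension_modules is hypothetical, and the result should still be checked against what each library actually imports.

```python
from pathlib import Path


def cpython_extension_modules(package_dir, package_name):
    """Return dotted module names for every *cpython*.so file under package_dir.

    package_dir  -- directory of the installed package (e.g. .../site-packages/sklearn)
    package_name -- its import name (e.g. "sklearn")
    """
    root = Path(package_dir)
    names = []
    for path in sorted(root.rglob("*.so")):
        if "cpython" not in path.name:
            continue  # plain shared library, not an importable extension module
        # "_cython_blas.cpython-37m-x86_64-linux-gnu.so" -> "_cython_blas"
        module = path.name.split(".", 1)[0]
        # sub-directories below the package root become sub-package names
        parents = path.relative_to(root).parent.parts
        names.append(".".join((package_name, *parents, module)))
    return names
```

The returned list can be pasted into, or concatenated with, the hiddenimports list in the spec file.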

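Then rebuild from the spec file. The extension modules of this kind are now all bundled, but running the binary hits a new error: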

> pipenv run pyinstaller scscore.spec # build from the spec file
> dist/scscore

  File "site-packages/xgboost/__init__.py", line 11, in <module>
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "/home/doctorq/.local/share/virtualenvs/scscore-K9x97I77/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
    exec(bytecode, module.__dict__)
  File "site-packages/xgboost/core.py", line 161, in <module>
  File "site-packages/xgboost/core.py", line 123, in _load_lib
  File "site-packages/xgboost/libpath.py", line 48, in find_lib_path
xgboost.libpath.XGBoostLibraryNotFound: Cannot find XGBoost Library in the candidate path, did you install compilers and run build.sh in root path?
List of candidates:
/tmp/_MEIdNDrR6/xgboost/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/../../lib/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/./lib/libxgboost.so
/tmp/_MEIdNDrR6/xgboost/libxgboost.so
[32970] Failed to execute script main

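.so files not being included

The error above comes down to xgboost's shared library (libxgboost.so) missing from the bundle. The fix is to add a new file named hook-xgboost.py under $(pipenv --venv)/lib/python3.7/site-packages/PyInstaller/hooks. The file name must match exactly. Its contents are: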

from PyInstaller.utils.hooks import collect_all

datas, binaries, hiddenimports = collect_all("xgboost")
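The distinction that matters here is that libxgboost.so is a plain shared library that xgboost loads at runtime, not an importable extension module, so hiddenimports alone cannot bring it in; it has to end up in binaries. As a minimal sketch for checking what the hook needs to pick up, the following lists such libraries as (source, destination directory) pairs, the shape a spec file's binaries list takes. The function name plain_shared_libs is hypothetical, and this is an illustration, not a replacement for collect_all.

```python
from pathlib import Path


def plain_shared_libs(package_dir, package_name):
    """List non-extension .so files under package_dir as (source, dest_dir) pairs.

    dest_dir is relative to the bundle root, so the library lands in the same
    place relative to the package as in the installed copy.
    """
    root = Path(package_dir)
    pairs = []
    for path in sorted(root.rglob("*.so")):
        if "cpython" in path.name:
            continue  # extension module; that case is handled through hiddenimports
        dest = Path(package_name, *path.relative_to(root).parent.parts)
        pairs.append((str(path), dest.as_posix()))
    return pairs
```

Calling it on the installed xgboost directory in the virtualenv shows where libxgboost.so actually lives, which can be compared with the candidate paths in the error message.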

Then run the build again:

> pipenv run pyinstaller --clean scscore.spec
> ./dist/scscore

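Running the binary now shows a different symptom: the program keeps relaunching itself and never stops.

This is caused by a bug in the joblib library; see the article Pyinstaller exe keeps opening itself. Downgrading joblib to 0.11 is enough to fix it.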

> pipenv install joblib==0.11
> pipenv run pyinstaller --clean scscore.spec
> ./dist/scscore

Done.
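The Recipe Multiprocessing page referenced at the top covers the same symptom for programs that use multiprocessing directly. As I understand it, its core advice is to call multiprocessing.freeze_support() first thing in the entry point, so that a relaunched copy of the frozen executable runs as a worker instead of starting the whole program again. A minimal sketch of that guard follows; square and main are hypothetical stand-ins, and I have not verified that this alone removes the need for the joblib downgrade.

```python
import multiprocessing


def square(x):
    """Hypothetical worker function; must live at module level to be picklable."""
    return x * x


def main():
    # Spread the work over two worker processes.
    with multiprocessing.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3])


if __name__ == "__main__":
    # Call this before anything else in the entry point.
    # It has no effect when the program is not frozen.
    multiprocessing.freeze_support()
    print(main())
```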

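The following setting, an argument to EXE in the spec file, makes the program unpack its temporary files somewhere else, so that /tmp does not fill up: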

runtime_tmpdir='/home/doctorq/python-dev/scscore/tmp',
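To confirm the setting took effect, the program can report where it was unpacked. A one-file pyinstaller build exposes the extraction directory as sys._MEIPASS (the /tmp/_MEI... paths in the logs above). A minimal sketch, where the helper name extraction_dir and its default argument are hypothetical:

```python
import os
import sys


def extraction_dir(default=None):
    """Return the directory the frozen program was unpacked into.

    When not running from a pyinstaller bundle, fall back to `default`,
    or to the directory of the running script if no default is given.
    """
    if getattr(sys, "frozen", False) and hasattr(sys, "_MEIPASS"):
        return sys._MEIPASS
    if default is not None:
        return default
    return os.path.dirname(os.path.abspath(sys.argv[0]))
```

Logging extraction_dir() at startup should then show a path under the configured runtime_tmpdir instead of /tmp.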
