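While working on my master's thesis project I used a random forest model. Once the code was written, every run kept failing with sklearn.externals.joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable. In plain terms: a task could not be deserialized (un-pickled) in a worker process, and you should make sure every argument passed to the function can be pickled.
I had genuinely never run into this before. From 10 p.m. until past midnight, more than two hours, I searched Baidu extensively and also read several English-language solutions, but none of them quite worked. In the end, changing a parameter and downgrading sklearn fixed it. I was thrilled, so I wrote this post right away, hoping it saves time for whoever comes next.
The key point is the last sentence!!!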
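Since the message asks whether the function's arguments are picklable, one quick check is a pickle round trip on whatever you pass into the parallel code. The sketch below uses only the standard library; `is_picklable` is a hypothetical helper, not part of sklearn or joblib. joblib's loky backend serializes tasks with cloudpickle, which accepts more than plain pickle (lambdas, for example), so treat this as a conservative check. Also note that in the traceback below the worker actually failed with `PermissionError: [WinError 5]`, so in this case the "picklable" hint may well be a red herring.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if obj survives a plain-pickle round trip.

    Hypothetical helper: a conservative proxy for whether an argument
    could be shipped to a worker process.
    """
    try:
        pickle.loads(pickle.dumps(obj))
        return True
    except Exception:
        # pickle can raise TypeError, PicklingError, AttributeError, ...
        return False


params = {"n_estimators": [100, 200], "max_depth": [5, 10]}
print(is_picklable(params))            # True: a dict of lists pickles fine
print(is_picklable(threading.Lock()))  # False: locks cannot be pickled
```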
D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\preprocessing\data.py:617: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
return self.partial_fit(X, y)
D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\base.py:465: DataConversionWarning: Data with input dtype int64, float64 were all converted to float64 by StandardScaler.
return self.fit(X, y, **fit_params).transform(X)
Fitting 3 folds for each of 9 candidates, totalling 27 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
exception calling callback for
sklearn.externals.joblib.externals.loky.process_executor._RemoteTraceback:
'''
Traceback (most recent call last):
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\externals\loky\process_executor.py", line 393, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "C:\Program Files\Python37\lib\multiprocessing\queues.py", line 99, in get
if not self._rlock.acquire(block, timeout):
PermissionError: [WinError 5] Access is denied.
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\externals\loky\_base.py", line 625, in _invoke_callbacks
callback(self)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 375, in __call__
self.parallel.dispatch_next()
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 797, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 825, in dispatch_one_batch
self._dispatch(tasks)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 782, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 506, in apply_async
future = self._workers.submit(SafeFunction(func))
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\externals\loky\reusable_executor.py", line 151, in submit
fn, *args, **kwargs)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\externals\loky\process_executor.py", line 1016, in submit
raise self._flags.broken
sklearn.externals.joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
[Parallel(n_jobs=-1)]: Done 2 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 3 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 4 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 5 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 6 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 7 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 8 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 9 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 10 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 11 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 12 tasks | elapsed: 1.0s
[Parallel(n_jobs=-1)]: Done 14 out of 17 | elapsed: 1.0s remaining: 0.1s
ERROR: The process "19312" not found.
ERROR: The process "20840" not found.
sklearn.externals.joblib.externals.loky.process_executor._RemoteTraceback:
'''
Traceback (most recent call last):
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\externals\loky\process_executor.py", line 393, in _process_worker
call_item = call_queue.get(block=True, timeout=timeout)
File "C:\Program Files\Python37\lib\multiprocessing\queues.py", line 99, in get
if not self._rlock.acquire(block, timeout):
PermissionError: [WinError 5] Access is denied.
'''
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "", line 1, in
File "C:\Program Files (x86)\myInstall\pycharm\PyCharm 2019.1.1\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "C:\Program Files (x86)\myInstall\pycharm\PyCharm 2019.1.1\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/myCode/spark/spark_ML/buildingModel.py", line 114, in
pipe_rf.fit(xtrain,ytrain)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\pipeline.py", line 267, in fit
self._final_estimator.fit(Xt, y, **fit_params)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\model_selection\_search.py", line 722, in fit
self._run_search(evaluate_candidates)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\model_selection\_search.py", line 1191, in _run_search
evaluate_candidates(ParameterGrid(self.param_grid))
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\model_selection\_search.py", line 711, in evaluate_candidates
cv.split(X, y, groups)))
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 996, in __call__
self.retrieve()
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 899, in retrieve
self._output.extend(job.get(timeout=self.timeout))
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 517, in wrap_future_result
return future.result(timeout=timeout)
File "C:\Program Files\Python37\lib\concurrent\futures\_base.py", line 432, in result
return self.__get_result()
File "C:\Program Files\Python37\lib\concurrent\futures\_base.py", line 384, in __get_result
raise self._exception
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\externals\loky\_base.py", line 625, in _invoke_callbacks
callback(self)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 375, in __call__
self.parallel.dispatch_next()
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 797, in dispatch_next
if not self.dispatch_one_batch(self._original_iterator):
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 825, in dispatch_one_batch
self._dispatch(tasks)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\parallel.py", line 782, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\_parallel_backends.py", line 506, in apply_async
future = self._workers.submit(SafeFunction(func))
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\externals\loky\reusable_executor.py", line 151, in submit
fn, *args, **kwargs)
File "D:\myCode\PythonTest\MachineLearning\venv\lib\site-packages\sklearn\externals\joblib\externals\loky\process_executor.py", line 1016, in submit
raise self._flags.broken
sklearn.externals.joblib.externals.loky.process_executor.BrokenProcessPool: A task has failed to un-serialize. Please ensure that the arguments of the function are all picklable.
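At first it ran fine, but then it started failing.
I searched for a long time. I started on CSDN, but there were very few posts about this problem; probably most people have never hit it. Then I searched other sites, and eventually found an answer on Stack Overflow (screenshot below).
The answer suggested downgrading to 0.20.2 and changing the n_jobs parameter. I tried that, but it didn't help; the same error came back.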
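Before and after a downgrade it is worth confirming which interpreter is actually running and which scikit-learn it imports. The traceback above shows the script launched from PyCharm with packages from a venv under D:\myCode\PythonTest\MachineLearning\venv; a downgrade installed into a different environment would have no effect. A minimal sketch using only the standard library (works on Python 3.7); `module_version` is a hypothetical helper name:

```python
import importlib
import sys


def module_version(name):
    """Return (version, file) for an importable module, or None if it is missing."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return getattr(module, "__version__", None), getattr(module, "__file__", None)


print(sys.executable)             # the interpreter actually running this script
print(module_version("sklearn"))  # version string and install path, or None
```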
######################################################################################
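After that I went to GitHub and found a similar issue reported in English. My English is passable, so I could roughly follow the discussion. (If your English isn't strong, you can have Chrome translate the page, but it's better to read the English directly, because Google Translate will translate the code as well.) I studied it for a long time; the gist was, again, to change parameters and downgrade the version.
Then I reread the boxed text in the screenshot carefully and, considering that the parameter probably needed changing too, settled on a plan: change the parameter and downgrade at the same time. That combination should work.
So I downgraded sklearn to 0.20.0 (why this version? I tried other versions and none of them worked; only this one did) and changed the parameter: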
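from sklearn.model_selection import GridSearchCV
# rf_reg, rf_grid_params and cv are defined earlier in the script
# Parameter search: changing n_jobs=-1 to n_jobs=1 made the error go away
rf_gridsearch = GridSearchCV(rf_reg, rf_grid_params, cv=cv, n_jobs=1,
                             scoring='neg_mean_squared_error', verbose=5, refit=True)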
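Why might n_jobs=1 help? With n_jobs=-1, scikit-learn uses all CPU cores (the log above shows "LokyBackend with 8 concurrent workers"), so each fit is sent to a separate worker process: the task is pickled, shipped, and unpickled there. With n_jobs=1 the fits run one after another in the main process and nothing is shipped anywhere. The stdlib sketch below only mimics that idea with concurrent.futures; it is not how joblib is implemented, and `fit_one` and `run_search` are hypothetical names. It also shows the `if __name__ == "__main__":` guard, which Python's multiprocessing documentation requires on Windows for process-based parallelism.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def fit_one(params):
    # Hypothetical stand-in for fitting one candidate on one CV fold.
    n_estimators, max_depth = params
    return n_estimators * max_depth


def run_search(candidates, n_jobs=1):
    if n_jobs == 1:
        # Sequential: everything stays in this process; nothing is pickled.
        return [fit_one(p) for p in candidates]
    # -1 mirrors the sklearn convention of "use all cores".
    workers = os.cpu_count() if n_jobs == -1 else n_jobs
    # Parallel: each task and its result are pickled to/from worker processes.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fit_one, candidates))


if __name__ == "__main__":
    # On Windows, worker processes re-import this file, so the guard is required.
    grid = [(100, 5), (200, 10)]
    print(run_search(grid, n_jobs=1))  # [500, 2000]
    # run_search(grid, n_jobs=-1) gives the same result using worker processes.
```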
######################################################################################
The approach was right, but I wasn't done yet: the earlier error disappeared, and a new one showed up:
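DeprecationWarning: the imp module is deprecated in favour of importlib
This one is simple, and strictly speaking it is a warning rather than an error: the imp module is outdated and has been replaced. imp has been deprecated since Python 3.4, and importlib is the recommended replacement.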
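Fix:
Click the link in the warning to open the file that triggers it, comment out import imp, and import importlib instead.
After that, everything ran perfectly.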
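One caveat on swapping the import: if the file actually calls imp functions (the post doesn't say which ones), those calls need importlib equivalents too, or the file will fail with a NameError. A sketch of two common replacements, assuming the file uses imp.reload or imp.load_source (an assumption; check the file itself). `load_source` here is a hypothetical helper; unlike imp.load_source it does not register the module in sys.modules. Also keep in mind that edits inside site-packages are lost the next time the package is reinstalled.

```python
import importlib
import importlib.util


def load_source(module_name, path):
    """Load a module from a .py file path, roughly like the deprecated imp.load_source().

    Unlike imp.load_source, this does not add the module to sys.modules.
    """
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# imp.reload(some_module)  ->  importlib.reload(some_module)
```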
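In the end it was a version problem. It was my first time running into it, but after more than two hours of searching without leaving my seat, I solved it. The process was exhausting, but it felt great once the problem was gone. With the program running (it took a bit over ten minutes), I went off happily to take a shower. I enjoyed the whole thing; maybe this is what makes coding fun.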