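You can skip straight to approach 3 (installing dedupe-hcluster, at the end); the other parts are kept as a record.
import hcluster as H raised an error, but the program uses squareform and pdist from hcluster, so the package has to be installed.
hcluster package download link
pip reference guide
Summary of the process:
Most of the solutions I found online say to first run the following two commands in cmd: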
python -m pip install --upgrade pip
pip install --upgrade setuptools
and then run pip install hcluster, but that still fails.
Detailed process:
Step 1: open cmd and run pip install hcluster, which reports the following error:
ERROR: Command errored out with exit status 1:
command: 'e:\python\python36\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\pip-install-uqgpy8v6\\hcluster\\setup.py'"'"'; __file__='"'"'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\pip-install-uqgpy8v6\\hcluster\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\ADMINI~1\AppData\Local\Temp\pip-pip-egg-info-ic5yyv_s'
cwd: C:\Users\ADMINI~1\AppData\Local\Temp\pip-install-uqgpy8v6\hcluster\
Complete output (6 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\ADMINI~1\AppData\Local\Temp\pip-install-uqgpy8v6\hcluster\setup.py", line 23
print "No paths in the python path contain numpy/arrayobject.h"
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("No paths in the python path contain numpy/arrayobject.h")?
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
WARNING: You are using pip version 20.2; however, version 20.2.1 is available.
You should consider upgrading via the 'e:\python\python36\python.exe -m pip install --upgrade pip' command.
Step 2: upgrade pip and setuptools, then try pip install hcluster again. It still fails (error output below).
python -m pip install --upgrade pip
pip install --upgrade setuptools
pip install hcluster
ERROR: Command errored out with exit status 1:
command: 'e:\python\python36\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\pip-install-3fztn8vv\\hcluster\\setup.py'"'"'; __file__='"'"'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\pip-install-3fztn8vv\\hcluster\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\ADMINI~1\AppData\Local\Temp\pip-pip-egg-info-434oq9md'
cwd: C:\Users\ADMINI~1\AppData\Local\Temp\pip-install-3fztn8vv\hcluster\
Complete output (6 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\ADMINI~1\AppData\Local\Temp\pip-install-3fztn8vv\hcluster\setup.py", line 23
print "No paths in the python path contain numpy/arrayobject.h"
^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print("No paths in the python path contain numpy/arrayobject.h")?
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
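The traceback points at line 23 of hcluster's setup.py, which uses the Python 2 print statement. The sketch below illustrates why upgrading pip and setuptools cannot help: a Python 3 interpreter refuses to compile such a line at all. The helper name compiles_on_this_python is made up for illustration; the source string is copied from the traceback.

```python
def compiles_on_this_python(source: str) -> bool:
    """Return True if the running interpreter can compile `source`."""
    try:
        compile(source, "setup.py", "exec")
        return True
    except SyntaxError:
        return False


# Line 23 of hcluster's setup.py, as quoted in the traceback (Python 2 syntax).
py2_line = 'print "No paths in the python path contain numpy/arrayobject.h"'
# The form suggested by the error message (Python 3 syntax).
py3_line = 'print("No paths in the python path contain numpy/arrayobject.h")'

print(compiles_on_this_python(py2_line))
print(compiles_on_this_python(py3_line))
```

On a Python 3 interpreter the first call returns False and the second returns True, so the failure comes from the package source itself rather than from the installer.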
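The only place the program uses hcluster is
D = H.squareform(H.pdist(x, distMethod))
After a quick search, it looks like SciPy offers the same two functions: pdist and squareform are documented under scipy.spatial.distance, while scipy.cluster.hierarchy holds the clustering routines. I do not know whether SciPy's squareform and pdist differ in any way from hcluster's, but for now I changed the import to one of the two statements below. I chose the second, so the H. prefix also has to be removed from the original call.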
import scipy.cluster.hierarchy as H
from scipy.spatial.distance import pdist, squareform
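Solution:
Step 1: change the statement
import hcluster as H
to from scipy.spatial.distance import pdist, squareform
Step 2: delete the H. prefix in the original code, so the call becomes D = squareform(pdist(x, distMethod))
Takeaway: this would probably work, but it does not feel like a proper fix, so I decided not to use it.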
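Since it is unclear whether the two libraries behave identically, one way to check is to compare each library's output against a small hand-written reference. The sketch below is pure Python and covers only the Euclidean metric; pdist_ref and squareform_ref are hypothetical names, not part of hcluster or SciPy.

```python
import math
from itertools import combinations


def pdist_ref(points):
    """Condensed Euclidean distances for all row pairs (i < j), row-major."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        for p, q in combinations(points, 2)
    ]


def squareform_ref(condensed, n):
    """Expand a condensed distance list into a symmetric n x n matrix."""
    square = [[0.0] * n for _ in range(n)]
    for (i, j), d in zip(combinations(range(n), 2), condensed):
        square[i][j] = d
        square[j][i] = d
    return square


x = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
condensed = pdist_ref(x)            # pairs (0,1), (0,2), (1,2)
D = squareform_ref(condensed, len(x))

for row in D:
    print(row)
```

With either library installed, the result of squareform(pdist(x, 'euclidean')) can be compared element by element against D; other values of distMethod would need their own reference.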
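Problem analysis:
The traceback shows that hcluster's setup.py uses the Python 2 print statement, so it cannot run under Python 3.6 no matter how new pip and setuptools are. Comparing hcluster (hcluster 0.2.0, released Dec 14, 2008) with dedupe-hcluster (dedupe-hcluster 0.3.8, released Jan 12, 2020) shows that hcluster was published a long time ago, so I guessed a newer version might exist. I tentatively treat dedupe-hcluster as that newer version: its README.md states that it preserves the hcluster 0.2 API, so it should be safe to use as a replacement.
The dedupe-hcluster README.md says: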
It is a fork of clustering and distance functions from the scipy that removes all the dependencies on scipy. It preserves the API of hcluster 0.2.
Part of the Dedupe.io cloud service and open source toolset for de-duplicating and finding fuzzy matches in your data.
hcluster official documentation
dedupe-hcluster
Solution:
pip install dedupe-hcluster
import hcluster as H
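After installing, it can be useful to confirm that the import really resolves and exposes the two functions the program calls. The following is a minimal sketch, assuming (as the original call H.squareform(H.pdist(...)) implies) that the module exposes pdist and squareform at top level; load_distance_backend is a hypothetical helper name, and the SciPy fallback is optional.

```python
import importlib


def load_distance_backend(candidates=("hcluster", "scipy.spatial.distance")):
    """Return (name, module) for the first importable candidate that has
    both pdist and squareform, or (None, None) if no candidate qualifies."""
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
        if hasattr(module, "pdist") and hasattr(module, "squareform"):
            return name, module
    return None, None


backend_name, H = load_distance_backend()
if H is None:
    print("neither hcluster nor scipy.spatial.distance is importable")
else:
    print("using", backend_name)
```

If the helper reports hcluster, the original line D = H.squareform(H.pdist(x, distMethod)) can stay unchanged.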