Importing hcluster in Python 3.6

Fixing the pip install hcluster problem

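  • Error on import hcluster as H
    • 1. Problem
    • 2. Approaches
      • 1. Approach 1: pip install hcluster directly (failed)
        • 1.1 Upgrade pip and setuptools
        • 1.2 Detailed log
      • 2. Approach 2: use squareform and pdist from another module (unsure whether this is acceptable)
      • 3. Approach 3: pip install dedupe-hcluster (succeeded; final solution)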

Error on import hcluster as H

You can jump straight to Approach 3; the other sections are kept as a record.

1. Problem

import hcluster as H fails, yet my program needs hcluster's squareform and pdist, so the package has to be importable.

2. Approaches

1. Approach 1: pip install hcluster directly (failed)

hcluster package download page
pip reference guide
Short version:

1.1 Upgrade pip and setuptools

Most of the solutions I found online say to first run these two commands in cmd:

python -m pip install --upgrade pip
pip install --upgrade setuptools

After that, running pip install hcluster in cmd still failed with an error.

1.2 Detailed log:

Step 1: pip install hcluster fails with the following error:

    ERROR: Command errored out with exit status 1:
     command: 'e:\python\python36\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\pip-install-uqgpy8v6\\hcluster\\setup.py'"'"'; __file__='"'"'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\pip-install-uqgpy8v6\\hcluster\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\ADMINI~1\AppData\Local\Temp\pip-pip-egg-info-ic5yyv_s'
         cwd: C:\Users\ADMINI~1\AppData\Local\Temp\pip-install-uqgpy8v6\hcluster\
    Complete output (6 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\ADMINI~1\AppData\Local\Temp\pip-install-uqgpy8v6\hcluster\setup.py", line 23
        print "No paths in the python path contain numpy/arrayobject.h"
                                                                      ^
    SyntaxError: Missing parentheses in call to 'print'. Did you mean print("No paths in the python path contain numpy/arrayobject.h")?
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
WARNING: You are using pip version 20.2; however, version 20.2.1 is available.
You should consider upgrading via the 'e:\python\python36\python.exe -m pip install --upgrade pip' command.
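The traceback points at line 23 of hcluster's setup.py, which uses the Python 2 print statement. Python 3 rejects that syntax at compile time, so setup.py never runs at all, and upgrading pip or setuptools cannot fix it. The sketch below reproduces the failure with the built-in compile(); the helper name compiles_on_this_python is hypothetical, and the offending line is copied from the log above.

```python
# Reproduce why hcluster's setup.py cannot even be compiled on Python 3.
# The source line is copied from the traceback; the helper is a hypothetical illustration.
PY2_SETUP_LINE = 'print "No paths in the python path contain numpy/arrayobject.h"\n'


def compiles_on_this_python(source):
    """Return (True, None) if `source` compiles, else (False, the SyntaxError message)."""
    try:
        compile(source, "setup.py", "exec")
        return True, None
    except SyntaxError as exc:
        return False, exc.msg


ok, msg = compiles_on_this_python(PY2_SETUP_LINE)
print(ok, msg)  # ok is False on any Python 3 interpreter
```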

Step 2: Upgrade pip and setuptools, then retry pip install hcluster. It still fails (error output below):

  • python -m pip install --upgrade pip
  • pip install --upgrade setuptools
  • pip install hcluster
    ERROR: Command errored out with exit status 1:
     command: 'e:\python\python36\python.exe' -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\pip-install-3fztn8vv\\hcluster\\setup.py'"'"'; __file__='"'"'C:\\Users\\ADMINI~1\\AppData\\Local\\Temp\\pip-install-3fztn8vv\\hcluster\\setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base 'C:\Users\ADMINI~1\AppData\Local\Temp\pip-pip-egg-info-434oq9md'
         cwd: C:\Users\ADMINI~1\AppData\Local\Temp\pip-install-3fztn8vv\hcluster\
    Complete output (6 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\ADMINI~1\AppData\Local\Temp\pip-install-3fztn8vv\hcluster\setup.py", line 23
        print "No paths in the python path contain numpy/arrayobject.h"
                                                                      ^
    SyntaxError: Missing parentheses in call to 'print'. Did you mean print("No paths in the python path contain numpy/arrayobject.h")?
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Approach 1 failed, so I looked for another way.

2. Approach 2: use squareform and pdist from another module (unsure whether this is acceptable)

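Looking through the program, hcluster is used in only one statement: D = H.squareform(H.pdist(x, distMethod)). A quick search suggested that scipy.cluster.hierarchy seems to offer these two functions as well, though I don't know whether scipy's squareform and pdist (in scipy.spatial.distance) differ in any way from hcluster's. For now I changed the import to one of the two options below. I picked the second, which also means deleting the H. prefix from the original code: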

  • import scipy.cluster.hierarchy as H
  • from scipy.spatial.distance import pdist, squareform
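Whether the scipy functions can stand in for hcluster's mostly comes down to what the pair computes. In scipy, pdist returns a condensed vector holding the n(n-1)/2 pairwise distances in the order (0,1), (0,2), ..., (1,2), ..., and squareform expands that vector into a symmetric n x n matrix with zeros on the diagonal. The pure-Python sketch below spells this out for the Euclidean metric only; condensed_euclidean and to_square are hypothetical helpers for illustration, not part of either library.

```python
import math
from itertools import combinations


def condensed_euclidean(points):
    """Pairwise Euclidean distances in condensed order: (0,1), (0,2), ..., (1,2), ..."""
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
        for p, q in combinations(points, 2)
    ]


def to_square(condensed, n):
    """Expand a condensed distance vector into a symmetric n x n matrix with a zero diagonal."""
    D = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            D[i][j] = D[j][i] = condensed[k]
            k += 1
    return D


x = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]  # hypothetical sample points
d = condensed_euclidean(x)                 # [5.0, 10.0, 5.0]
D = to_square(d, len(x))
print(D)
```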

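Fix:

Step 1: Change the statement import hcluster as H to from scipy.spatial.distance import pdist, squareform.
Step 2: Delete the H. prefix in the original code, so the call becomes D = squareform(pdist(x, distMethod)).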
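Applied to the one statement that used hcluster, the two steps look roughly like this. The sample points x and the metric name "euclidean" are assumptions for illustration, since the original value of distMethod is not shown; scipy's pdist also accepts other metric names such as "cityblock" or "cosine".

```python
# Approach 2: take pdist and squareform from scipy instead of hcluster.
# Before: import hcluster as H
#         D = H.squareform(H.pdist(x, distMethod))
from scipy.spatial.distance import pdist, squareform

x = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]  # hypothetical sample points
distMethod = "euclidean"                   # assumed metric; the original value is not shown

D = squareform(pdist(x, distMethod))       # symmetric 3 x 3 NumPy array, zero diagonal
print(D)
```

With these points, D[0][1] and D[1][2] should both be 5.0 and D[0][2] should be 10.0.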

Takeaway: this probably works too, but it felt like a workaround rather than a proper fix, so I decided not to use it.

3. Approach 3: pip install dedupe-hcluster (succeeded; final solution)

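Analysis:
I looked at the package pages for hcluster (hcluster 0.2.0, released Dec 14, 2008) and dedupe-hcluster (dedupe-hcluster 0.3.8, released Jan 12, 2020). hcluster has not been updated since 2008, and its setup.py still uses Python 2 syntax (the print statement in the traceback), which is why it cannot be installed on Python 3. I guessed that a newer version existed, and dedupe-hcluster can be treated as that newer version: its README.md states that it preserves the hcluster 0.2 API. So dedupe-hcluster can be used with confidence.
The dedupe-hcluster README.md says: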

It is a fork of clustering and distance functions from the scipy that removes all the dependencies on scipy. It preserves the API of hcluster 0.2.
Part of the Dedupe.io cloud service and open source toolset for de-duplicating and finding fuzzy matches in your data.

hcluster official documentation
dedupe-hcluster
Fix:

  1. Open cmd
  2. Run pip install dedupe-hcluster
  3. No changes to the Python code are needed; the statement is still import hcluster as H
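To confirm that the install worked without touching the program, a small check like the following can be run. It uses only the standard library (importlib.util.find_spec, available on Python 3.6); the function name hcluster_available is hypothetical, and the hasattr checks assume, per the README quoted above, that dedupe-hcluster keeps the hcluster 0.2 names pdist and squareform.

```python
import importlib.util


def hcluster_available():
    """Return True if some installed distribution provides a top-level 'hcluster' module."""
    return importlib.util.find_spec("hcluster") is not None


if __name__ == "__main__":
    if hcluster_available():
        import hcluster as H  # provided by dedupe-hcluster after step 2
        # Assumes the hcluster 0.2 API is preserved, so both names should exist.
        print("hcluster found:", hasattr(H, "pdist"), hasattr(H, "squareform"))
    else:
        print("hcluster not found; try: pip install dedupe-hcluster")
```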
