BERTopic: from installation pitfalls to a successful run

In previous posts I also just downloaded the package... ran it a bit... the results weren't great, and I never followed up. Hoping this time I can actually get BERTopic working. With this many stars it should be fine, right? Sob, I really want to cry.

intro

GitHub: BERTopic
Documentation
Purpose: discover the latent topic categories hidden in a pile of documents.
Background: topic modeling can be viewed as a clustering task.
Intro: BERTopic is built on document embeddings. It first produces embedding vectors with a pre-trained transformer-based language model, then clusters those embeddings, and finally generates the topic representations with a class-based TF-IDF (c-TF-IDF) procedure (a small sketch of this pipeline follows at the end of this intro).
Weaknesses (my own rough translation):
1. It assumes that each document contains only a single topic.
2. Even though BERT takes contextual semantics into account, the topic representations are generated with a bag-of-words model, so that context never shows up in them: a topic representation is just a set of words reflecting their importance and relevance within that topic. And since the words in different topics can be near-synonyms, some topics end up redundant.
(So I take this to mean something like:
topic 1: pipe, rise, rapidly
topic 2: tube, ascend, speedily
These two topic representations are semantically almost identical, yet the algorithm would still split them into two topics.)
That is my own guess; I am not sure whether this understanding is correct.
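Here is the sketch mentioned above: my own reading of how that embed -> cluster -> c-TF-IDF pipeline maps onto BERTopic's pluggable components. The embedding model name and the clustering parameters are illustrative choices of mine, not anything prescribed by the paper.

# A rough sketch of the pipeline described above; model name and parameters are my own example choices.
from bertopic import BERTopic
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")           # transformer-based document embeddings
umap_model = UMAP(n_neighbors=15, n_components=5, metric="cosine")  # reduce the embeddings' dimensionality
hdbscan_model = HDBSCAN(min_cluster_size=10, metric="euclidean")    # density-based clustering of the reduced vectors

# Topic words are then extracted per cluster with class-based TF-IDF inside BERTopic.
topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
)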

A newbie's progress

Installation

PyTorch environment

After much trying, I never got it running in my PyTorch environment, so feel free to skip ahead to the TensorFlow section.

Since I wanted the visualization features, I installed BERTopic like this:

pip install bertopic[visualization]

which produced this error:

 Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\5e\6f\8c\d88aec621f3f542d26fac0342bef5e693335d125f4e54aeffe
  Building wheel for hdbscan (PEP 517) ... error
  ERROR: Command errored out with exit status 1:
   command: 'C:\Users\user\anaconda3\python.exe' 'C:\Users\user\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py' build_wheel 'C:\Users\user\AppData\Local\Temp\tmp05_ix5xu'
       cwd: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan
  Complete output (40 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-cpython-38
  creating build\lib.win-amd64-cpython-38\hdbscan
  copying hdbscan\flat.py -> build\lib.win-amd64-cpython-38\hdbscan
  copying hdbscan\hdbscan_.py -> build\lib.win-amd64-cpython-38\hdbscan
  copying hdbscan\plots.py -> build\lib.win-amd64-cpython-38\hdbscan
  copying hdbscan\prediction.py -> build\lib.win-amd64-cpython-38\hdbscan
  copying hdbscan\robust_single_linkage_.py -> build\lib.win-amd64-cpython-38\hdbscan
  copying hdbscan\validity.py -> build\lib.win-amd64-cpython-38\hdbscan
  copying hdbscan\__init__.py -> build\lib.win-amd64-cpython-38\hdbscan
  creating build\lib.win-amd64-cpython-38\hdbscan\tests
  copying hdbscan\tests\test_flat.py -> build\lib.win-amd64-cpython-38\hdbscan\tests
  copying hdbscan\tests\test_hdbscan.py -> build\lib.win-amd64-cpython-38\hdbscan\tests
  copying hdbscan\tests\test_prediction_utils.py -> build\lib.win-amd64-cpython-38\hdbscan\tests
  copying hdbscan\tests\test_rsl.py -> build\lib.win-amd64-cpython-38\hdbscan\tests
  copying hdbscan\tests\__init__.py -> build\lib.win-amd64-cpython-38\hdbscan\tests
  running build_ext
  cythoning hdbscan/_hdbscan_tree.pyx to hdbscan\_hdbscan_tree.c
  cythoning hdbscan/_hdbscan_linkage.pyx to hdbscan\_hdbscan_linkage.c
  cythoning hdbscan/_hdbscan_boruvka.pyx to hdbscan\_hdbscan_boruvka.c
  cythoning hdbscan/_hdbscan_reachability.pyx to hdbscan\_hdbscan_reachability.c
  cythoning hdbscan/_prediction_utils.pyx to hdbscan\_prediction_utils.c
  cythoning hdbscan/dist_metrics.pyx to hdbscan\dist_metrics.c
  building 'hdbscan._hdbscan_tree' extension
  C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\_hdbscan_tree.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\_hdbscan_linkage.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\_hdbscan_boruvka.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\_hdbscan_reachability.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\_prediction_utils.pyx
    tree = Parsing.p_module(s, pxd, full_module_name)
  C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\dist_metrics.pxd
    tree = Parsing.p_module(s, pxd, full_module_name)
  error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
  ----------------------------------------
  ERROR: Failed building wheel for hdbscan
  Building wheel for pynndescent (setup.py) ... done
  Created wheel for pynndescent: filename=pynndescent-0.5.7-py3-none-any.whl size=54278 sha256=1526eabb33909cf87dba133774cd96564dd55af33c30b688488c8113ad2e54a9
  Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\1b\38\fe\99e22fbae88abd1c5e8d99253cba6d1c590cc7a94408bff3bf
Successfully built umap-learn sentence-transformers pynndescent
Failed to build hdbscan
ERROR: Could not build wheels for hdbscan which use PEP 517 and cannot be installed directly

I successfully installed hdbscan using the method from this blog post:

conda install -c zeus1942 hdbscan

Hmm.
Just now I installed directly in the project's cmd without specifying an environment.
Let me do it again with the environment selected in PyCharm.
Then pip install bertopic still failed.
This time the error was:

Building wheels for collected packages: hdbscan, sentence-transformers, umap-learn, pynndescent, future
  Building wheel for hdbscan (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for hdbscan (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [40 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build\lib.win-amd64-cpython-37
      creating build\lib.win-amd64-cpython-37\hdbscan
      copying hdbscan\flat.py -> build\lib.win-amd64-cpython-37\hdbscan
      copying hdbscan\hdbscan_.py -> build\lib.win-amd64-cpython-37\hdbscan
      copying hdbscan\plots.py -> build\lib.win-amd64-cpython-37\hdbscan
      copying hdbscan\prediction.py -> build\lib.win-amd64-cpython-37\hdbscan
      copying hdbscan\robust_single_linkage_.py -> build\lib.win-amd64-cpython-37\hdbscan
      copying hdbscan\validity.py -> build\lib.win-amd64-cpython-37\hdbscan
      copying hdbscan\__init__.py -> build\lib.win-amd64-cpython-37\hdbscan
      creating build\lib.win-amd64-cpython-37\hdbscan\tests
      copying hdbscan\tests\test_flat.py -> build\lib.win-amd64-cpython-37\hdbscan\tests
      copying hdbscan\tests\test_hdbscan.py -> build\lib.win-amd64-cpython-37\hdbscan\tests
      copying hdbscan\tests\test_prediction_utils.py -> build\lib.win-amd64-cpython-37\hdbscan\tests
      copying hdbscan\tests\test_rsl.py -> build\lib.win-amd64-cpython-37\hdbscan\tests
      copying hdbscan\tests\__init__.py -> build\lib.win-amd64-cpython-37\hdbscan\tests
      running build_ext
      cythoning hdbscan/_hdbscan_tree.pyx to hdbscan\_hdbscan_tree.c
      cythoning hdbscan/_hdbscan_linkage.pyx to hdbscan\_hdbscan_linkage.c
      cythoning hdbscan/_hdbscan_boruvka.pyx to hdbscan\_hdbscan_boruvka.c
      cythoning hdbscan/_hdbscan_reachability.pyx to hdbscan\_hdbscan_reachability.c
      cythoning hdbscan/_prediction_utils.pyx to hdbscan\_prediction_utils.c
      cythoning hdbscan/dist_metrics.pyx to hdbscan\dist_metrics.c
      building 'hdbscan._hdbscan_tree' extension
      C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\_hdbscan_tree.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\_hdbscan_linkage.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\_hdbscan_boruvka.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\_hdbscan_reachability.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\_prediction_utils.pyx
        tree = Parsing.p_module(s, pxd, full_module_name)
      C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\dist_metrics.pxd
        tree = Parsing.p_module(s, pxd, full_module_name)
      error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for hdbscan
  Building wheel for sentence-transformers (setup.py) ... done
  Created wheel for sentence-transformers: filename=sentence_transformers-2.2.2-py3-none-any.whl size=125940 sha256=d580c60d854746cc9537aac0ad1cf1ad0b5f0ace67d0de1dfa874c58f120e5e5
  Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\bf\06\fb\d59c1e5bd1dac7f6cf61ec0036cc3a10ab8fecaa6b2c3d3ee9
  Building wheel for umap-learn (setup.py) ... done
  Created wheel for umap-learn: filename=umap_learn-0.5.3-py3-none-any.whl size=82829 sha256=43460ec4024bb21b4fdadd1c7260b20c6b5d490767618ce4e4f33a40c3340793
  Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\b3\52\a5\1fd9e3e76a7ab34f134c07469cd6f16e27ef3a37aeff1fe821
  Building wheel for pynndescent (setup.py) ... done
  Created wheel for pynndescent: filename=pynndescent-0.5.7-py3-none-any.whl size=54286 sha256=9203e46f4893cb9228521f3853c8bab4b2a1b40a3dc2bb7c45d29ee220c64d35
  Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\7f\2a\f8\7bd5dcec71bd5c669f6f574db3113513696b98f3f9b51f496c
  Building wheel for future (setup.py) ... done
  Created wheel for future: filename=future-0.18.2-py3-none-any.whl size=491070 sha256=cdaf68da5115f51c5959d42d97b0936f1fc7e57ac2e0b7b1801c1bc5bfe536ca
  Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\56\b0\fe\4410d17b32f1f0c3cf54cdfb2bc04d7b4b8f4ae377e2229ba0
Successfully built sentence-transformers umap-learn pynndescent future
Failed to build hdbscan
ERROR: Could not build wheels for hdbscan, which is required to install pyproject.toml-based projects


I decided to try it once more:

conda install -c zeus1942 hdbscan

OK. I clearly can't read. That doesn't work here, and it isn't even the same error. How am I supposed to deal with this?

Installing hdbscan manually

This blogger hit the same pyproject.toml failure, so let me try their approach: install a prebuilt hdbscan wheel by hand.

The problem is that I don't know which one to download.
I decided to install the same version as the blogger.
The result showed:

Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: hdbscan-0.8.28-cp38-cp38-win_amd64.whl is not a supported wheel on this platform.

So it's saying that the wheel I downloaded isn't compatible with my current platform?
I then found another blogger's post about this error,
"whl is not a supported wheel",
which says to run:

pip debug --verbose

The output I got:

Compatible tags: 27
  cp37-cp37m-win_amd64
  cp37-abi3-win_amd64
  cp37-none-win_amd64
  cp36-abi3-win_amd64
  cp35-abi3-win_amd64
  cp34-abi3-win_amd64
  cp33-abi3-win_amd64
  cp32-abi3-win_amd64
  py37-none-win_amd64
  py3-none-win_amd64
  py36-none-win_amd64
  py35-none-win_amd64
  py34-none-win_amd64
  py33-none-win_amd64
  py32-none-win_amd64
  py31-none-win_amd64
  py30-none-win_amd64
  cp37-none-any
  py37-none-any
  py3-none-any
  py36-none-any
  py35-none-any
  py34-none-any
  py33-none-any
  py32-none-any
  py31-none-any
  py30-none-any
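(A side note of mine, not from the post I was following: the same tag list can be printed from Python with the packaging library that pip itself relies on, which makes it easier to see which wheel file to pick.)

# Print the most specific wheel tags this interpreter accepts; the top entries
# (e.g. cp37-cp37m-win_amd64 here) tell you which hdbscan wheel to download.
from packaging.tags import sys_tags

for tag in list(sys_tags())[:5]:
    print(tag)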

OK, let's try again.

Installing collected packages: joblib, cython, hdbscan
  Attempting uninstall: joblib
    Found existing installation: joblib 0.13.0
    Uninstalling joblib-0.13.0:
      Successfully uninstalled joblib-0.13.0
  Attempting uninstall: hdbscan
    Found existing installation: hdbscan 0.8.26
    Uninstalling hdbscan-0.8.26:
      Successfully uninstalled hdbscan-0.8.26
Successfully installed cython-0.29.30 hdbscan-0.8.28 joblib-1.1.0

It installed successfully.
But does that mean there was already an hdbscan installed before?
Now, once more:

pip install bertopic
Successfully installed bertopic-0.10.0 click-8.1.3 filelock-3.7.1 future-0.18.2 huggingface-hub-0.8.1 llvmlite-0.38.1 nltk-3.7 numba-0.55.2 plotly-5.9.0 pynndescent-0.5.7 pyyaml-5.4.1 regex-2022.6.2 sentence-transformers-2.2.2 sentencepiece-0.1.96 tenacity-8.0.1 tokenizers-0.12.1 transformers-4.20.1 umap-learn-0.5.3

Finally, success.

TensorFlow environment

pip install bertopic

I ran into the same problem as yesterday:

 Building wheel for hdbscan (PEP 517) ... error

So I ran:

conda install -c zeus1942 hdbscan

and got:

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - hdbscan -> python[version='3.6.*|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0']

Your python: python=3.8

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

So I tried method 2 from above (the manual wheel install).
Here pip debug --verbose shows cp38-cp38-win_amd64,
so I downloaded the wheel for that tag,
and manually installing hdbscan worked once again.
Then pip install bertopic again.
I really hope that now, in the TensorFlow environment, the five-line example from the official docs will just run.
But during the install I noticed:

Installing collected packages: llvmlite, numba, pynndescent, tqdm, umap-learn, pandas, click, regex, nltk, filelock, tokenizers, typing-extensions, huggingface-hub, transformers, sentencepiece, torch, pillow, torchvision, sentence-transformers, tenacity, plotly, bertopic

(Is it installing torch? Presumably pulled in as a dependency of sentence-transformers.)

Successfully installed bertopic-0.10.0 click-8.1.3 filelock-3.7.1 huggingface-hub-0.8.1 llvmlite-0.38.1 nltk-3.7 numba-0.55.2 pandas-1.4.3 pillow-9.1.1 plotly-5.9.0 pynndescent-0.5.7 regex-2022.6.2 sentence-transformers-2.2.2 sentencepiece-0.1.96 tenacity-8.0.1 tokenizers-0.12.1 torch-1.12.0 torchvision-0.13.0 tqdm-4.64.0 transformers-4.20.1 typing-extensions-4.2.0 umap-learn-0.5.3

So it did install successfully, but with red text:

ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.

We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.

numba 0.55.2 requires numpy<1.23,>=1.18, but you'll have numpy 1.23.0 which is incompatible.
huggingface-hub 0.8.1 requires packaging>=20.9, but you'll have packaging 20.4 which is incompatible.

It suggests appending --use-feature=2020-resolver to the install command for the packages in question.
So does that mean reinstalling numba and huggingface-hub?

pip install numba==0.55.2 --use-feature=2020-resolver

After running that, I got:
Successfully installed numpy-1.22.4
i.e., it replaced the numpy in my environment with a version that numba is compatible with.

pip install huggingface-hub==0.8.1 --use-feature=2020-resolver

Successfully installed packaging-21.3
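As a quick sanity check of my own (not something from the log above), the pinned versions can be confirmed to satisfy the two constraints pip complained about:

# numba 0.55.x wants numpy < 1.23, and huggingface-hub 0.8.x wants packaging >= 20.9.
import numpy
import numba
import packaging

print(numpy.__version__)      # expected 1.22.4 after the reinstall above
print(numba.__version__)      # expected 0.55.2
print(packaging.__version__)  # expected 21.3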

Alright! Now I'm going to run those five lines of code!

Running it (?)

Using the code from the documentation, I created a main.py:

from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset='all',  remove=('headers', 'footers', 'quotes'))['data']
# fetch_20newsgroups(data_home=None, subset='train', categories=None, shuffle=True, random_state=42, remove=(), download_if_missing=True)
# subset='all' loads both the training and the test set, shuffled.
# remove= strips metadata so a classifier doesn't overfit on it: 'headers' removes the newsgroup headers,
# 'footers' removes the signature-like blocks at the end of posts, and 'quotes' removes lines that appear to quote other posts.
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

Running it gave...

Traceback (most recent call last):
  File "D:/lll/bertopic/mytopictest/main.py", line 1, in <module>
    from bertopic import BERTopic
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\__init__.py", line 1, in <module>
    from bertopic._bertopic import BERTopic
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\_bertopic.py", line 31, in <module>
    from bertopic.backend._utils import select_backend
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\backend\__init__.py", line 2, in <module>
    from ._word_doc import WordDocEmbedder
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\backend\_word_doc.py", line 4, in <module>
    from bertopic.backend._utils import select_backend
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\backend\_utils.py", line 2, in <module>
    from ._sentencetransformers import SentenceTransformerBackend
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\backend\_sentencetransformers.py", line 3, in <module>
    from sentence_transformers import SentenceTransformer
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\sentence_transformers\__init__.py", line 3, in <module>
    from .datasets import SentencesDataset, ParallelSentencesDataset
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\sentence_transformers\datasets\__init__.py", line 3, in <module>
    from .ParallelSentencesDataset import ParallelSentencesDataset
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\sentence_transformers\datasets\ParallelSentencesDataset.py", line 4, in <module>
    from .. import SentenceTransformer
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 11, in <module>
    import transformers
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\transformers\__init__.py", line 30, in <module>
    from . import dependency_versions_check
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\transformers\dependency_versions_check.py", line 17, in <module>
    from .utils.versions import require_version, require_version_core
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\transformers\utils\__init__.py", line 33, in <module>
    from .generic import (
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\transformers\utils\generic.py", line 28, in <module>
    from .import_utils import is_flax_available, is_tf_available, is_torch_available, is_torch_fx_proxy
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\transformers\utils\import_utils.py", line 60, in <module>
    _tf_available = importlib.util.find_spec("tensorflow") is not None
  File "C:\Users\user\anaconda3\envs\pytorch1\lib\importlib\util.py", line 114, in find_spec
    raise ValueError('{}.__spec__ is None'.format(name))
ValueError: tensorflow.__spec__ is None

I don't even know where to start fixing this.


Half an hour later, having calmed down, I came back.
Part of the code in the traceback uses statements of the form

from . import
from .. import

These are relative imports: they look for the package or function to be imported within the current package.
Package import problems
According to that post, the idea is to avoid these two forms and replace the . or .. with the name of the module where the imported thing actually lives.
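For my own notes, here is a tiny made-up example of what those two statement forms mean; mypkg, utils, and helper are hypothetical names, not anything from transformers or sentence-transformers.

# Hypothetical package layout:
#   mypkg/
#     __init__.py
#     utils.py      (defines helper())
#     module.py     (the file shown below)
#
# Inside mypkg/module.py:
from . import utils            # relative import: "the package this file belongs to"
from .utils import helper      # relative import of a sibling module's function

# The absolute form the post recommends names the package explicitly instead:
from mypkg.utils import helper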
Let's set that aside for now
and start with the error at the bottom of the traceback, looking it up online.

ValueError: tensorflow.__spec__ is None

Searching for this message online turned up:
Method 1: switch the current transformers version to 4.4.0.
Method 2: don't touch transformers at all; just upgrade your TensorFlow to the latest version.
(See the quick check sketched below before choosing.)
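Here is a small check of my own (not from either post) that reproduces the exact call transformers makes, to see what this interpreter currently thinks "tensorflow" is:

# transformers fails on importlib.util.find_spec("tensorflow"). find_spec raises
# "ValueError: tensorflow.__spec__ is None" when a module named "tensorflow" is
# already sitting in sys.modules with a broken (None) __spec__.
import sys
import importlib.util

print("tensorflow" in sys.modules)             # has anything been loaded under that name yet?
print(importlib.util.find_spec("tensorflow"))  # None -> not installed; a ModuleSpec -> installed and importable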

Then it suddenly occurred to me: I don't actually know whether BERTopic uses PyTorch or TensorFlow.
Searching the bertopic repository turns up no import torch.
There is no import tensorflow in the repository either;
the only hit is a single import tensorflow_hub.

TensorFlow Hub is Google's library for packaging machine-learning modules: it lets developers publish trained TensorFlow models as modules that can be reused or shared with the community. The tensorflow_hub library can be installed alongside both TensorFlow 1 and TensorFlow 2, and new users are advised to start with TensorFlow 2.

So... it uses TensorFlow?
Can some expert please help me out?

Publishing trained models as modules

So does that mean tensorflow_hub is only used to turn their trained models into something you can just install and use directly, pip-install style?
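To see what "publishing a model as a module" looks like in practice, here is a minimal tensorflow_hub sketch; the Universal Sentence Encoder URL is just a well-known published module I'm using as an example, and I haven't checked whether it is the one BERTopic's backend actually points at.

# Load a model someone else published to TF Hub and use it directly.
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
vectors = embed(["BERTopic finally installed", "hdbscan refused to build"])
print(vectors.shape)  # (2, 512): one 512-dimensional embedding per sentence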

--------- 2022-06-29: wiping away the tears and coming back.
Trying method 1:

Successfully installed sacremoses-0.0.53 tokenizers-0.10.3 transformers-4.4.0

Now that 4.4.0 is installed, I ran that tiny five-line example again.
I got exactly the same error.

My environment also seems to be the PyTorch one? I'm now creating a new conda environment and trying to install TensorFlow in it. I've never done that before, so I have no idea whether it will work.

Holy crap!!!!!!!!!!!!!!!!!
It's running with TensorFlow!
Let me wait a bit; no output yet.
News topic modeling with BERTopic

Reading this expert's article to figure out how to swap in my own Chinese text.


Quite a while later:
Process finished with exit code 0
And then the weird part:
the run has clearly finished, yet the variable doesn't exist.

>>> topic_model.get_topic_info()
Traceback (most recent call last):
  File "C:\Users\user\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3418, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    topic_model.get_topic_info()
NameError: name 'topic_model' is not defined

Why??


2022-06-30
It's running!
I still don't understand why it doesn't work after running main.py.
The way that finally worked: typing the code line by line into the Python console.
Then the topic info prints out successfully, just like in the repository.
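In hindsight I think the NameError above makes sense: topic_model only exists inside the process that ran main.py, and the console I queried afterwards is a separate interpreter, so the variable simply isn't there. Below is a minimal sketch of how I would adjust main.py so the results survive the run; the file name passed to save() is just a placeholder of mine.

from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

# Print the topic table before the process exits, so the output isn't lost.
print(topic_model.get_topic_info())
# Persist the fitted model; it can be reloaded later with BERTopic.load("my_bertopic_model").
topic_model.save("my_bertopic_model")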
Then, when it comes to visualization, it's not as if typing this one line makes a viewable figure pop up:

topic_model.visualize_topics()

Typed in like that, all I got was a printout of a bunch of the figure's attribute information...

    fig = topic_model.visualize_topics()
    fig.write_html("path/to/file.html")

So I had to save it out like this to view it.
And then you can see the result.
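An alternative that I believe should also work from a plain console is to ask Plotly to open the figure in the browser rather than writing a file:

fig = topic_model.visualize_topics()
# visualize_topics() returns a Plotly figure; show() opens it in the default browser.
fig.show()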
But both the topic output above and this clustering figure differ from what the author's repository shows. I don't understand why; the dataset shouldn't have changed, should it? (My guess: the UMAP/HDBSCAN steps are stochastic, so results vary from run to run unless a random seed is fixed.)
That's it for this post. The next one will look at how to swap in my own Chinese dataset.
