In previous posts I also just downloaded the package... ran it... got poor results... and gave up. Hoping this time I can actually get BERTopic working. With this many stars it should be fine, right? 555 I'm seriously about to cry.
GitHub BERTopic
Documentation
Purpose: discover latent topic categories in a collection of documents.
Background: topic modeling can be viewed as a clustering task.
Intro: BERTopic builds document embeddings with a pre-trained transformer-based language model, producing a pile of embedding vectors (?), clusters them, and finally generates topic representations with a class-based TF-IDF procedure.
Weaknesses (my own rough translation):
1. It assumes each document contains only one topic.
2. Although the BERT embeddings capture context, the topic representations are generated from a bag-of-words model, so that context cannot show up in them. A topic's representation is just a list of words, and those words only reflect their importance and relevance within that topic. Words in different topics can then be similar, so topics can end up redundant.
(So I take this to mean, for example:
topic1: pipe, rise, rapidly
topic2: tube, ascend, speedily
These two topic representations are semantically almost identical, but the algorithm still splits them into two topics.)
That's my guess; no idea if the understanding is correct.
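For intuition about why the representation is "just a bag of words", here is a toy sketch of the class-based TF-IDF weighting as I understand it from the BERTopic paper (term frequency normalized per topic, weighted by log(1 + average words per topic / term frequency across topics)). The counts and vocabulary are made up for illustration, and the exact normalization may differ between library versions:

```python
import numpy as np

# Toy term counts per topic cluster: rows = topics (all docs in a cluster
# concatenated), columns = vocabulary terms. Numbers are invented.
vocab = ["pipe", "tube", "rise"]
tf = np.array([
    [5, 0, 3],   # topic 1
    [0, 4, 2],   # topic 2
], dtype=float)

# Normalize term frequency within each topic, then apply an IDF-like factor:
# log(1 + average words per topic / term frequency across all topics).
tf_norm = tf / tf.sum(axis=1, keepdims=True)
avg_words = tf.sum() / tf.shape[0]
idf = np.log(1 + avg_words / tf.sum(axis=0))
ctfidf = tf_norm * idf

# The top-weighted terms per row become the topic's word-list representation,
# which is why two near-synonymous topics can look redundant.
for row in ctfidf:
    print([vocab[i] for i in row.argsort()[::-1]])
```

The representation step never looks at the embeddings again, which matches weakness 2 above.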
After trying, I couldn't get it running in my PyTorch setup, so skip ahead to the TensorFlow part.
To use the visualizations, install BERTopic like this:
pip install bertopic[visualization]
Which produced this error:
Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\5e\6f\8c\d88aec621f3f542d26fac0342bef5e693335d125f4e54aeffe
Building wheel for hdbscan (PEP 517) ... error
ERROR: Command errored out with exit status 1:
command: 'C:\Users\user\anaconda3\python.exe' 'C:\Users\user\anaconda3\lib\site-packages\pip\_vendor\pep517\_in_process.py' build_wheel 'C:\Users\user\AppData\Local\Temp\tmp05_ix5xu'
cwd: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan
Complete output (40 lines):
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-38
creating build\lib.win-amd64-cpython-38\hdbscan
copying hdbscan\flat.py -> build\lib.win-amd64-cpython-38\hdbscan
copying hdbscan\hdbscan_.py -> build\lib.win-amd64-cpython-38\hdbscan
copying hdbscan\plots.py -> build\lib.win-amd64-cpython-38\hdbscan
copying hdbscan\prediction.py -> build\lib.win-amd64-cpython-38\hdbscan
copying hdbscan\robust_single_linkage_.py -> build\lib.win-amd64-cpython-38\hdbscan
copying hdbscan\validity.py -> build\lib.win-amd64-cpython-38\hdbscan
copying hdbscan\__init__.py -> build\lib.win-amd64-cpython-38\hdbscan
creating build\lib.win-amd64-cpython-38\hdbscan\tests
copying hdbscan\tests\test_flat.py -> build\lib.win-amd64-cpython-38\hdbscan\tests
copying hdbscan\tests\test_hdbscan.py -> build\lib.win-amd64-cpython-38\hdbscan\tests
copying hdbscan\tests\test_prediction_utils.py -> build\lib.win-amd64-cpython-38\hdbscan\tests
copying hdbscan\tests\test_rsl.py -> build\lib.win-amd64-cpython-38\hdbscan\tests
copying hdbscan\tests\__init__.py -> build\lib.win-amd64-cpython-38\hdbscan\tests
running build_ext
cythoning hdbscan/_hdbscan_tree.pyx to hdbscan\_hdbscan_tree.c
cythoning hdbscan/_hdbscan_linkage.pyx to hdbscan\_hdbscan_linkage.c
cythoning hdbscan/_hdbscan_boruvka.pyx to hdbscan\_hdbscan_boruvka.c
cythoning hdbscan/_hdbscan_reachability.pyx to hdbscan\_hdbscan_reachability.c
cythoning hdbscan/_prediction_utils.pyx to hdbscan\_prediction_utils.c
cythoning hdbscan/dist_metrics.pyx to hdbscan\dist_metrics.c
building 'hdbscan._hdbscan_tree' extension
C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\_hdbscan_tree.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\_hdbscan_linkage.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\_hdbscan_boruvka.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\_hdbscan_reachability.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\_prediction_utils.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
C:\Users\user\AppData\Local\Temp\pip-build-env-hmh88ps4\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-dccfbx0s\hdbscan\hdbscan\dist_metrics.pxd
tree = Parsing.p_module(s, pxd, full_module_name)
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
----------------------------------------
ERROR: Failed building wheel for hdbscan
Building wheel for pynndescent (setup.py) ... done
Created wheel for pynndescent: filename=pynndescent-0.5.7-py3-none-any.whl size=54278 sha256=1526eabb33909cf87dba133774cd96564dd55af33c30b688488c8113ad2e54a9
Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\1b\38\fe\99e22fbae88abd1c5e8d99253cba6d1c590cc7a94408bff3bf
Successfully built umap-learn sentence-transformers pynndescent
Failed to build hdbscan
ERROR: Could not build wheels for hdbscan which use PEP 517 and cannot be installed directly
Following a blogger's method, I successfully installed hdbscan with:
conda install -c zeus1942 hdbscan
Hmm.
Just now I installed directly in the project's cmd without specifying an environment.
Let me redo it with the environment selected in PyCharm.
Then pip install bertopic still errors.
This time the error is:
Building wheels for collected packages: hdbscan, sentence-transformers, umap-learn, pynndescent, future
Building wheel for hdbscan (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for hdbscan (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [40 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-37
creating build\lib.win-amd64-cpython-37\hdbscan
copying hdbscan\flat.py -> build\lib.win-amd64-cpython-37\hdbscan
copying hdbscan\hdbscan_.py -> build\lib.win-amd64-cpython-37\hdbscan
copying hdbscan\plots.py -> build\lib.win-amd64-cpython-37\hdbscan
copying hdbscan\prediction.py -> build\lib.win-amd64-cpython-37\hdbscan
copying hdbscan\robust_single_linkage_.py -> build\lib.win-amd64-cpython-37\hdbscan
copying hdbscan\validity.py -> build\lib.win-amd64-cpython-37\hdbscan
copying hdbscan\__init__.py -> build\lib.win-amd64-cpython-37\hdbscan
creating build\lib.win-amd64-cpython-37\hdbscan\tests
copying hdbscan\tests\test_flat.py -> build\lib.win-amd64-cpython-37\hdbscan\tests
copying hdbscan\tests\test_hdbscan.py -> build\lib.win-amd64-cpython-37\hdbscan\tests
copying hdbscan\tests\test_prediction_utils.py -> build\lib.win-amd64-cpython-37\hdbscan\tests
copying hdbscan\tests\test_rsl.py -> build\lib.win-amd64-cpython-37\hdbscan\tests
copying hdbscan\tests\__init__.py -> build\lib.win-amd64-cpython-37\hdbscan\tests
running build_ext
cythoning hdbscan/_hdbscan_tree.pyx to hdbscan\_hdbscan_tree.c
cythoning hdbscan/_hdbscan_linkage.pyx to hdbscan\_hdbscan_linkage.c
cythoning hdbscan/_hdbscan_boruvka.pyx to hdbscan\_hdbscan_boruvka.c
cythoning hdbscan/_hdbscan_reachability.pyx to hdbscan\_hdbscan_reachability.c
cythoning hdbscan/_prediction_utils.pyx to hdbscan\_prediction_utils.c
cythoning hdbscan/dist_metrics.pyx to hdbscan\dist_metrics.c
building 'hdbscan._hdbscan_tree' extension
C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\_hdbscan_tree.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\_hdbscan_linkage.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\_hdbscan_boruvka.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\_hdbscan_reachability.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\_prediction_utils.pyx
tree = Parsing.p_module(s, pxd, full_module_name)
C:\Users\user\AppData\Local\Temp\pip-build-env-ww4g4gx8\overlay\Lib\site-packages\Cython\Compiler\Main.py:369: FutureWarning: Cython directive 'language_level' not set, using 2 for now (Py2). This will change in a later release! File: C:\Users\user\AppData\Local\Temp\pip-install-tba2d8s_\hdbscan_f35b04f4c7ef4bf78c198dde1fe8c1c6\hdbscan\dist_metrics.pxd
tree = Parsing.p_module(s, pxd, full_module_name)
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for hdbscan
Building wheel for sentence-transformers (setup.py) ... done
Created wheel for sentence-transformers: filename=sentence_transformers-2.2.2-py3-none-any.whl size=125940 sha256=d580c60d854746cc9537aac0ad1cf1ad0b5f0ace67d0de1dfa874c58f120e5e5
Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\bf\06\fb\d59c1e5bd1dac7f6cf61ec0036cc3a10ab8fecaa6b2c3d3ee9
Building wheel for umap-learn (setup.py) ... done
Created wheel for umap-learn: filename=umap_learn-0.5.3-py3-none-any.whl size=82829 sha256=43460ec4024bb21b4fdadd1c7260b20c6b5d490767618ce4e4f33a40c3340793
Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\b3\52\a5\1fd9e3e76a7ab34f134c07469cd6f16e27ef3a37aeff1fe821
Building wheel for pynndescent (setup.py) ... done
Created wheel for pynndescent: filename=pynndescent-0.5.7-py3-none-any.whl size=54286 sha256=9203e46f4893cb9228521f3853c8bab4b2a1b40a3dc2bb7c45d29ee220c64d35
Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\7f\2a\f8\7bd5dcec71bd5c669f6f574db3113513696b98f3f9b51f496c
Building wheel for future (setup.py) ... done
Created wheel for future: filename=future-0.18.2-py3-none-any.whl size=491070 sha256=cdaf68da5115f51c5959d42d97b0936f1fc7e57ac2e0b7b1801c1bc5bfe536ca
Stored in directory: c:\users\user\appdata\local\pip\cache\wheels\56\b0\fe\4410d17b32f1f0c3cf54cdfb2bc04d7b4b8f4ae377e2229ba0
Successfully built sentence-transformers umap-learn pynndescent future
Failed to build hdbscan
ERROR: Could not build wheels for hdbscan, which is required to install pyproject.toml-based projects
I decided to try again:
conda install -c zeus1942 hdbscan
Okay. I can't read. That doesn't work. It's a different error. What am I supposed to do now?
Another blogger posted the same pyproject.toml error; let me try their fix.
Problem is, I don't know which wheel to download.
I decided to install the same one as the blogger.
The result showed:
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
ERROR: hdbscan-0.8.28-cp38-cp38-win_amd64.whl is not a supported wheel on this platform.
Is it saying that the wheel I downloaded isn't compatible with my current platform?
So I found yet another blogger's post about
"whl is not a supported wheel",
which says to run:
pip debug --verbose
My output:
Compatible tags: 27
cp37-cp37m-win_amd64
cp37-abi3-win_amd64
cp37-none-win_amd64
cp36-abi3-win_amd64
cp35-abi3-win_amd64
cp34-abi3-win_amd64
cp33-abi3-win_amd64
cp32-abi3-win_amd64
py37-none-win_amd64
py3-none-win_amd64
py36-none-win_amd64
py35-none-win_amd64
py34-none-win_amd64
py33-none-win_amd64
py32-none-win_amd64
py31-none-win_amd64
py30-none-win_amd64
cp37-none-any
py37-none-any
py3-none-any
py36-none-any
py35-none-any
py34-none-any
py33-none-any
py32-none-any
py31-none-any
py30-none-any
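For cross-checking, the cp-number and platform pieces of those tags come straight from the interpreter. A rough stdlib-only sketch (pip debug --verbose is the authoritative list; the middle ABI component of a tag like cp38-cp38-win_amd64 is ignored here):

```python
import sys
import sysconfig

# A wheel named hdbscan-0.8.28-cp38-cp38-win_amd64.whl only installs on a
# CPython 3.8 interpreter on 64-bit Windows. These two values show what the
# current interpreter would accept (simplified: ABI tag not computed).
major, minor = sys.version_info[:2]
interpreter_tag = f"cp{major}{minor}"                      # e.g. cp37
platform_tag = sysconfig.get_platform().replace("-", "_")  # e.g. win_amd64
print(interpreter_tag, platform_tag)
```

With the tag list above showing cp37, a cp38 wheel is exactly the mismatch the error complains about.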
Installing collected packages: joblib, cython, hdbscan
Attempting uninstall: joblib
Found existing installation: joblib 0.13.0
Uninstalling joblib-0.13.0:
Successfully uninstalled joblib-0.13.0
Attempting uninstall: hdbscan
Found existing installation: hdbscan 0.8.26
Uninstalling hdbscan-0.8.26:
Successfully uninstalled hdbscan-0.8.26
Successfully installed cython-0.29.30 hdbscan-0.8.28 joblib-1.1.0
Installed successfully.
But does that mean there was already an hdbscan installed before?
Now, once again:
pip install bertopic
Successfully installed bertopic-0.10.0 click-8.1.3 filelock-3.7.1 future-0.18.2 huggingface-hub-0.8.1 llvmlite-0.38.1 nltk-3.7 numba-0.55.2 plotly-5.9.0 pynndescent-0.5.7 pyyaml-5.4.1 regex-2022.6.2 sentence-transformers-2.2.2 sentencepiece-0.1.96 tenacity-8.0.1 tokenizers-0.12.1 transformers-4.20.1 umap-learn-0.5.3
Finally, success.
pip install bertopic
Hit the same problem as yesterday:
Building wheel for hdbscan (PEP 517) ... error
So I ran:
conda install -c zeus1942 hdbscan
and got:
UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:
Specifications:
- hdbscan -> python[version='3.6.*|>=3.6,<3.7.0a0|>=3.7,<3.8.0a0']
Your python: python=3.8
If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.
Trying method 2 from above.
This time it shows cp38-cp38-win_amd64,
so I downloaded the matching version,
and manually installing hdbscan succeeded once again.
Once more: pip install bertopic
I really hope that after switching to the TensorFlow environment, the five-line example from the official docs will just run.
But during installation I saw:
Installing collected packages: llvmlite, numba, pynndescent, tqdm, umap-learn, pandas, click, regex, nltk, filelock, tokenizers, typing-extensions, huggingface-hub, transformers, sentencepiece, torch, pillow, torchvision, sentence-transformers, tenacity, plotly, bertopic
(It's installing torch?)
Successfully installed bertopic-0.10.0 click-8.1.3 filelock-3.7.1 huggingface-hub-0.8.1 llvmlite-0.38.1 nltk-3.7 numba-0.55.2 pandas-1.4.3 pillow-9.1.1 plotly-5.9.0 pynndescent-0.5.7 regex-2022.6.2 sentence-transformers-2.2.2 sentencepiece-0.1.96 tenacity-8.0.1 tokenizers-0.12.1 torch-1.12.0 torchvision-0.13.0 tqdm-4.64.0 transformers-4.20.1 typing-extensions-4.2.0 umap-learn-0.5.3
It installed successfully, but with red text:
ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts.
We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default.
numba 0.55.2 requires numpy<1.23,>=1.18, but you'll have numpy 1.23.0 which is incompatible.
huggingface-hub 0.8.1 requires packaging>=20.9, but you'll have packaging 20.4 which is incompatible.
Append --use-feature=2020-resolver to the install command for the affected packages.
So does that mean reinstalling numba and huggingface-hub?
pip install numba==0.55.2 --use-feature=2020-resolver
Running that gave:
Successfully installed numpy-1.22.4
So it replaced the numpy in my environment with one the current numba supports.
Then:
pip install huggingface-hub==0.8.1 --use-feature=2020-resolver
Successfully installed packaging-21.3
Done! Now I'm going to run those 5 lines of code!
Using the code from the docs, I created a main.py:
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
# fetch_20newsgroups(data_home=None, subset='train', categories=None, shuffle=True, random_state=42, remove=(), download_if_missing=True)
# subset='all' loads both the train and test splits, shuffled.
# remove= strips metadata so a model cannot overfit on it: 'headers' removes the
# newsgroup headers, 'footers' removes the signature-like blocks at the ends of
# posts, and 'quotes' removes lines that appear to quote other posts.
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)
Running it gave...
Traceback (most recent call last):
File "D:/lll/bertopic/mytopictest/main.py", line 1, in <module>
from bertopic import BERTopic
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\__init__.py", line 1, in <module>
from bertopic._bertopic import BERTopic
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\_bertopic.py", line 31, in <module>
from bertopic.backend._utils import select_backend
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\backend\__init__.py", line 2, in <module>
from ._word_doc import WordDocEmbedder
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\backend\_word_doc.py", line 4, in <module>
from bertopic.backend._utils import select_backend
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\backend\_utils.py", line 2, in <module>
from ._sentencetransformers import SentenceTransformerBackend
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\bertopic\backend\_sentencetransformers.py", line 3, in <module>
from sentence_transformers import SentenceTransformer
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\sentence_transformers\__init__.py", line 3, in <module>
from .datasets import SentencesDataset, ParallelSentencesDataset
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\sentence_transformers\datasets\__init__.py", line 3, in <module>
from .ParallelSentencesDataset import ParallelSentencesDataset
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\sentence_transformers\datasets\ParallelSentencesDataset.py", line 4, in <module>
from .. import SentenceTransformer
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\sentence_transformers\SentenceTransformer.py", line 11, in <module>
import transformers
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\transformers\__init__.py", line 30, in <module>
from . import dependency_versions_check
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\transformers\dependency_versions_check.py", line 17, in <module>
from .utils.versions import require_version, require_version_core
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\transformers\utils\__init__.py", line 33, in <module>
from .generic import (
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\transformers\utils\generic.py", line 28, in <module>
from .import_utils import is_flax_available, is_tf_available, is_torch_available, is_torch_fx_proxy
File "C:\Users\user\anaconda3\envs\pytorch1\lib\site-packages\transformers\utils\import_utils.py", line 60, in <module>
_tf_available = importlib.util.find_spec("tensorflow") is not None
File "C:\Users\user\anaconda3\envs\pytorch1\lib\importlib\util.py", line 114, in find_spec
raise ValueError('{}.__spec__ is None'.format(name))
ValueError: tensorflow.__spec__ is None
I don't even know where to start fixing this.
Half an hour later I calmed down and came back.
The failing statements in the traceback include
from . import
from .. import
which are relative imports: the interpreter looks for the imported package or function within the current package.
"Package import problems"
That post suggests not using these two forms, and replacing the . or .. with the name of the module the imported thing actually lives in.
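For the record, the dot really does mean "the package this module lives in" rather than the current folder. A throwaway-package sketch (the package name mypkg is invented for this demo):

```python
import importlib
import os
import sys
import tempfile

# Build a tiny package on disk: mypkg/__init__.py, mypkg/utils.py, mypkg/core.py
root = tempfile.mkdtemp()
pkg = os.path.join(root, "mypkg")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "utils.py"), "w") as f:
    f.write("VALUE = 42\n")
with open(os.path.join(pkg, "core.py"), "w") as f:
    # "." means "the package containing this module", so this is mypkg.utils
    f.write("from . import utils\nresult = utils.VALUE\n")

sys.path.insert(0, root)
core = importlib.import_module("mypkg.core")
print(core.result)  # 42
```

So the relative imports inside sentence-transformers and transformers are normal library code; rewriting them is unlikely to be the real fix here.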
Setting that aside for now.
Let me work from the bottom of the traceback upward instead.
ValueError: tensorflow.__spec__ is None
Searching for this message online turns up:
Method 1:
Switch your transformers version to 4.4.0.
Method 2:
Don't touch transformers; just upgrade your TensorFlow to the latest version.
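Before picking a method, the failing check itself can be reproduced without TensorFlow installed at all. transformers probes for TensorFlow with importlib.util.find_spec, which raises exactly this ValueError when a half-initialized "tensorflow" entry sits in sys.modules (a plausible reading of the traceback; the dummy module below simulates that broken state):

```python
import importlib.util
import sys
import types

# Simulate a broken/partially initialized tensorflow install: a module object
# is registered in sys.modules but its __spec__ is None.
fake = types.ModuleType("tensorflow")
fake.__spec__ = None
sys.modules["tensorflow"] = fake
try:
    # This is essentially what transformers' import_utils does.
    importlib.util.find_spec("tensorflow")
except ValueError as err:
    message = str(err)
finally:
    del sys.modules["tensorflow"]  # clean up the fake entry

print(message)  # tensorflow.__spec__ is None
```

Which suggests the error is about the state of the TensorFlow installation in the environment, not about transformers itself.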
It suddenly occurs to me: I don't know whether bertopic uses PyTorch or TensorFlow.
Searching the bertopic repo finds no import torch,
and no import tensorflow either,
only a single import tensorflow_hub.
tensorflow_hub is Google's library for packaging machine-learning modules: it helps developers publish trained TensorFlow models as reusable modules for sharing. The tensorflow_hub library works with both TensorFlow 1 and TensorFlow 2; new users are advised to start with TensorFlow 2.
So... it uses TensorFlow?
Will some expert please help me.
"Publish trained models as modules":
so tensorflow_hub is just for turning a finished model into something you can pull in directly, like a pip-installable package?
--------- 2022-6-29, wiping my tears and coming back to this
Successfully installed sacremoses-0.0.53 tokenizers-0.10.3 transformers-4.4.0
Trying method 1.
4.4.0 is installed now.
Run the tiny five-line example again.
Exactly the same error.
Also, my environment seems to be PyTorch? I'm creating a new conda environment now. Try installing TensorFlow? I've never done that before, so I don't know whether it will work.
Whoa!!!!!!!!!!!!!!!!!
It runs under TensorFlow!
Now to wait a bit. Nothing happening yet.
"News topic modeling with BERTopic"
Reading that expert's article to figure out how to swap in my own Chinese text.
A long while later:
Process finished with exit code 0
And then the weirdness starts:
the run clearly finished, yet the variable doesn't exist.
>>> topic_model.get_topic_info()
Traceback (most recent call last):
File "C:\Users\user\anaconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3418, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "" , line 1, in <module>
topic_model.get_topic_info()
NameError: name 'topic_model' is not defined
Why??
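A plausible explanation: the finished main.py run and the console session are two separate interpreter processes, and a variable only exists inside the process that defined it. A small demonstration:

```python
import subprocess
import sys

# First process: defines topic_model (like running main.py to completion).
# Second process: a fresh interpreter, where the name no longer exists
# (like opening the Python console after the script finished).
run_script = "topic_model = 'fitted'; print('done')"
fresh_console = "print('topic_model' in globals())"
first = subprocess.run([sys.executable, "-c", run_script],
                       capture_output=True, text=True, check=True)
second = subprocess.run([sys.executable, "-c", fresh_console],
                        capture_output=True, text=True, check=True)
print(first.stdout.strip())   # done
print(second.stdout.strip())  # False
```

So calling topic_model.get_topic_info() in a console that never ran the fit can only raise NameError.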
2022-6-30
It runs!
I still don't understand why it fails after running main.py.
What works now: type the code into the Python console line by line.
Then it prints the topic info just like in the repo.
As for visualization, typing this line by itself does not pop up a viewable figure:
topic_model.visualize_topics()
Entered that way, it just printed a pile of the figure's attribute data...
fig = topic_model.visualize_topics()
fig.write_html("path/to/file.html")
So I had to save it like this to view it,
and then the results are visible.
But both the earlier topic output and this clustering figure differ from what the author's repo shows. I don't get it. The dataset shouldn't have changed, right? (Possibly because the dimensionality-reduction and clustering steps are stochastic: without fixing a random seed, the topics differ between runs.)
That's it for this post. Next post: trying to swap in my own Chinese dataset.