[python] Test code to run after installing BERTopic

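from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups

# Load every post in the 20 Newsgroups corpus, stripping headers, footers and quoted replies
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# Build a topic model with the default settings
topic_model = BERTopic()
print('start fit transform...')
# Fit the model and assign a topic to each document
topics, probs = topic_model.fit_transform(docs)
print('fit done')
# One row per topic; topic -1 collects the outlier documents
print(topic_model.get_topic_info())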

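The fetch_20newsgroups call above downloads the dataset from a server outside China, so the download is often very slow or fails outright. In that case, load the dataset offline by hand. For the method, see this article: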

[python] Loading the fetch_20newsgroups dataset offline (CSDN blog). In outline: first download the data archive manually, put the downloaded file next to your script, then open the twenty_newsgroups.py file. https://blog.csdn.net/FL1623863129/article/details/134654050?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522170348829216800222813703%2522%252C%2522scm%2522%253A%252220140713.130102334.pc%255Fblog.%2522%257D&request_id=170348829216800222813703&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2~blog~first_rank_ecpm_v1~rank_v31_ecpm-1-134654050-null-null.nonecase&utm_term=fetch_20newsgroups&spm=1018.2226.3001.4450
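Before running the full test, it can help to check whether a local copy of the dataset is already usable. The sketch below is not the linked article's method. It only relies on the `data_home` and `download_if_missing` parameters of scikit-learn's `fetch_20newsgroups`, and the helper name `load_docs_offline` is made up for illustration. It assumes a cached copy sits in scikit-learn's data folder (or in the folder you pass in).

```python
from sklearn.datasets import fetch_20newsgroups


def load_docs_offline(data_home=None):
    """Return the 20 Newsgroups posts from the local cache, or None if absent.

    load_docs_offline is a hypothetical helper, not part of scikit-learn.
    """
    try:
        bunch = fetch_20newsgroups(
            data_home=data_home,        # None means scikit-learn's default data folder
            subset='all',
            remove=('headers', 'footers', 'quotes'),
            download_if_missing=False,  # never try to reach the network
        )
    except OSError:
        # Raised when no cached copy is found under data_home
        return None
    return bunch['data']


if __name__ == '__main__':
    docs = load_docs_offline()
    if docs is None:
        print('dataset not cached; place it manually first')
    else:
        print(f'loaded {len(docs)} documents from the local cache')
```

If the helper returns a list, the same `docs` can be passed straight to `topic_model.fit_transform(docs)` without any download attempt.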

Test output:

start fit transform...
fit done
     Topic  ...                                Representative_Docs
0       -1  ...  [This is a periodic posting intended to answer...
1        0  ...  [I thought I'd post my predicted standings sin...
2        1  ...  [\nI am not an expert in the cryptography scie...
3        2  ...                            [Hello,, Hello,, ites:]
4        3  ...  [*********************************************...
..     ...  ...                                                ...
227    226  ...  [\n\nTrue, coach Matikainen is ready to keep a...
228    227  ...  [Archive-name: typing-injury-faq/software\nVer...
229    228  ...  [\n\nIn this era of AIDS, isn't someone's fuck...
230    229  ...  [Hi, I am doing a term paper on the syringe an...
231    230  ...  [\n\n\n\n\nSounds to me like your dealer reall...

[232 rows x 5 columns]
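In the printout above, pandas hides the middle columns behind "..." because the console is too narrow. A minimal sketch of one way to see every column, assuming `get_topic_info()` returns a pandas DataFrame as the `[232 rows x 5 columns]` footer suggests. The small table here is a made-up stand-in: only `Topic` and `Representative_Docs` appear in the output above, and the other column names and all values are assumptions for illustration.

```python
import pandas as pd

# Stand-in for topic_model.get_topic_info(); names and values are illustrative only
topic_info = pd.DataFrame({
    'Topic': [-1, 0, 1],
    'Count': [10, 7, 5],                      # assumed column
    'Name': ['-1_a_b', '0_c_d', '1_e_f'],     # assumed column
    'Representation': [['a', 'b'], ['c', 'd'], ['e', 'f']],  # assumed column
    'Representative_Docs': [['doc a'], ['doc c'], ['doc e']],
})


def render_full(df):
    """Render a DataFrame as text without eliding any column."""
    return df.to_string(index=False)


print(render_full(topic_info))
```

With a real model, the same idea would be `print(topic_model.get_topic_info().to_string())`, or selecting just a few columns before printing.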

 
