The code I'm running comes from https://github.com/sunxiangguo/chinese_text_classification, with Python 3.9 and PyCharm 2020.3.3.
The training and test sets are the ones bundled with the repo, and I created the two folders myself to hold the segmented text. When I then ran the TF-IDF step, I got the following error:
C:/Users/qianyz/Downloads/chinese_text_classification-master/TFIDF_space.py
Traceback (most recent call last):
File "C:\Users\qianyz\Downloads\chinese_text_classification-master\TFIDF_space.py", line 41, in
vector_space(stopword_path, bunch_path, space_path)
File "C:\Users\qianyz\Downloads\chinese_text_classification-master\TFIDF_space.py", line 30, in vector_space
tfidfspace.tdm = vectorizer.fit_transform(bunch.contents)
File "C:\Users\qianyz\venv\Lib\site-packages\sklearn\feature_extraction\text.py", line 1849, in fit_transform
X = super().fit_transform(raw_documents)
File "C:\Users\qianyz\venv\Lib\site-packages\sklearn\feature_extraction\text.py", line 1203, in fit_transform
vocabulary, X = self._count_vocab(raw_documents,self.fixed_vocabulary_)
File "C:\Users\qianyz\venv\Lib\site-packages\sklearn\feature_extraction\text.py", line 1134, in _count_vocab
raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
Process finished with exit code 1
I've looked at many solutions online and on blogs, and tried changing the vectorizer's analyzer parameter to both 'word' and 'char', but I still get the same error. Could someone please point me in the right direction? Much appreciated.
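For reference, here is a minimal diagnostic sketch I could run before fit_transform, assuming the bunch file is the pickled sklearn Bunch produced by the repo's earlier step and that it has a contents field (as the traceback suggests); the path below is just a placeholder, not necessarily the repo's actual file name. If every document in contents turns out to be empty or whitespace-only (e.g. because the segmentation step wrote empty files), fit_transform would raise exactly this "empty vocabulary" error.

import pickle

# Hypothetical path; replace with the bunch_path actually used in TFIDF_space.py.
bunch_path = "train_word_bag/train_set.dat"

with open(bunch_path, "rb") as f:
    bunch = pickle.load(f)

print("number of documents:", len(bunch.contents))

# Count documents that are empty or whitespace-only; they contribute nothing
# to the vocabulary that TfidfVectorizer tries to build.
empty = sum(1 for doc in bunch.contents if not doc or not doc.strip())
print("empty documents:", empty)

if bunch.contents:
    print("sample document:", repr(bunch.contents[0])[:200])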