FileNotFoundError: [Errno 2] No such file or directory: 'errors.out' (error in the last example of Section 5.6 of Natural Language Processing with Python)

I ran the last example from Chapter 5 of Natural Language Processing with Python under Python 3.7:

from nltk.tbl import demo as brill_demo
brill_demo.demo()
print(open("errors.out").read())

It failed with the following error:

Traceback (most recent call last):
  File "E:/Python Practice/NLP/Chapter5.py", line 332, in <module>
    print(open("errors.out").read())
FileNotFoundError: [Errno 2] No such file or directory: 'errors.out'

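Taken literally, the message says the file does not exist, and a look in the current directory confirmed that it was not there. Searching turned up no ready-made solution, so I asked for help on StackOverflow. My suspicion was a version issue in the nltk.tbl.demo module: perhaps the newer module offers some other, similar way to generate the errors.out file?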
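Before digging into library versions, it can help to confirm exactly where Python is looking. The sketch below uses only the standard library; the helper name `describe_missing` is made up for illustration and is not part of NLTK.

```python
from pathlib import Path


def describe_missing(filename):
    """Report whether `filename` exists, resolved against the current working directory."""
    path = Path(filename)
    resolved = Path.cwd() / path  # an absolute `filename` simply replaces the cwd part
    if path.exists():
        return f"found: {resolved}"
    return f"missing: {resolved}"


# Relative names such as "errors.out" are looked up in the current working
# directory, which is not necessarily the directory containing the script.
print(describe_missing("errors.out"))
```

If the report says the file is missing from the directory you expected, the program really never wrote it, which is what happened here.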

So I looked at the source of the nltk/tbl/demo module and, sure enough, found a similar function:

def demo_error_analysis():
    """
    Writes a file with context for each erroneous word after tagging testing data
    """
    postag(error_output="errors.txt")
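As an alternative to scrolling through the source by hand, a module's functions and the first line of each docstring can be listed with the standard `inspect` module. This is only a sketch: `list_functions` is a hypothetical helper, and it is demonstrated on the standard-library `textwrap` module so that it runs without NLTK installed.

```python
import inspect
import textwrap


def list_functions(module):
    """Map each function defined in `module` to the first line of its docstring."""
    summary = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        if func.__module__ != module.__name__:
            continue  # skip functions that were merely imported into the module
        doc = inspect.getdoc(func) or ""
        summary[name] = doc.splitlines()[0] if doc else ""
    return summary


# Demonstrated on a standard-library module; pass any imported module instead.
for name, first_line in list_functions(textwrap).items():
    print(f"{name}: {first_line}")
```

Assuming NLTK is installed, passing the module imported with `from nltk.tbl import demo as brill_demo` should list `demo_error_analysis` along with the other demo functions.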

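According to its docstring, this function does exactly what we need: it generates a file like errors.out. The natural idea is to call demo_error_analysis() first and then read the file it writes: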

brill_demo.demo_error_analysis()

But things are rarely that simple... Running it produced this error:

Traceback (most recent call last):
  File "E:/Python Practice/NLP/Chapter5.py", line 331, in <module>
    brill_demo.demo_error_analysis()
  File "D:\Anaconda3\lib\site-packages\nltk\tbl\demo.py", line 124, in demo_error_analysis
    postag(error_output="errors.txt")
  File "D:\Anaconda3\lib\site-packages\nltk\tbl\demo.py", line 322, in postag
    u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8") + "\n" #
TypeError: can't concat str to bytes

Following the path in the traceback leads to the source file where the error is raised:

    # writing error analysis to file
    if error_output is not None:
        with open(error_output, "w") as f:
            f.write("Errors for Brill Tagger %r\n\n" % serialize_output)
            f.write(
                u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8") + "\n"
            )
        print("Wrote tagger errors including context to {0}".format(error_output))

So the error is raised on the line below, where the string built from error_list runs into a type mismatch:

 u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8") + "\n"

An article on this error explained the cause: encode returns a bytes object, which cannot be concatenated directly with a str. Calling decode on the result converts the bytes back into a str.
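The behavior can be reproduced without NLTK. In this minimal sketch the two strings are made-up stand-ins for the output of error_list(gold_data, taggedtest):

```python
# Made-up stand-in for the lines returned by error_list(...)
text = "\n".join(["first error line", "second error line"])

encoded = text.encode("utf-8")  # str.encode returns bytes in Python 3
try:
    encoded + "\n"              # bytes + str raises TypeError
except TypeError as exc:
    print(type(encoded).__name__, "+ str failed:", exc)

# Two ways to end up with a str again:
fixed_by_decode = encoded.decode("utf-8") + "\n"  # decode the bytes back to str
fixed_by_skipping_encode = text + "\n"            # or never encode in the first place

print(fixed_by_decode == fixed_by_skipping_encode)
```

Both variants produce the same string, so adding decode and dropping encode are interchangeable for this purpose.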

The fix is therefore simple. Change that line to:

u"\n".join(error_list(gold_data, taggedtest)).encode("utf-8").decode() + "\n" #add .decode()

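(When you edit the file, your editor may ask you to confirm the change to an installed package. Go ahead and make it; if that makes you uneasy, add a comment after the change describing what you modified, as I did above.)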
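Since the file is opened in text mode ("w"), `f.write` expects a str in any case, so the encode/decode round trip does no real work. The sketch below mimics the write with made-up sample lines and a hypothetical file name in a temporary directory; it does not call NLTK.

```python
import os
import tempfile

lines = ["Soon/NN->RB", "that/IN->WDT"]  # made-up sample, not real tagger output
body = "\n".join(lines)

# Encoding and immediately decoding returns the original text.
round_trip_is_identity = body.encode("utf-8").decode("utf-8") == body

# Hypothetical output file inside a fresh temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "errors_demo.txt")
with open(out_path, "w", encoding="utf-8") as f:  # text mode: write str, not bytes
    f.write("Errors for Brill Tagger %r\n\n" % None)
    f.write(body + "\n")

with open(out_path, encoding="utf-8") as f:
    written = f.read()

print(round_trip_is_identity)
print(written)
```

Passing `encoding="utf-8"` to `open` is an assumption added here so the file's encoding does not depend on the platform default; the quoted NLTK code does not pass it.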

After this small change, run it again:

brill_demo.demo_error_analysis()

This time it runs normally:

Loading tagged data from treebank... 
Read testing data (200 sents/5251 wds)
Read training data (800 sents/19933 wds)
Read baseline data (800 sents/19933 wds) [reused the training set]
Trained baseline tagger
    Accuracy on test set: 0.8366
Training tbl tagger...
TBL train (fast) (seqs: 800; tokens: 19933; tpls: 24; min score: 3; min acc: None)
Finding initial useful rules...
    Found 12799 useful rules.

           B      |
   S   F   r   O  |        Score = Fixed - Broken
   c   i   o   t  |  R     Fixed = num tags changed incorrect -> correct
   o   x   k   h  |  u     Broken = num tags changed correct -> incorrect
   r   e   e   e  |  l     Other = num tags changed incorrect -> incorrect
   e   d   n   r  |  e
------------------+-------------------------------------------------------
  23  23   0   0  | POS->VBZ if Pos:PRP@[-2,-1]
  18  19   1   0  | NN->VB if Pos:-NONE-@[-2] & Pos:TO@[-1]
  14  14   0   0  | VBP->VB if Pos:MD@[-2,-1]
  12  12   0   0  | VBP->VB if Pos:TO@[-1]
  11  11   0   0  | VBD->VBN if Pos:VBD@[-1]
  11  11   0   0  | IN->WDT if Pos:-NONE-@[1] & Pos:VBP@[2]
  10  11   1   0  | VBN->VBD if Pos:PRP@[-1]
   9  10   1   0  | VBD->VBN if Pos:VBZ@[-1]
   8   8   0   0  | NN->VB if Pos:MD@[-1]
   7   7   0   1  | VB->NN if Pos:DT@[-1]
   7   7   0   0  | VB->VBP if Pos:PRP@[-1]
   7   7   0   0  | IN->WDT if Pos:-NONE-@[1] & Pos:VBZ@[2]
   7   8   1   0  | IN->RB if Word:as@[2]
   6   6   0   0  | VBD->VBN if Pos:VBP@[-2,-1]
   6   6   0   1  | IN->WDT if Pos:-NONE-@[1] & Pos:VBD@[2]
   5   5   0   0  | POS->VBZ if Pos:-NONE-@[-1]
   5   5   0   0  | VB->VBP if Pos:NNS@[-1]
   5   5   0   0  | VBD->VBN if Word:be@[-2,-1]
   4   4   0   0  | POS->VBZ if Pos:``@[-2]
   4   4   0   0  | VBP->VB if Pos:VBD@[-2,-1]
   4   6   2   3  | RP->RB if Pos:CD@[1,2]
   4   4   0   0  | RB->JJ if Pos:DT@[-1] & Pos:NN@[1]
   4   4   0   0  | NN->VBP if Pos:NNS@[-2] & Pos:RB@[-1]
   4   5   1   0  | VBN->VBD if Pos:NNP@[-2] & Pos:NNP@[-1]
   4   4   0   0  | IN->WDT if Pos:-NONE-@[1] & Pos:MD@[2]
   4   8   4   0  | VBD->VBN if Word:*@[1]
   4   4   0   0  | JJS->RBS if Word:most@[0] & Word:the@[-1] & Pos:DT@[-1]
   3   3   0   0  | VBD->VBN if Pos:VBN@[-1]
   3   4   1   0  | VBN->VB if Pos:TO@[-1]
   3   4   1   1  | IN->RB if Pos:.@[1]
   3   3   0   0  | JJ->RB if Pos:VBD@[1]
   3   3   0   0  | PRP$->PRP if Pos:TO@[1]
   3   3   0   0  | NN->VBP if Pos:NNS@[-1] & Pos:DT@[1]
   3   3   0   0  | VBP->VB if Word:n't@[-2,-1]
Trained tbl tagger in 2.45 seconds
    Accuracy on test set: 0.8572
Tagging the test data
Wrote tagger errors including context to errors.txt

A new file, errors.txt, now appears in the current directory.

The last step is to read the file and print it:

print(open("errors.txt").read())

The output looks like this (excerpt):

Errors for Brill Tagger None

             left context |    word/test->gold     | right context
--------------------------+------------------------+--------------------------
                          |      Soon/NN->RB       | ,/, T-shirts/NNS *ICH*-1/
n/IN the/DT corridors/NNS |      that/IN->WDT      | *T*-2/-NONE- carried/VBD 
NNS that/WDT *T*-2/-NONE- |    carried/VBN->VBD    | the/DT school/NN 's/POS f
D the/DT school/NN 's/POS |    familiar/NN->JJ     | red-and-white/JJ GHS/NNP 
ool/NN 's/POS familiar/JJ |  red-and-white/NN->JJ  | GHS/NNP logo/NN on/IN the
iliar/JJ red-and-white/JJ |      GHS/NN->NNP       | logo/NN on/IN the/DT fron
/NN ,/, the/DT shirts/NNS |     read/VBP->VBD      | ,/, ``/`` We/PRP have/VBP
,/, ``/`` We/PRP have/VBP |      all/DT->PDT       | the/DT answers/NNS ./. ''
JJ colleagues/NNS are/VBP |      angry/NN->JJ      | at/IN Mrs./NNP Yeargin/NN
n/NNP Rice/NNP ,/, who/WP |   *T*-100/NN->-NONE-   | had/VBD discovered/VBN th
VBD discovered/VBN the/DT |      crib/JJ->NN       | notes/NNS ./.
             ``/`` We/PRP |      work/NN->VBP      | damn/RB hard/RB at/IN wha
    ``/`` We/PRP work/VBP |      damn/NN->RB       | hard/RB at/IN what/WP we/
/IN what/WP we/PRP do/VBP |   *T*-101/NN->-NONE-   | for/IN damn/RB little/JJ 
VBP *T*-101/-NONE- for/IN |      damn/NN->RB       | little/JJ pay/NN ,/, and/
...
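Because the full file is long, it may be more convenient to print only its first lines and to close the file explicitly. A minimal sketch, assuming the file is UTF-8 text; `read_head` is a hypothetical helper name:

```python
from itertools import islice
from pathlib import Path


def read_head(path, max_lines=20, encoding="utf-8"):
    """Return the first `max_lines` lines of a text file as a single string."""
    with open(path, encoding=encoding) as f:  # the with-block closes the file
        return "".join(islice(f, max_lines))


# Only read errors.txt if the tagger demo has actually produced it.
if Path("errors.txt").exists():
    print(read_head("errors.txt"))
```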

With that, the original problem is solved.

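I am writing this up at the tail end of Singles' Day (November 11), after this problem cost me two or three hours. I hope it helps whoever runs into it next.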
