Filtering English stop words

   
    """
    Created on Sun Nov 13 09:14:13 2016
     
    @author: daxiong
    """
    from nltk.corpus import stopwords
    from nltk.tokenize import sent_tokenize,word_tokenize
     
    # English stop words; a set removes duplicates and makes membership tests fast
    stop_words=set(stopwords.words('english'))
    example_text="Five score years ago, a great American, in whose symbolic shadow we stand today, signed the Emancipation Proclamation. This momentous decree came as a great beacon light of hope to millions of Negro slaves who had been seared in the flames of withering injustice. It came as a joyous daybreak to end the long night of bad captivity."
    # Split the text into sentences
    list_sentences=sent_tokenize(example_text)
    # Split the text into word tokens
    list_words=word_tokenize(example_text)
    # Filter out stop words; compare in lower case because the NLTK list is lower-case,
    # otherwise capitalized tokens such as "This" and "It" would slip through
    filtered_words=[w for w in list_words if w.lower() not in stop_words]
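
The listing above depends on NLTK and its downloaded corpus data. As a rough sketch of the same filtering step that runs with the standard library only, the example below assumes a tiny hand-picked stop word set (`STOP_WORDS`, hypothetical and far shorter than NLTK's English list) and a simple regex tokenizer standing in for `word_tokenize`. The helper name `filter_stop_words` is also an assumption made for illustration.

```python
import re

# Hypothetical mini stop word set for illustration only;
# NLTK's English list is much longer.
STOP_WORDS = {"a", "as", "in", "of", "the", "to", "we", "who", "it", "this"}


def filter_stop_words(text, stop_words=STOP_WORDS):
    """Return the word tokens of `text` that are not in `stop_words`."""
    # Regex tokenizer: runs of letters and apostrophes; punctuation is dropped.
    tokens = re.findall(r"[A-Za-z']+", text)
    # Compare in lower case so "It" matches the lower-case entry "it".
    return [t for t in tokens if t.lower() not in stop_words]


sample = "It came as a joyous daybreak to end the long night of bad captivity."
print(filter_stop_words(sample))
```

Unlike `word_tokenize`, this regex tokenizer discards punctuation instead of returning it as separate tokens, so the two approaches will not produce identical token lists on the same text.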
