Python Word Segmentation: Third-Party Modules

Third-party modules for Chinese word segmentation

This post introduces two Python word-segmentation modules I have used, jieba and snownlp, going straight to examples:

1. Example: jieba

from jieba import posseg as pseg

# Example input text; any Chinese sentence works here
words = "我来到北京清华大学"

# Segment the text with jieba's POS tagger; each item is a (word, POS tag) pair
cur_tuple_words = pseg.lcut(words)
for word, flag in cur_tuple_words:
    print(word)
    print(flag)
    
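As a small aside, jieba can also return plain tokens without POS tags. A minimal sketch using jieba.lcut (the sample sentence is only an illustrative assumption):

import jieba

# Example sentence; any Chinese text works here
sentence = "我来到北京清华大学"

# Default (accurate) mode: a plain list of tokens, no POS tags
print(jieba.lcut(sentence))

# Full mode: enumerates all possible words found in the sentence
print(jieba.lcut(sentence, cut_all=True))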

2. Example: snownlp

from snownlp import SnowNLP

# Example input text; any Chinese sentence works here
text = "这个东西真心很赞"

s = SnowNLP(text)
# s.tags yields (word, POS tag) pairs
fenciList = s.tags
for word, flag in fenciList:
    print(word)
    print(flag)

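snownlp bundles more than segmentation and tagging. Purely as a hedged sketch of a few other attributes its documentation describes (s.words, s.sentiments, s.keywords), reusing the same s object:

from snownlp import SnowNLP

s = SnowNLP("这个东西真心很赞")

# Plain token list, without POS tags
print(s.words)

# Sentiment score between 0 and 1 (closer to 1 means more positive)
print(s.sentiments)

# Top keywords extracted from the text
print(s.keywords(3))
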
While using them, I noticed one obvious difference: jieba takes noticeably longer to import, while snownlp's import is clearly shorter. The code below illustrates this:

  • import snownlp takes roughly three to four seconds
import time
start_time = time.time()
from snownlp import SnowNLP
end_time = time.time()
print(end_time - start_time)

Output: 3.98864293098

  • import jieba takes around 10 seconds
import time
start_time = time.time()
from jieba import posseg as pseg
end_time = time.time()
print(end_time - start_time)

Output: 10.1280369759
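
The numbers above will vary by machine and library version. If you want to see whether the cost sits in the import itself or in jieba's dictionary loading, a minimal sketch that times the import, the first segmentation call, and a repeat call might look like this (the sample sentence is just an assumption):

import time

# Time the import itself
start_time = time.time()
from jieba import posseg as pseg
print(time.time() - start_time)

# Time the first segmentation call; by default jieba loads its dictionary
# lazily, so the first cut can carry part of the startup cost
start_time = time.time()
pseg.lcut("我来到北京清华大学")
print(time.time() - start_time)

# Time a second call, which reuses the already-loaded dictionary
start_time = time.time()
pseg.lcut("我来到北京清华大学")
print(time.time() - start_time)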


I only used these for a business requirement and have not dug deeper; if there are any mistakes, corrections are welcome.

Related link: an introduction to the POS tags used in jieba (结巴) segmentation
