https://www.jianshu.com/p/3d9cd356da1a
https://www.jianshu.com/p/528e46284cbc
(nlp) spring@ubuntu18:~$ pip install pkuseg
Looking in indexes: https://mirrors.aliyun.com/pypi/simple
Collecting pkuseg
Downloading https://mirrors.aliyun.com/pypi/packages/36/d8/2cd2d21fc960815d4bb521e1e2e2f725c0e4d1ab88cefa4c73520cd84825/pkuseg-0.0.22-cp36-cp36m-manylinux1_x86_64.whl (50.2MB)
|████████████████████████████████| 50.2MB 1.9MB/s
Requirement already satisfied: numpy>=1.16.0 in ./anaconda3/envs/nlp/lib/python3.6/site-packages (from pkuseg) (1.17.4)
Installing collected packages: pkuseg
Successfully installed pkuseg-0.0.22
(nlp) spring@ubuntu18:~$ python
Python 3.6.9 |Anaconda, Inc.| (default, Jul 30 2019, 19:07:31)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pkuseg
>>> seg = pkuseg.pkuseg()
>>> text = seg.cut('我爱杭州西湖')
>>> print(text)
['我', '爱', '杭州', '西湖']
>>> text = seg.cut('我叫马化腾,我想学区块链,你说好不好啊,天青色等烟雨,而我在 等你,月色被打捞器,晕开了结局')
>>> text
['我', '叫', '马化腾', ',', '我', '想', '学区', '块链', ',', '你', '说', '好不', '好', '啊', ',', '天青色', '等', '烟雨', ',', '而', '我', '在', '等', '你', ',', '月色', '被', '打捞器', ',', '晕开', '了', '结局']
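With the default model, "区块链" (blockchain) comes out as '学区' + '块链', and "好不好" as '好不' + '好'. To decide which words belong in a user dictionary, you can check which of your candidate words were not emitted as a single token. The sketch below is a small standalone helper, not part of pkuseg. The function name split_lexicon_words is hypothetical, and it only looks at the token list that seg.cut() already returned.

```python
from typing import Iterable, List


def split_lexicon_words(tokens: List[str], lexicon: Iterable[str]) -> List[str]:
    """Return lexicon words that occur in the text but are not a single token.

    Hypothetical helper: works purely on the list returned by seg.cut().
    """
    text = "".join(tokens)
    # Record the (start, end) character span of every emitted token.
    token_spans = set()
    pos = 0
    for tok in tokens:
        token_spans.add((pos, pos + len(tok)))
        pos += len(tok)

    broken = []
    for word in lexicon:
        start = text.find(word)
        while start != -1:
            # The word is intact only if this occurrence is exactly one token.
            if (start, start + len(word)) not in token_spans:
                broken.append(word)
                break
            start = text.find(word, start + 1)
    return broken


default_tokens = ['我', '叫', '马化腾', ',', '我', '想', '学区', '块链', ',', '你', '说',
                  '好不', '好', '啊', ',', '天青色', '等', '烟雨']
print(split_lexicon_words(default_tokens, ['区块链', '好不好', '天青色']))
```

Here it reports '区块链' and '好不好'. '天青色' is already segmented correctly, so only the first two really need to go into user_dict.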
>>> lexicon = ['区块链','好不好', '天青色']
>>> seg = pkuseg.pkuseg(user_dict=lexicon)
>>> text = seg.cut('我叫马化腾,我想学区块链,你说好不好啊,天青色等烟雨,而我在 等你,月色被打捞器,晕开了结局')
>>> text
['我', '叫', '马化腾', ',', '我', '想', '学', '区块链', ',', '你', '说', '好不好', '啊', ',', '天青色', '等', '烟雨', ',', '而', '我', '在', '等', '你', ',', '月色', '被', '打捞器', ',', '晕开', '了', '结局']
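After passing user_dict, '区块链' and '好不好' are kept whole, and '学' now stands alone instead of forming '学区'. One common way a segmenter can honour a user dictionary is greedy forward maximum matching. Dictionary words are cut out first, and only the leftover spans are handed to the statistical model. The sketch below illustrates that general technique under the assumption that the dictionary takes priority. It is not pkuseg's source code, and the name dict_chunks is hypothetical.

```python
from typing import Iterable, List, Tuple


def dict_chunks(text: str, lexicon: Iterable[str]) -> List[Tuple[str, bool]]:
    """Split text into (chunk, is_dict_word) pairs by forward maximum matching.

    Chunks flagged False are the spans a model would still need to segment.
    """
    words = set(lexicon)
    max_len = max((len(w) for w in words), default=0)
    chunks: List[Tuple[str, bool]] = []
    pending: List[str] = []  # characters not covered by any dictionary word
    i = 0
    while i < len(text):
        match = None
        # Try the longest possible dictionary word first, then shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in words:
                match = text[i:i + size]
                break
        if match:
            if pending:
                chunks.append(("".join(pending), False))
                pending = []
            chunks.append((match, True))
            i += len(match)
        else:
            pending.append(text[i])
            i += 1
    if pending:
        chunks.append(("".join(pending), False))
    return chunks


print(dict_chunks('我想学区块链,你说好不好啊', ['区块链', '好不好', '天青色']))
```

Scanning from '学' finds no dictionary word that starts there, so '学' stays in the leftover chunk '我想学'. '区块链' is then matched as a protected unit, which would explain why '学区' can no longer form.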