Installing the pkuseg word segmentation / POS tagging tool from source, and where to put the model files

This post was written on April 3, 2022; keep its age in mind when reading.

The problem

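In a conda environment with python==3.9.11 and tensorflow-gpu==2.4.1, pip install pkuseg failed. I saw roughly three different kinds of failure, one of which I could never reproduce afterwards, so this post records only the procedure for installing pkuseg from source.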

In my tests, pip install pkuseg succeeds under python==3.7.13 or python==3.8.13, whether or not tensorflow-gpu is present in the environment. If the numpy or cython dependencies are missing, installing them is enough.

First download the source as a zip from the GitHub page; the resulting file is named pkuseg-python-master.zip.
Install the numpy and cython dependencies: pip install numpy cython
Install pkuseg: pip install pkuseg-python-master.zip

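The terminal output is as follows (the cp310 wheels and python3.10 paths show that this particular test environment runs Python 3.10):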

raner@testnode:~$ conda activate test
(test) raner@testnode:~$ pip install numpy cython
Looking in indexes: http://pypi.tuna.tsinghua.edu.cn/simple
Collecting numpy
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/15/87/4d6bc4e2053a4b517b022746f8e2dae328155a4c723bcad4c7d536febf51/numpy-1.22.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.8 MB)
     |████████████████████████████████| 16.8 MB 919 kB/s
Collecting cython
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/94/6a/0d66e2d9cf405c87c74d1d29439c4910d3d1895fb122667920a4012d0bda/Cython-0.29.28-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (1.9 MB)
     |████████████████████████████████| 1.9 MB 1.3 MB/s
Installing collected packages: numpy, cython
Successfully installed cython-0.29.28 numpy-1.22.3
(test) raner@testnode:~$ pip install pkuseg-python-master.zip
Looking in indexes: http://pypi.tuna.tsinghua.edu.cn/simple
Processing ./pkuseg-python-master.zip
Requirement already satisfied: cython in ./anaconda3/envs/test/lib/python3.10/site-packages (from pkuseg==0.0.25) (0.29.28)
Requirement already satisfied: numpy>=1.16.0 in ./anaconda3/envs/test/lib/python3.10/site-packages (from pkuseg==0.0.25) (1.22.3)
Building wheels for collected packages: pkuseg
  Building wheel for pkuseg (setup.py) ... done
  Created wheel for pkuseg: filename=pkuseg-0.0.25-cp310-cp310-linux_x86_64.whl size=3321624 sha256=6be6d05f53319aac298f9535a6f4a0cd0ed41dd3c91548b07300bc51639d6889
  Stored in directory: /public/home/raner/.cache/pip/wheels/73/db/1c/9a992085963288025e05fe3229efeb59db87a06903c5f4fa7f
Successfully built pkuseg
Installing collected packages: pkuseg
Successfully installed pkuseg-0.0.25
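
Before running the source install, it can help to confirm that the two build dependencies are actually visible to the interpreter in the active environment. The following is a minimal sketch, not part of pkuseg; the helper name missing_modules is hypothetical, and it only checks whether the modules can be located, not which versions are installed.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that cannot be located in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The cython package is imported under the module name "Cython".
    missing = missing_modules(["numpy", "Cython"])
    if missing:
        print("Install these first:", ", ".join(missing))
    else:
        print("numpy and Cython are present; the source install can proceed.")
```

If anything is reported as missing, pip install numpy cython as in the steps above should cover it.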

Where to put the model files

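After installation, find the model files with the .zip suffix in the GitHub Releases, put them in the ~/.pkuseg folder, and extract each archive into the folder of the same name. The resulting directory structure is: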

raner@testnode:~$ tree .pkuseg/
.pkuseg/
├── medicine
│   ├── features.pkl
│   ├── medicine_dict.pkl
│   └── weights.npz
├── medicine.zip
├── mixed
│   ├── features.pkl
│   └── weights.npz
├── mixed.zip
├── news
│   ├── features.pkl
│   └── weights.npz
├── news.zip
├── postag
│   ├── features.pkl
│   └── weights.npz
├── postag.zip
├── tourism
│   ├── features.pkl
│   ├── tourism_dict.pkl
│   └── weights.npz
├── tourism.zip
├── web
│   ├── features.pkl
│   └── weights.npz
└── web.zip

6 directories, 20 files
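
The extraction step can also be scripted. This is a rough sketch using only the Python standard library; the function name extract_models is hypothetical, and it assumes each archive holds its files (features.pkl, weights.npz, and so on) at the top level rather than inside a nested folder, which the listing above suggests but which is worth checking against the downloaded archives.

```python
import zipfile
from pathlib import Path


def extract_models(model_dir):
    """Extract every <name>.zip in model_dir into model_dir/<name>/.

    Returns the list of model names that were extracted.
    """
    model_dir = Path(model_dir).expanduser()
    if not model_dir.is_dir():
        return []
    extracted = []
    for archive in sorted(model_dir.glob("*.zip")):
        target = model_dir / archive.stem  # e.g. news.zip -> news/
        target.mkdir(exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(archive.stem)
    return extracted


if __name__ == "__main__":
    print(extract_models("~/.pkuseg"))
```

The archives are left in place after extraction, matching the tree shown above.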

Usage

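Using the mixed model, which is the default model, differs slightly from the documentation, because a source install does not ship with the default model files. Other models such as news are used exactly as documented. (With a pip install, the model files go in the same location.)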

import pkuseg

# Using the default model: an absolute path is required
# "~" in the path does not work, and neither 'default' nor 'mixed' works as the argument value
seg = pkuseg.pkuseg(model_name='/public/home/raner/.pkuseg/mixed', postag=True)
text = seg.cut('我昨天忘记签到了。')
print(text)

# Other models are used as described in the official README
seg = pkuseg.pkuseg(model_name='news', postag=True)
text = seg.cut('我昨天忘记签到了。')
print(text)
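
Since "~" is not expanded in model_name, one way to avoid hard-coding a user-specific absolute path is to build it in Python first. The sketch below assumes the models live under the current user's home directory as described earlier; local_model_path and load_segmenter are hypothetical helper names, and only the pkuseg.pkuseg(model_name=..., postag=...) call comes from the example above.

```python
from pathlib import Path


def local_model_path(name="mixed", home=None):
    """Build the absolute path of a model folder under <home>/.pkuseg/."""
    base = Path(home) if home is not None else Path.home()
    return str((base / ".pkuseg" / name).absolute())


def load_segmenter(name="mixed", postag=True):
    """Create a segmenter from a locally extracted model folder."""
    import pkuseg  # imported here so the path helper works without pkuseg

    return pkuseg.pkuseg(model_name=local_model_path(name), postag=postag)


if __name__ == "__main__":
    print(local_model_path())  # the absolute path that would be passed as model_name
```

With pkuseg installed and the mixed model extracted, load_segmenter().cut('我昨天忘记签到了。') should behave like the first example above.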

References:

Solution: Discussion #7370 at lancopku/pkuseg-python · GitHub
Model download URLs and locations: pkuseg-python/config.py at lancopku/pkuseg-python · GitHub
