Start from the roughly 500 MB base Docker image:
sudo docker run -itd registry.cn-hangzhou.aliyuncs.com/allen135681/easyml:ubuntu18.04-nvidia_cuda10.0-base-Miniconda3-py39_4.9.2 bash
Run the following command to create a new conda environment named paddle:
conda create -n paddle python=3.7
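The final versions of paddlepaddle, paddlenlp, and paddlehub used here are: paddlepaddle2.0.1_paddlehub2.0.4_paddlenlp2.0.0rc10
Alternatively, use this image directly: registry.cn-hangzhou.aliyuncs.com/allen135681/easyml:ubuntu18.04-nvidia_cuda10.0-base-Miniconda3-py39_4.9.2_paddlepaddle2.0.1_paddlehub2.0.4_paddlenlp2.0.0rc10
It is a little over 1 GB and already contains the paddle virtual environment. Once inside the container, run pyin paddle to enter the virtual environment: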
pyin paddle
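Once inside the environment, it can help to confirm which package versions are actually installed. The following is a minimal sketch, not part of the original setup: the helper name installed_version is hypothetical, and it assumes the pip distribution names paddlepaddle, paddlehub, and paddlenlp. It falls back to pkg_resources because importlib.metadata is not available on Python 3.7.

```python
def installed_version(package):
    """Return the installed version string of a pip package, or None if absent."""
    try:
        # Available in the standard library from Python 3.8 onwards.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Python 3.7 fallback: pkg_resources ships with setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumed distribution names; adjust if your install uses different ones.
for name in ("paddlepaddle", "paddlehub", "paddlenlp"):
    print(name, installed_version(name))
```

A value of None means the package is not installed in the currently active environment.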
Two problems came up along the way:
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory
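Both messages mean that the dynamic loader could not find a system shared library. As a rough sketch (the helper name missing_shared_libs is hypothetical, and this only uses the standard ctypes module), the same check can be done up front without importing paddlehub:

```python
import ctypes


def missing_shared_libs(names):
    """Return the library names that the dynamic loader cannot open."""
    missing = []
    for name in names:
        try:
            # CDLL asks the system loader to open the library by name.
            ctypes.CDLL(name)
        except OSError:
            # Raised when the library (or one of its dependencies) is not found.
            missing.append(name)
    return missing
```

Calling missing_shared_libs(["libGL.so.1", "libgthread-2.0.so.0"]) inside the container should return an empty list once both system packages described below are installed.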
Install using the commands from the official home page, https://www.paddlepaddle.org.cn/hub:
pip install --upgrade paddlepaddle -i https://mirror.baidu.com/pypi/simple
pip install --upgrade paddlehub
import paddlehub as hub
lac = hub.Module(name="lac")
test_text = ["今天是个好天气。"]
results = lac.cut(text=test_text, use_gpu=False, batch_size=1, return_tag=True)
print(results)
#{'word': ['今天', '是', '个', '好天气', '。'], 'tag': ['TIME', 'v', 'q', 'n', 'w']}
After installing, run:
import paddlehub as hub
It fails with:
(paddle) root@0e2459f200d3:/# python
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddlehub as hub
/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
def convert_to_list(value, n, name, dtype=np.int):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddlehub/__init__.py", line 31, in <module>
    from paddlehub import datasets
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddlehub/datasets/__init__.py", line 15, in <module>
    from paddlehub.datasets.canvas import Canvas
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddlehub/datasets/canvas.py", line 24, in <module>
    from paddlehub.utils.download import download_data
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddlehub/utils/download.py", line 22, in <module>
    from paddlehub.utils import log, utils, xarfile
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddlehub/utils/utils.py", line 18, in <module>
    import cv2
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
Run the following installation command:
apt install libgl1-mesa-glx
It installs successfully.
Then run:
import paddlehub as hub
This time a different error appears:
(paddle) root@0e2459f200d3:/# python
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddlehub as hub
/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
def convert_to_list(value, n, name, dtype=np.int):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddlehub/__init__.py", line 31, in <module>
    from paddlehub import datasets
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddlehub/datasets/__init__.py", line 15, in <module>
    from paddlehub.datasets.canvas import Canvas
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddlehub/datasets/canvas.py", line 24, in <module>
    from paddlehub.utils.download import download_data
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddlehub/utils/download.py", line 22, in <module>
    from paddlehub.utils import log, utils, xarfile
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddlehub/utils/utils.py", line 18, in <module>
    import cv2
  File "/root/miniconda3/envs/paddle/lib/python3.7/site-packages/cv2/__init__.py", line 5, in <module>
    from .cv2 import *
ImportError: libgthread-2.0.so.0: cannot open shared object file: No such file or directory
Install the missing library with:
apt-get install libglib2.0-dev
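During installation you are asked to choose a geographic region; any location in China will do.
After re-running the commands, the example finally runs correctly.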
(paddle) root@0e2459f200d3:/# python
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddlehub as hub
/root/miniconda3/envs/paddle/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:26: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
def convert_to_list(value, n, name, dtype=np.int):
>>> lac = hub.Module(name="lac")
/root/miniconda3/envs/paddle/lib/python3.7/site-packages/pip/_vendor/packaging/version.py:130: DeprecationWarning: Creating a LegacyVersion has been deprecated and will be removed in the next major release
DeprecationWarning,
/root/miniconda3/envs/paddle/lib/python3.7/site-packages/pip/_vendor/packaging/version.py:130: DeprecationWarning: Creating a LegacyVersion has been deprecated and will be removed in the next major release
DeprecationWarning,
2021-03-09 16:59:16,186 - INFO - Lock 140416018468688 acquired on /root/.paddlehub/tmp/lac
[INFO 2021-03-09 16:59:16,186 filelock.py:274] Lock 140416018468688 acquired on /root/.paddlehub/tmp/lac
Download https://bj.bcebos.com/paddlehub/paddlehub_dev/lac_2.2.0.tar.gz
[##################################################] 100.00%
Decompress /root/.paddlehub/tmp/tmp8cg9egfq/lac_2.2.0.tar.gz
[##################################################] 100.00%
[2021-03-09 16:59:21,071] [ INFO] - Successfully installed lac-2.2.0
2021-03-09 16:59:21,090 - INFO - Lock 140416018468688 released on /root/.paddlehub/tmp/lac
[INFO 2021-03-09 16:59:21,090 filelock.py:318] Lock 140416018468688 released on /root/.paddlehub/tmp/lac
[2021-03-09 16:59:21,090] [ WARNING] - The _initialize method in HubModule will soon be deprecated, you can use the __init__() to handle the initialization of the object
W0309 16:59:21.160986 2164 analysis_predictor.cc:1145] Deprecated. Please use CreatePredictor instead.
>>> test_text = ["今天是个好天气。"]
>>>
>>> results = lac.cut(text=test_text, use_gpu=False, batch_size=1, return_tag=True)
>>> print(results)
[{'word': ['今天', '是', '个', '好天气', '。'], 'tag': ['TIME', 'v', 'q', 'n', 'w']}]
>>> test_text = ["经过近1个月的大幅下跌后,螺纹钢期货上周五出现了小幅反弹,一根小阳线成功突破60分钟MA20均线的压制,并在随后出现了回抽确认。技术面来看,该均线为本轮下跌的主要压力线,鉴于基本面暂不具备大幅反弹的条件,若近期螺纹钢能站稳60 分钟MA20均线上方,后期将出现止跌企稳,并将围绕3500步入振荡整理走势。若期价短期再度跌破60分钟MA20均线,后期将再创本轮新低。"]
>>> results = lac.cut(text=test_text, use_gpu=False, batch_size=1, return_tag=True)
>>> print(results)
[{'word': ['经过', '近1个月', '的', '大幅', '下跌', '后', ',', '螺纹钢', '期货', '上周五', '出现', '了', '小幅', '反弹', ',', '一根', '小阳线', '成功', '突破', '60分钟', 'MA20', '均线', '的', '压制', ',', '并', '在', '随后', '出现', '了', '回', '抽', '确认', '。', '技术', '面', '来看', ',', '该', '均线', '为', '本轮', '下跌', '的', '主要', '压力线', ',', '鉴于', '基本面', '暂', '不具备', '大幅', '反弹', '的', '条件', ',', '若', '近期', '螺纹钢', '能', '站稳', '60分钟', 'MA20', '均线', '上方', ',', '后期', '将', '出现', '止跌', '企稳', ',', '并', '将', '围绕', '3500步', '入', '振荡', '整理', '走势', '。', '若', '期价', '短期', '再度', '跌破', '60分钟', 'MA20', '均线', ',', '后期', '将', '再 创', '本轮', '新低', '。'], 'tag': ['p', 'TIME', 'u', 'd', 'v', 'f', 'w', 'nz', 'n', 'TIME', 'v', 'u', 'd', 'v', 'w', 'm', 'n', 'ad', 'v', 'TIME', 'nz', 'n', 'u', 'vn', 'w', 'c', 'p', 'd', 'v', 'u', 'v', 'v', 'v', 'w', 'n', 'n', 'v', 'w', 'r', 'n', 'v', 'r', 'v', 'u', 'a', 'n', 'w', 'p', 'n', 'd', 'v', 'd', 'v', 'u', 'n', 'w', 'c', 't', 'nz', 'v', 'v', 'TIME', 'nz', 'n', 'f', 'w', 't', 'd', 'v', 'vn', 'vn', 'w', 'c', 'd', 'v', 'm', 'v', 'vn', 'vn', 'n', 'w', 'c', 'n', 'n', 'd', 'v', 'TIME', 'nz', 'n', 'w', 't', 'd', 'v', 'r', 'n', 'w']}]
>>>
The longer test text is the first paragraph copied from http://futures.cngold.org/c/2014-01-13/c2369007.html; the segmentation quality is decent.
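The result format printed above keeps words and tags in two parallel lists, which is awkward to read for long inputs. A small sketch, assuming only that format (a list of dicts with 'word' and 'tag' keys of equal length), pairs each word with its tag; the helper name pair_words_with_tags is hypothetical and not part of PaddleHub.

```python
def pair_words_with_tags(results):
    """Turn each {'word': [...], 'tag': [...]} dict into a list of (word, tag) tuples."""
    paired = []
    for item in results:
        # zip walks both lists in step, so position i of 'word' meets position i of 'tag'.
        paired.append(list(zip(item["word"], item["tag"])))
    return paired


# Sample taken from the session output above.
sample = [{'word': ['今天', '是', '个', '好天气', '。'],
           'tag': ['TIME', 'v', 'q', 'n', 'w']}]
print(pair_words_with_tags(sample))
```

From the paired form it is straightforward to filter by tag, for example keeping only the tokens tagged 'n'.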