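SeamlessM4T is an all-in-one AI model released by Meta that can transcribe and translate nearly 100 languages. You can try it out on the official demo site, or use my modified version of the project directly.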
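# Go to the install directory
cd /opt
# Clone the project
git clone https://github.com/facebookresearch/seamless_communication.git
# Create a virtual environment (mkvirtualenv comes from virtualenvwrapper)
mkvirtualenv -p python3 seamless
# Install dependencies
cd /opt/seamless_communication
pip install .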
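To confirm that the install landed in the active virtual environment, a quick check like the following can be run after `workon seamless`. It is a minimal sketch that only assumes the top-level package is importable as `seamless_communication`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Package installed by `pip install .` in /opt/seamless_communication
    print("seamless_communication installed:", is_installed("seamless_communication"))
```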
In the project directory seamless_communication/src/seamless_communication/assets/cards, the files
seamlessM4T_large.yaml
seamlessM4T_medium.yaml
unity_nllb-100.yaml
unity_nllb-200.yaml
vocoder_36langs.yaml
xlsr2_1b_v2.yaml
contain the corresponding model resource URLs:
checkpoint: "https://huggingface.co/facebook/seamless-m4t-large/resolve/main/multitask_unity_large.pt"
checkpoint: "https://huggingface.co/facebook/seamless-m4t-medium/resolve/main/multitask_unity_medium.pt"
tokenizer: "https://huggingface.co/facebook/seamless-m4t-large/resolve/main/tokenizer.model"
tokenizer: "https://huggingface.co/facebook/seamless-m4t-medium/resolve/main/tokenizer.model"
checkpoint: "https://huggingface.co/facebook/seamless-m4t-vocoder/resolve/main/vocoder_36langs.pt"
checkpoint: "https://dl.fbaipublicfiles.com/seamlessM4T/models/unit_extraction/xlsr2_1b_v2.pt"
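The downloads can be scripted. This is a sketch, not part of the project: the local layout (`/opt/seamless_communication/models/<subdir>/<file>`) is assumed to match the `file://` paths used later in this article, and the subdirectory names are my own choice. Each file is written to a `.part` file first so an interrupted download is not mistaken for a finished one.

```python
import urllib.request
from pathlib import Path

# Assumed local layout, matching the local paths used later in this article
MODELS_ROOT = Path("/opt/seamless_communication/models")
HF = "https://huggingface.co/facebook"

# Remote URL (copied from the model cards) -> subdirectory under MODELS_ROOT
DOWNLOADS = {
    f"{HF}/seamless-m4t-large/resolve/main/multitask_unity_large.pt": "large",
    f"{HF}/seamless-m4t-large/resolve/main/tokenizer.model": "large",
    f"{HF}/seamless-m4t-medium/resolve/main/multitask_unity_medium.pt": "medium",
    f"{HF}/seamless-m4t-medium/resolve/main/tokenizer.model": "medium",
    f"{HF}/seamless-m4t-vocoder/resolve/main/vocoder_36langs.pt": "vocoder",
    "https://dl.fbaipublicfiles.com/seamlessM4T/models/unit_extraction/xlsr2_1b_v2.pt": "xlsr2",
}


def target_path(url: str, subdir: str, root: Path = MODELS_ROOT) -> Path:
    """Local destination: <root>/<subdir>/<last segment of the URL path>."""
    return root / subdir / url.rsplit("/", 1)[-1]


def download_all(root: Path = MODELS_ROOT) -> None:
    for url, subdir in DOWNLOADS.items():
        dest = target_path(url, subdir, root)
        if dest.exists():
            print(f"skip (already present): {dest}")
            continue
        dest.parent.mkdir(parents=True, exist_ok=True)
        tmp = dest.with_name(dest.name + ".part")
        print(f"downloading {url} -> {dest}")
        urllib.request.urlretrieve(url, tmp)
        tmp.replace(dest)  # only rename once the download has completed


if __name__ == "__main__":
    download_all()
```

The subdirectory matters because both the large and medium models ship a file named `tokenizer.model`.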
Download each resource to the matching local directory, then change the URLs in the cards to local file URIs:
checkpoint: "file:///opt/seamless_communication/models/large/multitask_unity_large.pt"
checkpoint: "file:///opt/seamless_communication/models/medium/multitask_unity_medium.pt"
tokenizer: "file:///opt/seamless_communication/models/large/tokenizer.model"
tokenizer: "file:///opt/seamless_communication/models/medium/tokenizer.model"
checkpoint: "file:///opt/seamless_communication/models/vocoder/vocoder_36langs.pt"
checkpoint: "file:///opt/seamless_communication/models/xlsr2/xlsr2_1b_v2.pt"
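Editing six YAML files by hand is error-prone, so here is a sketch that rewrites the quoted `checkpoint:`/`tokenizer:` URLs in every card. It assumes the cards keep the one-entry-per-line, double-quoted form shown above and that the models were saved under `/opt/seamless_communication/models`. `PurePosixPath.as_uri()` produces the standard absolute form with three slashes (`file:///opt/...`). A `.bak` copy of each card is kept.

```python
import re
from pathlib import Path, PurePosixPath

# Matches lines such as:  checkpoint: "https://.../file.pt"
_ENTRY = re.compile(r'^([ \t]*(?:checkpoint|tokenizer):[ \t]*)"(https?://[^"]+)"', re.MULTILINE)


def localize_card(text: str, url_to_path: dict) -> str:
    """Replace remote checkpoint/tokenizer URLs with file:// URIs of local copies."""
    def repl(m):
        local = url_to_path.get(m.group(2))
        if local is None:
            return m.group(0)  # no local copy known: leave the line unchanged
        return f'{m.group(1)}"{PurePosixPath(local).as_uri()}"'
    return _ENTRY.sub(repl, text)


ROOT = "/opt/seamless_communication/models"  # assumed download location
HF = "https://huggingface.co/facebook"
MAPPING = {
    f"{HF}/seamless-m4t-large/resolve/main/multitask_unity_large.pt": f"{ROOT}/large/multitask_unity_large.pt",
    f"{HF}/seamless-m4t-medium/resolve/main/multitask_unity_medium.pt": f"{ROOT}/medium/multitask_unity_medium.pt",
    f"{HF}/seamless-m4t-large/resolve/main/tokenizer.model": f"{ROOT}/large/tokenizer.model",
    f"{HF}/seamless-m4t-medium/resolve/main/tokenizer.model": f"{ROOT}/medium/tokenizer.model",
    f"{HF}/seamless-m4t-vocoder/resolve/main/vocoder_36langs.pt": f"{ROOT}/vocoder/vocoder_36langs.pt",
    "https://dl.fbaipublicfiles.com/seamlessM4T/models/unit_extraction/xlsr2_1b_v2.pt": f"{ROOT}/xlsr2/xlsr2_1b_v2.pt",
}

if __name__ == "__main__":
    cards = Path("/opt/seamless_communication/src/seamless_communication/assets/cards")
    for card in cards.glob("*.yaml"):
        original = card.read_text()
        card.with_suffix(".yaml.bak").write_text(original)  # keep a backup
        card.write_text(localize_card(original, MAPPING))
```

Because `pip install .` copies the package (cards included) into site-packages, the edited cards only take effect after the package is reinstalled from the local path, which is what the following steps do.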
In the official demo's requirements file, seamless_communication/demo/requirements.txt, change the line
git+https://github.com/facebookresearch/seamless_communication
to a local install (because the model resource URLs now point to local files):
/opt/seamless_communication
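The same edit as a small script, a sketch that assumes the requirement line starts with exactly the `git+https://...` URL shown above; every other line of the file is kept as-is.

```python
from pathlib import Path

REMOTE = "git+https://github.com/facebookresearch/seamless_communication"


def localize_requirements(text: str, local_path: str) -> str:
    """Swap the git+https requirement for a local path install; keep other lines."""
    out = []
    for line in text.splitlines():
        out.append(local_path if line.strip().startswith(REMOTE) else line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    req = Path("/opt/seamless_communication/demo/requirements.txt")
    req.write_text(localize_requirements(req.read_text(), "/opt/seamless_communication"))
```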
Then install the dependencies:
cd /opt/seamless_communication/demo
workon seamless
pip install -r requirements.txt
If you had already installed the remote version of the dependency, uninstall it first:
pip uninstall seamless_communication
Then install the local version:
pip install /opt/seamless_communication
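To check which copy is actually installed, recent pip versions record the install origin in a `direct_url.json` file inside the package metadata (PEP 610): VCS installs carry a `vcs_info` key, local directory installs a `file://` URL, and plain index installs have no such file. The sketch below reads it; the distribution name `seamless_communication` is taken from the `pip uninstall` command above.

```python
import json
from importlib import metadata


def install_source(direct_url_json):
    """Classify a PEP 610 direct_url.json payload: 'vcs', 'local', 'index' or 'other'."""
    if direct_url_json is None:
        return "index"  # no direct_url.json: installed from a package index
    info = json.loads(direct_url_json)
    if "vcs_info" in info:
        return "vcs"  # e.g. git+https://github.com/...
    if info.get("url", "").startswith("file://"):
        return "local"  # e.g. pip install /opt/seamless_communication
    return "other"


def seamless_install_source(dist_name="seamless_communication"):
    dist = metadata.distribution(dist_name)
    return install_source(dist.read_text("direct_url.json"))


if __name__ == "__main__":
    print("seamless_communication installed from:", seamless_install_source())
```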
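The official demo contains a snippet that downloads remote sample audio files; from mainland China this may fail because of network issues: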
# Download sample input audio files
filenames = ["assets/sample_input.mp3", "assets/sample_input_2.mp3"]
for filename in filenames:
    hf_hub_download(
        repo_id="facebook/seamless_m4t",
        repo_type="space",
        filename=filename,
        local_dir=".",
    )
Instead, download sample_input.mp3 and sample_input_2.mp3 manually from the official demo, put them in /opt/seamless_communication/demo/assets,
and comment out the snippet above.
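As an alternative to commenting the block out, it could be guarded so that the download only runs when a sample file is missing. This is a sketch of such a replacement for the snippet in app.py; it reuses the same `hf_hub_download` arguments as the original and imports `huggingface_hub` only when something actually has to be fetched.

```python
from pathlib import Path

SAMPLES = ["assets/sample_input.mp3", "assets/sample_input_2.mp3"]


def missing_samples(base_dir=".", filenames=SAMPLES):
    """Return the sample files that are not yet present under base_dir."""
    return [f for f in filenames if not (Path(base_dir) / f).is_file()]


def ensure_samples(base_dir="."):
    missing = missing_samples(base_dir)
    if not missing:
        return  # files were placed manually: no network access needed
    from huggingface_hub import hf_hub_download  # only needed for missing files
    for filename in missing:
        hf_hub_download(
            repo_id="facebook/seamless_m4t",
            repo_type="space",
            filename=filename,
            local_dir=base_dir,
        )
```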
workon seamless
cd /opt/seamless_communication/demo
python app.py
You should then see it start successfully:
Running on local URL: http://127.0.0.1:7860
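To get a public share link and serve the demo on a different port, edit the launch call in app.py:
# Change
# demo.queue().launch()
# to
demo.queue().launch(share=True, inbrowser=True, server_name='127.0.0.1', server_port=8080)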
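If the host and port need to change between machines, the launch arguments can be read from environment variables instead of being hardcoded. The variable names below (`DEMO_SHARE`, `DEMO_HOST`, and so on) are hypothetical; the defaults reproduce the call above. Setting the host to `0.0.0.0` makes Gradio listen on all interfaces so other machines on the network can connect. Note that the Flask API described later also uses port 8080, so running both at once requires a different port for one of them.

```python
import os


def launch_kwargs(env=None):
    """Arguments for demo.queue().launch(...), overridable via environment variables."""
    env = os.environ if env is None else env
    return {
        "share": env.get("DEMO_SHARE", "1") == "1",
        "inbrowser": env.get("DEMO_INBROWSER", "1") == "1",
        # "127.0.0.1" = local only; "0.0.0.0" = reachable from other machines
        "server_name": env.get("DEMO_HOST", "127.0.0.1"),
        "server_port": int(env.get("DEMO_PORT", "8080")),
    }

# In app.py:  demo.queue().launch(**launch_kwargs())
```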
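With share=True, Gradio needs its frpc binary to create the public link. If it cannot download the binary automatically (for example, because of network restrictions), download it manually, rename it, and move it into the gradio package directory of the seamless virtual environment:
wget https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64
mv frpc_linux_amd64 frpc_linux_amd64_v0.2
mv frpc_linux_amd64_v0.2 /root/.virtualenvs/seamless/lib/python3.8/site-packages/gradio
cd /root/.virtualenvs/seamless/lib/python3.8/site-packages/gradio
chmod 777 frpc_linux_amd64_v0.2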
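The site-packages path depends on the virtual environment's name and Python version, so it can be safer to ask Python where gradio lives than to hardcode it. This sketch locates the package and checks that the renamed binary is present and executable; it assumes the file name `frpc_linux_amd64_v0.2` used in the commands above.

```python
import importlib.util
import os
from pathlib import Path

FRPC_NAME = "frpc_linux_amd64_v0.2"  # file name used in the steps above


def gradio_dir():
    """Directory of the gradio package in the active environment, or None."""
    spec = importlib.util.find_spec("gradio")
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent  # origin is .../gradio/__init__.py


def frpc_ready(directory):
    """True if the frpc binary exists in `directory` and is executable."""
    path = Path(directory) / FRPC_NAME
    return path.is_file() and os.access(path, os.X_OK)


if __name__ == "__main__":
    d = gradio_dir()
    print("gradio dir:", d)
    print("frpc ready:", d is not None and frpc_ready(d))
```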
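If launching fails with the following error:
Traceback (most recent call last):
  File "app.py", line 587, in <module>
    input_audio_mic = gr.Audio(
  File "/root/.virtualenvs/seamless/lib/python3.8/site-packages/gradio/component_meta.py", line 152, in wrapper
    return fn(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'source'
comment out the source arguments of the gr.Audio calls in app.py (Gradio 4.x no longer accepts source; it uses sources, a list, instead):
# source="microphone",
# source="upload",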
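Commenting the argument out falls back to Gradio's default input sources. To keep the microphone-only and upload-only inputs and still run on either Gradio generation, the keyword can be chosen from the installed version. A sketch, assuming only the rename described above (`source="..."` in 3.x, `sources=[...]` in 4.x):

```python
def audio_source_kwargs(gradio_version: str, source: str) -> dict:
    """gr.Audio input-source keyword for the installed Gradio version.

    Gradio 3.x takes source="microphone"; Gradio 4.x replaced it with
    sources=["microphone"], which is why `source` raises the TypeError above.
    """
    major = int(gradio_version.split(".")[0])
    if major >= 4:
        return {"sources": [source]}
    return {"source": source}

# In app.py (sketch):
#   import gradio as gr
#   input_audio_mic = gr.Audio(**audio_source_kwargs(gr.__version__, "microphone"), ...)
```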
In the demo directory I added a Flask API file, seamless_communication/demo/api.py.
Change into that directory and run python api.py
to start it.
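A minimal client sketch for this API, assuming the form-field names and defaults listed in the parameter table and that the endpoint accepts a POST at the root URL. The response format is defined by api.py, which is not shown here, so the sketch simply returns the response body. It uses the third-party `requests` library; `(None, value)` tuples make `requests` send plain form fields inside a multipart/form-data body.

```python
from pathlib import Path

API_URL = "http://127.0.0.1:8080"  # endpoint from the table
API_KEY = "ab7d978a80a0b833c460e4cf456edd6b"


def build_fields(in_str="", *, post_type="text", in_type="text", in_lang="cmn",
                 out_type="text", out_lang="eng", key=API_KEY):
    """Form fields named as in the parameter table; defaults mirror the table."""
    fields = {"key": key, "postType": post_type, "inType": in_type,
              "inLang": in_lang, "outType": out_type, "outLang": out_lang}
    if in_str:
        fields["inStr"] = in_str  # may be omitted when a file is uploaded
    return fields


def call_api(fields, upload=None, url=API_URL):
    """POST the fields as multipart/form-data, optionally attaching a file."""
    import requests  # third-party: pip install requests
    files = {name: (None, value) for name, value in fields.items()}
    if upload is None:
        resp = requests.post(url, files=files, timeout=600)
    else:
        with open(upload, "rb") as fh:
            files["file"] = (Path(upload).name, fh)
            resp = requests.post(url, files=files, timeout=600)
    resp.raise_for_status()
    return resp.text  # format depends on api.py


if __name__ == "__main__":
    print(call_api(build_fields("你好,世界")))
```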
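Endpoint: http://127.0.0.1:8080
Method: POST
Body: form-data

| Parameter | Description | Allowed values | Notes |
|---|---|---|---|
| key | Secret key (required) | ab7d978a80a0b833c460e4cf456edd6b | |
| postType | How the input is passed | url, path, file, text | Default: text |
| inType | Input type | text, speech | Default: text |
| inLang | Input language | | Default: Chinese (cmn) |
| inStr | Input string | Text to translate, or a remote/local file address | May be omitted only when uploading a file |
| outType | Output type | text, speech | Default: text |
| outLang | Output language | | Default: English (eng) |
| file | Uploaded file | wav, mp3, txt | |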
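On the server side, the table translates naturally into a validation step. The function below is a hypothetical sketch of how such checks could look; it is not the code in api.py. It applies the documented defaults, checks the enumerated values and the key, and enforces the rule that inStr may only be omitted when a file is uploaded.

```python
ALLOWED = {
    "postType": {"url", "path", "file", "text"},
    "inType": {"text", "speech"},
    "outType": {"text", "speech"},
}
DEFAULTS = {"postType": "text", "inType": "text", "inLang": "cmn",
            "outType": "text", "outLang": "eng"}


def normalize_params(form, expected_key, has_file=False):
    """Apply the table's defaults and checks; return (params, error_message)."""
    if form.get("key") != expected_key:
        return None, "invalid key"
    params = {name: form.get(name) or default for name, default in DEFAULTS.items()}
    for name, allowed in ALLOWED.items():
        if params[name] not in allowed:
            return None, f"{name} must be one of {sorted(allowed)}"
    params["inStr"] = form.get("inStr", "")
    if not params["inStr"] and not has_file:
        return None, "inStr is required unless a file is uploaded"
    return params, None

# Flask usage (sketch):
#   params, err = normalize_params(request.form, KEY, has_file="file" in request.files)
```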
Generated speech files are saved by default to a designated folder on the server, so Nginx has to be configured to serve that folder:
vi /etc/nginx/nginx.conf
http {
    # Add the following
    server {
        listen 8081;
        server_name localhost;
        location /media/ {
            alias /opt/seamless_communication/media/out/;
            autoindex on;
        }
    }
}
service nginx restart
sudo chmod -R 755 /opt/seamless_communication/media/out/
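With this configuration, a file saved as `/opt/seamless_communication/media/out/<name>` is served at `http://<host>:8081/media/<name>`. The helper below sketches that mapping; the host is an assumption (replace it with the server's address), and whether api.py already returns URLs in this form is not covered here. Non-ASCII or space characters in file names are percent-encoded.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

MEDIA_DIR = PurePosixPath("/opt/seamless_communication/media/out")  # nginx alias target


def media_url(file_path, host="127.0.0.1", port=8081, media_dir=MEDIA_DIR):
    """URL under which nginx serves a file saved inside media_dir."""
    rel = PurePosixPath(file_path).relative_to(media_dir)  # ValueError if outside
    return f"http://{host}:{port}/media/{quote(rel.as_posix())}"
```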
I also built a simple Flutter app that works with the Flask API; you can compile it yourself from the project code.