Installing and deploying M4Singer on Ubuntu 22.04

Commands that worked

 conda create -n python3.7.12 python==3.7.12
 conda activate python3.7.12

(python3.7.12) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ python -V
Python 3.7.12
(python3.7.12) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ python3 -m venv venv3712
(python3.7.12) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ source venv3712/bin/activate
(venv3712) (python3.7.12) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install -r requirements_2080.txt 

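Approach: create the Python environment with conda, then use Python's built-in venv module to create a virtual environment inside the project directory.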
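
Before installing anything, it can help to confirm that the active interpreter really is the one from the venv and not the conda env or the system Python. The following is a minimal sketch; the helper names in_virtualenv and matches_version are made up for illustration and are not part of M4Singer.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory, while
    # sys.base_prefix points at the interpreter the venv was created from
    # (here that would be the conda env).
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def matches_version(expected, actual=None):
    """Compare the leading components of a version tuple, e.g. (3, 7)."""
    actual = tuple(sys.version_info[:3]) if actual is None else tuple(actual)
    return actual[:len(expected)] == tuple(expected)


if __name__ == "__main__":
    print("python:", sys.version.split()[0])
    print("inside venv:", in_virtualenv())
    print("is 3.7.x:", matches_version((3, 7)))
```

Run it with the venv activated; if "inside venv" is False, pip would install into the conda env instead of the project directory.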

Git information

(venv3712) (python3.7.12) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ git branch
* master
(venv3712) (python3.7.12) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ git log
commit 45ec525a0834b3f12605120eb36efe992c1f5455 (grafted, HEAD -> master, origin/master, origin/HEAD)
Author: m4singer <18866416692>
Date:   Thu Dec 29 18:22:57 2022 +0800

    init
(venv3712) (python3.7.12) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ git remote -v
origin    https://github.com/M4Singer/M4Singer (fetch)
origin    https://github.com/M4Singer/M4Singer (push)
 

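The GitHub page does not say which Python 3 version the project needs. Testing showed that the dependencies install cleanly under Python 3.7.12, while the other versions tried (3.9.18, 3.8.18 and 3.11.5) all failed. The logs are recorded below.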

(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install numpy==1.11


   gcc: build/src.linux-x86_64-3.9/numpy/core/src/multiarray/lowlevel_strided_loops.c
      gcc: numpy/core/src/multiarray/mapping.c
      gcc: numpy/core/src/multiarray/methods.c
      gcc: numpy/core/src/multiarray/multiarraymodule.c
      gcc: build/src.linux-x86_64-3.9/numpy/core/src/multiarray/nditer_templ.c
      gcc: numpy/core/src/multiarray/nditer_api.c
      gcc: numpy/core/src/multiarray/nditer_constr.c
      gcc: numpy/core/src/multiarray/nditer_pywrap.c
      gcc: numpy/core/src/multiarray/number.c
      gcc: numpy/core/src/multiarray/numpymemoryview.c
      gcc: numpy/core/src/multiarray/numpyos.c
      numpy/core/src/multiarray/numpyos.c:18:10: fatal error: xlocale.h: No such file or directory
         18 | #include <xlocale.h>
            |          ^~~~~~~~~~~
      compilation terminated.
      numpy/core/src/multiarray/numpyos.c:18:10: fatal error: xlocale.h: No such file or directory
         18 | #include <xlocale.h>
            |          ^~~~~~~~~~~
      compilation terminated.
      error: Command "gcc -pthread -B /home/yeqiang/miniconda3/envs/python3.9.18/compiler_compat -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/yeqiang/miniconda3/envs/python3.9.18/include -fPIC -O2 -isystem /home/yeqiang/miniconda3/envs/python3.9.18/include -fPIC -DHAVE_NPY_CONFIG_H=1 -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -Ibuild/src.linux-x86_64-3.9/numpy/core/src/private -Inumpy/core/include -Ibuild/src.linux-x86_64-3.9/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -I/home/yeqiang/Downloads/ai/M4Singer/code/venv/include -I/home/yeqiang/miniconda3/envs/python3.9.18/include/python3.9 -Ibuild/src.linux-x86_64-3.9/numpy/core/src/private -Ibuild/src.linux-x86_64-3.9/numpy/core/src/private -Ibuild/src.linux-x86_64-3.9/numpy/core/src/private -c numpy/core/src/multiarray/numpyos.c -o build/temp.linux-x86_64-3.9/numpy/core/src/multiarray/numpyos.o" failed with exit status 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> numpy

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

[notice] A new release of pip is available: 23.0.1 -> 23.2.1
[notice] To update, run: pip install --upgrade pip
(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ 


(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install wheel
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting wheel
  Using cached http://mirrors.aliyun.com/pypi/packages/b8/8b/31273bf66016be6ad22bb7345c37ff350276cfd46e389a0c2ac5da9d9073/wheel-0.41.2-py3-none-any.whl (64 kB)
Installing collected packages: wheel
Successfully installed wheel-0.41.2
(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$

Is the gcc version too new?
(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ sudo apt install gcc-9

(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ export CC=/usr/bin/gcc-9 
(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install numpy==1.16.1


(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ export CC=/usr/bin/gcc
(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install -v  numpy==1.26.0


In the end the problem was not gcc: the numpy version chosen was simply too old!


ERROR: Ignored the following versions that require a different python version: 0.52.0 Requires-Python >=3.6,<3.9; 0.52.0rc3 Requires-Python >=3.6,<3.9; 9.1.0 Requires-Python >=3.10
ERROR: Could not find a version that satisfies the requirement torch==1.6.0 (from versions: 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0, 1.12.0, 1.12.1, 1.13.0, 1.13.1, 2.0.0, 2.0.1)
ERROR: No matching distribution found for torch==1.6.0
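
The "Requires-Python" ranges in the message above explain why pip skips some releases: 3.9.18 falls outside ">=3.6,<3.9". The sketch below checks a version against such a range. It is a simplified illustration that only handles plain numeric versions and the six basic comparison operators; the function names are hypothetical, and real tooling should rely on the `packaging` library instead.

```python
import re

# Comparison operators supported by this simplified checker.
_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}

_CLAUSE = re.compile(r"\s*(>=|<=|==|!=|>|<)\s*([0-9]+(?:\.[0-9]+)*)\s*")


def parse_version(text, width=3):
    """Turn '3.9' into (3, 9, 0): zero-padded so tuples compare cleanly."""
    parts = [int(part) for part in text.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def satisfies(python_version, requires_python):
    """Return True if python_version meets every comma-separated clause."""
    version = parse_version(python_version)
    for clause in requires_python.split(","):
        match = _CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError("unsupported clause: %r" % clause)
        op, bound = match.group(1), parse_version(match.group(2))
        if not _OPS[op](version, bound):
            return False
    return True


if __name__ == "__main__":
    for py in ("3.7.12", "3.8.18", "3.9.18", "3.11.5"):
        print(py, satisfies(py, ">=3.6,<3.9"))
```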

(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install -v  torch==1.7.1


(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install -v torch==1.7.1
Using pip 23.2.1 from /home/yeqiang/Downloads/ai/M4Singer/code/venv/lib/python3.9/site-packages/pip (python 3.9)
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Collecting torch==1.7.1
  Downloading http://mirrors.aliyun.com/pypi/packages/41/f4/4da4f26a04d93851e481e76ec17fed0d152a1691e8f1142ad763c9f07997/torch-1.7.1-cp39-cp39-manylinux1_x86_64.whl (776.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 776.8/776.8 MB 940.9 kB/s eta 0:00:00
Collecting typing-extensions (from torch==1.7.1)
  Downloading http://mirrors.aliyun.com/pypi/packages/24/21/7d397a4b7934ff4028987914ac1044d3b7d52712f30e2ac7a2ae5bc86dd0/typing_extensions-4.8.0-py3-none-any.whl (31 kB)
Requirement already satisfied: numpy in ./venv/lib/python3.9/site-packages (from torch==1.7.1) (1.26.0)
Installing collected packages: typing-extensions, torch
  changing mode of /home/yeqiang/Downloads/ai/M4Singer/code/venv/bin/convert-caffe2-to-onnx to 775
  changing mode of /home/yeqiang/Downloads/ai/M4Singer/code/venv/bin/convert-onnx-to-caffe2 to 775
Successfully installed torch-1.7.1 typing-extensions-4.8.0
(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ 

Update requirements_2080.txt to match: torch==1.7.1


ERROR: Ignored the following versions that require a different python version: 0.52.0 Requires-Python >=3.6,<3.9; 0.52.0rc3 Requires-Python >=3.6,<3.9; 9.1.0 Requires-Python >=3.10
ERROR: Could not find a version that satisfies the requirement torchaudio==0.6.0 (from versions: 0.7.2, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.10.2, 0.11.0, 0.12.0, 0.12.1, 0.13.0, 0.13.1, 2.0.0, 2.0.1, 2.0.2)
ERROR: No matching distribution found for torchaudio==0.6.0

Update requirements_2080.txt to match: torchaudio==0.7.2


ERROR: Ignored the following versions that require a different python version: 0.52.0 Requires-Python >=3.6,<3.9; 0.52.0rc3 Requires-Python >=3.6,<3.9; 9.1.0 Requires-Python >=3.10
ERROR: Could not find a version that satisfies the requirement torchvision==0.7.0 (from versions: 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.1, 0.2.2, 0.2.2.post2, 0.2.2.post3, 0.8.2, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.11.1, 0.11.2, 0.11.3, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2)
ERROR: No matching distribution found for torchvision==0.7.0


Update requirements_2080.txt to match: torchvision==0.8.2
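
The pin changes to torch, torchaudio and torchvision were made by hand. As a rough sketch, the same edits could be scripted; repin is a hypothetical helper that only understands simple `name==version` lines and leaves everything else untouched.

```python
import re

# Matches simple pins such as "torch==1.6.0" (no extras, markers or comments).
_PIN = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*==\s*\S+\s*$")


def repin(requirements_text, new_pins):
    """Return the text with `name==version` lines updated for names in new_pins."""
    wanted = {name.lower(): version for name, version in new_pins.items()}
    out = []
    for line in requirements_text.splitlines():
        match = _PIN.match(line)
        if match and match.group(1).lower() in wanted:
            name = match.group(1)
            line = "%s==%s" % (name, wanted[name.lower()])
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    original = "numpy==1.19.4\ntorch==1.6.0\ntorchaudio==0.6.0\ntorchvision==0.7.0\n"
    pins = {"torch": "1.7.1", "torchaudio": "0.7.2", "torchvision": "0.8.2"}
    print(repin(original, pins), end="")
```

Working on the text rather than the file keeps the original intact, so `git checkout requirements_2080.txt` is not needed to undo an experiment.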

Successfully built alignment audioread blinker Distance et-xmlfile future ipdb jieba librosa miditoolkit music21 nltk numba olefile praat-parselmouth pycwt PyInstaller python-Levenshtein pytorch-lightning pyworld PyYAML resampy scikit-image stopit typing uuid webrtcvad pretty-midi
Failed to build llvmlite scikit-learn
ERROR: Could not build wheels for llvmlite, scikit-learn, which is required to install pyproject.toml-based projects

  cwd: /tmp/pip-install-5mct2iow/scikit-learn_c819d945f24a40bfbe3a6c7c94f9f28f/
  Building wheel for scikit-learn (setup.py) ... error
  ERROR: Failed building wheel for scikit-learn
  Running setup.py clean for scikit-learn
  Running command python setup.py clean
  Partial import of sklearn during the build process.
  /tmp/pip-install-5mct2iow/scikit-learn_c819d945f24a40bfbe3a6c7c94f9f28f/setup.py:123: DeprecationWarning:

    `numpy.distutils` is deprecated since NumPy 1.23.0, as a result
    of the deprecation of `distutils` itself. It will be removed for
    Python >= 3.12. For older Python versions it will remain present.
    It is recommended to use `setuptools < 60.0` for those Python versions.
    For more details, see:
      https://numpy.org/devdocs/reference/distutils_status_migration.html

===============================================python3.8.18
Default compiler: gcc-11

(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ git checkout requirements_2080.txt 
Updated 1 path from the index
(venv) (python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ deactivate 
(python3.9.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ conda deactivate
(base) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ 
(base) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ 
(base) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ conda activate python3.8.18
(python3.8.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ python3 -m venv venv3818
(python3.8.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ source venv3818/bin/activate
(venv3818) (python3.8.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install --upgrade pip
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: pip in ./venv3818/lib/python3.8/site-packages (23.0.1)
Collecting pip
  Downloading http://mirrors.aliyun.com/pypi/packages/50/c2/e06851e8cc28dcad7c155f4753da8833ac06a5c704c109313b8d5a62968a/pip-23.2.1-py3-none-any.whl (2.1 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.1/2.1 MB 2.6 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 23.0.1
    Uninstalling pip-23.0.1:
      Successfully uninstalled pip-23.0.1
Successfully installed pip-23.2.1
(venv3818) (python3.8.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ 
(venv3818) (python3.8.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install -v -r requirements_2080.txt 


#############
  AttributeError: 'dict' object has no attribute '__NUMPY_SETUP__'
  error: subprocess-exited-with-error
  
  × Preparing metadata (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> See above for output.
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  full command: /home/yeqiang/Downloads/ai/M4Singer/code/venv3818/bin/python3 /home/yeqiang/Downloads/ai/M4Singer/code/venv3818/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py prepare_metadata_for_build_wheel /tmp/tmp2aqbozzx
  cwd: /tmp/pip-install-nque4yqb/pyworld_8b8bd8684bf6448a8623dd78a29a9e66
  Preparing metadata (pyproject.toml) ... error

(venv3818) (python3.8.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install numpy==1.24.4          # did not help!


(venv3818) (python3.8.18) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install --upgrade setuptools   # did not help!
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: setuptools in ./venv3818/lib/python3.8/site-packages (56.0.0)
Collecting setuptools
  Using cached http://mirrors.aliyun.com/pypi/packages/bb/26/7945080113158354380a12ce26873dd6c1ebd88d47f5bc24e2c5bb38c16a/setuptools-68.2.2-py3-none-any.whl (807 kB)
Installing collected packages: setuptools
  Attempting uninstall: setuptools
    Found existing installation: setuptools 56.0.0
    Uninstalling setuptools-56.0.0:
      Successfully uninstalled setuptools-56.0.0
Successfully installed setuptools-68.2.2

Update requirements_2080.txt to match: numpy==1.24.4  # did not help!

Installing collected packages: numpy
  Successfully installed numpy-1.24.4
  Installing backend dependencies ... done
  Running command Preparing metadata (pyproject.toml)
  running dist_info
  creating /tmp/pip-modern-metadata-1tbxqd33/pyworld.egg-info
  writing /tmp/pip-modern-metadata-1tbxqd33/pyworld.egg-info/PKG-INFO
  writing dependency_links to /tmp/pip-modern-metadata-1tbxqd33/pyworld.egg-info/dependency_links.txt
  writing requirements to /tmp/pip-modern-metadata-1tbxqd33/pyworld.egg-info/requires.txt
  writing top-level names to /tmp/pip-modern-metadata-1tbxqd33/pyworld.egg-info/top_level.txt
  writing manifest file '/tmp/pip-modern-metadata-1tbxqd33/pyworld.egg-info/SOURCES.txt'
  /tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/dist.py:498: SetuptoolsDeprecationWarning: Invalid dash-separated options
  !!

          ********************************************************************************
          Usage of dash-separated 'description-file' will not be supported in future
          versions. Please use the underscore name 'description_file' instead.

          See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
          ********************************************************************************

  !!
    opt = self.warn_dash_deprecation(opt, section)
  Traceback (most recent call last):
    File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3818/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3818/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
    File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3818/lib/python3.8/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 149, in prepare_metadata_for_build_wheel
      return hook(metadata_directory, config_settings)
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 396, in prepare_metadata_for_build_wheel
      self.run_setup()
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 507, in run_setup
      super(_BuildMetaLegacyBackend, self).run_setup(setup_script=setup_script)
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/build_meta.py", line 341, in run_setup
      exec(code, locals())
    File "<string>", line 43, in <module>
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/__init__.py", line 103, in setup
      return distutils.core.setup(**attrs)
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 185, in setup
      return run_commands(dist)
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
      dist.run_commands()
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
      self.run_command(cmd)
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/dist.py", line 989, in run_command
      super().run_command(command)
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
      cmd_obj.run()
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/command/dist_info.py", line 107, in run
      self.egg_info.run()
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 318, in run
      self.find_sources()
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 326, in find_sources
      mm.run()
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 548, in run
      self.add_defaults()
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/command/egg_info.py", line 586, in add_defaults
      sdist.add_defaults(self)
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/command/sdist.py", line 113, in add_defaults
      super().add_defaults()
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/sdist.py", line 251, in add_defaults
      self._add_defaults_ext()
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/_distutils/command/sdist.py", line 335, in _add_defaults_ext
      build_ext = self.get_finalized_command('build_ext')
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 305, in get_finalized_command
      cmd_obj.ensure_finalized()
    File "/tmp/pip-build-env-w4btlj_8/overlay/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 111, in ensure_finalized
      self.finalize_options()
    File "<string>", line 29, in finalize_options
  AttributeError: 'dict' object has no attribute '__NUMPY_SETUP__'
  error: subprocess-exited-with-error


###############


===============================================python3.11.5
(base) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ conda activate python3.11.5
(python3.11.5) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ python3 -m venv venv3115
(python3.11.5) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ source venv3115/bin/activate
(venv3115) (python3.11.5) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ git checkout requirements_2080.txt 
Updated 1 path from the index
(venv3115) (python3.11.5) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install -v -r requirements_2080.txt 

× Building wheel for numpy (pyproject.toml) did not run successfully.
    │ exit code: 1
    ╰─> [936 lines of output]
        setup.py:67: RuntimeWarning: NumPy 1.19.3 may not yet support Python 3.11.
          warnings.warn(

(venv3115) (python3.11.5) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ pip install -v numpy==1.26.0
Using pip 23.2.1 from /home/yeqiang/Downloads/ai/M4Singer/code/venv3115/lib/python3.11/site-packages/pip (python 3.11)
Looking in indexes: http://mirrors.aliyun.com/pypi/simple/
  Link requires a different Python (3.11.5 not in: '>=3.7,<3.11'): http://mirrors.aliyun.com/pypi/packages/3a/be/650f9c091ef71cb01d735775d554e068752d3ff63d7943b26316dc401749/numpy-1.21.2.zip#sha256=423216d8afc5923b15df86037c6053bf030d15cc9e3224206ef868c2d63dd6dc (from http://mirrors.aliyun.com/pypi/simple/numpy/) (requires-python:>=3.7,<3.11)
  Link requires a different Python (3.11.5 not in: '>=3.7,<3.11'): http://mirrors.aliyun.com/pypi/packages/5f/d6/ad58ded26556eaeaa8c971e08b6466f17c4ac4d786cd3d800e26ce59cc01/numpy-1.21.3.zip#sha256=63571bb7897a584ca3249c86dd01c10bcb5fe4296e3568b2e9c1a55356b6410e (from http://mirrors.aliyun.com/pypi/simple/numpy/) (requires-python:>=3.7,<3.11)
  Link requires a different Python (3.11.5 not in: '>=3.7,<3.11'): http://mirrors.aliyun.com/pypi/packages/fb/48/b0708ebd7718a8933f0d3937513ef8ef2f4f04529f1f66ca86d873043921/numpy-1.21.4.zip#sha256=e6c76a87633aa3fa16614b61ccedfae45b91df2767cf097aa9c933932a7ed1e0 (from http://mirrors.aliyun.com/pypi/simple/numpy/) (requires-python:>=3.7,<3.11)
  Link requires a different Python (3.11.5 not in: '>=3.7,<3.11'): http://mirrors.aliyun.com/pypi/packages/c2/a8/a924a09492bdfee8c2ec3094d0a13f2799800b4fdc9c890738aeeb12c72e/numpy-1.21.5.zip#sha256=6a5928bc6241264dce5ed509e66f33676fc97f464e7a919edc672fb5532221ee (from http://mirrors.aliyun.com/pypi/simple/numpy/) (requires-python:>=3.7,<3.11)
  Link requires a different Python (3.11.5 not in: '>=3.7,<3.11'): http://mirrors.aliyun.com/pypi/packages/45/b7/de7b8e67f2232c26af57c205aaad29fe17754f793404f59c8a730c7a191a/numpy-1.21.6.zip#sha256=ecb55251139706669fdec2ff073c98ef8e9a84473e51e716211b41aa0f18e656 (from http://mirrors.aliyun.com/pypi/simple/numpy/) (requires-python:>=3.7,<3.11)
Collecting numpy==1.26.0
  Downloading http://mirrors.aliyun.com/pypi/packages/c4/36/161e2f8110f8c49e59f6107bd6da4257d30aff9f06373d0471811f73dcc5/numpy-1.26.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18.2/18.2 MB 5.1 MB/s eta 0:00:00
Installing collected packages: numpy
  changing mode of /home/yeqiang/Downloads/ai/M4Singer/code/venv3115/bin/f2py to 775
Successfully installed numpy-1.26.0


Update requirements_2080.txt to match: numpy==1.26.0

===============================================python3.7.12
(venv3712) (python3.7.12) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$
+ pip install -v -r requirements_2080.txt
Non-user install because user site-packages disabled
Created temporary directory: /tmp/pip-ephem-wheel-cache-lnvdxczj
Created temporary directory: /tmp/pip-req-tracker-brz5dnty
Initialized build tracking at /tmp/pip-req-tracker-brz5dnty
Created build tracker: /tmp/pip-req-tracker-brz5dnty
Entered build tracker: /tmp/pip-req-tracker-brz5dnty
Created temporary directory: /tmp/pip-install-s7un1lfq
Requirement already satisfied: absl-py==0.11.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 1)) (0.11.0)
Requirement already satisfied: alignment==1.0.10 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 2)) (1.0.10)
Requirement already satisfied: altgraph==0.17 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 3)) (0.17)
Requirement already satisfied: appdirs==1.4.4 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 4)) (1.4.4)
Requirement already satisfied: async-timeout==3.0.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 5)) (3.0.1)
Requirement already satisfied: audioread==2.1.9 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 6)) (2.1.9)
Requirement already satisfied: backcall==0.2.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 7)) (0.2.0)
Requirement already satisfied: blinker==1.4 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 8)) (1.4)
Requirement already satisfied: brotlipy==0.7.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 9)) (0.7.0)
Requirement already satisfied: cachetools==4.2.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 10)) (4.2.0)
Requirement already satisfied: certifi==2020.12.5 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 11)) (2020.12.5)
Requirement already satisfied: cffi==1.14.4 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 12)) (1.14.4)
Requirement already satisfied: chardet==4.0.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 13)) (4.0.0)
Requirement already satisfied: click==7.1.2 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 14)) (7.1.2)
Requirement already satisfied: cycler==0.10.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 15)) (0.10.0)
Requirement already satisfied: Cython==0.29.21 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 16)) (0.29.21)
Requirement already satisfied: cytoolz==0.11.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 17)) (0.11.0)
Requirement already satisfied: decorator==4.4.2 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 18)) (4.4.2)
Requirement already satisfied: Distance==0.1.3 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 19)) (0.1.3)
Requirement already satisfied: einops==0.3.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 20)) (0.3.0)
Requirement already satisfied: et-xmlfile==1.0.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 21)) (1.0.1)
Requirement already satisfied: fsspec==0.8.4 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 22)) (0.8.4)
Requirement already satisfied: future==0.18.2 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 23)) (0.18.2)
Requirement already satisfied: g2p-en==2.1.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 24)) (2.1.0)
Requirement already satisfied: g2pM==0.1.2.5 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 25)) (0.1.2.5)
Requirement already satisfied: google-auth==1.24.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 26)) (1.24.0)
Requirement already satisfied: google-auth-oauthlib==0.4.2 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 27)) (0.4.2)
Requirement already satisfied: grpcio==1.34.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 28)) (1.34.0)
Requirement already satisfied: h5py==3.1.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 29)) (3.1.0)
Requirement already satisfied: horology==1.1.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 30)) (1.1.0)
Requirement already satisfied: httplib2==0.18.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 31)) (0.18.1)
Requirement already satisfied: idna==2.10 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 32)) (2.10)
Requirement already satisfied: imageio==2.9.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 33)) (2.9.0)
Requirement already satisfied: inflect==5.0.2 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 34)) (5.0.2)
Requirement already satisfied: ipdb==0.13.4 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 35)) (0.13.4)
Requirement already satisfied: ipython==7.19.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 36)) (7.19.0)
Requirement already satisfied: ipython-genutils==0.2.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 37)) (0.2.0)
Requirement already satisfied: jdcal==1.4.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 38)) (1.4.1)
Requirement already satisfied: jedi==0.17.2 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 39)) (0.17.2)
Requirement already satisfied: jieba==0.42.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 40)) (0.42.1)
Requirement already satisfied: jiwer==2.2.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 41)) (2.2.0)
Requirement already satisfied: joblib==1.0.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 42)) (1.0.0)
Requirement already satisfied: kiwisolver==1.3.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 43)) (1.3.1)
Requirement already satisfied: librosa==0.8.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 44)) (0.8.0)
Requirement already satisfied: llvmlite==0.31.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 45)) (0.31.0)
Requirement already satisfied: Markdown==3.3.3 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 46)) (3.3.3)
Requirement already satisfied: matplotlib==3.3.3 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 47)) (3.3.3)
Requirement already satisfied: miditoolkit==0.1.7 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 48)) (0.1.7)
Requirement already satisfied: mido==1.2.9 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 49)) (1.2.9)
Requirement already satisfied: music21==5.7.2 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 50)) (5.7.2)
Requirement already satisfied: networkx==2.5 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 51)) (2.5)
Requirement already satisfied: nltk==3.5 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 52)) (3.5)
Requirement already satisfied: numba==0.48.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 53)) (0.48.0)
Requirement already satisfied: numpy==1.19.4 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 54)) (1.19.4)
Requirement already satisfied: oauth2client==4.1.3 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 55)) (4.1.3)
Requirement already satisfied: oauthlib==3.1.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 56)) (3.1.0)
Requirement already satisfied: olefile==0.46 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 57)) (0.46)
Requirement already satisfied: packaging==20.7 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 58)) (20.7)
Requirement already satisfied: pandas==1.2.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 59)) (1.2.0)
Requirement already satisfied: parso==0.7.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 60)) (0.7.1)
Requirement already satisfied: patsy==0.5.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 61)) (0.5.1)
Requirement already satisfied: pexpect==4.8.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 62)) (4.8.0)
Requirement already satisfied: pickleshare==0.7.5 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 63)) (0.7.5)
Requirement already satisfied: Pillow==8.0.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 64)) (8.0.1)
Requirement already satisfied: pooch==1.3.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 65)) (1.3.0)
Requirement already satisfied: praat-parselmouth==0.3.3 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 66)) (0.3.3)
Requirement already satisfied: prompt-toolkit==3.0.8 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 67)) (3.0.8)
Requirement already satisfied: protobuf==3.13.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 68)) (3.13.0)
Requirement already satisfied: ptyprocess==0.6.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 69)) (0.6.0)
Requirement already satisfied: pyasn1==0.4.8 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 70)) (0.4.8)
Requirement already satisfied: pyasn1-modules==0.2.8 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 71)) (0.2.8)
Requirement already satisfied: pycparser==2.20 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 72)) (2.20)
Requirement already satisfied: pycwt==0.3.0a22 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 73)) (0.3.0a22)
Requirement already satisfied: Pygments==2.7.3 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 74)) (2.7.3)
Requirement already satisfied: PyInstaller==3.6 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 75)) (3.6)
Requirement already satisfied: PyJWT==1.7.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 76)) (1.7.1)
Requirement already satisfied: pyloudnorm==0.1.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 77)) (0.1.0)
Requirement already satisfied: pyparsing==2.4.7 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 78)) (2.4.7)
Requirement already satisfied: pypinyin==0.39.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 79)) (0.39.0)
Requirement already satisfied: PySocks==1.7.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 80)) (1.7.1)
Requirement already satisfied: python-dateutil==2.8.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 81)) (2.8.1)
Requirement already satisfied: python-Levenshtein==0.12.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 82)) (0.12.0)
Requirement already satisfied: pytorch-lightning==0.7.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 83)) (0.7.1)
Requirement already satisfied: pytz==2020.5 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 84)) (2020.5)
Requirement already satisfied: PyWavelets==1.1.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 85)) (1.1.1)
Requirement already satisfied: pyworld==0.2.12 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 86)) (0.2.12)
Requirement already satisfied: PyYAML==5.3.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 87)) (5.3.1)
Requirement already satisfied: regex==2020.11.13 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 88)) (2020.11.13)
Requirement already satisfied: requests==2.25.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 89)) (2.25.1)
Requirement already satisfied: requests-oauthlib==1.3.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 90)) (1.3.0)
Requirement already satisfied: resampy==0.2.2 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 91)) (0.2.2)
Requirement already satisfied: Resemblyzer==0.1.1.dev0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 92)) (0.1.1.dev0)
Requirement already satisfied: rsa==4.6 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 93)) (4.6)
Requirement already satisfied: scikit-image==0.16.2 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 94)) (0.16.2)
Requirement already satisfied: scikit-learn==0.22.2.post1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 95)) (0.22.2.post1)
Requirement already satisfied: scipy==1.5.4 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 96)) (1.5.4)
Requirement already satisfied: six==1.15.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 97)) (1.15.0)
Requirement already satisfied: SoundFile==0.10.3.post1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 98)) (0.10.3.post1)
Requirement already satisfied: stopit==1.1.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 99)) (1.1.1)
Requirement already satisfied: tensorboard==2.4.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 100)) (2.4.0)
Requirement already satisfied: tensorboard-plugin-wit==1.7.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 101)) (1.7.0)
Requirement already satisfied: tensorboardX==2.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 102)) (2.1)
Requirement already satisfied: TextGrid==1.5 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 103)) (1.5)
Requirement already satisfied: threadpoolctl==2.1.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 104)) (2.1.0)
Requirement already satisfied: toolz==0.11.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 105)) (0.11.1)
Requirement already satisfied: torch==1.6.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 106)) (1.6.0)
Requirement already satisfied: torchaudio==0.6.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 107)) (0.6.0)
Requirement already satisfied: torchvision==0.7.0 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 108)) (0.7.0)
Requirement already satisfied: tqdm==4.54.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 109)) (4.54.1)
Requirement already satisfied: traitlets==5.0.5 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 110)) (5.0.5)
Requirement already satisfied: typing==3.7.4.3 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 111)) (3.7.4.3)
Requirement already satisfied: urllib3==1.26.2 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 112)) (1.26.2)
Requirement already satisfied: uuid==1.30 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 113)) (1.30)
Requirement already satisfied: wcwidth==0.2.5 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 114)) (0.2.5)
Requirement already satisfied: webencodings==0.5.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 115)) (0.5.1)
Requirement already satisfied: webrtcvad==2.0.10 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 116)) (2.0.10)
Requirement already satisfied: Werkzeug==1.0.1 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 117)) (1.0.1)
Requirement already satisfied: pretty-midi==0.2.9 in ./venv3712/lib/python3.7/site-packages (from -r requirements_2080.txt (line 118)) (0.2.9)
Requirement already satisfied: setuptools>=40.3.0 in ./venv3712/lib/python3.7/site-packages (from google-auth==1.24.0->-r requirements_2080.txt (line 26)) (47.1.0)
Requirement already satisfied: cached-property; python_version < "3.8" in ./venv3712/lib/python3.7/site-packages (from h5py==3.1.0->-r requirements_2080.txt (line 29)) (1.5.2)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in ./venv3712/lib/python3.7/site-packages (from Markdown==3.3.3->-r requirements_2080.txt (line 46)) (6.7.0)
Requirement already satisfied: wheel>=0.26; python_version >= "3" in ./venv3712/lib/python3.7/site-packages (from tensorboard==2.4.0->-r requirements_2080.txt (line 100)) (0.41.2)
Requirement already satisfied: zipp>=0.5 in ./venv3712/lib/python3.7/site-packages (from importlib-metadata; python_version < "3.8"->Markdown==3.3.3->-r requirements_2080.txt (line 46)) (3.15.0)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in ./venv3712/lib/python3.7/site-packages (from importlib-metadata; python_version < "3.8"->Markdown==3.3.3->-r requirements_2080.txt (line 46)) (4.7.1)
WARNING: You are using pip version 20.1.1; however, version 23.2.1 is available.
You should consider upgrading via the '/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/bin/python3 -m pip install --upgrade pip' command.
Removed build tracker: '/tmp/pip-req-tracker-brz5dnty'

Run a test

(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ find | grep binarize.py
./data_gen/singing/binarize.py
./data_gen/tts/bin/binarize.py
(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ export PYTHONPATH=.
CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config usr/configs/m4singer/base.yaml
| Hparams chains:  ['configs/config_base.yaml', 'configs/tts/base.yaml', 'configs/tts/fs2.yaml', 'configs/tts/base_zh.yaml', 'configs/singing/base.yaml', 'usr/configs/base.yaml', 'usr/configs/popcs_ds_beta6.yaml', 'usr/configs/m4singer/base.yaml']
| Hparams: 
K_step: 51, accumulate_grad_batches: 1, audio_num_mel_bins: 80, audio_sample_rate: 24000, base_config: ['usr/configs/popcs_ds_beta6.yaml'], 
binarization_args: {'shuffle': False, 'with_txt': True, 'with_wav': False, 'with_align': True, 'with_spk_embed': True, 'with_f0': True, 'with_f0cwt': True}, binarizer_cls: data_gen.singing.binarize.M4SingerBinarizer, binary_data_dir: data/binary/m4singer, check_val_every_n_epoch: 10, clip_grad_norm: 1, 
content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2, cwt_loss: l1, 
cwt_std_scale: 0.8, datasets: ['m4singer'], debug: False, dec_ffn_kernel_size: 9, dec_layers: 4, 
decay_steps: 50000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet, diff_loss_type: l1, 
dilation_cycle_length: 1, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'], dur_loss: mse, 
dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4, encoder_K: 8, 
encoder_type: fft, endless_ds: True, ffn_act: gelu, ffn_padding: SAME, fft_size: 512, 
fmax: 12000, fmin: 30, fs2_ckpt: , gen_dir_name: , gen_tgt_spk_id: -1, 
hidden_size: 256, hop_size: 128, infer: False, keep_bins: 80, lambda_commit: 0.25, 
lambda_energy: 0.0, lambda_f0: 1.0, lambda_ph_dur: 1.0, lambda_sent_dur: 1.0, lambda_uv: 1.0, 
lambda_word_dur: 1.0, load_ckpt: , log_interval: 100, loud_norm: False, lr: 0.001, 
max_beta: 0.06, max_epochs: 1000, max_eval_sentences: 1, max_eval_tokens: 60000, max_frames: 5000, 
max_input_tokens: 1550, max_sentences: 12, max_tokens: 40000, max_updates: 160000, mel_loss: ssim:0.5|l1:0.5, 
mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120, norm_type: gn, num_ckpt_keep: 3, 
num_heads: 2, num_sanity_val_steps: 1, num_spk: 20, num_test_samples: 0, num_valid_plots: 10, 
optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98, out_wav_norm: False, pe_ckpt: checkpoints/m4singer_pe, pe_enable: True, 
pitch_ar: False, pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l1, pitch_norm: log, 
pitch_type: frame, pre_align_args: {'use_tone': False, 'forced_align': 'mfa', 'use_sox': True, 'txt_processor': 'zh_g2pM', 'allow_no_txt': False, 'denoise': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, predictor_dropout: 0.5, predictor_grad: 0.1, 
predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256, 
pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, raw_data_dir: data/raw/m4singer, ref_norm_layer: bn, 
rel_pos: True, reset_phone_dict: True, residual_channels: 256, residual_layers: 20, save_best: False, 
save_ckpt: True, save_codes: ['configs', 'modules', 'tasks', 'utils', 'usr'], save_f0: True, save_gt: True, schedule_type: linear, 
seed: 1234, sort_by_len: True, spec_max: [-0.3894500136375427, -0.3796464204788208, -0.2914905250072479, -0.15550297498703003, -0.08502643555402756, 0.10698417574167252, -0.0739326998591423, -0.0541548952460289, 0.15501998364925385, 0.06483431905508041, 0.03054228238761425, -0.013737732544541359, -0.004876468330621719, 0.04368264228105545, 0.13329921662807465, 0.16471388936042786, 0.04605761915445328, -0.05680707097053528, 0.0542571023106575, -0.0076539707370102406, -0.00953489076346159, -0.04434828832745552, 0.001293870504014194, -0.12238839268684387, 0.06418416649103165, 0.02843189612030983, 0.08505241572856903, 0.07062800228595734, 0.00120724702719599, -0.07675088942050934, 0.03785804659128189, 0.04890783503651619, -0.06888376921415329, -0.0839693546295166, -0.17545585334300995, -0.2911079525947571, -0.4238220453262329, -0.262084037065506, -0.3002263605594635, -0.3845032751560211, -0.3906497061252594, -0.6550108790397644, -0.7810799479484558, -0.7503029704093933, -0.7995198965072632, -0.8092347383499146, -0.6196113228797913, -0.6684317588806152, -0.7735874056816101, -0.8324533104896545, -0.9601566791534424, -0.955253541469574, -0.748817503452301, -0.9106167554855347, -0.9707801342010498, -1.053107500076294, -1.0448424816131592, -1.1082794666290283, -1.1296544075012207, -1.071642279624939, -1.1003081798553467, -1.166810154914856, -1.1408926248550415, -1.1330615282058716, -1.1167492866516113, -1.0716774463653564, -1.035891056060791, -1.0092483758926392, -0.9675999879837036, -0.938962996006012, -1.0120564699172974, -0.9777995347976685, -1.029313564300537, -0.9459163546562195, -0.8519706130027771, -0.7751091122627258, -0.7933766841888428, -0.9019735455513, -0.9983296990394592, -1.505873441696167], spec_min: [-6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, 
-6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0], spk_cond_steps: [], 
stop_token_weight: 5.0, task_cls: usr.diffsinger_task.DiffSingerTask, test_ids: [], test_input_dir: , test_num: 0, 
test_prefixes: ['Alto-2#岁月神偷', 'Alto-2#奇妙能力歌', 'Tenor-1#一千年以后', 'Tenor-1#童话', 'Tenor-2#消愁', 'Tenor-2#一荤一素', 'Soprano-1#念奴娇赤壁怀古', 'Soprano-1#问春'], test_set_name: test, timesteps: 100, train_set_name: train, use_denoise: False, 
use_energy_embed: False, use_gt_dur: False, use_gt_f0: False, use_midi: True, use_nsf: True, 
use_pitch_embed: True, use_pos_embed: True, use_spk_embed: False, use_spk_id: True, use_split_spk_id: False, 
use_uv: True, use_var_enc: False, val_check_interval: 2000, valid_num: 0, valid_set_name: valid, 
validate: False, vocoder: vocoders.hifigan.HifiGAN, vocoder_ckpt: checkpoints/m4singer_hifigan, warmup_updates: 2000, wav2spec_eps: 1e-6, 
weight_decay: 0, win_size: 512, work_dir: , 
| Binarizer:  
Traceback (most recent call last):
  File "data_gen/tts/bin/binarize.py", line 20, in <module>
    binarize()
  File "data_gen/tts/bin/binarize.py", line 15, in binarize
    binarizer_cls().process()
  File "/home/yeqiang/Downloads/ai/M4Singer/code/data_gen/singing/binarize.py", line 90, in process
    self.load_meta_data()
  File "/home/yeqiang/Downloads/ai/M4Singer/code/data_gen/singing/binarize.py", line 304, in load_meta_data
    song_items = json.load(open(os.path.join(raw_data_dir, 'meta.json')))  # [list of dict]
FileNotFoundError: [Errno 2] No such file or directory: 'data/raw/m4singer/meta.json'
 

The dataset has to be downloaded first:

https://drive.google.com/file/d/1xC37E59EWRRFFLdG3aJkVqwtLDgtFNqW/view?usp=share_link

The link comes from the M4Singer GitHub page, which says:

a) Download m4singer.zip, then unzip this file into data/raw.
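
Before rerunning the binarizer, a quick pre-flight check can confirm that the archive was unpacked where the script looks for it. The sketch below is only an illustration: the helper name check_raw_dataset is made up here, and the only facts taken from the log above are the path data/raw/m4singer/meta.json and the source comment saying the file holds a list of dicts.

```python
import json
import os


def check_raw_dataset(raw_data_dir="data/raw/m4singer"):
    """Return the number of entries in meta.json, or raise a descriptive error.

    Hypothetical helper, not part of M4Singer. The default path matches the
    raw_data_dir value printed in the hparams dump.
    """
    meta_path = os.path.join(raw_data_dir, "meta.json")
    if not os.path.isfile(meta_path):
        raise FileNotFoundError(
            f"{meta_path} not found; unzip m4singer.zip into data/raw first"
        )
    with open(meta_path, encoding="utf-8") as f:
        song_items = json.load(f)  # the binarizer expects a list of dicts
    if not isinstance(song_items, list):
        raise ValueError(f"{meta_path} should contain a JSON list")
    return len(song_items)


if __name__ == "__main__":
    try:
        print("meta.json entries:", check_raw_dataset())
    except (FileNotFoundError, ValueError) as err:
        print("dataset check failed:", err)
```

Run from the project directory, this fails with a readable message instead of the traceback shown earlier when the dataset is missing.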

From the project source directory (~/Downloads/ai/M4Singer/code here), run the binarizer again:

export PYTHONPATH=.
CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config usr/configs/m4singer/base.yaml

Shell history for these steps:

  511  mkdir data/raw -p
  512  cd data/raw/
  513  unzip ~/Downloads/ai/m4singer.zip 
  520  source venv3712
  521  source venv3712/bin/activate
  522  export PYTHONPATH=.
  523  CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config usr/configs/m4singer/base.yaml
 

(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ export PYTHONPATH=.
CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config usr/configs/m4singer/base.yaml
| Hparams chains:  ['configs/config_base.yaml', 'configs/tts/base.yaml', 'configs/tts/fs2.yaml', 'configs/tts/base_zh.yaml', 'configs/singing/base.yaml', 'usr/configs/base.yaml', 'usr/configs/popcs_ds_beta6.yaml', 'usr/configs/m4singer/base.yaml']
| Hparams: 
K_step: 51, accumulate_grad_batches: 1, audio_num_mel_bins: 80, audio_sample_rate: 24000, base_config: ['usr/configs/popcs_ds_beta6.yaml'], 
binarization_args: {'shuffle': False, 'with_txt': True, 'with_wav': False, 'with_align': True, 'with_spk_embed': True, 'with_f0': True, 'with_f0cwt': True}, binarizer_cls: data_gen.singing.binarize.M4SingerBinarizer, binary_data_dir: data/binary/m4singer, check_val_every_n_epoch: 10, clip_grad_norm: 1, 
content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2, cwt_loss: l1, 
cwt_std_scale: 0.8, datasets: ['m4singer'], debug: False, dec_ffn_kernel_size: 9, dec_layers: 4, 
decay_steps: 50000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet, diff_loss_type: l1, 
dilation_cycle_length: 1, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'], dur_loss: mse, 
dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4, encoder_K: 8, 
encoder_type: fft, endless_ds: True, ffn_act: gelu, ffn_padding: SAME, fft_size: 512, 
fmax: 12000, fmin: 30, fs2_ckpt: , gen_dir_name: , gen_tgt_spk_id: -1, 
hidden_size: 256, hop_size: 128, infer: False, keep_bins: 80, lambda_commit: 0.25, 
lambda_energy: 0.0, lambda_f0: 1.0, lambda_ph_dur: 1.0, lambda_sent_dur: 1.0, lambda_uv: 1.0, 
lambda_word_dur: 1.0, load_ckpt: , log_interval: 100, loud_norm: False, lr: 0.001, 
max_beta: 0.06, max_epochs: 1000, max_eval_sentences: 1, max_eval_tokens: 60000, max_frames: 5000, 
max_input_tokens: 1550, max_sentences: 12, max_tokens: 40000, max_updates: 160000, mel_loss: ssim:0.5|l1:0.5, 
mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120, norm_type: gn, num_ckpt_keep: 3, 
num_heads: 2, num_sanity_val_steps: 1, num_spk: 20, num_test_samples: 0, num_valid_plots: 10, 
optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98, out_wav_norm: False, pe_ckpt: checkpoints/m4singer_pe, pe_enable: True, 
pitch_ar: False, pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l1, pitch_norm: log, 
pitch_type: frame, pre_align_args: {'use_tone': False, 'forced_align': 'mfa', 'use_sox': True, 'txt_processor': 'zh_g2pM', 'allow_no_txt': False, 'denoise': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, predictor_dropout: 0.5, predictor_grad: 0.1, 
predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256, 
pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, raw_data_dir: data/raw/m4singer, ref_norm_layer: bn, 
rel_pos: True, reset_phone_dict: True, residual_channels: 256, residual_layers: 20, save_best: False, 
save_ckpt: True, save_codes: ['configs', 'modules', 'tasks', 'utils', 'usr'], save_f0: True, save_gt: True, schedule_type: linear, 
seed: 1234, sort_by_len: True, spec_max: [-0.3894500136375427, -0.3796464204788208, -0.2914905250072479, -0.15550297498703003, -0.08502643555402756, 0.10698417574167252, -0.0739326998591423, -0.0541548952460289, 0.15501998364925385, 0.06483431905508041, 0.03054228238761425, -0.013737732544541359, -0.004876468330621719, 0.04368264228105545, 0.13329921662807465, 0.16471388936042786, 0.04605761915445328, -0.05680707097053528, 0.0542571023106575, -0.0076539707370102406, -0.00953489076346159, -0.04434828832745552, 0.001293870504014194, -0.12238839268684387, 0.06418416649103165, 0.02843189612030983, 0.08505241572856903, 0.07062800228595734, 0.00120724702719599, -0.07675088942050934, 0.03785804659128189, 0.04890783503651619, -0.06888376921415329, -0.0839693546295166, -0.17545585334300995, -0.2911079525947571, -0.4238220453262329, -0.262084037065506, -0.3002263605594635, -0.3845032751560211, -0.3906497061252594, -0.6550108790397644, -0.7810799479484558, -0.7503029704093933, -0.7995198965072632, -0.8092347383499146, -0.6196113228797913, -0.6684317588806152, -0.7735874056816101, -0.8324533104896545, -0.9601566791534424, -0.955253541469574, -0.748817503452301, -0.9106167554855347, -0.9707801342010498, -1.053107500076294, -1.0448424816131592, -1.1082794666290283, -1.1296544075012207, -1.071642279624939, -1.1003081798553467, -1.166810154914856, -1.1408926248550415, -1.1330615282058716, -1.1167492866516113, -1.0716774463653564, -1.035891056060791, -1.0092483758926392, -0.9675999879837036, -0.938962996006012, -1.0120564699172974, -0.9777995347976685, -1.029313564300537, -0.9459163546562195, -0.8519706130027771, -0.7751091122627258, -0.7933766841888428, -0.9019735455513, -0.9983296990394592, -1.505873441696167], spec_min: [-6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, 
-6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0], spk_cond_steps: [], 
stop_token_weight: 5.0, task_cls: usr.diffsinger_task.DiffSingerTask, test_ids: [], test_input_dir: , test_num: 0, 
test_prefixes: ['Alto-2#岁月神偷', 'Alto-2#奇妙能力歌', 'Tenor-1#一千年以后', 'Tenor-1#童话', 'Tenor-2#消愁', 'Tenor-2#一荤一素', 'Soprano-1#念奴娇赤壁怀古', 'Soprano-1#问春'], test_set_name: test, timesteps: 100, train_set_name: train, use_denoise: False, 
use_energy_embed: False, use_gt_dur: False, use_gt_f0: False, use_midi: True, use_nsf: True, 
use_pitch_embed: True, use_pos_embed: True, use_spk_embed: False, use_spk_id: True, use_split_spk_id: False, 
use_uv: True, use_var_enc: False, val_check_interval: 2000, valid_num: 0, valid_set_name: valid, 
validate: False, vocoder: vocoders.hifigan.HifiGAN, vocoder_ckpt: checkpoints/m4singer_hifigan, warmup_updates: 2000, wav2spec_eps: 1e-6, 
weight_decay: 0, win_size: 512, work_dir: , 
| Binarizer:  
spkers:  {'Alto-7', 'Tenor-1', 'Bass-3', 'Tenor-5', 'Bass-2', 'Alto-5', 'Soprano-1', 'Alto-3', 'Alto-6', 'Tenor-3', 'Tenor-7', 'Tenor-4', 'Tenor-2', 'Soprano-3', 'Alto-1', 'Soprano-2', 'Alto-4', 'Bass-1', 'Tenor-6', 'Alto-2'}
| spk_map:  {'Alto-1': 0, 'Alto-2': 1, 'Alto-3': 2, 'Alto-4': 3, 'Alto-5': 4, 'Alto-6': 5, 'Alto-7': 6, 'Bass-1': 7, 'Bass-2': 8, 'Bass-3': 9, 'Soprano-1': 10, 'Soprano-2': 11, 'Soprano-3': 12, 'Tenor-1': 13, 'Tenor-2': 14, 'Tenor-3': 15, 'Tenor-4': 16, 'Tenor-5': 17, 'Tenor-6': 18, 'Tenor-7': 19}
| Build phone set:  ['', '', 'a', 'ai', 'an', 'ang', 'ao', 'b', 'c', 'ch', 'd', 'e', 'ei', 'en', 'eng', 'er', 'f', 'g', 'h', 'i', 'ia', 'ian', 'iang', 'iao', 'ie', 'in', 'ing', 'iong', 'iou', 'j', 'k', 'l', 'm', 'n', 'o', 'ong', 'ou', 'p', 'q', 'r', 's', 'sh', 't', 'u', 'ua', 'uai', 'uan', 'uang', 'uei', 'uen', 'uo', 'v', 'van', 've', 'vn', 'x', 'z', 'zh']
Loaded the voice encoder model on cuda in 10.29 seconds.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 217/217 [00:47<00:00,  4.57it/s]
| valid total duration: 1254.837s
Loaded the voice encoder model on cuda in 0.01 seconds.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 217/217 [00:39<00:00,  5.43it/s]
| test total duration: 1254.837s
Loaded the voice encoder model on cuda in 0.01 seconds.
 42%|█████████████████████████████████████████████████████▏                                                                         | 8670/20679 [19:57<21:41,  9.23it/s]| Skip item (Empty **gt** f0). item_name: Bass-1#父亲写的散文诗#0013, wav_fn: data/raw/m4singer/Bass-1#父亲写的散文诗/0013.wav
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20679/20679 [47:41<00:00,  7.23it/s]
| train total duration: 105705.472s
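
The totals in the log are easier to read in minutes and hours. The following is a small arithmetic sketch, not part of M4Singer: every number is copied from the output above, and the helper name seconds_to_hours is made up for illustration.

```python
def seconds_to_hours(seconds):
    """Convert a duration in seconds to hours."""
    return seconds / 3600.0


# Durations reported by the binarizer, in seconds.
durations = {"valid": 1254.837, "test": 1254.837, "train": 105705.472}
for split, secs in durations.items():
    print(f"{split}: {secs / 60:.1f} min = {seconds_to_hours(secs):.2f} h")

# Throughput check for the train split: 20679 items in 47 min 41 s.
elapsed_seconds = 47 * 60 + 41
items_per_second = 20679 / elapsed_seconds
print(f"train throughput: {items_per_second:.2f} it/s")
```

This works out to roughly 29.4 hours of training audio and about 21 minutes each for the valid and test splits. The computed throughput agrees with the 7.23it/s shown on the progress bar.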
 

GPU utilization stays low, around 3%, with a little over 1 GB of GPU memory in use.

This step is mostly CPU-bound.

Extract the pretrained models (not sure this wording is precise)

(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ mkdir checkpoints
(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ ll ../m4singer_*
-rw-rw-r-- 1 yeqiang yeqiang 361083505 2023-09-26 21:21:01 ../m4singer_diff_e2e.zip
-rw-rw-r-- 1 yeqiang yeqiang 265208925 2023-09-26 20:54:37 ../m4singer_fs2_e2e.zip
-rw-rw-r-- 1 yeqiang yeqiang 943383863 2023-09-26 22:41:27 ../m4singer_hifigan.zip
-rw-rw-r-- 1 yeqiang yeqiang  35405898 2023-09-26 19:20:30 ../m4singer_pe.zip
(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ cd checkpoints/  
(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code/checkpoints$ unzip ../../m4singer_pe.zip 
Archive:  ../../m4singer_pe.zip
  inflating: m4singer_pe/config.yaml  
  inflating: m4singer_pe/model_ckpt_steps_280000.ckpt  
(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code/checkpoints$ unzip ../../m4singer_diff_e2e.zip 
Archive:  ../../m4singer_diff_e2e.zip
  inflating: m4singer_diff_e2e/config.yaml  
  inflating: m4singer_diff_e2e/model_ckpt_steps_900000.ckpt  
(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code/checkpoints$ unzip ../../m4singer_fs2_e2e.zip 
Archive:  ../../m4singer_fs2_e2e.zip
  inflating: m4singer_fs2_e2e/config.yaml  
  inflating: m4singer_fs2_e2e/model_ckpt_steps_320000.ckpt  
(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code/checkpoints$ unzip ../../m4singer_hifigan.zip 
Archive:  ../../m4singer_hifigan.zip
  inflating: m4singer_hifigan/config.yaml  
  inflating: m4singer_hifigan/model_ckpt_steps_1970000.ckpt  
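
The four unzip commands above can also be scripted. This is a minimal sketch using Python's standard zipfile module. The function name extract_checkpoints and its parameters are made up for illustration. The archive names come from the directory listing above. It assumes each archive contains its own top-level folder, as the unzip output shows.

```python
import os
import zipfile

# Archive names as they appear in the listing above.
ARCHIVES = [
    "m4singer_pe.zip",
    "m4singer_diff_e2e.zip",
    "m4singer_fs2_e2e.zip",
    "m4singer_hifigan.zip",
]


def extract_checkpoints(archive_dir, dest_dir="checkpoints", archives=ARCHIVES):
    """Extract every archive found in archive_dir into dest_dir.

    Hypothetical helper, not part of M4Singer. Returns the names of the
    archives that were actually extracted; missing ones are skipped.
    """
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    for name in archives:
        path = os.path.join(archive_dir, name)
        if not os.path.isfile(path):
            print("skipping missing archive:", path)
            continue
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest_dir)
        extracted.append(name)
    return extracted
```

Called from the code directory as extract_checkpoints(".."), it should produce the same checkpoints/m4singer_* folders as the manual session.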
 

Train the model

(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config usr/configs/m4singer/fs2.yaml --exp_name m4singer_fs2_e2e --reset
| Hparams chains:  ['configs/config_base.yaml', 'configs/tts/base.yaml', 'configs/tts/fs2.yaml', 'configs/tts/base_zh.yaml', 'configs/singing/base.yaml', 'configs/singing/fs2.yaml', 'usr/configs/base.yaml', 'usr/configs/popcs_ds_beta6.yaml', 'usr/configs/m4singer/base.yaml', 'usr/configs/m4singer/fs2.yaml']
| Hparams: 
K_step: 51, accumulate_grad_batches: 1, audio_num_mel_bins: 80, audio_sample_rate: 24000, base_config: ['configs/singing/fs2.yaml', 'usr/configs/m4singer/base.yaml'], 
binarization_args: {'shuffle': False, 'with_txt': True, 'with_wav': False, 'with_align': True, 'with_spk_embed': True, 'with_f0': True, 'with_f0cwt': True}, binarizer_cls: data_gen.singing.binarize.M4SingerBinarizer, binary_data_dir: data/binary/m4singer, check_val_every_n_epoch: 10, clip_grad_norm: 1, 
content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2, cwt_loss: l1, 
cwt_std_scale: 0.8, datasets: ['m4singer'], debug: False, dec_ffn_kernel_size: 9, dec_layers: 4, 
decay_steps: 50000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet, diff_loss_type: l1, 
dilation_cycle_length: 1, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'], dur_loss: mse, 
dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4, encoder_K: 8, 
encoder_type: fft, endless_ds: True, ffn_act: gelu, ffn_padding: SAME, fft_size: 512, 
fmax: 12000, fmin: 30, fs2_ckpt: , gen_dir_name: , gen_tgt_spk_id: -1, 
hidden_size: 256, hop_size: 128, infer: False, keep_bins: 80, lambda_commit: 0.25, 
lambda_energy: 0.0, lambda_f0: 1.0, lambda_ph_dur: 1.0, lambda_sent_dur: 1.0, lambda_uv: 1.0, 
lambda_word_dur: 1.0, load_ckpt: , log_interval: 100, loud_norm: False, lr: 1, 
max_beta: 0.06, max_epochs: 1000, max_eval_sentences: 1, max_eval_tokens: 60000, max_frames: 5000, 
max_input_tokens: 1550, max_sentences: 12, max_tokens: 40000, max_updates: 320000, mel_loss: ssim:0.5|l1:0.5, 
mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120, norm_type: gn, num_ckpt_keep: 3, 
num_heads: 2, num_sanity_val_steps: 1, num_spk: 20, num_test_samples: 0, num_valid_plots: 10, 
optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98, out_wav_norm: False, pe_ckpt: checkpoints/m4singer_pe, pe_enable: True, 
pitch_ar: False, pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l1, pitch_norm: log, 
pitch_type: frame, pre_align_args: {'use_tone': False, 'forced_align': 'mfa', 'use_sox': True, 'txt_processor': 'zh_g2pM', 'allow_no_txt': False, 'denoise': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, predictor_dropout: 0.5, predictor_grad: 0.1, 
predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, prenet_dropout: 0.5, prenet_hidden_size: 256, 
pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, raw_data_dir: data/raw/m4singer, ref_norm_layer: bn, 
rel_pos: True, reset_phone_dict: True, residual_channels: 256, residual_layers: 20, save_best: False, 
save_ckpt: True, save_codes: ['configs', 'modules', 'tasks', 'utils', 'usr'], save_f0: True, save_gt: True, schedule_type: linear, 
seed: 1234, sort_by_len: True, spec_max: [-0.3894500136375427, -0.3796464204788208, -0.2914905250072479, -0.15550297498703003, -0.08502643555402756, 0.10698417574167252, -0.0739326998591423, -0.0541548952460289, 0.15501998364925385, 0.06483431905508041, 0.03054228238761425, -0.013737732544541359, -0.004876468330621719, 0.04368264228105545, 0.13329921662807465, 0.16471388936042786, 0.04605761915445328, -0.05680707097053528, 0.0542571023106575, -0.0076539707370102406, -0.00953489076346159, -0.04434828832745552, 0.001293870504014194, -0.12238839268684387, 0.06418416649103165, 0.02843189612030983, 0.08505241572856903, 0.07062800228595734, 0.00120724702719599, -0.07675088942050934, 0.03785804659128189, 0.04890783503651619, -0.06888376921415329, -0.0839693546295166, -0.17545585334300995, -0.2911079525947571, -0.4238220453262329, -0.262084037065506, -0.3002263605594635, -0.3845032751560211, -0.3906497061252594, -0.6550108790397644, -0.7810799479484558, -0.7503029704093933, -0.7995198965072632, -0.8092347383499146, -0.6196113228797913, -0.6684317588806152, -0.7735874056816101, -0.8324533104896545, -0.9601566791534424, -0.955253541469574, -0.748817503452301, -0.9106167554855347, -0.9707801342010498, -1.053107500076294, -1.0448424816131592, -1.1082794666290283, -1.1296544075012207, -1.071642279624939, -1.1003081798553467, -1.166810154914856, -1.1408926248550415, -1.1330615282058716, -1.1167492866516113, -1.0716774463653564, -1.035891056060791, -1.0092483758926392, -0.9675999879837036, -0.938962996006012, -1.0120564699172974, -0.9777995347976685, -1.029313564300537, -0.9459163546562195, -0.8519706130027771, -0.7751091122627258, -0.7933766841888428, -0.9019735455513, -0.9983296990394592, -1.505873441696167], spec_min: [-6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, 
-6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0], spk_cond_steps: [], 
stop_token_weight: 5.0, task_cls: usr.diffsinger_task.AuxDecoderMIDITask, test_ids: [], test_input_dir: , test_num: 0, 
test_prefixes: ['Alto-2#岁月神偷', 'Alto-2#奇妙能力歌', 'Tenor-1#一千年以后', 'Tenor-1#童话', 'Tenor-2#消愁', 'Tenor-2#一荤一素', 'Soprano-1#念奴娇赤壁怀古', 'Soprano-1#问春'], test_set_name: test, timesteps: 100, train_set_name: train, use_denoise: False, 
use_energy_embed: False, use_gt_dur: False, use_gt_f0: False, use_midi: True, use_nsf: True, 
use_pitch_embed: False, use_pos_embed: True, use_spk_embed: False, use_spk_id: True, use_split_spk_id: False, 
use_uv: True, use_var_enc: False, val_check_interval: 2000, valid_num: 0, valid_set_name: valid, 
validate: False, vocoder: vocoders.hifigan.HifiGAN, vocoder_ckpt: checkpoints/m4singer_hifigan, warmup_updates: 2000, wav2spec_eps: 1e-6, 
weight_decay: 0, win_size: 512, work_dir: checkpoints/m4singer_fs2_e2e, 
| Mel losses: {'ssim': 0.5, 'l1': 0.5}
09/27 07:06:49 PM gpu available: True, used: True
| Copied codes to checkpoints/m4singer_fs2_e2e/codes/20230927190649.
| model Arch:  FastSpeech2MIDI(
  (encoder_embed_tokens): Embedding(61, 256, padding_idx=0)
  (decoder): FastspeechDecoder(
    (embed_positions): SinusoidalPositionalEmbedding()
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (op): EncSALayer(
          (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=False)
          )
          (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (ffn): TransformerFFNLayer(
            (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
            (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
          )
        )
      )
      (1): TransformerEncoderLayer(
        (op): EncSALayer(
          (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=False)
          )
          (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (ffn): TransformerFFNLayer(
            (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
            (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
          )
        )
      )
      (2): TransformerEncoderLayer(
        (op): EncSALayer(
          (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=False)
          )
          (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (ffn): TransformerFFNLayer(
            (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
            (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
          )
        )
      )
      (3): TransformerEncoderLayer(
        (op): EncSALayer(
          (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=False)
          )
          (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (ffn): TransformerFFNLayer(
            (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
            (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
          )
        )
      )
    )
    (layer_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
  )
  (mel_out): Linear(in_features=256, out_features=80, bias=True)
  (spk_embed_proj): Embedding(21, 256)
  (dur_predictor): DurationPredictor(
    (conv): ModuleList(
      (0): Sequential(
        (0): ConstantPad1d(padding=(1, 1), value=0)
        (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,))
        (2): ReLU()
        (3): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (4): Dropout(p=0.5, inplace=False)
      )
      (1): Sequential(
        (0): ConstantPad1d(padding=(1, 1), value=0)
        (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,))
        (2): ReLU()
        (3): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (4): Dropout(p=0.5, inplace=False)
      )
      (2): Sequential(
        (0): ConstantPad1d(padding=(1, 1), value=0)
        (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,))
        (2): ReLU()
        (3): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (4): Dropout(p=0.5, inplace=False)
      )
      (3): Sequential(
        (0): ConstantPad1d(padding=(1, 1), value=0)
        (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,))
        (2): ReLU()
        (3): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (4): Dropout(p=0.5, inplace=False)
      )
      (4): Sequential(
        (0): ConstantPad1d(padding=(1, 1), value=0)
        (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,))
        (2): ReLU()
        (3): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
        (4): Dropout(p=0.5, inplace=False)
      )
    )
    (linear): Linear(in_features=256, out_features=1, bias=True)
  )
  (length_regulator): LengthRegulator()
  (encoder): FastspeechMIDIEncoder(
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (op): EncSALayer(
          (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=False)
          )
          (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (ffn): TransformerFFNLayer(
            (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
            (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
          )
        )
      )
      (1): TransformerEncoderLayer(
        (op): EncSALayer(
          (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=False)
          )
          (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (ffn): TransformerFFNLayer(
            (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
            (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
          )
        )
      )
      (2): TransformerEncoderLayer(
        (op): EncSALayer(
          (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=False)
          )
          (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (ffn): TransformerFFNLayer(
            (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
            (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
          )
        )
      )
      (3): TransformerEncoderLayer(
        (op): EncSALayer(
          (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (self_attn): MultiheadAttention(
            (out_proj): Linear(in_features=256, out_features=256, bias=False)
          )
          (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
          (ffn): TransformerFFNLayer(
            (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
            (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
          )
        )
      )
    )
    (layer_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
    (embed_tokens): Embedding(61, 256, padding_idx=0)
    (embed_positions): RelPositionalEncoding(
      (dropout): Dropout(p=0.0, inplace=False)
    )
  )
  (midi_embed): Embedding(300, 256, padding_idx=0)
  (midi_dur_layer): Linear(in_features=1, out_features=256, bias=True)
  (is_slur_embed): Embedding(2, 256)
)
| model Trainable Parameters: 24.195M
09/27 07:06:52 PM model and trainer restored from checkpoint: checkpoints/m4singer_fs2_e2e/model_ckpt_steps_320000.ckpt
Validation sanity check:   0%|          | 0/1 [00:00
==============
 valid results: {'total_loss': 0.5226, 'ssim': 0.2665, 'l1': 0.2351, 'pdur': 0.0188, 'wdur': 0.002, 'sdur': 0.0002}
==============

Epoch 1: : 1batch [00:01,  1.06s/batch, batch_size=12, l1=0.105, lr=0.00011, pdur=0.0135, sdur=0.00344, ssim=0.174, step=320000, wdur=0.00704]| Training end..                            
Epoch 1: : 1batch [00:01,  1.15s/batch, batch_size=12, l1=0.105, lr=0.00011, pdur=0.0135, sdur=0.00344, ssim=0.174, step=320000, wdur=0.00704]
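
The run above stops after a single batch at step=320000. A likely explanation, stated here as an assumption rather than something verified in the code, is that the restored checkpoint has already reached the configured max_updates. The DiffSinger hparams printed further down do show max_updates: 900000 next to a checkpoint at step 900000, which fits the same pattern. The helper names below are hypothetical and only illustrate the comparison; the value 320000 for max_updates is assumed for the FastSpeech2 config.

```python
import re


def checkpoint_step(path):
    """Extract the step count from a name like model_ckpt_steps_320000.ckpt."""
    match = re.search(r"model_ckpt_steps_(\d+)\.ckpt$", path)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {path}")
    return int(match.group(1))


def remaining_updates(ckpt_path, max_updates):
    """Return how many updates are left before max_updates is reached."""
    return max(0, max_updates - checkpoint_step(ckpt_path))


# Checkpoint path taken from the log above; max_updates=320000 is an assumption.
fs2_ckpt = "checkpoints/m4singer_fs2_e2e/model_ckpt_steps_320000.ckpt"
print(remaining_updates(fs2_ckpt, 320000))
```

If the result is 0, there is nothing left to train, which would be consistent with the "Training end.." message appearing almost immediately.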

Training finished within a few seconds. Next, train DiffSinger.

(venv3712) yeqiang@yeqiang-MS-7B23:~/Downloads/ai/M4Singer/code$ CUDA_VISIBLE_DEVICES=0 python tasks/run.py --config usr/configs/m4singer/diff.yaml --exp_name m4singer_diff_e2e --reset  
| Hparams chains:  ['configs/config_base.yaml', 'configs/tts/base.yaml', 'configs/tts/fs2.yaml', 'configs/tts/base_zh.yaml', 'configs/singing/base.yaml', 'usr/configs/base.yaml', 'usr/configs/popcs_ds_beta6.yaml', 'usr/configs/m4singer/base.yaml', 'usr/configs/m4singer/diff.yaml']
| Hparams: 
K_step: 1000, accumulate_grad_batches: 1, audio_num_mel_bins: 80, audio_sample_rate: 24000, base_config: ['usr/configs/m4singer/base.yaml'], 
binarization_args: {'shuffle': False, 'with_txt': True, 'with_wav': False, 'with_align': True, 'with_spk_embed': True, 'with_f0': True, 'with_f0cwt': True}, binarizer_cls: data_gen.singing.binarize.M4SingerBinarizer, binary_data_dir: data/binary/m4singer, check_val_every_n_epoch: 10, clip_grad_norm: 1, 
content_cond_steps: [], cwt_add_f0_loss: False, cwt_hidden_size: 128, cwt_layers: 2, cwt_loss: l1, 
cwt_std_scale: 0.8, datasets: ['m4singer'], debug: False, dec_ffn_kernel_size: 9, dec_layers: 4, 
decay_steps: 100000, decoder_type: fft, dict_dir: , diff_decoder_type: wavenet, diff_loss_type: l1, 
dilation_cycle_length: 4, dropout: 0.1, ds_workers: 4, dur_enc_hidden_stride_kernel: ['0,2,3', '0,2,3', '0,1,3'], dur_loss: mse, 
dur_predictor_kernel: 3, dur_predictor_layers: 5, enc_ffn_kernel_size: 9, enc_layers: 4, encoder_K: 8, 
encoder_type: fft, endless_ds: True, ffn_act: gelu, ffn_padding: SAME, fft_size: 512, 
fmax: 12000, fmin: 30, fs2_ckpt: checkpoints/m4singer_fs2_e2e, gaussian_start: True, gen_dir_name: , 
gen_tgt_spk_id: -1, hidden_size: 256, hop_size: 128, infer: False, keep_bins: 80, 
lambda_commit: 0.25, lambda_energy: 0.0, lambda_f0: 0.0, lambda_ph_dur: 1.0, lambda_sent_dur: 1.0, 
lambda_uv: 0.0, lambda_word_dur: 1.0, load_ckpt: , log_interval: 100, loud_norm: False, 
lr: 0.001, max_beta: 0.02, max_epochs: 1000, max_eval_sentences: 1, max_eval_tokens: 60000, 
max_frames: 5000, max_input_tokens: 1550, max_sentences: 28, max_tokens: 36000, max_updates: 900000, 
mel_loss: ssim:0.5|l1:0.5, mel_vmax: 1.5, mel_vmin: -6.0, min_level_db: -120, norm_type: gn, 
num_ckpt_keep: 3, num_heads: 2, num_sanity_val_steps: 1, num_spk: 20, num_test_samples: 0, 
num_valid_plots: 10, optimizer_adam_beta1: 0.9, optimizer_adam_beta2: 0.98, out_wav_norm: False, pe_ckpt: checkpoints/m4singer_pe, 
pe_enable: True, pitch_ar: False, pitch_enc_hidden_stride_kernel: ['0,2,5', '0,2,5', '0,2,5'], pitch_extractor: parselmouth, pitch_loss: l1, 
pitch_norm: log, pitch_type: frame, pndm_speedup: 5, pre_align_args: {'use_tone': False, 'forced_align': 'mfa', 'use_sox': True, 'txt_processor': 'zh_g2pM', 'allow_no_txt': False, 'denoise': False}, pre_align_cls: data_gen.singing.pre_align.SingingPreAlign, 
predictor_dropout: 0.5, predictor_grad: 0.1, predictor_hidden: -1, predictor_kernel: 5, predictor_layers: 5, 
prenet_dropout: 0.5, prenet_hidden_size: 256, pretrain_fs_ckpt: , processed_data_dir: xxx, profile_infer: False, 
raw_data_dir: data/raw/m4singer, ref_norm_layer: bn, rel_pos: True, reset_phone_dict: True, residual_channels: 256, 
residual_layers: 20, save_best: False, save_ckpt: True, save_codes: ['configs', 'modules', 'tasks', 'utils', 'usr'], save_f0: True, 
save_gt: True, schedule_type: linear, seed: 1234, sort_by_len: True, spec_max: [-0.3894500136375427, -0.3796464204788208, -0.2914905250072479, -0.15550297498703003, -0.08502643555402756, 0.10698417574167252, -0.0739326998591423, -0.0541548952460289, 0.15501998364925385, 0.06483431905508041, 0.03054228238761425, -0.013737732544541359, -0.004876468330621719, 0.04368264228105545, 0.13329921662807465, 0.16471388936042786, 0.04605761915445328, -0.05680707097053528, 0.0542571023106575, -0.0076539707370102406, -0.00953489076346159, -0.04434828832745552, 0.001293870504014194, -0.12238839268684387, 0.06418416649103165, 0.02843189612030983, 0.08505241572856903, 0.07062800228595734, 0.00120724702719599, -0.07675088942050934, 0.03785804659128189, 0.04890783503651619, -0.06888376921415329, -0.0839693546295166, -0.17545585334300995, -0.2911079525947571, -0.4238220453262329, -0.262084037065506, -0.3002263605594635, -0.3845032751560211, -0.3906497061252594, -0.6550108790397644, -0.7810799479484558, -0.7503029704093933, -0.7995198965072632, -0.8092347383499146, -0.6196113228797913, -0.6684317588806152, -0.7735874056816101, -0.8324533104896545, -0.9601566791534424, -0.955253541469574, -0.748817503452301, -0.9106167554855347, -0.9707801342010498, -1.053107500076294, -1.0448424816131592, -1.1082794666290283, -1.1296544075012207, -1.071642279624939, -1.1003081798553467, -1.166810154914856, -1.1408926248550415, -1.1330615282058716, -1.1167492866516113, -1.0716774463653564, -1.035891056060791, -1.0092483758926392, -0.9675999879837036, -0.938962996006012, -1.0120564699172974, -0.9777995347976685, -1.029313564300537, -0.9459163546562195, -0.8519706130027771, -0.7751091122627258, -0.7933766841888428, -0.9019735455513, -0.9983296990394592, -1.505873441696167], 
spec_min: [-6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0, -6.0], spk_cond_steps: [], stop_token_weight: 5.0, task_cls: usr.diffsinger_task.DiffSingerMIDITask, test_ids: [], 
test_input_dir: , test_num: 0, test_prefixes: ['Alto-2#岁月神偷', 'Alto-2#奇妙能力歌', 'Tenor-1#一千年以后', 'Tenor-1#童话', 'Tenor-2#消愁', 'Tenor-2#一荤一素', 'Soprano-1#念奴娇赤壁怀古', 'Soprano-1#问春'], test_set_name: test, timesteps: 1000, 
train_set_name: train, use_denoise: False, use_energy_embed: False, use_gt_dur: False, use_gt_f0: False, 
use_midi: True, use_nsf: True, use_pitch_embed: False, use_pos_embed: True, use_spk_embed: False, 
use_spk_id: True, use_split_spk_id: False, use_uv: True, use_var_enc: False, val_check_interval: 2000, 
valid_num: 0, valid_set_name: valid, validate: False, vocoder: vocoders.hifigan.HifiGAN, vocoder_ckpt: checkpoints/m4singer_hifigan, 
warmup_updates: 2000, wav2spec_eps: 1e-6, weight_decay: 0, win_size: 512, work_dir: checkpoints/m4singer_diff_e2e, 

| Mel losses: {'ssim': 0.5, 'l1': 0.5}
| load HifiGAN:  checkpoints/m4singer_hifigan/model_ckpt_steps_1970000.ckpt
Removing weight norm...
| Loaded model parameters from checkpoints/m4singer_hifigan/model_ckpt_steps_1970000.ckpt.
| HifiGAN device: cuda.
| load HifiGAN:  checkpoints/m4singer_hifigan/model_ckpt_steps_1970000.ckpt
Removing weight norm...
| Loaded model parameters from checkpoints/m4singer_hifigan/model_ckpt_steps_1970000.ckpt.
| HifiGAN device: cuda.
| load 'model' from 'checkpoints/m4singer_pe/model_ckpt_steps_280000.ckpt'.
09/27 07:09:00 PM gpu available: True, used: True
| Copied codes to checkpoints/m4singer_diff_e2e/codes/20230927190900.
| load 'model' from 'checkpoints/m4singer_fs2_e2e/model_ckpt_steps_320000.ckpt'.
| model Arch:  GaussianDiffusion(
  (denoise_fn): DiffNet(
    (input_projection): Conv1d(80, 256, kernel_size=(1,), stride=(1,))
    (diffusion_embedding): SinusoidalPosEmb()
    (mlp): Sequential(
      (0): Linear(in_features=256, out_features=1024, bias=True)
      (1): Mish()
      (2): Linear(in_features=1024, out_features=256, bias=True)
    )
    (residual_layers): ModuleList(
      (0): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (1): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(2,), dilation=(2,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (2): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(4,), dilation=(4,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (3): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(8,), dilation=(8,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (4): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (5): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(2,), dilation=(2,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (6): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(4,), dilation=(4,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (7): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(8,), dilation=(8,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (8): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (9): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(2,), dilation=(2,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (10): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(4,), dilation=(4,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (11): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(8,), dilation=(8,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (12): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (13): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(2,), dilation=(2,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (14): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(4,), dilation=(4,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (15): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(8,), dilation=(8,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (16): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(1,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (17): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(2,), dilation=(2,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (18): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(4,), dilation=(4,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
      (19): ResidualBlock(
        (dilated_conv): Conv1d(256, 512, kernel_size=(3,), stride=(1,), padding=(8,), dilation=(8,))
        (diffusion_projection): Linear(in_features=256, out_features=256, bias=True)
        (conditioner_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
        (output_projection): Conv1d(256, 512, kernel_size=(1,), stride=(1,))
      )
    )
    (skip_projection): Conv1d(256, 256, kernel_size=(1,), stride=(1,))
    (output_projection): Conv1d(256, 80, kernel_size=(1,), stride=(1,))
  )
  (fs2): FastSpeech2MIDI(
    (encoder_embed_tokens): Embedding(61, 256, padding_idx=0)
    (decoder): FastspeechDecoder(
      (embed_positions): SinusoidalPositionalEmbedding()
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (op): EncSALayer(
            (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (self_attn): MultiheadAttention(
              (out_proj): Linear(in_features=256, out_features=256, bias=False)
            )
            (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (ffn): TransformerFFNLayer(
              (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
              (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
            )
          )
        )
        (1): TransformerEncoderLayer(
          (op): EncSALayer(
            (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (self_attn): MultiheadAttention(
              (out_proj): Linear(in_features=256, out_features=256, bias=False)
            )
            (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (ffn): TransformerFFNLayer(
              (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
              (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
            )
          )
        )
        (2): TransformerEncoderLayer(
          (op): EncSALayer(
            (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (self_attn): MultiheadAttention(
              (out_proj): Linear(in_features=256, out_features=256, bias=False)
            )
            (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (ffn): TransformerFFNLayer(
              (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
              (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
            )
          )
        )
        (3): TransformerEncoderLayer(
          (op): EncSALayer(
            (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (self_attn): MultiheadAttention(
              (out_proj): Linear(in_features=256, out_features=256, bias=False)
            )
            (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (ffn): TransformerFFNLayer(
              (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
              (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
            )
          )
        )
      )
      (layer_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
    )
    (mel_out): Linear(in_features=256, out_features=80, bias=True)
    (spk_embed_proj): Embedding(21, 256)
    (dur_predictor): DurationPredictor(
      (conv): ModuleList(
        (0): Sequential(
          (0): ConstantPad1d(padding=(1, 1), value=0)
          (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,))
          (2): ReLU()
          (3): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
          (4): Dropout(p=0.5, inplace=False)
        )
        (1): Sequential(
          (0): ConstantPad1d(padding=(1, 1), value=0)
          (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,))
          (2): ReLU()
          (3): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
          (4): Dropout(p=0.5, inplace=False)
        )
        (2): Sequential(
          (0): ConstantPad1d(padding=(1, 1), value=0)
          (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,))
          (2): ReLU()
          (3): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
          (4): Dropout(p=0.5, inplace=False)
        )
        (3): Sequential(
          (0): ConstantPad1d(padding=(1, 1), value=0)
          (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,))
          (2): ReLU()
          (3): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
          (4): Dropout(p=0.5, inplace=False)
        )
        (4): Sequential(
          (0): ConstantPad1d(padding=(1, 1), value=0)
          (1): Conv1d(256, 256, kernel_size=(3,), stride=(1,))
          (2): ReLU()
          (3): LayerNorm((256,), eps=1e-12, elementwise_affine=True)
          (4): Dropout(p=0.5, inplace=False)
        )
      )
      (linear): Linear(in_features=256, out_features=1, bias=True)
    )
    (length_regulator): LengthRegulator()
    (encoder): FastspeechMIDIEncoder(
      (layers): ModuleList(
        (0): TransformerEncoderLayer(
          (op): EncSALayer(
            (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (self_attn): MultiheadAttention(
              (out_proj): Linear(in_features=256, out_features=256, bias=False)
            )
            (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (ffn): TransformerFFNLayer(
              (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
              (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
            )
          )
        )
        (1): TransformerEncoderLayer(
          (op): EncSALayer(
            (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (self_attn): MultiheadAttention(
              (out_proj): Linear(in_features=256, out_features=256, bias=False)
            )
            (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (ffn): TransformerFFNLayer(
              (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
              (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
            )
          )
        )
        (2): TransformerEncoderLayer(
          (op): EncSALayer(
            (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (self_attn): MultiheadAttention(
              (out_proj): Linear(in_features=256, out_features=256, bias=False)
            )
            (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (ffn): TransformerFFNLayer(
              (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
              (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
            )
          )
        )
        (3): TransformerEncoderLayer(
          (op): EncSALayer(
            (layer_norm1): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (self_attn): MultiheadAttention(
              (out_proj): Linear(in_features=256, out_features=256, bias=False)
            )
            (layer_norm2): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
            (ffn): TransformerFFNLayer(
              (ffn_1): Conv1d(256, 1024, kernel_size=(9,), stride=(1,), padding=(4,))
              (ffn_2): Linear(in_features=1024, out_features=256, bias=True)
            )
          )
        )
      )
      (layer_norm): LayerNorm((256,), eps=1e-05, elementwise_affine=True)
      (embed_tokens): Embedding(61, 256, padding_idx=0)
      (embed_positions): RelPositionalEncoding(
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (midi_embed): Embedding(300, 256, padding_idx=0)
    (midi_dur_layer): Linear(in_features=1, out_features=256, bias=True)
    (is_slur_embed): Embedding(2, 256)
  )
)
| model Trainable Parameters: 39.281M
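
The 20 ResidualBlock entries printed above cycle through dilation 1, 2, 4, 8, which lines up with residual_layers: 20 and dilation_cycle_length: 4 in the hparams. A minimal sketch of that pattern, assuming the dilation of layer i is 2 ** (i % dilation_cycle_length) and the padding is chosen to keep the sequence length unchanged for kernel_size 3 (the function names are hypothetical, not taken from the repository):

```python
def dilation_schedule(residual_layers, dilation_cycle_length):
    """Dilation per layer, repeating powers of two within each cycle."""
    return [2 ** (i % dilation_cycle_length) for i in range(residual_layers)]


def same_padding(kernel_size, dilation):
    """Padding that preserves length for an odd kernel with the given dilation."""
    return dilation * (kernel_size - 1) // 2


# Values taken from the printed hparams: residual_layers=20, dilation_cycle_length=4.
schedule = dilation_schedule(20, 4)
for layer_index, dilation in enumerate(schedule[:4]):
    print(layer_index, dilation, same_padding(3, dilation))
```

With kernel_size 3 the padding equals the dilation, which agrees with the padding=(1,), (2,), (4,), (8,) values in the printed architecture.
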
09/27 07:09:01 PM model and trainer restored from checkpoint: checkpoints/m4singer_diff_e2e/model_ckpt_steps_900000.ckpt
Validation sanity check:   0%|          | 0/1 [00:00
gaussion start.
sample time step: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:01<00:00, 116.90it/s]
sample time step:  96%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏     | 191/200 [00:01<00:00, 119.51it/s]
==============
 valid results: {'total_loss': 0.065, 'mel': 0.0536, 'pdur': 0.0098, 'wdur': 0.0014, 'sdur': 0.0002}
==============
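
The "sample time step" progress bars above count to 200, while the hparams list K_step: 1000 and pndm_speedup: 5. A small sketch of how those numbers relate, assuming the sampler walks the K_step diffusion steps in strides of pndm_speedup (the function name is hypothetical):

```python
def sampling_steps(k_step, pndm_speedup):
    """Number of reverse-diffusion iterations when stepping by pndm_speedup."""
    return len(range(0, k_step, pndm_speedup))


# K_step=1000 and pndm_speedup=5 are taken from the printed hparams.
print(sampling_steps(1000, 5))  # 200
```

Under this assumption, lowering pndm_speedup would increase the number of sampling iterations per validation sample proportionally.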

Epoch 1: : 0batch [00:00, ?batch/s]
Traceback (most recent call last):
  File "tasks/run.py", line 15, in <module>
    run_task()
  File "tasks/run.py", line 10, in run_task
    task_cls.start()
  File "/home/yeqiang/Downloads/ai/M4Singer/code/tasks/base_task.py", line 257, in start
    trainer.fit(task)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/utils/pl_utils.py", line 489, in fit
    self.run_pretrain_routine(model)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/utils/pl_utils.py", line 582, in run_pretrain_routine
    self.train()
  File "/home/yeqiang/Downloads/ai/M4Singer/code/utils/pl_utils.py", line 1358, in train
    self.run_training_epoch()
  File "/home/yeqiang/Downloads/ai/M4Singer/code/utils/pl_utils.py", line 1392, in run_training_epoch
    output = self.run_training_batch(batch, batch_idx)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/utils/pl_utils.py", line 1514, in run_training_batch
    loss = optimizer_closure()
  File "/home/yeqiang/Downloads/ai/M4Singer/code/utils/pl_utils.py", line 1480, in optimizer_closure
    split_batch, batch_idx, opt_idx, self.hiddens)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/utils/pl_utils.py", line 1588, in training_forward
    output = self.model.training_step(*args)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/tasks/base_task.py", line 128, in training_step
    loss_ret = self._training_step(sample, batch_idx, optimizer_idx)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/usr/task.py", line 57, in _training_step
    log_outputs = self.run_model(self.model, sample)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/usr/diffsinger_task.py", line 301, in run_model
    midi_dur=sample.get('midi_dur'), is_slur=sample.get('is_slur'))
  File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/usr/diff/shallow_diffusion_tts.py", line 242, in forward
    ret['diff_loss'] = self.p_losses(x, t, cond)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/usr/diff/shallow_diffusion_tts.py", line 214, in p_losses
    x_recon = self.denoise_fn(x_noisy, t, cond)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/usr/diff/net.py", line 123, in forward
    x, skip_connection = layer(x, cond, diffusion_step)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/usr/diff/net.py", line 71, in forward
    y = self.dilated_conv(y) + conditioner
  File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 257, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 70.00 MiB (GPU 0; 5.78 GiB total capacity; 3.06 GiB already allocated; 27.62 MiB free; 3.14 GiB reserved in total by PyTorch)
Exception ignored in:
Traceback (most recent call last):
  File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/lib/python3.7/site-packages/tqdm/std.py", line 1124, in __del__
  File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/lib/python3.7/site-packages/tqdm/std.py", line 1337, in close
  File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/lib/python3.7/site-packages/tqdm/std.py", line 1516, in display
  File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/lib/python3.7/site-packages/tqdm/std.py", line 1127, in __repr__
  File "/home/yeqiang/Downloads/ai/M4Singer/code/venv3712/lib/python3.7/site-packages/tqdm/std.py", line 1477, in format_dict
TypeError: cannot unpack non-iterable NoneType object

Oops: the 2060 does not have enough VRAM for training (5.78 GiB total capacity, per the error above)!
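The numbers in the out-of-memory message can be broken down to see where the memory went. Below is a minimal sketch that parses the error text copied from the traceback above and computes how much of the card is neither reserved by PyTorch nor free. The helper name `parse_oom` and the regular expressions are my own illustration, not part of M4Singer or PyTorch. Reading the remainder as memory held outside this process's PyTorch allocator (the CUDA context, or other programs such as the desktop session) is an assumption, not something the log confirms.

```python
import re

# Error text copied verbatim from the traceback above.
OOM_MESSAGE = (
    "RuntimeError: CUDA out of memory. Tried to allocate 70.00 MiB "
    "(GPU 0; 5.78 GiB total capacity; 3.06 GiB already allocated; "
    "27.62 MiB free; 3.14 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Hypothetical helper: return the sizes named in the message, in MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError("field not found in message: " + name)
        value, unit = match.groups()
        sizes[name] = float(value) * UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom(OOM_MESSAGE)

# Memory that is neither reserved by this PyTorch process nor free.
# Assumption: this is used by the CUDA context or by other programs.
outside = sizes["total"] - sizes["reserved"] - sizes["free"]

print("requested: %.2f MiB, free: %.2f MiB" % (sizes["requested"], sizes["free"]))
print("not reserved by PyTorch and not free: %.2f GiB" % (outside / 1024.0))
```

If that reading is right, roughly 2.6 GiB of the 5.78 GiB was unavailable to training before the 70 MiB request failed. It may therefore be worth checking what else is using the GPU before concluding that the card is simply too small.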

Also, the earlier dataset-packing step most likely does not use the GPU at all, so that was not a torch version problem.
