1. Download the dataset on the server: create the folders
(base) [LiMiao@gpu08 /]$ cd data
(base) [LiMiao@gpu08 data]$ ls
heshulin liu_wang_data lost+found yangyang
(base) [LiMiao@gpu08 data]$ mkdir limiao
(base) [LiMiao@gpu08 data]$ ls
heshulin limiao liu_wang_data lost+found yangyang
(base) [LiMiao@gpu08 data]$ cd limiao
(base) [LiMiao@gpu08 limiao]$ mkdir develop_data
(base) [LiMiao@gpu08 limiao]$ ls
develop_data
(base) [LiMiao@gpu08 limiao]$ cd develop_data
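The folder steps above can be collapsed into a single command. A minimal sketch, assuming the same layout (limiao/develop_data under a base directory); the helper name make_dataset_dir and the scratch directory are illustrative and not part of the original session:

```shell
# Create the nested dataset directory in one step.
# -p creates missing parent directories and does not fail if they already exist.
make_dataset_dir() {
    base_dir="$1"      # e.g. /data on the server
    user_dir="$2"      # e.g. limiao
    mkdir -p "$base_dir/$user_dir/develop_data"
    printf '%s\n' "$base_dir/$user_dir/develop_data"
}

# Demonstration in a scratch directory; on the server the base would be /data.
scratch=$(mktemp -d)
target=$(make_dataset_dir "$scratch" limiao)
echo "$target"
```

On the server, the equivalent call would be make_dataset_dir /data limiao followed by cd into the printed path.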
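Note: the first download attempt fails because the special characters in the URL (?, =, &) are not escaped, so the shell treats each & as a background operator and wget receives only part of the URL: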
(base) [LiMiao@gpu08 develop_data]$ wget http://aidownload.futurelab.tv/2020af-sr-aishell2.zip?e=1589078612&token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=&sign=e44d82eb806a436351e847fadd23b085&t=5eb66d34
[1] 26822
[2] 26823
[3] 26824
(base) [LiMiao@gpu08 develop_data]$ --2020-05-09 10:46:36-- http://aidownload.futurelab.tv/2020af-sr-aishell2.zip?e=1589078612
Resolving aidownload.futurelab.tv (aidownload.futurelab.tv)... 202.97.231.18, 116.117.158.58, 113.229.254.8, ...
Connecting to aidownload.futurelab.tv (aidownload.futurelab.tv)|202.97.231.18|:80... connected.
HTTP request sent, awaiting response... 401 Unauthorized
Authorization failed.
[1] Exit 6 wget http://aidownload.futurelab.tv/2020af-sr-aishell2.zip?e=1589078612
[2]- Done token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=
[3]+ Done sign=e44d82eb806a436351e847fadd23b085
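The job numbers [1], [2] and [3] in the transcript show what went wrong: an unquoted & ends a command and sends it to the background, so wget only received the URL up to e=1589078612. A small sketch with a made-up URL (host, abc and the parameter values are placeholders) reproduces the effect without touching the network:

```shell
unset token t

# Unquoted: the shell splits this line at each & into three commands:
#   printf ... http://host/file.zip?e=1   (background)
#   token=abc                             (background, in a subshell)
#   t=2                                   (foreground assignment)
printf '%s\n' http://host/file.zip?e=1&token=abc&t=2
wait    # let the background jobs finish

# token was assigned in a background subshell, so it is lost; t was set here.
echo "token=${token:-<unset>} t=$t"
```

Only the part before the first & ever reaches the command, which is why the server answered 401 Unauthorized: the token and sign parameters were never sent.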
2. Escape each special character by putting a backslash (\) in front of it; the download then succeeds:
(base) [LiMiao@gpu08 develop_data]$ wget http://aidownload.futurelab.tv/2020af-sr-aishell2.zip\?e\=1589078612\&token\=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU\=\&sign\=e44d82eb806a436351e847fadd23b085\&t\=5eb66d34
--2020-05-09 10:57:21-- http://aidownload.futurelab.tv/2020af-sr-aishell2.zip?e=1589078612&token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=&sign=e44d82eb806a436351e847fadd23b085&t=5eb66d34
Resolving aidownload.futurelab.tv (aidownload.futurelab.tv)... 123.129.244.194, 101.72.205.202, 61.240.154.98, ...
Connecting to aidownload.futurelab.tv (aidownload.futurelab.tv)|123.129.244.194|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3911445147 (3.6G) [application/zip]
Saving to: ‘2020af-sr-aishell2.zip?e=1589078612&token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=&sign=e44d82eb806a436351e847fadd23b085&t=5eb66d34’
100%[==========================================>] 3,911,445,147 9.58MB/s in 6m 30s
2020-05-09 11:03:52 (9.56 MB/s) - ‘2020af-sr-aishell2.zip?e=1589078612&token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=&sign=e44d82eb806a436351e847fadd23b085&t=5eb66d34’ saved [3911445147/3911445147]
(base) [LiMiao@gpu08 develop_data]$ ls
19
2020af-sr-aishell2.zip?e=1589078612&token=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk:b2CLU2YtJFdsHSVKdC0XmmMe3tU=&sign=e44d82eb806a436351e847fadd23b085&t=5eb66d34
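Quoting the whole URL is an alternative to escaping each character, and wget's -O option sets the output file name directly. The sketch below uses a shortened placeholder URL (EXPIRY, TOKEN, SIGN and T stand in for the real signed values) and a hypothetical download helper that is defined but not called; only the file name derivation runs as written:

```shell
# Single quotes keep ?, = and & literal, so no backslashes are needed.
# The query values below are placeholders, not the real signed link.
url='http://aidownload.futurelab.tv/2020af-sr-aishell2.zip?e=EXPIRY&token=TOKEN&sign=SIGN&t=T'

# Strip the query string, then the leading path, to get a clean file name.
no_query=${url%%\?*}
out_name=${no_query##*/}
echo "$out_name"

# Download under the clean name (needs network access and a valid link).
download() {
    wget -O "$out_name" "$url"
}
```

Calling download would save the file as 2020af-sr-aishell2.zip, which would make the rename in the next step unnecessary.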
3. Rename the file:
(base) [LiMiao@gpu08 develop_data]$ mv 2020af-sr-aishell2.zip\?e\=1589078612\&token\=kO26gOFamzTLWaBGhBdGdua2WO4ejK-xeVnrFJMk\:b2CLU2YtJFdsHSVKdC0XmmMe3tU\=\&sign\=e44d82eb806a436351e847fadd23b085\&t\=5eb66d34 2020af-sr-aishell2.zip
(base) [LiMiao@gpu08 develop_data]$ ls
19 2020af-sr-aishell2.zip
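If several downloads end up with query strings in their names, the rename can be scripted. A minimal sketch, assuming the unwanted part always starts at the first ? and that the directory path itself contains no ?; the function name strip_query_names and the demo file are hypothetical:

```shell
# Rename every file in a directory whose name contains '?', keeping only
# the part before the first '?'. An existing file with the clean name
# would be overwritten by mv.
strip_query_names() {
    dir="$1"
    for f in "$dir"/*\?*; do
        [ -e "$f" ] || continue          # pattern matched nothing
        clean=${f%%\?*}                  # drop everything from the first '?'
        mv -- "$f" "$clean"
    done
}

# Demonstration in a scratch directory with a made-up file name.
rename_demo=$(mktemp -d)
touch "$rename_demo/sample.zip?e=1&token=abc=&t=2"
strip_query_names "$rename_demo"
ls "$rename_demo"
```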
4. Unzip the file:
(base) [LiMiao@gpu08 develop_data]$ unzip 2020af-sr-aishell2.zip
Archive: 2020af-sr-aishell2.zip
creating: AISHELL-2/
[2020af-sr-aishell2.zip] AISHELL-2/README.md password:
password incorrect--reenter:
inflating: AISHELL-2/README.md
creating: AISHELL-2/iOS/
inflating: AISHELL-2/iOS/AISHELL2-Data-Specification[ZH].docx
creating: AISHELL-2/iOS/data/
inflating: AISHELL-2/iOS/data/spk_info.txt
creating: AISHELL-2/iOS/data/wav/
inflating: AISHELL-2/iOS/data/wav/D1215.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1225.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1236.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1164.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1217.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1187.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1056.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1226.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1183.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1192.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1179.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1171.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1051.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1162.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1165.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1220.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1055.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1210.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1197.tar.gz
inflating: AISHELL-2/iOS/data/wav/D2164.tar.gz
inflating: AISHELL-2/iOS/data/wav/D2165.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1206.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1211.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1219.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1174.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1190.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1057.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1189.tar.gz
inflating: AISHELL-2/iOS/data/wav/D2166.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1059.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1196.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1194.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1199.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1049.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1198.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1227.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1168.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1163.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1205.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1053.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1180.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1218.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1238.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1200.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1240.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1188.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1237.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1181.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1061.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1048.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1167.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1172.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1186.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1052.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1222.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1223.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1224.tar.gz
inflating: AISHELL-2/iOS/data/wav/D2162.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1054.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1221.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1214.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1235.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1228.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1169.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1229.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1184.tar.gz
inflating: AISHELL-2/iOS/data/wav/D2161.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1202.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1178.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1170.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1185.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1193.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1212.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1232.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1234.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1208.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1231.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1060.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1050.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1173.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1058.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1239.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1191.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1182.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1062.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1233.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1241.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1177.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1175.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1204.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1213.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1230.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1209.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1203.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1201.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1207.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1161.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1195.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1216.tar.gz
inflating: AISHELL-2/iOS/data/wav/D1176.tar.gz
inflating: AISHELL-2/iOS/data/wav.scp
inflating: AISHELL-2/iOS/data/trans.txt
inflating: AISHELL-2/iOS/AISHELL2-Data-Specification[EN].docx
inflating: AISHELL-2/iOS/ChangeLog
(base) [LiMiao@gpu08 develop_data]$ ls
19 2020af-sr-aishell2.zip AISHELL-2
(base) [LiMiao@gpu08 develop_data]$ cd AISHELL-2
(base) [LiMiao@gpu08 AISHELL-2]$ ls
iOS README.md
(base) [LiMiao@gpu08 AISHELL-2]$ cd iOS
(base) [LiMiao@gpu08 iOS]$ ls
AISHELL2-Data-Specification[EN].docx ChangeLog
AISHELL2-Data-Specification[ZH].docx data
(base) [LiMiao@gpu08 iOS]$ cd data
(base) [LiMiao@gpu08 data]$ ls
spk_info.txt trans.txt wav wav.scp
(base) [LiMiao@gpu08 data]$ cd wav
(base) [LiMiao@gpu08 wav]$ ls
D1048.tar.gz D1163.tar.gz D1181.tar.gz D1198.tar.gz D1215.tar.gz D1232.tar.gz
D1049.tar.gz D1164.tar.gz D1182.tar.gz D1199.tar.gz D1216.tar.gz D1233.tar.gz
D1050.tar.gz D1165.tar.gz D1183.tar.gz D1200.tar.gz D1217.tar.gz D1234.tar.gz
D1051.tar.gz D1167.tar.gz D1184.tar.gz D1201.tar.gz D1218.tar.gz D1235.tar.gz
D1052.tar.gz D1168.tar.gz D1185.tar.gz D1202.tar.gz D1219.tar.gz D1236.tar.gz
D1053.tar.gz D1169.tar.gz D1186.tar.gz D1203.tar.gz D1220.tar.gz D1237.tar.gz
D1054.tar.gz D1170.tar.gz D1187.tar.gz D1204.tar.gz D1221.tar.gz D1238.tar.gz
D1055.tar.gz D1171.tar.gz D1188.tar.gz D1205.tar.gz D1222.tar.gz D1239.tar.gz
D1056.tar.gz D1172.tar.gz D1189.tar.gz D1206.tar.gz D1223.tar.gz D1240.tar.gz
D1057.tar.gz D1173.tar.gz D1190.tar.gz D1207.tar.gz D1224.tar.gz D1241.tar.gz
D1058.tar.gz D1174.tar.gz D1191.tar.gz D1208.tar.gz D1225.tar.gz D2161.tar.gz
D1059.tar.gz D1175.tar.gz D1192.tar.gz D1209.tar.gz D1226.tar.gz D2162.tar.gz
D1060.tar.gz D1176.tar.gz D1193.tar.gz D1210.tar.gz D1227.tar.gz D2164.tar.gz
D1061.tar.gz D1177.tar.gz D1194.tar.gz D1211.tar.gz D1228.tar.gz D2165.tar.gz
D1062.tar.gz D1178.tar.gz D1195.tar.gz D1212.tar.gz D1229.tar.gz D2166.tar.gz
D1161.tar.gz D1179.tar.gz D1196.tar.gz D1213.tar.gz D1230.tar.gz
D1162.tar.gz D1180.tar.gz D1197.tar.gz D1214.tar.gz D1231.tar.gz
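Instead of stepping through each directory with cd and ls, the extracted tree can be checked in one pass. A sketch, assuming the layout shown above (AISHELL-2/iOS/data/wav containing .tar.gz files); count_archives is an illustrative helper and the demo tree is fabricated for the example:

```shell
# Count the .tar.gz files below a directory.
count_archives() {
    find "$1" -type f -name '*.tar.gz' | wc -l | tr -d ' '
}

# Demonstration on a fabricated tree with two empty placeholder archives.
tree_demo=$(mktemp -d)
mkdir -p "$tree_demo/AISHELL-2/iOS/data/wav"
touch "$tree_demo/AISHELL-2/iOS/data/wav/D0001.tar.gz" \
      "$tree_demo/AISHELL-2/iOS/data/wav/D0002.tar.gz"
n=$(count_archives "$tree_demo/AISHELL-2/iOS/data/wav")
echo "$n"
```

On the real data, count_archives AISHELL-2/iOS/data/wav should report 100, matching the 100 .tar.gz entries in the unzip output.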
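The per-speaker archives in wav still need to be extracted. To do this, create a script file that extracts them in a for loop; the detailed extraction steps are described below.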
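As a preview, such a loop might look like the following. This is only a sketch under assumptions, not the author's script: it assumes every file matching *.tar.gz in the directory should be unpacked in place, and the function name extract_all and the demo archive are hypothetical:

```shell
# Extract every .tar.gz in a directory into that same directory.
extract_all() {
    wav_dir="$1"
    for archive in "$wav_dir"/*.tar.gz; do
        [ -e "$archive" ] || continue            # no archives found
        tar -xzf "$archive" -C "$wav_dir" || return 1
    done
}

# Demonstration with a tiny fabricated archive in a scratch directory.
work=$(mktemp -d)
mkdir -p "$work/src/D0001" "$work/wav"
echo "placeholder" > "$work/src/D0001/a.txt"
tar -czf "$work/wav/D0001.tar.gz" -C "$work/src" D0001
extract_all "$work/wav"
ls "$work/wav"
```

On the server this would be called as extract_all AISHELL-2/iOS/data/wav from the develop_data directory.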