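Goal: bulk-download CALIPSO data from NASA; each file is between 400 MB and 500 MB.
Problem: direct downloads ran at only a few KB/s to a few tens of KB/s; with the approach below they reached about 1 MB/s.
Approaches tried:
VPN: unstable;
Cloud server: hosted overseas. Downloads from NASA to the server were fast, but transfers from the server to the local machine were slow, with FTP speeds between 20 KB/s and 200 KB/s.
Solution:
Download to an Alibaba Cloud server, upload from there to Qiniu Cloud, then download from Qiniu Cloud to the local machine. (I also tried fetching the resources directly with qshell sync/fetch, but that failed.)
With wget, successfully and unsuccessfully downloaded files cannot be told apart by name, and I did not find a suitable download tool, so I decided to use Python's wget package for the downloads: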
import glob
import threading

import wget

MAX_CONNECTIONS = 3  # maximum number of concurrent download threads
semaphore = threading.Semaphore(MAX_CONNECTIONS)


def downUrl(url):
    """Download one URL, then free a slot for the next thread."""
    print(url)
    try:
        wget.download(url)
    finally:
        # Release even if the download fails, so the main loop cannot stall.
        semaphore.release()


if __name__ == "__main__":
    # Use the first .txt file in the current directory as the URL list
    urlFile = glob.glob("*.txt")[0]
    # Read one URL per line, skipping blank lines
    with open(urlFile, "r") as f:
        urls = [line.strip() for line in f if line.strip()]
    # Multithreaded download: acquire a slot before starting each thread
    threads = []
    for url in urls:
        semaphore.acquire()
        t = threading.Thread(target=downUrl, args=(url,))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
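Since the original difficulty was telling finished files from failed ones by name, one option is to download under a temporary name and rename only on success. The following is a minimal sketch, not the script used in this post: it swaps the wget package for the standard library (urllib.request and concurrent.futures), does not handle authentication, and the names fetch_one, fetch_all and the .part suffix are illustrative assumptions.

```python
import os
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse

MAX_CONNECTIONS = 3  # assumed limit, matching the three threads used above


def fetch_one(url, dest_dir="."):
    """Download url into dest_dir; the final name appears only on success."""
    name = os.path.basename(urlparse(url).path)
    final_path = os.path.join(dest_dir, name)
    part_path = final_path + ".part"  # hypothetical suffix for unfinished files
    if os.path.exists(final_path):
        return final_path  # already complete, skip
    with urllib.request.urlopen(url) as resp, open(part_path, "wb") as out:
        expected = resp.headers.get("Content-Length")
        shutil.copyfileobj(resp, out)
    # If the server announced a size, require the file on disk to match it.
    if expected is not None and os.path.getsize(part_path) != int(expected):
        raise IOError("incomplete download: " + url)
    os.replace(part_path, final_path)  # rename only after the checks passed
    return final_path


def fetch_all(urls, dest_dir=".", workers=MAX_CONNECTIONS):
    """Run fetch_one over urls with at most `workers` concurrent downloads.

    Returns (succeeded, failed); failed is a list of (url, exception) pairs.
    """
    succeeded, failed = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_one, u, dest_dir): u for u in urls}
        for fut, url in futures.items():
            try:
                succeeded.append(fut.result())
            except Exception as exc:
                failed.append((url, exc))
    return succeeded, failed
```

With this layout, anything still ending in .part after a run is known to be unfinished, and calling fetch_all again with the same URL list skips the files that already completed.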
References: "Speeding up file downloads with an overseas server + Qiniu Cloud" and the Qiniu Cloud command-line tool (qshell).
mkdir qshell
cd qshell
wget http://devtools.qiniu.com/qshell-linux-x86-v2.4.1.zip
unzip qshell-linux-x86-v2.4.1.zip
mv qshell-linux-x86-v2.4.1 qshell
export PATH=$PATH:/root/qshell
qshell account AK SK name
qshell qupload
Sync the files.
qshell dircache /root/CAPLISO -o CAL.txt
cat CAL.txt
root@iZrj95owmqogpx359ex82rZ:~/qshell# cat CAL.txt
CAL_LID_L1-Standard-V4-10.2019-01-01T00-25-44ZN.hdf 450167668 15897894038522642
CAL_LID_L1-Standard-V4-10.2019-01-01T01-11-54ZD.hdf 507582814 15897895655303009
CAL_LID_L1-Standard-V4-10.2019-01-01T02-04-19ZN.hdf 450560900 15897894846512810
CAL_LID_L1-Standard-V4-10.2019-01-01T02-50-29ZD.hdf 507976024 15897900092148671
CAL_LID_L1-Standard-V4-10.2019-01-01T03-42-49ZN.hdf 450167690 15897899708814358
CAL_LID_L1-Standard-V4-10.2019-01-01T04-29-00ZD.hdf 507976019 15897901334875073
CAL_LID_L1-Standard-V4-10.2019-01-01T05-21-20ZN.hdf 450167690 15897904515833847
CAL_LID_L1-Standard-V4-10.2019-01-01T06-07-30ZD.hdf 507976018 15897907486464768
CAL_LID_L1-Standard-V4-10.2019-01-01T06-59-55ZN.hdf 450167693 15897906621112456
CAL_LID_L1-Standard-V4-10.2019-01-01T07-46-05ZD.hdf 507976022 15897912499531996
CAL_LID_L1-Standard-V4-10.2019-01-01T08-38-25ZN.hdf 450560903 15897911334968496
CAL_LID_L1-Standard-V4-10.2019-01-01T09-24-35ZD.hdf 507582804 15897915489883694
CAL_LID_L1-Standard-V4-10.2019-01-01T10-17-00ZN.hdf 450560902 15897915841776838
CAL_LID_L1-Standard-V4-10.2019-01-01T11-03-10ZD.hdfs9howd3_.tmp 396845056 15897916606685410
CAL_LID_L1-Standard-V4-10.2019-01-01T11-55-31ZN.hdfpnon2cqk.tmp 88195072 15897916606965420
CAL_LID_L1-Standard-V4-10.2019-01-01T12-41-41ZD.hdf9uzdqnk6.tmp 89456640 15897916607005422
LID_L1_2019_01.txt 116242 15897867594335127
batchDownload.py 840 15897889706120881
Only the files whose names end in .hdf have finished downloading, so filter the list:
cat CAL.txt | grep 'hdf' | grep -v '.tmp' > filelist.txt
cat filelist.txt
CAL_LID_L1-Standard-V4-10.2019-01-01T00-25-44ZN.hdf 450167668 15897894038522642
CAL_LID_L1-Standard-V4-10.2019-01-01T01-11-54ZD.hdf 507582814 15897895655303009
CAL_LID_L1-Standard-V4-10.2019-01-01T02-04-19ZN.hdf 450560900 15897894846512810
CAL_LID_L1-Standard-V4-10.2019-01-01T02-50-29ZD.hdf 507976024 15897900092148671
CAL_LID_L1-Standard-V4-10.2019-01-01T03-42-49ZN.hdf 450167690 15897899708814358
CAL_LID_L1-Standard-V4-10.2019-01-01T04-29-00ZD.hdf 507976019 15897901334875073
CAL_LID_L1-Standard-V4-10.2019-01-01T05-21-20ZN.hdf 450167690 15897904515833847
CAL_LID_L1-Standard-V4-10.2019-01-01T06-07-30ZD.hdf 507976018 15897907486464768
CAL_LID_L1-Standard-V4-10.2019-01-01T06-59-55ZN.hdf 450167693 15897906621112456
CAL_LID_L1-Standard-V4-10.2019-01-01T07-46-05ZD.hdf 507976022 15897912499531996
CAL_LID_L1-Standard-V4-10.2019-01-01T08-38-25ZN.hdf 450560903 15897911334968496
CAL_LID_L1-Standard-V4-10.2019-01-01T09-24-35ZD.hdf 507582804 15897915489883694
CAL_LID_L1-Standard-V4-10.2019-01-01T10-17-00ZN.hdf 450560902 15897915841776838
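The grep filter matches hdf anywhere in the line and treats the dot in '.tmp' as a wildcard, which happens to work for this listing. A stricter filter can test only the file-name column. A minimal Python sketch, assuming each dircache line is whitespace-separated with the file name first, as in the listing above; completed_entries is a hypothetical helper name:

```python
def completed_entries(lines):
    """Keep dircache lines whose first column (the file name) ends with '.hdf'."""
    kept = []
    for line in lines:
        fields = line.split()
        if fields and fields[0].endswith(".hdf"):
            kept.append(line.rstrip("\n"))
    return kept


# Three lines taken from the CAL.txt listing above
sample = [
    "CAL_LID_L1-Standard-V4-10.2019-01-01T10-17-00ZN.hdf 450560902 15897915841776838\n",
    "CAL_LID_L1-Standard-V4-10.2019-01-01T11-03-10ZD.hdfs9howd3_.tmp 396845056 15897916606685410\n",
    "LID_L1_2019_01.txt 116242 15897867594335127\n",
]
for entry in completed_entries(sample):
    print(entry)
```

Only the first sample line is printed; the .tmp file and the .txt file are dropped.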
Create the configuration file up.conf:
{
    "src_dir" : "/root/CAPLISO",
    "bucket" : "capliso-download",
    "file_list" : "filelist.txt",
    "ignore_dir" : false,
    "overwrite" : false,
    "check_exists" : false,
    "check_hash" : false,
    "check_size" : false,
    "rescan_local" : true,
    "skip_file_prefixes" : "test,demo,",
    "skip_path_prefixes" : "hello/,temp/",
    "skip_fixed_strings" : ".svn,.git",
    "skip_suffixes" : ".DS_Store,.exe",
    "log_file" : "upload.log",
    "log_level" : "info",
    "log_rotate" : 1,
    "log_stdout" : false,
    "file_type" : 0,
    "delete_on_success" : true
}
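The configuration is JSON, which requires straight double quotes; curly quotes picked up when copying from a web page will not parse. One way to avoid quoting mistakes is to generate the file instead of typing it. A sketch using Python's json module with a subset of the keys above; the function name write_upload_config is an illustrative assumption:

```python
import json


def write_upload_config(path, src_dir, bucket, file_list):
    """Write a qupload-style config as valid JSON and return the dict written."""
    config = {
        "src_dir": src_dir,
        "bucket": bucket,
        "file_list": file_list,
        "overwrite": False,
        "check_exists": False,
        "rescan_local": True,
        "log_file": "upload.log",
        "log_level": "info",
        "delete_on_success": True,
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=4)
    return config
```

For the setup in this post the call would be write_upload_config("up.conf", "/root/CAPLISO", "capliso-download", "filelist.txt").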
2.2 Sync:
./qshell dircache /root/CAPLISO -o CAL.txt
cat CAL.txt | grep 'hdf' | grep -v '.tmp' > filelist.txt
cat filelist.txt
./qshell qupload --success-list success.txt --failure-list failure.txt up.conf
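import glob
import os

# Download from the Qiniu bucket every listed file that is not yet present locally.
File = glob.glob("*.txt")[0]
with open(File, "r") as f:
    for line in f:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name = fields[1]  # the original script takes the second column as the file name
        if not os.path.exists(name):
            print("Downloading: {}".format(name))
            os.system("qshell get capliso-download {}".format(name))
        else:
            print("Saved: {}".format(name))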
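After the final download it may be worth confirming that each local file is complete. The second column of the dircache listing is the file size in bytes, so local sizes can be compared against it. A sketch, assuming a copy of the three-column filelist.txt shown earlier (name, size, timestamp) is available locally; find_incomplete is a hypothetical helper:

```python
import os


def find_incomplete(list_path, local_dir="."):
    """Return names from the list that are missing locally or have the wrong size."""
    problems = []
    with open(list_path, "r") as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            name, size = fields[0], int(fields[1])
            local = os.path.join(local_dir, name)
            if not os.path.exists(local) or os.path.getsize(local) != size:
                problems.append(name)
    return problems
```

Any name returned by find_incomplete("filelist.txt") can then be deleted locally and fetched again.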
Socket error Event: 32 Error: 10053.
Connection closing…Socket close.
Fix:
chmod 400 /etc/ssh/*
service sshd restart
chmod 770 /etc/ssh/ssh_host_dsa_key.pub
chmod 770 /etc/ssh/ssh_host_rsa_key.pub
service network restart
socket.gaierror: [Errno -3] Temporary failure in name resolution
The DNS resolver is failing; add Google's DNS servers (in /etc/resolv.conf):
nameserver 8.8.8.8
nameserver 8.8.4.4
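To confirm that the DNS change took effect, name resolution can be checked from Python, where the error above was raised. A small sketch; can_resolve is a hypothetical helper, and the host name in the usage note is only an example:

```python
import socket


def can_resolve(host, port=443):
    """Return True if the system resolver can look up host, False on gaierror."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except socket.gaierror:
        return False
```

For example, can_resolve("www.nasa.gov") should return True once the resolver is working again.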