A Script for Deleting Duplicate Files on Baidu Cloud

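Baidu Cloud's junk-cleanup feature includes a duplicate-file scan, but only paying members can delete the results with one click. That gave me the idea of writing a Python script to delete the duplicates automatically.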

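First, delete any file by hand and capture the traffic. The request turns out to be a POST to https://pan.baidu.com/api/filemanager?opera=delete&bdstoken=“”, and the body is filelist=*****, where the value is URL-encoded. So I wrote a function that sends the same request: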

import json
import requests
from urllib.parse import quote

def cut(a,cookies,bdstoken):
    # Undo JSON-style escaped slashes ("\/" -> "/"); json.loads already does this,
    # so this only matters when the path was copied from raw response text
    t = a.replace("\\/", "/")
    # filelist is a JSON array of paths; json.dumps also escapes quotes in file names
    t = json.dumps([t], ensure_ascii=False)
    url = "https://pan.baidu.com/api/filemanager"
    querystring = {"opera":"delete","bdstoken":bdstoken}
    # URL-encode the whole JSON array, including "/" and quotes
    payload = "filelist=" + quote(t,safe='')
    # Content-Length is left out on purpose: requests computes it from the body
    headers = {
        'Host': "pan.baidu.com",
        'Connection': "close",
        'Accept': "application/json, text/javascript, */*; q=0.01",
        'Origin': "https://pan.baidu.com",
        'X-Requested-With': "XMLHttpRequest",
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
        'Content-Type': "application/x-www-form-urlencoded",
        'Referer': "https://pan.baidu.com/disk/home?",
        'Accept-Language': "zh-CN,zh;q=0.9",
        'Cookie':cookies,
        'cache-control': "no-cache",
        }
    response = requests.request("POST", url, data=payload, headers=headers, params=querystring)
    return response.text
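
To see what the encoded body looks like without sending anything, the encoding step can be pulled out on its own. This is only a sketch: build_delete_body is a hypothetical helper name that is not in the original script, and the sample path is made up.

```python
import json
from urllib.parse import quote

def build_delete_body(path: str) -> str:
    """Build the form body for the delete request (hypothetical helper)."""
    # Paths copied from raw JSON text may contain escaped slashes ("\/").
    clean = path.replace("\\/", "/")
    # filelist is a JSON array of paths, percent-encoded as a whole.
    file_list = json.dumps([clean], ensure_ascii=False)
    return "filelist=" + quote(file_list, safe="")

# Made-up sample path, with and without escaped slashes.
print(build_delete_body("/test/a.txt"))
print(build_delete_body("\\/test\\/a.txt"))
# both print: filelist=%5B%22%2Ftest%2Fa.txt%22%5D
```

Passing safe="" matters here: by default quote leaves "/" unencoded, while the captured request encodes it as %2F.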

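Next we need the list of duplicate files. Capturing the traffic again shows a GET request to https://pan.baidu.com/api/list?dir=%2F. Each entry in the returned list carries an MD5 value, so we can collect the MD5s as we go: when an MD5 shows up a second time, that file is a duplicate and can be deleted directly. The first copy encountered is the one that is kept. When 'isdir' is 1 the entry is a folder, so we simply recurse into it.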

def get_file(path,cookies):
    url = "https://pan.baidu.com/api/list?dir="
    # Encode the folder path, including "/", as in the captured request
    path = quote(path, safe='')
    url = url + path
    # Accept-Encoding is left out so that requests only advertises
    # encodings it can actually decode ("br" needs an extra package)
    headers = {
        'Host': "pan.baidu.com",
        'Connection': "keep-alive",
        'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36",
        'Accept': "*/*",
        'Referer': "https://pan.baidu.com/disk/home?",
        'Accept-Language': "zh-CN,zh;q=0.9",
        'Cookie':cookies,
        'cache-control': "no-cache",
        }

    response = requests.request("GET", url, headers=headers)
    c = json.loads(response.text)
    # 'list' holds one dict per file or folder in this directory
    return c['list']
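
# MD5 values seen so far; a set keeps the membership test fast
lie_biao = set()

def test(s,cookies,bdstoken):
    # Fetch the listing for this folder
    l = get_file(s,cookies)
    for i in l:
        try:
            # Folders are walked recursively
            if i["isdir"] == 1:
                test(i["path"],cookies,bdstoken)
            else:
                # If the MD5 is already in lie_biao the file is a duplicate: delete it.
                # Otherwise remember the MD5.
                if i["md5"] in lie_biao:
                    cut(i["path"],cookies,bdstoken)
                else:
                    lie_biao.add(i["md5"])
        except (KeyError, ValueError, requests.RequestException) as e:
            # Report the entry and move on instead of silently swallowing every error
            print("skipped", i.get("path"), e)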
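
The "keep the first copy, delete the rest" rule can also be tried offline before any real delete request is sent. The sketch below is a dry run over an in-memory listing: find_duplicates is a hypothetical name, and the sample entries are made up, using only the isdir, path and md5 fields that appear in the code above.

```python
from typing import Dict, Iterable, List

def find_duplicates(entries: Iterable[Dict]) -> List[str]:
    """Return the paths that the MD5 rule would delete (hypothetical helper)."""
    seen = set()          # MD5 values already encountered
    duplicates = []       # paths of later copies
    for entry in entries:
        if entry.get("isdir") == 1:
            continue      # folders are not compared in this flat sketch
        md5 = entry["md5"]
        if md5 in seen:
            duplicates.append(entry["path"])
        else:
            seen.add(md5)
    return duplicates

# Made-up listing: b.txt has the same MD5 as a.txt.
sample = [
    {"isdir": 0, "path": "/a.txt", "md5": "aaa"},
    {"isdir": 1, "path": "/docs"},
    {"isdir": 0, "path": "/b.txt", "md5": "aaa"},
    {"isdir": 0, "path": "/c.txt", "md5": "ccc"},
]
print(find_duplicates(sample))  # ['/b.txt']
```

Printing the candidate paths first makes it possible to check which copy would survive, since that depends purely on the order in which entries are visited.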

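With that, the duplicates can be deleted. The drawbacks are that you have to grab the cookie and bdstoken yourself, and that folders left empty are not removed in the same pass.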
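
One possible way to approach the empty-folder problem is a second pass that walks the tree again and collects folders containing no files at any depth. The sketch below only finds them; find_empty_dirs is a hypothetical name, the listing function is passed in as a parameter so the logic can be tried on fake data, and whether the delete request above also accepts folder paths is an assumption that has not been verified here.

```python
from typing import Callable, Dict, List

def find_empty_dirs(root: str, list_dir: Callable[[str], List[Dict]]) -> List[str]:
    """Return folders under root that hold no files at any depth, deepest first."""
    empty = []

    def visit(path: str) -> bool:
        # True when nothing below this folder is a file
        is_empty = True
        for entry in list_dir(path):
            if entry["isdir"] == 1:
                if not visit(entry["path"]):
                    is_empty = False
            else:
                is_empty = False
        if is_empty and path != root:
            empty.append(path)
        return is_empty

    visit(root)
    return empty

# Fake tree standing in for the real listing call.
tree = {
    "/": [{"isdir": 1, "path": "/a"}, {"isdir": 0, "path": "/x.txt", "md5": "1"}],
    "/a": [{"isdir": 1, "path": "/a/b"}],
    "/a/b": [],
}
print(find_empty_dirs("/", tree.__getitem__))  # ['/a/b', '/a']
```

With the real API, list_dir could be something like lambda p: get_file(p, cookies). The deepest-first order means a child folder is always reported before its parent.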

The full code is at https://github.com/AlexTomit/baiduyu_filecut

I'm new to this, so feedback is welcome!
