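Hi everyone. A while back I got hooked on Douyin videos of pretty girls, but every video I downloaded came with a watermark, so I wrote a small Python tool to fetch them myself. You can copy it straight into your own IDE and run it. I also built an interface with PyQt5 so that people who don't read code can use it too. The code is pasted below. Note: this is purely a hobby project; if you borrow the code, do not use it for illegal commercial purposes.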
Environment: Python 3, PyCharm/Eric
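How to get the URL: tap the [Share] button on the right side of the page, then choose "Copy link". Pull the URL out of the copied text and you are set.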
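The copied share text usually wraps the short link in other words, so the URL has to be picked out first. Here is a minimal sketch of one way to do that, assuming the link always looks like http(s)://v.douyin.com/<code>/. The name extract_share_url and the sample text are made up for illustration and are not part of my script:

```python
import re

# Assumption: short links have the form http(s)://v.douyin.com/<code>/
SHARE_URL_RE = re.compile(r'https?://v\.douyin\.com/[A-Za-z0-9_-]+/?')


def extract_share_url(share_text):
    """Return the first v.douyin.com short link found in share_text, or None."""
    match = SHARE_URL_RE.search(share_text)
    return match.group(0) if match else None


if __name__ == '__main__':
    # Hypothetical share text; real share text may be worded differently
    sample = 'Check out this video http://v.douyin.com/dU2Dsn/ copy the link and open Douyin'
    print(extract_share_url(sample))
```

If the function returns None, the text did not contain a link in the assumed form, and the pattern needs adjusting.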
Odds and ends: Python developers are welcome to add me so we can trade notes. My WeChat: stefan1240.
Straight to the code:
# -*- coding:utf-8 -*-
# Only the modules the script actually uses are imported
import sys
from contextlib import closing

import requests
class douyin():
    def __init__(self):
        pass

    def video_downloader(self, video_url, video_name=r'douyinsss.mp4'):
        """
        Download a video.
        Parameters:
            video_url: video download address (as returned by downloadUrlGet)
            video_name: file name to save the video under
        Returns:
            None
        """
        size = 0
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.3.2.1000 Chrome/30.0.1599.101 Safari/537.36"}
        try:
            # verify=False skips TLS certificate checks, as in the original script
            with closing(requests.get(video_url, headers=headers, stream=True, verify=False)) as response:
                chunk_size = 1024
                if response.status_code == 200:
                    # Content-Length may be missing, so fall back to 0 instead of raising
                    content_size = int(response.headers.get('content-length', 0))
                    sys.stdout.write(' [File size]: %0.2f MB\n' % (content_size / chunk_size / 1024))
                    # Stream to disk chunk by chunk rather than holding response.content in memory
                    with open(video_name, "wb") as file:
                        for data in response.iter_content(chunk_size=chunk_size):
                            file.write(data)
                            size += len(data)
                            file.flush()
                            #sys.stdout.write(' [Progress]: %.2f%%' % float(size / content_size * 100) + '\r')
                            #sys.stdout.flush()
                    print('Video download finished...')
                else:
                    print('Unexpected status code: %d' % response.status_code)
        except Exception as e:
            print(e)
            print('Download failed.....')

    def downloadUrlGet(self, video_url):
        """
        Get the video download address.
        Parameters:
            video_url: the shared short link of the watermarked video
        Returns:
            video download link, video name
        """
        name = ''
        downloadUrl = ''
        headers = {
            'Proxy-Connection': 'keep-alive',
            'Host': 'v.douyin.com',
            'Upgrade-Insecure-Requests': '1',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36",
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'
        }
        req = requests.get(url=video_url, headers=headers, verify=False)
        newUrl = req.url
        #print(req.text)
        print('newUrl:%s' % newUrl)
        print(req.history)
        # Request the page again at the address reached after the 302 redirect
        headers = {
            'Proxy-Connection': 'keep-alive',
            'Host': 'www.iesdouyin.com',
            'Upgrade-Insecure-Requests': '1',
            'Accept-Encoding': 'gzip, deflate',
            'Accept-Language': 'zh-CN,zh;q=0.9',
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.146 Safari/537.36",
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'
        }
        req = requests.get(url=newUrl, headers=headers, verify=False)
        reply = req.text
        #print(reply)
        # The download address sits between 'playAddr: "' and the next double quote
        marker = 'playAddr: "'
        if marker not in reply:
            print('playAddr not found in the page.....')
            return downloadUrl, name
        p = reply.find(marker) + len(marker)
        downloadUrl = reply[p: reply.find('"', p)]
        print('downloadUrl:%s' % downloadUrl)
        # The video name sits between '"name nowrap">' and the next '<'
        marker = '"name nowrap">'
        if marker in reply:
            p = reply.find(marker) + len(marker)
            name = reply[p: reply.find('<', p)]
        print(name)
        return downloadUrl, name

# Main task starts here
url = 'http://v.douyin.com/dU2Dsn/'
handler = douyin()
downloadUrl, name = handler.downloadUrlGet(url)
# Download from the extracted address, not from the share link itself
handler.video_downloader(downloadUrl, name)
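The progress lines inside the download loop are commented out. If you want progress output back, one option is a tiny helper that also copes with a missing Content-Length. This is only a sketch: format_progress is a hypothetical name, and treating a total of 0 as "size unknown" is my assumption:

```python
def format_progress(done_bytes, total_bytes):
    """Format download progress; a total of 0 or less means the size is unknown."""
    if total_bytes <= 0:
        # Without a total, report how much has been received so far
        return '%.2f MB' % (done_bytes / 1024 / 1024)
    return '%.2f%%' % (done_bytes / total_bytes * 100)


if __name__ == '__main__':
    print(format_progress(512, 1024))     # 50.00%
    print(format_progress(1048576, 0))    # 1.00 MB
```

Inside the loop it could be used as sys.stdout.write('\r [Progress]: ' + format_progress(size, content_size)).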
If a download doesn't go through, take a look at the code yourself.
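One thing worth checking first is whether the page still contains the markers the script searches for ('playAddr: "' and '"name nowrap">'), since str.find returns -1 when a marker is absent. A small helper makes that easy to test by hand. This is a sketch, not part of my script: extract_between is a hypothetical name and the sample page string is invented:

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between start_marker and the next end_marker, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]


if __name__ == '__main__':
    # Invented page fragment, only for demonstration
    page = 'var data = { playAddr: "http://example.com/v.mp4", cover: "x" };'
    print(extract_between(page, 'playAddr: "', '"'))      # http://example.com/v.mp4
    print(extract_between(page, '"name nowrap">', '<'))   # None
```

A None result on the real page text means the page layout has changed and the markers need updating.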
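The client I built for myself is written in PyQt5. Its code is spread over several folders, so I won't paste it here; ask me if you want the source. Here is a screenshot of the interface: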