Scraping a Large Batch of Photos of Girls

First, a screenshot of the result (21 pages scraped so far, out of 40 in total):

The full code is as follows:

"""
Scrape photos with the os, requests and re modules.
Target URL: http://www.zdqx.com/qingchun/
"""
import requests
import re,os,time

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    'referer': 'http://www.zdqx.com/pcbz/70270.html',
    'cookie': 'Hm_lvt_303a32038183efa6d8efec90c0031b87=1581472898; Hm_lpvt_303a32038183efa6d8efec90c0031b87=1581472912'
}

def get_urls(url):
    response = requests.get(url=url, headers=headers)
    response.encoding = response.apparent_encoding
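    # NOTE: the HTML tags that delimited the two patterns below are missing
    # from this copy of the post; restore them from the page source before
    # running. The second pattern must capture two groups: (url, title).
    result1 = re.findall('(.*?)', response.text, re.S)
    result2 = re.findall('(.*?)', str(result1), re.S)
    for url, title in result2:
        url = 'http:' + str(url)  # image links on the page are protocol-relative
        title = title.replace(r'" height="281', '')  # trim attribute residue from the match
        savedata(url, title)


def savedata(url, title):
    path = '小姐姐图片'  # output folder
    os.makedirs(path, exist_ok=True)
    response = requests.get(url, headers=headers)
    # Image data is binary, so write response.content and skip encoding detection.
    with open(path + '/' + title + '.jpg', mode="wb") as f:
        f.write(response.content)
    print(title + ' saved!')


if __name__ == '__main__':
    # Start at 1 so that the first listing page (index.html) is included.
    for page in range(1, 41):
        if page == 1:
            url = 'http://www.zdqx.com/qingchun/index.html'
        else:
            url = 'http://www.zdqx.com/qingchun/index_' + str(page) + '.html'
        get_urls(url)
        print('Page ' + str(page) + ' scraped!')
        time.sleep(2)  # pause between pages to go easy on the server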
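One thing to watch out for: each image is saved under its scraped title, so a title containing characters such as `/`, `?` or `:` would make `open()` fail or write to an unexpected path. Below is a minimal sketch of one way to guard against that. The helper name `safe_filename` and the `default` fallback are my own illustration and are not part of the script above.

```python
import re

# Characters that Windows rejects in file names, plus '/' (also illegal on
# POSIX) and control whitespace that can end up in scraped text.
_ILLEGAL = re.compile(r'[\\/:*?"<>|\r\n\t]')


def safe_filename(title, default="untitled"):
    """Hypothetical helper: turn a scraped title into a usable file name."""
    # Replace every illegal character with an underscore, then trim the
    # leading/trailing spaces and dots that Windows also dislikes.
    cleaned = _ILLEGAL.sub("_", title).strip(" .")
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or default
```

It could be applied inside `savedata` before the file path is built, for example `title = safe_filename(title)`.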

Screenshot of the run in PyCharm:
[Image 1: PyCharm run screenshot]
Notes:

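  • URL being scraped: http://www.zdqx.com/qingchun/
  • To scrape other categories of photos from the same site, you only need to change the URL in the function call at the bottom of the script (the `__main__` block).
  • This is just a beginner's practice project, so experts, please go easy on me!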
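To make the "just change the URL" step less error-prone, the listing URLs can be built from a category name instead of being edited by hand. This is only a sketch: the function names `page_url` and `page_urls` are hypothetical, and it assumes that other categories on the site use the same `index.html` / `index_N.html` paging as `qingchun`, which I have only confirmed for that one category.

```python
BASE = "http://www.zdqx.com"


def page_url(category, page):
    """Build the listing URL for one page of a category (hypothetical helper).

    Assumes the paging scheme seen for 'qingchun': page 1 is index.html,
    page N (N >= 2) is index_N.html.
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    name = "index.html" if page == 1 else "index_{}.html".format(page)
    return "{}/{}/{}".format(BASE, category, name)


def page_urls(category, last_page):
    """Return the listing URLs for pages 1..last_page of a category."""
    return [page_url(category, p) for p in range(1, last_page + 1)]
```

With this in place, the main loop could iterate over `page_urls('qingchun', 40)` and switching category would mean changing a single string.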
