Python Notes (28): Scraping the Character Art of Azur Lane (doge)

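The shipgirls in Azur Lane are so pretty... so why not just scrape them all? It's purely for learning and, uh, collecting (doge).
Of course, I'm a beginner, so my approach may not be the best...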

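import os
import re
from urllib.parse import urljoin

import requests
from lxml import etree

# Where the images are saved (change to your own folder); created if it does not exist
save_dir = r'C:\Users\NERO\Desktop\blhx'
os.makedirs(save_dir, exist_ok=True)
os.chdir(save_dir)

url = 'https://wiki.biligame.com/blhx/%E8%88%B0%E5%A8%98%E5%9B%BE%E9%89%B4'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}


def get_urls(url):
    """Collect the detail-page link of every shipgirl from the gallery page."""
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    html = etree.HTML(resp.text)
    # Each portrait thumbnail sits in a rounded div whose <a> links to the detail page
    return html.xpath('//div[@style="position:relative;display:inline-block;overflow:hidden;border-radius:5px"]/a/@href')


def get_data(url):
    """Download every image inside the Contentbox2 area of one detail page."""
    # urljoin avoids a double slash when the href already starts with '/'
    detail_url = urljoin('https://wiki.biligame.com/', url)
    resp = requests.get(detail_url, headers=headers)
    resp.raise_for_status()
    html = etree.HTML(resp.text)
    name = html.xpath('//div[@class="Contentbox2"]//div/img/@alt')
    jpg = html.xpath('//div[@class="Contentbox2"]//div/img/@src')
    data_dict = dict(zip(name, jpg))     # combine the two lists into a dict: {alt text: image URL}
    for each, src in data_dict.items():
        # Replace characters that Windows does not allow in file names
        filename = re.sub(r'[\\/:*?"<>|]', '_', each)
        # Download with the same headers (urlretrieve would not send them)
        img = requests.get(urljoin(detail_url, src), headers=headers)
        img.raise_for_status()
        with open(filename, 'wb') as f:
            f.write(img.content)   # save the bytes as an image file


def main():
    detail_urls = get_urls(url)
    for each in detail_urls:
        get_data(each)


if __name__ == '__main__':
    main()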
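The detail-page URL is built from the site root plus an `href` taken from the gallery page. Plain string concatenation goes wrong as soon as the `href` already begins with `/` (wiki links usually do, though that is an assumption about this particular site), while the standard library's `urllib.parse.urljoin` resolves relative, root-relative and scheme-relative links properly. A minimal sketch; the `href` values below are made up for illustration:

```python
from urllib.parse import urljoin

SITE_ROOT = 'https://wiki.biligame.com/'

# Hypothetical href values of the kinds a wiki page may contain
hrefs = [
    '/blhx/Example_Ship',                           # root-relative
    'blhx/Example_Ship',                            # relative
    '//patchwiki.biligame.com/images/example.png',  # scheme-relative
]

naive = {href: SITE_ROOT + href for href in hrefs}            # plain concatenation
resolved = {href: urljoin(SITE_ROOT, href) for href in hrefs}  # proper URL resolution

for href in hrefs:
    print(f'{naive[href]}\n    -> {resolved[href]}')
```

Concatenation produces `https://wiki.biligame.com//blhx/Example_Ship` for the first link, whereas `urljoin` gives the clean URL in every case.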

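Turning those two lists into a dict took me several hours of head-scratching... and in the end I still had to look it up on Baidu. Shaky basics, shaky everything (crying)...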
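Two separate XPath queries plus `zip` only stay aligned if every `<img>` has both an `alt` and a `src`: if one image lacks `alt`, every later name gets paired with the wrong picture. A sketch of an alternative that reads both attributes from the same element. To keep it runnable offline without third-party packages, it swaps lxml for the standard library's `xml.etree.ElementTree` and parses a made-up, well-formed snippet; with lxml the same idea works by querying `//div[@class="Contentbox2"]//div/img` and calling `.get()` on each element.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for part of a detail page.
# The second image deliberately has no alt text.
SAMPLE = """
<html><body>
  <div class="Contentbox2">
    <div><img alt="ship_a.png" src="https://example.com/a.png"/></div>
    <div><img src="https://example.com/b.png"/></div>
    <div><img alt="ship_c.png" src="https://example.com/c.png"/></div>
  </div>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())
imgs = root.findall(".//div[@class='Contentbox2']//div/img")

# Parallel lists + zip: the missing alt shifts every later pair
names = [img.get('alt') for img in imgs if img.get('alt') is not None]
srcs = [img.get('src') for img in imgs if img.get('src') is not None]
zipped = dict(zip(names, srcs))

# Per-element pairing: each name stays with its own src; images without alt are skipped
paired = {img.get('alt'): img.get('src') for img in imgs
          if img.get('alt') and img.get('src')}

print(zipped)  # ship_c.png wrongly points at b.png
print(paired)  # ship_c.png correctly points at c.png
```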

Here is what I picked up along the way:

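  1. Extracting data from a string into a new dict
    In later scraping lessons we need to obtain cookies and pass them to the request as a dict. If the cookies come as a single string, they must first be converted into a dict. A classic example: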
cookies = "anonymid=jy0ui55o-u6f6zd; depovince=GW; _r01_=1; JSESSIONID=abcMktGLRGjLtdhBk7OVw; ick_login=a9b557b8-8138-4e9d-8601-de7b2a633f80; _ga=GA1.2.1307141854.1562980962; _gid=GA1.2.201589596.1562980962; _c1=-100; first_login_flag=1; ln_uact=18323008898; ln_hurl=http://head.xiaonei.com/photos/0/0/men_main.gif; jebe_key=88f1340c-592c-4dd6-a738-128a76559f45%7Cad33b3c730fcdc8df220648f0893e840%7C1562981108370%7C1%7C1562981106763; jebe_key=88f1340c-592c-4dd6-a738-128a76559f45%7Cad33b3c730fcdc8df220648f0893e840%7C1562981108370%7C1%7C1562981106765; jebecookies=793eb32e-92c6-470d-b9d0-5f924c335d30|||||; _de=E77807CE44886E0134ABF27E72CFD74F; p=a00d65b1f779614cd242dc719e24c73e0; t=292ba8729a4151c1a357e176d8d91bff0; societyguester=292ba8729a4151c1a357e176d8d91bff0; id=969937120; xnsid=1700b2cc; ver=7.0; loginfrom=null; wp_fold=0"

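# Dict comprehension: split each "name=value" pair at the first "=" only, so values that
# contain "=" stay intact; a repeated name (jebe_key appears twice) keeps its last value
cookies = {k: v for k, v in (cookie.split("=", 1) for cookie in cookies.split("; "))}
print(cookies)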

{'anonymid': 'jy0ui55o-u6f6zd', 'depovince': 'GW', '_r01_': '1', 'JSESSIONID': 'abcMktGLRGjLtdhBk7OVw', ...
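Splitting on every `=` cuts values that themselves contain `=` (common in base64-style tokens), and a trailing `;` or extra spaces leave empty or padded pieces that break the comprehension. A slightly more defensive sketch; `parse_cookie_string` is a hypothetical helper name, not part of requests. The resulting dict can be passed as the `cookies=` argument of `requests.get`.

```python
def parse_cookie_string(raw: str) -> dict:
    """Turn a 'name=value; name2=value2' string into a dict (hypothetical helper)."""
    cookies = {}
    for part in raw.split(';'):
        part = part.strip()
        if not part:
            continue                             # skip empty pieces, e.g. after a trailing ';'
        name, _, value = part.partition('=')     # split at the first '=' only
        cookies[name.strip()] = value.strip()    # a repeated name keeps its last value
    return cookies


print(parse_cookie_string('a=1; token=abc==; empty=; a=2;'))
# {'a': '2', 'token': 'abc==', 'empty': ''}
```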
  2. Combining two lists into a dict
Method 1:
list1 = ['k1','k2','k3']
list2 = ['v1','v2','v3']
dic = dict(map(lambda x, y: (x, y), list1, list2))   # pair elements by position, then build the dict

>>> print(dic)
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}

Method 2:
>>> dict(zip(list1, list2))
{'k1': 'v1', 'k2': 'v2', 'k3': 'v3'}

>>> {v: k for k, v in dic.items()}            # the reverse: swap each key and value
{'v1': 'k1', 'v2': 'k2', 'v3': 'k3'}
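Both methods pair elements by position and silently stop at the shorter list, so a missing value goes unnoticed; inverting a dict likewise silently drops keys whose values repeat. A small sketch that makes both cases explicit; `lists_to_dict` and `invert_dict` are hypothetical helper names. On Python 3.10+, `zip(keys, values, strict=True)` offers the same length check built in.

```python
def lists_to_dict(keys, values):
    """Like dict(zip(keys, values)), but refuse lists of different lengths."""
    if len(keys) != len(values):
        raise ValueError(f'{len(keys)} keys but {len(values)} values')
    return dict(zip(keys, values))


def invert_dict(d):
    """Swap keys and values; refuse to lose entries when values repeat."""
    inverted = {v: k for k, v in d.items()}
    if len(inverted) != len(d):
        raise ValueError('duplicate values: inversion would drop keys')
    return inverted


print(dict(zip(['k1', 'k2', 'k3'], ['v1', 'v2'])))  # {'k1': 'v1', 'k2': 'v2'}  (k3 silently dropped)
print(invert_dict(lists_to_dict(['k1', 'k2'], ['v1', 'v2'])))  # {'v1': 'k1', 'v2': 'k2'}

try:
    lists_to_dict(['k1', 'k2', 'k3'], ['v1', 'v2'])
except ValueError as e:
    print('error:', e)  # error: 3 keys but 2 values
```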
