Python Web Scraping (Downloading Honor of Kings Hero Images)

Scrape the avatar of every Honor of Kings (王者荣耀) hero, along with the image of every hero skin.

Analyzing the hero data

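While the page loads, press F12 to open the developer tools. Because the page has to load the hero data, we can obtain the data for every hero: the Network tab shows everything the page requests.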


Among these requests, herolist.json contains the hero data.

The entry for a single hero looks like this:

{
    "ename":105,
    "cname":"廉颇",
    "title":"正义爆轰",
    "new_type":0,
    "hero_type":3,
    "skin_name":"正义爆轰|地狱岩魂"
}
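
To make the fields concrete, here is a minimal sketch of reading such a record in Python. The sample string simply copies the entry shown above, and the helper name `skin_names` is hypothetical, introduced only for illustration.

```python
import json

# Sample record copied from the entry shown above (one item of herolist.json).
SAMPLE = '''
{
    "ename": 105,
    "cname": "廉颇",
    "title": "正义爆轰",
    "new_type": 0,
    "hero_type": 3,
    "skin_name": "正义爆轰|地狱岩魂"
}
'''


def skin_names(hero):
    """Split the '|'-separated skin_name field into a list (hypothetical helper)."""
    return hero["skin_name"].split("|")


hero = json.loads(SAMPLE)
print(hero["ename"], hero["cname"])  # numeric id and display name
print(skin_names(hero))              # one list entry per skin
```

The two fields that matter for the rest of the article are `ename` (the numeric id) and `skin_name` (the skin names joined by `|`).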

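Analyzing the URLs

Visit the official Honor of Kings site: the page https://pvp.qq.com/web201605/herolist.shtml shows all the information. Open the developer tools and look at the form of each image URL. The page is generated dynamically by JavaScript, so fetching the raw HTML alone does not give us the image URLs. Instead, we construct the URLs ourselves from the hero data and download the images from them. Here is the image URL of one hero, which we analyze next: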


https://game.gtimg.cn/images/yxzj/img201606/heroimg/511/511.jpg

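The prefix https://game.gtimg.cn/images/yxzj/img201606/heroimg is fixed. And what is 511? Comparing it with the hero data from the previous step, we find that it is the hero's ename, which gives the following format: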

https://game.gtimg.cn/images/yxzj/img201606/heroimg/ + ename + / + ename + .jpg
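
The pattern can be written as a small helper. This is only a sketch: the names `IMAGE_BASE` and `avatar_url` are hypothetical, and the base path is taken from the example URL above.

```python
# Base path taken from the example URL in the article.
IMAGE_BASE = "https://game.gtimg.cn/images/yxzj/img201606/heroimg"


def avatar_url(ename):
    """Build the avatar URL for a hero id (hypothetical helper)."""
    return f"{IMAGE_BASE}/{ename}/{ename}.jpg"


print(avatar_url(511))
```

Note that the ename appears twice: once as the directory and once as the file name.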

Downloading the images

First we fetch the hero data:

import requests

# URL of the JSON file that contains every hero's info
url = 'https://pvp.qq.com/web201605/js/herolist.json'

if __name__ == '__main__':
    response = requests.get(url)

Then we construct the URL from each hero's ename and download the image, using the hero's cname as the file name:

# base URL of the hero images
imagePath = "https://game.gtimg.cn/images/yxzj/img201606/heroimg/"

if __name__ == '__main__':
    ...
    # the hero's ename appears twice: as the directory and as the file name
    newUrl = imagePath + str(hero_id) + "/" + str(hero_id) + ".jpg"
    image = requests.get(newUrl).content
    with open(path + name + ".jpg", 'wb') as f:
        f.write(image)
        print("Now writing " + name + ' into file...')

The complete code is as follows:

import os
import requests

# URL of the JSON file that contains every hero's info
url = 'https://pvp.qq.com/web201605/js/herolist.json'
# local directory to store the images in
path = 'data/hero/'
# base URL of the hero images
imagePath = "https://game.gtimg.cn/images/yxzj/img201606/heroimg/"

if __name__ == '__main__':
    # create the output directory if it does not exist yet
    os.makedirs(path, exist_ok=True)
    # fetch and parse the hero list once instead of on every iteration
    heroes = requests.get(url).json()
    for hero in heroes:
        hero_id = hero['ename']
        name = hero['cname']

        newUrl = imagePath + str(hero_id) + "/" + str(hero_id) + ".jpg"
        image = requests.get(newUrl).content
        with open(path + name + ".jpg", 'wb') as f:
            f.write(image)
            print("Now writing " + name + ' into file...')
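
The listing above writes whatever the server returns, even when a request fails. As a hedged sketch of one way to guard against that (not part of the original code), the helper below checks the HTTP status before saving. The names `fetch_bytes` and `save_image` and the 10-second timeout are assumptions made for illustration; `timeout` and `raise_for_status()` are standard features of the requests library.

```python
import os


def fetch_bytes(url, timeout=10):
    """Download url and return its body; raises on HTTP errors."""
    import requests  # imported here so the rest of the sketch works offline

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail on 404 etc. instead of saving an error page
    return response.content


def save_image(url, dest, fetch=fetch_bytes):
    """Save the image at url to dest; return False if the download fails."""
    try:
        data = fetch(url)
    except Exception as exc:  # broad on purpose: this is only a sketch
        print(f"Skipping {url}: {exc}")
        return False
    # make sure the target directory exists before writing
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    with open(dest, "wb") as f:
        f.write(data)
    return True
```

In the loop, the download and write steps could then be replaced by a single call such as `save_image(newUrl, path + name + ".jpg")`. Passing the downloader in as the `fetch` argument also makes it possible to try the helper without touching the network.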


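Similarly, the code for downloading the skins of every hero is as follows. The skin names come from the skin_name field, split on "|", and each skin image URL has the form heroimg/ + ename + / + ename + -smallskin- + N + .jpg, where N counts from 1: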

import os
import requests

# URL of the JSON file that contains every hero's info
url = 'https://pvp.qq.com/web201605/js/herolist.json'
# local directory to store the skin images in
path = 'data/skin/'
# base URL of the hero images
imagePath = "https://game.gtimg.cn/images/yxzj/img201606/heroimg/"

if __name__ == '__main__':
    # fetch and parse the hero list once instead of on every iteration
    heroes = requests.get(url).json()
    for hero in heroes:
        hero_id = hero['ename']
        name = hero['cname']
        skinNames = hero['skin_name'].split('|')

        # one sub-directory per hero; makedirs also creates data/skin/ if needed
        heroDir = path + name
        os.makedirs(heroDir, exist_ok=True)

        # the code assumes skin images are numbered from 1 in the order listed in skin_name
        for index, skinName in enumerate(skinNames, start=1):
            newUrl = imagePath + str(hero_id) + "/" + str(hero_id) + "-smallskin-" + str(index) + ".jpg"
            image = requests.get(newUrl).content
            with open(heroDir + "/" + skinName + ".jpg", 'wb') as f:
                f.write(image)
                print("Now writing " + skinName + ' into file ' + heroDir)
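
To see which URLs the skin loop requests without downloading anything, the URL construction can be separated out. This is a minimal sketch: `IMAGE_BASE` and `skin_urls` are hypothetical names, and the numbering from 1 follows the listing above rather than any documented rule.

```python
# Base path taken from the image URLs used in the article.
IMAGE_BASE = "https://game.gtimg.cn/images/yxzj/img201606/heroimg"


def skin_urls(hero):
    """Return (skin name, URL) pairs for one hero record (hypothetical helper)."""
    names = hero["skin_name"].split("|")
    ename = hero["ename"]
    # skins are numbered from 1, in the order they appear in skin_name
    return [
        (name, f"{IMAGE_BASE}/{ename}/{ename}-smallskin-{index}.jpg")
        for index, name in enumerate(names, start=1)
    ]


# Record taken from the sample hero entry shown earlier.
hero = {"ename": 105, "cname": "廉颇", "skin_name": "正义爆轰|地狱岩魂"}
for name, url in skin_urls(hero):
    print(name, url)
```

Printing the pairs first is a cheap way to check the pattern in a browser before running the full download.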

