爬虫-猫眼电影票房

背景

最近也不知道咋了,一直遇到 字体反爬手段,起点中文网,抖音等等吧,猫眼我一直想搞,只是没有精力了,前面搞了2个了,不差这一个。搞完这个,不在搞字体反爬了。

目标网站 猫眼票房:

https://piaofang.maoyan.com/?ver=normal
爬虫-猫眼电影票房_第1张图片
就这个鬼
看源码:
在这里插入图片描述
爬虫-猫眼电影票房_第2张图片
这不和抖音 起点一样,窃喜.jpg
那就查找字体 的url 或者文件 。这个网页没有,好像详情页是有url的 woff结尾的,这个之间写在网页源码了:
在这里插入图片描述
是base64
分析完毕,撸代码:

font 和以下的1234567890 是一一对应的,它边 1234567890的对应关系也得边,所以这个找一次即可.这个也是可以用的,童叟无欺, 费了老劲了才找全;
核心的代码就是这个’online_fonts.getGlyphSet()._glyphs.glyphs’。

# 获取对应关系
def get_dict():
    font = "d09GRgABAAAAAAgcAAsAAAAAC7gAAQAAAAAAAAAAAAAAAAAAAAAAAAAAAABHU1VCAAABCAAAADMAAABCsP6z7U9TLzIAAAE8AAAARAAAAFZW7ld+Y21hcAAAAYAAAAC6AAACTDNal69nbHlmAAACPAAAA5AAAAQ0l9+jTWhlYWQAAAXMAAAALwAAADYSf7X+aGhlYQAABfwAAAAcAAAAJAeKAzlobXR4AAAGGAAAABIAAAAwGhwAAGxvY2EAAAYsAAAAGgAAABoGLgUubWF4cAAABkgAAAAfAAAAIAEZADxuYW1lAAAGaAAAAVcAAAKFkAhoC3Bvc3QAAAfAAAAAWgAAAI/mSOW8eJxjYGRgYOBikGPQYWB0cfMJYeBgYGGAAJAMY05meiJQDMoDyrGAaQ4gZoOIAgCKIwNPAHicY2Bk0mWcwMDKwMHUyXSGgYGhH0IzvmYwYuRgYGBiYGVmwAoC0lxTGBwYKn6wM+v812GIYdZhuAIUZgTJAQDX7QsReJzFkbENgzAQRb8DgQRSuPQAlFmFfZggDW0mSZUlGMISokBILpAlGkS+OZpI0CZnPUv3bd2d7gM4A4jIncSAekMhxIuqWvUI2arHeDA30FQuqKyxvvVd05fD7LQrxnpKl4U/jl/2QrHi3gkvV06XsVuMFAl7npBTTg4q/SDU/1p/x229n1vGraDa4IjWCNwfrBeCz60Xgp9dIwTv+1II/g+zwI3DaYG7hysEuoCxFugHplRA/gGlP0OcAAB4nD1Tz2/aVhx/z1R26lBCho0LaQEDsQ0kwfEvAjhAcaDNT0YChJCWhqilNFvbLGq6tI22lv2Q2ml/QHuptMMu1Q69d9K0nrZOWw77Aybtutsq9RLBnoHFt/ee/P38/AIIQPcfIAEKYADEZJryUAJAHzp138Fj7A/04gXAocRSUJYYJ+OkKZywwYCf52KUU9LsPOcn8LDL3VrZS56z2622seuFG3q+VnywFhYeBidho72wUtoMZ/Rb6Sa/srZQffvq7j7cSibkLADQBIPvEU4QgHGaRTgWBBXTFC7gxwk+BaUBImGzEPB9hx8mx4Q4lyjQoUU9vQRrpw9+P2AjlCEKEvPBUKnk9biiUdUnLpyfuT6/kCebN/fKk8sSkxbYybPMGfA/5j7CtALABkbRbFUzQWW4X/W1hPmZMWE4joke3V72Sy6R6fuB/jnGfgMkQDNYlVWhPCrTAZoftUCj8yvMX2o0qn+9LMKjjlh8eYzufjzxsYOwfGACTeB4pIsw9dCmoUib6SWnKjGtZy+kPOhaUxXOj8PnVjqohH1hxnrGtymvHyauZW8/XTI+KWuqtfOMz3FasXCvhDkVZpzxxs+vadNT7aZxd/bF66P6qjhV6rydKEdqy/PrlT4PDCAeARBFSZsoSHEKzkKFxwm8xwFR8MA+I57jYS8CmmJQyt8M62I4ydtwArqiE7GNB59vz+3ryXuFsqKRsLU6k6yEwvcLP+jqeEp1a2NDp/Cw2/1o59ZXi9+2n35XnoqWYXJpo76SD0XWQT+D7r+wi/hEBmw0pWdNjOmp74Wv9UzxQJS/ycskybdHLmqpMh/S3UHSFt9Ia/IcWbXHE6WENK1K0+mLT1pXD0//spitHPICuQyTs2I6lR2pRafdZ6tbi86Ry/krX+zWwEkPutgb4EANV1kaNQwnAmb7zDZE4VHAmJMdrqFNOGr3Jj0ZFrtdzgUb9x9mah+Fm/rBnfhlDo2wnHhr7sokmmV6aWbbp43MRGe0LbJk9tqPWyi0R0hx//Tq493XezvZXPvPC5m8mFXEAGs0L5zzj/tDPpkOlT4rwi+FnQ9v3llqCc6r2SuHKb2Rr3+vpH3eupHpPOFzlIOm+EerxYGv77BT2M/m1g587ZvpYGmWGHTOzBsl/DU5r2WqFSNiUGs5eK3zN++bC9Qfx3Ofbs+mht7kstvPKpyXhLuln5zM4xtbl9a1mRr4D3C64MJ4nGNgZGBgAOKQyuTT8fw2Xxm4WRhA4PoGS2UE/f8NCwPTeSCXg4EJJAoAIT0KPAB4nGNgZGBg1vmvwxDDwgACQJKRARXwAAAzYgHNeJxjYQCCFAYGJh3iMAA3jAI1AAAAAAAAAAwAQAB6AJQAsAD0ATwBfgGiAegCGgAAeJxjYGRgYOBhMGBgZgABJiDmAkIGhv9gPgMADoMBVgB4nGWRu27CQBRExzzyAClCiZQmirRN0hDMQ6lQOiQoI1HQG7MGI7+0XpBIlw/Id+UT0qXLJ6TPYK4bxyvvnjszd30lA7jGNxycnnu+J3ZwwerENZzjQbhO/Um4QX4WbqKNF+Ez6jPhFrp4FW7jBm+8wWlcshrjQ9hBB5/CNVzhS7hO/Ue4Qf4VbuLWaQqfoePcCbewcLrCbTw67y2lJkZ7Vq/U8qCCNLE93zMm1IZO6KfJUZrr9S7yTFmW50KbPEwTNXQHpTTTiTblbfl+PbI2UIFJYzWlq6MoVZlJt9q37sbabNzvB6K7fhpzPMU1gYGGB8t9xXqJA/cAKRJqPfj0DFdI30hPSPXol6k5vTV2iIps1a3Wi+KmnPqxVhjCxeBfasZUUiSrs+XY82sjqpbp46yGPTFpKr2ak0RkhazwtlR86i42RVfGn93nCip5t5gh/gPYnXLBAHicbcpLEkAwEATQ6fiEiLskBNkS5i42dqocX8ls9eZVdTcpkhj6j4VCgRIVamg0aGHQwaInPPq+Th7j9nnMac+uQfQ8Zdm7bGLpeQiy+5gN8uPoFqIXKTcXwQAA"
    fontdata = base64.b64decode(font)
    file = open('./1.woff', 'wb')
    file.write(fontdata)
    file.close()
    path = os.path.dirname(os.path.realpath(__file__))
    online_fonts = TTFont(path + '/1.woff')
    # online_fonts.saveXML("text.xml")
    font_dict = dict()

    base_num = {
        "uniE6CD": "2",
        "uniE1F5": "4",
        "uniEF24": "3",
        "uniEA4D": "1",
        "uniF807": "5",
        "uniEF10": "6",
        "uniE118": "7",
        "uniE4F5": "8",
        "uniECFD": "9",
        "uniF38B": "0"
    }
    _data = online_fonts.getGlyphSet()._glyphs.glyphs
    for k, v in base_num.items():
        font_dict[_data[k].data] = v
    return font_dict

结果:

requests:
爬虫-猫眼电影票房_第3张图片
scrapy:
爬虫-猫眼电影票房_第4张图片

你可能感兴趣的:(python,爬虫)