Crawlers: 6. Packet Capture Analysis

Packet Capture Analysis

Packet capture analysis is an essential skill for writing crawlers. Common tools include Fiddler 4, Charles, Wireshark, and the browser's built-in developer tools.
When do you need packet capture analysis?

- Scraping data from mobile apps, usually combined with decompiling the app (a later article covers scraping app data)
- Pages that require login
- Complex scrapes, e.g. analyzing request headers and response headers, or diagnosing why a request failed
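One practical tip: to make your own crawler's traffic show up in the capture tool, you can route requests through the tool's local proxy. A minimal sketch, assuming Fiddler's default listening address of 127.0.0.1:8888 (Charles can be configured similarly):

```python
# Route requests through a local capture proxy so Fiddler/Charles
# sees your crawler's traffic. 127.0.0.1:8888 is Fiddler's default.
proxies = {
    "http": "http://127.0.0.1:8888",
    "https": "http://127.0.0.1:8888",
}

# Usage (requires the proxy tool to be running):
# import requests
# resp = requests.get("http://www.kanzhun.com/", proxies=proxies)
```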

Login

Here I use Fiddler 4. Logging in on the web page at http://www.kanzhun.com/login/, the captured request looks like this:

POST http://www.kanzhun.com/login.json HTTP/1.1
Host: www.kanzhun.com
Proxy-Connection: keep-alive
Content-Length: 69
Accept: application/json, text/javascript, */*; q=0.01
Origin: http://www.kanzhun.com
X-Requested-With: XMLHttpRequest
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Referer: http://www.kanzhun.com/login/
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.8,en;q=0.6
Cookie: W_CITY_S_V=0; ac="[email protected]"; t=EhhloI4AGnoXJMz; aliyungf_tc=AQAAAOkujCD0yw4ASSL+myvvTxkg1TH/; __c=1465718622; __g=-; __l=l=%2F&r=; __a=74808725.1465379010.1465379010.1465718622.6.2.3.6; AB_T=abvb

redirect=%2F&account=casd1%40sina.com&password=123456&remember=true
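The body is ordinary application/x-www-form-urlencoded data; decoding it, for example with the standard library's `parse_qs` (from `urllib.parse` in Python 3, `urlparse` in Python 2), recovers the same form fields Fiddler shows on its WebForms tab:

```python
from urllib.parse import parse_qs  # urlparse.parse_qs in Python 2

# The URL-encoded body from the capture above
body = "redirect=%2F&account=casd1%40sina.com&password=123456&remember=true"

fields = parse_qs(body)
print(fields["account"])   # ['casd1@sina.com']
print(fields["redirect"])  # ['/']
```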

Clicking the WebForms tab shows the submitted form fields (I masked parts of the values with *):

Name      Value
redirect  /
account   c_***@sina.com
password  1111****
remember  true

We can then log in by simulating this form submission with requests:

# -*- coding:utf-8 -*-

"""
File Name : test3.py
Description: log in to kanzhun.com by replaying the captured form POST
Author: chengwei
Date: 2016/5/24 14:08
python: 2.7.10
"""

import requests


def main():
    s = requests.Session()
    # Form fields taken from the Fiddler capture above
    data = {
        "redirect": '/',
        "account": 'username',
        "password": 'passwd',
        "remember": 'true',
    }
    # Headers copied from the captured request
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Accept-Encoding': 'gzip, deflate',
        'X-Requested-With': 'XMLHttpRequest',
        'Accept': 'application/json, text/javascript, */*; q=0.01'
    }
    # Log in; the session keeps the returned cookies for later requests
    s.post('http://www.kanzhun.com/login.json', headers=headers, data=data)

    # A salary page that requires login now returns its full content
    res = s.get('http://www.kanzhun.com/gsx3195.html?ka=com-blocker1-salary', headers=headers)
    print(res.status_code)


if __name__ == '__main__':
    main()

Without logging in, the salary page does not show its full content; once we log in via the form POST, any later request on the same session returns the full page, because the session automatically carries the login cookies.
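Since the login state lives entirely in the session's cookie jar, you can also persist the cookies and skip the login POST on later runs. A minimal sketch using requests' cookie-jar helpers (the file name `cookies.json` and the sample cookie value are my own choices for illustration):

```python
import json

import requests


def save_cookies(session, path):
    # Dump the session's cookie jar to a JSON file
    with open(path, "w") as f:
        json.dump(requests.utils.dict_from_cookiejar(session.cookies), f)


def load_cookies(session, path):
    # Restore a previously saved cookie jar into a fresh session
    with open(path) as f:
        session.cookies = requests.utils.cookiejar_from_dict(json.load(f))


s = requests.Session()
s.cookies.set("t", "EhhloI4AGnoXJMz")  # stand-in for a real login cookie
save_cookies(s, "cookies.json")

s2 = requests.Session()          # e.g. a later run of the crawler
load_cookies(s2, "cookies.json")
print(s2.cookies.get("t"))       # EhhloI4AGnoXJMz
```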
