Python project: scraping a dynamic web page to get the router's client list


Background

An earlier post scraped the router's client list by logging in through a simulated browser:

Python project: router scraper
URL: http://blog.csdn.net/lyffly2011/article/details/50485398

Improvement

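After learning some front-end web development, I realized that the dynamic data can be fetched directly with HTTP requests instead of driving a browser.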

Approach

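  1. Open the browser's developer tools (F12).
  2. Inspect the XHR traffic in the network panel to find the request URL and the data sent.
  3. Replay the request, get the result, and parse it.
    The rest is the same as before.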
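Steps 2 and 3 boil down to copying the URL, headers and body of one XHR from the network panel and sending them again. A minimal sketch of that replay, using the standard library's urllib.request in place of requests so it has no third-party dependency; the function names are hypothetical, and the URL, cookie and body in the usage lines are placeholders modeled on this router's request.

```python
import urllib.request


def build_replayed_request(url, body, cookie, referer):
    """Rebuild an XHR POST as seen in the browser's network panel (hypothetical helper).

    url, body, cookie and referer are copied from DevTools; nothing here is
    specific to one router model.
    """
    req = urllib.request.Request(url, data=body.encode("utf-8"), method="POST")
    req.add_header("Content-Type", "text/plain")
    req.add_header("Cookie", cookie)
    req.add_header("Referer", referer)
    return req


def send(req, timeout=5):
    """Send the request and return the decoded response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    req = build_replayed_request(
        "http://192.168.11.1/cgi?5",                      # placeholder URL
        "[LAN_HOST_ENTRY#0,0,0,0,0,0#0,0,0,0,0,0]0,4\r\nIPAddress\r\n",
        "Authorization=Basic eHh4",                       # placeholder cookie
        "http://192.168.11.1/",
    )
    print(req.get_method(), req.full_url)
    # print(send(req))  # only works from inside the router's LAN
```

Building the request and sending it are kept separate so the request can be inspected (method, headers, exact bytes) before anything goes over the network.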

Things to watch out for: the cookie, the payload of the HTTP POST, and the line breaks in the string being sent.
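These pitfalls can be isolated in small helpers. This sketch assumes, as the listing does, that the cookie is `Authorization=Basic ` followed by the Base64 of the password alone (other firmware may expect a different token); `basic_auth_cookie` and `build_payload` are hypothetical names. Building the payload from a list guarantees that every line, including the last one, ends in `\r\n`, as in the request captured in the browser.

```python
import base64
from typing import Iterable


def basic_auth_cookie(password: str) -> str:
    """Cookie value as built in the listing: Base64 of the password alone.

    Assumption: some firmware may expect a different token (e.g. "user:password").
    """
    token = base64.b64encode(password.encode("utf-8")).decode("ascii")
    return "Authorization=Basic " + token


def build_payload(header: str, fields: Iterable[str]) -> str:
    """Join the header line and field names, terminating every line with CRLF."""
    return "".join(line + "\r\n" for line in [header, *fields])


if __name__ == "__main__":
    body = build_payload(
        "[LAN_HOST_ENTRY#0,0,0,0,0,0#0,0,0,0,0,0]0,4",
        ["leaseTimeRemaining", "MACAddress", "hostName", "IPAddress"],
    )
    print(repr(body), len(body.encode("utf-8")))
```

With the four fields used in this post, the helper produces a 98-byte body identical to the string in the listing.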

The code:

# coding: utf-8
####################################################
# Written by 刘云飞
####################################################

import time
import datetime
import requests
import base64

# Fill in your own router password here; it is sent Base64-encoded in the cookie
MIMA = b'xxxxxxxx'
encode_MIMA = base64.b64encode(MIMA).decode("utf-8")
cookie = 'Authorization=Basic ' + encode_MIMA

# Headers for the GET of the DHCP client page, copied from the browser's request
headers1 = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Connection': 'keep-alive',
    'Content-Type': 'text/plain',
    'Cookie': cookie,
    'Host': '192.168.11.1',
    'Referer': 'http://192.168.11.1/',
    'User-Agent': ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36'),
}

# Headers for the POST to the cgi endpoint. Content-Length is left out:
# requests computes it from the request body.
headers = {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'Connection': 'keep-alive',
    'Content-Type': 'text/plain',
    'Cookie': cookie,
    'Host': '192.168.11.1',
    'Origin': 'http://192.168.11.1',
    'Referer': 'http://192.168.11.1/',
    'User-Agent': ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/49.0.2623.75 Safari/537.36'),
}

url_dhcpclient = 'http://192.168.11.1/main/dhcpClient.htm?_='

session = requests.Session()

# The "_" query parameter is a cache-busting timestamp in milliseconds
time_ms = str(int(time.time() * 1000))
url_dhcpclient += time_ms
# Load the DHCP client page first, as the browser does
res1 = session.get(url_dhcpclient, headers=headers1)
print(res1.status_code)

# POST body copied from the XHR: a header line followed by the field names,
# every line terminated with \r\n
params = ('[LAN_HOST_ENTRY#0,0,0,0,0,0#0,0,0,0,0,0]0,4\r\n'
          'leaseTimeRemaining\r\n'
          'MACAddress\r\n'
          'hostName\r\n'
          'IPAddress\r\n')
url = 'http://192.168.11.1/cgi?5'
res = session.post(url=url, data=params, headers=headers)
# print(res.request.headers)
# print(res.status_code)
print(res.text)

# Save the raw response to a file named after the current time
now = datetime.datetime.now()
str_time = now.strftime("%Y_%m_%d_%H_%M_%S")
text = res.text
filename = str_time + '.txt'
with open(filename, 'w', encoding='utf-8') as f:
    f.write(text)
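The code above saves the raw text but does not yet parse it, which step 3 calls for. The sketch below assumes the response holds one bracketed header line per host (for example `[1,1,0,0,0,0]0`) followed by `key=value` lines for the requested fields, plus a trailing `[error]0` status line. That format is an assumption to check against one of your saved .txt files; `parse_cgi_response` is a hypothetical name and the sample text is invented, not real router output.

```python
import re

# Assumed header line format: "[...]" followed by a number, e.g. "[1,1,0,0,0,0]0"
SECTION = re.compile(r"^\[(.*)\](\d+)$")


def parse_cgi_response(text):
    """Group key=value lines under each bracketed header into one dict per entry."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = SECTION.match(line)
        if m:
            if m.group(1) == "error":
                # Assumed trailing status line such as "[error]0": not a host entry
                current = None
                continue
            current = {}
            entries.append(current)
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key] = value
    return entries


if __name__ == "__main__":
    sample = ("[1,1,0,0,0,0]0\r\nleaseTimeRemaining=7000\r\n"
              "MACAddress=AA:BB:CC:DD:EE:FF\r\nhostName=phone\r\n"
              "IPAddress=192.168.11.100\r\n[error]0\r\n")  # invented sample
    for host in parse_cgi_response(sample):
        print(host.get("hostName"), host.get("IPAddress"), host.get("MACAddress"))
```

Splitting on the first `=` only keeps values intact even if they contain an `=` themselves.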
