Python Scrapy simulated login (log in manually and save the cookie)

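First log in through the browser and copy the cookie. Convert it into a dict and store it in the COOKIES pool in settings.py. A downloader middleware then attaches a cookie to each request, so the spider is treated as logged in.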

1. Convert the cookie into a dict
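def cookieChangeToDict(cookie):
    '''
    Convert a cookie string into a dict
    :param cookie: the cookie string copied after logging in
    :return: dict mapping cookie names to values
    '''
    cookieDict = {}
    for item in cookie.split(';'):
        item = item.strip()
        # Skip empty segments (e.g. a trailing ';') and fragments without '='
        if '=' not in item:
            continue
        # maxsplit=1 keeps any '=' inside the value intact
        name, value = item.split('=', maxsplit=1)
        cookieDict[name.strip()] = value.strip()
    return cookieDict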

if __name__ == '__main__':
    cookie = """
    paste your cookie string here
    """
    print(cookieChangeToDict(cookie))
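The sketch below is a more compact way to do the same conversion with `str.partition`. It also tolerates the blank segments and trailing `;` that a cookie string copied from the browser may contain. The name `cookie_str_to_dict` and the sample cookie values are hypothetical and only for illustration.

```python
def cookie_str_to_dict(cookie_str: str) -> dict:
    """Parse a raw 'name1=value1; name2=value2' cookie string into a dict."""
    cookies = {}
    for part in cookie_str.split(";"):
        # partition splits on the first '=' only, so values containing '=' survive
        name, sep, value = part.strip().partition("=")
        # Skip empty segments (e.g. a trailing ';') and fragments without '='
        if not sep or not name:
            continue
        cookies[name.strip()] = value.strip()
    return cookies


if __name__ == "__main__":
    raw = "sessionid=abc123; csrftoken=x=y; ;lang=zh-CN;"
    print(cookie_str_to_dict(raw))
    # prints: {'sessionid': 'abc123', 'csrftoken': 'x=y', 'lang': 'zh-CN'}
```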

Put the printed dict into the custom `COOKIES = []` list in settings.py.

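2. Send requests with the logged-in cookie
Method 1:

Override the spider's start_requests method and attach the cookie dict to the request. Note that scrapy.FormRequest without formdata is sent as a GET, not a POST, so a plain scrapy.Request is enough to fetch a page that requires login: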

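import scrapy

def start_requests(self):
    url = ''  # URL of the page that requires login
    # self.cookies is assumed to hold the cookie dict produced in step 1
    yield scrapy.Request(url, cookies=self.cookies, callback=self.parse)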
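When a request carries `cookies=...`, Scrapy's built-in CookiesMiddleware stores them in a cookie jar and sends them as a single `Cookie` header. The hypothetical helper below shows roughly what that header value looks like. It is only an illustration, not Scrapy's implementation: the cookie jar decides the exact order and which cookies apply to a URL. It is also useful for checking that the dict from step 1 round-trips back to the original string.

```python
def dict_to_cookie_header(cookies: dict) -> str:
    """Join a cookie dict back into a 'name=value; name=value' header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


if __name__ == "__main__":
    print(dict_to_cookie_header({"sessionid": "abc123", "lang": "zh-CN"}))
    # prints: sessionid=abc123; lang=zh-CN
```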

Method 2: use a downloader middleware:

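import random

from renren.settings import COOKIES


class RandomCookieMiddleware:
    '''
    Random cookie pool: attach a randomly chosen logged-in cookie to each request
    '''
    def process_request(self, request, spider):
        if not COOKIES:
            return None  # empty pool: send the request unchanged
        # Copy the dict so the shared pool entry is never mutated
        request.cookies = dict(random.choice(COOKIES))
        # Returning None lets the request continue; the built-in CookiesMiddleware
        # (priority 700) then turns request.cookies into the Cookie header
        return None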
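`random.choice` spreads requests across accounts but can pick the same cookie several times in a row. The sketch below assumes you might prefer using each account in turn. The `CookiePool` class and its `"round_robin"` strategy name are hypothetical. In the middleware, the pool would be built once and `process_request` would set `request.cookies = pool.get()` instead of calling `random.choice(COOKIES)`.

```python
import itertools
import random


class CookiePool:
    """Hypothetical helper: hands out cookie dicts at random or round-robin."""

    def __init__(self, cookies, strategy="random"):
        if not cookies:
            raise ValueError("cookie pool is empty; fill COOKIES in settings.py first")
        if strategy not in ("random", "round_robin"):
            raise ValueError(f"unknown strategy: {strategy}")
        self._cookies = list(cookies)
        self._strategy = strategy
        # itertools.cycle repeats the pool endlessly in its original order
        self._cycle = itertools.cycle(self._cookies)

    def get(self):
        """Return a copy of the next cookie dict so callers cannot mutate the pool."""
        if self._strategy == "round_robin":
            return dict(next(self._cycle))
        return dict(random.choice(self._cookies))


if __name__ == "__main__":
    pool = CookiePool([{"sessionid": "a"}, {"sessionid": "b"}], strategy="round_robin")
    print([pool.get()["sessionid"] for _ in range(3)])  # prints: ['a', 'b', 'a']
```

Failing fast on an empty pool surfaces a missing COOKIES setting at startup, rather than as an IndexError from `random.choice` on the first request.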

In settings.py, set:

ROBOTSTXT_OBEY = False

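# Must stay True: the built-in CookiesMiddleware is what sends request.cookies
COOKIES_ENABLED = True

# Enable the middleware (543 < 700, so it runs before the built-in CookiesMiddleware)
DOWNLOADER_MIDDLEWARES = {
    'renren.middlewares.RandomCookieMiddleware': 543,
}

# Cookie pool: paste the dicts printed in step 1 here
COOKIES = [
]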
