Python Crawling with Scrapy: Simulated Login

  Sending a POST request: sometimes we need to send a POST request when fetching data. For that, Scrapy provides FormRequest, a subclass of Request. If the spider should send the POST request as soon as it starts, override the start_requests(self) method in the spider class instead of letting Scrapy fetch the URLs in start_urls.

1. Create the project

D:\学习笔记\Python学习\Python_Crawler>scrapy startproject renrenLogin
New Scrapy project 'renrenLogin', using template directory 'c:\python38\lib\site-packages\scrapy\templates\project', created in:
    D:\学习笔记\Python学习\Python_Crawler\renrenLogin

You can start your first spider with:
    cd renrenLogin
    scrapy genspider example example.com

2. Create the spider

D:\学习笔记\Python学习\Python_Crawler>cd renrenLogin
D:\学习笔记\Python学习\Python_Crawler\renrenLogin>scrapy genspider renren "renren.com"
Created spider 'renren' using template 'basic' in module:
  renrenLogin.spiders.renren

3. Implementation

  A) settings.py configuration:

ROBOTSTXT_OBEY = False

DOWNLOAD_DELAY = 1

DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.9 Safari/537.36',
}

  B) start.py:

from scrapy import cmdline
cmdline.execute("scrapy crawl renren".split())

  C) renren.py:

# -*- coding: utf-8 -*-
import scrapy


class RenrenSpider(scrapy.Spider):
    name = 'renren'
    allowed_domains = ['renren.com']
    start_urls = ['http://renren.com/']

    def start_requests(self):
        url = "http://www.renren.com/PLogin.do"
        data = {"email": "[email protected]", "password": "1qaz@WSX"}
        request = scrapy.FormRequest(url, formdata=data, callback=self.parse_page)
        yield request

    def parse_page(self, response):
        # with open('renren.html', 'w', encoding='utf-8') as fp:
        #     fp.write(response.text)
        request = scrapy.Request(url="http://www.renren.com/880151247/profile", callback=self.parse_profile)
        yield request

    def parse_profile(self, response):
        with open('dpProfile.html', 'w', encoding='utf-8') as fp:
            fp.write(response.text)

4. Notes:

  1) To send a POST request, use scrapy.FormRequest, which makes it easy to pass form data;
  2) If the spider should send the POST request as soon as it starts, override the start_requests method and issue the POST request there.
