A First Look at Scrapy, Part 2 (2020-08-21)

Using the logging module

import scrapy
import logging

logger = logging.getLogger(__name__)

class QbSpider(scrapy.Spider):
    name = 'qb'
    allowed_domains = ['qiushibaike.com']
    start_urls = ['http://qiushibaike.com/']

    def parse(self, response):
        for i in range(10):
            item = {}
            item['content'] = "haha"
            # logging.warning(item)
            logger.warning(item)
            yield item

Run result (screenshot of the log output omitted)
  • Set LOG_FILE = './log.log' in settings to save warnings and errors to a log file
# Template
import logging

# level must be a real logging level (the original used an undefined log_level variable)
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                    datefmt='%a, %d %b %Y %H:%M:%S',
                    filename='parser_result.log',
                    filemode='w')


if __name__ == '__main__':
    logging.info('i am warning')

Result (this template is fairly detailed):

Fri, 21 Aug 2020 13:38:22 logfile.py[line:12] INFO i am warning
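The same format string can be exercised in memory, without writing a log file, by attaching a StreamHandler to a logger. The logger name 'demo' below is an arbitrary choice for this sketch:

```python
import io
import logging

# Capture log output in a string buffer instead of a file.
buf = io.StringIO()
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter(
    '%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
    datefmt='%a, %d %b %Y %H:%M:%S'))

logger = logging.getLogger('demo')  # arbitrary name for this sketch
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info('i am warning')
print(buf.getvalue())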
The pipeline file
import logging

logger = logging.getLogger(__name__)

class MyspiderPipeline(object):
    def process_item(self, item, spider):
        # print(item)
        logger.warning(item)
        item['hello'] = 'world'
        return item

To save logs locally, set LOG_FILE = './log.log' in the settings file.
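For reference, the relevant lines in settings.py might look like this; LOG_LEVEL is shown as an optional extra, not taken from the original:

```python
# settings.py
LOG_FILE = './log.log'   # write Scrapy's log to this file
LOG_LEVEL = 'WARNING'    # optional: keep only WARNING and above
```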
basicConfig formatting options:
https://www.cnblogs.com/felixzh/p/6072417.html

Review (diagram omitted)

How to paginate (diagram omitted)

Tencent crawler example

By scraping job postings from Tencent's recruitment pages, we learn how to implement pagination requests.
http://hr.tencent.com/position.php

Create the project
scrapy startproject tencent

Create the spider
scrapy genspider hr tencent.com

scrapy.Request essentials

scrapy.Request(url, callback=None, method='GET', headers=None, body=None,
               cookies=None, meta=None, encoding='utf-8', priority=0,
               dont_filter=False, errback=None, flags=None)

Commonly used parameters:
callback: specifies which parse function handles the response for this URL
meta: passes data between different parse functions; meta also carries some built-in information by default, such as the download delay and request depth
dont_filter: tells Scrapy's deduplication not to filter this URL; Scrapy deduplicates URLs by default, so this matters for URLs that need to be requested repeatedly
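The pagination idea itself can be sketched independently of Scrapy: compute the next page's URL from the current page index, then yield a new request for it with callback=self.parse. The offset-style 'start' query parameter below is an assumption about the target site, not taken from the original:

```python
from urllib.parse import urlencode

BASE_URL = 'https://hr.tencent.com/position.php'  # listing page

def next_page_url(page_index, page_size=10):
    """Build the URL for a zero-based page index, assuming an
    offset-style 'start' query parameter (an assumption)."""
    return BASE_URL + '?' + urlencode({'start': page_index * page_size})

# Inside a spider's parse(), one would then yield something like:
#   scrapy.Request(next_page_url(page + 1), callback=self.parse)
print(next_page_url(0))
print(next_page_url(1))
```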

Introducing and using Item

items.py
# Define the fields we need in advance, so a misspelled field name is caught when extracting data

import scrapy

class TencentItem(scrapy.Item):
    # define the fields for your item here like:
    title = scrapy.Field()
    position = scrapy.Field()
    date = scrapy.Field()
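What Item buys us is that assigning to an undeclared field fails immediately. Scrapy is not imported here; the class below is a minimal pure-Python analogue of that behavior, not Scrapy's actual implementation:

```python
class StrictItem(dict):
    """A dict that only accepts a fixed set of keys, mimicking
    how scrapy.Item rejects fields that were not declared."""
    fields = {'title', 'position', 'date'}

    def __setitem__(self, key, value):
        if key not in self.fields:
            raise KeyError(f'{key!r} is not a declared field')
        super().__setitem__(key, value)

item = StrictItem()
item['title'] = 'engineer'      # declared field: accepted
try:
    item['titel'] = 'typo'      # misspelled field: rejected
except KeyError as e:
    print('caught:', e)
```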

Sunshine Government Affairs platform example

http://wz.sun0769.com/index.php/question/questionType?type=4&page=0
