Scraping book information from JD.com with Scrapy

The goal is to scrape the names and prices of all Python books listed on JD.com (京东商城).

The JDBookSpider code is as follows:

# -*- coding: utf-8 -*-

import scrapy
from scrapy import Request
from scrapy_splash import SplashRequest


# Lua script executed by Splash: load the page, scroll the pagination bar into
# view so that the items loaded on scroll are rendered, then return the HTML.
lua_script = '''
function main(splash)
    splash:go(splash.args.url)
    splash:wait(2)
    splash:runjs("document.getElementsByClassName('page')[0].scrollIntoView(true)")
    splash:wait(2)
    return splash:html()
end
'''


class JDBookSpider(scrapy.Spider):
    """Collect the name and price of every Python book in JD's search results."""

    name = "jd_book"
    allowed_domains = ["search.jd.com"]
    base_url = 'https://search.jd.com/Search?keyword=python&enc=utf-8&book=y&wq=python'

    def start_requests(self):
        # Fetch the first result page with a plain Request just to read the
        # total number of results; Splash rendering is not needed for that.
        yield Request(self.base_url, callback=self.parse_urls, dont_filter=True)


    def parse_urls(self, response):
        # Total number of results; strip a possible trailing '+' before converting.
        total = int(response.css('span#J_resCount::text').extract_first().strip('+'))
        # Each result page contains 60 books.
        pageNum = total // 60 + (1 if total % 60 else 0)

        for i in range(pageNum):
            # JD's search results use odd page numbers: 1, 3, 5, ...
            url = '%s&page=%s' % (self.base_url, 2 * i + 1)
            # Render each result page with Splash, reusing the cached Lua script.
            yield SplashRequest(url,
                                endpoint='execute',
                                args={'lua_source': lua_script},
                                cache_args=['lua_source'])


    def parse(self, response):
        # Each book is a <li class="gl-item"> inside the result list.
        for sel in response.css('ul.gl-warp.clearfix > li.gl-item'):
            yield {
                # string(.//em) concatenates all text inside the <em> title
                # element, even when the keyword is wrapped in extra tags.
                'name': sel.css('div.p-name').xpath('string(.//em)').extract_first(),
                'price': sel.css('div.p-price i::text').extract_first(),
            }



In settings.py, set a USER_AGENT:
USER_AGENT = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko)'
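
Because the spider renders pages through Splash via scrapy-splash, settings.py also needs the standard scrapy-splash configuration. A minimal sketch, assuming a Splash instance is running locally on port 8050 (adjust SPLASH_URL to wherever your Splash service actually listens):

# scrapy-splash configuration (sketch; SPLASH_URL is an assumption about
# where the Splash service runs).
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'

With the settings in place, run the spider and export the scraped items, for example with scrapy crawl jd_book -o books.csv.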
