Scrapy pitfalls: a summary

Crawling beyond the first page: following pages 2, 3, and so on

Reference: Python + Scrapy: Scraping the Douban Movie Top 250
http://www.jianshu.com/p/62e0a588ee0d

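    # Pagination (this goes inside the spider's parse(self, response) method)
    next_page = response.xpath('//span[@class="next"]/a/@href')
    if next_page:
        # The href may be relative, so resolve it against the current page URL
        url = response.urljoin(next_page[0].extract())
        yield scrapy.Request(url, callback=self.parse)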
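
The `response.urljoin` call matters because the `href` of a next-page link is often relative. As far as I know, it resolves the link much like the standard library's `urljoin` with the page URL as the base (for HTML responses Scrapy also takes a `<base>` tag into account). A minimal sketch of that resolution; the base URL and hrefs below are made-up values for illustration, not taken from the referenced tutorial:

```python
from urllib.parse import urljoin

# Hypothetical page URL, for illustration only.
base = "http://example.com/top250?start=0"


def resolve(href):
    """Resolve a (possibly relative) href against the current page URL."""
    return urljoin(base, href)


# A query-only href keeps the path and replaces the query string.
query_only = resolve("?start=25&filter=")
# A relative path replaces the last path segment of the base.
relative = resolve("page/2")
# An absolute URL is returned unchanged.
absolute = resolve("http://example.com/other")

print(query_only, relative, absolute, sep="\n")
```

Passing the raw `href` to `scrapy.Request` without resolving it first fails for relative links, since a request needs a full URL with a scheme.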

If the next-page link is generated by JavaScript, you can combine Scrapy with Selenium (slow).

Reference:
selenium with scrapy for dynamic page
http://stackoverflow.com/questions/17975471/selenium-with-scrapy-for-dynamic-page
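
A rough sketch of the Selenium approach, assuming the next-page link uses the same `span.next` markup as the Scrapy example above. The function names `next_page_url` and `rendered_pages`, the `max_pages` limit, and the example URL are hypothetical. The helpers only rely on the WebDriver methods `get`, `find_elements`, `page_source`, and `current_url`, so any driver object can be passed in:

```python
from urllib.parse import urljoin

# XPath of the next-page anchor (assumed to match the markup shown earlier).
NEXT_LINK_XPATH = '//span[@class="next"]/a'


def next_page_url(driver, xpath=NEXT_LINK_XPATH):
    """Return the absolute URL of the next-page link, or None on the last page."""
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH.
    links = driver.find_elements("xpath", xpath)
    if not links:
        return None
    href = links[0].get_attribute("href")
    return urljoin(driver.current_url, href) if href else None


def rendered_pages(driver, start_url, max_pages=10):
    """Yield (url, html) for each page after the browser has loaded it."""
    url = start_url
    for _ in range(max_pages):
        driver.get(url)
        yield url, driver.page_source
        url = next_page_url(driver)
        if url is None:
            break


# Usage with a real browser (needs the selenium package and a browser driver):
#   from selenium import webdriver
#   driver = webdriver.Firefox()
#   try:
#       for url, html in rendered_pages(driver, "http://example.com/list"):
#           print(url, len(html))
#   finally:
#       driver.quit()
```

The returned HTML can be parsed with `scrapy.Selector(text=html)`. If the link only appears some time after the page loads, an explicit wait (for example Selenium's `WebDriverWait`) is probably needed before reading `page_source`. Starting a browser for every page is what makes this approach slow.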

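If the next-page link is generated by JavaScript, another option is ScrapyJS (since renamed scrapy-splash), which renders pages through Splash.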

Scraping dynamic content using python-Scrapy
http://stackoverflow.com/questions/30345623/scraping-dynamic-content-using-python-scrapy
Using Splash to handle page JavaScript in a Scrapy crawler
http://ae.yyuap.com/pages/viewpage.action?pageId=919763
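
For orientation, these are the `settings.py` entries that the scrapy-splash README asks for, as best I recall them; check them against the version you install. `SPLASH_URL` assumes a Splash instance listening on localhost port 8050, which is an assumption about your setup:

```python
# settings.py fragment for scrapy-splash (formerly ScrapyJS).

# Address of the running Splash service (assumed local instance).
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that route requests through Splash.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Spider middleware that deduplicates repeated Splash arguments.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Splash-aware duplicate filter and HTTP cache storage.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

With these in place, the spider yields `scrapy_splash.SplashRequest(url, self.parse, args={'wait': 0.5})` instead of `scrapy.Request`, so the response passed to `parse` contains the HTML after JavaScript has run. The `wait` value here is only an example.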
