pip install scrapy
If any library's version is too low, upgrade it, for example:
pip install --upgrade cryptography
pip install --upgrade zope-interface
Once finished, it looks as shown in the figure.
Create a folder named fund, and inside it create a project named ifund:
scrapy startproject ifund
You can see the structure inside ifund:
items.py: defines the data model for the scraped items
middlewares.py: defines the spider and downloader middlewares
pipelines.py: the pipeline file, responsible for processing the data the spider returns
settings.py: the project's settings; for priority-valued settings (such as ITEM_PIPELINES), a smaller number means a higher priority
scrapy.cfg: the basic Scrapy configuration
Source: https://blog.csdn.net/qq_42543250/article/details/81347368
Generate a spider named ifund_spider, restricted to the domain fund.eastmoney.com (genspider expects a bare domain, not a URL; passing a URL would put the scheme into allowed_domains, which then never matches):
scrapy genspider ifund_spider fund.eastmoney.com
In items.py, define the fields to scrape, for example the fund name, fund code, and the day's net asset value:
# Define here the models for your scraped items
#
# See documentation in:
# https://docs.scrapy.org/en/latest/topics/items.html
import scrapy


class IfundItem(scrapy.Item):
    fund_name = scrapy.Field()   # fund name
    fund_code = scrapy.Field()   # fund code
    neat_worth = scrapy.Field()  # the day's net asset value
How to install the XPath Helper browser extension:
https://blog.csdn.net/qq1773304209/article/details/105856782
XPath basics:
https://blog.csdn.net/seanblog/article/details/78785402
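Before writing the spider, it helps to see how these attribute-and-position predicates behave. The snippet below runs the same style of query against a tiny, made-up fragment shaped roughly like the fund page's title block (the real fund.eastmoney.com markup may differ); Python's standard-library ElementTree supports just enough XPath for the illustration (note it has no text() step, so we read the .text attribute instead):

```python
import xml.etree.ElementTree as ET

# A made-up fragment, loosely shaped like the fund page's title block.
# The real fund.eastmoney.com markup may differ.
snippet = """
<body>
  <div class="fundDetail-tit">
    <div>Example Fund <span class="ui-num">000001</span></div>
  </div>
</body>
"""
root = ET.fromstring(snippet)

# //div[@class="fundDetail-tit"]/div[1]: attribute predicate plus position
title_div = root.find(".//div[@class='fundDetail-tit']/div[1]")
fund_name = title_div.text.strip()  # the text node before the <span>
fund_code = title_div.find("span[@class='ui-num']").text

print(fund_name, fund_code)  # Example Fund 000001
```

The same predicates appear in the real spider below, only there they run through Scrapy's response.xpath(), which supports full XPath.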
import scrapy

from ifund.items import IfundItem


class IfundSpiderSpider(scrapy.Spider):
    name = 'ifund_spider'
    # allowed_domains takes bare domains, not URLs with a scheme
    allowed_domains = ['fund.eastmoney.com']
    start_urls = ['http://fund.eastmoney.com/000001.html']

    def parse(self, response):
        # Fill in fund_name, fund_code and neat_worth for one fund page
        item = IfundItem()
        item["fund_name"] = response.xpath('//div[@class="fundDetail-tit"]//div[1]/text()').extract_first()
        item["fund_code"] = response.xpath('//div[@class="fundDetail-tit"]//div[1]//span[@class="ui-num"]/text()').extract_first()
        item["neat_worth"] = response.xpath('//dl[2]//dd[@class="dataNums"]//span[1]/text()').extract_first()
        yield item
Next, save the scraped data to a txt file in the pipeline.
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://docs.scrapy.org/en/latest/topics/item-pipeline.html
# useful for handling different item types with a single interface
from itemadapter import ItemAdapter


class IfundPipeline:
    def process_item(self, item, spider):
        # Open in append mode ('a'), not 'w': process_item runs once per
        # item, and 'w' would overwrite the file each time
        with open("test_code.txt", 'a') as f:
            f.write(item["fund_name"])
            f.write("\n")
            f.write(item["fund_code"])
            f.write("\n")
            f.write(item["neat_worth"])
            f.write("\n")
        return item
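As the comment above reminds us, the pipeline only runs once it is enabled in settings.py. A typical entry looks like this (300 is the conventional default priority; if several pipelines are enabled, items pass through them in order of ascending number):

```python
# In ifund/settings.py
ITEM_PIPELINES = {
    'ifund.pipelines.IfundPipeline': 300,
}
```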
7. Run the spider in the terminal:
scrapy crawl ifund_spider
After the run finishes, a new file test_code.txt appears in the current folder.
Open it: sure enough, it contains exactly the data we wanted.
References:
https://blog.csdn.net/qq_30108237/article/details/105878106
https://blog.csdn.net/seanblog/article/details/78885914