Scrapy data storage is really just Python talking to MySQL: an item pipeline writes the scraped data into the database.
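The pipeline below assumes an item with seven fields that line up with the columns of the book_bookinfo table (everything except the auto-increment id). The original item definition is not shown, so the field names in this items.py sketch are hypothetical:

import scrapy

class BookItem(scrapy.Item):
    # Hypothetical fields -- the pipeline passes list(item.values())
    # positionally, so the assignment order must match the column order
    name = scrapy.Field()
    author = scrapy.Field()
    press = scrapy.Field()
    price = scrapy.Field()
    pub_date = scrapy.Field()
    isbn = scrapy.Field()
    intro = scrapy.Field()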

  • In the pipelines.py pipeline file:
import pymysql
from BOOK import settings
"""
MYSQL_HOST = 'localhost'
MYSQL_DB_NAME = 'book'
MYSQL_PORT = 3306
MYSQL_USER = 'root'
MYSQL_PASSWORD = '5520'
"""

class BookPipeline(object):
    # 1. Open the connection between Python and MySQL
    def open_spider(self, spider):
        # Connect using the credentials defined in settings.py
        self.client = pymysql.connect(
            host=settings.MYSQL_HOST,
            user=settings.MYSQL_USER,
            password=settings.MYSQL_PASSWORD,
            database=settings.MYSQL_DB_NAME,
            port=settings.MYSQL_PORT,
            charset='utf8',
        )
        # Create a cursor object to execute SQL statements
        self.cur = self.client.cursor()

    # 2. Insert each scraped item into MySQL
    def process_item(self, item, spider):
        try:
            # Eight placeholders: the auto-increment id plus the seven item fields
            sql = 'insert into book_bookinfo values (%s,%s,%s,%s,%s,%s,%s,%s);'
            items_list = list(item.values())
            # A leading 0 lets MySQL generate the auto-increment primary key
            items_list.insert(0, 0)
            print(items_list)

            # Have the cursor execute the parameterized SQL
            self.cur.execute(sql, items_list)
            # Commit the transaction
            self.client.commit()
        except Exception as e:
            # Roll back the failed transaction and log the error
            self.client.rollback()
            print(e)

        return item
    # 3. Close the cursor and the MySQL connection when the spider finishes
    def close_spider(self, spider):
        self.cur.close()
        self.client.close()
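The INSERT above assumes a book_bookinfo table with eight columns: an auto-increment id plus the seven item fields. Judging by the name (Django's app_model convention) and the web page shown at the end, the table likely already exists; if it had to be created by hand, a sketch with hypothetical column names could look like this:

import pymysql

client = pymysql.connect(host='localhost', user='root', password='mysql',
                         database='book', charset='utf8')
with client.cursor() as cur:
    # Hypothetical schema -- the column count must match the eight %s placeholders
    cur.execute("""
        CREATE TABLE IF NOT EXISTS book_bookinfo (
            id INT PRIMARY KEY AUTO_INCREMENT,
            name VARCHAR(100),
            author VARCHAR(100),
            press VARCHAR(100),
            price VARCHAR(20),
            pub_date VARCHAR(20),
            isbn VARCHAR(30),
            intro TEXT
        ) DEFAULT CHARSET=utf8;
    """)
client.close()

Naming the columns explicitly in the INSERT would also make the pipeline robust if the item's field order ever changes.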

  • Configure the MySQL connection info and enable the pipeline
settings.py

ITEM_PIPELINES = {
    'BOOK.pipelines.BookPipeline': 300,
}
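# 300 is the pipeline's priority: lower numbers run earlier when
# several pipelines are enabled (valid range 0-1000)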

# MySQL database configuration
MYSQL_HOST = 'localhost'
MYSQL_DB_NAME = 'book'
MYSQL_PORT = 3306
MYSQL_USER = 'root'
MYSQL_PASSWORD = 'mysql'
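Before running the crawl, it can help to sanity-check the credentials with a standalone script (a minimal sketch using the settings above):

import pymysql

# Quick check that the MySQL settings above actually connect
client = pymysql.connect(host='localhost', port=3306, user='root',
                         password='mysql', database='book', charset='utf8')
with client.cursor() as cur:
    cur.execute('SELECT VERSION();')
    print(cur.fetchone())
client.close()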
  • Run the crawl from the terminal:
scrapy crawl book

When the crawl finishes, refresh the page on the web side and the stored records should appear.
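To confirm from the MySQL side as well, a quick row count works (same hedged assumptions as the sketches above):

import pymysql

client = pymysql.connect(host='localhost', user='root', password='mysql',
                         database='book', charset='utf8')
with client.cursor() as cur:
    cur.execute('SELECT COUNT(*) FROM book_bookinfo;')
    print('rows stored:', cur.fetchone()[0])
client.close()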
