Creating the storage table
First make sure MySQL is installed on your computer, then run the following:
mysql -h127.0.0.1 -uroot -p123456
create database maoyandb charset utf8;
use maoyandb;
create table movieinfo(
name varchar(100),
star varchar(400),
time varchar(30)
);
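The three columns are bounded VARCHAR fields, so it can help to check a row in Python before sending it to MySQL. In strict SQL mode an over-long value makes the INSERT fail. In non-strict mode it is truncated with a warning. The sketch below is a hypothetical helper: `check_row` and `COLUMN_LIMITS` are illustrative names, not part of PyMySQL. It assumes each row is a tuple of strings in column order and uses `len()`, because MySQL measures VARCHAR length in characters.

```python
# Column limits copied from the CREATE TABLE statement:
# name varchar(100), star varchar(400), time varchar(30).
# Hypothetical constant; MySQL counts VARCHAR(n) in characters, so len() on a str fits.
COLUMN_LIMITS = (("name", 100), ("star", 400), ("time", 30))


def check_row(row):
    """Return a list of problems with `row` (empty if it fits the table).

    Hypothetical helper, not part of PyMySQL: it only checks the number of
    values and their character lengths against COLUMN_LIMITS.
    """
    problems = []
    if len(row) != len(COLUMN_LIMITS):
        # A wrong value count would also break a "values(%s,%s,%s)" insert
        problems.append(f"expected {len(COLUMN_LIMITS)} values, got {len(row)}")
        return problems
    for value, (column, limit) in zip(row, COLUMN_LIMITS):
        if len(value) > limit:
            problems.append(f"{column}: {len(value)} chars > {limit}")
    return problems


print(check_row(("我不是药神", "徐峥", "2018-07-05")))  # []
print(check_row(("只有一个值",)))  # ['expected 3 values, got 1']
```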
Basic PyMySQL usage
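import pymysql

# Create the connection object (keyword arguments work across PyMySQL versions)
db = pymysql.connect(host='localhost', user='root', password='123456',
                     database='maoyandb', charset='utf8')
# Create the cursor object
cursor = db.cursor()
info_list = ['', '2021-2-12']
sql = 'insert into movieinfo values(%s,%s,%s)'
# Pass the parameters as a list; PyMySQL escapes each value into a %s placeholder
cursor.execute(sql, info_list)
# Commit the transaction so the row is actually saved
db.commit()
cursor.close()
db.close()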
Query the table to check the result:
mysql> select * from movieinfo;
±------------±------------------±----------+
| name | star | time |
±------------±------------------±----------+
|
±------------±------------------±----------+
1 rows in set (0.01 sec)
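In the example above, if `execute()` raises, nothing is rolled back and the cursor is never closed. A common DB-API 2.0 pattern is to wrap the statement in try/except/finally and roll back on error. The sketch below uses only the standard DB-API methods (`cursor()`, `execute()`, `commit()`, `rollback()`, `close()`), so it should also accept a PyMySQL connection. `insert_row` is a hypothetical name. To keep the demo runnable without a MySQL server, it uses the standard-library `sqlite3` module as a stand-in. Note that sqlite3 takes `?` placeholders where PyMySQL takes `%s`.

```python
import sqlite3


def insert_row(conn, sql, row):
    """Execute one parameterized INSERT and commit, rolling back on error.

    Hypothetical helper: `conn` can be any DB-API 2.0 connection (for example
    one returned by pymysql.connect()); only cursor(), execute(), commit(),
    rollback() and close() are used.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(sql, row)
        conn.commit()
    except Exception:
        # Undo the failed transaction, then let the caller see the error
        conn.rollback()
        raise
    finally:
        cursor.close()


# Stand-in demo with sqlite3 ("?" placeholders); with PyMySQL use %s instead.
conn = sqlite3.connect(":memory:")
conn.execute("create table movieinfo("
             "name varchar(100), star varchar(400), time varchar(30))")
insert_row(conn, "insert into movieinfo values(?,?,?)",
           ("我不是药神", "徐峥", "2018-07-05"))
print(conn.execute("select count(*) from movieinfo").fetchone()[0])  # 1
conn.close()
```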
A more efficient option is executemany(), which inserts multiple rows in a single call. For example:
import pymysql

db = pymysql.connect(host='localhost', user='root', password='123456',
                     database='maoyandb', charset='utf8')
cursor = db.cursor()
info_list = [('我不是药神', '徐峥', '2018-07-05'), ('你好,李焕英', '贾玲', '2021-02-12')]
sql = 'insert into movieinfo values(%s,%s,%s)'
# Insert every tuple in info_list with one call
cursor.executemany(sql, info_list)
db.commit()
cursor.close()
db.close()
Query the inserted rows:
mysql> select * from movieinfo;
±------------±------------------±-----------+
| name | star | time |
±------------±------------------±-----------+
2 rows in set (0.01 sec)
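The efficiency comes from batching. PyMySQL's `executemany()` docstring says it improves performance on multiple-row INSERT and REPLACE. For an `INSERT ... VALUES` statement it combines the rows into one statement instead of executing one statement per row. The sketch below imitates that idea in simplified form with a hypothetical helper, `to_multi_row_insert`, which repeats the VALUES group once per row and flattens the parameters. Unlike PyMySQL, it does not escape values itself or split very large batches.

```python
def to_multi_row_insert(sql_prefix, row_placeholder, rows):
    """Build one multi-row INSERT plus a flat parameter list.

    Hypothetical helper (not a PyMySQL API): the placeholder group is repeated
    once per row, so all rows can be sent in a single statement.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    sql = sql_prefix + ",".join([row_placeholder] * len(rows))
    # Flatten [(a, b, c), (d, e, f)] into [a, b, c, d, e, f]
    params = [value for row in rows for value in row]
    return sql, params


rows = [("我不是药神", "徐峥", "2018-07-05"), ("你好,李焕英", "贾玲", "2021-02-12")]
sql, params = to_multi_row_insert("insert into movieinfo values", "(%s,%s,%s)", rows)
print(sql)          # insert into movieinfo values(%s,%s,%s),(%s,%s,%s)
print(len(params))  # 6
```

The returned pair could then be passed to `cursor.execute(sql, params)`, which lets PyMySQL escape all six values.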
Modifying the spider
from urllib import request
import re
import time
import random
from ua_info import ua_list
import pymysql

class MaoyanSpider(object):
    def __init__(self):
        # Initialize instance attributes
        self.url = 'https://maoyan.com/board/4?offset={}'
        # Database connection object
        self.db = pymysql.connect(
            host='localhost', user='root', password='123456',
            database='maoyandb', charset='utf8')
        # Create the cursor object
        self.cursor = self.db.cursor()

    def get_html(self, url):
        headers = {'User-Agent': random.choice(ua_list)}
        req = request.Request(url=url, headers=headers)
        res = request.urlopen(req)
        html = res.read().decode()
        # Parse the page directly
        self.parse_html(html)

    def parse_html(self, html):
        re_bds = '
(.*?)
.*?class="releasetime">(.*?)