Crawling today's Baidu hot news with a Python 3 script and storing it in MySQL

Goal

Write a Python 3.x script that crawls today's hot news from Baidu News and stores it in a MySQL database.

Environment setup

  1. Install pymysql:
    pip install pymysql
  2. Download and install MySQL 5.x

Knowledge points

  1. Python 3.x talks to MySQL through pymysql; it can be installed with pip install pymysql
  2. pymysql usage: http://www.runoob.com/python3/python3-mysql.html
  3. A few handy MySQL commands:
    show databases;
    use test;
    show tables;
    drop table table_name;
  4. Formatting a timestamp: time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
  5. Creating a table only when it does not already exist:
    sql = """CREATE TABLE IF NOT EXISTS %s (
    text VARCHAR(200),
    time VARCHAR(200),
    date VARCHAR(200))""" % (table,)
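As a quick check of points 4 and 5, the snippet below (standard library only) formats the current time both ways and builds the CREATE TABLE statement; the table name table1 is just an illustrative value.

```python
import time

# format the current time the two ways the script uses
ts = time.localtime(time.time())
news_date = time.strftime('%Y-%m-%d', ts)           # date only
news_time = time.strftime('%Y-%m-%d %H:%M:%S', ts)  # date and time

# build the CREATE TABLE statement from point 5
table = 'table1'  # illustrative table name
sql = """CREATE TABLE IF NOT EXISTS %s (
text VARCHAR(200),
time VARCHAR(200),
date VARCHAR(200))""" % (table,)

print(news_date)
print(news_time)
print(sql)
```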

Code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

from urllib.error import URLError, HTTPError
import urllib.request
import re
import time
import pymysql

class GetAndStore:
    def __init__(self):
        pass

    # fetch the page and print today's hot news
    def printHotNews(self, url):
        content = urllib.request.urlopen(url).read().decode('gbk')
        # NOTE: the original regex was garbled when this post was extracted;
        # the pattern below, which captures the anchor text of the hot-news
        # links, is only a plausible reconstruction
        pattern = re.compile(r'<a [^>]*mon="a=1"[^>]*>(.*?)</a>')
        hotNews = re.findall(pattern, content)
        for i in hotNews:
            print(i)
        return hotNews

    # function can be reused
    def storeDB(self, table, news):
        news_date = time.strftime('%Y-%m-%d', time.localtime(time.time()))
        news_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
        # test: insert only one record
        text = "'" + news[0] + "'"      # Chinese characters and symbols
        time_now = "'" + news_time + "'"
        date = "'" + news_date + "'"
        # connect to MySQL
        conn = pymysql.connect(
            host='127.0.0.1',
            port=3306,
            user='root',
            passwd='root',
            db='test',
            use_unicode=1,
            charset='utf8')
        try:
            with conn.cursor() as cursor:
                # create the table if it does not already exist
                sql = """CREATE TABLE IF NOT EXISTS %s (
                text VARCHAR(200),
                time VARCHAR(200),
                date VARCHAR(200))""" % (table,)
                cursor.execute(sql)
                # insert a new record
                sql = "INSERT INTO %s (%s,%s,%s) VALUES (%s,%s,%s)" % (
                    table, 'text', 'date', 'time', text, date, time_now)
                cursor.execute(sql)
                # the connection does not autocommit by default, so commit
                # to save the changes
                conn.commit()
            with conn.cursor() as cursor:
                # read back all records
                sql = "SELECT * FROM %s" % (table,)
                cursor.execute(sql)
                result = cursor.fetchall()
                print(result)
        finally:
            conn.close()

if __name__ == "__main__":
    url = 'http://news.baidu.com/'
    instance1 = GetAndStore()   # create an instance
    try:
        response = urllib.request.urlopen(url)
    except HTTPError as e:      # HTTP error
        print('Error code: ', e.code)
    except URLError as e:       # URL error
        print('Reason: ', e.reason)
    else:
        # printHotNews is called twice here (once directly, once to feed
        # storeDB), which is why the headline list appears twice in the output
        instance1.printHotNews(url)
        instance1.storeDB("table1", instance1.printHotNews(url))
Script output

    E:\github_projects\python-crawler>python 2_get_hotNews_and_store_data_in_mysqldb.py
    习近平:扶贫工作不搞层层加码
    7月20日的国务院常务会定了这3件大事
    《寒战2》3D电影引争议 特效渣渣有圈钱之嫌
    
    习近平:扶贫工作不搞层层加码
    7月20日的国务院常务会定了这3件大事
    《寒战2》3D电影引争议 特效渣渣有圈钱之嫌
    
    (('习近平:扶贫工作不搞层层加码', '2016-07-22 03:12:12', '2016-07-22'),)          
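One caveat with the script above: it quotes values by hand and splices them into the SQL string, which breaks as soon as a headline contains a single quote (and invites SQL injection). Parameterized queries are safer. The sketch below illustrates the pattern with an in-memory sqlite3 database so it runs without a MySQL server; with pymysql the placeholder is %s instead of ?, but the idea is the same.

```python
import sqlite3

# in-memory database stands in for the MySQL server used by the script
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS table1 "
            "(text VARCHAR(200), time VARCHAR(200), date VARCHAR(200))")
# placeholders let the driver handle quoting, even for headlines with quotes
cur.execute("INSERT INTO table1 (text, date, time) VALUES (?, ?, ?)",
            ("习近平:扶贫工作不搞层层加码", "2016-07-22", "2016-07-22 03:12:12"))
conn.commit()
cur.execute("SELECT * FROM table1")
rows = cur.fetchall()
print(rows)
conn.close()
```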

References:

    http://stackoverflow.com/questions/14011160/how-to-use-python-mysqldb-to-insert-many-rows-at-once

    http://www.mysqltutorial.org/python-mysql-insert/
