python 读取txt数据,处理后插入到mysql

背景:

  • 用python尝试写爬某站的数据,获取后存进了txt文件里,格式是: uid__昵称__性别__粉丝数__签名;
  • 后来想着把这些数据插进mysql,方便数据处理。于是就有了以下处理。

步骤:

  • 数据库手工建单表:id自增、主键,新增一列is_followed(1/0)标识bool字段,粉丝数int,其他varchar
  • wb_db.py文件中封装mysql类:
#encoding=utf-8
import pymysql

class Mysql(object):
    def __init__(self):
        self.__db_host = "192.168.1.8"
        self.__db_port = 3306
        self.__db_user = "ys"
        self.__db_psw  = ""
        self.__db_database = "my"

    def db_connection(self):
        self.__connection = pymysql.connect(
                                            host=self.__db_host,
                                            port=self.__db_port,
                                            user=self.__db_user,
                                            passwd=self.__db_psw,
                                            db=self.__db_database,
                                            charset='utf8'
                                            )
        # self.
        self.__connection.autocommit(True)
    def execute_no_query(self,command_text,parameters=None):
        effectRow = 0
        try:
            self.db_connection()
            cursor = self.__connection.cursor()
            effectRow = cursor.execute(command_text,parameters)
        except Exception as e:
            print(e)
        finally:
            cursor.close()
            return effectRow

    def db_close(self):
        if hasattr(self, 'conn') and self.__conn:
            self.__conn.close()
  •  wb.py文件中处理如下:
  1. 从本地txt文件逐行读取
  2. 拆分行,并插入到数据库
# coding: utf-8
import requests,sys,os,time,json
from wb_db import Mysql

file_path =os.path.join(sys.path[0],"test.txt")

file=open(file_path,'r',encoding='utf-8')
for line in file.readlines():
    uid="null"
    screen_name="null"
    gender="null"
    fans_no=0
    description="null"
    is_followed = "0"
    if line:
        list=[]
        list=line.encode('utf-8').decode('utf-8-sig').strip('\n').split("________")
        uid = list[0] if len(list[0])>0 else "null"
        screen_name = list[1] if len(list[1])>0 else "null"
        gender = list[2] if list[2] in ['f','m'] else 'n'
        fans_no = int(list[3]) if type(int(list[3])).__name__=='int' else 0
        description = list[4] if len(list[4])>0 else "null"
        sql = """
            insert into wbuser (uid,screen_name,gender,fans_no,description,is_followed) VALUES ('%s','%s','%s','%d','%s','%s')"""

        try:
            mysql = Mysql()
            mysql.execute_no_query(sql % (uid,screen_name,gender,fans_no,description,is_followed))
        except Exception as e:
            print(e)
        finally:
            mysql.db_close()
    else:
        break
  • 遇到的问题总结:
  1. 写入txt为utf8,逐行读取后开头有 \ufeeff,字符编码问题,处理:读取时把行line.encode('utf-8').decode('utf-8-sig') 
  2. 写入txt时加了换行,读取后末尾有 \n,处理:line.encode('utf-8').decode('utf-8-sig').strip('\n')
  3. sql insert语句values后的占位符需要加引号。
---END---

你可能感兴趣的:(mysql,python)