Python Web Scraping and Visualization

I. Approach

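This article scrapes a Bitcoin trading site (https://www.ibtctrade.com/) to collect each coin's price, CNY value, market cap, and so on, then exports the results to Excel and to a SQLite database. A visualization site is then built with the Flask framework in PyCharm: it reads the data back from SQLite, renders simple web pages, and draws line charts, bar charts, scatter plots, and more.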

II. Data Scraping

1. Import the libraries

The code is as follows:

from bs4 import BeautifulSoup
import re
import urllib.error,urllib.request
import xlwt
import sqlite3

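2. Fetch the target pages

The code is as follows:

# The site spreads its data over 27 pages; appending the page number and '.html' to this prefix gives each page's URL
baseURL = 'https://www.ibtctrade.com/cryptocurrency/p_'

def askurl(url):
    # Send a browser-like User-Agent header with the request
    head = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
    request = urllib.request.Request(url, headers=head)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode('utf-8')
    except urllib.error.URLError as e:
        # Report the failure and return an empty page instead of aborting the whole crawl
        print(e)
    # print(html)
    return html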

This step uses urllib to request each page's data over the network.
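A minimal sketch of how the 27 page URLs could be generated and fetched one at a time. The helper names `page_urls` and `fetch_page`, the shortened User-Agent, the timeout, and the choice to return an empty string on failure are all assumptions made for illustration, not part of the original script.

```python
import urllib.error
import urllib.request

BASE_URL = 'https://www.ibtctrade.com/cryptocurrency/p_'
HEADERS = {'user-agent': 'Mozilla/5.0'}  # shortened User-Agent, for illustration only


def page_urls(base=BASE_URL, pages=27):
    """Return the URL of every listing page: p_1.html through p_27.html."""
    return [base + str(i) + '.html' for i in range(1, pages + 1)]


def fetch_page(url, timeout=10):
    """Fetch one page and return its text, or '' if the request fails."""
    request = urllib.request.Request(url, headers=HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode('utf-8')
    except urllib.error.URLError as e:
        # HTTPError is a subclass of URLError, so HTTP error statuses land here too
        print('request failed:', url, e)
        return ''
```

Calling `fetch_page(url)` for each entry of `page_urls()` would retrieve the pages in order; a short pause between requests is a common courtesy toward the site.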

2. Parse the pages

The code is as follows:

# The HTML tags inside these patterns are missing from this listing
# (the last four apparently wrapped list items); copy the real tags
# from the page source before running.
findjname = re.compile(r'(.*?)')
findname = re.compile(r'(.*?)')
findnewprice = re.compile(r'\n(.*?)', re.S)
findtwofourzhangdie = re.compile(r'(.*?)', re.S)
# findtwofourdie = re.compile(r'(.*?)', re.S)
findcny = re.compile(r'\n(.*?)', re.S)
findshizhi = re.compile(r'\n(.*?)', re.S)

def getdata(baseURL):
    datalist = []
    for i in range(1, 28):
        url = baseURL + str(i) + '.html'
        html = askurl(url)
        soup = BeautifulSoup(html, 'html.parser')
        # Each matched <a> element holds one coin's row
        for item in soup.select('.content>a'):
            data = []
            # print(item)
            item = str(item)
            jname = re.findall(findjname, item)[0]
            data.append(jname)
            name = re.findall(findname, item)[0]
            data.append(name)
            # print(data)
            newprice = re.findall(findnewprice, item)[0]
            data.append(newprice.strip())
            twofourzhangdie = re.findall(findtwofourzhangdie, item)[0]
            data.append(twofourzhangdie.strip())
            cny = re.findall(findcny, item)[1]
            data.append(cny.strip())
            shizhi = re.findall(findshizhi, item)[2]
            data.append(shizhi.strip())
            datalist.append(data)
    # print(datalist)
    return datalist

    Regular expressions are used to filter and clean the data.
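To illustrate how one pattern can yield several fields that are then picked by index, here is a small sketch run against made-up markup. The HTML snippet, its tag and class names, and the names `parse_row`, `FIND_ABBR`, `FIND_FULL`, `FIND_BLOCK`, and `FIND_CHANGE` are all hypothetical; the real page's markup will differ.

```python
import re

# Hypothetical markup for one coin's row; the real tags and classes may differ
SAMPLE_ITEM = """<a href="/coin/btc">
<span class="abbr">BTC</span><span class="full">Bitcoin</span>
<ul>
<li>
$43000.5
</li>
<li>+1.25%</li>
<li>
¥310000
</li>
<li>
$8000亿
</li>
</ul>
</a>"""

FIND_ABBR = re.compile(r'<span class="abbr">(.*?)</span>')
FIND_FULL = re.compile(r'<span class="full">(.*?)</span>')
# re.S lets '.' cross line breaks, so this matches list items whose text
# starts on the line after the opening tag
FIND_BLOCK = re.compile(r'<li>\n(.*?)</li>', re.S)
# Without re.S, '.' stops at a line break, so only single-line items match
FIND_CHANGE = re.compile(r'<li>(.*?)</li>')


def parse_row(item):
    """Extract the six fields of one row as stripped strings."""
    blocks = FIND_BLOCK.findall(item)  # price, CNY value, market cap, in page order
    return [
        FIND_ABBR.findall(item)[0],
        FIND_FULL.findall(item)[0],
        blocks[0].strip(),
        FIND_CHANGE.findall(item)[0].strip(),
        blocks[1].strip(),
        blocks[2].strip(),
    ]


print(parse_row(SAMPLE_ITEM))
```

Indexing into the list returned by `findall` only works while the page keeps the fields in the same order, so a layout change on the site would silently shift the columns.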

    3. Save the data to Excel

    The code is as follows:

    path = "比特币简易数据.xls"
    dbpath = "比特币.db"
        # askurl(baseURL)
    def savedata(datalist, path):
        print('Saving...')
        book = xlwt.Workbook(encoding='utf-8', style_compression=0)
        sheet = book.add_sheet('比特币数据', cell_overwrite_ok=True)
        # Column headers: abbreviation, full name, latest price, 24h change, 24h turnover, market cap
        col = ('简称','全称','最新价格','24H涨跌幅','24H成交额','市值')
        for i in range(0, 6):
            sheet.write(0, i, col[i])
        # Write every scraped row; a fixed range(0, 700) raises IndexError when fewer rows were scraped
        for i, data in enumerate(datalist):
            for j in range(0, 6):
                sheet.write(i+1, j, data[j])
        book.save(path)
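As an alternative sketch, the same header-plus-rows layout can be written with the standard library's `csv` module when xlwt is not installed. This swaps the .xls format for CSV, so it is not the method used in this article; the function names `write_csv` and `save_csv` and the English column names are assumptions for illustration.

```python
import csv
import io

# English column names chosen for this sketch
HEADER = ('abbreviation', 'full_name', 'latest_price',
          'change_24h', 'turnover_24h', 'market_cap')


def write_csv(datalist, stream):
    """Write the header row followed by one line per scraped row."""
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    writer.writerows(datalist)


def save_csv(datalist, path):
    """Save the rows to a CSV file that spreadsheet programs can open."""
    # newline='' is what the csv module expects; utf-8-sig helps Excel detect the encoding
    with open(path, 'w', newline='', encoding='utf-8-sig') as f:
        write_csv(datalist, f)


# Example: render one made-up row in memory instead of on disk
buffer = io.StringIO()
write_csv([['BTC', 'Bitcoin', '$1', '+1%', '¥7', '$2亿']], buffer)
print(buffer.getvalue())
```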
    

    4. Save the data to a SQLite database

    The code is as follows:

    dbpath = "比特币.db"

    def savedb(datalist, dbpath):
        init_db(dbpath)
        conn = sqlite3.connect(dbpath)
        cur = conn.cursor()
        # ? placeholders let sqlite3 quote the values, so a quote inside a coin name cannot break the statement
        sql = """
            insert into bitebi750
            (jname, name, newprice, twofourzhangdie, cny, shizhi)
            values (?, ?, ?, ?, ?, ?)"""
        for data in datalist:
            cur.execute(sql, data)
        conn.commit()
        cur.close()
        conn.close()


    def init_db(dbpath):
        # "if not exists" allows the script to be run again on an existing database file
        sql = '''
            create table if not exists bitebi750
                (id integer primary key autoincrement,
                    jname text,
                    name text,
                    newprice text,
                    twofourzhangdie text,
                    cny text,
                    shizhi text)
        '''
        conn = sqlite3.connect(dbpath)
        cursor = conn.cursor()
        cursor.execute(sql)
        conn.commit()
        conn.close()
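A minimal sketch of a batch insert using `?` placeholders and `executemany`, run against an in-memory database so it can be tried without touching 比特币.db. The function name `save_rows` and the sample rows are made up for illustration; the table and column names are the ones used in this article.

```python
import sqlite3


def save_rows(conn, rows):
    """Create the table if needed and insert all rows in one batch."""
    conn.execute('''
        create table if not exists bitebi750
            (id integer primary key autoincrement,
             jname text, name text, newprice text,
             twofourzhangdie text, cny text, shizhi text)
    ''')
    # sqlite3 binds each value to its ? placeholder, so quotes inside a
    # value are stored as data and never interpreted as SQL
    conn.executemany(
        'insert into bitebi750 '
        '(jname, name, newprice, twofourzhangdie, cny, shizhi) '
        'values (?, ?, ?, ?, ?, ?)',
        rows,
    )
    conn.commit()


demo = sqlite3.connect(':memory:')
save_rows(demo, [['BTC', 'Bit"coin', '$1', '+1%', '¥7', '$2亿']])
print(demo.execute('select jname, name from bitebi750').fetchall())
demo.close()
```

Building the statement by string concatenation would fail on the sample name above because of the embedded double quote; binding parameters avoids that class of error.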
    
    

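    III. Visualization with the Flask framework

    app.py

    app.py reads the data from the SQLite database, processes it, and passes each chart the values it needs as template parameters. The HTML for each page is too long to include here; adapt the pages from index.html as needed.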

    from flask import Flask,render_template
    import sqlite3
    app = Flask(__name__)
    
    
    @app.route('/')
    def index():
        return render_template('index.html')
    @app.route('/shuju')
    def e():
        datalist = []
        con = sqlite3.connect("比特币.db")
        cur = con.cursor()
        sql = "select*from bitebi750"
        data = cur.execute(sql)
        for item in data:
            datalist.append(item)
        cur.close()
        con.close()
        return render_template('shuju.html',movies = datalist)
    
    @app.route('/zhangdie')
    def zhangdie():
        num = []
        sum = []
        con = sqlite3.connect("比特币.db")
        cur = con.cursor()
        sql = "select jname,twofourzhangdie from bitebi750 limit 0,70"
        data = cur.execute(sql)
        for item in data:
            num.append(str(item[0]))
            sum.append(float(item[1][:-1]))
        cur.close()
        con.close()
        return render_template("zhangdie.html",num = num ,sum = sum)
    @app.route('/wordcloud')
    def wordcloud():
        return render_template('wordcloud.html')
    
    
    @app.route('/qujian')
    def qujian():
        num = []
        sum = []
        con = sqlite3.connect("比特币.db")
        cur = con.cursor()
        sql = "select jname,newprice from bitebi750 limit 0,15"
        data = cur.execute(sql)
        for item in data:
            num.append(str(item[0]))
            sum.append(float(item[1][1:]))
        cur.close()
        con.close()
    
        return render_template('qujian.html',num = num ,sum = sum)
    
    @app.route('/sandian')
    def sandian():
        num = []
        sum = []
        yum = []
        con = sqlite3.connect("比特币.db")
        cur = con.cursor()
        sql = "select jname,twofourzhangdie,shizhi from bitebi750 limit 0,50"
        data = cur.execute(sql)
        for item in data:
            num.append(str(item[0]))
            sum.append(float(item[1][:-1]))
            yum.append(float(item[2][1:-1]))
        cur.close()
        con.close()
        return render_template('sandian.html',num = num ,sum = sum ,yum =yum)
    @app.route('/shuliang')
    def shuliang():
        # Counters for the market-cap intervals shown on the page
        q = 0
        w = 0
        e = 0
        r = 0
        t = 0
        y = 0
        u = 0

        sum = []
        con = sqlite3.connect("比特币.db")
        cur = con.cursor()
        # First 204 rows
        sql = "select jname,shizhi from bitebi750 limit 0,204"
        data = cur.execute(sql)
        for item in data:
            # Strip the first and last characters (the non-numeric prefix and suffix) before converting
            sum.append(float(item[1][1:-1]))
        for i in sum:
            if i>500 and i<1000:
                q += 1
            elif i>100 and i<500:
                w+=1
            elif i>1 and i<100:
                e+=1
        # Remaining rows: start from an empty list so the first 204 values are not counted again,
        # and count once after all rows are read rather than once per row
        sum = []
        sql = "select jname,shizhi from bitebi750 limit 204,700"
        data = cur.execute(sql)
        for item in data:
            sum.append(float(item[1][1:-1]))
        for i in sum:
            if i>100 and i<=1000:
                r+=1
            elif i>1000 and i<9999:
                y+=1
            elif i > 1 and i < 10:
                t+=1
            elif i > 10 and i < 100:
                u+=1

        cur.close()
        con.close()
        return render_template("shuliang.html",q=q,w=w,e=e,r=r,t=t,y=y,u=u)
    if __name__ == '__main__':
        app.run()
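Every route in app.py repeats the same connect, query, convert, and close steps. The sketch below shows one way that pattern could be factored out; the helper names `fetch_rows` and `to_number` and their parameters are assumptions for illustration and do not appear in the original app.py.

```python
import sqlite3
from contextlib import closing


def fetch_rows(dbpath, sql, params=()):
    """Run one query and return all rows; the connection is closed afterwards."""
    with closing(sqlite3.connect(dbpath)) as con:
        return con.execute(sql, params).fetchall()


def to_number(text, strip_left=0, strip_right=0):
    """Convert a scraped string to float after trimming characters from each end.

    to_number('+1.25%', strip_right=1) mirrors float(text[:-1]);
    to_number('$8000亿', 1, 1) mirrors float(text[1:-1]).
    """
    end = len(text) - strip_right
    return float(text[strip_left:end])


# Example with an in-memory database and literal values instead of 比特币.db
rows = fetch_rows(':memory:', 'select ?, ?', ('BTC', '+1.25%'))
names = [str(row[0]) for row in rows]
changes = [to_number(row[1], strip_right=1) for row in rows]
print(names, changes)
```

A route could then build its lists from `fetch_rows("比特币.db", ...)` and pass them to `render_template` as before, which keeps the parsing rules in one place.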
    
    
    

    index.html

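    Mamba Bootstrap Template - Index

    Services

    • Data overview: 741 records were collected for analysis
    • Price change by coin: follow the policy and never look back
    • Number of coins per market-cap interval: surely a normal distribution again
    • Most competitive coins: see which one is the strongest
    • Market cap versus price change for popular coins: if it is popular, it is bound to rise
    • Coin-name word cloud: guess which word is the biggest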


    Our Team

    • xiangbo zhu: team leader
    • Amanda Jepson

    Contact Us

    Address: 江大长山校区文理大楼数据分析实验室 (Data Analysis Lab, Arts and Sciences Building, Changshan Campus)

    Designed by BootstrapMade

    qujian.html

    The rest of the page is omitted; only the main part is shown.

    Bitcoin Data Display

    shuju.html

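    {# Bare table tags only; the original attributes and styling are not preserved #}
    <table>
        <tr>
            <th>Rank</th> <th>Abbreviation</th> <th>Full name</th> <th>Current price</th> <th>24h change</th> <th>Turnover</th> <th>Market cap</th>
        </tr>
        {% for movie in movies %}
        <tr>
            <td>{{ movie[0] }}</td> <td>{{ movie[1] }}</td> <td>{{ movie[2] }}</td> <td>{{ movie[3] }}</td> <td>{{ movie[4] }}</td> <td>{{ movie[5] }}</td> <td>{{ movie[6] }}</td>
        </tr>
        {% endfor %}
    </table>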

    shuliang.html

    Bitcoin Data Display
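The interval counts that shuliang.html displays are computed in the `shuliang` route of app.py with chains of comparisons. As a sketch of the same idea, the counting can also be done with `bisect` over a sorted list of interval edges. The function name `count_by_interval`, the sample values, and the use of half-open intervals are assumptions for illustration; half-open intervals count a value that sits exactly on an edge, such as 100 or 500, once, which differs from the strict comparisons in the route.

```python
from bisect import bisect_right
from collections import Counter


def count_by_interval(values, edges):
    """Count values per half-open interval [edges[k], edges[k+1]).

    `edges` must be sorted in ascending order. Values below the first
    edge or at/above the last edge are ignored.
    """
    counts = Counter()
    for v in values:
        k = bisect_right(edges, v) - 1  # index of the interval's lower edge
        if 0 <= k < len(edges) - 1:
            counts[(edges[k], edges[k + 1])] += 1
    return counts


# Made-up market-cap values, in whatever unit the column uses
sample = [5, 50, 100, 500, 999, 1000]
print(count_by_interval(sample, [1, 100, 500, 1000]))
```

The resulting counts can be passed to the template in place of the individual counter variables.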

    zhangdie.html

        

    Bitcoin Data Display

    wordcloud.html

        

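    Word cloud

    The word cloud is built from the coin names. Words such as network, 币 (coin), 比特 (bit), coin, and chain appear most often, which suggests that coins tend to be named after what they stand for.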
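A word cloud sizes each word by how often it occurs, so the underlying step is a frequency count. The sketch below counts only Latin-letter words in a list of names; the function name `word_frequencies` and the sample names are made up for illustration, and Chinese names would need a word-segmentation step that is not shown here.

```python
import re
from collections import Counter


def word_frequencies(names):
    """Count case-insensitive Latin-letter words across all names."""
    counts = Counter()
    for name in names:
        counts.update(word.lower() for word in re.findall(r'[A-Za-z]+', name))
    return counts


# Made-up sample; in the project the names would come from the name column
sample_names = ['Bitcoin', 'Binance Coin', 'Chainlink', 'USD Coin', 'Kyber Network']
frequencies = word_frequencies(sample_names)
print(frequencies.most_common(3))
```

Note that a name written as one word, such as Bitcoin, counts as a single token and does not contribute to the count for coin.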

    The source code is available from the WeChat official account "一团追梦喵": reply with "python爬虫及其可视化" to receive it.
