Writing Crawlers with Scrapy (Part 4): Data Visualization with Flask and ECharts

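In the previous articles we crawled the course listings from imooc (慕课网). In this article we aggregate that data and visualize it so that it is easier to read at a glance.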

Introduction

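Flask is one of the most popular Python frameworks. It is mainly used for web development, is well suited to small and medium-sized projects, and is easy to extend. The official Flask website is http://flask.pocoo.org/ .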

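ECharts (http://echarts.baidu.com/ ) is a pure JavaScript charting library from Baidu, built on Canvas. It provides intuitive, vivid, interactive, and customizable data visualization charts. Features such as drag-and-drop recalculation, the data view, and value-range roaming improve the user experience and let users explore and combine the data themselves.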

Setting up the Flask web project

Install the required dependencies:

pip install Flask
pip install PyMySQL

The web project's directory structure is as follows:

├── web
│   ├── static
│   │   └── js
│   │       ├── dark.js
│   │       └── echarts.min.js
│   ├── templates
│   │   └── index.html
│   ├── __init__.py
│   └── views.py
├── runserver.py

runserver.py is the project's startup file:

#!/usr/bin/python

# -*- coding: utf-8 -*-

from web import app

if __name__ == '__main__':
    # debug=True enables the reloader and the interactive debugger; use it only in development
    app.run(host='0.0.0.0', debug=True)

__init__.py is the project's main file:

# -*- coding: utf-8 -*-

from flask import Flask
app = Flask(__name__)

# Imported last, after app exists, because views.py imports app from this module
import web.views

views.py contains the view functions:

# -*- coding: utf-8 -*-

import contextlib

import pymysql
from flask import jsonify, make_response, render_template, request

from web import app


# Database connection
# Define a context manager that commits and closes the connection automatically after use
@contextlib.contextmanager
def mysql(host='127.0.0.1',
          port=3306,
          user='root',
          passwd='abc-123',
          db='demo_db',
          charset='utf8'):
    conn = pymysql.connect(
        host=host, port=port, user=user, passwd=passwd, db=db, charset=charset)
    cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
    try:
        yield cursor
    finally:
        conn.commit()
        cursor.close()
        conn.close()


# Home page
@app.route('/')
def hello_world():
    return render_template('index.html')


# Number of courses per course type
@app.route('/api/type')
def api_type():
    with mysql() as cursor:
        cursor.execute(
            "SELECT type as name,count(id) as value from imooc_courses GROUP BY type"
        )
        return json_success(cursor.fetchall())


# Number of courses per learning direction (cate)
@app.route('/api/cate')
def api_cate():
    with mysql() as cursor:
        cursor.execute(
            "SELECT cate as name,count(id) as value from imooc_courses GROUP BY cate"
        )
        cate_data = cursor.fetchall()
        cate_data_new = transform_cate(cate_data)
        return json_success(cate_data_new)


# Number of learners for every course
@app.route('/api/learn_num')
def api_learn_num():
    with mysql() as cursor:
        cursor.execute(
            "SELECT title as name,learn_num as value from imooc_courses ORDER BY learn_num ASC"
        )
        return json_success(cursor.fetchall())


# Number of learners per learning direction (cate)
@app.route('/api/learn_num_cate')
def api_learn_num_cate():
    with mysql() as cursor:
        cursor.execute(
            "SELECT cate as name,CAST(sum(learn_num) AS CHAR) as value from imooc_courses GROUP BY cate ORDER BY sum(learn_num) DESC"
        )
        cate_data = cursor.fetchall()
        cate_data_new = transform_cate(cate_data)
        return json_success(cate_data_new)


# Difficulty level
@app.route('/api/difficulty_level')
def api_difficulty_level():
    with mysql() as cursor:
        cursor.execute(
            "SELECT difficulty_level as name,count(id) as value from imooc_courses GROUP BY difficulty_level"
        )
        return json_success(cursor.fetchall())


# Course rating
@app.route('/api/overall_rating')
def api_overall_rating():
    with mysql() as cursor:
        cursor.execute(
            "SELECT overall_rating as name,count(id) as value from imooc_courses GROUP BY overall_rating order by overall_rating+0 ASC"
        )
        return json_success(cursor.fetchall())


# Course duration
@app.route('/api/duration')
def api_duration():
    with mysql() as cursor:
        cursor.execute(
            "SELECT duration as name,count(id) as value  from imooc_courses GROUP BY duration order by duration+0 ASC"
        )
        return json_success(cursor.fetchall())


# Relationship between number of learners and rating
@app.route('/api/bubble_gradient')
def api_bubble_gradient():
    with mysql() as cursor:
        cursor.execute(
            "SELECT overall_rating,learn_num,0,title FROM imooc_courses")
        return json_success(cursor.fetchall())


# Search
@app.route('/api/search')
def api_search():
    keywords = request.values.get('keywords') or ''
    # Bind the pattern as a query parameter instead of concatenating it into
    # the SQL string, which would allow SQL injection.
    pattern = '%' + keywords + '%'
    with mysql() as cursor:
        cursor.execute(
            "SELECT * FROM imooc_courses WHERE title LIKE %s OR cate LIKE %s "
            "OR type LIKE %s OR brief LIKE %s ORDER BY learn_num DESC LIMIT 50",
            (pattern, pattern, pattern, pattern))
        return json_success(cursor.fetchall())


# A course may belong to several cates, separated by commas, so the counts are regrouped per single cate here
def transform_cate(cate_data):
    cate_data_tmp = {}
    for item in cate_data:
        if item['name'] == '':
            item['name'] = '其他'  # "Other"
        if item['name'].find(',') > 0:
            for item_sub in item['name'].split(','):
                if item_sub not in cate_data_tmp.keys():
                    cate_data_tmp[item_sub] = item['value']
                else:
                    cate_data_tmp[item_sub] = int(
                        cate_data_tmp[item_sub]) + int(item['value'])
        else:
            if item['name'] not in cate_data_tmp.keys():
                cate_data_tmp[item['name']] = item['value']
            else:
                cate_data_tmp[item['name']] = int(
                    cate_data_tmp[item['name']]) + int(item['value'])
    cate_data_new = []
    for key in cate_data_tmp:
        cate_data_new.append({'name': key, 'value': cate_data_tmp[key]})
    return cate_data_new


# Return JSON data
def json_success(data):
    data = {'status': 'success', 'data': data, 'info': '成功'}
    response = make_response(jsonify(data))
    # Allow cross-origin requests (CORS)
    response.headers['Access-Control-Allow-Origin'] = '*'
    response.headers['Access-Control-Allow-Methods'] = 'GET,POST'
    return response
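
The following is a minimal sketch of the data path behind /api/cate: a cursor context manager, a GROUP BY query with name/value aliases, and the regrouping of comma-separated cates. It assumes sqlite3 as a stand-in for MySQL so that it runs without a database server. The names sqlite_cursor, regroup_cate, and demo, as well as the sample cate values, are hypothetical and not part of the original project. Unlike transform_cate, it always splits on commas and sums with a Counter.

```python
import contextlib
import sqlite3
from collections import Counter


@contextlib.contextmanager
def sqlite_cursor(path=':memory:'):
    """Yield a cursor, then commit and close (sqlite3 stands in for MySQL)."""
    conn = sqlite3.connect(path)
    # Rows can be indexed by column name, similar to pymysql's DictCursor
    conn.row_factory = sqlite3.Row
    cursor = conn.cursor()
    try:
        yield cursor
    finally:
        conn.commit()
        cursor.close()
        conn.close()


def regroup_cate(rows):
    """Split comma-separated cate names and sum the counts per single cate."""
    totals = Counter()
    for row in rows:
        name = row['name'] or '其他'  # an empty cate is filed under "Other"
        for part in name.split(','):
            totals[part] += int(row['value'])
    return [{'name': name, 'value': value} for name, value in totals.items()]


def demo():
    """Create a tiny hypothetical table and return the regrouped counts."""
    with sqlite_cursor() as cursor:
        cursor.execute(
            'CREATE TABLE imooc_courses (id INTEGER PRIMARY KEY, cate TEXT)')
        cursor.executemany(
            'INSERT INTO imooc_courses (cate) VALUES (?)',
            [('Python',), ('Python,MySQL',), ('',), ('MySQL',)])
        cursor.execute(
            'SELECT cate AS name, count(id) AS value '
            'FROM imooc_courses GROUP BY cate')
        return regroup_cate(cursor.fetchall())


if __name__ == '__main__':
    print(demo())
```

The rows are read and regrouped inside the with block, before the connection is closed, which mirrors how the view functions call cursor.fetchall() before returning.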

templates\index.html is the template file. It takes the data served by the views.py endpoints and visualizes it with ECharts.





    
    
    
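[index.html: the markup is not reproduced here; the text of the page it renders is as follows.]

Page title: Data Visualization Analysis

Data Visualization Analysis (imooc)

The data comes from imooc (慕课网). It was crawled with python-scrapy, then parsed, preprocessed, and cached in MySQL.

The visualization uses Python's Flask framework to fetch the aggregated data and ECharts to draw simple charts.

Number of courses per course type

Number of courses per learning direction

Course duration statistics

Course difficulty levels

Course rating statistics

Relationship between number of learners and rating

Ranking by number of learners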

I won't explain it further; please refer to the code.
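
To make the link between the endpoints and the charts concrete, here is a hedged sketch of how a json_success-style response maps onto an ECharts pie-chart option. The SQL in views.py aliases its columns as name and value, which is the item shape an ECharts pie series accepts in its data list. The real page builds its options in JavaScript; this Python version only illustrates the data shape. The function pie_option and the sample rows are hypothetical.

```python
import json


def pie_option(title, payload):
    """Build an ECharts pie-chart option dict from a json_success-style payload."""
    if payload.get('status') != 'success':
        raise ValueError('unexpected payload status')
    return {
        'title': {'text': title},
        'tooltip': {'trigger': 'item'},
        'series': [{
            'type': 'pie',
            # Each item keeps the name/value keys produced by the SQL aliases;
            # values are cast to int because some endpoints return strings
            'data': [{'name': row['name'], 'value': int(row['value'])}
                     for row in payload['data']],
        }],
    }


# Hypothetical response body in the shape produced by json_success
payload = {
    'status': 'success',
    'info': '成功',
    'data': [{'name': 'free', 'value': 3}, {'name': 'paid', 'value': '2'}],
}
option = pie_option('Courses per type', payload)
print(json.dumps(option, ensure_ascii=False, indent=2))
```

In the browser, an equivalent object would be handed to the chart instance's setOption method after the endpoint's JSON has been fetched.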

Running the project

python runserver.py 

For a production deployment, run the project behind uwsgi + nginx; that setup is not covered here.

Final result

[Figure: the rendered "Data Visualization Analysis (imooc)" page]
