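In the previous articles we crawled the course listings of imooc (慕课网). This article computes statistics on that data and visualizes them, so the data can be presented more intuitively.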
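Introduction
Flask is one of the most popular Python frameworks. It is mainly used for web development, suits small and medium-sized projects, and is easy to extend. Flask's official site is http://flask.pocoo.org/ .
Echarts (http://echarts.baidu.com/ ) is a pure-JavaScript, Canvas-based charting library from Baidu that provides intuitive, vivid, interactive, and customizable data visualization charts. Features such as drag-and-drop recalculation, data view, and value-range roaming improve the user experience and let users mine and combine the data themselves.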
Setting up the Flask web project
Install the required dependencies:
pip install Flask
pip install PyMySQL
The web project's directory structure is as follows:
├── web
│   ├── static
│   │   └── js
│   │       ├── dark.js
│   │       └── echarts.min.js
│   ├── templates
│   │   └── index.html
│   ├── __init__.py
│   └── views.py
└── runserver.py
runserver.py is the project's startup script:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from web import app

if __name__ == '__main__':
    # Listen on all interfaces; debug mode is meant for development only.
    app.run(host='0.0.0.0', debug=True)
__init__.py is the package's main file:
# -*- coding: utf-8 -*-
from flask import Flask

app = Flask(__name__)

# Imported last so that views.py can import app from this package.
import web.views
views.py holds the view functions:
# -*- coding: utf-8 -*-
import contextlib

import pymysql
from flask import jsonify, make_response, render_template, request

from web import app


# Database connection.
# A context manager that closes the connection automatically after use.
@contextlib.contextmanager
def mysql(host='127.0.0.1',
          port=3306,
          user='root',
          passwd='abc-123',
          db='demo_db',
          charset='utf8'):
    conn = pymysql.connect(
        host=host, port=port, user=user, passwd=passwd, db=db, charset=charset)
    # DictCursor returns each row as a dict keyed by column name.
    cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
    try:
        yield cursor
    finally:
        conn.commit()
        cursor.close()
        conn.close()


# Home page
@app.route('/')
def hello_world():
    return render_template('index.html')


# Number of courses per course type
@app.route('/api/type')
def api_type():
    with mysql() as cursor:
        cursor.execute(
            "SELECT type as name,count(id) as value from imooc_courses GROUP BY type"
        )
        return json_success(cursor.fetchall())


# Number of courses per learning direction (cate)
@app.route('/api/cate')
def api_cate():
    with mysql() as cursor:
        cursor.execute(
            "SELECT cate as name,count(id) as value from imooc_courses GROUP BY cate"
        )
        cate_data = cursor.fetchall()
    cate_data_new = transform_cate(cate_data)
    return json_success(cate_data_new)


# Number of learners for every course
@app.route('/api/learn_num')
def api_learn_num():
    with mysql() as cursor:
        cursor.execute(
            "SELECT title as name,learn_num as value from imooc_courses ORDER BY learn_num ASC"
        )
        return json_success(cursor.fetchall())


# Number of learners per learning direction (cate)
@app.route('/api/learn_num_cate')
def api_learn_num_cate():
    with mysql() as cursor:
        cursor.execute(
            "SELECT cate as name,CAST(sum(learn_num) AS CHAR) as value from imooc_courses GROUP BY cate ORDER BY sum(learn_num) DESC"
        )
        cate_data = cursor.fetchall()
    cate_data_new = transform_cate(cate_data)
    return json_success(cate_data_new)


# Difficulty level
@app.route('/api/difficulty_level')
def api_difficulty_level():
    with mysql() as cursor:
        cursor.execute(
            "SELECT difficulty_level as name,count(id) as value from imooc_courses GROUP BY difficulty_level"
        )
        return json_success(cursor.fetchall())


# Course rating
@app.route('/api/overall_rating')
def api_overall_rating():
    with mysql() as cursor:
        cursor.execute(
            "SELECT overall_rating as name,count(id) as value from imooc_courses GROUP BY overall_rating order by overall_rating+0 ASC"
        )
        return json_success(cursor.fetchall())


# Course duration
@app.route('/api/duration')
def api_duration():
    with mysql() as cursor:
        cursor.execute(
            "SELECT duration as name,count(id) as value from imooc_courses GROUP BY duration order by duration+0 ASC"
        )
        return json_success(cursor.fetchall())


# Relationship between number of learners and rating
@app.route('/api/bubble_gradient')
def api_bubble_gradient():
    with mysql() as cursor:
        cursor.execute(
            "SELECT overall_rating,learn_num,0,title FROM imooc_courses")
        return json_success(cursor.fetchall())


# Search
@app.route('/api/search')
def api_search():
    keywords = request.values.get('keywords') or ''
    # Pass the pattern as a query parameter instead of concatenating it
    # into the SQL string, which would allow SQL injection.
    pattern = '%' + keywords + '%'
    with mysql() as cursor:
        cursor.execute(
            "SELECT * FROM imooc_courses WHERE title LIKE %s OR cate LIKE %s "
            "OR type LIKE %s OR brief LIKE %s ORDER BY learn_num DESC LIMIT 50",
            (pattern, pattern, pattern, pattern))
        return json_success(cursor.fetchall())


# A course may have several categories (cate), separated by commas,
# so the per-category totals are regrouped here.
def transform_cate(cate_data):
    totals = {}
    for item in cate_data:
        # Rows with an empty category go into '其他' ("Other").
        name = item['name'] or '其他'
        for item_sub in name.split(','):
            # int() is needed because some queries return the value as a string.
            totals[item_sub] = totals.get(item_sub, 0) + int(item['value'])
    return [{'name': key, 'value': value} for key, value in totals.items()]


# Return JSON data
def json_success(data):
    data = {'status': 'success', 'data': data, 'info': '成功'}
    response = make_response(jsonify(data))
    # Allow cross-origin requests
    response.headers['Access-Control-Allow-Origin'] = '*'
    response.headers['Access-Control-Allow-Methods'] = 'GET,POST'
    return response
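
To make the regrouping done by transform_cate more concrete, here is a small self-contained sketch run on made-up rows. The function name regroup_categories, the 'Other' bucket label, and the sample names and counts are all illustrative assumptions, not part of the project or of the crawled data. It is a compact variant of the same idea: split comma-separated category names and sum the values per category.

```python
def regroup_categories(rows):
    """Split comma-separated category names and sum the values per category."""
    totals = {}
    for row in rows:
        # Rows with an empty category fall into a catch-all bucket.
        name = row['name'] or 'Other'
        for part in name.split(','):
            # Values may arrive as strings (e.g. from CAST(... AS CHAR)).
            totals[part] = totals.get(part, 0) + int(row['value'])
    return [{'name': key, 'value': value} for key, value in totals.items()]


# Hypothetical rows shaped like the output of the GROUP BY cate queries.
sample_rows = [
    {'name': 'Python', 'value': 3},
    {'name': 'Python,MySQL', 'value': 2},
    {'name': '', 'value': '4'},
]
result = regroup_categories(sample_rows)
print(result)
```

A course tagged "Python,MySQL" is counted once under each of the two categories, so the totals across categories can exceed the number of courses.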
templates/index.html is the template file. It mainly takes the data served by the views.py endpoints and visualizes it with Echarts.
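
The front-end code is not reproduced here, so as a rough illustration of the data shape involved: every endpoint wraps its rows in the envelope built by json_success. An Echarts pie series can take the {name, value} rows as they are, while a bar chart usually wants the category labels and the values as separate arrays. The sketch below does that split in Python on a hand-written payload. split_series is a hypothetical helper and the sample rows are invented; in the real page this step would be done in JavaScript.

```python
import json


def split_series(payload_text):
    """Parse a json_success-style envelope and return (names, values)."""
    payload = json.loads(payload_text)
    if payload.get('status') != 'success':
        raise ValueError('unexpected status: %r' % payload.get('status'))
    rows = payload['data']
    names = [row['name'] for row in rows]
    # Some endpoints return the value as a string, so normalize to int.
    values = [int(row['value']) for row in rows]
    return names, values


# Invented sample of what an endpoint such as /api/type might return.
sample = json.dumps({
    'status': 'success',
    'info': '成功',
    'data': [
        {'name': 'free', 'value': 120},
        {'name': 'paid', 'value': '30'},
    ],
})
names, values = split_series(sample)
print(names, values)
```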
Data visualization and analysis
There is not much to explain here; please see the code.
Running the project
python runserver.py
For a production deployment, serve the app with uwsgi + nginx; that is not covered here.
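Final result
Data visualization and analysis (imooc). Data source: imooc (慕课网). The data was crawled with python-scrapy, then parsed, preprocessed, and cached in MySQL.
For the visualization, Python's Flask framework serves the statistics and Echarts renders them as simple charts.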