喜欢大海的CC

A data visualization graduation project: a Python-crawler-based visualization and analysis system for job postings and rental listings

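It has been a little over ten days since I passed my undergraduate thesis defense, so I decided to write up this final project of my undergraduate years and share it with anyone who wants to build something in this direction. I hope it gives you a working project to refer to and an implementation approach to learn from when you start building your own. The demo video comes first.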

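I. Research purpose and significance

(1) Current situation

The main concerns and pain points for fresh graduates: finding a job and renting a home
Job sites are numerous and varied: Lagou (拉勾网), BOSS Zhipin (BOSS直聘), 51job (前程无忧), and others
University career-information sites are mature
Rental sites are plentiful: Lianjia (链家网), 5i5j (我爱我家), and others

(2) Shortcomings

They only provide information, with limited functionality
Information is scattered, so the overall picture is hard to see
Text and numbers alone are not intuitive
Job hunting and renting are not linked

(3) Improvements

Consolidate information and compute statistics
Visualize data by region
Present it through a rich set of charts
Combine job hunting and renting in one place

There is therefore a pressing need for a platform that brings as much of this information together as possible and offers strong statistics and data-visualization features, so that users can search job postings and rental listings in one place and see the overall picture through charts. For the growing number of job seekers each year, the system gives a clear view of internet-industry hiring and of the rental market in first-tier, new first-tier, and second-tier cities, which helps them make the choice that fits their own situation.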

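II. Implementation approach and technologies

Front-end/back-end data exchange: Ajax

Parameters are passed via Ajax, which acts as an intermediate layer between the user and the server and makes user actions and server responses asynchronous. The front end serializes the parameters into a JSON string (JSON.stringify) and sends them to the server in a GET or POST request. The back end receives the data, uses it as the query conditions, and returns the result set as a JSON string. The front end then checks the returned data and renders the page accordingly.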
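
The round trip described above can be sketched without a browser or a web server. This is a minimal illustration using only the standard library; the ROWS list and the handle_request function are hypothetical stand-ins for the database table and the back-end handler, not the project's actual code.

```python
import json

# Hypothetical in-memory rows standing in for the database table.
ROWS = [
    {"positionName": "python engineer", "education": "本科"},
    {"positionName": "java engineer", "education": "硕士"},
]


def handle_request(body: str) -> str:
    """Parse the JSON request body, filter the rows, and return a JSON string."""
    params = json.loads(body)  # what the server receives from the front end
    keyword = params.get("positionName", "").lower()
    matched = [row for row in ROWS if keyword in row["positionName"]]
    # ensure_ascii=False keeps Chinese text readable in the response
    return json.dumps({"code": 0, "count": len(matched), "data": matched},
                      ensure_ascii=False)


# In the browser, this string would come from JSON.stringify({...}).
request_body = json.dumps({"positionName": "Python"})
response = json.loads(handle_request(request_body))
print(response["count"], response["data"][0]["positionName"])
```

The front end would branch on the code field of the parsed response to decide what to render.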

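Differences between GET and POST requests

Both submit data to the server and both can receive data back. A GET request carries its parameters in the URL query string and normally has no body; the server answers 200 when the request succeeds and returns the data. A POST request carries its parameters in the request body. The browser does not, in general, send the headers first and wait for a 100 response before sending the body; that handshake happens only when the client sends an Expect: 100-continue header. POST keeps parameters out of the URL, the browser history, and most access logs, which makes it the better fit for sensitive data, but neither method encrypts anything without HTTPS.

When to use GET and when to use POST?

Login, registration, and profile editing all use POST; the data overview and the visualization pages all use GET. In other words, queries use GET, while inserts, deletes, and updates use POST, which keeps the submitted values out of the URL.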
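
To make the difference concrete, here is a small sketch that builds one request of each kind with the standard library and inspects where the parameters end up. Nothing is sent over the network. The BASE address is an assumption; the /data and /loginByPassword paths are the routes used later in this post.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:5000"  # hypothetical address of the back end


def build_get(path, params):
    """GET: parameters are encoded into the URL query string; there is no body."""
    return Request(BASE + path + "?" + urlencode(params), method="GET")


def build_post(path, payload):
    """POST: parameters travel in the request body as a JSON string."""
    body = json.dumps(payload).encode("utf-8")
    return Request(BASE + path, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


get_req = build_get("/data", {"page": 1, "limit": 10})
post_req = build_post("/loginByPassword", {"name": "alice", "password": "secret"})

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, json.loads(post_req.data))
```

The GET URL contains the page and limit values, while the POST URL stays free of the password.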

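Chart rendering: the ECharts chart library (it has sample code for every chart type I wanted, supports editing and previewing the code online, and the charts look polished; the site is excellent, so I am sharing it here)

Site: ECharts, an open-source visualization chart library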

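III. Overall functional architecture

IV. Detailed implementation

(1) Data acquisition

1. Collecting job postings

Lagou has fairly strong anti-crawling measures, so the crawler wraps a User-Agent and cookies into the request headers to pass itself off as a browser. It sends the request with the post method of the requests package; a successful request returns a JSON string, which can be read directly as a dictionary to get the postings for the search keyword (the listing below searches for BI工程师, BI engineer). The response also contains the total number of postings; dividing that by the number of postings per page gives the number of pages, which the crawler then fetches one by one in a loop. Finally, all postings are merged and written to a CSV file and to a local MySQL database.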

import requests
import math
import time
import pandas as pd
import pymysql
from sqlalchemy import create_engine

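def get_json(url, num):
    """
    Fetch one page of search results from url with requests, sending the
    request headers and form data built below.
    :return: the parsed JSON response as a dict
    """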
    url1 = 'https://www.lagou.com/jobs/list_python/p-city_0?&cl=false&fromSearch=true&labelWords=&suginput='
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36',
        'Host': 'www.lagou.com',
        'Referer': 'https://www.lagou.com/jobs/list_%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90?labelWords=&fromSearch=true&suginput=',
        'X-Anit-Forge-Code': '0',
        'X-Anit-Forge-Token': 'None',
        'X-Requested-With': 'XMLHttpRequest',
        # Paste the Cookie header copied from your own logged-in browser session; it expires quickly
        'Cookie': '<your Lagou cookie string>'
    }
    data = {
        'first': 'true',
        'pn': num,
        'kd': 'BI工程师'}
    # Open a session and visit the listing page first to obtain cookies
    s = requests.Session()
    print('Session created:', s, '\n\n')
    s.get(url=url1, headers=headers, timeout=3)
    cookie = s.cookies
    print('Cookies obtained:', cookie, '\n\n')
    # Send the search request with the form data, headers, and cookies
    res = requests.post(url, headers=headers, data=data, cookies=cookie, timeout=3)
    res.raise_for_status()
    res.encoding = 'utf-8'
    page_data = res.json()
    print('Response:', page_data, '\n\n')
    return page_data

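def get_page_num(count):
    """
    Compute how many pages to crawl. Searching a keyword on Lagou shows at most
    30 pages with at most 15 postings each; the crawl is capped at 29 pages here.
    :return: number of pages to fetch
    """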
    page_num = math.ceil(count / 15)
    if page_num > 29:
        return 29
    else:
        return page_num

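def get_page_info(jobs_list):
    """
    Extract the fields of interest from one page of postings.
    :param jobs_list: list of posting dicts from the JSON response
    :return: list of rows, one per posting
    """
    page_info_list = []
    for i in jobs_list:  # loop over every posting on the page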
        job_info = []
        job_info.append(i['companyFullName'])
        job_info.append(i['companyShortName'])
        job_info.append(i['companySize'])
        job_info.append(i['financeStage'])
        job_info.append(i['district'])
        job_info.append(i['positionName'])
        job_info.append(i['workYear'])
        job_info.append(i['education'])
        job_info.append(i['salary'])
        job_info.append(i['positionAdvantage'])
        job_info.append(i['industryField'])
        job_info.append(i['firstType'])
        job_info.append(",".join(i['companyLabelList']))
        job_info.append(i['secondType'])
        job_info.append(i['city'])
        page_info_list.append(job_info)
    return page_info_list

def unique(old_list):
    newList = []
    for x in old_list:
        if x not in newList :
            newList.append(x)
    return newList

def main():
    connect_info = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8'.format("root", "123456", "localhost", "3366",
                                                                        "lagou")
    engine = create_engine(connect_info)
    url = 'https://www.lagou.com/jobs/positionAjax.json?needAddtionalResult=false'
    first_page = get_json(url, 1)
    total_page_count = first_page['content']['positionResult']['totalCount']
    num = get_page_num(total_page_count)
    total_info = []
    time.sleep(10)
    for num in range(1, num + 1):
        # Fetch the postings on each page
        page_data = get_json(url, num)  # parsed JSON response
        jobs_list = page_data['content']['positionResult']['result']  # all postings on this page
        page_info = get_page_info(jobs_list)
        total_info += page_info
        print('Crawled page {}; {} postings so far'.format(num, len(total_info)))
        time.sleep(20)
    # Convert the rows to a DataFrame, then write it to a CSV file and to the local database
    df = pd.DataFrame(data=unique(total_info),
                      columns=['companyFullName', 'companyShortName', 'companySize', 'financeStage',
                               'district', 'positionName', 'workYear', 'education',
                               'salary', 'positionAdvantage', 'industryField',
                               'firstType', 'companyLabelList', 'secondType', 'city'])
    df.to_csv('bi.csv', index=True)
    print('Job postings saved to the local CSV file')
    df.to_sql(name='demo', con=engine, if_exists='append', index=False)
    print('Job postings saved to the database')

if __name__ == '__main__':
    main()

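2. Collecting rental listings

The crawler again wraps a User-Agent and cookies into the request headers to pass itself off as a browser, fetches each page with the get method of the requests package, parses the HTML with the pyquery package, appends the parsed fields to a DataFrame, and finally saves the DataFrame to a CSV file and to the local database.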

import requests
from pyquery import PyQuery as pq
from fake_useragent import UserAgent
import time
import pandas as pd
import random
import pymysql
from sqlalchemy import create_engine

UA = UserAgent()
headers = {
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
    # Paste the Cookie header copied from your own browser session; it expires quickly
    'Cookie': '<your Lianjia cookie string>',
    'Host': 'sz.lianjia.com',
    'Referer': 'https://sz.lianjia.com/zufang/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36',
}
num_page = 2

class Lianjia_Crawer:

    def __init__(self, txt_path):
        super(Lianjia_Crawer, self).__init__()
        self.file = str(txt_path)
        self.df = pd.DataFrame(columns = ['title', 'district', 'area', 'orient', 'floor', 'price', 'city'])

    def run(self):
        '''Entry point: crawl every page, then save the results'''
        connect_info = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8'.format("root", "123456", "localhost", "3366", "lagou")
        engine = create_engine(connect_info)
        for i in range(1, 101):  # Lianjia result pages are numbered from pg1
            url = "https://sz.lianjia.com/zufang/pg{}/".format(str(i))
            print('Crawling {}'.format(url))
            self.parse_url(url)
            time.sleep(random.randint(2, 5))
        print('Crawl finished')
        self.df.to_csv(self.file, encoding='utf-8')
        print('Rental listings saved to the local CSV file')
        self.df.to_sql(name='house', con=engine, if_exists='append', index=False)
        print('Rental listings saved to the database')

    def parse_url(self, url):
        headers['User-Agent'] = UA.chrome
        res = requests.get(url, headers=headers)
        # Build a PyQuery document from the page HTML
        doc = pq(res.text)
        for i in doc('.content__list--item .content__list--item--main'):
            try:
                pq_i = pq(i)
                # Listing title
                title = pq_i('.content__list--item--title a').text()
                # Description line, with fields separated by '/'
                houseinfo = pq_i('.content__list--item--des').text()
                # District
                address = str(houseinfo).split('/')[0]
                district = str(address).split('-')[0]
                # Floor area (drop the trailing unit character)
                full_area = str(houseinfo).split('/')[1]
                area = str(full_area)[:-1]
                # Orientation
                orient = str(houseinfo).split('/')[2]
                # Floor
                floor = str(houseinfo).split('/')[-1]
                # Price
                price = pq_i('.content__list--item-price').text()
                # City
                city = '深圳'
                data_dict = {'title': title, 'district': district, 'area': area, 'orient': orient, 'floor': floor, 'price': price, 'city': city}
                # DataFrame.append was removed in pandas 2.0; pd.concat works on old and new versions
                self.df = pd.concat([self.df, pd.DataFrame([data_dict])], ignore_index=True)
                print([title, district, area, orient, floor, price, city])
            except Exception as e:
                print(e)
                print("Failed to extract fields from this listing")

if __name__ =="__main__":
    txt_path = "zufang_shenzhen.csv"
    Crawer = Lianjia_Crawer(txt_path)
    Crawer.run()  # start the crawler
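
The field-splitting step inside parse_url can be tried on its own, without any network access. The sketch below is a standalone illustration: the parse_description helper and the sample string are hypothetical (the sample only imitates the slash-separated shape the crawler expects), and unlike the crawler it also strips the whitespace around each field.

```python
def parse_description(houseinfo: str) -> dict:
    """Split a slash-separated description line into the fields used by the crawler."""
    parts = [part.strip() for part in houseinfo.split('/')]
    district = parts[0].split('-')[0]  # first segment of the location
    area = parts[1][:-1]               # drop the trailing unit character
    orient = parts[2]
    floor = parts[-1]                  # the floor is always the last field
    return {"district": district, "area": area, "orient": orient, "floor": floor}


# Made-up sample text, not real crawled data.
sample = "南山区-科技园-某小区 / 89.00㎡ / 南 / 2室1厅1卫 / 高楼层 (30层)"
fields = parse_description(sample)
print(fields)
```

Indexing the floor from the end rather than by a fixed position keeps it correct when a listing has more or fewer middle fields.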

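(2) Database

Three database tables are needed to store the job postings, the rental listings, and the user information.

The crawled data is stored in CSV files and in the local MySQL database; a sample of the data is shown in the figure below:

Connecting the database to the back end

The back end connects to the local MySQL database with pymysql. First install pymysql with pip and create the database and its tables; then import pymysql, open a connection, create a cursor object with cursor(), and run SQL queries with execute().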
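
The connect, cursor, execute, fetchall sequence looks roughly like the sketch below. To keep it runnable without a MySQL server, it uses sqlite3 from the standard library as a stand-in for pymysql; both follow the Python DB-API, so the call pattern is the same, but the placeholder syntax differs (? in sqlite3, %s in pymysql). The three-column table is a cut-down, hypothetical version of the job-postings table.

```python
import sqlite3

# sqlite3 stands in for pymysql; with MySQL this line would be pymysql.connect(host=..., user=..., ...).
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()

# A cut-down, hypothetical version of the job-postings table.
cursor.execute("CREATE TABLE demo (positionName TEXT, city TEXT, education TEXT)")
cursor.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",  # values are bound as parameters, not concatenated
    [("python engineer", "深圳", "本科"),
     ("bi engineer", "北京", "本科"),
     ("java engineer", "深圳", "硕士")],
)
conn.commit()

# Run a query and read all result rows; each row is a tuple.
cursor.execute("SELECT count(*) FROM demo WHERE city = ?", ("深圳",))
count = cursor.fetchall()
print(count[0][0])

cursor.close()
conn.close()
```

Binding values as parameters also avoids SQL injection when the value comes from a request.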

(3) Registration and login

Registration and login flow:

Implementation (back-end handlers):

# Shared setup assumed by all the handlers in this post
from flask import Flask, request, jsonify
import pymysql

app = Flask(__name__)

# Register a user
@app.route('/addUser', methods=['POST'])
def addUser():
    # Read the JSON body sent by the front end
    get_json = request.get_json()
    name = get_json['name']
    password = get_json['password']
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    # Pass values as query parameters instead of concatenating them into the SQL string
    cursor.execute("select count(*) from `user` where `username` = %s", (name,))
    count = cursor.fetchall()
    # The username is already taken
    if (count[0][0] != 0):
        table_result = {"code": 500, "msg": "This username already exists!"}
        cursor.close()
    else:
        add = conn.cursor()
        sql = "insert into `user`(username,password) values(%s,%s);"
        add.execute(sql, (name, password))
        conn.commit()
        table_result = {"code": 200, "msg": "Registration succeeded"}
        add.close()
    conn.close()
    return jsonify(table_result)

# User login
@app.route('/loginByPassword', methods=['POST'])
def loginByPassword():
    get_json = request.get_json()
    name = get_json['name']
    password = get_json['password']
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("select count(*) from `user` where `username` = %s and password = %s;", (name, password))
    count = cursor.fetchall()
    if (count[0][0] != 0):
        table_result = {"code": 200, "msg": name}
        cursor.close()
    else:
        # Distinguish a wrong password from an unknown username
        name_cursor = conn.cursor()
        name_cursor.execute("select count(*) from `user` where `username` = %s;", (name,))
        name_count = name_cursor.fetchall()
        if (name_count[0][0] != 0):
            table_result = {"code": 500, "msg": "Wrong password!"}
        else:
            table_result = {"code": 500, "msg": "This user does not exist; please register first!"}
        name_cursor.close()
    conn.close()
    print(name)
    return jsonify(table_result)
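
The branching in the two handlers above can be exercised without Flask or MySQL. The sketch below is an illustration under assumptions: sqlite3 stands in for MySQL, and make_db, register, and login are hypothetical helper names. It mirrors the same three outcomes for login (success, wrong password, unknown user) and binds every value as a query parameter.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a minimal user table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user (username TEXT, password TEXT)")
    return conn


def register(conn, name, password):
    taken = conn.execute(
        "SELECT count(*) FROM user WHERE username = ?", (name,)).fetchone()[0]
    if taken:
        return {"code": 500, "msg": "username already exists"}
    conn.execute("INSERT INTO user(username, password) VALUES (?, ?)", (name, password))
    conn.commit()
    return {"code": 200, "msg": "registered"}


def login(conn, name, password):
    matched = conn.execute(
        "SELECT count(*) FROM user WHERE username = ? AND password = ?",
        (name, password)).fetchone()[0]
    if matched:
        return {"code": 200, "msg": name}
    exists = conn.execute(
        "SELECT count(*) FROM user WHERE username = ?", (name,)).fetchone()[0]
    if exists:
        return {"code": 500, "msg": "wrong password"}
    return {"code": 500, "msg": "no such user"}


conn = make_db()
first = register(conn, "alice", "secret")
duplicate = register(conn, "alice", "other")
good = login(conn, "alice", "secret")
injected = login(conn, "alice", "' OR '1'='1")  # treated as a literal password
unknown = login(conn, "bob", "secret")
print(first, duplicate, good, injected, unknown)
```

Because the password is bound as a parameter, the quote characters in the injected value have no effect on the query.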

(4) Home page features

(5) Editing personal information

# Update personal information
@app.route('/updateUserInfo', methods=['POST'])
def updateUserInfo():
    get_json = request.get_json()
    name = get_json['name']
    print(name)
    email = get_json['email']
    content = get_json['content']
    address = get_json['address']
    phone = get_json['phone']
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    # Values are passed as query parameters rather than concatenated into the SQL string
    cursor.execute("update `user` set email = %s, content = %s, address = %s, phone = %s where username = %s;",
                   (email, content, address, phone, name))
    conn.commit()
    table_result = {"code": 200, "msg": "Update succeeded!", "youxiang": email, "tel": phone}
    cursor.close()
    conn.close()
    print(table_result)
    return jsonify(table_result)

(6) Changing the password
The password can be changed in two ways, both of which require a security check.

# Change the password
@app.route('/updatePass', methods=['POST'])
def updatePass():
    get_json = request.get_json()
    name = get_json['name']
    oldPsw = get_json['oldPsw']
    newPsw = get_json['newPsw']
    rePsw = get_json['rePsw']
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("select count(*) from `user` where `username` = %s and password = %s;", (name, oldPsw))
    count = cursor.fetchall()
    print(count[0][0])
    # Check that the username and the old password match
    if (count[0][0] == 0):
        table_result = {"code": 500, "msg": "The original password is wrong!"}
        cursor.close()
    else:
        updatepass = conn.cursor()
        sql = "update `user` set password = %s where username = %s;"
        updatepass.execute(sql, (newPsw, name))
        conn.commit()
        table_result = {"code": 200, "msg": "Password changed!", "username": name, "new_password": newPsw}
        updatepass.close()
    conn.close()
    return jsonify(table_result)

(7) Data overview

Note: only internet-industry job postings are studied.

Taking the job-posting overview as an example:

@app.route('/data', methods=['GET'])
def data():
    limit = int(request.args['limit'])
    page = int(request.args['page'])
    offset = (page - 1) * limit  # number of rows to skip
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',charset='utf8mb4')
    # Build the WHERE clause from the optional filters; values go in as query parameters
    where, params = "", ()
    if (len(request.args) != 2):
        education = str(request.args['education'])
        positionName = str(request.args['positionName']).lower()
        where = " where positionName like %s"
        params = ('%' + positionName + '%',)
        if (education != '不限'):
            where += " and education = %s"
            params += (education,)
    # Total number of matching rows
    cursor = conn.cursor()
    cursor.execute("select count(*) from demo" + where, params)
    count = cursor.fetchall()
    cursor.close()
    # The requested page of rows, returned as dicts
    cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
    cursor.execute("select * from demo" + where + " limit %s, %s", params + (offset, limit))
    data_dict = list(cursor.fetchall())
    table_result = {"code": 0, "msg": None, "count": count[0], "data": data_dict}
    cursor.close()
    conn.close()
    return jsonify(table_result)
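
The handler turns the 1-based page number from the front end into a row offset for the SQL LIMIT clause. The arithmetic is small enough to check in isolation; page_to_offset and paginate below are hypothetical helper names, and the list of integers stands in for table rows.

```python
def page_to_offset(page: int, limit: int) -> int:
    """Convert a 1-based page number into the number of rows to skip."""
    return (page - 1) * limit


def paginate(rows, page, limit):
    """Return the slice of rows that LIMIT offset, limit would select."""
    offset = page_to_offset(page, limit)
    return rows[offset:offset + limit]


rows = list(range(1, 36))  # 35 hypothetical row ids
last_page = paginate(rows, 3, 15)
print(page_to_offset(3, 15), last_page)
```

With 35 rows and 15 per page, the third page holds only the remaining five rows.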

(8) Job-posting visualization
The analysis is broken down by region from four angles: nationwide, first-tier cities, new first-tier cities, and second-tier cities.
For example, the nationwide company analysis is shown in the figure below:

@app.route('/qiye', methods=['GET'])
def qiye():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("SELECT DISTINCT(city) from demo")
    result = cursor.fetchall()
    # Cities present in the data
    city = [field[0] for field in result]
    city_result = []
    companySize = []
    companySizeResult = []
    selected = {}
    # Optional filter: a position keyword, passed as a query parameter
    extra, params = "", ()
    if (len(request.args) != 0):
        positionName = str(request.args['positionName']).lower()
        extra = " and positionName like %s"
        params = ('%' + positionName + '%',)
    # Number of postings in each city
    for i in city:
        cursor.execute("SELECT count(*) from demo where city = %s" + extra, (i,) + params)
        count = cursor.fetchall()
        city_result.append({'value': count[0][0], 'name': i})
    # Only the first ten cities are selected when the chart first loads
    for i in city[10:]:
        selected[i] = False
    # Company-size categories and the number of postings in each
    cursor.execute("SELECT DISTINCT(companySize) from demo")
    company = cursor.fetchall()
    for field in company:
        companySize.append(field[0])
        cursor.execute("SELECT count(*) from demo where companySize = %s" + extra, (field[0],) + params)
        count = cursor.fetchall()
        companySizeResult.append(count[0][0])
    result = {"city": city, "city_result": city_result, "selected": selected, "companySize": companySize, "companySizeResult": companySizeResult}
    cursor.close()
    conn.close()
    return jsonify(result)
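
The handler above runs one count query per city. As a possible alternative, a single GROUP BY query returns the same per-city counts in one round trip. The sketch below is only an illustration: sqlite3 stands in for MySQL, and the rows are made up. The output has the same value/name shape that the handler returns as city_result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (city TEXT, companySize TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("深圳", "50-150人"), ("深圳", "2000人以上"), ("深圳", "50-150人"),
     ("北京", "2000人以上"), ("北京", "50-150人"),
     ("杭州", "50-150人")],  # made-up rows
)

# One query returns every city together with its posting count.
rows = conn.execute(
    "SELECT city, count(*) FROM demo GROUP BY city ORDER BY count(*) DESC, city"
).fetchall()
city_result = [{"value": n, "name": name} for name, n in rows]
print(city_result)
conn.close()
```

The same statement works in MySQL; only the connection and the placeholder syntax change.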

The company analysis for first-tier cities is shown in the figure below:

@app.route('/qiye_first', methods=['GET'])
def qiye_first():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    # The four first-tier cities are fixed rather than read from the table
    city = ['北京', '上海', '广州', '深圳']
    city_result = []
    companySize = []
    companySizeResult = []
    selected = {}
    in_first_tier = " and city in ('北京', '上海', '广州', '深圳')"
    # Optional filter: a position keyword, passed as a query parameter
    extra, params = "", ()
    if (len(request.args) != 0):
        positionName = str(request.args['positionName']).lower()
        extra = " and positionName like %s"
        params = ('%' + positionName + '%',)
    # Number of postings in each city
    for i in city:
        cursor.execute("SELECT count(*) from demo where city = %s" + extra, (i,) + params)
        count = cursor.fetchall()
        city_result.append({'value': count[0][0], 'name': i})
    # All four cities are selected when the chart first loads
    for i in city[4:]:
        selected[i] = False
    # Company-size categories in these cities and the number of postings in each
    cursor.execute("SELECT DISTINCT(companySize) from demo where city in ('北京', '上海', '广州', '深圳');")
    company = cursor.fetchall()
    for field in company:
        companySize.append(field[0])
        cursor.execute("SELECT count(*) from demo where companySize = %s" + extra + in_first_tier, (field[0],) + params)
        count = cursor.fetchall()
        companySizeResult.append(count[0][0])

    result = {"city": city, "city_result": city_result, "selected": selected, "companySize": companySize, "companySizeResult": companySizeResult}
    cursor.close()
    conn.close()
    return jsonify(result)

The nationwide salary analysis for bachelor's-degree positions is shown in the figure:

@app.route('/xinzi', methods=['GET'])
def xinzi():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    positionName = ['java', 'python', 'php', 'web', 'bi', 'android', 'ios', '算法', '大数据', '测试', '运维', '数据库']
    # Rows returned for the bar chart; the first row is the header
    zzt_list = []
    zzt_list.append(['product', 'Java', 'Python', 'PHP', 'web', 'bi', 'android', 'ios', '算法', '大数据', '测试', '运维', '数据库'])
    # Salary buckets, keyed on the first two characters of the salary string:
    # a single-digit lower bound still contains the "k", larger ones compare as numbers
    buckets = [
        ('0—10K', "SUBSTR(salary,1,2) like %s", ('%K%',)),
        ('10—20K', "SUBSTR(salary,1,2) BETWEEN 10 AND 20", ()),
        ('20—30K', "SUBSTR(salary,1,2) BETWEEN 20 AND 30", ()),
        ('30—40K', "SUBSTR(salary,1,2) BETWEEN 30 AND 40", ()),
        ('40以上', "SUBSTR(salary,1,2) > 40", ()),
    ]
    # Optional education filter, passed as a query parameter
    extra, extra_params = "", ()
    if not (len(request.args) == 0 or str(request.args['education']) == '不限'):
        extra = " and education = %s"
        extra_params = (str(request.args['education']),)
    # One row per bucket, one count per position keyword
    for label, condition, cond_params in buckets:
        row = [label]
        for i in positionName:
            cursor.execute(
                "SELECT COUNT(*) FROM demo WHERE " + condition + " and positionName like %s" + extra,
                cond_params + ('%' + i + '%',) + extra_params)
            count = cursor.fetchall()
            row.append(count[0][0])
        zzt_list.append(row)
    result = {"zzt": zzt_list}
    cursor.close()
    conn.close()
    return jsonify(result)
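
The bucketing idea can also be expressed in plain Python, which makes it easy to test against sample salary strings. This is a sketch under assumptions, not the project's code: it assumes salaries look like "15k-25k", the helper names are hypothetical, and it uses half-open ranges (10 up to but not including 20, and so on), whereas the SQL BETWEEN conditions above are inclusive at both ends.

```python
import re


def salary_lower_bound(salary: str) -> int:
    """Return the lower bound, in thousands, of a salary string such as '15k-25k'."""
    match = re.match(r"\s*(\d+)\s*[kK]", salary)
    if match is None:
        raise ValueError("unrecognized salary format: " + repr(salary))
    return int(match.group(1))


def salary_bucket(salary: str) -> str:
    """Assign a salary string to a bucket by its lower bound."""
    low = salary_lower_bound(salary)
    if low < 10:
        return "0-10K"
    if low < 20:
        return "10-20K"
    if low < 30:
        return "20-30K"
    if low < 40:
        return "30-40K"
    return "40K+"


samples = ["8k-15k", "15k-25k", "20k-40k", "45K-60K"]
labels = [salary_bucket(s) for s in samples]
print(labels)
```

Parsing the number once and storing it in its own column would let the database do this comparison without string functions.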

The nationwide benefits analysis is shown in the figure:

import jieba.analyse

@app.route('/fuli', methods=['GET'])
def fuli():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
    # Position benefits: join every row into one text and extract the top 100 keywords with weights
    cursor.execute("select positionAdvantage from `demo`")
    data_dict = []
    result = cursor.fetchall()
    for field in result:
        data_dict.append(field['positionAdvantage'])
    content = ''.join(data_dict)
    positionAdvantage = []
    jieba.analyse.set_stop_words('./stopwords.txt')
    tags = jieba.analyse.extract_tags(content, topK=100, withWeight=True)
    for v, n in tags:
        # Scale the weight to an integer string for the word cloud
        mydict = {"name": v, "value": str(int(n * 10000))}
        positionAdvantage.append(mydict)
    # Company benefits: same procedure on the company label list
    cursor.execute("select companyLabelList from `demo`")
    data_dict = []
    result = cursor.fetchall()
    for field in result:
        data_dict.append(field['companyLabelList'])
    content = ''.join(data_dict)
    companyLabelList = []
    jieba.analyse.set_stop_words('./stopwords.txt')
    tags = jieba.analyse.extract_tags(content, topK=100, withWeight=True)
    for v, n in tags:
        mydict = {"name": v, "value": str(int(n * 10000))}
        companyLabelList.append(mydict)
    cursor.close()
    conn.close()
    return jsonify({"zwfl": positionAdvantage, "gsfl": companyLabelList})

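The last step of the handler above, turning (keyword, weight) pairs into name/value items for a word cloud, can be looked at separately. The sketch below leaves jieba out entirely; the to_wordcloud_items helper and the two sample pairs are hypothetical and only imitate the shape of a weighted keyword list.

```python
def to_wordcloud_items(tags):
    """Convert (keyword, weight) pairs into name/value dicts, scaling weights by 10000."""
    return [{"name": word, "value": str(int(weight * 10000))} for word, weight in tags]


# Made-up (keyword, weight) pairs, not real extraction output.
tags = [("五险一金", 0.5), ("带薪年假", 0.25)]
items = to_wordcloud_items(tags)
print(items)
```

Scaling the fractional weights to integers gives the chart values large enough to map onto font sizes.
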
The nationwide analysis of education and work-experience requirements for internet positions is shown in the figure:

@app.route('/xueli', methods=['GET'])
def xueli():
    # Open the database connection
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    # Create a cursor object
    cursor = conn.cursor()
    # Run the SQL statement
    cursor.execute("SELECT DISTINCT(education) from demo")
    # Fetch all rows
    result = cursor.fetchall()
    education = []
    education_data = []
    color_list = ['#459AF0', '#38C3B0', '#86CA5A', '#BFD44F', '#90EE90']
    # The five education levels: 不限 (no requirement), 大专 (associate), 本科 (bachelor's), 硕士 (master's), 博士 (doctorate)
    for field in result:
        education.append(field[0])
    # Number of postings for each education level
    for i in range(len(education)):
        cursor.execute("SELECT count(*) from demo where education = %s", (education[i],))
        count = cursor.fetchall()
        education_data.append({'value': count[0][0], 'itemStyle': {'color': color_list[i]}})
    cursor.execute("SELECT DISTINCT(workYear) from demo")
    result = cursor.fetchall()
    workYear = []
    workYear_data = []
    # Work-experience categories
    for field in result:
        workYear.append(field[0])
    # Number of postings for each work-experience category
    for i in workYear:
        cursor.execute("SELECT count(*) from demo where workYear = %s", (i,))
        count = cursor.fetchall()
        workYear_data.append({'value': count[0][0], 'name': i})
    cursor.close()
    conn.close()
    return jsonify({"education": education, "education_data": education_data, "workYear_data": workYear_data})

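Counting rows per category, as this endpoint does for education and work experience, maps directly onto SQL's GROUP BY. A minimal sketch, assuming an in-memory SQLite table as a stand-in for the MySQL `demo` table; the rows and the helper `count_by` are made up for illustration:

```python
import sqlite3

# In-memory SQLite stands in for the MySQL `demo` table; the rows are made up
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (education TEXT, workYear TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?)",
    [("本科", "1-3年"), ("本科", "3-5年"), ("大专", "1-3年"), ("硕士", "3-5年"), ("本科", "1-3年")],
)


def count_by(conn, column):
    """Return [(category, count), ...] for one column using a single grouped query."""
    # A column name cannot be a bound parameter, so only known names are accepted
    if column not in ("education", "workYear"):
        raise ValueError("unexpected column: " + column)
    sql = "SELECT {0}, COUNT(*) FROM demo GROUP BY {0}".format(column)
    return conn.execute(sql).fetchall()


education_counts = count_by(conn, "education")
work_year_counts = count_by(conn, "workYear")
print(education_counts)
print(work_year_counts)
```

One grouped query replaces a DISTINCT query plus one COUNT query per category, which matters once the table grows.
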
The nationwide distribution of internet companies by financing stage is shown in the figure below:

@app.route('/rongzi', methods=['GET'])
def rongzi():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    finance = []
    finance_data = []
    # One grouped query returns every financing stage together with its posting count
    cursor.execute("SELECT financeStage, COUNT(*) FROM demo GROUP BY financeStage")
    for stage, count in cursor.fetchall():
        finance.append(stage)
        finance_data.append({'value': count, 'name': stage})
    cursor.close()
    conn.close()
    return jsonify({"finance": finance, "finance_data": finance_data})

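Every route in this section opens its own connection and cursor, so it helps to make sure both are closed even when a query fails. A minimal sketch of one way to do that with a context manager; `db_cursor` and `connect_demo` are hypothetical names, and `sqlite3` stands in for `pymysql.connect(...)` so the example runs without a MySQL server:

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def db_cursor(connect):
    """Open a connection via connect(), yield a cursor, and always close both."""
    conn = connect()
    cursor = conn.cursor()
    try:
        yield cursor
    finally:
        cursor.close()
        conn.close()


opened = []  # keeps references so the demo can show the connection was closed


def connect_demo():
    # sqlite3 stands in for pymysql.connect(...) in this sketch
    conn = sqlite3.connect(":memory:")
    opened.append(conn)
    return conn


with db_cursor(connect_demo) as cursor:
    cursor.execute("CREATE TABLE demo (financeStage TEXT)")
    cursor.executemany("INSERT INTO demo VALUES (?)", [("A轮",), ("B轮",), ("A轮",)])
    cursor.execute("SELECT financeStage, COUNT(*) FROM demo GROUP BY financeStage")
    stage_counts = dict(cursor.fetchall())

print(stage_counts)
```

In a route, the body of the `with` block would hold the queries and the `jsonify` call would follow it.
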
The nationwide distribution of job types at internet companies is shown in the figure below:

@app.route('/poststyle', methods=['GET'])
def poststyle():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    firstType = []
    firstType_data = []
    # Top-level job categories and the number of postings in each
    cursor.execute("SELECT firstType, COUNT(*) FROM demo GROUP BY firstType")
    for name, count in cursor.fetchall():
        firstType.append(name)
        firstType_data.append({'value': count, 'name': name})
    secondType = []
    secondType_data = []
    # Second-level job categories and the number of postings in each
    cursor.execute("SELECT secondType, COUNT(*) FROM demo GROUP BY secondType")
    for name, count in cursor.fetchall():
        secondType.append(name)
        secondType_data.append({'value': count, 'name': name})
    cursor.close()
    conn.close()
    return jsonify({"firstType": firstType, "firstType_data": firstType_data,
                    "secondType": secondType, "secondType_data": secondType_data})

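(9) Recruitment comparison across nine popular cities

A textured pie chart can show the size distribution of internet companies in a given city, a funnel chart can show the education requirements for a given position in that city, and so on.

Taking Beijing as an example: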

@app.route('/beijing',methods=['GET'])
def beijing():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()

    district = []
    district_result = []
    companySize = []
    companySizeResult = []
    education = []
    educationResult = []
    workYear = []
    workYear_data = []
    firstType = []
    firstType_data = []
    finance = []
    finance_data = []
    leida_max_dict = []

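    # Fetch the administrative districts that appear in Beijing postings
    cursor.execute("SELECT DISTINCT(district) from demo where city='北京';")
    result = cursor.fetchall()
    for field in result:
        district.append(field[0])
    if len(request.args) == 0:
        # No query parameters: aggregate over every position
        # Count the postings in each district, restricted to Beijing so that
        # same-named districts in other cities are not included
        for name in district:
            cursor.execute("SELECT count(*) from demo where district = %s and city = '北京';", (name,))
            count = cursor.fetchall()
            district_result.append({'value': count[0][0], 'name': name})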

        # Company sizes and the number of postings for each
        cursor.execute("SELECT companySize, count(*) from demo where city = '北京' group by companySize;")
        for name, count in cursor.fetchall():
            companySize.append(name)
            companySizeResult.append({'value': count, 'name': name})

        # Education requirements and the number of postings for each
        cursor.execute("SELECT education, count(*) from demo where city = '北京' group by education;")
        for name, count in cursor.fetchall():
            education.append(name)
            educationResult.append({'value': count, 'name': name})

        # Work-experience brackets and the number of postings for each
        cursor.execute("SELECT workYear, count(*) from demo where city = '北京' group by workYear;")
        for name, count in cursor.fetchall():
            workYear.append(name)
            workYear_data.append({'value': count, 'name': name})

        # Financing stages: each one becomes a radar-chart indicator with a fixed maximum of 300
        cursor.execute("SELECT financeStage, count(*) from demo where city = '北京' group by financeStage;")
        for name, count in cursor.fetchall():
            finance.append(name)
            leida_max_dict.append({'name': name, 'max': 300})
            finance_data.append(count)
        # Job benefits: switch to a DictCursor so that rows can be read by column name
        cursor.close()
        cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
        cursor.execute("select positionAdvantage from `demo` where city = '北京';")
        content = ''.join(field['positionAdvantage'] for field in cursor.fetchall())
        jieba.analyse.set_stop_words('./stopwords.txt')
        tags = jieba.analyse.extract_tags(content, topK=100, withWeight=True)
        # Scale each keyword weight to an integer string for the word cloud
        positionAdvantage = [{"name": word, "value": str(int(weight * 10000))} for word, weight in tags]

        # Job types and the number of postings for each (rows are dicts at this point)
        cursor.execute("SELECT firstType, count(*) AS cnt from demo where city = '北京' group by firstType;")
        for row in cursor.fetchall():
            firstType.append(row['firstType'])
            firstType_data.append({'value': row['cnt'], 'name': row['firstType']})

        # Salary: count postings per salary bucket for each of twelve position keywords
        positionName = ['java', 'python', 'php', 'web', 'bi', 'android', 'ios', '算法', '大数据', '测试', '运维', '数据库']
        # Rows for the bar chart; the first row is the header
        zzt_list = []
        zzt_list.append(
            ['product', 'Java', 'Python', 'PHP', 'web', 'bi', 'android', 'ios', '算法', '大数据', '测试', '运维', '数据库'])
        # The bucket is decided by the first two characters of salary (e.g. "15k-25k" -> "15");
        # a "k" in that prefix means a single-digit lower bound, i.e. below 10K.
        # A lower bound of exactly 20 or 30 is counted once, in the higher bucket, not twice.
        # "%%" is a literal "%" because the query is formatted with bound parameters.
        salary_buckets = [
            ('0—10K', "SUBSTR(salary,1,2) like '%%k%%'"),
            ('10—20K', "SUBSTR(salary,1,2) >= 10 and SUBSTR(salary,1,2) < 20"),
            ('20—30K', "SUBSTR(salary,1,2) >= 20 and SUBSTR(salary,1,2) < 30"),
            ('30—40K', "SUBSTR(salary,1,2) BETWEEN 30 AND 40"),
            ('40以上', "SUBSTR(salary,1,2) > 40"),
        ]
        for label, condition in salary_buckets:
            row = [label]
            for keyword in positionName:
                cursor.execute("SELECT COUNT(*) AS cnt FROM demo WHERE " + condition + " and positionName like %s and city = '北京';", ('%' + keyword + '%',))
                row.append(cursor.fetchone()['cnt'])
            zzt_list.append(row)


    else:
        positionName = str(request.args['positionName']).lower()
        # Query condition: one position keyword, always passed as a bound LIKE pattern
        # Districts (restricted to Beijing)
        for name in district:
            cursor.execute("SELECT count(*) from demo where district = %s and city = '北京' and positionName like %s;", (name, '%' + positionName + '%'))
            count = cursor.fetchall()
            district_result.append({'value': count[0][0], 'name': name})
        # For each dimension below, every category seen in Beijing is kept (with 0 if nothing matches),
        # and only postings whose name matches the keyword are counted.
        # Company size
        cursor.execute("SELECT companySize, count(CASE WHEN positionName like %s THEN 1 END) from demo where city = '北京' group by companySize;", ('%' + positionName + '%',))
        for name, count in cursor.fetchall():
            companySize.append(name)
            companySizeResult.append({'value': count, 'name': name})
        # Education requirements
        cursor.execute("SELECT education, count(CASE WHEN positionName like %s THEN 1 END) from demo where city = '北京' group by education;", ('%' + positionName + '%',))
        for name, count in cursor.fetchall():
            education.append(name)
            educationResult.append({'value': count, 'name': name})
        # Work experience
        cursor.execute("SELECT workYear, count(CASE WHEN positionName like %s THEN 1 END) from demo where city = '北京' group by workYear;", ('%' + positionName + '%',))
        for name, count in cursor.fetchall():
            workYear.append(name)
            workYear_data.append({'value': count, 'name': name})
        # Financing stage: each one becomes a radar-chart indicator with a fixed maximum of 300
        cursor.execute("SELECT financeStage, count(CASE WHEN positionName like %s THEN 1 END) from demo where city = '北京' group by financeStage;", ('%' + positionName + '%',))
        for name, count in cursor.fetchall():
            finance.append(name)
            leida_max_dict.append({'name': name, 'max': 300})
            finance_data.append(count)
        # Job type
        cursor.execute("SELECT firstType, count(CASE WHEN positionName like %s THEN 1 END) from demo where city = '北京' group by firstType;", ('%' + positionName + '%',))
        for name, count in cursor.fetchall():
            firstType.append(name)
            firstType_data.append({'value': count, 'name': name})
        # Job benefits: switch to a DictCursor so that rows can be read by column name
        cursor.close()
        cursor = conn.cursor(cursor=pymysql.cursors.DictCursor)
        cursor.execute("select positionAdvantage from `demo` where city = '北京' and positionName like %s;", ('%' + positionName + '%',))
        content = ''.join(field['positionAdvantage'] for field in cursor.fetchall())
        jieba.analyse.set_stop_words('./stopwords.txt')
        tags = jieba.analyse.extract_tags(content, topK=100, withWeight=True)
        # Scale each keyword weight to an integer string for the word cloud
        positionAdvantage = [{"name": word, "value": str(int(weight * 10000))} for word, weight in tags]
        # Salary
        positionName_sample = ['java', 'python', 'php', 'web', 'bi', 'android', 'ios', '算法', '大数据', '测试', '运维', '数据库']
        # Rows for the bar chart; the first row is the header
        zzt_list = []
        zzt_list.append(['product', 'Java', 'Python', 'PHP', 'Web', 'BI', 'Android', 'ios', '算法', '大数据', '测试', '运维', '数据库'])
        # The bucket is decided by the first two characters of salary (e.g. "15k-25k" -> "15");
        # a "k" in that prefix means a single-digit lower bound, i.e. below 10K.
        # A lower bound of exactly 20 or 30 is counted once, in the higher bucket, not twice.
        # "%%" is a literal "%" because the query is formatted with bound parameters.
        salary_buckets = [
            ('0—10K', "SUBSTR(salary,1,2) like '%%k%%'"),
            ('10—20K', "SUBSTR(salary,1,2) >= 10 and SUBSTR(salary,1,2) < 20"),
            ('20—30K', "SUBSTR(salary,1,2) >= 20 and SUBSTR(salary,1,2) < 30"),
            ('30—40K', "SUBSTR(salary,1,2) BETWEEN 30 AND 40"),
            ('>40K', "SUBSTR(salary,1,2) > 40"),
        ]
        for label, condition in salary_buckets:
            cursor.execute("SELECT COUNT(*) AS cnt FROM demo WHERE " + condition + " and positionName like %s and city = '北京';", ('%' + positionName + '%',))
            value = cursor.fetchone()['cnt']
            # Only the column of the requested position gets the count; the others stay 0
            zzt_list.append([label] + [value if positionName == sample else 0 for sample in positionName_sample])
    result = {
        "district": district, "district_result": district_result,
        "companySize": companySize, "companySizeResult": companySizeResult,
        "education": education, "educationResult": educationResult,
        "workYear_data": workYear_data,
        "firstType": firstType, "firstType_data": firstType_data,
        "leida_max_dict": leida_max_dict, "cyt": positionAdvantage,
        "finance": finance, "finance_data": finance_data, "zzt": zzt_list}
    cursor.close()
    conn.close()
    return jsonify(result)
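
Because `positionName` comes straight from the query string, it is safer to pass it to the database as a bound parameter than to splice it into the SQL text. A minimal sketch, assuming an in-memory SQLite table as a stand-in for MySQL (sqlite3 uses `?` placeholders where pymysql uses `%s`); the rows and the helper `education_counts` are made up for illustration:

```python
import sqlite3

# In-memory SQLite stands in for the MySQL `demo` table; the rows are made up
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (city TEXT, positionName TEXT, education TEXT)")
conn.executemany("INSERT INTO demo VALUES (?, ?, ?)", [
    ("北京", "Python开发工程师", "本科"),
    ("北京", "Java工程师", "本科"),
    ("北京", "python爬虫", "大专"),
    ("北京", "测试工程师", "硕士"),
    ("上海", "Python后端", "本科"),
])


def education_counts(conn, city, keyword):
    """Count postings per education level in one city that match a position keyword.

    Every education level seen in the city is returned, with 0 when nothing matches.
    """
    sql = (
        "SELECT education, COUNT(CASE WHEN positionName LIKE ? THEN 1 END) "
        "FROM demo WHERE city = ? GROUP BY education"
    )
    return dict(conn.execute(sql, ("%" + keyword + "%", city)).fetchall())


python_counts = education_counts(conn, "北京", "python")
# A hostile keyword is treated as plain text to search for, not as SQL
hostile_counts = education_counts(conn, "北京", "' OR '1'='1")
print(python_counts)
print(hostile_counts)
```

The conditional COUNT keeps categories with no matches in the result, which is what a chart with fixed categories (such as the radar chart) needs.
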

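(10) Rental data visualization
The rental data is analyzed by region from four perspectives: nationwide, first-tier cities, new first-tier cities, and second-tier cities.
For example, the nationwide distribution of rental listings by floor area is shown in the figure below: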

@app.route('/area', methods=['GET'])
def area():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    area_kind = ['<=20㎡', '21~40㎡', '41~60㎡', '61~80㎡', '81~100㎡', '101~120㎡', '121~140㎡', '141~160㎡', '161~180㎡', '181~200㎡']
    # Inclusive (low, high) bounds in square metres, one pair per label in area_kind
    area_ranges = [(0, 20)] + [(low, low + 19) for low in range(21, 200, 20)]
    area_data = []
    # Count the listings that fall into each area bucket
    for low, high in area_ranges:
        cursor.execute("SELECT count(*) from house where area between %s and %s;", (low, high))
        area_data.append(cursor.fetchone()[0])
    cursor.close()
    conn.close()
    return jsonify({"area_kind": area_kind, "area_data": area_data})

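The same area histogram can also be computed in Python once the areas have been fetched, which avoids one query per bucket. A minimal sketch using `bisect`; the helper `bucket_areas` and the sample areas are hypothetical. Note one difference from the SQL ranges: a fractional area such as 55.5 is placed in the next bucket up, whereas integer BETWEEN ranges would count 20.5 in no bucket at all.

```python
from bisect import bisect_left

AREA_KIND = ['<=20㎡', '21~40㎡', '41~60㎡', '61~80㎡', '81~100㎡',
             '101~120㎡', '121~140㎡', '141~160㎡', '161~180㎡', '181~200㎡']
# Inclusive upper bound of each bucket: 20, 40, ..., 200
UPPER_BOUNDS = list(range(20, 201, 20))


def bucket_areas(areas):
    """Return one count per label in AREA_KIND."""
    counts = [0] * len(AREA_KIND)
    for area in areas:
        # Index of the first upper bound that is >= area
        index = bisect_left(UPPER_BOUNDS, area)
        if index < len(counts):  # areas above 200㎡ have no label and are skipped
            counts[index] += 1
    return counts


# Hypothetical sample areas in square metres
area_data = bucket_areas([15, 20, 21, 40, 55.5, 200, 230])
print(dict(zip(AREA_KIND, area_data)))
```
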
The nationwide distribution of rental listings by floor level is shown in the figure below:

@app.route('/floor', methods=['GET'])
def floor():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    floor_kind = []
    floor_data = []
    # Floor categories and the number of listings in each
    cursor.execute("SELECT floor, count(*) from house group by floor;")
    for name, count in cursor.fetchall():
        floor_kind.append(name)
        floor_data.append({'value': count, 'name': name})
    cursor.close()
    conn.close()
    return jsonify({"floor_kind": floor_kind, "floor_data": floor_data})

The nationwide distribution of rental listings by orientation is shown in the figure below:

@app.route('/orient', methods=['GET'])
def orient():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    orient_kind = []
    orient_data = []
    # Orientation categories and the number of listings in each
    cursor.execute("SELECT orient, count(*) from house group by orient;")
    for name, count in cursor.fetchall():
        orient_kind.append(name)
        orient_data.append({'value': count, 'name': name})
    cursor.close()
    conn.close()
    return jsonify({"orient_kind": orient_kind, "orient_data": orient_data})

The nationwide distribution of rental listings by price is shown in the figure below:

@app.route('/price', methods=['GET'])
def price():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    price_kind = ['<=1000', '1001~2000', '2001~3000', '3001~4000', '4001~5000', '5001~6000', '6001~7000', '7001~8000', '8001~9000', '9001~10000', '>10000']
    # Inclusive (low, high) bounds for the first ten labels in price_kind
    price_ranges = [(0, 1000)] + [(low, low + 999) for low in range(1001, 10000, 1000)]
    price_data = []
    # Count the listings that fall into each price bucket
    for low, high in price_ranges:
        cursor.execute("SELECT count(*) from house where price between %s and %s;", (low, high))
        price_data.append(cursor.fetchone()[0])
    # The last bucket is open-ended: >10000
    cursor.execute("SELECT count(*) from house where price >10000;")
    price_data.append(cursor.fetchone()[0])
    cursor.close()
    conn.close()
    return jsonify({"price_kind": price_kind, "price_data": price_data})

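If the prices are stored as integers, the whole price histogram can also come from one grouped query that maps each price to a bucket index. A minimal sketch, assuming an in-memory SQLite `house` table as a stand-in for MySQL; the rows and the helper `price_histogram` are made up. In SQLite, `/` between integers is integer division; in MySQL the `DIV` operator would be needed for the same effect.

```python
import sqlite3

PRICE_KIND = ['<=1000', '1001~2000', '2001~3000', '3001~4000', '4001~5000', '5001~6000',
              '6001~7000', '7001~8000', '8001~9000', '9001~10000', '>10000']

# In-memory SQLite stands in for the MySQL `house` table; the prices are made up
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE house (price INTEGER)")
conn.executemany(
    "INSERT INTO house VALUES (?)",
    [(800,), (1000,), (1500,), (2001,), (9500,), (10000,), (12000,), (30000,)],
)


def price_histogram(conn):
    """Return one count per label in PRICE_KIND using a single query."""
    # Bucket index: 0 for <=1000, 1 for 1001~2000, ..., 9 for 9001~10000, 10 for >10000
    sql = (
        "SELECT CASE WHEN price > 10000 THEN 10 "
        "WHEN price <= 1000 THEN 0 "
        "ELSE (price - 1) / 1000 END AS bucket, COUNT(*) "
        "FROM house GROUP BY bucket"
    )
    counts = [0] * len(PRICE_KIND)
    for bucket, n in conn.execute(sql):
        counts[bucket] = n
    return counts


price_data = price_histogram(conn)
print(dict(zip(PRICE_KIND, price_data)))
```
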
The nationwide relationship between rental price and floor area is shown in the figure below:

@app.route('/relation', methods=['GET'])
def relation():
    conn = pymysql.connect(host='localhost', user='root', password='123456', port=3366, db='lagou',
                           charset='utf8mb4')
    cursor = conn.cursor()
    cursor.execute("SELECT area,price from house;")
    # Each row becomes an [area, price] pair for the scatter plot
    relation_data = [list(row) for row in cursor.fetchall()]
    cursor.close()
    conn.close()
    return jsonify({"relation_data": relation_data})

(11) Intelligent prediction

@app.route('/predict', methods=['GET'])
def predict():
    # Salary bracket labels, indexed by the class that the model predicts
    y_data = ['0—10K', '10—20K', '20—30K', '30—40K', '40K以上']
    positionName = str(request.args['positionName']).lower()
    model = str(request.args['model'])
    # Load the pickled model saved for this position and algorithm
    with open(positionName + '_' + model + '.model', 'rb') as fr:
        selected_model = pickle.load(fr)
    # The four features arrive as integer codes
    companySize = int(request.args['companySize'])
    workYear = int(request.args['workYear'])
    education = int(request.args['education'])
    city = int(request.args['city'])
    x = np.array([companySize, workYear, education, city])
    # predict() expects a 2-D array: one row with four features
    y = selected_model.predict(x.reshape(1, -1))
    return jsonify(y_data[int(y[0])])
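
Since the model file name is built from request parameters and `pickle.load` can run arbitrary code from an untrusted file, it is worth checking both values against known names before opening anything. A minimal sketch of that check and of the class-to-label mapping; the allow-lists, `StubModel`, `model_filename`, and `predict_salary` are all hypothetical, because the post does not list the real position and model names:

```python
Y_DATA = ['0—10K', '10—20K', '20—30K', '30—40K', '40K以上']
# Hypothetical allow-lists; the real position and model names are not given in the post
ALLOWED_POSITIONS = {'java', 'python'}
ALLOWED_MODELS = {'knn', 'tree'}


class StubModel:
    """Stand-in for an unpickled classifier; only the predict() call shape matters here."""

    def predict(self, rows):
        # Toy rule: the salary class grows with the work-experience code, capped at the top class
        return [min(row[1], len(Y_DATA) - 1) for row in rows]


def model_filename(position, model):
    """Build the model file name only from values that are on the allow-lists."""
    position = position.lower()
    if position not in ALLOWED_POSITIONS or model not in ALLOWED_MODELS:
        raise ValueError("unknown position or model")
    return position + '_' + model + '.model'


def predict_salary(model, company_size, work_year, education, city):
    """Map the predicted class index to its salary bracket label."""
    features = [[company_size, work_year, education, city]]  # one row, four features
    predicted_class = int(model.predict(features)[0])
    return Y_DATA[predicted_class]


print(model_filename('Python', 'knn'))
print(predict_salary(StubModel(), 2, 3, 1, 0))
```
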

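(12) Website integration

The project also has some shortcomings:

1. The recruitment database holds relatively few postings, and they come from a single type of site that covers only internet-industry positions.
2. The crawlers for job postings and rental listings can only be run by entering the target URL manually.
3. Job postings and rental listings are not yet linked to each other; recommending nearby listings based on a company's address would make the system more useful.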

你可能感兴趣的:(数据分析,爬虫,数据可视化)

Python数据分析与可视化实战指南 William数据分析 python python 数据
在数据驱动的时代，Python因其简洁的语法、强大的库生态系统以及活跃的社区，成为了数据分析与可视化的首选语言。本文将通过一个详细的案例，带领大家学习如何使用Python进行数据分析，并通过可视化来直观呈现分析结果。一、环境准备1.1安装必要库在开始数据分析和可视化之前，我们需要安装一些常用的库。主要包括pandas、numpy、matplotlib和seaborn等。这些库分别用于数据处理、数学
Pyecharts数据可视化大屏：打造沉浸式数据分析体验我的运维人生信息可视化数据分析数据挖掘运维开发技术共享
Pyecharts数据可视化大屏：打造沉浸式数据分析体验在当今这个数据驱动的时代，如何将海量数据以直观、生动的方式展现出来，成为了数据分析师和企业决策者关注的焦点。Pyecharts，作为一款基于Python的开源数据可视化库，凭借其丰富的图表类型、灵活的配置选项以及高度的定制化能力，成为了构建数据可视化大屏的理想选择。本文将深入探讨如何利用Pyecharts打造数据可视化大屏，并通过实际代码案例
Day1笔记-Python简介&标识符和关键字&输入输出 ~在杰难逃~ Python python 开发语言大数据数据分析数据挖掘
大家好，从今天开始呢，杰哥开展一个新的专栏，当然，数据分析部分也会不定时更新的，这个新的专栏主要是讲解一些Python的基础语法和知识，帮助0基础的小伙伴入门和学习Python，感兴趣的小伙伴可以开始认真学习啦！一、Python简介【了解】1.计算机工作原理编程语言就是用来定义计算机程序的形式语言。我们通过编程语言来编写程序代码，再通过语言处理程序执行向计算机发送指令，让计算机完成对应的工作，编程
pyecharts——绘制柱形图折线图 2224070247 信息可视化 python java 数据可视化
一、pyecharts概述自2013年6月百度EFE(ExcellentFrontEnd）数据可视化团队研发的ECharts1.0发布到GitHub网站以来，ECharts一直备受业界权威的关注并获得广泛好评，成为目前成熟且流行的数据可视化图表工具，被应用到诸多数据可视化的开发领域。Python作为数据分析领域最受欢迎的语言，也加入ECharts的使用行列，并研发出方便Python开发者使用的数据
高级 ECharts 技巧：自定义图表主题与样式 SnowMan1993 echarts 信息可视化数据分析
ECharts是一个强大的数据可视化库，提供了多种内置主题和样式，但你也可以根据项目的设计需求，自定义图表的主题与样式。本文将介绍如何使用ECharts自定义图表主题，以提升数据可视化的吸引力和一致性。1.什么是ECharts主题？ECharts的主题是指定义图表样式的配置项，包括颜色、字体、线条样式等。通过预设主题，你可以快速更改图表的整体风格，而自定义主题则允许你在此基础上进行个性化设置。2.
Python爬虫解析工具之xpath使用详解 eqa11 python 爬虫开发语言
文章目录Python爬虫解析工具之xpath使用详解一、引言二、环境准备1、插件安装2、依赖库安装三、xpath语法详解1、路径表达式2、通配符3、谓语4、常用函数四、xpath在Python代码中的使用1、文档树的创建2、使用xpath表达式3、获取元素内容和属性五、总结Python爬虫解析工具之xpath使用详解一、引言在Python爬虫开发中，数据提取是一个至关重要的环节。xpath作为一门
nosql数据库技术与应用知识点皆过客，揽星河 NoSQL nosql 数据库大数据数据分析数据结构非关系型数据库
Nosql知识回顾大数据处理流程数据采集(flume、爬虫、传感器)数据存储(本门课程NoSQL所处的阶段)Hdfs、MongoDB、HBase等数据清洗(入仓)Hive等数据处理、分析(Spark、Flink等)数据可视化数据挖掘、机器学习应用(Python、SparkMLlib等)大数据时代存储的挑战(三高)高并发(同一时间很多人访问)高扩展(要求随时根据需求扩展存储)高效率(要求读写速度快)
《Python数据分析实战终极指南》 xjt921122 python 数据分析开发语言
对于分析师来说，大家在学习Python数据分析的路上，多多少少都遇到过很多大坑**，有关于技能和思维的**：Excel已经没办法处理现有的数据量了，应该学Python吗？找了一大堆Python和Pandas的资料来学习，为什么自己动手就懵了？跟着比赛类公开数据分析案例练了很久，为什么当自己面对数据需求还是只会数据处理而没有分析思路？学了对比、细分、聚类分析，也会用PEST、波特五力这类分析法，为啥
Python开发常用的三方模块如下：换个网名有点难 python 开发语言
Python是一门功能强大的编程语言，拥有丰富的第三方库，这些库为开发者提供了极大的便利。以下是100个常用的Python库，涵盖了多个领域：1、NumPy，用于科学计算的基础库。2、Pandas，提供数据结构和数据分析工具。3、Matplotlib，一个绘图库。4、Scikit-learn，机器学习库。5、SciPy，用于数学、科学和工程的库。6、TensorFlow，由Google开发的开源机
ES聚合分析原理与代码实例讲解光剑书架上的书大厂Offer收割机面试题简历程序员读书硅基计算碳基计算认知计算生物计算深度学习神经网络大数据 AIGC AGI LLM Java Python 架构设计 Agent 程序员实现财富自由
ES聚合分析原理与代码实例讲解1.背景介绍1.1问题的由来在大规模数据分析场景中，特别是在使用Elasticsearch（ES）进行数据存储和检索时，聚合分析成为了一个至关重要的功能。聚合分析允许用户对数据集进行细分和分组，以便深入探索数据的结构和模式。这在诸如实时监控、日志分析、业务洞察等领域具有广泛的应用。1.2研究现状目前，ES聚合分析已经成为现代大数据平台的核心组件之一。它支持多种类型的聚
Java爬虫框架（一）--架构设计狼图腾-狼之传说 java 框架 java 任务 html解析器存储电子商务
一、架构图那里搜网络爬虫框架主要针对电子商务网站进行数据爬取，分析，存储，索引。爬虫：爬虫负责爬取，解析，处理电子商务网站的网页的内容数据库：存储商品信息索引：商品的全文搜索索引Task队列：需要爬取的网页列表Visited表：已经爬取过的网页列表爬虫监控平台：web平台可以启动，停止爬虫，管理爬虫，task队列，visited表。二、爬虫1.流程1)Scheduler启动爬虫器，TaskMast
Java：爬虫框架 dingcho Java java 爬虫
一、ApacheNutch2【参考地址】Nutch是一个开源Java实现的搜索引擎。它提供了我们运行自己的搜索引擎所需的全部工具。包括全文搜索和Web爬虫。Nutch致力于让每个人能很容易,同时花费很少就可以配置世界一流的Web搜索引擎.为了完成这一宏伟的目标,Nutch必须能够做到:每个月取几十亿网页为这些网页维护一个索引对索引文件进行每秒上千次的搜索提供高质量的搜索结果简单来说Nutch支持分
WebMagic：强大的Java爬虫框架解析与实战 Aaron_945 Java java 爬虫开发语言
文章目录引言官网链接WebMagic原理概述基础使用1.添加依赖2.编写PageProcessor高级使用1.自定义Pipeline2.分布式抓取优点结论引言在大数据时代，网络爬虫作为数据收集的重要工具，扮演着不可或缺的角色。Java作为一门广泛使用的编程语言，在爬虫开发领域也有其独特的优势。WebMagic是一个开源的Java爬虫框架，它提供了简单灵活的API，支持多线程、分布式抓取，以及丰富的
00. 这里整理了最全的爬虫框架（Java + Python）有一只柴犬爬虫系列爬虫 java python
目录1、前言2、什么是网络爬虫3、常见的爬虫框架3.1、java框架3.1.1、WebMagic3.1.2、Jsoup3.1.3、HttpClient3.1.4、Crawler4j3.1.5、HtmlUnit3.1.6、Selenium3.2、Python框架3.2.1、Scrapy3.2.2、BeautifulSoup+Requests3.2.3、Selenium3.2.4、PyQuery3.2
GenVisR 基因组数据可视化实战(三) 11的雾
3.genCov画每个突变位点附件的coverage，跟igv有点相似。这个操作起来很复杂，但是图还是挺有用的。可以考虑。由于我的referencegenomebuild是hg38BiocManager::install(c("TxDb.Hsapiens.UCSC.hg38.knownGene","BSgenome.Hsapiens.UCSC.hg38"))library(TxDb.Hsapien
Python数据分析与可视化 jun778895 python 数据分析开发语言