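[Python Scraping + Visualization Case Study] Collecting Product Data from an E-commerce Site and Visualizing It

Scraping + visualization case study: Suning.com (苏宁易购)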

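  1. What you need to know for this case study:
  • Using selenium
  • Parsing data out of HTML tags
  2. Environment to prepare:
  • Python 3.8
  • PyCharm 2022 Professional
  • selenium, a third-party Python library that can be used to drive a browser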

Scraper code

Required modules

import time
from selenium import webdriver  # third-party library; it talks to the browser driver, which in turn controls the browser
from selenium.webdriver.common.by import By
import csv

Create the output file

f = open('苏宁易购.csv', mode='a', encoding='utf-8', newline='')
csv_writer = csv.writer(f)
csv_writer.writerow(['title', 'price', 'comment', 'store_stock', 'href'])
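
Because the file is opened with mode='a', each run of the script appends another header row to the same CSV. One way around that is sketched below, under the assumption that a small helper is acceptable: `append_rows` and `HEADER` are hypothetical names for illustration, not part of the original script. The helper writes the header only when the file is new or empty, and the `with` block closes the file even if the crawl fails partway.

```python
import csv
import os

# Column names taken from the script's header row
HEADER = ['title', 'price', 'comment', 'store_stock', 'href']


def append_rows(path, rows, header=HEADER):
    """Append rows to a CSV file, writing the header only if the file is new or empty.

    Hypothetical helper for illustration; not part of the original script.
    """
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # newline='' stops the csv module from emitting blank lines on Windows
    with open(path, mode='a', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        if need_header:
            writer.writerow(header)
        writer.writerows(rows)
```

In the crawl loop, the rows for one page could be collected in a list and passed to `append_rows('苏宁易购.csv', page_rows)` once per page.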
# Open the browser
driver = webdriver.Chrome()
# Open the site
driver.get("https://****/iPhone14/")
for i in range(15):
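    # Scroll down to the bottom of the page
    # The page is manipulated by running JavaScript
    # document.documentElement.scrollHeight: the total height of the current page
    # document.documentElement.scrollTop: the current scrollbar position
    # document.documentElement.scrollTop = document.documentElement.scrollHeight: sets the scrollbar position to the full page height
    # The loop below scrolls in fixed 2900-pixel steps, pausing 1 second after each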
    for page in range(0, 14500, 2900):
        driver.execute_script('document.documentElement.scrollTop = ' + str(page))
        time.sleep(1)
    # Extract the data
    # .product-box: matches every product element
    goods = driver.find_elements(By.CSS_SELECTOR, ".product-box")
    for good in goods:
        price = good.find_element(By.CSS_SELECTOR, ".price-box").text
        title = good.find_element(By.CSS_SELECTOR, ".title-selling-point").text
        href = good.find_element(By.CSS_SELECTOR, ".title-selling-point a").get_attribute("href")
        comment = good.find_element(By.CSS_SELECTOR, ".evaluate-old.clearfix").text
        store_stock = good.find_element(By.CSS_SELECTOR, ".store-stock").text
        print(title, price, comment, store_stock)
        csv_writer.writerow([title, price, comment, store_stock, href])
    driver.find_element(By.CSS_SELECTOR, "#nextPage").click()
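# To keep the browser open for inspection, block here (for example with input()); once the program ends the browser closes automatically
# Close the CSV file so every row is flushed to disk
f.close()
# Quit the browser
driver.quit()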
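
The inner loop scrolls through hard-coded offsets (`range(0, 14500, 2900)`), which only reaches the bottom if the page is no taller than those values cover. A small sketch of deriving the offsets from the measured page height instead is shown here. `scroll_offsets` is a hypothetical helper, and the commented-out Selenium lines assume the `driver` object and `time` import from the script above.

```python
def scroll_offsets(page_height, step=2900):
    """Return scrollTop values from 0 to the bottom of the page in fixed steps.

    Hypothetical helper for illustration; not part of the original script.
    """
    if page_height <= 0 or step <= 0:
        return [0]
    offsets = list(range(0, page_height, step))
    # Make sure the last stop is the bottom of the page itself
    if offsets[-1] != page_height:
        offsets.append(page_height)
    return offsets


# The stops produced by the hard-coded values in the script
print(list(range(0, 14500, 2900)))

# Assumed usage with the Selenium driver from the script (not run here):
# height = driver.execute_script('return document.documentElement.scrollHeight')
# for top in scroll_offsets(height):
#     driver.execute_script('document.documentElement.scrollTop = ' + str(top))
#     time.sleep(1)
```

Measuring the height once per page keeps the one-second pause per step from the original while adapting to pages of different lengths.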


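Visualization results

There are three charts in total, plus a word cloud.

If you're a college student and hand this in to your teacher, I think the teacher will probably, maybe, possibly decide you're doing alright, hahaha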


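Just kidding. Most college students have finished their assignments by now and are already off on break.

I still hope this article helps, though. Emm, I haven't been updating much lately, so hardly anyone reads these posts anymore, haha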


Visualization code

# Code cells from the Jupyter notebook (Python 3 kernel, Python 3.8.8, nbformat 4.5); recorded outputs are shown as comments.

# In [5]:
import pandas as pd
import jieba
import time
from pyecharts.charts import Bar, Line, Map, Page, Pie
from pyecharts import options as opts
from pyecharts.globals import SymbolType

# In [6]: load the scraped data
df_tb = pd.read_csv('苏宁易购.csv')
df_tb
# Out[6]:
#       title  price  comment  store_stock  href
# 0     Apple iPhone 14 128G 午夜色 移动联通电信5G手机  ¥5999.00  1.3万+评价  苏宁自营  https://product.suning.com/0000000000/12391268...
# 1     Apple iPhone 14 Pro Max 256G 暗紫色 移动联通电信5G手机  ¥9899.00  6300+评价  苏宁自营  https://product.suning.com/0000000000/12391268...
# 2     Apple iPhone 14 Pro Max 128G 暗紫色 移动联通电信5G手机  ¥8999.00  6300+评价  苏宁自营  https://product.suning.com/0000000000/12391268...
# 3     Apple iPhone 14 Pro 256G 深空黑色 移动联通电信5G手机  ¥8899.00  6400+评价  苏宁自营  https://product.suning.com/0000000000/12391268...
# 4     Apple iPhone 14 Pro 128G 暗紫色 移动联通电信5G手机  ¥7999.00  6400+评价  苏宁自营  https://product.suning.com/0000000000/12391268...
# ...   ...  ...  ...  ...  ...
# 1693  圣幻 iphone11手机壳苹果11pro硅胶套iphone11PROMAX全包防摔ipho...  ¥46.00  200+评价  任意门数码专营店  https://product.suning.com/0070067325/11398343...
# 1694  圣幻 iphone11苹果11proMax手机壳薄透明苹果11全包边11Pro电镀软壳防摔1...  ¥46.00  300+评价  任意门数码专营店  https://product.suning.com/0070067325/11398610...
# 1695  圣幻 iphone11/12/12pro手机壳透明防摔苹果12ProMAX保护套新款轻薄硅胶...  ¥46.00  1800+评价  任意门数码专营店  https://product.suning.com/0070067325/12179125...
# 1696  VMONN苹果13手机壳新款防摔iphone13Pro max翻盖保护皮套mini钱包插卡  ¥48.00  0评价  骑猪漫舞数码配件专营店  https://product.suning.com/0070154072/12321840...
# 1697  KIVee 可逸 PD20W快充套苹果PD充电器+1米数据线适用于苹果iPhone14/13pro  ¥58.00  200+评价  苏宁自营  https://product.suning.com/0000000000/12395644...
#
# [1698 rows x 5 columns]

# In [7]:
df_tb.info()
# Output:
# RangeIndex: 1698 entries, 0 to 1697
# Data columns (total 5 columns):
#  #   Column       Non-Null Count  Dtype
# ---  ------       --------------  -----
#  0   title        1698 non-null   object
#  1   price        1698 non-null   object
#  2   comment      1692 non-null   object
#  3   store_stock  1698 non-null   object
#  4   href         1698 non-null   object
# dtypes: object(5)
# memory usage: 66.5+ KB

# In [8]: strip currency symbols and labels, then convert to numbers
# The original run emitted a pandas FutureWarning because the default of regex
# will change from True to False; regex=False is passed explicitly here.
df_tb['price'] = df_tb['price'].str.replace('¥', '', regex=False)
df_tb['price'] = df_tb['price'].str.replace('到手价', '', regex=False)
df_tb['price'] = df_tb['price'].str.replace('\n26.90', '', regex=False)
df_tb['comment'] = df_tb['comment'].str.replace('+', '', regex=False)
df_tb['comment'] = df_tb['comment'].str.replace('评价', '', regex=False)
# Note: removing '万' (ten thousand) without scaling turns '1.3万' into 1.3, as row 0 below shows
df_tb['comment'] = df_tb['comment'].str.replace('万', '', regex=False)

df_tb['price'] = df_tb['price'].astype('float64')
df_tb['comment'] = df_tb['comment'].astype('float64')
df_tb.head()
# Out[8]:
#    title  price  comment  store_stock  href
# 0  Apple iPhone 14 128G 午夜色 移动联通电信5G手机  5999.0  1.3  苏宁自营  https://product.suning.com/0000000000/12391268...
# 1  Apple iPhone 14 Pro Max 256G 暗紫色 移动联通电信5G手机  9899.0  6300.0  苏宁自营  https://product.suning.com/0000000000/12391268...
# 2  Apple iPhone 14 Pro Max 128G 暗紫色 移动联通电信5G手机  8999.0  6300.0  苏宁自营  https://product.suning.com/0000000000/12391268...
# 3  Apple iPhone 14 Pro 256G 深空黑色 移动联通电信5G手机  8899.0  6400.0  苏宁自营  https://product.suning.com/0000000000/12391268...
# 4  Apple iPhone 14 Pro 128G 暗紫色 移动联通电信5G手机  7999.0  6400.0  苏宁自营  https://product.suning.com/0000000000/12391268...

# In [10]: total comment count per store
df_tb.groupby('store_stock')['comment'].sum()
# Out[10]:
# store_stock
# Apple产品啟韬专卖店                 103802.0
# 任意门数码专营店                    5582.0
# 小米智能生活旗舰店                  1.0
# 小米智能电器旗舰店                  0.0
# 数格尚品数码配件专营店              2417.0
# 深得二手电脑专营店                  0.0
# 直营                                0.0
# 禧运二手靓品专营店                  10.0
# 科华专营店                          84.8
# 竟纬科技专营店                      370.0
# 绿联官方旗舰店                      20620.0
# 绿联数码旗舰店                      8604.8
# 苏宁二手优品授权旗舰店              0.0
# 苏宁国际\n3C数码海外专营店          320.0
# 苏宁国际\n八达通海外专营店          1950.0
# 苏宁国际\n嘉怡海外专营店            47219.0
# 苏宁国际\n德天诺海外专营店          17626.0
# 苏宁国际\n方都数码海外旗舰店        33902.0
# 苏宁国际\n百思卖海外专营店          225.0
# 苏宁国际\n黑海数码海外官方旗舰店    11802.0
# 苏宁服务\nApple智能数码苏宁专卖店   1081901.0
# 苏宁服务\n小米智能苏宁专卖店        0.0
# 苏宁服务\n易购优选数码苏宁旗舰店    0.0
# 苏宁服务\n波格朗苏宁旗舰店          21.0
# 苏宁自营                            161359.6
# 苏宁自营\n华均魅苏宁旗舰店          416.0
# 讯天国际手机专营店                  480.0
# 诗薇蒂数码专营店                    27.0
# 质点旗舰店                          124720.0
# 迈动智能数码专营店                  300.0
# 锦际数码专营店                      2858.9
# 顺宇数码专营店                      1202.0
# 骑猪漫舞数码配件专营店              7.0
# Name: comment, dtype: float64

# In [11]: top 10 stores by total comment count
shop_top10 = df_tb.groupby('store_stock')['comment'].sum().sort_values(ascending=False).head(10)
shop_top10
# Out[11]:
# store_stock
# 苏宁服务\nApple智能数码苏宁专卖店   1081901.0
# 苏宁自营                            161359.6
# 质点旗舰店                          124720.0
# Apple产品啟韬专卖店                 103802.0
# 苏宁国际\n嘉怡海外专营店            47219.0
# 苏宁国际\n方都数码海外旗舰店        33902.0
# 绿联官方旗舰店                      20620.0
# 苏宁国际\n德天诺海外专营店          17626.0
# 苏宁国际\n黑海数码海外官方旗舰店    11802.0
# 绿联数码旗舰店                      8604.8
# Name: comment, dtype: float64

# In [22]: bar chart of the top 10 stores
# bar1 = Bar(init_opts=opts.InitOpts(width='1350px', height='750px'))
bar1 = Bar()
bar1.add_xaxis(shop_top10.index.tolist())
bar1.add_yaxis('', shop_top10.values.tolist())
# The chart title reads "iphone13 Top 10 Suning stores"
bar1.set_global_opts(title_opts=opts.TitleOpts(title='iphone13排名Top10苏宁店铺'),
                     xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-15)),
                     visualmap_opts=opts.VisualMapOpts(max_=28669)
                     )

bar1.render('1.html')
# Out[22]: 'C:\\02-讲师文件夹\\巳月公开课\\课题\\苏宁易购\\1.html'

# In [13]: number of products per price range
# Prices above 8888 fall outside the last bin, so pd.cut leaves them unlabelled (NaN)
cut_bins = [0, 50, 100, 200, 300, 500, 1000, 8888]
cut_labels = ['0~50元', '50~100元', '100~200元', '200~300元', '300~500元', '500~1000元', '1000元以上']

price_cut = pd.cut(df_tb['price'], bins=cut_bins, labels=cut_labels)
price_num = price_cut.value_counts()
price_num
# Out[13]:
# 1000元以上     858
# 0~50元         317
# 50~100元       92
# 100~200元      84
# 200~300元      8
# 300~500元      6
# 500~1000元     2
# Name: price, dtype: int64

# In [23]: bar chart of product counts per price range
# The original cell hard-coded the y values [895, 486, 701, 288, 370, 411, 260],
# which do not match price_num above; the computed counts are used here instead.
bar3 = Bar()
bar3.add_xaxis(cut_labels)
bar3.add_yaxis('', price_num.sort_index().tolist())
# The chart title reads "Number of products per price range"
bar3.set_global_opts(title_opts=opts.TitleOpts(title='不同价格区间的商品数量'),
                     visualmap_opts=opts.VisualMapOpts(max_=900))
bar3.render('2.html')
# Out[23]: 'C:\\02-讲师文件夹\\巳月公开课\\课题\\苏宁易购\\2.html'

# In [15]: total comment count per price range
df_tb['price_cut'] = price_cut

cut_purchase = df_tb.groupby('price_cut')['comment'].sum()
cut_purchase
# Out[15]:
# price_cut
# 0~50元         80778.9
# 50~100元       27471.5
# 100~200元      6371.9
# 200~300元      320.0
# 300~500元      203.0
# 500~1000元     1600.0
# 1000元以上     1118024.2
# Name: comment, dtype: float64

# In [16]: pie chart
data_pair = [list(z) for z in zip(cut_purchase.index.tolist(), cut_purchase.values.tolist())]
# Draw the pie chart; the title reads "Overall sales performance by price range",
# and the values plotted are the summed comment counts in cut_purchase
pie1 = Pie()
pie1.add('', data_pair, radius=['35%', '60%'])
pie1.set_global_opts(title_opts=opts.TitleOpts(title='不同价格区间的销售额整体表现'),
                     legend_opts=opts.LegendOpts(orient='vertical', pos_top='15%', pos_left='2%'))
pie1.set_series_opts(label_opts=opts.LabelOpts(formatter="{b}:{d}%"))
pie1.set_colors(['#EF9050', '#3B7BA9', '#6FB27C', '#FFAF34', '#D8BFD8', '#00BFFF', '#7FFFAA'])
pie1.render_notebook()
# Out[16]: the rendered pie chart (HTML output not preserved)

# In [17]: tokenize the product titles
def get_cut_words(content_series):
    # Load the stop-word list
    stop_words = []

    # with open(r"E:\py练习\数据分析\stop_words.txt", 'r', encoding='utf-8') as f:
    #     lines = f.readlines()
    #     for line in lines:
    #         stop_words.append(line.strip())

    # Add custom keywords so jieba keeps them as single tokens
    my_words = ['丝袜', '夏天', '女薄款', '一体']
    for i in my_words:
        jieba.add_word(i)

    # Custom stop words
    # my_stop_words = []
    # stop_words.extend(my_stop_words)

    # Tokenize
    word_num = jieba.lcut(content_series.str.cat(sep='。'), cut_all=False)

    # Keep tokens that are not stop words and have at least 2 characters
    word_num_selected = [i for i in word_num if i not in stop_words and len(i) >= 2]

    return word_num_selected

# In [18]:
text = get_cut_words(content_series=df_tb['title'])
# Output (stderr):
# Building prefix dict from the default dictionary ...
# Loading model from cache C:\Users\ADMINI~1\AppData\Local\Temp\jieba.cache
# Loading model cost 0.580 seconds.
# Prefix dict has been built successfully.

# In [ ]: (empty cell; the notebook ends here)

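In the cleaned table, the first row's comment count is 1.3 even though the scraped text was 1.3万+评价, because stripping 万 (ten thousand) does not scale the number. A hedged sketch of a parser that does scale it follows. `parse_comment_count` is a hypothetical helper, and the input formats are assumed from the sample rows shown above (`1.3万+评价`, `6300+评价`, `0评价`); other formats on the site may need extra handling.

```python
import re


def parse_comment_count(text):
    """Convert strings like '1.3万+评价' or '6300+评价' to a float count.

    Returns None when the input is not a string or contains no number.
    Hypothetical helper for illustration; not part of the original notebook.
    """
    if not isinstance(text, str):
        return None
    # A number, optionally followed by 万 (ten thousand)
    match = re.search(r'(\d+(?:\.\d+)?)(万?)', text)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2):
        value *= 10_000
    return value


for sample in ['1.3万+评价', '6300+评价', '0评价']:
    print(sample, '->', parse_comment_count(sample))
```

Applied to the raw column before any string replacement, for example with `df_tb['comment'].map(parse_comment_count)`, this would change the per-store and per-price-range totals shown above, so those recorded outputs should not be expected to match.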
Wrapping up

That's all for today's case study.

See you in the next post.
