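Scraping Taobao Product Information with Selenium

Table of Contents

  • Import the required packages, log in to Taobao, and find the page to scrape
  • Scraping the first product as an example
  • Scraping all products on the page
  • Saving the scraped data to Excel

Import the required packages, log in to Taobao, and find the page to scrape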

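from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import pandas as pd

url = 'https://www.taobao.com/'
# Selenium 4 takes the chromedriver path through a Service object; the raw string keeps the backslashes literal
driver = webdriver.Chrome(service=Service(r"C:\Program Files (x86)\Google\Chrome\Application\chromedriver.exe"))
driver.get(url)
# Note: once the page opens, log in to your own Taobao account manually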

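After logging in, type the product you are looking for into the search box. I searched for 王一博 (Wang Yibo), then filtered by the brand 施华洛世奇 (Swarovski), and finally selected 天猫 (Tmall). The page I ended up scraping is: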
https://s.taobao.com/search?spm=a230r.1.1998181369.d4919860.37358b56oUZ2tL&q=%E7%8E%8B%E4%B8%80%E5%8D%9A&imgfile=&commend=all&ssid=s5-e&search_type=item&sourceId=tb.index&ie=utf8&initiative_id=tbindexz_20170306&tab=mall&cps=yes&ppath=20000%3A41399
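The query string of the URL above carries the search choices just described. As a small sketch using only the standard library, the snippet below decodes it. Reading `ppath` as the brand filter is my assumption; the page itself does not document that parameter.

```python
from urllib.parse import urlsplit, parse_qs

search_url = (
    "https://s.taobao.com/search?spm=a230r.1.1998181369.d4919860.37358b56oUZ2tL"
    "&q=%E7%8E%8B%E4%B8%80%E5%8D%9A&imgfile=&commend=all&ssid=s5-e"
    "&search_type=item&sourceId=tb.index&ie=utf8"
    "&initiative_id=tbindexz_20170306&tab=mall&cps=yes&ppath=20000%3A41399"
)

# parse_qs percent-decodes the values and, by default, drops empty ones such as imgfile=
params = parse_qs(urlsplit(search_url).query)

keyword = params["q"][0]    # the keyword typed into the search box
tab = params["tab"][0]      # "mall", matching the Tmall selection
ppath = params["ppath"][0]  # presumably the brand filter (assumption)

print(keyword, tab, ppath)
```

Decoding the URL this way makes it easier to see which part to change when you want to scrape a different keyword.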

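# Each product card is a div whose class attribute is exactly this string (note the two trailing spaces)
# 'xpath' is the value of By.XPATH, so no extra import is needed; the find_*_by_xpath helpers were removed in Selenium 4.3
whole_products = driver.find_elements('xpath', '//div[@class="item J_MouserOnverReq  "]')
len(whole_products)  # 44, meaning this page lists 44 products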
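The XPath above compares the whole class attribute as one string, so the two trailing spaces inside the quotes matter. The sketch below illustrates this with the standard library's `xml.etree.ElementTree` as a stand-in for the browser DOM. The fragment, the `item-ad` class, and the `has_class` helper are all made up for illustration and are not real Taobao markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment imitating the result list; not real Taobao markup.
fragment = """
<div id="list">
  <div class="item J_MouserOnverReq  ">first</div>
  <div class="item J_MouserOnverReq  ">second</div>
  <div class="item J_MouserOnverReq item-ad  ">ad</div>
</div>
"""
root = ET.fromstring(fragment)

# Exact string comparison: the trailing spaces must be reproduced.
exact = root.findall('.//div[@class="item J_MouserOnverReq  "]')
trimmed = root.findall('.//div[@class="item J_MouserOnverReq"]')


def has_class(element, name):
    """Return True if `name` is one of the space-separated class tokens."""
    return name in element.get("class", "").split()


# Token-based matching ignores spacing and extra classes.
by_token = [d for d in root.iter("div") if has_class(d, "J_MouserOnverReq")]

print(len(exact), len(trimmed), len(by_token))
```

In Selenium itself, a spacing-tolerant equivalent would be the XPath 1.0 idiom `//div[contains(concat(" ", normalize-space(@class), " "), " J_MouserOnverReq ")]`. Whether you want the extra matches it picks up depends on the page.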

Scraping the first product as an example

# Product name
print(whole_products[0].find_element('xpath', './/a[@class="J_ClickStat"]').text)

# Product price
print(whole_products[0].find_element('xpath', './/div[@class="price g_price g_price-highlight"]/strong').text)

# Link to the product detail page
print(whole_products[0].find_element('xpath', './/div[@class="row row-2 title"]/a').get_attribute('href'))

# Image link
print(whole_products[0].find_element('xpath', './/img[@class="J_ItemPic img"]').get_attribute('src'))

# Monthly sales; [:-3] drops the last three characters of the text, leaving only the number
print(whole_products[0].find_element('xpath', './/div[@class="deal-cnt"]').text[:-3])

# Shop name
print(whole_products[0].find_element('xpath', './/div[@class="shop"]').text)

# Shop location
print(whole_products[0].find_element('xpath', './/div[@class="location"]').text)

Output:

【王一博海报同款系列】施华洛世奇ATELIERSWAROVSKI竹子造型耳钉
1990.00
https://detail.tmall.com/item.htm?id=612072866704&ns=1&abbucket=14
https://g-search3.alicdn.com/img/bao/uploaded/i4/i1/2576722561/O1CN01826HZg1UmyvsgKHXW_!!2576722561.jpg_360x360Q90.jpg_.webp
12
施华洛世奇官方旗舰店
浙江 嘉兴
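The `[:-3]` slice on the monthly sales text works when the text is a number followed by exactly three characters, which I assume looks like `12人付款` ("12 people paid"). A regular expression is less dependent on the suffix length. The sketch below uses a hypothetical helper, `parse_deal_count`; the `万` (ten thousand) multiplier and the `+` suffix are assumed formats that the page above does not show.

```python
import re


def parse_deal_count(text):
    """Convert strings like '12人付款' or '1.5万+人付款' to an int.

    Returns None when the text does not start with a number. The '万'
    (ten thousand) multiplier and the '+' suffix are assumed formats.
    """
    match = re.match(r"\s*(\d+(?:\.\d+)?)(万)?", text)
    if match is None:
        return None
    value = float(match.group(1))
    if match.group(2):
        value *= 10_000
    return int(round(value))


print(parse_deal_count("12人付款"))
print(parse_deal_count("1.5万+人付款"))
```

Returning an integer also means the column can be sorted or summed later without further cleaning.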

Scraping all products on the page

# One list per field; entries at the same index belong to the same product
name = []
price = []
detail_link = []
img_link = []
month_sale = []
shop_name = []
shop_location = []
# Apply the same XPath lookups used for the first product to every product card
for product in whole_products:
    name.append(product.find_element('xpath', './/a[@class="J_ClickStat"]').text)
    price.append(product.find_element('xpath', './/div[@class="price g_price g_price-highlight"]/strong').text)
    detail_link.append(product.find_element('xpath', './/div[@class="row row-2 title"]/a').get_attribute('href'))
    img_link.append(product.find_element('xpath', './/img[@class="J_ItemPic img"]').get_attribute('src'))
    month_sale.append(product.find_element('xpath', './/div[@class="deal-cnt"]').text[:-3])
    shop_name.append(product.find_element('xpath', './/div[@class="shop"]').text)
    shop_location.append(product.find_element('xpath', './/div[@class="location"]').text)
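The loop above stops with an exception as soon as one product card lacks any of the seven elements, because a failed lookup raises `NoSuchElementException`. One possible guard is sketched below. `safe_field` is a hypothetical helper, and `FakeElement` exists only so the sketch runs without a browser; a real run would pass Selenium `WebElement` objects, which provide `find_element(by, value)` and `get_attribute(name)`.

```python
def safe_field(element, xpath, attribute=None, default=""):
    """Return the text (or an attribute) of the first match, else `default`."""
    try:
        found = element.find_element("xpath", xpath)
    except Exception:  # in real use, narrow this to NoSuchElementException
        return default
    if attribute is not None:
        value = found.get_attribute(attribute)
        return default if value is None else value
    return found.text


class FakeElement:
    """Stand-in for a WebElement so the sketch runs without a browser."""

    def __init__(self, text="", attrs=None, children=None):
        self.text = text
        self._attrs = attrs or {}
        self._children = children or {}

    def find_element(self, by, value):
        # Mimic a failed lookup by raising, as Selenium does.
        if value not in self._children:
            raise LookupError(value)
        return self._children[value]

    def get_attribute(self, name):
        return self._attrs.get(name)


product = FakeElement(children={
    './/div[@class="shop"]': FakeElement(text="demo shop"),
    './/img[@class="J_ItemPic img"]': FakeElement(attrs={"src": "https://example.com/a.jpg"}),
})

shop = safe_field(product, './/div[@class="shop"]')
img = safe_field(product, './/img[@class="J_ItemPic img"]', attribute="src")
missing = safe_field(product, './/div[@class="deal-cnt"]', default=None)

print(shop, img, missing)
```

With a helper like this, every list still receives one entry per product, so the lists stay the same length and line up row by row.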

Saving the scraped data to Excel

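# Each list becomes a row here, so transpose to turn the lists into columns
data = pd.DataFrame([name,price,detail_link,img_link,month_sale,shop_name,shop_location]).T

data.columns = ['name','price','detail_link','img_link','month_sale','shop_name','shop_location']

# pandas 2.0 removed the encoding argument and .xls output; writing .xlsx needs the openpyxl package
data.to_excel('./yibo_swarovski.xlsx')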
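If an Excel workbook is not strictly required, a CSV file is a lighter alternative that Excel can still open. The sketch below swaps pandas for the standard library's `csv` module, which is not what the code above does. The rows, the reduced column list, and the output path are made up for illustration.

```python
import csv
import os
import tempfile

columns = ["name", "price", "shop_name"]
# Made-up rows standing in for the scraped lists.
rows = [
    {"name": "示例耳钉", "price": "1990.00", "shop_name": "demo shop"},
    {"name": "示例项链", "price": "990.00", "shop_name": "demo shop"},
]

out_path = os.path.join(tempfile.mkdtemp(), "products.csv")

# utf-8-sig writes a byte order mark so that Excel recognises the file as UTF-8.
with open(out_path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)

# Read the file back to confirm the round trip.
with open(out_path, newline="", encoding="utf-8-sig") as f:
    loaded = list(csv.DictReader(f))

print(len(loaded))
```

The byte order mark matters mainly for Chinese text: without it, Excel may open the file with the wrong encoding and show garbled product names.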
