[week1] day2: Parsing a Local Web Page

1. Basics

Parsing a web page with BeautifulSoup:

Steps:
  • Step 1: Parse the page
    BeautifulSoup(html, 'lxml')
  • Step 2: Describe where the content to scrape is located
    Soup.select( )
  • Step 3: Extract the needed information from the tags
    tag.get_text(), tag.get('src')  (called on each tag returned by Soup.select( ))
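
The three steps above can be sketched on a tiny inline page. This is only an illustration: the HTML, the selectors, and the variable names are made up here, and it uses Python's built-in 'html.parser' instead of 'lxml' so that it runs without the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML, invented for this illustration
html = """
<html><body>
  <div class="item">
    <img src="img/pic1.jpg">
    <h4><a href="#">First item</a></h4>
  </div>
</body></html>
"""

# Step 1: parse the page ('html.parser' is built in; the notes use 'lxml')
soup = BeautifulSoup(html, 'html.parser')

# Step 2: describe where the elements are, using CSS selectors
links = soup.select('div.item > h4 > a')
images = soup.select('div.item > img')

# Step 3: pull the needed information out of the tags
title = links[0].get_text()   # text between the opening and closing tag
src = images[0].get('src')    # value of the src attribute
print(title, src)
```

select() always returns a list, even for a single match, which is why the sketch indexes with [0] before calling get_text() or get().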

2. Writing the program myself

-The Result:

Paste_Image.png

-The Code:

from bs4 import BeautifulSoup
path = '/Users/huoqi/Documents/pythonlearn/combating/week1/1_2/homework1_2/1_2_homework_required/index.html'

with open(path, 'r') as wb_data:
    #print(wb_data)
    # Parse the local file with the lxml parser
    Soup = BeautifulSoup(wb_data, 'lxml')
    #print(Soup)
    # One CSS selector per field; each select() call returns a list of tags
    images = Soup.select('body > div > div > div.col-md-9 > div > div > div > img')
    titles = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
    prices = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
    views = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
    stars = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')

    #print(images, titles, prices, views, stars)

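# Walk the five lists in parallel, building one record per item
for image, title, price, view, star in zip(images, titles, prices, views, stars):
    data = {
        'image': image.get('src'),    # image path from the src attribute
        'title': title.get_text(),
        'price': price.get_text(),
        'view': view.get_text(),
        # Star rating = number of filled star icons inside the <p>
        'star': len(star.find_all('span', class_='glyphicon glyphicon-star'))
    }

    print(data)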
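
One thing to keep in mind with the zip() approach: it pairs the lists purely by position, so if any item on the page were missing a field, the later records would silently shift. A possible alternative, sketched below, is to select each item's container first and then search inside it. The HTML here is invented to resemble the homework page, and the class name thumbnail is an assumption (the selectors above never name that container). It also uses 'html.parser' instead of 'lxml' to stay dependency-free.

```python
from bs4 import BeautifulSoup

# Hypothetical markup resembling the homework page; "thumbnail" is an assumed class name
html = """
<div class="thumbnail">
  <img src="img/a.jpg">
  <div class="caption">
    <h4 class="pull-right">$24.99</h4>
    <h4><a href="#">Item A</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">15 reviews</p>
    <p><span class="glyphicon glyphicon-star"></span>
       <span class="glyphicon glyphicon-star"></span>
       <span class="glyphicon glyphicon-star-empty"></span></p>
  </div>
</div>
<div class="thumbnail">
  <img src="img/b.jpg">
  <div class="caption">
    <h4 class="pull-right">$64.99</h4>
    <h4><a href="#">Item B</a></h4>
  </div>
  <div class="ratings">
    <p class="pull-right">12 reviews</p>
    <p><span class="glyphicon glyphicon-star"></span>
       <span class="glyphicon glyphicon-star-empty"></span></p>
  </div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

results = []
for item in soup.select('div.thumbnail'):
    # Every lookup is scoped to this one item, so fields cannot drift between items
    results.append({
        'image': item.select_one('img').get('src'),
        'title': item.select_one('div.caption > h4 > a').get_text(),
        'price': item.select_one('div.caption > h4.pull-right').get_text(),
        'view': item.select_one('div.ratings > p.pull-right').get_text(),
        # span.glyphicon-star matches filled stars only, not glyphicon-star-empty
        'star': len(item.select('div.ratings span.glyphicon-star')),
    })

for record in results:
    print(record)
```

select_one() returns the first match or None, so on a real page with missing fields each lookup would need a None check before calling get() or get_text().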

3. Reflections and summary

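  • The len() function returns the number of elements in a list.
  • Paths obtained with "Copy selector" should be compared across several elements to see what they have in common.
  • I do not yet understand how to modify the paths; I am still thinking about it.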
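
On the open question about modifying paths, here is a small sketch of one common case, under the assumption that the copied selector pins a single element by position. The HTML and selectors are invented for illustration, and 'html.parser' is used instead of 'lxml'. Dropping the positional part of the selector turns "this one element" into "every element at that position in the structure".

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three sibling containers with one image each
html = """
<div class="row">
  <div class="col"><img src="a.jpg"></div>
  <div class="col"><img src="b.jpg"></div>
  <div class="col"><img src="c.jpg"></div>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# A selector that pins one specific sibling by position
one = soup.select('div.row > div:nth-of-type(2) > img')

# The same path with the positional part removed matches every sibling
all_imgs = soup.select('div.row > div > img')

print(len(one), len(all_imgs))
print([img.get('src') for img in all_imgs])
```

Comparing the copied paths of two or three neighbouring elements usually shows which part differs only by position; that is the part to remove or loosen.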

KEEP FIGHTING!
