Python Practice Plan, Week 1 Homework: 1.2 Parsing a Web Page

This exercise reads a local web page with Python and parses the listing information out of it.

[Figure: the web page to be parsed]

Implementation code

from bs4 import BeautifulSoup

info = []
with open('/Users/Trudy/Desktop/plan-for-combating/week1/1_2/1_2answer_of_homework/index.html', 'r') as wb_data:
    soup = BeautifulSoup(wb_data, 'lxml')
    images = soup.select(
        "body > div > div > div.col-md-9 > div > div > div > img")
    prices = soup.select(
        "body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right")
    titles = soup.select(
        "body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a")
    stars = soup.select(
        "body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)")
    reviews = soup.select(
        "body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right")

for image, price, title, star, review in zip(images, prices, titles, stars, reviews):
    data = {
        'image': image.get('src'),  # an <img> tag has no text; read its src attribute
        'price': price.get_text(),
        'title': title.get_text(),
        # count the filled-star <span> elements to get the rating
        'star': len(star.find_all("span", "glyphicon glyphicon-star")),
        'review': review.get_text()
    }
    info.append(data)

for i in info:
    print(i['title'], i['price'], i['image'], i['review'], i['star'])

Summary:

  • p:nth-of-type(2) selects every p element that is the second p child of its parent
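A minimal sketch of how nth-of-type behaves, using a hypothetical three-paragraph snippet:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet: three <p> tags inside one <div>
html = "<div><p>one</p><p>two</p><p>three</p></div>"
soup = BeautifulSoup(html, "html.parser")

# Matches the <p> that is the second p-child of its parent
second = soup.select("div > p:nth-of-type(2)")
print([p.get_text() for p in second])  # ['two']
```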
  • The find_all() method searches all tag children of the current tag and checks whether each one matches the filter conditions. A few examples:
soup.find_all("title")
# [<title>The Dormouse's story</title>]

soup.find_all("p", "title")
# [<p class="title"><b>The Dormouse's story</b></p>]

soup.find_all("a")
# [<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
#  <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>,
#  <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]

soup.find_all(id="link2")
# [<a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]

import re
soup.find(string=re.compile("sisters"))
# u'Once upon a time there were three little sisters; and their names were\n'
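The 'star' field in the code above relies on the same mechanism: find_all() with a tag name and a class filter. A minimal sketch using hypothetical ratings markup:

```python
from bs4 import BeautifulSoup

# Hypothetical ratings markup: four filled stars and one empty star
html = """<p class="ratings">
  <span class="glyphicon glyphicon-star"></span>
  <span class="glyphicon glyphicon-star"></span>
  <span class="glyphicon glyphicon-star"></span>
  <span class="glyphicon glyphicon-star"></span>
  <span class="glyphicon glyphicon-star-empty"></span>
</p>"""
soup = BeautifulSoup(html, "html.parser")

# Passing "glyphicon glyphicon-star" as the class filter matches only spans
# whose class attribute is exactly that string, so the empty star
# ("glyphicon glyphicon-star-empty") is not counted
count = len(soup.find_all("span", "glyphicon glyphicon-star"))
print(count)  # 4
```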
