Python Hands-On Program Study Notes 0629

Day one of the hands-on program: I scraped a local web page.

The final result looks like this:

[Image: Paste_Image.png]

My code:

from bs4 import BeautifulSoup

info = []  # one dict per scraped item

# Parse the local HTML file with the lxml parser
with open('E:/PycharmProjects/homework2/homework2/1_2_homework_required/index.html', 'r') as data:
    soup = BeautifulSoup(data, 'lxml')

# Each select() call returns a list of matching tags, one per item on the page
images = soup.select('body > div > div > div.col-md-9 > div > div > div > img')
titles = soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
prices = soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')
grades = soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')
counts = soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')
# print(images, titles, grades, prices, counts)  # debug output

# Walk the five lists in parallel and build one dict per item
for title, image, price, grade, count in zip(titles, images, prices, grades, counts):
    data1 = {
        'title': title.get_text(),
        'image': image.get('src'),
        'price': price.get_text(),
        # rating = number of filled star icons under this node
        'grade': len(grade.find_all('span', class_='glyphicon glyphicon-star')),
        'count': count.get_text(),
    }
    print(data1)
    info.append(data1)
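
To try the same pattern without the local index.html, here is a minimal sketch that parses a small inline HTML string instead. The markup, the shortened selectors, and the scrape() helper are all hypothetical stand-ins, not the real course page, and html.parser is used in place of lxml only so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the course's index.html.
HTML = """
<div class="row">
  <div class="thumbnail">
    <img src="img/pic_0000.jpg">
    <div class="caption">
      <h4 class="pull-right">$24.99</h4>
      <h4><a href="#">First item</a></h4>
    </div>
    <div class="ratings">
      <p class="pull-right">65 reviews</p>
      <p>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star-empty"></span>
      </p>
    </div>
  </div>
  <div class="thumbnail">
    <img src="img/pic_0001.jpg">
    <div class="caption">
      <h4 class="pull-right">$64.99</h4>
      <h4><a href="#">Second item</a></h4>
    </div>
    <div class="ratings">
      <p class="pull-right">12 reviews</p>
      <p>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
      </p>
    </div>
  </div>
</div>
"""


def scrape(html):
    """Hypothetical helper: return one dict per item in the markup."""
    soup = BeautifulSoup(html, 'html.parser')
    # Shortened selectors; the real page needs the full copied CSS path.
    images = soup.select('div.thumbnail > img')
    titles = soup.select('div.caption > h4 > a')
    prices = soup.select('div.caption > h4.pull-right')
    grades = soup.select('div.ratings > p:nth-of-type(2)')
    counts = soup.select('div.ratings > p.pull-right')

    info = []
    for title, image, price, grade, count in zip(titles, images, prices, grades, counts):
        info.append({
            'title': title.get_text(),
            'image': image.get('src'),
            'price': price.get_text(),
            # Count only filled stars; "glyphicon-star-empty" does not match.
            'grade': len(grade.find_all('span', class_='glyphicon glyphicon-star')),
            'count': count.get_text(),
        })
    return info


info = scrape(HTML)
for item in info:
    print(item)
```

Note that zip() stops at the shortest list, so if one selector misses an item the rows can silently shift; comparing the lengths of the five lists first is a cheap sanity check.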

Summary

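  • BeautifulSoup can use three HTML parsers: lxml (used here), html.parser, and html5lib.
  • In a copied CSS path, :nth-child(1) > img pins the selector to one specific child node. To grab every matching element, delete that part, or rewrite it as nth-of-type if a positional selector is still needed.
  • Steps: 1. parse the page with BeautifulSoup; 2. copy the CSS path (mind the format, especially the spaces); 3. pick out the information you need; 4. add each dict to the list with info.append(data1).
  • () is a tuple, [] is a list, {} is a dict.
  • Difference between grade and grades: grades is the list of parent nodes returned by select(), one per item on the page; grade is a single one of those nodes, and find_all on it returns the list of star spans under it, whose length is the rating.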
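
The selector and grade/grades points above can be checked on a tiny example. This is only a sketch: the markup and the class name item are made up for illustration, and html.parser is used so that nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two items, each with its own ratings block.
HTML = """
<div class="row">
  <div class="item"><div class="ratings">
    <p class="pull-right">15 reviews</p>
    <p><span class="glyphicon glyphicon-star"></span>
       <span class="glyphicon glyphicon-star"></span></p>
  </div></div>
  <div class="item"><div class="ratings">
    <p class="pull-right">3 reviews</p>
    <p><span class="glyphicon glyphicon-star"></span></p>
  </div></div>
</div>
"""

soup = BeautifulSoup(HTML, 'html.parser')

# A positional pseudo-class on the item pins the match to that one item...
first_only = soup.select('div.item:nth-of-type(1) > div.ratings > p:nth-of-type(2)')
# ...while dropping it matches the same path under every item.
grades = soup.select('div.item > div.ratings > p:nth-of-type(2)')

# grades is a list of parent <p> nodes; each grade is one node, and
# find_all returns the list of star spans found below it.
star_counts = [
    len(grade.find_all('span', class_='glyphicon glyphicon-star'))
    for grade in grades
]

print(len(first_only), len(grades), star_counts)
```

The inner p:nth-of-type(2) stays in both selectors because it is what picks the star paragraph rather than the review count within each ratings block.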
