BeautifulSoup Web Scraping (4): Handling Siblings and Parents

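BeautifulSoup's next_siblings attribute (a generator, not a function call) is well suited to walking through tables, especially tables that have a header row.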


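from urllib.request import urlopen
from bs4 import BeautifulSoup


html = urlopen("http://www.pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, 'lxml')

# .tr selects the first row of the table (the header row);
# next_siblings yields every node that follows it at the same level.
siblings = soup.find("table", {'id': 'giftList'}).tr.next_siblings
count = 0  # named count rather than sum, to avoid shadowing the built-in sum()
for sibling in siblings:
    print(sibling)
    count += 1
print(count)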

Output:




Vegetable Basket

This vegetable basket is the perfect gift for your health conscious (or overweight) friends!
Now with super-colorful bell peppers!

$15.00






Russian Nesting Dolls

Hand-painted by trained monkeys, these exquisite dolls are priceless! And by "priceless," we mean "extremely expensive"! 8 entire dolls per set! Octuple the presents!

$10,000.52




...

11
0
[Finished in 2.2s]

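The code prints every product row in the table except the header row at the top, for two reasons:

  1. An object is never its own sibling, so the sibling attributes always skip the object you start from.
  2. The code uses next_siblings, so it returns only the siblings that come after the starting object.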
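
The two points above can be checked on a small inline table. This is only a sketch: SAMPLE_HTML is a hypothetical miniature of the gift table rather than the real page, and it uses the built-in "html.parser" instead of lxml so that it runs without extra dependencies. It also shows that next_siblings yields the whitespace text nodes between rows as well as the rows themselves, which may explain why the printed count is larger than the number of product rows.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical miniature of the gift table (not the real page).
SAMPLE_HTML = """<table id="giftList">
<tr id="header"><th>Item Title</th><th>Cost</th></tr>
<tr id="gift1"><td>Vegetable Basket</td><td>$15.00</td></tr>
<tr id="gift2"><td>Russian Nesting Dolls</td><td>$10,000.52</td></tr>
</table>"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
header_row = soup.find("table", {"id": "giftList"}).tr

# Every node after the header row: <tr> tags plus the newline strings between them.
all_siblings = list(header_row.next_siblings)

# Keep only real tags, dropping the whitespace text nodes.
row_siblings = [node for node in all_siblings if isinstance(node, Tag)]

# find_next_siblings("tr") returns the following <tr> tags directly.
same_rows = header_row.find_next_siblings("tr")

print(len(all_siblings), len(row_siblings))
print([row["id"] for row in same_rows])
```

If only the rows matter, find_next_siblings("tr") is usually the shorter choice, since it never returns text nodes.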

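Note: besides next_siblings, keep previous_siblings in mind. It is handy when the last row is easy to locate and does not itself need to be scraped. The singular forms, next_sibling and previous_sibling, return a single sibling node instead.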
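
As a rough illustration of the note above, the sketch below starts from an easily located last row and walks backwards with previous_siblings, then uses the singular attributes on a single row. The table, including its "footer" row and the id values, is made up for this example, and the markup is written without whitespace between tags so that no text nodes appear among the siblings.

```python
from bs4 import BeautifulSoup

# Hypothetical table; the "footer" row stands in for a last row
# that is easy to find but should not be scraped.
SAMPLE_HTML = (
    '<table id="giftList">'
    '<tr id="header"><th>Item Title</th></tr>'
    '<tr id="gift1"><td>Vegetable Basket</td></tr>'
    '<tr id="gift2"><td>Russian Nesting Dolls</td></tr>'
    '<tr id="footer"><td>Total</td></tr>'
    '</table>'
)

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
last_row = soup.find("tr", {"id": "footer"})

# previous_siblings walks backwards from the starting row, nearest first,
# and skips the starting row itself.
previous_ids = [row["id"] for row in last_row.previous_siblings]

# The singular attributes return exactly one neighbouring node (or None).
first_gift = soup.find("tr", {"id": "gift1"})
next_id = first_gift.next_sibling["id"]
prev_id = first_gift.previous_sibling["id"]

print(previous_ids)
print(next_id, prev_id)
print(last_row.next_sibling)
```

On a real page the rows are usually separated by newlines, so next_sibling and previous_sibling may return a whitespace string rather than a tag; find_next_sibling("tr") and find_previous_sibling("tr") avoid that.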
