【Python3 爬虫学习笔记】解析库的使用 4 —— Beautiful Soup 2

父节点和祖先节点

如果要获取某个节点元素的父节点,可以调用parent属性:

html = """


The Dormouse's story


Once upon a time there were three little sisters; and their names were Elsie

...

"""
from bs4 import BeautifulSoup soup = BeautifulSoup(html, 'lxml') print(soup.a.parent)

运行结果如下:

<p class="story">
            Once upon a time there were three little sisters; and their names were
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
</p>

这里我们选择的是第一个a节点的父节点元素。很明显,它的父节点是p节点,输出结果便是p节点及其内部的内容。
需要注意的是,这里输出的仅仅是a节点的直接父节点,而没有再向外寻找父节点的祖先节点。如果想获取所有的祖先节点,可以调用parents属性:

html = """


Elsie

"""
from bs4 import BeautifulSoup soup = BeautifulSoup(html, 'lxml') print(type(soup.a.parents)) print(list(enumerate(soup.a.parents)))

运行结果如下:

<class 'generator'>
[(0, <p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
</p>), (1, <body>
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
</p>
</body>), (2, <html>
<body>
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
</p>
</body></html>), (3, <html>
<body>
<p class="story">
<a class="sister" href="http://example.com/elsie" id="link1">
<span>Elsie</span>
</a>
</p>
</body></html>)]

可以发现,返回结果是生成器类型。这里用列表输出了它的索引和内容,而列表中的元素就是a节点的祖先节点。

兄弟节点

兄弟节点的获取方式:

html = """


Once upon a time there were little sisters; and their names were Elsie Hello Lacie and Tillie and they lived at the bottom of a well.

"""
from bs4 import BeautifulSoup soup = BeautifulSoup(html, 'lxml') print('Next Sibling', soup.a.next_sibling) print('Prev Sibling', soup.a.previous_sibling) print('Next Siblings', list(enumerate(soup.a.next_siblings))) print('Prev Siblings', list(enumerate(soup.a.previous_siblings)))

运行结果如下:

Next Sibling
        Hello

Prev Sibling
        Once upon a time there were little sisters; and their names were

Next Siblings [(0, '\n        Hello\n'), (1, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>), (2, '\n        and\n'), (3, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>), (4, '\n        and they lived at the bottom of a well.\n')]
Prev Siblings [(0, '\n        Once upon a time there were little sisters; and their names were\n')]

可以看到,这里调用了4个属性,其中next_sibling和previous_sibling分别获取节点的下一个和上一个兄弟元素,next_siblings和previous_siblings则分别返回所有前面和后面的兄弟节点的生成器。

你可能感兴趣的:(学习笔记)