Web Scraping Basics Series: BeautifulSoup, Searching the Document Tree (3)

  • The find_all method returns a BeautifulSoup-specific result set (bs4.element.ResultSet) whose items are Tag objects
from bs4 import BeautifulSoup
import re
html = """
The Dormouse's storyThe Dormouse's story2

The Dormouse's story

Once upon a time there were three little sisters; and their names were , Lacie and Tillie; and they lived at the bottom of a well.

...

"""
# Parse the HTML string
soup = BeautifulSoup(html, 'lxml')

data=soup.find_all('a')
print(type(data))

Result:

<class 'bs4.element.ResultSet'>

Reading the values:

data=soup.find_all('a')
for i in data:
    print(i.string)

Result:

 Elsie 
Lacie
Tillie
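
The behaviour of the result set can be sketched in a few lines. This is a minimal example under assumptions: the markup below is a stand-in, not the article's original HTML, and it uses the built-in "html.parser" so that lxml is not required. It shows that a ResultSet can be used like a list, and that .string hands back a Comment object when a tag's only child is an HTML comment, which would explain a padded value such as " Elsie " in the output.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Assumed sample markup, not the article's original HTML.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")

# ResultSet subclasses list, so len(), indexing and iteration all work.
is_list = isinstance(links, list)
count = len(links)
second_id = links[1]["id"]

# .string returns a tag's single child string; for the first link
# that child is a Comment rather than ordinary text.
strings = [a.string for a in links]
visible = [str(s) for s in strings if not isinstance(s, Comment)]

print(is_list, count, second_id)
print(visible)
```

Filtering on Comment is one way to skip text that would not be visible in a browser.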

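# Find tags by regular expression
data1 = soup.find_all(re.compile('^b'))
for i in data1:
    print(i)  # print each matching tag, not the whole list

The result contains every tag whose name starts with b, because the pattern is matched against the tag name.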
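
A small self-contained sketch of name matching by regular expression follows. The markup is an assumed example chosen so that several tag names start with b; "html.parser" is used so the snippet runs without lxml. Beautiful Soup applies the pattern with search(), so an unanchored pattern matches anywhere in the tag name.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample markup for illustration only.
html = (
    "<html><body><p><b>bold</b> and <i>italic</i></p>"
    "<blockquote>quoted</blockquote></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Anchored pattern: tag names that start with "b", in document order.
starts_with_b = [tag.name for tag in soup.find_all(re.compile("^b"))]

# Unanchored pattern: tag names that contain "t" anywhere.
contains_t = [tag.name for tag in soup.find_all(re.compile("t"))]

# A list of names matches any tag whose name is in the list.
b_or_i = [tag.name for tag in soup.find_all(["b", "i"])]

print(starts_with_b)
print(contains_t)
print(b_or_i)
```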

# Find tags by attribute
data2 = soup.find_all(id='link2')
print(data2)

Result:

[Lacie]
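
Attribute filters come in a few forms, sketched below. The markup is an assumed stand-in for the article's HTML, and "html.parser" is used to avoid depending on lxml. The example covers a plain keyword argument, class_ (needed because class is a reserved word in Python), a regular expression as the attribute value, the attrs dictionary, and True to require only that the attribute exists.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample markup, not the article's original HTML.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match on one attribute.
by_id = [a["href"] for a in soup.find_all(id="link2")]

# class is a Python keyword, so the filter is spelled class_.
by_class = [a["id"] for a in soup.find_all("a", class_="sister")]

# The attribute value can also be a compiled regular expression.
by_href = [a["id"] for a in soup.find_all(href=re.compile("lacie|tillie"))]

# attrs takes a dict, useful for names that are not valid keyword arguments.
by_attrs = [a.string for a in soup.find_all(attrs={"id": "link3"})]

# True matches any tag that has the attribute at all.
has_id = [tag.name for tag in soup.find_all(id=True)]

print(by_id, by_class, by_href, by_attrs, has_id)
```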

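# Find strings by their text content
# (newer bs4 releases name this keyword string=; text= is the older spelling)
data3 = soup.find_all(text='Tillie')              # exact match
data4 = soup.find_all(text=['Lacie', 'Tillie'])   # match any string in the list
data5 = soup.find_all(text=re.compile("Do"))      # match by regular expression
print(data3)
print(data4)
print(data5)

Results for data3, data4 and data5: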

['Tillie']
['Lacie', 'Tillie']
["The Dormouse's story", "The Dormouse's story2", "The Dormouse's story"]
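
The same searches can be written with the string= keyword, which Beautiful Soup 4.4.0 and later accept in place of text=. The sketch below runs on assumed sample markup with "html.parser". It also shows two related points: each matched string keeps a link to its parent tag, and combining a tag name with string= returns tags rather than strings.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample markup, not the article's original HTML.
html = (
    "<html><head><title>The Dormouse's story</title></head>"
    "<body><p class=\"title\"><b>The Dormouse's story</b></p>"
    "<p class=\"story\"><a id=\"link2\">Lacie</a> and "
    "<a id=\"link3\">Tillie</a></p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# string= returns the matching text nodes themselves.
exact = [str(s) for s in soup.find_all(string="Tillie")]
either = [str(s) for s in soup.find_all(string=["Lacie", "Tillie"])]
pattern = soup.find_all(string=re.compile("Do"))

# Each matched string remembers the tag that contains it.
parents = [s.parent.name for s in pattern]

# With a tag name as well, find_all returns tags whose .string matches.
tag_ids = [a["id"] for a in soup.find_all("a", string="Lacie")]

print(exact, either)
print([str(s) for s in pattern], parents)
print(tag_ids)
```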
