BeautifulSoup4

1. Introduction to bs4

  • BeautifulSoup is a library for extracting data from HTML or XML files.
  • Installation:
      pip install lxml
      pip install bs4
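
To confirm that both installs above succeeded, the import system can be asked whether it can locate the modules. This is only a minimal sketch; the helper name `is_installed` is hypothetical and not part of bs4 or lxml.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if Python can locate a top-level module with this name."""
    # is_installed is a hypothetical helper, not part of bs4 or lxml.
    return importlib.util.find_spec(module_name) is not None


# Report the status of the two packages installed above.
for name in ("bs4", "lxml"):
    print(name, "installed" if is_installed(name) else "missing")
```

If either line reports "missing", rerun the matching pip command in the same environment that runs the script.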
    

2. Using bs4

html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p>The Dormouse's story</p>

<p>Once upon a time there were three little sisters; and their names were Elsie, Lacie and Tillie; and they lived at the bottom of a well.</p>

<p>...</p>
</body></html>
"""

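from bs4 import BeautifulSoup

# Create the BeautifulSoup object
bs = BeautifulSoup(html_doc, 'lxml')
# Print the document, re-indented into a cleaner form
print(bs.prettify())
print(bs.title)         # the whole title tag: <title>The Dormouse's story</title>
print(bs.title.name)    # the tag's name: title
print(bs.title.string)  # the text inside the title tag: The Dormouse's story
print(bs.p)             # the first p tag in the document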
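
Attribute access such as bs.title only returns the first matching tag, and it returns None when the document has no such tag. The sketch below wraps that lookup in a small helper to make the None case explicit. `SAMPLE_HTML`, `PARSER`, and `first_tag_text` are hypothetical names introduced for illustration, and the fallback to Python's built-in "html.parser" when lxml is missing is an assumption, not something the steps above require.

```python
import importlib.util

from bs4 import BeautifulSoup

# Hypothetical sample document, shortened from the html_doc idea above.
SAMPLE_HTML = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p>Once upon a time there were three little sisters.</p>
<p>...</p>
</body></html>
"""

# Assumption: prefer lxml when installed, otherwise use the built-in parser.
PARSER = "lxml" if importlib.util.find_spec("lxml") else "html.parser"


def first_tag_text(html, tag_name):
    """Return the text of the first <tag_name> element, or None if absent."""
    soup = BeautifulSoup(html, PARSER)
    # soup.find("title") is the explicit form of soup.title.
    tag = soup.find(tag_name)
    return tag.get_text() if tag is not None else None


print(first_tag_text(SAMPLE_HTML, "title"))
print(first_tag_text(SAMPLE_HTML, "p"))
print(first_tag_text(SAMPLE_HTML, "table"))  # no <table> in the sample
```

Checking for None before reading .string or calling get_text() avoids an AttributeError on pages where the expected tag is missing.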
