Python Basics Study Notes 19

Installing the BeautifulSoup library

 ~ pip3 install bs4
Collecting bs4
  Downloading https://files.pythonhosted.org/packages/10/ed/7e8b97591f6f456174139ec089c769f89a94a1a4025fe967691de971f314/bs4-0.0.1.tar.gz
Collecting beautifulsoup4 (from bs4)
  Downloading https://files.pythonhosted.org/packages/1d/5d/3260694a59df0ec52f8b4883f5d23b130bc237602a1411fa670eae12351e/beautifulsoup4-4.7.1-py3-none-any.whl (94kB)
    100% |████████████████████████████████| 102kB 230kB/s
Collecting soupsieve>=1.2 (from beautifulsoup4->bs4)
  Downloading https://files.pythonhosted.org/packages/77/78/bca00cc9fa70bba1226ee70a42bf375c4e048fe69066a0d9b5e69bc2a79a/soupsieve-1.8-py2.py3-none-any.whl (88kB)
    100% |████████████████████████████████| 92kB 114kB/s
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Stored in directory: /Users/insight2026/Library/Caches/pip/wheels/a0/b0/b2/4f80b9456b87abedbc0bf2d52235414c3467d8889be38dd472
Successfully built bs4
Installing collected packages: soupsieve, beautifulsoup4, bs4
Successfully installed beautifulsoup4-4.7.1 bs4-0.0.1 soupsieve-1.8
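
As the log shows, the `bs4` package is a thin wrapper that pulls in `beautifulsoup4` and `soupsieve`. A quick way to confirm the install worked is to import the library and parse a tiny snippet. This is a minimal sketch: the `<p>hello</p>` string is made up for illustration, and it uses Python's built-in `html.parser` so it does not depend on lxml.

```python
import bs4
from bs4 import BeautifulSoup

# Report the installed version (the log above shows 4.7.1; yours may differ).
print(bs4.__version__)

# Parse an illustrative snippet with the standard-library parser
# to confirm that the import and a basic parse both work.
soup = BeautifulSoup("<p>hello</p>", "html.parser")
print(soup.p.string)
```

If the import fails with `ModuleNotFoundError`, the package was probably installed for a different Python interpreter than the one running the script.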

Example: using BeautifulSoup instead of regular expressions to extract content from HTML:

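# Sample document from the Beautiful Soup documentation
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
</body></html>
"""

from bs4 import BeautifulSoup

# Parse the HTML with the lxml parser; if lxml is not installed, run: pip3 install lxml
soup = BeautifulSoup(html_doc, 'lxml')
print(soup.prettify())

# Find the title tag
print(soup.title)

# Text inside the title tag
print(soup.title.string)

# Find the first p tag
print(soup.p)

# Class name(s) of the first p tag (returned as a list)
print(soup.p['class'])

# Find the first a tag
print(soup.a)

# Find all a tags
print(soup.find_all('a'))

# Find the tag whose id is link3
print(soup.find(id="link3"))

# Print the href of every a tag
for link in soup.find_all('a'):
    print(link.get('href'))

# Get all the text in the document
print(soup.get_text())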
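
To illustrate why a parser can be preferable to a regular expression for this kind of extraction, the sketch below runs both over the same markup. The `snippet` string and the regex pattern are assumptions made up for this comparison, not part of the original example. The second link deliberately uses single quotes and a different attribute order.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: two links written in different styles.
snippet = """
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a id='link2' class='sister' href='http://example.com/lacie'>Lacie</a>
"""

# A naive pattern that expects href to come first and to be double-quoted.
regex_links = re.findall(r'<a href="([^"]+)"', snippet)

# BeautifulSoup parses the markup, so quoting style and attribute order
# make no difference to the result.
soup = BeautifulSoup(snippet, "html.parser")
soup_links = [a.get("href") for a in soup.find_all("a")]

print(regex_links)
print(soup_links)
```

The regex finds only the first link, while `find_all('a')` returns both. The pattern could be extended to cover more cases, but each variation in the markup needs another special case, which the parser handles already.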
