Python data extraction with BeautifulSoup, tutorial 1: quick start

Introduction


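Beautiful Soup is a library that makes it easy to scrape information from web pages. It sits on top of an HTML or XML parser and provides Pythonic idioms for iterating over, searching, and modifying the parse tree.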
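To make "searching" and "iterating" concrete, here is a minimal sketch on a small made-up list. The markup and variable names are illustrative assumptions and are not part of the original tutorial.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to illustrate searching and iterating.
markup = "<ul><li>apple</li><li>banana</li><li>cherry</li></ul>"
soup = BeautifulSoup(markup, "html.parser")

# Search: find_all() returns every matching tag, in document order.
items = soup.find_all("li")

# Iterate over the search results: get_text() returns each tag's text content.
fruits = [li.get_text() for li in items]
print(fruits)  # ['apple', 'banana', 'cherry']

# Iterate over the tree itself: .children yields the direct children of <ul>.
child_names = [child.name for child in soup.ul.children]
print(child_names)  # ['li', 'li', 'li']
```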

Installation

pip install -U beautifulsoup4

Quick start

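>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("<p>Some<b>bad<i>HTML", "html.parser")
>>> print(soup.prettify())
<p>
 Some
 <b>
  bad
  <i>
   HTML
  </i>
 </b>
</p>
>>> soup.find(string="bad")
'bad'
>>> soup.i
<i>HTML</i>
>>> soup = BeautifulSoup("<tag1>Some<tag2/>bad<tag3>XML", "xml")  # "xml" needs lxml: pip install lxml
>>> print(soup.prettify())
<?xml version="1.0" encoding="utf-8"?>
<tag1>
 Some
 <tag2/>
 bad
 <tag3>
  XML
 </tag3>
</tag1>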
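The quick-start session relies on Beautiful Soup repairing unclosed tags. The sketch below checks this directly by serializing the broken fragment from that session with `str()`. How a fragment gets repaired depends on the parser. The comments assume Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# The broken fragment from the quick-start session: none of the tags are closed.
bad_html = "<p>Some<b>bad<i>HTML"

soup = BeautifulSoup(bad_html, "html.parser")

# html.parser closes the still-open tags at the end of the input,
# so the serialized tree is well-formed.
fixed = str(soup)
print(fixed)  # <p>Some<b>bad<i>HTML</i></b></p>

# <i> ended up nested inside <b>, so <b>'s text includes "HTML".
print(soup.b.get_text())  # badHTML
```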

Reference: https://pypi.org/project/beautifulsoup4/

  • This article is continuously updated; latest version: https://www.jianshu.com/p/fbc416635987

Going a bit deeper


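Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your parser of choice to provide idiomatic ways of navigating, searching, and modifying the document. It can save you hours or even days of work.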
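Modifying the tree is mentioned above but not demonstrated elsewhere in this tutorial. Here is a minimal sketch, assuming a small made-up snippet:

- Tag attributes are set like dictionary keys.
- Assigning to `.string` replaces a tag's contents.
- `new_tag()` plus `append()` inserts a new element.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet, used only to illustrate modifying the tree.
soup = BeautifulSoup('<p class="note">Hello <b>world</b></p>', "html.parser")
p = soup.p

# Attributes behave like a dict: overwrite the class value.
p["class"] = "warning"

# Assigning to .string replaces everything inside <b>.
p.b.string = "Beautiful Soup"

# Create a new <em> tag and append it as the last child of <p>.
em = soup.new_tag("em")
em.string = "!"
p.append(em)

print(soup)  # <p class="warning">Hello <b>Beautiful Soup</b><em>!</em></p>
```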

  • Loading a document

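The example document in this section is an excerpt from Alice's Adventures in Wonderland. It is assumed to be stored in the string html_doc, whose HTML source is listed after this session: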

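>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html_doc, 'html.parser')
>>> print(soup.prettify())
<html>
 <head>
  <title>
   The Dormouse's story
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The Dormouse's story
   </b>
  </p>
  <p class="story">
   Once upon a time there were three little sisters; and their names were
   <a class="sister" href="http://example.com/elsie" id="link1">
    Elsie
   </a>
   ,
   <a class="sister" href="http://example.com/lacie" id="link2">
    Lacie
   </a>
   and
   <a class="sister" href="http://example.com/tillie" id="link3">
    Tillie
   </a>
   ; and they lived at the bottom of a well.
  </p>
  <p class="story">
   ...
  </p>
 </body>
</html>
>>>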
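The session above passes html_doc as a string, but BeautifulSoup also accepts an open file object. In this sketch `io.StringIO` stands in for a real file so the example is self-contained. The file name `story.html` in the comment is hypothetical.

```python
import io

from bs4 import BeautifulSoup

# io.StringIO stands in for a real file here. With an actual file you would write:
#   with open("story.html", encoding="utf-8") as f:   # hypothetical file name
#       soup = BeautifulSoup(f, "html.parser")
doc = io.StringIO("<html><head><title>The Dormouse's story</title></head><body></body></html>")

# BeautifulSoup reads the whole file-like object before parsing it.
soup = BeautifulSoup(doc, "html.parser")
print(soup.title.string)  # The Dormouse's story
```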

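The HTML source, stored in the string html_doc:

html_doc = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""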

[Figure: the example document as rendered in a browser]
  • Some simple ways to navigate this data structure:
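>>> soup.title
<title>The Dormouse's story</title>
>>> soup.title.name
'title'
>>> soup.title.string
"The Dormouse's story"
>>> soup.title.parent.name
'head'
>>> soup.p
<p class="title"><b>The Dormouse's story</b></p>
>>> soup.p['class']
['title']
>>> soup.a
<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
>>> soup.find_all('a')
[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>, <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>, <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>]
>>> soup.find(id="link3")
<a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>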
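Here are a few more navigation and search calls. They are sketched on a trimmed copy of the story's link paragraph, trimmed only for brevity. Note that `find_all()` takes `class_` because `class` is a reserved word in Python.

```python
from bs4 import BeautifulSoup

# A trimmed copy of the story document: only the paragraph with the three links.
html_doc = """<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Filter by CSS class with class_ (class itself is a Python keyword).
sisters = soup.find_all("a", class_="sister")
names = [str(a.string) for a in sisters]
print(names)  # ['Elsie', 'Lacie', 'Tillie']

# tag["id"] raises KeyError if the attribute is missing; tag.get() returns None instead.
print(sisters[2]["id"])          # link3
print(sisters[0].get("title"))   # None

# Sideways: the next <a> sibling of the first link (text nodes are skipped).
print(sisters[0].find_next_sibling("a").string)  # Lacie

# Upward: every link sits inside the same <p>.
print(sisters[0].parent.name)  # p
```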

Two common tasks: (1) finding all the links on a page via their <a> tags, and (2) extracting all the text from a page.

>>> for link in soup.find_all('a'):
...     print(link.get('href'))
... 
http://example.com/elsie
http://example.com/lacie
http://example.com/tillie
>>> print(soup.get_text())

The Dormouse's story

The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie,
Lacie and
Tillie;
and they lived at the bottom of a well.
...

>>> 
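The two common tasks can be wrapped in small helper functions. `extract_links` and `extract_text` are hypothetical names used only for this sketch. `urljoin` turns relative hrefs into absolute URLs, which the session above does not do.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return absolute URLs for every <a> tag that has an href, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True matches only <a> tags that actually carry an href attribute.
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]


def extract_text(html):
    """Return the page text, one stripped, non-empty fragment per line."""
    soup = BeautifulSoup(html, "html.parser")
    # separator puts each text fragment on its own line; strip=True trims
    # whitespace and drops fragments that become empty.
    return soup.get_text(separator="\n", strip=True)


# Usage on a small made-up page: one relative link, one <a> without href, one absolute link.
page = ('<p>See <a href="/elsie">Elsie</a>, <a>no link</a> and '
        '<a href="http://example.com/lacie">Lacie</a>.</p>')
links = extract_links(page, "http://example.com/")
print(links)  # ['http://example.com/elsie', 'http://example.com/lacie']
print(extract_text(page))
```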
