Python Web Scraping Study Log (4)

Contents

  • The Beautiful Soup library
    • 1. Purpose
    • 2. The BeautifulSoup class
    • 3. Basic elements
    • 4. Understanding the library
    • 5. Traversing HTML content with bs4
    • 6. Formatted HTML output with bs4

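The Beautiful Soup library

Note: the B and the S are capitalized.

1. Purpose

  • The Beautiful Soup library provides functions for parsing, traversing, and maintaining a "tag tree".
    A tag tree: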
<html>
	<body>
		<p class="title">...</p>
	</body>
</html>
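
As a rough illustration of "parse, traverse, maintain", here is a minimal sketch that uses the tag tree above. The variable names and the replacement text "Hello" are made up for this example.

```python
from bs4 import BeautifulSoup

# The tag tree from the text, written as one string.
html = '<html><body><p class="title">...</p></body></html>'
soup = BeautifulSoup(html, "html.parser")

# Parse: the <p> tag becomes a node in the tree.
p = soup.body.p
tag_name = p.name        # 'p'
p_class = p["class"]     # ['title'] (class is multi-valued, so bs4 returns a list)

# Traverse: step from the <p> node up to its parent.
parent_name = p.parent.name   # 'body'

# Maintain: change the node's text, then serialize the tree again.
p.string = "Hello"            # "Hello" is an arbitrary example value
updated_html = str(soup)
print(updated_html)
```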

2. The BeautifulSoup class

  • HTML page <-> tag tree <-> BeautifulSoup class
from bs4 import BeautifulSoup
soup = BeautifulSoup("data", "html.parser") # "html.parser" is the HTML parser
soup2 = BeautifulSoup(open("D://demo.html"), "html.parser") # a file object also works
  • A BeautifulSoup object corresponds to the entire content of one HTML/XML document.
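
The two ways of building a BeautifulSoup object (from a string and from an open file) can be sketched as follows. This is only a sketch: the markup is an invented example, and the temporary file stands in for a real file such as demo.html. Opening the file in a with block closes the handle, which the one-line open(...) call does not do.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Invented example markup.
markup = "<html><body><p>data</p></body></html>"

# 1) Build the tree from a string.
soup = BeautifulSoup(markup, "html.parser")

# 2) Build the tree from a file object. A temporary file is used here as a
#    stand-in for a real HTML file on disk.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo.html")
    with open(path, "w", encoding="utf-8") as f:
        f.write(markup)
    with open(path, encoding="utf-8") as f:
        soup2 = BeautifulSoup(f, "html.parser")  # the file is read here

# Both objects hold the same document.
same = str(soup) == str(soup2)
print(same)
```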

3. Basic elements


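  • A NavigableString can span multiple tag levels: when a <p> tag contains only a <b> tag, p.string returns the text inside the <b> tag.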
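
A small sketch of the basic elements as bs4 exposes them: the tag itself, its name, its attributes, and its string. The markup is an assumed example; .name, .attrs and .string are standard bs4 attributes.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

# Assumed example markup: a <p> whose only child is a <b>.
soup = BeautifulSoup('<p class="title"><b>The demo python</b></p>', "html.parser")

tag = soup.p            # Tag: the first <p> in the document
name = tag.name         # Name: 'p'
attrs = tag.attrs       # Attributes: {'class': ['title']}
text = tag.string       # NavigableString: reaches through <b> to the text

print(isinstance(tag, Tag), name, attrs)
print(isinstance(text, NavigableString), text)
```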

4. Understanding the library


>>> from bs4 import BeautifulSoup
>>> newsoup = BeautifulSoup("<b><!--This is a comment--></b><p>This is not a comment</p>", "html.parser")
>>> newsoup.b.string
'This is a comment'
>>> type(newsoup.b.string)
<class 'bs4.element.Comment'>
>>> newsoup.p.string
'This is not a comment'
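  • <!-- --> marks an HTML comment. During parsing the comment delimiters are dropped and only the text is extracted. To tell the text in the b tag apart from the text in the p tag, check the type of the string object.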
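
One possible way to do that type check is sketched below. It assumes markup with a comment inside a b tag and plain text inside a p tag; is_comment is a helper name invented for this example.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

# Assumed markup: a comment inside <b>, plain text inside <p>.
newsoup = BeautifulSoup(
    "<b><!--This is a comment--></b><p>This is not a comment</p>",
    "html.parser",
)

def is_comment(node):
    """Return True if the string node came from an HTML comment."""
    # Comment is a subclass of NavigableString, so test for Comment itself.
    return isinstance(node, Comment)

b_is_comment = is_comment(newsoup.b.string)
p_is_comment = is_comment(newsoup.p.string)
print(b_is_comment, p_is_comment)
```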

5. Traversing HTML content with bs4


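  1. Downward traversal of the tag tree (from a tag to its children and descendants)
  2. Upward traversal of the tag tree (from a tag to its parent and ancestors)
  3. Sideways traversal of the tag tree (between sibling nodes that share a parent)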
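
The three directions can be sketched with bs4's standard navigation attributes (.children, .descendants, .parents, .next_sibling, .previous_sibling). The markup below is an assumed minimal example, not the demo page.

```python
from bs4 import BeautifulSoup

# Assumed minimal markup for illustration.
html = "<html><head><title>Demo</title></head><body><p>one</p><p>two</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

# 1. Downward: direct children, then every node below <body>.
child_names = [child.name for child in soup.body.children]
descendant_count = len(list(soup.body.descendants))  # 2 tags + 2 strings

# 2. Upward: every ancestor of <title>, ending at the BeautifulSoup object itself.
parent_chain = [parent.name for parent in soup.title.parents]

# 3. Sideways: siblings share the same parent.
first_p = soup.body.p
next_text = first_p.next_sibling.string     # text of the second <p>
prev_node = first_p.previous_sibling        # None: nothing comes before the first <p>
head_sibling = soup.head.next_sibling.name  # <body> follows <head>

print(child_names, descendant_count)
print(parent_chain)
print(next_text, prev_node, head_sibling)
```

In real pages a sibling is often a whitespace string rather than a tag, so it is worth checking the node type before using it.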

6. Formatted HTML output with bs4

  • Pretty-printing the HTML output
>>> import requests
>>> r = requests.get("http://python123.io/ws/demo.html")
>>> demo = r.text
>>> demo
'<html><head><title>This is a python demo page</title></head>\r\n<body>\r\n<p class="title"><b>The demo python introduces several python courses.</b></p>\r\n<p ...>Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:\r\n<a ...>Basic Python</a> and <a ...>Advanced Python</a>.</p>\r\n</body></html>'
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(demo, "html.parser")
>>> soup.prettify()
'<html>\n <head>\n  <title>\n   This is a python demo page\n  </title>\n </head>\n <body>\n  <p class="title">\n   <b>\n    The demo python introduces several python courses.\n   </b>\n  </p>\n  <p ...>\n   Python is a wonderful general-purpose programming language. You can learn Python from novice to professional by tracking the following courses:\n   <a ...>\n    Basic Python\n   </a>\n   and\n   <a ...>\n    Advanced Python\n   </a>\n   .\n  </p>\n </body>\n</html>'
>>> print(soup.prettify())  # pretty-print: adds line breaks
<html>
 <head>
  <title>
   This is a python demo page
  </title>
 </head>
 <body>
  <p class="title">
   <b>
    The demo python introduces several python courses.
   </b>
  </p>
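
prettify() can also be tried without a network request. The sketch below uses a short inline string (one paragraph in the style of the demo page) and also calls prettify() on a single tag, which bs4 supports as well.

```python
from bs4 import BeautifulSoup

# Inline markup, so no HTTP request is needed.
html = '<p class="title"><b>The demo python introduces several python courses.</b></p>'
soup = BeautifulSoup(html, "html.parser")

# Whole document: every tag and every string goes on its own line.
pretty = soup.prettify()
print(pretty)

# A single tag can be pretty-printed the same way.
pretty_b = soup.b.prettify()
print(pretty_b)

# The same content with the indentation stripped, for easy comparison.
pretty_lines = [line.strip() for line in pretty.strip().split("\n")]
```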
