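[Chapter: Code Knowledge] Python BeautifulSoup Usage

BeautifulSoup: the HTML parsing library in the bs4 package

Purpose: parses HTML documents so that their contents can be searched and extracted.

Example: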

from bs4 import BeautifulSoup   

soup = BeautifulSoup(html_doc,"html.parser")  
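
The snippet above assumes that html_doc already exists. As a minimal sketch, the version below fills it in with a made-up sample document (the markup and variable names are assumptions for illustration, not from the original) and reads two values back out of the parsed tree.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document standing in for html_doc
html_doc = """
<html><head><title>Demo page</title></head>
<body><p class="intro">Hello, <b>world</b></p></body></html>
"""

soup = BeautifulSoup(html_doc, "html.parser")  # html.parser ships with Python

title_text = soup.title.string   # text inside the <title> tag
intro_class = soup.p["class"]    # class is multi-valued, so this is a list

print(title_text)
print(intro_class)
```

The "html.parser" backend is used here only because it needs no extra install; other parsers such as "lxml" are passed the same way.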

BeautifulSoup

Converts the input data into a parse tree using the specified parser, which makes the HTML easy to search.

Example:

import requests

from bs4 import BeautifulSoup

url = "https://movie.douban.com/"

headers = {

    "user-agent":"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36"

}

# Parse the response body

soup = BeautifulSoup(requests.get(url,headers=headers).content,"html.parser")

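soup.find: find the first matching tagName

Returns the first tag that matches. When called with only a tag name, it is equivalent to soup.tagName.

It also supports filtering by attribute, in the form soup.find(tagName, attribute=value). Use class_ for the class attribute, because class is a reserved word in Python.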

import requests

from bs4 import BeautifulSoup

soup = BeautifulSoup(open('123.html'), "lxml")  # the "lxml" parser requires the lxml package

list1 = soup.find('div', class_='index-left')  # 'div' is the target tag; class_='index-left' is the attribute value to match

print(list1)
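
To make the behaviour of find easier to see without a local 123.html file, here is a small sketch on an inline document. The markup is invented for illustration; only the calls themselves (find, the soup.div shortcut, class_ filtering) come from the text above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for this example
html_doc = """
<div class="index-right">right</div>
<div class="index-left">left</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

first_div = soup.find("div")                      # first <div> in document order
same_div = soup.div                               # attribute-style shortcut for the same lookup
left_div = soup.find("div", class_="index-left")  # filter by class value
missing = soup.find("table")                      # no match: find returns None

print(first_div.get_text(), same_div.get_text(), left_div.get_text(), missing)
```

Because find returns None when nothing matches, it is usually worth checking the result before calling methods on it.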

soup.find().decompose(): remove the first matching tag
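
A short sketch of decompose, assuming a made-up document with two script tags. It removes only the tag returned by find, together with everything inside it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for this example
html_doc = "<div><script>track()</script><p>Keep me</p><script>more()</script></div>"
soup = BeautifulSoup(html_doc, "html.parser")

first_script = soup.find("script")
if first_script is not None:   # find returns None when nothing matches
    first_script.decompose()   # drop the tag and its contents from the tree

remaining = str(soup)
print(remaining)
```

To remove every match instead of just the first, one option is to loop over soup.find_all("script") and call decompose on each element.
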
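soup.find_all: find all matching tagName

Returns a list of every tag that matches.

It also supports filtering by attribute, returning every tag whose attribute matches: soup.find_all(tagName, attribute=value)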

import requests

from bs4 import BeautifulSoup

soup = BeautifulSoup(open('123.html'), "lxml")

list1 = soup.find_all('div', class_='index-left')  # 'div' is the target tag; class_='index-left' is the attribute value to match

print(list1)

list2 = soup.find_all('img')

print(list2)
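
The same calls can be tried on an inline document. The sketch below uses invented markup and also shows two things the text does not cover: reading an attribute from each result and the limit parameter of find_all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for this example
html_doc = """
<div class="index-left"><img src="a.png"></div>
<div class="index-left"><img src="b.png"></div>
<div class="index-right"><img src="c.png"></div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

left_divs = soup.find_all("div", class_="index-left")       # every matching <div>
img_sources = [img["src"] for img in soup.find_all("img")]  # src of every <img>
first_two = soup.find_all("img", limit=2)                   # stop after two matches

print(len(left_divs), img_sources, len(first_two))
```

Unlike find, find_all returns an empty list rather than None when nothing matches, so it is safe to iterate over the result directly.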

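soup.select(): select tags with a CSS selector

Selects tags by CSS selector and returns a list. '>' matches only direct children of the preceding element; a space matches descendants at any depth.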

from bs4 import BeautifulSoup

soup = BeautifulSoup(open('123.html'), "lxml")

list1 = soup.select('.high-quality-list > ul > li > a > img')[0]   # selector path: .high-quality-list class --> ul --> li --> a --> img

list1 = soup.select('.high-quality-list > ul a')    # a space matches descendants at any depth
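
The difference between '>' and a space is easiest to see side by side. This is a sketch on invented markup in which one link sits directly inside its li and another is wrapped in an extra span; select_one is a related method that returns only the first match.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for this example
html_doc = """
<div class="high-quality-list">
  <ul>
    <li><a href="/1"><img src="1.png"></a></li>
    <li><span><a href="/2">nested</a></span></li>
  </ul>
</div>
"""
soup = BeautifulSoup(html_doc, "html.parser")

direct = soup.select(".high-quality-list > ul > li > a")   # only <a> directly inside <li>
nested = soup.select(".high-quality-list > ul a")          # any <a> anywhere below <ul>
first_img = soup.select_one(".high-quality-list img")      # first match, or None

direct_hrefs = [a["href"] for a in direct]
nested_hrefs = [a["href"] for a in nested]
print(direct_hrefs, nested_hrefs, first_img["src"])
```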

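soup.select()[0].get_text(): get the text inside a selected tag

Returns the text inside a tag. select() returns a list, so index into it first and call get_text() on the element.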

from bs4 import BeautifulSoup

soup = BeautifulSoup(open('123.html'), "lxml")

list1 = soup.select('.high-quality-list > ul > li > a > img')[0].get_text()   # selector path: .high-quality-list class --> ul --> li --> a --> img; an <img> has no text, so this returns ''
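
As a sketch of what get_text returns in practice, the example below uses invented markup with a link that contains both an image and some text. It assumes the goal is the visible link text, and reads the image's alt attribute separately, since an img tag carries no text of its own.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for this example
html_doc = '<ul><li><a href="/1"><img src="1.png" alt="Poster"> First title </a></li></ul>'
soup = BeautifulSoup(html_doc, "html.parser")

link = soup.select("ul > li > a")[0]
raw = link.get_text()              # text as written, including surrounding spaces
clean = link.get_text(strip=True)  # surrounding whitespace removed
img_text = link.img.get_text()     # empty string: <img> has no text content
alt = link.img.get("alt")          # attributes are read with get() or []

print(repr(raw), repr(clean), repr(img_text), alt)
```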

soup.tagName: find a tag

Returns the first tag with the given name; tagName is the name of the target tag.

from bs4 import BeautifulSoup

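soup = BeautifulSoup(open('123.html'), "lxml")

list1 = soup.li  # first <li> tag, or None if there is none

print(f"The first <li> has {len(list1)} child nodes")  # len() of a tag counts its direct children, not the number of matches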

print(list1)
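
Since soup.li returns a single tag rather than a list, len on it can be surprising. The sketch below, on invented markup, contrasts it with find_all, which is the call to use when the number of matching tags is what you want.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed for this example
html_doc = "<ul><li>one <b>bold</b></li><li>two</li></ul>"
soup = BeautifulSoup(html_doc, "html.parser")

first_li = soup.li             # first <li> only
child_count = len(first_li)    # direct children of that tag: the text 'one ' and <b>
all_li = soup.find_all("li")   # every <li> in the document
missing = soup.table           # no such tag: None

print(child_count, len(all_li), missing)
```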
