BeautifulSoup学习笔记

bs4 中 BeautifulSoup 常用命令:  BeautifulSoup,prettify,find,find_all,get,get_text

 

示例:

scenery.html如下:

 1 <html lang="en">
 2 <head>
 3     <meta charset="UTF-8">
 4         <title>武汉旅游景点title>
 5     <meta name="description" content="武汉旅游景点 精简版" />
 6     <meta name="authot" content="hstking" >
 7 head>
 8 <body>
 9     <div id="content">
10         <div class="title">
11             <h3>武汉景点h3>
12         div>
13         <ul class="table">
14             <li>景点<a>门票价格a>li>
15         ul>
16         <ul class="content">
17             <li nu="1">东湖 <a class="price">60a>li>
18             <li nu="2">磨山 <a class="price">60a>li>
19             <li nu="3">欢乐谷 <a class="price">108a>li>
20             <li nu="4">海洋世界 <a class="price">150a>li>
21             <li nu="5">水上乐园 <a class="price">150a>li>
22         ul>
23     div>
24 body>
25 html>
View Code

 

操作代码如下:

 1 from bs4 import BeautifulSoup
 2 soup = BeautifulSoup(open('scenery.html',encoding='utf-8'),'lxml')  #常用lxml解析
 3 print("soup类型:\t",end="");print(type(soup))
 4 
 5 l = soup.find('li')    #find()可以指定属性attrs,只返回一个结果
 6 #l = soup.find('li',attrs={'nu':'3'})
 7 l_all = soup.find_all('li') #find_all()返回所有结果
 8 #l_all = soup.find_all('li',attrs={'nu':'3'})
 9 print("find()的内容:\t",end="");print(l)
10 print("find()返回类型:\t",end="");print(type(l))
11 print("find_all()的内容:\t",end="");print(l_all)
12 print("find_all()返回类型:\t",end="");print(type(l_all))    #find_all()返回类型可隐式变换为list
13 print("find_all()中第3个元素的值:\t ",end="");print(l_all[2])
14 print("find_all()中元素类型:\t",end="");print(type(l_all[2]))
15 print("find()可获取下级标签的内容:\t",end="");print(l.find('a'))
16 print("get()可获取属性的值:\t",end="");print(l.get('nu'))
17 print("get()返回类型:\t",end="");print(type(l.get('nu')))
18 print("get_text()可获取标签包含的字符串:\t",end="");print(l.a.get_text())
19 print("get_text()返回类型:\t",end="");print(type(l.a.get_text()))
View Code

 

结果如下:

BeautifulSoup学习笔记_第1张图片

 

你可能感兴趣的:(BeautifulSoup学习笔记)