BeautifulSoup usage notes

1. Online documentation: http://www.crummy.com/software/BeautifulSoup/bs4/doc/

2. Common methods

  • Selector: find_all(name, attrs, recursive, text, limit, **kwargs); since bs4 4.4.0 the text argument is also available under the name string
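A minimal sketch of what each find_all() argument does, assuming bs4 with the built-in "html.parser". The markup and variable names are hypothetical, made up only to exercise name, attrs, recursive, string (text in older releases), limit and a **kwargs attribute filter.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
html = """
<div id="menu">
  <a href="/home" class="nav">Home</a>
  <a href="/about" class="nav active">About</a>
  <p><a href="/deep" class="nav">Deep link</a></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
menu = soup.find("div", id="menu")

# name: match by tag name (searches all descendants by default)
all_links = menu.find_all("a")

# attrs: match by a dictionary of attribute values;
# a multi-valued class matches if any one of its classes matches
active = menu.find_all("a", attrs={"class": "active"})

# recursive=False: only look at the direct children of `menu`
top_links = menu.find_all("a", recursive=False)

# string (called `text` in older releases): match the tag's text
about = menu.find_all("a", string="About")

# limit: stop after the first N matches
first_two = menu.find_all("a", limit=2)

# **kwargs: any other keyword argument becomes an attribute filter
by_href = menu.find_all(href="/deep")

print(len(all_links), len(active), len(top_links),
      len(about), len(first_two), len(by_href))  # 3 1 2 1 2 1
```

Note that recursive=False skips the "Deep link" anchor because it sits inside the <p>, not directly under the <div>.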

3. Main calling patterns

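A minimal sketch of the usual call sequence, assuming bs4 with the built-in "html.parser"; the markup and variable names are hypothetical. It parses the markup, navigates with attribute access, find()/find_all() or CSS selectors via select()/select_one(), and then reads attributes and text from the resulting tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this sketch.
html = """
<html><head><title>Demo page</title></head>
<body>
  <p class="intro">Hello, <b>world</b>!</p>
  <a href="https://example.com/a" id="first">A</a>
  <a href="https://example.com/b">B</a>
</body></html>
"""

# 1. Parse the markup; naming the parser avoids a "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")

# 2. Attribute-style access returns the first matching tag.
title_text = soup.title.string            # "Demo page"

# 3. find() returns the first match (or None); find_all() returns a list.
first_link = soup.find("a")
links = soup.find_all("a")

# 4. select_one() / select() take CSS selectors.
intro = soup.select_one("p.intro")
link_by_id = soup.select("a#first")

# 5. Read attributes and text from a tag.
href = first_link["href"]                 # raises KeyError if the attribute is missing
missing = first_link.get("title")         # .get() returns None instead
intro_text = intro.get_text()             # "Hello, world!"

print(title_text, len(links), href, missing, intro_text)
```

The difference between tag["attr"] and tag.get("attr") matters when scraping real pages, where an attribute may be absent on some elements.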

4. Complete code

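# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup


def main():
    # Sample markup: the "three sisters" document from the Beautiful Soup docs
    html = """<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""
    # print(html)
    soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid a warning

    print(soup.get_text())  # get all the text
    print(soup.find_all('title'))  # get the <title> tag
    print(soup.find_all('a'))  # get all links
    print(soup.find_all(id="link2"))  # find elements by id
    print(soup.find_all("a", class_="cla"))  # find by class (no tag here has class "cla", so this prints [])
    # select by the class attribute
    print(soup.find_all("a", class_="sister"))
    print(soup.select("p.title"))  # CSS selector: <p> tags with class "title"
    # select with an attrs dictionary (it can hold several attributes)
    print(soup.find_all("a", attrs={"class": "sister"}))
    # select by text (this argument was named text before bs4 4.4.0)
    print(soup.find_all(string="Elsie"))
    print(soup.find_all(string=["Tillie", "Elsie", "Lacie"]))
    # limit the number of results
    print(soup.find_all("a", limit=2))


if __name__ == '__main__':
    main()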
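find_all('a') returns Tag objects rather than URLs. A minimal sketch of pulling the link text and href values out of them, assuming bs4 with "html.parser"; the markup is a hypothetical variant of the three-sisters sample (the href values are made up here, and Tillie's link deliberately has no href).

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the sample document; Tillie's link has no href.
html = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a class="sister" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns Tag objects; pull out the pieces you need from each one.
names = [a.get_text() for a in soup.find_all("a")]

# .get() returns None when the attribute is absent, so skip those tags.
hrefs = [a.get("href") for a in soup.find_all("a") if a.get("href")]

# href=True matches only tags that carry an href attribute at all.
with_href = soup.find_all("a", href=True)

print(names)
print(hrefs)
print(len(with_href))
```

Filtering with href=True inside find_all() does the same job as the if-clause in the list comprehension, but at search time.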

