Scraping Page Headings with BeautifulSoup

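Since we are using BeautifulSoup, let's scrape the section headings from the official BeautifulSoup documentation, that is, the entries inside the red box in the image below. URL: https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#id4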
[Image 1: Scraping page headings with BeautifulSoup]
Next, we work through it step by step:
① Request the data:

import requests
r = requests.get("https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#id4")
text = r.text  # HTML of the page as a string
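
The request above assumes the download always succeeds. As a minimal sketch of a more defensive version, the helper below adds a timeout and an HTTP status check. The name `fetch_html`, its `get` parameter, and the 10-second default are illustrative choices, not part of the original script; `timeout=` and `raise_for_status()` are standard `requests` features.

```python
import requests


def fetch_html(url, timeout=10, get=requests.get):
    """Return the HTML text of `url`, raising on network or HTTP errors.

    `get` defaults to requests.get; it is a parameter only so the helper
    can be exercised with a stand-in function (a hypothetical design choice).
    """
    response = get(url, timeout=timeout)  # give up after `timeout` seconds
    response.raise_for_status()           # raises requests.HTTPError on 4xx/5xx
    return response.text
```

Calling `text = fetch_html("https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/")` would then stand in for the two lines of step ①. The `#id4` fragment can be left off, since fragments are handled by the browser and are not sent to the server.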

② Parse the page with BeautifulSoup:

from bs4 import BeautifulSoup
soup = BeautifulSoup(text, "html.parser")
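
To see what the `soup` object gives us without touching the network, here is a small sketch that parses a hand-written HTML string. The markup is made up for illustration; only `BeautifulSoup`, `"html.parser"`, `.title`, `.find()` and `.get_text()` come from the library.

```python
from bs4 import BeautifulSoup

# A tiny, made-up page used only for this illustration.
html = """
<html>
  <head><title>Demo page</title></head>
  <body>
    <h1 id="top">Hello</h1>
    <a href="#top">Back to top</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

page_title = soup.title.get_text()   # text inside <title>
first_link = soup.find("a")          # first <a> tag in the document
link_target = first_link["href"]     # attributes are read like dict keys

print(page_title, link_target)
```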

③ Extract the information we want from the page:

# All <a class="reference internal"> links inside <div class="local-toc">
a = soup.find('div', {'class': 'local-toc'}).find_all('a', {'class': 'reference internal'})
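
`soup.find()` returns `None` when nothing matches, so the chained call in step ③ raises `AttributeError` if the page layout changes. A hedged sketch of a more forgiving version follows. The function name `extract_toc_titles` and the sample HTML are invented for illustration, and the CSS selector is an alternative to the original `find_all` call rather than the author's method.

```python
from bs4 import BeautifulSoup

# Made-up markup that imitates the structure targeted in step ③.
SAMPLE_HTML = """
<div class="local-toc">
  <ul>
    <li><a class="reference internal" href="#id1">Quick Start</a></li>
    <li><a class="reference internal" href="#id2">Installing</a></li>
    <li><a class="external" href="https://example.com">Elsewhere</a></li>
  </ul>
</div>
"""


def extract_toc_titles(html):
    """Return the text of each 'reference internal' link inside div.local-toc."""
    soup = BeautifulSoup(html, "html.parser")
    toc = soup.find("div", class_="local-toc")
    if toc is None:                      # container missing: return nothing
        return []
    # The selector matches <a> tags carrying both classes, in any order.
    links = toc.select("a.reference.internal")
    return [link.get_text(strip=True) for link in links]


print(extract_toc_titles(SAMPLE_HTML))
```

Returning an empty list instead of raising keeps the later steps simple, at the cost of hiding a layout change; raising a descriptive error would be an equally reasonable choice.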

④ Put the extracted information into a list:

b = []
for link in a:  # collect the text of each link
    b.append(link.get_text())
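
Link text scraped from HTML often carries stray whitespace or duplicate entries. As an optional sketch (not something the original script does), the list can be tidied with plain Python before saving. The helper name `clean_titles` and the sample strings are hypothetical.

```python
def clean_titles(titles):
    """Collapse whitespace, drop empty strings, and remove duplicates in order."""
    normalized = (" ".join(t.split()) for t in titles)  # squeeze runs of whitespace
    non_empty = (t for t in normalized if t)            # skip blank entries
    return list(dict.fromkeys(non_empty))               # dedupe, keep first occurrence


sample = ["  Quick Start\n", "Installing  Beautiful Soup", "", "Quick Start"]
print(clean_titles(sample))
```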

⑤ Write the list to a txt file:

with open(r'C:\Users\Rsvp\Desktop\标题.txt', 'w', encoding='utf-8') as f:  # save to a text file
    for title in b:
        f.write(title + '\n')  # one heading per line
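
The path in step ⑤ is specific to one Windows machine. A minimal sketch of a more portable variant uses `pathlib` and ends each line with `\n`, which Python's text mode translates to the platform's line ending. The helper name `save_titles` and the temporary directory in the demo are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def save_titles(titles, path):
    """Write one title per line as UTF-8 text and return the path written."""
    path = Path(path)
    path.write_text("".join(f"{title}\n" for title in titles), encoding="utf-8")
    return path


# Demo: write into a throwaway directory so nothing on disk is overwritten.
with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = save_titles(["快速开始", "安装"], Path(tmp_dir) / "titles.txt")
    saved_lines = out_file.read_text(encoding="utf-8").splitlines()

print(saved_lines)
```

In the real script, something like `save_titles(b, Path.home() / "Desktop" / "titles.txt")` could take the place of the hard-coded path, assuming a Desktop folder exists in the home directory.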

Here is the complete code:

import requests
from bs4 import BeautifulSoup

# ① Request the page
r = requests.get("https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#id4")
text = r.text
# ② Parse the HTML
soup = BeautifulSoup(text, "html.parser")
# ③ Find the links inside <div class="local-toc">
a = soup.find('div', {'class': 'local-toc'}).find_all('a', {'class': 'reference internal'})
# ④ Collect the link text in a list
b = []
for link in a:
    b.append(link.get_text())
# ⑤ Write one heading per line to a text file
with open(r'C:\Users\Rsvp\Desktop\标题.txt', 'w', encoding='utf-8') as f:
    for title in b:
        f.write(title + '\n')
