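Since we are using BeautifulSoup, let's scrape the section titles of the official BeautifulSoup documentation itself, i.e. the entries shown in the red box in the screenshot below. URL: https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#id4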
Let's go through it step by step:
① Request the page:
r = requests.get("https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#id4")  # download the page
text = r.text  # the response body as a string
② Parse the page with BeautifulSoup:
soup = BeautifulSoup(text, "html.parser")
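The sketch below shows what the `soup` object gives you. It runs offline and parses a small hand-written HTML string (a stand-in, not the real page) with the same `"html.parser"` backend. It then reads a couple of elements through attribute access:

```python
from bs4 import BeautifulSoup

# A tiny hand-written HTML string standing in for the downloaded page.
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# "html.parser" is Python's built-in parser, so no extra package is needed.
soup = BeautifulSoup(html, "html.parser")

print(soup.title.get_text())  # text of the first <title> tag → Demo
print(soup.p.get_text())      # text of the first <p> tag → Hello
```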
③ Extract the information we want from the page:
a = soup.find('div', {'class': 'local-toc'}).find_all('a', {'class': 'reference internal'})
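The chained call works in two steps. It first narrows the search to the table-of-contents `<div>`, then collects the links inside it. The sketch below runs the same two calls on simplified, hypothetical markup modelled on a Sphinx sidebar (the real page's HTML may differ). It also shows an equivalent CSS selector via `select()`. Note that `find()` returns `None` when nothing matches, and the chained `.find_all()` would then raise `AttributeError`.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup modelled on a Sphinx "local-toc" sidebar.
html = """
<div class="local-toc">
  <ul>
    <li><a class="reference internal" href="#id1">Quick Start</a></li>
    <li><a class="reference internal" href="#id2">Installation</a></li>
    <li><a class="reference external" href="https://example.com">External link</a></li>
  </ul>
</div>
<div class="footer"><a class="reference internal" href="#x">Not in TOC</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching <div>; find_all() then searches only inside it,
# so the footer link is skipped and the "reference external" link does not match.
toc = soup.find("div", {"class": "local-toc"})
links = toc.find_all("a", {"class": "reference internal"})
print([a.get_text() for a in links])

# Equivalent CSS selector: <a> tags carrying both classes, inside div.local-toc.
same = soup.select("div.local-toc a.reference.internal")
print([a.get_text() for a in same])
```

Both lines print `['Quick Start', 'Installation']`. A space-separated string such as `'reference internal'` matches the exact string value of the class attribute, so it depends on the class order in the HTML. The CSS selector `a.reference.internal` only requires both classes to be present.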
④ Put the extracted text into a list:
b = []
for link in a:  # add each link's text to the list
    b.append(link.get_text())
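The loop can also be written as a list comprehension that iterates over the tags directly. Here is a small sketch on hypothetical sample links. It uses `strip=True`, an optional `get_text()` argument (not used in the original code) that trims surrounding whitespace:

```python
from bs4 import BeautifulSoup

# Hypothetical sample links; note the extra spaces around "Installation".
html = ('<a class="reference internal">Quick Start</a>'
        '<a class="reference internal"> Installation </a>')
a = BeautifulSoup(html, "html.parser").find_all("a", {"class": "reference internal"})

# Iterate over the tags themselves instead of over range(len(a)).
b = [tag.get_text(strip=True) for tag in a]
print(b)  # ['Quick Start', 'Installation']
```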
⑤ Write the data in the list to a txt file:
with open(r'C:\Users\Rsvp\Desktop\标题.txt', 'w', encoding='utf-8') as f:  # write to the file
    for title in b:
        f.write(title + '\n')  # one title per line
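Here is a version of the write step that runs anywhere. It writes sample titles (made-up data) to a temporary file instead of the hard-coded desktop path, joining them with `\n` so each title lands on its own line. It then reads the file back to check the result:

```python
import tempfile
from pathlib import Path

b = ["Quick Start", "Installation", "Searching the tree"]  # sample data

# Hypothetical output location: a fresh temporary directory.
out = Path(tempfile.mkdtemp()) / "titles.txt"

# In text mode, "\n" is translated to the platform's line ending on write.
with open(out, "w", encoding="utf-8") as f:
    f.write("\n".join(b) + "\n")

print(out.read_text(encoding="utf-8").splitlines())
```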
Here is the complete code:
import requests
from bs4 import BeautifulSoup

r = requests.get("https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#id4")
text = r.text
soup = BeautifulSoup(text, "html.parser")
a = soup.find('div', {'class': 'local-toc'}).find_all('a', {'class': 'reference internal'})

b = []
for link in a:  # add each link's text to the list
    b.append(link.get_text())

with open(r'C:\Users\Rsvp\Desktop\标题.txt', 'w', encoding='utf-8') as f:  # write to the file
    for title in b:
        f.write(title + '\n')  # one title per line
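Below is one possible way to restructure the script, offered as a sketch rather than the author's version. The parsing step moves into a function that takes an HTML string, so it can be tried without network access. A request timeout and `raise_for_status()` are added. A missing TOC now yields an empty list instead of an exception. The function names and the `titles.txt` output path are hypothetical.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://beautifulsoup.readthedocs.io/zh_CN/v4.4.0/#id4"


def extract_titles(html):
    """Return the text of every TOC link, or [] if the TOC <div> is missing."""
    soup = BeautifulSoup(html, "html.parser")
    toc = soup.find("div", {"class": "local-toc"})
    if toc is None:  # find() returns None when nothing matches
        return []
    links = toc.find_all("a", {"class": "reference internal"})
    return [a.get_text(strip=True) for a in links]  # strip surrounding whitespace


def save_titles(titles, path):
    """Write one title per line as UTF-8 text."""
    with open(path, "w", encoding="utf-8") as f:
        for title in titles:
            f.write(title + "\n")


def main():
    """Download the page, extract the titles and save them (needs network access)."""
    r = requests.get(URL, timeout=10)  # give up if the server does not respond in time
    r.raise_for_status()               # raise requests.HTTPError on 4xx/5xx responses
    titles = extract_titles(r.text)
    save_titles(titles, "titles.txt")  # hypothetical output path
    print(f"saved {len(titles)} titles")
```

Calling `main()` downloads the page and writes one title per line to `titles.txt` in the current working directory. `extract_titles()` can be tried on any HTML string without making a request.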