Today I used a few lines of Python to collect every link on the Bilibili (B站) homepage. The code is as follows:
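#coding=utf-8
import requests
from bs4 import BeautifulSoup

resp = requests.get('https://www.bilibili.com/')  # request the homepage
bsobj = BeautifulSoup(resp.content, 'lxml')  # parse the page source into a BeautifulSoup object
a_list = bsobj.find_all('a')  # every <a> tag on the page
text = ''
for a in a_list:
    href = a.get('href')  # None if this <a> has no href attribute
    if isinstance(href, str):
        print(href)
        text += href + '\n'
# write with an explicit encoding so non-ASCII URLs don't fail under Windows' default codec
with open('url.txt', 'w', encoding='utf-8') as f:
    f.write(text)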
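If installing bs4 and lxml is not an option, the same link collection can be sketched with only the standard library. This swaps in html.parser.HTMLParser for BeautifulSoup, so it is not the method used above; LinkCollector, extract_links and fetch_links are hypothetical names for illustration, and the User-Agent value is an assumption. It skips <a> tags without an href, so no None value ever reaches the string handling, and it resolves relative links with urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        href = dict(attrs).get('href')
        # Skip <a> tags with no href, a valueless href, or an empty href
        if href:
            self.links.append(href)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return every <a href> value, resolved against base_url."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # urljoin turns "/path" and protocol-relative "//host/path" links into absolute URLs
    return [urljoin(base_url, href) for href in parser.links]


def fetch_links(url: str) -> list[str]:
    """Download url and return its links (network access required)."""
    # Assumption: a browser-like User-Agent, since some sites reject urllib's default
    req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        html = resp.read().decode(charset, errors='replace')
    return extract_links(html, url)
```

For example, links = fetch_links('https://www.bilibili.com/') returns the list, which can then be written to url.txt with encoding='utf-8' as in the script above.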
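One thing to note: the BeautifulSoup script contains a dedicated check,
if isinstance(href, str):
because some <a> tags have no href attribute, and a.get('href') returns None for them. Without the check, concatenating that None with a string fails: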
Traceback (most recent call last):
  File "C:\***\***\***", line 14, in <module>
    text += href+'\n'
TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'
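The cause can be reproduced offline on a small HTML snippet (a made-up sample, not Bilibili's real markup). Tag.get('href') returns None for an <a> tag without that attribute, and BeautifulSoup can also drop such tags up front with find_all('a', href=True), an alternative to the isinstance check. The built-in 'html.parser' is used here so the sketch does not need lxml.

```python
from bs4 import BeautifulSoup

# Made-up sample: the second <a> has no href attribute
html = '<a href="/video/1">video</a><a name="top">anchor</a><a href="//live.bilibili.com/">live</a>'
soup = BeautifulSoup(html, 'html.parser')

# Tag.get returns None when the attribute is missing, which is what broke text += href + '\n'
raw = [a.get('href') for a in soup.find_all('a')]
print(raw)  # ['/video/1', None, '//live.bilibili.com/']

# href=True keeps only the tags that actually carry an href attribute
hrefs = [a['href'] for a in soup.find_all('a', href=True)]
text = '\n'.join(hrefs) + '\n'
print(text, end='')
```

Filtering in find_all moves the check into the query itself, so the loop body no longer has to guard against None.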