Web Scraping for Beginners 2 (Scraping the Kugou TOP500 Chart)

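Annoyingly, the web version of Kugou only shows the first page of the chart; to browse the rest you have to download the player.
The code below scrapes every song on the chart along with its link.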

[Screenshot: the first page of the Kugou TOP500 chart]

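That is what the first page looks like. Its URL is https://www.kugou.com/yy/rank/home/1-8888.html?from=rank
Changing the 1 to 2 gives
https://www.kugou.com/yy/rank/home/2-8888.html?from=rank
which is the second page. So multiple pages can be scraped as follows: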
'''
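import lxml  # not used directly; BeautifulSoup's "lxml" parser needs it installed
import requests
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with every request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.75 Safari/537.36"
}


def get_informations(url):
    # headers must be passed by keyword: the second positional
    # argument of requests.get is params, not headers
    web_data = requests.get(url, headers=headers)
    soup = BeautifulSoup(web_data.text, "lxml")
    # Each song is an <a> tag with the class "pc_temp_songname"
    informations = soup.find_all("a", "pc_temp_songname")
    for information in informations:
        data = {
            "song": information.get("title"),
            "url": information.get("href"),
        }
        print(data)


# Build the URLs for pages 1 through 23 and scrape each one
urls = ["https://www.kugou.com/yy/rank/home/{}-8888.html?from=rank".format(i) for i in range(1, 24)]
for url in urls:
    get_informations(url)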
'''
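
The page-number trick is the heart of this script, so here is a small sketch that isolates it. The names `URL_TEMPLATE`, `page_url` and `all_page_urls` are my own, made up for illustration, and the default of 23 pages is simply taken from the `range(1, 24)` in the script; adjust it if the chart turns out to have a different number of pages.

```python
# Hypothetical helpers (not part of the original script) that
# wrap the page-URL pattern observed on the Kugou chart.
URL_TEMPLATE = "https://www.kugou.com/yy/rank/home/{}-8888.html?from=rank"


def page_url(page):
    """Return the chart URL for a 1-based page number."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return URL_TEMPLATE.format(page)


def all_page_urls(last_page=23):
    """Return the URLs for pages 1..last_page.

    The default of 23 is an assumption copied from range(1, 24)
    in the script.
    """
    return [page_url(page) for page in range(1, last_page + 1)]


if __name__ == "__main__":
    urls = all_page_urls()
    print(len(urls))  # number of page URLs generated
    print(urls[1])    # URL of the second page
```

Keeping the pattern in one place means that if the site ever changes its URL scheme, only `URL_TEMPLATE` needs updating.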
