[Python crawler self-study notes] Scraping the lyrics of the songs in a NetEase Cloud Music playlist

Tools: Python 3.6, PyCharm

My personal playlist is at https://music.163.com/#/playlist?id=2251736705


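My first attempt fetched the page with requests, passing only the URL, but it did not return a usable response. I switched to urllib, added request headers, decoded the response body as UTF-8, parsed it with BeautifulSoup, and pretty-printed the resulting document.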
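As a rough illustration of the header and decoding steps, the sketch below builds a urllib request with browser-like headers and inspects it without sending anything over the network. The helper names `build_request` and `decode_body` are hypothetical and exist only for this example; the header values are the ones used in the script in this post.

```python
import urllib.request

# Header values taken from the script in this post.
DEFAULT_HEADERS = {
    "Host": "music.163.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0",
}


def build_request(url, headers=None):
    """Return a urllib Request carrying browser-like headers (nothing is sent)."""
    return urllib.request.Request(url, headers=headers or DEFAULT_HEADERS)


def decode_body(raw_bytes):
    """Decode a response body as UTF-8, dropping bytes that cannot be decoded."""
    return raw_bytes.decode("utf-8", "ignore")


req = build_request("http://music.163.com/playlist?id=2251736705")
# urllib stores header names capitalized, so "User-Agent" becomes "User-agent".
print(req.get_header("User-agent"))
print(decode_body("歌词".encode("utf-8")))
```

Passing the request to `urllib.request.urlopen` would perform the actual fetch; that step is left out here so the example runs offline.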

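The most important part of this crawler is obtaining the lyrics URL. That URL is hidden in the page source; following the explanation in a reference article, this post uses an API endpoint exposed by NetEase Cloud Music.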

# Scrape the lyrics of every song in my NetEase Cloud Music playlist
import json
import re
import urllib.request

import requests
from bs4 import BeautifulSoup

myurl = "http://music.163.com/playlist?id=2251736705"
headers = {
    "Host": "music.163.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:56.0) Gecko/20100101 Firefox/56.0",
}
request = urllib.request.Request(myurl, headers=headers)
response = urllib.request.urlopen(request)
# Without decode() the body is raw bytes rather than readable Chinese text
html = response.read().decode('utf-8', 'ignore')
soup = BeautifulSoup(html, 'lxml')
print(soup.prettify())

# The useful part of the printed output

The scraped lyrics are written to a single file.

# Open jazz.txt; the lyrics of the songs in the playlist are written to it
f = open('jazz.txt', 'w', encoding='utf-8')
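A possible variation, shown here only as a sketch: opening the file in a `with` block closes it automatically even if an error occurs midway through the loop. The function name `write_lyrics` and the sample text are hypothetical, and a temporary directory is used so the example does not touch jazz.txt.

```python
import os
import tempfile


def write_lyrics(path, blocks):
    """Write each lyric block to `path` as UTF-8, one block after another."""
    with open(path, "w", encoding="utf-8") as f:
        for block in blocks:
            f.write(block + "\n")


# Write two sample blocks to a temporary file and read them back.
tmp_dir = tempfile.mkdtemp()
tmp_path = os.path.join(tmp_dir, "sample.txt")
write_lyrics(tmp_path, ["5048569-Wonderful Tonight", "那是一个傍晚"])
with open(tmp_path, encoding="utf-8") as f:
    content = f.read()
print(content)
```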

First, get each song's ID. The printed HTML structure shows that the songs are contained in a ul tag, with each song in its own li tag.


for item in soup.ul.children:
    # The href of each song link has the form /song?id=11111111
    song_id = item('a')[0].get("href", None)
    # Song title
    song_name = item.string
    # Extract the numeric part of the href (the song ID) with a regular expression
    pat = re.compile(r'[0-9]+$')  # a run of digits at the end of the string
    sid = re.findall(pat, song_id)[0]
    # Print the song ID and title
    print(sid + "-" + song_name)

5048569-Wonderful Tonight
1299217-Tears in Heaven
17541009-Autumn Leaves
28851137-Sensitive Kind 
25542198-My Back Pages
17541090-Lay Down Sally
26641658-Riding With the King
17540892-Change The World
28040815-Layla
26641663-Help the Poor
5201813-Tears In Heaven
17540496-Piece Of My Heart (Album Version)
28851139-Magnolia 
17540498-One Track Mind (Album Version)
26641661-Marry You
26641665-Worried Life Blues
28851135-Someday 
28851134-Rock And Roll Records 
17541200-Old Love
17541190-Hey Hey
26641669-Come Rain or Come Shine
1077606-Change the World (Live)
28851141-Songbird
413961594-I Will Be There
18610067-Last Will And Testament (Album Version)
28851136-Lies
1298826-Knockin' on Heaven's Door
17540893-My Father's Eyes
27490248-Everytime I Sing the Blues
17540856-Cocaine
18610066-Don't Cry Sister (Album Version)
31918662-Riding With The King
26641662-Three O'Clock Blues
1299044-Jeff's Blues
26641668-Hold On! I'm Comin'
17540639-Golden Ring
31918653-Behind The Mask
28851140-I Got The Same Old Blues 
1297898-Over The Rainbow
17540956-Tears In Heaven
17540890-Running On Faith - Unplugged
26641659-Ten Long Years
26641660-Key to the Highway
26641664-I Wanna Be
31918654-Sweet Home Chicago
28040813-Driftin'
413961593-Can't Let You Do It
28851133-They Call Me The Breeze
18610062-It's Easy (Album Version)
17541198-San Francisco Bay Blues
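The ID can also be read from the href without a regular expression. The sketch below treats the href as a URL and looks up its `id` query parameter using the standard library; `extract_song_id` is a hypothetical helper name, not part of the original script.

```python
from urllib.parse import urlparse, parse_qs


def extract_song_id(href):
    """Return the value of the `id` query parameter, or None if it is absent."""
    query = parse_qs(urlparse(href).query)
    values = query.get("id")
    return values[0] if values else None


print(extract_song_id("/song?id=5048569"))
```

Returning None for a link without an `id` parameter lets the caller skip entries that are not songs instead of failing on an empty match list.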

The lyrics come back as JSON. Still inside the loop, parse and print them:

    # This URL is the actual lyrics endpoint
    url = "http://music.163.com/api/song/lyric?" + "id=" + str(sid) + "&lv=1&kv=1&tv=-1"
    html = requests.post(url)
    json_obj = html.text
    # The response body is a JSON object; parse it
    j = json.loads(json_obj)
    print(j)
{'sgc': True, 'sfy': False, 'qfy': False, 'transUser': {'id': 5048569, 'status': 99, 'demand': 1, 'userid': 121424, 'nickname': '老白怪蜀黍', 'uptime': 1522309673919}, 'lrc': {'version': 12, 'lyric': "[00:22.270]It's late in the evening\n[00:27.140]she's wondering what clothes to wear\n[00:32.200]She puts on her make-up\n[00:37.410]and brushes her long blonde hair\n[00:42.600]And then she asks me Do I look all right\n[00:50.690]And I say Yes you look wonderful tonight\n[01:07.890]We go to a party and everyone turns to see\n[01:17.760]This beautiful lady that's walking around with me\n[01:27.790]And then she asks me Do you feel all right\n[01:36.160]And I say Yes I feel wonderful tonight\n[01:46.030]I feel wonderful because I see\n[01:51.720]The love light in your eyes\n[01:57.140]And the wonder of it all\n[02:01.770]Is that you just don't realize how much I love you\n[02:29.420]It's time to go home now and I've got an aching head\n[02:39.040]So I give her the car keys and she helps me to bed\n[02:49.400]And then I tell her as I turn out the light\n[02:57.860]I say My darling you were wonderful tonight\n[03:07.960]Oh my darling you were wonderful tonight\n"}, 'klyric': {'version': 0, 'lyric': None}, 'tlyric': {'version': 1, 'lyric': '[by:阿坤_Arcane]\n[00:22.270]那是一个傍晚\n[00:27.140]她在想穿什么衣服\n[00:32.200]她打扮好自己\n[00:37.410]然后梳理妥金色的长发\n[00:42.600]然后她问我:我看起来还好吗?\n[00:50.690]我说:是的,今晚的你美极了\n[01:07.890]我们去参加派对,所有的人都转过头\n[01:17.760]看着这位陪在我身边的美丽的女士\n[01:27.790]然后她问我:你感觉还好吧\n[01:36.160]我说:是的,今晚感觉棒极了\n[01:46.030]我感到美妙,是因为我看到了\n[01:51.720]你眼中爱的光芒\n[01:57.140]而其中最最美妙的\n[02:01.770]恰是你不会明白我有多么的爱你\n[02:29.420]是时候回家了,我有一点酒醉头痛\n[02:39.040]我把车钥匙给她,她会服侍我回家躺下\n[02:49.400]当我走出派对最后一缕灯光\n[02:57.860]我说:亲爱的,今晚你真的很美\n[03:07.960]哦,我的爱人,今晚你真的很美\n'}, 'code': 200}
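The lyrics URL can also be assembled with `urllib.parse.urlencode` instead of string concatenation, which keeps the parameters in one place. This is only a sketch: `LYRIC_ENDPOINT` and `lyric_url` are hypothetical names, and the parameter values (`lv`, `kv`, `tv`) are copied from the URL used in this post without any claim about what each one means.

```python
from urllib.parse import urlencode

# Endpoint as used in this post.
LYRIC_ENDPOINT = "http://music.163.com/api/song/lyric"


def lyric_url(sid):
    """Build the lyrics URL for one song ID."""
    params = {"id": sid, "lv": 1, "kv": 1, "tv": -1}
    return LYRIC_ENDPOINT + "?" + urlencode(params)


print(lyric_url("5048569"))
```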

From the parsed JSON, take the lyric text itself: both the original lyrics and the translated lyrics.

    try:
        lyric = j['lrc']['lyric']
        tlyric = j['tlyric']['lyric']
        print(lyric)
        print(tlyric)
    except KeyError:
        # "无歌词" means "no lyrics"
        lyric = "无歌词"
        # Make sure tlyric is defined for the later steps as well
        tlyric = ""

[00:22.270]It's late in the evening
[00:27.140]she's wondering what clothes to wear
[00:32.200]She puts on her make-up
[00:37.410]and brushes her long blonde hair
[00:42.600]And then she asks me Do I look all right
[00:50.690]And I say Yes you look wonderful tonight
[01:07.890]We go to a party and everyone turns to see
[01:17.760]This beautiful lady that's walking around with me
[01:27.790]And then she asks me Do you feel all right
[01:36.160]And I say Yes I feel wonderful tonight
[01:46.030]I feel wonderful because I see
[01:51.720]The love light in your eyes
[01:57.140]And the wonder of it all
[02:01.770]Is that you just don't realize how much I love you
[02:29.420]It's time to go home now and I've got an aching head
[02:39.040]So I give her the car keys and she helps me to bed
[02:49.400]And then I tell her as I turn out the light
[02:57.860]I say My darling you were wonderful tonight
[03:07.960]Oh my darling you were wonderful tonight

[by:阿坤_Arcane]
[00:22.270]那是一个傍晚
[00:27.140]她在想穿什么衣服
[00:32.200]她打扮好自己
[00:37.410]然后梳理妥金色的长发
[00:42.600]然后她问我:我看起来还好吗?
[00:50.690]我说:是的,今晚的你美极了
[01:07.890]我们去参加派对,所有的人都转过头
[01:17.760]看着这位陪在我身边的美丽的女士
[01:27.790]然后她问我:你感觉还好吧
[01:36.160]我说:是的,今晚感觉棒极了
[01:46.030]我感到美妙,是因为我看到了
[01:51.720]你眼中爱的光芒
[01:57.140]而其中最最美妙的
[02:01.770]恰是你不会明白我有多么的爱你
[02:29.420]是时候回家了,我有一点酒醉头痛
[02:39.040]我把车钥匙给她,她会服侍我回家躺下
[02:49.400]当我走出派对最后一缕灯光
[02:57.860]我说:亲爱的,今晚你真的很美
[03:07.960]哦,我的爱人,今晚你真的很美
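The sample response above contains a `'lyric': None` entry (under `klyric`), so a lyric field can be present but empty. The sketch below, with the hypothetical helper `extract_lyrics`, reads both fields with `dict.get` and falls back to an empty string when a key is missing or its value is None. The `sample` dictionary is a shortened, hand-written stand-in for a real response.

```python
def extract_lyrics(payload):
    """Return (original, translation); each is "" when missing or None."""
    lyric = (payload.get("lrc") or {}).get("lyric") or ""
    tlyric = (payload.get("tlyric") or {}).get("lyric") or ""
    return lyric, tlyric


# Shortened stand-in for a parsed API response.
sample = {
    "lrc": {"version": 12, "lyric": "[00:22.270]It's late in the evening\n"},
    "tlyric": {"version": 0, "lyric": None},
    "code": 200,
}
print(extract_lyrics(sample))
```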

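Use a regular expression to match tags such as [00:22.270] and replace them with an empty string. For details on re.sub(), see the documentation of the re module; for strip(), see the documentation of str.strip().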

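    pat = re.compile(r'\[.*\]')  # bracketed tags such as [00:22.270]
    lrc = re.sub(pat, "", lyric)
    # tlyric may be None when a song has no translation
    tlrc = re.sub(pat, "", tlyric or "")
    lrc = sid + "-" + song_name + '\n' + lrc.strip() + '\n' + tlrc.strip() + '\n'
    print(lrc)
    f.write(lrc)
# Close the file once every song has been processed
f.close()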

5048569-Wonderful Tonight
It's late in the evening
she's wondering what clothes to wear
She puts on her make-up
and brushes her long blonde hair
And then she asks me Do I look all right
And I say Yes you look wonderful tonight
We go to a party and everyone turns to see
This beautiful lady that's walking around with me
And then she asks me Do you feel all right
And I say Yes I feel wonderful tonight
I feel wonderful because I see
The love light in your eyes
And the wonder of it all
Is that you just don't realize how much I love you
It's time to go home now and I've got an aching head
So I give her the car keys and she helps me to bed
And then I tell her as I turn out the light
I say My darling you were wonderful tonight
Oh my darling you were wonderful tonight
那是一个傍晚
她在想穿什么衣服
她打扮好自己
然后梳理妥金色的长发
然后她问我:我看起来还好吗?
我说:是的,今晚的你美极了
我们去参加派对,所有的人都转过头
看着这位陪在我身边的美丽的女士
然后她问我:你感觉还好吧
我说:是的,今晚感觉棒极了
我感到美妙,是因为我看到了
你眼中爱的光芒
而其中最最美妙的
恰是你不会明白我有多么的爱你
是时候回家了,我有一点酒醉头痛
我把车钥匙给她,她会服侍我回家躺下
当我走出派对最后一缕灯光
我说:亲爱的,今晚你真的很美
哦,我的爱人,今晚你真的很美
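The pattern `\[.*\]` is greedy, so it would also remove text that sits between two brackets on the same line. A more conservative variant, sketched below under the assumption that tags only appear at the start of a line, strips just the leading run of bracketed tags and drops lines that become empty. `strip_lrc_tags` is a hypothetical helper and the sample text is made up for the example.

```python
import re

# A run of one or more bracketed tags at the start of a line,
# such as [00:22.270] or [by:someone].
LEADING_TAGS = re.compile(r"^(?:\[[^\]]*\])+")


def strip_lrc_tags(text):
    """Remove leading bracketed tags from every line and drop empty lines."""
    lines = [LEADING_TAGS.sub("", line).strip() for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


sample_lrc = "[by:x]\n[00:22.270]It's late\n[00:27.140][00:30.000]Layla [live]\n"
print(strip_lrc_tags(sample_lrc))
```

Brackets that appear later in a line, such as `[live]` in the sample, are left alone because the pattern is anchored to the start of the line.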

 
