Scraping Kugou Music with Python

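(This is an original article by 小左1120. Please credit the source when reposting; if it breaks any rules, contact the author and it will be taken down immediately.)
This is a Python web-scraping article. It shows how to use Python to list songs from Kugou by title and download them (VIP tracks included). A packaged exe is linked at the end.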

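I don't have much scraping experience, so the code may be rough and buggy. Experienced readers are welcome to add me on WeChat or QQ so we can learn together, and please let me know if the code stops working.
WeChat: aicoder-XZ
QQ: 2583188733
The usual setup:
Install Python
Install PyCharm
Then install the required modules:
See the linked article by hy_696 (thanks for writing it). (PS: on the site he links to, only look up the latest version number and don't download from there. Find that same version at https://mirrors.huaweicloud.com/geckodriver and download it from there, which is faster and more reliable.)
Next, press Win+R, type cmd, and press Enter. In the black window, run

pip install requests selenium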

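Open PyCharm
and create a new project (see the linked guide if you don't know how).
Once it opens, right-click the project folder and create a new Python file,
type any name you like, and press Enter.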

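Now let's write some code!
First open a browser
and go to the Kugou music download site (don't use IE).

Search for anything and look at the URL.
Taking "癞蛤蟆" as the example,
the URL is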

https://music.liuzhijin.cn/?name=癞蛤蟆&type=kugou

So the search URL has the form

https://music.liuzhijin.cn/?name=SONG_NAME&type=kugou
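
Song titles often contain spaces, `&`, or non-ASCII characters, so it can help to percent-encode the query before handing the URL to the browser. The following is a minimal sketch using only the standard library; `build_search_url` is a hypothetical helper name, not something the site or the original script provides:

```python
from urllib.parse import urlencode

BASE_URL = "https://music.liuzhijin.cn/"


def build_search_url(name, source="kugou"):
    """Return the search URL for a song name, with the query percent-encoded."""
    # urlencode escapes spaces, '&', '#', and non-ASCII characters in the values
    return BASE_URL + "?" + urlencode({"name": name, "type": source})


print(build_search_url("癞蛤蟆"))
```

The result can be passed to `browser.get()` in place of the hand-concatenated string.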

OK, let's load the page first:

from selenium import webdriver
import time

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
browser = webdriver.Firefox()
browser.get(url)

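This site has one small quirk: there is a "载入更多" (load more) button at the bottom, so the results can't all be loaded in one go.
That's not a problem.
Press F12.
In the developer tools panel that opens, click the element-picker button in the top-left corner,
then click "载入更多".

OK, the browser will jump to an element like this:

<div class="aplayer-more">载入更多</div>

Right-click it > Copy > full XPath (in Firefox the menu item is just called XPath).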

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)
# Keep clicking "载入更多" until the button can no longer be found or clicked
while True:
    try:
        browser.find_element(By.XPATH, "YOUR_XPATH").click()
    except Exception:
        break

Mine is /html/body/section/div[1]/div/form[2]/div[4]/div/div[4]
so the code becomes

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)
# Keep clicking "载入更多" until the button can no longer be found or clicked
while True:
    try:
        browser.find_element(By.XPATH, "/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
    except Exception:
        break
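
The click loop relies on an exception to stop, so it would spin forever if the button stayed clickable without ever disappearing. One way to guard against that is to cap the number of clicks and pause briefly between them. This is only a sketch: `click_until_gone`, `max_clicks`, and `pause` are hypothetical names and defaults chosen for illustration, and the function takes any callable so it can be tried without a browser:

```python
import time


def click_until_gone(click_once, max_clicks=50, pause=0.5):
    """Call click_once() repeatedly until it raises or max_clicks is reached.

    Returns the number of successful clicks.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            click_once()
        except Exception:
            # With Selenium this is typically NoSuchElementException
            # once the "载入更多" button is no longer on the page.
            break
        clicks += 1
        time.sleep(pause)  # give the page a moment to append the new rows
    return clicks


# With Selenium it would be called roughly like this:
#   click_until_gone(lambda: browser.find_element(By.XPATH, xpath).click())
```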

Everything is ready, so grab the page source:

html = browser.page_source

Take a look at it in the developer tools.

You'll see a big pile of li elements.

Pick a couple (I searched for Mojito):

<li>
    <span class="aplayer-list-cur" style="background: #0e90d2;"></span>
    <span class="aplayer-list-index">2</span>
    <span class="aplayer-list-title">Mojito</span>
    <span class="aplayer-list-author">许66</span>
</li>
<li>
    <span class="aplayer-list-cur" style="background: #0e90d2;"></span>
    <span class="aplayer-list-index">3</span>
    <span class="aplayer-list-title">Mojito</span>
    <span class="aplayer-list-author">音乐之声-SongTaste</span>
</li>

Reading through it, each entry has this structure:

<li>
    <span class="aplayer-list-cur" style="background: #0e90d2;"></span>
    <span class="aplayer-list-index">song index</span>
    <span class="aplayer-list-title">song title</span>
    <span class="aplayer-list-author">artist</span>
</li>

Now let's pull those fields out with a regular expression, and wrap the page-fetching code in a function while we're at it.

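#coding:utf-8
from selenium import webdriver
from selenium.webdriver.common.by import By
import time,re
name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
def geturl(url):
    # Open the page, click "载入更多" until it is gone, and return the HTML
    browser = webdriver.Firefox()
    browser.get(url)
    time.sleep(3)
    while True:
        try:
            browser.find_element(By.XPATH, "/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
        except Exception:
            break
    return browser.page_source
html = geturl(url)
# One whitespace-tolerant pattern built from the <li> structure above;
# it captures (index, title, artist) for every entry.
pattern = re.compile(
    r'<li[^>]*>\s*'
    r'<span class="aplayer-list-cur"[^>]*></span>\s*'
    r'<span class="aplayer-list-index">(.*?)</span>\s*'
    r'<span class="aplayer-list-title">(.*?)</span>\s*'
    r'<span class="aplayer-list-author">(.*?)</span>\s*'
    r'</li>',
    re.S)
data = pattern.findall(html)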
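
Regular expressions over rendered HTML are sensitive to whitespace and attribute order. As an alternative (a swapped-in technique, not what the original script does), the same three fields can be collected with the standard library's `html.parser`. This is a sketch: `SongListParser` and `parse_songs` are hypothetical names, and it assumes the list markup matches the `<li>` sample shown earlier:

```python
from html.parser import HTMLParser


class SongListParser(HTMLParser):
    """Collect (index, title, artist) tuples from the aplayer <li> entries."""

    FIELDS = {
        "aplayer-list-index": "index",
        "aplayer-list-title": "title",
        "aplayer-list-author": "author",
    }

    def __init__(self):
        super().__init__()
        self.songs = []
        self._row = None    # dict for the <li> currently being read
        self._field = None  # which field the current <span> holds, if any

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._row = {}
        elif tag == "span" and self._row is not None:
            self._field = self.FIELDS.get(dict(attrs).get("class"))

    def handle_data(self, data):
        if self._row is not None and self._field:
            self._row[self._field] = self._row.get(self._field, "") + data

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None
        elif tag == "li" and self._row is not None:
            if len(self._row) == 3:  # keep only complete entries
                row = self._row
                self.songs.append((row["index"], row["title"], row["author"]))
            self._row = None


def parse_songs(page_source):
    parser = SongListParser()
    parser.feed(page_source)
    return parser.songs


sample = """
<li>
    <span class="aplayer-list-cur" style="background: #0e90d2;"></span>
    <span class="aplayer-list-index">2</span>
    <span class="aplayer-list-title">Mojito</span>
    <span class="aplayer-list-author">许66</span>
</li>
<li>
    <span class="aplayer-list-cur" style="background: #0e90d2;"></span>
    <span class="aplayer-list-index">3</span>
    <span class="aplayer-list-title">Mojito</span>
    <span class="aplayer-list-author">音乐之声-SongTaste</span>
</li>
"""
print(parse_songs(sample))
```

The returned list has the same shape as the regex result, so the rest of the script could use it unchanged.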

Now add a way to choose a song:

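#coding:utf-8
from selenium import webdriver
from selenium.webdriver.common.by import By
import time,re
name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
def geturl(url):
    # Open the page, click "载入更多" until it is gone, and return the HTML
    browser = webdriver.Firefox()
    browser.get(url)
    time.sleep(3)
    while True:
        try:
            browser.find_element(By.XPATH, "/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
        except Exception:
            break
    return browser.page_source
html = geturl(url)
# Captures (index, title, artist) for every <li> entry
pattern = re.compile(
    r'<li[^>]*>\s*'
    r'<span class="aplayer-list-cur"[^>]*></span>\s*'
    r'<span class="aplayer-list-index">(.*?)</span>\s*'
    r'<span class="aplayer-list-title">(.*?)</span>\s*'
    r'<span class="aplayer-list-author">(.*?)</span>\s*'
    r'</li>',
    re.S)
data = pattern.findall(html)
song = None
for i in data:
    print(i)
num = input(">>")
# Find the entry whose index matches what the user typed
for i in data:
    if (str(i[0])==num):
        song = i
if song is None:
    raise SystemExit("No song with that number")
url = "https://music.liuzhijin.cn/?name=" + song[1]+' - '+song[2] + "&type=kugou"
print(geturl(url))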
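
Choosing a song by its number can also be factored into a small helper that returns `None` when the input matches nothing, which keeps the calling code simple. This is a minimal sketch; `pick_song` is a hypothetical name, and the sample data simply reuses the two Mojito entries shown earlier:

```python
def pick_song(data, num):
    """Return the (index, title, artist) tuple whose index equals num, or None."""
    wanted = num.strip()
    for song in data:
        if str(song[0]).strip() == wanted:
            return song
    return None


data = [("2", "Mojito", "许66"), ("3", "Mojito", "音乐之声-SongTaste")]
print(pick_song(data, "3"))
print(pick_song(data, "9"))
```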

The markup at the download link looks like this (with the two parts we want written as (.*?)):

<span class="am-input-group-btn">
    <a id="j-src-btn" class="am-btn am-btn-default" target="_blank" href="(.*?)" download="(.*?)">
        <i id="j-src-btn-icon" class="am-icon-download"></i>
    </a>
</span>
    

We search for "song title - artist" to land on that exact track, then download it:

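#coding:utf-8
from selenium import webdriver
from selenium.webdriver.common.by import By
import time,re,requests
name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
def geturl(url):
    # Open the page, click "载入更多" until it is gone, and return the HTML
    browser = webdriver.Firefox()
    browser.get(url)
    time.sleep(3)
    while True:
        try:
            browser.find_element(By.XPATH, "/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
        except Exception:
            break
    return browser.page_source
html = geturl(url)
# Captures (index, title, artist) for every <li> entry
pattern = re.compile(
    r'<li[^>]*>\s*'
    r'<span class="aplayer-list-cur"[^>]*></span>\s*'
    r'<span class="aplayer-list-index">(.*?)</span>\s*'
    r'<span class="aplayer-list-title">(.*?)</span>\s*'
    r'<span class="aplayer-list-author">(.*?)</span>\s*'
    r'</li>',
    re.S)
data = pattern.findall(html)
song = None
for i in data:
    print(i)
num = input(">>")
for i in data:
    if (str(i[0])==num):
        song = i
if song is None:
    raise SystemExit("No song with that number")
url1 = "https://music.liuzhijin.cn/?name=" + song[1]+' - '+song[2] + "&type=kugou"
# Download-link pattern from the markup above: captures (href, download file name)
a7 = r'<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" href="(.*?)" download="(.*?)">'
html1=geturl(url1)
a8 = re.findall(a7,html1)
# Fetch the audio file and save it under the name the site suggests
r = requests.get(a8[0][0])
with open(a8[0][1], "wb") as code:
    code.write(r.content)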
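
Two details can trip up the download step: attribute values taken from a serialized page source may contain HTML entities such as `&amp;`, and the suggested file name may contain characters Windows does not allow in file names. The sketch below handles both with the standard library. `parse_download_link` is a hypothetical helper, the `example.com` URL is made up for the demonstration, and the `download.mp3` fallback name is an assumption:

```python
import html as html_lib
import re

# Tolerates extra attributes between id, href, and download
LINK_RE = re.compile(r'<a id="j-src-btn"[^>]*?href="(.*?)"[^>]*?download="(.*?)"')


def parse_download_link(page_source):
    """Return (url, safe_filename) for the download button, or None if absent."""
    m = LINK_RE.search(page_source)
    if m is None:
        return None
    url = html_lib.unescape(m.group(1))       # "&amp;" -> "&"
    filename = html_lib.unescape(m.group(2))
    # Replace characters that are invalid in Windows file names
    filename = re.sub(r'[\\/:*?"<>|]', "_", filename).strip()
    return url, filename or "download.mp3"


sample = ('<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" '
          'href="https://example.com/a.mp3?x=1&amp;y=2" download="Mojito - 许66.mp3">')
print(parse_download_link(sample))
```

The returned pair could then be passed to `requests.get()` and `open()` as in the script above.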

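Next comes the optimization step:
I noticed that running this makes browser windows pop up all over the place, so we switch to headless mode and also wrap everything in a loop.

Complete code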

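#coding:utf-8
from selenium import webdriver
from selenium.webdriver.common.by import By
import time,re,requests

MORE_XPATH = "/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]"
# Captures (index, title, artist) for every <li> entry
pattern = re.compile(
    r'<li[^>]*>\s*'
    r'<span class="aplayer-list-cur"[^>]*></span>\s*'
    r'<span class="aplayer-list-index">(.*?)</span>\s*'
    r'<span class="aplayer-list-title">(.*?)</span>\s*'
    r'<span class="aplayer-list-author">(.*?)</span>\s*'
    r'</li>',
    re.S)
# Captures (href, download file name) from the download button
a7 = r'<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" href="(.*?)" download="(.*?)">'

def geturl(url):
    # Load the page in headless Firefox, expand the full list, return the HTML
    options = webdriver.FirefoxOptions()
    options.add_argument('-headless')
    browser = webdriver.Firefox(options=options)
    try:
        browser.get(url)
        time.sleep(3)
        while True:
            try:
                browser.find_element(By.XPATH, MORE_XPATH).click()
            except Exception:
                break
        return browser.page_source
    finally:
        browser.quit()

while True:
    name = input("Enter the song to download and press Enter (press Enter on an empty line to quit): ")
    if (name==''):
        break
    html = geturl("https://music.liuzhijin.cn/?name=" + name + "&type=kugou")
    data = pattern.findall(html)
    for i in data:
        print(i)
    num = input("Enter the number: ")
    song = None
    for i in data:
        if (str(i[0])==num):
            song = i
    if song is None:
        print("No song with that number")
        continue
    url1 = "https://music.liuzhijin.cn/?name=" + song[1]+' - '+song[2] + "&type=kugou"
    html1 = geturl(url1)
    a8 = re.findall(a7,html1)
    if not a8:
        print("No download link found")
        continue
    r = requests.get(a8[0][0])
    with open(a8[0][1], "wb") as code:
        code.write(r.content)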

Usage example: (screenshot of a sample run)

exe download: https://www.jianguoyun.com/p/DcMuR8YQs7rYCBjy1rED
-小左1120
This program took me two days, so please give it a like and a follow!
