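(This article is an original work by 小左1120. Please credit the source when reposting; if it violates any rules, contact the author and it will be taken down immediately.)
This is a Python web-scraping article. It shows how to use Python to look up song titles on a certain "gou" music service and download them (VIP tracks included). An exe is linked at the end.
I don't have much scraping experience, so the code may be rough and buggy. Experts are welcome to add me on WeChat or QQ so we can learn together, and please let me know if the code stops working.
WeChat: aicoder-XZ
QQ: 2583188733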
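The usual routine:
Set up Python.
Set up PyCharm.
Then install the modules the code depends on:
See the article linked in the original post (thanks to hy_696 for writing it). (P.S. On the site he points to, only look up the latest version number; don't download from there. Find that same version at https://mirrors.huaweicloud.com/geckodriver and download it from the mirror instead, which is faster and more stable.)
Next, press Win+R, type cmd, and press Enter. In the black console window, enter
pip install requests
(The scripts below also import selenium, so it has to be installed too, for example with pip install selenium, if the linked article has not already had you do that.)
Open PyCharm,
create a new project (if you don't know how, see the guide linked in the original post).
Once inside, right-click the folder,
type a name you like, and press Enter.
Now let's write some code!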
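First open a browser
and visit the "gou" music download site (don't use IE).
Run any search and look at the resulting URL.
Taking 《癞蛤蟆》 as an example,
the URL is
https://music.liuzhijin.cn/?name=癞蛤蟆&type=kugou
so the search format is
https://music.liuzhijin.cn/?name=<song title>&type=kugou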
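Pasting the title straight into the URL works when a browser cleans it up for you, but titles containing spaces, & or non-ASCII characters are safer when percent-encoded. Here is a minimal sketch using only the standard library. build_search_url is a hypothetical helper name, not part of the original code, and it assumes the site accepts a percent-encoded name parameter:

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

SEARCH_BASE = "https://music.liuzhijin.cn/"


def build_search_url(name, source="kugou"):
    """Return the search URL for a song title, percent-encoding the query.

    build_search_url is a hypothetical helper, not from the original post.
    """
    # quote_via=quote encodes spaces as %20 rather than "+".
    query = urlencode({"name": name, "type": source}, quote_via=quote)
    return SEARCH_BASE + "?" + query


url = build_search_url("癞蛤蟆")
print(url)
# Decoding the query string gives back the original title.
print(parse_qs(urlsplit(url).query))
```

The result can be passed to browser.get() in place of the concatenated string.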
OK, let's start scraping:
from selenium import webdriver
import time

name = input(">")  # song title to search for
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
browser = webdriver.Firefox()  # needs the geckodriver downloaded during setup
browser.get(url)
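This site has one small quirk: there is a "载入更多" (load more) button at the bottom, so the full result list is not loaded in one go.
That's not a problem.
Press F12.
In the top-left corner of the developer tools panel that opens, click the element-picker button,
then click "载入更多" on the page.
OK, the browser will jump to an element like this:
<div class="aplayer-more">载入更多</div>
Good.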
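from selenium import webdriver
from selenium.webdriver.common.by import By
import time

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)  # give the page time to render
# Keep clicking "load more" until it can no longer be found or clicked.
while True:
    try:
        # find_element_by_xpath was removed in Selenium 4.3;
        # find_element(By.XPATH, ...) works in Selenium 3 and 4.
        browser.find_element(By.XPATH, "your XPath").click()
    except Exception:
        break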
Mine is /html/body/section/div[1]/div/form[2]/div[4]/div/div[4]
so the code becomes
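from selenium import webdriver
from selenium.webdriver.common.by import By
import time

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
browser = webdriver.Firefox()
browser.get(url)
time.sleep(3)  # give the page time to render
# Keep clicking "load more" until it can no longer be found or clicked.
while True:
    try:
        browser.find_element(By.XPATH, "/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
    except Exception:
        break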
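The loop above only stops when the click raises an exception. If the button stayed clickable for some reason, it would spin forever, so a cap on the number of clicks can help. Here is a sketch with a hypothetical helper, click_until_gone, which is not part of the original code. It takes the click action as a callable, so it can be tried without a browser. FakeButton is a stand-in used only for the demonstration:

```python
def click_until_gone(click_once, max_clicks=50):
    """Call click_once() until it raises or max_clicks is reached.

    Returns the number of successful clicks. Hypothetical helper,
    not from the original post.
    """
    clicks = 0
    while clicks < max_clicks:
        try:
            click_once()
        except Exception:
            # Same stop condition as the original loop: the button is gone.
            break
        clicks += 1
    return clicks


class FakeButton:
    """Stand-in for the real button: clickable a fixed number of times."""

    def __init__(self, remaining):
        self.remaining = remaining

    def click(self):
        if self.remaining == 0:
            raise LookupError("no more 'load more' button")
        self.remaining -= 1


button = FakeButton(3)
print(click_until_gone(button.click))  # 3
```

With Selenium, the callable would be something like lambda: browser.find_element(By.XPATH, xpath).click(), where xpath is the path found with F12.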
Everything is ready, so grab the page source:
html = browser.page_source
Take a look at it in the console
and pick an entry (I searched for Mojito):
<li>
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">2</span>
<span class="aplayer-list-title">Mojito</span>
<span class="aplayer-list-author">许66</span>
</li>
<li>
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">3</span>
<span class="aplayer-list-title">Mojito</span>
<span class="aplayer-list-author">音乐之声-SongTaste</span>
</li>
Reading through it, we can see the structure is
<li>
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">song rank</span>
<span class="aplayer-list-title">song title</span>
<span class="aplayer-list-author">artist</span>
</li>
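Heh, let's extract these with a regular expression and, while we're at it, wrap the page loading in a function:
# coding: utf-8
from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import time

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"

def geturl(url):
    """Open url in Firefox, click "load more" until it is gone, return the page source."""
    browser = webdriver.Firefox()
    browser.get(url)
    time.sleep(3)
    while True:
        try:
            browser.find_element(By.XPATH, "/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
        except Exception:
            break
    return browser.page_source

html = geturl(url)
# The three original patterns (a111, a11, a) lost their HTML when this post
# was extracted. The single pattern below is rebuilt from the <li> structure
# shown above and is an assumption, not the author's exact text.
song_pattern = re.compile(
    r'<span class="aplayer-list-index">(.*?)</span>\s*'
    r'<span class="aplayer-list-title">(.*?)</span>\s*'
    r'<span class="aplayer-list-author">(.*?)</span>',
    re.S,
)
# Each match is a tuple: (rank, title, artist).
data = song_pattern.findall(html)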
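The extraction step can be tried offline, without opening a browser, by running a pattern against the sample markup from the Mojito search. This is a sketch under assumptions. SONG_PATTERN and parse_songs are hypothetical names, and the pattern is written from the <li> structure shown earlier, not recovered from the author's original expressions:

```python
import re

# Sample markup copied from the Mojito search results shown in the post.
SAMPLE = """
<li>
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">2</span>
<span class="aplayer-list-title">Mojito</span>
<span class="aplayer-list-author">许66</span>
</li>
<li>
<span class="aplayer-list-cur" style="background: #0e90d2;"></span>
<span class="aplayer-list-index">3</span>
<span class="aplayer-list-title">Mojito</span>
<span class="aplayer-list-author">音乐之声-SongTaste</span>
</li>
"""

# Assumed pattern: three consecutive spans, separated only by whitespace.
SONG_PATTERN = re.compile(
    r'<span class="aplayer-list-index">(.*?)</span>\s*'
    r'<span class="aplayer-list-title">(.*?)</span>\s*'
    r'<span class="aplayer-list-author">(.*?)</span>',
    re.S,
)


def parse_songs(html):
    """Return a list of (rank, title, artist) string tuples found in html."""
    return SONG_PATTERN.findall(html)


for rank, title, artist in parse_songs(SAMPLE):
    print(rank, title, artist)
```

Matching only the three inner spans means the pattern does not depend on the attributes of the surrounding <li> or on the first span's style.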
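Now add the selection step:
# coding: utf-8
from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import time

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"

def geturl(url):
    """Open url in Firefox, click "load more" until it is gone, return the page source."""
    browser = webdriver.Firefox()
    browser.get(url)
    time.sleep(3)
    while True:
        try:
            browser.find_element(By.XPATH, "/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
        except Exception:
            break
    return browser.page_source

html = geturl(url)
# The original patterns lost their HTML in extraction; this one is rebuilt
# from the <li> structure shown above and is an assumption.
song_pattern = re.compile(
    r'<span class="aplayer-list-index">(.*?)</span>\s*'
    r'<span class="aplayer-list-title">(.*?)</span>\s*'
    r'<span class="aplayer-list-author">(.*?)</span>',
    re.S,
)
data = song_pattern.findall(html)  # (rank, title, artist) tuples

for i in data:
    print(i)
num = input(">>")  # rank number of the song to pick
song = None
for i in data:
    if str(i[0]) == num:
        song = i
if song is None:
    raise SystemExit("No song with that number.")
# Search again with "title - artist" to land on the chosen song's page.
url = "https://music.liuzhijin.cn/?name=" + song[1] + ' - ' + song[2] + "&type=kugou"
print(geturl(url))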
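The selection loop can be pulled into a small function so that a mistyped number is handled without an exception. This is only a sketch. pick_song and build_query are hypothetical helper names, and data is assumed to be a list of (rank, title, artist) tuples as in the code above:

```python
def pick_song(data, num):
    """Return the first (rank, title, artist) tuple whose rank equals num.

    Returns None when nothing matches. Hypothetical helper, not from the
    original post.
    """
    wanted = num.strip()
    for song in data:
        if str(song[0]).strip() == wanted:
            return song
    return None


def build_query(song):
    """Build the "title - artist" search text used for the second lookup."""
    return song[1] + " - " + song[2]


data = [("2", "Mojito", "许66"), ("3", "Mojito", "音乐之声-SongTaste")]
chosen = pick_song(data, "3")
if chosen is None:
    print("No song with that number.")
else:
    print(build_query(chosen))
```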
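The markup around the download link looks like this (with (.*?) standing in for the parts that change from song to song):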
<span class="am-input-group-btn">
<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" href="(.*?)" download="(.*?)">
<i id="j-src-btn-icon" class="am-icon-download"></i>
</a>
</span>
We search for "song title - artist" and then download the result:
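# coding: utf-8
from html import unescape
from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import time
import requests

name = input(">")
url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"

def geturl(url):
    """Open url in Firefox, click "load more" until it is gone, return the page source."""
    browser = webdriver.Firefox()
    browser.get(url)
    time.sleep(3)
    while True:
        try:
            browser.find_element(By.XPATH, "/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
        except Exception:
            break
    return browser.page_source

html = geturl(url)
# The original patterns lost their HTML in extraction; this one is rebuilt
# from the <li> structure shown above and is an assumption.
song_pattern = re.compile(
    r'<span class="aplayer-list-index">(.*?)</span>\s*'
    r'<span class="aplayer-list-title">(.*?)</span>\s*'
    r'<span class="aplayer-list-author">(.*?)</span>',
    re.S,
)
data = song_pattern.findall(html)  # (rank, title, artist) tuples

for i in data:
    print(i)
num = input(">>")
song = None
for i in data:
    if str(i[0]) == num:
        song = i
if song is None:
    raise SystemExit("No song with that number.")
url1 = "https://music.liuzhijin.cn/?name=" + song[1] + ' - ' + song[2] + "&type=kugou"
# a7 was empty after extraction; rebuilt from the download-link markup above.
a7 = r'<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" href="(.*?)" download="(.*?)">'
html1 = geturl(url1)
a8 = re.findall(a7, html1)  # list of (href, file name) tuples
if not a8:
    raise SystemExit("No download link found.")
# page_source escapes & as &amp; inside attributes, so undo that first.
r = requests.get(unescape(a8[0][0]))
with open(a8[0][1], "wb") as code:
    code.write(r.content)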
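Two details of the download step are easy to trip over. The href in the serialized page source may contain &amp; instead of &, and the download attribute may contain characters that are not allowed in file names. The sketch below handles both under stated assumptions. extract_download and safe_filename are hypothetical helpers, the example.com URL is made up for the demonstration, and the pattern assumes only the id, href and download attributes seen in the markup above:

```python
import re
from html import unescape

# Assumed pattern: the link is identified by id="j-src-btn"; the other
# attributes between id, href and download are skipped.
LINK_PATTERN = re.compile(
    r'<a id="j-src-btn"[^>]*?href="(.*?)"[^>]*?download="(.*?)"'
)


def safe_filename(name):
    """Replace characters that Windows forbids in file names with "_"."""
    cleaned = re.sub(r'[\\/:*?"<>|]', "_", name).strip()
    return cleaned or "download"


def extract_download(page_source):
    """Return (url, file name) for the download link, or None if absent."""
    match = LINK_PATTERN.search(page_source)
    if match is None:
        return None
    href, filename = (unescape(group) for group in match.groups())
    return href, safe_filename(filename)


# Made-up sample; the real href comes from the site.
SAMPLE = (
    '<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" '
    'href="https://example.com/a.mp3?x=1&amp;y=2" download="AC/DC - Song.mp3">'
)
print(extract_download(SAMPLE))
```

The returned URL can then go to requests.get() and the file name to open(), as in the script above.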
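Next comes the optimization step:
I found that this keeps popping up browser windows all over the place, so we switch to headless mode and wrap everything in a loop.
The complete code: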
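# coding: utf-8
from html import unescape
from selenium import webdriver
from selenium.webdriver.common.by import By
import re
import time
import requests

# The original song patterns lost their HTML in extraction; this one is
# rebuilt from the <li> structure shown above and is an assumption.
song_pattern = re.compile(
    r'<span class="aplayer-list-index">(.*?)</span>\s*'
    r'<span class="aplayer-list-title">(.*?)</span>\s*'
    r'<span class="aplayer-list-author">(.*?)</span>',
    re.S,
)
# a7 was empty after extraction; rebuilt from the download-link markup above.
a7 = r'<a id="j-src-btn" class="am-btn am-btn-default" target="_blank" href="(.*?)" download="(.*?)">'

def geturl(url):
    """Load url in headless Firefox, click "load more" until gone, return the source."""
    options = webdriver.FirefoxOptions()
    options.add_argument('-headless')
    browser = webdriver.Firefox(options=options)
    try:
        browser.get(url)
        time.sleep(3)
        while True:
            try:
                browser.find_element(By.XPATH, "/html/body/section/div[1]/div/form[2]/div[4]/div/div[4]").click()
            except Exception:
                break
        return browser.page_source
    finally:
        browser.quit()  # closes the window and stops geckodriver

while True:
    name = input("Enter the song you want to download and press Enter (just press Enter to quit): ")
    if name == '':
        break
    url = "https://music.liuzhijin.cn/?name=" + name + "&type=kugou"
    html = geturl(url)
    data = song_pattern.findall(html)  # (rank, title, artist) tuples
    for i in data:
        print(i)
    num = input("Enter the number of the song: ")
    song = None
    for i in data:
        if str(i[0]) == num:
            song = i
    if song is None:
        print("No song with that number.")
        continue
    url1 = "https://music.liuzhijin.cn/?name=" + song[1] + ' - ' + song[2] + "&type=kugou"
    html1 = geturl(url1)
    a8 = re.findall(a7, html1)  # list of (href, file name) tuples
    if not a8:
        print("No download link found.")
        continue
    # page_source escapes & as &amp; inside attributes, so undo that first.
    r = requests.get(unescape(a8[0][0]))
    with open(a8[0][1], "wb") as code:
        code.write(r.content)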
Download link for the exe: https://www.jianguoyun.com/p/DcMuR8YQs7rYCBjy1rED
-小左1120
This program took me two days to write, so please give it a like and a follow!