Below is a simple task implemented in Python: scraping images from a website.
First, import the required libraries:
import os

import requests
from lxml import etree
Then request the page and pull out the category links:
url = 'https://******'
response = requests.get(url)
root = etree.HTML(response.content)
# Each <a> under the "bzmenu" list links to one top-level category
big_list = root.xpath('//ul[@class="bzmenu"]/li/a')
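To see what the XPath expression above selects without touching the network, here is a minimal sketch run against a hand-written snippet. The markup and its links are made up for illustration, and the standard library's `xml.etree.ElementTree` stands in for lxml here. It supports only a subset of XPath and needs well-formed markup, so treat it as a way to check the selector, not as a replacement for lxml on real pages.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet shaped like the menu the script expects
SAMPLE = """
<html><body>
  <ul class="bzmenu">
    <li><a href="/bizhi/nature/">Nature</a></li>
    <li><a href="/bizhi/city/">City</a></li>
  </ul>
  <ul class="other"><li><a href="/ignored/">Ignored</a></li></ul>
</body></html>
"""


def extract_categories(markup):
    """Return (link text, href) pairs for every <a> under ul.bzmenu."""
    root = ET.fromstring(markup.strip())
    links = root.findall('.//ul[@class="bzmenu"]/li/a')
    return [(a.text, a.get("href")) for a in links]


categories = extract_categories(SAMPLE)
print(categories)
```

Only the links inside the `bzmenu` list come back; the link in the other list is skipped, which is the same filtering the script relies on.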
To keep things tidy, we can create a dedicated folder for the images.
# Put all downloaded images under a single folder
os.makedirs('图', exist_ok=True)
os.chdir('图')
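One thing to watch with folder creation: `os.mkdir` raises `FileExistsError` if the folder is already there, so a second run of the script would stop at this step. A small sketch of an idempotent alternative follows; the helper name `ensure_dir` is made up for illustration, and the demo runs in a temporary directory so it leaves nothing behind.

```python
import os
import tempfile


def ensure_dir(path):
    """Create path if needed; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


# Work inside a throwaway directory so the demo cleans up after itself
with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "图")
    first = ensure_dir(target)
    second = ensure_dir(target)  # calling again does not raise
    print(first, second)
```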
Next, loop over the category entries from the page to get the address of each category we want images from.
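for big_cate in big_list:
    big_cate_src = big_cate.xpath('@href')[0]
    big_cate_alt = big_cate.xpath("text()")[0]
    # The href is site-relative, so prepend the host
    big_cate_src = "https://www.ivsky.com" + big_cate_src
    # One sub-folder per category, named after the link text
    os.makedirs(big_cate_alt, exist_ok=True)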
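String concatenation works as long as every `href` starts with `/`. As a hedged alternative, the standard library's `urllib.parse.urljoin` resolves site-relative, protocol-relative and already-absolute links the same way a browser would. The paths and the `img.example.com` host below are invented for the demo; only the `https://www.ivsky.com` base comes from the script.

```python
from urllib.parse import urljoin

BASE = "https://www.ivsky.com"


def absolute(href, base=BASE):
    """Resolve href against base, whatever form href takes."""
    return urljoin(base, href)


# Hypothetical hrefs in the three forms a page might contain
site_relative = absolute("/bizhi/nature/")
protocol_relative = absolute("//img.example.com/pic/a.jpg")
already_absolute = absolute("https://www.ivsky.com/bizhi/")

print(site_relative)
print(protocol_relative)
print(already_absolute)
```

The same call would also cover the image addresses later on, which the script completes by prepending `'https:'`.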
Now for the most important part: we fetch the images on pages 1 through 3.
    # Fetch the images on pages 1 to 3
    for page in range(1, 4):  # range() excludes its upper bound
        big_src = big_cate_src + 'index_%s.html' % page
        print(big_src)
        big_response = requests.get(big_src)
        big_root = etree.HTML(big_response.content)
        img_src = big_root.xpath('//ul[@class="ali"]/li/div/a/img/@src')
        img_alt = big_root.xpath('//ul[@class="ali"]/li/div/a/img/@alt')
        # Pair each image title with its address
        tuple_img = zip(img_alt, img_src)
        # Download each image
        for alt, src in tuple_img:
            src = 'https:' + src
            img_response = requests.get(src)
            # '/' is a path separator and cannot appear in a file name
            alt = alt.replace('/', '-')
            # The working directory is already the '图' folder, so a relative path is enough
            path = os.path.join(big_cate_alt, alt + '.jpg')
            with open(path, 'wb') as file:
                file.write(img_response.content)
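The two error-prone parts of the loop above, the page range and the file name, can be pulled out into small pure functions and checked without any network access. This is only a sketch: the helper names `page_urls` and `safe_name` and the sample category path are made up, and replacing the full set of characters Windows forbids in file names goes beyond the original, which handles only `/`.

```python
def page_urls(category_url, first=1, last=3):
    """Build the index_N.html URLs for pages first..last, inclusive."""
    # range() stops before its upper bound, hence last + 1
    return [category_url + 'index_%s.html' % page
            for page in range(first, last + 1)]


def safe_name(alt):
    """Replace characters that Windows does not allow in file names."""
    for ch in '\\/:*?"<>|':
        alt = alt.replace(ch, '-')
    return alt.strip() or 'untitled'


# Hypothetical category URL and image title
urls = page_urls("https://www.ivsky.com/bizhi/nature/")
name = safe_name("sunset/beach: 01")

print(urls)
print(name)
```

Note that `range(1, 3)` would yield only pages 1 and 2, so an inclusive helper like this makes the intended "pages 1 to 3" explicit.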
In testing, the results are clear and the code stays short and simple. Python really is impressive!