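First, install cssselect:
pip install cssselect
Then install lxml:
pip install lxml
The script below also imports requests; if it is not installed yet:
pip install requests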
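#coding=utf-8
import requests
from lxml import etree

def getHtml(url):
    # Fetch the thread page; raise on HTTP errors instead of parsing an error page
    page = requests.get(url, timeout=10)
    page.raise_for_status()
    return page.text

def getImg(html):
    # Parse the HTML string into an lxml element tree
    html = etree.HTML(html)
    # Select every element with class BDE_Image that also has a src attribute
    img_info = html.cssselect('.BDE_Image[src]')
    for img in img_info:
        print(img.attrib['src'])

if __name__ == '__main__':
    url = "https://tieba.baidu.com/p/5113603072"
    html = getHtml(url)
    getImg(html)
    # The script only prints the image URLs; it does not download the files
    print("OK! All image URLs printed.")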
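The script above stops at printing URLs. If you also want to save the images, here is a minimal sketch under a few assumptions: `save_images`, `filename_from_url` and the `tieba_images` folder name are hypothetical, it uses the standard library's `urllib.request` (so it needs no extra packages) rather than requests, and it assumes the image host serves the files without extra headers such as a Referer.

```python
import os
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    # Hypothetical helper: use the last path segment of the URL as the file name
    return os.path.basename(urlparse(url).path)


def save_images(urls, folder="tieba_images"):
    # Hypothetical helper: download each image URL into `folder`
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(folder, filename_from_url(url))
        # urlopen raises HTTPError on 4xx/5xx responses
        with urllib.request.urlopen(url, timeout=10) as resp, open(path, "wb") as f:
            f.write(resp.read())
        saved.append(path)
    return saved


# Only derives a file name; no network access happens here
print(filename_from_url(
    "https://imgsa.baidu.com/forum/w%3D580/sign=29a773eb871001e94e3c1407880f7b06/"
    "50cf3bc79f3df8dc5b6bb593c711728b47102859.jpg"
))
# prints 50cf3bc79f3df8dc5b6bb593c711728b47102859.jpg
```

To wire it in, getImg could return `[img.attrib['src'] for img in img_info]` instead of printing, and the main block could pass that list to `save_images`.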
Inspecting the page source shows that every original image URL appears in a tag like this:
<img class="BDE_Image" src="https://imgsa.baidu.com/forum/w%3D580/sign=29a773eb871001e94e3c1407880f7b06/50cf3bc79f3df8dc5b6bb593c711728b47102859.jpg">
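The tag has both a class attribute and a src attribute, so the CSS selector is built from two parts:
.BDE_Image matches elements whose class attribute contains the value BDE_Image.
Appending [src] additionally requires the element to have a src attribute, so the complete selector is .BDE_Image[src].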
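To make the two conditions of .BDE_Image[src] concrete without network access or lxml, the sketch below applies the same rule using only the standard library's `html.parser`. This is a stand-in that illustrates what the selector matches, not how lxml or cssselect evaluate it; the class name `BDEImageParser` and the example.com URLs in the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser


class BDEImageParser(HTMLParser):
    """Hypothetical parser collecting src values from tags that satisfy the
    rule behind `.BDE_Image[src]`: class contains BDE_Image AND src exists."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # .BDE_Image: class is a space-separated list, so test for the token
        classes = (attrs.get("class") or "").split()
        # [src]: the attribute only has to be present
        if "BDE_Image" in classes and "src" in attrs:
            self.srcs.append(attrs["src"])


# Made-up sample markup covering matching and non-matching cases
sample = """
<img class="BDE_Image" src="https://example.com/a.jpg">
<img class="BDE_Image big" src="https://example.com/b.jpg">
<img class="BDE_Image">
<img class="avatar" src="https://example.com/c.jpg">
"""

parser = BDEImageParser()
parser.feed(sample)
print(parser.srcs)  # ['https://example.com/a.jpg', 'https://example.com/b.jpg']
```

The third tag is skipped because it lacks src, and the fourth because its class is not BDE_Image, which is exactly the filtering the [src] suffix and the class part contribute.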