Scraping series: using Python 3 to scrape pictures of pretty girls

Platform: Ubuntu 16.04

Python version: 3.6.3

Modules used: bs4, urllib.request, imp, sys

Target URL: http://www.dbmeinv.com/
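The script below requests the second page of this site through a pager_offset query parameter. As a rough sketch of how several pages could be addressed, the snippet builds one URL per page. The helper name page_url is made up for illustration, and treating pager_offset as a plain page number is an assumption based only on the single URL used in this post.

```python
from urllib.parse import urlencode

BASE_URL = 'http://www.dbmeinv.com/'


def page_url(page, base_url=BASE_URL):
    """Return the listing URL for one page (hypothetical helper)."""
    # Assumption: pager_offset is simply the page number.
    return base_url + '?' + urlencode({'pager_offset': page})


# Build the URLs for the first three pages.
page_urls = [page_url(p) for p in range(1, 4)]
for u in page_urls:
    print(u)
```

Each of these URLs could then be passed to the crawl function one after another, ideally with a short pause between requests.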

#!/usr/bin/python3
# -*- coding: UTF-8 -*-

from bs4 import BeautifulSoup
import urllib.request
from imp import reload   # imp is deprecated since Python 3.4
import sys
reload(sys)   # Python 2 habit; not needed in Python 3 and safe to drop
# Quick check that BeautifulSoup can parse a string
html = '<title>very good</title>'
soup = BeautifulSoup(html, 'html.parser')
print(soup.title)

# Test: parse the local file test.html
with open('test.html') as f:
    soup2 = BeautifulSoup(f, 'html.parser')
print(soup2.prettify())   # print the local file's content

url = 'http://www.dbmeinv.com/?pager_offset=2'
num = 1   # counter used to name the saved images
# Fetch the page source and download every image found in it

def crawl(url):
    global num
    # Pretend to be Firefox so the site's anti-scraping check accepts the request
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:65.0) Gecko/20100101 Firefox/65.0'}
    req = urllib.request.Request(url, headers=header)
    page = urllib.request.urlopen(req, timeout=60)
    content = page.read()
    print(content.decode())   # content is bytes; decode it to get readable text

    soup = BeautifulSoup(content, 'html.parser')
    get_girl = soup.find_all('img')   # find every img tag
    for girl in get_girl:   # loop over the tags
        link = girl.get('src')   # read the src attribute
        print(link)
        # urlretrieve from urllib.request downloads the link into the image folder
        urllib.request.urlretrieve(link, 'image/{}.jpg'.format(num))
        num += 1

crawl(url)
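The loop above passes every src value straight to urlretrieve, which fails if an img tag has no src or if the value is a relative path. A minimal sketch of a filtering step, assuming the raw values come from girl.get('src'), might look like this. The helper name usable_links and the sample values are invented for illustration.

```python
from urllib.parse import urljoin


def usable_links(srcs, page_url):
    """Return absolute http(s) image URLs from raw src values (hypothetical helper)."""
    links = []
    for src in srcs:
        if not src:   # tag without a src attribute: get('src') returned None
            continue
        absolute = urljoin(page_url, src.strip())   # resolve relative paths against the page
        if absolute.startswith(('http://', 'https://')):
            links.append(absolute)   # skip data: URIs and other schemes
    return links


# Made-up sample values covering the cases handled above.
srcs = ['https://example.com/a.jpg', None, '/static/logo.png', 'data:image/gif;base64,AAAA']
for n, link in enumerate(usable_links(srcs, 'http://www.dbmeinv.com/?pager_offset=2'), start=1):
    print(n, link)
```

Using enumerate with start=1 here also shows one way to number the files without a global counter.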

Create an image folder in the current directory:

mkdir image
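As an alternative to running mkdir by hand, the folder could be created from Python before downloading. This is only a sketch: ensure_dir is a made-up helper name, and the demo runs in a temporary directory so it does not touch the working directory. In the script itself the call would presumably be ensure_dir('image').

```python
import os
import tempfile


def ensure_dir(path):
    """Create path if it is missing and return it (hypothetical helper)."""
    os.makedirs(path, exist_ok=True)   # no error if the folder already exists
    return path


# Demonstrated inside a temporary directory.
demo_root = tempfile.mkdtemp()
target = ensure_dir(os.path.join(demo_root, 'image'))
ensure_dir(target)   # calling it again is harmless
print(os.path.isdir(target))
```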

Run it:

guo@ubuntu:~/test/python/A_meizitu$ python3 meizitu.py 
very good
test test

you are very good

不羞涩 | 真实的图片分享交友社区
不羞涩,真实的图片分享交友社区。
 
https://wx3.sinaimg.cn/bmiddle/0060lm7Tgy1g22fgqr55oj30u00k00um.jpg
https://wx4.sinaimg.cn/bmiddle/0060lm7Tgy1g22ff9ge8zj30u00s777u.jpg
https://wx4.sinaimg.cn/bmiddle/0060lm7Tgy1g22fhvz2z5j30k00f0mya.jpg
https://wx1.sinaimg.cn/bmiddle/0060lm7Tgy1g22ff31rzuj30u00s077j.jpg
https://wx4.sinaimg.cn/bmiddle/0060lm7Tgy1g22fhjkxdoj30hv0ih3zf.jpg
https://wx4.sinaimg.cn/bmiddle/0060lm7Tgy1g22fh7l9qhj30u01hc7an.jpg
https://wx4.sinaimg.cn/bmiddle/0060lm7Tgy1g22fi7pdy6j30u0140n35.jpg
https://wx4.sinaimg.cn/bmiddle/0060lm7Tgy1g22fhwgxx2j30u0140djp.jpg
https://wx1.sinaimg.cn/bmiddle/0060lm7Tgy1g22fhw7p7aj30u014041u.jpg
https://wx1.sinaimg.cn/bmiddle/0060lm7Tgy1g22ff9oa5pj30u0140gpd.jpg
https://wx1.sinaimg.cn/bmiddle/0060lm7Tgy1g22ffkxeptj30rs0rst9k.jpg
https://wx4.sinaimg.cn/bmiddle/0060lm7Tgy1g22ff37jsaj30u00l6acv.jpg
https://wx4.sinaimg.cn/bmiddle/0060lm7Tgy1g22ff3ay1hj30k00plq57.jpg
https://wx4.sinaimg.cn/bmiddle/0060lm7Tgy1g22fgkxa75j30u00mijwr.jpg
https://wx2.sinaimg.cn/bmiddle/0060lm7Tgy1g22fidcdp4j30tr1sg7a5.jpg
https://wx3.sinaimg.cn/bmiddle/0060lm7Tgy1g22fhpsiy1j30u0140dhi.jpg
https://wx1.sinaimg.cn/bmiddle/0060lm7Tgy1g22fflat9oj30u0140tkl.jpg
https://wx2.sinaimg.cn/bmiddle/0060lm7Tgy1g22fft14xtj30u0140wqo.jpg
https://wx1.sinaimg.cn/bmiddle/0060lm7Tgy1g22ffsig5wj30u0140wqn.jpg
https://wx1.sinaimg.cn/bmiddle/0060lm7Tgy1g22ffs63lhj30u0140gv0.jpg

Result:

guo@ubuntu:~/test/python/A_meizitu/image$ ls
10.jpg  12.jpg  14.jpg  16.jpg  18.jpg  1.jpg   2.jpg  4.jpg  6.jpg  8.jpg
11.jpg  13.jpg  15.jpg  17.jpg  19.jpg  20.jpg  3.jpg  5.jpg  7.jpg  9.jpg
guo@ubuntu:~/test/python/A_meizitu/image$ 
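In the listing above, 10.jpg through 19.jpg appear ahead of 1.jpg, so the files do not sort in download order. One possible tweak, shown here only as a sketch, is to zero-pad the counter when building the file name. The helper image_path and the width of three digits are illustrative choices, not part of the original script.

```python
def image_path(num, folder='image', width=3):
    """Build a zero-padded file name such as image/007.jpg (hypothetical helper)."""
    return '{}/{:0{}d}.jpg'.format(folder, num, width)


# With padding, plain string sorting matches numeric order.
names = [image_path(n) for n in (1, 2, 10, 20)]
print(names)
print(sorted(names) == names)
```

The result of image_path(num) could be passed to urlretrieve in place of 'image/{}.jpg'.format(num).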
