Python实战计划 Learning Notes 0701

Day 4 of the bootcamp: I scraped 100 photos.

The final result looks like this:

[Screenshot: the downloaded images]

My code:

#!/usr/bin/env python    # tells the OS which interpreter to run; it is looked up via the PATH environment variable
# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests
import time
import urllib.request

url = 'http://weheartit.com/inspirations/taylorswift?page='     # getting this URL wrong at first wasted a lot of time
proxies = {"http": "http://121.58.227.252:8080"}   # requests expects a lowercase scheme key and a full proxy URL; this one never worked, so it is not passed to requests.get below
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 Safari/537.36 SE 2.X MetaSr 1.0'}

def download(url):
    wb_data = requests.get(url,headers=headers)
    if wb_data.status_code != 200:
        return
    filename = url.split('/')[4]   # split breaks the URL into pieces; index 4 is the image id
    target = r'E:\PycharmProjects\homework4\imgs\{}.jpg'.format(filename)   # raw string so the backslashes are not treated as escape characters
    with open(target,'wb') as fs:
        fs.write(wb_data.content)
    print('%s -> %s' % (url,target))   # %-style placeholders, same idea as printf in C

'''
# unused alternative: download with urllib.request instead of requests
# (path would need to be defined as the output folder first)
def dl_image(url):
    urllib.request.urlretrieve(url, path + url.split('/')[4] + '.' + url.split('.')[-1])
    print('Done')
'''

def get_img(url,data=None):
    wb_data = requests.get(url,headers=headers)     # only the request headers are used; the proxy was not usable
    soup = BeautifulSoup(wb_data.text,'lxml')
    imgs = soup.select('#main-container > div > div > div > div > div > a > img')   # CSS selector copied from the browser inspector
    if data is None:
        for img in imgs:
            data = img.get('src')
            print(data)
            download(data)


def get_more_pages(start,end):
    for one in range(start,end):
        get_img(url+str(one))
        time.sleep(2)

get_more_pages(1,10)
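
The proxies dict at the top is defined but never handed to requests.get, because no proxy actually worked (see the summary below). If a usable one were found, it could be wired in roughly like this; a minimal sketch reusing the headers from above and the same untested address:

# sketch only: pass the proxies dict (plus a timeout) to requests.get
def download_via_proxy(url):
    try:
        resp = requests.get(url, headers=headers,
                            proxies={"http": "http://121.58.227.252:8080"}, timeout=10)
    except requests.exceptions.RequestException as e:
        print('proxy request failed:', e)
        return None
    return resp.content if resp.status_code == 200 else None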

Summary

  • URL handling: a lot of the errors came from picking the wrong URL
  • Spent ages on proxies and not a single one worked, it kept throwing errors; in the end a VPN solved it
  • Reading and writing files with with ... as (see the small sketch after this list)
  • Splitting strings with split
  • Asynchronously loaded content: inspect the XHR requests in the browser's developer tools
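
As a small reminder of the with ... as and split points, a sketch with a made-up URL and output folder:

# illustration only: hypothetical image URL and folder
img_url = 'http://example.com/images/12345/superthumb.jpg'
filename = img_url.split('/')[4]                          # '12345', the piece between the 4th and 5th slashes
with open('imgs/{}.jpg'.format(filename), 'wb') as fs:    # with ... as closes the file automatically
    fs.write(b'fake image bytes')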
