Python学习笔记 - aiohttp

上一个笔记总结了asyncio的一些知识点,这次就来应用一下。如果使用协程的方式来写爬虫,网络相关的请求就要将requests库替换成aiohttp这个库。

  • 官方文档
  • 中文文档

一. 效率对比

  1. 用上次写的爬虫,先爬一些壁纸链接

    urls = [
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-729560.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-724055.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716644.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716643.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716645.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686220.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686212.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-652608.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639894.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639893.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639892.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639890.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639888.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-468197.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467016.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467012.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467009.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467007.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467005.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466997.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466998.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466993.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466994.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466995.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-729560.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-724055.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716644.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716643.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716645.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686220.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686212.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-652608.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639894.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639893.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639892.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639890.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639888.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-468197.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467016.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467012.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467009.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467007.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467005.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466997.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466998.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466993.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466994.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466995.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466992.jpg",
        "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466989.jpg"
    ]
    复制代码
  2. 用协程的方式下载这些图片

    • 安装aiohttp库

    • 根据文档,aiohttp库发起请求使用的是aiohttp.ClientSession(),文档建议不用每次请求都创建一个session会话,所以这里就只创建一个:

      async def main():
          async with aiohttp.ClientSession() as session:
              pass
      复制代码
    • 定义下载图片的协程函数和创建储存路径的函数:

      # 下载图片
      async def download_img(session, url):
          image_name = url.split('/')[-1]
          async with session.get(url, headers=headers) as response:
              with open('%s/%s' % (get_store_path('city'), image_name), 'wb') as fd:
                  while True:
                      chunk = await response.content.read(200)
                      if not chunk:
                          break
                      fd.write(chunk)
      复制代码
      # 获取图片储存路径,如果没有则创建
      def get_store_path(dir_name):
          current_path = os.path.abspath('.')
          target_path = os.path.join(current_path, 'wallpaper/%s' % dir_name)
          folder = os.path.exists(target_path)
          if not folder:
              os.makedirs(target_path)
      
          return target_path
      复制代码
    • 补全main函数

      async def main(loop):
          async with aiohttp.ClientSession() as session:
              tasks = [loop.create_task(download_img(session, url)) for url in urls]
              await asyncio.wait(tasks)
              
      loop = asyncio.get_event_loop()
      loop.run_until_complete(main(loop))
      loop.close()
      复制代码
    • 计算下载耗时:

      if __name__ == '__main__':
          t1 = time.time()
          loop = asyncio.get_event_loop()
          loop.run_until_complete(main(loop))
          loop.close()
          print('耗时:%fs' % (time.time() - t1))
      复制代码
    • 结果:

      26张图片用时1.476328s

  3. 使用多进程方式下载

     import requests
     import multiprocessing
     import os
     import time
    
    
     urls = [
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-729560.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-724055.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716644.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716643.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716645.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686220.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686212.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-652608.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639894.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639893.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639892.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639890.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639888.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-468197.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467016.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467012.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467009.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467007.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467005.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466997.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466998.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466993.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466994.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466995.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-729560.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-724055.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716644.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716643.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-716645.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686220.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-686212.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-652608.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639894.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639893.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639892.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639890.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-639888.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-468197.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467016.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467012.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467009.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467007.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-467005.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466997.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466998.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466993.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466994.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466995.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466992.jpg",
         "https://alpha.wallhaven.cc/wallpapers/thumb/small/th-466989.jpg"
     ]
    
     req_session = requests.Session()
     req_session.headers['user-agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'
    
    
     def get_store_path(dir_name):
         current_path = os.path.abspath('.')
         target_path = os.path.join(current_path, 'wallpaper/%s' % dir_name)
         folder = os.path.exists(target_path)
         if not folder:
             os.makedirs(target_path)
    
         return target_path
    
    
     def download_img(url):
         img = req_session.get(url, stream=True)
         image_name = url.split('/')[-1]
         with open('%s/%s' % (get_store_path('city'), image_name), 'wb') as fd:
             for chunk in img.iter_content(chunk_size=128):
                 fd.write(chunk)
    
    
     def main():
         p = multiprocessing.Pool()
         [p.apply_async(download_img, args=(url,)) for url in urls]
         p.close()
         p.join()
    
     if __name__ == '__main__':
         t1 = time.time()
         main()
         print('耗时:%fs' % (time.time() - t1))
    
    复制代码

    结果

两种我都执行了多次,差不多都在1.4~2.7s之间,可以看到协程还是很强大的,仅仅用单线程就做到了类似多进程的效果。

转载于:https://juejin.im/post/5c4ff008f265da617831cb68

你可能感兴趣的:(Python学习笔记 - aiohttp)