Python asynchronous crawling with asyncio

Contents

        • import asyncio
          • 1. Defining a coroutine
          • 2. Binding a callback
          • 3. Multi-task coroutines
          • 4. Implementing coroutines
          • 5. Using aiohttp

import asyncio

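  • event_loop: the event loop, essentially an infinite loop. We register functions on it, and when their triggering condition is met, the loop calls the corresponding handler.
  • coroutine: in Python this usually refers to the coroutine object type. A coroutine object can be registered on the event loop, which then calls it. A method defined with the async keyword is not executed immediately when called; it returns a coroutine object instead.
  • task: a further wrapper around a coroutine object that also tracks the task's states.
  • future: represents the result of a task that will run or has not yet run. Task is a subclass of Future, so in practice the two are used in much the same way.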
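1. Defining a coroutine
  • (1) Define a method with async.
  • (2) Call the method; it returns a coroutine object.
  • (3) Call get_event_loop() to obtain an event loop, loop.
  • (4) Call the loop's run_until_complete() method to register the coroutine on the loop and run it.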
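import asyncio
# Calling an async function does not run its body; it returns a coroutine object.
async def execute(x):
    print('Number:', x)
coroutine = execute(1)
print('Coroutine:', coroutine)
print('After calling execute')
# Register the coroutine on the event loop and run it to completion.
loop = asyncio.get_event_loop()
loop.run_until_complete(coroutine)
print('After calling loop')
# Output:
# Coroutine: <coroutine object execute at 0x...>
# After calling execute
# Number: 1
# After calling loop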
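On Python 3.7 and later, steps (3) and (4) can usually be collapsed into a single call to asyncio.run(), and newer Python versions deprecate calling get_event_loop() when no loop is running. A minimal sketch, reusing the execute() coroutine from above (the return value is added here only for illustration):

```python
import asyncio


async def execute(x):
    # The body runs only once the event loop drives the coroutine.
    print('Number:', x)
    return x


# asyncio.run() creates an event loop, runs the coroutine to completion,
# and closes the loop afterwards.
result = asyncio.run(execute(1))
print('Result:', result)
```

The explicit loop API used in the rest of this article is still useful for seeing what happens at each step.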
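  • (5) Create a task object explicitly: the loop's create_task() method wraps the coroutine object in a task. Printing the task right after creation shows it in the pending state. After the event loop has run it, printing it again shows the finished state, together with result=1, the value returned by execute().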
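import asyncio
async def execute(x):
    print('Number:', x)
    return x
coroutine = execute(1)
print('Coroutine:', coroutine)
print('After calling execute')
loop = asyncio.get_event_loop()
# Wrap the coroutine in a task; it is scheduled but has not run yet (pending).
task = loop.create_task(coroutine)
print('Task:', task)
loop.run_until_complete(task)
# The task has now finished and carries its return value.
print('Task:', task)
print('After calling loop')
# Output (the exact repr varies by Python version):
# Coroutine: <coroutine object execute at 0x...>
# After calling execute
# Task: <Task pending coro=<execute() running at ...>>
# Number: 1
# Task: <Task finished coro=<execute() done, defined at ...> result=1>
# After calling loop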
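The pending-to-finished transition can also be checked programmatically with task.done() instead of reading the printed repr. A small sketch, assuming Python 3.7+ (asyncio.run() and asyncio.create_task() are used here in place of the explicit loop object):

```python
import asyncio


async def execute(x):
    await asyncio.sleep(0)  # yield to the event loop once
    return x


async def main():
    task = asyncio.create_task(execute(1))
    before = task.done()   # False: the task is scheduled but has not run yet
    result = await task    # suspend main() until the task finishes
    after = task.done()    # True: the task has finished
    return before, result, after


states = asyncio.run(main())
print(states)  # (False, 1, True)
```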
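  • (6) A task can also be created with asyncio's ensure_future() method, which likewise returns a task object. In the code below the task is created before we obtain a reference to the loop.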
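import asyncio
async def execute(x):
    print('Number:', x)
    return x
coroutine = execute(1)
print('Coroutine:', coroutine)
print('After calling execute')
# ensure_future() wraps the coroutine in a task without an explicit loop object.
task = asyncio.ensure_future(coroutine)
print('Task:', task)
loop = asyncio.get_event_loop()
loop.run_until_complete(task)
print('Task:', task)
print('After calling loop')
# Output (the exact repr varies by Python version):
# Coroutine: <coroutine object execute at 0x...>
# After calling execute
# Task: <Task pending coro=<execute() running at ...>>
# Number: 1
# Task: <Task finished coro=<execute() done, defined at ...> result=1>
# After calling loop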
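To make the relationship between task and future concrete, the following sketch checks that the object returned by ensure_future() is both a Task and a Future, and that passing an existing task to ensure_future() returns it unchanged. It assumes Python 3.7+ and uses asyncio.run() for brevity:

```python
import asyncio


async def execute(x):
    return x


async def main():
    task = asyncio.ensure_future(execute(1))  # coroutine in, Task out
    same = asyncio.ensure_future(task)        # an existing Task is returned as is
    await task
    return (isinstance(task, asyncio.Task),
            isinstance(task, asyncio.Future),
            same is task)


checks = asyncio.run(main())
print(checks)  # (True, True, True)
```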
2. Binding a callback
  • (1) Call add_done_callback() to bind a callback to a task. We pass the callback() function to the task; when the task finishes, callback() is called with the task object as its argument, and calling the task's result() method inside it retrieves the return value.
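import asyncio
import requests
async def request():
    url = 'https://www.baidu.com'
    # requests.get() returns a Response object, not just the status code.
    status = requests.get(url)
    return status
# Called by the event loop with the finished task as its only argument.
def callback(task):
    print('Status:', task.result())
coroutine = request()
task = asyncio.ensure_future(coroutine)
task.add_done_callback(callback)
print('Task:', task)
loop = asyncio.get_event_loop()
loop.run_until_complete(task)
print('Task:', task)
# Output (the exact repr varies by Python version):
# Task: <Task pending coro=<request() running at ...> cb=[callback() at E:/project/python基础.py:8]>
# Status: <Response [200]>
# Task: <Task finished coro=<request() done, defined at ...> result=<Response [200]>>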
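A done-callback receives only the task, so extra context such as the URL has to be bound in advance. One common way is functools.partial, sketched below. The request() body here is a stand-in that sleeps instead of making a real HTTP call, and the collected list and the returned string are hypothetical, for illustration only:

```python
import asyncio
from functools import partial

collected = []


async def request(url):
    await asyncio.sleep(0.01)  # stand-in for a real HTTP request
    return 'fetched ' + url


def callback(url, task):
    # partial() binds url; the finished task arrives as the last positional argument.
    collected.append((url, task.result()))


async def main():
    url = 'https://www.baidu.com'
    task = asyncio.ensure_future(request(url))
    task.add_done_callback(partial(callback, url))
    await task
    await asyncio.sleep(0)  # yield once more so any pending callbacks can run


asyncio.run(main())
print(collected)
```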
  • (2) Alternatively, skip the callback: once the task has finished running, call its result() method directly to get the result.
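import asyncio
import requests
async def request():
    url = 'https://www.baidu.com'
    status = requests.get(url)
    return status
coroutine = request()
task = asyncio.ensure_future(coroutine)
print('Task:', task)
loop = asyncio.get_event_loop()
loop.run_until_complete(task)
print('Task:', task)
# result() is safe to call here because the task has already finished.
print('Task Result:', task.result())
# Output (the exact repr varies by Python version):
# Task: <Task pending coro=<request() running at ...>>
# Task: <Task finished coro=<request() done, defined at ...> result=<Response [200]>>
# Task Result: <Response [200]>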
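3. Multi-task coroutines
  • (1) Build a list of tasks and run them with asyncio's wait() method. Here a list comprehension creates five tasks. The list is passed to asyncio.wait(), and the resulting awaitable is registered on the event loop, which starts all five tasks. Finally we print each task's result.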
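import asyncio
import requests
async def request():
    url = 'https://www.baidu.com'
    status = requests.get(url)
    return status
# Create five tasks from five separate coroutine objects.
tasks = [asyncio.ensure_future(request()) for _ in range(5)]
print('Tasks:', tasks)
loop = asyncio.get_event_loop()
# asyncio.wait() completes once every task in the list has finished.
loop.run_until_complete(asyncio.wait(tasks))
for task in tasks:
    print('Task Result:', task.result())
# Output (the exact repr varies by Python version):
# Tasks: [<Task pending coro=<request() running at ...>>,
#         <Task pending coro=<request() running at ...>>,
#         <Task pending coro=<request() running at ...>>,
#         <Task pending coro=<request() running at ...>>,
#         <Task pending coro=<request() running at ...>>]
# Task Result: <Response [200]>
# Task Result: <Response [200]>
# Task Result: <Response [200]>
# Task Result: <Response [200]>
# Task Result: <Response [200]>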
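When only the return values are needed, asyncio.gather() is a common alternative to wait(): it returns the results as a list in the order the awaitables were passed in, regardless of which finishes first. A minimal sketch, in which request(i) is a hypothetical stand-in that sleeps instead of fetching a page:

```python
import asyncio


async def request(i):
    # Later tasks sleep less and finish first; gather() still keeps input order.
    await asyncio.sleep(0.05 - i * 0.01)
    return i


async def main():
    return await asyncio.gather(*(request(i) for i in range(5)))


results = asyncio.run(main())
print(results)  # [0, 1, 2, 3, 4]
```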
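4. Implementing coroutines
  • (1) await suspends a time-consuming wait and gives up control. When a running coroutine reaches await, the event loop suspends it and switches to other coroutines until they in turn suspend or finish. This only helps when the awaited operation is itself non-blocking. In the code below, get() wraps the synchronous requests.get(), which blocks the whole event loop, so the five requests still run one after another, as the strictly alternating output shows.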
import asyncio
import requests
import time
start = time.time()
async def get(url):
    # requests.get() is synchronous: it blocks the event loop until the response arrives.
    return requests.get(url)
async def request():
    url = 'https://www.baidu.com'
    print('Waiting for', url)
    # Awaiting get() does not yield to other tasks, because get() never suspends.
    response = await get(url)
    print('Get response from', url, 'Result', response.status_code)
tasks = [asyncio.ensure_future(request()) for _ in range(5)]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
end = time.time()
print('Cost time:', end - start)
# Output:
# Waiting for https://www.baidu.com
# Get response from https://www.baidu.com Result 200
# Waiting for https://www.baidu.com
# Get response from https://www.baidu.com Result 200
# Waiting for https://www.baidu.com
# Get response from https://www.baidu.com Result 200
# Waiting for https://www.baidu.com
# Get response from https://www.baidu.com Result 200
# Waiting for https://www.baidu.com
# Get response from https://www.baidu.com Result 200
# Cost time: 0.45502614974975586
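If a blocking call has to be kept, one option is to push it onto a worker thread so that await really does hand control back to the event loop. The sketch below assumes Python 3.9+ for asyncio.to_thread(). The blocking_get() function is a hypothetical stand-in that uses time.sleep() in place of requests.get(), so the example runs without network access:

```python
import asyncio
import time


def blocking_get(url):
    # Stand-in for requests.get(url): blocks the calling thread for 0.2 s.
    time.sleep(0.2)
    return url


async def request(url):
    # to_thread() runs the blocking call in a worker thread; while it waits,
    # the event loop is free to start the other requests.
    return await asyncio.to_thread(blocking_get, url)


async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        *(request('https://www.baidu.com') for _ in range(5)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print('Cost time:', elapsed)
```

Run sequentially, the five calls would take about one second in total; with the calls overlapping, the elapsed time should be close to that of a single call.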
5. Using aiohttp
  • (1) aiohttp is a library for asynchronous HTTP requests. Combined with asyncio, it makes asynchronous requests easy to implement.
import asyncio
import aiohttp
import time
start = time.time()
async def get(url):
    # async with closes the response and the session even if the request raises.
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()
async def request():
    url = 'http://www.newsmth.net/nForum/#!mainpage'
    print('Waiting for', url)
    # Awaiting a real non-blocking request lets the other tasks run meanwhile.
    result = await get(url)
    print('Get response from', url, 'Result:', result)
tasks = [asyncio.ensure_future(request()) for _ in range(5)]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))
end = time.time()
print('Cost time:', end - start)
