Understanding Python Coroutines and the asyncio Module

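To understand what the asyncio module actually does, we first need a few basic concepts. In web scraping, the usual ways to speed things up are multiple processes, multiple threads, and coroutines, and this article looks at how the three differ. When a program starts, a main process is created. A process does not execute work by itself, though: to do something concrete, such as sending a request or reading and writing a file, it needs a main thread to carry the work out. Take sending requests as the example. Very often the request has gone out but the reply does not come back right away, so the thread is stuck waiting for the reply before it can move on to the next URL. This is where coroutines come in. Instead of waiting for the server, the thread sends the request for the second URL straight away, then the third, the fourth, and so on; once the reply to the first request arrives, it goes back and handles that reply. That is how coroutines work: they keep a single thread busy and avoid the switching overhead of multithreading, which makes them very efficient for I/O-bound work.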
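The behaviour described above can be sketched without any real network traffic. In the minimal example below, `asyncio.sleep` stands in for the time spent waiting on a server; the function name `fake_request`, the URLs, and the delays are all made up for illustration. It uses the `async`/`await` syntax and `asyncio.run`, so it assumes Python 3.7 or later.

```python
import asyncio
import time


async def fake_request(url, delay):
    """Pretend to send a request; asyncio.sleep stands in for network latency."""
    print('sending', url)
    await asyncio.sleep(delay)  # while this coroutine waits, the thread runs the others
    print('got reply for', url)
    return url


async def main():
    # Hypothetical URLs and delays, chosen only for illustration.
    jobs = [
        fake_request('http://example.com/1', 0.3),
        fake_request('http://example.com/2', 0.2),
        fake_request('http://example.com/3', 0.1),
    ]
    start = time.perf_counter()
    results = await asyncio.gather(*jobs)  # results come back in the order of jobs
    elapsed = time.perf_counter() - start
    return results, elapsed


if __name__ == '__main__':
    results, elapsed = asyncio.run(main())
    print(results, round(elapsed, 2))
```

All three "requests" are sent before any reply arrives, so the total time should be close to the longest single delay rather than the sum of the three.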
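The asyncio module provides exactly this kind of asynchronous I/O. It was added to the standard library in Python 3.4 and is quite powerful, but it does not include an HTTP client: for network requests it works at the level of TCP connections, so to send HTTP requests you would have to implement HTTP on top of TCP yourself. Luckily we do not have to write that ourselves. This is Python, after all, and someone has already packaged it up as aiohttp, which we can use directly. Below is a simple example written for Python 3.4.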

import aiohttp,asyncio


@asyncio.coroutine  # required decorator for generator-based coroutines (Python 3.4 style; removed in Python 3.11)
def downing_url(url):
    print('downing',url)
    response = yield from aiohttp.request('GET', url)

    print(url, response)
    response.close()


tasks = [downing_url('http://www.cnblogs.com/'), downing_url('http://www.chouti.com/')]

event_loop = asyncio.get_event_loop()  # the event loop, which watches for coroutines blocked on I/O
results = event_loop.run_until_complete(asyncio.gather(*tasks))  # add the tasks, i.e. the functions to run
event_loop.close()  # close the event loop

The output is shown in the screenshot:

Paste_Image.png
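To get a feel for what a library like aiohttp has to do on top of asyncio's TCP support, here is a rough sketch that sends a bare-bones HTTP GET over an asyncio TCP stream. It is not how aiohttp is implemented. So that it runs without network access, it talks to a tiny local server started in the same script; `handle_client`, `http_get`, and the `hello` response body are all hypothetical names and values introduced for this example, and it assumes Python 3.7 or later.

```python
import asyncio


async def handle_client(reader, writer):
    """Tiny stand-in HTTP server so the sketch runs without network access."""
    await reader.readuntil(b'\r\n\r\n')  # consume the request line and headers
    body = b'hello'
    writer.write(
        b'HTTP/1.1 200 OK\r\n'
        b'Content-Length: ' + str(len(body)).encode() + b'\r\n'
        b'Connection: close\r\n\r\n' + body
    )
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def http_get(host, port, path='/'):
    """Send a minimal HTTP/1.1 GET over a TCP stream; return (status_line, body)."""
    reader, writer = await asyncio.open_connection(host, port)
    request = (
        f'GET {path} HTTP/1.1\r\n'
        f'Host: {host}\r\n'
        'Connection: close\r\n\r\n'
    )
    writer.write(request.encode('ascii'))
    await writer.drain()
    raw = await reader.read()  # read until the server closes the connection
    writer.close()
    await writer.wait_closed()
    head, _, body = raw.partition(b'\r\n\r\n')
    status_line = head.split(b'\r\n', 1)[0].decode('ascii')
    return status_line, body


async def main():
    # Port 0 asks the OS for any free port; 127.0.0.1 keeps everything local.
    server = await asyncio.start_server(handle_client, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await http_get('127.0.0.1', port)


if __name__ == '__main__':
    print(asyncio.run(main()))
```

Even this toy version has to build the request text and split the response by hand, and it ignores redirects, chunked encoding, TLS, and connection reuse, which is the kind of work aiohttp takes care of.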

While working through a course by Cui (崔大神), I found that Python 3.5 gives coroutines a cleaner syntax. The code looks like this:

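import asyncio
import aiohttp


async def downing_url(url):
    print('downing', url)
    # In current aiohttp releases aiohttp.request is used as an async context manager;
    # leaving the block releases the connection, so no explicit close() is needed
    async with aiohttp.request('GET', url) as response:
        print(url, response)


tasks = [downing_url('http://www.cnblogs.com/'), downing_url('http://www.chouti.com/')]

event_loop = asyncio.get_event_loop()  # the event loop
results = event_loop.run_until_complete(asyncio.gather(*tasks))  # add the tasks, i.e. the functions to run
event_loop.close()  # close the event loop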

We can also add a proxy or request headers to a request:

import asyncio
import aiohttp


async def downing_url(url):
    async with aiohttp.ClientSession() as session:
        print('downing', url)
        # send the request through an HTTP proxy
        async with session.get(url, proxy='http://121.193.143.249:80') as html:
            text = await html.text()
            print(text)


tasks = [downing_url('http://ip.chinaz.com/getip.aspx'), downing_url('http://ip.chinaz.com/getip.aspx')]
event_loop = asyncio.get_event_loop()  # the event loop
results = event_loop.run_until_complete(asyncio.gather(*tasks))
event_loop.close()  # close the event loop

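For details, see the official documentation at http://aiohttp.readthedocs.io/en/stable/
Here we simply put async in front of the function definition and use await (or async with) where the older code used yield from, which reads much more naturally.
Combining asyncio with the requests module is a bit more involved. The two snippets below are taken from the blog of Wu Peiqi (武沛齐), an instructor at Oldboy (老男孩).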

import asyncio
import requests


@asyncio.coroutine
def fetch_async(func, *args):
    loop = asyncio.get_event_loop()
    future = loop.run_in_executor(None, func, *args)
    response = yield from future
    print(response.url, response.content)


tasks = [
    fetch_async(requests.get, 'http://www.cnblogs.com/wupeiqi/'),
    fetch_async(requests.get, 'http://dig.chouti.com/pic/show?nid=4073644713430508&lid=10273091')
]

loop = asyncio.get_event_loop()
results = loop.run_until_complete(asyncio.gather(*tasks))
loop.close()
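The snippet above relies on the `@asyncio.coroutine` decorator, which was removed in Python 3.11. The same idea, handing a blocking call to a thread pool through `run_in_executor` so the event loop stays free, can be sketched with `async`/`await` as follows. To keep the sketch self-contained, a hypothetical `blocking_fetch` function that just sleeps is used in place of `requests.get`; the URLs are placeholders too, and Python 3.7 or later is assumed.

```python
import asyncio
import time


def blocking_fetch(url):
    """Hypothetical stand-in for requests.get: blocks the calling thread."""
    time.sleep(0.2)
    return 'response from ' + url


async def fetch_async(func, *args):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor, so the
    # blocking call runs in a worker thread instead of the event loop thread.
    return await loop.run_in_executor(None, func, *args)


async def main():
    urls = ['http://example.com/a', 'http://example.com/b']  # placeholder URLs
    return await asyncio.gather(*(fetch_async(blocking_fetch, u) for u in urls))


if __name__ == '__main__':
    start = time.perf_counter()
    print(asyncio.run(main()))
    print('elapsed', round(time.perf_counter() - start, 2))
```

With a real `requests.get` the structure stays the same: pass `requests.get` and the URL to `fetch_async` in place of `blocking_fetch`. Note that the concurrency here comes from threads, not from coroutines alone, because `requests` itself is a blocking library.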

The asynchronous module Twisted

from twisted.web.client import getPage
from twisted.internet import reactor

REV_COUNTER = 0
REQ_COUNTER = 0

def callback(contents):
    print(contents,)

    global REV_COUNTER
    REV_COUNTER += 1
    if REV_COUNTER == REQ_COUNTER:
        reactor.stop()


url_list = ['http://www.bing.com', 'http://www.baidu.com', ]
REQ_COUNTER = len(url_list)
for url in url_list:
    deferred = getPage(bytes(url, encoding='utf8'))
    deferred.addCallback(callback)
reactor.run()
