Introduction to Async and aiohttp

【async/await】 asyncio -- asynchronous I/O; async -- asynchronous
These keywords free asynchronous code from the old yield-based (generator coroutine) style.
async is generally placed before a function definition, or before a with/for statement (async with, async for), to indicate that asynchronous calls occur inside that block or function.
await is placed before a concrete operation to mark it as asynchronous; execution is suspended until that operation completes.

#!/usr/local/bin/python3.5

import asyncio
from aiohttp import ClientSession

async def hello():
    async with ClientSession() as session:
        async with session.get("http://httpbin.org/headers") as response:
            response = await response.read()
            print(response)

loop = asyncio.get_event_loop()
loop.run_until_complete(hello())

async and await make the function asynchronous. The hello() above actually contains two asynchronous operations: first the response is fetched asynchronously, then the response body is read asynchronously.
aiohttp recommends ClientSession as the main interface for issuing requests. A ClientSession keeps cookies and related state across multiple requests.
A session must be closed after use, and closing it is itself an asynchronous operation, so you should always open it with the async with keyword.
To actually run the program, the coroutines have to be added to an event loop: create an asyncio loop instance and then submit the tasks to it.
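
To make that last step concrete, here is a minimal sketch (a slight variation of hello() that takes the URL as a parameter, an addition purely for illustration) showing several requests scheduled on the same event loop:

import asyncio
from aiohttp import ClientSession

async def hello(url):
    # one session per coroutine; prints the raw response body
    async with ClientSession() as session:
        async with session.get(url) as response:
            print(await response.read())

loop = asyncio.get_event_loop()
# gather the coroutines into one awaitable and run them concurrently
tasks = [hello("http://httpbin.org/headers") for _ in range(3)]
loop.run_until_complete(asyncio.gather(*tasks))
loop.close()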

【aiohttp】
Basic usage:
async with aiohttp.get('https://github.com') as r:   # issue the request asynchronously
    await r.text()                                    # read the body asynchronously
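
For context, a complete runnable form of this fragment looks roughly as follows (assuming an older aiohttp release that still ships the module-level aiohttp.get() shortcut; later versions dropped it in favour of ClientSession):

import asyncio
import aiohttp

async def fetch():
    # module-level shortcut from early aiohttp releases
    async with aiohttp.get('https://github.com') as r:
        print(await r.text())

loop = asyncio.get_event_loop()
loop.run_until_complete(fetch())
loop.close()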

Setting a timeout:
with aiohttp.Timeout(0.001):
    async with aiohttp.get('https://github.com') as r:
        await r.text()
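
A timeout this tight will almost always fire, so in practice you catch the resulting error. A minimal sketch, again assuming the old-style aiohttp.Timeout and aiohttp.get APIs shown above:

import asyncio
import aiohttp

async def fetch_with_timeout():
    try:
        # 0.001 s is virtually guaranteed to expire before the response arrives
        with aiohttp.Timeout(0.001):
            async with aiohttp.get('https://github.com') as r:
                return await r.text()
    except asyncio.TimeoutError:
        print('request timed out')

loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_with_timeout())
loop.close()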

Building a session:
async with aiohttp.ClientSession() as session:
    async with session.get('https://api.github.com/events') as resp:
        print(resp.status)
        print(await resp.text())

Setting custom headers:
url = 'https://api.github.com/some/endpoint'
headers = {'content-type': 'application/json'}
await session.get(url, headers=headers)
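
The fragment assumes an existing session; a runnable version with the session created explicitly (the URL is the placeholder endpoint from above, so it will not return real data) might look like:

import asyncio
import aiohttp

async def fetch_json():
    url = 'https://api.github.com/some/endpoint'
    headers = {'content-type': 'application/json'}
    async with aiohttp.ClientSession() as session:
        # per-request headers are merged with the session defaults
        async with session.get(url, headers=headers) as resp:
            print(resp.status)

loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_json())
loop.close()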

Using a proxy:
EG_1:
conn = aiohttp.ProxyConnector(proxy="http://some.proxy.com")  # create the proxy connector
session = aiohttp.ClientSession(connector=conn)
async with session.get('http://python.org') as resp:
    print(resp.status)
EG_2:
conn = aiohttp.ProxyConnector(
    proxy="http://some.proxy.com",
    proxy_auth=aiohttp.BasicAuth('user', 'pass')
)
session = aiohttp.ClientSession(connector=conn)
async with session.get('http://python.org') as r:
    assert r.status == 200
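
ProxyConnector exists only in older aiohttp releases; newer versions instead accept the proxy per request. A minimal sketch of that newer style, assuming a recent aiohttp where session.get() takes a proxy argument:

import asyncio
import aiohttp

async def fetch_via_proxy():
    async with aiohttp.ClientSession() as session:
        # the proxy URL (and optional proxy_auth) is passed on each request
        async with session.get('http://python.org',
                               proxy='http://some.proxy.com') as resp:
            print(resp.status)

loop = asyncio.get_event_loop()
loop.run_until_complete(fetch_via_proxy())
loop.close()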

Custom cookies:
url = 'http://httpbin.org/cookies'
async with ClientSession(cookies={'cookies_are': 'working'}) as session:
    async with session.get(url) as resp:
        assert await resp.json() == {"cookies": {"cookies_are": "working"}}
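
A runnable form of the cookie check (httpbin echoes back whatever cookies it receives, so the assertion verifies that the session really sent them):

import asyncio
from aiohttp import ClientSession

async def check_cookies():
    url = 'http://httpbin.org/cookies'
    # cookies set on the session are attached to every request it makes
    async with ClientSession(cookies={'cookies_are': 'working'}) as session:
        async with session.get(url) as resp:
            assert await resp.json() == {"cookies": {"cookies_are": "working"}}

loop = asyncio.get_event_loop()
loop.run_until_complete(check_cookies())
loop.close()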

Crawler example:
import asyncio
import aiohttp
from bs4 import BeautifulSoup as bs

async def getPage(url, res_list):
    print(url)
    headers = {'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'}
    # conn = aiohttp.ProxyConnector(proxy="http://127.0.0.1:8087")
    async with aiohttp.ClientSession() as session:
        async with session.get(url, headers=headers) as resp:
            assert resp.status == 200
            res_list.append(await resp.text())

class parseListPage():
    def __init__(self, page_str):
        self.page_str = page_str
    def __enter__(self):
        page_str = self.page_str
        page = bs(page_str, 'lxml')
        # collect the article links on this list page
        articles = page.find_all('div', attrs={'class': 'article_title'})
        art_urls = []
        for a in articles:
            x = a.find('a')['href']
            art_urls.append('http://blog.csdn.net' + x)
        return art_urls
    def __exit__(self, exc_type, exc_val, exc_tb):
        pass

page_num = 5
page_url_base = 'http://blog.csdn.net/u014595019/article/list/'
page_urls = [page_url_base + str(i+1) for i in range(page_num)]
loop = asyncio.get_event_loop()
ret_list = []
tasks = [getPage(host,ret_list) for host in page_urls]
loop.run_until_complete(asyncio.wait(tasks))

articles_url = []
for ret in ret_list:
    with parseListPage(ret) as tmp:
        articles_url += tmp
ret_list = []

tasks = [getPage(url, ret_list) for url in articles_url]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()

