2019 Hands-On Practice, Round 2: Concurrency Exercise Check-In

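Problem:

Write a small concurrent program that fetches word definitions from Youdao Dictionary, for example for the word china.
URL to crawl: http://dict.youdao.com/w/eng/china/
Output data structure: the word itself, its pronunciation, and its definition. For example:
{'Word': 'china', 'Proc': '', 'Desc': ''}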

Hint: you can use plain threads, or a thread pool (ThreadPoolExecutor).

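Approach:
1. Write a function that downloads http://dict.youdao.com/w/eng/china/
2. Parse the page. pyquery works well for this; the parsing takes only a few lines of code.
3. Run the tasks above on multiple threads.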

from pyquery import PyQuery as pq
import requests
import threadpool


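def download_html(word):
    """Fetch the Youdao page for one word and return the parsed result dict."""
    output = {'Word': word}
    final_output = {}
    url = 'http://dict.youdao.com/w/eng/{}/'.format(word)
    try:
        # A timeout keeps a stalled request from blocking a worker thread forever.
        r = requests.get(url, timeout=10)
        if r.status_code == 200:
            doc = pq(r.text)
            final_output = decode_html(doc, output)
            print(final_output)
    except Exception:
        print('Failed to fetch the page for word: ' + word)
        return None
    return final_output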


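def decode_html(doc, output):
    """Fill in the 'Proc' (pronunciation) and 'Desc' (definition) fields."""
    output['Proc'] = ''
    output['Desc'] = ''
    # Concatenate the text of every pronunciation element.
    for pro in doc.items('.baav .pronounce'):
        output['Proc'] += pro.text()

    # Concatenate the text of every definition list item.
    for li in doc.items('#phrsListTab .trans-container ul li'):
        output['Desc'] += li.text()
    return output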


word_list = ['china', 'nice', 'python', 'beautiful', 'girl']
pool = threadpool.ThreadPool(10)  # pool of 10 worker threads
word_pool = threadpool.makeRequests(download_html, word_list)
for req in word_pool:
    pool.putRequest(req)
pool.wait()  # block until every queued request has finished
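
The hint also mentions a thread pool via ThreadPoolExecutor. As a rough sketch of that alternative, the standard library's concurrent.futures module can run the same kind of per-word task without the third-party threadpool package. The names lookup_all and fake_lookup below are hypothetical and not part of the original solution; fake_lookup only stands in for download_html so that the sketch runs without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup_all(words, lookup, max_workers=10):
    """Run lookup(word) for every word on a pool of worker threads.

    `lookup` is any callable that takes one word and returns a dict
    (or None), for example the download_html function above.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as `words`.
        return list(executor.map(lookup, words))


def fake_lookup(word):
    # Hypothetical stand-in for download_html so the sketch runs offline.
    return {'Word': word, 'Proc': '', 'Desc': ''}


results = lookup_all(['china', 'nice', 'python'], fake_lookup)
for item in results:
    print(item)
```

Passing download_html in place of fake_lookup should give the real behaviour. The with block waits for all workers to finish, so no separate wait() call is needed.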
