Python code for scraping Google links

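A quick introduction to the program: it was written for Python 2.7.2. If you are on Python 3 and run into incompatibilities, consult the 2-to-3 porting guide and adapt the code yourself. Because it uses the msvcrt module, it runs on Windows only.
The program is built on Google's JSON API, for example: https://ajax.googleapis.com/ajax ... p;rsz=8&start=1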
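
To make the request format concrete, here is a minimal sketch that builds the query URL and pulls the links out of a decoded response. It is written for Python 3 (hence urllib.parse rather than the Python 2 urllib module). The helper names build_url and extract_urls and the sample payload are illustrative assumptions; the endpoint, the query parameters and the responseData / results / url keys are taken from the program listed in this post.

```python
import json
from urllib.parse import urlencode

# Endpoint used by the program in this post.
API_ENDPOINT = "https://ajax.googleapis.com/ajax/services/search/web"


def build_url(keyword, start, page_size=8):
    """Return the request URL for one page of results (hypothetical helper)."""
    params = [("v", "1.0"), ("q", keyword), ("rsz", page_size), ("start", start)]
    return API_ENDPOINT + "?" + urlencode(params)


def extract_urls(payload):
    """Return the url field of every result in a decoded JSON response."""
    data = payload.get("responseData") or {}
    return [item["url"] for item in data.get("results", [])]


# Made-up payload that only mimics the shape the program reads.
sample = json.loads(
    '{"responseData": {"results": [{"url": "http://example.com/a"},'
    ' {"url": "http://example.com/b"}]}}'
)

print(build_url("site:example.com python", 0))
print(extract_urls(sample))
```

Using urlencode means the keyword is escaped automatically, so Google search operators such as site: can be passed in as typed.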

As shown in the figure below:

[Figure 1: Python code for scraping Google links]

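1. line is the number of threads.
2. key is the search keyword; Google search syntax is supported.
3. How many is the number of results to fetch. A JSON page holds only 8 results, so each thread fetches 8 results per request.
4. Press q at any time to exit immediately.
5. Feel free to modify the code however you like.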
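
The paging arithmetic behind item 3 can be sketched as follows. The function name start_offsets is hypothetical, and the sketch assumes that the API's start parameter is a zero-based result offset (0, 8, 16, ...) rather than a page number. It is written for Python 3.

```python
def start_offsets(how_many, page_size=8):
    """Return the start value of every request needed to cover how_many results."""
    if how_many <= 0:
        return []
    pages = (how_many + page_size - 1) // page_size  # round up to whole pages
    return [page * page_size for page in range(pages)]


print(start_offsets(20))
print(start_offsets(8))
```

Asking for 20 results therefore needs three requests, while asking for exactly 8 needs only one.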

#! /usr/bin/env python
#coding=utf-8
import urllib2,urllib,threading,Queue,os
import msvcrt
import simplejson
import sys

seachstr = raw_input("Key?:")
pagenum = raw_input("How many?:")
# Each JSON page holds 8 results, so round up to whole pages.
pagenum = (int(pagenum) + 7) // 8
line = 5  # number of worker threads

class googlesearch(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.urls= []

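    def run(self):
        while 1:
            try:
                self.catchURL()
            finally:
                # Always mark the item as done, even if catchURL raises.
                queue.task_done()

    def catchURL(self):
        # Take the next start value from the queue (blocks while it is empty).
        self.page = str(queue.get())
        # Console input arrives in the terminal's encoding (GBK on Chinese Windows).
        self.key = seachstr.decode(sys.stdin.encoding or 'gbk').encode('utf-8')
        url = ('https://ajax.googleapis.com/ajax/services/search/web?v=1.0&q=%s&rsz=8&start=%s') % (urllib.quote(self.key), self.page)
        try:
            request = urllib2.Request(url)
            response = urllib2.urlopen(request)
            results = simplejson.load(response)
            URLinfo = results['responseData']['results']
        except Exception as e:
            print e
        else:
            # Print the link of every result on this page.
            for info in URLinfo:
                print info['url']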

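class ThreadGetKey(threading.Thread):
    def run(self):
        while 1:
            try:
                # Block until a key is pressed (Windows only).
                key = msvcrt.getch()
            except Exception:
                os._exit(1)
            if key == 'q':
                print "stopped by your action ( q )"
                # Terminate the whole process at once, worker threads included.
                os._exit(1)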

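if __name__ == '__main__':
    queue = Queue.Queue()

    # The API's start parameter is a result offset, so queue 0, 8, 16, ...
    for i in range(pagenum):
        queue.put(i * 8)

    # Daemon threads do not keep the process alive once the work is done.
    keythread = ThreadGetKey()
    keythread.setDaemon(True)
    keythread.start()

    for p in range(line):
        worker = googlesearch()
        worker.setDaemon(True)
        worker.start()

    # Block until every queued page has been fetched and printed.
    queue.join()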
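
For readers on Python 3, the same queue-and-worker-threads pattern might look like the sketch below. run_workers and fake_fetch are hypothetical names; fake_fetch stands in for the real HTTP request so that the example runs without network access, and the q-to-quit thread is left out.

```python
import queue
import threading


def run_workers(offsets, fetch, thread_count=5):
    """Call fetch(offset) for every offset using thread_count worker threads."""
    jobs = queue.Queue()
    links = []
    lock = threading.Lock()

    def worker():
        while True:
            offset = jobs.get()
            try:
                found = fetch(offset)
                with lock:
                    links.extend(found)
            except Exception as exc:
                print("request for start=%s failed: %s" % (offset, exc))
            finally:
                # Mark the job as done whether or not the fetch succeeded.
                jobs.task_done()

    for offset in offsets:
        jobs.put(offset)
    for _ in range(thread_count):
        threading.Thread(target=worker, daemon=True).start()

    jobs.join()  # returns once every queued offset has been processed
    return links


def fake_fetch(offset):
    """Stand-in for the HTTP request; returns made-up links."""
    return ["http://example.com/result/%d" % (offset + i) for i in range(8)]


print(len(run_workers([0, 8, 16], fake_fetch)))
```

Collecting the links in a list instead of printing them from each thread keeps the output in one place; the order of the list still depends on which thread finishes first.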

Reposted from: http://sb.f4ck.org/forum.php?mod=viewthread&tid=6205&highlight=python
