https://github.com/qiyeboy/SpiderBook
This book uses Python 2.7.
sudo apt-get install python-pip python-dev
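Setting up Eclipse + PyDev: once the PyDev plugin is installed, Eclipse can be used to write Python programs.
Start Eclipse and click Help -> Install New Software ...
Add: name: PyDev, location: http://pydev.org/updates
PyDev interpreter configuration: Window -> Preferences -> PyDev -> Interpreters -> Python Interpreter, then add the path to the Python interpreter.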
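If it is unclear which path to enter in the PyDev dialog, the interpreter can report its own location. A minimal sketch, to be run with the Python installation you intend to use; the variable name interpreter_path is only for illustration:

```python
import sys

# Absolute path of the interpreter that is running this script.
# This is the path to enter in the PyDev interpreter dialog.
interpreter_path = sys.executable
print(interpreter_path)

# Version numbers, useful to confirm which interpreter was found.
print(sys.version_info[:3])
```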
Reading files
f = None  # defined up front so the finally block is safe if open() fails
try:
    f = open(r'qiye.txt', 'r')
    print f.read()
finally:
    if f:
        f.close()
The code above is rather long. The with statement is a shorter way to write it, replacing both try ... finally and the explicit close():
with open(r'qiye.txt', 'r') as fileReader:
    print fileReader.read()
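To make the guarantee concrete, the sketch below checks that the file is closed even when an exception escapes from the with block. It is an illustration only: it writes its own temporary file instead of relying on qiye.txt, the names file_writer, file_reader and caught are assumptions, and print is called with a single argument so that the code runs on both Python 2.7 and Python 3.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on qiye.txt existing.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

with open(path, 'w') as file_writer:
    file_writer.write('hello qiye\n')

caught = False
try:
    with open(path, 'r') as file_reader:
        content = file_reader.read()
        raise ValueError('simulated failure while processing the file')
except ValueError:
    caught = True

# The with block closed the file although an exception escaped from it.
print('closed: %s, exception caught: %s' % (file_reader.closed, caught))

os.remove(path)
```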
Serialization: a dict object is serialized with either the cPickle module (written in C, so it is faster) or the pickle module.
# Prefer cPickle (the C implementation); fall back to pickle if it is unavailable
try:
    import cPickle as pickle
except ImportError:
    import pickle
d = dict(url='index.html', title='Home', content='Content')
pickle.dumps(d)  # dumps serializes a picklable object into a str
f = open(r'dump.txt', 'wb')
pickle.dump(d, f)  # dump writes the serialized object straight to a file
f.close()
Deserialization: use the loads method or the load method
f = open(r'dump.txt', 'rb')
d = pickle.load(f)
f.close()
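The file-based load call has an in-memory counterpart, loads, which is mentioned here but not shown. A minimal round-trip sketch, assuming the same kind of dict as in the serialization step; the names data and restored are illustrative, and the import fallback lets it run on Python 2.7 and Python 3:

```python
try:
    import cPickle as pickle  # Python 2: C implementation
except ImportError:
    import pickle  # Python 3, or a Python 2 build without cPickle

d = dict(url='index.html', title='Home', content='Content')

# dumps returns the serialized bytes (a str on Python 2) instead of writing a file.
data = pickle.dumps(d)

# loads rebuilds an equal but distinct object from those bytes.
restored = pickle.loads(data)

print(restored == d)
print(restored is d)
```

Unpickling can execute arbitrary code, so load and loads should only be given data that the program itself produced.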
Processes and threads:
taskManager.py
# coding: utf-8
import Queue
from multiprocessing.managers import BaseManager

# Step 1: create the task queue and the result queue
task_queue = Queue.Queue()
result_queue = Queue.Queue()


class QueueManager(BaseManager):
    pass

# Step 2: register both queues on the network; callable ties each name to its Queue
QueueManager.register('get_task_queue', callable=lambda: task_queue)
QueueManager.register('get_result_queue', callable=lambda: result_queue)
# Step 3: bind port 8001 and set the auth key to 'qiye'
manager = QueueManager(address=('', 8001), authkey='qiye')
# Step 4: start the manager so that it begins listening
manager.start()
# Step 5: get the network-accessible queue objects through the manager
task = manager.get_task_queue()
result = manager.get_result_queue()
# Step 6: add the tasks
for url in ['ImageUrl_' + str(i) for i in range(10)]:
    print 'put task %s ...' % url
    task.put(url)
# Step 7: collect the results returned by the workers
print 'try get result...'
for i in range(10):
    print 'result is %s' % result.get(timeout=10)
# Step 8: shut down the manager
manager.shutdown()
taskWorker.py
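# coding: utf-8
import time
from multiprocessing.managers import BaseManager


# Step 0: define a QueueManager class that inherits from BaseManager
class QueueManager(BaseManager):
    pass

# Step 1: register only the names; the queues themselves live in the manager process
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')
# Step 2: connect to the server that runs taskManager.py
server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
# The port and auth key must match the ones set in taskManager.py
m = QueueManager(address=(server_addr, 8001), authkey='qiye')
m.connect()
# Step 3: get the Queue objects
task = m.get_task_queue()
result = m.get_result_queue()
# Step 4: take tasks from the task queue and put the results on the result queue
while not task.empty():
    image_url = task.get(True, timeout=5)
    print('run task download %s...' % image_url)
    time.sleep(1)
    result.put('%s--->success' % image_url)
print('worker exit.')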
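When several workers share one task queue, checking empty() and then calling get() can race: another worker may take the last task in between, and get() then raises Queue.Empty once its timeout expires. The sketch below shows one way to handle that by catching the exception. It is a local illustration only: it uses threads and in-process queues in place of separate worker processes connected through BaseManager, and the worker function, the three threads and the 0.5 second timeout are assumptions chosen for the example.

```python
import threading

try:
    import Queue as queue  # Python 2
except ImportError:
    import queue  # Python 3

task_queue = queue.Queue()
result_queue = queue.Queue()

for i in range(10):
    task_queue.put('ImageUrl_' + str(i))


def worker():
    """Take tasks until the queue stays empty for the whole timeout."""
    while True:
        try:
            image_url = task_queue.get(True, timeout=0.5)
        except queue.Empty:
            # No task arrived in time: treat the queue as drained and stop.
            break
        result_queue.put('%s--->success' % image_url)


threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every task produced exactly one result, whichever thread handled it.
results = sorted(result_queue.get() for _ in range(10))
print(len(results))
```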
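Network programming
Python provides two basic socket modules:
socket, which provides the standard BSD Sockets API
SocketServer, which provides server-centric classes that simplify the development of network servers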
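As a rough illustration of the lower-level socket module, the sketch below runs a one-shot TCP echo server and a client in the same script over the loopback interface. The function name serve_once, the "echo: " prefix and the use of port 0 (which asks the operating system for any free port) are assumptions made for the example; a real crawler component would run the two sides in separate programs.

```python
import socket
import threading

# Listening TCP socket on the loopback interface; port 0 lets the OS pick a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
host, port = server.getsockname()


def serve_once():
    """Accept one connection, echo back one message, then close it."""
    conn, _addr = server.accept()
    try:
        data = conn.recv(1024)
        conn.sendall(b'echo: ' + data)
    finally:
        conn.close()


t = threading.Thread(target=serve_once)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect((host, port))
client.sendall(b'hello')

# Read until the server closes the connection, since recv may return partial data.
chunks = []
while True:
    chunk = client.recv(1024)
    if not chunk:
        break
    chunks.append(chunk)
reply = b''.join(chunks)
client.close()

t.join()
server.close()
print(reply)
```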
Socket types