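I recently tried out BeautifulSoup, a library for parsing HTML, and was impressed by how powerful it is: a few lines of code are enough to scrape the iPhone 4S information from Apple's Hong Kong online store, haha.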
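from BeautifulSoup import BeautifulSoup
import urllib

# Download the iPhone 4S page from the Hong Kong store
webpage = urllib.urlopen(r"http://store.apple.com/hk-zh/browse/home/shop_iphone/family/iphone/iphone4s")
soup = BeautifulSoup(webpage.read())

# Find the <ul> that lists every model
tags = soup('ul', {'class': 'selection-options all-models'})
# Keep only <span> tags whose single attribute is one of the wanted classes;
# tag.get avoids a KeyError when that single attribute is not "class"
tags = tags[0](lambda tag: len(tag.attrs) == 1 and tag.name == 'span'
               and tag.get('class') in ['shipping', 'price', 'color', 'title'])
for tag in tags:
    print tag.text
    print '-' * 30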
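The script above targets Python 2 and BeautifulSoup 3. For anyone on Python 3, here is a rough sketch of the same filter with the `bs4` package. It is only my guess at an equivalent, not a tested port against the live page: `SAMPLE_HTML` is made-up markup that borrows the class names from the script, and `is_wanted_span` and `extract_fields` are names I invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the store page; only the class
# names are taken from the original script, the rest is invented.
SAMPLE_HTML = """
<ul class="selection-options all-models">
  <li>
    <span class="title">16GB</span>
    <span class="color">black</span>
    <span class="price">HK$ 5,088</span>
    <span class="shipping">In stock</span>
    <span class="note" id="n1">ignored</span>
  </li>
</ul>
"""

WANTED = {"title", "color", "price", "shipping"}


def is_wanted_span(tag):
    """Match <span> tags whose only attribute is one wanted class."""
    # bs4 returns the class attribute as a list of class names
    classes = tag.get("class") or []
    return (
        tag.name == "span"
        and len(tag.attrs) == 1
        and len(classes) == 1
        and classes[0] in WANTED
    )


def extract_fields(html):
    """Return the text of every matching <span> inside the model list."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector for a <ul> that carries both classes
    container = soup.select_one("ul.selection-options.all-models")
    if container is None:
        return []
    return [tag.get_text(strip=True) for tag in container.find_all(is_wanted_span)]


if __name__ == "__main__":
    for text in extract_fields(SAMPLE_HTML):
        print(text)
        print("-" * 30)
```

The main difference from BeautifulSoup 3 is that `class` comes back as a list rather than a string, so the filter compares the single class name instead of the raw attribute value. To run it against a real page, you would fetch the HTML yourself (for example with `urllib.request.urlopen`) and pass it to `extract_fields`.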
Output:
16GB1
------------------------------
black
------------------------------
HK$ 5,088
------------------------------
估計付運時間:暫無供應
------------------------------
32GB1
------------------------------
black
------------------------------
HK$ 5,888
------------------------------
估計付運時間:暫無供應
------------------------------
64GB1
------------------------------
black
------------------------------
HK$ 6,688
------------------------------
估計付運時間:暫無供應
------------------------------
16GB1
------------------------------
white
------------------------------
HK$ 5,088
------------------------------
估計付運時間:暫無供應
------------------------------
32GB1
------------------------------
white
------------------------------
HK$ 5,888
------------------------------
估計付運時間:暫無供應
------------------------------
64GB1
------------------------------
white
------------------------------
HK$ 6,688
------------------------------
估計付運時間:暫無供應
------------------------------
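The output is one flat stream of values, but every four of them seem to describe a single model. Here is a small sketch for grouping them into records. The field order (title, color, price, shipping) is my assumption read off the output above, and `group_records` is a helper name I made up.

```python
from itertools import islice

# Assumed field order, inferred from the printed output
FIELDS = ("title", "color", "price", "shipping")


def group_records(values, fields=FIELDS):
    """Turn a flat list of scraped strings into one dict per model."""
    it = iter(values)
    records = []
    while True:
        # Take the next len(fields) values from the stream
        chunk = list(islice(it, len(fields)))
        if not chunk:
            break
        if len(chunk) != len(fields):
            raise ValueError("incomplete record: %r" % (chunk,))
        records.append(dict(zip(fields, chunk)))
    return records


# First two models copied from the output above
values = [
    "16GB1", "black", "HK$ 5,088", "估計付運時間:暫無供應",
    "32GB1", "black", "HK$ 5,888", "估計付運時間:暫無供應",
]

if __name__ == "__main__":
    for record in group_records(values):
        print(record)
```

Raising on an incomplete chunk is a deliberate choice: if the page layout changes and a field goes missing, it is better to fail loudly than to shift every later value into the wrong column.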
For more on BeautifulSoup, see http://www.crummy.com/software/BeautifulSoup/documentation.zh.html, and hurry up and join the ranks of Pythoners, haha.
My Weibo: http://weibo.com/lei6744