Python web scraping notes

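Web crawlers are fairly common programs, and a few Python libraries make them very easy to write. This post records the Python scraping code I use most often, for future reference.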

1 requests

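import requests
url = 'http://onevanillachecker.com/'
rep = requests.get(url)
# force UTF-8 when decoding rep.text; otherwise requests uses the charset
# from the Content-Type header, which may be missing or wrong
rep.encoding = 'utf-8'
print(rep.text)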

Some notes on request parameters

import random
import requests
url = 'http://onevanillachecker.com/'
# browser-like request headers
header = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        # only list encodings that requests can decode
        'Accept-Encoding': 'gzip, deflate',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0'
    }
# random timeout between 80 and 179 seconds
timeout = random.choice(range(80, 180))
rep = requests.get(url, headers=header, timeout=timeout)
rep.encoding = 'utf-8'
print(rep.text)
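When fetching many pages from the same site, it can help to reuse one `requests.Session`, so the same headers go out with every request and connections are pooled. The sketch below is not from the original notes: `make_session` is a hypothetical helper, and the retry count, backoff factor and status codes are illustrative values. It mounts an `HTTPAdapter` configured with urllib3's `Retry` so transient server errors are retried automatically.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Browser-like headers, as in the example above; requests sends its own
# Accept-Encoding, so it is not repeated here
DEFAULT_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0',
}


def make_session(retries=3, backoff=0.5):
    """Hypothetical helper: a Session with default headers and automatic retries."""
    session = requests.Session()
    # These headers are merged into every request made through the session
    session.headers.update(DEFAULT_HEADERS)
    # Retry failed connections and these status codes, with exponential backoff
    retry = Retry(total=retries, backoff_factor=backoff,
                  status_forcelist=[500, 502, 503, 504])
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session

# Usage (needs network access):
# session = make_session()
# rep = session.get('http://onevanillachecker.com/', timeout=(5, 30))  # (connect, read) seconds
# rep.raise_for_status()
```

Besides a single number, `timeout` also accepts a `(connect, read)` tuple, which lets a slow page load without waiting long on a server that never answers.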

2 urllib2

# Python 2 only
import urllib2
req = urllib2.Request('http://onevanillachecker.com/')
response = urllib2.urlopen(req)
# raw page body (a byte string)
html = response.read()
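`urllib2` exists only in Python 2; in Python 3 its functionality moved into the standard-library module `urllib.request`. A minimal Python 3 sketch of the same request, assuming you also want a custom User-Agent and a timeout like the `requests` example (the helpers `build_request` and `fetch_html` are hypothetical names):

```python
import urllib.request

URL = 'http://onevanillachecker.com/'
UA = 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.0'


def build_request(url, user_agent=UA):
    """Hypothetical helper: a GET request carrying a browser-like User-Agent."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch_html(url, timeout=30):
    """Hypothetical helper: download a page and decode it to str."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')

# Usage (needs network access):
# html = fetch_html(URL)
```

Note that `urllib.request` normalizes header names with `str.capitalize()`, so the header is stored and looked up as `'User-agent'`.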

3 beautifulsoup

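BeautifulSoup is a library for parsing HTML and XML pages, and it is very convenient to use.
Documentation (Chinese translation): https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/
Below are a few commonly used features, for reference.
Installation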

pip install beautifulsoup4

Basic usage

from bs4 import BeautifulSoup
import requests

# fetch the page with requests (the urllib2 code above only runs on Python 2)
rep = requests.get('http://onevanillachecker.com/')
rep.encoding = 'utf-8'
html = rep.text

# beautifulsoup: name the parser explicitly to avoid a warning
soup = BeautifulSoup(html, 'html.parser')
print(soup.title)
# <title>One Vanilla Gift Card Balance Check -Official Website</title>
print(soup.title.name)
# title
print(soup.title.string)
# One Vanilla Gift Card Balance Check -Official Website
print(soup.title.parent.name)
# head
print(soup.p)
# the first <p> tag, whose text was:
# Life happens every day. And OneVanilla
# helps make it simpler. Shop, dine, fill 'er up
# and more - all with one prepaid card.
print(soup.p['class'])
# raises KeyError if that <p> has no class; soup.p.get('class') returns None instead
print(soup.a)
# the first <a> tag, with text: Vanilla Gift Card
print(soup.find_all('a'))
# a list of every <a> tag; their texts were:
# Vanilla Gift Card, Check Vanilla 3 Balance,
# Vanilla Gift Cards, Where to Buy,
# Sign In, About Vanilla Gift Card,
# Using Your Vanilla Gift Card, Try Vanilla Gift,
# ......
print(soup.find(alt="2"))
# the first tag whose alt attribute is "2"
print(soup.get_text())
# all text in the page, with the tags stripped
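Because a live page changes over time, a self-contained way to try the calls above is to parse a small inline HTML string. The markup below is invented for illustration, and `BASE_URL` is just the site used in these notes. The sketch passes the built-in `'html.parser'` explicitly, reads attributes with `.get()`, uses `select()` (CSS selectors) as an alternative to `find_all`, and turns relative links into absolute ones with the standard library's `urljoin`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up sample markup (not taken from the real site)
HTML = """
<html><head><title>Sample Page</title></head>
<body>
  <p class="intro">Hello <b>world</b></p>
  <a href="/balance" class="nav">Check Balance</a>
  <a href="https://example.com/buy" class="nav">Where to Buy</a>
  <img src="/logo.png" alt="2">
</body></html>
"""

BASE_URL = 'http://onevanillachecker.com/'

soup = BeautifulSoup(HTML, 'html.parser')

title_text = soup.title.string   # 'Sample Page'
intro_class = soup.p['class']    # class is multi-valued, so this is ['intro']
intro_text = soup.p.get_text()   # 'Hello world' (text of nested tags included)

# find_all returns every matching tag; .get() returns None if an attribute is missing
links = [(a.get_text(), urljoin(BASE_URL, a.get('href'))) for a in soup.find_all('a')]

# CSS-selector equivalent of soup.find_all('a', class_='nav')
nav_texts = [a.get_text() for a in soup.select('a.nav')]

# Keyword arguments to find() match attributes, as in soup.find(alt="2")
logo_src = soup.find(alt='2')['src']
```

`urljoin` leaves links that are already absolute unchanged, so the second link keeps its own host.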
