Python Web Scraping: Fetching Website Content with Requests

Fetching individual web pages: Requests. Crawling whole sites: a framework such as Scrapy. Whole-web crawling: custom systems on the scale of Google's.
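
Every example below uses the same basic Requests pattern: send the request, check the status code, fix the encoding, and handle failure. The following is a minimal sketch of that pattern wrapped in a reusable function; the function name getHTMLText and the 30-second timeout are illustrative choices, not part of the original examples.

import requests

def getHTMLText(url):
    # Generic fetch pattern: request, check status, fix encoding, return text
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()  # raise HTTPError if the status code is not 200
        r.encoding = r.apparent_encoding
        return r.text
    except requests.RequestException:
        return "Crawl failed"

print(getHTMLText("http://www.baidu.com")[:500])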

Scraping a JD.com product page

import requests

url = 'http://item.jd.com/2967929.html'
try:
    r = requests.get(url)
    r.raise_for_status()  # raise HTTPError if the status code is not 200
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except:
    print("Crawl failed")
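
Before printing the page it can help to inspect the response object. The snippet below is a small illustrative check of why r.encoding is reset to r.apparent_encoding; the values mentioned in the comments are typical, not guaranteed.

import requests

r = requests.get('http://item.jd.com/2967929.html')
print(r.status_code)        # 200 on success
print(r.encoding)           # encoding guessed from the HTTP headers, often wrong for Chinese pages
print(r.apparent_encoding)  # encoding inferred from the page content itself
print(r.request.headers)    # headers actually sent, including the default python-requests User-Agent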

Scraping an Amazon page

import requests

url = 'https://www.amazon.cn/gp/yourstore/home/ref=nav_cs_ys'
try:
    kv = {"user-agent": "Mozilla/5.0"}
    r = requests.get(url, headers=kv)
    r.raise_for_status()  # raise HTTPError if the status code is not 200
    r.encoding = r.apparent_encoding
    print(r.text[:1000])
except:
    print("Crawl failed")
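
The custom header matters because Requests identifies itself as python-requests by default, and Amazon tends to answer that User-Agent with an error page; masquerading as a browser avoids this. The check below is a sketch of how to verify what was actually sent (the blocking behavior is the common explanation, assumed here rather than stated above).

import requests

url = 'https://www.amazon.cn/gp/yourstore/home/ref=nav_cs_ys'

# Default request: Requests announces itself as python-requests/x.y.z
r = requests.get(url)
print(r.request.headers["User-Agent"])
print(r.status_code)

# Same request with the User-Agent overridden to look like a browser
r = requests.get(url, headers={"user-agent": "Mozilla/5.0"})
print(r.request.headers["User-Agent"])
print(r.status_code)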

Submitting search queries to Baidu and 360

import requests

keyword = "python"
try:
    kv = {"wd": keyword}
    r = requests.get("http://baidu.com/s", params=kv)
    r.raise_for_status()  # raise HTTPError if the status code is not 200
    print(r.request.url)  # the URL that was actually requested
    r.encoding = r.apparent_encoding
    print(len(r.text))
except:
    print("Crawl failed")
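
The heading also mentions 360 search. Assuming its endpoint is https://www.so.com/s and that it takes the keyword in a q parameter (an assumption, not stated above), the same pattern would look like this:

import requests

keyword = "python"
try:
    kv = {"q": keyword}  # assumed parameter name for 360 search
    r = requests.get("https://www.so.com/s", params=kv)
    r.raise_for_status()
    print(r.request.url)
    print(len(r.text))
except:
    print("Crawl failed")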

Scraping an image

import requests
import os

url = "https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=1532578961266&di=45b828a25eab838db2716ad5ccc860ac&imgtype=0&src=http%3A%2F%2Fi.ce.cn%2Fce%2Fxwzx%2Fkj%2F201601%2F06%2FW020160106587980353192.jpg"
root = "/Users/python/Desktop/"
path = root + url.split("/")[-1]
try:
    if not os.path.exists(root):
        os.mkdir(root)
    if not os.path.exists(path):
        r = requests.get(url)
        with open(path, "wb") as f:
            f.write(r.content)
        print("Image saved successfully")
    else:
        print("File already exists")
except:
    print("Crawl failed")
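
Note that for this particular URL, url.split("/")[-1] returns the entire query string rather than a clean image name. A small sketch of a more robust alternative, assuming the real file name is the one carried in the src query parameter:

import os
from urllib.parse import urlparse, parse_qs

url = ("https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000"
       "&sec=1532578961266&di=45b828a25eab838db2716ad5ccc860ac&imgtype=0"
       "&src=http%3A%2F%2Fi.ce.cn%2Fce%2Fxwzx%2Fkj%2F201601%2F06%2FW020160106587980353192.jpg")

# The original image URL is carried in the "src" query parameter;
# parse_qs percent-decodes it, and its last path segment is the real file name.
src = parse_qs(urlparse(url).query)["src"][0]
filename = os.path.basename(urlparse(src).path)
print(filename)  # W020160106587980353192.jpg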

Automatic IP address location lookup

import requests

url = "http://m.ip138.com/ip.asp?ip="
try:
    r = requests.get(url + "202.204.80.112")
    r.raise_for_status()
    r.encoding = r.apparent_encoding
    print(r.text[-500:])
except:
    print("Crawl failed")

From the crawler's point of view, nothing is actually "clicked": every interaction with a site is carried out by constructing URLs.
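
For example, what a user experiences as clicking "next page" in search results is, from the crawler's side, just another GET with different URL parameters. The pn offset below is an illustrative guess at Baidu's paging parameter, not something taken from the original.

import requests

for page in range(3):
    kv = {"wd": "python", "pn": page * 10}  # pn as a results offset is an assumption
    r = requests.get("http://www.baidu.com/s", params=kv)
    print(r.request.url)  # the URL each "click" would correspond to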
