[Python Web Scraping] X - Daily Notes

# 2018-11-12
# 1) Scrapy feels too heavyweight for small crawlers, so try plain requests
#    together with an IP proxy pool to get around anti-scraping measures.
# Notes:
# 1. Pick good-quality IPs for the proxy pool (Zhima Proxy was truly awful; 50 yuan wasted).
# 2. The proxies argument to requests.get() must be a dict.
# 3. Visit "http://httpbin.org/ip" to see the IP address the server observes.
# 4. proxies can be {"http":"61.138.33.20:808"} or {"http":"http://61.138.33.20:808"}.
# 5. The same trick works for the User-Agent; test with test_user_agent_pool = "http://httpbin.org/user-agent"
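A minimal sketch of note 1's advice to keep only healthy proxies, assuming the requests library; the helper names as_proxies and is_alive are hypothetical, not part of any API:

```python
import requests

def as_proxies(ip_port):
    # Build the dict form requests expects (note 2); note 4 says the
    # scheme prefix is optional, but adding it is unambiguous.
    return {"http": "http://" + ip_port, "https": "http://" + ip_port}

def is_alive(ip_port, timeout=5):
    # Probe httpbin's IP-echo endpoint (note 3) through the proxy.
    # Any connection error or timeout marks the proxy as dead.
    try:
        r = requests.get("http://httpbin.org/ip",
                         proxies=as_proxies(ip_port), timeout=timeout)
        return r.status_code == 200
    except requests.RequestException:
        return False
```

Filtering the pool down to live proxies up front (e.g. `[p for p in proxy_pool if is_alive(p)]`) avoids wasting requests on dead IPs.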

import requests
import random

# Rotate requests through a small pool of HTTP proxies.
headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36"}
proxy_pool = ["61.138.33.20:808","61.135.217.7:80","118.190.95.35:9001","106.56.102.5:8070"]
proxies = {"http":random.choice(proxy_pool)}

# httpbin echoes the IP it sees, which confirms the proxy is in effect.
test_proxy_url = "http://httpbin.org/ip"
response = requests.get(url=test_proxy_url,headers=headers,proxies=proxies)
text = response.text
print(text)
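Note 5's User-Agent rotation can be sketched the same way as the proxy pool; the UA strings and the helper name random_headers below are illustrative assumptions:

```python
import random

# A small User-Agent pool, mirroring the proxy-pool idea above.
user_agent_pool = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
]

def random_headers():
    # Pick a random UA for each request so the traffic looks less uniform.
    return {"User-Agent": random.choice(user_agent_pool)}

headers = random_headers()
print(headers["User-Agent"])
```

To check which UA the server actually sees, send a request to "http://httpbin.org/user-agent" with these headers; the JSON response echoes the User-Agent string back.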
