Python Crawlers: Dealing with Anti-Scraping (unfinished)

Today we're going to look at how Python crawlers deal with anti-scraping measures. I'm writing this only to record my own learning and as a reference for others, nothing more. Scraping is, at heart, a hacker's technique, but we should stay on the straight and narrow!!!

#Browser disguise: make the request look like it comes from a normal browser
import urllib.request
url="http://blog.csdn.net"
#the header is given as a ("name", "value") tuple
headers=("User-Agent","Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko")
opener=urllib.request.build_opener()
opener.addheaders=[headers]#attach the spoofed header to the opener
file=opener.open(url).read()#fetch the page
fn=open("D:\\python\\标头.html","wb")#save the result locally
fn.write(file)
fn.close()
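The same disguise can also be attached to a single request instead of an opener. This is just an equivalent sketch for comparison; the save path is an example of mine, not from the original code.

import urllib.request
req=urllib.request.Request(
    "http://blog.csdn.net",
    headers={"User-Agent":"Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"})
data=urllib.request.urlopen(req).read()#the spoofed header travels with this one request
with open("D:\\python\\标头_request.html","wb") as f:#hypothetical filename
    f.write(data)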
'''
#Crawl all the news on the Tencent News homepage
The plan (a runnable sketch of these steps follows right after this commented-out block):
1. Fetch the news homepage
2. Extract the link of each news item
3. Fetch each news link
4. Check whether the page uses a frame
5. If it does, grab the content of the page the frame points to
6. If it does not, grab the current page directly

import urllib.request
import re
url="http://new.qq.com/omn/20190315/20190315A1IJ96.html"
header=("user-agent","Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0")
opener=urllib.request.build_opener()
opener.addheaders=[header]
file=opener.open(url).read()#fetch one news article with the disguised opener
print(file)
fn=open("D:\\python\\标头.html","wb")
fn.write(file)
fn.close()

import urllib.request
url="http://new.qq.com/"
urllib.request.urlretrieve(url,"D:\\python\\zhuye.html")#quick save of the homepage, no custom header
header=("user-agent","Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0")
opener=urllib.request.build_opener()
opener.addheaders=[header]
file=opener.open(url).read()#fetch the homepage again, this time with the disguised opener
fn=open("D:\\python\\zhuye2.html","wb")
fn.write(file)
fn.close()
'''
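The commented-out experiments above only cover steps 1 and 3 of the plan; the frame handling in steps 4 to 6 never gets written. Below is a minimal sketch of the whole plan in one piece. The link and frame regexes are assumptions about qq.com's markup (not taken from the original code), and the save paths are only examples.

import urllib.request
import re

def fetch(url):
    #reuse the browser-disguise trick from the first example
    opener=urllib.request.build_opener()
    opener.addheaders=[("User-Agent","Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36")]
    return opener.open(url).read().decode("utf-8","ignore")

home=fetch("http://new.qq.com/")                 #step 1: fetch the news homepage
#step 2: collect article links (the pattern is an assumption, adjust it to the real markup)
links=re.findall(r'href="(https?://new\.qq\.com/omn/[^"]+)"',home)

for i,link in enumerate(set(links)):
    try:
        page=fetch(link)                         #step 3: fetch each news link
        frames=re.findall(r'<i?frame[^>]+src="([^"]+)"',page)  #step 4: look for a frame
        if frames:
            page=fetch(frames[0])                #step 5: follow the frame instead
        with open("D:\\python\\qqnews_%d.html"%i,"w",encoding="utf-8") as f:
            f.write(page)                        #step 6: save whichever page we ended up with
    except Exception as e:
        print(link,"failed:",e)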

import urllib.request
import re
url="http://www.youth.cn"
#urllib.request.urlretrieve(url,"D:\\python\\zhuye.html")
header=("user-agent","Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36 SE 2.X MetaSr 1.0")
opener=urllib.request.build_opener()
opener.addheaders=[header]
file=opener.open(url).read()#fetch the youth.cn homepage with the disguised opener
#the pattern string below was cut off in the original post; this placeholder is only
#an assumption that grabs absolute links off the homepage
data='href="(http://[^"]+)"'
links=re.compile(data).findall(file.decode("utf-8","ignore"))
print(links)

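To round the example off, here is a self-contained, hedged follow-up that fetches the homepage, pulls out absolute links with the same placeholder pattern (an assumption, not the original one), and saves each linked page to disk; the filenames are hypothetical.

import urllib.request
import re

opener=urllib.request.build_opener()
opener.addheaders=[("User-Agent","Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36")]
page=opener.open("http://www.youth.cn").read().decode("utf-8","ignore")
links=re.findall(r'href="(http://[^"]+)"',page)   #placeholder pattern, an assumption
for i,link in enumerate(links):
    try:
        with open("D:\\python\\youth_%d.html"%i,"wb") as f:   #hypothetical filenames
            f.write(opener.open(link).read())     #reuse the opener carrying the fake user-agent
    except Exception as e:
        print(link,"failed:",e)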