[Python Crawler] Practice with the requests module

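1. Build a request to 阳光电影网 (ygdy8.com) with a url and headers
2. Print the status code of the response
3. Print the page source returned by the request
4. Save the source as an HTML file (named 'moive.html')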

Finding the URL

The URL, as found in the browser

Checking the encoding

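1. Open the page source.

2. Look for the charset declaration in the <meta> tag; the encoding named there is the one to use when decoding the page.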
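The same check can be done in code instead of by eye. Below is a minimal sketch, assuming the page declares its encoding in a <meta> tag near the top of the document. The helper name find_meta_charset and the sample bytes are made up for illustration; in the real script the input would be the raw response body.

```python
import re

def find_meta_charset(raw_html: bytes):
    """Return the charset named in a <meta> tag (lowercased), or None if absent."""
    # Only scan the first few KB, where the <head> section normally sits.
    match = re.search(rb'charset\s*=\s*["\']?\s*([\w-]+)', raw_html[:4096], re.IGNORECASE)
    return match.group(1).decode('ascii').lower() if match else None

# Hypothetical stand-in for the raw bytes of a downloaded page.
sample = (b'<html><head><meta http-equiv="Content-Type" '
          b'content="text/html; charset=gb2312"></head></html>')
print(find_meta_charset(sample))
```

Working on bytes rather than str matters here: the charset has to be known before the page can be decoded safely.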

import requests

url='http://www.ygdy8.com/'
headers= {
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding':'gzip, deflate',
    'Accept-Language':'zh-CN,zh;q=0.8',
    'Cache-Control':'max-age=0',
    'Connection':'keep-alive',
    'Cookie':'37cs_pidx=1; 37cs_user=37cs44901069952; UM_distinctid=15e28057c4a30f-02b42d1c91d0ef-31617c01-13c680-15e28057c4b997; CNZZDATA5783118=cnzz_eid%3D2068724491-1503909932-null%26ntime%3D1503909932; 37cs_show=69; cscpvrich4016_fidx=1',
    'Host':'www.ygdy8.com',
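    # These two conditional headers were copied from the browser. If the page has
    # not changed since then, the server may answer 304 with an empty body.
    'If-Modified-Since':'Mon, 28 Aug 2017 04:48:57 GMT',
    'If-None-Match': "804213f5b81fd31:530",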
    'Referer' : 'https://www.baidu.com/link?url=utMdaXmlbTPNR8LWph_DbE_m09qwXW-0X52v5rstvIy&wd=&eqid=8aa7bb7d00010f010000000259a3d86e',
    'Upgrade-Insecure-Requests':'1',
    'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36',
}
# Send the GET request; req holds the Response object that comes back
req = requests.get(url,headers=headers)

# Get the status code of the response
status_code = req.status_code
print(status_code)

# Set the encoding used to decode the page (this affects req.text only)
req.encoding = 'gb2312'

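# Get the page source into the html variable. req.text is a str decoded with
# req.encoding; req.content is the raw bytes and ignores req.encoding.
# Use whichever fits the job.
#html = req.text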
html = req.content
print(html)
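The script above stops at printing, so the fourth task (saving the source as 'moive.html') is still open. Here is one possible sketch of that step. The save_html helper is hypothetical, and the sample bytes stand in for req.content so the example runs without network access. It writes to a temporary directory; in the actual exercise the path would simply be 'moive.html'.

```python
import os
import tempfile

def save_html(raw_html: bytes, path: str) -> int:
    """Write the raw response bytes to path and return the number of bytes written."""
    # Binary mode keeps the original gb2312 bytes, so the file still matches
    # the charset declared in its own <meta> tag.
    with open(path, 'wb') as f:
        return f.write(raw_html)

# Hypothetical stand-in for req.content.
sample = ('<html><head><meta charset="gb2312"></head>'
          '<body>阳光电影</body></html>').encode('gb2312')

out_path = os.path.join(tempfile.mkdtemp(), 'moive.html')
written = save_html(sample, out_path)
print(written, 'bytes written to', out_path)

# Decoding the same bytes by hand gives what req.text returns when
# req.encoding is set to 'gb2312'.
print(sample.decode('gb2312'))
```

Writing the bytes untouched is a deliberate choice here: decoding and re-saving as UTF-8 would leave a file whose meta tag still claims gb2312, and a browser could then show garbled text.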
