Requests-HTML

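1. GET request

from requests_html import HTMLSession

session = HTMLSession()
url = 'http://news.youth.cn/'
response = session.get(url)
# This site is assumed to serve GB2312 text, so set the encoding
# before reading response.text to avoid garbled characters.
response.encoding = 'gb2312'
print(response.text)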
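
Why the encoding assignment matters can be shown without any network access. The sketch below uses only the standard library; the sample string is a made-up stand-in for page content, not text taken from the site above. It encodes the string as GB2312 and then decodes it twice, once with the matching codec and once with UTF-8.

```python
# Hypothetical sample text, encoded the way a GB2312 page would arrive.
raw_bytes = "青年新闻".encode("gb2312")

# Decoding with the matching codec recovers the original text.
decoded_ok = raw_bytes.decode("gb2312")

# Decoding with the wrong codec garbles the text; errors="replace"
# substitutes U+FFFD so the mismatch is visible instead of raising.
decoded_wrong = raw_bytes.decode("utf-8", errors="replace")

print(decoded_ok)
print(decoded_wrong)
```

response.text performs the same kind of decoding on the raw response bytes, using whatever response.encoding is set to at the time it is read.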

2. POST request

from requests_html import HTMLSession

session = HTMLSession()
url = 'http://httpbin.org/post'
# Form fields to send in the request body
data = {'user': 'admin', 'password': 123456}
response = session.post(url=url, data=data)
# Print the body only if the server answered 200 OK
if response.status_code == 200:
    print(response.text)
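
A dict passed as data is sent as a form-encoded body. As a rough illustration of what that body looks like, the following sketch uses the standard library's urlencode as a stand-in; it is not the library's internal code path, and the variable names body and echoed are chosen here for illustration only.

```python
from urllib.parse import parse_qs, urlencode

data = {'user': 'admin', 'password': 123456}

# Form encoding turns every value into text, so the integer
# password is transmitted as the string "123456".
body = urlencode(data)

# Parsing the body back shows what a server would receive:
# each field maps to a list of string values.
echoed = parse_qs(body)

print(body)
print(echoed)
```

This is why the password comes back as a string in the form section of the httpbin response, even though it was an int in the dict.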

3. Modifying the request headers

from requests_html import HTMLSession

# Custom headers; here only the User-Agent is overridden
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.131 Safari/537.36'
}
url = 'http://httpbin.org/post'
data = {'user': 'admin', 'password': 123456}
session = HTMLSession()
# Headers passed here apply to this request only
response = session.post(url=url, data=data, headers=headers)
print(response.text)

4. Generating a random User-Agent header

from requests_html import HTMLSession, UserAgent

session = HTMLSession()
# Pick a random browser User-Agent string
user_agent = UserAgent().random
url = 'http://httpbin.org/get'
response = session.get(url, headers={'User-Agent': user_agent})
if response.status_code == 200:
    print(response.text)

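5. Extracting data

        Previously, a crawler built on the requests module also needed a companion library to parse the HTML. Requests-HTML improves on this considerably: it supports both CSS selectors and XPath for extracting nodes.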

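1. CSS selectors

CSS selectors are used through the find() method of the HTML object.

find(selector:str='*',containing:_Containing=None,clean:bool=False,first:bool=False,_encoding:str=None)
Parameters:
selector: the CSS selector used to locate page elements
containing: keep only elements that contain the given text
clean: whether to strip <script> and <style> tags from the matched HTML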