POST requests with the requests library

0x01 Preface

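With a POST request we need to know what type of body the server expects. In most cases passing the payload through the data parameter is enough, but if the site checks the format of what is submitted, the call has to be adjusted to match.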
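To make the difference concrete, here is a minimal sketch that builds two requests without sending them and prints what requests would put on the wire. The URL is the local login address used later in this post and is never contacted; the payload values are only illustrative.

```python
import json

import requests

url = "http://localhost:8080/loginServlet"  # only used to build the request, never contacted
payload = {"username": "aaa", "password": "aaa"}

# data=dict is sent as an HTML form body
form_req = requests.Request("POST", url, data=payload).prepare()
# json=dict is serialized to JSON and labelled as such
json_req = requests.Request("POST", url, json=payload).prepare()

print(form_req.headers["Content-Type"], form_req.body)
print(json_req.headers["Content-Type"], json.loads(json_req.body))
```

Preparing a request this way is a cheap way to compare your own body and headers with what dev-tools shows for the browser.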

0x02 Common POST methods

POST form data
I set up a simple login page for this.
The source, login.jsp:

<%@ page contentType="text/html;charset=UTF-8" language="java" import="java.util.*"  pageEncoding="utf-8" %>
<%
        String path = request.getContextPath();
        String basePath = request.getScheme() + "://" + request.getServerName() + ":" + request.getServerPort() + path + "/";
        System.out.println(basePath);
%>
<html>
<head>
    <title>Login</title>
</head>
<body>
<form action="<%=basePath%>loginServlet" method="post">

    <table align="center">
        <tr>
            <td height="200"></td>
        </tr>
        <tr><td>Username:</td><td ><input type="text" name="username"></td></tr>
        <tr><td>Password:</td><td ><input type="password" name="password"></td></tr>
        <tr>
            <td colspan="2" align="center">
                <input type="submit" value="Submit">
            </td>
        </tr>
    </table>
</form>
</body>
</html>

The rendered page:
[Screenshot 1: the login page]
The loginServlet.java that handles the login request:

@WebServlet("/loginServlet")
public class loginServlet extends HttpServlet {
    // doGet forwards to the login page
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        req.getRequestDispatcher("/login.jsp").forward(req,resp);
        System.out.println("login.jsp was requested");
    }
    // doPost handles the POST request
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        String username = req.getParameter("username");
        String password = req.getParameter("password");
        System.out.println(username);
        req.getSession().setAttribute("username",username);
        String path = req.getContextPath();
        resp.sendRedirect(path + "/index/main.jsp");
    }
}

main.jsp, shown after a successful login:

<%@ page contentType="text/html;charset=UTF-8" language="java" %>
<html>
<head>
    <title>Home</title>
</head>
<body>
<h1>This is the home page</h1>
Welcome, ${username}
</body>
</html>

We fill in the username and password and click submit, with dev-tools open to watch the request.
[Screenshots 2-4: the login request in dev-tools]
Next, let's simulate it with Python.
The Python code:
import requests

start_url = 'http://localhost:8080/loginServlet'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'
}
data = {
    'username': 'aaa',
    'password': 'aaa'
}
response = requests.post(start_url,headers=headers,data=data)
print(response.text)

Response:
[Screenshot 5: the response]
As you can see, it worked.
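If you do not have a servlet container at hand, the same flow can be tried against a small local stub. The sketch below is an assumption-laden stand-in, not the servlet above: StubLoginHandler is a hypothetical handler that reads the form, answers with a 302 to /index/main.jsp, and keeps the username in a class attribute instead of a real session. It also shows that requests follows the redirect for you, so response.text is the final page and the 302 itself sits in response.history.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs

import requests


class StubLoginHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for loginServlet plus main.jsp."""

    last_username = ""  # crude replacement for the servlet session attribute

    def do_POST(self):
        # Read and decode the form-encoded body
        length = int(self.headers.get("Content-Length", 0))
        form = parse_qs(self.rfile.read(length).decode("utf-8"))
        StubLoginHandler.last_username = form.get("username", [""])[0]
        # Redirect like resp.sendRedirect(path + "/index/main.jsp")
        self.send_response(302)
        self.send_header("Location", "/index/main.jsp")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        body = ("Welcome, " + StubLoginHandler.last_username).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=UTF-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


# Port 0 lets the OS pick a free port
server = HTTPServer(("127.0.0.1", 0), StubLoginHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/loginServlet" % server.server_port

session = requests.Session()
session.trust_env = False  # ignore proxy environment variables for this local call
response = session.post(url, data={"username": "aaa", "password": "aaa"})

history_codes = [r.status_code for r in response.history]
print(history_codes)      # status codes of the redirects that were followed
print(response.url)       # final URL after the redirect
print(response.text)      # body of the final page

server.shutdown()
server.server_close()
```

Passing allow_redirects=False to post() would stop at the 302 instead, which is handy when you want to inspect the redirect itself.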

0x03 Two general approaches for sites where the POST fails

3.1 Passing the payload with json=data

Let's look at an example first.
url:aHR0cHM6Ly93d3cua3VhaXNob3UuY29tL3Byb2ZpbGUvM3h4Ymt3ZDhta250ZWFj

First capture the traffic and analyze the API.

[Screenshot 6: the captured request]
payload:
[Screenshot 7: the request payload]

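Note the content-type I circled here: this header describes the media type and encoding of the body and determines how the receiver parses it. This means that posting the payload directly with data= (which form-encodes a dict) may not work. You can give it a try.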

Here we POST with json=data:

response = session.post(self.start_url,headers=self.headers,json=self.data)

[Screenshot 8: the response]
The request succeeds.
The request code:

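import random

import requests

# Placeholder list: the original post does not show USER_AGENT_LIST
USER_AGENT_LIST = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
]
# The original post does not show how session is created; a requests.Session is assumed
session = requests.Session()

class ksSpider(object):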
    def __init__(self):
        self.start_url = "https://www.kuaishou.com/graphql"
        self.headers = {

            'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_74dfd4a4a602c0350b9d5a6715e61e87',
            'Referer': 'https://www.kuaishou.com/profile/3xxbkwd8mknteac',
            'User-Agent': random.choice(USER_AGENT_LIST)
        }
        self.data = {"operationName":"visionProfilePhotoList","variables":{"userId":"3xxbkwd8mknteac","pcursor":"1.637041162378E12","page":"profile"},"query":"fragment photoContent on PhotoEntity {\n  id\n  duration\n  caption\n  likeCount\n  viewCount\n  realLikeCount\n  coverUrl\n  photoUrl\n  photoH265Url\n  manifest\n  manifestH265\n  videoResource\n  coverUrls {\n    url\n    __typename\n  }\n  timestamp\n  expTag\n  animatedCoverUrl\n  distance\n  videoRatio\n  liked\n  stereoType\n  profileUserTopPhoto\n  __typename\n}\n\nfragment feedContent on Feed {\n  type\n  author {\n    id\n    name\n    headerUrl\n    following\n    headerUrls {\n      url\n      __typename\n    }\n    __typename\n  }\n  photo {\n    ...photoContent\n    __typename\n  }\n  canAddComment\n  llsid\n  status\n  currentPcursor\n  __typename\n}\n\nquery visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n  visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n    result\n    llsid\n    webPageArea\n    feeds {\n      ...feedContent\n      __typename\n    }\n    hostName\n    pcursor\n    __typename\n  }\n}\n"}
    def parse_start_url(self):
        response = session.post(self.start_url,headers=self.headers,json=self.data)
        print(response.text)
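The json= argument is a shorthand: as far as I know, it serializes the dict and sets the Content-Type header for you. The sketch below checks this offline by preparing the request both ways and comparing them. The payload is a trimmed illustration (the long query string is left out), so it is not a working GraphQL request, and body_text is a helper name made up for this example.

```python
import json

import requests

url = "https://www.kuaishou.com/graphql"  # only used to build the request, never contacted
# Trimmed illustration of the payload above; the "query" field is omitted
payload = {
    "operationName": "visionProfilePhotoList",
    "variables": {"userId": "3xxbkwd8mknteac", "page": "profile"},
}

# Variant 1: let requests serialize the dict
via_json = requests.Request("POST", url, json=payload).prepare()
# Variant 2: serialize by hand and declare the type ourselves
via_data = requests.Request(
    "POST", url,
    data=json.dumps(payload),
    headers={"Content-Type": "application/json"},
).prepare()


def body_text(req):
    """Return the prepared body as str, whether it is stored as bytes or str."""
    return req.body.decode("utf-8") if isinstance(req.body, bytes) else req.body


same_type = via_json.headers["Content-Type"] == via_data.headers["Content-Type"]
same_body = json.loads(body_text(via_json)) == json.loads(body_text(via_data))
print(same_type, same_body)
```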

3.2 Using an XHR breakpoint to find the data as originally posted

Here I switch to another url:
aHR0cDovL3d3dy53aGdnenkuY29tL1BvbGljaWVzQW5kUmVndWxhdGlvbnMvaW5kZXguaHRtbD91dG09c2l0ZXNfZ3JvdXBfZnJvbnQuMmVmNTAwMWYuMC4wLmZkNGU1ODMwMDhiYjExZWQ5ZjIzYjUzNmUyN2YwNmFk
[Screenshot 9: the captured request]
First we POST by passing data:

data = {"categoryCode": "GovernmentProcurement", "pageSize": 15, "pageNo": 1}
response = session.post(self.start_url,headers=self.headers,data=data).text

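Unsurprisingly, it fails: a dict passed to data= is sent form-encoded, which is evidently not what this endpoint expects.
[Screenshot: the failed response]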
Next, copy the API's resource path, /front/search/category, and add an XHR breakpoint on it.

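[Screenshot 10: adding the XHR breakpoint]
Trigger the request
and look at Scope.
[Screenshots 11-12: the Scope panel at the breakpoint]
Printing s["data"] in the console shows that it is clearly a string.
[Screenshot: the console output]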
So let's complete the headers and try posting this string.
The code:

data2 = "{\"categoryCode\":\"GovernmentProcurement\",\"pageSize\":\"15\",\"pageNo\":\"2\"}"

response = session.post(self.start_url,headers=self.headers,data=data2).text

[Screenshot: the successful response]
It works.
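Hand-escaping the string is error-prone, so one option is to generate it. This sketch assumes the server wants exactly the compact, no-space form shown above; fields, bare and with_header are names made up for the example, and nothing is sent. It also shows why the headers had to be completed: when data= is a string, requests does not add a Content-Type on its own.

```python
import json

import requests

# Values are strings, matching the string captured at the breakpoint
fields = {"categoryCode": "GovernmentProcurement", "pageSize": "15", "pageNo": "2"}
# Compact separators drop the spaces that json.dumps inserts by default
data2 = json.dumps(fields, separators=(",", ":"))
print(data2)

url = "http://www.whggzy.com/front/search/category"  # only used to build the request, never contacted
bare = requests.Request("POST", url, data=data2).prepare()
with_header = requests.Request(
    "POST", url, data=data2, headers={"Content-Type": "application/json"}
).prepare()

print(bare.headers.get("Content-Type"))         # no type is set for a plain string body
print(with_header.headers.get("Content-Type"))  # the type we declared ourselves
```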
Full code:

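import random

import requests

# Placeholder list: the original post does not show USER_AGENT_LIST
USER_AGENT_LIST = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36',
]
# The original post does not show how session is created; a requests.Session is assumed
session = requests.Session()

class wSpider(object):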
    def __init__(self):
        self.start_url = "http://www.whggzy.com/front/search/category"
        self.headers = {

            "Content-Type": "application/json",
            "X-Requested-With": "XMLHttpRequest",
            'Cookie': 'acw_tc=2f624a2b16583799791827338e7686367845620cecc4e2a1b7ed6f85bbdd0a',
            'Referer': 'http://www.whggzy.com/PoliciesAndRegulations/index.html',
            'user-agent' : random.choice(USER_AGENT_LIST)
        }

    def parse_start_url(self):
        data = {"categoryCode": "GovernmentProcurement", "pageSize": 15, "pageNo": 1}
        data2 = "{\"categoryCode\":\"GovernmentProcurement\",\"pageSize\":\"15\",\"pageNo\":\"2\"}"

        response = session.post(self.start_url,headers=self.headers,data=data2).text
        print(response)

if __name__ == '__main__':
    w = wSpider()
    w.parse_start_url()

0x04 Summary

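How to POST depends on the situation. Roughly: use data= for form submissions and json= when the server expects a JSON body. An XHR breakpoint can also be used to confirm exactly what data is sent.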
