How to get the redirected address when a crawler hits a 301 or 302 redirect (solved)

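When crawling a target site with Java or Python, the browser follows the redirect correctly, but the programmatic request always comes back with code 200.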

Set the request headers to something like the following, adjusting them as needed:

    // Browser-like request headers (Map.of requires Java 9 or later)
    Map<String, String> headers = Map.of(
            "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
            "Accept-Encoding", "gzip, deflate, sdch, br",
            "Accept-Language", "zh-CN,zh;q=0.8",
            "Connection", "keep-alive",
            // Host must match the site being requested; replace it for your target
            "Host", "www.baidu.com",
            "Upgrade-Insecure-Requests", "1",
            "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
    );

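The redirect target can then be read from the Location response header. Note that HttpURLConnection follows redirects automatically by default, so the Location header is only visible when automatic following is turned off with setInstanceFollowRedirects(false), as the complete method does: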

String redirectedUrl = connection.getHeaderField("Location");
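
A server may send a relative path in Location (for example /login) rather than a full URL. The following is a minimal sketch of how such a value could be resolved against the address that was requested. The class name LocationResolver and the method resolveLocation are hypothetical names chosen for illustration and are not part of the original post.

```java
import java.net.URI;

public class LocationResolver {

    /**
     * Resolves a Location header value against the URL that was requested.
     * Hypothetical helper for illustration only.
     *
     * @param requestUrl the absolute URL that was requested
     * @param location   the raw Location header value, possibly relative or null
     * @return an absolute URL, or requestUrl if no Location was sent
     */
    public static String resolveLocation(String requestUrl, String location) {
        if (location == null || location.isEmpty()) {
            // No redirect target was sent, so the request URL is the final one
            return requestUrl;
        }
        // URI.resolve returns absolute values unchanged and resolves relative
        // ones against the request URL
        return URI.create(requestUrl).resolve(location).toString();
    }
}
```

With this helper, the value returned by connection.getHeaderField("Location") can be passed in together with the original URL before it is used for the next request.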

Complete Java method that sends a GET request and returns the redirect address:

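    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Map;

    /**
     * Sends a GET request and returns the address it redirects to.
     *
     * @param url the URL to request
     * @return the Location header value if the server redirects, the original URL
     *         if it does not, or null on an error response or an exception
     */
    public static String sendGetRequestWithRedirect(String url) {
        HttpURLConnection connection = null;
        try {
            URL getUrl = new URL(url);
            connection = (HttpURLConnection) getUrl.openConnection();
            connection.setRequestMethod("GET");

            // Custom request headers that imitate a browser (Map.of requires Java 9 or later)
            Map<String, String> headers = Map.of(
                    "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
                    "Accept-Encoding", "gzip, deflate, sdch, br",
                    "Accept-Language", "zh-CN,zh;q=0.8",
                    "Connection", "keep-alive",
                    "Upgrade-Insecure-Requests", "1",
                    "User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"
            );

            // Apply the custom request headers
            for (Map.Entry<String, String> entry : headers.entrySet()) {
                connection.setRequestProperty(entry.getKey(), entry.getValue());
            }
            // Do not follow redirects automatically, so the 3xx response and its
            // Location header stay visible
            connection.setInstanceFollowRedirects(false);

            int responseCode = connection.getResponseCode();
            // 301, 302 and 303 have constants; 307 and 308 are also redirects
            boolean isRedirect = responseCode == HttpURLConnection.HTTP_MOVED_PERM
                    || responseCode == HttpURLConnection.HTTP_MOVED_TEMP
                    || responseCode == HttpURLConnection.HTTP_SEE_OTHER
                    || responseCode == 307
                    || responseCode == 308;
            if (responseCode == HttpURLConnection.HTTP_OK || isRedirect) {
                String redirectedUrl = connection.getHeaderField("Location");

                if (redirectedUrl != null) {
                    // Redirected: return the new address
                    return redirectedUrl;
                } else {
                    // No redirect: the original address is already the final one
                    return url;
                }
            } else {
                // Error response
                System.out.println("Error response code: " + responseCode);
                return null;
            }
        } catch (Exception e) {
            e.printStackTrace();
            return null;
        } finally {
            if (connection != null) {
                connection.disconnect();
            }
        }
    }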
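
The method above reads a single Location header, but a site may redirect more than once before reaching the final page. The following is a rough sketch of following such a chain hop by hop, under some assumptions: the class name RedirectChain, the method resolveFinalUrl, the maxHops limit and the 5 second timeouts are all illustrative choices that do not come from the original post, and the browser-like headers are left out to keep the sketch short.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class RedirectChain {

    /**
     * Follows Location headers one hop at a time and returns the last URL reached.
     * maxHops is an illustrative safety limit against redirect loops.
     */
    public static String resolveFinalUrl(String startUrl, int maxHops) throws IOException {
        String current = startUrl;
        for (int hop = 0; hop < maxHops; hop++) {
            HttpURLConnection connection =
                    (HttpURLConnection) URI.create(current).toURL().openConnection();
            connection.setRequestMethod("GET");
            // Keep each 3xx response visible instead of following it automatically
            connection.setInstanceFollowRedirects(false);
            // Assumed timeouts so that a stalled server cannot block the crawler
            connection.setConnectTimeout(5000);
            connection.setReadTimeout(5000);
            try {
                int code = connection.getResponseCode();
                String location = connection.getHeaderField("Location");
                boolean redirect = code >= 300 && code < 400 && location != null;
                if (!redirect) {
                    // Not a redirect: the current URL is the end of the chain
                    return current;
                }
                // Location may be relative, so resolve it against the current URL
                current = URI.create(current).resolve(location).toString();
            } finally {
                connection.disconnect();
            }
        }
        throw new IOException("Too many redirects, stopped at: " + current);
    }
}
```

A call such as RedirectChain.resolveFinalUrl(url, 5) would return the last address in the chain, or throw an IOException if more than five hops are needed.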
