[Solved] Error 400. The request has an invalid header name

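While writing a crawler recently, I hit the error "Error 400. The request has an invalid header name" when calling the Microsoft Bing Dictionary endpoint.
The request headers, stored in a text file, were: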

Host: cn.bing.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-CN,en-US;q=0.7,en;q=0.3
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Upgrade-Insecure-Requests: 1
Pragma: no-cache
Cache-Control: no-cache
TE: Trailers

Building the HTTP request headers:

static Header[] getHeader(String headersPath) {
	// fileToLines: utility method that reads a file line by line into a list of strings
	ArrayList<String> hs = FileUtils.fileToLines(new File(headersPath));
	// org.apache.http.Header
	Header[] headers = new Header[hs.size()];
	for (int i = 0; i < headers.length; i++) {
		// Split on the first colon only, since a value may itself contain colons
		String[] parts = hs.get(i).split(":", 2);
		// Trim the padding around the name and value; spaces inside the value are kept
		headers[i] = new BasicHeader(parts[0].trim(), parts[1].trim());
	}
	return headers;
}

Sending the request:

public void doGet() throws Exception {
	CloseableHttpClient httpClient = HttpClients.createDefault();
	String url = "https://cn.bing.com/dict/search?q=happily";
	// GET request
	HttpGet httpGet = new HttpGet(url);
	// Configure the timeouts
	RequestConfig requestConfig = RequestConfig.custom()
			.setConnectTimeout(3000).setConnectionRequestTimeout(1000)
			.setSocketTimeout(5000).build();
	httpGet.setConfig(requestConfig);
	// Load the headers with the getHeader method above
	httpGet.setHeaders(getHeader("D:\\desktop\\新建文本文档.txt"));
	// Execute the request
	CloseableHttpResponse httpResponse = httpClient.execute(httpGet);
	int statusCode = httpResponse.getStatusLine().getStatusCode(); // status code
	System.out.println(statusCode);
	// Read the response body
	String content = EntityUtils.toString(httpResponse.getEntity(), "utf-8");
	System.out.println(content);
}

The result:

400

Bad Request

Bad Request - Invalid Header


HTTP Error 400. The request has an invalid header name.

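The response says the request contains an invalid header.
As far as I could tell, every header was valid; I even went back through D:\desktop\新建文本文档.txt and checked the headers in the file one by one.
Then it occurred to me that fileToLines, the method that reads the headers into a list, might be at fault, so that the headers actually set on the request differed from the ones in the file.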

Looking at the fileToLines method:

public static Collection<String> fileToLines(String fileName,Collection<String>list){
		try (BufferedReader br=new BufferedReader(new FileReader(fileName))){
			String info;
			while((info=br.readLine())!=null) {
				list.add(info.replaceAll("\\s*", ""));
			}
		} catch (Exception e) {
		}
		return list;
	}

The moment I saw the method, it all made sense: replaceAll("\\s*", "") strips every whitespace character from each line!
So this request header:
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
became:
User-Agent:Mozilla/5.0(WindowsNT10.0;Win64;x64;rv:69.0)Gecko/20100101Firefox/69.0
Every space is gone, so this is no longer a valid UA string.
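The effect is easy to reproduce in isolation. The sketch below applies the same replaceAll call to the User-Agent line from the header file; the class name WhitespaceStripDemo and the helper strip are hypothetical, introduced only for this illustration.

```java
// Hypothetical demo class; only String.replaceAll from the JDK is used.
class WhitespaceStripDemo {
    // Mirrors the per-line transformation applied inside fileToLines.
    static String strip(String line) {
        return line.replaceAll("\\s*", "");
    }

    public static void main(String[] args) {
        String header = "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) "
                + "Gecko/20100101 Firefox/69.0";
        // Prints the header with every space removed, including those inside the value.
        System.out.println(strip(header));
    }
}
```

Running it prints the same space-free line shown above, which suggests the utility method rather than the header file is what changed the headers.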
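Presumably the fileToLines utility method was modified at some point in the past to meet some specific need.
Lesson: do not put requirement-specific code, such as replaceAll("\\s*", ""), into a general-purpose utility class!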
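One way to follow that lesson is to have the utility return each line exactly as stored and leave any trimming to the caller that needs it. This is only a minimal sketch, assuming the header file is UTF-8; the names PlainLines, readLines and splitHeader are hypothetical and are not part of the original utility class.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical names throughout; this is not the original FileUtils class.
class PlainLines {
    // General-purpose reader: returns every line exactly as stored (assumes UTF-8).
    static List<String> readLines(String fileName) throws IOException {
        return Files.readAllLines(Paths.get(fileName), StandardCharsets.UTF_8);
    }

    // Caller-side parsing: split on the first colon and trim only the
    // whitespace around the name and the value, never inside the value.
    static String[] splitHeader(String line) {
        String[] parts = line.split(":", 2);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Not a header line: " + line);
        }
        return new String[] { parts[0].trim(), parts[1].trim() };
    }

    public static void main(String[] args) {
        String[] h = splitHeader("Accept-Encoding: gzip, deflate, br");
        System.out.println(h[0] + " -> " + h[1]);
    }
}
```

With this split of responsibilities, a caller that really does need whitespace removed can apply it to its own lines without affecting anyone else who reads files through the same utility.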

With that, the problem was solved!
