How to Download a File from a URL

HtmlUnit makes it easy to download a file from a URL.

// Assumes HtmlUnit 2.x package names; HtmlUnit 3.x moved these classes to org.htmlunit.
import com.gargoylesoftware.htmlunit.Page;
import com.gargoylesoftware.htmlunit.WebClient;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

public static String httpDownload(String url, String encode) {
    // Collect the raw bytes first and decode once at the end, so a multi-byte
    // character that straddles two reads is not corrupted.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    // try-with-resources closes the WebClient and the stream even on failure.
    try (WebClient webClient = new WebClient()) {
        webClient.getOptions().setActiveXNative(false);
        webClient.getOptions().setJavaScriptEnabled(false);
        webClient.getOptions().setCssEnabled(false);

        Page page = webClient.getPage(url);
        try (InputStream is = page.getWebResponse().getContentAsStream()) {
            byte[] bytes = new byte[4096];
            int len;
            while ((len = is.read(bytes)) != -1) {
                buffer.write(bytes, 0, len);
            }
        }

        // Replace the non-breaking space U+00A0 (bytes 0xC2 0xA0 in UTF-8)
        // with an ordinary space. A literal replace is used, not a regex.
        return buffer.toString(encode).replace('\u00A0', ' ');
    } catch (Exception e) {
        e.printStackTrace();
        // As before, a failed download is reported as null.
        return null;
    }
}
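
One detail worth checking in a method like this is where the bytes get decoded. If each chunk returned by read() is turned into a String on its own, a multi-byte character that falls across a chunk boundary is decoded as two broken halves. The sketch below is only an illustration under assumptions: the class and method names are made up, and the chunk size is shrunk to 2 bytes so the boundary problem shows up on a tiny input.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ChunkDecodeDemo {
    // Decodes every chunk separately, the way a read loop does when it
    // appends new String(bytes, 0, len, charset) on each iteration.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Collects all chunks into one buffer and decodes a single time.
    static String decodeOnce(byte[] data, int chunkSize) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            buffer.write(data, off, len);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "a", a non-breaking space (2 bytes in UTF-8), "b": 4 bytes in total.
        String text = "a\u00A0b";
        byte[] data = text.getBytes(StandardCharsets.UTF_8);

        // With 2-byte chunks the non-breaking space is split across chunks.
        System.out.println(decodeOnce(data, 2).equals(text));     // true
        System.out.println(decodePerChunk(data, 2).equals(text)); // false
    }
}
```

With a 4096-byte buffer the same thing happens, just less often, which makes the resulting garbled characters hard to reproduce.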

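Note: this method is tailored to HTML text. It replaces the non-breaking space (U+00A0, encoded in UTF-8 as the bytes 0xC2 0xA0, which is the character HTML's &nbsp; produces) with an ordinary space. For any other type of file, skip this replacement; otherwise the downloaded file may fail to open or come out garbled.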
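
For non-text files, one option is to skip the String conversion entirely and write the response bytes to disk unchanged. The following is a minimal sketch and not part of the original method: BinaryDownload and saveTo are hypothetical names, and the helper takes any InputStream so it can be tried without a network connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class BinaryDownload {
    // Hypothetical helper: copies the stream to the target file byte for byte
    // (no decoding, no character replacement), overwrites an existing file,
    // and returns the number of bytes written.
    static long saveTo(InputStream in, Path target) throws IOException {
        try (InputStream body = in) {
            return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With HtmlUnit, the stream passed in would be the one returned by page.getWebResponse().getContentAsStream(), as in the method above.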
