Commonly used jsoup methods

A quick aside on using jsoup: it makes parsing XML files very convenient.

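// imports: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements, java.net.URL
// fetch the URL and parse the response, with a 2000 ms timeout
Document parse = Jsoup.parse(new URL("https://api.bilibili.com/x/v1/dm/list.so?oid=81111362"), 2000);
// collect every <d> element and print its text
Elements d = parse.getElementsByTag("d");
for (Element element : d) {
    System.out.println(element.text());
}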
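The snippet above goes through jsoup's default HTML parser, which applies HTML rules to the input (for example wrapping content in <html><body>). jsoup also provides an XML parser, Parser.xmlParser(), which keeps the document structure and tag-name case as they are; for a live endpoint it can be selected with Jsoup.connect(url).parser(Parser.xmlParser()).get(). The minimal offline sketch below assumes a hypothetical class XmlTagTexts and helper textsOfTag, and runs on a made-up inline sample rather than the real API response:

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class XmlTagTexts {
    // Parses an XML string with jsoup's XML parser and returns the text of every <tag> element.
    static List<String> textsOfTag(String xml, String tag) {
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        List<String> texts = new ArrayList<>();
        for (Element e : doc.getElementsByTag(tag)) {
            texts.add(e.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // hypothetical sample data, not a real API response
        String xml = "<i><d p=\"0.5,1\">first</d><d p=\"2.0,1\">second</d></i>";
        System.out.println(textsOfTag(xml, "d")); // [first, second]
    }
}
```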

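If an endpoint returns something other than HTML (for example JSON), the method below fetches it and turns it into a JSON object. It sends a GET request.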

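// imports: org.jsoup.Jsoup, net.sf.json.JSONObject (json-lib, which provides fromObject)
public static JSONObject getdatalist(String url) throws Exception {
    // ignoreContentType(true) lets jsoup accept non-HTML responses such as application/json;
    // execute().body() returns the raw response text, whereas document.body().text()
    // would first parse it as HTML and normalize whitespace and entities
    String text = Jsoup.connect(url).ignoreContentType(true).execute().body();
    return JSONObject.fromObject(text);
}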
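Reading a JSON response through a jsoup Document and body().text() is not lossless: the text is first parsed as HTML, so entities such as &amp; are decoded and runs of whitespace are collapsed, even inside JSON string values. The small offline check below illustrates this; the class BodyTextDemo and the helper viaHtmlBodyText are hypothetical names, and the input is a made-up string:

```java
import org.jsoup.Jsoup;

public class BodyTextDemo {
    // Mimics the Document-based approach: parse the raw text as HTML, then read body().text().
    static String viaHtmlBodyText(String raw) {
        return Jsoup.parse(raw).body().text();
    }

    public static void main(String[] args) {
        String raw = "{\"msg\":\"a  b &amp; c\"}";
        // the double space is collapsed and &amp; is decoded to &
        System.out.println(viaHtmlBodyText(raw)); // {"msg":"a b & c"}
    }
}
```

Fetching the raw text with Jsoup.connect(url).ignoreContentType(true).execute().body() avoids this HTML round trip.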

Pitfalls when simulating POST requests

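The most direct approach is to send every request header and every request-body parameter that the browser sends. This is very reliable; leave out even a few of them and the request can easily fail.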
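A minimal jsoup sketch of that advice, assuming the header and form values have been copied from the browser's developer tools. The URL https://example.com/api/submit, the class PostSketch, the helper buildPost and all header/form values are hypothetical placeholders. The sketch only builds the request; calling execute() on the returned Connection would actually send it. Fields added with data(...) are sent form-encoded (application/x-www-form-urlencoded), so an endpoint that expects a JSON body needs a different setup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Connection;
import org.jsoup.Jsoup;

public class PostSketch {
    // Builds (but does not send) a POST request carrying every given header and form field.
    static Connection buildPost(String url, Map<String, String> headers, Map<String, String> form) {
        Connection conn = Jsoup.connect(url)
                .method(Connection.Method.POST)
                .ignoreContentType(true) // accept JSON or other non-HTML replies
                .timeout(10_000);        // 10 s timeout
        for (Map.Entry<String, String> h : headers.entrySet()) {
            conn.header(h.getKey(), h.getValue());
        }
        for (Map.Entry<String, String> f : form.entrySet()) {
            conn.data(f.getKey(), f.getValue()); // sent as application/x-www-form-urlencoded
        }
        return conn;
    }

    public static void main(String[] args) {
        // hypothetical values; copy the real ones from the browser's request
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", "Mozilla/5.0");
        headers.put("Referer", "https://example.com/");
        headers.put("Cookie", "SESSION=abc123");
        Map<String, String> form = new LinkedHashMap<>();
        form.put("id", "42");
        Connection conn = buildPost("https://example.com/api/submit", headers, form);
        // Connection.Response res = conn.execute(); // this line would send the request
        System.out.println(conn.request().method());
    }
}
```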

Downloading a resource from a URL

// imports: java.io.*, java.net.MalformedURLException, java.net.URL
// try-with-resources closes both streams even if the download fails midway
try (InputStream in = new URL("http://video-qn.ibaotu.com/19/42/15/54X888piCwYi.mp4").openStream();
     FileOutputStream out = new FileOutputStream(new File("d:\\3.mp4"))) {
    byte[] buffer = new byte[1024];
    int length;
    // write each chunk straight to the file instead of buffering the whole video in memory;
    // read() returns -1 at the end of the stream
    while ((length = in.read(buffer)) != -1) {
        out.write(buffer, 0, length);
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
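Since Java 7 the copy loop can also be delegated to java.nio.file.Files.copy(InputStream, Path, CopyOption...), which streams the data to disk and returns the number of bytes written. A minimal sketch; the class Downloader and the method download are hypothetical names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class Downloader {
    // Streams the resource at `url` straight to `target`; returns the number of bytes written.
    static long download(String url, Path target) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            // REPLACE_EXISTING overwrites the target file if it already exists
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        long n = download("http://video-qn.ibaotu.com/19/42/15/54X888piCwYi.mp4", Paths.get("d:\\3.mp4"));
        System.out.println(n + " bytes written");
    }
}
```

Any URL scheme that URL.openStream() supports works here, including file: URLs, which makes the method easy to try offline.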