Parsing PDF documents with PDFBox

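1. Download: http://sourceforge.net/projects/pdfbox/?source=dlp. After downloading, add PDFBox.jar from the lib directory to your project's classpath. That jar alone is not enough; you also need to add the jar files in the external directory.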
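A quick way to confirm that the jar is actually visible to the JVM is to try loading one of its classes. The following is a minimal sketch; ClasspathCheck and isOnClasspath are hypothetical names made up for this example, and the class name it looks up assumes the org.pdfbox package naming of this release.

```java
// Hypothetical helper for this article, not part of PDFBox.
class ClasspathCheck {

	// Returns true if the named class can be loaded from the current classpath.
	static boolean isOnClasspath(String className) {
		try {
			Class.forName(className);
			return true;
		} catch (ClassNotFoundException e) {
			return false;
		}
	}

	public static void main(String[] args) {
		// Assumption: the release in use places PDDocument in org.pdfbox.pdmodel.
		String name = "org.pdfbox.pdmodel.PDDocument";
		System.out.println(name + " found: " + isOnClasspath(name));
	}
}
```

The same check can be pointed at a class from any of the jars in the external directory.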

 

2. Source code:

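import java.io.FileInputStream;
import java.io.IOException;
import java.io.StringWriter;

// Package names of the org.pdfbox release; Apache releases use org.apache.pdfbox instead.
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.util.PDFTextStripper;

public class PdfParser {
	/**
	 * Extracts the text of a PDF file.
	 *
	 * @param path
	 *            path of the PDF file
	 * @param ts
	 *            PDF text stripper; passed in as a parameter so that it is not
	 *            created and discarded on every parse
	 * @param writer
	 *            buffer that receives the text; passed in for the same reason
	 *            and cleared at the start of each call
	 * @return the extracted text
	 * @throws IOException
	 *             if the file cannot be read or parsed
	 */
	public static String getContent(String path, PDFTextStripper ts, StringWriter writer)
			throws IOException {
		// StringWriter.close() has no effect, so drop any text left by a previous call.
		writer.getBuffer().setLength(0);
		FileInputStream fis = new FileInputStream(path);
		try {
			PDDocument pdfDocument = PDDocument.load(fis);
			try {
				ts.writeText(pdfDocument, writer);
				return writer.toString();
			} finally {
				// Always close the document, even if text extraction fails.
				pdfDocument.close();
			}
		} finally {
			fis.close();
		}
	}

	public static void main(String[] args) throws IOException {
		// One stripper and one writer are shared by all calls.
		PDFTextStripper ts = new PDFTextStripper();
		StringWriter writer = new StringWriter();
		String s = getContent(
				"E:\\$RECYCLE.BIN\\S-1-5-21-3743469558-3320822278-4247495569-500\\$R19UCXC\\architecture\\startup\\serverStartup.pdf",
				ts, writer);
		System.out.println(s);
		String s1 = getContent(
				"E:\\$RECYCLE.BIN\\S-1-5-21-3743469558-3320822278-4247495569-500\\$R19UCXC\\architecture\\startup\\serverStartup.pdf",
				ts, writer);
		System.out.println(s1);

		String s2 = getContent(
				"E:\\$RECYCLE.BIN\\S-1-5-21-3743469558-3320822278-4247495569-500\\$RC3NO6U\\[北京圣思园Struts2应用开发详解]_017.Struts2访问Servlet API及Web应用单元测试详解(容器内测试与Mock测试)\\北京圣思园科技有限公司第一期面授培训大纲.pdf",
				ts, writer);
	}
}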
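One detail of sharing a single StringWriter across calls deserves a closer look: StringWriter.close() has no effect, so text written by an earlier call stays in the buffer until it is cleared explicitly. The sketch below shows this with the JDK alone, without PDFBox; WriterReuseDemo and its two method names are made up for the illustration, and the strings stand in for extracted PDF text.

```java
import java.io.IOException;
import java.io.StringWriter;

// Hypothetical demo class for this article, not part of PDFBox.
class WriterReuseDemo {

	// Writes into the shared writer without clearing it first.
	static String writeWithoutReset(StringWriter writer, String text) {
		writer.write(text);
		return writer.toString();
	}

	// Clears the shared writer's buffer, then writes into it.
	static String writeWithReset(StringWriter writer, String text) {
		writer.getBuffer().setLength(0);
		writer.write(text);
		return writer.toString();
	}

	public static void main(String[] args) throws IOException {
		StringWriter shared = new StringWriter();
		System.out.println(writeWithoutReset(shared, "first")); // prints "first"
		shared.close(); // does not empty the buffer
		System.out.println(writeWithoutReset(shared, "second")); // prints "firstsecond"
		System.out.println(writeWithReset(shared, "third")); // prints "third"
	}
}
```

Clearing the buffer with getBuffer().setLength(0) keeps the benefit of a shared writer while making each call return only its own text.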

 

 

3. A common exception

java.lang.Throwable: Warning: You did not close the PDF Document
 at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:418)
 at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
 at java.lang.ref.Finalizer.runFinalizer(Unknown Source)
 at java.lang.ref.Finalizer.access$100(Unknown Source)
 at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)

     

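When run as written, the code above does not produce this warning. As the message and the stack trace show, it is raised from COSDocument.finalize() when a document is garbage-collected without having been closed. If you see it, check that close() is called on every PDDocument you load, including on paths where an exception is thrown before the close call is reached.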
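A try/finally block is a common way to make sure close() is reached even when text extraction throws. The sketch below shows the pattern with a stand-in class so that it runs without PDFBox on the classpath; FakeDocument, readText and CloseInFinallyDemo are hypothetical names, and in real code the resource would be the PDDocument returned by PDDocument.load.

```java
// Hypothetical demo for this article; neither class is part of PDFBox.
class CloseInFinallyDemo {

	// Reads the text and closes the document on every path, including failure.
	static String readAndClose(FakeDocument doc) {
		try {
			return doc.readText();
		} finally {
			doc.close();
		}
	}

	public static void main(String[] args) {
		FakeDocument ok = new FakeDocument(false);
		FakeDocument broken = new FakeDocument(true);
		System.out.println(readAndClose(ok));
		try {
			readAndClose(broken);
		} catch (IllegalStateException e) {
			System.out.println("failed: " + e.getMessage());
		}
		// Both documents were closed, whether or not reading succeeded.
		System.out.println(ok.closed + " " + broken.closed);
	}
}

// Stand-in for a closable document that records whether close() was called.
class FakeDocument {
	boolean closed = false;
	private final boolean failOnRead;

	FakeDocument(boolean failOnRead) {
		this.failOnRead = failOnRead;
	}

	// Unchecked exception keeps the sketch short; the PDFBox calls throw IOException.
	String readText() {
		if (failOnRead) {
			throw new IllegalStateException("simulated parse failure");
		}
		return "some text";
	}

	void close() {
		closed = true;
	}
}
```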
