I recently ran into this error while extracting text from a PDF:
java.lang.NullPointerException
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:194)
at org.pdfbox.pdmodel.PDPageNode.getAllKids(PDPageNode.java:182)
at org.pdfbox.pdmodel.PDDocumentCatalog.getAllPages(PDDocumentCatalog.java:226)
at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
at com.index.extractor.impl.PdfFileTextExtractor.getText(PdfFileTextExtractor.java:46)
at test.TextConvert.convert(TextConvert.java:147)
at test.TextConvert.getEFiles(TextConvert.java:111)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.getEFiles(TextConvert.java:130)
at test.TextConvert.go(TextConvert.java:47)
at test.TextConvert.main(TextConvert.java:42)
java.lang.Throwable: Warning: You did not close the PDF Document
at org.pdfbox.cos.COSDocument.finalize(COSDocument.java:418)
at java.lang.ref.Finalizer.invokeFinalizeMethod(Native Method)
at java.lang.ref.Finalizer.runFinalizer(Unknown Source)
at java.lang.ref.Finalizer.access$100(Unknown Source)
at java.lang.ref.Finalizer$FinalizerThread.run(Unknown Source)
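A quick web search showed that someone had already reported this bug and that it was fixed in version 0.8. My project was still using FontBox-0.1.0-dev.jar and PDFBox-0.7.3.jar, while the latest releases are already fontbox-1.4.0.jar and pdfbox-1.4.0.jar, and the project has since been donated to Apache.
Download the latest packages, change the file extension from zip to jar, and import them into the project. Note that the Apache releases use the org.apache.pdfbox package instead of org.pdfbox (the package seen in the stack trace above), so the import statements in your own code need to be updated as well.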
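After swapping the jars it is easy to leave the old PDFBox-0.7.3.jar on the classpath by accident. The following is a minimal sketch, not part of the original fix, that uses reflection to report which PDFBox generation is visible at runtime. It assumes only that the old releases expose org.pdfbox.pdmodel.PDDocument and the Apache releases expose org.apache.pdfbox.pdmodel.PDDocument; the class name PdfBoxClasspathCheck and its methods are hypothetical helpers made up for this illustration.

```java
public class PdfBoxClasspathCheck {
    // Pre-Apache releases (such as 0.7.3) live in the org.pdfbox package.
    static final String LEGACY_CLASS = "org.pdfbox.pdmodel.PDDocument";
    // Apache releases (such as 1.4.0) live in the org.apache.pdfbox package.
    static final String APACHE_CLASS = "org.apache.pdfbox.pdmodel.PDDocument";

    // Returns true if the named class can be found on the classpath.
    // The class is located but not initialized, so no PDFBox code runs.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, PdfBoxClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Turns the two lookups into a short human-readable verdict.
    static String describe(boolean legacyFound, boolean apacheFound) {
        if (legacyFound && apacheFound) {
            return "both old and new PDFBox jars are on the classpath; remove the old one";
        }
        if (apacheFound) {
            return "only the Apache PDFBox jar is on the classpath";
        }
        if (legacyFound) {
            return "only the old org.pdfbox jar is on the classpath";
        }
        return "no PDFBox jar found on the classpath";
    }

    public static void main(String[] args) {
        System.out.println(describe(isPresent(LEGACY_CLASS), isPresent(APACHE_CLASS)));
    }
}
```

Run it with the same classpath as the project. If it reports that both jars are present, delete the old one so that the fixed classes are the ones actually loaded.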