Converting PDF to Word (text) in Java

1: Add the dependencies



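<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>pdfToWord</groupId>
    <artifactId>pdfToWord</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
            <version>1.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>fontbox</artifactId>
            <version>2.0.11</version>
        </dependency>
        <dependency>
            <groupId>com.levigo.jbig2</groupId>
            <artifactId>levigo-jbig2-imageio</artifactId>
            <version>2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.pdfbox</groupId>
            <artifactId>pdfbox-tools</artifactId>
            <version>2.0.11</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.6</version>
        </dependency>
    </dependencies>
</project>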

2: Write the conversion method

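import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

/**
 * Converts a PDF to Word format.
 * Note: the output is plain UTF-8 text saved with a .doc extension,
 * not a native Word binary file.
 *
 * @author Angin
 * @date 2019/3/18 0018.
 */
public class PdfToWord {
    /**
     * Extracts the text of the PDF at pdfPath and writes it next to the
     * source file, with the extension replaced by ".doc".
     */
    public void convertText(String pdfPath) {
        // Use the last dot so that file names such as "a.b.pdf" keep their full base name.
        int dot = pdfPath.lastIndexOf('.');
        String docPath = (dot > 0 ? pdfPath.substring(0, dot) : pdfPath) + ".doc";
        // try-with-resources closes the document and the writer even if extraction fails.
        try (PDDocument doc = PDDocument.load(new File(pdfPath));
             Writer writer = new OutputStreamWriter(new FileOutputStream(docPath), StandardCharsets.UTF_8)) {
            PDFTextStripper stripper = new PDFTextStripper();
            int pageNumber = doc.getNumberOfPages();
            // Sort text by position on the page so it comes out in reading order.
            stripper.setSortByPosition(true);
            // Extract every page, from the first to the last.
            stripper.setStartPage(1);
            stripper.setEndPage(pageNumber);
            stripper.writeText(doc, writer);
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println("end..");
    }
}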
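The conversion method derives the output file name by replacing the PDF extension with ".doc". As a minimal sketch of that step on its own, the helper below uses only the JDK and looks for the last dot in the file name rather than in the whole path, so a dot in a directory name does not truncate the result. The class name `OutputPaths` and the method `toDocPath` are hypothetical and are not part of the original post or of PDFBox.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class OutputPaths {
    /**
     * Hypothetical helper: swaps the extension of the file name for ".doc"
     * and leaves the directory part of the path untouched.
     * Assumes pdfPath names a file (not a bare root such as "/").
     */
    static String toDocPath(String pdfPath) {
        Path source = Paths.get(pdfPath);
        String fileName = source.getFileName().toString();
        // Only the last dot of the file name marks the extension.
        int dot = fileName.lastIndexOf('.');
        String baseName = dot > 0 ? fileName.substring(0, dot) : fileName;
        // resolveSibling keeps the parent directory and replaces the file name.
        return source.resolveSibling(baseName + ".doc").toString();
    }

    public static void main(String[] args) {
        System.out.println(toDocPath("summary.v2.pdf"));
        System.out.println(toDocPath("README"));
    }
}
```

A file without any extension simply gets ".doc" appended, which avoids the exception that `substring` throws when no dot is found.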

3: Test it from a main method

/**
 * Tests the conversion from a main method.
 * @author Angin
 * @date 2019/3/18 0018.
 */
public class MainClass {
    public static void main(String[] args) {
        PdfToWord convert = new PdfToWord();
        convert.convertText("E:\\pdfToWord.pdf");
    }
}

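This approach only works for text-based PDFs. PDFTextStripper reads the text layer of the document, so if the pages are images (a scanned PDF, for example), the converted file contains no readable text.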
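One way to notice the image-only case before writing an output file is to check whether the extracted text contains any visible characters. The sketch below is a heuristic under stated assumptions, not a feature of PDFBox: the class `TextLayerCheck`, the method `hasExtractableText`, and the threshold of 10 characters are all hypothetical. It works on a plain string, so it needs nothing beyond the JDK.

```java
final class TextLayerCheck {
    /**
     * Hypothetical heuristic: reports whether the extracted text holds at
     * least minChars non-whitespace characters. An image-only PDF typically
     * yields an empty or whitespace-only string.
     */
    static boolean hasExtractableText(String extractedText, int minChars) {
        if (extractedText == null) {
            return false;
        }
        long visible = extractedText.chars()
                .filter(c -> !Character.isWhitespace(c))
                .count();
        return visible >= minChars;
    }

    public static void main(String[] args) {
        System.out.println(hasExtractableText("Quarterly report 2019", 10)); // prints true
        System.out.println(hasExtractableText("\n\n \r\n", 10));             // prints false
    }
}
```

In the converter, the string to check could come from `new PDFTextStripper().getText(doc)`. If the check fails, the PDF probably needs OCR rather than text extraction.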

Reposted from: https://www.cnblogs.com/angin-iit/p/10551829.html
