Reading Doc/Docx Documents in Java

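Back-end Java systems often need to read the contents of documents. Below is a simple utility class I wrote today that reads Word documents in both formats (.doc and .docx):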

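1. Add the Apache POI dependencies. Note that the .docx classes used by the utility class (XWPFWordExtractor, OPCPackage) ship in the poi-ooxml artifact, so that artifact must also be on the classpath, in the same version as poi-scratchpad.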

 
        
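        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-scratchpad</artifactId>
            <version>3.8</version>
        </dependency>
        <dependency>
            <groupId>commons-io</groupId>
            <artifactId>commons-io</artifactId>
            <version>2.6</version>
        </dependency>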
        

2. Utility class code

package com.lq.file.word;

/**
 * Description: POIUtil utility class
 * Copyright: Copyright (c) 2019
 * Company: Tope
 *
 * @version 1.0
 */
import org.apache.poi.POIXMLDocument;
import org.apache.poi.POIXMLTextExtractor;
import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class POIUtil {

    /**
     * @Description: Reads a Word file with POI and returns its text as a list
     *               of trimmed lines. Returns null if the path ends in neither
     *               .doc nor .docx, or if reading fails.
     * @create: 2019-07-27 9:48
     * @update logs
     * @throws Exception
     */
    public static List<String> readWord(String filePath) throws Exception {
        List<String> linList = new ArrayList<String>();
        String buffer = "";
        try {
            if (filePath.endsWith(".doc")) {
                // try-with-resources closes the stream even if extraction fails
                try (InputStream is = new FileInputStream(new File(filePath))) {
                    WordExtractor ex = new WordExtractor(is);
                    buffer = ex.getText();
                    ex.close();
                }
                if (buffer.length() > 0) {
                    // Split the text on carriage return + line feed
                    String[] arry = buffer.split("\\r\\n");
                    for (String string : arry) {
                        linList.add(string.trim());
                    }
                }
            } else if (filePath.endsWith(".docx")) {
                OPCPackage opcPackage = POIXMLDocument.openPackage(filePath);
                POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
                buffer = extractor.getText();
                extractor.close();
                if (buffer.length() > 0) {
                    // Split the text on line feed
                    String[] arry = buffer.split("\\n");
                    for (String string : arry) {
                        linList.add(string.trim());
                    }
                }
            } else {
                // Unsupported file extension
                return null;
            }
            return linList;
        } catch (Exception e) {
            System.out.println("error---->" + filePath);
            e.printStackTrace();
            return null;
        }
    }
}
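Two details of readWord are easy to trip over: the extension check is case-sensitive (a path ending in ".DOCX" is treated as unsupported), and the two branches split on different line separators. The following is a minimal sketch of how both steps could be handled in one place, using only the JDK. The class WordTextLines and its methods isDoc, isDocx and toLines are hypothetical names introduced for illustration; they are not part of POIUtil or of Apache POI. Dropping blank lines is also an assumption, since readWord keeps them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper for illustration; it is not part of POIUtil or Apache POI. */
final class WordTextLines {

    private WordTextLines() {
    }

    /** True for paths ending in ".doc", ignoring case (so "A.DOC" matches). */
    static boolean isDoc(String path) {
        return path != null && path.toLowerCase(Locale.ROOT).endsWith(".doc");
    }

    /** True for paths ending in ".docx", ignoring case. */
    static boolean isDocx(String path) {
        return path != null && path.toLowerCase(Locale.ROOT).endsWith(".docx");
    }

    /**
     * Splits text on "\r\n", a lone "\r" or a lone "\n", trims each line
     * and skips lines that are empty after trimming.
     * Assumption: blank lines are not wanted (readWord keeps them).
     */
    static List<String> toLines(String text) {
        List<String> lines = new ArrayList<>();
        if (text == null || text.isEmpty()) {
            return lines;
        }
        // "\r\n" is listed first so it is consumed as a single separator
        for (String line : text.split("\\r\\n|\\r|\\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(isDocx("REPORT.DOCX"));
        System.out.println(toLines("Title\r\n\r\n  first paragraph \nsecond paragraph\n"));
    }
}
```

With a helper like this, the text returned by either extractor's getText() could be passed through toLines, and an empty list would signal "no content" without callers having to check for null.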

 
