Java: Replacing Text and Working with Bookmarks in Word Templates Using POI 3.17

A project requirement called for processing a large number of Word documents.

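After going through a lot of material, I found that many blog posts cover this topic, and that there are two main approaches: jacob and poi. jacob drives Word through COM and therefore needs Windows, while servers today are mostly deployed on Linux, so jacob is basically not an option. That leaves poi as the main tool for these operations.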

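Apache POI's hwpf module is dedicated to reading and writing Word .doc files. In hwpf, a Word .doc document is represented by HWPFDocument. HWPFDocument involves the following concepts:

Range: a range, which may be the whole document, one section of it (Section), one paragraph (Paragraph), or a piece of text that shares the same properties (CharacterRun).

Section: a section of a Word document; a document can consist of several sections.

Paragraph: a paragraph of a Word document; a section can consist of several paragraphs.

CharacterRun: a piece of text with the same properties; a paragraph can consist of several CharacterRuns.

Table: a table.
TableRow: a row of a table.
TableCell: a cell of a table.
Section, Paragraph, CharacterRun and Table all extend Range.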

1. The basic replacement method

        // Load the .doc template with HWPF
        InputStream inputStream = new FileInputStream(modulePath);
        HWPFDocument document = new HWPFDocument(inputStream);
        Range range = document.getRange();
        // maps is assumed to be a Map<String, String>; placeholders in the template look like @key@
        for (Map.Entry<String, String> entry : maps.entrySet()) {
            range.replaceText("@" + entry.getKey() + "@", entry.getValue());
        }
        // Write the filled-in document to the output path
        OutputStream outputStream = new FileOutputStream(outPath);
        document.write(outputStream);
        this.closeStream(outputStream);
        this.closeStream(inputStream);
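The @key@ convention used above can be tried out without POI. The following is only a minimal sketch on a plain String, not the POI call itself: the class name PlaceholderSketch and the method fill are made up for illustration, and Range.replaceText works on the document rather than on a String.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PlaceholderSketch {

    /**
     * Replaces every @key@ marker in the template text with the mapped value.
     * Hypothetical helper that mirrors the loop around range.replaceText.
     */
    public static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            // String.replace substitutes all literal occurrences of the marker
            result = result.replace("@" + entry.getKey() + "@", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("name", "Alice");
        values.put("date", "2018-01-01");
        // prints "Dear Alice, signed on 2018-01-01."
        System.out.println(fill("Dear @name@, signed on @date@.", values));
    }
}
```

Keys that have no marker in the template are simply ignored, and markers without a matching key stay in the text, which makes leftover placeholders easy to spot in the output.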

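This usage is already widespread online, but most of it is based on POI 3.9. POI has since been updated to 3.17, and later versions drop Java 6 support and require at least Java 8, so here we use 3.17 for text replacement and bookmark operations on Word documents.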

We mainly use two classes here (both are largely based on the approach in the demo at http://www.jb51.net/article/101910.htm).

BookMark: a wrapper class for a bookmark in a Word file; it stores the bookmark's definition and the operations on it

package com;
import java.util.List;
import java.util.Stack;

import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.apache.xmlbeans.XmlException;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 *
 * Wrapper class for a bookmark in a Word file; it stores the bookmark's
 * definition and the operations on it
 *
 * @author
 *
 */
public class BookMark {

    // Constants

    /** When filling a bookmark, place the text after the bookmark **/
    public static final int INSERT_AFTER = 0;

    /** When filling a bookmark, place the text in front of the bookmark **/
    public static final int INSERT_BEFORE = 1;

    /** When filling a bookmark, replace the bookmark's content with the text **/
    public static final int REPLACE = 2;

    /** Names of some elements and attributes defined by docx **/
    public static final String RUN_NODE_NAME = "w:r";
    public static final String TEXT_NODE_NAME = "w:t";
    public static final String BOOKMARK_START_TAG = "bookmarkStart";
    public static final String BOOKMARK_END_TAG = "bookmarkEnd";
    public static final String BOOKMARK_ID_ATTR_NAME = "w:id";
    public static final String STYLE_NODE_NAME = "w:rPr";

    /** The underlying bookmark definition **/
    private CTBookmark _ctBookmark = null;

    /** The paragraph that contains the bookmark **/
    private XWPFParagraph _para = null;

    /** The table cell that contains the bookmark **/
    private XWPFTableCell _tableCell = null;

    /** The bookmark's name **/
    private String _bookmarkName = null;

    /** Whether the bookmark is inside a table **/
    private boolean _isCell = false;

    /**
     * Constructor
     * @param ctBookmark
     * @param para
     */
    public BookMark(CTBookmark ctBookmark, XWPFParagraph para) {
        this._ctBookmark = ctBookmark;
        this._para = para;
        this._bookmarkName = ctBookmark.getName();
        this._tableCell = null;
        this._isCell = false;
    }

    /**
     * Constructor for bookmarks inside a table
     * @param ctBookmark
     * @param para
     * @param tableCell
     */
    public BookMark(CTBookmark ctBookmark, XWPFParagraph para, XWPFTableCell tableCell) {
        this(ctBookmark, para);
        this._tableCell = tableCell;
        this._isCell = true;
    }

    public boolean isInTable() {
        return this._isCell;
    }

    public XWPFTable getContainerTable() {
        return this._tableCell.getTableRow().getTable();
    }

    public XWPFTableRow getContainerTableRow() {
        return this._tableCell.getTableRow();
    }

    public String getBookmarkName() {
        return this._bookmarkName;
    }

    /**
     * Insert text into the Word document in the location indicated by this
     * bookmark.
     *
     * @param bookmarkValue An instance of the String class that encapsulates
     * the text to insert into the document.
     * @param where A primitive int whose value indicates where the text ought
     * to be inserted. There are three options controlled by constants: insert
     * the text immediately in front of the bookmark (BookMark.INSERT_BEFORE),
     * insert text immediately after the bookmark (BookMark.INSERT_AFTER) and
     * replace any and all text that appears between the bookmark's square
     * brackets (BookMark.REPLACE).
     */
    public void insertTextAtBookMark(String bookmarkValue, int where) {
        // Handle the bookmark according to its type
        if (this._isCell) {
            this.handleBookmarkedCells(bookmarkValue, where);
        } else {
            // Ordinary bookmark: create a run at the end of the paragraph,
            // then move it to the requested position
            XWPFRun run = this._para.createRun();
            run.setText(bookmarkValue);
            switch (where) {
                case BookMark.INSERT_AFTER:
                    this.insertAfterBookmark(run);
                    break;
                case BookMark.INSERT_BEFORE:
                    this.insertBeforeBookmark(run);
                    break;
                case BookMark.REPLACE:
                    this.replaceBookmark(run);
                    break;
            }
        }
    }

    /**
     * Inserts some text into a Word document in a position that is immediately
     * after a named bookmark.
     *
     * Bookmarks can take two forms: they can either simply mark a location
     * within a document or they can do this but contain some text. The
     * difference is obvious from looking at the XML markup.
     *
     * The simple placeholder bookmark is just a pair of tags where one tag has
     * the name bookmarkStart, the other the name bookmarkEnd and both share
     * matching id attributes. In this case, the text will simply be inserted
     * into the document at a point immediately after the bookmarkEnd tag. No
     * styling will be applied to the text, it will simply inherit the
     * document's defaults.
     *
     * In the more complex case, the user has selected the word 'text' and
     * chosen to insert a bookmark into the document at that point. So, the
     * bookmark tags 'contain' a character run that is styled (a w:r element
     * with a w:rPr child). When inserting any text after this bookmark, it is
     * important to ensure that the styling is preserved and copied over to the
     * newly inserted text.
     *
     * The approach taken to dealing with both cases is similar but slightly
     * different. In both cases, the code simply steps along the document nodes
     * until it finds the bookmarkEnd tag whose ID matches that of the
     * bookmarkStart tag. Then, it will look to see if there is one further node
     * following the bookmarkEnd tag. If there is, it will insert the text into
     * the paragraph immediately in front of this node. If, on the other hand,
     * there are no more nodes following the bookmarkEnd tag, then the new run
     * will simply be positioned at the end of the paragraph.
     *
     * Styles are dealt with by 'looking' for a 'w:rPr' element whilst iterating
     * through the nodes. If one is found, its details will be captured and
     * applied to the run before the run is inserted into the paragraph. If
     * there are multiple runs between the bookmarkStart and bookmarkEnd tags
     * and these have different styles applied to them, then the style applied
     * to the last run before the bookmarkEnd tag - if any - will be cloned and
     * applied to the newly inserted text.
     *
     * @param run An instance of the XWPFRun class that encapsulates the text
     * that is to be inserted into the document following the bookmark.
     */
    private void insertAfterBookmark(XWPFRun run) {
        Node nextNode = null;
        Node insertBeforeNode = null;
        Node styleNode = null;
        int bookmarkStartID = 0;
        int bookmarkEndID = -1;

        // Capture the id of the bookmarkStart tag. The code will step through
        // the document nodes 'contained' within the start and end tags that
        // have matching id numbers.
        bookmarkStartID = this._ctBookmark.getId().intValue();

        // Get the node for the bookmark start tag and then enter a loop that
        // will step from one node to the next until the bookmarkEnd tag with
        // a matching id is found.
        nextNode = this._ctBookmark.getDomNode();
        while (bookmarkStartID != bookmarkEndID) {
            // Get the next node along and check to see if it is a bookmarkEnd
            // tag. If it is, get its id so that the while loop can be
            // terminated once the correct end tag is found. The id is obtained
            // as a String and must be converted into an integer. This has been
            // coded to fail safely: if an error is encountered converting the
            // id to an int value, the while loop will still terminate.
            nextNode = nextNode.getNextSibling();
            if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
                try {
                    bookmarkEndID = Integer.parseInt(
                            nextNode.getAttributes().getNamedItem(
                                    BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
                } catch (NumberFormatException nfe) {
                    bookmarkEndID = bookmarkStartID;
                }
            } else if (nextNode.getNodeName().equals(BookMark.RUN_NODE_NAME)) {
                // Not a bookmarkEnd node but a run node that MAY contain
                // styling information; if so, get it from the run.
                styleNode = this.getStyleNode(nextNode);
            }
        }

        // The loop has located the correct bookmarkEnd tag, but the DOM only
        // offers an 'insert before' operation, so get the next node.
        insertBeforeNode = nextNode.getNextSibling();

        // Style the newly inserted text. The style found in another run is
        // cloned; failing to do this would remove the style from one node and
        // apply it to another.
        if (styleNode != null) {
            run.getCTR().getDomNode().insertBefore(
                    styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild());
        }

        // Finally, check to see if there was a node after the bookmarkEnd
        // tag. If there was, insert the run in front of that node. If there
        // was not, the run is already at the end of the paragraph, which was
        // taken care of at the point of creation.
        if (insertBeforeNode != null) {
            this._para.getCTP().getDomNode().insertBefore(
                    run.getCTR().getDomNode(), insertBeforeNode);
        }
    }

    /**
     * Inserts some text into a Word document immediately in front of the
     * location of a bookmark.
     *
     * This case is slightly more straightforward than inserting after the
     * bookmark. It is only possible to insert a new node in front of an
     * existing node. When inserting after the bookmark, the end node had to be
     * located whereas, in this case, the node is already known: it is the
     * CTBookmark itself. The only information that must be discovered is
     * whether there is a run immediately in front of the bookmarkStart tag and
     * whether that run is styled. If there is and if it is, then this style
     * must be cloned and applied to the text which will be inserted into the
     * paragraph.
     *
     * @param run An instance of the XWPFRun class that encapsulates the text
     * that is to be inserted into the document in front of the bookmark.
     */
    private void insertBeforeBookmark(XWPFRun run) {
        Node insertBeforeNode = null;
        Node childNode = null;
        Node styleNode = null;

        // Get the dom node from the bookmarkStart tag and look for another
        // node immediately preceding it.
        insertBeforeNode = this._ctBookmark.getDomNode();
        childNode = insertBeforeNode.getPreviousSibling();

        // If a node is found, try to get the styling from it.
        if (childNode != null) {
            styleNode = this.getStyleNode(childNode);

            // If that previous node was styled, then apply this style to the
            // text which will be inserted.
            if (styleNode != null) {
                run.getCTR().getDomNode().insertBefore(
                        styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild());
            }
        }

        // Insert the text into the paragraph immediately in front of the
        // bookmarkStart tag.
        this._para.getCTP().getDomNode().insertBefore(
                run.getCTR().getDomNode(), insertBeforeNode);
    }

    /**
     * Replace the text - if any - contained between the bookmarkStart and its
     * matching bookmarkEnd tag with the text specified. The technique used
     * resembles that employed when inserting text after the bookmark: the code
     * iterates along the nodes until it encounters the matching bookmarkEnd
     * tag, removes the nodes found in between (nested bookmarks excepted) and
     * then places the new run between the bookmarkStart and bookmarkEnd tags.
     *
     * @param run An instance of the XWPFRun class that encapsulates the text
     * that is to replace the content of the bookmark.
     */
    private void replaceBookmark(XWPFRun run) {
        Node nextNode = null;
        Node styleNode = null;
        Node lastRunNode = null;
        Stack<Node> nodeStack = new Stack<Node>();
        int bookmarkStartID = 0;
        int bookmarkEndID = -1;

        bookmarkStartID = this._ctBookmark.getId().intValue();
        nextNode = this._ctBookmark.getDomNode();
        nodeStack.push(nextNode);

        // Loop through the nodes looking for a matching bookmarkEnd tag
        while (bookmarkStartID != bookmarkEndID) {
            nextNode = nextNode.getNextSibling();
            nodeStack.push(nextNode);

            // If an end tag is found, does it match the start tag? If so, end
            // the while loop.
            if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
                try {
                    bookmarkEndID = Integer.parseInt(
                            nextNode.getAttributes().getNamedItem(
                                    BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
                } catch (NumberFormatException nfe) {
                    bookmarkEndID = bookmarkStartID;
                }
            }
        }

        // The stack now holds the bookmarkStart tag at the bottom and the
        // matching bookmarkEnd tag on top. The node just below the top is the
        // last node inside the bookmark; if it is a run, get its style - if
        // any - and apply it to the run that will be replacing it. (Peeking at
        // the top of the stack would always yield the bookmarkEnd tag, so the
        // style would never be copied.)
        if (nodeStack.size() > 2) {
            lastRunNode = nodeStack.elementAt(nodeStack.size() - 2);
            if (lastRunNode.getNodeName().equals(BookMark.RUN_NODE_NAME)) {
                styleNode = this.getStyleNode(lastRunNode);
                if (styleNode != null) {
                    run.getCTR().getDomNode().insertBefore(
                            styleNode.cloneNode(true), run.getCTR().getDomNode().getFirstChild());
                }
            }
        }

        // Delete any and all nodes that were found in between the start and
        // end tags. This is slightly safer than trying to delete the nodes
        // as they are found while stepping through them in the loop above.
        this.deleteChildNodes(nodeStack);

        // Place the text into position, between the bookmark tags.
        this._para.getCTP().getDomNode().insertBefore(
                run.getCTR().getDomNode(), nextNode);
    }

    /**
     * When replacing the bookmark's text, it is necessary to delete any nodes
     * that are found between matching start and end tags. Complications occur
     * here because it is possible to have bookmarks nested within bookmarks to
     * almost any level and it is important to not remove any inner or nested
     * bookmarks when replacing the contents of an outer or containing
     * bookmark. This code successfully handles the simplest occurrence - where
     * one bookmark completely contains another - but not more complex cases
     * where one bookmark overlaps another in the markup. That is still to do.
     *
     * @param nodeStack An instance of the Stack class that encapsulates
     * references to any and all nodes found between the opening and closing
     * tags of a bookmark.
     */
    private void deleteChildNodes(Stack<Node> nodeStack) {
        Node toDelete = null;
        int bookmarkStartID = 0;
        int bookmarkEndID = 0;
        boolean inNestedBookmark = false;

        // The first element in the list will be a bookmarkStart tag and that
        // must not be deleted.
        for (int i = 1; i < nodeStack.size(); i++) {
            // Get an element. If it is another bookmarkStart tag then
            // again, we do not want to delete it, its matching end tag
            // or any nodes that fall in between.
            toDelete = nodeStack.elementAt(i);
            if (toDelete.getNodeName().contains(BookMark.BOOKMARK_START_TAG)) {
                bookmarkStartID = Integer.parseInt(
                        toDelete.getAttributes().getNamedItem(
                                BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
                inNestedBookmark = true;
            } else if (toDelete.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
                bookmarkEndID = Integer.parseInt(
                        toDelete.getAttributes().getNamedItem(
                                BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
                if (bookmarkEndID == bookmarkStartID) {
                    inNestedBookmark = false;
                }
            } else if (!inNestedBookmark) {
                this._para.getCTP().getDomNode().removeChild(toDelete);
            }
        }
    }

    /**
     * Recover styling information - if any - from another document node. Note
     * that it is only possible to accomplish this if the node is a run (w:r)
     * and this could be tested for in the code that calls this method. However,
     * a check is made in the calling code as to whether a style has been found
     * and only if a style is found is it applied. This method always returns
     * null if it does not find a style, making that checking process easier.
     *
     * @param parentNode An instance of the Node class that encapsulates a
     * reference to a document node.
     * @return An instance of the Node class that encapsulates the styling
     * information applied to a character run. Note that if no styling
     * information is found in the run OR if the node passed as an argument to
     * the parentNode parameter is NOT a run, then a null value will be
     * returned.
     */
    private Node getStyleNode(Node parentNode) {
        // Only a run that has child nodes can be processed further. Note,
        // whilst testing the code, it was observed that trying to obtain the
        // list of a node's children would often return a null value even when
        // the node did have children. This is the reason why the technique of
        // stepping from one node to the next is used here.
        if (parentNode != null
                && parentNode.getNodeName().equalsIgnoreCase(BookMark.RUN_NODE_NAME)
                && parentNode.hasChildNodes()) {
            // Step through the child nodes until either a style node (w:rPr)
            // is found or all child nodes have been processed.
            for (Node childNode = parentNode.getFirstChild();
                    childNode != null; childNode = childNode.getNextSibling()) {
                if (childNode.getNodeName().equals(BookMark.STYLE_NODE_NAME)) {
                    return childNode;
                }
            }
        }
        return null;
    }

    /**
     * Get the text - if any - encapsulated by this bookmark. The creator of a
     * Word document can choose to select one or more items of text and then
     * insert a bookmark at that location. The highlighted text will appear
     * between the square brackets that denote the location of a bookmark in the
     * document's text and it will be returned by a call to this method.
     *
     * @return An instance of the String class encapsulating any text that
     * appeared between the opening and closing square bracket associated with
     * this bookmark.
     * @throws XmlException Thrown if a problem is encountered parsing the XML
     * markup recovered from the document in order to construct a CTText
     * instance which may be required to obtain the bookmark's text.
     */
    public String getBookmarkText() throws XmlException {
        StringBuilder builder = null;

        // Are we dealing with a bookmarked table cell? If so, the entire
        // contents of the cell - if anything - must be recovered and returned.
        if (this._tableCell != null) {
            builder = new StringBuilder(this._tableCell.getText());
        } else {
            builder = this.getTextFromBookmark();
        }
        return (builder == null ? null : builder.toString());
    }

    /**
     * There are two types of bookmarks. One is a simple placeholder whilst the
     * second is still a placeholder but it 'contains' some text. In the second
     * instance, the creator of the document has selected some text and then
     * chosen to insert a bookmark there, and the difference is obvious when
     * looking at the XML markup: in the simple case the bookmarkStart and
     * bookmarkEnd tags are adjacent, in the more complex case they enclose a
     * styled character run holding the word 'text'.
     *
     * This method assumes that the user wishes to recover the content from any
     * character run that appears in the markup between a matching pair of
     * bookmarkStart and bookmarkEnd tags; thus, for the complex case just
     * described, this method would return the String 'text' to the user. It is
     * possible however for a bookmark to contain more than one run and for a
     * bookmark to contain other bookmarks. In both of these cases, this code
     * will return the text contained within any and all runs that appear in
     * the XML markup between matching bookmarkStart and bookmarkEnd tags. The
     * term 'matching bookmarkStart and bookmarkEnd tags' here means tags whose
     * id attributes have matching values.
     *
     * @return An instance of the StringBuilder class encapsulating the text
     * recovered from any character run elements found between the bookmark's
     * start and end tags. If no text is found, the builder will be empty.
     * @throws XmlException Thrown if a problem is encountered parsing the XML
     * markup recovered from the document in order to construct a CTText
     * instance which may be required to obtain the bookmark's text.
     */
    private StringBuilder getTextFromBookmark() throws XmlException {
        int startBookmarkID = 0;
        int endBookmarkID = -1;
        Node nextNode = null;
        StringBuilder builder = null;

        // Get the ID of the bookmark from its start tag, the DOM node from the
        // bookmark (to make looping easier) and initialise the StringBuilder.
        startBookmarkID = this._ctBookmark.getId().intValue();
        nextNode = this._ctBookmark.getDomNode();
        builder = new StringBuilder();

        // Loop through the nodes held between the bookmark's start and end
        // tags.
        while (startBookmarkID != endBookmarkID) {
            // Get the next node and, if it is a bookmarkEnd tag, get its ID
            // as matching ids will terminate the while loop.
            nextNode = nextNode.getNextSibling();
            if (nextNode.getNodeName().contains(BookMark.BOOKMARK_END_TAG)) {
                // Get the ID attribute from the node. It is a String that must
                // be converted into an int. An exception could be thrown and so
                // the catch clause will ensure the loop ends neatly even if the
                // value might be incorrect. Must inform the user.
                try {
                    endBookmarkID = Integer.parseInt(
                            nextNode.getAttributes().getNamedItem(
                                    BookMark.BOOKMARK_ID_ATTR_NAME).getNodeValue());
                } catch (NumberFormatException nfe) {
                    endBookmarkID = startBookmarkID;
                }
            } else if (nextNode.getNodeName().equals(BookMark.RUN_NODE_NAME)
                    && nextNode.hasChildNodes()) {
                // This is not a bookmarkEnd node and can be processed for any
                // text it may contain. Note the check for both type - it must
                // be a run - and for children.
                builder.append(this.getTextFromChildNodes(nextNode));
            }
        }
        return (builder);
    }

    /**
     * Iterates through all and any children of the Node whose reference will be
     * passed as an argument to the node parameter, and recovers the contents of
     * any text nodes. Testing revealed that a node can be called a text node
     * and yet report its type as being something different, an element node
     * for example. Calling the getNodeValue() method on a text node will return
     * the text the node encapsulates but doing the same on an element node will
     * not. In fact, the call will simply return a null value. As a result, this
     * method will test the node's name to catch all text nodes - those whose
     * name is 'w:t' - and then its type. If the type is reported to be a text
     * node, it is a trivial task to get at its contents. However, if the type
     * is not reported as a text type, then it is necessary to parse the raw XML
     * markup for the node to recover its value.
     *
     * @param node An instance of the Node class that encapsulates a reference
     * to a node recovered from the document being processed. It should be
     * passed a reference to a character run - 'w:r' - node.
     * @return An instance of the String class that encapsulates the text
     * recovered from the node's children, if they are text nodes.
     * @throws XmlException Thrown if a problem is encountered parsing the XML
     * markup recovered from the document in order to construct the CTText
     * instance which may be required to obtain the bookmark's text.
     */
    private String getTextFromChildNodes(Node node) throws XmlException {
        NodeList childNodes = null;
        Node childNode = null;
        CTText text = null;
        StringBuilder builder = new StringBuilder();
        int numChildNodes = 0;

        // Get a list of child nodes from the node passed to the method and
        // find out how many children there are in the list.
        childNodes = node.getChildNodes();
        numChildNodes = childNodes.getLength();

        // Iterate through the children one at a time - it is possible for a
        // run to contain zero, one or more text nodes - and recover the text
        // from any text type child nodes.
        for (int i = 0; i < numChildNodes; i++) {
            // Get a node and check its name. If this is 'w:t' then process as
            // text type node.
            childNode = childNodes.item(i);
            if (childNode.getNodeName().equals(BookMark.TEXT_NODE_NAME)) {
                // If the node reports its type as text, then simply call the
                // getNodeValue() method to get at its text.
                if (childNode.getNodeType() == Node.TEXT_NODE) {
                    builder.append(childNode.getNodeValue());
                } else {
                    // Correct the type by parsing the node's XML markup and
                    // creating a CTText object. Call the getStringValue()
                    // method on that to get the text.
                    text = CTText.Factory.parse(childNode);
                    builder.append(text.getStringValue());
                }
            }
        }
        return (builder.toString());
    }

    private void handleBookmarkedCells(String bookmarkValue, int where) {
        // Remove any and all paragraphs from the table cell. Iterate backwards
        // so that removing a paragraph does not shift the indices of the ones
        // still to be removed. Note that the where argument is not used here:
        // the cell content is always replaced.
        for (int i = this._tableCell.getParagraphs().size() - 1; i >= 0; i--) {
            this._tableCell.removeParagraph(i);
        }
        XWPFParagraph para = this._tableCell.addParagraph();
        para.createRun().setText(bookmarkValue);
    }
}
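The sibling walk that BookMark relies on (start at bookmarkStart, step to the next sibling until the bookmarkEnd with the same id) can be tried with the JDK's DOM parser alone. This is a rough sketch rather than the class above: the class name BookmarkWalkSketch, its methods and the hand-written sample XML are assumptions made for illustration, and a real docx paragraph carries far more markup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BookmarkWalkSketch {

    // Hand-written sample paragraph: the bookmark "who" wraps one styled run
    static final String SAMPLE =
            "<w:p xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
            + "<w:r><w:t>Hello </w:t></w:r>"
            + "<w:bookmarkStart w:id='0' w:name='who'/>"
            + "<w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>"
            + "<w:bookmarkEnd w:id='0'/>"
            + "</w:p>";

    /** Returns the text of all runs between the named bookmark's tags, or null if not found. */
    public static String textInBookmark(String paragraphXml, String bookmarkName) {
        Document doc = parse(paragraphXml);
        NodeList starts = doc.getElementsByTagName("w:bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            Node start = starts.item(i);
            if (!bookmarkName.equals(attr(start, "w:name"))) {
                continue;
            }
            String id = attr(start, "w:id");
            StringBuilder text = new StringBuilder();
            // Step from sibling to sibling; stop at the end tag with the same id.
            // The null check ends the walk if no matching end tag follows.
            for (Node n = start.getNextSibling(); n != null; n = n.getNextSibling()) {
                if ("w:bookmarkEnd".equals(n.getNodeName()) && Objects.equals(id, attr(n, "w:id"))) {
                    break;
                }
                if ("w:r".equals(n.getNodeName())) {
                    text.append(runText(n));
                }
            }
            return text.toString();
        }
        return null;
    }

    // Concatenates the content of every w:t child of a run
    private static String runText(Node run) {
        StringBuilder text = new StringBuilder();
        for (Node c = run.getFirstChild(); c != null; c = c.getNextSibling()) {
            if ("w:t".equals(c.getNodeName())) {
                text.append(c.getTextContent());
            }
        }
        return text.toString();
    }

    // Reads an attribute by its qualified name; null for non-elements or missing attributes
    private static String attr(Node n, String name) {
        if (n.getAttributes() == null) {
            return null;
        }
        Node a = n.getAttributes().getNamedItem(name);
        return a == null ? null : a.getNodeValue();
    }

    private static Document parse(String xml) {
        try {
            // The default factory is not namespace-aware, so node names keep the w: prefix
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textInBookmark(SAMPLE, "who")); // prints "world"
    }
}
```

Unlike the loops in BookMark, this walk stops when it runs out of siblings, which is one way to avoid a NullPointerException when a bookmark ends in a different paragraph from the one it starts in.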
BookMarks: a wrapper around POI's Word-related operations, for files in docx format
package com;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBookmark;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Collection;
import java.util.Set;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

/**
 *
 * Wrapper around POI's Word-related operations, for files in docx format
 *
 * @author
 *
 */
public class BookMarks {

    /** Bookmarks defined in the Word file, keyed by bookmark name **/
    private HashMap<String, BookMark> _bookmarks = null;

    /**
     * Constructor; analyses the document and collects all of its bookmarks
     *
     * @param document Word OOXML document instance.
     */
    public BookMarks(XWPFDocument document) {
        // Initialise the bookmark cache
        this._bookmarks = new HashMap<String, BookMark>();

        // First collect the bookmarks in the document's ordinary paragraphs
        this.procParaList(document.getParagraphs());

        // Then collect the bookmarks from every table the long way round;
        // the handling is rather crude and simple
        List<XWPFTable> tableList = document.getTables();
        for (XWPFTable table : tableList) {
            // Get the rows of the table
            List<XWPFTableRow> rowList = table.getRows();
            for (XWPFTableRow row : rowList) {
                // Get the cells of the row
                List<XWPFTableCell> cellList = row.getTableCells();
                for (XWPFTableCell cell : cellList) {
                    // Collect the bookmarks of each cell in turn
                    this.procParaList(cell);
                }
            }
        }
    }

    /**
     * Returns the definition of the bookmark with the given name, or null if
     * there is no such bookmark
     * @param bookmarkName the bookmark's name
     * @return the wrapped bookmark object
     */
    public BookMark getBookmark(String bookmarkName) {
        // HashMap.get returns null for an unknown name
        return this._bookmarks.get(bookmarkName);
    }

    /**
     * Returns the collection of all bookmarks
     *
     * @return the cached bookmarks
     */
    public Collection<BookMark> getBookmarkList() {
        return (this._bookmarks.values());
    }

    /**
     * Returns an iterator over the names of the bookmarks in the document
     * @return an iterator built from the map's key set
     */
    public Iterator<String> getNameIterator() {
        return (this._bookmarks.keySet().iterator());
    }

    private void procParaList(XWPFTableCell cell) {
        List<XWPFParagraph> paragraphList = cell.getParagraphs();
        for (XWPFParagraph paragraph : paragraphList) {
            // Get the bookmark start tags in the paragraph
            List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();
            for (CTBookmark bookmark : bookmarkList) {
                this._bookmarks.put(bookmark.getName(),
                        new BookMark(bookmark, paragraph, cell));
            }
        }
    }

    /**
     * Collects the bookmarks inside a table
     * @param paragraphList the paragraphs to scan
     * @param tableRow the table row the paragraphs belong to
     */
    private void procParaList(List<XWPFParagraph> paragraphList, XWPFTableRow tableRow) {
        NamedNodeMap attributes = null;
        Node colFirstNode = null;
        Node colLastNode = null;
        int firstColIndex = 0;
        int lastColIndex = 0;

        // Loop over the paragraphs and collect their bookmarks
        for (XWPFParagraph paragraph : paragraphList) {
            // Get the bookmark start tags in the paragraph
            List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();
            for (CTBookmark bookmark : bookmarkList) {
                // With a bookmark in hand, test to see if the bookmarkStart tag
                // has w:colFirst or w:colLast attributes. If it does, we are
                // dealing with a bookmarked table cell. This will need to be
                // handled differently - I think by a different concrete class
                // that implements the Bookmark interface!!
                attributes = bookmark.getDomNode().getAttributes();
                if (attributes != null) {
                    // Get the colFirst and colLast attributes. If both - for
                    // now - are found, then we are dealing with a bookmarked
                    // cell.
                    colFirstNode = attributes.getNamedItem("w:colFirst");
                    colLastNode = attributes.getNamedItem("w:colLast");
                    if (colFirstNode != null && colLastNode != null) {
                        // Get the index of the cell (or cells later) from them.
                        // First convert the String values both return to
                        // primitive int values. TO DO, what happens if there is
                        // a NumberFormatException.
                        firstColIndex = Integer.parseInt(colFirstNode.getNodeValue());
                        lastColIndex = Integer.parseInt(colLastNode.getNodeValue());
                        // If the indices are equal, then we are dealing with a
                        // cell and can create the bookmark for it.
                        if (firstColIndex == lastColIndex) {
                            this._bookmarks.put(bookmark.getName(),
                                    new BookMark(bookmark, paragraph,
                                            tableRow.getCell(firstColIndex)));
                        } else {
                            System.out.println("This bookmark " + bookmark.getName()
                                    + " identifies a number of cells in the "
                                    + "table. That condition is not handled yet.");
                        }
                    } else {
                        this._bookmarks.put(bookmark.getName(),
                                new BookMark(bookmark, paragraph, tableRow.getCell(1)));
                    }
                } else {
                    this._bookmarks.put(bookmark.getName(),
                            new BookMark(bookmark, paragraph, tableRow.getCell(1)));
                }
            }
        }
    }

    /**
     * Collects the bookmarks in ordinary paragraphs
     * @param paragraphList the paragraphs to scan
     */
    private void procParaList(List<XWPFParagraph> paragraphList) {
        for (XWPFParagraph paragraph : paragraphList) {
            List<CTBookmark> bookmarkList = paragraph.getCTP().getBookmarkStartList();
            // Add the bookmarks one by one
            for (CTBookmark bookmark : bookmarkList) {
                this._bookmarks.put(bookmark.getName(), new BookMark(bookmark, paragraph));
            }
        }
    }
}
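BookMarks builds a name-to-bookmark map and distinguishes bookmarks in ordinary paragraphs from bookmarks inside table cells. The same idea can be sketched with the JDK's DOM parser on a hand-written body fragment. The class name BookmarkIndexSketch, the method indexByName and the "paragraph"/"cell" labels are assumptions for illustration only; the real class stores BookMark objects and walks POI's XWPF model instead of raw XML.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class BookmarkIndexSketch {

    // Hand-written sample: one bookmark in a paragraph, one inside a table cell
    static final String SAMPLE =
            "<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'>"
            + "<w:p><w:bookmarkStart w:id='0' w:name='title'/><w:bookmarkEnd w:id='0'/></w:p>"
            + "<w:tbl><w:tr><w:tc><w:p>"
            + "<w:bookmarkStart w:id='1' w:name='amount'/><w:bookmarkEnd w:id='1'/>"
            + "</w:p></w:tc></w:tr></w:tbl>"
            + "</w:body>";

    /** Maps each bookmark name to "cell" if it sits inside a w:tc element, else "paragraph". */
    public static Map<String, String> indexByName(String bodyXml) {
        Document doc;
        try {
            // The default factory is not namespace-aware, so node names keep the w: prefix
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bodyXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
        Map<String, String> index = new LinkedHashMap<>();
        NodeList starts = doc.getElementsByTagName("w:bookmarkStart");
        for (int i = 0; i < starts.getLength(); i++) {
            Element start = (Element) starts.item(i);
            boolean inCell = false;
            // Climb the ancestors looking for a table cell
            for (Node p = start.getParentNode(); p != null; p = p.getParentNode()) {
                if ("w:tc".equals(p.getNodeName())) {
                    inCell = true;
                    break;
                }
            }
            // As with the HashMap in BookMarks, a later bookmark with the same name wins
            index.put(start.getAttribute("w:name"), inCell ? "cell" : "paragraph");
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(indexByName(SAMPLE)); // prints {title=paragraph, amount=cell}
    }
}
```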

The utility class used: MSWordTool

package com;

import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

import org.apache.poi.POIXMLDocument;
import org.apache.poi.xwpf.usermodel.ParagraphAlignment;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTHeight;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTrPr;
import org.w3c.dom.Node;

/**
 * Word operations implemented with POI.
 *
 * @author    xuyu
 *
 * Modification History:
 *
 * Date Author Description
 * ------------------------------------------------------------------
 *
 */
public class MSWordTool {

    /** The document object used internally **/
    private XWPFDocument document;

    private BookMarks bookMarks = null;

    /**
     * Sets the template for the document.
     * @param templatePath path of the template file
     */
    public void setTemplate(String templatePath) {
        try {
            this.document = new XWPFDocument(
                    POIXMLDocument.openPackage(templatePath));
            bookMarks = new BookMarks(document);
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }

    /**
     * Example of bookmark replacement. In the Map passed in, the key is the
     * bookmark name and the value is the replacement text.
     * @param indicator bookmark name to replacement text
     */
    public void replaceBookMark(Map<String, String> indicator) {
        // Replace the bookmarks one by one
        Iterator<String> bookMarkIter = bookMarks.getNameIterator();
        while (bookMarkIter.hasNext()) {
            String bookMarkName = bookMarkIter.next();
            // Look up the bookmark by name
            BookMark bookMark = bookMarks.getBookmark(bookMarkName);
            // Replace only when a value was supplied for this bookmark
            if (indicator.get(bookMarkName) != null) {
                bookMark.insertTextAtBookMark(indicator.get(bookMarkName), BookMark.INSERT_BEFORE);
            }
        }
    }

    public void fillTableAtBookMark(String bookMarkName, List<Map<String, String>> content) {
        // rowNum records which table row the bookmark is in
        int rowNum = 0;
        // First get the bookmark
        BookMark bookMark = bookMarks.getBookmark(bookMarkName);
        Map<String, String> columnMap = new HashMap<String, String>();
        Map<String, Node> styleNode = new HashMap<String, Node>();
        // Is the bookmark inside a table?
        if(bookMark.isInTable()){
            // Get the Table and Row objects that contain the bookmark
            XWPFTable table = bookMark.getContainerTable();
            XWPFTableRow row = bookMark.getContainerTableRow();
            CTRow ctRow = row.getCtRow();
            List<XWPFTableCell> rowCell = row.getTableCells();
            for(int i = 0; i < rowCell.size(); i++){
                columnMap.put(i+"", rowCell.get(i).getText().trim());
                //System.out.println(rowCell.get(i).getParagraphs().get(0).createRun().getFontSize());
                //System.out.println(rowCell.get(i).getParagraphs().get(0).getCTP());
                //System.out.println(rowCell.get(i).getParagraphs().get(0).getStyle());
                // Get the XML of the cell's paragraph, i.e. its root node
                Node node1 = rowCell.get(i).getParagraphs().get(0).getCTP().getDomNode();
                // Iterate over all child nodes of the root node
                for (int x=0;x ...
                // ... (the listing is cut off here in the source) ...
                List<XWPFTableCell> cells = newRow.getTableCells();
                for(int j = 0; j < cells.size(); j++){
                    XWPFParagraph para = cells.get(j).getParagraphs().get(0);
                    XWPFRun run = para.createRun();
                    if(content.get(i-rowNum).get(columnMap.get(j+"")) != null){
                        // Change the cell value; header cells keep their value
                        run.setText(content.get(i-rowNum).get(columnMap.get(j+""))+"");
                        // Give the new run the font formatting of the original cell
                        run.getCTR().getDomNode().insertBefore(styleNode.get(j+"").cloneNode(true),
                                run.getCTR().getDomNode().getFirstChild());
                    }
                    para.setAlignment(ParagraphAlignment.CENTER);
                }
            }
        }
    }

    public void replaceText(Map<String, String> bookmarkMap, String bookMarkName) {
        // First get the bookmark
        BookMark bookMark = bookMarks.getBookmark(bookMarkName);
        // Get the table marked by the bookmark
        XWPFTable table = bookMark.getContainerTable();
        // Get all tables
        //Iterator it = document.getTablesIterator();
        if(table != null){
            // Walk all rows of the table
            int rcount = table.getNumberOfRows();
            for(int i = 0; i < rcount; i++){
                XWPFTableRow row = table.getRow(i);
                // Get all cells of this row
                List<XWPFTableCell> cells = row.getTableCells();
                for(XWPFTableCell c : cells){
                    for(Entry<String, String> e : bookmarkMap.entrySet()){
                        if(c.getText().equals(e.getKey())){
                            // Remove the cell content
                            c.removeParagraph(0);
                            // Assign the new value to the cell
                            c.setText(e.getValue());
                        }
                    }
                }
            }
        }
    }

    public void saveAs() {
        File newFile = new File("e:\\test\\Word模版_REPLACE.docx");
        // try-with-resources closes the stream even when write() fails, and
        // avoids writing to a null stream if the file cannot be opened
        try (FileOutputStream fos = new FileOutputStream(newFile)) {
            this.document.write(fos);
            fos.flush();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
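In `fillTableAtBookMark`, the font formatting of the original cell is carried over by cloning a stored DOM node and inserting it in front of the first child of the new run. The sketch below shows that `cloneNode(true)` / `insertBefore` step on its own, with plain JDK DOM elements standing in for the nodes POI exposes. It assumes the stored style node is a `w:rPr` (run properties) element; `RunStyleCopy` and its methods are hypothetical names, not part of POI or of the original code.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical demo class; it uses only the JDK DOM API, not POI. */
final class RunStyleCopy {

    private RunStyleCopy() { }

    /** Returns the first direct w:rPr child of the run, or null if there is none. */
    static Node findRunProperties(Node run) {
        for (Node child = run.getFirstChild(); child != null; child = child.getNextSibling()) {
            if ("w:rPr".equals(child.getNodeName())) {
                return child;
            }
        }
        return null;
    }

    /** Inserts a deep copy of the style node as the first child of the target run. */
    static void copyStyle(Node styleNode, Node targetRun) {
        if (styleNode == null) {
            return; // no formatting was recorded for this column, so leave the run as it is
        }
        targetRun.insertBefore(styleNode.cloneNode(true), targetRun.getFirstChild());
    }

    /** Builds two runs in memory and copies the style of the first onto the second. */
    static Element demo() {
        Document doc = newDocument();

        // Template run: <w:r><w:rPr><w:b/></w:rPr></w:r>
        Element template = doc.createElement("w:r");
        Element runProperties = doc.createElement("w:rPr");
        runProperties.appendChild(doc.createElement("w:b"));
        template.appendChild(runProperties);

        // Freshly created run that only holds text: <w:r><w:t>85</w:t></w:r>
        Element fresh = doc.createElement("w:r");
        Element text = doc.createElement("w:t");
        text.setTextContent("85");
        fresh.appendChild(text);

        copyStyle(findRunProperties(template), fresh);
        return fresh;
    }

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The null check in `copyStyle` is a design choice of this sketch: a column for which no style node was stored is simply left unformatted rather than raising a `NullPointerException`.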

Test method

    /**
     * @param args
     */
    public static void main(String[] args) {
        long startTime = System.currentTimeMillis();
        MSWordTool changer = new MSWordTool();
        changer.setTemplate("E:\\test\\Word.docx");
        Map<String, String> content = new HashMap<String, String>();
        content.put("Principles", "格式规范、标准统一、利于阅览");
        content.put("Purpose", "规范会议操作、提高会议质量");
        content.put("Scope", "公司会议、部门之间业务协调会议");

        content.put("customerName", "**有限公司");
        content.put("address", "机场路2号");
        content.put("userNo", "3021170207");
        content.put("tradeName", "水泥制造");
        content.put("price1", "1.085");
        content.put("price2", "0.906");
        content.put("price3", "0.433");
        content.put("numPrice", "0.675");

        content.put("company_name", "**有限公司");
        content.put("company_address", "机场路2号");
        changer.replaceBookMark(content);

        // Fill the tables marked by bookmarks
        List<Map<String, String>> content2 = new ArrayList<Map<String, String>>();
        Map<String, String> table1 = new HashMap<String, String>();

        table1.put("MONTH", "*月份");
        table1.put("SALE_DEP", "75分");
        table1.put("TECH_CENTER", "80分");
        table1.put("CUSTOMER_SERVICE", "85分");
        table1.put("HUMAN_RESOURCES", "90分");
        table1.put("FINANCIAL", "95分");
        table1.put("WORKSHOP", "80分");
        table1.put("TOTAL", "85分");

        for (int i = 0; i < 3; i++) {
            content2.add(table1);
        }
        changer.fillTableAtBookMark("Table", content2);
        changer.fillTableAtBookMark("month", content2);

        // Replace text inside a table
        Map<String, String> table = new HashMap<String, String>();
        table.put("CUSTOMER_NAME", "**有限公司");
        table.put("ADDRESS", "机场路2号");
        table.put("USER_NO", "3021170207");
        table.put("tradeName", "水泥制造");
        table.put("PRICE_1", "1.085");
        table.put("PRICE_2", "0.906");
        table.put("PRICE_3", "0.433");
        table.put("NUM_PRICE", "0.675");
        changer.replaceText(table, "Table2");

        // Save the Word document after replacement
        changer.saveAs();
        System.out.println("time==" + (System.currentTimeMillis() - startTime));
    }
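The keys used in `table1` (MONTH, SALE_DEP and so on) matter because `fillTableAtBookMark` reads the trimmed text of every cell in the bookmarked row and uses it as the lookup key into each row map; a key with no value leaves the cell untouched. The lookup can be sketched with plain collections standing in for the Word table. `RowMappingDemo` is a hypothetical name, the sample values are made up for illustration, and an empty string stands for a cell that would be left unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical demo of the key lookup; plain lists stand in for the Word table. */
final class RowMappingDemo {

    private RowMappingDemo() { }

    /**
     * For every row map, produces one list of cell texts by looking up the
     * trimmed placeholder text of each column. A missing key yields "".
     */
    static List<List<String>> resolve(List<String> placeholderRow,
                                      List<Map<String, String>> content) {
        List<List<String>> rows = new ArrayList<>();
        for (Map<String, String> rowData : content) {
            List<String> cells = new ArrayList<>();
            for (String placeholder : placeholderRow) {
                String value = rowData.get(placeholder.trim());
                cells.add(value != null ? value : "");
            }
            rows.add(cells);
        }
        return rows;
    }

    /** Two rows of illustrative data resolved against three placeholder columns. */
    static List<List<String>> demo() {
        Map<String, String> first = new HashMap<>();
        first.put("MONTH", "January");
        first.put("TOTAL", "85");

        Map<String, String> second = new HashMap<>();
        second.put("MONTH", "February");
        second.put("SALE_DEP", "75");
        second.put("TOTAL", "90");

        List<Map<String, String>> content = new ArrayList<>();
        content.add(first);
        content.add(second);

        // The middle placeholder carries stray whitespace on purpose
        return resolve(Arrays.asList("MONTH", " SALE_DEP ", "TOTAL"), content);
    }
}
```

Unlike the test method, which adds the same `table1` instance three times, the sketch builds a separate map per row, which is what real data would normally need.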

The Word document used in this article also comes from the project at (http://www.jb51.net/article/101910.htm). It works fine in testing.

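The main difference is that the original project uses POI 3.9. Compiling the same code against 3.17 produces errors, because some methods have changed.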


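Because of these changes, some methods can no longer be used and additional jars have to be added. The 3.17 package can be downloaded from the official POI site.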

Download link: https://poi.apache.org/download.html

After downloading and extracting the archive, its contents look like this:

[Figure: contents of the extracted POI 3.17 package]

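The ooxml-lib directory is the part that was not there before, or rather was split out after the changes. Adding its jars to the project is enough.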


Of course, the POI jars themselves have to be added as well. I have tested this setup and it works.

If you need it, you can clone the repository and take a look: https://github.com/cocoforgod/J2W

