Converting Word to PDF and HTML with Docx4j

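I have recently been looking into document management for web applications. Document management is always an important part of enterprise software, and the hardest problem I ran into is how to offer online preview and search for most of the commonly used document formats. If the goal is only to display Word documents, the problem is fairly simple, because Java has many libraries for working with Word. Someone abroad has also written up how to do this kind of conversion with several different frameworks:
https://angelozerr.wordpress.com/2012/12/06/how-to-convert-docxodt-to-pdfhtml-with-java/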

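This article tests docx4j 3.2.1, which turned out to be fairly convenient to use. The framework currently comes in a commercial edition and a free edition; the commercial edition adds conversion of embedded OLE objects. The platform also offers PPTX and XLSX conversion, but its bundled samples do not currently include examples of that.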

The conversion code is as follows:

package com.redxun.core.pdf.docx4j;

import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.commons.io.IOUtils;
import org.docx4j.Docx4J;
import org.docx4j.Docx4jProperties;
import org.docx4j.convert.out.FOSettings;
import org.docx4j.convert.out.HTMLSettings;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.fonts.Mapper;
import org.docx4j.fonts.PhysicalFonts;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

/**
 * Utility class for converting docx documents to PDF and HTML.
 * @author redxun
 */
public class PdfTool {

    /**
     * Converts a docx document to PDF.
     *
     * @param docxPath path of the source docx document
     * @param pdfPath path where the PDF is written
     * @throws Exception may be a Docx4JException, FileNotFoundException, IOException, etc.
     */
    public static void convertDocxToPDF(String docxPath, String pdfPath) throws Exception {
        OutputStream os = null;
        try {
            WordprocessingMLPackage mlPackage = WordprocessingMLPackage.load(new File(docxPath));
            //Mapper fontMapper = new BestMatchingMapper();
            Mapper fontMapper = new IdentityPlusMapper();
            // Map the Chinese font names used in the document to physical fonts.
            // These mappings only help if the fonts are installed on the machine
            // that runs the conversion.
            fontMapper.put("华文行楷", PhysicalFonts.get("STXingkai"));
            fontMapper.put("华文仿宋", PhysicalFonts.get("STFangsong"));
            fontMapper.put("隶书", PhysicalFonts.get("LiSu"));
            mlPackage.setFontMapper(fontMapper);

            os = new FileOutputStream(pdfPath);

            FOSettings foSettings = Docx4J.createFOSettings();
            foSettings.setWmlPackage(mlPackage);
            Docx4J.toFO(foSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
        } finally {
            // No catch block: failures propagate to the caller, as the
            // "throws Exception" signature promises, instead of being swallowed.
            IOUtils.closeQuietly(os);
        }
    }
    
    /**
     * Converts a docx document to HTML.
     *
     * @param docxFilePath path of the source docx document
     * @param htmlPath path where the HTML file is written
     * @throws Exception may be a Docx4JException, FileNotFoundException, IOException, etc.
     */
    public static void convertDocxToHtml(String docxFilePath, String htmlPath) throws Exception {
        WordprocessingMLPackage wordMLPackage = Docx4J.load(new File(docxFilePath));

        HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
        // Store extracted images in an "images" folder next to the HTML file.
        // The prefix already ends with "/", so no extra slash is added.
        String imageFilePath = htmlPath.substring(0, htmlPath.lastIndexOf("/") + 1) + "images";
        htmlSettings.setImageDirPath(imageFilePath);
        htmlSettings.setImageTargetUri("images");
        htmlSettings.setWmlPackage(wordMLPackage);

        // Simple CSS reset applied to the generated page.
        String userCSS = "html, body, div, span, h1, h2, h3, h4, h5, h6, p, a, img,  ol, ul, li, table, caption, tbody, tfoot, thead, tr, th, td " +
                "{ margin: 0; padding: 0; border: 0;}" +
                "body {line-height: 1;} ";
        htmlSettings.setUserCSS(userCSS);

        Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);

        OutputStream os = null;
        try {
            os = new FileOutputStream(htmlPath);
            Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_EXPORT_PREFER_XSL);
        } finally {
            // The original never closed this stream.
            IOUtils.closeQuietly(os);
        }
    }
}
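
The image directory in convertDocxToHtml is derived by cutting the HTML path at its last "/", which is easy to get subtly wrong (doubled separators, Windows-style paths). As a rough sketch only, the same derivation could be done with java.nio.file instead. The class HtmlImagePaths and the method imagesDirFor below are hypothetical names made up for this illustration; they are not part of docx4j.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of docx4j. */
final class HtmlImagePaths {

    private HtmlImagePaths() {
        // static helper, no instances
    }

    /**
     * Returns the "images" directory that sits next to the given HTML file.
     * If the HTML path has no parent (a bare file name), the result is the
     * relative path "images".
     */
    static Path imagesDirFor(String htmlPath) {
        Path parent = Paths.get(htmlPath).getParent();
        return parent == null ? Paths.get("images") : parent.resolve("images");
    }
}
```

Assuming this helper, the image directory could then be set with htmlSettings.setImageDirPath(HtmlImagePaths.imagesDirFor(htmlPath).toString()), leaving the rest of the method unchanged.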

 
