Using POI to convert Word 2007 (.docx) to HTML

 

POI version: 3.9

http://poi.apache.org/

 

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.FileURIResolver;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
//import org.junit.Assert;
//import org.junit.Test;

public class word07toHtml {

	//@Test
	public static void canExtractImage() throws IOException {
		File f = new File("d:/test/test.docx");
		if (!f.exists()) {
			System.out.println("Sorry, the file does not exist!");
			return;
		}
		// Case-insensitive check, so "test.Docx" is accepted as well
		if (!f.getName().toLowerCase().endsWith(".docx")) {
			System.out.println("Enter only MS Office 2007+ (.docx) files");
			return;
		}

		// 1) Load DOCX into XWPFDocument; try-with-resources (Java 7+) closes the stream
		XWPFDocument document;
		try (InputStream in = new FileInputStream(f)) {
			document = new XWPFDocument(in);
		}

		// 2) Prepare XHTML options: the extractor writes the embedded images
		// under d:/test/media, and the URI resolver points <img> src there
		File imageFolderFile = new File("d:/test/media");
		XHTMLOptions options = XHTMLOptions.create().URIResolver(
				new FileURIResolver(imageFolderFile));
		options.setExtractor(new FileImageExtractor(imageFolderFile));
		//options.setIgnoreStylesIfUnused(false);
		//options.setFragment(true);

		// 3) Convert XWPFDocument to XHTML; the output stream is closed even on failure
		try (OutputStream out = new FileOutputStream(new File("d:/test/test.htm"))) {
			XHTMLConverter.getInstance().convert(document, out, options);
		}
	}

	public static void main(String[] args) {
		try {
			canExtractImage();
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}
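The example above hard-codes d:/test/test.docx, d:/test/test.htm and d:/test/media. A minimal sketch of how those paths could instead be derived from any input file follows; the class `DocxPaths` and its methods are hypothetical helpers, not part of POI or XDocReport, and the "_media" folder naming is just an assumption for illustration.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper class (not from POI/XDocReport) for deriving file names
public class DocxPaths {

    // Case-insensitive ".docx" test: accepts "a.docx", "a.DOCX", "a.Docx"
    public static boolean isDocx(File f) {
        return f.getName().toLowerCase(Locale.ROOT).endsWith(".docx");
    }

    // Strips the last extension: "report.docx" -> "report"
    private static String baseName(File f) {
        String name = f.getName();
        int dot = name.lastIndexOf('.');
        return (dot > 0) ? name.substring(0, dot) : name;
    }

    // Output HTML next to the input: dir/report.docx -> dir/report.htm
    public static File htmlFileFor(File docx) {
        return new File(docx.getParentFile(), baseName(docx) + ".htm");
    }

    // Assumed naming scheme for the image folder: dir/report.docx -> dir/report_media
    public static File imageFolderFor(File docx) {
        return new File(docx.getParentFile(), baseName(docx) + "_media");
    }
}
```

canExtractImage() could then take a File parameter and pass `DocxPaths.htmlFileFor(f)` to the FileOutputStream and `DocxPaths.imageFolderFor(f)` to FileURIResolver and FileImageExtractor, so each document gets its own image folder.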

 

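The org.apache.poi.xwpf.converter classes are not part of POI itself; they come from an extension library (the XDocReport POI converters).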

If your project uses Maven, the configuration below is all you need; if it does not, download the jars from the attachment to this post.

Converter version 1.0.4 works with POI 3.9.

Converter version 1.0.0 works with POI 3.8.

For example, these imports come from the extension:

import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;

Required jars (Maven dependencies):

<dependencies>
	<dependency>
		<groupId>fr.opensagres.xdocreport</groupId>
		<artifactId>org.apache.poi.xwpf.converter.core</artifactId>
		<version>1.0.4</version>
	</dependency>
	<dependency>
		<groupId>fr.opensagres.xdocreport</groupId>
		<artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
		<version>1.0.4</version>
	</dependency>
</dependencies>

 

If you get this error:

java.lang.ClassNotFoundException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTSectPrImpl$1HeaderReferenceList

add ooxml-schemas-1.1.jar to the classpath.

java.lang.ClassNotFoundException: org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTBodyImpl$1TblList

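This one also needs ooxml-schemas-1.1.jar. Both errors have the same cause: the poi-ooxml-schemas jar that ships with POI contains only a subset of the OOXML schema classes, while ooxml-schemas-1.1.jar contains the complete set.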

With Maven it is downloaded automatically; without Maven, download ooxml-schemas-1.1.rar from the attachment to this post and extract it.
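To confirm whether the full schema jar is actually on the classpath before running the conversion, a small check like the following could be used. This is a sketch, not something from the original post: `ClasspathCheck` is a hypothetical class name, and it simply tries to load one of the class names from the errors above without initializing it.

```java
// Hypothetical diagnostic: reports whether a class can be loaded from the classpath
public class ClasspathCheck {

    // Returns true if the named class can be found; the class is not initialized
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // One of the classes from the ClassNotFoundException messages above
        String probe = "org.openxmlformats.schemas.wordprocessingml.x2006.main.impl.CTBodyImpl$1TblList";
        System.out.println(isOnClasspath(probe)
                ? "Full ooxml-schemas jar found"
                : "Missing: add ooxml-schemas-1.1.jar to the classpath");
    }
}
```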

 

However, tables in the converted HTML have no borders; this is still unresolved.
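One possible workaround, which is not from the original post and has not been checked against XDocReport's actual output, is to post-process the generated HTML and inject a stylesheet that draws borders on every table. The class `HtmlTableBorders`, its CSS, and the UTF-8 encoding of the output file are all assumptions. Note that if the converter writes inline border styles on the cells, those inline styles take precedence over this rule, and it also adds borders to tables that were meant to be borderless.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processing step (not part of POI or XDocReport)
public class HtmlTableBorders {

    // Assumed CSS: thin black border around every table cell
    public static final String BORDER_CSS =
            "<style>table{border-collapse:collapse;}"
            + "table td,table th{border:1px solid #000;}</style>";

    private static final Pattern HEAD_END =
            Pattern.compile("</head>", Pattern.CASE_INSENSITIVE);

    // Inserts BORDER_CSS just before </head>; if there is no </head>, prepends it
    public static String addTableBorders(String html) {
        Matcher m = HEAD_END.matcher(html);
        if (!m.find()) {
            return BORDER_CSS + html;
        }
        return html.substring(0, m.start()) + BORDER_CSS + html.substring(m.start());
    }

    // Rewrites the file in place, assuming it was written as UTF-8
    public static void fixFile(Path htmlFile) throws IOException {
        String html = new String(Files.readAllBytes(htmlFile), StandardCharsets.UTF_8);
        Files.write(htmlFile, addTableBorders(html).getBytes(StandardCharsets.UTF_8));
    }
}
```

It would be called after the conversion, e.g. `HtmlTableBorders.fixFile(java.nio.file.Paths.get("d:/test/test.htm"));`.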

 

Related: Java Word-to-HTML conversion (Word 2003 and 2007) with JACOB, OpenOffice, and POI

http://happyqing.iteye.com/blog/2086437

 
