Converting Word to HTML (POI)

Two problems come up often when converting Word to HTML: version mismatches and images that fail to display.

  • Version

Exception: (The supplied data appears to be in the Office 2007+ XML. You are calling the part of POI that deals with OLE2 Office Documents. You need to call a different part of POI to process this data (eg XSSF instead of HSSF))

1. Determine whether the file is in Word 03 format (.doc) or Word 07 format (.docx).

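		String wordURL = "";	// path of the Word file
		String htmlURL = "";	// path of the HTML file to write
		File file = new File(wordURL);
		String fileName = file.getName();
		int dot = fileName.lastIndexOf('.');
		// Guard against names without an extension and compare case-insensitively
		String fileType = dot < 0 ? "" : fileName.substring(dot).toLowerCase();
		// try-with-resources closes the stream once the conversion finishes
		try (InputStream inputStream = new FileInputStream(file)) {
			if (".doc".equals(fileType)) { // Word 03 format
				docToHtml(inputStream, htmlURL);
			} else if (".docx".equals(fileType)) { // Word 07 format
				docx2Html(inputStream, htmlURL);
			}
		}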
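
The extension check trusts the file name. The exception quoted above typically appears when .docx data reaches the OLE2 (.doc) code path, for example because the file was saved with a .doc extension. As a minimal sketch, the container type can be sniffed from the first bytes of the file instead: OLE2 files begin with D0 CF 11 E0 A1 B1 1A E1, and .docx files are ZIP archives that begin with "PK" 03 04. The class, enum, and method names below are hypothetical and not part of POI.

```java
import java.io.IOException;
import java.io.InputStream;

final class WordFormatSniffer {
    /** Container type only: OLE2 also covers .xls/.ppt, and ZIP covers any zip archive. */
    enum Container { OLE2, ZIP, UNKNOWN }

    private static final int[] OLE2_MAGIC = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    private static final int[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    /** Peeks at up to 8 bytes and then rewinds, so the stream can still be parsed afterwards. */
    static Container sniff(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[8];
        in.mark(head.length);
        int read = 0;
        while (read < head.length) {
            int n = in.read(head, read, head.length - read);
            if (n < 0) {
                break; // stream is shorter than 8 bytes
            }
            read += n;
        }
        in.reset();
        if (startsWith(head, read, OLE2_MAGIC)) {
            return Container.OLE2;
        }
        if (startsWith(head, read, ZIP_MAGIC)) {
            return Container.ZIP;
        }
        return Container.UNKNOWN;
    }

    private static boolean startsWith(byte[] head, int available, int[] magic) {
        if (available < magic.length) {
            return false;
        }
        for (int i = 0; i < magic.length; i++) {
            if ((head[i] & 0xFF) != magic[i]) {
                return false;
            }
        }
        return true;
    }
}
```

To use it, wrap the FileInputStream in a BufferedInputStream (which supports mark/reset), call sniff, and dispatch OLE2 to docToHtml and ZIP to docx2Html.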

2. Method to call for the 03 version

/**
	 * Converts a .doc (Word 03) document to HTML.
	 * @param inputStream stream of the .doc file
	 * @param outPutFile path of the HTML file to write
	 * @throws Exception if the conversion or file writing fails
	 * @author willie
	 * @date August 17, 2019
	 */
	public static void docToHtml(InputStream inputStream, String outPutFile) throws Exception {
		int nPos = outPutFile.lastIndexOf(File.separator);
		if (nPos < 0) {
			throw new IOException("Output file path: " + outPutFile + " does not contain: " + File.separator);
		}
		// Create the directory that stores the images
		final String savePath = outPutFile.substring(0, nPos + File.separator.length());
		File fDir = new File(savePath + "fileimgs" + File.separator);
		fDir.mkdirs();

		HWPFDocument wordDocument = new HWPFDocument(inputStream);

		WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
				DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
		wordToHtmlConverter.setPicturesManager(new PicturesManager() {
			public String savePicture(byte[] content, PictureType pictureType, String suggestedName, float widthInches,
					float heightInches) {
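				if (pictureType == PictureType.WMF) {
					// WMF images are skipped: no file is written and the src is left empty
					return "";
				}
				// Use "/" in the HTML src so the link also works when File.separator is "\"
				String path = "fileimgs/" + suggestedName;
				System.out.println("filepath:" + path);
				try {
					// Write the image file
					createFile(savePath + "fileimgs" + File.separator, suggestedName, content);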
				} catch (Exception e) {
					e.printStackTrace();
				}
				return path;
			}
		});
		wordToHtmlConverter.processDocument(wordDocument);

		Document htmlDocument = wordToHtmlConverter.getDocument();
		ByteArrayOutputStream out = new ByteArrayOutputStream();
		DOMSource domSource = new DOMSource(htmlDocument);
		StreamResult streamResult = new StreamResult(out);

		TransformerFactory tf = TransformerFactory.newInstance();
		Transformer serializer = tf.newTransformer();
		serializer.setOutputProperty("encoding", "GB2312");
		serializer.setOutputProperty("indent", "yes");
		serializer.setOutputProperty("method", "html");
		serializer.transform(domSource, streamResult);
		out.close();
		// Write the HTML file; decode with the same charset the serializer used
		writeFile(new String(out.toByteArray(), "GB2312"), outPutFile);
	}
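
The method above calls createFile and writeFile, which are not shown in this post. The following is a minimal sketch of what such helpers might look like, assuming the signatures implied by the calls above and assuming the HTML should be written in the same GB2312 charset that the serializer declares. The class name and the HTML_CHARSET constant are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class HtmlFileHelpers {
    // Assumption: must match the "encoding" output property set on the serializer.
    static final Charset HTML_CHARSET = Charset.forName("GB2312");

    /** Writes the image bytes to dir/fileName, creating dir first if it is missing. */
    static void createFile(String dir, String fileName, byte[] content) throws IOException {
        Path folder = Paths.get(dir);
        Files.createDirectories(folder);
        Files.write(folder.resolve(fileName), content);
    }

    /** Writes the HTML text to path, encoded with HTML_CHARSET. */
    static void writeFile(String content, String path) throws IOException {
        Path target = Paths.get(path);
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.write(target, content.getBytes(HTML_CHARSET));
    }
}
```

Writing with an explicit charset avoids depending on the platform default, which can differ from the encoding declared in the generated HTML.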

3. Method to call for the 07 version

/**
	 * Converts a .docx (Word 07) document to HTML.
	 * @param inputStream stream of the .docx file
	 * @param outPutFile path of the HTML file to write
	 * @throws IOException if reading the document or writing the HTML fails
	 * @author willie
	 * @date August 17, 2019
	 */
	public static void docx2Html(InputStream inputStream, String outPutFile) throws IOException {
		OutputStreamWriter outputStreamWriter = null;
		try {
			int nPos = outPutFile.lastIndexOf(File.separator);
			if (nPos < 0) {
				throw new IOException("Output file path: " + outPutFile + " does not contain: " + File.separator);
			}
			// Create the directory that stores the images
			final String savePath = outPutFile.substring(0, nPos + File.separator.length());
			File fDir = new File(savePath + "fileimgs" + File.separator);
			fDir.mkdirs();
			XWPFDocument document = new XWPFDocument(inputStream);
			XHTMLOptions options = XHTMLOptions.create();
			// Directory where the extracted images are stored
			options.setExtractor(new FileImageExtractor(new File(savePath + "fileimgs" + File.separator)));
			// Path prefix for the images in the HTML
			options.URIResolver(new BasicURIResolver("fileimgs"));
			outputStreamWriter = new OutputStreamWriter(new FileOutputStream(outPutFile), "utf-8");
			XHTMLConverter xhtmlConverter = (XHTMLConverter) XHTMLConverter.getInstance();
			xhtmlConverter.convert(document, outputStreamWriter, options);
		} finally {
			if (outputStreamWriter != null) {
				outputStreamWriter.close();
			}
		}

	}
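
Both methods locate the image directory by searching the output path for File.separator. That check throws when the path is a bare file name or, on Windows, when it is written with forward slashes. A minimal sketch of an alternative based on java.nio.file follows; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ImageDirs {
    /** Returns the "fileimgs" directory next to the HTML output file, creating it if missing. */
    static Path prepareImageDir(String outPutFile) throws IOException {
        // Resolving to an absolute path gives even a bare file name a parent directory
        Path html = Paths.get(outPutFile).toAbsolutePath();
        Path parent = html.getParent();
        if (parent == null) {
            throw new IOException("Output path has no parent directory: " + outPutFile);
        }
        Path imageDir = parent.resolve("fileimgs");
        Files.createDirectories(imageDir); // no error if the directory already exists
        return imageDir;
    }
}
```

The returned Path could then be handed to the extractor, for example new FileImageExtractor(ImageDirs.prepareImageDir(outPutFile).toFile()).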

  • Images fail to display

Explanation: images in the document do not display correctly, or some images are missing.

03 version:


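Problem: savePicture runs 7 times, but the pics list obtained from allPictures has only 4 entries, so images go missing. While debugging, the names returned by pic.suggestFullFileName() also did not match the file names or the order seen in savePicture.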

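Solution: use the 03-version method shown above, which writes each image inside the savePicture callback instead of iterating over allPictures.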
