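Online document preview is a complex feature, and the sheer variety of document formats makes it harder. Office does offer an online viewer (https://products.office.com/en-us/office-online/view-office-documents-online), but it still feels quite restrictive.
The approach I am currently exploring is to convert documents to PDF with OpenOffice and preview the PDF online. Most desktop browsers can display PDF inline, but mobile browsers generally cannot yet, so for them there are two options: 1. pdf.js; 2. convert the PDF to images for preview.
First, the interface definition: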
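public interface ResourceConverter {
    /**
     * Config key: directory for temporary files
     */
    String CONF_KEY_TEMP_DIR = "temp.dir";
    /**
     * Config key: maximum number of pages allowed for conversion
     */
    String CONF_KEY_MAX_PAGE_SIZE = "maxPageSize";

    /**
     * Whether this converter supports the given resource
     *
     * @param resourceUri URI of the resource (an http URL or a local file path)
     * @return true if this implementation can convert the resource
     */
    boolean support(String resourceUri);

    /**
     * Converts the given resource
     *
     * @param resourceUri URI of the resource (an http URL or a local file path)
     * @param config      per-call settings, e.g. temp directory or rendering resolution
     * @return the conversion result
     */
    State trans(String resourceUri, Map config);

    /**
     * Stores an intermediate temporary file in the permanent file repository;
     * the temporary file is deleted afterwards whether or not storing succeeded
     *
     * @param tempImg the temporary file
     * @param index   index of the file (e.g. the page number)
     * @return URI of the stored file, or null if saving failed
     */
    String save(File tempImg, int index);
}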
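Because the goal is clear, the interface is simple. The core method is trans, which converts the given resource and returns the result; per-call settings (for example the temporary file directory or the rendering resolution) are passed through config. The support method reports whether an implementation can convert a given resource; it exists so that the caller can switch between conversion algorithms depending on context, an idea borrowed from the strategy pattern. To unify the output, I defined State.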
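To illustrate how support enables strategy-style switching, here is a minimal sketch of a selector that picks the first converter claiming a resource. SupportAware, ConverterSelector and SelectorDemo are hypothetical names introduced only for this example; SupportAware is a trimmed-down stand-in for ResourceConverter that keeps nothing but support.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for ResourceConverter: only support() is kept.
interface SupportAware {
    boolean support(String resourceUri);
}

// Hypothetical selector: returns the first converter that claims the resource.
class ConverterSelector {
    static <T extends SupportAware> Optional<T> select(List<T> converters, String resourceUri) {
        return converters.stream()
                .filter(c -> c.support(resourceUri))
                .findFirst();
    }
}

class SelectorDemo {
    public static void main(String[] args) {
        // Two toy strategies standing in for the PDF and Office converters.
        SupportAware pdf = uri -> uri.endsWith(".pdf");
        SupportAware doc = uri -> uri.endsWith(".doc") || uri.endsWith(".docx");
        List<SupportAware> converters = Arrays.asList(pdf, doc);

        System.out.println(ConverterSelector.select(converters, "report.docx").isPresent()); // true
        System.out.println(ConverterSelector.select(converters, "photo.jpg").isPresent());   // false
    }
}
```

With real converters, the caller would invoke trans on the selected instance and fall back to an error State when nothing matches.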
public class State {
    private boolean state = false;
    private String info = null;
    private Map<String, String> infoMap = new HashMap<>();

    State(boolean state, String info) {
        this.state = state;
        this.info = info;
    }

    public static State errorResource(String info) {
        return new State(false, StringUtils.defaultString(info, "Invalid source resource"));
    }

    public static State pageSizeLimit() {
        return new State(false, "Page count exceeds the limit of 200 pages");
    }

    // Called by the converters to record result entries
    public void putInfo(String key, String value) {
        infoMap.put(key, value);
    }

    // getters and setters omitted
}
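State is also simple: the boolean indicates whether the conversion succeeded, info carries a brief message, and infoMap stores the useful results of the conversion. It also provides two commonly used, semantically named static factory methods that make returning a status convenient.
As usual, the interface comes with a skeletal implementation. The benefits: 1. it fixes the functionality common to all implementations so that concrete implementations can simply call it; 2. it is easy to extend, because when the interface gains a method the skeletal implementation can supply a default, so existing implementation classes need no changes.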
public abstract class AbstractResourcesConverter implements ResourceConverter {
    private static QiniuUtil qiniuUtil = QiniuUtil.getInstance(QiniuUtil.Namespace.ADMIN);

    public abstract State doTransHttpResource(InputStream inputStream, URL url, Map config);

    public abstract State doTransFileResource(File file, String filePath, Map config);

    public abstract String getTempDir(Map config);

    @Override
    public State trans(String resourceUri, Map config) {
        if (StringUtils.isBlank(resourceUri)) {
            return State.errorResource(null);
        }
        if (resourceUri.startsWith("http")) {
            try {
                URL url = new URL(resourceUri);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                // Connect timeout: 10 s
                conn.setConnectTimeout(10000);
                return doTransHttpResource(conn.getInputStream(), url, config);
            } catch (Exception e) {
                e.printStackTrace();
                return State.errorResource(e.getMessage());
            }
        } else {
            File file = new File(resourceUri);
            if (!file.exists()) {
                return State.errorResource(null);
            }
            return doTransFileResource(file, resourceUri, config);
        }
    }

    protected int getMaxPageSize(Map config) {
        Object maxPageSize = config.get(CONF_KEY_MAX_PAGE_SIZE);
        if (maxPageSize == null) {
            return 200;
        }
        return (int) maxPageSize;
    }

    protected void makeTempDir(Map config) {
        String imgFilePathPrefix = getTempDir(config);
        File dir = new File(imgFilePathPrefix);
        if (!dir.exists()) {
            dir.mkdirs();
        }
    }

    @Override
    public String save(File tempImg, int index) {
        // upload returns null on failure, which matches the contract of save
        String key = qiniuUtil.upload(tempImg);
        tempImg.delete();
        return key;
    }
}
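Three points about this skeletal implementation are worth noting:
1. It separates http resources from local file resources and handles the error feedback for invalid resources up front.
2. It factors out the code that creates the temporary file directory.
3. It provides a default implementation for storing temporary resources. In production the results would usually be stored on a server; here I store the converted resources on Qiniu Cloud.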
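For the third point, a subclass that prefers local storage over Qiniu could override save. The following is only a sketch of what such a replacement might look like while honouring the same contract (stored URI on success, null on failure, temp file always deleted). LocalFileStore and its storeDir field are hypothetical and not part of the original code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper mirroring the contract of save(File, int).
class LocalFileStore {
    private final Path storeDir; // assumed permanent directory, chosen by the caller

    LocalFileStore(Path storeDir) {
        this.storeDir = storeDir;
    }

    String save(File tempImg, int index) {
        try {
            Files.createDirectories(storeDir);
            // Prefix the index so pages keep a stable, sortable name.
            Path target = storeDir.resolve(index + "-" + tempImg.getName());
            Files.copy(tempImg.toPath(), target, StandardCopyOption.REPLACE_EXISTING);
            return target.toUri().toString();
        } catch (IOException e) {
            return null; // saving failed
        } finally {
            tempImg.delete(); // the temp file is removed in every case
        }
    }
}

class LocalFileStoreDemo {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("preview-store");
        File temp = File.createTempFile("page", ".png");
        String uri = new LocalFileStore(dir).save(temp, 0);
        System.out.println(uri + " (temp file still exists: " + temp.exists() + ")");
    }
}
```

Putting the deletion in finally is what guarantees the "deleted regardless of outcome" part of the contract.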
Converter implementations for each resource type
Converting documents to PDF with OpenOffice
public class Doc2PdfConverter extends AbstractResourcesConverter {
    private static final Logger logger = LoggerFactory.getLogger(Doc2PdfConverter.class);
    public static final String CONF_KEY_OPEN_OFFICE_HOST = "openoffice.host";
    public static final String CONF_KEY_OPEN_OFFICE_PORT = "openoffice.port";
    // The dot is escaped so that it only matches a literal '.' before the extension
    public static final String SUPPORT_TYPE_PATTERN = ".*\\.(doc|docx|ppt|pptx|xls|xlsx)$";
    public static final String RESULT_INFO_KEY_PDF = "pdf";

    @Override
    public boolean support(String resourceUri) {
        return Pattern.matches(SUPPORT_TYPE_PATTERN, resourceUri);
    }

    @Override
    public State doTransHttpResource(InputStream inputStream, URL url, Map config) {
        try {
            return doTrans(inputStream, buildFormat(url.toString()), config);
        } catch (Exception e) {
            return State.errorResource(null);
        }
    }

    @Override
    public State doTransFileResource(File file, String filePath, Map config) {
        try {
            return doTrans(new FileInputStream(file), buildFormat(filePath), config);
        } catch (Exception e) {
            return State.errorResource(null);
        }
    }

    private State doTrans(InputStream inputStream, DocumentFormat sourceType, Map config) {
        State state = new State(true, "");
        int openofficePort = getOpenofficePort(config);
        String openofficeHost = getOpenofficeHost(config);
        String pdfOutputFile = getTempDir(config) + System.currentTimeMillis() + ".pdf";
        OpenOfficeConnection connection = new SocketOpenOfficeConnection(openofficeHost, openofficePort);
        OutputStream dist = null;
        try {
            makeTempDir(config);
            connection.connect();
            dist = new FileOutputStream(pdfOutputFile);
            DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
            converter.convert(inputStream, sourceType, dist, buildFormat(pdfOutputFile));
            state.putInfo(RESULT_INFO_KEY_PDF, pdfOutputFile);
        } catch (Exception e) {
            logger.error("open office converting ERROR, maybe it's not running with port " + openofficePort, e);
            // Delete the temporary file right away; deleteOnExit would keep it
            // around until the JVM stops, which may be never on a server
            IOUtils.closeQuietly(dist);
            new File(pdfOutputFile).delete();
            return State.errorResource("open office converting occur an ERROR");
        } finally {
            connection.disconnect();
            IOUtils.closeQuietly(inputStream);
            IOUtils.closeQuietly(dist);
        }
        return state;
    }

    private DocumentFormat buildFormat(String filePath) {
        String extension = FilenameUtils.getExtension(filePath);
        DefaultDocumentFormatRegistry defaultDocumentFormatRegistry = new DefaultDocumentFormatRegistry();
        return defaultDocumentFormatRegistry.getFormatByFileExtension(extension);
    }

    private int getOpenofficePort(Map config) {
        Object port = config.get(CONF_KEY_OPEN_OFFICE_PORT);
        if (port == null) {
            return 8100;
        }
        return (int) port;
    }

    private String getOpenofficeHost(Map config) {
        Object host = config.get(CONF_KEY_OPEN_OFFICE_HOST);
        if (host == null) {
            return "localhost";
        }
        return (String) host;
    }

    @Override
    public String getTempDir(Map config) {
        return config.get(CONF_KEY_TEMP_DIR) + "/doc/";
    }
}
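There is not much to say about this implementation: it simply uses the JODConverter library (jodconverter-2.2.2.jar) to talk to OpenOffice. One pitfall is worth pointing out: docx and pptx are only supported from version 2.2.2 of the jar, but the Maven Central repository does not have that version. My solution was to download the jar from the official site and upload it to a private Maven repository; since the jar has many dependencies of its own, the pom file needs to be uploaded as well.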
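Separately, the support check in the class above matches on the file extension with a regular expression. In a Java regex an unescaped dot matches any character, so a pattern that intends a literal dot before the extension needs it escaped. The sketch below shows the difference; ExtensionMatcher and isOfficeFile are hypothetical names, and the case-insensitive flag is my own assumption rather than something the original code does.

```java
import java.util.regex.Pattern;

class ExtensionMatcher {
    // Escaped dot plus a grouped extension list; (?i) also accepts upper-case suffixes (assumption).
    static final Pattern OFFICE_FILE = Pattern.compile("(?i).*\\.(docx?|pptx?|xlsx?)$");

    static boolean isOfficeFile(String resourceUri) {
        return resourceUri != null && OFFICE_FILE.matcher(resourceUri).matches();
    }

    public static void main(String[] args) {
        System.out.println(isOfficeFile("report.docx"));                 // true
        System.out.println(isOfficeFile("mydoc"));                       // false
        // With unescaped dots, "ydoc" satisfies ".doc", so this prints true:
        System.out.println(Pattern.matches(".*(.doc|.docx)$", "mydoc")); // true
    }
}
```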
Converting PDF to images
public class Pdf2ImgConverter extends AbstractResourcesConverter {
    public static final String CONF_KEY_DPI = "pdf.dpi";
    public static final String RESULT_INFO_KEY_DIST = "dist";

    @Override
    public State doTransHttpResource(InputStream inputStream, URL url, Map config) {
        // try-with-resources closes the PDDocument once rendering has finished
        try (PDDocument document = PDDocument.load(inputStream)) {
            PDFRenderer renderer = new PDFRenderer(document);
            // Note: PdfReader fetches the URL a second time just to read the page count
            PdfReader pdfReader = new PdfReader(url);
            return doTrans(renderer, pdfReader, config);
        } catch (Exception e) {
            return State.errorResource(null);
        }
    }

    @Override
    public State doTransFileResource(File file, String filePath, Map config) {
        try (PDDocument document = PDDocument.load(file)) {
            PDFRenderer renderer = new PDFRenderer(document);
            PdfReader pdfReader = new PdfReader(filePath);
            return doTrans(renderer, pdfReader, config);
        } catch (Exception e) {
            return State.errorResource(null);
        }
    }

    @Override
    public String getTempDir(Map config) {
        return config.get(CONF_KEY_TEMP_DIR) + "/pdf/";
    }

    protected State doTrans(PDFRenderer renderer, PdfReader pdfReader, Map config) {
        State state = new State(true, "");
        try {
            int pageCount = pdfReader.getNumberOfPages();
            int maxPageSize = getMaxPageSize(config);
            if (pageCount > maxPageSize) {
                return State.pageSizeLimit();
            }
            makeTempDir(config);
            float dpi = getDPIFromConfig(config);
            List<String> distFiles = new ArrayList<>();
            for (int i = 0; i < pageCount; i++) {
                BufferedImage image = renderer.renderImageWithDPI(i, dpi);
                // The page index keeps names unique even within the same millisecond
                File distFile = new File(getTempDir(config) + System.currentTimeMillis() + "-" + i + ".png");
                ImageIO.write(image, "png", distFile);
                distFiles.add(save(distFile, i));
            }
            state.putInfo(RESULT_INFO_KEY_DIST, StringUtils.join(distFiles, ","));
        } catch (Exception e) {
            return new State(false, "Conversion failed");
        }
        return state;
    }

    protected float getDPIFromConfig(Map config) {
        Object dpi = config.get(CONF_KEY_DPI);
        if (dpi == null) {
            return 100F;
        }
        return (float) dpi;
    }

    @Override
    public boolean support(String resourceUri) {
        return StringUtils.endsWith(resourceUri, ".pdf");
    }
}
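A note on the config map shared by both converters: values are read with direct casts such as (int) port and (float) dpi, and such a cast throws ClassCastException when the caller stored, say, an Integer where a Float is expected. A more tolerant reader could go through Number instead. The sketch below is only an illustration; ConfigValues and its methods are hypothetical names, and the keys are reused from the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: reads numeric settings without depending on the exact boxed type.
class ConfigValues {
    static float floatValue(Map<String, Object> config, String key, float defaultValue) {
        Object value = config.get(key);
        return value instanceof Number ? ((Number) value).floatValue() : defaultValue;
    }

    static int intValue(Map<String, Object> config, String key, int defaultValue) {
        Object value = config.get(key);
        return value instanceof Number ? ((Number) value).intValue() : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, Object> config = new HashMap<>();
        config.put("temp.dir", "/tmp/preview"); // assumed example path
        config.put("pdf.dpi", 150);             // an Integer, not a Float
        config.put("openoffice.port", 8100);

        System.out.println(floatValue(config, "pdf.dpi", 100F));      // 150.0
        System.out.println(intValue(config, "maxPageSize", 200));     // 200 (key absent, default used)
        System.out.println(intValue(config, "openoffice.port", 8100)); // 8100
    }
}
```

Values that are absent or not numeric fall back to the default, which matches the null handling in the original getters.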
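Finally, OpenOffice must be installed on Linux before any conversion can actually run.
Download the installation package from the official website, then extract it: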
tar -zxvf Apache_OpenOffice_4.x.x_Linux_x86-64_install-rpm_zh-CN.tar.gz
Extraction creates a zh-CN folder in the current directory; enter its RPMS subdirectory and install the packages:
cd ./zh-CN/RPMS
yum localinstall *.rpm
After a successful install, a desktop-integration folder is present in the current directory; enter it and install the menu package:
cd ./desktop-integration
yum localinstall openoffice_4.x.x-redhat-menus-x.x.x.noarch.rpm
After a successful install, an openoffice4 folder is created under /opt.
Start the service:
nohup /opt/openoffice4/program/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &
Possible problems:
error while loading shared libraries: libXext.so.6
Fix:
yum install libXext.x86_64
no suitable windowing system found, exiting
Fix:
yum groupinstall "X Window System"
Check whether the service started successfully:
ps -ef | grep openoffice
netstat -lntp | grep 8100
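The same check can be done from the application side before attempting a conversion. The following is a minimal sketch that only tests whether something accepts TCP connections on the port; it does not verify that the listener is actually OpenOffice. PortProbe and isListening are hypothetical names, and the host, port and timeout values are taken from or assumed around the start command above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    public static void main(String[] args) {
        // 127.0.0.1:8100 matches the -accept option used when starting soffice.
        System.out.println("soffice listening: " + isListening("127.0.0.1", 8100, 2000));
    }
}
```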
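Once the environment is ready, the problem you are most likely to hit is garbled Chinese text. This happens because Linux lacks the font files found on Windows; install the fonts and then restart OpenOffice. JODConverter also ships a command-line client jar that can be used to test whether conversion works (all the jars it depends on must be in the same directory for it to run; the official download contains them all in its lib directory):
java -jar jodconverter-cli-2.2.2.jar source.doc target.pdf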