An Online Document Preview Solution: Converting with OpenOffice

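Online document preview is a complex feature, and the sheer variety of document formats makes it harder still. Office does offer an online viewer (https://products.office.com/en-us/office-online/view-office-documents-online), but it still comes with many restrictions.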

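The approach I am currently exploring is to convert documents to PDF with OpenOffice and preview the PDF online. Most desktop browsers can display PDF inline today, but mobile browsers generally cannot yet. For mobile there are two options: 1. pdf.js; 2. convert the PDF to images and preview those.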

First, the interface definition:

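public interface ResourceConverter {

    /**
     * Config key: directory for temporary files
     */
    String CONF_KEY_TEMP_DIR = "temp.dir";
    /**
     * Config key: maximum number of pages allowed
     */
    String CONF_KEY_MAX_PAGE_SIZE = "maxPageSize";

    /**
     * Whether this converter can handle the given resource.
     *
     * @param resourceUri URL or local path of the source resource
     * @return true if the resource type is supported
     */
    boolean support(String resourceUri);

    /**
     * Converts the given resource.
     *
     * @param resourceUri URL or local path of the source resource
     * @param config      per-call settings (temp directory, resolution, ...)
     * @return the conversion result
     */
    State trans(String resourceUri, Map config);

    /**
     * Stores an intermediate temporary file in the permanent file repository.
     * The temporary file is deleted afterwards whether or not storing succeeded.
     *
     * @param tempImg temporary file to store
     * @param index   index of the file (for example, the page number)
     * @return URI of the stored file, or null if saving failed
     */
    String save(File tempImg, int index);

}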

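Because the goal is clear, the interface is simple. The core method is trans, which converts the given resource and returns the result; per-call settings (for example the temporary file directory or the resolution) are passed in through config. The support method tells whether an implementation can convert a given resource. It exists so that the caller can switch between conversion algorithms easily, an idea borrowed from the Strategy pattern. To give every converter a uniform result type, I defined State.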
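To illustrate how support lets a caller pick a conversion strategy, here is a minimal sketch. It is not the article's code: SimpleConverter, SuffixConverter and ConverterRegistry are hypothetical names, and trans returns a plain String instead of State so that the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Trimmed-down stand-in for ResourceConverter (assumption: trans returns a String here, not State).
interface SimpleConverter {
    boolean support(String resourceUri);

    String trans(String resourceUri);
}

// Hypothetical converter that claims resources by file suffix.
final class SuffixConverter implements SimpleConverter {
    private final String suffix;
    private final String label;

    SuffixConverter(String suffix, String label) {
        this.suffix = suffix;
        this.label = label;
    }

    @Override
    public boolean support(String resourceUri) {
        return resourceUri != null && resourceUri.toLowerCase().endsWith(suffix);
    }

    @Override
    public String trans(String resourceUri) {
        // A real converter would do the work here; the sketch only tags the input.
        return label + ":" + resourceUri;
    }
}

// Hypothetical registry: the caller asks each strategy whether it supports the resource.
final class ConverterRegistry {
    private final List<SimpleConverter> converters = new ArrayList<>();

    ConverterRegistry register(SimpleConverter converter) {
        converters.add(converter);
        return this;
    }

    // Returns the first registered converter that supports the URI, or null when none does.
    SimpleConverter find(String resourceUri) {
        for (SimpleConverter converter : converters) {
            if (converter.support(resourceUri)) {
                return converter;
            }
        }
        return null;
    }
}
```

A caller would register one converter per resource type, call find with the resource URI, and treat a null result as "unsupported type".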

public class State {
    private boolean state = false;
    private String info = null;
    private Map infoMap = new HashMap();

    State(boolean state, String info) {
        this.state = state;
        this.info = info;
    }

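    public static State errorResource(String info) {
        return new State(false, StringUtils.defaultString(info, "Invalid source resource"));
    }

    public static State pageSizeLimit() {
        return new State(false, "Page count exceeds the limit of 200 pages");
    }

    // Getters and setters omitted (including the putInfo(key, value) helper used by the converters below)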
}

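State is simple as well: the boolean says whether the conversion succeeded or failed, info carries a short message, and infoMap holds the useful information produced by the conversion. It also offers two commonly needed, descriptively named static factory methods that make it easy to return a status.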
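The accessors are elided in the listing above. As a rough guess at what they might look like, the sketch below fills them in. Only putInfo is actually called by the converters later on; the class name StateSketch and the names isState, getInfo, getInfoValue and getInfoMap are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical completion of the elided part of State; only putInfo is known from the article.
final class StateSketch {
    private final boolean state;
    private final String info;
    private final Map<String, Object> infoMap = new HashMap<>();

    StateSketch(boolean state, String info) {
        this.state = state;
        this.info = info;
    }

    boolean isState() {
        return state;
    }

    String getInfo() {
        return info;
    }

    // Records one piece of result data, e.g. the path of the generated PDF.
    void putInfo(String key, Object value) {
        infoMap.put(key, value);
    }

    // Returns the value stored under the key, or null if nothing was stored.
    Object getInfoValue(String key) {
        return infoMap.get(key);
    }

    // Read-only view so that callers cannot modify the result data.
    Map<String, Object> getInfoMap() {
        return Collections.unmodifiableMap(infoMap);
    }
}
```

With accessors along these lines, a caller checks isState() first and only then reads result entries such as the "pdf" key.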

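Following common practice, the interface comes with a skeletal implementation. This has two benefits: 1. it pins down the functionality common to all implementations, so concrete implementations can simply call it; 2. it is easy to extend, because when a method is added to the interface the skeleton can supply a default implementation and existing implementation classes need no changes.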

public abstract class AbstractResourcesConverter implements ResourceConverter {

    private static QiniuUtil qiniuUtil = QiniuUtil.getInstance(QiniuUtil.Namespace.ADMIN);

    public abstract State doTransHttpResource(InputStream inputStream, URL url, Map config);

    public abstract State doTransFileResource(File file, String filePath, Map config);

    public abstract String getTempDir(Map config);


    @Override
    public State trans(String resourceUri, Map config) {
        if (StringUtils.isBlank(resourceUri)) {
            return State.errorResource(null);
        }
        if (resourceUri.startsWith("http")) {
            try {
                URL url = new URL(resourceUri);
                HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                // Connect timeout: 10 s
                conn.setConnectTimeout(10000);
                return doTransHttpResource(conn.getInputStream(), url, config);
            } catch (Exception e) {
                e.printStackTrace();
                return State.errorResource(e.getMessage());
            }
        } else {
            File file = new File(resourceUri);
            if (!file.exists()) {
                return State.errorResource(null);
            }
            return doTransFileResource(file, resourceUri, config);
        }
    }

    protected int getMaxPageSize(Map config) {
        Object maxPageSize = config.get(CONF_KEY_MAX_PAGE_SIZE);
        if (maxPageSize == null) {
            return 200;
        }
        return (int) maxPageSize;
    }

    protected void makeTempDir(Map config) {
        String imgFilePathPrefix = getTempDir(config);
        File dir = new File(imgFilePathPrefix);
        if (!dir.exists()) {
            dir.mkdirs();
        }
    }

    @Override
    public String save(File tempImg, int index) {
        String key = qiniuUtil.upload(tempImg);
        tempImg.delete();
        return key;
    }
}

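Three points about this skeletal implementation are worth noting:
1. It separates HTTP resources from local file resources and handles invalid resources up front.
2. The code that creates the temporary file directory is shared.
3. It also gives a default implementation for storing temporary resources. In production the converted resources would usually be stored on a server; here I store them on Qiniu Cloud.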
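For readers without a Qiniu account, the save contract (store the file, always delete the temporary copy, return null on failure) can also be met with local storage. The following is a minimal sketch under that assumption; LocalFileStore and its naming scheme for stored files are hypothetical and not part of the article's code.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical local replacement for the Qiniu upload used in the skeleton.
final class LocalFileStore {
    private final Path repositoryDir;

    LocalFileStore(Path repositoryDir) {
        this.repositoryDir = repositoryDir;
    }

    // Copies the temp file into the repository and returns the stored file's URI,
    // or null if the copy failed. The temp file is deleted in both cases.
    String save(File tempImg, int index) {
        try {
            Files.createDirectories(repositoryDir);
            // Assumed naming scheme: page index followed by the temp file name.
            Path target = repositoryDir.resolve(index + "-" + tempImg.getName());
            Files.copy(tempImg.toPath(), target, StandardCopyOption.REPLACE_EXISTING);
            return target.toUri().toString();
        } catch (IOException e) {
            return null;
        } finally {
            tempImg.delete();
        }
    }
}
```

Putting the delete in a finally block is what guarantees the "deleted whether or not storing succeeded" part of the contract.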

Converter implementations for each resource type

Converting documents to PDF with OpenOffice

public class Doc2PdfConverter extends AbstractResourcesConverter {
    private static final Logger logger = LoggerFactory.getLogger(Doc2PdfConverter.class);
    public static final String CONF_KEY_OPEN_OFFICE_HOST = "openoffice.host";
    public static final String CONF_KEY_OPEN_OFFICE_PORT = "openoffice.port";
    public static final String SUPPORT_TYPE_PATTERN = ".*\\.(doc|docx|ppt|pptx|xls|xlsx)$";
    public static final String RESULT_INFO_KEY_PDF = "pdf";

    @Override
    public boolean support(String resourceUri) {
        return Pattern.matches(SUPPORT_TYPE_PATTERN, resourceUri);
    }

    @Override
    public State doTransHttpResource(InputStream inputStream, URL url, Map config) {
        try {
            return doTrans(inputStream, buildFormat(url.toString()), config);
        } catch (Exception e) {
            return State.errorResource(null);
        }
    }

    @Override
    public State doTransFileResource(File file, String filePath, Map config) {
        try {
            return doTrans(new FileInputStream(file), buildFormat(filePath), config);
        } catch (Exception e) {
            return State.errorResource(null);
        }
    }

    private State doTrans(InputStream inputStream, DocumentFormat sourceType, Map config) {
        State state = new State(true, "");
        int openofficePort = getOpenofficePort(config);
        String openofficeHost = getOpenofficeHost(config);
        String pdfOutputFile = getTempDir(config) + System.currentTimeMillis() + ".pdf";
        OpenOfficeConnection connection = new SocketOpenOfficeConnection(openofficeHost, openofficePort);
        OutputStream dist = null;
        try {
            makeTempDir(config);
            connection.connect();
            dist = new FileOutputStream(pdfOutputFile);
            DocumentConverter converter = new OpenOfficeDocumentConverter(connection);
            converter.convert(inputStream, sourceType, dist, buildFormat(pdfOutputFile));
            state.putInfo(RESULT_INFO_KEY_PDF, pdfOutputFile);
        } catch (Exception e) {
            logger.error("open office converting ERROR, maybe it's not running on port " + openofficePort, e);
            // Mark the temp file for deletion (deleteOnExit only removes it when the JVM exits)
            new File(pdfOutputFile).deleteOnExit();
            return State.errorResource("open office converting occur an ERROR");
        } finally {
            connection.disconnect();
            IOUtils.closeQuietly(inputStream);
            IOUtils.closeQuietly(dist);
        }
        return state;
    }

    private DocumentFormat buildFormat(String filePath) {
        String extension = FilenameUtils.getExtension(filePath);
        DefaultDocumentFormatRegistry defaultDocumentFormatRegistry = new DefaultDocumentFormatRegistry();
        DocumentFormat format = defaultDocumentFormatRegistry.getFormatByFileExtension(extension);
        return format;
    }

    private int getOpenofficePort(Map config) {
        Object port = config.get(CONF_KEY_OPEN_OFFICE_PORT);
        if (port == null) {
            return 8100;
        }
        return (int) port;
    }

    private String getOpenofficeHost(Map config) {
        Object host = config.get(CONF_KEY_OPEN_OFFICE_HOST);
        if (host == null) {
            return "localhost";
        }
        return (String) host;
    }

    @Override
    public String getTempDir(Map config) {
        return config.get(CONF_KEY_TEMP_DIR) + "/doc/";
    }

}

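There is not much to say about this implementation: it simply uses the JODConverter jar (jodconverter-2.2.2.jar) to talk to OpenOffice. One pitfall is worth pointing out: only version 2.2.2 of the jar supports docx and pptx, but that version is not in the Maven Central repository. My solution was to download the jar from the official site and upload it to a private Maven repository. The jar has a number of dependencies of its own, so its pom file has to be uploaded as well.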
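One detail of the support check deserves a note: in a regular expression an unescaped dot matches any character, so a pattern written as `.*(.doc|.docx)$` also accepts names such as `notes_xdoc`. The sketch below contrasts such a loose pattern with a stricter one. The class ExtensionMatch and the choice to ignore case are assumptions for illustration, not the article's code.

```java
import java.util.regex.Pattern;

// Hypothetical helper contrasting unescaped and escaped dots in an extension check.
final class ExtensionMatch {
    // Unescaped dots: each "." matches any single character.
    static final Pattern LOOSE =
            Pattern.compile(".*(.doc|.docx|.ppt|.pptx|.xls|.xlsx)$");

    // Escaped dot and case-insensitive matching (an assumed improvement).
    static final Pattern STRICT =
            Pattern.compile(".*\\.(docx?|pptx?|xlsx?)$", Pattern.CASE_INSENSITIVE);

    static boolean loose(String resourceUri) {
        return LOOSE.matcher(resourceUri).matches();
    }

    static boolean strict(String resourceUri) {
        return STRICT.matcher(resourceUri).matches();
    }
}
```

Neither pattern accepts a URL that carries a query string after the extension; if such URLs occur, the path would need to be extracted before matching.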

Converting PDF to images

public class Pdf2ImgConverter extends AbstractResourcesConverter {
    public static final String CONF_KEY_DPI = "pdf.dpi";
    public static final String RESULT_INFO_KEY_DIST = "dist";

    @Override
    public State doTransHttpResource(InputStream inputStream, URL url, Map config) {
        try {
            PDDocument document = PDDocument.load(inputStream);
            PDFRenderer renderer = new PDFRenderer(document);
            PdfReader pdfReader = new PdfReader(url);
            return doTrans(renderer, pdfReader, config);
        } catch (Exception e) {
            return State.errorResource(null);
        }
    }

    @Override
    public State doTransFileResource(File file, String filePath, Map config) {
        try {
            PDDocument document = PDDocument.load(file);
            PDFRenderer renderer = new PDFRenderer(document);
            PdfReader pdfReader = new PdfReader(filePath);
            return doTrans(renderer, pdfReader, config);
        } catch (Exception e) {
            return State.errorResource(null);
        }
    }

    @Override
    public String getTempDir(Map config) {
        return config.get(CONF_KEY_TEMP_DIR) + "/pdf/";
    }

    protected State doTrans(PDFRenderer renderer, PdfReader pdfReader, Map config) {
        State state = new State(true, "");
        try {
            int pageCount = pdfReader.getNumberOfPages();
            int maxPageSize = getMaxPageSize(config);
            if (pageCount > maxPageSize) {
                return State.pageSizeLimit();
            }
            makeTempDir(config);
            List<String> distFiles = new ArrayList<>();
            for (int i = 0; i < pageCount; i++) {
                BufferedImage image = renderer.renderImageWithDPI(i, getDPIFromConfig(config));
                File distFile = new File(getTempDir(config) + System.currentTimeMillis() + ".png");
                ImageIO.write(image, "png", distFile);
                distFiles.add(save(distFile, i));
            }
            state.putInfo(RESULT_INFO_KEY_DIST, StringUtils.join(distFiles, ","));
        } catch (Exception e) {
            return new State(false, "Conversion failed");
        }
        return state;
    }

    protected float getDPIFromConfig(Map config) {
        Object dpi = config.get(CONF_KEY_DPI);
        if (dpi == null) {
            return 100F;
        }
        // Accept any numeric type (Integer, Double, Float), not only Float
        return ((Number) dpi).floatValue();
    }


    @Override
    public boolean support(String resourceUri) {
        return StringUtils.endsWith(resourceUri, ".pdf");
    }

}
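Both converters read optional settings (pdf.dpi, maxPageSize, openoffice.port) from an untyped config map. A small helper can make those lookups tolerant of values supplied as Integer, Double or String, which is common when the map is filled from a properties file. This is a sketch only; ConfigValues and its method names are hypothetical.

```java
import java.util.Map;

// Hypothetical helper for reading numeric settings from an untyped config map.
final class ConfigValues {
    private ConfigValues() {
    }

    static int intValue(Map<?, ?> config, String key, int defaultValue) {
        Number number = toNumber(config, key);
        return number == null ? defaultValue : number.intValue();
    }

    static float floatValue(Map<?, ?> config, String key, float defaultValue) {
        Number number = toNumber(config, key);
        return number == null ? defaultValue : number.floatValue();
    }

    // Returns the value as a Number, or null when it is absent or not numeric.
    private static Number toNumber(Map<?, ?> config, String key) {
        Object value = config == null ? null : config.get(key);
        if (value instanceof Number) {
            return (Number) value;
        }
        if (value instanceof String) {
            try {
                return Double.valueOf(((String) value).trim());
            } catch (NumberFormatException e) {
                return null;
            }
        }
        return null;
    }
}
```

With such a helper, a lookup like the DPI setting would become floatValue(config, "pdf.dpi", 100F), falling back to the default for missing or malformed values.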

Finally, OpenOffice has to be installed on the Linux server before any conversion can actually run.

Download the installation package from the official website and unpack it:

tar -zxvf Apache_OpenOffice_4.x.x_Linux_x86-64_install-rpm_zh-CN.tar.gz

Unpacking creates a zh-CN folder in the current directory. Go into the RPMS folder inside it:

cd ./zh-CN/RPMS

yum localinstall *.rpm

After that succeeds, a desktop-integration folder appears in the current directory. Go into it:

cd ./desktop-integration

yum localinstall openoffice_4.x.x-redhat-menus-x.x.x.noarch.rpm

After a successful installation, an openoffice4 folder is created under /opt.

Start the service:

nohup /opt/openoffice4/program/soffice -headless -accept="socket,host=127.0.0.1,port=8100;urp;" -nofirststartwizard &

Problems you may run into

error while loading shared libraries: libXext.so.6

Fix:
yum install libXext.x86_64

no suitable windowing system found, exiting

Fix:
yum groupinstall "X Window System"

Check that the service started successfully:

ps -ef | grep openoffice

netstat -lntp | grep 8100
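The same check can be done from the Java side before attempting a conversion, which gives a clearer error than a failed connect inside the converter. The sketch below only tests that something accepts TCP connections on the given host and port; it does not verify that the listener is really OpenOffice. PortCheck is a hypothetical helper, not part of the article's code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical pre-flight check: is anything listening on the OpenOffice port?
final class PortCheck {
    private PortCheck() {
    }

    // Returns true if a TCP connection to host:port succeeds within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

For the setup described here the call would be isListening("127.0.0.1", 8100, 1000), matching the host and port passed to soffice.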

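Once the environment is ready, the problem you are most likely to hit is garbled Chinese text. This happens because Linux lacks the font files that ship with Windows: install the fonts, then restart OpenOffice. JODConverter also ships a command-line jar that can be used to test whether conversion works (all the jars it depends on must sit in the same directory for it to run; the lib directory of the official download contains all of them).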

java -jar jodconverter-cli-2.2.2.jar source.doc target.pdf
