Working with Office documents using LibreOffice

Java: converting Office documents (.doc/.docx, .ppt/.pptx) to PDF, and then to images, with LibreOffice/OpenOffice for online preview

2020-02-23 

  • java
  • LibreOffice
  • OpenOffice
  • PDF


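  • Our project needed to let users upload Office documents and preview them online. We used LibreOffice to convert the Office documents to PDF, and then pdfbox to convert the PDF to images.
  • This article covers the first step: converting Office documents to PDF with LibreOffice.
  • Everything here also applies to OpenOffice; LibreOffice is used as the example throughout.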

Related articles

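  • Setting up LibreOffice: Installing and configuring LibreOffice 6.3.4.2 on CentOS 7, and installing the Chinese fonts that ship with Windows
  • Converting PDF to images: Converting PDF to images in Java with pdfbox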

Preface

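  • To support uploading Office documents and previewing them online, I tried POI and the commercial library Aspose; neither gave satisfactory results.
  • Converting PDF to images, however, is a solved problem, so the question became: how do we convert Office documents to PDF?
  • LibreOffice converts Office documents to PDF very well: the result is almost identical to saving as PDF directly from Microsoft Office. For online preview it gave the best results of everything I tried, and it may well be the best solution.
  • The catch is that LibreOffice must be installed on the server, and on development machines too for testing. Fortunately, LibreOffice is cross-platform.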

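    I also tried converting Office documents directly to images with LibreOffice, but only the first page came out, and the help output showed no way to convert a whole document directly to images.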

Converting Office documents to PDF in Java with LibreOffice

There are two approaches, each with its own trade-offs; pick the one that fits your needs.

Asynchronous conversion

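This approach invokes an operating system command. The conversion is asynchronous and how long it takes depends on the file size, so if a document must be previewable immediately after upload, use the synchronous approach instead.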

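  • Pros: simple to implement; no extra configuration and no third-party libraries are needed (LibreOffice itself must of course be installed).
  • Cons: once the command has been sent, there is no way to tell whether the conversion succeeded or hit an error, so success is not guaranteed. That said, thorough testing can usually establish that the conversion is reliable.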
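The uncertainty described above can be reduced even without JodConverter. The following is a minimal sketch, not the author's code, assuming Java 9+ and a launcher that stays in the foreground until the conversion finishes (that is, not wrapped in the Windows start command, and with no other LibreOffice instance sharing the same user profile). It waits with a timeout and treats the conversion as successful only if the exit status is 0 and the expected PDF exists and is non-empty. ConversionCheck and its methods are hypothetical names; the naming rule it relies on (source base name plus .pdf, written into --outdir) is how LibreOffice names converted files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ConversionCheck {

    /** LibreOffice writes BASENAME.pdf into the output directory. */
    static Path expectedPdf(Path source, Path outDir) {
        String name = source.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return outDir.resolve(base + ".pdf");
    }

    /** A conversion "looks" successful only if a non-empty regular file was produced. */
    static boolean looksConverted(Path pdf) {
        // File.length() returns 0 for a missing file instead of throwing
        return Files.isRegularFile(pdf) && pdf.toFile().length() > 0;
    }

    /**
     * Runs the conversion command, waits at most timeoutSeconds,
     * then checks both the exit status and the output file.
     */
    static boolean convertAndVerify(List<String> command, Path expectedPdf, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true)                       // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // Java 9+: output can never fill a pipe
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();                           // give up on a hung conversion
            return false;
        }
        return process.exitValue() == 0 && looksConverted(expectedPdf);
    }
}
```

Delete any stale PDF with the same name before converting; otherwise a leftover file from an earlier run would pass the check.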

Synchronous conversion

This approach uses JodConverter: GitHub - sbraconnier/jodconverter: JODConverter automates document conversions using LibreOffice or Apache OpenOffice.

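  • Pros: the conversion is synchronous, so you know for certain whether it succeeded.
  • Cons: a LibreOffice service must be started at runtime, which consumes system resources; compared with the asynchronous approach, it also requires a third-party library and extra configuration.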

Complete code

Add the dependency (synchronous approach only)

<dependency>
    <groupId>org.jodconverter</groupId>
    <artifactId>jodconverter-local</artifactId>
    <version>4.2.4</version>
</dependency>
Add a libre.properties file to the resources directory (synchronous approach only)

Its contents:

# LibreOffice installation directory
libreOfficeHome=C:/dev/LibreOffice6.4
# To run several LibreOffice processes, list several ports: one process per port
# portNumbers=2002,2003
portNumbers=2002
# Task execution timeout: 5 minutes
taskExecutionTimeoutMinutes=5
# Task queue timeout: 1 hour
taskQueueTimeoutHours=1
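JodConverter's builder takes both timeouts in milliseconds, so the minute and hour values above have to be converted when the file is read. Below is a minimal sketch of that parsing step, assuming the property names above. LibreSettings is a hypothetical helper, not part of JodConverter, and its defaults (port 2002, 5 minutes, 1 hour) simply mirror this sample file.

```java
import java.util.Arrays;
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/** Typed view of libre.properties (hypothetical helper). */
final class LibreSettings {
    final String officeHome;
    final int[] ports;
    final long taskExecutionTimeoutMs;
    final long taskQueueTimeoutMs;

    private LibreSettings(String officeHome, int[] ports, long taskExecutionTimeoutMs, long taskQueueTimeoutMs) {
        this.officeHome = officeHome;
        this.ports = ports;
        this.taskExecutionTimeoutMs = taskExecutionTimeoutMs;
        this.taskQueueTimeoutMs = taskQueueTimeoutMs;
    }

    static LibreSettings from(Properties p) {
        // "2002,2003" or "2002, 2003" -> {2002, 2003}; each port gets its own LibreOffice process
        int[] ports = Arrays.stream(p.getProperty("portNumbers", "2002").split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
        // TimeUnit converts in long arithmetic, so large values cannot overflow an int
        long execMs = TimeUnit.MINUTES.toMillis(
                Long.parseLong(p.getProperty("taskExecutionTimeoutMinutes", "5").trim()));
        long queueMs = TimeUnit.HOURS.toMillis(
                Long.parseLong(p.getProperty("taskQueueTimeoutHours", "1").trim()));
        return new LibreSettings(p.getProperty("libreOfficeHome", "").trim(), ports, execMs, queueMs);
    }
}
```

With the file above, this yields ports {2002}, an execution timeout of 300000 ms and a queue timeout of 3600000 ms.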

Converter class LibreOfficeUtil

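package com.example.demo;

import com.example.factory.OfficeManagerInstance;
import org.jodconverter.JodConverter;

import java.io.File;

public class LibreOfficeUtil {
    /**
     * Converts an Office document to PDF with JodConverter (requires LibreOffice).
     * The conversion is synchronous: when this method returns, the conversion has finished.
     *
     * @return true if the conversion succeeded, false otherwise
     */
    public static boolean convertOffice2PDFSyncIsSuccess(File sourceFile, File targetFile) {
        try {
            OfficeManagerInstance.start();
            JodConverter.convert(sourceFile).to(targetFile).execute();
        } catch (Exception e) {
            e.printStackTrace();
            return false;
        }

        return true;
    }

    /**
     * Converts an Office document to PDF by invoking LibreOffice on the command line.
     * The conversion is asynchronous: when this method returns, the conversion may still be
     * running, and whether it hits an error is unknown.
     *
     * @param filePath       directory containing the source file (on Linux it must end with a path separator)
     * @param fileName       name of the source file
     * @param targetFilePath output directory; an empty string means the current working directory
     * @return exit status of the launched command
     */
    public static int convertOffice2PDFAsync(String filePath, String fileName, String targetFilePath) throws Exception {
        String command;
        String osName = System.getProperty("os.name");
        String outDir = targetFilePath.length() > 0 ? " --outdir " + targetFilePath : "";

        if (osName.contains("Windows")) {
            // "start" launches soffice as a separate process and returns at once, so cmd exits before the conversion ends
            command = "cmd /c cd /d " + filePath + " && start soffice --headless --invisible --convert-to pdf ./" + fileName + outDir;
        } else {
            // "libreoffice6.3" is the launcher of the LibreOffice 6.3 install on the author's CentOS 7 server; adjust for other versions
            command = "libreoffice6.3 --headless --invisible --convert-to pdf:writer_pdf_Export " + filePath + fileName + outDir;
        }

        return executeOSCommand(command);
    }

    /**
     * Runs the given command through the operating system's shell.
     * This method does not wait for the conversion itself to finish. An exit status of 0 only means
     * the command was launched correctly; it says nothing about whether the conversion succeeded or
     * failed, so the synchronous conversion is the better choice when that matters.
     * Note: Runtime.exec(String) splits the command on whitespace, so paths containing spaces may be
     * broken into separate arguments.
     */
    private static int executeOSCommand(String command) throws Exception {
        // Conversion takes time (a document of about 3 MB took roughly 8 seconds in testing), but in
        // practice execution continues as soon as the command has been dispatched, not when the conversion ends.
        Process process = Runtime.getRuntime().exec(command);

        // waitFor() already returns the exit status of the launched process
        int exitStatus = process.waitFor();

        // Release the child process
        process.destroy();
        return exitStatus;
    }
}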
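As the note in executeOSCommand says, Runtime.exec(String) splits the command string on whitespace, so a source or output path containing spaces can be broken into several arguments. One way around that, sketched below, is to hand each argument to ProcessBuilder separately. ConvertCommand is a hypothetical helper, and the launcher name (soffice, or libreoffice6.3 on the author's server) is assumed to be on the PATH. The switches are the ones documented in the libreoffice -h output, where the export filter after "pdf:" is optional and --outdir goes before the input files.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class ConvertCommand {

    /**
     * Builds the argument list for a headless PDF conversion. Each element reaches
     * LibreOffice as exactly one argument, so paths with spaces need no quoting.
     *
     * @param outDir output directory, or null to use the current working directory
     */
    static List<String> build(String launcher, Path source, Path outDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add(launcher);
        cmd.add("--headless");
        cmd.add("--invisible");
        cmd.add("--convert-to");
        cmd.add("pdf");                  // let LibreOffice pick the PDF export filter
        if (outDir != null) {
            cmd.add("--outdir");
            cmd.add(outDir.toString());
        }
        cmd.add(source.toString());      // input file comes last
        return cmd;
    }
}
```

For example, new ProcessBuilder(ConvertCommand.build("soffice", source, outDir)).inheritIO().start() launches the conversion. Unlike the Windows branch above, this does not go through start, so whether waitFor() blocks until the PDF is written depends only on the launcher itself; verify that on your platform.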

OfficeManagerInstance (synchronous approach only)

package com.example.factory;

import org.jodconverter.office.LocalOfficeManager;
import org.jodconverter.office.OfficeManager;
import org.springframework.core.io.support.PropertiesLoaderUtils;
import org.springframework.stereotype.Component;

import javax.annotation.PostConstruct;
import java.io.IOException;
import java.util.Properties;

/**
 * github https://github.com/uncleAndyChen
 * homepage https://www.lovesofttech.com/
 * author andyChen
 * since 2020/02/29
 */
@Component
public class OfficeManagerInstance {
    private static OfficeManager INSTANCE = null;

    public static synchronized void start() {
        officeManagerStart();
    }

    @PostConstruct
    private void init() {
        try {
            Properties properties = PropertiesLoaderUtils.loadAllProperties("libre.properties");
            // Defaults mirror the sample libre.properties; trim() tolerates "2002, 2003"
            String[] portNumbers = properties.getProperty("portNumbers", "2002").split(",");
            int[] ports = new int[portNumbers.length];

            for (int i = 0; i < portNumbers.length; i++) {
                ports[i] = Integer.parseInt(portNumbers[i].trim());
            }

            // install() makes this the default office manager, which JodConverter.convert(...) relies on
            LocalOfficeManager.Builder builder = LocalOfficeManager.builder().install();
            builder.officeHome(properties.getProperty("libreOfficeHome", ""));
            builder.portNumbers(ports);
            // Timeouts are in milliseconds; long arithmetic avoids int overflow for large values
            builder.taskExecutionTimeout(Long.parseLong(properties.getProperty("taskExecutionTimeoutMinutes", "5").trim()) * 60_000L); // minutes
            builder.taskQueueTimeout(Long.parseLong(properties.getProperty("taskQueueTimeoutHours", "1").trim()) * 3_600_000L); // hours

            INSTANCE = builder.build();
            officeManagerStart();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void officeManagerStart() {
        // INSTANCE stays null if init() failed or the class is not managed by Spring
        if (INSTANCE == null || INSTANCE.isRunning()) {
            return;
        }

        try {
            INSTANCE.start();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

References

https://github.com/sbraconnier/jodconverter/wiki/Getting-Started
Configuration · sbraconnier/jodconverter Wiki · GitHub
Java Library · sbraconnier/jodconverter Wiki · GitHub

Pitfalls

See: Troubleshooting runtime errors caused by jar dependency conflicts in Maven projects

Appendix: libreoffice6.3 conversion help

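There is no detailed official online documentation for converting documents with libreoffice6.3, but the built-in help (-h) is detailed enough for development.
For example, to convert a file to PDF: libreoffice6.3 --headless --invisible --convert-to pdf:writer_pdf_Export ./奇妙的记忆力.pptx. You can additionally specify the directory in which to save the PDF (--outdir); if you don't, it is saved to the current directory.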

[root@ebs-60027 lib64]#  libreoffice6.3 -h
Usage: soffice [argument...]
       argument - switches, switch parameters and document URIs (filenames).   

Using without special arguments:                                               
Opens the start center, if it is used without any arguments.                   
   {file}              Tries to open the file (files) in the components        
                       suitable for them.                                      
   {file} {macro:///Library.Module.MacroName}                                  
                       Opens the file and runs specified macros from           
                       the file.                                               

Getting help and information:                                                  
   --help | -h | -?    Shows this help and quits.                              
   --helpwriter        Opens built-in or online Help on Writer.                
   --helpcalc          Opens built-in or online Help on Calc.                  
   --helpdraw          Opens built-in or online Help on Draw.                  
   --helpimpress       Opens built-in or online Help on Impress.               
   --helpbase          Opens built-in or online Help on Base.                  
   --helpbasic         Opens built-in or online Help on Basic scripting        
                       language.                                               
   --helpmath          Opens built-in or online Help on Math.                  
   --version           Shows the version and quits.                            
   --nstemporarydirectory                                                      
                       (MacOS X sandbox only) Returns path of the temporary    
                       directory for the current user and exits. Overrides     
                       all other arguments.                                    

General arguments:                                                             
   --quickstart[=no]   Activates[Deactivates] the Quickstarter service.        
   --nolockcheck       Disables check for remote instances using one           
                       installation.                                           
   --infilter={filter} Force an input filter type if possible. For example:    
                       --infilter="Calc Office Open XML"                     
                       --infilter="Text (encoded):UTF8,LF,,,"                
   --pidfile={file}    Store soffice.bin pid to {file}.                        
   --display {display} Sets the DISPLAY environment variable on UNIX-like      
                       platforms to the value {display} (only supported by a   
                       start script).                                          

User/programmatic interface control:                                           
   --nologo            Disables the splash screen at program start.            
   --minimized         Starts minimized. The splash screen is not displayed.   
   --nodefault         Starts without displaying anything except the splash    
                       screen (do not display initial window).                 
   --invisible         Starts in invisible mode. Neither the start-up logo nor 
                       the initial program window will be visible. Application 
                       can be controlled, and documents and dialogs can be     
                       controlled and opened via the API. Using the parameter, 
                       the process can only be ended using the taskmanager     
                       (Windows) or the kill command (UNIX-like systems). It   
                       cannot be used in conjunction with --quickstart.        
   --headless          Starts in "headless mode" which allows using the      
                       application without GUI. This special mode can be used  
                       when the application is controlled by external clients  
                       via the API.                                            
   --norestore         Disables restart and file recovery after a system crash.
   --safe-mode         Starts in a safe mode, i.e. starts temporarily with a   
                       fresh user profile and helps to restore a broken        
                       configuration.                                          
   --accept={connect-string}  Specifies a UNO connect-string to create a UNO   
                       acceptor through which other programs can connect to    
                       access the API. Note that API access allows execution   
                       of arbitrary commands.                                  
                       The syntax of the {connect-string} is:                  
                         connection-type,params;protocol-name,params           
                       e.g.  pipe,name={some name};urp                         
                         or  socket,host=localhost,port=54321;urp              
   --unaccept={connect-string}  Closes an acceptor that was created with       
                       --accept. Use --unaccept=all to close all acceptors.    
   --language={lang}   Uses specified language, if language is not selected    
                       yet for UI. The lang is a tag of the language in IETF   
                       language tag.                                           

Developer arguments:                                                           
   --terminate_after_init                                                      
                       Exit after initialization complete (no documents loaded)
   --eventtesting      Exit after loading documents.                           

New document creation arguments:                                               
The arguments create an empty document of specified kind. Only one of them may 
be used in one command line. If filenames are specified after an argument,     
then it tries to open those files in the specified component.                  
   --writer            Creates an empty Writer document.                       
   --calc              Creates an empty Calc document.                         
   --draw              Creates an empty Draw document.                         
   --impress           Creates an empty Impress document.                      
   --base              Creates a new database.                                 
   --global            Creates an empty Writer master (global) document.       
   --math              Creates an empty Math document (formula).               
   --web               Creates an empty HTML document.                         

File open arguments:                                                           
The arguments define how following filenames are treated. New treatment begins 
after the argument and ends at the next argument. The default treatment is to  
open documents for editing, and create new documents from document templates.  
   -n                  Treats following files as templates for creation of new 
                       documents.                                              
   -o                  Opens following files for editing, regardless whether   
                       they are templates or not.                              
   --pt {Printername}  Prints following files to the printer {Printername},    
                       after which those files are closed. The splash screen   
                       does not appear. If used multiple times, only last      
                       {Printername} is effective for all documents of all     
                       --pt runs. Also, --printer-name argument of             
                       --print-to-file switch interferes with {Printername}.   
   -p                  Prints following files to the default printer, after    
                       which those files are closed. The splash screen does    
                       not appear. If the file name contains spaces, then it   
                       must be enclosed in quotation marks.                    
   --view              Opens following files in viewer mode (read-only).       
   --show              Opens and starts the following presentation documents   
                       of each immediately. Files are closed after the showing.
                       Files other than Impress documents are opened in        
                       default mode , regardless of previous mode.             
   --convert-to OutputFileExtension[:OutputFilterName] \                      
     [--outdir output_dir] [--convert-images-to]                               
                       Batch convert files (implies --headless). If --outdir   
                       isn't specified, then current working directory is used 
                       as output_dir. If --convert-images-to is given, its     
                       parameter is taken as the target filter format for *all*
                       images written to the output format. If --convert-to is 
                       used more than once, the last value of                  
                       OutputFileExtension[:OutputFilterName] is effective. If 
                       --outdir is used more than once, only its last value is 
                       effective. For example:                                 
                   --convert-to pdf *.odt                                      
                   --convert-to epub *.doc                                     
                   --convert-to pdf:writer_pdf_Export --outdir /home/user *.doc
                   --convert-to "html:XHTML Writer File:UTF8" \             
                                --convert-images-to "jpg" *.doc              
                   --convert-to "txt:Text (encoded):UTF8" *.doc              
   --print-to-file [--printer-name printer_name] [--outdir output_dir]         
                       Batch print files to file. If --outdir is not specified,
                       then current working directory is used as output_dir.   
                       If --printer-name or --outdir used multiple times, only 
                       last value of each is effective. Also, {Printername} of 
                       --pt switch interferes with --printer-name.             
   --cat               Dump text content of the following files to console     
                       (implies --headless). Cannot be used with --convert-to. 
   --script-cat        Dump text content of any scripts embedded in the files  
                       to console (implies --headless). Cannot be used with    
                       --convert-to.                                           
   -env:<VAR>[=<VALUE>] Set a bootstrap variable. For example: to set
                       a non-default user profile path:                        
                       -env:UserInstallation=file:///tmp/test                  

Ignored switches:                                                              
   -psn                Ignored (MacOS X only).                                 
   -Embedding          Ignored (COM+ related; Windows only).                   
   --nofirststartwizard Does nothing, accepted only for backward compatibility.
   --protector {arg1} {arg2}                                                   
                       Used only in unit tests and should have two arguments.
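The --accept switch above is what lets external clients connect to LibreOffice through the API. The sketch below renders connect strings in the documented "connection-type,params;protocol-name,params" form, one socket acceptor per port, mirroring the one-process-per-port setting in libre.properties. UnoAccept is a hypothetical helper: JodConverter builds its own launch arguments internally, so this is only for understanding the syntax or for starting an instance by hand. Because the help warns that API access allows execution of arbitrary commands, the sketch uses host=localhost.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that renders --accept switches using the syntax from the help text. */
final class UnoAccept {

    /** Socket connection plus the urp protocol, e.g. --accept=socket,host=localhost,port=2002;urp */
    static String socketAccept(String host, int port) {
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("invalid port: " + port);
        }
        return "--accept=socket,host=" + host + ",port=" + port + ";urp";
    }

    /** One acceptor switch per port, each meant for its own LibreOffice process. */
    static List<String> forPorts(int... ports) {
        List<String> args = new ArrayList<>();
        for (int port : ports) {
            args.add(socketAccept("localhost", port));
        }
        return args;
    }
}
```

When such a switch is typed into a Unix shell it must be quoted because of the ";", whereas ProcessBuilder passes it through as a single argument unchanged.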

Source

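Java: converting Office documents (.doc/.docx, .ppt/.pptx) to PDF, and then to images, with LibreOffice/OpenOffice for online preview | Andy Chen's tech log: architecture, reflections, systems analysis, team management | Strive ceaselessly; carry all things with virtue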
