Java: converting a Word .doc to XML and parsing a tree drawn in Word

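In a recent project I needed to import a tree drawn in a Word document into a database, so I decided to convert the .doc file to XML and then parse that XML into the database.
The tree in Word looks like this: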

[Figure 1: the tree drawn in Word]

After conversion to XML, the tree shows up as the following relation structure:
<o:relationtable v:ext="edit">
<o:rel v:ext="edit" idsrc="#_s1028" iddest="#_s1028"/>
<o:rel v:ext="edit" idsrc="#_s1029" iddest="#_s1028" idcntr="#_s1032"/>
<o:rel v:ext="edit" idsrc="#_s1030" iddest="#_s1028" idcntr="#_s1033"/>
<o:rel v:ext="edit" idsrc="#_s1117" iddest="#_s1028" idcntr="#_s1118"/>
<o:rel v:ext="edit" idsrc="#_s1161" iddest="#_s1028" idcntr="#_s1162"/>
</o:relationtable>
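To make the relation table easier to read, here is a minimal sketch of how its rows could be turned into a parent-to-children map. It assumes that `idsrc` is the child shape and `iddest` is its parent (which matches the sample, where every row points at `#_s1028`), and that the row whose `idsrc` equals its `iddest` marks the root. The names `RelationTree`, `Rel`, and `childrenByParent` are hypothetical and not part of the original project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: groups relation rows into a parent -> children map. */
final class RelationTree {

    /** One relation row; the connector id (idcntr) is ignored in this sketch. */
    static final class Rel {
        final String src;  // assumed: the child shape id
        final String dest; // assumed: the parent shape id

        Rel(String src, String dest) {
            this.src = src;
            this.dest = dest;
        }
    }

    /** Collects child ids under their parent id, keeping document order. */
    static Map<String, List<String>> childrenByParent(List<Rel> rels) {
        Map<String, List<String>> tree = new LinkedHashMap<>();
        for (Rel rel : rels) {
            if (rel.src.equals(rel.dest)) {
                continue; // self-referencing row, assumed to mark the root
            }
            tree.computeIfAbsent(rel.dest, k -> new ArrayList<>()).add(rel.src);
        }
        return tree;
    }

    public static void main(String[] args) {
        // The (idsrc, iddest) pairs from the sample relation table
        List<Rel> rels = List.of(
                new Rel("#_s1028", "#_s1028"),
                new Rel("#_s1029", "#_s1028"),
                new Rel("#_s1030", "#_s1028"),
                new Rel("#_s1117", "#_s1028"),
                new Rel("#_s1161", "#_s1028"));
        System.out.println(childrenByParent(rels));
    }
}
```

For the sample rows this yields a single parent, `#_s1028`, with four children. Each (child, parent) pair could then be written to the database as one row, for example in a table with a parent-id column.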
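I tried many of the format-conversion methods I found online and none of them worked well. In the end I came across an approach that records a macro in Word and then calls that macro through jacob to do the batch conversion.

Macro code: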

Sub hong1()
'
' hong1 macro
' Converts 01.doc to 04.doc from D:\doc\ into Flat XML files in D:\doc2xml\
'
    Dim i As Integer
    Dim name As String
    For i = 1 To 4
        ' Zero-pad the counter so it matches the file names 01, 02, ...
        name = Format(i, "00")
        ChangeFileOpenDirectory "D:\doc\"
        Documents.Open filename:=name & ".doc", ConfirmConversions:=False, ReadOnly:= _
            False, AddToRecentFiles:=False, PasswordDocument:="", PasswordTemplate:= _
            "", Revert:=False, WritePasswordDocument:="", WritePasswordTemplate:="", _
            Format:=wdOpenFormatAuto, XMLTransform:=""
        ChangeFileOpenDirectory "D:\doc2xml\"
        ' Save the open document as Word Flat XML
        ActiveDocument.SaveAs2 filename:=name & ".xml", FileFormat:=wdFormatFlatXML, _
            LockComments:=False, password:="", AddToRecentFiles:=True, WritePassword _
            :="", ReadOnlyRecommended:=False, EmbedTrueTypeFonts:=False, _
            SaveNativePictureFormat:=False, SaveFormsData:=False, SaveAsAOCELetter:= _
            False, CompatibilityMode:=11
        ActiveWindow.Close
    Next i
End Sub

Java code that calls the macro:

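import java.io.File;

import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

// Opens every file in the given directory in Word and runs the conversion macro on it.
static void runMacros(String path) {
        File[] files = new File(path).listFiles();
        if (files == null) {
            return; // path does not exist or is not a directory
        }
        // Start Word through COM (needs jacob and a local Word installation)
        ActiveXComponent word = new ActiveXComponent("Word.Application");
        Dispatch documents = word.getProperty("Documents").toDispatch();
        //String filename = "01.doc";
        for (File tf : files) {
            Dispatch document = Dispatch.call(documents, "Open", tf.getAbsolutePath()).toDispatch();
            // Arguments: source dir, source file name, target dir, file name without extension.
            // Note: the macro is invoked here as "macro1" with four arguments, while the macro
            // listed above is named hong1 and takes none; the name and parameters must match
            // the macro actually installed in Word.
            Dispatch.call(word, "Run", new Variant("macro1"), new Variant(path), new Variant(tf.getName()),
                    new Variant(path), new Variant(tf.getName().substring(0,tf.getName().lastIndexOf("."))));
        }

        // Dispatch.call(documents, "Close");
    }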
Once the conversion works, the XML tree can be parsed with dom4j, and that basically finishes the job.
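The post does not show the parsing code, so the following is only a sketch of what that step might look like. To stay dependency-free it uses the JDK's built-in DOM parser instead of dom4j; the same element and attribute lookups should carry over to dom4j. It assumes the `o:` prefix is bound to the Office namespace `urn:schemas-microsoft-com:office:office`, as in Word's VML output, and wraps the sample rows in a made-up `root` element so the snippet is well-formed. `RelationTableReader`, `readRelations`, and `SAMPLE` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical reader that extracts (idsrc, iddest) pairs from the relation table. */
final class RelationTableReader {

    static final String OFFICE_NS = "urn:schemas-microsoft-com:office:office";

    // Cut-down sample; the "root" wrapper is an assumption made only to declare the namespaces.
    static final String SAMPLE =
            "<root xmlns:o=\"" + OFFICE_NS + "\" xmlns:v=\"urn:schemas-microsoft-com:vml\">"
            + "<o:relationtable v:ext=\"edit\">"
            + "<o:rel v:ext=\"edit\" idsrc=\"#_s1028\" iddest=\"#_s1028\"/>"
            + "<o:rel v:ext=\"edit\" idsrc=\"#_s1029\" iddest=\"#_s1028\" idcntr=\"#_s1032\"/>"
            + "<o:rel v:ext=\"edit\" idsrc=\"#_s1030\" iddest=\"#_s1028\" idcntr=\"#_s1033\"/>"
            + "</o:relationtable></root>";

    /** Returns one {idsrc, iddest} pair per o:rel element, in document order. */
    static List<String[]> readRelations(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look elements up by namespace
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Finds o:rel elements at any depth of the document
            NodeList rels = doc.getElementsByTagNameNS(OFFICE_NS, "rel");
            List<String[]> pairs = new ArrayList<>();
            for (int i = 0; i < rels.getLength(); i++) {
                Element rel = (Element) rels.item(i);
                pairs.add(new String[] {rel.getAttribute("idsrc"), rel.getAttribute("iddest")});
            }
            return pairs;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse relation table", e);
        }
    }

    public static void main(String[] args) {
        for (String[] pair : readRelations(SAMPLE)) {
            System.out.println(pair[0] + " -> " + pair[1]);
        }
    }
}
```

Because the lookup matches `o:rel` at any depth, the same call should also work on a full converted file, where the relation table is nested well below the document root.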
