Parsing nested tables in Word documents with Python

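During my graduate studies I worked on document parsing, but only on paragraph text; I never dealt with tables. (If you have questions, you can add me on WeChat: 13161411563.)

Consider a nested table like the one in the figure below: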

[Figure 1: a nested table in a Word document]

Method 1: parse with python-docx:

import docx
from docx.oxml.table import CT_Tbl
from docx.oxml.text.paragraph import CT_P
from docx.table import Table
from docx.text.paragraph import Paragraph


def table_nested_parsing(cell, current_row, current_col):
    # current_row and current_col give the cell's position in its parent
    # table; they are not used below.
    # Walk the cell's child elements in document order.
    for block in cell._element:
        if isinstance(block, CT_P):
            # A paragraph inside the cell: print its text.
            print(Paragraph(block, cell).text)
        elif isinstance(block, CT_Tbl):
            # A table inside the cell: recurse into each of its cells.
            nested = Table(block, cell)
            for row in range(len(nested.rows)):
                for col in range(len(nested.columns)):
                    table_nested_parsing(nested.cell(row, col), row, col)


def doc_parsing(doc):
    # Walk the document body in order, handling paragraphs and tables.
    # Note: table.cell() returns the same cell for every grid position
    # covered by a merged cell, so merged cells are printed once per position.
    for doc_part in doc.element.body:
        if isinstance(doc_part, CT_P):
            print(Paragraph(doc_part, doc).text)
        elif isinstance(doc_part, CT_Tbl):
            tb1 = Table(doc_part, doc)
            for row in range(len(tb1.rows)):
                for col in range(len(tb1.columns)):
                    table_nested_parsing(tb1.cell(row, col), row, col)


if __name__ == "__main__":
    doc = docx.Document('test.docx')
    doc_parsing(doc)

Method 2: use LibreOffice to convert the Word file to XML, then parse the XML tags.
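
A rough sketch of what Method 2 could look like. The post does not say which XML format to use, so this assumes Flat OpenDocument XML (.fodt), that the soffice executable is on the PATH, and that nested tables appear as table:table elements inside table:table-cell. The function names (convert_to_fodt, top_level_tables, parse_table, parse_cell) are hypothetical, and covered (merged) cells are simply ignored.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path

# OpenDocument namespaces for tables and text.
TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

TABLE = f"{{{TABLE_NS}}}table"
ROW = f"{{{TABLE_NS}}}table-row"
CELL = f"{{{TABLE_NS}}}table-cell"
ROW_GROUPS = {
    f"{{{TABLE_NS}}}table-header-rows",
    f"{{{TABLE_NS}}}table-rows",
    f"{{{TABLE_NS}}}table-row-group",
}
PARA = f"{{{TEXT_NS}}}p"


def convert_to_fodt(docx_path, out_dir="."):
    """Convert a .docx to Flat ODF XML with LibreOffice (assumed on PATH)."""
    subprocess.run(
        ["soffice", "--headless", "--convert-to", "fodt",
         "--outdir", str(out_dir), str(docx_path)],
        check=True,
    )
    return Path(out_dir) / (Path(docx_path).stem + ".fodt")


def top_level_tables(element):
    """Yield tables that are not nested inside another table."""
    for child in element:
        if child.tag == TABLE:
            yield child  # do not descend: nested tables are handled per cell
        else:
            yield from top_level_tables(child)


def iter_rows(table_el):
    """Yield the table's rows, including rows wrapped in row groups."""
    for child in table_el:
        if child.tag == ROW:
            yield child
        elif child.tag in ROW_GROUPS:
            yield from iter_rows(child)


def parse_cell(cell_el):
    """Return paragraph texts and nested tables in document order."""
    items = []
    for child in cell_el:
        if child.tag == PARA:
            items.append("".join(child.itertext()))
        elif child.tag == TABLE:
            items.append(parse_table(child))
    return items


def parse_table(table_el):
    """Return the table as rows of cells, each cell a list of items."""
    return [[parse_cell(c) for c in row.findall(CELL)]
            for row in iter_rows(table_el)]
```

Usage would be along the lines of: root = ET.parse(convert_to_fodt('test.docx')).getroot(), then [parse_table(t) for t in top_level_tables(root)].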
