How can I extract tables from a PDF as text using Python?

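I have a PDF that contains tables, text, and some images. I want to extract the tables from this PDF.

Right now I locate the tables by hand, page by page. For each table I find, I pull out that page and save it to another PDF.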

import PyPDF2

PDFfilename = "Sammamish.pdf"  # path to the source PDF
NewPDFfilename = "allTables.pdf"  # path where the new PDF is written

# Names as of PyPDF2 2.x/3.x; older releases call these
# PdfFileReader, getPage, PdfFileWriter and addPage.
with open(PDFfilename, "rb") as inputStream:
    pfr = PyPDF2.PdfReader(inputStream)  # reader object
    pg4 = pfr.pages[126]  # page 127 (pages are zero-indexed)

    writer = PyPDF2.PdfWriter()  # writer object
    writer.add_page(pg4)  # add the page that contains the table

    # Write while the source file is still open, because page data is read lazily
    with open(NewPDFfilename, "wb") as outputStream:
        writer.write(outputStream)  # write pages to the new PDF

My goal is to extract the tables from the entire PDF document, without having to find them by hand.
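One possible way to approach that goal is sketched below. It is not part of the code above: it swaps in pdfplumber, a third-party library that is assumed to be installed (`pip install pdfplumber`), and it assumes the PDF contains real text rather than scanned images. The helper names `table_to_text` and `extract_all_tables` are made up for this illustration. Table detection is heuristic, so the results may need checking against the document.

```python
def table_to_text(rows, sep="\t"):
    """Render one table (a list of rows of cell values) as delimited text."""
    lines = []
    for row in rows:
        # Empty cells come back as None; render them as empty strings
        # and flatten line breaks inside a cell to spaces.
        cells = ["" if cell is None else str(cell).replace("\n", " ") for cell in row]
        lines.append(sep.join(cells))
    return "\n".join(lines)


def extract_all_tables(pdf_path):
    """Yield (page_number, table_text) for every table detected in the PDF."""
    # Imported lazily so table_to_text can be used without pdfplumber installed.
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        for page_number, page in enumerate(pdf.pages, start=1):
            # extract_tables() returns a list of tables, each a list of rows.
            for table in page.extract_tables():
                yield page_number, table_to_text(table)
```

With the file from the question, usage might look like `for page_number, text in extract_all_tables("Sammamish.pdf"): print(page_number, text, sep="\n")`. Pages without a detected table simply yield nothing, so the whole document can be scanned in one pass.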
