Python-78: Reading the text of a PDF file with PyPDF2 and writing it to a specified Word file 2020-09-28

Use PyPDF2 to read the text in a PDF file and write it into a specified Word file.
The code, with explanatory comments, is as follows:

import PyPDF2  # third-party package: pip install PyPDF2

# Open the PDF in binary mode; the with-block closes the file automatically
with open(r"C:/Users/Mr.R/Desktop/test1/2.pdf", "rb") as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)        # read the PDF (older PyPDF2: PdfFileReader)
    print("No. of pages:", len(pdf_reader.pages))  # page count (older PyPDF2: numPages)
    page = pdf_reader.pages[0]                     # page 1 is index 0 (older PyPDF2: getPage(0))
    txt = page.extract_text()                      # already a str (older PyPDF2: extractText())

print(txt)

# Write the text to 1.doc; UTF-8 avoids encoding errors with the Windows default codec.
# Note: this is a plain-text file with a .doc extension, not a binary Word document.
with open(r"C:/Users/Mr.R/Desktop/test1/1.doc", "w", encoding="utf-8") as word_file:
    print(word_file.write(txt))  # write() returns the number of characters written
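The script above reads only page 1 (index 0). A minimal sketch for collecting the text of every page follows; the names extract_all_text and PageLike are hypothetical helpers, and the code only assumes that each page object has an extract_text() method, as page objects in recent PyPDF2 versions do. It does not import PyPDF2 itself, so it can be tried with stand-in page objects.

```python
from typing import Iterable, Optional, Protocol


class PageLike(Protocol):
    """Hypothetical interface: anything with extract_text(), e.g. a PyPDF2 page."""

    def extract_text(self) -> Optional[str]: ...


def extract_all_text(pages: Iterable[PageLike], separator: str = "\n") -> str:
    """Return the text of every page, in page order, joined by `separator`."""
    texts = []
    for page in pages:
        # Image-only (scanned) pages yield little or no text; `or ""` also
        # guards against a None return so join() never fails.
        texts.append(page.extract_text() or "")
    return separator.join(texts)
```

Inside the with-block of the script, txt = extract_all_text(pdf_reader.pages) would replace the two lines that read page 0.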

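Running the script prints the page count and the extracted text, then creates 1.doc and writes the text into it. Strictly speaking, 1.doc is a plain-text file with a .doc extension rather than a real Word document; Word will usually still open it as plain text.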
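If a genuine Word file is wanted, the usual route is the third-party python-docx package (Document(), add_paragraph(), save()). To show what such a file actually contains, here is a standard-library-only sketch: a .docx is a ZIP archive, and a minimal one needs just three XML parts. The function name write_minimal_docx is hypothetical, and the package omits the styles and settings parts Word normally writes, so treat it as an illustration under those assumptions rather than a complete implementation.

```python
import re
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# [Content_Types].xml: declares the content type of each part in the package
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    "</Types>"
)

# _rels/.rels: points the package at its main document part
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)

# Control characters XML 1.0 forbids (tab, LF and CR are allowed)
_INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def write_minimal_docx(path: str, text: str) -> None:
    """Write `text` to a minimal .docx, one Word paragraph per line."""
    paragraphs = []
    for line in text.splitlines() or [""]:  # always emit at least one paragraph
        # Drop forbidden control characters, then escape &, < and >.
        clean = escape(_INVALID_XML_CHARS.sub("", line))
        paragraphs.append(
            f'<w:p><w:r><w:t xml:space="preserve">{clean}</w:t></w:r></w:p>'
        )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        + "".join(paragraphs)
        + "</w:body></w:document>"
    )
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("[Content_Types].xml", CONTENT_TYPES)
        zf.writestr("_rels/.rels", ROOT_RELS)
        zf.writestr("word/document.xml", document)  # str is stored as UTF-8
```

For example, write_minimal_docx(r"C:/Users/Mr.R/Desktop/test1/1.docx", txt) could stand in for the final open()/write() step of the script.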
