Merging PDF files with PyPDF2

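You can merge PDFs with desktop software (some of it paid) or with online tools (which are not secure), but the powerful Python library PyPDF2 can do it too.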

Installing PyPDF2

Install it with pip or conda.
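
To confirm the install worked, and to see where the package lives on disk (handy if its files ever need to be edited), a quick check along these lines may help. This is only a minimal sketch: `find_package_dir` is a hypothetical helper name, not part of PyPDF2.

```python
import importlib.util
import os


def find_package_dir(name):
    """Return the install directory of a top-level package, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # spec.origin points at the package's __init__.py; its folder is the package directory
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # Prints None when PyPDF2 is not installed in the current environment
    print(find_package_dir("PyPDF2"))
```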

Code

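# -*- coding: utf-8 -*-
# Note: PdfFileMerger is the class name in older PyPDF2 releases;
# PyPDF2 3.x renamed it to PdfMerger.
from PyPDF2 import PdfFileMerger

# Paths of the files to merge. These are local paths; a library such as os
# can also be used to merge every PDF in a folder.
paths = ['PDF1.pdf', 'PDF2.pdf']

file_merger = PdfFileMerger()
for pdf in paths:
    file_merger.append(pdf)  # files are appended in list order
file_merger.write("merge.pdf")
file_merger.close()  # release the file handles opened by append()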

It really is that simple.
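
Merging every PDF in a folder with the help of os could look roughly like the sketch below. The names `list_pdfs` and `merge_folder` are hypothetical helpers made up for illustration, and the import fallback assumes the class is called PdfMerger in newer PyPDF2 releases and PdfFileMerger in older ones.

```python
import os


def list_pdfs(folder):
    """Return the PDF files in folder, sorted by name so the merge order is predictable."""
    return sorted(
        os.path.join(folder, name)
        for name in os.listdir(folder)
        if name.lower().endswith(".pdf")
    )


def merge_folder(folder, output="merged.pdf"):
    """Merge every PDF in folder into a single output file."""
    # Imported here so list_pdfs can be used without PyPDF2 installed.
    # Assumption: newer releases expose PdfMerger, older ones PdfFileMerger.
    try:
        from PyPDF2 import PdfMerger as Merger
    except ImportError:
        from PyPDF2 import PdfFileMerger as Merger

    merger = Merger()
    for path in list_pdfs(folder):
        merger.append(path)
    merger.write(output)
    merger.close()
```

Writing the output outside the source folder is probably the safer choice, since otherwise a second run would pick up the previous merged file as an input.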

Problem encountered

At runtime, the following error appears:

'latin-1' codec can't encode characters in position 8-11: ordinal not in range(256)

A search online showed that the cause is the encoding of Chinese characters.
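
The same kind of failure can be reproduced without PyPDF2 at all, because latin-1 only covers code points below 256 and Chinese characters fall outside that range. A small sketch, where `can_encode` is a hypothetical helper used only for this demonstration:

```python
def can_encode(text, codec):
    """Return True if text can be encoded with the given codec, False otherwise."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    sample = "中文"
    print(can_encode(sample, "latin-1"))  # Chinese characters are outside latin-1
    print(can_encode(sample, "utf-8"))    # UTF-8 can represent them
```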

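Solution

This requires changing code inside the PyPDF2 library. First find where PyPDF2 is installed (mine is D:\Anaconda\Lib\site-packages\PyPDF2), then edit the following two files:

  • generic.py
    Around lines 483-488, the original code is: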
try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    # Name objects should represent irregular characters
    # with a '#' followed by the symbol's hex number
    if not pdf.strict:
        warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
        return NameObject(name)
    else:
        raise utils.PdfReadError("Illegal character in Name Object")

Change it to:

try:
    return NameObject(name.decode('utf-8'))
except (UnicodeEncodeError, UnicodeDecodeError) as e:
    try:
        # Fall back to GBK for names that contain Chinese characters
        return NameObject(name.decode('gbk'))
    except (UnicodeEncodeError, UnicodeDecodeError) as e:
        # Name objects should represent irregular characters
        # with a '#' followed by the symbol's hex number
        if not pdf.strict:
            warnings.warn("Illegal character in Name Object", utils.PdfReadWarning)
            return NameObject(name)
        else:
            raise utils.PdfReadError("Illegal character in Name Object")
  • utils.py
    Around line 238, the original code is:
r = s.encode('latin-1')
if len(s) < 2:
    bc[s] = r
return r

Change it to:

try:
    r = s.encode('latin-1')
except UnicodeEncodeError:
    # Fall back to UTF-8 for characters outside latin-1 (e.g. Chinese)
    r = s.encode('utf-8')
if len(s) < 2:
    bc[s] = r
return r

Done.
Reference article: https://www.codenong.com/cs105218309/
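
The two fallbacks used in the patches can be tried in isolation before touching the library. The sketch below is not PyPDF2 code: `decode_name` and `encode_text` are hypothetical stand-ins that only mirror the try/except pattern of the generic.py and utils.py changes.

```python
def decode_name(raw):
    """Decode bytes as UTF-8, falling back to GBK (same pattern as the generic.py change)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("gbk")


def encode_text(s):
    """Encode as latin-1, falling back to UTF-8 (same pattern as the utils.py change)."""
    try:
        return s.encode("latin-1")
    except UnicodeEncodeError:
        return s.encode("utf-8")


if __name__ == "__main__":
    gbk_bytes = "中文".encode("gbk")   # bytes as a GBK-encoded PDF name might carry them
    print(decode_name(gbk_bytes))
    print(encode_text("中文"))
```

Trying UTF-8 first and GBK second is a heuristic: some byte sequences are valid in both encodings, so the fallback order may matter for unusual names.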

Off topic

My first published post!!!!!!
Here's hoping 2021 goes smoothly for me and Xiao Ming!!!!!!

