使用 PyPDF2 操作 pdf 文件

使用 PyPDF2 操作 pdf 文件

Python 中读写 pdf 文件最常用的模块是 PyPDF2。

PyPDF2 将读与写分成两个类来操作:

from PyPDF2 import PdfFileWriter, PdfFileReader

writer = PdfFileWriter()
reader = PdfFileReader(open("document1.pdf", "rb"))

如果是要修改一个已有的 pdf 文件,可以将 reader 的页面添加到 writer 中:

writer.appendPagesFromReader(reader)

添加书签:

writer.addBookmark(title, pagenum, parent=parent)

一个包含添加书签方法的类:

# -*- coding: utf-8 -*-

import os

from PyPDF2 import PdfFileWriter, PdfFileReader


class Pdf(object):
    def __init__(self, path):
        self.path = path
        reader = PdfFileReader(open(path, "rb"))
        self.writer = PdfFileWriter()
        self.writer.appendPagesFromReader(reader)
        self.writer.addMetadata(reader.getDocumentInfo())

    @property
    def new_path(self):
        name, ext = os.path.splitext(self.path)
        return name + '_new' + ext

    def add_bookmark(self, title, pagenum, parent=None):
        return self.writer.addBookmark(title, pagenum, parent=parent)

    def save_pdf(self):
        with open(self.new_path, 'wb') as out:
            self.writer.write(out)

官方示例:

from PyPDF2 import PdfFileWriter, PdfFileReader

output = PdfFileWriter()
input1 = PdfFileReader(open("document1.pdf", "rb"))

# print how many pages input1 has:
print "document1.pdf has %d pages." % input1.getNumPages()

# add page 1 from input1 to output document, unchanged
output.addPage(input1.getPage(0))

# add page 2 from input1, but rotated clockwise 90 degrees
output.addPage(input1.getPage(1).rotateClockwise(90))

# add page 3 from input1, rotated the other way:
output.addPage(input1.getPage(2).rotateCounterClockwise(90))
# alt: output.addPage(input1.getPage(2).rotateClockwise(270))

# add page 4 from input1, but first add a watermark from another PDF:
page4 = input1.getPage(3)
watermark = PdfFileReader(open("watermark.pdf", "rb"))
page4.mergePage(watermark.getPage(0))
output.addPage(page4)


# add page 5 from input1, but crop it to half size:
page5 = input1.getPage(4)
page5.mediaBox.upperRight = (
    page5.mediaBox.getUpperRight_x() / 2,
    page5.mediaBox.getUpperRight_y() / 2
)
output.addPage(page5)

# add some Javascript to launch the print window on opening this PDF.
# the password dialog may prevent the print dialog from being shown,
# comment the the encription lines, if that's the case, to try this out
output.addJS("this.print({bUI:true,bSilent:false,bShrinkToFit:true});")

# encrypt your new PDF and add a password
password = "secret"
output.encrypt(password)

# finally, write "output" to document-output.pdf
outputStream = file("PyPDF2-output.pdf", "wb")
output.write(outputStream)

其他接口可参考官方文档

你可能感兴趣的:(使用 PyPDF2 操作 pdf 文件)