PyPDF2

Installation

Install it directly with pip:
pip install PyPDF2

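PyPDF2 provides four commonly used main classes: PdfFileReader, PdfFileMerger, PageObject, and PdfFileWriter. (These are the class names from PyPDF2 releases before 3.0; later releases renamed the reader, merger, and writer to PdfReader, PdfMerger, and PdfWriter.)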

Simple PDF reading and writing

from PyPDF2 import PdfFileReader, PdfFileWriter
infn = 'infn.pdf'
outfn = 'outfn.pdf'
# Create a PdfFileReader object
pdf_input = PdfFileReader(open(infn, 'rb'))
# Get the number of pages in the PDF
page_count = pdf_input.getNumPages()
print(page_count)
# Get a PageObject by zero-based index (here, the first page)
i = 0
page = pdf_input.getPage(i)

# Create a PdfFileWriter object
pdf_output = PdfFileWriter()
# Add a PageObject to the PdfFileWriter
pdf_output.addPage(page)
# Write the output to a file
with open(outfn, 'wb') as f:
    pdf_output.write(f)

Example: merging and splitting PDFs

from PyPDF2 import PdfFileReader, PdfFileWriter

def split_pdf(infn, outfn):
    pdf_output = PdfFileWriter()
    pdf_input = PdfFileReader(open(infn, 'rb'))
    # Get the total number of pages in the PDF
    page_count = pdf_input.getNumPages()
    print(page_count)
    # Write the pages after the fifth page (index 5 onward) to a new file
    for i in range(5, page_count):
        pdf_output.addPage(pdf_input.getPage(i))
    with open(outfn, 'wb') as f:
        pdf_output.write(f)

def merge_pdf(infnList, outfn):
    pdf_output = PdfFileWriter()
    for infn in infnList:
        pdf_input = PdfFileReader(open(infn, 'rb'))
        # Get the total number of pages in the PDF
        page_count = pdf_input.getNumPages()
        print(page_count)
        for i in range(page_count):
            pdf_output.addPage(pdf_input.getPage(i))
    with open(outfn, 'wb') as f:
        pdf_output.write(f)

if __name__ == '__main__':
    infn = 'infn.pdf'
    outfn = 'outfn.pdf'
    split_pdf(infn, outfn)

The source code for this example can be found at github.com/xchaoinfo/Py.
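
The split_pdf function above hard-codes the starting page. As a minimal sketch of a more general helper, the function below copies an arbitrary page range from a reader to a writer. It assumes only the legacy method names already used in this article (getNumPages, getPage, addPage); the name copy_pages and its start/stop parameters are hypothetical and are not part of PyPDF2.

```python
def copy_pages(reader, writer, start=0, stop=None):
    """Copy pages [start, stop) from reader into writer.

    reader needs getNumPages() and getPage(i); writer needs addPage(page).
    Returns the list of zero-based page indices that were copied.
    """
    page_count = reader.getNumPages()
    # Slicing a range clamps out-of-bounds values, so a start beyond the
    # last page simply copies nothing instead of raising an error.
    indices = range(page_count)[start:stop]
    for i in indices:
        writer.addPage(reader.getPage(i))
    return list(indices)
```

With the pdf_input and pdf_output objects created as in split_pdf, calling copy_pages(pdf_input, pdf_output, 5) would copy the same pages as the loop there.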

Reference: PyPDF2 Documentation

Reposted from: https://zhuanlan.zhihu.com/p/26647491

Easy Concatenation with pdfcat

PyPDF2 contains a growing variety of sample programs meant to demonstrate its features. It also contains useful scripts such as pdfcat, located within the Scripts folder. This script makes it easy to concatenate PDF files by using Python slicing syntax. Because we are slicing PDF pages, we refer to the slices as page ranges.

Page range expression examples:

:      all pages                     -1     last page
22     just the 23rd page            :-1    all but the last page
0:3    the first three pages         -2     second-to-last page
:3     the first three pages         -2:    last two pages
5:     from the sixth page onward    -3:-1  third & second to last

A third number, the stride (or step), is also recognized:

::2      0 2 4 ... to the end
1:10:2   1 3 5 7 9
::-1     all pages in reverse order
3:0:-1   3 2 1 but not 0
2::-1    2 1 0
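
Because page ranges follow Python slicing syntax, the expressions above can be reproduced with the built-in slice and range types. The sketch below is an independent illustration, not pdfcat's own parser; parse_page_range and select_pages are hypothetical names introduced here.

```python
def parse_page_range(expr):
    """Turn a page range expression such as '0:3' or '-1' into a slice."""
    parts = expr.split(":")
    if len(parts) > 3:
        raise ValueError(f"too many ':' in page range {expr!r}")
    if len(parts) == 1:
        # A single number selects exactly one page.
        index = int(parts[0])
        # For -1 the stop must be None, because slice(-1, 0) would be empty.
        stop = index + 1 if index != -1 else None
        return slice(index, stop)
    # Empty fields mean "use the default", exactly as in a Python slice.
    numbers = [int(p) if p.strip() else None for p in parts]
    return slice(*numbers)


def select_pages(expr, page_count):
    """Return the zero-based page indices that expr selects."""
    return list(range(page_count)[parse_page_range(expr)])


print(select_pages("0:3", 10))     # [0, 1, 2]
print(select_pages("-3:-1", 10))   # [7, 8]
print(select_pages("3:0:-1", 10))  # [3, 2, 1]
```

Page indices are zero-based, which is why 22 means the 23rd page.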

Usage for pdfcat is as follows:

>>> pdfcat [-h] [-o output.pdf] [-v] input.pdf [page_range...] ...

You can add as many input files as you like. You may also specify as many page ranges as needed for each file.

Optional arguments:
-h, --help      Show the help message and exit
-o, --output    Follow this argument with the output PDF file. It will be created if it doesn’t exist.
-v, --verbose   Show page ranges as they are being read

Examples:

>>> pdfcat -o output.pdf head.pdf content.pdf :6 7: tail.pdf -1

Concatenates all of head.pdf, all but page seven of content.pdf, and the last page of tail.pdf, producing output.pdf.
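
To make the grouping in that command explicit, here is a rough sketch of how the arguments pair up: each page range applies to the most recent input file, and a file with no range contributes all of its pages. The function group_pdfcat_args is a hypothetical name, and recognizing input files by their .pdf extension is an assumption made for this illustration; pdfcat's actual argument handling may differ.

```python
def group_pdfcat_args(args):
    """Group a pdfcat-style argument list into (filename, [page_ranges]) pairs."""
    groups = []
    for arg in args:
        if arg.lower().endswith(".pdf"):
            # Assumption: anything ending in .pdf is an input file.
            groups.append((arg, []))
        elif groups:
            # Otherwise treat it as a page range for the most recent file.
            groups[-1][1].append(arg)
        else:
            raise ValueError(f"page range {arg!r} given before any input file")
    # A file with no explicit range contributes all of its pages (":").
    return [(name, ranges or [":"]) for name, ranges in groups]


args = ["head.pdf", "content.pdf", ":6", "7:", "tail.pdf", "-1"]
for name, ranges in group_pdfcat_args(args):
    print(name, ranges)
```

Here content.pdf receives the two ranges :6 and 7:, which together skip index 6, the seventh page.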

>>> pdfcat chapter*.pdf >book.pdf

You can specify the output file by redirection.

>>> pdfcat chapter?.pdf chapter10.pdf >book.pdf

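Use this form if you don’t want chapter 10 to come before chapter 2: the shell expands chapter*.pdf in lexicographic order, whereas chapter?.pdf matches only the single-character chapter numbers.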
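
The same ordering issue appears when a file list is built in Python, for example before passing it to the merge_pdf function earlier in this article. The following sketch contrasts plain string sorting with a numeric-aware sort; natural_key is a hypothetical helper written for this illustration, not part of PyPDF2.

```python
import re


def natural_key(filename):
    """Sort key that compares runs of digits as numbers, not as text."""
    parts = re.split(r"(\d+)", filename)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


names = ["chapter10.pdf", "chapter2.pdf", "chapter1.pdf"]
print(sorted(names))                   # plain string order
print(sorted(names, key=natural_key))  # numeric-aware order
```

Plain sorting places chapter10.pdf before chapter2.pdf because the character 1 sorts before 2; the numeric-aware key compares 10 and 2 as integers instead.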



Another conversion tool: pdftk

Usage:

pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf

