Python code library: converting PDF to images with poppler


Installation

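1. Download poppler, then add its bin directory to the system PATH environment variable (pdf2image calls poppler's command-line tools, such as pdftoppm):
http://macappstore.org/poppler/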
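If pdf2image later complains that it cannot find poppler, a quick check is whether pdftoppm (one of the poppler tools pdf2image calls) is reachable on PATH. Below is a minimal sketch using only the standard library. find_poppler_bin is a hypothetical helper name, and the poppler_path folder in the comment is a placeholder, not a real install location.

```python
import os
import shutil
from typing import Optional


def find_poppler_bin() -> Optional[str]:
    """Return the directory containing poppler's pdftoppm, or None if it is not on PATH."""
    exe = shutil.which("pdftoppm")  # full path of the executable, or None
    return os.path.dirname(exe) if exe else None


if __name__ == "__main__":
    bin_dir = find_poppler_bin()
    if bin_dir:
        print("poppler found in:", bin_dir)
    else:
        # Fallback: pass the bin folder explicitly (placeholder path), e.g.
        # convert_from_path(pdf_path, poppler_path=r"C:\path\to\poppler\bin")
        print("pdftoppm not on PATH; add poppler's bin directory or use poppler_path=")
```

Passing poppler_path avoids touching the system PATH at all, which can be convenient on machines where you cannot change environment variables.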

2. Run
pip install pdf2image
In mainland China, a local mirror can speed up the download:
pip3 install -i https://pypi.douban.com/simple pdf2image
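Because pip and pip3 can point to different Python interpreters, it can help to confirm that pdf2image is importable from the interpreter that will actually run your script. A small standard-library check follows. is_installed is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """True if module_name can be found by the current interpreter's import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("pdf2image installed:", is_installed("pdf2image"))
```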

Sample code

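import os

from pdf2image import convert_from_path

pdf_path = r'D:\pythondev\dev\abc.pdf'
outpath = r'D:\pythondev\dev\output'
os.makedirs(outpath, exist_ok=True)  # make sure the output folder exists

# Choose a compressed format: the default 'ppm' is uncompressed, so large PDFs can raise MemoryError
images = convert_from_path(pdf_path, fmt='jpeg')

# You can also set an output folder, so pages are written to disk during conversion
images_from_path = convert_from_path(pdf_path, output_folder=outpath, fmt='png')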
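convert_from_path returns a list of PIL Image objects, so when no output_folder is given you still need to save the pages yourself. Here is a minimal sketch, assuming pages should be numbered from 1. save_pages and the page_<n> naming scheme are hypothetical choices, not part of pdf2image.

```python
import os
from typing import Iterable, List


def save_pages(images: Iterable, out_dir: str, stem: str = "page", ext: str = "jpg") -> List[str]:
    """Save each page image as <stem>_<n>.<ext> in out_dir (n starts at 1) and return the paths.

    images is expected to be the list returned by convert_from_path or
    convert_from_bytes, i.e. PIL Image objects that expose a .save(path) method.
    """
    os.makedirs(out_dir, exist_ok=True)  # create the folder if it does not exist yet
    paths = []
    for n, img in enumerate(images, start=1):
        path = os.path.join(out_dir, f"{stem}_{n}.{ext}")
        img.save(path)  # PIL infers the file format from the extension
        paths.append(path)
    return paths
```

With the sample code above, save_pages(images, outpath) would write page_1.jpg, page_2.jpg, and so on into outpath. JPEG cannot store an alpha channel, so images produced with transparent=True should be saved with ext='png' instead.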
Other useful options and fixes in recent pdf2image versions:
  • paths_only returns image file paths instead of Image objects, which prevents running out of memory when converting large files (it requires output_folder)
  • size defines the shape of the resulting images (-scale-to in the pdftoppm CLI):
    size=400 fits the image into a 400x400 box, preserving the aspect ratio
    size=(400, None) makes the image 400 pixels wide, preserving the aspect ratio
    size=(500, 500) resizes the image to exactly 500x500 pixels, without preserving the aspect ratio
  • grayscale converts images to grayscale (-gray in the pdftoppm CLI)
  • single_file converts only the first PDF page, without appending digits to the end of output_file
  • poppler_path lets you specify poppler's installation path
  • Fixed a bug where a PNG buffer with a non-terminating IEND sequence would throw an exception
  • Fixed a bug that left file descriptors open when using convert_from_bytes() (thank you @FabianUken)
  • fmt='tiff' creates .tiff files (this requires pdftocairo)
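To make the size rules above concrete, here is a small model that computes output dimensions from a page's pixel size. It is a sketch of the documented behaviour, not pdf2image's code. target_size is a hypothetical name, pdftoppm may round differently by a pixel, and the (None, h) and (None, None) cases are assumptions extrapolated from the (400, None) rule.

```python
from typing import Optional, Tuple, Union

Size = Union[int, Tuple[Optional[int], Optional[int]]]


def target_size(page_w: int, page_h: int, size: Size) -> Tuple[int, int]:
    """Approximate output (width, height) for a pdf2image-style size argument."""
    if isinstance(size, int):
        # Fit inside a size x size box, keeping the aspect ratio
        scale = min(size / page_w, size / page_h)
        return round(page_w * scale), round(page_h * scale)
    w, h = size
    if w is not None and h is not None:
        return w, h  # exact box, aspect ratio NOT preserved
    if w is not None:
        return w, round(page_h * w / page_w)  # fixed width, height follows the ratio
    if h is not None:
        # Assumption: (None, h) mirrors (w, None) with a fixed height
        return round(page_w * h / page_h), h
    return page_w, page_h  # Assumption: (None, None) means no resizing
```

For a 1000x500 page, target_size(1000, 500, 400) gives (400, 200), the same as size=(400, None), while size=(500, 500) stretches it to (500, 500).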

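Parameter reference

convert_from_bytes takes the PDF content as bytes (pdf_file) instead of a path; all other parameters are the same as in convert_from_path.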

convert_from_path(pdf_path, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None)

convert_from_bytes(pdf_file, dpi=200, output_folder=None, first_page=None, last_page=None, fmt='ppm', thread_count=1, userpw=None, use_cropbox=False, strict=False, transparent=False, single_file=False, output_file=str(uuid.uuid4()), poppler_path=None, grayscale=False, size=None)
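The first_page and last_page parameters (1-based and inclusive, like pdftoppm's -f and -l) also let you convert a large PDF in batches, so only one batch of images is held in memory at a time. Below is a minimal sketch. page_ranges and convert_in_chunks are hypothetical helpers, and the converter is passed in as an argument so the batching logic can be tried without poppler.

```python
from typing import Callable, Iterator, List, Tuple


def page_ranges(total_pages: int, chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (first_page, last_page) pairs covering every page."""
    if chunk < 1:
        raise ValueError("chunk must be >= 1")
    for first in range(1, total_pages + 1, chunk):
        yield first, min(first + chunk - 1, total_pages)


def convert_in_chunks(pdf_path: str, total_pages: int, chunk: int,
                      convert: Callable[..., List],
                      handle: Callable[[List], None]) -> None:
    """Render chunk pages at a time and hand each batch to handle before rendering the next.

    convert is meant to be pdf2image's convert_from_path; handle should save or
    process the batch so its images can be garbage-collected afterwards.
    """
    for first, last in page_ranges(total_pages, chunk):
        images = convert(pdf_path, first_page=first, last_page=last)
        handle(images)
```

In real use you would pass convert=convert_from_path (wrapped with functools.partial to add dpi or fmt). For total_pages, recent pdf2image versions provide pdfinfo_from_path(pdf_path), whose result includes a "Pages" entry; verify that against your installed version.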

References

  • https://pypi.org/project/pdf2image/
  • https://stackoverflow.com/questions/56471728/how-to-solve-memoryerror-using-python-3-7-pdf2image-library
