Pdf2htmlEX 的安装

Pdf2htmlEX的作者是国人,项目地址https://github.com/coolwanglu/pdf2htmlEX,可把PDF文件转换成html单一文件格式,最酷的是完全保持PDF的分页和各种编码、图形,还有公式格式。简而言之,PDF真的变成了一模一样的html文件。酷!!很多外国人也在用,这块目前似乎是独一无二(?欢迎有更好的大家留言给我)。

但美中不足的是,编译真心不轻松,笔者用了半天多的时间才算弄过,特此与大家分享一下。 
最初的测试环境是在ubuntu上,但生产环境用了Amazon Linux,所以此处可以分享给大家两个系统上的安装。

Amazon Linux 

1. 启用epel

yum-config-manager –enable epel 

2. 更新

yum -y update

3. 安装key 

cd /etc/pki/rpm-gpg/ 
wget http://mirrors.163.com/centos/RPM-GPG-KEY-CentOS-6 
rpm –import /etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-6

cd /etc/yum.repos.d 
wget http://linuxsoft.cern.ch/cern/scl/slc6-scl.repo

4.升级pip

pip install –upgrade setuptools

pip install lxml 
https://github.com/cms-sw/cms-docker/blob/master/slc6-vanilla/RPM-GPG-KEY-cern To -o /etc/pki/rpm-gpg/RPM-GPG-KEY-cern 
/etc/yum/pluginconf.d/priorities.conf and set enabled = 0

wget http://ftp5.gwdg.de/pub/opensuse/repositories/server:/mail/CentOS_6/x86_64/python-lxml-2.3.3-20.1.x86_64.rpm 
rpm -Uvh –nosignature python-lxml-2.3.3-20.1.x86_64.rpm

5. 安装包依赖

yum -y install libtool-ltdl-devel.x86_64 zlib-devel.x86_64 glib2-devel.x86_64 freetype-devel.x86_64 poppler-glib-devel.x86_64 git cmake mk-configure.noarch libjpeg-turbo.x86_64 libtiff.x86_64 libpng-devel.x86_64 giflib-devel.x86_64 libXt-devel.x86_64 autoconf automake libtool bzip2 libxml2.x86_64 libuninameslist-devel.x86_64 libspiro.x86_64 dbus-python-devel.x86_64 pango-devel.x86_64 chrpath uuid-c++.x86_64 uuid.x86_64 uthash-devel.noarch cmake gcc java-1.8.0-openjdk libpng-devel.x86_64 fontforge-devel.x86_64 cairo-devel.x86_64 poppler-devel.x86_64 libspiro-devel.x86_64 freetype-devel.x86_64 poppler-data libjpeg-turbo-devel git gcc-c++ libjpeg-turbo-devel.x86_64 poppler-data.noarch jpackage-utils.noarch gettext.x86_64 jpackage-utils.noarch python27-python-devel.x86_64 libxml2-python27.x86_64 libxml2-python26.x86_64 python27-python-devel.x86_64 libxslt-devel.x86_64 libxslt-python26.x86_64 libxslt.x86_64 libxml2-devel libxslt-devel python-devel python-javapackages.noarch –nogpgcheck install poppler-cpp.x86_64 poppler-cpp-devel.x86_64 libstdc++48-static.x86_64 openjpeg-devel.x86_64

6.安装库依赖

wget http://downloads.sourceforge.net/openjpeg.mirror/openjpeg-2.1.0.tar.gz 
tar -xzf openjpeg-2.1.0.tar.gz; cd openjpeg-2.1.0 
cmake . && make && make install

wget http://poppler.freedesktop.org/poppler-0.35.0.tar.xz

tar -xf poppler-0.35.0.tar.xz

./configure –enable-xpdf-headers –enable-libjpeg 

make && make install

git clone https://github.com/coolwanglu/fontforge.git fontforge.git

cd fontforge.git && git checkout pdf2htmlEX && ./autogen.sh && ./configure && make V=1 && sudo make install 
cp fontforge.pc /usr/local/lib/pkgconfig/ 
export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig 
vim CMakeLists.txt #adjust version

wget https://www.softwarecollections.org/repos/rhscl/mongodb24/epel-6-x86_64/javapackages-tools-3.4.1-1.1.el6.noarch.rpm 
wget ftp://bo.mirror.garr.it/pub/1/slc/centos/7.0.1406/os/x86_64/Packages/libxml2-2.9.1-5.el7.x86_64.rpm

git clone git://github.com/coolwanglu/pdf2htmlEX.git 
cd pdf2htmlEX && cmake . && make && sudo make install 
pkg-config –print-provides –cflags –libs poppler

7. 调整库

ln -s /usr/local/lib/libpoppler.so.54 /usr/lib64/libpoppler.so.54 
ln -s /usr/local/lib/libfontforge.so.2 /usr/lib64/libfontforge.so.2

8. 测试用的命令:

 pdf2htmlEX.exe  --hdpi 144 --vdpi 144 a.pdf a.html

Ubuntu:

1  安装git

yum install git –y

2  下载pdf2htmlEX源代码(最新version 0.12)

git clone https://github.com/coolwanglu/pdf2htmlEX.git

3  下载fontforge

git clone https://github.com/coolwanglu/fontforge.git

4  源代码安装autoconf,

版本至少在2.68以上, 不要用yum 安装,因为yum安装的版本是2.63, too old

5  yum install libtool patch libtool-ltdl-devel

解决找不到libtoolize patch
copying.lib' not found in /usr/share/libtool/libltdl'

6  升级python到2.7以上

参考此文http://www.gowhich.com/blog/553

7  升级pkg-config到版本2.8

###########################################################
tar -zxvf pkg-config-0.28.tar.gz 
cd pkg-config-0.28
./configure --with-internal-glib
make && make install

mv /usr/bin/pkg-config /usr/bin/pkg-config0.23
ln -s /usr/local/bin/pkg-config /usr/bin/pkg-config
pkg-config --version验证
###########################################################

8  yum install glib2-devel    

###########################################################
解决checking for GLIB... configure: error: Package requirements (glib-2.0 >= 2.6 gio-2.0) were not met:
No package 'glib-2.0' found
No package 'gio-2.0' found
参考 https://github.com/fontforge/fontforge/issues/564
###########################################################

9  安装fontforge

###########################################################
cd fontforge
yum install gettext
防止make过程中报错msgfmt: Command not found,则需要安装gettext以提供msgfmt命令, 参考https://github.com/coolwanglu/pdf2htmlEX/issues/118

./boostrap

./configure --without-libzmq --without-x --without-iconv --disable-python-scripting --disable-python-extension  #yum install pango-devel(如果./configure不成功)
make
make install
###########################################################

10  可选安装Cairo  (为生成SVG背景图片并且处理Type3字体)

###########################################################
安装freetype2
不要安装版本2.5.1 2.5.1遇到ahronbd.ttf等字体时, fc-chae -fv会导致freetype崩溃http://sourceforge.net/projects/freetype/files/freetype2/2.5.2/
tar -zxvf freetype-2.5.2.tar.gz
cd freetype-2.5.2
./configure
make 
make install

安装pixman
wget http://cairographics.org/releases/pixman-0.32.4.tar.gz
tar -zxvf pixman-0.32.4.tar.gz
cd pixman-0.32.4
./configure
make 
make install

export png_REQUIRES="libpng"
解决configure: error: recommended PNG functions feature could not be enabled  参考http://mattgwwalker.wordpress.com/2010/01/07/cairo-configure-issues/
xz -d cairo-1.12.14.tar.xz
tar -xvf cairo-1.12.14.tar
cd cairo-1.12.14
./configure
make 
make install

11  可选 为TTF字体增加提示信息

###########################################################
安装harfbuzz >= 0.9.19
wget http://www.freedesktop.org/software/harfbuzz/release/harfbuzz-0.9.33.tar.bz2
tar jxf harfbuzz-0.9.33.tar.bz2
cd harfbuzz-0.9.33
./configure
make 
make install

安装qt4
yum install qt4-devel

安装ttfautohint
tar -zxvf ttfautohint-1.00.tar.gz
cd ttfautohint-1.00
./configure
make 
make install
###########################################################

12  安装pdf2htmlEX

###########################################################
cd pdf2htmlEX

配置好poppler和libfontforge
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/share/fontconfig:/usr/local/lib/pkgconfig:/usr/lib64/pkgconfig:/usr/share/pkgconfig:/usr/share/glib-2.0:/usr/lib64/gio

升级gcc,使之支持lamba表达式  参考文章http://wecoding.cn/?p=190
wget http://people.centos.org/tru/devtools-2/devtools-2.repo -O /etc/yum.repos.d/devtools-2.repo
yum install devtoolset-2-gcc devtoolset-2-binutils devtoolset-2-gcc-c++
mv /usr/bin/gcc /usr/bin/gcc.backup
ln -s /opt/rh/devtoolset-2/root/usr/bin/gcc /usr/bin/gcc

mv /usr/bin/g++ /usr/bin/g++.backup
ln -s /opt/rh/devtoolset-2/root/usr/bin/g++ /usr/bin/g++
gcc -v 查看版本,查看是否支持C++

如果有CMakeCache.txt,要先删除 rm CMakeCache.txt

cmake . -DCMAKE_C_COMPILER=/usr/bin/gcc -DCMAKE_CXX_COMPILER=/usr/bin/g++
make 

if echo fatal error: glib.h no such file or directory
cp /usr/include/glib-2.0/glib.h  /usr/include/

if echo fatal error: glib/galloca.h: 没有那个文件或目录
mkdir /usr/include/glib
cp /usr/include/glib-2.0/glib/*.h /usr/include/glib

if echo fatal error: glibconfig.h: 没有那个文件或目录
cp /usr/lib64/glibconfig.h /usr/include

if echo fatal error: glib-object.h: 没有那个文件或目录
mkdir /usr/include/gobject
cp /usr/include/glib-2.0/gobject/*.h /usr/include/gobject/

把cairo export 到PKG_CONFIG_PATH
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/cairo:/usr/local/include/cairo
make install
pdf2htmlEX -v

if echo pdf2htmlEX: error while loading shared libraries: libfontforge.so.2: cannot open shared object file: No such file or directory
cp /opt/fontforge/fontforge/.libs/libfontforge.so.2 /usr/local/include/poppler/

if echo  pdf2htmlEX: error while loading shared libraries: libpoppler.so.45: cannot open shared object file: No such file or directory
cp /opt/poppler-0.25.0/poppler/.libs/libpoppler.so.45 /usr/local/include/poppler/

#要把poppler的路径export到LD_LIBRARY_PATH  (或者在编译poppler的时候,通过./configure --prefix=/usr 来避免)
export LD_LIBRARY_PATH=/usr/local/include/poppler/

为了长久有效,LD_LIBRARY_PATH写入/etc/profile
vi /etc/profile
LD_LIBRARY_PATH=/usr/local/include/poppler/
export LD_LIBRARY_PATH
source /etc/profile

如果crontab定时任务出现
sh java commond not found 
先检查是否配置java环境变量 参考http://www.cnblogs.com/zhoulf/archive/2013/02/04/2891608.html
然后软链java bin的目录到/usr/bin/java  
例如: ln -s /usr/local/jdk/java/bin   /usr/bin/java

pdf2htmlEX: error while loading shared libraries: libfontforge.so.2: cannot open shared object file: No such file or directory
cd /usr/local/include/poppler/libpoppler.so.45 /usr/lib
cd /usr/local/include/poppler/libpoppler.so.45 /usr/lib64
cd /usr/local/include/poppler/libpoppler.so.45 /usr/lib64

参考资料:

http://www.ibm.com/developerworks/cn/linux/l-cn-cmake/

http://blog.atime.me/note/cmake.html

http://www.hyzgame.com.cn/?p=1631

http://www.cmake.org/cmake-tutorial/

http://sewm.pku.edu.cn/src/paradise/reference/CMake%20Practice.pdf

http://www.linuxfromscratch.org/blfs/view/svn/general/poppler.html

http://www.linuxfromscratch.org/blfs/view/svn/general/openjpeg2.html

https://github.com/coolwanglu/pdf2htmlEX/issues/420

http://unix.stackexchange.com/questions/118550/autoreconf-fails-with-cant-exec-libtoolize

http://unix.stackexchange.com/questions/118550/autoreconf-fails-with-cant-exec-libtoolize

https://github.com/klokoy/pdf2htmlEX_docker/blob/master/Dockerfile

https://github.com/aur-archive/pdf2htmlex/blob/master/PKGBUILD

http://blog.mathieu-leplatre.info/static-build-of-cairo-and-librsvg.html

https://github.com/coolwanglu/pdf2htmlEX

你可能感兴趣的:(linux,python,python,Linux)