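Reading the Text and Images of an Office Word Document with PHP

Use PHP to read the text and images in a Word document and save them.

1. Install PHPWord with Composer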

composer require phpoffice/phpword

Package page: https://packagist.org/packages/phpoffice/phpword

 

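2. Read a docx document with PHPWord (note: this works for the docx format, not doc)

If your file is in doc format, simply open it in Word and save it as docx. If you have many doc files, a batch conversion tool can help: http://www.batchwork.com/en/doc2doc/download.htm

If you have not set up Composer autoloading yet, do that first: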

require './vendor/autoload.php';

Load the document:

$dir = str_replace('\\', '/', __DIR__) . '/';
$source = $dir . 'test.docx';
$phpWord = \PhpOffice\PhpWord\IOFactory::load($source);
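Since only docx files are supported here, it can help to run a cheap check before handing the file to IOFactory::load(). The following is a minimal sketch, not part of PHPWord: the helper name looksLikeDocx is hypothetical, and it relies only on the fact that a docx file is a ZIP archive, which begins with the bytes "PK\x03\x04". An old doc file that was merely renamed to .docx fails this check.

```php
<?php
// Hypothetical helper (not part of PHPWord): a cheap pre-check before IOFactory::load().
function looksLikeDocx(string $path): bool
{
    // The extension must be .docx (case-insensitive).
    if (strtolower(pathinfo($path, PATHINFO_EXTENSION)) !== 'docx') {
        return false;
    }
    if (!is_file($path) || !is_readable($path)) {
        return false;
    }
    // A docx file is a ZIP archive; ZIP files start with the signature "PK\x03\x04".
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    $magic = fread($handle, 4);
    fclose($handle);
    return $magic === "PK\x03\x04";
}
```

A caller might use it as a guard, for example: if (!looksLikeDocx($source)) { exit('Please convert the file to docx first'); }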

 

3. Key points

1) Alignment: \PhpOffice\PhpWord\Style\Paragraph -> getAlignment()

2) Font name: \PhpOffice\PhpWord\Style\Font -> getName()

3) Font size: \PhpOffice\PhpWord\Style\Font -> getSize()

4) Bold or not: \PhpOffice\PhpWord\Style\Font -> isBold()

5) Read an image: \PhpOffice\PhpWord\Element\Image -> getImageStringData() (pass true to get the data base64-encoded)

6) Save base64 image data as an image file: file_put_contents($imageSrc, base64_decode($imageData))
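The last point can be made a little more defensive. The sketch below assumes a hypothetical helper named saveBase64Image (it is not a PHPWord function). It decodes in strict mode and creates the target directory first, because file_put_contents() does not create missing directories.

```php
<?php
// Hypothetical helper: decode base64 image data and write it to $imageSrc.
// Returns true on success, false if the data is invalid or the file cannot be written.
function saveBase64Image(string $imageData, string $imageSrc): bool
{
    // Strict mode makes base64_decode() return false on characters outside the base64 alphabet.
    $binary = base64_decode($imageData, true);
    if ($binary === false) {
        return false;
    }
    // Create the target directory if it does not exist yet.
    $dir = dirname($imageSrc);
    if (!is_dir($dir) && !mkdir($dir, 0777, true) && !is_dir($dir)) {
        return false;
    }
    return file_put_contents($imageSrc, $binary) !== false;
}
```

With an Image element $ele2, a call could look like saveBase64Image($ele2->getImageStringData(true), $imageSrc).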

 

4. Complete code

require './vendor/autoload.php';

function docx2html($source)
{
    $phpWord = \PhpOffice\PhpWord\IOFactory::load($source);
    $html = '';
    foreach ($phpWord->getSections() as $section) {
        foreach ($section->getElements() as $ele1) {
            // Some element types have no paragraph style, and the style may be a plain style name.
            $paragraphStyle = method_exists($ele1, 'getParagraphStyle') ? $ele1->getParagraphStyle() : null;
            if ($paragraphStyle instanceof \PhpOffice\PhpWord\Style\Paragraph) {
                $html .= '<p style="text-align:' . $paragraphStyle->getAlignment() . ';text-indent:20px;">';
            } else {
                $html .= '<p>';
            }
            if ($ele1 instanceof \PhpOffice\PhpWord\Element\TextRun) {
                foreach ($ele1->getElements() as $ele2) {
                    if ($ele2 instanceof \PhpOffice\PhpWord\Element\Text) {
                        $style = $ele2->getFontStyle();
                        $styleString = '';
                        if ($style instanceof \PhpOffice\PhpWord\Style\Font) {
                            $fontFamily = $style->getName();
                            $fontSize = $style->getSize(); // Word font sizes are in points
                            $isBold = $style->isBold();
                            $fontFamily && $styleString .= "font-family:{$fontFamily};";
                            $fontSize && $styleString .= "font-size:{$fontSize}pt;";
                            $isBold && $styleString .= "font-weight:bold;";
                        }
                        // PHPWord returns UTF-8 text, so no encoding conversion is needed; escape it for HTML.
                        $html .= sprintf(
                            '<span style="%s">%s</span>',
                            htmlspecialchars($styleString),
                            htmlspecialchars((string) $ele2->getText())
                        );
                    } elseif ($ele2 instanceof \PhpOffice\PhpWord\Element\Image) {
                        $imageSrc = 'images/' . md5($ele2->getSource()) . '.' . $ele2->getImageExtension();
                        // Passing true returns the image data base64-encoded.
                        $imageData = $ele2->getImageStringData(true);
                        // To inline the image instead of saving it:
                        // $imageData = 'data:' . $ele2->getImageType() . ';base64,' . $imageData;
                        is_dir('images') || mkdir('images', 0777, true);
                        file_put_contents($imageSrc, base64_decode($imageData));
                        $html .= '<img src="' . $imageSrc . '" style="width:100%;height:auto">';
                    }
                }
            }
            $html .= '</p>';
        }
    }
    return $html;
}

$dir = str_replace('\\', '/', __DIR__) . '/';
$source = $dir . 'test.docx';
echo docx2html($source);

 

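5. Additional notes

Clearly, this is a bare-bones example of reading a Word document. It only reads paragraph alignment, the font name, size, and bold flag of each text run, and images; other information such as text color and line height is ignored. If you need more, look through the PHPWord source and see which getter methods the classes under \PhpOffice\PhpWord\Style\xxx and \PhpOffice\PhpWord\Element\xxx provide.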
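As one illustration of picking up an extra property, the sketch below adds text color to the inline style. It assumes that the font style object exposes getColor() returning a six-digit hex string without a leading "#" (this is how \PhpOffice\PhpWord\Style\Font behaves as far as I know, but check it against your PHPWord version). The helper name fontStyleToCss is hypothetical, and it accepts any object with the four getters so that it can be tried without a document. It emits the size in pt because Word font sizes are expressed in points.

```php
<?php
// Hypothetical helper: turn a font style object into an inline CSS string.
// $style is assumed to offer getName(), getSize(), isBold() and getColor(),
// as \PhpOffice\PhpWord\Style\Font does.
function fontStyleToCss(?object $style): string
{
    if ($style === null) {
        return '';
    }
    $css = '';
    $name = $style->getName();
    $size = $style->getSize();
    $color = $style->getColor();
    if ($name) {
        $css .= "font-family:{$name};";
    }
    if ($size) {
        $css .= "font-size:{$size}pt;";
    }
    if ($style->isBold()) {
        $css .= 'font-weight:bold;';
    }
    // Only accept plain six-digit hex values; skip anything else (for example "auto").
    if (is_string($color) && preg_match('/^[0-9A-Fa-f]{6}$/', $color)) {
        $css .= "color:#{$color};";
    }
    return $css;
}
```

Inside docx2html() this could replace the hand-built $styleString, provided getFontStyle() returned a style object rather than a style name.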
