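The hardest part was converting PDFs! At first I tried XPDF, but with so many languages and such a mess of encodings, where was I supposed to find an approach that works? It also requires calling an .EXE file at run time, which I expected to throw exceptions left and right.
So I went looking at PDFBox instead, but the killer was the rumor that it does not support Chinese! It is an open source Java project, so naturally what it produces is Java. How do you call that from .NET?
Just as I was getting frustrated and irritable, I came across a post on a foreign blog: studentclub.ro/lucians_weblog/archive/2007/03/22/read-from-a-pdf-file-using-c.aspx
The post reads as follows: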
I know, this may seem like a simple task, and you will probably find references on the web about how to do this. But I’ll also write a blog post on this topic, as I came across this problem today.
So, if you have a PDF file and don’t know how to read data from it, here is what you could do.
First of all, you’ll need some DLLs that will help you manipulate the PDF files. I came across the PDFBox. What is PDFBox? I’ll cite from their website: PDFBox is an open source Java PDF library for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. PDFBox also includes several command line utilities.
Oh, nice, you’ll say, but I need a .NET solution. Don’t worry. Even though PDFBox is written in Java, there is also a .NET version that is available. It utilizes IKVM (also, a very interesting project: an implementation of the Java language for .NET Framework and Mono) to create a fully functioning PDF library for the .NET framework. The released version contains a bin directory with all of the required DLL files.
So you’ll have to download the PDFBox package. In this package you’ll find a bin directory. To read your PDF file, you’ll need the following files:
You’ll have to add a reference to the first two in your project. You’ll also have to copy the last two into your project’s bin directory.
The program will look something like this (if you’re working with a Console application):
using System;
using org.pdfbox.pdmodel;
using org.pdfbox.util;

namespace PDFReader
{
    class Program
    {
        static void Main(string[] args)
        {
            PDDocument doc = PDDocument.load("lopreacamasa.pdf");
            PDFTextStripper pdfStripper = new PDFTextStripper();
            Console.Write(pdfStripper.getText(doc));
        }
    }
}
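Haha, there was hope after all!
It turns out that an open source tool called IKVM can port a Java library to a .NET version!
Even better, PDFBox 0.7.3 now supports Chinese, and supports it well!
So development turned out to be quite easy.
PS: one correction to that foreign guy's post.
The bin directory also needs bcprov-jdk14-132.dll. Without it you get an error, and it took me ages to work out that this library was the missing reference.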
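Because a missing DLL only shows up as an error at run time, a small pre-flight check could save that hunt. The following is a sketch of my own, not something shipped with PDFBox or IKVM: the DependencyCheck class and FindMissing method are hypothetical names, and the only file name taken from this post is bcprov-jdk14-132.dll. Add the other DLLs from your PDFBox bin directory to the argument list.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of PDFBox or IKVM): reports which of the
// expected DLLs are absent from a directory before any PDF code runs.
public static class DependencyCheck
{
    public static List<string> FindMissing(string binDir, params string[] dllNames)
    {
        List<string> missing = new List<string>();
        foreach (string name in dllNames)
        {
            if (!File.Exists(Path.Combine(binDir, name)))
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}

public static class Program
{
    public static void Main()
    {
        // BaseDirectory is the folder the application was started from.
        string binDir = AppDomain.CurrentDomain.BaseDirectory;
        foreach (string name in DependencyCheck.FindMissing(binDir, "bcprov-jdk14-132.dll"))
        {
            Console.WriteLine("Missing dependency: " + name);
        }
    }
}
```

Calling this once at startup turns an obscure load failure into a plain message naming the file to copy.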
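In other words, the steps for converting a PDF are:
1. Download PDFBox 0.7.3: sourceforge.net/project/showfiles.php
2. Copy the following 5 DLL files into the bin directory and reference them
The sample code then looks like this: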
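using System.IO;
using org.pdfbox.pdmodel;
using org.pdfbox.util;

// DocPath is a member of the enclosing class that holds the document folder.
public string PdfReader(string filename)
{
    // Path.Combine works whether or not DocPath ends with a separator.
    string fullname = Path.Combine(DocPath, filename);
    PDDocument doc = PDDocument.load(fullname);
    try
    {
        PDFTextStripper stripper = new PDFTextStripper();
        return stripper.getText(doc);
    }
    finally
    {
        doc.close(); // release the file even if extraction fails
    }
}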
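To run a function like PdfReader over a whole folder, something along these lines might work. This is a sketch under assumptions: BatchConverter and ConvertFolder are hypothetical names, and the extractor is passed in as a delegate so that the code compiles without the PDFBox DLLs. In a real project the delegate would wrap the PDDocument.load and getText calls from the sample above.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: applies a text extractor to every .pdf in a folder.
public static class BatchConverter
{
    // Writes one UTF-8 .txt file next to each .pdf in the folder and
    // returns the number of files that were converted successfully.
    public static int ConvertFolder(string folder, Func<string, string> extractText)
    {
        int converted = 0;
        foreach (string pdfPath in Directory.GetFiles(folder, "*.pdf"))
        {
            try
            {
                string text = extractText(pdfPath);
                File.WriteAllText(Path.ChangeExtension(pdfPath, ".txt"), text, Encoding.UTF8);
                converted++;
            }
            catch (Exception ex)
            {
                // One unreadable file should not stop the rest of the batch.
                Console.Error.WriteLine("Skipped " + pdfPath + ": " + ex.Message);
            }
        }
        return converted;
    }
}
```

Writing the output as UTF-8 keeps Chinese text intact, and catching errors per file means a single damaged PDF is logged and skipped instead of aborting the whole run.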