antiword: Installation and Usage

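OpenOffice is simply too big (a full 333.30 MB!), I rarely get the chance to use it, and even when I do it is only to read .doc files...
In the end I decided to uninstall it and switch to antiword, which takes just 0.49 MB and is ideal for reading .doc files...
While using it I ran into one small problem: by default, the text antiword generates turns the line wraps in the .doc into hard line breaks, which is clearly unnecessary...
A look at the help output (-h) showed that adding "-w 0" after the "antiword" command solves the problem (-w sets the line width; 0 means no limit)...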
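As a minimal sketch of the "-w 0" fix described above, the helper below wraps the call in a shell function. The name doc_to_text is hypothetical and not part of antiword; it assumes antiword is on the PATH and that the input file name ends in .doc.

```shell
# doc_to_text is a hypothetical helper name, not something antiword ships.
# Usage: doc_to_text report.doc   (writes report.txt next to the input)
doc_to_text() {
    src=$1
    # Swap a trailing .doc for .txt to build the output name
    out="${src%.doc}.txt"
    # -w 0 sets the line width to "no limit", so each paragraph
    # ends up on a single line instead of being hard-wrapped
    antiword -w 0 "$src" > "$out"
}
```

After sourcing the function, something like doc_to_text report.doc should leave the unwrapped text in report.txt.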
 
 
1. Installation
wget http://www.winfield.demon.nl/linux/antiword-0.37.tar.gz
tar xvzf antiword-0.37.tar.gz
cd antiword-0.37
make all
make install
 
2. English documentation
 
I know what you’re thinking: “Why not just use OpenOffice to get the text you need?” There’s a good reason. If you’ve ever used one word processor to get raw text from another, you know that formatting is often left behind. End-of-line characters and the like can remain, which makes cutting and pasting text from one source to another a problem (especially when going from a .doc file to an HTML end point). This has caused me plenty of issues when I have written articles off-line to be pasted into, say, ghacks. I have seen formatting strings left behind, only to have to go back and delete them.
When extracting text with a tool like antiword you won’t have this problem. And even though antiword is a command-line-only tool, it isn’t complicated to install or use. With this tool you can either extract the text immediately to standard output (the terminal window) or you can extract it to a text file. Both methods are simple, both are effective.
Installing antiword
The installation of antiword can be done two ways: command line or GUI. If you want to use the GUI, fire up your Add/Remove Software utility, do a search for antiword, select the results, and click apply. You will also want to install catdoc, which can be installed with the same method.
If you are partial to the command line you can open up a console and issue a command similar to:
sudo apt-get install antiword catdoc
yum install antiword catdoc
One of those is sure to install the applications on your machine.
Now, how is this tool used?
Basic usage
The basic structure of the antiword command is:
antiword [OPTIONS] file.doc
When the command structure above is used you will see the text from the .doc file scroll by in the console window. The options are not many, but are useful:
-a [PAPERSIZE] Output in Adobe PDF format. You have to specify the papersize for the document. Valid papersizes are: a3, a4, a5, b4, b5, executive, folio, legal, letter, note, quarto, statement, or tabloid.
-f Output in formatted text form. This will print bold text like *bold*, italics like /italics/, and underlined text as _underlined_.
-i This defines the image level. 0 = use non-standard Ghostscript extensions. 1 = No images. 2 = Postscript level 2. 3 = Postscript level 3.
-m Which unicode mapping file to use. You can find a listing of available mapping files in /usr/share/antiword.
So to see the text from file.doc you would issue the command:
antiword -f file.doc
which would quickly scroll the content of the file in the console window. Not much help unless you only need to copy and paste the final bit, or you can maximize the console to see all of the text. Instead you can redirect the text to a file like so:
antiword -f file.doc > file.txt
This text can now be viewed with the command:
less file.txt
PDF format
Let’s say you want to export the text from a .doc document into a .pdf document. Believe it or not, this is simple as well. For this you will need the -a option along with the associated paper size (the -p option takes the same paper sizes but produces PostScript rather than PDF). So let’s say we want to export the document into a letter-sized PDF document. To do this issue the command:
antiword -a letter file.doc > file.pdf
You might run into mapping issues here. If you do, most likely you will need to tell antiword to use the 8859-1 mapping with the command:
antiword -m 8859-1 -a letter file.doc > file.pdf
The file.pdf file will be a readable PDF document you can now use.
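The PDF export can be wrapped in a small function so the paper size is never forgotten and the output never lands on top of the input. This is only a sketch: doc_to_pdf is a hypothetical name, the letter default is an assumption, and it relies on the antiword man page's description of -a as the PDF option (with -p producing PostScript).

```shell
# doc_to_pdf is a hypothetical helper name, not part of antiword.
# Usage: doc_to_pdf file.doc [papersize]
doc_to_pdf() {
    src=$1
    # Fall back to letter when no paper size is given (an assumption)
    paper=${2:-letter}
    # The output name always ends in .pdf, so the .doc is never overwritten
    out="${src%.doc}.pdf"
    # -a selects PDF output and requires a paper size
    antiword -a "$paper" "$src" > "$out"
}
```

For example, doc_to_pdf file.doc a4 should write an A4-sized file.pdf next to file.doc.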
Final thoughts
Obviously this is only the “bare bones” of antiword. Using this command and others you can really get creative and set up automated extraction scripts and much more. If you do much pasting into formats that can’t handle carriage returns or end-of-line marks, antiword is the perfect solution for you.
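An automated extraction script of the kind mentioned above could look roughly like the following. It is a sketch under stated assumptions: extract_all is a hypothetical name, the .doc files are assumed to sit directly in one directory, and -w 0 is used so paragraphs are not hard-wrapped.

```shell
# extract_all is a hypothetical helper name, not part of antiword.
# Usage: extract_all /path/to/docs   (defaults to the current directory)
extract_all() {
    dir=${1:-.}
    for doc in "$dir"/*.doc; do
        # When nothing matches, the pattern stays literal; skip it
        [ -e "$doc" ] || continue
        txt="${doc%.doc}.txt"
        # Write each document's text next to the original file
        if antiword -w 0 "$doc" > "$txt"; then
            echo "converted: $doc"
        else
            echo "failed: $doc" >&2
            # Remove the partial output left by a failed conversion
            rm -f "$txt"
        fi
    done
}
```

Running extract_all ~/Documents would then print one line per file and leave a .txt beside every .doc it managed to convert.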
 
 

This article comes from the “mojo” blog; please do not repost it!
