Apache Tika

A tool for detecting and extracting document metadata, text content, file encoding, and character set.

The Apache Tika™ toolkit detects and extracts metadata and structured text content from various documents using existing parser libraries. You can find the latest release on the download page. See the Getting Started guide for instructions on how to start using Tika.

