SCIE PDF Text Extractor

This is an optimized version of Apache PDFBox. It extracts the rough structure of a document (pages, blocks of text, and paragraphs, as well as formatting information) and was built to improve text extraction results for scientific papers. The output can easily be converted to plain text (toString) or to an XML format (toXML).
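To make the page / block / paragraph hierarchy and the two output forms concrete, here is a minimal, self-contained sketch in Java. It is not the SCIE API: the class names (DocumentSketch, Page, Block, Paragraph), the bold flag used as a stand-in for formatting information, and the XML element names are all assumptions chosen for illustration. Only the method names toString and toXML come from the description above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, NOT the SCIE API: a document holds pages,
// a page holds blocks of text, and a block holds paragraphs.
public class DocumentSketch {

    public static final class Paragraph {
        final String text;
        final boolean bold; // assumed example of "formatting information"

        public Paragraph(String text, boolean bold) {
            this.text = text;
            this.bold = bold;
        }
    }

    public static final class Block {
        final List<Paragraph> paragraphs;

        public Block(Paragraph... paragraphs) {
            this.paragraphs = Arrays.asList(paragraphs);
        }
    }

    public static final class Page {
        final List<Block> blocks;

        public Page(Block... blocks) {
            this.blocks = Arrays.asList(blocks);
        }
    }

    final List<Page> pages;

    public DocumentSketch(Page... pages) {
        this.pages = Arrays.asList(pages);
    }

    // Plain-text view: one paragraph per line, structure and formatting dropped.
    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (Page page : pages) {
            for (Block block : page.blocks) {
                for (Paragraph p : block.paragraphs) {
                    sb.append(p.text).append('\n');
                }
            }
        }
        return sb.toString();
    }

    // XML view: keeps the nesting and the formatting flag (element names are assumed).
    public String toXML() {
        StringBuilder sb = new StringBuilder("<document>");
        for (Page page : pages) {
            sb.append("<page>");
            for (Block block : page.blocks) {
                sb.append("<block>");
                for (Paragraph p : block.paragraphs) {
                    sb.append("<paragraph bold=\"").append(p.bold).append("\">")
                      .append(escape(p.text))
                      .append("</paragraph>");
                }
                sb.append("</block>");
            }
            sb.append("</page>");
        }
        return sb.append("</document>").toString();
    }

    // Escape the characters that would otherwise break XML element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        DocumentSketch doc = new DocumentSketch(
            new Page(new Block(
                new Paragraph("Abstract", true),
                new Paragraph("We compare A & B.", false))));
        System.out.print(doc);           // plain text
        System.out.println(doc.toXML()); // structured XML
    }
}
```

The plain-text form discards the hierarchy, while the XML form preserves it, which is the trade-off the two output methods suggest. For the real class and method signatures, consult the project homepage listed below.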

homepage: openresearch.cit-ec.de/projects/scie
last release: 2 years ago, first release: 2 years ago
packaging: jar
get this artifact from: central
see this artifact on: search.maven.org

