[Okular-devel] Export from pdf to txt, invoking from the command line
Jiri Baum
jiri at baum.com.au
Fri Nov 11 07:18:02 UTC 2011
Hello,
filippo di natale:
> I need to parse "csv" or "fixed length" like documents that are
> unfortunately in pdf format, if anyone has any suggestion on how to parse
> them without translating them to text...
The library that Okular uses is Poppler - http://poppler.freedesktop.org
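If going through plain text turns out to be acceptable after all, Poppler also ships a command-line utility, pdftotext, whose -layout option tries to keep the physical column alignment of the page. A minimal Python sketch of calling it follows; the helper names (pdftotext_command, pdf_to_text) and the file name in the usage note are made up for illustration, and it assumes pdftotext is installed and on the PATH.

```python
import shutil
import subprocess


def pdftotext_command(pdf_path, first_page=None, last_page=None):
    """Build a pdftotext command line that writes layout-preserved text to stdout."""
    cmd = ["pdftotext", "-layout"]      # -layout: keep the physical layout of the page
    if first_page is not None:
        cmd += ["-f", str(first_page)]  # -f: first page to convert
    if last_page is not None:
        cmd += ["-l", str(last_page)]   # -l: last page to convert
    cmd += [pdf_path, "-"]              # "-" as the output file means stdout
    return cmd


def pdf_to_text(pdf_path, first_page=None, last_page=None):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH (it is part of the Poppler utilities)")
    result = subprocess.run(
        pdftotext_command(pdf_path, first_page, last_page),
        check=True,
        capture_output=True,
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Something like pdf_to_text("report.pdf", first_page=2, last_page=5) would then return the text of pages 2 to 5, with columns roughly lined up as they were on the page. How well the alignment survives will depend on the document.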
For "fixed length" like documents in pdf format, the recently implemented
"Table Selection Tool" might be useful - see very recent git master and/or
bugs 279859 and 283440. It lets you select the "fixed length" part of
the pdf document, divide it up into rows and columns, then paste the result
into a spreadsheet or other tabular document.
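To illustrate the "divide it up into rows and columns" idea outside the GUI, here is a rough Python sketch that cuts fixed-length text lines at given character offsets and writes them out as CSV. This is not how the Table Selection Tool is implemented; the function names and the column offsets are hypothetical, and you would have to work out the offsets for your own documents.

```python
import csv
import io


def split_fixed_width(line, boundaries):
    """Cut one line at the given character offsets and strip each cell."""
    edges = [0, *boundaries, None]  # None means "to the end of the line"
    return [line[start:end].strip() for start, end in zip(edges, edges[1:])]


def fixed_width_to_csv(lines, boundaries):
    """Convert fixed-length lines to CSV text, skipping blank lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        if line.strip():
            writer.writerow(split_fixed_width(line, boundaries))
    return out.getvalue()
```

For example, with columns starting at offsets 0, 10 and 16, fixed_width_to_csv(lines, [10, 16]) gives one CSV row of three cells per non-blank input line.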
If you need automated processing, there are tools like TableSeer
(http://tableseer.sf.net) floating around, but be prepared for only moderate
accuracy: sometimes it finds and extracts the tables, sometimes it doesn't,
or only partially. It will probably depend on your documents.
Jiri
--
Jiri Baum <jiri at baum.com.au> http://www.baum.com.au/sabik