[Okular-devel] Export from pdf to txt, invoking from the command line

Jiri Baum jiri at baum.com.au
Fri Nov 11 07:18:02 UTC 2011


Hello,

filippo di natale:
> I need to parse "csv" or "fixed length" like documents that are
> unfortunately in pdf format, if anyone has any suggestion on how to parse
> them without translating them to text...

The PDF library that Okular uses is Poppler - http://poppler.freedesktop.org

For "fixed length" like documents in pdf format, the recently-implemented 
"Table Selection Tool" might be useful - see very recent git master and/or 
bugs 279859 and 283440. That will let you select the "fixed length" part of 
the pdf document, divide it up into rows and columns, then paste into a 
spreadsheet or other tabular document.
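If you want to script that row-and-column step for the "fixed length" case, here is a minimal Python sketch. It assumes you already have the table as fixed-width lines of text, for instance from Poppler's pdftotext run with -layout. The column boundaries (character offsets 10 and 15) and the sample rows are hypothetical; choose boundaries that fit your own documents. The output is tab-separated, which most spreadsheets accept on paste.

```python
import csv
import io


def split_fixed_width(lines, boundaries):
    """Split each non-blank line into cells at the given character offsets.

    `boundaries` is a hypothetical list of column start offsets (excluding 0)
    that you would pick by inspecting your own document.
    """
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [None]  # last column runs to end of line
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines between table rows
        rows.append([line[s:e].strip() for s, e in zip(starts, ends)])
    return rows


def to_tsv(rows):
    """Serialise rows as tab-separated text suitable for pasting into a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample input laid out in fixed-width columns.
sample = [
    "Item      Qty   Price",
    "Apples      3    1.20",
    "",
    "Pears      12    0.45",
]
rows = split_fixed_width(sample, [10, 15])
print(to_tsv(rows), end="")
```

Picking the boundaries by hand mirrors what the Table Selection Tool has you do interactively. It works only when the columns really line up; ragged layouts need something smarter.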

If you need automated processing, there are tools like TableSeer floating 
around, but be prepared for only moderate accuracy: sometimes it finds and 
extracts the tables, sometimes it doesn't, or does so only partially. It will 
probably depend on your documents. http://tableseer.sf.net


Jiri
-- 
Jiri Baum <jiri at baum.com.au>                   http://www.baum.com.au/sabik

