More pdf2kmymoney (overflows/wrapping lines)

Jack ostroffjh at users.sourceforge.net
Thu Dec 31 20:14:21 GMT 2020


I started this yesterday, and I know there have been additional posts 
since, but I think this particular point hasn't been resolved.

On 12/30/20 8:59 PM, pjfarley3 at earthlink.net wrote:
> In my experience pdftotext does not “overflow lines”.  That is 
> probably “extra information” (i.e., “Memo” field data) related to the 
> transaction on the previous line.  That is quite common in bank 
> statements.  You have to expect such lines and be prepared to attach 
> them  to the prior transaction.   I do it as the “Memo” field in my 
> output.
Aaron would have to confirm, but I suspect he is referring to a case
where a single table row as shown in the PDF has two lines of text in
each cell, because there is just too much text for one line.  Because
PDF only knows exactly where on the page each piece of text is, but not
why it is there (it has no information about things like tables), the
text output would have two lines.  The first would have the first line
of text from each cell, and the second would have the second line of
text from each cell.  Putting them back together is theoretically
possible, but only if there is some way to know that the second line is
not a new row (missing header info?), or as part of a manually
controlled cleanup phase of the conversion.
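
To make the idea concrete, here is a rough Python sketch of that kind of
recombination.  It is only an illustration under some big assumptions,
not how pdf2kmymoney actually works: the text is assumed to come from a
layout-preserving dump with fixed column positions, the first column is
assumed to hold a date that never wraps, and the column offsets, the
date format and the function names are all made up for the example.

```python
import re

# Hypothetical column boundaries (character offsets) in a layout-preserved
# text dump; a real statement would need its own values.
COLUMNS = [(0, 12), (12, 40), (40, 52)]  # date, description, amount

# Assumed date format (e.g. 12/30/2020); adjust for the actual statement.
DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")


def split_cells(line):
    """Slice a fixed-width line into a list of stripped cell strings."""
    return [line[start:end].strip() for start, end in COLUMNS]


def merge_wrapped_rows(lines):
    """Fold continuation lines into the preceding table row.

    A line starts a new row when its first cell is a date (or when no row
    has been seen yet). Any other non-blank line is treated as the wrapped
    second line of the previous row, and its cells are appended cell by
    cell to that row.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cells = split_cells(line)
        if DATE_RE.fullmatch(cells[0]) or not rows:
            rows.append(cells)
        else:
            prev = rows[-1]
            for i, text in enumerate(cells):
                if text:
                    prev[i] = (prev[i] + " " + text).strip()
    return rows
```

The "first cell is a date" test is the "some way to know" part.  If the
statement has no column that reliably marks the start of a row, this
heuristic fails and a manual cleanup step would still be needed.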