More pdf2kmymoney (overflows/wrapping lines)

Aaron Mehl mehlzaidy770 at yahoo.com
Thu Dec 31 22:22:13 GMT 2020


 Just as an experiment, I manually deleted the overflow lines. But that isn't automatic. And as I read on and experiment, I think that semi-automatic might be the best option. So to rephrase my question: what is the best semi-automatic way to bring a PDF bank statement into KMyMoney?
I see that, without serious programming, the converters from text to QIF or to CSV (I googled and tried a few) all require manual input. The question is where in the food chain is the best place to make these changes. I see that pdftotext doesn't like a wide column length, and I gather there is no way to change it? QIF seems to want deposits listed with a plus sign and expenses with a minus. There are probably other things that would need tweaking.
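As a rough illustration of the sign handling mentioned above, here is a minimal Python sketch that writes already-parsed transactions as QIF bank records. The function name `to_qif` and the dictionary keys (`date`, `payee`, `memo`, `amount`, `is_deposit`) are made up for this example; the record letters (`D`, `T`, `P`, `M`, `^`) are the usual QIF ones. It writes deposits as positive amounts and expenses as negative ones, and it assumes the parsing has already been done.

```python
from decimal import Decimal


def to_qif(transactions):
    """Render parsed transactions as a QIF bank-account file (as a string).

    Each transaction is a dict with hypothetical keys: date, payee,
    memo (optional), amount (Decimal, always positive) and is_deposit.
    """
    lines = ["!Type:Bank"]
    for t in transactions:
        # Expenses get a minus sign; deposits stay positive.
        amount = t["amount"] if t["is_deposit"] else -t["amount"]
        lines.append("D" + t["date"])
        lines.append("T" + format(amount, ".2f"))
        lines.append("P" + t["payee"])
        if t.get("memo"):
            lines.append("M" + t["memo"])
        lines.append("^")  # end of record
    return "\n".join(lines) + "\n"
```

The date is passed through unchanged, so it has to match whatever date format is selected in the import profile.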
So I wonder what is the best way to get bank statements into KMyMoney. My bank only lets me get a PDF.
Aaron
    On Thursday, December 31, 2020, 04:41:34 PM EST, <pjfarley3 at earthlink.net> wrote:  
 
Jack,

  

It is quite common in bank statement PDFs for transactions to be formatted like this (I hope the alignment works, I will format as fixed-font to try to help):

  

MM/DD/YY   Payee Name                 Amount paid          Running balance
           Additional info about payment
           Can be multiple lines

MM/DD/YY   Next Payee Name            Amount paid          Running balance

MM/DD/YY   DEPOSIT                    Amount deposited     Running balance

  

So when the PDF is translated to text, those “additional info” line(s) appear as separate physical lines without the MM/DD/YY header or any money amounts following.
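A minimal Python sketch of how those lines could be attached to the preceding transaction, assuming every real transaction line starts with an MM/DD/YY date and anything else is extra information. The function name `fold_memo_lines` and the date pattern are assumptions for illustration, not something taken from an existing converter.

```python
import re

# Assumption: a transaction line starts with a date such as 12/31/20.
DATE_RE = re.compile(r"^\d{2}/\d{2}/\d{2}\s")


def fold_memo_lines(lines):
    """Group pdftotext output into (transaction_line, memo) pairs.

    Lines without a leading date are treated as memo text for the most
    recent dated line; blank lines are skipped.
    """
    transactions = []
    for raw in lines:
        line = raw.rstrip()
        if not line.strip():
            continue
        if DATE_RE.match(line):
            transactions.append({"line": line, "memo": []})
        elif transactions:
            transactions[-1]["memo"].append(line.strip())
        # Undated lines before the first transaction are ignored.
    return [(t["line"], " ".join(t["memo"])) for t in transactions]
```

Page headers and footers that land between two transactions would also end up in the memo with this approach, so a manual check or an extra filter is probably still needed.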

  

Depending heavily on the PDF construction, I have also (but rarely) seen the money amounts (paid or deposited and balance) show up on the SECOND line after conversion of the PDF to text.  The pdftotext “-layout” switch has improved over time to where I seldom see this any more, but it can happen.

  

Like I said, it can get complicated.

  

Peter

  

From: KMyMoney <kmymoney-bounces at kde.org> On Behalf Of Jack
Sent: Thursday, December 31, 2020 3:14 PM
To: kmymoney at kde.org
Subject: Re: More pdf2kmymoney (overflows/wrapping lines)

  

I started this yesterday, and I know there have been additional posts since, but I think this particular point hasn't been resolved.

  

On 12/30/20 8:59 PM, pjfarley3 at earthlink.net wrote:


In my experience pdftotext does not “overflow lines”.  That is probably “extra information” (i.e., “Memo” field data) related to the transaction on the previous line.  That is quite common in bank statements.  You have to expect such lines and be prepared to attach them  to the prior transaction.   I do it as the “Memo” field in my output. 


Aaron would have to confirm, but I suspect he refers to a case where a single table row as shown in the PDF has two rows of text in each cell, because there is just too much text for one line. Because PDF knows only about where exactly on the page any text is, but not why it is there (no information about things like tables), the text output would have two lines. The first would have the first line of text from each cell, and the second would have the second line of text from each cell. Putting them back together is theoretically possible, but only if there is some way to know that the second line is not a new row (missing header info?), or as part of a manually controlled cleanup phase of the conversion.
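A minimal Python sketch of putting such a wrapped row back together, assuming the text came from a fixed-width layout and that the starting offset of each column is known (for example, worked out by hand from the statement's header line). The function name `merge_wrapped_row` and the offsets are hypothetical, and deciding that the second line really is a continuation is left to the caller.

```python
def merge_wrapped_row(first, second, starts):
    """Merge two physical text lines that belong to one table row.

    starts holds the character offset where each column begins, in
    ascending order; the last column runs to the end of the line.
    Returns one merged string per column.
    """
    bounds = list(starts) + [None]
    cells = []
    for begin, end in zip(bounds, bounds[1:]):
        top = first[begin:end].strip()
        bottom = second[begin:end].strip()
        # Join the two halves of the cell; either half may be empty.
        cells.append((top + " " + bottom).strip())
    return cells
```

This only works if the columns line up the same way on both lines, which depends on how the PDF was built.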
  