More pdf2kmymoney (overflos/wrapping lines)
Jack
ostroffjh at users.sourceforge.net
Thu Dec 31 23:08:23 GMT 2020
I really hate to be negative, but I think you're fighting a losing
battle. If you can program with almost any scripting language, and are
willing to spend some time experimenting, you can likely pull together
something that works for you, depending on how much time you think the
effort is worth.
On the sign of transactions, how would KMM know whether it's a deposit
or withdrawal? The csv import gives you two ways. First, the amount
column needs to have minus signs on withdrawals. (There is a check box
to reverse sign if the deposits show up as negative.) The other way is
to have separate columns for credits and for debits. If the statement
actually uses positive numbers for both, and doesn't give you any way
to reverse the appropriate ones, you will probably end up with as much
effort in post-import editing as you would have had just typing them in
manually in the first place. Remember, you will probably also need to
post-import adjust most of the categories.
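
For what it's worth, if the statement text does let you tell debits from
credits (for example, separate columns), collapsing them into one signed
amount column is only a few lines of script. This is just a rough sketch
in Python, not anything KMM provides: the function names, the column
order (date, payee, debit, credit) and the header names are all made up
for illustration, and your statement will almost certainly differ.

```python
import csv
import io
from decimal import Decimal


def signed_amount(debit: str, credit: str) -> Decimal:
    """Return a negative value for a debit and a positive one for a credit.

    Assumes exactly one of the two fields is filled in, and that amounts
    use ',' as a thousands separator and '.' as the decimal point.
    """
    debit = debit.replace(",", "").strip()
    credit = credit.replace(",", "").strip()
    if debit and credit:
        raise ValueError("row has both a debit and a credit")
    if debit:
        return -Decimal(debit)
    if credit:
        return Decimal(credit)
    raise ValueError("row has neither a debit nor a credit")


def to_single_amount_csv(rows) -> str:
    """Build CSV text with one signed Amount column.

    `rows` is an iterable of (date, payee, debit, credit) string tuples;
    that layout is an assumption, not something taken from a real statement.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Date", "Payee", "Amount"])
    for date, payee, debit, credit in rows:
        writer.writerow([date, payee, str(signed_amount(debit, credit))])
    return out.getvalue()
```

If the deposits come out negative instead, the reverse-sign check box in
the csv import covers that without touching the script.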
On 2020.12.31 17:22, Aaron Mehl wrote:
> Just as an experiment I manually deleted the overflow lines. But
> that isn't automatic. And as I read on and experiment, I think that
> semi-automatic might be the best option. So to rephrase my
> question: What is the best semi-automatic way to bring a pdf bank
> statement into KMyMoney?
> I see that, without serious programming, the converters from text to
> QIF or to csv (I googled and tried a few) all require manual input. The
> question is where in the food chain is the best place to make these
> changes. I see that pdftotext doesn't like a wide column length, and I
> gather there is no way to change it? QIF seems to want deposits listed
> with a plus sign and expenses with a minus. There are probably other
> things that would need tweaking.
> So I wonder what is the best way to get bank statements into
> KMyMoney. My bank only lets me get a pdf.
> Aaron
> On Thursday, December 31, 2020, 04:41:34 PM EST,
> <pjfarley3 at earthlink.net> wrote:
>
> Jack,
>
>
>
> It is quite common in bank statement PDFs to have transactions be
> formatted like this (I hope the alignment works, I will format as
> fixed-font to try to help):
>
>
>
> MM/DD/YY  Payee Name        Amount paid       Running balance
>           Additional info about payment
>           Can be multiple lines
>
> MM/DD/YY  Next Payee Name   Amount paid       Running balance
>
> MM/DD/YY  DEPOSIT           Amount deposited  Running balance
>
>
> So when the PDF is translated to text, those “additional info”
> line(s) appear as separate physical lines without the MM/DD/YY header
> or any money amounts following.
>
>
>
> Depending heavily on the PDF construction, I have also (but rarely)
> seen the money amounts (paid or deposited and balance) show up on the
> SECOND line after conversion of the PDF to text. The pdftotext
> “-layout” switch has improved over time to where I seldom see this
> any more, but it can happen.
>
>
>
> Like I said, it can get complicated.
>
>
>
> Peter
>
>
>
> From: KMyMoney <kmymoney-bounces at kde.org> On Behalf Of Jack
> Sent: Thursday, December 31, 2020 3:14 PM
> To: kmymoney at kde.org
> Subject: Re: More pdf2kmymoney (overflos/wrapping lines)
>
>
>
> I started this yesterday, and I know there have been additional posts
> since, but I think this particular point hasn't been resolved.
>
>
>
> On 12/30/20 8:59 PM, pjfarley3 at earthlink.net wrote:
>
>
> In my experience pdftotext does not “overflow lines”. That is
> probably “extra information” (i.e., “Memo” field data) related to the
> transaction on the previous line. That is quite common in bank
> statements. You have to expect such lines and be prepared to attach
> them to the prior transaction. I do it as the “Memo” field in my
> output.
>
>
> Aaron would have to confirm, but I suspect he refers to a case where
> a single table row as shown in the PDF has two rows of text in each
> cell, because there is just too much text for one line. Because PDF
> knows only where exactly on the page any text is, but not why
> it is there (no information about things like tables), the text output
> would have two lines. The first would have the first line of text
> from each cell, and the second would have the second line of text from
> each cell. Putting them back together is theoretically possible, but
> only if there is some way to know that the second line is not a new
> row (missing header info?), or as part of a manually controlled cleanup
> phase of the conversion.
>
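
To make the "missing header info" idea concrete, here is a minimal sketch
in Python of one way to do the re-joining: treat any line that starts
with an MM/DD/YY date as a new row, and glue every other non-blank line
onto the previous row as memo text. The function name, the dictionary
keys and the date pattern are all assumptions for illustration. It only
works if every real transaction line starts with a date, and it does not
try to handle amounts that land on the second line.

```python
import re

# A line counts as a new transaction only if it starts with MM/DD/YY.
# (Assumed date format; adjust the pattern to match your statement.)
DATE_RE = re.compile(r"^\s*(\d{2}/\d{2}/\d{2})\s+(.*)$")


def group_transactions(lines):
    """Attach undated continuation lines to the preceding dated line."""
    transactions = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        match = DATE_RE.match(line)
        if match:
            transactions.append(
                {"date": match.group(1), "text": match.group(2).strip(), "memo": []}
            )
        elif transactions:
            # No date at the start: treat it as extra info for the prior row.
            transactions[-1]["memo"].append(line.strip())
        # Undated lines before the first transaction (page headers) are dropped.
    for txn in transactions:
        txn["memo"] = " ".join(txn["memo"])
    return transactions
```

The "text" part still has payee and amounts mashed together, so
splitting those apart (and the manual cleanup pass) is a separate step.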