More pdf2kmymoney (overflows/wrapping lines)
Brendan Coupe
brendan at coupeware.com
Thu Dec 31 23:40:52 GMT 2020
I have been reading along with interest. While I agree that normally
"Making things clear/easy for the end user is never a losing battle", in
this case I think you are hoping to make something that is extremely
difficult (almost impossible and ever changing) easy enough for a normal
user. You may lose this battle since you cannot control the source of the
input data.
I have written a fair number of text processing scripts to grab data from
websites or from other forms. I have never tried to extract tables from a
PDF file. I do know that the slightest change in the source can result in
hours of troubleshooting to find and fix the problem, especially when it's
been a while since you worked with the script. I would never attempt to do
this from a bank PDF. Most of the time banks can't even get OFX files to
follow the OFX spec.
If you want to make it easy for the user, you only need one line:
Step 1: Switch banks ☺
Done.
I have accounts at many US banks and all either provide direct connect
access to OFX data or allow me to download OFX files from the website. I
understand this support is disappearing but for now it's still an option at
many US banks. Before I open an account, I see if it has direct connect
support in KMM.
----
Brendan Coupe
On Thu, Dec 31, 2020 at 4:16 PM Aaron Mehl <mehlzaidy770 at yahoo.com> wrote:
> Well,
> I hear you, but since I am not doing this for me, but for an average user,
> I ask my questions, experiment, and then write procedures showing how to
> import a pdf file.
> The minute I try to do my own/their own scripting I forget who my audience
> is. There is data about the education/intellectual level of the average
> user, and it rules out scripting.
>
> If there was already such a script it would be another matter.
> I am more interested in making it as easy as possible, I realize it won't
> be perfect.
> Making things clear/easy for the end user is never a losing battle.
> Aaron
>
> On Thursday, December 31, 2020, 06:08:42 PM EST, Jack <
> ostroffjh at users.sourceforge.net> wrote:
>
>
> I really hate to be negative, but I think you're fighting a losing
> battle. If you can program with almost any scripting language, and are
> willing to spend some time experimenting, you can likely pull together
> something that works for you, depending on how long you think the
> effort is worth.
>
> On the sign of transactions, how would KMM know whether it's a deposit
> or withdrawal? The csv import gives you two ways. First, the amount
> column needs to have minus signs on withdrawals. (There is a check box
> to reverse sign if the deposits show up as negative.) The other way is
> to have separate columns for credits and for debits. If the statement
> actually uses positive numbers for both, and doesn't give you any way
> to reverse the appropriate ones, you will probably end up with as much
> effort in post-import editing as you would have had just typing them in
> manually in the first place. Remember, you will probably also need to
> post-import adjust most of the categories.
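One way to recover the signs when a statement prints every amount as a positive number is to compare each amount against the running balance. The following is only a minimal sketch under stated assumptions: the statement shows a running balance after every transaction, the opening balance is known, and the rows have already been split into fields. The function names `sign_amounts` and `to_csv` and the column headers are hypothetical, not part of KMyMoney.

```python
import csv
import io
from decimal import Decimal


def _to_decimal(text):
    """Parse a money string such as '1,074.50' into a Decimal."""
    return Decimal(text.replace(",", ""))


def sign_amounts(opening_balance, rows):
    """Return (date, payee, signed_amount) tuples.

    rows holds (date, payee, amount_text, balance_text) with unsigned
    amounts. A row is treated as a deposit when the balance went up by
    the amount, and as a withdrawal when it went down by the amount.
    """
    previous = _to_decimal(opening_balance)
    signed = []
    for date, payee, amount_text, balance_text in rows:
        amount = _to_decimal(amount_text)
        balance = _to_decimal(balance_text)
        if previous + amount == balance:
            value = amount
        elif previous - amount == balance:
            value = -amount
        else:
            # The balance does not match either direction: stop rather than guess.
            raise ValueError(f"balance mismatch on {date} {payee}")
        signed.append((date, payee, value))
        previous = balance
    return signed


def to_csv(signed_rows):
    """Write a single signed Amount column, minus sign on withdrawals."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Date", "Payee", "Amount"])
    for date, payee, value in signed_rows:
        writer.writerow([date, payee, str(value)])
    return out.getvalue()
```

If a statement does not print a running balance, this approach does not apply and the signs still have to come from somewhere else, for example separate credit and debit columns.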
>
> On 2020.12.31 17:22, Aaron Mehl wrote:
> > Just as an experiment I manually deleted the overflow lines. But
> > that isn't automatic. And as I read on and experiment, I think that
> > semi-automatic might be the best option. So to rephrase my
> > question: What is the best semi-automatic way to bring a pdf bank
> > statement into KMyMoney?
> > I see that without serious programming, the converters (I googled and
> > tried a few) from text to Qif or to csv all require manual input. The
> > question is where in the food chain is the best place to make these
> > changes. I see that pdftotext doesn't like a wide column length, and I
> > gather there is no way to change it? Qif seems to want deposits listed
> > with a plus sign and expenses with a minus. There are probably other
> > things that would need tweaking.
> > So I wonder what is the best way to get bank statements into
> > KMyMoney. My bank only lets me get a pdf.
> > Aaron
> > On Thursday, December 31, 2020, 04:41:34 PM EST,
> > <pjfarley3 at earthlink.net> wrote:
> >
>
> > Jack,
> >
> >
> >
> > It is quite common in bank statement PDFs to have transactions be
> > formatted like this (I hope the alignment works, I will format as
> > fixed-font to try to help):
> >
> >
> >
> > MM/DD/YY  Payee Name                      Amount paid       Running balance
> >           Additional info about payment
> >           Can be multiple lines
> >
> > MM/DD/YY  Next Payee Name                 Amount Paid       Running balance
> >
> > MM/DD/YY  DEPOSIT                         Amount deposited  Running Balance
> >
> >
> >
> > So when the PDF is translated to text, those “additional info”
> > line(s) appear as separate physical lines without the MM/DD/YY header
> > or any money amounts following.
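The rule described above (a line without a leading MM/DD/YY date belongs to the previous transaction) can be sketched roughly as follows. This is a sketch only: it assumes the text came from something like `pdftotext -layout` and that every transaction starts with a two-digit MM/DD/YY date. The name `group_transactions` and the dictionary keys are made up for illustration.

```python
import re

# A transaction line starts (after optional indentation) with MM/DD/YY.
DATE_RE = re.compile(r"^\s*(\d{2}/\d{2}/\d{2})\s+(.*)$")


def group_transactions(lines):
    """Group text lines into transactions with attached memo lines.

    A line that starts with a date opens a new transaction. Any other
    non-blank line is appended to the memo of the transaction before it.
    Lines seen before the first dated line (page headers and similar)
    are dropped.
    """
    transactions = []
    for raw in lines:
        line = raw.rstrip()
        if not line.strip():
            continue  # skip blank lines
        match = DATE_RE.match(line)
        if match:
            transactions.append(
                {"date": match.group(1), "text": match.group(2).strip(), "memo": []}
            )
        elif transactions:
            transactions[-1]["memo"].append(line.strip())
    return transactions
```

The memo lines can then be joined with a space and written to the Memo field of whatever output format is used. The rarer case where the amounts land on the second line is not handled here.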
> >
> >
> >
> > Depending heavily on the PDF construction, I have also (but rarely)
> > seen the money amounts (paid or deposited and balance) show up on the
> > SECOND line after conversion of the PDF to text. The pdftotext
> > “-layout” switch has improved over time to where I seldom see this
> > any more, but it can happen.
> >
> >
> >
> > Like I said, it can get complicated.
> >
> >
> >
> > Peter
> >
> >
> >
> > From: KMyMoney <kmymoney-bounces at kde.org> On Behalf Of Jack
> > Sent: Thursday, December 31, 2020 3:14 PM
> > To: kmymoney at kde.org
> > Subject: Re: More pdf2kmymoney (overflows/wrapping lines)
> >
> >
> >
> > I started this yesterday, and I know there have been additional posts
> > since, but I think this particular point hasn't been resolved.
> >
> >
> >
> > On 12/30/20 8:59 PM, pjfarley3 at earthlink.net wrote:
> >
> >
> > In my experience pdftotext does not “overflow lines”. That is
> > probably “extra information” (i.e., “Memo” field data) related to the
> > transaction on the previous line. That is quite common in bank
> > statements. You have to expect such lines and be prepared to attach
> > them to the prior transaction. I do it as the “Memo” field in my
> > output.
> >
> >
> > Aaron would have to confirm, but I suspect he refers to a case where
> > a single table row as shown in the PDF has two rows of text in each
> > cell, because there is just too much text for one line. Because PDF
> > knows only about where exactly on the page any text is, but not why
> > it is there (no information about things like tables) the text output
> > would have two lines. The first would have the first line of text
> > from each cell, and the second would have the second line of text from
> > each cell. Putting them back together is theoretically possible, but
> > only if there is some way to know that the second line is not a new
> > row (missing header info?) or part of a manually controlled cleanup
> > phase of the conversion.
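Putting two such lines back together can be sketched as below, assuming the hard part is already settled: something else (a missing date, or a manual cleanup step) has decided that the second line continues the first. The sketch also assumes the columns sit at fixed character positions in the text output, which depends entirely on the particular statement. `bounds`, `split_cells` and `merge_wrapped_row` are hypothetical names.

```python
def split_cells(line, bounds):
    """Cut a fixed-width line into cells; bounds is a list of (start, end).

    An end of None means "to the end of the line".
    """
    return [line[start:end].strip() for start, end in bounds]


def merge_wrapped_row(first, second, bounds):
    """Join the two text lines of one wrapped table row, cell by cell."""
    merged = []
    for top, bottom in zip(split_cells(first, bounds), split_cells(second, bounds)):
        # Keep only the non-empty halves so an empty cell adds no stray space.
        merged.append(" ".join(part for part in (top, bottom) if part))
    return merged
```

For example, with `bounds = [(0, 10), (10, 30), (30, None)]`, a date cell, a payee cell wrapped over two lines, and an amount cell come back as three joined strings. The column positions would have to be found by inspecting the converted text for each bank.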
> >
>