<div dir="ltr">I have been reading along with interest. While I agree that normally "Making things clear/easy for the end user is never a losing battle", in this case I think you are hoping to make something that is extremely difficult (almost impossible, and ever changing) easy enough for a normal user. You may lose this battle because you cannot control the source of the input data. <div><br></div><div>I have written a fair number of text processing scripts to grab data from websites or from other forms. I have never tried to extract tables from a PDF file. I do know that the slightest change in the source can result in hours of troubleshooting to find and fix the problem, especially when it's been a while since you worked on the script. I would never attempt to do this with a bank PDF. Most of the time banks can't even get OFX files to follow the OFX spec.</div><div><br></div><div>If you want to make it easy for the user, you only need one line:</div><div><br></div><div>Step 1: Switch banks ☺</div><div><br></div><div>Done.</div><div><br></div><div>I have accounts at many US banks, and all of them either provide direct connect access to OFX data or let me download OFX files from the website. I understand this support is disappearing, but for now it's still an option at many US banks. 
Before I open an account, I see if it has direct connect support in KMM.<br><br>----<br>Brendan Coupe<br></div></div><br clear="all"><br><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Dec 31, 2020 at 4:16 PM Aaron Mehl <<a href="mailto:mehlzaidy770@yahoo.com">mehlzaidy770@yahoo.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><div style="font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:16px"><div></div>
<div dir="ltr">Well,</div><div dir="ltr">I hear you, but I am not doing this for myself; I am doing it for an average user. So I ask my questions, experiment, and then write procedures showing how to import a PDF file.</div><div dir="ltr">The minute I start writing my own (or their own) scripts, I forget who my audience is. There is data about the education/intellectual level of the average user, and it rules out scripting.</div><div dir="ltr"><br></div><div dir="ltr">If such a script already existed, it would be another matter.</div><div dir="ltr">I am more interested in making it as easy as possible; I realize it won't be perfect.</div><div dir="ltr">Making things clear/easy for the end user is never a losing battle.</div><div dir="ltr">Aaron </div><div><br></div>
</div><div id="gmail-m_-6026764885946615029yahoo_quoted_0148785097">
<div style="font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:13px;color:rgb(38,40,42)">
<div>
On Thursday, December 31, 2020, 06:08:42 PM EST, Jack <<a href="mailto:ostroffjh@users.sourceforge.net" target="_blank">ostroffjh@users.sourceforge.net</a>> wrote:
</div>
<div><br></div>
<div><br></div>
<div>I really hate to be negative, but I think you're fighting a losing battle. If you can program in almost any scripting language, and are willing to spend some time experimenting, you can likely pull together something that works for you, depending on how much time you think the effort is worth.<br clear="none"><br clear="none">On the sign of transactions, how would KMM know whether it's a deposit or a withdrawal? The CSV import gives you two ways. First, the amount column needs to have minus signs on withdrawals. (There is a check box to reverse the sign if the deposits show up as negative.) The other way is to have separate columns for credits and for debits. If the statement actually uses positive numbers for both, and doesn't give you any way to tell which ones to reverse, you will probably end up with as much effort in post-import editing as you would have had just typing them in manually in the first place. Remember, you will probably also need to adjust most of the categories after the import.<br clear="none"><br clear="none">On 2020.12.31 17:22, Aaron Mehl wrote:<br clear="none">> Just as an experiment I manually deleted the overflow lines. But that isn't automatic. And as I read on and experiment, I think that semi-automatic might be the best option. So to rephrase my question: what is the best semi-automatic way to bring a PDF bank statement into KMyMoney?<br clear="none">> I see that without serious programming, the converters from text to QIF or to CSV (I googled and tried a few) all require manual input. The question is where in the food chain is the best place to make these changes. I see that pdftotext doesn't like a wide column length, and I gather there is no way to change it? QIF seems to want deposits listed with a plus sign and expenses with a minus. There are probably other things that would need tweaking.<br clear="none">> So I wonder what is the best way to get bank statements into KMyMoney. 
My bank only lets me get a PDF.<br clear="none">> Aaron<br clear="none">> On Thursday, December 31, 2020, 04:41:34 PM EST, <<a shape="rect" href="mailto:pjfarley3@earthlink.net" target="_blank">pjfarley3@earthlink.net</a>> wrote:<br clear="none"><div id="gmail-m_-6026764885946615029yqtfd60229">> <br clear="none">> Jack,<br clear="none">> <br clear="none">> It is quite common in bank statement PDFs to have transactions formatted like this (I hope the alignment works; I will format it as fixed-font to try to help):<br clear="none">> <br clear="none"><pre>> MM/DD/YY  Payee Name       Amount paid       Running balance
>           Additional info about payment
>           Can be multiple lines
>
> MM/DD/YY  Next Payee Name  Amount paid       Running balance
>
> MM/DD/YY  DEPOSIT          Amount deposited  Running balance</pre>> So when the PDF is translated to text, those “additional info” line(s) appear as separate physical lines without the MM/DD/YY header or any money amounts following.<br clear="none">> <br clear="none">> Depending heavily on the PDF construction, I have also (but rarely) seen the money amounts (paid or deposited, and balance) show up on the SECOND line after conversion of the PDF to text. The pdftotext “-layout” switch has improved over time to the point where I seldom see this anymore, but it can happen.<br clear="none">> <br clear="none">> Like I said, it can get complicated.<br clear="none">> <br clear="none">> Peter<br clear="none">> <br clear="none">> From: KMyMoney <<a shape="rect" href="mailto:kmymoney-bounces@kde.org" target="_blank">kmymoney-bounces@kde.org</a>> On Behalf Of Jack<br clear="none">> Sent: Thursday, December 31, 2020 3:14 PM<br clear="none">> To: <a shape="rect" href="mailto:kmymoney@kde.org" target="_blank">kmymoney@kde.org</a><br clear="none">> Subject: Re: More pdf2kmymoney (overflos/wrapping lines)<br clear="none">> <br clear="none">> I started this yesterday, and I know there have been additional posts since, but I think this particular point hasn't been resolved.<br clear="none">> <br clear="none">> On 12/30/20 8:59 PM, <a shape="rect" href="mailto:pjfarley3@earthlink.net" target="_blank">pjfarley3@earthlink.net</a> wrote:<br clear="none">> <br clear="none">> In my experience pdftotext does not “overflow lines”. 
That is probably “extra information” (i.e., “Memo” field data) related to the transaction on the previous line. That is quite common in bank statements. You have to expect such lines and be prepared to attach them to the prior transaction. I do it as the “Memo” field in my output.<br clear="none">> <br clear="none">> Aaron would have to confirm, but I suspect he refers to a case where a single table row as shown in the PDF has two lines of text in each cell, because there is just too much text for one line. Because a PDF only records exactly where on the page each piece of text is, but not why it is there (it has no information about things like tables), the text output would have two lines. The first would have the first line of text from each cell, and the second would have the second line of text from each cell. Putting them back together is theoretically possible, but only if there is some way to know that the second line is not a new row (missing header info?), or as part of a manually controlled cleanup phase of the conversion.<br clear="none">> <br clear="none"></div></div>
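<div>Peter's approach can be sketched in Python. The idea is to attach every line that lacks a date to the previous transaction as its memo. This is a minimal sketch under stated assumptions, not a tested converter. It assumes:</div>
<div>- text produced by <code>pdftotext -layout</code>,</div>
<div>- transaction lines that begin with an MM/DD/YY date,</div>
<div>- fields separated by two or more spaces, and</div>
<div>- the amount and running balance as the last two fields.</div>
<div><code>parse_statement</code>, <code>Transaction</code> and the sample statement are hypothetical.</div>

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumed layout (hypothetical): "MM/DD/YY  Payee  Amount  Balance", with
# fields separated by two or more spaces, as pdftotext -layout often produces.
DATE_RE = re.compile(r"^\s*(\d{2}/\d{2}/\d{2})\s{2,}(.*)$")
FIELD_SEP = re.compile(r"\s{2,}")


@dataclass
class Transaction:
    date: str
    payee: str
    amount: str
    balance: str
    memo_lines: List[str] = field(default_factory=list)

    @property
    def memo(self) -> str:
        return " ".join(self.memo_lines)


def parse_statement(text: str) -> List[Transaction]:
    """Group pdftotext -layout lines into transactions.

    A line that starts with a date opens a new transaction; every other
    non-blank line after it is treated as "additional info" and attached
    to the most recent transaction's memo.
    """
    transactions: List[Transaction] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no data
        m = DATE_RE.match(line)
        if m:
            fields = FIELD_SEP.split(m.group(2).strip())
            if len(fields) >= 3:
                # Assume the last two fields are the amount and running balance.
                payee = " ".join(fields[:-2])
                amount, balance = fields[-2], fields[-1]
            else:
                # Amounts missing (e.g. pushed onto the next line): leave them
                # empty for manual cleanup instead of guessing.
                payee, amount, balance = " ".join(fields), "", ""
            transactions.append(Transaction(m.group(1), payee, amount, balance))
        elif transactions:
            transactions[-1].memo_lines.append(line.strip())
        # Lines before the first dated row (page headers etc.) are skipped.
    return transactions


# Made-up sample in the layout Peter describes.
SAMPLE = """\
SAMPLE BANK STATEMENT
12/01/20  GROCERY STORE        45.10    1,954.90
          STORE #123 ANYTOWN
          CARD ENDING 9999
12/03/20  DEPOSIT             500.00    2,454.90
"""

if __name__ == "__main__":
    for t in parse_statement(SAMPLE):
        print(t.date, t.payee, t.amount, t.balance, t.memo, sep=" | ")
```

<div>A page header or footer that appears between transactions would also be swallowed into a memo. A real statement will probably need an extra, bank-specific filter for those lines.</div>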
</div>
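<div>Jack's wrapped-cell case is the one where each cell's second line of text ends up on its own text line. It can be sketched by cutting that line at the column positions and appending each fragment to the matching cell of the row above. This is a minimal sketch, not a tested converter. It assumes:</div>
<div>- a header line whose labels mark where each column starts,</div>
<div>- cell text that is left-aligned under its label, and</div>
<div>- a missing MM/DD/YY date as the signal that a line continues the previous row.</div>
<div><code>merge_wrapped_rows</code>, <code>column_starts</code>, <code>slice_columns</code> and the sample lines are hypothetical.</div>

```python
import re
from typing import List

# Hypothetical: a data row starts with an MM/DD/YY date; a wrapped second line does not.
DATE_RE = re.compile(r"^\s*\d{2}/\d{2}/\d{2}\b")
# A header label is one or more words separated by single spaces.
LABEL_RE = re.compile(r"\S+(?: \S+)*")


def column_starts(header: str) -> List[int]:
    """Character offset where each header label (and so each column) begins."""
    return [m.start() for m in LABEL_RE.finditer(header)]


def slice_columns(line: str, starts: List[int]) -> List[str]:
    """Cut a fixed-width text line into one stripped string per column."""
    cells = []
    for i, start in enumerate(starts):
        end = starts[i + 1] if i + 1 < len(starts) else len(line)
        cells.append(line[start:end].strip())
    return cells


def merge_wrapped_rows(lines: List[str], header: str) -> List[List[str]]:
    """Rebuild table rows whose cells wrapped onto a second text line."""
    starts = column_starts(header)
    rows: List[List[str]] = []
    for line in lines:
        if not line.strip():
            continue
        cells = slice_columns(line, starts)
        if DATE_RE.match(line):
            rows.append(cells)  # a date means a new table row
        elif rows:
            # No date: assume it is the second text line of the previous row
            # and append each fragment to the cell in the same column.
            prev = rows[-1]
            for i, fragment in enumerate(cells):
                if fragment:
                    prev[i] = f"{prev[i]} {fragment}".strip()
    return rows


# Made-up fixed-width sample; the Description cell of the first row wrapped.
HEADER = f"{'Date':<10}{'Description':<22}{'Amount':<11}Balance"
SAMPLE_LINES = [
    f"{'12/01/20':<10}{'ONLINE TRANSFER TO':<22}{'-250.00':<11}1,704.90",
    f"{'':<10}SAVINGS ACCT 1234",
    f"{'12/03/20':<10}{'DEPOSIT':<22}{'500.00':<11}2,204.90",
]

if __name__ == "__main__":
    for row in merge_wrapped_rows(SAMPLE_LINES, HEADER):
        print(row)
```

<div>Right-aligned amounts that start to the left of their header label would be cut in the wrong place. Nothing here tells a wrapped cell apart from a memo line either, since neither has a date. Which rule applies depends on the bank's layout.</div>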
</div></div></blockquote></div>
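<div>Jack asks how KMM would know whether an amount is a deposit or a withdrawal. When a statement prints both as positive numbers but also prints a running balance, as in Peter's layout, the sign can be recovered from the change in balance. This is an added suggestion, not something KMyMoney does itself. The sketch assumes the opening balance is known and that rows are in statement order. <code>signed_amounts</code>, <code>write_debit_credit_csv</code> and the sample rows are hypothetical. The output uses separate Debit and Credit columns, the second of the two layouts Jack describes for the CSV import.</div>

```python
import csv
import io
from decimal import Decimal
from typing import Iterable, List, TextIO, Tuple

Row = Tuple[str, str, str, str]  # (date, payee, unsigned amount, running balance)


def to_decimal(text: str) -> Decimal:
    """Parse statement amounts such as '1,954.90' or '$45.10'."""
    return Decimal(text.replace(",", "").replace("$", ""))


def signed_amounts(rows: Iterable[Row], opening_balance: str) -> List[Tuple[str, str, Decimal]]:
    """Infer each transaction's sign from the change in running balance."""
    previous = to_decimal(opening_balance)
    result = []
    for date, payee, amount, balance in rows:
        current = to_decimal(balance)
        delta = current - previous  # negative = withdrawal, positive = deposit
        if abs(delta) != to_decimal(amount):
            # The amount does not explain the balance change: flag it rather than guess.
            raise ValueError(f"{date} {payee}: balance moved by {delta}, amount is {amount}")
        result.append((date, payee, delta))
        previous = current
    return result


def write_debit_credit_csv(txns: Iterable[Tuple[str, str, Decimal]], out: TextIO) -> None:
    """Write separate Debit and Credit columns, both as positive numbers."""
    writer = csv.writer(out)
    writer.writerow(["Date", "Payee", "Debit", "Credit"])
    for date, payee, delta in txns:
        debit = f"{-delta:.2f}" if delta < 0 else ""
        credit = f"{delta:.2f}" if delta > 0 else ""
        writer.writerow([date, payee, debit, credit])


# Made-up statement rows where both amounts are printed as positive numbers.
ROWS = [
    ("12/01/20", "GROCERY STORE", "45.10", "1,954.90"),
    ("12/03/20", "DEPOSIT", "500.00", "2,454.90"),
]

if __name__ == "__main__":
    buffer = io.StringIO()
    write_debit_credit_csv(signed_amounts(ROWS, "2,000.00"), buffer)
    print(buffer.getvalue())
```

<div>For the single-column layout, write <code>delta</code> itself as the amount; withdrawals then already carry the minus sign the CSV import expects. A <code>ValueError</code> flags any row where the amount does not account for the balance change. That may mean the text extraction split or merged something.</div>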