<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-cite-prefix">On 12/30/20 8:59 PM,
<a class="moz-txt-link-abbreviated" href="mailto:pjfarley3@earthlink.net">pjfarley3@earthlink.net</a> wrote:<br>
</div>
<blockquote type="cite"
cite="mid:000901d6df18$9749e5a0$c5ddb0e0$@earthlink.net">
<style>@font-face
{font-family:Helvetica;
panose-1:2 11 6 4 2 2 2 2 2 4;}@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}div.WordSection1
{page:WordSection1;}</style>
<div class="WordSection1">
<p class="MsoNormal">Aaron,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">In my experience pdftotext does not
“overflow lines”. That is probably “extra information” (i.e.,
“Memo” field data) related to the transaction on the previous
line. That is quite common in bank statements. You have to
expect such lines and be prepared to attach them to the prior
transaction. I do it as the “Memo” field in my output.</p>
</div>
</blockquote>
Aaron would have to confirm, but I suspect he refers to a case where
a single table row as shown in the PDF has two lines of text in each
cell, because there is just too much text for one line. Because a PDF
records only where exactly on the page each piece of text is, but not why
it is there (it has no information about things like tables), the text
output would have two lines. The first would have the first line of
text from each cell, and the second would have the second line of text
from each cell. Putting them back together is theoretically
possible, but only if there is some way to know that the second line
is not a new row (missing header info?), or as part of a manually
controlled cleanup phase of the conversion.<br>
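<br>
A minimal Python sketch of that reassembly, under assumptions that will not hold for every bank: the text came from something like <code>pdftotext -layout</code>, so columns sit at fixed character positions; the positions in COLUMNS and the MM/DD/YYYY date pattern are made up for illustration; and only a line that starts with a date begins a new row.<br>
<pre>
```python
import re

# Hypothetical layout: date, description and amount in fixed character
# columns. Real statements differ, so these positions are assumptions.
COLUMNS = [(0, 12), (12, 40), (40, None)]

# Assumed date format at the start of every real row: MM/DD/YYYY.
DATE_AT_START = re.compile(r"\d{2}/\d{2}/\d{4}")


def split_cells(line):
    """Cut one text line into stripped cells at the assumed column positions."""
    return [line[start:end].strip() for start, end in COLUMNS]


def merge_wrapped_rows(lines):
    """Join continuation lines onto the row that precedes them.

    A line starts a new row if it begins with a date (the very first
    non-blank line is always kept as a row). Any other non-blank line is
    treated as a further text line of the cells in the previous row.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cells = split_cells(line)
        if DATE_AT_START.match(line) or not rows:
            rows.append(cells)
        else:
            previous = rows[-1]
            for i, extra in enumerate(cells):
                if extra:
                    previous[i] = (previous[i] + " " + extra).strip()
    return rows
```
</pre>
With those assumptions, a line that starts with blanks is folded into the row above it, cell by cell. A statement whose rows do not start with a date would need a different test, which is where the manual cleanup comes in.<br>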
<blockquote type="cite"
cite="mid:000901d6df18$9749e5a0$c5ddb0e0$@earthlink.net">
<div class="WordSection1">
<p class="MsoNormal"><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">And no, there is no “automatic” way to turn
it into a csv file. Every bank’s statements are unique to
that bank, so there is no “common format” for a software
company or enterprising independent programmer looking for
work to use as the model of how to do it. *You* have to write
the code to do that for *your* particular bank’s statement
format, using the text scripting language of your choice. As
I said previously, I used gawk and then Miller for
my needs. Some would recommend Perl,
which I never got around to learning but which certainly has a
lot going for it.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">You could also think about paying a
programmer currently out of work to write this for you,
but to test the new software properly
you would have to trust them with copies of your
highly personal financial information.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Converting a bank’s PDF statement to CSV
data for KMM (or any other money management software) is NOT
something that the “low tech user” can do by themselves. You
need to be (or hire) a programmer with a fair amount of
experience to accomplish this task. “Just out of school”
programmers won’t get the job done right, or even necessarily
at all. It is definitely NOT a trivial task.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Peter<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0in
0in 0in 4.0pt">
<div>
<div style="border:none;border-top:solid #E1E1E1
1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> KMyMoney
<a class="moz-txt-link-rfc2396E" href="mailto:kmymoney-bounces@kde.org">&lt;kmymoney-bounces@kde.org&gt;</a> <b>On Behalf Of </b>Aaron
Mehl<br>
<b>Sent:</b> Wednesday, December 30, 2020 6:22 AM<br>
<b>To:</b> KMyMoney Users' Mailing List
<a class="moz-txt-link-rfc2396E" href="mailto:kmymoney@kde.org">&lt;kmymoney@kde.org&gt;</a><br>
<b>Subject:</b> More pdf2kmymoney<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif">I
need a little help,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif">I
tried the different ideas for moving PDF statements to KMyMoney.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif">Okular
- I used the table selection tool, but couldn't figure out how
to export CSV. Plus I want something more automatic,
if possible.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif">pdftotext
- produced a nice text file, but since the columns
were separated by commas and the dates also use
commas, do I need to get the columns separated by tabs?
Also, I found that in the final text file there were
extra rows containing overflow from the same column in
the row above.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif">I
want to come up with a way to get the file as clean as
I can before importing it into KMyMoney. I am not sure
what strategy to use to do this. I am writing
directions for users who get their bank
statements as PDFs. I am assuming low-tech users as my
baseline. These are people who, I gather, wouldn't know
regular expressions etc.<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif">Also,
once my text file is in the correct format, is there an
automatic way to turn it into CSV?<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif">If
anyone can float a few ideas of how to clean up these
files, I would most appreciate it,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif">Thanks,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span
style="font-size:12.0pt;font-family:&quot;Helvetica&quot;,sans-serif">Aaron<o:p></o:p></span></p>
</div>
</div>
</div>
</div>
</blockquote>
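On the commas inside dates: a CSV writer normally handles that by quoting the field, so switching to tabs may not be needed. A small sketch using Python's standard <code>csv</code> module follows. The function name <code>rows_to_csv</code> and the header names are made up for illustration, and it assumes the rows have already been cleaned up into lists of strings.<br>
<pre>
```python
import csv
import io


def rows_to_csv(rows, delimiter=","):
    """Return CSV text for a list of rows (each row a list of strings).

    csv.writer quotes any field that contains the delimiter, so a date
    such as "Dec 30, 2020" stays in one column.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["Date", "Description", "Amount"])  # assumed header names
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample row, not real statement data.
    print(rows_to_csv([["Dec 30, 2020", "ACME STORE", "-25.00"]]), end="")
```
</pre>
Passing <code>delimiter="\t"</code> gives tab-separated output instead. Which of the two the importer handles better for a given file is something to check by trying it.<br>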
</body>
</html>