<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <div class="moz-cite-prefix">On 12/30/20 8:59 PM,

      <a class="moz-txt-link-abbreviated" href="mailto:pjfarley3@earthlink.net">pjfarley3@earthlink.net</a> wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:000901d6df18$9749e5a0$c5ddb0e0$@earthlink.net">


      <style>@font-face

        {font-family:Helvetica;

        panose-1:2 11 6 4 2 2 2 2 2 4;}@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        font-size:11.0pt;

        font-family:"Calibri",sans-serif;}span.EmailStyle18

        {mso-style-type:personal-reply;

        font-family:"Calibri",sans-serif;

        color:windowtext;}.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}div.WordSection1

        {page:WordSection1;}</style>

      <div class="WordSection1">

        <p class="MsoNormal">Aaron,<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">In my experience pdftotext does not
          “overflow lines”.  That is probably “extra information” (i.e.,
          “Memo” field data) related to the transaction on the previous
          line.  That is quite common in bank statements.  You have to
          expect such lines and be prepared to attach them to the prior
          transaction.  I write them out as the “Memo” field in my output.</p>

      </div>

    </blockquote>

    Aaron would have to confirm, but I suspect he is referring to a case where
    a single table row as shown in the PDF has two lines of text in each
    cell, because there is just too much text for one line.  Because PDF
    knows only exactly where on the page each piece of text is, but not why
    it is there (it has no information about things like tables), the text
    output would have two lines.  The first would have the first line of
    text from each cell, and the second would have the second line of text
    from each cell.  Putting them back together is theoretically
    possible, but only if there is some way to know that the second line
    is not a new row (missing header info?), or if it is done as part of a
    manually controlled cleanup phase of the conversion.<br>
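    <br>
    A minimal sketch of that rejoining step, in Python, and only under some
    loud assumptions: the text was extracted with its column layout
    preserved, the column offsets are known for this one bank (the values
    below are made up), and every real row starts with a date in the first
    column.  The names <code>COLUMNS</code>, <code>split_cells</code> and
    <code>merge_wrapped_rows</code> are hypothetical:<br>
<pre>
```python
import re

# Hypothetical column boundaries (character offsets) for one bank's layout.
COLUMNS = [(0, 12), (12, 40), (40, 52)]
# Assumption: a real transaction row has a date like 12/29/2020 in column one.
DATE = re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}")


def split_cells(line):
    """Cut one layout-preserved text line into stripped cell strings."""
    return [line[start:end].strip() for start, end in COLUMNS]


def merge_wrapped_rows(lines):
    """Fold lines without a leading date into the row above, cell by cell."""
    rows = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        cells = split_cells(line)
        if DATE.fullmatch(cells[0]) or not rows:
            rows.append(cells)  # a new table row
        else:
            previous = rows[-1]
            for i, cell in enumerate(cells):
                if cell:  # append overflow text to the same column above
                    previous[i] = (previous[i] + " " + cell).strip()
    return rows


def layout(date, description, amount):
    """Build a sample line matching the made-up COLUMNS offsets."""
    return date.ljust(12) + description.ljust(28) + amount


sample = [
    layout("12/29/2020", "ACME HARDWARE STORE", "-25.00"),
    layout("", "PURCHASE REF 12345", ""),
    layout("12/30/2020", "PAYROLL DEPOSIT", "1500.00"),
]
merged = merge_wrapped_rows(sample)
```
</pre>
    Here the second sample line has no date, so its text is appended to the
    description cell of the row above.  A statement whose continuation
    lines can themselves begin with something that looks like a date would
    defeat this test, which is where the manual cleanup comes in.<br>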

    <blockquote type="cite"

      cite="mid:000901d6df18$9749e5a0$c5ddb0e0$@earthlink.net">

      <div class="WordSection1">

        <p class="MsoNormal"><o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">And no, there is no “automatic” way to turn
          it into a CSV file.  Every bank’s statements are unique to
          that bank, so there is no “common format” for a software
          company or enterprising independent programmer looking for
          work to use as the model of how to do it.  *You* have to write
          the code to do that for *your* particular bank’s statement
          format, using the text scripting language of your choice.  As
          I said previously, I used gawk and then Miller for
          my needs.  Some would recommend Perl,
          which I never got around to learning but which certainly has a
          lot going for it.<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">You could also think about paying a
          programmer currently out of work to write this for you,
          but in order for them to test the new software properly,
          you would have to trust them with copies of your
          highly personal financial information.<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">Converting a bank’s PDF statement to CSV

          data for KMM (or any other money management software) is NOT

          something that the “low tech user” can do by themselves.  You

          need to be (or hire) a programmer with a fair amount of

          experience to accomplish this task.  “Just out of school”

          programmers won’t get the job done right, or even necessarily

          at all.  It is definitely NOT a trivial task.<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <p class="MsoNormal">Peter<o:p></o:p></p>

        <p class="MsoNormal"><o:p> </o:p></p>

        <div style="border:none;border-left:solid blue 1.5pt;padding:0in

          0in 0in 4.0pt">

          <div>

            <div style="border:none;border-top:solid #E1E1E1

              1.0pt;padding:3.0pt 0in 0in 0in">

              <p class="MsoNormal"><b>From:</b> KMyMoney

                <a class="moz-txt-link-rfc2396E" href="mailto:kmymoney-bounces@kde.org">&lt;kmymoney-bounces@kde.org&gt;</a> <b>On Behalf Of </b>Aaron

                Mehl<br>

                <b>Sent:</b> Wednesday, December 30, 2020 6:22 AM<br>

                <b>To:</b> KMyMoney Users' Mailing List

                <a class="moz-txt-link-rfc2396E" href="mailto:kmymoney@kde.org">&lt;kmymoney@kde.org&gt;</a><br>

                <b>Subject:</b> More pdf2kmymoney<o:p></o:p></p>

            </div>

          </div>

          <p class="MsoNormal"><o:p> </o:p></p>

          <div>

            <div>

              <p class="MsoNormal"><span

                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif">I

                  need a little help,<o:p></o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span
                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif">I
                  tried the different ideas for moving PDF data into KMyMoney.<o:p></o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span

                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif"><o:p>&nbsp;</o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span
                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif">Okular
                  - I used the table select, but couldn't figure out how
                  to export it as CSV. Plus, I want something more automatic,
                  if possible.<o:p></o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span

                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif"><o:p>&nbsp;</o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span
                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif">pdftotext
                  - produced a nice text file, but since the columns
                  were separated by commas and the dates also use
                  commas, I need to get the columns separated by tabs?
                  Also, I found that in the final text file there were
                  extra rows that were overflow from the same column in
                  the row above.<o:p></o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span

                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif"><o:p>&nbsp;</o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span
                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif">I
                  want to come up with a way to get the file as clean as
                  I can before importing it into KMyMoney. I am not sure
                  what strategy to use to do this. I am writing
                  directions for users who get their bank
                  statements as PDFs. I am assuming low-tech users as my
                  baseline. These are people who, I gather, wouldn't know
                  regular expressions etc.<o:p></o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span
                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif">Also,
                  once my text file is in the correct format, is there an
                  automatic way to turn it into CSV?<o:p></o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span

                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif"><o:p>&nbsp;</o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span
                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif">If
                  anyone can float a few ideas on how to clean up these
                  files, I would much appreciate it.<o:p></o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span

                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif">Thanks,<o:p></o:p></span></p>

            </div>

            <div>

              <p class="MsoNormal"><span

                  style="font-size:12.0pt;font-family:'Helvetica',sans-serif">Aaron<o:p></o:p></span></p>

            </div>

          </div>

        </div>

      </div>

    </blockquote>
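    On the comma problem in Aaron's message: a CSV writer that quotes
    fields keeps a date containing a comma in a single column, so switching
    to tabs is not strictly necessary.  A rough Python sketch of only that
    last step, assuming the cleaned-up text has its columns separated by
    two or more spaces (an assumption about the layout, which not every
    statement will satisfy).  The sample dates and the name
    <code>layout_text_to_csv</code> are made up for illustration, and the
    bank-specific cleanup Peter describes still has to come first:<br>
<pre>
```python
import csv
import io
import re


def layout_text_to_csv(lines):
    """Split space-aligned columns and return CSV text with quoting."""
    buffer = io.StringIO()
    # csv.writer quotes any field that contains the delimiter (a comma).
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # Assumption: columns are separated by two or more spaces,
        # while text inside a column uses single spaces only.
        writer.writerow(re.split(r"\s{2,}", line.strip()))
    return buffer.getvalue()


sample = [
    "Dec 29, 2020   ACME HARDWARE STORE   -25.00",
    "Dec 30, 2020   PAYROLL DEPOSIT       1500.00",
]
csv_text = layout_text_to_csv(sample)
print(csv_text)
```
</pre>
    Passing <code>delimiter="\t"</code> to <code>csv.writer</code> would
    produce tab-separated output instead, if that is preferred.<br>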

  </body>

</html>