More pdf2kmymoney
Jack
ostroffjh at users.sourceforge.net
Thu Dec 31 16:35:34 GMT 2020
On 12/30/20 8:59 PM, pjfarley3 at earthlink.net wrote:
>
> Aaron,
>
> In my experience pdftotext does not “overflow lines”. That is
> probably “extra information” (i.e., “Memo” field data) related to the
> transaction on the previous line. That is quite common in bank
> statements. You have to expect such lines and be prepared to attach
> them to the prior transaction. I do it as the “Memo” field in my
> output.
>
Aaron would have to confirm, but I suspect he means a case where a
single table row, as shown in the PDF, has two lines of text in each
cell because there is too much text to fit on one line. A PDF records
only where exactly on the page each piece of text is, not why it is
there (it has no notion of things like tables), so the text output
would have two lines. The first would hold the first line of text from
each cell, and the second would hold the second line of text from each
cell. Putting them back together is theoretically possible, but only if
there is some way to know that the second line is not a new row
(missing header info?), or as part of a manually controlled cleanup
phase of the conversion.
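A minimal Python sketch of that cleanup, assuming `pdftotext -layout`
output in which each column starts at a fixed character position, and
assuming every real row has a date in its first column. The offsets in
`COLUMNS` are made up for illustration and would have to be measured
from an actual statement; `split_cells` and `merge_wrapped_rows` are
hypothetical names.

```python
# Hypothetical column start offsets (date, description, amount, balance)
# for ONE bank's `pdftotext -layout` output; measure your own.
COLUMNS = [0, 12, 50, 62]


def split_cells(line, columns=COLUMNS):
    """Slice a fixed-width line into stripped cell strings."""
    bounds = columns + [None]
    return [line[bounds[i]:bounds[i + 1]].strip() for i in range(len(columns))]


def merge_wrapped_rows(lines, columns=COLUMNS):
    """Join lines whose first (date) cell is empty onto the previous row.

    Assumption: every real table row has a date in its first cell, so a
    line with an empty first cell is the second line of text of the row
    above it, and each of its cells is appended to the matching cell.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        cells = split_cells(line, columns)
        if cells[0] == "" and rows:
            prev = rows[-1]
            for i, text in enumerate(cells):
                if text:
                    prev[i] = f"{prev[i]} {text}".strip()
        else:
            rows.append(cells)
    return rows


if __name__ == "__main__":
    line1 = f"{'12/01/2020':<12}{'GROCERY STORE PURCHASE':<38}{'45.10':<12}954.90"
    line2 = f"{'':<12}{'MAIN STREET BRANCH':<38}"
    print(merge_wrapped_rows([line1, line2]))
    # [['12/01/2020', 'GROCERY STORE PURCHASE MAIN STREET BRANCH', '45.10', '954.90']]
```

If the first column can legitimately be blank on a real row, this
heuristic will merge rows that should stay separate, which is why a
manual check may still be needed.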
>
> And no, there is no “automatic” way to turn it into a csv file. Every
> bank’s statements are unique to that bank, so there is no “common
> format” for a software company or enterprising independent programmer
> looking for work to use as the model of how to do it. *You* have to
> write the code to do that for *your* particular bank’s statement
> format, using the text scripting language of your choice. As I said
> previously, I used gawk and then miller languages for my needs. Some
> would recommend the use of the perl language, which I never got around
> to learning but which certainly has a lot going for it.
>
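To make that concrete, here is a rough Python sketch of the kind of
per-bank script described above, using Python's `re` and `csv` modules
rather than gawk or miller. The layout it expects (a date such as
`Dec 30, 2020`, a payee and an amount, separated by two or more spaces)
is an assumption, not any real bank's format, and `TXN`,
`parse_statement` and `to_csv` are illustrative names. Lines that do not
look like a transaction are attached to the previous transaction as its
memo, as Peter describes.

```python
import csv
import io
import re

# Hypothetical transaction-line pattern for ONE assumed statement layout:
#   "Dec 30, 2020   Payee text   -12.34"
TXN = re.compile(
    r"^(?P<date>[A-Z][a-z]{2} \d{1,2}, \d{4})\s{2,}"
    r"(?P<payee>.+?)\s{2,}"
    r"(?P<amount>-?[\d,]*\d\.\d{2})$"
)


def parse_statement(lines):
    """Turn statement text lines into a list of transaction dicts.

    Non-matching lines after a transaction are appended to its memo;
    non-matching lines before the first transaction (headers) are dropped.
    """
    txns = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = TXN.match(line)
        if m:
            txn = m.groupdict()
            txn["amount"] = txn["amount"].replace(",", "")  # "1,250.00" -> "1250.00"
            txn["memo"] = ""
            txns.append(txn)
        elif txns:
            memo = txns[-1]["memo"]
            txns[-1]["memo"] = f"{memo} {line}".strip()
    return txns


def to_csv(txns):
    """Write transactions as CSV text; fields containing commas get quoted."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["date", "payee", "amount", "memo"])
    writer.writeheader()
    writer.writerows(txns)
    return out.getvalue()


if __name__ == "__main__":
    sample = [
        "Dec 30, 2020   COFFEE SHOP   -4.50",
        "  Card ending 1234",
        "Dec 31, 2020   PAYROLL DEPOSIT   1,250.00",
    ]
    print(to_csv(parse_statement(sample)))
```

Because `csv` quotes any field that contains the delimiter, a date such
as `Dec 30, 2020` is written as `"Dec 30, 2020"`, so commas inside dates
do not break the columns.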
> You could also think about paying a programmer currently out of work
> to write this stuff for you, but in order to test the new software
> they would write for you properly, you would have to trust them with
> copies of your highly personal financial information to make that work.
>
> Converting a bank’s PDF statement to CSV data for KMM (or any other
> money management software) is NOT something that the “low tech user”
> can do by themselves. You need to be (or hire) a programmer with a
> fair amount of experience to accomplish this task. “Just out of
> school” programmers won’t get the job done right, or even necessarily
> at all. It is definitely NOT a trivial task.
>
> Peter
>
> *From:* KMyMoney <kmymoney-bounces at kde.org> *On Behalf Of *Aaron Mehl
> *Sent:* Wednesday, December 30, 2020 6:22 AM
> *To:* KMyMoney Users' Mailing List <kmymoney at kde.org>
> *Subject:* More pdf2kmymoney
>
> I need a little help,
>
> I tried the different ideas for moving pdf to kmymoney.
>
> Okular - I used the table select, but couldn't figure out how to
> export csv. Plus I want something more automatic, if possible
>
> pdftotext - produced a nice text file, but since the columns were
> separated by commas and the dates also use commas, I need to get the
> columns separated by tabs? Also I found that in the final text file
> there were extra rows that were overflow from the same column of the
> row above.
>
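If the text was produced with `pdftotext -layout`, the columns are
padded with spaces rather than separated by commas. A minimal Python
sketch, assuming that two or more consecutive spaces always mark a
column gap and never occur inside a cell, splits each line on such runs
and writes tab-separated output with the `csv` module; `split_columns`
and `to_delimited` are hypothetical names. Passing `delimiter=","`
instead gives CSV, with any field containing a comma quoted
automatically.

```python
import csv
import io
import re


def split_columns(line):
    """Split one line on runs of two or more spaces (assumed column gaps)."""
    return [cell for cell in re.split(r" {2,}", line.strip()) if cell]


def to_delimited(lines, delimiter="\t"):
    """Convert layout-preserving text lines to tab- (or comma-) separated text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for line in lines:
        cells = split_columns(line)
        if cells:  # skip blank lines
            writer.writerow(cells)
    return out.getvalue()


if __name__ == "__main__":
    sample = ["Dec 30, 2020    COFFEE SHOP    -4.50"]
    print(to_delimited(sample, delimiter=","))
    # "Dec 30, 2020",COFFEE SHOP,-4.50
```

An empty cell shifts every later cell one column to the left, and
wrapped rows are not merged, so the output still needs checking before
import.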
> I want to come up with a way to get the file as clean as I can before
> importing it into KMyMoney. I am not sure what strategy to use to do
> this. I am writing directions for users who get their bank
> statements as PDFs. I am assuming low tech users as my baseline.
> These are people who I gather wouldn't know regular expressions etc.
>
> Also once my text file is in the correct format is there an automatic
> way to turn it into csv?
>
> If anyone can float a few ideas of how to clean up these files, I
> would most appreciate it,
>
> Thanks,
>
> Aaron
>