pdf to kmymoney

pjfarley3 at earthlink.net pjfarley3 at earthlink.net
Mon Dec 28 17:00:31 GMT 2020


It depends entirely on the text you get from pdftotext.  In my gawk code I searched for the header string(s) that delineate the checking transaction area and then consumed each transaction line, translating the details to CSV format.  In my case the transaction text can span more than one line, so I had to keep flags to tell me whether I was starting a new transaction or continuing one that started on a prior line, and it got quite complicated.  On top of that, in a busy month there were page breaks inside the transaction section (with various page headings and/or page numbers) that had to be skipped as well.
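
As an illustration only (this is not the gawk code described above), a minimal Python sketch of that kind of line-by-line state machine might look like the following.  The section markers, the page-noise pattern and the column layout (MM/DD date first, amount last, indented continuation lines) are all assumptions; every bank's text will differ.

```python
import csv
import io
import re

# All of these patterns are hypothetical; adapt them to your own statement text.
START_MARKER = "CHECKING ACTIVITY"        # assumed header that opens the section
END_MARKER = "Total Checking Activity"    # assumed line that closes the section
PAGE_NOISE = re.compile(r"^\s*(Page \d+ of \d+|Date\s+Description\s+Amount)\s*$")
TXN_START = re.compile(r"^\s*(\d{2}/\d{2})\s+(.*?)\s+(-?[\d,]+\.\d{2})\s*$")


def extract_transactions(lines):
    """Return [date, description, amount] rows from the transaction section."""
    rows = []
    in_section = False
    current = None  # the transaction being assembled, if any
    for line in lines:
        line = line.rstrip("\r\n")
        if not in_section:
            # Ignore everything until the section header has been seen.
            in_section = START_MARKER in line
            continue
        if END_MARKER in line:
            break
        if not line.strip() or PAGE_NOISE.match(line):
            continue  # blank lines, page numbers, repeated column headings
        match = TXN_START.match(line)
        if match:
            # A new transaction starts; store the previous one first.
            if current:
                rows.append(current)
            amount = match.group(3).replace(",", "")
            current = [match.group(1), match.group(2), amount]
        elif current:
            # No leading date: treat the line as a continuation of the memo.
            current[1] += " " + line.strip()
    if current:
        rows.append(current)
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Date", "Description", "Amount"])
    writer.writerows(rows)
    return out.getvalue()
```

The "current" variable plays the role of the flags mentioned above: it is only set while a transaction is open, so any line without a leading date is appended to the open transaction's description.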

 

An additional complexity I encountered in my bank’s statement text is that various “advertisements” for bank services or offerings can appear in the transaction section at the end of transaction lines and/or as their own lines, and those had to be skipped as well.
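
One possible way to handle this, sketched in Python under the assumption that the promotional text always begins with one of a few known phrases (the phrases below are invented placeholders, not actual bank wording):

```python
import re

# Hypothetical promotional phrases; replace them with what your bank prints.
AD_PATTERNS = [
    re.compile(r"\s*Ask us about .*$", re.IGNORECASE),
    re.compile(r"\s*Earn rewards .*$", re.IGNORECASE),
]


def strip_ads(line):
    """Cut promotional text off a line; return None if nothing else is left."""
    for pattern in AD_PATTERNS:
        line = pattern.sub("", line)
    return line if line.strip() else None
```

A line that consists only of an advertisement comes back as None and can be skipped, while a transaction line with an advertisement tacked onto the end keeps its date, description and amount.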

 

All I can tell you to do is to examine the text closely and see how the lines you are interested in can be easily identified (e.g., with a date as the first non-blank word(s) on the line) and then code from there.  And don’t forget to look for the lines that state your starting and ending balances and the dates covered by the statement.  The dates can be used, for instance, to dynamically name the output file with a name that includes the date of your choice (starting or ending date).
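
For example, naming the output file from the statement dates could be sketched in Python like this.  The "Statement Period" wording and the MM/DD/YYYY date format are assumptions, as is the "checking" file name prefix.

```python
import re
from datetime import datetime

# Hypothetical wording and date format; check what your statement really says.
PERIOD = re.compile(
    r"Statement Period\s+(\d{2}/\d{2}/\d{4})\s*-\s*(\d{2}/\d{2}/\d{4})"
)


def output_name(lines, prefix="checking"):
    """Build a CSV file name from the ending date of the statement period."""
    for line in lines:
        match = PERIOD.search(line)
        if match:
            end = datetime.strptime(match.group(2), "%m/%d/%Y")
            return f"{prefix}-{end:%Y-%m-%d}.csv"
    return None  # no statement period line was found
```

Using match.group(1) instead would name the file after the starting date.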

 

It isn’t rocket science but it can be tricky to get right in one go.  I think I wrote several dozen versions before I finally got it right, but I also extracted the savings and retirement fund sections of my bank statement separately from the checking transactions.  The final suite that I use now runs to a dozen or more text and tooling scripts.  I applied the lessons I learned extracting the bank statement transactions to do the same for my credit card statements.  I don’t invest in stocks or bonds via my bank, so I don’t have any of that in the bank statements that I process.

 

AFAIK there is no generic tool that can do the job for you.  You have to craft it yourself using whatever text processing tool(s) you are most comfortable using.

 

It would be awfully nice if all banks provided OFX dynamic interfaces to their banking services, but it appears that only the European banks (particularly in Germany) are doing that.  No US-based bank that I have found provides anything but Quicken access, and sometimes even that is limited to just statement downloads and not any interactive banking tasks like bill paying or inter-account transfers.  They all seem to want to limit you to using their web or smartphone app interfaces.  I’d even take OFX access via MFA [multi-factor authentication] rules if they offered it, but none do.

 

OTOH, OFX itself has certain format limits that make the interface problematic, like limits on how long certain text fields can be.  Extracting text from the bank statements sometimes provides far more detail in your transactions (extended or even multi-line MEMO text for instance) than can be represented by OFX rules.

 

Another thing you might check out is whether your bank provides a distinct “transaction download” facility on their web site.  This will be separate from the statement PDF download.  My bank provides such a facility on their website to download transactions already in CSV format for a range of dates, so if I wish I can download transactions covering dates that I choose (e.g. from the first day of a month to the last day of that month) and not be limited to the bank statement period.  Not so useful for reconciling the statement of course, but it is an option if your bank provides it.  I use this facility from my bank to cross-check my bank statement extractions so that if I miss something in the statement extraction I will see it sooner rather than later.
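
A cross-check of that kind can be sketched in Python as below.  It assumes that both files are CSV with "Date" and "Amount" columns in the same date format, and it matches on date and amount only, since descriptions rarely agree between the two sources.

```python
import csv
import io
from collections import Counter
from decimal import Decimal


def load_keys(csv_text, date_col="Date", amount_col="Amount"):
    """Count (date, amount) pairs; the column names are assumptions."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter((row[date_col], Decimal(row[amount_col])) for row in reader)


def cross_check(statement_csv, download_csv):
    """Return (missing, extra) relative to the statement extraction.

    missing: pairs present in the bank download but not in the extraction.
    extra:   pairs present in the extraction but not in the bank download.
    """
    extracted = load_keys(statement_csv)
    downloaded = load_keys(download_csv)
    missing = sorted((downloaded - extracted).elements())
    extra = sorted((extracted - downloaded).elements())
    return missing, extra
```

Counting the pairs rather than putting them in a set means that two identical charges on the same day are not collapsed into one.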

 

BTW, my bank is Citibank (USA).  What is yours?

 

Good luck.

 

Peter

 

From: KMyMoney <kmymoney-bounces at kde.org> On Behalf Of Aaron Mehl
Sent: Monday, December 28, 2020 10:51 AM
To: KMyMoney Users' mailing list <kmymoney at kde.org>
Subject: Re: pdf to kmymoney

 

Wow, your pdftotext settings worked.

You said I need to make my own script to strip the transactions from the file. Do you have any hints for me?

I can open the file in a text editor and just delete the unneeded text, but I was looking for something more hands-off that will clean it up and convert it to comma-separated format.

Thanks,

Aaron

 

On Monday, December 28, 2020, 10:14:42 AM EST, pjfarley3 at earthlink.net <pjfarley3 at earthlink.net> wrote: 

 

 

I had the same issue.  My bank statements are also PDF and my bank provides no online access from KMM or any checking program other than Quicken, so I download the PDF from my bank website and use pdftotext (yes, there is a Windows version) to extract the statement to a text file that can then be processed by any text-processing language of your choice.  My original bank statement text processing code was written (by me) in gawk, then later I switched to an awk-like tool called Miller.

 

For my bank’s PDFs I found this to be the most effective way to extract the text, which (for the most part) preserves columns and headings from the PDF version.  The key options are “-layout” and “-enc UTF-8”:

 

C:\MyBankFolder>  pdftotext -eol dos -cfg sample-xpdfrc -layout -nopgbrk -enc UTF-8 Bank-statement.PDF bank-statement.txt
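
If you would rather drive this step from a script, a minimal Python sketch that builds the same command line is shown here.  It assumes that pdftotext is on your PATH; the function names are made up for the illustration.

```python
import subprocess


def pdftotext_command(pdf_path, txt_path, config=None):
    """Build the argument list for the pdftotext call shown above."""
    args = ["pdftotext", "-eol", "dos"]
    if config:
        # Optional xpdfrc configuration file, as in the command above.
        args += ["-cfg", config]
    args += ["-layout", "-nopgbrk", "-enc", "UTF-8", pdf_path, txt_path]
    return args


def convert(pdf_path, txt_path, config=None):
    """Run pdftotext; raises CalledProcessError if the tool reports failure."""
    subprocess.run(pdftotext_command(pdf_path, txt_path, config), check=True)
```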

 

The text-processing code to strip out only the transactions from the text file is something you have to write yourself.

 

The pdftotext executable download for Windows can be found here:

 

https://www.xpdfreader.com/download.html

 

Select the “Windows 32/64-bit” download under “Download the Xpdf command line tools:”.

 

If you are interested, a very good Windows gawk can be found here:

 

https://sourceforge.net/projects/ezwinports/files/

 

The Miller executable is available here:

 

https://github.com/johnkerl/miller

 

HTH

 

Peter

 

From: KMyMoney <kmymoney-bounces at kde.org> On Behalf Of Aaron Mehl
Sent: Monday, December 28, 2020 8:47 AM
To: KMyMoney Users' Mailing List <kmymoney at kde.org>
Subject: pdf to kmymoney

 

Hi all,

My bank statements are in pdf format.

I am looking for a clean easy way to import them into KMyMoney.

I googled and found a non-clean answer: paste it as text into Excel and export as CSV.

The problem was the huge amount of manual clean-up I had to do.

I see a command line utility, pdftotext, but I still need a CSV file to import. Is there a utility that will turn this text file into CSV, or some other way to do this?

Thanks,

Aaron
