OFX Import Matching Problem

Thomas Baumgart thb at net-bembel.de
Sun Jun 23 08:56:57 BST 2019


On Samstag, 22. Juni 2019 23:33:43 CEST Jack wrote:

> Minor point - I hope you mean default category (rather than account)  
> for a payee.
> 
> Primarily, I'm just trying to think of cases that might end up with  
> unintended consequences, such as your current problem, after the change  
> Thomas made in January.  I'm also partly just talking out loud, to make  
> sure I understand how things work, as I often discover is not the  
> case.  One thing I was not framing correctly in my mind is that a split  
> transaction has only one payee, but multiple categories.  You are  
> asking (do I have this right?) to choose the matching transaction not  
> based on total amount of the transaction, but the closest amount  
> (within a specified time limit) for a category specified in the  
> configuration for the payee.  Given the newly imported transaction is  
> not yet split, are you trying to match the total amount of the new  
> transaction to the amount of the specified category in past  
> transactions?  (Or am I further off the mark than I thought?)

This may get clearer if you start thinking in splits. Each (non-zero and balanced) transaction has at least two splits: one for the account and at least one for a category. The newly imported transaction has only one split (as the category is not yet known). So what KMyMoney does is take a list of transactions filtered by payee and account, which means: transactions that have a split with that payee in that account. (It would even work if each split of a transaction could have a different payee, which exists as a wish list item.) The amount comparison between the new and the existing transactions happens on the split referencing the account, which in fact is what you refer to as the total amount. Anything else would not really work.
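
To make the split model a bit more concrete, here is a minimal C++ sketch. It is not KMyMoney's actual code: the Split and Transaction structs, their fields, and the two helper functions are hypothetical stand-ins, and amounts are kept as integer cents purely for simplicity.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified model; not KMyMoney's real classes.
struct Split {
    std::string accountId;  // the account or a category
    std::string payeeId;    // empty if no payee is attached
    long long cents;        // signed amount in the smallest currency unit
};

struct Transaction {
    std::vector<Split> splits;
};

// Return the split of `t` that references `accountId` with payee `payeeId`,
// or nullptr if the transaction has no such split.
const Split* accountSplit(const Transaction& t, const std::string& accountId,
                          const std::string& payeeId) {
    for (const Split& s : t.splits) {
        if (s.accountId == accountId && s.payeeId == payeeId) return &s;
    }
    return nullptr;
}

// Candidates for categorization: all transactions that have a split
// with the given payee in the given account.
std::vector<const Transaction*> candidates(const std::vector<Transaction>& ledger,
                                           const std::string& accountId,
                                           const std::string& payeeId) {
    std::vector<const Transaction*> result;
    for (const Transaction& t : ledger) {
        if (accountSplit(t, accountId, payeeId) != nullptr) result.push_back(&t);
    }
    return result;
}
```

The amount that gets compared against the imported transaction is then the cents value of the split returned by accountSplit(), no matter how many category splits the transaction carries.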

The old implementation (prior to my January change) looked for a transaction in that list with exactly the same amount and copied the categories that were assigned to it. If no transaction with exactly the same amount existed, it simply took the last one entered for that payee.

This is what bugged me with two alternating transactions from the same payee with different amounts each month: it took the wrong one most of the time. Hence my change, which now works as follows:

KMyMoney looks for a transaction in the list filtered by payee and account that has the exact same amount and copies the categories that were assigned. In case no transaction with the exact same amount exists, it simply takes the one with the smallest difference in amount for that payee. While doing so, it goes back to day one of your data in that account.
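
As a rough sketch of that selection rule (again not the real KMyMoney code): the function name pickReference and the flat list of amounts are hypothetical, and preferring the more recent transaction when two differences are equal is an assumption on my side.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical helper. `amounts` holds the account-split amounts (in cents)
// of the candidate transactions for one payee and account, oldest first.
// Returns the index of the transaction to copy the categories from,
// or -1 if there are no candidates at all.
int pickReference(const std::vector<long long>& amounts, long long importedCents) {
    int best = -1;
    long long bestDiff = 0;
    // Scan backwards so the most recent transaction is the initial default.
    for (int i = static_cast<int>(amounts.size()) - 1; i >= 0; --i) {
        const long long diff = std::llabs(amounts[i] - importedCents);
        if (diff == 0) return i;  // exact match, nothing better possible
        // Assumption: only a strictly smaller difference replaces the
        // current choice, so on a tie the more recent transaction wins.
        if (best < 0 || diff < bestDiff) {
            best = i;
            bestDiff = diff;
        }
    }
    return best;
}
```

Since there is no date check anywhere in the loop, a very old transaction that is one cent closer than the latest one wins, which is exactly the effect Brendan reported.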

Brendan now asks to limit this search further by adding a date filter, which should be configurable on a per-payee basis.

That seems to be doable with an addition to the payee editor and a new storage attribute.
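
Just to illustrate the idea, such a filter could look like the sketch below. Nothing of this exists yet: the Candidate struct, the function pickWithinWindow and the maxAgeDays parameter (which would come from the new payee attribute) are all made-up names, and what to do when no transaction falls inside the window is left open here.

```cpp
#include <cstdlib>
#include <vector>

// Hypothetical candidate record; not a KMyMoney type.
struct Candidate {
    int daysAgo;      // age relative to the imported transaction's date
    long long cents;  // amount of the split referencing the account
};

// Returns the index of the candidate to copy the categories from, or -1
// if none lies within the window. maxAgeDays <= 0 means "no limit".
int pickWithinWindow(const std::vector<Candidate>& candidates,
                     long long importedCents, int maxAgeDays) {
    int best = -1;
    long long bestDiff = 0;
    for (int i = 0; i < static_cast<int>(candidates.size()); ++i) {
        const Candidate& c = candidates[i];
        if (maxAgeDays > 0 && c.daysAgo > maxAgeDays) continue;  // too old
        const long long diff = std::llabs(c.cents - importedCents);
        const bool closer = best < 0 || diff < bestDiff;
        // Assumption: on equal differences prefer the more recent one.
        const bool newerTie = best >= 0 && diff == bestDiff &&
                              c.daysAgo < candidates[best].daysAgo;
        if (closer || newerTie) {
            best = i;
            bestDiff = diff;
        }
    }
    return best;
}
```

With two invented candidates, one 550 days old and $8.29 away and one 7 days old and $8.30 away, no limit selects the old one, a limit of 30 days selects the recent one, and a limit of 3 days returns -1, so the caller would have to decide on a fallback.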

Thomas


> Separately, I'm trying to think how I could use this for my problem,  
> which is that I never (or very rarely) want to match a newly imported  
> transaction to a split transaction, which seems to happen fairly often  
> when the most recent transaction for the payee is split.
> 
> On 2019.06.22 16:23, Brendan Coupe wrote:
> > If I understood Thomas correctly matching is only looking at existing
> > transactions in the account. That works fine for me when I duplicate
> > the previous paycheck prior to importing the OFX file from my bank.
> > Not an ideal way to do this but when I don't it matches the closest
> > amount for that payee since the beginning of time.
> > 
> > The paycheck has 15 splits so a single default account does not work.
> > Even if I could assign 15 default accounts I would have to update them
> > fairly often or they would become less and less useful.
> > 
> > Basically what I am asking for is an option in the payee default
> > account settings that says pick the closest amount in the past xx days
> > and use that/those category(ies). That alone would eliminate this
> > weekly problem for me and probably many others that are less frequent.
> > The global settings and my original suggestion are probably not needed
> > if this setting was added for each payee.
> > 
> > ----
> > Brendan Coupe
> > 
> > On Sat, Jun 22, 2019 at 2:07 PM Jack  
> > <ostroffjh at users.sourceforge.net> wrote:
> > >
> > > On 2019.06.22 11:51, Thomas Baumgart wrote:
> > > > On Samstag, 22. Juni 2019 17:16:45 CEST Brendan Coupe wrote:
> > > >
> > > > > I see why my 30 day limit did not help. It does when I manually copy
> > > > > the most recent paycheck and then import the OFX data.
> > > > >
> > > > > I have an idea how to deal with this. In the Default Account tab for
> > > > > the payee there is a checkbox "Use the default category..." If checked
> > > > > you can select a single default category.
> > > > >
> > > > > How about making 4 radio buttons:
> > > > >
> > > > > - None
> > > > > - Most recent transaction
> > > > > - Closest amount
> > > > > - Use the default category... (enable the dropdown list when selected)
> > > >
> > > > How about a system wide setting with the above option set (maybe
> > > > without the last one) and a per payee override option? Introduction
> > > > of this feature would be done as follows:
> > > >
> > > > a) the system wide default setting is "closest amount" (which
> > > > reflects today's default)
> > > > b) payees that don't have the category set will use the system wide
> > > > setting
> > > > c) payees that have a default category set will override the system
> > > > wide setting with the default category
> > > I THINK that sounds right, but I'm wondering what should be per account
> > > vs per payee vs per category.
> > >
> > > I may be over thinking it - but when looking for a transaction to
> > > "match," am I missing something, or do we still have a lack of clear
> > > terminology to distinguish finding the existing transaction to use as a
> > > "model" [again - not a formal term] for an imported transaction vs.
> > > what I think of as "true" matching - to find if the imported
> > > transaction is a duplicate of one already present?  I hate to admit it,
> > > but I'm still not completely clear on the steps followed - first
> > > (assuming the imported transaction is not a duplicate) to find the best
> > > transaction to model (based on what) and then whether to use the payee
> > > and/or category of that transaction, or the default category of the
> > > assumed payee.  Just to add to the mix here, the problem I often face
> > > is for a payee which usually has transactions with a single category
> > > (marked default for that payee) I sometimes create split transactions -
> > > and it is almost always wrong to use one of these split transactions as
> > > the model for a newly imported transaction.  How might that fit into
> > > this process?
> > >
> > > >
> > > > Does that make sense? Any objections anyone?
> > > >
> > > > Thomas
> > > >
> > > >
> > > >
> > > > > On Sat, Jun 22, 2019 at 4:25 AM Thomas Baumgart <thb at net-bembel.de> wrote:
> > > > > >
> > > > > > On Freitag, 21. Juni 2019 22:55:29 CEST Brendan Coupe wrote:
> > > > > >
> > > > > > > I'm running a week old build from the 5.0 branch on Fedora 29.
> > > > > > >
> > > > > > > When I download my savings account transaction using online banking
> > > > > > > the paycheck frequently matches with a very old paycheck. This results
> > > > > > > in the splits being way off.
> > > > > > >
> > > > > > > This happens when the amount of the new paycheck is not very close to
> > > > > > > the most recent paycheck which has been happening a lot lately due to
> > > > > > > reimbursed business expenses.
> > > > > > >
> > > > > > > On the import tab of the ledger settings I have tried setting "Match
> > > > > > > transaction within days" from 7 days (paycheck is weekly) to 30 days
> > > > > > > and the same thing happens. KMM is definitely matching transactions
> > > > > > > that are much more than 30 days old. In fact the transaction that it
> > > > > > > matched was only $0.01 closer to the new transaction than the previous
> > > > > > > paycheck (difference was $8.29 versus $8.30). The transaction it
> > > > > > > matched is over 18 months old. It appears to be ignoring the "Match
> > > > > > > transaction within days" setting. It's simply matching the transaction
> > > > > > > from the same payee that is closest in value.
> > > > > > >
> > > > > > > I'm pretty sure this is fairly new behavior but I'm not sure if it
> > > > > > > started with the initial version of KMM5 that I used or more recently.
> > > > > >
> > > > > > This probably goes back to a change I made in January this year:
> > > > > >
> > > > > > https://cgit.kde.org/kmymoney.git/commit/?id=447213e04d6e7ab9022caeb5c258800625036967
> > > > > >
> > > > > > which added the part of choosing an ancient transaction based on the smallest difference in amount whereas before it only used old transactions if the amount was identical.
> > > > > >
> > > > > > Here's what I found in the code (which perfectly explains what you encounter):
> > > > > >
> > > > > > In case the payee name has been found, the following will take place:
> > > > > >
> > > > > >       // Fill in other side of the transaction (category/etc) based on payee
> > > > > >       //
> > > > > >                 // [...]
> > > > > >                 //
> > > > > >       // We'll search for the most recent transaction in this account with
> > > > > >       // this payee.  If this reference transaction is a simple 2-split
> > > > > >       // transaction, it's simple.  If it's a complex split, and the amounts
> > > > > >       // are different, we have a problem.  Somehow we have to balance the
> > > > > >       // transaction.  For now, we'll leave it unbalanced, and let the user
> > > > > >       // handle it.
> > > > > >
> > > > > > For the category to be found, the first thing is to check if the payee has a default category assigned. If yes, it is taken and we're done. If not, all transactions for that payee in the account will be searched backwards. Note: no date filtering here, which certainly is the cause of the behavior you encounter. The algorithm then works as follows:
> > > > > >
> > > > > >           // if there is more than one matching transaction, try to be a little
> > > > > >           // smart about which one we use.  we scan them all and check if
> > > > > >           // we find an exact match or use the one with the closest value
> > > > > >
> > > > > > The scan works backwards with the last one being the default. So we have at least one transaction for that payee, and in case of multiple the one with the least difference in amount will be selected. Then we continue with:
> > > > > >
> > > > > >                 // in case the old transaction has two splits
> > > > > >                 // we simply inverse the amount of the current
> > > > > >                 // transaction found in s1. In other cases (more
> > > > > >                 // than two splits we copy all splits and don't
> > > > > >                 // modify the splits. This may lead to unbalanced
> > > > > >                 // transactions which the user has to fix manually
> > > > > >
> > > > > > The point is that we are not talking about 'matching' at this point but about automatic categorization of the imported transaction. Matching happens in the next step when KMyMoney tries to figure out if you already have the said transaction on file (entered manually for example). And it is for that matching that the interval is used, but not for the automatic categorization happening in the step before. Matching actually means merging two transactions (the one on file and the imported one) into a single one. This is not what is happening for you and what you certainly don't want with older transactions.
> > > > > >
> > > > > > I am not sure at this point what happens if I increase the matching period beyond one month and another salary payment comes in and it matches. It is certainly not detected as a duplicate but does it match the transactions? I honestly don't know and have never tried.
> > > > > >
> > > > > > Why did I implement the feature as it is: I receive two payments with very different amounts from the same payee each month and they differ in categories. One of the amounts varies each month and the other one is fixed (we are talking about salary and reimbursement here as well, but I receive them in two payments). The old behavior was always wrong, because taking the last payment from that payee as categorization base is certainly false and only worked when there was no reimbursement (which means I received two salary payments in a row). So for me, a matching period of a few days is OK, but for the categorization I probably need a few months. The default to take the last one on file if nothing else was found is probably a good decision.
> > > > > >
> > > > > > Would a new setting to limit the search for transactions to do the auto categorization help here? What would best describe it and what would be a neat name for it?
> > > > > >
> > > > > > Any ideas, anyone?
> > > >
> > > >
> > >
> > 
> 

-- 

Regards

Thomas Baumgart

https://www.signal.org/       Signal, the better WhatsApp
-------------------------------------------------------------
'Good code is not created, it evolves.'
-- George Anzinger
-------------------------------------------------------------