OFX Import Matching Problem

Jack ostroffjh at users.sourceforge.net
Sun Jun 23 16:08:33 BST 2019


On 2019.06.23 03:56, Thomas Baumgart wrote:
> On Saturday, 22 June 2019 23:33:43 CEST Jack wrote:
> 
>> Minor point - I hope you mean default category (rather than account)  
>> for a payee.
>>
>> Primarily, I'm just trying to think of cases that might end up with  
>> unintended consequences, such as your current problem, after the  
>> change Thomas made in January.  I'm also partly just talking out  
>> loud, to make sure I understand how things work, as I often discover  
>> is not the case.  One thing I was not framing correctly in my mind  
>> is that a split transaction has only one payee, but multiple  
>> categories.  You are asking (do I have this right?) to choose the  
>> matching transaction not based on total amount of the transaction,  
>> but the closest amount (within a specified time limit) for a  
>> category specified in the configuration for the payee.  Given the  
>> newly imported transaction is not yet split, are you trying to match  
>> the total amount of the new transaction to the amount of the  
>> specified category in past transactions?  (Or am I further off the  
>> mark than I thought?)
> 
> This may get clearer for you if you start thinking in splits. Each  
> (non-zero and balanced) transaction has at least two splits: one for  
> the account and at least one for a category. The imported new  
> transaction only has one split (as the category is yet unknown). So  
> what KMyMoney does is to take a list of transactions filtered by  
> payee and account (which means: transactions that have a split with  
> the payee in that account. It would even work if each split of a  
> transaction can have a different payee, which exists as wish list  
> item). Amount comparison of the new and existing transactions happens  
> on the split referencing the account (which in fact is what you refer  
> to as the total amount). Anything else would not really work.
Thanks, that does give me a much better picture than I had.
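
A minimal sketch of the split-based lookup described above, in Python rather than KMyMoney's C++. The `Split` and `Transaction` classes and the function names are hypothetical, made up for illustration; they are not KMyMoney's API.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List, Optional


@dataclass
class Split:
    account: str                 # an account or a category name
    amount: Decimal
    payee: Optional[str] = None


@dataclass
class Transaction:
    splits: List[Split] = field(default_factory=list)

    def account_split(self, account: str) -> Optional[Split]:
        """Return the split that references the given account, if any."""
        for split in self.splits:
            if split.account == account:
                return split
        return None


def candidates(transactions, payee, account):
    """Transactions that have a split with this payee in this account."""
    result = []
    for tx in transactions:
        split = tx.account_split(account)
        if split is not None and split.payee == payee:
            result.append(tx)
    return result


def amount_difference(imported, existing, account):
    """Compare on the split referencing the account (the 'total amount')."""
    new_split = imported.account_split(account)
    old_split = existing.account_split(account)
    return abs(new_split.amount - old_split.amount)
```

In this picture a freshly imported transaction is just a `Transaction` with a single split (the one for the account), which is why only that split can be compared.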
> 
> The old implementation (prior to my January change) looked for a  
> transaction in that list that has the exact same amount and copied  
> the categories that were assigned. In case no transaction with the  
> exact same amount exists, it simply took the last one entered for  
> that payee.
> 
> This is what bugged me with two alternating transactions from the  
> same payee with different amounts each month: it took the wrong one  
> most of the time. Hence my change, which now works as follows:
> 
> KMyMoney looks for a transaction in the list filtered by payee and  
> account that has the exact same amount and copies the categories that  
> were assigned. In case no transaction with the exact same amount  
> exists, it simply takes the one with the smallest difference in  
> amount for that payee. While doing so, it goes back to day one of  
> your data in that account.
> 
> Brendan now asks to limit this search further by adding a date filter  
> which should be configurable on a per payee basis.
OK, so I have no objection to these changes.  As far as I can tell,  
they will have no effect on the problem I've mentioned, but for now,  
the only thing I can think of that would help me is a setting to never  
match to a split transaction, or perhaps to only use the category with  
the largest split amount from the previous transaction, and I suspect  
that would not be a good rule in general (even as an optional setting.)
> 
> That seems to be doable with an addition to the payee editor and a  
> new storage attribute.
> 
> Thomas
> 
> 
> > Separately, I'm trying to think how I could use this for my problem,
> > which is that I never (or very rarely) want to match a newly imported
> > transaction to a split transaction, which seems to happen fairly often
> > when the most recent transaction for the payee is split.
> >
> > On 2019.06.22 16:23, Brendan Coupe wrote:
> > > If I understood Thomas correctly, matching only looks at existing
> > > transactions in the account. That works fine for me when I duplicate
> > > the previous paycheck prior to importing the OFX file from my bank.
> > > That is not an ideal way to do this, but when I don't, it matches the
> > > closest amount for that payee since the beginning of time.
> > >
> > > The paycheck has 15 splits, so a single default account does not work.
> > > Even if I could assign 15 default accounts, I would have to update
> > > them fairly often or they would become less and less useful.
> > >
> > > Basically, what I am asking for is an option in the payee default
> > > account settings that says: pick the closest amount in the past xx
> > > days and use that/those category(ies). That alone would eliminate this
> > > weekly problem for me and probably many others that are less frequent.
> > > The global settings and my original suggestion are probably not needed
> > > if this setting were added for each payee.
> > >
> > > ----
> > > Brendan Coupe
> > >
> > > On Sat, Jun 22, 2019 at 2:07 PM Jack
> > > <ostroffjh at users.sourceforge.net> wrote:
> > > >
> > > > On 2019.06.22 11:51, Thomas Baumgart wrote:
> > > > > On Saturday, 22 June 2019 17:16:45 CEST Brendan Coupe wrote:
> > > > >
> > > > > > I see why my 30 day limit did not help. It does when I manually
> > > > > > copy the most recent paycheck and then import the OFX data.
> > > > > >
> > > > > > I have an idea how to deal with this. In the Default Account
> > > > > > tab for the payee there is a checkbox "Use the default
> > > > > > category..." If checked you can select a single default
> > > > > > category.
> > > > > >
> > > > > > How about making 4 radio buttons:
> > > > > >
> > > > > > - None
> > > > > > - Most recent transaction
> > > > > > - Closest amount
> > > > > > - Use the default category... (enable the dropdown list when
> > > > > >   selected)
> > > > >
> > > > > How about a system wide setting with the above option set (maybe
> > > > > without the last one) and a per payee override option?
> > > > > Introduction of this feature would be done as follows:
> > > > >
> > > > > a) the system wide default setting is "closest amount" (which
> > > > >    reflects today's default)
> > > > > b) payees that don't have the category set will use the system
> > > > >    wide setting
> > > > > c) payees that have a default category set will override the
> > > > >    system wide setting with the default category
> > > > I THINK that sounds right, but I'm wondering what should be per
> > > > account vs per payee vs per category.
> > > >
> > > > I may be overthinking it - but when looking for a transaction to
> > > > "match," am I missing something, or do we still have a lack of clear
> > > > terminology to distinguish finding the existing transaction to use
> > > > as a "model" [again - not a formal term] for an imported transaction
> > > > vs. what I think of as "true" matching - to find if the imported
> > > > transaction is a duplicate of one already present?  I hate to admit
> > > > it, but I'm still not completely clear on the steps followed - first
> > > > (assuming the imported transaction is not a duplicate) to find the
> > > > best transaction to model (based on what) and then whether to use
> > > > the payee and/or category of that transaction, or the default
> > > > category of the assumed payee.  Just to add to the mix here, the
> > > > problem I often face is for a payee which usually has transactions
> > > > with a single category (marked default for that payee) I sometimes
> > > > create split transactions - and it is almost always wrong to use one
> > > > of these split transactions as the model for a newly imported
> > > > transaction.  How might that fit into this process?
> > > >
> > > > >
> > > > > Does that make sense? Any objections anyone?
> > > > >
> > > > > Thomas
> > > > >
> > > > >
> > > > >
> > > > > > On Sat, Jun 22, 2019 at 4:25 AM Thomas Baumgart
> > > > > > <thb at net-bembel.de> wrote:
> > > > > > >
> > > > > > > On Friday, 21 June 2019 22:55:29 CEST Brendan Coupe wrote:
> > > > > > >
> > > > > > > > I'm running a week old build from the 5.0 branch on Fedora 29.
> > > > > > > >
> > > > > > > > When I download my savings account transactions using online
> > > > > > > > banking, the paycheck frequently matches with a very old
> > > > > > > > paycheck. This results in the splits being way off.
> > > > > > > >
> > > > > > > > This happens when the amount of the new paycheck is not very
> > > > > > > > close to the most recent paycheck, which has been happening a
> > > > > > > > lot lately due to reimbursed business expenses.
> > > > > > > >
> > > > > > > > On the import tab of the ledger settings I have tried setting
> > > > > > > > "Match transaction within days" from 7 days (paycheck is
> > > > > > > > weekly) to 30 days and the same thing happens. KMM is
> > > > > > > > definitely matching transactions that are much more than 30
> > > > > > > > days old. In fact, the transaction that it matched was only
> > > > > > > > $0.01 closer to the new transaction than the previous paycheck
> > > > > > > > (difference was $8.29 versus $8.30). The transaction it
> > > > > > > > matched is over 18 months old. It appears to be ignoring the
> > > > > > > > "Match transaction within days" setting. It's simply matching
> > > > > > > > the transaction from the same payee that is closest in value.
> > > > > > > >
> > > > > > > > I'm pretty sure this is fairly new behavior, but I'm not sure
> > > > > > > > if it started with the initial version of KMM5 that I used or
> > > > > > > > more recently.
> > > > > > >
> > > > > > > This probably goes back to a change I made in January this
> > > > > > > year:
> > > > > > >
> > > > > > > https://cgit.kde.org/kmymoney.git/commit/?id=447213e04d6e7ab9022caeb5c258800625036967
> > > > > > >
> > > > > > > which added the part of choosing an ancient transaction based
> > > > > > > on the smallest difference in amount whereas before it only
> > > > > > > used old transactions if the amount was identical.
> > > > > > >
> > > > > > > Here's what I found in the code (which perfectly explains what
> > > > > > > you encounter):
> > > > > > >
> > > > > > > In case the payee name has been found, the following will take
> > > > > > > place:
> > > > > > >
> > > > > > >       // Fill in other side of the transaction (category/etc) based on payee
> > > > > > >       //
> > > > > > >                 // [...]
> > > > > > >                 //
> > > > > > >       // We'll search for the most recent transaction in this account with
> > > > > > >       // this payee.  If this reference transaction is a simple 2-split
> > > > > > >       // transaction, it's simple.  If it's a complex split, and the amounts
> > > > > > >       // are different, we have a problem.  Somehow we have to balance the
> > > > > > >       // transaction.  For now, we'll leave it unbalanced, and let the user
> > > > > > >       // handle it.
> > > > > > >
> > > > > > > For the category to be found, the first thing is to check if
> > > > > > > the payee has a default category assigned. If yes, it is taken
> > > > > > > and we're done. If not, all transactions for that payee in the
> > > > > > > account will be searched backwards. Note: no date filtering
> > > > > > > here, which certainly is the cause of the behavior you
> > > > > > > encounter. The algorithm then works as follows:
> > > > > > >
> > > > > > >           // if there is more than one matching transaction, try to be a little
> > > > > > >           // smart about which one we use.  we scan them all and check if
> > > > > > >           // we find an exact match or use the one with the closest value
> > > > > > >
> > > > > > > The scan works backwards with the last one being the default.
> > > > > > > So we have at least one transaction for that payee, and in
> > > > > > > case of multiple the one with the least difference in amount
> > > > > > > will be selected. Then we continue with:
> > > > > > >
> > > > > > >                 // in case the old transaction has two splits
> > > > > > >                 // we simply inverse the amount of the current
> > > > > > >                 // transaction found in s1. In other cases (more
> > > > > > >                 // than two splits we copy all splits and don't
> > > > > > >                 // modify the splits. This may lead to unbalanced
> > > > > > >                 // transactions which the user has to fix manually
> > > > > > >
> > > > > > > The point is that we are not talking about 'matching' at this
> > > > > > > point but about automatic categorization of the imported
> > > > > > > transaction. Matching happens in the next step, when KMyMoney
> > > > > > > tries to figure out if you already have the said transaction
> > > > > > > on file (entered manually, for example). And it is for that
> > > > > > > matching that the interval is used, but not for the automatic
> > > > > > > categorization happening in the step before. Matching actually
> > > > > > > means merging two transactions (the one on file and the
> > > > > > > imported one) into a single one. This is not what is happening
> > > > > > > for you, and certainly not what you want with older
> > > > > > > transactions.
> > > > > > >
> > > > > > > I am not sure at this point what happens if I increase the
> > > > > > > matching period beyond one month and another salary payment
> > > > > > > comes in and it matches. It is certainly not detected as a
> > > > > > > duplicate, but does it match the transactions? I honestly
> > > > > > > don't know and have never tried.
> > > > > > >
> > > > > > > Why did I implement the feature as it is: I receive two
> > > > > > > payments with very different amounts from the same payee each
> > > > > > > month and they differ in categories. One of the amounts varies
> > > > > > > each month and the other one is fixed (we talk salary and
> > > > > > > reimbursement here as well, but I receive them in two
> > > > > > > payments). The old behavior was always wrong, because taking
> > > > > > > the last payment from that payee as categorization base is
> > > > > > > certainly false and only worked when there was no
> > > > > > > reimbursement (which means I received two salary payments in a
> > > > > > > row). So for me, a matching period of a few days is OK, but
> > > > > > > for the categorization I probably need a few months. The
> > > > > > > default to take the last one on file if nothing else was found
> > > > > > > is probably a good decision.
> > > > > > >
> > > > > > > Would a new setting to limit the search for transactions to do
> > > > > > > the auto categorization help here? What would best describe it
> > > > > > > and what would be a neat name for it?
> > > > > > >
> > > > > > > Any ideas, anyone?
> > > > >
> > > > > --
> > > > >
> > > > > Regards
> > > > >
> > > > > Thomas Baumgart
> > > > >
> > > > > https://www.signal.org/       Signal, the better WhatsApp
> > > > > -------------------------------------------------------------
> > > > > A: Because it destroys the flow of the conversation
> > > > > Q: Why is top-posting bad?
> > > > > A: Top-posting
> > > > > Q: What is the most annoying thing in e-mail?
> > > > > -------------------------------------------------------------
> > > > >
> > > >
> > >
> >
> 
> --
> 
> Regards
> 
> Thomas Baumgart
> 
> https://www.signal.org/       Signal, the better WhatsApp
> -------------------------------------------------------------
> 'Good code is not created, it evolves.'
> -- George Anzinger
> -------------------------------------------------------------
> 


