OFX Import Matching Problem

Brendan Coupe brendan at coupeware.com
Sun Jun 23 17:16:09 BST 2019


Thomas,

Let me know if you have a patch for me to test. I can easily add it
with my build scripts.

Jack, do you have a default category for the payee that sometimes has
multiple categories and sometimes (most times) does not? I assume that
setting a default category for a payee takes precedence. If the
transactions with one category have a different category each time, then
I'm guessing that's an edge case that will be hard to handle every
time.
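
For what it's worth, here is a rough sketch of the order of things as I
read Thomas's description further down, with the per-payee day limit I'm
asking for added. This is not KMyMoney code: PastTransaction,
pickCategories and windowDays are names made up for illustration, and
returning no categories when nothing falls inside the window is only an
assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a ledger entry; not a KMyMoney type.
struct PastTransaction {
    int day;                              // days since an arbitrary start date
    std::int64_t amountCents;             // amount of the split referencing the account
    std::vector<std::string> categories;  // categories of the other splits
};

// Picks the categories to copy onto a newly imported transaction.
// `history` is assumed to be filtered by payee and account already and
// ordered oldest first. `windowDays` is the proposed per-payee limit;
// std::nullopt means "no limit" (the behavior described in the thread).
std::vector<std::string> pickCategories(
    const std::vector<PastTransaction>& history,
    const std::optional<std::string>& defaultCategory,
    std::int64_t newAmountCents,
    int newDay,
    std::optional<int> windowDays)
{
    assert(!windowDays || *windowDays >= 0);

    // 1. A default category set on the payee takes precedence.
    if (defaultCategory)
        return {*defaultCategory};

    // 2. Scan backwards (newest first) and keep the smallest difference
    //    in amount; on a tie the more recent transaction wins.
    const PastTransaction* best = nullptr;
    std::int64_t bestDiff = 0;
    for (auto it = history.rbegin(); it != history.rend(); ++it) {
        if (windowDays && newDay - it->day > *windowDays)
            continue;  // older than the per-payee limit
        const std::int64_t diff = std::abs(it->amountCents - newAmountCents);
        if (!best || diff < bestDiff) {
            best = &*it;
            bestDiff = diff;
        }
        if (diff == 0)
            break;  // exact amount found, nothing can be closer
    }

    // Assumption: nothing inside the window means nothing is copied.
    return best ? best->categories : std::vector<std::string>{};
}
```

With my numbers (previous paycheck off by $8.30, an 18 month old one off
by $8.29) no limit picks the old paycheck, while a 30 day limit picks
the previous one.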

Your question about account versus category points out what I think is
an error in the user interface. I used the Payee tab label "Default
Account" in my email but the tab should probably be labeled "Default
Category" since that's what it really is.


----
Brendan Coupe

On Sun, Jun 23, 2019 at 9:08 AM Jack <ostroffjh at users.sourceforge.net> wrote:
>
> On 2019.06.23 03:56, Thomas Baumgart wrote:
> > On Samstag, 22. Juni 2019 23:33:43 CEST Jack wrote:
> >
> >> Minor point - I hope you mean default category (rather than account)
> >> for a payee.
> > >
> >> Primarily, I'm just trying to think of cases that might end up with
> >> unintended consequences, such as your current problem, after the
> >> change Thomas made in January.  I'm also partly just talking out
> >> loud, to make sure I understand how things work, as I often discover
> >> is not the case.  One thing I was not framing correctly in my mind
> >> is that a split transaction has only one payee, but multiple
> >> categories.  You are asking (do I have this right?) to choose the
> >> matching transaction not based on total amount of the transaction,
> >> but the closest amount (within a specified time limit) for a
> >> category specified in the configuration for the payee.  Given the
> >> newly imported transaction is not yet split, are you trying to match
> >> the total amount of the new transaction to the amount of the
> >> specified category in past transactions?  (Or am I further off the
> >> mark than I thought?)
> >
> > This may get clearer for you if you start thinking in splits. Each
> > (non-zero and balanced) transaction has at least two splits: one for
> > the account and at least one for a category. The imported new
> > transaction only has one split (as the category is yet unknown). So
> > what KMyMoney does is to take a list of transactions filtered by
> > payee and account (which means: transactions that have a split with
> > the payee in that account. It would even work if each split of a
> > transaction can have a different payee, which exists as wish list
> > item). Amount comparison of the new and existing transactions happens
> > on the split referencing the account (which in fact is what you refer
> > to as the total amount). Anything else would not really work.
> Thanks, that does give me a much better picture than I had.
> >
> > The old implementation (prior to my January change) looked for a
> > transaction in that list that has the exact same amount and copied
> > the categories that were assigned. In case no transaction with the
> > exact same amount exists, it simply took the last one entered for
> > that payee.
> >
> > This is what bugged me with two alternating transactions from the
> > same payee with different amounts each month: it took the wrong one
> > most of the time. Hence my change, which now works as follows:
> >
> > KMyMoney looks for a transaction in the list filtered by payee and
> > account that has the exact same amount and copies the categories that
> > were assigned. In case no transaction with the exact same amount
> > exists, it simply takes the one with the smallest difference in
> > amount for that payee. While doing so, it goes back to day one of
> > your data in that account.
> >
> > Brendan now asks to limit this search further by adding a date filter
> > which should be configurable on a per payee basis.
> OK, so I have no objection to these changes.  As far as I can tell,
> they will have no effect on the problem I've mentioned, but for now,
> the only thing I can think of that would help me is a setting to never
> match to a split transaction, or perhaps to only use the category with
> the largest split amount from the previous transaction, and I suspect
> that would not be a good rule in general (even as an optional setting.)
> >
> > That seems to be doable with an addition to the payee editor and a
> > new storage attribute.
> >
> > Thomas
> >
> >
> > > Separately, I'm trying to think how I could use this for my problem,
> > > which is that I never (or very rarely) want to match a newly imported
> > > transaction to a split transaction, which seems to happen fairly often
> > > when the most recent transaction for the payee is split.
> > >
> > > On 2019.06.22 16:23, Brendan Coupe wrote:
> > > > If I understood Thomas correctly matching is only looking at existing
> > > > transactions in the account. That works fine for me when I duplicate
> > > > the previous paycheck prior to importing the OFX file from my bank.
> > > > Not an ideal way to do this but when I don't it matches the closest
> > > > amount for that payee since the beginning of time.
> > > >
> > > > The paycheck has 15 splits so a single default account does not work.
> > > > Even if I could assign 15 default accounts I would have to update them
> > > > fairly often or they would become less and less useful.
> > > >
> > > > Basically what I am asking for is an option in the payee default
> > > > account settings that says pick the closest amount in the past xx days
> > > > and use that/those category(ies). That alone would eliminate this
> > > > weekly problem for me and probably many others that are less frequent.
> > > > The global settings and my original suggestion are probably not needed
> > > > if this setting was added for each payee.
> > > >
> > > > ----
> > > > Brendan Coupe
> > > >
> > > > On Sat, Jun 22, 2019 at 2:07 PM Jack
> > > > <ostroffjh at users.sourceforge.net> wrote:
> > > > >
> > > > > On 2019.06.22 11:51, Thomas Baumgart wrote:
> > > > > > On Samstag, 22. Juni 2019 17:16:45 CEST Brendan Coupe wrote:
> > > > > >
> > > > > > > I see why my 30 day limit did not help. It does when I manually copy
> > > > > > > the most recent paycheck and then import the OFX data.
> > > > > > >
> > > > > > > I have an idea how to deal with this. In the Default Account tab for
> > > > > > > the payee there is a checkbox "Use the default category..." If checked
> > > > > > > you can select a single default category.
> > > > > > >
> > > > > > > How about making 4 radio buttons:
> > > > > > >
> > > > > > > - None
> > > > > > > - Most recent transaction
> > > > > > > - Closest amount
> > > > > > > - Use the default category... (enable the dropdown list when selected)
> > > > > >
> > > > > > How about a system wide setting with the above option set (maybe
> > > > > > without the last one) and a per payee override option? Introduction
> > > > > > of this feature would be done as follows:
> > > > > >
> > > > > > a) the system wide default setting is "closest amount" (which
> > > > > >    reflects today's default)
> > > > > > b) payees that don't have the category set will use the system wide
> > > > > >    setting
> > > > > > c) payees that have a default category set will override the system
> > > > > >    wide setting with the default category
> > > > > I THINK that sounds right, but I'm wondering what should be per account
> > > > > vs per payee vs per category.
> > > > >
> > > > > I may be over thinking it - but when looking for a transaction to
> > > > > "match," am I missing something, or do we still have a lack of clear
> > > > > terminology to distinguish finding the existing transaction to use as a
> > > > > "model" [again - not a formal term] for an imported transaction vs.
> > > > > what I think of as "true" matching - to find if the imported
> > > > > transaction is a duplicate of one already present?  I hate to admit it,
> > > > > but I'm still not completely clear on the steps followed - first
> > > > > (assuming the imported transaction is not a duplicate) to find the best
> > > > > transaction to model (based on what) and then whether to use the payee
> > > > > and/or category of that transaction, or the default category of the
> > > > > assumed payee.  Just to add to the mix here, the problem I often face
> > > > > is for a payee which usually has transactions with a single category
> > > > > (marked default for that payee) I sometimes create split transactions -
> > > > > and it is almost always wrong to use one of these split transactions as
> > > > > the model for a newly imported transaction.  How might that fit into
> > > > > this process?
> > > > >
> > > > > >
> > > > > > Does that make sense? Any objections anyone?
> > > > > >
> > > > > > Thomas
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Sat, Jun 22, 2019 at 4:25 AM Thomas Baumgart <thb at net-bembel.de> wrote:
> > > > > > > >
> > > > > > > > On Freitag, 21. Juni 2019 22:55:29 CEST Brendan Coupe wrote:
> > > > > > > >
> > > > > > > > > I'm running a week old build from the 5.0 branch on Fedora 29.
> > > > > > > > >
> > > > > > > > > When I download my savings account transaction using online banking
> > > > > > > > > the paycheck frequently matches with a very old paycheck. This results
> > > > > > > > > in the splits being way off.
> > > > > > > > >
> > > > > > > > > This happens when the amount of the new paycheck is not very close to
> > > > > > > > > the most recent paycheck which has been happening a lot lately due to
> > > > > > > > > reimbursed business expenses.
> > > > > > > > >
> > > > > > > > > On the import tab of the ledger settings I have tried setting "Match
> > > > > > > > > transaction within days" from 7 days (paycheck is weekly) to 30 days
> > > > > > > > > and the same thing happens. KMM is definitely matching transactions
> > > > > > > > > that are much more than 30 days old. In fact the transaction that it
> > > > > > > > > matched was only $0.01 closer to the new transaction than the previous
> > > > > > > > > paycheck (difference was $8.29 versus $8.30). The transaction it
> > > > > > > > > matched is over 18 months old. It appears to be ignoring the "Match
> > > > > > > > > transaction within days" setting. It's simply matching the transaction
> > > > > > > > > from the same payee that is closest in value.
> > > > > > > > >
> > > > > > > > > I'm pretty sure this is fairly new behavior but I'm not sure if it
> > > > > > > > > started with the initial version of KMM5 that I used or more recently.
> > > > > > > >
> > > > > > > > This probably goes back to a change I made in January this year:
> > > > > > > >
> > > > > > > > https://cgit.kde.org/kmymoney.git/commit/?id=447213e04d6e7ab9022caeb5c258800625036967
> > > > > > > >
> > > > > > > > which added the part of choosing an ancient transaction based on
> > > > > > > > the smallest difference in amount whereas before it only used old
> > > > > > > > transactions if the amount was identical.
> > > > > > > >
> > > > > > > > Here's what I found in the code (which perfectly explains what
> > > > > > > > you encounter):
> > > > > > > >
> > > > > > > > In case the payee name has been found, the following will take
> > > > > > > > place:
> > > > > > > >
> > > > > > > >       // Fill in other side of the transaction (category/etc) based on payee
> > > > > > > >       //
> > > > > > > >       // [...]
> > > > > > > >       //
> > > > > > > >       // We'll search for the most recent transaction in this account with
> > > > > > > >       // this payee.  If this reference transaction is a simple 2-split
> > > > > > > >       // transaction, it's simple.  If it's a complex split, and the amounts
> > > > > > > >       // are different, we have a problem.  Somehow we have to balance the
> > > > > > > >       // transaction.  For now, we'll leave it unbalanced, and let the user
> > > > > > > >       // handle it.
> > > > > > > >
> > > > > > > > For the category to be found, the first thing is to check if the
> > > > > > > > payee has a default category assigned. If yes, it is taken and
> > > > > > > > we're done. If not, all transactions for that payee in the account
> > > > > > > > will be searched backwards. Note: no date filtering here, which
> > > > > > > > certainly is the cause of the behavior you encounter. The
> > > > > > > > algorithm then works as follows:
> > > > > > > >
> > > > > > > >           // if there is more than one matching transaction, try to be a little
> > > > > > > >           // smart about which one we use.  we scan them all and check if
> > > > > > > >           // we find an exact match or use the one with the closest value
> > > > > > > >
> > > > > > > > The scan works backwards with the last one being the default. So
> > > > > > > > we have at least one transaction for that payee, and in case of
> > > > > > > > multiple the one with the least difference in amount will be
> > > > > > > > selected. Then we continue with:
> > > > > > > >
> > > > > > > >                 // in case the old transaction has two splits
> > > > > > > >                 // we simply inverse the amount of the current
> > > > > > > >                 // transaction found in s1. In other cases (more
> > > > > > > >                 // than two splits we copy all splits and don't
> > > > > > > >                 // modify the splits. This may lead to unbalanced
> > > > > > > >                 // transactions which the user has to fix manually
> > > > > > > >
> > > > > > > > The point is, that we are not talking about 'matching' at this
> > > > > > > > point but automatic categorization of the imported transaction.
> > > > > > > > Matching happens in the next step when KMyMoney tries to figure
> > > > > > > > out if you already have the said transaction on file (entered
> > > > > > > > manually for example). And it is for that matching that the
> > > > > > > > interval is used, but not the automatic categorization happening
> > > > > > > > in the step before. Matching actually means merge two
> > > > > > > > transactions (the one on file and the imported one) into a single
> > > > > > > > one. This is not what is happening for you and what you certainly
> > > > > > > > don't want with older transactions.
> > > > > > > >
> > > > > > > > I am not sure at this point what happens, if I increase the
> > > > > > > > matching period beyond one month and another salary payment comes
> > > > > > > > in and it matches. It is certainly not detected as a duplicate but
> > > > > > > > does it match the transactions? I honestly don't know and have
> > > > > > > > never tried.
> > > > > > > >
> > > > > > > > Why did I implement the feature as it is: I receive two payments
> > > > > > > > with very different amounts from the same payee each month and
> > > > > > > > they differ in categories. One of the amounts varies each month
> > > > > > > > and the other one is fixed (we talk salary and reimbursement here
> > > > > > > > as well, but I receive them in two payments). The old behavior was
> > > > > > > > always wrong, because taking the last payment from that payee as
> > > > > > > > categorization base is certainly false and only worked when there
> > > > > > > > was no reimbursement (which means I received two salary payments
> > > > > > > > in a row). So for me, a matching period of a few days is OK, but
> > > > > > > > for the categorization I probably need a few months. The default
> > > > > > > > to take the last one on file if nothing else was found is probably
> > > > > > > > a good decision.
> > > > > > > >
> > > > > > > > Would a new setting to limit the search for transactions to do
> > > > > > > > the auto categorization help here? What would best describe it
> > > > > > > > and what would be a neat name for it?
> > > > > > > >
> > > > > > > > Any ideas, anyone?
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Thomas Baumgart
> > > > > >
> > > > > > https://www.signal.org/       Signal, the better WhatsApp
> > > > > > -------------------------------------------------------------
> > > > > > A: Because it destroys the flow of the conversation
> > > > > > Q: Why is top-posting bad?
> > > > > > A: Top-posting
> > > > > > Q: What is the most annoying thing in e-mail?
> > > > > > -------------------------------------------------------------
> > > > > >
> > > > >
> > > >
> > >
> >
> > --
> >
> > Regards
> >
> > Thomas Baumgart
> >
> > https://www.signal.org/       Signal, the better WhatsApp
> > -------------------------------------------------------------
> > 'Good code is not created, it evolves.'
> > -- George Anzinger
> > -------------------------------------------------------------
> >
>

