Question about AFT and duplicate entries
Jeff Mitchell
kde-dev at emailgoeshere.com
Mon Feb 25 23:09:20 UTC 2008
Colin Guthrie wrote:
> I presume that to record a statistic, I need to take the url and
> deviceid from the tags table and then look up the uniqueid from the
> uniqueid table, right? (perhaps not programmatically, but at least
> conceptually).
>
>
Something like that, yes.
> Carrying on this premise, only one of the two entries from the tags
> table select actually has a corresponding row in the uniqueid table due
> to the fact that there is a unique key on the uniqueid table. This way I
> can only get a uniqueid for one of the two files I've played. How then
> can I record a stat for the one that didn't get its lookup?
>
You can't. Buyer beware. The point of AFT isn't to make sure that
every file that is the exact same song always updates stats -- it's to
make sure that stats aren't lost totally. It's up to you to make sure
you don't have duplicate files all over your collection (which are
generally annoying anyways and people generally want to get rid of
them). Or that you always play the same one.
If anything, people generally want to use AFT to find duplicates to
avoid this situation, because it's so annoying :-)
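A rough sketch of that kind of duplicate hunt, in Python rather than
anything from Amarok itself. It assumes a "duplicate" is a file whose
leading bytes are identical, and uses a SHA-1 of the first 16 KiB as a
stand-in fingerprint; the real AFT uniqueid calculation may well differ,
and fingerprint() and find_duplicates() are hypothetical names.

```python
import hashlib
import os
from collections import defaultdict

CHUNK = 16 * 1024  # assumption: fingerprint only the first 16 KiB


def fingerprint(path, chunk=CHUNK):
    """Stand-in for an AFT-style uniqueid: a hash of the file's leading bytes."""
    with open(path, "rb") as fh:
        return hashlib.sha1(fh.read(chunk)).hexdigest()


def find_duplicates(root):
    """Return {fingerprint: [paths]} for every fingerprint seen more than once."""
    groups = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            groups[fingerprint(path)].append(path)
    # Keep only fingerprints shared by two or more files.
    return {fp: paths for fp, paths in groups.items() if len(paths) > 1}
```

Calling find_duplicates() on a collection directory gives one list of
paths per duplicated fingerprint, which is enough to decide which copy
to keep.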
> Now looking at the statistics table, it seems that it also has url and
> deviceid fields as well as another unique key on its uniqueid. This
> means that it would be impossible to record statistics against a song if
> you have two copies of it in your collection due to key violations.
>
> I suspect the only reason this does not happen more often is due to the
> fact that the second copy's uniqueid cannot be looked up in the uniqueid
> table in the first place, which results in NULL values for uniqueid,
> preventing key violations. Indeed:
> mysql> select count(*) from statistics where uniqueid is null;
> +----------+
> | count(*) |
> +----------+
> |        9 |
> +----------+
>
>
Right. BUT. There is still a URL associated with that second copy.
And the uniqueid will get updated to the proper ID if/when that first
file is ever deleted/removed/not found. Then the second one will be
found, the entry in statistics will get a proper uniqueid attached to
it, and it becomes the new one. Does this mean you can bounce back and
forth with your statistics? Absolutely. See my preceding answer for
why this is generally not a problem.
That being said, when stats are being updated, a song that currently
lacks a uniqueid could have its uniqueid calculated, and the stats
table could be checked for that ID first before simply relying on the
URL. It doesn't work that way currently, but maybe it'd be a good
change.
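To make that lookup order concrete, here is a minimal sketch in Python
with an in-memory SQLite database. It is not how Amarok does it today.
The statistics table is cut down to the url, deviceid and uniqueid
fields mentioned in this thread plus a play counter, and open_db() and
record_play() are hypothetical helpers. The caller is assumed to have
already calculated the uniqueid for the file being played.

```python
import sqlite3


def open_db():
    """Create a simplified, hypothetical statistics table in memory."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE statistics ("
        " url TEXT, deviceid INTEGER, uniqueid TEXT UNIQUE,"
        " playcounter INTEGER DEFAULT 0)"
    )
    return db


def record_play(db, url, deviceid, uniqueid):
    """Bump the play count, matching on uniqueid first and URL second."""
    row = db.execute(
        "SELECT rowid FROM statistics WHERE uniqueid = ?", (uniqueid,)
    ).fetchone()
    if row is None:
        # No row for this uniqueid: fall back to the url/deviceid pair.
        row = db.execute(
            "SELECT rowid FROM statistics WHERE url = ? AND deviceid = ?",
            (url, deviceid),
        ).fetchone()
    if row is None:
        # First play of this song under either key.
        db.execute(
            "INSERT INTO statistics (url, deviceid, uniqueid, playcounter)"
            " VALUES (?, ?, ?, 1)",
            (url, deviceid, uniqueid),
        )
    else:
        db.execute(
            "UPDATE statistics SET playcounter = playcounter + 1"
            " WHERE rowid = ?",
            row,
        )
```

With this ordering, playing two copies of the same song at different
URLs updates a single statistics row instead of leaving the second copy
without stats.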
> 1. Remove the unique key on the uniqueid table to allow it to list
> *every* file in your collection, copies and all (as the copies *do* show
> up in the browser and *can* be played, they *must* be cataloged).
>
Maybe. A long time ago the design was to allow this, and not have the
unique field be unique (an ID rather than a unique ID). But it didn't
work for other reasons, some of which have been mitigated by now, and
I'm not sure about the rest.
> 2. I'd remove the url and deviceid fields from the statistics table. The
> joining factor should be uniqueid and there should only ever be one of
> these in there.
>
Then everywhere the statistics table is accessed, you'd have to add
code that figures out the "full" URL ahead of time, to avoid screwing
with dynamic collection.
> This implies that uniqueid is *always* used which may or may not be
> mandatory now anyway?
>
It's used...for files in your collection. Not for files outside your
collection.
Anyways, some of this should be thought out for AFT in A2, which won't
be for a long while as all the other subsystems need to be working
first. These things can't be changed in the stable version since we
can't do huge DB modifications in the stable code.
Thanks for the input...
--Jeff