Question about AFT and duplicate entries

Mon Feb 25 23:09:20 UTC 2008

Colin Guthrie wrote:
> I presume that to record a statistic, I need to take the url and
> deviceid from the tags table and then look up the uniqueid from the
> uniqueid table right? (perhaps not programmatically, but at least
> conceptually).
>
>   
Something like that, yes.
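
To make that conceptual lookup concrete, here is a minimal sketch using
Python's sqlite3 module with an in-memory database. The table layouts are
simplified assumptions (only the columns mentioned in this thread), and
lookup_uniqueid, the paths, and the 'abc123' value are hypothetical names
for illustration, not the actual Amarok code.

```python
import sqlite3

def lookup_uniqueid(conn, url, deviceid):
    """Return the uniqueid stored for (url, deviceid), or None if absent."""
    row = conn.execute(
        "SELECT uniqueid FROM uniqueid WHERE url = ? AND deviceid = ?",
        (url, deviceid),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed, simplified schema: only the columns discussed here.
    CREATE TABLE tags (url TEXT, deviceid INTEGER);
    CREATE TABLE uniqueid (url TEXT, deviceid INTEGER, uniqueid TEXT UNIQUE);

    -- Two copies of the same song in the collection.
    INSERT INTO tags VALUES ('./music/a/song.mp3', 1);
    INSERT INTO tags VALUES ('./music/b/song.mp3', 1);

    -- Both copies would get the same id; the unique key rejects the
    -- second row, and this sketch simply ignores the rejected insert.
    INSERT OR IGNORE INTO uniqueid VALUES ('./music/a/song.mp3', 1, 'abc123');
    INSERT OR IGNORE INTO uniqueid VALUES ('./music/b/song.mp3', 1, 'abc123');
""")

first = lookup_uniqueid(conn, "./music/a/song.mp3", 1)
second = lookup_uniqueid(conn, "./music/b/song.mp3", 1)
print(first, second)
```

Under these assumptions only the first copy resolves to an id; the second
lookup comes back empty because the unique key leaves room for one row per
uniqueid.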

> Carrying on this premise, only one of the two entries from the tags
> table select actually has a corresponding row in the uniqueid table due
> to the fact that there is a unique key on the uniqueid table. This way I
> can only get a uniqueid for one of the two files I've played. How then
> can I record a stat for the song that didn't get its lookup?
>   
You can't.  Buyer beware.  The point of AFT isn't to make sure that 
every file that is the exact same song always updates stats -- it's to 
make sure that stats aren't lost totally.  It's up to you to make sure 
you don't have duplicate files all over your collection (which are 
generally annoying anyways and people generally want to get rid of 
them).  Or that you always play the same one.

If anything, people generally want to use AFT to find duplicates to 
avoid this situation, because it's so annoying  :-)
> Now looking at the statistics table, it seems that it also has url and
> deviceid fields as well as another unique key on its uniqueid. This
> means that it would be impossible to record statistics against a song if
> you have two copies of it in your collection due to key violations.
>
> I suspect the only reason this does not happen more often is due to the
> fact that the second copy's uniqueid cannot be looked up in the uniqueid
> table in the first place, which results in NULL values for uniqueid and
> prevents key violations. Indeed:
> mysql> select count(*) from statistics where uniqueid is null;
> +----------+
> | count(*) |
> +----------+
> |        9 |
> +----------+
>
>   
Right.  BUT.  There is still a URL associated with that second copy.  
And the uniqueid will get updated to the proper ID if/when that first 
file is ever deleted/removed/not found.  Then the second one will be 
found, the entry in statistics will get a proper uniqueid attached to 
it, and it becomes the new one.  Does this mean you can bounce back and 
forth with your statistics?  Absolutely.  See my preceding answer for 
why this is generally not a problem.

That being said, when stats are being updated for a song that currently 
lacks a uniqueid, it would be possible to calculate its uniqueid and 
check the stats table for that first, before simply relying on the URL.  
It doesn't work that way currently, but maybe it'd be a good change.
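
A rough sketch of what that change could look like, again in Python with
sqlite3. This is not how the code works today; it only illustrates the
"match on uniqueid first, then fall back to the URL" idea. The record_play
function, the reduced statistics schema, and the sample values are all
hypothetical, and how the uniqueid gets calculated is left out entirely.

```python
import sqlite3

def record_play(conn, url, deviceid, uniqueid):
    """Bump the play counter, matching on uniqueid before the URL.

    `uniqueid` is whatever was calculated for the file being played;
    it may be None if no id could be determined.
    """
    row = None
    if uniqueid is not None:
        # Preferred match: any existing stats row carrying this id.
        row = conn.execute(
            "SELECT rowid FROM statistics WHERE uniqueid = ?", (uniqueid,)
        ).fetchone()
    if row is None:
        # Fallback: match on url + deviceid.
        row = conn.execute(
            "SELECT rowid FROM statistics WHERE url = ? AND deviceid = ?",
            (url, deviceid),
        ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO statistics (url, deviceid, uniqueid, playcounter) "
            "VALUES (?, ?, ?, 1)",
            (url, deviceid, uniqueid),
        )
    else:
        conn.execute(
            "UPDATE statistics SET playcounter = playcounter + 1 "
            "WHERE rowid = ?",
            (row[0],),
        )

conn = sqlite3.connect(":memory:")
# Assumed, reduced schema: the real table has more columns.
conn.execute(
    "CREATE TABLE statistics (url TEXT, deviceid INTEGER, "
    "uniqueid TEXT UNIQUE, playcounter INTEGER)"
)

# Play each of the two copies once; both calculate to the same id.
record_play(conn, "./music/a/song.mp3", 1, "abc123")
record_play(conn, "./music/b/song.mp3", 1, "abc123")

rows = conn.execute(
    "SELECT url, deviceid, uniqueid, playcounter FROM statistics"
).fetchall()
print(rows)
```

With this ordering, a play of either copy lands on the same statistics
row instead of creating a second row with a NULL uniqueid.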

> 1. Remove the unique key on the uniqueid table to allow it to list
> *every* file in your collection, copies and all (as the copies *do* show
> up in the browser and *can* be played, they *must* be cataloged).
>   
Maybe.  A long time ago the design was to allow this, and not have the 
unique field be unique (an ID rather than a unique ID).  But it didn't 
work for other reasons, some of which have been mitigated by now, and 
I'm not sure about the rest.
> 2. I'd remove the url and deviceid fields from the statistics table. The
> joining factor should be uniqueid and there should only ever be one of
> these in there.
>   
You'd then have to add code everywhere the statistics table is accessed 
to ensure that the "full" URL is figured out ahead of time, to avoid 
screwing with dynamic collection.
> This implies that uniqueid is *always* used which may or may not be
> mandatory now anyway?
>   
It's used...for files in your collection.  Not for files outside your 
collection.

Anyways, some of this should be thought out for AFT in A2, which won't 
be for a long while as all the other subsystems need to be working 
first.  These things can't be changed in the stable version since we 
can't do huge DB modifications in the stable code.

Thanks for the input...

--Jeff