Question about AFT and duplicate entries

Mon Feb 25 22:29:00 UTC 2008

Jeff Mitchell wrote:
> Colin Guthrie wrote:
>> Accepting that ATF does not filter out duplicates on the collection
>> browser,
> AFT.  We'll not hear of ATF again, thank you very much.  :-)

Gah! Indeed!

>> it becomes possible for me to play two versions of the same
>> song from different locations, but for the two (or more) files there
>> will only ever be one entry in the uniqueid table.
>>   
> Yes.  Although you mean, the same song, not two versions of the same 
> song, as AFT calculates both on the tag and some part of the actual 
> audio information (which is why it's not perfect as it's not *all* the 
> audio information, but you'd be hard pressed to find a case where it 
> doesn't work properly...and this way is *much* faster than scanning the 
> whole file).

Yeah, sorry, I meant two *copies* of the same song/file.....

>> Does this mean that the uniqueid itself is calculated every time a file
>> is played?
> No.  On scan.
>> Is that not a little pointless?
> No, because that's not what it does.

:)

>> I thiought that the uniquied
>> table was used to provide a cache of these values?
> Yes.  Without that cache the tracking wouldn't work in the first place.

Good. At least my understanding (if not my acronyms and working) is
correct :)

>> In my case it means
>> that the one on the network is read.
>>   
> Specifically, it actually means that the one that is scanned last is the 
> URL that will be reflected in the database.
> 
>> Would it not make way more sense to remove the unique key on the
>> uniqueid field on the uniqueid table? (too many uniques there!!)
>>   
> What would be the point in having unique values be non-unique?

Well in the DB results quoted above, I can play the two copies of the
same song (see selects from the tags table).

I presume that to record a statistic, I need to take the url and
deviceid from the tags table and then look up the uniqueid from the
uniqueid table right? (perhaps not programatically, but at least
conceptually).

Carrying on this premise, only one of the two entries from the tags
table select actually has a corresponding row in the uniqueid table due
to the fact that there is a unique key on the uniqueid table. This way I
can only get a uniqueid for one of the two files I've played. How then
can I record a stat for that sone that didn't get it's lookup?

Now looking at the statistics table, it seems that it also has url and
deviceid fields as well as another unique key on it's uniqueid. This
means that it would be impossible to record statistics against a song if
you have two copies of it in your collection due to key violations.

I suspect the only reason this does not happen more often is due to the
fact that the second copy's uniqueid cannot be looked up in the uniqueid
table in the first place with results in NULL values for uniqueid and
preventing key violations. Indeed:
mysql> select count(*) from statistics where uniqueid is null;
+----------+
| count(*) |
+----------+
|        9 |
+----------+

(this is only after a couple days....... on closer inspection these 9
are all from the NFS collection).

Solution?

Well if I ruled the world of db design, I'd:

1. Remove the unique key on the uniqueid table to allow it to list
*every* file in your collection, copies and all (as the copies *do* show
up in the browser and *can* be played, they *must* be cataloged).
2. I'd remove the url and deviceid fields from the statistics table. The
joining factor should be uniqueid and there should only ever be one of
these in there.

This implies that uniqueid is *always* used which may or may not be
mandatory now anyway?

Am I missing something? I've not digged too heavily into the code, just
looking at the db design.....

Col