Encoding of track metadata in database

Mon Sep 20 13:40:22 UTC 2010

Hello,

I've written a Python script [1] that exports cover images from Amarok
to the album directory and sets them as the directory icon.  Nothing
particularly useful, just a bit of eye candy.  The script
queries the Amarok MySQL database to get a list of existing albums,
together with their cover images and their directories.

Currently the script simply assumes UTF-8 encoded data in the
database.  This works for me, but other users whose Amarok databases
contain a mixture of UTF-8 and ISO-8859-1 encoded metadata have
reported problems.  Apparently the Amarok database contains
arbitrarily encoded data, probably simply copied directly from the ID3
tags.

Nevertheless, in all of these cases Amarok itself displayed the track
names correctly, so Amarok is apparently able to decode this data.
How does Amarok do this?  How does it detect the encoding of album
metadata?  Can any developer shed some light on this?  I'd like to
mirror Amarok's behaviour in my script to get it working for the
affected users.
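
For reference, here is a minimal sketch of the kind of fallback I could
use in the meantime.  It is only a guess on my side and not necessarily
what Amarok does: it tries UTF-8 first and falls back to ISO-8859-1 if
the bytes are not valid UTF-8.  The function name decode_metadata and
the list of candidate encodings are my own assumptions.

```python
def decode_metadata(raw, encodings=("utf-8", "iso-8859-1")):
    """Decode a raw metadata value by trying candidate encodings in order.

    Hypothetical helper, not taken from Amarok: the order of ``encodings``
    is an assumption based on the two encodings users have reported.
    """
    # Values that the database driver already decoded are passed through.
    if isinstance(raw, str):
        return raw
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # Not valid in this encoding, try the next candidate.
            continue
    # Only reached if no candidate accepts the bytes; ISO-8859-1 accepts
    # any byte sequence, so this matters only for custom encoding lists.
    return raw.decode("utf-8", errors="replace")


# The same album artist stored once as UTF-8 and once as ISO-8859-1.
utf8_value = b"Bj\xc3\xb6rk"
latin1_value = b"Bj\xf6rk"
assert decode_metadata(utf8_value) == decode_metadata(latin1_value)
```

The obvious weakness is that ISO-8859-1 never fails to decode, so any
other single-byte encoding would silently produce wrong characters.
That is why I'd rather know what Amarok really does.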

Thanks in advance,
Sebastian Wiesner

[1] http://github.com/lunaryorn/amarok-covers