Taglib issues
Tom Sorensen
tsorensen at gmail.com
Fri Sep 16 18:33:12 CEST 2005
Ok, I've been trying to figure out what on earth is up with my tags
for some time now. I think I'm getting closer, and in doing so I've
found some bugs in Taglib (or perhaps some deep misuse by me... I'll
let you decide).
Quick point -- Taglib has issues with creating duplicate tags and with
tag ordering. Calling TagLib::Tag::duplicate() more than once will
result in multiple copies of the ID3v2 tag in your output file.
Additionally, depending on ordering, you can end up with an APEv2 tag
_after_ the ID3v1 tag (and if you're really good, as I was, you end up
with ID3v1, APEv2, ID3v1). To avoid this, you must copy the APE tag,
save the file, then duplicate the ID3 tags and save the file again. If
you copy the APE tag second, or do not save the file between the two
steps, the tags will not come out correctly.
I'm also seeing some wonkiness in the APEv2 tags, but I haven't
narrowed that issue down yet.
To be clear, I'll go over the steps used to get to this point. All my
MP3s were ripped from CD using Grip or CDex and lame w/ --alt-preset
extreme. Tags were set either at rip/encode time or modified later
with EasyTag or MP3Tag. At this point all tags are valid and in
ID3v2.3 and ID3v1. I also normalized volumes using mp3gain on Linux
(at an album level), which added an APEv2 tag.
I then transcoded the MP3s to 160kbps VBR using lame (--preset 160
--noreplaygain) to make them a more reasonable size for portable
players. I know. Bad. But with over 700 CDs and 9000 songs it wasn't
viable to re-rip. Doing this loses all tag information, so I wrote a
program using Taglib to copy the tags. It opens the source and target
files, checks for an ID3v2 tag on the source and duplicates it, checks
for an APEv2 tag and copies any items it finds, then checks for an
ID3v1 tag and duplicates it. I then re-ran mp3gain to normalize volumes
at a track level.
Unfortunately, a lot of software does not handle the tags correctly at
this point. Most commonly the genre is misread -- a song with a genre
of Blues shows up as genre "0". For some reason these programs aren't
looking up the numeric genre code. A lot of other ID3v2 info seems to
be lost as well, particularly track counts (in xx/yy, the /yy bit
disappears). Some of this is due to id3lib-based programs not reading
the v2.4 tags, but iTunes has the problem as well, and it reads v2.4.
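Genre "0" is the index of Blues in the ID3v1 genre list, and an ID3v2 TCON frame may refer to that list by number ("(0)" in ID3v2.3, a bare "0" in ID3v2.4). The lookup those programs appear to skip looks roughly like this sketch; resolveGenre() is a hypothetical name, and the table is deliberately truncated to the first 18 ID3v1 genres.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// First entries of the standard ID3v1 genre list (index 0 is "Blues").
// Truncated for this sketch; the real list is much longer.
static const char* const kId3v1Genres[] = {
    "Blues", "Classic Rock", "Country", "Dance", "Disco", "Funk",
    "Grunge", "Hip-Hop", "Jazz", "Metal", "New Age", "Oldies",
    "Other", "Pop", "R&B", "Rap", "Reggae", "Rock"};

// Hypothetical helper: maps "(0)" (v2.3 style) or "0" (v2.4 style) to a
// genre name. Anything that is not a known numeric reference is returned as-is.
std::string resolveGenre(const std::string& tcon) {
    std::string digits = tcon;
    if (digits.size() >= 3 && digits.front() == '(' && digits.back() == ')')
        digits = digits.substr(1, digits.size() - 2);  // strip the parentheses
    if (digits.empty() || digits.size() > 3) return tcon;
    for (char c : digits)
        if (!std::isdigit(static_cast<unsigned char>(c))) return tcon;
    std::size_t index = std::stoul(digits);
    const std::size_t count = sizeof(kId3v1Genres) / sizeof(kId3v1Genres[0]);
    return index < count ? kId3v1Genres[index] : tcon;
}
```

A reader that skips this step ends up displaying the raw "0", which matches the symptom described above.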
I've been hexdumping the MP3s (hexdump -C on Linux) and discovered
that there were 3 copies of the ID3v2 tag at the start of the file, 2
copies of the ID3v1 tag at the end (kinda), and that the APEv2 tag is
duplicated, with one copy lying between the two ID3v1 tags. I quickly
determined that the ID3 duplication was caused by the two
TagLib::Tag::duplicate() calls. Removing one makes things work (see
below, though). That said,
the API docs do not explicitly state that all ID3 tags are copied, and
it was my assumption that they wouldn't be since you can only pass an
ID3v1::Tag or an ID3v2::Tag object to the call -- you cannot construct
just a plain Tag object. The behavior of ::duplicate copying
everything at once is fine, but the API needs to make it clear which
tags will be copied (not all, since APE tags aren't) and it should
make sure not to let Bad Things happen like creating two copies of the
ID3v2 tag.
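The check that hexdump -C revealed can also be done programmatically. This standalone sketch (countLeadingId3v2() is a hypothetical helper, not a TagLib function) counts ID3v2 tags stacked back to back at the start of a buffer, using the 10-byte ID3v2 header: "ID3", two version bytes, a flags byte, and a 28-bit synchsafe size that excludes the header and the optional ID3v2.4 footer.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: counts consecutive ID3v2 tags at the start of a file.
std::size_t countLeadingId3v2(const std::vector<unsigned char>& b) {
    std::size_t pos = 0, count = 0;
    while (pos + 10 <= b.size() && b[pos] == 'I' && b[pos + 1] == 'D' && b[pos + 2] == '3') {
        // Synchsafe integer: only the low 7 bits of each size byte are used.
        std::size_t size = (static_cast<std::size_t>(b[pos + 6] & 0x7F) << 21) |
                           (static_cast<std::size_t>(b[pos + 7] & 0x7F) << 14) |
                           (static_cast<std::size_t>(b[pos + 8] & 0x7F) << 7) |
                           static_cast<std::size_t>(b[pos + 9] & 0x7F);
        bool hasFooter = (b[pos + 5] & 0x10) != 0;  // ID3v2.4 footer flag
        pos += 10 + size + (hasFooter ? 10 : 0);    // jump to the byte after this tag
        ++count;
    }
    return count;
}
```

A correctly written file should return 1; the file described above would return 3.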
Tag::duplicate also modifies the ID3v2 TRCK field -- it strips out
non-numeric data (like /10) and drops leading zeros (01 becomes 1;
01/10 becomes 1). The stripping, at least, is contrary to the ID3v2
standard, which allows TRCK to carry a total after a slash, and
dropping leading zeros is likely to annoy people.
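The loss is consistent with the copy going through the generic TagLib::Tag interface, whose track() accessor returns an unsigned int, so only the leading number can make it across. The standalone sketch below (trackAsInt() and splitTrck() are hypothetical helpers, not TagLib functions) contrasts that integer round trip with keeping the TRCK text as-is.

```cpp
#include <string>

// Mimics carrying TRCK through an unsigned int track number: only the
// leading digits survive, so "01/10" and "01" both come back as 1.
unsigned int trackAsInt(const std::string& trck) {
    unsigned int n = 0;
    for (char c : trck) {
        if (c < '0' || c > '9') break;  // stop at '/' or any other non-digit
        n = n * 10 + static_cast<unsigned int>(c - '0');
    }
    return n;
}

// Keeps both halves of "position/total" as text, leading zeros included.
struct TrackField {
    std::string number;
    std::string total;  // empty when the frame has no "/yy" part
};

TrackField splitTrck(const std::string& trck) {
    const std::string::size_type slash = trck.find('/');
    if (slash == std::string::npos) return TrackField{trck, ""};
    return TrackField{trck.substr(0, slash), trck.substr(slash + 1)};
}
```

Preserving "01/10" exactly therefore means copying the TRCK frame text itself rather than an integer track number.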
As for the ID3v1 and APE tags -- if you use the APE::ItemListMap to
iterate over all APE items and copy them, it does copy them... except
that in my output file there are two copies of the APE tags, separated
by 1166 bytes in the MP3 I've been testing with. Additionally, if you
copy the APE tag after having duplicated the ID3 tags, or if you do so
before but *do not save* prior to duplicating the ID3 tags, you'll end
up with one of the APEv2 tags occurring after the ID3v1 tag. You must
copy the APE tags, save the file, and then duplicate the ID3 tags.
Otherwise they'll end up hosed.
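To locate every APEv2 copy wherever it sits, one option is to scan for the "APETAGEX" preamble. Because an APEv2 tag can carry both a header and a footer with the same preamble, this hypothetical apeFooterOffsets() sketch reports only footers (flags bit 29, "this is the header", clear), so each tag is counted once; the gap between two offsets gives the kind of spacing described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: returns the offset of each APEv2 footer in the buffer.
// Headers share the "APETAGEX" preamble, so they are skipped by checking
// flags bit 29, stored little-endian at offset 20 of the 32-byte block.
std::vector<std::size_t> apeFooterOffsets(const std::vector<unsigned char>& b) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i + 32 <= b.size(); ++i) {
        if (std::memcmp(&b[i], "APETAGEX", 8) != 0) continue;
        std::uint32_t flags = static_cast<std::uint32_t>(b[i + 20]) |
                              (static_cast<std::uint32_t>(b[i + 21]) << 8) |
                              (static_cast<std::uint32_t>(b[i + 22]) << 16) |
                              (static_cast<std::uint32_t>(b[i + 23]) << 24);
        if ((flags & 0x20000000u) == 0) offsets.push_back(i);  // footer, not header
    }
    return offsets;
}
```

A brute-force scan like this can in principle match the preamble inside audio data, so treat hits as candidates to inspect rather than proof.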
I can provide the code used for copying the tags (it's pretty much toy
code; <100 lines), as well as a sample MP3 in various stages of
distress.
Now I have to figure out how to fix the several thousand MP3s that
have bad tag info... preferably without transcoding them from the high
bitrate versions again.
More information about the taglib-devel mailing list