Better database handling of UTF-8 data / Case-insensitive search

Leo Franchi lfranchi at gmail.com
Thu May 14 12:30:28 UTC 2009


On Thu, May 14, 2009 at 12:54 PM, Jeff Mitchell <mitchell at kde.org> wrote:

> Mark Kretschmann wrote:
> > Hi Stanislav, big thanks for diagnosing this issue.
>
> Indeed.
>
> Question: what distributions have this problem?
>
> On Gentoo at least, utf8 is the default -- you have to explicitly force
> it to use latin1 as the default (compile time option), or explicitly
> specify latin1 when creating a table.  I would have thought this would
> be the case for most unicode-friendly distributions, i.e. almost any
> modern distribution.
>
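A rough illustration of why the default encoding matters for case-insensitive search. This is only a sketch under assumptions: Python's "latin-1" codec stands in for a latin1 column, str.lower() stands in for a case-insensitive collation, and the helper names are made up for the example. It is not a description of the bug as it was actually diagnosed.

```python
# Illustration only: "latin-1" here approximates a latin1 column, and
# str.lower() approximates a case-insensitive collation. Both helper
# names are hypothetical.

def stored_as_latin1(text: str) -> str:
    """Return what UTF-8 bytes look like when read back as latin1."""
    return text.encode("utf-8").decode("latin-1")


def ci_equal(a: str, b: str) -> bool:
    """Naive case-insensitive comparison."""
    return a.lower() == b.lower()


if __name__ == "__main__":
    # Compared as real Unicode text, the two spellings match.
    print(ci_equal("Édith", "édith"))
    # Compared after the UTF-8 bytes were misread as latin1, they do not,
    # because the second byte of each accented letter is not a case pair.
    print(ci_equal(stored_as_latin1("Édith"), stored_as_latin1("édith")))
```

Plain ASCII names are unaffected, since their UTF-8 and latin1 bytes are identical; only names with non-ASCII characters stop matching.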
> This isn't to say we shouldn't try to fix it on our end; however, I
> think this is also something we should be asking packagers.
>
> For the #1 fix:
>
> The problem I see here is that for people using problematic mysql
> packages, any new table or field that is created will not use the right
> encoding.  So this is at best a temporary fix, and one that we have to
> then remember to carry forward.  I foresee future bug reports.
>
> As for the #2 fix:
>
> When any database update is performed, a full rescan is run, which drops
> pretty much all data with the exception of a few key tables.  I don't
> believe the values in those tables will generally have problems with
> becoming non-unique after encoding changes.  But you mention other issues.
>
>
> I think that there's only one right way to do this:
>
> 1) Clear the tables that would be wiped clean by a full rescan anyway.
> 2) Create a new DB, and dump the remaining data over.
> 3) Create the old one again, with the right encoding by default.
> 4) Dump the data back over.
>
>
> I actually would support doing this before 2.1 release, and having
> people using SVN test it out.  I'm pretty comfy (compared to most) with
> the DB code and think I could get this done.  Otherwise, I think the
> entire fix would have to wait for 2.2, which will be
> who-knows-how-far-away.

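For reference, the four steps quoted above could be sketched roughly as the following list of MySQL statements. This is an assumption-laden sketch, not Amarok's actual code: the function name, the database names and the table names are all hypothetical, and the point where the application recreates its own schema is only marked with a comment.

```python
def quote(name: str) -> str:
    """Backtick-quote a MySQL identifier (no escaping; trusted names only)."""
    return "`" + name + "`"


def rebuild_plan(db, scratch, wiped, kept, charset="utf8"):
    """Build the statements for the four-step rebuild as a list of strings.

    db      -- existing database (hypothetical name)
    scratch -- temporary database to hold the surviving data
    wiped   -- tables a full rescan would repopulate anyway
    kept    -- tables whose contents must survive
    """
    plan = []
    # 1) Clear the tables that a full rescan would wipe anyway.
    for table in wiped:
        plan.append(f"DELETE FROM {quote(db)}.{quote(table)}")
    # 2) Create a new database and copy the remaining data over.
    plan.append(f"CREATE DATABASE {quote(scratch)} DEFAULT CHARACTER SET {charset}")
    for table in kept:
        plan.append(f"CREATE TABLE {quote(scratch)}.{quote(table)} LIKE {quote(db)}.{quote(table)}")
        plan.append(f"INSERT INTO {quote(scratch)}.{quote(table)} SELECT * FROM {quote(db)}.{quote(table)}")
    # 3) Create the old database again, with the right encoding by default.
    plan.append(f"DROP DATABASE {quote(db)}")
    plan.append(f"CREATE DATABASE {quote(db)} DEFAULT CHARACTER SET {charset}")
    # (Assumption: the application recreates its tables here, so that they
    # inherit the new default encoding. That part is not shown.)
    # 4) Copy the data back.
    for table in kept:
        plan.append(f"INSERT INTO {quote(db)}.{quote(table)} SELECT * FROM {quote(scratch)}.{quote(table)}")
    return plan


if __name__ == "__main__":
    # "amarok", "amarok_tmp", "tracks" and "statistics" are placeholder names.
    for statement in rebuild_plan("amarok", "amarok_tmp", ["tracks"], ["statistics"]):
        print(statement + ";")
```

Whether the copied-back text comes out correct depends on how the bytes were stored in the old tables in the first place, which this sketch does not try to handle. The DROP DATABASE in step 3 is also the point of no return.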

I don't agree with doing this for 2.1. Regardless of how comfortable anyone
is with the code, we've already released beta2 and have maybe one more RC
left. This is not the time for *any* major changes, regardless of how
straightforward. And I know you're going to say it's not a major change, but
when it has the potential to affect every user's collection, it's pretty
important. We do have 2.1.1; it's not like 2.1.0 is our last release until
2.2.

Just look at what happened with Max's commit to the collection code: he
knows it well, it was an easy fix, but it still b0rked collections and took
a day of people complaining to track it down properly. This is not something
to be experimenting with days before a major release.

leo



-- 
______________________________________________________
Leo Franchi
7016 Wandering Oak          lfranchi at kde.org
Austin                        leonardo.franchi at tufts.edu
            cell: (650) 704 3680
TX, USA                              home: (650) 329 0125