Messed-up encoding in doc-comments (in the popup-widget)

Wed Jul 8 06:44:30 UTC 2009

On 08.07.09 07:03:46, Niko Sams wrote:
> On Wed, Jul 8, 2009 at 00:16, Andreas Pakulat<apaku at gmx.de> wrote:
> > On 07.07.09 23:58:53, David Nolden wrote:
> >> On Tuesday, 07 July 2009 at 14:53:59, Milian Wolff wrote:
> >> > Hey guys, I want to fix bug https://bugs.kde.org/show_bug.cgi?id=183182
> >> > but am a bit lost. Can someone give me a bit of insight?
> >> >
> >> > What I found out is that it only affects doc-comments and that it is not
> >> > due to formatComment().
> >> >
> >> > 1) How could I properly debug this stuff? kDebug() doesn't seem to work
> >> > properly. Could I use GDB to print the text and see where it gets
> >> > corrupted?
> >> >
> >> > 2) Could the popup-widget be the culprit? Where are its sources again?
> >> Generally, the comments are supposed to be UTF-8 encoded within the duchain,
> >> and kDebug() should work properly with them. If it doesn't, then that's
> >> probably already part of the problem.
> >>
> >> One thing comes to mind. Look at
> >> kdevelop/languages/cpp/preprocessjob.cpp: I think there's a @todo somewhere
> >> in there, saying something like "convert the file to utf-8 if it isn't yet",
> >> and I think that todo is still there.
> >>
> >> That would need a check like "If the local encoding is not utf-8, convert the
> >> text before processing it".
> >
> > I'd just like to add that "use the local encoding" might not necessarily
> > help. Most distros use utf-8 as the default encoding, but the user may have
> > files from "back then" when he was using KOI-8, latin9 or whatever.
> >
> > As selecting the right encoding for each file is not going to work for the
> > background parser, I'm wondering whether maybe we should try to use the
> > KEncodingProber class from kdelibs (IIRC that's the name of the newer one,
> > using the algorithms developed by Mozilla to detect the encoding)?
> I think a setting per project is needed. Katepart should also use that setting.
> Autodetection can only guess. (For parsing this could be enough, but not
> for editing.)

Well, I'm not sure a project-wide option helps; these people usually have a
different encoding in every other file :)

Hmm, I think there's a way to specify an encoding via the
.kate-command-file (not sure what the filename is; it takes the same
kate-commands as the kate-modeline), so if we can somehow find and read
that and just use that encoding, the user at least has a way of specifying
the encoding on a per-directory basis.
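
A rough sketch of such a per-directory lookup, written in Python only to keep
it short. Everything named here is an assumption for illustration: the file
name `.kateconfig`, the `kate: encoding <name>;` line format, and the helper
names `encoding_from_kate_line` and `encoding_for_file` are not taken from
existing KDevelop or Kate code.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: the per-directory file is called ".kateconfig" and contains
# modeline-style lines such as "kate: encoding latin1; indent-width 4;".
CONFIG_NAME = ".kateconfig"

_VARIABLE_LINE = re.compile(r"^\s*kate:\s*(.*)$")


def encoding_from_kate_line(line: str) -> Optional[str]:
    """Return the value of an 'encoding' entry in a kate variable line."""
    match = _VARIABLE_LINE.match(line)
    if not match:
        return None
    # Entries are separated by ';', each one being "<name> <value>".
    for entry in match.group(1).split(";"):
        parts = entry.split(None, 1)
        if len(parts) == 2 and parts[0] == "encoding":
            return parts[1].strip()
    return None


def encoding_for_file(path: Path) -> Optional[str]:
    """Walk up from the file's directory; the nearest config file wins."""
    for directory in path.resolve().parents:
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            content = candidate.read_text(encoding="utf-8", errors="replace")
            for line in content.splitlines():
                encoding = encoding_from_kate_line(line)
                if encoding:
                    return encoding
            # The nearest config file has no encoding entry: stop here
            # instead of consulting files further up the tree.
            return None
    return None
```

The background parser could call something like `encoding_for_file()` once
per file and fall back to its default when the result is `None`.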

If someone really wants it on a per-file basis, we'd have to store that
somewhere somehow, and it'll definitely be non-automatic.
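
However the encoding ends up being specified, the conversion step itself
could look roughly like the following Python sketch. The order used here
(explicitly declared encoding first, then a strict UTF-8 check, then a
legacy fallback) and the names `decode_source` and `to_utf8` are assumptions
for illustration, not what preprocessjob.cpp does.

```python
from typing import Optional, Tuple


def decode_source(raw: bytes,
                  declared: Optional[str] = None,
                  fallback: str = "latin-1") -> Tuple[str, str]:
    """Decode file contents, returning (text, name of the encoding used)."""
    # 1. An encoding the user specified (per file or per directory) wins.
    if declared:
        try:
            return raw.decode(declared), declared
        except (LookupError, UnicodeDecodeError):
            pass  # unknown codec name, or the bytes do not fit it
    # 2. A strict UTF-8 decode rejects most non-UTF-8 legacy files.
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # 3. Last resort: a legacy fallback, never raising on bad bytes.
        return raw.decode(fallback, errors="replace"), fallback


def to_utf8(raw: bytes, declared: Optional[str] = None) -> bytes:
    """Return the file contents re-encoded as UTF-8 for the parser."""
    text, _ = decode_source(raw, declared)
    return text.encode("utf-8")
```

Step 3 is only a guess at the encoding, which is why an explicit setting is
tried first.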

Andreas

-- 
Don't tell any big lies today.  Small ones can be just as effective.