The future of Sonnet

Wed Jan 3 14:40:53 GMT 2007

On Tuesday 02 January 2007 05:13, Jacob R Rideout wrote:
> > Are the classes inside a namespace? Names like KAbstractFilter and
> > GuessLanguage are a bit too generic otherwise and the latter might
> > clash with symbols from something else.
>
> I've been considering this myself. KSpell2 used a namespace for all
> its classes. Most other classes in KDE, however, seem just to use
> distinct names or a prefix in the name. KOffice prefixes its class
> with ko. I'd prefer not to use a namespace, but I've yet decide on
> naming scheme and have thus solicited outside advice. I think
> prefixing internal (although still accessible) classes with Sonnet and
> those meant to be used by application developers with
> KUniqueAndDescriptiveName might be best.
>
> > Are the API docs online somewhere?
>
> All development right now is taking place in branches/work so they
> aren't auto-generated. I just created a few for the classes I've
> documented so far.
>
> http://www.jacobrideout.net/sonnet/classKAbstractFilter.html
> http://www.jacobrideout.net/sonnet/classKTextBreaks.html
> http://www.jacobrideout.net/sonnet/classKUnicodeData.html

I just had a look at it and the implementation and I suggest not to make those 
classes public. As pointed out previously their names are generic. But in 
addition I suggest a default policy for new code in libraries to make 
everthing private by default and export/publicize API once it it is useful 
for applications.

Applications using a spell checker are unlikely to be interested in TR32 
specific character categories. And by having it in the public API it means 
you can't change it anymore. You can't even move it into a different library, 
because that is not binary compatible on Windows.

I suggest to make only SonnetSpeller public API, after another review. And 
maybe a few helper functions for bringing up and controlling a the standard 
spell checker dialog. But I wouldn't make the actual dialog class public. 
Remember, if it's public it's very hard to replace with another dialog.

On the TR23 implementation: I think string lookups for every character to get 
the tr23 category are going to be dreadfully slow. And performance is 
certainly important for a spell checker :).

I'm specifically think of this code:

       // start rule based checking

        //WB5
        if ( ( catagory == data->categoryCode("ALetter") )
                &&
                ( catagory2 == data->categoryCode("ALetter") )
           )
        {
            //kDebug(750) << "WB5";
            bk=false;
        }

        [ ... more of these ... ]

An enum would be so much faster instead and keep the code still very readable.

Even though I suggest private API for the AbstractFilter class here's one 
suggestion for the Type enum:

If you look at an enum in the class header it is often easily understandable 
what the individual values mean. But if you look at the caller code it is 
suddenly not apparent anymore. For example:

	filter->setType(AbstractFilter::Sentence);

If you look at this piece of code half a year later I promise you'll have no 
idea anymore what exactly this means. It would be much more readable with a 
longer name, including a verb. In Qt we very often repeat the name of the 
enum in the individual values to solve this very problem. Here's an 
alternative suggestion:

enum FilterMode {
    FilterSentences,
    FilterWords,
    FilterGraphemes
};

Or perhaps filtering is the wrong word here since in a way you don't filter 
but you itemize the text. The current API suggests that at least.

Hope this helps a bit :)

Simon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: <http://mail.kde.org/pipermail/kde-core-devel/attachments/20070103/418f12ff/attachment.sig>