[KDE-Sonnet] [lingu-dev] Introductions from the KDE guy

Thu Jan 25 18:14:53 CET 2007

On Thursday 25 January 2007 09:38, Jacob R Rideout wrote:
> > > I see that An Gramadoir always has a complete sentence in the
> > > "errortext" element.
> >
> > Sorry, I meant the "sentence" element of course. I think this element
> > might be confusing to the users, as it won't contain a complete sentence
> > if the sentence boundary detection works incorrectly.
>
> And even if it is correct, you would need to ensure that you knew the
> particular sentence boundary detection algorithm. A global offset is
> the correct approach. We need to ensure that the api has well defined
> and documented behavior. That is why I pointed out the difference.

Jacob, Daniel, 

   Before I forget, another convention that should be added to the DTD
is the requirement that the attributes be given in the specified order
to be compliant, or else a simple regex parse isn't going to cut it.
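
A minimal sketch of why a fixed attribute order matters to a regex-based
parser. The attribute names fromy, fromx, toy and tox come from this thread;
the element name "error", the attribute order shown, and the helper
parse_error are assumptions made only for illustration.

```python
import re

# Hypothetical layout: element name and attribute order are assumptions.
ERROR_RE = re.compile(
    r'<error fromy="(\d+)" fromx="(\d+)" toy="(\d+)" tox="(\d+)"'
)

def parse_error(line):
    """Return (fromy, fromx, toy, tox) as ints, or None if there is no match."""
    m = ERROR_RE.search(line)
    return tuple(int(g) for g in m.groups()) if m else None

in_order = '<error fromy="3" fromx="10" toy="3" tox="14"/>'
reordered = '<error fromx="10" fromy="3" toy="3" tox="14"/>'

print(parse_error(in_order))   # matches: attributes are in the expected order
print(parse_error(reordered))  # no match: same data, different order
```

Both strings are equivalent as XML, but only the first is recognized by the
fixed pattern, which is the case an ordering requirement would rule out.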

   Regarding the "sentence" attribute, Daniel, are you suggesting that we
change the name to something like "context"?   That would be fine
with me, and would allow either a few words of context or else
full sentences when the boundary detection is robust.  And we can
declare in the definition of the API that applications using it should not
assume that "context" is a sentence.

   Regarding offsets of different kinds:
To clarify, there are two ways of locating the error as I've defined the
Gramadóir API.
(1) The first is to locate it "globally".   The global coordinates
could either be a single number (bytes or characters from beginning of
document), or the (line,column) coordinates I've decided to use.
It is essential to me that we keep the (line,column) coordinates,
i.e. fromx,tox,fromy,toy, or else it becomes more or less impossible
to write the vim plugin using the API, since moving around the
document is based on lines/cols.    I decided to not include a global
byte offset since it can be computed from the lines/cols if need be.
Maybe Sonnet could do that.  But if others think it is essential to include,
I would consider adding it.

(2) The other way of locating the error is within the given "context" string
(what is currently called "sentence").  This could either be done
with an offset counted from the beginning of the context string (as I've
done), or with tags contained in the context attribute (as Daniel has done).
I prefer my way, and if we can agree to keep the attribute I'm willing to
change its name from "offset" to "contextoffset" or something like that, and
perhaps move it to just before or after the "context" attribute.
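
The two kinds of location discussed in (1) and (2) can be sketched roughly
as follows. Everything here is an assumption for illustration: lines and
columns are taken to be 0-based and counted in characters, lines are taken
to end in a single newline, the "length" parameter and the <marker> tag
name are hypothetical, and none of the function names belong to either
tool's real API.

```python
import re

def global_offset(text, line, col):
    """Character offset of (line, col) from the start of text.

    Assumes 0-based line and column and single-newline line endings.
    """
    lines = text.split("\n")
    if not 0 <= line < len(lines):
        raise ValueError("line out of range")
    if not 0 <= col <= len(lines[line]):
        raise ValueError("column out of range")
    # Each earlier line contributes its length plus one newline character.
    return sum(len(l) + 1 for l in lines[:line]) + col

def span_from_offset(context, offset, length):
    """Locate the error by an offset counted from the start of the context."""
    return context[offset:offset + length]

def span_from_tags(tagged):
    """Locate the error by inline tags; return (plain_context, offset, span)."""
    m = re.search(r"<marker>(.*?)</marker>", tagged)
    if m is None:
        return None
    plain = tagged[:m.start()] + m.group(1) + tagged[m.end():]
    return plain, m.start(), m.group(1)

doc = "Dia duit\nThis are a test\n"
context = "This are a test"
tagged = "This <marker>are</marker> a test"

start = global_offset(doc, 1, 5)
print(start, doc[start:start + 3])
print(span_from_offset(context, 5, 3))
print(span_from_tags(tagged))
```

In this sketch the offset form leaves the context string as plain text,
while the tag form requires the consumer to strip the markup before
displaying the context.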

Kevin