KSyntaxHighlighting meta data

Thu Aug 2 10:31:59 BST 2018

Hi Volker,

On Thu, Aug 2, 2018 at 10:21 AM, Volker Krause <vkrause at kde.org> wrote:
> On Wednesday, 1 August 2018 22:31:03 CEST Dominik Haumann wrote:
>> Hi everyone,
>>
>> I just discussed quickly with Christoph: In his branch, KTextEditor
>> uses KSyntaxHighlighting to do highlighting - this is a very nice
>> first step :-)
>>
>> But we also identified that a lot of stuff is currently missing in the
>> KSyntaxHighlighting API, namely all the metadata like:
>> - language element: style - needed for information
>> - language element: indenter - needed to restrict the allowed indenters
>> - general element: comment start / end markers for single line and
>> multi-line
>> - general element: keyword sensitivity
>> - spellcheck information: which escape sequences should be replaced,
>> e.g. in LaTeX.xml
>> - and much more
>>
>> Two years back I already added the following functions to Definition:
>>     bool indentationBasedFoldingEnabled() const;
>>     QStringList foldingIgnoreList() const;
>>
>> We could now start to add all this above to the Definition as
>> additional information. However, this is strictly speaking very
>> tightly related to KTextEditor, and possibly often not of interest to
>> other users of the KSyntaxHighlighting API.
>>
>> So we have the following options:
>> - Add many getters to Definition. Flexible, but we have to keep API
>> compatibility for all added functions.
>> - Add a new class like DefinitionMetaData (or better named), that
>> includes all this information. Maybe a bit cleaner, API compatibility
>> still holds.
>> - Add a general getter that allows querying key/values like
>> definition.get(key) -> QVariant. This could also be a QVariantMap or
>> similar... This is flexible, but we have no type safety, and changes
>> in behavior could still cause problems. We could also have a solution
>> that has both: e.g. QVariantMap Definition::metaData(MetaDataType t),
>> where MetaDataType is an enum of e.g. { Language, General,
>> Indentation, Spellchecking, ...}
>>
>> I think it makes sense to start this discussion now, so that we can
>> decide on a solution by next week and implement this.
>>
>> Maybe it first makes sense to come up with a complete list of what we need?
>
> That would help I guess, I didn't even remember the <encoding> stuff in
> latex.xml...
>
> This might end up as a mix of the above options, as some of the properties are
> sufficiently generic and can be of general interest (e.g. the comment
> characters), which justifies proper getters IMHO, while some others look more
> like language-specific workarounds to me.

I agree. Some parts were invented on demand as workarounds.

> I'm not too worried about keeping API compatibility for extra getters; all of
> this has existed for many years and has proven to be stable.

Ok.

> I am however more
> worried about aspects of the syntax files becoming part of the de-facto API
> that you might not expect to be "API". We have that with the keyword access
> now for example, which exposes internal names used in the syntax files.

True, I haven't thought about it this way. Should we remove the keywordList()
getters again? Maybe we can come up with something better at Akademy.

CC: Alexander, who requested this feature

> The
> generic key/value access API would go into a similar direction. This then
> leads to "API" breakage when e.g. renaming what looks like a local identifier
> in the XML file. So dedicated getters might lead to fewer things that need to
> be kept stable for API compatibility, ironically.

Ok...

> Doesn't mean we can't have the generic API though, for language specific
> workarounds or language specific KTE scripts this can still be the best
> approach, with a few extra warning comments in the XML files maybe.

For the short term, there are other workarounds:
Given that a Definition also gives you the location of the XML file, we
could even parse the additional information ourselves. Yes, that would mean
we parse the XML twice. But at least that gives us everything we need, and
we can move more stuff to the public API later.

@Alexander: In fact, you could do that already now as well: Open the
xml file and read which keyword lists exist. Not the most performant
solution, but it would work.
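
A rough sketch of that idea, to show how little is needed. It assumes the
keyword lists appear in the syntax file as <list name="..."> elements, and it
deliberately uses only the C++ standard library with a regular expression
instead of a real XML parser. The function names keywordListNames() and
keywordListNamesFromFile() are made up for this example and are not part of
KSyntaxHighlighting; in real code a proper parser such as QXmlStreamReader
would be the safer choice.

```cpp
#include <fstream>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the name attribute of every
// <list name="..."> element found in the text of a syntax definition file.
// Regex-based to keep the sketch dependency-free; it does not skip XML
// comments or CDATA, and it only handles double-quoted attribute values.
std::vector<std::string> keywordListNames(const std::string &xml)
{
    static const std::regex listTag(R"rx(<list\s+name\s*=\s*"([^"]*)")rx");

    std::vector<std::string> names;
    for (std::sregex_iterator it(xml.begin(), xml.end(), listTag), end; it != end; ++it) {
        names.push_back((*it)[1].str());
    }
    return names;
}

// Hypothetical helper: read the whole file at 'path' and extract the list
// names. Returns an empty vector if the file cannot be opened.
std::vector<std::string> keywordListNamesFromFile(const std::string &path)
{
    std::ifstream in(path);
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return keywordListNames(buffer.str());
}
```

You would pass the file location reported by the Definition to
keywordListNamesFromFile() and get the list names back in document order.
This reads the file a second time, so it is only meant as a stopgap until
there is proper API for it.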

Greetings
Dominik