[kate] doc/katepart: Update documentation of highlight & RegExp

Sat Oct 19 21:19:38 BST 2019

Git commit b520021b17714aeec3b4a7fbffa6273c76655c6b by Dominik Haumann, on behalf of Nibaldo González.
Committed on 19/10/2019 at 20:19.
Pushed by scmsync into branch 'master'.

Update documentation of highlight & RegExp

M  +145  -2    doc/katepart/development.docbook
M  +38   -11   doc/katepart/regular-expressions.docbook

https://commits.kde.org/kate/b520021b17714aeec3b4a7fbffa6273c76655c6b

diff --git a/doc/katepart/development.docbook b/doc/katepart/development.docbook
index 93e5046e9..632adfd7e 100644
--- a/doc/katepart/development.docbook
+++ b/doc/katepart/development.docbook
@@ -339,7 +339,8 @@ In this example, the <userinput>itemData</userinput> <emphasis>Normal Text</emph
 <varlistentry>
 <term>The last part of a highlight definition is the optional
 <userinput>general</userinput> section. It may contain information
-about keywords, code folding, comments and indentation.</term>
+about keywords, code folding, comments, indentation, empty lines and
+spell checking.</term>
 
 <listitem>
 <para>The <userinput>comment</userinput> section defines with what
@@ -350,12 +351,24 @@ user presses the corresponding shortcut for <emphasis>comment/uncomment</emphasi
 <para>The <userinput>keywords</userinput> section defines whether
 keyword lists are case sensitive or not. Other attributes will be
 explained later.</para>
+<para>The other sections, <userinput>folding</userinput>,
+<userinput>emptyLines</userinput> and <userinput>spellchecking</userinput>,
+are usually not necessary and are explained later.</para>
 <programlisting>
   <general>
     <comments>
       <comment name="singleLine" start="#"/>
     </comments>
     <keywords casesensitive="1"/>
+    <folding indentationsensitive="0"/>
+    <emptyLines>
+      <emptyLine regexpr="\s+"/>
+      <emptyLine regexpr="\s*#.*"/>
+    </emptyLines>
+    <spellchecking>
+      <encoding char="á" string="\'a"/>
+      <encoding char="à" string="\`a"/>
+    </spellchecking>
   </general>
 </language>
 </programlisting>
@@ -397,6 +410,10 @@ to the context specified in fallthroughContext if no rule matches.
 Default: <emphasis>false</emphasis>.</para>
 <para><userinput>fallthroughContext</userinput> specifies the next context
 if no rule matches.</para>
+<para><userinput>noIndentationBasedFolding</userinput> disables indentation-based folding
+in the context. This attribute has no effect unless indentation-based folding is activated,
+which is done in the element <emphasis>folding</emphasis> of the group <emphasis>general</emphasis>.
+Default: <emphasis>false</emphasis>.</para>
 </listitem>
 </varlistentry>
 
@@ -490,6 +507,35 @@ do not need to set it, as it defaults to <emphasis>false</emphasis>.</para>
 </varlistentry>
 
 
+<varlistentry>
+<term>The element <userinput>emptyLine</userinput> in the group <userinput>emptyLines</userinput>
+defines which lines should be treated as empty lines. This allows modifying the behavior of the
+<emphasis>lineEmptyContext</emphasis> attribute in the elements <userinput>context</userinput>.
+Available attributes are:</term>
+
+<listitem>
+<para><userinput>regexpr</userinput> defines a regular expression; lines that match it are treated as empty lines.
+By default, only lines without any characters are empty, so this attribute adds further kinds of empty lines,
+for example, if you want lines with spaces to also be considered empty lines.
+However, in most syntax definitions you do not need to set this attribute.</para>
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
+<term>The element <userinput>encoding</userinput> in the group <userinput>spellchecking</userinput>
+defines a character encoding for spell checking. Available attributes:</term>
+
+<listitem>
+<para><userinput>char</userinput> is an encoded character.</para>
+<para><userinput>string</userinput> is a sequence of characters that will be encoded as
+the character <emphasis>char</emphasis> during spell checking.
+For example, in the language LaTeX, the string <userinput>\"{A}</userinput> represents
+the character <userinput>Ä</userinput>.</para>
+</listitem>
+</varlistentry>
+
+
 </variablelist>
 
 
@@ -654,7 +700,7 @@ current context in its <userinput>string</userinput> or
 <userinput>char</userinput> attributes. In a <userinput>string</userinput>,
 the placeholder <replaceable>%N</replaceable> (where N is a number) will be
 replaced with the corresponding capture <replaceable>N</replaceable>
-from the calling regular expression. In a
+from the calling regular expression, starting from 1. In a
 <userinput>char</userinput> the placeholder must be a number
 <replaceable>N</replaceable> and it will be replaced with the first character of
 the corresponding capture <replaceable>N</replaceable> from the calling regular
@@ -666,6 +712,93 @@ expression. Whenever a rule allows this attribute it will contain a
 </listitem>
 </itemizedlist>
 
+<para>How it works:</para>
+
+<para>In the <link linkend="regular-expressions">regular expressions</link> of the
+<userinput>RegExpr</userinput> rules, all text matched within plain parentheses
+<userinput>(PATTERN)</userinput> is captured and remembered.
+These captures can be used in the context that the rule switches to, in the rules with the
+attribute <userinput>dynamic</userinput> <emphasis>true</emphasis>, by
+<replaceable>%N</replaceable> (in <emphasis>String</emphasis>) or
+<replaceable>N</replaceable> (in <emphasis>char</emphasis>).</para>
+
+<para>It is important to mention that text captured in a <userinput>RegExpr</userinput> rule is
+only stored for the context that the rule switches to, specified in its <userinput>context</userinput> attribute.</para>
+
+<tip>
+<itemizedlist>
+
+<listitem>
+<para>If the captures will not be used, either by dynamic rules or in the same regular expression,
+<userinput>non-capturing groups</userinput> should be used: <userinput>(?:PATTERN)</userinput></para>
+<para>The <emphasis>lookahead</emphasis> or <emphasis>lookbehind</emphasis> groups such as
+<userinput>(?=PATTERN)</userinput> or <userinput>(?!PATTERN)</userinput> are not captured.
+See <link linkend="regular-expressions">Regular Expressions</link> for more information.</para>
+</listitem>
+
+<listitem>
+<para>The capture groups can also be used within the same regular expression,
+using <replaceable>\N</replaceable> instead of <replaceable>%N</replaceable>.
+For more information, see <link linkend="regex-capturing">Capturing matching text (back references)</link>
+in <link linkend="regular-expressions">Regular Expressions</link>.</para>
+</listitem>
+
+</itemizedlist>
+</tip>
+
+<para>Example 1:</para>
+<para>In this simple example, the text matched by the regular expression
+<userinput>=*</userinput> is captured and inserted into <replaceable>%1</replaceable>
+in the dynamic rule. This allows the comment to end with the same number of
+<userinput>=</userinput> characters as at the beginning. This matches text like:
+<userinput>[[ comment ]]</userinput>, <userinput>[=[ comment ]=]</userinput> or
+<userinput>[=====[ comment ]=====]</userinput>.</para>
+<para>In addition, the captures are available only in the switched context
+<emphasis>Multi-line Comment</emphasis>.</para>
+
+<programlisting>
+<context name="Normal" attribute="Normal Text" lineEndContext="#stay">
+  <RegExpr context="Multi-line Comment" attribute="Comment" String="\[(=*)\[" beginRegion="RegionComment"/>
+</context>
+<context name="Multi-line Comment" attribute="Comment" lineEndContext="#stay">
+  <StringDetect context="#pop" attribute="Comment" String="]%1]" dynamic="true" endRegion="RegionComment"/>
+</context>
+</programlisting>
+
+<para>Example 2:</para>
+<para>In the dynamic rule, <replaceable>%1</replaceable> corresponds to the capture that matches
+<userinput>#+</userinput>, and <replaceable>%2</replaceable> to <userinput>&quot;+</userinput>.
+This matches text like: <userinput>#label""""inside the context""""#</userinput>.</para>
+<para>These captures will not be available in other contexts, such as
+<emphasis>OtherContext</emphasis>, <emphasis>FindEscapes</emphasis> or
+<emphasis>SomeContext</emphasis>.</para>
+
+<programlisting>
+<context name="SomeContext" attribute="Normal Text" lineEndContext="#stay">
+  <RegExpr context="#pop!NamedString" attribute="String" String="(#+)(?:[\w-]|[^[:ascii:]])+(&quot;+)"/>
+</context>
+<context name="NamedString" attribute="String" lineEndContext="#stay">
+  <RegExpr context="#pop!OtherContext" attribute="String" String="%2(?:%1)?" dynamic="true"/>
+  <DetectChar context="FindEscapes" attribute="Escape" char="\"/>
+</context>
+</programlisting>
+
+<para>Example 3:</para>
+<para>This matches text like:
+<userinput>Class::function<T>( ... )</userinput>.</para>
+
+<programlisting>
+<context name="Normal" attribute="Normal Text" lineEndContext="#stay">
+  <RegExpr context="FunctionName" String="\b([a-zA-Z_][\w-]*)(::)([a-zA-Z_][\w-]*)(?:&lt;[\w\-\s]*&gt;)?(\()" lookAhead="true"/>
+</context>
+<context name="FunctionName" attribute="Normal Text" lineEndContext="#pop">
+  <StringDetect context="#stay" attribute="Class" String="%1" dynamic="true"/>
+  <StringDetect context="#stay" attribute="Operator" String="%2" dynamic="true"/>
+  <StringDetect context="#stay" attribute="Function" String="%3" dynamic="true"/>
+  <DetectChar context="#pop" attribute="Normal Text" char="4" dynamic="true"/>
+</context>
+</programlisting>
+
 <sect3 id="highlighting-rules-in-detail">
 <title>The Rules in Detail</title>
 
@@ -955,6 +1088,16 @@ The attribute <userinput>column</userinput> counts characters, so a tabulator is
 </para>
 </listitem>
 <listitem>
+<para>In <userinput>RegExpr</userinput> rules, use the attribute <userinput>column="0"</userinput> if the pattern
+<userinput>^PATTERN</userinput> will be used to match text at the beginning of a line.
+This improves performance, as it will avoid looking for matches in the rest of the columns.</para>
+</listitem>
+<listitem>
+<para>In regular expressions, use non-capturing groups <userinput>(?:PATTERN)</userinput> instead of
+capturing groups <userinput>(PATTERN)</userinput>, if the captures will not be used in the same regular
+expression or in dynamic rules. This avoids storing captures unnecessarily.</para>
+</listitem>
+<listitem>
 <para>You can switch contexts without processing characters. Assume that you
 want to switch context when you meet the string <userinput>*/</userinput>, but
 need to process that string in the next context. The below rule will match, and
diff --git a/doc/katepart/regular-expressions.docbook b/doc/katepart/regular-expressions.docbook
index 9c97fac7f..cdf00c5eb 100644
--- a/doc/katepart/regular-expressions.docbook
+++ b/doc/katepart/regular-expressions.docbook
@@ -240,15 +240,14 @@ corresponding to the octal number ooo (between 0 and
 
 <varlistentry>
 <term><userinput>\w</userinput></term>
-<listitem><para>Matches any <quote>word character</quote> - in this case any letter or digit. Note that
-underscore (<literal>_</literal>) is not matched, as is the case with perl regular expressions.
-Equal to <literal>[a-zA-Z0-9]</literal></para></listitem>
+<listitem><para>Matches any <quote>word character</quote> - in this case any letter, digit or underscore.
+Equal to <literal>[a-zA-Z0-9_]</literal></para></listitem>
 </varlistentry>
 
 <varlistentry>
 <term><userinput>\W</userinput></term>
-<listitem><para>Matches any non-word character - anything but letters or numbers.
-Equal to <literal>[^a-zA-Z0-9]</literal> or <literal>[^\w]</literal></para></listitem>
+<listitem><para>Matches any non-word character - anything but letters, numbers or underscore.
+Equal to <literal>[^a-zA-Z0-9_]</literal> or <literal>[^\w]</literal></para></listitem>
 </varlistentry>
 
 
@@ -256,13 +255,17 @@ Equal to <literal>[^a-zA-Z0-9]</literal> or <literal>[^\w]</literal></para></lis
 
 </para>
 
+<para>The <emphasis>POSIX notation of classes</emphasis>,
+<userinput>[:<class name>:]</userinput> is also supported.
+For example, <userinput>[:digit:]</userinput> is equivalent to <userinput>\d</userinput>,
+and <userinput>[:space:]</userinput> to <userinput>\s</userinput>.
+See the full list of POSIX character classes
+<ulink url="https://www.regular-expressions.info/posixbrackets.html">here</ulink>.</para>
+
 <para>The abbreviated classes can be put inside a custom class, for
 example to match a word character, a blank or a dot, you could write
 <userinput>[\w \.]</userinput></para>
 
-<note> <para>The POSIX notation of classes, <userinput>[:<class
-name>:]</userinput> is currently not supported.</para> </note>
-
 <sect3>
 <title>Characters with special meanings inside character classes</title>
 
@@ -331,12 +334,14 @@ put the alternatives inside a subpattern:
 
 </sect3>
 
-<sect3>
+<sect3 id="regex-capturing">
 
 <title>Capturing matching text (back references)</title>
 
-<para>If you want to use a back reference, use a sub pattern to have
-the desired part of the pattern remembered.</para>
+<para>If you want to use a back reference, use a sub pattern <userinput>(PATTERN)</userinput>
+to have the desired part of the pattern remembered.
+To prevent the sub pattern from being remembered, use a non-capturing group
+<userinput>(?:PATTERN)</userinput>.</para>
 
 <para>For example, if you want to find two occurrences of the same
 word separated by a comma and possibly some whitespace, you could
@@ -657,6 +662,28 @@ pattern.</para>
 </listitem>
 </varlistentry>
 
+<varlistentry>
+<term><userinput>(PATTERN)</userinput> (Capturing group)</term>
+
+<listitem><para>The sub pattern within the parentheses is captured and remembered,
+so that it can be used in back references. For example, the expression
+<userinput>(&quot;+)[^&quot;]*\1</userinput> matches
+<userinput>""""text""""</userinput> and
+<userinput>"text"</userinput>.</para>
+<para>See the section <link linkend="regex-capturing">Capturing matching text (back references)</link>
+for more information.</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><userinput>(?:PATTERN)</userinput> (Non-capturing group)</term>
+
+<listitem><para>The sub pattern within the parentheses is not captured and
+is not remembered. It is preferable to always use non-capturing groups if
+the captures will not be used.</para>
+</listitem>
+</varlistentry>
+
 </variablelist>
 
 </para>
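
A note on the dynamic-rule mechanism documented in this commit: the sketch below is a rough Python model of Example 1, not Kate's implementation. The helper names expand_placeholders and comment_span are hypothetical, and Python's re module only stands in for the regular expression engine Kate uses. It shows how the text captured by the opening RegExpr rule is substituted for %1 (counting from 1) in the StringDetect rule of the context that is switched to.

```python
import re

# Opening rule from Example 1: RegExpr with String="\[(=*)\["
OPEN_RULE = re.compile(r"\[(=*)\[")
# Closing rule from Example 1: StringDetect with String="]%1]" dynamic="true"
CLOSE_TEMPLATE = "]%1]"


def expand_placeholders(template, captures):
    """Replace each %N in template with the N-th capture, counting from 1.

    Hypothetical helper: placeholders without a matching capture are kept.
    """
    def substitute(match):
        index = int(match.group(1))
        if 1 <= index <= len(captures):
            return captures[index - 1]
        return match.group(0)

    return re.sub(r"%(\d)", substitute, template)


def comment_span(text):
    """Return the first comment in text, or None if it is not closed.

    The closing delimiter is built from the capture of the opening rule,
    so it contains the same number of '=' characters as the opening one.
    """
    opening = OPEN_RULE.search(text)
    if opening is None:
        return None
    closing = expand_placeholders(CLOSE_TEMPLATE, opening.groups())
    end = text.find(closing, opening.end())
    if end == -1:
        return None
    return text[opening.start():end + len(closing)]


if __name__ == "__main__":
    print(comment_span("x [==[ a ]=] b ]==] y"))
```

In this model the shorter delimiter ]=] inside the comment is skipped, because the closing string is rebuilt from the capture each time the context is entered. That is the behaviour the dynamic attribute is documented to provide.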