kde-i18n-doc Digest, Vol 239, Issue 10

Natalie Clarius natalie_clarius at yahoo.de
Tue Feb 14 03:15:28 GMT 2023


 
> And if the rest of the KRunner team is listening (and if not, please pass this on), please document all the uses in the DocBook, so that we can help people learn all the keywords, and simply present it as a reference source (and add our own quirks). I’d love to advertise it to my local community.
Yes, usage absolutely should be documented, and it is. KRunner plugin documentation is not kept in DocBook files; instead, it is made available through the help runner, as explained in my earlier mails. This is the translation string with the placeholders you were asking about, together with the accompanying explanation. Type "?time" in KRunner to see what it looks like.

Come to think of it, it might be nice to have a tool that automatically converts these into a DocBook document for those who prefer documents to inline help, but that's out of scope for this thread.
    On Sunday, 12 February 2023 at 11:12:36 CET, <kde-i18n-doc-request at kde.org> wrote:
 
 Send kde-i18n-doc mailing list submissions to
    kde-i18n-doc at kde.org

To subscribe or unsubscribe via the World Wide Web, visit
    https://mail.kde.org/mailman/listinfo/kde-i18n-doc
or, via email, send a message with subject or body 'help' to
    kde-i18n-doc-request at kde.org

You can reach the person managing the list at
    kde-i18n-doc-owner at kde.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of kde-i18n-doc digest..."


Today's Topics:

  1. Fw: Translation for time zone conversion runner (Natalie Clarius)
  2. Re: Fw: Translation for time zone conversion runner
      (Thomas Vergnaud)
  3. Re: Translation for time zone conversion runner (Emir SARI)


----------------------------------------------------------------------

Message: 1
Date: Sun, 12 Feb 2023 04:16:43 +0000 (UTC)
From: Natalie Clarius <natalie_clarius at yahoo.de>
To: "kde-i18n-doc at kde.org" <kde-i18n-doc at kde.org>
Subject: Fw: Translation for time zone conversion runner
Message-ID: <1385199326.523184.1676175403187 at mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"

 Hi Shinjo and Emir,

thanks for the feedback.

The following kinds of time zone names are recognized:
- city (e.g. "Berlin")
- international abbreviation (e.g. "CET")
- long name (e.g. "Central European Standard Time")
- short name (e.g. "GMT+1")
- offset name (e.g. "UTC+01:00")

As mentioned in my previous mail, the time zone names are already localized. For Korean, the code tells me that "중부유럽 표준시" would be recognized for CET.

As for the syntax: we cannot simply write code that parses real natural-language queries in ~60 different languages. That is not feasible in a runner plugin, and the runner was never meant to do it, neither in the translated languages nor in English. I can see how the current runner syntax gave that impression, though, so I'll try to explain how I intended it to work.

Runners use a simple syntax that is triggered by keywords combined with user input, following a pattern that is explained in the user help. The user help can be invoked by clicking the "?" in KRunner and selecting a plugin from the list, or by typing "?" followed by the name of the plugin. For example, if you enter "?power" in KRunner, it will explain the query syntax for the power runner; among other things, it says that "screen brightness <percentage value>" can be used to set the screen brightness. The "<" and ">" just indicate that this is a placeholder where the user is supposed to insert a value like "10", rather than literally typing the words "percentage value". This is the common way of writing placeholders in manuals, and I assumed you were familiar with it; sorry if that caused confusion.
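
A minimal sketch of the placeholder convention, written in Python purely for illustration. KRunner itself is C++ and does not necessarily work this way. Literal words in a help pattern must be typed as-is, while each <...> placeholder matches one token of user input. The function name `compile_pattern` and the one-token-per-placeholder rule are assumptions made for this example.

```python
import re


def compile_pattern(pattern: str) -> "re.Pattern[str]":
    """Compile a help-style pattern such as "screen brightness <percentage value>".

    Hypothetical helper for illustration only, not KRunner's real code.
    """
    units = []
    for piece in re.split(r"(<[^>]+>)", pattern):
        if piece.startswith("<") and piece.endswith(">"):
            # "<percentage value>" becomes a named group "percentage_value"
            # that captures one whitespace-free token typed by the user.
            name = re.sub(r"\W+", "_", piece[1:-1].strip())
            units.append(rf"(?P<{name}>\S+)")
        else:
            # Literal words must be typed as-is (case-insensitively).
            units.extend(re.escape(word) for word in piece.split())
    # Consecutive units may be separated by any amount of whitespace.
    return re.compile(r"\s+".join(units), re.IGNORECASE)


brightness = compile_pattern("screen brightness <percentage value>")
match = brightness.fullmatch("screen brightness 10")
print(match.group("percentage_value"))  # prints 10
```

With the Turkish pattern "ekran parlaklığı <percentage value>", the same function matches "ekran parlaklığı 10". Only the literal words change between languages, not the structure.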

So, for example, you can type "screen brightness 10" in the power runner to set the screen brightness, "dolphin close" in the windows runner to close a Dolphin window, or "time berlin" to display the current time in Berlin. None of these are proper English phrases either. It's a simple fixed pattern that works the same regardless of the user's language, and what gets translated are the individual parts of the query; e.g. in Turkish you set the screen brightness by typing "ekran parlaklığı 10". The idea is the same for the time zone conversion.

While we can make fixed, known strings available for translation, so that translators are free to choose the exact word order and inflection for each particular string, we cannot do the same for user input, which can be anything. Nor can we allow the parts of the query to appear in arbitrary order, or automatically apply a different parsing strategy for each language. The runner needs some way of knowing where one part starts and the next one ends. The runner is not smart; it does not, and cannot, understand natural language.

What I can offer is to make the syntax even simpler. In a previous version, I had the input format as "<from-timezone> <time> <to-timezone>", e.g. "Berlin 8:00 UTC" to convert 08:00 Berlin time to UTC. It was suggested to change it to the "8:00 Berlin in UTC" style because, on the one hand, it was similar to what we already have in the unit conversion runner, and on the other hand, the person suggesting it (a native English speaker) considered it more intuitive. But I see now that this causes more problems than it solves, because it suggests a level of language understanding that isn't there, and it won't work for languages other than English. So I am now leaning towards changing the input format back to "<from-timezone> <time> <to-timezone>", which has no bells and whistles to cause wrong expectations or inconsistency between languages.
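
To make the fixed "<from-timezone> <time> <to-timezone>" format concrete, here is a minimal Python sketch. The small `ZONES` table with fixed UTC offsets is a hypothetical stand-in for the localized time zone names the runner actually resolves, and it ignores daylight saving time. The function name `convert` is likewise invented for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical lookup table: fixed offsets, no daylight saving time.
# The real runner resolves localized city names, abbreviations and offsets.
ZONES = {
    "berlin": timezone(timedelta(hours=1)),
    "cet": timezone(timedelta(hours=1)),
    "utc": timezone.utc,
    "china": timezone(timedelta(hours=8)),
    "tokyo": timezone(timedelta(hours=9)),
}


def convert(query: str) -> Optional[str]:
    """Convert a query of the form "<from-timezone> <time> <to-timezone>"."""
    parts = query.split()
    if len(parts) != 3:
        return None  # the fixed pattern has exactly three parts
    source, clock, target = (part.lower() for part in parts)
    if source not in ZONES or target not in ZONES:
        return None
    try:
        parsed = datetime.strptime(clock, "%H:%M")
    except ValueError:
        return None  # not a valid HH:MM time
    # Pin the wall-clock time to an arbitrary date in the source zone,
    # then shift it to the target zone.
    moment = datetime(2023, 2, 12, parsed.hour, parsed.minute, tzinfo=ZONES[source])
    return moment.astimezone(ZONES[target]).strftime("%H:%M")


print(convert("Berlin 8:00 UTC"))  # prints 07:00
```

Splitting on whitespace works here only because every part is a single token. Multi-word names such as "Central European Standard Time" are exactly why the runner needs a fixed, known structure to tell where one part ends and the next begins.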

If, on the other hand, you say that in both Korean and Turkish a syntax like "<time> <from timezone> in <to timezone>" (where the time zone names as well as the word for "in" are localized) is at least somewhat workable, especially given that we have (likewise translated) user help explaining how to use it, then I would say there is not much of a problem. In that case you are good to go by simply translating the "in" keyword and any strings used in the output formatting.

Let me know what you think or if there are any follow-up questions.
Natalie



  ----- Forwarded message -----
From: Emir SARI <emir_sari at icloud.com>
To: KDE i18n-doc <kde-i18n-doc at kde.org>
CC: "natalie_clarius at yahoo.de" <natalie_clarius at yahoo.de>
Sent: Saturday, 11 February 2023 at 00:14:08 CET
Subject: Re: Translation for time zone conversion runner
 Hello,

For some reason I am not receiving the original author's replies, but anyway, I’ve seen the author's message quoted in Shinjo’s reply.

As a disclaimer, I do not think handling natural language through translatable strings is a good idea; instead, it should be handled via language-specific quirks in code for every supported language. Translatable strings simply don't cover all cases, and the fact that English is one of the simplest languages out there doesn't help either.

> <time> <from timezone> <in> <to timezone>
> 
> For instance, a user can type "10:00 CET in China" to convert 10:00 CET to
> China time and get 17:00 as a result.

Why are the variables in <> brackets? Is there a technical reason for this? As a translator, I have gotten used to seeing input placeholders in <> brackets, and now I realise that there may already be related mistakes because of this. And since these are not reorderable, it’s problematic for my language. Hopefully I won’t need to dive into the code every time I see these again.

In Turkish, due to the Subject-Object-Verb word order, nearly all of the standard English structures need to be flipped. So "10:00 CET in China" becomes "Çin’de 10:00 MAS" (literally "China-in 10:00 CET"; yes, Turkish is agglutinative), so the "in" is already an issue. I can think of something like "10:00 MAS için Çin saati" ("China time for 10:00 CET"), but I am really not sure it sounds okay; it feels forced. I would also need separate variants for date and time. It really should be reorderable.

> "18:00 UTC *to* CET”

I’d just go with something like "18:00 UTC, CET", or "18:00 UTC CET" for the lazy. Not that it’s grammatically correct, but omitting the connecting word makes it easier to get away with. As far as I can see, this works very nicely for European languages, but not necessarily for others.

> The order "<time> <from timezone> <in> <to timezone>" is fixed, and was
> chosen in analogy to how the unit conversion runner works (e.g. "2 liters
> in milliliters"). Is there any language where this syntax for time zone
> conversion would not be natural at all?

For unit conversion this works in Turkish, because, luckily, "2 kg in mg" is informally expressed as "2 kg kaç mg" ("2 kg how much mg"). It does not work for time zone conversions, which apparently require a more delicate and formal syntax.

As another example, in Japanese it’s also possible to use the same unit conversion syntax as in Turkish (2キログラムは何ミリグラム, "2 kilograms is how many milligrams"), but not for time conversion (中国では 10:00 CET, "China-in 10:00 CET"). Japanese also uses the same word order as Turkish (though my Japanese is nowhere near proficient).

Even without proper language use, word order and whatnot, the system should be smart enough to make assumptions independently of the language settings and produce some results without being bound to translatable strings. Otherwise you get frustrated people who have the features advertised to them but still can’t get results because of some translator error or overly literal code.

To improve this in general, it would be nice to make everything reorderable, to offer as many English variants as possible, and to let each target translation of these variants accept more than one output, separated by ‘;’, where it makes sense. Also, avoid string concatenation at all costs!!!
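
Emir's suggestion of reorderable strings with ';'-separated alternatives could look roughly like the following Python sketch. It assumes a hypothetical translatable string in which each alternative is a template with named placeholders (<time>, <source>, <target>). The translator may put these placeholders in any order and surround them with localized words, and the parser simply tries each alternative in turn. This is not an existing KRunner format. The Turkish templates are built from the examples in this message and include two of the vowel-harmony forms of the locative suffix ('de and 'da).

```python
import re
from typing import Dict, Optional


def parse_with_templates(query: str, translation: str) -> Optional[Dict[str, str]]:
    """Try each ';'-separated template in order and return the captured parts.

    Hypothetical format: placeholders such as <time> may appear in any order;
    everything else in a template is matched literally (case-insensitively).
    """
    for template in translation.split(";"):
        units = []
        for piece in re.split(r"(<\w+>)", template.strip()):
            if piece.startswith("<") and piece.endswith(">"):
                # Each placeholder captures one whitespace-free token.
                units.append(rf"(?P<{piece[1:-1]}>\S+)")
            else:
                units.append(re.escape(piece))
        match = re.fullmatch("".join(units), query.strip(), re.IGNORECASE)
        if match:
            return match.groupdict()
    return None


# Hypothetical translations of the same query, one per language.
ENGLISH = "<time> <source> in <target>"
TURKISH = (
    "<target>'de <time> <source>;"
    "<target>'da <time> <source>;"
    "<time> <source> için <target> saati"
)

print(parse_with_templates("Çin'de 10:00 MAS", TURKISH))
# {'target': 'Çin', 'time': '10:00', 'source': 'MAS'}
```

The code never depends on word order, so a translator could add further alternatives (for example, a variant with the typographic apostrophe ’) without any code change.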

If something is not clear, let me know. It’s very late over here; I’m not sure all my words made sense.

Best regards,
Emir (𐰽𐰺𐰍)

** E-mail needs to stay simple
** Use plain text e-mail

  

------------------------------

Message: 2
Date: Sun, 12 Feb 2023 10:24:44 +0100
From: Thomas Vergnaud <thomas.vergnaud at mailo.eu>
To: "kde-i18n-doc at kde.org" <kde-i18n-doc at kde.org>
Cc: Natalie Clarius <natalie_clarius at yahoo.de>
Subject: Re: Fw: Translation for time zone conversion runner
Message-ID: <13203567.uLZWGnKmhe at arkana>
Content-Type: text/plain; charset="UTF-8"

Hi Natalie,

You explained that the KDE runner is only able to recognize simple text structures. 
That is, it cannot really recognize text written in natural language; it can 
only recognize keywords in a predefined order. Therefore, you are trying to 
define a very simple «grammar» based on a limited set of keywords to be put in 
a given order.

Basically, you are attempting to find the proper grammar to say «convert the 
time <T> from timezone <A> to timezone <B>».
From the emails that have been exchanged so far, I understand there are 
different – and specific – ways to write this in natural languages. Thus it 
seems difficult to define one universal runner grammar that could conveniently 
fit in all languages.

Here is my idea: would it be possible to use keywords like «@» or «->» in the 
runner grammar? Using such technical keywords would prevent people from 
thinking they can write in natural language.

For example, one may write «10:00 @Paris -> @Tokyo», or something like that.
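
Thomas's symbolic syntax is easy to parse in a language-independent way, since "@" and "->" never need translating. The following minimal Python sketch assumes the exact form «<time> @<source> -> @<target>» from his example; the regular expression and the function name `parse_symbolic` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Hypothetical symbolic syntax: "<time> @<source> -> @<target>".
# "@" and "->" look the same in every language, so nothing here is translated.
SYMBOLIC = re.compile(
    r"\s*(?P<time>\d{1,2}:\d{2})\s*@(?P<source>[^@>]+?)\s*->\s*@(?P<target>[^@]+?)\s*"
)


def parse_symbolic(query: str) -> Optional[Tuple[str, str, str]]:
    """Return (time, source, target), or None if the query does not match."""
    match = SYMBOLIC.fullmatch(query)
    if match is None:
        return None
    return match.group("time"), match.group("source"), match.group("target")


print(parse_symbolic("10:00 @Paris -> @Tokyo"))  # ('10:00', 'Paris', 'Tokyo')
```

Explicit markers also let names contain spaces (e.g. "@New York"), because the parser no longer has to guess where a multi-word name ends.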

Thomas



On Sunday, 12 February 2023, 05:16:43 CET, Natalie Clarius wrote:
[…]
> What I can offer is to make the syntax even simpler. In a previous version,
> I had the input format as "<from-timezone> <time> <to-timezone>", e.g.
> "Berlin 8:00 UTC" to convert 08:00 Berlin time to UTC. It was suggested to
> change it to the "8:00 Berlin in UTC" style because, on the one hand, it
> was similar to what we already have in the unit conversion runner, and on
> the other hand, they (a native English speaker) considered it more
> intuitive. But I see now that this causes more problems than it solves,
> because it is pretending a complexity of understanding that isn't there,
> and won't work for languages that aren't English. So I am now leaning
> towards changing the input format to "<from-timezone> <time>
> <to-timezone>", which doesn't have any bells and whistles to cause wrong
> expectations and inconsistency between languages.
> 
> If, on the other hand, you say that both in Korean and in Turkish, a syntax
> like "<time> <from timezone> in <to timezone>" (where the time zone, the
> time zone names as well as a word for "in" are localized) is at least
> somewhat feasible, at any rate after the fact that we have (likewise
> translated) user help explaining how to use it, then I would say there is
> not much of a problem, and you are good to go by simply translating the
> "in" keyword and any strings that are used in the output formatting.







------------------------------

Message: 3
Date: Sun, 12 Feb 2023 13:11:55 +0300
From: Emir SARI <emir_sari at icloud.com>
To: KDE i18n-doc <kde-i18n-doc at kde.org>
Cc: Natalie Clarius <natalie_clarius at yahoo.de>
Subject: Re: Translation for time zone conversion runner
Message-ID: <8F203402-CC9A-40E5-90F1-53A0D9D90BE3 at icloud.com>
Content-Type: text/plain;    charset=utf-8

Hello Thomas, Natalie;

> Basically, you are attempting to find the proper grammar to say «convert the 
> time <T> from timezone <A> to timezone <B>».
> From the emails that have been exchanged so far, I understand there are 
> different – and specific – ways to write this in natural languages. Thus it 
> seems difficult to define one universal runner grammar that could conveniently 
> fit in all languages.
> 
> Here is my idea: would it be possible to use keywords like «@» or «->» in the 
> runner grammar? Using such technical keywords would prevent people from 
> thinking they can write in natural language.
> 
> For example, one may write «10:00 @Paris -> @Tokyo», or something like that.

While this is a good idea, it isn’t logical to limit this to a single set of keywords, especially when we already have a basic foundation. Details below.

>> 
>> If, on the other hand, you say that both in Korean and in Turkish, a syntax
>> like "<time> <from timezone> in <to timezone>" (where the time zone, the
>> time zone names as well as a word for "in" are localized) is at least
>> somewhat feasible, at any rate after the fact that we have (likewise
>> translated) user help explaining how to use it, then I would say there is
>> not much of a problem, and you are good to go by simply translating the
>> "in" keyword and any strings that are used in the output formatting.

I don’t think I conveyed my point clearly in my previous message. Indo-European prepositional phrases do not work well at all with Turkic languages (and, from my understanding, Japanese has a similar problem, since it is agglutinative as well). If I remember correctly, Chinese also uses the 里 suffix to cover the "in" preposition (another possible need for reordering).

From my point of view, making every keyword flexible and movable, and accepting more than one input for every phrase, solves the issue for many languages.

Another example: in the definition runner, I’ve flipped the "define" keyword to accommodate Turkish. For instance, instead of "define x", I’ve used "x tanımla". It still does not sound natural, though, because we tend to describe the object we want explained, or use genitive case suffixes with or without an apostrophe; otherwise it sounds very, very robotic. Think of "define x word": here we overlook that the word "word" has synonyms, so a user might type "x sözcüğünü tanımla", "x kelimesini tanımla" or "x sözünü tanımla". They might instead prefer "define x item" -> "x ögesini tanımla", "x öğesini tanımla", "x nesnesini tanımla", or "what is x" -> "x ne", "x nedir"… and it goes on. I am not even getting into genitive case suffix vowels changing according to vowel harmony; I’ll simply omit that structure (x’i tanımla) to reduce complexity. We can cover a lot of corner cases like these with multiple target strings. As you can see, a single perfect keyword like "define" in English can correspond to many variants, all of which a native speaker of another language might use.

And if the rest of the KRunner team is listening (and if not, please pass this on), please document all the uses in the DocBook, so that we can help people learn all the keywords, and simply present it as a reference source (and add our own quirks). I’d love to advertise it to my local community.

On the other hand, I don’t think people will complain if we just use @ or ->, or simply a <source> <target> structure WITHOUT ANY English-specific elements; what the hell, that is PERFECTLY OKAY. But if it is feasible and not too much of a burden, improvements like those explained above would improve the general experience in the Plasma environment by a huge margin. This is just my feedback on the general subject. I also agree that parsing natural language with complex NLP algorithms is outside the current scope of KRunner. Gotta be thankful for what we already have :).

PS. I am still not getting Natalie’s e-mails.

Best regards,
Emir (𐰽𐰺𐰍)

** E-mail needs to stay simple
** Use plain text e-mail



------------------------------

Subject: Digest Footer

_______________________________________________
kde-i18n-doc mailing list
kde-i18n-doc at kde.org
https://mail.kde.org/mailman/listinfo/kde-i18n-doc


------------------------------

End of kde-i18n-doc Digest, Vol 239, Issue 10
*********************************************
  

