MT based workflow

Mincho Kondarev mkondarev at yahoo.de
Thu May 18 15:19:44 BST 2023


I don't fully understand the copyright issue with translation models, but MT output is never 100% suitable or correct. The MT services are free to use under certain conditions, but the APIs for integrating them into apps are paid.
Mincho


On Wednesday, May 17, 2023, 11:24 PM, kde-i18n-doc-request at kde.org wrote:

Send kde-i18n-doc mailing list submissions to
    kde-i18n-doc at kde.org

To subscribe or unsubscribe via the World Wide Web, visit
    https://mail.kde.org/mailman/listinfo/kde-i18n-doc
or, via email, send a message with subject or body 'help' to
    kde-i18n-doc-request at kde.org

You can reach the person managing the list at
    kde-i18n-doc-owner at kde.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of kde-i18n-doc digest..."


Today's Topics:

  1. Re: MT based workflow (Frederik Schwarzer)
  2. AW: kde-i18n-doc Digest, Vol 242, Issue 4 (Mincho Kondarev)
  3. Re: MT based workflow (Luigi Toscano)
  4. Re: MT based workflow (Albert Astals Cid)


----------------------------------------------------------------------

Message: 1
Date: Wed, 17 May 2023 14:12:48 +0000
From: Frederik Schwarzer <schwarzer at kde.org>
To: KDE i18n-doc <kde-i18n-doc at kde.org>
Subject: Re: MT based workflow
Message-ID: <5932310.lOV4Wx5bFT at swift>
Content-Type: text/plain; charset="us-ascii"

On Tuesday, May 16, 2023 9:51:28 PM CEST Oliver Kellogg wrote:

Hi,

> I had been exploring how to integrate https://translate.google.com into my
> PO translation workflow and had found this:
> https://github.com/zcribe/translate-po

Apart from the feasibility, I would like to see the legal situation discussed.

I guess we can safely assume at this point that the language model vendors do not care about copyright when training their models. But do we already know what they think about using the result of their models?

Cheers,
Frederik




------------------------------

Message: 2
Date: Wed, 17 May 2023 20:12:01 +0000 (UTC)
From: Mincho Kondarev <mkondarev at yahoo.de>
To: <kde-i18n-doc at kde.org>
Subject: AW: kde-i18n-doc Digest, Vol 242, Issue 4
Message-ID: <2032669659.5766334.1684354321490 at mail.yahoo.com>
Content-Type: text/plain; charset="utf-8"


Hi, Oliver, 
I use MT (Google and DeepL). I agree that MT saves a lot of typing, but it still needs very careful checking and correction. I've written small bash scripts and a C++ program that create a translated PO file from a template and a plain-text dictionary file, and I then use the machine-generated PO file as a synchronization source in Lokalize. This process is relatively slow and error-prone, and is suitable for large PO files. I am not a developer and the code is not very mature. On the other hand, I'm not aware of a free, open-source PO editor that incorporates automatic MT.
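
A minimal sketch of the template-plus-dictionary step described above, written in Python rather than the bash and C++ mentioned. The tab-separated dictionary format and the names load_dictionary, po_escape and fill_template are assumptions for illustration, not the actual scripts. It only handles single-line msgid entries and leaves plural forms and multi-line strings untouched.

```python
def load_dictionary(text):
    """Parse 'source<TAB>translation' lines (assumed format) into a dict."""
    entries = {}
    for line in text.splitlines():
        if "\t" in line:
            source, translation = line.split("\t", 1)
            entries[source] = translation
    return entries


def po_escape(value):
    """Escape backslashes and double quotes for use inside a PO string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def fill_template(pot_text, dictionary):
    """Fill empty msgstr lines whose single-line msgid is in the dictionary."""
    out = []
    current = None  # text of the most recent single-line msgid, if any
    for line in pot_text.splitlines():
        if line.startswith('msgid "') and line.endswith('"'):
            current = line[len('msgid "'):-1]
        elif line.startswith('"'):
            # Continuation line: the msgid is multi-line, so skip this entry.
            current = None
        elif line == 'msgstr ""' and current and current in dictionary:
            line = 'msgstr "' + po_escape(dictionary[current]) + '"'
            current = None
        out.append(line)
    return "\n".join(out) + "\n"
```

The msgid is compared as it appears between the quotes in the template, so dictionary keys containing escapes would need to be written in escaped form. The resulting file still needs the same careful review as any MT output.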
Cheers 
Mincho




On Wednesday, May 17, 2023, 1:00 PM, kde-i18n-doc-request at kde.org wrote:


Today's Topics:

  1. MT based workflow (Oliver Kellogg)
  2. Re: Translation building instructions (Albert Astals Cid)


----------------------------------------------------------------------

Message: 1
Date: Tue, 16 May 2023 21:51:28 +0200
From: Oliver Kellogg <olivermkellogg at gmail.com>
To: kde-i18n-doc at kde.org
Subject: MT based workflow
Message-ID:
    <CANZbxGkCSGGjxrRLECPhZeKG=94uxiNwUuaqT4y_zXU8Ag-Y0Q at mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Hello,

I had been exploring how to integrate https://translate.google.com into my
PO translation workflow and had found this:
https://github.com/zcribe/translate-po

It does work, albeit with a few idiosyncrasies: it requires specific
postprocessing because the generated format is not directly PO compatible.
For example, the line breaks between msgid/msgstr pairs are lost and need to
be restored. Also, it reformats the msgid strings, so I made a special merge
tool which restores the original msgids.
For details see http://okellogg.de/mt-using-translate_po.html
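
A rough sketch of the msgid-restoring merge idea, in Python. This is not the merge tool from the page linked above; it assumes both files contain the same entries in the same order, and the names read_entries and restore_msgids are hypothetical. Comments, msgctxt and plural forms are ignored and not written back.

```python
def read_entries(po_text):
    """Collect msgid/msgstr pairs as lists of raw quoted-string lines."""
    entries = []
    field = None  # which field continuation lines currently belong to
    for line in po_text.splitlines():
        line = line.strip()
        if line.startswith("msgid "):
            entries.append({"msgid": [line[len("msgid "):]], "msgstr": []})
            field = "msgid"
        elif line.startswith("msgstr ") and entries:
            entries[-1]["msgstr"].append(line[len("msgstr "):])
            field = "msgstr"
        elif line.startswith('"') and field:
            entries[-1][field].append(line)
        else:
            field = None  # blank line, comment, or unsupported keyword
    return entries


def restore_msgids(original_text, translated_text):
    """Pair original msgids with translated msgstrs by position."""
    original = read_entries(original_text)
    translated = read_entries(translated_text)
    if len(original) != len(translated):
        raise ValueError("entry counts differ; cannot pair by position")
    out = []
    for orig, trans in zip(original, translated):
        out.append("msgid " + "\n".join(orig["msgid"]))
        out.append("msgstr " + "\n".join(trans["msgstr"] or ['""']))
        out.append("")  # blank line between entries
    return "\n".join(out)
```

Pairing by position is the simplest choice here; it fails loudly when the entry counts differ rather than guessing at a match.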

Of course, the msgstr content does require proofreading/corrections.
Nevertheless, it is a huge saving compared with translating from scratch.

Is anybody using MT (machine translation) in their translation workflow?
If so, what tools do you use?

- Oliver

------------------------------

Message: 2
Date: Wed, 17 May 2023 00:14:27 +0200
From: Albert Astals Cid <aacid at kde.org>
To: "kde-i18n-doc at kde.org" <kde-i18n-doc at kde.org>
Subject: Re: Translation building instructions
Message-ID: <3481186.exTU7c25Sc at xps15>
Content-Type: text/plain; charset="us-ascii"

On Saturday, 13 May 2023, at 23:56:47 (CEST), Enol P. wrote:
> Hi everyone,
>  
> A long time ago I made a script to build the ast locale translations, but
> I've lost it and forgotten all the commands to execute. I couldn't find any
> information about this online.
> I would appreciate it if someone could send me the necessary instructions
> to build translations.

The "canonical" way to do this is

* Be in the dir that contains your language folder, for this example ca
* ~/path/to/l10n-scripty/autogen.sh ca
* cd ca
* cmake .
* make 
* make install

If you want to overwrite your distro translations (be careful), add
-DCMAKE_INSTALL_PREFIX=/usr
to the cmake call; you will need su/sudo for the make install part.

Cheers,
  Albert




------------------------------

Subject: Digest Footer

_______________________________________________
kde-i18n-doc mailing list
kde-i18n-doc at kde.org
https://mail.kde.org/mailman/listinfo/kde-i18n-doc


------------------------------

End of kde-i18n-doc Digest, Vol 242, Issue 4
********************************************



------------------------------

Message: 3
Date: Wed, 17 May 2023 22:46:32 +0200
From: Luigi Toscano <luigi.toscano at tiscali.it>
To: KDE i18n-doc <kde-i18n-doc at kde.org>
Subject: Re: MT based workflow
Message-ID: <dc52f5a6-023b-4e8c-167c-583bc85147f5 at tiscali.it>
Content-Type: text/plain; charset=UTF-8

Frederik Schwarzer wrote:
> On Tuesday, May 16, 2023 9:51:28 PM CEST Oliver Kellogg wrote:
> 
> Hi,
> 
>> I had been exploring how to integrate https://translate.google.com into my
>> PO translation workflow and had found this:
>> https://github.com/zcribe/translate-po
> 
> Apart from the feasibility, I would like to see the legal situation discussed.
> 
> I guess we can safely assume at this point that the language model vendors do not care about copyright when training their models. But do we already know what they think about using the result of their models?
> 

Exactly, this is THE question. Do those services allow the user to freely
relicense their output, and to use it for any purpose?
If not, we need to remove all the translations where those tools were used.

-- 
Luigi


------------------------------

Message: 4
Date: Wed, 17 May 2023 23:23:31 +0200
From: Albert Astals Cid <aacid at kde.org>
To: KDE i18n-doc <kde-i18n-doc at kde.org>
Subject: Re: MT based workflow
Message-ID: <31781726.zt2U7KV3os at xps15>
Content-Type: text/plain; charset="us-ascii"

On Wednesday, 17 May 2023, at 16:12:48 (CEST), Frederik Schwarzer wrote:
> On Tuesday, May 16, 2023 9:51:28 PM CEST Oliver Kellogg wrote:
> 
> Hi,
> 
> > I had been exploring how to integrate https://translate.google.com into my
> > PO translation workflow and had found this:
> > https://github.com/zcribe/translate-po
> 
> Apart from the feasibility, I would like to see the legal situation
> discussed.
> 
> I guess we can safely assume at this point that the language model vendors
> do not care about copyright when training their models. But do we already
> know what they think about using the result of their models?

Things like https://libretranslate.com/, which is based on
  https://github.com/argosopentech/argos-translate
should be copyright neutral, given that they have been trained on data that
"has no copyright" (if I'm not mistaken).

Cheers,
  Albert

> 
> Cheers,
> Frederik






------------------------------

End of kde-i18n-doc Digest, Vol 242, Issue 5
********************************************


