Licensing for models and datasets

Thu May 9 07:00:00 BST 2024

AFAIK, our training and data augmentation code will, out of necessity, be
GPLv3.

For the images included in the dataset, our artist suggested CC-BY with a
special clause: the images would be CC-BY in normal use, but would require
no credit when used in our AI. (I suggested CC-0, but the artist said that
they wouldn't feel comfortable giving their artworks to a dataset under
CC-0, so that won't work.)
How should we phrase it?
I think the clause should allow for our AI and, for simplicity, other
usages in AI by the Krita team, but not in any other AIs. Or possibly just
for the Smart Inking AI? But how to say it?
Should we get some kind of legal advice or something?

Tiar

On Sat, 30 Mar 2024 at 15:22, Cornelius Schumacher <schumacher at kde.org>
wrote:

> On 26.03.24 17:33, Volker Krause wrote:
> > On Monday, 25 March 2024 15:17:48 CET Halla Rempt wrote:
> >> We're looking into adding an experimental AI-based feature to Krita:
> >> automated inking. That gives us three components, and we're not sure
> >> about the license we should use for two of them: the model and the
> >> dataset. Would CC be best here?
> >
> > Looking at https://community.kde.org/Policies/Licensing_Policy the
> > closest thing would either be "media" files (generalized to "data
> > files") and thus CC-BY-SA (and presumably CC-BY/CC0) or "source code"
> > (xGPL, BSD/MIT).
>
> I don't think we can directly use the current licensing policy for ML
> models and datasets. But I suppose we should discuss extending it to
> cover these use cases as well.
>
> CC-BY or CC-BY-SA are not the best choice for data as their attribution
> requirements can make it impractical to work with data under these
> licenses. There are some good arguments why data should rather not be
> licensed at all
> (https://plus.pli.edu/Details/Details?fq=id:(352066-ATL2)). This would
> suggest using CC0 as the closest practical form of it.
>
> For models, attribution requirements seem to be less of an issue. But as
> Volker described, the copyright situation is quite complicated, and it's
> not yet clear what consequences this will have in the future. From this
> point of view, a permissive license could be a good choice, as it is
> unlikely to create problems in the future. As the MIT license is already
> mentioned in the licensing policy, maybe this is the best choice?
>
> In addition to the licensing itself it could also be good to consider
> how to convey more information about the openness of the system. Even if
> it wouldn't make a difference in terms of copyright for the user of a
> model, it still might be preferable to use models which are trained on
> free and open data. Some kind of labeling and making this transparent to
> end users could be a solution to that.
>
> In the context of the Sustainable Software goal we have a bit of
> discussion around the labeling. There are some ongoing efforts, such as
> OSI's attempt to define what Open AI actually should mean
> (https://opensource.org/deepdive), or Nextcloud's Ethical AI labeling
> system (https://nextcloud.com/blog/nextcloud-ethical-ai-rating/). Maybe
> it would be worth thinking about adopting something like that in KDE as
> well. Who would be interested in discussing this? We have it on the agenda
> for the upcoming Goals sprint at the end of April, but it might be worth
> extending this discussion if there is broader interest.
>
> --
> Cornelius Schumacher <schumacher at kde.org>
>