Licensing for models and datasets
Cornelius Schumacher
schumacher at kde.org
Sat Mar 30 11:52:09 GMT 2024
On 26.03.24 17:33, Volker Krause wrote:
> On Monday, 25 March 2024 15:17:48 CET Halla Rempt wrote:
>> We're looking into adding an experimental AI-based feature to Krita:
>> automated inking. That gives us three components, and we're not sure about
>> the license we should use for two of them: the model and the dataset. Would
>> CC be best here?
>
> Looking at https://community.kde.org/Policies/Licensing_Policy the closest
> thing would either be "media" files (generalized to "data files") and thus
> CC-BY-SA (and presumably CC-BY/CC0) or "source code" (xGPL, BSD/MIT).
I don't think we can directly use the current licensing policy for ML
models and datasets. But I suppose we should discuss extending it to
cover these use cases as well.
CC-BY or CC-BY-SA are not the best choice for data, as their attribution
requirements can make it impractical to work with data under these
licenses. There are some good arguments that data should rather not be
licensed at all
(https://plus.pli.edu/Details/Details?fq=id:(352066-ATL2)). This would
suggest using CC0 as the closest practical form of that.
For models, attribution requirements seem to be less of an issue. But as
Volker described, the copyright situation is quite complicated, and it's
not yet clear what consequences this will have in the future. From this
point of view a permissive license could be a good choice, as it is
unlikely to create problems later on. As MIT is already mentioned in
the licensing policy, maybe it is the best choice?
In addition to the licensing itself, it could also be good to consider
how to convey more information about the openness of the system. Even if
it doesn't make a difference in terms of copyright for the user of a
model, it might still be preferable to use models trained on free and
open data. Some kind of labeling that makes this transparent to end
users could be a solution; see the sketch below.
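To make that a bit more concrete, here is a rough sketch of what such a
machine-readable label shipped next to a model file could look like. This
is purely illustrative: the field names, the filename, and the rating
scheme are all made up for the example, not an existing format.

    import json

    # Purely illustrative transparency label that could be shipped
    # alongside a model file. All field names are hypothetical and
    # not part of any existing standard.
    model_label = {
        "model_license": "MIT",        # license of the model weights
        "dataset_license": "CC0-1.0",  # license of the training data
        "training_data_public": True,  # is the full dataset published?
        "training_code_public": True,  # is the training pipeline published?
        "openness_rating": "green",    # e.g. a Nextcloud-style color rating
    }

    # Hypothetical convention: store the label next to the model file.
    with open("inking-model.label.json", "w") as f:
        json.dump(model_label, f, indent=2)

An application could then read such a file and show the rating to the
user before downloading or enabling the model.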
In the context of the Sustainable Software goal we have had a bit of
discussion around such labeling. There are some ongoing efforts, such as
OSI's attempt to define what open AI should actually mean
(https://opensource.org/deepdive), or Nextcloud's Ethical AI labeling
system (https://nextcloud.com/blog/nextcloud-ethical-ai-rating/). Maybe
it would be worth thinking about adopting something like that in KDE as
well. Who would be interested in discussing this? We have it on the
agenda for the upcoming Goals sprint at the end of April, but it might be
worth extending the discussion if there is broader interest.
--
Cornelius Schumacher <schumacher at kde.org>