Licensing for models and datasets

Andrius Štikonas stikonas at kde.org
Wed Mar 27 00:30:47 GMT 2024


There is also this document by Debian's Deep Learning Team that is worth 
looking at:

https://salsa.debian.org/deeplearning-team/ml-policy/-/blob/master/ML-Policy.rst

There they distinguish between a model and its artifacts: if a model is
non-free or trained on non-free data, they consider its output artifacts
to be proprietary.

Andrius 

On Tuesday, 26 March 2024 16:33:56 GMT Volker Krause wrote:
> On Monday, 25 March 2024 15:17:48 CET Halla Rempt wrote:
> > We're looking into adding an experimental AI-based feature to Krita:
> > automated inking. That gives us three components, and we're not sure about
> > the license we should use for two of them: the model and the dataset. Would
> > CC be best here?
> 
> Looking at https://community.kde.org/Policies/Licensing_Policy the closest
> thing would either be "media" files (generalized to "data files") and thus
> CC-BY-SA (and presumably CC-BY/CC0) or "source code" (xGPL, BSD/MIT).
> 
> I think this is a bit more tricky though, depending on whether we assume a
> model is a derivative work of the input data, and whether the output
> generated from a model is a derivative work of the model (and thus
> potentially a derivative work of the input data). The industry assumption so
> far seems to be that at least one of those isn't a derivative work (AFAIK
> that has yet to be legally tested though), but I'm not sure that
> interpretation is in the best interest of FOSS developers or artists...
> 
> One scenario that would work regardless, I think, is using a license with
> practically no constraints (CC0, MIT, etc.), but that also offers no
> protection for the training or model data (which might or might not be what
> you want).
> 
> Any other scenario I can think of involving more protective licenses runs
> into interesting issues:
> - If the output is a derivative work, Krita users would be bound by e.g. the
> attribution or share-alike requirements of the license (which I guess is not
> what you want).
> - A Bison/Flex-style "code generator exception", stating that the model
> output is free of any license requirements regardless of the model license
> itself, requires that either the model isn't a derivative work of the input
> or that the input data is licensed in a way compatible with that.
> - In the latter case we are back to essentially unprotected CC0-like input,
> or a protective license with a special exception, which then gets awfully
> close to developing new licenses.
> 
> So I guess this boils down to how much protection you have in mind for the
> input and model data?
> 
> Interesting topic, sorry if my ramblings on this are of limited help :)
> 
> Regards,
> Volker





