Introducing SPDX License Headers

Andreas Cord-Landwehr cordlandwehr at kde.org
Sun Jan 26 17:59:22 GMT 2020


Hi, as discussed during last Akademy's KF6 BoF and the KF6 sprint, as well as 
on the KDE community list, SPDX license information can help us a lot in 
maintaining good (and automatically checkable) license quality throughout our 
libraries. As of today, SPDX markers are also covered by the KDE licensing 
policy, and I want to start porting the existing license headers to the new 
syntax.
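
To illustrate what "automatically checkable" can mean in practice, here is a 
minimal sketch in Python. It is not part of any KDE tooling; the function name 
spdx_expression is made up for this example, and it only assumes that a file 
carries a line of the form "SPDX-License-Identifier: <expression>".

```python
import re

# Assumption: the marker sits on a single line, optionally inside a comment.
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*(.+)")


def spdx_expression(source: str):
    """Return the SPDX expression found in `source`, or None if there is none."""
    match = SPDX_TAG.search(source)
    if match is None:
        return None
    # Drop a trailing comment terminator such as "*/" and surrounding blanks.
    return match.group(1).strip().rstrip("*/").strip()
```

A checker could run such a function over every source file and report those 
for which it returns None.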

For anybody without SPDX marker experience, there are now the following good 
starting pages:
- KDE Licensing Policy: https://community.kde.org/Policies/Licensing_Policy
- (a very new) Licensing HowTo: https://community.kde.org/Guidelines_and_HOWTOs/Licensing
- REUSE.software: http://reuse.software

The question now is how to introduce the SPDX markers in a reasonable way. 
For some time, I have been working on a conversion tool, which works quite 
well by now (it converts about 98% of the headers and leaves the remaining 
ones for manual work) [1]. The approach I followed was:

1. For every license header in KF5 (about 130 variants of license statements), 
I created a plain text file that contains that license statement.
2. All license header files with the same meaning (e.g. all LGPL-2.0-or-later 
headers) are combined into a regular expression that also matches all possible 
whitespace, line breaks and "*" characters.
3. For every individual license header, an original source code file is 
included as a reference, which is used in a unit test to verify that the 
license is correctly detected.
4. My tool has a "--convert" option that replaces all matched regular 
expressions (only if they could be detected unambiguously!) with the 
corresponding SPDX expression and adds the respective license files to the root 
folder of the project.
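
The steps above can be sketched roughly as follows. This is a Python 
illustration of the idea only, not the code of the tool in [1]; the names 
HEADER_TEXT, header_to_pattern, PATTERNS and convert are invented for this 
example, and the header text is a shortened stand-in for one 
LGPL-2.0-or-later variant. Adding the license files to the project root is 
left out.

```python
import re

# Assumption: shortened stand-in for one stored plain-text header (step 1).
HEADER_TEXT = """This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version."""


def header_to_pattern(text: str) -> str:
    """Build a regex that allows any run of whitespace, line breaks and '*'
    between the words of the header (step 2)."""
    separator = r"[\s*]+"
    return separator.join(re.escape(word) for word in text.split())


# Headers with the same meaning are joined by alternation under one SPDX id.
PATTERNS = {
    "LGPL-2.0-or-later": re.compile(
        "|".join(f"(?:{header_to_pattern(t)})" for t in [HEADER_TEXT])
    ),
}


def convert(source: str) -> str:
    """Replace a detected header by its SPDX marker (step 4).
    Files with no match or more than one match are returned unchanged."""
    hits = []
    for spdx_id, regex in PATTERNS.items():
        match = regex.search(source)
        if match is not None:
            hits.append((spdx_id, match))
    if len(hits) != 1:
        return source  # undetected or ambiguous: leave for manual work
    spdx_id, match = hits[0]
    marker = f"SPDX-License-Identifier: {spdx_id}"
    return source[:match.start()] + marker + source[match.end():]
```

A unit test in the spirit of step 3 would feed a reference source file 
through convert() and check that the expected SPDX identifier appears in the 
result.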

In the KF5 repositories there are slightly more than 9000 files with copyright 
information that I want to convert. I plan to provide a patch for every single 
(non porting-aid) KF5 repository, starting with the Tier 1 repositories. Each 
patch will contain the changes created by my licensedigger tool and possibly a 
few style changes (meaning whitespace removal or removal of "*" characters).
Any license that I had to state by hand will be in a separate commit and 
explicitly mentioned in the pull request.

Does this approach sound reasonable? If anybody wants to review my conversion 
tool and the license-header-to-SPDX translations, I would be happy to get 
feedback!

As a first trial balloon, I created a patch for KIdleTime [2], mostly because 
it is one of the repositories that almost never sees a change, which allows 
keeping a pull request open for a longer time if discussions are needed.

Cheers,
Andreas

[1] https://cgit.kde.org/scratch/cordlandwehr/licensedigger.git/
[2] https://phabricator.kde.org/D26931




More information about the Kde-frameworks-devel mailing list