Regarding KDE Privacy policy

Wed Feb 26 09:32:33 GMT 2020

On Wed, Feb 26, 2020 at 9:05 AM Volker Krause <vkrause at kde.org> wrote:
>
> Not publishing the raw data right from the start was mainly a safety measure,
> to give us a chance to review the data and fix de-anonymization issues should
> any have slipped through.
>
> There are also technical limitations: the current system has no fine-grained
> access control, not even read vs write access.
>
> For publishing aggregated data, I think that's already "allowed" right now,
> just nobody has built an automated way of doing that yet.
>
> Given how little practical experience we have with this, I'd be cautious with
> publishing unreviewed raw data though. And that isn't just theoretical. We
> have already fixed overly detailed OpenGL information that was both aiding
> fingerprinting and making the data unnecessarily noisy after a first review.
> Additionally, the data set is still too small to avoid fingerprinting
> entirely: there are at least two criteria in there that allow me to find my own
> record in the Plasma data for example. That's not an entirely fair "attack"
> obviously, but it shows this needs a careful review.

I concur with Volker's sentiment as well here - even if we think the
data is aggregated and fully anonymized, there may in fact be enough
data points exposed within the data to identify someone. We currently
guard against this risk by limiting access to the data to those who
have a justified reason to have access.

In addition to this, any change in how we use the data would require
much more than just a revision to the privacy policy (which in itself
would require us to make a public announcement concerning that change).

As we're changing how we use the data (to now include a distribution
component) we would need to invalidate all existing consents given by
users (for which no mechanism exists for us to do so, as we never
expected to need to change the policy) and I think we would have to
discard all the data we have already collected as well.

Unfortunately, as the system includes no mechanism for the server to
communicate which revision of the privacy policy the user agreed to,
we would also have to come up with a way of blocking all old clients
from communicating with the system altogether (as we have no way of
telling if it is an old consent the software is relying on or a new
one) so you'd only start getting data in the system once users had
gone through a full update cycle.

>
> Regards,
> Volker

Cheers,
Ben

>
> On Tuesday, 25 February 2020 13:44:55 CET Veggero Nylo wrote:
> > Hi!
> > Currently, data transmitted by KUserFeedback is available only by opening a
> > sysadmin ticket explaining why you need access in the first place. I can
> > see the reasoning behind this, but I do not think this is a good idea for
> > developers and users. I think that releasing the aggregated data under CC0
> > license would be better, as also proposed by Martin here:
> > https://mail.kde.org/pipermail/kde-community/2017q3/003808.html. I think
> > this would benefit user trust, as right now they have to trust what the
> > KUserFeedback KCM says without really being able to see what data KDE developers
> > are actually able to see (as most users won't be able to look into the
> > code); on the other hand, if the data was publicly released, they would be
> > able to see the data themselves and know exactly what developers are going
> > to see. I also think this would benefit developers, as there might be a
> > significant number of developers who could be interested in looking at the
> > data, maybe just a single value, without being able to fully justify access
> > to all the data (the fact that you have to write a justification becomes a
> > negative factor that makes looking at the data less interesting);
> > furthermore, even if they get access to the data, they would be unable to
> > discuss it in KDE communication channels as those are public, nor on
> > phabricator tasks to support their patches, effectively making the data
> > much less useful. Also, the current policy might result in a privacy
> > problem, e.g.: I once needed data from stats.kde.org regarding website
> > views over time. I was granted access to it, and I now can see every single
> > website viewer, with their country, OS, browser, etc - much more than I
> > actually needed. If the aggregated data was to be released publicly, I
> > would no longer need stats.kde.org access, and I would no longer be
> > able to access private data that I did not actually need. Finally, I do not
> > fully understand why the data needs to be kept private in the first place,
> > since it is supposed to be anonymous and contain no user content.
> > What's your opinion on this?
> > ~ Niccolò Venerandi (aka veggero/niccolove)
>