Telemetry Policy

Sat Aug 19 10:39:07 BST 2017

On Friday, 18 August 2017 11:23:49 CEST Jaroslaw Staniek wrote:
> On 17 August 2017 at 16:19, Volker Krause <vkrause at kde.org> wrote:
> > On Wednesday, 16 August 2017 20:35:59 CEST Jaroslaw Staniek wrote:
> > > On 16 August 2017 at 18:56, Volker Krause <vkrause at kde.org> wrote:
> > > > On Wednesday, 16 August 2017 15:23:07 CEST Jaroslaw Staniek wrote:
> > > > > On 16 August 2017 at 14:13, Volker Krause <vkrause at kde.org> wrote:
[...]
> > > > - Kexi seems to (optionally?) contain a unique identifier
> > > 
> > > This is mostly related to cases when any kind of cloud storage is used.
> > > These cases involve unique accounts already so users can be identified
> > > very well even without having telemetry functionality.
> > > 
> > > KEXI installations limited to open-core, used away from a cloud, do not
> > > need identifiers.
> > > However I understand that identifiers, independent of network or host ID
> > > (basically randomly generated QUuids) are useful for even basic
> > > telemetry needs. Without them it's easy to abuse the system using any
> > > kind of bots to trick us into believing that e.g. 99% of sessions happen
> > > on KDE 1.0 or that a given Linux distro has 90% of the global market :)
> > 
> > Vandalism is a potential problem indeed (did you actually have issues with
> > that on Kexi btw? if so, what counter-measures did you apply?). However I
> > don't see how a UUID is helping here, the bot could just as well generate
> > UUIDs for each submission?
> 
> UIDs indeed can't help with too clever bots but e.g. semi-evil use cases
> such as executing apps in batch mode can be caught. I've mostly encountered
> logs coming from test machines including my own, so I probably should not
> have used the term 'bots' but (as unrealistic as it sounds) real bots can
> be created.

Ok, so that's more an accident scenario than vandalism/abuse. Wouldn't the 
more targeted counter-measure be to just disable telemetry for the development 
team?

> > > Similarly app projects may need the IDs to answer questions about the
> > > most and least used features. Most used as in "most users found it,
> > > understood it and use it", not "most usage reports have been delivered
> > > for it" (maybe coming from a single user -- maybe even my very own
> > > co-developer). There are many other examples probably already discussed.
> > 
> > Sure this gets easier with unique ids, but it's not impossible without
> > them.
> > After all the goal here isn't to make our lives easier, but to agree on
> > something that is acceptable for our users. And yes, that might imply more
> > work and/or less accurate data.
> 
> My assumption when I started with telemetry was to have an adequate level
> of precision. Assuming no logs are fabricated, interesting questions are,
> for example: how many users actually run supported software and how many
> run outdated software? Not how many executions there are per given period
> of time, because it may be that old software is executed by a few users
> very frequently for some reason, e.g. because 3-year-old software crashes
> on an old OS every minute and a restart is needed :)
> 
> How to know that without unique (anonymous) identification?
> Using extra fields such as OS+Desktop type/version would be indeed a form
> of cheap UID.
> But I would say disclosing OS+Desktop type/version for that discloses more
> than the anonymous random UID represents.
> In Bugzilla and on the mailing lists we ask for all this information
> anyway, and I (at least) do not like supporting anonymous users since I am
> not anonymous.

The implementation in KUserFeedback addresses this with fixed-interval data 
submission. If you then aggregate the received data by the same interval, you 
can see e.g. how ratios of application versions develop over time.

This does have limits of course: you can't distinguish between the same person 
using the application every sampling interval and two people using it every 
other interval, for example. With a sufficiently long sampling interval the 
result should nevertheless be accurate enough, I think.
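
As a rough illustration of that aggregation, here is a minimal sketch in
Python. It is not KUserFeedback's actual code or API: the function name
version_ratios, the 7-day interval and the sample data are all made up for the
example, and it assumes each installation submits at most once per interval.

```python
from collections import Counter, defaultdict


def version_ratios(submissions, interval_days):
    """Group anonymous submissions into fixed-length intervals and return,
    for each interval, the share of submissions per application version."""
    buckets = defaultdict(Counter)
    for day, version in submissions:
        # Integer division maps a day number to its sampling interval.
        buckets[day // interval_days][version] += 1

    ratios = {}
    for bucket, counts in sorted(buckets.items()):
        total = sum(counts.values())
        ratios[bucket] = {v: n / total for v, n in counts.items()}
    return ratios


# Hypothetical submissions: (day received, application version).
# There is deliberately no user or installation identifier.
submissions = [
    (0, "3.0"), (2, "3.0"), (5, "3.1"), (6, "3.0"),
    (8, "3.1"), (9, "3.1"), (12, "3.0"), (13, "3.1"),
]

ratios = version_ratios(submissions, interval_days=7)
for bucket, shares in ratios.items():
    print(bucket, shares)
```

Comparing the per-interval shares shows how the version mix shifts over time.
The limit mentioned above is visible here too: a record carries nothing that
would tell one person submitting in both intervals apart from two people
submitting in one interval each.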

> BTW, it's worth remembering that the UID is not even a hash of any host and
> user info, it's a random number. I do admit that a "hash of host and user
> info" would be even better as it allows recreating the UID after e.g. the
> OS has been reinstalled or a new account created. But I do not use hashing
> for KEXI anyway.
> 
> > > Thus I would say anonymity is covered by KEXI's approach, except that
> > > it offers opt-in tracking of unique users for unique installations. KEXI
> > > currently does not track unique installations at all until the user
> > > agrees to any telemetry
> > > (the KexiUserFeedbackAgent::AnonymousIdentificationArea value). This is
> > > required by the nature of the stats computed (and the abuses mentioned
> > > above are the reason).
> > > 
> > > Is this a big deal? We're close to philosophy area here.
> > 
> > Correct, this is about the philosophy behind our products :) And one very
> > core part of that happens to be privacy.
> > 
> > That basically leaves the question: do we want to additionally allow the
> > opt-in use of unique identifiers?
> 
> I would say yes. For example I see no reason to reject from KDE any
> (inter)network software that has a concept of accounts. Our Phabricator,
> forums and Bugzilla are examples of that. So is our very own Akademy
> registration software, especially if it's "our" code base. All of them
> operate with unique IDs. Even more: some software discloses user-visible
> strings (e.g. user names on the forums).
> I think the key is to require that apps, no matter what type, precisely
> and clearly ask users for their agreement. And do not scare them.

That's mixing two different things though. Communication applications, for 
example, of course need to uniquely identify the communication partner. That 
doesn't mean though that we should use the identification for anything but the 
absolutely necessary, or that we can arbitrarily leak it (we e.g. add transport 
encryption wherever possible against that).

The benefit of unique identification for telemetry is IMHO too small to 
warrant the impact on privacy (and subsequently PR) here.

> > > Before designing the stats engine I guessed: not more than installing an
> > > email app or buying a SIM card and starting to use them; they allow me
> > > to send email or make a call using protocols that disclose quite a bit
> > > about me.
> > 
> > Sure, but that is also where we can differentiate. Just because other
> > applications weren't designed with privacy in mind doesn't mean we should
> > follow their example IMHO.
> 
> Well, "weren't designed with privacy in mind" sounds a bit strong. Each
> application has a desired *level* of privacy, hopefully defined at
> design time. I can imagine a storage for KEXI that uses a public GitHub
> account and repos in exchange for being a free-as-in-beer cloud solution.

Email and GSM weren't designed with privacy in mind, that's what I was 
referring to. That just wasn't a topic at the time. But if a widely used 
communication protocol isn't targeted by current surveillance legislation 
attempts, something is probably wrong with that protocol ;)

This doesn't imply we can't do software that is meant to use public cloud 
services, for example; it's the user's choice to use that after all.

But you can't look at all that as being equal. I'm happy to share much of the 
source code I write publicly and have it attributed to me, but I'm certainly 
not ok with sharing my communication or activity data in the same way.

Regards,
Volker

> Example. Our KDE software runs on hardware that is not assuring privacy
> (emits signals that someone can easily decipher). "The apps weren't
> designed with privacy in mind because they should block non-open CPUs" --
> someone with high-enough expectations would easily say. "But we don't care,
> we still want to develop them and see them used" -- we say, that's not our
> level.
> 
> For most folks, the confidentiality that email and the SIM offer is OK.
> 
> Thanks for the notes and for working on the stuff, Volker.
> 
> > Regards,
> > Volker