Telemetry Policy

Jaroslaw Staniek staniek at kde.org
Sun Aug 20 21:29:28 BST 2017


On 19 August 2017 at 11:39, Volker Krause <vkrause at kde.org> wrote:

> On Friday, 18 August 2017 11:23:49 CEST Jaroslaw Staniek wrote:
> > On 17 August 2017 at 16:19, Volker Krause <vkrause at kde.org> wrote:
> > > On Wednesday, 16 August 2017 20:35:59 CEST Jaroslaw Staniek wrote:
> > > > On 16 August 2017 at 18:56, Volker Krause <vkrause at kde.org> wrote:
> > > > > On Wednesday, 16 August 2017 15:23:07 CEST Jaroslaw Staniek wrote:
> > > > > > On 16 August 2017 at 14:13, Volker Krause <vkrause at kde.org> wrote:
> [...]
> > > > > - Kexi seems to (optionally?) contain a unique identifier
> > > >
> > > > This is mostly related to cases when any kind of cloud storage is used.
> > > > These cases involve unique accounts already so users can be identified
> > > > very well even without having telemetry functionality.
> > > >
> > > > KEXI installations limited to open-core, used away from a cloud, do not
> > > > need identifiers.
> > > > However I understand that identifiers, independent of network or host ID
> > > > (basically randomly generated QUuids) are useful for even basic
> > > > telemetry needs. Without them it's easy to abuse the system using any
> > > > kind of bots to trick us that e.g. 99% of sessions happen on KDE 1.0 or
> > > > that a given Linux distro has 90% of the global market :)
> > >
> > > Vandalism is a potential problem indeed (did you actually have issues with
> > > that on Kexi btw? if so, what counter-measures did you apply?). However I
> > > don't see how a UUID is helping here, the bot could just as well generate
> > > UUIDs for each submission?
> >
> > UIDs indeed can't help with too clever bots, but e.g. semi-evil use cases
> > such as executing apps in batch mode can be caught. I've mostly encountered
> > logs coming from test machines, including my own, so I probably should not
> > have used the term 'bots' but (as unrealistic as it sounds) real bots can
> > be created.
>
> Ok, so that's more an accident scenario than vandalism/abuse. Wouldn't the
> more targeted counter-measure be to just disable telemetry for the
> development team?
>

In KEXI, in an anti-corporate fashion, we don't distinguish the development
team from everyone else. All users are in the team by definition
after agreeing to support telemetry. That's one of the motivators.


>
> > > > Similarly app projects may need the IDs to answer questions about most
> > > > and least used features. Most used as in "most users found it,
> > > > understood it and use it", not "most usage reports have been delivered
> > > > for it" (maybe coming from a single user -- maybe even my very own
> > > > co-developer). There are many other examples probably already discussed.
> > >
> > > Sure this gets easier with unique ids, but it's not impossible without
> > > them.
> > > After all the goal here isn't to make our lives easier, but to agree on
> > > something that is acceptable for our users. And yes, that might imply more
> > > work and/or less accurate data.
> >
> > My assumption when I started with telemetry was having an adequate level
> > of precision. Assuming no logs are fabricated, interesting questions
> > are for example: how many users actually run supported software and how
> > many run outdated versions? Not how many executions per given period of
> > time, because it may be that old software is executed by a few users very
> > frequently for some reason, e.g. because 3-year-old software crashes on an
> > old OS every minute and a restart was needed :)
> >
> > How to know that without unique (anonymous) identification?
> > Using extra fields such as OS+Desktop type/version would be indeed a form
> > of cheap UID.
> > But I would say disclosing OS+Desktop type/version for that discloses more
> > than the anonymous random UID represents.
> > In bugzilla and mailing list we're asking for all this information too
> > anyway and (at least I) do not like supporting anonymous users since I am
> > not anonymous.
>
> The implementation in KUserFeedback addresses this by fixed interval data
> submission. If you then aggregate the received data by the same interval, you
> can see e.g. how ratios of application versions develop over time.
>
> This does have limits of course, you can't distinguish between the same person
> using the application every sampling interval, or two people using it every
> other interval for example. With a sufficiently long sampling interval the
> result should nevertheless be sufficiently accurate I think.
>

Volker, thanks for sharing this. I don't see how this works as an approximation.
Do you probe at given time intervals and/or measure time spent with the
application? How do you handle time zones (e.g. zero usage of version X
that is used only in the USA for some reason)?

KEXI sends the feedback data on startup only. I have no idea if this is
compatible with any other approach, but it helps to ignore differences between
usage patterns, e.g. these two basic ones, typical of KEXI and many other apps:

- user starts the app and keeps it open for half of the day
- user frequently starts the app multiple times (for any reason) and has
multiple instances open
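
To make the two patterns concrete, here is a minimal sketch in Python (not
KEXI code). It assumes each startup report carries a random installation ID
and the app version; new_install_id, count_by_version and the version strings
are hypothetical names chosen for illustration. It suggests that counting
distinct IDs per version is unaffected by how often one user restarts the
app, while counting reports is not.

```python
import uuid
from collections import defaultdict


def new_install_id():
    """Random ID, not derived from host or user data (stands in for a random QUuid)."""
    return str(uuid.uuid4())


def count_by_version(reports):
    """Return (reports per version, distinct IDs per version).

    `reports` is an iterable of (install_id, version) pairs, one per app startup.
    """
    sessions = defaultdict(int)
    installs = defaultdict(set)
    for install_id, version in reports:
        sessions[version] += 1
        installs[version].add(install_id)
    return dict(sessions), {v: len(ids) for v, ids in installs.items()}


alice, bob = new_install_id(), new_install_id()
# Alice starts the app once and keeps it open; Bob restarts an old version 20 times.
reports = [(alice, "3.0")] + [(bob, "2.9")] * 20
sessions, installs = count_by_version(reports)
```

Here sessions makes the old version look twenty times more popular, while
installs shows one installation for each version.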

If I remember correctly we're not measuring how long the app is used; this
can be perceived as quite private information, by the way. Interesting data,
but so far not collected.

Moreover, based on my specific experience, giving up the IDs blurs any data
more complex than the app version: Alice may primarily use module M of the app
and Bob mostly module N. Without IDs we have a set of
mixed probes that include usage of both modules in no particular order
(maybe grouped per locale or timezone or some other factor, but this is not
worth guessing IMHO). We don't even know if there are module-based preferences
among the users.
I know you're well aware of all that given how long you have worked on
the topic. I am not pushing for obliging all app projects to offer IDs
(and especially not with opt-out), but disallowing them in some manifest would
bring negative results and alienate someone (also stays away from *GPL as
stated above). So realism is needed here.
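
The Alice and Bob case can be sketched as follows. This is only an
illustration in Python under assumed data, not how KEXI stores or processes
its reports: the report tuples, the IDs "id-a"/"id-b" and both function names
are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical startup reports: (anonymous random ID, module used most in that session).
reports = [
    ("id-a", "M"), ("id-a", "M"), ("id-a", "M"), ("id-a", "N"),
    ("id-b", "N"), ("id-b", "N"),
]


def module_totals(reports):
    """Without IDs: only the overall mix of modules is visible."""
    return Counter(module for _, module in reports)


def preferred_module_per_id(reports):
    """With IDs: the most-used module for each anonymous installation."""
    per_id = defaultdict(Counter)
    for uid, module in reports:
        per_id[uid][module] += 1
    return {uid: counts.most_common(1)[0][0] for uid, counts in per_id.items()}
```

With this sample, module_totals sees M and N used three times each and cannot
tell whether every user mixes both modules, whereas preferred_module_per_id
shows one installation preferring M and the other N.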

I've not heard privacy concerns from KEXI's user base but have heard concerns
about us not knowing the usage patterns well enough. YMMV.

The key for me is to know users' expectations, so here I would learn what
their perception of privacy is too. No generalization. Otherwise there are
comic situations, such as when I encounter a post from someone who
generalizes and calls for a very strict privacy policy in general as
KDE's differentiator, BUT the post is signed "Sent from
iPhone". Or freedom warriors that happen to use Facebook. Evangelists would
do better to start with themselves and offer consulting to projects they know
from the inside.


> > BTW, it's worth reminding that the UID is not even a hash of any host and
> > user info, it's a random number. I do admit that a "hash of host and user
> > info" would be even better as it allows recreating the UID after e.g. the OS
> > has been reinstalled or a new account created. But I do not use hashing for
> > KEXI anyway.
> >
> > > > Thus I see Anonymity as covered by KEXI's approach except that
> > > > it offers opt-in tracking of unique user for unique installations. KEXI
> > > > currently does not track unique installations at all until the user
> > > > agrees to any telemetry (the
> > > > KexiUserFeedbackAgent::AnonymousIdentificationArea value). This is
> > > > required by the nature of the stats computed (and the abuses mentioned
> > > > above are the reason).
> > > >
> > > > Is this a big deal? We're close to philosophy area here.
> > >
> > > Correct, this is about the philosophy behind our products :) And one very
> > > core part of that happens to be privacy.
> > >
> > > That basically leaves the question: do we want to additionally allow the
> > > opt-in use of unique identifiers?
> >
> > I would say yes. For example I see no reasons to reject any (inter)network
> > software having a concept of accounts from KDE. Our Phabricator and forums
> > and bugzilla are examples of that. Well, our very own Akademy registration
> > software especially if it's "our" code base. All of them operate with
> > unique IDs. Even more: some software discloses some user-visible strings
> > (e.g. user names on the forums).
> > I think the key is to require that the apps, no matter what type, precisely
> > and clearly ask users for the agreement. And do not scare them.
>
> That's mixing two different things though. Communication applications for
> example of course need to uniquely identify the communication partner. That
> doesn't mean though that we should use the identification for anything but the
> absolute necessary, or that we can arbitrarily leak it (we e.g. add transport
> encryption wherever possible against that).
>

> The benefit of unique identification for telemetry is IMHO too small to
> warrant the impact on privacy (and subsequently PR) here.
>

Strictly speaking it's possible to separate the two uses of the ID and
present them to users as separate. Then hope Bob and Alice understand and
appreciate the complexity. But I can only imagine how small the group of users
is that will agree to register with some KDE service like a forum or bug
tracker but won't agree to share the name of the OS *they* use and the app's
version.
My principle here is the generally known "design for the typical case".

I see we are discussing "offer opt-in or prohibit offering it for
KDE software at all". Just because there can be abuses or very strong
opinions from people that may not even be our real users.


> > > > Before designing the stats engine I guessed: not more than installing an
> > > > email app or buying a SIM card and starting to use them; they allow me
> > > > to send email or make a call using protocols that disclose quite a bit
> > > > about me.
> > >
> > > Sure, but that is also where we can differentiate. Just because other
> > > applications weren't designed with privacy in mind doesn't mean we should
> > > follow their example IMHO.
> >
> > Well, "weren't designed with privacy in mind" sounds a bit strong. Each
> > application has a desired *level* of privacy, hopefully defined maybe at
> > design time. I can imagine a storage for KEXI that is using a public GitHub
> > account and repos in exchange for being a free-as-beer cloud solution.
>
> Email and GSM weren't designed with privacy in mind, that's what I was
> referring to. That just wasn't a topic at the time. But if a widely used
> communication protocol isn't targeted by current surveillance legislation
> attempts, something is probably wrong with that protocol ;)
>
> This doesn't imply we can't do software that is meant to use public cloud
> services for example, it's the user's choice to use that after all.
>
> But you can't look at all that as being equal, I'm happy to share much of the
> source code I write publicly and have it attributed to me, but I'm certainly
> not ok with sharing my communication or activity data in the same way.
>

I am also against ideas to openly share raw telemetry data; I've heard
about them in this or sibling threads for the first time. All telemetry I
worked on was based on trust in a given organization, and only that
organization processes the raw data, being very careful about what results are
published.

PS1: Regarding the "control over their digital life" part of the KDE vision:
there's a whole new digital generation of folks much younger than both of us.
Their "digital life" is just their "life", which is shocking for us old guys.
I am afraid that this group does not have much representation in KDE.

PS2: Trivial: if there is any voting planned (?), it's important how we
ask. It's already hard enough that mostly the old generation votes...

PS3: Organizations that support IDs can offer two nice things: the ability
for users to review and to remove their telemetry data upon request. Hard to do
that without IDs, right?
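
A minimal sketch of that idea in Python, assuming the server keys stored
records by the anonymous installation ID and the user can present that ID.
TelemetryStore and its methods are hypothetical and do not describe any
existing KDE service.

```python
class TelemetryStore:
    """Hypothetical in-memory store keyed by the anonymous installation ID."""

    def __init__(self):
        self._records = {}

    def add(self, install_id, record):
        """Store one telemetry record under the given ID."""
        self._records.setdefault(install_id, []).append(record)

    def review(self, install_id):
        """Return a copy of everything stored for this ID."""
        return list(self._records.get(install_id, []))

    def remove(self, install_id):
        """Delete everything stored for this ID; return how many records were removed."""
        return len(self._records.pop(install_id, []))
```

Without an ID there is no key to look up, so neither review nor remove can
be offered for one user's data.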

Cheers,
Jarek



>
> Regards,
> Volker
>
> > Example. Our KDE software runs on hardware that is not assuring privacy
> > (emits signals that someone can easily decipher). "The apps weren't
> > designed with privacy in mind because they should block non-open CPUs" --
> > someone with high-enough expectations would easily say. "But we don't care,
> > we still want to develop them and see them used" -- we say, that's not our
> > level.
> >
> > For most folks, the confidentiality that email and the SIM offer is OK.
> >
> > Thanks for the notes and for working on the stuff, Volker.
> >
> > > Regards,
> > > Volker
>



-- 
regards, Jaroslaw Staniek

KDE:
: A world-wide network of software engineers, artists, writers, translators
: and facilitators committed to Free Software development - http://kde.org
Calligra Suite:
: A graphic art and office suite - http://calligra.org
Kexi:
: A visual database apps builder - http://calligra.org/kexi
Qt Certified Specialist:
: http://www.linkedin.com/in/jstaniek