Telemetry Policy - Remaining Questions

Fri Sep 15 08:27:11 BST 2017

On Friday, 15 September 2017 05:23:44 CEST Nicolás Alvarez wrote:
> 2017-09-14 15:56 GMT-03:00 Albert Astals Cid <aacid at kde.org>:
> > On Thursday, 14 September 2017, at 0:20:57 CEST, Volker Krause wrote:
> >> The following questions were left unanswered in the previous thread (see
> >> there for the full arguments if needed):
> >> 
> >> (1) Should we allow opt-in tracking of unique identifiers?
> >> 
> >> This was requested by Jaroslaw, as Kexi has this right now and the policy
> >> as written right now would thus conflict with it.
> > 
> > I missed this, what's the usecase of unique id data?
> 
> Without a unique ID, each time the app sends telemetry, the record is
> independent and not correlated to previous records. Generating a
> random "client ID" and persisting it in some file in $HOME, and
> including it in the uploaded data, lets you calculate statistics per
> client, which is more useful than per telemetry record.
> 
> It's hard to know what percentage of users have a setting
> enabled if we don't have a client ID, since some users may send more
> telemetry reports than others (for multiple reasons, including using
> the app more often). If we have one, we can avoid double-counting
> multiple reports from the same client.
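
To illustrate the double-counting described in the quote above, here is a
minimal sketch. The report format (a pair of client ID and setting flag) and
the function names are assumptions made for this example; they are not taken
from Kexi, KUserFeedback or any other real telemetry code.

```python
def share_per_report(reports):
    """Fraction of *reports* that have the setting enabled."""
    enabled = sum(1 for _client_id, flag in reports if flag)
    return enabled / len(reports)


def share_per_client(reports):
    """Fraction of *clients* that have the setting enabled.

    Each client is counted once; its most recent report wins.
    """
    latest = {}
    for client_id, flag in reports:
        latest[client_id] = flag
    return sum(1 for flag in latest.values() if flag) / len(latest)


# Hypothetical data: client "a" reports three times, client "b" once.
reports = [("a", True), ("a", True), ("a", True), ("b", False)]

print(share_per_report(reports))  # 0.75, skewed by the frequent reporter
print(share_per_client(reports))  # 0.5, each client counted once
```

The two numbers differ only because one client submits more often than the
other, which is the effect the client ID is meant to correct for.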

That is true if you send telemetry per application start. That's not the only
way to do it though. Quoting myself from the earlier thread:

> The implementation in KUserFeedback addresses this by fixed interval data
> submission. If you then aggregate the received data by the same interval,
> you can see e.g. how ratios of application versions develop over time.
> 
> This does have limits of course, you can't distinguish between the same
> person using the application every sampling interval, or two people using
> it every other interval for example. With a sufficiently long sampling
> interval the result should nevertheless be sufficiently accurate I think.

and in a bit more detail here:
https://mail.kde.org/pipermail/kde-community/2017q3/003917.html

I.e. I think we can do the de-duplication by other means than unique
identification.
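
A rough sketch of the server-side aggregation this implies, assuming each
client submits at most once per sampling interval and reports carry only a
submission date and an application version. The function name, the 7-day
interval and the epoch date are assumptions for the example, not
KUserFeedback's actual implementation.

```python
from collections import Counter, defaultdict
from datetime import date


def version_ratios_by_interval(reports, interval_days=7, epoch=date(2017, 1, 1)):
    """Group anonymous reports into fixed intervals and return, per interval,
    the share of each application version among the reports received."""
    buckets = defaultdict(Counter)
    for day, version in reports:
        # Index of the sampling interval this report falls into.
        index = (day - epoch).days // interval_days
        buckets[index][version] += 1
    return {
        index: {v: n / sum(counts.values()) for v, n in counts.items()}
        for index, counts in sorted(buckets.items())
    }


# Hypothetical reports: no client ID, just (submission date, version).
reports = [
    (date(2017, 1, 2), "1.0"),
    (date(2017, 1, 3), "1.0"),
    (date(2017, 1, 5), "1.1"),
    (date(2017, 1, 9), "1.1"),
]
ratios = version_ratios_by_interval(reports)
```

As long as the client enforces the one-submission-per-interval rule, no
client is counted twice within an interval. The sketch cannot tell one user
reporting in every interval from two users reporting in alternating
intervals, which is the limit mentioned in the quote above.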

> From Mozilla documentation: "So when you say '63% of beta 53 has
> Firefox set as its default browser', make sure you specify it is 63%
> of *pings*, since it is only around 46% of clients. (Apparently users
> with Firefox Beta 53 set as their default browser submit more
> main-pings than users who don't)."

That is something else though, that's the participation ratio on opt-in. 
Measuring that and determining its bias on the submitted data is indeed a 
challenge, but I don't see how unique ids help with that?

Regards,
Volker