Metrics and your privacy

Andy Green andy at warmcat.com
Wed Nov 22 16:45:26 UTC 2006


Bruno Wolff III wrote:

>> Whereas if you collect via yum mirrors, there is a transaction going on 
>> initiated by the user that he benefits from.  It seems hard for anyone 
>> to object to your IP getting used for anonymous aggregated stats in such 
>> a case, in fact if I visit any website I expect to have the same done 
>> for my visit from their logs (esp if they are on Google Analytics).
> 
> The data can't just be stored as anonymous aggregated data since you need to
> check for unique IP addresses. You could do hashing, but that wouldn't buy
> very much privacy until IPv6 is common. Still it would be nice if Fedora
> had an official policy on when the IP data (as opposed to the aggregate data)
> would be deleted.

No, I'm not sure there is any point in "checking for unique IP addresses". 
The nature of what the clients are doing with the mirrors is that each 
client is only going to pull a given package once.  So you can do the Geo 
lookup on the IP, keep just that result, and examine the logs only for 
GET.*\.rpm
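
To make that concrete, here is a rough sketch of the kind of log pass I 
mean, not a description of any existing Fedora tool.  It assumes the mirror 
writes Apache-style Common Log Format, and geo_lookup is a hypothetical 
callable standing in for whatever IP-to-region database you have; both are 
my assumptions.  The IP is used only for the lookup and is never stored.

```python
import re
from collections import Counter

# Assumed log shape (Common Log Format), for example:
# 192.0.2.7 - - [22/Nov/2006:16:45:26 +0000] "GET /blah/thing.rpm HTTP/1.1" 200 1234
# Captures: client IP, requested .rpm path (query string dropped), status code.
RPM_GET = re.compile(
    r'^(\S+) .*?"GET ([^\s?]*\.rpm)(?:\?\S*)? HTTP/[\d.]+" (\d{3})'
)


def count_rpm_downloads(lines, geo_lookup):
    """Count successful .rpm GETs per (region, package filename).

    geo_lookup is a hypothetical callable mapping an IP string to a
    region label. Only the region is kept; the IP is discarded.
    """
    counts = Counter()
    for line in lines:
        m = RPM_GET.match(line)
        if not m:
            continue  # not a GET for an .rpm
        ip, path, status = m.groups()
        if status != "200":
            continue  # ignore failed or redirected requests
        region = geo_lookup(ip)
        package = path.rsplit("/", 1)[-1]
        counts[(region, package)] += 1
    return counts
```

Feed it the lines of an access log plus your lookup function and you get 
back per-region package counts, which is all the aggregate stats need.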

>> It would be cool to generate a GUID per machine and attach it to yum 
>> download URLs, eg, http://mirror.org/blah/thing.rpm?GUID=123-123-123.. 
>> so it is ignored by the server but is present in the logs.  But the logs 
>> are still useful without it.
> 
> Definitely don't do this.

It doesn't seem necessary on further thought.

>> Making a new machine check for updates at least once as soon as it saw 
>> the network was up would be a friendly and non-privacy threatening 
>> action that would solve this...
> 
> No it wouldn't be friendly. I don't like that yum checks for updates on first
> boot before I have a chance to turn it off.

This later turned into a "can I check for updates [y/n]" suggestion 
which is hopefully more compatible with your view.

>>  3. Machines behind a local yum cache
>>
>> Whatever tools are provided to run the yum cache should have the repo 
>> log processing stuff folded into them, and report stats up to Fedora HQ 
>> by default.  But a user should be able to turn it off.
> 
> Definitely not, but especially not by default.

Well, I don't understand why you would say "definitely not" even if the 
thing is opt-in.

> One of the reasons I like free software is that it doesn't (normally) try to
> spy on you.
> 
> Currently Fedora is a pretty good fit for me, but if it turns into spyware,
> I will be looking at other options. (Though in the short run I would probably
> look at respinning the install DVD to include modified packages without the
> spyware.)

Hmm, none of this is "spyware".  I just need to look in your mail headers 
to see your IP address, and the mirrors already have it anyway.  The 
other information under discussion is whether you took update packages, 
which ones, and how many.

-Andy



