I ran across the comment below at http://onemansblog.com/2007/02/02/protect-your-privacy-delete-internet-usage-tracks/#comment-58200 about Gravatar [1]. I'm particularly curious about Meta Stack Overflow's opinions on points 4 and 6, though the others may be of interest too. Are these concerns real, and if so, what defensive measures might be used?
Comment by AL 2009-02-18 00:03:55
I’m a lawyer specialising in internet and privacy issues at a Fortune 100 company and I personally think that Gravatar is easily the worst service available in terms of your data security and privacy. I generally don’t comment on any blogs that are Gravatar-enabled (this being an exception), for the following reasons:
1. The entire reason Gravatar offers their service is to collect internet usage data across multiple sites. It is not offered free out of the goodness of their heart. The entire purpose of the service is to analyse the way YOU navigate the internet.
2. Gravatar has clear plans to monetise this data. Whether they are successful or not is another story.
3. It is unlikely that Gravatar would ever disclose an individual user’s personal information, but it is not impossible. The Chinese government has often requested that these kinds of information aggregators disclose data for the prosecution of political dissidents – and very often these requests are met, resulting in bloggers being jailed (see Yahoo!’s experiences in China). For example, if I leave a number of comments promoting democracy and criticising the PRC government on various blogs, it is entirely possible that the Chinese government could use legal authority to request the holder of information to disclose that to them. By retaining this information and preventing you from stopping its collection, Gravatar is putting both bloggers and commenters at risk. This is not just in China. The Patriot Act and many other new pieces of post-9/11 legislation in Western countries convey similar powers to government.
4. The most egregious part of Gravatar’s service is the inability to stop them from collecting your data. I have in the past tried to cancel a Gravatar registration. Gravatar does not allow this and will continue to track your e-mail address for the rest of time.
5. Gravatar does not provide any details about how they use your personal information and does not respond to any queries relating to privacy issues.
6. I do not believe Gravatar is an opt-in service. Obviously they will not display an avatar unless you register, but if a blog is Gravatar-enabled, every time you comment on it, your e-mail address is sent to Gravatar. Even if they do not retain this address (and it is quite possible that they do – their Privacy Policy is silent on this point and they have not responded to any of my enquiries on this point), it is VERY likely that your internet usage is still tracked in an anonymous fashion. That is, if I use the same e-mail address to comment on 5 different blogs, even if I am not a registered Gravatar user the fact that a user has accessed those 5 blogs is very likely retained by Gravatar.
7. Much is made of Facebook and Google Chrome’s use of personal information, but Gravatar is far and away the worst popular internet service I have encountered in terms of user (and non-user) personal information.
As a lawyer, I strongly urge all blog authors and users who are concerned about their privacy to avoid Gravatar.
Related: Is using Gravatar a security risk? [2]
Is gravatar a privacy risk?
Yes.
Is it as great a risk as DoubleClick [1]/Google?
No. Notably these sites use Google Analytics [2]. Just like Gravatar, they don't have access to personally identifiable information (that is, the email is hashed before they get their hot mitts on it).
Is it a small risk?
Yes. If you don't like someone noting that an unidentifiable user (that's you) visited two different websites - well, they have that information now. That aggregate data can, in theory, be mined (as was the "anonymous" AOL search data [3] of yore) to identify you.
Should we give up on the gravatar service?
No. It's a useful service for many people, and many of them accept the cost for this "free" service.
Who is laughing at us right now?
The Amish [4].
Actually, no, they don't even care [5].
IMO sites need to use gravatar sensibly.
If you say you don't publish your users' email addresses, that should mean you don't publish an MD5sum of their email addresses either. Hashing sensitive data without a salt is a schoolboy error: web developers should know better. Publishing the hash of some private data is a breach of privacy if the data is subject to a dictionary attack, which email addresses are.
Just replace address@domain with address+salt@domain.
SO in effect allows you to do this manually, by setting the email address for your account. It doesn't use the address for anything other than gravatar unless you ask it to, so it doesn't have to really be your email address.
I'm pretty sure this is an accident, though, not a security feature, since SO also uses your IP address in the absence of an email address. IP addresses are even more subject to dictionary attack than email addresses.
Of course for the salt to be effective in preventing gravatar tracking you across sites, gravatar has to not know the email address behind it (since if it did know, it could merge the records of address+*@domain). This means that (a) you must live with a random icon, and therefore (b) the user should be able to specify whether they want the salt added or not. If your email provider doesn't support +salt, and you want the site to be able to send you email without publishing the hash of your email address, then you're generally out of luck: you can have one or the other.
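To make the +salt idea concrete, here is a minimal sketch. It assumes the MD5 scheme discussed in this thread (trim the address, lower-case it, take the MD5 hex digest); the helper names and the example.com address are made up for illustration.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """MD5 hex digest of the trimmed, lower-cased address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def salted_address(email: str, salt: str) -> str:
    """Insert '+salt' before the '@' (hypothetical helper; assumes the
    mail provider supports plus-addressing)."""
    local, _, domain = email.rpartition("@")
    return f"{local}+{salt}@{domain}"


# The same mailbox, hashed with no salt and with two different per-site salts.
plain = gravatar_hash("address@example.com")
site_a = gravatar_hash(salted_address("address@example.com", "sitea"))
site_b = gravatar_hash(salted_address("address@example.com", "siteb"))

# Three unrelated-looking hashes, so the published values no longer match
# across sites unless someone already knows the base address and the salts.
print(plain, site_a, site_b)
```

Note that anyone who guesses the base address can still try common salts, so this only helps when the salt is not predictable.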
In fact I'd say that ideally sites should default to just generating a random "md5sum" for each user, and only use the email address to generate a gravatar URL with permission. For users with no interest in uploading an image to gravatar, there's no earthly reason why any site should use a gravatar URL based on supposedly-private data. Unless you count ignorance of basic security principles as a "reason" ;-)
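A rough sketch of that default, assuming the site stores one random identifier per user and uses it wherever it would otherwise use the email hash. The function names are hypothetical; `d=identicon` is Gravatar's parameter for a generated fallback image.

```python
import secrets


def random_avatar_id() -> str:
    """32 hex characters, the same shape as an MD5 digest, but derived
    from no user data at all. Generate once and store with the account."""
    return secrets.token_hex(16)


def avatar_url(avatar_id: str) -> str:
    # d=identicon requests a generated image when nothing is registered
    # for this identifier, which will always be the case for a random one.
    return f"https://www.gravatar.com/avatar/{avatar_id}?d=identicon"


user_avatar_id = random_avatar_id()
print(avatar_url(user_avatar_id))
```

The user still gets a stable random icon, and the published identifier cannot be attacked with a dictionary because there is no address behind it.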
+stackoverflow) and then tell users to use an email provider that supports plus-addressing (en.wikipedia.org/wiki/E-mail_address#Sub-addressing) and then register a Gravatar for that "salted" address, if they want to customize the avatar. Not perfect, but no need for database and interface changes. - Arjan
As someone who provides a similar service (however on a scale that's a tiny bit smaller), I have to say that I myself am sometimes concerned what kind of information I could pull from the access logs if I wanted to.
On the other hand, whenever I comment on some blog, join a forum or whatever, it's my own choice to provide my personal email address. If I'm concerned about that, I can either a) not join at all, b) not provide an email address (if the site allows that, as SO does), or c) create an extra email address for this purpose.
So my view is: It's no bigger or smaller problem than any other privacy concerns resulting from data collection, be it PayPal knowing where you shop, myOpenId knowing where you log in, or Google knowing... well, everything.
That's not to say it's something that can be ignored, but I don't think Gravatar is a special case.
In December 2009, somebody tested getting email addresses from some of the Stack Overflow users, by assuming the display name might be related to an email account at some of the major providers. According to Gravatars: why publishing your email's hash is not a good idea [1] that assumption is true for about 10% of the SO users:
Running my program on a list of 80871 users I was able to extract 8597 email addresses, associated to their users. This means that for a bit more than 10% of the users, the username and the gravatar URL are enough to deduce the email address they used to register to the website.
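The kind of guessing described there can be sketched in a few lines. This is not the program from the linked article, just an illustration under assumptions: the provider list uses reserved example domains as placeholders, and the candidate patterns are my own guess at what "related to the display name" might mean.

```python
import hashlib
from typing import Iterator, Optional

# Placeholder domains; a real attempt would list major mail providers.
PROVIDERS = ["example.com", "example.org", "example.net"]


def md5_hex(address: str) -> str:
    return hashlib.md5(address.strip().lower().encode("utf-8")).hexdigest()


def candidates(display_name: str) -> Iterator[str]:
    """Yield plausible addresses built from a display name."""
    parts = display_name.lower().split()
    local_parts = {"".join(parts), ".".join(parts), "_".join(parts)}
    for local in sorted(local_parts):
        for domain in PROVIDERS:
            yield f"{local}@{domain}"


def guess_email(display_name: str, published_hash: str) -> Optional[str]:
    """Return the first candidate whose MD5 matches the published hash."""
    for address in candidates(display_name):
        if md5_hex(address) == published_hash:
            return address
    return None


# Simulate a site that publishes the hash next to the display name.
published = md5_hex("jane.doe@example.com")
print(guess_email("Jane Doe", published))
```

Only a handful of hashes are computed per user, which is why running this over tens of thousands of accounts is cheap.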
(Apart from this, I also dislike the web bug nature [2] of Gravatar and the like.)
[1] http://www.developer.it/post/gravatars-why-publishing-your-email-s-hash-is-not-a-good-idea
arjan@gmail is not actually me (and it isn't). But for many/most hits that wouldn't matter a lot indeed. - Arjan
One problem with Gravatar isn't solved by blocking the server.
The website you use publishes the hash of your email address. At a minimum, this makes it possible to find other websites where you used the same email address.
Looking at the Jan 2011 Stack Exchange data dump:
firstname.lastname@gmail.com
.gdsfgsdf.sdfadf.com
etc.), I assume that if we use valid addresses as a basis, the percentage of guessable addresses is even larger.
All of this applies even to users who have not registered an account with Gravatar.
Many websites (including Stack Overflow) promise to not publish your email address, but at the same time use Gravatar and thus leak information about the email address they promised to keep secret. If websites insist on using Gravatar, they should at least tell the user that the email address gets published, instead of lying to their users.
[1] http://en.wikipedia.org/wiki/IPv4
firstname.lastname@gmail.com users? Like: are those people who use Firstname Lastname as their display name here, and happen to have a matching email address? Or was the *.*@gmail.com pattern enough to get that 27k addresses from the hashes? (In the first case, a spammer probably wouldn't even care to validate if the address matches the hash, but just assume the address exists, but that's not about privacy of course. I'm still happy I'm not arjan@gmail... And: nice overview, thanks!) - Arjan
Gravataring an email address (MD5 [1] hashing) is effectively making a single ID that identifies you publicly. Even if you never signed up with Gravatar, they're still tracking your ID that comes in - and the site using Gravatar is providing that ID to all users. This ID can be found on other sites, so if someone does a full transitive search across all public Internet forums, they could see what the same ID has posted. Where privacy breaks down is that if all of these sites know your real email address, and have publicly given everyone your Gravatar ID, then the Chinese government (for example) could harass not just a single entity, but ANY one of those websites that published your Gravatar ID.
What makes more sense to me is if Gravatar simply "GAVE OUT" an identifier (to you) if you sign up. And when you want to use your Gravatar, you simply give that same ID to sites that use Gravatar.
Sites that use Gravatar without your consent are the ones to blame. Gravatar's user base grows because of this principle (it's their business model - people that want a picture associated with their latest post).
I wish people would try to understand this concept first. Another poster here is generally correct in that if someone REALLY wanted to find out your real identity, they probably could, but Grav IDs make it even easier. You'll see when the equivalent of rainbow tables [2] comes along for Grav IDs (like a site that lets you enter a gravid and it will tell you all URLs that use that GravID.)
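Such a lookup is little more than an inverted index. A toy sketch, with made-up URLs and placeholder hashes standing in for whatever a crawler would collect from avatar image URLs:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical crawl results: (page URL, avatar hash found on that page).
observations = [
    ("https://forum.example/thread/1", "0" * 32),
    ("https://blog.example/post/42", "0" * 32),
    ("https://blog.example/post/42", "f" * 32),
]


def build_index(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each avatar hash to every page where it was seen."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, avatar_hash in rows:
        index[avatar_hash].add(url)
    return index


index = build_index(observations)
# Everything the same ID has posted on, across unrelated sites.
print(sorted(index["0" * 32]))
```

No hash ever needs to be reversed for this; the ID alone links the posts.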
I just thought of another way to secure gravatars: they could have used PKI [3] rather than hashing. With PKI plus some time-based or per-instance/site salt, the ID would not be normalizable to an outsider. In fact, shame on Gravatar for not thinking of this. They're going to claim it has something to do with access to encryption APIs in the languages they are supporting, but I'm pre-emptively calling bullshit right now.
[1] http://en.wikipedia.org/wiki/MD5
mtncartoons.com, claims "Typically, $100 newsletter, $50 Web use." I guess we should delete this answer? - Arjan
As a lawyer, I strongly urge all blog authors and users who are concerned about their privacy to avoid Gravatar.
What does being a lawyer have to do with any of that rant?
Note that DoubleClick [1] (now Google) has for a very long time collected such usage data.
If this is a concern, use your browser or your hosts file [2] to block all accesses to Gravatar's servers. This will disable their ability to follow you.
[1] https://en.wikipedia.org/wiki/DoubleClick
Given the recent switch to Unicornify, I believe this question can be closed as status-completed.
Is Gravatar a privacy risk? WT* [1]
I realize these posts are quite old, but I have been dismayed at the level of paranoia shown by some of the posters on this and similar threads.
Surely, in this day and age, the cautious user consciously obtains and maintains a few email accounts which matter little and are serviced even less, just for all those sign-up-and-forget situations. So who cares if someone without a life and with huge amounts of computing power (and hacker software) eventually figures out an address from a hash?
If the name part of the (probably free) mail account is mostly gobbledygook, that makes hash matching even harder. And if you use simple, real words, names or numbers for passwords to boot, you have to accept you are really stupid.
But at the end of the day, far more worrisome for the paranoid (and so-called "lawyers" with a bone to pick) is the plethora of form-submission checks (such as the excellent Akismet) and general RBL [2] access blockers and redirects used (quite rightly so) on so many websites, which you won't even know about unless you are actually a spammer of some sort.
All of those receive our email addresses and IP addresses and rely on them wholly or in part. Is an upfront service like Gravatar to be trusted less, when those others could track FAR more than a few sites here and there displaying an avatar pic? I think not.
And while we're at it, get the Ghostery [3] add-on for your browser and see just how many months it can take to get on top of (blocking) the huge amount of third-party tracking cookies we are barraged with. That's really scary too.
Discovery of a mere email address one consciously supplies has nothing on all those occurrences I mentioned, any of which if harvested can also be packaged for others at a profit. Any monetized site especially SHOULD declare it all in a readily accessed privacy declaration; does yours?
[1] https://en.wiktionary.org/wiki/WTF#Initialism