The next time a company tells you that it wants your information, but “Don’t worry, it’s anonymized,” don’t believe it.
A group of researchers from MIT proved this week that they can identify almost anyone using just four pieces of information in a supposedly anonymized database of credit card transactions, or just three under certain conditions. The research follows similar work on a database generated by cell phone usage, as well as other research showing how easy it is to “guess” a person’s Social Security number given a couple of facts about them.
“That means that someone with copies of just three of your recent receipts — or one receipt, one Instagram photo of you having coffee with friends, and one tweet about the phone you just bought — would have a 94 percent chance of extracting your credit card records from those of a million other people,” said MIT in its announcement of the research.
It’s a problem that people who study big data sets understand well. Pick a slice of data, find matches, do some cross-referencing, and the cloak of anonymity disappears — people can be uniquely identified.
In this project, study author Yves-Alexandre de Montjoye and his colleagues examined three months of credit card data covering 1.1 million consumers. Picking two dates within those three months, they found one (and only one) person who had shopped at both a particular coffee shop and a particular restaurant on those dates. Having singled out that person, they could see everything he or she purchased during the three months.
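To make the cross-referencing step concrete, here is a minimal sketch in Python. It is not the researchers' code: the records, the user IDs and the helper names `build_traces` and `matching_users` are all invented for illustration, and a real data set would hold millions of rows rather than seven. The idea is only that each extra known fact shrinks the pool of matching customers until one is left.

```python
from collections import defaultdict

# Made-up "anonymized" records: (pseudonymous user id, date, shop).
# No names appear anywhere; the ids are hypothetical.
TRANSACTIONS = [
    ("u1", "01-05", "coffee shop"),
    ("u1", "01-09", "restaurant"),
    ("u1", "02-11", "pharmacy"),
    ("u2", "01-05", "coffee shop"),
    ("u2", "01-09", "bookstore"),
    ("u3", "01-06", "coffee shop"),
    ("u3", "01-09", "restaurant"),
]


def build_traces(transactions):
    """Group the records into one set of (date, shop) points per user id."""
    traces = defaultdict(set)
    for user, date, shop in transactions:
        traces[user].add((date, shop))
    return traces


def matching_users(traces, known_points):
    """Return the user ids whose history contains every known point."""
    known = set(known_points)
    return sorted(user for user, points in traces.items() if known <= points)


traces = build_traces(TRANSACTIONS)

# One outside fact (say, a receipt) still leaves two candidates.
one_point = matching_users(traces, [("01-05", "coffee shop")])
print(one_point)   # ['u1', 'u2']

# A second fact narrows the pool to a single person.
two_points = matching_users(
    traces, [("01-05", "coffee shop"), ("01-09", "restaurant")]
)
print(two_points)  # ['u1']

# Once the id is pinned down, the whole purchase history is exposed.
if len(two_points) == 1:
    print(sorted(traces[two_points[0]]))
```

In this toy example two facts are enough; the study's point is that in a database of 1.1 million people, a handful of such facts is usually all it takes.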
“We are showing that the privacy we are told that we have isn’t real,” study co-author Alex “Sandy” Pentland told The Associated Press.
The research was published in this week’s edition of the journal Science.
Anonymized data is now used for everything from serving Internet ads to traffic planning to conducting medical research. Lack of trust in anonymity could severely dampen the potential of big data.
“Sandy and I do really believe that this data has great potential and should be used,” de Montjoye said. “We, however, need to be aware and account for the risks of re-identification.”