HL Chronicle of Data Protection: Privacy & Information Security News & Trends
Posted in International/EU Privacy

European Regulators Raise the Bar on Anonymization Techniques

The Article 29 Working Party’s new opinion on anonymization techniques provides a useful primer on the randomization and generalization (i.e., data aggregation) techniques used to anonymize data sets.  The opinion analyzes each technique against three ways that data can be re-identified: singling out individuals after the anonymization technique has been applied; linking the anonymized data set to other data sets; and inferring information about individuals from the anonymized data set.  According to the Working Party, a robust anonymization technique must resist all three attacks; the Working Party points to recital 26 of the Data Protection Directive, which states that “to determine whether a person is identifiable, account should be taken of all the means likely reasonably to be used” to re-identify the person.  The most robust form of anonymization combines randomization and generalization techniques.  Organizations relying on anonymization for compliance with the Data Protection Directive would be well advised to review their anonymization processes against the standards set out in the opinion.

In its 2007 opinion on the concept of personal data, the Working Party left data controllers a certain amount of flexibility where it appeared highly unlikely that the data could be re-identified; Example 13 of that opinion illustrates the point in the context of pharmaceutical research data.
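As a rough illustration of the two families of techniques the opinion discusses, the sketch below applies randomization (perturbing a value with noise) and generalization (coarsening quasi-identifiers into bands and prefixes) to a toy data set.  The records, field names, and noise parameters are hypothetical, chosen only to make the contrast concrete:

```python
import random

# Hypothetical toy records: (age, postcode, diagnosis)
records = [
    (34, "75011", "flu"),
    (37, "75012", "flu"),
    (62, "69003", "diabetes"),
    (66, "69005", "diabetes"),
]

def randomize(record, noise=3):
    """Randomization: perturb the age with uniform noise so the exact
    value no longer singles out an individual."""
    age, postcode, diagnosis = record
    return (age + random.randint(-noise, noise), postcode, diagnosis)

def generalize(record):
    """Generalization: coarsen quasi-identifiers so records blend into
    groups (here, 10-year age bands and 2-digit postcode prefixes)."""
    age, postcode, diagnosis = record
    band_start = age // 10 * 10
    return (f"{band_start}-{band_start + 9}", postcode[:2], diagnosis)

# The Working Party's "most robust" approach: cumulative use of both.
anonymized = [generalize(randomize(r)) for r in records]
```

Even this combined output is only a starting point: the opinion's three tests (singling out, linkability, inference) must still be assessed against the specific data set and whatever other data sets exist.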

Under the new opinion, the “likely reasonably to be used” test appears to provide little comfort, given the heightened sophistication of re-identification techniques.  The 2014 opinion points out that irreversible hashing will in most cases be insufficient by itself to guarantee anonymization.  The only exception could be a keyed hash function with deletion of the key, or a tokenization technique where the new code numbers are generated randomly without any mathematical link to the original data.  Even then, linkability could be achieved if other data sets exist involving the same population of data subjects.
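The contrast the opinion draws can be made concrete with a short sketch.  A plain hash of an identifier is vulnerable to a dictionary attack (the attacker simply hashes candidate values and matches), whereas a keyed hash whose key is destroyed, or a randomly generated token with no mathematical link to the original value, removes that avenue.  The identifiers below are hypothetical:

```python
import hashlib
import hmac
import secrets

# Hypothetical identifiers to be anonymized
ids = ["alice@example.com", "bob@example.com"]

# Plain hashing: insufficient by itself. Anyone who can guess candidate
# identifiers can re-hash them and match against the "anonymized" values.
plain = {hashlib.sha256(i.encode()).hexdigest(): i for i in ids}
guess = "alice@example.com"
assert hashlib.sha256(guess.encode()).hexdigest() in plain  # re-identified

# Keyed hash with deletion of the key: once the key is destroyed, the
# mapping can no longer be reproduced from candidate identifiers.
key = secrets.token_bytes(32)
keyed = {i: hmac.new(key, i.encode(), hashlib.sha256).hexdigest() for i in ids}
del key  # the key deletion is what makes this approach viable

# Random tokenization: codes drawn at random, with no mathematical link
# to the original data; only the lookup table connects them, and it must
# itself be destroyed or strictly protected.
tokens = {i: secrets.token_hex(16) for i in ids}
```

Note that, as the opinion warns, even these stronger techniques do not eliminate linkability: if other data sets cover the same population, the pseudonymized records may still be matched through shared attributes.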

The Working Party’s opinion emphasizes that even where a data controller believes it has successfully anonymized personal data, the data controller must periodically re-evaluate the risks in light of developments in re-identification techniques.  What emerges from the Working Party’s new opinion is that anonymization, like many other aspects of data protection, requires a governance structure to conduct an initial risk analysis and on-going follow-up.  For some data, the risk of less-than-perfect anonymization may be acceptable.  For sensitive data, even the smallest risk of re-identification may be a show-stopper.

For the Working Party, true anonymization is equivalent to erasing the data entirely.  This suggests that when there exists even a small residual risk of re-identification, data will be considered as continuing to fall within European data protection laws.  In practice, this means that data controllers may have to use a belt-and-suspenders approach when considering big data projects using (supposedly) anonymized data.

To meet the Working Party’s standard, data controllers may wish to have the anonymization technique used for the project first reviewed by a committee including a data scientist independent from the project team.  Second, depending on the risk score given by the independent committee, the data controller should put safeguards into place as if the data were not completely anonymized.  These safeguards may involve limiting access to the data set and ensuring that bulk copies of the “anonymous” data set are not made.

This approach would permit the data controller to argue, first, that the data are sufficiently anonymous and therefore fall outside the scope of European data protection rules, and, second, that even if the data were not totally anonymized, reasonable back-up safeguards were in place to limit the risks.