Robin Hood: A De-identification Method to Preserve Minority Representation for Disparities Research

James Thomas Brown, Ellen W. Clayton, Michael Matheny, Murat Kantarcioglu, Yevgeniy Vorobeychik, Bradley A. Malin

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Data stewards often turn to de-identification to make data available for research while complying with privacy law. A primary challenge in de-identification is balancing privacy against utility, but optimizing this tradeoff over a complete dataset has been shown to create disparities in both privacy risk and data utility between the subgroups of individuals represented in the dataset. Notably, minority populations incur the greatest utility loss and privacy risk. Recent studies have shown that utility inequalities can mask disparities and bias algorithms trained on such data. Yet achieving equal privacy and utility is inherently constrained by the fact that each subgroup has a different privacy-utility tradeoff, and these differences are exacerbated by the deterministic transformations that standard de-identification models typically employ. To address this problem, we introduce Robin Hood, a de-identification method that leverages non-deterministic transformations to distribute risk and utility more equally in a de-identified dataset. It does so by transforming majority groups’ records in a way that gives minorities privacy. We show how Robin Hood can provide equal privacy protections to all records in a dataset in expectation while supporting more accurate and consistent disparity estimation than standard k-anonymity methods on simulated and real-world Census data.
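The subgroup risk disparity described in the abstract can be illustrated with a small sketch. This is not the Robin Hood method itself. It assumes a common textbook risk measure, in which a record's re-identification risk is the reciprocal of the number of records sharing its quasi-identifier values, and it uses invented toy data. The function name `average_risk_by_group` and the field names are hypothetical.

```python
from collections import Counter


def average_risk_by_group(records, quasi_identifiers, group_key):
    """Mean re-identification risk per group.

    Assumption: a record's risk is 1 / (size of its equivalence class),
    where an equivalence class is the set of records that share the same
    quasi-identifier values.
    """
    def qi_values(record):
        return tuple(record[q] for q in quasi_identifiers)

    # Size of each equivalence class in the dataset.
    class_sizes = Counter(qi_values(r) for r in records)

    totals, counts = Counter(), Counter()
    for r in records:
        totals[r[group_key]] += 1.0 / class_sizes[qi_values(r)]
        counts[r[group_key]] += 1
    return {g: totals[g] / counts[g] for g in counts}


# Toy data, invented for illustration: the group attribute is itself a
# quasi-identifier, so the smaller group forms a smaller equivalence class.
records = (
    [{"age_band": "30-39", "group": "majority"}] * 8
    + [{"age_band": "30-39", "group": "minority"}] * 2
)

risk = average_risk_by_group(records, ("age_band", "group"), "group")
print(risk)
```

In this toy dataset every record satisfies 2-anonymity, yet the minority records carry a higher average risk than the majority records, which is the kind of gap the abstract attributes to optimizing over the dataset as a whole.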

Original language: English
Title of host publication: Privacy in Statistical Databases - International Conference, PSD 2024, Proceedings
Editors: Josep Domingo-Ferrer, Melek Önen
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 67-83
Number of pages: 17
ISBN (Print): 9783031696503
DOIs
State: Published - 2024
Event: International Conference on Privacy in Statistical Databases, PSD 2024 - Antibes Juan-les-Pins, France
Duration: Sep 25 2024 to Sep 27 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 14915 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Privacy in Statistical Databases, PSD 2024
Country/Territory: France
City: Antibes Juan-les-Pins
Period: 09/25/24 to 09/27/24

Keywords

  • Anonymization
  • De-identification
  • Fairness
