A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report

John M. Abowd; Tamara Adams; Robert Ashmead; David Darais; Sourya Dey; Simson L. Garfinkel; Nathan Goldschlag; Daniel Kifer; Philip Leclerc; Ethan Lew; Scott Moore; Rolando A. Rodríguez; Ramy N. Tadros; Lars Vilhuber

doi:10.3386/w31995

Working Paper 31995

Issue Date December 2023

Revision Date July 2025

For the last half-century, it has been a common and accepted practice for statistical agencies, including the United States Census Bureau, to protect the confidentiality of aggregate tabular data products with strategies different from those used to protect the individual records in publicly released microdata products. This practice was premised on the assumption that the aggregation used to generate tabular data products made the resulting statistics inherently less disclosive than the microdata from which they were tabulated. Consistent with this assumption, the 2010 Census of Population and Housing in the U.S. used different disclosure limitation rules for its tabular and microdata publications. This paper demonstrates that, in the context of disclosure limitation for the 2010 Census, the assumption that tabular data are inherently less disclosive than their underlying microdata is fundamentally flawed. The 2010 Census published more than 150 billion aggregate statistics in 180 table sets. Most of these tables were published at the most detailed geographic level (individual census blocks, which can have populations as small as one person). Using only 34 of the published table sets, we reconstructed five variables (census block, sex, age, race, and ethnicity) of the confidential 2010 Census person records. Using only published data, an attacker applying our methods can verify that all records in 70% of all census blocks (97 million people) are perfectly reconstructed. Through reidentification studies, we further confirm that, within census blocks with perfect reconstruction accuracy, an attacker can infer, with 95% accuracy, the actual census responses on race and ethnicity for 3.4 million vulnerable population uniques (persons whose race and ethnicity differ from those of the modal person on the census block).
Having shown the vulnerabilities inherent in the disclosure limitation methods used for the 2010 Census, we then demonstrate that the more robust disclosure limitation framework used for the 2020 Census publications defends against reconstruction-based attacks. Finally, we show that available alternatives to the 2020 Census Disclosure Avoidance System would either fail to protect confidentiality or overly degrade the statistics’ utility for the primary statutory use case: redrawing the boundaries of all of the nation’s legislative and voting districts in compliance with the 1965 Voting Rights Act. You are reading the full technical report. For the summary paper, see https://doi.org/10.1162/99608f92.4a1ebf70.
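
The core logic of a reconstruction attack can be illustrated at toy scale: if exactly one set of person records is consistent with a block's published tables, an attacker recovers those records and can tell, from the published data alone, that the recovery is exact. The following is a hypothetical sketch, not the authors' method. It swaps in exhaustive enumeration over a three-person block, and the table names (`sex_by_age`, `race_by_sex`, `race_by_age`), the coarse attribute categories, and the data are all invented for illustration. It also assumes, beyond what the abstract states, that a "population unique" is a person unique on (sex, age) within the block.

```python
from collections import Counter
from itertools import combinations_with_replacement, product

# Hypothetical, deliberately coarse attribute domains (not real census categories).
SEXES = ("F", "M")
AGES = ("child", "adult", "senior")
RACES = ("A", "B")
DOMAIN = list(product(SEXES, AGES, RACES))  # all 12 possible (sex, age, race) records


def tabulate(records):
    """Compute the hypothetical published tables for one block."""
    return {
        "sex_by_age": Counter((s, a) for s, a, _ in records),
        "race_by_sex": Counter((r, s) for s, _, r in records),
        "race_by_age": Counter((r, a) for _, a, r in records),
    }


def reconstruct(published):
    """Return every multiset of records whose tables match `published` exactly."""
    n = sum(published["sex_by_age"].values())  # block population implied by the tables
    return [
        sorted(candidate)
        for candidate in combinations_with_replacement(DOMAIN, n)
        if tabulate(candidate) == published
    ]


def vulnerable_uniques(records):
    """Persons unique on (sex, age) in the block whose race differs from the modal race.

    Assumption: "population unique" is read as unique on (sex, age) within the block.
    Ties for the modal race go to the race encountered first (Counter.most_common order).
    """
    modal_race = Counter(r for _, _, r in records).most_common(1)[0][0]
    cell_counts = Counter((s, a) for s, a, _ in records)
    return [
        (s, a, r)
        for s, a, r in records
        if cell_counts[(s, a)] == 1 and r != modal_race
    ]


# Hypothetical confidential block of three people (invented data).
true_block = [("F", "adult", "A"), ("M", "adult", "A"), ("M", "child", "B")]
published = tabulate(true_block)

solutions = reconstruct(published)
print(len(solutions))                      # 1: the tables admit a single solution
print(solutions[0] == sorted(true_block))  # True
print(vulnerable_uniques(solutions[0]))    # [('M', 'child', 'B')]
```

Any two of these toy tables alone would leave the races of the two males ambiguous; the third table pins them down, which is why a single consistent solution remains. Exhaustive enumeration grows combinatorially with block size and the number of attributes, so it only makes the logic concrete; an attack at the scale described above would presumably require an optimization solver rather than enumeration.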

The research presented in this article was initiated, funded, and supervised by the U.S. Census Bureau. All authors were either employees or contractors of the Census Bureau while performing their contributions. The views and opinions expressed in this paper are those of the authors and not those of the U.S. Census Bureau. Research was conducted under Project ID: P-7502798. After leaving the Census Bureau (Abowd, Garfinkel, and Vilhuber) or Galois (Tadros), John Abowd, Simson Garfinkel, Ramy Tadros, and Lars Vilhuber assisted in preparing the manuscript for publication in their personal capacities and without access to any confidential data. Statistics reported were released under DRB Clearance numbers CBDRB-FY20-DSEP-001, CBDRB-FY22-DSEP-003, CBDRB-FY22-DSEP-004, and CBDRB-FY23-0152. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.

John M. Abowd, Tamara Adams, Robert Ashmead, David Darais, Sourya Dey, Simson L. Garfinkel, Nathan Goldschlag, Daniel Kifer, Philip Leclerc, Ethan Lew, Scott Moore, Rolando A. Rodríguez, Ramy N. Tadros, and Lars Vilhuber, "A Simulated Reconstruction and Reidentification Attack on the 2010 U.S. Census: Full Technical Report," NBER Working Paper 31995 (2023), https://doi.org/10.3386/w31995.

December 21, 2023
