Considerations for Developing Disclosure Avoidance Systems for Longitudinal Survey Data
Longitudinal surveys, such as the Panel Study of Income Dynamics (PSID) and the Survey of Income and Program Participation (SIPP), are indispensable for studying changes in society and the impacts of various factors on those changes, and for gaining a deeper understanding of the policy decisions needed for the betterment of society. Data from such studies should be widely distributed and analyzed from multiple perspectives to support the development and implementation of policy decisions. Releasing such data, though beneficial, can also lead to disclosure of private information that respondents shared with the data-gathering agencies. Such a breach of confidentiality can harm respondents in many ways. A Disclosure Avoidance System therefore needs to be in place that limits disclosure while still allowing the data to be used for statistical purposes (that is, to draw statistical inferences about population aggregates).
In 2022-2023, the Committee on National Statistics within the National Academies of Sciences, Engineering, and Medicine convened a panel of experts to examine disclosure avoidance in SIPP. In this short article, we will discuss four core considerations emanating from the Panel’s report that we argue are essential for developing a disclosure avoidance system for public releases of longitudinal data:
(1) Assessment of disclosure risk using quantitative measures;
(2) Development and assessment of disclosure mitigation strategies;
(3) Maintenance of usability of data for statistical purposes; and
(4) Continuous communication between the agencies collecting the data and the audiences.
In particular, we will make the following points in our article:
a) Assessment of disclosure risk in a longitudinal setting differs from assessment for cross-sectional surveys. Rich data are collected, and changes observed over time increase the risk of disclosing respondents' information and compromising its confidentiality. The information and strategies available to intruders for determining the identities of individuals and their data are rapidly changing. These features of longitudinal data, as well as others, create tension between protecting the privacy of the data and ensuring that the data remain usable and useful for potential users, and they require continuous monitoring of these developments.
b) There is no single system for disclosure mitigation or, put differently, “one approach simply does not fit all data sets.”
c) For publicly released longitudinal (and other forms of) survey data, it is essential to consider multiple forms of access that differentiate releases by their level of detail, their potential disclosure risk, and their accessibility to different types of users and for different purposes. Thus, we will discuss the relative merits of synthetic data, secure online data access, and approaches to handling geographic information in data releases.
d) In developing public data release strategies, it is also essential to develop a framework for determining the usability of data for different user communities and purposes.
e) A key, and sometimes overlooked, obligation of data disseminators is communication. This includes explaining the importance of protecting privacy, providing clear and honest characterizations of the disclosure risks associated with data releases, and maintaining a continual process of communication with data users and privacy advocates about the changing nature of disclosure risks and the changing nature of research and other uses of released data.
In addressing the above considerations, we will seek to ensure that our article addresses and informs three different potential audiences: survey and database managers, privacy researchers, and data users.
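As a rough illustration of point (a), the sketch below counts "sample-unique" records, that is, records whose combination of quasi-identifying values appears only once in the file. Sample uniqueness is one common quantitative risk indicator; it is used here as an assumption for illustration and is not presented as the measure adopted in the Panel's report. The toy panel, the variable names (region, emp_w1, emp_w2, emp_w3), and the helper function are all hypothetical.

```python
from collections import Counter

def count_sample_uniques(records, keys):
    """Count records whose combination of `keys` values occurs exactly once."""
    combos = [tuple(record[k] for k in keys) for record in records]
    counts = Counter(combos)
    return sum(1 for combo in combos if counts[combo] == 1)

# Hypothetical toy panel: region plus employment status in three waves
# ("E" = employed, "U" = unemployed). Not real survey data.
panel = [
    {"region": "N", "emp_w1": "E", "emp_w2": "E", "emp_w3": "E"},
    {"region": "N", "emp_w1": "E", "emp_w2": "E", "emp_w3": "U"},
    {"region": "N", "emp_w1": "E", "emp_w2": "U", "emp_w3": "E"},
    {"region": "S", "emp_w1": "E", "emp_w2": "E", "emp_w3": "E"},
    {"region": "S", "emp_w1": "E", "emp_w2": "E", "emp_w3": "E"},
    {"region": "S", "emp_w1": "U", "emp_w2": "U", "emp_w3": "E"},
]

# Compare a cross-sectional view (wave 1 only) with two- and three-wave trajectories.
wave_keys = ["emp_w1", "emp_w2", "emp_w3"]
uniques_by_waves = [
    count_sample_uniques(panel, ["region"] + wave_keys[:n_waves])
    for n_waves in (1, 2, 3)
]

for n_waves, n_unique in zip((1, 2, 3), uniques_by_waves):
    print(f"waves used: {n_waves}, sample-unique records: {n_unique} of {len(panel)}")
```

In this toy panel, the number of sample-unique records rises from 1 to 2 to 4 as more waves are combined, which is the sense in which trajectories over time can make respondents easier to single out than any one cross-section. A real assessment would also need to account for sampling, population uniqueness, and the external information available to an intruder.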