The Economics of Socially-Efficient Privacy and Confidentiality Management for Statistical Agencies
John M. Abowd, Ian Schmutte (UGA), Lars Vilhuber
We propose to create a library of the most promising privacy-preserving algorithms and then systematically evaluate the trade-offs between accuracy and privacy when they are applied to real, large-scale databases. We will also establish new measurements of the relative social preferences for privacy protection and data accuracy. Overall, the proposal aims to guide public statistical agencies, primarily, toward the most promising methods for managing confidential information resources to maximal social benefit.
Using a variety of real social-science data sets, we will produce synthetic data to assess feasible combinations of privacy protection and statistical accuracy. Researcher users and our own replication exercises will illuminate the feasible scale of accurate, confidentiality-protected queries that can be addressed to large, sparse databases. We will formalize the production possibilities frontier, which gives the maximal statistical accuracy attainable at a specified level of privacy protection. We will measure preferences over privacy protection and statistical accuracy using surveys, both existing and new, as well as laboratory experiments.
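The idea of a frontier between accuracy and privacy can be illustrated with a minimal sketch. It assumes a single counting query protected by the Laplace mechanism under epsilon-differential privacy, which is a standard textbook mechanism and not necessarily one the proposed library would use. The function names (`laplace_noise`, `private_count`, `expected_abs_error`, `frontier`) and the example epsilon values are hypothetical. For a query with sensitivity 1, the noise scale is 1/epsilon, and the expected absolute error equals that scale, so stronger protection (smaller epsilon) costs accuracy.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and scale `scale`.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    # Laplace mechanism: add noise with scale sensitivity / epsilon.
    # Assumption: adding or removing one record changes the count by
    # at most `sensitivity`.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def expected_abs_error(epsilon, sensitivity=1.0):
    # For Laplace noise, the mean absolute error equals the scale.
    return sensitivity / epsilon


def frontier(epsilons, sensitivity=1.0):
    # Pairs (privacy loss, expected error): one point per epsilon.
    return [(eps, expected_abs_error(eps, sensitivity)) for eps in epsilons]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the illustration is repeatable
    for eps, err in frontier([0.1, 0.5, 1.0, 2.0]):
        noisy = private_count(1000, eps, rng)
        print(f"epsilon={eps}: expected |error|={err}, one noisy count={noisy:.1f}")
```

Each point returned by `frontier` is one feasible combination of privacy loss and accuracy for this mechanism alone. A real frontier would be traced over many mechanisms and query workloads, keeping the most accurate one at each privacy level.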
Based on our state-of-the-art algorithm library of privacy-preserving query mechanisms, the research will develop implementable mechanisms to generate provably accurate synthetic data with formal privacy protection. We will improve the synthetic data while staying within the budgeted privacy protection. We will analyze these synthetic data and the source data using realistic research problems. We will field surveys and experiments to measure preferences over accuracy and privacy.
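Staying within a budgeted level of privacy protection can be sketched as simple bookkeeping. The example below assumes pure epsilon-differential privacy with basic sequential composition, where the privacy losses of successive releases add up. The proposal does not specify an accounting method, so the `PrivacyBudget` class and its method names are hypothetical.

```python
class PrivacyBudget:
    """Tracks cumulative privacy loss under basic sequential composition.

    Assumption: each release satisfies pure epsilon-differential privacy,
    so the total loss is the sum of the per-release epsilons.
    """

    def __init__(self, total_epsilon):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self):
        return self.total - self.spent

    def spend(self, epsilon):
        # Refuse any release that would push the total past the budget.
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        return self.remaining


if __name__ == "__main__":
    budget = PrivacyBudget(total_epsilon=1.0)
    budget.spend(0.4)  # initial synthetic data release
    budget.spend(0.4)  # one refinement using the source data
    print(f"remaining budget: {budget.remaining:.2f}")
```

Under this accounting, every improvement to the synthetic data that consults the confidential source data draws down the same fixed budget, which is why the number and precision of refinements must be planned in advance.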
The expected outcome is improved access to detailed but confidential databases that preserves the privacy of respondents and subjects. The increase in access has a multiplier effect: future researchers will be able to make new findings and better inform policy, and their results can be used to improve the published data.