
Labor Dynamics Institute

Our mission is to create and make accessible novel data on the dynamics of labor markets. We work with research networks and statistical agencies to develop appropriate statistics that inform policy makers, researchers, and anyone seeking knowledge. We emphasize and meet the requirements of stakeholders, users as well as providers, balancing the utility of the data with the confidentiality of the people and businesses whose activities the data describe.

Abowd will be a keynote speaker at the Synthetic Data Workshop sponsored by The MITRE Corporation

John M. Abowd will be one of the keynote speakers at the Synthetic Data Workshop sponsored by The MITRE Corporation on July 29th.  

MITRE is hosting a technical exchange meeting (TEM) to discuss the challenges in generating and using synthetic data, a technology that has the potential to resolve some data access restrictions. See program details here.

Synthetic Data Publication and Access to Confidential Data

John M. Abowd

Abstract

An essential aspect of the synthetic data program that emerged from the NSF-ITR grant (2004-2007 to Cornell University with partners at Census, IAB, Carnegie-Mellon, Duke and Michigan) is the hierarchy of access modalities to confidential data: public-use, restricted-access, and supervised direct access. One important feature of this hierarchy is the role of the feedback cycle between synthetic data (a public-use mode) and restricted access to the underlying confidential data. Good-quality synthetic data involves many modeling decisions, even when the process appears to be automated. Synthetic data users represent a much broader community than any synthetic data development team can cover. When these users provide their results to the development team in exchange for validation on the underlying confidential data (a supervised direct-access mode), the quality of both the synthetic and the confidential data improves. We can now document this claim. This talk presents the current evidence and discusses the prospects for enhancing this feedback cycle.