## Summer student research program 2012

### Principal investigators

John Abowd and Lars Vilhuber

### Other investigators

Undergraduate students at Cornell with an interest in economics, statistics, mathematics, or high-performance computing.

### Sponsor

Funding provided by the ILR School

### Short description

We maintain the Social Science Gateway (SSG), a portal to the National Science Foundation's XSEDE (www.xsede.org) supercomputer facility. Using the SSG, we prepare large-scale analyses of local labor market data. Compute problems are then submitted to the XSEDE resources for computation and visualization.

We have begun to specify and estimate spatial-temporal models for these data. As is common for economic data analyses, the problems do not decompose cleanly into smaller independent components. The design structure for each model has to be analyzed and appropriate parallel optimization tools applied.

Example problems are listed below. Others can be specified. Interested students should contact the Labor Dynamics Institute at ldi@cornell.edu directly to inquire about any opportunities. Limited summer funding may be available.

*Problem 1:* Estimates of the total variability of the Quarterly Workforce Indicators. Using twenty released vintages of these data (250 GB of data per vintage), we use Matlab and R (preferred) to compute the variability due to data quality, edit, and imputation, based on generalized linear mixed models (GLIMMs). The structure of the parallel computation is well understood for this problem, but it has not been implemented. This problem would introduce students to the use of the XSEDE supercomputers and to the implementation and programming of GLIMMs, a statistical tool typically covered in first-year graduate statistics and econometrics courses.

*Problem 2:* Geospatial employment and residence data coded to the census block (8.2 million geographical units) have been released for the years 2002-2010. The Census Bureau's visualization of these data is an application called OnTheMap, which serves the research purposes of a general audience. Our Institute wants to develop visualization tools for research scientists. To do this, these data must be staged to the XSEDE data server in a manner that allows Matlab and R tools to work with them to produce block-level thermal maps. This is a challenging problem. (A NYC company called Social Explorer has done this for the American Community Survey and those fascinating maps of the changing U.S. demographics on the front page of the NY Times electronic edition are the result.)

These projects will draw on the mathematics, statistics, and economics courses that the students should have already taken, and will give them the opportunity to apply those skills in a collaborative research environment with Ph.D. students and senior researchers.