Current research projects supported by the Zorro HPC System, organized below in reverse chronological order:
Genomics of a deep terrestrial subsurface nematode
John Bracht (CAS/Biology)
The surprising discovery of a nematode from the deep terrestrial subsurface, Halicephalobus mephisto, fundamentally altered our perception of the adaptability and ecological range of metazoa. This finding has profound implications for our understanding of the evolution of life on Earth and the search for life on exoplanets. In this project, we will assemble the genome of this organism and determine the mechanisms by which it adapted to an extreme environment containing very little O2, high pressure, and elevated temperature. Halicephalobus mephisto is a moderately thermophilic (35-41°C), parthenogenetic, bacteriophagous nematode discovered within a fluid-filled fracture, 1.4 km beneath the surface in South Africa. The high pCH4 (2.5 bars), high sulfide (~1 ppm) and low pO2 (0.07 bars) present significant environmental challenges to this aerobic metazoan. Nematodes are notoriously adept at adapting to novel environments, having explored a wide range of free-living, zooparasitic, or phytoparasitic environments. The process of environmental adaptation and selection leaves a lasting imprint on genomes, giving us a powerful approach to study the adaptation of this metazoan to its extreme environment.
Milieu: Defining Racial Context with Geolocation Data
Ryan Moore (SPA/Government)
Across disciplines, scholars strive to better understand individuals' milieus—the people, places, and institutions individuals encounter in their daily lives. In particular, political scientists argue that racial contexts shape individuals' attitudes about candidates, policies, and people of various races and ethnicities. Yet the current standard of measuring milieus is to place survey respondents in one or two geographic containers like counties or census tracts and then to ascribe all of that container's characteristics to the individual's milieu. Using a new dataset of over 2.6 million GPS records from over 400 individuals, we compare conventional static measures of racial context to dynamic and precise measures of their milieus. In particular, we demonstrate how low-level static measures (such as census block) tend to overstate how extreme individuals' racial contexts are, and how this overstatement can lead to underestimates of racial contextual effects.
Genetic Algorithms for Experimental Design
Ryan Moore (SPA/Government)
Blocking before treatments are assigned can improve the precision of causal estimates, make experimental estimates more robust to unlucky randomizations or inadequate parametric adjustment, and preserve power, even in relatively small group-randomized trials. But how should experimentalists best incorporate valuable covariate information into blocked designs? We focus on the critical question of setting the covariate weights that define the extent to which experimental units are similar or different. Using simulated and applied data, we assess the relative performance of covariate weights derived from two sources: first, from genetic matching algorithms which explicitly seek to optimize balance; second, from Mahalanobis distances without explicit balance-seeking. We show that balance in randomized experiments that use genetic covariate reweighting is superior not only to that obtained by random allocation, but also to that obtained by Mahalanobis-metric blocking alone, even when done optimally. Further, we show that a genetic algorithm can directly improve the robustness of a design to unobserved confounding when tuned to do so. Our results provide guidance for experimentalists and offer new insights about how to improve practice in experimental design.
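As a rough illustration of the Mahalanobis-metric baseline mentioned above, the sketch below pairs experimental units on two covariates. It is a minimal example under stated assumptions, not the project's method: the function names, the two-covariate restriction, the sample data, and the greedy pairing rule are all hypothetical choices made for brevity, and greedy pairing is neither optimal blocking nor genetic covariate reweighting.

```python
from itertools import combinations


def covariance_2d(points):
    """Sample covariance (sxx, sxy, syy) of a list of (x, y) covariate pairs."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return sxx, sxy, syy


def mahalanobis(p, q, cov):
    """Mahalanobis distance between two units, using a 2x2 covariance matrix."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = p[0] - q[0], p[1] - q[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is (1/det) * [[syy, -sxy], [-sxy, sxx]].
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 ** 0.5


def greedy_pairs(points):
    """Form blocks of size two by repeatedly taking the closest unused pair."""
    cov = covariance_2d(points)
    candidates = sorted(
        combinations(range(len(points)), 2),
        key=lambda ij: mahalanobis(points[ij[0]], points[ij[1]], cov),
    )
    used, pairs = set(), []
    for i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs


# Hypothetical units described by (age, income).
units = [(20, 30000), (21, 31000), (50, 90000), (52, 88000)]
blocks = greedy_pairs(units)
```

Treatment would then be randomized within each block. Reweighting the covariates, as the abstract describes, amounts to changing the metric used in `mahalanobis`.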
Cost-Effectiveness of a Lifeline Telephone Crisis Center for Reducing Emergency Department Visits and Preventing Suicide, 2009-2014
Brian Yates (CAS/Psychology)
Suicide is a serious public health problem with considerable societal costs. However, few previous studies have compared the costs of suicide prevention programs to positive outcomes for society as well as the individual. Suicide prevention hotlines are widespread and provide suicide prevention for callers in crisis. However, the cost-effectiveness of such hotlines is unknown. This study will obtain data from a large Lifeline call center serving a tri-state area in the United States for the period January 1, 2009 to January 1, 2014. We will test whether the empirically-based Lethality Assessment conducted by Lifeline workers leads high-risk callers to seek emergency department (ED) services. As EDs can be over-utilized by suicidal persons, we hypothesize that Lifeline can triage only those callers who need immediate medical care. Lifeline is a low-cost alternative to ED care for low-risk callers. This research includes the immediate and direct costs of preventable ED visits related to suicide. Additionally, we hypothesize that Lifeline is a cost-effective suicide prevention program in that it increases Quality Adjusted Life Years (QALYs) for high-risk callers who receive ED care. The cost-utility analysis of QALYs will examine whether the cost of operating Lifeline yields an economic benefit, that is, a low incremental cost ratio for service operations compared to QALYs gained.
Analysis of Periodic Point Processes
Kevin Duke (CAS/Mathematics)
Our work focuses on extracting information from periodic point processes. These problems arise in numerous situations, from radar pulse repetition interval analysis to bit synchronization in communication systems. We divide our analysis into two cases: periodic processes created by a single source, and those processes created by several sources. We wish to extract the fundamental period of the generators, and, in the second case, to deinterleave the processes. We are developing efficient algorithms for extracting the fundamental period from a set of sparse and noisy observations of single and multiple source periodic processes. The algorithms are computationally straightforward, stable with respect to noise, and converge quickly.
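To make the single-source problem concrete, the sketch below estimates a period from sparse, noisy event times using a tolerance-based variant of Euclid's algorithm on the gaps between events. This is only an illustrative stand-in, not the project's algorithm: the function names, the tolerance parameter, and the sample data are assumptions. If every observed gap happens to be a multiple of twice the true period, the sketch returns that multiple instead.

```python
def approx_gcd(a, b, tol):
    """Euclid's algorithm on noisy reals: stop once the remainder is within tol."""
    a, b = max(a, b), min(a, b)
    while True:
        # Symmetric remainder: distance from a to the nearest multiple of b.
        r = abs(a - b * round(a / b))
        if r <= tol:
            return b
        a, b = b, r


def estimate_period(times, tol):
    """Estimate the fundamental period from event times with missing events."""
    ts = sorted(times)
    gaps = [t2 - t1 for t1, t2 in zip(ts, ts[1:])]
    rough = gaps[0]
    for g in gaps[1:]:
        rough = approx_gcd(rough, g, tol)
    # Refine: each gap spans a whole number of periods, so average over all of them.
    counts = [round(g / rough) for g in gaps]
    return sum(gaps) / sum(counts)


# Hypothetical observations: true period 2.5, several events missing, small jitter.
observed = [0.02, 2.49, 7.51, 10.0, 17.48]
period = estimate_period(observed, tol=0.2)
```

The tolerance should be larger than the timing jitter but well below the period. Deinterleaving several sources is a harder problem and is not attempted here.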
Solving Twisty Puzzles
Donna Dietz (CAS/Mathematics)
Many solutions abound for the well-known Rubik's Cube puzzle. Similarly, any "twisty puzzle" made up of regular corner/edge/center pieces can be solved using basic tools of abstract algebra. However, if a sufficient number of pieces are fused together, a "bandaged puzzle" emerges, and the mathematics behind this is very complicated. In fact, it is known to be at least NP-hard, as a class of puzzles. All bandaging patterns for the standard Rubik's Cube have been solved, but beyond this, there is still much which is unknown about this class of puzzles. For a further explanation of the ideas in this paragraph, please see Jaap Scherphuis' explanation at: http://www.jaapsch.net/puzzles/pspace.htm
The project we are currently working on is to count (or estimate) the color-free configurations of a particular bandaging pattern for the Megaminx. The Megaminx is a dodecahedral puzzle with each face having a center, 5 edges, and 5 corners. The bandaging pattern we are investigating is one we found on an internet chat board for twisty puzzles. (The thread is presently at: http://www.twistypuzzles.com/forum/viewtopic.php?f=9&t=18677)
Fast functional integrals with applications to dynamical systems
John Tillinghast (CAS/Mathematics)
Functional integrals (also called "path integrals") are integrals over function spaces, such as the set of Brownian motion paths from one point to another. They have been used for decades in probability and physics. As high-dimensional integrals they tend to be computation-intensive, generally requiring Monte Carlo sampling. This talk explains how to use sparse Laplace approximation, including higher-order Laplace approximation, for functional integrals. We introduce a way to calculate the higher-order terms in O(N) time, where N is the number of time points. For an infectious disease data set, we use functional integrals to estimate the model parameters. We compare the parameter estimates and approximate integrals to results from importance sampling. For this example, the higher-order approximation is extremely accurate, and even the basic approximation gives very good estimates of the model parameters. Speed is significantly greater than for STAN (a new, general-purpose tool for Hamiltonian MCMC) while getting near-identical parameter estimates.
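For readers unfamiliar with the basic (first-order) Laplace approximation, the sketch below applies it to a one-dimensional integral with a known answer: n! equals the integral of x^n e^(-x) over x > 0. It is a toy illustration only. The function names and the finite-difference step are assumptions, and it does not implement the sparse or higher-order functional-integral method described in the talk.

```python
import math


def second_derivative(f, x, h=1e-2):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


def laplace_integral(f, mode, h=1e-2):
    """First-order Laplace approximation to the integral of exp(-f(x)) dx.

    `mode` must be the minimizer of f; the integrand is replaced by the
    Gaussian that matches its height and curvature at that point.
    """
    curvature = second_derivative(f, mode, h)
    return math.exp(-f(mode)) * math.sqrt(2.0 * math.pi / curvature)


n = 10


def neg_log_integrand(x):
    """f(x) = x - n*log(x), so that exp(-f(x)) = x**n * exp(-x)."""
    return x - n * math.log(x)


approx = laplace_integral(neg_log_integrand, mode=float(n))
exact = math.factorial(n)
ratio = approx / exact
```

For n = 10 the basic approximation falls a little under one percent short of the exact value, which is the kind of gap that higher-order correction terms aim to close.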
Three Essays on the Interaction of Fiscal Shocks and Budgetary Constraints on the Public Education Sector
Michael Hayes Independent research student
My dissertation examines how budgetary constraints on school districts create a host of challenges for public managers, including higher levels of teacher turnover, as well as an unequal distribution of state funding across school districts. I plan to use the HPC to run bootstrapped quantile regressions.
Migration, Population Imbalance and Decentralization in Indonesia
Smriti Tiwari Independent research student (CAS/Economics)
Most studies on migration focus on the impact of migration and/or remittances on the economic development of the migrant sending countries. But would better opportunities at home generated through economic development deter migrants? By using the unique features of decentralization in Indonesia, this paper aims to get at the role of local development on migration behaviors. In the case of Indonesia, it will also shed light on its implications, if any, on the differences in population pressures.
Characterization of Felid SINEs
Kathryn Walters-Conte (CAS/Biology)
Short-interspersed nuclear elements (SINEs) comprise a class of nuclear DNA that can define evolutionary history. Abundant in mammalian genomes, these transposable elements can characterize lineages. A SINE family, 'CanSINE', has been described in the order Carnivora. We are pursuing examination of these motifs in the Feliformia suborder (cat-like animals) and Felidae family (cats) with respect to the distribution of conserved and non-conserved insertion sites, the utility of these sequences as markers for hybridized cats and the implementation of SINEs as tools in undergraduate education.
Gender Impact of Malawi Input Subsidy
Paul Winters (CAS/Economics)
Alleviating gender differences in agricultural productivity is not only a matter of equity but is also vital to the goal of reducing poverty. While a number of studies suggest that gender differences in agricultural productivity are a result of female farmers having access to fewer resources, few studies investigate the role of agricultural interventions in alleviating the constraints to input use and subsequently the gender gap in productivity. For the first time, this study investigates whether there are gendered gains in agricultural productivity from participating in an input subsidy program. Using nationally-representative data that is disaggregated at the plot level, this study analyzes the large-scale voucher-based Farm Input Subsidy Program in Malawi. Since beneficiaries are not randomly selected, an instrumental variable approach is used to explore the relationships, focusing on the total value of output per hectare. The relationships are evaluated at each decile of the productivity distribution. The findings suggest that the input subsidy program benefits male farmers, with those at the lower end of the productivity distribution incurring the largest gains. Female farmers, on the other hand, do not appear to achieve productivity gains from participating in the program, which suggests that they face additional constraints to productivity apart from non-labor input use.
Exchanging Fire: Trade, Conflict, and the Strategic Incentives of Indirect Economic Interdependence
David Ohls (SIS)
How do indirect economic ties (participation in common global networks of production and trade) dampen conflict incentives between antagonistic pairs of states? Using a formal model of resource allocation in the context of dyadic conflict, I show that latent economic interdependence reduces fighting incentives. I then test this empirically using dyad-year panel data from 1948 to 2000. I find evidence consistent with the hypothesis that mutual reliance on the same outside trading partner and densely-linked trade networks strongly decreases the likelihood of interstate disputes and increases scores on conflict-cooperation scales. This effect is particularly strong in rivalrous dyads, which frequently come into conflict; although such pairs of states tend to have limited direct economic engagement, they often share great underlying structural potential for cooperation. These results have important implications for theoretical research on the links between economics and security, dyadic rivalries, and the role of third parties in international disputes.
Land Titling and Investment in Sub-Saharan Africa
Woubet Kassa Independent research student (CAS/Economics)
The role of property rights in resource allocation has been one of the central themes in development economics. There have been extensive theoretical arguments that property rights in land are closely associated with the allocative efficiency of agricultural resources as well as investment decisions. However, empirical findings have not been conclusive. This has been complicated by possible endogeneity of titles, unobserved heterogeneities and the non-experimental nature of the data. This study employs various econometric tools to address these challenges using the Living Standards Measurement Study surveys of six Sub-Saharan African countries.
Stochastic Demand Theory of Gene Regulation
Corinne Abolafia (MA Candidate, Mathematics)
Dr. Tuncay Alparslan (CAS/Mathematics and Statistics)
(Thesis for MA in Mathematics) We develop a stochastic model based on continuous-time Markov chains for selection between different modes of gene regulation.
Information Theoretic Modeling
Dr. Amos Golan (CAS/Economics/Info-Metrics Institute)
Dr. Heath Henderson (Research Associate, Info-Metrics Institute)
Skipper Seabold (PhD Candidate, Economics)
This project, joint with Heath Henderson and Skipper Seabold, develops an improved information-theoretic estimator. It is a computationally semi-intensive method that has proved to dominate other traditional methods for all finite and complex data.
The Determinants of Student Retention at a Private Selective Post-Secondary Institution
Dr. Seth Gershenson (SPA/DPAP)
Despite progress in leveling the playing field for disadvantaged college students, there continue to be gaps in college completion rates between students of different backgrounds. These gaps have the potential to perpetuate socioeconomic disparities, as four-year degrees become more of a prerequisite for labor market success. Existing research suggests that differential completion rates are driven almost entirely by differences in the persistence of admitted students, as opposed to differences in admission rates. Accordingly, the proposed research project seeks to identify the determinants of persistence at a selective private post-secondary institution.
Large Truck Crashes & State Roadside Inspection Practices: A Spatial Analysis
Janine Bonner (Independent Researcher)
This project will combine the practices of analyzing large data, mapping, and drawing horizontal conclusions, while utilizing the statistical practices of linear regression and spatial analysis. The goal of the project is to draw correlations between individual state inspection practices and fatal crash rates by state. The intention is to show whether or not better regulatory compliance of state inspection practices helps reduce crash rate and/or severity.
Harnessing the power of parallel computing for computer-aided drug discovery
Dr. Stefano Costanzi (CAS/Chemistry & Center for Behavioral Neuroscience)
My research revolves around the study of the cellular targets of drugs, the identification and pharmacological characterization of molecules that modulate their activity, and the examination of the cellular consequences resulting from such pharmacological intervention. Through the application of computational and experimental biochemical pharmacology techniques, the ultimate goal of my laboratory is to rationally identify molecules potentially endowed with a desired pharmacological activity and subsequently test their biological effect on mammalian cells that express the target of interest either naturally or artificially. The main research focus is on the discovery of compounds that act through G protein-coupled receptors (GPCRs), the single family of cellular targets most exploited by currently marketed medicines.
"Preliminary" Global Liquidity and Corporate Risk-Taking
Dr. Valentina Bruno (Kogod/Finance)
I plan to investigate whether global liquidity provided by the intermediary sector through cross-border capital flows has increased the corporate risk-taking by firms before the financial crisis.
Bruno, Valentina, and Hyun Song Shin. 2014. "Globalization of corporate risk taking." Journal of International Business Studies. Available: http://www.palgrave-journals.com/doifinder/10.1057/jibs.2014.12
Teacher Quality, Student Behavior, and Student Achievement
Seth Gershenson (SPA/DPAP)
This project seeks to estimate the effect of misbehavior on academic performance, the effect of teachers and peers on misbehavior, and the implications of omitting measures of misbehavior from value-added models of the education production function for rankings and estimates of teacher effectiveness.
"Panel data evidence on the effect of school size on student achievement" (joint with Laura Langbein) at the 2013 AEFP and AERA Annual Meetings.
Spencer Foundation Research Grant for "Linking Teacher Quality, Student Attendance, and Student Achievement," 2013-14, $39,427.
Development of a Local Spatial Indicator of Association Based on Modified Moran's I
Jess Chen (CAS/Economics)
Local indicators of spatial association (LISAs) are used to detect clusters in spatial data. I derive and implement a LISA based on Jackson et al.'s Modified Moran's I (2010). I also conduct a simulation study to compare the power of this test against that of existing LISAs under various scenarios of underlying populations, spatial weight matrices, local and global clusters, and various degrees of data sparseness. This is one part of my dissertation, which will be on methods and applications in spatial statistics.
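As background for the LISA idea, the sketch below computes the standard global Moran's I and the conventional local Moran statistic for a small set of regions. It is a textbook illustration under stated assumptions: the function names, the toy data, and the binary adjacency weights are hypothetical, and it does not implement the Modified Moran's I or the new LISA derived in this project.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # total weight
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)


def local_morans_i(values, weights):
    """Conventional local Moran statistic I_i = (z_i / m2) * sum_j w_ij z_j."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n
    return [
        (dev[i] / m2) * sum(weights[i][j] * dev[j] for j in range(n))
        for i in range(n)
    ]


# Hypothetical example: four regions in a row, each adjacent to its neighbours.
x = [1.0, 2.0, 3.0, 4.0]
w = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
global_i = morans_i(x, w)
local_i = local_morans_i(x, w)
```

With these definitions the local values sum to the total weight times the global statistic, which is the sense in which a LISA decomposes a global measure into per-location contributions.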
Listening to Noise
Justin Grana (CAS/Economics)
Faculty Sponsor: Alan Isaac (CAS/Econ)
This project examines whether noise in trader activity influences the behavior of institutional traders.
Geospatial Determinants of Voting Behavior
Andrew Breza (SPA/Public Policy)
Faculty Sponsor: Alan Ford (CAS/GIS/Computer Science)
Does the distance that an individual lives from a polling place affect his or her likelihood of voting? Scholars and practitioners have written thousands of articles and books on why some people choose to vote while others do not, but many of them ignore local geography or focus on individual cities. This study seeks to model the decision to vote as a function of the distance that an individual lives from a polling place. Instead of focusing on a single city, as past researchers have done, this study uses the voter registration data and 2008 and 2012 voter histories from six states in order to generalize results. Because of the diversity of the sample, distance is entered alongside several other independent variables, including access to and use of public transportation, income, unemployment, dominant local industries, and several other factors. Regressions will include a logistic regression with "Voted in person" as its dependent variable, and a multinomial logit regression whose dependent variable has three categories: voted in person, voted absentee, and did not vote. This study represents a unique contribution to voting choice literature due to the size and diversity of its sample, block-level demographic data, and use of a Geographic Information System (GIS).
Interval-Valued Data Estimation
Tual Tuang (CAS/Economics)
Faculty Sponsor: Amos Golan (CAS/Economics)
There is a growing body of literature dealing with interval data. Most of it focuses only on the first and second moments of the interval and then applies classical regression-type estimation. This project will approach the problem from an information-theoretic perspective and consider higher moments in estimating and predicting such interval-valued data.
Three Projects on Taxes, Revenue, and Budgets
Daniel Mullins (SPA/Public Administration and Policy)
The Future of the Property Tax: Institutional Factors which Shape its Acceptability, Yield and Burden Distribution
The Role of Local Revenue and Expenditure Limitations in Shaping the Composition of Debt and Its Implications for Efficiency, and Intergenerational and Intergovernmental Equity in Local Public Finance
Population Sorting, Economic Segregation and Growing Fiscal and Budgetary Disparities in U.S. Communities
Paper 1: This study seeks to first update the findings of Mullins and Mikesell (2009) in an effort to better understand the lag properties of subnational tax bases with particular scrutiny placed upon property tax dynamics. Specifically, we assess the relative performance of tax bases, both vertically (local vs. state jurisdictions) and horizontally (across tax instruments).
Paper 2: The proposed paper is the first study to examine whether the enactment of TELs has changed the composition of debt in local governments. Particularly, we are interested in whether governments constrained by binding TELs have experienced increases in their ratio of debt to general revenue for financing government services and whether there has been a shift toward non-guaranteed debt.
Paper 3: This paper uses Census of Population and Housing Data, the American Community Survey and the Government Finance Series to identify the factors that lead to increased disparity and economic sorting as well as those that provide a revenue net to communities with the most limited resource capacities. It concludes that economic sorting is becoming more pronounced and that such sorting artificially and geographically constrains the resource pool available for addressing public policy needs across all communities.
The Effects of Mexico's Non-Contributory Health Insurance on Infant Mortality
Dr. Tobias Pfutze (CAS/Economics)
The objective of this project is to measure the impact of Mexico's recently established non-contributory health insurance program "Seguro Popular" on the country's infant mortality and neonatal mortality rates, especially for the most vulnerable population. Controlling for municipality level fixed effects, the estimations will take advantage of the program's staggered roll-out as an exogenous variation of assignment into treatment.
Banks, Market Organization, and Macroeconomic Performance: An Agent-Based Computational Analysis
Dr. Boris Gershman (CAS/Economics)
This project is an exploratory analysis of the role that banks play in supporting the mechanism of exchange. It considers a model economy in which exchange activities are facilitated and coordinated by a self-organizing network of entrepreneurial trading firms. Collectively, these firms play the part of the Walrasian auctioneer, matching buyers with sellers and helping the economy to approximate equilibrium prices that no individual is able to calculate. Banks affect macroeconomic performance in this economy because their lending activities facilitate entry of trading firms and also influence their exit decisions. Both entry and exit have conflicting effects on performance, and we resort to computational analysis to understand how they are resolved. Our analysis sheds new light on the conflict between micro-prudential bank regulation and macroeconomic stability. Specifically, it draws an important distinction between "normal" performance of the economy and "worst-case" scenarios, and shows that micro prudence conflicts with macro stability only in bad times. The analysis also shows that banks provide a "financial stabilizer" that in some respects can more than counteract the more familiar financial accelerator.
Affine Models of the Term Structure
Barton Baker, PhD candidate (CAS/Economics)
My dissertation topic is fitting and solution methods of affine models of the term structure. The first two chapters will focus on extending the applicability of informing macroeconomic variables to the pricing kernel of the time series of yields on government bonds of various maturities, focusing specifically on aggregate uncertainty (Chapter 1), and real time data (Chapter 2). This added information will be tested with out-of-sample tests in the U.S. and possibly Europe. Chapter 3 will present a solution class written in Python and C that I wrote from scratch for affine models of the term structure and possibly present a broad theoretical approach to solving affine models of the term structure, with and without unobserved factors. Many of these calculations involve complex operations and many iterations, so the resources provided by the HPC will benefit me immensely.
Informal Employment in Egypt: Learning from Modeling with Essential Heterogeneity
Dr. Natalia Radchenko (CAS/Economics)
The paper focuses on the wage differentials between formal/informal sectors on the Egyptian labour market. The objective is to analyse the nature of the informal employment (involuntary engagement of workers in a segmented labor market versus voluntary choice of workers). The Egyptian labor market has significantly evolved since major structural adjustments were implemented in 1991. In particular the public sector, which employed the majority of highly educated workers, has decreased significantly. The growth in private formal employment has not, however, been sufficient to absorb the growing labour force and the arrival of large cohorts of civil servants. Consequently, informal employment has grown significantly in the last two decades. All these trends are likely to increase or modify labour market segmentation. At the same time, they are likely to influence workers' behaviour and induce behavioural changes regarding the informal sector. We thus seek to investigate this issue by focusing on sectorial wage gaps using recently developed nonparametric methods to estimate the model with essential heterogeneity. In particular, the marginal treatment approach is used to investigate whether wage gaps are due to compensatory differentials or to segmentation between non-competing groups.
Exploratory Analysis of Price Changes in
Dr. Karaesmen Aydin (Kogod/ITEC)
We investigate how salespeople use the information provided to them to set the prices in business-to-business transactions. Of particular interest to us is how salespeople use price recommendations coming from a decision support tool. We do this by building reduced-form models and testing those on a data set obtained from a grocery products distributor.
Donor Disasters or Disaster Donors: Analysis of Data from the American Red Cross
Dr. Karaesmen Aydin (Kogod/ITEC)
There is a close relationship between donor behavior and marketing communications for every non-profit organization that relies on gifts from its donors to fund its services: Marketing communications and interactions may influence individual donor behavior and vice versa. Yet, in the end, what matters is not an individual donor and his gift but the total amount that has been collected. We participated in a research proposal competition and won an award, in the form of a data set, from the American Red Cross. In this project, we will build "explanatory" and "predictive" models to study donor behavior. Specifically, we will investigate what factors influence repeat donations, and how marketing communications influence the frequency and magnitude of giving.
Color vision and hyperspectral images
Dr. Arthur Shapiro (CAS/Psychology)
The laboratory is investigating various aspects of color vision. In particular, we are analyzing hyperspectral images in order to develop algorithms to defeat camouflage of human-made objects.
Examining Differential International Responses to HIV/AIDS
Dr. Nathan Paxton (SIS)
This project explores the role that organizational learning processes play in state HIV/AIDS policy development. The puzzle addressed is the large degree of variation in policy output across states that are similar in terms of political or economic character. Although one can tell individual stories about each country, the overall variation defies the cross-applicability of many typical explanations. Where states better draw lessons from experience we should expect two results. First, structural characteristics of the state or of the set of HIV policy responders affect the character and degree of learning: the configuration of decision-making authority and information analytics interacts with the learning process, affecting the lessons drawn and policies pursued. Second, over time we observe some degree of policy convergence among states due to comparison and adaptation from others. The dissertation employs a mixed-methods approach. As a plausibility probe, econometric analysis tests for such patterns. The research employs an original dataset of 72 countries over 6 years and approximately 25 variables. To address data missingness, multiple imputation techniques were used. There were statistically and substantively significant relationships and patterns, warranting further exploration of the underlying processes.
The Effect of Spatial Dependence on the Empirical Likelihood
Dr. Monica Jackson (CAS/Mathematics & Statistics)
The empirical likelihood is a nonparametric likelihood function that is analogous to its parametric counterpart. In particular, observations are assumed to be independent and identically distributed. There are several empirical likelihood research papers regarding dependent data. However, there is no literature concerning the effect of spatial dependence on empirical likelihood procedures that are carried out assuming independence. This research presents the effect of such a violation on the asymptotic distribution of the empirical likelihood ratio. To determine whether the sampling distribution follows the specified distribution, we propose a spatially weighted Kolmogorov-Smirnov Goodness-of-Fit test.
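For reference, the classical (unweighted) one-sample Kolmogorov-Smirnov statistic is the largest vertical distance between the empirical distribution function of a sample and a reference distribution function. The sketch below computes that baseline only. The function names and sample are assumptions, and the spatial weighting proposed in this project is not reproduced here.

```python
def ks_statistic(sample, cdf):
    """Classical one-sample KS statistic: sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x):
    """CDF of the Uniform(0, 1) distribution."""
    return min(1.0, max(0.0, x))


# Hypothetical sample compared against Uniform(0, 1).
d_stat = ks_statistic([0.7, 0.1, 0.4], uniform_cdf)
```

The classical test assumes independent observations, which is exactly the assumption that spatial dependence violates.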
Accelerating Social Science Analysis for a New Age (ASSANA): Moving from Traditional Methods for Analyzing Large Scale Text-Based Data to Socially Intelligent High-Performance Computational Methods
Dr. Derrick Cogburn (SIS/International Communication)
The purpose of this project is to develop, test, refine and disseminate a repeatable interdisciplinary methodology and a related software tool for the computer-assisted analysis of large-scale text-based social science data. In keeping with the recent MOSAIC report, our overarching goal is to stimulate the next generation of social science research by providing analytical resources and interdisciplinary training to conduct textual analysis in PC and HPC environments. Key deliverables will be: 1) procedures for the ASSANA methodology; 2) an open-source HPC software tool; and 3) capacity building on ASSANA through workshops, seminars, and publications.
Robust Long-Term Streamflow Forecasting
Dr. Inga Maslova (CAS/Mathematics & Statistics)
The objective of this research project is to develop and demonstrate a new data-driven modeling approach to provide long-term forecasts of streamflow. The modeling approach will incorporate wavelet-based analysis techniques used in statistical signal processing and a multivariate relevance vector machine (MVRVM) that uses a Bayesian regression method. We will develop a methodology that detects patterns in changes in Pacific sea surface temperature (SST), snowpack and streamflow using wavelet decomposition. This information will then be used to improve the forecasting potential of the MVRVM.
An Adaptive Truncated Product Method
Dr. Xuguang Sheng (CAS/Economics)
In the multiple testing literature, Zaykin et al. (2002) developed a truncated product method that combines only those p-values less than some pre-specified threshold, but the lack of a clear choice of truncation point becomes a major obstacle to its more widespread use. We solve this problem by proposing an adaptive truncated product method that optimizes the selection of the truncation point among a set of candidate cut-off values. We then develop a bootstrap re-sampling procedure to efficiently estimate the distribution of the adaptive method. We illustrate the performance of the proposed method through Monte Carlo simulation and an empirical example in the context of panel cointegration tests.
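The idea of choosing the truncation point from a set of candidates can be illustrated with a minimal sketch. It is not the authors' implementation: the candidate cut-offs, the function names, and the use of simulated independent Uniform(0,1) p-values in place of the bootstrap procedure are all assumptions made for illustration.

```python
import bisect
import math
import random


def log_truncated_product(pvalues, tau):
    """Log of the product of the p-values that are <= tau (0.0 if none are)."""
    return sum(math.log(p) for p in pvalues if p <= tau)


def adaptive_tpm(pvalues, taus=(0.01, 0.05, 0.1, 0.5, 1.0), n_sim=2000, seed=0):
    """Return (overall p-value, selected cut-off) for a min-p style adaptive test.

    Assumption: the null distribution is simulated from independent
    Uniform(0,1) p-values, standing in for a bootstrap.
    """
    rng = random.Random(seed)
    n = len(pvalues)
    null_sets = [[rng.random() for _ in range(n)] for _ in range(n_sim)]

    # Null statistics for every candidate cut-off, kept unsorted and sorted.
    null_stats = {}
    null_sorted = {}
    for tau in taus:
        stats = [log_truncated_product(s, tau) for s in null_sets]
        null_stats[tau] = stats
        null_sorted[tau] = sorted(stats)

    def tail_prob(stat, tau):
        # Share of null statistics at least as small (as extreme) as stat.
        return bisect.bisect_right(null_sorted[tau], stat) / n_sim

    # Per-cut-off p-values for the observed data; keep the smallest.
    observed = {tau: tail_prob(log_truncated_product(pvalues, tau), tau)
                for tau in taus}
    best_tau = min(taus, key=lambda t: observed[t])
    min_observed = observed[best_tau]

    # Apply the same "minimum over cut-offs" rule to each simulated data set
    # so that the selection step is accounted for in the final p-value.
    null_min = [min(tail_prob(null_stats[tau][b], tau) for tau in taus)
                for b in range(n_sim)]
    exceed = sum(1 for m in null_min if m <= min_observed)
    return (exceed + 1) / (n_sim + 1), best_tau


if __name__ == "__main__":
    print(adaptive_tpm([1e-6, 1e-5, 0.2, 0.8, 0.5]))
```

Calibrating the minimum over cut-offs against the same minimum computed on null data is what keeps the selection step from inflating the false positive rate; with dependent p-values the uniform simulation would need to be replaced by a resampling scheme that preserves the dependence.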
Improving Measurements of Neighborhood Attributes at Multiple Spatial Scales Using the Geostatistical Method of Kriging
Dr. Michael Bader (CAS/Sociology)
With the growth of interest in accurately measuring neighborhood environments to study the influence of neighborhoods on individual-level outcomes, investigators have focused on improving two aspects of measurement: developing methods to create theoretically relevant measures and defining neighborhoods with relevant boundaries at appropriate spatial scales. Unfortunately, advancements made to improve the theoretical relevance of measures have been largely incompatible with defining appropriate neighborhood boundaries and vice-versa. In this paper, we argue that many neighborhood characteristics that social scientists are interested in studying should be conceptualized as changing from block-to-block rather than changing according to a patchwork of predefined discrete ecological units. We describe how a geostatistical method known as kriging can be combined with the existing econometric framework—an innovative method for measuring theoretically relevant attributes of discrete, ecological units—at small scales to develop city-block level estimates of theoretically nuanced measures that can then be flexibly reconfigured to multiple definitions of neighborhood boundaries. Using a cross-validation study with data from a 2002 systematic social observation of physical disorder on 1,663 city-blocks in Chicago, we show that this method creates valid results under assumptions of normality. We then demonstrate, using neighborhood measures aggregated to three different spatial scales, that the relationship between residents' perceptions of fear and neighborhood characteristics varies substantially across different spatial scales.
Bader, Michael DM, and Jennifer A. Ailshire. 2014. "Creating Measures of Theoretically Relevant Neighborhood Attributes at Multiple Spatial Scales," Sociological Methodology. Available: http://smx.sagepub.com/content/early/2014/02/07/0081175013516749.full
Forecasting Financial Data with Agent Based Models
PhD Candidate Georgi Panterov (CAS/Economics)
The purpose of this research is to build a rich, multi-agent artificial stock market where agents have endogenous expectations. Agents are able to act as sophisticated econometricians employing modern methods like Artificial Neural Networks and Genetic Algorithms. Unlike in standard financial models, agents are able to learn and update expectations using various Bayesian and non-Bayesian rules. This project will build a model that replicates some of the standard characteristics of financial markets such as volatility clustering and fat tail return distributions. At the end, there will be an attempt to calibrate the model using some real world price/orders data.
Exploiting Entanglement for Simulation of Few Body Systems
Dr. Nathan Harshman (CAS/Physics)
Team Members: Noel Klingler, Ryan Tillis
The goal of this project is to characterize entanglement in few body systems, and then to use this knowledge to optimize calculation and computation of few body dynamics. Few body systems are important at many physical scales, but this project will focus on atomic systems because implementations of quantum information processing devices, like ultracold atoms in optical lattices, require a precise understanding of few body dynamics. For example, few body effects are limiting sources of decoherence and loss in atomic interferometer experiments. The key to this method is that it finds a natural basis for efficient computation by choosing observables to describe the system such that energy eigenstates have minimal entanglement. For each additional particle in a simulation, complexity grows rapidly, and increases in efficiency become critical. More generally, studying entanglement is a theoretical probe that exposes kinematic and dynamical symmetries in both bound state and scattering problems. In this light, characterizing entanglement in few body systems shows how preferred physical observables are selected by the interactions even in complicated multiparticle systems. The outcomes of this project will include specific computational applications to cold atom systems and general results about entanglement in few body systems.
Cofactor Prevalence and the Prevalence of HIV
Dr. Alan Isaac (CAS/Economics)
The objective of this research is to investigate the role of cofactor infections in understanding the heterosexual spread of HIV. Recent research applies agent-based models to the epidemics of HIV in sub-Saharan Africa. The use of agent-based modeling in epidemiology, including in the epidemiology of HIV, is relatively recent but has been very influential. Agent-based methods allow researchers to readily model the social determinants of disease incidence and prevalence, including the detailed networks of sexual partnership. Many sub-Saharan African countries have HIV prevalence that is one to four dozen times the prevalence in other countries. Explaining this enormous difference has consumed substantial research effort, since it has important implications for treatment and prevention. Conventional explanations of high prevalence of HIV in sub-Saharan Africa have focused almost exclusively on sexual behavior. However, there are strong reasons to explore other explanations. Specifically, a more detailed examination of the disease environment is likely to contribute more to understanding HIV prevalence in the region than variance in sexual behavior has been able to. One reason to expect this is the research that shows that sexual behavior in high-prevalence countries often appears to be more conservative than in many lower-incidence countries, like the US and the UK. Another reason to expect this emerged in recent research by Sawers, Isaac, and Stillwaggon (SIS). SIS found a substantial difficulty in the agent-based literature that has proposed that a high prevalence of sexual-partnership concurrency underpins high HIV prevalence. SIS showed that the agent-based research had ignored the implications of a broad empirical literature on coital dilution. This research extends the SIS model to explore how cofactors can make sexual networks more effective at spreading HIV.
For this research, our working hypothesis is that cofactor infections are key to understanding the heterosexual spread of HIV in sub-Saharan Africa and perhaps elsewhere.
Research on Multivariate Probability Distributions
Dr. John Nolan (CAS/Mathematics & Statistics)
Some of my current research is related to classes of probability distributions. The stable laws and the extreme value laws generally do not have closed form expressions for their densities and cdfs. I am developing programs to compute these quantities, and some of them are computationally intensive. Some of these formulas are parallelizable, and I would like to experiment with this.
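As an illustration of why such densities are computationally intensive, the sketch below evaluates the density of a standard symmetric stable law by numerically inverting its characteristic function exp(-|t|^alpha). It is only a rough example under stated assumptions: the function name, the truncation tolerance, and the use of Simpson's rule are illustrative choices, not the programs described above, and accuracy degrades for small alpha or large |x|.

```python
import math


def symmetric_stable_pdf(x, alpha, n_steps=20000, tol=1e-12):
    """Density at x of the symmetric stable law with characteristic
    function exp(-|t|**alpha), computed as
        (1/pi) * integral_0^inf exp(-t**alpha) * cos(x*t) dt
    with Simpson's rule on a truncated interval (n_steps must be even).
    """
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    if n_steps % 2:
        raise ValueError("n_steps must be even for Simpson's rule")

    # Truncate where the envelope exp(-t**alpha) drops below tol.
    upper = (-math.log(tol)) ** (1.0 / alpha)
    h = upper / n_steps

    def integrand(t):
        return math.exp(-(t ** alpha)) * math.cos(x * t)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0 / math.pi


if __name__ == "__main__":
    # alpha = 1 is the Cauchy law; alpha = 2 is a normal law with variance 2.
    print(symmetric_stable_pdf(0.5, 1.0), 1 / (math.pi * 1.25))
    print(symmetric_stable_pdf(1.0, 2.0), math.exp(-0.25) / (2 * math.sqrt(math.pi)))
```

Each evaluation point x requires its own integral and does not depend on any other point, which is one reason calculations of this kind lend themselves to being spread across many processors.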
Ideas, Technology, and Growth: The Impact of the Printing Press
Dr. Jeremiah Dittmar (CAS/Economics)
The goal of Dr. Dittmar's research is to deepen the understanding of the role of technology in economic growth. In this project, he will focus on the adoption of the movable type printing press—the closest historical parallel to the emergence of the internet—and its impact on city growth in Europe. While economists recognize that economic activity is highly concentrated in cities (Glaeser, 2009; Lucas, 1988) and that ideas are at the heart of growth (Lucas, 2009; Mokyr, 2005; Romer 1994), the precise mechanisms through which ideas in cities create growth are unknown. To investigate the mechanisms, Dr. Dittmar will construct a new database on the diffusion of the printing press and print media, on the acquisition of mathematical skills, and on innovations in business practice. He will use the database to measure the extent to which information technology contributed to these useful ideas, which in turn transformed the economic geography of Europe and contributed to the emergence of modern growth. Preliminary work (Dittmar, 2009) shows that cities where the printing press was adopted in the late 1400s had no prior growth advantages, but subsequently grew significantly faster than cities without a press.
Dr. Dittmar's work has intellectual merit because his study of how the great pre-internet revolution in information technology transformed patterns of growth will contribute to economists' assessment of the endogenous and unified theories of growth. Dr. Dittmar's work will have broad impact because his new data on the local production of knowledge will be used across many disciplines including economic geography, the history of science, and urban history.
The Effect of Federal Subsidies on the Outcomes of Children in Foster Care
Drs. Hansen and Reynolds will simulate the effect of extending federal subsidies to all children in foster care on the health outcomes of child victims of abuse and neglect. Currently, state child welfare systems cannot claim the same federal support for all of the children in foster care. States therefore have an incentive to provide more services to children who are eligible for federal support. This project will be the first to measure the extent to which eligibility for federal support influences services provided and health outcomes, and it will be the first to simulate the effect of broadening federal support to all children in foster care. It is critically important to understand how the structure of federal incentives affects outcomes because over $25 billion in child welfare services are provided annually to nearly a million children (DeVooght, Allen, and Geen, 2008). Dr. Hansen's expertise is in the economics of foster care policy. Hansen and Hansen (2006) and Hansen (2007, 2008) have shown that children who are eligible for federal subsidies for adoption after foster care get higher levels of support. The proposed project extends the work to consider health outcomes and services provided while children are still in foster care. Dr. Reynolds brings to the project her expertise in structural modeling (Feinberg and Reynolds, 2010; Reynolds (née Olson), 2004), which is the preferred, but computationally intensive, method for estimation of underlying policy-invariant parameters in the policy simulation (Heckman, 2000).
The project has significant intellectual merit; it will transform the interdisciplinary dialog between economics and social work by providing evidence-based predictions for a change in policy. The project's broader impact on at-risk children cannot be overemphasized. Policy that improves outcomes for abused and neglected children can have very high rates of return; for example, the return on society's investment on adoption of children from foster care exceeds 100 percent (Hansen, 2008).
A Sample of Bankruptcy Court Records, 1898-2005
This project will construct a long-run micro-level data set from original bankruptcy case files. Despite considerable study of personal bankruptcy, we still do not understand why personal bankruptcy rates have risen so dramatically in the late 20th century, nor do we understand the relationship between the business cycle and business and personal bankruptcy filings. The accumulated scholarship in law, economics, and sociology leads us to believe that these relationships are complex. They are mediated by both the letter and practice of the bankruptcy law, other credit laws, the liquidity of financial institutions, and the regulations imposed on credit markets. The creation of a micro-level data set on bankruptcy that covers a long time period will accelerate research that has been held back by data that are weak relative to the importance of the problem. Initial stages of this research have been funded by the Institute for New Economic Thinking and the Alfred P. Sloan Foundation.
Special Education Technologies
Dr. Sarah Irvine Belson (CAS/SETH)
The objective of Dr. Irvine Belson's project is to design applications that allow teachers and students with behavioral disorders to collect behavioral data using handheld devices and then to manipulate the data in a virtual, 3D environment. While self-monitoring and self-graphing have been proven to support students' acquisition of socially appropriate and academically useful behaviors (Cartledge et al., 2008; Maag, 1999), there has been little development of tools that allow teachers and students to study behaviors over time and across multiple settings. The study of behaviors across settings is critical because students who fail to meet teacher expectations of social behavior are at an increased risk for unfavorable school outcomes. For example, these students have poor interactions with teachers and peers, poor academic performance, and high rates of disciplinary problems (Nowicki, 2003). Dr. Irvine Belson has experience directing school-based research projects examining implementation of high-end technology and telecommunications in the classroom. She has trained pre-service and in-service teachers in electronic communication and technology integration, and she serves as a consultant to schools and businesses on design, implementation, and analysis of technology-based applications for instruction.
The intellectual merit of this project lies in its pedagogical innovation. The use of handheld devices to generate data on behaviors that can be quickly and cheaply visualized has the potential to transform instruction in special education. The broader impact of this program lies in its contributions to the research base on technology in education and, for society more broadly, in the potential of these tools to support long-term student achievement.
Spatial and Mixed Method Orientations in Native American Histories
Dr. Adrea Lawrence (CAS/SETH)
Dr. Lawrence's work aims to show how teachers from the Bureau of Indian Affairs (BIA), their supervisors, and Native Americans acted on (or appropriated) federal Indian policy within on-reservation day schools. Specifically, Dr. Lawrence asks: How have the actions of BIA day school personnel and Native communities adapted and affected federal Indian policy locally over time? What relationships have existed between actions of BIA day school personnel, tribal members, and other BIA officials? There are only a handful of studies on the day school system, despite the fact that teachers and supervisors were effectively the agents of the U.S. government and were responsible for implementing a wide range of policies (Carter, 1995; Gere, 2005). The small amount of research does not reflect the wealth of primary source material. To effectively organize the material so that the specific aims can be addressed, Dr. Lawrence employs a metaphor from geography—sediment and sedimentation—to visualize the cumulative effects of individual action, group decisions, social ecologies, and the physical environment across space and over time. Dr. Lawrence will develop an open source digital tool that will allow researchers to incorporate geospatial, qualitative and quantitative source material. Dr. Lawrence is uniquely trained to accomplish this work. She is an education historian who specializes in federal Indian policy and has published in the fields of policy analysis, history, and ethnography (Lawrence, 2008, 2009; Lawrence & Cooke, 2010; Winstead, Lawrence, Brantmeier & Frey, 2008).
The intellectual merit of Lawrence's project lies in its potential to untangle epistemological questions stemming from the differences between the spatial orientation of Native scholars and the temporal-historical orientations of non-Native scholars (Deloria, 1992; Meyer, 2008). The reconfiguration of the history of federal Indian policy through simultaneous sedimentary (corresponding to temporal-historical) and spatial analyses has the potential to reshape the study of education history by facilitating the incorporation of rich qualitative data. The broader impact of this research will be felt in the field of educational policy studies because sedimentary analysis could, for example, help make successful school programs more easily replicable.
Modeling Exposure-Response Relationships
Dr. Elizabeth Malloy (CAS/Mathematics and Statistics)
Dr. Malloy will examine the utility of computationally-intensive, data-adaptive approaches to splines and other smoothers in Cox regression models. Smoothing avoids a priori specification of the functional form of the relationship between exposure and a response. Because inferences about the relative risk of a response are drawn from the final model, selection of the model is critical. Dr. Malloy's preliminary studies show that there is excessive variation in model fit when different standard smoothing methods are applied to the same data (Govindarajulu, et al. 2009; Malloy et al. 2009). In this project, Dr. Malloy proposes to (1) implement model averaging methods within a given class of smoothers, and (2) extend the "super learner" algorithm (van der Laan et al. 2007) to estimate the dose-response from a set of candidate smoothers within Cox models based on cross-validated risk.
Dr. Malloy's research has intellectual merit because it will investigate the benefits of model averaging and the feasibility of using the super learner algorithm in estimation with smoothing methods. The broader impact of the research is in its application in epidemiology. The work will inform understanding of the underlying factors related to disease incidence or mortality. For example, Dr. Malloy and her collaborator, Dr. Ellen A. Eisen (School of Public Health, UC – Berkeley) plan to use the results of the research to estimate the incidence of skin cancer in relation to cumulative exposure to oil-based metalworking fluids in a cohort study of 23,650 autoworkers.
A Scientific Computing Toolkit for the Volunteer Grid
Dr. Michael Black (CAS/Computer Science)
Dr. Black will implement a mathematics and scientific computing toolkit that uses volunteer grid computing resources. The toolkit will equip the proposed HPC server with software to solve common mathematics problems using large-scale volunteer parallel resources and an interface allowing researchers to submit problems to the server. The server will divide the problems into sub-problems and dispatch the sub-problems to volunteers' computers to solve while they would otherwise sit idle. The toolkit will utilize the volunteer grid using the Berkeley Open Infrastructure for Network Computing (BOINC) framework (Anderson, 2004). Existing parallel solutions to these applications generally require researchers to dedicate their own resources to the problem and to have experience with software development (Ghuloum, 2007). On the other hand, most existing BOINC projects are specialized. Software in the proposed toolkit will be designed to solve satisfiability equations, calculate gradient descent, and compute graph coloring—very common problems. Dr. Black is prepared to complete this toolkit. He has published on porting the BOINC framework to new platforms (Black and Edgar, 2009) and has recently completed "proof-of-concept" satisfiability and gradient-descent solvers that use the BOINC framework.
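The divide-and-dispatch idea can be sketched for a small satisfiability problem. This is only an illustration: it makes no BOINC calls, the sub-problems run one after another on the local machine, and every function name and the prefix-splitting rule are hypothetical choices rather than the toolkit's design. Clauses are lists of non-zero integers, where a positive integer v means variable v is true and a negative one means it is false.

```python
from itertools import product


def split_into_subproblems(n_fixed):
    """One sub-problem per truth assignment of variables 1..n_fixed."""
    return [dict(enumerate(bits, start=1))
            for bits in product([False, True], repeat=n_fixed)]


def satisfies(clauses, assignment):
    """True if every clause contains at least one literal that holds."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses)


def solve_subproblem(clauses, n_vars, fixed):
    """Brute-force search over the variables that the sub-problem leaves free.

    In a volunteer grid, this is the unit of work a single volunteer
    machine would receive (an assumption made for this sketch).
    """
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        assignment = dict(fixed)
        assignment.update(zip(free, bits))
        if satisfies(clauses, assignment):
            return assignment
    return None


def solve(clauses, n_vars, n_fixed=2):
    """Server-side loop: split, 'dispatch' (here: run locally), collect."""
    n_fixed = min(n_fixed, n_vars)
    for fixed in split_into_subproblems(n_fixed):
        result = solve_subproblem(clauses, n_vars, fixed)
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    # (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
    print(solve([[1, -2], [2, 3], [-1, -3]], n_vars=3))
```

Because the sub-problems share no state, each one could in principle be handed to a different machine and the first satisfying assignment returned; fixing more variables yields more, smaller work units.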
The intellectual merit of this research lies in its innovative code to port the sub-problems through BOINC. The broader impact of this research lies in the potential of the toolkit to harness unused computing resources to help the worldwide community of researchers quickly solve important scientific problems.
Dr. Black's long-term research goal is to create a "meta BOINC project," not tied to any application, allowing less computer-savvy researchers to use the volunteer grid in their research and giving volunteers a single place they can go to contribute to many different research projects. The proposed toolkit is the first step in this meta-project.
Designing Novel Many-Body Quantum States of Ultracold Atoms using Dynamically Transforming Optical Lattices
Dr. Philip R. Johnson (CAS/Physics)
The objective of Dr. Johnson's research is to develop and simulate a new method for creating and probing quantum states of ultracold matter not ordinarily existing in nature. He will use dynamical transformation of optical lattice potentials. Optical lattices are, essentially, crystals of light that can hold and control atoms suspended at the potential minima in a vacuum. The shapes of optical lattices can be dynamically transformed by manipulation of laser beams. Dr. Johnson will simulate the creation of quantum states by a sequence of single-well splitting and double-well merging operations on arbitrary pairings of adjacent lattice sites. Experimental tests of the simulations are within reach of leading experimental groups, including Dr. Johnson's collaborators at the Joint Quantum Institute of NIST and the University of Maryland, but experiments have not been performed because of the complexity of the modeling required to understand the physics. Dr. Johnson's previous work using numerical simulations of the macroscopic quantum mechanics of superconducting qubits uses similar methods (Johnson, 2003; Berkley 2003) and he has the requisite knowledge of optical lattices to carry out this work (Spielman, 2007; Johnson, 2009).
This research has intellectual merit because it designs a fundamentally new method for studying many-body states with optical lattices and because it is likely to yield explicit predictions that can be tested in the lab. Optical lattices have great promise as analogs for studying quantum phase transitions; therefore, the broader impact of the work is that it will contribute to the important goal of designing revolutionary materials such as high-temperature superconductors.