Making Use of Statistics in and outside the Classroom
For some, it can be difficult to figure out what to do after graduation. For Philip Gautier, it meant earning a master’s in statistics at American University, after graduating from the University of Maryland in 2008 with a degree in economics. Since beginning his studies at AU, Gautier has been able to work on fascinating projects with two faculty members.
Gautier has had the opportunity to collaborate with Professor Mary Hansen of the Department of Economics, who is examining the effects of recessions on government subsidies to families with foster children. The research looks at outcomes for children in the foster care system and measures how long children stay in a given foster care setting, since frequent moves between settings are a sign of an unhealthy system.
Gautier’s work with Hansen began when she came into the Statistical Consulting Center of the Department of Mathematics and Statistics. The center, which is staffed by graduate students including Gautier, is open to students and faculty for consultation on projects with statistical applications. Hansen came in for assistance with downloading, cleaning (looking for typos and mistakes in the data), and analyzing the data. “Results can be useless if you don’t clean properly,” Gautier explains. “This database is coming from so many different agencies with different practices. It is an ongoing process, and it takes forever.” As it turned out, cleaning and reformatting the data took six months (about half of the project time).
Throughout the project, Gautier has been looking at data spanning the entire country from 1995 to 2008. Examining how the recessions in that period affected both the levels of subsidies to foster families and the lengths of stay, he has found mixed results. The preliminary findings show that subsidy amounts increased while the number of families offered subsidies decreased. Gautier explains that during these recessions the federal government increased funding to state governments to encourage them to keep up with subsidies even as revenue fell. All local government foster care and adoption agencies are required to report to the Adoption and Foster Care Analysis and Reporting System (AFCARS), from which Gautier obtains the data for the project. AFCARS is a powerful database because it is not a survey or sample but contains information on every child in the system. Gautier also works with a database specific to Washington, D.C., through its Child and Family Services Agency, which provides anonymized data. The database allows him to look at outcomes across years in D.C., work with massive amounts of data, and track large numbers of children while keeping the information private.
Gautier has been working with Hansen for 13 months, and in January the preliminary results were presented at the Allied Social Science Associations conference in Denver. He received funding from the university through the Mellon travel grant to attend the conference. “The goal of the research is to use the evidence that we find in data to analyze the effects of government policy towards kids in foster care,” explains Gautier, adding that the research could be used to inform policymaking in the future.
Gautier has found his work with Hansen beneficial: “It was an opportunity to apply some of the statistical analysis and statistical computing methods I am learning now to socioeconomic-related questions.” Hansen has found the extra assistance valuable: “Philip has been a tremendous asset because of his focus. He is genuinely interested in mastering the difficulties of sifting through great mounds of data in order to extract relevant information. His particular focus has been essential for the project we are tackling together, because the base of the project is 8.5 million observations of administrative data; 8.5 million observations would intimidate most people, but not Philip!”
Gautier has also been assisting Professor Betty Malloy of the Department of Mathematics and Statistics with a machine-learning algorithm for combining statistical models. His work has been with an algorithm called the Super Learner, which was developed by Mark J. van der Laan of the University of California–Berkeley. Malloy has been working on a long-term study of workers in the automotive industry to determine whether their exposure to metalworking fluids is related to the occurrence of cancer. A statistical model describes the relationship between two or more variables. Gautier explains that the big question is, “Which model does one use?”
The Super Learner creates an ensemble of models and combines them into a single model. “You can use the strengths of multiple models together,” explains Gautier. He is using cross-validation, which checks each model against data that was held out when the model was fitted. Any data set contains both signal and noise (just as a cell phone call carries both words and static). Gautier explains that the signal is the information that actually reveals something, while the noise (the static) comes from randomness in the data set. The danger of any statistical procedure is that a person might not be able to tell the difference between signal and noise. Cross-validation guards against this, because a model that has merely fit the noise will predict the held-out data poorly.
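The idea of combining models by cross-validation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the project: the data are simulated, the three candidate models (a constant, a straight line, and a cubic) are chosen for convenience, and a coarse grid search over weights stands in for whatever optimizer a real Super Learner implementation would use. All function and variable names here are hypothetical.

```python
import itertools
import numpy as np

def fit_mean(x, y):
    """Candidate 1: always predict the training mean."""
    mean_y = y.mean()
    return lambda x_new: np.full(len(x_new), mean_y)

def fit_polynomial(degree):
    """Candidates 2 and 3: least-squares polynomial of a given degree."""
    def fit(x, y):
        coefs = np.polyfit(x, y, degree)
        return lambda x_new: np.polyval(coefs, x_new)
    return fit

def out_of_fold_predictions(x, y, learners, n_folds=5, seed=0):
    """Predict every point with models fitted WITHOUT that point's fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), n_folds)
    preds = np.zeros((len(x), len(learners)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(x)), held_out)
        for j, fit in enumerate(learners):
            predict = fit(x[train], y[train])
            preds[held_out, j] = predict(x[held_out])
    return preds

def best_convex_weights(preds, y, steps=20):
    """Grid-search non-negative weights summing to 1 that minimize
    the cross-validated mean squared error of the weighted blend."""
    best_w, best_mse = None, np.inf
    for head in itertools.product(range(steps + 1), repeat=preds.shape[1] - 1):
        if sum(head) > steps:
            continue
        w = np.array(head + (steps - sum(head),)) / steps
        mse = np.mean((preds @ w - y) ** 2)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w, best_mse

# Simulated data (an assumption for illustration): a sine curve is the
# signal, and the added Gaussian term is the noise.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=200)
y = np.sin(x) + rng.normal(scale=0.3, size=200)

learners = [fit_mean, fit_polynomial(1), fit_polynomial(3)]
preds = out_of_fold_predictions(x, y, learners)
single_mse = np.mean((preds - y[:, None]) ** 2, axis=0)  # each model alone
weights, ensemble_mse = best_convex_weights(preds, y)     # the blend

print("cross-validated error of each model:", single_mse)
print("ensemble weights:", weights)
print("cross-validated error of the blend:", ensemble_mse)
```

Because the weight grid includes the option of putting all the weight on a single model, the blend's cross-validated error in this sketch can never be worse than that of the best individual candidate.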
Gautier took two mathematical statistics courses with Malloy, which led to his work with her: “I knew that she was working on this algorithm, and I have a particular interest in machine-learning, so I went for it and asked if I could work with her.”
The data being analyzed include year-by-year information about employees and their levels of exposure, as well as other important information, including age and lifestyle. The Super Learner has appeared in the literature, but it has yet to be used with the Cox model, which analyzes the time until an event that happens just once. In this case, the event is an initial cancer diagnosis. Gautier explains that his work with Malloy aims to apply the algorithm to the Cox model, so they can analyze the automotive workers’ data set and help determine the risks of cancer associated with metalworking fluids. In the end, findings from this work will help to inform policies regarding the safety practices of metalworkers.
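For readers curious about how a Cox model treats a one-time event, the following Python sketch may help. It is a toy example under explicit assumptions, not the study's analysis: the six records are made up, there is a single exposure variable, no two events share the same time, and the coefficient is found by a simple grid search rather than a proper optimizer. A positive coefficient means higher exposure goes with earlier diagnosis in these made-up data.

```python
import numpy as np

def cox_log_partial_likelihood(beta, time, event, x):
    """Cox log partial likelihood for one covariate (assumes no tied event times).

    For each observed event, compare the affected person's risk score
    with the scores of everyone still under observation at that time.
    """
    total = 0.0
    for i in np.flatnonzero(event):
        at_risk = time >= time[i]
        total += beta * x[i] - np.log(np.sum(np.exp(beta * x[at_risk])))
    return total

# Hypothetical records: years of follow-up, whether a diagnosis was
# observed (1) or follow-up ended without one (0), and an exposure score.
time = np.array([2.0, 3.0, 5.0, 7.0, 11.0, 13.0])
event = np.array([1, 0, 1, 1, 0, 1])
exposure = np.array([0.9, 0.2, 0.7, 0.8, 0.1, 0.3])

# Grid search for the coefficient that makes the observed ordering of
# diagnoses most likely.
betas = np.linspace(-5.0, 5.0, 201)
loglik = np.array(
    [cox_log_partial_likelihood(b, time, event, exposure) for b in betas]
)
beta_hat = betas[np.argmax(loglik)]
print("estimated exposure coefficient on the grid:", beta_hat)
```

Records with `event` equal to 0 still contribute: they count among those at risk until their follow-up ends, which is how this kind of model uses information from workers who were never diagnosed.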
While working toward his master’s degree, Gautier has been able to apply material from the classroom to real-life projects involving copious amounts of data. These projects are just two examples of the tremendous work AU students are completing while still in school.