


Learnable Evolution Models: A Solution in Need of a Problem


American University Senior Michael Krause describes how proteins bind in the immune system at a chalkboard.

Michael Krause had found a solution. In fall 2015, Michael, then a junior statistics major at American University, stumbled across a paper on the learnable evolution model (LEM), a computational process that uses machine learning to identify what distinguishes the individuals in a population that perform a given task well from those that perform it poorly.

Darwinian-type evolutionary models refine a population through small, largely random improvements over many evolutionary steps. LEM, by contrast, uses machine learning to make leaps in understanding, or "insight jumps," which can dramatically reduce the number of steps needed to evolve a population toward a good solution. The process can be applied in engineering design, drug design, economics, data mining, and other diverse fields.
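To make the contrast concrete, here is a toy Python sketch, not taken from Michael's software, that compares a Darwinian-type step (keep the better half, refill it with randomly mutated copies) with an LEM-style step (learn what separates a high-performing group from a low-performing group, then build new individuals that follow the learned pattern). The bit-string individuals, the count-the-ones fitness function, and the simple per-position "rule" used as the learned hypothesis are all simplifying assumptions; actual LEM work relies on more sophisticated machine learning methods.

```python
import random

# Toy fitness (assumption): count the 1s in a bit string ("OneMax").
# It stands in for "how well an individual performs the task".
def fitness(bits):
    return sum(bits)

def darwinian_step(population, rng, mutation_rate=0.05):
    """Darwinian-type step: keep the better half, then refill with copies of
    survivors whose bits are flipped at random (small, blind changes)."""
    ranked = sorted(population, key=fitness, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    children = []
    for _ in range(len(population) - len(survivors)):
        parent = rng.choice(survivors)
        children.append([1 - b if rng.random() < mutation_rate else b for b in parent])
    return survivors + children

def lem_step(population, rng, group_frac=0.3):
    """LEM-style step: learn a hypothesis that separates the high group from
    the low group, then create new individuals that satisfy it."""
    ranked = sorted(population, key=fitness, reverse=True)
    k = max(1, int(len(ranked) * group_frac))
    high, low = ranked[:k], ranked[-k:]
    # Hypothetical learned rule: at each position, prefer the bit value that is
    # more common in the high group; leave ties unconstrained (None).
    rule = []
    for i in range(len(ranked[0])):
        ones_high = sum(ind[i] for ind in high)
        ones_low = sum(ind[i] for ind in low)
        rule.append(1 if ones_high > ones_low else 0 if ones_high < ones_low else None)
    survivors = ranked[: len(ranked) // 2]
    # Instantiate the rule; unconstrained positions are filled at random.
    children = [
        [b if b is not None else rng.randint(0, 1) for b in rule]
        for _ in range(len(population) - len(survivors))
    ]
    return survivors + children

def evolve(step, n_bits=40, pop_size=20, generations=30, seed=0):
    """Apply one kind of step repeatedly; return the best fitness per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        population = step(population, rng)
        history.append(max(map(fitness, population)))
    return history

if __name__ == "__main__":
    print("Darwinian-type:", evolve(darwinian_step))
    print("LEM-style:     ", evolve(lem_step))
```

Both versions carry the better half forward unchanged, so the best fitness never decreases. The difference lies in how new individuals arise: random mutation changes only a few bits at a time, while the learned rule can set many bits at once. Comparing the two histories across a few seeds is a quick way to see this.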

Michael recognized that LEM was a promising research tool, and he looked for an application. "For a long time I had a solution in need of a problem," he explains. He read about vaccine therapy for cancer and realized he might have a match. "I was looking for an application already, and thought that oncological vaccine therapy might fit my solution. I got lucky when it did."

Vaccine therapy is emerging as an alternative to traditional cancer treatments such as chemotherapy and radiation. But the current methodology is slow, costly, and manual, which hampers research and keeps vaccines from becoming a widely available option. Only one therapeutic cancer vaccine, Provenge® for prostate cancer, is currently FDA-approved; otherwise, the treatment is available only through clinical trials. Michael thought computers could help. The challenge intrigued him because it sat at the intersection of a difficult math problem and an important application. Now a senior, Michael is developing software to put his idea into practice as part of four simultaneous independent studies with Julia Chifman and Elizabeth Malloy of the Mathematics and Statistics Department, Joshua McCoy of the Computer Science Department, and Katie DeCicco-Skinner of the Biology Department.

Cancer can progress when the immune system's T-cells and B-cells fail to recognize and bind to abnormal cells in the body. As a result, the immune system does not destroy the harmful cells. The goal of vaccine therapy is to introduce into the body a protein complex that will bind with both the cancer cells and the T- and B-cells, activating an immune response that destroys the cancerous cells.

Researchers begin with a sample from a patient's tumor. Currently, drawing on knowledge of proteins that have worked in other patients, they select protein complexes from the Protein Data Bank (PDB), a weekly updated database cataloguing the 3D structures of more than 100,000 biological molecules. They synthesize and physically test each complex against the tumor sample until they find one that binds appropriately. The process is expensive and time-consuming and has no standard procedure, and because every patient's tumor cells are unique, it must be repeated for each patient.

Michael's software uses LEM to expedite this process. A patient's tumor sample would be analyzed using mass spectrometry, producing data that represent the tumor cells digitally in an artificial immune system. Then, instead of choosing and synthesizing individual protein complexes, the program tests all potentially relevant complexes available in the PDB, currently around 45,000, digitally assessing the likelihood that each will bind properly with the tumor sample. The evolutionary algorithm selects the high-fitness complexes (the ones that bind best), then hands off to a machine learning algorithm, which uses regression analysis to extract a pattern among the best performers and identify the characteristics that make a candidate more likely to bind. The program uses that pattern to generate a new population of proteins, which is fed back into the evolutionary algorithm and refined further. The cycle repeats until the population is winnowed down to about 100 "elite" proteins with the desired binding properties.
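The software's internals are not described here, so the following is only a minimal Python and NumPy sketch of the select, regress, and generate cycle outlined above. Everything concrete in it is an assumption. Candidates are vectors of five made-up numeric descriptors. `binding_score` and `IDEAL_PROFILE` are hypothetical stand-ins for a real binding-likelihood model. The regression is ordinary least squares fitted on the elite candidates. New candidates are produced by nudging the elite along the fitted coefficient direction.

```python
import numpy as np

# Hypothetical stand-in for a binding-likelihood model: each candidate is a
# vector of made-up numeric descriptors, and the score is highest near an
# ideal descriptor profile that the search does not know in advance.
IDEAL_PROFILE = np.array([0.8, -0.3, 0.5, 0.0, 1.2])

def binding_score(candidates):
    """Higher is better. Purely illustrative, not a real binding model."""
    return -np.sum((candidates - IDEAL_PROFILE) ** 2, axis=1)

def lem_screen(population, rng, target_size=100, keep_frac=0.25, step=0.3, noise=0.05):
    """Repeat select -> regress -> generate until about target_size candidates remain."""
    if target_size < 2 or not 0 < keep_frac < 0.5:
        raise ValueError("need target_size >= 2 and 0 < keep_frac < 0.5")
    while len(population) > target_size:
        # 1. Evolutionary selection: keep the highest-fitness candidates.
        scores = binding_score(population)
        n_elite = max(target_size // 2, int(len(population) * keep_frac))
        order = np.argsort(scores)[::-1][:n_elite]
        elite, elite_scores = population[order], scores[order]
        # 2. Machine learning: least-squares regression of score on the
        #    descriptors of the elite, giving a direction of improvement.
        X = np.column_stack([np.ones(n_elite), elite])
        coef, *_ = np.linalg.lstsq(X, elite_scores, rcond=None)
        direction = coef[1:] / (np.linalg.norm(coef[1:]) + 1e-12)
        # 3. Generate new candidates by moving the elite along that direction,
        #    plus a little noise, and feed them into the next round.
        offspring = elite + step * direction + rng.normal(0.0, noise, elite.shape)
        population = np.vstack([elite, offspring])
    final_scores = binding_score(population)
    return population[np.argsort(final_scores)[::-1][:target_size]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    initial = rng.normal(0.0, 1.0, size=(45_000, 5))  # one row per candidate
    elite = lem_screen(initial, rng)
    print(elite.shape)  # (100, 5)
```

Keeping the elite unchanged in every round means the best score found so far is never lost. Requiring `keep_frac` below 0.5 guarantees that the population shrinks each round until about `target_size` candidates remain.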

After tests to rule out undesirable side effects (for example, selected proteins binding with each other to create toxic complexes), the program recommends the best candidates, which can then be synthesized and physically tested with the tumor sample to verify binding capability. If effective, the complex can be injected into the patient.
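One of the side-effect checks mentioned above, screening out selected proteins that might bind with each other, can be sketched in Python as a greedy filter. This is an illustrative assumption, not the program's documented method. `cross_binds` stands for some hypothetical pairwise predictor, and the candidate names and conflicting pairs in the example are invented.

```python
def filter_cross_binding(ranked_candidates, cross_binds, max_out=None):
    """Walk candidates from best to worst and keep one only if it is not
    predicted to bind with any candidate already kept.

    cross_binds(a, b) is a hypothetical predictor that returns True when two
    candidate proteins are expected to form an unwanted complex.
    """
    kept = []
    for candidate in ranked_candidates:
        if not any(cross_binds(candidate, other) for other in kept):
            kept.append(candidate)
            if max_out is not None and len(kept) >= max_out:
                break
    return kept

# Toy example with made-up candidate names and cross-binding pairs.
BAD_PAIRS = {frozenset({"P1", "P3"}), frozenset({"P2", "P4"})}

def toy_cross_binds(a, b):
    return frozenset({a, b}) in BAD_PAIRS

if __name__ == "__main__":
    print(filter_cross_binding(["P1", "P2", "P3", "P4", "P5"], toy_cross_binds))
    # ['P1', 'P2', 'P5']
```

The best-first order favors higher-ranked candidates whenever two of them conflict. Other strategies, such as searching for the largest conflict-free set, are possible but much more expensive to compute.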

Michael's program dramatically increases the number of complexes that can be tested against a given sample while also cutting down on protein synthesis, the most expensive and difficult part of the process. The software would make it faster, easier, and cheaper for labs to develop viable vaccines. His program is designed specifically for lung cancer, the second most common type of cancer in the United States, but the approach could be extended to other types of cancer.

Michael will be presenting his research at CAS's Robyn Rafferty Mathias Student Research Conference in April, and he has applied to present at other research conferences. He is interested in publishing a paper on his work.