Success Story

May 6, 2017

Scaling up cancer research

The Genomic Data Commons (GDC), an initiative of the National Cancer Institute (NCI), is a centralized national database and information system that gives researchers free access to cancer-related genomic and clinical data and lets them share it. A 2008 CBC Lever Award contributed to the establishment of the GDC, which was launched at the University of Chicago in June 2016. Since then, Bob Grossman, a member of the 2008 CBC Lever Award team, has expanded the GDC by establishing a new commons for cancer data called the Blood Profiling Atlas for Cancer (BloodPAC), which will be a home for liquid biopsy data with the goal of accelerating the discovery of new biomarkers.


Rick Stevens, PhD, above, is working with University of Chicago and Argonne National Laboratory researchers on a project to extract clinically relevant discoveries from the vast amount of cancer data. (Photo: Mark Lopez)

Currently, only a small minority of cancers has a known relationship between mutation and treatment that can inform today’s clinical decisions. The University of Chicago is at the epicenter of the search for new cancer targets and therapies in the massive — and rapidly growing — data of the National Cancer Institute (NCI).

The Genomic Data Commons (GDC), led by Robert Grossman, PhD, the Frederick H. Rawson Professor in Medicine and the College and chief research informatics officer for the Biological Sciences Division, launched in June 2016 with an announcement by Vice President Joe Biden. In the months since, the GDC has expanded to hold over five petabytes of data, accessed by more than 1,500 users a day.

Robert Grossman, PhD, talks with Vice President Joseph Biden at the Genomic Data Commons launch in June 2016, far left. The GDC now holds more than five petabytes of data and is accessed by 1,500 users a day.

The GDC unlocks the potential of the NCI’s vast archive of genomic and clinical data. Because these datasets have grown too large for most laboratories to download or analyze, the GDC provides a centralized and standardized repository and advanced tools so that researchers can work remotely. By working with this extensive data, scientists can find subtle cancer-related genetic effects and probe whether various combinations of drugs might be effective for particular cancer subtypes.

Since the launch of the GDC, Grossman has also led the development of a new commons for cancer data called the Blood Profiling Atlas for Cancer (BloodPAC), which will be a home for liquid biopsy data with the goal of accelerating the discovery of new biomarkers. Future projects will apply the data commons concept to other conditions, such as psychological disorders and traumatic brain injuries.

“We’re trying to change the way scientific discoveries are made by democratizing access to this large-scale data,” Grossman said. “We’ve been happy to see our community develop tools that allow these kinds of discoveries to be made on software applications that run on researchers’ desktop computers, with the needed data streaming from the GDC in real time.”

But data is only half the story. Fulfilling the promise of personalized medicine will also require powerful computation, beyond even the level of today’s most powerful supercomputers.

As part of the Exascale Computing Project, a Department of Energy initiative to push the frontier of supercomputer speed to one quintillion (or a billion billion) calculations per second, researchers at UChicago and Argonne National Laboratory will help extract clinically relevant discoveries from the huge and rapidly growing landscape of cancer data. The CANcer Distributed Learning Environment, or CANDLE, will develop “deep learning” methods, similar to those used to train self-driving cars, on clinical, experimental and molecular data in the hope of finding new hypotheses, drug targets and treatments for different types of cancers.

“It’s a huge computational problem,” said principal investigator Rick Stevens, PhD, associate laboratory director for computing, environment and life sciences at Argonne and professor of computer science at UChicago. “We have lots of data — millions of millions of experiments and expression data from 20,000 patients. But the models we need are still an open question. Once we train and develop the models, clinics can deploy them on relatively small systems and start using models to predict which drugs to give to a given patient.”

This is the second of a five-part series on data-driven medicine and research at the University of Chicago Medicine, originally published in the Spring 2017 issue of Medicine on the Midway.

Source: Adapted (with modifications) from the UChicago ScienceLife. Posted on April 25, 2017 by Rob Mitchum in Sidebar.


▸ Chicago Tribune—Joe Biden highlights Genomic Data Commons at UChicago in a speech on the “Cancer Moonshot” initiative
▸ CBC-Funded Project Plays a Major Role in a Presidential Initiative
▸ 2008 Lever Grant to the Chicago Center for Systems Biology