Researcher’s Code Could Be Key To Crunching Cancer’s Big Data

Aug 23, 2016

A drawing depicting a DNA molecule unwinding from a chromosome inside the nucleus of a cell.
Credit: National Human Genome Research Institute / genome.gov

At the most basic level, cancer can be defined as the DNA of a normal cell going haywire.

One way that DNA can go haywire is through something called aneuploidy, where a cell ends up with extra (or missing) copies of one or more of its 23 pairs of chromosomes. DNA also commonly goes haywire through structural variations, in which stretches of the DNA sequence are rearranged: think of the double helix model from middle school science, then move its pieces around.

Those changes can occur anywhere in the roughly 3.2 billion base pairs that make up the human genome. That means the data sets holding genome sequences for cancer cells can be huge.

Usually those changes are studied separately, but Carnegie Mellon University Associate Professor of Computational Biology Jian Ma is using servers at CMU to crunch data and see how aneuploidy and structural variations relate to each other.

“The advantage, of course, is if you look at them together, you can identify these events more precisely in an unbiased way,” Ma said. “You will be able to tell the timing of these events, so you will be able to know if this structural rearrangement happened before chromosome duplication or after chromosome duplication.”

Ma is doing this by looking at breast and cervical cancer genomes that have already been sequenced as part of The Cancer Genome Atlas. The Atlas is a federally funded project to catalog the genetic mutations responsible for cancer.

“As a computational biologist, our interest is to develop new algorithms, new analytic tools, that can look at these data from different perspectives, and then you can potentially identify new things that are relevant to the biology of cancer,” said Ma, who is pushing that huge data set through code he calls “Weaver.”

Adrian Lee, a professor of pharmacology and chemical biology at the University of Pittsburgh, said medicine keeps producing bigger and bigger data sets.

“And so now we are virtually dependent on computational biology to try to decipher what these changes mean,” Lee said. “Say you find a million changes, but only 20 are important. How do you find those 20? It’s kind of a needle in a haystack.”

Cancer evolves over time. Two biopsies of the same cancer, taken from the same patient a year apart, would look different at the DNA level, Lee said.

“We can measure the DNA and then try to build models to predict how it changed over time and then what it would have done in the future,” Lee said. “In that way we can predict how it’s going to change and then what we should target.”

Ma’s code was published in the journal Cell Systems, and he said he hopes scientists will use it on other data sets. Weaver has already found some duplicated chromosomal regions that are caused by specific structural variations.

Ma said he hopes to improve his algorithms to better understand how those changes evolve and then apply them to more samples from The Cancer Genome Atlas project. From there, the work could move into the clinical setting.

“Weaver at the moment is just an algorithm, there is no direct clinical impact yet,” Ma said. “But I think I will be interested in exploring potential opportunities like how to apply Weaver in the broader context.”

Researchers said someday this type of big data crunching could help doctors tailor a specific treatment to a patient’s specific cancer mutation.

In this week's Tech Headlines:

  • Pennsylvania officials are hoping to make access to the state’s massive amounts of data a bit easier. Gov. Tom Wolf’s administration this week launched OpenDataPA. Wolf said the goal is to be more transparent and accountable while creating economic opportunities for businesses and entrepreneurs. The project opened with 12 datasets, including information about job creation and school performance, with the expectation of more agencies and program areas coming on board soon.
  • Some of the most cutting-edge research on driverless cars is being done in Pittsburgh, but those cars might be in use across the state line sooner than in Pennsylvania. Officials with the Ohio Turnpike said they will likely start testing the cars within 12 months, and possibly before the end of the year. Officials said the Ohio Turnpike is ideal for testing self-driving cars because it's relatively straight and flat, with three lanes running in each direction.