Nov 26 2018

CanDIG provides one-stop analysis platform for secure genomics data

With strict controls over access to and use of genomics data, how do we create a system where researchers can better understand the role genes play in deadly diseases and cancers?

The Canadian Distributed Infrastructure for Genomics, or CanDIG, may be the solution. CanDIG is a national computation infrastructure for genomics analysis which aims to remove the barriers doctors and researchers face in discovering, exploring, and analyzing large genomics datasets that are increasingly generated in healthcare settings.

“Health data is sensitive and privacy controls are very high, as they should be. It’s hard to get data to leave the hospital, and even harder to get it to cross provincial boundaries,” explains Dr. Jonathan Dursi, a senior research associate and architect of CanDIG at SickKids in Toronto, Ont. Both federal and provincial regulations govern access to and use of health data collected in hospitals and other patient-care settings.


“We are building a platform where people can perform analysis over the federated data by sending the analysis to the data without actually exposing or sharing the data explicitly. We’re trying to make sure researchers can understand the problems that are too complex for any one hospital, institution or province to be able to tackle on their own,” said Dr. Dursi.

Through CanDIG’s completely distributed platform, data providers retain complete control over who can access their datasets and to what extent. Federated analysis is built on top of application programming interfaces, or APIs, so the data can be analyzed where it sits without being copied.
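To make the idea of “sending the analysis to the data” concrete, here is a minimal Python sketch. It is an illustration only, not CanDIG’s actual API: the `Site` class, the `count_variant` method, the `federated_count` function and the record layout are all hypothetical names invented for this example. Each site runs the query against its own records, decides for itself whether the user is authorized, and returns only aggregate counts.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Hypothetical data provider; its records never leave the site."""
    name: str
    records: list  # one dict per sample, e.g. {"variants": {"v1", "v2"}}
    authorized_users: set = field(default_factory=set)

    def count_variant(self, user: str, variant: str) -> dict:
        """Run the analysis locally and return only aggregate counts."""
        if user not in self.authorized_users:
            # The site alone decides who may query it.
            return {"site": self.name, "authorized": False,
                    "carriers": 0, "total": 0}
        carriers = sum(1 for r in self.records if variant in r["variants"])
        return {"site": self.name, "authorized": True,
                "carriers": carriers, "total": len(self.records)}


def federated_count(sites, user, variant):
    """Send the same query to every site and combine the aggregates."""
    results = [site.count_variant(user, variant) for site in sites]
    answered = [r for r in results if r["authorized"]]
    return {
        "carriers": sum(r["carriers"] for r in answered),
        "total": sum(r["total"] for r in answered),
        "sites_answered": [r["site"] for r in answered],
    }
```

In this sketch the coordinator never sees an individual record, only per-site totals, and a site that has not authorized the user simply contributes nothing to the result. A real deployment would add authentication, network transport and further privacy safeguards, none of which are shown here.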


The goal is to enable researchers to analyze genomic data on a Canada-wide scale, connecting them with both the data and the compute power required to support breakthroughs in genomics research and genetic sequencing. The implications are wide-ranging – not only will it help doctors and researchers better understand, treat and possibly prevent disease, it will also create a new approach to delivering secure, innovative and collaborative research with large genomics datasets.

“What makes CanDIG stand out on an international scale is the ability to federate the data across multiple jurisdictional boundaries – we’re building an infrastructure for federation of health data that the rest of the world is going to need in the next five to ten years.”

With sites in Toronto, Vancouver, and Montreal, CanDIG is a collaboration between HPC4Health directors Dr. Michael Brudno, professor at the University of Toronto, and Carl Virtanen, Director and Research Lead, UHN Digital; Guillaume Bourque, an associate professor at McGill and Director of Bioinformatics at the McGill University & Genome Quebec Innovation Center (MUGQIC); and Steve Jones, Associate Director, Canada’s Michael Smith Genome Sciences Centre, and professor at UBC and SFU.

CanDIG is funded by CFI’s Cyberinfrastructure Initiative, which supports research data infrastructure projects that create tailored, shared and integrated data resources capable of enabling leading-edge research on significant scientific, social and economic questions. The consortium is part of the Global Alliance for Genomics and Health (GA4GH), which is facilitating industry partnerships with major companies such as Google, Microsoft and Amazon, as well as other research centres.