Cancer research is one of the most dynamic applications of precision medicine. However, the collection of tumor and patient genomic sequences, protein biomarkers, tumor cells, radiological and molecular images, and clinical findings produces a superabundance of data, held in many datasets in repositories across the country and around the world. The difficulty of accessing all this data in a usable form slows the progress of cancer research.
The National Cancer Institute (NCI) has been working to corral the wealth of data generated by projects such as the Human Tumor Atlas Network and the Clinical Proteomic Tumor Analysis Consortium by integrating that data into the Cancer Research Data Commons (CRDC), which connects data repositories, analytic tools, computational platforms, and knowledge bases. To make this data especially useful, the NCI has selected a consortium led by the Broad Institute and including Seven Bridges, the Institute for Systems Biology and General Dynamics Information Technology to develop the Cancer Data Aggregator (CDA). The goal of the Cancer Data Aggregator, which was a key recommendation from the Cancer Moonshot Blue Ribbon Panel, is to enable researchers to seamlessly query and connect data that is distributed across entirely different types of repositories.
Seven Bridges and its consortium partners will collaborate closely with the NCI Center for Cancer Data Harmonization (CCDH) to develop an aggregator that will allow for harmonized metadata searches across diverse sources. It will enable users to identify which data types are available for given criteria and to understand where data resides within participating repositories, and it will return results in a format suitable for users’ downstream analysis needs. Importantly, it will let researchers aggregate data, via queries, across the Human Tumor Atlas Network, the NCI Cancer Research Data Commons (CRDC) and other Data Coordinating Centers and repositories with which NCI plans to interoperate. These integrative searches should yield additional knowledge to inform cancer treatment and discovery.
Seven Bridges will serve as a solution architect of the Cancer Data Aggregator, adding to the user-guided perspective of the consortium partners. The company’s insights into the research community enable it to identify user pain points and to design the types of output that researchers require for workflows and analyses. Additionally, Seven Bridges will draw on its experience as a subject matter expert in interoperability and harmonization efforts with several groups, including the Proteomic Data Commons, Genomic Data Commons and Gabriella Miller Kids First Data Resource Center, to guide the integration of services required to realize the Cancer Data Aggregator’s potential.
Seven Bridges has been a part of the CRDC since it was established by NCI in 2017. Before that, starting in 2014, Seven Bridges built one of the NCI Cloud Resources (formerly Cloud Pilots). The CRDC is an expandable data science infrastructure that connects separate data systems, repositories, analytic tools and knowledge bases, with the aim of enabling new discoveries in biomedical research.
Our Chief Scientific Officer, Brandi Davis-Dusenbery, expressed her excitement about the project: “We look forward to applying our extensive experience in democratizing access to data and computational resources. We are confident that by leveraging our many diverse partnerships, along with the expertise we have creating multi-omic, cloud-based infrastructures, we can help accelerate efforts to advance personalized medicine and improve human health.”
NCI awarded the Cancer Data Aggregator (CDA) project to the Broad Institute-led consortium on May 6. Work will begin in June.