The Data Repository Service API on Seven Bridges: Towards Global Interoperability

A major step towards global interoperability

For the first time, users on CAVATICA can import datasets such as TOPMed from NHLBI BioData Catalyst powered by Seven Bridges onto the CAVATICA platform for analysis. Likewise, users on NHLBI BioData Catalyst can import Kids First datasets from CAVATICA in the same manner. Until now, there was no way to co-analyze datasets hosted on separate platforms without doubling storage and the associated costs. With the Data Repository Service import feature on CAVATICA and BioData Catalyst powered by Seven Bridges, users get a faster, less labor-intensive import process with no additional storage or cost.

Through the Data Repository Service, it is now possible to import files between CAVATICA and NHLBI BioData Catalyst and co-analyze them in one project on a single platform. This effort, described below, is a step towards global interoperability.

How does the Data Repository Service impact my research?

The Data Repository Service (DRS) API is an interoperability standard for data access developed by the Global Alliance for Genomics and Health (GA4GH). It allows researchers on cloud platforms that implement the DRS API to access the data they need, regardless of the architecture of the platform where the data is stored. This is increasingly important as the volume and complexity of datasets continue to grow, along with the number of locations where they are stored and the variety of methods used for data management and access.

For a researcher, navigating the myriad data repositories to find the files needed for your research can be a hassle. Each repository may have its own access tools and its own way of organizing datasets. Further complicating matters, the datasets themselves may be stored with different cloud providers, such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure, or even in local storage. The DRS API alleviates the challenges of navigating this sea of repositories, access tools, and file locations by giving platforms such as Seven Bridges a standardized way to connect to these data repositories. It provides a generic interface so that data consumers, including workflow systems, can access data in a single, standard way regardless of where it is stored and how it is managed. Overall, this removes a burden from researchers, who would otherwise have to deal with access tools, egress costs, and similar concerns when they would rather focus on data analysis.
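To make the "single, standard way" concrete, here is a minimal Python sketch of how a client might resolve a hostname-based DRS URI and choose an access method. It relies on the GA4GH DRS v1 conventions (the `/ga4gh/drs/v1/objects/{object_id}` path and the `access_methods` list in the object metadata). The host `drs.example.org`, the object ID, and the helper names are hypothetical, and this is not a description of how Seven Bridges implements its import feature.

```python
from urllib.parse import quote, urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI (drs://<host>/<id>) to the HTTPS URL
    of the DRS v1 object endpoint that returns the object's metadata.

    Compact-identifier URIs (drs://<prefix>:<accession>) need an external
    resolver and are rejected by this sketch.
    """
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    # Percent-encode the ID so it is safe as a single path segment.
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"


def pick_access_method(drs_object: dict, preferred_types=("s3", "gs", "https")):
    """Return the first access method whose type matches the preference
    order, or None if the object offers none of the preferred types."""
    methods = drs_object.get("access_methods", [])
    for wanted in preferred_types:
        for method in methods:
            if method.get("type") == wanted:
                return method
    return None


# Hypothetical metadata, shaped like a DRS object response.
example_object = {
    "id": "abc123",
    "access_methods": [
        {"type": "https", "access_url": {"url": "https://files.example.org/abc123"}},
        {"type": "s3", "access_url": {"url": "s3://example-bucket/abc123"}, "region": "us-east-1"},
    ],
}

url = drs_uri_to_url("drs://drs.example.org/abc123")
chosen = pick_access_method(example_object)
```

The same two steps apply whichever repository hosts the file, which is the point of the standard: the consumer only needs the DRS URI, and the repository advertises where and how the bytes can be read.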

The computation engine in CAVATICA and NHLBI BioData Catalyst powered by Seven Bridges can work directly with DRS inputs coming from either of the two platforms, enabling interoperability between them. The DRS import feature is available in both platforms’ visual interfaces, as part of the “Add files” dialog within a project (see screenshot below). Files imported via DRS can be used in executions in the same way as any other file on the platform, including in the Data Cruncher.

To learn more about the DRS API and importing files, see the documentation here for CAVATICA, and for NHLBI BioData Catalyst powered by Seven Bridges.

Strength in numbers: increasing statistical power in analysis

For many birth defects and rare diseases, as is often the case in pediatrics, the number of patients and associated samples is relatively low. However, data on the same disease often appears in disparate datasets that were funded under different grants and reside in different storage locations. Bringing these datasets together on one platform for co-analysis can drastically increase the number of samples available for a specific disease. The larger sample size in turn gives researchers greater statistical power in their analysis. On CAVATICA and NHLBI BioData Catalyst powered by Seven Bridges, this can be done without egress costs.
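As a rough illustration of why sample size matters, the short Python sketch below uses the textbook standard error of a mean, σ/√n. The cohort sizes and the standard deviation are made-up numbers chosen only to show the scaling, and the calculation assumes independent samples that can reasonably be pooled.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean for n independent samples
    drawn from a population with standard deviation sigma."""
    return sigma / math.sqrt(n)


# Hypothetical cohorts: 100 samples on one platform, 300 on another.
sigma = 1.0
se_single = standard_error(sigma, 100)          # one dataset alone
se_combined = standard_error(sigma, 100 + 300)  # both datasets co-analyzed

# Quadrupling the sample size halves the standard error.
ratio = se_combined / se_single
```

A smaller standard error means narrower confidence intervals and a better chance of detecting a real effect, which is what pooling rare-disease cohorts across platforms makes possible.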