CloudNeo: CWL Brings Cancer Genomics to the Cloud


A cloud-based workflow for patient-specific tumor neoantigens

CloudNeo, a computational workflow for identifying patient-specific tumor neoantigens from Next-Generation Sequencing (NGS) data, was recently published in Bioinformatics. Originating from Jeffrey Chuang’s lab at The Jackson Laboratory, CloudNeo is a neoantigen prioritization workflow designed specifically for the cloud.

The authors have made the CloudNeo workflow available on the Seven Bridges Cancer Genomics Cloud (CGC) using the Common Workflow Language (CWL)—an emerging standard for sharing and reproducing bioinformatics analyses.

CloudNeo’s availability on the CGC arises from a collaboration supported by an NIH grant to The Jackson Laboratory and by the National Cancer Institute Cancer Genomics Cloud Pilots initiative that funded development of the CGC. The release of CloudNeo on the CGC for use by other researchers fulfills a central goal articulated by the funders of this collaboration.

For this project, our scientists helped deploy and test the CloudNeo workflow on the CGC, where users of the workflow benefit from massive computational scalability, download-free access to large datasets, and flexible computation.

The CloudNeo workflow provides cloud-native scalable capabilities for neoantigen identification, giving researchers a powerful tool to connect genomic information with biologically relevant phenotypes. CloudNeo lets you choose Polysolver or HLAminer for HLA typing, and uses Variant Effect Predictor combined with custom scripts for mutant peptide identification, outputting a list of mutant peptides. 

Finally, NetMHCpan uses the HLA types and mutant peptide sequences to calculate and output binding affinity predictions for mutated and homologous peptides. With a ranked list of potential neoantigens, researchers can focus their efforts on designing cancer vaccines against neoantigens uniquely present on the patient’s tumor.
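
To make the peptide and ranking steps concrete, here is a minimal Python sketch. It is not the authors' code: the function names, the 9-residue window length, and the assumption that affinities are IC50 values in nM (lower means stronger binding) are all illustrative choices. The first function lists every wild-type/mutant peptide pair that spans a single amino acid substitution, and the second sorts peptides by predicted affinity.

```python
from typing import Dict, List, Tuple


def peptide_windows(protein: str, pos: int, alt: str, k: int = 9) -> List[Tuple[str, str]]:
    """Return (wild_type, mutant) k-mer pairs that span one substitution.

    protein: wild-type amino acid sequence
    pos:     0-based index of the substituted residue
    alt:     replacement amino acid
    k:       peptide length (9 is an assumed default, not from the paper)
    """
    if not 0 <= pos < len(protein):
        raise ValueError("pos is outside the protein sequence")
    mutant = protein[:pos] + alt + protein[pos + 1:]
    # Visit every window start whose k residues still cover position pos.
    first = max(0, pos - k + 1)
    last = min(pos, len(protein) - k)
    return [
        (protein[start:start + k], mutant[start:start + k])
        for start in range(first, last + 1)
    ]


def rank_by_affinity(predictions: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort peptides by predicted IC50 in nM (assumed unit), strongest binder first."""
    return sorted(predictions.items(), key=lambda item: item[1])
```

In a real run, the affinity values would come from a predictor such as NetMHCpan rather than being supplied by hand, and the wild-type half of each pair gives the matching unmutated peptide for comparison.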

“[CloudNeo] allows users to realize advantages of cloud analysis, including massive computing scalability and access to large datasets on the CGC such as TCGA”—Bais P. et al.

The authors demonstrated the workflow using 23 melanoma tumor samples, with an average wall time to run the workflow on a single genome of 8h 2min for HLAminer and 7h 25min for Polysolver.

A reproducible workflow in the Common Workflow Language

By sharing their CWL description, the authors have made CloudNeo the first CWL workflow published in a peer-reviewed journal (to our knowledge). The scientists and engineers of Seven Bridges have long advocated for this approach, and we anticipate many similar publications in the coming years as cancer researchers increasingly take advantage of the cloud to analyze their data.

CWL promotes computational reproducibility, which allows research institutions to conduct more cost-effective and accurate scientific research. Computational reproducibility enables organizations to track down exactly which piece of software or which data generated a particular insight or action, such as a decision to target a specific gene or rule out an avenue of exploration. Without reproducibility, it is difficult to track errors in an analysis pipeline and debug any issues that arise later, e.g. when attempting to discover novel drug targets or verify results.

CWL is a community-driven specification and emerging standard for describing how to run computational analyses with command line tools. Workflow descriptions are a key component of computational reproducibility or recomputability. Workflow descriptions also allow portability of bioinformatics tools and workflows between computational environments. We use CWL to describe tools and workflows on all Seven Bridges environments.
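
For readers who have not seen a CWL description, the sketch below builds a minimal one in Python. It is unrelated to CloudNeo itself: it wraps the `echo` command, and the input name `message` is purely illustrative. CWL documents may be written in JSON as well as YAML, so the description is serialized with the standard `json` module.

```python
import json

# Minimal CWL v1.0 CommandLineTool description wrapping `echo`.
# The wrapped command and the input name `message` are illustrative only.
echo_tool = {
    "cwlVersion": "v1.0",
    "class": "CommandLineTool",
    "baseCommand": "echo",
    "inputs": {
        "message": {
            "type": "string",
            # Place the value as the first positional argument.
            "inputBinding": {"position": 1},
        }
    },
    "outputs": [],
}

# Serialize the description to JSON text.
tool_json = json.dumps(echo_tool, indent=2)
print(tool_json)
```

Saved to a file, a description like this could be handed to a CWL-compliant runner together with a value for `message`. The same file format is what lets a tool move between computational environments.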

Seven Bridges has a long history with CWL, having been part of developing and implementing the specification since the beginning. Seven Bridges implemented an enhanced version of CWL Draft 2 on the Cancer Genomics Cloud in late 2015 and on the Seven Bridges platform in early 2016.

In December 2015, Seven Bridges participated in the Pan Cancer Analyses of Whole Genomes (PCAWG) project, where we analyzed more than 1,000 patient samples using CWL workflows. In 2017, Seven Bridges launched support for CWL 1.0 and backwards compatibility for all previous drafts thanks to its new Rabix Executor.

Using the workflow in your own research project

The authors have made CloudNeo available to researchers in two ways:

  1. Through a CWL description via GitHub
  2. Through the Seven Bridges Cancer Genomics Cloud

Researchers can run the CWL version of the full workflow using the Rabix suite of tools, our open-source development project for creating and running computational workflows. The authors’ recommended version is implemented as a workflow on the CGC, which offers additional functionality including graphical interfaces, workflow sharing and version tracking, improved calling of multiple cloud instances, and access to The Cancer Genome Atlas, Cancer Cell Line Encyclopedia, Simons Genome Diversity Project, and TARGET datasets.

Seven Bridges is a keen supporter of initiatives to develop reproducible and portable workflow standards. Drop us a line to find out how using CWL and Rabix can speed adoption and reapplication of complex bioinformatics workflows at your organization.