Seven Bridges is at the AACR Annual Meeting showcasing the Cancer Genomics Cloud, the best way to make use of large-scale cancer data on the cloud.

Precise and Rapid Detection of Gene Fusions and Microbial Pathogens in Next-Generation Sequencing Data with Sequence Bloom Trees

April 2 | 1pm – 5pm | Poster Session

Public archives of sequencing data continue to grow at a rapid pace. (Re)analysis of large datasets can be both expensive and resource-intensive, and users are often only interested in the sample prevalence of a small subset of sequences. We have implemented a Sequence Bloom Tree data structure for The Cancer Genome Atlas (TCGA) RNA-seq datasets, allowing researchers to rapidly test samples for the presence of sequences of interest.

We demonstrate the ability to rapidly identify samples containing viral transcript sequences, estimate the sample prevalence of gene fusions and novel splice variants, and infer the presence of HeLa-cell contamination in a subset of TCGA data. We have implemented post-querying controls to mitigate false positives arising from the presence of genes with highly similar sequences. The tools are described using Common Workflow Language, allowing researchers to reproducibly generate and query the dataset.
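To illustrate the general idea behind a Sequence Bloom Tree, here is a minimal toy sketch; it is not the implementation described in the poster. Each leaf holds a Bloom filter of the k-mers in one sample, each internal node holds the union of its children's filters, and a query descends only into subtrees where at least a fraction `theta` of the query k-mers appear to be present. The names `make_leaf`, `make_parent`, and `query`, and the parameters `K`, `M`, `H`, and `theta`, are illustrative assumptions chosen for this example.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional

K = 5      # toy k-mer length (assumption; real tools use longer k-mers)
M = 4096   # number of bits per Bloom filter (assumption)
H = 3      # number of hash functions per k-mer (assumption)


def kmers(seq: str, k: int = K) -> set:
    """Return the set of k-mers in a sequence (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bit_positions(kmer: str) -> List[int]:
    """Map a k-mer to H bit positions using seeded SHA-256 digests."""
    positions = []
    for seed in range(H):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % M)
    return positions


@dataclass
class Node:
    bits: int                      # Bloom filter stored as an integer bitmask
    sample: Optional[str] = None   # sample name for leaves, None for internal nodes
    children: List["Node"] = field(default_factory=list)


def make_leaf(name: str, reads: List[str]) -> Node:
    """Build a leaf whose filter contains every k-mer of the sample's reads."""
    bits = 0
    for read in reads:
        for km in kmers(read):
            for p in bit_positions(km):
                bits |= 1 << p
    return Node(bits=bits, sample=name)


def make_parent(children: List[Node]) -> Node:
    """Build an internal node whose filter is the bitwise OR of its children."""
    bits = 0
    for child in children:
        bits |= child.bits
    return Node(bits=bits, children=list(children))


def contains(node: Node, kmer: str) -> bool:
    """Bloom filter membership test: may give false positives, never false negatives."""
    return all((node.bits >> p) & 1 for p in bit_positions(kmer))


def query(node: Node, seq: str, theta: float = 0.8) -> List[str]:
    """Return samples in which at least theta of the query k-mers appear present."""
    query_kmers = kmers(seq)
    if not query_kmers:
        return []
    hits = sum(contains(node, km) for km in query_kmers)
    if hits < theta * len(query_kmers):
        return []                  # prune: no sample below this node can match
    if node.sample is not None:
        return [node.sample]
    matches = []
    for child in node.children:
        matches.extend(query(child, seq, theta))
    return matches


# Toy usage with made-up reads
leaf_a = make_leaf("sample_a", ["ACGTACGTTGCAAGGCT"])
leaf_b = make_leaf("sample_b", ["TTTTTTTTTTGGGGGGGGGG"])
root = make_parent([leaf_a, leaf_b])
print(query(root, "ACGTACGTTGCA"))
```

Because Bloom filters can report k-mers that were never inserted, a sample that truly contains the query is always returned, but extra samples may occasionally appear. That is one reason a follow-up check on the reported hits is useful.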

Enabling Petabyte-Scale Cancer Genomics with the NCI Cancer Cloud Pilots

April 3 | 1pm – 5pm | Poster Session

The widespread adoption of next-generation sequencing has generated petabytes of multi-dimensional information and created a need for new methodologies to organize, share, and analyze large volumes of data. Accessing and analyzing this information becomes increasingly challenging as the amount of data grows. This difficulty is exemplified by the data generated by The Cancer Genome Atlas (TCGA) network, which encompasses more than 2.5 petabytes. Downloading the complete TCGA repository can require several weeks or more, and the need for access to large institutional compute clusters puts integrated analysis out of reach for many researchers.
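As a rough back-of-the-envelope check on the download time, the sketch below estimates how long a 2.5-petabyte transfer would take. The link speeds are assumptions for illustration, not figures from the abstract, and the calculation assumes decimal petabytes (10^15 bytes) and a fully sustained link unless `efficiency` is lowered.

```python
def transfer_days(size_petabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Estimate transfer time in days for a dataset over a network link.

    size_petabytes: dataset size in decimal petabytes (1 PB = 1e15 bytes)
    link_gbps:      nominal link speed in gigabits per second
    efficiency:     fraction of the nominal speed actually sustained (assumption)
    """
    size_bits = size_petabytes * 1e15 * 8
    seconds = size_bits / (link_gbps * 1e9 * efficiency)
    return seconds / 86400  # seconds per day


# Hypothetical link speeds; 2.5 PB is the TCGA size quoted above
print(round(transfer_days(2.5, 10), 1))  # about 23 days on a sustained 10 Gbps link
print(round(transfer_days(2.5, 1), 1))   # about 231 days on a sustained 1 Gbps link
```

Even under the optimistic assumption of a dedicated 10 Gbps connection, the transfer takes roughly three weeks, which is consistent with the "several weeks or more" estimate.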

The Cancer Cloud Pilot project seeks to directly address these challenges by co-localizing data alongside computational analysis tools. The Cancer Genomics Cloud (CGC) enables researchers to leverage the power of cloud computing to gain actionable insights about cancer biology and human genetics from massive public datasets including TCGA, the Cancer Cell Line Encyclopedia, and the Simons Genome Diversity Project dataset. Our approach includes collaborative tools, security permissions, data harmonization, metadata curation, resource description frameworks, and visual tools. Computational reproducibility is supported by the Common Workflow Language. More than 1,200 researchers have analyzed more than 50,000 samples on the CGC since its launch in February 2016.

We will present a case study on the application of unsupervised learning methods to identify individual cell types within tumors using TCGA RNA-seq data and demonstrate how researchers can apply open pipelines on their own data to interrogate cancer subtypes and mixed cell populations.

NCI’s Center for Cancer Genomics and Cloud Pilots Initiatives: Using Large-Scale Data to Advance Precision Oncology

April 3 | 4:15pm – 6:15pm, Marquis Ballroom Salons 3-4 | Marriott Marquis DC

The growth of large-scale sequence data in cancer research, including data generated through CCG programs, is rapidly outstripping the available computational capacity for data storage, processing, network transmission, and analysis. The NCI Cancer Genomics Cloud (CGC) Pilots seek to create a new model for computational analysis of large-scale biological data. The pilot projects combine data from CCG’s TCGA project with co-located computational capacity and an Application Programming Interface that provides security and data access for developers. The cloud model democratizes access to NCI-generated genomic data and offers a more cost-effective way to provide computational support to the cancer research community.

We will give a general overview of the Cancer Genomics Cloud Pilot and provide updates on the Cancer Target Discovery and Development (CTD2) network and Human Cancer Models Initiative (HCMI) functional genomics initiatives. We will conclude with an overview of how to access and compute over the comprehensive genomic data generated by CCG initiatives using Seven Bridges’ Cancer Genomics Cloud.

