Multi-Omics Analysis on the CGC: Applications in Epigenetics Research

The growing diversity of large cancer datasets has led to increased capabilities for multi-omics research. Multi-omics analysis is a promising approach to reveal the functionality of complex biological systems and processes. However, multi-omics analysis is a complex process that needs to be carefully designed and conducted, beginning from sampling all …

Written by Daniel Ventre PhD, Vesna Pajic PhD, and Jeffrey Grover PhD

GATK Best Practices Spotlight: The GATK Somatic Create Mutect2 Panel of Normals workflow

The GATK Somatic Create Mutect2 Panel of Normals (PON) workflow takes multiple normal-sample callsets produced by the GATK Somatic SNVs and INDELs workflow in tumor-only mode (despite the name of the mode, normal samples are given as the input) and collates sites present in two or more samples into a sites-only …

Written by Daniel Ventre PhD

Go beyond somatic variant calling: GATK Somatic SNVs and INDELs (Mutect2)

The GATK Somatic SNVs and INDELs (Mutect2) workflow is a somatic variant calling workflow that uses local assembly and realignment to detect single nucleotide variants (SNVs) and insertions and deletions (INDELs). The Mutect2 tool (see the original publication on bioRxiv) is an improvement upon the original “MuTect” tool …

Written by Daniel Ventre PhD

What Makes TOPMed Datasets So Special?

Studies from the Trans-Omics for Precision Medicine (TOPMed) program are available for analysis on NHLBI BioData Catalyst. The TOPMed program, funded by the National Heart, Lung, and Blood Institute (NHLBI), part of the National Institutes of Health (NIH), focuses on data specifically for advancing science in the fields of heart, …

Written by Daniel Ventre

Assessing State of the Art Bioinformatics

The Oxford English Dictionary defines bioinformatics as “the science of information and information flow in biological systems, especially of the use of computational methods in genetics and genomics.” In common vernacular, it is often defined as the use of statistical and computing methods to solve or better understand complex biological …

Written by Vladimir Kovacevic

Be Cloud-Agnostic: A Solution for Computing on Genomics Datasets in Distributed Cloud Locations

The Multi-Cloud features on the Seven Bridges Platform allow you to work in a “cloud-agnostic” manner, enabling researchers to access and compute on datasets stored in multiple cloud locations to save time and money. Empower your research with relevant datasets regardless of where the data lives. Starting a research project with data distributed in multi-cloud […]

Written by Daniel Ventre

How Memoization Enhances Efficiency for Large Scale Genomic Analysis Research Projects

Memoization for large scale genomic analysis allows researchers and bioinformaticians to restart from a point of failure by enabling the reuse of existing outputs. This functionality is of critical importance given the size and complexity of genomic data and the impact of a failure on workflow efficiency and overall cost. …

Written by The Seven Bridges Computation Team

Data Cruncher Public Interactive Analyses

To help researchers transform raw NGS-based data into clinically actionable knowledge, Seven Bridges strives to efficiently bridge the gap between secondary and tertiary analysis. One of the features we offer to achieve this goal is Data Cruncher, which enables scientists to perform interactive computing and open-ended exploration of data on …

Written by Marko Milanovic, Nemanja Vucic, Milan Kovacevic, Boris Majic & Ana Damljanovic

A first look at GATK4 on the Seven Bridges Platform

One of the big take-away messages from the Bio-IT World Conference this year was the Broad Institute’s announcement that they plan to fully open-source their GATK4 software. By transitioning to a BSD 3-Clause licence, GATK4 becomes fully open for commercial use without a separate commercial licence, which should particularly …

Written by Nick

