Machine Learning and Image Processing on the CGC: Tools For Success

Machine learning is becoming ubiquitous in the bioinformatics space: applying machine learning algorithms to analysis of proteomics, genomics, and other -omics datasets has provided a wealth of analysis and interpretations of data not easily achievable by conventional methods. The CGC offers many helpful features for users performing machine learning (ML) …

Written by Dan Ventre PhD, Soner Koc, and Ana Stankovic

Multi-Omics Analysis on the CGC: Applications in Epigenetics Research

The growing diversity of large cancer datasets has led to increased capabilities for multi-omics research. Multi-omics analysis is a promising approach to reveal the functionality of complex biological systems and processes. However, multi-omics analysis is a complex process that needs to be carefully designed and conducted, beginning from sampling all …

Written by Daniel Ventre PhD, Vesna Pajic PhD, and Jeffrey Grover PhD

GATK Best Practices Spotlight: The GATK Somatic Create Mutect2 Panel of Normals workflow

The GATK Somatic Create Mutect2 Panel of Normals (PON) workflow takes multiple normal sample callsets produced by the GATK Somatic SNVs and INDELs workflow in tumor-only mode (although the mode is called tumor-only, normal samples are given as the input) and collates sites present in two or more samples into a sites-only …

Written by Daniel Ventre PhD

Go beyond somatic variant calling: GATK Somatic SNVs and INDELs (Mutect2)

The GATK Somatic SNVs and INDELs (Mutect2) workflow is a somatic variant caller workflow that uses local assembly and realignment to detect single nucleotide variants (SNVs) and insertion and deletion (INDEL) changes. This Mutect2 tool (see the original publication on bioRxiv) is an improvement upon the original “MuTect” tool …

Written by Daniel Ventre PhD

The Data Repository Service API on Seven Bridges: Towards Global Interoperability

For the first time ever, users on CAVATICA are able to import datasets from NHLBI BioData Catalyst powered by Seven Bridges, such as TOPMed, onto the CAVATICA platform for analysis. Likewise, users on NHLBI BioData Catalyst can import Kids First datasets from CAVATICA in the same manner. Before now, there …

Written by Daniel Ventre PhD

Accurate sequencing data analysis for under-represented populations: The Pan-African genome

We are excited to announce the release of our GRAF Population Solution, a set of tools, services, and workflows that enable the construction of genome graph references for targeted populations and/or studies. In this blog post, we discuss some of the advantages of population-specific graphs compared to methods that rely …

Written by Deniz Turgut, Kübra Narcı, Güngör Budak, H. Serhat Tetikol, and Seven Bridges GRAF Team

Promoting Interoperability and Standardization: Seven Bridges and GA4GH

The Global Alliance for Genomics and Health (GA4GH) is an international body set to create policies and promote technical standards to maximize interoperability among various stakeholders involved with genomics and healthcare-related data. Through its engagement in GA4GH, Seven Bridges actively works with platform development partners and industry leaders to develop …

Written by Dan Ventre

The Annotation Explorer: 1 billion variants, hundreds of annotations, and just a few minutes

With the ongoing proliferation of genome sequencing data, the number of rare variants found is growing rapidly. To detect associations between phenotypes of interest and these rare variants, researchers employ mechanisms to increase statistical power in association testing. Variant annotation information can be used to combine variants into biologically-relevant units …

Written by Dan Ventre

Single and Multiple Variant Association Testing on Seven Bridges

For researchers interested in performing genotype-phenotype association studies, Seven Bridges offers a suite of tools for both single-variant and multiple-variant association testing. These tools and features include EPACTS, PLINK, and the GENESIS pipelines. EPACTS (Efficient and Parallelizable Association Container Toolbox) is a versatile software pipeline to perform various statistical …

Written by Dan Ventre, Alison Leaf, Dave Roberson, Ana Stankovic, and Aleksandar Danicic

