Release notes

March 11th, 2024

Recently published apps

We’ve published the following new apps on the Seven Bridges Platform:

  • FusionInspector (v2.8.0), a tool that validates fusion transcript predictions. FusionInspector is part of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). It takes a list of potential fusion genes (obtained by executing any fusion transcript prediction tool), extracts the genomic regions corresponding to the fusion partners, and creates mini-fusion-contigs that hold the gene pairs in the suggested fused orientation. The original reads are then aligned to these putative fusion contigs. In the fusion-gene context, fusion-supporting reads that would typically align as split reads or discordant pairs should align as concordant ‘normal’ reads. Reads that span fragments and reads containing fusion breakpoints that support each fusion are recognized, reported, and scored accordingly.
  • Arriba (v2.4.0), a tool for the detection of gene fusions from RNA-Seq data. Arriba is designed to work with data processed by the STAR aligner, and the post-alignment runtime is typically a few minutes. In contrast to many other STAR-based fusion detection methods, Arriba does not require reducing the --alignIntronMax parameter of STAR to identify fusions resulting from focal deletions. It was developed for use in clinical research, so high sensitivity and fast runtimes were crucial design requirements. Arriba can also identify structural rearrangements other than gene fusions that may have clinical significance. These include viral integration sites, internal tandem duplications, whole exon duplications, and truncations of genes (i.e., breakpoints in introns and intergenic regions).
  • Arriba draw_fusions.R, an R script that comes with the Arriba gene fusion detection tool. This script produces publication-quality visualizations of the transcripts involved in predicted fusions. For every predicted fusion, it creates a single page in the output PDF file. Each page contains information about the fusion partners, their orientation, the retained exons in the fusion transcript, statistics about the number of supporting reads, and, if the fusion_transcript column has a value, an excerpt of the sequence around the breakpoint.
  • Parabricks RNA Pipeline, which uses the Parabricks toolkit v4.2.0 for SNP and indel discovery from RNA-Seq input data.
  • STAARpipeline PheWAS (v0.9.6), which is used for analyzing WGS/WES data in phenome-wide association studies (PheWAS).
  • SRA to DRS converter, a workflow that converts Sequence Read Archive (SRA) metadata into Data Repository Service (DRS) URIs. It addresses the challenge researchers face in efficiently accessing and using large genomic datasets stored in SRA format. By simplifying and automating this conversion, the app enables quicker and more effective genomic data analysis, accelerating research in fields such as disease study and genetic discovery.

Recently updated apps

We also updated the following apps:

  • DESeq2 (v1.40.1), a tool that performs differential gene expression analysis using negative binomial generalized linear models. It analyzes estimated read counts from several samples, each belonging to one of two or more conditions under study, searching for systematic changes between conditions as compared to within-condition variability.

February 14th, 2024

AlphaFold Visualizer – new public project on the Seven Bridges Platform

We published the AlphaFold Visualizer (v0.1.5) as a public project on the Seven Bridges Platform. It contains an interactive analysis for the visualization of AlphaFold results. AlphaFold is an AI tool for 3D protein structure prediction. Because it produces protein structure models as its output, a secondary analysis is needed to interpret and visualize the results. This analysis must be interactive, so it has been developed as the Data Studio AlphaFold Visualizer analysis, which complements the publicly available app.

Recently published apps

We have also published new workflows for processing Nanopore data:

  • ONT Flowcell Processing – aligns (Minimap2), sorts (Samtools) and quality checks (NanoPlot, Samtools Flagstat, Mosdepth, GATK ComputeLongReadMetrics) input Nanopore data from a single flowcell.
  • ONT WGS Variant Calling – merges (Sambamba), calls variants (Clair3, Sniffles2) and quality checks (Mosdepth, NanoPlot) input BAM files from Nanopore data.

January 3rd, 2024

Recently updated apps

We have updated the following tools on the Seven Bridges Platform: 

  • Exomiser 13.3.0 – used to identify candidate causative variants from WES or WGS patient VCF data and phenotype HPO terms. 
  • PharmCAT 2.8.3 toolkit: 
    • PharmCAT VCF Preprocess – prepares an input VCF file for PharmCAT. 
    • PharmCAT – takes a single-sample VCF file and returns a report with guideline variants. 
  • Sambamba 1.0.1 toolkit: 
    • Sambamba Index – creates a BAI or FAI index for the provided BAM/FASTA file. 
    • Sambamba Slice – copies a slice (region) of the coordinate sorted and indexed input file in BAM or FASTA format. 
    • Sambamba Sort – sorts alignments in BAM format. 
    • Sambamba Markdup – marks or removes duplicate reads from an input BAM file. 
    • Sambamba Flagstat – creates read flag statistics from a BAM file. 
    • Sambamba Merge – merges alignments in BAM format. 
    • Sambamba View – inspects and filters alignments in SAM/BAM format. 
  • Clair3 1.0.4 – calls small germline variants from data generated by Nanopore, PacBio or Illumina sequencing technologies. 
  • cuteSV 2.1.0 – calls structural variation from sorted long read alignments. 
  • Twelve tools from the Bismark 0.24.1 toolkit: 
    • Bismark – takes files with bisulfite-treated reads and aligns them to a specified bisulfite genome. 
    • Bismark Methylation Extractor – extracts the methylation call for every Cytosine in Bismark result files. 
    • Bismark Genome Preparation – converts the specified reference genome into bisulfite converted genome. 
    • Bismark Deduplicate – removes duplicate Bisulfite-Sequencing (BS-Seq) reads from an alignment file. 
    • Bismark2BedGraph – generates bedGraph and coverage files sorted by chromosomal position. 
    • Bismark2Report – uses Bismark alignment, deduplication and methylation reports to generate a graphical HTML report. 
    • Bismark2Summary – uses Bismark report files of several samples to generate a graphical summary HTML report. 
    • Bismark Bam2Nuc – calculates the mono and di-nucleotide coverage and compares it to the average genomic sequence composition. 
    • Bismark Coverage2Cytosine – generates a cytosine methylation report for a genome of interest. 
    • Bismark Filter Non Conversion – filters incomplete bisulfite conversion in non-CG context in Bismark BAM files. 
    • Bismark Methylation Consistency – splits BAM files based on methylation consistency. 
    • Bismark NOMe Filtering – filters reads in a yacht file (output of Bismark Methylation Extractor). 
  • Bismark Analysis 0.24.1 workflow for analyzing DNA methylation, a type of epigenetic modification. The workflow processes reads from Whole Genome Bisulfite Sequencing (WGBS) and Reduced Representation Bisulfite Sequencing (RRBS). While suitable for any input size, it excels with larger input samples. 
  • Three tools from the Salmon 1.10.1 toolkit: 
    • Salmon Index – builds an index necessary for the Salmon Quant – Reads and Salmon Alevin tools. 
    • Salmon Quant – Reads – infers transcript abundance estimates from RNA-seq data, using Selective Alignment (SA) for mapping. 
    • Salmon Quant – Alignment – estimates transcript abundance from aligned RNA-seq data using the Variational Bayesian EM algorithm. 
  • Salmon Workflow 1.10.1 for estimating transcript abundances from RNA-Seq data using Selective Alignment for mapping. The workflow enables creation of the necessary index for quantification and provides the capability to process multiple samples at once. It also creates an expression matrix at both the transcript and gene levels, aggregating expression results across all samples. 
  • ENCODE ChIP-Seq Pipeline (v2.2.1). This workflow represents the ENCODE transcription factor and histone ChIP-Seq analysis pipelines. ChIP-Seq analysis studies chromatin modifications and binding patterns of transcription factors and other proteins. It combines chromatin immunoprecipitation (ChIP) assays with standard NGS sequencing. The workflow consists of read mapping with duplicate removal, post-alignment QC, cross-correlation analysis, peak calling with blacklist filtering, and a statistical framework applied to the replicated peaks at the end to assess the concordance of biological replicates.
  • ENCODE ATAC-Seq Pipeline. ATAC-Seq analysis performs quality control and signal processing, producing alignments and measures of enrichment. The Assay for Transposase-Accessible Chromatin followed by sequencing (ATAC-Seq) experiment provides genome-wide profiles of chromatin accessibility. Briefly, the ATAC-seq method works as follows: loaded transposase inserts sequencing primers into open chromatin sites across the genome, and reads are then sequenced. The ends of the reads mark open chromatin sites.
    The workflow is based on the ENCODE ATAC-seq pipeline, developed by the ENCODE Consortium. The four major steps of the ATAC-Seq analysis are pre-alignment quality control, alignment, post-alignment processing and advanced ATAC-Seq-specific quality control, and peak calling in order to identify accessible regions (which is the basis for advanced downstream analysis).

November 27th, 2023

Single and Global logout flows defined by SAML protocol are now available for SSO  

Users who access the Seven Bridges Platform through Single Sign-On (SSO) can now perform Single (IdP-initiated) logout to log out of multiple SSO sessions in a single click. It is also now possible to initiate the Global (SP-initiated) logout flow from the Seven Bridges Platform.

Recently published apps  

We have published the following tools in our Public Apps gallery:  

  • Tximport, a tool that imports and summarizes transcript-level estimates for transcript and gene-level analysis based on the tximport R/Bioconductor package. It is designed to simplify the import of transcript-level abundances, estimated counts, and effective lengths from a variety of upstream tools, for downstream transcript-level or gene-level analysis.  

Three tools from the SplAdder (3.0.4) toolkit:  

  • SplAdder build constructs splicing graphs and extracts alternative splicing events.  
  • SplAdder test differentially tests the usage of alternative events between two groups of samples.  
  • SplAdder viz generates visual overviews of splicing graphs and alternative events. SplAdder viz uses results generated by SplAdder build or SplAdder test to create plots.  

Five tools from the Qualimap 2.3 toolkit:  

  • Qualimap Multi-sample BAM QC reports QC metrics computed in BAM QC analysis combined for multiple samples.  
  • Qualimap Compute counts calculates how many reads are mapped to each region of interest.  
  • Qualimap RNA-seq QC reports quality control metrics and bias estimations for whole transcriptome sequencing.  
  • Qualimap Counts QC analyzes count data to assess differential expression between two or more conditions.  
  • Qualimap BAM QC reports information for the evaluation of the quality of the provided alignment data.  

Six tools from the RSeQC 5.0.1 toolkit:  

  • RSeQC read distribution calculates how mapped reads were distributed over genomic features.  
  • RSeQC junction annotation compares detected splice junctions to the reference gene model.  
  • RSeQC inner distance is used to calculate the inner distance (or insert size) between two paired RNA reads.  
  • RSeQC infer experiment is designed to estimate how RNA-seq data is configured.  
  • RSeQC read duplication calculates read duplication rate determined by mapping position or sequence of the read.  
  • RSeQC bam stat summarizes mapping statistics of the provided alignment file.  

The Tidyproteomics 1.5.2 toolkit:  

  • Expression analysis – used for proteomics differential expression analysis.  
  • Data input and summary – used for loading protein or peptide data.  
  • Data subsetting and summary – used for subsetting protein or peptide data.  
  • Abundance normalization – used for abundance normalization of the protein or peptide data.  
  • Enrichment analysis – used for enrichment analysis after differential expression analysis.  
  • Annotate data – used for proteomics data annotation before enrichment analysis.  

Improved storage cost breakdown  

To meet the need for precise allocation of storage costs within a division, we have implemented a solution that enables precise per-project storage breakdown on the Seven Bridges Platform.  

The new solution includes improved accuracy in per-project storage breakdown, facilitating precise cost allocation, as well as elimination of storage size discrepancies between per-project and per-division calculations. This presents a great benefit both for users with administrative roles in divisions and Platform users in general as it increases transparency and provides a better insight into storage cost distribution.  

Recently updated apps  

TopHat2, a tool that aligns RNA-Seq reads to a genome to identify exon-exon splice junctions, has been updated to version 2.2.1 and upgraded to CWL version 1.2 (it was previously available in CWL draft-2).


November 13th, 2023

Improved error messages for volume imports

To provide you with more detailed information about each import from an attached volume and enable you to resolve import issues independently, we have added improved notifications in the recently implemented Activity center, available by clicking Open activity center in the Activity feed. When any of the items from a particular import fails, you will be able to see an error message and a corresponding error code for each of the items, allowing you to understand and try to fix the issue. Furthermore, a description and link to the relevant documentation will be provided for each import from a volume.

Recently published apps

The Change-O 1.3.0 toolkit is the latest addition to our Public Apps gallery. It includes the following apps: 

  • DefineClones – assigns Ig sequences into clonal groups. 
  • BuildTrees – creates IgPhyML input files. 
  • ParseDb – parses and updates input database files. 
  • AlignRecords – performs multiple alignment of sequence fields. 
  • AssignGenes – assigns V(D)J gene annotations. 
  • MakeDb – creates a standardized database from the input germline alignment results. 
  • CreateGermlines – reconstructs germline V(D)J sequences for alignment data. 
  • ConvertDb – parses input tab-delimited database files and converts them to different output formats. 

Recently updated apps

We updated the Broad Institute’s best practices workflows for somatic copy number variant discovery to version 4.2.5.0 in our Public Apps gallery: 

  • GATK Somatic CNV Panel Workflow 4.2.5.0 – used for creating a panel of normals (PON) given a set of normal samples. 
  • GATK Somatic CNV Pair Workflow 4.2.5.0 – used for detecting copy number variants (CNVs) from WES/WGS single sample data in tumor-only or matched-normal mode. 

October 30th, 2023

Recently published apps

The pRESTO 0.7.1 toolkit is the latest addition to our Public Apps gallery. It includes the following apps: 

  • ParseLog – Parses pRESTO log records and outputs values in TAB-separated tables. 
  • BuildConsensus – Builds consensus sequences. 
  • ClusterSets – Clusters sequences into groups. 
  • CollapseSeq – Removes duplicate sequences from input FASTA/FASTQ files. 
  • PairSeq – Sorts and matches sequences across input files. 
  • ConvertHeaders – Converts sequence headers to pRESTO format. 
  • AlignSets – Aligns sequences using different methods. 
  • FilterSeq – Filters input sequences. 
  • ParseHeaders – Manipulates sequence headers. 
  • SplitSeq – Splits and samples sequence files. 
  • UnifyHeaders – Reassigns or deletes sequence header fields. 
  • AssemblePairs – Assembles paired-end reads to a single sequence. 
  • MaskPrimers – Removes primers and annotates sequences with primers and barcodes. 
  • EstimateError – Estimates annotation set error rates.  

We also published the following new tools: 

  • ComBat-seq (sva 3.35.2), an R tool used for batch effect adjustment in bulk RNA-seq data. The tool wrapper includes some additional improvements, such as removing more than one batch per dataset and adapting outputs to be compatible with downstream analyses (DESeq). 
  • GffRead (0.12.7), a GFF/GTF utility tool that provides format conversions, filtering, FASTA sequence extraction, and more. 

Recently updated apps

We published the following updates in our Public Apps gallery: 

  • RNA-seq alignment – STAR (2.7.10a), a workflow that performs the first step of RNA-seq analysis – alignment of the reads to a reference genome. It is used to generate aligned BAM files (in genome and transcriptome coordinates) from RNA-seq data, which can later be used in further RNA studies, like gene expression analysis. 
  • Trim Galore! (0.6.10), a wrapper around the adapter trimming and quality control tools Cutadapt and FastQC, with extra functionality for RRBS data.

October 9th, 2023

Data Browser on the Seven Bridges Platform deprecated to promote interoperability with the CGC  

As the next step in promoting interoperability, we deprecated the Data Browser on the Seven Bridges Platform. As Data Browser remains available on the CGC, this will help promote the CGC as the hub for publicly available data and enable Seven Bridges Platform users who have a CGC account to discover all datasets that are available in the Data Browser on the CGC and much more. 

To complete the process of Data Browser deprecation, in the upcoming period we will disable access to files that were previously imported using Data Browser.  

Recently published apps

We published the Immcantation toolkit 4.4.0 in our Public Apps gallery. The toolkit consists of a set of pipeline scripts which are wrapped as the following tools: 

  • preprocess-phix – removes reads which align to phiX174 from the input sequence file. 
  • presto-abseq – runs pRESTO tools for pre-processing of NEBNext / ABSeq immune sequencing data. 
  • presto-clontech – uses pRESTO tools for analyzing Takara Bio/Clontech SMARTer v1 immune sequencing kit data. 
  • presto-clontech-umi – uses pRESTO tools for analyzing Takara Bio/Clontech SMARTer v2 (UMI) immune sequencing kit data. 
  • changeo-10x – annotates and infers clonal relationships in Cell Ranger 10x Genomics single-cell V(D)J data. 
  • changeo-igblast – performs V(D)J alignment using IgBLAST. 
  • tigger-genotype – performs TIgGER polymorphism detection and genotyping. 
  • shazam-threshold – calculates clonal assignment threshold. 
  • changeo-clone – runs Change-O cloning and germline reconstruction. 

We also published Nirvana 3.18.1. Nirvana annotates variants from an input VCF file and generates a JSON file with the results. 


September 25th, 2023

Recently published apps

We published the following apps in our Public Apps gallery:

  • RNA-SeQC 2.4.2, a tool that computes post-alignment quality control metrics for RNA-Seq data. It takes aligned reads in BAM/SAM or CRAM format and an annotation file as inputs, and outputs several alignment metrics files.
  • scCODA 0.1.9, a Python-based tool that performs differential analysis of cell populations.

September 20th, 2023

Recently published apps

We have just published the following tools from the BBTools 39.01 toolkit:

  • BBDuk: used for trimming, filtering, and masking of input reads.
  • Reformat: used for generic read-processing tasks (changing ASCII quality encoding, interleaving, file format, compression).
  • BBMap: used for splice-aware read alignment.
  • Dedupe: used for removing duplicates from input sequences.
  • SplitNextera: used for splitting Nextera long-mate-pair reads.
  • CalcUniqueness: used for determining library complexity and the need for additional sequencing by generating a kmer uniqueness histogram.
  • Taxonomy: used for printing taxonomy information for provided organism identifiers.
  • Repair: used to correct disordered reads and reads whose mates have been lost.
  • Seal: used for alignment-free sequence quantification.
  • BBMerge: used for merging overlapping paired end reads.
  • BBMask: used for masking low-complexity regions, tandem repeats, or SAM-mapped regions.
  • Tadpole: used as a kmer-based assembler.
  • Statistics: used for calculating assembly statistics.
  • BBNorm: used for normalizing read depth based on kmer counts.
