Release notes

August 7th, 2023

Recently published apps

We have just published RTG Tools 3.12.1 (ROCPlot, VCFEval, Format), MaxQuant 2.4.2.0 and GATK GenotypeGVCFs 4.4.0.0 tools:

RTG Tools Format converts data files from FASTA/FASTQ/SAM/BAM into the RTG Sequence Data File (SDF) format.
RTG Tools VCFEval is a tool that performs sophisticated comparison of VCF files.
RTG Tools ROCPlot is used to plot ROC curves from ROC data files generated by RTG Tools VCFEval.
MaxQuant integrates algorithms specifically developed for high-resolution, quantitative MS data and is applicable to shotgun proteomics.
GATK GenotypeGVCFs is used to joint-genotype GVCF files created by GATK HaplotypeCaller/GATK GenomicsDBImport tools.

July 17th, 2023

Recently published apps

We published Giraffe-DeepVariant workflow 1.0, Cramino 0.9.7 and kyber 0.4.0 tools from the NanoPack2 toolkit, as well as Pisces 5.3.0.0 tool, PureCN NormalDB workflow 2.6.4, PureCN workflow 2.6.4, zUMIs 2.9.7 tool, and AlphaFold 2.3.2 tool. Here are the details:

Giraffe-DeepVariant workflow 1.0 is a pipeline for calling small variants using the pangenome reference. The workflow starts with sequenced reads (FASTQs, CRAM). Reads are mapped to a pangenome with vg giraffe and pre-processed (e.g. indel realignment) before performing the variant calling step using DeepVariant.
Cramino 0.9.7 is a quick QC tool intended for long-read sequencing. It takes a BAM/CRAM format alignment file and creates a QC report with mean coverage, number of reads, their mean and median length and sequence identity relative to the reference genome.
kyber 0.4.0 creates a 600×600 pixel heatmap image of read length and read accuracy from an input alignment file (BAM/CRAM format).
Pisces 5.3.0.0 does variant calling from aligned amplicon sequencing data.
PureCN NormalDB workflow 2.6.4 builds a normal database which is used for coverage normalization in PureCN workflow.
PureCN workflow 2.6.4 estimates tumor purity and ploidy, copy number and loss of heterozygosity (LOH). Calculated purity and ploidy combinations are sorted by likelihood score. Copy number and LOH data are provided by both gene and genomic region. The steps in the workflow include: preparation of an interval file for further analysis, calculation of coverage for tumor and normal samples (optionally for additional tumors) and final calculation of purity, ploidy, copy number and LOH results.
zUMIs 2.9.7 takes RNA-seq data with or without UMIs, a STAR index files archive, and an annotation GTF file, and analyzes the data as specified by the other input parameters.
AlphaFold 2.3.2 is a machine-learning application which incorporates knowledge about physical and biological protein structure properties into a deep learning algorithm, and predicts protein structures with high accuracy.

June 19th, 2023

Recently published apps

We published the following apps in our Public Apps gallery:

RADx-rad v0.2 Workflow, which is used for metagenomic data analysis of SARS-CoV-2 from wastewater samples. The workflow was developed and ported to CWL as part of RADx (Rapid Acceleration of Diagnostics), the initiative launched by the US National Institutes of Health (NIH) to speed innovation in the development, commercialization, and implementation of technologies for COVID-19 testing.
CNVPanelizer 1.32.0, which generates a report table and visualization of detected CNVs from targeted sequencing data.
Control-FREEC 11.6, which can be used for somatic copy number analysis of WGS, WES and targeted data.

June 6th, 2023

DRS import available on the Seven Bridges Platform

With the introduction of DRS import on the Seven Bridges Platform, you can now import files from external sources or from the Cancer Genomics Cloud (CGC). This significantly improves data interoperability, as CGC users with adequate authorization can now access CGC datasets from the Platform using DRS import. Imported files can then be used like any other file on the Platform.

Learn more about DRS import.
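
For readers unfamiliar with DRS (the GA4GH Data Repository Service), the sketch below shows how a hostname-based DRS URI conventionally maps to the HTTPS endpoint that describes the object. It illustrates the GA4GH convention only and is not the Platform's import API; the function name and the example host `drs.example.org` are placeholders chosen for illustration.

```python
from urllib.parse import urlparse


def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI to the HTTPS URL of its object record.

    Follows the GA4GH convention:
    drs://<hostname>/<id> -> https://<hostname>/ga4gh/drs/v1/objects/<id>
    """
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs" or not parsed.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    object_id = parsed.path.lstrip("/")
    if not object_id:
        raise ValueError(f"DRS URI has no object ID: {drs_uri!r}")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"


# Placeholder host and object ID, for illustration only.
print(drs_uri_to_url("drs://drs.example.org/314159"))
```

Access to the resolved object still depends on the authorization rules of the server that hosts it.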

Recently published apps

We have published the following apps in our Public Apps gallery:

VEP Slivar Trios Rare Diseases Analysis, which uses VEP 109.3 and Slivar 0.3.0. This analysis is used for preprocessing and analyzing variants from related individuals (trios or families; WES or WGS).
STAR-Fusion (v1.12.0), an app that uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads.
STAR-Fusion Build FusionFilter Dataset (v1.12.0) that creates the required CTAT genome lib archive for STAR-Fusion execution.
Cutadapt (v4.4), an app most commonly used for removing adapter sequences. It finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequences from high-throughput sequencing reads.
Seven tools from the kallisto 0.48.0 toolkit:
- kallisto quant computes equivalence classes for reads and quantifies transcript abundances from RNA-Seq data.
- kallisto quant-tcc runs the EM algorithm on a supplied TCC matrix file to make transcript-level estimates.
- kallisto bus produces output files in BUS (Barcode-UMI-Set) format from single-cell RNA-Seq datasets.
- kallisto merge merges the results of several batches obtained by kallisto pseudo.
- kallisto h5dump converts HDF5-formatted results to plaintext.
- kallisto index builds an index from a FASTA-formatted transcriptome file of target sequences.
- kallisto inspect outputs the target de Bruijn Graph from the kallisto index file in different file formats.

June 1st, 2023

Recently published apps

We published the following apps in our Public Apps gallery:

Parabricks fq2bam (4.0.0-1) – GPU-accelerated alignment, duplicate marking and optionally BQSR.
Parabricks haplotypecaller (4.0.0-1) – GPU-accelerated GATK HaplotypeCaller.
Parabricks deepvariant (4.0.0-1) – GPU-accelerated version of DeepVariant.
Parabricks Somatic Calling workflow – calling somatic variants from a matched tumor-normal sample pair. It is based on running accelerated Mutect2 on GPU instances with or without a panel of normals.

May 22nd, 2023

Recently published apps

We published the NanoMod workflow version 1.1. NanoMod is a workflow for detecting RNA modifications using Oxford Nanopore direct long-read sequencing data.

April 21st, 2023

Recently published apps

We published the following tools from the GATK 4.4.0.0 and ensembl-vep 109.3 toolkits:

GATK BaseRecalibrator, which generates a recalibration table based on various covariates for input mapped reads.
GATK ApplyBQSR, which recalibrates the base quality scores of an input BAM or CRAM file containing reads.
GATK GatherBQSRReports, which gathers scattered BQSR recalibration reports into a single file.
GATK HaplotypeCaller, which calls germline SNPs and indels from input BAM file(s) via local re-assembly of haplotypes.
GATK VariantFiltration, which is used for filtering variants in a VCF file based on INFO and/or FORMAT annotations.
Augmented Filter VEP, which is a customized wrapper of the filter_vep script from the ensembl-vep toolkit. The tool is modified to allow GNU parallel-scattered filtering of VEP-annotated VCFs split on chromosomes.
Variant Effect Predictor, which predicts functional effects of genomic variants and is used to annotate VCF files.
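
The idea of splitting a VCF on chromosomes so that each piece can be filtered independently can be sketched as follows. This is a minimal illustration of the scatter step only, not the Augmented Filter VEP implementation; the function name is hypothetical, and the sketch assumes the VCF is available as an iterable of tab-delimited text lines.

```python
from collections import defaultdict


def split_vcf_by_chromosome(lines):
    """Group VCF lines by chromosome, repeating the header in every group.

    Returns a dict mapping chromosome name -> list of lines (header + records),
    so each group is a self-contained VCF that can be processed on its own.
    """
    header = []
    records = defaultdict(list)
    for line in lines:
        if line.startswith("#"):
            header.append(line)          # meta-information and column header
        elif line.strip():
            chrom = line.split("\t", 1)[0]  # CHROM is the first VCF column
            records[chrom].append(line)
    return {chrom: header + recs for chrom, recs in records.items()}


vcf_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr2\t200\t.\tC\tT\t60\tPASS\t.",
    "chr1\t300\t.\tG\tA\t70\tPASS\t.",
]
chunks = split_vcf_by_chromosome(vcf_lines)
print(sorted(chunks))
```

Each per-chromosome chunk could then be handed to a separate filtering process and the results concatenated afterwards.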

In addition, the VEP annotation workflow 109.3 is also live and available in the Public Apps gallery. It is used for preprocessing, annotating, and filtering VCF files using the vt toolkit and VEP.

We also published the PURPLE CNV Calling Workflow, used for somatic CNV calling and purity and ploidy estimation on WGS data. It is based on PURPLE 3.7.2 and includes two additional tools, AMBER and COBALT. The workflow first calculates B-allele frequency (BAF) with AMBER and read depth ratios with COBALT, which are then used by PURPLE to estimate the purity, ploidy and copy number profile of a tumor sample.

April 10th, 2023

RHEO becomes more flexible and robust

For an improved experience of creating code packages and running automations in RHEO, we have introduced several improvements that are already live and ready to be used:

When creating code packages, you can now select which Python version to use. This helps maintain reproducibility while also letting you benefit from the features available in more recent Python versions. The available options are Python 3.6, 3.7 and 3.8, with 3.8 being the default version if this parameter is not specified.
The new AWS execution mode enables automation runs directly on AWS Fargate instances instead of using the available RHEO execution infrastructure. AWS execution mode is activated using a dedicated configuration parameter.
Stability and monitoring have been improved for automation runs executed on both the RHEO execution infrastructure and AWS Fargate.
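
The Python version rule described above (3.6, 3.7 or 3.8 allowed, with 3.8 used when nothing is specified) can be expressed as a short sketch. This is only an illustration of the selection logic, not RHEO's actual API; the function and constant names are hypothetical.

```python
# Hypothetical names; the values come from the release note above.
SUPPORTED_PYTHON_VERSIONS = ("3.6", "3.7", "3.8")
DEFAULT_PYTHON_VERSION = "3.8"


def resolve_python_version(requested=None):
    """Return the Python version to use for a code package.

    Falls back to the default when no version is requested and rejects
    versions outside the supported set.
    """
    if requested is None:
        return DEFAULT_PYTHON_VERSION
    if requested not in SUPPORTED_PYTHON_VERSIONS:
        raise ValueError(
            f"unsupported Python version {requested!r}; "
            f"choose one of {', '.join(SUPPORTED_PYTHON_VERSIONS)}"
        )
    return requested


print(resolve_python_version())       # no version specified
print(resolve_python_version("3.6"))  # explicit pin for reproducibility
```

Pinning the version explicitly keeps a code package reproducible even if the default changes later.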

Recently published apps

We published the following tools from the STAARpipeline 0.9.6 and FAVORannotator 1.0.0 toolkits:

STAARpipeline tool, which performs phenotype-genotype association analyses using the STAAR procedure. The app is designed for analyzing whole-genome/whole-exome sequencing data.
STAARpipelineSummary VarSet tool, which summarizes results from the STAAR procedure for analyzing WGS and WES data.
STAARpipelineSummary IndVar tool, which extracts information on individual variants from a user-specified variant set.
FAVORannotator tool, which functionally annotates genotype data in GDS format using the FAVOR Database. The resulting file can then facilitate a wide range of functionally-informed downstream analyses, for example, phenotype-genotype association analyses using the STAARpipeline toolkit.

April 3rd, 2023

Recently published apps

We have published the following new and updated apps in our Public Apps gallery:

ABySS 2.3.5 – a de novo sequence assembler intended for short paired-end reads and genomes of all sizes.
Minia 3.2.6 – a short-read assembler based on a de Bruijn graph.
IDBA 1.1.3 toolkit:
- IDBA-Hybrid – a de novo assembler for hybrid sequencing data.
- IDBA-UD – a short-read-data de novo assembler.
- fq2fa – used for converting FASTQ format read data to FASTA format suitable for IDBA tools.
ABACAS 1.3.1 – used for contiguating reference-based assemblies.
Viralrecon Illumina De novo assembly workflow – designed for assembly of amplicon and metagenomics short reads. It can analyze metagenomics data obtained from shotgun sequencing (e.g. directly from clinical samples) and enrichment-based library preparation methods (e.g. amplicon-based or probe-capture-based data). It takes Illumina short reads from single or multiple samples and performs read trimming, host read removal, assembly with one of the five included assemblers, BLAST searches, and calculation of various QC metrics.
Picard 3.0.0 toolkit:
- Picard CollectMultipleMetrics – collects BAM statistics by running multiple Picard modules at once.
- Picard ValidateSamFile – validates an alignment file against the SAM specification.
- Picard SortSam – sorts alignment files (BAM or SAM).
- Picard RevertSam – reverts a BAM/SAM file to a previous state.
- Picard MarkDuplicates – marks duplicate reads in alignment files.
- Picard GenotypeConcordance – calculates genotype concordance between two VCF files.
- Picard GatherBamFiles – merges BAM files after a scattered analysis.
- Picard FixMateInformation – verifies and fixes mate-pair information.
- Picard FastqToSam – converts FASTQ files to an unaligned SAM or BAM file.
- Picard CrosscheckFingerprints – checks a set of data files for sample identity.
- Picard CreateSequenceDictionary – creates a DICT index file for a sequence.
- Picard CollectWgsMetricsWithNonZeroCoverage – evaluates the coverage and performance of WGS experiments.
- Picard CollectVariantCallingMetrics – can be used to collect variant call statistics after variant calling.
- Picard CollectSequencingArtifactMetrics – collects metrics to quantify single-base sequencing artifacts.
- Picard CollectHsMetrics – collects hybrid-selection metrics for alignments in SAM or BAM format.
- Picard CollectAlignmentSummaryMetrics – produces a summary of alignment metrics from a SAM or BAM file.
- Picard CheckFingerprint – checks sample identity of provided data against known genotypes.
- Picard BedToIntervalList – converts a BED file to a Picard INTERVAL_LIST format.
- Picard AddOrReplaceReadGroups – assigns all reads to the specified read group.
SnpEff 5.1d toolkit:
- SnpSift Filter – filters SnpEff-annotated VCF files using arbitrary expressions.
- SnpEff – a variant annotation and effect prediction tool.
- SnpSift Annotate – annotates VCF files.
- SnpSift dbNSFP – allows annotation with dbNSFP (an integrated database of functional predictions from multiple algorithms, including SIFT, Polyphen2, LRT, MutationTaster, PhyloP and GERP++).

August 7th, 2023

Recently published apps

July 17th, 2023

Recently published apps

June 19th, 2023

Recently published apps

June 6th, 2023

DRS import available on the Seven Bridges Platform

Recently published apps

June 1st, 2023

Recently published apps

May 22nd, 2023

Recently published apps

April 21st, 2023

Recently published apps

April 10th, 2023

RHEO becomes more flexible and robust

Recently published apps

April 3rd, 2023

Recently published apps
