Release notes

May 22nd, 2023

Recently published apps

We published the NanoMod workflow version 1.1. NanoMod is a workflow for detecting RNA modifications using Oxford Nanopore direct long-read sequencing data.

April 21st, 2023

Recently published apps

We published the following tools from the GATK 4.4.0.0 and ensembl-vep 109.3 toolkits:

  • GATK BaseRecalibrator, which generates a recalibration table based on various covariates for input mapped reads.
  • GATK ApplyBQSR, which recalibrates the base quality scores of an input BAM or CRAM file containing reads.
  • GATK GatherBQSRReports, which gathers scattered BQSR recalibration reports into a single file.
  • GATK HaplotypeCaller, which calls germline SNPs and indels from input BAM file(s) via local re-assembly of haplotypes.
  • GATK VariantFiltration, which filters variants in a VCF file based on INFO and/or FORMAT annotations.
  • Augmented Filter VEP, which is a customized wrapper of the filter_vep script from the ensembl-vep toolkit. The tool is modified to allow GNU parallel-scattered filtering of VEP-annotated VCFs split on chromosomes.
  • Variant Effect Predictor, which predicts functional effects of genomic variants and is used to annotate VCF files.

The VEP annotation workflow 109.3 is also live and available in the Public Apps gallery. It is used for preprocessing, annotating, and filtering VCF files using the vt toolkit and VEP.

We also published the PURPLE CNV Calling Workflow, used for somatic CNV calling and for purity and ploidy estimation on WGS data. It is based on PURPLE 3.7.2 and includes two additional tools, AMBER and COBALT. The workflow first calculates B-allele frequency (BAF) with AMBER and read depth ratios with COBALT; PURPLE then uses both to estimate the purity, ploidy and copy number profile of a tumor sample.
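
For readers unfamiliar with the two quantities mentioned above, the following is a minimal Python sketch of how a B-allele frequency and a read depth ratio can be computed for a single site or window. The function names and the median-based normalization are illustrative assumptions, not the AMBER or COBALT implementation; the actual tools apply additional normalization and filtering that is not shown here.

```python
def b_allele_frequency(ref_count: int, alt_count: int) -> float:
    """Fraction of reads at a heterozygous site that support the alternate (B) allele."""
    total = ref_count + alt_count
    if total == 0:
        raise ValueError("no reads cover this site")
    return alt_count / total


def depth_ratio(tumor_depth: float, reference_depth: float,
                tumor_median: float, reference_median: float) -> float:
    """Tumor/reference read depth ratio for one window.

    Each depth is first divided by its sample-wide median (an assumed,
    simplified normalization) so that samples sequenced to different
    depths are comparable.
    """
    return (tumor_depth / tumor_median) / (reference_depth / reference_median)


# Hypothetical example values: 30 reference reads and 10 alternate reads at one site,
# and a window with 60x tumor depth versus 30x reference depth.
baf = b_allele_frequency(30, 10)
ratio = depth_ratio(60, 30, tumor_median=40, reference_median=30)
print(f"BAF={baf:.2f}, depth ratio={ratio:.2f}")
```

A depth ratio above 1 in this toy setting suggests a relative copy number gain in the tumor, while a BAF that deviates from 0.5 suggests allelic imbalance.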

April 10th, 2023

RHEO becomes more flexible and robust

To improve the experience of creating code packages and running automations in RHEO, we have introduced several changes that are already live and ready to use:

  • When creating code packages, you can now select which Python version to use. This lets you maintain reproducibility while also benefiting from features available in more recent Python versions. The available options are Python 3.6, 3.7 and 3.8; if this parameter is not specified, 3.8 is used by default.
  • The new AWS execution mode enables automation runs directly on AWS Fargate instances instead of using the available RHEO execution infrastructure. AWS execution mode is activated using a dedicated configuration parameter.
  • Stability and monitoring of automation runs have been improved on both the RHEO execution infrastructure and AWS Fargate.

Recently published apps

We published the following tools from the STAARpipeline 0.9.6 and FAVORannotator 1.0.0 toolkits:

  • STAARpipeline tool, which performs phenotype-genotype association analyses using the STAAR procedure. The app is designed for analyzing whole-genome/whole-exome sequencing data.
  • STAARpipelineSummary VarSet tool, which summarizes results from the STAAR procedure for analyzing WGS and WES data.
  • STAARpipelineSummary IndVar tool, which extracts information on individual variants from a user-specified variant set.
  • FAVORannotator tool, which functionally annotates genotype data in GDS format using the FAVOR Database. The resulting file can then facilitate a wide range of functionally-informed downstream analyses, for example, phenotype-genotype association analyses using the STAARpipeline toolkit.

April 3rd, 2023

Recently published apps

We have published the following new and updated apps in our Public Apps gallery:

  • ABySS 2.3.5 – a de novo sequence assembler intended for short paired-end reads and genomes of all sizes.
  • Minia 3.2.6 – a short-read assembler based on a de Bruijn graph.
  • IDBA 1.1.3 toolkit:
    • IDBA-Hybrid – a de novo assembler for hybrid sequencing data.
    • IDBA-UD – a short-read-data de novo assembler.
    • fq2fa – used for converting FASTQ format read data to FASTA format suitable for IDBA tools.
  • ABACAS 1.3.1 – used for contiguating reference-based assemblies.
  • Viralrecon Illumina De novo assembly workflow – designed for assembling amplicon and metagenomics short reads. It can analyze metagenomics data obtained from shotgun sequencing (e.g. directly from clinical samples) and from enrichment-based library preparation methods (e.g. amplicon-based or probe-capture-based data). It takes Illumina short reads from one or more samples and performs read trimming, host read removal, assembly with one of the five included assemblers, BLAST searches, and calculation of various QC metrics.
  • Picard 3.0.0 toolkit:
    • Picard CollectMultipleMetrics – collects BAM statistics by running multiple Picard modules at once.
    • Picard ValidateSamFile – validates an alignments file against the SAM specification.
    • Picard SortSam – sorts alignment files (BAM or SAM).
    • Picard RevertSam – reverts a BAM/SAM file to a previous state.
    • Picard MarkDuplicates – marks duplicate reads in alignment files.
    • Picard GenotypeConcordance – calculates genotype concordance between two VCF files.
    • Picard GatherBamFiles – merges BAM files after a scattered analysis.
    • Picard FixMateInformation – verifies and fixes mate-pair information.
    • Picard FastqToSam – converts FASTQ files to an unaligned SAM or BAM file.
    • Picard CrosscheckFingerprints – checks a set of data files for sample identity.
    • Picard CreateSequenceDictionary – creates a DICT index file for a sequence.
    • Picard CollectWgsMetricsWithNonZeroCoverage – evaluates the coverage and performance of WGS experiments.
    • Picard CollectVariantCallingMetrics – collects variant call statistics after variant calling.
    • Picard CollectSequencingArtifactMetrics – collects metrics to quantify single-base sequencing artifacts.
    • Picard CollectHsMetrics – collects hybrid-selection metrics for alignments in SAM or BAM format.
    • Picard CollectAlignmentSummaryMetrics – produces a summary of alignment metrics from a SAM or BAM file.
    • Picard CheckFingerprint – checks sample identity of provided data against known genotypes.
    • Picard BedToIntervalList – converts a BED file to a Picard INTERVAL_LIST format.
    • Picard AddOrReplaceReadGroups – assigns all reads to the specified read group.
  • SnpEff 5.1d toolkit:
    • SnpSift Filter – filters SnpEff-annotated VCF files using arbitrary expressions.
    • SnpEff – a variant annotation and effect prediction tool.
    • SnpSift Annotate – annotates VCF files.
    • SnpSift dbNSFP – allows annotation with dbNSFP (an integrated database of functional predictions from multiple algorithms, including SIFT, Polyphen2, LRT, MutationTaster, PhyloP and GERP++).

March 27th, 2023

Recently published apps

We have just published the following GATK 4.4.0.0 tools:

  • GATK IndexFeatureFile – used for indexing provided feature files.
  • GATK MergeVcfs – used for combining multiple variant files.
  • GATK VariantEval BETA – used for evaluating variant calls.
  • GATK FilterMutectCalls – used to filter somatic SNVs and indels called by Mutect2.

We have also published Minimac 4 4.1.2, which is a tool for imputing genotypes.

March 20th, 2023

Recently published apps

Metagenomics WGS analysis – Centrifuge 1.0.4

A workflow for analyzing metagenomic samples. It assigns taxonomic labels to DNA sequences, estimates the abundance of the taxonomic categories in the sample, makes visualizations that give insights into the taxonomic structure of the sample, and makes files that are suitable for downstream analysis. This allows researchers to assign reads from their samples to a likely species of origin and quantify each species’ abundance.
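
To illustrate the idea of turning per-read taxonomic labels into abundance figures, here is a minimal Python sketch that simply tallies labels and divides by the total number of classified reads. The function name and the input format are hypothetical, and this naive tally is only an illustration; it is not how the workflow itself estimates abundance.

```python
from collections import Counter


def relative_abundance(read_labels):
    """Return the fraction of reads assigned to each taxonomic label.

    `read_labels` is an assumed input format: one label per classified read.
    """
    counts = Counter(read_labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in counts.items()}


# Hypothetical labels for four classified reads.
labels = ["Escherichia coli", "Escherichia coli",
          "Escherichia coli", "Staphylococcus aureus"]
for taxon, fraction in relative_abundance(labels).items():
    print(f"{taxon}\t{fraction:.2f}")
```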

Reference Index Creation – Centrifuge 1.0.4

A workflow that builds an index from reference sequences downloaded from NCBI databases.

Five tools from the Centrifuge 1.0.4 toolkit:

  • Centrifuge Classifier is the main tool of the Centrifuge toolkit, used for classification of metagenomics reads.
  • Centrifuge Download is a part of the Centrifuge toolkit, used for downloading reference sequences from NCBI.
  • Centrifuge Build is a part of the Centrifuge toolkit, which makes a Centrifuge index from DNA sequences.
  • Centrifuge Kreport is used to make a Kraken-style report from the Centrifuge Classifier output.
  • Centrifuge Inspect is a part of the Centrifuge toolkit that inspects index files.

February 13th, 2023

Recently published apps

We have published HTSeq-count (2.0.2 in CWL 1.2). HTSeq-count is a Python tool for counting how many reads map to each feature. It takes aligned reads together with a list of genomic features as inputs, and outputs a TSV table with counts for each genomic feature.
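
The counting idea described above can be sketched in a few lines of plain Python. This is a simplified illustration that does not use the HTSeq library: the feature and read tuples are made-up toy data, intervals are assumed to be half-open, and the handling of reads that hit zero or several features only loosely mirrors the `__no_feature` and `__ambiguous` counters that HTSeq-count reports.

```python
from collections import Counter

# Toy inputs (assumptions): features as (name, chrom, start, end),
# aligned reads as (chrom, start, end), all with half-open coordinates.
FEATURES = [
    ("geneA", "chr1", 100, 200),
    ("geneB", "chr1", 300, 400),
]
READS = [
    ("chr1", 110, 160),
    ("chr1", 150, 210),
    ("chr1", 310, 360),
    ("chr2", 10, 60),
]


def count_reads(features, reads):
    """Count how many reads overlap each feature."""
    counts = Counter({name: 0 for name, *_ in features})
    for chrom, start, end in reads:
        # A read hits a feature if both lie on the same chromosome and overlap.
        hits = [name for name, f_chrom, f_start, f_end in features
                if f_chrom == chrom and start < f_end and end > f_start]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


def to_tsv(counts):
    """Render the counts as tab-separated lines, one feature per line."""
    return "\n".join(f"{name}\t{n}" for name, n in counts.items())


print(to_tsv(count_reads(FEATURES, READS)))
```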

February 6th, 2023

We have just published five tools from the GraphicsMagick 1.3.38 toolkit, the Swiss Army knife of image processing:

  • GraphicsMagick compare compares two images using statistics and/or visual differencing. It reports difference statistics according to specified metrics and/or outputs an image with a visual representation of the differences.
  • GraphicsMagick composite composites (combines) images to create a new image.
  • GraphicsMagick conjure interprets and executes scripts in the Magick Scripting Language (MSL). MSL primarily benefits those who want to accomplish custom image processing tasks but do not wish to program.
  • GraphicsMagick convert converts an input image file from one image format to an output file in the same or a different format, while applying an arbitrary number of image transformations.
  • GraphicsMagick montage creates a composite image by combining several separate images.

January 16th, 2023

Recently published apps

We have just published two Bowtie2 2.4.5 (CWL 1.2) tools:

  • Bowtie2 Indexer, for building a Bowtie index from a set of DNA sequences.
  • Bowtie2 Aligner, for performing end-to-end read alignment.

On top of that, there are two more additions to our Public Apps gallery:

  • RSeQC – Junction Saturation 5.0.1 (CWL 1.2) tool for determining whether the sequencing depth is sufficient for alternative splicing analysis.
  • GATK IndexFeature 4.2.5.0 tool.
