Release notes

July 18th, 2022

Recently published apps

We have just updated our Public Apps gallery with Regenie 3.1.3, a tool used for whole-genome regression analysis.


June 27th, 2022

Recently published apps

We have published the following apps in our Public Apps gallery:

  • Mosdepth 0.3.3 toolkit: Mosdepth, a tool for fast depth calculation on WGS, WES or targeted BAM and CRAM files, and Mosdepth plot_dist, which plots Mosdepth results.
  • Personal Cancer Genome Reporter 1.0.3, which is used for functional annotation and classification of somatic variants.
  • Cancer Predisposition Sequencing Reporter 1.0.0, which analyzes cancer-predisposing germline variants.

We have also published updated versions of tools from the following two toolkits: SRA (v3.0.0, CWL1.2) and Salmon (v1.5.2, CWL1.2). The updated tools are:

  • SRA sam-dump that converts SRA data into SAM format. With aligned data, NCBI uses Compression by Reference, which only stores the differences in base pairs between sequence data and the segment it aligns to. The process to restore original data, for example as FASTQ, requires fast access to the reference sequences that the original data was aligned to.
  • SRA fasterq-dump that converts SRA data into FASTQ format while using temporary files and multi-threading to speed up the extraction.
  • SRA fastq-dump that converts SRA data into FASTQ format.
  • Salmon Alevin that introduces a family of algorithms for quantification and analysis of 3’ tagged-end single-cell sequencing data.
  • Salmon Index that builds an index necessary for the Salmon Quant and Salmon Alevin tools. To create an index, it uses a transcriptome reference file in FASTA format. Additionally, one can provide a genome reference along with the transcriptome to create a hybrid index compatible with the improved mapping algorithm named Selective Alignment.

June 20th, 2022

Recently published apps

We have published three VarDict (v1.8.3, CWL1.2) tools and one workflow:

  • VarDictJava is the VarDict variant caller Java port. It can be used to call SNPs, MNVs, small indels or complex variants from DNA or RNA alignments. VarDictJava can be used for amplicon-based variant calling and supports both single sample and paired sample analysis.
  • VarDict var2vcf_valid, a CWL tool that takes a VarDict tabular variants file and outputs the variants in VCF format.
  • VarDict var2vcf_paired, a CWL tool that converts VarDict tabular output to VCF.
  • VarDict Variant Calling workflow (also VarDict v1.8.3, CWL1.2), which can be used for single sample and paired sample variant calling using VarDictJava starting from WES, WGS or amplicon data.

We have also published the following workflows and a toolkit:

  • CNVnator Analysis workflow 0.4.1 for CNV calling by doing read-depth (RD) analysis of input BAM files.
  • CNVpytor workflow 1.1 for CNV/CNA detection and analysis based on read depth and allele imbalance in WGS.
  • PureCN workflow 1.22.0 for estimating tumor purity and ploidy, copy number and loss of heterozygosity (LOH), and the PureCN NormalDB workflow, which builds a normal database used for coverage normalization in PureCN workflow 1.22.0.
  • FACETS workflow 0.6.1 for allele-specific copy number analysis (ASCN).
  • Tabix BGZIP 1.14.0 for compressing any file (BAM, VCF, BED, …) to BGZF format and decompressing it from BGZF format, and Tabix Index 1.14.0, which indexes a TAB-delimited genome position file IN.TAB.BGZ.

June 6th, 2022

Data Cruncher and Interactive Analysis become Data Studio and Interactive Browsers

Data Studio, previously Data Cruncher, is an interactive analysis tool which allows you to explore and visualize data using environments like JupyterLab and RStudio. Previously located under the Interactive Analysis tab, it has now been given a more prominent location in the project navigation by having its own tab located next to Tasks. With the removal of Data Studio from the Interactive Analysis tab, the tab’s name has been changed to Interactive Browsers in order to better reflect its contents.

Recently published apps

We have just published updated versions (4.2.5.0) of the Mutect2 workflows, along with several other apps:

  • GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0, a workflow used for somatic short variant calling. It runs on a single tumor-normal pair or on a single tumor sample, and performs additional filtering and functional annotation tasks.
  • GATK Create Mutect2 Panel of Normals 4.2.5.0, a workflow that creates a panel of normals for use in other GATK workflows. It takes multiple normal sample callsets and passes them to GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0 in tumor-only mode (although the mode is called tumor-only, normal samples are given as the input), then collates sites present in two or more samples into a sites-only VCF.
  • Three apps from the MetaXcan toolkit:
    • S-PrediXcan for computing associations between omic features and a complex trait starting from GWAS summary statistics.
    • S-MultiXcan for computing association from predicted gene expression to a trait, using multiple studies for each gene.
    • MetaMany for serially performing multiple MetaXcan runs on a GWAS study from summary statistics using multiple tissues.
  • The MetaXcan Workflow for computing associations between omic features and complex traits across multiple tissues. The workflow includes two tools from the MetaXcan framework, MetaMany and S-MultiXcan, and uses summary statistics from a GWAS study and multiple models that predict the expression or splicing quantification.
  • MaxQuant (v2.0.3.0, CWL1.2), a quantitative proteomics tool designed for analyzing large mass-spectrometric data sets. It uses a target-decoy search strategy to estimate and control the extent of false positives. Within the target-decoy strategy, MaxQuant applies the concept of posterior error probability (PEP) to integrate multiple peptide properties (e.g. length, charge, number of modifications) together with the Andromeda score into a single quantity that reflects the quality of a peptide spectrum match (PSM).
  • Manta (v1.6.0, CWL1.2), a tool used for calling structural variants (germline or somatic) from paired-end data. It can process WGS or WES data and supports germline SV calling on one or more samples (up to 10) and somatic SV calling for matched tumor-normal pairs or tumor-only data.

May 30th, 2022

Recently published apps

We have just published the V-pipe 2.99.2 for SARS-CoV-2 workflow, which analyzes high-throughput SARS-CoV-2 sequencing data. V-pipe integrates several tools for the analysis of viral high-throughput sequencing data. It allows for assessing viral diversity at the level of SNVs, short variant sequences (or local haplotypes), and long-range haplotypes (or global haplotypes).


May 24th, 2022

Recently published apps

We have just published the updated 0.7.17 version of BWA MEM Bundle, a well-known tool designed for aligning sequence reads to a large reference genome, and BWA INDEX, which indexes the reference sequence as a prerequisite step for BWA MEM Bundle. Both tools are published in CWL1.2.


May 16th, 2022

Recently published apps

We have published the following apps in our Public Apps gallery:

  • Cyrius (v1.1.1, CWL1.2), a tool that genotypes CYP2D6 in WGS data. It takes WGS BAM or CRAM files and creates a TSV report with CYP2D6 alleles.
  • Two PharmCAT (v1.6.0, CWL1.2) tools:
    • PharmCAT VCF Preprocess is a tool that takes a VCF file and prepares it for downstream processing with PharmCAT, and
    • PharmCAT, a tool for interpreting guideline variants in VCF files.
  • Two Biobambam2 (v2.0.183, CWL1.2) tools:
    • Biobambam2 Bamtofastq that converts BAM/CRAM/SAM files to FASTQ format, and
    • Biobambam2 Bamseqchksum, a tool that calculates hashes for the contents of the provided alignment file.
  • Two Cojac (v0.2, CWL1.2) tools:
    • Cojac cooc-mutbamscan is a tool that scans amplicons for mutation co-occurrence, and
    • Cojac cooc-tabmut converts Cojac cooc-mutbamscan results (JSON, YAML) to a CSV file.
  • Six iVar (v1.3.1, CWL1.2) tools:
    • iVar trim takes a sorted BAM file and trims reads based on quality and primers if provided.
    • iVar variants takes an aligned BAM file and a reference sequence and produces a TSV file with detected variants.
    • iVar filtervariants filters variants across provided TSV replicate or sample variant files.
    • iVar consensus takes an aligned BAM file and generates a FASTA file with consensus sequences and a TXT file with average base qualities.
    • iVar getmasked takes a TSV file with variants generated by iVar tools, a Primers BED file, and a TSV file with primer pair information, and retrieves primers with mismatches to the reference sequence. Please note that this tool is only applicable to amplicon-based sequencing.
    • iVar removereads takes a BAM file trimmed with iVar trim, a Primers BED file, and the Mismatch primer indices output by iVar getmasked, and removes reads associated with the identified mismatched primers.
  • Pangolin (Phylogenetic Assignment of Named Global Outbreak LINeages) (v40.5, CWL1.2), a tool that takes a FASTA file with SARS-CoV-2 sequences and assigns each sequence to a Pango lineage using PangoLEARN.
  • Picard RevertSam (v2.25.7, CWL1.2), a tool that reverts a BAM/SAM file to a previous state. It can be used to recreate an unaligned BAM file from aligned BAM/SAM files or restore original qualities to post-BQSR files (if original qualities were stored).

May 4th, 2022

Recently published apps

We have just published the following apps:

  • An updated version of the SRA Download and Set Metadata workflow (SRA Toolkit 3.0.0) that downloads metadata associated with an SRA accession via SRA Run Info CGI, downloads the FASTQ files (on-demand instance), and sets the corresponding metadata.
  • OptiType (v1.3.5, CWL1.2), a tool designed for precision HLA typing from next-generation sequencing data. It is based on the assumption that the correct HLA genotype explains the highest number of mapped reads. Therefore, it searches for the best HLA allele combination of up to six major and six minor HLA-I alleles. Finding the selection that explains the maximum number of reads, under the biological constraint that at least one and at most two alleles are selected per locus, can be conveniently formulated as an integer linear program (ILP).
  • fastENLOC (v1.0, CWL1.2), a tool that enables integrative genetic association analysis of molecular QTL data and GWAS data. It performs integration of the results from molecular quantitative trait loci (QTL) mapping into genome-wide genetic association analysis of complex traits, with the primary objective of quantitatively assessing the enrichment of the molecular QTLs in complex trait-associated genetic variants and the colocalizations of the two types of association signals.

April 11th, 2022

Recently published apps

We have just published the following apps in our Public Apps gallery:

  • TwoSampleMR, a tool that performs Mendelian randomization testing for a given exposure-outcome pair. It is a wrapper around the TwoSampleMR R package and uses summary statistics data for making causal inference.
  • CCS, a tool that combines multiple subreads of the same SMRTbell molecule and outputs one highly accurate consensus sequence.
  • lima, a tool used with PacBio single-molecule sequencing data for identifying barcode and primer sequences.
  • PacBio Flowcell Data Processing, a workflow that can be used to process PacBio CCS or CLR data in preparation for variant calling.
  • PacBio CCS or CLR WGS Variant Calling workflow that can be used to call structural variants in PacBio CCS or CLR data. The workflow can also call small variants in CCS data using Clair3.
  • WARP WGS DRAGEN-GATK Single Sample, a WGS single sample processing workflow with DRAGMAP and GATK.
  • WARP TargetedSomaticSingleSample Pipeline, a workflow designed for analyzing somatic human targeted sequencing data. It takes human single-sample uBAM input files, converts them into FASTQ reads and maps them to a reference. The resulting alignment files are passed to quality control tools that calculate various quality metrics. The output alignment files can be used for variant calling or other analyses by different tools and pipelines, while the metrics outputs provide quality and statistical information about the input data and the produced alignment file.
