Recently published apps
We have published the following apps in our Public Apps gallery:
- Cyrius (v1.1.1, CWL1.2), a tool that genotypes CYP2D6 in WGS data. It takes WGS BAM or CRAM files and creates a TSV report with CYP2D6 alleles.
- Two PharmCAT (v1.6.0, CWL1.2) tools:
- PharmCAT VCF Preprocess is a tool that takes a VCF file and prepares it for downstream processing with PharmCAT, and
- PharmCAT, a tool for interpreting guideline variants in VCF files.
- Two Biobambam2 (v2.0.183, CWL1.2) tools:
- Biobambam2 Bamtofastq, a tool that converts BAM/CRAM/SAM files to FASTQ format, and
- Biobambam2 Bamseqchksum, a tool that calculates hashes for the contents of the provided alignments file.
- Two Cojac (v0.2, CWL1.2) tools:
- Cojac cooc-mutbamscan is a tool that scans amplicons for mutation co-occurrence, and
- Cojac cooc-tabmut converts Cojac cooc-mutbamscan results (JSON, YAML) to a CSV file.
- Six iVar (v1.3.1, CWL1.2) tools:
- iVar trim takes a sorted BAM file and trims reads based on quality and primers if provided.
- iVar variants takes an aligned BAM file and a Reference sequence and produces a TSV file with detected variants.
- iVar filtervariants filters variants across provided TSV replicate or sample variant files.
- iVar consensus takes an aligned BAM file and generates a FASTA file with consensus sequences and a TXT file with average base qualities.
- iVar getmasked takes a TSV file with variants generated by iVar tools, a Primers BED file, and a TSV file with primer pair information, and retrieves primers with mismatches to the reference sequence. Please note that this tool is only applicable to amplicon-based sequencing.
- iVar removereads takes a BAM file trimmed with iVar trim, a Primers BED file, and the Mismatch primer indices output by iVar getmasked and removes reads associated with identified mismatched primers.
- Pangolin (v40.5, CWL1.2) (Phylogenetic Assignment of Named Global Outbreak LINeages) is a tool that takes a FASTA file with SARS-CoV-2 sequences and assigns each sequence to a Pango lineage using PangoLEARN.
- Picard RevertSam (v2.25.7, CWL1.2) is a tool that reverts a BAM/SAM file to a previous state. It can be used to recreate an unaligned BAM file from aligned BAM/SAM files or restore original qualities to post-BQSR files (if original qualities were stored).
Recently published apps
We have just published the following apps:
- An updated version of the SRA Download and Set Metadata workflow (SRA Toolkit 3.0.0) that retrieves the metadata associated with an SRA accession via SRA Run Info CGI, downloads the FASTQ files (on an on-demand instance), and sets the corresponding metadata.
- OptiType (v1.3.5, CWL1.2), a tool designed for precision HLA typing from next-generation sequencing data. It is based on the assumption that the correct HLA genotype explains the highest number of mapped reads. Therefore, it searches for the best HLA allele combination of up to six major and six minor HLA-I alleles. Finding the selection that explains the maximum number of reads, under the biological constraint that at least one and at most two alleles are selected per locus, can be conveniently formulated as an integer linear program (ILP).
- fastENLOC (v1.0, CWL1.2), a tool that enables integrative genetic association analysis of molecular QTL data and GWAS data. It performs integration of the results from molecular quantitative trait loci (QTL) mapping into genome-wide genetic association analysis of complex traits, with the primary objective of quantitatively assessing the enrichment of the molecular QTLs in complex trait-associated genetic variants and the colocalizations of the two types of association signals.
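The allele selection idea described for OptiType above can be sketched on toy data. The example below is only an illustration and not OptiType's implementation: it uses brute-force enumeration instead of an ILP solver, the function name `best_genotype` and the read-to-allele hits are made up for the example, and the tie-break in favor of fewer alleles is an assumption of this sketch.

```python
from itertools import combinations, product


def best_genotype(hits, alleles_by_locus):
    """Pick 1 or 2 alleles per locus so that the most reads are explained.

    hits: dict mapping a read ID to the set of alleles the read maps to.
    alleles_by_locus: dict mapping a locus name to its candidate alleles.
    Returns (selected_alleles, number_of_explained_reads).
    """
    # Per locus, enumerate every allowed choice: one allele or two distinct alleles.
    per_locus = [
        [frozenset(c) for k in (1, 2) for c in combinations(alleles, k)]
        for alleles in alleles_by_locus.values()
    ]

    best_key, best_selection = None, None
    for combo in product(*per_locus):
        selected = frozenset().union(*combo)
        # A read counts as explained if it maps to at least one selected allele.
        explained = sum(1 for mapped in hits.values() if mapped & selected)
        # Assumption of this sketch: on ties, prefer the smaller selection.
        key = (explained, -len(selected))
        if best_key is None or key > best_key:
            best_key, best_selection = key, selected
    return best_selection, best_key[0]


# Hypothetical toy input: two loci with two candidate alleles each, five reads.
alleles_by_locus = {"A": ["A*01", "A*02"], "B": ["B*07", "B*08"]}
hits = {
    "r1": {"A*01"},
    "r2": {"A*01", "A*02"},
    "r3": {"A*02"},
    "r4": {"B*07"},
    "r5": {"B*07"},
}

selection, explained = best_genotype(hits, alleles_by_locus)
print(sorted(selection), explained)
```

Enumeration is only feasible for a handful of candidate alleles. The ILP formulation mentioned above expresses the same objective and per-locus constraints in a form that a solver can handle at realistic scale.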
Recently published apps
We have just published the following apps in our Public Apps gallery:
- TwoSampleMR, a tool that performs Mendelian randomization testing for a given exposure-outcome pair. It is a wrapper around the TwoSampleMR R package and uses summary statistics data for making causal inference.
- CCS, a tool that combines multiple subreads of the same SMRTbell molecule and outputs one highly accurate consensus sequence.
- lima, a tool used with PacBio single-molecule sequencing data to identify barcode and primer sequences.
- PacBio Flowcell Data Processing, a workflow that can be used to process PacBio CCS or CLR data in preparation for variant calling.
- PacBio CCS or CLR WGS Variant Calling, a workflow that can be used to call structural variants in PacBio CCS or CLR data. The workflow can also call small variants in CCS data using Clair3.
- WARP WGS DRAGEN-GATK Single Sample, a WGS single sample processing workflow with DRAGMAP and GATK.
- WARP TargetedSomaticSingleSample Pipeline, designed for the analysis of human somatic targeted sequencing data. The workflow takes human single-sample uBAM input files, converts them into FASTQ reads, and maps the reads to a reference. The resulting alignment files are passed to quality control tools that calculate a range of quality metrics. The output alignment files can be used for variant calling or other analyses with different tools and pipelines, while the metrics outputs provide quality and statistical information about the input data and the produced alignment files.
Recently published apps
We’ve just published AnnotationDbi select and mapIds, a tool that maps one type of ID to another. It is based on Bioconductor annotation data packages.
Recently published apps
New apps have been added to the Seven Bridges Platform:
- Two tools from the Samplot toolkit:
- Samplot Plot takes alignment files and coordinates for a region containing the SV call of interest (Chromosome, Start position, and End position) and creates a plot of the SV region.
- Samplot Vcf can be used to create visualizations of structural variant calls from a VCF file.
- Seven tools from the Smoove toolkit:
- Smoove Annotate annotates SV calls with SV quality and gene information from GFF3 files.
- Smoove Call calls structural variants with Lumpy and optionally calls svtyper.
- Smoove Duphold annotates SV calls in the file based on information from the provided alignment files.
- Smoove Genotype runs svtyper in parallel on provided SV inputs.
- Smoove Merge merges SV calls from individual files and sorts them using svtools.
- Smoove Paste squares matching SV calls from individual files to a single joint file with final calls.
- Smoove Plot-counts takes a VCF file created by other Smoove tools and plots counts of split and discordant reads before and after filtering.
- Upgraded four Sambamba tools to 0.8.1 (and CWL 1.2) and added three new tools:
- Sambamba Flagstat generates statistics from read flags in a BAM file.
- Sambamba Index creates a BAI or FAI index for the provided input.
- Sambamba Markdup can be used to mark or remove duplicate reads from an input BAM file.
- Sambamba Merge merges alignments in BAM format.
- Sambamba Slice can be used to copy a slice (region) of the coordinate sorted and indexed input file in BAM or FASTA format.
- Sambamba Sort sorts alignments in BAM format.
- Sambamba View accepts alignments in BAM or SAM format and outputs data in a user-specified format.
- Vcf2maf is a tool that converts VCF files to MAF files. To obtain a MAF file from a VCF, each variant must be mapped to exactly one gene transcript/isoform that it might affect, and be associated with exactly one effect. Vcf2maf invokes Variant Effect Predictor to choose the transcript and effect associated with each variant in the output MAF file.
- Cancer Predisposition Sequencing Reporter, a tool that can be used to interpret germline variants in the context of cancer predisposition. The tool takes a VCF file with germline variants obtained from WES or WGS, cross-references the set with the user-selected set of genes of interest, annotates the variants, and reports the associated information (ClinVar-classified variants, ACMG secondary findings, Variant biomarkers, and GWAS hits).
- Personal Cancer Genome Reporter can be used to analyze a VCF with somatic variants obtained using WES, WGS or targeted sequencing. It is a tool for functional annotation and classification of somatic variants.
- Whole Genome Sequencing – Quality Control workflow, used for quality control of WGS data. The workflow is intended as a general-purpose QC workflow for users processing WGS data, offering plots which can be easily inspected, as well as structured data output suitable for aggregation and parsing in an automated setup (JSON and TAR.GZ archive with all QC files).
GDC Datasets version update
As of March 11, 2022, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 31.
Recently published apps
We have added four apps to our public apps gallery:
- Single cell RNA-seq velocity analysis with scVelo 0.2.4, a workflow that performs preprocessing, marker gene analysis, and velocity analysis of single-cell expression data. It is based on SingleCellExperiment, Seurat, scran, scater, AnnotationHub, scuttle, and scVelo.
- Velocyto.py – Velocyto 0.17.17 is a package for the analysis of expression dynamics in single cell RNAseq data. In particular, it enables estimations of RNA velocities of single cells by distinguishing unspliced and spliced mRNAs in standard single-cell RNA sequencing protocols. Velocyto.py is a command line tool (distributed with the package) that is used to generate spliced/unspliced count matrices.
- SBG single cell object convertor, a tool that converts single-cell data objects between commonly used formats: Seurat, AnnotatedData, and SingleCellExperiment.
- Single cell RNA-seq trajectory analysis with slingshot and tradeSeq, a tool that performs single cell trajectory analysis with slingshot 2.0.0, and differential expression testing on inferred trajectories with tradeSeq 1.6.0. Slingshot takes advantage of single cell data principal components analysis (PCA) and clustering to infer probable paths of cell development.
New: Support for Nextflow and WDL workflows available on the Seven Bridges Platform
In addition to the significant contributions of Seven Bridges team members to the development of the Common Workflow Language (CWL) and its extensive implementation on our Platform, we are now going a step further and supporting two more workflow description languages, Nextflow and WDL. This reduces the time needed to bring your apps to the Seven Bridges Platform and eliminates the need to convert your Nextflow or WDL code, while still letting you use the Platform interface for running workflows and all other out-of-the-box features of the Seven Bridges ecosystem.
The process of bringing Nextflow and WDL apps to the Seven Bridges Platform is designed to be in line with existing app development tools and practices. It consists of taking your existing Nextflow or WDL code, optimizing its configuration to get the most out of the Seven Bridges execution environment, and using the sbpack utility, which helps with the optimization and handles the actual communication with the Seven Bridges Platform. The documentation describes the process for both Nextflow and WDL workflows.
This is the initial release of the implementation, and it will remain under intensive development. Planned improvements include performance enhancements, such as the ability to run different parts of a Nextflow or WDL workflow on different compute instances, as well as many others that will significantly improve the user experience.
Improvements: RHEO – improved scaling and more elastic execution infrastructure
We have improved our automation execution infrastructure so that it automatically scales up its compute capacity when the workload increases and new automation runs need to be initialized. In addition, as a precaution to ensure proper use of the elastic automation execution service, we have introduced a limit of 30 parallel automation runs per Division. Executions that require more than 30 parallel runs are still possible, as the limit can be increased when more capacity is needed. Please contact firstname.lastname@example.org for more details.
AWS i3 instances available on all environments
With this update you can use the newest Amazon EC2 I3 instances designed for data-intensive, high transaction, low latency workloads, offering the best price per I/O performance (I3) and the lowest price per GB of SSD instance storage on Amazon EC2 (I3en).
The following instances have been added:
Learn more about supported AWS instance types.
Recently published apps
We have published GATK RNAseq short variant discovery 126.96.36.199 workflow, which represents a CWL implementation of the official GATK best practices workflow given in WDL for RNASeq variant discovery. Starting from an unmapped BAM file, the workflow performs alignment to the reference genome, followed by marking of duplicates, reassigning of mapping qualities, base recalibration, variant calling, and variant filtering.
New: Recently published apps
We have published 10 tools from the GRIDSS software suite, a toolkit for detecting genomic rearrangements:
- GRIDSS tool, a structural variation caller for Illumina sequencing data. It calls variants based on alignment-guided positional de Bruijn graph genome-wide break-end assembly, split read, and read pair evidence.
- GRIDSS Extract Overlapping Fragments is used to extract reads of interest for targeted GRIDSS variant calling.
- GRIDSS Annotate VCF Kraken2 adds Kraken2 classifications to single breakend and breakpoint inserted sequences.
- GRIDSS Annotate VCF RepeatMasker adds RepeatMasker classifications to inserted sequences.
- GRIDSS GeneratePonBedpe aggregates variants from multiple VCFs and counts the number of samples supporting each.
- GRIDSS SetupReference is used for generating additional files for the reference needed for running GRIDSS.
- GRIDSS Somatic Filter filters somatic calls from a VCF generated by GRIDSS joint tumor/normal variant calling.
- GRIDSS VIRUSBreakend is a high-speed viral integration detection tool. It is designed to be incorporated into WGS pipelines with minimal additional cost.
- GRIPSS applies a set of filtering and post-processing steps on GRIDSS paired tumor-normal output. It produces a high-confidence set of somatic SVs for a tumor sample.
- GRIPSS Hard Filter applies a set of filtering and post-processing steps on GRIDSS paired tumor-normal output. It produces a high-confidence set of somatic SVs for a tumor sample.