Release: GDC Datasets version update
As of December 27, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 21.
Recently published apps
The Tabix 1.9 toolkit was updated to CWL1.0.
Bulk moving of files and folders via the API
To help you reduce the number of API calls needed to organize files and folders within projects, we have introduced the option of moving files or folders in bulk from one project location to another. Bulk move is aimed at improving the experience of all users who run analyses at scale through the API. For more information, please see the related documentation pages:
Recently published apps
The following toolkits had their versions updated and were bumped to CWL1.0:
Improvements: Password policy changes on the Platform
In order to further increase the security of the Seven Bridges Platform and user data, we have enforced a stricter password policy. Specifically, the changes include:
- After 5 unsuccessful login attempts, you will be locked out of your account.
- When changing your password, you will not be able to reuse a password that was used within the past 365 days.
New: Platform status page now available at status.sevenbridges.com
In order to enable easier communication of real-time Platform status, we have introduced the Platform status page at status.sevenbridges.com. The page covers the current state of several functional parts of the Platform, as well as our public website and the Platform’s documentation website. There is also an overview of past incidents and a detailed view of the complete incident history.
Improvements: Search by ID through multiple datasets at once
We have improved the existing Search by ID feature so that a single search is applied across all available datasets. To run it, click Search by ID on the Data Browser’s dataset selection screen. The search returns sets of matched entities from all available datasets and lets you select an entity (or a combination of entities) to start the Data Browser with. It covers every available UUID and ID, whether it belongs to an entity or a property, while retaining the existing capability of searching by file name.
New: Recently published apps
Several new CWL1.0 apps have been published to the Public Apps Gallery:
New BROAD Best Practices workflows: Data Pre-processing and Germline SNPs and Indels Variant Calling in version 18.104.22.168. These workflows are built according to BROAD’s best practices, following their WDL scripts, and together they produce analysis-ready BAM files and VCF files with germline mutations.
eQTL analysis workflows: FastQTL and MatrixEQTL – Expression quantitative trait loci (eQTLs) are genomic variants related to variation in expression levels of mRNAs. These loci can be either cis (in the neighborhood of a gene’s transcription start site, TSS) or trans (distant from it). The eQTL analysis workflows with FastQTL and MatrixEQTL are designed for fast eQTL analysis on large datasets, using standard mapping methods that test the linkage between variation in expression and genetic polymorphisms. FastQTL works with cis loci, while MatrixEQTL works with both cis and trans loci. Both workflows are available on the Platform. They start from standard bioinformatics file formats (VCFs and gene expression results) and produce a comprehensive set of plots, reports and results that make eQTL analysis easier to interpret.
NanoStringQCPro 1.10.0: NanoString® has introduced the nCounter technology for direct counting of molecules in samples, which enables direct detection of specific RNA, DNA and protein molecules. It provides highly robust data across clinically relevant samples while reducing hands-on time and simplifying analysis. The NanoStringQCPro app performs basic QC steps and data normalization of NanoString mRNA gene expression data.
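The cis/trans distinction mentioned for the eQTL workflows above can be illustrated with a short sketch. It assumes a cis window of 1 Mb on either side of the TSS, which is a common convention and not a value taken from these release notes. The function name and its parameters are hypothetical and are not part of FastQTL or MatrixEQTL.

```python
# Assumed cis window: 1 Mb on either side of the TSS (illustrative only).
CIS_WINDOW_BP = 1_000_000


def classify_pair(variant_chrom, variant_pos, gene_chrom, gene_tss,
                  window=CIS_WINDOW_BP):
    """Label a variant-gene pair as 'cis' or 'trans'.

    A pair is 'cis' when the variant is on the same chromosome as the gene
    and lies within `window` base pairs of the gene's TSS; every other pair
    is treated as 'trans'.
    """
    same_chromosome = variant_chrom == gene_chrom
    if same_chromosome and abs(variant_pos - gene_tss) <= window:
        return "cis"
    return "trans"


# A variant 500 kb from the TSS on the same chromosome falls in the window.
label = classify_pair("chr1", 1_500_000, "chr1", 1_000_000)
```

Under this assumption, a variant on another chromosome, or one further than the window from the TSS, is classified as trans.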
New: Added support for Amazon EC2 P3 GPU Instances
We have added support for the Amazon EC2 P3 GPU instance family to the Seven Bridges Platform. Amazon EC2 P3 instances deliver high-performance compute in the cloud with up to 8 NVIDIA® V100 Tensor Core GPUs and up to 100 Gbps of networking throughput. These instances deliver up to one petaflop of mixed-precision performance per instance to significantly accelerate machine learning and high-performance computing applications.
NVIDIA drivers come preinstalled and optimized according to the Amazon best practice for the specific instance family and are accessible from the Docker container.
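As a rough way to confirm that the GPUs are visible from inside a container, a tool could query the driver at startup. The sketch below assumes that the `nvidia-smi` utility is available in the container image, which these notes do not state. The helper names `parse_gpu_list` and `list_gpus` are hypothetical.

```python
import shutil
import subprocess

# Ask nvidia-smi for one CSV line per GPU: "<name>, <total memory>".
QUERY = ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"]


def parse_gpu_list(csv_text):
    """Parse nvidia-smi CSV output into a list of (name, memory) tuples."""
    gpus = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, _, memory = line.partition(",")
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_gpus():
    """Return the GPUs visible to this process, or [] if none can be queried."""
    if shutil.which("nvidia-smi") is None:
        return []  # utility not present in this environment
    result = subprocess.run(QUERY, capture_output=True, text=True)
    if result.returncode != 0:
        return []  # driver not reachable or query failed
    return parse_gpu_list(result.stdout)
```

An empty list from `list_gpus()` means either that the task is not running on a GPU instance or that the utility is missing from the image, so it is a hint and not a definitive check.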
The following instances have been added:
New: Define Compute Resources per Task Run
When creating a task via the visual interface, you can now set the top-level instance type and the maximum number of parallel instances for your execution without having to create a new version of the app. Learn more about setting execution hints on task level from our documentation.
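One way to picture task-level settings is as a small set of overrides applied on top of the defaults stored with the app. The sketch below only illustrates that idea. The key names `instance_type` and `max_parallel_instances`, the function name and the precedence rule are assumptions for illustration, not the Platform's documented behavior.

```python
def resolve_execution_settings(app_hints, task_settings):
    """Combine app-level defaults with per-task overrides.

    Both arguments are plain dictionaries. A task-level value of None means
    "not set on the task", so the app-level default is kept.
    """
    resolved = dict(app_hints)  # copy, so the app defaults stay untouched
    for key, value in task_settings.items():
        if value is not None:
            resolved[key] = value
    return resolved


# Hypothetical values: the app defaults to a CPU instance, and this task
# run asks for a GPU instance without changing the app itself.
app_hints = {"instance_type": "c4.2xlarge", "max_parallel_instances": 1}
task_settings = {"instance_type": "p3.2xlarge", "max_parallel_instances": None}
resolved = resolve_execution_settings(app_hints, task_settings)
```

Here the instance type comes from the task and the parallel instance limit falls back to the app default.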
New: Access task secondary files via the API
You can now use our sevenbridges-python client to access secondary files for task inputs and outputs.
New and improved functionality:
- API users can now see exactly which files were used as secondary files for inputs.
- The Python client can now get those files with a simple call, as shown in the example below.
- All of this is also supported for CWL 1.x tools and workflows, where the secondary files can be defined as JS expressions.
Some examples utilizing the sevenbridges-python API client:
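import sevenbridges as sb
# Load credentials from the 'default' profile and create the API client.
config = sb.Config(profile='default')
api = sb.Api(config=config)
# Fetch the task, then read the secondary files of one output and one input.
task = api.tasks.get('439221a0-27c8-47a3-bcac-fcc5f44f82a8')
output_secondary_files = task.outputs['my_output'].secondary_files
input_secondary_files = task.inputs['my_input'].secondary_files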
Please note that secondary files are captured from tasks as inputs or outputs, not from the file system. This means that the secondary_files property is available only when the file is pulled from the task itself, not when it is reloaded from the file system or directly instantiated from the file system via the api.files.get(<FILE_ID>) call or a similar one. The only supported way of getting secondary files is shown above – they need to be captured as soon as possible from the input file.
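Because secondary files have to be read from the task object itself, it can be convenient to collect them for every input or output in one pass. The helper below is a minimal sketch with a hypothetical name. It assumes that `task.inputs` and `task.outputs` behave like dictionaries, as the indexing in the snippet above suggests, and that each file object exposes a `name` attribute.

```python
def collect_secondary_files(task_io):
    """Map each input/output ID to the names of its secondary files.

    `task_io` is a dictionary-like object such as `task.inputs` or
    `task.outputs`, read directly from a task.
    """
    collected = {}
    for port_id, value in task_io.items():
        # A port may hold one file, a list of files, or a non-file value.
        files = value if isinstance(value, (list, tuple)) else [value]
        names = []
        for item in files:
            # Non-file values have no `secondary_files`; a missing or None
            # attribute is treated as "no secondary files".
            for secondary in getattr(item, "secondary_files", None) or []:
                names.append(secondary.name)
        if names:
            collected[port_id] = names
    return collected
```

With the task from the example above, `collect_secondary_files(task.outputs)` would return only those outputs that actually carry secondary files.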
Learn more about the sevenbridges-python API client.
Whole Genome Sequencing – Quality Control – CWL1.0 Workflow
Data quality control (QC) is an important component of NGS projects, especially with relatively costly whole genome sequencing (WGS). Timely QC can identify and account for issues with the starting biological material (DNA contamination or sample swaps), the sequencing process or bioinformatic pipelines used for processing.
Whole Genome Sequencing – Quality Control – CWL1.0 Workflow is intended as a general-purpose QC flow for users processing WGS data, regardless of the number of samples. It is meant to offer plots that end users can easily inspect visually, as well as structured data output suitable for aggregation and parsing in an automated setup. Because it may be of interest to keep the cost and duration of single-sample tasks to a minimum in large-scale sequencing projects, the workflow is designed to be modular: nodes can be turned on or off on request, and whole segments can be skipped (based on input data availability, for example).
Improvements: Export files to a volume within the same region
It is now possible to mount volumes from all supported cloud providers and regions in read-write (RW) mode on the Seven Bridges Platform. Files can be exported to volumes that are in the same location (cloud provider and region) as the file being exported, which prevents the export from incurring additional data transfer costs.
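The same-location rule can be expressed as a simple pre-check before requesting an export. The sketch below represents a location as a (cloud provider, region) pair. The function name and the example values are hypothetical, and the actual check is performed by the Platform.

```python
def same_location(file_location, volume_location):
    """Return True when two (provider, region) pairs describe one location.

    The comparison ignores letter case and surrounding whitespace.
    """
    def normalize(location):
        provider, region = location
        return provider.strip().lower(), region.strip().lower()

    return normalize(file_location) == normalize(volume_location)


# Hypothetical example: the file and the volume share provider and region.
exportable = same_location(("aws", "us-east-1"), ("AWS", "us-east-1"))
```

Under this rule, a file whose provider or region differs from the volume's would not be eligible for export to that volume.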