Seven Bridges is at the Bioinformatics Open Source Conference showcasing how we’re helping researchers do more with their biomedical data. Set up a meeting with us and join us for the following presentations:
Rabix Executor: an open-source executor supporting recomputability and interoperability of workflow descriptions
July 22 | 10:45 – 10:57
Biomedical data has become increasingly easy to generate in large quantities, and the methods used to analyze it have proliferated rapidly. Reproducible methods are required to learn from large volumes of data and to reliably disseminate findings and methods within the scientific community. Workflow specifications and execution engines provide a framework for performing a sequence of analyses and help address issues with reproducibility and portability across environments. One such specification is the Common Workflow Language (CWL), an emerging standard which provides a robust and flexible framework for describing data analysis tools and workflows. CWL requires executors, or workflow engines, to interpret the specification and turn tools and workflows into computational jobs, as well as to provide additional tooling such as error logging, file organization, and job scheduling optimizations that make it easy to compute on large volumes of data. To this end, we have developed the Rabix Executor, an open-source workflow engine that improves reproducibility through reusability and interoperability of workflow descriptions. We define five major components of the Rabix Executor (frontend, bindings, engine, queue, and backend), each of which is abstracted from the others to maintain a modular design so that components can be used as needed. Developers are able to design custom frontends (e.g. a custom graphical user interface or command line interface), build bindings for the engine to parse a specific set of workflow languages, employ a queuing or scheduling protocol of their choice, and submit computational jobs to different backends (e.g. Amazon Web Services, a high-performance computing cluster, a local machine).
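To make the five-component split concrete, here is a minimal Python sketch of how a frontend, bindings, engine, queue, and backend could be kept independent of one another. It is only an illustration of the modular idea described above, not the Rabix Executor's actual code or API: every class name, the `Job` shape, and the toy document format (`{"steps": [...]}`) are assumptions invented for this example.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Job:
    """One unit of work produced from a workflow (hypothetical shape)."""
    name: str
    command: list[str]


class Bindings(Protocol):
    """Turns a workflow document in some language into jobs."""
    def parse(self, document: dict) -> list[Job]: ...


class Queue(Protocol):
    """Holds jobs until a backend is ready for them."""
    def put(self, job: Job) -> None: ...
    def get(self) -> Optional[Job]: ...


class Backend(Protocol):
    """Runs a job somewhere (local machine, cluster, cloud)."""
    def run(self, job: Job) -> str: ...


class ToyBindings:
    """Reads a made-up format: {"steps": [{"name": ..., "command": [...]}]}."""
    def parse(self, document: dict) -> list[Job]:
        return [Job(s["name"], list(s["command"])) for s in document["steps"]]


class FifoQueue:
    """Simplest possible scheduling policy: first in, first out."""
    def __init__(self) -> None:
        self._jobs: deque[Job] = deque()

    def put(self, job: Job) -> None:
        self._jobs.append(job)

    def get(self) -> Optional[Job]:
        return self._jobs.popleft() if self._jobs else None


class DryRunBackend:
    """Only reports what it would run; a real backend would submit the job."""
    def run(self, job: Job) -> str:
        return f"[dry-run] {job.name}: {' '.join(job.command)}"


class Engine:
    """Knows only the three interfaces, so each part can be swapped freely."""
    def __init__(self, bindings: Bindings, queue: Queue, backend: Backend) -> None:
        self.bindings, self.queue, self.backend = bindings, queue, backend

    def execute(self, document: dict) -> list[str]:
        for job in self.bindings.parse(document):
            self.queue.put(job)
        results = []
        while (job := self.queue.get()) is not None:
            results.append(self.backend.run(job))
        return results


def frontend(document: dict) -> list[str]:
    """A stand-in for a command line or graphical frontend."""
    return Engine(ToyBindings(), FifoQueue(), DryRunBackend()).execute(document)
```

Because the engine depends only on the three interfaces, replacing `DryRunBackend` with a class that submits to a cluster, or `FifoQueue` with a priority scheduler, would not require touching the other parts.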
For workflow decomposition, the Rabix Executor employs an abstract software model which was defined to be a superset of all known workflow languages; this enables the use of different workflow languages and versions. As a demonstration, the Rabix Executor runs all drafts and versions of the Common Workflow Language, including CWL v1.0 workflows composed of tools written in previous versions of CWL. To our knowledge, Rabix Executor is the only CWL implementation which maintains full backwards compatibility of tools and workflows, removing the need to refactor existing bioinformatics applications. In scaling benchmarks, the Rabix Executor is capable of running tens of thousands of concurrent jobs from a batch of whole genome sequencing workflows. The modular and abstracted design of the Rabix Executor is intended to allow for reproducible bioinformatics analysis on various infrastructures and continual support of a growing library of bioinformatics tools and workflows.
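One way to picture a common model that sits above several syntaxes is a normalization step. CWL v1.0 lets `inputs` be written either as a list of objects carrying an `id` or as a map keyed by id, where the value may be just a type. The sketch below folds both spellings into one internal form. It is a simplified illustration under that assumption; the function name `normalize_inputs` is hypothetical and this is not how the Rabix Executor's bindings are implemented.

```python
def normalize_inputs(inputs):
    """Return inputs as a list of {"id": ..., "type": ...} dicts.

    Accepts either the list form or the map shorthand of a CWL `inputs` section.
    """
    if isinstance(inputs, list):
        # Already in list form: copy each entry so callers can mutate safely.
        return [dict(item) for item in inputs]

    normalized = []
    for input_id, value in inputs.items():
        if isinstance(value, dict):
            # Map form with a full parameter object as the value.
            normalized.append({"id": input_id, **value})
        else:
            # Map form where the value is only the type, e.g. "string".
            normalized.append({"id": input_id, "type": value})
    return normalized


list_form = [{"id": "message", "type": "string"}]
map_form = {"message": "string"}
expanded_map_form = {"message": {"type": "string"}}
```

All three spellings above normalize to the same list, so code downstream of the bindings only ever has to handle one shape.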
Rabix Composer: an open-source integrated development environment for the Common Workflow Language
July 22 | 10:57 – 11:07
The Common Workflow Language (CWL) is an emerging standard for describing data analysis workflows which allows for portability between compute environments, improved reproducibility, and the ability to add custom extensions to fit institutions’ needs. The robust and flexible framework provided by CWL has led to its adoption by the National Cancer Institute (NCI) Cancer Cloud Pilots, the NCI Genomic Data Commons, and academic and industrial organizations worldwide. Over time, the CWL community has worked hard to improve the CWL syntax to make it easy to read, easy to parse, and comprehensive in the scope of workflow parameters and behaviors it captures. A trade-off of this approach, however, is that complex bioinformatics workflows may consist of dozens of tools and hundreds of parameters, which can be time-consuming to describe manually in CWL; a single whole genome sequencing workflow may run to hundreds of lines of CWL code.
To support the CWL community, we’ve created the Rabix Composer, a standalone integrated development environment which provides rich visual and text-based editors for CWL. Our vision for the Rabix Composer is to enable rapid workflow composition and testing, provide version control and the ability to add documentation, share tools easily with online platforms and developers, and allow integration with online services such as GitHub. The Rabix Composer was designed by integrating more than 500 pieces of feedback from Seven Bridges users regarding our previous software development kit for CWL.
The Rabix Composer is part of the Rabix project (http://rabix.io), an open-source effort to provide tooling for the bioinformatics developer community. The Rabix project includes the Rabix Executor, a workflow engine that executes CWL descriptions and their associated Docker containers locally on a laptop, on an HPC cluster, or in cloud environments such as the Seven Bridges Platform. Using the Rabix Executor in combination with the Composer enables developers to create, run, and debug bioinformatics applications locally before scaling. Together, these technologies enhance data analysis reproducibility, simplify software sharing and publication, and reduce friction when designing a new workflow or replicating a scientific finding.
CWL-svg: an open-source workflow visualization library for the Common Workflow Language
July 22 | 11:07 – 11:10
As the Common Workflow Language (CWL) becomes more widely adopted among the bioinformatics community, the volume and complexity of publicly available CWL has increased. The flexibility and portability of CWL encourage developers to tackle difficult pipelines and edge cases, enabling them to describe intricate processes which can be executed in multiple environments. However, complex workflows can be challenging for users to interpret. As CWL syntax has matured, the syntactic shortcuts added in recent versions to make the language easier to write can simultaneously make it more difficult to interpret. As a result of these combined aspects of CWL, some workflows, such as BCBio (bcbio-nextgen), can have hundreds of lines of code. Such workflows can be difficult to understand and debug without external visualization tools.
As part of the Rabix Composer, an integrated development environment for CWL, we developed an open-source workflow visualization library called CWL-svg. The CWL-svg library takes a CWL description of a workflow and creates a scalable vector graphics (SVG) image that gives users a visual representation they can interact with intuitively. CWL-svg can be used either as a standalone library, which renders SVG files from CWL, or as part of a larger interface. Our goal with CWL-svg was to create a visualization library which would most clearly represent the relevant parts of the CWL description to the user. We incorporated user feedback gleaned from the previous iteration of our CWL Workflow Editor to create a more intuitive and informative user interface. The CWL-svg library allows users to select nodes within the workflow (e.g. a tool or file) to highlight immediate connections, rearrange nodes, and use fluid zoom resizing to make workflow details visible at all resolutions. The library also implements an auto-align algorithm which untangles complicated workflows into a visually pleasing arrangement. These design details, combined with meticulous optimizations and attention to efficiency, make CWL-svg ideal for handling complex bioinformatics workflows.
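As a rough illustration of what rendering a workflow graph as SVG involves, the Python sketch below places each step in a column according to its depth in the graph (a simple longest-path layering, which is a common starting point for automatic alignment), finds a node's immediate connections, and emits an SVG string. It is not the CWL-svg implementation or its API, and the abstract does not say which algorithm CWL-svg's auto-align uses; the function names, the node and edge lists, and the spacing constants are all assumptions made for this example.

```python
from collections import defaultdict
from xml.sax.saxutils import escape


def assign_layers(nodes, edges):
    """Longest-path layering: a node sits one column right of its deepest input."""
    preds, succs = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
        succs[src].append(dst)
    indegree = {n: len(preds[n]) for n in nodes}
    layer = {n: 0 for n in nodes}
    ready = [n for n in nodes if indegree[n] == 0]
    processed = 0
    while ready:
        node = ready.pop()
        processed += 1
        for nxt in succs[node]:
            layer[nxt] = max(layer[nxt], layer[node] + 1)
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if processed != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return layer


def neighbors(node, edges):
    """Immediate connections of a node, e.g. what to highlight on selection."""
    return ({dst for src, dst in edges if src == node}
            | {src for src, dst in edges if dst == node})


def render_svg(nodes, edges):
    """Return an SVG document with one circle per node and one line per edge."""
    layer = assign_layers(nodes, edges)
    rows, pos = defaultdict(int), {}
    for n in nodes:
        col = layer[n]
        pos[n] = (60 + 160 * col, 40 + 60 * rows[col])  # arbitrary spacing
        rows[col] += 1
    width = 120 + 160 * max(layer.values(), default=0)
    height = 80 + 60 * (max(rows.values(), default=1) - 1)
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for src, dst in edges:
        (x1, y1), (x2, y2) = pos[src], pos[dst]
        parts.append(f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" stroke="gray"/>')
    for n, (x, y) in pos.items():
        parts.append(f'<circle cx="{x}" cy="{y}" r="8" fill="steelblue"/>')
        parts.append(f'<text x="{x}" y="{y + 22}" text-anchor="middle" '
                     f'font-size="11">{escape(n)}</text>')
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical five-step workflow used only to exercise the sketch.
steps = ["reads", "align", "sort", "call_variants", "report"]
links = [("reads", "align"), ("align", "sort"), ("sort", "call_variants"),
         ("align", "report"), ("call_variants", "report")]
svg_text = render_svg(steps, links)
```

Writing `svg_text` to a `.svg` file and opening it in a browser shows the steps laid out left to right. A production layout would also need to reduce edge crossings within each column, which this sketch does not attempt.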
CWL-ts: an open-source TypeScript library for building developer tools for the Common Workflow Language
July 22 | 11:10 – 11:15