Biomedical Datasets for Large-Scale Analysis

Biomedical data is growing at astonishing rates as access to next-generation sequencing (NGS) and single-cell technologies broadens. Not only is more data being generated, but its quality keeps improving: base-calling accuracy is higher, reads are longer, and reads are increasingly generated from both ends of each library fragment (paired-end reads). As a result, datasets need frequent updates. At Seven Bridges we strive to provide easy access to the most current and largest datasets across multiple disease states, all available through our multi-cloud computing platform.

We are happy to share with you the following dataset updates on the Seven Bridges Platform.

Human Cell Atlas

The Human Cell Atlas (HCA) project aims to map every cell in the human body at single-cell resolution. The project will expand researchers' ability to study everything from developmental biology to disease progression, and it is made possible by massively parallel single-cell genomics assays that can now profile hundreds of thousands of cells.

The first datasets from the HCA were released in April 2018 and are now available on the Seven Bridges Platform, enabling researchers to access and analyze large quantities of data seamlessly in the cloud.

The three RNA-Seq datasets now available on the Seven Bridges Platform include:

  • 530,000 human immune cell profiles from umbilical cord blood and bone marrow, generated with the 10x Genomics protocol and called the "Census of Immune Cells"
  • 2,000 human spleen cells, generated with the 10x Genomics protocol and called the "Ischaemic Sensitivity of Human Tissue"; this is the first single-cell dataset of the human spleen
  • An immuno-oncology-focused dataset of 6,639 mouse tumor and lymph node cells, generated with the Smart-seq2 protocol and called the "Melanoma Infiltration of Stromal and Immune Cells"

Collectively, these datasets total about 1.7 TB, and each dataset has corresponding metadata describing donor age, sample preparation, sequencing, and other clinically relevant factors.

Updated TCGA and TARGET datasets from the Genomic Data Commons

The Genomic Data Commons (GDC) periodically releases updates to the datasets it hosts, including The Cancer Genome Atlas (TCGA) and Therapeutically Applicable Research to Generate Effective Treatments (TARGET) datasets. The GDC recently released v14, and these updates are now available on the Seven Bridges Platform. Updates in v14 include:

  • New Copy Number Variation (CNV) data for TCGA projects
  • New miRNA data for TARGET and TCGA
  • New versions of TCGA biospecimen supplements
  • New harmonized WGS BAM files for TARGET data

As part of the update, some files may no longer be available. Reasons for obsolescence include:

  • A newer version of the data file has been released
  • Inaccuracies in the data file have been corrected
  • Permission to use a particular patient's data file has been withdrawn

If you used a file that is being removed, Seven Bridges has contacted you at the email address we have on file.

Sign up today to receive late-breaking updates from Seven Bridges, and be sure to follow us on LinkedIn and Twitter.