What Makes TOPMed Datasets So Special?

Studies from the Trans-Omics for Precision Medicine (TOPMed) program are available for analysis on NHLBI BioData Catalyst. The TOPMed program, funded by the National Heart, Lung, and Blood Institute (NHLBI), part of the National Institutes of Health (NIH), focuses on data for advancing science in heart, lung, blood, and sleep disorders. The program is generating a rich multi-omic dataset spanning genomics, proteomics, metabolomics, and transcriptomics, and is adding that information to pre-existing studies that have characterized thousands of phenotypic variables, including biochemical, physiological, clinical, behavioral, and anatomical measures. TOPMed studies feature greater racial and ancestral diversity than most other genetic studies, which have typically focused on participants of European ancestry. Working with participants of other ancestries is important for advancing scientific discovery for the many groups that have received significantly less (or no) attention from genetic research.

TOPMed currently includes whole-genome sequencing (WGS) of over 155,000 individuals, making it one of the largest, most diverse WGS datasets available to researchers. Some TOPMed studies also offer rich imaging datasets, such as COPDGene, with 22 million CT scan images for about 10,000 study participants. The TOPMed program works alongside and complements other programs and resources, such as the All of Us Research Program, the Million Veteran Program, and the NIH Database of Genotypes and Phenotypes (dbGaP). Overall, the TOPMed program is a powerful step toward precision medicine: using a patient’s individual genetic background, together with their environmental factors, to formulate therapies and treatments tailored to that individual.

TOPMed studies on NHLBI BioData Catalyst

NHLBI BioData Catalyst powered by Seven Bridges strives to facilitate access to TOPMed studies. The following TOPMed data are hosted on NHLBI BioData Catalyst today:

  • Multi-sample VCFs for participants within 42 TOPMed studies. This includes multi-sample VCFs from Freeze8 as well as multi-sample VCFs from Freeze 5b. See the tables below for the full list of hosted studies.
  • CRAM files and single sample VCFs for all sequenced participants. The CRAM files are not available on dbGaP.
  • Raw phenotype files for participants in TOPMed studies, providing clinical information such as BMI and lipid levels. In some cases, these data are in different dbGaP accessions.
  • Single sample VCFs for subjects included in Freeze5.

Seven Bridges is currently working with the NHLBI BioData Catalyst consortium to onboard additional studies and data that have been released on dbGaP as part of TOPMed Freeze 8.

Working with the TOPMed studies on the cloud

You can view the available TOPMed studies in the platform's Data Browser feature and select studies to query. In the Data Browser, you can search by file type as well as by study consent group. The following data types are available for the hosted TOPMed studies:

Data Type property in Data Browser | Genomic data file type
Aligned Reads | CRAM files
Simple Germline Variation | Single sample VCF files
Unharmonized Clinical Data | Raw phenotype files from dbGaP
Variant Call | Multi-sample VCF files
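
The mapping above can be captured as a small lookup, which may be handy when scripting around lists of files exported from the Data Browser. This is only a minimal sketch: the dictionary name and the `file_type_for` helper are hypothetical and are not part of any Seven Bridges API; only the labels themselves come from the table.

```python
# Data Type labels and file types copied from the table above.
DATA_TYPE_TO_FILE_TYPE = {
    "Aligned Reads": "CRAM files",
    "Simple Germline Variation": "Single sample VCF files",
    "Unharmonized Clinical Data": "Raw phenotype files from dbGaP",
    "Variant Call": "Multi-sample VCF files",
}


def file_type_for(data_type: str) -> str:
    """Return the file type for a Data Type label, ignoring case and padding."""
    normalized = data_type.strip().casefold()
    for label, file_type in DATA_TYPE_TO_FILE_TYPE.items():
        if label.casefold() == normalized:
            return file_type
    raise KeyError(f"Unknown Data Type: {data_type!r}")
```

For example, `file_type_for("variant call")` resolves to the multi-sample VCF entry, while an unrecognized label raises a `KeyError` rather than silently returning nothing.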

You can find these files by searching the “File” entity and using the “Data Type” property in the Data Browser.

You can search specifically for the Freeze8 multi-sample VCFs by checking the box next to “Variant Call” in the Data Type property and then adding the “Freeze” property to the search.

A search for Freeze8 multi-sample VCFs from a particular study and consent group returns VCF files for each chromosome. This is in contrast to dbGaP, where the released Freeze8 data is available in tar bundles. Please note that the Freeze5 multi-sample VCFs are in tar bundles, one for each consent group within a TOPMed study. These tar files can be decompressed on the platform using the Seven Bridges Decompressor App, which you can find by searching “decompressor” in the Public Gallery of Apps.
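
For readers who end up with a tar bundle in their own environment, Python's standard `tarfile` module can do a comparable unpacking job. This is a sketch under assumptions, not a description of how the Seven Bridges Decompressor App works; the function name `extract_bundle` and any file names passed to it are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def extract_bundle(bundle_path: str, dest_dir: str) -> list[str]:
    """Extract the regular files of a tar bundle into dest_dir; return their names."""
    root = Path(dest_dir).resolve()
    root.mkdir(parents=True, exist_ok=True)
    extracted = []
    # Mode "r:*" lets tarfile detect gzip, bz2, or xz compression automatically.
    with tarfile.open(bundle_path, mode="r:*") as bundle:
        for member in bundle.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            target = (root / member.name).resolve()
            # Refuse entries that would land outside dest_dir (absolute paths or "..").
            if not target.is_relative_to(root):
                raise ValueError(f"Unsafe path in archive: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            with bundle.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(member.name)
    return sorted(extracted)
```

The path check is a deliberate design choice: archives from any source can contain entries that point outside the destination folder, so the sketch validates each entry before writing it.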

The Data Browser exposes only open metadata from the TOPMed studies for search, so all researchers are able to do the same searches and see the existence of all files. However, only users with appropriate dbGaP approval can add files to their project to use in an analysis. A service within NHLBI BioData Catalyst programmatically reads user permissions from dbGaP to determine if a user can access particular files on the system. This service recognizes if a user has a Data Access Request in dbGaP for TOPMed data or if a user is set as a “dbGaP downloader” for a particular dataset. Therefore, these are the two mechanisms for getting access to TOPMed studies on NHLBI BioData Catalyst. Please note that phenotype and genotype data for some studies are in different dbGaP accessions. More information is available on the Data page of the NHLBI BioData Catalyst website.
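
The access rule described above boils down to an either/or check per accession. The sketch below models it in Python purely for illustration: the `UserPermissions` class, its fields, and `can_add_to_project` are hypothetical names and do not reflect the actual BioData Catalyst service or any dbGaP API. Consent groups are left out to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class UserPermissions:
    """Hypothetical snapshot of what dbGaP reports for one user."""
    # Accessions covered by the user's Data Access Requests.
    approved_requests: set[str] = field(default_factory=set)
    # Accessions for which the user is set as a "dbGaP downloader".
    downloader_for: set[str] = field(default_factory=set)


def can_add_to_project(perms: UserPermissions, accession: str) -> bool:
    """Either mechanism is enough to add files from the accession to a project."""
    return accession in perms.approved_requests or accession in perms.downloader_for


# ARIC genomic data (phs001211) and ARIC phenotype data (phs000280) are
# separate accessions, so approval for one does not cover the other.
user = UserPermissions(approved_requests={"phs001211"})
print(can_add_to_project(user, "phs001211"))
print(can_add_to_project(user, "phs000280"))
```

In this toy model, the first check succeeds and the second does not, which mirrors the note that phenotype and genotype data for some studies need to be requested under different accessions.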

To learn more about how to get started working with the TOPMed studies on NHLBI BioData Catalyst powered by Seven Bridges, take a look at the Getting Started Guide, which describes how to create an account, set up projects, run analyses, and find the TOPMed data in the Data Browser.

Further information on the hosted datasets can also be found in the Seven Bridges Documentation section “Datasets Hub”.

For more information on which TOPMed studies and parent studies are offered, including their phs identification numbers used by dbGaP, please see the tables below:

Hosted TOPMed study accessions with genomic data from Freeze 5b

Study Name | Acronym | phs ID # | Freeze5b data? / Freeze8 data?
NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish | Amish | phs000956 |
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Atherosclerosis Risk in Communities | ARIC | phs001211 |
NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados | BAGS | phs001143 | Coming soon
NHLBI TOPMed: NHGRI CCDG: The BioMe Biobank at Mount Sinai | BIOME | phs001644 |
NHLBI TOPMed: Childhood Asthma Management Program | CAMP | phs001726 |
NHLBI TOPMed: Coronary Artery Risk Development in Young Adults (CARDIA) | CARDIA | phs001612 |
NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation Study | CCAF | phs001189 |
NHLBI TOPMed: The Cleveland Family Study | CFS | phs000954 |
NHLBI TOPMed: Trans-Omics for Precision Medicine (TOPMed) Whole Genome Sequencing Project: Cardiovascular Health Study | CHS | phs001368 |
NHLBI TOPMed: Genetic Epidemiology of COPD (COPDGene) Funded by the National Heart Lung and Blood Institute (NHLBI) in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program | COPDGene | phs000951 | Coming soon
NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica | CRA | phs000988 | Coming soon
NHLBI TOPMed: Diabetes Heart Study African American Coronary Artery Calcification (AA CAC) | DHS | phs001412 | Coming soon
NHLBI TOPMed: Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints | ECLIPSE | phs001472 | Coming soon
NHLBI TOPMed: Boston Early-Onset COPD Study in the National Heart Lung and Blood Institute (NHLBI) in the NHLBI Trans-Omics for Precision Medicine (TOPMed) Program | EOCOPD | phs000946 | Coming soon
NHLBI TOPMed: Genomic Activities such as Whole Genome Sequencing and Related Phenotypes in the Framingham Heart Study | FHS | phs000974 |
NHLBI TOPMed: Genes-Environments and Admixture in Latino Asthmatics | GALAII | phs000920 | Coming soon
NHLBI TOPMed: Genetic Study of Atherosclerosis Risk | GeneSTAR | phs001218 |
NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy | GENOA | phs001345 | Coming soon
NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity | GenSalt | phs001217 | Coming soon
NHLBI TOPMed: Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate | GOLDN | phs001359 |
NHLBI TOPMed: NHGRI CCDG: Hispanic Community Health Study/Study of Latino | HCHS_SOL | phs001395 |
NHLBI TOPMed: Heart and Vascular Health Study | HVH | phs000993 |
NHLBI TOPMed: Genetics of Left Ventricular Hypertrophy | HyperGEN | phs001293 |
NHLBI TOPMed: The Jackson Heart Study | JHS | phs000964 |
NHLBI TOPMed – NHGRI CCDG: The Johns Hopkins University School of Medicine Atrial Fibrillation Genetics Study | JHU_AF | phs001598 |
NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism (WGS of VTE) | MAYO_VTE | phs001402 |
NHLBI TOPMed: Multi-Ethnic Study of Atherosclerosis (MESA) and MESA Family AA-CAC | MESA | phs001416 |
NHLBI TOPMed: Massachusetts General Hospital (MGH) Atrial Fibrillation Study | MGH_AF | phs001062 | Coming soon
NHLBI TOPMed: Defining the time-dependent genetic and transcriptomic responses to cardiac injury among patients with arrhythmias | MIRHYTHM | phs001434 |
NHLBI TOPMed: Partners HealthCare Biobank | Partners | phs001024 |
NHLBI TOPMed: Pediatric Cardiac Genomics Consortium’s Congenital Heart Disease Biobank | PCGC_CHD | phs001735 | Coming soon
NHLBI TOPMed: Pulmonary Hypertension and the Hypoxic Response in SCD | PUSH_SCD | phs001682 |
NHLBI TOPMed: Recipient Epidemiology and Donor Evaluation Study-III Brazil Sickle Cell Disease Cohort | REDS-III_BRAZIL | phs001468 |
NHLBI TOPMed: San Antonio Family Heart Study | SAFS | phs001215 |
NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment | SAGE | phs000921 | Coming soon
NHLBI TOPMed: African American Sarcoidosis Genetics Resource | Sarcoidosis | phs001207 | Coming soon
NHLBI TOPMed: Severe Asthma Research Program | SARP | phs001446 |
NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans | SAS | phs000972 |
NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese | THRV | phs001387 | Coming soon
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Ablation Registry | VAFAR | phs000997 |
NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry | VU_AF | phs001032 |
NHLBI TOPMed: Walk-PHaSST Sickle Cell Disease | WALK_PHASST | phs001514 |
NHLBI TOPMed: Novel Risk Factors for the Development of Atrial Fibrillation in Women | WGHS | phs001040 | Coming soon
NHLBI TOPMed: Women’s Health Initiative | WHI | phs001237 |

Hosted TOPMed study accessions with phenotype data

Study Name | Acronym | phs I.D. # | Currently hosted?
Atherosclerosis Risk in Communities Cohort | ARIC | phs000280 |
Lung Cohorts Exome Sequencing Project | Asthma | phs000422 | Coming soon
CATHeterization GENetics | CATHGEN | phs000703 | Coming soon
CCF AFIB GWAS Study | CCAF | phs000820 |
NHLBI Cleveland Family Study Candidate Gene Association Resource (CARe) | CFS | phs000284 |
Cardiovascular Health Study Cohort | CHS | phs000287 |
Genetic Epidemiology of COPD | COPDGene | phs000179 |
The Diabetes Heart Study | DHS | phs001012 | Coming soon
Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints | ECLIPSE | phs001252 | Coming soon
Framingham Heart Study Cohort | FHS | phs000007 |
Genes-Environments and Admixture in Latino Asthmatics | GALAII | phs001180 |
GeneSTAR NextGen Functional Genomics of Platelet Aggregation | GeneSTAR | phs001074 |
Genetic Epidemiology Network of Arteriopathy | GENOA | phs001238 |
Genetic Epidemiology Network of Salt Sensitivity | GenSalt | phs000784 |
Hispanic Community Health Study/Study of Latinos | HCHS-SOL | phs000810 | Coming soon
Heart and Vascular Health Study | HVH | phs001013 |
The Jackson Heart Study | JHS | phs000286 |
The Multi-Ethnic Study of Atherosclerosis Cohort | MESA | phs000209 |
Massachusetts General Hospital (MGH) Atrial Fibrillation Study | MGH_AF | phs001001 |
Pediatric Cardiac Genetics Consortium | PCGC | phs001194 | Coming soon
A Genome-Wide Association Comparative Analysis of Response of AF Patients to Rate Control Therapy | PGRN-RIKEN_AF | phs000439 | Coming soon
National Heart, Lung, and Blood Institute SNP Health Association Asthma Resource Project | SHARP | phs000166 | Coming soon
Women’s Health Initiative | WHI | phs000200 |
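
Because genomic and phenotype data for a study can sit under different phs accessions, it can help to pair the two tables by acronym. The sketch below does this for a handful of rows copied from the tables above; the variable names and the `accession_pairs` helper are hypothetical, and only a subset of studies is included for brevity.

```python
# A few rows copied from the genomic-data table (acronym -> phs ID).
GENOMIC_ACCESSIONS = {
    "Amish": "phs000956",
    "ARIC": "phs001211",
    "FHS": "phs000974",
    "JHS": "phs000964",
    "WHI": "phs001237",
}

# A few rows copied from the phenotype-data table (acronym -> phs ID).
PHENOTYPE_ACCESSIONS = {
    "ARIC": "phs000280",
    "FHS": "phs000007",
    "JHS": "phs000286",
    "WHI": "phs000200",
}


def accession_pairs(genomic: dict[str, str], phenotype: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Map each acronym present in both tables to (genomic phs ID, phenotype phs ID)."""
    shared = sorted(genomic.keys() & phenotype.keys())
    return {acronym: (genomic[acronym], phenotype[acronym]) for acronym in shared}


for acronym, (geno, pheno) in accession_pairs(GENOMIC_ACCESSIONS, PHENOTYPE_ACCESSIONS).items():
    print(f"{acronym}: genomic {geno}, phenotype {pheno}")
```

Studies that appear in only one table, such as Amish in this subset, are simply left out of the result. Note that a few acronyms are spelled differently between the two tables (for example HCHS_SOL and HCHS-SOL, or PCGC_CHD and PCGC), so a join over the full tables would need some normalization first.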

Be sure to receive late-breaking updates from Seven Bridges and follow us on LinkedIn and Twitter.