Studies from the Trans-Omics for Precision Medicine (TOPMed) program are available for analysis on NHLBI BioData Catalyst. The TOPMed program, funded by the National Heart, Lung, and Blood Institute (NHLBI), part of the National Institutes of Health (NIH), focuses on data for advancing science in the fields of heart, lung, blood, and sleep disorders. The program is generating a rich, multi-omic dataset spanning -omics fields such as genomics, proteomics, metabolomics, and transcriptomics, and is adding that information to pre-existing studies that have characterized thousands of phenotypic variables, including biochemical, physiological, clinical, behavioral, and anatomical measures. TOPMed studies feature greater racial and ancestral diversity than most other genetic studies, which have typically focused on participants with European ancestry. Working with participants of other ancestries is important for advancing scientific discovery for the many groups that have received significantly less (or no) attention from genetic research.
TOPMed currently includes whole-genome sequencing (WGS) of over 155,000 individuals, making it one of the largest, most diverse WGS datasets available to researchers. Furthermore, some TOPMed studies offer rich imaging datasets, such as COPDGene’s 22 million CT scan images for about 10,000 study participants. The TOPMed program works alongside and complements various other programs and resources, such as the All of Us Research Program, the Million Veteran Program, and the NIH Database of Genotypes and Phenotypes (dbGaP). Overall, the TOPMed program is a powerful step toward developing precision medicine: using a patient’s individual genetic background, together with their environmental factors, to formulate therapies and treatments specifically tailored for that individual.
TOPMed studies on NHLBI BioData Catalyst
NHLBI BioData Catalyst powered by Seven Bridges strives to facilitate access to TOPMed studies. The following TOPMed data is hosted on NHLBI BioData Catalyst today:
- Multi-sample VCFs for ~55,000 sequenced participants within 32 TOPMed studies included in Freeze 5b (see the table below for the full list).
- CRAM files and single-sample VCFs for all sequenced participants, the former of which are not available on dbGaP.
- Raw phenotype files for participants in TOPMed studies, providing clinical information such as BMI and lipid levels. In some cases, these data are in different dbGaP accessions from the genomic data.
Seven Bridges is currently working with the NHLBI BioData Catalyst consortium to onboard additional studies and data that have been released on dbGaP as part of TOPMed Freeze 8.
Working with the TOPMed studies on the cloud
You can view the available TOPMed studies in the platform’s Data Browser and select studies to query. Within the Data Browser, you can search by file type as well as by study consent group. The following data types are available for the hosted TOPMed studies:
|Data Type property in Data Browser||Genomic data file type|
|Aligned Reads||CRAM files|
|Simple Germline Variation||Single-sample VCF files|
|Unharmonized Clinical Data||Raw phenotype files from dbGaP|
|Variant Call||Multi-sample VCF files|
You can find these files by searching the “File” entity and using the “Data Type” property in the Data Browser, as shown in the image below:
The phenotype data and multi-sample VCF data are currently packaged as tar bundles, one for each consent group within a TOPMed study. These tar files can be decompressed on the platform using the Seven Bridges Decompressor App, which you can find by searching “decompressor” in the Public Gallery of Apps.
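On the platform itself, the Decompressor App is the documented route. Purely as an illustration of what unpacking one of these bundles involves, the sketch below uses Python's standard `tarfile` module on a local file. The function name `extract_bundle` and any paths passed to it are hypothetical and not part of the platform.

```python
import tarfile
from pathlib import Path


def extract_bundle(bundle_path, dest_dir):
    """Extract a tar bundle into dest_dir and return the member names.

    Illustrative helper only; on the platform, use the Decompressor App.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)

    # "r:*" lets tarfile detect the compression (plain, gzip, bz2, xz).
    with tarfile.open(bundle_path, mode="r:*") as archive:
        members = archive.getmembers()
        for member in members:
            target = (dest / member.name).resolve()
            # Refuse entries that would be written outside dest_dir.
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(dest, members=members)

    return [member.name for member in members]
```

Calling `extract_bundle("some_bundle.tar", "unpacked")` would return the list of paths stored in the archive, which is a quick way to see which phenotype or VCF files a consent-group bundle contains.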
The Data Browser exposes only open metadata from the TOPMed studies for search, so all researchers are able to do the same searches and see the existence of all files. However, only users with appropriate dbGaP approval can add files to their project to use in an analysis. A service within NHLBI BioData Catalyst programmatically reads user permissions from dbGaP to determine if a user can access particular files on the system. This service recognizes if a user has a Data Access Request in dbGaP for TOPMed data or if a user is set as a “dbGaP downloader” for a particular dataset. Therefore, these are the two mechanisms for getting access to TOPMed studies on NHLBI BioData Catalyst. Please note that phenotype and genotype data for some studies are in different dbGaP accessions. More information is available on the Data page of the NHLBI BioData Catalyst website.
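The access rule described above can be sketched as a small check. This is not the actual BioData Catalyst service: the `DbGapPermissions` class and `can_add_to_project` function are hypothetical names, and the sketch assumes permissions are tracked per dbGaP accession.

```python
from dataclasses import dataclass, field


@dataclass
class DbGapPermissions:
    """Hypothetical snapshot of one user's dbGaP permissions."""
    # Accessions covered by the user's Data Access Requests.
    approved_requests: set = field(default_factory=set)
    # Accessions for which the user is set as a "dbGaP downloader".
    downloader_for: set = field(default_factory=set)


def can_add_to_project(perms, accession):
    """Return True if either access mechanism covers the accession."""
    return (accession in perms.approved_requests
            or accession in perms.downloader_for)


# Access to the JHS genomic accession (phs000964) does not imply access
# to the separate JHS phenotype accession (phs000286).
perms = DbGapPermissions(approved_requests={"phs000964"})
print(can_add_to_project(perms, "phs000964"))  # True
print(can_add_to_project(perms, "phs000286"))  # False
```

The example highlights why it matters that phenotype and genotype data can sit in different accessions: a request that covers only one of them leaves the other unavailable for analysis.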
To learn more about how to get started working with the TOPMed studies on NHLBI BioData Catalyst powered by Seven Bridges, take a look at the Onboarding to Seven Bridges Tutorial, which describes how to create an account, set up projects, run analyses, and find the TOPMed data in the Data Browser.
Further information on the hosted datasets can also be found in the “Datasets Hub” section of the Seven Bridges documentation.
For more information on which TOPMed studies and parent studies are offered, including their phs identification numbers used by dbGaP, please see the tables below:
Hosted TOPMed study accessions with genomic data from Freeze 5b
|Study Name||Acronym||phs I.D. #|
|NHLBI TOPMed: Genetics of Cardiometabolic Health in the Amish||Amish||phs000956|
|NHLBI TOPMed: Atherosclerosis Risk in Communities||ARIC||phs001211|
|NHLBI TOPMed: The Genetics and Epidemiology of Asthma in Barbados||BAGS||phs001143|
|NHLBI TOPMed: Cleveland Clinic Atrial Fibrillation Study||CCAF||phs001189|
|NHLBI TOPMed: The Cleveland Family Study||CFS||phs000954|
|NHLBI TOPMed: Cardiovascular Health Study||CHS||phs001368|
|NHLBI TOPMed: Genetic Epidemiology of COPD||COPDGene||phs000951|
|NHLBI TOPMed: The Genetic Epidemiology of Asthma in Costa Rica||CRA||phs000988|
|NHLBI TOPMed: Diabetes Heart Study||DHS||phs001412|
|NHLBI TOPMed: Boston Early-Onset COPD Study||EOCOPD||phs000946|
|NHLBI TOPMed: Framingham Heart Study||FHS||phs000974|
|NHLBI TOPMed: Genes-Environments and Admixture in Latino Asthmatics||GALAII||phs000920|
|NHLBI TOPMed: Genetic Study of Atherosclerosis Risk||GeneSTAR||phs001218|
|NHLBI TOPMed: Genetic Epidemiology Network of Arteriopathy||GENOA||phs001345|
|NHLBI TOPMed: Genetic Epidemiology Network of Salt Sensitivity||GenSalt||phs001217|
|NHLBI TOPMed: Epigenetic Determinants of Lipid Response to Dietary Fat and Fenofibrate||GOLDN||phs001359|
|NHLBI TOPMed: Heart and Vascular Health Study||HVH||phs000993|
|NHLBI TOPMed: Genetics of Left Ventricular Hypertrophy||HyperGEN||phs001293|
|NHLBI TOPMed: The Jackson Heart Study||JHS||phs000964|
|NHLBI TOPMed: Whole Genome Sequencing of Venous Thromboembolism||Mayo_VTE||phs001402|
|NHLBI TOPMed: The Multi-Ethnic Study of Atherosclerosis||MESA||phs001416|
|NHLBI TOPMed: Massachusetts General Hospital (MGH) Atrial Fibrillation Study||MGH_AF||phs001062|
|NHLBI TOPMed: Partners HealthCare Biobank||Partners||phs001024|
|NHLBI TOPMed: San Antonio Family Heart Study||SAFS||phs001215|
|NHLBI TOPMed: Study of African Americans, Asthma, Genes and Environment||SAGE||phs000921|
|NHLBI TOPMed: African American Sarcoidosis Genetics Resource||Sarcoidosis||phs001207|
|NHLBI TOPMed: Genome-wide Association Study of Adiposity in Samoans||SAS||phs000972|
|NHLBI TOPMed: Rare Variants for Hypertension in Taiwan Chinese||THRV||phs001387|
|NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Ablation Registry||VAFAR||phs000997|
|NHLBI TOPMed: The Vanderbilt Atrial Fibrillation Registry||VU_AF||phs001032|
|NHLBI TOPMed: The Women’s Genome Health Study||WGHS||phs001040|
|NHLBI TOPMed: Women’s Health Initiative||WHI||phs001237|
Hosted TOPMed study accessions with phenotype data
|Study Name||Acronym||phs I.D. #|
|Atherosclerosis Risk in Communities||ARIC||phs000280|
|Cleveland Clinic Atrial Fibrillation Study||CCAF||phs000820|
|The Cleveland Family Study||CFS||phs000284|
|Cardiovascular Health Study||CHS||phs000287|
|Genetic Epidemiology of COPD||COPDGene||phs000179|
|Framingham Heart Study||FHS||phs000007|
|Genes-Environments and Admixture in Latino Asthmatics||GALAII||phs001180|
|Genetic Study of Atherosclerosis Risk||GeneSTAR||phs001074|
|Genetic Epidemiology Network of Arteriopathy||GENOA||phs001238|
|Genetic Epidemiology Network of Salt Sensitivity||GenSalt||phs000784|
|Heart and Vascular Health Study||HVH||phs001013|
|The Jackson Heart Study||JHS||phs000286|
|The Multi-Ethnic Study of Atherosclerosis||MESA||phs000209|
|Massachusetts General Hospital (MGH) Atrial Fibrillation Study||MGH_AF||phs001001|
|Women’s Health Initiative||WHI||phs000200|
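Because genomic and phenotype data for the same study can live under different phs numbers, it can help to pair the two tables by acronym. The sketch below does this for a handful of rows copied from the tables above. The dictionary and function names are hypothetical, and the subset of studies is only for illustration.

```python
# A few rows copied from the two tables above (acronym -> phs accession).
GENOMIC_ACCESSIONS = {
    "Amish": "phs000956",
    "ARIC": "phs001211",
    "COPDGene": "phs000951",
    "FHS": "phs000974",
    "JHS": "phs000964",
    "MESA": "phs001416",
}
PHENOTYPE_ACCESSIONS = {
    "ARIC": "phs000280",
    "COPDGene": "phs000179",
    "FHS": "phs000007",
    "JHS": "phs000286",
    "MESA": "phs000209",
}


def accessions_for(acronym):
    """Look up both accessions for a study acronym, ignoring case.

    A value of None only means the acronym is absent from the
    corresponding dictionary in this sketch.
    """
    key = acronym.casefold()
    genomic = {k.casefold(): v for k, v in GENOMIC_ACCESSIONS.items()}
    phenotype = {k.casefold(): v for k, v in PHENOTYPE_ACCESSIONS.items()}
    return {"genomic": genomic.get(key), "phenotype": phenotype.get(key)}


print(accessions_for("JHS"))
# {'genomic': 'phs000964', 'phenotype': 'phs000286'}
```

Matching is case-insensitive so that spelling variants of an acronym resolve to the same study.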