dbGaP Special Studies Submission Guide

1. What files are required for a Study with External Data Source (EDS)?

An External Data Source (EDS) is a non-dbGaP entity: a public or private, national or international organization that is able to meet core NIH standards for establishing data quality and data management service protocols for NIH, based on the programmatic need of an NIH funding Institute or Center (IC). A Trusted Partner is a type of EDS. For these studies, data is submitted to the EDS and dbGaP functions as the Authorized Access system.

dbGaP Studies with External Data Source (EDS) are designated in the dbGaP Submission System by the GPA. Instructions for GPAs can be found here. For these studies, at a minimum the Study Config, Subject Consent (SC), and Subject Sample Mapping (SSM) files must be submitted to the Submission Portal.

All new study versions must complete the Study Data Outline in the Submission Portal in order to assert what data types will be submitted and released for the current study version. Upon completion, a dbGaP study accession (phs######.v#.p#) will be provided.
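As a rough illustration of the accession format mentioned above, the following sketch checks a string against the phs######.v#.p# pattern and splits it into its parts. It assumes that "######" means exactly six digits and that each "#" after "v" and "p" stands for one or more digits. The function name parse_accession and the sample accession are hypothetical and are not part of any dbGaP tool.

```python
import re

# Assumed reading of the placeholder phs######.v#.p#: "phs" plus six digits,
# then ".v" and ".p" each followed by one or more digits.
ACCESSION_RE = re.compile(r"(phs\d{6})\.v(\d+)\.p(\d+)")


def parse_accession(accession: str) -> dict:
    """Split a versioned study accession into its parts, or raise ValueError."""
    match = ACCESSION_RE.fullmatch(accession.strip())
    if match is None:
        raise ValueError(f"not a versioned study accession: {accession!r}")
    phs, version, p_number = match.groups()
    return {"phs": phs, "v": int(version), "p": int(p_number)}


if __name__ == "__main__":
    # Hypothetical accession used only for illustration.
    print(parse_accession("phs000123.v2.p1"))
```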

Complete the Study Config web form. This will populate the public study report page.

  • Phenotype Dataset (DS) and Data Dictionary (DD) files

    • (1) Subject Consent (SC) DS and DD - the SEX variable is not required for this study
    • (1) Subject Sample Mapping (SSM) DS and DD

Please note that the Sample Attributes DS and DD are currently required in the Submission Portal. If your study does not have these files, create an empty text file named blank.txt. The file should have no content at all. Upload blank.txt for both the Sample Attributes DS and DD. However, if you would like to submit a Sample Attributes DS and DD, see the Sample Attributes section in the Submission Guide.
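One possible way to create the placeholder file is sketched below. It simply writes a zero-byte file named blank.txt. The helper name write_blank_placeholder and the target directory are assumptions made for illustration. Any method that produces a completely empty file with that name should work equally well.

```python
from pathlib import Path


def write_blank_placeholder(directory: str = ".") -> Path:
    """Create an empty blank.txt in the given directory and return its path."""
    path = Path(directory) / "blank.txt"
    # Zero bytes: no header row, no whitespace, no trailing newline.
    path.write_bytes(b"")
    return path


if __name__ == "__main__":
    created = write_blank_placeholder()
    print(created, created.stat().st_size)
```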

Frequently, these studies have only sample IDs and not subject IDs, or vice versa. In such cases, please use the same ID in both the subject ID and sample ID columns. If dbGaP needs to hold the study until sequences are processed by a Trusted Partner or External Data Source, please formally notify the phenotype curator by email of the desired release date (weekdays only).
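The ID reuse described above can be sketched as follows, assuming a tab-delimited SSM dataset with the column headers SUBJECT_ID and SAMPLE_ID. The function name and the example IDs are hypothetical. The actual column requirements are defined in the Submission Guide.

```python
import csv
import io


def ssm_from_single_id_list(ids):
    """Return tab-delimited SSM text that reuses each ID as subject and sample ID."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["SUBJECT_ID", "SAMPLE_ID"])
    for identifier in ids:
        # The study has only one kind of ID, so it fills both columns.
        writer.writerow([identifier, identifier])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical IDs used only for illustration.
    print(ssm_from_single_id_list(["S1", "S2"]), end="")
```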

If your study is known to have overlapping subjects with another dbGaP study, please let us know. We will work with you either to set the same dbGaP assigned repository for both studies or to have you provide aliases.

2. Why and how should I submit a Parent-Child Study?

dbGaP refers to a study with substudies as a parent-child study. Some submitters also refer to these studies as umbrella or cohort studies. This type of study is useful for large cohorts with multiple submitters who are accountable to the parent study and who submit data on the same set of individuals under the same Data Use Limitations and consents. The submitters of each substudy may submit independently of the other substudies, but the Study Investigator of the parent study will also need to approve the parent-child study for release. Users who want data from a parent-child study request it using the parent accession. Once approved, the user will have access to all substudy data. A parent-child study cannot be released until all components of the parent study and every substudy have been processed and can be released.

If a study has multiple technologies (SNP, sequencing, etc.), we strongly suggest NOT submitting the study as a parent-child study. dbGaP has specific molecular data accessions that can be applied to distinguish different technologies for users under a single study accession. As long as the different submitters can identify a primary contact with whom dbGaP can work on questions, a standalone study would be the better option.

For a parent-child study, the parent study and each substudy will have a unique study accession and its own Submission Portal. Each substudy will need to independently submit a study config, phenotype data, molecular data, and any additional data specific to the substudy. If phenotype data is shared among all substudies, it should be submitted to the parent study Submission Portal. However, data collected specifically for a substudy should be submitted to the substudy Submission Portal. Molecular data should never be submitted to the parent study Submission Portal.

All new study versions must complete the Study Data Outline in the Submission Portal in order to assert what data types will be submitted and released for the current study version. Upon completion, a dbGaP study accession (phs######.v#.p#) will be provided.

  • Parent Study

    • A parent study CANNOT have molecular data. All molecular data will need to be submitted to a substudy accession.
    • Study Config - This is required to populate the study report page.
    • Phenotype Dataset (DS) and Data Dictionary (DD) files
      • (1) Subject Consent (SC) DS and DD - a single consent DS and DD should be provided that includes all unique subjects with valid consents from the parent study and substudies. Each person should be assigned a single subject ID. Also include pedigree linking members and HapMap controls with CONSENT=0.
      • (1) Subject Sample Mapping (SSM) DS and DD - unlike the SSM of a standalone study, this table should have at least three columns: SUBJECT_ID, SAMPLE_ID, and STUDY, where STUDY is the substudy phs accession (phs######.v#.p#). SAMPLE_IDs are the IDs used in the molecular data. Each person may have multiple sample IDs. As with the SC, there is only one SSM per parent-child study.
      • (1) Pedigree DS and DD - Submit only if there are self-reported or known genetic relationships in the parent-child study. There is only one pedigree for the entire parent-child study.
      • (1 or more) Subject Phenotypes DS and DD - If the substudies share the same subject phenotypes, submit subject phenotypes to the parent study accession. Otherwise, submit unique substudy subject phenotypes to the applicable substudy.
      • NO sample attributes should be submitted to the parent study because molecular data is not tied directly to the parent study phs accession.
  • Substudies

    • Study Config - This is required to populate the study report page.
    • Phenotype Dataset (DS) and Data Dictionary (DD) files
      • No Subject Consent (SC) DS and DD should be provided for the substudy, since a single SC is required at the parent study level. Submit blank.txt files (an empty tab-delimited txt file with the filename "blank.txt") in place of the SC DS and DD. The substudy should never have more subjects than what is listed in the parent SC, since each subject must be consented and should only be assigned one consent value for the entire parent-child study.
      • No Subject Sample Mapping (SSM) DS and DD should be provided for the substudy, since a single SSM is required at the parent study level. Submit blank.txt files in place of the SSM DS and DD. In the event that the parent study submitters are not able to add your sample IDs to the parent-child study SSM, you will need to submit a cumulative SSM DS for your substudy to your dbGaP Submission Portal account. dbGaP will add the sample IDs to the parent study SSM. Ideally, the sample IDs for your substudy should be different from the sample IDs of other substudies in the parent-child study. The substudy SSM will not be separately released, so no SSM DD needs to be submitted.
      • No Pedigree DS and DD should be provided for the substudy, since a single pedigree is distributed at the parent study level. If your substudy has pedigree information that differs from what was provided at the parent study level, you will need to verify that it is acceptable to include your substudy pedigree under the parent study. We have encountered cases where the original cohort was not consented to have a pedigree file, but it was acceptable to include Identity By Descent (IBD) as molecular data for a substudy. There are many variations of what is acceptable, so if you have a unique scenario, please discuss with the parent study and then with a dbGaP phenotype curator.
      • (1 or more) Subject Phenotypes DS and DD - If subject phenotypes have already been submitted to the parent study, then do not submit the same subject phenotypes to the substudy. Submit only subject phenotype data specific to the substudy.
      • (1 or more) Sample Attributes DS and DD - If molecular data is being submitted for the substudy, sample attributes DS and DD must be submitted.
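Several of the rules in the parent and substudy lists above lend themselves to a quick consistency check before upload. The sketch below is not a dbGaP tool. It assumes the SC and SSM tables have already been read into lists of dictionaries keyed by column name (for example with csv.DictReader), that the consent column is named CONSENT, and that "######" in the accession means six digits. The function name check_parent_child_tables and all example values are hypothetical.

```python
import re

REQUIRED_SSM_COLUMNS = {"SUBJECT_ID", "SAMPLE_ID", "STUDY"}
# Assumed reading of phs######.v#.p#: six digits, then numeric v and p parts.
ACCESSION_RE = re.compile(r"phs\d{6}\.v\d+\.p\d+")


def check_parent_child_tables(sc_rows, ssm_rows):
    """Return a list of human-readable problems found in the parent SC and SSM."""
    problems = []

    # Each subject should carry a single consent value for the whole study.
    consent_by_subject = {}
    for row in sc_rows:
        subject, consent = row["SUBJECT_ID"], row["CONSENT"]
        if consent_by_subject.setdefault(subject, consent) != consent:
            problems.append(f"subject {subject} has more than one consent value")

    study_by_sample = {}
    for row in ssm_rows:
        missing = REQUIRED_SSM_COLUMNS - row.keys()
        if missing:
            problems.append(f"SSM row lacks columns: {sorted(missing)}")
            continue
        subject, sample, study = row["SUBJECT_ID"], row["SAMPLE_ID"], row["STUDY"]
        # A substudy should never list subjects that are absent from the parent SC.
        if subject not in consent_by_subject:
            problems.append(f"subject {subject} is in the SSM but not in the SC")
        if not ACCESSION_RE.fullmatch(study):
            problems.append(f"STUDY value {study} is not a phs accession")
        # Ideally, sample IDs differ between substudies.
        if study_by_sample.setdefault(sample, study) != study:
            problems.append(f"sample {sample} is used by more than one substudy")
    return problems


if __name__ == "__main__":
    # Hypothetical rows used only for illustration.
    sc = [{"SUBJECT_ID": "P1", "CONSENT": "1"}, {"SUBJECT_ID": "P2", "CONSENT": "0"}]
    ssm = [
        {"SUBJECT_ID": "P1", "SAMPLE_ID": "A1", "STUDY": "phs000002.v1.p1"},
        {"SUBJECT_ID": "P3", "SAMPLE_ID": "B1", "STUDY": "phs000003.v1.p1"},
    ]
    for problem in check_parent_child_tables(sc, ssm):
        print(problem)
```

Returning a list of messages instead of raising on the first failure lets a submitter see every issue in one pass. The checks cover only the rules stated in the lists above and are no substitute for dbGaP's own validation.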

3. What is a dbGaP Collection and what files need to be provided?

A dbGaP Collection provides streamlined access to data across dbGaP studies or portions of dbGaP studies that share the same consent group, disease, or funding project. Data access for a collection is controlled by a single data access committee. The data in a collection is not harmonized across studies or otherwise altered from the original study. Investigators using data within a dbGaP collection are required to follow the use restrictions and acknowledgement instructions from the original dataset.

To search for dbGaP Collections, go to dbGaP Advanced Search.

To create a new dbGaP Collection, please work with your GPA to instantiate a new registration. The following need to be provided:

  • Study Config - This is required to populate the collection web page.
  • List of study accessions (phs######.v#.p#) and corresponding consents to be included in the collection. The study accessions should be parent study accessions. Do not include substudy accessions.

Upload the Study Configuration through the dbGaP Submission Portal. The list of study accessions and consents can be uploaded under "Other files" with type "Special".

Last updated: 2023-08-21T20:42:09Z