Research Projects Directory

12,351 active projects

This information was updated 7/14/2024

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

Food Insecurity and Eye Diseases Project

The specific scientific questions I intend to study are: 1. Is there an association between food insecurity and chronic eye diseases such as glaucoma, age-related macular degeneration (AMD), and diabetic retinopathy? 2. Are there any mediator factors between food insecurity…

Scientific Questions Being Studied

The specific scientific questions I intend to study are:
1. Is there an association between food insecurity and chronic eye diseases such as glaucoma, age-related macular degeneration (AMD), and diabetic retinopathy?
2. Are there any mediator factors between food insecurity and these eye diseases, such as metabolic diseases like diabetes?

Understanding the impact of food insecurity on eye health can help inform public health interventions aimed at reducing the burden of chronic eye diseases. Investigating potential mediator factors, such as metabolic diseases, can provide a comprehensive understanding of the pathways through which food insecurity impacts eye health. This knowledge can lead to more effective and holistic healthcare interventions.

Overall, this research can provide valuable insights into the complex interplay between social determinants of health, metabolic diseases, and chronic eye conditions, ultimately contributing to improved public health strategies and outcomes.

Project Purpose(s)

  • Disease Focused Research (Glaucoma, AMD, Diabetic Retinopathy)
  • Social / Behavioral

Scientific Approaches

For my study on the association between food insecurity and chronic eye diseases, and the potential mediating role of metabolic diseases, I plan to use the following scientific approaches:

Datasets
- Population-based data: This dataset includes diverse participants with detailed EHR and responses to survey questions.
- Study Population: Participants who have both EHR data and survey responses regarding food insecurity.
- Primary Outcomes: Diagnoses of glaucoma, AMD, and diabetic retinopathy, identified using ICD-9 and ICD-10 codes.

Methods
Cohort study:
- This design will allow us to examine the association between food insecurity and the prevalence of chronic eye diseases.

Statistical Analysis
Mediation Analysis:
- Investigate potential mediating factors, such as metabolic diseases (e.g., diabetes), in the relationship between food insecurity and chronic eye diseases.
- Use statistical methods for mediation analysis to estimate direct and indirect effects.

Tools: R or Python
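
As an illustration of what "direct and indirect effects" means here, the sketch below runs a product-of-coefficients mediation analysis on synthetic data. It is a minimal sketch under simplifying assumptions, not the project's actual method: it uses ordinary least squares with continuous variables, whereas binary diagnoses would usually call for logistic or counterfactual-based mediation models. All function names, variable names, and effect sizes are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def mediation_effects(x, m, y):
    """Product-of-coefficients mediation with ordinary least squares.

    x: exposure, m: mediator, y: outcome (numeric sequences).
    """
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx                           # path a: slope of m ~ x
    det = sxx * smm - sxm ** 2              # determinant for y ~ x + m
    direct = (sxy * smm - smy * sxm) / det  # c': x -> y holding m fixed
    b = (smy * sxx - sxy * sxm) / det       # path b: m -> y holding x fixed
    return {"direct": direct, "indirect": a * b, "total": sxy / sxx}


def simulate(n=20000, seed=0):
    """Synthetic data only: true direct effect 0.2, true indirect effect 0.5 * 0.4."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]     # hypothetical exposure flag
    m = [0.5 * xi + rng.gauss(0, 1) for xi in x]  # hypothetical mediator
    y = [0.2 * xi + 0.4 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
    return x, m, y


effects = mediation_effects(*simulate())
print(effects)
```

With linear models the total effect equals the sum of the direct and indirect effects, which gives a quick sanity check on the estimates.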

Anticipated Findings

1. Association Between Food Insecurity and Chronic Eye Diseases
2. Role of Metabolic Diseases as Mediators: Diabetes and other metabolic diseases are expected to mediate the relationship between food insecurity and chronic eye diseases.

Contribution to Scientific Knowledge
1. Public Health Impact:
- Broader Understanding: This study will expand knowledge on the public health implications of food insecurity, emphasizing its impact on eye health.
- Focus on Vulnerable Populations: Vulnerable populations are defined as individuals who responded "yes" to either of the two food insecurity questions in the AoU “Social Determinants of Health” survey.

2. Healthcare Practices and Policies:
- Targeted Interventions: Inform healthcare providers and policymakers about the importance of screening for food insecurity in patients with chronic eye diseases.
- Integrated Care Models: Support the development of care models that combine nutritional support with eye health management.

Demographic Categories of Interest

  • Others

Data Set Used

Registered Tier

Research Team

Owner:

  • Deyu Sun - Graduate Trainee, University of California, Los Angeles

Collaborators:

  • Ramin Talebi - Graduate Trainee, University of California, Los Angeles

Contrastive Learning - Glaucoma

We have previously published a predictive model of glaucoma progression using EHR data pertaining to systemic attributes from a single institution. We aim to use the All of Us dataset to 1) serve as external validation for this single-center model…

Scientific Questions Being Studied

We have previously published a predictive model of glaucoma progression using EHR data pertaining to systemic attributes from a single institution. We aim to use the All of Us dataset to 1) serve as external validation for this single-center model and 2) train new models focused on predicting glaucoma progression using systemic predictors. This is important to understand whether the original findings are generalizable and provide knowledge about the utility of systemic predictors on a national-level dataset. The citation is:
Baxter, S. L., Saseendrakumar, B. R., Paul, P., Kim, J., Bonomi, L., Kuo, T. T., Loperena, R., Ratsimbazafy, F., Boerwinkle, E., Cicek, M., Clark, C. R., Cohn, E., Gebo, K., Mayo, K., Mockrin, S., Schully, S. D., Ramirez, A., Ohno-Machado, L., & All of Us Research Program Investigators (2021). Predictive Analytics for Glaucoma Using Data From the All of Us Research Program. American journal of ophthalmology, 227, 74–86. https://doi.org/10.1016/j.ajo.2021.01.008

Project Purpose(s)

  • Disease Focused Research (Primary open angle glaucoma)
  • Other Purpose (This work is the result of an All of Us Research Program Demonstration Project. Demonstration Projects are efforts by the All of Us Research Program designed to meet the goal of ensuring the quality and utility of the Research Hub as a resource for accelerating precision medicine. This work has been approved, reviewed, and overseen by the All of Us Research Program Science Committee and Data and Research Center to ensure compliance with program policy.)

Scientific Approaches

We plan to primarily work with EHR data contained in All of Us for a cohort of adult participants diagnosed with primary open-angle glaucoma. We will extract data on systemic conditions and medications for this cohort, as well as physical measurements and vital signs. We will clean the data such that the format is consistent with the data from our previously published model. Then, we will use this data as an external validation of a logistic regression model derived from our prior study that was based at a single academic center. Next, we will use All of Us data to train a new set of models, using techniques such as logistic regression, random forests, and artificial neural networks. We will optimize these models using feature selection methods and class balancing procedures. By evaluating performance metrics such as area under the curve (AUC), precision, recall, and accuracy, we will assess whether we can achieve superior predictive performance when training models using All of Us.
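
The performance metrics named above can be computed directly from true labels and predicted scores. The following is a minimal sketch with made-up labels and scores, assuming a binary outcome coded 0/1 and a 0.5 decision threshold; the function names are hypothetical and the project may well use a library implementation instead.

```python
def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def threshold_metrics(y_true, scores, threshold=0.5):
    """Precision, recall, and accuracy after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


# Made-up example: 1 = progressed, 0 = did not progress
labels = [0, 0, 1, 1]
predicted = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, predicted))                # 0.75
print(threshold_metrics(labels, predicted))  # precision 1.0, recall 0.5, accuracy 0.75
```

AUC is threshold-free, while precision, recall, and accuracy depend on the chosen cutoff, which matters when classes are imbalanced.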

Anticipated Findings

We anticipate that the All of Us data will validate the findings from the model, which demonstrated that blood pressure-related metrics and certain medication classes had predictive value for glaucoma progression. In addition, we anticipate that the models trained with All of Us data will outperform the model trained with single-institution data due to larger sample size and greater diversity. These findings will support further investigation into the relationship between systemic factors such as blood pressure and glaucoma progression.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

EM prev/comorbid

This cross-sectional study aims to characterize the prevalence and comorbidities associated with erythromelalgia diagnosis in the United States.

Scientific Questions Being Studied

This cross-sectional study aims to characterize the prevalence and comorbidities associated with erythromelalgia diagnosis in the United States.

Project Purpose(s)

  • Disease Focused Research (Erythromelalgia)

Scientific Approaches

This study will identify patients who have been diagnosed with erythromelalgia (ICD: I73.81) and determine prevalence, demographics, and comorbidities within this cohort, especially comorbidities that are overrepresented with this diagnosis as measured by odds ratio calculations.
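
For reference, an odds ratio for one comorbidity can be computed from a 2x2 table of counts. The sketch below uses invented counts and a Wald 95% confidence interval on the log scale; the function name and the numbers are hypothetical and are not results from this project.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval.

    a: cases with the comorbidity       b: cases without it
    c: non-cases with the comorbidity   d: non-cases without it
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Invented counts for illustration only
print(odds_ratio(20, 80, 10, 90))  # odds ratio 2.25 with its interval
```

An odds ratio above 1 whose interval excludes 1 would suggest the comorbidity is overrepresented among diagnosed patients. The Wald interval requires all four cells to be non-zero.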

Anticipated Findings

Erythromelalgia is an overall rare disease with multiple potential etiologies. Understanding associations of comorbidities can help to elucidate some of the underlying processes that may contribute to pathogenesis.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Kevin Yang - Research Fellow, University of Alabama at Birmingham

backupCMS_Disparities in maternal mortality and morbidity in the USA]

Maternal health is an important part of the health system of any country. Within the U.S., maternal mortality and morbidity are higher than in any other developed country. According to the PMSS report, the maternal death rate is 17.3 per 100,000 live births.…

Scientific Questions Being Studied

Maternal health is an important part of the health system of any country. Within the U.S., maternal mortality and morbidity are higher than in any other developed country. According to the PMSS report, the maternal death rate is 17.3 per 100,000 live births. A World Health Organization (WHO) report places the U.S. 56th in the maternal mortality ranking, which is unacceptable for a developed country. A clear picture of disparity is present in every report dealing with this topic. The mortality rate among Black American women is about 3 times higher than that among non-Hispanic white women. The death rate among other minority groups, such as non-Hispanic American Indian or Alaska Native and Asian-Pacific Islander women, is also higher. The same holds for maternal morbidity. The maternal death rate among Hispanic-White women is lower; however, Severe Maternal Morbidity (SMM) is higher in this group.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Educational

Scientific Approaches

With the time-stamped data for different procedures, laboratory results, and other hospital visits for the patient cohort, we aim to develop a process mining algorithm to identify variations in care pathways that cause adverse maternal outcomes. Using process mining in healthcare to identify variability in system-level factors is a relatively new approach to disparity research. Our research will address this gap in the literature.
We hope to address potential stigmatization issues by educating the necessary stakeholders, including hospitals, providers, and policymakers. Once we have a preliminary framework, we hope to conduct community-based participatory research and engage with community members. We propose that the process mining approach would help providers identify the “hotspots” in the care pathways that cause disparities.
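
A common first step in process mining is building a directly-follows graph from a time-stamped event log: for each patient, events are ordered by time and every consecutive pair of activities is counted. The sketch below illustrates that step only; the event log, activity names, and function name are hypothetical, and the project's own algorithm is not described in enough detail to reproduce.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count how often one activity directly follows another within a case.

    events: iterable of (case_id, activity, timestamp) tuples, in any order.
    Timestamps must be sortable (ISO date strings are used here).
    """
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((timestamp, activity))

    counts = Counter()
    for steps in traces.values():
        ordered = [activity for _, activity in sorted(steps)]
        counts.update(zip(ordered, ordered[1:]))  # consecutive activity pairs
    return counts


# Hypothetical event log, deliberately out of order
log = [
    ("p1", "delivery", "2020-06-01"),
    ("p1", "prenatal visit", "2020-01-05"),
    ("p1", "lab test", "2020-01-20"),
    ("p2", "prenatal visit", "2020-02-01"),
    ("p2", "delivery", "2020-07-15"),
]
for (src, dst), n in sorted(directly_follows(log).items()):
    print(f"{src} -> {dst}: {n}")
```

Comparing these transition counts across patient groups is one way pathway variation could be surfaced for further review.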

Anticipated Findings

The major factors causing maternal mortality and morbidity are sociodemographic, socioeconomic, provider, and system-level factors. This research investigates the system-level factors that can cause disparity in maternal health. With the All of Us data, we aim to group the women who utilized the healthcare system for their maternal care by race/ethnicity, pregnancy complications, outcome, etc., and to identify the factors that caused adverse pregnancy outcomes, mortality, and morbidity. Moreover, we apply novel process mining approaches to map the patient cohort and identify any changes in care pathways that may result in disparities.

This research will help identify system-level factors, other than income, insurance, or social status, that cause disparity in maternal health. It can also help reduce the factors that play a major role in maternal health care disparities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level
  • Others

Data Set Used

Registered Tier

Research Team

Owner:

Genetic Ancestry Proportions of T2D and T2DN

Genetic studies have focused almost exclusively on excess risk for poor health, yet there is considerable ethnic heterogeneity in African derived populations such that not all Black/African Americans (B/AA) have poor type 2 diabetes (T2D) associated comorbidities. Variants in the…

Scientific Questions Being Studied

Genetic studies have focused almost exclusively on excess risk for poor health, yet there is considerable ethnic heterogeneity in African derived populations such that not all Black/African Americans (B/AA) have poor type 2 diabetes (T2D) associated comorbidities. Variants in the engulfment and cell motility 1 (ELMO1) gene have been previously associated with protection against end-stage renal disease (ESRD) due to T2D in a B/AA case-control cohort. Leak et al. (Ann Hum Genet. 2009; 73(2):152-9) performed the first comprehensive evaluation of variations across the ELMO1 gene in a large B/AA T2D-ESRD case-control population and identified that the minor alleles of four intronic variants showed protection, with odds ratios between 0.77 and 0.84.
This study will assess the replication of association of previously discovered ELMO1 variants with T2D-ESRD, as well as further refine the location of associations within the ~126kb region.

Project Purpose(s)

  • Disease Focused Research (end stage renal failure, type 2 diabetes)
  • Social / Behavioral
  • Control Set
  • Ancestry

Scientific Approaches

We propose to perform a case-control association study using the 26,405 genetic variants spanning introns 8 through 15 of the ELMO1 gene (GRCh38 coordinates chr7: 37133234-37259180) in B/AA case-control participants. B/AA cases (diagnosis of T2D-ESRD) and controls (without a current diagnosis of T2D and ESRD) will be used for the current analysis.

Tests of association under the three a priori genetic models (additive, dominant, and recessive) will be reported. Analyses of T2D-ESRD phenotypes (microalbuminuria, eGFR, creatinine, etc.) will be performed using a series of analyses of variance. SNPs that show nominal evidence for association will be further adjusted for age, sex, and genome-wide principal components.

Adjusted multivariable linear regression and unconditional logistic regression analyses will be performed.
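
The three genetic models differ only in how a genotype is coded before it enters the regression. The sketch below shows the conventional coding, assuming genotypes are expressed as minor-allele counts (0, 1, or 2); the function name is hypothetical.

```python
def encode_genotype(minor_allele_count, model):
    """Code a genotype (0, 1, or 2 copies of the minor allele) for a genetic model."""
    if minor_allele_count not in (0, 1, 2):
        raise ValueError("minor_allele_count must be 0, 1, or 2")
    if model == "additive":
        return minor_allele_count            # each copy adds the same effect
    if model == "dominant":
        return int(minor_allele_count >= 1)  # one copy is enough
    if model == "recessive":
        return int(minor_allele_count == 2)  # both copies are required
    raise ValueError(f"unknown model: {model}")


for model in ("additive", "dominant", "recessive"):
    print(model, [encode_genotype(g, model) for g in (0, 1, 2)])
```

The coded value then serves as the SNP predictor in the logistic or linear regression, alongside covariates such as age, sex, and principal components.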

Anticipated Findings

To our knowledge, ELMO1 is the first candidate gene reported to show protection against T2D-ESRD in B/AAs. Hence, we aim to confirm and extend the previous report of associations between ELMO1 variants and T2D-ESRD and quantitative traits in B/AA.

As public health has shown, B/AA with T2D are at an increased risk for developing ESRD in the presence of a family history of ESRD. To date, genetic studies have focused almost exclusively

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Muhammed Idris - Early Career Tenure-track Researcher, Morehouse School of Medicine

Collaborators:

  • Viviane Schuch - Research Fellow, Morehouse School of Medicine
  • Tennille Leak-Johnson - Early Career Tenure-track Researcher, Morehouse School of Medicine

Duplicate of CMS_Disparities in maternal mortality and morbidity in the USA]

Maternal health is an important part of the health system of any country. Within the U.S., maternal mortality and morbidity are higher than in any other developed country. According to the PMSS report, the maternal death rate is 17.3 per 100,000 live births.…

Scientific Questions Being Studied

Maternal health is an important part of the health system of any country. Within the U.S., maternal mortality and morbidity are higher than in any other developed country. According to the PMSS report, the maternal death rate is 17.3 per 100,000 live births. A World Health Organization (WHO) report places the U.S. 56th in the maternal mortality ranking, which is unacceptable for a developed country. A clear picture of disparity is present in every report dealing with this topic. The mortality rate among Black American women is about 3 times higher than that among non-Hispanic white women. The death rate among other minority groups, such as non-Hispanic American Indian or Alaska Native and Asian-Pacific Islander women, is also higher. The same holds for maternal morbidity. The maternal death rate among Hispanic-White women is lower; however, Severe Maternal Morbidity (SMM) is higher in this group.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Educational

Scientific Approaches

With the time-stamped data for different procedures, laboratory results, and other hospital visits for the patient cohort, we aim to develop a process mining algorithm to identify variations in care pathways that cause adverse maternal outcomes. Using process mining in healthcare to identify variability in system-level factors is a relatively new approach to disparity research. Our research will address this gap in the literature.
We hope to address potential stigmatization issues by educating the necessary stakeholders, including hospitals, providers, and policymakers. Once we have a preliminary framework, we hope to conduct community-based participatory research and engage with community members. We propose that the process mining approach would help providers identify the “hotspots” in the care pathways that cause disparities.

Anticipated Findings

The major factors causing maternal mortality and morbidity are sociodemographic, socioeconomic, provider, and system-level factors. This research investigates the system-level factors that can cause disparity in maternal health. With the All of Us data, we aim to group the women who utilized the healthcare system for their maternal care by race/ethnicity, pregnancy complications, outcome, etc., and to identify the factors that caused adverse pregnancy outcomes, mortality, and morbidity. Moreover, we apply novel process mining approaches to map the patient cohort and identify any changes in care pathways that may result in disparities.

This research will help identify system-level factors, other than income, insurance, or social status, that cause disparity in maternal health. It can also help reduce the factors that play a major role in maternal health care disparities.

Demographic Categories of Interest

  • Race / Ethnicity
  • Geography
  • Access to Care
  • Education Level
  • Income Level
  • Others

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Sreenath Chalil Madathil - Early Career Tenure-track Researcher, Binghamton University
  • Ashaar Rasheed - Graduate Trainee, Binghamton University

Ophthalmic Analgesia

We are hoping to explore patterns of use of analgesia for patients with ocular surface conditions, and associated outcomes. Ocular surface conditions can be extremely painful, but many methods of providing analgesia have systemic or local toxicities. Therefore, we aim…

Scientific Questions Being Studied

We are hoping to explore patterns of use of analgesia for patients with ocular surface conditions, and associated outcomes. Ocular surface conditions can be extremely painful, but many methods of providing analgesia have systemic or local toxicities. Therefore, we aim to characterize patterns in current use of these analgesic agents and determine associations with long-term outcomes.

Project Purpose(s)

  • Disease Focused Research (ocular surface conditions)
  • Population Health

Scientific Approaches

We will use EHR data in All of Us to evaluate patients with ocular surface conditions who did and did not receive analgesic agents. We will use propensity score overlap weighting methods to compare receipt of analgesia with no receipt through target trial emulation methodology.
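
Overlap weighting assigns each treated patient a weight of one minus their propensity score and each untreated patient a weight equal to their propensity score, which emphasizes patients who could plausibly have received either option. The sketch below assumes propensity scores have already been estimated (for example by logistic regression) and uses invented values; the function names are hypothetical.

```python
def overlap_weights(treated, propensity):
    """Overlap weights: 1 - e(x) for treated patients, e(x) for untreated."""
    return [1 - e if t == 1 else e for t, e in zip(treated, propensity)]


def weighted_mean_difference(outcomes, treated, propensity):
    """Difference in overlap-weighted mean outcome, treated minus untreated."""
    weights = overlap_weights(treated, propensity)

    def weighted_mean(group):
        num = sum(w * y for w, y, t in zip(weights, outcomes, treated) if t == group)
        den = sum(w for w, t in zip(weights, treated) if t == group)
        return num / den

    return weighted_mean(1) - weighted_mean(0)


# Invented values: 1 = received analgesia, outcome 1 = adverse event
treated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
outcomes = [1, 0, 1, 0]
print(overlap_weights(treated, propensity))
print(weighted_mean_difference(outcomes, treated, propensity))
```

Patients with propensity scores near 0 or 1 receive small weights, so extreme scores do not dominate the comparison as they can with inverse probability weighting.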

Anticipated Findings

We anticipate that this study will provide important information on the potential safety signals of analgesic medication use for treating ocular surface conditions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Jay Lusk - Research Fellow, Duke University

Tutorial 2

The purpose of this Workspace is to explore the All of Us data and eventually conduct an analysis related to Generalized Anxiety Disorder.

Scientific Questions Being Studied

The purpose of this Workspace is to explore the All of Us data and eventually conduct an analysis related to Generalized Anxiety Disorder.

Project Purpose(s)

  • Educational

Scientific Approaches

The scientific approach I plan to use includes exploring cohorts to find interesting study ideas, then building datasets and interpreting the results.

Anticipated Findings

The anticipated findings from the study include an understanding of the All of Us data and what we can do with RStudio.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

nlrisk

How do we best model disease risk across the phenome and across data modalities, including genotypes, biometrics, and a range of other assays? How well do these models work in different populations? How can we improve the transfer of models…

Scientific Questions Being Studied

How do we best model disease risk across the phenome and across data modalities, including genotypes, biometrics, and a range of other assays?
How well do these models work in different populations?
How can we improve the transfer of models between different populations?

Improved risk models can potentially be used to improve disease prevention, stratified testing or treatment, thus improving health outcomes and decreasing unnecessary interventions.
Not all populations are represented equally in large population cohorts and thus risk modelling performance varies across populations. With better models, we hope to increase how much we can learn about risk in underrepresented populations from overrepresented populations, thus making advances in risk modelling more widely available.

Project Purpose(s)

  • Methods Development
  • Control Set
  • Ancestry

Scientific Approaches

We develop and evaluate nonlinear risk models based on artificial neural networks, and compare them to a wide range of established methods. This work is performed in multiple population cohorts like All of Us and the UK-Biobank.
Neural networks are trained in PyTorch and compared to existing tools across a range of metrics capturing performance across a population, performance in high-risk subpopulations, and calibration. Additionally, we investigate how interpretable each method is, and how useful these interpretations can be.
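
Calibration asks whether predicted risks match observed event rates. One common summary is the expected calibration error (ECE), sketched below with equal-width probability bins. Choosing ECE here is an assumption, since the project does not say which calibration metric it uses; the function name and example values are hypothetical.

```python
def expected_calibration_error(y_true, probs, n_bins=10):
    """Weighted average gap between mean predicted risk and observed rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((y, p))

    total = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += (len(members) / total) * abs(observed - predicted)
    return ece


# Made-up predictions: each bin is off by 0.1, so the ECE is about 0.1
print(expected_calibration_error([0, 0, 1, 1], [0.1, 0.1, 0.9, 0.9]))
```

Computing the same quantity separately per population is one way to see whether a model stays calibrated after transfer.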

Anticipated Findings

We hypothesize that in high-dimensional settings, non-linear models like neural networks can outperform linear models. Furthermore, we hypothesize that these are more robust to population shifts when optimized correctly.
If this is the case, it would indicate that more focus should be put on the non-linear relations underlying human disease risk.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Sedra Abou Ghaloun - Student, Charité Universitätsmedizin Berlin

Genetics of Smoking

We want to test whether polygenic risk for nicotine dependence predicts risk for mental illness. This question is important because it may identify key causal pathways to mental illness.

Scientific Questions Being Studied

We want to test whether polygenic risk for nicotine dependence predicts risk for mental illness. This question is important because it may identify key causal pathways to mental illness.

Project Purpose(s)

  • Other Purpose (The purpose of this project is to estimate the influence of genetic risk for nicotine dependence on other illnesses.)

Scientific Approaches

We will use the controlled tier dataset 7 to create polygenic risk scores (PRS). PRS will then be used in regression models to predict mental health outcomes, adjusting for relevant demographic and behavioral characteristics. Analyses will be run using R statistical software.
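
At its core, a polygenic risk score is a weighted sum of a participant's allele dosages, with weights taken from an external association study. The sketch below shows only that sum; the variant identifiers, weights, and function name are hypothetical, and real pipelines also handle allele alignment, linkage disequilibrium, and ancestry.

```python
def polygenic_risk_score(dosages, weights):
    """Sum of effect-size weight times allele dosage over shared variants.

    dosages: {variant_id: copies of the effect allele (0 to 2)}
    weights: {variant_id: per-allele effect size from an external study}
    Variants missing from either mapping are skipped.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


# Hypothetical variants and weights for illustration only
weights = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}
participant = {"variant_a": 2, "variant_b": 1, "variant_c": 0, "variant_x": 1}
print(polygenic_risk_score(participant, weights))  # about 0.15
```

The resulting score per participant would then enter the regression models as a predictor alongside the demographic and behavioral covariates.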

Anticipated Findings

We anticipate that genetic risk for nicotine dependence will be associated with increased probability of participants reporting diagnosis of various mental health conditions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Jun Qian - Other, All of Us Program Operational Use
  • Henry Condon - Project Personnel, All of Us Program Operational Use
  • David Bond - Mid-career Tenured Researcher, Johns Hopkins University
  • Brion Maher - Late Career Tenured Researcher, Johns Hopkins University

Improving Black Maternal Outcomes through Predictive Analytics

The significant gap lies in the effect of Social Determinants of Health (SDoH) and the association between psychological and physiological stressors and racial health disparities on maternal outcomes in Black women. Our research will take a collaborative approach and…

Scientific Questions Being Studied

The significant gap lies in the effect of Social Determinants of Health (SDoH) and the association between psychological and physiological stressors and racial health disparities on maternal outcomes in Black women. Our research will take a collaborative approach and use predictive analysis with Artificial Intelligence and Machine Learning to improve maternal health outcomes in Black women.

Project Purpose(s)

  • Disease Focused Research (pregnancy complications)
  • Population Health

Scientific Approaches

By leveraging the comprehensive datasets from All of Us and Urchin, our research team can magnify the depth and breadth of the healthcare studies, leading to more informed and impactful outcomes. This integrated approach and the diversity of thought among the expert research team will ensure that the research is both data-driven and clinically relevant, enhancing advancements in healthcare and biological sciences through the application of advanced computational techniques, and hence contribute to the ultimate improvement of Black maternal health outcomes using AI and ML.

Anticipated Findings

Our proposed AIM-AHEAD Phase-II project is aligned with North Star I, II and IV by pursuing the goal of developing a diverse, equitable, and inclusive AI/ML workforce. By establishing the Health Equity AI Lab (HEAL) at Fayetteville State University, a Historically Black College and University (HBCU), we are creating opportunities for underrepresented minorities in AI/ML. The lab will not only focus on cutting-edge research but also on training and mentoring students from diverse backgrounds. This initiative will foster an inclusive environment where students and researchers can develop their skills and contribute to advancements in AI/ML, particularly in health equity.

Demographic Categories of Interest

  • Race / Ethnicity
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level
  • Others

Data Set Used

Registered Tier

Research Team

Owner:

  • Jiazheng Yuan - Mid-career Tenured Researcher, Fayetteville State University

Duplicate of Association between gynecological and autoimmune disease

We would like to perform association analysis for gynecological and autoimmune phenotypes in diverse populations to explore genetic architecture using the All of Us research dataset. We will integrate the All of Us data with a previously published GWAS meta-analysis which consists…

Scientific Questions Being Studied

We would like to perform association analysis for gynecological and autoimmune phenotypes in diverse populations to explore genetic architecture using the All of Us research dataset. We will integrate the All of Us data with a previously published GWAS meta-analysis, which consists of samples of European and East Asian ancestries. We will consider the following research questions: 1) Given that autoimmune diseases are female-predominant phenotypes, are there any shared loci between gynecological and autoimmune diseases? 2) By considering trans-ancestry and admixed population features, can we identify associations unique to non-European ancestry? 3) We will identify genetic effect differences between populations of different ancestries. 4) We will narrow down the list of causal variants using state-of-the-art methods. We will publish our results as an applied paper.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We would like to utilize regression models, including linear (mixed effects) regression for continuous traits and logistic mixed regression for binary outcomes, as implemented in REGENIE, on All of Us genetics data. We will utilize ICD codes to extract phenotypes. 1) Phenotype definition: retrieve both genetics data and phenotype data based on ICD codes and uniformly process the data. 2) GWAS analysis: for each trait, we would like to perform GWAS analysis. We will adjust for sex, age, and 10 principal components of genome-wide genotypes as covariates in the association analysis. 3) Meta-analysis and downstream analysis: we would also like to perform a meta-analysis with previously published GWAS data on gynecological and autoimmune diseases. We will also perform downstream analysis, including fine-mapping and genetic correlation analysis.
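
To illustrate the meta-analysis step, the sketch below combines per-study effect estimates for a single variant with a fixed-effect, inverse-variance weighted average. This is an assumption made for illustration, since the project does not state which meta-analysis method it will use; the function name and input values are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis for one variant.

    Returns the pooled effect, its standard error, and the z statistic.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se, pooled / pooled_se


# Invented per-study estimates for one variant
beta, se, z = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
print(beta, se, z)  # pooled effect about 0.15
```

Studies with smaller standard errors contribute more to the pooled estimate, which is why larger cohorts tend to dominate the combined result.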

Anticipated Findings

For this analysis, we would expect to identify shared genetic loci between autoimmune and gynecological phenotypes, especially when taking into consideration trans-ancestry structure and admixed population features. To be more precise, we would expect to find: 1) potential variants or genes that are associated with both gynecological and autoimmune diseases; 2) more accurate genetic effect estimation. 3) A detailed pipeline for performing similar analysis will be available to researchers within the Researcher Workbench for All of Us to enhance reproducibility. Our developed methods will benefit research of a similar kind.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Shaoyi Zhang - Graduate Trainee, Pennsylvania State University, College of Medicine
  • Jeniece Regan - Graduate Trainee, Pennsylvania State University, College of Medicine
  • Elizabeth Bond - Graduate Trainee, Pennsylvania State University, College of Medicine
  • Avantika Diwadkar - Graduate Trainee, Pennsylvania State University, College of Medicine

Association between gynecological and autoimmune disease

We would like to perform association analysis for gynecological and autoimmune phenotypes in diverse populations to explore genetic architecture using the All of Us research dataset. We will integrate the All of Us data with a previously published GWAS meta-analysis which consists…

Scientific Questions Being Studied

We would like to perform association analysis for gynecological and autoimmune phenotypes in diverse populations to explore genetic architecture using the All of Us research dataset. We will integrate the All of Us data with a previously published GWAS meta-analysis, which consists of samples of European and East Asian ancestries. We will consider the following research questions: 1) Given that autoimmune diseases are female-predominant phenotypes, are there any shared loci between gynecological and autoimmune diseases? 2) By considering trans-ancestry and admixed population features, can we identify associations unique to non-European ancestry? 3) We will identify genetic effect differences between populations of different ancestries. 4) We will narrow down the list of causal variants using state-of-the-art methods. We will publish our results as an applied paper.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We would like to use regression models, including linear (mixed effects) regression for continuous traits and logistic mixed regression for binary outcomes, as implemented in REGENIE, on All of Us genetics data. We will use ICD codes to extract phenotypes. 1) Phenotype definition: retrieve both genetics data and phenotype data based on ICD codes and process the data uniformly. 2) GWAS analysis: for each trait, we would like to perform GWAS analysis. We will adjust for sex, age, and 10 principal components of genome-wide genotypes as covariates in the association analysis. 3) Meta-analysis and downstream analysis: we would also like to perform a meta-analysis with previously published GWAS data on gynecological and autoimmune diseases. We will also perform downstream analyses, including fine-mapping and genetic correlation analysis.
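The meta-analysis step is not specified further in the description. As a minimal sketch, assuming a fixed-effect inverse-variance weighted combination of per-study effect estimates for a single variant (one common choice, not necessarily the team's method), the calculation could look like this. The function name and the input values are hypothetical and purely illustrative.

```python
import math


def inverse_variance_meta(betas, standard_errors):
    """Fixed-effect inverse-variance weighted meta-analysis for one variant.

    betas and standard_errors are per-study effect estimates and their
    standard errors (hypothetical inputs; not taken from any real study).
    Returns (pooled_beta, pooled_se, z_score, two_sided_p).
    """
    if not betas or len(betas) != len(standard_errors):
        raise ValueError("betas and standard_errors must be non-empty and equal in length")

    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se ** 2) for se in standard_errors]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se

    # Two-sided p-value under a standard normal null distribution.
    two_sided_p = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z_score, two_sided_p


# Illustrative call with made-up numbers for two cohorts.
pooled = inverse_variance_meta([0.10, 0.20], [0.05, 0.05])
```

A fixed-effect model assumes the same true effect in every cohort. When effects are expected to differ by ancestry, a random-effects or ancestry-aware model may be more appropriate.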

Anticipated Findings

For this analysis, we expect to identify shared genetic loci between autoimmune and gynecological phenotypes, especially when taking trans-ancestry structure and admixed population features into consideration. More precisely, we expect to produce: 1) potential variants or genes that are associated with both gynecological and autoimmune diseases; 2) more accurate genetic effect estimates; and 3) a detailed pipeline for performing similar analyses, which will be available to researchers within the All of Us Researcher Workbench to enhance reproducibility. Our developed methods will benefit research of a similar kind.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Shaoyi Zhang - Graduate Trainee, Pennsylvania State University, College of Medicine
  • Jeniece Regan - Graduate Trainee, Pennsylvania State University, College of Medicine
  • Elizabeth Bond - Graduate Trainee, Pennsylvania State University, College of Medicine
  • Avantika Diwadkar - Graduate Trainee, Pennsylvania State University, College of Medicine

Wearable Sleep Sensor Visualization Study

My research aims to investigate if wearable sleep sensor data can reveal actionable insights to deliberately improve the quality of sleep. This question is significant because improving sleep quality has profound implications for overall health and well-being. By analyzing large…

Scientific Questions Being Studied

My research aims to investigate if wearable sleep sensor data can reveal actionable insights to deliberately improve the quality of sleep. This question is significant because improving sleep quality has profound implications for overall health and well-being. By analyzing large datasets from wearable devices, I hope to identify patterns and behaviors that correlate with better sleep quality, providing practical recommendations for individuals and healthcare providers.

Project Purpose(s)

  • Educational

Scientific Approaches

To address my research question, I will analyze data from wearable sleep sensors, focusing on variables such as sleep duration, sleep stages, heart rate, and restlessness. I will use statistical analysis methods, including correlation analysis and regression models, to identify relationships between these variables and sleep quality. Additionally, machine learning techniques will be employed to uncover complex patterns and predictive models. Tools like Python and R will be utilized for data cleaning, visualization, and analysis, enabling a comprehensive exploration of the dataset.

Anticipated Findings

I anticipate discovering specific behaviors and conditions that are associated with improved sleep quality, such as optimal bedtime routines, physical activity levels, and stress management techniques. These findings could significantly contribute to the scientific understanding of sleep and provide evidence-based recommendations for improving sleep quality. By leveraging large datasets from wearable sensors, this research could lead to personalized sleep interventions, enhancing public health and individual well-being.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

zipcode data maryland visually impaired

What are the disparities affecting the visually impaired community in Prince George's County versus other counties in Maryland? Does race play a role? What is the level of access?

Scientific Questions Being Studied

What are the disparities affecting the visually impaired community in Prince George's County versus other counties in Maryland? Does race play a role? What is the level of access?

Project Purpose(s)

  • Educational

Scientific Approaches

I plan to use descriptive statistics and, if there are enough observations, possibly a logistic regression analysis, along with a t-test to measure statistical significance.

Anticipated Findings

Variation in access to and knowledge of resources in the blind community across the state. Lower-income and younger individuals who are visually impaired tend to have less access to resources.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Disability Status
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

gene_prev

In this project, we will evaluate the prevalence of genetic variants associated with cardiovascular disease in the US population. We will adjust the prevalence rates based on demographic data of age, sex, race, and ethnicity.

Scientific Questions Being Studied

In this project, we will evaluate the prevalence of genetic variants associated with cardiovascular disease in the US population. We will adjust the prevalence rates based on demographic data of age, sex, race, and ethnicity.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We will obtain the proportional breakdown of demographic subgroups of age, sex, and gender from the US census data. These proportions will be used to calculate the adjusted prevalence of each genetic variant. Standard statistical methods of normalization and adjustment will be used for this purpose.
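The adjustment described above resembles direct standardization, in which stratum-specific prevalences are reweighted by each stratum's share of a reference population. The sketch below assumes that approach. The function name, stratum labels, and all numbers are hypothetical placeholders, not census figures or study results.

```python
def standardized_prevalence(stratum_prevalence, reference_share):
    """Directly standardize a prevalence to a reference population.

    stratum_prevalence: mapping of stratum label -> carrier prevalence
        observed in the cohort for that stratum.
    reference_share: mapping of stratum label -> share of the reference
        population (for example, census proportions) in that stratum.
    """
    if set(stratum_prevalence) != set(reference_share):
        raise ValueError("both mappings must cover the same strata")

    total_share = sum(reference_share.values())
    if total_share <= 0:
        raise ValueError("reference shares must sum to a positive value")

    # Weighted average of stratum prevalences; dividing by the total share
    # lets the caller pass raw counts instead of normalized proportions.
    weighted = sum(stratum_prevalence[s] * reference_share[s] for s in stratum_prevalence)
    return weighted / total_share


# Illustrative placeholder values only.
cohort_prevalence = {"18-44": 0.010, "45-64": 0.020, "65+": 0.030}
census_share = {"18-44": 0.5, "45-64": 0.3, "65+": 0.2}
adjusted = standardized_prevalence(cohort_prevalence, census_share)
```

Strata can be defined jointly (for example, age group by sex) by using tuple keys in both mappings.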

Anticipated Findings

We expect our findings to shed light on the differential proportion of pathogenic genetic variants across demographic subgroups. These findings will inform future decisions for resource allocation to screen demographic subgroups with the highest burden of the disease to identify carriers of these variants at an earlier stage.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth

Data Set Used

Controlled Tier

Research Team

Owner:

Fitbit research

I intend to study wearables data and find a correlation between step count/heart rate and underlying medical conditions. The idea is to create a machine-learning model that can study Fitbit data and allow individuals to self-diagnose underlying medical conditions. This…

Scientific Questions Being Studied

I intend to study wearables data and find a correlation between step count/heart rate and underlying medical conditions. The idea is to create a machine-learning model that can study Fitbit data and allow individuals to self-diagnose underlying medical conditions. This research will further broaden the use of wearable data to help predict certain diseases.

Project Purpose(s)

  • Population Health
  • Methods Development
  • Control Set

Scientific Approaches

I will use the AoU wearables dataset and other public wearables datasets to train a machine-learning model using Python, pandas, and scikit-learn libraries.

Anticipated Findings

In this study, I anticipate finding that certain underlying diseases are more predictable from step count and heart rate data than others, and identifying the challenges of creating a reasonably accurate model.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Erik Windgassen - Undergraduate Student, University of California, Riverside

Mutations in FBN1 and Glaucoma

What is the prevalence of Marfan Syndrome (or FBN1 mutations) in patients with glaucoma? We are trying to explore the causes of open-angle glaucoma, primarily in Marfan Syndrome. We will explore the clinical manifestations in Marfan Syndrome and subtypes of…

Scientific Questions Being Studied

What is the prevalence of Marfan Syndrome (or FBN1 mutations) in patients with glaucoma?

We are trying to explore the causes of open-angle glaucoma, primarily in Marfan Syndrome. We will explore the clinical manifestations in Marfan Syndrome and subtypes of glaucoma and correlate them with mutations in FBN1.

Project Purpose(s)

  • Disease Focused Research (Marfan Syndrome; mutations in FBN1; glaucoma)
  • Ancestry

Scientific Approaches

Datasets include patients with FBN1 mutations or a confirmed diagnosis of Marfan Syndrome and a confirmed diagnosis of glaucoma. We will assess the subtypes of clinical manifestations of Marfan Syndrome and the subtypes of glaucoma. We will also correlate racial background with the data.

Anticipated Findings

We expect to find an excess of patients with Marfan Syndrome in the glaucoma population. We also expect an excess of patients with glaucoma in the Marfan Syndrome population. These questions have not been analyzed since completion of the Human Genome Project.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

1_Davis_tutorials

This workspace will be used to create tutorials for students trained in the Davis lab.

Scientific Questions Being Studied

This workspace will be used to create tutorials for students trained in the Davis lab.

Project Purpose(s)

  • Educational

Scientific Approaches

We will use datasets derived from whole genome sequence and electronic health record data. Tutorials will include extracting individuals based on phenotype, case control matching algorithms, genetic data cleaning, genome wide association studies, admixture mapping analysis and more.

Anticipated Findings

This workspace is primarily a training ground for students in the Davis lab.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Mary Davis - Early Career Tenure-track Researcher, Brigham Young University
  • Steven Brugger - Graduate Trainee, Brigham Young University

Collaborators:

  • Kylee Bates - Undergraduate Student, Brigham Young University
  • Hannah Snarr - Undergraduate Student, Brigham Young University
  • Breckin Forstrom - Undergraduate Student, Brigham Young University
  • Alyks Odell - Undergraduate Student, Brigham Young University

Multiple sclerosis v7

We will analyze real-world data of individuals with and without multiple sclerosis (MS) to identify risk factors of disease and better predict who will respond positively to different types of MS treatments.

Scientific Questions Being Studied

We will analyze real-world data of individuals with and without multiple sclerosis (MS) to identify risk factors of disease and better predict who will respond positively to different types of MS treatments.

Project Purpose(s)

  • Disease Focused Research (multiple sclerosis)
  • Population Health
  • Social / Behavioral
  • Ancestry

Scientific Approaches

We will use insurance billing codes, medications, and free text to identify individuals with and without multiple sclerosis (MS). We will perform statistical genetic analyses to better understand the genetic variations that contribute to development of MS across diverse ancestry. We will analyze other symptoms and variables in the dataset to find earlier in life events that are associated with later MS development. We will use natural language processing and structured data to identify individuals with MS who are on MS treatments and use available data to identify indicators of which medications are effective and tolerable for patients.

Anticipated Findings

We anticipate the results of these studies will contribute to understanding how MS develops in individuals, and what earlier in life events may be predictors of who will develop MS, hopefully shortening the time to diagnosis. We hope to identify genetic or environmental predictors of which treatments are effective and tolerable so a patient in the future can be placed on the best medication at the onset of disease course.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Tam Tran - Other, National Human Genome Research Institute (NIH - NHGRI)
  • Mary Davis - Early Career Tenure-track Researcher, Brigham Young University
  • Hannah Snarr - Undergraduate Student, Brigham Young University
  • Steven Brugger - Graduate Trainee, Brigham Young University

Collaborators:

  • Lily Robison - Undergraduate Student, Brigham Young University
  • Kylee Bates - Undergraduate Student, Brigham Young University
  • Kayden Hadlock - Undergraduate Student, Brigham Young University
  • Alyks Odell - Undergraduate Student, Brigham Young University

Subtyping complex disease using EHR (CONTROLLED TIER)

Unmodeled heterogeneity of the biological underpinnings of phenotypes reduces statistical power and reproducibility in downstream analyses such as genome-wide association studies. While some complex diseases have well-established clinically-relevant subtypes that respond differently to different treatments (e.g., diabetes and asthma), heterogeneity…

Scientific Questions Being Studied

Unmodeled heterogeneity of the biological underpinnings of phenotypes reduces statistical power and reproducibility in downstream analyses such as genome-wide association studies. While some complex diseases have well-established clinically-relevant subtypes that respond differently to different treatments (e.g., diabetes and asthma), heterogeneity is suspected but has not so far been demonstrated for many other phenotypes. Existing data-driven subtyping methods primarily rely on data clustering methods that are not guaranteed to capture genetically- or clinically-relevant stratification of patients, thus, impeding our ability to take an important step towards personalized medicine by defining treatment at the subphenotypic level. The goal of this proposal is to enable a robust identification of clinically-relevant subphenotypic variation given no prior knowledge.

Project Purpose(s)

  • Methods Development

Scientific Approaches

This study’s aim is the development of a contrastive learning method to define de novo phenotypic subtypes of complex disease based on electronic health records, and the application of the method to analyze the heterogeneity in clinical trajectories of conditions for which subphenotypic heterogeneity will be identified.

Anticipated Findings

A successful implementation of this study will provide a general and robust approach for learning phenotypic subtypes, which can then be applied to many phenotypes within and outside the All of Us database.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Arush Ramteke - Undergraduate Student, University of California, Los Angeles

Duplicate of [V7] Buu

Polygenic risk score (PRS) is an emerging tool to evaluate the risk of complex diseases and traits using aggregated effects from millions of variants. We would like to compare the prediction accuracy of PRS between biobanks (e.g., UK Biobank). We…

Scientific Questions Being Studied

Polygenic risk score (PRS) is an emerging tool to evaluate the risk of complex diseases and traits using aggregated effects from millions of variants. We would like to compare the prediction accuracy of PRS between biobanks (e.g., UK Biobank). We hypothesize that the same ancestries in different biobanks may have different origins and exposure to different environments. We will then evaluate whether the variation might come from differences in the genetic architectures of individuals and investigate the environmental effects on disease risk for individuals. Finally, we would like to harmonize data and propose appropriate approaches and models to improve PRS prediction accuracy.

Project Purpose(s)

  • Population Health
  • Methods Development
  • Control Set
  • Ancestry

Scientific Approaches

We employ different PRS methods, including PT and PRS-CS for single populations and PRS-CSx in the cross-ancestry context. We next compare PRS performance between biobanks using R2 or Nagelkerke R2. We then estimate the heritability of each trait in each biobank with the LD Score regression approach. Finally, we use GxE interaction analysis to explore the interaction between genetic variants and environment on disease risk.
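For binary traits, Nagelkerke R2 rescales the Cox-Snell pseudo-R2 so that its maximum is 1. The sketch below computes it from observed outcomes and fitted probabilities, assuming an intercept-only null model. The function names and inputs are hypothetical, and in practice PRS accuracy is often reported as the increase in R2 over a covariates-only model rather than over the intercept-only model used here.

```python
import math


def bernoulli_log_likelihood(outcomes, probabilities):
    """Log-likelihood of 0/1 outcomes given predicted probabilities in (0, 1)."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(outcomes, probabilities)
    )


def nagelkerke_r2(outcomes, fitted_probabilities):
    """Nagelkerke pseudo-R2 relative to an intercept-only null model.

    outcomes must contain both cases (1) and controls (0), and every fitted
    probability must lie strictly between 0 and 1.
    """
    n = len(outcomes)
    if n == 0 or n != len(fitted_probabilities):
        raise ValueError("inputs must be non-empty and equal in length")

    # The intercept-only model predicts the sample prevalence for everyone.
    prevalence = sum(outcomes) / n
    ll_null = bernoulli_log_likelihood(outcomes, [prevalence] * n)
    ll_full = bernoulli_log_likelihood(outcomes, fitted_probabilities)

    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_full) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Illustrative toy inputs, not real data.
r2 = nagelkerke_r2([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```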

Anticipated Findings

We look forward to observing differences in PRS accuracy across different datasets. The next expected result is to identify the differences in genetic architecture and environmental effects that contribute to those differences in accuracy. Finally, we hope to leverage all of these differences to improve the accuracy of predictions that include PRS and environmental effects.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

SDoH Subtyper

Exploring 19 SDoH variables defined on prior Dementia Workspace with relation to delayed care and inability to afford care within the realm of hypertension, diabetes, and osteoarthritis.

Scientific Questions Being Studied

Exploring 19 SDoH variables defined on prior Dementia Workspace with relation to delayed care and inability to afford care within the realm of hypertension, diabetes, and osteoarthritis.

Project Purpose(s)

  • Disease Focused Research (type 2 diabetes mellitus, osteoarthritis, and hypertension)
  • Educational
  • Methods Development
  • Ancestry

Scientific Approaches

Descriptive analyses of the cohort and subsequent demographics will be done with Seaborn and R. Bi-clustering will be done with ExplodeLayout and Bipartite Modularity.

Anticipated Findings

Certain subtypes of these disease groups may have more SDoH variables answered that may help with future interventions. Developing a generalizable method to analyze AoU data is also important.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Weibin Zhang - Project Personnel, University of Texas Medical Branch (UTMB) at Galveston

Collaborators:

  • yury garcia - Research Fellow, University of California, Davis
  • Daniel Bao - Graduate Trainee, University of Texas Medical Branch (UTMB) at Galveston
  • Alex Bokov - Other, University of Texas Health Science Center, San Antonio

APOL1 PheWAS

Research questions: 1) What disease associations with APOL1 can be replicated using the AllofUs dataset? 2) What novel disease associations, if any, with APOL1 can be identified in a diverse cohort? Relevance: Genomic variation in APOL1 is associated with kidney…

Scientific Questions Being Studied

Research questions: 1) What disease associations with APOL1 can be replicated using the AllofUs dataset? 2) What novel disease associations, if any, with APOL1 can be identified in a diverse cohort?

Relevance: Genomic variation in APOL1 is associated with kidney disease, but other disease associations are not well investigated. Thus, we seek to do the first PheWAS on extensively typed RBC antigens and to do so in a diverse cohort.

Project Purpose(s)

  • Disease Focused Research (renal disease)

Scientific Approaches

We plan to employ a phenome-wide association study (PheWAS) approach to identify associations between APOL1 variation and other clinical phenotypes. PheWAS will be carried out using multivariable linear and logistic regression with APOL1 haplotypes. For example, APOL1 G1 and G2 alleles will act as the independent variable, and phenotypes derived from participant provided information (PPI) and electronic health records (EHR) will act as the dependent variable. Initial models will include adjustments for age, gender, and race/ethnicity. Differential associations by race/ethnicity, gender, and sex will also be evaluated.

Anticipated Findings

We expect to replicate known APOL1-disease associations as well as identify any novel ones within a diverse cohort.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

kristin_med_explore

We are looking into associations between medications and various neurodegenerative diseases, including Alzheimer's disease, Parkinson's disease, MS, vascular dementia, etc.

Scientific Questions Being Studied

We are looking into associations between medications and various neurodegenerative diseases, including Alzheimer's disease, Parkinson's disease, MS, vascular dementia, etc.

Project Purpose(s)

  • Disease Focused Research (Alzheimer's disease and other NDDs)
  • Population Health

Scientific Approaches

We will use EHR data and medication data to examine these associations. We will run a Cox regression and right censor the data to only include medication exposures that occur BEFORE an NDD diagnosis.
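One way to prepare data for such a model is to build one row per participant with a follow-up duration, an event flag, and an exposure flag that only counts medication starts before the diagnosis (or before the end of follow-up for censored participants). The sketch below is an assumption about how that preparation might look, not the team's pipeline. The function name, argument names, and dates are hypothetical.

```python
from datetime import date


def build_survival_row(first_exposure, diagnosis, index_date, end_of_followup):
    """Build (exposed, duration_days, event_observed) for one participant.

    first_exposure: date of first medication exposure, or None if never exposed.
    diagnosis: date of NDD diagnosis, or None if no diagnosis was recorded.
    index_date: start of follow-up for this participant.
    end_of_followup: administrative end of follow-up (right-censoring date).
    """
    # The event counts only if it occurs within the follow-up window.
    event_observed = diagnosis is not None and diagnosis <= end_of_followup

    # Follow-up stops at diagnosis, otherwise at the censoring date.
    stop = diagnosis if event_observed else end_of_followup

    # Exposures on or after the stop date are ignored, so a medication started
    # after diagnosis is never treated as a pre-diagnosis exposure.
    exposed = first_exposure is not None and first_exposure < stop

    duration_days = (stop - index_date).days
    return exposed, duration_days, event_observed


# Illustrative call with made-up dates.
row = build_survival_row(
    first_exposure=date(2015, 1, 1),
    diagnosis=date(2018, 1, 1),
    index_date=date(2010, 1, 1),
    end_of_followup=date(2020, 1, 1),
)
```

Rows in this shape can be passed to a Cox regression routine in a survival analysis library. A time-varying exposure model would need a different layout, with separate intervals before and after the first exposure.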

Anticipated Findings

We are looking to replicate similar findings from UKB and provide more information about the benefits/risks to the use of certain drugs.

Demographic Categories of Interest

  • Age

Data Set Used

Registered Tier

Research Team

Owner:

  • Kristin Levine - Project Personnel, National Institute on Aging (NIH - NIA)

Collaborators:

  • Vanessa Pitz - Research Fellow, National Institute on Aging (NIH - NIA)
  • Emma Somerville - Graduate Trainee, National Institute on Aging (NIH - NIA)
Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.