Research Projects Directory

18,260 active projects

This information was updated 5/23/2025

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

[All by All] Cardiovascular-Kidney-Metabolic (CKM) Syndrome

Scientific Questions Being Studied

Primary Research Question: What is the relationship between social determinants of health (income, education, insurance status, food insecurity, and neighborhood deprivation) and the development of advanced stage (III and IV) Cardiovascular-Kidney-Metabolic (CKM) syndrome? Secondary Research Questions: How do individual-level SDOH factors (income, education, insurance status, food insecurity) interact with community-level factors (neighborhood deprivation) in influencing advanced stage CKM syndrome risk? Are there differential associations between SDOH factors and specific components of advanced stage CKM syndrome (cardiovascular disease, kidney disease, metabolic conditions)? Do these relationships vary across different demographic subgroups (e.g., race/ethnicity, sex)?

Project Purpose(s)

  • Disease Focused Research (Cardiovascular-Kidney-Metabolic (CKM) Syndrome)

Scientific Approaches

Dataset: All of Us Research Program dataset, Controlled Tier version 8, which includes:
  • Individual-level demographic and clinical data
  • EHR data for medical conditions, diagnoses, and medication use
  • Survey responses for SDOH factors
  • ZIP code-derived neighborhood deprivation indices

Research Methods:

Cohort Development:
  • Inclusion criteria based on data completeness and follow-up time
  • Exclusion of participants with missing key variables (EHR, labs, measurements)
  • Creation of composite SDOH measures

Statistical Analysis:
  • Descriptive statistics of SDOH factors and CKM outcomes
  • Time-to-event analysis using Cox proportional hazards models
  • Interaction analyses between individual and community-level SDOH
  • Stratified analyses by demographic subgroups
  • Sensitivity analyses to test robustness of findings

Tools: R software with specialized packages:
  • survival/survminer for time-to-event analysis
  • finalfit for regression modeling
  • tidyverse for data manipulation
  • bigrquery for database queries
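As an illustration of the composite SDOH measures and individual-by-community interaction analyses listed above, the sketch below counts binary disadvantage indicators and forms a product term of the kind that could enter a Cox model as a covariate. It is written in Python for illustration only (the project itself uses R), and the field names and indicator definitions are hypothetical; the project does not state how its composites are built.

```python
from dataclasses import dataclass

# Hypothetical indicator names; real definitions would come from survey items
# and ZIP code-derived deprivation indices.
INDIVIDUAL_FIELDS = ("low_income", "low_education", "uninsured", "food_insecure")


@dataclass
class SDOHProfile:
    low_income: bool
    low_education: bool
    uninsured: bool
    food_insecure: bool
    high_deprivation: bool  # community level, e.g. top quartile of an index (assumption)


def individual_burden(p: SDOHProfile) -> int:
    """Number of individual-level disadvantages present (0 to 4)."""
    return sum(int(getattr(p, name)) for name in INDIVIDUAL_FIELDS)


def cumulative_burden(p: SDOHProfile) -> int:
    """Individual-level count plus the community-level indicator (0 to 5)."""
    return individual_burden(p) + int(p.high_deprivation)


def interaction_term(p: SDOHProfile) -> int:
    """Product of individual burden and community deprivation, usable as a
    covariate when testing a multiplicative individual-by-community interaction."""
    return individual_burden(p) * int(p.high_deprivation)


if __name__ == "__main__":
    person = SDOHProfile(True, False, True, False, True)
    print(individual_burden(person), cumulative_burden(person), interaction_term(person))
```

A simple count treats every disadvantage as equally weighted; weighted or index-based composites are equally plausible choices.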

Anticipated Findings

Anticipated Findings:
1. Differential associations between SDOH factors and CKM syndrome:
   - Stronger associations for combined social disadvantages (e.g., low income + high neighborhood deprivation)
2. Synergistic effects:
   - Multiplicative/additive interactions between individual and community-level SDOH
   - Potential threshold effects at certain levels of social disadvantage
3. Subgroup variations:
   - Different patterns of association across racial/ethnic groups
   - Sex-specific differences in SDOH-CKM relationships

Scientific Contributions:
1. Novel insights into how multiple SDOH factors interact to influence CKM risk using a large, diverse cohort
2. Methodological advancement in measuring cumulative SDOH burden and its health impacts
3. Evidence base for targeted interventions addressing both individual and community-level factors
4. Framework for studying SDOH impacts on complex, multi-system health conditions
5. Support for integrated approaches to addressing social factors in clinics

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Yong Eun - Research Associate, New York City Health & Hospitals

Collaborators:

  • Yoonhyuk Jang - Early Career Tenure-track Researcher, Seoul National University Hospital
  • Suhwan Bong - Graduate Trainee, Harvard Faculty of Arts and Sciences
  • Hyunyong Koh - Research Fellow, Baylor College of Medicine

Patient Perception of Care in the Treatment of Erectile Dysfunction

Scientific Questions Being Studied

We intend to use the All-of-Us direct-to-patient surveys to investigate the relationship between experiences of care and outcomes in patients with a history of erectile dysfunction. Specifically, we hope to identify and better understand the differences in how men perceive their quality of care within the context of sexual health. In doing so, we hope to better gauge patients' experiences with feelings of discrimination in the care of erectile dysfunction, a health condition that often carries with it feelings of shame and emotional baggage.

Project Purpose(s)

  • Social / Behavioral

Scientific Approaches

We intend to analyze responses to the NIH All-of-Us Social Determinants of Health (SDOH) survey from patients with a history of erectile dysfunction. Scores for each response will be analyzed using various statistical tests to determine whether experiences of discrimination differ among groups of participants, such as different ethnic, racial, and geographic groups.

Anticipated Findings

We anticipate finding significant racial/ethnic differences in reported feelings of discrimination among patients with erectile dysfunction. If such a relationship exists, our findings will help elucidate one contributor to the disproportionate burden of adverse health effects borne by minority racial groups in the United States.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Aidan Boyne - Graduate Trainee, Baylor College of Medicine

Cervical cancer screening and Diagnosis among WLHIV

Scientific Questions Being Studied

Women living with HIV are at increased risk of HPV infection, which can lead to cervical cancer. Living with HIV is associated with low cervical cancer screening among the US population. However, few studies have examined trends in cervical cancer incidence and screening rates among women living with HIV.
This study aims to identify trends in cervical cancer screening, cervical cancer incidence, and stage at diagnosis following the 2018 USPSTF and 2020 ACS screening guidelines. The rate of cervical cancer will be compared between WLHIV with and without comorbidities. The association between age at HIV diagnosis and cervical cancer screening will be examined, and the associations of ART adherence and viral load suppression with cervical cancer incidence will be explored. This study will provide evidence for integrating cervical cancer prevention into HIV care and treatment, and for targeting the most at-risk people for early identification.

Project Purpose(s)

  • Population Health

Scientific Approaches

This will be a retrospective cohort study of women living with HIV in the US, using the All of Us data sets. Data from 2018 to 2024 will be analysed using R. The cohort will include women aged 21 and above with a confirmed HIV diagnosis. Outcome variables of interest are cervical cancer screening, cervical cancer diagnosis, and stage at diagnosis; predictors are age at HIV diagnosis, ART adherence, viral load suppression, and comorbidity status. Descriptive statistics will be used to analyze trends in cervical cancer screening, cervical cancer incidence, and stage at diagnosis, presented using frequencies, line graphs, and bar charts. The association between age at HIV diagnosis and cervical cancer screening will be explored using bivariate analysis (chi-square test or t-test). Multivariable logistic regression will be used to estimate the associations of comorbidities, ART adherence, and viral load suppression with cervical cancer diagnosis.
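For the bivariate chi-square comparison described above, a 2x2 Pearson test can be computed directly. This is a minimal sketch that assumes age at HIV diagnosis has been dichotomized into two groups (a hypothetical simplification) and uses made-up counts. It omits the Yates continuity correction, which R's `chisq.test` applies to 2x2 tables by default, so results will differ slightly from that default.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table

                     screened   not screened
        group 1         a            b
        group 2         c            d

    Returns (statistic, p_value) on 1 degree of freedom.
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column total must be positive")
    n = row1 + row2
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical counts: diagnosed before vs. after a chosen age cut point.
    stat, p = chi_square_2x2(30, 20, 20, 30)
    print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```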

Anticipated Findings

I anticipate that screening rates will have increased after 2018 but will remain suboptimal compared with national targets, with disparities in screening rates and cervical cancer incidence across age groups and comorbidity status. I hypothesize that WLHIV with comorbidities will have higher cervical cancer incidence and a greater likelihood of late-stage diagnosis, possibly because care for other chronic conditions is prioritized over preventive screening. I hypothesize that women diagnosed with HIV at a younger age will have higher screening uptake, probably due to earlier engagement in care, while women diagnosed later in life may be less likely to undergo regular screening. This study will provide up-to-date evidence on how cervical cancer screening has evolved following the recent recommendations. It will also provide evidence of how comorbidities influence cancer risk and preventive care among WLHIV, supporting the integration of cervical cancer prevention into routine HIV care and treatment.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Edith Utaka - Graduate Trainee, University of South Carolina

Collaborators:

  • Huiyi Xia - Research Associate, University of South Carolina

Duplicate of PheWAS of endurance-associated variants

Scientific Questions Being Studied

The purpose of this workspace is to use the All by All tables (v7) for phenome-wide association results. We examine two variants based on results from a case-control genome-wide association study with endurance athletes. (contact cwshanks@ucsc.edu)

Project Purpose(s)

  • Control Set
  • Ancestry

Scientific Approaches

We will leverage the All by All v7 tables to conduct a phenome-wide association study (PheWAS) that identifies associations between the endurance athlete-associated variants and a broad range of phenotypes.
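Because a PheWAS tests one variant against many phenotypes, some multiple-testing correction is needed. One simple option is a Bonferroni threshold, alpha divided by the number of phenotypes tested. The sketch below applies it to a dictionary of hypothetical phenotype p-values; it is not necessarily the correction this project uses.

```python
def bonferroni_hits(p_values: dict[str, float], alpha: float = 0.05) -> tuple[float, list[str]]:
    """Return the Bonferroni threshold and the phenotypes below it, most significant first."""
    if not p_values:
        raise ValueError("no tests supplied")
    threshold = alpha / len(p_values)
    hits = sorted((p, phenotype) for phenotype, p in p_values.items() if p < threshold)
    return threshold, [phenotype for _, phenotype in hits]


if __name__ == "__main__":
    # Hypothetical PheWAS results for one variant.
    results = {
        "phenotype_A": 1e-6,
        "phenotype_B": 0.01,
        "phenotype_C": 0.02,
        "phenotype_D": 1e-3,
    }
    threshold, hits = bonferroni_hits(results)
    print(threshold, hits)
```

Bonferroni is conservative when phenotypes are correlated, as many are in a PheWAS; false discovery rate control is a common less conservative alternative.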

Anticipated Findings

Since the variants of interest are putatively associated with endurance athletes, we anticipate that these variants will also show significant associations with other phenotypes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Investigation of genetic variation underlying real-world weight loss

Scientific Questions Being Studied

We will analyze genetic variation associated with real-world weight loss and response to anti-obesity medication by leveraging EHR and genetic data provided in All of Us. We will ask the following specific questions:

1. What is the distribution of weight change experienced by individuals after anti-obesity medication prescription?
2. What genetic variants are associated with weight loss versus weight gain among individuals with overweight and obesity, regardless of intervention?
3. What genetic variants are associated with weight change in response to weight loss medications?
4. Can polygenic models be used to predict response to anti-obesity medications?

We hypothesize that real-world weight change after anti-obesity medication will vary substantially between individuals, and that part of this variation in response can be explained by genetic predisposition.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We will use All of Us medication data to identify patients with BMI > 27 who have been prescribed anti-obesity medications. We will fit regression models to test the association of individual genetic variants with weight change using a GWAS approach, adjusting for potential confounding variables (e.g., age, race, ethnicity, comorbidities), and we will apply appropriate corrections for multiple comparisons. The patient cohort will then be randomly partitioned into training and test data sets. We will use summary statistics to describe these patients and use GWAS results to identify variants associated with medication response in the training set. A variety of polygenic models (e.g., SCT-PRS) will be applied to the training set with cross-validation to develop genetic models of response. Accuracy metrics (e.g., sensitivity and specificity) will be evaluated during cross-validation and finally on the test set.
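The weight-change outcome and the accuracy metrics named above can be sketched as follows. The 5% weight-loss cut point used to label a "responder" is an assumption for illustration (the project does not state its definition), and all function names are hypothetical.

```python
def percent_weight_change(baseline_kg: float, follow_up_kg: float) -> float:
    """Percent change from baseline; negative values mean weight loss."""
    if baseline_kg <= 0:
        raise ValueError("baseline weight must be positive")
    return 100.0 * (follow_up_kg - baseline_kg) / baseline_kg


def is_responder(pct_change: float, loss_threshold: float = 5.0) -> bool:
    """Hypothetical responder definition: at least `loss_threshold` percent weight loss."""
    return pct_change <= -loss_threshold


def sensitivity_specificity(y_true: list[bool], y_pred: list[bool]) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    change = percent_weight_change(100.0, 94.0)
    print(change, is_responder(change))
    # Hypothetical observed vs. predicted responder labels on a test set.
    print(sensitivity_specificity([True, True, True, False], [True, True, False, False]))
```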

Anticipated Findings

We anticipate greater variability in response to anti-obesity treatment in the real-world cohort than was observed in the clinical trials of these medications. This information is valuable for setting expectations in clinical settings. We will also investigate the genetic contribution to weight change on GLP-1 medications, which is not currently well understood.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

LoFGenes

Scientific Questions Being Studied

I am currently exploring the All of Us dataset to investigate how loss-of-function (LoF) variants in genes that are intolerant to functional disruption may contribute to rare or undiagnosed genetic conditions. Specifically, I aim to identify LoF variants—such as nonsense, frameshift, and splice-site mutations—in genes with low LOEUF or high pLI scores, where loss of function is a known mechanism of disease. This question is important because such variants are more likely to have functional and clinical consequences, and identifying them can improve our understanding of gene-disease relationships. It also has potential public health relevance, as prioritizing high-impact variants can support more accurate diagnoses, inform clinical decision-making, and ultimately advance the goals of precision medicine, particularly in genetically diverse and historically underrepresented populations.

Project Purpose(s)

  • Educational
  • Ancestry

Scientific Approaches

To address my research question, I will integrate variant-level and gene-level data from the All of Us Researcher Workbench. Specifically, I will use the Variant Annotation Table (VAT) to identify predicted loss-of-function (LoF) variants, such as nonsense, frameshift, splice-site, and start/stop codon changes. These variants will be cross-referenced with gene constraint metrics—namely pLI and LOEUF scores—from external resources to prioritize genes that are intolerant to LoF. My analytical approach will involve filtering and annotating variants using Python and Hail within the JupyterLab environment, leveraging tools like pandas for data manipulation and Hail’s built-in functions for genetic variant handling. I will also examine population allele frequencies and carrier distributions to better understand variant impact in diverse ancestral groups.
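The filtering step described above can be sketched in plain Python on simple records. The Sequence Ontology consequence terms are standard, but the record layout and gene names are hypothetical rather than the actual VAT schema, and the cut points (LOEUF < 0.35 or pLI >= 0.9) are assumptions; recommended LOEUF thresholds differ between gnomAD releases. In the workspace, the same logic would be expressed as Hail or pandas filters.

```python
# Sequence Ontology terms for predicted loss-of-function consequences,
# covering nonsense, frameshift, splice-site, and start/stop codon changes.
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
    "stop_lost",
}


def is_lof_intolerant(gene: str, constraint: dict, loeuf_max: float = 0.35, pli_min: float = 0.9) -> bool:
    """True if the gene passes either constraint cut point (assumed thresholds)."""
    metrics = constraint.get(gene)
    if metrics is None:
        return False  # genes without constraint data are not prioritized
    loeuf = metrics.get("loeuf")
    pli = metrics.get("pli")
    return (loeuf is not None and loeuf < loeuf_max) or (pli is not None and pli >= pli_min)


def prioritize_lof(variants: list[dict], constraint: dict) -> list[dict]:
    """Keep variants with an LoF consequence in an LoF-intolerant gene."""
    return [
        v for v in variants
        if v["consequence"] in LOF_CONSEQUENCES and is_lof_intolerant(v["gene"], constraint)
    ]


if __name__ == "__main__":
    # Hypothetical constraint metrics and variant records.
    constraint = {"GENE_A": {"loeuf": 0.2, "pli": 0.99}, "GENE_B": {"loeuf": 1.1, "pli": 0.0}}
    variants = [
        {"id": "v1", "gene": "GENE_A", "consequence": "stop_gained"},
        {"id": "v2", "gene": "GENE_A", "consequence": "missense_variant"},
        {"id": "v3", "gene": "GENE_B", "consequence": "frameshift_variant"},
        {"id": "v4", "gene": "GENE_C", "consequence": "splice_donor_variant"},
    ]
    print([v["id"] for v in prioritize_lof(variants, constraint)])
```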

Anticipated Findings

I anticipate identifying a subset of high-confidence loss-of-function (LoF) variants occurring in genes that are intolerant to functional disruption, particularly those where LoF is a known mechanism of disease. These findings may reveal rare or previously uncharacterized variants with potential clinical relevance, especially in diverse or underrepresented populations included in the All of Us dataset. By combining gene constraint metrics with variant annotation and population-level frequency data, this study will contribute to the broader understanding of gene-disease relationships and improve the interpretation of genetic variants. Ultimately, the results could help refine variant prioritization strategies in both research and clinical genomics, supporting the development of more equitable and accurate precision medicine approaches.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Rare Cancer

Scientific Questions Being Studied

Rare cancers are an important public health concern, but specific screening methods for them have not yet been developed. In this context, we aim to develop statistical methods to detect rare cancers.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Methods Development

Scientific Approaches

We will test prediction accuracy and specificity using regression-based approaches implemented in R and Python.

Anticipated Findings

Our anticipated findings include both known and previously unknown predictors of rare cancer. Once these predictors are identified, we will investigate the mechanism of rare cancer onset by analyzing the temporal order of medical events.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Kyung Hee Lee - Mid-career Tenured Researcher, Central Michigan University

predixcan_test

Scientific Questions Being Studied

The goal of the workspace is to develop a pipeline that streamlines the retrieval of GWAS summary statistics from a user-entered phenotype accession number. Further goals include integrating qqman (https://CRAN.R-project.org/package=qqman), LocusZoomR (https://CRAN.R-project.org/package=locuszoomr), and MetaXcan S-PrediXcan (https://github.com/hakyimlab/MetaXcan) for downstream analyses of the summary statistics.

Project Purpose(s)

  • Educational

Scientific Approaches

We will use GWAS summary statistics from the All of Us database, along with Jupyter notebooks, to build a pipeline that, starting from a phenotype accession number, can locate the corresponding Hail MatrixTable, draw a Manhattan plot and a LocusZoom plot from it, and compute omic associations.
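Before a Manhattan plot is drawn, each variant needs an x coordinate along a concatenated genome and a y value of -log10(p). The sketch below computes both from hypothetical (chromosome, position, p-value) tuples and flags hits below the conventional 5e-8 genome-wide threshold. The chromosome lengths in the usage example are toy values, and the code does not read a Hail MatrixTable; it only illustrates the coordinate arithmetic.

```python
import math

GENOME_WIDE_ALPHA = 5e-8  # conventional genome-wide significance threshold


def manhattan_points(hits, chrom_lengths):
    """Convert (chrom, pos, p) tuples into (x, y) Manhattan-plot coordinates.

    chrom_lengths maps chromosome -> length in bp, in plotting order; x is the
    position on the concatenated genome and y is -log10(p).
    """
    offsets, running = {}, 0
    for chrom, length in chrom_lengths.items():
        offsets[chrom] = running
        running += length
    return [(offsets[chrom] + pos, -math.log10(p)) for chrom, pos, p in hits]


def genome_wide_hits(hits, alpha=GENOME_WIDE_ALPHA):
    """Return the (chrom, pos, p) tuples with p below the threshold."""
    return [h for h in hits if h[2] < alpha]


if __name__ == "__main__":
    toy_lengths = {"1": 1_000, "2": 500}  # toy lengths, not real chromosome sizes
    hits = [("1", 10, 1e-3), ("2", 20, 1e-9)]
    print(manhattan_points(hits, toy_lengths))
    print(genome_wide_hits(hits))
```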

Anticipated Findings

A GitHub repository for the pipeline will be made available and updated as the research progresses. This community workspace provides a training environment for a coursework project in Loyola University Chicago’s master’s degree program in bioinformatics.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

R01_Osteoarthritis_v8

Scientific Questions Being Studied

In the United States, there are currently 32.5 million adults who are diagnosed with osteoarthritis (OA). As the most common form of arthritis, OA is a debilitating disease that causes the deterioration of the articular cartilage and underlying subchondral bone. Its complex pathogenesis is thought to be influenced by genetic and metabolic factors involved in the development and progression of OA. However, the etiology of OA is poorly understood, and there are limited approaches to personalized treatment before the end stage (e.g., the need for total joint replacement surgery) is reached. Thus, the overall goal of this proposal is to address this critical gap by leveraging genetic association analyses stratified by joint site and replacement to identify genetic loci associated with successful rehabilitation outcomes, advancing precision medicine approaches that predict prognosis and response to rehabilitation for patients with OA.

Project Purpose(s)

  • Disease Focused Research (osteoarthritis)
  • Population Health
  • Drug Development
  • Methods Development
  • Control Set
  • Ancestry

Scientific Approaches

To test this hypothesis, we will use a comprehensive approach integrating advanced genetic analyses with functional validation of identified loci. Specifically, genome-wide association studies (GWAS), polygenic risk scores (PRS), regression modeling, and pathway analysis will be utilized to identify specific genetic variants associated with rehabilitation outcomes in OA patients. This approach is advantageous compared to other approaches because it allows us to utilize patient cohorts representing diverse demographics and joint sites, ‘omic (genotype, phenotype), and clinical (electronic medical record) variables to unravel the genetic landscape of OA for improved rehabilitation strategies.

Anticipated Findings

Collectively, these data will help identify shared and distinct fine-mapped genetic loci associated with OA, joint replacement, and joint site. Additionally, this proposed work will integrate genotypic, phenotypic, demographic, and environmental characteristics to provide multifactorial insight into OA susceptibility and heterogeneity. The findings of this proposal will potentially introduce evidence-based direction for rehabilitation approaches (e.g., targeting new pathways, combinatorial therapies, and/or identification of genetic risk factors in each population group) for OA after joint replacement and/or injury, as well as knowledge to guide future research toward developing novel therapeutic approaches.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Lavanya Pilla - Other, University of Alabama at Birmingham

Analysis of Physical Activity Levels

Scientific Questions Being Studied

We will explore step count as a quantification of physical activity level and plan to assess physical activity as a modifiable risk factor in the development of dementia.

Project Purpose(s)

  • Population Health

Scientific Approaches

We will use step count from Fitbit data to quantify different levels of physical activity/inactivity.
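One way to turn daily Fitbit step counts into activity levels is to summarize each participant by median daily steps over valid days and then apply step-count cut points. The sketch below uses widely used pedometer-style cut points (5,000 / 7,500 / 10,000 / 12,500 steps per day); these, the 100-step valid-day rule, and the 4-day minimum are assumptions for illustration, since the project does not specify its own thresholds or wear-time criteria.

```python
from statistics import median
from typing import Optional

# Pedometer-style cut points in steps/day (assumed; not specified by the project).
CUT_POINTS = [
    (5_000, "sedentary"),
    (7_500, "low active"),
    (10_000, "somewhat active"),
    (12_500, "active"),
]


def activity_category(steps_per_day: float) -> str:
    """Map a steps/day value to the first category whose upper bound exceeds it."""
    for upper, label in CUT_POINTS:
        if steps_per_day < upper:
            return label
    return "highly active"


def participant_category(daily_steps: list[int], min_steps: int = 100, min_days: int = 4) -> Optional[str]:
    """Categorize a participant by median steps over valid days.

    A day counts as valid if it has at least `min_steps` steps (hypothetical
    wear-time rule); participants with fewer than `min_days` valid days return None.
    """
    valid = [s for s in daily_steps if s >= min_steps]
    if len(valid) < min_days:
        return None
    return activity_category(median(valid))


if __name__ == "__main__":
    print(participant_category([8000, 9000, 7600, 0, 8200]))
    print(participant_category([3000]))
```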

Anticipated Findings

We are hoping that after stratifying levels of physical activity/inactivity, we can assess physical activity/inactivity as a modifiable risk factor for the development of dementia.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

RVAS and GWAS Templates and Tutorials

Scientific Questions Being Studied

Endogenous retroviruses (ERVs) comprise approximately 8% of the human genome. For years, ERVs were considered silent and were therefore deemed irrelevant for study by most researchers. However, in recent years, we have shown that ERV expression signatures are more extensive than previously thought. ERV RNA and protein expression has been demonstrated particularly in cancers, neurological conditions, and autoimmune diseases, as well as in immune-privileged organs such as the testes, placenta, brain, and thyroid. While overexpression of certain ERV families, which can encompass loci on multiple chromosomes, has been linked to disease, single-locus associations remain scarce. We will study the mutational burden of specific ERVs. This research aims to identify mutational profiles of ERVs in various diseases.

Project Purpose(s)

  • Methods Development
  • Ancestry

Scientific Approaches

We will perform common-variant frequency association analyses, rare-variant burden analyses, and detection of ERV integrations in the large whole genome sequencing dataset from the All of Us Research Program.
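A minimal version of a rare-variant burden analysis collapses each person's rare variants in a region (for example, one ERV locus) into a carrier indicator and compares carrier counts between cases and controls with Fisher's exact test. This CAST-style collapsing test is a swapped-in illustration, not necessarily the method the project will use; the 1% MAF cut point and all genotypes in the usage example are hypothetical.

```python
from math import comb


def is_carrier(genotypes, mafs, maf_max=0.01):
    """True if the person carries >= 1 alt allele at any variant with MAF < maf_max.

    genotypes: alt-allele counts (0, 1, 2) for one person across variants in a region.
    mafs: minor allele frequency of each variant, in the same order.
    """
    return any(g > 0 and f < maf_max for g, f in zip(genotypes, mafs))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for [[a, b], [c, d]]
    (rows: cases, controls; columns: carriers, non-carriers)."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers among cases, given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum probabilities of tables no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return min(1.0, sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7)))


if __name__ == "__main__":
    mafs = [0.30, 0.004, 0.0008]  # hypothetical variants in one ERV locus
    cases = [[0, 1, 0], [1, 0, 1], [0, 0, 1], [2, 0, 0]]
    controls = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 0]]
    a = sum(is_carrier(g, mafs) for g in cases)
    c = sum(is_carrier(g, mafs) for g in controls)
    print(a, c, fisher_exact_two_sided(a, len(cases) - a, c, len(controls) - c))
```

Collapsing to a single carrier indicator gains power when most rare variants act in the same direction; variance-component tests such as SKAT are the usual alternative when they do not.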

Anticipated Findings

We hypothesize that certain ERVs have disease specific roles and will be more mutated in disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Pragati Kore - Graduate Trainee, Baylor College of Medicine
  • Nirav Shah - Graduate Trainee, Baylor College of Medicine
  • Jessica Honorato Mauer - Project Personnel, Baylor College of Medicine
  • Hatoon Al Ali - Graduate Trainee, Baylor College of Medicine
  • Grace Tietz - Graduate Trainee, Baylor College of Medicine
  • Elizabeth Atkinson - Early Career Tenure-track Researcher, Baylor College of Medicine
  • Christina Magyar - Graduate Trainee, Baylor College of Medicine
  • Astrid Manuel - Other, Baylor College of Medicine
  • Aishi Ayyanathan - Undergraduate Student, Baylor College of Medicine

Duplicate of New Analysis: Everyday Discrimination & Sleep Analysis

Scientific Questions Being Studied

We are interested in understanding the relationship between everyday discrimination and sleep outcomes.

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

We will use the everyday discrimination data from the SDOH survey and sleep data from the Fitbit dataset to study this question. We expect to use generalized linear models to characterize the relationship between these two variables.

Anticipated Findings

We anticipate that exposure to more everyday discrimination will lead to poorer sleep quality. Our findings will contribute to the growing literature describing the effect of racism and discrimination on health in minority populations.

Demographic Categories of Interest

  • Race / Ethnicity
  • Gender Identity
  • Sexual Orientation

Data Set Used

Controlled Tier

Research Team

Owner:

  • Sarah Lee - Graduate Trainee, University of Massachusetts Medical School
  • Owen Leary - Graduate Trainee, Brown University

Duplicate of All by All - Lab Measurements Phenotypes Curation - JG

Scientific Questions Being Studied

This Featured Workspace provides details about how lab measurements phenotypes were curated for downstream genome- and phenome-wide analysis in All by All. The All by All tables encompass about 3,400 phenotypes with gene-based and single-variant associations across nearly 250,000 whole genome sequences, with lab measurements as an included phenotype category. More details about the All by All tables can be found in the User Support Hub Article: https://support.researchallofus.org/hc/en-us/articles/27049847988884-Overview-of-the-All-by-All-tables-available-on-the-All-of-Us-Researcher-Workbench.

Within the Featured Workspace, a ReadMe file provides more information about the lab measurements phenotypes. Each phenotype is included as a separate notebook, which includes a graphical summary and descriptive statistics of the data. The ReadMe file includes an index of all the phenotypes and notebooks included in the Featured Workspace.

Project Purpose(s)

  • Educational

Scientific Approaches

Briefly, data for each lab measurement was harmonized to a standard unit and outlier values were dropped. The resultant participant level summaries for each lab measurement phenotype were then used in downstream genome- and phenome-wide analysis.
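The harmonize-then-filter workflow described above can be sketched for a single analyte. The example assumes glucose reported in mg/dL or mmol/L (1 mmol/L of glucose is approximately 18.016 mg/dL), uses a pooled 1.5 x IQR fence as the outlier rule, and summarizes each participant by the median. All three choices are assumptions, since the workspace's actual units, outlier criteria, and summary statistic are not described here.

```python
from statistics import median, quantiles

# Factors converting each source unit to mg/dL, valid for glucose only.
GLUCOSE_TO_MG_DL = {"mg/dL": 1.0, "mmol/L": 18.016}


def harmonize(readings):
    """Convert (value, unit) pairs to mg/dL, dropping readings with unknown units."""
    return [value * GLUCOSE_TO_MG_DL[unit] for value, unit in readings if unit in GLUCOSE_TO_MG_DL]


def participant_medians(measurements, k=1.5):
    """measurements: participant id -> list of (value, unit).

    Harmonizes units, drops values outside a pooled Tukey fence
    [Q1 - k*IQR, Q3 + k*IQR], and returns each participant's median.
    """
    harmonized = {pid: harmonize(rows) for pid, rows in measurements.items()}
    pooled = [v for values in harmonized.values() for v in values]
    q1, _, q3 = quantiles(pooled, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    summaries = {}
    for pid, values in harmonized.items():
        kept = [v for v in values if low <= v <= high]
        if kept:  # participants with no values left are omitted
            summaries[pid] = median(kept)
    return summaries


if __name__ == "__main__":
    data = {
        "p1": [(90, "mg/dL"), (5.0, "mmol/L")],
        "p2": [(100, "mg/dL"), (110, "mg/dL")],
        "p3": [(95, "mg/dL"), (2000, "mg/dL")],  # 2000 mg/dL is implausible
    }
    print(participant_medians(data))
```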

Anticipated Findings

The All by All tables leverage the genomic data and rich array of phenotypic data available from All of Us participants. The billions of association testing results available in the All by All data will enable many types of research studies geared towards understanding the genetic contribution to a variety of phenotypes, including lab measurements.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Psychiatric Comorbidity in Chronic Rhinosinusitis

Scientific Questions Being Studied

The primary question that we would like to answer is – for patients with chronic rhinosinusitis (CRS), does undergoing sinus surgery reduce the incidence of psychiatric comorbidities—specifically depressive disorders, anxiety disorders, and adjustment disorders?

Project Purpose(s)

  • Disease Focused Research (Chronic Rhinosinusitis)

Scientific Approaches

We will examine the relationship between sinus surgery and psychiatric comorbidities in patients with chronic rhinosinusitis (CRS) using a retrospective cohort study design.

Anticipated Findings

We anticipate sinus surgery in patients with CRS to be associated with a reduction in the incidence of psychiatric comorbidities, including depressive and anxiety disorders.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Easton Attwood - Graduate Trainee, University of Kansas Medical Center

rv_association

Scientific Questions Being Studied

We are interested in understanding the effect of rare variants on complex diseases such as cardiometabolic or neurological disorders. Understanding these effects could help us identify drug targets, potentially offering avenues for developing therapeutic interventions for these disorders.

Project Purpose(s)

  • Drug Development

Scientific Approaches

We intend to use genomic data, electronic health records, and survey responses of all individuals in the All of Us cohort. We then plan to conduct rare variant association studies with multiple traits and identify significant variant-trait associations. These associations will be further evaluated and prioritized based on existing literature or known associations with the trait reported in public databases.

Anticipated Findings

We expect to find both known and novel associations. Known associations identified through this approach will help validate the accuracy of the results, while novel findings will not only provide insights into human disease genetics but may also be used to develop therapeutics for specific diseases.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Determinants of Stroke in SGM People

Scientific Questions Being Studied

Stroke is the fifth leading cause of death in the US. Despite a higher risk of stroke in sexual and gender minority (SGM; also known as LGBTQ+) people, little is known about why this disparity exists. We do not know if it is due to differences in the prevalence of stroke risk factors and health-related behaviors, due to social determinants of health like poverty, or due to a combination of these factors. This project seeks to identify the risk factors driving stroke risk in SGM adults in order to inform future interventions.

Project Purpose(s)

  • Disease Focused Research (Stroke)
  • Population Health

Scientific Approaches

Using the cohort of All of Us participants with linked electronic health record data, we will perform two analyses: (1) a series of time-to-event analyses using Cox proportional hazards models to evaluate the relationship of traditional stroke risk factors (hypertension, hyperlipidemia, diabetes, atrial fibrillation) and non-traditional stroke risk factors (HIV, hepatitis C, syphilis, stimulant use) with stroke incidence in SGM and non-SGM people, each fit in two models (unadjusted and adjusted for age); and (2) an analysis of cross-level interactions between the social determinants of race, ethnicity, and socioeconomic status and stroke incidence in SGM and non-SGM people.

Anticipated Findings

Our hypothesis is that non-traditional stroke risk factors will be associated with stroke in SGM people and that race and ethnicity (as a proxy measure for racism) will be important intersectional factors in the association between these risk factors and stroke. We also hypothesize that socioeconomic status will be a significant mediator in this relationship. If our hypothesis is confirmed, these findings will be an important first step in understanding why disparities in stroke exist for SGM communities and enable us to create a tailored intervention to reduce stroke risk in the community. If our hypothesis is not confirmed, that will also be a significant step and lead to further research to identify what other risk factors and social determinants may be driving disparities in stroke.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Nicole Rosendale - Early Career Tenure-track Researcher, University of California, San Francisco

Collaborators:

  • Shufan Huo - Research Fellow, Yale University
  • Stephanie Cook - Early Career Tenure-track Researcher, New York University
  • Nguyen Tran - Research Fellow, Stanford University
  • Jingxuan Evelyn Ma - Graduate Trainee, New York University

predictive models for OUD v8 dataset

Scientific Questions Being Studied

The objective of this study is to develop machine learning predictive models that integrate clinical, social, genomic, and demographic features to identify patients at higher risk of opioid use disorder. This is important because better identification of at-risk patients would let us allocate resources more effectively to those who are prescribed opioids, with the goal of reducing the incidence of opioid addiction.

Project Purpose(s)

  • Disease Focused Research (opioid use disorder)
  • Social / Behavioral
  • Ancestry

Scientific Approaches

We will use various machine learning approaches (e.g., deep learning, foundation models) to identify patients at risk for opioid use disorder. This will involve creating a cohort of all patients prescribed an opioid during their case. The population will be split into those who did and did not have a diagnosis of opioid use disorder (e.g., ICD-10 F11.xx). Predictor variables will include survey responses (e.g., social determinants of health), demographic/geographic data, diagnosis codes, procedure codes, medications, and genomic information. The models will include genomic information from SNPs as well as markers discovered via GWAS. We will train the models on a portion of the dataset and validate them on a separate test set.
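The train/test workflow described above can be sketched as follows. This is a minimal illustration, not the project's pipeline: it assumes each patient is a dict with hypothetical `id` and `codes` fields, flags opioid use disorder by any ICD-10 code beginning with `F11`, and uses an illustrative 80/20 stratified split with a fixed seed so that both sets keep a similar proportion of OUD cases. The toy cohort is synthetic, not All of Us data.

```python
import random
from collections import defaultdict


def has_oud_code(icd10_codes):
    """Return True if any ICD-10 code falls under F11 (opioid-related disorders)."""
    return any(code.strip().upper().startswith("F11") for code in icd10_codes)


def stratified_split(records, label_key, test_fraction=0.2, seed=42):
    """Split records into train/test lists while preserving the label ratio.

    Each label group is shuffled independently, and the first
    `test_fraction` of each group goes to the held-out test set.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[label_key]].append(rec)

    train, test = [], []
    for group in groups.values():
        shuffled = list(group)
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


# Synthetic example: every 10th patient carries an F11 diagnosis code.
patients = [
    {"id": i, "codes": ["F11.20"] if i % 10 == 0 else ["M54.5"]}
    for i in range(100)
]
for p in patients:
    p["has_oud"] = has_oud_code(p["codes"])

train, test = stratified_split(patients, label_key="has_oud")
```

With 10 OUD-positive and 90 OUD-negative synthetic patients, this split yields 80 training and 20 test records, 2 of them OUD-positive. Stratifying matters because OUD is a minority outcome; a purely random split could leave very few positive cases in the test set.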

Anticipated Findings

We anticipate that we can generate predictive models for opioid use disorder among patients who were prescribed an opioid, have chronic pain, and/or underwent surgery. These models may give clinicians a tool to identify which of their patients are at high risk of addiction before prescribing opioids for pain.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Soraya Mehdipour - Other, University of California, San Diego
  • Rodney Gabriel - Early Career Tenure-track Researcher, University of California, San Diego

Collaborators:

  • Sara Rosenthal - Senior Researcher, University of California, San Diego
  • Rohith Vutukuru - Graduate Trainee, University of California, San Diego
  • Daisy Chilin-Fuentes - Project Personnel, University of California, San Diego
  • Charles-Alexandre Roy - Project Personnel, University of California, San Diego
  • Charvi Bannur - Graduate Trainee, University of California, San Diego
  • Chirag Jain - Graduate Trainee, University of California, San Diego
  • Atharv Sunil Biradar - Graduate Trainee, University of California, San Diego
  • Varshini Sathish - Graduate Trainee, University of California, San Diego
  • Ricardo Pietrobon - Project Personnel, Stanford University
  • Ji-Qing Chen - Research Fellow, Stanford University
  • Chunnan Hsu - Senior Researcher, University of California, San Diego

Gene-Environment Interactions in NAFLD and complications

Scientific Questions Being Studied

Nonalcoholic fatty liver disease (NAFLD) has reached pandemic proportions. NAFLD is often asymptomatic; however, a subset of patients develop nonalcoholic steatohepatitis (NASH), which can progress to cirrhosis and hepatocellular carcinoma (HCC). Identifying key clinical and genetic biomarkers of high-risk NAFLD patients may lead to improved detection and management strategies that help prevent complications. Strong evidence supports the notion that NAFLD is a highly heritable disease (estimated heritability ~20-70%), in which gene-environment interactions can determine the severity and risk of disease progression. The objective of this study is to identify genetic and environmental factors associated with the risk of NAFLD progression. Our aims are: 1. Define a NAFLD cohort; 2. Examine the association between pre-selected genetic variants and the risk of NAFLD, NAFLD-cirrhosis, and NAFLD-HCC; 3. Determine whether combining specific genetic information with clinical information from patients' health records can improve diagnostic accuracy in NAFLD patients.

Project Purpose(s)

  • Disease Focused Research (Gene-Environment Interactions in NAFLD and its complications)
  • Population Health
  • Ancestry

Scientific Approaches

We will use published clinical algorithms, developed by our group and others, that draw on patient clinical factors (demographic, laboratory, and imaging data) to define the NAFLD cohort. Clinical characteristics and risk factors will be determined using published ICD-9/10 codes. Using pre-selected, published SNPs (from the UK Biobank and Veterans Affairs), we will develop a polygenic risk score to determine which variants are associated with progression of NAFLD liver disease, defined as cirrhosis, decompensation (hepatic encephalopathy, gastrointestinal bleeding, ascites), and HCC, with or without liver transplantation. We propose to use Fine and Gray competing-risks regression models to assess the risk of each individual outcome and of all outcomes combined.

Anticipated Findings

Currently, there are no NAFLD biomarkers or risk stratification tools to help determine which patients in this heterogeneous population will progress to complications and should therefore be seen in gastroenterology and hepatology subspecialty care. Many patients present with complications without ever having been diagnosed with NAFLD. We anticipate identifying how patients' genetic makeup, as defined by pre-selected SNPs, and its interactions with clinical risk factors affect the risk of liver disease progression in a NAFLD cohort. These data will help us develop biomarkers that can be used clinically to determine which NAFLD patients should be screened for complications and referred to subspecialty care from primary care and endocrinology clinics.

Demographic Categories of Interest

  • Race / Ethnicity
  • Access to Care
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Nicole Prause - Other, University of California, Los Angeles
  • Jihane Benhammou - Early Career Tenure-track Researcher, University of California, Los Angeles
  • Arthur Ko - Other, Children's Research Institute

BWhabit

Scientific Questions Being Studied

We are seeking to identify the drivers of health improvement over time that result from habitual improvements in certain patterns of movement, recovery, and activity engagement, including but not limited to directed engagement, situational improvements, and structured program adoption.

Project Purpose(s)

  • Commercial

Scientific Approaches

We will apply multiple machine learning (ML) models to wearable datasets to perform regression and predictive modelling of user behaviors, using segmental analysis, dynamic cohort binding, and predictive models.

Anticipated Findings

We anticipate improving our existing models that predict the impact and timing of specific interventions on overall health and wellness. Science currently offers only broad generalities about the relative health benefits of isolated activities; modelling these benefits in a noisy, confounder-rich setting (real life) is critical to delivering personalized, targeted recommendations effectively.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

CFTR

Scientific Questions Being Studied

The study aims to investigate the potential link between inherited variations in the CFTR gene and the risk of developing various cancers.

This research is important because, while carrying two CFTR changes causes cystic fibrosis, we know less about the health effects of having just one change. If carrying one CFTR gene change increases cancer risk, this could help us understand who might need earlier cancer screening or new prevention strategies. This could improve public health by helping identify people at higher risk and guiding future research into better cancer prevention.

Project Purpose(s)

  • Ancestry

Scientific Approaches

I plan to use statistical methods to test whether carrying a CFTR gene change is linked to specific types of cancer.
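The project does not name a specific statistical test. As one hypothetical illustration, a carrier-frequency comparison could start from a 2x2 table of CFTR carriers versus non-carriers among people with and without a given cancer, summarized as an odds ratio with a Woolf (log-scale) 95% confidence interval. The counts below are placeholders, not All of Us data, and a real analysis would also need to adjust for covariates such as age, sex, and genetic ancestry (for example, via logistic regression).

```python
import math


def carrier_odds_ratio(case_carriers, case_noncarriers,
                       control_carriers, control_noncarriers, z=1.96):
    """Odds ratio of CFTR carrier status in cancer cases vs. cancer-free controls.

    Returns (odds_ratio, lower, upper) using the Woolf log-scale interval.
    All four cells must be positive; zero cells would need a correction.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all 2x2 cells must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Placeholder counts: 40 of 1,000 cases and 30 of 1,000 controls are carriers.
or_, lo, hi = carrier_odds_ratio(40, 960, 30, 970)
```

With these placeholder counts the odds ratio is about 1.35, but the interval includes 1, which illustrates why adequately sized carrier counts matter for this kind of comparison.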

Anticipated Findings

I expect to determine whether inherited pathogenic CFTR variants, particularly in the heterozygous carrier state, are present in a notable proportion of individuals with cancer, potentially at a higher frequency than in the cancer-free general population.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Li Sun - Graduate Trainee, Indiana University

Long term CVD - in 3 cancers

Scientific Questions Being Studied

Which genes are associated with cardiovascular diseases and with breast, colorectal, and prostate cancers?

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease)

Scientific Approaches

I will calculate hazard ratios (HRs) for cardiovascular outcomes, comparing breast, colorectal, and prostate cancer survivors with a cancer-free population. Cox proportional hazards (PH) models will be used to obtain the HRs.

Anticipated Findings

Because of limited access to, or long distances from, cancer treatment and healthcare facilities, rural cancer survivors may receive less care than urban patients after their cancer diagnosis.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

first workspace

Scientific Questions Being Studied

What role(s) do provider networks play in health outcomes? What kinds of provider networks (e.g., MDs, NPs) affect health outcomes?

Project Purpose(s)

  • Disease Focused Research (Diabetes, Cancer)
  • Population Health
  • Social / Behavioral
  • Educational
  • Control Set

Scientific Approaches

I want to use encounter data from rural areas and cities to describe physician networks and their impact(s) on health outcomes. This is a preliminary search to see whether there are any possible impacts; I will proceed with the study if this first search provides a foundation for pursuing the question.

Anticipated Findings

I want to use ML/AI to build a predictive model or tool that predicts health outcomes based on a physician network.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

  • Amanda Carbajal - Research Fellow, SUNY Downstate Health Sciences University

Example

Scientific Questions Being Studied

This is an example of how to generate a workbook. In order to develop a cohort, I need to have a workbench.

Project Purpose(s)

  • Population Health
  • Social / Behavioral
  • Methods Development
  • Control Set
  • Ancestry

Scientific Approaches

I plan to examine the differential gene expression in response to an environmental exposure. This will shed light on the correlation between environment and gene expression patterns.

Anticipated Findings

I anticipate finding differences in gene expression patterns as a result of a number of environmental factors.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

Sleep quality - Health

Scientific Questions Being Studied

Investigate the association of polygenic risk scores (PRS) for morning chronotype and longer sleep duration with health outcomes.

Project Purpose(s)

  • Ancestry

Scientific Approaches

Calculate PRS for the following phenotypes, constructed as the weighted sum of genome-wide significant alleles: 1) chronotype [PRS-C]; 2) continuous SD [PRS-SD]; and 3) SS duration [PRS-SS].

Assess associations between PRS and study outcomes (fasting glucose and insulin, HOMA-IR, hemoglobin A1c, BMI, and WC) cross-sectionally and longitudinally.
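The PRS construction described above (a weighted sum of genome-wide significant alleles) can be written as PRS_j = Σ_i β_i · G_ij, where G_ij is the number of effect alleles (0, 1, or 2) that participant j carries at variant i and β_i is that variant's per-allele GWAS effect size. A minimal sketch follows; the variant IDs, weights, and genotypes are hypothetical placeholders, skipping missing genotypes is a simplifying assumption, and the z-scoring step is a common convention for reporting associations per standard deviation of PRS rather than something the project specifies.

```python
from statistics import mean, pstdev


def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages for one participant.

    dosages: variant ID -> effect alleles carried (0, 1 or 2)
    weights: variant ID -> per-allele effect size (e.g., a GWAS beta)
    Variants without a genotype are skipped (simplifying assumption).
    """
    return sum(beta * dosages[v] for v, beta in weights.items() if v in dosages)


def standardize(scores):
    """Convert raw scores to z-scores so effects are per 1 SD of PRS."""
    mu, sigma = mean(scores), pstdev(scores)
    return [(s - mu) / sigma for s in scores]


# Hypothetical weights for a chronotype score (PRS-C) and three participants.
weights_c = {"variant_a": 0.10, "variant_b": -0.05, "variant_c": 0.20}
genotypes = [
    {"variant_a": 2, "variant_b": 1, "variant_c": 0},
    {"variant_a": 0, "variant_b": 2, "variant_c": 1},
    {"variant_a": 1, "variant_b": 0},  # variant_c missing for this participant
]
raw = [polygenic_risk_score(g, weights_c) for g in genotypes]
z = standardize(raw)
```

The standardized scores could then serve as the exposure in cross-sectional or longitudinal models of the metabolic outcomes listed above.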

Anticipated Findings

We anticipate demonstrating the importance of genetic predisposition to morningness and adequate sleep duration for metabolic health.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Stephanie Shue - Research Assistant, Columbia University
  • Jiheum Park - Early Career Tenure-track Researcher, Columbia University

Collaborators:

  • Annabel Gerber - Project Personnel, Columbia University

Risk Factors for Hypertension in Adults

Scientific Questions Being Studied

Scientific question(s) intended to study:
What are the key demographic factors associated with hypertension in adults over 50?
How do clinical characteristics correlate with hypertension in this population?
Are certain lifestyle factors more common among hypertensive adults over 50?
What patterns or disparities exist across different subgroups, and how might these inform targeted prevention strategies?

Why this is important:
Guide early detection and intervention efforts by identifying high-risk populations based on demographic or clinical indicators.
Improve health equity by identifying disparities across racial, ethnic, or socioeconomic groups.
Support evidence-based public health programs aimed at reducing modifiable risk factors like obesity, smoking, or sedentary behavior.

Project Purpose(s)

  • Educational

Scientific Approaches

1. Datasets
I will define a cohort of participants aged 50 years or older with a documented diagnosis of hypertension, then extract relevant data domains for the defined cohort, including Demographics, Physical Measurements, Conditions, and Survey Responses.

2. Research Methods
Descriptive Statistics:
  • Compute mean and median age to understand age distribution
  • Calculate proportions of participants by sex and race/ethnicity
  • Determine average systolic and diastolic blood pressure
  • Analyze BMI distribution

Data Visualization: Use histograms, bar charts, and boxplots to visualize distributions and proportions.

Exploratory Analysis: Assess associations between demographic and clinical variables with hypertension status.

3. Tools
Researcher Workbench: for cohort definition and dataset building.
Jupyter Notebook: for data cleaning, statistical analysis, and visualization.
Python (with libraries such as pandas, matplotlib, seaborn, and numpy) for data manipulation and plotting.
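The descriptive statistics listed above can be sketched as follows. The project plans to use pandas; to keep this illustration dependency-free, it uses only Python's standard library (in pandas, the equivalents would be `Series.mean()`, `Series.median()`, and `Series.value_counts(normalize=True)`). The record fields (`age`, `sex`, `sbp`, `dbp`, `bmi`, `hypertension`) are hypothetical stand-ins for columns exported from a Workbench dataset, and the rows are made-up examples, not participant data.

```python
from collections import Counter
from statistics import mean, median


def define_cohort(records, min_age=50):
    """Keep participants aged >= min_age with a documented hypertension diagnosis."""
    return [r for r in records if r["age"] >= min_age and r["hypertension"]]


def describe_cohort(cohort):
    """Summarize age, sex proportions, mean blood pressure, and BMI spread."""
    n = len(cohort)
    ages = [r["age"] for r in cohort]
    bmis = sorted(r["bmi"] for r in cohort)
    sex_counts = Counter(r["sex"] for r in cohort)
    return {
        "n": n,
        "age_mean": mean(ages),
        "age_median": median(ages),
        "sex_proportions": {sex: count / n for sex, count in sex_counts.items()},
        "sbp_mean": mean(r["sbp"] for r in cohort),
        "dbp_mean": mean(r["dbp"] for r in cohort),
        "bmi_min": bmis[0],
        "bmi_median": median(bmis),
        "bmi_max": bmis[-1],
    }


# Made-up rows; the last two are excluded by the cohort definition.
records = [
    {"age": 62, "sex": "Female", "sbp": 148, "dbp": 92, "bmi": 31.2, "hypertension": True},
    {"age": 55, "sex": "Male", "sbp": 152, "dbp": 95, "bmi": 28.4, "hypertension": True},
    {"age": 71, "sex": "Female", "sbp": 138, "dbp": 84, "bmi": 26.9, "hypertension": True},
    {"age": 45, "sex": "Male", "sbp": 150, "dbp": 96, "bmi": 33.0, "hypertension": True},
    {"age": 66, "sex": "Male", "sbp": 122, "dbp": 78, "bmi": 24.5, "hypertension": False},
]
summary = describe_cohort(define_cohort(records))
```

Restricting to age 50 or older with a hypertension diagnosis leaves three of the five example rows, and the summary dictionary holds the values that the planned histograms, bar charts, and boxplots would visualize.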

Anticipated Findings

Anticipated Findings:
I expect to find that older adults with hypertension have higher average BMI and a greater prevalence of comorbidities such as diabetes and hyperlipidemia compared to general population norms. I also anticipate observing disparities in hypertension prevalence across racial and ethnic groups, with higher rates in some minority populations. Lifestyle factors like smoking may be more common among hypertensive individuals. The data will likely reveal complex interactions between demographic, clinical, and behavioral factors influencing hypertension risk in adults over 50.

Scientific Contribution:
Using the All of Us dataset’s broad demographic representation, this study will help fill gaps in knowledge about hypertension risk among underrepresented groups. Insights gained can inform targeted public health interventions, support precision medicine approaches, and ultimately contribute to reducing hypertension-related health disparities.

Demographic Categories of Interest

  • Age

Data Set Used

Registered Tier

Research Team

Owner:

Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.