Research Projects Directory

15,002 active projects

This information was updated 1/14/2025

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

SES and Uveitis

Scientific Questions Being Studied

We would like to examine whether socioeconomic factors such as income, race, age, and education affect the management and/or outcomes of uveitis.

Project Purpose(s)

  • Disease Focused Research (Uveitis)

Scientific Approaches

We would like to use data on patients diagnosed with uveitis and examine variables such as number of steroid injections and rates of management with DMARDs and biologics. We will perform logistic/linear regressions according to the data to examine associations between SES factors and uveitis outcomes and management.
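
As a rough illustration of the logistic regression step, the sketch below fits a one-predictor model by gradient descent and reports an odds ratio. The data are synthetic, and the variable meanings (a hypothetical lower-income indicator and receipt of a biologic) are assumptions made only for this example, not the project's actual variables.

```python
import math
import random

def fit_logistic(xs, ys, lr=0.5, epochs=3000):
    """Fit y ~ sigmoid(b0 + b1*x) by batch gradient descent (one predictor)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += p - y          # gradient of the log-loss w.r.t. the intercept
            g1 += (p - y) * x    # gradient w.r.t. the slope
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

# Synthetic, made-up data: x = 1 for a hypothetical lower-income group,
# y = 1 if the patient was managed with a biologic.
random.seed(0)
xs = [random.randint(0, 1) for _ in range(500)]
ys = [1 if random.random() < (0.2 if x else 0.4) else 0 for x in xs]

b0, b1 = fit_logistic(xs, ys)
odds_ratio = math.exp(b1)  # odds of treatment in group x=1 relative to x=0
print(f"odds ratio: {odds_ratio:.2f}")
```

An odds ratio below 1 in this toy setup would mean the x=1 group is less likely to receive the treatment. A real analysis would add covariates and confidence intervals.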

Anticipated Findings

There is a lack of research in the field of ophthalmology on how SES factors play a role in the management and treatment of uveitis, perhaps due to small sample sizes and a relative lack of uveitis specialists. We anticipate that our findings may shed light on healthcare disparities that may limit access to care or lead to differential management of uveitis.

Demographic Categories of Interest

  • Race / Ethnicity
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

  • Kristen Park - Research Fellow, University of Texas at Austin

Investigating the Role of Sleep and Activity on Heart Rate

Scientific Questions Being Studied

I intend to study the scientific question: How do sleep and physical activity impact heart rate? Understanding this relationship through statistical modeling of Fitbit data can help optimize cardiovascular health.

Project Purpose(s)

  • Educational

Scientific Approaches

My approach is grounded in NIH research that shows a link between lower resting heart rate and better cardiovascular health and longevity. I will extract Fitbit data from the All of Us Curated Data Repository (CDR) for a cohort of participants. After cleaning the Fitbit datasets, I will conduct exploratory data analysis. For data preprocessing, I will create the target variable (i.e., resting heart rate) and perform downsampling. To develop a robust model for predicting resting heart rate based on daily summary statistics related to sleep and physical activity, I will utilize two machine learning algorithms: Support Vector Regression (SVR) and Random Forest. I will also implement feature engineering and hyperparameter tuning to improve model performance. Finally, I will identify key features of daily sleep and physical activity that correlate with minimal resting heart rate.
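
The preprocessing step could look roughly like the sketch below, assuming that "downsampling" here means reducing intraday heart-rate readings to one summary row per day. The resting-heart-rate proxy (mean of the lowest 10% of readings) and the field names are assumptions for illustration only. The description does not say how the target variable is defined.

```python
from collections import defaultdict
from statistics import mean

def daily_summary(readings):
    """Downsample (date, bpm) readings to one summary row per day."""
    by_day = defaultdict(list)
    for day, bpm in readings:
        by_day[day].append(bpm)
    out = {}
    for day, values in sorted(by_day.items()):
        values.sort()
        k = max(1, len(values) // 10)  # lowest 10% of readings (assumed proxy)
        out[day] = {"mean_hr": mean(values), "resting_hr": mean(values[:k])}
    return out

# Made-up readings, not real Fitbit data.
readings = [("2024-01-01", 58), ("2024-01-01", 62), ("2024-01-01", 95),
            ("2024-01-02", 60), ("2024-01-02", 64)]
summary = daily_summary(readings)
print(summary)
```

The daily rows could then be joined with sleep and activity summaries before model fitting.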

Anticipated Findings

I anticipate concluding that both sleep and physical activity have a significant relationship with heart rate. Specifically, I expect that higher sleep quality and duration along with regular physical activity will correlate with more stable heart rate patterns, while insufficient sleep or inactivity may lead to less healthy heart rates. My findings could contribute to the existing body of scientific knowledge by providing data-driven insights into how lifestyle modifications in sleep and physical activity can be applied as targeted interventions for cardiovascular diseases.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Taylor Prince - Undergraduate Student, University of California, Los Angeles

Sezary Syndrome

Scientific Questions Being Studied

We hope to explore the disease process of Sezary syndrome (SS) and the relationships between SS and other diseases.

Project Purpose(s)

  • Disease Focused Research (Sezary's disease)

Scientific Approaches

We plan to use Python and SQL in Jupyter Notebook to analyze the data set and explore the relationships between Sezary syndrome and other diseases.

Anticipated Findings

We hope to find diseases associated with Sezary syndrome that would be of interest to clinicians, helping them stay aware of other conditions that may accompany it.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Jeffrey Chen - Graduate Trainee, University of California, San Diego

Mental illness symptoms, diagnoses, and discrimination/stigma

Scientific Questions Being Studied

I am currently exploring the data to formalize a specific research question. I am interested in the relationship between mental health symptoms and experiencing discrimination as well as the relationship between mental health diagnoses and experiencing discrimination. Are mental illness symptoms or mental health diagnoses more strongly related to reporting experiencing discrimination? Additionally, are there meaningful differences in people's experiences of discrimination based on the type of mental illness or level of severity of their mental health issue(s)?

Project Purpose(s)

  • Population Health
  • Social / Behavioral

Scientific Approaches

People 18 years and older who have answered survey questions about mental health symptoms, diagnoses, and experiences of discrimination will be included in this study. Participants will be categorized based on the severity of mental illness symptoms (normal, mild, moderate, or severe) and will also be classified based on their diagnosis(es).

Descriptive statistics (cross-tabulations with chi-squared tests; means with t-tests) will be used to assess the sociodemographic characteristics of individuals with mental health issues, first by reported symptoms and then separately by reported mental health diagnosis(es). Bivariate regression analyses will be performed to analyze the predictors of an individual endorsing (a1) mental illness symptoms, (a2) mental health diagnoses, and (b) discrimination/stigma, to get a sense of the data and covariates.
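
A cross-tabulation with a chi-squared test can be sketched as follows. The counts are invented for illustration, and the function computes only the Pearson statistic and degrees of freedom, which is then compared against the standard 5% critical value for 3 degrees of freedom (7.815).

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Made-up counts: rows = symptom severity (normal, mild, moderate, severe),
# columns = reported discrimination (no, yes).
table = [[400, 100], [300, 100], [200, 100], [100, 100]]
stat, dof = chi_squared(table)
significant = stat > 7.815  # 5% critical value for 3 degrees of freedom
print(f"chi-squared = {stat:.2f}, dof = {dof}, significant = {significant}")
```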

Anticipated Findings

This study should clarify whether mental illness symptoms and/or mental health disorder diagnoses are predictive of reporting and experiencing discrimination/stigma. If so, it should help identify whether one is more predictive than the other as well as which specific symptoms and/or diagnoses are most strongly related to experiencing discrimination/stigma. This will be particularly useful for anti-stigma campaigns and other programs because we will be able to identify those who are likely the most in need of intervention. The results can also help raise awareness about the different and nuanced experiences of anyone who falls under the umbrella of suffering from mental illness.

Based on some previous research I anticipate folks who fall within the categories of having serious or severe mental illness will report more experiences of discrimination/stigma compared to those who fall into categories of less severe mental illness.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Kristin Litzelman - Mid-career Tenured Researcher, University of Wisconsin, Madison

Health Disparities

Scientific Questions Being Studied

I am exploring the data to understand how All of Us Research Program data can be used for health disparities research. Findings from this exploration will be used to test my hypothesis.

Project Purpose(s)

  • Educational

Scientific Approaches

I will use survey data, electronic health records, and demographic information. Methods will include descriptive statistics, data visualization, and predictive modeling.

Anticipated Findings

I aim to find areas that have not yet been researched, identify new directions for demographic research, and uncover new associations between health disparities and outcomes.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Saul Mota - Undergraduate Student, California State University, San Bernardino

MH and Substance Use

Scientific Questions Being Studied

How are social needs related to unmet mental health need?

  • One of many independent variables: transportation need
      ◦ Other IVs: insurance, food insecurity, social support, social isolation, racial discrimination, etc.
  • Dependent variable: unmet mental health need

Project Purpose(s)

  • Educational

Scientific Approaches

How are social needs related to unmet mental health need?

  • One of many independent variables: transportation need
      ◦ Other IVs: insurance, food insecurity, social support, social isolation, racial discrimination, etc.
  • Dependent variable: unmet mental health need

Anticipated Findings

Understanding the complexities of mental health and social needs. This research will help guide policies and practices around addressing social needs.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Comparing ancestry standardization methods for multiple PRS

Scientific Questions Being Studied

Previously, we compared 5 methods to standardize a colorectal cancer (CRC) PRS. These methods used linear models to adjust the mean and variance of the PRS by genetic ancestry, using principal components of ancestry or admixture. We found that using a genetically representative subsample of the data and adjusting for admixture resulted in standardized PRS closest to standard Normal. However, the preferred standardization method may depend on properties of the trait and PRS, such as heritability, the PRS development process and the number of SNPs included. We will compare these standardization methods across PRS for 5 conditions (breast cancer, coronary heart disease, hypercholesterolemia, prostate cancer, and type 2 diabetes). Results from this analysis will inform best practices for applying these standardization methods to other PRS/conditions, and development of random sampling techniques to ensure adequate representation of genetic ancestry within a population.

Project Purpose(s)

  • Methods Development

Scientific Approaches

We will select 20,000 unrelated participants (17.5% from each of AFR, ASN, AMR, EUR and 30% from OTH) from this cohort based on age, sex, availability of EHR data and genetic ancestry. We will randomly sample training sets (N=2000) from the participants using naive sampling as well as sampling based on genetic representation of the cohort. We will calculate the PRS for 5 traits and then standardize them using methods used previously (doi:10.1002/gepi.22590). We will compare the expected number of participants in the upper tails, by genetic ancestry, to the observed counts, to determine which method(s) give the best standardization. We will tabulate the performance of each method by training data used, heritability (h^2) and number of SNPs for each condition: Breast Cancer (h^2=0.3; 313 SNPs), Coronary Heart Disease (h^2=0.22; 5.4e5 SNPs), Hypercholesterolemia (h^2=0.56; 9k SNPs), Prostate Cancer (h^2=0.58; 269 SNPs), Type 2 Diabetes (h^2=0.15; 1e6 SNPs).
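
One way to picture the mean-and-variance adjustment and the upper-tail check is the sketch below. It is a simplified illustration on simulated data with a single ancestry principal component, not the procedure from the cited paper. The linear variance model, the 1.645 cutoff (upper 5% of a standard Normal), and all names are assumptions.

```python
import random
from statistics import mean, pstdev

def ols(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def standardize_prs(pcs, prs):
    """Adjust the PRS mean and variance as linear functions of one ancestry PC."""
    a, b = ols(pcs, prs)                         # mean model
    resid = [y - (a + b * x) for x, y in zip(pcs, prs)]
    c, d = ols(pcs, [r * r for r in resid])      # variance model on squared residuals
    # Guard against a non-positive predicted variance from the linear fit.
    return [r / max(c + d * x, 1e-8) ** 0.5 for x, r in zip(pcs, resid)]

# Simulated data: both the mean and the spread of the raw PRS drift with the PC.
random.seed(1)
pcs = [random.random() for _ in range(2000)]
prs = [2.0 * x + (1.0 + x) * random.gauss(0, 1) for x in pcs]

z = standardize_prs(pcs, prs)
observed_tail = sum(v > 1.645 for v in z)   # observed count in the upper 5% tail
expected_tail = 0.05 * len(z)               # expected count under a standard Normal
print(f"mean={mean(z):.3f} sd={pstdev(z):.3f} tail={observed_tail} vs {expected_tail:.0f}")
```

In practice the models would be fit on a training sample and applied to the remaining participants, with the tail comparison repeated within each genetic ancestry group.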

Anticipated Findings

We hope to determine optimal standardization methods for each PRS type, dependent on the heritability of the traits of interest and the number of SNPs involved, as well as the random sampling technique for the training data.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Michael Gatzen - Project Personnel, Broad Institute
  • Chris Kachulis - Project Personnel, Broad Institute
  • José Irizarry - Graduate Trainee, Tulane University

allofus_burden_analysis

Scientific Questions Being Studied

We are looking at the association between genetic diversity in the All of Us cohort and rare Mendelian diseases.

Project Purpose(s)

  • Population Health
  • Drug Development
  • Ancestry
  • Commercial

Scientific Approaches

We will assess how social health factors influence the severity of rare Mendelian diseases. We will assess whether common diseases are moderate forms of severe rare Mendelian diseases. We will examine how the genetic basis of these rare and common diseases is influenced by loss-of-function and missense variants.

Anticipated Findings

We anticipate that we will identify social health factors that influence the severity of rare Mendelian diseases. We anticipate that we will identify relationships between common diseases and rare Mendelian diseases. We may identify genetic variants associated with rare and common diseases.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Mosaic Monosomy X Quantification

Scientific Questions Being Studied

What is the incidence of mosaic loss of the X chromosome in individuals of diverse ancestries? Furthermore, what are the genetic drivers and phenotypic consequences of mosaic loss and are these different than previously characterized findings in European or East Asian populations? This question is critical for understanding the broader applicability of findings across populations, improving equity in genomic research, and identifying ancestry-specific risk factors.

Project Purpose(s)

  • Ancestry

Scientific Approaches

To calculate mosaic loss, the study will use whole genome sequence data of participants to quantify chromosome X dosage across individuals of different ages and ancestries. Pending the results of these findings, the study will use phenotypic data from the All of Us biobank, leveraging genome-wide association studies (GWAS) to understand phenotypic impacts of mosaic chromosome X loss.
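
A very simple depth-based view of chromosome X dosage is sketched below. It assumes diploid autosomes, two expected X copies, and that affected cells have lost exactly one X. The depth values are invented, and published mosaic-loss callers generally use more sensitive signals than a raw depth ratio.

```python
def chrx_dosage(x_depth, autosome_depth, ploidy=2):
    """Estimated chrX copy number from mean sequencing depths (autosomes assumed diploid)."""
    return ploidy * x_depth / autosome_depth

def mosaic_loss_fraction(dosage, expected_copies=2):
    """Fraction of cells missing one X, assuming all other cells carry expected_copies."""
    # dosage = expected*(1 - f) + (expected - 1)*f, which rearranges to f = expected - dosage
    return min(1.0, max(0.0, expected_copies - dosage))

# Made-up mean depths for one participant.
dosage = chrx_dosage(x_depth=28.5, autosome_depth=30.0)
fraction = mosaic_loss_fraction(dosage)
print(f"dosage={dosage:.2f}, estimated fraction of cells with X loss={fraction:.2f}")
```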

Anticipated Findings

The study will more accurately quantify the pattern of mosaic chromosome X loss in diverse populations. These findings would expand the understanding of mosaic chrX loss's genetic underpinnings.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

predictive models for OUD

Scientific Questions Being Studied

The objective of this study is to develop predictive models using machine learning that integrate clinical, social, genomic, and demographic features to identify patients who are at higher risk for opioid use disorder. This is important because we need to better identify patients at risk in order to improve how we allocate resources to those who are prescribed opioids and thereby reduce the incidence of opioid addiction.

Project Purpose(s)

  • Disease Focused Research (opioid use disorder)
  • Social / Behavioral
  • Ancestry

Scientific Approaches

We will use various machine learning approaches (e.g., deep learning, foundation models) to identify patients at risk for opioid use disorder. This will involve creating a cohort of all patients prescribed an opioid during their case. The population will be split into those who had or did not have a diagnosis of opioid use disorder (e.g., ICD10 F11.xx). Predictor variables that will be included are responses to survey questions (e.g., social determinants of health), demographic/geographic data, diagnosis codes, procedure codes, medications, and genomic information. These models will include genomic information from SNPs as well as markers discovered via GWAS. We will train models with a portion of the dataset and will validate the models on a separate test set.
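
The case definition and the train/test split described above can be sketched as follows. The cohort, person IDs, and the 80/20 split are made up for illustration. Only the use of ICD-10 F11 codes as the outcome comes from the description.

```python
import random

def label_oud(diagnosis_codes):
    """Return 1 if any ICD-10 code starts with F11 (opioid related disorders), else 0."""
    return int(any(code.upper().startswith("F11") for code in diagnosis_codes))

def train_test_split(ids, test_fraction=0.2, seed=0):
    """Shuffle person IDs reproducibly and hold out a separate test set."""
    ids = sorted(ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]

# Made-up cohort of patients prescribed an opioid: person ID -> diagnosis codes.
cohort = {
    101: ["F11.20", "M54.5"],
    102: ["M54.5"],
    103: ["f11.10"],
    104: ["I10"],
    105: ["K21.9"],
}
labels = {pid: label_oud(codes) for pid, codes in cohort.items()}
train_ids, test_ids = train_test_split(cohort)
print(labels, train_ids, test_ids)
```

Splitting by person ID keeps all of a patient's records on one side of the split, which avoids leakage between training and test data.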

Anticipated Findings

We anticipate that we can generate predictive models for opioid use disorder among patients who were prescribed an opioid, have chronic pain, and/or underwent surgery. This may provide clinicians with a tool to identify which of their patients are at high risk of addiction prior to prescribing opioids for pain.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Rodney Gabriel - Early Career Tenure-track Researcher, University of California, San Diego

Collaborators:

  • Varshini Sathish - Graduate Trainee, University of California, San Diego
  • Soraya Mehdipour - Other, University of California, San Diego
  • Sara Rosenthal - Senior Researcher, University of California, San Diego
  • Rohith Vutukuru - Graduate Trainee, University of California, San Diego
  • Ricardo Pietrobon - Project Personnel, Stanford University
  • Onkar Litake - Graduate Trainee, University of California, San Diego
  • Mihir Gujarathi - Graduate Trainee, University of California, San Diego
  • Lucas Teixeira - Research Fellow, Stanford University
  • Kathleen Fisch - Mid-career Tenured Researcher, University of California, San Diego
  • Ji-Qing Chen - Research Fellow, Stanford University
  • Geena Ildefonso - Other, University of California, San Diego
  • Daisy Chilin-Fuentes - Project Personnel, University of California, San Diego
  • Chunnan Hsu - Senior Researcher, University of California, San Diego
  • Charvi Bannur - Graduate Trainee, University of California, San Diego
  • Chirag Jain - Graduate Trainee, University of California, San Diego
  • Brian Park - Research Fellow, University of California, San Diego
  • Aditi Desai - Senior Researcher, Stanford University
  • Atharv Sunil Biradar - Graduate Trainee, University of California, San Diego

Duplicate of Germline Mutations that Hearts

Scientific Questions Being Studied

Previous studies have reported numerous heritable gene variants that can increase the risk of developing heart arrhythmia. We look to increase understanding of these gene variants and their connection to cancer risk in a more diverse population. We are also interested in exploring how these variants connect to other reported health problems in individuals who later develop cancer. Specifically, we intend to ask the following questions:

1. Are harmful germline gene variants a good predictor of whether an individual will develop heart issues during their life?
2. Are there other commonly reported health problems that can be linked to greater risk for cancer in people with these gene variants?
3. Do these findings hold across a diverse population?

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Ancestry

Scientific Approaches

We will create workflows that align, intersect, extract, integrate, and analyze known predisposition mutations for heart maladies in the All of Us cohort.
Align: We will use the UCSC genome browser tool to ensure that predisposition variants match the human reference build of All of Us.
Intersect: We will use “bedtools intersect” and “BigQuery” to identify predisposition variants in whole genome sequencing mutation files (VCFs).
Extract: We will store all suspected predisposition variants as a first “data freeze”. This dataset will be our “training-set”. We will use subsequent All of Us data releases as “test-sets” for any novel associations or statistical models we identify.
Integrate: Using genomics data and insurance billing codes, we will visualize the relationships between predisposition variants, cancer occurrences, and other reported health problems.
Analyze: We will build custom scripts in Python and R to identify associations found when combining genomics and phenotypic data.
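
The Intersect step can be pictured with the small pure-Python sketch below. It is not how bedtools is implemented. It only illustrates the coordinate logic of checking 1-based variant positions (as in VCF) against 0-based, half-open regions (as in BED). The regions and variants are invented.

```python
def intersect_variants(variants, regions):
    """Return variants whose position falls inside any region on the same chromosome.

    variants: list of (chrom, pos) with 1-based positions, as in VCF.
    regions:  list of (chrom, start, end) with 0-based, half-open coordinates, as in BED.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = []
    for chrom, pos in variants:
        # A 1-based position p covers the 0-based interval [p - 1, p).
        if any(start <= pos - 1 < end for start, end in by_chrom.get(chrom, [])):
            hits.append((chrom, pos))
    return hits

# Made-up predisposition regions and observed variants.
regions = [("chr1", 100, 200), ("chr2", 50, 60)]
variants = [("chr1", 100), ("chr1", 101), ("chr1", 200),
            ("chr1", 201), ("chr2", 55), ("chr3", 10)]
hits = intersect_variants(variants, regions)
print(hits)
```

Off-by-one differences between the two coordinate systems are a common source of missed or spurious matches, which is why the conversion is spelled out in the comment.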

Anticipated Findings

We expect to see that the presence of pathogenic gene variants can help predict a person’s risk for developing heart abnormalities. We anticipate that this finding will hold across a diverse population. We also expect to find other frequently reported health problems that associate with increased occurrence of heart disease.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Matthew Bailey - Early Career Tenure-track Researcher, Brigham Young University
  • Brian Kim - Undergraduate Student, Brigham Young University

Collaborators:

  • McKay Christenson - Undergraduate Student, Brigham Young University

Utilization of Advanced Rosacea Treatments in LHS+ Populations

Scientific Questions Being Studied

We aim to investigate the utilization and outcomes of advanced rosacea treatments, including laser therapy and compounded medications, among Latinx populations. This study will examine barriers to accessing these beyond-first-line treatments and their efficacy compared to standard therapies. Understanding these disparities is crucial for addressing gaps in dermatologic care and improving public health outcomes for underserved communities, particularly those with unique cultural and socioeconomic challenges. Exploring this data will help formalize strategies to enhance equitable access and tailor interventions to the needs of Latinx individuals.

Project Purpose(s)

  • Disease Focused Research (rosacea)
  • Population Health

Scientific Approaches

We will use the All of Us Research Program dataset to analyze demographic, clinical, and treatment data for Latinx individuals diagnosed with rosacea. Our study will employ a mixed-methods approach: quantitative analysis to assess the prevalence of advanced rosacea treatments (e.g., laser therapy, compounded medications) and qualitative analysis to explore barriers to accessing these treatments. Statistical methods, including logistic regression, will be used to identify predictors of advanced treatment use. We will also perform subgroup analyses to evaluate treatment outcomes by factors such as socioeconomic status, language preference, and geographic location. Tools such as R and Python will be used for data cleaning and statistical modeling, and NVivo will assist in analyzing qualitative data from survey responses if applicable. This approach will provide a comprehensive understanding of treatment disparities and inform targeted interventions to address them.

Anticipated Findings

We anticipate finding disparities in the use of advanced rosacea treatments, such as laser therapy and compounded medications, among Latinx individuals, with socioeconomic status, language preference, and insurance coverage being significant predictors. We also expect to identify barriers unique to this population, such as limited access to specialized care and treatment awareness. These findings would highlight inequities in rosacea care and contribute to the body of scientific knowledge by addressing gaps in understanding treatment accessibility among underserved populations. This work could guide culturally tailored interventions and policies to improve equitable dermatologic care.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Melissa Hernandez - Graduate Trainee, Loyola University Chicago
  • Cesar Ponce - Graduate Trainee, SUNY Upstate Medical University
  • Alexa DiNello - Graduate Trainee, Ohio State University

Breast reconstruction

Scientific Questions Being Studied

What is the prevalence of pro-thrombotic genes in patients who underwent breast reconstruction and had a free flap failure?

Project Purpose(s)

  • Disease Focused Research (Venous thrombosis)

Scientific Approaches

We will create a cohort of patients who underwent breast reconstruction using autologous tissue and then had flap failure, and investigate whether they carry pro-thrombotic genes.

Anticipated Findings

We anticipate finding pro-thrombotic genes in these patients, which may help elucidate why they had free flap failure.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Agustin Posso - Research Fellow, Beth Israel Deaconess Medical Center

Salivary Stones Study

Scientific Questions Being Studied

We aim to study which health conditions most commonly occur with salivary stones. Salivary stones are deposits in the submandibular or parotid glands that can cause persistent infection and often present with pain and swelling. We want to investigate risk factors such as diabetes, hypertension, and environmental factors in predicting the emergence of salivary stones. We hope that gaining insight into these predictive metrics will help guide prevention strategies and prioritize environmental cleanup.

Project Purpose(s)

  • Disease Focused Research (Sialolithiasis)

Scientific Approaches

We aim to structure this study as a retrospective analysis of patients with salivary stones matched to patients without salivary stones. We will collect patient demographic data as well as condition data about hypertension, diabetes, alcohol use, obesity, and others. We will then perform a logistic regression to identify the effect size of these different condition and demographic variables in predicting salivary stones. We will optimize the logistic regression with various machine learning packages in Python.
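
The matching step might look like the sketch below, which pairs each case with one unused control of the same sex and 5-year age band. The description does not state the matching criteria, so the choice of sex and age, the band width, and the 1:1 ratio are all assumptions, and the records are invented.

```python
from collections import defaultdict

def match_controls(cases, controls, age_band=5):
    """Greedy 1:1 matching on sex and age band; returns (case_id, control_id) pairs."""
    pool = defaultdict(list)
    for pid, age, sex in controls:
        pool[(sex, age // age_band)].append(pid)
    pairs = []
    for pid, age, sex in cases:
        candidates = pool[(sex, age // age_band)]
        if candidates:
            pairs.append((pid, candidates.pop(0)))  # each control is used at most once
    return pairs

# Made-up records: (person ID, age, sex).
cases = [("c1", 42, "F"), ("c2", 67, "M"), ("c3", 30, "F")]
controls = [("k1", 44, "F"), ("k2", 65, "M"), ("k3", 41, "F"), ("k4", 52, "M")]
pairs = match_controls(cases, controls)
print(pairs)
```

Cases with no eligible control (here `c3`) are left unmatched and would be dropped or handled separately before the regression.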

Anticipated Findings

We anticipate that a combination of health factors can help predict the occurrence of salivary stones. This would contribute to the body of scientific knowledge on prevention strategies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Khushi Bhatt - Graduate Trainee, University of California, Irvine

RGC Workspace 2: Genetic association of disease and health-related traits

Scientific Questions Being Studied

The primary scientific goal of this research is to improve our understanding of the basis of human disease and human health using genetics. We propose to focus on as many diseases and quantitative traits as possible. We also propose to deploy statistical methods that allow for analysis (or re-analysis) of genetic data, including gene-based burden tests and conditional analysis. We also propose to deploy phenotyping approaches for the most comprehensive capture of genes involved in human genetic diseases and traits. We plan to incorporate genetic ancestry into analysis approaches to optimize applicability of findings.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We plan to evaluate pre-existing summary statistics assessing single variant and rare variant analyses of the All of Us cohort. We also propose to access genotype data derived from sequence data as well as phenotype data to allow for definition of case, control and quantitative trait values, as well as demographic or covariate values such as age, sex, BMI, any relevant experimental batches or similar covariates, other genetic covariates, and/or principal components to allow for correction of fine-scale ancestry. We will perform association analyses using Regenie, Remeta, Metal or similar software. We plan to perform association tests for case-control status and quantitative traits with single variants, as well as aggregated variants (e.g. rare variant burden tests), using direct genotypes, imputed variants, and may use approximate correction methods that use LD.
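
To make the idea of an aggregated (burden) test concrete, the sketch below collapses rare variants in one gene into a carrier indicator and compares carriers between cases and controls with a one-sided Fisher exact test. This is a deliberately simplified stand-in. Tools such as Regenie use regression-based tests with covariates, and the genotypes and variant indices here are invented.

```python
from math import comb

def burden_carrier(genotypes, rare_variant_idx):
    """Return 1 if the person carries an alternate allele at any listed rare variant."""
    return int(any(genotypes[i] > 0 for i in rare_variant_idx))

def fisher_one_sided(a, b, c, d):
    """P(case carriers >= a) under the hypergeometric null for the table [[a, b], [c, d]]."""
    n_cases, n_carriers, n = a + b, a + c, a + b + c + d
    upper = min(n_cases, n_carriers)
    return sum(comb(n_carriers, k) * comb(n - n_carriers, n_cases - k)
               for k in range(a, upper + 1)) / comb(n, n_cases)

# Made-up alternate-allele counts at three variants in one gene; variants 0 and 2 are rare.
rare_idx = [0, 2]
cases = [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 2]]
controls = [[0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 0, 0]]

a = sum(burden_carrier(g, rare_idx) for g in cases)      # case carriers
c = sum(burden_carrier(g, rare_idx) for g in controls)   # control carriers
p_value = fisher_one_sided(a, len(cases) - a, c, len(controls) - c)
print(f"case carriers={a}, control carriers={c}, p={p_value:.4f}")
```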

Anticipated Findings

We expect to observe genetic association results for a variety of diseases and health-related traits in a non-identifiable manner. We will keep genetic ancestry at the forefront so that genetic results benefit as many ancestry groups as possible and are used in an ethical and equitable manner.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Sheldon Bai - Senior Researcher, Regeneron Genetics Center LLC

Systemic lupus erythematosus

Scientific Questions Being Studied

Gene variants in systemic lupus erythematosus (SLE) have been invaluable in uncovering insights into the underlying disease mechanisms. To enhance our understanding of SLE pathophysiology, we propose categorizing risk variants based on their impact on cell subsets.

We will use bioinformatic tools employed in functional genetics to assess genetic risks for a specific trait, condition, or disease. This tool enables us to comprehend how multiple genetic variations across the genome collectively influence a particular phenotype or outcome.

Project Purpose(s)

  • Disease Focused Research (autoimmune disease)
  • Population Health
  • Ancestry

Scientific Approaches

To achieve this, we will categorize both rare and common genetic variants within established biological pathways, utilizing publicly available functional genomics datasets. We will employ gene expression of risk loci to identify potential target genes that co-locate with SLE genome-wide association study (GWAS) hits.

Anticipated Findings

These insights will lay the groundwork for future well-powered investigations. We anticipate that this study will unveil insights into the underlying mechanisms of the condition and establish initial associations between genotypes and phenotypes in SLE. This research will provide essential guidance for optimizing the integration of genetics into SLE subset classification and offer valuable insights into the genetic architecture of multifactorial diseases.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Liyoung Kim - Research Associate, Boston Children's Hospital

Collaborators:

  • In-Hee Lee - Research Associate, Boston Children's Hospital

ZC3H12A

Scientific Questions Being Studied

We will examine the phenotypes associated with mutations in ZC3H12A to see whether there are any novel genotype-phenotype connections.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We will identify participants who carry variants of ZC3H12A, extract their phenotypes, track any phenotypes (novel ones in particular), and run genomic and phenotypic analyses.

Anticipated Findings

The findings will help characterize ZC3H12A and show whether specific mutations of ZC3H12A correspond to particular phenotypes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Anirudh Kesanapally - Research Assistant, National Human Genome Research Institute (NIH - NHGRI)

Collaborators:

  • Sofia Torreggiani - Graduate Trainee, National Human Genome Research Institute (NIH - NHGRI)

Metabolic Syndrome

Scientific Questions Being Studied

We want to look at metabolic syndrome and how it affects individuals. We will build a dataset for metabolic syndrome consisting of blood pressure, serum glucose, serum triglycerides, waist circumference, and HDL.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We will be focusing on creating a diverse dataset in our investigation. Our research methods consist of creating a dataset, and further analyzing the data.
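As an illustration only, the five components named above could be combined into a per-participant flag roughly as follows. The project does not state which definition of metabolic syndrome it uses; the cutoffs below are an assumption modeled on commonly cited NCEP ATP III-style thresholds (three or more of five components), medication use is ignored, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    """One participant's measurements (field names are hypothetical)."""
    sex: str                      # "M" or "F"
    waist_cm: float
    triglycerides_mg_dl: float
    hdl_mg_dl: float
    systolic_mm_hg: float
    diastolic_mm_hg: float
    fasting_glucose_mg_dl: float


def criteria_met(m: Measurements) -> int:
    """Count how many of the five components cross the assumed cutoffs."""
    waist_cut = 102 if m.sex == "M" else 88   # assumed sex-specific cutoffs
    hdl_cut = 40 if m.sex == "M" else 50
    flags = [
        m.waist_cm > waist_cut,
        m.triglycerides_mg_dl >= 150,
        m.hdl_mg_dl < hdl_cut,
        m.systolic_mm_hg >= 130 or m.diastolic_mm_hg >= 85,
        m.fasting_glucose_mg_dl >= 100,
    ]
    return sum(flags)


def has_metabolic_syndrome(m: Measurements) -> bool:
    """Three or more of the five components, under the assumed definition."""
    return criteria_met(m) >= 3
```

A real analysis would also need to settle which measurement per participant to use (for example, the most recent) and how to handle missing values, neither of which the project description specifies.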

Anticipated Findings

We anticipate identifying potential genetic contributions to an individual having metabolic syndrome.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Vir Trivedi - Undergraduate Student, Loyola University Chicago
  • Maya Sharma - Undergraduate Student, Loyola University Chicago
  • Isabelle Gregga - Project Personnel, Loyola University Chicago

Duplicate of How to Work with All of Us Genomic Data (Hail - Plink)(v7)

Scientific Questions Being Studied

Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Project Purpose(s)

  • Other Purpose (Demonstrate to the All of Us Researcher Workbench users how to get started with the All of Us genomic data and tools. It includes an overview of all the All of Us genomic data and shows some simple examples on how to use these data.)

Scientific Approaches

Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Anticipated Findings

Not applicable - these notebooks demonstrate, through example analyses, how to use Hail and PLINK to perform genome-wide association studies using the All of Us genomic data and phenotypic data.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Olivia Smith - Graduate Trainee, University of Texas at Austin

Collaborators:

  • Matthew Ming - Graduate Trainee, University of Texas at Austin
  • Jared Cole - Graduate Trainee, University of Texas at Austin

RGC Workspace 4: Study of genetic association of health-related traits

Scientific Questions Being Studied

The primary scientific goal of this research is to improve our understanding of the basis of human disease and human health using genetics. We propose to focus on as many diseases and quantitative traits as possible.
We also propose to deploy statistical methods that allow for analysis (or re-analysis) of genetic data, including gene-based burden tests and conditional analysis.
We also propose to deploy phenotyping approaches for the most comprehensive capture of genes involved in human genetic diseases and traits.
We plan to incorporate genetic ancestry into analysis approaches to optimize applicability of findings.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We plan to evaluate pre-existing summary statistics assessing single variant and rare variant analyses of the All of Us cohort. We also propose to access genotype data derived from sequence data as well as phenotype data to allow for definition of case, control and quantitative trait values, as well as demographic or covariate values such as age, sex, BMI, any relevant experimental batches or similar covariates, other genetic covariates, and/or principal components to allow for correction of fine-scale ancestry. We will perform association analyses using Regenie, Remeta, Metal or similar software. We plan to perform association tests for case-control status and quantitative traits with single variants, as well as aggregated variants (e.g. rare variant burden tests), using direct genotypes and imputed variants, and we may use approximate correction methods that rely on linkage disequilibrium (LD).
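To make the idea of an aggregated (burden) test concrete, here is a minimal conceptual sketch of collapsing rare variants in one gene into a single carrier indicator per sample. It is not how Regenie or any other named tool implements burden testing; the 1% allele-frequency cutoff, the function name, and the input layout are all assumptions chosen for illustration.

```python
def burden_scores(genotypes, allele_freq, max_af=0.01):
    """Collapse rare variants in one gene into a 0/1 carrier flag per sample.

    genotypes:   {sample_id: {variant_id: alternate-allele count}}
    allele_freq: {variant_id: alternate-allele frequency}
    max_af:      assumed rarity cutoff (hypothetical default of 1%)
    """
    rare = {v for v, af in allele_freq.items() if af <= max_af}
    return {
        sample: int(any(count > 0 for v, count in calls.items() if v in rare))
        for sample, calls in genotypes.items()
    }
```

The resulting indicator would then be tested against case-control status or a quantitative trait in a regression that includes the covariates listed above.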

Anticipated Findings

We expect to observe genetic association results for a variety of diseases and health-related traits in a non-identifiable manner. We will keep genetic ancestry at the forefront so that genetic results benefit as many ancestry groups as possible and are used in an ethical and equitable manner.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Sarah Graham - Other, Regeneron Genetics Center LLC
  • Niek Verweij - Senior Researcher, Regeneron Genetics Center LLC
  • Sheldon Bai - Senior Researcher, Regeneron Genetics Center LLC

Health Care Access & Risk Tiering Algorithms

Scientific Questions Being Studied

Data-driven methods for identifying individuals at high risk of adverse health events have become ubiquitous among payers and health care systems. These methods have come under scrutiny in light of evidence that algorithms may perpetuate health disparities. One major concern is that the data used to train such algorithms depend on access to the health care system, so the algorithms perform poorly for individuals with less access to health care services. In this study, we plan to test this theory by comparing the predictive performance of algorithms commonly used by insurers to identify high-risk patients across measures of health care access.

Project Purpose(s)

  • Population Health
  • Methods Development

Scientific Approaches

We will predict several common outcomes used by payers and hospitals for population health management services, including use of the ED and hospital (re)admissions, using demographic, claims, and EHR data. We will assess the performance of the algorithms by various definitions of patient access, using patient survey data on health care use and access.
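One way to picture the comparison described above is to compute a discrimination metric separately for each access group. The sketch below uses AUC as the metric, which is an assumption: the project does not name the performance measures it will use. The group labels and record layout are hypothetical.

```python
from collections import defaultdict


def auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when a group lacks one of the two classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """records: iterable of (access_group, label, risk_score); returns {group: AUC}."""
    grouped = defaultdict(lambda: ([], []))
    for group, label, score in records:
        grouped[group][0].append(label)
        grouped[group][1].append(score)
    return {g: auc(ys, ss) for g, (ys, ss) in grouped.items()}
```

A gap between the per-group values would be one signal of the performance degradation the study aims to quantify; the pairwise AUC here is quadratic in group size and is meant for illustration, not for large cohorts.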

Anticipated Findings

We anticipate that the findings from the study will help quantify the amount of performance degradation that may result in a risk-tiering algorithm when there is variation in the amount of access to health care services.

Demographic Categories of Interest

  • Geography
  • Access to Care

Data Set Used

Controlled Tier

Research Team

Owner:

  • Anna Zink - Research Fellow, University of Chicago

Collaborators:

  • Zhongyuan Liang - Graduate Trainee, University of California, Berkeley
  • Irene Chen - Graduate Trainee, Massachusetts Institute of Technology
  • Hongzhou Luan - Graduate Trainee, University of California, Berkeley
  • Erin Tan - Undergraduate Student, University of California, Berkeley

Duplicate of Data Wrangling in All of Us Program (v7)

Scientific Questions Being Studied

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research support issues that are useful to multiple AoU researchers.

Project Purpose(s)

  • Educational
  • Other Purpose (For use with office hours. Notebooks for adding code snippets useful for researchers. This is a placeholder for creating notebooks for best practices, among other things.)

Scientific Approaches

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research support issues that are useful to multiple AoU researchers.

Anticipated Findings

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, optimization), along with other research support issues that are useful to multiple AoU researchers.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Gabriel Goodney - Project Personnel, National Heart, Lung, and Blood Institute (NIH - NHLBI)

Duplicate of Calculate PRS to Mimic Drug Efficacy

Scientific Questions Being Studied

Cardiovascular diseases remain leading causes of mortality worldwide. While traditional risk factors provide valuable insights into disease susceptibility, they fall short in predicting individual disease trajectories. Polygenic risk scores (PRS) have emerged as promising tools for aggregating genetic information into clinically meaningful metrics. Current research has primarily focused on using PRS for disease onset prediction, with limited exploration of their utility in predicting disease progression. This gap is particularly significant for coronary artery disease (CAD), where understanding progression could inform therapeutic strategies and resource allocation. However, the relationship between PRS and disease trajectory remains inadequately characterized.
By leveraging AoU, we will evaluate whether PRS can enhance our ability to predict disease progression in CAD patients. We will then explore similar models in other common disease phenotypes.

Project Purpose(s)

  • Drug Development

Scientific Approaches

Our research will employ a multi-stage analytical approach combining genetic and clinical data analysis. Initially, our focus will be on reproducing the results of the latest publications on CAD-specific PRS using established genome-wide association study (GWAS) summary statistics and validated methodologies.

Next, we hope to extend the published PRS to disease progression analysis. We will define objective endpoints, including progression to severe CAD (determined by coronary intervention requirements) and major adverse cardiovascular events. We still intend to use standard methods for the PRS calculations, incorporating the most recent meta-analyses of CAD-associated variants. We will assess PRS prediction performance through calibration plots and area under the receiver operating characteristic curve (AUC-ROC).
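For readers unfamiliar with these terms, the sketch below shows the basic additive form of a polygenic score and the numbers behind a calibration plot. The project says only that "standard methods" will be used, so this is an assumed, simplified form; the function names, variant identifiers, and binning scheme are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1, or 2; imputed values may be fractional).

    dosages: {variant_id: dosage} for one person
    weights: {variant_id: per-allele effect size from GWAS summary statistics}
    Variants without a weight are skipped.
    """
    return sum(weights[v] * d for v, d in dosages.items() if v in weights)


def calibration_table(predicted, observed, n_bins=10):
    """Rows of (mean predicted risk, observed event rate) for equal-size risk bins."""
    pairs = sorted(zip(predicted, observed))
    if len(pairs) < n_bins:
        raise ValueError("need at least one observation per bin")
    size = len(pairs) // n_bins
    rows = []
    for b in range(n_bins):
        start = b * size
        end = len(pairs) if b == n_bins - 1 else start + size  # last bin takes the remainder
        chunk = pairs[start:end]
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        event_rate = sum(o for _, o in chunk) / len(chunk)
        rows.append((mean_pred, event_rate))
    return rows
```

Plotting the second column against the first gives a calibration plot: points near the diagonal indicate that predicted risks match observed event rates. The predicted risks would come from a model that takes the score as an input, a step not shown here.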

Anticipated Findings

We anticipate more accurate prediction of CAD progression. The findings could have immediate clinical applications by identifying high-risk individuals who might benefit from more intensive monitoring or aggressive intervention strategies. This aligns with the public interest by potentially reducing healthcare costs through better resource allocation and improving patient outcomes through personalized risk stratification.

The results will contribute to the broader understanding of genetic influences on disease progression, potentially informing drug development and clinical trial design. Furthermore, the methodological framework developed could be adapted for studying progression patterns in other complex diseases, maximizing the public health impact of this research.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

V7 ARI Genomics Workspace - 4-21-23

Scientific Questions Being Studied

We now have 4 goals in our research - this workspace has been created specifically for Goal #4.

1. Determine prevalence of autoimmune diseases, individually and as a class of disease, in the US.

2. Determine comorbidity of autoimmune diseases, including statistics on comorbidity of other autoimmune diseases and non-autoimmune diseases for each autoimmune disease.

3. Determine the impact of COVID-19 on the autoimmune and autoinflammatory disease population. This work will be conducted in parallel with work we are doing at University of Southern California under an IRB there.

4. Explore the genomic component of autoimmune diseases, particularly among patients with more than one autoimmune disease, so that the underlying mechanisms of disease among these diseases can be better understood.

Project Purpose(s)

  • Disease Focused Research (Autoimmune diseases)
  • Ancestry

Scientific Approaches

We will create three data sets for analysis:

1. A list of diseases rated in the following ways:

a. Evidence Class
i. Strong evidence it is autoimmune
ii. Moderate evidence it is autoimmune
iii. Weak evidence for autoimmunity
iv. A comorbidity of autoimmune disease
v. Symptom or symptom set with no known mechanism

b. Autoinflammatory versus autoimmune flag

c. “Not always autoimmune” flag – to indicate diseases that could have alternative mechanisms of cause

2. A list of patients, anonymized, with socioeconomic, geographic and other data that would be of interest to patients and public health officials to understand which communities are affected by these diseases
3. Outcomes data for patients over time assessing quality of life using PROMIS metrics
4. We will develop statistics analyzing the association of variants known to affect specific autoimmune diseases, to see whether those variants correlate with other autoimmune diseases.

Anticipated Findings

There are recognized associations between specific gene variants and some autoimmune diseases. We are going to explore whether those associations can be found in other autoimmune and autoinflammatory diseases. We hope this work can uncover the common mechanisms that underlie autoimmune conditions that appear to be unconnected but which are comorbid.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Stephen Kocsis - Project Personnel, Mayo Clinic

V7 ARI Workspace - 4-21-23

Scientific Questions Being Studied

We now have 4 goals in our research. This workspace is for goals 1 through 3. We have created a new workspace for Goal #4.

1. Determine prevalence of autoimmune diseases, individually and as a class of disease, in the US.

2. Determine comorbidity of autoimmune diseases, including statistics on comorbidity of other autoimmune diseases and non-autoimmune diseases for each autoimmune disease.

3. Determine the impact of COVID-19 on the autoimmune and autoinflammatory disease population. This work will be conducted in parallel with work we are doing at University of Southern California under an IRB there.

4. Explore the genomic component of autoimmune diseases, particularly among patients with more than one autoimmune disease, so that the underlying mechanisms of disease among these diseases can be better understood.

Project Purpose(s)

  • Disease Focused Research (Autoimmune diseases)
  • Population Health
  • Ancestry

Scientific Approaches

We will create three data sets for analysis:

1. A list of diseases rated in the following ways:

a. Evidence Class
i. Strong evidence it is autoimmune
ii. Moderate evidence it is autoimmune
iii. Weak evidence for autoimmunity
iv. A comorbidity of autoimmune disease
v. Symptom or symptom set with no known mechanism

b. Autoinflammatory versus autoimmune flag

c. “Not always autoimmune” flag – to indicate diseases that could have alternative mechanisms of cause

2. A list of patients, anonymized, with socioeconomic, geographic and other data that would be of interest to patients and public health officials to understand which communities are affected by these diseases
3. Outcomes data for patients over time assessing quality of life using PROMIS metrics

Anticipated Findings

The current NIH estimate of 23.5 million people with autoimmune disease was a guess by a knowledgeable clinician and has no scientific support. As a consequence, there are numerous figures in the public sphere and nobody knows which one is correct.

Many reports say autoimmune diseases are on the increase, but since the number is unknown, it is impossible to say whether this is a public health issue or not. Having a methodology that can be used to recompute the number of people with autoimmune disease will help us understand if these reports are true.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Francis Ratsimbazafy - Other, All of Us Program Operational Use
  • Stephen Kocsis - Project Personnel, Mayo Clinic
  • Jun Qian - Other, All of Us Program Operational Use
  • Jeremy Harper - Senior Researcher, Autoimmune Registry
  • Jeffrey Green - Project Personnel, Autoimmune Registry
  • Ingrid He - Project Personnel, Autoimmune Registry
  • Emily Holladay - Project Personnel, Autoimmune Registry
  • Chenchal Subraveti - Project Personnel, All of Us Program Operational Use
  • Boyd Ingalls - Project Personnel, Autoimmune Registry
  • Adnaan Jhetam - Project Personnel, Autoimmune Registry
  • Alexander Burrows - Research Assistant, Autoimmune Registry
  • Jagannadha Avasarala - Other, University of Kentucky
Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.