Research Projects Directory

11,945 active projects

This information was updated 6/21/2024

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

Shared Lifestyles and Genetic Risk Factors of MAFLD and Diabetes

Scientific Questions Being Studied

Metabolic dysfunction-associated fatty liver disease (MAFLD) and diabetes are two common metabolic disorders. Unhealthy lifestyles and genetic mutations affecting biochemical processes are causes of both MAFLD and diabetes. The question is how shared lifestyle and genetic risk factors contribute to the development of MAFLD and diabetes, compared with people without these diseases.

Project Purpose(s)

  • Disease Focused Research (metabolic dysfunction-associated fatty liver disease and diabetes)

Scientific Approaches

Depending on the data available, I plan to compare the lifestyles (including sedentary time, physical activity, high-fat diet, etc.) and genetics of those with both MAFLD and diabetes to those with only one of the two diseases or with neither, to identify potentially associated lifestyle risk factors and genes. I will use only All of Us data and perform statistical analysis on it.

Anticipated Findings

We anticipate that lack of exercise and a high-fat diet may be shared risk factors for MAFLD and diabetes. Through our analysis, we expect to derive recommendations for physical activity time and dietary fat intake that decrease the incidence of MAFLD and diabetes. I also hope this research can identify shared genes that contribute to the development of MAFLD and diabetes, which could help us prevent or treat both diseases more precisely.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Guangqin Xiao - Research Fellow, Harvard T. H. Chan School of Public Health

Using AI to Predict Cardiovascular Disease

Scientific Questions Being Studied

I want to use Artificial Intelligence to predict at what age a person is most likely to develop moderate to severe heart disease based on a variety of inputs, including their age, family history, other medical conditions, and lifestyle. Cardiovascular diseases are the leading cause of death globally, according to data from the World Health Organization (WHO). Applying modern technology to such a widespread and preventable disease will help people take preventative steps earlier and reduce the chances of an untimely death.

Project Purpose(s)

  • Disease Focused Research (Cardiovascular Disease)
  • Methods Development

Scientific Approaches

Our training data will be the group of people who have already been diagnosed with a form of heart disease, and at what age they were given this diagnosis. Our inputs to the model will include traditional risk factors of cardiovascular disease, such as age, height, and weight of a person, if they’ve already been diagnosed with high blood pressure, high cholesterol, and/or diabetes, family history of high cholesterol, high blood pressure and/or diabetes, their smoking and drinking habits, and their physical lifestyle. We will also explore various other inputs included within the All of Us dataset to investigate whether there are additional inputs which may be a significant indicator of developing heart disease. Modern machine learning methods such as supervised classification algorithms and supervised regression algorithms will be implemented using Python. Our output will be a potential age range of when they are likely to develop a moderate to severe diagnosis of heart disease.

Anticipated Findings

From this study, I hope to have created a model for predicting when a person may develop heart disease with an accuracy of at least 90%. This model would be a valuable contribution in utilizing artificial intelligence and machine learning in the realm of cardiovascular health. I'd also be interested in incorporating Fitbit data and seeing how that impacts performance.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Ethan Steinberg - Project Personnel, Stanford University

DuplicateofNuclearGeneticControlofmtDNACopyNumberHeteroplasmy

Scientific Questions Being Studied

This workspace was duplicated from an accessible repository generated as part of the study: "Nuclear genetic control of mtDNA copy number and heteroplasmy in humans". Please see https://github.com/rahulg603/mtSwirl.

Current Project info: We hope to assess the relationship between mtDNA SNPs and cardiometabolic and endocrine traits such as hypothyroidism and diabetes in the AoU cohort.

Project Purpose(s)

  • Disease Focused Research (mitochondrial phenotypes, common diseases (e.g., heart disease, type 2 diabetes))
  • Ancestry

Scientific Approaches

This workspace was duplicated from an accessible repository generated as part of the study: "Nuclear genetic control of mtDNA copy number and heteroplasmy in humans".

Project information: We plan to use the previously quantified mtDNA phenotypes to replicate findings from a limited mtDNA-wide PHEWAS of cardiometabolic- and endocrine-related phecodes that was performed in a different cohort.

Anticipated Findings

We anticipate that our approach will help elucidate the relationship between mitochondrial SNPs and several important traits, and how these associations may differ by ancestry.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

AL and BC risk

Scientific Questions Being Studied

Does chronic stress or allostatic load increase one's risk of developing female breast cancer? At what time point, pre-diagnosis, does an increased allostatic load significantly increase breast cancer risk?

Project Purpose(s)

  • Disease Focused Research (female breast cancer)
  • Social / Behavioral
  • Methods Development

Scientific Approaches

Women aged 18 - 90 years
Allostatic Load at 25, 30, 35, 40, 45 years of age
Cohort 1: Lifetime Breast Cancer Diagnosis
Cohort 2: No Lifetime Breast Cancer Diagnosis
Age of Diagnosis
Race
Sexual orientation
SES
Income
Insurance
Breast Cancer Characteristics
Family history of breast cancer
Breast density on mammogram
Rural vs. urban setting

Anticipated Findings

Allostatic load increases the risk of breast cancer. Allostatic load is elevated at age 25 in patients who develop breast cancer later in life.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sexual Orientation

Data Set Used

Registered Tier

Research Team

Owner:

SV

Scientific Questions Being Studied

What is the relationship between physical activity levels and the occurrence of heart failure?
What impact do smoking and alcohol consumption have on heart failure risk?

Project Purpose(s)

  • Educational

Scientific Approaches

I will create two datasets (one for participants with a heart failure condition and the other for participants without it) to conduct statistical analysis.

Anticipated Findings

Discovering specific lifestyle factors (e.g., smoking, sedentary behavior) strongly correlated with heart failure risk.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Yina Hou - Graduate Trainee, Tennessee State University

Bleeding risk on anticoagulant or antithrombotic therapy

Scientific Questions Being Studied

Bleeding is a major adverse event for patients with anticoagulant or antithrombotic therapy. Identifying genetic, environmental, and comorbidity risks for bleeding with anticoagulant or antithrombotic therapy is required.

Project Purpose(s)

  • Drug Development
  • Ancestry

Scientific Approaches

We plan to use the All of Us database to identify a cohort of patients on anticoagulant or antithrombotic therapy. We will identify the diseases that increase bleeding and will examine genetic risks in patients without these diseases.

Anticipated Findings

We anticipate identifying genetic risks and disease risks for bleeding with anticoagulant or antithrombotic therapy. We can change the duration of therapy with a more accurate prediction model.

Demographic Categories of Interest

  • Geography

Data Set Used

Controlled Tier

Research Team

Owner:

Impact of COVID-19 on the Hispanic Community

Scientific Questions Being Studied

This Workspace will be used for the “Impact of COVID-19 on the Hispanic Community” Driver Project, that will go beyond validation and result in novel findings. The study aims to replicate findings from the following articles: “COVID-19 Pandemic: Disparate Health Impact on the Hispanic/Latinx Population in the United States” (Gil et al. 2020), “Life in the Time of COVID-19: a Case Study of Community Health” (Schelly 2021), “Racial/Ethnic Disparities In COVID-19 Exposure Risk, Testing, And Cases At The Subcounty Level In California” (Reitsma et al. 2021), that shed light on how COVID-19 has disproportionately affected the Hispanic community in the United States. We will focus on replicating the findings as they relate to the social determinants of health (SDOH) such as income, education, coexisting medical conditions such as obesity and diabetes, lack of access to health care, language barriers, working conditions, and living conditions.

Project Purpose(s)

  • Social / Behavioral

Scientific Approaches

1. Has the Hispanic community been disproportionately affected by COVID-19?
What is the impact COVID-19 has had on the U.S. Hispanic Community?

2. What role have social determinants of health, such as income, education, and access to health care, played in the disproportionate effect of COVID-19 on the Hispanic community?

Anticipated Findings

We expect that the Hispanic cohort in the All of Us Research Program reflects the disproportionate effect of COVID-19 on the Hispanic community found in the aforementioned articles. We expect to see that a large proportion of Hispanics will have had COVID-19, or known someone who did, or lack access to healthcare facilities in the event of a COVID-19 infection, based on COPE Survey findings. These findings would reinforce the importance of taking social determinants of health into account when creating policy relating to access to health care and safety net programs for the Hispanic community in the United States.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

Impact of COVID-19 on the Hispanic Community - Dataset v7

Scientific Questions Being Studied

This Workspace will be used for the “Impact of COVID-19 on the Hispanic Community” Driver Project, that will go beyond validation and result in novel findings. The study aims to replicate findings from the following articles: “COVID-19 Pandemic: Disparate Health Impact on the Hispanic/Latinx Population in the United States” (Gil et al. 2020), “Life in the Time of COVID-19: a Case Study of Community Health” (Schelly 2021), “Racial/Ethnic Disparities In COVID-19 Exposure Risk, Testing, And Cases At The Subcounty Level In California” (Reitsma et al. 2021), that shed light on how COVID-19 has disproportionately affected the Hispanic community in the United States. We will focus on replicating the findings as they relate to the social determinants of health (SDOH) such as income, education, coexisting medical conditions such as obesity and diabetes, lack of access to health care, language barriers, working conditions, and living conditions.

Project Purpose(s)

  • Social / Behavioral

Scientific Approaches

1. Has the Hispanic community been disproportionately affected by COVID-19?
What is the impact COVID-19 has had on the U.S. Hispanic Community?

2. What role have social determinants of health, such as income, education, and access to health care, played in the disproportionate effect of COVID-19 on the Hispanic community?

Anticipated Findings

We expect that the Hispanic cohort in the All of Us Research Program reflects the disproportionate effect of COVID-19 on the Hispanic community found in the aforementioned articles. We expect to see that a large proportion of Hispanics will have had COVID-19, or known someone who did, or lack access to healthcare facilities in the event of a COVID-19 infection, based on COPE Survey findings. These findings would reinforce the importance of taking social determinants of health into account when creating policy relating to access to health care and safety net programs for the Hispanic community in the United States.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Kyle Melin - Mid-career Tenured Researcher, University of Puerto Rico Medical Sciences
  • William Agyapong - Project Personnel, University of Texas at El Paso

predictive models for OUD

Scientific Questions Being Studied

The objective of this study is to develop predictive models using machine learning that will integrate clinical, social, genomic, and demographic features to identify patients who are at higher risk for opioid use disorder. This is important because better identification of at-risk patients would improve how we allocate resources to those who are prescribed opioids and thereby reduce the incidence of opioid addiction.

Project Purpose(s)

  • Disease Focused Research (opioid use disorder)
  • Social / Behavioral
  • Ancestry

Scientific Approaches

We will use various machine learning approaches (e.g., deep learning, foundation models) to identify patients at risk for opioid use disorder. This will involve creating a cohort of all patients prescribed an opioid during their case. The population will be split into those who had or did not have a diagnosis of opioid use disorder (e.g., ICD10 F11.xx). Predictor variables that will be included are responses to survey questions (e.g., social determinants of health), demographic/geographic data, diagnosis codes, procedure codes, medications, and genomic information. These models will include genomic information from SNPs as well as markers discovered via GWAS. We will train models with a portion of the dataset and will validate the models on a separate test set.
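The train/test step described above can be sketched as follows. This is a minimal illustration using only the Python standard library and synthetic labels, not the project's actual pipeline; the function name `stratified_split`, the 80/20 ratio, and the toy label counts are all assumptions made for the example. It splits record indices so that the proportion of diagnosed patients is the same in the training and test sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split record indices into train/test sets, preserving the label ratio."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])    # held out for validation
        train.extend(indices[n_test:])   # used to fit the model
    return sorted(train), sorted(test)


# Synthetic labels (hypothetical): 1 = opioid use disorder diagnosis, 0 = none.
labels = [1] * 10 + [0] * 40
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
print(len(train_idx), len(test_idx))
```

Stratifying by outcome matters when the diagnosis is rare, because a purely random split could leave the test set with very few positive cases.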

Anticipated Findings

We anticipate that we can generate predictive models for opioid use disorder among patients who were prescribed an opioid, have chronic pain, and/or underwent surgery. This may provide clinicians with a tool to identify which of their patients are at high risk of addiction before prescribing opioids for pain.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Rodney Gabriel - Early Career Tenure-track Researcher, University of California, San Diego

Collaborators:

  • Varshini Sathish - Graduate Trainee, University of California, San Diego
  • Soraya Mehdipour - Other, University of California, San Diego
  • Sierra Simpson - Research Fellow, University of California, San Diego
  • Sara Rosenthal - Senior Researcher, University of California, San Diego
  • Onkar Litake - Graduate Trainee, University of California, San Diego
  • Daisy Chilin-Fuentes - Project Personnel, University of California, San Diego
  • Brian Park - Research Fellow, University of California, San Diego

Skin conditions in people with diabetes

Scientific Questions Being Studied

The primary aim is to understand descriptive characteristics and outcomes of skin conditions in people with diabetes.

Project Purpose(s)

  • Disease Focused Research (Skin conditions in diabetes)

Scientific Approaches

We will use descriptive characteristics to better understand skin conditions in people with diabetes.

Anticipated Findings

The current study anticipates that people with diabetes and skin conditions have poor glycemic control and more comorbidities. Findings from this study will help advance knowledge of skin conditions in people with diabetes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Sydney Chon - Undergraduate Student, Houston Methodist Research Institute

Duplicate of How to Run Notebooks in the Background_Sierra

Scientific Questions Being Studied

Some analyses take some time to run. Currently, the researcher has to wait for their job to run because if they are logged out of the system, their code will stop working and will not be executed. This is problematic for users working on datatypes such as Fitbit and Genomics.

To avoid this interruption, this notebook runs code in the background.

Project Purpose(s)

  • Educational
  • Other Purpose (The notebook in this workspace shows how to run notebooks in the background even if the user is logged out of the workbench.)

Scientific Approaches

To run notebooks in the background, we use nbconvert, a Jupyter tool distributed as a Python package. Users specify the name of the notebook that they need executed. After that, they just need to run every cell in this notebook.
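As a rough sketch of the idea, and not necessarily what the workspace notebook itself does, a notebook can be executed by launching the `jupyter nbconvert` command line as a detached process. The helper names `build_nbconvert_command` and `run_in_background`, the notebook file names, and the log file name are hypothetical. Whether the job survives a logout also depends on the compute environment staying up.

```python
import subprocess


def build_nbconvert_command(notebook, output):
    """Build the argv that executes `notebook` and writes the result to `output`."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",                  # keep the result as a notebook
        "--execute", notebook,               # run every cell from top to bottom
        "--output", output,                  # file name for the executed copy
        "--ExecutePreprocessor.timeout=-1",  # disable the per-cell timeout
    ]


def run_in_background(notebook, output, log_path="nbconvert.log"):
    """Start nbconvert as a detached process and return its Popen handle."""
    log_file = open(log_path, "ab")
    return subprocess.Popen(
        build_nbconvert_command(notebook, output),
        stdout=log_file,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # POSIX only: detach from the launching session
    )


# Hypothetical notebook names, used only to show the resulting command line.
command = build_nbconvert_command("analysis.ipynb", "analysis_executed.ipynb")
print(" ".join(command))
```

Calling `run_in_background("analysis.ipynb", "analysis_executed.ipynb")` would start the job and return immediately; progress and errors would then be written to the log file rather than to the notebook that launched it.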

Anticipated Findings

There are no anticipated findings, as this workspace is for educational purposes only.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Sierra Simpson - Research Fellow, University of California, San Diego

Collaborators:

  • Rodney Gabriel - Early Career Tenure-track Researcher, University of California, San Diego

Brain tumor

Scientific Questions Being Studied

I am exploring this data set to formulate a specific question while learning how to use R and this data set. I do basic science research on brain tumors and vitamin metabolism and am generally curious about their relationship with exercise and healthy metabolic markers, and whether this correlates with better cancer outcomes. I am planning to use the wearables data to assess exercise.

Project Purpose(s)

  • Disease Focused Research (Brain tumor)
  • Social / Behavioral

Scientific Approaches

I plan to use the conditions data set to look at brain tumors. I will also use the wearables data to assess movement and exercise, and metabolic markers such as LDL and HbA1C to assess health.

Anticipated Findings

There have been some studies that have shown that exercise is beneficial for cancer patients. I think that this is important to explore in people from lower SES backgrounds who may not have as many resources (time/money) to access exercise.

Demographic Categories of Interest

  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

NASH

Scientific Questions Being Studied

I am looking to explore the data and see what data are provided for patients with NASH. I want to examine whether BMI and bile acids are implicated in the pathogenesis and compensation of NASH in non-obese individuals.

Project Purpose(s)

  • Disease Focused Research (NASH)

Scientific Approaches

I will be looking at all patients diagnosed with NASH, excluding those who progressed to more advanced liver disease. I will stratify by BMI and examine whether bile acids are implicated.

Anticipated Findings

This can lead us to understand the pathogenesis of NASH in non-obese patients and help us identify therapeutics that modulate bile acids.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Anisha Jain - Graduate Trainee, University of Pittsburgh

20240620 class

Scientific Questions Being Studied

Explore the association between Ikzf1/Ikzf2/HLA SNPs, emotional health and wellbeing, and type 1 diabetes outcome.

Project Purpose(s)

  • Educational

Scientific Approaches

Correlation between emotional health and wellbeing with type 1 diabetes outcome among patients stratified by Ikzf1/Ikzf2/HLA SNPs.

Anticipated Findings

Emotional health and wellbeing will have varying effects on type 1 diabetes outcomes in patients with different SNPs, findings that will impact prevention, diagnosis, and treatment.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • David Gao - Graduate Trainee, University of Pittsburgh

postpartum_hypertension

Scientific Questions Being Studied

I am looking to better understand the etiology of chronic hypertension which develops in the first year postpartum in women who had hypertensive disorders of pregnancy and were normotensive before pregnancy. To do this, I am hoping to explore demographics, survey data, vital sign data including blood pressure, and possibly genomic data in women who have given birth. The temporal aspect of this data is crucial, so I am hoping to explore what data is sufficiently present so that I may explore some of these factors in the first year postpartum. Currently, the pathogenesis of new onset hypertension which develops postpartum is poorly understood, and this study aims to develop that understanding further.

Project Purpose(s)

  • Disease Focused Research (pre-eclampsia)

Scientific Approaches

I am not sure which approaches I plan to use, as this is currently exploratory. Once I better understand the data, I will make a new workspace detailing my scientific approaches.

Anticipated Findings

I am not sure of the anticipated findings, as this is currently exploratory. Once I better understand the data, I will make a new workspace detailing my scientific approaches.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Alexis Cenname - Graduate Trainee, University of Pittsburgh

Subsetting SNPs

Scientific Questions Being Studied

These results will be used for running future GWAS in the Wheeler Lab. We are subsetting the available genomic data to the SNPs included in Wheeler Lab models to make GWAS more time- and cost-effective. We will then use the GWAS summary statistics with our prediction models to study complex trait genetics and potential biological pathways/mechanisms.

Project Purpose(s)

  • Other Purpose (This data will be used for future academic research projects in the Wheeler Lab)

Scientific Approaches

These results will be used for running future GWAS in the Wheeler Lab. We are subsetting the available All of Us genomic data to the SNPs included in Wheeler Lab omics models to make GWAS more time- and cost-effective. We will then use the GWAS summary statistics with our omics prediction models (using tools such as PRS-CSx and PrediXcan).

Anticipated Findings

These results will be used for running future GWAS in the Wheeler Lab. We will use the future GWAS sumstats with our omics prediction models to study complex trait genetics and potential biological pathways/mechanisms.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Vir Trivedi - Undergraduate Student, Loyola University Chicago
  • Maya Sharma - Undergraduate Student, Loyola University Chicago
  • Jacob Grandinetti - Undergraduate Student, Loyola University Chicago
  • Heather Wheeler - Mid-career Tenured Researcher, Loyola University Chicago

Duplicate of Phenotype - Breast Cancer (v7)

Scientific Questions Being Studied

The Notebooks in this Workspace can be used to implement well-known phenotype algorithms in one’s own research.

This is for basic learning on how to use the All of Us data.

Project Purpose(s)

  • Disease Focused Research (cancer)
  • Population Health
  • Educational
  • Methods Development
  • Control Set
  • Other Purpose (This is an All of Us Phenotype Library Workspace created by the Researcher Workbench Support team. It is meant to demonstrate the implementation of key phenotype algorithms within the All of Us Research Program cohort.)

Scientific Approaches

This is a temporary workspace that I duplicated in order to learn how to parse All of Us phenotype data.

Anticipated Findings

By reading and running the Notebooks in this Phenotype Library Workspace, researchers can implement the following phenotype algorithms:

Ning Shang, George Hripcsak, Chunhua Weng, Wendy K. Chung, & Katherine Crew. Breast Cancer. Retrieved from https://phekb.org/phenotype/breast-cancer.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Matthew Bailey - Early Career Tenure-track Researcher, Brigham Young University

Collaborators:

  • Camille Krieger - Undergraduate Student, Brigham Young University
  • Colin Wessman - Undergraduate Student, Brigham Young University

BPD

Scientific Questions Being Studied

This research aims to:

Evaluate the effectiveness of various treatments for Borderline Personality Disorder (BPD) using a Python algorithm, focusing on 12-month outcomes like hospitalizations and medication adjustments.

Investigate how treatment outcomes differ across demographics (race, age, socio-economic status).

Analyze patterns of co-occurring mental health conditions (depression, anxiety, PTSD) in BPD and their impact on treatment outcomes.

Study healthcare utilization trends among individuals with BPD and identify predictors of high healthcare use (hospital visits, ED visits, outpatient care, refills, telehealth visits).

Project Purpose(s)

  • Disease Focused Research (Bipolar Disorder)

Scientific Approaches

Dataset Description:
The primary dataset will consist of longitudinal clinical data from patients diagnosed with Borderline Personality Disorder. This dataset will include demographic information (such as age, race, socio-economic status), clinical variables (symptom severity, comorbid conditions), treatment history (therapeutic interventions received, medications prescribed), and healthcare utilization metrics.

Research Methods:
Data Analysis: Descriptive statistics will be used to characterize the study population and summarize treatment outcomes and healthcare utilization patterns.
Machine Learning Algorithms: Python-based machine learning algorithms will be applied to identify predictive models for treatment outcomes and healthcare utilization. Techniques such as classification and regression will be employed.
Statistical Analysis: Multivariate analysis techniques will be used to analyze the relationship between demographic factors, comorbidities, and treatment outcomes.

Anticipated Findings

  • The study aims to identify which therapeutic interventions (such as Dialectical Behavior Therapy, Cognitive Behavioral Therapy, or medication regimens) are most effective for improving long-term outcomes in BPD patients. This knowledge can guide clinicians in selecting the most appropriate treatment strategies tailored to individual patient characteristics and needs.
  • By analyzing treatment outcomes across different demographic groups (such as race, age, and socio-economic status), the study aims to uncover disparities in healthcare access and outcomes. Understanding these disparities can inform efforts to reduce inequities and improve healthcare delivery for diverse patient populations.
  • By identifying predictors of high healthcare utilization among BPD patients (including hospitalizations, ED visits, outpatient visits, and telehealth usage), the study aims to optimize healthcare resource allocation and improve cost-effectiveness in managing this patient population.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Yeon Mi Hwang - Research Fellow, Stanford University

Depression

This project intends to study how to create an AI model based on patients' medical history. This involves understanding the variables within medical histories that are most predictive of certain outcomes or conditions.

Scientific Questions Being Studied

This project intends to study how to create an AI model based on patients' medical history. This involves understanding the variables within medical histories that are most predictive of certain outcomes or conditions.

Project Purpose(s)

  • Disease Focused Research (mental depression)

Scientific Approaches

This project plans to use datasets containing patients' medical histories, likely including variables such as demographics, medical conditions, medications, and treatment histories. It will use machine learning and artificial intelligence techniques to analyze these datasets, likely including methods such as logistic regression, decision trees, or neural networks. It may also employ natural language processing (NLP) to extract information from unstructured medical notes.

Anticipated Findings

The anticipated findings from this study would be the development of an AI model that can effectively predict certain health outcomes or conditions based on patients' medical histories. These findings would contribute to the body of scientific knowledge by demonstrating the potential of AI in healthcare for improving diagnosis, treatment, and patient outcomes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Yili Lin - Graduate Trainee, George Mason University

SZ Radiation

I am assessing radiation dermatitis in terms of assorted patient-relevant aspects using large datasets.

Scientific Questions Being Studied

I am assessing radiation dermatitis in terms of assorted patient-relevant aspects using large datasets.

Project Purpose(s)

  • Disease Focused Research (Radiation dermatitis)

Scientific Approaches

I plan to use the All of Us research program to provide a large database from which to assess radiation dermatitis.

Anticipated Findings

I expect to derive a broad set of patient-relevant metrics through data analysis and to use them to investigate factors related to radiation dermatitis.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Improving Prediction of Substance Use

Hazardous alcohol and opioid use are prevalent among persons with HIV (PWH) in the US, and contribute to numerous adverse outcomes including HIV disease progression, various co-morbidities, and premature mortality. A significant proportion of PWH co-use alcohol and opioids, which…

Scientific Questions Being Studied

Hazardous alcohol and opioid use are prevalent among persons with HIV (PWH) in the US, and contribute to numerous adverse outcomes including HIV disease progression, various co-morbidities, and premature mortality. A significant proportion of PWH co-use alcohol and opioids, which worsens HIV-related outcomes. Despite the notable prevalence of these types of substance use and their related negative sequelae, alcohol and opioid use are often underdiagnosed among PWH in part due to the widespread use of self-report which is prone to underreporting. Alcohol and opioid use lead to physiological changes (e.g., hematological changes, increased levels of liver-derived enzymes) which are often measured in routine lab tests (e.g., complete blood count, metabolic panel). This study aims to assess whether lab and other health data could be leveraged with machine learning methods to predict alcohol use, opioid use, and co-use in a nationwide sample of PWH in the All of Us Research Program.

Project Purpose(s)

  • Population Health

Scientific Approaches

This study will be conducted among people with HIV in the All of Us Research Program. Within this cohort of individuals with HIV, we will develop datasets that include substance use lab data along with other commonly collected lab and health data (e.g., complete blood count, age, weight, blood pressure). To answer our scientific questions, we will use machine learning to build predictive models for different types of substance use (i.e., alcohol use, opioid use, and alcohol and opioid co-use). These machine learning methods will generate many predictive models, and we will subsequently identify which are the optimal models and whether these optimal models are useful in predicting each type of substance use.

Anticipated Findings

Findings from this analysis will be informative because our main research question is whether lab and other health data are useful in predicting certain types of substance use. We may find that lab and other health data are not predictive of certain types of substance use and that other approaches are needed. On the other hand, we may find that lab and other health data are very useful for predicting substance use in clinical and research settings; findings from this study could then serve as preliminary data for future work developing a predictive tool to improve identification of substance use in clinical settings to better identify PWH in need of additional support, and also improve substance use measurement in research studies.

Demographic Categories of Interest

  • Others

Data Set Used

Registered Tier

Research Team

Owner:

SkinCancer

This paper seeks to provide a comprehensive review of non-genetic risk factors for skin cancer by examining the complete medical histories of patients. By including a broader range of risk factors beyond those traditionally considered by the USPSTF, this review…

Scientific Questions Being Studied

This paper seeks to provide a comprehensive review of non-genetic risk factors for skin cancer by examining the complete medical histories of patients. By including a broader range of risk factors beyond those traditionally considered by the USPSTF, this review aims to enhance our understanding of the multifaceted nature of skin cancer risk.

Project Purpose(s)

  • Disease Focused Research (skin cancer)

Scientific Approaches

This is a retrospective case-control study. We will examine the probability of diagnosis among people with skin cancer and people without skin cancer.
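As an illustration only: in a case-control design, the association between a candidate risk factor and the disease is commonly summarized as an odds ratio from a 2x2 table. The project description does not state which measure will be used, so the sketch below is an assumption. The function name, the Woolf (log-scale Wald) confidence interval, and the counts are hypothetical and are not All of Us data.

```python
import math


def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Return (odds ratio, lower, upper) with a 95% Woolf confidence interval."""
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this estimate")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's method
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se_log)
    upper = math.exp(math.log(estimate) + 1.96 * se_log)
    return estimate, lower, upper


# Hypothetical counts: 40 of 100 cases and 20 of 100 controls had the exposure
or_estimate, ci_lower, ci_upper = odds_ratio(40, 60, 20, 80)
print(f"OR = {or_estimate:.2f}, 95% CI {ci_lower:.2f} to {ci_upper:.2f}")
```

A confidence interval that excludes 1 would suggest an association for that single factor; a full analysis would also adjust for confounders, for example with logistic regression.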

Anticipated Findings

The proposed predictive model (including risk factors) can be used to recommend skin cancer screening where it is needed. We hope to contribute to more accurate risk assessment, earlier detection, and improved patient outcomes in the battle against skin cancer.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Yili Lin - Graduate Trainee, George Mason University

Community-Acquired Sepsis Risk Prediction

This study aims to use data collected through the ‘All of Us’ program to help develop a predictive model of which individuals are at high risk of developing community-onset sepsis. Through a combination of health data including wearable device data…

Scientific Questions Being Studied

This study aims to use data collected through the ‘All of Us’ program to help develop a predictive model of which individuals are at high risk of developing community-onset sepsis. Through a combination of health data including wearable device data (e.g., Fitbit data) and other pre-hospital clinical data (e.g., medical history), this project aims to develop a predictive model that will identify individuals at risk for community-acquired sepsis, potentially allowing for early intervention prior to hospitalization. Early treatment of sepsis significantly improves health outcomes, making early identification important. Thus, the two main goals are as follows:

Specific Aim #1: Develop a deep-learning model that accurately identifies which individuals are at higher risk for community-acquired sepsis.
Specific Aim #2: Analyze which factors are most significant in determining whether or not someone is at high risk for community-acquired sepsis.

Project Purpose(s)

  • Disease Focused Research (Sepsis)

Scientific Approaches

Using these data, I will develop several models using machine learning techniques. Through the use of multilayered neural networks, I hope to capture both linear and nonlinear relationships between input factors and the estimated risk of community-acquired sepsis. Model training and validation will use a k-fold cross-validation technique with an 80:20 split of the dataset for training and testing, respectively. I will measure the performance of the different models I train using various metrics, including positive predictive value and specificity. I will then compare models in R via DeLong's test to determine which model performs best. Model development will adhere to the TRIPOD reporting framework to ensure transparency. Afterward, I will calculate relevance scores for the input features to show which features had the highest impact on the output of the model and to suggest which factors are most important when evaluating the risk of community-acquired sepsis.
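A minimal sketch of the evaluation setup described above, assuming that the 80:20 split corresponds to 5-fold cross-validation (each fold holds out 20% of the data for testing). The function names, the fixed seed, and the toy labels are hypothetical; the neural network itself and DeLong's test are not shown.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; with k=5 each test fold is about 20%."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def ppv_and_specificity(y_true, y_pred):
    """Positive predictive value and specificity for binary labels (1 = sepsis)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return ppv, specificity


# Toy usage: 100 samples give five folds of 80 training and 20 test indices
splits = list(k_fold_indices(100, k=5))
# Toy labels and predictions standing in for one fold's model output
ppv, specificity = ppv_and_specificity([1, 0, 1, 0, 0, 1], [1, 0, 0, 1, 0, 1])
```

In practice the metrics would be computed on each held-out fold and then averaged, so that every participant is used for testing exactly once.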

Anticipated Findings

From this study, I aim to develop a model that can accurately predict which individuals are at high risk of community-acquired sepsis. Through the use of this model, I hope to get at-risk individuals the care they need earlier, improving health outcomes. Furthermore, I hope to learn more about which factors increase the risk of specifically community-acquired sepsis, leading to a better understanding of the condition and the identification of potential preventative measures.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Physical Activity and Weight Loss Medications

Our goal is to understand the relationship between physical activity and weight loss medications. New weight loss medications such as semaglutide and liraglutide treat diabetes and obesity. The relationship between these medications and physical activity is unclear.

Scientific Questions Being Studied

Our goal is to understand the relationship between physical activity and weight loss medications. New weight loss medications such as semaglutide and liraglutide treat diabetes and obesity. The relationship between these medications and physical activity is unclear.

Project Purpose(s)

  • Disease Focused Research (respiratory, diabetes, and obesity conditions treated by weight loss medication)
  • Population Health

Scientific Approaches

We will look at physical activity and weight loss medications. We will develop diagnosis definitions using Electronic Health Records (EHR), drawing on International Classification of Diseases (ICD) codes and medication prescriptions. We will use Fitbit data, primarily daily steps but also activity intensity and heart rate. We will use EHR data to look at weight trajectories and secondary medications, such as ondansetron, which could indicate side effects from primary weight loss medications.

Anticipated Findings

We expect that physical activity will increase after initiation of the drug. As people lose weight, they may become more likely and more able to exercise. Given that these medications are relatively new, little is known about their effects on physical activity.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Joshua Halevi - Undergraduate Student, Vanderbilt University Medical Center

Collaborators:

  • Jeffrey Annis - Other, Vanderbilt University Medical Center

Physical Activity and Emergency Data

Our goal is to understand the relationship between physical activity and emergency room visits. We will examine emergency room visits and look for physical activity trends surrounding the visit. Increased physical activity has been shown to improve quality of life…

Scientific Questions Being Studied

Our goal is to understand the relationship between physical activity and emergency room visits. We will examine emergency room visits and look for physical activity trends surrounding the visit. Increased physical activity has been shown to improve quality of life and reduce incidence of many chronic diseases. We will examine if it impacts emergency room visits.

Project Purpose(s)

  • Population Health

Scientific Approaches

We will look at physical activity and emergency room visits. We will develop diagnosis definitions using Electronic Health Records (EHR), drawing on International Classification of Diseases (ICD) codes, Current Procedural Terminology (CPT) codes, and medication prescriptions. We will use Fitbit data, primarily daily steps but also activity intensity and heart rate. We will use EHR data to look at frequency and severity of emergency room visits, which may be impacted by physical activity.
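One way to look for activity trends around a visit is to compare average daily steps in a fixed window before and after the visit date. The sketch below is only an illustration under assumed inputs: the 7-day window, the function name, and the synthetic step counts are hypothetical and do not reflect the actual Workbench tables or the team's method.

```python
from datetime import date, timedelta
from statistics import mean


def window_means(daily_steps, visit_date, window_days=7):
    """Mean daily steps in the windows before and after a visit.

    daily_steps maps a date to that day's step count for one participant.
    The visit day itself is excluded from both windows.
    """
    before = [s for d, s in daily_steps.items()
              if 0 < (visit_date - d).days <= window_days]
    after = [s for d, s in daily_steps.items()
             if 0 < (d - visit_date).days <= window_days]
    return (mean(before) if before else None,
            mean(after) if after else None)


# Synthetic example: steps fall by 200 per day across a two-week span
visit = date(2024, 1, 15)
steps = {visit + timedelta(days=offset): 8000 - 200 * offset
         for offset in range(-7, 8)}
mean_before, mean_after = window_means(steps, visit)
```

Repeating this per participant and per visit would give paired before and after values that could then be compared statistically.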

Anticipated Findings

We expect that physical activity will decrease prior to ER visits. We expect that it will further decline after the ER visit. Physical activity data may be a predictor of future ER visits.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Joshua Halevi - Undergraduate Student, Vanderbilt University Medical Center

Collaborators:

  • Jeffrey Annis - Other, Vanderbilt University Medical Center
Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.