Research Projects Directory

10,560 active projects

This information was updated 4/26/2024

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

Health_Activity_Patterns

Exploring the Association Between Personal/Family Health History and Daily Physical Activity and Sleep Behavior Patterns

Scientific Questions Being Studied

Exploring the Association Between Personal/Family Health History and Daily Physical Activity and Sleep Behavior Patterns

Project Purpose(s)

  • Social / Behavioral
  • Educational

Scientific Approaches

Use statistical methods to analyze the survey data on personal/family health history. This may involve descriptive statistics to understand the distribution of health conditions within the sample and inferential statistics (such as correlation or regression analysis) to examine associations between health history variables and daily activity/sleep patterns. Process and analyze the Fitbit data to extract relevant metrics on daily physical activity levels and sleep behavior patterns. This could involve calculating daily step counts, active minutes, sleep duration, sleep efficiency, etc. Then merge the survey and Fitbit data at the individual level for integrated analysis.
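The workflow described above (aggregate Fitbit data per person, merge with survey data at the individual level, then test an association) could be sketched roughly as follows. This is only an illustration in standard-library Python: the field names (`person_id`, `steps`, `family_history_diabetes`) and the toy records are assumptions, not the actual Workbench schema, and Pearson correlation stands in for whichever inferential method the project ultimately uses.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Hypothetical inputs: one row per person-day of Fitbit data and one
# survey record per person. All field names and values are illustrative.
fitbit_rows = [
    {"person_id": 1, "date": "2023-01-01", "steps": 4000},
    {"person_id": 1, "date": "2023-01-02", "steps": 6000},
    {"person_id": 2, "date": "2023-01-01", "steps": 9000},
    {"person_id": 2, "date": "2023-01-02", "steps": 11000},
    {"person_id": 3, "date": "2023-01-01", "steps": 7000},
    {"person_id": 4, "date": "2023-01-01", "steps": 3000},  # no survey record
]
survey = {
    1: {"family_history_diabetes": 1},
    2: {"family_history_diabetes": 0},
    3: {"family_history_diabetes": 0},
    5: {"family_history_diabetes": 1},  # no Fitbit data
}


def mean_daily_steps(rows):
    """Collapse person-day rows into one mean daily step count per person."""
    by_person = defaultdict(list)
    for row in rows:
        by_person[row["person_id"]].append(row["steps"])
    return {pid: mean(steps) for pid, steps in by_person.items()}


def merge_on_person(activity, survey_records):
    """Inner join: keep only people present in both sources."""
    shared_ids = sorted(activity.keys() & survey_records.keys())
    return [
        {"person_id": pid, "mean_steps": activity[pid], **survey_records[pid]}
        for pid in shared_ids
    ]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


merged = merge_on_person(mean_daily_steps(fitbit_rows), survey)
r = pearson(
    [m["family_history_diabetes"] for m in merged],
    [m["mean_steps"] for m in merged],
)
print(len(merged), round(r, 2))
```

The inner join silently drops participants who lack either data source, so a real analysis would want to report how many people were excluded at that step.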

Anticipated Findings

Positive associations between certain health conditions in personal/family history (e.g., obesity, diabetes, cardiovascular diseases) and lower levels of daily physical activity.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Sexual Orientation
  • Geography

Data Set Used

Registered Tier

Research Team

Owner:

Duplicate of Data Wrangling in All of Us Program (v7)

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Scientific Questions Being Studied

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Project Purpose(s)

  • Educational
  • Other Purpose (For use with office hours: notebooks for adding code snippets useful for researchers. This is a placeholder for creating notebooks for best practices, among other things.)

Scientific Approaches

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Anticipated Findings

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

hcc_relevant_survey_questions

This project aims to study what diseases Medicare-age (65+) Americans have using survey data. I am exploring this to then be able to compare disease prevalence with billing and reimbursement practices in Medicare.

Scientific Questions Being Studied

This project aims to study what diseases Medicare-age (65+) Americans have using survey data. I am exploring this to then be able to compare disease prevalence with billing and reimbursement practices in Medicare.

Project Purpose(s)

  • Population Health

Scientific Approaches

I will use a subset of the All of Us survey data (corresponding to diseases that overlap with those reimbursed by the CMS-HCC algorithm) to calculate frequencies of diseases and disease co-occurrences in a Medicare-relevant (age 65+) population.
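A rough sketch of the frequency and co-occurrence counts described above, in standard-library Python. The record layout (`age`, `conditions`) and the example conditions are assumptions for illustration only; they are not the All of Us survey schema or the actual CMS-HCC condition list.

```python
from collections import Counter
from itertools import combinations

# Hypothetical survey records: age plus the set of self-reported conditions.
respondents = [
    {"age": 70, "conditions": {"diabetes", "heart failure"}},
    {"age": 82, "conditions": {"diabetes"}},
    {"age": 66, "conditions": {"diabetes", "heart failure", "copd"}},
    {"age": 45, "conditions": {"copd"}},  # excluded: under 65
]


def disease_frequencies(records, min_age=65):
    """Count single conditions and unordered condition pairs among people aged min_age or older."""
    eligible = [r for r in records if r["age"] >= min_age]
    single = Counter()
    pairs = Counter()
    for record in eligible:
        conditions = sorted(record["conditions"])  # sorted so each pair has one canonical order
        single.update(conditions)
        pairs.update(combinations(conditions, 2))
    return len(eligible), single, pairs


n, single, pairs = disease_frequencies(respondents)
for condition, count in single.most_common():
    print(f"{condition}: {count}/{n}")
for pair, count in pairs.most_common():
    print(f"{pair[0]} + {pair[1]}: {count}/{n}")
```

Dividing each count by `n` gives the prevalence and co-occurrence proportions within the age-restricted group.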

Anticipated Findings

My anticipated findings are a table of disease frequencies and disease co-occurrence frequencies. Disease co-occurrence measured without diagnosis or billing data is largely unexamined, so this project will contribute that information to scientific knowledge.

Demographic Categories of Interest

  • Age

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Malcolm Barrett - Research Associate, Stanford University

Duplicate of Data Wrangling in All of Us Program (v7)

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Scientific Questions Being Studied

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Project Purpose(s)

  • Educational
  • Other Purpose (For use with office hours: notebooks for adding code snippets useful for researchers. This is a placeholder for creating notebooks for best practices, among other things.)

Scientific Approaches

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Anticipated Findings

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Access to Healthcare Services for People with Disabilities

This workspace will investigate the relationship between reported genomic data, disability, stress, and access to mental health/disability related services.

Scientific Questions Being Studied

This workspace will investigate the relationship between reported genomic data, disability, stress, and access to mental health/disability related services.

Project Purpose(s)

  • Population Health
  • Educational

Scientific Approaches

The study will use the All of Us Controlled Tier Dataset v7, analyzed in SAS using regression analysis. The datasets will include genetic information and data on stress, disability, and race/ethnicity.

Anticipated Findings

It is anticipated that people who are identified as having a disability who also report lack of medical access to services may also report higher levels of stress. It is also anticipated that there may be a relationship between access to healthcare and race/ethnicity.

Demographic Categories of Interest

  • Race / Ethnicity
  • Disability Status
  • Access to Care

Data Set Used

Controlled Tier

Research Team

Owner:

  • Marcella McCollum - Early Career Tenure-track Researcher, San Jose State University

Collaborators:

  • Gabriela Lopez - Graduate Trainee, San Jose State University
  • Anthony Garcia - Graduate Trainee, San Jose State University

Assortative Mating Across Time (AMAT)

Assortative Mating Across Time (AMAT) aims to answer the research question: "How does the degree of assortative mating change across time, especially for traits (phenotypes) of interest to social scientists?" I define assortative mating as a phenomenon where individuals have…

Scientific Questions Being Studied

Assortative Mating Across Time (AMAT) aims to answer the research question: "How does the degree of assortative mating change across time, especially for traits (phenotypes) of interest to social scientists?" I define assortative mating as a phenomenon where individuals have children with other individuals who have similar genetic makeup (genotypes) or characteristics (phenotypes) more often than we would expect under random mating. I define random mating as a phenomenon where individuals are equally likely to have children with any other individual in a population. This question is important for two reasons. First, assortative mating is a key source of statistical bias in much current research using genetic data; understanding the changing levels of assortative mating over time can give us further insight into this issue. Second, understanding how assortative mating changes over time helps us explain some important social phenomena through the lens of genetics.

Project Purpose(s)

  • Social / Behavioral
  • Methods Development
  • Ancestry

Scientific Approaches

A central feature of AMAT is the development of a computer program that is able to use minimal inputs to measure assortative mating across time. These inputs are (1) birth dates and (2) polygenic indices (PGIs). PGIs assign numbers to individuals to measure their genetic predisposition for a trait (phenotype). I aim to create PGIs for UK Biobank (UKB) and All of Us participants and pass them through AMAT, estimating the levels of assortative mating across time for each sample. Traits (phenotypes) for which I hope to measure assortative mating include but are not limited to height, educational attainment, and smoking patterns.

Anticipated Findings

Current results using data from the UKB show increasing assortative mating for height and educational attainment in the mid-1900s. I anticipate similar results using All of Us data. The findings we make with AMAT will expand on a growing literature on assortative mating, contributing a novel tool and statistical method to measure assortative mating trends.

Demographic Categories of Interest

  • Geography
  • Education Level

Data Set Used

Controlled Tier

Research Team

Owner:

  • Matthew Howell - Research Assistant, National Bureau of Economic Research

Duplicate of Androgenetic Alopecia

The aim of this study is to identify vascular comorbidities associated with early-onset androgenetic alopecia (AGA). We hypothesize that early-onset AGA is associated with vascular conditions such as hypertension, atherosclerosis, heart disease, hypercholesterolemia, and peripheral vascular disease. We hope the…

Scientific Questions Being Studied

The aim of this study is to identify vascular comorbidities associated with early-onset androgenetic alopecia (AGA). We hypothesize that early-onset AGA is associated with vascular conditions such as hypertension, atherosclerosis, heart disease, hypercholesterolemia, and peripheral vascular disease. We hope the results of this study will aid in the detection and thus early intervention of such conditions.

Project Purpose(s)

  • Disease Focused Research (androgenic alopecia)
  • Educational

Scientific Approaches

Patients aged 18 to 30 years recruited to the All of Us cohort between May 6, 2018 and December 31, 2023 will be included. International Classification of Diseases (ICD) and Systematized Nomenclature of Medicine (SNOMED) diagnostic codes from electronic health record data will be used to identify cases of androgenetic alopecia (ICD-10 codes L64 and L64.9; SNOMED code 87872006), essential hypertension (ICD-10 code I10; SNOMED code 59621000), atherosclerotic heart disease (ICD-10 code I25; SNOMED code 41702007), atherosclerosis (ICD-10 code I70; SNOMED code 38716007), hypercholesterolemia (ICD-10 code E78; SNOMED code 13644009), and peripheral vascular disease (ICD-10 code I73.9; SNOMED code 400047006). Each case of early-onset AGA will be matched to three controls based on age, sex, and ethnicity using nearest neighbor propensity score matching.
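The 1:3 matching step could look roughly like the sketch below. It assumes propensity scores have already been estimated elsewhere (for example, from a logistic regression on age, sex, and ethnicity) and uses a greedy nearest-neighbor strategy without replacement; the study does not state which matching variant or software it uses, so the function name `match_controls`, the optional `caliper` parameter, and the toy scores are all hypothetical.

```python
def match_controls(cases, controls, k=3, caliper=None):
    """Greedy nearest-neighbor matching on propensity score, without replacement.

    cases, controls: dicts mapping a participant ID to a propensity score.
    k: number of controls to match to each case.
    caliper: optional maximum allowed score difference for a match.
    Returns a dict mapping each case ID to a list of matched control IDs.
    """
    available = dict(controls)  # copy so the caller's dict is not modified
    matches = {}
    # Process cases in score order so results are deterministic.
    for case_id, case_score in sorted(cases.items(), key=lambda kv: kv[1]):
        # Rank remaining controls by distance to the case (ID breaks ties).
        ranked = sorted(
            available,
            key=lambda cid: (abs(available[cid] - case_score), cid),
        )
        if caliper is not None:
            ranked = [
                cid for cid in ranked
                if abs(available[cid] - case_score) <= caliper
            ]
        chosen = ranked[:k]
        matches[case_id] = chosen
        for cid in chosen:
            del available[cid]  # each control is used at most once
    return matches


# Hypothetical propensity scores for two cases and seven potential controls.
cases = {"case_a": 0.30, "case_b": 0.70}
controls = {
    "c1": 0.28, "c2": 0.33, "c3": 0.35, "c4": 0.50,
    "c5": 0.68, "c6": 0.71, "c7": 0.95,
}
print(match_controls(cases, controls))
```

Greedy matching depends on the order in which cases are processed, and a case may receive fewer than `k` controls if the pool runs out or a caliper is set; optimal matching is a common alternative when that matters.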

Anticipated Findings

An FDA-approved treatment for AGA is minoxidil, a medication with vasodilatory qualities first introduced in the 1970s for the treatment of refractory hypertension. Minoxidil opens adenosine triphosphate-sensitive potassium channels leading to vascular smooth muscle relaxation. While the exact mechanism of minoxidil in the treatment of AGA is unknown, a proposed mechanism is by increasing cutaneous blood flow. A double-blind study of 16 people found topical minoxidil 5% significantly increased skin blood flow. Minoxidil has also been found to upregulate the expression of VEGF in cultured hair dermal papillae, potentially explaining the increase in follicular capillary fenestrations in minoxidil-treated rats. These results suggest that AGA is potentially mediated through vascular factors. As such, the early onset of AGA may be a predictor of vascular comorbidities.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Wearables and Dementia

We are examining differences in sleeping and activity patterns between older and younger individuals. We are especially interested in how these measures change longitudinally, and what motifs of sleeping and activity patterns may be early warning signs of neurodegenerative diseases…

Scientific Questions Being Studied

We are examining differences in sleeping and activity patterns between older and younger individuals. We are especially interested in how these measures change longitudinally, and what motifs of sleeping and activity patterns may be early warning signs of neurodegenerative diseases in older populations. The AoU dataset provides a large population of older individuals with wearable device data that would be well-suited for answering these questions.

Project Purpose(s)

  • Disease Focused Research (dementia)
  • Social / Behavioral
  • Methods Development

Scientific Approaches

We plan to use wearable device data from younger (~18-30 year olds) and older (~65+ year olds) individuals. We also plan to compare sleep and activity patterns from older individuals with and without dementia. Therefore, we will utilize EHR data to ascertain an approximate date of dementia diagnosis. As it would be ideal to age- and sex-match these older individuals with each other, we will utilize demographic data. We plan on using simple statistical tests to differentiate between older and younger individuals, and between older individuals with and without dementia.

Anticipated Findings

We may anticipate finding salient differences in activity between older and younger individuals, as well as differences in sleeping habits as individuals age. We may anticipate finding differences in sleep onset variability between older individuals with dementia and those without dementia. These findings would help legitimize wearable devices as non-invasive, longitudinal risk-monitoring devices for "healthy" aging.

Demographic Categories of Interest

  • Age

Data Set Used

Controlled Tier

Research Team

Owner:

SM APOE 2024

Variants in the Apolipoprotein E (APOE) have been shown to have differential risk for conditions such as Alzheimer's disease and cardiovascular disease. This workspace will assess the feasibility of a large APOE association study by summarizing the APOE genotypes, hypercholesterolemia…

Scientific Questions Being Studied

Variants in the Apolipoprotein E (APOE) have been shown to have differential risk for conditions such as Alzheimer's disease and cardiovascular disease. This workspace will assess the feasibility of a large APOE association study by summarizing the APOE genotypes, hypercholesterolemia and demographics.

Project Purpose(s)

  • Educational
  • Ancestry

Scientific Approaches

The two rsIDs for APOE genotyping (rs429358 and rs7412) will be extracted from All of Us Hail tables. Data will be summarized into the APOE genotypes defined by the ε2, ε3, and ε4 alleles and associated with hypercholesterolemia status. Hypercholesterolemia will be defined by the ICD-10 code "E78.0 - Pure hypercholesterolemia". Summary statistics for APOE association will be stratified by age, sex and declared race/ethnicity.
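As a rough illustration of how the two SNPs map to APOE genotypes, the sketch below uses the standard allele definitions (ε2 carries T at rs7412, ε4 carries C at rs429358, ε3 carries T at rs429358 and C at rs7412). It assumes unphased, forward-strand genotype strings such as "CT"; the function name `apoe_genotype` and the input format are hypothetical and are not taken from the project's Hail code. The output labels use plain `e2`/`e3`/`e4` for ε2/ε3/ε4.

```python
def apoe_genotype(rs429358, rs7412):
    """Infer an unphased APOE genotype from two SNP genotype strings.

    rs429358: two-letter genotype using T/C (C marks the e4 allele).
    rs7412:   two-letter genotype using C/T (T marks the e2 allele).
    Returns a label such as "e3/e4", or None when the combination would
    require the rare e1 allele and cannot be expressed with e2/e3/e4.
    """
    rs429358 = rs429358.upper()
    rs7412 = rs7412.upper()
    for genotype in (rs429358, rs7412):
        if len(genotype) != 2 or set(genotype) - {"C", "T"}:
            raise ValueError(f"unexpected genotype: {genotype!r}")

    n_e4 = rs429358.count("C")  # copies of the e4-defining allele
    n_e2 = rs7412.count("T")    # copies of the e2-defining allele

    # More than two defining alleles in total implies an e1 haplotype.
    if n_e4 + n_e2 > 2:
        return None

    # Double heterozygotes (CT, CT) are called e2/e4 by convention,
    # although without phasing they could in principle be e1/e3.
    alleles = ["e2"] * n_e2 + ["e3"] * (2 - n_e2 - n_e4) + ["e4"] * n_e4
    return "/".join(alleles)


for pair in [("TT", "CC"), ("CT", "CC"), ("TC", "CT"), ("TT", "TT")]:
    print(pair, "->", apoe_genotype(*pair))
```

Counting ε2/ε3/ε4 labels per stratum (age, sex, race/ethnicity) would then give the summary tables the project describes.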

Anticipated Findings

The study hopes to find statistics similar to those found in the UK Biobank cohort, with the addition of higher diversity. The summary statistics of these findings will be used as preliminary data to propose a larger cohort study.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Sevda Molani - Research Fellow, Institute for Systems Biology

Control SV

We will perform statistical analyses and enrichment tests on rare SVs across different ethnic groups to find SV mutagenesis hotspots.

Scientific Questions Being Studied

We will perform statistical analyses and enrichment tests on rare SVs across different ethnic groups to find SV mutagenesis hotspots.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We will apply statistical methods to identify associations between rare SVs and some health conditions.

Anticipated Findings

We anticipate finding SV mutation hotspots and associations with some common conditions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Haowei Du - Graduate Trainee, Baylor College of Medicine

Rice cultivation and human adaptation

Rice is a staple part of the diet in many Asian countries. Its cultivation started in Asia around 6,000-9,000 years ago, which enabled Asians to shift from a nomadic to a settled way of life; the reliable food supply allowed…

Scientific Questions Being Studied

Rice is a staple part of the diet in many Asian countries. Its cultivation started in Asia around 6,000-9,000 years ago, which enabled Asians to shift from a nomadic to a settled way of life; the reliable food supply allowed humans to increase population size and density, which may have introduced new selective pressures coupled to nutritional changes due to the dietary shift. I aim to understand how rice cultivation changed humans biologically in Asia by analyzing genomic data of human and rice populations across Asia. I will reconstruct detailed demographic history and dispersal patterns of human populations in Asia in the last 10,000 years, and examine their relationship to dispersal patterns of rice. I will also detect genetic variants in humans that are adaptive to rice cultivation and obtain functional and medical insights into the variants to reveal the driving force of natural selection and their influence on modern-day health outcomes in Asians.

Project Purpose(s)

  • Ancestry

Scientific Approaches

I will apply numerous population genetic approaches to the publicly available Asian genomic data. I will run smcpp and IBDNe to assess population size changes and use ADMIXTURE, finestructure, and f-statistics to examine relationships between populations. I will apply ASMC and Relate to individuals of Asian ancestry in All of Us to detect positive selection that occurred in the last 10,000 years. The frequency trajectories of variants that have undergone strong selection will be checked with CLUES 2. I will also conduct Gene Ontology analyses and phenome-wide association studies for the detected variants to obtain functional information.

Anticipated Findings

I anticipate finding several variants that have undergone selection as a result of adaptation to rice. Those variants may contribute to differences in the prevalence of some diseases between Asians and other continental populations and/or within Asians. Since different species of rice were domesticated and the degree of dependence on rice as a source of energy differs between South Asia and East Asia, adaptation might have proceeded in different directions.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Mariko Segawa - Research Fellow, Albert Einstein College of Medicine

Mental Health

The scientific question we intend to study is "Does the poverty rate affect post-partum health outcomes within African American women who have limited healthcare access within the first six months after delivery?" We intend to investigate whether or not there…

Scientific Questions Being Studied

The scientific question we intend to study is "Does the poverty rate affect post-partum health outcomes within African American women who have limited healthcare access within the first six months after delivery?" We intend to investigate whether or not there is a correlation between negative postpartum health outcomes and low income among African American women with limited healthcare access within the first 6 months after delivery. This question is important because it addresses a significant gap in our understanding of postpartum health disparities among African American women. By investigating the relationship between poverty, healthcare access, and postpartum health outcomes, we can identify potential interventions to address these disparities and improve overall maternal and infant health outcomes. This research is relevant to both science and public health as it has the potential to inform policies and interventions aimed at reducing health inequities and promoting health equity.

Project Purpose(s)

  • Educational

Scientific Approaches

We will collect data on poverty rates, healthcare access, and postpartum health outcomes among African American women. This study design allows for the examination of associations between variables but does not involve experimental manipulation. Instead, it relies on existing data sources.

Anticipated Findings

Based on our hypothesis, we anticipate finding a correlation between postpartum health outcomes and low income among African American women with limited healthcare access after delivery. Furthermore, we expect that this correlation will result in negative postpartum health outcomes, such as higher rates of postpartum depression, maternal morbidity, and infant mortality. These findings will contribute to a better understanding of the factors influencing postpartum health disparities among African American women and may inform interventions to improve healthcare access and outcomes in this population.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

Vision Loss

Our objective is to investigate the socioeconomic factors associated with loss of visual acuity using the AllofUs National database.

Scientific Questions Being Studied

Our objective is to investigate the socioeconomic factors associated with loss of visual acuity using the AllofUs National database.

Project Purpose(s)

  • Disease Focused Research (binocular vision disease)

Scientific Approaches

We will use the diverse patient population of the All of Us database to look for any disparities in visual acuity. We will also use patient-completed surveys to look for any patient-reported perceptions of eye care.

Anticipated Findings

We anticipate finding socioeconomic and racial disparities in eye care, specifically in visual acuity, between participants.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Registered Tier

Research Team

Owner:

Duplicate of Data Wrangling in All of Us Program (v7)

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Scientific Questions Being Studied

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Project Purpose(s)

  • Educational
  • Other Purpose (For use with office hours: notebooks for adding code snippets useful for researchers. This is a placeholder for creating notebooks for best practices, among other things.)

Scientific Approaches

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Anticipated Findings

For educational purposes: to show best practices when using Jupyter notebooks for data access, storage, and data manipulation (transformations, conversions, cleaning, and optimization), and to address other research-support issues that are useful to multiple AoU researchers.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Yan Zeng - Graduate Trainee, Baylor College of Medicine

Androgenetic Alopecia

The aim of this study is to identify vascular comorbidities associated with early-onset androgenetic alopecia (AGA). We hypothesize that early-onset AGA is associated with vascular conditions such as hypertension, atherosclerosis, heart disease, hypercholesterolemia, and peripheral vascular disease. We hope the…

Scientific Questions Being Studied

The aim of this study is to identify vascular comorbidities associated with early-onset androgenetic alopecia (AGA). We hypothesize that early-onset AGA is associated with vascular conditions such as hypertension, atherosclerosis, heart disease, hypercholesterolemia, and peripheral vascular disease. We hope the results of this study will aid in the detection and thus early intervention of such conditions.

Project Purpose(s)

  • Disease Focused Research (androgenic alopecia)
  • Educational

Scientific Approaches

Patients aged 18 to 30 years recruited to the All of Us cohort between May 6, 2018 and December 31, 2023 will be included. International Classification of Diseases (ICD) and Systematized Nomenclature of Medicine (SNOMED) diagnostic codes from electronic health record data will be used to identify cases of androgenetic alopecia (ICD-10 codes L64 and L64.9; SNOMED code 87872006), essential hypertension (ICD-10 code I10; SNOMED code 59621000), atherosclerotic heart disease (ICD-10 code I25; SNOMED code 41702007), atherosclerosis (ICD-10 code I70; SNOMED code 38716007), hypercholesterolemia (ICD-10 code E78; SNOMED code 13644009), and peripheral vascular disease (ICD-10 code I73.9; SNOMED code 400047006). Each case of early-onset AGA will be matched to three controls based on age, sex, and ethnicity using nearest neighbor propensity score matching.

Anticipated Findings

An FDA-approved treatment for AGA is minoxidil, a medication with vasodilatory qualities first introduced in the 1970s for the treatment of refractory hypertension. Minoxidil opens adenosine triphosphate-sensitive potassium channels leading to vascular smooth muscle relaxation. While the exact mechanism of minoxidil in the treatment of AGA is unknown, a proposed mechanism is by increasing cutaneous blood flow. A double-blind study of 16 people found topical minoxidil 5% significantly increased skin blood flow. Minoxidil has also been found to upregulate the expression of VEGF in cultured hair dermal papillae, potentially explaining the increase in follicular capillary fenestrations in minoxidil-treated rats. These results suggest that AGA is potentially mediated through vascular factors. As such, the early onset of AGA may be a predictor of vascular comorbidities.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Genetics

How do genetic markers contribute to variations in different populations? We want to use the data sets to find more information about less-studied populations.

Scientific Questions Being Studied

How do genetic markers contribute to variations in different populations? We want to use the data sets to find more information about less-studied populations.

Project Purpose(s)

  • Ancestry

Scientific Approaches

Honestly, I am new at using the All of Us data sets, but I look forward to expanding my knowledge as I go. I plan to use statistical models to describe my findings.

Anticipated Findings

My goal for my findings is to provide insight into the genetic history of Polynesian populations. These findings would contribute to the body of scientific knowledge in the field because we would be able to learn more about genetic markers.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Registered Tier

Research Team

Owner:

Development and evaluation of polygenic risk scores across complex diseases

The aim of this research is to develop and validate models for predicting risks of common complex diseases, like cancers, heart disease and type-2 diabetes, and evaluate potential utility of such models for developing strategies for risk-based approach to disease…

Scientific Questions Being Studied

The aim of this research is to develop and validate models for predicting risks of common complex diseases, like cancers, heart disease and type-2 diabetes, and evaluate the potential utility of such models for developing strategies for a risk-based approach to disease prevention through lifestyle modification, screening and medication. We will leverage the large size and diversity of the All of Us study to develop and validate comprehensive multi-ethnic models that will incorporate information on sociodemographic indicators, lifestyle factors, environmental exposures, family and medical history, biomarkers and whole genome genotyping and sequencing profiles of individuals. Integration of information across multiple domains of data is expected to lead to improved models for risk prediction and thus to maximize the benefits and minimize the harms and economic costs associated with the various types of available interventions for disease prevention.

Project Purpose(s)

  • Population Health
  • Methods Development
  • Ancestry

Scientific Approaches

We will develop predictive models based on classical statistical methods as well as advanced machine learning algorithms. We will build "cohorts" of individuals who are free of specific diseases of interest at the time of entry to All of Us. We will then link information on "baseline" variables for these individuals to prospectively collected data on disease outcomes (e.g., those captured through electronic medical records). Disease-specific models will incorporate available information on corresponding well-established risk factors such as age, family history, smoking, BMI and alcohol consumption. In addition, when genetic data become available, the models will incorporate information on emerging polygenic risk scores from genome-wide association studies. Finally, we will explore the potential role of high-dimensional biomarkers, such as blood metabolites, in risk prediction beyond risk factors that are easy to ascertain and are evaluated in more parsimonious models.

Anticipated Findings

Our study will lead to comprehensive multi-ethnic risk prediction models across a number of common chronic diseases. Using results from the study, we will further develop online risk calculators for potential clinical applications.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age
  • Sex at Birth
  • Gender Identity
  • Geography
  • Disability Status
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Yujie Wei - Graduate Trainee, Johns Hopkins University
  • Yuzheng Dun - Graduate Trainee, Johns Hopkins University
  • Wen Shi - Project Personnel, Johns Hopkins University
  • Martina Fu - Graduate Trainee, Johns Hopkins University

Duplicate of Demo - Cardiovascular Risk Scoring

1- Can we use All of Us data to calculate the cardiovascular pooled score? We want to use the unique data collected by the All of Us program, including smoking information, underrepresented race data, and measurement values such…

Scientific Questions Being Studied

1- Can we use All of Us data to calculate the cardiovascular pooled score? We want to use the unique data collected by the All of Us program, including smoking information, underrepresented race data, and measurement values such as blood pressure and cholesterol, to calculate the score.
2- Can we identify the scores that we calculate within a year of enrollment? We wanted to identify the participants who might have a cardiovascular risk score around the time of enrollment. This will help the program quantify the importance of collecting longitudinal data for participants.
3- Will the risk score differ by race group? We compared the risk scores across racial groups to quantify whether some racial groups have higher or lower risk scores.
Citation: Ramirez AH, Sulieman L, Schlueter DJ, Halvorson A, Qian J, et al. (2022). The All of Us Research Program: Data quality, utility, and diversity. Patterns. 12;3(8). https://doi.org/10.1016/j.patter.2022.100570.

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the All of Us Data and Research Center to ensure compliance with program policy, including acceptable data access and use.)

Scientific Approaches

In this project, we plan to use the AHA algorithm/equation to calculate the cardiovascular risk scores (https://ahajournals.org/doi/full/10.1161/01.cir.0000437741.48606.98). Further, we want to demonstrate how the smoking and race data collected by the program, which researchers usually have to extract using natural language processing, facilitate the calculation of the cardiovascular risk score.
We will calculate the scores using:
1- Data manipulation: Using Python and BigQuery to:
A- Retrieve medications (diabetes), lab measurements including systolic blood pressure, diastolic blood pressure, and cholesterol, as well as race and smoking information provided by participants
2- Visualization:
A- Creating a histogram of the calculated scores using the Python visualization library Matplotlib
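
As a rough illustration of how a risk equation of this kind turns retrieved values into a 10-year risk, the sketch below uses the general survival-model form, risk = 1 - S0^exp(linear predictor - mean linear predictor). The coefficients, baseline survival, and covariate names are placeholders invented for this example. They are not the published AHA values, which are specific to sex and race and include more terms.

```python
import math

# Placeholder coefficients for illustration only; NOT the published AHA values.
EXAMPLE_COEFFICIENTS = {"ln_age": 2.0, "ln_systolic_bp": 1.5, "current_smoker": 0.6}
EXAMPLE_BASELINE_SURVIVAL = 0.95  # assumed 10-year survival for the reference person


def linear_predictor(covariates, coefficients):
    """Weighted sum of covariates (continuous inputs are log-transformed beforehand)."""
    return sum(coefficients[name] * covariates[name] for name in coefficients)


def ten_year_risk(covariates, coefficients, mean_lp, baseline_survival):
    """General form: 1 - S0 ** exp(individual linear predictor - mean linear predictor)."""
    lp = linear_predictor(covariates, coefficients)
    return 1.0 - baseline_survival ** math.exp(lp - mean_lp)


reference = {"ln_age": math.log(55), "ln_systolic_bp": math.log(120), "current_smoker": 0}
smoker = dict(reference, current_smoker=1)

mean_lp = linear_predictor(reference, EXAMPLE_COEFFICIENTS)
risk_reference = ten_year_risk(reference, EXAMPLE_COEFFICIENTS, mean_lp, EXAMPLE_BASELINE_SURVIVAL)
risk_smoker = ten_year_risk(smoker, EXAMPLE_COEFFICIENTS, mean_lp, EXAMPLE_BASELINE_SURVIVAL)
print(round(risk_reference, 4), round(risk_smoker, 4))
```

By construction, the reference person's risk equals one minus the baseline survival, and any covariate with a positive coefficient raises the risk above that level.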

Anticipated Findings

For this study, we anticipate demonstrating the validity and importance of data that are collected by the program and that can be challenging to extract from medical records (e.g., smoking status) by calculating the 10-year cardiovascular risk. We expect to find: 1) that data from different sources (EHR and survey data) can easily be combined to build a model or calculate a risk; 2) heterogeneity in the All of Us population, in which populations underrepresented in clinical trials or clinical data sets are better represented; and 3) that the cardiovascular risk score differs across racial groups.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Anti-NMDA Receptor encephalitis

What is the prevalence of anti-NMDA receptor encephalitis in the US, and how does it compare to Southwest Virginia? No cases were reported in this region. We hope to determine why this disorder is under-diagnosed in rural communities.

Scientific Questions Being Studied

What is the prevalence of anti-NMDA receptor encephalitis in the US, and how does it compare to Southwest Virginia? No cases were reported in this region. We hope to determine why this disorder is under-diagnosed in rural communities.

Project Purpose(s)

  • Educational

Scientific Approaches

Compare prevalence across regions. Determine characteristics that may relate to diagnosis frequency.
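
When a region reports zero cases, one way to compare it with a national figure is to ask how high the regional prevalence could still plausibly be. The sketch below computes a prevalence per 100,000 and the exact one-sided upper bound for zero observed cases, which is close to the "rule of three" approximation of 3/n at 95% confidence. All counts are hypothetical and are not taken from All of Us data.

```python
def prevalence_per_100k(cases, population):
    """Prevalence expressed per 100,000 people."""
    return 100000 * cases / population


def upper_bound_zero_cases(n, confidence=0.95):
    """Exact one-sided upper bound on prevalence when 0 cases are observed among n people.

    Solves (1 - p) ** n = 1 - confidence for p.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


# Hypothetical counts for illustration only.
national = prevalence_per_100k(cases=30, population=400000)
regional_upper = 100000 * upper_bound_zero_cases(n=2000)

print(f"national prevalence: {national:.1f} per 100,000")
print(f"regional 95% upper bound with 0 of 2000 cases: {regional_upper:.1f} per 100,000")
```

With these made-up numbers, the regional upper bound is far above the national prevalence, so zero observed cases in a small regional sample would not by itself indicate under-diagnosis.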

Anticipated Findings

We hypothesize that a combination of physician familiarity and patient socioeconomic conditions determines the frequency of diagnosis.

Demographic Categories of Interest

  • Geography

Data Set Used

Controlled Tier

Research Team

Owner:

  • Ramu Anandakrishnan - Early Career Tenure-track Researcher, Edward Via College of Osteopathic Medicine, Carolina Campus

Duplicate of Beginner Intro to AoU Data and the Workbench (v7)

This workspace contains multiple notebooks that assess users' understanding of the workbench and OMOP. These notebooks are meant to help users check their knowledge not only on Python, R, and SQL, but also on the general data structure and data…

Scientific Questions Being Studied

This workspace contains multiple notebooks that assess users' understanding of the workbench and OMOP. These notebooks are meant to help users check their knowledge not only on Python, R, and SQL, but also on the general data structure and data model used by the All of Us program.

Project Purpose(s)

  • Educational

Scientific Approaches

No scientific approach is used in this workspace because it is meant for educational purposes only. We will cover all aspects of OMOP and hence will use most datasets available in the workbench.

Anticipated Findings

We do not anticipate any findings. Instead, we are educating people on the use of the workbench and on OMOP, the common data model used by the program.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Zhiyan Li - Graduate Trainee, Emory University

Estimating the causal effect of SDoH exposures on adverse cardiovascular events

In recent years, the biomedical research community has become more aware of the potential influence that a set of non-medical factors have on health outcomes (Marmot et al. 2008). Such social determinants of health (SDOH) are comprised of the conditions…

Scientific Questions Being Studied

In recent years, the biomedical research community has become more aware of the potential influence that a set of non-medical factors has on health outcomes (Marmot et al. 2008). Such social determinants of health (SDOH) comprise the conditions in which people work and live, including income, education, food security, early childhood development, social inclusion, etc. Recently, an association was observed between SDOH and adverse cardiovascular outcomes in heart failure (HF) patients (Vinter et al. 2022). However, whether the link between SDOH and adverse cardiovascular outcomes is causal, and in which direction, remains to be seen. This study is important because a statistical correlation between two variables can be explained by multiple underlying causal networks, and knowledge of the causal network can help focus research and healthcare delivery.

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease(s))
  • Population Health
  • Social / Behavioral
  • Educational
  • Methods Development

Scientific Approaches

We will develop a list of confounders, SDOH exposures of interest and outcomes of adverse cardiovascular conditions / events, based on a literature review and evaluation of the data available to us. We will formulate these into a series of directed acyclic graphs (DAGs) per exposure-outcome model.

We will begin transforming and building our analytic dataset. We will make use of the various tools that All of Us provides via the Workbench platform. These tools include Workspaces, R and Python, Dataset Builders, and Cohort Builders.

We will conduct an exploratory data analysis (EDA).

We will then fit propensity score models and employ inverse probability weighting (IPW) to weight regression models. Doubly robust methods such as targeted maximum likelihood estimation (TMLE) can be explored. As an alternative to TMLE, a factor analysis can be performed on the various SDOH and demographic survey measures available. These aggregate factors could then be used as exposures in our IPW regression models.
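
The propensity score and IPW step could look roughly like the sketch below. It is a simplification under stated assumptions: the propensity score is estimated as the exposed fraction within each level of a single discrete confounder (rather than from a fitted regression model), weighted means are normalized by the sum of weights, and the toy data, field names, and function names are all hypothetical.

```python
from collections import defaultdict


def stratum_propensity(rows):
    """Estimate P(exposed | stratum) as the exposed fraction within each stratum."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [number exposed, total]
    for r in rows:
        counts[r["stratum"]][0] += r["exposed"]
        counts[r["stratum"]][1] += 1
    return {s: exposed / total for s, (exposed, total) in counts.items()}


def ipw_risk_difference(rows):
    """Weighted risk difference, exposed minus unexposed, using inverse probability weights."""
    ps = stratum_propensity(rows)
    num = {0: 0.0, 1: 0.0}
    den = {0: 0.0, 1: 0.0}
    for r in rows:
        p = ps[r["stratum"]]
        if not 0.0 < p < 1.0:
            raise ValueError("positivity violated: a stratum lacks exposed or unexposed people")
        weight = 1.0 / p if r["exposed"] else 1.0 / (1.0 - p)
        num[r["exposed"]] += weight * r["outcome"]
        den[r["exposed"]] += weight
    return num[1] / den[1] - num[0] / den[0]


def make_rows(stratum, exposed, outcome, n):
    return [{"stratum": stratum, "exposed": exposed, "outcome": outcome} for _ in range(n)]


# Toy data: the stratum drives both exposure and outcome; exposure has no effect.
rows = (make_rows("A", 1, 1, 3) + make_rows("A", 0, 1, 1)
        + make_rows("B", 1, 0, 1) + make_rows("B", 0, 0, 3))

exposed_rows = [r for r in rows if r["exposed"]]
unexposed_rows = [r for r in rows if not r["exposed"]]
crude = (sum(r["outcome"] for r in exposed_rows) / len(exposed_rows)
         - sum(r["outcome"] for r in unexposed_rows) / len(unexposed_rows))
weighted = ipw_risk_difference(rows)
print(f"crude difference: {crude:.2f}, IPW difference: {weighted:.2f}")
```

In this toy example the crude comparison suggests an effect that is entirely due to confounding by stratum, and weighting removes it.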

Anticipated Findings

Ideally, this study will answer the following question: What is the strength and direction of the causal relationship between multiple SDOH exposures and adverse cardiovascular outcomes? Currently, many SDOH exposures are associated with a variety of poor medical outcomes. We expect our models to show that some SDOH exposures do not causally influence adverse cardiovascular outcomes, that others do, and how large those causal effects are. Knowledge of the underlying causal structure can help focus public policy and healthcare delivery on the SDOH factors that matter most to patients suffering from cardiovascular disease(s).

Demographic Categories of Interest

  • Geography
  • Access to Care
  • Education Level
  • Income Level

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of Demo - Cardiovascular Risk Scoring

1- Can we use All of Us data to calculate the cardiovascular pooled score? We want to use the unique data collected by the All of Us program, including smoking information, underrepresented race data, and measurement values such…

Scientific Questions Being Studied

1- Can we use All of Us data to calculate the cardiovascular pooled score? We want to use the unique data collected by the All of Us program, including smoking information, underrepresented race data, and measurement values such as blood pressure and cholesterol, to calculate the score.
2- Can we identify the scores that we calculate within a year of enrollment? We wanted to identify the participants who might have a cardiovascular risk score around the time of enrollment. This will help the program quantify the importance of collecting longitudinal data for participants.
3- Will the risk score differ by race group? We compared the risk scores across racial groups to quantify whether some racial groups have higher or lower risk scores.
Citation: Ramirez AH, Sulieman L, Schlueter DJ, Halvorson A, Qian J, et al. (2022). The All of Us Research Program: Data quality, utility, and diversity. Patterns. 12;3(8). https://doi.org/10.1016/j.patter.2022.100570.

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the All of Us Data and Research Center to ensure compliance with program policy, including acceptable data access and use.)

Scientific Approaches

In this project, we plan to use the AHA algorithm/equation to calculate the cardiovascular risk scores (https://ahajournals.org/doi/full/10.1161/01.cir.0000437741.48606.98). Further, we want to demonstrate how the smoking and race data collected by the program, which researchers usually have to extract using natural language processing, facilitate the calculation of the cardiovascular risk score.
We will calculate the scores using:
1- Data manipulation: Using Python and BigQuery to:
A- Retrieve medications (diabetes), lab measurements including systolic blood pressure, diastolic blood pressure, and cholesterol, as well as race and smoking information provided by participants
2- Visualization:
A- Creating a histogram of the calculated scores using the Python visualization library Matplotlib

Anticipated Findings

For this study, we anticipate demonstrating the validity and importance of data that are collected by the program and that can be challenging to extract from medical records (e.g., smoking status) by calculating the 10-year cardiovascular risk. We expect to find: 1) that data from different sources (EHR and survey data) can easily be combined to build a model or calculate a risk; 2) heterogeneity in the All of Us population, in which populations underrepresented in clinical trials or clinical data sets are better represented; and 3) that the cardiovascular risk score differs across racial groups.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Feng Chen - Graduate Trainee, University of Washington

Hispanics Diabetes Study

We hypothesize that social behavioral factors, such as social support and social stress, will negatively impact the health of Mexican-origin Hispanics. Investigating this is important for understanding health disparities among Hispanics and improving their healthy lifespan.

Scientific Questions Being Studied

We hypothesize that social behavioral factors, such as social support and social stress, will negatively impact the health of Mexican-origin Hispanics. Investigating this is important for understanding health disparities among Hispanics and improving their healthy lifespan.

Project Purpose(s)

  • Other Purpose (I will use this workspace to investigate the impact of social behavior and clinical factors on the health of Mexican-origin Hispanics, focusing on diabetes.)

Scientific Approaches

The data will include demographics, clinical data, molecular data, and social behavioral data.

Regarding the method, we will conduct descriptive analysis, logistic regression, and advanced analytical methods such as machine learning (ML).

Anticipated Findings

These social behavioral factors are negatively associated with the occurrence of diabetes among Hispanics.

Demographic Categories of Interest

  • Race / Ethnicity
  • Age

Data Set Used

Registered Tier

Research Team

Owner:

  • Cai Xu - Early Career Tenure-track Researcher, University of Texas at El Paso

Genetics of Myocarditis

In this study utilizing the All of Us Biobank, we aim to investigate the impact of cardiomyopathy-associated gene variants on myocarditis as well as the development of heart failure, arrhythmia, and other outcomes following myocarditis among a diverse cohort of…

Scientific Questions Being Studied

In this study utilizing the All of Us Biobank, we aim to investigate the impact of cardiomyopathy-associated gene variants on myocarditis as well as the development of heart failure, arrhythmia, and other outcomes following myocarditis among a diverse cohort of 245,388 individuals with detailed exome sequencing data. We plan to analyze how the presence of myocarditis influences the progression to cardiomyopathy and heart failure. We are seeking to understand the prognostic significance and risk prediction potential of analyzing the burden of gene variants on the outcomes of individuals with myocarditis – which holds great potential to enhance risk prediction and primary prevention of secondary outcomes in individuals based on their genotype status. Ultimately, our goal is to enhance personalized treatment potential and improve precision medicine interventions for patients with gene mutations in cardiomyopathy genes.

Project Purpose(s)

  • Disease Focused Research (Myocarditis and Cardiomyopathies)

Scientific Approaches

In this study, I plan to analyze the exomic data of individuals diagnosed with myocarditis, comparing them against a control group within the All of Us Researcher Workbench biobank. The primary dataset comprises exome sequencing results from 245,388 participants, enriched with clinical data including onset and progression of myocarditis and cardiomyopathy. Using categorical analysis and comparative statistics, I will investigate phenotypic differences between individuals who are phenotype-positive (those with myocarditis) and phenotype-negative (those without myocarditis), as well as differences based on genotype-positive status. This approach will enable the identification of genetic variants and patterns that may predispose individuals to myocarditis or influence the progression to cardiomyopathy. Regression analysis will assess the impact of these genotypes on health outcomes, enhancing our understanding of myocarditis's genetic basis and clinical progression.
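
One simple form the categorical comparison could take is a 2x2 table of variant carrier status against myocarditis status, summarized as an odds ratio with a Wald (Woolf) confidence interval. The sketch below is illustrative only: the counts are invented, the function name is hypothetical, and the actual analysis may use different tests.

```python
import math


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: variant carriers and non-carriers among cases
    c, d: variant carriers and non-carriers among controls
    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts for illustration only.
odds_ratio, lower, upper = odds_ratio_with_ci(a=40, b=360, c=200, d=3800)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

An interval that excludes 1 would suggest that carrier status and myocarditis are associated in the sample. Regression models would be needed to adjust for covariates.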

Anticipated Findings

We anticipate discovering a higher prevalence of cardiomyopathy-related gene variants in individuals with myocarditis. These findings could enable the prediction of increased risk for progression to cardiomyopathy or heart failure among those diagnosed with myocarditis, based on their genetic profiles. This research could significantly contribute to the scientific and medical community's understanding of the genetic factors influencing cardiovascular diseases and enhance our ability to implement preemptive medical interventions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Duplicate of Demo - Cardiovascular Risk Scoring

1- Can we use All of Us data to calculate the cardiovascular pooled score? We want to use the unique data collected by the All of Us program, including smoking information, underrepresented race data, and measurement values such…

Scientific Questions Being Studied

1- Can we use All of Us data to calculate the cardiovascular pooled score? We want to use the unique data collected by the All of Us program, including smoking information, underrepresented race data, and measurement values such as blood pressure and cholesterol, to calculate the score.
2- Can we identify the scores that we calculate within a year of enrollment? We wanted to identify the participants who might have a cardiovascular risk score around the time of enrollment. This will help the program quantify the importance of collecting longitudinal data for participants.
3- Will the risk score differ by race group? We compared the risk scores across racial groups to quantify whether some racial groups have higher or lower risk scores.
Citation: Ramirez AH, Sulieman L, Schlueter DJ, Halvorson A, Qian J, et al. (2022). The All of Us Research Program: Data quality, utility, and diversity. Patterns. 12;3(8). https://doi.org/10.1016/j.patter.2022.100570.

Project Purpose(s)

  • Disease Focused Research (Cardiovascular disease)
  • Other Purpose (This work is a result of an All of Us Research Program Demonstration Project. The projects are efforts by the Program designed to meet the program's goal of ensuring the quality and utility of the Research Hub as a resource for accelerating discovery in science and medicine. This work was reviewed and overseen by the All of Us Research Program Science Committee and the All of Us Data and Research Center to ensure compliance with program policy, including acceptable data access and use.)

Scientific Approaches

In this project, we plan to use the AHA algorithm/equation to calculate the cardiovascular risk scores (https://ahajournals.org/doi/full/10.1161/01.cir.0000437741.48606.98). Further, we want to demonstrate how the smoking and race data collected by the program, which researchers usually have to extract using natural language processing, facilitate the calculation of the cardiovascular risk score.
We will calculate the scores using:
1- Data manipulation: Using Python and BigQuery to:
A- Retrieve medications (diabetes), lab measurements including systolic blood pressure, diastolic blood pressure, and cholesterol, as well as race and smoking information provided by participants
2- Visualization:
A- Creating a histogram of the calculated scores using the Python visualization library Matplotlib

Anticipated Findings

For this study, we anticipate demonstrating the validity and importance of data that are collected by the program and that can be challenging to extract from medical records (e.g., smoking status) by calculating the 10-year cardiovascular risk. We expect to find: 1) that data from different sources (EHR and survey data) can easily be combined to build a model or calculate a risk; 2) heterogeneity in the All of Us population, in which populations underrepresented in clinical trials or clinical data sets are better represented; and 3) that the cardiovascular risk score differs across racial groups.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Ya Lin Chen - Graduate Trainee, University of Washington