Research Projects Directory

5,321 active projects

This information was updated 6/2/2023

The Research Projects Directory includes information about all projects that currently exist in the Researcher Workbench to help provide transparency about how the Workbench is being used. Each project specifies whether Registered Tier or Controlled Tier data are used.

Note: Researcher Workbench users provide information about their research projects independently. Views expressed in the Research Projects Directory belong to the relevant users and do not necessarily represent those of the All of Us Research Program. Information in the Research Projects Directory is also cross-posted on AllofUs.nih.gov in compliance with the 21st Century Cures Act.

Nearline Cloud Cost Debug CT

When considering cost-effective practices for interacting with Google Cloud Nearline storage data in the All of Us Research Program, it's essential to focus on scientific questions that align with the program's objectives and can benefit from best utilizing this storage…

Scientific Questions Being Studied

When considering cost-effective practices for interacting with Google Cloud Nearline storage data in the All of Us Research Program, it's essential to focus on scientific questions that align with the program's objectives and can benefit from making the best use of this storage solution.

  • Access patterns: What are the most frequent access patterns for the stored data in Google Cloud Nearline? Understanding access patterns helps identify data subsets that are frequently accessed and might benefit from a different storage tier, such as Google Cloud Coldline or Google Cloud Storage Standard. This knowledge can help optimize costs while ensuring timely access to frequently used data.
  • Query optimization: How can we design efficient querying strategies on Google Cloud Nearline to minimize data retrieval costs? By studying query patterns and optimizing data retrieval techniques, researchers can reduce the amount of data transferred from Nearline storage, leading to cost savings.

Project Purpose(s)

  • Other Purpose (To evaluate the most cost-effective practices for interacting with AoU Nearline storage data.)

Scientific Approaches

To analyze data access patterns in the All of Us Research Program using Google Cloud Nearline storage, we can employ various scientific approaches. Here are a few commonly used methods:

  • Usage logs and access tracking: Analyze usage logs and access tracking data provided by Google Cloud services. These logs contain information about data retrieval requests, timestamps, and user access patterns. By analyzing this data, we can identify which datasets are frequently accessed, how often they are accessed, and the specific access patterns.
  • Machine learning and predictive modeling: Employ machine learning techniques to predict future data access patterns based on historical access data. By training models on past access patterns, we can forecast future demand and adjust storage strategies accordingly. Predictive modeling can help optimize storage tiers, anticipate peak periods, and allocate resources effectively.
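
As a rough illustration of the log-analysis approach, the sketch below counts reads per object and flags objects that are read often enough to be worth reviewing for a different storage class. The record layout (`object`, `timestamp`), the file names, and the threshold are hypothetical; they are not the schema of any Google Cloud log.

```python
from collections import Counter
from datetime import datetime

# Hypothetical access records; real usage logs will have a different schema.
ACCESS_LOG = [
    {"object": "cohort_a.parquet", "timestamp": "2023-05-01T10:00:00"},
    {"object": "cohort_a.parquet", "timestamp": "2023-05-03T09:30:00"},
    {"object": "cohort_b.parquet", "timestamp": "2023-05-04T14:15:00"},
    {"object": "cohort_a.parquet", "timestamp": "2023-05-20T08:45:00"},
]


def access_counts(records):
    """Count how many times each object was read."""
    return Counter(r["object"] for r in records)


def frequently_accessed(records, min_reads):
    """Return the names of objects read at least `min_reads` times, sorted."""
    counts = access_counts(records)
    return sorted(name for name, n in counts.items() if n >= min_reads)


def access_span_days(records, name):
    """Whole days between the first and last read of one object."""
    times = [
        datetime.fromisoformat(r["timestamp"])
        for r in records
        if r["object"] == name
    ]
    return (max(times) - min(times)).days
```

Objects returned by `frequently_accessed` would be candidates for closer review; whether moving them actually saves money depends on current storage and retrieval pricing, which this sketch does not model.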

Anticipated Findings

By combining these scientific approaches, you can gain a comprehensive understanding of data access patterns in the All of Us Research Program. This knowledge can inform decisions on storage optimization, data retrieval strategies, and resource allocation, ultimately enhancing cost-effectiveness and user experience.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Brian Sharber - Project Personnel, Vanderbilt University Medical Center
  • Alexander Bick - Early Career Tenure-track Researcher, Vanderbilt University Medical Center

Contraception

Previous studies have been able to map variations in medication administration and treatment decisions in conditions like Type 2 Diabetes and substance abuse disorder. Similar to contraception, there are many options for the treatment of high blood sugar and substance…

Scientific Questions Being Studied

Previous studies have been able to map variations in medication administration and treatment decisions in conditions like Type 2 Diabetes and substance abuse disorder. As with contraception, there are many options for treating high blood sugar and substance use, and choosing among them may involve a complex decision-making process that balances personal and provider preferences. No currently published work has mapped the contraception decisions of an individual over time. My goal is to quantify the variation in contraception prescriptions for individuals in the All of Us Research Program dataset.
The overall objective of this project is to understand contraception prescription patterns in individuals who identify as female within the All of Us dataset.
(1) What are the most prevalent types of contraception? (2) Has there been a change in the percentages of participants who were prescribed a particular contraceptive? (3) What are the most common sequences of contraceptive prescriptions?

Project Purpose(s)

  • Educational
  • Methods Development

Scientific Approaches

A retrospective, longitudinal analysis will be conducted using the All of Us Research Program dataset to identify contraceptive prescribing patterns for individuals who identify as female in the United States. First, we will use the Cohort Builder within the workbench to identify all eligible adult participants (≥18 years of age) who identify as female. We will then limit the cohort to individuals who have a medication record of at least one contraceptive, including birth control pills (e.g., Microgestin), IUDs (e.g., Levonorgestrel [Mirena]), implants (e.g., Etonogestrel [Nexplanon]), and injections (e.g., Depo-Provera). I will use R within the All of Us Researcher Workbench to complete the study objectives.
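
One way to approach the question about common prescription sequences is to order each participant's records by date, collapse consecutive repeats of the same method, and count the resulting sequences. The sketch below is written in Python purely for illustration (the project itself plans to use R), and the records, identifiers, and method labels are made up.

```python
from collections import Counter, defaultdict

# Hypothetical records: (participant_id, prescription_date, method).
RECORDS = [
    (1, "2019-01-10", "pill"),
    (1, "2020-06-01", "IUD"),
    (2, "2018-03-15", "pill"),
    (2, "2018-09-15", "pill"),
    (2, "2021-02-20", "IUD"),
    (3, "2020-11-05", "injection"),
]


def method_sequences(records):
    """Map each participant to their ordered sequence of distinct methods.

    Consecutive prescriptions of the same method are collapsed so that a
    refill does not count as a switch. ISO dates sort correctly as strings.
    """
    by_person = defaultdict(list)
    for pid, _date, method in sorted(records, key=lambda r: (r[0], r[1])):
        seq = by_person[pid]
        if not seq or seq[-1] != method:
            seq.append(method)
    return {pid: tuple(seq) for pid, seq in by_person.items()}


def most_common_sequences(records, top=3):
    """Return the `top` most frequent method sequences with their counts."""
    return Counter(method_sequences(records).values()).most_common(top)
```

Collapsing repeats is a design choice: it treats a sequence as a series of switches between methods rather than a series of prescriptions.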

Anticipated Findings

I anticipate that I will be able to complete the study objectives within the timeframe of the summer. Overall, the anticipated results of the project will have a significant positive impact on the field of contraception research and ultimately improve the quality of care for individuals seeking contraception. In the long term, I expect to develop predictive models that can help healthcare providers identify the most effective contraceptive methods for individual patients based on their characteristics and preferences. I also look forward to developing educational resources that help individuals and healthcare providers make decisions.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Caitlin Dreisbach - Early Career Tenure-track Researcher, University of Rochester
  • Carol Li - Undergraduate Student, University of Rochester

Duplicate of Duplicate of Vaping study

Vaping and skin cancer. Unsure of exact parameters, but want to look at everything, including melanoma, basal cell carcinoma, and squamous cell carcinoma.

Scientific Questions Being Studied

Vaping and skin cancer. Unsure of exact parameters, but want to look at everything, including melanoma, basal cell carcinoma, and squamous cell carcinoma.

Project Purpose(s)

  • Disease Focused Research (Skin cancers)

Scientific Approaches

Odds ratios. Will probably just look for associations. Will also include regression analyses and other statistical tests that may link vaping to increased odds of skin cancer.
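
For reference, an odds ratio from a 2x2 exposure-by-outcome table can be computed as sketched below, with an approximate 95% confidence interval from the standard log (Woolf) method. The counts in the usage note are invented for illustration and are not All of Us results.

```python
import math


def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Odds ratio for a 2x2 table: (a * d) / (b * c)."""
    return (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)


def odds_ratio_ci(exposed_cases, exposed_controls, unexposed_cases,
                  unexposed_controls, z=1.96):
    """Approximate confidence interval using the log odds ratio.

    The standard error of log(OR) is sqrt(1/a + 1/b + 1/c + 1/d);
    z=1.96 gives an approximate 95% interval. All cells must be non-zero.
    """
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)
```

With made-up counts of 20 exposed cases, 80 exposed controls, 10 unexposed cases, and 90 unexposed controls, `odds_ratio(20, 80, 10, 90)` gives 2.25. An unadjusted odds ratio like this ignores confounders, which is where the regression models mentioned above come in.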

Anticipated Findings

Given the inconclusive data on vaping and skin cancers, I am unsure what results to expect. It is possible that there is an increased risk, but vaping may not have been around long enough to determine this.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Harrison Zhu - Graduate Trainee, Baylor College of Medicine

PRISM

Clinicians today rely on imprecise, generalized methods in the treatment of complex diseases, leading to less effective care. Polygenic Risk Scores (PRS’s) are one method to help providers make more informed decisions, yet they are sparsely adopted due to two…

Scientific Questions Being Studied

Clinicians today rely on imprecise, generalized methods in the treatment of complex diseases, leading to less effective care. Polygenic Risk Scores (PRS’s) are one method to help providers make more informed decisions, yet they are sparsely adopted due to two key limitations: PRS’s consider only individual genetic variants in a summative manner and they are trained on data representative of primarily European populations. However, new precision medicine initiatives and genomic data will bring PRS’s back to the forefront of innovation. Therefore, we present a plan with the following specific aims: 1) To design a study cohort ethnically representative of the U.S. population and limited in confounding diagnoses, 2) To compile a list of risk variants and associated pathways for a well-studied disease with a PRS, 3) To develop a more generally applicable PRS that factors in gene-gene interactions, and 4) To evaluate the accuracy of the PRS in ethnicity-stratified data.

Project Purpose(s)

  • Population Health
  • Methods Development
  • Ancestry

Scientific Approaches

We plan to use the genomics data from this database, examining the alleles present for a number of selected risk variants for a disease of choice (chosen simply for comparison purposes, not to seek greater understanding). We hope to design a cohort whose distribution of ancestry is similar to the actual distribution of ancestry in the U.S. and to limit the presence of confounding diagnoses as much as feasible. We will then train a non-linear model that takes into account biologically relevant cross-terms, or gene-gene relationships, to generate a polygenic risk score that will be more accurate in general and for a broader range of patients.
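
To make the idea of cross-terms concrete, the sketch below contrasts a conventional additive score (a weighted sum of risk-allele dosages) with a score that adds pairwise interaction terms. This is only one simple way to represent gene-gene relationships and is not necessarily the model the project will train; the variant names and weights are hypothetical.

```python
def additive_prs(dosages, weights):
    """Conventional PRS: sum of weight * risk-allele dosage (0, 1, or 2)."""
    return sum(w * dosages.get(variant, 0) for variant, w in weights.items())


def interaction_prs(dosages, weights, pair_weights):
    """Additive PRS plus pairwise cross-terms.

    `pair_weights` maps a (variant_1, variant_2) tuple to the weight applied
    to the product of the two dosages.
    """
    score = additive_prs(dosages, weights)
    for (v1, v2), w in pair_weights.items():
        score += w * dosages.get(v1, 0) * dosages.get(v2, 0)
    return score


# Hypothetical example values, for illustration only.
EXAMPLE_DOSAGES = {"variant_A": 2, "variant_B": 1, "variant_C": 0}
EXAMPLE_WEIGHTS = {"variant_A": 0.5, "variant_B": 0.25, "variant_C": 1.0}
EXAMPLE_PAIR_WEIGHTS = {("variant_A", "variant_B"): 0.125}
```

With these example values the additive score is 1.25 and the interaction term contributes a further 0.25, so the two scores diverge only for individuals who carry risk alleles at both interacting variants.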

Anticipated Findings

We anticipate finding that training an algorithm that makes a risk estimate on more ethnically diverse data will increase the accuracy for patients of non-European ancestry. This will hopefully indicate the necessity to train other risk estimates on more diverse data to improve clinical decision-making. We also anticipate finding that including more complex interactions in our risk estimate will increase the granularity (i.e., discerning disease stages) and accuracy of the estimate. This again will improve clinical decision-making and increase our ability to use risk modeling algorithms such as these to learn more about the interactions underlying complex diseases.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Noah Fields - Graduate Trainee, Stanford University
  • Allie Littleton - Graduate Trainee, Stanford University

Demo

jugk

Scientific Questions Being Studied

jugk

Project Purpose(s)

  • Ancestry

Scientific Approaches

khbkj

Anticipated Findings

jhgk

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

PinkPig

Benchmarking the workbench space and cohort creation processes to estimate data usage and pricing. The parameters developed here will be used as guidelines to develop larger cohorts and assess compute and storage costs for future research.

Scientific Questions Being Studied

Benchmarking the workbench space and cohort creation processes to estimate data usage and pricing. The parameters developed here will be used as guidelines to develop larger cohorts and assess compute and storage costs for future research.

Project Purpose(s)

  • Educational

Scientific Approaches

Specific approaches include creating a Cromwell-based analysis pipeline. The data sets will be short- and long-read WGS data from participants who meet the criteria of smoking and small cell lung cancer.

Anticipated Findings

The anticipated findings of the study are:
1. an assessment of compute and storage costs for research on SV analysis
2. identified and documented steps for instructional purposes.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Feseha Abebe-Akele - Early Career Tenure-track Researcher, Elizabeth City State University

Explore the potential mechanisms of noncoding variants in human diseases

Noncoding regulatory elements, such as enhancers, act as regulators of gene expression. The Encyclopedia of DNA Elements (ENCODE) project and other studies have identified millions of regulatory elements in various tissues. The roles of noncoding regulatory elements in many human…

Scientific Questions Being Studied

Noncoding regulatory elements, such as enhancers, act as regulators of gene expression. The Encyclopedia of DNA Elements (ENCODE) project and other studies have identified millions of regulatory elements in various tissues. The roles of noncoding regulatory elements in many human diseases, including cancer, cardiovascular disease, and psychiatric disorders, have attracted considerable attention. GWAS aimed at understanding the genetic basis of these diseases have also identified many noncoding regulatory variants responsible for genetic risk, yet the mechanisms behind these risk variants remain poorly understood. Recent work integrating multidimensional genomic data, such as expression, methylation, histone modification, chromatin accessibility, and three-dimensional organization data, has enhanced interpretation of noncoding risk variants in human diseases. However, systematic analysis of multidimensional genomic data is complex and has become a major challenge in the field.

Project Purpose(s)

  • Methods Development
  • Ancestry

Scientific Approaches

In this project, we will develop novel and powerful computational methods to systematically discover potential biomarkers and drug targets in human diseases, such as cardiovascular disease, arteriosclerosis, lung diseases, asthma, T2D, and many other metabolic diseases, based on multidimensional genomic data. The All of Us project provides a great resource of genomic data in well-phenotyped, disease-relevant populations. We will first identify molecular differences between healthy individuals and patients based on All of Us sequencing data. We will then use newly developed computational methods to systematically integrate multidimensional genomic data from other cohorts.

Anticipated Findings

In this study, we will develop novel computational methods to systematically discover disease-causing changes in noncoding elements, which may be used as drug targets in the future. Computational tools will be shared with all researchers. Findings from this project will be disseminated widely and shared with the scientific community by presenting results at national scientific meetings and publishing in peer-reviewed journals.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Ya Cui - Research Fellow, University of California, Irvine

Duplicate of ABO PheWAS

Research questions: 1) Can our novel ABO blood typing algorithm using genetic data be used effectively to extensively type ABO subtypes from whole genome sequencing and array data in a diverse cohort? 2) Will a SNP approach for ABO blood…

Scientific Questions Being Studied

Research questions:

1) Can our novel ABO blood typing algorithm using genetic data be used effectively to extensively type ABO subtypes from whole genome sequencing and array data in a diverse cohort?
2) Will a SNP approach for ABO blood typing be concordant with available serotype?
3) Which disease associations with ABO blood types can be replicated using the AllofUs dataset?
4) What novel disease associations, if any, with ABO blood types can be identified in a diverse cohort?

Relevance: Genomic variation in RBC and antigens is associated with a myriad of conditions. The ABO locus alone is associated with many conditions including venous thromboembolism (VTE), pancreatic cancer, malaria, and COVID-19. Furthermore, it is not common practice to extensively type beyond the traditional ABO blood groups, and the studies that do so are primarily done in individuals of European ancestry. Thus, we seek to do the first PheWAS on extensively typed RBC antigens and to do so in a diverse cohort.

Project Purpose(s)

  • Disease Focused Research (red blood cell (RBC) antigen-associated diseases)

Scientific Approaches

We plan to employ a blood typing algorithm to extensively type RBC antigens from 1) whole genome sequencing and 2) array data in the AllofUs cohort, and compare the two outcomes. Then, we plan to employ the phenome-wide association study (PheWAS) approach to identify associations between RBC antigen types and other clinical phenotypes. PheWAS will be carried out using multivariable linear and logistic regressions on the ABO blood groups assigned by our novel ABO blood typing algorithm. For example, in the case of the ABO blood group, ABO blood subtypes (A101, A102, Aw01, B101, etc.) will act as the independent variable and phenotypes, derived from participant provided information (PPI) and electronic health records (EHR), as the dependent variable. Initial models will include adjustments for age, gender, and race/ethnicity. Differential associations by race/ethnicity, gender, and sex will also be evaluated.

Anticipated Findings

This proposed project aims to test our novel ABO blood typing algorithm on WGS and array data in the diverse AllofUs cohort. We also aim to replicate known RBC-disease associations as well as identify any novel ones within a diverse cohort.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Tutorial

In this workshop, I am trying to get familiar with the gastrointestinal dataset at All of Us. In doing so I'll explore the distribution of the diseases across the US, and compare the relevance of these diseases amongst different ages…

Scientific Questions Being Studied

In this workshop, I am trying to get familiar with the gastrointestinal dataset at All of Us. In doing so I'll explore the distribution of the diseases across the US, and compare the relevance of these diseases amongst different ages and groups.

Project Purpose(s)

  • Educational

Scientific Approaches

In this workshop, I plan to use descriptive statistics to get familiar with the distribution of the gastrointestinal dataset at All of Us. In doing so I'll explore the distribution of the diseases across the US, and compare the relevance of these diseases amongst different ages and groups.

Anticipated Findings

Anticipated findings from the study could show how these diseases are distributed across the nation and ultimately help motivate allocating resources to the most affected areas.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Alireza Majd - Research Fellow, University of California, San Francisco
  • Ali Kalantari - Graduate Trainee, University of California, San Francisco

PGx Genotyping Comparison (v7)

As interest in PGx implementation grows, there is a lot of discussion over which technologies to use and their benefits and drawbacks. Using publicly available biobank data, this will be the first comprehensive study of the strengths and weaknesses of…

Scientific Questions Being Studied

As interest in PGx implementation grows, there is a lot of discussion over which technologies to use and their benefits and drawbacks. Using publicly available biobank data, this will be the first comprehensive study of the strengths and weaknesses of each genotyping technology for use in PGx, as measured by calling accuracy and PGx coverage against a gold standard of high coverage WGS.

Project Purpose(s)

  • Methods Development

Scientific Approaches

We will use the WGS data, and also derive a synthetic exome and low-pass WGS dataset from it by downsampling reads and extracting relevant regions. We will also use the array data. We will use PharmCAT to call the pharmacogenetic variants in each dataset and compare their concordance with calls from the 40x WGS dataset.
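
The two evaluation measures named above, coverage and calling accuracy against a gold standard, could be computed along the following lines. This is a simplified sketch with hypothetical site names and genotype strings; it does not reproduce PharmCAT output or the project's actual comparison pipeline.

```python
def coverage_and_accuracy(gold, test):
    """Compare genotype calls from one technology against a gold standard.

    `gold` and `test` map a site name to a genotype string. A site that is
    missing from `test`, or mapped to None, counts as not called.

    Returns (coverage, accuracy):
      coverage = called gold-standard sites / all gold-standard sites
      accuracy = called sites that match the gold standard / called sites
                 (None when nothing was called)
    """
    called = {
        site: genotype
        for site, genotype in test.items()
        if site in gold and genotype is not None
    }
    coverage = len(called) / len(gold)
    if not called:
        return coverage, None
    matches = sum(1 for site, genotype in called.items() if genotype == gold[site])
    return coverage, matches / len(called)


# Hypothetical calls, for illustration only.
GOLD_CALLS = {"site1": "A/A", "site2": "A/G", "site3": "G/G", "site4": "C/T"}
ARRAY_CALLS = {"site1": "A/A", "site2": "A/A", "site3": "G/G", "site4": None}
```

Keeping the two measures separate matters here because a technology can call few sites correctly (low coverage, high accuracy) or many sites poorly (high coverage, low accuracy), which is the trade-off the study sets out to characterize.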

Anticipated Findings

We anticipate that 1x WGS will have high coverage, but suffer in per-site accuracy. Exome will likely have high accuracy and decent coverage but will miss key noncoding variants (specifically, we expect it to perform poorly for CYP2C19). Array will likely have low coverage and relatively low per-site accuracy. These findings will inform decisions for clinical implementation of pharmacogenetic testing.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Karl Keat - Graduate Trainee, University of Pennsylvania

Collaborators:

  • Binglan Li - Research Fellow, Stanford University

BMIanalyses v7

I am exploring the data at this stage, and have not formalized a specific research question. I am generally interested in the genetic and non-genetic factors that lead to obesity, and would like to first get familiar with the social/environmental…

Scientific Questions Being Studied

I am exploring the data at this stage, and have not formalized a specific research question. I am generally interested in the genetic and non-genetic factors that lead to obesity, and would like to first get familiar with the social/environmental variables and with the genetic data. One question I may be interested in is how do associations of social and environmental variables with obesity differ upon adjusting for genetic risk for obesity.

Project Purpose(s)

  • Disease Focused Research (obesity)
  • Population Health
  • Social / Behavioral
  • Ancestry

Scientific Approaches

I will look at basic descriptive statistics, such as counts and means, as well as multivariable linear/logistic regression approaches to assess associations with BMI, obesity and related traits. With the genetic data, I will calculate polygenic risk scores for BMI, and will also use GWAS and Mendelian randomization analyses.
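
As a small example of the descriptive step, the sketch below computes BMI as weight in kilograms divided by height in meters squared, flags obesity at the commonly used cutoff of 30 kg/m², and reports counts and means. The field names and participant values are hypothetical and do not reflect the All of Us data model.

```python
from statistics import mean


def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height (m) squared."""
    return weight_kg / height_m ** 2


def is_obese(bmi_value, threshold=30.0):
    """Flag obesity using a BMI cutoff (30 kg/m^2 by default)."""
    return bmi_value >= threshold


def summarize(participants):
    """Return the count, mean BMI, and number of obese participants."""
    values = [bmi(p["weight_kg"], p["height_m"]) for p in participants]
    return {
        "n": len(values),
        "mean_bmi": mean(values),
        "n_obese": sum(1 for v in values if is_obese(v)),
    }


# Hypothetical participants, for illustration only.
EXAMPLE_PARTICIPANTS = [
    {"weight_kg": 80.0, "height_m": 2.0},
    {"weight_kg": 90.0, "height_m": 1.5},
    {"weight_kg": 72.0, "height_m": 2.0},
]
```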

Anticipated Findings

Although this is exploratory in nature, I am interested in showing how we can incorporate measures of genetic risk to better identify and understand non-genetic risk factors for obesity.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Registered Tier

Research Team

Owner:

Collaborators:

  • Jennifer Kraszewski - Graduate Trainee, University of Arizona
  • Ashley Maxwell - Undergraduate Student, University of Arizona

Duplicate of Demo - Siloed Analysis of All of Us and UK Biobank Genomic Data

Historically, researchers responded to limitations in genomic data sharing policy and practice by conducting meta analysis on summary outputs from isolated genomic datasets. Recent work has demonstrated the increased power of individual-level genetic analysis on pooled datasets. In addition, advancements…

Scientific Questions Being Studied

Historically, researchers responded to limitations in genomic data sharing policy and practice by conducting meta analysis on summary outputs from isolated genomic datasets. Recent work has demonstrated the increased power of individual-level genetic analysis on pooled datasets. In addition, advancements in data access and sharing policies coupled with technological advancements in cloud-based environments for data access and analysis have opened up new possibilities for pooled analysis of large-scale genomic datasets. The NIH All of Us Research Program and UK Biobank are two leading examples of large, population scale studies which combine genomic data with deep phenotypic health data. There is a grand opportunity to demonstrate how the world’s largest research-ready biomedical datasets can create more value together and advance discovery in genome science.

Project Purpose(s)

  • Other Purpose (This is a demonstration project meant to support research with All of Us genomic data. Please see https://www.biorxiv.org/content/10.1101/2022.11.29.518423)

Scientific Approaches

The primary goal of this project is to demonstrate the potential of the All of Us Researcher Workbench for pooled analyses of All of Us and UK Biobank data. Specifically, we aim to: 1. Develop and describe an approved, secure path for connecting UK Biobank data to the All of Us Researcher Workbench. 2. Conduct a genome-wide association study of blood lipids on the pooled dataset aimed at demonstrating that biomedical researchers can be more productive when permitted to analyze the union of the cohorts, as opposed to computing aggregate results in separate data silos for each cohort and then combining those aggregates.

Anticipated Findings

The secondary goal of this project is to demonstrate and measure the experience when the same analyses are repeated in a siloed manner. Specifically we aim to: 3. Repeat the previously described genome-wide association study on the All of Us Researcher Workbench when working with the All of Us data and on UK Biobank’s DNAnexus when working with the UK Biobank data. 4. Conduct a meta analysis on the aggregate results for each cohort (in accordance with each program’s data use policies) and compare the result of combining those aggregates to the results from the pooled analysis. Evaluate not only differences in results, but also differences in analysis cost and analyst productivity.
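
For the siloed comparison, per-cohort effect estimates for a variant are typically combined with a meta-analysis. The sketch below shows inverse-variance weighted fixed-effect pooling, which is one standard choice; the project description does not say which meta-analysis method is used, and the numbers in the usage note are invented.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance weighted fixed-effect meta-analysis.

    `estimates` is a list of (beta, standard_error) pairs, one per cohort.
    Each cohort is weighted by 1 / se^2. Returns the pooled beta and its
    standard error, sqrt(1 / sum of weights).
    """
    weights = [1 / se ** 2 for _beta, se in estimates]
    total = sum(weights)
    pooled_beta = sum(w * beta for w, (beta, _se) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    return pooled_beta, pooled_se
```

For two cohorts with equal standard errors, say `fixed_effect_meta([(0.10, 0.02), (0.20, 0.02)])`, the pooled estimate is the simple average of the two betas and the pooled standard error shrinks by a factor of the square root of 2.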

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Collaborators:

  • Margaret Sunitha Selvaraj - Research Fellow, Broad Institute
  • Melissa Patrick - Project Personnel, All of Us Program Operational Use
  • Jennifer Zhang - Project Personnel, All of Us Program Operational Use
  • Gage Rion - Project Personnel, All of Us Program Operational Use
  • David Glazer - Other, All of Us Program Operational Use
  • Christopher Lord - Project Personnel, All of Us Program Operational Use
  • Aymone Kouame - Other, All of Us Program Operational Use
  • Alexander Bick - Early Career Tenure-track Researcher, Vanderbilt University Medical Center

yong_longcovid

We would like to investigate risk factors for developing long covid, particularly via causal inference methods using only observational data.

Scientific Questions Being Studied

We would like to investigate risk factors for developing long covid, particularly via causal inference methods using only observational data.

Project Purpose(s)

  • Disease Focused Research (long covid)
  • Methods Development
  • Ancestry

Scientific Approaches

We plan to apply causal inference methods to study the causal effect of certain genes, along with other variables such as age, gender, and lifestyle, on developing long covid. The data we are interested in include genetic data, demographics, lab tests, vital signs, drug exposure, etc.

Anticipated Findings

We would like to identify predictors of long covid and propose appropriate intervention strategies for helping people who suffer from long covid.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Yong Huang - Graduate Trainee, University of California, Irvine

Duplicate of How to Get Started with Registered Tier Data (v7)

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data. What should you expect? This notebook will give you an overview of what data is available in the current…

Scientific Questions Being Studied

We recommend that all researchers explore the notebooks in this workspace to learn the basics of All of Us Program Data.

What should you expect? This notebook will give you an overview of what data is available in the current Curated Data Repository (CDR). It will also teach you how to retrieve information about Electronic Health Record (EHR), Physical Measurements (PM), and Survey data.

Project Purpose(s)

  • Educational
  • Methods Development
  • Other Purpose (This is an All of Us Tutorial Workspace. It is meant to provide instruction for key Researcher Workbench components and All of Us data representation.)

Scientific Approaches

This Tutorial Workspace contains two Jupyter Notebooks (one written in Python, the other in R). Each notebook is divided into the following sections:

1. Setup: How to set up this notebook, install and import software packages, and select the correct version of the CDR.
2. Data Availability Part 1: How to summarize the number of unique participants with major data types: Physical Measurements, Survey, and EHR;
3. Data Availability Part 2: How to delve a little deeper into data availability within each major data type;
4. Data Organization: An explanation of how data is organized according to our common data model.
5. Example Queries: How to directly query the CDR, using two examples of SQL queries to extract demographic data.
6. Expert Tip: How to access the base version of the CDR, for users that want to do their own cleaning.
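
A demographic query of the kind described in the Example Queries section might look like the sketch below, which only builds the SQL text. It assumes the CDR follows the OMOP common data model (`person` and `concept` tables) and that the caller supplies the dataset name; how the notebooks actually obtain the dataset name and run the query is not shown here, and the dataset name in the usage note is a placeholder.

```python
def demographics_query(dataset):
    """Build a SQL query that lists each participant's birth year and gender.

    `dataset` is the fully qualified dataset name supplied by the caller
    (an assumption; the tutorial notebooks define how to look it up).
    The join resolves the numeric gender_concept_id to a readable name.
    """
    return f"""
    SELECT
        p.person_id,
        p.year_of_birth,
        c.concept_name AS gender
    FROM `{dataset}.person` AS p
    LEFT JOIN `{dataset}.concept` AS c
        ON p.gender_concept_id = c.concept_id
    """
```

Calling `demographics_query("my_project.my_cdr")` returns a query string that can be passed to whatever SQL client the workspace provides.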

Anticipated Findings

By reading and running the notebooks in this Tutorial Workspace, you will understand the following:

  • All of Us data are made available in a Curated Data Repository.
  • Participants may contribute any combination of survey, physical measurement, and electronic health record data.
  • Not all participants contribute all possible data types.
  • Each unique piece of health information is given a unique identifier called a concept_id and organized into specific tables according to our common data model.
  • You can use these concept_ids to query the CDR and pull data on specific health information relevant to your analysis.

See our support article Learning the Basics of the All of Us Dataset for more info.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Elena_lit_review

Demographics of those with Multiple Sclerosis, COPD, Ischaemic Stroke, Atopic Dermatitis, Asthma, Rheumatoid Arthritis, Venous Thromboembolism, ALS, Major Depressive Disorder, Schizophrenia, Platelet Count, Alzheimer's in v7

Scientific Questions Being Studied

Demographics of those with Multiple Sclerosis, COPD, Ischaemic Stroke, Atopic Dermatitis, Asthma, Rheumatoid Arthritis, Venous Thromboembolism, ALS, Major Depressive Disorder, Schizophrenia, Platelet Count, Alzheimer's in v7

Project Purpose(s)

  • Ancestry

Scientific Approaches

I want to test blood cell traits for associations with these diseases, but first need to assess whether AllofUs has enough individuals with these diseases.

Anticipated Findings

I will obtain the age, sex, and ancestry of individuals with these diseases. This will determine whether there is enough data to look for associations.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Micah Hysong - Graduate Trainee, University of North Carolina, Chapel Hill

Duplicate of Skills Assessment Training Notebooks For Users (v7)

This workspace contains multiple notebooks that assess users' understanding of the workbench and OMOP. These notebooks are meant to help users check their knowledge not only on Python, R, and SQL, but also on the general data structure and data…

Scientific Questions Being Studied

This workspace contains multiple notebooks that assess users' understanding of the workbench and OMOP. These notebooks are meant to help users check their knowledge not only on Python, R, and SQL, but also on the general data structure and data model used by the All of Us program.

Project Purpose(s)

  • Educational

Scientific Approaches

No scientific approach is used in this workspace because it is meant for educational purposes only. We will cover all aspects of OMOP, and hence will use most datasets available in the workbench.

Anticipated Findings

We do not anticipate any findings. Instead, we are educating people on the use of the workbench and the common data model OMOP used by the program.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Kaitlyn Ford - Undergraduate Student, Salve Regina University

Duplicate of Skills Assessment Training Notebooks For Users (v7)

This workspace contains multiple notebooks that assess users' understanding of the workbench and OMOP. These notebooks are meant to help users check their knowledge not only on Python, R, and SQL, but also on the general data structure and data…

Scientific Questions Being Studied

This workspace contains multiple notebooks that assess users' understanding of the workbench and OMOP. These notebooks are meant to help users check their knowledge not only on Python, R, and SQL, but also on the general data structure and data model used by the All of Us program.

Project Purpose(s)

  • Educational

Scientific Approaches

No scientific approach is used in this workspace because it is meant for educational purposes only. We will cover all aspects of OMOP, and hence will use most datasets available in the workbench.

Anticipated Findings

We do not anticipate any findings. Instead, we are educating people on the use of the workbench and the common data model OMOP used by the program.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

  • Gabriella Papale - Early Career Tenure-track Researcher, Salve Regina University

Exploration of Available Laboratory microRNA Data

The reason for exploring this data is to determine whether data is being collected on a novel biomarker in breastmilk. Our research question is to determine whether this novel biomarker in breastmilk varies by population characteristics.

Scientific Questions Being Studied

The reason for exploring this data is to determine whether data is being collected on a novel biomarker in breastmilk. Our research question is to determine whether this novel biomarker in breastmilk varies by population characteristics.

Project Purpose(s)

  • Population Health

Scientific Approaches

The data sets will include RNA-sequencing data and demographic data. The methods will be very basic, such as frequency tables and percentages.

Anticipated Findings

We would like to know the feasibility of exploring this research question further. Our findings would contribute long-term to the health of moms and babies.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Registered Tier

Research Team

Owner:

Azathioprine Replication

We are trying to study the risk of leukopenia or pancreatitis in azathioprine patients. We would like to build prediction models for these conditions. This is an important question to answer because it helps clinicians detect which patients are at risk…

Scientific Questions Being Studied

We are trying to study the risk of leukopenia or pancreatitis in azathioprine patients. We would like to build prediction models for these conditions. This is an important question to answer because it helps clinicians detect which patients are at risk of these side effects when prescribing azathioprine.

Project Purpose(s)

  • Disease Focused Research (leukopenia)
  • Methods Development
  • Control Set
  • Ancestry

Scientific Approaches

We plan to collect data on patients who are new users of azathioprine. We want to select various demographic characteristics like sex, age, indication of azathioprine and genetic makeup. Then we would like to use this dataset to create prediction models.

Anticipated Findings

We anticipate building accurate models that produce a risk score for detecting the risk of leukopenia or pancreatitis in patients on azathioprine.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

Pulmonary Hypertension Research

We hope to better understand the association between pulmonary hypertension, cardiovascular diseases, sleep disorders and inflammatory auto-immune disorders.

Scientific Questions Being Studied

We hope to better understand the association between pulmonary hypertension, cardiovascular diseases, sleep disorders and inflammatory auto-immune disorders.

Project Purpose(s)

  • Disease Focused Research (pulmonary hypertension)
  • Ancestry

Scientific Approaches

We plan to use genetic data, clinical data, and survey data to better understand how these variables impact pulmonary hypertension.

Anticipated Findings

The relationship between pulmonary hypertension and other comorbidities will better inform clinical choice in therapy, and make for more tailored approaches to medically complex diseases.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Satoshi Okawa - Early Career Tenure-track Researcher, University of Pittsburgh
  • Anisha Shah - Other, University of Pittsburgh

Collaborators:

  • Hunter Hollis - Project Personnel, All of Us Program Operational Use

Identity-by-descent in the United States

We are leveraging the full genomic and population diversity of the All of Us project to understand the genetic ancestral basis of diversity in the causes, etiology and treatment of health outcomes. All of Us provides the racial and ethnic…

Scientific Questions Being Studied

We are leveraging the full genomic and population diversity of the All of Us project to understand the genetic ancestral basis of diversity in the causes, etiology and treatment of health outcomes. All of Us provides the racial and ethnic background of participants, but these are inaccurate proxies for genetic ancestry. Genetic ancestry itself will help us understand the contribution of genetic ancestral differences among individuals to the biological basis of health outcomes. Therefore, we will measure genetic diversity and identify the genetic ancestry of All of Us participants throughout the United States. This information will help us better understand biological variation that contributes to differences in health outcomes.

Project Purpose(s)

  • Population Health
  • Ancestry

Scientific Approaches

We are first quantifying fine-scale population substructure using genomic approaches that measure: a) global genetic diversity, or the total proportion of different global ancestries represented in an individual's genome; b) local genetic ancestry, or where in the genome this ancestry is located in an individual; c) detection of genomic segments shared identity-by-descent (IBD). These IBD segments are segments of DNA shared between individuals from a shared common ancestor. We are using Hail, PLINK, ADMIXTURE, RFMix, MOSAIC, TBWPT, and in-house Python and R scripts and other genomic software to capture this variation.
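To make the relationship between local and global ancestry concrete, here is a minimal sketch that sums local-ancestry segment lengths into per-ancestry proportions for one individual. It is an illustrative toy, not the method used by RFMix, ADMIXTURE, or the project team; the function name, the segment tuple layout, and the ancestry labels "A" and "B" are all hypothetical.

```python
from collections import defaultdict


def global_ancestry_proportions(segments):
    """Summarize local-ancestry calls into global ancestry proportions.

    segments: iterable of (start_bp, end_bp, ancestry_label) tuples for one
    individual (hypothetical layout, assumed non-overlapping).
    Returns a dict mapping each label to its fraction of the covered length.
    """
    totals = defaultdict(int)
    for start, end, label in segments:
        if end <= start:
            raise ValueError("segment end must be greater than start")
        totals[label] += end - start

    covered = sum(totals.values())
    if covered == 0:
        raise ValueError("no segments provided")
    return {label: length / covered for label, length in totals.items()}


if __name__ == "__main__":
    # Toy local-ancestry calls: where in the genome each ancestry is located.
    toy_segments = [(0, 600, "A"), (600, 1000, "B"), (1000, 2000, "A")]
    print(global_ancestry_proportions(toy_segments))
```

In this toy case, 1,600 of 2,000 covered base pairs are labeled "A", so the global proportion for "A" is 0.8 and for "B" is 0.2.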

Anticipated Findings

We anticipate that we will identify founder populations that are distributed differently across the United States, and distinguish population subgroups that are finer grained than either racial categories or continental ancestry categories. For example, the Latino ethnicity comprises individuals who are Dominican, Puerto Rican, Mexican, Cuban, etc. We anticipate being able to distinguish these groups, as well as the admixture among these groups, to more accurately understand the contribution of ancestry to health outcomes. Quantifying this ancestry is the first step to understanding the biological diversity within the United States.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Sri Raj - Early Career Tenure-track Researcher, Albert Einstein College of Medicine

Collaborators:

  • William Jerome - Graduate Trainee, Albert Einstein College of Medicine
  • Mariko Segawa - Research Fellow, Albert Einstein College of Medicine
  • Kevin Tao - Graduate Trainee, Albert Einstein College of Medicine
  • Ishana Raghuram - Project Personnel, University of California, Berkeley
  • Hersh Gupta - Graduate Trainee, Albert Einstein College of Medicine
  • Chynna Smith - Graduate Trainee, Albert Einstein College of Medicine
  • chenxin zhang - Project Personnel, Albert Einstein College of Medicine
  • Tinaye Mutetwa - Graduate Trainee, Albert Einstein College of Medicine

Azathioprine Replication

We are trying to study the risk of leukopenia or pancreatitis in azathioprine patients. We would like to build prediction models for these conditions. This is an important question to answer because it helps clinicians detect which patients are at risk…

Scientific Questions Being Studied

We are trying to study the risk of leukopenia or pancreatitis in azathioprine patients. We would like to build prediction models for these conditions. This is an important question to answer because it helps clinicians detect which patients are at risk of these side effects when prescribing azathioprine.

Project Purpose(s)

  • Disease Focused Research (leukopenia and pancreatitis)
  • Methods Development
  • Control Set
  • Ancestry

Scientific Approaches

We plan to collect data on patients who are new users of azathioprine. We want to select various demographic characteristics like sex, age, indication of azathioprine and genetic makeup. Then we would like to use this dataset to create prediction models.

Anticipated Findings

We anticipate building accurate models that produce a risk score for detecting the risk of leukopenia or pancreatitis in patients on azathioprine.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

The Genetics of Endometriosis in Diverse Ancestries

More than 200,000 women are diagnosed with endometriosis every year, and over half of those women do not receive a definitive diagnosis until 8.5 years after the onset of symptoms, often when they present with additional comorbidities. While…

Scientific Questions Being Studied

More than 200,000 women are diagnosed with endometriosis every year, and over half of those women do not receive a definitive diagnosis until 8.5 years after the onset of symptoms, often when they present with additional comorbidities. While several studies have suggested that genomic markers, environmental risk factors and inflammatory markers play crucial roles in endometriosis symptomatology, there are no effective tools available to predict an individual’s risk of developing endometriosis or to predict its downstream effects. The long-term goal is to develop effective and non-invasive early screening tools to identify patients at risk of developing endometriosis and predict long-term effects. The main objective of this project is to develop models that predict the risk of endometriosis across varied clinical manifestations and associated long-term health outcomes, integrating genetic and non-genetic risk factors extracted from Electronic Health Records.

Project Purpose(s)

  • Disease Focused Research (endometriosis)
  • Ancestry

Scientific Approaches

We plan to use phenotype and genotype data for this project. We will approach the analyses in multiple ways:
- Genetic architecture elucidation via common-variant and rare-variant association analyses
- Testing of polygenic risk scores
- Genotypic and biometric clustering approaches
- Mediation and Mendelian randomization analyses

Anticipated Findings

The expected outcomes will be rigorously evaluated non-invasive computational methods for screening and diagnosing endometriosis across various clinical manifestations and its long-term effects based on genetic and non-genetic factors. In addition, our screening and diagnostic methods will be applicable to women of diverse ancestries.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Shefali Verma - Early Career Tenure-track Researcher, University of Pennsylvania
  • Lindsay Guare - Graduate Trainee, University of Pennsylvania

Duplicate of How to Work with Genomics Data (CRAM_Processing and IGV)_v7HC

This workspace and its notebooks neither ask nor answer any scientific questions. The purpose of this workspace is to serve as a tutorial which shows how to localize the All of Us (AoU) CRAM files individually or in groups via…

Scientific Questions Being Studied

This workspace and its notebooks neither ask nor answer any scientific questions. The purpose of this workspace is to serve as a tutorial that shows how to localize the All of Us (AoU) CRAM files individually or in groups via the CRAM manifest, and how to render the Integrative Genomics Viewer (IGV) on the AoU workbench to explore the CRAM files.

Project Purpose(s)

  • Methods Development

Scientific Approaches

This workspace conducts no study and applies no scientific approaches. This workspace and its notebooks are tutorials for localizing AoU CRAM files with R commands and using IGV to explore their contents. The methods and tools employed include R system commands for localizing individual CRAM files, an R for loop for localizing multiple CRAM files by referencing the manifest, and the commands for importing and rendering IGV to view the localized CRAM files.

Anticipated Findings

There will be no findings or contribution to scientific knowledge as there is no study being conducted nor questions asked. Informal 'findings' include the usability of the aforementioned tools and AoU CRAM files on the All of Us workbench.

Demographic Categories of Interest

This study will not center on underrepresented populations.

Data Set Used

Controlled Tier

Research Team

Owner:

  • Allyson Motter - Research Assistant, National Human Genome Research Institute (NIH-NHGRI)

Smoking and Genetic Ancestry

We seek to determine how genetic ancestry is associated with smoking behavior among individuals identifying as Hispanic or Latino in the All of Us cohort.

Scientific Questions Being Studied

We seek to determine how genetic ancestry is associated with smoking behavior among individuals identifying as Hispanic or Latino in the All of Us cohort.

Project Purpose(s)

  • Ancestry

Scientific Approaches

We will be performing GWAS using Hail, admixture regression using multivariable logistic regression in R, and other related genetic / genomic analyses.

Anticipated Findings

We expect to see smoking behavior vary across individuals of different Hispanic / Latino subcontinental ancestry fractions.

Demographic Categories of Interest

  • Race / Ethnicity

Data Set Used

Controlled Tier

Research Team

Owner:

  • Vincent Lam - Research Fellow, National Institutes of Health (NIH)
Request a Review of this Research Project

You can request that the All of Us Resource Access Board (RAB) review a research purpose description if you have concerns that this research project may stigmatize All of Us participants or violate the Data User Code of Conduct in some other way. To request a review, you must fill in a form, which you can access by selecting ‘request a review’ below.