Biomedical Informatics Graduate Rotation Projects

This page is updated annually. Some projects may already be taken, and new projects may be available. The projects below give an indication of the types of projects available in each lab, but please browse faculty web pages and contact professors directly to discuss current opportunities.

Labs with BMI Rotation Projects

Robert El-Kareh | Biomedical Informatics

Tsung-Ting Kuo | Biomedical Informatics

Lucila Ohno-Machado | Biomedical Informatics

Rose Yu | Computer Science and Engineering

| Halıcıoğlu Data Science Institute

We are a statistical genetics lab focusing on developing methods to study complex traits and polygenic diseases across global populations, with a specific focus on minority groups that have been underrepresented in the fields of genetics and genomics. We are interested in developing novel multi-ancestry statistical methods for fine-mapping disease genes and their cell types of action. The goal of this research is to identify targets for gene-based therapeutics.

  • Improving disease-gene association testing using statistical priors on genetic regulation of gene expression

    One popular approach to disease-gene association testing is a transcriptome-wide association study (TWAS). Conceptually, TWAS is a test for the genetic correlation between cis-regulated gene expression and disease. However, only half of the genetic regulation of gene expression is expected to be in cis, i.e., driven by genetic variation within 1 Mb of the gene. The goal of this project is to develop a novel statistical method that leverages priors on SNP-gene regulatory links beyond the cis-window to improve our understanding of the genetic regulation of gene expression. As a result, our ability to identify disease-associated genes via TWAS should substantially improve due to (1) the enhanced identification of genes regulated by genetic variation and (2) the increased accuracy with which we can predict an individual's gene expression.
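
    As a rough illustration of the TWAS idea described above, the sketch below computes a gene-level z-score from GWAS summary statistics using one commonly used form, z = w'z / sqrt(w' R w), where w are SNP weights from an expression prediction model, z are per-SNP GWAS z-scores, and R is the LD (SNP correlation) matrix. The function name and the toy numbers are hypothetical, and this is not the method the project aims to develop.

```python
import math


def twas_z(weights, gwas_z, ld):
    """Gene-level TWAS z-score from summary statistics: w'z / sqrt(w' R w).

    weights : SNP weights from an expression prediction model (assumed given)
    gwas_z  : per-SNP GWAS z-scores, in the same SNP order
    ld      : LD (correlation) matrix between the SNPs, as a list of rows
    """
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    if variance <= 0:
        raise ValueError("predicted expression has zero variance")
    return numerator / math.sqrt(variance)


# Hypothetical toy example: two SNPs in perfect LD with equal weights.
weights = [0.5, 0.5]
gwas_z = [2.0, 2.0]
ld = [[1.0, 1.0], [1.0, 1.0]]
print(twas_z(weights, gwas_z, ld))
```

    The sketch only uses SNPs that carry a nonzero weight, which in standard TWAS means cis-SNPs; extending the weights beyond the cis-window is the open question the project addresses.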

  • Multi-ancestry gene-disease association testing via cross-population modeling of eQTLs

    One approach to infer causal genes in disease is a transcriptome-wide association study (TWAS). However, TWAS is not powerful in non-European populations due to poor trans-ancestry portability of gene expression prediction models and smaller genome-wide association study (GWAS) sample sizes. The purpose of this project is to develop a novel machine learning approach to mitigate the issue of trans-ancestry portability. This approach will allow for powerful TWAS in non-European populations by simultaneously modeling genetic and genomic data from different populations, which has previously been challenging due to population-specific differences in genetic architecture such as linkage disequilibrium.

| La Jolla Institute for Immunology

We are interested in the analysis and modeling of three-dimensional chromatin structure from high-throughput sequencing experiments. We develop methods grounded in statistics, machine learning, optimization and graph theory to understand how changes in the 3D genome affect cellular outcomes such as development, differentiation and gene expression. We have ongoing interests in the systems-level analysis and reconstruction of regulatory networks, inference of enhancer-promoter contacts, predictive models of gene expression, and integration of three-dimensional chromatin structure with one-dimensional epigenetic measurements in the context of cancer, malaria, asthma and several autoimmune diseases.

  • Integrative analysis of multi cell-type gene expression and epigenomic data in tumor immune response

    This project will focus on developing regulatory network inference methods for the joint analysis of gene expression and histone modification data from several different types of tumor infiltrating lymphocytes, which are gathered from a cohort of patients with solid tumors.

  • Predictive and comparative modeling of epigenetic gene regulation in different human immune cell types

    The goal of this project is to model the natural variation in gene expression across many immune cell types using an already established database at LJI (https://dice-database.org) and to identify cell type-specific epigenetic regulators of important immune genes.

  • Statistical methods for inferring functional DNA-DNA contacts from Hi-C and HiChIP/PLAC-seq data

    This project focuses on developing computational tools for better analysis of the wealth of data from chromosome conformation capture assays with the ultimate goal of inferring functional chromatin contacts such as those between enhancers and promoters.

| Pediatrics

Research in our lab is focused on developing computational methods and tools for variant calling in human genomes and using these tools for disease association studies. We focus on challenging variant types such as haplotypes and variants in repetitive regions and work with both short-read (Illumina) and long-read sequencing technologies.

  • Duplicated genes and association with disease

    Hundreds of genes in the human genome are duplicated, and many are known to be associated with human diseases. However, the short read lengths of current sequencing technologies make the analysis of such genes difficult. We have developed novel tools to genotype the copy number of duplicated genes using whole-genome sequencing. The goal of this project is to analyze large-scale sequencing datasets (using cloud computing platforms) for Mendelian and complex human diseases to identify novel disease associations.

  • Haplotype-based variant calling using long-read sequencing

    Long-read sequencing technologies have the potential to overcome some of the key limitations of short-read sequencing, particularly in long repetitive regions of the human genome, but require the development of new algorithms. We have previously developed computational methods for variant calling (Longshot, Nature Communications 2019) and read mapping in segmental duplications (Duplomap, Nucleic Acids Research 2020) using long-read sequencing technologies. The goal of this project is to implement a haplotype-based model for variant calling using long reads that automatically identifies genomic regions that can be called with high confidence.

| Surgery

  • Chromatin dysregulation and DNA methylation at transcription start sites associated with transcriptional repression in cancers

  • Genetic and epigenetic analysis of HPV-positive and HPV-negative Oropharyngeal Squamous Cell Carcinoma

    The lab of Dr. Joseph Califano, under the sponsorship of the Japan Society for the Promotion of Science (JSPS), will conduct collaborative research on a new strategy for the treatment of HPV-associated oropharyngeal cancer based on comprehensive epigenetic analysis. This year, we are proud to congratulate postdoctoral researcher Dr. Takuya Nakagawa for his work entitled “Genetic and epigenetic analysis of HPV-positive and HPV-negative Oropharyngeal Squamous Cell Carcinoma”. Dr. Nakagawa graduated from Chiba University School of Medicine in Japan, where he also received the Medical Pharmacy Director’s Award.

| School of Medicine

The main objective of the Chavez laboratory is the molecular characterization of malignant childhood cancers in order to identify drug targets and improve treatment options. Our focus is mainly on pediatric brain tumors such as medulloblastoma, glioblastoma, and ependymoma. Recently, we have demonstrated how to leverage epigenetic information such as DNA methylation and enhancer profiling in pediatric brain tumors and normal human tissues to identify clinically relevant tumor subgroups, oncogenic enhancers, transcription factors, and pathways amenable to pharmacologic targeting. To reveal regulatory circuitries disturbed in childhood brain tumors, we generate and integrate public high-dimensional data from primary tumors and patient-derived cell lines. We are specifically interested in the analysis of somatic and germline DNA mutations, chromatin and DNA modifications, transcription factor binding, and gene expression.

  • The 3D Tumor Genome

    To identify molecular mechanisms that contribute to tumor development and maintenance, we develop hypothesis-driven computational tools for the integrative analysis of different layers of genetic and epigenetic information. Having recognized that our epigenetic mapping studies can identify effective drug targets, we are now profiling 3D tumor genomes to uncover molecular mechanisms that may disturb enhancer-gene interactions, leading to deregulation of gene expression and biochemical pathways.

| Psychiatry

Dr. Cheng’s research focuses on transcriptional regulatory networks and aims to develop a comprehensive understanding of how aberrant regulatory circuits contribute to human disease. Dr. Cheng’s laboratory is particularly interested in understanding transcriptional and epigenetic regulation of the interplay between the immune system and central nervous system in neurodegenerative diseases, substance use disorder and HIV infection. Current projects focus on applying single-cell transcriptomics and epigenetics assays to characterize Alzheimer’s disease, HIV and opioid use disorder patient samples, with the goal of finding diagnostic markers and therapeutic targets. Dr. Cheng’s lab has also developed 3D brain organoid models for Alzheimer’s disease and HIV infection. Dr. Cheng received her M.S. degree in Computer Science from Stanford University and her Ph.D. degree in Bioinformatics and Systems Biology from the University of California, San Diego. After completing her doctoral study, Dr. Cheng did her postdoctoral training at the Broad Institute of MIT and Harvard.

  • 3D brain organoid model of Alzheimer’s disease revealed by single cell transcriptomics

    We developed a novel tau propagation model using a 3D spheroid model that rapidly develops tau pathology and neurodegeneration in just three weeks. Single-cell transcriptomics of the model reveals cell-type-specific changes that resemble transcriptomic signatures from Alzheimer’s disease postmortem brain.

  • Single cell transcriptomics and epigenetics of human Alzheimer’s disease brain

    To understand cell-type-specific vulnerability in Alzheimer’s disease, we use snRNA-seq to characterize human brain tissue from Alzheimer’s disease patients across different brain regions.

  • Single cell transcriptomics and epigenetics of the opioid use disorder and HIV syndemic in the human brain

    As part of the NIH NIDA SCORCH consortium, we will dissect the dysregulated molecular circuitry in the brains of individuals with opioid use disorder and/or HIV infection. This project aims to identify genes that contribute to opioid use disorder and HIV-associated neurocognitive disorders. These approaches could lead to novel gene therapies to control and perhaps reverse the relentless disease state. We are in the process of generating snRNA-seq and snATAC-seq profiles from more than 300 patient samples across 3 different brain regions.

  • Single Cell Transcriptomics of the Cocaine Use Disorder in the Context of HIV

    As part of the NIH NIDA SCORCH consortium, we will dissect the dysregulated molecular circuitry in the brains of individuals with cocaine use disorder and/or HIV infection. This project will focus on understanding how neurovasculature and neuroimmune cells contribute to cocaine use disorder and HIV-associated neurocognitive disorders. We will be generating snRNA-seq and snATAC-seq profiles from more than 300 patient samples across 3 different brain regions.

| Biomedical Informatics

Our goal is to understand the associations between genetic variation and human disease. As part of the Center for Admixture Science and Technology (CAST), we work with large genetic datasets, such as the UK Biobank, GTEx, All of Us (AoU) and the Million Veteran Program (MVP), to characterize the associations between genetic variation and disease in global and local ancestry-aware settings.

  • Ancestry-aware genome-wide association studies

    For the past 500 years, the American continent has been the site of ongoing mixing of Europeans, Native Americans, Africans and Asians, resulting in a significant percentage of Americans carrying ancestry from outside their self-identified race. However, genome-wide association studies (GWAS) and polygenic risk score (PRS) calculations have been performed and optimized on individuals of European descent, with minor exceptions. As part of CAST (Center for Admixture Science and Technology), we are developing methods to perform GWAS and calculate PRS on diverse and admixed populations, using large diverse cohorts such as All of Us and the Million Veteran Program.
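
    For readers unfamiliar with PRS, the sketch below shows the basic calculation the paragraph refers to: a weighted sum of an individual's allele dosages, with weights taken from GWAS effect sizes. The variant IDs, effect sizes and dosages are made up for illustration, and the function is deliberately ancestry-agnostic; making the weights appropriate for diverse and admixed individuals is the research problem described above.

```python
def polygenic_score(effect_sizes, dosages):
    """Basic PRS: sum of (GWAS effect size x allele dosage) over shared variants.

    effect_sizes : dict mapping variant ID -> per-allele effect size
    dosages      : dict mapping variant ID -> effect-allele count (0, 1 or 2)
    Variants missing from either input are skipped.
    """
    shared = effect_sizes.keys() & dosages.keys()
    return sum(effect_sizes[v] * dosages[v] for v in shared)


# Hypothetical variants and values, for illustration only.
effect_sizes = {"rs_a": 0.10, "rs_b": -0.05, "rs_c": 0.20}
dosages = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
print(round(polygenic_score(effect_sizes, dosages), 6))
```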

| Biomedical Informatics

  • Development of a Research Electronic Health Record for Clinical Decision Support Studies

    Creation of effective clinical decision support tools has the potential to significantly improve the quality of care delivered within our healthcare system. However, developing and testing prototypes of these tools requires access to realistic electronic health record (EHR) environments. This process has often involved prohibitively long turnaround times due to time and resource constraints of healthcare information systems groups. For research and educational purposes, these barriers could be avoided by creating an investigator-controlled research EHR and populating it with realistic clinical data. Such a system could enable researchers and students to develop a wide range of novel and innovative clinical decision support tools much more rapidly.

    Aims:

    1. Install a sophisticated, open-source EHR (OpenMRS)
    2. Populate this EHR with deidentified data from a rich clinical database (MIMIC II)
    3. Develop one or more prototype clinical decision support tools within this environment

| Pediatrics

Welcome to the Frazer Lab! We are using two complementary approaches to achieve our goal of identifying and characterizing functional human genetic variants. Our first approach utilizes iPSCORE, a resource that was generated to enable both familial and association-based genetic studies of molecular and physiological phenotypes in induced pluripotent stem cells (iPSCs) and derived cell types. Our second approach involves conducting association studies in well-characterized cohorts with the goal of identifying variants that play roles in human disease and assessing their contributions to disease pathogenesis, progression, and prognosis.

  • Investigate fetal-specific cardiac regulatory variants and their overlap with cardiac GWAS lead variants

    We have derived iPSC-CVPCs from 180 individuals and shown that their transcriptomes are more similar to fetal heart than to adult cardiac tissues. Our goal is to leverage these data in combination with WGS to perform eQTL analyses. We plan to assess whether fetal-specific eQTLs are associated with complex adult cardiac traits by colocalizing eQTLs with summary statistics from GWAS of cardiac traits. Our preliminary analyses show that eQTLs in iPSC-CVPCs identify cardiac disease GWAS variants that are active in the fetal but not the adult heart, indicating that they play a role in development. Our findings provide genetic evidence supporting the fetal origins of cardiovascular disease hypothesis and highlight the importance of investigating genetic associations across stages of development (i.e., fetal and adult tissues) to fully understand the genetic underpinnings of complex traits and disease. We are looking for rotation students to conduct QTL analyses using large ATAC-seq and H3K27ac ChIP-seq datasets generated from the iPSC-CVPCs.

| Biomedical Informatics

The Oncogenomics laboratory is located in the Moores Cancer Center. Its research program is focused on the identification of genetic and epigenetic markers for cancer prevention and progression as well as drug response. The laboratory is a hybrid laboratory, combining wet-lab techniques and bioinformatics analysis to study cancer samples from patients and animal models of cancer. The laboratory is also an important partner for multiple principal investigators at the Moores Cancer Center, collaborating on the design, analysis and interpretation of their genomic experiments.

  • Development of Genomics Virtual Machines in HIPAA compliant cloud

    Genetic information is considered protected health information (PHI), and as a consequence the highest security standards need to be applied to its storage, analysis and sharing. The oncogenomics laboratory uses the state-of-the-art iDASH compute cloud for its main computation. We therefore participate in the development of optimal workflows and virtual machines for the analysis of patient-derived genomic datasets such as whole exomes, whole genomes, RNA-seq or genotyping arrays.

    In this project we will develop robust provisioning methods to establish virtual machines capable of running popular human genomic analysis workflows. We will benchmark these machines and workflows and convert some of them into standard recipes for production-grade, reproducible genomic analysis.

  • Genetics and epigenetics of cisplatin resistance

    Cisplatin (cDDP) is one of the most commonly used chemotherapeutic drugs, but most cancers eventually become resistant, leading to tumor recurrence. Several biological processes may modulate cDDP sensitivity: drug import, export, detoxification, DNA repair, and apoptosis. Drug resistance is transmitted to daughter cells, and one can build up resistant cell lines in vitro using sequential treatments. We are interested in identifying the genetic mutations that mediate this resistance. For this, we have derived resistant cell lines from single clones of a cDDP-sensitive ovarian cancer cell line. Using exome sequencing as well as targeted sequencing, we propose to determine mutations in genes and pathways that drive drug resistance. We will then expand the findings to the TCGA samples, using time to recurrence as an indicator of drug sensitivity.

  • The role of inherited variation in cancer somatic landscape

    The role of germline or inherited variation in cancer has been studied in selected families and led to the identification of genetic variants that are dominant and responsible for cancer syndromes. Similarly, rare recessive variants with lower penetrance are responsible for the increased risk in breast and ovarian cancer (BRCA1/2). More common variants in the population have also been identified through GWAS, and have revealed multiple SNPs associated with a modest increase in cancer risk. Despite these advances, multiple variants of intermediate allelic frequency in the population, or carried by patients with undocumented family history, still remain variants of unknown significance (VUS) and can still play a role in tumor development. In addition, the contribution of variants located outside of the coding region has been underexplored and can now be reexamined in the light of recent maps of the regulatory landscape. The long-term goal of this research is to utilize germline genetic variation in cancer prevention and care to better stage patients or predict their response to treatment.

    We propose to identify the germline variants in UCSD Cancer Center patients (targeted gene panel) as well as in the public TCGA/ICGC datasets (whole genomes). We will then test these variants, alone or in combination, to identify the ones that impact cancer onset, the tumor somatic landscape or tissue-specific regulatory networks. The project will involve the processing of high-throughput sequencing data, population genetics, and statistical analysis in a HIPAA-compliant cloud-computing environment.

| Psychiatry

The lab has a variety of bioinformatics projects aimed at improving understanding of the functional impact of autism risk mutations derived from exome and whole genome sequencing of patients. We created mouse models carrying some of these mutations using CRISPR/Cas9, and also produced patient-derived cerebral organoids with autism risk mutations. We performed bulk RNA-seq from various brain regions or time periods in these models. Gene-level analyses of the RNA-seq data have been completed (manuscripts in preparation). We are now pursuing isoform-level analyses of these data to better understand the functional impact of autism risk mutations on the splicing isoform transcriptome.

  • Isoform transcriptome of Cul3-HET mouse model

    The project involves constructing isoform-level co-expression and protein interaction networks for predicting the functional impact of mutations in the high-risk autism gene Cul3. We have collected RNA-seq and TMT-proteomics data from various brain regions of Cul3+/- transgenic mice. We aim to integrate isoform-level RNA-seq data with quantitative proteomic (peptide-level) data from the same samples to understand the impact of Cul3 mutation.

  • Isoform transcriptome of patient-derived cerebral organoids from 16p11.2 CNV carriers with autism

    Copy number variants (CNVs) represent significant risk factors for Autism Spectrum Disorders (ASD). One of the most frequent CNVs involved in ASD is a deletion or duplication of the 16p11.2 CNV locus, spanning 29 protein-coding genes. Despite progress in linking 16p11.2 genetic changes with phenotypic abnormalities (macrocephaly and microcephaly) in patients and model organisms, the specific molecular pathways impacted by this CNV remain unknown. We generated bulk RNA-seq and TMT proteomic data from patient-derived cerebral organoids (3 deletion, 3 duplication and 3 control patients). The goal of the project is to analyze isoform-level RNA-seq data, as well as proteomics data, to investigate the functional impact of the 16p11.2 CNV.

| Pediatrics

The Knight lab has broad interests in the human microbiome, the collection of trillions of microbes that inhabits our bodies, especially in developing techniques to read out these complex microbial communities and use the resulting data to understand human health, links between humans and the environment, and to prevent and cure disease. We offer a fast-paced environment with many collaborative opportunities on different projects.

  • Machine Learning for the Microbiome

    We have amassed a database of microbial DNA sequences from hundreds of thousands of biological specimens. Understanding how variation in these data relates to disease requires a range of machine learning and multivariate statistical approaches. There are many opportunities ranging from entry-level (benchmarking classifier performance on specific sample sets) to extremely challenging (using deep learning to infer the structure of global sample set relationships).

  • Multi-omics integration

    An increasing need is to integrate data from different "omics" levels, e.g. genomes, metagenomes, metatranscriptomes, metaproteomes, metabolomes, immunological profiling, etc., into a single coherent picture separating healthy and disease states. Improved methods for performing this task, either directly or via intermediate representations such as mapping to metabolic and regulatory pathways, are essential for improving understanding. Projects in this category range from simple (testing where existing techniques like correlation networks or Procrustes analysis do/don't connect two specific data layers) to challenging (using transfer learning to integrate heterogeneous data layers and improve the underlying network annotation). An especially exciting emerging research direction here is XAI (explainable artificial intelligence), which can provide a better justification for a specific classification or suggestion in clinical applications.

  • Optimizing microbiome algorithms

    Many algorithms used in microbiome studies, especially in metagenomic assembly, are extremely computationally expensive. Opportunities exist either for exploiting new hardware architectures to accelerate existing algorithms or for developing new approximate algorithms to tackle problems in the workflow, including inferring taxonomy and function from DNA sequence data, genome and metagenome assembly and annotation, computing community distance metrics from sparse compositional data, and high-level analyses of hundreds of thousands of microbiomes. Again, these projects range from entry-level (comparing results of two multiple sequence alignment techniques for subsequent community analysis) to advanced (using non-von Neumann architectures to perform pattern classification in real time at the whole-community level for disease detection).
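
    To make "community distance metrics from sparse compositional data" concrete, the sketch below computes Bray-Curtis dissimilarity between two samples stored sparsely as dictionaries that hold only the taxa actually observed. Bray-Curtis is chosen here purely as a familiar example (the text does not name a specific measure), and the taxon labels and counts are invented.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two sparse count profiles.

    Each sample is a dict mapping taxon -> non-negative count; taxa that are
    absent from a dict are treated as zero, so only observed taxa are stored.
    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    """
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        return 0.0  # two empty samples: define the dissimilarity as zero
    shared = sum(
        min(count, sample_b[taxon])
        for taxon, count in sample_a.items()
        if taxon in sample_b
    )
    return 1.0 - 2.0 * shared / total


# Hypothetical samples, for illustration only.
gut = {"Bacteroides": 1, "Prevotella": 1}
skin = {"Bacteroides": 1, "Staphylococcus": 1}
print(bray_curtis(gut, skin))
```

    Because the loop only visits taxa present in the first sample, the cost per pair scales with the number of observed taxa rather than the size of the full taxonomy; the cost of comparing all pairs among hundreds of thousands of samples is where approximate or hardware-accelerated approaches become relevant.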

| Biomedical Informatics

Dr. Koola is a physician scientist specializing in Biomedical Informatics and Hospital Medicine. He specializes in the area of big data machine learning for predictive analytics. In particular, he is interested in using electronic health records to improve care delivery, particularly for patients with advanced liver disease. Using risk prediction models in a healthcare context requires understanding of: (i) the healthcare system of intended use; (ii) risk model building; (iii) risk model assessment; and (iv) risk model re-calibration. Additionally, Dr. Koola is interested in visual analytics, data modeling, and health services research.

  • Designing the "Green Button" informatics consult service using big data analytics for personalized medicine

    In 2012 the Institute of Medicine released desiderata for a learning healthcare system, where evidence informs practice and practice informs evidence. Though the randomized clinical trial (RCT) serves as the gold standard for informing clinical decisions, flaws exist in terms of achieving recruitment, overly stringent inclusion/exclusion criteria, and lack of patient-centered decision making. Observational cohort studies have grown as an important complement to RCTs, allowing comparative effectiveness research and patient-centered trials. The surge of Electronic Health Records (EHR) and the resulting zettabyte of data allows us to realize this vision for the first time. Despite the growth of observational cohort studies, challenges remain in bringing knowledge from bench to bedside; moreover, model performance degrades when a model is used in a cohort outside the one in which it was developed.

    To ameliorate these difficulties, we propose to launch and study a novel “informatics consult” service. The service would allow clinicians, when no clear evidence-based guidelines exist regarding care decisions, to query the UCSD clinical data warehouse by identifying patients similar to the index case. First proposed in the seminal “Green Button” paper by Longhurst et al., such a system would leverage our ability to truly deliver personalized, patient-centered care. Small-scale, limited efforts have been put into practice to answer questions regarding treatment of melanoma and systemic lupus erythematosus complications. We note, however, the opportunity for a much larger service with broad impact, starting with insights borne of data from UCSD and potentially mining insights from the entire statewide UC Health data warehouse.

    We note several novel challenges to this proposed system: (i) Performing semi-automated phenotyping so that we can identify clinical outcomes of interest. (ii) Identifying patients that are similar to the index patient (often called clustering). (iii) Incorporating automated, computable search regarding guideline-recommended care. (iv) Performing visual analytics to understand similarity of cohorts. (v) Communicating probability and statistical information to healthcare professionals so they can effectively manage uncertainty.
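
    As a toy illustration of challenge (ii), the sketch below ranks cohort patients by Jaccard similarity between their sets of diagnosis codes and those of the index patient. This is only one simple way to define "similar", not the design of the proposed service; the patient IDs and records are invented, and the function names are hypothetical.

```python
def jaccard(codes_a, codes_b):
    """Jaccard similarity between two sets of diagnosis codes (0.0 to 1.0)."""
    union = codes_a | codes_b
    return len(codes_a & codes_b) / len(union) if union else 0.0


def most_similar(index_codes, cohort, k=2):
    """Return the k cohort patients most similar to the index patient.

    cohort maps a patient ID to that patient's set of diagnosis codes.
    Ties are broken by patient ID so the ranking is deterministic.
    """
    scored = [(pid, jaccard(index_codes, codes)) for pid, codes in cohort.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Invented records using ICD-10-style codes, for illustration only.
index_patient = {"K74.6", "E11.9"}
cohort = {
    "p1": {"K74.6", "E11.9"},
    "p2": {"K74.6"},
    "p3": {"I10"},
}
print(most_similar(index_patient, cohort))
```

    A real service would need richer features (labs, medications, demographics) and a validated similarity measure; set overlap on codes is used here only because it is easy to inspect.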

    Student responsibilities:

    1. Participate in project meetings
    2. Help design one of several possible algorithms/interfaces:
      a. patient clustering algorithm using unsupervised learning
      b. visual analytic interface for describing similar cohort of patients
      c. visual analytic interface to help communicate statistical risk

  • Integrating patient reported outcomes into the electronic health record to improve cardiovascular care

    Unhealthy dietary choices (a lack of nutritious foods and an excess of unhealthy foods) were shown to be the major contributor to the 400,000 U.S. deaths from cardiovascular diseases (CVD) in 2015. Eating more nuts, vegetables, and whole grains, and less salt and trans fats, could save tens of thousands of lives in the U.S. each year. Obesity is one critical outcome of poor diet, which also contributes to heightened CVD risk. Thousands of smartphone apps are available to download for weight loss, but these apps primarily focus on caloric intake rather than the overall quality of diet and lifestyle critical for CVD prevention.

    Mobile Health (mHealth) applications also have not been systematically tested for their effectiveness and are criticized for not having an evidence-based foundation. In this study, we adapt the design of mHeart to communicate automatically with the UCSD Electronic Health Record, giving healthcare providers access to psychosocial aspects of patients' care outside of the direct hospital system. In particular, the provider will be able to view logs of patient activity, dietary choices, and other lifestyle choices. The provider will also be able to send feedback to the patient to alter behavior.

    Student opportunities:

    1. Help modify smartphone app to make use of healthcare connection protocols like Apple HealthKit and Google Fit
    2. Understand interfaces that communicate with electronic health records (like FHIR)
    3. Help design point-to-point interface between smartphone app and electronic health record data, which is presented to provider
    4. Participate in meetings designing pilot study to test app performance

  • Systematic review and meta-analysis of hospital readmission for patients with cirrhosis

    Patients with cirrhosis, a late stage of chronic liver disease, are at increased risk of hospitalization and hospital readmission. Although several studies have looked at models for predicting readmission for patients with cirrhosis, they are limited by small sample sizes, limited candidate predictor variables, and limited evaluation of discrimination and calibration. A systematic review and meta-analysis of available evidence can help shed new light on the problem, and help identify modifiable risk factors.

    Student responsibilities:

    1. Understand the basics of a systematic review
    2. Perform literature review
    3. Abstract necessary information in case report forms and help perform meta-analysis
    4. Help write manuscript
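
    For orientation on the meta-analysis step, the sketch below pools study-level estimates with fixed-effect inverse-variance weighting, one standard starting point. It assumes each study reports an effect estimate on a common scale (for example a log odds ratio) with its standard error; the numbers are invented, and a random-effects model may be more appropriate when studies are heterogeneous.

```python
import math


def fixed_effect_pool(estimates, standard_errors):
    """Fixed-effect inverse-variance pooling of study-level estimates.

    Each study is weighted by 1 / SE^2. Returns the pooled estimate and
    its standard error, sqrt(1 / sum of weights).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


# Invented log odds ratios and standard errors from three studies.
log_odds_ratios = [0.40, 0.25, 0.55]
standard_errors = [0.20, 0.10, 0.30]
pooled, pooled_se = fixed_effect_pool(log_odds_ratios, standard_errors)
print(f"pooled log OR = {pooled:.3f}, SE = {pooled_se:.3f}")
```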

| Biomedical Informatics

Dr. Tsung-Ting Kuo is an Assistant Professor of Medicine in the University of California San Diego (UCSD) Health Department of Biomedical Informatics (DBMI). He mainly conducts biomedical, healthcare and genomic studies based on blockchain and predictive modeling. His research focuses on blockchain technologies, machine learning, and natural language processing.

  • Developing privacy-preserving predictive modeling algorithms on blockchain networks

    Last Updated:

    Predictive modeling can advance research, facilitate quality improvement initiatives, and substantiate research results, especially when data from multiple healthcare systems can be included. However, current state-of-the-art privacy-preserving predictive modeling frameworks are still centralized; in other words, the models from distributed sites are integrated on a central server to build a global model. This centralization carries several risks, e.g., a single point of failure at the central server. To improve the security and robustness of predictive modeling frameworks, we will develop and implement novel algorithms on decentralized blockchain networks (a distributed ledger/database technology adopted by cryptocurrencies such as Bitcoin and Ethereum) to build better models. The outcome will be algorithms that improve the predictive power of data from multiple healthcare systems through a distributed system. Selected references: PMID 36402113, 34923447, and 31943009.

| School of Medicine

Our goal is to identify genes causing insulin resistance in humans in order to find new therapeutic targets for diabetes and cardiometabolic diseases. Our approach to discovery is grounded in human genetics, clarified through systematic, high-throughput experimentation in human cells, and calibrated by its relevance to clinical disease. We use massively parallel genome engineering to re-create mutations identified in patients and develop high-throughput assays to interrogate function in human cell models. We apply bioinformatics and statistics to make sense of these data, integrating 1) human mutations, 2) cellular function, and 3) metabolic/glycemic phenotypes of the individuals who harbor them. Using this approach, we have discovered novel missense mutations that greatly increase risk for type 2 diabetes. As a complementary aim towards precision medicine, we develop tools for clinical genome interpretation powered by high-throughput experimental data.

  • Evaluating accuracy and clinical utility of commercially available genetic risk scores for diabetes

    Last Updated:

    Recently, 23andMe, which sells direct-to-consumer genetic testing products, introduced a diabetes risk report based on single nucleotide polymorphism (SNP) genotypes measured in their commercial product ($199: https://www.statnews.com/2019/03/10/23andme-will-tell-you-how-your-dna-affects-your-diabetes-risk-will-it-be-useful/). The clinical utility of this report is unclear and has generated significant controversy. Critically, 23andMe’s SNP-chips only test about 0.02% of the human genome. We have shown in previous work that a single rare SNP, not captured by SNP-chips, can change an individual’s risk of diabetes by 7-fold. The purpose of the project is to test the 23andMe diabetes report output in a dataset of individuals whose diabetes status is known and who have also undergone more extensive genome sequencing (whole exomes). This will allow us to assess the accuracy of direct-to-consumer SNP tests and quantify the number of falsely reassuring tests when more complete genetic information is considered.
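
The evaluation described above could be organized roughly as in the following sketch. The `Participant` fields and the working definition of a "falsely reassuring" result (a report that does not flag increased risk for someone whose exome carries a high-risk rare variant) are assumptions made for illustration only; they are not the lab's actual data schema or criteria.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Participant:
    # All three fields are hypothetical stand-ins for the real dataset columns.
    report_increased_risk: bool      # SNP-based report flagged increased risk
    has_diabetes: bool               # known clinical diabetes status
    carries_rare_risk_variant: bool  # high-risk variant seen only by exome sequencing


def _ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def summarize(participants: Iterable[Participant]) -> dict:
    """Tally report accuracy against known status and count false reassurance."""
    tp = fp = fn = tn = falsely_reassured = 0
    for p in participants:
        if p.report_increased_risk:
            if p.has_diabetes:
                tp += 1
            else:
                fp += 1
        else:
            if p.has_diabetes:
                fn += 1
            else:
                tn += 1
            # Assumed definition: the report was reassuring, but sequencing
            # found a rare high-risk variant the SNP-chip could not capture.
            if p.carries_rare_risk_variant:
                falsely_reassured += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "falsely_reassured": falsely_reassured,
    }
```

Keeping the false-reassurance count separate from sensitivity and specificity makes it easy to change the definition later, for example to require both a rare variant and a confirmed diagnosis.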

  • Identifying discriminators of drug-responsive mutations in Mendelian diabetes

    Last Updated:

    Loss-of-function mutations in hepatocyte nuclear factor 1 alpha (HNF1A) cause an autosomal dominant form of diabetes, maturity-onset diabetes of the young type 3 (MODY3). Patients with MODY3 are clinically difficult to distinguish from patients with autoimmune type 1 diabetes and are therefore often given the same treatment, consisting of multiple daily injections of insulin. However, MODY3 patients can be effectively treated with a single daily sulfonylurea tablet and thus spared multiple daily injections. This project aims to use RNA-sequencing data generated from cells engineered to express a range of HNF1A mutations (MODY3 and non-MODY) to identify a signature of genes that can distinguish between sulfonylurea-responsive and non-responsive mutations. This transcriptomic signature would form the basis of a biomarker test in patients with HNF1A mutations to predict their responsiveness and provide the most effective, least burdensome treatment.

  • Integrative genomics to identify a novel disease-causing mutation in Simpson-Golabi-Behmel syndrome (SGBS)

    Last Updated:

    SGBS is characterized by overgrowth of multiple body parts. It is a rare genetic disease that has been attributed to inactivating mutations in GPC3. We have stem cells from a patient with SGBS but no GPC3 mutation, implicating another, as yet unknown, causal gene. We have performed whole genome sequencing and RNA sequencing on these cells. The goal of this project is to use these genomic data sets to create a “short list” of candidate causal genes, which can then be assessed experimentally in the patient cells using genome engineering.

| Biomedical Informatics

| Biomedical Informatics

The Sitapati Lab is an operational & translational space with expertise in the following domains: (1) clinical informatics, (2) population health (e.g. registries, outreach), (3) quality informatics, (4) vital records informatics, and (5) the NIH All of Us Researcher Workbench. The lab includes teams from Information Services at UCSD, the CalIVRS team, Quality and Patient Safety, and the Population Health Services Organization, to name a few.

  • EMR based registries

    Last Updated:

    Our IS Population Health team typically builds registries in 90-day cycles that meet organizational needs and support the mission. These vary but require workflow design, data cleaning/mapping, and creation of metrics.

  • QIP: Public health quality informatics

    Last Updated:

    UCSD has an active quality improvement program that advances patient health. Within the program there are opportunities to improve data quality, outreach campaigns, and outcomes measurement as part of quality informatics. Most projects would last 6-24 months.

  • Vital Records Informatics

    Last Updated:

    Advanced processes are needed to modernize vital records for public health purposes, including interoperability, usability, and accessibility. Projects that evaluate the current state, with a description of the future state and a manuscript as primary outcomes, could help advance the field.

| Bioengineering

My research focuses on time series analysis in biological systems, with an emphasis on practical information extraction for translational applications. The lab is divided into applications and approaches, though these all serve each other, and students collaborate routinely. Indeed, a positive attitude and an eagerness to support one another are requisite in the lab.

Applications include, but are not limited to: illness detection, prediction, and recovery monitoring; pregnancy detection and outcome forecasting; mental health monitoring; defining sleep in the body (as opposed to EEG); diabetes forecasting; and carbon footprint optimization of distributed computer systems.

Approaches include, but are not limited to: multimodal time series information extraction; differentiating multiple outcome types from random assortment; reduction of high dimensional spaces with modality, individual, and time series components; explicable machine learning model development; non-stationary signal analysis; and novel approaches to diversity mapping and phenotyping from physiology and behavior data.

I seek to find a fit between each individual and the lab’s ongoing projects; no one comes in and is just given marching orders. You’ll do better work when it’s the work that you actually want to do!

  • COVID-19 recovery monitoring

    Last Updated:

    Some individuals seem to have lingering or failed recoveries after COVID-19 infections. Students comfortable with basic programming or data science skills are encouraged to enhance our description of recovery profiles from TemPredict, and search for features that can contribute to pre-recovery classification.

  • Diversity within physiological data

    Last Updated:

    Algorithms tend to be one-size-fits-all, whereas people are similar or dissimilar in complex and unmapped ways. Help map differences in normal routines, as well as in illness and recovery trajectories. These might arise from known demographic information or co-morbid conditions (diabetes, pregnancy, etc.), or represent different patterns in illness associated with unknown or latent variables.

  • Improving women’s health outcomes

    Last Updated:

    We have shown repeatedly in humans and animal models that females are as tractable with statistics as males (actually, often more so). Yet female physiology remains inappropriately understudied. Help us refine algorithms, map changes like pregnancy and menopause, and explore diversity within as well as across traditional sex categories.

| Bioengineering

Our research focuses on molecular engineering for cellular imaging and reprogramming, and image-based bioinformatics, with applications in stem cell differentiation and cancer treatment.

  • Image-based reconstruction of biochemical networks in live cells

    Last Updated:

    Fluorescence resonance energy transfer (FRET)-based biosensors have been widely used in live-cell imaging to accurately visualize specific biochemical activities. We have developed the Fluocell image analysis software package to efficiently and quantitatively evaluate the intracellular biochemical signals in real-time, and to provide statistical inference on the biological implications of the imaging results. However, important questions arise on how to use these results to reconstruct the quantitative parameters in the underlying biochemical networks, which determine cellular functions and ultimately their fates. In this rotation project, we will integrate optimization-based machine learning approaches with biochemical network models to seek answers to these questions, with applications in cancer treatment against drug resistance.

  • Intelligent Diagnosis of Infectious Diseases by Deep Learning

    Last Updated:

    The diagnosis of infectious diseases often requires tissue biopsy and microscopic examination by pathologists, which is time-consuming, labor-intensive, and error-prone. To develop a software-assisted system for identifying microorganisms in digital images, we use convolutional neural networks and transfer learning to train and validate an intelligent software system for the classification of pathology slides. The goal of this project is to provide a diagnosis of pathogens with high efficiency and accuracy. Students will work in an interdisciplinary team, collecting and labelling imaging data, developing deep-learning-based algorithms and user interfaces, and characterizing and optimizing the accuracy and functionality of the software package.

| Biological Sciences

We study mathematical and computational models of biomedical processes, with a focus on infection, the immune system, and cancer. We also study mathematical models of evolutionary processes and develop evolutionary theory. We aim to couple mathematical modeling work with data from the relevant fields through collaborations with experimental and clinical laboratories.

  • Evolutionary theory

    Last Updated:

    This project develops basic evolutionary theory, with relevance to biomedical applications. For example, we study the evolution and emergence of mutants in spatially structured populations under various assumptions. Much remains to be discovered about the principles of mutant evolution in structured populations, and this has important applications for cancer biology and cancer therapy, since most tumors grow as a mass of cells with strong spatial structure.

  • Mathematical models of in vivo virus dynamics

    Last Updated:

    The project will be concerned with mathematical models of virus replication within hosts, and the interactions of viruses with immune responses. Much of this modeling work is concerned with human immunodeficiency virus (HIV), due to the availability of experimental and clinical data. Topics include the evolution of HIV within hosts, the effect of spatial lymphoid tissue structure on HIV dynamics and evolution, and the dynamics of HIV during antiretroviral therapy in relation to the latent viral reservoir.

  • Mathematical Oncology

    Last Updated:

    This project is concerned with mathematical models of cancer initiation, cancer progression, and cancer therapy. This involves mathematical models of tissue stem cell dynamics, clonal cellular evolution in tissues during aging in relation to the development of cancer, and evolutionary models of drug resistance in cancers. Hematological malignancies are a major focus of this work. With respect to therapies and drug resistance, this work involves the use of mathematical models with patient-specific parameters to make personalized predictions about treatment outcome.

Rose Yu | Computer Science and Engineering

My research interests lie primarily in machine learning, especially for large-scale spatiotemporal data. I am generally interested in deep learning, optimization, and spatiotemporal reasoning. I am particularly excited about the interplay between physics and machine learning. My work has been applied to learning dynamical systems in sustainability, health and physical sciences.

  • Automatic Blood Pressure Control with Machine Learning

    Last Updated:

    This project seeks to develop novel deep learning methods to forecast and control patients' blood pressure using large-scale sensor data from artificial heart pumps.