On-campus Research Opportunities (Undergraduate)

This page is updated annually. Some projects may already be taken, and new projects may be available. The projects below give an indication of the types of projects available in each lab, but please browse faculty web pages and contact professors directly to discuss current opportunities.

Also check the Undergraduate Research Hub and the REAL Portal.

Tiffany Amariuta | Halıcıoğlu Data Science Institute
Vineet Bafna | Computer Science and Engineering
Steve Briggs | Biological Sciences
Joseph Ecker | Salk Institute for Biological Studies
Michael Gilson | School of Pharmacy
Christopher Glass | Cellular and Molecular Medicine
Melissa Gymrek | School of Medicine
Tsung-Ting Kuo | Biomedical Informatics
Pavel Pevzner | Computer Science and Engineering
Scott Rifkin | Biological Sciences
Yatish Turakhia | Electrical and Computer Engineering
Gene Yeo | Cellular and Molecular Medicine

Tiffany Amariuta | Halıcıoğlu Data Science Institute

We are a statistical genetics lab focusing on developing methods to study complex traits and polygenic diseases across global populations, with a specific focus on minority groups that have been underrepresented in the fields of genetics and genomics. We are interested in developing novel multi-ancestry statistical methods for fine-mapping disease genes and their cell types of action. The goal of this research is to identify targets for gene-based therapeutics.

  • Mapping the genetic architecture of polygenic disease, complex traits, and gene expression levels

    The Amariuta Lab sometimes has bandwidth to take on talented undergraduate students with research or course experience in bioinformatics and statistical data analysis. We have a variety of predefined projects but are also open to student-led projects and ideas that fall within the general scope of research in our lab. These projects generally involve mapping the genetic component of gene expression and, separately, complex traits and polygenic diseases, in order to identify putative disease-critical genes that could serve as predictive biomarkers or therapeutic targets. All projects are in the area of statistical and population genetics, aiming to understand population-specific and shared genetic effects across diverse cell types and tissues, via the integration of high dimensional genomic data with globally diverse DNA sequence (genotyping) and disease data (phenotyping).

| La Jolla Institute for Immunology

We are interested in the analysis and modeling of three-dimensional chromatin structure from high-throughput sequencing experiments. We develop methods based in statistics, machine learning, optimization and graph theory to understand how changes in the 3D genome affect cellular outcomes such as development, differentiation and gene expression. We have ongoing interests in the systems-level analysis and reconstruction of regulatory networks, inference of enhancer-promoter contacts, predictive models of gene expression and integration of three-dimensional chromatin structure with one-dimensional epigenetic measurements in the context of cancer, malaria, asthma and several autoimmune diseases.

  • Integrative analysis of multi cell-type gene expression and epigenomic data in tumor immune response

    This project will focus on developing regulatory network inference methods for the joint analysis of gene expression and histone modification data from several different types of tumor-infiltrating lymphocytes gathered from a cohort of patients with solid tumors.

  • Predictive and comparative modeling of epigenetic gene regulation in different human immune cell types

    The goal of this project is to model the natural variation in gene expression across many immune cell types using an already established database at LJI (https://dice-database.org) and to identify cell type-specific epigenetic regulators of important immune genes.

  • Statistical methods for inferring functional DNA-DNA contacts from Hi-C and HiChIP/PLAC-seq data

    This project focuses on developing computational tools for better analysis of the wealth of data from chromosome conformation capture assays with the ultimate goal of inferring functional chromatin contacts such as those between enhancers and promoters.

Vineet Bafna | Computer Science and Engineering

Our lab is focused on the design and implementation of algorithms for biological data interpretation. Within this broad framework, we have a number of open projects relating to problems in proteomics (interpretation of mass spectrometry data), genetics, and genomics. The projects listed below are a small sampling of available projects. Interested students should have taken a class in algorithm design and have some facility with machine learning approaches.

  • Extrachromosomal DNA analysis

    Extrachromosomal DNA (ecDNA) formation is an important pathological condition found in nearly a third of cancers and in all cancer subtypes. Our lab is developing computational tools to characterize the structural and functional properties of ecDNA and related focal amplifications.

    Interested students should want to learn about, design, and implement graph algorithms, and should commit to taking my winter class, CSE280A.

Steve Briggs | Biological Sciences

We model relationships between the proteotypes and the phenotypes of cells/organisms, with an emphasis on innate immunity in plants. Proteotypes are measured using custom methodology for high-throughput proteomics based on mass spectrometry. Students have an opportunity to integrate training in bioinformatics with chemistry and biology.

The specific state of the proteome in a given cell, tissue, or organism is known as the proteotype. The proteotype integrates constraints imposed by the genotype, the environment, and by developmental history (e.g., a leaf cell has a different proteotype than a root cell with the same genotype in the same environment). The proteotype directly determines phenotype since all molecules are made by and regulated by proteins. Thus, a complete description of the proteotype should define a phenotype at the molecular level. We are constructing an Atlas of Proteotypes that currently includes 162,777 peptides from 41,553 proteins in 65 different tissues and stages of development. In addition, we have identified and measured more than 30,000 phosphopeptides from these same samples. The 65 resultant proteotypes are revealing thousands of unanticipated regulatory relationships. The relationships between mRNA levels and protein levels are fascinating; they indicate that protein levels from some genes are regulated by transcription but that most protein levels are under post-transcriptional control. Inspection of our data explains tissue specific traits such as oil accumulation in the embryo that results from selective accumulation of proteins from common mRNAs.

  • Stoichiometry of the cell

    With the rise of quantitative proteomics, it is now possible to measure the absolute number of protein molecules in a cell. We are using multiple reaction monitoring (MRM) of heavy and light isotope-labeled peptides on a triple quadrupole mass spectrometer to quantify signaling and metabolic dynamics, with a focus on the post-translational modifications of phosphorylation and acetylation. By placing biology on a quantitative basis we are contributing to several fundamental advances: the ratios of different proteins to each other are being determined (i.e., the stoichiometry); results from our lab can be combined with data from other labs because they are measured in absolute units; and the ratios of proteins to metabolites or RNAs can be ascertained. We are constructing pathway proteotypes for signaling and metabolism to identify proteins whose levels are incompatible with a simple role in the process. To contribute to this effort, we would like a student to construct an MRM database. The database will store all the heavy peptides we have available in the lab along with information about the proteins from which they are derived. We will store MS/MS information for peptides that we have observed in our proteome surveys using linear ion trap mass spectrometers. The database will store all reaction/transition data that we have obtained for each peptide along with the signal strength for each reaction product. MS1 scans from the triple quadrupole mass spectrometer will be included to evaluate the purity/intensity of the heavy peptides. This database will be of great use to the many labs that are beginning to place biology on a quantitative foundation.
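    For prospective students wondering what such a database might look like, the description above can be sketched as a small relational schema. This is only an illustrative starting point, assuming SQLite through Python's standard `sqlite3` module; every table name, column name, and helper function here is a hypothetical placeholder, not the lab's actual design.

```python
import sqlite3

# Hypothetical schema: proteins -> heavy peptides -> MRM transitions.
SCHEMA = """
CREATE TABLE protein (
    protein_id  INTEGER PRIMARY KEY,
    accession   TEXT NOT NULL UNIQUE,  -- placeholder identifier
    description TEXT
);
CREATE TABLE heavy_peptide (
    peptide_id   INTEGER PRIMARY KEY,
    protein_id   INTEGER NOT NULL REFERENCES protein(protein_id),
    sequence     TEXT NOT NULL,
    modification TEXT                  -- e.g. phosphorylation, acetylation
);
CREATE TABLE transition (
    transition_id   INTEGER PRIMARY KEY,
    peptide_id      INTEGER NOT NULL REFERENCES heavy_peptide(peptide_id),
    precursor_mz    REAL NOT NULL,
    product_mz      REAL NOT NULL,
    signal_strength REAL               -- intensity of the reaction product
);
"""


def build_database(path=":memory:"):
    """Create the (hypothetical) MRM tables and return the connection."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def strongest_transitions(conn, sequence, limit=3):
    """Return the highest-signal transitions recorded for one peptide."""
    return conn.execute(
        """
        SELECT t.precursor_mz, t.product_mz, t.signal_strength
        FROM transition AS t
        JOIN heavy_peptide AS p USING (peptide_id)
        WHERE p.sequence = ?
        ORDER BY t.signal_strength DESC
        LIMIT ?
        """,
        (sequence, limit),
    ).fetchall()
```

    The MS/MS spectra from the ion trap surveys and the MS1 purity scans mentioned above could be added as further tables keyed on `peptide_id` in the same way.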

| Radiology

We have a variety of projects, ranging from brain mapping to derive optimal brain atlases, through integrated omic analyses to identify genetic underpinnings of the brain, to precision medicine approaches for drug response prediction and drug target identification.

  • Genomic study of brain MRI phenotypes

    A major challenge hindering progress in neuropsychiatric medicine is our limited understanding of the genetics underlying the complexity of human brain structure and function. Our project aims to characterize genetic effects on the brain by multimodal imaging using human biobanks with MRI and genotype data. This will provide insight into shared and distinct genetic influences among different brain regions. Building on improved genetic knowledge of the brain, we will determine the genetic relationship between brain morphology and neuropsychiatric disorders using statistical genetics tools. We will estimate the effects of neuropsychiatric genetic risks and environmental exposures on deviations of MRI phenotypes from normal neurodevelopmental and aging trajectories.

  • Omic analyses for drug target identification

    Our goal is to identify potential drug targets for brain disorders (e.g., Alzheimer’s disease) through gene networks comprising disease-associated genes. Recent genomic studies have advanced our knowledge of the genetics of brain disorders and related traits, which could illuminate the pathogenesis of these disorders. This new knowledge provides opportunities for genetics-based strategies for drug target identification. Bioinformatics analyses will be performed to prioritize drug targets and potential drugs for repurposing.

| Psychiatry

Dr. Cheng’s research focuses on transcriptional regulatory networks and aims to develop a comprehensive understanding of how aberrant regulatory circuits contribute to human disease. Dr. Cheng’s laboratory is particularly interested in understanding transcriptional and epigenetic regulation of the interplay between the immune system and central nervous system in neurodegenerative diseases, substance use disorder and HIV infection. Current projects focus on applying single-cell transcriptomics and epigenetics assays to characterize Alzheimer’s disease, HIV and opioid use disorder patient samples, with the goal of finding diagnostic markers and therapeutic targets. Dr. Cheng’s lab has also developed 3D brain organoid models for Alzheimer’s disease and HIV infection. Dr. Cheng received her M.S. degree in Computer Science from Stanford University and her Ph.D. degree in Bioinformatics and Systems Biology from the University of California, San Diego. After completing her doctoral studies, Dr. Cheng did her postdoctoral training at the Broad Institute of MIT and Harvard.

  • 3D brain organoid model of Alzheimer’s disease revealed by single cell transcriptomics

    We developed a novel tau propagation model using a 3D spheroid model that rapidly develops tau pathology and neurodegeneration in just three weeks. Single cell transcriptomics of the model reveals cell type-specific changes that resemble transcriptomic signatures from Alzheimer’s disease postmortem brains.

  • Single cell transcriptomics and epigenetics of human Alzheimer’s disease brain

    To understand cell type-specific vulnerability in Alzheimer’s disease, we use snRNA-seq to characterize human brain tissues from Alzheimer’s disease patients across different brain regions.

  • Single cell transcriptomics and epigenetics of the opioid use disorder and HIV syndemic in the human brain

    As part of the NIH NIDA SCORCH consortium, we will dissect the dysregulated molecular circuitry in the brains of individuals with opioid use disorder and/or HIV infection. This project aims to identify genes that contribute to opioid use disorder and HIV-associated neurocognitive disorders. These approaches could lead to novel gene therapies to control and perhaps reverse the relentless disease state. We are in the process of generating snRNA-seq and snATAC-seq profiles from more than 300 patient samples across 3 different brain regions.

  • Single cell transcriptomics of cocaine use disorder in the context of HIV

    As part of the NIH NIDA SCORCH consortium, we will dissect the dysregulated molecular circuitry in the brains of individuals with cocaine use disorder and/or HIV infection. This project will focus on understanding how neurovasculature and neuroimmune cells contribute to cocaine use disorder and HIV-associated neurocognitive disorders. We will be generating snRNA-seq and snATAC-seq profiles from more than 300 patient samples across 3 different brain regions.

| School of Pharmacy

Our work aims to develop new mass spectrometry-based methods to understand the chemistry of microbes, our microbiome and their ecological niche. In short, we develop tools that translate the chemical language between cells. This research requires an understanding of (microbial) genomics, proteomics, imaging mass spectrometry, genome mining, enzymology, small molecule structure elucidation, bioactivity screening and antibiotic resistance. The collaborative mass spectrometry innovation center that Dr. Dorrestein directs is well equipped and now has twelve mass spectrometers, which are used to capture cellular chatter (e.g. metabolic exchange), to study metabolomics and metabolism, and to develop methods to characterize natural products. These tools are used to define the spatial distribution of natural products in 2D, 3D and, in some cases, real time. Recent research directions include capturing mass spectrometry knowledge to understand the microbiome; non-invasive drug metabolism monitoring; informatics of metabolomics; microbe-microbe, microbe-immune cell, microbe-host and stem cell-cancer cell interactions; comparisons of diseased vs. non-diseased model organisms; strategies for mass spectrometry-based genome mining; and the detection and structural characterization of metabolites through crowd-sourced annotation of molecular information on the Global Natural Products Social Molecular Networking site, through the NIH-supported center for computational mass spectrometry that is co-developed with Nuno Bandeira. A more detailed biography can be found in this Nature article.

  • Post-Translational Modifications Projects

    The Dorrestein laboratory is interested in the functional aspects and biosynthesis of post-translational modifications (PTMs). Of particular interest are orphan genes (genes with no currently assigned function) that are responsible for the generation of bioactive natural products (e.g. antibiotics, anti-cancer agents etc.) or PTMs. This research aims to understand the functions of such genes using high-resolution mass spectrometry. To achieve this goal, the lab will have the most advanced mass spectrometer on the UCSD campus. Please see the Research page on the lab website for descriptions of current research projects.

Joseph Ecker | Salk Institute for Biological Studies

The establishment of cell-type identity and specific gene expression patterns is tightly regulated by the interplay between different modalities of the epigenome. These include DNA cytosine methylation, which affects transcription factors’ binding to regulatory elements, and higher-order chromosome structures bridging the distal regulatory elements to the target genes. Studying their diversity across cell types is fundamental for understanding complex human diseases in different tissue contexts.

  • Characterization of gene regulatory elements using multi-omic data

    Characterization of gene regulatory elements using multi-omic data

  • Species comparisons of brain cell types

    Computational analysis of multi-omic single cell data from 4 species (mouse/marmoset/macaque/human)

| La Jolla Institute for Immunology

After decades of research, we still do not know why the influenza (flu) vaccine elicits a strong antibody response in some but a negligible response in others. Rather than analyzing each vaccine study individually, our lab combines the wealth of existing data to predict each person's response and then tailors their choice of vaccine to maximally augment their immunity. Influenza is one of the best-studied viruses of all time, yet the models we develop are designed to readily generalize to other pathogens and other biological systems.

  • Finding the needle in the haystack: Using outlier detection to assess data quality

    In this era of big data, it is difficult to know when a peculiar trend represents a true signal or an artifact. This project will search for the potential outliers amidst the many influenza vaccine studies that have been conducted to date. We will test whether odd trends hold across other studies and determine a "quality score" that quantifies our trust in each dataset.

  • Hunting for the best vaccine response

    While the influenza vaccine elicits a moderate antibody response in most individuals, in some cases we get amazingly strong, broad, and durable responses. This project will search for what traits best identify these "super responders."

  • Time series forecasting

    Many studies have measured the response to vaccination, yet they often do so at different times. This project aims to unify these heterogeneous datasets by predicting each person's response at any time point that has been measured in any study. A basic level of programming is required. Expect to learn new techniques spanning from matrix completion to machine learning.

| Pediatrics

The Gaulton lab studies the effects of human genetic variation on gene regulation and diabetes risk. We use computational and statistical methods to integrate genome sequence information with epigenomic annotation and molecular QTL data.

  • Genetic and epigenomic fine-mapping of diabetes risk loci

    This project involves dense genetic fine-mapping of diabetes risk loci, integrating fine-mapping data with large-scale genomic and epigenomic maps using published and novel models to identify causal variants, cell types and networks, and applying these predictive models to identify additional diabetes risk loci.

  • Predicting causal genes at diabetes risk loci

    This project involves the development of novel methods for integrating genetic association data with epigenomic annotation, expression QTLs and chromatin QTLs to predict causal genes of diabetes risk variants.

  • Predicting genome-wide pleiotropic effects of diabetes risk variants

    This project involves development of novel mixture model approaches to predicting and quantifying the extent of pleiotropy among diabetes risk variants genome-wide.

Michael Gilson | School of Pharmacy

We work on many aspects of molecular mechanism, modeling and design. Our core interests are in the physical chemistry and algorithms underpinning computer-aided drug design methods, but we also have much wider interests, such as simple model receptors for studying molecular recognition; how molecular motors work; chemical informatics and its interface with bioinformatics; how membranes work; and synthetic catalysts and nanoparticles.

  • Information theory studies of protein sequence and structure

    While working on methods of computing entropy from molecular simulation data, we stumbled on an interesting mathematical angle for studying correlations in high-dimensional spaces. The math can be applied in various realms, and we got some interesting results applying it to residue-residue (and higher-order) correlations within protein sequences.

    Thus, this would be an exploratory project to see if we can learn new things about sequence-function relationships in proteins, and maybe about how to design new proteins, by studying residue-residue correlations, particularly in the context of 3D protein structures.

Christopher Glass | Cellular and Molecular Medicine

Dr. Glass’ primary interests are to understand transcriptional mechanisms that regulate the development and function of macrophages. Macrophages play key roles in immunity, wound repair, development and tissue homeostasis. Dysregulation of macrophage functions contributes to a broad spectrum of human diseases, including atherosclerosis, diabetes, neurodegenerative diseases, and cancer. A major effort of the Glass laboratory is to use genomics assays and associated bioinformatics approaches to understand how macrophage gene expression programs are established and how they are influenced by different tissue environments and disease. An important concept to emerge from these studies is that enhancers can be exploited to deduce the transcription factors and upstream signaling pathways that drive context-specific transcriptional outputs. Students are welcome to select projects from current areas of active investigation.

  • Natural genetic variation and macrophage gene expression

    Many lines of evidence, including genome-wide association studies, indicate that non-coding genetic variation plays a major role in determining phenotypic diversity. We were among the first laboratories to define the impact of natural genetic variation on enhancer selection and function (Heinz et al., Nature 2013 PMID 24121437), but at present it remains difficult to predict the impact of non-coding variation on gene expression. In a novel and ambitious effort, we systematically characterized the genome-wide patterns of mature RNA (RNA-seq), nascent RNA (GRO-seq), transcriptional initiation (5’GRO-seq), histone modifications and binding profiles of lineage-determining and signal-dependent transcription factors (ChIP-seq), DNA methylation (bisulfite sequencing), and chromatin conformation (Hi-C, capture Hi-C and PLAC-seq) in resting and activated macrophages derived from 5 different inbred strains of mice, providing ~60 million single nucleotide variants, ~6 million InDels and several hundred thousand structural variants. This data set provides a unique resource for investigating the impact of non-coding variants on transcription factor binding, enhancer activation and target gene expression. We are currently developing new computational methods for analyzing these data, with the goal of explaining the effects of non-coding mutations and predicting patterns of gene expression in new mouse strains. Related projects are investigating the relationships between genetic variation among selected mouse strains and their different susceptibilities to metabolic and cardiovascular disease. This general project area is both challenging and open ended, and there is a wide range of directions that rotation projects could take. As an example, recent rotation students have implemented machine learning approaches to investigate how sequence variants affect collaborative binding between lineage-determining transcription factors.

  • Nature and nurture of microglia

    Each population of tissue-resident macrophages exhibits a distinct pattern of gene expression that is tuned to the developmental and homeostatic needs of that tissue. For example, brain macrophages called microglia produce factors that are trophic for neurons and monitor synapses, functions that require a brain-specific program of gene expression. A key question is how this tissue-specific program of gene expression is achieved. Through analysis of gene expression and enhancer landscapes, we obtained evidence that the microglia-specific molecular phenotype results from instructive signals in the brain that direct the activation of microglia-specific enhancers (Gosselin et al., Cell 2014 PMID 25480297). Of particular interest, delineation of the gene expression patterns and enhancer landscapes of human microglia revealed that a substantial fraction of the genes associated with non-coding GWAS risk alleles are preferentially or exclusively expressed in microglia, and many are brain environment dependent (Gosselin et al., Science 2017 PMID 28546318). These findings raise several important questions that are under active investigation, including what environmental factors dictate the brain-specific program of gene expression and how human genetic variants affect the regulation of genes that are linked to neurodegenerative disease. We are taking a multi-disciplinary approach that includes studies of in vivo mouse models, in vitro human iPSC-derived microglia, genomic assays of microglia nuclei derived from control and Alzheimer’s disease brains, and direct analyses of the relation of genotype to gene expression in a growing RNA-seq database derived from purified human microglia. As an example, a recent rotation project investigated whether there is any relationship between circulating monocytes (a white blood cell that can differentiate into macrophages in tissues) and microglia gene expression patterns from the same individual.

Melissa Gymrek | School of Medicine

Our overall goal is to understand complex genetic variants that underlie human disease. We are particularly interested in repetitive DNA variants known as short tandem repeats (STRs) as a model for complex variation. Our work focuses on developing computational tools for analyzing and visualizing complex variation from large-scale sequencing data and applying these tools to learn about the contribution of repetitive variation to human disease.

  • Measuring genetic constraint in the non-coding genome

    A major result from recent genome-wide association studies (GWAS) is that the majority of genetic variants driving common human disease lie in regulatory, rather than protein-coding, regions. While it is relatively straightforward to predict the consequences of mutations in coding regions, we are far from being able to interpret and sift through the large number of non-coding variants arising from whole genome studies. Recent studies have leveraged population-wide genetics datasets to determine genes that are depleted of variation, or "constrained", and thus presumably important for human health. In this project we will develop statistical tests using large panels of human genetic variation to systematically measure constraint for a variety of regulatory annotations and evaluate the utility of these annotations for prioritizing variants from medical genetics studies.

Terence Hwa | Physics

Terence Hwa, Departments of Physics and Molecular Biology

The Hwa lab (a.k.a. the Quantitative Microbiology Lab) uses a combination of experimental and theoretical approaches to elucidate the organizational principles of living systems. The goal is to quantitatively characterize physiological behaviors and understand how they arise in terms of the underlying molecular interactions. Our lab focuses on the bacterium E. coli because it is perhaps the best-characterized organism in terms of molecular components and interactions, but we also study higher organisms together with collaborating labs. Please visit our lab webpage (https://matisse.ucsd.edu) for further information.

  • Quantitative studies of bacterial physiology

    An outstanding challenge in making biology quantitative and predictive is how to deal with the millions or even billions of missing parameters that describe the underlying molecular interactions. In recent years, our lab pioneered a top-down approach that exploits a number of phenomenological laws to accurately predict the physiological responses of bacteria to environmental and genetic changes (e.g., nutrients, antibiotics, heterologous protein expression) [DOI: 10.1126/science.1192588]. Furthermore, insight from this quantitative physiological approach can pinpoint key missing molecular interactions in long-studied biological processes [DOI: 10.1038/nature12446]. The lab has a number of projects extending this basic approach to a variety of problems in microbiology, including growth transitions, stress response, antibiotic resistance, and biofilm formation.

| Biological Sciences

Cells must continuously maintain integrity and compartmentalization while meeting demands for cellular remodeling throughout development, immunity, aging and disease. Using functional genomics, genetics and cell biological approaches in the fruit fly, Drosophila, we are studying the central roles of membrane regulation in dynamic cell structure. We have identified novel endocytosis and autophagy membrane trafficking pathways that control macrophage and muscle remodeling, with relevance to human disease. Current projects in the lab aim to discover new mechanisms of cellular remodeling through functional genomic and proteomic approaches, and to better understand the pathway networks and dynamics during cellular remodeling.

  • Autophagy networks

    • Expand on our ongoing co-immunoprecipitation and mass spectrometry datasets to identify protein-protein interactions involved in autophagy.
    • In collaboration with the Ideker lab and the SDCSB Network Assembly Core, analyze coIP results and incorporate functional data into an ‘autophagy network’.
    • Test new insights predicted from the network using in vivo autophagy assays.
  • Computational analysis of lipid regulators and effectors in Drosophila development

    • Use databases and bioinformatics to identify all predicted phosphoinositide lipid regulators and effectors (binding proteins) in Drosophila.
    • Mine databases of Drosophila tissue and stage-specific gene expression, function and protein-protein interaction information for each candidate gene (above).
    • Identify potential relationships between regulators and effectors computationally and experimentally.
  • High-throughput image analysis of cell morphology

    • In collaboration with the Tsimring lab at the BioCircuits Institute, optimize newly developed machine learning image analysis algorithms to quantify cell shape and cell shape changes.
    • Conduct new RNAi screens to test optimized image analysis algorithms, and employ established methodology to screen for new and modifying (enhancer/suppressor) gene functions in cellular remodeling.
    • Perform network analysis of large-scale RNAi screen image data.

| Pediatrics

The Knight lab has broad interests in the human microbiome, the collection of trillions of microbes that inhabits our bodies, especially in developing techniques to read out these complex microbial communities and use the resulting data to understand human health, links between humans and the environment, and to prevent and cure disease. We offer a fast-paced environment with many collaborative opportunities on different projects.

  • Machine Learning for the Microbiome

    We have amassed a database of microbial DNA sequences from hundreds of thousands of biological specimens. Understanding how variation among these microbial communities relates to disease requires a range of machine learning and multivariate statistical approaches. There are many opportunities ranging from entry-level (benchmarking classifier performance on specific sample sets) to extremely challenging (using deep learning to infer the structure of global sample set relationships).

  • Multi-omics integration

    Last Updated:

    An increasing need is to integrate data from different "omics" levels, e.g. genomes, metagenomes, metatranscriptomes, metaproteomes, metabolomes, immunological profiling, etc., into a single coherent picture separating healthy and disease states. Improved methods for performing this task, either directly or via intermediate representations such as mapping to metabolic and regulatory pathways, are essential for improving understanding. Projects in this category range from simple (testing where existing techniques like correlation networks or Procrustes analysis do or don't connect two specific data layers) to challenging (using transfer learning to integrate heterogeneous data layers and improve the underlying network annotation). An especially exciting emerging research direction here is XAI (explainable artificial intelligence), which can provide a better justification for a specific classification or suggestion in clinical applications.
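
    As a rough illustration of the "simple" end of this range, a cross-layer correlation network can be sketched in a few lines of Python. This is only a toy under stated assumptions: the feature names, values, and the 0.8 threshold are hypothetical, plain Pearson correlation is used for brevity, and real compositional microbiome data would usually need a transformation or a compositionally aware measure first.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def cross_layer_edges(layer_a, layer_b, threshold=0.8):
    """Return (feature_a, feature_b, r) for every cross-layer pair with |r| >= threshold."""
    edges = []
    for name_a, vals_a in layer_a.items():
        for name_b, vals_b in layer_b.items():
            r = pearson(vals_a, vals_b)
            if abs(r) >= threshold:
                edges.append((name_a, name_b, round(r, 3)))
    return edges

# Hypothetical toy data: the same four samples measured in two "omics" layers.
taxa = {"taxon_1": [1.0, 2.0, 3.0, 4.0], "taxon_2": [4.0, 1.0, 3.0, 2.0]}
metabolites = {"metab_A": [2.0, 4.0, 6.0, 8.0], "metab_B": [8.0, 6.0, 4.0, 2.0]}

edges = cross_layer_edges(taxa, metabolites)
for edge in edges:
    print(edge)
```

    An empty edge list for two layers would be one simple sign that this technique does not connect them at the chosen threshold.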

  • Optimizing microbiome algorithms

    Last Updated:

    Many algorithms used in microbiome studies, especially in metagenomic assembly, are extremely computationally expensive. Opportunities exist either for exploiting new hardware architectures to accelerate existing algorithms or for developing new approximate algorithms, to tackle problems in the workflow including inferring taxonomy and function from DNA sequence data, genome and metagenome assembly and annotation, computing community distance metrics from sparse compositional data, and high-level analyses of hundreds of thousands of microbiomes. Again, these projects range from entry level (comparing results of two multiple sequence alignment techniques for subsequent community analysis) to advanced (using non-von Neumann architectures to perform pattern classification in real time at the whole community level for disease detection).
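
    To make "community distance metrics from sparse compositional data" concrete, here is a minimal sketch of one widely used measure, Bray-Curtis dissimilarity, computed on sparse feature-count dictionaries. It is an illustration only, not necessarily the metric or implementation used in the lab; the counts below are made up.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two sparse samples (feature -> count).

    Features absent from a dict are treated as zero counts, so only
    observed features need to be stored.
    """
    features = set(sample_a) | set(sample_b)
    diff = sum(abs(sample_a.get(f, 0) - sample_b.get(f, 0)) for f in features)
    total = sum(sample_a.values()) + sum(sample_b.values())
    return diff / total if total else 0.0

# Made-up counts for two hypothetical samples.
sample_1 = {"Bacteroides": 6, "Prevotella": 4}
sample_2 = {"Bacteroides": 2, "Staphylococcus": 8}

print(bray_curtis(sample_1, sample_2))
```

    The naive all-pairs version of this computation grows quadratically with the number of samples, which is one reason acceleration and approximation become interesting at the scale of hundreds of thousands of microbiomes.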

| Biomedical Informatics

Dr. Koola is a physician scientist specializing in Biomedical Informatics and Hospital Medicine. He specializes in the area of big data machine learning for predictive analytics. In particular, he is interested in using electronic health records to improve care delivery, particularly for patients with advanced liver disease. Using risk prediction models in a healthcare context requires understanding of: (i) the healthcare system of intended use; (ii) risk model building; (iii) risk model assessment; and (iv) risk model re-calibration. Additionally, Dr. Koola is interested in visual analytics, data modeling, and health services research.

  • Designing the "Green Button" informatics consult service using big data analytics for personalized medicine

    Last Updated:

    In 2012 the Institute of Medicine released desiderata for a learning healthcare system, where evidence informs practice and practice informs evidence. Though the randomized clinical trial (RCT) serves as the gold standard for informing clinical decisions, flaws exist in terms of achieving recruitment, overly stringent inclusion/exclusion criteria, and lack of patient-centered decision making. Observational cohort studies have grown as an important complement to RCTs, allowing comparative effectiveness research and patient-centered trials. The surge of Electronic Health Records (EHR) and the resulting zettabyte of data allow us to realize this vision for the first time. Despite the growth of observational cohort studies, challenges remain in bringing knowledge from the bench to the bedside; moreover, a model's performance degrades when it is used in a cohort other than the one in which it was developed.

    To ameliorate these difficulties, we propose to launch and study a novel “informatics consult” service. The service would allow clinicians, when no clear evidence-based guidelines exist regarding care decisions, to query the UCSD clinical data warehouse by identifying patients similar to the index case. First proposed in the seminal “Green Button” paper by Longhurst et al., such a system would leverage our ability to truly deliver personalized, patient-centered care. Small-scale limited efforts have been put into practice to answer questions regarding treatment of melanoma and systemic lupus erythematosus complications. We note, however, the opportunity for a much larger service with broad impact starting with insights borne of data from UCSD, and potentially mining insights from the entire state-wide UC Health data warehouse.

    We note several novel challenges to this proposed system: (i) Performing semi-automated phenotyping so that we can identify clinical outcomes of interest. (ii) Identifying patients that are similar to the index patient (often called clustering). (iii) Incorporating automated, computable search regarding guideline-recommended care. (iv) Performing visual analytics to understand similarity of cohorts. (v) Communicating probability and statistical information to healthcare professionals so they can effectively manage uncertainty.

    Student responsibilities:

    1. Participate in project meetings
    2. Help design one of several possible algorithms/interfaces:
      a. patient clustering algorithm using unsupervised learning
      b. visual analytic interface for describing similar cohort of patients
      c. visual analytic interface to help communicate statistical risk
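
    As a rough sketch of the "patients similar to the index case" idea, the example below ranks a small cohort by distance to an index patient after standardizing each feature. This is a nearest-neighbour lookup rather than a full unsupervised clustering algorithm, and every identifier, feature, and value is hypothetical; a real service would need clinically validated features and similarity measures.

```python
from math import sqrt

def standardize(rows):
    """Z-score each column so that no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; fall back to 1.0 for constant columns.
    sds = [sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def most_similar(patients, index_id, k=2):
    """Return the ids of the k patients closest (Euclidean) to the index patient."""
    ids = list(patients)
    z = dict(zip(ids, standardize([patients[i] for i in ids])))
    target = z[index_id]
    dist = {
        i: sqrt(sum((a - b) ** 2 for a, b in zip(z[i], target)))
        for i in ids if i != index_id
    }
    return sorted(dist, key=dist.get)[:k]

# Hypothetical de-identified records: [age, lab value 1, lab value 2].
cohort = {
    "index": [60, 1.0, 30],
    "p1": [62, 1.1, 31],
    "p2": [35, 3.0, 10],
    "p3": [58, 0.9, 29],
    "p4": [40, 2.8, 12],
}

neighbours = most_similar(cohort, "index", k=2)
print(neighbours)
```

    Outcomes observed in the returned neighbours could then feed the visual analytic interfaces listed above.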
  • Integrating patient reported outcomes into the electronic health record to improve cardiovascular care.

    Last Updated:

    Unhealthy dietary choices (a lack of nutritious foods and an excess of unhealthy food) were shown to be the major contributor to the 400,000 U.S. deaths from cardiovascular diseases (CVD) in 2015. Eating more nuts, vegetables, and whole grains, and less salt and trans fats, could save tens of thousands of lives in the U.S. each year. Obesity is one critical outcome of poor diet, which also contributes to heightened CVD risk. Thousands of smartphone apps are available to download for weight loss, but these apps primarily focus on caloric intake, rather than the overall quality of diet and lifestyle critical for CVD prevention.

    Mobile Health (mHealth) applications also have not been systematically tested for their effectiveness and are criticized for not having an evidence-based foundation. In this study, we adapt the design of mHeart to communicate automatically with the UCSD Electronic Health Record so that healthcare providers have access to psychosocial aspects of patients' care outside of the direct hospital system. In particular, the provider will be able to view logs of patient activity, dietary choices, and other lifestyle choices. The provider will also be able to send feedback to the patient to alter behavior.

    Student opportunities:

    1. Help modify smartphone app to make use of healthcare connection protocols like Apple HealthKit and Google Fit
    2. Understand interfaces that communicate with electronic health records (like FHIR)
    3. Help design point-to-point interface between smartphone app and electronic health record data, which is presented to provider
    4. Participate in meetings designing pilot study to test app performance
  • Systematic review and meta-analysis of hospital readmission for patients with cirrhosis.

    Last Updated:

    Patients with cirrhosis, a late stage of chronic liver disease, are at increased risk of hospitalization and hospital readmission. Although several studies have looked at models for predicting readmission for patients with cirrhosis, they are limited by small sample sizes, limited candidate predictor variables, and limited evaluation of discrimination and calibration. A systematic review and meta-analysis of available evidence can help shed new light on the problem, and help identify modifiable risk factors.

    Student responsibilities:

    1. Understand the basics of a systematic review
    2. Perform literature review
    3. Abstract necessary information in case report forms and help perform meta-analysis
    4. Help write manuscript

| Biomedical Informatics

Dr. Tsung-Ting Kuo is an Assistant Professor of Medicine in the University of California San Diego (UCSD) Health Department of Biomedical Informatics (DBMI). He mainly conducts biomedical, healthcare, and genomic studies based on blockchain and predictive modeling. His research focuses on blockchain technologies, machine learning, and natural language processing.

  • Developing privacy-preserving predictive modeling algorithms on blockchain networks

    Last Updated:

    Predictive modeling can advance research, facilitate quality improvement initiatives, and substantiate research results, especially when data from multiple healthcare systems can be included. However, current state-of-the-art privacy-preserving predictive modeling frameworks are still centralized; in other words, the models from distributed sites are integrated in a central server to build a global model. This centralization carries several risks, e.g., a single point of failure at the central server. To improve the security and robustness of predictive modeling frameworks, we will develop and implement novel and advanced algorithms on decentralized blockchain networks (a distributed ledger/database technology adopted by cryptocurrencies such as Bitcoin and Ethereum) to build better models. The outcome will be algorithms that improve the predictive power of data from multiple healthcare systems through a distributed system. Selected references: PMID 36402113, 34923447, and 31943009.

| School of Medicine

Insulin resistance is a major cause of the epidemic diseases of our society: diabetes, heart attacks, strokes, and fatty liver. Our goal is to understand who develops insulin resistance, how, and why. We use longitudinal health records, functional genomics, and human genetics to address this goal. In addition to this discovery science, we focus on clinical translation by developing ML/AI-driven clinical tools to interpret large scale genetic, genomic, and longitudinal health data to diagnose and treat people with diseases related to insulin resistance.

  • aiDose: A foundation model for clinical decision support in hospitalized patients with diabetes

    Last Updated:

    Hospitalized patients with diabetes require daily management by a team of clinicians that must make daily insulin dosing decisions. While each decision is usually made according to standard algorithms, hundreds of such decisions must be made daily by a few providers leading to errors and burnout; this project will address this clinical burden by developing aiDose, a foundation model for inpatient diabetes management that can incorporate medical records and previous clinician notes to provide decision support for inpatient diabetes services.

  • Longitudinal metabolic analysis of Diabetes Prevention Program (DPP) participants to identify patient subgroups with differential micro- and macrovascular complication risk

    Last Updated:

    Type 2 Diabetes (T2D) complications cause morbidity and mortality, but occur heterogeneously among those at risk and thus are difficult to predict. Previous studies to identify individuals at risk of diabetic complications focus on single timepoint data for a few features and do not examine phenotypic variables over time. This project will analyze multiple longitudinal clinical phenotypes to identify clusters of individuals at risk of diabetic complications.

  • Pre-adipocyte cell fate reprogramming

    Last Updated:

    Adipose tissue secretes cytokines to regulate essential functions, but comprehensive study is prevented by difficult-to-access depots such as visceral epicardial adipose tissue and differences between donor sources and methods to generate cell lines. This project will start with an isogenic population of cultured human preadipocytes and reprogram them to specific types of adipocytes using a combinatorial process analogous to that of induced pluripotent stem cells. This project includes working with single cell (sc)-RNAseq and sc-ATACseq datasets using XGBoost and other models.

| School of Medicine

Research in the Mesirov Lab focuses on cancer genomics applying machine-learning methods to functional data derived from patient tumors. The lab analyzes these molecular data to determine the underlying biological mechanisms of specific tumor subtypes, to stratify patients according to their relative risks of relapse, and to identify candidate compounds for new treatments. The overall goal is to treat patients as individuals specific to their tumors. Importantly, the lab is committed to the development of practical, accessible software tools to bring these methods to the general biomedical research community.

  • Enhancing the Molecular Signatures Database

    Last Updated:

    We seek an undergraduate student to write software to enhance our Molecular Signatures Database (MSigDB), a repository of gene sets utilized by a worldwide genomics community of over 180,000 users. These gene sets are used to identify and better understand activated pathways underlying human disease, e.g., cancer, and to interpret experimental results. The NIH recently retired the Cancer Genome Anatomy Project, which included the BioCarta collection of curated pathway diagrams. These images and associated metadata are only accessible via the Internet Archive “Wayback Machine”. The project involves retrieving this data from the archive to provide visualizations for entries in the MSigDB. The results of this work will have a great impact on biomedical investigations worldwide. Interested students should have some familiarity with programming and/or web technologies such as HTML, and a willingness to learn Python and how to use libraries such as urllib and BeautifulSoup to perform web scraping. Expertise in biology is not required. The project can be for class credit or as an internship which may lead to other research experience.
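
    For students new to web scraping, the general shape of the task can be sketched as follows. This is a minimal illustration under stated assumptions, not the lab's pipeline: the page address, timestamp, and HTML snippet are hypothetical placeholders, and the standard-library `html.parser` stands in for BeautifulSoup so the example runs without extra installs. No network request is made here; in practice the page would be fetched first (for example with `urllib.request.urlopen`).

```python
from html.parser import HTMLParser

def wayback_url(original_url, timestamp="20200101000000"):
    """Build a Wayback Machine address for a capture of original_url.

    The timestamp (YYYYMMDDhhmmss) is an arbitrary placeholder; the archive
    serves the capture closest to the requested time.
    """
    return f"https://web.archive.org/web/{timestamp}/{original_url}"

class ImageCollector(HTMLParser):
    """Collect the src and alt text of every <img> tag that has a src."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)
            if attr_map.get("src"):
                self.images.append((attr_map["src"], attr_map.get("alt", "")))

# Hypothetical archived page address and a stand-in for its HTML.
page = wayback_url("http://example.org/pathways/index.html")
html = (
    '<html><body>'
    '<img src="img/pathway1.gif" alt="Pathway 1">'
    '<img alt="no source">'
    '</body></html>'
)

collector = ImageCollector()
collector.feed(html)
print(page)
print(collector.images)
```

    With BeautifulSoup, the collector class would typically be replaced by a `find_all("img")` call on the parsed page.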

  • Using single-cell sequencing technologies to understand response and resistance to cancer immunotherapy

    Last Updated:

    Immunotherapy, a class of drugs that enable a patient’s immune system to fight cancer, has emerged as a promising area of cancer drug development in recent years. However, not all patients respond to these treatments, and many patients who do will have a recurrence of their cancer. The biological mechanisms behind these differences in response to immunotherapy are currently poorly understood. However, recent improvements in sequencing technology now allow scientists to examine the behavior of genes in individual cells and how those cells resemble or differ from other cells around them. With this new data also comes the need to create new computational methods to analyze it. On this project, the student will work at the intersection of algorithm development and cancer biology to interpret single-cell sequencing data of cancer samples with the goal of understanding how cancer cells interact with the nearby immune cell populations and how these interactions affect response to treatment. Students will work in a multidisciplinary environment, collaborating with biologists, software developers, and experts in the fields of immunology and oncology. Interested students should have prior experience with programming, preferably in Python or R. This project can be done for class credit or as an internship.

| Computer Science and Engineering

  • Pevzner Lab Projects

    Last Updated:

    See the Research pages on https://bioalgorithms.ucsd.edu for an extensive list of mass spec projects in the Pevzner lab and in collaborating labs. Project themes include mass spec, T-Rex Fossils, and Comparative Proteogenomics.

| Pediatrics

  • RNA regulation of immunity

    Last Updated:

    RNA epigenetics or epitranscriptomics is an emerging field focused on chemical modifications in RNA. We are interested in understanding how RNA modifications affect the immune system during viral infections, vaccine development, immunotherapy, and in cancer. We employ in vivo models as well as non-human primates and human tissues to investigate genetic and epigenetic mechanisms of multiple disease states. Single-cell studies and data analyses are being performed to generate a single-cell transcriptome and epigenome atlas of human brain regions such as prefrontal cortex, striatum, and hippocampus. Commonly used methods in the laboratory include large-scale functional perturbation studies using RNAi and CRISPR, simultaneous single-cell RNA sequencing and single-cell Assay for Transposase-Accessible Chromatin sequencing (scMultiome-seq), patient-specific stem cell derived brain and lung organoids, drug design and pharmacology, and analyses of immune cells’ functions.

| Biological Sciences

The Rifkin laboratory studies how environmental, genetic, and stochastic variation interact to generate phenotypic variation and thereby mold the course of evolution. We use yeasts and nematodes as model organisms and work primarily at the level of gene regulatory and signal transduction networks.

  • Simulations of genetic network evolution

    Last Updated:

    In many population genetic models, mutations are assigned a fitness effect: an assigned genotype -> fitness map. Simulations can then explore how gene frequencies change over time under a number of demographic scenarios. An alternative is to use a generative model, often a set of differential equations, that uses genotypic information to produce a phenotype and then maps this phenotype to a fitness value. We are using such a simulation framework to study how genetic networks evolve under different demographic situations. Experience with C++ would be very useful for this project since the basic population genetic framework consists of a library of C++ templates.
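
    The difference between an assigned fitness map and a generative genotype -> phenotype -> fitness map can be illustrated with a toy simulation. The sketch below is written in Python for brevity and is not the lab's C++ framework; the one-gene model, the Gaussian fitness function, and all parameter values are assumptions chosen only for illustration.

```python
import math
import random

def phenotype(production_rate, degradation_rate=1.0):
    """Steady-state expression level of dx/dt = production - degradation * x."""
    return production_rate / degradation_rate

def fitness(x, optimum=2.0, width=1.0):
    """Gaussian stabilizing selection around an optimal expression level."""
    return math.exp(-((x - optimum) ** 2) / (2 * width ** 2))

def wright_fisher(pop_size, generations, freq_a, rates, rng):
    """Haploid two-allele Wright-Fisher model with selection.

    Each generation, pop_size offspring are sampled; the chance that an
    offspring carries allele A is its frequency weighted by relative fitness.
    """
    w_a = fitness(phenotype(rates["A"]))
    w_b = fitness(phenotype(rates["B"]))
    trajectory = [freq_a]
    for _ in range(generations):
        mean_w = freq_a * w_a + (1 - freq_a) * w_b
        p_a = freq_a * w_a / mean_w
        count_a = sum(rng.random() < p_a for _ in range(pop_size))
        freq_a = count_a / pop_size
        trajectory.append(freq_a)
    return trajectory

# Hypothetical alleles that differ only in production rate.
rng = random.Random(0)
traj = wright_fisher(pop_size=200, generations=50, freq_a=0.5,
                     rates={"A": 2.0, "B": 1.0}, rng=rng)
print(traj[-1])
```

    Changing `pop_size` or making it vary across generations is one simple way to explore different demographic scenarios in this toy setting.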

| School of Medicine

Lab Location: CMM-West, Rm. 345

Lab Phone: 858-534-5858

Lab Composition and Activities: Five graduate students from several programs, a talented group of enthusiastic (also helpful) postdoctoral fellows, and a full-time laboratory manager. We have one general laboratory meeting, one graduate student-only meeting, and one personal meeting each week. We also have joint lab meetings with two other labs weekly.

Research Interests: Our central laboratory focus this year is to continue to utilize global genomic approaches to uncover and investigate the “enhancer code” controlled by new, previously unappreciated pathways that integrate the genome-wide response to permit proper development and homeostasis, and that also function in disease and senescence. We have investigated these events in differentiated cells, neuronal development, stem cells, and cancer. Our biological focus is on molecular mechanisms of the “enhancer code” regulating learning and memory, aggressive prostate and breast cancer, and the underlying events of senescence/aging. Epigenomic events studied include non-histone methylation events and non-coding RNAs. We are investigating these events in development, breast and prostate cancers, and in inflammation-based disease, including degenerative CNS disease and diabetes. The emerging importance of non-coding RNAs and regulation of nuclear architecture is rapidly altering our concepts of homeostasis and disease. Our laboratory is “Seq-ing” (RIP-seq, ChIP-seq, RNA-seq, GRO-seq, CLIP-seq, ChIRP-seq) and using a new “FISH-seq” for open-ended discovery of long-distance genome interactions, to uncover new “rules” of regulated gene transcriptional programs and new roles for lncRNAs in the biology of normal and cancer cells, neuro-affective disorders, and aging cells. Coupling this with chemical library screens, we hope to introduce new types of therapies based on targeting specific gene enhancers, histone protein readers and writers, and lncRNAs for cancers and other diseases. Recent surprising findings include novel roles of lncRNAs in prostate and breast cancer, a connection between DNA damage repair/transcription and replication, and unexpected roles of enhancer RNAs.

Current interests include:

  • The “enhancer code,” Epigenomics and transcriptional regulatory mechanisms.
  • Roles of ncRNAs in enhancer function, in signal-dependent genomic relocation, and in establishing subnuclear architecture.
  • Mechanisms of signal-induced tumor chromosomal translocations events and new chemical screens for inhibitors for breast and prostate cancer.
  • The “enhancer code” regulating learning and memory, including Reelin-regulated enhancers.
  • Linkage of DNA damage/repair and transcription.
  • Retinoic Acid regulation of Pol III-transcribed DNA repeats in maintenance of the stem cell state, in neuronal differentiation and in senescence.
  • Molecular mechanisms of prevalent disease-associated sequence variations (GWAS) in disease susceptibility loci.
  • “Epigenomics” in neuronal differentiation, cancer, diabetes and degenerative brain disease.
  • Answering the question of when and how enhancers arise and become functional (stem cells to mature cell types).

  • Bioinformatics Rotation Projects

    Last Updated:

    Potential projects include:

    • Projects employing genome-wide technologies, including ChIP-seq, GRO-seq, CLIP-seq, RNA-seq, and ChIRP-seq, to elucidate molecular mechanisms of regulated enhancer lncRNA actions in cancer and stem cells;
    • Roles and mechanisms of enhancer actions in prostate and breast cancers;
    • Enhancer-based model of neurodevelopment and CNS disorders;
    • New mechanisms of long non-coding RNAs dictating physiological gene regulation in cancer transcriptional programs;
    • Understanding subnuclear structures: Roles of relocation of transcription units between subnuclear architectural structures in regulated gene expression;
    • Chemical library screens to gene signature and translocation responses as an approach toward new cancer therapeutic reagents;
    • Roles of epigenomic regulators and expression of DNA repeats in stem cells, neuronal differentiation and in senescence.

| Biological Sciences

  • Systems Biology and Engineering of Environmental and Drought Tolerance in Plants

    Last Updated:

    Julian Schroeder's research is directed at discovering the signal transduction mechanisms and the underlying signaling networks that mediate resistance to environmental stresses in plants, in particular drought, salinity stress, and CO2 responses. These environmental (abiotic) stresses have substantial negative impacts on plant growth and crop yields. They are also relevant to climate change and to maintaining available arable land to meet human needs. Research in Julian Schroeder's laboratory uses multidisciplinary approaches including genomics, bioinformatics, cell signaling, network modeling, proteomics, and molecular biology towards uncovering the signal transduction network and receptors in plants that translate drought stress hormone reception, CO2 sensing, and salinity stress to specific resistance responses. Some of the recent research advances are being used in the biotechnology industry with the goal of enhancing stress resistance of plants and crop yields. Undergraduate research projects will include systems biology, bioinformatics, and innovative analyses of large-scale data sets within this research. Undergraduate research projects will be pursued to model and identify drought stress-induced and CO2-induced signaling networks based on “omic” scale data sets. Models will be directly tested by wet lab experimentation.

    Julian Schroeder is Co-Director of the Center for Food and Fuel for the 21st Century. See https://labs.biology.ucsd.edu/schroeder/ for more information on the Schroeder lab.

    Selected publications

    • Nishimura et al., Science (2009).
    • H.H. Hu et al., Nature Cell Biol. (2010).
    • T.H. Kim et al. Current Biol (2011).
    • Xue et al., EMBO J. (2011).
    • F. Hauser et al. Current Biol (2011).
    • B. Brandt et al., PNAS (2012).
    • R. Waadt et al., eLife (2014).
    • A.M. Jones et al., Science (2014).
    • C.B. Engineer et al., Nature (2014).
    • B. Brandt, S. Munemasa et al. eLife (2015).
    • See also: https://labs.biology.ucsd.edu/schroeder/publications.html

| Bioengineering

My research focuses on time series analysis in biological systems, with an emphasis on practical information extraction for translational applications. The lab is divided into applications and approaches, though these all serve each other, and students collaborate routinely. Indeed, a positive attitude and an eagerness to support one another are requisite in the lab.

Applications include, but are not limited to: illness detection, prediction, and recovery monitoring; pregnancy detection and outcome forecasting; mental health monitoring; defining sleep in the body (as opposed to EEG); diabetes forecasting; and carbon footprint optimization of distributed computer systems.

Approaches include, but are not limited to: multimodal time series information extraction; differentiating multiple outcome types from random assortment; reduction of high dimensional spaces with modality, individual, and time series components; explicable machine learning model development; non-stationary signal analysis; and novel approaches to diversity mapping and phenotyping from physiology and behavior data.

I seek to find a fit between each individual and the lab’s ongoing projects; no one comes in and is just given marching orders. You’ll do better work when it’s the work that you actually want to do!

  • COVID-19 recovery monitoring

    Last Updated:

    Some individuals seem to have lingering or failed recoveries after COVID-19 infections. Students comfortable with basic programming or data science skills are encouraged to enhance our description of recovery profiles from TemPredict, and search for features that can contribute to pre-recovery classification.

  • Diversity within physiological data

    Last Updated:

    Algorithms tend to be one size fits all, whereas people are similar or dissimilar in complex and unmapped ways. Help map differences in normal routines, as well as in illness and recovery trajectories. These might arise from known demographic information or co-morbid conditions (diabetes, pregnancy, etc.), or represent different patterns in illness associated with unknown or latent variables.

  • Improving women’s health outcomes

    Last Updated:

    We have shown repeatedly in humans and animal models that females are as tractable with statistics as males (actually, often more so). Yet female physiology remains inappropriately understudied. Help us refine algorithms, map changes like pregnancy and menopause, and explore diversity within as well as across traditional sex categories.

| Electrical and Computer Engineering

  • Hardware acceleration of computational genomics

    Last Updated:

    We're working on a number of projects related to hardware acceleration of computationally challenging tasks in genomics research and looking for highly motivated students with some background in GPU/FPGA programming or high-performance computing to join. Please contact me via email for details on the latest projects. 

  • SARS-CoV-2 Phylogenetics

    Last Updated:

    We’re currently working with an international group of collaborators to perform phylogenetic analysis of millions of SARS-CoV-2 sequences for viral epidemiology and evolutionary biology applications. Current projects include phylogenetic placement in real time, tree structure optimization, recombination inference, etc. Contact me via email for the latest details.

| Bioengineering

Our research focuses on molecular engineering for cellular imaging and reprogramming, and image-based bioinformatics, with applications in stem cell differentiation and cancer treatment.

  • Image-based reconstruction of biochemical networks in live cells

    Last Updated:

    Fluorescence resonance energy transfer (FRET)-based biosensors have been widely used in live-cell imaging to accurately visualize specific biochemical activities. We have developed the Fluocell image analysis software package to efficiently and quantitatively evaluate the intracellular biochemical signals in real-time, and to provide statistical inference on the biological implications of the imaging results. However, important questions arise on how to use these results to reconstruct the quantitative parameters in the underlying biochemical networks, which determine cellular functions and ultimately their fates. In this rotation project, we will integrate optimization-based machine learning approaches with biochemical network models to seek answers to these questions, with applications in cancer treatment against drug resistance.

  • Intelligent Diagnosis of Infectious Diseases by Deep Learning

    Last Updated:

    The diagnosis of infectious diseases often requires tissue biopsy and microscopic examination by pathologists, which is time-consuming, labor-intensive, and error-prone. To develop a software-assisted system for identifying microorganisms in digital images, we use convolutional neural networks and transfer learning to train and validate an intelligent software system for the classification of pathology slides. The goal of this project is to provide a diagnosis of pathogens with high efficiency and accuracy. Students will work in an interdisciplinary team, collecting and labelling imaging data, developing deep-learning based algorithms and user interfaces, and characterizing and optimizing the accuracy and functionality of the software package.

| Cellular and Molecular Medicine

We have a wide scope of projects, ranging from developing novel algorithms for studying RNA processing in diseases, development, and personalized medicine to analyzing single-cell RNA-seq data.

  • ENCODE RNA binding proteins

    Last Updated:

    The Yeo lab is responsible for the identification of the RNA sequence elements bound by 250 RNA binding proteins (RBPs) as part of the newest ENCODE (https://www.genome.gov/10005107) efforts. Various computational projects that pertain to integrating RNA binding sites with functional alternative splicing, RNA stability and translational changes to generate global, genome-wide predictions of what each RBP can do are available for enterprising, hard-working graduate/undergraduate students.

  • Single-cell Analysis

    Last Updated:

    Recent studies of single cells demonstrate that the assumption that all cells of the same "type" are identical is simply inaccurate. Individual cells from the same population differ substantially, and these differences underlie phenotypic responses to environmental stimuli. The Yeo lab is using microfluidics-based platforms to study whole transcriptome differences in single cells from a variety of biological systems, ranging from embryonic stem cells to diseased motor neurons. One of our projects is to develop new bioinformatic analytics to study cellular heterogeneity under environmental influences.