On-campus Research Opportunities (Undergraduate)

This page is updated annually. Some projects may already be taken, and new projects may be available. The projects below give an indication of the types of projects available in each lab, but please browse faculty web pages and contact professors directly to discuss current opportunities.

Ferhat Ay | Pediatrics

  • ferhatay@lji.org

Steve Briggs | Biological Sciences

  • sbriggs@ucsd.edu

Pieter Dorrestein | School of Pharmacy

  • pdorrestein@ucsd.edu

Kyle Gaulton | Pediatrics

  • kgaulton@ucsd.edu

Michael Gilson | School of Pharmacy

  • mgilson@ucsd.edu

Christopher Glass | Cellular and Molecular Medicine

  • ckg@ucsd.edu

Lawrence Goldstein | Cellular and Molecular Medicine

  • lgoldstein@ucsd.edu

Melissa Gymrek | Computer Science and Engineering

  • mgymrek@ucsd.edu

Olivier Harismendy | Biomedical Informatics

  • oharismendy@ucsd.edu

Terence Hwa | Physics

  • thwa@ucsd.edu

Lilia Iakoucheva | Psychiatry

  • lilyak@ucsd.edu

Amy Kiger | Biological Sciences

  • akiger@biomail.ucsd.edu

Rob Knight | Pediatrics

  • rknight@ucsd.edu

Jejo Koola | School of Medicine

  • jkoola@ucsd.edu

Tsung-Ting Kuo | Biomedical Informatics

  • tskuo@ucsd.edu

Amit Majithia | School of Medicine

  • amajithia@ucsd.edu

Andrew McCammon | Chemistry and Biochemistry

  • jmccammon@ucsd.edu

Siavash Mirarab | Electrical and Computer Engineering

  • smirarabbaygi@ucsd.edu

Pavel Pevzner | Computer Science and Engineering

  • ppevzner@ucsd.edu

Michael Rosenfeld | School of Medicine

  • mrosenfeld@ucsd.edu

Julian Schroeder | Biological Sciences

  • jischroeder@ucsd.edu

Yingxiao (Peter) Wang | Bioengineering

  • yiw015@eng.ucsd.edu

Gene Yeo | Cellular and Molecular Medicine

  • geneyeo@ucsd.edu

Sheng Zhong | Bioengineering

  • szhong@ucsd.edu

Ferhat Ay | Pediatrics

ferhatay@lji.org

We are interested in the analysis and modeling of three-dimensional chromatin structure from high-throughput sequencing experiments. We develop methods based on statistics, machine learning, optimization and graph theory to understand how changes in the 3D genome affect cellular outcomes such as development, differentiation and gene expression. We have ongoing interests in the systems-level analysis and reconstruction of regulatory networks, inference of enhancer-promoter contacts, predictive models of gene expression, and integration of three-dimensional chromatin structure with one-dimensional epigenetic measurements in the context of cancer, malaria, asthma and several autoimmune diseases.

  • Integrative analysis of multi-cell-type gene expression and epigenomic data in the tumor immune response

    This project will focus on developing regulatory network inference methods for the joint analysis of gene expression and histone modification data from several types of tumor-infiltrating lymphocytes collected from a cohort of patients with solid tumors.

  • Predictive and comparative modeling of epigenetic gene regulation in different human immune cell types

    The goal of this project is to model the natural variation in gene expression across many immune cell types using an already established database at LJI (http://dice-database.org) and to identify cell type-specific epigenetic regulators of important immune genes.

  • Statistical methods for inferring functional DNA-DNA contacts from Hi-C and HiChIP/PLAC-seq data

    This project focuses on developing computational tools for better analysis of the wealth of data from chromosome conformation capture assays with the ultimate goal of inferring functional chromatin contacts such as those between enhancers and promoters.
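One common way to separate functional contacts from the strong distance-dependent background in Hi-C data is to estimate the expected contact frequency at each genomic distance and test each bin pair against it. The sketch below is a simplified illustration of that idea, not the lab's method: it assumes equal-sized bins, a plain per-distance average as the background (real tools typically fit a smooth decay curve and correct for biases), a binomial test, and a Bonferroni cutoff. The function names and the toy counts are hypothetical.

```python
import math
from collections import defaultdict


def binom_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def significant_contacts(counts, alpha=0.01):
    """Flag bin pairs with more reads than expected for their genomic distance.

    counts maps (i, j) bin pairs with i < j to read counts; pairs with zero
    reads should be included so that the per-distance background is unbiased.
    """
    total = sum(counts.values())
    by_distance = defaultdict(list)
    for (i, j), c in counts.items():
        by_distance[j - i].append(c)
    # Hypothetical background: mean contact probability per pair at each distance.
    expected_p = {d: sum(cs) / len(cs) / total for d, cs in by_distance.items()}
    tests = [((i, j), c, binom_sf(c, total, expected_p[j - i]))
             for (i, j), c in counts.items()]
    m = len(tests)  # Bonferroni correction over all tested pairs
    return [(pair, c, pval) for pair, c, pval in tests if pval * m <= alpha]


if __name__ == "__main__":
    # Invented toy counts: flat background at each distance plus one enriched pair.
    counts = {(i, i + 1): 20 for i in range(5)}
    counts.update({(i, i + 2): 5 for i in range(4)})
    counts[(1, 3)] = 40
    for pair, c, pval in significant_contacts(counts):
        print(pair, c, f"{pval:.2e}")
```

Because the enriched pair also contributes to its own distance background, a fuller treatment would typically refine the background after removing strong outliers.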

Steve Briggs | Biological Sciences

sbriggs@ucsd.edu

We model relationships between the proteotypes and the phenotypes of cells/organisms, with an emphasis on innate immunity in plants. Proteotypes are measured using custom methodology for high-throughput proteomics based on mass spectrometry. Students have an opportunity to integrate training in bioinformatics with chemistry and biology.

The specific state of the proteome in a given cell, tissue, or organism is known as the proteotype. The proteotype integrates constraints imposed by the genotype, the environment, and developmental history (e.g., a leaf cell has a different proteotype than a root cell with the same genotype in the same environment). The proteotype directly determines phenotype, since all molecules are made by and regulated by proteins. Thus, a complete description of the proteotype should define a phenotype at the molecular level. We are constructing an Atlas of Proteotypes that currently includes 162,777 peptides from 41,553 proteins in 65 different tissues and stages of development. In addition, we have identified and measured more than 30,000 phosphopeptides from these same samples. The 65 resultant proteotypes are revealing thousands of unanticipated regulatory relationships. The relationships between mRNA levels and protein levels are fascinating; they indicate that the protein levels of some genes are regulated by transcription, but that most protein levels are under post-transcriptional control. Inspection of our data explains tissue-specific traits such as oil accumulation in the embryo, which results from selective accumulation of proteins from common mRNAs.

  • Stoichiometry of the cell

    With the rise of quantitative proteomics, it is now possible to measure the absolute number of protein molecules in a cell. We are using multiple reaction monitoring (MRM) on a triple quadrupole mass spectrometer, with heavy and light isotope-labeled peptides, to quantify signaling and metabolic dynamics, with a focus on the post-translational modifications phosphorylation and acetylation. By placing biology on a quantitative basis, we are contributing to several fundamental advances: the ratios of different proteins to each other (i.e., the stoichiometry) are being determined; results from our lab can be combined with data from other labs because they are measured in absolute units; and the ratios of proteins to metabolites or RNAs can be ascertained. We are constructing pathway proteotypes for signaling and metabolism to identify proteins whose levels are incompatible with a simple role in the process. To contribute to this effort, we would like a student to construct an MRM database. The database will store all the heavy peptides available in the lab, along with information about the proteins from which they are derived. It will also store MS/MS information for peptides that we have observed in our proteome surveys using linear ion trap mass spectrometers, and all reaction/transition data obtained for each peptide, along with the signal strength of each reaction product. MS1 scans from the triple quadrupole mass spectrometer will be included to evaluate the purity and intensity of the heavy peptides. This database will be of great use to the many labs that are beginning to place biology on a quantitative foundation.
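The MRM database described in this project could start from a small relational schema. The sketch below uses Python's built-in sqlite3 module; every table and column name is a hypothetical choice for illustration, the m/z and intensity values are invented placeholders, and the lab's actual requirements may differ. One table per entity (protein, heavy peptide, MS/MS observation, transition) keeps each peptide linked to its parent protein and lets transitions be ranked by signal strength when choosing which reactions to monitor.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE protein (
    protein_id   INTEGER PRIMARY KEY,
    accession    TEXT NOT NULL UNIQUE,
    description  TEXT
);
CREATE TABLE heavy_peptide (
    peptide_id   INTEGER PRIMARY KEY,
    protein_id   INTEGER NOT NULL REFERENCES protein(protein_id),
    sequence     TEXT NOT NULL,
    modification TEXT,            -- e.g. 'phospho' or 'acetyl'; NULL if unmodified
    ms1_purity   REAL             -- purity estimated from triple-quad MS1 scans
);
CREATE TABLE msms_observation (
    observation_id INTEGER PRIMARY KEY,
    peptide_id     INTEGER NOT NULL REFERENCES heavy_peptide(peptide_id),
    instrument     TEXT NOT NULL, -- e.g. 'linear ion trap'
    spectrum_file  TEXT
);
CREATE TABLE transition (
    transition_id INTEGER PRIMARY KEY,
    peptide_id    INTEGER NOT NULL REFERENCES heavy_peptide(peptide_id),
    precursor_mz  REAL NOT NULL,
    product_mz    REAL NOT NULL,
    intensity     REAL NOT NULL   -- signal strength of this reaction product
);
"""


def create_db(path=":memory:"):
    """Open a connection, enforce foreign keys and create the tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def best_transitions(conn, sequence, n=3):
    """Return the n strongest (precursor_mz, product_mz, intensity) rows for a peptide."""
    return conn.execute(
        """SELECT t.precursor_mz, t.product_mz, t.intensity
           FROM transition t JOIN heavy_peptide p ON t.peptide_id = p.peptide_id
           WHERE p.sequence = ?
           ORDER BY t.intensity DESC LIMIT ?""",
        (sequence, n),
    ).fetchall()


if __name__ == "__main__":
    conn = create_db()
    conn.execute("INSERT INTO protein (accession, description) VALUES ('PROT_A', 'placeholder')")
    conn.execute("INSERT INTO heavy_peptide (protein_id, sequence, ms1_purity) VALUES (1, 'PEPTIDEK', 0.97)")
    conn.executemany(
        "INSERT INTO transition (peptide_id, precursor_mz, product_mz, intensity) VALUES (1, ?, ?, ?)",
        [(500.0, 600.0, 1.0e5), (500.0, 700.0, 3.0e5), (500.0, 400.0, 2.0e5)],
    )
    print(best_transitions(conn, "PEPTIDEK", n=2))
```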

Pieter Dorrestein | School of Pharmacy

pdorrestein@ucsd.edu

Our work aims to develop new mass spectrometry-based methods to understand the chemistry of microbes, our microbiome and their ecological niches. In short, we develop tools that translate the chemical language between cells. This research requires an understanding of (microbial) genomics, proteomics, imaging mass spectrometry, genome mining, enzymology, bioactivity screening, antibiotic resistance and small molecule structure elucidation methods. The Collaborative Mass Spectrometry Innovation Center that Dr. Dorrestein directs is well equipped, with twelve mass spectrometers that are used to capture cellular chatter (e.g., metabolic exchange), to study metabolomics and metabolism, and to develop methods to characterize natural products. These tools are used to define the spatial distribution of natural products in 2D, 3D and, in some cases, in real time. Recent research directions include capturing mass spectrometry knowledge to understand the microbiome; non-invasive drug metabolism monitoring; informatics of metabolomics; microbe-microbe, microbe-immune cell, microbe-host and stem cell-cancer cell interactions; diseased vs. non-diseased model organisms; and the development of strategies for mass spectrometry-based genome mining. Another direction is detecting and structurally characterizing metabolites through crowdsourced annotation of molecular information on the Global Natural Products Social Molecular Networking site, through the NIH-supported Center for Computational Mass Spectrometry, which is co-developed with Nuno Bandeira. A more detailed biography can be found in this Nature article.

  • Post-Translational Modifications Projects

    The Dorrestein laboratory is interested in the functional aspects and biosynthesis of post-translational modifications (PTMs). Of particular interest are orphan genes (genes currently assigned no known function) that are responsible for the generation of bioactive natural products (e.g., antibiotics and anti-cancer agents) or PTMs. This research aims to understand the functions of such genes using high-resolution mass spectrometry. To achieve this goal, the lab will have the most advanced mass spectrometer on the UCSD campus. Please see the Research page on the lab website for descriptions of current research projects.

Kyle Gaulton | Pediatrics

kgaulton@ucsd.edu

The Gaulton lab studies the effects of human genetic variation on gene regulation and diabetes risk. We use computational and statistical methods to integrate genome sequence information with epigenomic annotation and molecular QTL data.

  • Genetic and epigenomic fine-mapping of diabetes risk loci

    This project involves dense genetic fine-mapping of diabetes risk loci, integrating fine-mapping data with large-scale genomic and epigenomic maps using published and novel models to identify causal variants, cell types and networks, and applying these predictive models to identify additional diabetes risk loci.

  • Predicting causal genes at diabetes risk loci

    This project involves the development of novel methods for integrating genetic association data with epigenomic annotation, expression QTLs and chromatin QTLs to predict the causal genes underlying diabetes risk variants.

  • Predicting genome-wide pleiotropic effects of diabetes risk variants

    This project involves development of novel mixture model approaches to predicting and quantifying the extent of pleiotropy among diabetes risk variants genome-wide.
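As a rough illustration of the fine-mapping step in the projects above, the sketch below computes posterior inclusion probabilities (PIPs) and a 95% credible set from GWAS summary statistics using Wakefield's approximate Bayes factors. It assumes a single causal variant per locus and a prior effect-size variance of 0.04 on the log-odds scale (a common default, not a value taken from the lab). The optional per-variant weights are a hypothetical stand-in for priors derived from epigenomic annotation, such as higher weights for variants in cell-type-specific enhancers. Variant names and effect sizes are invented.

```python
import math


def log_abf(beta, se, prior_var=0.04):
    """Log of Wakefield's approximate Bayes factor (association vs. null)."""
    v = se * se
    r = prior_var / (v + prior_var)
    z = beta / se
    return 0.5 * math.log(1.0 - r) + 0.5 * z * z * r


def fine_map(variants, weights=None, prior_var=0.04, coverage=0.95):
    """variants: list of (name, beta, se); weights: optional prior weights.

    Assumes exactly one causal variant in the locus, so PIPs sum to 1.
    """
    if weights is None:
        weights = [1.0] * len(variants)
    scores = [math.log(w) + log_abf(beta, se, prior_var)
              for (_, beta, se), w in zip(variants, weights)]
    top = max(scores)  # log-sum-exp trick avoids overflow for large z-scores
    norm = sum(math.exp(s - top) for s in scores)
    pips = {name: math.exp(s - top) / norm
            for (name, _, _), s in zip(variants, scores)}
    # Credible set: add variants in decreasing PIP order until coverage is reached.
    credible, cumulative = [], 0.0
    for name, pip in sorted(pips.items(), key=lambda kv: kv[1], reverse=True):
        credible.append(name)
        cumulative += pip
        if cumulative >= coverage:
            break
    return pips, credible


if __name__ == "__main__":
    locus = [("var_a", 0.30, 0.05), ("var_b", 0.28, 0.05), ("var_c", 0.05, 0.05)]
    print(fine_map(locus))
    # Upweighting var_b (e.g. because it overlaps an annotated enhancer).
    print(fine_map(locus, weights=[1.0, 10.0, 1.0]))
```

With equal weights the strongest association dominates, while a tenfold annotation-based prior on the second variant is enough to reorder the posterior here; this is the basic intuition behind combining fine-mapping with epigenomic maps.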

Michael Gilson | School of Pharmacy

mgilson@ucsd.edu

We work on many aspects of molecular mechanism, modeling and design. Our core interests are in the physical chemistry and algorithms underpinning computer-aided drug design methods, but we also have much wider interests, such as simple model receptors for studying molecular recognition, how molecular motors work, chemical informatics and its interface with bioinformatics, how membranes work, and synthetic catalysts and nanoparticles.

  • Information theory studies of protein sequence and structure

    While working on methods of computing entropy from molecular simulation data, we stumbled on an interesting mathematical angle for studying correlations in high-dimensional spaces. The math can be applied in various realms, and we got some interesting results applying it to residue-residue (and higher-order) correlations within protein sequences.

    Thus, this would be an exploratory project to see if we can learn new things about sequence-function relationships in proteins, and maybe about how to design new proteins, by studying residue-residue correlations, particularly in the context of 3D protein structures.

Christopher Glass | Cellular and Molecular Medicine

ckg@ucsd.edu

Dr. Glass’s primary interest is understanding the transcriptional mechanisms that regulate the development and function of macrophages. Macrophages play key roles in immunity, wound repair, development and tissue homeostasis. Dysregulation of macrophage functions contributes to a broad spectrum of human diseases, including atherosclerosis, diabetes, neurodegenerative diseases, and cancer. A major effort of the Glass laboratory is to use genomic assays and associated bioinformatics approaches to understand how macrophage gene expression programs are established and how they are influenced by different tissue environments and disease. An important concept to emerge from these studies is that enhancers can be exploited to deduce the transcription factors and upstream signaling pathways that drive context-specific transcriptional outputs. Students are welcome to select projects from current areas of active investigation.

  • Natural genetic variation and macrophage gene expression

    Many lines of evidence, including genome-wide association studies, indicate that non-coding genetic variation plays a major role in determining phenotypic diversity. We were among the first laboratories to define the impact of natural genetic variation on enhancer selection and function (Heinz et al., Nature 2013 PMID 24121437), but at present it remains difficult to predict the impact of non-coding variation on gene expression. In a novel and ambitious effort, we systematically characterized the genome-wide patterns of mature RNA (RNA-seq), nascent RNA (GRO-Seq), transcriptional initiation (5’GRO-seq), histone modifications and binding profiles of lineage-determining and signal-dependent transcription factors (ChIP-seq), DNA methylation (bisulfite sequencing), and chromatin conformation (HiC, capture HiC and PLAC-seq) in resting and activated macrophages derived from 5 different inbred strains of mice, which together provide ~60 million single nucleotide variants, ~6 million InDels and several hundred thousand structural variants. This data set provides a unique resource for investigating the impact of non-coding variants on transcription factor binding, enhancer activation and target gene expression. We are currently developing new computational methods for analyzing these data, with the goal of explaining the effects of non-coding mutations and predicting patterns of gene expression in new mouse strains. Related projects are investigating the relationships between genetic variation among selected mouse strains and their different susceptibilities to metabolic and cardiovascular disease. This general project area is both challenging and open-ended, and rotation projects could take a wide range of directions. For example, recent rotation students have implemented machine learning approaches to investigate how sequence variants affect collaborative binding between lineage-determining transcription factors.

  • Nature and nurture of microglia

    Each population of tissue-resident macrophages exhibits a distinct pattern of gene expression that is tuned to the developmental and homeostatic needs of that tissue. For example, brain macrophages called microglia produce factors that are trophic for neurons and monitor synapses, functions that require a brain-specific program of gene expression. A key question is how this tissue-specific program of gene expression is achieved. Through analysis of gene expression and enhancer landscapes, we obtained evidence that the microglia-specific molecular phenotype results from instructive signals in the brain that direct the activation of microglia-specific enhancers (Gosselin et al., Cell 2014 PMID 25480297). Of particular interest, delineation of the gene expression patterns and enhancer landscapes of human microglia revealed that a substantial fraction of the genes associated with non-coding GWAS risk alleles are preferentially or exclusively expressed in microglia, and many are brain environment-dependent (Gosselin et al., Science 2017 PMID 28546318). These findings raise several important questions that are under active investigation, including which environmental factors dictate the brain-specific program of gene expression and how human genetic variants affect the regulation of genes that are linked to neurodegenerative disease. We are taking a multidisciplinary approach that includes studies of in vivo mouse models, in vitro human iPSC-derived microglia, genomic assays of microglia nuclei derived from control and Alzheimer’s disease brains, and direct analyses of the relation of genotype to gene expression in a growing RNA-seq database derived from purified human microglia. As an example, a recent rotation project investigated whether there is any relationship between the gene expression patterns of circulating monocytes (a white blood cell that can differentiate into macrophages in tissues) and those of microglia from the same individual.

Lawrence Goldstein | Cellular and Molecular Medicine

lgoldstein@ucsd.edu

  • Stem Cell GWAS in Alzheimer's

    Genome-wide association studies (GWAS) have identified polymorphic variants in many new genes linked to late-onset sporadic Alzheimer's disease (SAD). Human stem cells from patients with disease give us the opportunity to examine the connection between these variants and phenotype at the individual and cellular level. Currently we are investigating the role of the sortilin-related receptor 1 (SORL1) gene in the pathogenesis of SAD. Variants in the 5' and 3' regions of SORL1 have been linked to SAD and may lead to defective gene expression, which contributes to amyloid beta production and neuronal death in the disease. However, the variants identified by GWAS are common and not likely to be causative mutations. One important project is to use bioinformatic tools to probe the genomic region around the GWAS-associated risk variants to identify candidate rare polymorphisms, which may have an effect on phenotype. Once these variants are identified, we can genotype stem cell lines for these polymorphisms and design experiments to determine the contribution of these variants to Alzheimer's disease.

Melissa Gymrek | Computer Science and Engineering

mgymrek@ucsd.edu

Our overall goal is to understand complex genetic variants that underlie human disease. We are particularly interested in repetitive DNA variants known as short tandem repeats (STRs) as a model for complex variation. Our work focuses on developing computational tools for analyzing and visualizing complex variation from large-scale sequencing data and applying these tools to learn about the contribution of repetitive variation to human disease.

  • Measuring genetic constraint in the non-coding genome

    A major result from recent genome-wide association studies (GWAS) is that the majority of genetic variants driving common human disease lie in regulatory, rather than protein-coding, regions. While it is relatively straightforward to predict the consequences of mutations in coding regions, we are far from being able to interpret and sift through the large number of non-coding variants arising from whole genome studies. Recent studies have leveraged population-wide genetics datasets to determine genes that are depleted of variation, or "constrained", and thus presumably important for human health. In this project we will develop statistical tests using large panels of human genetic variation to systematically measure constraint for a variety of regulatory annotations and evaluate the utility of these annotations for prioritizing variants from medical genetics studies.
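A minimal version of such a constraint test compares the number of variants observed in an annotation with the number expected under neutrality and asks how surprising any shortfall is. The sketch below assumes the expected counts are already available (in practice they would come from a mutation-rate model that accounts for sequence context and coverage, which is not shown) and uses a one-sided Poisson test for depletion. The annotation names and counts in the example are invented.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed in log space for stability."""
    if lam <= 0:
        raise ValueError("expected count must be positive")
    if k < 0:
        return 0.0
    log_terms = [-lam + i * math.log(lam) - math.lgamma(i + 1) for i in range(k + 1)]
    top = max(log_terms)
    return min(1.0, math.exp(top) * sum(math.exp(t - top) for t in log_terms))


def constraint_test(observed, expected):
    """Return (observed/expected ratio, one-sided p-value for depletion)."""
    return observed / expected, poisson_cdf(observed, expected)


if __name__ == "__main__":
    # Invented counts: (observed variant sites, expected under neutrality).
    annotations = {
        "hypothetical_enhancer_set": (820, 1000.0),
        "hypothetical_background": (1010, 1000.0),
    }
    for name, (obs, exp) in annotations.items():
        ratio, p = constraint_test(obs, exp)
        print(f"{name}: obs/exp = {ratio:.2f}, p(depletion) = {p:.2e}")
```

An observed/expected ratio well below 1 with a small p-value marks an annotation as constrained; annotations could then be ranked this way before testing how well they prioritize variants from medical genetics studies.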

Olivier Harismendy | Biomedical Informatics

oharismendy@ucsd.edu

The Oncogenomics laboratory is located in the Moores Cancer Center. Its research program is focused on the identification of genetic and epigenetic markers for cancer prevention and progression, as well as drug response. The laboratory is a "humid" laboratory, combining both wet-lab techniques and bioinformatics analysis to study cancer samples from patients and animal models of cancer. The laboratory is also an important partner for multiple principal investigators at the Moores Cancer Center, collaborating on the design, analysis and interpretation of their genomic experiments.

  • Development of Genomics Virtual Machines in HIPAA compliant cloud

    Genetic information is considered protected health information (PHI); as a consequence, the highest security standards must be applied to its storage, analysis and sharing. The oncogenomics laboratory uses the state-of-the-art iDASH compute cloud for its main computation, and we therefore participate in the development of optimal workflows and virtual machines for the analysis of patient-derived genomic datasets such as whole exomes, whole genomes, RNA-seq or genotyping arrays.

    In this project we will develop robust provisioning methods to establish virtual machines capable of running popular human genomic analysis workflows. We will benchmark these machines and workflows and convert some of them into standard recipes for production-grade, reproducible genomic analysis.

  • Genetics and epigenetics of cisplatin resistance

    Cisplatin (cDDP) is the most commonly used chemotherapeutic drug, but most cancers eventually become resistant, leading to tumor recurrence. Several biological processes may modulate cDDP sensitivity: drug import, drug export, detoxification, DNA repair and apoptosis. Drug resistance is transmitted to daughter cells, and resistant cell lines can be derived in vitro through sequential treatments. We are interested in identifying the genetic mutations that mediate this resistance. To this end, we have derived resistant cell lines from single clones of a cDDP-sensitive ovarian cancer cell line. Using exome sequencing as well as targeted sequencing, we propose to determine the mutations in genes and pathways that drive drug resistance. We will then extend the findings to the TCGA samples, using time to recurrence as an indicator of drug sensitivity.

  • The role of inherited variation in cancer somatic landscape

    The role of germline (inherited) variation in cancer has been studied in selected families and has led to the identification of genetic variants that are dominant and responsible for cancer syndromes. Similarly, rare recessive variants with lower penetrance are responsible for the increased risk of breast and ovarian cancer (BRCA1/2). More common variants in the population have also been identified through GWAS, revealing multiple SNPs associated with a modest increase in cancer risk. Despite these advances, many variants of intermediate allele frequency in the population, or carried by patients with an undocumented family history, remain variants of unknown significance (VUS) that may still play a role in tumor development. In addition, the contribution of variants located outside of coding regions has been underexplored and can now be reexamined in light of recent maps of the regulatory landscape. The long-term goal of this research is to use germline genetic variation in cancer prevention and care, to better stage patients or predict their response to treatment.

    We propose to identify germline variants in UCSD Cancer Center patients (targeted gene panel) as well as in the public TCGA/ICGC datasets (whole genomes). We will then test these variants, alone or in combination, to identify those that affect cancer onset, the tumor somatic landscape or tissue-specific regulatory networks. The project will involve the processing of high-throughput sequencing data, population genetics and statistical analysis in a HIPAA-compliant cloud-computing environment.

Terence Hwa | Physics

thwa@ucsd.edu

Terence Hwa, Departments of Physics and Molecular Biology

The Hwa lab (a.k.a. the Quantitative Microbiology Lab) uses a combination of experimental and theoretical approaches to elucidate the organizational principles of living systems. The goal is to quantitatively characterize physiological behaviors and to understand how they arise from the underlying molecular interactions. Our lab focuses on the bacterium E. coli because it is perhaps the best-characterized organism in terms of molecular components and interactions, but we also study higher organisms together with collaborating labs. Please visit our lab webpage (http://matisse.ucsd.edu) for further information.

  • Quantitative studies of bacterial physiology

    An outstanding challenge in making biology quantitative and predictive is how to deal with the millions or even billions of missing parameters that describe the underlying molecular interactions. In recent years, our lab pioneered a top-down approach which exploited a number of phenomenological laws to accurately predict the physiological responses of bacteria to environmental and genetic changes (e.g., nutrients, antibiotics, heterologous protein expression) [DOI: 10.1126/science.1192588]. Furthermore, insight from this quantitative physiological approach is able to pinpoint key missing molecular interactions in long-studied biological processes [DOI: 10.1038/nature12446]. The lab has a number of projects further extending this basic approach to a variety of problems in microbiology, including growth transitions, stress response, antibiotic resistance, and biofilm formation.

Lilia Iakoucheva | Psychiatry

lilyak@ucsd.edu

The lab has a variety of bioinformatics projects aimed at improving understanding of the functional impact of autism mutations identified by exome and genome sequencing of patients. We build spatio-temporal gene co-expression and protein interaction networks for psychiatric diseases and use these networks to generate testable hypotheses about the mechanisms of disease. We also test these hypotheses experimentally in the lab, thereby adding a translational aspect to our work.

  • Evaluating the effect of splicing mutations on isoform networks in autism

    The project involves constructing isoform-level co-expression and protein interaction networks to predict the functional impact of de novo splice site mutations from patients with autism spectrum disorder (ASD). Hundreds of de novo splice site mutations have been identified in ASD patients, but no disease mechanism has yet been established for any of them. We will build and analyze isoform-level networks of brain co-expressed and physically interacting proteins; map de novo ASD mutations onto these networks to predict their functional impact; and validate the disrupted networks and pathways using CRISPR/Cas technology in neuronal and animal models. This project will discover and characterize cellular and molecular processes that are disrupted by de novo splice site ASD mutations.

  • Integrative functional genomic study of pathways impacted by recurrent autism CNV

    Copy number variants (CNVs) represent significant risk factors for Autism Spectrum Disorders (ASD). One of the most frequent CNVs involved in ASD is a deletion or duplication of the 16p11.2 locus, which spans 29 protein-coding genes. Despite progress in linking 16p11.2 genetic changes with phenotypic abnormalities (macrocephaly and microcephaly) in patients and model organisms, the specific molecular pathways impacted by this CNV remain unknown. To test the hypothesis that RhoA signaling is disrupted by this CNV, we will generate KCTD13 and CUL3 mouse models using the CRISPR/Cas9 system and investigate dysregulated molecular pathways using RNA-seq at various stages of the developing mouse fetal brain.

Amy Kiger | Biological Sciences

akiger@biomail.ucsd.edu

Cells must continuously maintain integrity and compartmentalization with demands for cellular remodeling throughout development, immunity, aging and disease. Using functional genomics, genetics and cell biological approaches in the fruit fly, Drosophila, we are studying the central roles for membrane regulation of dynamic cell structure. We have identified novel endocytosis and autophagy membrane trafficking pathways that control macrophage and muscle remodeling, with relevance to human disease. Current projects in the lab aim to discover new mechanisms of cellular remodeling through functional genomic and proteomic approaches, and to better understand the pathway networks and dynamics during cellular remodeling.

  • Autophagy networks

    • Expand on our ongoing co-immunoprecipitation and mass spectrometry datasets to identify protein-protein interactions involved in autophagy.
    • In collaboration with the Ideker lab and the SDCSB Network Assembly Core, analyze coIP results and incorporate functional data into an ‘autophagy network’.
    • Test new insights predicted from network by in vivo autophagy assays.
  • Computational analysis of lipid regulators and effectors in Drosophila development

    • Use databases and bioinformatics to identify all predicted phosphoinositide lipid regulators and effectors (binding proteins) in Drosophila.
    • Mine databases of Drosophila tissue and stage-specific gene expression, function and protein-protein interaction information for each candidate gene (above).
    • Identify potential relationships between regulators and effectors computationally and experimentally.
  • High-throughput image analysis of cell morphology

    • In collaboration with the Tsimring lab at the BioCircuits Institute, optimize newly developed machine learning image analysis algorithms to quantify cell shape and cell shape changes.
    • Conduct new RNAi screens to test optimized image analysis algorithms, and employ established methodology to screen for new and modifying (enhancer/suppressor) gene functions in cellular remodeling.
    • Perform network analysis of large-scale RNAi screen image data.

Rob Knight | Pediatrics

rknight@ucsd.edu

The Knight lab has broad interests in the human microbiome, the collection of trillions of microbes that inhabits our bodies, and especially in developing techniques to read out these complex microbial communities and using the resulting data to understand human health and the links between humans and the environment, and to prevent and cure disease. We offer a fast-paced environment with many collaborative opportunities on different projects.

  • Machine Learning for the Microbiome

    We have amassed a database of microbial DNA sequences from hundreds of thousands of biological specimens. Understanding how changes in these microbial communities relate to disease requires a range of machine learning and multivariate statistical approaches. There are many opportunities, ranging from entry-level (benchmarking classifier performance on specific sample sets) to extremely challenging (using deep learning to infer the structure of global sample set relationships).

  • Multi-omics integration

    An increasing need is to integrate data from different "omics" levels, e.g., genomes, metagenomes, metatranscriptomes, metaproteomes, metabolomes and immunological profiling, into a single coherent picture separating healthy and disease states. Improved methods for performing this task, either directly or via intermediate representations such as mapping to metabolic and regulatory pathways, are essential for improving understanding. Projects in this category range from simple (testing whether existing techniques like correlation networks or Procrustes analysis connect two specific data layers) to challenging (using transfer learning to integrate heterogeneous data layers and improve the underlying network annotation). An especially exciting emerging research direction here is XAI (explainable artificial intelligence), which can provide better justification for a specific classification or suggestion in clinical applications.

  • Optimizing microbiome algorithms

    Many algorithms used in microbiome studies, especially in metagenomic assembly, are extremely computationally expensive. Opportunities exist either for exploiting new hardware architectures to accelerate existing algorithms or for developing new approximate algorithms, to tackle problems in the workflow including inferring taxonomy and function from DNA sequence data, genome and metagenome assembly and annotation, computing community distance metrics from sparse compositional data, and high-level analyses of hundreds of thousands of microbiomes. Again, these projects range from entry-level (comparing the results of two multiple sequence alignment techniques for subsequent community analysis) to advanced (using non-von Neumann architectures to perform real-time pattern classification at the whole-community level for disease detection).
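Computing community distances on sparse data can be sketched with Bray-Curtis dissimilarity, a standard beta-diversity metric: BC = 1 - 2 * sum(min(a_i, b_i)) / (sum(a_i) + sum(b_i)). The helper below is hypothetical (not a specific package's API). It stores each sample as a {taxon: count} dictionary so only observed taxa are visited, which is why sparse representations pay off when most taxa are absent from most samples.

```python
def bray_curtis(sample_a, sample_b):
    """
    Bray-Curtis dissimilarity between two samples stored sparsely as
    {taxon_id: count} dicts; taxa missing from a dict are treated as zero counts.
    Returns 0.0 for identical samples and 1.0 for samples sharing no taxa.
    """
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        return 0.0
    # Only taxa present in both samples contribute to the shared-abundance term,
    # so iterate over the smaller dict and look up in the larger one.
    smaller, larger = sorted((sample_a, sample_b), key=len)
    shared = sum(
        min(count, larger[taxon]) for taxon, count in smaller.items() if taxon in larger
    )
    return 1.0 - 2.0 * shared / total
```

For example, `{"x": 1, "y": 3}` and `{"y": 1, "z": 2}` share a minimum abundance of 1 out of a combined total of 7, giving 1 - 2/7 = 5/7.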

Jejo Koola | School of Medicine

jkoola@ucsd.edu

Dr. Koola is a physician scientist specializing in biomedical informatics and hospital medicine. His work centers on big-data machine learning for predictive analytics. He is particularly interested in using electronic health records to improve care delivery, especially for patients with advanced liver disease. Using risk prediction models in a healthcare context requires an understanding of (i) the healthcare system of intended use, (ii) risk model building, (iii) risk model assessment, and (iv) risk model re-calibration. Dr. Koola is also interested in visual analytics, data modeling, and health services research.
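Re-calibration, item (iv), can be illustrated with its simplest form, calibration-in-the-large: keep the original model's risk ranking but shift every prediction by the same amount on the logit scale so that the mean predicted risk matches the event rate observed in the new cohort. This is a minimal sketch with hypothetical function names, not Dr. Koola's method; fuller approaches also re-estimate the calibration slope. It assumes predicted risks strictly between 0 and 1 and a cohort that contains both events and non-events.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def recalibrate_intercept(predicted, outcomes, iterations=50, tol=1e-10):
    """
    Calibration-in-the-large via Newton's method on a single logit shift delta.
    predicted: original model risks in (0, 1); outcomes: observed 0/1 events.
    At the solution, the recalibrated risks sum to the observed number of events.
    """
    offsets = [logit(p) for p in predicted]
    delta = 0.0
    for _ in range(iterations):
        probs = [sigmoid(o + delta) for o in offsets]
        gradient = sum(y - p for y, p in zip(outcomes, probs))  # score for delta
        curvature = sum(p * (1.0 - p) for p in probs)           # observed information
        step = gradient / curvature
        delta += step
        if abs(step) < tol:
            break
    return delta, [sigmoid(o + delta) for o in offsets]
```

Because every patient receives the same shift, discrimination (the ordering of risks) is unchanged; only the overall level of predicted risk moves.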

  • Designing the "Green Button" informatics consult service using big data analytics for personalized medicine

    In 2012 the Institute of Medicine set out desiderata for a learning healthcare system, in which evidence informs practice and practice informs evidence. Although the randomized clinical trial (RCT) is the gold standard for informing clinical decisions, RCTs have known limitations, including difficulty with recruitment, overly stringent inclusion/exclusion criteria, and a lack of patient-centered decision making. Observational cohort studies have become an important complement to RCTs, enabling comparative effectiveness research and patient-centered trials. The surge in electronic health record (EHR) adoption, and the resulting zettabytes of data [5], allow us to realize this vision for the first time. Despite the growth of observational cohort studies, challenges remain in bringing knowledge from the bench to the bedside; moreover, model performance degrades when a model is used in a cohort other than the one in which it was developed.

    To ameliorate these difficulties, we propose to launch and study a novel “informatics consult” service. When no clear evidence-based guidelines exist for a care decision, the service would allow clinicians to query the UCSD clinical data warehouse for patients similar to the index case. Such a system, first proposed in the seminal “Green Button” paper by Longhurst et al., would strengthen our ability to deliver truly personalized, patient-centered care. Small-scale efforts have been put into practice to answer questions about the treatment of melanoma [8] and complications of systemic lupus erythematosus. We note, however, the opportunity for a much larger service with broad impact, starting with insights born of UCSD data and potentially extending to mining the entire statewide UC Health data warehouse.

    We note several novel challenges for this proposed system: (i) performing semi-automated phenotyping so that we can identify clinical outcomes of interest [10]; (ii) identifying patients who are similar to the index patient (often called clustering); (iii) incorporating automated, computable searches for guideline-recommended care; (iv) performing visual analytics to understand the similarity of cohorts; and (v) communicating probability and statistical information to healthcare professionals so they can manage uncertainty effectively.

    Student responsibilities:

    1. Participate in project meetings
    2. Help design one of several possible algorithms/interfaces:
      a. patient clustering algorithm using unsupervised learning
      b. visual analytic interface for describing similar cohort of patients
      c. visual analytic interface to help communicate statistical risk
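Challenge (ii) and task (a) can be sketched as a nearest-neighbour search: standardize each feature, then return the patients closest to the index case. Strictly, this is similarity search rather than clustering. The feature names (age, MELD score, sodium), the records, and the choice of Euclidean distance are hypothetical assumptions for illustration only.

```python
import math
from statistics import mean, pstdev


def feature_scalers(patients, features):
    """Per-feature mean and population SD; an SD of 0 is replaced by 1 to avoid dividing by zero."""
    scalers = {}
    for f in features:
        values = [p[f] for p in patients]
        scalers[f] = (mean(values), pstdev(values) or 1.0)
    return scalers


def to_z(patient, features, scalers):
    """Z-score a patient's features so no single unit (years, mmol/L, ...) dominates the distance."""
    return [(patient[f] - scalers[f][0]) / scalers[f][1] for f in features]


def most_similar(index_case, patients, features, k=3):
    """Return the k patients closest to the index case in standardized feature space."""
    scalers = feature_scalers(patients, features)
    target = to_z(index_case, features, scalers)
    ranked = sorted(
        patients, key=lambda p: math.dist(to_z(p, features, scalers), target)
    )
    return ranked[:k]
```

The returned cohort is what a clinician would then summarize (outcomes, treatments received), which is where the visual analytic interfaces in tasks (b) and (c) come in.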
  • Integrating patient-reported outcomes into the electronic health record to improve cardiovascular care

    Unhealthy dietary choices (a lack of nutritious foods and an excess of unhealthy ones) were identified as the major contributor to the 400,000 U.S. deaths from cardiovascular disease (CVD) in 2015. Eating more nuts, vegetables, and whole grains, and less salt and trans fat, could save tens of thousands of lives in the U.S. each year. Obesity, one critical consequence of poor diet, also contributes to heightened CVD risk. Thousands of smartphone apps are available for weight loss, but they focus primarily on caloric intake rather than on the overall quality of diet and lifestyle that is critical for CVD prevention.

    Mobile health (mHealth) applications have also not been systematically tested for effectiveness and are criticized for lacking an evidence-based foundation. In this study, we adapt the design of mHeart to communicate automatically with the UCSD electronic health record, giving healthcare providers access to psychosocial aspects of a patient's care outside the direct hospital system. In particular, the provider will be able to view logs of the patient's activity, dietary choices, and other lifestyle choices, and to send feedback to the patient to encourage behavior change.

    Student opportunities:

    1. Help modify the smartphone app to use health-data frameworks such as Apple HealthKit and Google Fit
    2. Understand interfaces that communicate with electronic health records (such as HL7 FHIR)
    3. Help design a point-to-point interface between the smartphone app and the electronic health record, with the resulting data presented to the provider
    4. Participate in meetings to design a pilot study testing app performance
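For orientation on item 2: in FHIR R4, a single measurement such as a body-weight reading is typically carried by an Observation resource. The sketch below only builds the JSON payload. The patient ID and timestamp are hypothetical, and sending it to the UCSD EHR would also require a FHIR server endpoint and an authorization flow (for example, SMART on FHIR), none of which is shown or described in the project text.

```python
import json


def body_weight_observation(patient_id, weight_kg, measured_at):
    """Build a FHIR R4 Observation for one body-weight reading logged in the app."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "category": [{
            "coding": [{
                "system": "http://terminology.hl7.org/CodeSystem/observation-category",
                "code": "vital-signs",
            }]
        }],
        # LOINC 29463-7 is the standard code for body weight.
        "code": {
            "coding": [{"system": "http://loinc.org", "code": "29463-7", "display": "Body weight"}]
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "effectiveDateTime": measured_at,
        # Units are expressed in UCUM, the unit system FHIR uses for quantities.
        "valueQuantity": {
            "value": weight_kg,
            "unit": "kg",
            "system": "http://unitsofmeasure.org",
            "code": "kg",
        },
    }


if __name__ == "__main__":
    print(json.dumps(body_weight_observation("example", 72.5, "2024-01-15T08:30:00Z"), indent=2))
```

Activity and dietary logs would need their own codes and possibly other resource types; choosing those is part of the interface design in item 3.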
  • Systematic review and meta-analysis of hospital readmission for patients with cirrhosis

    Patients with cirrhosis, a late stage of chronic liver disease, are at increased risk of hospitalization and hospital readmission. Although several studies have examined models for predicting readmission in patients with cirrhosis, they are limited by small sample sizes, few candidate predictor variables, and limited evaluation of discrimination and calibration. A systematic review and meta-analysis of the available evidence can shed new light on the problem and help identify modifiable risk factors.

    Student responsibilities:

    1. Understand the basics of a systematic review
    2. Perform literature review
    3. Abstract necessary information in case report forms and help perform meta-analysis
    4. Help write manuscript
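Pooling study results is the quantitative core of step 3. A common choice is the DerSimonian-Laird random-effects model applied to log odds ratios (or another additive effect size), sketched below with the standard library only. The function name is hypothetical, and the project may well use different models or dedicated meta-analysis software.

```python
import math


def random_effects_pool(estimates, standard_errors):
    """
    DerSimonian-Laird random-effects meta-analysis.
    estimates: per-study effects on an additive scale (e.g. log odds ratios).
    Returns (pooled estimate, its standard error, tau^2, Cochran's Q).
    """
    weights = [1.0 / se ** 2 for se in standard_errors]  # inverse-variance weights
    fixed = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))  # heterogeneity
    df = len(estimates) - 1
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    re_weights = [1.0 / (se ** 2 + tau2) for se in standard_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    pooled_se = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled_se, tau2, q
```

On the log-odds scale, `math.exp(pooled)` gives the pooled odds ratio and `math.exp(pooled ± 1.96 * pooled_se)` an approximate 95% confidence interval.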

Tsung-Ting Kuo | Biomedical Informatics

tskuo@ucsd.edu
  • Developing privacy-preserving predictive modeling algorithms on blockchain networks

    Predictive modeling can advance research, support quality-improvement initiatives, and substantiate research results, especially when data from multiple healthcare systems can be included. However, current state-of-the-art privacy-preserving predictive modeling frameworks are still centralized: the models from distributed sites are integrated on a central server to build a global model. This centralization carries several risks, such as a single point of failure at the central server. To improve the security and robustness of predictive modeling frameworks, we will develop and implement novel algorithms on decentralized blockchain networks (a distributed ledger/database technology adopted by the Bitcoin cryptocurrency) to build better models. The outcome will be algorithms that improve the predictive power of data from multiple healthcare systems through a distributed system.
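To make the contrast with a central server concrete, the sketch below shows one hypothetical design. Each site appends its locally trained model coefficients to a hash-linked ledger, and any participant can verify the chain and compute a sample-size-weighted global model without a coordinating server. This is an illustration only, not the lab's algorithm. It omits the peer-to-peer networking and consensus that real blockchain platforms provide. Simple coefficient averaging also only approximates a model trained on pooled data.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block):
    """SHA-256 of a canonical JSON encoding of the block, excluding its own hash field."""
    payload = {key: value for key, value in block.items() if key != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def append_update(chain, site, coefficients, n_samples):
    """Append one site's locally trained coefficients, linked to the previous block."""
    block = {
        "index": len(chain),
        "site": site,
        "coefficients": coefficients,
        "n_samples": n_samples,
        "prev_hash": chain[-1]["hash"] if chain else GENESIS,
    }
    block["hash"] = block_hash(block)
    chain.append(block)


def chain_is_valid(chain):
    """Each block's hash must match its contents and link to its predecessor."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i else GENESIS
        if block["hash"] != block_hash(block) or block["prev_hash"] != expected_prev:
            return False
    return True


def global_model(chain):
    """Sample-size-weighted average of all coefficient vectors recorded on the ledger."""
    total = sum(block["n_samples"] for block in chain)
    dims = len(chain[0]["coefficients"])
    return [
        sum(block["coefficients"][j] * block["n_samples"] for block in chain) / total
        for j in range(dims)
    ]
```

Only model coefficients and sample counts are written to the ledger, never patient-level records, and tampering with any recorded update breaks the hash links.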

Amit Majithia | School of Medicine

amajithia@ucsd.edu

Our goal is to identify genes causing insulin resistance in humans in order to find new therapeutic targets for diabetes and cardiometabolic diseases. Our approach to discovery is grounded in human genetics, clarified through systematic, high-throughput experimentation in human cells, and calibrated by its relevance to clinical disease. We use massively parallel genome engineering to re-create mutations identified in patients and develop high-throughput assays to interrogate their function in human cell models. We apply bioinformatics and statistics to make sense of these data, integrating (1) human mutations, (2) cellular function, and (3) the metabolic/glycemic phenotypes of the individuals who harbor them. Using this approach, we have discovered novel missense mutations that greatly increase the risk of type 2 diabetes. As a complementary aim toward precision medicine, we develop tools for clinical genome interpretation powered by high-throughput experimental data.

  • Integrative genomics pipeline to identify therapeutic targets for insulin resistance

    Insulin resistance causes diabetes, heart disease, and many cancers. Only one major class of drugs, thiazolidinediones (TZDs), treats insulin resistance, and it causes serious side effects. RNA sequencing has been performed on patient adipose and muscle tissue before and after TZD treatment, as well as in multiple other clinical situations in which insulin resistance is altered (e.g., weight loss, gastric bypass). The goal of this project is to develop a framework for jointly analyzing these datasets to identify a core of common gene expression changes. This common core should be enriched for causal mediators of changes in human insulin sensitivity, and it will be validated in laboratory experiments to credential new therapeutic targets for insulin resistance.
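One simple way to define a "core of common gene expression changes" is to keep genes that change in the same direction, beyond a fold-change threshold, in every dataset. The sketch below assumes that per-dataset log2 fold changes have already been computed. The dataset names, gene names and 0.5 threshold are hypothetical. A real analysis would also account for statistical significance, for example by combining p-values across studies.

```python
def core_signature(datasets, min_abs_lfc=0.5):
    """
    datasets: dict of dataset name -> {gene: log2 fold change}.
    Returns {gene: mean log2 fold change} for genes measured in every dataset
    whose change has magnitude >= min_abs_lfc with the same sign everywhere.
    """
    common = set.intersection(*(set(changes) for changes in datasets.values()))
    core = {}
    for gene in sorted(common):
        changes = [d[gene] for d in datasets.values()]
        all_up = all(c >= min_abs_lfc for c in changes)
        all_down = all(c <= -min_abs_lfc for c in changes)
        if all_up or all_down:
            core[gene] = sum(changes) / len(changes)
    return core
```

Requiring agreement across distinct interventions (drug treatment, weight loss, surgery) is what should enrich the core for shared mediators rather than intervention-specific effects.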

Andrew McCammon | Chemistry and Biochemistry

jmccammon@ucsd.edu

The McCammon group conducts a very wide range of research activities, from the deeply biological (studies of protein and nucleic acid drug targets for infectious diseases, studies of protein kinase regulation, etc.) to the development of mathematical and physical methods for simulating biological processes (methods for solving partial differential equations, exploring the role of hydrodynamic interactions in protein-protein association, etc.). All of this work is computational; we do no experimental work in the traditional sense, but we have extensive collaborations with experimental labs at UCSD, The Scripps Research Institute, The Salk Institute, and elsewhere. The McCammon group website (http://mccammon.ucsd.edu/) gives a more complete perspective. We welcome undergraduate research participants when space allows, as described at http://mccammon.ucsd.edu/UGResOp.html.

  • Computer-aided Drug Discovery

    Physics-based computational modeling of proteins and other drug targets is developed and applied to a wide variety of diseases. Please see http://mccammon.ucsd.edu/UGResOp.html.
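Physics-based models of this kind build on force-field energy terms. As a minimal sketch, the code below sums a 12-6 Lennard-Jones term and a Coulomb term over atom pairs, using Lorentz-Berthelot combining rules. The parameters, vacuum dielectric and lack of cutoffs are simplifying assumptions. Production force fields and continuum electrostatics solvers are considerably more detailed, and this is not the McCammon group's code.

```python
import math

# Coulomb constant in kcal*Å/(mol*e^2); force fields use values that differ slightly.
COULOMB_KCAL = 332.06


def pair_energy(r, sigma, epsilon, q_i, q_j, dielectric=1.0):
    """12-6 Lennard-Jones plus Coulomb energy (kcal/mol) for two atoms r Å apart."""
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 ** 2 - sr6)
    coulomb = COULOMB_KCAL * q_i * q_j / (dielectric * r)
    return lennard_jones + coulomb


def interaction_energy(atoms_a, atoms_b):
    """Sum pair energies between two atom sets; each atom is (x, y, z, sigma, epsilon, charge)."""
    total = 0.0
    for xa, ya, za, sa, ea, qa in atoms_a:
        for xb, yb, zb, sb, eb, qb in atoms_b:
            r = math.dist((xa, ya, za), (xb, yb, zb))
            # Lorentz-Berthelot combining rules for unlike atom pairs.
            sigma = 0.5 * (sa + sb)
            epsilon = math.sqrt(ea * eb)
            total += pair_energy(r, sigma, epsilon, qa, qb)
    return total
```

As a sanity check, the Lennard-Jones term is zero at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6) * sigma.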

  • Structural Systems Biology

    Physics-based computational modeling of biomolecules and their interactions is used to understand the emergence of cellular behavior. Please see http://mccammon.ucsd.edu/UGResOp.html.

Siavash Mirarab | Electrical and Computer Engineering

smirarabbaygi@ucsd.edu

Our lab specializes in reconstructing evolutionary histories (phylogenies) from large-scale datasets and in using phylogenies in downstream analyses. Large-scale datasets include those with many genes and those with many species, and we aim for high accuracy and scalability at the same time. Many projects in this area are available, some of which are described below, but students are welcome to contact me about other projects as well.

  • Multiple sequence alignment

    Developing methods for computing a consensus among large numbers of large multiple sequence alignments using the concept of an equivalence class.
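One common way to compare or combine alignments is to reduce each one to the set of residue pairs it declares homologous. Residues sharing a column form an equivalence class, and every pair within that class is a homology statement. The sketch below counts how many input alignments support each pair and keeps the majority-supported pairs. This is an illustrative assumption about the approach, not the lab's consensus method, and it does not assemble the retained pairs back into a single alignment.

```python
from collections import Counter
from itertools import combinations


def homology_pairs(alignment):
    """
    alignment: dict of sequence name -> gapped row ('-' marks a gap); rows have equal length.
    Each column is an equivalence class of residues; return every pair of
    (sequence name, ungapped residue position) that shares a column.
    """
    names = sorted(alignment)
    positions = {name: 0 for name in names}
    pairs = set()
    for col in range(len(alignment[names[0]])):
        column = []
        for name in names:
            if alignment[name][col] != "-":
                column.append((name, positions[name]))
                positions[name] += 1
        pairs.update(combinations(column, 2))
    return pairs


def majority_pairs(alignments):
    """Homology pairs supported by more than half of the input alignments."""
    support = Counter(pair for aln in alignments for pair in homology_pairs(aln))
    return {pair for pair, n in support.items() if n > len(alignments) / 2}
```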

  • Reconstruction of species trees from gene trees

    Several projects are available, with different emphasis. Other projects in this general area can also be defined based on student interest.

    1. Improving ASTRAL (an algorithm for species tree reconstruction from gene trees) to handle more varied datasets, to improve scalability as the number of genes increases, and to provide a stronger theoretical analysis of the algorithm. An HPC implementation is also of interest.
    2. Testing ASTRAL on gene trees that include duplication and loss events in addition to incomplete lineage sorting.
    3. Re-analyzing a set of biological datasets using recently developed methods and comparing their empirical performance.
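ASTRAL seeks the species tree that shares the largest number of induced quartet topologies with the input gene trees. The sketch below only computes that quartet score for one candidate tree, with each unrooted tree given as its list of nontrivial bipartitions. It does not search over trees (ASTRAL does this with dynamic programming over a constrained space), and it assumes every gene tree contains all taxa. The taxon names are hypothetical.

```python
from itertools import combinations


def quartet_topology(splits, quartet):
    """
    splits: nontrivial bipartitions of an unrooted tree, each given as one side (a frozenset).
    Returns the induced topology on four taxa as frozenset({pair, pair}),
    or None if the tree leaves those four taxa unresolved.
    """
    q = frozenset(quartet)
    for side in splits:
        inside = q & side
        if len(inside) == 2:  # this edge separates two quartet taxa from the other two
            return frozenset({inside, q - inside})
    return None


def quartet_score(species_splits, gene_trees, taxa):
    """Count gene-tree quartets whose induced topology matches the candidate species tree."""
    score = 0
    for quartet in combinations(sorted(taxa), 4):
        species_topology = quartet_topology(species_splits, quartet)
        if species_topology is None:
            continue
        score += sum(quartet_topology(g, quartet) == species_topology for g in gene_trees)
    return score
```

For example, the tree ((a,b),c,(d,e)) has splits `{a,b}` and `{d,e}` and resolves all 5 quartets of its 5 taxa. Scored against itself as a single gene tree, it therefore gets 5.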

Pavel Pevzner | Computer Science and Engineering

ppevzner@ucsd.edu
  • Undergraduate Projects Available

    See the Projects page on http://www.ubergrid.org for an extensive list of mass spec projects in the Pevzner lab and in collaborating labs. Project themes include mass spec, T-Rex Fossils, and Comparative Proteogenomics.

Michael Rosenfeld | School of Medicine

mrosenfeld@ucsd.edu

Lab Location: CMM-West, Rm. 345

Lab Phone: 858-534-5858

Lab Composition and Activities: Five graduate students from several programs, a talented group of enthusiastic and helpful postdoctoral fellows, and a full-time laboratory manager. Each week we have one general laboratory meeting, one graduate-student-only meeting, and one personal meeting. We also hold weekly joint lab meetings with two other labs.

Research Interests: Our central laboratory focus this year is to continue using global genomic approaches to uncover and investigate the “enhancer code” controlled by new, previously unappreciated pathways that integrate the genome-wide response to permit proper development and homeostasis, and that also function in disease and senescence. We have investigated these events in differentiated cells, neuronal development, stem cells, and cancer. Our biological focus is on the molecular mechanisms of the “enhancer code” regulating learning and memory, aggressive prostate and breast cancer, and the underlying events of senescence/aging. Epigenomic events studied include non-histone methylation events and non-coding RNAs. We are investigating these events in development, breast and prostate cancers, and inflammation-based disease, including degenerative CNS disease and diabetes. The emerging importance of non-coding RNAs and of the regulation of nuclear architecture is rapidly altering our concepts of homeostasis and disease. Our laboratory is “Seq-ing” (RIP-seq, ChIP-seq, RNA-seq, GRO-seq, CLIP-seq, ChIRP-seq, and a new “FISH-seq”) for open-ended discovery of long-distance genome interactions, to uncover new “rules” of regulated gene transcriptional programs and new roles for lncRNAs in the biology of normal, cancer, neuro-affective disorder, and aging cells. Coupling this with chemical library screens, we hope to introduce new types of therapies for cancers and other diseases based on targeting specific gene enhancers, histone protein readers and writers, and lncRNAs. Recent surprising findings include novel roles of lncRNAs in prostate and breast cancer, a connection between DNA damage repair/transcription and replication, and unexpected roles of enhancer RNAs.

Current interests include:

  • The “enhancer code,” epigenomics, and transcriptional regulatory mechanisms.
  • Roles of ncRNAs in enhancer function, in signal-dependent genomic relocation, and in establishing subnuclear architecture.
  • Mechanisms of signal-induced tumor chromosomal translocation events, and new chemical screens for inhibitors for breast and prostate cancer.
  • The “enhancer code” in the regulation of learning and memory, including Reelin-regulated enhancers.
  • Linkage of DNA damage/repair and transcription.
  • Retinoic acid regulation of Pol III-transcribed DNA repeats in maintenance of the stem cell state, in neuronal differentiation, and in senescence.
  • Molecular mechanisms of prevalent disease-associated sequence variations (GWAS) in disease susceptibility loci.
  • “Epigenomics” in neuronal differentiation, cancer, diabetes, and degenerative brain disease.
  • Answering the question of when and how enhancers arise and become functional (from stem cells to mature cell types).

  • Bioinformatics Rotation Projects

    Potential projects include:

    • Projects employing genome-wide technologies, including ChIP-seq, GRO-seq, CLIP-seq, RNA-seq, and ChIRP-seq, to elucidate the molecular mechanisms of regulated enhancer lncRNA actions in cancer and stem cells;
    • Roles and mechanisms of enhancer actions in prostate and breast cancers;
    • Enhancer-based model of neurodevelopment and CNS disorders;
    • New mechanisms by which long non-coding RNAs dictate physiological gene regulation in cancer transcriptional programs;
    • Understanding subnuclear structures: roles of the relocation of transcription units between subnuclear architectural structures in regulated gene expression;
    • Chemical library screens targeting gene-signature and translocation responses, as an approach toward new cancer therapeutic reagents;
    • Roles of epigenomic regulators and expression of DNA repeats in stem cells, neuronal differentiation, and senescence.
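Many of these genome-wide analyses start by intersecting sets of genomic intervals, for example asking which ChIP-seq or GRO-seq peaks fall within candidate enhancer regions. The sketch below uses half-open coordinates as in BED files. The function name and intervals are hypothetical; at genome scale a dedicated tool such as bedtools would normally be used instead.

```python
def overlapping(peaks, enhancers):
    """
    Report (peak, enhancer) pairs that overlap on the same chromosome.
    Intervals are (chrom, start, end) tuples with half-open [start, end) coordinates.
    """
    by_chrom = {}
    for enhancer in enhancers:
        by_chrom.setdefault(enhancer[0], []).append(enhancer)
    for chrom_enhancers in by_chrom.values():
        chrom_enhancers.sort(key=lambda interval: interval[1])  # sort by start

    hits = []
    for peak in sorted(peaks):
        chrom, start, end = peak
        for enhancer in by_chrom.get(chrom, []):
            if enhancer[1] >= end:
                break  # this and all later enhancers start after the peak ends
            if enhancer[2] > start:
                hits.append((peak, enhancer))
    return hits
```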

Julian Schroeder | Biological Sciences

jischroeder@ucsd.edu
  • Systems Biology and Engineering of Environmental and Drought Tolerance in Plants

    Julian Schroeder's research is directed at discovering the signal transduction mechanisms and underlying signaling networks that mediate resistance to environmental stresses in plants, in particular drought, salinity stress, and CO2 responses. These environmental (abiotic) stresses have substantial negative impacts on plant growth and crop yields. They are also relevant to climate change and to maintaining available arable land to meet human needs. Research in Julian Schroeder's laboratory uses multidisciplinary approaches, including genomics, bioinformatics, cell signaling, network modeling, proteomics, and molecular biology, to uncover the signal transduction network and receptors in plants that translate drought stress hormone reception, CO2 sensing, and salinity stress into specific resistance responses. Some recent research advances are being used in the biotechnology industry with the goal of enhancing plant stress resistance and crop yields. Undergraduate research projects will involve systems biology, bioinformatics, and innovative analyses of large-scale data sets, with the aim of modeling and identifying drought stress-induced and CO2-induced signaling networks from “omic”-scale data sets. Models will be tested directly by wet-lab experimentation.
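Signaling-network models of this kind are often written as Boolean networks. Each node is on or off and updates according to a logical rule. The sketch below simulates a small synchronous Boolean network until it reaches a steady state. The node names and rules are entirely hypothetical placeholders, not the drought- or CO2-signaling networks studied in the Schroeder lab.

```python
def simulate(rules, state, max_steps=20):
    """
    Synchronous Boolean network: every node is updated from the previous state at once.
    Stops early once the state stops changing (a fixed point, i.e. a steady state).
    """
    trajectory = [dict(state)]
    for _ in range(max_steps):
        state = {node: rule(state) for node, rule in rules.items()}
        trajectory.append(state)
        if state == trajectory[-2]:
            break
    return trajectory


# Hypothetical cascade: the stimulus activates a receptor, the receptor switches off
# an inhibitory phosphatase, and the kinase fires only when the receptor is on and
# the phosphatase is off.
RULES = {
    "stimulus": lambda s: s["stimulus"],  # external input, held constant
    "receptor": lambda s: s["stimulus"],
    "phosphatase": lambda s: not s["receptor"],
    "kinase": lambda s: s["receptor"] and not s["phosphatase"],
    "channel": lambda s: s["kinase"],
    "response": lambda s: s["channel"],
}

RESTING = {"stimulus": False, "receptor": False, "phosphatase": True,
           "kinase": False, "channel": False, "response": False}
```

Running `simulate(RULES, dict(RESTING, stimulus=True))` propagates the signal through the cascade until the response node switches on and the state stops changing. Comparing such simulated steady states with measured responses, for example in mutants, is one way a network model can be tested against wet-lab data.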

    Julian Schroeder is Co-Director of the Center for Food and Fuel for the 21st Century. See http://www-biology.ucsd.edu/labs/schroeder for more information on the Schroeder lab.

    Selected publications

    • Nishimura et al., Science (2009).
    • H.H. Hu et al., Nature Cell Biol. (2010).
    • T.H. Kim et al., Current Biol. (2011).
    • Xue et al., EMBO J. (2011).
    • F. Hauser et al., Current Biol. (2011).
    • B. Brandt et al., PNAS (2012).
    • R. Waadt et al., eLife (2014).
    • A.M. Jones et al., Science (2014).
    • C.B. Engineer et al., Nature (2014).
    • B. Brandt, S. Munemasa et al., eLife (2015).
    • See also: http://labs.biology.ucsd.edu/schroeder/publications.html

| Bioengineering

yiw015@eng.ucsd.edu

Our research focuses on molecular engineering for cellular imaging and reprogramming, and image-based bioinformatics, with applications in stem cell differentiation and cancer treatment.

  • Image-based reconstruction of biochemical networks in live cells

    Fluorescence resonance energy transfer (FRET)-based biosensors have been widely used in live-cell imaging to accurately visualize specific biochemical activities. We have developed the Fluocell image analysis software package to efficiently and quantitatively evaluate intracellular biochemical signals in real time and to provide statistical inference on the biological implications of the imaging results. However, important questions remain about how to use these results to reconstruct the quantitative parameters of the underlying biochemical networks, which determine cellular functions and ultimately cell fate. In this rotation project, we will integrate optimization-based machine learning approaches with biochemical network models to answer these questions, with applications to overcoming drug resistance in cancer treatment.
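As a minimal example of optimization-based parameter reconstruction, the sketch below fits the rate constant k of a hypothetical first-order activation model, R(t) = R0 + A(1 - exp(-k t)), to a FRET ratio time course. It uses golden-section search on the squared error. It assumes R0 and A are known and that the error is unimodal in k over the search interval. Real biochemical network models typically have many coupled parameters, and this is not the Fluocell implementation.

```python
import math


def activation_model(t, k, baseline, amplitude):
    """Hypothetical first-order activation: R(t) = baseline + amplitude * (1 - exp(-k t))."""
    return baseline + amplitude * (1.0 - math.exp(-k * t))


def squared_error(k, times, ratios, baseline, amplitude):
    return sum(
        (activation_model(t, k, baseline, amplitude) - r) ** 2
        for t, r in zip(times, ratios)
    )


def fit_rate_constant(times, ratios, baseline, amplitude, lo=1e-4, hi=10.0, tol=1e-8):
    """Golden-section search for the k in [lo, hi] that minimizes the squared error."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        error_c = squared_error(c, times, ratios, baseline, amplitude)
        error_d = squared_error(d, times, ratios, baseline, amplitude)
        if error_c < error_d:
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    return (a + b) / 2.0
```

With noisy measurements, the same objective applies, but the recovered k becomes an estimate whose uncertainty should be reported alongside it.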

  • Intelligent Diagnosis of Infectious Diseases by Deep Learning

    The diagnosis of infectious diseases often requires tissue biopsy and microscopic examination by pathologists, which is time-consuming, labor-intensive, and error-prone. To develop a software-assisted system for identifying microorganisms in digital images, we use convolutional neural networks and transfer learning to train and validate an intelligent software system that classifies pathology slides. The goal of this project is to diagnose pathogens with high efficiency and accuracy. Students will work in an interdisciplinary team: collecting and labeling imaging data, developing deep-learning-based algorithms and user interfaces, and characterizing and optimizing the accuracy and functionality of the software package.

Gene Yeo | Cellular and Molecular Medicine

geneyeo@ucsd.edu

We have a wide scope of projects, ranging from developing novel algorithms for studying RNA processing in disease, development, and personalized medicine to analyzing single-cell RNA-seq data.

  • ENCODE RNA binding proteins

    The Yeo lab is responsible for the identification of the RNA sequence elements bound by 250 RNA binding proteins (RBPs) as part of the newest ENCODE (https://www.genome.gov/10005107) efforts. Various computational projects that pertain to integrating RNA binding sites with functional alternative splicing, RNA stability and translational changes to generate global, genome-wide predictions of what each RBP can do are available for enterprising, hard-working graduate/undergraduate students.
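A typical first step when integrating binding sites with splicing changes is to ask whether exons bound by an RBP are over-represented among exons whose splicing changes. The sketch below computes a one-sided hypergeometric p-value, equivalent to a one-sided Fisher's exact test, from four counts. The function name and counts are hypothetical, and the lab's actual pipelines are not described here.

```python
from math import comb


def enrichment_p_value(bound_changed, bound_total, changed_total, all_total):
    """
    One-sided hypergeometric (Fisher's exact) test: the probability of seeing at least
    `bound_changed` RBP-bound exons among `changed_total` splicing-changed exons,
    when `bound_total` of `all_total` exons are bound.
    """
    upper = min(bound_total, changed_total)
    denominator = comb(all_total, changed_total)
    return sum(
        comb(bound_total, k) * comb(all_total - bound_total, changed_total - k)
        for k in range(bound_changed, upper + 1)
    ) / denominator
```

For example, if 5 of 20 exons are bound and 4 of the 5 changed exons are among them, the p-value is 76/15504, about 0.0049. Genome-wide, the same test would be run per RBP and corrected for multiple testing.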

  • Single-cell Analysis

    Recent studies of single cells demonstrate that the assumption that all cells of the same "type" are identical is simply inaccurate. Individual cells from the same population differ substantially, and these differences underlie phenotypic responses to environmental stimuli. The Yeo lab uses microfluidics-based platforms to study whole-transcriptome differences in single cells from a variety of biological systems, ranging from embryonic stem cells to diseased motor neurons. One of our projects is to develop new bioinformatic analytics to study cellular heterogeneity in response to environmental influences.
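A common starting point for studying heterogeneity is to normalize each cell for sequencing depth and then rank genes by how much their expression varies across cells. The sketch below does this with plain dictionaries. The 10,000 scaling factor, log1p transform and raw-variance ranking are conventional but simplified choices; dedicated tools also model the mean-variance relationship. The function names, genes and counts are hypothetical.

```python
import math


def normalize_log(cell_counts, scale=10_000):
    """Scale one cell's counts to `scale` total reads, then log1p (assumes at least one read)."""
    total = sum(cell_counts.values())
    return {gene: math.log1p(count * scale / total) for gene, count in cell_counts.items()}


def most_variable_genes(cells, top_n=2):
    """Rank genes by variance of normalized expression across cells; missing genes count as 0."""
    normalized = [normalize_log(cell) for cell in cells]
    genes = sorted(set().union(*normalized))
    variances = {}
    for gene in genes:
        values = [cell.get(gene, 0.0) for cell in normalized]
        mu = sum(values) / len(values)
        variances[gene] = sum((v - mu) ** 2 for v in values) / len(values)
    return sorted(genes, key=lambda g: variances[g], reverse=True)[:top_n]
```

A gene expressed at a constant level in every cell ranks last, while genes that split the population rise to the top and become candidates for defining subpopulations.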

| Bioengineering

szhong@ucsd.edu

We study causal relationships between gene regulation and cellular behaviors, by developing computational and experimental methods on network modeling, stem cell engineering, epigenomic and single-cell analyses.

  • Continued development of Comparative Epigenome Browser (Undergraduate Research Project)

    The Comparative Epigenome Browser (CEpBrowser, http://www.cepbrowser.org) is an online data management, visualization, and analysis tool that allows the public to perform multi-species epigenomic analysis (Cell, 2012, 149: 1381-1391; Bioinformatics, 2013, 29(9): 1223-1225). In collaboration with the ENCODE project, this undergraduate research project will extend CEpBrowser to incorporate new ENCODE and mouse ENCODE data and to implement new interactive data management and data analysis features.