Bioinformatics and Systems Biology Rotation Projects

This page is updated annually. Some projects may already be taken, and new projects may be available. The projects below give an indication of the types of projects available in each lab, but please browse faculty web pages and contact professors directly to discuss current opportunities.

View Rotation Projects by Faculty: BISB or BMI

Labs with BISB Rotation Projects

Vineet Bafna | Computer Science and Engineering

Nuno Bandeira | School of Pharmacy

Steve Briggs | Biological Sciences

Christopher Glass | Cellular and Molecular Medicine

Joseph Gleeson | Neurosciences (School of Medicine)

Nan Hao | Biological Sciences

Jeff Hasty | Biological Sciences

Julie Law | Salk Institute for Biological Studies

Andrew McCammon | Chemistry and Biochemistry

Pavel Pevzner | Computer Science and Engineering

Bing Ren | Cellular and Molecular Medicine

Jonathan Sebat | Cellular and Molecular Medicine

Palmer Taylor | School of Pharmacy

Susan Taylor | Chemistry and Biochemistry

Yatish Turakhia | Electrical and Computer Engineering

Wei Wang | Chemistry and Biochemistry

Gene Yeo | Cellular and Molecular Medicine

Rose Yu | Computer Science and Engineering

Huilin Zhou | Cellular and Molecular Medicine

| School of Pharmacy

Our field of research is computational structural chemical biology, pharmacology, toxicology, and structure-based drug discovery. We develop and apply mathematical and computational methods for molecular mechanics, docking, visualization, and cheminformatics. We also maintain several web servers and our own large-scale clusters for massively parallel calculations. See our website http://ablab.ucsd.edu for more details.

  • Building a mixed docking/pharmacophore based engine for activity prediction

    Last Updated:

    The lab has now built a collection of about 1000 pocket ensembles. Computationally docking a new compound to these ensembles and scoring it will predict the compound's activity. The project involves building mixed models in which the docking score weights are optimized and the docking score is complemented with a different kind of score based on continuous pharmacophoric models. The resulting models will be tested for their ability to predict the selectivity profile of test compounds. This project will involve cluster or distributed computing.

  • Data visualization: 3D trees

    Last Updated:

    This project will focus on the development of the server for a new kind of data visualization of large data sets. Feel free to contact me for further details.

  • Development of activity and toxicity models related to anti-targets

    Last Updated:

    Many chemicals and metabolites can cause the serious adverse effects we read about on drug labels. We have gradually accumulated a large body of data relating some of these activities to binding to specific proteins. This project will involve the development and optimization of so-called continuous pharmacophoric models, which can be built without knowledge of the three-dimensional structure of the target. The models will be built for several key anti-targets: hERG, PPARg, ESR, D2, 5HT2c and 2b, M1, etc. The models will be tested against known therapeutics and their metabolites.

| Cellular and Molecular Medicine

Our lab's research focuses on understanding the information hidden in large-scale omics datasets. We are particularly interested in elucidating the mechanisms by which cancers develop and in leveraging this knowledge for the development of better cancer prevention strategies and the improved targeting of existing cancer treatments.
Throughout the past five years, our work has predominantly focused on creating the concept of mutational signatures, on demonstrating the utility of mutational signatures in understanding human cancer, and on identifying mutational signatures in a plethora of diverse cancer types. Our belief is that by developing a next generation of machine learning approaches, we can obtain a predictive understanding of the basic molecular processes contributing to cancer development, which will allow us to both better prevent and better treat human cancer.

  • The evolution of human evolution

    Last Updated:

    In principle, Darwinian evolution requires at least two essential ingredients: (i) processes that change the inherited genetic material (i.e., mutation of the germline DNA); and (ii) processes that cause natural selection based on the functional/phenotypic results of these genetic changes. Germline mutations are believed to predominantly originate from endogenous cellular processes, with minor contributions from exogenous processes. Each mutational process imprints a characteristic mutational pattern on the genome, termed a mutational signature. For example, the deamination of 5-methylcytosine to thymine is an endogenous process generating C:G>T:A mutations at CpG dinucleotides, while CC:GG>TT:AA doublet substitutions occurring at dipyrimidines are associated with exogenous exposure to ultraviolet light. Analyses of mutational signatures in thousands of cancer genomes have revealed the signatures of more than 100 mutational processes. Some of these mutational processes are operative throughout the entire lifetime of an individual, whereas others are present only at certain stages of life. The signatures of processes gradually accumulating throughout the entire lifetime of an individual are referred to as clock-like mutational signatures, and these include signature 1 (etiology: deamination of 5-methylcytosine) and signature 5 (etiology: unknown). Previous work on de novo germline mutations derived from family trios demonstrated that signatures 1 and 5 can explain the majority of these germline variants, indicating that the clock-like signatures are the main contributors to human evolution. However, the activity of different mutational signatures has never been evaluated in regard to the phylogenetic timeline of human evolution. In this rotation project, we will analyze data from the 1000 Genomes Project, a database containing the germline genomes of 2,504 individuals from 26 populations. This database includes 84.7 million single nucleotide polymorphisms (SNPs) and 3.6 million short insertions/deletions (indels) phased onto high-quality haplotypes. Using these data, we will build a phylogenetic tree (i.e., a tree showing the evolutionary relationship between individuals) where each leaf of the tree will contain the private germline mutations derived from a single individual. The activity of mutational signatures will be evaluated in each leaf as well as in each node of the phylogenetic tree. The analysis will reveal the activity of mutational signatures throughout human evolution.
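As a rough illustration of how single-base substitutions are usually tabulated before signature activities are estimated, the sketch below maps a substitution and its flanking bases to the conventional pyrimidine-centred trinucleotide label (for example A[C>T]G). The function names are hypothetical and this is not the lab's pipeline; it only shows the strand-folding convention behind notation such as C:G>T:A.

```python
# Illustrative sketch only; function names are hypothetical, not taken from the lab's tools.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs_channel(context: str, alt: str) -> str:
    """Map a substitution to its pyrimidine-centred trinucleotide label.

    context: three reference bases (5' flank, mutated base, 3' flank).
    alt: the alternate base observed at the middle position.
    """
    context, alt = context.upper(), alt.upper()
    if (len(context) != 3
            or any(base not in COMPLEMENT for base in context)
            or alt not in COMPLEMENT
            or context[1] == alt):
        raise ValueError("expected a 3-base ACGT context and a different alternate base")
    if context[1] in "AG":
        # Purine reference base: report the event on the opposite strand instead.
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def is_cpg_transition(channel: str) -> bool:
    """True for C>T changes followed by G (the CpG context mentioned above)."""
    return channel[2:5] == "C>T" and channel[-1] == "G"
```

With this convention, a G>A change observed in the context CGT is counted in the same channel as a C>T change in the context ACG, because the two describe the same event on opposite strands.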

  • Understanding the molecular landscape of precancers for preventing cancer

    Last Updated:

    All cancers originate from a single cell that undergoes a transformation from a normally functioning somatic cell into a malignant neoplasm. In most cases, this transformation follows a stepwise process with the somatic cell first expanding into a precancer and, subsequently, becoming an advanced invasive cancer. The progression from a pre-malignant tumor to a malignant neoplasm is due to somatic mutations that can be traced, characterized, and genomically studied. In this rotation, the student will evaluate the mutational burden, driver mutations, copy number changes, mutational signatures, and subclonal architecture of pre-malignant lesions and compare them to molecular events previously identified in advanced invasive cancers. The goal is to reveal the molecular events that are necessary for a precancer to convert into cancer. Independent, previously generated drug-screen datasets (e.g., Cancer Cell Line Encyclopedia) will be used to propose potential intervention strategies that can be used to target these molecular events in order to halt this conversion and lead to cancer prevention.

| Halıcıoğlu Data Science Institute

We are a statistical genetics lab focusing on developing methods to study complex traits and polygenic diseases across global populations, with a specific focus on minority groups that have been underrepresented in the fields of genetics and genomics. We are interested in developing novel multi-ancestry statistical methods for fine-mapping disease genes and their cell types of action. The goal of this research is to identify targets for gene-based therapeutics.

  • Improving disease-gene association testing using statistical priors on genetic regulation of gene expression

    Last Updated:

    One popular approach to disease-gene association testing is a transcriptome-wide association study (TWAS). Conceptually, TWAS is a test for the genetic correlation between cis-regulated gene expression and disease. However, only half of the genetic regulation of gene expression is expected to be in cis, e.g. by genetic variation within 1 Mb of the gene. The goal of this project is to develop a novel statistical method that leverages priors on SNP-gene regulatory links beyond the cis-window to improve our understanding of the genetic regulation of gene expression. As a result, our ability to identify disease-associated genes via TWAS should substantially improve due to (1) the enhanced identification of genes regulated by genetic variation and (2) the increased accuracy with which we can predict an individual's gene expression.
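One common way to run such a test from GWAS summary statistics is to combine per-SNP association z-scores using the expression-prediction weights and the LD (SNP correlation) matrix. The sketch below assumes that standard weighted-sum statistic, z = (w·z_GWAS) / sqrt(wᵀ Σ w); the function name and inputs are made up for illustration and do not represent the method this project will develop.

```python
import math
from typing import Sequence


def twas_z(weights: Sequence[float],
           gwas_z: Sequence[float],
           ld: Sequence[Sequence[float]]) -> float:
    """Weighted-sum TWAS statistic: (w . z) / sqrt(w' LD w).

    weights: per-SNP expression-prediction weights for one gene.
    gwas_z:  per-SNP GWAS z-scores for the same SNPs, same order.
    ld:      SNP-by-SNP correlation (LD) matrix.
    """
    n = len(weights)
    if len(gwas_z) != n or len(ld) != n or any(len(row) != n for row in ld):
        raise ValueError("weights, z-scores and LD matrix must have matching sizes")
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    # Variance of the genetically predicted expression under the LD model.
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(n) for j in range(n))
    if variance <= 0:
        raise ValueError("predicted expression has zero variance")
    return numerator / math.sqrt(variance)
```

Extending the weight vector to SNPs beyond the cis-window, as the project proposes, would change which SNPs enter the sum but not the form of this statistic.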

  • Multi-ancestry gene-disease association testing via cross-population modeling of eQTLs

    Last Updated:

    One approach to infer causal genes in disease is a transcriptome-wide association study (TWAS). However, TWAS is not powerful in non-European populations due to poor trans-ancestry portability of gene expression prediction models and smaller genome-wide association study (GWAS) sample sizes. The purpose of this project is to develop a novel machine learning approach to mitigate the issue of trans-ancestry portability. This approach will allow for powerful TWAS in non-European populations by simultaneously modeling genetic and genomic data from different populations, which has previously been challenging due to population-specific differences in genetic architecture such as linkage disequilibrium.

| La Jolla Institute for Immunology

We are interested in the analysis and modeling of the three-dimensional chromatin structure from high-throughput sequencing experiments. We develop methods based on statistics, machine learning, optimization, and graph theory to understand how changes in the 3D genome affect cellular outcomes such as development, differentiation, and gene expression. We have ongoing interests in the systems-level analysis and reconstruction of regulatory networks, inference of enhancer-promoter contacts, predictive models of gene expression, and integration of three-dimensional chromatin structure with one-dimensional epigenetic measurements in the context of cancer, malaria, asthma, and several autoimmune diseases.

  • Integrative analysis of multi cell-type gene expression and epigenomic data in tumor immune response

    Last Updated:

    This project will focus on developing regulatory network inference methods for the joint analysis of gene expression and histone modification data from several different types of tumor infiltrating lymphocytes, which are gathered from a cohort of patients with solid tumors.

  • Predictive and comparative modeling of epigenetic gene regulation in different human immune cell types

    Last Updated:

    The goal of this project is to model the natural variation in gene expression across many immune cell types using an already established database at LJI (https://dice-database.org) and to identify cell type-specific epigenetic regulators of important immune genes.

  • Statistical methods for inferring functional DNA-DNA contacts from Hi-C and HiChIP/PLAC-seq data

    Last Updated:

    This project focuses on developing computational tools for better analysis of the wealth of data from chromosome conformation capture assays with the ultimate goal of inferring functional chromatin contacts such as those between enhancers and promoters.

| Computer Science and Engineering

Our lab is focused on the design and implementation of algorithms for biological data interpretation. Within this broad framework, we have a number of open projects relating to problems in proteomics (interpretation of mass spectrometry data), genetics, and genomics. The projects listed below are a small sampling of available projects. Interested students should have taken a class in algorithm design and have some facility with machine learning approaches.

  • Genome query

    Last Updated:

    We are developing a tool-kit for archiving and querying genomic information. Part of the project involves developing a compiler for interpreting the query language.

  • MS imaging

    Last Updated:

    MS imaging (MSI) is a technique for visualizing proteins in their spatial context. We have projects on analyzing MSI data, clustering, identifying signatures of interest, and identifying peptides.

| School of Pharmacy

The Bandeira lab develops novel computational mass spectrometry approaches for the discovery and characterization of biomarker metabolites, proteins, post-translational modifications and protein-protein interactions, with the ultimate goal of substantially improving the capabilities of proteomics discovery pipelines towards the development of novel drug therapeutics.

  • Computational Mass Spectrometry

    Last Updated:

    Several rotation projects are available, each requiring a different mix of algorithmic, programming and data analysis skills:

    1. Characterization of protein sequence polymorphisms in drug-resistant TB strains.
    2. Categorization of post-translationally modified opioid neuropeptides.
    3. Discovery of novel post-translational modifications in human cancer.
    4. Early detection of biosimilar precursors of current and possible therapeutic drugs, including proteomic analysis of monoclonal antibodies and venom proteins.

| Pediatrics

Research in our lab is focused on developing computational methods and tools for variant calling in human genomes and using these tools for disease association studies. We focus on challenging variant types such as haplotypes and variants in repetitive regions and work with both short-read (Illumina) and long-read sequencing technologies.

  • Duplicated genes and association with disease

    Last Updated:

    Hundreds of genes in the human genome are duplicated, and many are known to be associated with human diseases. However, the short read lengths of current sequencing technologies make the analysis of such genes difficult. We have developed novel tools to genotype the copy number of duplicated genes using whole-genome sequencing. The goal of this project is to analyze large-scale sequencing datasets (using cloud computing platforms) for Mendelian and complex human diseases to identify novel disease associations.

  • Haplotype-based variant calling using long-read sequencing

    Last Updated:

    Long-read sequencing technologies have the potential to overcome some of the key limitations of short-read sequencing, particularly in long repetitive regions of the human genome, but require the development of new algorithms. We have previously developed computational methods for variant calling (Longshot, Nature Communications 2019) and read mapping in segmental duplications (Duplomap, Nucleic Acids Research 2020) using long-read sequencing technologies. The goal of this project is to implement a haplotype-based model for variant calling using long reads that automatically identifies genomic regions that can be called with high confidence.

| Biological Sciences

We model relationships between the proteotypes and the phenotypes of cells/organisms, with an emphasis on innate immunity in plants. Proteotypes are measured using custom methodology for high-throughput proteomics based on mass spectrometry. Students have an opportunity to integrate training in bioinformatics with chemistry and biology.

The specific state of the proteome in a given cell, tissue, or organism is known as the proteotype. The proteotype integrates constraints imposed by the genotype, the environment, and by developmental history (e.g., a leaf cell has a different proteotype than a root cell with the same genotype in the same environment). The proteotype directly determines phenotype since all molecules are made by and regulated by proteins. Thus, a complete description of the proteotype should define a phenotype at the molecular level. We are constructing an Atlas of Proteotypes that currently includes 162,777 peptides from 41,553 proteins in 65 different tissues and stages of development. In addition, we have identified and measured more than 30,000 phosphopeptides from these same samples. The 65 resultant proteotypes are revealing thousands of unanticipated regulatory relationships. The relationships between mRNA levels and protein levels are fascinating; they indicate that protein levels from some genes are regulated by transcription but that most protein levels are under post-transcriptional control. Inspection of our data explains tissue specific traits such as oil accumulation in the embryo that results from selective accumulation of proteins from common mRNAs.

  • Plant systems biology

    Last Updated:

    We have generated the most comprehensive quantitative proteomics description of a higher eukaryote. The data include protein abundance and phosphorylation levels of 20,000 proteins across 34 tissues and stages of development in maize (corn) and are paired with RNAseq data. Projects are available for students to use machine learning (random generalized linear models) to model the relationships between regulators and targets. For example, we are comparing the protein abundance and phosphorylation levels of transcription factors to the mRNA abundance of all genes. There are many other types of regulators whose roles can be explored using this dataset. We evaluate modeled regulatory relationships using transgene expression in protoplasts. Models are assembled into networks and integrated with pathway annotations to predict phenotypic consequences of synthetic genes. These predictions are tested in transgenic plants. Students of bioinformatics and systems biology who wish to integrate computational with experimental training in chemistry and biology are especially encouraged to apply.

| Pediatrics

Our research group strives to push the boundaries of genetic engineering and synthetic biology by developing methods with which to manipulate eukaryotic genomes and probe biological phenomena in high-throughput. We apply these tools ourselves or through collaboration with a particular focus towards understanding neurobiology and infectious disease.

  • Developing next generation CRISPR tools

    Last Updated:

    We have developed two novel technologies for rapidly generating pools of thousands of CRISPR variants and assaying their activities in a single study. Using these methods we aim to improve the efficacy of existing CRISPR tools along with endowing CRISPR tools with new activities.

  • Exploring protein language models to create enhanced protein variants

    Last Updated:

    Our lab has been exploring the use of protein language models coupled with high-throughput protein variant screens to rapidly generate optimized protein variants. We are interested in applying these tools towards the amelioration of protein misfolding and the optimization of gene editing tools.

  • Protein tagging at scale to generate a comprehensive map of the cell

    Last Updated:

    We have recently reported a method for generating pools of cells, where each cell in the pool has a different protein tagged with either an affinity reagent (e.g. FLAG) or a fluorescent protein (e.g. mCherry). Using this method, we are collaborating with several other groups on campus to generate a comprehensive map of the cell, elucidating the complexity of protein-nucleic acid and protein-protein interactions at scale and across genotypes and environmental conditions.

| School of Medicine

The main objective of the Chavez laboratory is the molecular characterization of malignant childhood cancers in order to identify drug targets and improve treatment options. Our focus is mainly on pediatric brain tumors such as medulloblastoma, glioblastoma, and ependymoma. Recently, we have demonstrated how to leverage epigenetic information such as DNA methylation and enhancer profiling in pediatric brain tumors and normal human tissues to identify clinically relevant tumor subgroups, oncogenic enhancers, transcription factors, and pathways amenable to pharmacologic targeting. To reveal regulatory circuitries disturbed in childhood brain tumors, we generate and integrate public high-dimensional data from primary tumors and patient-derived cell lines. We are specifically interested in the analysis of somatic and germline DNA mutations, chromatin and DNA modifications, transcription factor binding, and gene expression.

  • The 3D Tumor Genome

    Last Updated:

    To identify molecular mechanisms that contribute to tumor development and maintenance, we develop hypothesis-driven computational tools for the integrative analysis of different layers of genetic and epigenetic information. Having recognized that our epigenetic mapping studies can identify effective drug targets, we are now profiling 3D tumor genomes to uncover molecular mechanisms that may cause disturbed enhancer-gene interactions leading to deregulation of gene expression and biochemical pathways.

| Radiology

Our projects range from brain mapping to derive optimal brain atlases, through integrated omic analyses to identify genetic underpinnings of the brain, to precision medicine approaches for drug response prediction and drug target identification.

  • Genomic study of brain MRI phenotypes

    Last Updated:

    A major challenge hindering progress in neuropsychiatric medicine is our limited understanding of the genetics underlying the complexity of human brain structure and function. Our project aims to characterize genetic effects on the brain through multimodal imaging, using human biobanks with MRI and genotype data. This will provide insight into shared and distinct genetic influences among different brain regions. Building on improved genetic knowledge of the brain, we will determine the genetic relationship between brain morphology and neuropsychiatric disorders using statistical genetics tools. We will estimate the effects of neuropsychiatric genetic risks and environmental exposures on deviations of MRI phenotypes from normal neurodevelopmental and aging trajectories.

  • Omic analyses for drug target identification

    Last Updated:

    Our goal is to identify potential drug targets of brain disorders (e.g., Alzheimer’s disease) through gene networks comprising disease-associated genes. Recent genomic studies have advanced our knowledge of the genetics of brain disorders and related traits, which could illuminate the pathogenesis of brain disorders. The new knowledge provides opportunities for genetic-based strategies for drug target identification. Bioinformatics analyses will be performed to prioritize drug targets and potential drugs for repurposing.

| Psychiatry

Dr. Cheng’s research focuses on transcriptional regulatory networks and aims to develop a comprehensive understanding of how aberrant regulatory circuits contribute to human disease. Dr. Cheng’s laboratory is particularly interested in understanding transcriptional and epigenetic regulation of the interplay between the immune system and central nervous system in neurodegenerative diseases, substance use disorder, and HIV infection. Current projects focus on applying single-cell transcriptomics and epigenetics assays to characterize Alzheimer’s disease, HIV, and opioid use disorder patient samples, with the goal of finding diagnostic markers and therapeutic targets. Dr. Cheng’s lab has also developed 3D brain organoid models for Alzheimer’s disease and HIV infection. Dr. Cheng received her M.S. degree in Computer Science from Stanford University and her Ph.D. degree in Bioinformatics and Systems Biology from the University of California, San Diego. After completing her doctoral study, Dr. Cheng did her postdoctoral training at the Broad Institute of MIT and Harvard.

  • 3D brain organoid model of Alzheimer’s disease revealed by single cell transcriptomics

    Last Updated:

    We developed a novel tau propagation model using a 3D spheroid model that rapidly develops tau pathology and neurodegeneration in just three weeks. Single cell transcriptomics of the model reveals cell type specific changes that resemble transcriptomic signatures from Alzheimer’s disease postmortem brain.

  • Single cell transcriptomics and epigenetics of human Alzheimer’s disease brain

    Last Updated:

    To understand cell type specific vulnerability of Alzheimer’s disease, we utilize snRNA-seq to characterize human brain tissues from Alzheimer’s disease patients across different brain regions.

  • Single cell transcriptomics and epigenetics of the opioid use disorder and HIV syndemic in the human brain

    Last Updated:

    As part of the NIH NIDA SCORCH consortium, we will dissect the dysregulated molecular circuitry in the brains of individuals with opioid use disorder and/or HIV infection. This project aims to identify genes that contribute to opioid use disorder and HIV-associated neurocognitive disorders. These approaches could lead to novel gene therapies to control and perhaps reverse the relentless disease state. We are in the process of generating snRNA-seq and snATAC-seq profiles from more than 300 patient samples across 3 different brain regions.

  • Single Cell Transcriptomics of the Cocaine Use Disorder in the Context of HIV

    Last Updated:

    As part of the NIH NIDA SCORCH consortium, we will dissect the dysregulated molecular circuitry in the brains of individuals with cocaine use disorder and/or HIV infection. This project will focus on understanding how neurovasculature and neuroimmune cells contribute to cocaine use disorder and HIV-associated neurocognitive disorders. We will be generating snRNA-seq and snATAC-seq profiles from more than 300 patient samples across 3 different brain regions.

| Biological Sciences

Our laboratory is working to understand the gene regulatory mechanisms that drive the earliest stages of mammalian development, from development and reprogramming of the fully differentiated oocyte to the totipotent embryo to the molecular determinants of successful human embryo implantation. To answer these questions, we are integrating computational, epigenetic, chromatin, and RNA biology approaches in human, mouse, and stem cell-based models.

  • Deciphering the combinatorial code regulating maternal mRNA polyA tails from oocyte to embryo

    Last Updated:

    mRNA polyadenosine (polyA) tail lengths play a unique and critical role in controlling gene expression in the developmental transition from oocyte to embryo, from worms to humans. We have recently generated a large comprehensive profile of polyA tail lengths across the mouse oocyte-to-embryo transition with Nanopore long read sequencing, capturing tail length regulation with isoform-specific resolution (including 3'UTR length, splice isoform, polyadenylation site choice, etc.). We found dynamic changes in polyA tail length across this transition and that these changes in tail length control mRNA translation and stability. But what molecular mechanisms orchestrate these changes in polyA tail length? In this project, we will apply machine learning approaches to ask which mRNA features (number and position of specific RNA binding protein motifs, mRNA length, mRNA abundance, polyadenylation site choice, 3'UTR length, etc.) are most predictive of (1) polyA tail length at each developmental stage and (2) change in tail length across consecutive developmental stage transitions. These analyses provide an exciting opportunity to address decades-old questions about the mechanisms driving the earliest stages of mammalian development.

  • Uncovering the molecular determinants of successful implantation of the human blastocyst

    Last Updated:

    High rates of failed implantation in human embryos represent one of the greatest obstacles to our ability to treat infertility. Significant improvements require significant advances in our understanding of the molecular mechanisms required for successful implantation and ongoing development. The goal of this funded project is to uncover the network of regulatory factors and cell-cell signals that control cellular differentiation, developmental progression, and successful implantation during early human embryo development, using both human embryo and stem cell-based in vitro models. These analyses will provide the first “ground truth” atlas for human embryos of high implantation potential (and for stem cells developed to model differentiation within the embryo) and are essential first steps toward development of an in vitro model for human embryo implantation.

| Biomedical Informatics

Our goal is to understand the associations between genetic variation and human disease. As part of the Center for Admixture Science and Technology (CAST), we work with large genetic datasets, such as the UK Biobank, GTEx, All of Us (AoU), and the Million Veteran Program (MVP), to characterize the associations between genetic variation and disease in global and local ancestry-aware settings.

  • Ancestry-aware genome-wide association studies

    Last Updated:

    For the past 500 years, the American continent has been the site of ongoing mixing of Europeans, Native Americans, Africans and Asians, resulting in a significant percentage of Americans carrying ancestry from outside their self-identified race. However, genome-wide association studies (GWAS) and polygenic risk score (PRS) calculations have been performed and optimized on individuals of European descent, with minor exceptions. As part of CAST (Center for Admixture Science and Technology), we are developing methods to perform GWAS and calculate PRS on diverse and admixed populations, using large diverse cohorts, such as the All of Us and Million Veteran Program.
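At its simplest, a polygenic risk score is a weighted sum of effect-allele dosages, with the weights taken from GWAS effect sizes. The sketch below shows only that basic sum for one individual; the variant IDs and effect sizes are invented, and the ancestry-aware methods described above would go well beyond it.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    effects: Mapping[str, float]) -> float:
    """Sum effect size x effect-allele dosage over variants present in both maps."""
    shared = dosages.keys() & effects.keys()
    return sum(effects[variant] * dosages[variant] for variant in shared)


# Toy example with made-up variant IDs and effect sizes.
effects = {"rsA": 0.5, "rsB": -0.25, "rsC": 0.125}
person = {"rsA": 2, "rsB": 1, "rsD": 2}  # rsC is missing; rsD has no weight
score = polygenic_score(person, effects)
```

Because the weights are usually estimated in one population, the same sum can perform poorly when applied to individuals whose ancestry differs from the discovery cohort, which is the gap the project addresses.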

| School of Pharmacy

Our work aims to develop new mass spectrometry-based methods to understand the chemistry of microbes, our microbiome, and their ecological niche. In short, we develop tools that translate the chemical language between cells. This research requires an understanding of (microbial) genomics, proteomics, imaging mass spectrometry, genome mining, enzymology, small-molecule structure elucidation, bioactivity screening, and antibiotic resistance. The collaborative mass spectrometry innovation center that he directs is well equipped and now has twelve mass spectrometers, which are used to capture cellular chatter (e.g., metabolic exchange), to study metabolomics and metabolism, and to develop methods to characterize natural products. These tools are used to define the spatial distribution of natural products in 2D, 3D, and in some cases real time. Recent research directions include capturing mass spectrometry knowledge to understand the microbiome; non-invasive drug metabolism monitoring; informatics of metabolomics; microbe-microbe, microbe-immune cell, microbe-host, and stem cell-cancer cell interactions; diseased vs. non-diseased model organisms; and the development of strategies for mass spectrometry-based genome mining and for detecting and structurally characterizing metabolites through crowd-sourced annotation of molecular information on the Global Natural Products Social Molecular Networking site, through the NIH-supported center for computational mass spectrometry that is co-developed with Nuno Bandeira. A more detailed biography can be found in this Nature article.

  • Mapping the mass spectrometry chemical space for structure

    Last Updated:

    Chemical space is enormous; however, molecules that are chemically related behave similarly in mass spectrometry. Mass spectrometry produces numerical output, and these numbers can be correlated. In this project we want to build a comparative network to generate a predictive structural classification. If successful, this project will transform therapeutic discovery and the investigation of signalling molecules, and will benefit disease diagnosis.
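
    As a rough illustration of how spectra can be correlated and linked into a comparative network, the sketch below bins each spectrum by m/z, scores pairs by cosine similarity and keeps pairs above a cutoff. It is a minimal example under stated assumptions, not the lab's method: the bin width, the 0.7 threshold, the function names and the toy spectra are all hypothetical.

```python
from itertools import combinations
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin index -> intensity)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(a, b):
    """Cosine similarity between two binned spectra stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_network(spectra, threshold=0.7):
    """Return (name1, name2, score) edges for spectrum pairs scoring >= threshold."""
    binned = {name: bin_spectrum(peaks) for name, peaks in spectra.items()}
    edges = []
    for first, second in combinations(sorted(binned), 2):
        score = cosine_similarity(binned[first], binned[second])
        if score >= threshold:
            edges.append((first, second, score))
    return edges


# Hypothetical toy spectra as lists of (m/z, intensity) pairs.
spectra = {
    "A": [(100.1, 50.0), (150.2, 100.0), (200.3, 25.0)],
    "B": [(100.4, 45.0), (150.0, 90.0), (200.9, 30.0)],
    "C": [(310.0, 80.0), (420.5, 60.0)],
}
edges = build_network(spectra)
for first, second, score in edges:
    print(f"{first} -- {second}: cosine = {score:.3f}")
```

    In this toy input, spectra A and B share all three binned peaks and are connected, while C shares none and stays isolated. A real analysis would need to account for precursor mass, noise filtering and mass shifts between related molecules.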

  • Predictable disease prognosis network interactions of microbial interactions

    Last Updated:

    For every human cell there are 10 microbes. Microbes are important for our metabolism and also for the proliferation and control of diseases. Our lab has been developing imaging mass spectrometry based tools to understand how neighboring microbes control the proliferation of pathogens. In this project we want to come up with a predictive model of the signalling molecules and microbial interactions.

| Salk Institute for Biological Studies

The establishment of cell-type identity and specific gene expression patterns is tightly regulated by the interplay between different modalities of the epigenome. These include DNA cytosine methylation, which affects transcription factors’ binding to regulatory elements, and higher-order chromosome structures bridging the distal regulatory elements to the target genes. Studying their diversity across cell types is fundamental for understanding complex human diseases in different tissue contexts.

  • Characterization of gene regulatory elements using multi-omic data

    Last Updated:

    Characterization of gene regulatory elements using multi-omic data

  • Epigenetic Variation and Inheritance

    The goal of this project is to understand the degree of epigenetic and genetic variation that occurs in Arabidopsis (a reference organism for all plants). The rotation would involve characterization of DNA methylation, transcription and chromatin patterns in hundreds of individual strains collected from around the world. The project is expected to reveal new roles of epigenetic inheritance in many biological processes, some of which affect plant fitness and adaptation to climate.

  • Human ES/iPS Epigenomes

    Our aim is to understand the contribution of the epigenome in reprogramming and differentiation of induced pluripotent stem (iPS) cells. We are using novel sequencing approaches to study reprogramming of the epigenomes of somatic cells to an ES-like state and the degree to which epigenomic reprogramming affects iPSC potential. Computational and experimental studies can be combined in this training experience. This project is a collaboration with Ron Evans' laboratory.

  • Mouse Brain Methylome

    Epigenetic regulation of gene transcription, specifically when related to changes in DNA-methylation patterns (methylome), is a plausible mechanism underlying long-term environmental contributions to neuropsychiatric disorders. For example, pharmacological or environmentally-induced methylome alterations may lead to the silencing or aberrant activation of genes involved in the postnatal maturational process of brain circuitry, leading to functional and behavioral alterations appearing when the system reaches maturity. The proposed rotation project will contribute to creating complete maps of mouse brain methylomes, at the tissue and cell-type levels, during the period of postnatal development until adulthood. The goal is to delineate the methylome changes and transcriptional consequences produced by two developmental manipulations that lead to aberrant behaviors in adulthood, at the neuronal and peripheral tissue level. These maps will serve as reference methylome and transcriptome databases that can be consulted in relation to neuropsychiatric disorders with known and unknown developmental origins. This work is a collaboration with Marga Behrens/Terry Sejnowski.

  • Species comparisons of brain cell types

    Last Updated:

    Computational analysis of multi-omic single cell data from 4 species (mouse/marmoset/macaque/human)

| La Jolla Institute for Immunology

After decades of research, we still do not know why the influenza (flu) vaccine elicits a strong antibody response in some but a negligible response in others. Rather than analyzing each vaccine study individually, our lab combines the wealth of existing data to predict each person's response and then tailors their choice of vaccine to maximally augment their immunity. Influenza is one of the best-studied viruses of all time, yet the models we develop are designed to readily generalize to other pathogens and other biological systems.

  • Dynamics of the vaccine response

    Last Updated:

    Most vaccine studies measure the antibody response pre-vaccination and 1-month post-vaccination, but an ideal response must last for the duration of the influenza season (4 months) and ideally until you get your next vaccine (12 months). We will use pre-vaccination data to predict the response out as far as possible and quantify how additional measurements (e.g. at 1 month) further improve prediction accuracy.

  • Humans Versus Animal Responses

    Last Updated:

    Most influenza surveillance utilizes antibody responses from ferrets (outnumbering the amount of human data by 10-fold). While it is known that ferret responses can differ from human responses, it is not clear when or how their responses will differ. Using ferret data, we will predict the value±error for human experiments, which will help refine influenza vaccine selection (that is currently determined through ferret studies).

  • Incorporating Virus Sequences

    Last Updated:

    The influenza vaccine changes every 1-2 years. While we can separately model each vaccine, we would like to train a model on the combined data from all prior studies. Our current approach describes each virus according to its interactions with antibodies. By adding sequence information, we will develop a single unified model that can predict all past vaccines and explore the space of potential future vaccines.

| Pediatrics

Welcome to the Frazer Lab! We are using two complementary approaches to achieve our goal of identifying and characterizing functional human genetic variants. Our first approach utilizes iPSCORE, a resource that was generated to enable both familial and association-based genetic studies of molecular and physiological phenotypes in induced pluripotent stem cells (iPSCs) and derived cell types. Our second approach involves conducting association studies in well-characterized cohorts with the goal of identifying variants that play roles in human disease and assessing their contributions to disease pathogenesis, progression, and prognosis.

  • Investigate fetal-specific cardiac regulatory variants and their overlap with cardiac GWAS lead variants

    Last Updated:

    We have derived iPSC-CVPCs from 180 individuals and showed that their transcriptomes are more similar to fetal heart than to adult cardiac tissues. Our goal is to leverage these data in combination with WGS to perform eQTL analyses. We plan to assess whether fetal-specific eQTLs are associated with complex adult cardiac traits by colocalizing eQTLs with GWAS summary statistics for cardiac traits. Our preliminary analyses show that eQTLs in iPSC-CVPCs identify cardiac disease GWAS variants that are active in the fetal but not the adult heart, indicating that they play a role in development. Our findings provide genetic evidence supporting the fetal origins of cardiovascular disease hypothesis and highlight the importance of investigating genetic associations across stages of development (i.e. fetal and adult tissues) to fully understand the genetic underpinnings of complex traits and disease. We are looking for rotation students to conduct QTL analyses using large ATAC-seq and H3K27ac ChIP-seq datasets generated from the iPSC-CVPCs.

| Pediatrics

The Gaulton lab studies the effects of human genetic variation on gene regulation and diabetes risk. We use computational and statistical methods to integrate genome sequence information with epigenomic annotation and molecular QTL data.

  • Genetic and epigenomic fine-mapping of diabetes risk loci

    Last Updated:

    This rotation project involves dense genetic fine-mapping of diabetes risk loci, integrating fine-mapping data with large-scale genomic and epigenomic maps using published and novel models to identify causal variants, cell types and networks, and applying these predictive models to identify additional diabetes risk loci. 

  • Predicting causal genes at diabetes risk loci

    Last Updated:

    This rotation project involves development of novel methods for integrating genetic association data with epigenomic annotation, expression QTLs and chromatin QTLs to predict causal genes of diabetes risk variants.

  • Predicting genome-wide pleiotropic effects of diabetes risk variants

    Last Updated:

    This rotation project involves development of novel mixture model approaches to predicting and quantifying the extent of pleiotropy among diabetes risk variants genome-wide.

| Cellular and Molecular Medicine

Dr. Glass’ primary interests are to understand transcriptional mechanisms that regulate the development and function of macrophages. Macrophages play key roles in immunity, wound repair, development and tissue homeostasis. Dysregulation of macrophage functions contributes to a broad spectrum of human diseases, including atherosclerosis, diabetes, neurodegenerative diseases, and cancer. A major effort of the Glass laboratory is to use genomics assays and associated bioinformatics approaches to understand how macrophage gene expression programs are established and how they are influenced by different tissue environments and disease. An important concept to emerge from these studies is that enhancers can be exploited to deduce the transcription factors and upstream signaling pathways that drive context-specific transcriptional outputs. Students are welcome to select projects from current areas of active investigation.

  • Natural genetic variation and macrophage gene expression

    Last Updated:

    Many lines of evidence, including genome-wide association studies, indicate that non-coding genetic variation plays a major role in determining phenotypic diversity. We were among the first laboratories to define the impact of natural genetic variation on enhancer selection and function (Heinz et al, Nature 2013 PMID 24121437), but at present it remains difficult to predict the impact of non-coding variation on gene expression. In a novel and ambitious effort, we systematically characterized the genome-wide patterns of mature RNA (RNA-seq), nascent RNA (GRO-Seq), transcriptional initiation (5’GRO-seq), histone modifications and binding profiles of lineage-determining and signal-dependent transcription factors (ChIP-seq), DNA methylation (bisulfite sequencing), and chromatin conformation (HiC, capture HiC and PLAC-seq), in resting and activated macrophages derived from 5 different inbred strains of mice providing ~60 million single nucleotide variants, ~6 million InDels and several hundred thousand structural variants. This data set provides a unique resource for investigating the impact of non-coding variants on transcription factor binding, enhancer activation and target gene expression. We are currently developing new computational methods for analyses of these data with a goal of explaining effects of non-coding mutations and predicting patterns of gene expression in new mouse strains. Related projects are investigating the relationships of genetic variation between selected mouse strains and their different susceptibilities to metabolic and cardiovascular disease. This general project area is both challenging and open ended, and there is a wide range of directions that rotation projects could take. As examples, recent rotation students have implemented machine learning approaches to investigate how sequence variants affect collaborative binding between lineage-determining transcription factors.

  • Nature and nurture of microglia

    Last Updated:

    Each population of tissue resident macrophages exhibits a distinct pattern of gene expression that is tuned to the developmental and homeostatic needs of that tissue. For example, brain macrophages called microglia produce factors that are trophic for neurons and monitor synapses, functions that require a brain-specific program of gene expression. A key question is how this tissue-specific program of gene expression is achieved. Through analysis of gene expression and enhancer landscapes, we obtained evidence that the microglia-specific molecular phenotype results from instructive signals in the brain that direct the activation of microglia-specific enhancers (Gosselin et al., Cell 2014 PMID 25480297). Of particular interest, delineation of the gene expression patterns and enhancer landscapes of human microglia revealed that a substantial fraction of the genes associated with non-coding GWAS risk alleles are preferentially or exclusively expressed in microglia, and many are brain environment dependent (Gosselin et al. Science 2017 PMID 28546318). These findings raise several important questions that are under active investigation, including what environmental factors dictate the brain-specific program of gene expression and how human genetic variants affect the regulation of genes that are linked to neurodegenerative disease. We are taking a multi-disciplinary approach including studies of in vivo mouse models, in vitro human iPSC-derived microglia, genomic assays of microglia nuclei derived from control and Alzheimer’s disease brains, and direct analyses of the relation of genotype to gene expression in a growing RNA-seq database derived from purified human microglia. As an example, a recent rotation project investigated whether there is any relationship between circulating monocytes (a white blood cell that can differentiate into macrophages in tissues) and microglia gene expression patterns from the same individual.

| Neurosciences (School of Medicine)

Our lab is interested in how the human brain is assembled. We use genetic strategies to explore causes of human disease and then link these genes to function using model systems. We are focused on diseases like mental retardation, epilepsy and autism, where brain development is disrupted. We have two full-time bioinformaticians (a graduate student and a staff programmer) developing computational solutions, database design and machine learning paradigms for large datasets in the area of genome sequencing.

  • Develop dynamic query system for supporting phenotype mining in genetic studies

    Last Updated:

    Rady Children's Hospital San Diego is the second largest children's hospital in the country and has recently adopted an advanced electronic medical record system. The goal of this project is to develop code to search the clinical data archive to identify patients appropriate for clinical research in collaboration with other clinical bioinformatics groups on campus.

    • Integrate hospital electronic medical record with an open-source environment for data mining to create a query system aimed at supporting identification of patients for research.
    • Combine different conditions relative to the electronic medical record data (e.g., the presence of a particular pathology) to identify those most appropriate for study.
    • Develop a multidimensional database to store/retrieve clinical data and update dynamically.
  • Informatics approaches in deciphering exomes

    Last Updated:

    We are in the middle of analyzing exome data from an accruing group of >1500 individuals with brain disorders, and are looking to develop better algorithms to uncover pathogenic mutations.

    • Develop code to sift through the 30 megabases of sequence per patient.
    • Identify novel genes involved in disorders like epilepsy and autism.
    • Develop platforms for storage, visualization, annotation, comparison and recovery of these large datasets.
    • Improve ability to predict patient outcomes and identify "actionable items"

| School of Medicine

Our overall goal is to understand how chromatin structure is employed in making cellular fate decisions, its dynamics, and how it is shaped and maintained by different chromatin regulators (CRs). We merge basic biology, genomics and technology development.

  • Characterize cell cycle associated chromatin dynamics and epigenetic memory

    Last Updated:

    Both DNA sequence and the organization of DNA associated proteins are transmitted during cell divisions. This organization is disrupted during the cell cycle, and the original structure of chromatin must be restored after each cell division, a process termed ‘epigenetic memory’. This project, in collaboration with the Simon and Gymrek labs, focuses on the mechanisms that provide cellular epigenetic memory. We merge cell cycle synchronization methods, automated ChIP-seq and a computational framework to model the temporal dynamics of CRs and histone modifications during the cell cycle and predict critical regulators of cell cycle epigenomic maintenance. We will functionally evaluate these predictions using an inducible degradation system to perturb specific CRs and then follow the dynamics of the histone modifications in the perturbed cells.

  • Study the principles of the genomic CR organization and recruitment

    Last Updated:

    Chromatin structure, the genome-wide organization of DNA-associated proteins, plays a key role in cellular fate decisions. Histone modifications (HMs) are set (‘written’), removed (‘erased’) and bound (‘read’) by enzyme complexes termed Chromatin Regulators (CRs). Multiple recent studies have demonstrated that aberrations in the function of CRs are highly implicated in diseases such as cancer and developmental disorders. In our previous efforts we initiated a systematic study of the genomic organization of CRs. Following up on this, we employ our high-throughput automated ChIP-seq approach to map CRs along differentiation and use the data to computationally identify and predict key developmental CRs. We then test these predictions by perturbing the protein levels of these candidate CRs by inducible degradation and assaying the effects on differentiation.

| School of Medicine

Our overall goal is to understand complex genetic variants that underlie human disease. We are particularly interested in repetitive DNA variants known as short tandem repeats (STRs) as a model for complex variation. Our work focuses on developing computational tools for analyzing and visualizing complex variation from large-scale sequencing data and applying these tools to learn about the contribution of repetitive variation to human disease.

  • Analyzing repeat expansions in human genomes

    Last Updated:

    Tandem repeats (TRs) have been implicated in more than 40 Mendelian diseases, such as Fragile X Syndrome and Huntington's Disease. While a variety of tools can now profile short TRs from next generation sequencing data, these approaches do not immediately expand to longer TRs that cannot be spanned by a single sequencing read. In this project, we will model properties of read alignments at long TRs in order to build a statistical framework to detect expanded repeats in the genome. We will also evaluate the ability of long read (e.g. Nanopore, PacBio) and synthetic long read technologies (e.g. 10X Genomics) to capture long repeats.

  • Genome-wide association studies of short tandem repeats

    Last Updated:

    Genome-wide association studies (GWAS) have uncovered thousands of genetic loci associated with human phenotypes. However, in most cases these loci fail to explain the majority of trait heritability. Importantly, GWAS has focused largely on simple single nucleotide polymorphisms (SNPs), and is therefore likely to miss important contributions from more complex types of variants. In this project we will (1) measure the power of SNP-based GWAS to detect repeat associations, (2) evaluate the ability of haplotype-based tests to identify underlying repeat associations, and (3) use these results to inform a method to perform STR association tests.
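
    To make the idea of an STR association test concrete, the sketch below regresses a quantitative phenotype on repeat dosage with ordinary least squares and reports the slope, its standard error and the t-statistic. This is only one simple form such a test could take, not the method the project will develop. It assumes that dosage is the summed repeat count of an individual's two alleles, ignores covariates and population structure, and uses hypothetical names and toy data.

```python
import math


def linear_association(dosages, phenotypes):
    """Ordinary least squares of phenotype on repeat dosage.

    Returns (slope, standard_error, t_statistic).
    """
    n = len(dosages)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("repeat dosage does not vary across samples")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line.
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, phenotypes))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = slope / std_err if std_err > 0 else math.inf
    return slope, std_err, t_stat


# Hypothetical toy data: summed repeat counts and a quantitative trait.
repeat_dosage = [20, 22, 24, 26, 28, 30]
trait = [1.0, 1.3, 1.4, 1.9, 2.0, 2.4]
slope, std_err, t_stat = linear_association(repeat_dosage, trait)
print(f"slope = {slope:.4f}, SE = {std_err:.4f}, t = {t_stat:.2f}")
```

    Treating the STR as a single numeric dosage is a modelling choice: it tests for a linear effect of repeat length, which a SNP-based test can only capture indirectly through linkage disequilibrium.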

  • Phasing and imputing short tandem repeats

    Last Updated:

    Imputation is a vital step of genome-wide association studies. It leverages the correlation structure in the genome induced by recombination to learn about genome-wide polymorphisms by only genotyping a small subset of variants. While imputation of single nucleotide polymorphisms (SNPs) has proven to be quite robust, statistical phasing and imputation of tandem repeats (TRs) in unrelated samples is challenging, largely because TRs and SNPs have diminished linkage disequilibrium due to the fast mutation rates and high prevalence of recurrent mutations in TRs. The goal of this project is to evaluate phasing and imputation techniques at TRs by leveraging two types of data: (1) family-based data, which allow tracing inheritance patterns to infer phase, and (2) long read and synthetic long read technologies, which allow physical phasing of TRs with nearby SNPs.
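
    The linkage disequilibrium mentioned above can be summarized for a SNP-TR pair as the squared Pearson correlation between SNP genotype and TR length. The sketch below computes that quantity for unphased data; it is an illustration under stated assumptions (SNP coded as 0/1/2 alternate-allele counts, TR coded as the summed repeat count of both alleles, hypothetical toy genotypes), not a description of the lab's pipeline.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation (r^2) between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least 2 paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        # A monomorphic variant carries no information about the other one.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Hypothetical genotypes for six individuals.
snp_genotypes = [0, 0, 1, 1, 2, 2]            # alternate-allele counts
tr_lengths = [20, 21, 23, 22, 25, 26]         # summed repeat counts
r2 = squared_correlation(snp_genotypes, tr_lengths)
print(f"SNP-TR r^2 = {r2:.3f}")
```

    A low value for every nearby SNP suggests that the TR would be hard to impute from SNP genotypes alone, which is the situation the family-based and long-read data are meant to address.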

| Biological Sciences

Our laboratory aims to obtain a quantitative and predictive understanding of how complex biological systems operate and function. We focus on two complementary directions: systems biology analysis to deconstruct natural systems and synthetic biology to build artificial systems analogous to natural systems.

  • Understanding the design principles of gene regulatory networks

    Last Updated:

    Understanding the design principles of gene regulatory networks using systems and synthetic biology approaches. In particular, we will study the function of parallel transcription factor systems in single cells.

  • Understanding the feedback regulation of MAPK system

    Last Updated:

    We will use the MAPK system in yeast as a model to study the function of feedback regulation, in particular, how feedback loops contribute to signal specificity and cellular information encoding. We will combine single-cell imaging analysis, quantitative biochemical assays, and computational modeling in the project.

| Biomedical Informatics

The Oncogenomics laboratory is located in the Moores Cancer Center. Its research program is focused on the identification of genetic and epigenetic markers for cancer prevention and progression as well as drug response. The laboratory is a hybrid laboratory, combining both wet-lab techniques and bioinformatics analysis to study cancer samples from patients and animal models of cancer. The laboratory is also an important partner for multiple principal investigators at the Moores Cancer Center, collaborating on the design, analysis and interpretation of their genomic experiments.

  • Development of Genomics Virtual Machines in HIPAA compliant cloud

    Genetic information is considered protected health information (PHI) and as a consequence the highest security standards need to be applied for its storage, analysis and sharing. The oncogenomics laboratory uses the state-of-the-art iDASH compute cloud for its main computation. As a consequence, we participate in the development of optimal workflows and virtual machines for the analysis of patient-derived genomic datasets such as whole exomes, whole genomes, RNA-seq or genotyping arrays.

    In this project we will develop robust provisioning methods to establish virtual machines capable of running popular human genomic analysis workflows. We will benchmark these machines and workflows and convert some of them into standard recipes for production-grade, reproducible genomic analysis.

  • Genetics and epigenetics of cisplatin resistance

    Last Updated:

    Cisplatin (cDDP) is the most commonly used chemotherapeutic drug, but most cancers eventually become resistant, leading to tumor recurrence. Several biological processes may modulate cDDP sensitivity: drug import, export, detoxification, DNA repair and apoptosis. Drug resistance is transmitted to daughter cells, and one can build up resistant cell lines in vitro using sequential treatments. We are interested in identifying the genetic mutations that mediate this resistance. For this, we have derived resistant cell lines from single clones of a cDDP sensitive ovarian cancer cell line. Using exome sequencing as well as targeted sequencing, we propose to determine mutations in genes and pathways that drive drug resistance. We will then expand the findings to the TCGA samples, using time to recurrence as an indicator of drug sensitivity.

  • The role of inherited variation in cancer somatic landscape

    Last Updated:

    The role of germline or inherited variation in cancer has been studied in selected families and led to the identification of genetic variants that are dominant and responsible for cancer syndromes. Similarly, rare recessive variants with lower penetrance are responsible for the increased risk in breast and ovarian cancer (BRCA1/2). More common variants in the population have also been identified through GWAS, and have revealed multiple SNPs associated with a modest increase in cancer risk. Despite these advances, multiple variants of intermediate allelic frequency in the population, or carried by patients with undocumented family history, still remain variants of unknown significance (VUS) and can still play a role in tumor development. In addition, the contribution of variants located outside of the coding region has been underexplored and can now be reexamined in the light of recent maps of the regulatory landscape. The long-term goal of this research is to utilize germline genetic variation in cancer prevention and care to better stage patients or predict their response to treatment.

    We propose to identify the germline variants in the UCSD Cancer center patients (targeted gene panel) as well as in the public TCGA/ICGC datasets (whole genomes). We will then test these variants, alone or in combination, to identify the ones that impact cancer onset, the tumor somatic landscape or tissue-specific regulatory networks. The project will involve the processing of high throughput sequencing data, population genetics, and statistical analysis, in a HIPAA compliant cloud-computing environment.

| Biological Sciences

  • Modeling the dynamics of gene regulation

    Last Updated:

    This rotation project will involve two questions in the area of modeling the dynamics of gene regulation. We have generated fluorescence data from a system where a promoter is driven periodically with an external inducer and the resulting gene expression has been measured using GFP. The first question involves the deduction of a mesoscopic dynamical model from the data, and how this approach can be generalized to characterize a library of promoters for use in constructing genetic circuits. The second question is how the data and the resulting mesoscopic model can be used to constrain a large set of parameters that define a microscopic model.
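
    As a toy illustration of what a mesoscopic dynamical model of a periodically driven promoter might look like, the sketch below integrates a first-order model dx/dt = alpha * u(t) - gamma * x, where u(t) is a square-wave inducer and x stands for the GFP signal. The model form, the forward Euler integrator and all parameter values are assumptions made for this example; they are not taken from the lab's data or methods.

```python
def square_wave(t, period, duty=0.5):
    """Inducer signal: 1.0 during the 'on' part of each period, else 0.0."""
    return 1.0 if (t % period) < duty * period else 0.0


def simulate(alpha, gamma, period, t_end, dt=0.01, x0=0.0):
    """Forward Euler integration of dx/dt = alpha * u(t) - gamma * x.

    alpha: production rate while the inducer is on (hypothetical units)
    gamma: first-order loss rate (dilution plus degradation)
    Returns (times, values) sampled every dt.
    """
    steps = int(round(t_end / dt))
    times, values = [], []
    x = x0
    for i in range(steps + 1):
        t = i * dt
        times.append(t)
        values.append(x)
        x += dt * (alpha * square_wave(t, period) - gamma * x)
    return times, values


# Hypothetical parameters: inducer switched on and off every 5 time units.
times, values = simulate(alpha=2.0, gamma=0.5, period=10.0, t_end=100.0)
print(f"final value = {values[-1]:.3f}, peak = {max(values):.3f}")
```

    Fitting alpha and gamma to measured fluorescence at several driving periods is one way such a low-dimensional model could summarize a promoter, and the fitted values could then act as constraints on a more detailed microscopic model.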

| School of Pharmacy

  • Bioinformatics of Mass Spectrometry Peptide Spectral Libraries for Identification of Neuropeptides

    Last Updated:

    Identification of small peptides by mass spectrometry is a distinct discipline compared to protein identification. Small ‘peptides’ in biological tissues encompass a large range of amino acid lengths from about 3-40 residues, whereas ‘tryptic peptides’ derived from proteins have a narrow range of length of about 7/8-20 residues. Current mass spectrometry instrumentation and bioinformatics are not well designed for ‘peptides’, but have been designed for ‘proteins’ that are identified by tryptic digests. The bioinformatic challenge in this project is to design new algorithms based on peptide spectral mass spectrometry data for identification of peptides of unusual lengths. This project has tremendous relevance to all biological systems, because we are now missing identification of an entire class of peptides. The computational peptide spectral library has high impact for the biological and biomedical sciences. This project is being conducted as a collaboration with the laboratories of Dr. Vivian Hook and Dr. Nuno Bandeira.

  • Neuropeptidomic Profiles in the Brain and Neurological Diseases

    Last Updated:

    Multiple peptide neurotransmitters are essential for cell-cell communication among neuronal cells of the nervous system for regulation of brain and physiological organ systems. The bioinformatic challenge to address in current neuropeptide research is how to define the profiles of neuropeptides (neuropeptidomes) utilized in health and in diseases of the nervous system. These ‘neuropeptidomes’ will be investigated by nano-LC-MS/MS mass spectrometry with particular attention to organization of bioinformatic tools to optimize peptide identifications. An example of this research was published this year (Gupta et al., 2010). This research is conducted as a collaboration by the laboratories of Dr. Vivian Hook, Dr. Pavel Pevzner, and Dr. Nuno Bandeira.

| Physics

Terence Hwa, Departments of Physics and Molecular Biology

The Hwa lab (a.k.a. the Quantitative Microbiology Lab) uses a combination of experimental and theoretical approaches to elucidate the organizational principles of living systems. The goal is to quantitatively characterize the physiological behaviors and understand how they arise in terms of the underlying molecular interactions. Our lab focuses on the bacterium E. coli, because it is perhaps the best characterized in terms of molecular components and interactions. But we do also study higher organisms together with collaborating labs. Please visit our lab webpage (https://matisse.ucsd.edu) for further information.

  • Quantitative studies of bacterial physiology

    Last Updated:

    An outstanding challenge in making biology quantitative and predictive is how to deal with the millions or even billions of missing parameters that describe the underlying molecular interactions. In recent years, our lab pioneered a top-down approach which exploited a number of phenomenological laws to accurately predict the physiological responses of bacteria to environmental and genetic changes (e.g., nutrients, antibiotics, heterologous protein expression) [DOI: 10.1126/science.1192588]. Furthermore, insight from this quantitative physiological approach is able to pinpoint key missing molecular interactions in long-studied biological processes [DOI: 10.1038/nature12446]. The lab has a number of projects further extending this basic approach to a variety of problems in microbiology, including growth transitions, stress response, antibiotic resistance, and biofilm formation.

| Psychiatry

The lab has a variety of bioinformatics projects aimed at improving understanding of the functional impact of autism risk mutations derived from exome and whole genome sequencing of patients. We created mouse models carrying some of these mutations using CRISPR/Cas9, and also produced patient-derived cerebral organoids with autism risk mutations. We performed bulk RNA-seq from various brain regions or time periods in these models. Gene-level analyses of the RNA-seq data have been completed (manuscripts in preparation). We are now pursuing isoform-level analyses of these data to better understand the functional impact of autism risk mutations on the splicing isoform transcriptome.

  • Isoform transcriptome of Cul3-HET mouse model

    Last Updated:

    The project deals with constructing isoform-level co-expression and protein interaction networks for predicting the functional impact of mutations in the high-risk autism gene Cul3. We have collected RNA-seq and TMT-proteomics data from various brain regions of Cul3+/- transgenic mice. We aim to integrate isoform-level RNA-seq data with quantitative proteomic (peptide-level) data from the same samples to understand the impact of Cul3 mutation.

  • Isoform transcriptome of patient-derived cerebral organoids from 16p11.2 CNV carriers with autism

    Last Updated:

    Copy number variants (CNVs) represent significant risk factors for Autism Spectrum Disorders (ASD). One of the most frequent CNVs involved in ASD is a deletion or duplication of the 16p11.2 CNV locus, spanning 29 protein-coding genes. Despite progress in linking 16p11.2 genetic changes with phenotypic abnormalities (macrocephaly and microcephaly) in patients and model organisms, the specific molecular pathways impacted by this CNV remain unknown. We generated bulk RNA-seq and TMT proteomic data from patient-derived cerebral organoids (3 deletion, 3 duplication and 3 control patients). The goal of the project is to analyze the isoform-level RNA-seq data, as well as the proteomics data, to investigate the functional impact of the 16p11.2 CNV.

| Division of Medical Genetics

The overall objective of the Ideker Laboratory is to develop an artificially intelligent model of the cell able to translate a patient's data into precision diagnosis and treatment. For this purpose, we run an experimental facility for mapping the gene and protein interaction networks that govern eukaryotic cell biology, with a focus on pathways underlying cancer and neurodevelopmental disorders.

Our major bioinformatics challenge is to model how cells process information from genotype to phenotype. Towards this goal, we develop machine learning methods that attempt to learn cell structure and function directly from genome-scale datasets.

However, much remains to be done before we have a cell model capable of making robust predictions about patients. A recent breakthrough on this front was our creation of the D-Cell, a deep neural network modeling the inner workings of a eukaryotic cell. We are also developers of Cytoscape, a popular platform for visualization and modeling of biological networks which is supported by a consortium of many labs including our own (https://cytoscape.org/).

  • Computing a minimal set of genes required for life

    Last Updated:

    A long-standing question in biology is how many (and which) genes are required for life. This essential core set of genes, or minimal genome, makes up the cell's “life support system” or “chassis and power supply” on which more complex functions and processes are built. This set of genes is of keen interest in the field of synthetic biology, which aims to synthesize the complete minimal genome of an organism and add functions to this genome for biotechnological, pharmaceutical and agricultural ends. This project will attempt to use our whole-cell model of the networks and pathways in a cell to predict which genes and gene combinations are essential for life and, conversely, which genes and gene combinations can be removed. If successful, this project will be able to predict minimal genomes for synthesis and testing. It will also address whether there actually is a single “minimal genome” or whether there exist many different configurations, all of which are near or at the global minimum.

    Prerequisites: Computer programming or scripting skills.
    Optional: Experimental laboratory skills, which would allow the student to test model predictions.

  • Development of a software pipeline for generating cell function hierarchies from genomic data

    Last Updated:

    We have developed algorithms (NeXO and CliXO) by which systematic datasets are used to organize genes into a gene ontology, reflecting the hierarchical organization of cellular structures and molecular pathways in the cell. Currently these algorithms are coded in Python; however, a user-friendly and expandable interface would allow end-users to quickly build and update gene ontologies from new data sets. Coding this interface is the main goal of this rotation. If successful, this tool could seed a thesis project to construct a gene ontology for a particular cellular process (e.g. DNA damage response) or disease (e.g. cancer) of interest.

    Prerequisites: Computer programming or scripting skills; some knowledge of genomic biology.

  • Experimental mapping of the DNA damage response

    Last Updated:

    Cell colonies on agar grow in a near-linear fashion, with growth rates reflective of their "fitness". The laboratory has developed an experimental platform that can make continuous measurements of growth rates via time-lapse image capture of thousands of specific genetic mutant strains, enabling us to determine the relevance of every gene in the response to stimuli such as DNA damage via radiation or chemotherapy. During the rotation, the student will grow ~50,000 cell colonies in parallel under intermittent radiation exposure and capture their growth curves using digital images. The project includes working in MATLAB for the analysis of growth curves and the elucidation of DNA damage response pathways. If successful, the project could be developed into a thesis which uses these data to construct a hierarchical model of DNA damage responses.

    Prerequisites: Prior experience in a genetics or biochemistry experimental laboratory.
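
    As a rough illustration of the growth-curve analysis step, the sketch below estimates a colony growth rate as the least-squares slope of colony size against time, relying on the near-linear growth described in this project. It is written in Python rather than MATLAB purely for illustration; the function names, the toy measurements, and the use of a ratio of slopes as a relative fitness score are all assumptions, not the lab's actual pipeline.

```python
def fit_growth_rate(times, sizes):
    """Return the least-squares slope of colony size against time."""
    n = len(times)
    if n != len(sizes) or n < 2:
        raise ValueError("need at least two paired measurements")
    mean_t = sum(times) / n
    mean_s = sum(sizes) / n
    # Sum of squared deviations in time, and time/size cross-deviations.
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, sizes))
    return sxy / sxx


def relative_fitness(mutant_rate, reference_rate):
    """Hypothetical fitness score: mutant growth rate over reference rate."""
    if reference_rate == 0:
        raise ValueError("reference growth rate must be non-zero")
    return mutant_rate / reference_rate


# Toy data (assumed): colony size in arbitrary pixel units at each time point.
hours = [0, 2, 4, 6, 8]
wild_type_sizes = [10.0, 14.0, 18.0, 22.0, 26.0]
mutant_sizes = [10.0, 11.0, 12.0, 13.0, 14.0]

wt_rate = fit_growth_rate(hours, wild_type_sizes)
mut_rate = fit_growth_rate(hours, mutant_sizes)
fitness = relative_fitness(mut_rate, wt_rate)
print(f"wild type: {wt_rate:.2f}/h, mutant: {mut_rate:.2f}/h, relative fitness: {fitness:.2f}")
```

    With real time-lapse data, the same fit would be repeated per colony, and a treated-versus-untreated comparison of slopes could serve as one possible readout of sensitivity to DNA damage.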

  • Using a hierarchical cellular model to analyze tumor genetic mutations

    Last Updated:

    The student will explore whether a hierarchical model we have recently constructed for predicting growth of simple cells can be translated to predict aggressiveness of human cancer. The model will be provided, along with access to tumor exomes from both public and internal sources. The goal is to determine, over a 10-week rotation, whether and to what extent the model can be used to analyze a patient's exome. If so, this project could be readily developed into a PhD thesis.

    Prerequisites: Computer programming or scripting skills; some knowledge of genomic biology.

| Physics

We are interested in fundamental questions in biology, and we bring tools and ways of thinking from the physical sciences and engineering. Researchers in our lab are proficient in both experiment and theory.

  • Single-cell Physiology

    Last Updated:

    We will use the “mother machine,” a microfluidic continuous cell culture device developed in our lab, to follow thousands of individual cells for hundreds of consecutive generations. We will analyze the images using high-throughput image analysis methods that use advanced algorithms. We will focus on the single-cell physiology in the context of cell size control, cell cycle control, and cell death.

| Biological Sciences

Cells must continuously maintain integrity and compartmentalization while meeting demands for cellular remodeling throughout development, immunity, aging and disease. Using functional genomics, genetics and cell biological approaches in the fruit fly, Drosophila, we are studying the central roles of membrane regulation in dynamic cell structure. We have identified novel endocytosis and autophagy membrane trafficking pathways that control macrophage and muscle remodeling, with relevance to human disease. Current projects in the lab aim to discover new mechanisms of cellular remodeling through functional genomic and proteomic approaches, and to better understand the pathway networks and dynamics during cellular remodeling.

  • Autophagy protein and functional networks

    Last Updated:
    • Expand on our ongoing co-immunoprecipitation and mass spectrometry datasets to identify protein-protein interactions involved in autophagy.
    • In collaboration with the Ideker lab and the SDCSB Network Assembly Core, analyze coIP results and incorporate functional data into an ‘autophagy network’.
    • Test new insights predicted from the network by in vivo autophagy assays.
    • Future directions include a hierarchical analysis of dynamic protein-protein interactions in an autophagy timecourse, and RNAi screens of the autophagy network and for new autophagy gene functions.
  • High-throughput image analysis of cell morphology

    • In collaboration with the Tsimring lab at the BioCircuits Institute, optimize newly developed machine learning image analysis algorithms to quantify cell shape and cell shape changes.
    • Conduct new RNAi screens to test optimized image analysis algorithms, and employ established methodology to identify new and modifying (enhancer/suppressor) gene functions in cellular remodeling.
    • Future directions for long-term projects include use of successful image analysis algorithms in other applications, development of related methods for dynamic analysis of cell shape changes, and/or the analysis of large-scale RNAi screen image data into functional networks.

| Pediatrics

The Knight lab has broad interests in the human microbiome, the collection of trillions of microbes that inhabits our bodies, especially in developing techniques to read out these complex microbial communities and use the resulting data to understand human health, links between humans and the environment, and to prevent and cure disease. We offer a fast-paced environment with many collaborative opportunities on different projects.

  • Machine Learning for the Microbiome

    Last Updated:

    We have amassed a database of microbial DNA sequences from hundreds of thousands of biological specimens. Understanding how differences among these microbial communities relate to disease requires a range of machine learning and multivariate statistical approaches. There are many opportunities ranging from entry-level (benchmarking classifier performance on specific sample sets) to extremely challenging (using deep learning to infer the structure of global sample set relationships).

  • Multi-omics integration

    Last Updated:

    An increasing need is to integrate data from different "omics" levels, e.g. genomes, metagenomes, metatranscriptomes, metaproteomes, metabolomes, immunological profiling, etc., into a single coherent picture separating healthy and disease states. Improved methods for performing this task, either directly or via intermediate representations such as mapping to metabolic and regulatory pathways, are essential for improving understanding. Projects in this category range from simple (testing where existing techniques like correlation networks or Procrustes analysis do or do not connect two specific data layers) to challenging (using transfer learning to integrate heterogeneous data layers and improve the underlying network annotation). An especially exciting emerging research direction here is XAI (explainable artificial intelligence), which can provide a better justification for a specific classification or suggestion in clinical applications.

  • Optimizing microbiome algorithms

    Last Updated:

    Many algorithms used in microbiome studies, especially in metagenomic assembly, are extremely computationally expensive. Opportunities exist either for exploiting new hardware architectures to accelerate existing algorithms or for developing new approximate algorithms to tackle problems in the workflow, including inferring taxonomy and function from DNA sequence data, genome and metagenome assembly and annotation, computing community distance metrics from sparse compositional data, and high-level analyses of hundreds of thousands of microbiomes. Again, these projects range from entry level (comparing results of two multiple sequence alignment techniques for subsequent community analysis) to advanced (using non-von Neumann architectures to perform pattern classification in real time at the whole community level for disease detection).
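
    To make "community distance metrics from sparse compositional data" concrete, here is a minimal sketch of one widely used metric, Bray-Curtis dissimilarity, computed directly on sparse per-sample count dictionaries. The choice of Bray-Curtis, the dictionary representation, and the sample contents are assumptions for illustration only; the project description does not specify which metrics or data structures are used.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two sparse {taxon: count} samples.

    Only taxa present in a sample are stored, so the cost scales with the
    number of observed taxa rather than the size of the full taxonomy.
    """
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    # Iterate over the smaller sample; absent taxa contribute zero overlap.
    if len(sample_a) > len(sample_b):
        sample_a, sample_b = sample_b, sample_a
    shared = sum(
        min(count, sample_b[taxon])
        for taxon, count in sample_a.items()
        if taxon in sample_b
    )
    return 1.0 - 2.0 * shared / total


# Toy samples (assumed counts, not real data).
gut = {"Bacteroides": 60, "Prevotella": 30, "Akkermansia": 10}
skin = {"Staphylococcus": 70, "Bacteroides": 20, "Prevotella": 10}

distance = bray_curtis(gut, skin)
print(f"Bray-Curtis dissimilarity: {distance:.2f}")
```

    The value is 0 for identical samples and 1 for samples with no taxa in common. Computing it for every pair among hundreds of thousands of samples is quadratic in the number of samples, which is one reason approximate or hardware-accelerated approaches are of interest.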

| School of Medicine

Research in the Kolodner lab is primarily directed at studying the genetic and biochemical mechanisms of genetic recombination, DNA repair and suppression of spontaneous mutations, mainly using Saccharomyces cerevisiae as a model system. Work in S. cerevisiae falls into two interrelated areas: 1) the analysis of the proteins and genes that function in DNA mismatch repair; and 2) elucidation of the pathways that prevent translocations and other types of gross chromosomal rearrangements, and the analysis of the proteins that function in these pathways. We also investigate the genetics of cancer susceptibility and development, following on previous studies showing that a common cancer susceptibility syndrome, Lynch Syndrome (hereditary non-polyposis colorectal cancer), is due to inherited defects in DNA mismatch repair genes. This work is focused on understanding whether genes that prevent genome instability act as tumor suppressor genes in mice and humans and whether it is possible to develop therapeutics that target the genome instability seen in cancer.

  • Analysis of synthetic lethality in mammalian cells

    Last Updated:

    A number of genetic defects that cause increased cancer susceptibility, such as mutations in the breast cancer susceptibility gene BRCA2, cause defects in DNA repair and lead to increased genome instability. We have used yeast to identify genes that, when mutated, cause increased genome instability, and have also identified other genes, so-called synthetic lethal genes, in which mutations specifically kill cells that contain a mutation causing increased genome instability. We are using RNAi knockdown and chemical genetic approaches with human tumor cell lines containing defined genetic defects to determine if the same synthetic lethal relationships can be demonstrated in human tumor cells. This project will provide experience in basic molecular biology, yeast genetics and mammalian genetic approaches.

    Sakai, W, Swisher, EM, Jacquemont, C, Venkatapoorna Chandramohn, K, Couch, FJ, Langdon, SP, Wurz, K, Higgins, J, Villegar, E, and Taniguchi, T. Functional restoration of BRCA2 protein by secondary BRCA2 mutations in BRCA2-mutated ovarian carcinoma. Cancer Res. 2009;69:6381-6.

  • High throughput yeast genetics analysis of pathways that prevent genome instability

    Last Updated:

    We have developed assays that allow us to measure the rates of accumulation of genome rearrangements such as translocations in the yeast Saccharomyces cerevisiae as well as different assays that allow us to measure DNA damage responses that underlie the formation of genome rearrangements, such as a fluorescence based assay for phosphorylation-dependent degradation of Sml1 that can be followed microscopically. We have adapted these assays for use in a high-throughput robotic mating scheme against a selected subset of the yeast deletion collection to systematically generate single and double mutants containing these assays. Analysis of these mutants will allow elucidation of the genetic networks that prevent genome instability as well as those that function in the DNA damage response. Variations on this project include performing a global mutant analysis, investigating specific pathways such as double strand break repair and their interacting pathways, and bioinformatics analysis of the genetic data for network construction and using next generation sequencing methods for characterizing the structure of genome rearrangements. This project will provide experience in basic molecular biology, yeast genetics, basic cell biology methods, bioinformatics network construction, next generation DNA sequencing and the use of robotic methods.

    Putnam, CD, Hayes, TK, and Kolodner, RD. Specific pathways prevent duplication-mediated genome rearrangements. Nature. 2009; 460:984-989.

| Reproductive Medicine

Our lab applies our expertise in human pluripotent stem cell research and genomics to understand the molecular mechanisms underlying normal and abnormal human development, in order to improve the health of mothers and babies.

  • Discovery of biomarkers of pregnancy complications

    Last Updated:

    Using extracellular RNA, proteomic, and metabolomic analysis of maternal blood, we are discovering biomarkers associated with a variety of pregnancy complications, including preeclampsia, preterm birth, and fetal growth restriction.

  • Discovery of regulatory network driving cellular differentiation

    Last Updated:

    Integrative genomic approaches, including DNA methylation and chromatin analyses, as well as bulk and single-cell RNA sequencing, are used to study the stepwise progression of human pluripotent stem cells during differentiation to the pancreatic, hepatic, and placental lineages.

| Salk Institute for Biological Studies

In eukaryotic organisms, DNA is organized into chromatin, a combination of DNA and specialized packaging proteins called histones, which serves as a backdrop for a vast array of essential cell biology including DNA replication, transcription, recombination and repair. However, chromatin is not uniform. An extra layer of information is added at precise genomic locations via the covalent attachment of various chemical tags, termed chromatin modifications, to DNA and/or histones. These modifications alter the landscape of the genome and play critical roles in chromatin biology by communicating with the rest of the cellular machinery through mechanisms that remain poorly understood. Gaining insight into this aspect of chromatin function will be critical not only in understanding normal biological processes but also in understanding how alterations in chromatin modifications can lead to developmental defects and disease. To begin investigating these complex and diverse facets of chromatin, my lab will focus on the characterization of proteins, termed “chromatin readers”, which bind specific DNA or histone modifications and thus offer an excellent tool to begin dissecting the events occurring downstream of chromatin modifications. The plant model Arabidopsis thaliana provides an ideal system to study the effects of chromatin modifications: it is genetically malleable, highly amenable to genome-wide analyses and tolerant of dramatic changes in its chromatin landscape. Furthermore, many of the pathways involved in the establishment, maintenance, and removal of chromatin modifications are conserved between plants and mammals. Thus, understanding how chromatin modifications influence cellular processes in plants will also be informative for mammalian systems.

  • Integrating chromatin readers, the epigenetic landscape, and specific cellular outputs

    Last Updated:

    The genome-wide distribution of chromatin binding proteins is influenced by many factors including the local chromatin environment, other chromatin modifications, and the underlying DNA sequence. Once recruited to chromatin, such proteins often further modify chromatin structure and function, setting into motion a complex and dynamic interplay between chromatin and specific cellular machinery that is required to orchestrate diverse biological processes. We are interested in integrating these genomic and epigenomic features, as assessed using next generation sequencing technology, within the context of specific chromatin readers to gain insight into their biological functions on a genome-wide scale. Several projects along these lines are available.

| School of Medicine

Our goal is to identify genes causing insulin resistance in humans in order to find new therapeutic targets for diabetes and cardiometabolic diseases. Our approach to discovery is grounded in human genetics, clarified through systematic, high-throughput experimentation in human cells, and calibrated by its relevance to clinical disease. We use massively parallel genome engineering to re-create mutations identified in patients and develop high-throughput assays to interrogate function in human cell models. We apply bioinformatics and statistics to make sense of these data, integrating 1) human mutations, 2) cellular function, and 3) metabolic/glycemic phenotypes of the individuals who harbor them. Using this approach, we have discovered novel missense mutations that greatly increase risk for type 2 diabetes. As a complementary aim towards precision medicine, we develop tools for clinical genome interpretation powered by high-throughput experimental data.

  • Evaluating accuracy and clinical utility of commercially available genetic risk scores for diabetes

    Last Updated:

    Recently, 23andMe, which sells direct-to-consumer genetic testing products, introduced a diabetes risk report based on single nucleotide polymorphism (SNP) genotypes measured in their commercial product ($199: https://www.statnews.com/2019/03/10/23andme-will-tell-you-how-your-dna-affects-your-diabetes-risk-will-it-be-useful/). The clinical utility of this report is unclear and has generated significant controversy. Critically, 23andMe’s SNP-chips only test about 0.02% of the human genome. We have shown in previous work that a single rare SNP, not captured by SNP-chips, can change an individual’s risk of diabetes by 7-fold. The purpose of the project is to test the 23andMe diabetes report output in a dataset of individuals whose diabetes status is known and who have also undergone more extensive genome sequencing (whole exomes), in order to assess the accuracy of direct-to-consumer SNP tests and quantify the number of falsely reassuring tests when more complete genetic information is considered.
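
    One way the evaluation described here could be tallied is sketched below: sensitivity and specificity of the SNP-based report against known diabetes status, plus a count of "falsely reassuring" reports. The record fields, the toy cohort, and the working definition of falsely reassuring (report not elevated, yet sequencing finds a rare high-risk variant) are assumptions made for this example, not the lab's definitions or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    snp_report_elevated: bool   # SNP-based report flags elevated risk
    rare_variant_carrier: bool  # rare high-risk variant found by sequencing
    has_diabetes: bool          # known clinical status


def ratio(numerator, denominator):
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def summarize(cohort):
    """Tally report accuracy and falsely reassuring reports for a cohort."""
    tp = sum(p.snp_report_elevated and p.has_diabetes for p in cohort)
    fn = sum(not p.snp_report_elevated and p.has_diabetes for p in cohort)
    tn = sum(not p.snp_report_elevated and not p.has_diabetes for p in cohort)
    fp = sum(p.snp_report_elevated and not p.has_diabetes for p in cohort)
    # Assumed definition: reassuring report despite a rare high-risk variant.
    falsely_reassured = sum(
        not p.snp_report_elevated and p.rare_variant_carrier for p in cohort
    )
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "falsely_reassured": falsely_reassured,
    }


# Toy cohort (invented values for illustration only).
cohort = [
    Individual(True, False, True),
    Individual(False, True, True),
    Individual(False, False, False),
    Individual(True, False, False),
    Individual(False, True, False),
    Individual(False, False, True),
]

summary = summarize(cohort)
print(summary)
```

    In a real analysis the counts would be accompanied by confidence intervals, and the threshold for calling a report "elevated" would need to follow whatever the report itself defines.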

  • Identifying discriminators of drug-responsive mutations in Mendelian diabetes

    Last Updated:

    Loss-of-function mutations in hepatocyte nuclear factor 1 alpha (HNF1A) cause an autosomal dominant form of diabetes, maturity-onset diabetes of the young type 3 (MODY3). Patients with MODY3 are clinically difficult to distinguish from patients with autoimmune type 1 diabetes and are therefore often given the same treatment, consisting of multiple daily injections of insulin. However, MODY3 patients can be effectively treated with a single daily tablet of sulfonylureas and thus spared from having to take multiple daily injections. This project aims to utilize data generated from cells engineered to express a range of HNF1A mutations (MODY3 and non-MODY) followed by RNA-sequencing to identify a signature of genes that can distinguish between sulfonylurea-responsive and non-responsive mutations. This transcriptomic signature would form the basis of a biomarker test in patients with HNF1A mutations to predict their responsiveness and provide the most effective, least burdensome treatment.

  • Integrative genomics to identify a novel disease-causing mutation in Simpson-Golabi-Behmel Syndrome (SGBS)

    Last Updated:

    SGBS is characterized by overgrowth of multiple body parts. It is a rare genetic disease that has been attributed to inactivating mutations in GPC3. We have stem cells from a patient with SGBS but no GPC3 mutation, implicating another, as yet unknown, causal gene. We have performed whole genome sequencing and RNA sequencing on these cells. The goal of this project is to use these genomic data sets to create a “short list” of candidate causal genes, which can then be assessed experimentally in the patient cells using genome engineering.

| Bioengineering

  • Analysis of gene network perturbations in human cells

    Last Updated:

    In this project we will develop new methods for perturbation and analysis of gene networks in pluripotent stem cells using the CRISPR-Cas systems. The student will have an opportunity to understand the experimental procedures for human genome engineering, to learn the state-of-the-art methods for bioinformatics analysis ranging from computational reagent design to next generation sequencing to network modeling, and to also develop innovative strategies for enhanced multiplexed reverse genetic screens in human cells.

| Chemistry and Biochemistry

The McCammon group conducts a very wide range of research activities, from the deeply biological (studies of protein and nucleic acid targets for drugs for infectious diseases, studies of protein kinase regulation, etc.) to the development of mathematical and physical methods for simulating biological processes (development of methods for solving partial differential equations, exploring the role of hydrodynamic interactions in protein-protein association, etc.). All of this work involves the use of computers; we do no experimental work in the traditional sense, but we have extensive collaborations with experimental labs at UCSD and elsewhere. A more complete perspective can best be obtained by visiting the McCammon group website (https://mccammon.ucsd.edu/). We co-mentor graduate students who are affiliated with other UCSD groups. We welcome undergraduate research participants when space allows, as described on our website.

  • Computer-aided Drug Discovery

    Last Updated:

    Current drug discovery projects are targeting SARS-CoV-2, TB and other pathogens.

| Bioengineering

  • Systems Biology of Hypoxia Tolerance and Susceptibility

    Last Updated:

    This project combines bioinformatics and systems biology to design experiments, analyze high-throughput/high-content data (on DNA sequence, gene expression, protein-DNA interactions and metabolomics), reconstruct network models and develop new hypotheses on the molecular mechanisms of hypoxia susceptibility and tolerance in Drosophila, mice and humans. This project includes the opportunity to interact with investigators from several labs and to participate in experimental studies and data acquisition as well as performing systems biology analysis. Individual rotation projects will address smaller questions within this overall theme based on the specific interests of the student.

| Salk Institute for Biological Studies

The McVicker laboratory aims to understand how chromatin state and organization are encoded by the human genome. Our approach to this problem is to exploit naturally occurring human genetic variation to identify sequence variants that disrupt chromatin function. We are currently focused on chromatin within immune cells and we are also interested in how variants that affect chromatin and gene regulation lead to disease risk. The problems that we work on often require the development of sophisticated computational and statistical methods that can extract subtle signals from noisy experimental data.

  • A genetic approach to identifying enhancer-promoter interactions

    Last Updated:

    Enhancers activate genes from a distance, but little is understood about how they find the correct promoters to regulate. Enhancers will often activate non-adjacent genes and 'skip over' the promoters of nearby genes. The goal of this project is to develop a method to identify genetic variants that jointly affect enhancer activity and transcription from promoters (i.e. joint enhancer-promoter QTLs). This will allow us to determine which enhancers regulate which genes and learn about the sequences that are involved in enhancer-promoter interaction.

  • Discovery of genetic variants that disrupt enhancer-promoter looping

    Last Updated:

    Enhancer-promoter interaction is believed to occur through chromatin looping. The goal of this project is to identify genetic variants that disrupt chromatin looping between enhancers and promoters using Hi-C experimental data.

  • Using chromatin to identify important cell types for Systemic Lupus Erythematosus (SLE)

    Last Updated:

    SLE is a complex autoimmune disease that is poorly understood. The goal of this project is to use cell-type specific chromatin marks and GWAS data to determine which cell types are most relevant to this disease. This project will involve development and/or application of methods that can correctly take into account linkage disequilibrium and sub-genome-wide significant GWAS hits to identify cell-type specific enrichments of GWAS signals. SLE is far more common in women than in men, and it may be possible to use sex-specific chromatin accessibility or marks (depending on what samples are available).

| Bioengineering

Metabolism touches virtually all aspects of biology, and defects in the regulation of biochemical reactions often contribute to disease pathogenesis. In the context of cancer, tumors must reprogram these networks to fuel their aggressive growth and survive. The complexity and interconnected nature of metabolic pathways necessitates the use of systems biology approaches to characterize their function. Our lab develops and applies quantitative methods to study metabolic regulation in cancer, stem cells, and the cardiovascular system. These data-driven metabolic flux analysis (MFA) approaches involve stable isotope tracers, mass spectrometry, and computational algorithms that analyze data as a system. We use these techniques to identify metabolic dependencies in cancer cells which are driven by tumor genetics or the surrounding microenvironment.

  • Metabolic flux profiling in cancer

    Last Updated:

    Systems biology rotation projects are available to probe various metabolic pathways in cancer cells, including carbohydrate, fatty acid, and amino acid metabolism as well as redox pathways. Students will have the opportunity to build network models and generate MFA data in engineered cancer cells.

| Electrical and Computer Engineering

Our lab specializes in reconstruction of evolutionary histories (phylogenies) from large-scale datasets and applications of phylogenetic analyses to downstream analyses. Large-scale datasets include those with many genes and those with many species, and we focus on high accuracy and scalability at the same time. Many projects in this area are available, some of which are described below, but students can contact me to start on other projects as well.

  • Multiple sequence alignment

    Last Updated:

    Developing methods for computing a consensus among large numbers of large multiple sequence alignments using the concept of an equivalence class.

  • Reconstruction of species trees from gene trees

    Last Updated:

    Several projects are available, with different emphases. Other projects in this general area can also be defined based on student interest.

    1. Improving ASTRAL (an algorithm for species tree reconstruction from gene trees) to handle more varied datasets, to improve scalability as the number of genes increases, and to give better theoretical analysis of the algorithm. An HPC implementation is also of interest.
       
    2. Testing ASTRAL for gene trees that include duplication and loss events in addition to incomplete lineage sorting.
       
    3. Re-analyzing a set of biological datasets using recently developed methods, and comparing their empirical performance.

| Psychiatry

We are interested in the relationship between genes and behavior. By identifying genes that influence behavior we hope to obtain fundamental mechanistic insights into the molecular basis of both health and disease. Our research uses mice, rats and humans in pursuit of these goals.

  • Analysis of rat genetic and genomic data

    The Palmer lab hosts a NIDA-funded National Center of Excellence to examine the role of genetic differences in a variety of complex rat behaviors related to drug abuse (www.ratgenes.org). As of March 2020, we have phenotyped and genotyped approximately 7,000 rats as part of this project, and have secured funding to collect data from an additional 8,000 rats. In addition, we have previously collected similar data from more than 4,000 rats from a different population as well as several thousand mice, and are now in the process of collecting similar data on over 5,000 zebrafish. In addition to genotypes at millions of SNP markers and complex behavioral observations spanning weeks to months, we are also examining RNASeq (including single cell RNASeq), epigenetic, microbiome and metabolomic data from these populations. The analysis and integration of these data, and the development of new analysis methods, create numerous opportunities for bioinformatics projects and interactions with the bench scientists who are creating these data. We are also collecting human genetic data and are exploring various methods for integrating human and non-human datasets.

| Bioengineering

In the Systems Biology Research Group (SBRG), we study the complexity of cellular life using experimental and computational methods that span the genome to the phenotype. Developing, integrating, and applying new methods allows us to gain a systems view of life – from bacterial to human.

  • Method development and quantitative data analysis for understanding cellular physiology

    Last Updated:

    Projects are regularly available for developing novel systems biology methods for analyzing cellular physiology. Key application areas include cellular resource allocation, principles of evolutionary optimization, designing microbial cell factories, and human health applications in both bacterial and mammalian systems [doi:10.1038/nrg3643]. Additionally, our lab runs experiments using automated adaptive laboratory evolution (ALE) to observe bacterial evolution in real-time [doi:10.1093/molbev/msu209]. A number of research projects are available for new students to participate in ALE experiments and analyze strains optimized by ALE.

| Computer Science and Engineering

  • Error-correction in next generation sequencing data: applications to structural variations

    Last Updated:

    The recent emergence of high-throughput sequencing technologies has revolutionized genomics by providing a new wealth of data for biologists to learn from. It has facilitated our ability to discover the genomic sequence of novel species through a process called sequence assembly. However, even for humans, whose genome has already been assembled, different individuals have slightly different genomes, accounting for a myriad of important phenotypic differences. Discovering this variation is often the starting point for finding the underlying genetic causes of diseases, and has, among other things, improved our abilities for early detection of cancer. Despite these exciting applications, the growth of sequencing technology has outpaced the development of downstream algorithms. Many assembly and variation discovery projects have been done without properly addressing a key property of the data: it is prone to errors and is not always reliable. Algorithms for error correction have been inadequate in many assembly projects and simply non-existent in most variation discovery projects. Accurate error-correction algorithms will have a wide effect on the quality of downstream algorithms, since they correct potential mistakes at the very early stages.

    The proposed project is to design an effective error-correction algorithm (with an eye towards variation detection and assembly). The design of such an algorithm is a challenging problem, and though we have general ideas, most design decisions remain to be made. The project therefore requires a motivated student with a superb understanding of algorithms and a good handle on programming. Since the datasets for this problem are hundreds of gigabytes in size, an understanding of data structures is crucial. Some background in either machine learning or graph theory is a plus, though not required. No biological knowledge is required.
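    As a rough illustration only (this is not the lab's algorithm, whose design the description leaves open), one common starting point for read error correction is the k-mer spectrum heuristic: k-mers that occur rarely across all reads are suspected to contain errors. The Python sketch below assumes short reads held in memory as plain strings; the function names and the parameters `k` and `min_count` are hypothetical choices, and datasets of hundreds of gigabytes would need far more compact data structures than a dictionary of strings.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer across all reads (the 'k-mer spectrum')."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(read, counts, k, min_count):
    """Start positions of k-mers seen fewer than min_count times."""
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < min_count]


def correct_read(read, counts, k, min_count):
    """Try every single-base substitution and keep the one that leaves
    the fewest weak k-mers. Illustrative heuristic only: it handles at
    most one substitution per read and ignores indels."""
    best = read
    best_weak = len(weak_positions(read, counts, k, min_count))
    if best_weak == 0:
        return read
    for pos in range(len(read)):
        for base in "ACGT":
            if base == read[pos]:
                continue
            candidate = read[:pos] + base + read[pos + 1:]
            weak = len(weak_positions(candidate, counts, k, min_count))
            if weak < best_weak:
                best, best_weak = candidate, weak
    return best


# Toy data (made up): five error-free copies and one read with a G->A error.
reads = ["ACGTACGGTC"] * 5 + ["ACGTACGATC"]
counts = count_kmers(reads, k=4)
print(correct_read("ACGTACGATC", counts, k=4, min_count=2))  # ACGTACGGTC
```

    A threshold this simple cannot tell a sequencing error from a genuine rare variant, which is exactly the tension between error correction and variation detection that the project description raises.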

    The successful completion of this project will ideally lead to a publication in a leading conference or journal. More importantly, it is a good stepping stone into understanding the field of assembly and variation detection and into other related projects in Pavel Pevzner’s lab.

| Pediatrics

  • RNA regulation of immunity

    Last Updated:

    RNA epigenetics or epitranscriptomics is an emerging field focused on chemical modifications in RNA. We are interested in understanding how RNA modifications affect the immune system during viral infections, vaccine development, immunotherapy, and in cancer. We employ in vivo models as well as non-human primates and human tissues to investigate genetic and epigenetic mechanisms of multiple disease states. Single-cell studies and data analyses are being performed to generate a single-cell transcriptome and epigenome atlas of human brain regions such as prefrontal cortex, striatum, and hippocampus. Commonly used methods in the laboratory include large-scale functional perturbation studies using RNAi and CRISPR, simultaneous single-cell RNA sequencing and single-cell Assay for Transposase-Accessible Chromatin sequencing (scMultiome-seq), patient-specific stem cell derived brain and lung organoids, drug design and pharmacology, and analyses of immune cells’ functions.

| La Jolla Institute for Immunology

We are interested in all aspects of gene regulation, and have used model systems including cells of the immune system, embryonic stem cells, haematopoietic stem cells and cells of the mouse embryo.

  • Changes in chromatin accessibility during differentiation

    Last Updated:

    To study changes in chromatin during cell fate determination, we are using several models of differentiation in the haematopoietic system as well as transdifferentiation and reprogramming.  The combination of RNA-seq, ChIP and ATAC-seq (Assay for Transposase-Accessible Chromatin, [PMID: 24097267]) has revealed a striking interplay between chromatin regulators involved in histone and DNA modifications.  Additionally, we are using RNA-interference with shRNAs targeting transcription factors and chromatin-related genes to characterize changes in chromatin and gene expression.

  • Identification of long-range interactions in chromatin mediated by transcription factors (ChIA-PET)

    Last Updated:

    Regulatory elements that control gene expression are often positioned far away from the coding regions of genes. The regulatory regions can be “looped”, bringing them and their associated DNA binding proteins close to a gene’s promoter or other regulatory element. We recently identified a unique structural homodimer of the transcription factor FOXP3 that enables this dimer to bind two sites in DNA that may be separated by long distances [PMID: 21458306]. FOXP3 controls the function of regulatory T cells, and mutations in FOXP3 that abolish dimer formation are associated with an autoimmune syndrome called IPEX, owing to defects in T regulatory cell function [PMID: 22224781]. We are using a method known as ChIA-PET [PMID: 22926262] and several next generation sequencing strategies to assess how the FOXP3 homodimer mediates long-range interactions in DNA and controls T regulatory cell function.

  • Interplay between histone and DNA modifications

    Last Updated:

    Histone ubiquitylation is an important modification that is less studied than the more standard histone modifications (methylation, acetylation, phosphorylation).  The two histones that are modified by ubiquitylation are H2A and H2B.  H2A ubiquitylation and deubiquitylation are controlled by several enzymes, many of which are frequently mutated or otherwise dysregulated in different cancers.  Because the DNA methylcytosine oxidase TET2 [PMID: 24220273] and ASXL1, a component of an H2A deubiquitinase complex [PMID: 20436459], are frequently co-mutated in haematopoietic cancers, we are studying the relation between H2A ubiquitylation and DNA methylation status in normal cells and during cancer development.

  • Mapping DNA methylation and oxi-mC distribution

    Last Updated:

    This project examines the contribution of TET proteins to DNA methylation patterns and gene expression during normal differentiation and in cancer (mapping DNA methylation and oxi-mC distribution).  Cytosine methylation is an important modification in DNA and controls gene expression by altering transcription factor binding and local chromatin structure.  Our lab discovered the enzymatic activity of Ten-eleven translocation (TET) proteins [PMID: 19372391], dioxygenases that modify DNA methylation patterns by oxidizing 5-methylcytosine (5mC) to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxycytosine (5caC).  We have developed tools to map the location of these modified bases in DNA by next-generation sequencing [PMID: 21552279, 24474761], and have generated mice conditionally deficient for each of the three TET proteins.  We are using these methods and reagents to explore how TET proteins, 5mC and the oxidised methylcytosines (5hmC, 5fC and 5caC, together termed oxi-mC) influence cell differentiation and cancer.

  • Mapping enhancers to interrogate the role of single nucleotide polymorphisms (SNPs) in immune diseases

    Last Updated:

    We have recently completed a study in which we used ChIP-seq for the enhancer-associated modification H3K4me2 to identify enhancers associated with T cell differentiation in healthy subjects as well as patients with asthma [Seumois G et al., Nature Immunology, in press].  Similar studies will be ongoing over the next few years.

  • Modelling of single-cell RNA biology

    Last Updated:

    We are using single-cell RNA-seq to investigate the regulation of gene expression during normal and dysregulated differentiation of mouse embryos and embryonic stem cells. To identify key regulatory factors of gene regulation, it will be crucial to apply existing computational methods and develop novel computational methods for the analysis of these single-cell RNA-seq data, including methods for unsupervised clustering, differential expression, and time-series analyses.

| Cellular and Molecular Medicine

The laboratory’s research is focused on understanding the fundamental mechanisms controlling gene expression in mammalian cells. In particular, the laboratory is investigating three related problems:

  1. What are the transcriptional regulatory sequences that control cell-specific gene expression programs in the mammalian genomes?
  2. How do these sequence elements interact with transcription factors and chromatin binding proteins to regulate gene expression during cellular differentiation?
  3. How do epigenetic mechanisms (DNA methylation and chromatin modifications) influence the gene regulatory process?

  • Dynamic chromatin landscapes in differentiating human ES cells

    Last Updated:

    As part of the NIH Roadmap epigenome project, my lab has comprehensively mapped the genome-wide status of various chromatin modifications in human ES cells and in several partially differentiated cell types derived from them. The rotation project will focus on understanding how chromatin modification dynamics relate to differentiation and cell fate determination.

  • Study of the role and regulation of long-range genomic interactions in the human genome

    Last Updated:

    Many gene regulatory elements can regulate target genes located at a long distance, and this has been thought to occur through DNA looping. However, the exact role and mechanisms of long-range DNA interactions in human cells remain to be resolved. We have been able to generate comprehensive maps of long-range DNA interactions for different cell types, and are finding interesting principles of genome organization in mammalian cells. Rotation projects will focus on understanding the relationships between long-range DNA interactions and regulation of gene expression, and on the development of predictive models of gene expression.

  • Systematic identification and characterization of transcriptional regulatory elements in the mouse genome

    Last Updated:

    The laboratory, as part of the mouse ENCODE project, is in the midst of producing the first comprehensive map of cis-regulatory elements in the mouse genome. Computational analyses are being conducted to identify the regulatory sequences controlling tissue-specific gene expression, determine the evolutionary conservation of these sequences, and predict potential transcription factors involved in lineage specification and animal development.

| Biological Sciences

The Rifkin laboratory studies how environmental, genetic, and stochastic variation interact to generate phenotypic variation and thereby mold the course of evolution. We use yeasts and nematodes as model organisms and work primarily at the level of gene regulatory and signal transduction networks.

  • Constraints and selection that drive gene family evolution

    Last Updated:

    High-quality genome sequences that cover entire genera are increasingly available. This facilitates fine-scale investigations of the forces that drive protein evolution. We are using the Caenorhabditis (roundworm) genus and the Saccharomyces (yeast) genus to study the role of selection and constraint on gene family evolution in the context of developmental and physiological networks. A rotation project would include phylogenetic analyses of gene families, evolutionary tests for selection and constraint, and integration with mechanistic and functional data from systems biology.

  • Image segmentation and nuclei identification using deep learning

    Last Updated:

    C. elegans embryos are powerful model systems for studying developmental variation using microscopy because they are transparent and can be fluorescently marked. However, automatic detection/segmentation of fluorescently marked nuclei is a challenging image informatics problem. This rotation would consist of applying deep learning techniques to this problem based on a corpus of images that have previously been manually curated. Experience with Python is useful, and previous exposure to or experience with deep learning would also be useful.

  • Simulations of genetic network evolution

    Last Updated:

    In many population genetic models, mutations are assigned a fitness effect - an assigned genotype -> fitness map.  Simulations can then explore how gene frequencies change over time under a number of demographic scenarios.  An alternative is to use a generative model - often a set of differential equations - that uses genotypic information to produce a phenotype and then maps this phenotype to a fitness value.  We are using such a simulation framework to study how genetic networks evolve under different demographic situations.  Experience with C++ would be very useful for this project since the basic population genetic framework consists of a library of C++ templates.
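    To make the genotype -> phenotype -> fitness idea concrete, here is a minimal Python sketch. It is not the lab's C++ framework. It assumes a hypothetical one-gene model in which the genotype is a production rate, the phenotype is the expression level reached by dx/dt = production - decay * x, and fitness is a Gaussian function of that level; the population update is a standard Wright-Fisher resampling step. All names and parameter values are illustrative assumptions.

```python
import math
import random


def phenotype(production_rate, decay_rate=1.0, dt=0.01, steps=2000):
    """Euler-integrate dx/dt = production - decay * x from x = 0 and
    return the final expression level (close to production / decay)."""
    x = 0.0
    for _ in range(steps):
        x += dt * (production_rate - decay_rate * x)
    return x


def fitness(level, optimum=2.0, width=0.5):
    """Gaussian stabilizing selection around an optimal expression level."""
    return math.exp(-((level - optimum) ** 2) / (2 * width ** 2))


def next_generation(population, rng, mutation_sd=0.05):
    """Wright-Fisher step: sample parents in proportion to fitness,
    then perturb each offspring genotype with Gaussian mutation."""
    weights = [fitness(phenotype(g)) for g in population]
    parents = rng.choices(population, weights=weights, k=len(population))
    return [max(0.0, g + rng.gauss(0.0, mutation_sd)) for g in parents]


def evolve(population, generations, seed=0):
    """Run a fixed number of generations at constant population size."""
    rng = random.Random(seed)
    for _ in range(generations):
        population = next_generation(population, rng)
    return population


# Start every individual away from the optimum and let selection act.
start = [1.0] * 50
end = evolve(start, generations=30)
print(sum(end) / len(end))  # mean production rate after 30 generations
```

    Changing the demographic scenario (for example, varying population size between generations) only requires changing how many parents `next_generation` samples, which is the kind of separation between the generative model and the population layer that the paragraph above describes.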

  • Statistics on the morphogenesis of hybrid inviability

    Last Updated:

    Hybrids between different species do not, as a rule, do well. We are using microscopy, image informatics, and statistics to understand the developmental biology of hybrid incompatibility. We are imaging the complete development (3D position of every nucleus over time) of hybrid worm embryos to determine whether there are particularly sensitive parts of development where genomes of different species prove particularly incompatible, even if other parts of development proceed properly. A rotation project on this topic would include image informatics for processing the microscopy images and statistical investigation of variation in this complicated trait.

| School of Medicine

Lab Location: CMM-West, Rm. 345

Lab Phone: 858-534-5858

Lab Composition and Activities: Five graduate students from several programs, and a talented group of enthusiastic (also helpful) postdoctoral fellows and a full time laboratory manager. We have one general laboratory meeting, one graduate student-only meeting, and one personal meeting each week. We also have joint lab meetings with two other labs weekly.

Research Interests: Our central laboratory focus this year is to continue to utilize global genomic approaches to uncover and investigate the “enhancer code” controlled by new, previously unappreciated pathways that integrate the genome-wide response to permit proper development and homeostasis, and that also function in disease and senescence. We have investigated these events in differentiated cells, neuronal development, stem cells, and cancer. Our biological focus is on molecular mechanisms of the “enhancer code” regulating learning and memory, aggressive prostate and breast cancer, and the underlying events of senescence/aging. Epigenomic events studied include non-histone methylation events and non-coding RNAs. We are investigating these events in development, breast and prostate cancers, and in inflammation-based disease, including degenerative CNS disease and diabetes. The emerging importance of non-coding RNAs and regulation of nuclear architecture is rapidly altering our concepts of homeostasis and disease. Our laboratory is “Seq-ing” (RIP-seq, ChIP-seq, RNA-seq, GRO-seq, CLIP-seq, ChIRP-seq, and a new “FISH-seq”) for open-ended discovery of long-distance genome interactions, to uncover new “rules” of regulated gene transcriptional programs and new roles for lncRNAs in the biology of normal cells, cancer, neuro-affective disorders, and aging cells. Coupling this with chemical library screens, we hope to introduce new types of therapies based on targeting specific gene enhancers, histone protein readers and writers, and lncRNAs for cancers and other diseases. Recent surprising findings include novel roles of lncRNAs in prostate and breast cancer, a connection between DNA damage repair/transcription and replication, and unexpected roles of enhancer RNAs.

Current interests include:

  • The “enhancer code,” Epigenomics and transcriptional regulatory mechanisms.
  • Roles of ncRNAs in enhancer function in signal-dependent genomic relocation and in establishing subnuclear architecture.
  • Mechanisms of signal-induced tumor chromosomal translocation events and new chemical screens for inhibitors for breast and prostate cancer.
  • The “enhancer code” of regulation of learning and memory, including Reelin-regulated enhancers.
  • Linkage of DNA damage/repair and transcription.
  • Retinoic Acid regulation of Pol III-transcribed DNA repeats in maintenance of the stem cell state, in neuronal differentiation and in senescence.
  • Molecular mechanisms of prevalent disease-associated sequence variations (GWAS) in disease susceptibility loci.
  • “Epigenomics” in neuronal differentiation, cancer, diabetes and degenerative brain disease.
  • Answering the question of when and how enhancers arise and become functional (stem cells to mature cell types).

  • Bioinformatics Rotation Projects

    Last Updated:

    Potential projects include:

    • Projects using genome-wide technologies, including ChIP-seq, GRO-seq, CLIP-seq, RNA-seq, and ChIRP-seq, to elucidate molecular mechanisms of regulated enhancer lncRNA actions in cancer and stem cells;
    • Roles and mechanisms of enhancer actions in prostate and breast cancers;
    • Enhancer-based model of neurodevelopment and CNS disorders;
    • New mechanisms of long non-coding RNAs dictating physiological gene regulation in cancer transcriptional programs;
    • Understanding subnuclear structures: Roles of relocation of transcription units between subnuclear architectural structures in regulated gene expression;
    • Chemical library screens to gene signature and translocation responses as an approach toward new cancer therapeutic reagents;
    • Roles of epigenomic regulators and expression of DNA repeats in stem cells, neuronal differentiation and in senescence.

| Biological Sciences

  • Systems Biology and Engineering of Environmental and Drought Tolerance in Plants

    Last Updated:

    Julian Schroeder's research is directed at discovering the signal transduction mechanisms and the underlying signaling networks that mediate resistance to environmental stresses in plants, in particular drought, salinity stress and heavy metal stress. These environmental (abiotic) stresses have substantial negative impacts and reduce global plant growth and biomass production. They are also relevant to climate change and to maintaining available arable land to meet human needs. Research in Julian Schroeder's laboratory uses multidisciplinary approaches, including genomics, bioinformatics, cell signaling, network modeling, proteomics and molecular biology, toward uncovering the signal transduction network and receptors in plants that translate drought stress hormone reception, CO2 sensing and salinity stress into specific resistance responses. Some recent research advances are being used in the biotechnology industry with the goal of enhancing stress resistance of plants and crop yields.

    A rotation project will be pursued to model and identify the drought stress and CO2-induced signaling networks based on “omic” scale data sets. Models will be directly tested by wet lab experimentation.

    Julian Schroeder is Co-Director of the Center for Food and Fuel for the 21st Century.  See https://labs.biology.ucsd.edu/schroeder/ for more information on the Schroeder lab.

    Selected publications

    • Nishimura et al., Science (2009).
    • H.H. Hu et al., Nature Cell Biol. (2010).
    • T.H. Kim et al., Current Biol. (2011).
    • Xue et al., EMBO J. (2011).
    • F. Hauser et al., Current Biol. (2011).
    • B. Brandt et al., PNAS (2012).
    • R. Waadt et al., eLife (2014).
    • A.M. Jones et al., Science (2014).
    • C.B. Engineer et al., Nature (2014).
    • B. Brandt, S. Munemasa et al., eLife (2015).
    • See also: https://labs.biology.ucsd.edu/schroeder/publications.html

| Cellular and Molecular Medicine

Our laboratory is interested in how rare and de novo mutations in the human genome contribute to patterns of genetic variation and risk for disease in humans. To this end, we are developing novel approaches to gene discovery that are based on advanced technologies for the detection of rare variants, including studies of copy number variation (CNV) and deep whole genome sequencing (WGS). Our goal is to identify genes related to psychiatric disorders and determine how genetic variants impact the function of genes and corresponding cellular pathways.

  • Complex Genetic Inheritance of Rare and Common Variation in Autism

    Last Updated:

    To get a more granular understanding of how rare variants and common variants contribute to quantitative variation in psychiatric traits, my lab is leading the whole genome analysis of rare genetic diseases that are being deeply characterized with dimensional measures of psychopathology. In collaboration with clinical research groups at several institutions, we have formed a consortium of rare disease research groups called the Genes to Mental Health Network (G2MH). This project will investigate how multiple genetic factors influence clinical features of autism using rare variant genotypes and polygenic risk scores derived from whole genome sequencing. The trainee on this project will learn how to carry out statistical genetic analysis of large whole genome and phenotype datasets.

  • Whole-genome Analysis of Autism by Pacific Biosciences HiFi genome sequencing

    Last Updated:

    Using new 3rd-generation DNA sequencing technologies, it is now possible to perform complete assembly of chromosomes from telomere to telomere.  This project will investigate new categories of genetic variation that are detectable using the PacBio Sequel II whole genome sequencing (WGS) platform that are not detectable by traditional Illumina WGS, including structural variants (SVs) and tandem repeats (TRs). The trainee on this project will become familiar with basic tools for long read HiFi sequence analysis and will innovate upon current statistical genetic methods to investigate the association of SVs and TRs with disease with a focus on genes located within new regions of the human genome that have recently been assembled.

| Bioengineering

My research focuses on time series analysis in biological systems, with an emphasis on practical information extraction for translational applications. The lab is divided into applications and approaches, though these all serve each other, and students collaborate routinely. Indeed, a positive attitude and an eagerness to support one another are requisite in the lab.

Applications include, but are not limited to: illness detection, prediction, and recovery monitoring; pregnancy detection and outcome forecasting; mental health monitoring; defining sleep in the body (as opposed to EEG); diabetes forecasting; and carbon footprint optimization of distributed computer systems.

Approaches include, but are not limited to: multimodal time series information extraction; differentiating multiple outcome types from random assortment; reduction of high dimensional spaces with modality, individual, and time series components; explicable machine learning model development; non-stationary signal analysis; and novel approaches to diversity mapping and phenotyping from physiology and behavior data.

I seek to find a fit with each individual and the lab’s ongoing projects; no one comes in and is just given marching orders – you’ll do better work when it’s the work that you actually want to do!

  • COVID-19 recovery monitoring

    Last Updated:

    Some individuals seem to have lingering or failed recoveries after COVID-19 infections. Students comfortable with basic programming or data science skills are encouraged to enhance our description of recovery profiles from TemPredict, and search for features that can contribute to pre-recovery classification.

  • Diversity within physiological data

    Last Updated:

    Algorithms tend to be one size fits all, whereas people are similar or dissimilar in complex and unmapped ways. Help map differences in normal routines, as well as in illness and recovery trajectories. These might arise from known demographic information or co-morbid conditions (diabetes, pregnancy, etc.), or represent different patterns in illness associated with unknown or latent variables.

  • Improving women’s health outcomes

    Last Updated:

    We have shown repeatedly in humans and animal models that females are as tractable with statistics as males (often more so). Yet female physiology remains inappropriately understudied. Help us refine algorithms, map changes like pregnancy and menopause, and explore diversity within as well as across traditional sex categories.

| School of Pharmacy

  • Cholinergic neurotransmission in the central and peripheral nervous system

    Last Updated:

    Dr. Palmer Taylor's laboratory studies cholinergic neurotransmission in the central and peripheral nervous system. Studies are directed to the structure and function of nicotinic acetylcholine receptors and acetylcholinesterase. The ultimate objective is to use structure-guided drug design to develop more efficacious drugs in the study of nervous system development and disorders of ageing. Various pharmacological and analytical techniques are employed. Examples can be found in Dr. Taylor's curriculum vitae and his publications.

| Chemistry and Biochemistry

cAMP-dependent protein kinase (PKA) is ubiquitous in every mammalian cell, with the PKA signaling network regulating processes as diverse as memory, differentiation, development, the cell cycle, and circadian rhythms. One of our goals, in addition to elucidating structures of the PKA subunits, is to map the PKA proteome as it relates to PKA signaling. The PKA interaction network consists not only of the PKA regulatory and catalytic subunits, the GPCRs, G-proteins, cyclases, phosphodiesterases, and PKA substrates, but also of the scaffold proteins (A Kinase Anchoring Proteins: AKAPs) that target PKA to specific sites in the cell. We are particularly interested in mapping PKA that is targeted to organelles such as the mitochondria. A second goal is to map the activity of PKA in live cells using FRET PKA activity reporters that are targeted to specific sites such as the plasma membrane, the mitochondria, or the nucleus.

  • Rotation Project for Systems Biology

    Last Updated:

    PKA is a broad spectrum kinase that has many protein substrates. It consists of regulatory (R) and catalytic (C) subunits and assembles into an inactive tetramer (R2C2) in the absence of cAMP. Binding of cAMP to the dimeric R-subunits unleashes the catalytic activity. There are four functionally non-redundant R-subunits (RIα, RIβ, RIIα, and RIIβ) and three C-subunits (Cα, Cβ, and Cγ). The GPCRs are the most abundant gene family encoded by the human genome, and many of these couple to cyclases that generate the cAMP second messenger. In addition to PKA R and C-subunits and PKA substrates, the PKA proteome includes scaffold proteins called A Kinase Anchoring Proteins (AKAPs) that target PKA to specific sites in the cell at the correct time. This dynamic spatio-temporal aspect is essential for correct PKA signaling in cells. One of our goals is to identify the proteins that are part of this PKA proteome and to establish how this proteome is altered in response to stress signals such as starvation and to the cell cycle. In addition, we hope to establish how the proteome varies as a consequence of disease or of genetic perturbation. To initially profile the PKA proteome we will be using two cell/tissue types, S49 mouse lymphoma cells and heart tissue. Eventually we will extend this analysis to mouse macrophages (RAW cells). In each case we have perturbed the PKA signaling pathway. In the S49 cells we have generated a mutant cell line that makes no active C-subunit. Although the protein is expressed in these cells, it is not active, is not soluble, and remains associated with particulate fractions. In RAW cells we have silenced the Cα and Cβ genes. Our initial goal is to compare each of the wild type S49 cell lines with the cell lines where PKA function has been perturbed. To do this we will use mass spectrometry to identify the proteins that change as well as the phosphoproteome, and these changes will be compared to changes in gene expression.

    The S49 project will be done in collaboration with Paul Insel (Pharmacology) and Nuno Bandeira (Pharmacy/Computer Science), while the macrophage study will be done in collaboration with Mel Simon (Pharmacology). Similar profiling is being done for cardiac myocytes, in collaboration with Hemal Patel (Medicine) and Andrew McCulloch (Bioengineering).

| Mathematics

  • Bioinformatics Rotation Projects Available

    Last Updated:

    There are three projects available for Summer 2009.

    1. The first project is in comparative genomics and concerns genome rearrangements. I work on algorithms for comparing rearranged genomes, and on applying them to real data sets. Expertise in algorithm development and discrete mathematics is preferred.
    2. The second project is joint with Pavel Pevzner and concerns genome assembly algorithms. Our current focus is on single cell assembly.
    3. The third project is joint with Vineet Bafna. We are developing a method to find pairs of genes that interact with disease status in GWAS microarray data. Finding simple diseases caused by one gene can be done efficiently by brute force in time O(mn) (m = # genes, n = # patients), but brute force for pairs of genes would require time O(m²n), which may be prohibitive. We developed a new mathematical technique for computing χ² of a contingency table that allows us to find the interacting pairs more easily. This project involves statistics, abstract algebra, and programming.
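    For orientation, the brute-force baseline mentioned in the third project can be sketched in a few lines of Python. This is only the naive pairwise scan, not the new technique the project refers to. It assumes binary genotypes (0/1 per marker per patient) and binary case/control status, uses the standard Pearson chi-square statistic, and all function names are hypothetical.

```python
from itertools import combinations


def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:  # skip empty rows/columns
                stat += (observed - expected) ** 2 / expected
    return stat


def pair_table(g1, g2, status):
    """4 x 2 table: joint genotype (g1, g2) versus case/control status."""
    table = [[0, 0] for _ in range(4)]
    for a, b, s in zip(g1, g2, status):
        table[2 * a + b][s] += 1
    return table


def scan_pairs(genotypes, status):
    """Score every marker pair: O(m^2) tables, each built in O(n) time."""
    return {
        (i, j): chi_square(pair_table(genotypes[i], genotypes[j], status))
        for i, j in combinations(range(len(genotypes)), 2)
    }


# Toy data (made up): status is the XOR of markers 0 and 1, so neither
# marker is informative alone but the pair is.
genotypes = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
status = [a ^ b for a, b in zip(genotypes[0], genotypes[1])]
scores = scan_pairs(genotypes, status)
print(max(scores, key=scores.get))  # (0, 1)
```

    With m markers the dictionary holds m(m-1)/2 scores, which makes the quadratic cost in m explicit and shows why a faster way of locating high-scoring pairs matters at GWAS scale.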

| Electrical and Computer Engineering

  • Hardware acceleration of computational genomics

    Last Updated:

    We're working on a number of projects related to hardware acceleration of computationally challenging tasks in genomics research and looking for highly motivated students with some background in GPU/FPGA programming or high-performance computing to join. Please contact me via email for details on the latest projects. 

  • SARS-CoV-2 Phylogenetics

    Last Updated:

    We’re currently working with an international group of collaborators to perform phylogenetic analysis of millions of SARS-CoV-2 sequences for viral epidemiology and evolutionary biology applications. Current projects include phylogenetic placement in real time, tree structure optimization, recombination inference, etc. Contact me via email for the latest details.

| Chemistry and Biochemistry

  • Bioinformatics Rotation Projects Available

    Last Updated:

    We are interested in studying the biological and physical principles underlying genetic networks and protein recognition. Rotation projects are available in the following areas. Specific projects will be tailored to fit a student's research interest and scientific background.

    1. Decipher epigenetic code: develop computational methods to identify common patterns in the histone modification and DNA methylation data associated with regulatory elements; predict regulatory elements, transcription factor binding sites or non-coding RNAs based on their chromatin signatures.
    2. Reconstruct genetic networks: reconstruct physical interactions from genomic and proteomic data; analyze the robustness and landscape of the networks.
    3. Decipher protein recognition code: employ structure-based computer modeling to characterize the energetic patterns of protein-protein and protein-ligand interactions; predict the specificity of protein recognition.
    4. Design resistance-evading drugs: computer-aided drug design to combat resistance; develop new methods for drug lead optimization.

    More information can be found at http://wanglab.ucsd.edu.

| Bioengineering

Our research focuses on molecular engineering for cellular imaging and reprogramming, and image-based bioinformatics, with applications in stem cell differentiation and cancer treatment.

  • Image-based reconstruction of biochemical networks in live cells

    Last Updated:

    Fluorescence resonance energy transfer (FRET)-based biosensors have been widely used in live-cell imaging to accurately visualize specific biochemical activities. We have developed the Fluocell image analysis software package to efficiently and quantitatively evaluate intracellular biochemical signals in real time, and to provide statistical inference on the biological implications of the imaging results. However, important questions remain about how to use these results to reconstruct the quantitative parameters of the underlying biochemical networks, which determine cellular functions and ultimately cell fates. In this rotation project, we will integrate optimization-based machine learning approaches with biochemical network models to seek answers to these questions, with applications in cancer treatment against drug resistance.

  • Intelligent Diagnosis of Infectious Diseases by Deep Learning

    Last Updated:

    The diagnosis of infectious diseases often requires tissue biopsy and microscopic examination by pathologists, which is time-consuming, labor-intensive, and error-prone. To develop a software-assisted system for identifying microorganisms in digital images, we use convolutional neural networks and transfer learning to train and validate an intelligent software system for the classification of pathology slides. The goal of this project is to diagnose pathogens with high efficiency and accuracy. Students will work in an interdisciplinary team, collecting and labelling imaging data, developing deep-learning-based algorithms and user interfaces, and characterizing and optimizing the accuracy and functionality of the software package.

| Pediatrics

My laboratory is interested in developing new treatment modalities for infectious diseases with a focus on eukaryotic pathogens, including malaria parasites.

  • Drug resistance and population analysis of Mozambique malaria samples

    Last Updated:

    Malaria remains a major problem for 40% of the world's population and drug resistance is widespread. One approach to finding drug resistance determinants is to look for regions that show unexpected homozygosity in whole genome sequences. The rotation student will work with physician scientists to align short read sequences to the P. falciparum genome, call variants, annotate variants, run population genetics analyses and produce reports.
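
    As a rough illustration of what a scan for unexpected homozygosity can look like, the sketch below averages expected heterozygosity, 2p(1 - p), over sliding windows of variant sites and reports windows that fall below a cutoff. The window size, the cutoff, the allele frequencies, and the function names are all assumptions for this example. A real analysis would start from called variants and use established population genetics tools.

```python
def expected_heterozygosity(alt_freq):
    """Expected heterozygosity 2p(1 - p) at a biallelic site with alternate allele frequency p."""
    return 2.0 * alt_freq * (1.0 - alt_freq)


def low_diversity_windows(alt_freqs, window, threshold):
    """Return start indices of sliding windows whose mean heterozygosity is below threshold."""
    het = [expected_heterozygosity(p) for p in alt_freqs]
    hits = []
    for start in range(len(het) - window + 1):
        mean_het = sum(het[start:start + window]) / window
        if mean_het < threshold:
            hits.append(start)
    return hits


# Hypothetical population allele frequencies at nine consecutive variant sites.
# Sites 3 to 6 are fixed for one allele, as expected around a swept region.
alt_freqs = [0.5, 0.4, 0.5, 0.0, 1.0, 0.0, 1.0, 0.5, 0.6]

print(low_diversity_windows(alt_freqs, window=3, threshold=0.1))
```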

| Biological Sciences

We study mathematical and computational models of biomedical processes, with a focus on infection, the immune system, and cancer. We also study mathematical models of evolutionary processes and develop evolutionary theory. We aim to couple mathematical modeling work with data from the relevant fields through collaborations with experimental and clinical laboratories.

  • Evolutionary theory

    Last Updated:

    This project develops basic evolutionary theory, with relevance to biomedical applications. For example, we study the evolution and emergence of mutants in spatially structured populations under various assumptions. Much remains to be discovered about the principles of mutant evolution in structured populations, and this has important applications for cancer biology and cancer therapy, since most tumors grow as a mass of cells with strong spatial structure.

  • Mathematical models of in vivo virus dynamics

    Last Updated:

    The project will be concerned with mathematical models of virus replication within hosts, and the interactions of viruses with immune responses. Much of this modeling work is concerned with human immunodeficiency virus (HIV), due to the availability of experimental and clinical data. Topics include the evolution of HIV within hosts, the effect of spatial lymphoid tissue structure on HIV dynamics and evolution, and the dynamics of HIV during antiretroviral therapy in relation to the latent viral reservoir.
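
    To give a flavor of this kind of model, the sketch below integrates the standard target-cell-limited model of within-host virus dynamics from the literature (uninfected cells T, infected cells I, free virus V) with a simple Euler scheme. It is not necessarily one of the models used in the lab, and every parameter value here is hypothetical and chosen only for illustration.

```python
def basic_reproductive_ratio(lam, d, beta, a, k, u):
    """R0 for the model dT/dt = lam - d*T - beta*T*V, dI/dt = beta*T*V - a*I, dV/dt = k*I - u*V."""
    return (lam * beta * k) / (d * a * u)


def simulate(lam, d, beta, a, k, u, t_end=200.0, dt=0.01, v0=1e-3):
    """Forward-Euler integration starting from the uninfected equilibrium plus a small virus inoculum."""
    T, I, V = lam / d, 0.0, v0
    for _ in range(int(t_end / dt)):
        dT = lam - d * T - beta * T * V   # production, death, and infection of target cells
        dI = beta * T * V - a * I         # infection and death of infected cells
        dV = k * I - u * V                # virus production and clearance
        T, I, V = T + dt * dT, I + dt * dI, V + dt * dV
    return T, I, V


# Hypothetical parameters: same host, two infection rates.
params = dict(lam=10.0, d=0.1, a=0.5, k=10.0, u=3.0)

r0_high = basic_reproductive_ratio(beta=0.005, **params)   # above 1: infection establishes
r0_low = basic_reproductive_ratio(beta=0.0005, **params)   # below 1: infection dies out

print(r0_high, simulate(beta=0.005, **params))
print(r0_low, simulate(beta=0.0005, **params))
```

    When R0 exceeds 1 the system settles at an infected equilibrium in which the target cell count is reduced by a factor of R0. When R0 is below 1 the virus is cleared.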

  • Mathematical Oncology

    Last Updated:

    This project is concerned with mathematical models of cancer initiation, cancer progression, and cancer therapy. This involves mathematical models of tissue stem cell dynamics, clonal cellular evolution in tissues during aging in relation to the development of cancer, and evolutionary models of drug resistance in cancers. Hematological malignancies are a major focus of this work. With respect to therapies and drug resistance, this work involves the use of mathematical models with patient-specific parameters to make personalized predictions about treatment outcome.

| Cellular and Molecular Medicine

We have a wide range of projects, from developing novel algorithms for studying RNA processing in disease, development, and personalized medicine to analyzing single-cell RNA-seq data.

  • Single-cell RNA-seq analysis

    Last Updated:

    We have projects on developing new algorithms for single-cell RNA-seq analysis, aimed at studying heterogeneity in complex mixtures of cells responding to environmental challenges.

| Computer Science and Engineering

My research interests lie primarily in machine learning, especially for large-scale spatiotemporal data. I am generally interested in deep learning, optimization, and spatiotemporal reasoning. I am particularly excited about the interplay between physics and machine learning. My work has been applied to learning dynamical systems in sustainability, health and physical sciences.

  • Automatic Blood Pressure Control with Machine Learning

    Last Updated:

    This project seeks to develop novel deep learning methods to forecast and control patients' blood pressure using large-scale sensor data from artificial heart pumps.

| Bioengineering

  • Allele-specific gene regulations

    Last Updated:

    Each human cell contains two sets of chromosomes, one of paternal and one of maternal origin. Due to the presence of genetic polymorphisms, the same gene on the two chromosome sets might behave differently, which has profound implications for high-level phenotypes, including disease susceptibility. In this rotation project, the student will analyze RNA sequencing or methylation sequencing data to identify genes that exhibit parental preferences and investigate the functional implications of such phenomena. By the end of the rotation, the student will have learned several of the latest methods for mapping and analyzing next-generation sequencing data, and how to study the transcriptome/methylome in the context of genetic polymorphisms.
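
    One simple way to flag a gene with a parental preference is to test whether reads covering a heterozygous site split evenly between the two alleles. The sketch below uses an exact two-sided binomial test against a 50/50 expectation. It is a minimal illustration only: the read counts and function name are hypothetical, and real analyses also need to handle mapping bias, overdispersion, and multiple testing.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes no more likely than k of n."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # The small tolerance guards against floating-point ties between symmetric outcomes.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Hypothetical read counts at a heterozygous site: maternal-allele reads out of total reads.
balanced = binomial_two_sided_p(k=5, n=10)   # even split, no evidence of imbalance
skewed = binomial_two_sided_p(k=0, n=10)     # all reads from one parental allele

print(balanced, skewed)
```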

  • Single-cell genome sequencing: de novo assembly and annotations

    Last Updated:

    Single-cell genome sequencing allows us to dissect heterogeneous cell populations and to relate genome diversity to function. One major challenge is de novo genome assembly, because the sequencing data from single cells are typically highly biased in locus representation. In this rotation, the student will have an opportunity to understand the experimental procedures for single-cell genome sequencing, to learn the state-of-the-art methods for de novo genome assembly, and to develop innovative strategies for genome assembly and annotation.

| Bioengineering

Our lab invented the MARIO (Mapping RNA interactome in vivo) technology to massively reveal RNA-RNA interactions from human tissue (Nature Comm, 2016). We also invented the MARGI (Mapping RNA-Genome Interactions) technology for revealing thousands of chromatin-associated RNAs (caRNA) and their respective genomic interaction sites (Current Biology, 2017; Nature Protocols, 2019). Leveraging MARGI, we and our collaborators characterized caRNA’s roles in 3D organization of the nucleus (iScience 2018a), modulation of gene expression during progression of diabetes mellitus in blood vessel endothelium (bioRxiv, 2019), and the biogenesis of fusion RNAs (PNAS 2019a).

We contributed to the discovery that the earliest cell fate decision in mouse is made sooner than the commonly assumed 8-cell stage (Genome Res, 2014). Our Rainbow-seq technology combined tracing of cell division history and single-cell RNA sequencing into one experiment (iScience 2018b).

We developed SILVER-seq for extracellular RNA (exRNA) sequencing from ultra-small volumes of liquid biopsy, solidifying a basis for future in vitro diagnostic trials using finger prick blood (PNAS, 2019b; Current Biology, 2020).

We helped reveal that transposons are indispensable regulatory sequences in mammalian genomes. Species-specific transposons are required for preimplantation embryonic development in humans and other mammals (Genome Res, 2010). Nature highlighted this discovery as “Hidden Differences”, reporting that “transposons or 'jumping genes' had hopped in front of the genes, changing their regulation” (Nature, 2010). We contributed to establishing the proof-of-principle that cis-regulatory sequences can be annotated by cross-species epigenomic comparison (Cell, 2012).

  • Discovery of extracellular RNA biomarkers from human blood serum and plasma

    Last Updated:

    Extracellular RNA (exRNA) sequencing from human liquid biopsy has shown potential for future disease diagnostics. Multiple rotation openings are available to characterize the fundamental properties of exRNA based on exRNA sequencing data and to discover exRNA biomarkers for Alzheimer's disease and cancer.

  • Mapping the spatial transcriptome for human blood vessels

    Last Updated:

    Through close collaboration between bioinformaticians and molecular biologists, this project aims to develop new methods to sequence RNA with spatial resolution from human tissue. A breakthrough in this project would simultaneously improve the spatial resolution, the dynamic range, and the applicable tissue dimensions relative to state-of-the-art spatial transcriptomics technologies.

  • The relationship between genome-wide RNA-chromatin interactions and 3D genome organization

    Last Updated:

    It remains unclear to what extent chromatin-associated RNAs can reflect the 3D organization of the genome. To address this, we used iMARGI to map genome-wide RNA-chromatin interactions in H1, HFFc6, and K562 cells, yielding on average 1.9 billion read pairs per sample. This project will compare these iMARGI data with genome interaction data including Hi-C and PLAC-seq on three different scales. At the compartment scale, we will test whether the A compartment chromatin is associated with large amounts of RNAs, involving both intrachromosomal and interchromosomal RNA-chromatin interactions. At the TAD scale, we will test whether the RNA ends of nearly all RNA-chromatin interactions are confined to within the boundaries of one or of a few consecutive TADs. At the loop scale, we will test whether RNA-chromatin interactions are enriched with PLAC-seq derived enhancer-promoter interactions.
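
    The TAD-scale question can be made concrete with a small sketch: given TAD intervals on one chromosome and the two ends of each RNA-chromatin read pair, count the fraction of pairs whose ends fall in the same TAD or within a few consecutive TADs. The coordinates, the half-open interval convention, the single-chromosome restriction, and the function names are assumptions for this illustration, not the lab's pipeline.

```python
from bisect import bisect_right


def tad_index(starts, ends, pos):
    """Index of the TAD containing pos (half-open intervals, sorted by start), or None."""
    i = bisect_right(starts, pos) - 1
    if i >= 0 and pos < ends[i]:
        return i
    return None


def fraction_confined(pairs, tads, max_tad_distance=0):
    """Fraction of (rna_pos, dna_pos) pairs whose ends lie within max_tad_distance TADs of each other.

    Pairs with an end outside every TAD are excluded from the denominator.
    """
    starts = [start for start, _ in tads]
    ends = [end for _, end in tads]
    confined = assigned = 0
    for rna_pos, dna_pos in pairs:
        a = tad_index(starts, ends, rna_pos)
        b = tad_index(starts, ends, dna_pos)
        if a is None or b is None:
            continue
        assigned += 1
        if abs(a - b) <= max_tad_distance:
            confined += 1
    return confined / assigned if assigned else float("nan")


# Hypothetical TADs and read pairs on a single chromosome (arbitrary coordinates).
tads = [(0, 100), (100, 250), (300, 400)]
pairs = [(10, 90), (20, 150), (120, 240), (50, 350), (260, 310)]

print(fraction_confined(pairs, tads))                      # both ends in the same TAD
print(fraction_confined(pairs, tads, max_tad_distance=1))  # same or adjacent TAD
```

    Comparing the observed fraction against a distance-matched background would be needed before drawing any conclusion, since nearby ends share a TAD by chance more often than distant ones.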

| Cellular and Molecular Medicine

We are broadly interested in studying genome maintenance pathways. Our lab specializes in applying quantitative proteomics to analyze protein complexes, post-translational modifications, and combining the proteomics approach with bioinformatics, genetics, biochemistry and cell biology approaches to understand how complex biological processes communicate with each other and are integrated for specific cellular responses in order to keep the genome as stable as it is.

  • Proteomics analysis of protein complexes involved in genome maintenance

    Last Updated:

    We develop quantitative proteomics technologies to study several key protein complexes involved in DNA replication and chromosome segregation, regarding their composition, dynamics, stability and regulation. Insights from the proteomics analysis are further pursued to study how cells maintain their genome integrity.