Biomedical Informatics Graduate Rotation Projects

This page is updated annually. Some projects may already be taken, and new projects may be available. The projects below give an indication of the types of projects available in each lab, but please browse faculty web pages and contact professors directly to discuss current opportunities.

Labs with BMI Rotation Projects

Ferhat Ay | Pediatrics

  • ferhatay@lji.org

Vikas Bansal | Pediatrics

  • vibansal@ucsd.edu

Robert El-Kareh | School of Medicine

  • relkareh@ucsd.edu

Yoav Freund | Computer Science and Engineering

  • yfreund@ucsd.edu

Christopher Glass | Cellular and Molecular Medicine

  • ckg@ucsd.edu

Olivier Harismendy | Biomedical Informatics

  • oharismendy@ucsd.edu

Chun-Nan Hsu | Biomedical Informatics

  • chunnan@ucsd.edu

Lilia Iakoucheva | Psychiatry

  • lilyak@ucsd.edu

Trey Ideker | School of Medicine

  • tideker@ucsd.edu

Rob Knight | Pediatrics

  • rknight@ucsd.edu

Jejo Koola | School of Medicine

  • jkoola@ucsd.edu

Amit Majithia | School of Medicine

  • amajithia@ucsd.edu

Lucila Ohno-Machado | Biomedical Informatics

  • machado@ucsd.edu

Scott Rifkin | Biological Sciences

  • sarifkin@ucsd.edu

Michael Rosenfeld | School of Medicine

  • mrosenfeld@ucsd.edu

Jonathan Sebat | Cellular and Molecular Medicine

  • jsebat@ucsd.edu

Yingxiao (Peter) Wang | Bioengineering

  • yiw015@eng.ucsd.edu

Gene Yeo | Cellular and Molecular Medicine

  • geneyeo@ucsd.edu

Ferhat Ay | Pediatrics

ferhatay@lji.org | Lab

We are interested in the analysis and modeling of three-dimensional chromatin structure from high-throughput sequencing experiments. We develop methods based on statistics, machine learning, optimization and graph theory to understand how changes in the 3D genome affect cellular outcomes such as development, differentiation and gene expression. We have ongoing interests in the systems-level analysis and reconstruction of regulatory networks, inference of enhancer-promoter contacts, predictive models of gene expression and integration of three-dimensional chromatin structure with one-dimensional epigenetic measurements in the context of cancer, malaria, asthma and several autoimmune diseases.

  • Integrative analysis of multi cell-type gene expression and epigenomic data in tumor immune response

    This project will focus on developing regulatory network inference methods for the joint analysis of gene expression and histone modification data from several different types of tumor infiltrating lymphocytes, which are gathered from a cohort of patients with solid tumors.

  • Predictive and comparative modeling of epigenetic gene regulation in different human immune cell types

    The goal of this project is to model the natural variation in gene expression across many immune cell types using an already established database at LJI (http://dice-database.org) and to identify cell type-specific epigenetic regulators of important immune genes.

  • Statistical methods for inferring functional DNA-DNA contacts from Hi-C and HiChIP/PLAC-seq data

    This project focuses on developing computational tools for better analysis of the wealth of data from chromosome conformation capture assays with the ultimate goal of inferring functional chromatin contacts such as those between enhancers and promoters.

Vikas Bansal | Pediatrics

vibansal@ucsd.edu | Lab

Research in our lab is focused on developing computational methods for the discovery and analysis of human genetic variation using high-throughput sequencing technologies. We develop algorithms and software tools for accurate and comprehensive assembly of human genomes and for identifying disease-associated variants.

  • Identifying variants in duplicated regions of the human genome

    A significant fraction of the human genome consists of repetitive or duplicated sequences that are problematic for short-read sequencing technologies since reads generated from such regions cannot be uniquely aligned to the reference genome. Hundreds of duplicated genes are known to be associated with rare and complex human diseases. We want to develop methods for the detection of variants in duplicated genes using whole-genome Illumina sequencing as well as single-molecule long read sequencing technologies. 

  • Whole-genome haplotyping using diverse sequencing technologies

    Although advances in high-throughput sequencing (HTS) technologies have made whole-genome sequencing (WGS) routine, almost all genomes have been sequenced using short read sequencing technologies such as Illumina and are, therefore, missing long-range haplotype information. We have developed computational methods (genome.cshlp.org/content/27/5/801.full) for haplotype assembly using diverse sequencing technologies that provide long-range haplotype information. A number of projects are available:

    • Haplotype-informed variant detection using single molecule long read sequencing technologies such as Pacific Biosciences and Oxford Nanopore.
    • Methods for haplotype-sensitive de novo assembly of complex regions of the human genome (e.g. HLA, KIR).
    • Phasing of structural variants using proximity-ligation sequencing.
    • Using haplotype information to identify novel gene-disease associations in complex diseases and cancer.

Robert El-Kareh | School of Medicine

relkareh@ucsd.edu | Profile

  • Development of a Research Electronic Health Record for Clinical Decision Support Studies

    Creation of effective clinical decision support tools has the potential to significantly improve the quality of care delivered within our healthcare system. However, developing and testing prototypes of these tools requires access to realistic electronic health record (EHR) environments. This process has often involved prohibitively long turnaround times due to time and resource constraints of healthcare information systems groups. For research and educational purposes, these barriers could be avoided by creating an investigator-controlled research EHR and populating it with realistic clinical data. Such a system could enable researchers and students to develop a wide range of novel and innovative clinical decision support tools much more rapidly.

    Aims:

    1. Install a sophisticated, open-source EHR (OpenMRS)
    2. Populate this EHR with deidentified data from a rich clinical database (MIMIC II)
    3. Develop one or more prototype clinical decision support tools within this environment

Yoav Freund | Computer Science and Engineering

yfreund@ucsd.edu | Profile | Lab

My main area of research is computational learning theory and the related areas in probability theory, information theory, statistics and pattern recognition. I work on applications of machine learning algorithms in big data, computer vision, human computer interaction and online education.

If you are a master's or undergraduate student interested in doing a project in data analysis, please consult the Matchpoint Page.

If you are interested in applying to be a PhD student, I am looking for students with a strong background in math/physics/signal processing/statistics who are interested in big data analysis and in programming. Please email me a list of relevant courses and your grades in them. If you have completed an original data analysis project, send me the link to the GitHub repository. In the subject line of the email, write "application for a data science PhD."

  • Digital Mouse Brain Atlas

    This project is in collaboration with the Kleinfeld Lab (http://physics.ucsd.edu/neurophysics/) and the Mitra Lab (http://brainarchitecture.org/mouse/about).

    The idea is to use a combination of machine learning and computer vision algorithms to create a digital atlas of the mouse brain.

    This would require developing detectors for landmarks that exist in a majority of the brains. As the data size is large (tens of terabytes), the work will involve using a Hadoop cluster.

    Requirements: Python, computer vision, machine learning/statistics.

Christopher Glass | Cellular and Molecular Medicine

ckg@ucsd.edu | Profile | Lab

Dr. Glass’ primary interest is in understanding the transcriptional mechanisms that regulate the development and function of macrophages. Macrophages play key roles in immunity, wound repair, development and tissue homeostasis. Dysregulation of macrophage functions contributes to a broad spectrum of human diseases, including atherosclerosis, diabetes, neurodegenerative diseases, and cancer. A major effort of the Glass laboratory is to use genomics assays and associated bioinformatics approaches to understand how macrophage gene expression programs are established and how they are influenced by different tissue environments and disease. An important concept to emerge from these studies is that enhancers can be exploited to deduce the transcription factors and upstream signaling pathways that drive context-specific transcriptional outputs. Students are welcome to select projects from current areas of active investigation.

  • Natural genetic variation and macrophage gene expression

    Many lines of evidence, including genome-wide association studies, indicate that non-coding genetic variation plays a major role in determining phenotypic diversity. We were among the first laboratories to define the impact of natural genetic variation on enhancer selection and function (Heinz et al., Nature 2013 PMID 24121437), but at present it remains difficult to predict the impact of non-coding variation on gene expression. In a novel and ambitious effort, we systematically characterized the genome-wide patterns of mature RNA (RNA-seq), nascent RNA (GRO-Seq), transcriptional initiation (5’GRO-seq), histone modifications and binding profiles of lineage-determining and signal-dependent transcription factors (ChIP-seq), DNA methylation (bisulfite sequencing), and chromatin conformation (HiC, capture HiC and PLAC-seq) in resting and activated macrophages derived from 5 different inbred strains of mice, providing ~60 million single nucleotide variants, ~6 million InDels and several hundred thousand structural variants. This data set provides a unique resource for investigating the impact of non-coding variants on transcription factor binding, enhancer activation and target gene expression. We are currently developing new computational methods for analyses of these data with a goal of explaining effects of non-coding mutations and predicting patterns of gene expression in new mouse strains. Related projects are investigating the relationships of genetic variation between selected mouse strains and their different susceptibilities to metabolic and cardiovascular disease. This general project area is both challenging and open-ended, and there is a wide range of directions that rotation projects could take. As an example, recent rotation students have implemented machine learning approaches to investigate how sequence variants affect collaborative binding between lineage-determining transcription factors.

  • Nature and nurture of microglia

    Each population of tissue-resident macrophages exhibits a distinct pattern of gene expression that is tuned to the developmental and homeostatic needs of that tissue. For example, brain macrophages called microglia produce factors that are trophic for neurons and monitor synapses, functions that require a brain-specific program of gene expression. A key question is how this tissue-specific program of gene expression is achieved. Through analysis of gene expression and enhancer landscapes, we obtained evidence that the microglia-specific molecular phenotype results from instructive signals in the brain that direct the activation of microglia-specific enhancers (Gosselin et al., Cell 2014 PMID 25480297). Of particular interest, delineation of the gene expression patterns and enhancer landscapes of human microglia revealed that a substantial fraction of the genes associated with non-coding GWAS risk alleles are preferentially or exclusively expressed in microglia, and many are brain environment dependent (Gosselin et al., Science 2017 PMID 28546318). These findings raise several important questions that are under active investigation, including which environmental factors dictate the brain-specific program of gene expression and how human genetic variants affect the regulation of genes that are linked to neurodegenerative disease. We are taking a multi-disciplinary approach including studies of in vivo mouse models, in vitro human iPSC-derived microglia, genomic assays of microglia nuclei derived from control and Alzheimer’s disease brains, and direct analyses of the relation of genotype to gene expression in a growing RNA-seq database derived from purified human microglia. As an example, a recent rotation project investigated whether there is any relationship between the gene expression patterns of circulating monocytes (white blood cells that can differentiate into macrophages in tissues) and those of microglia from the same individual.

Olivier Harismendy | Biomedical Informatics

oharismendy@ucsd.edu | Profile | Lab

The Oncogenomics laboratory is located in the Moores Cancer Center. Its research program is focused on the identification of genetic and epigenetic markers for cancer prevention and progression as well as drug response. The laboratory is a humid laboratory, combining both wet-lab techniques and bioinformatics analysis to study cancer samples from patients and animal models of cancer. The laboratory is also an important partner for multiple principal investigators at the Moores Cancer Center, collaborating on the design, analysis and interpretation of their genomic experiments.

  • Development of Genomics Virtual Machines in HIPAA compliant cloud

    Genetic information is considered protected health information (PHI), so the highest security standards need to be applied to its storage, analysis and sharing. The oncogenomics laboratory uses the state-of-the-art iDASH compute cloud for its main computation. We therefore participate in the development of optimal workflows and virtual machines for the analysis of patient-derived genomic datasets such as whole exomes, whole genomes, RNA-seq or genotyping arrays.

    In this project we will develop robust provisioning methods to establish virtual machines capable of running popular human genomic analysis workflows. We will benchmark these machines and workflows and convert some of them into standard recipes for production-grade, reproducible genomic analysis.

  • Genetics and epigenetics of cisplatin resistance

    Cisplatin (cDDP) is the most commonly used chemotherapeutic drug, but most cancers eventually become resistant, leading to tumor recurrence. Several biological processes may modulate cDDP sensitivity: drug import, export, detoxification, DNA repair, and apoptosis. Drug resistance is transmitted to daughter cells, and one can build up resistant cell lines in vitro using sequential treatments. We are interested in identifying the genetic mutations that mediate this resistance. For this, we have derived resistant cell lines from single clones of a cDDP-sensitive ovarian cancer cell line. Using exome sequencing as well as targeted sequencing, we propose to determine mutations in genes and pathways that drive drug resistance. We will then extend the findings to the TCGA samples, using time to recurrence as an indicator of drug sensitivity.

  • The role of inherited variation in cancer somatic landscape

    The role of germline or inherited variation in cancer has been studied in selected families and led to the identification of genetic variants that are dominant and responsible for cancer syndromes. Similarly, rare recessive variants with lower penetrance are responsible for the increased risk of breast and ovarian cancer (BRCA1/2). More common variants in the population have also been identified through GWAS, which have revealed multiple SNPs associated with a modest increase in cancer risk. Despite these advances, multiple variants of intermediate allelic frequency in the population, or carried by patients with undocumented family history, remain variants of unknown significance (VUS) and can still play a role in tumor development. In addition, the contribution of variants located outside of the coding region has been underexplored and can now be reexamined in the light of recent maps of the regulatory landscape. The long-term goal of this research is to utilize germline genetic variation in cancer prevention and care to better stage patients or predict their response to treatment.

    We propose to identify the germline variants in UCSD Cancer Center patients (targeted gene panel) as well as in the public TCGA/ICGC datasets (whole genomes). We will then test these variants, alone or in combination, to identify the ones that impact cancer onset, the tumor somatic landscape or tissue-specific regulatory networks. The project will involve the processing of high-throughput sequencing data, population genetics, and statistical analysis in a HIPAA-compliant cloud-computing environment.

Chun-Nan Hsu | Biomedical Informatics

chunnan@ucsd.edu | Profile

My research focuses on solving natural language processing problems in biomedical sciences (bioNLP).

  • Electronic phenotyping

    As more medical record data are now in electronic format, how to reuse the data for clinical research and healthcare quality improvement has become an important research topic. Selecting patients from electronic medical records who satisfy certain phenotypic conditions may require understanding and disambiguating the free text of narrative notes. The project will develop capabilities for algorithmic selection that can be used to enhance diagnostic decision-making.

Lilia Iakoucheva | Psychiatry

lilyak@ucsd.edu | Profile | Lab

The lab has a variety of bioinformatics projects aimed at improving understanding of the functional impact of autism mutations derived from exome and genome sequencing of patients. We build spatio-temporal gene co-expression and protein interaction networks for psychiatric diseases, and we use these networks to generate testable hypotheses about the mechanisms of disease. We also test these hypotheses experimentally in the lab, thereby adding a translational aspect to our work.

  • Evaluating the effect of splicing mutations on isoform networks in autism

    The project deals with constructing isoform-level co-expression and protein interaction networks for predicting the functional impact of de novo splice site mutations from patients with autism spectrum disorder (ASD). Hundreds of de novo splice site mutations have been identified in ASD patients, but no disease mechanism has been established for any of them. We will build and analyze isoform-level networks of brain co-expressed and physically interacting proteins; map de novo ASD mutations onto isoform-level networks to predict their functional impact; and validate the disrupted networks and pathways using CRISPR/Cas technology in neuronal and animal models. This project will discover and characterize cellular and molecular processes that are disrupted by de novo splice site ASD mutations.

  • Integrative functional genomic study of pathways impacted by recurrent autism CNV

    Copy number variants (CNVs) represent significant risk factors for Autism Spectrum Disorders (ASD). One of the most frequent CNVs involved in ASD is a deletion or duplication of the 16p11.2 CNV locus, spanning 29 protein-coding genes. Despite progress in linking 16p11.2 genetic changes with phenotypic abnormalities (macrocephaly and microcephaly) in patients and model organisms, the specific molecular pathways impacted by this CNV remain unknown. To test the hypothesis that RhoA signaling is disrupted by this CNV, we will generate KCTD13 and CUL3 mouse models using the CRISPR/Cas9 system and investigate dysregulated molecular pathways using RNA-seq at various stages of the developing mouse fetal brain.

Trey Ideker | School of Medicine

tideker@ucsd.edu | Profile | Lab

The overall objective of the Ideker Laboratory is to develop an artificially intelligent model of the cell able to translate a patient's data into precision diagnosis and treatment. For this purpose, we run an experimental facility for mapping the gene and protein interaction networks that govern eukaryotic cell biology, with a focus on pathways underlying cancer and neurodevelopmental disorders.

Our major bioinformatics challenge is to model how cells process information from genotype to phenotype. Towards this goal, we develop machine learning methods that attempt to learn cell structure and function directly from genome-scale datasets.

However, much remains to be done before we have a cell model capable of making robust predictions about patients. A recent breakthrough on this front was our creation of the D-Cell, a deep neural network modeling the inner workings of a eukaryotic cell. We are also developers of Cytoscape, a popular platform for visualization and modeling of biological networks which is supported by a consortium of many labs including our own (http://www.cytoscape.org/).

  • Using a hierarchical cellular model to analyze tumor genetic mutations

    The student will explore whether a hierarchical model we have recently constructed for predicting growth of simple cells can be translated to predict aggressiveness of human cancer. The model will be provided, along with access to tumor exomes from both public and internal sources. The goal is to determine, over a 10-week rotation, whether and to what extent the model can be used to analyze a patient's exome. If so, this project could be readily developed into a PhD thesis.

    Prerequisites: Computer programming or scripting skills; some knowledge of genomic biology.

  • Computing a minimal set of genes required for life

    A long-standing question in biology is how many (and which) genes are required for life. This essential core set of genes, or minimal genome, makes up the cell's “life support system” or “chassis and power supply” on which more complex functions and processes are built. This set of genes is of keen interest in the field of synthetic biology, which aims to synthesize the complete minimal genome of an organism and add functions to this genome for biotechnological, pharmaceutical and agricultural ends. This project will attempt to use our whole-cell model of the networks and pathways in a cell to predict which genes and gene combinations are essential for life and, conversely, which genes and gene combinations can be removed. If successful, this project will be able to predict minimal genomes for synthesis and testing. It will also address whether there actually is a single “minimal genome” or whether there exist many different configurations, all of which are at or near the global minimum.

    Prerequisites: Computer programming or scripting skills.
    Optional: Experimental laboratory skills, which would allow the student to test model predictions.

  • Development of a software pipeline for generating cell function hierarchies from genomic data

    We have developed algorithms (NeXO and CliXO) by which systematic datasets are used to organize genes into a gene ontology, reflecting the hierarchical organization of cellular structures and molecular pathways in the cell. Currently these algorithms are coded in Python; however, a user-friendly and expandable interface would allow end-users to quickly build and update gene ontologies from new data sets. Coding this interface is the main goal of this rotation. If successful, this tool could seed a thesis project to construct a gene ontology for a particular cellular process (e.g. DNA damage response) or disease (e.g. cancer) of interest.

    Prerequisites: Computer programming or scripting skills; some knowledge of genomic biology.

  • Experimental mapping of the DNA damage response

    Cell colonies on agar grow in a near-linear fashion with growth rates reflective of their "fitness". The laboratory has developed an experimental platform that can make continuous measurements of growth rates via time-lapse image capture of thousands of specific genetic mutant strains, enabling us to determine the relevance of every gene in the response to stimuli such as DNA damage via radiation or chemotherapy. During the rotation the student will grow ~50,000 cell colonies in parallel and capture their growth curves using digital images and intermittent radiation exposure. The project includes working in MATLAB for the analysis of growth curves and the elucidation of DNA damage response pathways. If successful, the project could be developed into a thesis which uses these data to construct a hierarchical model of DNA damage responses.
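
    As a rough illustration of the growth-curve analysis described above, the sketch below fits a straight line to colony size over time and compares a mutant's slope with a reference strain's. It is written in Python rather than the MATLAB the project uses, and the function names (`growth_rate`, `relative_fitness`) and all measurements are hypothetical, not taken from the lab's pipeline.

```python
from statistics import mean


def growth_rate(times, sizes):
    """Least-squares slope of colony size versus time (assumes near-linear growth)."""
    if len(times) != len(sizes) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    t_bar, s_bar = mean(times), mean(sizes)
    spread = sum((t - t_bar) ** 2 for t in times)
    if spread == 0:
        raise ValueError("time points must not all be identical")
    covariation = sum((t - t_bar) * (s - s_bar) for t, s in zip(times, sizes))
    return covariation / spread


def relative_fitness(mutant_rate, reference_rate):
    """Ratio of a mutant's growth rate to the reference strain's growth rate."""
    return mutant_rate / reference_rate


# Hypothetical time-lapse measurements (hours, colony area in arbitrary units).
hours = [0, 2, 4, 6, 8]
wild_type = [10.0, 14.0, 18.0, 22.0, 26.0]
mutant = [10.0, 11.0, 12.0, 13.0, 14.0]

wt_rate = growth_rate(hours, wild_type)
mut_rate = growth_rate(hours, mutant)
print(wt_rate, mut_rate, relative_fitness(mut_rate, wt_rate))
```

    A slope ratio well below 1 under a given treatment would flag the mutated gene as a candidate for involvement in the response; a real analysis would also need replicate handling and some check that growth is in fact linear over the fitted window.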

    Prerequisites: Prior experience in a genetics or biochemistry experimental laboratory.

Rob Knight | Pediatrics

rknight@ucsd.edu | Lab

The Knight lab has broad interests in the human microbiome, the collection of trillions of microbes that inhabits our bodies, especially in developing techniques to read out these complex microbial communities and use the resulting data to understand human health, links between humans and the environment, and to prevent and cure disease. We offer a fast-paced environment with many collaborative opportunities on different projects.

  • Machine Learning for the Microbiome

    We have amassed a database of microbial DNA sequences from hundreds of thousands of biological specimens. Understanding how variation among these microbial communities relates to disease requires a range of machine learning and multivariate statistical approaches. There are many opportunities ranging from entry-level (benchmarking classifier performance on specific sample sets) to extremely challenging (using deep learning to infer the structure of global sample set relationships).

  • Multi-omics integration

    An increasing need is to integrate data from different "omics" levels, e.g. genomes, metagenomes, metatranscriptomes, metaproteomes, metabolomes, immunological profiling, etc., into a single coherent picture separating healthy and disease states. Improved methods for performing this task, either directly or via intermediate representations such as mapping to metabolic and regulatory pathways, are essential for improving understanding. Projects in this category range from simple (testing where existing techniques like correlation networks or Procrustes analysis do/don't connect two specific data layers) to challenging (using transfer learning to integrate heterogeneous data layers and improve the underlying network annotation). An especially exciting emerging research direction here is XAI (explainable artificial intelligence), which can provide, for clinical applications, a better justification for a specific classification or suggestion.

  • Optimizing microbiome algorithms

    Many algorithms used in microbiome studies, especially in metagenomic assembly, are extremely computationally expensive. Opportunities exist either for exploiting new hardware architectures to accelerate existing algorithms or for developing new approximate algorithms to tackle problems in the workflow, including inferring taxonomy and function from DNA sequence data, genome and metagenome assembly and annotation, computing community distance metrics from sparse compositional data, and high-level analyses of hundreds of thousands of microbiomes. Again, these projects range from entry-level (comparing results of two multiple sequence alignment techniques for subsequent community analysis) to advanced (using non-von Neumann architectures to perform pattern classification in real time at the whole-community level for disease detection).
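
    To make "community distance metrics from sparse compositional data" concrete, here is a minimal sketch of one widely used metric, Bray-Curtis dissimilarity, computed over sparse taxon-count dictionaries. The choice of metric is an assumption for illustration (the project description does not name one), and the sample contents are made up.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two sparse {taxon: count} samples.

    Taxa missing from a sample are treated as having count 0, so only
    observed taxa need to be stored.
    """
    total = sum(sample_a.values()) + sum(sample_b.values())
    if total == 0:
        raise ValueError("both samples are empty")
    taxa = set(sample_a) | set(sample_b)
    difference = sum(abs(sample_a.get(t, 0) - sample_b.get(t, 0)) for t in taxa)
    return difference / total


# Hypothetical read counts per taxon for two samples.
gut_1 = {"Bacteroides": 6, "Prevotella": 4}
gut_2 = {"Bacteroides": 2, "Faecalibacterium": 8}

print(bray_curtis(gut_1, gut_2))  # 0.8
print(bray_curtis(gut_1, gut_1))  # 0.0
```

    The value ranges from 0 (identical counts) to 1 (no shared taxa). Computing it for every pair of samples scales quadratically with the number of samples, which is the kind of cost that motivates the approximate or hardware-accelerated approaches mentioned above.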

Jejo Koola | School of Medicine

jkoola@ucsd.edu | Profile

Dr. Koola is a physician scientist specializing in Biomedical Informatics and Hospital Medicine. He specializes in the area of big data machine learning for predictive analytics. In particular, he is interested in using electronic health records to improve care delivery, particularly for patients with advanced liver disease. Using risk prediction models in a healthcare context requires understanding of: (i) the healthcare system of intended use; (ii) risk model building; (iii) risk model assessment; and (iv) risk model re-calibration. Additionally, Dr. Koola is interested in visual analytics, data modeling, and health services research.

  • Designing the "Green Button" informatics consult service using big data analytics for personalized medicine

    In 2012 the Institute of Medicine released desiderata for a learning healthcare system, where evidence informs practice and practice informs evidence. Though the randomized clinical trial (RCT) serves as the gold standard for informing clinical decisions, it has flaws: difficulty achieving recruitment, overly stringent inclusion/exclusion criteria, and lack of patient-centered decision making. Observational cohort studies have grown as an important complement to RCTs, allowing comparative effectiveness research and patient-centered trials. The surge of Electronic Health Records (EHR) and the resulting zettabyte of data allow us to realize this vision for the first time. Despite the growth of observational cohort studies, challenges remain in bringing this knowledge from the bench to the bedside; moreover, model performance degrades when a model is used in a cohort other than the one in which it was developed.

    To ameliorate these difficulties, we propose to launch and study a novel “informatics consult” service. The service would allow clinicians, when no clear evidence-based guidelines exist regarding care decisions, to query the UCSD clinical data warehouse by identifying patients similar to the index case. First proposed in the seminal “Green Button” paper by Longhurst et al., such a system would leverage our ability to truly deliver personalized, patient-centered care. Small-scale, limited efforts have been put into practice to answer questions regarding treatment of melanoma and systemic lupus erythematosus complications. We note, however, the opportunity for a much larger service with broad impact, starting with insights borne of data from UCSD and potentially mining insights from the entire state-wide UC Health data warehouse.

    We note several novel challenges to this proposed system: (i) Performing semi-automated phenotyping so that we can identify clinical outcomes of interest. (ii) Identifying patients that are similar to the index patient (often called clustering). (iii) Incorporating automated, computable search regarding guideline-recommended care. (iv) Performing visual analytics to understand similarity of cohorts. (v) Communicating probability and statistical information to healthcare professionals so they can effectively manage uncertainty.
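
    Challenge (ii), finding patients similar to an index case, can be sketched as a nearest-neighbour search over standardized clinical features. This is a simple stand-in for illustration only, not the proposed service's method; the feature set (age, a liver severity score, serum sodium), the patient identifiers, and the function names are all hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each feature column so no single unit dominates the distance."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by 0.
    spreads = [pstdev(c) or 1.0 for c in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, spreads)] for row in rows]


def most_similar(index_features, cohort, k=3):
    """Return the ids of the k cohort patients closest to the index patient."""
    scaled = standardize([index_features] + [features for _, features in cohort])
    target, others = scaled[0], scaled[1:]
    distances = [
        (sqrt(sum((a - b) ** 2 for a, b in zip(target, row))), patient_id)
        for row, (patient_id, _) in zip(others, cohort)
    ]
    return [patient_id for _, patient_id in sorted(distances)[:k]]


# Hypothetical de-identified cohort: (id, [age, severity score, sodium]).
cohort = [
    ("p1", [54, 12, 138]),
    ("p2", [61, 25, 129]),
    ("p3", [47, 9, 141]),
    ("p4", [63, 27, 128]),
]
index_patient = [62, 26, 130]

print(most_similar(index_patient, cohort, k=2))  # ['p2', 'p4']
```

    Outcomes among the retrieved neighbours could then be summarized for the clinician. Choosing which features to include and how to weight them is the hard part, and is where the phenotyping and visual analytics challenges listed above come in.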

    Student responsibilities:

    1. Participate in project meetings
    2. Help design one of several possible algorithms/interfaces:
      a. patient clustering algorithm using unsupervised learning
      b. visual analytic interface for describing similar cohort of patients
      c. visual analytic interface to help communicate statistical risk
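
    One possible starting point for item (a) and challenge (ii) is to retrieve the patients nearest to the index case in a standardized feature space. The sketch below is an illustration only: the feature names, the toy records, and the choice of Euclidean distance on z-scores are assumptions, not the design of the proposed service, and nearest-neighbour retrieval is a simpler stand-in for a full unsupervised clustering method.

```python
import math

def zscore_columns(rows):
    """Standardize each feature column to mean 0 and unit standard deviation."""
    n_features = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / len(rows)
        stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
    return [[(r[j] - means[j]) / stds[j] for j in range(n_features)] for r in rows]

def most_similar(index_id, patients, k=2):
    """Return the ids of the k patients closest to the index patient."""
    ids = list(patients)
    scaled = dict(zip(ids, zscore_columns([patients[i] for i in ids])))
    target = scaled[index_id]
    distances = [(math.dist(target, scaled[i]), i) for i in ids if i != index_id]
    return [i for _, i in sorted(distances)[:k]]

# Hypothetical features: age (years), HbA1c (%), systolic blood pressure (mmHg)
patients = {
    "index": [64, 8.1, 150],
    "p1": [66, 7.9, 148],
    "p2": [30, 5.2, 110],
    "p3": [61, 8.4, 155],
    "p4": [45, 6.0, 125],
}
similar = most_similar("index", patients, k=2)
print(similar)
```

    Standardizing first keeps a feature with a large numeric range (such as blood pressure) from dominating the distance; a real service would also need to handle categorical variables and missing data.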
  • Integrating patient-reported outcomes into the electronic health record to improve cardiovascular care.

    Last Updated:

    Unhealthy dietary choices (a lack of nutritious foods and an excess of unhealthy food) were shown to be the major contributor to the 400,000 U.S. deaths from cardiovascular diseases (CVD) in 2015. Eating more nuts, vegetables, and whole grains, and less salt and trans fats, could save tens of thousands of lives in the U.S. each year. Obesity is one critical outcome of poor diet, and it also contributes to heightened CVD risk. Thousands of smartphone apps are available to download for weight loss, but these apps primarily focus on caloric intake rather than the overall quality of diet and lifestyle critical for CVD prevention.

    Mobile Health (mHealth) applications also have not been systematically tested for their effectiveness and are criticized for not having an evidence-based foundation. In this study, we adapt the design of mHeart to communicate automatically with the UCSD Electronic Health Record, giving healthcare providers access to psychosocial aspects of a patient's care outside of the direct hospital system. In particular, the provider will be able to view logs of patient activity, dietary choices, and other lifestyle choices. The provider will also be able to send feedback to the patient to alter behavior.

    Student opportunities:

    1. Help modify smartphone app to make use of healthcare connection protocols like Apple HealthKit and Google Fit
    2. Understand interfaces that communicate with electronic health records (like FHIR)
    3. Help design point-to-point interface between smartphone app and electronic health record data, which is presented to provider
    4. Participate in meetings designing pilot study to test app performance
  • Systematic review and meta-analysis of hospital readmission for patients with cirrhosis.

    Last Updated:

    Patients with cirrhosis, a late stage of chronic liver disease, are at increased risk of hospitalization and hospital readmission. Although several studies have looked at models for predicting readmission for patients with cirrhosis, they are limited by small sample sizes, limited candidate predictor variables, and limited evaluation of discrimination and calibration. A systematic review and meta-analysis of available evidence can help shed new light on the problem, and help identify modifiable risk factors.

    Student responsibilities:

    1. Understand the basics of a systematic review
    2. Perform literature review
    3. Abstract necessary information in case report forms and help perform meta-analysis
    4. Help write manuscript
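
    To illustrate what the meta-analysis step can involve, the sketch below pools study-level effect sizes with the DerSimonian-Laird random-effects estimator, one common choice. It is a minimal example under stated assumptions: the three log odds ratios and their variances are made up, and the project may use a different model or dedicated meta-analysis software.

```python
import math

def pool_random_effects(effects, variances):
    """Pool study effects with the DerSimonian-Laird random-effects estimator."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at zero
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Made-up log odds ratios and variances for three hypothetical studies
log_or = [0.40, 0.25, 0.60]
var = [0.04, 0.02, 0.09]
pooled, se, tau2 = pool_random_effects(log_or, var)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled OR = {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(low):.2f} to {math.exp(high):.2f}), tau^2 = {tau2:.3f}")
```

    When the estimated between-study variance is zero, the random-effects result coincides with the inverse-variance fixed-effect estimate.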

Amit Majithia | School of Medicine

amajithia@ucsd.edu | Profile | Lab

Our goal is to identify genes causing insulin resistance in humans in order to find new therapeutic targets for diabetes and cardiometabolic diseases. Our approach to discovery is grounded in human genetics, clarified through systematic, high-throughput experimentation in human cells, and calibrated by its relevance to clinical disease. We use massively parallel genome engineering to re-create mutations identified in patients and develop high-throughput assays to interrogate function in human cell models. We apply bioinformatics and statistics to make sense of these data by integrating 1) human mutations, 2) cellular function, and 3) metabolic/glycemic phenotypes of the individuals who harbor them. Using this approach, we have discovered novel missense mutations that greatly increase risk for type 2 diabetes. As a complementary aim towards precision medicine, we develop tools for clinical genome interpretation powered by high-throughput experimental data.

  • Evaluating accuracy and clinical utility of commercially available genetic risk scores for diabetes

    Last Updated:

    Recently, 23andMe, which sells direct-to-consumer genetic testing products, introduced a diabetes risk report based on single nucleotide polymorphism (SNP) genotypes measured in its commercial product ($199: https://www.statnews.com/2019/03/10/23andme-will-tell-you-how-your-dna-affects-your-diabetes-risk-will-it-be-useful/). The clinical utility of this report is unclear and has generated significant controversy. Critically, 23andMe’s SNP chips test only about 0.02% of the human genome. We have shown in previous work that a single rare SNP, not captured by SNP chips, can change an individual’s risk of diabetes 7-fold. The purpose of this project is to test the 23andMe diabetes report output in a dataset of individuals whose diabetes status is known and who have also undergone more extensive genome sequencing (whole exomes), in order to assess the accuracy of direct-to-consumer SNP tests and to quantify the number of falsely reassuring tests when more complete genetic information is considered.
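
    As a rough illustration of the evaluation described here, the sketch below scores report output against known diabetes status and counts reassuring reports given to carriers of a rare high-impact variant. The field names, the "typical"/"increased" labels, and the toy records are all hypothetical; they are not taken from the 23andMe report format or from the project's dataset.

```python
def evaluate_reports(records):
    """Compare risk reports with known diabetes status and rare-variant carrier status."""
    tp = sum(r["report"] == "increased" and r["diabetes"] for r in records)
    fn = sum(r["report"] == "typical" and r["diabetes"] for r in records)
    tn = sum(r["report"] == "typical" and not r["diabetes"] for r in records)
    fp = sum(r["report"] == "increased" and not r["diabetes"] for r in records)
    # A report is counted as falsely reassuring when it says "typical" risk
    # but sequencing found a rare high-impact variant the SNP chip missed.
    falsely_reassured = sum(
        r["report"] == "typical" and r["rare_variant_carrier"] for r in records
    )
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "falsely_reassured": falsely_reassured,
    }

# Toy records standing in for individuals with both SNP-chip and exome data
records = [
    {"report": "increased", "diabetes": True, "rare_variant_carrier": False},
    {"report": "typical", "diabetes": True, "rare_variant_carrier": True},
    {"report": "typical", "diabetes": False, "rare_variant_carrier": False},
    {"report": "typical", "diabetes": False, "rare_variant_carrier": True},
    {"report": "increased", "diabetes": False, "rare_variant_carrier": False},
    {"report": "typical", "diabetes": False, "rare_variant_carrier": False},
]
summary = evaluate_reports(records)
print(summary)
```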

  • Identifying discriminators of drug-responsive mutations in Mendelian diabetes

    Last Updated:

    Loss-of-function mutations in hepatocyte nuclear factor 1 alpha (HNF1A) cause maturity-onset diabetes of the young type 3 (MODY3), an autosomal dominant form of diabetes. Patients with MODY3 are clinically difficult to distinguish from patients with autoimmune type 1 diabetes and are therefore often given the same treatment, consisting of multiple daily injections of insulin. However, MODY3 patients can be effectively treated with a single daily sulfonylurea tablet and thus spared from having to take multiple daily injections. This project aims to use RNA-sequencing data generated from cells engineered to express a range of HNF1A mutations (MODY3 and non-MODY) to identify a signature of genes that can distinguish between sulfonylurea-responsive and non-responsive mutations. This transcriptomic signature would form the basis of a biomarker test in patients with HNF1A mutations to predict their responsiveness and provide the most effective, least burdensome treatment.

  • Integrative genomics to identify a novel disease-causing mutation in Simpson-Golabi-Behmel syndrome (SGBS)

    Last Updated:

    SGBS is characterized by overgrowth of multiple body parts. It is a rare genetic disease that has been attributed to inactivating mutations in GPC3. We have stem cells from a patient with SGBS but no GPC3 mutation, implicating another, as yet unknown, causal gene. We have performed whole genome sequencing and RNA sequencing on these cells. The goal of this project is to use these genomic data sets to create a “short list” of candidate causal genes, which can then be assessed experimentally in the patient cells using genome engineering.

Lucila Ohno-Machado | Biomedical Informatics

machado@ucsd.edu | Profile

Scott Rifkin | Biological Sciences

sarifkin@ucsd.edu | Profile | Lab

The Rifkin laboratory studies how environmental, genetic, and stochastic variation interact to generate phenotypic variation and thereby mold the course of evolution. We use yeasts and nematodes as model organisms and work primarily at the level of gene regulatory and signal transduction networks.

  • Constraints and selection that drive gene family evolution

    Last Updated:

    High-quality genome sequences that cover entire genera are increasingly available. This facilitates fine-scale investigations of the forces that drive protein evolution. We are using the Caenorhabditis (roundworm) genus and the Saccharomyces (yeast) genus to study the role of selection and constraint on gene family evolution in the context of developmental and physiological networks. A rotation project would include phylogenetic analyses of gene families, evolutionary tests for selection and constraint, and integration with mechanistic and functional data from systems biology.

  • Statistics on the morphogenesis of hybrid inviability

    Last Updated:

    Hybrids between different species do not, as a rule, do well. We are using microscopy, image informatics, and statistics to understand the developmental biology of hybrid incompatibility. We are imaging the complete development (3D position of every nucleus over time) of hybrid worm embryos to determine whether there are particularly sensitive stages of development at which genomes of different species prove especially incompatible, even if other stages proceed properly. A rotation project on this topic would include image informatics for processing the microscopy images and statistical investigation of variation in this complicated trait.

Michael Rosenfeld | School of Medicine

mrosenfeld@ucsd.edu | Profile

Lab Location: CMM-West, Rm. 345

Lab Phone: 858-534-5858

Lab Composition and Activities: Five graduate students from several programs, and a talented group of enthusiastic (also helpful) postdoctoral fellows and a full time laboratory manager. We have one general laboratory meeting, one graduate student-only meeting, and one personal meeting each week. We also have joint lab meetings with two other labs weekly.

Research Interests: Our central laboratory focus this year is to continue to utilize global genomic approaches to uncover and investigate the “enhancer code” controlled by new, previously unappreciated pathways that integrate the genome-wide response to permit proper development and homeostasis, and that also function in disease and senescence. We have investigated these events in differentiated cells, neuronal development, stem cells, and cancer. Our biological focus is on molecular mechanisms of the “enhancer code” regulating learning and memory, aggressive prostate and breast cancer, and the underlying events of senescence/aging. Epigenomic events studied include non-histone methylation events and non-coding RNAs. We are investigating these events in development, in breast and prostate cancers, and in inflammation-based disease, including degenerative CNS disease and diabetes. The emerging importance of non-coding RNAs and regulation of nuclear architecture is rapidly altering our concepts of homeostasis and disease. Our laboratory is “Seq-ing” (RIP-seq, ChIP-seq, RNA-seq, GRO-seq, CLIP-seq, ChIRP-seq, and a new “FISH-seq”) for open-ended discovery of long-distance genome interactions, to uncover new “rules” of regulated gene transcriptional programs and new roles for lncRNAs in the biology of normal, cancer, neuro-affective disorder, and aging cells. Coupling this with chemical library screens, we hope to introduce new types of therapies based on targeting specific gene enhancers, histone protein readers and writers, and lncRNAs for cancers and other diseases. Recent surprising findings include novel roles of lncRNAs in prostate and breast cancer, a connection between DNA damage repair/transcription and replication, and unexpected roles of enhancer RNAs.

Current interests include:

  • The “enhancer code,” Epigenomics and transcriptional regulatory mechanisms.
  • Roles of ncRNAs in enhancer function, in signal-dependent genomic relocation, and in establishing subnuclear architecture.
  • Mechanisms of signal-induced tumor chromosomal translocation events and new chemical screens for inhibitors for breast and prostate cancer.
  • The “enhancer code” for regulation of learning and memory, including Reelin-regulated enhancers.
  • Linkage of DNA damage/repair and transcription.
  • Retinoic Acid regulation of Pol III-transcribed DNA repeats in maintenance of the stem cell state, in neuronal differentiation and in senescence.
  • Molecular mechanisms of prevalent disease-associated sequence variations (GWAS) in disease susceptibility loci.
  • “Epigenomics” in neuronal differentiation, cancer, diabetes and degenerative brain disease.
  • Answering the question of when and how enhancers arise and become functional (stem cells to mature cell types).

  • Bioinformatics Rotation Projects

    Last Updated:

    Potential projects include:

    • Projects employing genome-wide technologies, including ChIP-seq, GRO-seq, CLIP-seq, RNA-seq, and ChIRP-seq, to elucidate molecular mechanisms of regulated enhancer lncRNA actions in cancer and stem cells;
    • Roles and mechanisms of enhancer actions in prostate and breast cancers;
    • Enhancer-based model of neurodevelopment and CNS disorders;
    • New mechanisms of long non-coding RNAs dictating physiological gene regulation in cancer transcriptional programs;
    • Understanding subnuclear structures: Roles of relocation of transcription units between subnuclear architectural structures in regulated gene expression;
    • Chemical library screens to gene signature and translocation responses as an approach toward new cancer therapeutic reagents;
    • Roles of epigenomic regulators and expression of DNA repeats in stem cells, neuronal differentiation and in senescence.

Jonathan Sebat | Cellular and Molecular Medicine

jsebat@ucsd.edu | Profile | Lab

Our laboratory is interested in how rare and de novo mutations in the human genome contribute to patterns of genetic variation and risk for disease in humans. To this end, we are developing novel approaches to gene discovery that are based on advanced technologies for the detection of rare variants, including studies of copy number variation (CNV) and deep whole genome sequencing (WGS). Our goal is to identify genes related to psychiatric disorders and determine how genetic variants impact the function of genes and corresponding cellular pathways.

  • Determining the effect of autism mutations on development of the head and face

    Last Updated:

    We have collected whole genome sequence data and 3D digital images of the head and face from a set of 300 autism families. This project will examine quantitative measurement of facial features in autism patients and sibling controls and determine the degree to which specific mutations affect craniofacial structure. We will apply unsupervised clustering of genetic and phenotype data to define diagnostic subgroups of patients.

  • Determining the frequency of spontaneous reversion in the human genome

    Last Updated:

    Structural variants (SVs) in the human genome are poorly ascertained in genome-wide association studies (GWAS). Tandem duplications in particular are not efficiently tagged by adjacent SNPs. The reasons for this are not known. We hypothesize that SVs, once formed, create local instability resulting in a high rate of spontaneous reversion. This project will directly determine the rates of spontaneous reversion in whole genomes of 300 trio families. In addition, we will examine the local patterns of genetic variation adjacent to SVs to infer the occurrence of reversion events.

  • Identifying human essential genes by deletion mapping of a large population

    Last Updated:

    Studies of genetic variation in large populations make it possible to determine the degree of natural selection acting on specific sequences. Our lab has mapped structural variation (SV, including deletions and duplications) in large samples (N>100,000). By generating a null model based on regional patterns of SV, we propose to identify sequences that deviate dramatically from expectations. Sequences that display extreme deviation are likely to be genes that are essential for life.
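
    The idea of flagging sequences that deviate from a null model can be sketched as a one-sided test for depletion of deletions. The example below assumes, purely for illustration, that deletion counts in a region follow a Poisson distribution whose mean is the count expected under the null; the region names, counts, and threshold are hypothetical, and the lab's actual null model is not specified here.

```python
import math

def poisson_lower_tail(observed, expected):
    """P(X <= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)
    total = term
    for k in range(1, observed + 1):
        term *= expected / k
        total += term
    return total

def depleted_regions(regions, alpha=1e-3):
    """Return names of regions with far fewer deletions than the null expects."""
    hits = []
    for name, observed, expected in regions:
        if poisson_lower_tail(observed, expected) < alpha:
            hits.append(name)
    return hits

# Hypothetical regions: (name, observed deletions, expected deletions under the null)
regions = [("geneA", 0, 12.0), ("geneB", 9, 10.0), ("geneC", 2, 15.0)]
hits = depleted_regions(regions)
print(hits)
```

    In a genome-wide scan the threshold would need correction for the number of regions tested, and overdispersed counts might call for a negative binomial model rather than a Poisson one.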

Yingxiao (Peter) Wang | Bioengineering

yiw015@eng.ucsd.edu | Profile | Lab

Our research focuses on molecular engineering for cellular imaging and reprogramming, and image-based bioinformatics, with applications in stem cell differentiation and cancer treatment.

  • Image-based reconstruction of biochemical networks in live cells

    Last Updated:

    Fluorescence resonance energy transfer (FRET)-based biosensors have been widely used in live-cell imaging to accurately visualize specific biochemical activities. We have developed the Fluocell image analysis software package to efficiently and quantitatively evaluate the intracellular biochemical signals in real-time, and to provide statistical inference on the biological implications of the imaging results. However, important questions arise on how to use these results to reconstruct the quantitative parameters in the underlying biochemical networks, which determine cellular functions and ultimately their fates. In this rotation project, we will integrate optimization-based machine learning approaches with biochemical network models to seek answers to these questions, with applications in cancer treatment against drug resistance.

  • Intelligent Diagnosis of Infectious Diseases by Deep Learning

    Last Updated:

    The diagnosis of infectious diseases often requires tissue biopsy and microscopic examination by pathologists, which is time-consuming, labor-intensive, and error-prone. To develop a software-assisted system for identifying microorganisms in digital images, we use convolutional neural networks and transfer learning to train and validate an intelligent software system for the classification of pathology slides. The goal of this project is to provide a diagnosis of pathogens with high efficiency and accuracy. Students will work in an interdisciplinary team, collecting and labelling imaging data, developing deep-learning-based algorithms and user interfaces, and characterizing and optimizing the accuracy and functionality of the software package.

Gene Yeo | Cellular and Molecular Medicine

geneyeo@ucsd.edu | Profile | Lab

We have a wide scope of projects, ranging from developing novel algorithms for studying RNA processing in diseases, development and personalized medicine to analyzing single-cell RNA-seq data.

  • Single-cell RNA-seq analysis

    Last Updated:

    We have projects that deal with developing new algorithms for single-cell RNA-seq analysis pertaining to studying heterogeneity in complex mixtures of cells upon environmental challenges.