

Bioinformatics and Systems Biology

Rotation Projects


This page is updated annually. Some projects may already be taken, and new projects may be available. The projects below give an indication of the types of projects available in each lab, but please browse faculty web pages and contact professors directly to discuss current opportunities.

Labs with rotation projects

Gaurav Arya, Nanoengineering

Contact:
Last updated: 1/5/2009


Our laboratory is interested in the development and application of theoretical/simulation methods for addressing important problems in biology and nanotechnology. Our focus is on three specific areas:

  1. DNA: Regulation of chromatin through histone modifications, histone variants, and chromatin remodeling, in collaboration with Profs. Doug Smith and Wei Wang; higher-order genome organization and dynamics in collaboration with Prof. Cornelius Murre.
  2. RNA: Predicting tertiary and secondary structure of RNAs using single-molecule force profiling and computational design of ribozymes. The latter study is in close collaboration with Prof. Ulrich Muller.
  3. Nano: Carbon nanotube spinning, self-assembly, and coiling, in close collaboration with the research groups of Profs. Ken Vecchio, Sia Nemat-Nasser, and Prab Bandaru, respectively; and self-assembly of polymer-coated colloidal particles.

There are projects available in each of the three sub-areas for talented and motivated students. Please visit our group website (http://maeresearch.ucsd.edu/arya/), send me an email, or drop by my office (#2304, Calit2) for more details.

Scott B. Baden, Computer Science and Engineering

When: Fall or Winter (I am on sabbatical in Spring)
Contact:
Last updated: 9/17/2009


  • Accelerated computation on Graphics Processing Units
     
    In addition to performance programming, accuracy and error management issues may arise, because GPUs deliver far higher performance for single-precision than for double-precision floating-point arithmetic.
  • Domain Specific Computational Databases
     
    Add extensions to our Computational Database testbed to treat new applications, avoiding the classic overheads of relational databases (such as MySQL) in processing scientific data.
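The single- versus double-precision trade-off mentioned in the GPU project can be illustrated with a small sketch. Plain Python has no native single-precision type, so this example assumes that rounding every intermediate result through `struct` to a 4-byte float is a fair stand-in for float32 arithmetic on a GPU; the function names are hypothetical. It compares naive accumulation in double and single precision with Kahan (compensated) summation, one standard way to recover accuracy in single precision. It is not meant to represent the lab's own error-management approach.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (an IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum(value: float, n: int, single: bool) -> float:
    """Add `value` to a running total n times, rounding every step to float32 if `single`."""
    step = to_float32(value) if single else value
    total = 0.0
    for _ in range(n):
        total += step
        if single:
            total = to_float32(total)
    return total


def kahan_sum_single(value: float, n: int) -> float:
    """Kahan (compensated) summation with every operation rounded to float32."""
    step = to_float32(value)
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for _ in range(n):
        y = to_float32(step - compensation)
        t = to_float32(total + y)
        compensation = to_float32(to_float32(t - total) - y)
        total = t
    return total


if __name__ == "__main__":
    n, exact = 100_000, 10_000.0
    for label, result in [
        ("float64 naive", naive_sum(0.1, n, single=False)),
        ("float32 naive", naive_sum(0.1, n, single=True)),
        ("float32 Kahan", kahan_sum_single(0.1, n)),
    ]:
        print(f"{label}: {result!r} (abs. error {abs(result - exact):.3g})")
```

The naive float32 total drifts visibly from the exact value, while the compensated float32 total stays within a few units of the last place, at the cost of roughly four times as many floating-point operations per element.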

Vineet Bafna, Computer Science and Engineering

Contact:
Last updated: 2/6/2009


Rotation opportunities in genomics and proteomics (Spring 2009)

We focus on algorithmic problems in genomics and proteomics. Current projects include

  1. Detection and resolution of structural variations in genomes through analysis of next-generation sequencing data.
  2. Problems in genetics, including haplotyping, selection, rare-variant analysis, and others.
  3. Computational analysis of mass spectrometry data for protein identification, quantification, imaging, structure, and other projects.

Richard Belew, Cognitive Sciences

Project: Computational analyses of HIV drug-resistance Co-evolution
Contact:
Last updated: 9/17/2009


One or more graduate (and potentially undergraduate) students sought as part of a long-term NIH-funded research project with collaborators at Scripps Research Institute. Funding is available immediately and may be extended as our experience warrants it.

HIV is arguably the most dynamic evolutionary system on the planet, in large part because of our repeated attempts to develop drug treatments that drive it to extinction. Over the last three decades, HIV has become one of the most studied and best characterized biological systems. Huge amounts of data on viral mutation, structural features of key proteins and effective drugs, drug resistance, and patient histories are accumulating. A major current challenge lies in methods for integrating these datasets, toward the development of predictive models of viral evolution and, from them, more effective therapies. Specific projects anticipated during the 2009-2010 academic year:

  • Simulations of intra-patient HIV evolution.
  • Dynamical prediction over clinical patient data.
  • Structural analysis underlying Vif/APOBEC interface.
  • Interdisciplinary integration of HIV-centric literature sources, ontologies, etc., with primary scientific data sets.

Students interested in working on these issues should have strong computational and mathematical skills. Knowledge of modern biology and familiarity with genomic and other -omic resources will also be helpful. Prior experience with the following is especially valuable:

  • Discrete-time, discrete-event simulation methodologies.
  • Parallel computing models: e.g., Grid computing, Map-reduce, CUDA (Compute Unified Device Architecture), FPGAs (field-programmable gate arrays).
  • Advanced Python and/or Java techniques: e.g., Zope, Plone, XML/XSLT, multithreading.
  • Global and local optimization techniques: e.g., Hamiltonian Monte Carlo, convex optimization (à la Boyd and Vandenberghe).

Phil Bourne, School of Pharmacy

Contact:
Last updated: 9/17/2009


See http://www.sdsc.edu/pb/projects.htm for current projects available in the Bourne Lab and descriptions of the projects below. Rotation projects available as of June 2009 include:

  • Rotation/PhD Projects: Pharmaceutical Sciences - Competitive Binding of Major Pharmaceuticals.
  • Summer/Rotation/PhD Student Project: From Physical Model of Nucleosome Organization Towards Genome Annotation.
  • Rotation/Summer/PhD Projects: Earth Sciences Meets Life Sciences.
  • Rotation/Summer Project: Exploring the Impact of Co and Mo Environments on Life.
  • Rotation/Summer/PhD Project: Exploring the Flexibility versus Designability of Protein Folds.
  • Rotation/Summer/PhD Project: What Makes Some Intron Positions Ultra-conserved?
  • Rotation/Summer/PhD Project: Building a Meta-method for Assignment of Structural Domains in Proteins.
  • PhD Project: Looking for Correlation Between Protein and Gene Structure.
  • Rotation/Summer Project: Scholarly Communication.
  • Summer/Rotation/PhD Project: Evolution of Domain Associations in tRNA synthetases.

Steve Briggs, Cell and Developmental Biology

Project: Mass spectral data integration
When: Any quarter
Contact:
Last updated: 9/17/2009


We study the response of cells and organisms to changes in their immediate environment by making genome-wide, quantitative observations of their proteomes. The data from a typical experiment comprise 1 million peptide fragment mass spectra (CID) and 1 million companion PQD spectra. The CID spectrum reveals the amino acid sequence of the peptide, whereas the PQD spectrum reveals its relative amount (by measuring iTRAQ reporter ion intensities). We use two different search engines to interpret mass spectra: Inspect and SpectrumMill. Several others are available, but we prefer these two. Inspect is being developed by Profs. Vineet Bafna and Pavel Pevzner and their teams at UCSD. SpectrumMill was also partially developed by Pavel Pevzner, but it is commercially available from Agilent. We collaborate closely with both Vineet and Pavel.

There is extensive overlap in the results produced by Inspect and SpectrumMill but the results are not identical. Each search engine is able to annotate a subset of mass spectra that the other cannot. In some cases, they provide conflicting results by assigning the same mass spectrum to two different peptides.

There are three goals for this project. First, the student will develop algorithms and software that integrate compatible results from Inspect and SpectrumMill. Second, the student will develop and incorporate algorithms that resolve conflicts between Inspect and SpectrumMill. Third, the student will evaluate additional search engines (X!Tandem and OMSSA) to determine their advantages and to resolve potential conflicting results.
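The first two goals (merging compatible results and detecting conflicts) can be sketched as a comparison over spectrum identifiers. This is a minimal sketch under stated assumptions: each engine's output is assumed to have already been parsed into a hypothetical `{spectrum_id: peptide}` dictionary (real Inspect and SpectrumMill reports carry scores and modifications that are ignored here), and leucine and isoleucine are treated as identical because they have the same mass. The spectrum IDs and peptide sequences in the usage example are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def normalize(peptide: str) -> str:
    """Treat leucine (L) and isoleucine (I) as identical, since they have the same mass."""
    return peptide.upper().replace("I", "L")


@dataclass
class Reconciled:
    agreed: Dict[str, str] = field(default_factory=dict)           # same peptide from both engines
    only_a: Dict[str, str] = field(default_factory=dict)           # annotated only by engine A
    only_b: Dict[str, str] = field(default_factory=dict)           # annotated only by engine B
    conflicts: Dict[str, List[str]] = field(default_factory=dict)  # same spectrum, different peptides


def reconcile(engine_a: Dict[str, str], engine_b: Dict[str, str]) -> Reconciled:
    """Merge two {spectrum_id: peptide} maps and flag spectra the engines disagree on."""
    out = Reconciled()
    for spectrum_id in sorted(engine_a.keys() | engine_b.keys()):
        pa: Optional[str] = engine_a.get(spectrum_id)
        pb: Optional[str] = engine_b.get(spectrum_id)
        if pa is not None and pb is not None:
            if normalize(pa) == normalize(pb):
                out.agreed[spectrum_id] = pa
            else:
                out.conflicts[spectrum_id] = [pa, pb]
        elif pa is not None:
            out.only_a[spectrum_id] = pa
        elif pb is not None:
            out.only_b[spectrum_id] = pb
    return out


if __name__ == "__main__":
    # Hypothetical, already-parsed search results (not real data).
    inspect_hits = {"scan_001": "PEPTIDEK", "scan_002": "LVNELTEFAK", "scan_003": "AAIGDR"}
    spectrummill_hits = {"scan_001": "PEPTLDEK", "scan_002": "HLVDEPQNLIK", "scan_004": "YLYEIAR"}
    merged = reconcile(inspect_hits, spectrummill_hits)
    print("agreed:", merged.agreed)
    print("conflicts:", merged.conflicts)
    print("Inspect only:", merged.only_a)
    print("SpectrumMill only:", merged.only_b)
```

The `conflicts` bucket is where the second goal would begin, for example by comparing each engine's scores or estimated false discovery rates for the competing peptides; that resolution step is left open in this sketch.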

Pieter Dorrestein, School of Pharmacy

Project: Creative approaches for using mass spectrometry data in therapeutic dereplication and discovery efforts.
When: Any quarter
Contact:
Last updated: 9/17/2009


One of the conundrums in the discovery of new therapeutics is the rediscovery of known molecular entities. It is therefore important to find and remove known molecular entities from the list of candidates, a process called dereplication. Dereplication can be performed with mass spectrometry; however, no good database of mass spectrometry data exists for these types of molecules. One rotation project would therefore be to interface SMILES, a one-line text representation of molecules, with a comparative dereplication database and to apply this to the discovery of unknown molecular entities. In addition, we are developing novel therapeutic discovery approaches that rely on the observation of metabolic exchange via imaging mass spectrometry or metabolomic approaches. Again, new and creative algorithmic solutions are needed to find needles in a haystack of data.

Below are two recent publications that highlight some of these efforts.

  1. Dereplication and de novo sequencing of nonribosomal peptides. Ng J, Bandeira N, Liu WT, Ghassemian M, Simmons TL, Gerwick WH, Linington R, Dorrestein PC, Pevzner PA. Nat Methods. 2009 Aug;6(8):596-9. Epub 2009 Jul 13. PMID: 19597502
  2. Interpretation of tandem mass spectra obtained from cyclic nonribosomal peptides. Liu WT, Ng J, Meluzzi D, Bandeira N, Gutierrez M, Simmons TL, Schultz AW, Linington RG, Moore BS, Gerwick WH, Pevzner PA, Dorrestein PC. Anal Chem. 2009 Jun 1;81(11):4200-9. PMID: 19413302
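The mass-matching core of the dereplication idea described above can be sketched as follows. This sketch assumes the SMILES strings in the database have already been converted to neutral monoisotopic masses with a cheminformatics toolkit (not shown), considers only singly protonated [M+H]+ ions, and uses an illustrative 5 ppm tolerance; the compound names and masses are hypothetical. Practical dereplication tools typically also compare MS/MS fragmentation patterns rather than relying on precursor mass alone.

```python
from typing import Dict, List, Tuple

PROTON_MASS = 1.007276  # Da; added to a neutral monoisotopic mass to get the [M+H]+ m/z


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def dereplicate(
    observed_mz: List[float], known: Dict[str, float], tol_ppm: float = 5.0
) -> Tuple[Dict[float, List[Tuple[str, float]]], List[float]]:
    """Match observed [M+H]+ peaks against known neutral monoisotopic masses.

    Returns (matches per peak, peaks with no match); unmatched peaks are the
    candidates for previously unknown molecular entities.
    """
    matches: Dict[float, List[Tuple[str, float]]] = {}
    for mz in observed_mz:
        hits = []
        for name, neutral_mass in known.items():
            err = ppm_error(mz, neutral_mass + PROTON_MASS)
            if abs(err) <= tol_ppm:
                hits.append((name, err))
        matches[mz] = sorted(hits, key=lambda h: abs(h[1]))  # best match first
    unknown = [mz for mz, hits in matches.items() if not hits]
    return matches, unknown


if __name__ == "__main__":
    # Hypothetical database entries and observed peaks (not real compounds).
    known_masses = {"compound_A": 500.2000, "compound_B": 812.4500}
    matched, novel = dereplicate([501.2090, 813.4700, 650.3000], known_masses)
    for mz, hits in matched.items():
        print(mz, [(name, round(err, 2)) for name, err in hits])
    print("candidate unknowns:", novel)
```

Here the first peak falls within tolerance of `compound_A`, while the other two are reported as candidate unknowns, which is the subset a discovery effort would prioritize.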

Charles Elkan, Computer Science and Engineering

Contact:
Last updated: 9/17/2009


BIOINFORMATICS ROTATION PROJECTS AVAILABLE IN MACHINE LEARNING

Students are welcome for computational rotation projects that apply methods from machine learning and statistics to problems in sequence analysis, structure prediction, and data and text mining.

Specific possible tasks include:

  1. Identifying and disambiguating all mentions of transport proteins in Medline abstracts.
  2. Predicting contacts between amino acids in globular proteins.
  3. Applying so-called "topic models" to gene expression data.
  4. Predicting entry of HIV into cerebrospinal fluid, based on longitudinal patient characteristics.

These projects are all somewhat speculative and high-risk, high-reward, and they all require programming skills combined with mathematical ability. They are based closely on high-quality recent research, but they demand innovation.

Chris Glass, Cellular and Molecular Medicine

Contact:
Last updated: 08/31/2009


Research Interests: Dr. Glass’ laboratory investigates transcriptional mechanisms that regulate the development and function of the macrophage, a cell that plays key roles in immunity and inflammatory diseases. Current efforts aim to determine the biochemical and biological roles of sequence-specific transcription factors and their associated co-regulators at a genome-wide scale. A combination of biochemical, cellular and genetic model systems is used, incorporating macrophage-specific knockouts, microarray technologies, massively parallel sequencing and bioinformatics approaches, to unravel the contributions of specific factors to the development of specialized macrophage functions in immunity and the pathogenesis of inflammatory diseases.

Bioinformatics rotation projects focus on analysis of data derived from the application of chromatin immunoprecipitation coupled to massively parallel sequencing (ChIP-Seq) to define genome-wide locations of transcription factors that control specific aspects of macrophage biology. This information is integrated with corresponding high throughput transcriptomic data to develop testable models for transcriptional circuits that underlie biological responses.

Jeff Hasty, Molecular Biology and Bioengineering

Project: Modeling the dynamics of gene regulation
Contact:
Last updated: 09/20/2009


This rotation project will involve two questions in the area of modeling the dynamics of gene regulation. We have generated fluorescence data from a system where a promoter is driven periodically with an external inducer and the resulting gene expression has been measured using GFP. The first question involves the deduction of a mesoscopic dynamical model from the data, and how this approach can be generalized to characterize a library of promoters for use in constructing genetic circuits. The second question is how the data and the resulting mesoscopic model can be used to constrain a large set of parameters that define a microscopic model.
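As a minimal illustration of what a coarse-grained dynamical model of periodically driven expression might look like, the sketch below assumes a linear first-order model, dx/dt = alpha*u(t) + beta - gamma*x, where u(t) is a square-wave inducer, x is GFP fluorescence, and alpha, beta, and gamma are hypothetical production, leak, and decay rates with made-up values. This is not the lab's model; it only shows the kind of frequency-dependent behavior (slow forcing is tracked, fast forcing is averaged out) that data from a periodically driven promoter can constrain.

```python
from typing import List, Tuple


def square_wave(t: float, period: float) -> float:
    """Inducer on for the first half of each period, off for the second half."""
    return 1.0 if (t % period) < period / 2 else 0.0


def simulate(period: float, alpha: float = 1.0, beta: float = 0.05, gamma: float = 0.1,
             dt: float = 0.01, t_end: float = 600.0) -> Tuple[List[float], List[float]]:
    """Forward-Euler integration of dx/dt = alpha*u(t) + beta - gamma*x from x(0) = 0."""
    x = 0.0
    ts: List[float] = []
    xs: List[float] = []
    for i in range(int(round(t_end / dt))):
        t = i * dt
        x += dt * (alpha * square_wave(t, period) + beta - gamma * x)
        ts.append(t)
        xs.append(x)
    return ts, xs


def peak_to_peak(ts: List[float], xs: List[float], period: float, t_end: float) -> float:
    """Peak-to-peak response over the last two forcing periods, after transients decay."""
    tail = [x for t, x in zip(ts, xs) if t >= t_end - 2 * period]
    return max(tail) - min(tail)


if __name__ == "__main__":
    for period in (10.0, 50.0, 100.0):
        ts, xs = simulate(period)
        print(f"forcing period {period:>5}: peak-to-peak {peak_to_peak(ts, xs, period, 600.0):.2f}")
```

Fitting alpha, beta, and gamma to measured traces at several forcing periods would be one way to summarize each promoter in a library with a few numbers; a microscopic model would then have to reproduce those fitted values.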

Alexander Hoffmann, Chemistry and Biochemistry

When: Any quarter
Contact:
Last updated: 9/17/2009


The Hoffmann laboratory is focused on the signaling systems that control inflammatory and immune responses. Misregulation of these is a common cause for chronic diseases including cancer. We are interested in information processing mechanisms that connect receptors with transcription factors, as well as the regulatory circuitry in the nucleus that determines gene expression programs.

Rotation projects involve statistical and/or mechanistic modeling. Statistical modeling usually involves large amounts of data we generate by genome wide expression or transcription factor location analysis with our in-house set of mouse knockouts. Mechanistic modeling involves the ordinary differential equations that describe receptor activation and the regulation of kinase networks and transcription factor activities, and that can be parameterized using experimental data acquired by the biochemistry-focused members of the group. Merging the two to generate predictive models for gene expression programs in healthy immune responses and in disease is a third and ultimate challenge.

Current Rotation Project Opportunities are:

  1. Develop kinetic models that account for the sequential activation of different members of the NF-κB dimer family.
  2. Develop a kinetic model for MyD88/TRIF signaling downstream of TLR pathogen receptors.
  3. Develop and compare different models of gene activation: the role of multiple specific TF-binding sites and non-functional sites.
  4. Classify feedback regulators based on their emergent systems properties - this project is more theoretical than the others.
  5. Develop algorithms to identify specific gene expression modules in B-cells, T-cells, dendritic cells.
  6. Develop algorithms to characterize the specificity of NF-κB family members based on gene expression data in knockout cells.

Vivian Hook, Skaggs School of Pharmacy and Pharmaceutical Sciences, Depts. of Neuroscience, Pharmacology, and Medicine

Contact:
Last updated: 9/17/2009


The Hook lab uses bioinformatics and mass spectrometry to define the protease pathways responsible for (1) the production of active peptides that function as peptide neurotransmitters for pain relief, and (2) the production of neurotoxic peptides in the neurodegenerative Alzheimer's and Huntington's diseases. Peptide and protease mass spectrometry, combined with molecular and cellular biochemistry, are integrated in multidisciplinary projects to define pharmacological features of protease pathways for drug development strategies. Please see the Hook lab website: http://pharmacy.ucsd.edu/HookLab/index.php

In addition to rotation projects suggested on the bioinformatic web site, student discussions and formulation of new ideas for rotation projects are welcome.

Trey Ideker, Bioengineering

Contact:
Last updated: 9/20/2009


The Ideker Laboratory conducts bioinformatic and experimental research in the field of Network and Systems Biology. Our bioinformatics research aims to develop network-based methods that we believe will form the next generation of tools for disease diagnosis and treatment. These include:

  1. Network Assembly: Methods for assembling genome-scale data (molecular interactions and profiles) into models of signaling and regulatory pathways.
  2. Network-Based Diagnosis: Methods for using protein interaction maps to diagnose cancer and interpret personalized genetic information.
  3. Comparative Network Genomics: Methods for comparison of networks across different species, network types, biological conditions, or points in time.

Many of these tools are developed in Cytoscape, an open-source software environment for visualization and modeling of biological networks which is supported by a consortium of many labs including our own (http://www.cytoscape.org/). Experimentally, the lab is applying the above techniques to key problems in cell biology and disease, including mapping the networks governing how cells respond to DNA damage and mapping the networks that underlie infection by HIV, malaria, herpes, and others.

Rotation projects are currently available in all of the above areas. Particular details of two projects are:

  • Pathway Association: A new paradigm for interpreting patient genotypes.
    Although genome-wide association studies (GWAS) are rapidly increasing in number, numerous challenges persist in identifying and explaining the associations between genetic loci and quantitative disease phenotypes. The rotation student will develop tools to integrate gene association data with protein network information to identify the pathways underlying a patient's genotype. These methods will elevate the study of gene association to a new study of “pathway association.” The project is a collaboration with Dr. Richard Karp in the EECS Department at UC Berkeley.
  • Network comparison of fission and budding yeast.
    A current roadblock to advancing Comparative Network Genomics as a field is the marked lack of interaction maps at high coverage and at the appropriate distances for evolutionary comparison. To address this shortcoming, we are working with the Krogan lab at UCSF to obtain high-coverage network maps across the model organisms Schizosaccharomyces pombe and Saccharomyces cerevisiae [for recent results see Roguev et al. Science 2008]. Networks are being analyzed at the level of protein-DNA (transcriptional), protein-protein, and genetic (synthetic-lethal) interactions. In many aspects of its physiology, including intron/exon splicing, chromosomal architecture, and RNA interference machinery, S. pombe has more in common with mammals than S. cerevisiae does. The rotation student will be involved in protein-DNA interaction screens using the technique of chromatin immunoprecipitation followed by Solexa sequencing, integrative analysis of the network data s/he generates during the rotation, and joint meetings with UCSF investigators.

Richard D. Kolodner, Member, Ludwig Institute for Cancer Research and Professor, Departments of Medicine, and Cellular and Molecular Medicine

Contact:
Last updated: 9/17/2009


My research is primarily directed at studying the genetic and biochemical mechanisms of genetic recombination, DNA repair and suppression of spontaneous mutations primarily using Saccharomyces cerevisiae as a model system. Work in S. cerevisiae falls in two interrelated areas:

  1. the analysis of the proteins and genes that function in DNA mismatch repair; and
  2. elucidation of the pathways that prevent translocations and other types of gross chromosomal rearrangements, and the analysis of the proteins that function in these pathways.

The Kolodner lab also has research interests in the area of investigating the genetics of cancer susceptibility and development that follows on previous studies showing that a common cancer susceptibility syndrome, Lynch Syndrome (hereditary non-polyposis colorectal cancer), is due to inherited defects in DNA mismatch repair genes. This work is focused on understanding whether genes that prevent genome instability act as tumor suppressor genes in mouse and humans.

Current research areas are as follows:

  • Reconstitution of DNA mismatch repair from purified proteins, and biochemical and genetic analysis of mismatch repair mechanisms.
  • Biophysical analysis of mismatch repair proteins using methods such as small angle X-ray scattering, X-ray crystallography and deuterium exchange by mass spectrometry.
  • Use of yeast genetics to identify the different pathways that prevent translocations and other genome rearrangements, with a particular interest in checkpoints and processing of stalled replication intermediates.
  • Systems biology analysis of pathways that prevent genome instability as well as pathways that regulate the DNA damage response.
  • Proteomic approaches to the study of proteins that function in mismatch repair and in preventing genome instability.
  • Human and mouse genetics of mismatch repair defects and genetic defects that cause genome instability. We also study the human genetics of inflammatory syndromes.

Potential rotation projects:

  1. Use proteomic approaches including mass spectrometry to identify proteins that interact with known mismatch repair proteins. Perform genetic experiments to validate interactions identified.
  2. Use next generation DNA sequencing to identify genome rearrangements in S. cerevisiae mutants having high rates of genome instability. This would include hands on experience doing high throughput sequencing and development and evaluation of the computational pipeline for analysis of the sequence data.
  3. Develop robotic yeast genetics methods for identifying genes and pathways that prevent genome instability. Develop and/or implement bioinformatics methods for associated pathway analysis.
  4. Use robotic methods to perform a high-density genetic analysis of the ~100 genes encoding chromatin assembly and modification pathways to determine which pathways play a role in maintaining genome stability.
  5. Use next generation DNA sequencing for identifying genetic defects in genome stability genes in human cancers. This would include hands on experience doing high throughput sequencing, data analysis and different strategies for evaluating the functional consequences of genetic changes identified.

Representative publications:

  1. Wang, Y, Putnam, CD, Kane, MF, Zhang, W, Edelmann, L, Russel, R, Carrington, DV, Chin, L, Kucherlapati, R, Kolodner, RD, and Edelmann, W. Mutation in Rpa1 results in defective DNA double-strand break repair, chromosomal instability and cancer in mice. Nature Genet. 2005;37:750-755.
  2. Mazur, DJ, Mendillo, ML, and Kolodner, RD. Inhibition of Msh6 ATPase activity by mispaired DNA induces a Msh2(ATP)-Msh6(ATP) state that is capable of hydrolysis independent movement along DNA. Mol. Cell. 2006;22:39-49.
  3. Enserink, JM, Smolka, MB, Zhou, H, and Kolodner, RD. Checkpoint proteins control morphogenetic events during DNA replication stress in Saccharomyces cerevisiae. J. Cell Biol. 2006;175:729-741.
  4. Ragu, S, Faye, G, Iraqui, I, Heneman-Masurel, A, Kolodner, RD, and Huang, M-E. Endogenous oxygen metabolism and reactive oxygen species cause chromosomal rearrangements and cell death in Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. USA. 2007;104:9747-9752.
  5. Shell, SS, Putnam, CD, and Kolodner, RD. The N-terminus of Saccharomyces cerevisiae Msh6 is an unstructured tether to PCNA. Mol. Cell. 2007;26:565-578.
  6. Putnam, CD, Hayes, TK, and Kolodner, RD. Specific pathways prevent duplication-mediated genome rearrangements. Nature (Article). 2009;460:984-989.
  7. Pennaneach, V, and Kolodner, RD. Stabilization of dicentric translocations through secondary rearrangements mediated by multiple mechanisms in S. cerevisiae. PLoS One. 2009;4:e6389.

Sergei L. Kosakovsky Pond, Medicine

When: Any quarter
Contact:
Last updated: 9/23/2009


A variety of flexible, purely computational projects (no wet-lab components) are available at the UCSD viral evolution group (located near the UCSD hospital in Hillcrest).

1. Software for modeling molecular evolution.

We develop a variety of software tools and statistical procedures, both standalone (www.hyphy.org) and web-based (www.datamonkey.org) for cutting-edge modeling of sequence evolution. There are a variety of projects available in this area, including

  • Learning and improving computational and data analysis algorithms
  • Writing code in C/C++/Python.
  • Parallel algorithm development (OpenCL and MPI).
  • Performing analysis of viral pathogen data using cutting edge techniques.

2. Pattern analysis and data mining of viral sequences

Many problems in evolutionary biology involve searching a combinatorially large space of possible solutions, including vaccine design (http://www.ncbi.nlm.nih.gov/pubmed/17465674), unraveling the evolutionary past of a sequence (http://www.ncbi.nlm.nih.gov/pubmed/16818476), and understanding how proteins evolve to adapt to immune responses while maintaining essential functions (http://www.ncbi.nlm.nih.gov/pubmed/18039027). There is great potential to apply machine learning techniques (genetic algorithms, natural language processing, Bayesian graphical models, support vector machines, etc.) to tackle these types of problems and obtain novel insight into how evolution shapes viral genomes.


3. Molecular evolution and epidemiology of HIV

For decades, the best efforts to design an effective HIV vaccine have failed, partly because the virus is incredibly genetically diverse and has evolved a multitude of mechanisms to adapt to and escape the immune system, and partly because we still lack a good understanding of what happens to the virus early in infection. We are actively involved in many projects run at the UCSD Center for AIDS Research and the Antiviral Research Center, focusing on:

  • Mapping the epidemiological network of HIV-infected patients in San Diego.
  • Understanding the genotypic basis of viral escape from neutralizing antibodies.
  • Using deep sequencing to describe and model viral populations in a single host, especially in those hosts that harbor multiple viral variants.
  • Developing an array of HIV-1 specific bioinformatic tools.

J. Andrew McCammon, Chemistry & Biochemistry, Pharmacology, HHMI

Contact:
Last updated: 09/24/2009


One-quarter rotations: fall, winter, and spring quarters, depending on desk availability.


Computer-aided Drug Discovery

The McCammon group conducts a very wide range of research activities, from the deeply biological (studies of protein and nucleic acid targets for drugs for infectious diseases, studies of protein kinase regulation, etc.) to the development of mathematical and physical methods for simulating biological processes (development of methods for solving partial differential equations, exploring the role of hydrodynamic interactions in protein-protein association, etc.). All of this work involves the use of computers; we do no experimental work in the traditional sense, but we have extensive collaborations with experimental labs at UCSD, The Scripps Research Institute, The Salk Institute, and elsewhere. A more complete perspective can best be obtained by visiting the McCammon group website (http://mccammon.ucsd.edu/).

Participants in the Bioinformatics Graduate Program who have completed their Ph.D.’s in the McCammon group have often focused on computer-aided drug discovery. Rotations can typically be arranged that involve working on aspects of ongoing projects. Current drug discovery work includes efforts to identify compounds that might be effective as antiviral agents (for HIV/AIDS or influenza), anti-trypanosomal agents (for African sleeping sickness or Chagas disease), etc. Our previous work has facilitated the discoveries of the HIV-1 protease inhibitor nelfinavir (approved by the FDA in 1997) and the first-in-class HIV-1 integrase inhibitor raltegravir (approved by the FDA in 2007).

Recommended reading includes

  • Schames, J., R.H. Henchman, J.S. Siegel, C.A. Sotriffer, H. Ni, J.A. McCammon. Discovery of a Novel Binding Trench in HIV Integrase. J. Med. Chem. 47, 1879-1881 (2004).
  • Amaro, R.E., R. Baron, J.A. McCammon. An Improved Relaxed Complex Scheme for Receptor Flexibility in Computer-Aided Drug Design. J. Comp.-Aid. Molec. Des. 22, 693-705 (2008).
  • Amaro, R., A. Schnaufer, W. Hol, K. Stuart, J.A. McCammon. Discovery of Drug-like Inhibitors of an Essential RNA Editing Ligase in Trypanosoma brucei. Proc. Natl. Acad. Sci. USA 106, 17278-17283 (2009).

Andrew McCulloch, Bioengineering

Contact:
Last updated: 09/17/2009


  1. Computational modeling of cardiac myocyte excitation-contraction coupling mechanisms and signal transduction.
  2. Systems biology of hypoxia responses in Drosophila -- analysis and modeling of microarray, metabolomic and phenomic data.
  3. Algorithms for combinatorial drug discovery and analysis of the cancer cell survival control landscape.

While all projects are computational, all also have significant experimental components.

Sanjay Nigam, Pediatrics/Medicine/Cellular and Molecular Medicine/Bioengineering

Project: Systems Biology of Organ Development and Repair, and Multiscale Modeling of Drug and Toxin Handling
When: Any quarter
Contact:
Last updated: 9/17/2009


The Nigam Lab is a multidisciplinary group of wet lab and computationally-oriented biologists. There are 3 general areas of study:

  1. Systems biology of organ development and repair.
  2. Multiscale modeling of drug, toxin and metabolite handling at the molecular, cellular and organ level.
  3. Tissue engineering.

The major focus is on the kidney as a model system. A considerable body of quantitative wet lab data has been accumulated and is being systematized for ongoing computational analysis to identify and experimentally test relevant pathways. Potential rotation students have the opportunity to work with members of the group with both wet lab and computational expertise. Dual mentorship involving other Bioinformatics and Systems Biology faculty is encouraged.

Lucila Ohno-Machado, Medicine

When: Any quarter
Contact:
Last updated: 9/17/2009


Calibration of Predictive Models for Diagnosis and Prognosis of Disease

Medical decision support tools are increasingly available on the Internet and are being used by lay persons as well as health care professionals. The goal of some of these tools is to provide an "individualized" prediction of future health care related events, such as prognosis in breast cancer, given specific information about the individual, including the genotype. These estimates are then used to inform decision making and are therefore of critical importance for public health. Verifying the calibration of a prognostic model is a critical but often overlooked step in evaluation, which usually focuses on verifying the model's discriminatory ability.

Our specific aims are to:

  1. Characterize the main deficiencies of existing calibration indices in the context of individualized predictions and develop a new model-independent calibration index and comparison test that can be used to assess and compare predictive models based on both statistical regression and machine learning methods;
  2. Unify the theories on decomposition of error into discrimination and calibration components stemming from the statistical and machine learning communities, to derive a refined measure of calibration that can be calculated from measures of error and discrimination. We compare the performance of the new methods in different predictive models derived from real clinical data from different medical domains.
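The distinction between discrimination and calibration can be made concrete with a short sketch. It uses the area under the ROC curve (AUC) for discrimination and the classical Hosmer-Lemeshow statistic for calibration, not the new index proposed in the aims, on simulated data: a model that predicts the true risk, and a hypothetical miscalibrated model that predicts the cube of the true risk. Because cubing preserves the ranking of patients, both models should have the same AUC, yet the second systematically underestimates risk, which only the calibration measure detects.

```python
import random
from typing import Sequence


def auc(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Discrimination: probability that a random positive is ranked above a random negative."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    score = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5  # ties count as half-correct
    return score / (len(pos) * len(neg))


def hosmer_lemeshow(probs: Sequence[float], labels: Sequence[int], groups: int = 10) -> float:
    """Calibration: Hosmer-Lemeshow statistic over groups of sorted predicted risk."""
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)   # observed events in this risk group
        expected = sum(p for p, _ in chunk)   # events the model predicts for this group
        mean_p = expected / n_g
        variance = n_g * mean_p * (1.0 - mean_p)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat


if __name__ == "__main__":
    rng = random.Random(0)
    true_risk = [rng.random() for _ in range(1000)]             # simulated patients
    outcomes = [1 if rng.random() < r else 0 for r in true_risk]
    calibrated = true_risk
    miscalibrated = [r ** 3 for r in true_risk]                 # same ranking, lower risks
    for name, preds in [("calibrated", calibrated), ("miscalibrated", miscalibrated)]:
        print(f"{name}: AUC={auc(preds, outcomes):.3f}, HL={hosmer_lemeshow(preds, outcomes):.1f}")
```

The two models print essentially identical AUC values but very different Hosmer-Lemeshow statistics, which is the gap between discrimination and calibration that the aims above target.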

In-Silico Validation of Biomarkers for Breast Cancer

One of the most important challenges in the validation of breast cancer biomarkers is determining whether a potential biomarker is specific for breast cancer or simply a marker of acute disease. Controversy on whether C3a is a specific marker for breast cancer illustrates this issue well. The problem with many biomarker identification studies is that they do not include samples that are representative of acute disease as controls. The advent of high-throughput technologies for gene expression and protein measurement has been accompanied by a plethora of articles describing potential biomarkers. Given the experimental design limitations of the original experiments and of initial and secondary data analyses, many biomarkers are hypothesized without a firm basis and, not surprisingly, cannot be validated in further experiments. We hypothesize that it is possible to invalidate most biomarkers in-silico, before any resources are spent in large scale validation studies.

Our aims are to:

  1. Determine whether it is feasible to reliably align and rescale biomarker measurements from different technologies spanning different biological scales for comparative analysis;
  2. Build a resource that will enable researchers to assess the predictive ability of hypothesized breast cancer biomarkers using information from other studies.
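As a minimal sketch of these aims, assuming that rank-based rescaling within each study is an acceptable way to put different platforms on a common scale (quantile normalization or calibration against shared reference samples are alternatives), a marker's ability to separate breast cancer from acute-disease controls can be pooled across studies. The helper names, labels and values below are hypothetical.

```python
def percentile_ranks(values):
    """Rescale raw values to mid-rank percentiles in (0, 1); ties share a rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = (avg_rank - 0.5) / n
        i = j + 1
    return ranks


def pooled_auc(studies, case="cancer", control="acute"):
    """AUC of a marker for cancer vs acute-disease controls across studies.

    Each study is a list of (raw_value, label) pairs measured on its own
    platform; values are rescaled within each study before pooling.
    """
    cases, controls = [], []
    for study in studies:
        scaled = percentile_ranks([v for v, _ in study])
        for s, (_, label) in zip(scaled, study):
            if label == case:
                cases.append(s)
            elif label == control:
                controls.append(s)
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in cases for b in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical measurements of one marker on two platforms with different units.
elisa = [(1, "acute"), (2, "acute"), (3, "cancer"), (4, "cancer")]
mass_spec = [(100, "acute"), (300, "acute"), (200, "cancer"), (400, "cancer")]
print(pooled_auc([elisa, mass_spec]))
```

An AUC near 0.5 against acute-disease controls would suggest the marker tracks acute disease rather than breast cancer specifically.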

José Onuchic, Physics

Contact:
Last updated: 9/17/2009


See http://ctbp.ucsd.edu/ for information about research at the Center for Theoretical Biological Physics.

Bernhard Ø. Palsson, Ph.D., Bioengineering

When: Fall 2009, Winter 2010, Spring 2010
Contact:
Last updated: 9/17/2009


See http://gcrg.ucsd.edu/About_Us for information about Systems Biology research in the Palsson lab.

Note: This list is not exhaustive. Feel free to come up with your own idea, and be sure to speak with people in the lab about new projects, which are always coming up. The course series BENG 211-213 can also serve as a useful introduction to the work we do in the lab.

  1. Modeling metabolic effects of light exposure in green microalgae
    Contact: Roger Chang,
    The goal of this project is to model the effects of light on reaction activity and biomass composition. This also requires proper representation of the spectral composition of light sources and elucidation of the quantitative difference between incident light and metabolically available light.
     
  2. Human physiological/metabolic network reconstruction
    Contact: Roger Chang,
    This project is meant to assemble a knowledgebase of human metabolic interactions across organ systems and biofluids. In addition to providing an easily understandable network capturing intersystem metabolism, this result can be integrated with context-specific models to create composite multi-system models to study precise metabolic interactions among the organs and biofluids.
     
  3. Experimental gap filling iAF1260c (E. coli)
    Contact: Jeff Orth,
    This project is meant to improve the current metabolic model of E. coli. In order to improve the topology and stoichiometry of the network, experimental work is necessary to validate computationally suggested additions and subtractions to the core model. Strains from the KEIO knockout collection will be grown and tested in different media to assess growth phenotypes. Iterative improvements will be made to the model based on the results.
     
  4. Y. pestis metabolic reconstruction
    Contact: Pep Charusanti,
    This project will culminate in the creation of a metabolic reconstruction for Y. pestis, an important human pathogen. Work in this area can leverage the recent reconstructions of closely related organisms and strains.
     
  5. Energy balanced Flux Balance Analysis (FBA) implementation and analysis
    Contact: Jan Schellenberger,
    Thermodynamics-based metabolic flux analysis (TMFA) is a form of metabolic flux analysis (MFA) that can generate thermodynamically feasible flux and metabolite activity profiles on a genome scale. In addition to the mass balance constraints typically used in MFA, TMFA imposes a set of linear thermodynamic constraints. The resulting flux distributions contain no thermodynamically infeasible reactions or pathways, and TMFA provides information about the free energy change of reactions and the range of metabolite activities in addition to reaction fluxes. TMFA has been applied to the genome-scale metabolic model of Escherichia coli developed by Palsson and co-workers to study the thermodynamically feasible ranges of the fluxes, the Gibbs free energy changes (ΔrG') of the reactions, and the metabolite activities; at optimal growth, the metabolite activities and reaction ΔrG' values can span a wide range. We aim to implement this algorithm (adding it to the COBRA toolbox) and perform a preliminary analysis of existing metabolic models.
     
  6. Improving an automated genome assembly pipeline (using only small reads)
    Contact: Harish Nagarajan, , and Christian Barrett,
     
  7. Brain mitochondria modeling
    Contact: Nathan Lewis,
     
  8. Microarray data analysis (E. coli)
    Contact: Nathan Lewis,
     
  9. Speed optimize an algorithm for binding site identification for ChIP-on-chip data
    Contact: Nathan Lewis,
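For project 5, the core thermodynamic constraint in TMFA ties a reaction's flux direction to the sign of its transformed Gibbs energy, ΔrG' = ΔrG'° + RT Σ s_i ln a_i. The sketch below, with hypothetical function names, computes ΔrG', its feasible range under assumed bounds on metabolite activities, and the direction check. In TMFA itself these constraints are embedded in a mixed-integer linear program with binary direction variables; this is not that formulation and not part of the COBRA toolbox.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ / (mol K)


def delta_g_prime(dg0_prime, stoich, activities, temp_k=298.15):
    """Transformed reaction Gibbs energy: dG' = dG'0 + RT * sum(s_i * ln a_i).

    stoich maps metabolite -> coefficient (negative for substrates).
    """
    return dg0_prime + R_KJ * temp_k * sum(
        s * math.log(activities[m]) for m, s in stoich.items())


def delta_g_range(dg0_prime, stoich, bounds, temp_k=298.15):
    """Min and max dG' reachable when each activity may vary within bounds.

    The minimum puts products at their lower bound and substrates at their
    upper bound; the maximum does the opposite.
    """
    lo_terms = hi_terms = 0.0
    for m, s in stoich.items():
        a_min, a_max = bounds[m]
        terms = (s * math.log(a_min), s * math.log(a_max))
        lo_terms += min(terms)
        hi_terms += max(terms)
    rt = R_KJ * temp_k
    return dg0_prime + rt * lo_terms, dg0_prime + rt * hi_terms


def direction_is_feasible(flux, dg_prime):
    """TMFA-style rule: forward flux needs dG' < 0, reverse flux needs dG' > 0."""
    if flux > 0:
        return dg_prime < 0
    if flux < 0:
        return dg_prime > 0
    return True


# Hypothetical reaction A -> B with dG'0 = +5 kJ/mol and assumed activity bounds (M).
stoich = {"A": -1, "B": 1}
bounds = {"A": (1e-5, 0.02), "B": (1e-5, 0.02)}
lo, hi = delta_g_range(5.0, stoich, bounds)
print(f"dG' can range from {lo:.1f} to {hi:.1f} kJ/mol")
```

Because the range spans zero in this example, the reaction could run in either direction depending on metabolite levels, which is exactly the kind of information TMFA adds on top of FBA.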

Pavel Pevzner, Computer Science and Engineering

Contact:
Last updated: 9/20/2009


There are two rotation projects on emerging sequencing technologies.

The first project aims to develop a new tool for transcriptome (rather than genome) assembly and to apply it to analyzing cancer transcriptomes. Existing Next Generation Sequencing (NGS) tools for analyzing transcriptomes assume that the genome is known. These tools are not well suited for analyzing abnormal transcripts (e.g., those encoding fusion proteins in cancer); moreover, in many cases the genome remains unknown. For example, the NIH Grand Opportunity topic "Transcriptomes of Medicinal Plants" aims at sequencing transcriptomes of plants with unknown genomes. While transcriptome assembly tools are clearly needed (Wang et al., 2008; Maher et al., Nature 2009), there are still no blind NGS transcriptome assembly tools able to assemble transcriptomes without a genome. The goal of this project is to develop a blind transcriptome assembler using EULER (Chaisson et al., Genome Research 2009). This rotation project is co-supervised by Pavel Pevzner and Glenn Tesler.
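EULER-style assemblers reduce assembly to finding an Eulerian path in a de Bruijn graph built from k-mers of the reads. The toy sketch below shows only that core idea, assuming error-free reads and no repeated (k-1)-mers; real transcriptome data add sequencing errors, repeats and alternative splicing, all of which branch the graph. It is not EULER's code, and the function names are hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer in the reads is one edge."""
    kmers = sorted({read[i:i + k] for read in reads for i in range(len(read) - k + 1)})
    graph = defaultdict(list)
    for kmer in kmers:
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm: walk every edge exactly once."""
    out_deg = {node: len(edges) for node, edges in graph.items()}
    in_deg = defaultdict(int)
    for edges in graph.values():
        for target in edges:
            in_deg[target] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((n for n in sorted(graph) if out_deg[n] - in_deg[n] == 1), None)
    if start is None:
        start = min(graph)
    remaining = {node: list(edges) for node, edges in graph.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path."""
    path = eulerian_path(de_bruijn_graph(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble(["ATGGCG", "GGCGTG", "CGTGCA"], k=4))
```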

The second project is focused on a new protein (rather than DNA) sequencing technology. A recent collaboration between UCSD and Genentech researchers resulted in the first high-throughput MS-assembly technique for sequencing antibodies (Bandeira et al., Nature Biotechnology 2008). MS-assembly is based on blind assembly of tandem mass spectra into intact proteins, which are then matched against known antibodies. The focus of this rotation project is to further develop antibody sequencing techniques and to collaborate with researchers from various institutions on sequencing monoclonal antibodies. We will further extend this project to sequencing polyclonal antibodies. This rotation project is co-supervised by Nuno Bandeira and Pavel Pevzner.
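Tandem mass spectra can be read as sequence because fragment ions of a peptide form ladders whose consecutive peaks differ by a single residue mass. The sketch below computes singly charged b-ion masses from standard monoisotopic residue masses and reads residues back from the gaps. It assumes a clean, complete b-ion ladder; real spectra are noisy, miss peaks and mix ion types, and this is not the MS-assembly algorithm. Function names are hypothetical.

```python
# Standard monoisotopic amino acid residue masses (daltons).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276


def b_ions(peptide):
    """Singly charged b-ion (N-terminal prefix) masses b1 .. b(n-1)."""
    masses, total = [], 0.0
    for residue in peptide[:-1]:
        total += RESIDUE_MASS[residue]
        masses.append(total + PROTON)
    return masses


def read_ladder(peaks, tol=0.02):
    """Infer residues from gaps between consecutive ladder peaks.

    Isobaric residues (I and L) cannot be told apart and are reported together.
    """
    calls = []
    for lighter, heavier in zip(peaks, peaks[1:]):
        gap = heavier - lighter
        hits = sorted(aa for aa, m in RESIDUE_MASS.items() if abs(m - gap) <= tol)
        calls.append("/".join(hits) if hits else "?")
    return calls


print(read_ladder(b_ions("PEPTIDE")))
```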

Jim Posakony, Biological Sciences

Contact: , x46300
Last updated: 1/4/2009


  1. Ancient transcriptional regulatory linkages. A central paradigm of modern biology is that changes in gene regulatory networks, particularly those affecting transcriptional cis-regulatory elements, are a major engine for creating developmental novelty during evolution. We are investigating the flip side of this coin: regulatory connections that survive for very long evolutionary periods. We suggest that these might be characteristic of ancient and abstract developmental capabilities of fundamental utility to all metazoans. We have identified several transcriptional regulatory linkages that have persisted for hundreds of millions of years, in some cases since before the cnidarian/bilaterian divergence. In multiple instances, we see strong evidence that the transcription factor (TF) binding site itself, and not just the TF-target gene connection, is conserved over these long distances. Moreover, this extreme site conservation frequently occurs in contexts in which other sites for the same TF are coming and going rapidly in evolution. We are engaged in the systematic identification of such extremely conserved connections as a means of uncovering the primordial functions of transcription factors and signaling systems. The rotation project, which is flexible in content, revolves around the development and use of bioinformatic approaches to this problem, coupled with wet-lab validation experiments.
  2. Micro-conservation domains as possible identifiers of discrete cis-regulatory modules. We have used the latest version of our GenePalette software to visualize patterns of genomic sequence conservation on a fine scale, in a manner that is independent of sequence alignment. This analysis reveals small non-coding regions containing a higher-than-average density of fully conserved sequence blocks, separated by regions of much lower conservation. The distances between more-conserved regions typically differ between species because of insertion-deletion events, so conserved segments that are adjacent in one species may be substantially separated in another. We would like to test the utility of this analytical method for identifying the boundaries of discrete transcriptional cis-regulatory modules. The rotation project would consist of both bioinformatic studies and wet-lab experimental tests of possible enhancer function, using reporter genes.
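The idea of alignment-free micro-conservation can be sketched by marking every position of one species' sequence that is covered by an exact k-mer also occurring anywhere in the other species' sequence, then scanning for windows dense in such blocks. This is a simplified illustration, not the GenePalette method; k, the window size, the density threshold and the function names are all hypothetical.

```python
def shared_kmer_mask(seq_a, seq_b, k=6):
    """Mark positions of seq_a covered by a k-mer that also occurs in seq_b.

    No alignment is used, so shared blocks may sit at different distances
    from one another in the two species.
    """
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    mask = [False] * len(seq_a)
    for i in range(len(seq_a) - k + 1):
        if seq_a[i:i + k] in kmers_b:
            for j in range(i, i + k):
                mask[j] = True
    return mask


def dense_windows(mask, window, min_fraction):
    """Start positions of windows whose fraction of conserved positions is
    at least min_fraction (updated incrementally as the window slides)."""
    if len(mask) < window:
        return []
    covered = sum(mask[:window])
    hits = [0] if covered / window >= min_fraction else []
    for start in range(1, len(mask) - window + 1):
        covered += mask[start + window - 1] - mask[start - 1]
        if covered / window >= min_fraction:
            hits.append(start)
    return hits


# Hypothetical sequences sharing one block with different flanks.
species_1 = "AAAAAAAAAA" + "GATTACAGGC" + "AAAAAAAAAA"
species_2 = "CCCCCC" + "GATTACAGGC" + "TTTT"
mask = shared_kmer_mask(species_1, species_2, k=6)
print(dense_windows(mask, window=10, min_fraction=1.0))
```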

Michael G. Rosenfeld, Howard Hughes Medical Institute

Project: Epigenetic Control of Development, Regulation, and DNA Damage Repair in Health and Disease
Contact:
Last updated: 9/17/2009


Here we describe two of five potential rotation projects that are essentially "ready to go" in terms of underlying data sets. We would be pleased to have you join us in any of these projects.

Rotation Project #1:

Analysis of induced double-stranded DNA breaks and tumor translocations. We have devised a new strategy for genome-wide mapping of double-stranded DNA breaks (DSBs) using Solexa sequencing. Using this method, we have analyzed the effects of estrogen or androgen, in the presence or absence of genotoxic stress, on the pattern of DSBs. These complex data sets may provide clues to where tumor translocation events occur in breast and prostate cancers, respectively.
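A first pass over such break-site data might bin mapped break positions and flag bins where hormone treatment increases the break count. The sketch below assumes break sites are already mapped as (chromosome, position) pairs and that both libraries were sequenced to similar depth (no normalization is done); the bin size, thresholds and function names are hypothetical.

```python
from collections import Counter


def bin_counts(breaks, bin_size):
    """Count (chromosome, position) break sites per fixed-size genomic bin."""
    return Counter((chrom, pos // bin_size) for chrom, pos in breaks)


def induced_hotspots(treated, control, bin_size=10_000, min_count=3, min_fold=4.0):
    """Bins with at least min_count treated breaks and a treated/control
    ratio (with a pseudocount of 1) of at least min_fold."""
    t_counts = bin_counts(treated, bin_size)
    c_counts = bin_counts(control, bin_size)
    hotspots = []
    for (chrom, b), n in t_counts.items():
        fold = (n + 1) / (c_counts.get((chrom, b), 0) + 1)
        if n >= min_count and fold >= min_fold:
            hotspots.append((chrom, b * bin_size, (b + 1) * bin_size, n, fold))
    return sorted(hotspots)


# Hypothetical break sites from hormone-treated and vehicle-treated cells.
treated = [("chr21", 5_000), ("chr21", 5_100), ("chr21", 5_200),
           ("chr21", 5_300), ("chr1", 1_000)]
control = [("chr21", 50_000)]
print(induced_hotspots(treated, control))
```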

Rotation Project #2:

Informatic prediction of translocation sites in human solid tumors. Based on the locations of estrogen receptor and androgen receptor binding sites, the boundaries between non-coding and coding exons, and double-stranded DNA break sites, it should be possible to design computationally a multiplexed experimental assay that can uncover novel translocations using deep RNA/DNA sequencing. This could have a high impact on our understanding of these two prevalent forms of cancer.
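The intersection step of such a design can be sketched as a window search: keep break sites that lie near both a receptor binding site and an exon boundary, then pair candidates on different chromosomes as possible translocation partners to target in a multiplexed assay. The coordinates, window size and function names below are hypothetical.

```python
from bisect import bisect_left
from itertools import combinations


def near(sorted_positions, pos, window):
    """True if any position lies within +/- window of pos (binary search)."""
    i = bisect_left(sorted_positions, pos - window)
    return i < len(sorted_positions) and sorted_positions[i] <= pos + window


def index_by_chrom(sites):
    """Group (chromosome, position) sites into sorted per-chromosome lists."""
    by_chrom = {}
    for chrom, pos in sites:
        by_chrom.setdefault(chrom, []).append(pos)
    return {c: sorted(p) for c, p in by_chrom.items()}


def candidate_breakpoints(dsbs, receptor_sites, exon_boundaries, window=5000):
    """Break sites close to both a receptor binding site and an exon boundary."""
    rs, eb = index_by_chrom(receptor_sites), index_by_chrom(exon_boundaries)
    return sorted(
        (chrom, pos) for chrom, pos in dsbs
        if near(rs.get(chrom, []), pos, window) and near(eb.get(chrom, []), pos, window)
    )


def candidate_fusion_pairs(breakpoints):
    """All interchromosomal pairs of candidate breakpoints."""
    return [(a, b) for a, b in combinations(breakpoints, 2) if a[0] != b[0]]


dsbs = [("chr21", 39_800_000), ("chr7", 13_900_000), ("chr1", 1_000)]
receptor_sites = [("chr21", 39_801_000), ("chr7", 13_903_000), ("chr1", 900_000)]
exon_boundaries = [("chr21", 39_799_000), ("chr7", 13_899_500), ("chr1", 1_200)]
bps = candidate_breakpoints(dsbs, receptor_sites, exon_boundaries)
print(candidate_fusion_pairs(bps))
```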

Milton Saier, Biological Sciences

Contact:
Last updated: 1/4/2009


Our bioinformatics/systematics lab focuses on transport proteins. We have created an extensive database, the Transporter Classification Database (TCDB), which was adopted by the IUBMB in 2001 and for which we have NIH support. About half of our dry-lab efforts (we also have a wet lab for anyone interested; see below) deal with the development, maintenance and extension of TCDB. We are developing machine learning approaches together with our collaborators in CS, Prof. Charles Elkan and a postdoc, Keith Noto, and are constantly improving the system by developing new software. We (particularly postdoc Ming Ren Yen) have recently developed novel programs that allow us to detect, and construct trees for, proteins with very distant phylogenetic relationships, for which no existing program can produce reliable multiple alignments. This has allowed us, for the first time, to define the relationships among all families, and all members of these families, within a superfamily. Other programs are being developed and tested for reliability. Any of these efforts can make for excellent rotation projects.
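TCDB entries are identified by five-part TC numbers (class, subclass letter, family, subfamily, transporter), for example 2.A.1.1.1. As a small, hypothetical example of support software, the sketch below validates TC numbers and groups them by family; it is not TCDB's own code, and the names are illustrative.

```python
import re
from collections import defaultdict
from typing import NamedTuple

TC_PATTERN = re.compile(r"^(\d+)\.([A-Z])\.(\d+)\.(\d+)\.(\d+)$")


class TCNumber(NamedTuple):
    tc_class: int
    subclass: str
    family: int
    subfamily: int
    transporter: int

    @property
    def family_id(self) -> str:
        """The first three components, which identify the family."""
        return f"{self.tc_class}.{self.subclass}.{self.family}"


def parse_tc(tc_id: str) -> TCNumber:
    """Parse a five-part TC number, rejecting anything else."""
    m = TC_PATTERN.match(tc_id.strip())
    if m is None:
        raise ValueError(f"not a five-part TC number: {tc_id!r}")
    c, s, f, sf, t = m.groups()
    return TCNumber(int(c), s, int(f), int(sf), int(t))


def group_by_family(tc_ids):
    """Map each family ID to the TC numbers that belong to it."""
    families = defaultdict(list)
    for tc_id in tc_ids:
        families[parse_tc(tc_id).family_id].append(tc_id)
    return dict(families)


print(group_by_family(["2.A.1.1.1", "2.A.1.2.3", "1.A.8.1.1"]))
```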

In addition to TCDB development, we (1) screen whole genomes for transporters and derive physiological, metabolic and evolutionary information from the results; and (2) characterize novel (recently discovered) transport protein families, thereby deriving or allowing prediction of mechanism, transport mode, type of energy coupling, substrate specificity, etc., and examine the evolutionary origins of the proteins (e.g., their occurrence, as well as the likelihood of horizontal gene transfer, which is family specific). We have both group and individual projects. Interested students can also initiate their own projects, not directly related to most of the efforts in the lab.

In our wet lab, we (1) study directed mutation, (2) conduct protein engineering, (3) examine the transition between bilayer and micellar forms of integral membrane transport proteins, (4) collaboratively examine bacterial-animal cell interactions for the purpose of developing microbial contraceptives, anti-STD probiotic bugs and anti-cervical-cancer (anti-papillomavirus) bacteria, and (5) develop methods for biofuel production using algae. We are planning the expansion of TCDB into a UCSD-based Bioinformatics/Systematics Center (BISC) that will allow classification of all major classes of proteins and nucleic acids and the processes they promote. Again, these efforts can make for excellent rotation projects.

Tabulating possible rotation projects, therefore:

  1. Software development in support of TCDB
  2. Comparative whole genome analysis
  3. Novel transport protein family characterization
  4. BISC development: e.g., superimposition of phylogenetic information on the current functional EC system; creation of a database for structural proteins, etc.
  5. Developing machine learning approaches for inputting novel information into a database (e.g., TCDB)
  6. Software development for determining pathways for protein/nucleic acid evolution
  7. Dry/Wet lab combined projects as noted in paragraph 3 above
  8. Any other project of general interest to the student and the lab

I am always available for more discussions and more detailed information.

I hope this is helpful.

All the best, Milt

Julian Schroeder, Biological Sciences

Project: Systems Biology and Engineering of Environmental and Drought Tolerance in Plants.
When: Any quarter
Contact:
Last updated: 9/17/2009


Professor Schroeder's research is directed at the signal transduction mechanisms and pathways that mediate resistance to environmental stresses in plants, in particular drought, salinity stress and heavy metal stress. These environmental ("abiotic") stresses have substantial negative impacts and reduce global plant growth and biomass production. They are also relevant to climate change and to expanding available arable land to meet human needs. Research in Julian Schroeder's laboratory uses multidisciplinary approaches, including genomics, bioinformatics, cell signaling, proteomics and molecular biology, to uncover the signal transduction network and receptors in plants that translate drought stress hormone reception, CO2 sensing and salinity stress into specific resistance responses.

Julian Schroeder is Director of the Plant Systems Biology Graduate Training Program. See http://www-biology.ucsd.edu/labs/schroeder for more information on the Schroeder lab.

Selected publications

  • Mori, I.C. et al. PLoS Biol. 4: 1749-1762 (2006).
  • Horie, T. et al. EMBO J. 26: 3003-3014 (2007).
  • Vahisalu, T. et al. Nature 452: 487-491 (2008).
  • Ward, J.M. et al. Annu. Rev. Physiol. (2009).
  • Park et al. Science (2009).

Susan Taylor, Chemistry and Biochemistry

Project: Mapping the PKA Proteome
When: Any quarter
Contact:
Last updated: 9/17/2009


Summary: cAMP-dependent protein kinase (PKA) is present in every mammalian cell, and the PKA signaling network regulates processes as diverse as memory, differentiation, development, and circadian rhythms. One of our goals, in addition to elucidating structures of the PKA subunits, is to map the PKA proteome, which consists not only of the PKA regulatory and catalytic subunits and PKA substrates but also of the scaffold proteins (A Kinase Anchoring Proteins: AKAPs) that target PKA to specific sites in the cell. To begin mapping the PKA proteome, we will use mass spectrometry to compare wild type S49 mouse lymphoma cells with S49 cells that lack the ability to express active PKA catalytic subunit.


Detailed Project

cAMP-dependent protein kinase (PKA) is present in every mammalian cell, and the PKA signaling network regulates processes as diverse as memory, differentiation, development, and circadian rhythms. One of our goals, in addition to elucidating structures of the PKA subunits, is to map the PKA proteome. PKA is a broad-spectrum kinase with many protein substrates. It consists of regulatory (R) and catalytic (C) subunits and assembles into an inactive tetramer (R2C2) in the absence of cAMP. Binding of cAMP to the dimeric R-subunits unleashes the catalytic activity. There are four functionally non-redundant R-subunits (RIα, RIβ, RIIα, and RIIβ) and three C-subunits (Cα, Cβ, and Cγ). The Cα and Cβ subunits have many N-terminal splice variants. In addition to the PKA R- and C-subunits and PKA substrates, the PKA proteome includes scaffold proteins called A Kinase Anchoring Proteins (AKAPs) that target PKA to specific sites in the cell at the correct time. This spatiotemporal aspect is essential for correct PKA signaling in cells. One of our goals is to identify the proteins that are part of this PKA proteome and to establish how this proteome is altered in response to stress signals such as starvation and to the normal circadian rhythm. In addition, we hope to establish how the proteome varies as a consequence of disease or of genetic perturbation. To initially profile the PKA proteome, we will use two cell types, mouse macrophages (RAW cells) and S49 mouse lymphoma cells. In each cell line we have perturbed the PKA signaling pathway. In RAW cells we have deleted the Cα and Cβ genes. In S49 cells we have generated a mutant cell line that makes no active C-subunit: although the protein is expressed in these cells, it is not active, is not soluble, and remains associated with particulate fractions. Our goal is to compare each of the wild type cell lines with the cell lines in which PKA function has been perturbed. To do this, we will use mass spectrometry to identify the proteins that change, and these changes will be compared to changes in gene expression. The S49 project will be done in collaboration with Paul Insel and Nuno Bandeira.
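The comparison described above can be sketched as log2 fold changes between mutant and wild type protein abundances, followed by a correlation with mRNA fold changes for the same genes. The sketch assumes abundances are already normalized (e.g., spectral counts or intensities) and uses a pseudocount to avoid log(0); protein names, values and function names are hypothetical.

```python
import math


def log2_fold_changes(wild_type, mutant, pseudocount=1.0):
    """log2((mutant + c) / (wild type + c)) for proteins seen in both samples."""
    shared = sorted(set(wild_type) & set(mutant))
    return {p: math.log2((mutant[p] + pseudocount) / (wild_type[p] + pseudocount))
            for p in shared}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def protein_vs_mrna(protein_fc, mrna_fc):
    """Correlate protein and mRNA fold changes over genes measured in both."""
    genes = sorted(set(protein_fc) & set(mrna_fc))
    return pearson([protein_fc[g] for g in genes], [mrna_fc[g] for g in genes])


# Hypothetical normalized abundances in wild type and PKA-deficient cells.
fc = log2_fold_changes({"P1": 3, "P2": 7, "P3": 15}, {"P1": 7, "P2": 7, "P3": 3})
print(fc, protein_vs_mrna(fc, {"P1": 2.0, "P2": 0.0, "P3": -4.0}))
```

Proteins whose abundance changes without a matching mRNA change would be candidates for post-transcriptional effects of PKA loss.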

Glenn Tesler, Mathematics

Contact:
Last updated: 5/25/2009


There are two projects available for Summer 2009.

  1. The first project is in comparative genomics and concerns genome rearrangements. I work on algorithms for comparing rearranged genomes, and on applying them to real data sets. Expertise in algorithm development and discrete mathematics is preferred.
  2. The second project aims to develop a new tool for transcriptome (rather than genome) assembly and to apply it to analyzing cancer transcriptomes. Existing Next Generation Sequencing (NGS) tools for analyzing transcriptomes assume that the genome is known. These tools are not well suited for analyzing abnormal transcripts (e.g., those encoding fusion proteins in cancer); moreover, in many cases the genome remains unknown. For example, the NIH Grand Opportunity topic "Transcriptomes of Medicinal Plants" aims at sequencing transcriptomes of plants with unknown genomes. While transcriptome assembly tools are clearly needed (Wang et al., 2008; Maher et al., Nature 2009), there are still no blind NGS transcriptome assembly tools able to assemble transcriptomes without a genome. The goal of this project is to develop a blind transcriptome assembler using EULER (Chaisson et al., Genome Research 2009). This rotation project is co-supervised by Pavel Pevzner and Glenn Tesler.
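For the genome rearrangement project, a standard starting point is to represent gene order as a signed permutation and count breakpoints. Because a reversal removes at most two breakpoints, the reversal distance is at least half the breakpoint count. The sketch below (function names hypothetical) counts breakpoints and applies a reversal; computing the exact reversal distance requires more elaborate algorithms, such as those based on the breakpoint graph.

```python
def breakpoints(perm):
    """Count breakpoints of a signed permutation of 1..n relative to the identity.

    The permutation is framed by 0 and n+1; a consecutive pair (a, b) is an
    adjacency when b - a == 1 and a breakpoint otherwise.
    """
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if b - a != 1)


def reverse_segment(perm, i, j):
    """Apply a reversal to perm[i..j] (inclusive): reverse order and flip signs."""
    return perm[:i] + [-x for x in reversed(perm[i:j + 1])] + perm[j + 1:]


genome = [1, 2, 3, 4]
rearranged = reverse_segment(genome, 0, 1)
print(rearranged, breakpoints(rearranged))
```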

Wei Wang, Chemistry and Biochemistry

Contact:
Last updated: 09/20/2009


We are interested in studying the biological and physical principles underlying genetic networks and protein recognition. Rotation projects are available in the following areas. Specific projects will be tailored to fit a student's research interest and scientific background.

  1. Decipher epigenetic code: develop computational methods to identify common patterns in the histone modification and DNA methylation data associated with regulatory elements; predict regulatory elements, transcription factor binding sites or non-coding RNAs based on their chromatin signatures.
  2. Reconstruct genetic networks: reconstruct physical interactions from genomic and proteomic data; analyze the robustness and landscape of the networks.
  3. Decipher protein recognition code: employ structure-based computer modeling to characterize the energetic patterns of protein-protein and protein-ligand interactions; predict the specificity of protein recognition.
  4. Design resistance-evading drugs: computer-aided drug design to combat resistance; develop new methods for drug lead optimization.
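For the first area, one widely used rule of thumb from the chromatin literature is that promoters tend to carry more H3K4me3 than H3K4me1, while enhancers show the opposite pattern. The sketch below classifies regions by that ratio; it is a deliberately simplified illustration, and the thresholds, labels and function name are hypothetical rather than the lab's method.

```python
import math


def classify_region(h3k4me1, h3k4me3, min_signal=5.0, log_ratio_cut=1.0):
    """Label a region from two normalized histone-mark signals.

    Regions with little total signal are 'unmarked'; otherwise the log2
    ratio of H3K4me3 to H3K4me1 (with a pseudocount of 1) decides.
    """
    if h3k4me1 + h3k4me3 < min_signal:
        return "unmarked"
    ratio = math.log2((h3k4me3 + 1) / (h3k4me1 + 1))
    if ratio >= log_ratio_cut:
        return "promoter-like"
    if ratio <= -log_ratio_cut:
        return "enhancer-like"
    return "ambiguous"


for me1, me3 in [(2, 1), (1, 15), (15, 1), (6, 6)]:
    print(me1, me3, classify_region(me1, me3))
```

Methods that learn such signatures from many marks at once (including DNA methylation) generalize this single-ratio rule.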

More information can be found at http://wanglab.ucsd.edu.

Leor Weinberger, Chemistry and Biochemistry

Project: Theoretical and Experimental Approaches to probing HIV and Herpesvirus gene-regulation... and Designing Novel Antiviral Therapies
Contact:
Last updated: 9/17/2009


Many viruses enter a long-lived dormant state, and this dormancy remains the most problematic obstacle to treating and eradicating viral infectious diseases. How animal viruses decide to enter latency has remained a mystery for decades. We use a combination of mathematical modeling and single-cell microscopy to study HIV and Herpesvirus gene regulation and to model how dormancy is controlled. We focus on understanding feedback circuitry in the gene regulation of these viruses and the role of fluctuations in gene expression (i.e., noise) in controlling the dormancy 'switch'.
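The interplay of feedback and noise can be illustrated with Gillespie's stochastic simulation algorithm applied to a single protein that activates its own production through a Hill-type positive feedback loop. The sketch below is a generic model, loosely in the spirit of a viral transactivator, not the lab's HIV or Herpesvirus circuit; all parameters and names are hypothetical. Running it with different random seeds lets one compare how trajectories started from the same state diverge.

```python
import random


def simulate_feedback(n0, t_end, basal=0.1, vmax=20.0, k_half=30.0, hill=2.0,
                      decay=0.1, seed=0):
    """Gillespie SSA for one protein with Hill-type positive autoregulation.

    Reactions: production at rate basal + vmax * n^h / (K^h + n^h),
    degradation at rate decay * n. Returns event times and copy numbers.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        produce = basal + vmax * n ** hill / (k_half ** hill + n ** hill)
        degrade = decay * n
        total = produce + degrade
        if total <= 0:
            break  # no reaction can fire
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_end:
            break
        # Choose which reaction fired, proportional to its propensity.
        n += 1 if rng.random() * total < produce else -1
        times.append(t)
        counts.append(n)
    return times, counts


# Hypothetical usage: how many of 20 cells started at zero end up "on"?
finals = [simulate_feedback(0, 50.0, seed=s)[1][-1] for s in range(20)]
print(sum(f > 60 for f in finals), "of", len(finals), "trajectories ended high")
```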

In parallel, we are using modeling and experiments to design novel therapies that flip the dormancy switch, thereby forcing HIV into a dormant state. One of these approaches involves designing therapeutic viruses that 'piggyback' on HIV, replicating along with it while forcing HIV into a more dormant-like state.

Rotation projects are available in all these areas.

Christopher Woelk, Medicine

When: Any quarter
Contact:
Last updated: 9/17/2009


  1. Pathogen vaccine design using Reverse Vaccinology
    AIM: To identify potential vaccine candidates in the genomes of eukaryotic and bacterial pathogens.
    SKILLS LEARNED: Protein annotation, machine learning, immunology.
    SITE: Woelk lab.
  2. Diagnostic gene expression classifiers for pathogen infection
    AIM: To identify those genes whose expression can determine whether an individual is infected with a virus or bacterial pathogen, or is suffering from an autoimmune disease.
    SKILLS LEARNED: microarray data analysis, supervised learning, class comparison.
    SITE: Woelk lab.
  3. Molecular evolution of cytokines
    AIM: Characterize the selection pressures that have shaped the evolution of interferons in mammalian species.
    SKILLS LEARNED: phylogenetic reconstruction, selection analysis, recombination detection, multiple alignment.
    SITE: Woelk lab.
  4. Genomic Analysis of the Frog Killing Chytrid Fungus
    AIM: To identify pathogen virulence factors by comparing the genomes of two strains of the Frog Killing Chytrid Fungus.
    SKILLS LEARNED: Gene annotation and genome comparison.
    SITE: Woelk lab and Modi lab (Wild Animal Park).
  5. Comparative Genomics and Population Genetic Analyses of the Vertebrate Major Histocompatibility Complex (MHC)
    AIM: To characterize the gene deletions and duplications in the MHC locus of vertebrates.
    SKILLS LEARNED: Gene annotation, genome comparison, phylogenetic reconstruction.
    SITE: Woelk lab and Modi lab (Wild Animal Park).
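For project 2, class comparison followed by supervised classification can be sketched with Welch's t statistic to rank genes and a nearest-centroid classifier on the selected genes. The data below are a hypothetical toy matrix (rows are samples, columns are genes; 1 = infected, 0 = healthy), and the function and class names are illustrative. In practice, gene selection must be repeated inside each cross-validation fold to avoid an optimistic bias.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def select_genes(X, y, k):
    """Class comparison: indices of the k genes with the largest |t|."""
    infected = [row for row, label in zip(X, y) if label == 1]
    healthy = [row for row, label in zip(X, y) if label == 0]
    scores = [(abs(welch_t([r[g] for r in infected], [r[g] for r in healthy])), g)
              for g in range(len(X[0]))]
    return [g for _, g in sorted(scores, reverse=True)[:k]]


class NearestCentroid:
    """Assign a sample to the class whose mean profile is closest (Euclidean)."""

    def fit(self, X, y, genes):
        self.genes = genes
        self.centroids = {
            label: [mean(r[g] for r, lab in zip(X, y) if lab == label) for g in genes]
            for label in set(y)
        }
        return self

    def predict(self, row):
        x = [row[g] for g in self.genes]
        return min(self.centroids,
                   key=lambda lab: sum((a - c) ** 2 for a, c in zip(x, self.centroids[lab])))


X = [[5.0, 1.0, 2.0], [6.0, 1.2, 2.1], [5.5, 0.9, 1.9],
     [1.0, 1.1, 2.0], [1.5, 1.0, 2.2], [0.8, 0.95, 2.05]]
y = [1, 1, 1, 0, 0, 0]
genes = select_genes(X, y, 1)
model = NearestCentroid().fit(X, y, genes)
print(genes, model.predict([5.2, 1.0, 2.0]))
```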
