Jonathan Lam

2020-23 Trainee on National Library of Medicine (NLM) Training Grant Fellowship Program
B.S., Biology, California State University San Marcos, 2020
2020-21 Trainee on NIH Training Grant in Bioinformatics
2020-24 Sloan Fellowship
2021-23 STARs Fellow
2023-25 Human Genetics Scholar Award from the American Society of Human Genetics
2024-27 NIH F31 NRSA
Characterization of gene regulatory elements using multi-omic data
Computational analysis of multi-omic single cell data from 4 species (mouse/marmoset/macaque/human)
Hundreds of genes in the human genome are duplicated, and many of them are known to be associated with human diseases. However, the short read lengths of current sequencing technologies make such genes difficult to analyze. We have developed novel tools to genotype the copy number of duplicated genes from whole-genome sequencing data. The goal of this project is to analyze large-scale sequencing datasets for Mendelian and complex human diseases (using cloud computing platforms) to identify novel disease associations.
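As a rough illustration of what copy-number genotyping from whole-genome sequencing involves, the sketch below uses the simplest possible estimator. It takes the read depth over a duplicated region and scales it by the median depth of regions assumed to be single-copy and diploid. This is not the method of the tools mentioned above. The function names, the pooling of reads across all paralogous copies, and the example numbers are all hypothetical. Real genotypers must also handle ambiguous mapping between paralogs, GC bias and sampling noise.

```python
"""Naive read-depth copy-number estimate (illustrative sketch only).

Assumptions (hypothetical, not from any specific tool): reads from all copies
of a duplicated gene are pooled, and a set of single-copy diploid regions
supplies the baseline depth.
"""
from statistics import median


def mean_depth(read_count: int, region_length: int, read_length: int) -> float:
    """Average per-base depth implied by a read count over a region."""
    return read_count * read_length / region_length


def estimate_copy_number(target_depth: float, baseline_depths: list[float],
                         ploidy: int = 2) -> int:
    """Scale the target depth by the median depth of known diploid regions."""
    baseline = median(baseline_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    # A region at baseline depth carries `ploidy` copies; round to the nearest integer.
    return round(ploidy * target_depth / baseline)


if __name__ == "__main__":
    # Hypothetical numbers: pooled depth over a duplicated gene vs. a ~30x baseline.
    baseline = [29.5, 30.0, 31.2, 30.4]
    target = mean_depth(read_count=12_000, region_length=20_000, read_length=150)
    print(target, estimate_copy_number(target, baseline))  # prints: 90.0 6
```

Rounding to an integer reflects the assumption that copy number is discrete. A real genotyper would instead compare the likelihoods of each candidate copy number.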
Long-read sequencing technologies have the potential to overcome some of the key limitations of short-read sequencing, particularly in long repetitive regions of the human genome, but they require the development of new algorithms. We have previously developed computational methods for variant calling (Longshot, Nature Communications 2019) and read mapping in segmental duplications (Duplomap, Nucleic Acids Research 2020) using long-read sequencing technologies. The goal of this project is to implement a haplotype-based model for variant calling using long reads that automatically identifies genomic regions that can be called with high confidence.
Malaria remains a major problem for 40% of the world's population, and drug resistance is widespread. One way to identify drug resistance determinants is to find genomic regions that show unexpected homozygosity in whole-genome sequences. The rotation student will work with physician scientists to align short-read sequences to the P. falciparum genome, call and annotate variants, run population genetics analyses, and produce reports.
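The description does not say how "unexpected homozygosity" will be measured. One standard statistic for detecting recent positive selection is extended haplotype homozygosity (EHH). Among the sampled haplotypes that carry a given allele at a core site, EHH is the probability that two randomly chosen haplotypes are identical over the whole interval from the core site to a given marker. The sketch below assumes monoclonal samples with one haplotype each, encoded as equal-length allele strings. The function name and encoding are hypothetical, and EHH is only one possible reading of the project, not necessarily the analysis it will use.

```python
"""Extended haplotype homozygosity (EHH) for one core allele (illustrative sketch)."""
from collections import Counter
from math import comb


def ehh(haplotypes: list[str], core: int, end: int, allele: str) -> float:
    """EHH over the interval between site indices `core` and `end` (inclusive).

    haplotypes: equal-length strings of alleles, one string per sampled haplotype.
    Only haplotypes carrying `allele` at the core site are considered.
    """
    lo, hi = min(core, end), max(core, end)
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        raise ValueError("need at least two haplotypes carrying the core allele")
    # Group carriers by their allele string over the interval; identical strings
    # form one extended haplotype.
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Fraction of carrier pairs that are identical over the whole interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)


if __name__ == "__main__":
    # Hypothetical toy data: 4 haplotypes over 5 sites, core site at index 2.
    haps = ["AAAAA", "AAAAT", "TAAAA", "AAATA"]
    print([round(ehh(haps, 2, e, "A"), 3) for e in range(2, 5)])  # [1.0, 0.5, 0.167]
```

A selective sweep around a resistance allele would show up as EHH that decays unusually slowly with distance from the core site, compared with other alleles at similar frequency.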
In principle, Darwinian evolution requires at least two essential ingredients: (i) processes that change the inherited genetic material (i.e., mutation of the germline DNA); and (ii) processes that cause natural selection based on the functional/phenotypic consequences of these genetic changes. Germline mutations are believed to originate predominantly from endogenous cellular processes, with minor contributions from exogenous processes. Each mutational process imprints a characteristic mutational pattern on the genome, termed a mutational signature. For example, the deamination of 5-methylcytosine to thymine is an endogenous process that generates C:G>T:A mutations at CpG dinucleotides, while CC:GG>TT:AA doublet substitutions at dipyrimidines are associated with exogenous exposure to ultraviolet light. Analyses of mutational signatures in thousands of cancer genomes have revealed the signatures of more than 100 mutational processes. Some of these processes are operative throughout the entire lifetime of an individual, whereas others are active only at certain stages of life. The signatures of processes that accumulate mutations gradually throughout life are referred to as clock-like mutational signatures; these include signature 1 (etiology: deamination of 5-methylcytosine) and signature 5 (etiology: unknown). Previous work on de novo germline mutations derived from family trios demonstrated that signatures 1 and 5 can explain the majority of these germline variants, indicating that clock-like signatures are the main contributors to human evolution. However, the activity of different mutational signatures has never been evaluated with respect to the phylogenetic timeline of human evolution. In this rotation project, we will analyze data from the 1000 Genomes Project, a database containing the germline genomes of 2,504 individuals from 26 populations.
This database includes 84.7 million single nucleotide polymorphisms (SNPs) and 3.6 million short insertions/deletions (indels) phased onto high-quality haplotypes. Using these data, we will build a phylogenetic tree (i.e., a tree showing the evolutionary relationships between individuals) in which each leaf contains the private germline mutations of a single individual. The activity of mutational signatures will be evaluated in each leaf as well as in each node of the phylogenetic tree. The analysis will reveal the activity of mutational signatures throughout human evolution.
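To make "a characteristic mutational pattern" concrete, consider the widely used COSMIC single-base-substitution (SBS) scheme. Each substitution is labelled by its pyrimidine reference base (C or T), the alternate base, and the 5' and 3' flanking bases, which gives 6 × 16 = 96 channels. Signature activities are then estimated from the mutation counts in these channels. The sketch below performs only that classification step and flags C>T at CpG, the pattern attributed above to 5-methylcytosine deamination. The function names are hypothetical, and the input is assumed to be the plus-strand reference trinucleotide plus the alternate base. Doublet substitutions such as CC>TT are outside its scope.

```python
"""Classify a single-base substitution into a pyrimidine-centred SBS-96 channel (sketch)."""

BASES = set("ACGT")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def sbs96_channel(context: str, alt: str) -> str:
    """Return a label such as 'A[C>T]G'.

    context: plus-strand reference trinucleotide, mutated base in the middle.
    alt: alternate base at the middle position.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or not set(context) <= BASES or alt not in BASES or alt == context[1]:
        raise ValueError("need a 3-base ACGT context and a different alternate base")
    if context[1] in "AG":
        # Purine reference: describe the same event on the opposite strand.
        context, alt = revcomp(context), alt.translate(COMPLEMENT)
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def is_cpg_c_to_t(channel: str) -> bool:
    """True for C>T with a 3' G, i.e. C>T at a CpG dinucleotide."""
    return channel[2:5] == "C>T" and channel[-1] == "G"


if __name__ == "__main__":
    # A G>A at a CpG on the plus strand is a C>T at CpG on the minus strand.
    ch = sbs96_channel("CGT", "A")
    print(ch, is_cpg_c_to_t(ch))  # A[C>T]G True
```

Collapsing each substitution onto the pyrimidine strand is what reduces the 192 strand-specific possibilities to 96 channels.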
All cancers originate from a single cell that undergoes a transformation from a normally functioning somatic cell into a malignant neoplasm. In most cases, this transformation follows a stepwise process: the somatic cell first expands into a precancer and subsequently becomes an advanced invasive cancer. The progression from a pre-malignant tumor to a malignant neoplasm is driven by somatic mutations that can be traced, characterized, and genomically studied. In this rotation, the student will evaluate the mutational burden, driver mutations, copy number changes, mutational signatures, and subclonal architecture of pre-malignant lesions and compare them to molecular events previously identified in advanced invasive cancers. The goal is to reveal the molecular events that are necessary for a precancer to convert into cancer. Independent, previously generated drug-screen datasets (e.g., Cancer Cell Line Encyclopedia) will be used to propose potential intervention strategies that can be used to target these molecular events in order to halt this conversion and prevent cancer.
B.S., Molecular and Cellular Biology, Trinity College-Connecticut, 2017
2020-21 Trainee on NIH Training Grant in Bioinformatics
NSF Graduate Research Fellowship