The evolution of human evolution

Project Description

In principle, Darwinian evolution requires at least two essential ingredients: (i) processes that change the inherited genetic material (i.e., mutation of the germline DNA); and (ii) processes that cause natural selection based on the functional or phenotypic consequences of these genetic changes. Germline mutations are believed to originate predominantly from endogenous cellular processes, with minor contributions from exogenous processes. Each mutational process imprints a characteristic pattern of mutations on the genome, termed a mutational signature. For example, the deamination of 5-methylcytosine to thymine is an endogenous process that generates C:G>T:A mutations at CpG dinucleotides, while CC:GG>TT:AA doublet substitutions at dipyrimidines are associated with exogenous exposure to ultraviolet light. Analyses of mutational signatures in thousands of cancer genomes have revealed the signatures of more than 100 mutational processes. Some of these processes operate throughout the entire lifetime of an individual, whereas others are active only at certain stages of life. Signatures whose mutations accumulate gradually throughout life are referred to as clock-like mutational signatures; these include signature 1 (etiology: deamination of 5-methylcytosine) and signature 5 (etiology: unknown). Previous work on de novo germline mutations derived from family trios demonstrated that signatures 1 and 5 can explain the majority of these germline variants, indicating that the clock-like signatures are the main source of the germline mutations on which human evolution acts. However, the activity of different mutational signatures has never been evaluated along the phylogenetic timeline of human evolution.

In this rotation project, we will analyze data from the 1000 Genomes Project, a database containing the germline genomes of 2,504 individuals from 26 populations. This database includes 84.7 million single nucleotide polymorphisms (SNPs) and 3.6 million short insertions/deletions (indels) phased onto high-quality haplotypes. Using these data, we will build a phylogenetic tree (i.e., a tree showing the evolutionary relationships between individuals) in which each leaf contains the private germline mutations of a single individual, that is, the variants found in that individual and in no other. The activity of mutational signatures will be evaluated in each leaf as well as in each internal node of the phylogenetic tree. The analysis will reveal the activity of mutational signatures throughout human evolution.
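
To make the notion of a per-leaf mutational catalog concrete, the following is a minimal Python sketch, not the project's actual pipeline. It assumes each SNP has already been annotated with its three-base reference context and the set of individuals carrying it. It uses the common convention of reporting each substitution relative to the pyrimidine of the mutated base pair, which yields 96 possible channels (6 substitution classes times 16 flanking contexts). The function names (`channel`, `private_catalogs`) and the sample identifiers are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def channel(context, alt):
    """Map one SNP to a trinucleotide channel label such as 'A[C>T]G'.

    context: three reference bases centred on the mutated position.
    alt:     the alternate base at the central position.
    Substitutions at a purine (A or G) are flipped to the opposite
    strand so that the reference base is always a pyrimidine (C or T).
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1:
        raise ValueError("expected a 3-base context and a 1-base alt allele")
    if any(base not in COMPLEMENT for base in context + alt):
        raise ValueError("only A, C, G and T are supported")
    if context[1] == alt:
        raise ValueError("alt allele equals the reference base")
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def private_catalogs(variants):
    """Count channels per individual, using private variants only.

    variants: iterable of (context, alt, carriers), where carriers is
    the set of sample IDs carrying the alternate allele (an assumed,
    simplified input format). A variant is treated as private when
    exactly one individual carries it.
    """
    catalogs = defaultdict(Counter)
    for context, alt, carriers in variants:
        if len(carriers) == 1:
            (sample,) = carriers
            catalogs[sample][channel(context, alt)] += 1
    return dict(catalogs)


# Hypothetical toy input: two private variants and one shared variant.
toy_variants = [
    ("ACG", "T", {"sample_A"}),
    ("CGT", "A", {"sample_A"}),  # same event seen from the other strand
    ("TTA", "C", {"sample_A", "sample_B"}),  # shared, so not private
]
toy_catalogs = private_catalogs(toy_variants)
```

In this toy input both private variants of `sample_A` fall into the channel `A[C>T]G`, a C>T change at a CpG site, which is the context associated with signature 1. Variants shared by several individuals would instead be assigned to internal nodes of the tree, a step this sketch deliberately leaves out.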