Learning Natural Selection from the Site Frequency Spectrum.
Roy Ronen, Nitin Udpa, Eran Halperin, and Vineet Bafna.
Abstract: Genetic adaptation to external stimuli occurs through the combined action of mutation and selection. A central problem in genetics is to identify loci responsive to specific selective constraints. Over the last two decades, many tests have been proposed to identify genomic signatures of natural selection. However, the power of these tests changes unpredictably from one dataset to another, and no single method dominates. We build upon recent work that connects many of these tests in a common framework by describing how positive selection strongly impacts the observed site frequency spectrum (SFS). Many of the proposed tests quantify the skew in the SFS to predict selection. Here, we show that the skew depends on many parameters, including the selection coefficient and the time since selection. Moreover, for each of the different regimes of positive selection, informative features of the scaled SFS can be learned from simulated data and applied to population-scale variation data. Using support vector machines, we develop a test that is effective over all selection regimes. On simulated data, our test outperforms existing ones over the entire parameter space. We apply our test to variation data from Drosophila melanogaster populations adapted to hypoxia and identify loci that were missed by previous approaches, strengthening the role of the Notch pathway in hypoxia tolerance. We further apply our test to human variation data and identify several regions that agree with earlier studies, as well as many novel regions.
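The abstract says that many tests "quantify the skew in the SFS" and that the new test learns from the "scaled SFS." The sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation, and it omits the simulation and support-vector-machine steps entirely. It assumes a 0/1 haplotype matrix where 1 marks the derived allele. It also assumes the common scaling in which bin i is weighted by i and then normalized, which makes the spectrum flat in expectation under the standard neutral model. Tajima's D is included only as one classical example of a single-number skew statistic. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def unfolded_sfs(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Count segregating sites by derived-allele count.

    Assumed encoding: haplotypes[h][j] is 1 if haplotype h carries the
    derived allele at site j, else 0. Entry i-1 of the result is the
    number of sites where exactly i of the n haplotypes carry it.
    """
    n = len(haplotypes)
    sfs = [0] * (n - 1)
    for site in zip(*haplotypes):  # iterate over columns (sites)
        k = sum(site)
        if 0 < k < n:  # skip monomorphic sites
            sfs[k - 1] += 1
    return sfs


def scaled_sfs(sfs: Sequence[int]) -> List[float]:
    """Weight bin i by i, then normalize so the bins sum to 1.

    Under the standard neutral model E[xi_i] = theta / i, so this scaled
    spectrum is flat (1 / (n - 1) per bin) in expectation. Departures from
    flatness are the kind of skew that selection tests look for.
    """
    weighted = [i * x for i, x in enumerate(sfs, start=1)]
    total = sum(weighted)
    return [w / total for w in weighted] if total else [0.0] * len(sfs)


def tajimas_d(sfs: Sequence[int]) -> float:
    """Tajima's D computed directly from an unfolded SFS (NaN if no sites)."""
    n = len(sfs) + 1
    s = sum(sfs)
    if s == 0:
        return math.nan
    # Mean pairwise differences: a site with i derived copies differs in
    # i * (n - i) of the n choose 2 haplotype pairs.
    pi = sum(x * i * (n - i) for i, x in enumerate(sfs, start=1)) / (n * (n - 1) / 2)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Compare pi with Watterson's estimator S / a1, scaled by its standard error.
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    haps = [
        [1, 1, 0, 1],
        [0, 1, 0, 1],
        [0, 0, 0, 1],
        [0, 0, 0, 0],
    ]
    sfs = unfolded_sfs(haps)
    print(sfs)              # [1, 1, 1]; the third site is monomorphic
    print(scaled_sfs(sfs))  # approximately [0.167, 0.333, 0.5]
    print(tajimas_d(sfs))
```

A spectrum dominated by rare variants, such as `[5, 0, 0]`, yields a negative Tajima's D. This is the classical direction of skew expected shortly after a selective sweep. The abstract's point is that one summary like this can miss regime-dependent patterns that the full scaled spectrum retains.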
UCSD coauthors are Bioinformatics and Systems Biology graduate students Roy Ronen and Nitin Udpa, and Prof. Vineet Bafna.