Enhancing the Molecular Signatures Database

Project Description

We seek an undergraduate student to write software that enhances our Molecular Signatures Database (MSigDB), a repository of gene sets used by a worldwide genomics community of over 180,000 users. Researchers use these gene sets to identify and better understand the activated pathways underlying human diseases such as cancer, and to interpret experimental results. The NIH recently retired the Cancer Genome Anatomy Project, which included the BioCarta collection of curated pathway diagrams. These images and their associated metadata are now accessible only through the Internet Archive's “Wayback Machine”. The project involves retrieving this data from the archive so that MSigDB entries can be accompanied by pathway visualizations. Because MSigDB is used by researchers worldwide, the results of this work will directly benefit biomedical investigations around the world.

Interested students should have some familiarity with programming and/or web technologies such as HTML, and a willingness to learn Python and how to use libraries such as urllib and BeautifulSoup to perform web scraping. Expertise in biology is not required. The project can be taken for class credit or as an internship, which may lead to further research experience.
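To give a sense of the scraping task, here is a minimal sketch that builds a Wayback Machine URL for an archived page and collects the image links on that page. It assumes the commonly used `https://web.archive.org/web/<timestamp>/<original URL>` address form. The helper names (`wayback_url`, `ImageSrcParser`, `image_urls`) and the example URLs are hypothetical. To stay runnable without extra installs, it uses the standard library's `html.parser` rather than BeautifulSoup; it is an illustration, not the project's actual code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed address form: https://web.archive.org/web/<timestamp>/<original URL>
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, timestamp: str) -> str:
    """Build a Wayback Machine URL (hypothetical helper).

    `timestamp` is YYYYMMDDhhmmss and may be truncated, e.g. "2006".
    """
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_urls(html: str, page_url: str) -> list[str]:
    """Return image URLs from `html`, resolved against `page_url`."""
    parser = ImageSrcParser()
    parser.feed(html)
    parser.close()
    return [urljoin(page_url, src) for src in parser.sources]
```

With network access, a page could be fetched with `urllib.request.urlopen(wayback_url(...))` and its decoded body passed to `image_urls`. The Wayback Machine typically rewrites embedded links so they point back into the archive, which is why resolving each `src` against the archived page URL matters. With BeautifulSoup, the extraction step would be roughly `[img.get("src") for img in BeautifulSoup(html, "html.parser").find_all("img")]`.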