Broadening Scientific Discovery by Repurposing Genomic and Proteomic Data
by Shauna Kanel
Atul Butte, MD, PhD, recently received a $1.1 million R01 grant from the National Library of Medicine to find genomic data and translate it into a clinically useful form. The research calls for Butte to develop a system that uses controlled vocabularies to represent the context and results of experiments held in microarray expression, proteomic and genomic data repositories.
By integrating data from RNA expression microarrays, which measure gene expression, with proteomic data from multiple fields of research, Butte hopes to amass data and improve access to it, making it easier for researchers and physicians to ask new questions of existing data. This will add to the overall utility of genomic and proteomic experiments in the clinical realm.
Finding new uses for existing data is essential in the genome era of high-yield, but often single-purpose, experiments. Researchers have had difficulty joining relevant data from the nearly 250,000 publicly available samples held in repositories worldwide, because the data and their annotations are usually represented as unstructured free text. This has limited both computer readability and secondary use.
Butte plans to group information together, making it easy to search and link to a vast array of clinical measurements. With Butte’s work, data from expression microarrays and proteomic experiments could be combined with medical and clinical knowledge to find new results.
To repurpose the data, Butte will develop a system to represent the context of experiments using a structured medical language to “map” terms to each other. He will also develop tools and methods to improve access to microarray and proteomic samples across five international repositories, joining relevant information.
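The idea of mapping free-text annotations onto a structured vocabulary can be illustrated with a minimal Python sketch. It is not Butte's system: the vocabulary, concept identifiers, repository names and sample records below are all hypothetical and invented for illustration, and a real implementation would presumably draw on an established medical vocabulary rather than a hand-written dictionary.

```python
import re
from collections import defaultdict

# Hypothetical controlled vocabulary: concept ID -> free-text synonyms.
# The IDs and terms are invented for this sketch, not taken from a real vocabulary.
VOCABULARY = {
    "CONCEPT:kidney_transplant": ["kidney transplant", "renal transplant", "renal allograft"],
    "CONCEPT:t_cell": ["t cell", "t lymphocyte"],
}


def normalize(text):
    """Lowercase and collapse punctuation and whitespace so spelling variants compare equal."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


# Reverse index: normalized synonym -> concept ID.
SYNONYM_INDEX = {
    normalize(synonym): concept_id
    for concept_id, synonyms in VOCABULARY.items()
    for synonym in synonyms
}


def map_annotation(free_text):
    """Return the sorted concept IDs whose synonyms occur as whole phrases in the annotation."""
    padded = f" {normalize(free_text)} "
    return sorted(
        {cid for synonym, cid in SYNONYM_INDEX.items() if f" {synonym} " in padded}
    )


def group_samples(samples):
    """Group (repository, sample_id, annotation) records by the concepts they map to."""
    groups = defaultdict(list)
    for repository, sample_id, annotation in samples:
        for concept_id in map_annotation(annotation):
            groups[concept_id].append((repository, sample_id))
    return dict(groups)


# Hypothetical sample records from two made-up repositories.
SAMPLES = [
    ("repo_a", "S1", "Renal allograft biopsy"),
    ("repo_b", "X9", "Kidney transplant, peripheral T-cell fraction"),
]
GROUPS = group_samples(SAMPLES)
```

In this sketch, the two differently worded annotations both map to the same hypothetical transplant concept, so samples from separate repositories end up in one searchable group. Exact phrase matching is a deliberate simplification: it misses plurals and misspellings that a production system would need to handle.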
To focus tool development and ensure biological relevance, two Driving Biological Projects in the domains of solid-organ transplantation and T-cell biology will search for experimental data within and across microarray and proteomic repositories. This will allow collaborating researchers to use newly accessible existing data to pose new questions in their fields and to look at these sciences in new ways.
Other Stanford faculty participating in the grant with Butte include Minnie Sarwal, MD, PhD (Pediatric Nephrology) and Christopher Manning, PhD (Computer Science).