GenBank: A Model for the Future of Phenomics and Disease Study
In a Perspective to be published in the April 18, 2008, issue of Science, Atul Butte, MD, PhD, looks at the history of molecular measurement databases and puts forth a plan for using successes in this field to advance the study of physiology and disease.
This week the scientific community celebrates the 25th anniversary of GenBank, the open-access database of DNA sequences and the various molecules they encode. One of the earliest bioinformatics community projects, GenBank has laid the foundation for understanding how genes, proteins, and other molecules are formed and how they relate to each other, and it has stimulated interest in linking this information to physiology and disease.
Associating molecular measurements with disease phenotypes and physiology is a major focus in bioinformatics today. Butte suggests that in order to make progress in the understanding of normal and impaired health, “the same high-bandwidth measurement style that has accelerated the molecular and genetic study of disease must be practiced in physiology.”
The wealth of publicly available data from disease studies should now be used to find genes whose expression changes consistently within a condition or across related conditions, says Butte. However, inconsistencies in how these diseases are defined make such analyses difficult to interpret.
“The definition of a disease is often specified by a particular knowledge base and is thus subject to limitation and biases,” Butte says. Using shared, standardized disease terms to analyze and interpret study data could help streamline the effort.
Recent calls for a Human Phenome Project, which would establish databases of physiological phenotypes in an effort to determine how they relate to genes and proteins, signal a move toward linking physiological measurements to genetic markers of disease.
However, current methods of identifying phenotypes for molecular study are inadequate. The most productive approach may be to use clinical test results, which are increasingly recorded in electronic health records, to identify biomarkers and genetic commonalities among patients with specific disease phenotypes.
One current obstacle to this approach is restricted access to patients’ private health information. Butte suggests a way around the issue: a public repository of purely numerical, quantitative clinical measurements.
“Instead of viewing data availability as a disadvantage, clinical researchers and institutions should be encouraged to look at the success of resources such as GenBank,” says Butte, who points to GenBank as an example of how publicly available data, once shared, can yield a wealth of discoveries.
by Shauna Kanel