Chemical biology news from across RSC Publishing.
02 January 2007
A revolutionary change in finding relevant chemical and molecular biology articles is underway, thanks to work by the GO consortium and the RSC.
The Gene Ontology (GO) [1] has been developed by an international consortium of databases and researchers to provide consistent descriptions of proteins and other products resulting from gene expression.
Shared, precise vocabularies agreed upon by specialists in a particular domain are nothing new. However, ontologies, such as GO and the related Sequence Ontology (SO), go further than this because they contain logical relations between the entries. These include simple relations, which resemble the descending hierarchy of the Linnaean classification of species, and further links between entries. For example, in GO, a cell component, such as a mitochondrion, can be defined to be a part of another component, in this case the cytoplasm.
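To make the idea of logical relations between entries concrete, here is a minimal Python sketch of a toy ontology stored as (child, relation, parent) triples. Only the mitochondrion/cytoplasm link is taken from the text above; the other entries, and the names RELATIONS, build_index and ancestors, are illustrative assumptions rather than real GO records or any GO tool's API.

```python
from collections import defaultdict

# Toy ontology as (child, relation, parent) triples.
# Only "mitochondrion part_of cytoplasm" comes from the article;
# the remaining triples are assumptions made up for illustration.
RELATIONS = [
    ("mitochondrion", "part_of", "cytoplasm"),
    ("mitochondrion", "is_a", "organelle"),
    ("cytoplasm", "part_of", "cell"),
    ("organelle", "is_a", "cellular component"),
]


def build_index(triples):
    """Group the (relation, parent) links of each child term."""
    index = defaultdict(list)
    for child, relation, parent in triples:
        index[child].append((relation, parent))
    return index


def ancestors(term, index):
    """Return every term reachable from `term` by following any relation upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in index.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


INDEX = build_index(RELATIONS)
```

In this sketch, an article annotated with "mitochondrion" could also be retrieved by a search for "cytoplasm", because the relations connect the two entries. A flat vocabulary without relations could not support that inference.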
The GO approach is more general than identifying the genes or proteins themselves. It identifies what a protein does (the molecular function), which biological processes it is involved in and where it acts (the cellular component). The gene Sos, for example, is linked in Ensembl, a major genomics database [2], to 17 different GO terms, including nervous system development (process), nucleosome assembly (component) and guanyl-nucleotide exchange factor activity (function).
Traditionally, GO terms have been used to annotate large protein databases, so annotating the full text of journal articles as they go online is a radical departure.
This is only the beginning. GO is under constant revision, so changes to the ontology will be reflected in RSC journal articles. A great deal of work is needed to go beyond process, function and cellular component to the gene products themselves and their interactions, and the RSC and the GO consortium intend to develop the project over the coming months and years. Watch this space!
How does it all work?
Using computers to mine biological information from the literature is an active research area, and mining GO terms is no exception. However, existing automated methods are not reliable enough to use without supervision, so the RSC's team of technical editors carefully checks the computer results.
Evelyn Camon and colleagues at the EBI have written a detailed account of potential pitfalls and common mistakes found in automated methods for Gene Ontology annotation [3], and the RSC has taken this into account in its design.
The RSC's text-mining software (Science Come Alive) takes the entries in the ontology and their synonyms, generates a further, much larger, set of synonyms and searches the journal articles for them. This simple approach does not catch every term in a paper, but it has high precision: it produces very few false positives.
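The dictionary-matching idea described above can be sketched in a few lines of Python. This is not the RSC's software: the TERMS dictionary, its synonyms, the expansion rules (lower-casing and swapping hyphens and spaces) and the function names expand, build_pattern, annotate and precision are all assumptions chosen for illustration.

```python
import re

# Hypothetical mini-dictionary: term name -> curated synonyms.
# The term names appear in the article; the synonyms are illustrative guesses.
TERMS = {
    "guanyl-nucleotide exchange factor activity": ["GEF activity"],
    "mitochondrion": ["mitochondria"],
    "nervous system development": [],
}


def expand(name, synonyms):
    """Generate a larger set of surface forms from a name and its synonyms."""
    forms = set()
    for text in [name, *synonyms]:
        base = text.lower()
        forms.add(base)
        forms.add(base.replace("-", " "))  # hyphen written as a space
        forms.add(base.replace(" ", "-"))  # space written as a hyphen
    return forms


def build_pattern(terms):
    """Map every surface form to its term and compile one regex over all forms."""
    lookup = {}
    for name, synonyms in terms.items():
        for form in expand(name, synonyms):
            lookup[form] = name
    # Longest forms first, so longer phrases win over their substrings.
    ordered = sorted(lookup, key=len, reverse=True)
    alternatives = "|".join(re.escape(form) for form in ordered)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return pattern, lookup


def annotate(text, pattern, lookup):
    """Return (matched text, term name) pairs in order of appearance."""
    return [(m.group(0), lookup[m.group(0).lower()]) for m in pattern.finditer(text)]


def precision(true_positives, false_positives):
    """Fraction of reported matches that are correct."""
    total = true_positives + false_positives
    return true_positives / total if total else 0.0
```

Exact matching of whole phrases is what keeps false positives rare in a sketch like this, at the cost of missing terms that are worded differently in the paper. Any matches would still need the editorial check described earlier.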
1 The Gene Ontology Consortium, Gene Ontology: tool for the unification of biology, Nat. Genet., 2000, 25, 25-29.
2 E. Birney et al., Nucleic Acids Res., 2006, 34, D556.
3 E. B. Camon et al., BMC Bioinformatics, 2005, 6, S17.
Also of interest
From the beginning of February 2007, authors publishing in RSC journals will see their science "come alive" thanks to an exciting new project pioneered by RSC Publishing.