RSC Publishing



Highlights in Chemical Biology

Chemical biology news from across RSC Publishing.



GO RSC!


02 January 2007

A revolutionary change in finding relevant chemical and molecular biology articles is underway, thanks to work by the GO consortium and the RSC.

The Gene Ontology (GO)1 has been developed by an international consortium of databases and researchers to provide consistent descriptions of proteins and other products resulting from gene expression.


Shared, precise vocabularies agreed upon by specialists in a particular domain are nothing new. However, ontologies, such as GO and the related Sequence Ontology (SO), go further than this because they contain logical relations between the entries. These include simple relations, which resemble the descending hierarchy of the Linnaean classification of species, and further links between entries. For example, in GO, a cell component, such as a mitochondrion, can be defined to be a part of another component, in this case the cytoplasm.
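The difference between a flat vocabulary and an ontology can be sketched in a few lines of Python. This is purely illustrative: the `Term` class, the `ancestors` helper and the simplified "organelle" parent are assumptions made for the example, not the GO file format or any GO software. Only the mitochondrion and cytoplasm relation comes from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A hypothetical ontology entry with typed links to other entries."""
    name: str
    is_a: list = field(default_factory=list)     # hierarchy, as in a Linnaean classification
    part_of: list = field(default_factory=list)  # containment between entries


# Toy terms for illustration only; real GO entries also carry identifiers,
# definitions and a much deeper hierarchy than the single step shown here.
organelle = Term("organelle")
cytoplasm = Term("cytoplasm")
mitochondrion = Term("mitochondrion", is_a=[organelle], part_of=[cytoplasm])


def ancestors(term, relation):
    """Collect every term reachable by repeatedly following one relation type."""
    found = []
    stack = list(getattr(term, relation))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(getattr(current, relation))
    return [t.name for t in found]


print(ancestors(mitochondrion, "part_of"))  # ['cytoplasm']
print(ancestors(mitochondrion, "is_a"))     # ['organelle']
```

Because the relations are stored as data, software can follow them, so a search for everything in the cytoplasm could also return entries annotated with mitochondrion.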

The GO approach is more general than identifying the genes or proteins themselves. It describes what a gene product does (the molecular function), which biological processes it is involved in (the biological process) and where it acts (the cellular component). The gene Sos, for example, is linked in Ensembl, a major genomics database,2 to 17 different GO terms, including nervous system development (process), nucleosome (component) and guanyl-nucleotide exchange factor activity (function).

What does this mean for readers? Firstly, the new HTML view (see Science Come Alive) for a given paper has a set of drop-down menus that list the GO terms and link to their definitions, providing an at-a-glance view of the biological context of the paper. Secondly, the GO terms appear in the RSS feeds, bringing the biology in RSC publications direct to your desktop. Putting this machine-readable information into HTML and RSS feeds is also good news for authors, because their papers will be more visible and easier for search engines and database curators to find.

Traditionally, GO terms have been used to annotate large protein databases, so annotating the full text of journal articles as they go online is a radical departure.

Midori Harris, from the European Bioinformatics Institute (EBI) in Hinxton, UK, welcomes the developments: 'We're delighted by the RSC's decision to use GO and SO terms to annotate scientific papers they publish. It's an exciting application of ontologies that will help researchers search the ever-growing body of scientific literature more quickly and effectively. We hope to see more publishers following the RSC's example in the future.'

This is only the beginning. GO is under constant revision, so changes to the ontology will be reflected in RSC journal articles. A great deal of work is needed to go beyond process, function and cellular component to the gene products themselves and their interactions, and the RSC and the GO consortium intend to develop the project over the coming months and years. Watch this space!

How does it all work?

Using computers to mine biological information from the literature is an active research area, and mining GO terms is no exception. However, existing automated methods are not reliable enough to use without supervision, so the computer results are checked carefully by the RSC's team of technical editors.

Evelyn Camon and colleagues at the EBI have written a detailed account of potential pitfalls and common mistakes found in automated methods for Gene Ontology annotation,3 and the RSC has taken this into account in its design.

The RSC's text-mining software (Science Come Alive) takes the entries in the ontology and their synonyms, generates a further, much larger, set of synonyms and searches the journal articles for them. This simple approach, while not catching all of the terms in a paper, has high precision, that is to say, it finds very few false positives.
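The general idea of dictionary-based matching with synonym expansion can be sketched as follows. This is a minimal illustration under stated assumptions, not the RSC's software: the `ONTOLOGY` dictionary, the synonym "nucleosome formation", the hyphen and space expansion rule and every function name are hypothetical, chosen only to show how expanding synonyms and matching whole phrases tends to favour precision.

```python
import re

# Hypothetical mini-ontology: term -> curated synonyms (not real GO data).
ONTOLOGY = {
    "nucleosome assembly": ["nucleosome formation"],
    "nervous system development": [],
}


def expand(phrase):
    """Generate extra surface forms: hyphenated and spaced variants of a phrase."""
    return {phrase, phrase.replace(" ", "-"), phrase.replace("-", " ")}


def build_patterns(ontology):
    """Map each term to one case-insensitive, whole-phrase regular expression."""
    patterns = {}
    for term, synonyms in ontology.items():
        forms = set()
        for phrase in [term, *synonyms]:
            forms |= expand(phrase)
        # Longest forms first, so longer variants are tried before shorter ones.
        ordered = sorted(forms, key=len, reverse=True)
        alternation = "|".join(re.escape(form) for form in ordered)
        patterns[term] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def annotate(text, patterns):
    """Return the set of ontology terms with at least one match in the text."""
    return {term for term, pattern in patterns.items() if pattern.search(text)}


def precision(predicted, correct):
    """Fraction of predicted terms that are right: TP / (TP + FP)."""
    if not predicted:
        return 0.0
    return len(predicted & correct) / len(predicted)


patterns = build_patterns(ONTOLOGY)
text = "Histone chaperones drive nucleosome-assembly in vitro."
found = annotate(text, patterns)
print(found)                                      # {'nucleosome assembly'}
print(precision(found, {"nucleosome assembly"}))  # 1.0
```

Matching only complete phrases means a paraphrase such as "assembling nucleosomes" would be missed, which is the trade-off described above: some terms go uncaught, but few false positives are reported.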

Colin Batchelor

References

1 The Gene Ontology Consortium, Gene Ontology: tool for the unification of biology, Nat. Genet., 2000, 25, 25-29.
2 E. Birney et al., Nucleic Acids Res., 2006, 34, D556.
3 E. B. Camon et al., BMC Bioinformatics, 2005, 6, S17.

Also of interest

Science Come Alive

From the beginning of February 2007, authors publishing in RSC journals will see their science "come alive" thanks to an exciting new project pioneered by RSC Publishing.

