Computers learn chemistry
Chemists who trawl through the thousands of chemistry papers published every month must wish their computers could do the job for them. Well, maybe one day they will: that's the ultimate goal of Project Prospect, an initiative unveiled this month by Royal Society of Chemistry Publishing.
From February 2007, electronic RSC journal papers will be written so that their data can be read, indexed and intelligently searched by machine. The aim of the project - the first of its kind - is to create a chemical version of the 'semantic web', in which computers understand the meaning (semantics) of information rather than simply displaying data.
Laptop on a learning curve
In the first phase of Project Prospect, explained Richard Kidd, who leads the RSC editorial production systems department, technical editors, assisted by computer programs, will enrich articles with extra information, known as metadata. Though the metadata is hidden from view, it means readers can click on named compounds, scientific concepts and experimental data in an article to download structures, understand topics, or link through to electronic databases like Iupac's Gold Book.
This isn't just a neat time-saving device for busy chemists. The metadata is written in computer-readable languages, including such acronym-heavy standards as InChI (Iupac's International Chemical Identifier for compounds), OBOs (Open Biomedical Ontologies: hierarchical classifications of biomedical terms) and CML (Chemical Markup Language: a chemistry equivalent of the web's HTML).
The machine-readable metadata allows a program to search the RSC literature for similar compounds or subjects and deliver that information back to the chemist, via a convenient news feed. Eventually, a machine will be able to search intelligently: understanding, for instance, that a chemist interested in acetone would also want to read articles mentioning dimethylketone, nail-varnish remover, or organic solvents. This technology, Kidd says, will instantly help chemists to find, understand and share chemical knowledge with each other, while computers take on all the hard work of searching and delivering information.
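The synonym-aware search described above can be pictured with a minimal Python sketch. It is not the RSC's implementation: the synonym table, the article titles and the function name are all hypothetical, and it assumes each article has already been tagged with the identifiers of the compounds it mentions. The InChI strings are written in the present-day standard form.

```python
# Illustrative only: all names, titles and tables here are hypothetical.
ACETONE = "InChI=1S/C3H6O/c1-3(2)4/h1-2H3"
ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

# Every synonym points at the same machine-readable identifier.
NAME_TO_ID = {
    "acetone": ACETONE,
    "dimethylketone": ACETONE,
    "propan-2-one": ACETONE,
    "ethanol": ETHANOL,
}

# Assumed metadata: article title -> identifiers of the compounds it mentions.
ARTICLES = {
    "Solvent effects in a model reaction": {ACETONE},
    "A new route to a simple ketone": {ACETONE},
    "Fermentation yields revisited": {ETHANOL},
}


def find_articles(name):
    """Return titles of articles mentioning the compound, whatever it was called."""
    identifier = NAME_TO_ID.get(name.strip().lower())
    if identifier is None:
        return []  # unknown name: nothing to match against
    return sorted(title for title, ids in ARTICLES.items() if identifier in ids)
```

Because the lookup goes through the identifier rather than the text, a search for 'dimethylketone' returns the same articles as a search for 'acetone'. Broader links, such as acetone to 'organic solvents', would need an ontology rather than a flat synonym table.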
'Project Prospect demonstrates our commitment to invest in innovative technologies to provide our authors and readers with the best publishing service available', said the RSC's editorial director, Robert Parker. The initiative was developed together with UK academics based at the Unilever centre of molecular informatics and the Computing laboratory, Cambridge University.
Other chemical publishers look sure to follow the RSC's lead: Bob Bovenschulte, head of publishing at the American Chemical Society, told Chemistry World that the ACS publications division was actively developing its own plans along these lines, though it wasn't yet at the point of making final decisions about specific initiatives.
As Peter Murray-Rust of Cambridge's Unilever centre explained, scientists have already gone beyond the technology about to be used in commercial publishing. They have developed programs that don't just search and collate information from an article, but also analyse it, checking whether an article's data is self-consistent, for example.
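The article does not say which checks these programs run. As one assumed example of a self-consistency test, the Python sketch below compares a reported molar mass with the value calculated from the molecular formula. The function names, the tolerance and the short table of atomic masses are illustrative choices, and the parser handles only simple formulas without brackets.

```python
import re

# Rounded standard atomic masses for a few common elements (g/mol).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def molar_mass(formula):
    """Calculate the molar mass of a simple formula such as 'C3H6O'."""
    total = 0.0
    # Each match is an element symbol followed by an optional count.
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # An element missing from the table raises KeyError.
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total


def is_consistent(formula, reported_mass, tolerance=0.05):
    """True if the reported mass agrees with the formula within the tolerance."""
    return abs(molar_mass(formula) - reported_mass) <= tolerance
```

Here `is_consistent("C3H6O", 58.08)` passes for acetone, while a reported mass of 60.10 against the same formula would be flagged for a human to examine.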
In the future, 'your machine will be able to answer all your questions about a chemistry article - what are the molecules in this article, what are their properties, how will they react - then present this information to you,' predicts Murray-Rust.
Scientific publishers will have to adapt to a world where journal articles can be analysed and indexed as soon as they are published, he adds. Secondary publishers, for example - who provide abstracting and database services - may find their current roles defunct. But Bovenschulte warns that any predictions about the future impacts of technology are likely to be premature: 'I am optimistic that secondary publishers will continue to find innovative and creative ways to meet the changing needs of their customers,' he said.
Where chemistry publishers are leading the way, however, other sciences may find it harder to follow. Biologists are already well advanced in classifying their data for machine comprehension, but a common publishing standard will be hard to reach, as new terms are coined and frameworks of reference change so rapidly. And research physicists and mathematicians invent enough original notation to confuse the most sophisticated of search programs. The maturity of chemistry - with its precisely defined compounds and data presentation conventions - makes it the science most suited to this revolution in electronic publishing.
Richard Van Noorden