Scientists in Germany are using artificial intelligence to design a new language for accurately describing the structure of molecules – paving the way for more rapid drug discovery.
Robin Winter, from the Bioinformatics Lab at Bayer AG in Germany, is a machine learning expert who uses computer modelling to understand the relationship between the structure of a molecule and the way it functions. This work has far-reaching implications. “Modelling the relationship between the structure of a chemical compound and its chemical properties such as biological activity or toxicity could help guide medicinal chemists in the drug optimisation process – eventually decreasing the time and cost of drug development”, he says.
The first challenge in building such complex computational models is to describe a huge number of molecules systematically, in a form that a computer can interpret. To do this, the machine learning team led by Dr Djork-Arné Clevert has used artificial intelligence to create a new 'language' for describing molecules.
There are already several such 'languages'. The IUPAC naming convention builds a name from terms for the molecule's different structural fragments. Another 'language', called SMILES, uses a string of letters, numbers and symbols to record each atom and how the atoms are connected to one another. For example, the molecule known by the IUPAC name 1,3-benzodioxole is written in SMILES notation as c1ccc2c(c1)OCO2.
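For readers unfamiliar with SMILES, the following minimal Python sketch shows how a program might split a SMILES string into tokens and count the atoms it writes out explicitly. It is an illustration under simplifying assumptions, not a full SMILES parser: the token pattern and the helper names `tokenize_smiles` and `count_atoms` are hypothetical, stereochemistry and implicit hydrogens are ignored, and serious work would rely on an established cheminformatics toolkit.

```python
import re
from collections import Counter

# Simplified token pattern (an assumption, not the full OpenSMILES grammar):
# bracket atoms such as [nH] or [O-], the two-letter halogens Br and Cl,
# single-letter aliphatic (upper-case) and aromatic (lower-case) atoms,
# bond symbols, branch parentheses, '.' and ring-closure digits.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[-=#:/\\().]|%\d{2}|\d")


def tokenize_smiles(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, rejecting anything the pattern misses."""
    tokens = TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in {smiles!r}")
    return tokens


def count_atoms(smiles: str) -> Counter:
    """Count explicitly written atoms by element; implicit hydrogens are not counted."""
    counts = Counter()
    for tok in tokenize_smiles(smiles):
        if tok.startswith("["):
            # Element symbol after an optional isotope number, e.g. [13C] -> C, [nH] -> n
            match = re.match(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})", tok)
            if match is None:
                raise ValueError(f"unsupported bracket atom {tok!r}")
            symbol = match.group(1)
        elif tok.isalpha():
            symbol = tok
        else:
            continue  # bond, branch, ring-closure or '.' token: not an atom
        # Aromatic atoms are written in lower case; normalise to the element symbol.
        counts[symbol.capitalize()] += 1
    return counts


smiles = "c1ccc2c(c1)OCO2"  # 1,3-benzodioxole, as in the text
print(tokenize_smiles(smiles))
print(count_atoms(smiles))  # Counter({'C': 7, 'O': 2})
```

The seven carbons and two oxygens match the heavy atoms of 1,3-benzodioxole (C7H6O2). The article does not say how the team's model reads SMILES strings internally, so this tokenisation is only one plausible way to break the notation into pieces a program can process.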
The team’s computational model borrows ideas from the field of neural machine translation. "It translates between two different textual notations that represent the same molecule, and at the same time compresses this information", says Robin Winter. "This is similar to what automatic translation tools like Google Translate perform, but for chemical structures. By training a machine to translate between two different texts encoding the same information, the model learns to extract a general understanding of the molecule which can then be utilized as machine-readable molecular descriptors."
In other words, the machine uses the two existing languages to develop its own more compact, mathematical language. Because this new language is based on vectors, it can be used to search for molecules that have not yet been discovered.
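The practical appeal of a vector-based language is that comparing and exploring molecules becomes arithmetic. The Python sketch below is not the team's code; it assumes each molecule has already been mapped to a fixed-length descriptor vector, and the three-dimensional vectors and names such as `molecule_A` are invented purely for illustration (real learned descriptors are much longer). It shows two common operations: ranking known molecules by cosine similarity to a query, and interpolating between two vectors to obtain a new point in the space. Turning such a point back into a chemical structure would need a decoder, which is not shown.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two descriptor vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], library: dict[str, list[float]], k: int = 2) -> list[str]:
    """Names of the k library entries whose vectors are most similar to the query."""
    ranked = sorted(library, key=lambda name: cosine_similarity(query, library[name]), reverse=True)
    return ranked[:k]


def interpolate(a: list[float], b: list[float], t: float) -> list[float]:
    """Point a fraction t of the way from vector a to vector b."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


# Hypothetical toy descriptors, invented for this example only.
library = {
    "molecule_A": [0.9, 0.1, 0.0],
    "molecule_B": [0.1, 0.8, 0.3],
    "molecule_C": [0.8, 0.2, 0.1],
}

print(most_similar([1.0, 0.0, 0.0], library))  # ['molecule_A', 'molecule_C']
print(interpolate(library["molecule_A"], library["molecule_B"], 0.5))
```

With these toy values, molecule_A and molecule_C point in nearly the same direction as the query, so they rank first. Cosine similarity is used here because it ignores vector length, but it is only one of several reasonable ways to measure closeness in a descriptor space.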
"Combined with a machinery to predict how a given compound can be synthesised, this whole field of de novo drug design has the potential to eventually change how new drugs will be generated in the future."
Chemical Science is the flagship journal of the Royal Society of Chemistry and publishes findings of exceptional significance from across the chemical sciences. It is a global journal for the discovery and reporting of breakthroughs in basic chemical research, communicated to a worldwide audience without barriers, through open access. All article publication charges have been waived, meaning that the journal is free to read and free to publish.
Every Wednesday we are sharing one story from Chemical Science, highlighting the cutting-edge work we publish. Follow @ChemicalScience and #ChemSciPicks on Twitter to stay up to date.