13th German Conference on Chemoinformatics (Mainz, Germany)
Jeff White (Royal Society of Chemistry data scientist) recently attended the 13th German Conference on Chemoinformatics to present a poster detailing some of the Data Science team’s recent work on training neural networks to identify functional groups from carbon-13 NMR spectra.
The meeting focussed on a handful of areas this year, which were reflected in the session titles: "Chemoinformatics and Drug Discovery", "Protein Modelling and Molecular Modelling" and "Cheminformation and Big Data". There were also sessions dedicated to short "telegrams", giving students the chance to present brief summaries of their research.
Among the numerous equations and symbols that flew about in the sessions, a few presentations stood out. One involved the application of spatial geometry techniques, taken from systems used to recognise such things as human figures in video recordings, to model molecules as they deform. Another highlight involved the use of cryptographic protocols that allow data sharing without revealing the specific underlying data.
One of the keynote presentations dealt with the difficulties of encoding and preserving data in machine-readable formats, a lack of consensus and standards in the software created for the community, and barriers to machine interpretation of data, all real concerns. A final talk with relevance for the RSC concerned tautomerism in the context of InChI v2, which was interesting if a little daunting. Since most compounds show some form of tautomerism, introducing it into the new InChI v2 standard seems likely to have far-reaching effects. InChI v2 will probably have a different format from the current implementation, which might call for our team's involvement. There was a lot of encouragement for the community to get involved in testing the proposed new standard.
Sunday evening saw the conference trip out to Eberbach Abbey, a former monastery dating from the 12th century, which was used to film interior scenes for "The Name of the Rose".
The poster session was held on Monday evening and seemed to go well. Even allowing for the proximity of the bar, there was good traffic past our poster, with many opportunities to describe our work in detail and answer questions. There were also many anecdotal references to earlier attempts to automate structure assignment over past decades, both before and after the advent of neural networks and machine learning approaches.
Our poster, "NMR, deep learning and molecular structure: a call for data", is available for those who would like to find out more.