Journal articles made easy: Predicting crystallinity of molecular materials
Will it crystallise? Predicting crystallinity of molecular materials
Additional information from the authors
'For a crystallographer interested in molecular structure, the biggest barrier to studying materials is obtaining them in a crystalline form.
Machine learning approaches have been applied to predict many properties of molecules – we set out to test whether there was sufficient information in a conventional 2D chemical diagram (atom types and bonds) to predict whether a material would crystallize easily.
Rather than test thousands of compounds ourselves, we took a “big data” approach: the Cambridge Database contains examples of things that are crystalline, and we used commercially available catalogues to construct a complementary data set of things that are not crystalline.
This analysis tells us when a material should crystallize – and therefore when to expend effort trying to obtain a crystalline sample. For some applications it also tells us how small changes to a molecule might make it more, or less, crystalline.
To assess the predictive power of the model, the accuracy is checked on a separate set of unseen data. Additionally, we carried out a small blind-test recrystallisation in the lab using materials that we had predicted to be either crystalline or non-crystalline.
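The check on unseen data described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the paper: the function names `split_holdout` and `accuracy`, the 30% test fraction and the placeholder records are all assumptions made for the example.

```python
import random


def split_holdout(records, test_fraction=0.2, seed=0):
    """Shuffle the labelled records and set a fraction aside as unseen data."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, held-out set)


def accuracy(predict, records):
    """Fraction of records whose predicted label matches the known label."""
    correct = sum(1 for features, label in records if predict(features) == label)
    return correct / len(records)


# Placeholder records (not real data): one made-up feature per "molecule",
# labelled True (crystalline) when that feature is small.
records = [((n,), n < 5) for n in range(10)]
train, test = split_holdout(records, test_fraction=0.3)


# A real model would be fitted on `train` only; this hypothetical rule stands in for it.
def rule(features):
    return features[0] < 5


print(len(train), len(test), accuracy(rule, test))
```

The important design point is that the held-out records play no part in fitting the model, so the reported accuracy reflects how it behaves on molecules it has never seen.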
This approach is complementary to crystal structure prediction (CSP).
The model inputs are simple facts about each molecule that can be computed from the 2D diagram, such as the number and types of atoms, counts of the chemical groups present, and connectivity indices. We have attempted to reverse-engineer the model (by training it with only a few properties of the molecule) to find out which input facts are most significant. It turns out that just a couple of values for each molecule will still produce a working model that is about 80% accurate – in the paper, we show that rotatable bond count (how flexible the molecule is) and Chi0V (an indirect 2D measure of 3D molecular volume), taken together, provide the best accuracy.
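To make the Chi0V descriptor concrete, here is a minimal sketch of the zero-order valence connectivity index as defined by Kier and Hall. Each heavy (non-hydrogen) atom contributes 1/sqrt(δv), where δv = (Zv − h)/(Z − Zv − 1), Zv is the number of valence electrons, h the number of attached hydrogens and Z the atomic number. The hand-written atom list, the small element table and the function names are assumptions for illustration; the software used for the paper may treat edge cases differently.

```python
from math import sqrt

# (atomic number Z, valence electron count Zv) for a few common elements.
ELEMENTS = {
    "C": (6, 4),
    "N": (7, 5),
    "O": (8, 6),
    "F": (9, 7),
    "S": (16, 6),
    "Cl": (17, 7),
}


def valence_delta(symbol, n_hydrogens):
    """Kier-Hall valence delta: (Zv - h) / (Z - Zv - 1)."""
    z, zv = ELEMENTS[symbol]
    return (zv - n_hydrogens) / (z - zv - 1)


def chi0v(heavy_atoms):
    """Sum 1/sqrt(delta_v) over heavy atoms given as (symbol, attached H count)."""
    total = 0.0
    for symbol, n_hydrogens in heavy_atoms:
        delta = valence_delta(symbol, n_hydrogens)
        if delta <= 0:
            # e.g. the carbon of methane; this sketch does not handle that case
            raise ValueError(f"non-positive valence delta for {symbol}")
        total += 1.0 / sqrt(delta)
    return total


# Ethanol, CH3-CH2-OH, written out by hand as its three heavy atoms.
ethanol = [("C", 3), ("C", 2), ("O", 1)]
print(round(chi0v(ethanol), 3))  # 1 + 1/sqrt(2) + 1/sqrt(5) -> 2.154
```

Because the index is a sum over heavy atoms, it grows as the molecule gets larger, which is why it can act as a rough 2D stand-in for molecular volume.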
Volume can be related to the ease with which a solvated molecule can move from solution into the solid state, and the more flexible a molecule is, the lower the concentration of the particular conformation that is needed to attach to a crystal surface.
We are currently studying a range of materials that we consider to be on the edge of crystallinity, with a view to gaining more insight into the mechanisms inhibiting their growth into larger crystals.
There are many things the model doesn’t know about that we would like to incorporate but currently don’t have the source data for, principally temperature and solvent.'