When I was a student, there was an attitude of "if you’re good at maths, do physical chemistry, and if you want to do physical chemistry, you’d better be good at maths". Now that has spread out across the entire subject – I don’t think in ten years’ time that even the most traditional of synthetic chemists is going to be able to get away without having quite a sophisticated understanding of the kind of maths that’s necessary for complex data analysis – so effectively statistics.
I think that’s a huge opportunity in terms of how we start to look at what we teach our students and the kind of students we’re looking for. I think it’s a threat because if I look into the school curriculum, nobody’s preparing them for this.
It’s going to become increasingly important – we need to think now about equipping the people that we’re training for that world, because ten years’ time is not a long way away. People entering the first year of their degree today will be completing their PhD with this change being full-on, and they will need to be equipped to deal with it.
I wasn’t taught about the idea of a null hypothesis until a long, long way into my adulthood – we need to start to get our students to understand the language of statistics, to understand what data is showing you – and more importantly what it’s not showing you.
One of the dangers in this world of big data is something I believe statisticians call "correlation fishing" – if you run enough different kinds of tests, eventually you’ll find something that looks like a meaningful correlation. The problem with that is the word "meaningful" – run enough tests and some things will appear to correlate purely by chance.
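The effect is easy to reproduce. The following is a minimal sketch, not anything taken from the interview: it assumes an invented data set of 20 samples and 1,000 candidate variables that are all pure random noise, and it counts how many of them would pass a conventional 5% significance test against an equally random outcome. The names (`pearson_r`, `N_FEATURES`, `R_CRITICAL`) and the numbers are illustrative assumptions.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)   # fixed seed so the run is repeatable
N_SAMPLES = 20            # assumed number of measurements
N_FEATURES = 1000         # assumed number of unrelated variables tested
R_CRITICAL = 0.444        # approx. two-sided 5% threshold for |r| when n = 20

# The "outcome" is random noise, so no variable can genuinely predict it.
outcome = [rng.gauss(0, 1) for _ in range(N_SAMPLES)]

false_hits = 0
for _ in range(N_FEATURES):
    noise = [rng.gauss(0, 1) for _ in range(N_SAMPLES)]
    if abs(pearson_r(noise, outcome)) > R_CRITICAL:
        false_hits += 1

print(f"{false_hits} of {N_FEATURES} random variables look 'significant'")
```

With a 5% threshold, roughly one test in twenty is expected to come up "significant" even though nothing real is there, which is why the number of tests run has to be taken into account before a correlation is believed.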
That level of understanding matters when you think about the reproducibility crisis – which is prevalent in the life sciences – because when you get down to the core of many of those problems, it comes down to poorly done statistics.
As chemists, we’ve not really had to address it in the past, and we are going to have to do so now, so we don’t fall into that hole of producing work based on dodgy statistics because we haven’t understood that you can’t just run every possible algorithm until eventually you find "ooh, that correlates". That’s not how statistics works.
I think we’re in a period of transition. At the moment, what I do is talk to my friend Emma, who’s a statistician! Given the stage of career that I’m at, that level might be enough, to have my friendly statistician who spends time explaining to my students. But if I were 35, instead of 55, I perhaps wouldn’t get away with it, because as we progress and there is more and more of it, I’d need to understand it myself.
I think we as university educators have to pick it up and pick it up quickly. We should start working with sister societies, like the Royal Statistical Society, to ensure the school curriculum has enough in it so that someone with an ordinary A-level in Maths, without any extra whistles and bangs, can move rapidly on to more sophisticated statistics when they get to us.
As a member of society, you see so much statistical rubbish bandied around, so I think it would be generally a good thing, all the way through the education system. That’s not just good for doing chemistry but good for being a citizen, as the world of big data could mean more and more shoddy statistics will be bandied around.
My maths curriculum was terribly traditional, so I could integrate anything, I could do simultaneous equations until they were coming out of my ears, but calculating population changes based on death and birth rates was as close to statistics as we got. There was nothing about rigorously testing the validity of a series of results and correlations – there was nothing about confounding factors or false correlations.
You can apparently prove that carrying a box of matches in your pocket causes lung cancer... Of course it doesn’t ACTUALLY – it’s smoking that causes lung cancer – but if you carry a box of matches in your pocket, you’re far more likely to be a smoker. So by understanding those things, we’ll start to weed out those confounding factors.
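The matches example can be sketched as a small simulation. This is a rough illustration under invented assumptions: every probability below is made up for the sake of the example and is not real epidemiological data. In the simulated population, cancer risk depends only on smoking, and carrying matches depends only on smoking, yet match carriers still show a much higher cancer rate until the comparison is made separately for smokers and non-smokers.

```python
import random


def simulate(n=200_000, seed=0):
    """Return {(smoker, matches): [people, cancer_cases]} for n simulated people."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n):
        # All probabilities are illustrative assumptions, not real data.
        smoker = rng.random() < 0.25
        matches = rng.random() < (0.80 if smoker else 0.05)
        cancer = rng.random() < (0.15 if smoker else 0.01)  # ignores matches
        cell = counts.setdefault((smoker, matches), [0, 0])
        cell[0] += 1
        cell[1] += cancer
    return counts


def rate(counts, keys):
    """Cancer rate pooled over the given (smoker, matches) groups."""
    people = sum(counts[k][0] for k in keys)
    cases = sum(counts[k][1] for k in keys)
    return cases / people


counts = simulate()

# Naive comparison: everyone with matches versus everyone without.
naive_with = rate(counts, [(True, True), (False, True)])
naive_without = rate(counts, [(True, False), (False, False)])

# Stratified comparison: the same question asked within each smoking group.
smokers_gap = rate(counts, [(True, True)]) - rate(counts, [(True, False)])
nonsmokers_gap = rate(counts, [(False, True)]) - rate(counts, [(False, False)])

print(f"naive: with matches {naive_with:.3f}, without {naive_without:.3f}")
print(f"gap among smokers {smokers_gap:+.3f}, non-smokers {nonsmokers_gap:+.3f}")
```

The naive comparison shows a large gap, while the gap within each smoking group stays close to zero. Splitting the data by the suspected confounder is one simple way of checking whether an apparent effect survives.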