\documentclass{article}
\usepackage{graphicx} % Required for inserting images
% Language setting
% Replace `english' with e.g. `spanish' to change the document language
\usepackage[english]{babel}
% Set page size and margins
% Replace `letterpaper' with `a4paper' for UK/EU standard size
\usepackage[letterpaper,top=2cm,bottom=2cm,left=3cm,right=3cm,marginparwidth=1.75cm]{geometry}
% Useful packages
\usepackage{amsmath}
\usepackage{booktabs}
\usepackage{multirow}
\usepackage{indentfirst}
\usepackage[colorlinks=true, allcolors=blue]{hyperref}
\usepackage{setspace}
\usepackage[style=ieee, citestyle=numeric-comp, backend=biber]{biblatex}
\usepackage{tabularx}
\usepackage{caption}
\usepackage{float}
\usepackage{xr}
\DeclareCaptionLabelFormat{addS}{#1 S#2}
\setcounter{figure}{0}
\renewcommand{\figurename}{Fig.}
\renewcommand{\thefigure}{S\arabic{figure}}
\makeatletter
\renewenvironment{abstract}
{\if@twocolumn
\section*{\abstractname}
\else
\begin{flushleft}%
\small
{\bfseries\abstractname\vspace{-.5em}\vspace{\z@}}%
\end{flushleft}%
\quotation
\fi}
{\if@twocolumn\else\endquotation\fi}
\makeatother
\addbibresource{oc20_paper.bib}
\doublespacing
\usepackage{authblk}
\title{Investigating the Error Imbalance of Large-Scale Machine Learning Potentials in Catalysis}
\author[1]{Kareem Abdelmaqsoud}
\author[2]{Muhammed Shuaibi}
\author[1]{Adeesh Kolluru}
\author[3]{Raffaele Cheula}
\author[1]{John R. Kitchin\thanks{Corresponding author: jkitchin@andrew.cmu.edu}}
\affil[3]{Aarhus University}
\affil[2]{Meta Fundamental AI Research}
\affil[1]{Department of Chemical Engineering, Carnegie Mellon University}
\date{}
\begin{document}
\maketitle
\clearpage
% Table of Contents
\tableofcontents
\clearpage
\captionsetup[table]{labelformat=addS}
\section{DFT convergence errors on forces}
In the main paper, we showed that adsorption energies in the OC20 dataset are converged at the OC20 DFT settings owing to cancellation of errors. In this section, we show that the forces are also converged. Figure \ref{fig:force_maes} shows, for each material class, the mean absolute difference between the per-atom forces computed at the original OC20 settings and at the more converged DFT settings. These convergence errors are smaller than the typical force convergence criterion of 0.03 eV/\textup{\AA}. Moreover, nonmetals have force errors comparable to those of all other material classes. Therefore, the DFT calculations in OC20 are converged in terms of both energies and forces.
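For concreteness, the force mean absolute difference for a material class $c$ can be written as follows. This is a sketch that assumes the average runs over all Cartesian force components of all atoms in the systems of that class. Averaging per-atom force-vector norms would be an equally plausible reading. The symbols $\mathcal{S}_c$, $N_s$ and $F_{s,i\alpha}$ are introduced here for illustration only.
\begin{equation*}
\mathrm{MAD}_c = \frac{1}{3\sum_{s \in \mathcal{S}_c} N_s} \sum_{s \in \mathcal{S}_c} \sum_{i=1}^{N_s} \sum_{\alpha \in \{x,y,z\}} \left| F^{\text{OC20}}_{s,i\alpha} - F^{\text{tight}}_{s,i\alpha} \right|
\end{equation*}
Here $\mathcal{S}_c$ is the set of systems in class $c$, $N_s$ is the number of atoms in system $s$, and $F_{s,i\alpha}$ is the $\alpha$ component of the force on atom $i$ at the original OC20 or the tighter settings. Under this definition, the statement above amounts to $\mathrm{MAD}_c$ lying below 0.03~eV/\AA{} for every class $c$.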
\begin{figure}[H]
\centering
\includegraphics[width=0.7\textwidth]{figures/force_maes.png}
\caption{Mean absolute differences between the per-atom forces at the original OC20 DFT settings and at the tighter DFT settings identified in this study. The force mean absolute differences are small, showing that the forces are converged at the original OC20 DFT settings.
}
\label{fig:force_maes}
\end{figure}
\section{Outliers in the DFT convergence errors on energies}
Figure 5 in the main paper shows the distribution of the differences between the DFT energies at the original OC20 DFT settings and at the tighter DFT settings chosen in this paper. Energy differences that fall outside the whiskers of the box plots are considered outliers and were omitted from Figure 5 for visual clarity. These outliers are the systems whose energy differences lie in the top and bottom 5\% of the energy difference distribution, and they are shown here in Figure \ref{fig:energy_difference_outliers}. Information about these outlier systems can be accessed through this \href{https://github.com/kareem-Abdelmaqsoud/error_imbalance_mlps/blob/main/converegnce_errors_outliers.xlsx}{link}. This information includes the system IDs, bulk structure formulas, adsorbate formulas, and corresponding energy and force convergence errors of the outliers.
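The outlier criterion can be stated compactly. Write $\Delta E_s = E_s^{\text{OC20}} - E_s^{\text{tight}}$ for the energy difference of system $s$, and $Q_p$ for the $p$-th quantile of the distribution of $\Delta E$. Both symbols are introduced here for illustration. We also assume the quantiles are computed separately for the total, slab and adsorption energy differences. A system is then flagged as an outlier when
\begin{equation*}
\Delta E_s < Q_{0.05} \quad \text{or} \quad \Delta E_s > Q_{0.95}.
\end{equation*}
By construction, roughly 10\% of the systems in each panel are treated as outliers, and the whiskers of the box plots in Figure 5 of the main paper span the 5th to 95th percentiles. Because the criterion is two-sided, it does not depend on the sign convention chosen for $\Delta E_s$.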
\begin{figure}[H]
\centering
\includegraphics[width=1\textwidth]{figures/energy_difference_outliers.png}
\caption{Differences between the energies calculated at the original OC20 DFT settings and at the new tighter DFT settings on the larger OC20-200k dataset: (a) total (adsorbate+slab) energies, (b) slab energies, and (c) adsorption energies. The figure shows the outliers that lie outside the whiskers of the box plots, namely the systems whose energy differences lie in the top and bottom 5\% of the energy difference distribution.
}
\label{fig:energy_difference_outliers}
\end{figure}
\section{Effect of removing the surface reconstructions on force MAEs}
To investigate the effect of removing the systems with surface reconstructions on the force MAEs, we compare the validation MAEs of three pre-trained models before and after removing these systems from the validation sets. As shown in Table S\ref{tab:removing_anom_forces}, the force MAEs of all three MLPs are not affected by removing the systems with surface reconstructions. Surface reconstructions only cause an issue for the adsorption energy referencing used in the OC20 dataset. Because the forces are not referenced in the dataset, it is expected that removing the surface reconstructions has no effect on the ML force MAEs.
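A short derivation makes the last point explicit. We assume that the referenced energy $E_{\text{ads}}$ is obtained from the total energy by subtracting a reference term $E_{\text{ref}}$ (the slab and gas-phase energies). We also assume that $E_{\text{ref}}$ does not depend on the atomic positions $\mathbf{r}_i$ of the adsorbate+slab structure. This is a sketch of the referencing scheme, not its exact form in the main paper. Under these assumptions, the forces are unchanged by the referencing:
\begin{equation*}
\mathbf{F}_i = -\frac{\partial E_{\text{total}}}{\partial \mathbf{r}_i} = -\frac{\partial}{\partial \mathbf{r}_i}\left( E_{\text{ads}} + E_{\text{ref}} \right) = -\frac{\partial E_{\text{ads}}}{\partial \mathbf{r}_i}, \qquad \text{since } \frac{\partial E_{\text{ref}}}{\partial \mathbf{r}_i} = 0.
\end{equation*}
A reconstruction can therefore corrupt quantities that depend on $E_{\text{ref}}$, such as $E_{\text{ads}}$, but not the forces $\mathbf{F}_i$.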
We focused the investigation of the MAEs and the surface reconstructions on the four material classes in the dataset. However, a per-element investigation could also be of interest; for instance, one could compare the MAEs of nitrides and carbides. To enable such analyses, we provide a Jupyter notebook that shows how to obtain the per-element MAEs and the fraction of calculations with surface reconstructions. These errors are computed for a GemNet-OC model trained on the adsorption energies of the full OC20 dataset. This analysis could help identify materials that have both a high fraction of surface reconstructions and high MAEs. The notebook can be accessed through this \href{https://github.com/kareem-Abdelmaqsoud/error_imbalance_mlps/blob/main/per_element_error_distribution.ipynb}{link}.
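As an illustration of the quantities the notebook reports, the per-element MAE and reconstruction fraction can be written as below. This sketch assumes a system is assigned to element $e$ whenever $e$ appears in its bulk composition; the notebook's actual grouping rule may differ. The notation is introduced here only for illustration.
\begin{align*}
\mathrm{MAE}_e &= \frac{1}{|\mathcal{S}_e|} \sum_{s \in \mathcal{S}_e} \left| \hat{E}^{\text{ads}}_s - E^{\text{ads,DFT}}_s \right|, \\
f_e &= \frac{\left| \{ s \in \mathcal{S}_e : s \text{ has a surface reconstruction} \} \right|}{|\mathcal{S}_e|}.
\end{align*}
Here $\mathcal{S}_e$ is the set of systems containing element $e$, $\hat{E}^{\text{ads}}_s$ is the model prediction and $E^{\text{ads,DFT}}_s$ the DFT adsorption energy. A system with several elements contributes to several sets $\mathcal{S}_e$, so the per-element MAEs do not partition the overall MAE. Elements with both a large $f_e$ and a large $\mathrm{MAE}_e$ are the candidates of interest.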
\clearpage
\input{tables/remove_anom_forces}
\section{Effect of removing the adsorbate anomalies on energy MAEs}
Besides adsorbate-induced surface reconstructions, adsorbate anomalies can occur during a DFT relaxation. The two most common are adsorbate desorption and adsorbate dissociation, which we refer to collectively as adsorbate anomalies. These anomalies are detected with the graph-connectivity approach developed in the AdsorbML paper \cite{lan_adsorbml_2023} and explained in the methods section. The code for detecting the surface reconstruction, adsorbate dissociation, and adsorbate desorption anomalies is provided in this \href{https://github.com/Open-Catalyst-Project/Open-Catalyst-Dataset/blob/277175cceb7d68469c4c77761ca11f42388e9ab9/ocdata/utils/flag_anomaly.py#L6}{script} within the Open-Catalyst-Project GitHub repository. Table S\ref{tab:removing_adsorbate_anom} shows that removing the adsorbate anomalies from the validation sets has a very small effect on the energy MAEs of the three top-performing models on the OC20 dataset. This is expected because these adsorbate anomalies do not affect the adsorption energy referencing used in the dataset. Adsorbate desorption does not affect the referencing because the adsorption energy is obtained by subtracting the gas-phase energy of the adsorbate from the total energy. This reference remains consistent when the adsorbate desorbs. For adsorbate dissociation, the model treats the dissociated molecules or atoms as multiple adsorbates bound to the surface. It is encouraging that the models can learn these anomalies and still accurately predict the energies of the systems that contain them.
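To make the referencing argument concrete, the adsorption energy is obtained from total energies as follows. The expression uses generic notation, which may differ from Equation 3 of the main paper.
\begin{equation*}
E_{\text{ads}} = E_{\text{adsorbate+slab}} - E_{\text{slab}} - E_{\text{gas}}
\end{equation*}
Here $E_{\text{gas}}$ is the gas-phase reference energy of the adsorbate and depends only on the adsorbate's composition. Desorption and dissociation leave that composition unchanged, so $E_{\text{gas}}$ is unaffected. The slab in $E_{\text{adsorbate+slab}}$ is still the slab described by $E_{\text{slab}}$. Both reference terms therefore remain consistent with the final structure. A surface reconstruction, by contrast, changes the slab itself, so $E_{\text{slab}}$ no longer describes the surface present in the adsorbate+slab structure.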
\clearpage
\input{tables/remove_ads_anom}
\section{Effect of removing the surface reconstructions on total energy models}
To further verify that surface reconstructions are an issue for the models only because of the adsorption energy referencing, we test the effect of removing them on a GemNet-OC model trained on total energies. Figure \ref{fig:total_energy_anom} shows the MAEs of the total energy model on the in-domain (ID) and out-of-domain (OOD) validation sets before and after removing the surface reconstructions, without retraining. As expected, removing the surface reconstructions has a minimal effect on the MAEs of the total energy model. Therefore, unlike the adsorption energy model, the total energy model can be applied to predict the energies of systems with surface reconstructions. As shown in Figure \ref{fig:total_energy_anom}, the total energy model has significantly higher MAEs on the OOD-Cat (out-of-domain catalyst) validation set than on the ID and OOD-Ads (out-of-domain adsorbate) validation sets. This could be because the new element combinations in the OOD-Cat set lead to total energies that differ substantially from those seen in training and are therefore harder for the model to capture. In the next section, we show that calculating the adsorption energy with total energy models leads to a cancellation of errors. This cancellation brings the MAEs on OOD-Cat and OOD-Both down to levels comparable to those on ID and OOD-Ads.
\begin{figure}[H]
\centering
\includegraphics[width=0.6\textwidth]{figures/total_energy_anomalies.png}
\caption{Validation MAEs of a GemNet-OC total energy model before and after removing the surface reconstructions from the validation sets. Removing the surface reconstructions has no significant effect on the validation errors, so total energy models are not impacted by the surface reconstructions.
}
\label{fig:total_energy_anom}
\end{figure}
\section{Adsorption energy from total energy model}
In addition to the result in the main paper, we consider three cases for comparing the adsorption energy MAEs of a total energy model and a baseline adsorption energy model on the OC20-IS2RE 25k validation sets. In all three cases, we remove the calculations with surface reconstructions, since the adsorption energy is not properly defined for these systems. In the first case, we relax both the adsorbate+slab and the slab structures with the total energy model. In the second case, we relax only the adsorbate+slab structure with the total energy model and relax the slab structure with DFT. To allow for cancellation of ML errors, we then use the total energy model to predict the energy of the DFT-relaxed slab structure. In the third case, we relax both the adsorbate+slab and the slab structures with DFT and use the total energy model to predict the energies of the DFT-relaxed structures. This case corresponds to the Relaxed Structure to Relaxed Energy (RS2RE) task of the OC20 dataset.
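The three cases can be summarized with the expressions below. $\hat{E}(\cdot)$ denotes the total energy predicted by the ML model for a given structure. $\mathbf{R}^{\text{ML}}$ and $\mathbf{R}^{\text{DFT}}$ denote structures relaxed with the ML model and with DFT, respectively. The subscripts refer to the adsorbate+slab (sys) and slab structures. This notation is introduced here for illustration, and we assume the gas-phase reference $E_{\text{gas}}$ is the same in all three cases.
\begin{align*}
\text{Case 1:}\quad \hat{E}_{\text{ads}} &= \hat{E}(\mathbf{R}^{\text{ML}}_{\text{sys}}) - \hat{E}(\mathbf{R}^{\text{ML}}_{\text{slab}}) - E_{\text{gas}}, \\
\text{Case 2:}\quad \hat{E}_{\text{ads}} &= \hat{E}(\mathbf{R}^{\text{ML}}_{\text{sys}}) - \hat{E}(\mathbf{R}^{\text{DFT}}_{\text{slab}}) - E_{\text{gas}}, \\
\text{Case 3 (RS2RE):}\quad \hat{E}_{\text{ads}} &= \hat{E}(\mathbf{R}^{\text{DFT}}_{\text{sys}}) - \hat{E}(\mathbf{R}^{\text{DFT}}_{\text{slab}}) - E_{\text{gas}}.
\end{align*}
In every case, the slab energy is an ML prediction rather than a DFT energy. This is what allows errors in $\hat{E}$ to cancel between the two terms.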
\subsubsection{Relaxing both the adsorbate+slab and the slab structures (no reconstructions)}
\input{tables/total_energy_is2re_relax_both_anom}
Table S\ref{tab:total_energy_maes_is2re_both_anom} shows the MAEs of the total energy model and the adsorption energy model after removing the systems with surface reconstructions. These MAEs are lower than those reported in the main paper, which include systems with surface reconstructions. The total energy model shows significant cancellation of errors between the adsorbate+slab and slab structure MAEs. Overall, the adsorption energy model has slightly higher adsorption energy MAEs than the total energy model. However, the two models are comparable once the surface reconstructions are removed, because removing them reduces the MAEs of the adsorption energy model more than those of the total energy model.
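The cancellation of errors can be made precise with a short derivation. Write the errors of the total energy model as $\epsilon_{\text{sys}} = \hat{E}_{\text{sys}} - E_{\text{sys}}$ and $\epsilon_{\text{slab}} = \hat{E}_{\text{slab}} - E_{\text{slab}}$ (notation introduced here). Assuming the gas-phase reference is not predicted by the model, the adsorption energy error and its variance over a dataset are
\begin{equation*}
\epsilon_{\text{ads}} = \epsilon_{\text{sys}} - \epsilon_{\text{slab}}, \qquad
\operatorname{Var}(\epsilon_{\text{ads}}) = \operatorname{Var}(\epsilon_{\text{sys}}) + \operatorname{Var}(\epsilon_{\text{slab}}) - 2\operatorname{Cov}(\epsilon_{\text{sys}}, \epsilon_{\text{slab}}).
\end{equation*}
Suppose the model makes similar errors on an adsorbate+slab structure and its slab, for example a systematic offset for a given bulk. The covariance term is then large and positive, and $|\epsilon_{\text{ads}}|$ tends to be much smaller than either $|\epsilon_{\text{sys}}|$ or $|\epsilon_{\text{slab}}|$. This is the mechanism we expect to underlie the cancellation noted above.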
\subsubsection{Relaxing the adsorbate+slab structure only}
\input{tables/total_energy_is2re_anom}
For this comparison, we relax the adsorbate+slab structure with ML and relax the slab structure with DFT. The adsorption energy model assumes access to the DFT-relaxed slab structure, so we leverage that structure here as well. This makes the computational cost of the total energy approach comparable to that of the adsorption energy model. To allow for cancellation of ML errors, we use the total energy model to predict the energy of the DFT-relaxed slab and subtract this prediction as the reference instead of the DFT energy. The MAEs are calculated between the ML-predicted adsorption energy and the DFT adsorption energy. As shown in Table S\ref{tab:total_energy_maes_is2re_anom}, the adsorption energy model has lower MAEs on the in-domain (ID) and out-of-domain adsorbate (OOD Ads) validation sets. The total energy model has lower MAEs on the OOD catalyst and OOD both (catalyst and adsorbate) validation sets. Overall, the MAEs of the total energy and adsorption energy models are comparable. Interestingly, the MAEs of the total energy model are not significantly lower than in the previous case, in which the slab structure was also relaxed with ML. Therefore, relaxing the slab structure with DFT might not be necessary for the total energy model. This is unlike the adsorption energy model, which requires a DFT-relaxed slab structure.
\subsubsection{Relaxed Structure to Relaxed Energy (RS2RE)}
In this case, we relaxed the structures with DFT and predicted the energies of the relaxed structures with the ML models. For the total energy model, we predicted the energies of both the relaxed adsorbate+slab and slab structures and used these predictions in Equation 3 of the main paper to calculate the adsorption energy. For the adsorption energy model, we directly predicted the adsorption energy of the relaxed adsorbate+slab structure. Relaxing the structures with DFT is not very practical because of the computational cost of the DFT relaxations. However, it ensures that the ML models predict the energy of the same structure as DFT. In contrast, in the previous cases, which included ML relaxations, the ML-relaxed structures are not guaranteed to match the DFT-relaxed structures, which leads to higher MAEs. As shown in Table S\ref{tab:total_energy_maes_rs2re}, the MAEs are much smaller than those reported above. Significant cancellation of errors, of up to 75\%, is observed on the OOD catalyst and OOD both datasets. The total energy model has higher MAEs on the ID dataset and lower MAEs on the OOD datasets. Moreover, as shown in Figure \ref{fig:error_dist}, the MAEs are distributed more uniformly across the material classes for the total energy model than for the adsorption energy model, even after removing the systems with surface reconstructions.
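One way to express the degree of cancellation is the relative reduction of the adsorption energy MAE with respect to the adsorbate+slab total energy MAE. This definition is an assumption made here for illustration and may not match the exact definition behind the 75\% figure. With hypothetical MAEs of 0.40~eV on the adsorbate+slab energies and 0.10~eV on the adsorption energies, the reduction is
\begin{equation*}
\text{cancellation} = 1 - \frac{\mathrm{MAE}_{\text{ads}}}{\mathrm{MAE}_{\text{sys}}} = 1 - \frac{0.10\ \text{eV}}{0.40\ \text{eV}} = 0.75 = 75\%.
\end{equation*}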
\input{tables/total_energy_rs2re}
\begin{figure}[H]
\centering
\includegraphics[width=1\textwidth]{figures/error_dist.png}
\caption{Adsorption energy MAEs of the different material classes for the total energy and adsorption energy models. Using a total energy model to compute the adsorption energies reduces the imbalance of the MAEs between the different material classes in the dataset.
}
\label{fig:error_dist}
\end{figure}
% \section{Effect of using the tighter DFT setting on the ML MAEs}
% To test the effect of using tighter DFT settings on the MLP MAEs, we compare the MAEs of 1) a Gemnet-OC \cite{gasteiger_gemnet-oc_2022} model trained and evaluated on data at the original OC20 DFT settings, 2) a Gemnet-OC model trained and evaluated on data at the tighter DFT settings. As shown in Figure \ref{fig:old_new_settings_mae}a, the total energy MAEs are not significantly affected by using tighter DFT settings. The reduction of the halides MAEs is larger than the other material classes, but since the percentage of the halides in the dataset is small (1.4\%), the overall MAEs are not decreased significantly. As shown in Figure \ref{fig:old_new_settings_mae}b, for all the material classes, the MAEs do not change significantly as a result of using the tighter DFT settings. Therefore, the ML errors are not affected by the DFT convergence errors. Moreover, for MLPs trained on the data with the tighter DFT settings, nonmetals and halides still have inconsistently higher errors compared to other material classes. Therefore, it can be concluded that the DFT convergence errors are not the cause of non-metals having inconsistently higher MAEs than other material classes.
% \begin{figure}[H]
% \centering
% \includegraphics[width=1\textwidth]{figures/dft_settings_ml_errors.png}
% \caption{A comparison of the MAEs of two GemNet-OC models trained on the total energies at the original vs. tighter DFT settings. The Figure shows a) total energy MAEs b) adsorption energy MAEs of the different material classes. No significant change in the MAEs as a result of using the tighter DFT settings.
% }
% \label{fig:old_new_settings_mae}
% \end{figure}
\clearpage
\printbibliography
\end{document}