Finding descriptors in materials data using interpretable 'transparent box' machine learning

When: 10:30 AM - 11:30 AM, Nov 06, 2020
Where: Virtual

Bryan R. Goldsmith

U-M Dept. of Chemical Engineering

Abstract: Physically transparent, predictive models that quantify the structure-property relationships of materials are important in many fields, including chemistry, physics, and materials science. Many high-profile applications of machine learning in materials science and engineering have so far relied on black-box models, such as neural networks and Gaussian process regressors, for the fast and accurate prediction of target properties such as material stability or bandgap. By black box, I mean that the mathematical form of the model does not intuitively rationalize the underlying physical mechanism, so the physics the model describes remains opaque. These black-box models are undeniably useful for rapidly calculating material properties of interest, but interpretable models are valuable in their own right. By interpretable, I mean that the model can explain how each feature (a measurable variable or property) contributes to its overall prediction. Such models can lead to theories and hypotheses that advance scientific knowledge and accelerate progress beyond simply enabling high-throughput screening.

In this talk, I will discuss three “transparent box” machine learning applications designed with interpretability in mind. First, I will present the use of the Sure Independence Screening and Sparsifying Operator (SISSO) algorithm to find descriptors of perovskite stability. SISSO finds a physically meaningful descriptor that predicts the stability of perovskite oxides and halides more accurately than the well-known Goldschmidt tolerance factor. Second, I will present subgroup discovery (SGD), a data-mining algorithm that finds interpretable local models of a target property in materials data. I will show that SGD can identify descriptors that classify the crystal structures of octet binary semiconductors as either rocksalt or zincblende. Third, I will present the use of generalized additive models (GAMs) to clarify geometric structure-property relationships for chemisorption on alloys (e.g., O, OH, S, and Cl on Rh-, Pd-, Ag-, Ir-, Pt-, and Au-based alloy surfaces). By comparing the GAM-derived chemisorption models with previously established electronic-structure models, we clarify the critical physical parameters that control chemisorption on metal surfaces.
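For readers unfamiliar with the baseline mentioned above, the Goldschmidt tolerance factor for an ABX3 perovskite is t = (r_A + r_X) / (sqrt(2) * (r_B + r_X)), where r_A, r_B, and r_X are the ionic radii of the A-site cation, B-site cation, and X-site anion. The minimal Python sketch below computes t and applies a simple threshold rule. It is only the classical baseline, not the SISSO-derived descriptor discussed in the talk. The function names, the example radii (approximate Shannon radii for SrTiO3), and the stability window of roughly 0.8 to 1.06 are illustrative assumptions; published studies use different cutoffs.

```python
import math


def goldschmidt_tolerance_factor(r_a: float, r_b: float, r_x: float) -> float:
    """Return the Goldschmidt tolerance factor t for an ABX3 perovskite.

    r_a, r_b, r_x: ionic radii (same units, e.g. angstroms) of the
    A-site cation, B-site cation, and X-site anion.
    """
    if min(r_a, r_b, r_x) <= 0:
        raise ValueError("ionic radii must be positive")
    return (r_a + r_x) / (math.sqrt(2.0) * (r_b + r_x))


def predicted_perovskite(t: float, lower: float = 0.8, upper: float = 1.06) -> bool:
    """Hypothetical threshold rule: predict a stable perovskite if t is in [lower, upper].

    The default bounds are illustrative assumptions, not a standard.
    """
    return lower <= t <= upper


if __name__ == "__main__":
    # SrTiO3 with approximate Shannon radii (assumed inputs):
    # Sr2+ (12-coordinate) 1.44, Ti4+ (6-coordinate) 0.605, O2- 1.40 angstroms.
    t = goldschmidt_tolerance_factor(1.44, 0.605, 1.40)
    print(round(t, 3))              # 1.002
    print(predicted_perovskite(t))  # True
```

A single scalar rule like this is easy to interpret, which is why it serves as a natural reference point for data-driven descriptor searches such as SISSO.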