A fast-prediction surrogate model for large datasets

Surrogate models approximate a function based on a set of training points and can then predict the function at new points. In engineering, kriging is widely used because it is fast to train and is generally more accurate than other types of surrogate models. However, the prediction time of kriging increases with the size of the dataset, and the training can fail if the dataset is too large or poorly spaced, which limits the accuracy that is attainable. We develop a new surrogate modeling technique—regularized minimal-energy tensor-product splines (RMTS)—that is not susceptible to training failure, and whose prediction time does not increase with the number of training points. The improved scalability with the number of training points is due to the use of tensor-product splines, where energy minimization is used to handle under-constrained problems in which there are more spline coefficients than training points. RMTS scales up to four dimensions with 10–15 spline coefficients per dimension, but scaling beyond that requires coarsening of the spline in some of the dimensions because of the computational cost of the energy minimization step. Benchmarking using a suite of one- to four-dimensional problems shows that while kriging is the most accurate option for a small number of training points, RMTS is the best alternative when a large set of data points is available or a low prediction time is desired. The best-case average root-mean-square error for the 4-D problems is close to 1% for RMTS and just under 10% for kriging.
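The energy-minimization idea can be illustrated with a minimal 1-D sketch. It is not the RMTS implementation: it assumes a piecewise-linear basis on [0, 1] instead of higher-order tensor-product splines, uses a squared second-difference penalty as a stand-in for the energy term, and weights that penalty with an illustrative parameter `reg`. The names `hat_basis`, `fit_spline`, and `predict` are hypothetical. The sketch shows how a penalty on the coefficients makes the fit well-posed with more spline coefficients than training points, and why prediction cost depends only on the number of coefficients.

```python
import numpy as np


def hat_basis(x, num_coeffs):
    """Evaluate piecewise-linear basis functions on a uniform grid over [0, 1].

    Returns a matrix of shape (len(x), num_coeffs); each row has at most
    two nonzero entries.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    t = np.clip(x, 0.0, 1.0) * (num_coeffs - 1)       # position in knot units
    left = np.minimum(t.astype(int), num_coeffs - 2)  # index of the left knot
    frac = t - left
    basis = np.zeros((x.size, num_coeffs))
    rows = np.arange(x.size)
    basis[rows, left] = 1.0 - frac
    basis[rows, left + 1] = frac
    return basis


def fit_spline(x_train, y_train, num_coeffs=15, reg=1e-4):
    """Solve for coefficients w minimizing reg * w^T H w + ||A w - y||^2.

    H is built from second differences of the coefficients (an assumed,
    simplified energy). The penalty pins down the coefficients that the
    training points alone leave undetermined.
    """
    A = hat_basis(x_train, num_coeffs)
    D = np.diff(np.eye(num_coeffs), n=2, axis=0)  # second-difference operator
    H = D.T @ D
    lhs = reg * H + A.T @ A
    rhs = A.T @ np.asarray(y_train, dtype=float)
    return np.linalg.solve(lhs, rhs)


def predict(w, x_new):
    """Evaluate the fitted spline; cost is independent of the training-set size."""
    return hat_basis(x_new, w.size) @ w


# Three training points, fifteen coefficients: under-constrained without the penalty.
x_train = np.array([0.1, 0.5, 0.9])
y_train = 2.0 * x_train + 1.0
w = fit_spline(x_train, y_train)
print(predict(w, [0.0, 0.3, 1.0]))
```

Once `w` is computed, `predict` never touches the training data, which is the property the abstract contrasts with kriging. In several dimensions the linear system grows with the product of the per-dimension coefficient counts, consistent with the scaling limit noted above.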
