DFO Manifesto

A SPECTRE is haunting numerical optimization – the spectre of derivative-free optimization.

As I get older, work with more collaborators, and see more (good and bad) trends in DFO, I’ve compiled a list of very general thoughts about best practices. This list will almost certainly grow over time as I think of or encounter more items.

  1. If your black box isn’t completely black-box, you are morally obligated to exploit all the structure you
    can see.
  2. If you believe (a composite part of) an objective function is smooth, use model-based methods. Direct
    search methods are only for when you know absolutely nothing.
  3. Do not use derivative-free methods if you have derivatives available. (Like … why?) Minor note: Do some timing – if the computation of a derivative actually takes more than n (the problem dimension) times the cost of a function evaluation, maybe you should actually think about DFO (see the timing sketch after this list).
  4. People who work in algorithmic differentiation are our comrades. But they can’t solve all our problems.
  5. If function evaluations are expensive, you should keep them in memory for as long as reasonably allowable (a minimal caching sketch follows this list).
  6. If a black box is noisy (and not necessarily stochastic), you must do something (usually, change a tolerance or a model-building routine) to account for an estimate of the magnitude of the noise (see the noise-estimation sketch after this list).
  7. With that said, stochastic black-box oracles are hard, unless the oracle is trivially cheap to evaluate.
  8. Black-box constraints are hard. You’re never going to get the activities right, so go back to the drawing board with your optimization model. How infeasible or suboptimal can you be before your optimization model is garbage?
  9. Global optimization is hard. Bayesian optimization works wonderfully … in low dimensions. It doesn’t take long before the curse of dimensionality catches up with you. You’ll generally find local DFO methods combined with multistart are empirically superior (see the multistart sketch after this list).
  10. Multiobjective optimization is hard. It’s basically global optimization. Push back on optimization models with more than two objectives; three is a very generous hard limit.
  11. If an algorithm’s motivation refers to “nature” or “evolution” as its inspiration, beware. Evolution in nature doesn’t respect any objective function. Also, it took billions of years to get multicellular life, so nature isn’t even efficient at meeting nebulous objectives.
  12. If you’re evaluating a finite difference gradient on every iteration of an optimization method, you’re only nominally doing DFO. You’re really doing an inexact derivative-based method (see the sketch at the end of this list).
  13. The fewer hyperparameters a method actually needs tuned, the more useful that method will end up being on computationally expensive problems.
  14. If accuracy is critically important to your application, “dimension-independent” DFO methods are too good to be true.
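A few of the items above are concrete enough to sketch in code. The Python snippets that follow are rough illustrations, not recommendations; every objective, tolerance, and budget in them is a placeholder I made up.

For item 3, the break-even arithmetic: a forward-difference gradient costs about n extra function evaluations, so once an analytic or AD gradient costs more than n function evaluations, finite differencing (and DFO more broadly) starts to look reasonable. A crude timing check, assuming you can call both f and grad_f:

```python
import time
import numpy as np

def time_call(fn, x, repeats=10):
    """Average wall-clock time of fn(x) over a few repeats."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(x)
    return (time.perf_counter() - start) / repeats

def worth_considering_dfo(f, grad_f, x):
    """Item 3's heuristic: if one gradient costs more than n function
    evaluations, a finite-difference or derivative-free approach may be
    competitive."""
    n = x.size
    return time_call(grad_f, x) > n * time_call(f, x)

# Toy stand-ins; replace with your simulation and its AD/analytic gradient.
f = lambda x: float(np.sum(x ** 2))
grad_f = lambda x: 2 * x
print(worth_considering_dfo(f, grad_f, np.random.randn(50)))
```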
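For item 5, a minimal caching wrapper (my own sketch, not any particular library's interface): pay for each point once, keep it, and re-use it before calling the black box again.

```python
import numpy as np

class CachedOracle:
    """Wrap an expensive black box and remember every evaluation.

    Points are keyed by rounded coordinates; set `decimals` to the
    precision at which two points count as identical for your problem.
    """

    def __init__(self, f, decimals=12):
        self.f = f
        self.decimals = decimals
        self.cache = {}      # rounded point -> f(x)
        self.history = []    # (x, f(x)) pairs, in evaluation order

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        key = tuple(np.round(x, self.decimals))
        if key not in self.cache:
            fx = self.f(x)
            self.cache[key] = fx
            self.history.append((x.copy(), fx))
        return self.cache[key]

# Usage: wrap the oracle once, then hand the wrapper to your DFO solver.
expensive_f = lambda x: float(np.sum(x ** 2))   # placeholder black box
f = CachedOracle(expensive_f)
f([1.0, 2.0]); f([1.0, 2.0])    # second call is free
print(len(f.history))            # 1
```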
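For item 6, you need that estimate of the noise magnitude from somewhere. One option is a difference-table estimate in the spirit of Moré and Wild's ECNoise; the version below is heavily simplified (fixed difference order, no consistency checks), and the noisy test function is made up.

```python
import math
import numpy as np

def estimate_noise(f, x, h=1e-6, m=8, k=4, rng=np.random.default_rng(0)):
    """Crude noise-level estimate: sample f along a random direction and
    take k-th order differences, which (over a tiny interval) annihilate
    the smooth part of f and leave scaled noise behind."""
    x = np.asarray(x, dtype=float)
    p = rng.standard_normal(x.size)
    p /= np.linalg.norm(p)
    fvals = np.array([f(x + i * h * p) for i in range(m + 1)])
    diffs = fvals
    for _ in range(k):                 # k-th order forward differences
        diffs = np.diff(diffs)
    gamma_k = math.factorial(k) ** 2 / math.factorial(2 * k)
    return math.sqrt(gamma_k * np.mean(diffs ** 2))

# A smooth quadratic plus a deterministic, high-frequency "noise" term.
noisy_f = lambda x: float(np.sum(x ** 2)) + 1e-4 * math.sin(1e7 * float(np.sum(x)))
print(estimate_noise(noisy_f, np.ones(10)))   # on the order of the 1e-4 amplitude
```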
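For item 9, local DFO plus multistart really is this short. Here the local solver is scipy's Nelder-Mead (any local DFO method would do), the test function is the standard Rastrigin function, and the number of starts and the tolerances are arbitrary.

```python
import numpy as np
from scipy.optimize import minimize

def multistart_dfo(f, lb, ub, n_starts=20, seed=0):
    """Run a local DFO solver (Nelder-Mead) from random points in the box
    [lb, ub] and keep the best result found."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    best = None
    for _ in range(n_starts):
        x0 = lb + rng.random(lb.size) * (ub - lb)
        res = minimize(f, x0, method="Nelder-Mead",
                       options={"maxfev": 200 * lb.size, "xatol": 1e-8, "fatol": 1e-8})
        if best is None or res.fun < best.fun:
            best = res
    return best

# Rastrigin: lots of local minima, global minimum 0 at the origin.
def rastrigin(x):
    x = np.asarray(x)
    return 10 * x.size + float(np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x)))

best = multistart_dfo(rastrigin, lb=-5.12 * np.ones(4), ub=5.12 * np.ones(4))
print(best.x, best.fun)
```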
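And item 12 in code: an "optimization method" that finite-differences a gradient at every iterate is just an inexact derivative-based method, and scipy will happily do the differencing itself if you omit jac.

```python
import numpy as np
from scipy.optimize import minimize

def fd_gradient(f, x, h=1e-7):
    """Forward-difference gradient: n extra function evaluations per call."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - fx) / h
    return g

f = lambda x: float(np.sum((np.asarray(x) - 1.0) ** 2))   # placeholder objective

# Hand-rolled finite differences inside a gradient-based method ...
res_fd = minimize(f, np.zeros(5), jac=lambda x: fd_gradient(f, x), method="BFGS")

# ... is essentially what scipy does on its own when jac is omitted.
res_default = minimize(f, np.zeros(5), method="BFGS")
print(res_fd.x, res_default.x)
```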