While reading up on line search algorithms in nonlinear optimization for neural network training, I came across this problem: Given a function \(f(x)\), find a quadratic interpolant \(q(x) = ax^2 + bx + c\) that fulfills the conditions \(f(x_0) = q(x_0)\), \(f(x_1) = q(x_1)\) and \(f'(x_0) = q'(x_0)\). Basically this:

So I took out my scribbling pad, wrote down some equations and then, after two pages of nonsense, decided it really wasn’t worth the hassle. It turns out that the simple system

$$

\begin{align}

f(x_0) &= ax_0^2 + bx_0 + c\\

f(x_1) &= ax_1^2 + bx_1 + c\\

f'(x_0) &= 2ax_0 + b

\end{align}

$$

for

$$

\begin{align}

q(x) &= ax^2 + bx + c

\end{align}

$$

has the solution

$$

\begin{align}

a &= -\frac{f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)}{(x_0 - x_1)^2} \\

b &= -\frac{x_0^2 f'(x_0) - x_1^2 f'(x_0) - 2x_0 f(x_0) + 2x_0 f(x_1)}{(x_0 - x_1)^2} \\

c &= \frac{x_0^2 f(x_1) + x_1^2 f(x_0) - 2x_0 x_1 f(x_0) - x_0 x_1^2 f'(x_0) + x_0^2 x_1 f'(x_0)}{(x_0 - x_1)^2}

\end{align}

$$
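As a quick sanity check (a small Python sketch of my own, not part of the derivation), we can plug the closed-form coefficients into the three interpolation conditions for a sample function:

```python
# Sanity check: the closed-form coefficients a, b, c reproduce
# f(x0), f(x1) and f'(x0) for a sample function f(x) = cos(x).
from math import cos, sin, isclose

def quadratic_coefficients(x0, x1, f0, f1, df0):
    """Coefficients of q(x) = a*x^2 + b*x + c interpolating
    f(x0), f(x1) and f'(x0), per the formulas above."""
    d = (x0 - x1) ** 2
    a = -(f0 - f1 - x0 * df0 + x1 * df0) / d
    b = -(x0**2 * df0 - x1**2 * df0 - 2 * x0 * f0 + 2 * x0 * f1) / d
    c = (x0**2 * f1 + x1**2 * f0 - 2 * x0 * x1 * f0
         - x0 * x1**2 * df0 + x0**2 * x1 * df0) / d
    return a, b, c

f, df = cos, lambda x: -sin(x)
x0, x1 = 0.5, 2.0
a, b, c = quadratic_coefficients(x0, x1, f(x0), f(x1), df(x0))
q = lambda x: a * x**2 + b * x + c
dq = lambda x: 2 * a * x + b

assert isclose(q(x0), f(x0))    # interpolation condition 1
assert isclose(q(x1), f(x1))    # interpolation condition 2
assert isclose(dq(x0), df(x0))  # slope condition
```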

Instead of wasting your time on paper, the solution can be obtained more easily in MATLAB using

syms a b c x x_0 x_1 f(x) df(x)
[a, b, c] = solve( ...
    f(x_0) == a*x_0^2 + b*x_0 + c, ...
    f(x_1) == a*x_1^2 + b*x_1 + c, ...
    df(x_0) == 2*a*x_0 + b, ...
    a, b, c);
syms q(x)
q(x) = simplify(a*x^2 + b*x + c);

Obviously, the whole purpose of this operation is to find an approximation to the local minimizer of \(f(x)\) by solving \(q'(x) = 0\). This gives

$$

\begin{align}

0 &\overset{!}{=} q'(x_{min}) \\

x_{min} &= -\frac{1}{2} \frac{x_0^2 f'(x_0) - x_1^2 f'(x_0) - 2 x_0 f(x_0) + 2 x_0 f(x_1)} {f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)}

\end{align}

$$
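Purely as an illustration (again a Python sketch of my own, with made-up sample values), the closed-form \(x_{min}\) agrees with the textbook vertex \(-b/(2a)\) and is indeed a stationary point of \(q(x)\):

```python
# Check that the closed-form minimizer equals -b/(2a) of the parabola.
from math import isclose

x0, x1 = 0.5, 2.0
f0, f1, df0 = 1.0, 0.8, -0.9   # sample values with f'(x0) < 0

d = (x0 - x1) ** 2
a = -(f0 - f1 - x0 * df0 + x1 * df0) / d
b = -(x0**2 * df0 - x1**2 * df0 - 2 * x0 * f0 + 2 * x0 * f1) / d

x_min = -0.5 * (x0**2 * df0 - x1**2 * df0 - 2 * x0 * f0 + 2 * x0 * f1) \
        / (f0 - f1 - x0 * df0 + x1 * df0)

assert isclose(x_min, -b / (2 * a))           # vertex of the parabola
assert abs(2 * a * x_min + b) < 1e-12         # q'(x_min) = 0
```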

We would also need to check the interpolant's second derivative \(q''(x_{min})\) to ensure the approximated minimizer is indeed a minimum of \(q(x)\) by requiring \(q''(x_{min}) > 0\), with the second derivative given as:

$$

\begin{align}

q''(x) &= -2 \frac{f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)}{\left( x_0 - x_1 \right)^2 }

\end{align}

$$

The premise of a line search in minimization problems is usually that the search direction is already a direction of descent. With \(f'(x_0) < 0\) and \(f'(x_1) > 0\) (as is typically the case when bracketing a local minimizer of \(f(x)\)), the interpolant should always be (strictly) convex. If these conditions do not hold, there might be no solution at all: one obviously won't be able to find a quadratic interpolant satisfying the above conditions for a function that is linear to machine precision. In that case, watch out for divisions by zero.
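To see the degenerate case concretely, here is a small Python sketch (my own illustration): for data sampled from a straight line, the numerator of \(a\) vanishes, so the quadratic term disappears and \(-b/(2a)\) would divide by zero:

```python
# Degenerate case: if f is (numerically) linear, the quadratic
# coefficient a vanishes and x_min = -b/(2a) divides by zero.
x0, x1 = 1.0, 3.0
f = lambda x: 2.0 * x + 1.0        # a linear "function"
f0, f1, df0 = f(x0), f(x1), 2.0    # exact slope of the line

denom = f0 - f1 - x0 * df0 + x1 * df0   # appears in a and in x_min
assert denom == 0.0                      # vanishes exactly for linear data

a = -denom / (x0 - x1) ** 2
assert a == 0.0                          # q degenerates to a line; guard this
```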

Last but not least, if the objective is to minimize \(\varphi(\alpha) = f(\vec{x}_k + \alpha \vec{d}_k)\) using \(q(\alpha)\), where \(\vec{d}_k\) is the search direction and \(\vec{x}_k\) the current starting point, such that

$$

\begin{align}

\varphi(0) &= f(\vec{x}_k) \\

\varphi'(0) &= \nabla f(\vec{x}_k)^\top \vec{d}_k

\end{align}

$$

then the above formulas simplify to

$$

\begin{align}

a &= -\frac{\varphi(0) - \varphi(\alpha) + \alpha \varphi'(0)}{\alpha^2} \\

b &= \varphi'(0) \\

c &= \varphi(0)

\end{align}

$$
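As a cross-check (a Python sketch of my own, with made-up sample values), substituting \(x_0 = 0\) and \(x_1 = \alpha\) into the general coefficient formulas reproduces exactly these simplified expressions:

```python
# Cross-check: the general coefficient formulas at x0 = 0, x1 = alpha
# reduce to a = -(phi0 - phia + alpha*dphi0)/alpha^2, b = phi'(0), c = phi(0).
from math import isclose

alpha = 0.8
phi0, phia, dphi0 = 2.0, 1.5, -1.2   # phi(0), phi(alpha), phi'(0)

x0, x1 = 0.0, alpha
d = (x0 - x1) ** 2
a = -(phi0 - phia - x0 * dphi0 + x1 * dphi0) / d
b = -(x0**2 * dphi0 - x1**2 * dphi0 - 2 * x0 * phi0 + 2 * x0 * phia) / d
c = (x0**2 * phia + x1**2 * phi0 - 2 * x0 * x1 * phi0
     - x0 * x1**2 * dphi0 + x0**2 * x1 * dphi0) / d

assert isclose(a, -(phi0 - phia + alpha * dphi0) / alpha**2)
assert isclose(b, dphi0)   # the alpha^2 factors cancel
assert isclose(c, phi0)
```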

and, more importantly, the local (approximated) minimizer at \(\alpha_{min}\) simplifies to

$$

\begin{align}

\alpha_{min} &= \frac{1}{2} \frac{\alpha^2 \varphi'(0)}{\varphi(0)-\varphi(\alpha)+\alpha\varphi'(0)}

\end{align}

$$
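One more numerical sketch (Python, my own check, assuming a descent step with \(\varphi'(0) < 0\)): the simplified \(\alpha_{min}\) coincides with the vertex \(-b/(2a)\) of the simplified interpolant and yields a positive trial step:

```python
# Check: the simplified alpha_min equals -b/(2a) of the interpolant
# with a = -(phi0 - phia + alpha*dphi0)/alpha^2 and b = phi'(0).
from math import isclose

alpha = 1.0
phi0, phia, dphi0 = 2.0, 1.9, -0.5   # descent: phi'(0) < 0

a = -(phi0 - phia + alpha * dphi0) / alpha**2
b = dphi0
alpha_min = 0.5 * alpha**2 * dphi0 / (phi0 - phia + alpha * dphi0)

assert a > 0                            # interpolant is convex here
assert isclose(alpha_min, -b / (2 * a)) # same stationary point
assert alpha_min > 0                    # a positive trial step
```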

If \(q(\alpha)\) is required to be strongly convex, then we need

$$

\begin{align}

q''(\alpha) &= 2a \overset{!}{\geq} m

\end{align}

$$

for some \(m > 0\), i.e. \(a \geq m/2 > 0\) (or \(a > \epsilon\) for a small \(\epsilon > 0\), for that matter), which is a trivial check. The following picture visualizes that this is indeed the case:

Convexity of a parabola for different highest-order coefficients *a* with positive *b* (top), zero *b* (middle) and negative *b* (bottom). Lowest-order coefficient *c* is left out for brevity.