## Category archive

You are viewing the archive for the category Neural Networks.

• ## Getting an image into and out of TensorFlow

Let’s assume you already have an image in numpy’s ndarray format, e.g. because you loaded it with OpenCV’s imread() function, and you want to convert it to TensorFlow’s Tensor format and later back to ndarray.

That’s essentially three calls to TensorFlow:

import cv2
import tensorflow as tf
import numpy as np

# load the image as an 8-bit ndarray ('image.png' is a placeholder path)
image_in = cv2.imread('image.png')

# normalize the pixel values to 0..1 range and convert them
# to a single-precision tensor
image_float = image_in.astype(np.float32) / 255.0
t = tf.convert_to_tensor(image_float, dtype=tf.float32)
assert isinstance(t, tf.Tensor)

# in order to convert the tensor back to an array, we need
# to evaluate it; for this, we need a session
# (tf.Session is the TensorFlow 1.x API)
with tf.Session() as sess:
    image_out = sess.run(fetches=t)
    assert isinstance(image_out, np.ndarray)

# for imshow to work, the image needs to be in 0..1 range
# whenever it is a float; that's why we normalized it.
cv2.imshow('Image', image_out)
cv2.waitKey(0)


Note that instead of using sess.run(t) we could also have used

with tf.Session() as sess:
    image_out = t.eval(session=sess)


which essentially performs the same action. A benefit of using sess.run() directly is that we can fetch more than one tensor in the same pass through the (sub-)graph (say, out1, out2, out3 = sess.run(fetches=[t1, t2, t3])), whereas calling tensor.eval() always results in one separate pass per call.
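As an aside on the normalization step in the listing above, the 0..1 scaling and its inverse can be sketched with NumPy alone. The helper names `to_unit_float` and `to_uint8` are made up for this illustration, and the sketch assumes an 8-bit image, which is what `cv2.imread()` returns by default.

```python
import numpy as np


def to_unit_float(image_u8):
    """Scale an 8-bit image to float32 values in the 0..1 range."""
    return image_u8.astype(np.float32) / 255.0


def to_uint8(image_f32):
    """Map a float image in the 0..1 range back to 8-bit values."""
    scaled = np.clip(image_f32, 0.0, 1.0) * 255.0
    return np.rint(scaled).astype(np.uint8)


# Hypothetical stand-in for an image loaded with cv2.imread().
image_in = np.array([[[0, 128, 255]]], dtype=np.uint8)
image_float = to_unit_float(image_in)
image_back = to_uint8(image_float)
```

Rounding with `np.rint` before the cast avoids off-by-one errors that plain truncation could introduce on the way back to 8 bits.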

• ## Quadratic interpolation given two points and one derivative

While reading up on line search algorithms in nonlinear optimization for neural network training, I came across this problem: Given a function $f(x)$, find a quadratic interpolant $q(x) = ax^2 + bx + c$ that fulfills the conditions $f(x_0) = q(x_0)$, $f(x_1) = q(x_1)$ and $f'(x_0) = q'(x_0)$.

So I took out my scribbling pad, wrote down some equations and then, after two pages of nonsense, decided it really wasn’t worth the hassle. It turns out that the simple system

\begin{align} f(x_0) &= ax_0^2 + bx_0 + c\\ f(x_1) &= ax_1^2 + bx_1 + c\\ f'(x_0) &= 2ax_0 + b \end{align}

for

\begin{align} q(x) &= ax^2 + bx + c \end{align}

has the solution

\begin{align} a &= - \frac{f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)}{(x_0 - x_1)^2} \\ b &= - \frac{x_0^2 f'(x_0) - x_1^2 f'(x_0) - 2x_0 f(x_0) + 2x_0 f(x_1)}{(x_0 - x_1)^2} \\ c &= \frac{x_0^2 f(x_1) + x_1^2 f(x_0) - 2x_0 x_1 f(x_0) - x_0 x_1^2 f'(x_0) + x_0^2 x_1 f'(x_0)}{(x_0 - x_1)^2} \end{align}
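
For reference, the elimination behind this result is short. The following is a sketch of one possible route (not necessarily the one from the scribbling pad): solve the derivative condition for $b$, subtract the first equation from the second, and substitute.

\begin{align} b &= f'(x_0) - 2ax_0 \\ f(x_1) - f(x_0) &= a(x_1^2 - x_0^2) + b(x_1 - x_0) \\ &= a(x_1 - x_0)^2 + f'(x_0)(x_1 - x_0) \\ a &= \frac{f(x_1) - f(x_0) - f'(x_0)(x_1 - x_0)}{(x_1 - x_0)^2} \end{align}

Back-substitution then yields $b$, and $c = f(x_0) - ax_0^2 - bx_0$ follows from the first equation.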

Instead of wasting your time on paper, the solution can be obtained more easily in Matlab using

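% symbolic unknowns, sample points and function values
syms a b c x_0 x_1 f(x_0) f(x_1) df(x_0)
% solve the three interpolation conditions for a, b and c
[a, b, c] = solve(...
    f(x_0) == a*x_0^2 + b*x_0 + c, ...
    f(x_1) == a*x_1^2 + b*x_1 + c, ...
    df(x_0) == 2*a*x_0 + b, ...
    a, b, c);

% assemble the interpolant
syms q(x)
q(x) = simplify(a*x^2 + b*x + c);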
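
The closed form can also be checked numerically. The following Python sketch is an illustration only: the function name `quadratic_interpolant` and the test function $f(x) = x^4$ are assumptions made for this example.

```python
def quadratic_interpolant(x0, x1, f0, f1, df0):
    """Return (a, b, c) of q(x) = a*x**2 + b*x + c with
    q(x0) = f0, q(x1) = f1 and q'(x0) = df0."""
    denom = (x0 - x1) ** 2
    if denom == 0.0:
        raise ValueError("x0 and x1 must differ")
    a = -(f0 - f1 - x0 * df0 + x1 * df0) / denom
    b = -(x0**2 * df0 - x1**2 * df0 - 2 * x0 * f0 + 2 * x0 * f1) / denom
    c = (x0**2 * f1 + x1**2 * f0 - 2 * x0 * x1 * f0
         - x0 * x1**2 * df0 + x0**2 * x1 * df0) / denom
    return a, b, c


# Hypothetical test function: f(x) = x**4, so f'(x) = 4*x**3.
x0, x1 = 1.0, 2.0
f0, f1, df0 = x0**4, x1**4, 4 * x0**3
a, b, c = quadratic_interpolant(x0, x1, f0, f1, df0)


def q(x):
    return a * x**2 + b * x + c


def dq(x):
    return 2 * a * x + b
```

Evaluating `q(x0)`, `q(x1)` and `dq(x0)` should reproduce `f0`, `f1` and `df0` up to rounding, which is a quick way to catch sign errors when transcribing the formulas.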


Obviously, the whole purpose of this operation is to find an approximation to the local minimizer of $f(x)$. Setting the derivative of the interpolant to zero gives

\begin{align} 0 &\overset{!}{=} q'(x_{min}) \\ x_{min} &= -\frac{1}{2} \frac{x_0^2 f'(x_0) - x_1^2 f'(x_0) - 2 x_0 f(x_0) + 2 x_0 f(x_1)} {f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)} \end{align}

We would also need to check the interpolant’s second derivative $q''(x_{min})$ to ensure the approximated minimizer is indeed a minimum of $q(x)$ by requiring $q''(x_{min}) > 0$, with the second derivative given as:

\begin{align} q''(x) &= - 2 \frac{f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)}{\left( x_0 - x_1 \right)^2 } \end{align}

The premise of the line search in minimization problems usually is that the search direction is already a direction of descent. By having $0 > f'(x_0)$ and $f'(x_1) > 0$ (as would typically be the case when bracketing the local minimizer of $f(x)$), the interpolant should always be (strictly) convex. If these conditions do not hold, there might be no minimizer at all: for a function that is linear to machine precision, the coefficient $a$ vanishes, the interpolant degenerates to a line, and the formula for $x_{min}$ divides by zero. Watch out for that case.
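
The convexity check and the division-by-zero guard can be combined in a few lines. This is a minimal sketch under assumptions: the name `quadratic_min`, the tolerance `eps` and returning `None` for the degenerate case are choices made for this illustration.

```python
def quadratic_min(x0, x1, f0, f1, df0, eps=1e-12):
    """Minimizer of the quadratic interpolant through (x0, f0), (x1, f1)
    with slope df0 at x0, or None if the interpolant is not convex.
    Assumes x0 != x1."""
    # Numerators of a and b; their common factor -1/(x0 - x1)**2
    # cancels in x_min = -b / (2a).
    num_a = f0 - f1 - x0 * df0 + x1 * df0
    num_b = x0**2 * df0 - x1**2 * df0 - 2 * x0 * f0 + 2 * x0 * f1
    # q''(x) = -2 * num_a / (x0 - x1)**2 must be positive.
    curvature = -2.0 * num_a / (x0 - x1) ** 2
    if curvature <= eps:
        return None  # linear or concave interpolant: no minimizer
    return -0.5 * num_b / num_a


# f(x) = (x - 3)**2 sampled at x0 = 0 and x1 = 5 (hypothetical example)
x_min = quadratic_min(0.0, 5.0, 9.0, 4.0, -6.0)
# f(x) = 2*x + 1 is linear, so no minimizer exists
flat = quadratic_min(0.0, 1.0, 1.0, 3.0, 2.0)
```

Because the first example function is itself a parabola, the interpolant reproduces it and `x_min` lands on its true minimizer; the linear example returns `None` instead of dividing by zero.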

Last but not least, if the objective is to minimize $\varphi(\alpha) = f(\vec{x}_k + \alpha \vec{d}_k)$ using $q(\alpha)$, where $\vec{d}_k$ is the search direction and $\vec{x}_k$ the current starting point, such that

\begin{align} \varphi(0) &= f(\vec{x}_k) \\ \varphi'(0) &= \nabla f(\vec{x}_k)^\top \vec{d}_k \end{align}

then the above formulas simplify to

\begin{align} a &= - \frac{\varphi(0) - \varphi(\alpha) + \alpha \varphi'(0)}{\alpha^2} \\ b &= \varphi'(0) \\ c &= \varphi(0) \end{align}

and, more importantly, the local (approximated) minimizer at $\alpha_{min}$ simplifies to

\begin{align} \alpha_{min} &= \frac{1}{2} \frac{\alpha^2 \varphi'(0)}{\varphi(0)-\varphi(\alpha)+\alpha\varphi'(0)} \end{align}
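
As a sketch of how this might be used inside a line search, the function below evaluates $\alpha_{min}$ from $\varphi(0)$, $\varphi'(0)$ and one trial value $\varphi(\alpha)$. The name `interpolate_step` and the example objective are hypothetical.

```python
def interpolate_step(phi0, dphi0, alpha, phi_alpha):
    """Minimizer of the quadratic through phi(0), phi'(0) and phi(alpha)."""
    denom = phi0 - phi_alpha + alpha * dphi0
    # denom equals -a * alpha**2, so a > 0 (convexity) requires denom < 0.
    if denom >= 0.0:
        raise ValueError("interpolant is not strictly convex")
    return 0.5 * alpha**2 * dphi0 / denom


# Hypothetical objective along the search direction:
# phi(t) = (t - 2)**2 + 1, hence phi'(0) = -4.
def phi(t):
    return (t - 2.0) ** 2 + 1.0


alpha_min = interpolate_step(phi(0.0), -4.0, 1.0, phi(1.0))
```

Since this `phi` is exactly quadratic, a single interpolation step from the trial point `alpha = 1` already recovers its minimizer; for a general objective the result is only an estimate to be checked by the line search conditions.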

If $q(\alpha)$ is required to be strongly convex, then we’ll observe that

\begin{align} q''(\alpha) &= 2a \overset{!}{\geq} m \end{align}

for an $m > 0$, giving us that $a$ must be greater than zero (or $\epsilon$, for that matter), which is a trivial check. The following picture visualizes that this is indeed the case:

Convexity of a parabola for different highest-order coefficients a with positive b (top), zero b (middle) and negative b (bottom). Lowest-order coefficient c is left out for brevity.