## Category Archive

You are viewing the archive of the category MATLAB.

• ## Quadratic interpolation given two points and one derivative

While reading up on line search algorithms in nonlinear optimization for neural network training, I came across this problem: Given a function $f(x)$, find a quadratic interpolant $q(x) = ax^2 + bx + c$ that fulfills the conditions $f(x_0) = q(x_0)$, $f(x_1) = q(x_1)$ and $f'(x_0) = q'(x_0)$. Basically this:

So I took out my scribbling pad, wrote down some equations and then, after two pages of nonsense, decided it really wasn’t worth the hassle. It turns out that the simple system

\begin{align} f(x_0) &= ax_0^2 + bx_0 + c\\ f(x_1) &= ax_1^2 + bx_1 + c\\ f'(x_0) &= 2ax_0 + b \end{align}

for

\begin{align} q(x) &= ax^2 + bx + c \end{align}

has the solution

\begin{align} a &= - \frac{f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)}{(x_0 - x_1)^2} \\ b &= - \frac{x_0^2 f'(x_0) - x_1^2 f'(x_0) - 2x_0 f(x_0) + 2x_0 f(x_1)}{(x_0 - x_1)^2} \\ c &= \frac{x_0^2 f(x_1) + x_1^2 f(x_0) - 2x_0 x_1 f(x_0) - x_0 x_1^2 f'(x_0) + x_0^2 x_1 f'(x_0)}{(x_0 - x_1)^2} \end{align}

Instead of wasting your time on paper, the solution can be obtained much more easily in Matlab using

syms a b c x_0 x_1 f_0 f_1 df_0 % f_0 = f(x_0), f_1 = f(x_1), df_0 = f'(x_0)
[a, b, c] = solve(...
    f_0 == a*x_0^2 + b*x_0 + c, ...
    f_1 == a*x_1^2 + b*x_1 + c, ...
    df_0 == 2*a*x_0 + b, ...
    a, b, c);

syms q(x)
q(x) = simplify(a*x^2 + b*x + c);
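As a numeric sanity check of the closed-form coefficients, one can plug in any smooth test function and verify the three interpolation conditions. A small sketch in Python (the choice $f = \sin$ and the sample points are arbitrary):

```python
import math

# Numeric sanity check of the closed-form coefficients; the test function
# f = sin and the sample points are arbitrary.
def quad_coeffs(f, df, x0, x1):
    d = (x0 - x1) ** 2
    a = -(f(x0) - f(x1) - x0 * df(x0) + x1 * df(x0)) / d
    b = -(x0**2 * df(x0) - x1**2 * df(x0)
          - 2 * x0 * f(x0) + 2 * x0 * f(x1)) / d
    c = (x0**2 * f(x1) + x1**2 * f(x0) - 2 * x0 * x1 * f(x0)
         - x0 * x1**2 * df(x0) + x0**2 * x1 * df(x0)) / d
    return a, b, c

x0, x1 = 0.2, 1.0
a, b, c = quad_coeffs(math.sin, math.cos, x0, x1)
q = lambda x: a * x**2 + b * x + c
# q(x0) == sin(x0), q(x1) == sin(x1), and q'(x0) = 2*a*x0 + b == cos(x0)
```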


Obviously, the whole purpose of this operation is to find an approximation to the local minimizer of $f(x)$ by solving $q'(x_{min}) = 0$. This gives

\begin{align} 0 &\overset{!}{=} q'(x_{min}) \\ x_{min} &= -\frac{1}{2} \frac{x_0^2 f'(x_0) - x_1^2 f'(x_0) - 2 x_0 f(x_0) + 2 x_0 f(x_1)} {f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)} \end{align}

We also need to check the interpolant's second derivative $q''(x_{min})$ to ensure the approximated minimizer is indeed a minimum of $q(x)$ by requiring $q''(x_{min}) > 0$, with the second derivative given as:

\begin{align} q''(x) &= - 2 \frac{f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)}{\left( x_0 - x_1 \right)^2 } \end{align}

The premise of a line search in minimization problems usually is that the search direction is already a direction of descent. With $f'(x_0) < 0$ and $f'(x_1) > 0$ (as would typically be the case when bracketing the local minimizer of $f(x)$), the interpolant is always strictly convex. If these conditions do not hold, there might be no solution at all: one obviously won't be able to find a quadratic interpolant satisfying the initial conditions for a function that is linear to machine precision. In that case, watch out for divisions by zero.

Last but not least, if the objective is to minimize $\varphi(\alpha) = f(\vec{x}_k + \alpha \vec{d}_k)$ using $q(\alpha)$, where $\vec{d}_k$ is the search direction and $\vec{x}_k$ the current starting point, such that

\begin{align} \varphi(0) &= f(\vec{x}_k) \\ \varphi'(0) &= \nabla f(\vec{x}_k)^\top \vec{d}_k \end{align}

then the above formulas simplify to

\begin{align} a &= - \frac{\varphi(0) - \varphi(\alpha) + \alpha \varphi'(0)}{\alpha^2} \\ b &= \varphi'(0) \\ c &= \varphi(0) \end{align}

and, more importantly, the local (approximated) minimizer at $\alpha_{min}$ simplifies to

\begin{align} \alpha_{min} &= \frac{1}{2} \frac{\alpha^2 \varphi'(0)}{\varphi(0)-\varphi(\alpha)+\alpha\varphi'(0)} \end{align}
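For an exactly quadratic $\varphi$ the simplified minimizer is exact, which makes for an easy numeric check. A Python sketch with a made-up quadratic whose minimum sits at $0.7$:

```python
# For an exactly quadratic phi the simplified minimizer is exact; a made-up
# quadratic with its minimum at 0.7 serves as a check.
phi = lambda a: (a - 0.7) ** 2 + 0.3
dphi0 = 2 * (0.0 - 0.7)   # phi'(0) = -1.4
alpha = 1.0               # arbitrary trial step
a_min = 0.5 * alpha**2 * dphi0 / (phi(0.0) - phi(alpha) + alpha * dphi0)
# a_min recovers the true minimizer 0.7
```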

If $q(\alpha)$ is required to be strongly convex, then we need

\begin{align} q''(\alpha) &= 2a \overset{!}{\geq} m \end{align}

for an $m > 0$, giving us that $a$ must be greater than zero (or $\epsilon$, for that matter), which is a trivial check. The following picture visualizes that this is indeed the case:

Convexity of a parabola for different highest-order coefficients a with positive b (top), zero b (middle) and negative b (bottom). Lowest-order coefficient c is left out for brevity.

well …

• ## Hidden Markov Model training using the Baum-Welch Algorithm

The Baum-Welch algorithm determines the (locally) optimal parameters for a Hidden Markov Model by essentially using three equations.

One for the initial probabilities:

\begin{align} \pi_i &= \frac{E\left(\text{Number of times a sequence started with state}\, s_i\right)}{E\left(\text{Number of times a sequence started with any state}\right)} \end{align}

Another for the transition probabilities:

\begin{align} a_{ij} &= \frac{E\left(\text{Number of times the state changed from}\, s_i \, \text{to}\,s_j\right)}{E\left(\text{Number of times the state changed from}\, s_i \, \text{to any state}\right)} \end{align}

And the last one for the emission probabilities:

\begin{align} b_{ik} &= \frac{E\left(\text{Number of times the state was}\, s_i \, \text{and the observation was}\,v_k\right)}{E\left(\text{Number of times the state was}\, s_i\right)} \end{align}

If one had a fully labeled training corpus representing all possible outcomes, this would be exactly the optimal solution: count each occurrence, normalize and you're good. If, however, no such labeled training corpus is available — i.e. only observations are given, but no corresponding state sequences — the expected values $E(c)$ of these counts have to be estimated. This can be done (and is done) using the forward and backward probabilities $\alpha_t(i)$ and $\beta_t(i)$, as described below.
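As a rough sketch of where those expected counts come from, here is a minimal forward-backward pass in Python (rather than MATLAB); the two-state model and the observation sequence are made up for illustration:

```python
# Minimal forward-backward pass for a two-state discrete HMM; the model
# parameters and the observation sequence are illustrative placeholders.
pi  = [0.6, 0.4]                      # initial probabilities pi_i
A   = [[0.7, 0.3], [0.4, 0.6]]        # transition probabilities a_ij
B   = [[0.5, 0.5], [0.1, 0.9]]        # emission probabilities b_ik
obs = [0, 1, 1]

N, T = len(pi), len(obs)

# forward probabilities alpha_t(i) = P(o_1..o_t, state_t = s_i)
alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t-1][j] * A[j][i] for j in range(N)) * B[i][obs[t]]
                  for i in range(N)])

# backward probabilities beta_t(i) = P(o_{t+1}..o_T | state_t = s_i)
beta = [[1.0] * N]
for t in range(T - 2, -1, -1):
    beta.insert(0, [sum(A[i][j] * B[j][obs[t+1]] * beta[0][j] for j in range(N))
                    for i in range(N)])

# state posteriors gamma_t(i): the expected counts behind the three ratios
likelihood = sum(alpha[-1])           # P(obs)
gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(N)]
         for t in range(T)]
```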

• ## Regular Kalman Filter (almost) super-quick Reference

To make a long story short, here's the complete Matlab code.

% State estimations
% x       state vector                  (M x 1)
% A       state transition matrix       (M x M)
% P       state covariance matrix       (M x M)

% Input / control data
% u       input vector                  (N x 1)
% B       input transition matrix       (M x N)
% Q       input noise covariance matrix (N x N)

% Observations
% z       observation vector            (Z x 1)
% H       state-to-observation matrix   (Z x M)
% R       observation noise covariance  (Z x Z)

% Tuning
% lambda  tuning parameter              (scalar)

function [x, P] = kf_predict(x, A, P, lambda, u, B, Q)
x = A*x + B*u;                   % a priori state prediction
P = A*P*A' / lambda^2 + B*Q*B';  % a priori covariance
end

function [x, P] = kf_update(x, z, P, H, R)
y = z - H*x; % measurement residuals ("innovation")
S = H*P*H' + R; % residual (innovation) covariance
K = P*H' / S; % Kalman gain
x = x + K*y; % a posteriori state estimate
P = (eye(size(P)) - K*H)*P; % a posteriori covariance matrix
end
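To see the two functions in action, here is a scalar (1-D) usage sketch, ported to Python for brevity; $\lambda = 1$ and no control input are assumed, and the noise figures are made up:

```python
# Scalar (1-D) sketch of the predict/update cycle; lambda = 1 and u = 0
# are assumed, the noise values are made up.
def kf_predict(x, a, p, q):
    x = a * x            # a priori state prediction
    p = a * p * a + q    # a priori covariance
    return x, p

def kf_update(x, z, p, h, r):
    y = z - h * x        # measurement residual ("innovation")
    s = h * p * h + r    # innovation covariance
    k = p * h / s        # Kalman gain
    x = x + k * y        # a posteriori state estimate
    p = (1 - k * h) * p  # a posteriori covariance
    return x, p

x, p = 0.0, 1.0  # initial guess with large uncertainty
for z in [1.1, 0.9, 1.05, 0.98]:  # noisy readings of a constant near 1
    x, p = kf_predict(x, 1.0, p, 0.01)
    x, p = kf_update(x, z, p, 1.0, 0.1)
# x ends up near 1 and p shrinks well below its initial value
```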

• ## Naismith, Aitken-Langmuir, Tranter and Tobler: Modeling hiking speed

While planning an eleven-day trip through the Hardangervidda in Norway, I came across the age-old problem of estimating the walking time for a given path on the map. While one is easily able to determine the times for the main west-east and north-south routes from a travel guide, there sadly is no information about those self-made problems (i.e. custom routes). Obviously, a simple and correct solution needs to be found.

Of course, there is no such thing. When searching for hiking time rules, two candidates pop up regularly: Naismith’s rule (including Tranter’s corrections), as well as Tobler’s hiking function.

William W. Naismith’s rule – and I couldn’t find a single scientific source – is more a rule of thumb than it is exact. It states:

For every 5 kilometres, allow one hour. For every 600 metres of ascent, add another hour.

\begin{align} \theta &= \tan^{-1}(\frac{\Delta a}{\Delta s}) \\ t &= \Delta s \left( \frac{1\mathrm{h}}{5\mathrm{km}} \right) + \Delta a \left( \frac{1 \mathrm{h}}{0.6 \mathrm{km}} \right) \\ |\vec{w}| &= \frac{\Delta s}{t} \end{align}

where $|\vec{w}|$ is the walking speed, $\Delta s$ the length on the horizontal plane (i.e. "forward"), $\Delta a$ the ascent (i.e. the difference in height) and $\theta$ the slope.

function [w, t, slope] = naismith(dist, ascent)
% dist and ascent in km; t in hours, w in km/h
slope = ascent/dist;
t = dist*(1/5) + ascent*(1/0.6);
w = dist./t;
end
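For a quick numeric check (a Python equivalent; units in km and hours), a 20 km track with 1 km of ascent comes out at about 5.67 hours:

```python
# Python equivalent of the MATLAB naismith() above; distances in km.
def naismith(dist_km, ascent_km):
    t = dist_km * (1 / 5) + ascent_km * (1 / 0.6)  # hours
    w = dist_km / t                                # average speed, km/h
    return w, t

w, t = naismith(20.0, 1.0)
# t = 4 h + 1.67 h, about 5.67 hours in total
```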


That looks like

Interestingly, this implies that if you climb a 3 km mountain straight up, it will take you 5 hours. By recognising that $5 \textrm{km} / 0.6 \textrm{km} \approx 8.3 \approx 8$, the 8 to 1 rule can be employed, which allows the transformation of any (Naismith-ish) track to a flat track by calculating

\begin{align} \Delta s_{flat} &= \Delta s + \frac{5 \mathrm{km}}{0.6 \mathrm{km}} \cdot \Delta a\\ &\approx \Delta s + 8 \cdot \Delta a \end{align}

So a track of $20 \textrm{km}$ in length with $1 \textrm{km}$ of ascend would make for $20 \mathrm{km} + 8 \cdot 1 \mathrm{km} = 28 \mathrm{km}$ of total track length. Assuming an average walking speed of $5 \mathrm{km/h}$, that route will take $28 \mathrm{km} / 5 \mathrm{km/h} = 5.6 \mathrm{h}$, or 5 hours and 36 minutes. Although quite inaccurate, somebody found this rule to be accurate enough when comparing it against times of men running down hills in Norway. Don’t quote me on that.
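The same numbers, written out as a trivial Python check (the factor 8 is the text's rounding of $5/0.6$):

```python
# The exact flat-equivalent factor vs. the 8-to-1 shortcut for the example
# track from the text (20 km, 1 km of ascent).
dist, ascent = 20.0, 1.0                 # km
flat_exact  = dist + (5 / 0.6) * ascent  # about 28.33 km
flat_approx = dist + 8 * ascent          # 28 km
t_exact  = flat_exact / 5                # hours, identical to Naismith's rule
t_approx = flat_approx / 5               # 5.6 hours, as in the text
```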

Robert Aitken assumed that 5 km/h might be too much and settled for 4 km/h on all off-track surfaces. Unfortunately the Naismith rule still didn’t state anything about descent or slopes in general, so Eric Langmuir added some refinements:

When walking off-track, allow one hour for every 4 kilometres (instead of 5 km). When on a small decline of 5 to 12°, subtract 10 minutes per 300 metres (1000 feet). For any steeper decline (i.e. over 12°), add 10 minutes per 300 metres of descent.

Now that’s the stuff wonderfully non-differentiable functions are made of:

That is:

\begin{align} \theta &= \tan^{-1}\left(\frac{\Delta a}{\Delta s}\right) \\ t &= \Delta s \left( \frac{1\,\mathrm{h}}{5\,\mathrm{km}} \right) + \begin{cases} +\Delta a \left( \frac{1\,\mathrm{h}}{0.6\,\mathrm{km}} \right), & \text{if } \theta > -5^\circ \\ -|\Delta a| \left( \frac{\left(10/60\right)\,\mathrm{h}}{0.3\,\mathrm{km}} \right), & \text{if } -12^\circ \le \theta \le -5^\circ \\ +|\Delta a| \left( \frac{\left(10/60\right)\,\mathrm{h}}{0.3\,\mathrm{km}} \right), & \text{if } \theta < -12^\circ \end{cases} \\ |\vec{w}| &= \frac{\Delta s}{t} \end{align}

It should be clear that 12 km/h is a highly unlikely walking speed, even on roads.

function [w, t, slope] = naismith_al(dist, ascent, base_speed)
if ~exist('base_speed', 'var')
    base_speed = 4; % km/h, Aitken's off-track speed
end

slope = ascent/dist;

t = dist*(1/base_speed);
if slope >= 0
    t = t + ascent*(1/0.6);            % Naismith's ascent term
elseif atand(slope) <= -5 && atand(slope) >= -12
    t = t - abs(ascent)*((10/60)/0.3); % gentle descent: subtract 10 min per 300 m
elseif atand(slope) < -12
    t = t + abs(ascent)*((10/60)/0.3); % steep descent: add 10 min per 300 m
end

w = dist./t;
end
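A Python port of the function above makes the three slope regimes easy to probe (the test distances are arbitrary):

```python
import math

# Python port of naismith_al() above for probing the three slope regimes;
# dist and ascent in km, result in hours.
def naismith_al(dist, ascent, base_speed=4.0):
    slope_deg = math.degrees(math.atan2(ascent, dist))
    t = dist / base_speed
    if ascent >= 0:
        t += ascent / 0.6                   # Naismith's ascent term
    elif -12 <= slope_deg <= -5:
        t -= abs(ascent) * (10 / 60) / 0.3  # gentle descent: subtract
    elif slope_deg < -12:
        t += abs(ascent) * (10 / 60) / 0.3  # steep descent: add
    return t
# declines shallower than 5 degrees get no correction, as in the MATLAB code
```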


So Waldo Tobler came along and developed his "hiking function", an equation that assumes a top speed of 6 km/h and comes with an interesting feature: it — though still non-differentiable — adapts gracefully to the slope of the ground. The function can be found in his 1993 report "Three presentations on geographical analysis and modeling: Non-isotropic geographic modeling, speculations on the geometry of geography, global spatial analysis" and looks like the following:

It boils down to the following equation for the walking speed $|\vec{w}|$ "on footpaths in hilly terrain" (with $s=1$) and for "off-path travel" (with $s=0.6$):

\begin{align} |\vec{w}| = s \cdot 6e^{-3.5 \cdot | \tan(\theta) + 0.05 |} \end{align}

where $\tan(\theta)$ is the tangent of the slope (i.e. vertical distance over horizontal distance). By taking into account the exact slope of the terrain, this function is superior to Naismith’s rule and a much better alternative to the Langmuir bugfix, especially when used on GIS data.

function [w] = tobler(slope, scaling)
% slope: tan(theta); scaling: 1 on footpaths, 0.6 off-path
w = scaling*6*exp(-3.5 * abs(slope + 0.05)); % km/h
end
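Tobler's function peaks at 6 km/h on a slight downhill grade of $-5\,\%$, which a quick Python port confirms (the scaling defaults to on-path travel):

```python
import math

# Python port of tobler() above; slope is tan(theta), s = 1 on footpaths,
# s = 0.6 for off-path travel.
def tobler(slope, s=1.0):
    return s * 6 * math.exp(-3.5 * abs(slope + 0.05))  # km/h

# the 6 km/h top speed occurs on a slight downhill, at slope = -0.05
```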


It however lacks the one thing that makes the Naismith rule stand out: Tranter's corrections for fatigue and fitness. (Yes, I know it gets weird.) Sadly, these corrections seem to exist only in the form of a mystical table that looks, basically, like this:

Fitness in minutes (rows) vs. time in hours according to Naismith's rule (columns):

| Fitness | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 (very fit) | 1 | 1½ | 2 | 2¾ | 3½ | 4½ | 5½ | 6¾ | 7¾ | 10 | 12½ | 14½ | 17 | 19½ | 22 | 24 |
| 20 | 1¼ | 2¼ | 3¼ | 4½ | 5½ | 6¾ | 7¾ | 8¾ | 10 | 12½ | 15 | 17½ | 20 | 23 | | |
| 25 | 1½ | 3 | 4¼ | 5½ | 7 | 8½ | 10 | 11½ | 13¼ | 15 | 17½ | | | | | |
| 30 | 2 | 3½ | 5 | 6¾ | 8½ | 10½ | 12½ | 14½ | | | | | | | | |
| 40 | 2¾ | 4¼ | 5¾ | 7½ | 9½ | 11½ | | | | | | | | | | |
| 50 (unfit) | 3¼ | 4¾ | 6½ | 8½ | | | | | | | | | | | | |

where the minutes are a rather obscure measure of how fast somebody is able to hike up 300 metres over a distance of 800 metres ($20^\circ$). With that table the rule is: If you get into nastier terrain, drop one fitness level. If you suck at walking, drop a fitness level. If you use a 20 kg backpack, drop one level. Sadly, there’s no equation to be found, so I had to make up one myself.

hours   = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24];
fitness = [15, 20, 25, 30, 40, 50];
table   = [1,    1.5,  2,    2.75, 3.5, 4.5,  5.5,  6.75, 7.75,  10,   12.5, 14.5, 17,  19.5, 22,  24;
           1.25, 2.25, 3.25, 4.5,  5.5, 6.75, 7.75, 8.75, 10,    12.5, 15,   17.5, 20,  23,   NaN, NaN;
           1.5,  3,    4.25, 5.5,  7,   8.5,  10,   11.5, 13.25, 15,   17.5, NaN,  NaN, NaN,  NaN, NaN;
           2,    3.5,  5,    6.75, 8.5, 10.5, 12.5, 14.5, NaN,   NaN,  NaN,  NaN,  NaN, NaN,  NaN, NaN;
           2.75, 4.25, 5.75, 7.5,  9.5, 11.5, NaN,  NaN,  NaN,   NaN,  NaN,  NaN,  NaN, NaN,  NaN, NaN;
           3.25, 4.75, 6.5,  8.5,  NaN, NaN,  NaN,  NaN,  NaN,   NaN,  NaN,  NaN,  NaN, NaN,  NaN, NaN];


By looking at the table and the mesh plot, it seems that the time series for each given fitness level is logarithmic.

I did a log-log plot and it turns out that the series not only appear to be logarithmic in time, but also in fitness. By deriving the (log-log-)linear regression for each series, the following equations can be found:

\begin{align} t_{15}(t) &= e^{1.35 \ln(t) - 1.08} \\ t_{20}(t) &= e^{1.24 \ln(t) - 0.55} \\ t_{25}(t) &= e^{1.25 \ln(t) - 0.33} \\ t_{30}(t) &= e^{1.31 \ln(t) - 0.21} \\ t_{40}(t) &= e^{1.14 \ln(t) + 0.20} \\ t_{50}(t) &= e^{1.05 \ln(t) + 0.44} \end{align}

These early approximations appear to be quite good, as can be seen in the following linear plot. The last three lines $t_{30}$, $t_{40}$ and $t_{50}$ however begin to drift away. That's expected for the last two due to the small number of samples, but the $t_{30}$ line was irritating.

My first assumption was that the $t_{40}$ and $t_{50}$ lines simply are outliers and that the real coefficient for the time variable is the (outlier-corrected) mean of $1.2215 \pm 0.11207$. This would imply that the intercept coefficient is the variable encoding fitness.

Unfortunately, this only seems to make things better in the log-log plot, but makes them a little bit worse in the linear world.

Equidistant intercept coefficients did not do the trick either. Well, well. In the end, I decided to give the brute-force method a chance and defined several fitting functions for use with genetic algorithm and pattern search solvers, including exponential, third-order and sigmoidal forms. The best version I could come up with was

\begin{align} t_{corrected}(t, f) &= 0.31381 \, e^{1.2097 \ln(t) + 0.81328 \ln(f) - 1.7307} \end{align}
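Evaluating the fitted function at a few of the table's cells shows the quality of the fit (these are checks against the data above, not new data; the tolerances are mine):

```python
import math

# The fitted Tranter correction from above; t is the Naismith time in hours,
# f the fitness figure in minutes.
def t_corrected(t, f):
    return 0.31381 * math.exp(1.2097 * math.log(t)
                              + 0.81328 * math.log(f) - 1.7307)

# spot checks against the original table:
# t_corrected(2, 30)  -> about 2.05  (table: 2)
# t_corrected(9, 20)  -> about 9.07  (table: 8.75)
# t_corrected(24, 15) -> about 23.5  (table: 24)
```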

This function results in a summed squared error of about 21.35 over all data points. The following shows the original surface from the table and the synthetic surface from the function.

A maximum deviation of about 1 hour can be seen clearly in the following error plot for the $t_{30}$ line, which really seems to be an outlier.

For comparison (here’s the original table), this is the synthetic correction table:

Fitness in minutes (rows) vs. time in hours according to Naismith's rule (columns); values of $t_{corrected}$ rounded to the nearest quarter hour, cells beyond 24 hours left blank:

| Fitness | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 (very fit) | 1¼ | 2 | 2¾ | 3½ | 4½ | 5¼ | 6¼ | 7¼ | 8¼ | 10¼ | 12¼ | 14½ | 16½ | 18¾ | 21¼ | 23½ |
| 20 | 1½ | 2½ | 3½ | 4½ | 5½ | 6¾ | 7¾ | 9 | 10¼ | 12¾ | 15½ | 18¼ | 21 | 23¾ | | |
| 25 | 1¾ | 3 | 4 | 5¼ | 6¾ | 8 | 9½ | 10¾ | 12¼ | 15½ | 18½ | 21¾ | | | | |
| 30 | 2 | 3¼ | 4¾ | 6¼ | 7¾ | 9¼ | 11 | 12½ | 14¼ | 17¾ | 21½ | | | | | |
| 40 | 2½ | 4¼ | 6 | 7¾ | 9¾ | 11¾ | 13¾ | 16 | 18 | 22½ | | | | | | |
| 50 (unfit) | 3 | 5 | 7¼ | 9½ | 11¾ | 14 | 16½ | 19 | 21¾ | | | | | | | |