## Category Archive

You are viewing the archive for the category MATLAB.

• ## Quadratic interpolation given two points and one derivative

While reading up on line search algorithms in nonlinear optimization for neural network training, I came across this problem: Given a function $f(x)$, find a quadratic interpolant $q(x) = ax^2 + bx + c$ that fulfills the conditions $f(x_0) = q(x_0)$, $f(x_1) = q(x_1)$ and $f'(x_0) = q'(x_0)$. Basically this:

So I took out my scribbling pad, wrote down some equations and then, after two pages of nonsense, decided it really wasn't worth the hassle. It turns out that the simple system

\begin{align}
f(x_0) &= ax_0^2 + bx_0 + c\\
f(x_1) &= ax_1^2 + bx_1 + c\\
f'(x_0) &= 2ax_0 + b
\end{align}

for

\begin{align}
q(x) &= ax^2 + bx + c
\end{align}

has the solution

\begin{align}
a &= - \frac{f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)}{(x_0 - x_1)^2} \\
b &= - \frac{x_0^2 f'(x_0) - x_1^2 f'(x_0) - 2x_0 f(x_0) + 2x_0 f(x_1)}{(x_0 - x_1)^2} \\
c &= \frac{x_0^2 f(x_1) + x_1^2 f(x_0) - 2x_0 x_1 f(x_0) - x_0 x_1^2 f'(x_0) + x_0^2 x_1 f'(x_0)}{(x_0 - x_1)^2}
\end{align}
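These coefficients are easy to sanity-check numerically. A minimal sketch in Python rather than Matlab (the test function $f = \sin$ and the points $x_0$, $x_1$ are arbitrary choices for the check):

```python
# Numerical sanity check of the closed-form interpolation coefficients.
import math

def quad_coeffs(x0, x1, f0, f1, df0):
    """Coefficients of q(x) = a x^2 + b x + c with q(x0)=f0, q(x1)=f1, q'(x0)=df0."""
    d = (x0 - x1) ** 2
    a = -(f0 - f1 - x0 * df0 + x1 * df0) / d
    b = -(x0**2 * df0 - x1**2 * df0 - 2 * x0 * f0 + 2 * x0 * f1) / d
    c = (x0**2 * f1 + x1**2 * f0 - 2 * x0 * x1 * f0
         - x0 * x1**2 * df0 + x0**2 * x1 * df0) / d
    return a, b, c

x0, x1 = 0.3, 1.7                     # arbitrary sample points
a, b, c = quad_coeffs(x0, x1, math.sin(x0), math.sin(x1), math.cos(x0))
q = lambda x: a * x**2 + b * x + c    # the interpolant
dq = lambda x: 2 * a * x + b          # its derivative

# all three interpolation conditions hold
assert abs(q(x0) - math.sin(x0)) < 1e-12
assert abs(q(x1) - math.sin(x1)) < 1e-12
assert abs(dq(x0) - math.cos(x0)) < 1e-12
```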

Instead of wasting your time on paper, it can be obtained more easily in Matlab using

% f_0, f_1 and df_0 stand for f(x_0), f(x_1) and f'(x_0); declaring them as
% plain symbols avoids redefining a symbolic function f twice.
syms a b c x_0 x_1 f_0 f_1 df_0
[a, b, c] = solve( ...
    f_0  == a*x_0^2 + b*x_0 + c, ...
    f_1  == a*x_1^2 + b*x_1 + c, ...
    df_0 == 2*a*x_0 + b, ...
    a, b, c);

syms q(x)
q(x) = simplify(a*x^2 + b*x + c);


Obviously, the whole purpose of this operation is to find an approximation to the local minimizer of $f(x)$, i.e. a point where the derivative vanishes. This gives

\begin{align}
0 &\overset{!}{=} q'(x_{min}) \\
x_{min} &= -\frac{1}{2} \frac{x_0^2 f'(x_0) -x_1^2 f'(x_0) - 2 x_0 f(x_0) + 2 x_0 f(x_1)} {f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)}
\end{align}

We would also need to check the interpolant's second derivative $q''(x_{min})$ to ensure the approximated minimizer is indeed a minimum of $q(x)$ by requiring $q''(x_{min}) > 0$, with the (constant) second derivative given as:

\begin{align}
q''(x) &= - 2 \frac{f(x_0) - f(x_1) - x_0 f'(x_0) + x_1 f'(x_0)}{\left( x_0 - x_1 \right)^2 }
\end{align}

The premise of a line search in minimization problems usually is that the search direction is already a direction of descent. With $f'(x_0) < 0$ and $f'(x_1) > 0$ (as would typically be the case when bracketing the local minimizer of $f(x)$), the interpolant should always be (strictly) convex. If these conditions do not hold, there might be no solution at all: one obviously won't be able to find a quadratic interpolant given the initial conditions for a function that is linear to machine precision. In that case, watch out for divisions by zero.

Last but not least, if the objective is to minimize $\varphi(\alpha) = f(\vec{x}_k + \alpha \vec{d}_k)$ using $q(\alpha)$, where $\vec{d}_k$ is the search direction and $\vec{x}_k$ the current starting point, such that

\begin{align}
\varphi(0) &= f(\vec{x}_k) \\
\varphi'(0) &= \nabla f(\vec{x}_k)' \vec{d}_k
\end{align}

then the above formulas simplify (substituting $x_0 = 0$ and $x_1 = \alpha$) to

\begin{align}
a &= - \frac{\varphi(0) - \varphi(\alpha) + \alpha \varphi'(0)}{\alpha^2} \\
b &= \frac{\alpha^2 \varphi'(0)}{\alpha^2} = \varphi'(0) \\
c &= \frac{\alpha^2 \varphi(0)}{\alpha^2} = \varphi(0)
\end{align}

and, more importantly, the local (approximated) minimizer at $\alpha_{min}$ simplifies to

\begin{align}
\alpha_{min} &= \frac{1}{2} \frac{\alpha^2 \varphi'(0)}{\varphi(0)-\varphi(\alpha)+\alpha\varphi'(0)}
\end{align}
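As a sketch of how this is used inside a line search step (in Python for brevity; the function name `quad_step` is my own):

```python
def quad_step(phi0, dphi0, alpha, phi_a):
    """Minimizer of the quadratic interpolant through phi(0), phi'(0), phi(alpha)."""
    denom = phi0 - phi_a + alpha * dphi0
    # a = -denom / alpha^2, so a convex interpolant (a > 0) needs denom < 0
    if denom >= 0:
        return None  # no convex interpolant; caller should fall back
    return 0.5 * alpha**2 * dphi0 / denom

# For phi(a) = (a - 1)^2 the interpolation is exact: phi(0) = 1, phi'(0) = -2,
# and with a trial step alpha = 3, phi(3) = 4, the true minimizer 1 is recovered.
assert abs(quad_step(1.0, -2.0, 3.0, 4.0) - 1.0) < 1e-12
```

Since $\varphi$ here is itself quadratic, the interpolant reproduces it exactly; for a general $\varphi$ the returned step is only an approximation that a line search would then test against its acceptance conditions.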

If $q(\alpha)$ is required to be strongly convex, then we'll observe that

\begin{align}
q''(\alpha) &= 2a \overset{!}{\geq} m
\end{align}

for some $m > 0$, giving us that $a$ must be at least $m/2 > 0$ (or $\epsilon$, for that matter), which is a trivial check. The following picture visualizes that this is indeed the case:

(Figure: convexity of a parabola for different highest-order coefficients $a$, with positive $b$ (top), zero $b$ (middle) and negative $b$ (bottom); the lowest-order coefficient $c$ is left out for brevity.)


• ## Hidden Markov Model training using the Baum-Welch Algorithm

The Baum-Welch algorithm determines the (locally) optimal parameters for a Hidden Markov Model by essentially using three equations.

One for the initial probabilities:

\begin{align}
\pi_i &= \frac{E\left(\text{Number of times a sequence started with state}\, s_i\right)}{E\left(\text{Number of times a sequence started with any state}\right)}
\end{align}

Another for the transition probabilities:

\begin{align}
a_{ij} &= \frac{E\left(\text{Number of times the state changed from}\, s_i \, \text{to}\,s_j\right)}{E\left(\text{Number of times the state changed from}\, s_i \, \text{to any state}\right)}
\end{align}

And the last one for the emission probabilities:

\begin{align}
b_{ik} &= \frac{E\left(\text{Number of times the state was}\, s_i \, \text{and the observation was}\,v_k\right)}{E\left(\text{Number of times the state was}\, s_i\right)}
\end{align}

If one had a fully labeled training corpus representing all possible outcomes, this would be exactly the optimal solution: count each occurrence, normalize, and you're good. If, however, no such labeled training corpus is available (i.e. only observations are given, without the corresponding state sequences), the expected values $E(c)$ of these counts have to be estimated. This can be done (and is done) using the forward and backward probabilities $\alpha_t(i)$ and $\beta_t(i)$.
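To sketch how the expected counts fall out of $\alpha_t(i)$ and $\beta_t(i)$, here is one unscaled re-estimation step in Python (the toy model values below are made up; a real implementation would scale the probabilities or work in log space to avoid underflow on long sequences):

```python
# One Baum-Welch re-estimation step from a single observation sequence.
import numpy as np

def baum_welch_step(obs, pi, A, B):
    T, N = len(obs), len(pi)
    # forward probabilities alpha_t(i)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t-1] @ A) * B[:, obs[t]]
    # backward probabilities beta_t(i)
    beta = np.zeros((T, N))
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t+1]] * beta[t+1])
    evidence = alpha[-1].sum()
    # gamma_t(i): probability of being in state i at time t
    gamma = alpha * beta / evidence
    # xi_t(i,j): probability of a transition i -> j at time t
    xi = np.array([np.outer(alpha[t], B[:, obs[t+1]] * beta[t+1]) * A
                   for t in range(T - 1)]) / evidence
    # re-estimated parameters: expected counts, normalized
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    obs_arr = np.asarray(obs)
    B_new = np.zeros_like(B)
    for k in range(B.shape[1]):
        B_new[:, k] = gamma[obs_arr == k].sum(axis=0)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new

# toy two-state, two-symbol model (values made up for illustration)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi2, A2, B2 = baum_welch_step([0, 1, 0, 0], pi, A, B)
# each re-estimated distribution still sums to one
assert np.isclose(pi2.sum(), 1.0)
assert np.allclose(A2.sum(axis=1), 1.0)
assert np.allclose(B2.sum(axis=1), 1.0)
```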

• ## Regular Kalman Filter (almost) super-quick Reference

To make a long story short, here's the complete Matlab code.

% State estimation
% x       state vector                  (M x 1)
% A       state transition matrix       (M x M)
% P       state covariance matrix       (M x M)
%
% Input / control data
% u       input vector                  (N x 1)
% B       input transition matrix       (M x N)
% Q       input noise covariance matrix (N x N)
%
% Observations
% z       observation vector            (Z x 1)
% H       state-to-observation matrix   (Z x M)
% R       observation noise covariance  (Z x Z)
%
% Tuning
% lambda  tuning parameter              (scalar)

function [x, P] = kf_predict(x, A, P, lambda, u, B, Q)
    x = A*x + B*u;                      % a priori state prediction
    P = A*P*A' * 1/(lambda^2) + B*Q*B'; % a priori covariance
end

function [x, P] = kf_update(x, z, P, H, R)
    y = z - H*x;                % measurement residual ("innovation")
    S = H*P*H' + R;             % residual (innovation) covariance
    K = P*H' / S;               % Kalman gain
    x = x + K*y;                % a posteriori state estimate
    P = (eye(size(P)) - K*H)*P; % a posteriori covariance matrix
end
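To see the two functions in action, here is a direct transcription to Python/NumPy with a one-dimensional toy example (the model, noise levels and measurements are made-up values for illustration):

```python
# Python/NumPy transcription of kf_predict / kf_update for a quick check.
import numpy as np

def kf_predict(x, A, P, lam, u, B, Q):
    x = A @ x + B @ u                       # a priori state prediction
    P = A @ P @ A.T / lam**2 + B @ Q @ B.T  # a priori covariance
    return x, P

def kf_update(x, z, P, H, R):
    y = z - H @ x                     # measurement residual ("innovation")
    S = H @ P @ H.T + R               # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)    # Kalman gain
    x = x + K @ y                     # a posteriori state estimate
    P = (np.eye(len(P)) - K @ H) @ P  # a posteriori covariance
    return x, P

# 1-D example: a static state observed directly with unit measurement noise
x, P = np.array([[0.0]]), np.array([[10.0]])
A = H = B = np.eye(1)
u = np.zeros((1, 1))
Q = np.array([[0.01]])
R = np.eye(1)
for z in [1.2, 0.9, 1.1]:
    x, P = kf_predict(x, A, P, 1.0, u, B, Q)
    x, P = kf_update(x, np.array([[z]]), P, H, R)
# the estimate moves toward the measurements and the variance shrinks
assert 0.9 < x[0, 0] < 1.2
assert P[0, 0] < 10.0
```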

• ## Naismith, Aitken-Langmuir, Tranter and Tobler: Modeling hiking speed

While planning an eleven-day trekking trip through the Hardangervidda in Norway, I came across the age-old problem of estimating the walking time for a given path on the map. While one is easily able to determine the times for the main west-east and north-south routes from a travel guide, there sadly is no information about those self-made problems (i.e. custom routes). Obviously, a simple and correct solution needs to be found.

Of course, there is no such thing. When searching for hiking time rules, two candidates pop up regularly: Naismith's rule (including Tranter's corrections), as well as Tobler's hiking function.

William W. Naismith's rule (for which I couldn't find a single scientific source) is more a rule of thumb than an exact formula. It states:

For every 5 kilometres, allow one hour. For every 600 metres of ascent, add another hour.

\begin{align}
\theta &= \tan^{-1}(\frac{\Delta a}{\Delta s}) \\
t &= \Delta s \left( \frac{1\mathrm{h}}{5\mathrm{km}} \right) + \Delta a \left( \frac{1 \mathrm{h}}{0.6 \mathrm{km}} \right) \\
|\vec{w}| &= \frac{\Delta s}{t}
\end{align}

where $|\vec{w}|$ is the walking speed, $\Delta s$ the length on the horizontal plane (i.e. "forward"), $\Delta a$ the ascent (i.e. the difference in height) and $\theta$ the slope.

function [w, t, slope] = naismith(len, ascent)
    % len and ascent in km; t in hours, w in km/h
    % (the parameter is called len to avoid shadowing Matlab's length())
    slope = ascent/len;
    t = len*(1/5) + ascent*(1/0.6);
    w = len./t;
end


That looks like

Interestingly, this implies that if you climb a 3 km mountain straight up, it will take you 5 hours. By recognising that $5 \textrm{km} / 0.6 \textrm{km} \approx 8.3 \approx 8$, the 8-to-1 rule can be employed, which allows the transformation of any (Naismith-ish) track to a flat track by calculating

\begin{align}
\Delta s_{flat} &= \Delta s + \frac{5 \mathrm{km}}{0.6 \mathrm{km}} \cdot \Delta a\\
&\approx \Delta s + 8 \cdot \Delta a
\end{align}

So a track of $20 \textrm{km}$ in length with $1 \textrm{km}$ of ascent would make for $20 \mathrm{km} + 8 \cdot 1 \mathrm{km} = 28 \mathrm{km}$ of total track length. Assuming an average walking speed of $5 \mathrm{km/h}$, that route will take $28 \mathrm{km} / 5 \mathrm{km/h} = 5.6 \mathrm{h}$, or 5 hours and 36 minutes. Although quite inaccurate, somebody found this rule to be accurate enough when comparing it against times of men running down hills in Norway. Don't quote me on that.
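The arithmetic above is quickly verified; a small check in Python (the function names are mine):

```python
# Naismith's rule and its 8-to-1 approximation, checked against the examples
# from the text.
def naismith_hours(dist_km, ascent_km):
    # one hour per 5 km forward plus one hour per 0.6 km of ascent
    return dist_km / 5.0 + ascent_km / 0.6

def eight_to_one_hours(dist_km, ascent_km):
    # flatten the track: 8 km of equivalent distance per km of ascent
    return (dist_km + 8.0 * ascent_km) / 5.0

assert abs(eight_to_one_hours(20, 1) - 5.6) < 1e-9  # 28 km at 5 km/h
assert abs(naismith_hours(20, 1) - 5.6) < 0.1       # exact rule: ~5.67 h
assert abs(naismith_hours(0, 3) - 5.0) < 1e-9       # 3 km straight up: 5 h
```

The small gap between 5.6 h and 5.67 h is exactly the rounding of $8.3$ down to $8$ in the 8-to-1 rule.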

Robert Aitken assumed that 5 km/h might be too much and settled for 4 km/h on all off-track surfaces. Unfortunately, the Naismith rule still didn't state anything about descent or slopes in general, so Eric Langmuir added some refinements:

When walking off-track, allow one hour for every 4 kilometres (instead of 5 km). On a small decline of 5 to 12°, subtract 10 minutes per 300 metres (1000 feet) of descent. For any steeper decline (i.e. over 12°), add 10 minutes per 300 metres of descent.

Now that's the stuff wonderfully non-differentiable functions are made of:

That is:

\begin{align}
\theta &= \tan^{-1}(\frac{\Delta a}{\Delta s}) \\
t &= \Delta s \left( \frac{1\mathrm{h}}{4\mathrm{km}} \right) + \begin{cases}
+\Delta a \left( \frac{1 \mathrm{h}}{0.6 \mathrm{km}} \right) , & \text{if $\theta \ge 0^\circ$} \\
0 , & \text{if $-5^\circ < \theta < 0^\circ$} \\
-|\Delta a| \left( \frac{\left(10/60\right) \mathrm{h}}{0.3 \mathrm{km}} \right) , & \text{if $-12^\circ \le \theta \le -5^\circ$} \\
+|\Delta a| \left( \frac{\left(10/60\right) \mathrm{h}}{0.3 \mathrm{km}} \right) , & \text{if $\theta < -12^\circ$}
\end{cases} \\
|\vec{w}| &= \frac{\Delta s}{t}
\end{align}

It should be clear that 12 km/h is a highly unlikely walking speed, even on roads.

function [w, t, slope] = naismith_al(len, ascent, base_speed)
    % len and ascent in km; t in hours, w in km/h
    if ~exist('base_speed', 'var')
        base_speed = 4; % km/h, Aitken's off-track speed
    end

    slope = ascent/len;

    t = len*(1/base_speed);
    if slope >= 0
        t = t + ascent*(1/0.6);
    elseif atand(slope) <= -5 && atand(slope) >= -12
        t = t - abs(ascent)*((10/60)/0.3);
    elseif atand(slope) < -12
        t = t + abs(ascent)*((10/60)/0.3);
    end

    w = len./t;
end


So Waldo Tobler came along and developed his "hiking function", an equation that assumes a top speed of 6 km/h and has an interesting feature: though still non-differentiable, it adapts gracefully to the slope of the ground. That function can be found in his 1993 report "Three presentations on geographical analysis and modeling: Non-isotropic geographic modeling; Speculations on the geometry of geography; Global spatial analysis" and looks like the following:

It boils down to the following equation for the walking speed $|\vec{w}|$ "on footpaths in hilly terrain" (with $s=1$) and "off-path travel" (with $s=0.6$):

\begin{align}
|\vec{w}| = s \cdot 6e^{-3.5 \cdot | \tan(\theta) + 0.05 |}
\end{align}

where $\tan(\theta)$ is the tangent of the slope (i.e. vertical distance over horizontal distance). By taking into account the exact slope of the terrain, this function is superior to Naismith's rule and a much better alternative to the Langmuir bugfix, especially when used on GIS data.

function [w] = tobler(slope, scaling)
    % slope = tan(theta); scaling = 1 on paths, 0.6 off-path; w in km/h
    w = scaling*6*exp(-3.5 * abs(slope + 0.05));
end
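A quick numerical check (in Python; the function name is mine) confirms a quirk worth knowing: the maximum speed of 6 km/h is reached not on flat ground but on a slight downhill of $\tan(\theta) = -0.05$, where the exponent vanishes.

```python
# Tobler's hiking function; top speed sits at a -5 % grade.
import math

def tobler_kmh(slope, scaling=1.0):
    return scaling * 6.0 * math.exp(-3.5 * abs(slope + 0.05))

assert abs(tobler_kmh(-0.05) - 6.0) < 1e-12              # top speed 6 km/h
assert tobler_kmh(0.0) < 6.0                             # flat ground is slower
assert abs(tobler_kmh(0.0) - 6.0 * math.exp(-0.175)) < 1e-12
```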


It, however, lacks the one thing that makes the Naismith rule stand out: Tranter's corrections for fatigue and fitness. (Yes, I know it gets weird.) Sadly, these corrections seem to exist only in the form of a mystical table that looks, basically, like this:

Time in hours according to Naismith's rule (columns) for each fitness level in minutes (rows):

| Fitness (min) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 (very fit) | 1 | 1½ | 2 | 2¾ | 3½ | 4½ | 5½ | 6¾ | 7¾ | 10 | 12½ | 14½ | 17 | 19½ | 22 | 24 |
| 20 | 1¼ | 2¼ | 3¼ | 4½ | 5½ | 6.7 | 7¾ | 8¾ | 10 | 12½ | 15 | 17½ | 20 | 23 | – | – |
| 25 | 1½ | 3 | 4¼ | 5½ | 7 | 8½ | 10 | 11½ | 13¼ | 15 | 17½ | – | – | – | – | – |
| 30 | 2 | 3½ | 5 | 6¾ | 8½ | 10½ | 12½ | 14½ | – | – | – | – | – | – | – | – |
| 40 | 2¾ | 4¼ | 5¾ | 7½ | 9½ | 11½ | – | – | – | – | – | – | – | – | – | – |
| 50 (unfit) | 3¼ | 4¾ | 6½ | 8½ | – | – | – | – | – | – | – | – | – | – | – | – |

where the minutes are a rather obscure measure of how fast somebody is able to hike up 300 metres over a distance of 800 metres (roughly a $20^\circ$ slope). With that table, the rule is: if you get into nastier terrain, drop one fitness level. If you suck at walking, drop a fitness level. If you carry a 20 kg backpack, drop one level. Sadly, there's no equation to be found, so I had to make one up myself.

hours   = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24];
fitness = [15, 20, 25, 30, 40, 50];
table   = [1,    1.5,  2,    2.75, 3.5, 4.5,  5.5,  6.75, 7.75,  10,   12.5, 14.5, 17,  19.5, 22,  24;
           1.25, 2.25, 3.25, 4.5,  5.5, 6.7,  7.75, 8.75, 10,    12.5, 15,   17.5, 20,  23,   NaN, NaN;
           1.5,  3,    4.25, 5.5,  7,   8.5,  10,   11.5, 13.25, 15,   17.5, NaN,  NaN, NaN,  NaN, NaN;
           2,    3.5,  5,    6.75, 8.5, 10.5, 12.5, 14.5, NaN,   NaN,  NaN,  NaN,  NaN, NaN,  NaN, NaN;
           2.75, 4.25, 5.75, 7.5,  9.5, 11.5, NaN,  NaN,  NaN,   NaN,  NaN,  NaN,  NaN, NaN,  NaN, NaN;
           3.25, 4.75, 6.5,  8.5,  NaN, NaN,  NaN,  NaN,  NaN,   NaN,  NaN,  NaN,  NaN, NaN,  NaN, NaN];


By looking at the table and the mesh plot, it seems that the time axis for each given fitness is logarithmic.

I did a log-log plot, and it turns out that the series not only appear to be logarithmic in time, but also in fitness. By deriving the (log-log) linear regression for each series, the following equations can be found:

\begin{align}
t_{15}(t) &= e^{1.35 \ln(t) - 1.08} \\
t_{20}(t) &= e^{1.24 \ln(t) - 0.55} \\
t_{25}(t) &= e^{1.25 \ln(t) - 0.33} \\
t_{30}(t) &= e^{1.31 \ln(t) - 0.21} \\
t_{40}(t) &= e^{1.14 \ln(t) + 0.20} \\
t_{50}(t) &= e^{1.05 \ln(t) + 0.44}
\end{align}

These early approximations appear to be quite good, as can be seen in the following linear plot. The last three lines $t_{30}$, $t_{40}$ and $t_{50}$, however, begin to drift away. That's expected for the last two due to the small number of samples, but the $t_{30}$ line was irritating.

My first assumption was that the $t_{40}$ and $t_{50}$ lines are simply outliers and that the real coefficient for the time variable is the (outlier-corrected) mean of $1.2215 \pm 0.11207$. This would imply that the intercept coefficient is the variable for fitness.

Unfortunately, this only seems to make things better in the log-log plot, but makes them a little bit worse in the linear world.

Equidistant intercept coefficients did not do the trick either. Well, well. In the end, I decided to give the brute-force method a chance and defined several fitting functions for use with genetic-algorithm and pattern-search solvers, including exponential, third-order and sigmoidal forms. The best version I could come up with was

\begin{align} t_{corrected}(t, f) &= 0.31381 \, e^{1.2097 \ln(t) + 0.81328 \ln(f) - 1.7307} \end{align}
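A quick evaluation in Python against two cells of the original table (fitness 15 at 10 and 12 Naismith hours) shows the fit stays within about an hour of the tabulated values:

```python
# The fitted Tranter correction, evaluated against two original table cells.
import math

def t_corrected(t, f):
    # corrected time from Naismith time t (hours) and fitness f (minutes)
    return 0.31381 * math.exp(1.2097 * math.log(t)
                              + 0.81328 * math.log(f) - 1.7307)

# table cells for fitness 15: 7.75 h at t = 10 and 10 h at t = 12
assert abs(t_corrected(10, 15) - 7.75) < 1.0
assert abs(t_corrected(12, 15) - 10.0) < 1.0
```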

This function results in a least-squares error of about 21.35 hours over all data points. The following shows the original surface from the table and the synthetic surface from the function.

A maximum deviation of about 1 hour can be seen clearly in the following error plot for the $t_{30}$ line, which really seems to be an outlier.

For comparison with the original table above, this is the synthetic correction table:

Time in hours according to Naismith's rule (columns) for each fitness level in minutes (rows):

| Fitness (min) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 12 | 14 | 16 | 18 | 20 | 22 | 24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 (very fit) | 1¼ | 2 | 2¾ | 3½ | 4½ | 5¼ | 6¼ | 7¼ | 8¼ | 10¼ | 12¼ | 14½ | 16½ | 18¾ | 21¼ | 23½ |
| 20 | 1½ | 2½ | 3½ | 4½ | 5½ | 6¾ | 7¾ | 9 | 10¼ | 12¾ | 15½ | 18¼ | 21 | 23¾ | – | – |
| 25 | 1¾ | 3 | 4 | 5¼ | 6¾ | 8 | 9½ | 10¾ | 12¼ | 15½ | 18½ | – | – | – | – | – |
| 30 | 2 | 3¼ | 4¾ | 6¼ | 7¾ | 9¼ | 11 | 12½ | – | – | – | – | – | – | – | – |
| 40 | 2½ | 4¼ | 6 | 7¾ | 9¾ | 11¾ | – | – | – | – | – | – | – | – | – | – |
| 50 (unfit) | 3 | 5 | 7¼ | 9½ | – | – | – | – | – | – | – | – | – | – | – | – |