While reading up on line search algorithms in nonlinear optimization for neural network training, I came across this problem: Given a function \(f(x)\) , find a quadratic interpolant \(q(x) = ax^2 + bx + c\) that fulfills the conditions \(f(x_0) = q(x_0)\) , \(f(x_1) = q(x_1)\) and \(f'(x_0) = q'(x_0)\) . Basically this:
So I took out my scribbling pad, wrote down some equations and then, after two pages of nonsense, decided it really wasn’t worth the hassle. It turns out that the simple system given by the three conditions,
\begin{align}
a x_0^2 + b x_0 + c &= f(x_0) \\
a x_1^2 + b x_1 + c &= f(x_1) \\
2 a x_0 + b &= f'(x_0),
\end{align}
can be solved directly for the coefficients.
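For reference, here is a sketch of the solution (using only the symbols defined above; the expression for \(x_{min}\) assumes \(a \neq 0\)):
\begin{align}
a &= \frac{f(x_1) - f(x_0) - f'(x_0)\,(x_1 - x_0)}{(x_1 - x_0)^2} \\
b &= f'(x_0) - 2 a x_0 \\
c &= f(x_0) - a x_0^2 - b x_0 \\
x_{min} &= -\frac{b}{2a} = x_0 - \frac{f'(x_0)\,(x_1 - x_0)^2}{2\left(f(x_1) - f(x_0) - f'(x_0)\,(x_1 - x_0)\right)}
\end{align}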
We also would need to check the interpolant’s second derivative \(q''(x_{min})\) to ensure the approximated minimizer is indeed a minimum of \(q(x)\) by requiring \(q''(x_{min}) > 0\), with the second derivative given as:
\begin{align}
q''(x) &= 2a
\end{align}
The premise of the line search in minimization problems usually is that the search direction is already a direction of descent. With \(f'(x_0) < 0\) and \(f'(x_1) > 0\) (as would typically be the case when bracketing the local minimizer of \(f(x)\)), the interpolant should always be (strictly) convex. If these conditions do not hold, there might be no solution at all: one obviously won’t be able to find a quadratic interpolant given the initial conditions for a function that is linear to machine precision. In that case, watch out for divisions by zero.
Last but not least, if the objective is to minimize \(\varphi(\alpha) = f(\vec{x}_k + \alpha \vec{d}_k)\) using \(q(\alpha)\), where \(\vec{d}_k\) is the search direction and \(\vec{x}_k\) the current starting point, the same scheme applies in \(\alpha\) with \(x_0 = 0\): the values \(\varphi(0)\), \(\varphi'(0) = \nabla f(\vec{x}_k)^\top \vec{d}_k\) and \(\varphi(\alpha_1)\) (for some trial step length \(\alpha_1 > 0\)) take the roles of \(f(x_0)\), \(f'(x_0)\) and \(f(x_1)\).
If \(q(\alpha)\) is required to be strongly convex, then we’ll observe that
\begin{align}
q''(\alpha) &= 2a \overset{!}{\geq} m
\end{align}
for some \(m > 0\), which means that \(a\) must be greater than zero (or greater than some small \(\epsilon\), for that matter), which is a trivial check. The following picture visualizes that this is indeed the case:
Convexity of a parabola for different highest-order coefficients a with positive b (top), zero b (middle) and negative b (bottom). Lowest-order coefficient c is left out for brevity.
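Putting the pieces together, here is a minimal Matlab sketch of the interpolation step (the function name quad_interp_min and the use of eps as threshold are my own choices):
function [x_min, a, b, c] = quad_interp_min(x0, x1, f0, f1, df0)
    % Quadratic interpolation from f0 = f(x0), f1 = f(x1) and df0 = f'(x0).
    a = (f1 - f0 - df0*(x1 - x0)) / (x1 - x0)^2;
    b = df0 - 2*a*x0;
    c = f0 - a*x0^2 - b*x0;
    if a <= eps % q''(x) = 2a must be positive for x_min to be a minimum
        error('Interpolant is not (strictly) convex; no minimizer.');
    end
    x_min = -b/(2*a); % vertex of the parabola
end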
July 2nd, 2015, by Markus
The Baum-Welch algorithm determines the (locally) optimal parameters for a Hidden Markov Model by essentially using three equations.
One for the initial probabilities:
\begin{align}
\pi_i &= \frac{E\left(\text{Number of times a sequence started with state}\, s_i\right)}{E\left(\text{Number of times a sequence started with any state}\right)}
\end{align}
Another for the transition probabilities:
\begin{align}
a_{ij} &= \frac{E\left(\text{Number of times the state changed from}\, s_i \, \text{to}\,s_j\right)}{E\left(\text{Number of times the state changed from}\, s_i \, \text{to any state}\right)}
\end{align}
And the last one for the emission probabilities:
\begin{align}
b_{ik} &= \frac{E\left(\text{Number of times the state was}\, s_i \, \text{and the observation was}\,v_k\right)}{E\left(\text{Number of times the state was}\, s_i\right)}
\end{align}
If one had a fully labeled training corpus representing all possible outcomes, this would be exactly the optimal solution: count each occurrence, normalize and you’re good. If, however, no such labeled training corpus is available — i.e. only observations are given, no corresponding state sequences — the expected values \(E(c)\) of these counts have to be estimated. This can be done (and is done) using the forward and backward probabilities \(\alpha_t(i)\) and \(\beta_t(i)\), as described below.
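As a small illustration of the fully labeled case described above, counting and normalizing could be done like this in Matlab (a sketch; all names are made up, states and observations are assumed to be integer indices):
function [Pi, A, B] = hmm_count_estimate(stateSeqs, obsSeqs, M, K)
    % stateSeqs, obsSeqs: cell arrays of equally long integer vectors
    % M: number of states, K: number of observation symbols
    Pi = zeros(M, 1); A = zeros(M, M); B = zeros(M, K);
    for n = 1:numel(stateSeqs)
        s = stateSeqs{n}; o = obsSeqs{n};
        Pi(s(1)) = Pi(s(1)) + 1;                   % sequence started with state s(1)
        for t = 1:numel(s)-1
            A(s(t), s(t+1)) = A(s(t), s(t+1)) + 1; % state changed from s_i to s_j
        end
        for t = 1:numel(s)
            B(s(t), o(t)) = B(s(t), o(t)) + 1;     % state was s_i, observation was v_k
        end
    end
    Pi = Pi / sum(Pi);                             % normalize counts to probabilities
    A = bsxfun(@rdivide, A, sum(A, 2));
    B = bsxfun(@rdivide, B, sum(B, 2));
end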
September 1st, 2014, by Markus
To make a long story short, here’s the complete Matlab code.
% State estimations
% x       state vector                  (M x 1)
% A       state transition matrix       (M x M)
% P       state covariance matrix       (M x M)
% Input / control data
% u       input vector                  (N x 1)
% B       input transition matrix       (M x N)
% Q       input noise covariance matrix (N x N)
% Observations
% z       observation vector            (Z x 1)
% H       state-to-observation matrix   (Z x M)
% R       observation noise covariance  (Z x Z)
% Tuning
% lambda  tuning parameter              (scalar)
function [x, P] = kf_predict(x, A, P, lambda, u, B, Q)
    x = A*x + B*u;                       % a priori state prediction
    P = A*P*A' * 1/(lambda^2) + B*Q*B';  % a priori covariance
end

function [x, P] = kf_update(x, z, P, H, R)
    y = z - H*x;                         % measurement residual ("innovation")
    S = H*P*H' + R;                      % residual (innovation) covariance
    K = P*H' / S;                        % Kalman gain
    x = x + K*y;                         % a posteriori state estimate
    P = (eye(size(P)) - K*H)*P;          % a posteriori covariance matrix
end
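For completeness, a minimal usage sketch (a hypothetical one-dimensional constant-velocity model with made-up numbers; lambda = 1 leaves the covariance untouched):
dt = 0.1;
A = [1 dt; 0 1];              % state transition (M = 2: position and velocity)
B = [0; 0]; u = 0; Q = 0;     % no control input in this toy example
H = [1 0];                    % only the position is observed (Z = 1)
R = 0.5;                      % observation noise covariance
x = [0; 0]; P = eye(2);       % initial state and covariance
lambda = 1;                   % 1 = plain Kalman filter, no covariance scaling
for k = 1:100
    z = sin(k*dt) + sqrt(R)*randn(); % fake noisy position measurement
    [x, P] = kf_predict(x, A, P, lambda, u, B, Q);
    [x, P] = kf_update(x, z, P, H, R);
end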
While planning an eleven-day trekking trip through the Hardangervidda in Norway, I came across the age-old problem of estimating the walking time for a given path on the map. While one can easily determine the times for the main west-east and north-south routes from a travel guide, there sadly is no such information for those self-made problems (i.e. custom routes). Obviously, a simple and correct solution needs to be found.
Of course, there is no such thing. When searching for hiking time rules, two candidates pop up regularly: Naismith’s rule (including Tranter’s corrections), as well as Tobler’s hiking function.
William W. Naismith’s rule — and I couldn’t find a single scientific source — is more a rule of thumb than it is exact. It states:
For every 5 kilometres, allow one hour. For every 600 metres of ascent, add another hour.
which reads as
\begin{align}
\theta &= \tan^{-1}(\frac{\Delta a}{\Delta s}) \\
t &= \Delta s \left( \frac{1\mathrm{h}}{5\mathrm{km}} \right) + \Delta a \left( \frac{1 \mathrm{h}}{0.6 \mathrm{km}} \right) \\
|\vec{w}| &= \frac{\Delta s}{t}
\end{align}
where \(|\vec{w}|\) is the walking speed, \(\Delta s\) the length on the horizontal plane (i.e. “forward”), \(\Delta a\) the ascent (i.e. the difference in height) and \(\theta\) the slope.
function [w, t, slope] = naismith(length, ascend)
    % length and ascend are given in km; t in hours, w in km/h, slope as a ratio
    slope = ascend/length;
    t = length*(1/5) + ascend*(1/0.6); % 1 h per 5 km forward, 1 h per 0.6 km of ascent
    w = length./t;
end
That looks like this:
Interestingly, this implies that if you climb a 3 km mountain straight up, it will take you 5 hours. By recognising that \(5 \textrm{km} / 0.6 \textrm{km} \approx 8.3 \approx 8\) , the 8 to 1 rule can be employed, which allows the transformation of any (Naismith-ish) track to a flat track by calculating
\begin{align}
\Delta s_{flat} &= \Delta s + \frac{5 \mathrm{km}}{0.6 \mathrm{km}} \cdot \Delta a\\
&\approx \Delta s + 8 \cdot \Delta a
\end{align}
So a track of \(20 \textrm{km}\) in length with \(1 \textrm{km}\) of ascent would make for \(20 \mathrm{km} + 8 \cdot 1 \mathrm{km} = 28 \mathrm{km}\) of total track length. Assuming an average walking speed of \(5 \mathrm{km/h}\), that route will take \(28 \mathrm{km} / 5 \mathrm{km/h} = 5.6 \mathrm{h}\), or 5 hours and 36 minutes. Although quite inaccurate, somebody found this rule to be accurate enough when comparing it against times of men running down hills in Norway. Don’t quote me on that.
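For a quick cross-check, the naismith function from above gives almost the same number (it uses the exact factor \(5/0.6 \approx 8.3\) instead of the rounded 8):
[w, t] = naismith(20, 1); % 20 km forward, 1 km of ascent
% t = 20/5 + 1/0.6 ≈ 5.67 h (vs. 5.6 h with the rounded 8:1 rule)
% w = 20/t ≈ 3.53 km/h effective speed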
Robert Aitken assumed that 5 km/h might be too much and settled for 4 km/h on all off-track surfaces. Unfortunately the Naismith rule still didn’t state anything about descent or slopes in general, so Eric Langmuir added some refinements:
When walking off-track, allow one hour for every 4 kilometres (instead of 5 km). When on a small decline of 5 to 12°, subtract 10 minutes per 300 metres (1000 feet). For any steeper decline (i.e. over 12°), add 10 minutes per 300 metres of descent.
Now that’s the stuff wonderfully non-differentiable functions are made of:
It should be clear that 12 km/h is a highly unlikely speed, even on roads.
function [w, t, slope] = naismith_al(length, ascend, base_speed)
    if ~exist('base_speed', 'var')
        base_speed = 4; % km/h
    end
    slope = ascend/length;
    t = length*(1/base_speed);
    if slope >= 0
        t = t + ascend*(1/0.6);            % ascent: add 1 h per 600 m
    elseif atand(slope) <= -5 && atand(slope) >= -12
        t = t - abs(ascend)*((10/60)/0.3); % decline of 5 to 12 deg: subtract 10 min per 300 m
    elseif atand(slope) < -12
        t = t + abs(ascend)*((10/60)/0.3); % steeper than 12 deg: add 10 min per 300 m
    end
    w = length./t;
end
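A quick sanity check (made-up values, one kilometre of horizontal distance) shows both the off-track base speed and the jump at the 12° boundary:
w_flat  = naismith_al(1, 0);           % 4 km/h on the flat
w_down1 = naismith_al(1, -tand(11.9)); % just inside the 5-12 deg band: ≈ 7.5 km/h
w_down2 = naismith_al(1, -tand(12.1)); % just beyond 12 deg: ≈ 2.7 km/h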
So Waldo Tobler came along and developed his “hiking function”, an equation that assumes a top speed of 6 km/h and has an interesting feature: it — though still non-differentiable — adapts gracefully to the slope of the ground. That function can be found in his 1993 report “Three presentations on geographical analysis and modeling: Non-isotropic geographic modeling speculations on the geometry of geography global spatial analysis” and looks like the following:
It boils down to the following equation for the walking speed \(|\vec{w}|\) “on footpaths in hilly terrain” (with \(s=1\)) and for “off-path travel” (with \(s=0.6\)):
\begin{align}
|\vec{w}| &= s \cdot 6 \, \mathrm{km/h} \cdot e^{-3.5 \left| \tan(\theta) + 0.05 \right|}
\end{align}
where \(\tan(\theta)\) is the tangent of the slope (i.e. vertical distance over horizontal distance). By taking into account the exact slope of the terrain, this function is superior to Naismith’s rule and a much better alternative to the Langmuir bugfix, especially when used on GIS data.
function [w] = tobler(slope, scaling)
    % slope: tan(theta), i.e. vertical over horizontal distance
    % scaling: 1 on footpaths, 0.6 for off-path travel
    w = scaling*6*exp(-3.5 * abs(slope+0.05)); % walking speed in km/h
end
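A few example values (my own picks) to illustrate the behaviour described above:
w_max  = tobler(-0.05, 1);      % top speed: exactly 6 km/h at a 5 % downhill grade
w_flat = tobler(0, 1);          % level ground on a path: 6*exp(-0.175) ≈ 5.04 km/h
w_up   = tobler(tand(10), 0.6); % 10 deg uphill, off-path: ≈ 1.6 km/h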
It however lacks the one thing that makes the Naismith rule stand out: Tranter’s corrections for fatigue and fitness. (Yes, I know it gets weird.) Sadly, these corrections seem to exist only in the form of a mystical table that looks, basically, like this:
Fitness in minutes (rows) vs. time in hours according to Naismith’s rule (columns):

Fitness          2     3     4     5     6     7     8     9     10    12    14    16    18    20    22    24
15 (very fit)    1     1½    2     2¾    3½    4½    5½    6¾    7¾    10    12½   14½   17    19½   22    24
20               1¼    2¼    3¼    4½    5½    6½    7¾    8¾    10    12½   15    17½   20    23    –     –
25               1½    3     4¼    5½    7     8½    10    11½   13¼   15    17½   –     –     –     –     –
30               2     3½    5     6¾    8½    10½   12½   14½   –     –     –     –     –     –     –     –
40               2¾    4¼    5¾    7½    9½    11½   –     –     –     –     –     –     –     –     –     –
50 (unfit)       3¼    4¾    6½    8½    –     –     –     –     –     –     –     –     –     –     –     –

(– : no value given in the table)
where the minutes are a rather obscure measure of how fast somebody is able to hike up 300 metres over a distance of 800 metres (\(\approx 20^\circ\)). With that table the rule is: if you get into nastier terrain, drop one fitness level. If you suck at walking, drop a fitness level. If you carry a 20 kg backpack, drop one level. Sadly, there’s no equation to be found, so I had to make one up myself.
By looking at the table and the mesh plot it seems that each time axis for a given fitness is logarithmic.
I did a log-log plot and it turns out that the series not only appear to be logarithmic in time, but also in fitness. By deriving the (log-log-)linear regression for each series, the following equations can be found:
These early approximations appear to be quite good, as can be seen in the following linear plot. The last three lines \(t_{30}\), \(t_{40}\) and \(t_{50}\) however begin to drift away. That’s expected for the last two due to the small number of samples, but the \(t_{30}\) line was irritating.
My first assumption was that the \(t_{40}\) and \(t_{50}\) lines simply are outliers and that the real coefficient for the time variable is the (outlier-corrected) mean of \(1.2215 \pm 0.11207\). This would imply that the intercept coefficient is the variable for fitness.
Unfortunately, this only seems to make things better in the log-log plot, but makes them a little bit worse in the linear world.
Equidistant intercept coefficients also did not do the trick. Well, well. In the end, I decided to give the brute-force method a chance and defined several fitting functions for use with genetic algorithm and pattern search solvers, including exponential, third-order and sigmoidal forms. The best version I could come up with was
This function results in a least-squares error of about 21.35 hours over all data points. The following shows the original surface from the table and the synthetic surface from the function.
A maximum deviation of about 1 hour can be seen clearly in the following error plot for the \(t_{30}\) line, which really seems to be an outlier.
For comparison (here’s the original table), this is the synthetic correction table:
Fitness in minutes (rows) vs. time in hours according to Naismith’s rule (columns):

Fitness          2     3     4     5     6     7     8     9     10    12    14    16    18    20    22    24
15 (very fit)    1¼    2     2¾    3½    4½    5¼    6¼    7¼    8¼    10¼   12¼   14½   16½   18¾   21¼   23½
20               1½    2½    3½    4½    5½    6¾    7¾    9     10¼   12¾   15½   18¼   21    23¾   –     –
25               1¾    3     4     5¼    6¾    8     9½    10¾   12¼   15½   18½   –     –     –     –     –
30               2     3¼    4¾    6¼    7¾    9¼    11    12½   –     –     –     –     –     –     –     –
40               2½    4¼    6     7¾    9¾    11¾   –     –     –     –     –     –     –     –     –     –
50 (unfit)       3     5     7¼    9½    –     –     –     –     –     –     –     –     –     –     –     –

(– : no value given in the table)
June 14th, 2014, by Markus