well …

I just failed in Matlab
September 23rd, 2014 GMT +1 by Markus · 0 comments
Hidden Markov Model training using the Baum-Welch Algorithm
The Baum-Welch algorithm determines the (locally) optimal parameters for a Hidden Markov Model by essentially using three equations.
One for the initial probabilities:
\begin{align}
\pi_i &= \frac{E\left(\text{Number of times a sequence started with state}\, s_i\right)}{E\left(\text{Number of times a sequence started with any state}\right)}
\end{align}
Another for the transition probabilities:
\begin{align}
a_{ij} &= \frac{E\left(\text{Number of times the state changed from}\, s_i \, \text{to}\,s_j\right)}{E\left(\text{Number of times the state changed from}\, s_i \, \text{to any state}\right)}
\end{align}
And the last one for the emission probabilities:
\begin{align}
b_{ik} &= \frac{E\left(\text{Number of times the state was}\, s_i \, \text{and the observation was}\,v_k\right)}{E\left(\text{Number of times the state was}\, s_i\right)}
\end{align}
If one had a fully labeled training corpus representing all possible outcomes, this would be exactly the optimal solution: count each occurrence, normalize and you’re good. If, however, no such labeled training corpus is available — i.e. only observations are given, no corresponding state sequences — the expected values \(E(c)\) of these counts have to be estimated. This can be done (and is done) using the forward and backward probabilities \(\alpha_t(i)\) and \(\beta_t(i)\), as described below.
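In the fully labeled case, the estimator really is just counting and normalizing. A minimal sketch of exactly that in Python (chosen over the post’s Matlab for brevity; the dictionary-based representation and the toy state/observation names are mine):

```python
from collections import Counter

def count_hmm_params(sequences):
    """Estimate pi, A and B from fully labeled (state, observation) sequences
    by counting occurrences and normalizing -- the supervised analogue of the
    three Baum-Welch expectation ratios above."""
    starts, trans, emit = Counter(), Counter(), Counter()
    state_tot, trans_tot = Counter(), Counter()
    for seq in sequences:
        states = [s for s, _ in seq]
        starts[states[0]] += 1                 # sequence started with state s
        for s, v in seq:
            emit[(s, v)] += 1                  # state was s, observation was v
            state_tot[s] += 1
        for s_i, s_j in zip(states, states[1:]):
            trans[(s_i, s_j)] += 1             # state changed from s_i to s_j
            trans_tot[s_i] += 1
    n = len(sequences)
    pi = {s: c / n for s, c in starts.items()}
    A = {sv: c / trans_tot[sv[0]] for sv, c in trans.items()}
    B = {sv: c / state_tot[sv[0]] for sv, c in emit.items()}
    return pi, A, B
```

Without labels, Baum-Welch replaces each of these counts by its expectation under the current model, computed from the forward/backward passes.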
September 1st, 2014 GMT +1 by Markus · 0 comments
Regular Kalman Filter (almost) super-quick Reference
To make long things short, here’s the complete Matlab code.
% State estimations
%   x       state vector                    (M x 1)
%   A       state transition matrix         (M x M)
%   P       state covariance matrix         (M x M)
% Input / control data
%   u       input vector                    (N x 1)
%   B       input transition matrix         (M x N)
%   Q       input noise covariance matrix   (N x N)
% Observations
%   z       observation vector              (Z x 1)
%   H       state-to-observation matrix     (Z x M)
%   R       observation noise covariance    (Z x Z)
% Tuning
%   lambda  tuning parameter                (scalar)
function [x, P] = kf_predict(x, A, P, lambda, u, B, Q)
    x = A*x + B*u;                       % a priori state prediction
    P = A*P*A' * 1/(lambda^2) + B*Q*B';  % a priori covariance
end

function [x, P] = kf_update(x, z, P, H, R)
    y = z - H*x;                 % measurement residual ("innovation")
    S = H*P*H' + R;              % residual (innovation) covariance
    K = P*H' / S;                % Kalman gain
    x = x + K*y;                 % a posteriori state prediction
    P = (eye(size(P)) - K*H)*P;  % a posteriori covariance matrix
end
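For reference, here is one predict/update cycle with everything reduced to scalars (a hypothetical one-dimensional constant-state model with all 1×1 “matrices” and \(\lambda = 1\)) — a Python sketch of the same two steps, not the post’s Matlab:

```python
def kf_predict(x, A, P, lam, u, B, Q):
    # a priori state and covariance prediction (scalar case, all "matrices" 1x1)
    x = A * x + B * u
    P = A * P * A / lam**2 + B * Q * B
    return x, P

def kf_update(x, z, P, H, R):
    y = z - H * x          # measurement residual ("innovation")
    S = H * P * H + R      # innovation covariance
    K = P * H / S          # Kalman gain
    x = x + K * y          # a posteriori state estimate
    P = (1 - K * H) * P    # a posteriori covariance
    return x, P
```

With prior \(x=0, P=1\), no input, and a measurement \(z=1\) with \(H=R=1\), the gain is 0.5 and the update lands exactly halfway between prior and measurement.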
August 27th, 2014 GMT +1 by Markus · 0 comments
Read GPS logs from a Canmore GT740FL on Linux using gpsbabel
I recently bought a Canmore GT740FL GPS logger (labeled Sport LogBook 740FL GPS on Amazon) and found in a review that it can be read out with gpsbabel using the skytraq format (“SkyTraq Venus based loggers (download)”). That mode’s documentation states that the GT750FL can be read out, which indeed also holds for the GT740FL (despite it being a SiRFstar IV device).
On my Ubuntu box, the logger is available as an ACM device (basically a Hayes-style modem) under /dev/ttyACM0; check dmesg | grep tty to be sure.
Here’s the command line to download all tracks in GPX format into a tracks.gpx file.
sudo gpsbabel -i skytraq,baud=115200 -f /dev/ttyACM0 \
    -o gpx -F tracks.gpx
It seems though that gpsbabel downloads everything as a single track (except for manual waypoints, of course). Some sources suggest using baud=38400,initbaud=4800 instead of the 115200 baud option above when using Windows (and for the 730FL, to be exact); I didn’t test that, however, as 115.2 kbaud worked fine for me.
To also delete the tracks, you may append the erase option, or simply delete all tracks without downloading using -i skytraq,erase,no-output.
If you’d like to see live GPS data you may use minicom, e.g. with
sudo minicom -b 115200 -D /dev/ttyACM0
Use Ctrl-A, Z to open minicom’s menu, then X to exit.
June 29th, 2014 GMT +1 by Markus · 1 comment
Naismith, Aitken-Langmuir, Tranter and Tobler: Modeling hiking speed
While planning an eleven-day trekking trip through the Hardangervidda in Norway, I came across the age-old problem of estimating the walking time for a given path on the map. While one is easily able to determine the times for the main west-east and north-south routes from a travel guide, there sadly is no information about those self-made problems (i.e. custom routes). Obviously, a simple and correct solution needs to be found.
Of course, there is no such thing. When searching for hiking time rules, two candidates pop up regularly: Naismith’s rule (including Tranter’s corrections), as well as Tobler’s hiking function.
William W. Naismith’s rule — and I couldn’t find a single scientific source — is more a rule of thumb than it is exact. It states:
For every 5 kilometres, allow one hour. For every 600 metres of ascent, add another hour.
which reads as
\begin{align}
\theta &= \tan^{-1}\left(\frac{\Delta a}{\Delta s}\right) \\
t &= \Delta s \left( \frac{1\,\mathrm{h}}{5\,\mathrm{km}} \right) + \Delta a \left( \frac{1\,\mathrm{h}}{0.6\,\mathrm{km}} \right) \\
\vec{w} &= \frac{\Delta s}{t}
\end{align}
where \(\vec{w}\) is the walking speed, \(\Delta s\) the length on the horizontal plane (i.e. “forward”), \(\Delta a\) the ascent (i.e. the difference in height) and \(\theta\) the slope.
function [w, t, slope] = naismith(length, ascend)
    slope = ascend/length;
    t = length*(1/5) + ascend*(1/0.6);
    w = length./t;
end
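The same rule as a Python sketch (the rates are from the quote above; the argument names are mine, and I return the slope in degrees rather than as a ratio):

```python
import math

def naismith(dist_km, ascent_km):
    """Naismith's rule: 1 h per 5 km forward plus 1 h per 600 m of ascent."""
    t = dist_km / 5 + ascent_km / 0.6                 # time in hours
    slope = math.degrees(math.atan2(ascent_km, dist_km))
    return dist_km / t, t, slope                      # speed (km/h), time (h), slope (deg)
```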
Plotted, that looks like the following.
Interestingly, this implies that if you climb a 3 km mountain straight up, it will take you 5 hours. By recognising that \(5 \textrm{km} / 0.6 \textrm{km} \approx 8.3 \approx 8\) , the 8 to 1 rule can be employed, which allows the transformation of any (Naismithish) track to a flat track by calculating
\begin{align}
\Delta s_{flat} &= \Delta s + \frac{5 \mathrm{km}}{0.6 \mathrm{km}} \cdot \Delta a\\
&\approx \Delta s + 8 \cdot \Delta a
\end{align}
So a track of \(20 \textrm{km}\) in length with \(1 \textrm{km}\) of ascent would make for \(20 \mathrm{km} + 8 \cdot 1 \mathrm{km} = 28 \mathrm{km}\) of total track length. Assuming an average walking speed of \(5 \mathrm{km/h}\), that route will take \(28 \mathrm{km} / 5 \mathrm{km/h} = 5.6 \mathrm{h}\), or 5 hours and 36 minutes. Although quite inaccurate, somebody found this rule to be accurate enough when comparing it against times of men running down hills in Norway. Don’t quote me on that.
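The 8-to-1 shortcut from the worked example, as a Python two-liner (a sketch; the function names are mine):

```python
def flat_equivalent_km(dist_km, ascent_km, ratio=8):
    # 8-to-1 rule: each km of ascent counts as roughly 8 km of flat track
    return dist_km + ratio * ascent_km

def naismith_time_h(dist_km, ascent_km, speed_kmh=5):
    # walking time at a constant flat-ground speed over the flattened track
    return flat_equivalent_km(dist_km, ascent_km) / speed_kmh
```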
Robert Aitken assumed that 5 km/h might be too much and settled for 4 km/h on all off-track surfaces. Unfortunately, Naismith’s rule still didn’t state anything about descent or slopes in general, so Eric Langmuir added some refinements:
When walking off-track, allow one hour for every 4 kilometres (instead of 5 km). On a small decline of 5 to 12°, subtract 10 minutes per 300 metres (1000 feet) of descent. For any steeper decline (i.e. over 12°), add 10 minutes per 300 metres of descent.
Now that’s the stuff wonderfully non-differentiable functions are made of:
That is:
\begin{align}
\theta &= \tan^{-1}\left(\frac{\Delta a}{\Delta s}\right) \\
t &= \Delta s \left( \frac{1\,\mathrm{h}}{5\,\mathrm{km}} \right) + \begin{cases}
+\Delta a \left( \frac{1\,\mathrm{h}}{0.6\,\mathrm{km}} \right), & \text{if $\theta \ge 0$} \\
-\left|\Delta a\right| \left( \frac{\left(10/60\right)\,\mathrm{h}}{0.3\,\mathrm{km}} \right), & \text{if $-12^\circ \le \theta \le -5^\circ$} \\
+\left|\Delta a\right| \left( \frac{\left(10/60\right)\,\mathrm{h}}{0.3\,\mathrm{km}} \right), & \text{if $\theta < -12^\circ$} \\
0, & \text{otherwise}
\end{cases} \\
\vec{w} &= \frac{\Delta s}{t}
\end{align}
It should be clear that 12 km/h is a highly unlikely speed, even on roads.
function [w, t, slope] = naismith_al(length, ascend, base_speed)
    if ~exist('base_speed', 'var')
        base_speed = 4; % km/h
    end
    slope = ascend/length;
    t = length*(1/base_speed);
    if slope >= 0
        t = t + ascend*(1/0.6);
    elseif atand(slope) <= -5 && atand(slope) >= -12
        t = t - abs(ascend)*((10/60)/0.3);
    elseif atand(slope) < -12
        t = t + abs(ascend)*((10/60)/0.3);
    end
    w = length./t;
end
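The same case distinction ported to Python (the thresholds and rates are from the rule above; the argument names and the degree-based slope are my choices for the sketch):

```python
import math

def naismith_al(dist_km, ascent_km, base_speed_kmh=4):
    """Naismith with Aitken's 4 km/h off-track base speed and Langmuir's
    descent refinements."""
    theta = math.degrees(math.atan2(ascent_km, dist_km))  # slope in degrees
    t = dist_km / base_speed_kmh
    if ascent_km >= 0:
        t += ascent_km / 0.6                     # 1 h per 600 m of ascent
    elif -12 <= theta <= -5:
        t -= abs(ascent_km) * (10 / 60) / 0.3    # subtract 10 min per 300 m
    elif theta < -12:
        t += abs(ascent_km) * (10 / 60) / 0.3    # add 10 min per 300 m
    return dist_km / t, t, theta
```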
So Waldo Tobler came along and developed his “hiking function”, an equation that assumes a top speed of 6 km/h with an interesting feature: it — though still non-differentiable — adapts gracefully to the slope of the ground. The function can be found in his 1993 report “Three presentations on geographical analysis and modeling: Non-isotropic geographic modeling; Speculations on the geometry of geography; Global spatial analysis” and looks like the following:
It boils down to the following equation of the walking speed \(\vec{w}\) “on footpaths in hilly terrain” (with \(s=1\) ) and “offpath travel” (with \(s=0.6\) ):
\begin{align}
\vec{w} = s \cdot 6 e^{-3.5 \cdot \left| \tan(\theta) + 0.05 \right|}
\end{align}
where \(\tan(\theta)\) is the tangent of the slope (i.e. vertical distance over horizontal distance). By taking into account the exact slope of the terrain, this function is superior to Naismith’s rule and a much better alternative to the Langmuir bugfix, especially when used on GIS data.
function [w] = tobler(slope, scaling)
    w = scaling*6*exp(-3.5 * abs(slope + 0.05));
end
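A Python sketch of the hiking function (the `on_path` flag is my shorthand for the \(s=1\) / \(s=0.6\) distinction; the slope argument is \(\tan\theta\), i.e. vertical over horizontal distance):

```python
import math

def tobler(slope, on_path=True):
    """Tobler's hiking function: walking speed in km/h for a given slope
    (dh/dx). s = 1 on footpaths, s = 0.6 for off-path travel."""
    s = 1.0 if on_path else 0.6
    return s * 6 * math.exp(-3.5 * abs(slope + 0.05))
```

Note the maximum of 6 km/h sits at a slight downhill slope of −0.05, not on flat ground.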
It however lacks the one thing that makes the Naismith rule stand out: Tranter’s corrections for fatigue and fitness. (Yes, I know it gets weird.) Sadly, these corrections seem to exist only in the form of a mystical table that looks, basically, like this:

Fitness        Time in hours according to Naismith's rule
(minutes)      2    3    4    5    6    7    8    9    10   12   14   16   18   20   22  24
15 (very fit)  1    1½   2    2¾   3½   4½   5½   6¾   7¾   10   12½  14½  17   19½  22  24
20             1¼   2¼   3¼   4½   5½   6½   7¾   8¾   10   12½  15   17½  20   23
25             1½   3    4¼   5½   7    8½   10   11½  13¼  15   17½
30             2    3½   5    6¾   8½   10½  12½  14½
40             2¾   4¼   5¾   7½   9½   11½
50 (unfit)     3¼   4¾   6½   8½

where the fitness minutes are a rather obscure measure of how fast somebody is able to hike up 300 metres over a distance of 800 metres (\(\approx 20^\circ\)). With that table, the rule is: if you get into nastier terrain, drop one fitness level; if you suck at walking, drop a fitness level; if you carry a 20 kg backpack, drop one level. Sadly, there’s no equation to be found, so I had to make one up myself.
hours   = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24];
fitness = [15, 20, 25, 30, 40, 50];
table   = [1,    1.5,  2,    2.75, 3.5,  4.5,  5.5,  6.75, 7.75,  10,   12.5, 14.5, 17,   19.5, 22,   24;
           1.25, 2.25, 3.25, 4.5,  5.5,  6.5,  7.75, 8.75, 10,    12.5, 15,   17.5, 20,   23,   NaN,  NaN;
           1.5,  3,    4.25, 5.5,  7,    8.5,  10,   11.5, 13.25, 15,   17.5, NaN,  NaN,  NaN,  NaN,  NaN;
           2,    3.5,  5,    6.75, 8.5,  10.5, 12.5, 14.5, NaN,   NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN;
           2.75, 4.25, 5.75, 7.5,  9.5,  11.5, NaN,  NaN,  NaN,   NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN;
           3.25, 4.75, 6.5,  8.5,  NaN,  NaN,  NaN,  NaN,  NaN,   NaN,  NaN,  NaN,  NaN,  NaN,  NaN,  NaN];
By looking at the table and the mesh plot, it seems that the time axis for each given fitness is logarithmic.
I did a log-log plot, and it turns out that the series appear to be logarithmic not only in time but also in fitness. By deriving the (log-log) linear regression for each series, the following equations can be found:
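That regression is just ordinary least squares on the logs. A self-contained Python sketch (pure Python instead of the Matlab used elsewhere in the post), applied to the fitness-15 series from the table:

```python
import math

def loglog_fit(ts, vs):
    """Ordinary least squares of ln(v) = a*ln(t) + b, i.e. the (log-log)
    linear regression used for each fitness series."""
    xs = [math.log(t) for t in ts]
    ys = [math.log(v) for v in vs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b

hours = [2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, 22, 24]
t15 = [1, 1.5, 2, 2.75, 3.5, 4.5, 5.5, 6.75, 7.75, 10, 12.5, 14.5, 17, 19.5, 22, 24]
a, b = loglog_fit(hours, t15)   # roughly the 1.35 and -1.08 reported below
```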
\begin{align}
t_{15}(t) &= e^{1.35 \ln(t) - 1.08} \\
t_{20}(t) &= e^{1.24 \ln(t) - 0.55} \\
t_{25}(t) &= e^{1.25 \ln(t) - 0.33} \\
t_{30}(t) &= e^{1.31 \ln(t) - 0.21} \\
t_{40}(t) &= e^{1.14 \ln(t) + 0.20} \\
t_{50}(t) &= e^{1.05 \ln(t) + 0.44}
\end{align}
These early approximations appear to be quite good, as can be seen in the following linear plot. The last three lines \(t_{30}\), \(t_{40}\) and \(t_{50}\), however, begin to drift away. That’s expected for the last two due to the small number of samples, but the \(t_{30}\) line was irritating.
My first assumption was that the \(t_{40}\) and \(t_{50}\) lines are simply outliers and that the real coefficient for the time variable is the (outlier-corrected) mean of \(1.2215 \pm 0.11207\). This would imply that the intercept coefficient is the variable carrying the fitness.
Unfortunately, this only seems to make things better in the log-log plot, while making them a little bit worse in the linear world.
Equidistant intercept coefficients did not do the trick either. Well, well. In the end, I decided to give the brute-force method a chance and defined several fitting functions for use with genetic-algorithm and pattern-search solvers, including exponential, third-order and sigmoidal forms. The best version I could come up with was
\begin{align}
t_{corrected}(t, f) &= 0.31381 \, e^{1.2097 \ln(t) + 0.81328 \ln(f) - 1.7307}
\end{align}
This function results in a least-squared error of about 21.35 hours over all data points. The following shows the original surface from the table and the synthetic surface from the function.
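The fitted correction as a function — a Python sketch using the coefficients above (the function name is mine). As a spot check, five Naismith hours at fitness 15 come out near the 3½ hours in the synthetic table below:

```python
import math

def tranter_corrected_h(naismith_h, fitness_min):
    """Fitted approximation of Tranter's correction table, using the
    brute-forced coefficients from the regression above."""
    return 0.31381 * math.exp(1.2097 * math.log(naismith_h)
                              + 0.81328 * math.log(fitness_min)
                              - 1.7307)
```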
A maximum deviation of about 1 hour can be seen clearly in the following error plot for the \(t_{30}\) line, which really seems to be an outlier.
For comparison (here’s the original table), this is the synthetic correction table:
Fitness        Time in hours according to Naismith's rule
(minutes)      2    3    4    5    6    7    8    9    10   12   14   16   18   20   22   24
15 (very fit)  1¼   2    2¾   3½   4½   5¼   6¼   7¼   8¼   10¼  12¼  14½  16½  18¾  21¼  23½
20             1½   2½   3½   4½   5½   6¾   7¾   9    10¼  12¾  15½  18¼  21   23¾
25             1¾   3    4    5¼   6¾   8    9½   10¾  12¼  15½  18½
30             2    3¼   4¾   6¼   7¾   9¼   11   12½
40             2½   4¼   6    7¾   9¾   11¾
50 (unfit)     3    5    7¼   9½

June 14th, 2014 GMT +1 by Markus · 0 comments