• ## Use your conda environment in Jupyter Notebooks

Sadly, running jupyter notebook from within a conda environment does not imply your notebook also runs in the same environment. Thankfully, there’s an easy fix for that, namely nb_conda, and you’ll get it using

conda install nb_conda


in the environment of your choice. After that, start up your notebook and select the Kernel you want either when creating a new notebook or from the notebook’s Kernel menu:

There we go.
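
To double-check which environment a notebook kernel actually runs in, a small cell like the following can help. This is only a sketch: the helper name `describe_interpreter` is made up for illustration, and the `CONDA_DEFAULT_ENV` variable is assumed to be set only when conda activated the environment, so it may be absent.

```python
import os
import sys


def describe_interpreter():
    """Return the interpreter path, its prefix and the conda env name (if any)."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # may be None if the kernel was not started from an activated environment
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

If the printed prefix points into the environment you selected from the Kernel menu, the notebook is using that environment's Python.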

• ## Building OpenCV for Anaconda Python 3

Don’t judge me, I know how this title sounds. The harsh reality is that, as of the writing of this post, I always have a hard time getting CMake to recognize the right Python path when I’m using an Anaconda environment. In theory it’s just

git clone https://github.com/opencv/opencv.git opencv
mkdir opencv/build && cd opencv/build
cmake -DCMAKE_INSTALL_PREFIX=/usr/local  ..
make -j8
(sudo) make install


and there you go. It will happily pick up your environment’s Python installation and build that for you. Of course, it looks more along the lines of

cmake \
-DCMAKE_INSTALL_PREFIX="/usr/local" \
-DOPENCV_EXTRA_MODULES_PATH="../opencv_contrib/modules" \
-DBUILD_DOCS=OFF \
-DBUILD_TESTS=OFF \
-DBUILD_EXAMPLES=OFF \
-DBUILD_PERF_TESTS=OFF \
-DBUILD_opencv_dnn=OFF \
-DENABLE_FAST_MATH=ON \
-DWITH_OPENMP=ON \
-DWITH_TBB=ON \
-DMKL_WITH_TBB=ON \
-DMKL_WITH_OPENMP=ON \
-DCMAKE_CXX_COMPILER="/usr/bin/g++-5" \
-DCMAKE_C_COMPILER="/usr/bin/gcc-5" \
-DCUDA_HOST_COMPILER="/usr/bin/gcc-5" \
-DCUDA_FAST_MATH=ON \
-DCUDA_ARCH_BIN="5.2" \
-DWITH_CUBLAS=ON \
..


As soon as you add

conda activate my-environment


to the mix though, all hell breaks loose and you end up with a partially configured Python 2 and no Python 3 support at all. The trick seems to be not to rely solely on OpenCV’s standard Python configuration values

PYTHON3_LIBRARY
PYTHON3_EXECUTABLE
PYTHON3_INCLUDE_DIR
PYTHON3_INCLUDE_DIR2
PYTHON3_NUMPY_INCLUDE_DIRS


but rather to use the seemingly undocumented values

PYTHON3_LIBRARIES
PYTHON3_INCLUDE_PATH


as well, giving it the nice appearance of

cmake \
-DCMAKE_BUILD_TYPE=RELEASE \
-DCMAKE_INSTALL_PREFIX="/your/anaconda3" \
-DOPENCV_EXTRA_MODULES_PATH="../opencv_contrib/modules" \
-DBUILD_DOCS=OFF \
-DBUILD_TESTS=OFF \
-DBUILD_EXAMPLES=OFF \
-DBUILD_PERF_TESTS=OFF \
-DBUILD_opencv_dnn=ON \
-DTINYDNN_USE_NNPACK=OFF \
-DTINYDNN_USE_TBB=ON \
-DTINYDNN_USE_OMP=ON \
-DENABLE_FAST_MATH=ON \
-DWITH_OPENMP=ON \
-DWITH_TBB=ON \
-DMKL_WITH_TBB=ON \
-DMKL_WITH_OPENMP=ON \
-DCMAKE_CXX_COMPILER="/usr/bin/g++-5" \
-DCMAKE_C_COMPILER="/usr/bin/gcc-5" \
-DCUDA_HOST_COMPILER="/usr/bin/gcc-5" \
-DCUDA_FAST_MATH=ON \
-DCUDA_ARCH_BIN="5.2" \
-DWITH_CUBLAS=ON \
-DBUILD_opencv_python2=OFF \
-DPYTHON_EXECUTABLE="/your/anaconda3/bin/python3" \
-DPYTHON_LIBRARY="/your/anaconda3/lib/libpython3.6m.so" \
-DPYTHON3_LIBRARY="/your/anaconda3/lib/libpython3.6m.so" \
-DPYTHON3_EXECUTABLE="/your/anaconda3/bin/python3" \
-DPYTHON3_INCLUDE_DIR="/your/anaconda3/include/python3.6m" \
-DPYTHON3_INCLUDE_DIR2="/your/anaconda3/include" \
-DPYTHON3_NUMPY_INCLUDE_DIRS="/your/anaconda3/lib/python3.6/site-packages/numpy/core/include" \
-DPYTHON3_INCLUDE_PATH="/your/anaconda3/include/python3.6m" \
-DPYTHON3_LIBRARIES="/your/anaconda3/lib/libpython3.6m.so" \
-DHDF5_C_LIBRARY_z="/your/anaconda3/lib/libz.so" \
..


And then it’ll do what you want it to do. Because I’m tired of retrying every time, I keep a repo at github.com/sunsided/opencv-cmake with support for the OpenCV Extras Module and some documentation for this.

So, what paths go in there? Well, this blog post has a very flashy way of finding out.

For CMAKE_INSTALL_PREFIX you use

python3 -c "import sys; print(sys.prefix)"


so it will install OpenCV directly into your Anaconda installation.
For PYTHON3_EXECUTABLE you call

which python3


and PYTHON3_INCLUDE_DIR is given by

python3 -c "from distutils.sysconfig import get_python_inc; print(get_python_inc())"


Finally, a PYTHON3_PACKAGES_PATH can be found using

python3 -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())"


which you can use to determine the PYTHON3_NUMPY_INCLUDE_DIRS, although CMake strangely did that part right for me.
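
The lookups above can also be collected in one go. The following is a sketch, not the method used in the post: it swaps `distutils` (deprecated and removed in Python 3.12) for the standard `sysconfig` module, the function name `opencv_python3_cmake_flags` is hypothetical, and the NumPy include directory is only added if NumPy happens to be installed in the active environment.

```python
import sys
import sysconfig


def opencv_python3_cmake_flags():
    """Collect paths of the running interpreter as CMake variable values."""
    paths = sysconfig.get_paths()
    flags = {
        "CMAKE_INSTALL_PREFIX": sys.prefix,
        "PYTHON3_EXECUTABLE": sys.executable,
        "PYTHON3_INCLUDE_DIR": paths["include"],
        "PYTHON3_PACKAGES_PATH": paths["purelib"],
    }
    try:
        import numpy  # optional; only needed for the NumPy include directory
        flags["PYTHON3_NUMPY_INCLUDE_DIRS"] = numpy.get_include()
    except ImportError:
        pass
    return flags


flags = opencv_python3_cmake_flags()
for name, value in flags.items():
    # print each entry in the form used on the cmake command line
    print(f'-D{name}="{value}" \\')
```

Run it with the Python interpreter of the activated environment, so that the printed values refer to that environment rather than to the system Python.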

• ## Using TensorFlow’s Supervisor with TensorBoard summary groups

One of TensorFlow’s more awesome parts is definitely TensorBoard, i.e. the capability of collecting and visualizing data from the TensorFlow graph as the network is running while also being able to display and browse the graph itself. Coming from Caffe, where I eventually wrote my own tooling just to visualize the training loss from logs of the raw console output and had to copy-paste the graph’s prototxt to some online service in order to visualize it, this is a massive step in the best possible direction. To get some of Caffe’s checkpointing features back, you can use TensorFlow’s Supervisor. This blog post is about using both TensorBoard and the Supervisor for fun and profit.
TL;DR: Scroll to the end for an example of using grouped summaries with the Supervisor.

Apart from just storing scalar data for TensorBoard, the histogram feature turned out to be especially valuable to me for observing the performance of a probability inference step.

Here, the left half shows the distribution of ground truth probability values in the training and validation sets over time, whereas the right half shows the actual inferred probabilities over time. It’s not hard to see that the network is getting better, but there is more to it:

• The histogram of the ground truth values (here on the left) allows you to verify that your training data is indeed correct. If the data is not balanced, you might learn a network that is biased towards one outcome.
• If the network does indeed obtain some biased view of the data, you’ll clearly see patterns emerging in the inferred histogram that do not match the expected ground truth distribution. In this example, the right histograms approach the left histograms, so it appears to be working fine.
• However, if you only measure network performance in accuracy, as the ratio of correct guesses over all examples, you might be getting the wrong impression: If the input distribution is skewed towards 95% positive and 5% negative examples, a network guessing “positive” 100% of the time is producing only 5% error. If your total accuracy is an aggregate over multiple different values, you will definitely miss this, especially since randomized mini-batches only further obscure this issue.
• Worse, if the learned coefficients run into saturation, learning will stop for them. Again, this might not be obvious if the total loss and accuracy are actually aggregates of different values.
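
The 95%/5% case from the list above can be reproduced in a few lines of plain Python. This is a minimal sketch with made-up data; the helper names `accuracy` and `per_class_recall` are chosen for illustration only.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the labels."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def per_class_recall(labels, predictions):
    """Fraction of correctly predicted examples, computed separately per class."""
    recall = {}
    for cls in sorted(set(labels)):
        indices = [i for i, l in enumerate(labels) if l == cls]
        recall[cls] = sum(predictions[i] == cls for i in indices) / len(indices)
    return recall


# 95 positive and 5 negative examples; the "network" always answers positive
labels = [1] * 95 + [0] * 5
predictions = [1] * 100

overall = accuracy(labels, predictions)
recall = per_class_recall(labels, predictions)
print(overall)  # 0.95
print(recall)   # {0: 0.0, 1: 1.0}
```

The aggregate accuracy looks fine, while the per-class view shows that not a single negative example was recognized, which is the kind of mismatch the histograms make visible.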

### Influence of the learning rate

Let’s take the example of a variable learning rate. If at some point the training slows down, it’s not immediately clear if this is due to the fact that

• a parameter space optimum has been found and training is done,
• the algorithm found a plateau in parameter space and would continue to fall after a few more hundreds or thousands of iterations or
• the training is actually diverging because the learning rate is not small enough in order to enter a local optimum in the first place.

Now optimizers like Adam are tailored to overcome the problems of fixed learning rates, but they too can only go so far: If the learning rate is too big to begin with, it’s still too big after fine-tuning. Or worse, after a couple of iterations the adjusted weights could end up in saturation, and no further update would be able to get them out again.

To rule out at least one part, you can make the learning rate a changeable parameter of the network, e.g. a function of the training iteration. I had some success in using Caffe’s “multi-step” approach of changing the learning rate at fixed iteration numbers – say, reducing it by one decade at iterations 1000, 5000 and 16000 – where I determined these values over different training runs of the network.
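
Such a multi-step schedule can be sketched as a plain function of the iteration. The function name `multistep_learning_rate` and the base rate of 0.1 are assumptions for illustration; the boundaries are the example values from the paragraph above.

```python
from bisect import bisect_right


def multistep_learning_rate(iteration, base_lr=0.1,
                            boundaries=(1000, 5000, 16000), gamma=0.1):
    """Return base_lr scaled by gamma once for every boundary already reached."""
    # number of boundaries that are less than or equal to the current iteration
    drops = bisect_right(boundaries, iteration)
    return base_lr * gamma ** drops


for it in (0, 999, 1000, 5000, 16000):
    print(it, multistep_learning_rate(it))
```

The returned value is what would be fed into the graph for the given iteration, so the schedule can be changed between training runs without touching the graph itself.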

So instead of baking the learning rate into the graph during construction, you would define a placeholder for it and feed the learning rate of the current epoch/iteration into the optimization operation each time you call it, like so:

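import tensorflow as tf  # TensorFlow 1.x API

with tf.Graph().as_default() as graph:
    # the learning rate is fed at run time instead of being baked into the graph
    p_lr = tf.placeholder(tf.float32, (), name='learning_rate')
    t_loss = tf.reduce_mean(...)
    # the optimizer choice (Adam) is an assumption; any optimizer
    # accepting a tensor as learning rate works the same way
    op_minimize = tf.train.AdamOptimizer(learning_rate=p_lr) \
        .minimize(t_loss)

with tf.Session(graph=graph) as sess:
    init = tf.group(tf.global_variables_initializer(),
                    tf.local_variables_initializer())
    sess.run(init)

    for _ in range(0, epochs):
        learning_rate = 0.1
        loss, _ = sess.run([t_loss, op_minimize],
                           feed_dict={p_lr: learning_rate})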


Alternatively, you could make it a non-learnable Variable and explicitly assign it whenever it needs to be changed; let’s assume we don’t do that.
The first thing I usually do is then to also add a summary node to track the current learning rate (as well as the training loss):

import tensorflow as tf  # TensorFlow 1.x API

with tf.Graph().as_default() as graph:
    p_lr = tf.placeholder(tf.float32, (), name='learning_rate')
    t_loss = tf.reduce_mean(...)
    # the optimizer choice (Adam) is an assumption
    op_minimize = tf.train.AdamOptimizer(learning_rate=p_lr) \
        .minimize(t_loss)

    tf.summary.scalar('learning_rate', p_lr)
    tf.summary.scalar('loss', t_loss)

    # histograms work the same way
    tf.summary.histogram('probability', t_some_batch)

    s_merged = tf.summary.merge_all()

writer = tf.summary.FileWriter('log', graph=graph)
with tf.Session(graph=graph) as sess:
    init = tf.group(tf.global_variables_initializer(),
                    tf.local_variables_initializer())
    sess.run(init)

    for epoch in range(0, epochs):
        learning_rate = 0.1
        loss, summary, _ = sess.run([t_loss, s_merged, op_minimize],
                                    feed_dict={p_lr: learning_rate})
        # write the evaluated summaries to the event file
        writer.add_summary(summary, global_step=epoch)


Now, for each epoch, the values of the t_loss and p_lr tensors are stored in a protocol buffer file in the log subdirectory. You can then start TensorBoard with the --logdir parameter pointing to it and get a nice visualization of the training progress.

One example where doing this massively helped me track down errors is exactly the network I took the introductory histogram picture from; here, I set the learning rate to 0.1 for about two hundred iterations before dropping it to 0.01. It turned out that having the learning rate this high for my particular network did result in saturation, and learning effectively stopped. The histogram helped me notice the issue and the scalar graph helped determine the “correct” learning rates.

### Training and validation set summaries

Suppose now you want to have different summaries that may or may not appear on different instances of the graph. The learning rate, for example, has no influence on the outcome of the validation batch, so including it in validation runs is only eating up time, memory and storage. However, the tf.summary.merge_all() operation doesn’t care where the summaries live per se – and since some summaries depend on nodes from the training graph (e.g. the learning rate placeholder), you suddenly create a dependency on nodes you didn’t want to trigger – with effects of very varying levels of fun.

It turns out that summaries can be bundled into collections – e.g. “train” and “test” – by specifying their membership upon construction, so that you can later obtain only those summaries that belong to the specified collections:

import tensorflow as tf  # TensorFlow 1.x API

with tf.Graph().as_default() as graph:
    p_lr = tf.placeholder(tf.float32, (), name='learning_rate')
    t_loss = tf.reduce_mean(...)
    # the optimizer choice (Adam) is an assumption
    op_minimize = tf.train.AdamOptimizer(learning_rate=p_lr) \
        .minimize(t_loss)

    tf.summary.scalar('learning_rate', p_lr, collections=['train'])
    tf.summary.scalar('loss', t_loss, collections=['train', 'test'])

    # merge summaries per collection
    s_training = tf.summary.merge_all('train')
    s_test = tf.summary.merge_all('test')

writer = tf.summary.FileWriter('log', graph=graph)
with tf.Session(graph=graph) as sess:
    init = tf.group(tf.global_variables_initializer(),
                    tf.local_variables_initializer())
    sess.run(init)

    for epoch in range(0, epochs):
        # during training
        learning_rate = 0.1
        loss, summary, _ = sess.run([t_loss, s_training, op_minimize],
                                    feed_dict={p_lr: learning_rate})
        writer.add_summary(summary, global_step=epoch)

        # during validation; the learning rate placeholder is not needed here
        loss, summary = sess.run([t_loss, s_test])
        writer.add_summary(summary, global_step=epoch)


In combination with liberal use of tf.name_scope(), the result could then look like the following image. The graph shows three different training runs, where we now have the ability to reason about the choice(s) of the learning rate.

This works, but we can do better.

### Using the Supervisor

One currently very underrepresented (documentation-wise) yet powerful feature of TensorFlow’s Python API is the Supervisor, a manager that basically takes care of writing summaries, taking snapshots, running queues (should you use them, which you probably do), initializing variables and also gracefully stopping training.

In order to use the Supervisor you basically swap out your own session with a managed one, skip variable initialization and tell it when you want which of your custom summaries to be stored. Adding a global_step variable to the graph is not required but apparently good practice; should the Supervisor find such a variable, it will automatically use it for internal coordination. If you bind the variable to the optimizer, it will also be incremented automatically for each optimization step, freeing you from having to keep track of the iteration yourself. Here’s an example of how to use it:

import tensorflow as tf  # TensorFlow 1.x API

with tf.Graph().as_default() as graph:
    p_lr = tf.placeholder(tf.float32, (), name='learning_rate')
    t_loss = tf.reduce_mean(...)

    global_step = tf.Variable(0, name='global_step', trainable=False)
    # the optimizer choice (Adam) is an assumption; passing global_step
    # makes the optimizer increment it on every optimization step
    op_minimize = tf.train.AdamOptimizer(learning_rate=p_lr) \
        .minimize(t_loss, global_step=global_step)

    tf.summary.scalar('learning_rate', p_lr, collections=['train'])
    tf.summary.scalar('loss', t_loss, collections=['train', 'test'])

    s_training = tf.summary.merge_all('train')
    s_test = tf.summary.merge_all('test')

# create the supervisor and obtain a managed session;
# variable initialization will now be done automatically.
sv = tf.train.Supervisor(logdir='log', graph=graph)
with sv.managed_session() as sess:

    # run until training should stop
    while not sv.should_stop():
        learning_rate = 0.1
        loss, s, i, _ = sess.run([t_loss, s_training,
                                  global_step, op_minimize],
                                 feed_dict={p_lr: learning_rate})

        # hand over your own summaries to the Supervisor
        sv.summary_computed(sess, s, global_step=i)

        loss, s = sess.run([t_loss, s_test])
        sv.summary_computed(sess, s, global_step=i)

        # ... at some point, request a stop
        sv.request_stop()


The Supervisor will also add additional summaries to your graph for free, e.g. an insight into the number of training steps per second. This could allow you to fine-tune minibatch sizes, for example, because they currently tend to have a big impact on the host-to-device transfer of the data.
Unlike Caffe, the Supervisor will by default keep only the last five snapshots of the learned weights; unless you fear missing the validation loss optimum, leaving the training running for days is no longer an issue, disk-wise at least.

• ## Getting an image into and out of TensorFlow

Let’s assume you already have an image in numpy’s ndarray format, e.g. because you loaded it with OpenCV’s imread() function, and you want to convert it to TensorFlow’s Tensor format and later back to ndarray.

That’s essentially three calls to TensorFlow:

import cv2
import tensorflow as tf
import numpy as np

# load the image (the file name is a placeholder), normalize the
# pixel values to 0..1 range and convert them to a single-precision tensor
image_in = cv2.imread('image.png') / 255.
t = tf.convert_to_tensor(image_in, dtype=tf.float32)
assert isinstance(t, tf.Tensor)

# in order to convert the tensor back to an array, we need
# to evaluate it; for this, we need a session
with tf.Session() as sess:
    image_out = sess.run(fetches=t)
    assert isinstance(image_out, np.ndarray)

# for imshow to work, the image needs to be in 0..1 range
# whenever it is a float; that's why we normalized it.
cv2.imshow('Image', image_out)
cv2.waitKey(0)  # keep the window open until a key is pressed


Note that instead of using sess.run(t) we could also have used

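with tf.Session() as sess:
    # the first positional argument of eval() is feed_dict,
    # so the session has to be passed by keyword
    image_out = t.eval(session=sess)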


which essentially performs the same action. A benefit of using sess.run() directly is that we can fetch more than one tensor in the same pass through the (sub-)graph (say, tuple = sess.run(fetches=[t1, t2, t3])), whereas calling tensor.eval() always results in one separate pass per call.
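
The normalization mentioned in the comments of the example can be tried out with NumPy alone. The following is a sketch under stated assumptions: the helper names `to_float_image` and `to_uint8_image` are made up, and a small synthetic array stands in for the result of imread().

```python
import numpy as np


def to_float_image(image_u8):
    """Scale an 8-bit image to float32 values in the 0..1 range."""
    return image_u8.astype(np.float32) / 255.0


def to_uint8_image(image_f32):
    """Scale a 0..1 float image back to 8 bit, clipping out-of-range values."""
    return np.clip(np.rint(image_f32 * 255.0), 0, 255).astype(np.uint8)


# synthetic 2x2 image with three channels, standing in for a loaded image
image_in = np.array([[[0, 128, 255], [10, 20, 30]],
                     [[255, 255, 255], [0, 0, 0]]], dtype=np.uint8)

image_float = to_float_image(image_in)    # suitable for a float32 tensor
image_back = to_uint8_image(image_float)  # suitable for writing to disk

print(image_float.dtype, image_float.min(), image_float.max())
print(np.array_equal(image_back, image_in))
```

Converting back to 8 bit is only needed when the result should be stored or processed as a regular image; for display, the float image in 0..1 range is enough.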

• ## Dockerized .NET Core 1.0.1 and .NET Framework on Mono 4.7

TL;DR: The setup is available via GitHub and you can directly pull the sunside/dotnet Docker image.

Three components are used in this process:

• An Ubuntu Trusty base image that has libuv installed (required for Kestrel)
• A mono installation that supports .NET 4.6.1
• .NET Core 1.0.1

Ubuntu Trusty’s libuv is pretty old, so we’re building it from source. The base image is pretty straightforward, just use ubuntu:trusty, install the requirements for building libuv, as well as a bit of candy and then immediately throw away half of the stuff:

FROM ubuntu:trusty

RUN LIBUV_VERSION=1.9.1 \
&& apt-get update \
&& apt-get -y install vim-tiny nano curl wget autoconf automake build-essential libtool \
&& curl -sSL https://github.com/libuv/libuv/archive/v${LIBUV_VERSION}.tar.gz | tar zxfv - -C /usr/local/src \
&& cd /usr/local/src/libuv-$LIBUV_VERSION \
&& sh autogen.sh && ./configure && make && make install \
&& rm -rf /usr/local/src/libuv-$LIBUV_VERSION \
&& ldconfig \
&& apt-get -y purge autoconf automake build-essential libtool \
&& apt-get -y autoremove \
&& apt-get -y clean \
&& rm -rf /var/lib/apt/lists/*


Next step is mono. I’m using nightly builds here, but any modern installation would probably work. It’s pretty straightforward: mono-devel is required to get the system libraries (otherwise dotnet restore will be unable to restore frameworkDependencies), the rest is the compiler.

ENV MONO_VERSION 4.7.0.559
ENV DEBIAN_MONO_VERSION 4.7.0.559-0nightly1

RUN apt-key adv --keyserver hkp://pgp.mit.edu:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF \
&& echo "deb http://download.mono-project.com/repo/debian nightly main" > /etc/apt/sources.list.d/mono-nightly.list \
&& apt-get update \
&& apt-get upgrade -y \
&& apt-get install -y mono-runtime=$DEBIAN_MONO_VERSION mono-mcs=$DEBIAN_MONO_VERSION mono-xbuild=$DEBIAN_MONO_VERSION mono-devel=$DEBIAN_MONO_VERSION ca-certificates-mono \
&& apt-get -y autoremove \
&& apt-get -y clean \
&& rm -rf /var/lib/apt/lists/*


Finally, .NET Core 1.0.1 and the preview tooling (dotnet-dev-1.0.0-preview2-003131). This block brings in the tooling:

RUN     apt-get update \
&& apt-get install -y apt-transport-https \
&& echo "deb [arch=amd64] https://apt-mo.trafficmanager.net/repos/dotnet/ trusty main" > /etc/apt/sources.list.d/dotnetdev.list \
&& apt-key adv --keyserver apt-mo.trafficmanager.net --recv-keys 417A0893 \
&& apt-get update \
&& apt-get install -y dotnet-dev-1.0.0-preview2-003131 \
&& apt-get -y autoremove \
&& apt-get -y clean \
&& rm -rf /var/lib/apt/lists/*


For docker versions earlier than 1.11.0, you’d also need an additional environment setting to prevent this bug:

ENV LTTNG_UST_REGISTER_TIMEOUT 0


In order to prevent the “one-time” warmup of the dotnet CLI tool, add

ENV NUGET_XMLDOC_MODE skip
RUN mkdir warmup \
&& cd warmup \
&& dotnet new \
&& cd .. \
&& rm -rf warmup


The final and missing clue is this comment on GitHub: When building net4xx (i.e. .NET Framework targets as opposed to .NET Core), you will run into pretty nasty errors stating that System.Native.dll couldn’t be found. Simply patch in a symlink to the file already installed by .NET Core and you’ll be fine:

RUN ln -s /usr/share/dotnet/shared/Microsoft.NETCore.App/1.0.1/System.Native.so /usr/lib/libSystem.Native.so && \
ldconfig


With that, you’re done. Grab the code here or pull the image from the Docker Hub using

docker pull sunside/dotnet:1.0.0-preview2-003131