Writing Python wrappers for C++ (OpenCV) code, part I.

As I mentioned a couple of posts ago, we love to use C++ to make our methods run fast. And, like many Computer Vision engineers, we use OpenCV. However, a few things are tricky to do in C++, such as building web services. So, for an upcoming API that we are developing (!), we decided to go with Python for the web part while keeping the performance gains of C++. Hence the need to build a Python wrapper for the C++ code.

There are a couple of library options for this, but for our needs Boost.Python is by far the best fit. The major part of writing these wrappers is converting data between Python and C++. The first conversion required is between matrix types: OpenCV's cv::Mat to/from NumPy ndarrays.

Fortunately, there is a very useful lib called numpy-opencv-converter that implements exactly this conversion. However, many other conversions still have to be implemented manually, as it is impossible to predict every combination of data types one can write in his/her code.

We will begin with a simple example that uses the Boost library to convert a Python parameter (a tuple) into the type a C++ function expects (cv::Rect).

The definition of the method is this:

cv::Mat FaceDetection::getPose(const cv::Mat &image, const cv::Rect &rect);

Notice that, using the converter lib mentioned above, we will have no problems with the return value and the image parameter, as they are cv::Mat. However, the Python code has no idea what a cv::Rect is, so we need a helper function to call this method.

Usually, you can export a method using Boost.Python as simply as this:

       .def("getPose", &FaceDetection::getPose)

But again, if we do this without a converter (to be discussed in Part II), there will be a problem when calling this method. Instead, we can create a helper function that performs the conversion to cv::Rect when the method is called. In the Python version of OpenCV, a rectangle is defined by a tuple of two points, so the helper function becomes:

cv::Mat FaceDetection_getPose(FaceDetection& self, const cv::Mat &image, py::tuple rect) {
    // extract the two corner points, then the coordinates of each
    py::tuple tl = py::extract<py::tuple>(rect[0])();
    py::tuple br = py::extract<py::tuple>(rect[1])();

    int tl_x = py::extract<int>(tl[0])();
    int tl_y = py::extract<int>(tl[1])();
    int br_x = py::extract<int>(br[0])();
    int br_y = py::extract<int>(br[1])();

    cv::Rect cv_rect(cv::Point(tl_x, tl_y), cv::Point(br_x, br_y));

    return self.getPose(image, cv_rect);
}

Now, if we call this function with a tuple, the system knows what to do. The only thing left is to bind this helper function to the method of our class:

       .def("getPose", &FaceDetection_getPose)

And voilà. We can call the method FaceDetection.getPose(…) from Python (once the module is imported, of course) without any problem. This is nice and all, but you may be wondering if you have to write this kind of helper function every time your data is not natively supported by Boost.Python. The answer is no: it is fairly simple to create converters for your data types. We'll show that in a future post, Part II.
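To make the expected tuple layout concrete, here is a hypothetical pure-Python mirror of what the helper does (rect_from_tuple is just an illustrative name, not part of the wrapper): it unpacks a ((tl_x, tl_y), (br_x, br_y)) tuple into the x, y, width, height values that a cv::Rect stores.

```python
# Hypothetical pure-Python mirror of the C++ helper's unpacking logic.
# It only illustrates the ((tl_x, tl_y), (br_x, br_y)) layout the wrapper expects.
def rect_from_tuple(rect):
    (tl_x, tl_y), (br_x, br_y) = rect
    # a cv::Rect is stored as (x, y, width, height)
    return (tl_x, tl_y, br_x - tl_x, br_y - tl_y)

print(rect_from_tuple(((10, 20), (110, 140))))  # → (10, 20, 100, 120)
```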

Using sklearn.RandomizedPCA to drastically reduce PCA running times.

Hello again! The last post (for now) about dimensionality reduction tackles the following problem: even though the trick we talked about in the previous post can reduce memory consumption and execution time, sometimes it is still not enough.

We experienced this while working on facial recognition here at Meerkat, where we had a huge set of training data with points of large dimensionality. We tried a NIPALS implementation that reduces memory consumption, but it did not improve performance (i.e. it was too slow). Sklearn came to the rescue! The lib has a nice, easy-to-use implementation of a randomized PCA technique.

We made a small class to apply this method, which reads and writes .mat files for MATLAB. This was useful for us because we often implement MATLAB prototypes before going to Python/C++. We thought this code could be helpful for a lot of people searching for this, so here it goes:

import numpy as np
import scipy.io as sio
from sklearn.decomposition import RandomizedPCA
# (in newer sklearn versions this became PCA(svd_solver='randomized'))


class PcaReduction:
    def __init__(self, reduce_dim):
        self.reduce_dim = int(reduce_dim)

    def reduce(self):
        print("Running randomized PCA extraction...")
        self.rand_pca = RandomizedPCA(n_components=self.reduce_dim)
        self.rand_pca.fit(self.data)

    def load_np_data(self, filename):
        print('Reading numpy data from file...')
        self.data = np.load(filename)
        print('done. Matrix size ', self.data.shape)

    def load_mat_data(self, filename):
        print('Reading MATLAB data from file...')
        values = sio.loadmat(filename)
        self.data = values['X']  # loadmat returns a dict of variables
        print('done. Matrix size ', self.data.shape)

    def save_mat_file(self, matlab_filename):
        mean_X = self.rand_pca.mean_
        pca_base = self.rand_pca.components_
        d_values = {'pca_base': pca_base,
                    'mean_X': mean_X}
        sio.savemat(matlab_filename, d_values)

The RandomizedPCA from sklearn is much faster than the original PCA, even when the "transpose-matrix trick" is implemented. To give an idea: with this code, we were able to reduce the execution time from around 6 hours to merely 15 minutes! Notice that the PCA basis produced by this method is an approximation, but for our facial recognition method we did not encounter any problems in the end results.
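The speed-up comes from projecting the data onto a small random subspace before decomposing it. The core idea can be sketched in a few lines of NumPy (a simplified illustration; randomized_pca is our own name and this is not sklearn's actual implementation, which has more refinements):

```python
import numpy as np

def randomized_pca(X, n_components, n_oversamples=10, n_iter=4, seed=0):
    """Illustrative randomized PCA sketch: project the data onto a small
    random subspace, refine it with a few power iterations, then run the
    SVD on the small projected matrix instead of the full data matrix."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                       # center the data
    k = n_components + n_oversamples
    # random projection captures (approximately) the top-k range of Xc
    Y = Xc @ rng.standard_normal((Xc.shape[1], k))
    Q, _ = np.linalg.qr(Y)
    for _ in range(n_iter):                       # power iterations sharpen the basis
        Q, _ = np.linalg.qr(Xc @ (Xc.T @ Q))
    B = Q.T @ Xc                                  # small k x dim matrix
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:n_components]                      # approximate principal directions
```

All the expensive linear algebra happens on k-column matrices, with k just slightly larger than the number of components you want, which is why it scales so much better than a full eigendecomposition.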



Doc for RandomizedPCA in sklearn

Small trick to compute PCA for large training datasets

In the last post, we talked about the choice of method for dimensionality reduction and how PCA is sometimes overused due to its popularity and availability in several libs. Having said that, we are still going to talk about PCA, because we use it a lot! 🙂 If you do not know how PCA works, a very good introduction is the one by Lindsay Smith, check it out. PCA is simply defined by the eigenvectors of the data covariance matrix. When you want to reduce dimensionality, only the eigenvectors associated with the N largest eigenvalues are kept, forming a basis for the reduction. If this sounds weird, don't worry: the important part here is the covariance matrix, and it's the main cause of headaches for large datasets.
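That textbook definition translates directly into NumPy (a minimal sketch; pca_basis is our illustrative name):

```python
import numpy as np

def pca_basis(X, n_components):
    """PCA by the book: eigenvectors of the data covariance matrix,
    keeping the ones with the largest eigenvalues. One point per row."""
    Xc = X - X.mean(axis=0)
    C = np.cov(Xc, rowvar=False)            # the (dim x dim) covariance matrix
    evals, evecs = np.linalg.eigh(C)        # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_components]
    return evecs[:, order], evals[order]    # basis columns + their variances
```

Each kept eigenvalue is exactly the variance of the data projected onto its eigenvector, which is why keeping the N largest ones preserves as much variance as an N-dimensional linear sub-space can.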

Imagine that we have data points of very large dimensionality, say 50k (this is very common in Computer Vision!). The covariance matrix of any number of such points is going to be a 50k x 50k matrix, which is huge. To get an idea of how huge: using the standard 8 bytes per float, this matrix takes around 18GB of memory, just to store it! This problem is well known, and one nice (and quite old) trick to compute PCA is described in the seminal Eigenfaces paper [1]. So, assuming that we arrange our P points of dimensionality M as the columns of an M x P matrix A (with the mean point subtracted), the algebraic way of computing the covariance matrix is:

C = A Aᵀ    (an M x M matrix; the usual 1/P normalization factor does not change the eigenvectors)
If M = 50k, as in our example, this matrix will be huge, as we said. But if we are only interested in extracting the eigenvalues/eigenvectors, we can instead compute them from the following, much smaller matrix:

L = Aᵀ A    (only P x P)
It turns out that we can recover the original eigenvalues/eigenvectors from this matrix, excluding the eigenvalues (and associated vectors) that are equal to zero. If vᵢ is an eigenvector of Aᵀ A, the corresponding eigenvector of the covariance matrix is obtained by multiplying with A: uᵢ = A vᵢ (then normalizing). This is such a nice trick because usually P << M, i.e. the number of examples in our training dataset is much smaller than the dimensionality. Both MATLAB and OpenCV have this small tweak for PCA implemented:

coeff = pca(X, 'Economy', true); % for MATLAB
cv::calcCovarMatrix(..., cv::COVAR_SCRAMBLED); // in OpenCV
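The trick is also easy to verify in NumPy (a small self-contained sketch with made-up sizes, just to check that the recovered vectors really are eigenvectors of the big matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
M, P = 500, 10                        # dimensionality M >> number of points P
A = rng.standard_normal((M, P))
A -= A.mean(axis=1, keepdims=True)    # subtract the mean point from each column

# Eigen-decompose the small P x P matrix instead of the M x M covariance
L = A.T @ A
evals, V = np.linalg.eigh(L)
order = np.argsort(evals)[::-1]       # sort eigenpairs in descending order
evals, V = evals[order], V[:, order]

# Recover the eigenvectors of A Aᵀ: u_i = A v_i, then normalize.
# Centering removed one degree of freedom, so we drop the zero eigenpair.
k = P - 1
U = A @ V[:, :k]
U /= np.linalg.norm(U, axis=0)
```

Note that A Aᵀ has at most P nonzero eigenvalues anyway (its rank is bounded by P), so nothing is lost by working with the small matrix.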

There are still some cases where P itself is very large; there, the main approach is to extract an approximation of the PCA. We'll be talking about it in the next post.


[1] Turk, Matthew, and Alex P. Pentland. "Face recognition using eigenfaces." Computer Vision and Pattern Recognition, 1991. Proceedings CVPR '91, IEEE Computer Society Conference on. IEEE, 1991.

Dimensionality reduction in large datasets

Hello guys, this is the first post of this tech blog about our discoveries and tips while working on Meerkat's products. Today we are going to talk about a problem we faced a couple of weeks ago: how to reduce high-dimensional data and which dimensionality reduction technique to use.

First, let's contextualize a little bit more. We were working on facial recognition, extracting features from faces to be later used for classification. When facing this problem in Computer Vision, you usually have two alternatives: 1) design your features to be very discriminant and compact, or 2) to hell with small features, use dimensionality reduction afterwards to make it work. I might be exaggerating alternative 2, but for facial recognition there is a nice paper about high-dimensional features [1] (with a very good name) which shows that features in higher dimensions can work in practice. For that, you need to reduce them to a sub-space that, while maintaining the discriminant power of the features, makes them easy to feed into a classifier.

Now we are faced with the choice of a dimensionality reduction technique. There are a lot of methods for this, but Principal Component Analysis (PCA) is by far the most used. Let's make some considerations about that. What PCA tries to do is find a sub-space that maximizes the variance of the projected data. This is super cool, since in classification we want feature points that are far apart in the original space to remain far apart in the sub-space. However, in a supervised setting it's smarter to use the label information to create the basis for the sub-space, since we can then build a sub-space in which the classes are distant from each other. That is what methods like LDA and Partial Least Squares (PLS) do: they aim to maximize the inter-class variance while minimizing the intra-class variance (isn't that clever?).
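As a concrete example of using labels, the classic two-class Fisher/LDA direction can be written in a few lines of NumPy (a sketch of the textbook formula, not of any particular library's implementation):

```python
import numpy as np

def fisher_lda_direction(X0, X1):
    """Two-class Fisher discriminant: w = Sw^{-1} (m1 - m0), the direction
    that maximizes between-class separation relative to within-class scatter."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # within-class scatter: sum of the per-class scatter matrices
    S0 = (X0 - m0).T @ (X0 - m0)
    S1 = (X1 - m1).T @ (X1 - m1)
    Sw = S0 + S1
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)
```

Unlike PCA, the resulting direction depends on the class means and within-class scatter, so two classes that overlap along the highest-variance axis can still end up well separated after projection.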

For instance, take a look at these plots extracted from the work of Schwartz [2], which uses PLS dimensionality reduction for the problem of pedestrian detection:



There are also some important works that use LDA, PLS and derived techniques for face identification, but that will be a subject for another post. One thing that is not well addressed in the literature (to my knowledge) is what it implies about your features when dimensionality reduction can be applied aggressively without any loss in algorithm performance. What I mean is: if you can reduce a 100-d feature to a 2-d feature without losing much information, this might mean that your original feature is not very discriminant at all! Why else would there be so much redundancy across dimensions?


[1] – Chen, Dong, et al. “Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification.” Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on. IEEE, 2013.

[2] – Schwartz, William Robson, et al. "Human detection using partial least squares analysis." Computer Vision, 2009 IEEE 12th International Conference on. IEEE, 2009.