But back to the subject: why am I writing about this, since OpenCV already has implementations of both SVM and HoG that are quite easy to use? Well, they may be easy to use, but they don’t work very well together. The HoG object detector can be given an SVM classifier, but not in the format that OpenCV’s own SVM classifier produces. In practice, this means that if you train an SVM using HoG features, you cannot use it directly with the cv::HOGDescriptor::detect() function.

Fortunately, this is easy to solve: we just need to convert the trained SVM classifier to its primal form. This can be done by first creating the class PrimalSVM, which inherits from cv::SVM:

class PrimalSVM : public cv::SVM {
public:
    void getSupportVector(std::vector<float>& support_vector) const;
};

And then, to the magical part:

void PrimalSVM::getSupportVector(std::vector<float>& support_vector) const {
    int sv_count = get_support_vector_count();
    const CvSVMDecisionFunc* df = decision_func;
    const double* alphas = df[0].alpha;
    double rho = df[0].rho;
    int var_count = get_var_count();

    // Accumulate w = -sum_r alpha_r * sv_r (one weight per feature).
    // assign() also clears anything the caller left in the vector.
    support_vector.assign(var_count, 0.0f);
    for (int r = 0; r < sv_count; r++) {
        float myalpha = static_cast<float>(alphas[r]);
        const float* v = get_support_vector(r);
        for (int j = 0; j < var_count; j++, v++)
            support_vector[j] += (-myalpha) * (*v);
    }

    // The bias term goes last, after the per-feature weights.
    support_vector.push_back(static_cast<float>(rho));
}

Now you can use PrimalSVM to train a classifier just like you would with cv::SVM (with a linear kernel, since only then can the support vectors be collapsed into a single weight vector), and then call getSupportVector, which gives you that weight vector, with rho appended, in the format that cv::HOGDescriptor::setSVMDetector expects. And there you go! Now you can easily create an object detector entirely in OpenCV, using only a few lines of code :D! You may be surprised by the results you can achieve when training with only a handful of images. Actually, I may get into more details on the process of creating an object detector in the future…
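To see why collapsing the support vectors into a single vector works for a linear kernel, here is a minimal pure-Python sketch. The function names and the toy numbers are made up for illustration, and it assumes (as the C++ code above does) that the detector scores a window as the dot product with the weights plus the last element as a bias.

```python
def primal_weights(support_vectors, alphas, rho):
    """Collapse a linear-kernel SVM into one weight vector, with rho appended."""
    n_features = len(support_vectors[0])
    w = [0.0] * n_features
    for alpha, sv in zip(alphas, support_vectors):
        for j in range(n_features):
            w[j] += -alpha * sv[j]
    return w + [rho]


def dual_score(support_vectors, alphas, rho, x):
    """rho - sum_i alpha_i * <sv_i, x>, evaluated one support vector at a time."""
    total = rho
    for alpha, sv in zip(alphas, support_vectors):
        total -= alpha * sum(s * xi for s, xi in zip(sv, x))
    return total


def primal_score(detector, x):
    """<w, x> + bias, where the bias is the last element of the detector."""
    *w, bias = detector
    return sum(wi * xi for wi, xi in zip(w, x)) + bias


# Toy values, chosen only to illustrate the algebra.
svs = [[1.0, 2.0, 0.5], [0.0, -1.0, 3.0]]
alphas = [0.5, -0.25]
rho = 0.1
x = [2.0, 1.0, -1.0]

detector = primal_weights(svs, alphas, rho)
print(dual_score(svs, alphas, rho, x), primal_score(detector, x))
```

Both scores agree, so the detector only needs one dot product per window instead of one per support vector. With a non-linear kernel the sum cannot be pulled inside the kernel, which is why this shortcut is specific to the linear case.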

And last but not least, another shout-out goes to DXM from Stack Overflow, who was, as far as I know, the first to propose this solution.

PS: Those of you with more attention to detail will notice that the signs of rho and the alphas are not the same. This may be due to some characteristics of the (older) libSVM, which was the base of the OpenCV SVM code. I don’t quite understand the details of this particular SVM implementation, but I don’t lose sleep over it :P.

There are a couple of library options for that, but for our needs Boost.Python is by far the best fit. The major aspect of writing these wrappers is converting the data between Python and C++. The first conversion that will be required is between matrix types: cv::Mat from OpenCV to/from NumPy ndarrays.

Fortunately, there is a very useful lib called numpy-opencv-converter that implements this conversion. However, many other conversions need to be implemented manually, as it is impossible to predict the combination of data types one can use in their code.

We will begin with a simple example that uses the Boost library to convert a Python parameter (a tuple of tuples) into the C++ type that the function expects (cv::Rect).

The definition of the method is this:

cv::Mat FaceDetection::getPose(const cv::Mat &image, const cv::Rect &rect);

Notice that, using the converter lib mentioned above, we will have no problems with the return value and the image parameter, as they are cv::Mat. However, the Python code has no idea what a cv::Rect is, so we need a helper function to call this method.

Usually, exporting a method with Boost.Python is as simple as this:

py::class_<FaceDetection>("FaceDetection")
    .def("getPose", &FaceDetection::getPose);

But again, if we do this without a converter (to be discussed in Part II), there will be a problem when calling this method. Instead, we can create a helper function that performs the conversion to cv::Rect when the method is called. In the Python version of OpenCV, a rectangle is defined by tuples, so the helper function becomes:

cv::Mat FaceDetection_getPose(FaceDetection& self, const cv::Mat& image, py::tuple rect) {
    // rect arrives from Python as ((tl_x, tl_y), (br_x, br_y)).
    py::tuple tl = py::extract<py::tuple>(rect[0])();
    py::tuple br = py::extract<py::tuple>(rect[1])();
    int tl_x = py::extract<int>(tl[0])();
    int tl_y = py::extract<int>(tl[1])();
    int br_x = py::extract<int>(br[0])();
    int br_y = py::extract<int>(br[1])();

    // Build the rectangle from its two corners and forward the call.
    cv::Rect cv_rect(cv::Point(tl_x, tl_y), cv::Point(br_x, br_y));
    return self.getPose(image, cv_rect);
}
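To make the expected argument shape concrete, here is a small pure-Python sketch of what the helper does with the tuple. The name rect_from_corners and the sample coordinates are hypothetical, and the min/max handling reflects my understanding of how the two-point cv::Rect constructor behaves, so treat it as an illustration rather than a reference.

```python
def rect_from_corners(rect):
    """Turn ((tl_x, tl_y), (br_x, br_y)) into (x, y, width, height)."""
    (tl_x, tl_y), (br_x, br_y) = rect
    # Take the smaller coordinate as the origin so the corner order does not matter.
    x, y = min(tl_x, br_x), min(tl_y, br_y)
    width = max(tl_x, br_x) - x
    height = max(tl_y, br_y) - y
    return x, y, width, height


# This nested tuple is the kind of value the wrapped getPose would receive.
face_box = ((40, 60), (140, 210))
print(rect_from_corners(face_box))
```

Passing anything that is not a pair of pairs fails at the unpacking step, which is roughly what py::extract does on the C++ side when the Python object has the wrong type.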

Now, if I call this function with a tuple, the system knows what to do. The only thing left is to bind this helper function to the method of our class:

py::class_<FaceDetection>("FaceDetection")
    .def("getPose", &FaceDetection_getPose);

And voilà. I can call the method FaceDetection.getPose(…) from Python (once the module is imported, of course) without any problem. This is nice and all, but you may be wondering whether you have to write this kind of function every time your data type is not natively supported by Boost.Python. The answer is no: it is fairly simple to create converters for your own data types. We’ll show that in a future post, Part II.

We experienced this when working on facial recognition here at Meerkat, where we had a huge set of training data with points of large dimensionality. We tried a NIPALS implementation that reduces memory consumption, but it did not improve the performance (i.e. it was too slow). Sklearn came to the rescue! The lib has a nice and easy-to-use implementation of a randomized PCA technique.

We made a small class to apply this method, which reads and writes .mat files for MATLAB. This was useful for us because we often implement MATLAB prototypes before going to Python/C++. We thought this code could be helpful for a lot of people searching for this, so here it goes:

import numpy as np
import scipy.io as sio
# Note: RandomizedPCA was removed in newer scikit-learn releases;
# there, use PCA(n_components=..., svd_solver='randomized') instead.
from sklearn.decomposition import RandomizedPCA


class PcaReduction:
    def __init__(self, reduce_dim):
        self.reduce_dim = int(reduce_dim)

    def reduce(self, dataset_dir):
        # dataset_dir is not used here; load the data first with
        # load_np_data or load_mat_data.
        self.rand_pca = RandomizedPCA(n_components=self.reduce_dim)
        print("Randomized PCA extraction...")
        self.rand_pca.fit(self.data)
        print("done.")

    def load_np_data(self, filename):
        print('Reading numpy data from file...')
        self.data = np.load(filename)
        print('done. Matrix size ', self.data.shape)

    def load_mat_data(self, filename):
        print('Reading MATLAB data from file...')
        values = sio.loadmat(filename)
        # loadmat returns a dict keyed by MATLAB variable name.
        self.data = values['X']
        print('done. Matrix size ', self.data.shape)

    def save_mat_file(self, matlab_filename):
        mean_X = self.rand_pca.mean_
        pca_base = self.rand_pca.components_
        d_values = {'pca_base': pca_base, 'mean_X': mean_X}
        sio.savemat(matlab_filename, d_values)

The RandomizedPCA from sklearn is much faster than the original PCA, even when the “transpose-matrix-trick” is implemented. To give you an idea, with this code we were able to reduce the execution time from around 6 hours to merely 15 minutes! Note that the PCA basis this method produces is only an approximation, but for our facial recognition method we did not encounter any problems in the end results.
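For readers curious about where the speed-up comes from, below is a minimal NumPy sketch of the general randomized range-finder idea (random projection, a few power iterations, then an exact SVD of a much smaller matrix). It is not sklearn’s implementation; the function name randomized_pca, its parameters and the toy data are assumptions made for this illustration.

```python
import numpy as np


def randomized_pca(X, n_components, n_oversamples=10, n_iter=2, seed=0):
    """Approximate the top principal components of X (rows are samples)."""
    rng = np.random.default_rng(seed)
    mean = X.mean(axis=0)
    Xc = X - mean
    k = min(n_components + n_oversamples, min(Xc.shape))
    # Start from a random test matrix and refine it with power iterations.
    Q = rng.standard_normal((Xc.shape[1], k))
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(Xc @ Q)
        Q, _ = np.linalg.qr(Xc.T @ Q)
    # Orthonormal basis that approximately spans the range of Xc.
    Q, _ = np.linalg.qr(Xc @ Q)
    # Exact SVD, but only of the small (k x n_features) projected matrix.
    B = Q.T @ Xc
    _, s, Vt = np.linalg.svd(B, full_matrices=False)
    return mean, Vt[:n_components], s[:n_components]


# Toy data: 200 samples in 50 dimensions that really live in 3 dimensions.
data_rng = np.random.default_rng(0)
X = data_rng.standard_normal((200, 3)) @ data_rng.standard_normal((3, 50))

mean, components, singular_values = randomized_pca(X, n_components=3)
exact_singular_values = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
print(singular_values)
print(exact_singular_values[:3])
```

On data that is exactly low rank, as in this toy case, the approximation matches the exact SVD; on real data the quality depends on how quickly the singular values decay, which is consistent with the basis being "not perfect" yet good enough in practice.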

Enjoy!


Imagine that we have data points with a very large dimensionality, such as 50k (this is very common in Computer Vision!). The covariance matrix of any number of such points is going to be a 50k × 50k matrix, which is huge. To get an idea of how huge: at 8 bytes per double-precision float, this matrix takes 20 GB (about 18.6 GiB) of memory, just to describe it! This problem is well known, and one nice (and quite old) trick to compute PCA is described in the seminal Eigenfaces paper [1]. So, assuming that we arrange our P mean-centred points of dimensionality M in an M by P matrix A, the covariance matrix (up to a constant scale factor) is:

$$C = A A^T$$

If M = 50k as in our example, this matrix will be huge, as we said. But if we are only interested in extracting the eigenvalues/eigenvectors of this matrix, we can compute them from the following, much smaller, P by P matrix:

$$L = A^T A$$

It turns out that we can recover the original eigenvalues/eigenvectors from this smaller matrix, excluding the eigenvalues (and associated eigenvectors) that are equal to zero. To do that, we multiply each extracted eigenvector v by A, u = Av (normalising u if unit-length eigenvectors are needed); the non-zero eigenvalues are the same for both matrices. This is such a nice trick because usually P << M, i.e. the number of examples in our training dataset is much smaller than the number of dimensions. Both MATLAB and OpenCV have this small tweak for PCA reduction implemented:

coeff = pca(X, 'Economy', true); % for MATLAB

cv::calcCovarMatrix(..., CV_COVAR_SCRAMBLED) // in OpenCV 2.x (cv::COVAR_SCRAMBLED in OpenCV 3 and later)
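The trick is easy to check numerically. The NumPy sketch below uses small, made-up sizes (M = 1000, P = 20) and random data so that the full M × M matrix can still be built for comparison; the variable names are mine, not from the Eigenfaces paper.

```python
import numpy as np

# Memory needed for the full covariance matrix in the 50k example above.
full_bytes = 50_000 ** 2 * 8
print(full_bytes / 1e9, "GB,", full_bytes / 2 ** 30, "GiB")

rng = np.random.default_rng(0)
M, P = 1000, 20                      # dimensions, number of points (P << M)
A = rng.standard_normal((M, P))      # one data point per column
A -= A.mean(axis=1, keepdims=True)   # centre every dimension across the points

# Eigen-decomposition of the small P x P matrix instead of the M x M one.
small = A.T @ A
vals, V = np.linalg.eigh(small)

# Drop the (numerically) zero eigenvalues; centring removes one degree of freedom.
keep = vals > 1e-8 * vals.max()
vals, V = vals[keep], V[:, keep]

# Map back to the original space, u = A v, and normalise each eigenvector.
U = A @ V
U /= np.linalg.norm(U, axis=0)

# Check against the big matrix: (A A^T) u should equal lambda * u.
big = A @ A.T
residual = np.abs(big @ U - U * vals).max()
print(U.shape, residual)
```

Here the eigen-decomposition runs on a 20 × 20 matrix rather than a 1000 × 1000 one, and the residual confirms that the mapped vectors are eigenvectors of the large matrix with the same eigenvalues.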

There are still some cases where P is also very large, so the main approach is to extract an approximation of PCA. We’ll be talking about it in the next post.

References:

[1] Turk, Matthew, and Alex P. Pentland. “Face recognition using eigenfaces.” *Computer Vision and Pattern Recognition, 1991. Proceedings CVPR’91., IEEE Computer Society Conference on*. IEEE, 1991.

First, let’s add a little more context. We were working on facial recognition and extracting features from faces that would later be used for classification. When faced with this problem in Computer Vision, you usually have two alternatives: 1) you design your features to be very discriminant and compact, or 2) to hell with small features, you apply a dimensionality reduction afterwards to make it work. I might be exaggerating in alternative 2, but for facial recognition there is a nice paper about high-dimensional features [1] (with a very good name) which shows that features in higher dimensions can be used in practice. For that, you need to reduce them to a sub-space that, while maintaining the discriminant aspect of the features, makes them easy to feed to a classifier.

Now we are faced with the choice of a dimensionality reduction technique. There are a lot of methods for this, but Principal Component Analysis (PCA) is by far the most used. Let’s make some considerations on that. What PCA tries to do is find a sub-space whose directions maximise the variance of the projected data. This is super cool, since in classification we want feature points that are far apart in the original space to stay far apart in the sub-space. However, if you have a supervised learning problem, it’s more intelligent to use the labels to build the basis of the sub-space, since we can then create a sub-space in which the classes are distant from each other. That is what methods like LDA and Partial Least Squares (PLS) do. They aim to maximise inter-class variance while minimising intra-class variance (isn’t that clever?).
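A toy example makes the difference visible. The NumPy sketch below builds two synthetic 2-D classes (made-up data, not from any of the cited papers) that are stretched along x but separated along y, and compares the direction PCA picks with the classic two-class Fisher LDA direction. The fisher_ratio helper is a hypothetical name for the usual between-class over within-class variance ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Both classes have a large spread along x and are only separated along y.
class_a = rng.normal([0.0, -1.0], [5.0, 0.3], size=(n, 2))
class_b = rng.normal([0.0, 1.0], [5.0, 0.3], size=(n, 2))
X = np.vstack([class_a, class_b])

# PCA direction: eigenvector of the total covariance with the largest eigenvalue.
cov = np.cov(X, rowvar=False)
pca_dir = np.linalg.eigh(cov)[1][:, -1]

# Two-class LDA direction: within-class scatter inverse times the mean difference.
mu_a, mu_b = class_a.mean(axis=0), class_b.mean(axis=0)
Sw = np.cov(class_a, rowvar=False) + np.cov(class_b, rowvar=False)
lda_dir = np.linalg.solve(Sw, mu_b - mu_a)
lda_dir /= np.linalg.norm(lda_dir)


def fisher_ratio(direction):
    """Squared distance between projected class means over projected variances."""
    pa, pb = class_a @ direction, class_b @ direction
    return (pa.mean() - pb.mean()) ** 2 / (pa.var() + pb.var())


print("PCA direction:", pca_dir, "Fisher ratio:", fisher_ratio(pca_dir))
print("LDA direction:", lda_dir, "Fisher ratio:", fisher_ratio(lda_dir))
```

PCA follows the x axis because that is where most of the variance lies, even though the classes overlap completely there, while the supervised direction follows y and keeps the classes apart.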

For instance, take a look at these plots extracted from the work of Schwartz [2], which applies PLS dimensionality reduction to the problem of pedestrian detection:

There are also some important works that use LDA, PLS and derived techniques for face identification, but that will be a subject for another post. One thing that is not well addressed in the literature (to my knowledge) is what it says about your features if a dimensionality reduction can be applied aggressively without any loss in the algorithm’s performance. What I mean is, if you can reduce a 100-d feature to a 2-d feature without losing much information, this might mean that your original feature is not discriminant at all! Why else would there be so much redundancy in the dimensions?

References:

[1] – Chen, Dong, et al. “Blessing of dimensionality: High-dimensional feature and its efficient compression for face verification.” *Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on*. IEEE, 2013.

[2] – Schwartz, William Robson, et al. “Human detection using partial least squares analysis.” *Computer vision, 2009 IEEE 12th international conference on*. IEEE, 2009.