Using sklearn.RandomizedPCA to drastically reduce PCA running times.

Hello again! This last post (for now) about dimensionality reduction tackles the following problem: even if the trick we talked about in the last post reduces memory consumption and execution time, sometimes it is still not enough.

We experienced this when working on facial recognition here at Meerkat, where we had a huge set of training data with points of large dimensionality. We tried a NIPALS implementation, which reduces memory consumption, but it did not improve performance (i.e., it was still too slow). Sklearn came to the rescue! The lib has a nice and easy-to-use implementation of a randomized PCA technique.

We made a small class to apply this method, which reads and writes .mat files for MATLAB. This was useful for us because we often implement MATLAB prototypes before going to Python/C++. We thought this code could be helpful for a lot of people searching for this, so here it goes:

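import numpy as np
import scipy.io as sio
# RandomizedPCA was deprecated in scikit-learn 0.18 and later removed; on newer
# versions use PCA(n_components=..., svd_solver='randomized') instead.
from sklearn.decomposition import RandomizedPCA

class PcaReduction:
    def __init__(self, reduce_dim):
        # Number of principal components to keep.
        self.reduce_dim = int(reduce_dim)

    def reduce(self, dataset_dir):
        # dataset_dir is not used in this excerpt; the data comes from one of
        # the load_* methods below, which must be called first.
        self.rand_pca = RandomizedPCA(n_components=self.reduce_dim)
        print("Randomizing PCA extraction...")
        # Fit on the loaded matrix (assumed: one sample per row).
        self.rand_pca.fit(self.X)

    def load_np_data(self, filename):
        print('Reading numpy data from file...')
        self.X = np.load(filename)
        print('done. Matrix size ', self.X.shape)

    def load_mat_data(self, filename):
        print('Reading MATLAB data from file...')
        values = sio.loadmat(filename)
        # loadmat returns a dict; the data is expected under the variable name X.
        self.X = values['X']
        print('done. Matrix size ', self.X.shape)

    def save_mat_file(self, matlab_filename):
        # Save the mean and the PCA base so they can be loaded from MATLAB.
        mean_X = self.rand_pca.mean_
        pca_base = self.rand_pca.components_
        d_values = {'pca_base': pca_base,
                    'mean_X': mean_X}

        sio.savemat(matlab_filename, d_values)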

The RandomizedPCA from sklearn is much faster than the original PCA, even when the “transpose-matrix-trick” is implemented. To give you an idea, with this code we were able to reduce the execution time from around 6 hours to merely 15 minutes! Notice that the PCA base produced by this method is an approximation, so it is not perfect, but for our facial recognition method we did not encounter any problems in the end results.
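If you are on a recent scikit-learn release, note that RandomizedPCA was deprecated in version 0.18 and later removed; the same randomized solver is available through PCA with svd_solver='randomized'. Here is a minimal sketch of that variant. The synthetic data, its sizes, the number of components, and the random_state are assumptions made up for illustration; they are not our face dataset or our settings.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for a training set: 200 samples, 1000 dimensions.
rng = np.random.RandomState(0)
X = rng.randn(200, 1000)

# The randomized solver approximates only the leading components,
# which is what makes it fast on large matrices.
rand_pca = PCA(n_components=20, svd_solver='randomized', random_state=0)
rand_pca.fit(X)

# Project the data onto the reduced base.
X_reduced = rand_pca.transform(X)
print(rand_pca.components_.shape)
print(X_reduced.shape)
```

The fitted object exposes the same mean_ and components_ attributes that the class above writes to the .mat file, so swapping the estimator should not require other changes.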



Doc for RandomizedPCA in sklearn

Small trick to compute PCA for large training datasets

In the last post, we talked about the choice of method for dimensionality reduction and how PCA is sometimes overused due to its popularity and the implementations available in several libs. Having said that, we are still going to talk about PCA, because we use it a lot! 🙂 If you do not know how PCA works, a very good introduction is the one by Lindsay Smith, check it out. The PCA is simply defined by the eigenvectors of the data covariance matrix. When you want to reduce dimensionality, only the eigenvectors associated with the N largest eigenvalues are kept, forming a base for the reduction. If this sounds weird, don’t worry: the important part here is the covariance matrix, which is the main cause of headaches for large datasets.
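To make that definition concrete, here is a minimal NumPy sketch of PCA through the eigenvectors of the covariance matrix. The function name pca_basis, the toy data, and the choice of one sample per row are assumptions for illustration only.

```python
import numpy as np

def pca_basis(X, n_components):
    """Return (mean, basis) for X with one sample per row.

    The basis has one principal direction per row.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                               # center the data
    cov = Xc.T @ Xc / (X.shape[0] - 1)          # covariance matrix (D x D)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    return mean, eigvecs[:, order].T

# Toy data: 100 points in 5 dimensions with very different variances.
rng = np.random.RandomState(0)
X = rng.randn(100, 5) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])

mean, basis = pca_basis(X, 2)
X_reduced = (X - mean) @ basis.T                # project onto the base
print(X_reduced.shape)
```

Note that the covariance matrix here is D x D, where D is the dimensionality of the points, which is exactly what becomes a problem when D is large.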

Imagine that we have data points with a very large dimensionality, such as 50k, for instance (this is very common in Computer Vision!). The covariance matrix of any number of such points is going to be a 50k x 50k matrix, which is huge. To get an idea of how huge, if we use the standard 8 bytes for each float, this matrix will take around 20 GB (roughly 18.6 GiB) of memory, just to describe it! This problem is well known, and one nice (and quite old) trick to compute PCA is described in the seminal Eigenfaces paper [1]. So, assuming that we arrange our P mean-subtracted points of dimensionality M as the columns of an M by P matrix A, the algebraic way of computing the covariance matrix (up to a constant normalization factor, which does not change the eigenvectors) is:

    C = A A^T

If M = 50k as in our example, this M by M matrix will be huge, as we said. But if we are only interested in extracting the eigenvalues/vectors of this matrix, we can compute them from the following P by P matrix instead:

    L = A^T A

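It turns out that we can recover the original eigenvalues/vectors from this matrix, except for those whose eigenvalue is equal to zero. The non-zero eigenvalues are the same, and to get each original eigenvector we multiply the extracted eigenvector by A: if v is an eigenvector of the small matrix with eigenvalue λ, then u = Av is an eigenvector of the covariance matrix A A^T with the same eigenvalue, because A A^T (Av) = A (A^T A v) = λ Av. The resulting u is usually rescaled to unit length. This is such a nice trick because usually P << M, i.e. the number of examples in our training dataset is much smaller than the number of dimensions. Both MATLAB and OpenCV have this small tweak for PCA reduction implemented: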

coeff = pca(X, 'Economy', true); % for MATLAB
cv::calcCovarMatrix(..., cv::COVAR_SCRAMBLED); // in OpenCV
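For readers who want to see the trick outside of those libraries, here is a small NumPy sketch that follows the conventions above (points as columns of an M by P matrix). The sizes, the random data, and the tolerance used to discard zero eigenvalues are assumptions chosen so that the direct M by M route is still cheap enough to compare against.

```python
import numpy as np

rng = np.random.RandomState(0)
M, P = 500, 20                          # dimensionality, number of points
X = rng.randn(M, P)                     # one point per column
A = X - X.mean(axis=1, keepdims=True)   # subtract the mean point

# Small P x P matrix instead of the M x M covariance matrix.
L = A.T @ A
eigvals, V = np.linalg.eigh(L)          # eigenvalues in ascending order

# Discard (numerically) zero eigenvalues and sort largest first.
keep = eigvals > 1e-10 * eigvals.max()
eigvals, V = eigvals[keep][::-1], V[:, keep][:, ::-1]

# Map each small eigenvector back with u = A v and normalize it.
U = A @ V
U /= np.linalg.norm(U, axis=0)

# Compare with the direct (expensive) route on the M x M matrix.
C = A @ A.T
direct_vals = np.linalg.eigvalsh(C)[::-1][:len(eigvals)]
print(np.allclose(eigvals, direct_vals))   # same non-zero eigenvalues
print(np.allclose(C @ U, U * eigvals))     # columns of U are eigenvectors of C
```

Because the mean is subtracted, A has rank at most P - 1, so one eigenvalue of the small matrix is zero and gets discarded. With M = 50k the matrix C would never be formed at all; it is built here only to check the result.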

There are still some cases where P is also very large. In those cases, the main approach is to compute an approximation of the PCA instead. We’ll be talking about it in the next post.


[1] Turk, Matthew, and Alex P. Pentland. “Face recognition using eigenfaces.” Proceedings CVPR ’91, IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 1991.