
Faces Dataset Decompositions in Scikit-learn

This example applies different unsupervised matrix decomposition (dimensionality reduction) methods from the module sklearn.decomposition to the Olivetti faces dataset (see the documentation chapter Decomposing signals in components (matrix factorization problems)).

New to Plotly?

Plotly's Python library is free and open source! Get started by downloading the client and reading the primer.
You can set up Plotly to work in online or offline mode, or in jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!

Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18'

Imports

This tutorial imports fetch_olivetti_faces, MiniBatchKMeans, and the decomposition module.

In [2]:
print(__doc__)

import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools

import logging
from time import time
from numpy.random import RandomState
import matplotlib.pyplot as plt
import numpy as np


from sklearn.datasets import fetch_olivetti_faces
from sklearn.cluster import MiniBatchKMeans
from sklearn import decomposition
Automatically created module for IPython interactive environment

Calculations

In [3]:
# Display progress logs on stdout
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
n_row, n_col = 2, 3
n_components = n_row * n_col
image_shape = (64, 64)
rng = RandomState(0)

Load faces data

In [4]:
dataset = fetch_olivetti_faces(shuffle=True, random_state=rng)
faces = dataset.data

n_samples, n_features = faces.shape

# global centering
faces_centered = faces - faces.mean(axis=0)

# local centering
faces_centered -= faces_centered.mean(axis=1).reshape(n_samples, -1)

print("Dataset consists of %d faces" % n_samples)
Dataset consists of 400 faces
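
The two centering steps can be checked on a small synthetic matrix (a minimal sketch; the array names and shapes here are illustrative, not the Olivetti data):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(5, 4)  # 5 "faces", 4 "pixels"

# global centering: subtract the mean image (column-wise mean)
X_centered = X - X.mean(axis=0)

# local centering: subtract each sample's own mean (row-wise mean)
X_centered -= X_centered.mean(axis=1).reshape(-1, 1)

# after local centering, every row has (numerically) zero mean
print(np.allclose(X_centered.mean(axis=1), 0))
```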

The list below specifies the different estimators: each entry gives a display name, the estimator itself, and whether to center the data before fitting. Estimators that use the clustering API expose their components as cluster centers.

In [5]:
estimators = [
    ('Eigenfaces - PCA using randomized SVD',
     decomposition.PCA(n_components=n_components, svd_solver='randomized',
                       whiten=True),
     True),

    ('Non-negative components - NMF',
     decomposition.NMF(n_components=n_components, init='nndsvda', tol=5e-3),
     False),

    ('Independent components - FastICA',
     decomposition.FastICA(n_components=n_components, whiten=True),
     True),

    ('Sparse comp. - MiniBatchSparsePCA',
     decomposition.MiniBatchSparsePCA(n_components=n_components, alpha=0.8,
                                      n_iter=100, batch_size=3,
                                      random_state=rng),
     True),

    ('MiniBatchDictionaryLearning',
        decomposition.MiniBatchDictionaryLearning(n_components=15, alpha=0.1,
                                                  n_iter=50, batch_size=3,
                                                  random_state=rng),
     True),

    ('Cluster centers - MiniBatchKMeans',
        MiniBatchKMeans(n_clusters=n_components, tol=1e-3, batch_size=20,
                        max_iter=50, random_state=rng),
     True),

    ('Factor Analysis components - FA',
     decomposition.FactorAnalysis(n_components=n_components, max_iter=2),
     True),
]
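
As a rough sketch of the eigenfaces idea (on synthetic data, not the Olivetti faces): PCA components are the top right-singular vectors of the centered data matrix, which svd_solver='randomized' approximates for speed on large data:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(20, 16)            # 20 samples, 16 features (4x4 "images")
Xc = X - X.mean(axis=0)         # center, as PCA does internally

# exact SVD; the randomized solver trades a little accuracy for speed
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt[:6]             # top 6 "eigenfaces", one flattened image per row

print(components.shape)         # (6, 16)
```

The rows of `components` are orthonormal, just like the components_ attribute of a fitted PCA.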

Plot Results

In [6]:
def matplotlib_to_plotly(cmap, pl_entries):
    h = 1.0/(pl_entries-1)
    pl_colorscale = []
    
    for k in range(pl_entries):
        # use a list, not map(): in Python 3 map() returns an iterator
        # that cannot be indexed
        C = [int(c) for c in np.array(cmap(k*h)[:3]) * 255]
        pl_colorscale.append([k*h, 'rgb' + str((C[0], C[1], C[2]))])
        
    return pl_colorscale
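
The conversion can be exercised without matplotlib by substituting a stand-in grayscale colormap (gray_cmap below is a hypothetical replacement for plt.cm.gray, returning RGBA tuples in [0, 1]):

```python
import numpy as np

def gray_cmap(x):
    # stand-in for plt.cm.gray: maps x in [0, 1] to an RGBA tuple
    return (x, x, x, 1.0)

def matplotlib_to_plotly(cmap, pl_entries):
    h = 1.0 / (pl_entries - 1)
    pl_colorscale = []
    for k in range(pl_entries):
        C = [int(c) for c in np.array(cmap(k * h)[:3]) * 255]
        pl_colorscale.append([k * h, 'rgb' + str((C[0], C[1], C[2]))])
    return pl_colorscale

scale = matplotlib_to_plotly(gray_cmap, 20)
print(scale[0])   # [0.0, 'rgb(0, 0, 0)']
```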
In [7]:
def plot_gallery(title, images, n_col=n_col, n_row=n_row):
    fig = tools.make_subplots(rows=n_row, cols=n_col,
                              print_grid=False)
    
    for i, comp in enumerate(images):
        # symmetric color limits around zero, as in the matplotlib original
        vmax = max(comp.max(), -comp.min())
        trace = go.Heatmap(z=comp.reshape(image_shape),
                           zmin=-vmax, zmax=vmax,
                           colorscale=matplotlib_to_plotly(plt.cm.gray, 20),
                           showscale=False
                          )
        # derive the subplot position from n_col instead of hardcoding rows
        row = i // n_col + 1
        col = i % n_col + 1
        fig.append_trace(trace, row, col)
    
    for i in map(str,range(1, (n_col*n_row) + 1)):
        y = 'yaxis'+ i
        x = 'xaxis'+i
        fig['layout'][y].update(autorange='reversed',
                                   showticklabels=False, ticks='')
        fig['layout'][x].update(showticklabels=False, ticks='')
        
    fig['layout'].update(title=title)
    return fig

First centered Olivetti faces

In [8]:
py.iplot(plot_gallery("First centered Olivetti faces", faces_centered[:n_components]))
Out[8]:

Do the estimation and plot it

In [9]:
plot = []
for name, estimator, center in estimators:
    print("Extracting the top %d %s..." % (n_components, name))
    t0 = time()
    data = faces
    if center:
        data = faces_centered
    estimator.fit(data)
    train_time = (time() - t0)
    print("done in %0.3fs" % train_time)
    if hasattr(estimator, 'cluster_centers_'):
        components_ = estimator.cluster_centers_
    else:
        components_ = estimator.components_
    if (hasattr(estimator, 'noise_variance_') and
            estimator.noise_variance_.shape != ()):
        plot.append(plot_gallery("Pixelwise variance",
                                 estimator.noise_variance_.reshape(1, -1), n_col=1,
                                 n_row=1))
        
    plot.append(plot_gallery('%s - Train time %.1fs' % (name, train_time),
                 components_[:n_components]))
Extracting the top 6 Eigenfaces - PCA using randomized SVD...
done in 0.298s
Extracting the top 6 Non-negative components - NMF...
done in 0.776s
Extracting the top 6 Independent components - FastICA...
done in 0.784s
Extracting the top 6 Sparse comp. - MiniBatchSparsePCA...
done in 0.947s
Extracting the top 6 MiniBatchDictionaryLearning...
done in 1.884s
Extracting the top 6 Cluster centers - MiniBatchKMeans...
done in 0.112s
Extracting the top 6 Factor Analysis components - FA...
done in 0.164s

Eigenfaces - PCA using randomized SVD

In [10]:
py.iplot(plot[0])
Out[10]:

Non-negative components - NMF

In [11]:
py.iplot(plot[1])
Out[11]:

Independent components - FastICA

In [12]:
py.iplot(plot[2])
Out[12]:

Sparse comp. - MiniBatchSparsePCA

In [13]:
py.iplot(plot[3])
Out[13]:

MiniBatchDictionaryLearning

In [14]:
py.iplot(plot[4])
Out[14]:

Cluster centers - MiniBatchKMeans

In [15]:
py.iplot(plot[5])
Out[15]:

Pixelwise Variance

In [16]:
py.iplot(plot[6])
Out[16]:

Factor Analysis Components

In [17]:
py.iplot(plot[7])
Out[17]:

License

Authors:

      Vlad Niculae 

      Alexandre Gramfort

License:

      BSD 3 clause