
# Faces Dataset Decompositions in Scikit-learn

This example applies different unsupervised matrix decomposition (dimensionality reduction) methods from the module sklearn.decomposition to the Olivetti faces dataset (see the documentation chapter "Decomposing signals in components (matrix factorization problems)").

#### New to Plotly?

You can set up Plotly to work in online or offline mode, or in Jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!

### Version

In [1]:
import sklearn
sklearn.__version__

Out[1]:
'0.18'

### Imports

This tutorial imports fetch_olivetti_faces, MiniBatchKMeans, and the decomposition module.

In [2]:
print(__doc__)

import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools

import logging
from time import time
from numpy.random import RandomState
import matplotlib.pyplot as plt
import numpy as np

from sklearn.datasets import fetch_olivetti_faces
from sklearn.cluster import MiniBatchKMeans
from sklearn import decomposition

Automatically created module for IPython interactive environment


### Calculations

In [3]:
# Display progress logs on stdout
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
n_row, n_col = 2, 3
n_components = n_row * n_col
image_shape = (64, 64)
rng = RandomState(0)


In [4]:
dataset = fetch_olivetti_faces(shuffle=True, random_state=rng)
faces = dataset.data

n_samples, n_features = faces.shape

# global centering
faces_centered = faces - faces.mean(axis=0)

# local centering
faces_centered -= faces_centered.mean(axis=1).reshape(n_samples, -1)

print("Dataset consists of %d faces" % n_samples)

Dataset consists of 400 faces
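
The two centering steps above can be sketched on toy data: global centering subtracts the per-pixel mean image, and local centering then removes each image's own mean brightness, so every row ends up zero-mean. A minimal numpy sketch, with a small random matrix standing in for the faces:

```python
import numpy as np

# Toy stand-in for the faces matrix: 10 "images" of 6 pixels each.
toy_rng = np.random.RandomState(0)
X = toy_rng.rand(10, 6)

# Global centering: subtract the per-pixel mean image (mean over samples).
X_centered = X - X.mean(axis=0)

# Local centering: subtract each image's own mean brightness.
X_centered -= X_centered.mean(axis=1).reshape(-1, 1)

# Every row (image) is now zero-mean.
print(np.allclose(X_centered.mean(axis=1), 0))  # → True
```

Note that after the second step the columns are no longer exactly zero-mean; only the rows are.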


List of the different estimators: each tuple gives a display name, the estimator itself, and whether to fit it on the centered data. Estimators that use the clustering API (MiniBatchKMeans) expose cluster_centers_ instead of components_.

In [5]:
estimators = [
    ('Eigenfaces - PCA using randomized SVD',
     decomposition.PCA(n_components=n_components, svd_solver='randomized',
                       whiten=True),
     True),

    ('Non-negative components - NMF',
     decomposition.NMF(n_components=n_components, init='nndsvda', tol=5e-3),
     False),

    ('Independent components - FastICA',
     decomposition.FastICA(n_components=n_components, whiten=True),
     True),

    ('Sparse comp. - MiniBatchSparsePCA',
     decomposition.MiniBatchSparsePCA(n_components=n_components, alpha=0.8,
                                      n_iter=100, batch_size=3,
                                      random_state=rng),
     True),

    ('MiniBatchDictionaryLearning',
     decomposition.MiniBatchDictionaryLearning(n_components=15, alpha=0.1,
                                               n_iter=50, batch_size=3,
                                               random_state=rng),
     True),

    ('Cluster centers - MiniBatchKMeans',
     MiniBatchKMeans(n_clusters=n_components, tol=1e-3, batch_size=20,
                     max_iter=50, random_state=rng),
     True),

    ('Factor Analysis components - FA',
     decomposition.FactorAnalysis(n_components=n_components, max_iter=2),
     True),
]


### Plot Results

In [6]:
def matplotlib_to_plotly(cmap, pl_entries):
    h = 1.0 / (pl_entries - 1)
    pl_colorscale = []

    for k in range(pl_entries):
        # list() is needed on Python 3, where map() returns an iterator
        C = list(map(np.uint8, np.array(cmap(k * h)[:3]) * 255))
        pl_colorscale.append([k * h, 'rgb' + str((C[0], C[1], C[2]))])

    return pl_colorscale
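
The helper samples a matplotlib colormap at evenly spaced points and returns the list of [position, color] pairs Plotly expects. A self-contained sketch of the same idea, using a hypothetical gray_cmap function in place of plt.cm.gray so it runs without matplotlib, and int() instead of np.uint8 so the string form is stable across numpy versions:

```python
import numpy as np

def gray_cmap(x):
    # Hypothetical stand-in for plt.cm.gray: maps [0, 1] to an RGBA tuple.
    return (x, x, x, 1.0)

def cmap_to_colorscale(cmap, pl_entries):
    h = 1.0 / (pl_entries - 1)
    colorscale = []
    for k in range(pl_entries):
        # Scale the RGB channels from [0, 1] to integer [0, 255].
        r, g, b = (int(c) for c in np.array(cmap(k * h)[:3]) * 255)
        colorscale.append([k * h, 'rgb' + str((r, g, b))])
    return colorscale

scale = cmap_to_colorscale(gray_cmap, 3)
print(scale)
# → [[0.0, 'rgb(0, 0, 0)'], [0.5, 'rgb(127, 127, 127)'], [1.0, 'rgb(255, 255, 255)']]
```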

In [7]:
def plot_gallery(title, images, n_col=n_col, n_row=n_row):
    fig = tools.make_subplots(rows=n_row, cols=n_col,
                              print_grid=False)

    for i, comp in enumerate(images):
        vmax = max(comp.max(), -comp.min())
        trace = go.Heatmap(z=comp.reshape(image_shape),
                           colorscale=matplotlib_to_plotly(plt.cm.gray, 20),
                           showscale=False)
        if i < 3:
            row = 1
        else:
            row = 2

        fig.append_trace(trace, row, i % 3 + 1)

    for i in map(str, range(1, (n_col * n_row) + 1)):
        y = 'yaxis' + i
        x = 'xaxis' + i
        fig['layout'][y].update(autorange='reversed',
                                showticklabels=False, ticks='')
        fig['layout'][x].update(showticklabels=False, ticks='')

    fig['layout'].update(title=title)
    return fig


### First centered Olivetti faces

In [8]:
py.iplot(plot_gallery("First centered Olivetti faces", faces_centered[:n_components]))

Out[8]:

Do the estimation and plot it

In [9]:
plot = []
for name, estimator, center in estimators:
    print("Extracting the top %d %s..." % (n_components, name))
    t0 = time()
    data = faces
    if center:
        data = faces_centered
    estimator.fit(data)
    train_time = (time() - t0)
    print("done in %0.3fs" % train_time)
    if hasattr(estimator, 'cluster_centers_'):
        components_ = estimator.cluster_centers_
    else:
        components_ = estimator.components_
    if (hasattr(estimator, 'noise_variance_') and
            estimator.noise_variance_.shape != ()):
        plot.append(plot_gallery("Pixelwise variance",
                                 estimator.noise_variance_.reshape(1, -1),
                                 n_col=1, n_row=1))

    plot.append(plot_gallery('%s - Train time %.1fs' % (name, train_time),
                             components_[:n_components]))

Extracting the top 6 Eigenfaces - PCA using randomized SVD...
done in 0.298s
Extracting the top 6 Non-negative components - NMF...
done in 0.776s
Extracting the top 6 Independent components - FastICA...
done in 0.784s
Extracting the top 6 Sparse comp. - MiniBatchSparsePCA...
done in 0.947s
Extracting the top 6 MiniBatchDictionaryLearning...
done in 1.884s
Extracting the top 6 Cluster centers - MiniBatchKMeans...
done in 0.112s
Extracting the top 6 Factor Analysis components - FA...
done in 0.164s


### Eigenfaces - PCA using randomized SVD

In [10]:
py.iplot(plot[0])

Out[10]:

### Non-negative components - NMF

In [11]:
py.iplot(plot[1])

Out[11]:

### Independent components - FastICA

In [12]:
py.iplot(plot[2])

Out[12]:

### Sparse comp. - MiniBatchSparsePCA

In [13]:
py.iplot(plot[3])

Out[13]:

### MiniBatchDictionaryLearning

In [14]:
py.iplot(plot[4])

Out[14]:

### Cluster centers - MiniBatchKMeans

In [15]:
py.iplot(plot[5])

Out[15]:

### Pixelwise variance

In [16]:
py.iplot(plot[6])

Out[16]:

### Factor Analysis components - FA

In [17]:
py.iplot(plot[7])

Out[17]:

Authors:

    Vlad Niculae
    Alexandre Gramfort

License:

    BSD 3 clause