
# Manifold Learning Methods on a Severed Sphere in Scikit-learn

An application of different manifold learning techniques to a spherical dataset. Dimensionality reduction is used here to build some intuition about how the manifold learning methods behave. The poles are cut from the sphere, as is a thin slice down its side. This lets the manifold learning techniques ‘spread it open’ while projecting it onto two dimensions.

For a similar example, in which the methods are applied to the S-curve dataset, see Comparison of Manifold Learning methods.

Note that the purpose of MDS is to find a low-dimensional representation of the data (here 2D) in which the distances closely respect the distances in the original high-dimensional space. Unlike other manifold learning algorithms, it does not seek an isotropic representation of the data in the low-dimensional space. Here the manifold problem closely resembles that of drawing a flat map of the Earth, as in map projection.
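The sense in which MDS "respects the distances" can be made concrete with the stress: the sum, over all pairs of points, of the squared difference between the low-dimensional and the original distances. Metric MDS tries to minimize this quantity. Below is a minimal NumPy-only sketch. The helper names `dist_matrix` and `raw_stress` are our own and are not scikit-learn functions. A flat patch can be drawn in 2D with zero stress, while dropping the z coordinate of sphere points leaves positive stress. No flat map of a sphere can preserve all distances exactly, which is the map-projection problem mentioned above.

```python
import numpy as np


def dist_matrix(X):
    """Euclidean distance matrix between the rows of X (hypothetical helper)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(X_high, X_low):
    """Sum over pairs i < j of (low-dim distance - high-dim distance) ** 2."""
    D_high = dist_matrix(X_high)
    D_low = dist_matrix(X_low)
    iu = np.triu_indices(len(X_high), k=1)  # count each unordered pair once
    return ((D_low[iu] - D_high[iu]) ** 2).sum()


rng = np.random.RandomState(0)

# A flat patch lying in the plane z = 0 can be drawn in 2D without distortion...
flat_2d = rng.rand(50, 2)
flat_3d = np.column_stack([flat_2d, np.zeros(50)])
flat_stress = raw_stress(flat_3d, flat_2d)

# ...but simply dropping z from points on a sphere shrinks many distances.
theta = rng.rand(50) * np.pi
phi = rng.rand(50) * 2 * np.pi
sphere = np.column_stack([np.sin(theta) * np.cos(phi),
                          np.sin(theta) * np.sin(phi),
                          np.cos(theta)])
sphere_stress = raw_stress(sphere, sphere[:, :2])

print("flat patch stress: %.3g" % flat_stress)
print("sphere, z dropped: %.3g" % sphere_stress)
```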


### Version

In [1]:
import sklearn
sklearn.__version__

Out[1]:
'0.18.1'

### Imports

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools

from time import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn import manifold
from sklearn.utils import check_random_state


### Calculations

In [3]:
# Variables for manifold learning.
n_neighbors = 10
n_samples = 1000

# Create our sphere: p is the azimuthal angle, t the polar angle.
# p stops 0.55 rad short of 2*pi, which leaves a thin slice open down the side.
random_state = check_random_state(0)
p = random_state.rand(n_samples) * (2 * np.pi - 0.55)
t = random_state.rand(n_samples) * np.pi

# Sever the poles from the sphere by keeping only pi/8 < t < 7*pi/8.
indices = ((t < (np.pi - (np.pi / 8))) & (t > ((np.pi / 8))))
colors = p[indices]
x, y, z = np.sin(t[indices]) * np.cos(p[indices]), \
    np.sin(t[indices]) * np.sin(p[indices]), \
    np.cos(t[indices])
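A quick sanity check of the dataset, as a minimal NumPy-only sketch. It regenerates the points with the same seed and formulas as the cell above. For an integer seed, `check_random_state(0)` returns an equivalent `np.random.RandomState(0)`, which is used here instead. The check confirms three things:

- the points lie on the unit sphere;
- both polar caps are gone;
- the azimuth never reaches 2π, which is what leaves the slice open.

Because t is uniform on [0, π) and each cap removes π/8 of that range, roughly 75% of the 1000 samples should survive.

```python
import numpy as np

# Rebuild the severed sphere exactly as in the cell above (same seed, same formulas).
rng = np.random.RandomState(0)
n_samples = 1000
p = rng.rand(n_samples) * (2 * np.pi - 0.55)   # azimuth: a 0.55 rad wedge stays empty
t = rng.rand(n_samples) * np.pi                # polar angle in [0, pi)

keep = (t < np.pi - np.pi / 8) & (t > np.pi / 8)   # drop both polar caps
x = np.sin(t[keep]) * np.cos(p[keep])
y = np.sin(t[keep]) * np.sin(p[keep])
z = np.cos(t[keep])

radii = np.sqrt(x ** 2 + y ** 2 + z ** 2)
print("points kept:", keep.sum(), "of", n_samples)
print("all on unit sphere:", np.allclose(radii, 1.0))
print("max |z| = %.3f, cap boundary cos(pi/8) = %.3f" % (np.abs(z).max(), np.cos(np.pi / 8)))
print("largest azimuth = %.3f rad, limit 2*pi - 0.55 = %.3f" % (p.max(), 2 * np.pi - 0.55))
```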

In [4]:
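def matplotlib_to_plotly(cmap, pl_entries):
    """Sample a matplotlib colormap at pl_entries evenly spaced points and
    return it as a Plotly colorscale: a list of [position, 'rgb(r, g, b)'] pairs."""
    h = 1.0 / (pl_entries - 1)
    pl_colorscale = []

    for k in range(pl_entries):
        # cmap(v) returns RGBA floats in [0, 1]; keep RGB and scale to 0-255 integers.
        # (A list is needed because map() returns a non-indexable iterator in Python 3.)
        C = [int(c) for c in np.array(cmap(k * h)[:3]) * 255]
        pl_colorscale.append([k * h, 'rgb(%d, %d, %d)' % (C[0], C[1], C[2])])

    return pl_colorscale

cmap = matplotlib_to_plotly(plt.cm.rainbow, 4)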


### Plot Dataset

In [5]:
p1 = go.Scatter3d(x=x, y=y, z=z,
                  mode='markers',
                  # Colour by azimuth (colors = p[indices]) to match the 2D embeddings below.
                  marker=dict(color=colors,
                              colorscale=cmap,
                              showscale=False,
                              line=dict(color='black', width=1)))
layout = dict(margin=dict(l=10, r=10, t=30, b=10))
fig = go.Figure(data=[p1], layout=layout)

In [6]:
py.iplot(fig)

Out[6]:

### LLE Methods: Standard, LTSA, Hessian, Modified

In [7]:
methods = ['standard', 'ltsa', 'hessian', 'modified']
labels = ['LLE', 'LTSA', 'Hessian LLE', 'Modified LLE']
data = []
titles = []

sphere_data = np.array([x, y, z]).T
for i, method in enumerate(methods):
    t0 = time()
    # Keyword arguments: newer scikit-learn releases reject positional ones here.
    lle = manifold.LocallyLinearEmbedding(n_neighbors=n_neighbors, n_components=2,
                                          method=method)
    trans_data = lle.fit_transform(sphere_data).T
    t1 = time()
    print("%s: %.2g sec" % (method, t1 - t0))

    trace = go.Scatter(x=trans_data[0], y=trans_data[1],
                       mode='markers',
                       marker=dict(color=colors,
                                   colorscale=cmap,
                                   showscale=False,
                                   line=dict(color='black', width=1)))
    data.append(trace)

    titles.append("%s (%.2g sec)" % (labels[i], t1 - t0))

standard: 0.15 sec
ltsa: 0.22 sec
hessian: 0.35 sec
modified: 0.24 sec
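To make `method='standard'` less of a black box, here is a compact NumPy-only sketch of the two steps of standard LLE:

1. Solve for weights that rebuild each point from its neighbours.
2. Find low-dimensional coordinates that those same weights reconstruct best.

This is not scikit-learn's implementation, which uses tree-based neighbour search and sparse eigensolvers. The function names and the regularization constant `reg` are illustrative assumptions. Regularization is needed because the local Gram matrix is singular whenever there are more neighbours than input dimensions, as with 10 neighbours in 3D here. LTSA, Hessian LLE and modified LLE broadly share the "local fit, then global eigenproblem" structure but build the local information differently.

```python
import numpy as np


def lle_weights(X, n_neighbors, reg=1e-3):
    """Step 1 of standard LLE: barycentric reconstruction weights.

    Row i holds the weights (summing to 1) that best rebuild X[i]
    from its n_neighbors nearest neighbours.
    """
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])
        nbrs = order[order != i][:n_neighbors]        # nearest points, excluding i itself
        Z = X[nbrs] - X[i]                            # neighbours centred on X[i]
        G = Z @ Z.T                                   # local Gram matrix, k x k
        # Regularise (assumed scheme): G is singular when k exceeds the input dimension.
        G += np.eye(n_neighbors) * reg * max(np.trace(G), 1e-12)
        w = np.linalg.solve(G, np.ones(n_neighbors))
        W[i, nbrs] = w / w.sum()                      # enforce the sum-to-one constraint
    return W


def lle_embedding(W, n_components):
    """Step 2: coordinates Y minimising sum_i |y_i - sum_j W_ij y_j|^2.

    Uses the bottom eigenvectors of M = (I - W)^T (I - W), skipping the
    constant eigenvector, whose eigenvalue is 0 because rows of W sum to 1.
    """
    n = W.shape[0]
    I_W = np.eye(n) - W
    M = I_W.T @ I_W
    vals, vecs = np.linalg.eigh(M)                    # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]


# A point at the centre of four symmetric neighbours gets equal weights.
X = np.array([[0.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
W = lle_weights(X, n_neighbors=4)
print(np.round(W[0], 3))      # weights on the four neighbours are all 0.25
Y = lle_embedding(W, n_components=1)
```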


### Isomap

In [8]:
t0 = time()
trans_data = manifold.Isomap(n_neighbors=n_neighbors, n_components=2)\
    .fit_transform(sphere_data).T
t1 = time()
print("%s: %.2g sec" % ('ISO', t1 - t0))

trace = go.Scatter(x=trans_data[0], y=trans_data[1],
                   mode='markers',
                   marker=dict(color=colors,
                               colorscale=cmap,
                               showscale=False,
                               line=dict(color='black', width=1)))
data.append(trace)
titles.append("Isomap (%.2g sec)" % (t1 - t0))

ISO: 0.34 sec
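Isomap replaces straight-line distances with distances measured along the neighbourhood graph (approximate geodesics), then embeds them with classical MDS. The NumPy-only sketch below spells out those three steps. It is not scikit-learn's implementation, and `isomap_sketch` is a hypothetical name. Shortest paths are computed with Floyd-Warshall, which is O(n³) and only practical for a few hundred points.

```python
import numpy as np


def isomap_sketch(X, n_neighbors, n_components):
    """Isomap in three steps: kNN graph, graph geodesics, classical MDS."""
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

    # 1. k-nearest-neighbour graph: keep only edges to each point's closest points.
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        order = np.argsort(D[i])
        nbrs = order[order != i][:n_neighbors]
        G[i, nbrs] = D[i, nbrs]
    G = np.minimum(G, G.T)                          # make the graph undirected

    # 2. Geodesic distances as shortest paths through the graph (Floyd-Warshall).
    for k in range(n):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighbourhood graph is disconnected; increase n_neighbors")

    # 3. Classical MDS on the geodesic distances.
    J = np.eye(n) - np.ones((n, n)) / n             # centring matrix
    B = -0.5 * J @ (G ** 2) @ J                     # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:n_components]     # largest eigenvalues first
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))


# Ten points with unit gaps along a straight line in 3D.
direction = np.array([1.0, 2.0, 2.0]) / 3.0        # a unit vector
X = np.arange(10)[:, None] * direction
emb = isomap_sketch(X, n_neighbors=2, n_components=1)
gaps = np.abs(emb[:, 0][:, None] - emb[:, 0][None, :])
print(np.allclose(gaps, np.abs(np.arange(10)[:, None] - np.arange(10)[None, :])))  # True
```

On a curved manifold such as the severed sphere, the graph path between two far-apart points follows the surface rather than cutting through it. This is what lets Isomap "unroll" the data.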


### MDS

In [9]:
t0 = time()
# Metric MDS; max_iter caps the SMACOF iterations, n_init the number of random restarts.
mds = manifold.MDS(n_components=2, max_iter=100, n_init=1)
trans_data = mds.fit_transform(sphere_data).T
t1 = time()
print("MDS: %.2g sec" % (t1 - t0))

trace = go.Scatter(x=trans_data[0], y=trans_data[1],
                   mode='markers',
                   marker=dict(color=colors,
                               colorscale=cmap,
                               showscale=False,
                               line=dict(color='black', width=1)))
data.append(trace)

titles.append("MDS (%.2g sec)" % (t1 - t0))

MDS: 1.2 sec
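scikit-learn's `MDS` minimizes the stress with the SMACOF algorithm. In the cell above, `max_iter=100` bounds its iterations and `n_init=1` means a single random start. The sketch below is a simplified, unweighted, NumPy-only version of the core SMACOF update, the Guttman transform, to show what one iteration does. The function names and the uniform random initialization are assumptions. The real implementation also handles convergence tolerances, several initializations and non-metric MDS. A property of SMACOF worth noticing is that the stress never increases from one iteration to the next.

```python
import numpy as np


def dist_matrix(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def stress(D_target, X):
    """Raw stress: sum over pairs i < j of (d_ij(X) - D_target_ij) ** 2."""
    iu = np.triu_indices(len(X), k=1)
    return ((dist_matrix(X)[iu] - D_target[iu]) ** 2).sum()


def smacof_sketch(D_target, n_components=2, max_iter=100, seed=0):
    """Unweighted metric SMACOF: repeat the Guttman transform X <- B(X) @ X / n."""
    n = D_target.shape[0]
    X = np.random.RandomState(seed).rand(n, n_components)   # random start (assumption)
    history = [stress(D_target, X)]
    for _ in range(max_iter):
        D = dist_matrix(X)
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(D > 0, D_target / D, 0.0)
        B = -ratio                                   # off-diagonal entries: -delta_ij / d_ij
        np.fill_diagonal(B, 0.0)
        np.fill_diagonal(B, -B.sum(axis=1))          # make every row of B sum to zero
        X = B @ X / n
        history.append(stress(D_target, X))
    return X, history


# Target distances: 40 points on the unit sphere.
rng = np.random.RandomState(1)
pts = rng.normal(size=(40, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
X2, history = smacof_sketch(dist_matrix(pts), n_components=2, max_iter=100)
print("stress: start %.2f -> end %.2f" % (history[0], history[-1]))
```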


### Spectral Embedding

In [10]:
t0 = time()
se = manifold.SpectralEmbedding(n_components=2,
                                n_neighbors=n_neighbors)
trans_data = se.fit_transform(sphere_data).T
t1 = time()
print("Spectral Embedding: %.2g sec" % (t1 - t0))

trace = go.Scatter(x=trans_data[0], y=trans_data[1],
                   mode='markers',
                   marker=dict(color=colors,
                               colorscale=cmap,
                               showscale=False,
                               line=dict(color='black', width=1)))
data.append(trace)

titles.append("SpectralEmbedding (%.2g sec)" % (t1 - t0))

Spectral Embedding: 0.1 sec
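Spectral embedding (Laplacian eigenmaps) builds a neighbourhood graph and uses the eigenvectors of its graph Laplacian with the smallest non-zero eigenvalues as coordinates. Points strongly connected in the graph therefore land close together. The sketch below is a deliberately simplified NumPy-only version that uses a binary, symmetrised k-nearest-neighbour graph and the unnormalised Laplacian L = D - W. scikit-learn's `SpectralEmbedding` builds a nearest-neighbour affinity by default but works with a normalized Laplacian, so the coordinates will differ. `laplacian_eigenmap` is a hypothetical name.

```python
import numpy as np


def laplacian_eigenmap(X, n_neighbors, n_components):
    """Simplified Laplacian eigenmap; returns (embedding, eigenvalues)."""
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    W = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])
        W[i, order[order != i][:n_neighbors]] = 1.0   # connect i to its nearest neighbours
    W = np.maximum(W, W.T)                             # undirected graph
    L = np.diag(W.sum(axis=1)) - W                     # unnormalised Laplacian L = D - W
    vals, vecs = np.linalg.eigh(L)                     # ascending eigenvalues
    # For a connected graph the first eigenvector is constant (eigenvalue 0); skip it.
    return vecs[:, 1:n_components + 1], vals[1:n_components + 1]


# Thirty points along a quarter circle: a curved 1D manifold with a connected kNN graph.
a = np.linspace(0.0, np.pi / 2, 30)
X = np.column_stack([np.cos(a), np.sin(a), np.zeros_like(a)])
emb, eigvals = laplacian_eigenmap(X, n_neighbors=3, n_components=2)
print(emb.shape, np.round(eigvals, 4))
```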


### t-SNE

In [11]:
t0 = time()
tsne = manifold.TSNE(n_components=2, init='pca', random_state=0)
trans_data = tsne.fit_transform(sphere_data).T
t1 = time()
print("t-SNE: %.2g sec" % (t1 - t0))

trace = go.Scatter(x=trans_data[0], y=trans_data[1],
                   mode='markers',
                   marker=dict(color=colors,
                               colorscale=cmap,
                               showscale=False,
                               line=dict(color='black', width=1)))
data.append(trace)
titles.append("t-SNE (%.2g sec)" % (t1 - t0))

t-SNE: 2.7 sec
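t-SNE compares two sets of pairwise similarities: P, derived from Gaussian neighbourhoods in the input space, and Q, from a heavier-tailed Student-t kernel in the embedding. It moves the embedding to minimize the Kullback-Leibler divergence KL(P || Q). The NumPy-only sketch below computes only the embedding-side similarities Q and the KL cost. It omits the perplexity-calibrated P, the gradient descent with early exaggeration, and the Barnes-Hut approximation that scikit-learn's `TSNE` uses by default. As a stand-in for P, the example uses the similarities of another random layout. The function names are illustrative.

```python
import numpy as np


def student_t_similarities(Y):
    """Embedding-space similarities q_ij used by t-SNE (Student-t, 1 degree of freedom)."""
    sq_dist = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    num = 1.0 / (1.0 + sq_dist)
    np.fill_diagonal(num, 0.0)                # q_ii is defined to be 0
    return num / num.sum()                    # normalise over all pairs


def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q): the cost t-SNE minimises with respect to the embedding."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))


rng = np.random.RandomState(0)
Y_a = rng.normal(size=(20, 2))
Y_b = rng.normal(size=(20, 2))
Q_a = student_t_similarities(Y_a)
Q_b = student_t_similarities(Y_b)
print(kl_divergence(Q_a, Q_a))   # 0.0: identical similarity structures
print(kl_divergence(Q_a, Q_b))   # positive: the two layouts disagree
```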

In [12]:
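fig = tools.make_subplots(rows=2, cols=4,
                          subplot_titles=tuple(titles))

for i in range(len(data)):
    # Fill the 2 x 4 grid row by row; floor division keeps the row index an integer.
    fig.append_trace(data[i], i // 4 + 1, i % 4 + 1)

# The title reports the points actually embedded (the poles were removed from n_samples).
fig['layout'].update(title="Manifold Learning with %i points, %i neighbors"
                           % (sphere_data.shape[0], n_neighbors),
                     showlegend=False, height=900, hovermode='closest')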

This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]  [ (1,3) x3,y3 ]  [ (1,4) x4,y4 ]
[ (2,1) x5,y5 ]  [ (2,2) x6,y6 ]  [ (2,3) x7,y7 ]  [ (2,4) x8,y8 ]
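The loop places trace `i` in row `i // 4 + 1` and column `i % 4 + 1`, filling the grid printed above row by row. In Python 3 this needs floor division, since `i / 4` gives a float such as `0.25`. A small helper makes the mapping explicit. `grid_position` is a hypothetical name, not part of Plotly.

```python
def grid_position(i, n_cols=4):
    """Return the 1-based (row, col) cell for the i-th trace, filling rows left to right."""
    row, col = divmod(i, n_cols)
    return row + 1, col + 1


print([grid_position(i) for i in range(8)])
# [(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4)]
```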


In [13]:
py.iplot(fig)

Out[13]:

Author:

    Jaques Grobler <jaques.grobler@inria.fr>

License:

    BSD 3 clause