# Principal Components Analysis (PCA) in Scikit-learn

These figures illustrate how a point cloud can be very flat in one direction, which is where PCA comes in: it chooses the directions along which the cloud is not flat.

#### New to Plotly?

You can set up Plotly to work in online or offline mode, or in Jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!

### Version

In [1]:
import sklearn
sklearn.__version__

Out[1]:
'0.18'

### Imports

This tutorial imports PCA.

In [2]:
print(__doc__)

import plotly.plotly as py
import plotly.graph_objs as go

from sklearn.decomposition import PCA

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

Automatically created module for IPython interactive environment


### Calculations

Create the data

In [3]:
e = np.exp(1)
np.random.seed(4)

def pdf(x):
    return 0.5 * (stats.norm(scale=0.25 / e).pdf(x)
                  + stats.norm(scale=4 / e).pdf(x))

y = np.random.normal(scale=0.5, size=(30000))
x = np.random.normal(scale=0.5, size=(30000))
z = np.random.normal(scale=0.1, size=len(x))

density = pdf(x) * pdf(y)
pdf_z = pdf(5 * z)

density *= pdf_z

a = x + y
b = 2 * y
c = a - b + z

norm = np.sqrt(a.var() + b.var())
a /= norm
b /= norm
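
Because `c` is built from `a` and `b` plus a small noise term `z`, the cloud should lie close to a plane. The following is a minimal sketch that rebuilds the same arrays (leaving out the density, which is not needed here) and checks how little variance falls on the third principal component. It assumes a scikit-learn version in which `PCA` exposes `explained_variance_ratio_`, as used later in this tutorial.

```python
import numpy as np
from sklearn.decomposition import PCA

# Rebuild the same point cloud as in the cell above.
np.random.seed(4)
y = np.random.normal(scale=0.5, size=30000)
x = np.random.normal(scale=0.5, size=30000)
z = np.random.normal(scale=0.1, size=len(x))

a = x + y
b = 2 * y
c = a - b + z  # c is determined by a and b, up to the small noise z

norm = np.sqrt(a.var() + b.var())
a /= norm
b /= norm

# Stack the three coordinates into an (n_samples, 3) array and fit PCA.
Y = np.c_[a, b, c]
ratio = PCA(n_components=3).fit(Y).explained_variance_ratio_

# The last entry is the share of variance along the "flat" direction.
print(ratio)
```

The ratios are sorted in decreasing order, so a very small last entry indicates that two directions capture almost all of the spread.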


### Plot Figures

In [4]:
def plot_figs(elev, azim):

    scatter = go.Scatter3d(x=a[::10],
                           y=b[::10],
                           z=c[::10],
                           mode='markers',
                           opacity=0.5,
                           marker=dict(color='pink')
                           )
    Y = np.c_[a, b, c]

    # Using SciPy's SVD, this would be:
    # _, pca_score, V = scipy.linalg.svd(Y, full_matrices=False)

    pca = PCA(n_components=3)
    pca.fit(Y)
    pca_score = pca.explained_variance_ratio_
    V = pca.components_

    x_pca_axis, y_pca_axis, z_pca_axis = V.T * pca_score / pca_score.min()

    x_pca_axis, y_pca_axis, z_pca_axis = 3 * V.T
    x_pca_plane = np.r_[x_pca_axis[:2], - x_pca_axis[1::-1]]
    y_pca_plane = np.r_[y_pca_axis[:2], - y_pca_axis[1::-1]]
    z_pca_plane = np.r_[z_pca_axis[:2], - z_pca_axis[1::-1]]
    x_pca_plane.shape = (2, 2)
    y_pca_plane.shape = (2, 2)
    z_pca_plane.shape = (2, 2)

    surface = go.Surface(x=x_pca_plane,
                         y=y_pca_plane,
                         z=z_pca_plane,
                         showscale=False,
                         colorscale=[[0, 'white'], [1, 'cyan']])
    data = [scatter, surface]
    layout = go.Layout(scene=dict(
        xaxis=dict(showgrid=False, ticks='',
                   showticklabels=False, zeroline=False),
        yaxis=dict(showgrid=False, ticks='',
                   showticklabels=False, zeroline=False),
        zaxis=dict(showgrid=False, ticks='',
                   showticklabels=False, zeroline=False))
    )
    fig = go.Figure(data=data, layout=layout)
    return fig
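
The comment inside `plot_figs` mentions an SVD-based alternative. As a rough sketch of how the two relate, the snippet below compares `PCA` with `numpy.linalg.svd` on a small toy array. The data and the names `Y_demo`, `pca_demo`, and `svd_ratio` are hypothetical and not part of the tutorial. Note that `PCA` centers the data before decomposing it, so the sketch centers explicitly, and that principal axes are only defined up to sign, so absolute values are compared.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Hypothetical toy data: 500 points with clearly different spread per axis.
Y_demo = rng.normal(size=(500, 3)) * np.array([3.0, 1.0, 0.2])

pca_demo = PCA(n_components=3).fit(Y_demo)

# PCA centers the data internally, so center before calling SVD.
Y_centered = Y_demo - Y_demo.mean(axis=0)
_, S, Vt = np.linalg.svd(Y_centered, full_matrices=False)

# Squared singular values are proportional to the variance per component.
svd_ratio = S ** 2 / np.sum(S ** 2)

# Rows of Vt match the principal axes up to a sign flip.
same_axes = np.allclose(np.abs(Vt), np.abs(pca_demo.components_), atol=1e-6)
same_ratio = np.allclose(svd_ratio, pca_demo.explained_variance_ratio_)
print(same_axes, same_ratio)
```

In the tutorial itself, `pca_score` holds variance ratios rather than raw singular values, so the SVD line in the comment is a close analogue rather than an exact drop-in replacement.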

In [5]:
elev = -40
azim = -80
py.iplot(plot_figs(elev, azim))

Out[5]:

In [6]:
elev = 30
azim = 20
py.iplot(plot_figs(elev, azim))

Out[6]:
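
The cyan surface in `plot_figs` is built by slicing the scaled component matrix, which can be hard to read at first. The sketch below repeats that construction on a hypothetical orthonormal matrix `V_demo` (standing in for `pca.components_`) and checks that the four resulting corners lie in the plane spanned by the first two components. The helper `plane_grid` is an illustrative name, not part of the tutorial.

```python
import numpy as np

# Hypothetical orthonormal components (one per row), standing in for
# pca.components_: a rotation by 30 degrees about the z axis.
theta = np.pi / 6
V_demo = np.array([[np.cos(theta), np.sin(theta), 0.0],
                   [-np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])

# Same unpacking as in plot_figs: x_axis holds the x coordinate of each
# component, y_axis the y coordinate, z_axis the z coordinate.
x_axis, y_axis, z_axis = 3 * V_demo.T


def plane_grid(axis):
    # One coordinate of the corners [v1, v2, -v2, -v1], as a 2x2 grid.
    return np.r_[axis[:2], -axis[1::-1]].reshape(2, 2)


# corners[i, j] is the 3D point at grid position (i, j).
corners = np.stack([plane_grid(x_axis),
                    plane_grid(y_axis),
                    plane_grid(z_axis)], axis=-1)

# Signed distance of each corner from the plane whose normal is the
# third component; values near zero mean the corner lies in that plane.
distances = corners.reshape(-1, 3) @ V_demo[2]
print(corners.reshape(-1, 3))
print(distances)
```

The corners are the first two components scaled by 3 and their negatives, so the surface is a square centered at the origin that spans the two directions of largest variance.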

Authors:

Gael Varoquaux

Jaques Grobler

Kevin Hughes

License: BSD 3 clause