# Principal Components Analysis (PCA) in Scikit-learn

Note: this page is part of the documentation for version 3 of Plotly.py, which is not the most recent version.
See our Version 4 Migration Guide for information about how to upgrade.

These figures illustrate how a point cloud can be very flat in one direction. This is where PCA comes in: it finds the directions along which the data vary most, so the flat direction ends up as the component with the least variance.
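
To make the idea concrete, the following is a minimal sketch (not part of the original example) that computes principal directions with a plain NumPy SVD on a toy cloud. The helper name `pca_via_svd` and the toy data are hypothetical and only serve as an illustration.

```python
import numpy as np

def pca_via_svd(points):
    """Return (components, explained_variance_ratio) for an (n, d) array."""
    centered = points - points.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2 / (len(points) - 1)
    return vt, variances / variances.sum()

rng = np.random.RandomState(0)
# Hypothetical toy cloud: wide in x and y, nearly flat in z.
cloud = rng.normal(scale=[1.0, 0.5, 0.05], size=(2000, 3))

components, ratio = pca_via_svd(cloud)
flat_direction = components[-1]  # direction of least variance

print(ratio)           # the last entry is much smaller than the others
print(flat_direction)  # roughly aligned with the z axis, up to sign
```

The last component carries almost none of the variance, which is what "flat in one direction" means in practice.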

#### New to Plotly?

Plotly's Python library is free and open source! Get started by downloading the client and reading the primer.
You can set up Plotly to work in online or offline mode, or in Jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!

### Version

In :
import sklearn
sklearn.__version__

Out:
'0.18'

### Imports

This tutorial imports PCA.

In :
print(__doc__)

import plotly.plotly as py
import plotly.graph_objs as go

from sklearn.decomposition import PCA

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

Automatically created module for IPython interactive environment


### Calculations

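Create the data: two independent Gaussian coordinates `x` and `y` plus low-amplitude noise `z`, combined so that the resulting cloud `(a, b, c)` is nearly flat along one direction.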

In :
e = np.exp(1)
np.random.seed(4)

def pdf(x):
    return 0.5 * (stats.norm(scale=0.25 / e).pdf(x)
                  + stats.norm(scale=4 / e).pdf(x))

y = np.random.normal(scale=0.5, size=(30000))
x = np.random.normal(scale=0.5, size=(30000))
z = np.random.normal(scale=0.1, size=len(x))

density = pdf(x) * pdf(y)
pdf_z = pdf(5 * z)

density *= pdf_z

a = x + y
b = 2 * y
c = a - b + z

norm = np.sqrt(a.var() + b.var())
a /= norm
b /= norm
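
As a rough check of why this cloud is nearly flat, the sketch below regenerates the same variables (before `a` and `b` are rescaled by `norm`) and looks at how far `c` strays from `a - b`. The names `residual` and `eigvals` are introduced here for illustration only.

```python
import numpy as np

np.random.seed(4)
y = np.random.normal(scale=0.5, size=30000)
x = np.random.normal(scale=0.5, size=30000)
z = np.random.normal(scale=0.1, size=len(x))

a = x + y
b = 2 * y
c = a - b + z  # algebraically x - y + z

# c differs from a - b only by the low-amplitude noise z.
residual = c - (a - b)
print(residual.std())  # close to the noise scale of z

# Eigenvalues of the covariance matrix, in ascending order:
# the smallest one corresponds to the flat direction.
eigvals = np.linalg.eigvalsh(np.cov(np.c_[a, b, c], rowvar=False))
print(eigvals / eigvals.sum())
```

The smallest eigenvalue accounts for only a tiny share of the total variance, so the points lie close to a plane.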


### Plot Figures

In :
def plot_figs(elev, azim):
    # Note: elev and azim are accepted but not used when building the figure.

    # Plot every 10th point to keep the figure light.
    scatter = go.Scatter3d(x=a[::10],
                           y=b[::10],
                           z=c[::10],
                           mode='markers',
                           opacity=0.5,
                           marker=dict(color='pink')
                           )
    Y = np.c_[a, b, c]

    # Using SciPy's SVD, this would be:
    # _, pca_score, V = scipy.linalg.svd(Y, full_matrices=False)

    pca = PCA(n_components=3)
    pca.fit(Y)
    pca_score = pca.explained_variance_ratio_
    V = pca.components_

    # Variance-scaled axes; overwritten by the next assignment.
    x_pca_axis, y_pca_axis, z_pca_axis = V.T * pca_score / pca_score.min()

    x_pca_axis, y_pca_axis, z_pca_axis = 3 * V.T
    # Corners of the plane spanned by the first two components.
    x_pca_plane = np.r_[x_pca_axis[:2], - x_pca_axis[1::-1]]
    y_pca_plane = np.r_[y_pca_axis[:2], - y_pca_axis[1::-1]]
    z_pca_plane = np.r_[z_pca_axis[:2], - z_pca_axis[1::-1]]
    x_pca_plane.shape = (2, 2)
    y_pca_plane.shape = (2, 2)
    z_pca_plane.shape = (2, 2)

    surface = go.Surface(x=x_pca_plane,
                         y=y_pca_plane,
                         z=z_pca_plane,
                         showscale=False,
                         colorscale=[[0, 'white'], [1, 'cyan']])
    data = [scatter, surface]
    # Hide grids, ticks, and zero lines on all three axes.
    layout = go.Layout(scene=dict(
        xaxis=dict(showgrid=False, ticks='',
                   showticklabels=False, zeroline=False),
        yaxis=dict(showgrid=False, ticks='',
                   showticklabels=False, zeroline=False),
        zaxis=dict(showgrid=False, ticks='',
                   showticklabels=False, zeroline=False))
    )
    fig = go.Figure(data=data, layout=layout)
    return fig
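
The `np.r_[...]` lines that build the plane are terse, so here is a small stand-alone sketch of what they compute. It uses a hypothetical orthonormal matrix `V` in place of `pca.components_`, and the helper `plane_grid` is introduced only for this illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical stand-in for pca.components_: rows are orthonormal directions.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
V = q.T

# After this line, x_axis[i] is the x-coordinate of the scaled i-th component.
x_axis, y_axis, z_axis = 3 * V.T

def plane_grid(axis):
    # Arrange [p0, p1, -p1, -p0] as a 2x2 grid, as a surface trace expects.
    return np.r_[axis[:2], -axis[1::-1]].reshape(2, 2)

# corners[i, j] is the 3-D point at grid position (i, j).
corners = np.stack(
    [plane_grid(x_axis), plane_grid(y_axis), plane_grid(z_axis)], axis=-1
)

# Every corner is orthogonal to the third component, so the quad lies
# in the plane spanned by the first two components.
print(corners.reshape(-1, 3) @ V[2])
```

The four corners are plus and minus three times the first two components, so the drawn surface is a quad whose diagonals follow those two directions.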

In :
elev = -40
azim = -80
py.iplot(plot_figs(elev, azim))

Out:

In :
elev = 30
azim = 20
py.iplot(plot_figs(elev, azim))

Out:
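
`plot_figs` accepts `elev` and `azim`, but its body never references them, so both calls above produce the same view. One possible way to make use of them is to convert the two angles into a Cartesian eye position for a 3D scene camera. The sketch below assumes `elev` is measured up from the x-y plane and `azim` around the z axis, both in degrees. The helper `camera_eye` and its `radius` parameter are hypothetical and not part of the original example.

```python
import math

def camera_eye(elev, azim, radius=2.0):
    """Convert elevation/azimuth in degrees to a Cartesian eye position."""
    elev_rad = math.radians(elev)
    azim_rad = math.radians(azim)
    return dict(
        x=radius * math.cos(elev_rad) * math.cos(azim_rad),
        y=radius * math.cos(elev_rad) * math.sin(azim_rad),
        z=radius * math.sin(elev_rad),
    )

eye = camera_eye(elev=30, azim=20)
# Assumption (untested here): the result could be placed under the
# scene's camera settings of the layout, e.g. camera=dict(eye=eye).
print(eye)
```

Whether the resulting view matches a given Matplotlib viewpoint depends on the axis conventions, so treat the mapping as a starting point rather than an exact equivalent.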

Authors: Gael Varoquaux, Jaques Grobler, Kevin Hughes

License: BSD 3 clause