
SVM Weighted Samples in Scikit-learn

Plot the decision function of a weighted dataset, where the size of each point is proportional to its weight.

Sample weighting rescales the C parameter per sample, so the classifier puts more emphasis on getting the heavily weighted points right. The effect is often subtle; to make it obvious here, we give especially large weights to some outliers, which visibly deforms the decision boundary.
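As a minimal, self-contained sketch of the mechanism (the tiny hand-made dataset and the weight values here are illustrative, not the ones used below), compare an `SVC` fit with uniform weights against one where a single outlier is heavily upweighted:

```python
import numpy as np
from sklearn import svm

# Hand-made data: class +1 in the lower-left, class -1 in the
# upper-right, plus one class -1 outlier inside the +1 cluster.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0], [2.5, 3.5],
              [1.2, 1.2]])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1, -1])

w_uniform = np.ones(len(X))
w_emphasis = np.ones(len(X))
w_emphasis[-1] = 20.0  # emphasize the outlier

clf_uniform = svm.SVC(kernel='rbf').fit(X, y, sample_weight=w_uniform)
clf_emphasis = svm.SVC(kernel='rbf').fit(X, y, sample_weight=w_emphasis)

# Upweighting effectively raises C for that one sample, so the two
# fitted decision functions disagree around the outlier.
print('uniform fit predicts outlier as:', clf_uniform.predict(X[-1:])[0])
print('emphasized fit predicts outlier as:', clf_emphasis.predict(X[-1:])[0])
```

With a weight of 20, the outlier's slack becomes expensive enough that the boundary bends around it, whereas the uniform fit treats it like any other point.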


Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18.1'

Imports

In [2]:
print(__doc__)

import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools

import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
Automatically created module for IPython interactive environment

Calculations

In [3]:
# we create 20 points
np.random.seed(0)
X = np.r_[np.random.randn(10, 2) + [1, 1], np.random.randn(10, 2)]
y = [1] * 10 + [-1] * 10
sample_weight_last_ten = abs(np.random.randn(len(X)))
sample_weight_constant = np.ones(len(X))
# and bigger weights to some outliers
sample_weight_last_ten[15:] *= 5
sample_weight_last_ten[9] *= 15

# fit the model twice: once with the modified sample weights,
# and once without, for reference
clf_weights = svm.SVC()
clf_weights.fit(X, y, sample_weight=sample_weight_last_ten)

clf_no_weights = svm.SVC()
clf_no_weights.fit(X, y)
Out[3]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

Plot Results

In [4]:
def matplotlib_to_plotly(cmap, pl_entries):
    h = 1.0/(pl_entries-1)
    pl_colorscale = []
    
    for k in range(pl_entries):
        # build a list of ints (map() returns a lazy iterator on Python 3,
        # which would break the C[0] indexing below)
        C = [int(c) for c in np.array(cmap(k*h)[:3]) * 255]
        pl_colorscale.append([k*h, 'rgb'+str((C[0], C[1], C[2]))])
        
    return pl_colorscale

cmap = matplotlib_to_plotly(plt.cm.bone, 4)

fig = tools.make_subplots(rows=1, cols=2,
                          subplot_titles=("Constant weights",
                                          "Modified weights"))
This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]
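As a quick sanity check on the conversion (a standalone sketch, with the colormap sampling rewritten to also run on Python 3), the returned colorscale is a list of `[position, 'rgb(r, g, b)']` pairs running from 0 to 1, which is the format Plotly expects:

```python
import numpy as np
import matplotlib.pyplot as plt

def matplotlib_to_plotly(cmap, pl_entries):
    # Sample the matplotlib colormap at pl_entries evenly spaced points
    # and emit [position, 'rgb(r, g, b)'] pairs.
    h = 1.0 / (pl_entries - 1)
    pl_colorscale = []
    for k in range(pl_entries):
        C = [int(c) for c in np.array(cmap(k * h)[:3]) * 255]
        pl_colorscale.append([k * h, 'rgb' + str((C[0], C[1], C[2]))])
    return pl_colorscale

scale = matplotlib_to_plotly(plt.cm.bone, 4)
print(scale)
```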

In [5]:
def plot_decision_function(classifier, sample_weight, col):
    # plot the decision function
    x_ = np.linspace(-4, 5, 500)
    y_ = np.linspace(-4, 5, 500)
    xx, yy = np.meshgrid(x_, y_)

    Z = classifier.decision_function(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    # plot the line, the points, and the nearest vectors to the plane
    p1 = go.Contour(x=x_, y=y_, z=Z, colorscale=cmap,
                    showscale=False,)
    p2 = go.Scatter(x=X[:, 0], y=X[:, 1],
                    mode='markers',
                    showlegend=False,
                    marker=dict(color=y,
                                # marker area scales with the sample weight,
                                # as described in the introduction
                                size=10 * np.sqrt(sample_weight),
                                showscale=False,
                                colorscale=cmap,
                                line=dict(color='black', width=1)))

    fig.append_trace(p1, 1, col)
    fig.append_trace(p2, 1, col)
In [6]:
plot_decision_function(clf_no_weights, sample_weight_constant, 1)
plot_decision_function(clf_weights, sample_weight_last_ten, 2)

for i in map(str, range(1, 3)):
    yaxis = 'yaxis' + i
    xaxis = 'xaxis' + i
    fig['layout'][yaxis].update(showticklabels=False, ticks='')
    fig['layout'][xaxis].update(showticklabels=False, ticks='')
        
In [7]:
py.iplot(fig)
Out[7]: