Validation Curves in Scikit-learn

The plot produced below shows the training scores and cross-validation scores of an SVM for different values of the kernel parameter gamma. For very low values of gamma, both the training score and the validation score are low; this is called underfitting. Medium values of gamma give high values for both scores, i.e. the classifier performs fairly well. If gamma is too high, the classifier overfits: the training score is good but the validation score is poor.
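
The three regimes described above can be written down as a rough rule of thumb. The following is only a minimal sketch: the function name `diagnose_fit` and the thresholds `low` and `gap` are illustrative assumptions, not part of scikit-learn, and sensible cut-offs depend on the dataset and metric.

```python
def diagnose_fit(train_score, validation_score, low=0.7, gap=0.1):
    """Roughly label a (training, validation) score pair.

    `low` and `gap` are hypothetical thresholds chosen for illustration only.
    """
    # Both scores low: the model is too simple for the data.
    if train_score < low and validation_score < low:
        return "underfitting"
    # Training score clearly above validation score: the model memorizes.
    if train_score - validation_score > gap:
        return "overfitting"
    # Otherwise both scores are reasonably high and close together.
    return "good fit"
```

For example, under these assumed thresholds `diagnose_fit(1.0, 0.3)` is labeled as overfitting, while `diagnose_fit(0.99, 0.97)` is labeled as a good fit.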

Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18.1'

Imports

In [2]:
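# Note: in Plotly 4 and later this module is provided by the separate
# chart-studio package as chart_studio.plotly.
import plotly.plotly as py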
import plotly.graph_objs as go

import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.model_selection import validation_curve

Calculations

In [3]:
digits = load_digits()
X, y = digits.data, digits.target

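# Evaluate five gamma values, log-spaced between 1e-6 and 1e-1.
param_range = np.logspace(-6, -1, 5)
# Each returned array has shape (n_param_values, n_cv_folds).
train_scores, test_scores = validation_curve(
    SVC(), X, y, param_name="gamma", param_range=param_range,
    cv=10, scoring="accuracy", n_jobs=1)
# Aggregate over the folds (axis=1): one mean and one std per gamma value.
train_scores_mean = np.mean(train_scores, axis=1)
train_scores_std = np.std(train_scores, axis=1)
test_scores_mean = np.mean(test_scores, axis=1)
test_scores_std = np.std(test_scores, axis=1)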

Plot Results

In [4]:
# Raw string so that "\g" is not treated as an (invalid) escape sequence.
layout = go.Layout(title="Validation Curve with SVM",
                   xaxis=dict(title=r"$\gamma$", type='log'),
                   yaxis=dict(title="Score"))

lw = 2
p1 = go.Scatter(x=param_range, y=train_scores_mean,
                name="Training score",
                mode='lines', 
                line=dict(color="orange", width=lw))

p2 = go.Scatter(x=param_range, y=train_scores_mean - train_scores_std,
                mode='lines', showlegend=False,
                line=dict(color="orange", width=1))

p3 = go.Scatter(x=param_range, y=train_scores_mean + train_scores_std,
                mode='lines', showlegend=False,
                line=dict(color="orange", width=1),
                fill='tonexty')

p4 = go.Scatter(x=param_range, y=test_scores_mean,
                name="Cross-validation score",
                mode='lines', 
                line=dict(color="navy", width=lw))

p5 = go.Scatter(x=param_range, y=test_scores_mean - test_scores_std,
                mode='lines', showlegend=False,
                line=dict(color="navy", width=1)) 

p6 = go.Scatter(x=param_range, y=test_scores_mean + test_scores_std,
                mode='lines', showlegend=False,
                line=dict(color="navy", width=1),
                fill='tonexty') 

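# Each fill='tonexty' trace (p3, p6) must directly follow its lower bound
# (p2, p5), because the fill extends to the previous trace in the list.
fig = go.Figure(data=[p2, p3, p5, p6, p1, p4], layout=layout)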

In [5]:
py.iplot(fig)
Out[5]: