Test with Permutations the Significance of a Classification Score in Scikit-learn

To test whether a classification score is significant, one technique is to repeat the classification procedure after randomly permuting the labels. The p-value is then the fraction of permuted runs whose score is greater than or equal to the score obtained on the original labels. scikit-learn adds one to both the count and the number of permutations, so the reported p-value is never exactly zero.
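The procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not scikit-learn's implementation: `permutation_p_value` and `midpoint_rule_accuracy` are hypothetical names, and the toy scorer is evaluated on the data it was fitted to rather than cross-validated as in the tutorial below.

```python
import random


def permutation_p_value(score_fn, X, y, n_permutations=100, seed=0):
    """Score the real labels, then score n_permutations shuffled copies."""
    rng = random.Random(seed)
    observed = score_fn(X, y)
    permuted_scores = []
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break any link between features and labels
        permuted_scores.append(score_fn(X, y_perm))
    # Count permuted runs that did at least as well as the real labels.
    n_at_least = sum(1 for s in permuted_scores if s >= observed)
    # The +1 terms keep the p-value strictly above zero.
    p_value = (n_at_least + 1) / (n_permutations + 1)
    return observed, permuted_scores, p_value


def midpoint_rule_accuracy(X, y):
    """Toy stand-in for a classifier score (hypothetical, binary labels 0/1).

    Splits 1-D data at the midpoint between the two class means and
    returns the accuracy on the same data.
    """
    xs0 = [x for x, label in zip(X, y) if label == 0]
    xs1 = [x for x, label in zip(X, y) if label == 1]
    mean0 = sum(xs0) / len(xs0)
    mean1 = sum(xs1) / len(xs1)
    midpoint = (mean0 + mean1) / 2
    high_label = 1 if mean1 >= mean0 else 0
    predictions = [high_label if x > midpoint else 1 - high_label for x in X]
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


# Perfectly separable toy data: small values are class 0, large are class 1.
X = list(range(20))
y = [0] * 10 + [1] * 10

observed, permuted_scores, p_value = permutation_p_value(
    midpoint_rule_accuracy, X, y, n_permutations=100, seed=0)
print("score %s (pvalue : %s)" % (observed, p_value))
```

Any callable that takes `(X, y)` and returns a score can be passed as `score_fn`, so a cross-validated estimator score could be substituted for the toy rule.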

Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18.1'

Imports

This tutorial imports SVC, StratifiedKFold and permutation_test_score.

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go

print(__doc__)

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import permutation_test_score
from sklearn import datasets
Automatically created module for IPython interactive environment

Calculations

Loading a dataset

In [3]:
iris = datasets.load_iris()
X = iris.data
y = iris.target
n_classes = np.unique(y).size

# Some noisy data, not correlated with the labels
random = np.random.RandomState(seed=0)
E = random.normal(size=(len(X), 2200))

# Add the noisy data to the informative features to make the task harder
X = np.c_[X, E]

svm = SVC(kernel='linear')
cv = StratifiedKFold(2)

score, permutation_scores, pvalue = permutation_test_score(
    svm, X, y, scoring="accuracy", cv=cv, n_permutations=100, n_jobs=1)

print("Classification score %s (pvalue : %s)" % (score, pvalue))
Classification score 0.513333333333 (pvalue : 0.00990099009901)
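The printed p-value can be checked by hand from the other two return values. The sketch below assumes the p-value is computed as (number of permuted scores at least as large as the real score + 1) / (number of permutations + 1); `p_value_from_scores` is a hypothetical helper, and the example scores are made-up stand-ins rather than the tutorial's actual `permutation_scores`.

```python
def p_value_from_scores(score, permutation_scores):
    """Recompute a permutation-test p-value from the raw scores."""
    n_permutations = len(permutation_scores)
    # Permuted runs that matched or beat the score on the real labels.
    n_at_least = sum(1 for s in permutation_scores if s >= score)
    return (n_at_least + 1) / (n_permutations + 1)


# Hypothetical stand-in: 100 permuted scores, all below the real score.
example_scores = [0.30 + 0.001 * i for i in range(100)]
example_p_value = p_value_from_scores(0.5133, example_scores)
print(example_p_value)
```

With 100 permutations the smallest value this formula can return is 1/101, which is about 0.0099. The output above matches that, which would mean none of the permuted runs reached the score obtained on the real labels.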

Plot Results

View histogram of permutation scores

In [4]:
trace = go.Histogram(x=permutation_scores, 
                     nbinsx=20,
                     marker=dict(color='blue',
                                line=dict(color='black', width=1)),
                     name='Permutation scores')
trace1 = go.Scatter(x=2 * [score], 
                    y=[0, 20],
                    mode='lines',
                    line=dict(color='green', dash='dash'),
                    name='Classification Score'
                         ' (pvalue %s)' % pvalue
                   )

trace2 = go.Scatter(x=2 * [1. / n_classes], 
                    y=[0, 20],
                    mode='lines',
                    line=dict(color='black', dash='dash'),
                    name='Luck'
                   )

data = [trace, trace1, trace2]
layout = go.Layout(xaxis=dict(title='Score'))
fig = go.Figure(data=data, layout=layout)
In [5]:
py.iplot(fig)
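Out[5]:
[Figure: histogram of the permutation scores, with dashed vertical lines marking the classification score and the chance level ("Luck"); x-axis: Score]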

License

Author:

    Alexandre Gramfort <alexandre.gramfort@inria.fr>

License:

    BSD 3 clause