
# Test with Permutations the Significance of a Classification Score in Scikit-learn

To test whether a classification score is significant, one technique is to repeat the classification procedure after randomly permuting the labels. The p-value is then given by the fraction of permuted runs whose score is at least as high as the classification score obtained on the original labels. scikit-learn computes it as (C + 1) / (n_permutations + 1), where C is the number of permutations whose score is greater than or equal to the original score.
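
The procedure can be sketched without scikit-learn, as a rough illustration only. Here `permutation_test` and `threshold_accuracy` are hypothetical helper names, and the fixed-threshold scorer is an assumed stand-in for the cross-validated score of a real classifier. The p-value follows the (C + 1) / (n_permutations + 1) convention documented for `permutation_test_score`.

```python
import random


def threshold_accuracy(X, y):
    """Toy score (assumption): accuracy of the fixed rule 'predict 1 if x > 0'."""
    hits = sum((x > 0) == bool(label) for x, label in zip(X, y))
    return hits / len(y)


def permutation_test(score_fn, X, y, n_permutations=100, seed=0):
    """Score the original labels, then re-score after shuffling the labels."""
    rng = random.Random(seed)
    score = score_fn(X, y)
    permutation_scores = []
    for _ in range(n_permutations):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break any link between features and labels
        permutation_scores.append(score_fn(X, y_perm))
    # C = number of permuted runs scoring at least as well as the original
    n_at_least = sum(s >= score for s in permutation_scores)
    pvalue = (n_at_least + 1) / (n_permutations + 1)
    return score, permutation_scores, pvalue


# 20 one-dimensional samples whose sign determines the label
X = [-(i + 1) for i in range(10)] + [i + 1 for i in range(10)]
y = [0] * 10 + [1] * 10
score, permutation_scores, pvalue = permutation_test(threshold_accuracy, X, y)
print("score %s (pvalue : %s)" % (score, pvalue))
```

Because one is added to both the numerator and the denominator, the p-value can never be exactly zero, however well the original labels score.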


### Version

In [1]:
import sklearn
sklearn.__version__

Out[1]:
'0.18.1'

### Imports

This tutorial imports SVC, StratifiedKFold and permutation_test_score.

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go

print(__doc__)

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import permutation_test_score
from sklearn import datasets

Automatically created module for IPython interactive environment


### Calculations

In [3]:
iris = datasets.load_iris()
X = iris.data
y = iris.target
n_classes = np.unique(y).size

# Generate noisy features that are uncorrelated with the labels
random = np.random.RandomState(seed=0)
E = random.normal(size=(len(X), 2200))

# Add the noisy features to the informative ones to make the task harder
X = np.c_[X, E]

svm = SVC(kernel='linear')
cv = StratifiedKFold(2)

score, permutation_scores, pvalue = permutation_test_score(
    svm, X, y, scoring="accuracy", cv=cv, n_permutations=100, n_jobs=1)

print("Classification score %s (pvalue : %s)" % (score, pvalue))

Classification score 0.513333333333 (pvalue : 0.00990099009901)
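
The reported p-value can be checked by hand. Assuming the (C + 1) / (n_permutations + 1) formula that scikit-learn documents for `permutation_test_score`, a value of 0.0099... with 100 permutations corresponds to C = 0: no permuted run reached the original score, and this is the smallest p-value that 100 permutations can produce. The helper name `permutation_pvalue` is hypothetical.

```python
def permutation_pvalue(n_at_least_as_good, n_permutations):
    """p-value under the (C + 1) / (n_permutations + 1) convention."""
    return (n_at_least_as_good + 1) / (n_permutations + 1)


# C = 0 is inferred from the printed p-value, not read from the original run
pvalue = permutation_pvalue(0, 100)
chance_level = 1 / 3  # three iris classes, so guessing scores about 0.333
print(pvalue, chance_level)
```

A finer-grained p-value would need a larger `n_permutations`, at the cost of refitting the classifier that many more times.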


### Plot Results

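View a histogram of the permutation scores, with the classification score and the chance level ("Luck") marked by dashed vertical lines.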

In [4]:
trace = go.Histogram(x=permutation_scores,
                     nbinsx=20,
                     marker=dict(color='blue',
                                 line=dict(color='black', width=1)),
                     name='Permutation scores')
trace1 = go.Scatter(x=2 * [score],
                    y=[0, 20],
                    mode='lines',
                    line=dict(color='green', dash='dash'),
                    name='Classification Score'
                         ' (pvalue %s)' % pvalue
                    )

trace2 = go.Scatter(x=2 * [1. / n_classes],
                    y=[1, 20],
                    mode='lines',
                    line=dict(color='black', dash='dash'),
                    name='Luck'
                    )

data = [trace, trace1, trace2]
layout = go.Layout(xaxis=dict(title='Score'))
fig = go.Figure(data=data, layout=layout)

In [5]:
py.iplot(fig)

Out[5]:

Author:

    Alexandre Gramfort <alexandre.gramfort@inria.fr>

License:

    BSD 3 clause