
Recursive Feature Elimination With Cross-Validation in Scikit-learn

A recursive feature elimination example with automatic tuning of the number of features selected with cross-validation.

New to Plotly?¶

You can set up Plotly to work in online or offline mode, or in Jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!

Version¶

In [1]:
import sklearn
sklearn.__version__

Out[1]:
'0.18.1'

Imports¶

This tutorial imports SVC, StratifiedKFold, RFECV, and make_classification.

In [2]:
print(__doc__)

import plotly.plotly as py
import plotly.graph_objs as go

from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV
from sklearn.datasets import make_classification

Automatically created module for IPython interactive environment


Calculations¶

In [3]:
# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000, n_features=25, n_informative=3,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)

# Create the RFE object and compute a cross-validated score.
svc = SVC(kernel="linear")
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator=svc, step=1, cv=StratifiedKFold(2),
              scoring='accuracy')
rfecv.fit(X, y)

print("Optimal number of features : %d" % rfecv.n_features_)

Optimal number of features : 3
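
Beyond the optimal feature count, the fitted selector also exposes which columns were kept. The sketch below repeats the same setup and then inspects `rfecv.support_` (a boolean mask of retained features), `rfecv.ranking_` (rank 1 means selected; higher ranks were eliminated earlier), and `rfecv.transform` to reduce `X` to the selected columns. This is an illustrative addition, not part of the original example:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV
from sklearn.datasets import make_classification

# Same synthetic task as above
X, y = make_classification(n_samples=1000, n_features=25, n_informative=3,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)

rfecv = RFECV(estimator=SVC(kernel="linear"), step=1,
              cv=StratifiedKFold(2), scoring='accuracy')
rfecv.fit(X, y)

# Indices of the features RFECV decided to keep
selected = np.where(rfecv.support_)[0]
print("Selected feature indices:", selected)

# Ranking over all 25 features (1 = selected)
print("Feature ranking:", rfecv.ranking_)

# Drop the eliminated columns from the data
X_reduced = rfecv.transform(X)
print("Reduced shape:", X_reduced.shape)
```

`transform` is handy when the reduced matrix feeds a downstream model; alternatively, `RFECV` can sit inside a `Pipeline` so the selection is refit on each training fold.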


Plot Number of Features vs. Cross-Validation Scores¶

In [4]:
trace = go.Scatter(x=list(range(1, len(rfecv.grid_scores_) + 1)),
                   y=rfecv.grid_scores_)

layout = go.Layout(xaxis=dict(title="Number of features selected",
                              showgrid=False),
                   yaxis=dict(title="Cross validation score (nb of correct classifications)",
                              showgrid=False))

fig = go.Figure(data=[trace], layout=layout)

In [5]:
py.iplot(fig)

Out[5]: