Feature Selection Using SelectFromModel and LassoCV in Scikit-learn

Use the SelectFromModel meta-transformer together with LassoCV to select the two most important features from the Boston housing dataset.

Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18.1'

Imports

This tutorial imports load_boston, SelectFromModel and LassoCV.

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go

import numpy as np
from sklearn.datasets import load_boston
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

Calculations

In [3]:
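# Load the Boston housing dataset. load_boston is available in the
# scikit-learn version shown above; it was removed in release 1.2.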
boston = load_boston()
X, y = boston['data'], boston['target']

# We use the base estimator LassoCV since the L1 norm promotes sparsity of features.
clf = LassoCV()

# Set a minimum threshold of 0.25
sfm = SelectFromModel(clf, threshold=0.25)
sfm.fit(X, y)
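# Transform once up front so that X_transform is defined even if the
# loop below never runs.
X_transform = sfm.transform(X)
n_features = X_transform.shape[1]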

# Raise the threshold until at most two features remain.
# The threshold attribute can be updated directly on the fitted selector,
# so the meta-transformer does not need to be refit each time.
while n_features > 2:
    sfm.threshold += 0.1
    X_transform = sfm.transform(X)
    n_features = X_transform.shape[1]
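
The threshold loop above can be sketched on synthetic data, which also makes it easy to check which columns survive. This is a minimal illustration, not the tutorial's own result: the random data, the coefficient vector `true_coef`, and the `*_demo` names are assumptions introduced here, and `get_support(indices=True)` is used to report the indices of the retained columns.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

# Synthetic stand-in for the Boston data (an assumption for illustration):
# only columns 0, 2 and 4 influence the target, with decreasing strength.
rng = np.random.RandomState(0)
X_demo = rng.randn(200, 6)
true_coef = np.array([5.0, 0.0, 3.0, 0.0, 0.5, 0.0])
y_demo = X_demo @ true_coef + 0.1 * rng.randn(200)

# Fit the selector once with the same starting threshold as the tutorial.
sfm_demo = SelectFromModel(LassoCV(cv=5), threshold=0.25)
sfm_demo.fit(X_demo, y_demo)
X_demo_selected = sfm_demo.transform(X_demo)

# Raise the threshold on the fitted selector until at most two columns remain.
while X_demo_selected.shape[1] > 2:
    sfm_demo.threshold += 0.1
    X_demo_selected = sfm_demo.transform(X_demo)

# get_support(indices=True) returns the indices of the columns that were kept.
selected_idx = sfm_demo.get_support(indices=True)
print("Selected columns:", selected_idx)
print("Lasso coefficients:", np.round(sfm_demo.estimator_.coef_, 2))
```

Because the step size is fixed at 0.1, the loop can overshoot and leave fewer than two features when two coefficients are close in magnitude. scikit-learn 0.20 and later also accept a `max_features` argument on `SelectFromModel`, which may be a simpler way to cap the selection if a newer release is available.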

Plot Results

Plot the two selected features of X against each other.

In [4]:
layout = go.Layout(title="Features selected from Boston using SelectFromModel with "
                         "threshold %0.3f." % sfm.threshold,
                   xaxis=dict(title="Feature number 1"),
                   yaxis=dict(title="Feature number 2")
                  )

feature1 = X_transform[:, 0]
feature2 = X_transform[:, 1]

trace = go.Scatter(x=feature1, y=feature2,
                   mode='markers',
                   marker=dict(color='red')
                  )

fig = go.Figure(data=[trace], layout=layout)

In [5]:
py.iplot(fig)
Out[5]:

License

Author:

    Manoj Kumar <mks542@nyu.edu>

License:

    BSD 3 clause