
Lasso Model Selection: Cross-Validation, AIC and BIC in Scikit-learn

Use the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and cross-validation to select an optimal value of the regularization parameter alpha of the Lasso estimator.

Results obtained with LassoLarsIC are based on AIC/BIC criteria.

Information-criterion based model selection is very fast, but it relies on a proper estimation of the degrees of freedom. The criteria are derived for large samples (asymptotic results) and assume the model is correct, i.e. that the data are actually generated by the model under consideration. They also tend to break down when the problem is badly conditioned (more features than samples).
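As a rough illustration of where the degrees of freedom enter, the textbook criteria for a Gaussian noise model can be written down in a few lines. This is a minimal sketch, not LassoLarsIC's exact internals, and the helper name lasso_information_criterion is made up here; the degrees of freedom of the Lasso are approximated by the number of non-zero coefficients (Zou et al., 2007).

import numpy as np

def lasso_information_criterion(y_true, y_pred, n_nonzero_coefs,
                                criterion='aic'):
    # Textbook AIC/BIC for Gaussian errors: n * log(RSS / n) + penalty * df,
    # with df approximated by the number of non-zero coefficients.
    n = len(y_true)
    rss = np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
    penalty = 2.0 if criterion == 'aic' else np.log(n)  # BIC penalizes harder
    return n * np.log(rss / n) + penalty * n_nonzero_coefs

Because BIC's log(n) penalty grows with the sample size, it tends to select sparser models than AIC.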

For cross-validation, we use 20-fold cross-validation with two algorithms to compute the Lasso path: coordinate descent, as implemented by the LassoCV class, and Lars (least-angle regression), as implemented by the LassoLarsCV class. Both algorithms give roughly the same results; they differ in their execution speed and in their sources of numerical error.

Lars computes a path solution only at each kink in the path. As a result, it is very efficient when there are only a few kinks, which is the case if there are few features or samples. It is also able to compute the full path without setting any meta-parameter. In contrast, coordinate descent computes the path points on a pre-specified grid (here we use the default). It is therefore more efficient if the number of grid points is smaller than the number of kinks in the path. Such a strategy can be interesting if the number of features is really large and there are enough samples for a large number of them to be selected. In terms of numerical errors, Lars will accumulate more errors for heavily correlated variables, while coordinate descent will only sample the path on a grid.
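To get a feel for this trade-off, one can count the kinks Lars visits against the fixed grid that coordinate descent uses. This is a quick sketch on the raw diabetes data; only lars_path and the LassoCV defaults are assumed:

from sklearn import datasets
from sklearn.linear_model import lars_path, LassoCV

diabetes = datasets.load_diabetes()
# Lars stops at every kink, i.e. wherever a variable enters or leaves the
# active set, so the length of its alpha sequence is data-driven.
alphas_lars, _, _ = lars_path(diabetes.data, diabetes.target, method='lasso')
print("kinks visited by Lars:        %d" % len(alphas_lars))
# Coordinate descent evaluates the path on a fixed grid of alphas.
print("coordinate descent grid size: %d" % LassoCV().n_alphas)  # default: 100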

Note how the optimal value of alpha varies for each fold. This illustrates why nested cross-validation is necessary when evaluating the performance of a method whose parameter is itself chosen by cross-validation: the chosen parameter may not be optimal for unseen data.
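A minimal sketch of such a nested scheme, assuming only LassoCV and cross_val_score: the inner loop (inside LassoCV) picks alpha on each training split, while the outer loop scores the whole selection procedure on held-out folds:

from sklearn import datasets
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score

diabetes = datasets.load_diabetes()
# Inner loop: LassoCV selects alpha by 5-fold CV on each outer training split.
inner_model = LassoCV(cv=5)
# Outer loop: score the alpha-selection procedure itself on 5 held-out folds.
scores = cross_val_score(inner_model, diabetes.data, diabetes.target, cv=5)
print("outer-fold R^2: %0.3f +/- %0.3f" % (scores.mean(), scores.std()))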

Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18.1'

Imports

In [2]:
print(__doc__)


import plotly.plotly as py
import plotly.graph_objs as go

import time
import numpy as np

from sklearn.linear_model import LassoCV, LassoLarsCV, LassoLarsIC
from sklearn import datasets
Automatically created module for IPython interactive environment

Calculations

In [3]:
diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

rng = np.random.RandomState(42)
X = np.c_[X, rng.randn(X.shape[0], 14)]  # add 14 uninformative noise features

# normalize each feature to unit Euclidean norm, as Lars does internally,
# so that the alpha scales of the two solvers are comparable
X /= np.sqrt(np.sum(X ** 2, axis=0))

def data_to_plotly(coefs):
    # Transpose the (n_alphas, n_folds) MSE path so that each fold
    # becomes one list of values to plot as a separate trace.
    return [list(col) for col in np.asarray(coefs).T]

LassoLarsIC: least angle regression with BIC/AIC criterion

In [4]:
model_bic = LassoLarsIC(criterion='bic')
t1 = time.time()
model_bic.fit(X, y)
t_bic = time.time() - t1
alpha_bic_ = model_bic.alpha_

model_aic = LassoLarsIC(criterion='aic')
model_aic.fit(X, y)
alpha_aic_ = model_aic.alpha_


def plot_ic_criterion(model, name, color):
    alpha_ = model.alpha_
    alphas_ = model.alphas_
    criterion_ = model.criterion_
    trace1 = go.Scatter(x=-np.log10(alphas_), y=criterion_, 
                        mode='lines',
                        line=dict(color=color),
                        name='%s criterion' % name)
    
    trace2 = go.Scatter(x=2*[-np.log10(alpha_)],
                        y=[3550, 3900],  # y-extent chosen to span the criterion curves
                        mode='lines',
                        line=dict(color=color, dash='dash'),
                        name='alpha: %s estimate' % name)
    return trace1, trace2


aic1, aic2 = plot_ic_criterion(model_aic, 'AIC', 'blue')
bic1, bic2 = plot_ic_criterion(model_bic, 'BIC', 'red')

layout = go.Layout(title='Information-criterion for model selection (training time %.3fs)'
                          % t_bic,
                   xaxis=dict(title='-log(alpha)', zeroline=False),
                   yaxis=dict(title='criterion')
                  )
fig = go.Figure(data=[aic1, aic2, bic1, bic2], layout=layout)
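Before plotting, the alphas picked by the two criteria can be inspected directly, using the alpha_aic_ and alpha_bic_ values computed above:

print("alpha selected by AIC: %f" % alpha_aic_)
print("alpha selected by BIC: %f" % alpha_bic_)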
In [5]:
py.iplot(fig)
Out[5]: [interactive figure: AIC and BIC criterion curves vs. -log(alpha), with each criterion's selected alpha marked by a dashed vertical line]

LassoCV: coordinate descent

In [6]:
# Compute paths
print("Computing regularization path using the coordinate descent lasso...")
t1 = time.time()
model = LassoCV(cv=20).fit(X, y)
t_lasso_cv = time.time() - t1

# Display results
m_log_alphas = -np.log10(model.alphas_)
Computing regularization path using the coordinate descent lasso...
In [7]:
ymin, ymax = 2300, 3800
data = []
y_ = data_to_plotly(model.mse_path_)

for i in range(0, len(y_)):
    p1 = go.Scatter(x=m_log_alphas, y=y_[i],
                    mode='lines', line=dict(dash='dot', width=1),
                    showlegend=False)
    data.append(p1)

p2 = go.Scatter(x=m_log_alphas, y=model.mse_path_.mean(axis=-1),
                mode='lines', line=dict(color='black'),
                name='Average across the folds')

p3 = go.Scatter(x=2*[-np.log10(model.alpha_)],
                y=[ymin, ymax],
                mode='lines', line=dict(color='black', dash='dashdot'),
                name='alpha: CV estimate')

data.append(p2)
data.append(p3)

layout = go.Layout(title='Mean square error on each fold: coordinate descent '
                          '(train time: %.2fs)' % t_lasso_cv,
                   hovermode='closest',
                   xaxis=dict(title='-log(alpha)', zeroline=False),
                   yaxis=dict(title='Mean square error', zeroline=False,
                              range=[ymin, ymax])
                  )
fig = go.Figure(data=data, layout=layout)
In [8]:
py.iplot(fig)
Out[8]: [interactive figure: per-fold mean square error curves for coordinate descent, the average across folds, and the CV-selected alpha]

LassoLarsCV: least angle regression

In [9]:
# Compute paths
print("Computing regularization path using the Lars lasso...")
t1 = time.time()
model = LassoLarsCV(cv=20).fit(X, y)
t_lasso_lars_cv = time.time() - t1
Computing regularization path using the Lars lasso...
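Since both cross-validated fits were timed, the execution speeds discussed earlier can be compared directly from the t_lasso_cv and t_lasso_lars_cv values recorded above (numbers will vary by machine):

print("LassoCV (coordinate descent): %.2fs" % t_lasso_cv)
print("LassoLarsCV (Lars):           %.2fs" % t_lasso_lars_cv)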
In [10]:
m_log_alphas = -np.log10(model.cv_alphas_)
data = []
y_ = data_to_plotly(model.cv_mse_path_)

for i in range(0, len(y_)):
    p1 = go.Scatter(x=m_log_alphas, y=y_[i],
                    mode='lines', line=dict(dash='dot', width=1),
                    showlegend=False)
    data.append(p1)

p2 = go.Scatter(x=m_log_alphas, y=model.cv_mse_path_.mean(axis=-1),
                mode='lines', line=dict(color='black'),
                name='Average across the folds')

p3 = go.Scatter(x=2*[-np.log10(model.alpha_)],
                y=[ymin, ymax],
                mode='lines', line=dict(color='black', dash='dashdot'),
                name='alpha: CV estimate')

data.append(p2)
data.append(p3)

layout = go.Layout(title='Mean square error on each fold: Lars (train time: %.2fs)'
                          % t_lasso_lars_cv,
                   hovermode='closest',
                   xaxis=dict(title='-log(alpha)', zeroline=False),
                   yaxis=dict(title='Mean square error', zeroline=False,
                              range=[ymin, ymax])
                  )
fig = go.Figure(data=data, layout=layout)
In [11]:
py.iplot(fig)
Out[11]: [interactive figure: per-fold mean square error curves for Lars, the average across folds, and the CV-selected alpha]

License

Authors:

    Olivier Grisel

    Gael Varoquaux

    Alexandre Gramfort


License:

    BSD 3 clause