Show Sidebar Hide Sidebar

# Lasso on Dense and Sparse Data in Scikit-learn

We show that linear_model.Lasso provides the same results for dense and sparse data and that in the case of sparse data the speed is improved.

#### New to Plotly?¶

You can set up Plotly to work in online or offline mode, or in jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!

### Version¶

In [1]:
import sklearn
sklearn.__version__

Out[1]:
'0.18.1'

### Imports¶

In [2]:
print(__doc__)

from time import time
from scipy import sparse
from scipy import linalg

from sklearn.datasets.samples_generator import make_regression
from sklearn.linear_model import Lasso

Automatically created module for IPython interactive environment


### The two Lasso implementations on Dense data¶

In [3]:
print("--- Dense matrices")

X, y = make_regression(n_samples=200, n_features=5000, random_state=0)
X_sp = sparse.coo_matrix(X)

alpha = 1
sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)
dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=1000)

t0 = time()
sparse_lasso.fit(X_sp, y)
print("Sparse Lasso done in %fs" % (time() - t0))

t0 = time()
dense_lasso.fit(X, y)
print("Dense Lasso done in %fs" % (time() - t0))

print("Distance between coefficients : %s"
% linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_))

--- Dense matrices
Sparse Lasso done in 0.131697s
Dense Lasso done in 0.047787s
Distance between coefficients : 8.01528740104e-14


### The two Lasso implementations on Sparse data¶

In [4]:
print("--- Sparse matrices")

Xs = X.copy()
Xs[Xs < 2.5] = 0.0
Xs = sparse.coo_matrix(Xs)
Xs = Xs.tocsc()

print("Matrix density : %s %%" % (Xs.nnz / float(X.size) * 100))

alpha = 0.1
sparse_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)
dense_lasso = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000)

t0 = time()
sparse_lasso.fit(Xs, y)
print("Sparse Lasso done in %fs" % (time() - t0))

t0 = time()
dense_lasso.fit(Xs.toarray(), y)
print("Dense Lasso done in %fs" % (time() - t0))

print("Distance between coefficients : %s"
% linalg.norm(sparse_lasso.coef_ - dense_lasso.coef_))

--- Sparse matrices
Matrix density : 0.6263 %
Sparse Lasso done in 0.183157s
Dense Lasso done in 1.208756s
Distance between coefficients : 8.03421128671e-12

Still need help?