
# Ordinary Least Squares and Ridge Regression Variance in Scikit-learn

Because there are only a few points in each dimension, and linear regression fits a straight line that follows these points as closely as it can, noise in the observations causes large variance in the fit, as shown in the first plot. The slope of the fitted line can change considerably from one fit to the next because of the noise in the observations.
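The sensitivity described above can be checked directly. The following is a minimal sketch, assuming the same two training points and noise level (0.1) used later on this page; the seed and the variable names are illustrative choices, not part of the original example. With only two points the least-squares line passes through both, so its slope is just the ratio of the differences in y and x, and any shift in x changes it directly.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

y_train = np.array([0.5, 1.0])
rng = np.random.RandomState(0)  # arbitrary seed, used only for repeatability

slopes = []
for _ in range(6):
    # Two training inputs, each shifted by Gaussian noise with std 0.1.
    noisy_X = np.array([[0.5], [1.0]]) + 0.1 * rng.normal(size=(2, 1))
    model = LinearRegression().fit(noisy_X, y_train)

    # With two points the fitted line interpolates both of them,
    # so the slope equals delta_y / delta_x.
    delta_x = noisy_X[1, 0] - noisy_X[0, 0]
    expected_slope = (y_train[1] - y_train[0]) / delta_x
    assert np.isclose(model.coef_[0], expected_slope)
    slopes.append(float(model.coef_[0]))

print("slopes from six noisy fits:", np.round(slopes, 3))
```

Because the slope is inversely proportional to the gap between the two inputs, a small gap produces a very steep line, which is where the large spread comes from.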

Ridge regression minimizes a penalised version of the least-squares loss. The penalty shrinks the regression coefficients toward zero. Despite the few data points in each dimension, the slope of the fitted line is much more stable, and the variance of the line itself is greatly reduced compared with standard linear regression.
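To make the shrinkage concrete, here is a small sketch for the one-feature case, assuming the training points and `alpha=0.1` used later on this page. It relies on the fact that scikit-learn's `Ridge` fits an unpenalised intercept by default, so the slope can be computed by hand on centred data; the variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

X_train = np.array([[0.5], [1.0]])
y_train = np.array([0.5, 1.0])
alpha = 0.1  # same penalty strength as the example on this page

# Centre the data, since the intercept is not penalised.
x_c = X_train[:, 0] - X_train[:, 0].mean()
y_c = y_train - y_train.mean()

# One-feature ridge solution:
# minimise sum((y_c - w * x_c)**2) + alpha * w**2  over w.
w_closed_form = np.dot(x_c, y_c) / (np.dot(x_c, x_c) + alpha)

w_ridge = Ridge(alpha=alpha).fit(X_train, y_train).coef_[0]
w_ols = LinearRegression().fit(X_train, y_train).coef_[0]

print("OLS slope:              ", w_ols)
print("Ridge slope (sklearn):  ", w_ridge)
print("Ridge slope (by hand):  ", w_closed_form)
```

The extra `alpha` in the denominator is what pulls the slope toward zero, and it also keeps the denominator away from zero when the two inputs happen to land close together.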

### Version

In [1]:
import sklearn
sklearn.__version__

Out[1]:
'0.18.1'

### Imports

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools

import numpy as np
from sklearn import linear_model


### Calculations

In [3]:
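# Two training points and two test points, each stored as a (2, 1) column
X_train = np.c_[.5, 1].T
y_train = [.5, 1]
X_test = np.c_[0, 2].T

# Fix the seed so the noisy fits below are reproducible
np.random.seed(0)

# Both estimators are regressors; the dict keeps the name used in the original example
classifiers = dict(ols=linear_model.LinearRegression(),
                   ridge=linear_model.Ridge(alpha=.1))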
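The `np.c_[...].T` idiom above is compact but easy to misread. As a brief sketch (the names `X_explicit` and `x_values` are illustrative only), this shows the shape it produces and an equivalent explicit spelling, along with one way to flatten the column back into a plain list:

```python
import numpy as np

# np.c_ with scalars builds a single row of shape (1, 2);
# transposing gives a column of shape (2, 1), i.e. 2 samples with 1 feature.
X_train = np.c_[.5, 1].T
print(X_train.shape)

# An equivalent, more explicit way to write the same array.
X_explicit = np.array([[0.5], [1.0]])
assert np.array_equal(X_train, X_explicit)

# Flatten the column into a plain Python list, e.g. for plotting.
x_values = X_train.ravel().tolist()
print(x_values)
```

scikit-learn estimators expect a 2-D array of shape `(n_samples, n_features)`, which is why the inputs are kept as columns rather than flat lists.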


### Plot Results

In [4]:
fig = tools.make_subplots(rows=1, cols=2,
                          print_grid=False,
                          subplot_titles=('ols', 'ridge'))

def data_to_plotly(x):
    # Flatten a column of shape (n, 1) into a plain list for Plotly
    k = []

    for i in range(0, len(x)):
        k.append(x[i][0])

    return k

In [5]:
fignum = 1
for name, clf in classifiers.items():
    # Fit on six noisy copies of the training inputs (gray lines and markers)
    for _ in range(6):
        this_X = .1 * np.random.normal(size=(2, 1)) + X_train
        clf.fit(this_X, y_train)

        p1 = go.Scatter(x=data_to_plotly(X_test),
                        y=clf.predict(X_test),
                        mode='lines', showlegend=False,
                        line=dict(color='gray', width=1))
        p2 = go.Scatter(x=data_to_plotly(this_X),
                        y=y_train, showlegend=False,
                        mode='markers',
                        marker=dict(color='gray')
                        )
        fig.append_trace(p1, 1, fignum)
        fig.append_trace(p2, 1, fignum)

    # Fit on the noise-free training inputs (blue line, red markers)
    clf.fit(X_train, y_train)

    p3 = go.Scatter(x=data_to_plotly(X_test),
                    y=clf.predict(X_test),
                    mode='lines', showlegend=False,
                    line=dict(color='blue', width=2)
                    )

    p4 = go.Scatter(x=data_to_plotly(X_train),
                    y=y_train,
                    mode='markers', showlegend=False,
                    marker=dict(color='red')
                    )
    fig.append_trace(p3, 1, fignum)
    fig.append_trace(p4, 1, fignum)
    fignum += 1

for i in map(str, range(1, 3)):
    x = 'xaxis' + i
    y = 'yaxis' + i
    fig['layout'][x].update(title='x', zeroline=False)
    fig['layout'][y].update(title='y', zeroline=False)

In [6]:
py.iplot(fig)

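Out[6]:

[Figure: two side-by-side panels titled 'ols' and 'ridge', with axes labelled x and y. Each panel shows gray lines fitted to six noisy copies of the training points, a blue line fitted to the original training points, and the original training points in red.]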
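The difference in spread between the two panels can also be summarised numerically. The sketch below is an illustration rather than part of the original example: the helper `slope_std`, the number of repeats, and the seed are all hypothetical choices. It refits each model on many noisy copies of the training inputs and reports the standard deviation of the fitted slope.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

X_train = np.array([[0.5], [1.0]])
y_train = np.array([0.5, 1.0])


def slope_std(make_model, n_repeats=200, seed=0):
    """Standard deviation of the fitted slope over noisy refits.

    n_repeats and seed are illustrative defaults, not values from the example.
    """
    rng = np.random.RandomState(seed)
    slopes = []
    for _ in range(n_repeats):
        # Same noise model as the plots: Gaussian noise with std 0.1 on the inputs.
        noisy_X = X_train + 0.1 * rng.normal(size=X_train.shape)
        slopes.append(make_model().fit(noisy_X, y_train).coef_[0])
    return float(np.std(slopes))


ols_std = slope_std(LinearRegression)
ridge_std = slope_std(lambda: Ridge(alpha=0.1))

print("std of OLS slope:  ", ols_std)
print("std of ridge slope:", ridge_std)
```

Using the same seed for both models means they see identical noisy inputs, so any difference in spread comes from the penalty alone.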

Code source: Gaël Varoquaux

Modified for documentation by Jaques Grobler

License: BSD 3 clause