Ordinary Least Squares and Ridge Regression Variance in Scikit-learn

There are only two training points, and ordinary least squares fits a straight line through them as closely as it can (with two points, exactly through both). As a result, noise on the observations causes high variance in the fitted model, as the first plot shows. The slope of each line, and therefore each prediction, can change considerably from one noisy draw of the data to the next.
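The following minimal NumPy sketch makes the variance argument concrete. It is not part of the original example. It refits the closed-form simple-regression line to noisy copies of the two training inputs. The helper `ols_line` and the use of a `np.random.RandomState` generator are illustrative assumptions. The data and the 0.1 noise scale mirror the example code further down. Because two points are fitted exactly, the slope is just rise over run between the jittered points, so a small shift in either x value moves it noticeably.

```python
import numpy as np

# Two training points, as in the example: x = 0.5 and x = 1, with y = x.
x_train = np.array([0.5, 1.0])
y_train = np.array([0.5, 1.0])

rng = np.random.RandomState(0)

def ols_line(x, y):
    """Hypothetical helper: closed-form simple linear regression (slope, intercept)."""
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return slope, intercept

slopes = []
for _ in range(6):
    # Jitter the inputs with 0.1-scale Gaussian noise, as the example does.
    x_noisy = x_train + 0.1 * rng.normal(size=2)
    slope, intercept = ols_line(x_noisy, y_train)
    # With only two points, the fitted line passes exactly through both...
    assert np.allclose(slope * x_noisy + intercept, y_train)
    # ...so the slope is simply rise over run between the two noisy points.
    assert np.isclose(slope, (y_train[1] - y_train[0]) / (x_noisy[1] - x_noisy[0]))
    slopes.append(slope)

print("noise-free slope:", ols_line(x_train, y_train)[0])
print("noisy slopes:", np.round(slopes, 2))
```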

Ridge regression minimizes a penalised version of the least-squares objective. The penalty on the size of the coefficients shrinks them toward zero. As a result, the slope of the prediction is much more stable despite the few data points in each dimension. The variance of the line itself is also greatly reduced compared with that of ordinary linear regression.
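scikit-learn documents the `Ridge` objective as minimising ||y - Xw||² + alpha·||w||². When `fit_intercept=True`, the intercept is left unpenalised. The sketch below solves that problem in closed form after centring the data. The function `ridge_closed_form` is a hypothetical helper, not a scikit-learn API. The sketch then compares the result with `Ridge(alpha=0.1)` and `LinearRegression` on the two noise-free training points.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: minimise ||y - Xw - b||^2 + alpha * ||w||^2, b unpenalised."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    # Centring removes the intercept from the problem, so only w is penalised.
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Normal equations with the ridge term: (Xc^T Xc + alpha I) w = Xc^T yc
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(n_features), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b

X_train = np.array([[0.5], [1.0]])
y_train = np.array([0.5, 1.0])

w, b = ridge_closed_form(X_train, y_train, alpha=0.1)
ridge = Ridge(alpha=0.1).fit(X_train, y_train)
ols = LinearRegression().fit(X_train, y_train)

print("closed-form ridge: slope", w[0], "intercept", b)
print("sklearn Ridge:     slope", ridge.coef_[0], "intercept", ridge.intercept_)
print("OLS:               slope", ols.coef_[0], "intercept", ols.intercept_)
```

On this data, centring gives Xc^T Xc = Xc^T yc = 0.125. The penalty therefore pulls the slope from 1 (OLS) down to 0.125 / (0.125 + 0.1) = 5/9 ≈ 0.556, with intercept 0.75 · (1 - 5/9) = 1/3.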

Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18.1'

Imports

In [2]:
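import plotly.plotly as py        # online plotting (plotly < 4; moved to chart_studio.plotly in plotly 4)
import plotly.graph_objs as go    # figure objects such as go.Scatter
from plotly import tools          # provides tools.make_subplots

import numpy as np
from sklearn import linear_model  # LinearRegression and Ridge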

Calculations

In [3]:
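from collections import OrderedDict

# Two training points (x = 0.5 and x = 1) as a column vector of shape (2, 1)
X_train = np.c_[.5, 1].T
y_train = [.5, 1]
# Two test points (x = 0 and x = 2), used to draw each fitted line across the plot
X_test = np.c_[0, 2].T

np.random.seed(0)

# OrderedDict keeps 'ols' before 'ridge' so each model is drawn under the matching
# subplot title (plain dict ordering is not guaranteed before Python 3.7)
classifiers = OrderedDict([('ols', linear_model.LinearRegression()),
                           ('ridge', linear_model.Ridge(alpha=.1))])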
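The example fixes `alpha=.1`. To see how that choice controls the shrinkage, you can refit `Ridge` on the same two training points for a few other values. The alpha grid below is an arbitrary illustration and does not come from the original. Values of 0 are avoided because that case is plain least squares.

```python
import numpy as np
from sklearn import linear_model

X_train = np.c_[.5, 1].T
y_train = [.5, 1]

alphas = [0.01, 0.1, 1.0, 10.0]  # illustrative values only
slopes = {}
for alpha in alphas:
    clf = linear_model.Ridge(alpha=alpha).fit(X_train, y_train)
    slopes[alpha] = clf.coef_[0]
    print("alpha=%-5g slope=%.3f intercept=%.3f" % (alpha, clf.coef_[0], clf.intercept_))
```

As alpha grows, the slope shrinks toward 0. Because the intercept is not penalised, the line flattens toward the horizontal line y = 0.75, the mean of `y_train`.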

Plot Results

In [4]:
fig = tools.make_subplots(rows=1, cols=2,
                          print_grid=False,
                          subplot_titles=('ols', 'ridge'))

def data_to_plotly(x):
    # Flatten an (n, 1) array into a plain list holding its single column
    return [row[0] for row in x]
In [5]:
fignum = 1
for name, clf in classifiers.items():
    # Refit on six noisy copies of the training inputs (gray lines and points)
    for _ in range(6):
        this_X = .1 * np.random.normal(size=(2, 1)) + X_train
        clf.fit(this_X, y_train)

        p1 = go.Scatter(x=data_to_plotly(X_test), 
                        y=clf.predict(X_test), 
                        mode='lines', showlegend=False,
                        line=dict(color='gray', width=1))
        p2 = go.Scatter(x=data_to_plotly(this_X), 
                        y=y_train, showlegend=False,
                        mode='markers',
                        marker=dict(color='gray')
                       )
        fig.append_trace(p1, 1, fignum)
        fig.append_trace(p2, 1, fignum)
        
    # Fit once on the noise-free training data (blue line, red points)
    clf.fit(X_train, y_train)
    
    p3 = go.Scatter(x=data_to_plotly(X_test), 
                    y=clf.predict(X_test),
                    mode='lines', showlegend=False,
                    line=dict(color='blue', width=2)
                    )
    
    p4 = go.Scatter(x=data_to_plotly(X_train), 
                    y=y_train, 
                    mode='markers', showlegend=False,
                    marker=dict(color='red')
                   )
    fig.append_trace(p3, 1, fignum)
    fig.append_trace(p4, 1, fignum)
    fignum += 1

for i in map(str, range(1, 3)):
    x = 'xaxis' + i
    y = 'yaxis' + i
    fig['layout'][x].update(title='x', zeroline=False)
    fig['layout'][y].update(title='y', zeroline=False)
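The plots show the spread visually. The following sketch gives a rough numerical check using assumed settings: 200 repetitions instead of 6, a separate `RandomState(0)` generator, and the hypothetical names `models` and `spread`. It repeats the same input jitter and measures the standard deviation of each model's predictions at the two test points, x = 0 and x = 2.

```python
import numpy as np
from sklearn import linear_model

X_train = np.c_[.5, 1].T
y_train = [.5, 1]
X_test = np.c_[0, 2].T

rng = np.random.RandomState(0)
n_repeats = 200  # more draws than the 6 per panel, so the estimate is less noisy

models = [('ols', linear_model.LinearRegression()),
          ('ridge', linear_model.Ridge(alpha=.1))]
spread = {}
for name, clf in models:
    preds = []
    for _ in range(n_repeats):
        # Same 0.1-scale jitter of the training inputs as in the plotting loop
        this_X = .1 * rng.normal(size=(2, 1)) + X_train
        clf.fit(this_X, y_train)
        preds.append(clf.predict(X_test))
    preds = np.array(preds)           # shape (n_repeats, 2): predictions at x=0 and x=2
    spread[name] = preds.std(axis=0)  # per-test-point standard deviation
    print(name, "std of predictions at x=0, x=2:", np.round(spread[name], 3))
```

Ridge should report a markedly smaller spread at both test points. This matches the tighter bundle of gray lines in the 'ridge' panel.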
In [6]:
py.iplot(fig)
Out[6]:
[Figure: two side-by-side panels, 'ols' and 'ridge', each plotting y against x]

License

Code source:

        Gaël Varoquaux

Modified for documentation by Jaques Grobler

License:

        BSD 3 clause