
Comparison of Kernel Ridge Regression and SVR in Scikit-learn

Both kernel ridge regression (KRR) and SVR learn a non-linear function by employing the kernel trick, i.e., they learn a linear function in the space induced by the respective kernel, which corresponds to a non-linear function in the original space. They differ in the loss function: KRR uses squared error with ridge (L2) regularization, while SVR uses the epsilon-insensitive loss. In contrast to SVR, fitting a KRR model can be done in closed form and is typically faster for medium-sized datasets. On the other hand, the learned model is non-sparse and thus slower than SVR at prediction time.
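
To make the closed-form remark concrete: the KRR dual coefficients solve a single linear system, alpha = (K + lambda*I)^-1 y, where K is the kernel matrix of the training data. The following minimal sketch (on toy data, not part of the original example) illustrates this:

# Illustrative sketch: KRR's fit reduces to one linear solve, whereas
# SVR must solve a constrained optimization problem.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

demo_rng = np.random.RandomState(0)
X_demo = demo_rng.rand(20, 1)
y_demo = np.sin(5 * X_demo).ravel()

lam = 0.1                                  # regularization strength ("alpha" in scikit-learn)
K = rbf_kernel(X_demo, X_demo, gamma=10)   # kernel matrix of the training data
dual_coef = np.linalg.solve(K + lam * np.eye(len(X_demo)), y_demo)

# Prediction is a kernel expansion over *all* training samples, which is
# why the KRR model is non-sparse and slower at prediction time.
X_new = demo_rng.rand(5, 1)
y_new = rbf_kernel(X_new, X_demo, gamma=10).dot(dual_coef)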

This example illustrates both methods on an artificial dataset, which consists of a sinusoidal target function with strong noise added to every fifth datapoint. The first figure compares the learned models of KRR and SVR when both the complexity/regularization and the bandwidth of the RBF kernel are optimized using grid search. The learned functions are very similar; however, fitting KRR is approximately seven times faster than fitting SVR (both with grid search). Predicting 100,000 target values, on the other hand, is more than three times faster with SVR, since it has learned a sparse model using only approximately 1/3 of the 100 training datapoints as support vectors.

The next figure compares the time for fitting and prediction of KRR and SVR for different sizes of the training set. Fitting KRR is faster than SVR for medium-sized training sets (fewer than 1,000 samples); however, for larger training sets SVR scales better. With regard to prediction time, SVR is faster than KRR for all sizes of the training set because of the learned sparse solution. Note that the degree of sparsity, and thus the prediction time, depends on the parameters epsilon and C of the SVR, as the sketch below illustrates.
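
As a small illustrative sketch (not part of the original example), fitting SVR with a widening epsilon tube shows the effect on sparsity directly:

# Illustrative sketch: a wider epsilon-insensitive tube leaves more
# training points inside the tube, so fewer of them become support
# vectors and prediction gets cheaper.
import numpy as np
from sklearn.svm import SVR

demo_rng = np.random.RandomState(0)
X_demo = 5 * demo_rng.rand(100, 1)
y_demo = np.sin(X_demo).ravel()

for eps in [0.01, 0.1, 0.5]:
    model = SVR(kernel='rbf', C=10, gamma=0.1, epsilon=eps).fit(X_demo, y_demo)
    print("epsilon=%.2f -> %d support vectors" % (eps, model.support_.shape[0]))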

New to Plotly?

Plotly's Python library is free and open source! Get started by downloading the client and reading the primer.
You can set up Plotly to work in online or offline mode, or in Jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!

Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18'

Imports

This tutorial imports SVR, GridSearchCV, learning_curve, and KernelRidge.

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools

from __future__ import division
import time
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import learning_curve
from sklearn.kernel_ridge import KernelRidge

Calculations

Generate sample data

In [3]:
rng = np.random.RandomState(0)
X = 5 * rng.rand(10000, 1)
y = np.sin(X).ravel()

# Add noise to targets
y[::5] += 3 * (0.5 - rng.rand(X.shape[0] // 5))  # integer division: one noise value per noisy point

X_plot = np.linspace(0, 5, 100000)[:, None]

Fit regression model

In [4]:
train_size = 100
svr = GridSearchCV(SVR(kernel='rbf', gamma=0.1), cv=5,
                   param_grid={"C": [1e0, 1e1, 1e2, 1e3],
                               "gamma": np.logspace(-2, 2, 5)})

kr = GridSearchCV(KernelRidge(kernel='rbf', gamma=0.1), cv=5,
                  param_grid={"alpha": [1e0, 0.1, 1e-2, 1e-3],
                              "gamma": np.logspace(-2, 2, 5)})

t0 = time.time()
svr.fit(X[:train_size], y[:train_size])
svr_fit = time.time() - t0
print("SVR complexity and bandwidth selected and model fitted in %.3f s"
      % svr_fit)

t0 = time.time()
kr.fit(X[:train_size], y[:train_size])
kr_fit = time.time() - t0
print("KRR complexity and bandwidth selected and model fitted in %.3f s"
      % kr_fit)

sv_ratio = svr.best_estimator_.support_.shape[0] / train_size
print("Support vector ratio: %.3f" % sv_ratio)

t0 = time.time()
y_svr = svr.predict(X_plot)
svr_predict = time.time() - t0
print("SVR prediction for %d inputs in %.3f s"
      % (X_plot.shape[0], svr_predict))

t0 = time.time()
y_kr = kr.predict(X_plot)
kr_predict = time.time() - t0
print("KRR prediction for %d inputs in %.3f s"
      % (X_plot.shape[0], kr_predict))
SVR complexity and bandwidth selected and model fitted in 0.898 s
KRR complexity and bandwidth selected and model fitted in 0.431 s
Support vector ratio: 0.320
SVR prediction for 100000 inputs in 0.138 s
KRR prediction for 100000 inputs in 0.312 s
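
The gap in prediction time reflects model size: KernelRidge keeps a dual coefficient for every training sample, while SVR keeps coefficients only for its support vectors. A quick check, reusing the fitted estimators above (not part of the original example; it relies on scikit-learn storing KernelRidge.dual_coef_ per sample and SVR.dual_coef_ per support vector):

# KRR stores one dual coefficient per training sample; SVR only per support vector.
print("KRR dual coefficients: %d" % kr.best_estimator_.dual_coef_.shape[0])
print("SVR dual coefficients: %d" % svr.best_estimator_.dual_coef_.shape[1])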

Plot Results

In [5]:
def to_plotly(n):
    # Flatten an array-like into a plain list of floats for Plotly.
    return [float(i) for i in n]

fig = tools.make_subplots(rows=2, cols=2,
                          specs=[[{}, {}],
                                 [{}, None]],
                          subplot_titles=('SVR versus Kernel Ridge',
                                          'Execution Time',
                                          'Learning Curves'))

sv_ind = svr.best_estimator_.support_

trace1 = go.Scatter(x=to_plotly(X[sv_ind]),
                    y=to_plotly(y[sv_ind]), 
                    name='SVR support vectors',
                    mode='markers',
                    marker=dict(
                            color='red', size=10,
                            line=dict(
                                color='black', width=1))
                   )
trace2 = go.Scatter(x=to_plotly(X[:100]),
                    y=to_plotly(y[:100]),
                    name='data', mode='markers',
                    marker=dict(
                            color='green', size=5,
                            line=dict(color='black'))
                   )
trace3 = go.Scatter(x=to_plotly(X_plot), y=to_plotly(y_svr),
                    mode='lines',
                    line=dict(color='red'),
                    name='SVR (fit: %.3fs, predict: %.3fs)' % (svr_fit, svr_predict)
                   )
trace4 = go.Scatter(x=to_plotly(X_plot), y=to_plotly(y_kr),
                    mode='lines',
                    line=dict(color='green'),
                    name='KRR (fit: %.3fs, predict: %.3fs)' % (kr_fit, kr_predict)
                   )

fig['layout']['xaxis1'].update(title='data')
fig['layout']['yaxis1'].update(title='target')

for trace in [trace1, trace2, trace3, trace4]:
    fig.append_trace(trace, 1, 1)

# Visualize training and prediction time

# Generate sample data
X = 5 * rng.rand(10000, 1)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(X.shape[0] // 5))
sizes = np.logspace(1, 4, 7).astype(int)  # integer train sizes for slicing
plot2=[]

for name, estimator in {"KRR": KernelRidge(kernel='rbf', alpha=0.1,
                                           gamma=10),
                        "SVR": SVR(kernel='rbf', C=1e1, gamma=10)}.items():
    train_time = []
    test_time = []
    for train_test_size in sizes:
        t0 = time.time()
        estimator.fit(X[:train_test_size], y[:train_test_size])
        train_time.append(time.time() - t0)

        t0 = time.time()
        estimator.predict(X_plot[:1000])
        test_time.append(time.time() - t0)

    color = "red" if name == "SVR" else "green"
    trace5 = go.Scatter(x=sizes, y=train_time,
                        mode='lines+markers',
                        line=dict(color=color),
                        marker=dict(color=color),
                        name="%s (train)" % name)
    trace6 = go.Scatter(x=sizes, y=test_time,
                        mode='lines+markers',
                        line=dict(color=color, dash='dash'),
                        marker=dict(color=color),
                        name="%s (test)" % name)
    plot2.append(trace5)
    plot2.append(trace6)


for trace in plot2:
    fig.append_trace(trace, 1, 2)
    
fig['layout']['xaxis2'].update(title='Train size',type='log')
fig['layout']['yaxis2'].update(title='Time (seconds)',type='log')

# Visualize learning curves

svr = SVR(kernel='rbf', C=1e1, gamma=0.1)
kr = KernelRidge(kernel='rbf', alpha=0.1, gamma=0.1)
train_sizes, train_scores_svr, test_scores_svr = \
    learning_curve(svr, X[:100], y[:100], train_sizes=np.linspace(0.1, 1, 10),
                   scoring="neg_mean_squared_error", cv=10)
train_sizes_abs, train_scores_kr, test_scores_kr = \
    learning_curve(kr, X[:100], y[:100], train_sizes=np.linspace(0.1, 1, 10),
                   scoring="neg_mean_squared_error", cv=10)

trace7 = go.Scatter(x=train_sizes, y=-test_scores_svr.mean(1),
                    mode='lines+markers',
                    line=dict(color="red"),
                    marker=dict(color="red"),
                    name="SVR")
trace8 = go.Scatter(x=train_sizes, y=-test_scores_kr.mean(1),
                    mode='lines+markers',
                    line=dict(color="green"),
                    marker=dict(color="green"),
                    name="KRR")

fig.append_trace(trace7, 2, 1)
fig.append_trace(trace8, 2, 1)
                    
fig['layout']['xaxis3'].update(title='Train size')
fig['layout']['yaxis3'].update(title='Mean Squared Error')
fig['layout'].update(height = 900)
This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]
[ (2,1) x3,y3 ]      (empty)    

In [6]:
py.iplot(fig, filename='comparison')
Out[6]:

License

Authors:

        Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>

License:

        BSD 3 clause