
Polynomial Interpolation in Scikit-learn

This example demonstrates how to approximate a function with a polynomial of degree n_degree by using ridge regression. Concretely, given n_samples 1-D points, it suffices to build the Vandermonde matrix, which has shape n_samples x (n_degree + 1) and the following form:

[[1, x_1, x_1 ** 2, x_1 ** 3, ...],
 [1, x_2, x_2 ** 2, x_2 ** 3, ...],
 ...]
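For a single input column, this matrix is what PolynomialFeatures returns with its default include_bias=True. A minimal sketch comparing it with NumPy's np.vander (called with increasing=True so the columns run from the constant term upward); the four sample points are arbitrary illustrative values, not taken from the tutorial:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Arbitrary 1-D sample points (illustrative only)
x = np.array([0.5, 1.0, 2.0, 3.0])
degree = 3

# PolynomialFeatures expects a 2-D array of shape (n_samples, n_features)
V_sklearn = PolynomialFeatures(degree).fit_transform(x[:, np.newaxis])

# np.vander with increasing=True gives the columns 1, x, x**2, ..., x**degree
V_numpy = np.vander(x, degree + 1, increasing=True)

print(V_sklearn.shape)                  # (4, 4): n_samples x (degree + 1)
print(np.allclose(V_sklearn, V_numpy))  # True
```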


Intuitively, this matrix can be interpreted as a matrix of pseudo features (the points raised to some power). The matrix is akin to (but different from) the matrix induced by a polynomial kernel. This example shows that you can do non-linear regression with a linear model, using a pipeline to add non-linear features. Kernel methods extend this idea and can induce very high (even infinite) dimensional feature spaces.
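The difference from a polynomial kernel can be made concrete. Assuming scikit-learn's polynomial_kernel with gamma=1 and coef0=1, i.e. k(a, b) = (a * b + 1) ** d, the binomial expansion shows that for 1-D inputs its implicit features are the same monomials 1, x, ..., x ** d, but with column k scaled by sqrt(C(d, k)). The sketch checks this on arbitrary sample points; the names Phi and scale are illustrative:

```python
import numpy as np
from math import comb
from sklearn.metrics.pairwise import polynomial_kernel

x = np.array([0.5, 1.0, 2.0, 3.0])  # arbitrary 1-D points
X = x[:, np.newaxis]
degree = 3

# Gram matrix of k(a, b) = (gamma * a * b + coef0) ** degree, with gamma = coef0 = 1
K = polynomial_kernel(X, degree=degree, gamma=1.0, coef0=1.0)

# Plain Vandermonde matrix: columns 1, x, x**2, x**3
V = np.vander(x, degree + 1, increasing=True)

# The kernel's implicit feature map rescales column k by sqrt(C(degree, k))
scale = np.sqrt([comb(degree, k) for k in range(degree + 1)])
Phi = V * scale

print(np.allclose(K, Phi @ Phi.T))  # True: kernel = inner products of scaled monomials
print(np.allclose(K, V @ V.T))      # False: same monomials, different weights
```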


Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18.1'

Imports

This tutorial imports Ridge, PolynomialFeatures and make_pipeline.

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
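make_pipeline chains PolynomialFeatures and Ridge into a single estimator whose fit and predict run both steps in order. A minimal sketch checking that this matches doing the two steps by hand; the 20 uniformly drawn points are a stand-in for the tutorial's data, and the step names printed at the end are the lowercase class names that make_pipeline generates:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# Stand-in training data (not the tutorial's exact sample)
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 10, size=20))[:, np.newaxis]
y = X.ravel() * np.sin(X.ravel())

# Pipeline: expand into polynomial features, then fit a linear (ridge) model
pipe = make_pipeline(PolynomialFeatures(3), Ridge())
pipe.fit(X, y)

# The same two steps done by hand
poly = PolynomialFeatures(3)
X_poly = poly.fit_transform(X)
ridge = Ridge().fit(X_poly, y)

X_new = np.linspace(0, 10, 5)[:, np.newaxis]
same = np.allclose(pipe.predict(X_new), ridge.predict(poly.transform(X_new)))
print(same)                    # True
print(list(pipe.named_steps))  # ['polynomialfeatures', 'ridge']
```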

Calculations

In [3]:
def f(x):
    """Function to approximate by polynomial interpolation."""
    return x * np.sin(x)


# generate points used to plot
x_plot = np.linspace(0, 10, 100)

# generate points and keep a subset of them
x = np.linspace(0, 10, 100)
rng = np.random.RandomState(0)
rng.shuffle(x)
x = np.sort(x[:20])
y = f(x)

# create matrix versions of these arrays
X = x[:, np.newaxis]
X_plot = x_plot[:, np.newaxis]

colors = ['teal', 'yellowgreen', 'gold']
lw = 2
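The cell above keeps a random subset of 20 grid points and then adds a feature axis, because scikit-learn estimators expect X with shape (n_samples, n_features). A small sketch of those two steps in isolation, repeating the cell's own sampling code and checking the resulting shapes:

```python
import numpy as np

x = np.linspace(0, 10, 100)
rng = np.random.RandomState(0)
rng.shuffle(x)       # shuffles the array in place
x = np.sort(x[:20])  # keep 20 random grid points, back in increasing order

X = x[:, np.newaxis]  # add a feature axis: (20,) -> (20, 1)

print(x.shape, X.shape)                     # (20,) (20, 1)
print(np.all(np.diff(x) > 0))               # True: sorted and free of duplicates
print(np.array_equal(X, x.reshape(-1, 1)))  # True: an equivalent spelling
```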

Plot Results

In [4]:
data = []

p1 = go.Scatter(x=x_plot, y=f(x_plot), 
                mode='lines',
                line=dict(color='cornflowerblue', width=lw),
                name="ground truth")

p2 = go.Scatter(x=x, y=y,
                mode='markers',
                marker=dict(color='navy',
                            line=dict(color='black', width=1)),
                name="training points")
data.append(p1)
data.append(p2)

for count, degree in enumerate([3, 4, 5]):
    model = make_pipeline(PolynomialFeatures(degree), Ridge())
    model.fit(X, y)
    y_plot = model.predict(X_plot)
    p3 = go.Scatter(x=x_plot, y=y_plot, 
                    mode='lines',
                    line=dict(color=colors[count], width=lw),
                    name="degree %d" % degree)
    data.append(p3)

layout = go.Layout(xaxis=dict(zeroline=False),
                   yaxis=dict(zeroline=False))
fig = go.Figure(data=data, layout=layout)
In [5]:
py.iplot(fig)
Out[5]:
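Ridge() is used with its default alpha=1.0, so each curve in the plot is a penalized least-squares fit on the polynomial features rather than an exact interpolant of the 20 training points. A minimal sketch reproducing the degree-3 coefficients from the ridge normal equations, assuming Ridge's default fit_intercept=True behavior on dense input: center the features and target, solve (Vc^T Vc + alpha I) w = Vc^T yc, then set the intercept to mean(y) - mean(V) @ w. The uniformly sampled points are a stand-in for the tutorial's data:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Stand-in training data (not the tutorial's exact sample)
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(0, 10, size=20))
y = x * np.sin(x)
alpha = 1.0  # Ridge's default regularization strength

V = PolynomialFeatures(3).fit_transform(x[:, np.newaxis])

# Center features and target; the intercept is handled separately, unpenalized
V_mean, y_mean = V.mean(axis=0), y.mean()
Vc, yc = V - V_mean, y - y_mean
w = np.linalg.solve(Vc.T @ Vc + alpha * np.eye(V.shape[1]), Vc.T @ yc)
b = y_mean - V_mean @ w

# The constant column is all zeros after centering, so its coefficient is 0
model = Ridge(alpha=alpha).fit(V, y)
print(np.allclose(w, model.coef_, atol=1e-6))     # True
print(np.isclose(b, model.intercept_, atol=1e-6))  # True
```

Setting alpha closer to zero moves the fit toward ordinary least squares on the same features; larger values shrink the coefficients toward zero.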

License

Author:

    Mathieu Blondel

    Jake Vanderplas

License:

    BSD 3-clause