
# Polynomial Interpolation in Scikit-learn

This example demonstrates how to approximate a function with a polynomial of degree n_degree by using ridge regression. Concretely, given n_samples 1d points, it suffices to build the Vandermonde matrix, which has shape n_samples x (n_degree + 1) and the following form:

[[1, x_1, x_1 ** 2, x_1 ** 3, ...],
 [1, x_2, x_2 ** 2, x_2 ** 3, ...], ...]



Intuitively, this matrix can be interpreted as a matrix of pseudo features (the points raised to some power). The matrix is akin to (but different from) the matrix induced by a polynomial kernel. This example shows that you can do non-linear regression with a linear model, using a pipeline to add non-linear features. Kernel methods extend this idea and can induce very high (even infinite) dimensional feature spaces.
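As a small check of the description above, the sketch below builds the Vandermonde matrix directly with NumPy and compares it with the output of `PolynomialFeatures` for a single input feature. The sample points in `x_demo` and the chosen `degree` are hypothetical values picked only for illustration; they are not part of the original example.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical sample points, chosen only for illustration
x_demo = np.array([0.0, 1.0, 2.0, 3.0])
degree = 3

# Vandermonde matrix with columns x**0, x**1, ..., x**degree
V = np.vander(x_demo, N=degree + 1, increasing=True)

# PolynomialFeatures expects a 2-D array of shape (n_samples, n_features)
P = PolynomialFeatures(degree).fit_transform(x_demo[:, np.newaxis])

print(V.shape)            # (4, 4), i.e. n_samples x (degree + 1)
print(np.allclose(V, P))  # True: both hold the same pseudo features
```

For one-dimensional input the two matrices coincide, which is why a linear model fitted on the `PolynomialFeatures` output behaves like a polynomial fit.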


### Version

In [1]:
import sklearn
sklearn.__version__

Out[1]:
'0.18.1'

### Imports

This tutorial imports Ridge, PolynomialFeatures and make_pipeline.

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline


### Calculations

In [3]:
def f(x):
    """Function to approximate by polynomial interpolation."""
    return x * np.sin(x)

# generate points used to plot
x_plot = np.linspace(0, 10, 100)

# generate points and keep a subset of them
x = np.linspace(0, 10, 100)
rng = np.random.RandomState(0)
rng.shuffle(x)
x = np.sort(x[:20])
y = f(x)

# create matrix versions of these arrays
X = x[:, np.newaxis]
X_plot = x_plot[:, np.newaxis]

colors = ['teal', 'yellowgreen', 'gold']
lw = 2


### Plot Results

In [4]:
data = []

p1 = go.Scatter(x=x_plot, y=f(x_plot),
                mode='lines',
                line=dict(color='cornflowerblue', width=lw),
                name="ground truth")

p2 = go.Scatter(x=x, y=y,
                mode='markers',
                marker=dict(color='navy',
                            line=dict(color='black', width=1)),
                name="training points")
data.append(p1)
data.append(p2)

# fit one ridge model per polynomial degree and plot its predictions
for count, degree in enumerate([3, 4, 5]):
    model = make_pipeline(PolynomialFeatures(degree), Ridge())
    model.fit(X, y)
    y_plot = model.predict(X_plot)
    p3 = go.Scatter(x=x_plot, y=y_plot,
                    mode='lines',
                    line=dict(color=colors[count], width=lw),
                    name="degree %d" % degree)
    data.append(p3)

layout = go.Layout(xaxis=dict(zeroline=False),
                   yaxis=dict(zeroline=False))
fig = go.Figure(data=data, layout=layout)

In [5]:
py.iplot(fig)

Out[5]:
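To make explicit what the pipeline in the loop does, the following sketch fits the same kind of model in two ways: once through `make_pipeline`, and once by expanding the features by hand and fitting `Ridge` on the result. The 20 evenly spaced training points, the names `X_new`, `pred_pipeline` and `pred_manual`, and the single degree shown are assumptions made for this illustration; the example above uses a shuffled subset of points instead.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def f(x):
    """Function to approximate by polynomial interpolation."""
    return x * np.sin(x)


# Illustrative training data (evenly spaced, unlike the example above)
x = np.linspace(0, 10, 20)
X = x[:, np.newaxis]
y = f(x)
degree = 3

# 1) Pipeline: feature expansion and ridge regression in one estimator
pipeline_model = make_pipeline(PolynomialFeatures(degree), Ridge())
pipeline_model.fit(X, y)

# 2) Manual: expand the features first, then fit a linear model on them
poly = PolynomialFeatures(degree)
X_poly = poly.fit_transform(X)
ridge = Ridge()
ridge.fit(X_poly, y)

# New points must go through the same feature expansion before predicting
X_new = np.linspace(0, 10, 5)[:, np.newaxis]
pred_pipeline = pipeline_model.predict(X_new)
pred_manual = ridge.predict(poly.transform(X_new))

print(np.allclose(pred_pipeline, pred_manual))  # True
```

The pipeline only bundles the two steps, so the model stays linear in the expanded features; the non-linearity comes entirely from the polynomial feature map.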

Authors: Mathieu Blondel, Jake Vanderplas

License: BSD 3 clause