Show Sidebar Hide Sidebar

# Linear Regression in Scikit-learn

This example uses the only the first feature of the diabetes dataset, in order to illustrate a two-dimensional plot of this regression technique. The straight line can be seen in the plot, showing how linear regression attempts to draw a straight line that will best minimize the residual sum of squares between the observed responses in the dataset, and the responses predicted by the linear approximation.

The coefficients, the residual sum of squares and the variance score are also calculated.

#### New to Plotly?¶

You can set up Plotly to work in online or offline mode, or in jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!

### Version¶

In [1]:
import sklearn
sklearn.__version__

Out[1]:
'0.18.1'

### Imports¶

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go

import numpy as np
from sklearn import datasets, linear_model


### Calculations¶

In [3]:
# Load the diabetes dataset

# Use only one feature
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# The coefficients
print('Coefficients: \n', regr.coef_)
# The mean squared error
print("Mean squared error: %.2f"
% np.mean((regr.predict(diabetes_X_test) - diabetes_y_test) ** 2))
# Explained variance score: 1 is perfect prediction
print('Variance score: %.2f' % regr.score(diabetes_X_test, diabetes_y_test))

('Coefficients: \n', array([ 938.23786125]))
Mean squared error: 2548.07
Variance score: 0.47


### Plot Results¶

In [4]:
def data_to_plotly(x):
k = []

for i in range(0, len(x)):
k.append(x[i][0])

return k

In [5]:
p1 = go.Scatter(x=data_to_plotly(diabetes_X_test),
y=diabetes_y_test,
mode='markers',
marker=dict(color='black')
)

p2 = go.Scatter(x=data_to_plotly(diabetes_X_test),
y=regr.predict(diabetes_X_test),
mode='lines',
line=dict(color='blue', width=3)
)

layout = go.Layout(xaxis=dict(ticks='', showticklabels=False,
zeroline=False),
yaxis=dict(ticks='', showticklabels=False,
zeroline=False),
showlegend=False, hovermode='closest')

fig = go.Figure(data=[p1, p2], layout=layout)

py.iplot(fig)

Out[5]:

Code source:

        Jaques Grobler



        BSD 3 clause