
Class Probabilities Calculated by the VotingClassifier in Scikit-learn

Plot the class probabilities of the first sample in a toy dataset predicted by three different classifiers and averaged by the VotingClassifier.

First, three exemplary classifiers are initialized (LogisticRegression, GaussianNB, and RandomForestClassifier) and used to initialize a soft-voting VotingClassifier with weights [1, 1, 5], which means that the predicted probabilities of the RandomForestClassifier count 5 times as much as those of each of the other classifiers when the averaged probability is calculated.

To visualize the probability weighting, we fit each classifier on the training set and plot the predicted class probabilities for the first sample in this example dataset.
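
The weighted averaging described above can be sketched with plain NumPy. The probability values below are made up for illustration and are not the output of the fitted classifiers; only the weights [1, 1, 5] come from this example.

```python
import numpy as np

# Hypothetical [P(class1), P(class2)] for a single sample from each classifier.
# These numbers are illustrative assumptions, not results from this notebook.
p_lr = np.array([0.6, 0.4])   # LogisticRegression, weight 1
p_gnb = np.array([0.9, 0.1])  # GaussianNB, weight 1
p_rf = np.array([0.2, 0.8])   # RandomForestClassifier, weight 5

weights = np.array([1, 1, 5])

# Soft voting: weighted mean of the per-classifier probabilities,
# i.e. (1*p_lr + 1*p_gnb + 5*p_rf) / (1 + 1 + 5)
stacked = np.vstack([p_lr, p_gnb, p_rf])
avg = np.average(stacked, axis=0, weights=weights)

print(avg)
```

With these made-up inputs the random forest dominates the result: the averaged probability of class 1 is 2.5/7, even though two of the three classifiers favor class 1.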


Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18.1'

Imports

In [2]:
print(__doc__)

import plotly.plotly as py
import plotly.graph_objs as go

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
Automatically created module for IPython interactive environment

Calculations

In [3]:
clf1 = LogisticRegression(random_state=123)
clf2 = GaussianNB()
clf3 = RandomForestClassifier(random_state=123)
X = np.array([[-1.0, -1.0], [-1.2, -1.4], [-3.4, -2.2], [1.1, 1.2]])
y = np.array([1, 1, 2, 2])

# weights are applied in estimator order: lr -> 1, gnb -> 1, rf -> 5
eclf = VotingClassifier(estimators=[('lr', clf1), ('gnb', clf2), ('rf', clf3)],
                        voting='soft',
                        weights=[1, 1, 5])

# predict class probabilities for all classifiers
probas = [c.fit(X, y).predict_proba(X) for c in (clf1, clf2, clf3, eclf)]

# get class probabilities for the first sample in the dataset
class1_1 = [pr[0, 0] for pr in probas]
class2_1 = [pr[0, 1] for pr in probas]
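
As a sanity check on the calculation above, the soft-voting output can be compared with a hand-computed weighted average of the individual classifiers' probabilities. This is a minimal sketch that assumes the estimators are listed in the order lr, gnb, rf, so that the weight 5 applies to the random forest. It rebuilds the toy data so it can run on its own, and the variable names `individual` and `manual` are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X = np.array([[-1.0, -1.0], [-1.2, -1.4], [-3.4, -2.2], [1.1, 1.2]])
y = np.array([1, 1, 2, 2])

# Assumed order: the i-th weight applies to the i-th estimator.
estimators = [
    ("lr", LogisticRegression(random_state=123)),
    ("gnb", GaussianNB()),
    ("rf", RandomForestClassifier(random_state=123)),
]
weights = [1, 1, 5]

eclf = VotingClassifier(estimators=estimators, voting="soft", weights=weights)
eclf.fit(X, y)

# Fit each base classifier on its own and average the probabilities by hand.
individual = np.array([est.fit(X, y).predict_proba(X) for _, est in estimators])
manual = np.average(individual, axis=0, weights=weights)

# True if the ensemble probabilities match the manual weighted average.
print(np.allclose(manual, eclf.predict_proba(X)))
```

The fixed `random_state` matters here: VotingClassifier fits clones of the estimators, so the comparison only holds when the random forest is deterministic.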

Plot Results

In [4]:
N = 4  # number of groups

x_axis = ['LogisticRegression<br>weight 1',
          'GaussianNB<br>weight 1',
          'RandomForestClassifier<br>weight 5',
          'VotingClassifier<br>(average probabilities)'
         ]

# bars for classifiers 1-3 (the VotingClassifier slot is padded with 0)
p1 = go.Bar(x=x_axis, y=np.hstack([class1_1[:-1], [0]]),
            marker=dict(color='green'),
            name='class1'
           )

p2 = go.Bar(x=x_axis, y=np.hstack([class2_1[:-1], [0]]),
            marker=dict(color='lightgreen'),
            name='class2'
           )

# bars for VotingClassifier
p3 = go.Bar(x=x_axis, y=[0, 0, 0, class1_1[-1]], 
            marker=dict(color='blue'),
            showlegend=False
           )
p4 = go.Bar(x=x_axis, y=[0, 0, 0, class2_1[-1]],
            marker=dict(color='steelblue'),
            showlegend=False
           )

layout = go.Layout(title='Class probabilities for sample 1 by different classifiers')

fig = go.Figure(data=[p1, p2, p3, p4], layout=layout)
In [5]:
py.iplot(fig)
Out[5]: