
Adjustment for Chance in Clustering Performance Evaluation in Scikit-learn

The following plots demonstrate the impact of the number of clusters and number of samples on various clustering performance evaluation metrics.

Non-adjusted measures such as the V-Measure show a dependency between the number of clusters and the number of samples: the mean V-Measure of random labelings increases significantly as the number of clusters gets closer to the total number of samples used to compute the measure.

Measures adjusted for chance, such as the ARI, display only random variations centered around a mean score of 0.0 for any number of samples and clusters.

Hence, only adjusted measures can safely be used as a consensus index to evaluate the average stability of clustering algorithms for a given value of k on various overlapping sub-samples of the dataset.
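
To make the contrast concrete, here is a minimal sketch (not a cell from the original notebook) comparing a non-adjusted and a chance-adjusted metric on two independent random labelings:

import numpy as np
from sklearn import metrics

rng = np.random.RandomState(0)
# Two independent random labelings with many clusters relative to the sample size.
labels_a = rng.randint(0, 50, size=100)
labels_b = rng.randint(0, 50, size=100)

# Non-adjusted: the V-measure comes out high even though the labels are pure noise.
print(metrics.v_measure_score(labels_a, labels_b))
# Adjusted for chance: ARI = (RI - E[RI]) / (max(RI) - E[RI]), so it stays near 0.0.
print(metrics.adjusted_rand_score(labels_a, labels_b))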

New to Plotly?

Plotly's Python library is free and open source! Get started by downloading the client and reading the primer.
You can set up Plotly to work in online or offline mode, or in Jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!
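
As a rough setup sketch for online mode (the credentials below are placeholders, not values from this page), the client can be installed with pip and configured like this:

# pip install plotly
import plotly
# Placeholder credentials; replace with your own Plotly username and API key.
plotly.tools.set_credentials_file(username='your_username', api_key='your_api_key')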

Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18'

Imports

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go

import numpy as np
import matplotlib.pyplot as plt
from time import time
from sklearn import metrics

Calculations

In [3]:
def uniform_labelings_scores(score_func, n_samples, n_clusters_range,
                             fixed_n_classes=None, n_runs=5, seed=42):
    """Compute score for 2 random uniform cluster labelings.

    Both random labelings have the same number of clusters for each possible
    value in ``n_clusters_range``.

    When fixed_n_classes is not None the first labeling is considered a ground
    truth class assignment with fixed number of classes.
    """
    random_labels = np.random.RandomState(seed).randint
    scores = np.zeros((len(n_clusters_range), n_runs))

    if fixed_n_classes is not None:
        labels_a = random_labels(low=0, high=fixed_n_classes, size=n_samples)

    for i, k in enumerate(n_clusters_range):
        for j in range(n_runs):
            if fixed_n_classes is None:
                labels_a = random_labels(low=0, high=k, size=n_samples)
            labels_b = random_labels(low=0, high=k, size=n_samples)
            scores[i, j] = score_func(labels_a, labels_b)
    return scores

score_funcs = [
    metrics.adjusted_rand_score,
    metrics.v_measure_score,
    metrics.adjusted_mutual_info_score,
    metrics.mutual_info_score,
]
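
As a quick illustration of the helper (a hypothetical call, not a cell from the original notebook), the returned array has one row per cluster count and one column per run:

example_scores = uniform_labelings_scores(
    metrics.adjusted_rand_score, n_samples=50, n_clusters_range=[2, 10, 25], n_runs=3)
print(example_scores.shape)   # (3, 3): 3 cluster counts x 3 runs
print(example_scores.mean())  # close to 0.0 for this chance-adjusted metric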

Independent random clusterings with equal cluster number

In [4]:
n_samples = 100
n_clusters_range = np.linspace(2, n_samples, 10).astype(int)

plots = []
names = []
for score_func in score_funcs:
    print("Computing %s for %d values of n_clusters and n_samples=%d"
          % (score_func.__name__, len(n_clusters_range), n_samples))

    t0 = time()
    scores = uniform_labelings_scores(score_func, n_samples, n_clusters_range)
    print("done in %0.3fs" % (time() - t0))
    
    plots.append(
            go.Scatter(
                x=n_clusters_range, y=np.median(scores, axis=1),
                name=score_func.__name__, mode='lines',
                error_y=dict(type='data', array=scores.std(axis=1), visible=True),
                line=dict(width=2))
            )
Computing adjusted_rand_score for 10 values of n_clusters and n_samples=100
done in 0.043s
Computing v_measure_score for 10 values of n_clusters and n_samples=100
done in 0.058s
Computing adjusted_mutual_info_score for 10 values of n_clusters and n_samples=100
done in 0.393s
Computing mutual_info_score for 10 values of n_clusters and n_samples=100
done in 0.049s
In [6]:
layout = go.Layout(title="Clustering measures for 2 random uniform labelings<br>"
                          "with equal number of clusters",
                   hovermode='closest',
                   xaxis=dict(title='Number of clusters (Number of samples is fixed to %d)' 
                                     % n_samples),
                   yaxis=dict(title='Score value', range=[0, 5]))

fig = go.Figure(data=plots, layout=layout)

py.iplot(fig)
Out[6]:
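
If you run the notebook in offline mode instead (as mentioned in the setup notes above), the same figure can be rendered with plotly.offline; a hypothetical variant of the cell above:

from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)  # only needed once per notebook session
iplot(fig)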

Random labeling with varying n_clusters against ground class labels

In [8]:
n_samples = 1000
n_clusters_range = np.linspace(2, 100, 10).astype(int)
n_classes = 10

plots = []
names = []
for score_func in score_funcs:
    print("Computing %s for %d values of n_clusters and n_samples=%d"
          % (score_func.__name__, len(n_clusters_range), n_samples))

    t0 = time()
    scores = uniform_labelings_scores(score_func, n_samples, n_clusters_range,
                                      fixed_n_classes=n_classes)
    print("done in %0.3fs" % (time() - t0))
    plots.append(
            go.Scatter(
                x=n_clusters_range, y=np.median(scores, axis=1),
                name=score_func.__name__, mode='lines',
                error_y=dict(type='data', array=scores.std(axis=1), visible=True),
                line=dict(width=2))
            )
Computing adjusted_rand_score for 10 values of n_clusters and n_samples=1000
done in 0.069s
Computing v_measure_score for 10 values of n_clusters and n_samples=1000
done in 0.061s
Computing adjusted_mutual_info_score for 10 values of n_clusters and n_samples=1000
done in 0.235s
Computing mutual_info_score for 10 values of n_clusters and n_samples=1000
done in 0.047s
In [9]:
layout = go.Layout(title="Clustering measures for random uniform labeling<br>"
                          "against reference assignment with %d classes" % n_classes,
                   hovermode='closest',
                   xaxis=dict(title='Number of clusters (Number of samples is fixed to %d)' 
                                     % n_samples),
                   yaxis=dict(title='Score value'))

fig = go.Figure(data=plots, layout=layout)

py.iplot(fig)
Out[9]:

License

Author:

    Olivier Grisel <olivier.grisel@ensta.org>

License:

    BSD 3 clause