Show Sidebar Hide Sidebar

Kernel Density Estimate of Species Distributions in Scikit-learn

This shows an example of a neighbors-based query (in particular a kernel density estimate) on geospatial data, using a Ball Tree built upon the Haversine distance metric – i.e. distances over points in latitude/longitude. The dataset is provided by Phillips et. al. (2006). If available, the example uses basemap to plot the coast lines and national boundaries of South America.

This example does not perform any learning over the data (see Species distribution modeling for an example of classification based on the attributes in this dataset). It simply shows the kernel density estimate of observed data points in geospatial coordinates.

The two species are:

Version

In [1]:
import sklearn
sklearn.__version__
Out[1]:
'0.18.1'

Imports

This tutorial imports fetch_species_distributions and KernelDensity.

In [2]:
import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_species_distributions
from sklearn.datasets.species_distributions import construct_grids
from sklearn.neighbors import KernelDensity

Calculations

In [3]:
# Get matrices/arrays of species IDs and locations
data = fetch_species_distributions()
species_names = ['Bradypus Variegatus', 'Microryzomys Minutus']

Xtrain = np.vstack([data['train']['dd lat'],
                    data['train']['dd long']]).T
ytrain = np.array([d.decode('ascii').startswith('micro')
                  for d in data['train']['species']], dtype='int')
Xtrain *= np.pi / 180.  # Convert lat/long to radians

# Set up the data grid for the contour plot
xgrid, ygrid = construct_grids(data)
X, Y = np.meshgrid(xgrid[::5], ygrid[::5][::-1])
land_reference = data.coverages[6][::5, ::5]
land_mask = (land_reference > -9999).ravel()

xy = np.vstack([Y.ravel(), X.ravel()]).T
xy = xy[land_mask]
xy *= np.pi / 180.

Plot Results

In [4]:
fig = tools.make_subplots(rows=1, cols=2,
                          subplot_titles=tuple(species_names),
                          print_grid=False)
In [5]:
for i in range(2):
    # construct a kernel density estimate of the distribution
    print(" - computing KDE in spherical coordinates")
    kde = KernelDensity(bandwidth=0.04, metric='haversine',
                        kernel='gaussian', algorithm='ball_tree')
    kde.fit(Xtrain[ytrain == i])

    # evaluate only on the land: -9999 indicates ocean
    Z =  np.zeros(land_mask.shape[0])
    Z[land_mask] = np.exp(kde.score_samples(xy))
    Z = Z.reshape(X.shape)
    
    # plot contours of the density
    trace1 = go.Contour(x=xgrid[::5],
                        y=ygrid[::5][::-1],
                        z=Z,
                        contours=dict(coloring='heatmap',
                                      start=0, end=Z.max(),
                                      size=25),
                        colorscale=[[0, 'white'],[1, 'red']], showscale=False,
                        line=dict(width=0))
    fig.append_trace(trace1, 1, i+1)
    print(" - plot coastlines from coverage")
    
    trace2 = go.Contour(x=xgrid[::5],
                        y=ygrid[::5][::-1], 
                        z=land_reference,
                        contours=dict(coloring='lines',
                                      start=-9999, end=-9998,
                                      size=2),
                        showscale=False,
                        colorscale=[[0, 'black'],[1, 'white']])
    fig.append_trace(trace2, 1, i+1)
  
 - computing KDE in spherical coordinates
 - plot coastlines from coverage
 - computing KDE in spherical coordinates
 - plot coastlines from coverage
In [6]:
for i in map(str,range(1,3)):
        y = 'yaxis' + i
        x = 'xaxis' + i
        fig['layout'][y].update(showticklabels=False, ticks='')
        fig['layout'][x].update(showticklabels=False, ticks='')
fig['layout'].update(hovermode='closest')
        
py.iplot(fig)
Out[6]:

License

Author:

    Jake Vanderplas <jakevdp@cs.washington.edu>

License:

    BSD 3 clause
Still need help?
Contact Us

For guaranteed 24 hour response turnarounds, upgrade to a Developer Support Plan.