Cufflinks in Python

An overview of cufflinks, a library for easy interactive Pandas charting with Plotly.

Cufflinks binds Plotly directly to pandas dataframes.

In [1]:
import plotly.tools as tls
tls.embed('https://plot.ly/~cufflinks/8')
Out[1]:

Packages

Run !pip install cufflinks --upgrade in a notebook cell to install Cufflinks. In addition to Plotly, pandas and Cufflinks, this tutorial also uses NumPy.

In [2]:
import plotly.plotly as py
import cufflinks as cf
import pandas as pd
import numpy as np
print(cf.__version__)
0.8.2

Dataframes

With Plotly's Python library, you can describe figures using a DataFrame's series and index:

In [3]:
df = cf.datagen.lines()

py.iplot([{
    'x': df.index,
    'y': df[col],
    'name': col
}  for col in df.columns], filename='cufflinks/simple-line')
Out[3]:
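
To make the shape of that list explicit, here is a minimal sketch that builds the same kind of trace list without Plotly. A plain dict of lists stands in for the DataFrame, the values are made up, and the names index, columns and traces are illustrative only.

```python
# Stand-in for a DataFrame: a shared index plus one list of values per column.
index = ['2015-01-01', '2015-01-02', '2015-01-03']
columns = {
    'a': [0.5, -0.3, -1.3],
    'b': [-2.6, -1.9, -1.9],
}

# One trace per column: x is the shared index, y the column values,
# and name is the label shown in the legend.
traces = [
    {'x': index, 'y': values, 'name': name}
    for name, values in sorted(columns.items())
]

for trace in traces:
    print(trace['name'], len(trace['y']))
```

Each dict in traces has the same three keys that the py.iplot call above passes for every column.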

But with cufflinks, you can plot them directly:

In [5]:
df.iplot(kind='scatter', filename='cufflinks/cf-simple-line')
Out[5]:

Almost every chart that you make in cufflinks will be created with just one line of code.

In [6]:
df = pd.DataFrame(np.random.randn(1000, 4), columns=['a', 'b', 'c', 'd'])
df.scatter_matrix(filename='cufflinks/scatter-matrix', world_readable=True)
Out[6]:

Charts created with cufflinks are synced with your online Plotly account. You'll need to configure your credentials to get started. cufflinks can also be configured to work offline in IPython notebooks with Plotly Offline. To get started with Plotly Offline, download a trial library and run cf.go_offline().

In [14]:
cf.go_online() # switch back to online mode, where graphs are saved on your online plotly account

By default, plotly graphs are public. Make them private by setting world_readable to False.

In [15]:
df.a.iplot(kind='histogram', world_readable=False)
Out[15]:

Only you (the creator) will be able to see this chart. To change the global default settings, use cf.set_config_file.

In [16]:
cf.set_config_file(offline=False, world_readable=True, theme='ggplot')

Chart Types

Line Charts
In [17]:
df = pd.DataFrame(np.random.randn(1000, 2), columns=['A', 'B']).cumsum()
df.iplot(filename='cufflinks/line-example')
Out[17]:

Plot one column vs another with x and y keywords

In [18]:
df.iplot(x='A', y='B', filename='cufflinks/x-vs-y-line-example')
Out[18]:
Bar Charts

Download some civic data: a time-series log of 311 complaints in NYC.

In [ ]:
df = pd.read_csv('https://raw.githubusercontent.com/plotly/widgets/master/ipython-examples/311_150k.csv', parse_dates=True, index_col=1)
df.head(3)
In [ ]:
series = df['Complaint Type'].value_counts()[:20]
series.head(3)
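
value_counts() returns the counts sorted from most to least frequent, so slicing with [:20] keeps the 20 most common complaint types. As a rough standard-library analogy (the complaint values below are made up, and Counter is only a stand-in for the pandas call):

```python
from collections import Counter

# Made-up complaint types standing in for df['Complaint Type'].
complaints = ['Noise', 'Heating', 'Noise', 'Street Light', 'Noise', 'Heating']

# most_common(n) returns the n most frequent values with their counts,
# most frequent first, which mirrors value_counts()[:n].
top_two = Counter(complaints).most_common(2)
print(top_two)  # [('Noise', 3), ('Heating', 2)]
```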

Plot a series directly

In [18]:
series.iplot(kind='bar', yTitle='Number of Complaints', title='NYC 311 Complaints',
             filename='cufflinks/categorical-bar-chart')
Out[18]:

Plot a dataframe row as a bar

In [19]:
df = pd.DataFrame(np.random.rand(10, 4), columns=['A', 'B', 'C', 'D'])
row = df.iloc[5]  # .ix was removed in pandas 1.0; .iloc selects by position
row.iplot(kind='bar', filename='cufflinks/bar-chart-row')
Out[19]:

Call iplot(kind='bar') on a dataframe to produce a grouped bar chart

In [20]:
df.iplot(kind='bar', filename='cufflinks/grouped-bar-chart')
Out[20]:
In [21]:
df.iplot(kind='bar', barmode='stack', filename='cufflinks/grouped-bar-chart')
Out[21]:

Remember: plotly charts are interactive. Click on the legend entries to hide-and-show traces, click-and-drag to zoom, double-click to autoscale, shift-click to drag.

Make your bar charts horizontal with kind='barh'

In [22]:
df.iplot(kind='barh',barmode='stack', bargap=.1, filename='cufflinks/barh')
Out[22]:
Themes

cufflinks ships with a few themes. View the available themes with cf.getThemes() and apply one with cf.set_config_file.

In [23]:
cf.getThemes()
Out[23]:
['pearl', 'white', 'ggplot', 'solar', 'space']
In [24]:
cf.set_config_file(theme='pearl')
Histograms
In [4]:
df = pd.DataFrame({'a': np.random.randn(1000) + 1,
                   'b': np.random.randn(1000),
                   'c': np.random.randn(1000) - 1})

df.iplot(kind='histogram', filename='cufflinks/basic-histogram')
Out[4]:

Customize your histogram with

  • barmode (overlay | group | stack)
  • bins (int)
  • histnorm ('' | 'percent' | 'probability' | 'density' | 'probability density')
  • histfunc ('count' | 'sum' | 'avg' | 'min' | 'max')
In [27]:
df.iplot(kind='histogram', barmode='stack', bins=100, histnorm='probability', filename='cufflinks/customized-histogram')
Out[27]:
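
The histnorm options differ only in how the raw bin counts are rescaled. The sketch below works through that arithmetic for a small, made-up set of equal-width bins. The variable names are illustrative, and this is not the code cufflinks or Plotly run internally.

```python
# Raw counts for four bins, each 0.5 units wide (made-up numbers).
counts = [2, 5, 2, 1]
bin_width = 0.5
total = sum(counts)  # 10 observations

# 'percent': share of all observations in each bin, times 100.
percent = [100.0 * c / total for c in counts]
# 'probability': share of all observations in each bin.
probability = [float(c) / total for c in counts]
# 'density': count divided by bin width; bar areas sum to the total count.
density = [c / bin_width for c in counts]
# 'probability density': probability divided by bin width; bar areas sum to 1.
probability_density = [p / bin_width for p in probability]

print(percent)              # [20.0, 50.0, 20.0, 10.0]
print(probability)          # [0.2, 0.5, 0.2, 0.1]
print(density)              # [4.0, 10.0, 4.0, 2.0]
print(probability_density)  # [0.4, 1.0, 0.4, 0.2]
```

With the default histnorm='', the bar heights are simply the values in counts.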

As with every chart type, you can split your traces into subplots (small multiples) with subplots and, optionally, shape. More on subplots below.

In [28]:
df.iplot(kind='histogram', subplots=True, shape=(3, 1), filename='cufflinks/histogram-subplots')
Out[28]:
Box Plots
In [29]:
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
df.iplot(kind='box', filename='cufflinks/box-plots')
Out[29]:

Area Charts

To produce a stacked area plot, each column must contain either all positive or all negative values.

When the input data contains NaN, it is automatically filled with 0. To drop missing values or fill them with a different value, use dataframe.dropna() or dataframe.fillna() before plotting.

In [30]:
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
In [31]:
df.iplot(kind='area', fill=True, filename='cufflinks/stacked-area')
Out[31]:
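
Conceptually, a stacked area chart draws each column on top of the running total of the columns before it, which is why missing values need a numeric stand-in. A minimal sketch of that bookkeeping in plain Python follows. The data is made up, and this only illustrates the idea; it is not how cufflinks or Plotly compute the chart.

```python
import math

# Two made-up columns; float('nan') marks a missing value.
columns = [
    ('a', [1.0, float('nan'), 2.0]),
    ('b', [0.5, 1.5, 1.0]),
]

# Fill missing values with 0, as described in the text above.
filled = [(name, [0.0 if math.isnan(v) else v for v in values])
          for name, values in columns]

# Upper edge of each band = running total of the columns so far.
running = [0.0] * 3
stacked = {}
for name, values in filled:
    running = [r + v for r, v in zip(running, values)]
    stacked[name] = running

print(stacked['a'])  # [1.0, 0.0, 2.0]
print(stacked['b'])  # [1.5, 1.5, 3.0]
```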

For non-stacked area charts, use kind='scatter' (the default) with fill=True.

In [32]:
df.iplot(fill=True, filename='cufflinks/filled-area')
Out[32]:

Scatter Plot

Set x and y as column names. If x isn't supplied, df.index will be used.

In [5]:
import pandas as pd
df = pd.read_csv('http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt', sep='\t')
df2007 = df[df.year==2007]
df1952 = df[df.year==1952]

df2007.iplot(kind='scatter', mode='markers', x='gdpPercap', y='lifeExp', filename='cufflinks/simple-scatter')
Out[5]:

Plotting scatter plots of multiple columns isn't as easy with cufflinks. Here is an example with Plotly's native syntax:

In [35]:
fig = {
    'data': [
        {'x': df2007.gdpPercap, 'y': df2007.lifeExp, 'text': df2007.country, 'mode': 'markers', 'name': '2007'},
        {'x': df1952.gdpPercap, 'y': df1952.lifeExp, 'text': df1952.country, 'mode': 'markers', 'name': '1952'}
    ],
    'layout': {
        'xaxis': {'title': 'GDP per Capita', 'type': 'log'},
        'yaxis': {'title': "Life Expectancy"}
    }
}
py.iplot(fig, filename='cufflinks/multiple-scatter')
Out[35]:

Grouping isn't as easy either. But, with Plotly's native syntax:

In [36]:
py.iplot(
    {
        'data': [
            {
                'x': df[df['year']==year]['gdpPercap'],
                'y': df[df['year']==year]['lifeExp'],
                'name': year, 'mode': 'markers',
            } for year in [1952, 1982, 2007]
        ],
        'layout': {
            'xaxis': {'title': 'GDP per Capita', 'type': 'log'},
            'yaxis': {'title': "Life Expectancy"}
        }
}, filename='cufflinks/scatter-group-by')
Out[36]:
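
The comprehension above builds one trace per year by filtering the rows for that year. The same grouping idea can be sketched in plain Python, with a handful of made-up rows standing in for the gapminder data. The helper trace_for is hypothetical.

```python
# Made-up rows; the real dataset has one row per country and year.
rows = [
    {'year': 1952, 'gdpPercap': 800.0, 'lifeExp': 40.0},
    {'year': 1952, 'gdpPercap': 9000.0, 'lifeExp': 68.0},
    {'year': 2007, 'gdpPercap': 1500.0, 'lifeExp': 60.0},
    {'year': 2007, 'gdpPercap': 40000.0, 'lifeExp': 80.0},
]

def trace_for(year):
    """Collect the rows for one year into a markers-only trace dict."""
    selected = [row for row in rows if row['year'] == year]
    return {
        'x': [row['gdpPercap'] for row in selected],
        'y': [row['lifeExp'] for row in selected],
        'name': str(year),
        'mode': 'markers',
    }

data = [trace_for(year) for year in (1952, 2007)]
print([trace['name'] for trace in data])  # ['1952', '2007']
```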

Bubble Charts

Add size to create a bubble chart. Add hover text with the text attribute.

In [37]:
df2007.iplot(kind='bubble', x='gdpPercap', y='lifeExp', size='pop', text='country',
             xTitle='GDP per Capita', yTitle='Life Expectancy',
             filename='cufflinks/simple-bubble-chart')
Out[37]:

Subplots

subplots=True partitions columns into separate subplots. Specify rows and columns with shape=(rows, cols) and share axes with shared_xaxes=True and shared_yaxes=True.

In [38]:
df=cf.datagen.lines(4)
df.iplot(subplots=True, shape=(4,1), shared_xaxes=True, fill=True, filename='cufflinks/simple-subplots')
Out[38]:
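
One way to picture shape=(rows, cols) is as a grid that the traces fill one cell at a time. The sketch below assumes row-major filling with 1-based positions. That ordering is an assumption made for illustration, and grid_positions is a hypothetical helper, not part of cufflinks.

```python
def grid_positions(n_traces, shape):
    """Assign traces to (row, col) cells in row-major order, 1-based."""
    rows, cols = shape
    if n_traces > rows * cols:
        raise ValueError('shape is too small for the number of traces')
    return [(i // cols + 1, i % cols + 1) for i in range(n_traces)]

print(grid_positions(4, (4, 1)))  # [(1, 1), (2, 1), (3, 1), (4, 1)]
print(grid_positions(4, (2, 2)))  # [(1, 1), (1, 2), (2, 1), (2, 2)]
```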

Add subplot titles with subplot_titles as a list of titles or True to use column names.

In [39]:
df.iplot(subplots=True, subplot_titles=True, legend=False)
Out[39]:

Scatter matrix

In [40]:
df.scatter_matrix(filename='cufflinks/scatter-matrix-subplot', world_readable=True)
Out[40]:

Heatmaps

In [41]:
cf.datagen.heatmap(20,20).iplot(kind='heatmap',colorscale='spectral',
                                filename='cufflinks/simple-heatmap')
Out[41]:

Lines and Shaded Areas

Use hline and vline for horizontal and vertical lines.

In [42]:
df=cf.datagen.lines(3,columns=['a','b','c'])
In [43]:
df.iplot(hline=[2,4],vline=['2015-02-10'])
Out[43]:

Draw shaded regions with hspan

In [44]:
df.iplot(hspan=[(-1,1),(2,5)], filename='cufflinks/shaded-regions')
Out[44]:

Extra parameters can be passed as a dictionary: width, fill, color, fillcolor, opacity.

In [45]:
df.iplot(vspan={'x0':'2015-02-15','x1':'2015-03-15','color':'rgba(30,30,30,0.3)','fill':True,'opacity':.4}, 
         filename='cufflinks/custom-regions')
Out[45]:
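
In Plotly's figure format, a shaded region like this is a rectangle in the layout's shapes list. The sketch below shows roughly what such a shape looks like as a plain dict. The helper vspan_shape is hypothetical, and the exact dict that cufflinks generates may differ.

```python
def vspan_shape(x0, x1, color, opacity):
    """Build a Plotly-style rectangle spanning the full plot height."""
    return {
        'type': 'rect',
        'xref': 'x',      # x0/x1 are in data coordinates
        'yref': 'paper',  # y0/y1 are fractions of the plot height
        'x0': x0, 'x1': x1,
        'y0': 0, 'y1': 1,
        'fillcolor': color,
        'opacity': opacity,
        'line': {'width': 0},
    }

layout = {'shapes': [vspan_shape('2015-02-15', '2015-03-15',
                                 'rgba(30,30,30,0.3)', 0.4)]}
print(layout['shapes'][0]['x0'])  # 2015-02-15
```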

Customizing Figures

cufflinks is designed for simple one-line charting with Pandas and Plotly. Not all Plotly chart attributes are directly assignable in the df.iplot call signature.

To update attributes of a cufflinks chart that aren't available, first convert it to a figure (asFigure=True), then tweak it, then plot it with plotly.plotly.iplot.

Here is an example of a simple plotly figure. You can find more examples in our online python documentation.

In [46]:
from plotly.graph_objs import *
py.iplot({
    'data': [
        Bar(**{
            'x': [1, 2, 3],
            'y': [3, 1, 5],
            'name': 'first trace',
            'type': 'bar'
        }),
        Bar(**{
            'x': [1, 2, 3],
            'y': [4, 3, 6],
            'name': 'second trace',
            'type': 'bar'
        })
    ],
    'layout': Layout(**{
        'title': 'simple example'
    })
}, filename='cufflinks/simple-plotly-example')
Out[46]:

cufflinks generates figures like this one, which describe plotly graphs. For example, this graph:

In [48]:
df.iplot(kind='scatter', filename='cufflinks/simple-scatter-example')
Out[48]:

has this description:

In [49]:
figure = df.iplot(kind='scatter', asFigure=True)
print(figure.to_string())
Figure(
    data=Data([
        Scatter(
            x=['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '..'  ],
            y=array([  5.35393544e-01,  -3.51020567e-01,  -1.34207933e+00,
 ..,
            mode='lines',
            name='a',
            line=Line(
                color='rgba(255, 153, 51, 1.0)',
                width='1.3'
            )
        ),
        Scatter(
            x=['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '..'  ],
            y=array([ -2.58404773,  -1.91629648,  -1.88997988,  -1.09846618,..,
            mode='lines',
            name='b',
            line=Line(
                color='rgba(55, 128, 191, 1.0)',
                width='1.3'
            )
        ),
        Scatter(
            x=['2015-01-01', '2015-01-02', '2015-01-03', '2015-01-04', '..'  ],
            y=array([ 0.46611148,  1.06107695,  1.06206594, -0.56030965, -0...,
            mode='lines',
            name='c',
            line=Line(
                color='rgba(50, 171, 96, 1.0)',
                width='1.3'
            )
        )
    ]),
    layout=Layout(
        legend=Legend(
            font=Font(
                color='#4D5663'
            ),
            bgcolor='#F5F6F9'
        ),
        paper_bgcolor='#F5F6F9',
        plot_bgcolor='#F5F6F9',
        xaxis1=XAxis(
            title='',
            titlefont=Font(
                color='#4D5663'
            ),
            tickfont=Font(
                color='#4D5663'
            ),
            gridcolor='#E1E5ED',
            zerolinecolor='#E1E5ED'
        ),
        yaxis1=YAxis(
            title='',
            titlefont=Font(
                color='#4D5663'
            ),
            zeroline=False,
            tickfont=Font(
                color='#4D5663'
            ),
            gridcolor='#E1E5ED',
            zerolinecolor='#E1E5ED'
        )
    )
)

So, if you want to edit any attribute of a Plotly graph from cufflinks, first convert it to a figure and then edit the figure objects. Let's add a y-axis title, a tick prefix, and new legend names to this example:

In [50]:
figure['layout']['yaxis1'].update({'title': 'Price', 'tickprefix': '$'})
for i, trace in enumerate(figure['data']):
    trace['name'] = 'Trace {}'.format(i)
    
py.iplot(figure, filename='cufflinks/customized-chart')
Out[50]:
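
Assuming the figure behaves like nested dicts, update() only overwrites the keys it is given, so attributes that cufflinks already set (grid colors, fonts) survive the edit. A minimal sketch with a plain nested dict standing in for the figure, so it runs without a Plotly account:

```python
# Plain-dict stand-in for the object returned by asFigure=True.
figure = {
    'data': [{'name': 'a'}, {'name': 'b'}, {'name': 'c'}],
    'layout': {'yaxis1': {'title': '', 'gridcolor': '#E1E5ED'}},
}

# update() overwrites 'title' and adds 'tickprefix'; 'gridcolor' is untouched.
figure['layout']['yaxis1'].update({'title': 'Price', 'tickprefix': '$'})
for i, trace in enumerate(figure['data']):
    trace['name'] = 'Trace {}'.format(i)

print(figure['layout']['yaxis1']['gridcolor'])       # #E1E5ED
print([trace['name'] for trace in figure['data']])  # ['Trace 0', 'Trace 1', 'Trace 2']
```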

Cufflinks is open source on GitHub!

In [52]:
help(df.iplot)
Help on method _iplot in module cufflinks.plotlytools:

_iplot(self, data=None, layout=None, filename='', world_readable=None, kind='scatter', title='', xTitle='', yTitle='', zTitle='', theme=None, colors=None, colorscale=None, fill=False, width=None, mode='lines', symbol='dot', size=12, barmode='', sortbars=False, bargap=None, bargroupgap=None, bins=None, histnorm='', histfunc='count', orientation='v', boxpoints=False, annotations=None, keys=False, bestfit=False, bestfit_colors=None, categories='', x='', y='', z='', text='', gridcolor=None, zerolinecolor=None, margin=None, subplots=False, shape=None, asFrame=False, asDates=False, asFigure=False, asImage=False, dimensions=(1116, 587), asPlot=False, asUrl=False, online=None, **kwargs) method of pandas.core.frame.DataFrame instance
       Returns a plotly chart either as inline chart, image or Figure object
    
       Parameters:
       -----------
           data : Data
               Plotly Data Object.
               If not entered then the Data object will be automatically
               generated from the DataFrame.
           layout : Layout
               Plotly layout Object
               If not entered then the Layout object will be automatically
               generated from the DataFrame.
           filename : string
               Filename to be saved as in plotly account
           world_readable : bool
               If False then it will be saved as a private file
           kind : string
               Kind of chart
                   scatter
                   bar
                   box
                   spread
                   ratio
                   heatmap
                   surface
                   histogram
                   bubble
                   bubble3d
                   scatter3d
           title : string
               Chart Title
           xTitle : string
               X Axis Title
           yTitle : string
               Y Axis Title
           zTitle : string
               Z Axis Title
               Applicable only for 3d charts
           theme : string
               Layout Theme
                   solar
                   pearl
                   white
               see cufflinks.getThemes() for all
               available themes
           colors : list or dict
               {key:color} to specify the color for each column
               [colors] to use the colors in the defined order
           colorscale : str
               Color scale name
               If the color name is preceded by a minus (-)
               then the scale is inversed
               Only valid if 'colors' is null
               See cufflinks.colors.scales() for available scales
           fill : bool
               Filled Traces
           width : int
               Line width
           mode : string
               Plotting mode for scatter trace
                   lines
                   markers
                   lines+markers
                   lines+text
                   markers+text
                   lines+markers+text
           symbol : string
               The symbol that is drawn on the plot for each marker
               Valid only when mode includes markers
                   dot
                   cross
                   diamond
                   square
                   triangle-down
                   triangle-left
                   triangle-right
                   triangle-up
                   x
           size : string or int
               Size of marker
               Valid only if marker in mode
           barmode : string
               Mode when displaying bars
                   group
                   stack
                   overlay
               * Only valid when kind='bar'
           sortbars : bool
               Sort bars in descending order
               * Only valid when kind='bar'
           bargap : float
               Sets the gap between bars
                   [0,1)
               * Only valid when kind is 'histogram' or 'bar'
           bargroupgap : float
               Set the gap between groups
                   [0,1)
               * Only valid when kind is 'histogram' or 'bar'
           bins : int
               Specifies the number of bins
               * Only valid when kind='histogram'
           histnorm : string
                   '' (frequency)
                   percent
                   probability
                   density
                   probability density
               Sets the type of normalization for an histogram trace. By default
               the height of each bar displays the frequency of occurrence, i.e.,
               the number of times this value was found in the
               corresponding bin. If set to 'percent', the height of each bar
               displays the percentage of total occurrences found within the
               corresponding bin. If set to 'probability', the height of each bar
               displays the probability that an event will fall into the
               corresponding bin. If set to 'density', the height of each bar is
               equal to the number of occurrences in a bin divided by the size of
               the bin interval such that summing the area of all bins will yield
               the total number of occurrences. If set to 'probability density',
               the height of each bar is equal to the number of probability that an
               event will fall into the corresponding bin divided by the size of
               the bin interval such that summing the area of all bins will yield
               1.
               * Only valid when kind='histogram'
           histfunc : string
                   count
                   sum
                   avg
                   min
                   max
              Sets the binning function used for an histogram trace.
               * Only valid when kind='histogram'
           orientation : string
                   h
                   v
               Sets the orientation of the bars. If set to 'v', the length of each
               bar will run vertically. If set to 'h', the length of each bar will
               run horizontally
               * Only valid when kind is 'histogram','bar' or 'box'
           boxpoints : string
               Displays data points in a box plot
                   outliers
                   all
                   suspectedoutliers
                   False
           annotations : dictionary
               Dictionary of annotations
               {x_point : text}
           keys : list of columns
               List of columns to chart.
               Also can be used for custom sorting.
           bestfit : boolean or list
               If True then a best fit line will be generated for
               all columns.
               If list then a best fit line will be generated for
               each key on the list.
           bestfit_colors : list or dict
               {key:color} to specify the color for each column
               [colors] to use the colors in the defined order
           categories : string
               Name of the column that contains the categories
           x : string
               Name of the column that contains the x axis values
           y : string
               Name of the column that contains the y axis values
           z : string
               Name of the column that contains the z axis values
           text : string
               Name of the column that contains the text values
           gridcolor : string
               Grid color
           zerolinecolor : string
               Zero line color
           margin : dict or tuple
               Dictionary (l,r,b,t) or
               Tuple containing the left,
               right, bottom and top margins
           subplots : bool
               If true then each trace is placed in
               subplot layout
           shape : (rows,cols)
               Tuple indicating the size of rows and columns
               If omitted then the layout is automatically set
               * Only valid when subplots=True
           asFrame : bool
               If true then the data component of Figure will
               be of Pandas form (Series) otherwise they will
               be index values
           asDates : bool
               If true it truncates times from a DatetimeIndex
           asFigure : bool
               If True returns plotly Figure
           asImage : bool
               If True it returns Image
               * Only valid when asImage=True
           dimensions : tuple(int,int)
               Dimensions for image
                   (width,height)
           asPlot : bool
               If True the chart opens in browser
           asUrl : bool
               If True the chart url is returned. No chart is displayed.
           online : bool
               If True then the chart is rendered on the server
               even when running in offline mode.
