
Jupyter Notebook Tutorial in Python

Jupyter notebook tutorial on how to install, run, and use IPython for interactive matplotlib plotting, data analysis, and publishing code

Introduction

Jupyter has a beautiful notebook that lets you write and execute code, analyze data, embed content, and share reproducible work. Jupyter Notebook (previously referred to as IPython Notebook) allows you to easily share your code, data, plots, and explanation in a single notebook. Publishing is flexible: PDF, HTML, ipynb, dashboards, slides, and more. Code cells follow an input and output format: you type code into an input cell, and its result appears directly below it. For example:

In [2]:
print("hello world")
hello world
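Behind the scenes, a saved notebook (.ipynb) is a JSON document in which each code cell stores its input source together with its outputs. As a minimal sketch, assuming nbformat version 4.4 (minor version 4, which does not yet require cell ids), the hello-world cell above can be represented with only the standard library; the helper `make_notebook` is hypothetical, and the `nbformat` package is the proper tool for real work since it also validates the structure.

```python
import json

def make_notebook(source, stdout_text, execution_count):
    """Build a minimal nbformat 4.4 notebook dict holding one executed code cell."""
    code_cell = {
        "cell_type": "code",
        "execution_count": execution_count,  # the number shown in "In [2]:"
        "metadata": {},
        "source": source,  # the cell's input
        "outputs": [
            # Text printed to stdout is stored as a "stream" output
            {"output_type": "stream", "name": "stdout", "text": stdout_text}
        ],
    }
    return {
        "cells": [code_cell],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }

notebook = make_notebook('print("hello world")', "hello world\n", 2)

# Serialize to JSON; saving this text with a .ipynb extension gives a file Jupyter can open
text = json.dumps(notebook, indent=1)
```

Because the output is stored alongside the input, a shared .ipynb file shows results even to readers who never run it.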

Installation

There are a few ways to use a Jupyter Notebook:

  • Install with pip. Open a terminal and type: $ pip install jupyter.
  • Windows users can install with setuptools.
  • Anaconda and Enthought allow you to download a desktop version of Jupyter Notebook.
  • nteract allows users to work in a notebook environment via a desktop application.
  • Microsoft Azure provides hosted access to Jupyter Notebooks.
  • Domino Data Lab offers web-based Notebooks.
  • tmpnb launches a temporary online Notebook for individual users.

Getting Started

Once you've installed the Notebook, start it from your terminal by running $ jupyter notebook (Windows users can run the same command from the Command Prompt). This opens your browser at the Notebook's local URL, by default http://127.0.0.1:8888, where you'll see a dashboard listing your Notebooks; you can launch any of them from there. The Notebook has the advantage of looking the same when you're coding and publishing, and while it's running you have all the options to move code, run cells, change kernels, and write Markdown.
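The launch command also accepts options. The sketch below only assembles the argument list and does not start a server; `--port` and `--no-browser` are standard `jupyter notebook` options, `--notebook-dir` sets the directory shown on the dashboard, and the helper name `notebook_command` is hypothetical. Running `jupyter notebook --help` lists the options your installed version supports.

```python
def notebook_command(port=8888, open_browser=True, notebook_dir=None):
    """Assemble a `jupyter notebook` command line as a list of arguments."""
    cmd = ["jupyter", "notebook", f"--port={port}"]
    if not open_browser:
        cmd.append("--no-browser")  # start the server without opening a browser tab
    if notebook_dir is not None:
        cmd.append(f"--notebook-dir={notebook_dir}")  # directory shown on the dashboard
    return cmd

# Serve on port 8889 without opening a browser, e.g. on a remote machine
cmd = notebook_command(port=8889, open_browser=False)
print(" ".join(cmd))  # jupyter notebook --port=8889 --no-browser
```

Passing `cmd` to `subprocess.run` would actually start the server.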

Helpful Commands

- Tab Completion: Jupyter supports tab completion! You can type object_name.<TAB> to view an object’s attributes. For tips on cell magics, running Notebooks, and exploring objects, check out the Jupyter docs.
- Help: provides an introduction and overview of features.

In [3]:
help
Out[3]:
Type help() for interactive help, or help(object) for help about object.

- Quick Reference: open the quick reference card by running the %quickref magic:

In [4]:
%quickref

- Keyboard Shortcuts: Shift-Enter runs a cell and moves to the next one, Ctrl-Enter runs a cell in place, and Alt-Enter runs a cell and inserts a new cell below. See the Jupyter documentation for more shortcuts.
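Outside the notebook interface, the standard library offers rough equivalents of these helpers: `dir()` lists an object's attributes (much like the `object_name.<TAB>` completion list), and `inspect.getdoc()` returns the docstring that `help(object)` displays. A minimal sketch; the helper `public_attributes` is hypothetical and simply hides names that start with an underscore.

```python
import inspect

def public_attributes(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]

text = "hello world"
attrs = public_attributes(text)
print(attrs[:5])  # the first few str methods, in alphabetical order

# First line of the docstring that help(str.upper) would display
print(inspect.getdoc(str.upper).splitlines()[0])
```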

Languages

The bulk of this tutorial discusses executing Python code in Jupyter notebooks. You can also use Jupyter notebooks to execute R code. Skip down to the Executing R Code section for more information on using IRkernel with Jupyter notebooks and graphing examples.

Package Management

To install packages for use in Jupyter, you can either run the install command in your shell, or run it from a notebook cell by prefixing it with !, e.g.:

!pip install packagename
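A common pitfall is that the `pip` on your shell's PATH may belong to a different Python than the one running the notebook kernel. One way around this, sketched below, is to invoke pip through `sys.executable`, the interpreter the kernel is running on (recent IPython versions also provide a `%pip` magic for the same purpose). The helper `pip_install_command` is hypothetical, the package name is a placeholder, and the call that actually installs is left commented out.

```python
import subprocess
import sys

def pip_install_command(package):
    """Build a pip command that targets the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]

cmd = pip_install_command("packagename")  # placeholder package name
print(cmd)
# Uncomment to install into the kernel's own environment:
# subprocess.check_call(cmd)
```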

If you edit the code of a module you've already imported, you may want to reload it. IPython's autoreload extension does this for you: with the settings below, all changed modules are reloaded before each new line of code is executed.

%load_ext autoreload
%autoreload 2
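The `autoreload` extension automates what the standard library's `importlib.reload()` does by hand. The sketch below writes a throwaway module named `mymodule` (a hypothetical name) into a temporary directory, edits it, and reloads it manually so the edit is picked up. `autoreload` goes further, for example by attempting to update functions and classes that were already imported with `from mymodule import ...`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # avoid stale .pyc files when the source changes quickly

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "mymodule.py"  # hypothetical module name
    module_path.write_text("def greet():\n    return 'hello'\n")
    sys.path.insert(0, tmp)
    try:
        import mymodule
        before = mymodule.greet()

        # Edit the module's source, then reload it explicitly
        module_path.write_text("def greet():\n    return 'hello, reloaded'\n")
        importlib.invalidate_caches()
        mymodule = importlib.reload(mymodule)
        after = mymodule.greet()
    finally:
        # Clean up so the temporary module does not linger
        sys.path.remove(tmp)
        sys.modules.pop("mymodule", None)

print(before, "->", after)  # hello -> hello, reloaded
```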

Some useful packages that we'll use in this tutorial include:

  • Pandas: import data via a url and create a dataframe to easily handle data for analysis and graphing. See examples of using Pandas here: https://plot.ly/pandas/.
  • NumPy: a package for scientific computing with tools for linear algebra, random number generation, integrating with databases, and managing data. See examples of using NumPy here: https://plot.ly/numpy/.
  • SciPy: a Python-based ecosystem of packages for math, science, and engineering.
  • Plotly: a graphing library for making interactive, publication-quality graphs. See examples of statistical, scientific, 3D charts, and more here: https://plot.ly/python.
In [ ]:
import pandas as pd
import numpy as np
import scipy as sp
import plotly.plotly as py

Import Data

You can use the pandas read_csv() function to import data. In the example below, we import a CSV file hosted on GitHub and display it as a table using Plotly:

In [6]:
import plotly.plotly as py
import plotly.figure_factory as ff
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/school_earnings.csv")

table = ff.create_table(df)
py.iplot(table, filename='jupyter/table1')
Out[6]:
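For intuition about what `read_csv()` builds, here is a standard-library sketch that parses CSV text into column lists, so that `columns["School"][0]` plays the role of `df.School[0]`. The rows are made-up placeholder values, not the contents of school_earnings.csv, and the helper `read_columns` is hypothetical; pandas additionally infers numeric types, handles missing values, and much more.

```python
import csv
import io

# Made-up placeholder rows using the same column names as the example file
csv_text = """School,Women,Men,gap
School A,60,80,20
School B,70,95,25
"""

def read_columns(text):
    """Parse CSV text into a dict mapping each column name to a list of values."""
    reader = csv.DictReader(io.StringIO(text))
    columns = {name: [] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            columns[name].append(value)
    return columns

columns = read_columns(csv_text)
print(columns["School"][0])  # School A

# The csv module returns strings; numeric columns must be converted explicitly
women = [int(v) for v in columns["Women"]]
```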

Use dataframe.column_name to select a column of the dataframe, then index into it:

In [7]:
schools = df.School
schools[0]
Out[7]:
'MIT'

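Most pandas functions also work on an entire dataframe. For example, calling std() calculates the standard deviation for each column; passing numeric_only=True skips the text School column, which recent pandas versions would otherwise reject.

In [8]:
df.std(numeric_only=True)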
Out[8]:
Women    12.813683
Men      25.705289
gap      14.137084
dtype: float64
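Note that pandas' `std()` computes the sample standard deviation by default (`ddof=1`, dividing by n - 1), whereas NumPy's `np.std()` defaults to the population version (`ddof=0`). The standard library's `statistics` module exposes both; the three-row columns below are made-up numbers chosen so the results are easy to check by hand.

```python
import statistics

# Made-up columns, not the school earnings data
data = {
    "Women": [10, 20, 30],
    "Men": [30, 50, 70],
    "gap": [20, 30, 40],
}

# Sample standard deviation per column, like DataFrame.std() with its default ddof=1
sample_std = {name: statistics.stdev(values) for name, values in data.items()}
# Population standard deviation (ddof=0), the default of np.std()
population_std = {name: statistics.pstdev(values) for name, values in data.items()}

print(sample_std)  # {'Women': 10.0, 'Men': 20.0, 'gap': 10.0}
```

For Women, the squared deviations from the mean of 20 sum to 200; dividing by n - 1 = 2 gives 100, whose square root is 10.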

Plotting Inline

You can use Plotly's Python API to plot inside your Jupyter Notebook by calling plotly.plotly.iplot(), or plotly.offline.iplot() if working offline. Plotting in the notebook keeps your data analysis and plots in one place. Now we can do a bit of interactive plotting. Head to the Plotly getting started page to learn how to set your credentials. Calling iplot automatically embeds an interactive version of the plot inside the Notebook in an iframe. See below:

In [9]:
import plotly.plotly as py
from plotly.graph_objs import *

data = [Bar(x=df.School,
            y=df.gap)]

py.iplot(data, filename='jupyter/basic_bar')
Out[9]:

Plotting multiple traces and styling the chart with custom colors and titles is simple with Plotly syntax. Additionally, you can control the plot's privacy by setting sharing to 'public', 'private', or 'secret'.

In [10]:
import plotly.plotly as py
from plotly.graph_objs import *

trace_women = Bar(x=df.School,
                  y=df.Women,
                  name='Women',
                  marker=dict(color='#ffcdd2'))

trace_men = Bar(x=df.School,
                y=df.Men,
                name='Men',
                marker=dict(color='#A2D5F2'))

trace_gap = Bar(x=df.School,
                y=df.gap,
                name='Gap',
                marker=dict(color='#59606D'))

data = [trace_women, trace_men, trace_gap]
layout = Layout(title="Average Earnings for Graduates",
                xaxis=dict(title='School'),
                yaxis=dict(title='Salary (in thousands)'))
fig = Figure(data=data, layout=layout)

py.iplot(fig, sharing='secret', filename='jupyter/styled_bar')
Out[10]:

Now we have interactive charts displayed in our notebook. Hover on the chart to see the values for each bar, click and drag to zoom into a specific section or click on the legend to hide/show a trace.
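Plotly's graph objects such as `Bar`, `Layout`, and `Figure` wrap a JSON structure: a `data` list of traces plus a `layout` dict. The sketch below builds a grouped bar chart spec as plain dicts and serializes it with the standard library, assuming plain dicts are an acceptable figure input, as the dict-based examples later in this tutorial use. The helper `grouped_bar_figure` and the placeholder numbers are hypothetical; `barmode='group'` is the layout setting that places bars side by side.

```python
import json

def grouped_bar_figure(categories, series, title):
    """Build a Plotly-style figure dict with one bar trace per (name, values, color)."""
    data = [
        {"type": "bar", "x": categories, "y": values, "name": name,
         "marker": {"color": color}}
        for name, values, color in series
    ]
    layout = {"title": title, "barmode": "group"}  # bars for each category side by side
    return {"data": data, "layout": layout}

fig = grouped_bar_figure(
    ["School A", "School B"],  # placeholder categories
    [("Women", [60, 70], "#ffcdd2"), ("Men", [80, 95], "#A2D5F2")],
    "Average Earnings for Graduates",
)
print(json.dumps(fig)[:80])  # the start of the JSON that Plotly renders
```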

Plotting Interactive Maps

Plotly is now integrated with Mapbox. In this example we'll plot latitude and longitude data of nuclear waste sites. To plot on Mapbox maps with Plotly you'll need a Mapbox account and a Mapbox Access Token, which you can add to your Plotly settings.

In [11]:
import plotly.plotly as py
from plotly.graph_objs import *

import pandas as pd

mapbox_access_token = 'pk.eyJ1IjoiY2hlbHNlYXBsb3RseSIsImEiOiJjaXFqeXVzdDkwMHFrZnRtOGtlMGtwcGs4In0.SLidkdBMEap9POJGIe1eGw'

json_data = pd.read_json('https://raw.githubusercontent.com/ottlngr/2016-15/ottlngr/ottlngr/sites.json')
rating_one_site_lat = []
rating_one_site_lon = []
locations_name = []

for index in range(len(json_data)):
    if json_data['locations'][index] != []:
        if json_data['site'][index]['rating'] == 1:
            rating_one_site_lat.append(json_data['locations'][index][0]['latitude'])
            rating_one_site_lon.append(json_data['locations'][index][0]['longitude'])
            locations_name.append(json_data['site'][index]['name'])

data = Data([
    Scattermapbox(
        lat=rating_one_site_lat,
        lon=rating_one_site_lon,
        mode='markers',
        marker=Marker(
            size=18,
            color='rgb(155, 240, 225)',
            opacity=0.7
        ),
        text=locations_name,
        hoverinfo='text'
    ),
    Scattermapbox(
        lat=rating_one_site_lat,
        lon=rating_one_site_lon,
        mode='markers',
        marker=Marker(
            size=8,
            color='rgb(205, 245, 100)'
        ),
        hoverinfo='skip'
    )]
)
        
layout = Layout(
    title='Nuclear Waste Sites on Campus',
    autosize=True,
    hovermode='closest',
    showlegend=False,
    mapbox=dict(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=dict(
            lat=38,
            lon=-94
        ),
        pitch=0,
        zoom=3,
        style='dark'
    ),
)

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='jupyter/Nuclear Waste Sites on American Campuses')
Out[11]:
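The loop above keeps only sites with a rating of 1 that have at least one recorded location, and takes the coordinates of the first location. The same filtering can be written as a standalone function and checked in isolation; the sketch below runs it on a few hypothetical records that mimic the structure the loop expects (a `site` dict with `rating` and `name`, and a `locations` list of dicts with `latitude` and `longitude`). The function name `rating_one_sites` is hypothetical.

```python
def rating_one_sites(records):
    """Return (lats, lons, names) for rating-1 sites that have a location."""
    lats, lons, names = [], [], []
    for record in records:
        locations = record["locations"]
        if locations and record["site"]["rating"] == 1:
            first = locations[0]  # use the first listed location, as the loop above does
            lats.append(first["latitude"])
            lons.append(first["longitude"])
            names.append(record["site"]["name"])
    return lats, lons, names

# Hypothetical records, not taken from sites.json
records = [
    {"site": {"name": "Site A", "rating": 1},
     "locations": [{"latitude": 40.0, "longitude": -100.0}]},
    {"site": {"name": "Site B", "rating": 2},
     "locations": [{"latitude": 35.0, "longitude": -90.0}]},
    {"site": {"name": "Site C", "rating": 1}, "locations": []},
]

lats, lons, names = rating_one_sites(records)
print(names)  # ['Site A']
```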

3D Plotting

Using NumPy and Plotly, we can make interactive 3D plots in the Notebook as well.

In [12]:
import plotly.plotly as py
from plotly.graph_objs import *

import numpy as np

s = np.linspace(0, 2 * np.pi, 240)
t = np.linspace(0, np.pi, 240)
sGrid, tGrid = np.meshgrid(s, t)  # sGrid holds the s values, tGrid the t values

r = 2 + np.sin(7 * sGrid + 5 * tGrid)  # r = 2 + sin(7s+5t)
x = r * np.cos(sGrid) * np.sin(tGrid)  # x = r*cos(s)*sin(t)
y = r * np.sin(sGrid) * np.sin(tGrid)  # y = r*sin(s)*sin(t)
z = r * np.cos(tGrid)                  # z = r*cos(t)

surface = Surface(x=x, y=y, z=z)
data = Data([surface])

layout = Layout(
    title='Parametric Plot',
    scene=Scene(
        xaxis=XAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        yaxis=YAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        ),
        zaxis=ZAxis(
            gridcolor='rgb(255, 255, 255)',
            zerolinecolor='rgb(255, 255, 255)',
            showbackground=True,
            backgroundcolor='rgb(230, 230,230)'
        )
    )
)

fig = Figure(data=data, layout=layout)
py.iplot(fig, filename='jupyter/parametric_plot')
Out[12]:
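To sanity-check the parametric equations in the comments, you can evaluate a single point in plain Python, with s as the angle around the z-axis (0 to 2π) and t as the angle from the positive z-axis (0 to π). This is a minimal sketch; the function name `parametric_point` is hypothetical.

```python
import math

def parametric_point(s, t):
    """Evaluate the surface from the comments above at angles s and t."""
    r = 2 + math.sin(7 * s + 5 * t)    # r = 2 + sin(7s + 5t)
    x = r * math.cos(s) * math.sin(t)  # x = r*cos(s)*sin(t)
    y = r * math.sin(s) * math.sin(t)  # y = r*sin(s)*sin(t)
    z = r * math.cos(t)                # z = r*cos(t)
    return x, y, z

# At t = 0 the point sits on the positive z-axis at distance r = 2
print(parametric_point(0.0, 0.0))  # (0.0, 0.0, 2.0)
```

Since these are spherical coordinates with a varying radius, every point lies at distance r from the origin, which makes a handy check.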

Animated Plots

Check out Plotly's animation documentation to see how to create animated plots inline in Jupyter notebooks, like this Gapminder plot: https://plot.ly/~PythonPlotBot/231/

Plot Controls & IPython widgets

Add sliders, buttons, and dropdowns to your inline chart:

In [13]:
import plotly.plotly as py
import numpy as np

data = [dict(
        visible = False,
        line=dict(color='#00CED1', width=6),
        name = '𝜈 = '+str(step),
        x = np.arange(0,10,0.01),
        y = np.sin(step*np.arange(0,10,0.01))) for step in np.arange(0,5,0.1)]
data[10]['visible'] = True

steps = []
for i in range(len(data)):
    step = dict(
        method = 'restyle',
        args = ['visible', [False] * len(data)],
    )
    step['args'][1][i] = True # Toggle i'th trace to "visible"
    steps.append(step)

sliders = [dict(
    active = 10,
    currentvalue = {"prefix": "Frequency: "},
    pad = {"t": 50},
    steps = steps
)]

layout = dict(sliders=sliders)
fig = dict(data=data, layout=layout)

py.iplot(fig, filename='Sine Wave Slider')
Out[13]:
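Each slider step above issues a `restyle` that sets the `visible` list so that exactly one trace is shown. The mask construction can be isolated and checked on its own; the helper `visibility_steps` is hypothetical and builds steps of the same shape as the loop.

```python
def visibility_steps(n_traces):
    """Build one slider step per trace, each making only that trace visible."""
    steps = []
    for i in range(n_traces):
        visible = [False] * n_traces  # a fresh list for every step
        visible[i] = True  # show the i-th trace, hide all others
        steps.append({"method": "restyle", "args": ["visible", visible]})
    return steps

steps = visibility_steps(3)
for step in steps:
    print(step["args"][1])
# [True, False, False]
# [False, True, False]
# [False, False, True]
```

Creating a new list for each step matters: reusing a single list object across steps would make every step share the same mask.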

Additionally, IPython widgets let you add sliders, buttons, text boxes, and more to your Notebook. See the widget docs for more information. For others to interact with your widgets, they'll need to run the notebook in Jupyter themselves, or you can use a cloud-based notebook service so others can run your work.

Executing R Code

IRkernel, an R kernel for Jupyter, allows you to write and execute R code in a Jupyter notebook. Check out the IRkernel documentation for simple installation instructions. Once IRkernel is installed, open a Jupyter Notebook by calling $ jupyter notebook and use the New dropdown to select an R notebook.

See a full R example Jupyter Notebook here: https://plot.ly/~chelsea_lyn/14069

Additional Embed Features

We've seen how to embed Plotly tables and charts as iframes in the notebook. With IPython.display, we can embed additional content, such as videos. For example, from YouTube:

In [16]:
from IPython.display import YouTubeVideo
YouTubeVideo("p86BPM1GV8M")
Out[16]:

LaTeX

We can embed LaTeX in a Notebook by wrapping the math in $$ delimiters and running the cell as a Markdown cell. For example, the source of the cell below is $$c = \sqrt{a^2 + b^2}$$, and the Notebook renders it as a formatted equation.

$$c = \sqrt{a^2 + b^2}$$

Or, you can display output from Python, as seen here.

In [17]:
from IPython.display import display, Math, Latex

display(Math(r'F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k x} dx'))
$$F(k) = \int_{-\infty}^{\infty} f(x) e^{2\pi i k x} dx$$
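`Math` is one of several display classes. More generally, as part of IPython's rich display protocol, Jupyter renders any object that defines a `_repr_latex_()` method as LaTeX when it is the last expression in a cell. A minimal sketch with a hypothetical `Hypotenuse` class:

```python
class Hypotenuse:
    """Hypothetical object that Jupyter would render as a LaTeX equation."""

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def value(self):
        return (self.a ** 2 + self.b ** 2) ** 0.5

    def _repr_latex_(self):
        # IPython calls this method to obtain the object's LaTeX representation
        return rf"$$c = \sqrt{{{self.a}^2 + {self.b}^2}} = {self.value():g}$$"

h = Hypotenuse(3, 4)
print(h._repr_latex_())  # $$c = \sqrt{3^2 + 4^2} = 5$$
```

In a notebook, ending a cell with `h` instead of the `print` call would display the rendered equation.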

Exporting & Publishing Notebooks

We can export the Notebook as an HTML, PDF, .py, .ipynb, Markdown, or reST file, and you can also turn it into a slideshow. You can publish Jupyter Notebooks on Plotly: visit plot.ly, select the + Create button in the upper right-hand corner, then select Notebook and upload your Jupyter notebook (.ipynb) file. The notebooks you upload are stored in your Plotly organize folder and hosted at a unique link to make sharing quick and easy.
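From the command line, exports go through `jupyter nbconvert --to <format> notebook.ipynb`. The sketch below only builds the command; the listed formats are a subset of those nbconvert accepts (`script` produces a .py file for Python notebooks, `slides` a reveal.js slideshow, and `pdf` additionally requires a LaTeX installation). The helper name `nbconvert_command` and the notebook file name are hypothetical.

```python
import subprocess

# A subset of the export formats accepted by `jupyter nbconvert --to`
FORMATS = {"html", "pdf", "script", "notebook", "markdown", "rst", "slides"}

def nbconvert_command(notebook_path, fmt):
    """Build a `jupyter nbconvert` command line for the given export format."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return ["jupyter", "nbconvert", "--to", fmt, notebook_path]

cmd = nbconvert_command("analysis.ipynb", "html")  # hypothetical notebook file
print(" ".join(cmd))  # jupyter nbconvert --to html analysis.ipynb
# Uncomment to run the export:
# subprocess.run(cmd, check=True)
```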

Publishing Dashboards

Users publishing interactive graphs can also use Plotly's dashboarding tool to arrange plots with a drag and drop interface. These dashboards can be published, embedded, and shared.

Publishing Dash Apps

For users looking to ship and productionize Python apps, Dash is an assemblage of Flask, Socket.IO, Jinja, Plotly, and boilerplate CSS and JS for easily creating data visualization web apps with your Python data analysis backend.

For more Jupyter tutorials, check out Plotly's Python documentation: all documentation is written in Jupyter notebooks that you can download and run yourself. You can also check out these user-submitted examples:

IPython Notebook Gallery
