
# Normality Test in Python

Learn how to generate various normality tests using Python.

#### New to Plotly?

Plotly's Python library is free and open source! Get started by downloading the client and reading the primer.
You can set up Plotly to work in online or offline mode, or in Jupyter notebooks.
We also have a quick-reference cheatsheet (new!) to help you get started!

#### Imports

The tutorial below imports Plotly, NumPy, Pandas, and SciPy.

In [1]:
import plotly.plotly as py
import plotly.graph_objs as go
from plotly.tools import FigureFactory as FF

import numpy as np
import pandas as pd
import scipy


#### Import Data

To look at various normality tests, we will import a dataset of average wind speeds sampled every 10 minutes:

In [2]:
data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/wind_speed_laurel_nebraska.csv')
df = data[0:10]

table = FF.create_table(df)
py.iplot(table, filename='wind-data-sample')

Out[2]:

In statistical analysis, it is always important to be as precise as possible in our language. In general, a normality test tests the null hypothesis that our 1D data is sampled from a population that has a normal distribution. We assume a significance level of $\alpha = 0.05$ (i.e. $95\%$ confidence) for our tests unless otherwise stated.

#### Shapiro-Wilk

The Shapiro-Wilk normality test is reputedly better suited to smaller datasets.

In [3]:
x = data['10 Min Sampled Avg']

shapiro_results = scipy.stats.shapiro(x)

matrix_sw = [
['', 'DF', 'Test Statistic', 'p-value'],
['Sample Data', len(x) - 1, shapiro_results[0], shapiro_results[1]]
]

shapiro_table = FF.create_table(matrix_sw, index=True)
py.iplot(shapiro_table, filename='shapiro-table')

Out[3]:

Since our p-value is much less than 0.05, we have strong evidence to reject the null hypothesis at the 0.05 significance level: the data do not appear to be normally distributed.
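The same decision rule can be sketched in a self-contained way. The synthetic exponential sample below is an assumption standing in for the wind-speed column, chosen because it is clearly non-normal:

```python
import numpy as np
from scipy import stats

# Synthetic stand-in for the wind-speed sample: a right-skewed,
# clearly non-normal exponential draw.
rng = np.random.RandomState(0)
sample = rng.exponential(scale=3.0, size=300)

statistic, p_value = stats.shapiro(sample)

# Decision rule: reject the null hypothesis of normality
# when the p-value falls below the significance level.
reject_normality = p_value < 0.05
print(statistic, p_value, reject_normality)
```

For a skewed sample of this size the p-value is effectively zero, so normality is rejected.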

#### Kolmogorov-Smirnov

The Kolmogorov-Smirnov test can be applied more broadly than Shapiro-Wilk, since it compares any two distributions against each other, not necessarily one distribution to a normal one. The test can be one-sided or two-sided, but the two-sided version only applies if both distributions are continuous.
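One caveat worth knowing before running the cell below: `scipy.stats.kstest` with `cdf='norm'` compares the sample against the *standard* normal $N(0, 1)$, so data with a different mean or scale is rejected even when it is normally distributed. A minimal sketch with synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(1)
sample = rng.normal(loc=5.0, scale=2.0, size=300)  # normal, but not N(0, 1)

# Against the standard normal, the shifted sample is strongly rejected...
raw_stat, raw_p = stats.kstest(sample, cdf='norm')

# ...while standardizing first tests the *shape* of the distribution.
standardized = (sample - sample.mean()) / sample.std(ddof=1)
std_stat, std_p = stats.kstest(standardized, cdf='norm')

print(raw_p, std_p)
```

Strictly speaking, estimating the mean and standard deviation from the same sample alters the null distribution of the statistic (the Lilliefors variant of the test corrects for this), so the standardized p-value should be read as approximate.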

In [15]:
ks_results = scipy.stats.kstest(x, cdf='norm')

matrix_ks = [
['', 'DF', 'Test Statistic', 'p-value'],
['Sample Data', len(x) - 1, ks_results[0], ks_results[1]]
]

ks_table = FF.create_table(matrix_ks, index=True)
py.iplot(ks_table, filename='ks-table')

Out[15]:

Since our p-value is read as 0.0 (meaning it is "practically" 0 given the decimal accuracy of the test), we have strong evidence to reject the null hypothesis.

#### Anderson-Darling

The Anderson-Darling test is derived from the Kolmogorov-Smirnov test and is used in a similar way: to test the null hypothesis that data is sampled from a population that follows a particular distribution. Rather than a p-value, SciPy's implementation returns a set of critical values at fixed significance levels.

In [4]:
anderson_results = scipy.stats.anderson(x)
print(anderson_results)

AndersonResult(statistic=2.653698947239036, critical_values=array([ 0.566,  0.645,  0.773,  0.902,  1.073]), significance_level=array([ 15. ,  10. ,   5. ,   2.5,   1. ]))

In [5]:
matrix_ad = [
['', 'DF', 'Test Statistic', 'Critical Value (5%)'],
['Sample Data', len(x) - 1, anderson_results[0], anderson_results[1][2]]
]

anderson_table = FF.create_table(matrix_ad, index=True)
py.iplot(anderson_table, filename='anderson-table')

Out[5]:

Unlike the tests above, the decision here compares the statistic to a critical value: since our test statistic ($2.654$) exceeds even the critical value at the $1\%$ level ($1.073$), we have strong evidence to reject the null hypothesis of normality.
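That comparison can be read directly off the `AndersonResult` printed above; a sketch of the decision rule, with the tutorial's statistic and SciPy's critical values hard-coded from that output:

```python
# Values copied from the AndersonResult printed above.
statistic = 2.653698947239036
critical_values = [0.566, 0.645, 0.773, 0.902, 1.073]
significance_levels = [15.0, 10.0, 5.0, 2.5, 1.0]

# anderson() returns critical values rather than a p-value: the null
# hypothesis is rejected at a given level when the statistic exceeds
# that level's critical value.
for level, critical in zip(significance_levels, critical_values):
    rejected = statistic > critical
    print('%5.1f%%: critical=%.3f -> reject: %s' % (level, critical, rejected))

# Here the statistic exceeds every critical value, so normality is
# rejected even at the strictest (1%) level.
```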

#### D’Agostino and Pearson

D'Agostino and Pearson's method combines skewness and kurtosis (a measure of how heavy a distribution's tails are) into a single omnibus test of normality.

In [6]:
dagostino_results = scipy.stats.mstats.normaltest(x)

matrix_dp = [
['', 'DF', 'Test Statistic', 'p-value'],
['Sample Data', len(x) - 1, dagostino_results[0], dagostino_results[1]]
]

dagostino_table = FF.create_table(matrix_dp, index=True)
py.iplot(dagostino_table, filename='dagostino-table')

Out[6]:

Our p-value is very close to 0, so once again we have strong evidence to reject the null hypothesis of normality.
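A self-contained sketch of the same test on clearly non-normal synthetic data (the exponential sample here is an assumption, standing in for the wind-speed column):

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(3)
skewed = rng.exponential(scale=1.0, size=300)  # strongly right-skewed

# normaltest combines skewness and kurtosis into a single chi-squared
# statistic with 2 degrees of freedom.
statistic, p_value = stats.normaltest(skewed)

print(statistic, p_value)
```

With a truly normal sample, the p-value would typically land well above 0.05 and the null hypothesis would not be rejected.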
