Splom in R

How to make scatter-plot matrices or "sploms" natively with Plotly.


New to Plotly?

Plotly is a free and open-source graphing library for R. We recommend you read our Getting Started guide for the latest installation or upgrade instructions, then move on to our Plotly Fundamentals tutorials or dive straight in to some Basic Charts tutorials.

Splom trace

A scaterplot matrix is a matrix associated to n numerical arrays (data variables), $X^1,X^2,.,X_n$ , of the same length. The cell (i,j) of such a matrix displays the scatter plot of the variable Xi versus Xj ,

The Plotly splom trace implementation for the scaterplot matrix does not require to set $x=Xi$ , and $y=Xj$, for each scatter plot. All arrays, $X^1,X^2,.,X_n$ , are passed once, through a list of dicts called dimensions, i.e. each array/variable represents a dimension.

A trace of type splom is defined as follows:

fig <- plot_ly() 
fig <- fig %>%
  add_trace(
    dimensions = list(
      list(label='string-1', values=X1),
      list(label='string-2', values=X2),
      .
      .
      .
      list(label='string-n', values=Xn)),
    text=text,
    marker=list(...)
  )

The label in each dimension is assigned to the axes titles of the corresponding matrix cell.

text is either a unique string assigned to all points displayed by splom or a list of strings of the same length as the dimensions, $X_i$. The text[k] is the tooltip for the $k^{th}$ point in each cell.

marker sets the markers attributes in all scatter plots.

Splom of the Iris data set

library(plotly)

df <- read.csv('https://raw.githubusercontent.com/plotly/datasets/master/iris-data.csv')

The Iris dataset contains four data variables, sepal length, sepal width, petal length petal width, for 150 iris flowers. The flowers are labeled as Iris-setosa, Iris-versicolor, Iris-virginica.

Define a discrete colorscale with three colors corresponding to the three flower classes:

pl_colorscale=list(c(0.0, '#19d3f3'),
               c(0.333, '#19d3f3'),
               c(0.333, '#e763fa'),
               c(0.666, '#e763fa'),
               c(0.666, '#636efa'),
               c(1, '#636efa'))

Then create the splom:

axis = list(showline=FALSE,
            zeroline=FALSE,
            gridcolor='#ffff',
            ticklen=4)

fig <- df %>%
  plot_ly() 
fig <- fig %>%
  add_trace(
    type = 'splom',
    dimensions = list(
      list(label='sepal length', values=~sepal.length),
      list(label='sepal width', values=~sepal.width),
      list(label='petal length', values=~petal.length),
      list(label='petal width', values=~petal.width)
    ),
    text=~class,
    marker = list(
      color = as.integer(df$class),
      colorscale = pl_colorscale,
      size = 7,
      line = list(
        width = 1,
        color = 'rgb(230,230,230)'
      )
    )
  ) 
fig <- fig %>%
  layout(
    title= 'Iris Data set',
    hovermode='closest',
    dragmode= 'select',
    plot_bgcolor='rgba(240,240,240, 0.95)',
    xaxis=list(domain=NULL, showline=F, zeroline=F, gridcolor='#ffff', ticklen=4),
    yaxis=list(domain=NULL, showline=F, zeroline=F, gridcolor='#ffff', ticklen=4),
    xaxis2=axis,
    xaxis3=axis,
    xaxis4=axis,
    yaxis2=axis,
    yaxis3=axis,
    yaxis4=axis
  )

fig
4567823424645678012234246012
Iris Data setsepal lengthsepal widthpetal lengthpetal widthsepal lengthsepal widthpetal lengthpetal width

The scatter plots on the principal diagonal can be removed by setting diagonal=list(visible=FALSE):

library(plotly)

fig2 <-  fig %>% style(diagonal = list(visible = F))

fig2
4567823424601223445678246012
Iris Data setsepal lengthsepal widthpetal lengthpetal widthsepal lengthsepal widthpetal lengthpetal width

To plot only the lower/upper half of the splom we switch the default showlowerhalf=True / showupperhalf=False:

library(plotly)

fig2 <-  fig %>% style(showupperhalf = F)

fig2
4567823424645678012234246012
Iris Data setsepal lengthsepal widthpetal lengthpetal widthsepal lengthsepal widthpetal lengthpetal width

Each list in the dimensions has a key, visible, set by default on True. We can choose to remove a variable from splom, by setting visible=FALSE in its corresponding dimension. In this case the default grid associated to the scatterplot matrix keeps its number of cells, but the cells in the row and column corresponding to the visible false dimension are empty:

library(plotly)

fig2 <-  plotly_build(fig)
fig2$x$data[[1]]$dimensions[[3]] <- list(visible = F)

fig2
45678234024456780122340246012
Iris Data setsepal lengthsepal widthpetal widthsepal lengthsepal widthpetal width

Splom for the diabetes dataset

Diabetes dataset is downloaded from kaggle. It is used to predict the onset of diabetes based on 8 diagnostic measures. The diabetes file contains the diagnostic measures for 768 patients, that are labeled as non-diabetic (Outcome=0), respectively diabetic (Outcome=1). The splom associated to the 8 variables can illustrate the strength of the relationship between pairs of measures for diabetic/nondiabetic patients.

library(plotly)

df = read.csv('https://raw.githubusercontent.com/plotly/datasets/master/diabetes.csv')

We define aa discrete colorscale with two colors: blue for non-diabetics and red for diabetics:

pl_colorscale = list(c(0.0, '#119dff'),
                  c(0.5, '#119dff'),
                  c(0.5, '#ef553b'),
                  c(1, '#ef553b'))

Then create the splom:

axis = list(showline=FALSE,
            zeroline=FALSE,
            gridcolor='#ffff',
            ticklen=4,
            titlefont=list(size=13))

fig <- df %>%
  plot_ly() 
fig <- fig %>%
  add_trace(
    type = 'splom',
    dimensions = list(
      list(label='Pregnancies', values=~Pregnancies),
      list(label='Glucose', values=~Glucose),
      list(label='BloodPressure', values=~BloodPressure),
      list(label='SkinThickness', values=~SkinThickness),
      list(label='Insulin', values=~Insulin),
      list(label='BMI', values=~BMI),
      list(label='DiabPedigreeFun', values=~DiabetesPedigreeFunction),
      list(label='Age', values=~Age)
    ),
    text=~factor(Outcome, labels=c("non-diabetic","diabetic")),
    diagonal=list(visible=F),
    marker = list(
      color = ~Outcome,
      colorscale = pl_colorscale,
      size = 5,
      line = list(
        width = 1,
        color = 'rgb(230,230,230)'
      )
    )
  ) 
fig <- fig %>%
  layout(
    title = "Scatterplot Matrix (SPLOM) for Diabetes Dataset<br>Data source: <a href='https://www.kaggle.com/uciml/pima-indians-diabetes-database/data'>[1]</a>",
    hovermode='closest',
    dragmode = 'select',
    plot_bgcolor='rgba(240,240,240, 0.95)',
    xaxis=list(domain=NULL, showline=F, zeroline=F, gridcolor='#ffff', ticklen=4, titlefont=list(size=13)),
    yaxis=list(domain=NULL, showline=F, zeroline=F, gridcolor='#ffff', ticklen=4, titlefont=list(size=13)),
    xaxis2=axis,
    xaxis3=axis,
    xaxis4=axis,
    xaxis5=axis,
    xaxis6=axis,
    xaxis7=axis,
    xaxis8=axis,
    yaxis2=axis,
    yaxis3=axis,
    yaxis4=axis,
    yaxis5=axis,
    yaxis6=axis,
    yaxis7=axis,
    yaxis8=axis
  )

fig
051015010020005010005010005000204060012204060800501001502000510150501000501000500020406001220406080
Scatterplot Matrix (SPLOM) for Diabetes DatasetData source: [1]PregnanciesGlucoseBloodPressureSkinThicknessInsulinBMIDiabPedigreeFunAgePregnanciesGlucoseBloodPressureSkinThicknessInsulinBMIDiabPedigreeFunAge

Reference

See https://plotly.com/r/reference/ for more information and chart attribute options!

What About Dash?

Dash for R is an open-source framework for building analytical applications, with no Javascript required, and it is tightly integrated with the Plotly graphing library.

Learn about how to install Dash for R at https://dashr.plot.ly/installation.

Everywhere in this page that you see fig, you can display the same figure in a Dash for R application by passing it to the figure argument of the Graph component from the built-in dashCoreComponents package like this:

library(plotly)

fig <- plot_ly() 
# fig <- fig %>% add_trace( ... )
# fig <- fig %>% layout( ... ) 

library(dash)
library(dashCoreComponents)
library(dashHtmlComponents)

app <- Dash$new()
app$layout(
    htmlDiv(
        list(
            dccGraph(figure=fig) 
        )
     )
)

app$run_server(debug=TRUE, dev_tools_hot_reload=FALSE)