Put your ggplot on steroids

visualisation

code

Plotly adds some interactivity and can help clarify your data.

Published

February 2, 2024

Welcome back to Stats Tips for 2024 - hope you managed a nice break.

It’s a short one today. If you didn’t already know it existed, check out plotly for taking your ggplots to the next level.

Sometimes it can be extremely helpful to quickly link discrete elements of a plot to the corresponding observation/s in your dataframe. For example, you have a suspected outlier in a scatterplot and you want to know which individual that belongs to. Or, you have an unavoidably busy plot; for example, plotting the predictions from a mixed model for longitudinal data overlaid on the observed data for comparison. In these cases it’s nearly impossible to discern the origin of the plotted data. In both use-case scenarios (and many more), plotly can help.
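As a rough sketch of the first use-case (tracking down a suspected outlier), the snippet below uses the built-in mtcars data rather than anything from this post, so the dataset and the names cars, model, p_out and w_out are purely illustrative. It relies on plotly's unofficial `text` aesthetic and the `tooltip` argument of `ggplotly()` to show a row label when you hover over a point.

```r
library(ggplot2)
library(plotly)

# Illustrative data only: mtcars keeps the car names in its row names,
# so copy them into a column we can show in the tooltip
cars <- mtcars
cars$model <- rownames(cars)

# 'text' is not a ggplot2 aesthetic; it is carried along for plotly to use
p_out <- ggplot(cars, aes(x = wt, y = mpg, text = model)) +
    geom_point()

# Show the label plus the x and y values on hover
w_out <- ggplotly(p_out, tooltip = c("text", "x", "y"))
w_out
```

Hovering over any point that sits away from the rest then tells you which row of the dataframe it came from.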

In this example of the second use-case (the busy plot), we are going to use a dataset built into the lme4 package. The sleepstudy data records reaction times over successive days in sleep-deprived individuals. For the sake of the exercise we will fit a mixed model with reaction time (ms) as the outcome, time (days) as a fixed effect, and a random intercept and a random slope for time within each individual. So this is a random slopes model, allowing the ‘effect’ of sleep deprivation on reaction time (the slope over days) to vary between individuals. We fit the model and view a few lines of the dataframe, which now contains the population-level predictions based on the fixed effects only (mod_pred_fix) and the individual-specific predictions that also include the random effects (mod_pred_ran).

Code

library(lme4)
library(ggplot2)
library(plotly)
# Load data
data("sleepstudy")
# Model
mod <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
# Predict
sleepstudy$mod_pred_fix <- predict(mod, re.form = NA) # population-level predictions (fixed effects only)
sleepstudy$mod_pred_ran <- predict(mod) # individual-specific predictions (fixed + random effects)
# View data
head(sleepstudy, 10)

Reaction	Days	Subject	mod_pred_fix	mod_pred_ran
249.5600	0	308	251.4051	253.6637
258.7047	1	308	261.8724	273.3299
250.8006	2	308	272.3397	292.9962
321.4398	3	308	282.8070	312.6624
356.8519	4	308	293.2742	332.3287
414.6901	5	308	303.7415	351.9950
382.2038	6	308	314.2088	371.6612
290.1486	7	308	324.6761	391.3275
430.5853	8	308	335.1434	410.9937
466.3535	9	308	345.6107	430.6600
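To make the difference between the two prediction columns concrete, here is a small check that rebuilds both by hand from the model coefficients. It is only a sketch: `fixef()` and `coef()` are lme4 functions, while the object names (beta, cf, pred_fix_manual, pred_ran_manual, check_fix, check_ran) are made up for illustration.

```r
library(lme4)

data("sleepstudy")
mod <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Population-level line: the same intercept and slope for everyone
beta <- fixef(mod)
pred_fix_manual <- beta["(Intercept)"] + beta["Days"] * sleepstudy$Days

# Individual-specific lines: coef() returns fixed + random effects per Subject
cf  <- coef(mod)$Subject
idx <- as.character(sleepstudy$Subject)
pred_ran_manual <- cf[idx, "(Intercept)"] + cf[idx, "Days"] * sleepstudy$Days

# Compare with predict(); both checks should come back TRUE
check_fix <- all.equal(unname(pred_fix_manual), unname(predict(mod, re.form = NA)))
check_ran <- all.equal(unname(pred_ran_manual), unname(predict(mod)))
check_fix
check_ran
```

So mod_pred_fix traces a single line shared by all individuals, whereas mod_pred_ran gives each individual their own intercept and slope.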

We can then plot the data interactively by simply ‘wrapping’ the ggplot object in a call to ggplotly(). If you hover over a data point you can easily identify which individual it belongs to, as well as the observed reaction time. Similarly, by hovering over one of the random slopes you will see the predicted reaction time and the individual it corresponds to.

You won’t want to do this for every plot you make but it does provide a simple way to make some of your more complex visualisations using ggplot that bit more useful (and fun!) in helping to understand your data.

Code

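# Plot: observed data, individual-specific fits and the population-level fit
p <- sleepstudy |>
    ggplot(aes(x = Days, y = Reaction, color = factor(Subject))) +
    geom_line(aes(x = Days, y = mod_pred_ran)) + # one fitted line per individual
    geom_line(aes(x = Days, y = mod_pred_fix), linewidth = 2, color = "blue") + # population-level line
    geom_point(alpha = 0.5) + # observed reaction times
    xlab("Time (days)") + ylab("Reaction Time (ms)") +
    guides(color = "none") + # 18 individuals would make for a cluttered legend
    theme_bw(base_size = 15)
# Convert the static ggplot into an interactive plotly widget
ggplotly(p)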
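If the default hover text is more than you need, one option is to build your own label and tell ggplotly() which fields to show. The sketch below is a stripped-down variant of the plot above, not a replacement for it: it assumes plotly's unofficial `text` aesthetic together with the `tooltip` argument, and p2 and w are illustrative names.

```r
library(lme4)
library(ggplot2)
library(plotly)

data("sleepstudy")
mod <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
sleepstudy$mod_pred_ran <- predict(mod) # individual-specific predictions

# group is set explicitly so the text label does not change how lines are drawn
p2 <- ggplot(sleepstudy,
             aes(x = Days, y = Reaction, group = Subject,
                 text = paste("Subject:", Subject))) +
    geom_line(aes(y = mod_pred_ran), colour = "grey50") +
    geom_point(alpha = 0.5)

# Restrict the hover box to the custom label and the x and y values
w <- ggplotly(p2, tooltip = c("text", "x", "y"))
w
```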