Put your ggplot on steroids

visualisation
code
Plotly adds some interactivity and can help clarify your data.
Published February 2, 2024

Welcome back to Stats Tips for 2024 - hope you managed a nice break.

It’s a short one today. If you didn’t already know it existed, check out plotly for taking your ggplots to the next level.

Sometimes it can be extremely helpful to quickly link discrete elements of a plot to the corresponding observation/s in your dataframe. For example, you have a suspected outlier in a scatterplot and you want to know which individual that belongs to. Or, you have an unavoidably busy plot; for example, plotting the predictions from a mixed model for longitudinal data overlaid on the observed data for comparison. In these cases it’s nearly impossible to discern the origin of the plotted data. In both use-case scenarios (and many more), plotly can help.
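For the first use case, here is a minimal sketch of identifying a point in a scatterplot by hovering over it. It is not part of the original example: it assumes the built-in mtcars dataset, and the names cars, model, p_cars and w_cars are only illustrative. The tooltip argument of ggplotly() chooses which aesthetics appear in the hover label.

```r
library(ggplot2)
library(plotly)

# Illustrative data: keep the row names (car models) as a column
# so they can be shown on hover
cars <- mtcars
cars$model <- rownames(mtcars)

# `text` is not something geom_point() draws; it is mapped here only
# so that ggplotly() can use it in the tooltip
p_cars <- ggplot(cars, aes(x = wt, y = mpg, text = model)) +
    geom_point()

# Restrict the hover label to the car model and the two plotted variables
w_cars <- ggplotly(p_cars, tooltip = c("text", "x", "y"))
w_cars
```

Hovering over an unusual point then shows which row of the dataframe it came from, without having to filter the data by eye.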

To illustrate the second use case, we are going to use the sleepstudy dataset that ships with the lme4 package. It records reaction times over successive days in sleep-deprived individuals. For the sake of the exercise we will fit a mixed model with reaction time (ms) as the outcome, time (days) as a fixed effect, and a random intercept and a random slope for time (days) within each individual. So this is a random intercepts and slopes model, allowing both the baseline reaction time and the ‘effect’ of sleep deprivation on reaction time to vary between individuals. We fit the model and view a few lines of the dataframe, which now contains the population-level predictions from the fixed effects only (mod_pred_fix) and the individual-specific predictions from the fixed plus random effects (mod_pred_ran).

Code
library(lme4)
library(ggplot2)
library(plotly)
# Load data
data("sleepstudy")
# Model
mod <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
# Predict
sleepstudy$mod_pred_fix <- predict(mod, re.form = NA) # population-level predictions (fixed effects only)
sleepstudy$mod_pred_ran <- predict(mod) # individual-specific predictions (fixed + random effects)
# View data
head(sleepstudy, 10)
Reaction Days Subject mod_pred_fix mod_pred_ran
249.5600 0 308 251.4051 253.6637
258.7047 1 308 261.8724 273.3299
250.8006 2 308 272.3397 292.9962
321.4398 3 308 282.8070 312.6624
356.8519 4 308 293.2742 332.3287
414.6901 5 308 303.7415 351.9950
382.2038 6 308 314.2088 371.6612
290.1486 7 308 324.6761 391.3275
430.5853 8 308 335.1434 410.9937
466.3535 9 308 345.6107 430.6600
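To make the difference between the two prediction columns concrete, here is a minimal sketch that rebuilds them by hand from the fitted coefficients. It refits the same model so that it runs on its own; the names fe, cf, subj, manual_fix and manual_ran are only illustrative.

```r
library(lme4)

data("sleepstudy")
mod <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Population-level (fixed-effect) intercept and slope
fe <- fixef(mod)

# Per-subject coefficients: the fixed effects plus each subject's
# random deviations (one row per subject)
cf <- coef(mod)$Subject

# Rebuild both sets of predictions by hand
subj <- as.character(sleepstudy$Subject)
manual_fix <- fe["(Intercept)"] + fe["Days"] * sleepstudy$Days
manual_ran <- cf[subj, "(Intercept)"] + cf[subj, "Days"] * sleepstudy$Days

# Both comparisons should agree with predict() up to floating-point error
all.equal(unname(manual_fix), unname(predict(mod, re.form = NA)))
all.equal(unname(manual_ran), unname(predict(mod)))
```

In other words, mod_pred_fix is a single line shared by everyone, while mod_pred_ran gives each individual their own intercept and slope.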

We can then plot the data interactively by simply ‘wrapping’ the ggplot object in a ggplotly() call. If you hover over a data point you can easily identify which individual it belongs to, as well as the observed reaction time. Similarly, by hovering over one of the individual-specific fitted lines you will see the predicted reaction time and the individual it corresponds to.

You won’t want to do this for every plot you make, but it does provide a simple way to make some of your more complex ggplot visualisations that bit more useful (and fun!) in helping you understand your data.

Code
# Plot
p <- sleepstudy |>
    ggplot(aes(x = Days, y = Reaction, color = factor(Subject))) +
    geom_line(aes(x = Days, y = mod_pred_ran)) +
    geom_line(aes(x = Days, y = mod_pred_fix), linewidth = 2, color = "blue") +
    geom_point(alpha = 0.5) +
    xlab("Time (days)") + ylab("Reaction Time (ms)") +
    guides(color = "none") +
    theme_bw(base_size = 15)
ggplotly(p)