Put your ggplot on steroids
Plotly adds some interactivity and can help clarify your data.
Published
February 2, 2024
Welcome back to Stats Tips for 2024 - hope you managed a nice break.
It’s a short one today. If you didn’t already know it existed, check out plotly (https://plotly.com/ggplot2/) for taking your ggplots to the next level.
Sometimes it is extremely helpful to quickly link individual elements of a plot to the corresponding observation(s) in your dataframe. For example, you may have a suspected outlier in a scatterplot and want to know which individual it belongs to. Or you may have an unavoidably busy plot, such as the predictions from a mixed model for longitudinal data overlaid on the observed data for comparison. In a static plot it can be nearly impossible to tell which individual each point or line comes from. In both scenarios (and many more), plotly can help.
In this example of the latter use-case, we are going to use a dataset built into the lme4 package. The sleepstudy data records reaction times over successive days in sleep-deprived individuals. For the sake of the exercise we will fit a mixed model with reaction time (ms) as the outcome, time (days) as a fixed effect, and a random intercept and random slope for time (days) within individual. So this is a random intercepts and slopes model: each individual gets their own baseline reaction time, and the ‘effect’ of sleep deprivation on reaction time over time is allowed to differ between individuals. We fit the model and view a few lines of the dataframe, which now contains the predictions from the fixed effects only (mod_pred_fix) and the predictions that also include each individual's random effects (mod_pred_ran).
```r
library(lme4)
library(ggplot2)
library(plotly)

# Load data
data("sleepstudy")

# Model: random intercept and random slope for Days within Subject
mod <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Predict
sleepstudy$mod_pred_fix <- predict(mod, re.form = NA) # fixed effects only (population-level)
sleepstudy$mod_pred_ran <- predict(mod)               # fixed + random effects (subject-specific)

# View data
head(sleepstudy, 10)
```
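| Reaction | Days | Subject | mod_pred_fix | mod_pred_ran |
|---------:|-----:|--------:|-------------:|-------------:|
| 249.5600 | 0 | 308 | 251.4051 | 253.6637 |
| 258.7047 | 1 | 308 | 261.8724 | 273.3299 |
| 250.8006 | 2 | 308 | 272.3397 | 292.9962 |
| 321.4398 | 3 | 308 | 282.8070 | 312.6624 |
| 356.8519 | 4 | 308 | 293.2742 | 332.3287 |
| 414.6901 | 5 | 308 | 303.7415 | 351.9950 |
| 382.2038 | 6 | 308 | 314.2088 | 371.6612 |
| 290.1486 | 7 | 308 | 324.6761 | 391.3275 |
| 430.5853 | 8 | 308 | 335.1434 | 410.9937 |
| 466.3535 | 9 | 308 | 345.6107 | 430.6600 |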
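To make the difference between the two prediction columns concrete, here is a minimal sketch that rebuilds them by hand from the model coefficients. It refits the same model so that it runs on its own; the names `fe`, `co`, `pred_fix_manual`, `pred_ran_manual`, `fix_ok` and `ran_ok` are just illustrative and not part of the original post.

```r
library(lme4)

data("sleepstudy")
mod <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

# Population-level (fixed-effect) intercept and slope
fe <- fixef(mod)

# Fixed-effects-only prediction: one common line for everyone
pred_fix_manual <- unname(fe["(Intercept)"] + fe["Days"] * sleepstudy$Days)

# Subject-specific coefficients (fixed effect + that subject's random effect),
# one row per subject, with row names equal to the subject IDs
co <- coef(mod)$Subject
subj <- as.character(sleepstudy$Subject)

# Subject-specific prediction: each individual gets their own line
pred_ran_manual <- co[subj, "(Intercept)"] + co[subj, "Days"] * sleepstudy$Days

# Compare with what predict() returns
fix_ok <- all.equal(pred_fix_manual, unname(predict(mod, re.form = NA)))
ran_ok <- all.equal(pred_ran_manual, unname(predict(mod)))
fix_ok
ran_ok
```

This matches what the table shows for subject 308: the fixed prediction rises by about 10.47 ms per day from 251.41, while the subject-specific prediction rises by about 19.67 ms per day from 253.66.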
We can then plot the data interactively by simply ‘wrapping’ the ggplot object in a plotly call, here ggplotly(). If you hover over a data point you can easily identify which individual it belongs to, as well as the observed reaction time. Similarly, by hovering over one of the random slopes you will see the predicted reaction time and the individual it corresponds to.
You won’t want to do this for every plot you make but it does provide a simple way to make some of your more complex visualisations using ggplot that bit more useful (and fun!) in helping to understand your data.
```r
# Plot
p <- sleepstudy |>
  ggplot(aes(x = Days, y = Reaction, color = factor(Subject))) +
  geom_line(aes(x = Days, y = mod_pred_ran)) +
  geom_line(aes(x = Days, y = mod_pred_fix), linewidth = 2, color = "blue") +
  geom_point(alpha = 0.5) +
  xlab("Time (days)") +
  ylab("Reaction Time (ms)") +
  guides(color = "none") +
  theme_bw(base_size = 15)

ggplotly(p)
```
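If the default hover text is a bit noisy, one option is to build your own label and ask ggplotly() to show only that. The sketch below is an optional extra, not something the plot above needs: it uses the unofficial `text` aesthetic that plotly recognises, together with the `tooltip` argument of ggplotly(). The names `dat`, `hover`, `p_tip` and `w` are made up for illustration, and only the observed points are plotted to keep it short.

```r
library(ggplot2)
library(plotly)

data("sleepstudy", package = "lme4")

# Work on a copy and add a (hypothetical) column holding the hover text.
# "<br>" starts a new line inside a plotly tooltip.
dat <- sleepstudy
dat$hover <- paste0(
  "Subject: ", dat$Subject,
  "<br>Day: ", dat$Days,
  "<br>Reaction: ", round(dat$Reaction, 1), " ms"
)

# `text` is not a real ggplot2 aesthetic, so ggplot2 may warn that it is
# ignoring it; plotly still picks it up when building the tooltip.
p_tip <- ggplot(dat, aes(x = Days, y = Reaction, color = factor(Subject))) +
  geom_point(aes(text = hover), alpha = 0.5) +
  xlab("Time (days)") +
  ylab("Reaction Time (ms)") +
  guides(color = "none") +
  theme_bw(base_size = 15)

# Show only the custom label when hovering
w <- ggplotly(p_tip, tooltip = "text")
# Type `w` at the console (or end a Quarto chunk with it) to render the widget
```

The `text` aesthetic is attached to the points only. Mapping a per-observation label onto a line layer tends to interfere with how the line is grouped, so it is usually simpler to leave lines with the default tooltip.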