gtsummary - Your New Go-To for Tables

code
presentation
Create summary and regression tables in a flash.
Published May 3, 2024

I thought I should bring this excellent package to your attention in case you weren’t aware of it, as I have taken gtsummary somewhat for granted in the few years since it first appeared on CRAN. I’m prompted in part by a research student who recently had to remake several “Table 1”-style tables (following a data change) while preparing a manuscript for submission, and who was going to redo them manually. When they realised how much time gtsummary could save them, I think they were fairly impressed. So today I’m just going to show you a couple of the package’s basic functions. It is extremely extensible, and if you can’t find answers for your own customisation needs on the homepage or in the vignettes, I have found that googling the issue often brings an answer. The developer is also quite active on stackoverflow.com. The homepage can be found at:

https://www.danieldsjoberg.com/gtsummary/index.html

We’re going to use a publicly available MS dataset, so if you want to run the code yourself you will first need to download the data from:

Brain MRI dataset of multiple sclerosis with consensus manual lesion segmentation and patient meta information

This dataset contains the demographic and clinical data on 60 patients (MRI data in accompanying datasets available at link).

1 Load and Inspect the Data

With the data read into a data frame called dat, let’s have a look at the first few rows:

Code
head(dat, 10)
ID Gender Age Age.of.onset EDSS Does.the.time.difference.between.MRI.acquisition.and.EDSS...two.months Types.of.Medicines Presenting.Symptom Dose.the.patient.has.Co.moroidity Pyramidal Cerebella Brain.stem Sensory Sphincters Visual Mental Speech Motor.System Sensory.System Coordination Gait Bowel.and.bladder.function Mobility Mental.State Optic.discs Fields Nystagmus Ocular.Movement Swallowing
1 F 56 43 3.0 No Gelenia Motor No 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 1 0 0
2 F 29 19 1.5 No Gelenia Sensory No 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0 0 0
3 F 15 8 4.0 No Tysabri Motor No 1 1 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0
4 F 24 20 6.0 No Tysabri Sensory No 1 1 1 0 1 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0
5 F 33 31 0.0 No Avonex Pain No 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
6 F 44 40 5.0 No Avonex Motor No 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 1 0 0
7 M 43 40 3.5 No Betaferon Motor & Visual No 0 1 0 0 0 0 1 0 1 1 1 0 0 1 0 1 0 0 0 0
8 F 32 30 1.0 No Gelenia Visual No 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 F 36 33 6.0 No Gelenia Motore No 1 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 1 0 0
10 F 39 35 3.0 No Betaferon Motor & Behavioural No 1 0 0 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0
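
The post doesn’t show how dat was read in, but the dotted column names hint at base R’s read.csv(), which by default passes headers through make.names() and turns spaces and other non-syntactic characters into dots. Here is a minimal sketch of that behaviour, using a couple of made-up rows rather than the real file (the header text and the object names dat_demo and dat_raw are assumptions for illustration, not copied from the download):

```r
# Hypothetical two-row extract; the real file's headers may differ.
csv_text <- "ID,Gender,Age of onset,Types of Medicines
1,F,43,Gelenia
2,M,19,Gelenia"

# check.names = TRUE is the default: headers go through make.names(),
# so spaces become dots.
dat_demo <- read.csv(text = csv_text)
names(dat_demo)

# check.names = FALSE keeps the headers exactly as written in the file.
dat_raw <- read.csv(text = csv_text, check.names = FALSE)
names(dat_raw)
```

Either form works with tbl_summary(); the dotted names are just easier to type without backticks.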

2 Summary Table

Let’s say you want to create a summary table showing descriptive statistics of the various demographic and clinical characteristics, stratified by DMT (Types.of.Medicines). In the first instance, this can be a basic call of tbl_summary() specifying Types.of.Medicines as the stratifying variable. We want to specify medians (IQR) and n’s (%’s) as the summary statistics.

Code
library(gtsummary)
dat |> 
  select(-ID) |> 
  tbl_summary(
    by = Types.of.Medicines,
    statistic = list(all_continuous() ~ "{median} ({p25},{p75})",
                     all_categorical() ~ "{n}/{N} ({p}%)"),
    digits = all_continuous() ~ 1) |> 
  add_overall()
Characteristic Overall, N = 60¹ Avonex, N = 5¹ Betaferon, N = 24¹ Gelenia, N = 9¹ Rebif, N = 14¹ Tysabri, N = 8¹
Gender
    F 46/60 (77%) 5/5 (100%) 15/24 (63%) 9/9 (100%) 10/14 (71%) 7/8 (88%)
    M 13/60 (22%) 0/5 (0%) 8/24 (33%) 0/9 (0%) 4/14 (29%) 1/8 (13%)
    N 1/60 (1.7%) 0/5 (0%) 1/24 (4.2%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
Age 33.0 (20.0,42.3) 24.0 (23.0,33.0) 37.5 (23.8,43.0) 42.0 (36.0,52.0) 32.5 (18.5,38.0) 20.5 (15.0,24.3)
Age.of.onset 30.5 (19.8,40.0) 20.0 (20.0,31.0) 35.0 (23.0,41.0) 40.0 (30.0,42.0) 31.0 (18.5,37.0) 17.0 (16.3,21.3)
EDSS 2.0 (1.0,3.5) 1.5 (1.0,4.0) 2.3 (1.0,3.1) 3.0 (1.5,3.0) 1.3 (1.0,2.4) 3.0 (1.4,4.3)
Does.the.time.difference.between.MRI.acquisition.and.EDSS...two.months 26/60 (43%) 0/5 (0%) 10/24 (42%) 3/9 (33%) 11/14 (79%) 2/8 (25%)
Presenting.Symptom
    Balance 4/60 (6.7%) 0/5 (0%) 2/24 (8.3%) 0/9 (0%) 2/14 (14%) 0/8 (0%)
    Balance &Motor 1/60 (1.7%) 0/5 (0%) 0/24 (0%) 0/9 (0%) 0/14 (0%) 1/8 (13%)
    Motor 10/60 (17%) 1/5 (20%) 3/24 (13%) 1/9 (11%) 3/14 (21%) 2/8 (25%)
    Motor & Behavioural 1/60 (1.7%) 0/5 (0%) 1/24 (4.2%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
    Motor & Sensory 1/60 (1.7%) 0/5 (0%) 1/24 (4.2%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
    Motor & Visual 2/60 (3.3%) 0/5 (0%) 2/24 (8.3%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
    Motore 1/60 (1.7%) 0/5 (0%) 0/24 (0%) 1/9 (11%) 0/14 (0%) 0/8 (0%)
    Pain 1/60 (1.7%) 1/5 (20%) 0/24 (0%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
    Sensory 19/60 (32%) 0/5 (0%) 8/24 (33%) 3/9 (33%) 7/14 (50%) 1/8 (13%)
    Sensory & Visual 1/60 (1.7%) 0/5 (0%) 1/24 (4.2%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
    Sensory & Motor 1/60 (1.7%) 0/5 (0%) 1/24 (4.2%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
    Sensory & Visual 1/60 (1.7%) 0/5 (0%) 0/24 (0%) 1/9 (11%) 0/14 (0%) 0/8 (0%)
    Sensory & Visual ,Balance , Motor, Sexual,Fatigue 1/60 (1.7%) 0/5 (0%) 1/24 (4.2%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
    Sensory &Motor 1/60 (1.7%) 0/5 (0%) 0/24 (0%) 0/9 (0%) 0/14 (0%) 1/8 (13%)
    Visual 14/60 (23%) 3/5 (60%) 4/24 (17%) 2/9 (22%) 2/14 (14%) 3/8 (38%)
    Visual & Balance 1/60 (1.7%) 0/5 (0%) 0/24 (0%) 1/9 (11%) 0/14 (0%) 0/8 (0%)
Dose.the.patient.has.Co.moroidity 13/60 (22%) 0/5 (0%) 8/24 (33%) 3/9 (33%) 2/14 (14%) 0/8 (0%)
Pyramidal 31/60 (52%) 2/5 (40%) 14/24 (58%) 5/9 (56%) 4/14 (29%) 6/8 (75%)
Cerebella 17/60 (28%) 1/5 (20%) 8/24 (33%) 2/9 (22%) 3/14 (21%) 3/8 (38%)
Brain.stem 5/60 (8.3%) 1/5 (20%) 1/24 (4.2%) 0/9 (0%) 1/14 (7.1%) 2/8 (25%)
Sensory 18/60 (30%) 1/5 (20%) 8/24 (33%) 3/9 (33%) 3/14 (21%) 3/8 (38%)
Sphincters 9/60 (15%) 0/5 (0%) 5/24 (21%) 0/9 (0%) 2/14 (14%) 2/8 (25%)
Visual 17/60 (28%) 3/5 (60%) 6/24 (25%) 2/9 (22%) 2/14 (14%) 4/8 (50%)
Mental 2/60 (3.3%) 0/5 (0%) 2/24 (8.3%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
Speech 6/60 (10%) 0/5 (0%) 4/24 (17%) 0/9 (0%) 1/14 (7.1%) 1/8 (13%)
Motor.System 35/60 (58%) 3/5 (60%) 14/24 (58%) 5/9 (56%) 6/14 (43%) 7/8 (88%)
Sensory.System 19/60 (32%) 0/5 (0%) 8/24 (33%) 4/9 (44%) 4/14 (29%) 3/8 (38%)
Coordination 17/60 (28%) 2/5 (40%) 6/24 (25%) 2/9 (22%) 2/14 (14%) 5/8 (63%)
Gait 17/60 (28%) 2/5 (40%) 7/24 (29%) 1/9 (11%) 4/14 (29%) 3/8 (38%)
Bowel.and.bladder.function 9/60 (15%) 1/5 (20%) 2/24 (8.3%) 1/9 (11%) 3/14 (21%) 2/8 (25%)
Mobility 4/60 (6.7%) 0/5 (0%) 2/24 (8.3%) 1/9 (11%) 1/14 (7.1%) 0/8 (0%)
Mental.State 3/60 (5.0%) 0/5 (0%) 2/24 (8.3%) 0/9 (0%) 1/14 (7.1%) 0/8 (0%)
Optic.discs 22/60 (37%) 2/5 (40%) 8/24 (33%) 3/9 (33%) 4/14 (29%) 5/8 (63%)
Fields 0/60 (0%) 0/5 (0%) 0/24 (0%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
Nystagmus 7/60 (12%) 1/5 (20%) 3/24 (13%) 2/9 (22%) 0/14 (0%) 1/8 (13%)
Ocular.Movement 2/60 (3.3%) 0/5 (0%) 0/24 (0%) 1/9 (11%) 0/14 (0%) 1/8 (13%)
Swallowing 3/60 (5.0%) 0/5 (0%) 3/24 (13%) 0/9 (0%) 0/14 (0%) 0/8 (0%)
¹ n/N (%); Median (25%,75%)
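
If you’d like to try the same call before downloading anything, here is a sketch that swaps in the trial dataset that ships with gtsummary. The choice of variables (trt, age, grade, response) is mine, purely for illustration; the arguments are the same ones used above.

```r
library(gtsummary)

# Same pattern as above, applied to gtsummary's bundled `trial` data.
tbl_demo <- trial |>
  tbl_summary(
    by = trt,                               # stratify by treatment arm
    include = c(age, grade, response),      # illustrative subset of columns
    statistic = list(all_continuous() ~ "{median} ({p25},{p75})",
                     all_categorical() ~ "{n}/{N} ({p}%)"),
    digits = all_continuous() ~ 1
  ) |>
  add_overall()

tbl_demo
```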

In fact, that’s a pretty good start. However, including the column total as the denominator in every cell is just clutter, so let’s remove that. We’ll also set missing_text, the label used for the missing-value row that tbl_summary() adds whenever a variable has missing values. Additionally, we want to tidy up some of the variable names; I’ll just do Age, Age.of.onset and the somewhat convoluted Does.the.time.difference.between.MRI.acquisition.and.EDSS...two.months for now. For the last of these, we’ll give it a short name and attach a footnote that expands on the variable description.

Code
dat |> 
  select(-ID) |> 
  tbl_summary(
    by = Types.of.Medicines,
    statistic = list(all_continuous() ~ "{median} ({p25},{p75})",
                     all_categorical() ~ "{n} ({p}%)"),
    digits = all_continuous() ~ 1,
    missing_text = "(Missing)",
    label = c(Age ~ "Age, yrs - median (IQR)",
              Age.of.onset ~ "Age onset, yrs - median (IQR)",
              Does.the.time.difference.between.MRI.acquisition.and.EDSS...two.months ~ "Time difference < 2 months")) |> 
  modify_table_styling(columns = label,
                       rows = label == "Time difference < 2 months",
                       footnote = "Does the time difference between MRI acquisition and EDSS < two months") |> 
  add_overall()
Characteristic Overall, N = 60¹ Avonex, N = 5¹ Betaferon, N = 24¹ Gelenia, N = 9¹ Rebif, N = 14¹ Tysabri, N = 8¹
Gender
    F 46 (77%) 5 (100%) 15 (63%) 9 (100%) 10 (71%) 7 (88%)
    M 13 (22%) 0 (0%) 8 (33%) 0 (0%) 4 (29%) 1 (13%)
    N 1 (1.7%) 0 (0%) 1 (4.2%) 0 (0%) 0 (0%) 0 (0%)
Age, yrs - median (IQR) 33.0 (20.0,42.3) 24.0 (23.0,33.0) 37.5 (23.8,43.0) 42.0 (36.0,52.0) 32.5 (18.5,38.0) 20.5 (15.0,24.3)
Age onset, yrs - median (IQR) 30.5 (19.8,40.0) 20.0 (20.0,31.0) 35.0 (23.0,41.0) 40.0 (30.0,42.0) 31.0 (18.5,37.0) 17.0 (16.3,21.3)
EDSS 2.0 (1.0,3.5) 1.5 (1.0,4.0) 2.3 (1.0,3.1) 3.0 (1.5,3.0) 1.3 (1.0,2.4) 3.0 (1.4,4.3)
Time difference < 2 months² 26 (43%) 0 (0%) 10 (42%) 3 (33%) 11 (79%) 2 (25%)
Presenting.Symptom
    Balance 4 (6.7%) 0 (0%) 2 (8.3%) 0 (0%) 2 (14%) 0 (0%)
    Balance &Motor 1 (1.7%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (13%)
    Motor 10 (17%) 1 (20%) 3 (13%) 1 (11%) 3 (21%) 2 (25%)
    Motor & Behavioural 1 (1.7%) 0 (0%) 1 (4.2%) 0 (0%) 0 (0%) 0 (0%)
    Motor & Sensory 1 (1.7%) 0 (0%) 1 (4.2%) 0 (0%) 0 (0%) 0 (0%)
    Motor & Visual 2 (3.3%) 0 (0%) 2 (8.3%) 0 (0%) 0 (0%) 0 (0%)
    Motore 1 (1.7%) 0 (0%) 0 (0%) 1 (11%) 0 (0%) 0 (0%)
    Pain 1 (1.7%) 1 (20%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
    Sensory 19 (32%) 0 (0%) 8 (33%) 3 (33%) 7 (50%) 1 (13%)
    Sensory & Visual 1 (1.7%) 0 (0%) 1 (4.2%) 0 (0%) 0 (0%) 0 (0%)
    Sensory & Motor 1 (1.7%) 0 (0%) 1 (4.2%) 0 (0%) 0 (0%) 0 (0%)
    Sensory & Visual 1 (1.7%) 0 (0%) 0 (0%) 1 (11%) 0 (0%) 0 (0%)
    Sensory & Visual ,Balance , Motor, Sexual,Fatigue 1 (1.7%) 0 (0%) 1 (4.2%) 0 (0%) 0 (0%) 0 (0%)
    Sensory &Motor 1 (1.7%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 1 (13%)
    Visual 14 (23%) 3 (60%) 4 (17%) 2 (22%) 2 (14%) 3 (38%)
    Visual & Balance 1 (1.7%) 0 (0%) 0 (0%) 1 (11%) 0 (0%) 0 (0%)
Dose.the.patient.has.Co.moroidity 13 (22%) 0 (0%) 8 (33%) 3 (33%) 2 (14%) 0 (0%)
Pyramidal 31 (52%) 2 (40%) 14 (58%) 5 (56%) 4 (29%) 6 (75%)
Cerebella 17 (28%) 1 (20%) 8 (33%) 2 (22%) 3 (21%) 3 (38%)
Brain.stem 5 (8.3%) 1 (20%) 1 (4.2%) 0 (0%) 1 (7.1%) 2 (25%)
Sensory 18 (30%) 1 (20%) 8 (33%) 3 (33%) 3 (21%) 3 (38%)
Sphincters 9 (15%) 0 (0%) 5 (21%) 0 (0%) 2 (14%) 2 (25%)
Visual 17 (28%) 3 (60%) 6 (25%) 2 (22%) 2 (14%) 4 (50%)
Mental 2 (3.3%) 0 (0%) 2 (8.3%) 0 (0%) 0 (0%) 0 (0%)
Speech 6 (10%) 0 (0%) 4 (17%) 0 (0%) 1 (7.1%) 1 (13%)
Motor.System 35 (58%) 3 (60%) 14 (58%) 5 (56%) 6 (43%) 7 (88%)
Sensory.System 19 (32%) 0 (0%) 8 (33%) 4 (44%) 4 (29%) 3 (38%)
Coordination 17 (28%) 2 (40%) 6 (25%) 2 (22%) 2 (14%) 5 (63%)
Gait 17 (28%) 2 (40%) 7 (29%) 1 (11%) 4 (29%) 3 (38%)
Bowel.and.bladder.function 9 (15%) 1 (20%) 2 (8.3%) 1 (11%) 3 (21%) 2 (25%)
Mobility 4 (6.7%) 0 (0%) 2 (8.3%) 1 (11%) 1 (7.1%) 0 (0%)
Mental.State 3 (5.0%) 0 (0%) 2 (8.3%) 0 (0%) 1 (7.1%) 0 (0%)
Optic.discs 22 (37%) 2 (40%) 8 (33%) 3 (33%) 4 (29%) 5 (63%)
Fields 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%) 0 (0%)
Nystagmus 7 (12%) 1 (20%) 3 (13%) 2 (22%) 0 (0%) 1 (13%)
Ocular.Movement 2 (3.3%) 0 (0%) 0 (0%) 1 (11%) 0 (0%) 1 (13%)
Swallowing 3 (5.0%) 0 (0%) 3 (13%) 0 (0%) 0 (0%) 0 (0%)
¹ n (%); Median (25%,75%)
² Does the time difference between MRI acquisition and EDSS < two months
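
No missing-value rows appear in the table above, presumably because these columns are complete. To see what the missingness row looks like, here is a small sketch on gtsummary’s bundled trial data, where age and marker do contain missing values. I’ve set missing = "always" so the row is shown for every variable; the default, "ifany", only adds it where needed.

```r
library(gtsummary)

tbl_missing <- trial |>
  tbl_summary(
    by = trt,
    include = c(age, marker),       # illustrative choice of columns
    missing = "always",             # "ifany" (default), "always" or "no"
    missing_text = "(Missing)"      # label printed on the missing-value row
  )

tbl_missing
```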

If you want to save the created table, you can do this in one of two ways. The first is to save it directly as a .docx file, which should work most of the time. However, if you notice any formatting issues, change the target file extension to .html, then open that file in Word and you should be OK as well. An important point is to first save the table to an object in your R script, e.g.

tbl <- dat |> tbl_summary(...

The command to save the table as a Word (or HTML) file is then:

gt::gtsave(as_gt(tbl), filename = "summary_table.docx", path = "...your_path.../")
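
To make the two steps concrete, here is a self-contained sketch that builds a small table from the bundled trial data and writes it to a temporary folder as HTML. The file name and the use of tempdir() are placeholders; point path at wherever you actually want the file to go.

```r
library(gtsummary)

# Step 1: keep the table in an object.
tbl <- trial |>
  tbl_summary(by = trt, include = c(age, grade))

# Step 2: convert to a gt table and save it.
out_dir <- tempdir()   # placeholder location
gt::gtsave(as_gt(tbl), filename = "summary_table.html", path = out_dir)

file.exists(file.path(out_dir, "summary_table.html"))
```

If Word output from gt gives you trouble, another route that may be worth trying is converting with as_flex_table() and saving via flextable::save_as_docx(); I haven’t compared the two in detail here.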

3 Regression Table

gtsummary’s other strength is in making regression tables, and the relevant workhorse function here is tbl_regression().

Let’s say we’re interested in the association between age at onset and the presence of sensory symptoms (I don’t really know whether this makes clinical sense, but it’s just to run a regression). The outcome variable here is binary, so we’ll need to specify a logistic regression model. We can do that in R as follows, and we obtain the standard output (fairly bland from the point of view of presentation/collaboration):

Code
mod <- glm(Sensory ~ Age.of.onset, family = 'binomial', data = dat)
summary(mod)

Call:
glm(formula = Sensory ~ Age.of.onset, family = "binomial", data = dat)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)  -1.75743    0.87101  -2.018   0.0436 *
Age.of.onset  0.02987    0.02641   1.131   0.2581  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 73.304  on 59  degrees of freedom
Residual deviance: 71.994  on 58  degrees of freedom
AIC: 75.994

Number of Fisher Scoring iterations: 4

Let’s pretty this up by passing the model results through tbl_regression():

Code
mod |> 
  tbl_regression()
Characteristic log(OR)¹ 95% CI¹ p-value
Age.of.onset 0.03 -0.02, 0.08 0.3
¹ OR = Odds Ratio, CI = Confidence Interval

Not bad, but we’d like the output to be in terms of odds-ratios rather than log odds-ratios. That’s actually quite simple to do:

Code
mod |> 
  tbl_regression(exponentiate = TRUE)
Characteristic OR¹ 95% CI¹ p-value
Age.of.onset 1.03 0.98, 1.09 0.3
¹ OR = Odds Ratio, CI = Confidence Interval
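
As a quick sanity check on what exponentiating does, the odds ratio is simply exp() of the log-odds coefficient from summary(mod). The sketch below uses the printed estimate (0.02987) rather than the fitted model, so it runs without the data; the per-decade figure is my own addition to help interpretation.

```r
# Age.of.onset coefficient, copied from the summary(mod) output
log_or <- 0.02987

# Odds ratio per one-year increase in age at onset
or_per_year <- exp(log_or)
round(or_per_year, 2)   # 1.03, matching the OR column

# The same effect expressed per 10-year increase
or_per_decade <- exp(10 * log_or)
round(or_per_decade, 2)
```

The confidence limits are transformed in the same way. The upper limit prints as 1.09 rather than exp(0.08), which is about 1.08, presumably because the unrounded limit is exponentiated before rounding.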

What if you want to include some model fit statistics?

Code
mod |> 
  tbl_regression(exponentiate = TRUE) |> 
  add_glance_source_note()
Characteristic OR¹ 95% CI¹ p-value
Age.of.onset 1.03 0.98, 1.09 0.3
Null deviance = 73.3; Null df = 59.0; Log-likelihood = -36.0; AIC = 76.0; BIC = 80.2; Deviance = 72.0; Residual df = 58; No. Obs. = 60
¹ OR = Odds Ratio, CI = Confidence Interval
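
That source note lists every statistic the model’s glance output provides, which can be more than you want. A sketch of trimming it down with the include argument follows, fitted to the bundled trial data so it runs on its own. The model (response ~ age) and the choice of AIC, BIC and number of observations are illustrative assumptions only.

```r
library(gtsummary)

# Illustrative logistic model on gtsummary's bundled `trial` data
mod_demo <- glm(response ~ age, family = binomial, data = trial)

tbl_fit <- mod_demo |>
  tbl_regression(exponentiate = TRUE) |>
  # keep only selected fit statistics in the source note
  add_glance_source_note(include = c(AIC, BIC, nobs))

tbl_fit
```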

That’s all great, but I’ve just noticed that the predictor variable’s label isn’t formatted so well, so let’s change that.

Code
mod |> 
  tbl_regression(exponentiate = TRUE,
                 label = c(Age.of.onset ~ "Age onset")) |> 
  add_glance_source_note()
Characteristic OR¹ 95% CI¹ p-value
Age onset 1.03 0.98, 1.09 0.3
Null deviance = 73.3; Null df = 59.0; Log-likelihood = -36.0; AIC = 76.0; BIC = 80.2; Deviance = 72.0; Residual df = 58; No. Obs. = 60
¹ OR = Odds Ratio, CI = Confidence Interval

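tbl_regression() isn’t limited to glm(): it supports a wide range of model types, including linear, survival and mixed models, and the package documentation lists what is covered.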
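
As a small illustration with a different model class, here is a sketch of the same workflow for an ordinary linear model with two predictors, again on the bundled trial data. The model itself (age ~ marker + grade) and the label text are made up for the example and carry no clinical meaning.

```r
library(gtsummary)

# Illustrative linear model: one continuous and one categorical predictor
lm_demo <- lm(age ~ marker + grade, data = trial)

tbl_lm <- lm_demo |>
  tbl_regression(
    label = list(marker ~ "Marker level",
                 grade ~ "Tumour grade")
  )

tbl_lm
```

For a linear model the table reports the coefficients (Beta) directly, so exponentiate isn’t needed.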

4 Last Word

I hope you find both of these functions useful in your day-to-day coding and data analysis. They are great additions to your R toolkit, not only for the time they save, but also for the much better-looking results tables they produce, an area where base R often falls far short.