Easily view your data by a grouping variable

code

Use by() to view your data by a grouping variable.

Published

March 22, 2024

It is easy enough to view a dataframe in RStudio by opening it in the viewer or printing it (or part of it) to the console. However, the output can be hard to scan when you want to quickly pick out the rows that belong to each level of a grouping variable (usually the patient id). The by() function can help with this. Let’s illustrate its utility with the sleepstudy dataset from the lme4 package. To start, I’ll print the data for the first 3 subjects (30 rows) the usual way.

sleepstudy |> as_tibble() |> print(n = 30)

library(lme4)
library(dplyr)
# Load data
data("sleepstudy")
sleepstudy |> as_tibble() |> print(n = 30)

# A tibble: 180 × 3
   Reaction  Days Subject
      <dbl> <dbl> <fct>  
 1     250.     0 308    
 2     259.     1 308    
 3     251.     2 308    
 4     321.     3 308    
 5     357.     4 308    
 6     415.     5 308    
 7     382.     6 308    
 8     290.     7 308    
 9     431.     8 308    
10     466.     9 308    
11     223.     0 309    
12     205.     1 309    
13     203.     2 309    
14     205.     3 309    
15     208.     4 309    
16     216.     5 309    
17     214.     6 309    
18     218.     7 309    
19     224.     8 309    
20     237.     9 309    
21     199.     0 310    
22     194.     1 310    
23     234.     2 310    
24     233.     3 310    
25     229.     4 310    
26     220.     5 310    
27     235.     6 310    
28     256.     7 310    
29     261.     8 310    
30     248.     9 310    
# ℹ 150 more rows

But by() gives a cleaner view, with one block of rows per subject:

by(sleepstudy, sleepstudy$Subject, identity)[1:3]

Note that the [1:3] selects, by position, the groups that you want to view (here the first three), and that identity simply returns each group's rows unchanged.

by(sleepstudy, sleepstudy$Subject, identity)[1:3]

$`308`
   Reaction Days Subject
1  249.5600    0     308
2  258.7047    1     308
3  250.8006    2     308
4  321.4398    3     308
5  356.8519    4     308
6  414.6901    5     308
7  382.2038    6     308
8  290.1486    7     308
9  430.5853    8     308
10 466.3535    9     308

$`309`
   Reaction Days Subject
11 222.7339    0     309
12 205.2658    1     309
13 202.9778    2     309
14 204.7070    3     309
15 207.7161    4     309
16 215.9618    5     309
17 213.6303    6     309
18 217.7272    7     309
19 224.2957    8     309
20 237.3142    9     309

$`310`
   Reaction Days Subject
21 199.0539    0     310
22 194.3322    1     310
23 234.3200    2     310
24 232.8416    3     310
25 229.3074    4     310
26 220.4579    5     310
27 235.4208    6     310
28 255.7511    7     310
29 261.0125    8     310
30 247.5153    9     310
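Groups can also be picked by label rather than by position. The following is a minimal sketch on a small made-up dataframe (the name toy and its columns id, visit and score are assumptions for illustration, not part of sleepstudy), so it runs without lme4:

```r
# Made-up data for illustration only (not from the sleepstudy dataset)
toy <- data.frame(
  id    = factor(rep(c("a", "b", "c"), each = 2)),
  visit = rep(1:2, times = 3),
  score = c(10, 12, 20, 21, 30, 33)
)

# identity() returns each group's rows unchanged,
# so by() effectively splits the dataframe by id
groups <- by(toy, toy$id, identity)

# Select groups by position, as in the example above ...
first_two <- groups[1:2]

# ... or by group label
picked <- groups[c("a", "c")]
print(picked)
```

Selecting by label is handy when you already know which ids you want to inspect and do not want to work out their positions first.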

To take this a step further, you can wrap the idea in a function that lets you view any range of groups without repeatedly copying and pasting that line of code. Just call the function with your dataframe, the name of the group id column, and the first and last group indices that you want to view. (Interestingly, while writing this function I worked out that you don’t even need by() to achieve the same result.)

print_groups(sleepstudy, Subject, 1, 3)

# Create function
print_groups <- function(df, id, index1, index2) {
  df <- data.frame(df)
  # Evaluate the unquoted id column name inside df
  ids <- eval(substitute(id), df)
  # Unique ids, in order of first appearance
  ids_all <- unique(ids)
  # Check that the requested range exists before subsetting
  if (index1 <= length(ids_all) && index2 <= length(ids_all)) {
    for (id2 in ids_all[index1:index2]) {
      cat(paste0("id = ", id2, "\n"))
      print(df[ids %in% id2, ])
      cat("----------------------------\n\n")
    }
  } else {
    print("There aren't that many groups in your dataset")
  }
}

# Use function
print_groups(sleepstudy, Subject, 1, 3)

id = 308
   Reaction Days Subject
1  249.5600    0     308
2  258.7047    1     308
3  250.8006    2     308
4  321.4398    3     308
5  356.8519    4     308
6  414.6901    5     308
7  382.2038    6     308
8  290.1486    7     308
9  430.5853    8     308
10 466.3535    9     308
----------------------------

id = 309
   Reaction Days Subject
11 222.7339    0     309
12 205.2658    1     309
13 202.9778    2     309
14 204.7070    3     309
15 207.7161    4     309
16 215.9618    5     309
17 213.6303    6     309
18 217.7272    7     309
19 224.2957    8     309
20 237.3142    9     309
----------------------------

id = 310
   Reaction Days Subject
21 199.0539    0     310
22 194.3322    1     310
23 234.3200    2     310
24 232.8416    3     310
25 229.3074    4     310
26 220.4579    5     310
27 235.4208    6     310
28 255.7511    7     310
29 261.0125    8     310
30 247.5153    9     310
----------------------------
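As a variation on the same idea, base R's split() can do the grouping in one call. The sketch below is not the function from this post: view_groups and its arguments are hypothetical names, and the toy dataframe is made up so that the example runs without lme4. Note that split() orders groups by factor level (or sorted value), whereas unique() keeps the order of first appearance, so the two approaches can number groups differently.

```r
# Made-up data for illustration only
toy <- data.frame(
  id    = factor(rep(c("a", "b", "c"), each = 2)),
  visit = rep(1:2, times = 3),
  score = c(10, 12, 20, 21, 30, 33)
)

# Hypothetical helper: return groups `from` to `to` as a named list.
# id_col is the grouping column name given as a string.
view_groups <- function(df, id_col, from = 1, to = from) {
  # One dataframe per group; drop = TRUE removes empty factor levels
  groups <- split(df, df[[id_col]], drop = TRUE)
  if (from < 1 || to > length(groups) || from > to) {
    stop("Requested range is outside the ", length(groups), " available groups")
  }
  groups[from:to]
}

# View the second and third groups
view_groups(toy, "id", 2, 3)
```

Returning the list instead of printing inside a loop means the result can be stored and inspected further, at the cost of the custom separators used above.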

And there you have it!