More DRY (Don’t Repeat Yourself) — Meet mutate(across())

Categories: code, concept
Create or edit multiple columns efficiently.
Published: January 30, 2026

1 Introduction

Welcome back to Stats Tips for 2026. I hope you all had a restful break during the holiday period. I spent a couple of weeks on the South Island of New Zealand and was rather in awe of how beautiful that part of the world is - Queenstown, Milford Sound, Lake Tekapo and a 3-day hike on the Hump Ridge Track, which I can really recommend if you’re into hiking. The landscapes are otherworldly - we plan to go back, it was that good…

Ok, enough about my holidays and on to more serious topics. Today I thought I would ease you back into the world of statistical musings with a less theoretical and more practical post. One that illustrates the application of what I consider to be an indispensable dplyr function in my day-to-day work, and one that I hope you can make use of too - meet across().

I believe one aspiration we all share in our endeavour to become better R programmers is to write more efficient code that avoids repetition. You may recall that I have dedicated a whole other post to this topic and so our discussion here will further expound upon that theme.

How many of you have written a block of code and then simply reused it? For example:

  • The same data transformation copied and pasted five times.
  • The same rounding applied column by column.
  • The same ifelse() rewritten with only the variable name changed.

I know I have.

In its most fundamental use-case, across() makes it easy to apply the same transformation to multiple columns in a dataframe in one go, rather than applying the same code block multiple times. It may not be flashy, but once you get the hang of how it works, your code will become shorter, clearer, and far easier to maintain. It’s important to note that across() doesn’t work by itself, but rather is a column-selection helper that is evaluated within other dplyr functions, most commonly mutate(), but also summarise(). (For filter(), recent versions of dplyr recommend the sibling helpers if_any() and if_all() instead.) In this mental model, mutate() decides what happens and across() decides where it happens.

Let’s make these ideas clearer with several examples…


2 The Problem: Repeating the Same Operation

Suppose you’re analysing a simple dataset.

Code
library(dplyr)  # provides tibble(), mutate(), summarise() and across()

df <- tibble(id = as.character(1:3),
             age = c(34, 51, 63),
             weight = c(72.46, 81.27, 76.85),
             bmi = c(23.66, 27.93, 26.35),
             cholesterol = c(5.42, 6.01, 5.87))

df
id age weight bmi cholesterol
1 34 72.46 23.66 5.42
2 51 81.27 27.93 6.01
3 63 76.85 26.35 5.87


You decide that, for reporting, several variables should be rounded to 1 decimal place.


3 The Naïve Way

A common first attempt looks like this:

Code
df |> 
  mutate(weight = round(weight, 1),
         bmi = round(bmi, 1),
         cholesterol = round(cholesterol, 1))
id age weight bmi cholesterol
1 34 72.5 23.7 5.4
2 51 81.3 27.9 6.0
3 63 76.8 26.4 5.9


You write out the same line of code for each variable. It does work. It’s also:

  • Repetitive
  • Error-prone (easy to forget a variable)
  • Painful to update when the variable list changes

If you later add another variable (say waist), you must remember to update this block manually — and everywhere else you did something similar.


4 The Core Idea Behind across()

The key insight is simple:

When you apply the same transformation to multiple columns, you should write the transformation once.

That’s exactly what across() does.


5 The Better Way: mutate(across())

Here’s the same transformation rewritten using across().

Code
df |> 
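  # \(x) round(x, 1) is an anonymous function; across(..., round, 1) is deprecated since dplyr 1.1.0
  mutate(across(c(weight, bmi, cholesterol), \(x) round(x, 1)))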
id age weight bmi cholesterol
1 34 72.5 23.7 5.4
2 51 81.3 27.9 6.0
3 63 76.8 26.4 5.9


Read this out loud:

“Mutate across weight, bmi, and cholesterol by rounding to 1 decimal place.”

That phrasing is much closer to how you think about the task.
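
If you want to convince yourself that the two versions really do the same thing, a quick check along these lines may help. This is only a minimal sketch: the object names by_hand and with_across are my own, and the anonymous function is written with base R's \(x) shorthand.

```r
library(dplyr)

df <- tibble(id = as.character(1:3),
             age = c(34, 51, 63),
             weight = c(72.46, 81.27, 76.85),
             bmi = c(23.66, 27.93, 26.35),
             cholesterol = c(5.42, 6.01, 5.87))

# Column-by-column version (the naive way)
by_hand <- df |>
  mutate(weight = round(weight, 1),
         bmi = round(bmi, 1),
         cholesterol = round(cholesterol, 1))

# across() version; \(x) is base R shorthand for function(x)
with_across <- df |>
  mutate(across(c(weight, bmi, cholesterol), \(x) round(x, 1)))

# Compare the two results column by column
all.equal(as.data.frame(by_hand), as.data.frame(with_across))
```

The comparison is done on plain data frames simply to keep the check independent of any tibble-specific behaviour.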


6 Why This Is Better (Beyond Being Shorter)

6.1 It Scales Naturally

If you add another variable:

Code
df |> 
  mutate(across(c(weight, bmi, cholesterol, waist), \(x) round(x, 1)))  # waist is the newly added column

No duplication. No copy-paste.


6.2 You Can Select Variables Programmatically

Instead of naming variables explicitly, you can select them based on some other programmatic characteristic. Here, let’s select all numeric variables:

Code
df |> 
  mutate(across(where(is.numeric), \(x) round(x, 1)))
id age weight bmi cholesterol
1 34 72.5 23.7 5.4
2 51 81.3 27.9 6.0
3 63 76.8 26.4 5.9


Alternatively, you might want to select based on naming conventions. In this example, we would select all variables in the dataframe that begin with the text “lab_”:

Code
df |> 
  mutate(across(starts_with("lab_"), log))

This is particularly powerful in real research datasets, where variable names often follow patterns.
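
Because our example df has no “lab_” columns, here is a small sketch you can actually run. The labs dataframe, its column names and its values are all invented for illustration. It also shows that selections can be combined, here "every numeric column except age", using the & and ! operators that the selection helpers support.

```r
library(dplyr)

# Hypothetical lab results; the lab_ prefix and all values are invented
labs <- tibble(id = as.character(1:3),
               age = c(34, 51, 63),
               lab_alt = c(22, 35, 41),
               lab_crp = c(1.2, 8.5, 3.3))

# Log-transform only the columns whose names start with "lab_"
labs_logged <- labs |>
  mutate(across(starts_with("lab_"), log))

# Combine selections: every numeric column except age
labs_rounded <- labs |>
  mutate(across(where(is.numeric) & !age, \(x) round(x, 0)))
```

In both cases id and age are left untouched, because they fall outside the selection.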


6.3 It Reduces Cognitive Load

If you reviewed your code 6 months down the track and compared these two blocks:

weight = round(weight, 1)
bmi = round(bmi, 1)
cholesterol = round(cholesterol, 1)

vs:

across(c(weight, bmi, cholesterol), \(x) round(x, 1))

The second tells you what is happening immediately.


7 Using Anonymous Functions for More Complex Logic

You’re not limited to simple functions like round() - you can write your own. When the function argument starts with ~, across() treats everything after it as a user-defined or “anonymous” function; base R’s \(x) shorthand does the same job.

For example, suppose you want to:

  • add 1 to avoid zeros
  • log-transform the result
  • apply this consistently to multiple variables that begin with “lab_”.
Code
df |> 
  mutate(across(starts_with("lab_"), ~ log(.x + 1)))

Here, .x represents the current column being transformed.

As you can see - we can do all of this in one line of code.
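
Again, df has no “lab_” columns, so here is a runnable sketch on made-up data (the labs dataframe and its values are assumptions of mine). It shows the ~ form from above next to the equivalent \(x) form.

```r
library(dplyr)

# Hypothetical lab counts that include zeros (log(0) would give -Inf)
labs <- tibble(id = as.character(1:3),
               lab_a = c(0, 4, 9),
               lab_b = c(2, 0, 7))

# Formula-style anonymous function: .x is the current column
labs_log <- labs |>
  mutate(across(starts_with("lab_"), ~ log(.x + 1)))

# The same function written with base R's \(x) shorthand
labs_log2 <- labs |>
  mutate(across(starts_with("lab_"), \(x) log(x + 1)))
```

Both calls should give the same result; which style you use is largely a matter of taste, although the dplyr documentation now tends to favour the \(x) form.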


8 Creating New Variables Instead of Overwriting

In research workflows, it’s often good practice for reproducibility to keep raw variables intact and create new variables instead.

Code
df |> 
  mutate(across(c(weight, bmi), scale, .names = "{.col}_z"))
id age weight bmi cholesterol weight_z bmi_z
1 34 72.46 23.66 5.42 -0.998862996 -1.0746155
2 51 81.27 27.93 6.01 1.001133139 0.9032328
3 63 76.85 26.35 5.87 -0.002270143 0.1713826


This produces new variables (weight_z, bmi_z), which are the Z-score transformations of weight and bmi using R’s built-in scale() function. The .names argument builds each new name from a template in which {.col} stands for the original column name, so you can add any prefix or suffix around it.
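
One small thing to be aware of: scale() returns a one-column matrix rather than a plain vector, so the new columns are stored as matrices inside the dataframe. That is usually harmless, but it can surprise downstream code. A minimal sketch of one possible workaround is to wrap the call in as.numeric() (the object names z_matrix and z_vector are mine):

```r
library(dplyr)

df <- tibble(id = as.character(1:3),
             age = c(34, 51, 63),
             weight = c(72.46, 81.27, 76.85),
             bmi = c(23.66, 27.93, 26.35),
             cholesterol = c(5.42, 6.01, 5.87))

# scale() returns a one-column matrix, so the new columns are matrices
z_matrix <- df |>
  mutate(across(c(weight, bmi), scale, .names = "{.col}_z"))
is.matrix(z_matrix$weight_z)

# Wrapping the call in as.numeric() stores plain numeric vectors instead
z_vector <- df |>
  mutate(across(c(weight, bmi), \(x) as.numeric(scale(x)), .names = "{.col}_z"))
is.matrix(z_vector$weight_z)
```

The Z-scores themselves are the same either way; only the storage type of the new columns differs.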


9 mutate(across()) vs summarise(across())

As I mentioned at the outset, across() is most commonly used in conjunction with mutate(), but let’s look at an example where we may want to use it with summarise(). Let’s say we are interested in calculating the mean of each numeric column in the dataframe. We can do that as follows:

Code
df |> 
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))
age weight bmi cholesterol
49.33333 76.86 25.98 5.766667


Note a common point of confusion:

  • mutate(across()) returns the same number of rows as in the original dataframe
  • summarise(across()) reduces rows (to a single row if no grouping structure is specified)

across() is therefore used inside both functions in exactly the same way; what differs is the intent of the surrounding function.
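
To see the row behaviour for yourself, here is a small sketch with a grouping variable. The site column is hypothetical (I have simply added it to a cut-down copy of the example data), and group_by() is the standard dplyr way of specifying a grouping structure.

```r
library(dplyr)

# A cut-down copy of the example data plus a hypothetical grouping column
df_grp <- tibble(site = c("A", "A", "B"),
                 age = c(34, 51, 63),
                 bmi = c(23.66, 27.93, 26.35))

# mutate() keeps every row of the input
rounded <- df_grp |>
  mutate(across(where(is.numeric), \(x) round(x, 1)))
nrow(rounded)

# summarise() on grouped data returns one row per group
site_means <- df_grp |>
  group_by(site) |>
  summarise(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))
nrow(site_means)
```

Note that the across() call is identical in structure in both pipelines; only the surrounding function changes the shape of the result.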


10 Multiple Functions per Variable

What if we were interested in calculating not only the mean of each numeric column, but also the standard deviation and the number of observations in each column? Well, it’s relatively easy to extend the above example by applying several functions at once.

Code
df |> 
  summarise(across(where(is.numeric), list(mean = mean, 
                                           sd = sd,
                                           n = ~ sum(!is.na(.x))),
                   .names = "{.col}_{.fn}"))
age_mean age_sd age_n weight_mean weight_sd weight_n bmi_mean bmi_sd bmi_n cholesterol_mean cholesterol_sd cholesterol_n
49.33333 14.57166 3 76.86 4.405009 3 25.98 2.158912 3 5.766667 0.3082748 3


Here, {.fn} in the .names template stands for the name given to each function in the list (mean, sd and n), and it is appended to the original column name. This pattern is a stepping stone toward automated summary tables and reporting pipelines.
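
As one possible next step toward such a summary table, the wide one-row result can be reshaped so that each variable gets its own row. This is only a sketch and it assumes the tidyr package is installed (it is not used elsewhere in this post); it also relies on none of the original column names containing an underscore, because the names are split at "_".

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is available; used only for pivot_longer()

df <- tibble(id = as.character(1:3),
             age = c(34, 51, 63),
             weight = c(72.46, 81.27, 76.85),
             bmi = c(23.66, 27.93, 26.35),
             cholesterol = c(5.42, 6.01, 5.87))

# One wide row: age_mean, age_sd, age_n, weight_mean, ...
summary_wide <- df |>
  summarise(across(where(is.numeric),
                   list(mean = mean, sd = sd, n = ~ sum(!is.na(.x))),
                   .names = "{.col}_{.fn}"))

# Split e.g. "age_mean" into variable = "age" and a column called "mean"
summary_long <- summary_wide |>
  pivot_longer(everything(),
               names_to = c("variable", ".value"),
               names_sep = "_")

summary_long
```

The result has one row per variable and one column per statistic, which is much closer to what you would put in a report.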


11 Advanced Example: Using mutate(across()) with map()

Are you ready for something more advanced (but also extremely powerful)? So far we have been dealing with a single dataframe, but we can also leverage the power of across() to manipulate columns in several dataframes at once, using map() from the purrr package.

11.1 The Problem

Suppose you have several datasets with the same structure assembled within a list (a list is a convenient R object within which many other R objects can be stored - including dataframes). We can access a particular object within a list with the $ operator, much like we access the columns of a dataframe.

Code
datasets <- list(raw   = df, 
                 clean = df,
                 sens  = df)

datasets$raw
id age weight bmi cholesterol
1 34 72.46 23.66 5.42
2 51 81.27 27.93 6.01
3 63 76.85 26.35 5.87
Code
datasets$clean
id age weight bmi cholesterol
1 34 72.46 23.66 5.42
2 51 81.27 27.93 6.01
3 63 76.85 26.35 5.87
Code
datasets$sens
id age weight bmi cholesterol
1 34 72.46 23.66 5.42
2 51 81.27 27.93 6.01
3 63 76.85 26.35 5.87


Now, suppose you want to apply the same transformation to all of them.


11.2 The Naïve Way

df_raw   <- df_raw   |>  mutate(...)
df_clean <- df_clean |>  mutate(...)
df_sens  <- df_sens  |>  mutate(...)

In this approach we re-apply the same code to each dataframe, which quickly becomes difficult to maintain and easy to get wrong.


11.3 The Better Way: map() + mutate(across())

Code
library(purrr)  # provides map()

datasets <- datasets |> 
  map(~ .x |> 
    mutate(across(where(is.numeric), \(x) round(x, 1))))

datasets$raw
id age weight bmi cholesterol
1 34 72.5 23.7 5.4
2 51 81.3 27.9 6.0
3 63 76.8 26.4 5.9
Code
datasets$clean
id age weight bmi cholesterol
1 34 72.5 23.7 5.4
2 51 81.3 27.9 6.0
3 63 76.8 26.4 5.9
Code
datasets$sens
id age weight bmi cholesterol
1 34 72.5 23.7 5.4
2 51 81.3 27.9 6.0
3 63 76.8 26.4 5.9


Indeed, the more efficient way is to use across() within mutate() within map().

We can read this as:

“For each dataset, mutate across numeric variables by rounding to 1 decimal place.”

This approach using map() ensures:

  • identical logic across datasets
  • changes happen in one place
  • consistency is guaranteed
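
If the shared transformation grows beyond a line or two, one option is to move it into a small named function and hand that to map(). The sketch below is just one way of doing this: round_numeric() and its digits argument are hypothetical names of my own, not part of dplyr or purrr.

```r
library(dplyr)
library(purrr)

df <- tibble(id = as.character(1:3),
             age = c(34, 51, 63),
             weight = c(72.46, 81.27, 76.85),
             bmi = c(23.66, 27.93, 26.35),
             cholesterol = c(5.42, 6.01, 5.87))

datasets <- list(raw = df, clean = df, sens = df)

# Hypothetical helper: the shared transformation lives in one place
round_numeric <- function(data, digits = 1) {
  data |>
    mutate(across(where(is.numeric), \(x) round(x, digits)))
}

# map() calls round_numeric() on each dataframe and keeps the list names
rounded <- map(datasets, round_numeric)

# To change an argument, wrap the call in an anonymous function
rounded0 <- map(datasets, \(d) round_numeric(d, digits = 0))

names(rounded)
```

If the rounding rule ever changes, only round_numeric() needs editing, and every dataset in the list picks up the change.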

11.4 Another Example: Standardising Variables Across Datasets

Now, let’s take this further by extending the earlier example of creating new variables, not just within a single dataframe, but across multiple dataframes.

Code
datasets <- datasets |> 
  map(~ .x |> 
    mutate(across(c(age, bmi, weight), scale, .names = "{.col}_z")))

datasets$raw
id age weight bmi cholesterol age_z bmi_z weight_z
1 34 72.5 23.7 5.4 -1.0522707 -1.0806343 -0.99233882
2 51 81.3 27.9 6.0 0.1143773 0.8926979 1.00748903
3 63 76.8 26.4 5.9 0.9378935 0.1879364 -0.01515021
Code
datasets$clean
id age weight bmi cholesterol age_z bmi_z weight_z
1 34 72.5 23.7 5.4 -1.0522707 -1.0806343 -0.99233882
2 51 81.3 27.9 6.0 0.1143773 0.8926979 1.00748903
3 63 76.8 26.4 5.9 0.9378935 0.1879364 -0.01515021
Code
datasets$sens
id age weight bmi cholesterol age_z bmi_z weight_z
1 34 72.5 23.7 5.4 -1.0522707 -1.0806343 -0.99233882
2 51 81.3 27.9 6.0 0.1143773 0.8926979 1.00748903
3 63 76.8 26.4 5.9 0.9378935 0.1879364 -0.01515021

You can appreciate how much of a Swiss-army knife for data manipulation across() can become when used in conjunction with other R functions.
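
One detail is easy to miss here because our three example datasets are identical: scale() runs separately inside each call made by map(), so every dataset is standardised using its own mean and standard deviation. A minimal sketch with two invented datasets (datasets2 and its values are assumptions of mine) makes this visible.

```r
library(dplyr)
library(purrr)

# Two hypothetical datasets with very different age distributions
datasets2 <- list(young = tibble(age = c(20, 25, 30)),
                  old   = tibble(age = c(60, 70, 80)))

# as.numeric() keeps the new column a plain vector rather than a matrix
scaled <- datasets2 |>
  map(\(d) d |>
        mutate(across(age, \(x) as.numeric(scale(x)), .names = "{.col}_z")))

# Each set of Z-scores is centred on its own dataset's mean
scaled$young$age_z
scaled$old$age_z
```

If you instead need all datasets standardised against a common reference, you would have to compute that mean and standard deviation once and pass them in explicitly.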


12 Final Thoughts and a Mental Model to Take Away

Whenever you catch yourself thinking:

“I’m doing the same thing to several variables…”

You should immediately ask:

“Can this be an across()?”

That question alone will dramatically improve the quality of your R code.

I hope you’ve found this programming tip helpful and I will see you again for more Stats Tips next month.