Yichen's Coding Notes

Some experience and complains ;D

Blog / Coding
2023-02-03

Customized Strategy to Deprecate Multiple Arguments in An R Package

Recently, I switched my job to a new lab where they have an R package that needs maintainance. I won't talk about the detail of my to-do list, but here is a point that worth taking a note.

The scenario is, there are arguments, a lot of, that need to be renamed. One reason for renaming them is that we are trying to keep a nice and consistent coding style. Another reason, more importantly, is that some arguments are super confusing. There are arguments meaning the same thing across different functions (e.g., "the maximum number of nearest neighbors to search", seen in UMAP, graph based community detection and etc.), but named differently (e.g. k, nNeighbor, knn_k and etc.). Nevertheless, there are arguments meaning different things but named in the same way (e.g. k for number of dimensions to use in function 1, k for number of nearest neighbors to search in function 2, and etc.). All of these makes our package confusing for users.

Changing argument names is not really a big deal. The pain point is that users might find their scripts fail with unused-argument-error if we just simply rename the problematic arguments. We need to utilize some lifecycle control tools to notify users that some arguments are being renamed, and finally make it comes true until a period of time has passed, when we are pretty confident that most of our users have seen this message.

In R community, there is a pacakge called lifecycle which is build for elegantly showing those messages. By calling the following command, it shows us the message exactly as what I want. (<NA> will be replaced by my package name when this is called in my package function).

> library(lifecycle)
> deprecate_warn(when = "1.2.0", what = "foo(old_bar)", with = "foo(newBar)")
Warning message:
The `old_bar` argument of `foo()` is deprecated as of <NA> 1.2.0.
ℹ Please use the `newBar` argument instead.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated. 

However, this function needs explicit input for each of the three arguments shown in the code chunk above. I don't need to call this if old_bar is missing from users' call, which means that I would definitely use some sort of conditional statement before actually calling it. Given the truth that I have a lot of arguments to deprecate and this situation happens in many functions, I would absolutely come up with a utility function that does everything at a time.

The design of this utility comes with three parts:

  1. It has to be able to tell what users have specified in their call. Only in this way I can write down the conditions where the notification needs to be shown.
  2. It has to work with a loop for each of the deprecated arguments within a single function.
  3. It has to be able to manage the final decision of the value for this parameter. I mean, the exact "bar" value from either old_bar or newBar.

After some research and experiments, here are pieces of my implementation.

Recognizing Users' Call

rlang::call_args(match.call())

This is the simplest command I can find, that returns a list of user input which does not include anything preset as default. For example:

> foo <- function(a, b, c = 2, ...) rlang::call_args(match.call())
> foo(1)
$a
[1] 1

In this way, I can just define my exported function in a way that both old and new argument names are presented. (This will continue to be used as an example for later steps.)

foo <- function(input1, newBar = 20, old_bar = newBar) {
    .utilityFunction(mapping = list(old_bar = "newBar"),
                     userCall = rlang::call_args(match.call()))
    .doActualCalculation(input1, newBar)
}

Then I would call the command above at the very beginning so it captures users' intention and passes this to my utility function. If old_bar is in the call, then the notification has to be shown.

Iterate through All Deprecated Arguments

Simplest. Just pass a key-value pair list, which maps old and new names (See example above), to the utility function, and do a for loop through each pair.

Manage The Correct Parameter Value

What I really want as a result is that, we only use newBar in the following of the exported function body. And there should never be a mix of using both old and new variables. Let' s list all possible conditions first, which can be formed by interpreting the output from above:

  1. Users don't call the function with any specification of "bar". (e.g. foo(input1))
  2. Users call with only specifying old_bar as some value.
  3. Only specify newBar.
  4. Surprisingly (but still need to take into account), both old_bar and newBar are specified.

For condition 1, I think we should just keep the default and what is being passed to the actual calculation is newBar = 20, which means nothing need to be done.

For condition 2, it looks like users want to set "bar" parameter to something, but they don't know we want to rename the argument, which should happen the most commonly. Here, we need to pass the value for old_bar to newBar, and show the notification.

For condition 3, probably we have got some new users who read documentation before using and know that we want them to use new arguments. So, nothing to be done.

Condition 4 is quite tricky, in terms of understanding what the users intend to do. Anyway, since they are using old_bar, we will have to show the notification. Given that the notification mentions that users should use new_bar instead, I assume that they can eventually understand that these are the same thing. And I'll just use what is being specified to newBar.

Therefore, what happens in my utility function in a nut shell is:

.utilityFunction <- function(mapping, userCall) {
    for (oldName in names(mapping)) {
        newName <- mapping[[oldName]]
        if (oldName %in% names(userCall)) {
            # Condition 2 & 4
            # Fill in properly formed text for the three arguments
            lifecycle::deprecate_warn()
            if (!newName %in% userCall) {
                # Condition 2
                setNewNameValue()
            } # else condition 4
        } # else condition 1 & 3
    }
}

Okay, now the last piece of the puzzle is how I should pass the value of old_bar in condition 2 to newBar.

Luckily, I found that we have access to the variables in a parent frame. I'll take our existing example to explain this a little bit. Briefly but not accurately, the parent frame of .utilityFunction, in our instance, is foo, who called .utilityFunction directly in its function body. In our instance, we have input1, newBar and old_bar as variables in this parent frame. And luckily, again, we can get the value of these variables, as well as modify them, that's right, from the body of .utilityFunction.

Of course, it is not hard to know who -- the name of the parent frame function -- is calling .utilityFunction. See the final completed implementation below.

.utilityFunction <- function(mapping, userCall) {
    parentFuncName <- as.list(sys.call(-1))[[1]]
    pf <- parent.frame()
    for (oldName in names(mapping)) {
        newName <- mapping[[oldName]]
        if (oldName %in% names(userCall)) {
            what <- paste0(parentFuncName, "(", old, ")")
            with <- paste0(parentFuncName, "(", new, ")")
            lifecycle::deprecate_warn("NewVersionNum.0.0", what, with)
            if (!newName %in% userCall) {
                pf[[newName]] <- pf[[oldName]]
            }
        }
    }
}

Feb, 3rd, 2023


Last Next

Comments