R for reproducible scientific analysis
Creating functions
Learning Objectives
- Define a function that takes arguments.
- Return a value from a function.
- Test a function.
- Set default values for function arguments.
- Explain why we should divide programs into small, single-purpose functions.
If we only had one data set to analyze, it would probably be faster to load the file into a spreadsheet and use that to plot simple statistics. However, data may be updated periodically, and we may want to pull in that new information later and re-run our analysis. We may also obtain similar data from a different source in the future.
In this lesson, we’ll learn how to write a function so that we can repeat several operations with a single command.
Defining a function
Let’s open a new R script file in the functions/ directory and call it functions-lesson.R.
my_sum <- function(a, b) {
  the_sum <- a + b
  return(the_sum)
}
Let’s define a function fahr_to_kelvin that converts temperatures from Fahrenheit to Kelvin:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
We define fahr_to_kelvin by assigning it to the output of function. The list of argument names is contained within parentheses. Next, the body of the function (the statements that are executed when it runs) is contained within curly braces ({}). The statements in the body are indented by two spaces. This makes the code easier to read but does not affect how the code operates.
When we call the function, the values we pass to it are assigned to the arguments (here, temp) so that we can use them as variables inside the function. Inside the function, we call return to send a result back to whoever asked for it.
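As a small illustration of how values are matched to arguments, here is a sketch using my_sum from earlier in this lesson. The variable names by_position and by_name are only for illustration.

```r
# my_sum as defined earlier in this lesson
my_sum <- function(a, b) {
  the_sum <- a + b
  return(the_sum)
}

# Arguments matched by position: a = 2, b = 3
by_position <- my_sum(2, 3)

# Arguments matched by name, so the order no longer matters
by_name <- my_sum(b = 3, a = 2)

by_position
by_name
```

Both calls give the same result, because R matches named arguments first and fills in the remaining ones by position.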
Let’s try running our function. Calling our own function is no different from calling any other function:
# freezing point of water
fahr_to_kelvin(32)
[1] 273.15
# boiling point of water
fahr_to_kelvin(212)
[1] 373.15
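Rather than checking the printed values by eye, we can also write the checks down as code. The following is a minimal sketch of one way to test the function, using the base R functions stopifnot and all.equal; the variable name temps_k is only for illustration.

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

# all.equal() compares numbers with a small tolerance, which is safer than ==
# for floating-point results; isTRUE() turns its answer into TRUE or FALSE.
# stopifnot() raises an error if any check is not TRUE.
stopifnot(isTRUE(all.equal(fahr_to_kelvin(32), 273.15)))   # freezing point
stopifnot(isTRUE(all.equal(fahr_to_kelvin(212), 373.15)))  # boiling point

# Arithmetic in R is vectorised, so the function also accepts a vector
temps_k <- fahr_to_kelvin(c(32, 212))
temps_k
```

If all checks pass, stopifnot prints nothing; if one fails, it stops with an error naming the failed check.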
Challenge 1
Write a function called kelvin_to_celsius that takes a temperature in Kelvin and returns that temperature in Celsius.
Hint: To convert from Kelvin to Celsius, subtract 273.15.
Combining functions
The real power of functions comes from mixing, matching and combining them into ever-larger chunks to get the effect we want.
Let’s define two functions that will convert temperature from Fahrenheit to Kelvin, and Kelvin to Celsius:
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
Challenge 2
Define the function to convert directly from Fahrenheit to Celsius, by reusing the two functions above (or using your own functions if you prefer).
Applying functions to datasets
We’re going to define a function that calculates the average year of birth in our health dataset:
# Takes a dataset and calculates the average year of birth
# across all of its rows.
calcBirthYearAverage <- function(dat) {
  birthYearAverage <- mean(dat$birthYear)
  return(birthYearAverage)
}
We define calcBirthYearAverage by assigning it to the output of function. The list of argument names is contained within parentheses. Next, the body of the function (the statements executed when you call the function) is contained within curly braces ({}).
We’ve indented the statements in the body by two spaces. This makes the code easier to read but does not affect how it operates.
When we call the function, the values we pass to it are assigned to the arguments, which become variables inside the body of the function.
Inside the function, we use the return function to send back the result. Calling return is optional: R automatically returns the value of the last expression evaluated in the function.
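To make the point about the optional return concrete, here is a minimal sketch comparing the two styles. The function names mean_with_return and mean_without_return and the values in years are made up for this comparison and are not part of the lesson's dataset.

```r
# Hypothetical function names, used only for this comparison
mean_with_return <- function(x) {
  result <- mean(x)
  return(result)  # explicit return
}

mean_without_return <- function(x) {
  mean(x)  # the value of the last evaluated expression is returned
}

years <- c(1900, 1910, 1920)  # made-up values
mean_with_return(years)
mean_without_return(years)
```

Both functions return the same value. One caveat: if the last line of a function is an assignment, the value is still returned, but invisibly, so nothing is printed at the console.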
calcBirthYearAverage(healthData)
[1] 1933.588
That’s not very informative, since the dataset comprises data from two studies that were performed decades apart. Let’s add another argument so we can calculate the average year of birth for a particular study group.
# Takes a dataset and calculates the average year of birth for a
# specified study group.
calcBirthYearAverage <- function(dat, group = "Group 1") {
  birthYearAverage <- mean(dat[dat$HIGroup == group, ]$birthYear)
  return(birthYearAverage)
}
If you’ve been writing these functions down into a separate R script (a good idea!), you can load the functions into your R session by using the source function:
source("functions/functions-lesson.R")
The function now subsets the provided data by group before taking the average year of birth. A default value of "Group 1" is given for group, so that if no value is specified when you call the function, the result will be for Group 1. You need to be careful when setting default values; sometimes you can get some unexpected behaviour from functions if you don’t realise that an argument has a default value.
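One way to avoid being surprised by a default is to inspect the function's argument list before calling it. The sketch below uses a tiny made-up data frame called toy as a stand-in for the health dataset; it only assumes the two columns the function uses, and its values are invented.

```r
# Made-up stand-in for the health dataset; only the two columns used below
toy <- data.frame(
  HIGroup = c("Group 1", "Group 1", "Group 2", "Group 2"),
  birthYear = c(1900, 1920, 1950, 1960)
)

calcBirthYearAverage <- function(dat, group = "Group 1") {
  birthYearAverage <- mean(dat[dat$HIGroup == group, ]$birthYear)
  return(birthYearAverage)
}

# args() prints the argument list, including any default values
args(calcBirthYearAverage)

# No group given, so the default "Group 1" is used silently
default_result <- calcBirthYearAverage(toy)

# Naming the argument makes the choice explicit
named_result <- calcBirthYearAverage(toy, group = "Group 2")

default_result
named_result
```

Passing group by name costs a few extra characters but documents in the call itself which study group the result refers to.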
Let’s take a look at what happens when we specify the study group:
calcBirthYearAverage(healthData, "Group 1")
[1] 1910.041
calcBirthYearAverage(healthData, "Group 2")
[1] 1955.426
calcBirthYearAverage(healthData)
[1] 1910.041
What if we want to look at the average year of birth for specific year levels?
Challenge 3
Define a function to calculate the average year of birth for specific year levels of a single study group. Hint: Look up the function %in%, which will allow you to subset by multiple year levels.
Challenge 4
The paste function can be used to combine text together, e.g.:
best_practice <- c("Write", "programs", "for", "people", "not", "computers")
paste(best_practice, collapse=" ")
[1] "Write programs for people not computers"
Write a function called fence that takes two vectors as arguments, called text and wrapper, and prints out the text wrapped with the wrapper:
fence(text=best_practice, wrapper="***")
Note: the paste function has an argument called sep, which specifies the separator between text. The default is a space: " ". The default for paste0 is no space: "".
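The difference between sep and collapse is easy to mix up, so here is a short sketch of both. The variable names and the example words are only for illustration.

```r
words <- c("not", "computers")

# sep goes between separate arguments
sep_default <- paste("for", "people")            # "for people"
sep_custom <- paste("for", "people", sep = "-")  # "for-people"
no_sep <- paste0("for", "people")                # "forpeople"

# collapse joins the elements of a single vector into one string
collapsed <- paste(words, collapse = " ")        # "not computers"
```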
Challenge solutions
Solution to challenge 1
Write a function called kelvin_to_celsius that takes a temperature in Kelvin and returns that temperature in Celsius.
kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}
Solution to challenge 2
Define the function to convert directly from Fahrenheit to Celsius, by reusing the two functions above.
fahr_to_celsius <- function(temp) {
  temp_k <- fahr_to_kelvin(temp)
  result <- kelvin_to_celsius(temp_k)
  return(result)
}
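The same conversion can also be written by nesting one call inside the other. This is an equivalent sketch, not a replacement for the solution above; the name fahr_to_celsius_nested is made up so it does not overwrite fahr_to_celsius.

```r
fahr_to_kelvin <- function(temp) {
  kelvin <- ((temp - 32) * (5 / 9)) + 273.15
  return(kelvin)
}

kelvin_to_celsius <- function(temp) {
  celsius <- temp - 273.15
  return(celsius)
}

# Hypothetical name: the inner call runs first and its result
# is passed straight to the outer call
fahr_to_celsius_nested <- function(temp) {
  kelvin_to_celsius(fahr_to_kelvin(temp))
}

fahr_to_celsius_nested(32)   # freezing point of water
fahr_to_celsius_nested(212)  # boiling point of water
```

The step-by-step version is easier to debug because the intermediate value has a name; the nested version is shorter. Which one reads better is a matter of taste.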
Solution to challenge 3
Define a function to calculate the average year of birth for specific year levels of a single study group. Hint: Look up the function %in%, which will allow you to subset by multiple year levels.
calcBirthYearAverage <- function(dat, group, yearLevel) {
  birthYearAverage <- mean(dat[dat$HIGroup == group & dat$education %in% yearLevel, ]$birthYear)
  return(birthYearAverage)
}
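To see how this solution behaves, here is a sketch that calls it on a tiny made-up data frame. The data frame toy and its values are invented, and it is an assumption that education holds numeric year levels; the real dataset may code them differently.

```r
# Made-up data; assumes education is stored as numeric year levels
toy <- data.frame(
  HIGroup = c("Group 1", "Group 1", "Group 1", "Group 2"),
  education = c(8, 9, 12, 9),
  birthYear = c(1900, 1910, 1930, 1950)
)

calcBirthYearAverage <- function(dat, group, yearLevel) {
  birthYearAverage <- mean(dat[dat$HIGroup == group & dat$education %in% yearLevel, ]$birthYear)
  return(birthYearAverage)
}

# A single year level
one_level <- calcBirthYearAverage(toy, "Group 1", 9)

# Several year levels at once
two_levels <- calcBirthYearAverage(toy, "Group 1", c(8, 9))

one_level
two_levels
```

%in% is used instead of == because == would recycle a multi-element yearLevel and compare it element by element against the column, rather than asking whether each row's value appears anywhere in yearLevel.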
Solution to challenge 4
Write a function called fence that takes two vectors as arguments, called text and wrapper, and prints out the text wrapped with the wrapper:
fence <- function(text, wrapper) {
  text <- c(wrapper, text, wrapper)
  result <- paste(text, collapse = " ")
  return(result)
}
best_practice <- c("Write", "programs", "for", "people", "not", "computers")
fence(text=best_practice, wrapper="***")
[1] "*** Write programs for people not computers ***"