1-13: Functions
0.1 Future changes
- 4 parts of programming
1 Purpose
Create reusable code (a function)
Use arguments in functions
Use return values in functions
2 Questions about the material…
The files for this lesson:
Main Script: download the main script here
Function Script: download the function script here
Function scripts should never include rm(list=ls()). The reason is that function scripts are called from other scripts, and rm(list=ls()) would clear the Environment of the calling script.
If you have any questions about the material in this lesson, feel free to email them to the instructor, Charlie Belinsky, at belinsky@msu.edu.
3 Reusable scripts
Functions are self-contained scripts that perform a common task. For example, there are statistical functions that take a vector of values and return a statistical value (e.g., mean() and median()), functions that take data and create plots (e.g., plot(), boxplot()), and functions that interact with the Console (e.g., cat() and print()). The functions listed above come with R and are designed to be used in any script.
Broadly, there are three types of functions in R:
1) Base-R: these are functions that come with R
2) Packages: extensions of R (e.g., ggplot2, tidyverse – covered later in class)
3) User-created: the main focus of the next few lessons
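As a rough illustration of the first and third types, the sketch below calls a base-R function and then a small user-created function. The function range_width() and the sample values are made up for this example. Packages are left out because they must be installed before they can be used.

```r
sampleValues = c(4, 8, 15, 16, 23, 42);   # made-up data for illustration

# 1) Base-R: median() comes with R
medianValue = median(sampleValues);       # 15.5

# 3) User-created: range_width() is a hypothetical function written for this example
range_width = function(vec)
{
  return(max(vec) - min(vec));
}
widthValue = range_width(sampleValues);   # 38

cat("median:", medianValue, " width:", widthValue, "\n");
```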
4 Parts of a function
All functions, whether base-R, in packages, or user-created, have the same four components: name, arguments, codeblock, and return value.
In the Help tab we can see all components except the codeblock for the statistical function, mean():
[Figure 1: the Help tab for mean()]
4.1 Name
Functions, like variables, are named objects that store information. The difference is that variables store values whereas functions store a script.
The name of the function in Figure 1 is mean(). When referring to a function, we often put parentheses after the name to indicate that this object is a function, not a variable.
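The difference between the name alone and the name with parentheses can be checked in the Console. This is a small sketch; someValue is a made-up variable used only for comparison.

```r
# mean without parentheses refers to the function object itself
is.function(mean);       # TRUE
class(mean);             # "function"

# mean with parentheses calls the function
mean(c(2, 4, 6));        # 4

# a variable stores a value, not a script
someValue = 10;          # hypothetical variable for comparison
is.function(someValue);  # FALSE
```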
4.2 Arguments (inputs to a function)
Arguments are variables whose values are supplied by the caller and used by the function.
In the Help tab (Figure 1) we see that the Default S3 Method (the most commonly used method) for mean() has three arguments (x, trim, na.rm). Note: the ... at the end indicates that mean() can accept additional arguments.
mean(x, trim = 0, na.rm = FALSE, ...)
In this case, x is the vector of values that the caller wants to find the mean for. Note: x is often used by R as a generic argument for input values.
The other two arguments, trim and na.rm, are tweaks to mean() that give mean() more functionality. trim can be used to remove extreme values and na.rm can be used to ignore NA values. trim and na.rm have default values (trim = 0, na.rm = FALSE), meaning the caller does not need to provide values for these arguments unless they want to change the default functionality of mean().
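The sketch below lists the arguments of mean.default() and then calls mean() three ways: leaving the defaults alone, overriding a default by name, and matching arguments by position. The vector scores is made up for this example.

```r
# list the argument names of the default method for mean()
argNames = names(formals(mean.default));
print(argNames);    # "x" "trim" "na.rm" "..."

scores = c(2, 4, 6, NA);    # made-up values, one of them NA

resDefault = mean(scores);                # na.rm left at its default (FALSE), so NA
resNamed   = mean(x=scores, na.rm=TRUE);  # default overridden by name, so 4
resOrder   = mean(scores, 0, TRUE);       # same as above, arguments matched by position
```

Naming the arguments, as in the second call, is usually easier to read than relying on their order.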
4.3 Codeblock
In order to interact with a function, the caller needs to know the function's (1) name, (2) arguments, and (3) return value. The caller generally does not need to know the rest of the code inside the codeblock, and this code is usually hidden from the caller.
The Help tab does not show the codeblock for mean(). However, you can see the codeblock for mean() using this code in the Console:
> getAnywhere("mean.default");
function (x, trim = 0, na.rm = FALSE, ...)
{
    if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
        warning("argument is not numeric or logical: returning NA")
        return(NA_real_)
    }
    if (isTRUE(na.rm))
        x <- x[!is.na(x)]
    if (!is.numeric(trim) || length(trim) != 1L)
        stop("'trim' must be numeric of length one")
    n <- length(x)
    if (trim > 0 && n) {
        if (is.complex(x))
            stop("trimmed means are not defined for complex data")
        if (anyNA(x))
            return(NA_real_)
        if (trim >= 0.5)
            return(stats::median(x, na.rm = FALSE))
        lo <- floor(n * trim) + 1
        hi <- n + 1 - lo
        x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
    }
    .Internal(mean(x))
}
For a full explanation of why you need to use mean.default() instead of mean(), go to Extension: getAnywhere() and mean().
The main takeaway is that, like if-else statements and for loops, all functions have an associated codeblock that is designated by curly brackets ( { } ). We will see more function codeblocks when we look at the function script for this lesson.
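The same idea applies to functions you write yourself. In the sketch below, add_tax() is a hypothetical function made up for this example. Printing the function shows its header and codeblock, and body() returns just the codeblock.

```r
# add_tax() is a hypothetical function made up for this example
add_tax = function(price, rate = 0.06)
{
  total = price * (1 + rate);
  return(total);
}

print(add_tax);              # shows the header and the codeblock

codeblock = body(add_tax);   # just the part between { and }
print(codeblock);
```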
4.4 Return value (output of a function)
When a caller calls a function, they usually expect to get some information back from the function. In the case of mean(), the caller is expecting the mean value of the vector provided as an argument. The Value section of Figure 1 gives information about what the function is returning to the caller. mean() returns a length-one object.
In other words, the return value from mean() is one value: the mean.
4.5 Using a function
We will call mean() multiple times with the same vector of values but changing the arguments. One value in the vector is NA, which we need to deal with because the mean of any vector with an NA value is NA. Note: this was covered in more detail in lesson 5 with median().
There are four calls below to the function mean(). Each call passes in one or more argument values and saves the return value from the function to the variables named ret1a, ret1b, ret1c, and ret1d.
testVector = c(10, 15, 5, NA, -100, 10); # has an NA value
ret1a = mean(x=testVector); # uses default for na.rm, which is FALSE
ret1b = mean(x=testVector, na.rm=FALSE); # same as above
ret1c = mean(x=testVector, na.rm=TRUE); # remove NAs
ret1d = mean(x=testVector, na.rm=TRUE, trim=0.1); # remove NAs and trim 10% of the values from each end
In the Environment we can see the return values stored in the variables.
ret1a: NA_real_
ret1b: NA_real_
ret1c: -12
ret1d: -12
ret1a and ret1b are NA_real_ because there was an NA value in the vector and we did not choose to remove it. Note: NA_real_ means the value is NA but should be a real number (as opposed to a string value).
ret1c is -12, the mean of the values when the NA is removed.
ret1d is also -12. trim=0.1 means trim the top and bottom 10% of the values, but 10% of the five remaining values rounds down to zero values (see floor(n * trim) in the codeblock above), so nothing is trimmed. Note: a larger value, such as trim=0.2, would drop the high (15) and low (-100) values before taking the mean.
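To see the effect of trim in isolation, here is a small sketch with ten made-up values, chosen so that trim=0.1 drops exactly one value from each end.

```r
tenValues = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100);   # made-up data with one extreme value

plainMean   = mean(tenValues);             # 14.5, pulled up by the 100
trimmedMean = mean(tenValues, trim=0.1);   # 5.5, the mean of 2 through 9
```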
5 Using a function script
So far we have used base-R functions. Now we will use functions from the function script for this lesson, 1-13_myFunctions.R (linked in Section 2).
5.1 Function header
In 1-13_myFunctions.R, there are two functions that calculate the mean, called mean_class() and mean_adv(). Both functions have an argument named vec, which holds the values that the caller wants to find the mean of. mean_adv() has a second argument called removeNA, which is set to a default value of FALSE.
The first line of a function is called the header and gives the name and arguments needed to use the function.
mean_class = function(vec)
mean_adv = function(vec, removeNA = FALSE)
5.2 Source the functions script
Since the functions are in a separate script, we must first read in the functions script to gain access to the functions. In the main lesson script, 1-13_Functions.R, we read in the functions script using source() with the path to the function script as the argument (in this case: scripts/1-13_myFunctions.R):
source("scripts/1-13_myFunctions.R");
The above code adds the two functions from 1-13_myFunctions.R to the Functions section of the Environment, giving the script the ability to use the functions.
Functions
mean_adv: function (vec, removeNA = FALSE)
mean_class: function (vec)
Note: source(), like the Source button, executes all code within a script.
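The following sketch imitates this workflow without needing the lesson files. It writes a tiny function script to a temporary file and then sources it. The temporary file and the function hello_class() are made up for this example and are not part of 1-13_myFunctions.R.

```r
# Write a tiny function script to a temporary file.
# The file and the function hello_class() are made up for this example.
tempScript = tempfile(fileext = ".R");
writeLines(c("hello_class = function(name)",
             "{",
             "  return(paste('Hello,', name));",
             "}"),
           con = tempScript);

functionKnownBefore = exists("hello_class");   # FALSE: not yet in the Environment

source(tempScript);    # executes the script, which creates hello_class()

greeting = hello_class(name="class");          # "Hello, class"
```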
5.3 Parts of the functions
Looking at the two functions in 1-13_myFunctions.R, we see the four components of a function:
name: mean_class() and mean_adv()
arguments:
mean_class() has one: vec
mean_adv() has two: vec and removeNA
codeblock: everything between ( { ) and ( } ) – this is the code that gets executed when the function is called
return value: both functions return the mean value using return()
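The four components can be labeled on any function. The sketch below uses max_class(), a hypothetical function that is not part of 1-13_myFunctions.R, with each component marked in a comment.

```r
# max_class() is a hypothetical function, not part of 1-13_myFunctions.R
max_class = function(vec)          # name: max_class; argument: vec
{                                  # codeblock starts
  largest = vec[1];
  for(i in 1:length(vec))
  {
    if(vec[i] > largest)
    {
      largest = vec[i];
    }
  }
  return(largest);                 # return value
}                                  # codeblock ends

biggest = max_class(vec=c(3, 9, 4));   # 9
```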
6 Using the function script
The function mean_class() solves for the mean by cycling through all the values in vec using a for loop, adding the values, and then dividing the total by the number of values in vec.
6.1 Header line of a function
The first line of a function (i.e., the header) gives the name of the function (mean_class) and the arguments (vec):
mean_class = function(vec)
vec is the vector that the caller wants to find the mean for. Since vec has no default value, vec must be supplied by the caller.
6.2 Function codeblock
Functions, just like if() and for(), have codeblocks attached to them that are executed when the function is called. The codeblock is encapsulated with curly brackets { }.
mean_class = function(vec)
{ # start function codeblock
  vecAdded = 0; ### state variable -- starts at 0

  ### Use the for loop to cycle through all the values in vec and add them to vecAdded
  for(i in 1:length(vec))
  {
    ### Adds the next value in vec
    vecAdded = vecAdded + vec[i];
  }

  ### Divide the total value by the number of values to get the mean
  meanVal = vecAdded / length(vec);

  ### return the mean value to the caller
  return(meanVal);
} # end function codeblock
A function codeblock operates like any other R code with two exceptions:
1) The arguments in the function call are variables used in the codeblock. So, the caller gives a value to vec and the codeblock uses this value.
2) Functions explicitly return a value to the caller using return(). In this case, meanVal is returned.
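One consequence of these two points is that an argument belongs to the function: changing it inside the codeblock does not change the caller's variable, and the only thing the caller gets back is the return value. A minimal sketch, using a made-up function double_class():

```r
# double_class() is a made-up function for this example
double_class = function(vec)
{
  vec = vec * 2;      # changes the function's copy of vec only
  return(vec);
}

myValues = c(1, 2, 3);
doubled  = double_class(vec=myValues);

print(myValues);   # still 1 2 3: the caller's variable is unchanged
print(doubled);    # 2 4 6
```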
6.3 Return value
To the caller, a function acts largely like a black box that takes input (arguments) and has an output (the return value). The return value is sent to the caller using the function return():
return(meanVal);
When the caller calls the function, they often set a variable equal to the function (e.g., retVal in the code below). The return value is stored in the variable set equal to the function:
retVal = mean_class(c(3,5,2,7)); # saves the return value (4.25) to the variable retVal
6.4 Calling the function
Remember that a function, like a variable, cannot be used unless it is in your Environment. This is what source("scripts/1-13_myFunctions.R") does.
We will call mean_class() multiple times and save the results to four different variables:
ret2a = mean_class(vec=c(6,2,8,3)); # 4.75
# ret2b = mean_class(); # error because argument not provided
ret2c = mean_class(vec=c(6,2, NA, 8,3)); # NA_real_ because of the NA
ret2d = mean_class(vec=c(6,2,8,3,75,200)); # 49
anotherVec = c(7, -7, 10, -8);
ret2e = mean_class(vec=anotherVec); # can use a predefined vector
Each call to the function mean_class() passes in a value for the argument vec and saves the return value to a variable (ret2a, ret2c, …). In the Environment we can see the return values stored in the variables.
ret2a: 4.75
ret2c: NA_real_
ret2d: 49
ret2e: 0.5
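One way to test a function like mean_class() is to compare it against the base-R mean(). The sketch below repeats the definition of mean_class() from this lesson so it can run on its own. It also shows what the commented-out call does: tryCatch() is used here (an addition for this example, not part of the lesson script) so the error is captured instead of stopping the script.

```r
# mean_class() as defined in this lesson, repeated so the example runs on its own
mean_class = function(vec)
{
  vecAdded = 0;
  for(i in 1:length(vec))
  {
    vecAdded = vecAdded + vec[i];
  }
  meanVal = vecAdded / length(vec);
  return(meanVal);
}

testValues = c(6, 2, 8, 3, 75, 200);
sameAnswer = all.equal(mean_class(vec=testValues), mean(testValues));   # TRUE

# calling without vec causes an error; tryCatch() captures it instead of halting the script
result = tryCatch(mean_class(),
                  error = function(e) { return("error caught"); });
```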
7 mean_adv
Just like mean_class(), the function mean_adv() solves for the mean by going through all the values, adding them up, and then dividing that answer by the number of values in vec. The difference is that mean_adv() checks for NA values and gives the caller the option to remove them.
7.1 Header line of a function
The header for mean_adv() looks similar to mean_class() except there is a second argument, removeNA, and this argument has a default value of FALSE. This means if the caller does not set removeNA, it will be FALSE (i.e., NA values will not be removed).
mean_adv = function(vec, removeNA = FALSE)
7.2 Codeblock
In the codeblock attached to mean_adv(), we need to conditionally handle the situation where NA values are to be removed, based on the value of the argument removeNA. We do this using an if-else-if structure inside the for loop.
mean_adv = function(vec, removeNA = FALSE)
{
  vecAdded = 0; ### state variable -- starts at 0
  numNA = 0; ### second state variable that counts the number of NA values

  ### Use the for loop to cycle through all the values in vec and add them to vecAdded
  for(i in 1:length(vec))
  {
    if(is.na(vec[i]) == FALSE) ### If the value is not NA
    {
      ### Adds the next value in vec
      vecAdded = vecAdded + vec[i];
    }else if (removeNA == TRUE) ## we have a NA value and want to remove it
    {
      ### Don't add the value, instead increase the number of NA by 1
      numNA = numNA + 1;
    }else if (removeNA == FALSE) ## we have a NA value and don't want to remove it
    {
      ### We cannot solve for a mean with an NA so the return value has to be NA
      return(NA); # this will end the function
    }
  }

  ### Divide the total value by the number of values that are not NA to get the mean
  meanVal = vecAdded / ( length(vec) - numNA);

  ### return the mean value to the caller
  return(meanVal);
}
There are three possible situations, hence three parts to the if-else-if structure:
if(is.na(vec[i]) == FALSE) # The current vec value is not NA
- The ith value is not NA (i.e., it’s a number), so add the number to the total
else if (removeNA == TRUE) # We have a NA value and want to remove it
- ignore the NA value and add 1 to the number of NA values
else if (removeNA == FALSE) # We have a NA value and don't want to remove it
- we know the return value has to be NA – so return NA
7.3 Return value
Just like break immediately ends a for loop, return() immediately ends a function.
There are two return() statements in this function:
1) Inside the for loop, when an NA value is hit and the caller did not ask for NA values to be removed. At this point we know the answer is NA, so there is no point in checking any more values in the vector. We use return() to end the function and pass NA back to the caller.
2) At the end of the codeblock, after the for loop has cycled through all the values in the vector.
Since return() ends a function, only one return() can be executed per function call.
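The same early-exit pattern can be seen in a smaller function. In this sketch, first_negative() is a hypothetical function written for this example: it returns as soon as it finds a negative value, and the second return() is reached only if the loop finishes.

```r
# first_negative() is a hypothetical function written for this example
first_negative = function(vec)
{
  for(i in 1:length(vec))
  {
    if(vec[i] < 0)
    {
      return(vec[i]);   # ends the function immediately; later values are never checked
    }
  }
  return(NA);           # only reached if the loop finishes without finding a negative
}

found    = first_negative(vec=c(4, -2, 7, -9));   # -2 (the -9 is never checked)
notFound = first_negative(vec=c(4, 2, 7));        # NA
```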
7.4 Using the function
ret3a = mean_adv(vec=c(6,2,8,3)); # 4.75
# ret3b = mean_adv(); # will cause error because argument not provided
ret3c = mean_adv(vec=c(6,2, NA, 8,3)); # will be NA (removeNA default is FALSE)
ret3d = mean_adv(vec=c(6,2, NA, 8,3), removeNA = TRUE); # 4.75
ret3e = mean_adv(vec=c(6,2,8,3, 75,200)); # 49
Each line of code above calls the function mean_adv(), passes in values for one or more arguments, and saves the return value from the function to a variable. In the Environment we can see the return values stored in the variables.
ret3a: 4.75
ret3c: NA_real_
ret3d: 4.75
ret3e: 49
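As a check, the results of mean_adv() can be compared with the base-R mean() and its na.rm argument. The sketch below repeats mean_adv() from this lesson (without the comments) so it can run on its own. The variable names withNA, keepNA, and dropNA are made up for this example.

```r
# mean_adv() as described in this lesson, repeated so the example runs on its own
mean_adv = function(vec, removeNA = FALSE)
{
  vecAdded = 0;
  numNA = 0;
  for(i in 1:length(vec))
  {
    if(is.na(vec[i]) == FALSE)
    {
      vecAdded = vecAdded + vec[i];
    }else if (removeNA == TRUE)
    {
      numNA = numNA + 1;
    }else if (removeNA == FALSE)
    {
      return(NA);
    }
  }
  meanVal = vecAdded / (length(vec) - numNA);
  return(meanVal);
}

withNA = c(6, 2, NA, 8, 3);

keepNA = mean_adv(vec=withNA);                  # NA
dropNA = mean_adv(vec=withNA, removeNA=TRUE);   # 4.75

# compare with base-R mean()
matchesKeep = is.na(keepNA) && is.na(mean(withNA));
matchesDrop = all.equal(dropNA, mean(withNA, na.rm=TRUE));
```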
8 Application
1) For this application you need to create two scripts:
a functions script named app1-13_functions.r that contains the functions created in this application
a main script named app1-13.r where you will answer questions in comments and test the functions created in app1-13_functions.r
source() your functions script from the main script
Make sure you test all the functions thoroughly in your main script. I want to see the test code in app1-13.r.
2) For the function mean_adv(), answer the following questions in comments in the main script:
Why do we count the number of NAs?
For the two else if statements, what assumption can be made about the ith value being checked? Why can we make this assumption?
Why does removeNA have a default value but vec does not?
Under what circumstances, if any, will this line not be executed?
meanVal = vecAdded / ( length(vec) - numNA);
3) Create a function that returns either the standard deviation or variance of a vector:
There should be two arguments: (1) the vector and (2) something that allows the caller to choose whether they want the standard deviation or the variance of the vector.
Do not use var() or sd() – use your own formula. You can use the code from application 1-03 as a starting point.
4) Create a function that converts a temperature from either Celsius to Fahrenheit or Fahrenheit to Celsius
The conversion from Celsius to Fahrenheit is: \(F=\frac{9}{5} C+32\)
The user needs to pass in two arguments: (1) the temperature value and (2) which direction they want to convert
5) Create a function that takes a single number from 0 to 100 and returns a grade from A to F.
- Return an error if the number is less than 0 and return a different error if the number is greater than 100.
6) Create a function that takes multiple numbers from 0 to 100 and returns the percentage of values that are above 60.
Do not count numbers less than 0 or greater than 100
Test the function with 10 random numbers from -20 to 120
sample(-20:120, size=10);
Save the script as app1-13.r in your scripts folder and email your Project Folder to Charlie Belinsky at belinsky@msu.edu.
Instructions for zipping the Project Folder are here.
If you have any questions regarding this application, feel free to email them to Charlie Belinsky at belinsky@msu.edu.
8.1 Questions to answer
Answer the following in comments inside your application script:
What was your level of comfort with the lesson/application?
What areas of the lesson/application confused you or still confuse you?
What are some things you would like to know more about that are related to, but not covered in, this lesson?
9 Extension: getAnywhere() and mean()
getAnywhere() is a base R function that allows you to see the codeblock associated with other base-R functions. But if you pass in mean as the argument you get this:
> getAnywhere("mean")
A single object matching ‘mean’ was found
It was found in the following places
  package:base
  namespace:base
with value

function (x, ...)
UseMethod("mean")
<bytecode: 0x000001c00fe2d450>
<environment: namespace:base>
This is because mean() is a function that calls other functions (sometimes called a generic function) depending on the argument type (e.g., numeric, string, date).
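A short sketch of this behavior: the same call to mean() produces a different kind of answer depending on what is passed in. The two input vectors are made up for this example.

```r
numberMean = mean(c(2, 4, 9));    # numeric input: handled by mean.default()
dateMean   = mean(as.Date(c("2024-01-01", "2024-01-03")));   # Date input: handled by mean.Date()

class(numberMean);   # "numeric"
class(dateMean);     # "Date"
print(dateMean);     # "2024-01-02"
```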
To find the possible functions that mean() calls, we use methods()
> methods("mean")
[1] mean.Date mean.default mean.difftime mean.POSIXct mean.POSIXlt
There are five possible calls. Four of the calls are related to times and dates. Only the second function, mean.default(), handles numbers (you can look this up in the Help tab). To see the codeblock for the function that handles numeric values, we need to pass mean.default as the argument to getAnywhere():
> getAnywhere("mean.default");
A single object matching ‘mean.default’ was found
It was found in the following places
  package:base
  registered S3 method for mean from namespace base
  namespace:base
with value

function (x, trim = 0, na.rm = FALSE, ...)
{
    if (!is.numeric(x) && !is.complex(x) && !is.logical(x)) {
        warning("argument is not numeric or logical: returning NA")
        return(NA_real_)
    }
    if (isTRUE(na.rm))
        x <- x[!is.na(x)]
    if (!is.numeric(trim) || length(trim) != 1L)
        stop("'trim' must be numeric of length one")
    n <- length(x)
    if (trim > 0 && n) {
        if (is.complex(x))
            stop("trimmed means are not defined for complex data")
        if (anyNA(x))
            return(NA_real_)
        if (trim >= 0.5)
            return(stats::median(x, na.rm = FALSE))
        lo <- floor(n * trim) + 1
        hi <- n + 1 - lo
        x <- sort.int(x, partial = unique(c(lo, hi)))[lo:hi]
    }
    .Internal(mean(x))
}
<bytecode: 0x000001c00fc0f500>
<environment: namespace:base>
10 Extension: Functions without arguments
Most of the time when you call a function, including the ones above, an argument is needed inside the parentheses.
This is not always true. For instance, c(), which creates a vector, is often used without an argument, which means the vector initially has no values (also called a NULL vector):
vec1 = c(); # a vector that initially has no values (an empty vector)
vec2 = c(5,2,9,1); # a vector with four values
vec1: NULL
vec2: num [1:4] 5 2 9 1
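A brief sketch of working with an empty vector, followed by two other base-R functions that are commonly called with nothing inside the parentheses. The variable names are made up for this example.

```r
emptyVec = c();          # no arguments: returns NULL
is.null(emptyVec);       # TRUE
length(emptyVec);        # 0

# values can be added to the empty vector later
emptyVec = c(emptyVec, 5);
emptyVec = c(emptyVec, 2);
print(emptyVec);         # 5 2

# other base-R functions that are commonly called with no arguments
currentTime = Sys.time();   # the current date and time
currentDate = Sys.Date();   # the current date
```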