Intro to Functions
1 Functions in R
A “function” is a procedure that takes any number of inputs and produces a single output. You’re probably familiar with functions in Excel, like sum(...), average(...), etc. In R, functions are used in a similar way.
A function is called by its name, followed by parentheses. Inside the parentheses, you put the “arguments” (inputs) to the function.
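As a quick illustration of calling a function by name with arguments in parentheses, here are a few base R calls. The numbers are made up for this sketch. Note that R's counterpart to Excel's average(...) is mean().

```r
# Call a function by writing its name, then its arguments in parentheses
sum(1, 2, 3)        # 6

# mean() plays the role of Excel's average(...)
mean(c(2, 4, 6))    # 4

# A function with a single argument
sqrt(16)            # 4
```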
2 Functions on data frames
You have already come across some R functions. Let’s give you a quick reminder — here are some functions that are handy for quick looks at a data frame, often used in the console. For more on similar functions, see this page.
These functions provide slightly different summary information:
- summary(df): gives a summary of each column in the data frame
- glimpse(df): a more succinct version of summary
- str(df): gives the structure of the data frame
More specific functions for data frames include:
- nrow(df): gives the number of rows in the data frame
- ncol(df): gives the number of columns in the data frame
- dim(df): gives the dimensions of the data frame (rows, columns)
- head(df): shows the first few rows of the data frame
- tail(df): shows the last few rows of the data frame
- names(df): gives the names of the columns in the data frame
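To see these functions in action, here is a small sketch using a made-up data frame called df (the column names and values are only for illustration). It uses base R only; glimpse() is left as a comment because it needs the dplyr package to be loaded.

```r
# A tiny, made-up data frame to try the functions on
df <- data.frame(
  name = c("Marcia", "Jan", "Cindy"),
  age  = c(13, 7, 11),
  gpa  = c(2.56, 3.16, 2.76)
)

nrow(df)      # 3 rows
ncol(df)      # 3 columns
dim(df)       # 3 3  (rows, columns)
names(df)     # "name" "age" "gpa"

head(df, 2)   # first 2 rows (the default is 6)
tail(df, 1)   # last row

str(df)       # structure: column types and first values
summary(df)   # per-column summary
# dplyr::glimpse(df)  # needs the dplyr package
```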
3 Named arguments
Consider some of the data in the UnivGPA column of the admitdata data table:
Let’s now apply the round() function to it:
This function rounds a number (or vector of numbers, in this case) to the nearest integer.
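Since admitdata isn't reproduced here, the sketch below uses a short made-up vector named gpa as a stand-in for UnivGPA values. It shows round() with only one argument.

```r
# Made-up GPA-like values standing in for admitdata$UnivGPA
gpa <- c(2.468, 2.731, 3.185)

# With no other arguments, round() rounds to the nearest integer
round(gpa)   # 2 3 3
```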
What if we wanted to round to the first digit after the decimal point? Or to the nearest hundred? Wouldn’t it be strange to have to create separate functions roundtotenth() and roundtohundreds()? The creators of R thought so, too.
They addressed this by letting us add more information to the function call when it’s needed, like so:
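A sketch of what such calls can look like, again with made-up numbers: the extra information is the digits argument of round(). Positive values count places after the decimal point, and negative values round to tens, hundreds, and so on.

```r
# Made-up GPA-like values
gpa <- c(2.468, 2.731, 3.185)

# One place after the decimal point
round(gpa, digits = 1)       # 2.5 2.7 3.2

# Negative digits round to the left of the decimal point:
# digits = -2 rounds to the nearest hundred
round(12899, digits = -2)    # 12900
```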
The inputs that go between the parentheses for a function are called “arguments” for historical reasons. In R, every argument has a name, as in the example with the round() function above. The second argument, digits, specifies how many decimal places we want in the result. You can see the names of arguments using the help information about any function by doing one of the following:
- Typing ?function_name,
- Putting the cursor on the function and hitting F1,
- Using the Help tab in the lower right panel to search, or
- Just searching the internet (usually in the form of something like “r function THEFUNCTION”)
The help information for round() shows that the names of the arguments are x and digits. The x argument is the vector of numbers to round, and digits is the number of decimal places to round to.
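If you just want a quick look at the argument names without opening the help page, base R's args() function prints a function's argument list in the console. This is an extra convenience, not something the help options above depend on, and the exact printout may vary a little between R versions.

```r
# Open the help page (best run interactively in the console):
# ?round

# Print the argument list of round() without leaving the console
args(round)

# The argument names as a character vector
arg_names <- names(formals(args(round)))
arg_names
```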
We can omit the names of the arguments if they appear in the right order. It would be tedious to type this all the time:
Instead we can leave out the names, as long as the arguments are in the correct order:
The following will not do what we expect because the arguments are in the wrong order:
If for some reason you want to give the arguments in a non-standard order (we’re not sure why!), you can do so as long as you name them:
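The four cases just described can be sketched together, using a made-up number rather than the GPA data:

```r
# 1. Fully named: clear, but tedious to type every time
round(x = 3.14159, digits = 2)   # 3.14

# 2. Names omitted, arguments in the documented order (x, then digits)
round(3.14159, 2)                # 3.14

# 3. Names omitted, wrong order: R rounds the number 2,
#    treating 3.14159 as the digits argument
round(2, 3.14159)                # 2

# 4. Any order works when the arguments are named
round(digits = 2, x = 3.14159)   # 3.14
```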
4 Function composition
We can use functions within functions. This is called function composition.
4.1 The basic problem setup
Suppose you want to give a researcher student data, but want some protections on the privacy of individuals.
Let’s make up some data for this situation:
studentdata <-
  tibble(id = sample(40000:80000, 5, replace = FALSE),
         name = c("Marcia", "Jan", "Cindy", "Alice", "Greg"),
         age = sample(7:15, 5, replace = FALSE),
         county = sample(10000:30000, 5, replace = FALSE),
         gpa = runif(5, min = 2.5, max = 3.8))
studentdata
# A tibble: 5 × 5
     id name     age county   gpa
  <int> <chr>  <int>  <int> <dbl>
1 74700 Marcia    13  12899  2.56
2 59270 Jan        7  27166  3.16
3 74001 Cindy     11  29660  2.76
4 55254 Alice     14  20077  3.59
5 75183 Greg      10  20892  3.33
We could slightly perturb GPAs by adding a small random number to each one. Here’s a way to do that.
[1] -0.021255755 -0.067369856 0.029681873 0.009153463 -0.072927387
This generated 5 random numbers between -0.1 and 0.1.
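A call along these lines would produce that kind of output. The name noise matches how the vector is referred to later in this section. Because the numbers are random, you will get different values each time you run it.

```r
# Draw 5 random numbers uniformly between -0.1 and 0.1
noise <- runif(5, min = -0.1, max = 0.1)
noise
```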
We are now ready to create the data table, studentdatamask, that we will distribute to the researcher. We will first add the noise vector to gpa, and then we will remove the identifying id and name columns from the data table:
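One way to carry out these two steps is sketched below, assuming the dplyr package is available. To keep the sketch self-contained, studentdata is rebuilt from the values printed earlier instead of being sampled again, and noise is regenerated, so the masked GPAs will differ from run to run.

```r
library(dplyr)  # provides tibble(), mutate(), select()

# Rebuild the example table from the values printed earlier
studentdata <- tibble(
  id     = c(74700L, 59270L, 74001L, 55254L, 75183L),
  name   = c("Marcia", "Jan", "Cindy", "Alice", "Greg"),
  age    = c(13L, 7L, 11L, 14L, 10L),
  county = c(12899L, 27166L, 29660L, 20077L, 20892L),
  gpa    = c(2.56, 3.16, 2.76, 3.59, 3.33)
)

# Small random perturbations, one per row
noise <- runif(5, min = -0.1, max = 0.1)

studentdatamask <- studentdata |>
  mutate(gpa = gpa + noise) |>   # step 1: add the noise to gpa
  select(-id, -name)             # step 2: drop the identifying columns

studentdatamask
```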
4.2 The need for function composition
In the previous example, we hard-coded the 5 in 4 of the functions when creating the studentdata tibble as well as when creating the noise vector. These would only work, of course, if the table has 5 rows. (Duh.) What if the table has 12 rows the next time we run it? It would be better to specify the correct number of rows automatically, like this:
The first argument to the random number generator runif is n, which is the number of random numbers to generate. Here we use the nrow() function to get the number of rows in the data frame. This way, the code will work no matter how many rows are in the data frame.
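A minimal sketch of this idea follows. It uses a cut-down, base R version of studentdata (only two of the columns, with the values printed earlier) so that it runs on its own.

```r
# A cut-down stand-in for studentdata
studentdata <- data.frame(
  name = c("Marcia", "Jan", "Cindy", "Alice", "Greg"),
  gpa  = c(2.56, 3.16, 2.76, 3.59, 3.33)
)

# nrow(studentdata) is evaluated first; its result becomes
# the n argument of runif()
noise <- runif(nrow(studentdata), min = -0.1, max = 0.1)

length(noise)   # 5 here, and it follows the table if rows are added
```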
Most commonly we will use functions to transform data within a mutate() or summarize() function, as with the following:
studentdata |>
  summarize(N = n(),
            mean_gpa = round(mean(gpa), 2),
            sd_gpa = round(sd(gpa), 2),
            SE_gpa = round(sd_gpa / sqrt(N), 2),
            min_gpa = min(gpa),
            max_gpa = max(gpa))
# A tibble: 1 × 6
      N mean_gpa sd_gpa SE_gpa min_gpa max_gpa
  <int>    <dbl>  <dbl>  <dbl>   <dbl>   <dbl>
1     5     3.08   0.42   0.19    2.56    3.59
Look closely at the code to see where functions are within other functions. One annoyance is making sure the parentheses are balanced. The RStudio IDE will help with this by highlighting the matching parenthesis when you put the cursor on one of them. (You can also turn on “rainbow” mode, which colors them differently depending on their nesting level. To do that, go to Tools -> Global Options -> Code -> Display -> Show rainbow parentheses.)
5 Creating your own functions
You can create your own functions in R. This is a powerful way to make your code more readable and reusable. Here’s an example of a simple function that takes a number and returns the square of that number.
When you execute the code block, it stores the function definition for later use. After that you can use it like any other function:
You can see that this function call returned the squares of 1, 2, …, 5.
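The definition and call described above might look like the following sketch. The function name sqrd comes from this page; the argument name x is just a choice made for illustration.

```r
# Define the function: it takes x and returns x squared
sqrd <- function(x) {
  x^2
}

# Use it like any other function; it works on whole vectors
sqrd(1:5)   # 1 4 9 16 25
```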
Here is another new function, cubd, that takes a number and returns the cube of that number (that is, the number raised to the third power). Powers are computed with the ^ operator, as in the sqrd() function we created.
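A sketch of how cubd could be defined, again with x as an illustrative argument name:

```r
# Define the function: it takes x and returns x cubed
cubd <- function(x) {
  x^3
}

cubd(1:5)   # 1 8 27 64 125
```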
Now check out this math identity that’s kind of amazing. If we sum up a sequence of numbers 1, 2, …, N and then square the result, it’s the same as summing the cubes of the numbers 1, 2, …, N. These should give you the same result:
N = 7 # change this to whatever you like
squareofsum = sqrd(sum(1:N)) # sum up then square result
sumofcubes = sum(cubd(1:N)) # cube first then sum
paste(squareofsum, sumofcubes, sep = " = ")
[1] "784 = 784"
And so should these:
N = 25 # change this to whatever you like
squareofsum = sqrd(sum(1:N)) # sum up then square result
sumofcubes = sum(cubd(1:N)) # cube first then sum
paste(squareofsum, sumofcubes, sep = " = ")
[1] "105625 = 105625"
Note that we could write these using pipes, and it might be easier to follow:
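Here is one possible piped version, assuming R 4.1 or later for the native |> pipe. The two helper functions are defined again so the sketch runs on its own. With pipes, the steps read left to right in the order they happen.

```r
# Helper functions, as defined earlier on this page
sqrd <- function(x) x^2
cubd <- function(x) x^3

N <- 7  # change this to whatever you like

# Sum up, then square the result
squareofsum <- 1:N |> sum() |> sqrd()

# Cube first, then sum
sumofcubes <- 1:N |> cubd() |> sum()

paste(squareofsum, sumofcubes, sep = " = ")   # "784 = 784"
```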