Creating functions and using lapply in R

A function packages a series of calculations under one name so that they can be reused instead of being written out each time.

For instance, let us suppose that there exists an array of numbers which we wish to add to another variable. Instead of carrying out separate calculations for each number in the array, it would be much easier to simply create a function that does this for us automatically.

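Using a function in R in this way generally involves two steps:

a. Defining the function, i.e. its arguments and the calculation to perform on them. e.g. to add two numbers together, our function is:

number1addnumber2 <- function(number1,number2) {(number1+number2)}

b. Using sapply to apply the function to each number in a vector, typically together with an associated sequence. Here, sapply calls the function once for each element of c(20,40,60), passing that element as number1 and the sequence as number2:

sapply(c(20,40,60), number1addnumber2, seq(2,20,by=2))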
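
To make step b concrete, the sketch below (an illustration only, with the names add, via_sapply and by_hand chosen just for this example) writes out by hand the three calls that sapply makes and confirms that the results agree.

```r
# Hypothetical helper for this sketch: adds two numbers (or vectors)
add <- function(number1, number2) {
  number1 + number2
}

s <- seq(2, 20, by = 2)

# sapply calls add(20, s), add(40, s) and add(60, s) in turn;
# anything after the function name is passed on as an extra argument
via_sapply <- sapply(c(20, 40, 60), add, s)

# The same three calls written out by hand and bound together as columns
by_hand <- cbind(add(20, s), add(40, s), add(60, s))

dim(via_sapply)                 # 10 rows (the sequence), 3 columns (the inputs)
all.equal(via_sapply, by_hand)  # TRUE
```

Each call returns ten numbers, so sapply simplifies the three results into a matrix with ten rows and three columns.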

To see how this works in practice, let us take a look at the examples below:

1. Add an array of numbers to a sequence

> #Function1
> number1addnumber2<-function(number1,number2) {(number1+number2)}
> result1<-sapply(c(20,40,60),number1addnumber2,seq(2,20,by=2))
> result1df<-data.frame(result1)
> result1df
   X1 X2 X3
1  22 42 62
2  24 44 64
3  26 46 66
4  28 48 68
5  30 50 70
6  32 52 72
7  34 54 74
8  36 56 76
9  38 58 78
10 40 60 80
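
Because + is already vectorised in R, the same table can also be produced without writing a function at all. The sketch below uses base R's outer as an optional alternative (it is not the approach used in this post) and checks it against the sapply result; result1_outer is a name chosen for this illustration.

```r
number1addnumber2 <- function(number1, number2) {
  number1 + number2
}
result1 <- sapply(c(20, 40, 60), number1addnumber2, seq(2, 20, by = 2))

# outer() applies "+" to every pairing of the two vectors:
# rows follow the sequence, columns follow c(20, 40, 60)
result1_outer <- outer(seq(2, 20, by = 2), c(20, 40, 60), "+")

all.equal(result1, result1_outer)  # TRUE
```

The sapply form is still the more general one, since it works with any function you define and not only with built-in operators.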

2. Subtract an array of numbers from a sequence

> #Function2
> number1minusnumber2<-function(number1,number2) {(number1-number2)}
> result2<-sapply(c(20,40,60),number1minusnumber2,seq(2,20,by=2))
> result2df<-data.frame(result2)
> result2df
   X1 X2 X3
1  18 38 58
2  16 36 56
3  14 34 54
4  12 32 52
5  10 30 50
6   8 28 48
7   6 26 46
8   4 24 44
9   2 22 42
10  0 20 40
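
sapply decides the shape of its result at run time, which can be surprising if the function ever returns something of a different length. A slightly stricter variant, sketched here as an optional alternative rather than a replacement for the code above, is vapply, where you state up front what each call should return. The name result2_strict is chosen for this illustration.

```r
number1minusnumber2 <- function(number1, number2) {
  number1 - number2
}
s <- seq(2, 20, by = 2)

# FUN.VALUE = numeric(length(s)) tells vapply to expect a numeric vector
# of length 10 from every call, and to stop with an error otherwise
result2_strict <- vapply(c(20, 40, 60), number1minusnumber2,
                         FUN.VALUE = numeric(length(s)), number2 = s)

result2_strict[10, ]  # last row: 0 20 40
```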

3. Multiply an array of numbers by a sequence

> #Function3
> multiplynumber1bynumber2 <-function(number1,number2) {(number1*number2)}
> result3<-sapply(c(20,40,60), multiplynumber1bynumber2, seq(2,20, by=2))
> result3df<-data.frame(result3)
> result3df
    X1  X2   X3
1   40  80  120
2   80 160  240
3  120 240  360
4  160 320  480
5  200 400  600
6  240 480  720
7  280 560  840
8  320 640  960
9  360 720 1080
10 400 800 1200
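
data.frame labels the columns X1, X2 and X3 by default, which hides which input each column belongs to. One optional way to make the table self-describing, sketched below with label formats chosen just for this illustration, is to name the rows and columns of the matrix before converting it.

```r
multiplynumber1bynumber2 <- function(number1, number2) {
  number1 * number2
}
inputs <- c(20, 40, 60)
s <- seq(2, 20, by = 2)

result3 <- sapply(inputs, multiplynumber1bynumber2, s)

# Label columns by input value and rows by sequence value
# (the "n" and "by" prefixes are arbitrary choices for this sketch)
colnames(result3) <- paste0("n", inputs)
rownames(result3) <- paste0("by", s)

result3df <- data.frame(result3)
names(result3df)         # "n20" "n40" "n60"
result3df["by4", "n40"]  # 160
```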

4. Divide an array of numbers by a sequence

> #Function4
> dividenumber1bynumber2<-function(number1,number2) {(number1/number2)}
> result4<-sapply(c(20,40,60), dividenumber1bynumber2, seq(2, 20, by=2))
> result4df<-data.frame(result4)
> result4df
          X1        X2        X3
1  10.000000 20.000000 30.000000
2   5.000000 10.000000 15.000000
3   3.333333  6.666667 10.000000
4   2.500000  5.000000  7.500000
5   2.000000  4.000000  6.000000
6   1.666667  3.333333  5.000000
7   1.428571  2.857143  4.285714
8   1.250000  2.500000  3.750000
9   1.111111  2.222222  3.333333
10  1.000000  2.000000  3.000000

5. Raise an array of numbers to a sequence of powers

> #Function5
> raisenumber1bypower<-function(number1,power) {(number1^power)}
> result5<-sapply(c(20,40,60),raisenumber1bypower,seq(0.5,2.5,by=0.5))
> result5df<-data.frame(result5)
> result5df
           X1           X2           X3
1    4.472136     6.324555     7.745967
2   20.000000    40.000000    60.000000
3   89.442719   252.982213   464.758002
4  400.000000  1600.000000  3600.000000
5 1788.854382 10119.288513 27885.480093

6. Create a probability function

This function calculates mu = 1 - (1 - lambda)^powers for each value of lambda, across a sequence of powers.

You can see more details on how the below function works at my other post.

> #Functionprob
> muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}
> probability_at_lambda <- sapply(c(0.02, 0.04, 0.06), muCalculation, seq(0, 100, 10))
> probability_at_lambdadf<-data.frame(probability_at_lambda)
> probability_at_lambdadf
          X1        X2        X3
1  0.0000000 0.0000000 0.0000000
2  0.1829272 0.3351674 0.4613849
3  0.3323920 0.5579976 0.7098938
4  0.4545157 0.7061424 0.8437444
5  0.5542996 0.8046338 0.9158384
6  0.6358303 0.8701142 0.9546693
7  0.7024469 0.9136477 0.9755842
8  0.7568774 0.9425902 0.9868493
9  0.8013511 0.9618321 0.9929168
10 0.8376894 0.9746247 0.9961849
11 0.8673804 0.9831297 0.9979451
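
One way to read this formula, offered here as an interpretation rather than something this post states, is as the probability of at least one success in a given number of independent trials that each succeed with probability lambda. Under that assumption it should agree with one minus the binomial probability of zero successes, which the sketch below checks using base R's dbinom. The name mu_binom is chosen for this illustration.

```r
muCalculation <- function(lambda, powers) {
  1 - ((1 - lambda)^powers)
}

lambda <- 0.02
powers <- seq(0, 100, 10)

mu <- muCalculation(lambda, powers)

# P(at least one success in n trials) = 1 - P(zero successes in n trials)
mu_binom <- 1 - dbinom(0, size = powers, prob = lambda)

all.equal(mu, mu_binom)  # TRUE
round(mu[2], 7)          # 0.1829272, matching row 2 of column X1 above
```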

Run Functions Across Lists Using lapply

Functions are very useful when it comes to running a command more than once on particular groups of data. While we could use a for loop for this purpose, combining a pre-defined function with lapply is a concise and useful alternative in R. Let’s see how it works.

Suppose that we have two groups of time series (TS1 and TS2). The objective is to split these two time series and then run an ARIMA forecast on both of them. Instead of running the ARIMA forecast twice, we wish to use a function to run it repeatedly in a similar manner to a loop.

First, we define our data frame and then split it by group:

group<-c("TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS1","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2","TS2")
value<-c(23,19,22,31,33,25,32,29,32,34,41,45,47,49,42,44,43,50,55,57,410,395,402,403,401,390,420,415,417,410,425,427,423,430,432,428,410,405,410,414)
df1<-data.frame(group,value)
newlist <- split(df1, df1$group)
newlist
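
split returns a named list with one data frame per group. The short sketch below uses a cut-down, made-up data frame (not the series above, and with names chosen for this illustration) to show the shape of that list.

```r
# Small made-up example: two groups with three values each
df_small <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  value = c(1, 2, 3, 10, 20, 30)
)

pieces <- split(df_small, df_small$group)

names(pieces)         # "A" "B"
sapply(pieces, nrow)  # 3 rows in each piece
pieces$B$value        # 10 20 30
```

Each element keeps all the columns of the original data frame, which is why the forecast function can refer to x$value.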

Then, we define our forecast function using auto.arima from the forecast package, forecasting 10 periods ahead:

library(forecast)
arm <- function(x) plot(forecast(auto.arima(x$value),h=10))

We can now use lapply to run the function on the list that has now been split by group:

armforecast <- lapply(newlist,arm)

Upon doing this, R draws an ARIMA forecast plot for each of our two time series.
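
To see what lapply is doing here, the sketch below compares it with the equivalent for loop. To keep it runnable without the forecast package, it swaps the ARIMA function for a simple stand-in that returns the mean of each group; the data and the names group_mean, loop_result and lapply_result are invented for this illustration.

```r
df_small <- data.frame(
  group = c("A", "A", "A", "B", "B", "B"),
  value = c(1, 2, 3, 10, 20, 30)
)
pieces <- split(df_small, df_small$group)

# Stand-in for the forecasting function: any function of one group works
group_mean <- function(x) mean(x$value)

# for-loop version: create an empty list, then fill it group by group
loop_result <- list()
for (name in names(pieces)) {
  loop_result[[name]] <- group_mean(pieces[[name]])
}

# lapply version: one line, same named list
lapply_result <- lapply(pieces, group_mean)

all.equal(loop_result, lapply_result)  # TRUE
```

The lapply version needs no bookkeeping for the result list, and it carries the group names over automatically.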

To summarise, we have taken a look at how to create functions in R, and how we can use lapply to replicate a for loop by running a function numerous times on data that has been split according to certain criteria.

Note that you can find even more detail on how to use apply, sapply and lapply at this post from R-bloggers.

Thank you for reading!

Author: Michael Grogan

Michael Grogan is a machine learning consultant and educator, with a profound passion for statistics and data science.