Cumulative Binomial Probability with R and Shiny


In this kind of probability analysis, two variables determine the chance of an event happening: N (the number of trials, or observations) and λ (lambda, the hit rate or probability that the event occurs in a single trial or interval). By cumulative binomial probability, we mean the probability that the event occurs at least once across N independent trials. The greater the number of trials, the higher this overall probability:

$\text{probability} = 1 - (1 - \lambda)^{N}$

For instance, let us suppose that the probability of a goal being scored in the first five minutes of a football match is 0.05. This is our λ.

Now, let us suppose that 50 different matches are played and that they are independent of one another. The probability that a goal will be scored in the first five minutes of at least one of these matches now increases to over 92%:

$1 - (1 - 0.05)^{50} \approx 0.923055$
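
As a quick check, the arithmetic above can be reproduced in R. The sketch below also compares it with base R's binomial functions, assuming the 50 matches are independent trials that share the same λ. The variable names are illustrative and not part of the original script.

```r
lambda <- 0.05    # single-trial probability from the football example
n_trials <- 50    # number of matches played

# Direct use of the formula 1 - (1 - lambda)^N
p_formula <- 1 - (1 - lambda)^n_trials

# Same quantity as a binomial tail: P(X >= 1) = 1 - P(X = 0)
p_binomial <- 1 - dbinom(0, size = n_trials, prob = lambda)

print(round(p_formula, 6))
print(all.equal(p_formula, p_binomial))
```

Both routes should agree, since "at least one success" is the complement of "zero successes" in a binomial distribution.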

Because (1 - λ)^N shrinks towards zero as N grows, the larger the number of independent trials, the larger the probability of the event happening at least once, even if the probability within a single trial is very low. So, let us compute the cumulative probability over a range of N to demonstrate how it increases with the number of trials.
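
The same formula can also be turned around to ask how many trials are needed before the cumulative probability reaches a chosen target. Solving 1 - (1 - λ)^N ≥ target for N gives N ≥ log(1 - target) / log(1 - λ). The following is a minimal sketch of that calculation; the helper name trials_needed and the 90% target are assumptions made for illustration.

```r
# Hypothetical helper: smallest N for which 1 - (1 - lambda)^N >= target
trials_needed <- function(lambda, target) {
  stopifnot(lambda > 0, lambda < 1, target > 0, target < 1)
  ceiling(log(1 - target) / log(1 - lambda))
}

n <- trials_needed(lambda = 0.05, target = 0.90)
print(n)

# Check the cumulative probability just below and at that number of trials
print(1 - (1 - 0.05)^(n - 1))
print(1 - (1 - 0.05)^n)
```

With λ = 0.05, the second printed probability should sit just under the target and the third just over it.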

First, we define a function and evaluate it for single-trial probabilities of 2%, 4%, and 6%, with up to 100 trials:
 

par(bg = '#191661', fg = '#ffffff', col.main = '#ffffff', col.lab = '#ffffff', col.axis = '#ffffff')

# lambda = probability of event occurring in a single trial
# powers = number of trials
# mu = overall probability given n number of trials

muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}
probability_at_lambda <- sapply(c(0.02, 0.04, 0.06), muCalculation, seq(0, 100, 1))
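
It may help to inspect the shape of what sapply returns here. Each call to muCalculation receives the whole vector of trial counts, so the result is a matrix with one row per value of N and one column per λ. The sketch below restates the function so that it runs on its own. As an alternative that the original script does not use, it also builds the same matrix with outer(). The names lambdas, trials, via_sapply and via_outer are illustrative.

```r
muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}

lambdas <- c(0.02, 0.04, 0.06)
trials <- seq(0, 100, 1)

# One column per lambda, one row per trial count (N = 0..100)
via_sapply <- sapply(lambdas, muCalculation, trials)
print(dim(via_sapply))

# Alternative: outer() evaluates the function over every (N, lambda) pair
via_outer <- outer(trials, lambdas, function(n, l) muCalculation(l, n))
print(all.equal(via_sapply, via_outer))
```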

Then, we can set up our data as a data frame and plot it as normal:

# Convert the 101 x 3 matrix to a data frame with one column per lambda
probability_at_lambdadf <- data.frame(probability_at_lambda)
col_headings <- c("probability1", "probability2", "probability3")
names(probability_at_lambdadf) <- col_headings
probability_at_lambdadf

# Row i corresponds to N = i - 1, so plot against N = 0..100 explicitly
N <- seq(0, 100, 1)
plot(N, probability_at_lambdadf$probability1, type = "o", col = "#b1aef4", xlab = "N", ylab = "Probability", xlim = c(0, 100), ylim = c(0.0, 1.0), pch = 19)
lines(N, probability_at_lambdadf$probability2, type = "o", col = "red", pch = 19)
lines(N, probability_at_lambdadf$probability3, type = "o", col = "green", pch = 19)
title(main = "Probability Chart")
grid(nx = NULL, ny = NULL, col = "lightgray", lty = "dotted",
     lwd = par("lwd"), equilogs = TRUE)
# The labels go in the 'legend' argument; symbols and line types match the plotted series
legend("bottomright", legend = c("probability_at_lambda_1", "probability_at_lambda_2", "probability_at_lambda_3"), cex = 0.6, col = c("#b1aef4", "red", "green"), pch = 19, lty = 1)
proc.time()

Sample Table

Here is a sample table with the calculated probabilities (probability_at_lambdadf):

[Image: table of cumulative probabilities]
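
Since the full table has 101 rows, a sketch of how a few representative rows could be printed may be useful. It rebuilds the data frame so that it runs on its own. The added N column, the name sample_table and the choice of rows are assumptions made for illustration and are not part of the original script.

```r
muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}
trials <- seq(0, 100, 1)

probability_at_lambdadf <- data.frame(sapply(c(0.02, 0.04, 0.06), muCalculation, trials))
names(probability_at_lambdadf) <- c("probability1", "probability2", "probability3")

# Add the trial count as an explicit column (row i corresponds to N = i - 1)
probability_at_lambdadf$N <- trials

# Keep a handful of illustrative trial counts
rows_to_show <- probability_at_lambdadf$N %in% c(0, 10, 25, 50, 100)
sample_table <- probability_at_lambdadf[rows_to_show, c("N", "probability1", "probability2", "probability3")]

print(round(sample_table, 4), row.names = FALSE)
```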

Plot

Accordingly, here is a plot of the probabilities:

[Image: plot of cumulative probability against N for the three values of lambda]
 
 

Analyse Cumulative Binomial Probability with a Shiny Web Application

This is an example of a Shiny Web application that can calculate cumulative binomial probabilities on the fly.

You’ll remember that our previous R script invoked a function to calculate binomial probabilities based on lambda (the probability of an event happening), and the power value (or number of trials).

The idea is that while the probability of an individual event happening may be low, the cumulative probability of the event happening increases with the number of trials.

$1 - (1 - \lambda)^{N}$

Here is an example of a Shiny Web App that allows us to manipulate the lambda values using a set of sliders and automatically update the probability curve.

To run this app, open RStudio and click File -> New File -> Shiny Web App, then select either Single File to paste the ui.R and server.R code together, or Multiple File to paste them separately.

[Image: Shiny Web App]

Additionally, if you are new to Shiny you can find my full tutorial on Sitepoint that describes how to build and run a Shiny app from scratch.

ui.R

A few points when setting up the UI (User Interface):

  • lambda represents the probability of an event occurring in a single trial
  • The slider input allows the user to set different values for lambda based on the associated probability
  • The plot is then rendered through an output named "ProbPlot".
library(shiny)

# Define UI for application that draws a probability plot
shinyUI(fluidPage(
  
  # Application title
  titlePanel("Cumulative Binomial Probability Plot"),
  
  # Sidebar with a slider input for value of lambda
  sidebarLayout(
    sidebarPanel(
      sliderInput("lambda",
                  "Probability 1:",
                  min = 0,
                  max = 1,
                  value = 0.01),
      sliderInput("lambda2",
                  "Probability 2:",
                  min = 0,
                  max = 1,
                  value = 0.01),
      sliderInput("lambda3",
                  "Probability 3:",
                  min = 0,
                  max = 1,
                  value = 0.01)
    ),
    
    # Show a plot of the generated probability plot
    mainPanel(
      plotOutput("ProbPlot")
    )
  )
))

server.R

Now, we set up the server - this is the part that takes the inputs and calculates the output that is eventually shown in the UI.

  • The lambda values represent the inputs that we defined in the UI; i.e. the user sets the probability from the slider.
  • The probability function is defined: {1 - ((1 - lambda)^powers)}
  • The separate probability arrays are then calculated (probability_at_lambda, probability_at_lambda2, probability_at_lambda3)
  • The probability is then plotted.
library(shiny)

# Shiny Application
shinyServer(function(input, output) {
  
  # Re-rendered whenever any of the three sliders changes
  output$ProbPlot <- renderPlot({
    
    # number of trials, N = 0..100
    powers <- seq(0, 100, 1)
    
    # cumulative probability for each lambda set in ui.R
    muCalculation <- function(lambda, powers) {1 - ((1 - lambda)^powers)}
    probability_at_lambda <- muCalculation(input$lambda, powers)
    probability_at_lambda2 <- muCalculation(input$lambda2, powers)
    probability_at_lambda3 <- muCalculation(input$lambda3, powers)
    
    # draw the probability curves against the number of trials
    par(bg = '#191661', fg = '#ffffff', col.main = '#ffffff', col.lab = '#ffffff', col.axis = '#ffffff')
    plot(powers, probability_at_lambda, type = "o", col = "#b1aef4", xlab = "N", ylab = "Probability", xlim = c(0, 100), ylim = c(0.0, 1.0), pch = 19)
    lines(powers, probability_at_lambda2, type = "o", col = "red", pch = 19)
    lines(powers, probability_at_lambda3, type = "o", col = "green", pch = 19)
    title(main = "Cumulative Binomial Probability")
  })
  
})
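
For the Single File option mentioned earlier, the two pieces above can be combined into one app.R. The following is a minimal sketch of that layout rather than the author's own file. The helper cumulative_curve and the object name app are hypothetical, and the sketch assumes the shiny package is installed.

```r
library(shiny)

# Hypothetical helper: cumulative probability for N = 0..max_trials
cumulative_curve <- function(lambda, max_trials = 100) {
  1 - (1 - lambda)^seq(0, max_trials, 1)
}

# UI: three sliders for lambda and one plot output
ui <- fluidPage(
  titlePanel("Cumulative Binomial Probability Plot"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("lambda", "Probability 1:", min = 0, max = 1, value = 0.01),
      sliderInput("lambda2", "Probability 2:", min = 0, max = 1, value = 0.01),
      sliderInput("lambda3", "Probability 3:", min = 0, max = 1, value = 0.01)
    ),
    mainPanel(plotOutput("ProbPlot"))
  )
)

# Server: redraw the three curves whenever a slider changes
server <- function(input, output) {
  output$ProbPlot <- renderPlot({
    N <- 0:100
    plot(N, cumulative_curve(input$lambda), type = "o", col = "#b1aef4",
         xlab = "N", ylab = "Probability", ylim = c(0, 1), pch = 19)
    lines(N, cumulative_curve(input$lambda2), type = "o", col = "red", pch = 19)
    lines(N, cumulative_curve(input$lambda3), type = "o", col = "green", pch = 19)
    title(main = "Cumulative Binomial Probability")
  })
}

# Build the app object; launch it from an interactive session with runApp(app)
app <- shinyApp(ui = ui, server = server)
```

The app object is only assigned here so that the script does not start a server when sourced non-interactively. In an app.R file, ending with the app object itself on the last line is the usual way to let RStudio's Run App button pick it up.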

 
 

Conclusion

Today, you have learned how to:

  • Generate a cumulative binomial probability distribution using R
  • Use Shiny to visualise cumulative binomial probability

If you have any questions, please leave them in the comments below and I'll do my best to answer them.

Author: Michael Grogan

Michael Grogan is a machine learning consultant and educator, with a profound passion for statistics and data science.
