X > MAX for independent but not identically distributed betas
October 13, 2023
In this post I provide a general formula for the probability that the realization of a beta random variable is larger than the realizations of a sequence of other beta random variables, all of them independent. Assume \(p_j \sim \mathscr{B}(\alpha_j, \beta_j)\), so that we have \(J+1\) independent but not identically distributed beta random variables, indexed from 0 to \(J\), and we are interested in calculating \(\mathbb{P}(p_0 > \max\{p_1,\dots,p_J\})\). The \(\alpha_j\)’s and \(\beta_j\)’s are restricted to the positive integers. A general recursive formula in the style of the ones provided by Evan Miller is:
\[\begin{gather*} \mathbb{P}(p_0 > \max\{p_1,\dots,p_J\}) = \mathbb{P}(p_0(\alpha_0, \beta_0) > \max \{p_1,\dots, p_{J-1}\}) \\ - \frac{1}{B(\alpha_0,\beta_0)\Gamma(\beta_J)} \sum_{i=1}^{\alpha_J}\frac{B(\alpha_0 + i -1, \beta_0 + \beta_J)\Gamma(\beta_J + i -1)}{\Gamma(i)} \mathbb{P}(p_0(\alpha_0 + i -1, \beta_0+ \beta_J) > \max\{p_1,\dots,p_{J-1}\}) \end{gather*}\]where
\[\begin{aligned} \mathbb{P}(p_g > p_m) = \frac{1}{B(\alpha_m, \beta_m)}\sum_{i = 1}^{\alpha_g}\frac{B(\alpha_m + i - 1, \beta_g + \beta_m)}{(\beta_g+i-1)B(i,\beta_g)} \end{aligned}\]\(B\) denotes the beta function and \(\Gamma\) the gamma function. This object is typically of interest in the Beta-Binomial model and in the Bayesian analysis of binary-outcome randomized controlled trials (A/B tests), where the prior on the probability of success is given by (conjugate) independent beta distributions. Note that the restriction on the parameters can accommodate Haldane’s prior \(\mathscr{B}(0,0)\), as long as there is at least one success and one failure for that \(j\), as well as the uniform prior \(\mathscr{B}(1,1)\).
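As a quick sanity check of this base-case formula (my own worked example): for two uniform priors \(p_g, p_m \sim \mathscr{B}(1,1)\) the sum has a single term and recovers the symmetric answer,
\[\begin{aligned} \mathbb{P}(p_g > p_m) = \frac{1}{B(1,1)}\cdot\frac{B(1,2)}{(1+1-1)\,B(1,1)} = \frac{1/2}{1\cdot 1} = \frac{1}{2}. \end{aligned}\]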
Since each application of the recursion reduces the dimension of the \(\max\) operator by one, the first equation is guaranteed to eventually bottom out in many evaluations of the second. Each step reduces the dimension by 1 but requires \(\alpha_J+1\) further evaluations of the formula, so it is computable in \(\mathcal{O}\left(\prod_{j=1}^J(\alpha_j+1)\right)\) time; for instance, the example below with \(J = 4\) and every \(\alpha_j = 5\) takes on the order of \(6^4 = 1296\) base-case evaluations. The formula can also be re-derived so that the sums run over the \(\beta_j\)’s instead of the \(\alpha_j\)’s, but since the number of successes is typically much smaller than the number of failures, the version provided here is usually more efficient. The two versions could also be combined, alternating between them to deliver a computational time of \(\mathcal{O}\left(\prod_{j=1}^J \min \{\alpha_j+1, \beta_j +1\} \right)\).
This formula adds to those already provided by Miller. Since the calculation is quite burdensome, I provide below a general R function based on the recursive nature of the formula; memoization should substantially improve its performance (a sketch of a memoised variant follows the function definitions). I do not know whether someone else has derived this formula elsewhere. I have searched for a while without finding it, so I would appreciate a pointer if it has been published before. At the end I provide a sketch of the proof.
To calculate \(\mathbb{P}(p_i > \max\{p_0, p_1, \dots, p_{i-1}, p_{i+1}, \dots, p_J\})\) one only needs to reorder the \(\alpha_j\)’s and \(\beta_j\)’s appropriately (a small wrapper doing this reordering is sketched after the code below). From the simple probability rule \(P(A_1 \cap A_2) = P(A_1) - P(A_1 \setminus A_2)\) we can see that the negative term in the recursion corresponds to the probability of the event \(p_J > p_0 > \max\{p_1,\dots,p_{J-1}\}\), i.e., that \(p_J\) is the largest and \(p_0\) the second largest. This object could also be of interest, especially because reordering allows calculating any event of the type \(p_k > p_g > \max\{p_1,\dots,p_{k-1},p_{k+1}, \dots,p_{g-1},p_{g+1},\dots,p_J\}\), and possibly other sequences of inequalities, by going further down the recursive rabbit hole. The relation with order statistics just seen, together with the recursive nature (combinatorially or exponentially increasing evaluations), gives this a Bapat–Beg theorem flavor. It also makes me suspect a close relationship between the maximum of beta random variables and the beta distribution itself.
# Base case: P(p_g > p_m) for p_g ~ B(ag, bg) and p_m ~ B(am, bm),
# computed on the log scale for numerical stability.
ProbGBeatsM <- function(ag, bg, am, bm){
  Bf <- sapply(1:ag, function(i) {
    exp(lbeta(am + i - 1, bg + bm) - log(bg + i - 1) - lbeta(i, bg))
  })
  return(sum(Bf)/beta(am, bm))
}

# One term of the inner sum of the recursion, for sum index i.
InnerTerm <- function(i, a0, b0, bJ, aS, bS){
  return(exp(lbeta(a0 + i - 1, b0 + bJ) + lgamma(bJ + i - 1) - lgamma(i)) *
           Prob0beatsAll(a0 + i - 1, b0 + bJ, head(aS, -1), head(bS, -1)))
}

# P(p_0 > max{p_1, ..., p_J}) via the recursion; alpha_ and beta_ hold the
# parameters of the J competing betas.
Prob0beatsAll <- function(alpha0, beta0, alpha_, beta_){
  J <- length(alpha_)
  if (J == 1) {
    return(ProbGBeatsM(alpha0, beta0, alpha_, beta_))
  } else {
    return(Prob0beatsAll(alpha0, beta0, head(alpha_, -1), head(beta_, -1))
           - sum(sapply(1:alpha_[J], InnerTerm, a0 = alpha0, b0 = beta0,
                        bJ = beta_[J], aS = alpha_, bS = beta_)) *
             exp(-lbeta(alpha0, beta0) - lgamma(beta_[J])))
  }
}
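The recursion revisits identical parameter combinations (different paths through the sum indices can land on the same updated \(\alpha_0\) and \(\beta_0\)), which is where memoization pays off. Below is a minimal memoised sketch; the name Prob0beatsAllMemo, the cache memo_env, and the string-key scheme are my own illustrative choices, not part of the formula.
memo_env <- new.env(parent = emptyenv())
# Memoised variant: results for identical (alpha0, beta0, alpha_, beta_) are cached.
Prob0beatsAllMemo <- function(alpha0, beta0, alpha_, beta_){
  key <- paste(c(alpha0, beta0, alpha_, beta_), collapse = ",")
  if (exists(key, envir = memo_env, inherits = FALSE)) {
    return(get(key, envir = memo_env))
  }
  J <- length(alpha_)
  if (J == 1) {
    res <- ProbGBeatsM(alpha0, beta0, alpha_, beta_)
  } else {
    # Same inner term as InnerTerm, but recursing through the memoised function.
    inner <- sapply(1:alpha_[J], function(i) {
      exp(lbeta(alpha0 + i - 1, beta0 + beta_[J]) +
            lgamma(beta_[J] + i - 1) - lgamma(i)) *
        Prob0beatsAllMemo(alpha0 + i - 1, beta0 + beta_[J],
                          head(alpha_, -1), head(beta_, -1))
    })
    res <- Prob0beatsAllMemo(alpha0, beta0, head(alpha_, -1), head(beta_, -1)) -
      sum(inner) * exp(-lbeta(alpha0, beta0) - lgamma(beta_[J]))
  }
  assign(key, res, envir = memo_env)
  return(res)
}
It returns the same values as Prob0beatsAll while caching repeated subproblems.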
alpha0 <- 5
beta0 <- 8
alpha_ <- c(5, 5, 5, 5)
beta_ <- c(8, 8, 8, 8)
print(Prob0beatsAll(alpha0, beta0, alpha_, beta_)) # Confirms the 1/(J+1) result for the iid case.
## [1] 0.2
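As discussed above, \(\mathbb{P}(p_k > \max_{j \neq k} p_j)\) for any arm \(k\) only requires reordering the parameters before calling Prob0beatsAll. A small convenience wrapper along these lines (the name ProbKbeatsAll is my own illustrative choice) might look like:
# Hypothetical wrapper: probability that arm k has the largest p, obtained by
# moving the k-th parameters to the front of the vectors.
ProbKbeatsAll <- function(k, alpha, beta){
  Prob0beatsAll(alpha[k], beta[k], alpha[-k], beta[-k])
}
# With the iid parameters above, every arm wins with probability 1/5:
sapply(1:5, ProbKbeatsAll, alpha = c(alpha0, alpha_), beta = c(beta0, beta_))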
Proof Sketch
With some notational abuse:
\[\begin{aligned} \mathbb{P}(p_0 > \max\{p_1,\dots,p_J\}) &= \int_0^1 \int_0^{p_0} \int_0^{p_0} \cdots \int_0^{p_0} \prod_{j=0}^J f_{j}(p_{j}) \, dp_J \cdots dp_1 \, dp_0 = \int_0^1 \prod_{j=1}^{J-1} \mathbb{I}_{p_0}(\alpha_j,\beta_j) \frac{p_0^{\alpha_0-1}(1-p_0)^{\beta_0-1}}{B(\alpha_0, \beta_0)} \mathbb{I}_{p_0}(\alpha_J,\beta_J) \, dp_0 \\ &= \int_0^1 \prod_{j=1}^{J-1} \mathbb{I}_{p_0}(\alpha_j,\beta_j) \frac{p_0^{\alpha_0-1}(1-p_0)^{\beta_0-1}}{B(\alpha_0, \beta_0)} \left(1-\sum_{i=1}^{\alpha_J}\frac{\Gamma(\beta_J +i-1)}{\Gamma(\beta_J)\Gamma(i)}p_0^{i-1}(1-p_0)^{\beta_J} \right) dp_0\\ &= \mathbb{P}(p_0 > \max\{p_1,\dots,p_{J-1}\}) - \sum_{i=1}^{\alpha_J}\frac{\Gamma(\beta_J +i-1)}{\Gamma(\beta_J)\Gamma(i)}\int_0^1\prod_{j=1}^{J-1} \mathbb{I}_{p_0}(\alpha_j,\beta_j) \frac{p_0^{\alpha_0 +i-2}(1-p_0)^{\beta_0+\beta_J-1}}{B(\alpha_0,\beta_0)} \, dp_0 \\ &= \mathbb{P}(p_0 > \max\{p_1,\dots,p_{J-1}\}) - \sum_{i=1}^{\alpha_J}\frac{B(\alpha_0 +i-1, \beta_0+\beta_J) \Gamma(\beta_J +i-1)}{B(\alpha_0,\beta_0)\Gamma(\beta_J)\Gamma(i)}\int_0^1 \prod_{j=1}^{J-1} \mathbb{I}_{p_0}(\alpha_j,\beta_j) \frac{p_0^{\alpha_0 +i-2}(1-p_0)^{\beta_0+\beta_J-1}}{B(\alpha_0 +i-1, \beta_0+\beta_J)} \, dp_0 \end{aligned}\]where \(\mathbb{I}_x(a,b)\) is the regularized incomplete beta function (the beta CDF), so each \(\mathbb{I}_{p_0}(\alpha_j,\beta_j)\) is the probability that \(p_j < p_0\). The property used in the second line is \(\mathbb{I}_x(a,b) = 1- \sum_{i=1}^{a}\frac{\Gamma(b+i-1)}{\Gamma(b)\Gamma(i)}x^{i-1}(1-x)^{b}\) when \(a\) is a positive integer. In the last line the integrand is the \(\mathscr{B}(\alpha_0+i-1, \beta_0+\beta_J)\) density multiplied by the CDFs of \(p_1,\dots,p_{J-1}\) evaluated at \(p_0\), so the integral equals \(\mathbb{P}(p_0(\alpha_0+i-1, \beta_0+\beta_J) > \max\{p_1,\dots,p_{J-1}\})\), which gives the recursion.
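As a quick check of this identity (my own verification, not part of the original derivation), setting \(a = 1\) gives
\[\begin{aligned} \mathbb{I}_x(1,b) = 1 - \frac{\Gamma(b)}{\Gamma(b)\,\Gamma(1)}\,x^{0}(1-x)^{b} = 1-(1-x)^{b}, \end{aligned}\]which is the familiar CDF of a \(\mathscr{B}(1,b)\) random variable.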