Drawing with Restrictions in MCMC
June 18, 2024
It is often the case that, when simulating from posterior distributions using MCMC, researchers wish to impose restrictions on some parameters.
Two illustrative cases are discussed in Geweke’s Contemporary Bayesian Econometrics and Statistics, Examples 4.2.2 and 4.3.2. The following description of the setup (and notation) is taken almost verbatim from Geweke. These examples consider a normal linear regression model with Gaussian disturbances in which the prior of \(\boldsymbol{\beta}\) (the regression coefficients) given \(h\) (the disturbance precision) is \(\mathscr{N}(\underline{\boldsymbol{\beta}}, \underline{\textbf{H}})\) truncated to a set \(S\subseteq \mathbb{R}^k\). A typical acceptance sampling algorithm, used once we enter the stage of drawing \(\boldsymbol{\beta}^{(m)}\), is:
1. Draw candidate \(\boldsymbol{\beta}^* \sim \mathscr{N}(\bar{\boldsymbol{\beta}}, \bar{\textbf{H}})\).
2. If \(\boldsymbol{\beta}^* \in S\), set \(\boldsymbol{\beta}^{(m)} = \boldsymbol{\beta}^*\). Otherwise (\(\boldsymbol{\beta}^* \notin S\)) go to step 1.
The acceptance probability is:
\[\begin{equation} \tag{1} (2\pi)^{-k/2}|\bar{\textbf{H}}|^{1/2}\int_S \exp\left [-(\boldsymbol{\beta} - \bar{\boldsymbol{\beta}})'\bar{\textbf{H}}(\boldsymbol{\beta}- \bar{\boldsymbol{\beta}})/2 \right]d\boldsymbol{\beta} \end{equation}\]
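As a concrete illustration, here is a minimal Python sketch of this acceptance sampler. The indicator function `in_S`, the function names, and the use of NumPy are my own assumptions; the only inputs taken from the setup above are the conditional posterior mean \(\bar{\boldsymbol{\beta}}\) and precision \(\bar{\textbf{H}}\).

```python
import numpy as np

def draw_beta_acceptance(beta_bar, H_bar, in_S, rng, max_tries=100_000):
    """Acceptance sampling: draw from N(beta_bar, H_bar^{-1}) until the draw falls in S.

    beta_bar : conditional posterior mean
    H_bar    : conditional posterior precision matrix
    in_S     : user-supplied indicator function for the restriction set S (hypothetical)
    """
    cov = np.linalg.inv(H_bar)                                # precision -> covariance
    for _ in range(max_tries):
        beta_star = rng.multivariate_normal(beta_bar, cov)    # step 1: candidate draw
        if in_S(beta_star):                                   # step 2: accept only if in S
            return beta_star
    raise RuntimeError("No draw in S after max_tries; the probability in (1) may be tiny.")
```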
An undesirable characteristic of this algorithm is that, if the probability above is small, the MCMC routine may get stuck in the conditional draw of \(\boldsymbol{\beta}\) given the rest of the parameters. For linear inequality restrictions, see Geweke (1991) and Li and Ghosh (2015). Non-linear restrictions are also fairly common, such as imposing that the VAR parameters lie in the stationary region; see Blake and Mumtaz (2012). As an alternative to this algorithm, Geweke suggests a random-walk Metropolis-Hastings algorithm in which the transition density \(q(\boldsymbol{\beta}^*|\boldsymbol{\beta}^{(m-1)}, H)\) is that of a \(\mathscr{N}(\boldsymbol{\beta}^{(m-1)}, \textbf{V})\). The acceptance probability is
\[ \alpha(\boldsymbol{\beta}^*|\boldsymbol{\beta}^{(m-1)}, H) = \min\left\{\frac{\exp \left[-(\boldsymbol{\beta}^*-\bar{\boldsymbol{\beta}})'\bar{\textbf{H}}(\boldsymbol{\beta}^*-\bar{\boldsymbol{\beta}})/2 \right]}{\exp \left[-(\boldsymbol{\beta}^{(m-1)}-\bar{\boldsymbol{\beta}})'\bar{\textbf{H}}(\boldsymbol{\beta}^{(m-1)}-\bar{\boldsymbol{\beta}})/2 \right]}, 1\right\} \]
if \(\boldsymbol{\beta}^* \in S\), and 0 if \(\boldsymbol{\beta}^* \notin S\). Appropriate tuning of \(\textbf{V}\) is necessary: a large \(\textbf{V}\) will produce few candidates inside \(S\), while a small \(\textbf{V}\) will explore the region \(S\) only slowly.
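A sketch of one step of this random-walk Metropolis-Hastings update, under the same assumptions as before (a hypothetical indicator `in_S`, NumPy for the linear algebra), with the tuning matrix \(\textbf{V}\) passed in explicitly:

```python
import numpy as np

def rw_mh_step(beta_prev, beta_bar, H_bar, V, in_S, rng):
    """One random-walk MH step targeting N(beta_bar, H_bar^{-1}) truncated to S."""
    beta_star = rng.multivariate_normal(beta_prev, V)   # proposal centered at previous draw
    if not in_S(beta_star):
        return beta_prev                                 # acceptance probability is 0 outside S

    def log_kernel(b):                                   # log of the unnormalized normal kernel
        d = b - beta_bar
        return -0.5 * d @ H_bar @ d

    log_alpha = min(0.0, log_kernel(beta_star) - log_kernel(beta_prev))
    return beta_star if np.log(rng.uniform()) < log_alpha else beta_prev
```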
The above algorithm fails to exploit an additional characteristic present in the example and in many other cases: the conditional posterior distribution (given the other parameters) without the restriction has a known form that is easy to sample from. In the above case, for example, the posterior of \(\boldsymbol{\beta}\) given \(h\) is \(\mathscr{N}(\bar{\boldsymbol{\beta}}, \bar{\textbf{H}})\) if we ignore the restriction. We can therefore construct another algorithm based on an independence Metropolis chain. For an independence chain, the general formula for the acceptance probability simplifies to
\[ \alpha(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{(m-1)}, H) = \min\left \{\frac{w(\boldsymbol{\theta}^*)}{w(\boldsymbol{\theta}^{(m-1)})}, 1 \right \}; \ \ \ w(\boldsymbol{\theta}) = \frac{p(\boldsymbol{\theta}|I)}{q(\boldsymbol{\theta}|H)} \]
If we set the proposal distribution equal to the posterior distribution without the restriction we obtain the following relationship:
\[\begin{equation} \tag{2} p(\boldsymbol{\theta}|I) = \frac{q(\boldsymbol{\theta}|H)}{\underbrace{\int_S q(\boldsymbol{\theta}|H)d\boldsymbol{\theta}}_{c}} \end{equation}\]
if \(\boldsymbol{\theta} \in S\), and 0 otherwise. Assume \(\boldsymbol{\theta}^{(m-1)} \in S\). Then, if \(\boldsymbol{\theta}^* \in S\), \(\alpha(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{(m-1)},H) = \min\left\{\frac{1/c}{1/c}, 1\right\} = 1\). If instead \(\boldsymbol{\theta}^* \notin S\), \(\alpha(\boldsymbol{\theta}^*|\boldsymbol{\theta}^{(m-1)},H) = \min\left\{ \frac{0/q(\boldsymbol{\theta}^*|H)}{1/c}, 1 \right\} = 0\). This suggests the following procedure:
- Draw \(\boldsymbol{\beta}^* \sim \mathscr{N}(\bar{\boldsymbol{\beta}}, \bar{\textbf{H}})\).
- If \(\boldsymbol{\beta}^* \in S\), set \(\boldsymbol{\beta}^{(m)} = \boldsymbol{\beta}^*\). Otherwise set \(\boldsymbol{\beta}^{(m)} = \boldsymbol{\beta}^{(m-1)}\).
One should also make sure that the starting value, \(\boldsymbol{\theta}^{(0)}\), is inside \(S\).
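Putting the procedure together, here is a minimal sketch of the independence-chain update and a toy run. The restriction \(S = (-1, 1)^2\), the values chosen for \(\bar{\boldsymbol{\beta}}\) and \(\bar{\textbf{H}}\), and all names are purely illustrative assumptions, not part of the example above.

```python
import numpy as np

def independence_mh_step(beta_prev, beta_bar, H_bar, in_S, rng):
    """Independence-chain step: propose from the unrestricted conditional posterior
    N(beta_bar, H_bar^{-1}); keep the candidate iff it satisfies the restriction."""
    beta_star = rng.multivariate_normal(beta_bar, np.linalg.inv(H_bar))
    return beta_star if in_S(beta_star) else beta_prev

# Toy usage (illustrative values only).
rng = np.random.default_rng(0)
in_S = lambda b: np.all(np.abs(b) < 1.0)           # S = (-1, 1)^2
beta_bar, H_bar = np.array([0.5, -0.3]), np.eye(2)
beta = np.zeros(2)                                  # starting value chosen inside S
draws = []
for _ in range(1_000):
    beta = independence_mh_step(beta, beta_bar, H_bar, in_S, rng)
    draws.append(beta)
```

Passing a generic indicator `in_S` keeps the same code usable for linear or non-linear restrictions, such as the stationarity restriction on VAR parameters mentioned earlier.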
The reader will immediately notice that this procedure has the highest possible acceptance probability when \(\boldsymbol{\beta}^* \in S\) and the same acceptance probability (0) as Geweke’s approach when the candidate is not in the restriction region. Furthermore, by equation 2, accepted draws explore the region \(S\) in the same proportions as the posterior distribution under the restriction.
The main difference between the independence Metropolis proposal and the random-walk MH proposal is that the random walk centers the proposal at the previous value, which greatly increases the probability that the candidate \(\boldsymbol{\beta}^*\) is in the region \(S\).
The advantage of the independence proposal over the acceptance sampling algorithm is that, if the candidate draw is not inside the restriction region \(S\), the chain simply keeps the previous value and the routine continues; the acceptance algorithm, in contrast, might get stuck. Skipping in this way can be helpful when the low acceptance probability in equation 1 is due to a particularly unfortunate and unlikely draw of a parameter other than \(\boldsymbol{\beta}\) elsewhere in the MCMC routine. Note that if the other parameters were known, so that in our Gaussian linear regression example \(h\) were fixed, the probability that the independence Metropolis chain moves at any step would be exactly equation 1, and it would face the same underlying problem.
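To make the last point explicit: with \(h\) fixed, the probability that the independence chain moves at a given step is the probability that the candidate lands in \(S\),

\[ \Pr\left(\boldsymbol{\beta}^* \in S\right) = (2\pi)^{-k/2}|\bar{\textbf{H}}|^{1/2}\int_S \exp\left [-(\boldsymbol{\beta} - \bar{\boldsymbol{\beta}})'\bar{\textbf{H}}(\boldsymbol{\beta}- \bar{\boldsymbol{\beta}})/2 \right]d\boldsymbol{\beta}, \]

which is exactly expression 1.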
References
Blake, A. P., & Mumtaz, H. (2012). Applied Bayesian econometrics for central bankers. Technical Books.
Geweke, J. (1991). Efficient simulation from the multivariate normal and Student-t distributions subject to linear constraints. In Computer Sciences and Statistics: Proceedings of the 23rd Symposium on the Interface, pp. 571-578.
Geweke, J. (2005). Contemporary Bayesian econometrics and statistics. John Wiley & Sons.
Li, Y., & Ghosh, S. K. (2015). Efficient sampling methods for truncated multivariate normal and student-t distributions subject to linear inequality constraints. Journal of Statistical Theory and Practice, 9, 712-732.