Experimental Benchmark Models
This section presents a collection of benchmark scenarios for comparing the performance and behavior of different Hamiltonian-based estimators with respect to several of their intrinsic properties:
- Correlation in the samples.
- Reversibility of the chain.
- Influence of the Importance Sampling re-weighting (e.g., for MMHMC).
All data needed to run the experiments (datasets, parameters, etc.) are provided in the benchmarks/ directory.
Banana-Shaped Distribution
Given data \(\{y_k\}_{k=1}^K\), we sample from the banana-shaped posterior distribution of the parameter \(\theta = (\theta_1, \theta_2)\) for which the likelihood and prior distributions are respectively given as:
The sample data are generated with \(\theta_1+\theta_2^2=1, \sigma_y=2, \sigma_{\theta}=1\). Then, the potential function is given by:
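A minimal sketch of this potential, assuming the usual form of the benchmark: a Gaussian likelihood \(y_k \mid \theta \sim \mathcal{N}(\theta_1 + \theta_2^2, \sigma_y^2)\) and independent Gaussian priors \(\theta_i \sim \mathcal{N}(0, \sigma_{\theta}^2)\). Under that assumption the potential is the negative log-posterior up to an additive constant. The function names and the sample size are illustrative, not part of the benchmark code.

```python
import numpy as np

SIGMA_Y = 2.0      # likelihood standard deviation, as stated in the text
SIGMA_THETA = 1.0  # prior standard deviation, as stated in the text


def generate_data(n_samples, rng):
    """Draw y_k ~ N(theta_1 + theta_2^2, sigma_y^2) with theta_1 + theta_2^2 = 1."""
    return rng.normal(loc=1.0, scale=SIGMA_Y, size=n_samples)


def banana_potential(theta, y):
    """Negative log-posterior up to a constant (assumed Gaussian model)."""
    theta_1, theta_2 = theta
    residuals = y - (theta_1 + theta_2 ** 2)
    data_term = np.sum(residuals ** 2) / (2.0 * SIGMA_Y ** 2)
    prior_term = (theta_1 ** 2 + theta_2 ** 2) / (2.0 * SIGMA_THETA ** 2)
    return data_term + prior_term


def banana_gradient(theta, y):
    """Analytic gradient of banana_potential with respect to theta."""
    theta_1, theta_2 = theta
    residual_sum = np.sum(y - (theta_1 + theta_2 ** 2))
    d_theta_1 = -residual_sum / SIGMA_Y ** 2 + theta_1 / SIGMA_THETA ** 2
    d_theta_2 = (-2.0 * theta_2 * residual_sum / SIGMA_Y ** 2
                 + theta_2 / SIGMA_THETA ** 2)
    return np.array([d_theta_1, d_theta_2])


rng = np.random.default_rng(0)
y = generate_data(100, rng)  # sample size of 100 is an arbitrary choice
print(banana_potential(np.array([0.5, 0.5]), y))
```

Because the likelihood depends on \(\theta_2\) only through \(\theta_2^2\), the potential is symmetric in the sign of \(\theta_2\), which produces the curved, banana-like shape of the posterior.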
Multivariate Gaussian Distribution
We sample from a \(D\)-dimensional Multivariate Gaussian Distribution \(\mathcal{N} (0, \Sigma)\), where the precision matrix \(\Sigma^{-1}\) is generated from a Wishart distribution.
For computational purposes, we take \(D=1000\) dimensions and the covariance matrix to be diagonal with:
where \(\sigma_i^2\) is the \(i\)-th smallest eigenvalue of the original covariance matrix. The potential function in this case is defined as:
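For a zero-mean Gaussian with diagonal covariance, the potential reduces to \(U(\theta) = \frac{1}{2} \sum_{i=1}^{D} \theta_i^2 / \sigma_i^2\) up to a constant. The sketch below builds the variances from a Wishart-distributed precision matrix and evaluates this potential. The Wishart scale matrix (identity) and the degrees of freedom are assumptions, since the text does not state them, and the function names are illustrative.

```python
import numpy as np


def diagonal_variances(dim, dof, rng):
    """Sorted eigenvalues of a covariance whose precision is Wishart-distributed.

    Assumption: identity scale matrix and `dof` degrees of freedom (dof >= dim).
    """
    a = rng.standard_normal((dim, dof))
    precision = a @ a.T  # A A^T ~ Wishart(I, dof) when the entries of A are N(0, 1)
    covariance_eigenvalues = 1.0 / np.linalg.eigvalsh(precision)
    return np.sort(covariance_eigenvalues)  # sigma_1^2 <= ... <= sigma_D^2


def gaussian_potential(theta, variances):
    """U(theta) = 0.5 * sum_i theta_i^2 / sigma_i^2 (additive constant dropped)."""
    return 0.5 * np.sum(theta ** 2 / variances)


def gaussian_gradient(theta, variances):
    """Gradient of gaussian_potential: theta_i / sigma_i^2 for each component."""
    return theta / variances


rng = np.random.default_rng(0)
dim = 100  # kept small for a quick demo; the text uses D = 1000
variances = diagonal_variances(dim, dof=2 * dim, rng=rng)
theta = rng.standard_normal(dim)
print(gaussian_potential(theta, variances))
```

Working with the diagonal form avoids storing or inverting a dense \(D \times D\) matrix, so each potential and gradient evaluation costs \(O(D)\).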
Bayesian Logistic Regression (BLR)
Bayesian Logistic Regression (BLR) extends the traditional point-estimate logistic regression model by placing a prior distribution over the parameters of the model.
Given \(K\) data instances \(\{x_k, y_k\}_{k=1}^K\), where \(x_k=(1, x_{k,1}, \ldots, x_{k,D})\) are vectors of \(D\) covariates preceded by a constant \(1\) for the intercept and \(y_k \in \{0, 1\}\) are the binary responses, the probability of a particular outcome is linked to the linear predictor function through the logit function:
where \(\theta=(\theta_0, \theta_1, \ldots, \theta_D)^T\) are the parameters of the model, with \(\theta_0\) denoted as the intercept.
The prior distribution over the parameters \(\theta\) is chosen to be a Multivariate Gaussian distribution:
where \(\mu\in \mathbb{R}^{D+1}\) is the mean vector, \(\Sigma \in \mathbb{R}^{(D+1) \times (D+1)}\) is the covariance matrix, \(0\) is the zero vector and \(I_{D+1}\) is the identity matrix of order \(D+1\).
To simplify the notation, let us define the vectorized response variable \(\symbf{y}=(y_1, \ldots, y_K)\) and the design matrix \(X\in \mathbb{R}^{K \times (D+1)}\) as the input to the model:
The likelihood of the data is given by the product of the Bernoulli distributions as:
where \(X_k=(1, x_{k, 1}, \ldots, x_{k,D})\) is the \(k\)-th row of the design matrix \(X\).
Then, the potential function can be expressed as:
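A minimal sketch of the BLR potential as the negative log-posterior, combining the Bernoulli log-likelihood with a Gaussian log-prior. It assumes a zero-mean isotropic prior \(\theta \sim \mathcal{N}(0, \alpha I_{D+1})\); the prior variance `alpha`, its default value, and all function names are hypothetical choices for illustration.

```python
import numpy as np


def add_intercept(covariates):
    """Prepend a column of ones so that theta_0 acts as the intercept."""
    ones = np.ones((covariates.shape[0], 1))
    return np.hstack([ones, covariates])


def blr_potential(theta, X, y, alpha=100.0):
    """Negative log-posterior up to a constant (assumed N(0, alpha * I) prior)."""
    logits = X @ theta
    # log(1 + exp(z)) evaluated stably via logaddexp(0, z)
    log_likelihood = np.sum(y * logits - np.logaddexp(0.0, logits))
    log_prior = -0.5 * (theta @ theta) / alpha
    return -(log_likelihood + log_prior)


def blr_gradient(theta, X, y, alpha=100.0):
    """Analytic gradient of blr_potential with respect to theta."""
    logits = X @ theta
    # Sigmoid written as exp(-log(1 + exp(-z))) to avoid overflow
    probabilities = np.exp(-np.logaddexp(0.0, -logits))
    return X.T @ (probabilities - y) + theta / alpha


# Synthetic data for illustration only (not one of the benchmark datasets)
rng = np.random.default_rng(0)
X = add_intercept(rng.standard_normal((50, 3)))
true_theta = np.array([0.5, -1.0, 2.0, 0.0])
y = (rng.random(50) < np.exp(-np.logaddexp(0.0, -(X @ true_theta)))).astype(float)
print(blr_potential(np.zeros(4), X, y))
```

At \(\theta = 0\) every predicted probability equals \(1/2\), so the potential reduces to \(K \log 2\), which gives a quick sanity check for any implementation.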
Available Datasets
The following datasets are included for benchmarking the BLR model:
Dataset | D | K | Reference |
---|---|---|---|
German | 25 | 1000 | German Credit Dataset |
Sonar | 61 | 208 | Sonar Dataset |
Musk | 167 | 476 | Musk Dataset (Version 1) |
Secom | 444 | 1567 | SECOM Dataset |
Dynamic COVID-19 Epidemiological Models
A SEIR (Susceptible-Exposed-Infectious-Removed) dynamic compartmental mechanistic epidemiological model, with a time-dependent transmission rate parametrized using Bayesian P-splines, is applied to model the COVID-19 incidence data in the Basque Country (Spain).
The \(SEIR\) model consists of the following compartments and quantities:
- \(S\): Number of individuals that are susceptible to infection
- \(E_1, \ldots, E_M\): Number of individuals at different stages of exposure (infected but not yet infectious)
- \(I_1, \ldots, I_K\): Number of infectious individuals
- \(R\): Number of individuals removed from the pool of susceptible individuals
- \(C_I\): Counter of the total number of individuals that have been infected
- \(\beta(t)\): Time-dependent transmission rate
The transmission rate \(\beta(t)\) is modeled using B-splines:
where \(\{B_i (t)\}_{i=1}^m\) form a B-spline basis over the time interval \([t_0, t_1]\), with \(m=q+d-1\) (\(q\) is the number of knots, \(d\) is the degree of the polynomials of the B-splines); and \(\beta=(\beta_1, \ldots, \beta_m)\) is a vector of coefficients.
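The basis count \(m=q+d-1\) can be checked with a small Cox-de Boor implementation. This is a sketch under two assumptions: the \(q\) knots are equally spaced on \([t_0, t_1]\) with the end knots repeated \(d\) extra times (a clamped knot vector), and the spline \(\sum_{i=1}^m \beta_i B_i(t)\) gives the transmission rate directly. If the model instead places the spline on \(\log \beta(t)\), the result would need to be exponentiated. All function names are illustrative.

```python
import numpy as np


def clamped_knot_vector(t0, t1, n_knots, degree):
    """Equally spaced knots on [t0, t1], end knots repeated `degree` extra times."""
    knots = np.linspace(t0, t1, n_knots)
    return np.concatenate([np.full(degree, t0), knots, np.full(degree, t1)])


def bspline_basis(t, knots, degree):
    """Evaluate all B-spline basis functions at times t (Cox-de Boor recursion).

    Returns an array of shape (len(t), m) with m = len(knots) - degree - 1.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    # Degree-0 basis: indicator of each half-open knot interval.
    basis = np.zeros((t.size, len(knots) - 1))
    for i in range(len(knots) - 1):
        basis[:, i] = (knots[i] <= t) & (t < knots[i + 1])
    # Close the last non-empty interval so that t = t1 is covered.
    last = np.max(np.nonzero(knots[:-1] < knots[1:])[0])
    basis[t == knots[-1], last] = 1.0

    for k in range(1, degree + 1):
        new_basis = np.zeros((t.size, len(knots) - k - 1))
        for i in range(len(knots) - k - 1):
            left_den = knots[i + k] - knots[i]
            right_den = knots[i + k + 1] - knots[i + 1]
            # Terms with a zero denominator (repeated knots) are dropped.
            if left_den > 0:
                new_basis[:, i] += (t - knots[i]) / left_den * basis[:, i]
            if right_den > 0:
                new_basis[:, i] += (knots[i + k + 1] - t) / right_den * basis[:, i + 1]
        basis = new_basis
    return basis


def spline_value(t, coefficients, knots, degree):
    """Evaluate sum_i beta_i * B_i(t) at each time in t."""
    return bspline_basis(t, knots, degree) @ coefficients


q, d = 6, 3  # number of knots and spline degree (illustrative values)
knots = clamped_knot_vector(0.0, 10.0, q, d)
times = np.linspace(0.0, 10.0, 5)
coefficients = np.linspace(0.2, 0.6, q + d - 1)  # hypothetical beta_1..beta_m
print(bspline_basis(times, knots, d).shape)      # (5, q + d - 1)
print(spline_value(times, coefficients, knots, d))
```

With a clamped knot vector the basis functions are non-negative and sum to one on \([t_0, t_1]\), so the spline stays within the range of its coefficients.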
The \(SEIR\) model is governed by the following system of ODEs:
with the following constraints:
Talbot Physical Effect
This benchmark involves a partial differential equation (PDE) describing the phenomenon that occurs when a plane light wave is diffracted by an infinite set of equally spaced slits (the grating, with distance \(d\) between the slits).
We wish to find solutions to the following differential equation:
in the domain \(0 \leq x \leq \frac{d}{2}\), \(z \geq 0\), \(t \geq 0\) under the boundary conditions in \(x\):
the boundary conditions in \(z\):
and the initial conditions:
The solution can be expressed in closed-form as:
Solving the problem entails numerically approximating the complex integral, which involves:
- A Bessel function of the first kind.
- A removable singularity as \(\tau \rightarrow z\).
- A composition of two highly oscillatory functions.