| Title: | Bayesian Predictive Stacking for Scalable Geospatial Transfer Learning |
|---|---|
| Description: | Provides functions for Bayesian Predictive Stacking within the Bayesian transfer learning framework for geospatial artificial systems, as introduced in "Bayesian Transfer Learning for Artificially Intelligent Geospatial Systems: A Predictive Stacking Approach" (Presicce and Banerjee, 2025) <doi:10.48550/arXiv.2410.09504>. This methodology enables efficient Bayesian geostatistical modeling, utilizing predictive stacking to improve inference across spatial datasets. The core functions leverage 'C++' for high-performance computation, making the framework well-suited for large-scale spatial data analysis in parallel and distributed computing environments. Designed for scalability, it allows seamless application in computationally demanding scenarios. |
| Authors: | Luca Presicce [aut, cre]
|
| Maintainer: | Luca Presicce <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 2.0-1 |
| Built: | 2026-06-06 05:56:10 UTC |
| Source: | https://github.com/lucapresicce/spbps |
Compute the Euclidean distance matrix
arma_dist(X)arma_dist(X)
X |
matrix (tipically of |
matrix distance matrix of the elements of
Gibbs sampler for Conjugate Bayesian Multivariate Linear Models
bayesMvLMconjugate(Y, X, mu_B, V_B, nu, Psi, n_iter = 1000, burn_in = 500)bayesMvLMconjugate(Y, X, mu_B, V_B, nu, Psi, n_iter = 1000, burn_in = 500)
Y |
matrix |
X |
matrix |
mu_B |
matrix |
V_B |
matrix |
nu |
double prior parameter for |
Psi |
matrix prior parameter for |
n_iter |
integer iteration number for Gibbs sampler |
burn_in |
integer number of burn-in iteration |
B_samples array of posterior sample for
Sigma_samples array of posterior samples for
## Generate data n <- 100 p <- 3 q <- 2 Y <- matrix(rnorm(n*q), nrow = n, ncol = q) X <- matrix(rnorm(n*p), nrow = n, ncol = p) ## Prior parameters mu_B <- matrix(0, p, q) V_B <- diag(10, p) nu <- 3 Psi <- diag(q) ## Samples from posteriors n_iter <- 1000 burn_in <- 500 set.seed(1234) samples <- spBPS::bayesMvLMconjugate(Y, X, mu_B, V_B, nu, Psi, n_iter, burn_in)## Generate data n <- 100 p <- 3 q <- 2 Y <- matrix(rnorm(n*q), nrow = n, ncol = q) X <- matrix(rnorm(n*p), nrow = n, ncol = p) ## Prior parameters mu_B <- matrix(0, p, q) V_B <- diag(10, p) nu <- 3 Psi <- diag(q) ## Samples from posteriors n_iter <- 1000 burn_in <- 500 set.seed(1234) samples <- spBPS::bayesMvLMconjugate(Y, X, mu_B, V_B, nu, Psi, n_iter, burn_in)
Combine subset models wiht BPS
BPS_combine(fit_list, K, rp)BPS_combine(fit_list, K, rp)
fit_list |
list K fitted model outputs composed by two elements each: first named |
K |
integer number of folds |
rp |
double percentage of observations to take into account for optimization ( |
matrix posterior predictive density evaluations (each columns represent a different model)
Perform the BPS sampling from posterior and posterior predictive given a set of stacking weights
BPS_post_MvT(data, X_u, priors, coords, crd_u, hyperpar, W, R)BPS_post_MvT(data, X_u, priors, coords, crd_u, hyperpar, W, R)
data |
list two elements: first named |
X_u |
matrix unobserved instances covariate matrix |
priors |
list priors: named |
coords |
matrix sample coordinates for X and Y |
crd_u |
matrix unboserved instances coordinates |
hyperpar |
list two elemets: first named |
W |
matrix set of stacking weights |
R |
integer number of desired samples |
list BPS posterior predictive samples
Compute the BPS posterior samples given a set of stacking weights
BPS_postdraws_MvT(data, priors, coords, hyperpar, W, R, par)BPS_postdraws_MvT(data, priors, coords, hyperpar, W, R, par)
data |
list two elements: first named |
priors |
list priors: named |
coords |
matrix sample coordinates for X and Y |
hyperpar |
list two elemets: first named |
W |
matrix set of stacking weights |
R |
integer number of desired samples |
par |
if |
matrix BPS posterior samples
Compute the BPS spatial prediction given a set of stacking weights
BPS_pred_MvT(data, X_u, priors, coords, crd_u, hyperpar, W, R)BPS_pred_MvT(data, X_u, priors, coords, crd_u, hyperpar, W, R)
data |
list two elements: first named |
X_u |
matrix unobserved instances covariate matrix |
priors |
list priors: named |
coords |
matrix sample coordinates for X and Y |
crd_u |
matrix unboserved instances coordinates |
hyperpar |
list two elemets: first named |
W |
matrix set of stacking weights |
R |
integer number of desired samples |
list BPS posterior predictive samples
Combine subset models wiht Pseudo-BMA
BPS_PseudoBMA(fit_list)BPS_PseudoBMA(fit_list)
fit_list |
list K fitted model outputs composed by two elements each: first named |
matrix posterior predictive density evaluations (each columns represent a different model)
Compute the BPS weights by convex optimization
BPS_weights_MvT(data, priors, coords, hyperpar, K)BPS_weights_MvT(data, priors, coords, hyperpar, K)
data |
list two elements: first named |
priors |
list priors: named |
coords |
matrix sample coordinates for X and Y |
hyperpar |
list two elemets: first named |
K |
integer number of folds |
matrix posterior predictive density evaluations (each columns represent a different model)
Solver for Bayesian Predictive Stacking of Predictive densities convex optimization problem
conv_opt(scores)conv_opt(scores)
scores |
matrix |
W matrix of Bayesian Predictive Stacking weights for the K models considered
Compute the BPS weights by convex optimization
CVXR_opt(scores)CVXR_opt(scores)
scores |
matrix |
conv_opt function to perform convex optimiazion with CVXR R package
Evaluate the density of a set of unobserved response with respect to the conditional posterior predictive
d_pred_cpp_MvT(data, X_u, Y_u, d_u, d_us, hyperpar, poster)d_pred_cpp_MvT(data, X_u, Y_u, d_u, d_us, hyperpar, poster)
data |
list two elements: first named |
X_u |
matrix unobserved instances covariate matrix |
Y_u |
matrix unobserved instances response matrix |
d_u |
matrix unobserved instances distance matrix |
d_us |
matrix cross-distance between unobserved and observed instances matrix |
hyperpar |
list two elemets: first named |
poster |
list output from |
double posterior predictive density evaluation
Compute the KCV of the density evaluations for fixed values of the hyperparameters
dens_kcv_MvT(data, priors, coords, hyperpar, K)dens_kcv_MvT(data, priors, coords, hyperpar, K)
data |
list two elements: first named |
priors |
list priors: named |
coords |
matrix sample coordinates for X and Y |
hyperpar |
list two elemets: first named |
K |
integer number of folds |
vector posterior predictive density evaluations
Compute the LOOCV of the density evaluations for fixed values of the hyperparameters
dens_loocv_MvT(data, priors, coords, hyperpar)dens_loocv_MvT(data, priors, coords, hyperpar)
data |
list two elements: first named |
priors |
list priors: named |
coords |
matrix sample coordinates for X and Y |
hyperpar |
list two elemets: first named |
vector posterior predictive density evaluations
expand.grid() in R)Build a grid from two vector (i.e. equivalent to expand.grid() in R)
expand_grid_cpp(x, y)expand_grid_cpp(x, y)
x |
vector first vector of numeric elements |
y |
vector second vector of numeric elements |
matrix expanded grid of combinations
and (i.e. updated parameters)Compute the parameters for the posteriors distribution of and (i.e. updated parameters)
fit_cpp_MvT(data, priors, coords, hyperpar)fit_cpp_MvT(data, priors, coords, hyperpar)
data |
list two elements: first named |
priors |
list priors: named |
coords |
matrix sample coordinates for X and Y |
hyperpar |
list two elemets: first named |
list posterior update parameters
Function to subset data for meta-analysis
forceSymmetry_cpp(mat)forceSymmetry_cpp(mat)
mat |
matrix not-symmetric matrix |
matrix symmetric matrix (lower triangular of mat is used)
Return the CV predictive density evaluations for all the model combinations
models_dens_MvT(data, priors, coords, hyperpar, useKCV, K)models_dens_MvT(data, priors, coords, hyperpar, useKCV, K)
data |
list two elements: first named |
priors |
list priors: named |
coords |
matrix sample coordinates for X and Y |
hyperpar |
list two elemets: first named |
useKCV |
if |
K |
integer number of folds |
matrix posterior predictive density evaluations (each columns represent a different model)
Sample R draws from the posterior distributions
post_draws_MvT(poster, R, par, p)post_draws_MvT(poster, R, par, p)
poster |
list output from |
R |
integer number of posterior samples |
par |
if |
p |
integer if |
list posterior samples
Predictive sampler for Conjugate Bayesian Multivariate Linear Models
pred_bayesMvLMconjugate(X_new, B_samples, Sigma_samples)pred_bayesMvLMconjugate(X_new, B_samples, Sigma_samples)
X_new |
matrix |
B_samples |
array of posterior sample for |
Sigma_samples |
array of posterior samples for |
Y_pred matrix of posterior mean for response matrix Y predictions
Y_pred_samples array of posterior predictive sample for response matrix Y
## Generate data n <- 100 p <- 3 q <- 2 Y <- matrix(rnorm(n*q), nrow = n, ncol = q) X <- matrix(rnorm(n*p), nrow = n, ncol = p) ## Prior parameters mu_B <- matrix(0, p, q) V_B <- diag(10, p) nu <- 3 Psi <- diag(q) ## Samples from posteriors n_iter <- 1000 burn_in <- 500 set.seed(1234) samples <- spBPS::bayesMvLMconjugate(Y, X, mu_B, V_B, nu, Psi, n_iter, burn_in) ## Extract posterior samples B_samples <- samples$B_samples Sigma_samples <- samples$Sigma_samples ## Samples from predictive posterior (based posterior samples) m <- 50 X_new <- matrix(rnorm(m*p), nrow = m, ncol = p) pred <- spBPS::pred_bayesMvLMconjugate(X_new, B_samples, Sigma_samples)## Generate data n <- 100 p <- 3 q <- 2 Y <- matrix(rnorm(n*q), nrow = n, ncol = q) X <- matrix(rnorm(n*p), nrow = n, ncol = p) ## Prior parameters mu_B <- matrix(0, p, q) V_B <- diag(10, p) nu <- 3 Psi <- diag(q) ## Samples from posteriors n_iter <- 1000 burn_in <- 500 set.seed(1234) samples <- spBPS::bayesMvLMconjugate(Y, X, mu_B, V_B, nu, Psi, n_iter, burn_in) ## Extract posterior samples B_samples <- samples$B_samples Sigma_samples <- samples$Sigma_samples ## Samples from predictive posterior (based posterior samples) m <- 50 X_new <- matrix(rnorm(m*p), nrow = m, ncol = p) pred <- spBPS::pred_bayesMvLMconjugate(X_new, B_samples, Sigma_samples)
Generates posterior predictive samples at new spatial locations using a
fitted spBPS object. Memory usage is controlled via
pred_batch_size
## S3 method for class 'spBPS' predict( object, newdata, draws = 200L, pred_batch_size = 200L, n_cores = 1L, return_summary = FALSE, probs = c(0.025, 0.975), ... )## S3 method for class 'spBPS' predict( object, newdata, draws = 200L, pred_batch_size = 200L, n_cores = 1L, return_summary = FALSE, probs = c(0.025, 0.975), ... )
object |
Object of class |
newdata |
List with elements |
draws |
Integer. Number of posterior predictive draws. Default 200. |
pred_batch_size |
Integer. Number of prediction sites processed per batch. Default 200. Smaller values use less RAM; larger values may be faster. |
n_cores |
Integer. Number of OMP threads. Default 1. |
return_summary |
Logical. If |
probs |
Numeric vector of length 2. Quantile probabilities for credible intervals when |
... |
Currently unused. |
When return_summary = FALSE, a list with arrays Wu,
Yu, MY of dimension u x q x R.
When return_summary = TRUE, a list with u x q matrices
mu_Yu, sd_Yu, q_lo, q_hi, mu_Wu.
n <- 1000 p <- 2 q <- 1 Y <- matrix(rnorm(n * q), ncol = q) X <- matrix(rnorm(n * p), ncol = p) coords <- matrix(runif(n * 2), ncol = 2) data <- list(Y = Y, X = X) priors <- list(mu_B = matrix(0, nrow = p, ncol = q), V_r = diag(10, p), Psi = diag(1, q), nu = 3) hyperpar <- list(alpha = 0.5, phi = 1) res <- spBPS(data, priors, coords, hyperpar, subset_size = 200) u <- 500 p <- 2 q <- 1 X_u <- matrix(rnorm(u * p), ncol = p) crd_u <- matrix(runif(u * 2), ncol = 2) # Predictive sampling pred <- predict(res, newdata=list(X=X_u, coords=crd_u), draws=200L, pred_batch_size=500L) # Summary statistics only (memory-efficient for large u) pred_sum <- predict(res, newdata=list(X=X_u, coords=crd_u), draws=500L, return_summary=TRUE, probs=c(0.025, 0.975))n <- 1000 p <- 2 q <- 1 Y <- matrix(rnorm(n * q), ncol = q) X <- matrix(rnorm(n * p), ncol = p) coords <- matrix(runif(n * 2), ncol = 2) data <- list(Y = Y, X = X) priors <- list(mu_B = matrix(0, nrow = p, ncol = q), V_r = diag(10, p), Psi = diag(1, q), nu = 3) hyperpar <- list(alpha = 0.5, phi = 1) res <- spBPS(data, priors, coords, hyperpar, subset_size = 200) u <- 500 p <- 2 q <- 1 X_u <- matrix(rnorm(u * p), ncol = p) crd_u <- matrix(runif(u * 2), ncol = 2) # Predictive sampling pred <- predict(res, newdata=list(X=X_u, coords=crd_u), draws=200L, pred_batch_size=500L) # Summary statistics only (memory-efficient for large u) pred_sum <- predict(res, newdata=list(X=X_u, coords=crd_u), draws=500L, return_summary=TRUE, probs=c(0.025, 0.975))
Draw from the conditional posterior predictive for a set of unobserved covariates
r_pred_cond_MvT(data, X_u, d_u, d_us, hyperpar, poster, post)r_pred_cond_MvT(data, X_u, d_u, d_us, hyperpar, poster, post)
data |
list two elements: first named |
X_u |
matrix unobserved instances covariate matrix |
d_u |
matrix unobserved instances distance matrix |
d_us |
matrix cross-distance between unobserved and observed instances matrix |
hyperpar |
list two elemets: first named |
poster |
list output from |
post |
list output from |
list posterior predictive samples
Draw from the joint posterior predictive for a set of unobserved covariates
r_pred_joint_MvT(data, X_u, d_u, d_us, hyperpar, poster, R)r_pred_joint_MvT(data, X_u, d_u, d_us, hyperpar, poster, R)
data |
list two elements: first named |
X_u |
matrix unobserved instances covariate matrix |
d_u |
matrix unobserved instances distance matrix |
d_us |
matrix cross-distance between unobserved and observed instances matrix |
hyperpar |
list two elemets: first named |
poster |
list output from |
R |
integer number of posterior predictive samples |
list posterior predictive samples
Draw from the joint posterior predictive for a set of unobserved covariates
r_pred_marg_MvT(data, X_u, d_u, d_us, hyperpar, poster, R)r_pred_marg_MvT(data, X_u, d_u, d_us, hyperpar, poster, R)
data |
list two elements: first named |
X_u |
matrix unobserved instances covariate matrix |
d_u |
matrix unobserved instances distance matrix |
d_us |
matrix cross-distance between unobserved and observed instances matrix |
hyperpar |
list two elemets: first named |
poster |
list output from |
R |
integer number of posterior predictive samples |
list posterior predictive samples
Function to sample integers (index)
sample_index(size, length, p)sample_index(size, length, p)
size |
integer dimension of the set to sample |
length |
integer number of elements to sample |
p |
vector sampling probabilities |
vector sample of integers
Orchestrates subsetting, local stacking weight estimation, global stacking combination, and optional posterior or predictive simulation using Double Bayesian Predictive Stacking for latent spatial regression. Works for both multivariate outcomes and the univariate case via q = 1.
spBPS( data, priors, coords, hyperpar, subset_size = 500L, K = NULL, cv_folds = 5L, rp = 1, combine_method = c("bps", "pseudoBMA"), draws = 0L, newdata = NULL, include_latent = FALSE, n_cores = 1L, pred_batch_size = 200L )spBPS( data, priors, coords, hyperpar, subset_size = 500L, K = NULL, cv_folds = 5L, rp = 1, combine_method = c("bps", "pseudoBMA"), draws = 0L, newdata = NULL, include_latent = FALSE, n_cores = 1L, pred_batch_size = 200L )
data |
List with matrices |
priors |
List of priors for the multivariate model: |
coords |
Matrix of observation coordinates (n x d). |
hyperpar |
List with elements |
subset_size |
Target subset size when |
K |
Optional number of subsets. When |
cv_folds |
Number of folds for local cross-validation. Default 5. |
rp |
Fraction of rows used when recomputing global stacking weights
(passed to |
combine_method |
Global combination method: Bayesian Predictive Stacking
( |
draws |
Number of posterior/predictive draws to return (0 to skip
sampling). When positive and |
newdata |
Optional list with |
include_latent |
Unused; kept for compatibility. |
n_cores |
Number of cores for parallel computation. Controls both the local weight estimation (parallel over K subsets) and predictive sampling. Default 1. |
pred_batch_size |
Batch size for streaming prediction. Controls the
maximum number of prediction sites processed at once, hence the peak memory
usage. Default 200. Rule of thumb: peak RAM (MB) ? batch_size x q x draws x 8 / 1e6.
Set |
Object of class "spBPS" ? a list with components:
Partition information: Y_list, X_list, crd_list.
K-vector of global stacking weights.
K-list of local stacking weight vectors.
K-list of n_k x J log-density matrices.
Stored for use by predict.spBPS.
Named numeric vector: fitting, combination, sampling (seconds).
(if draws > 0) List with three-dimensional arrays:
beta (p x q x R), sigma (q x q x R), model (R).
(if draws > 0 and newdata supplied) List with
three-dimensional arrays: Wu, Yu, MY (each u x q x R).
predict.spBPS for generating predictions on new data
after fitting.
n <- 1000 p <- 2 q <- 1 Y <- matrix(rnorm(n * q), ncol = q) X <- matrix(rnorm(n * p), ncol = p) coords <- matrix(runif(n * 2), ncol = 2) data <- list(Y = Y, X = X) priors <- list(mu_B = matrix(0, nrow = p, ncol = q), V_r = diag(10, p), Psi = diag(1, q), nu = 3) hyperpar <- list(alpha = 0.5, phi = 1) res <- spBPS(data, priors, coords, hyperpar, subset_size = 200)n <- 1000 p <- 2 q <- 1 Y <- matrix(rnorm(n * q), ncol = q) X <- matrix(rnorm(n * p), ncol = p) coords <- matrix(runif(n * 2), ncol = 2) data <- list(Y = Y, X = X) priors <- list(mu_B = matrix(0, nrow = p, ncol = q), V_r = diag(10, p), Psi = diag(1, q), nu = 3) hyperpar <- list(alpha = 0.5, phi = 1) res <- spBPS(data, priors, coords, hyperpar, subset_size = 200)
Orchestrates subsetting, local stacking weight estimation, global stacking
combination, and optional posterior or predictive simulation using the
multivariate Student-t spatial model. Works for both multivariate outcomes
and the univariate case via q = 1.
spBPS_old( data, priors, coords, hyperpar, subset_size = 500L, K = NULL, cv_folds = 5L, rp = 1, combine_method = c("bps", "pseudoBMA"), draws = 0L, newdata = NULL, include_latent = FALSE, cores = NULL )spBPS_old( data, priors, coords, hyperpar, subset_size = 500L, K = NULL, cv_folds = 5L, rp = 1, combine_method = c("bps", "pseudoBMA"), draws = 0L, newdata = NULL, include_latent = FALSE, cores = NULL )
data |
List with matrices |
priors |
List of priors for the multivariate model ( |
coords |
Matrix of observation coordinates. |
hyperpar |
List with elements |
subset_size |
Target subset size when |
K |
Optional number of subsets. When |
cv_folds |
Number of folds for local cross-validation (default 5). |
rp |
Fraction of rows used when recomputing global stacking weights
(passed to |
combine_method |
Choose between Bayesian Predictive Stacking ( |
draws |
Number of joint posterior/predictive draws to return (0 to
skip). When positive, |
newdata |
Optional list with |
include_latent |
Logical; if |
cores |
Optional integer; when >1 a parallel backend is registered
internally via |
List with components subsets, weights_global, weights_local,
epd, and optional posterior and predictive draws.
n <- 1000 p <- 2 q <- 1 Y <- matrix(rnorm(n*q), ncol = q) X <- matrix(rnorm(n*p), ncol = p) coords <- matrix(runif(n*2), ncol = 2) data <- list(Y = Y, X = X) priors <- list(mu_B = matrix(0, nrow = p, ncol = q), V_r = diag(10, p), Psi = diag(1, q), nu = 3) hyperpar <- list(alpha = 0.5, phi = 1) subset_size <- 200 res <- spBPS(data, priors, coords, hyperpar, subset_size = subset_size)n <- 1000 p <- 2 q <- 1 Y <- matrix(rnorm(n*q), ncol = q) X <- matrix(rnorm(n*p), ncol = p) coords <- matrix(runif(n*2), ncol = 2) data <- list(Y = Y, X = X) priors <- list(mu_B = matrix(0, nrow = p, ncol = q), V_r = diag(10, p), Psi = diag(1, q), nu = 3) hyperpar <- list(alpha = 0.5, phi = 1) subset_size <- 200 res <- spBPS(data, priors, coords, hyperpar, subset_size = subset_size)
Function to subset data for meta-analysis
subset_data(data, K)subset_data(data, K)
data |
list three elements: first named |
K |
integer number of desired subsets |
list subsets of data, and the set of indexes