Analysis of OC43 Recovery and equations 1 to 4 in the manuscript.
Error in rstan_options(auto_write = TRUE) :
could not find function "rstan_options"
library(future)
library(brms)
library(dplyr)
library(loo)
library(bayesplot)
library(posterior)
library(ggplot2)
library(tidybayes)
library(ggeffects)
library(glmmTMB)
library(bayestestR)
library(logspline)
library(MASS)
library(ggeffects)
library(tidybayes)
library(EnvStats)
library(moments)
library(emmeans)
oc43.data <- read.csv(paste0(c19.dir,"oc43_df.csv"))
bio.avg <- oc43.data %>%
group_by(site, date, bio_rep) %>%
summarise(across(starts_with("recovery"), ~ mean(.x, na.rm=TRUE)), .groups= 'drop')
#bio.avg
final.avg <- bio.avg %>%
group_by(site, date) %>%
summarise(across(starts_with("recovery"), ~ mean(.x, na.rm=TRUE)), .groups= 'drop')
final.avg$site <- factor(final.avg$site, levels = paste0("SS", 1:11) )
final.avg$date <- as.Date(final.avg$date)
final.avg <- final.avg %>%
mutate(epi_month = floor_date(date, "month"))
#filter out 2020-10-01 because it is the earliest month in the study and has 0 or 1 recovery measurement.
final.avg <- final.avg %>%
filter(epi_month != "2020-10-01")
#final.avg
To track the overall performance of laboratory methods used in our study of SARS-CoV-2 in wastewater samples, a Bayesian model was used to estimate the recovery of the OC43 process control spiked into these samples. It uses a student-t distribution for the recovery and is a simple intercept model with no predictors.
The model can be expressed as:
\[ y_i \sim \beta_0 + \epsilon_i \] Where:
\(y_i\) represents the log10 scale outcome variable (“Percent OC43 Recovery”) for observation \(i\)
\(\beta_0\) is the fixed intercept
\(\epsilon_i\) is the error term for observation \(i\)
# brms.1 <- brm(
# log10(recovery) ~ 1 ,
# data = datefact.avg,
# family =student(),
# prior = c(
# prior(normal(100, 10), class = "Intercept"),
# prior(gamma(2, 0.1), class = "nu") #degrees of freedom for student-t
# ),
# warmup = 500,
# iter = 4000,
# control=list(adapt_delta =0.95, max_treedepth=10),
# chains =4,
# sample_prior = "yes",
# seed = 513,
# save_pars = save_pars(all = TRUE),
# save_ranef = TRUE,
# save_all_pars = TRUE,
# file ='brm_1.rds',
# save_model = 'brm_1_model.txt',
# future=TRUE
# )
Family: student
Links: mu = identity; sigma = identity; nu = identity
Formula: log10(recovery) ~ 1
Data: datefact.avg (Number of observations: 679)
Draws: 4 chains, each with iter = 4000; warmup = 500; thin = 1;
total post-warmup draws = 14000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.6759 0.0148 1.6466 1.7047 1.0001 11191 9332
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.3601 0.0146 0.3311 0.3889 1.0000 7047 7516
nu 11.5832 4.2689 6.3149 22.6450 1.0002 7233 7186
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
prior class coef group resp dpar nlpar lb ub source
normal(100, 10) Intercept user
gamma(2, 0.1) nu 1 user
student_t(3, 0, 2.5) sigma 0 default
Inference for Stan model: anon_model.
4 chains, each with iter=4000; warmup=500; thin=1;
post-warmup draws per chain=3500, total post-warmup draws=14000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b_Intercept 1.6759 0.0001 0.0148 1.6466 1.6659 1.6758 1.6859 1.7047 11174 1.0000
sigma 0.3601 0.0002 0.0146 0.3311 0.3503 0.3600 0.3700 0.3889 7056 1.0000
nu 11.5832 0.0519 4.2689 6.3149 8.7706 10.7009 13.2530 22.6450 6757 1.0000
Intercept 1.6759 0.0001 0.0148 1.6466 1.6659 1.6758 1.6859 1.7047 11174 1.0000
prior_Intercept 99.9494 0.0857 10.0527 80.3678 93.1786 99.9298 106.7891 119.5336 13752 1.0000
prior_sigma 2.7933 0.0324 3.7866 0.0785 0.8715 1.8910 3.5549 10.5708 13624 0.9999
prior_nu 20.0987 0.1203 14.0555 2.6339 9.7630 16.9294 27.0756 55.4785 13660 0.9998
lprior -56.1619 0.0018 0.1460 -56.5484 -56.1625 -56.1207 -56.1015 -56.0765 6529 0.9998
lp__ -391.4392 0.0159 1.2234 -394.6053 -392.0144 -391.1137 -390.5397 -390.0498 5950 1.0000
Samples were drawn using NUTS(diag_e) at Thu Apr 3 12:05:33 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The mean recovery is 47.41328 (47.41%) with a 95% credible interval of 44.3200253 to 50.6640612 (44.32 to 50.66).
PPC generates Leave-One-Out (LOO) intervals plot for PPC. It assesses the overall predictive performance of the model and detects potential issues with model specification or data quality (i.e., outliers).
Key Elements of the Plot:
The prob = 0.5 and prob_outer = 0.95 arguments generate 50% and 95% prediction intervals. For prob= 0.5, half of the observed data points should fall within these intervals if the model is well-calibrated. Likewise, for prob_outer =0.95, 95% of the observed data should fall within these intervals.
pp_check(brms.1.read , type="loo_intervals",prob = 0.5, prob_outer= 0.95)+
ggplot2::ylab("Recovery") +
ggplot2::xlab("Index") +
ggplot2::theme(
axis.text = ggplot2::element_text(size = 12),
axis.title = ggplot2::element_text(size = 14)
)
Using all posterior draws for ppc type 'loo_intervals' by default.
pp.draws <- posterior_predict(brms.1.read)
lower.ci <- apply(pp.draws, 2, quantile, prob = 0.025)
upper.ci <- apply(pp.draws, 2, quantile, prob = 0.975)
observed.val <- brms.1.read$data$recovery
within.interval <- (observed.val > 10^lower.ci) & (observed.val < 10^upper.ci)
n.within.interval <- sum(within.interval)
proportion.within.interval <- n.within.interval / length(observed.val)
The number of observed values within the 95% prediction interval is 648, which is 95.43 percent. This indicates that the model is accurate, as most observed data points fall within the 95% prediction interval.
resid.mat <- residuals(brms.1.read, type="ordinary" )
#resid.mat
mean.resid <- apply(resid.mat, 1, mean)
#mean.resid
oc43.withresiduals.df <- brms.1.read$data %>%
mutate(mean.resid = mean.resid,
row.index = row_number() )
#oc43.withresiduals.df
ggplot(oc43.withresiduals.df, aes(x= row.index, y=mean.resid))+
geom_point() +
labs(title ="Mean OC43 Residuals by Sample",
x = "Sample",
y = "Mean Residual"
)
qqnorm(oc43.withresiduals.df$mean.resid, main = "Q-Q Plot of OC43 Recovery Residuals")
qqline(oc43.withresiduals.df$mean.resid, col = "red")
The residuals are symmetrically centered around 0 between -1 and 1 and Q-Q residual plot indicates that the residuals are normally distributed. The consistent performance of the OC43 process control across all wastewater samples demonstrates that the laboratory methods used in our SARS-CoV-2 study are reliable.
c19.dir <- "D:/c19/seq_manuscript/r/"
c19.data <- read.csv(paste0(c19.dir, "sewershed_data.csv"),header=TRUE,na.strings=c("","NA"))
c19.data$date <-as.Date(c19.data$date)
c19.data$site <-as.factor(c19.data$site)
c19.data$phase <- as.factor(c19.data$phase)
c19.data$phase <- relevel(c19.data$phase, ref = "pre-VOC" )
Does SARS-CoV-2 shedding vary by sewershed communities?
The amount of SARS-CoV-2 entering wastewater treatment systems depends on the amount of feces excreted per person, the volume of wastewater per household, and the number of infected individuals.
The shedding function is based on the following equation (see Prasek et al. 2022;10.1016/j.scitotenv.2022.156535):
\[ S_w= \frac{C_w \times F_w}{ M_f\times N_i} \tag{Eq. 1} \]
This equation shows how \(S_f\) (the amount of SARS-CoV-2 shed per gram) can be estimated by using the concentration of N2 gene in wastewater \(C_w\) (copies/L), the wastewater flow \(F_w\) (L/day), the mass of feces excreted by a person \(M_f\) (128 g/day), and the number of infected persons \(N_i\) (COVID-19 cases).
According to Rose (2015,DOI:10.1080/10643389.2014.1000761), the quantity of feces excreted per person/day ranges from 51–796 g with a median, mean, and sigma of 128, 149, and 95.
Regression was used to determine if there are differences in virus shedding. The model is a random intercept model with a Gaussian distribution. The model can be expressed as:
\[ y_i \sim \beta_0 + u_j + \epsilon_i \] Where:
\(y_i\) represents the outcome variable \(S_w\) (“fecal_load”) for observation \(i\), \(\beta_0\) is the fixed intercept
\(u_j\) is the random intercept for site \(j\)
\(\epsilon_i\) is the error term for observation \(i\)
# brm.fecal.load.rand <- brm(
# log10(fecal_load) ~ 1+ (1|site),
# data = c19.data,
# family =gaussian,
# control=list(adapt_delta =0.95,max_treedepth=10), # try this 0.96 next
# warmup = 500,
# iter = 4000,
# chains =4,
# sample_prior = "yes",
# seed = 513,
# save_pars = save_pars(all = TRUE),
# save_ranef = TRUE,
# save_all_pars = TRUE,
# file ='brm_fecal_load_rand.rds', # this should be deleted in the Project directory each time model rerun
# save_model = 'brm.fecal.load.rand_mode.txt',
# future=TRUE
# )
Family: gaussian
Links: mu = identity; sigma = identity
Formula: log10(fecal_load) ~ 1 + (1 | site)
Data: c19.data (Number of observations: 871)
Draws: 4 chains, each with iter = 4000; warmup = 500; thin = 1;
total post-warmup draws = 14000
Multilevel Hyperparameters:
~site (Number of levels: 11)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.2945 0.0815 0.1836 0.4962 1.0010 2029 3134
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 8.4814 0.0915 8.2975 8.6622 1.0012 1502 2565
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.3173 0.0078 0.3024 0.3329 1.0006 6744 7175
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
prior class coef group resp dpar nlpar lb ub source
student_t(3, 8.5, 2.5) Intercept default
student_t(3, 0, 2.5) sd 0 default
student_t(3, 0, 2.5) sd site 0 (vectorized)
student_t(3, 0, 2.5) sd Intercept site 0 (vectorized)
student_t(3, 0, 2.5) sigma 0 default
Inference for Stan model: anon_model.
4 chains, each with iter=4000; warmup=500; thin=1;
post-warmup draws per chain=3500, total post-warmup draws=14000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b_Intercept 8.4814 0.0024 0.0915 8.2975 8.4235 8.4823 8.5402 8.6622 1501 1.0013
sd_site__Intercept 0.2945 0.0018 0.0815 0.1836 0.2383 0.2789 0.3323 0.4962 2001 1.0005
sigma 0.3173 0.0001 0.0078 0.3024 0.3119 0.3172 0.3224 0.3329 6729 1.0006
Intercept 8.4814 0.0024 0.0915 8.2975 8.4235 8.4823 8.5402 8.6622 1501 1.0013
r_site[SS1,Intercept] 0.5674 0.0023 0.0967 0.3825 0.5048 0.5660 0.6282 0.7659 1710 1.0012
r_site[SS10,Intercept] -0.3067 0.0024 0.0975 -0.5001 -0.3687 -0.3087 -0.2445 -0.1116 1716 1.0012
r_site[SS11,Intercept] -0.1788 0.0024 0.0968 -0.3707 -0.2413 -0.1795 -0.1175 0.0114 1647 1.0010
r_site[SS2,Intercept] -0.0831 0.0024 0.0968 -0.2739 -0.1454 -0.0840 -0.0213 0.1082 1656 1.0011
r_site[SS3,Intercept] -0.0949 0.0024 0.0966 -0.2848 -0.1574 -0.0960 -0.0325 0.1009 1670 1.0014
r_site[SS4,Intercept] 0.0764 0.0024 0.0965 -0.1127 0.0132 0.0756 0.1387 0.2699 1643 1.0010
r_site[SS5,Intercept] -0.0093 0.0024 0.0999 -0.2033 -0.0749 -0.0110 0.0541 0.1927 1758 1.0011
r_site[SS6,Intercept] 0.2407 0.0023 0.0953 0.0521 0.1792 0.2399 0.3015 0.4323 1657 1.0010
r_site[SS7,Intercept] -0.1672 0.0024 0.0971 -0.3581 -0.2306 -0.1675 -0.1057 0.0277 1693 1.0009
r_site[SS8,Intercept] -0.2183 0.0024 0.0990 -0.4139 -0.2822 -0.2190 -0.1535 -0.0205 1751 1.0011
r_site[SS9,Intercept] 0.2036 0.0024 0.0973 0.0131 0.1397 0.2019 0.2658 0.3993 1686 1.0011
prior_Intercept 8.4988 0.0354 4.1975 0.4862 6.5833 8.5098 10.3884 16.2909 14036 1.0000
prior_sigma 2.7599 0.0274 3.2370 0.0940 0.8762 1.9162 3.5429 10.4162 13970 1.0003
prior_sd_site 2.8148 0.0318 3.7822 0.0898 0.8823 1.9247 3.6038 10.7338 14168 0.9998
lprior -4.3868 0.0002 0.0069 -4.4050 -4.3889 -4.3850 -4.3825 -4.3797 2031 1.0008
lp__ -257.4895 0.0720 3.4766 -265.2306 -259.6194 -257.1253 -255.0016 -251.6031 2333 1.0005
z_1[1,1] 2.0495 0.0131 0.5734 0.9778 1.6560 2.0349 2.4331 3.1993 1918 1.0006
z_1[1,2] -1.1113 0.0097 0.4236 -1.9595 -1.3907 -1.1070 -0.8231 -0.3140 1890 1.0006
z_1[1,3] -0.6488 0.0087 0.3619 -1.3829 -0.8889 -0.6493 -0.3982 0.0314 1744 1.0006
z_1[1,4] -0.3028 0.0081 0.3346 -0.9754 -0.5261 -0.3012 -0.0740 0.3414 1714 1.0008
z_1[1,5] -0.3449 0.0080 0.3362 -1.0213 -0.5696 -0.3446 -0.1143 0.3073 1753 1.0011
z_1[1,6] 0.2737 0.0081 0.3289 -0.3637 0.0459 0.2735 0.4958 0.9223 1668 1.0009
z_1[1,7] -0.0363 0.0079 0.3377 -0.6963 -0.2641 -0.0394 0.1912 0.6222 1807 1.0010
z_1[1,8] 0.8694 0.0090 0.3762 0.1512 0.6120 0.8635 1.1238 1.6140 1738 1.0007
z_1[1,9] -0.6069 0.0085 0.3587 -1.3263 -0.8442 -0.6062 -0.3608 0.0790 1779 1.0007
z_1[1,10] -0.7908 0.0090 0.3849 -1.5729 -1.0450 -0.7834 -0.5216 -0.0610 1848 1.0007
z_1[1,11] 0.7347 0.0088 0.3672 0.0415 0.4823 0.7262 0.9776 1.4725 1747 1.0009
Samples were drawn using NUTS(diag_e) at Fri Feb 7 12:04:41 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Summary of Fecal Shedding Model
Overall Model Structure:
There are credible differences between sewershed sites as evidenced by the random effect standard deviation (0.2945: 95% CI: 0.1836 to 0.4962) not containing 0.
Fixed Effect (Overall Mean):
Random Effect (Site-Specific Variation):
Residual Variation:
Intraclass Correlation Coefficient (ICC):
The random effect standard deviation (0.2945) is relatively small compared to the overall mean (8.4814) indicating that while there are site-specific differences, they are not large relative to the overall mean.
pp_check(fs.mo1, type="loo_intervals",prob = 0.5,prob_outer= 0.95) +
ggplot2::ylab("Fecal Load") +
ggplot2::xlab("Sample Index") +
ggplot2::theme(
axis.text = ggplot2::element_text(size = 12),
axis.title = ggplot2::element_text(size = 14)
)
Using all posterior draws for ppc type 'loo_intervals' by default.
site_order <- c("SS1", "SS2", "SS3", "SS4", "SS5", "SS6", "SS7", "SS8", "SS9", "SS10", "SS11")
#extract random intercepts
fs.mo1 %>%
spread_draws(r_site[site,term]) %>%
filter(term %in% c("Intercept")) %>%
ungroup() %>%
mutate(site = factor(site, levels = site_order, ordered = TRUE)) %>%
ggplot(aes(y = site, x = r_site)) +
stat_halfeye(.width = c(.50, .95)) +
geom_vline(xintercept = 0, linetype = "dashed") +
facet_wrap(~term, scales = "free_x") +
labs(x = "Random Effect", y = "Site",
title = "Random Effects by Site")+
scale_y_discrete(limits = rev(site_order))
LOO-CV is a statistical method used to estimate the performance of models on data not used during the training of the model. It is a specific configuration of k-fold cross-validation where k is set to the number of examples in the dataset. For a dataset with n observations, LOO-CV will create n different models, each time leaving out one observation from the training set to evaluate performance.
Computed from 14000 by 871 log-likelihood matrix.
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.5]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
Interpretation of PSIS k values:
k < 0.5: The estimate is reliable
0.5 < k < 0.7: The estimate is somewhat reliable
k > 0.7: The estimate is unreliable and should be treated with caution
All 3 models have k < 0.5, which means their estimates are reliable.
contrast.summary <- fs.mo1 %>%
spread_draws(r_site[site,]) %>%
compare_levels(r_site, by = site) %>%
group_by(site) %>%
summarize(
median = median(r_site),
lower_95 = quantile(r_site, 0.025),
upper_95 = quantile(r_site, 0.975)
)
contrast.summary %>% arrange(-desc(median))
#library(tidybayes)
fs.mo1 %>%
spread_draws(r_site[site,]) %>%
compare_levels(r_site, by = site) %>%
ungroup() %>%
mutate(site = reorder(site, r_site)) %>%
ggplot(aes(y = site, x = r_site)) +
stat_halfeye(.width = c(.50, .95)) +
geom_vline(xintercept = 0, linetype = "dashed") +
theme(axis.text.y = element_text(size = 6))
This contrast plot of fecal shedding shows most paired sewersheds have credible differences (i.e., the difference does not contain 0).
SS1 has the highest fecal shedding followed by SS6. All the other sewersheds have fecal shedding estimates that are less than SS6. If SS1 is credibly greater than SS6 then it is also greater than the others.
fs.mo1.post <- posterior_samples(fs.mo1)
ss1.samples <- fs.mo1.post[, "r_site[SS1,Intercept]"]
ss6.samples <- fs.mo1.post[, "r_site[SS6,Intercept]"]
diff.samples <- ss1.samples - ss6.samples
prob.ss1.greater <- mean(diff.samples > 0)
bci.diff <- quantile(diff.samples, c(0.025, 0.975))
hist(diff.samples, main="95% Credible Interval for SS1 - SS6", xlab="Difference")
Conclusion of SS1 vs SS6
The histogram shows the 95% credible interval of the difference of SS1 minus SS6 posterior samples. The 95% CI of the difference is 0.2378 to 0.4169. The probability that SS1 is greater than SS6 is 1. This analysis seems to be at odds with the visual inspection of the half-eye plot (see Section “Halfeye Plot of the Sewershed Intercepts”), which appears to show that SS1 and SS6 share a common range with a marginal overlap of 95% credible intervals. However, the hypothesis test ‘SS1 > SS6’ is based on calculated posterior differences draw-by-draw using the joint distribution of both parameters. This underscores the importance of using numerical precision for hypothesis testing rather than relying solely on visual inspection of random effects. Also, the small overlap of distributions observed with the half-eye plot doesn’t necessarily contradict the hypothesis result; it is more a matter of how uncertainty is represented and combined.
The Fecal Shedding Random Intercept Model (fs.mo1) above shows estimated fecal shedding of SARS-CoV-2 varies by sewershed location, which accounts for 46% of the total variance. Accounting for SARS-CoV-2 VOC shedding rates may improve model performance and therefore better explain the dynamics of infections. VOC phases refers to the time when distinct variants of concern were predominant during the pandemic.
Below is the model with a fixed effect for phase. The model can be expressed as: \[ \log_{10}(y_{ij}) \sim \beta_0 + \beta x_{ij} + u_j + \epsilon_{ij} \\ u_j \sim N(0, \sigma_u^2)\\ \epsilon_{ij} \sim N(0, \sigma_\epsilon^2) \]
Where:
\(y_{ij}\) represents the outcome variable “fecal_load” for observation \(i\) in site \(j\)
\(\beta_0\) is the overall intercept
\(\beta\) is the coefficient for the fixed effect predictor \(x_{ij}\)
\(x_{ij}\) is the value of the fixed effect predictor for observation \(i\) in site \(j\)
\(u_j\) is the random intercept for site \(j\)
\(\epsilon_{ij}\) is the error term for observation \(i\) in site \(j\)
#use defaults priors
# brm.fecal.load.phase.rand <- brm(
# log10(fecal_load) ~ 1+ phase + (1|site),
# data = c19.data,
# family =gaussian,
# control=list(adapt_delta =0.95,max_treedepth=10), # try this 0.96 next
# warmup = 500,
# iter = 4000,
# chains =4,
# sample_prior = "yes",
# seed = 513,
# save_pars = save_pars(all = TRUE),
# save_ranef = TRUE,
# save_all_pars = TRUE,
# file ='brm_fs_phase.rds', # this should be deleted in the Project directory each time model rerun
# save_model = 'brm_fs_phase_model.txt'
# )
Family: gaussian
Links: mu = identity; sigma = identity
Formula: log10(fecal_load) ~ 1 + phase + (1 | site)
Data: c19.data (Number of observations: 871)
Draws: 4 chains, each with iter = 4000; warmup = 500; thin = 1;
total post-warmup draws = 14000
Multilevel Hyperparameters:
~site (Number of levels: 11)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.2918 0.0828 0.1796 0.4946 1.0013 2102 3784
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 8.4325 0.0993 8.2286 8.6296 1.0022 2153 3184
phasealpha 0.1104 0.0306 0.0499 0.1703 1.0003 6676 7528
phasedelta 0.0250 0.0333 -0.0400 0.0899 1.0004 6635 7944
phaseomicron -0.0564 0.0412 -0.1379 0.0252 1.0000 7494 8639
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.3124 0.0075 0.2984 0.3273 1.0001 9575 8987
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
prior class coef group resp dpar nlpar lb ub source
(flat) b default
(flat) b phasealpha (vectorized)
(flat) b phasedelta (vectorized)
(flat) b phaseomicron (vectorized)
student_t(3, 8.5, 2.5) Intercept default
student_t(3, 0, 2.5) sd 0 default
student_t(3, 0, 2.5) sd site 0 (vectorized)
student_t(3, 0, 2.5) sd Intercept site 0 (vectorized)
student_t(3, 0, 2.5) sigma 0 default
Inference for Stan model: anon_model.
4 chains, each with iter=4000; warmup=500; thin=1;
post-warmup draws per chain=3500, total post-warmup draws=14000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b_Intercept 8.4325 0.0021 0.0993 8.2286 8.3713 8.4336 8.4946 8.6296 2162 1.0020
b_phasealpha 0.1104 0.0004 0.0306 0.0499 0.0899 0.1104 0.1311 0.1703 6663 1.0003
b_phasedelta 0.0250 0.0004 0.0333 -0.0400 0.0022 0.0249 0.0479 0.0899 6617 1.0004
b_phaseomicron -0.0564 0.0005 0.0412 -0.1379 -0.0844 -0.0566 -0.0288 0.0252 7480 1.0000
sd_site__Intercept 0.2918 0.0017 0.0828 0.1796 0.2351 0.2772 0.3311 0.4946 2265 1.0008
sigma 0.3124 0.0001 0.0075 0.2984 0.3072 0.3122 0.3173 0.3273 9549 0.9998
Intercept 8.4829 0.0021 0.0955 8.2907 8.4251 8.4841 8.5419 8.6704 2070 1.0019
r_site[SS1,Intercept] 0.5647 0.0021 0.1006 0.3677 0.5016 0.5624 0.6258 0.7715 2289 1.0016
r_site[SS10,Intercept] -0.3080 0.0021 0.1008 -0.5085 -0.3705 -0.3084 -0.2455 -0.1053 2343 1.0016
r_site[SS11,Intercept] -0.1846 0.0021 0.1001 -0.3840 -0.2470 -0.1860 -0.1222 0.0157 2254 1.0016
r_site[SS2,Intercept] -0.0851 0.0021 0.1004 -0.2838 -0.1478 -0.0862 -0.0228 0.1154 2278 1.0015
r_site[SS3,Intercept] -0.0969 0.0021 0.1002 -0.2948 -0.1597 -0.0978 -0.0355 0.1054 2283 1.0018
r_site[SS4,Intercept] 0.0735 0.0021 0.1005 -0.1243 0.0091 0.0725 0.1348 0.2750 2292 1.0016
r_site[SS5,Intercept] 0.0025 0.0021 0.1031 -0.1986 -0.0628 0.0009 0.0675 0.2091 2442 1.0015
r_site[SS6,Intercept] 0.2367 0.0021 0.0991 0.0417 0.1742 0.2355 0.2971 0.4383 2202 1.0018
r_site[SS7,Intercept] -0.1718 0.0021 0.1006 -0.3720 -0.2345 -0.1733 -0.1095 0.0294 2264 1.0015
r_site[SS8,Intercept] -0.2082 0.0021 0.1032 -0.4124 -0.2738 -0.2087 -0.1437 -0.0024 2351 1.0015
r_site[SS9,Intercept] 0.2008 0.0021 0.1004 0.0047 0.1382 0.1995 0.2631 0.4040 2213 1.0019
prior_Intercept 8.4580 0.0360 4.1678 0.5827 6.6062 8.4799 10.3828 15.9476 13385 1.0000
prior_sigma 2.7776 0.0269 3.2023 0.0796 0.8974 1.9666 3.6180 10.3951 14153 0.9999
prior_sd_site 2.7539 0.0279 3.2994 0.0789 0.8699 1.9002 3.5505 10.5385 14032 1.0004
lprior -4.3864 0.0002 0.0076 -4.4047 -4.3885 -4.3846 -4.3820 -4.3792 2390 1.0009
lp__ -243.7884 0.0799 3.7855 -252.1519 -246.1028 -243.4220 -241.0800 -237.3456 2246 1.0005
z_1[1,1] 2.0621 0.0122 0.5893 0.9504 1.6540 2.0468 2.4516 3.2819 2326 1.0011
z_1[1,2] -1.1281 0.0094 0.4334 -1.9962 -1.4182 -1.1183 -0.8314 -0.2908 2114 1.0009
z_1[1,3] -0.6767 0.0080 0.3708 -1.3959 -0.9263 -0.6798 -0.4217 0.0458 2133 1.0011
z_1[1,4] -0.3130 0.0072 0.3426 -0.9941 -0.5429 -0.3126 -0.0804 0.3533 2272 1.0013
z_1[1,5] -0.3562 0.0072 0.3441 -1.0424 -0.5867 -0.3586 -0.1242 0.3204 2255 1.0016
z_1[1,6] 0.2668 0.0069 0.3377 -0.3908 0.0323 0.2644 0.4931 0.9197 2427 1.0015
z_1[1,7] 0.0067 0.0068 0.3442 -0.6610 -0.2255 0.0035 0.2383 0.6806 2559 1.0014
z_1[1,8] 0.8632 0.0079 0.3839 0.1186 0.6021 0.8610 1.1156 1.6293 2352 1.0016
z_1[1,9] -0.6301 0.0079 0.3683 -1.3548 -0.8766 -0.6279 -0.3796 0.0856 2180 1.0011
z_1[1,10] -0.7621 0.0084 0.3914 -1.5410 -1.0193 -0.7609 -0.4961 -0.0072 2197 1.0010
z_1[1,11] 0.7315 0.0077 0.3722 0.0121 0.4773 0.7260 0.9813 1.4773 2367 1.0018
Samples were drawn using NUTS(diag_e) at Thu Apr 3 18:04:04 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
The first sewershed in each pair has the greater COVID-19 log10(case/100k) mean. This plot compares pairwise differences of the means.
#library(tidybayes)
fs.mo2 %>%
spread_draws(r_site[site,]) %>%
compare_levels(r_site, by = site) %>%
ungroup() %>%
mutate(site = reorder(site, r_site)) %>%
ggplot(aes(y = site, x = r_site)) +
stat_halfeye() +
geom_vline(xintercept = 0, linetype = "dashed")
pp_check(fs.mo2 , type="loo_intervals",prob = 0.5,prob_outer= 0.95) +
ggplot2::ylab("Fecal Load") +
ggplot2::xlab("Sample Index") +
ggplot2::theme(
axis.text = ggplot2::element_text(size = 12),
axis.title = ggplot2::element_text(size = 14)
)
Using all posterior draws for ppc type 'loo_intervals' by default.
Computed from 14000 by 871 log-likelihood matrix.
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.6, 1.4]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
elpd_diff se_diff
fs.mo2 0.0000 0.0000
fs.mo1 -12.2493 42.3244
Summary of LOO-CV Comparisons of the Two Fecal Shedding Models: The 2SE Rule
Using the “2SE rule” to estimate 95% confidence intervals of the
random intercept model, which does not contain the phase factor, has a
95% CI of-12.2 + c(-2,2) * 5.6 = (-23.4, -1.0)
. This CI
does not contain 0 indicating the random intercept model is an inferior
model in terms of predictive accuracy.
alpha.vs.delta.hyp <- brms::hypothesis(fs.mo2, "phasealpha - phasedelta > 0", alpha=0.05)
print(alpha.vs.delta.hyp, digits=4)
Hypothesis Tests for class b:
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
fs.mo2.post <- posterior_samples(fs.mo2)
alpha.samples <- fs.mo2.post[, "b_phasealpha"]
delta.samples <- fs.mo2.post[, "b_phasedelta"]
phase1.diff.samples <- alpha.samples - delta.samples
prob.alpha.greater <- mean(phase1.diff.samples > 0)
phase1.diff <- quantile(phase1.diff.samples, c(0.025, 0.975))
hist(phase1.diff.samples, main="95% Credible Interval for Alpha - Delta", xlab="Difference")
The histogram shows the 95% credible interval of the difference of Alpha minus Delta posterior samples. The 95% CI of the difference is 0.0348082 to 0.1354397. The probability that Alpha is greater than Delta is 0.9995.
delta.vs.omicron.hyp <- brms::hypothesis(fs.mo2, "phasedelta - phaseomicron > 0", alpha=0.05)
print(delta.vs.omicron.hyp, digits=4)
Hypothesis Tests for class b:
---
'CI': 90%-CI for one-sided and 95%-CI for two-sided hypotheses.
'*': For one-sided hypotheses, the posterior probability exceeds 95%;
for two-sided hypotheses, the value tested against lies outside the 95%-CI.
Posterior probabilities of point hypotheses assume equal prior probabilities.
The hypothesis tests for “Alpha > Delta” and “Delta > Omicron” show that Alpha shedding is credibly greater than Delta, Delta is credibly greater than Omicron, and therefore, Alpha is greater than Omicron.
Another way to assess the differences between virus shedding is to use pairwise comparisons via the emmeans R package, which works well with brms objects and provides a convenient way to estimate marginal means and contrasts of the entire model. Its pairwise contrasts include additional uncertainty due to random effects and residuals, which can be useful for understanding the overall model structure.
$emmeans
phase emmean lower.HPD upper.HPD
pre-VOC 8.43 8.26 8.64
alpha 8.55 8.36 8.73
delta 8.46 8.27 8.64
omicron 8.38 8.18 8.57
Point estimate displayed: median
HPD interval probability: 0.95
$contrasts
contrast estimate lower.HPD upper.HPD
(pre-VOC) - alpha -0.1105 -0.17018 -0.0504
(pre-VOC) - delta -0.0254 -0.09064 0.0398
(pre-VOC) - omicron 0.0570 -0.02426 0.1389
alpha - delta 0.0849 0.03412 0.1349
alpha - omicron 0.1669 0.09450 0.2340
delta - omicron 0.0819 0.00497 0.1558
Point estimate displayed: median
HPD interval probability: 0.95
Summary of emmeans Contrasts
HPD intervals: The presence of “lower.HPD” and “upper.HPD” columns indicates Highest Posterior Density intervals, which are Bayesian credible intervals.
HPD interval probability: The “HPD interval probability: 0.95” indicates that these are 95% credible intervals.
Major findings: The 95% HPD does not contain 0 for ‘Alpha - Delta’ and ‘Alpha - Omicron’ and ‘Delta - Omicron’ pairs, suggesting that Alpha shedding is credibly greater than Delta and Omicron, and Delta shedding is credibly greater than Omicron. These results agree with the brms::hypothesis tests above.
pre-VOC vs other: The 95% HPD contains 0 for ‘(pre-VOC) - delta’ and ’ (pre-VOC) - delta’ pairs, indicating pre-VOC is not credibly different from them. The 95% HPD does not contain 0 for ‘(pre-VOC) - alpha’ indicating pre-VOC is credibly less than Alpha.
Which model best predicts COVID-19 cases?
Three different multilevel Bayesian models were designed to predict COVID-19 cases using covariates such as N2 gene load, SARS-CoV-2 variants (e.g., pre-VOC, Alpha, Delta, and Omicron), and sewershed location.
\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + b_{0j} + \epsilon_{ij} \tag {Eq. 2} \]
Explanation of Model 1 Terms
\(y_{ij}\): Response variable for the i-th observation in the j-th group, where group is sewershed location.
\(\beta_0\): Fixed intercept, representing the overall mean effect across all sewersheds.
\(\beta_1\): Fixed slope for the predictor \(x_{ij}\) (i.e., log_n2_flowadj), representing the effect of the predictor on the response variable.
\(b_{0j}\): Random intercept for the j-th group (i.e., sewershed site), representing the group-specific differences from the overall intercept.
\(\epsilon_{ij}\): Residual error for the i-th observation in the j-th group, representing the unexplained variability.
The Model 1 formula is log_case100000 ~ 1 + log_n2_flowadj + (1 | site), which means the log-transformed COVID-19 cases per 100,000 people is modeled as a function of the log-transformed adjusted flow (log_n2_flowadj) and a random intercept for each sewershed site (1 | site).
# #Define priors for the intercept and slope
# prior_intercept <- prior(normal(0, 100), class = Intercept)
# prior_slope <- prior(normal(0, 1), class = b)
#
# #Define sigma prior
# prior_sigma <- set_prior("cauchy(0, 5)", class = "sigma")
#
# brm.partpool.norm <- brm(
# log_case100000 ~ 1 + log_n2_flowadj + (1| site),
# data = c19.data,
# family =gaussian,
# prior = c(
# prior_intercept,
# prior_slope,
# prior_sigma
# ),
# warmup = 500,
# iter = 4000,
# control=list(adapt_delta =0.95, max_treedepth=10),
# chains =4,
# sample_prior = "yes",
# seed = 513,
# save_pars = save_pars(all = TRUE),
# save_ranef = TRUE,
# save_all_pars = TRUE,
# file ='brm_partpool_norm_priors.rds',
# save_model = 'brm_partpool_norm_priors_model.txt',
# future=TRUE
# )
Family: gaussian
Links: mu = identity; sigma = identity
Formula: log_case100000 ~ 1 + log_n2_flowadj + (1 | site)
Data: c19.data (Number of observations: 892)
Draws: 4 chains, each with iter = 4000; warmup = 500; thin = 1;
total post-warmup draws = 14000
Multilevel Hyperparameters:
~site (Number of levels: 11)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.5338 0.1406 0.3403 0.8809 1.0028 1754 3703
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -7.4963 0.2966 -8.0914 -6.9163 1.0007 3949 5948
log_n2_flowadj 0.7397 0.0203 0.6996 0.7793 1.0004 6763 7666
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.3229 0.0077 0.3083 0.3386 1.0002 6696 6611
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
prior class coef group resp dpar nlpar lb ub source
normal(0, 1) b user
normal(0, 1) b log_n2_flowadj (vectorized)
normal(0, 100) Intercept user
student_t(3, 0, 2.5) sd 0 default
student_t(3, 0, 2.5) sd site 0 (vectorized)
student_t(3, 0, 2.5) sd Intercept site 0 (vectorized)
cauchy(0, 5) sigma 0 user
Inference for Stan model: anon_model.
4 chains, each with iter=4000; warmup=500; thin=1;
post-warmup draws per chain=3500, total post-warmup draws=14000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b_Intercept -7.4963 0.0047 0.2966 -8.0914 -7.6923 -7.4932 -7.2954 -6.9163 3937 1.0003
b_log_n2_flowadj 0.7397 0.0002 0.0203 0.6996 0.7258 0.7398 0.7536 0.7793 6775 1.0000
sd_site__Intercept 0.5338 0.0033 0.1406 0.3403 0.4344 0.5081 0.6030 0.8809 1854 1.0026
sigma 0.3229 0.0001 0.0077 0.3083 0.3177 0.3228 0.3279 0.3386 6641 1.0002
Intercept 1.3456 0.0035 0.1679 1.0161 1.2396 1.3461 1.4529 1.6687 2243 1.0012
r_site[SS1,Intercept] -0.6526 0.0035 0.1711 -0.9841 -0.7632 -0.6518 -0.5446 -0.3129 2328 1.0011
r_site[SS10,Intercept] 0.1741 0.0036 0.1717 -0.1527 0.0643 0.1725 0.2820 0.5155 2319 1.0011
r_site[SS11,Intercept] 0.1537 0.0036 0.1712 -0.1704 0.0424 0.1531 0.2617 0.4930 2298 1.0013
r_site[SS2,Intercept] 0.1288 0.0036 0.1714 -0.1966 0.0178 0.1275 0.2359 0.4722 2290 1.0012
r_site[SS3,Intercept] 0.2805 0.0036 0.1711 -0.0504 0.1695 0.2796 0.3883 0.6213 2315 1.0012
r_site[SS4,Intercept] -0.3923 0.0035 0.1710 -0.7234 -0.5010 -0.3921 -0.2847 -0.0528 2322 1.0012
r_site[SS5,Intercept] 0.4650 0.0036 0.1726 0.1378 0.3532 0.4632 0.5718 0.8121 2364 1.0012
r_site[SS6,Intercept] -0.7903 0.0035 0.1707 -1.1226 -0.9000 -0.7904 -0.6822 -0.4532 2331 1.0011
r_site[SS7,Intercept] -0.0666 0.0036 0.1709 -0.3921 -0.1769 -0.0673 0.0407 0.2697 2275 1.0012
r_site[SS8,Intercept] 0.7655 0.0036 0.1737 0.4355 0.6524 0.7642 0.8730 1.1143 2351 1.0011
r_site[SS9,Intercept] -0.0314 0.0036 0.1708 -0.3582 -0.1420 -0.0317 0.0762 0.3095 2315 1.0012
prior_Intercept -0.2559 0.8378 100.2397 -196.1637 -68.9480 0.0552 67.0257 195.0874 14316 1.0001
prior_b 0.0114 0.0085 1.0011 -1.9395 -0.6581 0.0109 0.6875 1.9818 13954 1.0000
prior_sigma 28.5709 2.9664 346.1256 0.1966 1.9786 4.8647 11.6964 122.5918 13614 0.9999
prior_sd_site 2.7630 0.0262 3.1019 0.0873 0.8867 1.9298 3.5581 10.4344 14057 0.9999
lprior -10.0383 0.0005 0.0248 -10.0949 -10.0510 -10.0357 -10.0220 -9.9981 2860 1.0014
lp__ -283.2441 0.0781 3.5520 -291.1038 -285.4774 -282.9222 -280.6651 -277.2153 2067 1.0028
z_1[1,1] -1.2985 0.0099 0.4372 -2.1707 -1.5935 -1.2890 -0.9994 -0.4777 1938 1.0018
z_1[1,2] 0.3438 0.0067 0.3218 -0.2677 0.1250 0.3403 0.5615 0.9880 2318 1.0014
z_1[1,3] 0.3033 0.0067 0.3191 -0.2979 0.0832 0.3030 0.5168 0.9387 2289 1.0016
z_1[1,4] 0.2538 0.0067 0.3174 -0.3506 0.0337 0.2530 0.4667 0.8930 2277 1.0013
z_1[1,5] 0.5555 0.0070 0.3355 -0.0828 0.3222 0.5546 0.7788 1.2275 2326 1.0015
z_1[1,6] -0.7811 0.0081 0.3639 -1.4956 -1.0275 -0.7745 -0.5332 -0.0857 2037 1.0015
z_1[1,7] 0.9218 0.0078 0.3759 0.2149 0.6615 0.9156 1.1756 1.6762 2330 1.0020
z_1[1,8] -1.5723 0.0111 0.4831 -2.5362 -1.8983 -1.5623 -1.2397 -0.6593 1900 1.0021
z_1[1,9] -0.1344 0.0067 0.3146 -0.7374 -0.3504 -0.1337 0.0790 0.4868 2183 1.0012
z_1[1,10] 1.5188 0.0099 0.4663 0.6489 1.1957 1.5093 1.8332 2.4650 2240 1.0023
z_1[1,11] -0.0641 0.0066 0.3129 -0.6596 -0.2811 -0.0641 0.1456 0.5533 2250 1.0013
Samples were drawn using NUTS(diag_e) at Tue Feb 4 18:01:44 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
pp_check(mo1, type="loo_intervals",prob = 0.5, prob_outer= 0.95)+
ggplot2::ylab("Cases per 100,000 (log10)") +
ggplot2::xlab("Index") +
ggplot2::theme(
axis.text = ggplot2::element_text(size = 12),
axis.title = ggplot2::element_text(size = 14)
)
Using all posterior draws for ppc type 'loo_intervals' by default.
This plot shows how the response variable (log_case100000) changes as log_n2_flowadj varies, while holding other predictors constant.
This plot shows how the sites-specific baseline varies across the 11 sewersheds.
#library(tidybayes)
#define custom order for sewershed sites
site_order <- c("SS1", "SS2", "SS3", "SS4", "SS5", "SS6", "SS7", "SS8", "SS9", "SS10", "SS11")
#extract random intercepts
mo1 %>%
spread_draws(r_site[site,term]) %>%
filter(term %in% c("Intercept")) %>%
ungroup() %>%
mutate(site = factor(site, levels = site_order, ordered = TRUE)) %>%
ggplot(aes(y = site, x = r_site)) +
stat_halfeye(.width = c(.50, .95)) +
geom_vline(xintercept = 0, linetype = "dashed") +
facet_wrap(~term, scales = "free_x") +
labs(x = "Random Effect", y = "Site",
title = "Random Effects by Site")+
scale_y_discrete(limits = rev(site_order)) #reverse site order for top-to-bottom layout
The model was fit using 4 MCMC chains, each with 4000 iterations (500 warmup), resulting in 14,000 (i.e., 16000 minus 4 * 500 burnins) total post-warmup draws.
The group-level (random) effect shows that the standard deviation of the sewershed intercept across sites is 0.5375 with a 95% credible interval of [0.3352, 0.8867].
Fixed Effects - The population-level (fixed) effects show that:
The estimated intercept is -7.4963 with a 95% credible interval of [-8.0914, -6.9163].
The estimated coefficient for log_n2_flowadj is 0.7397 with a 95% credible interval of [0.6996, 0.7793]. This indicates that for every 1-unit increase in log_n2_flowadj, log_case100000 increases by approximately 0.74 on average.
Model Fit
Model Diagnostics
PPC and Conditional Effects
MCMC Intervals of Model 1
Random Effects for Intercepts
The model estimates site-specific intercepts, representing the baseline log cases per 100,000 when log_n2_flowadj is zero:
Highest intercepts: SS5 and SS8
Lowest intercepts: SS1, SS4, and SS6
Uncertainty: Intervals that don’t include zero suggest stronger evidence for a site-specific effect.
Precision: The standard errors are relatively consistent across sites, indicating similar precision in the estimates.
Intraclass Correlation Coefficient (ICC)
\[ICC=0.5338^2/(0.5338^2 + 0.3229^2)\approx 0.7321 \] The ICC is 73.2%, which indicates most of the variance is between sewershed locations rather than within them.
\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + b_{0j} + b_{1j} x_{ij} + \epsilon_{ij} \tag{Eq. 3} \]
Explanation of Model 2 Terms
\(y_{ij}\): Response variable (log_case100000) for the i-th observation in the j-th group, where group is sewershed location. See Model 1 for the sewershed abbreviations.
\(\beta_0\): Fixed intercept, representing the overall mean effect across all sewershed groups.
\(\beta_1\): Fixed slope for the predictor \(x_{ij}\) (log_n2_flowadj ), representing the effect of the predictor on the response variable.
\(b_{0j}\): Random intercept for the j-th group, representing the sewershed group-specific differences from the overall intercept.
\(b_{1j}\): Random slope for the predictor \(x_{ij}\) in the j-th group, representing the sewershed group-specific differences in the effect of the predictor.
\(\epsilon_{ij}\): Residual error for the i-th observation in the j-th group, representing the unexplained variability.
# #priors for the intercept and slope
# prior_intercept <- prior(normal(0, 100), class = Intercept)
# prior_slope <- prior(normal(0, 1), class = b)
#
# #sigma prior
# prior_sigma <- set_prior("cauchy(0, 5)", class = "sigma")
#
# brm.partpool.randrand <- brm(
# log_case100000 ~ 1+ log_n2_flowadj + (1+log_n2_flowadj|site),
# data = c19.data,
# family =gaussian,
# prior = c(
# prior_intercept,
# prior_slope,
# prior_sigma
# ),
# control=list(adapt_delta =0.95,max_treedepth=10),
# warmup = 500,
# iter = 4000,
# chains =4,
# sample_prior = "yes",
# seed = 513,
# save_pars = save_pars(all = TRUE),
# save_ranef = TRUE,
# save_all_pars = TRUE,
# file ='brm_partpool_randrand.rds',
# save_model = 'brm_partpool_randrand_model.txt',
# future=TRUE
# )
Family: gaussian
Links: mu = identity; sigma = identity
Formula: log_case100000 ~ 1 + log_n2_flowadj + (1 + log_n2_flowadj | site)
Data: c19.data (Number of observations: 892)
Draws: 4 chains, each with iter = 4000; warmup = 500; thin = 1;
total post-warmup draws = 14000
Multilevel Hyperparameters:
~site (Number of levels: 11)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.8337 0.4975 1.0525 2.9867 1.0009 4266 6431
sd(log_n2_flowadj) 0.1275 0.0382 0.0664 0.2163 1.0005 3991 6364
cor(Intercept,log_n2_flowadj) -0.9482 0.0479 -0.9917 -0.8196 1.0006 5519 7909
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -7.6546 0.6136 -8.8863 -6.4530 1.0016 4528 6533
log_n2_flowadj 0.7506 0.0446 0.6639 0.8402 1.0011 4894 6261
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.3164 0.0076 0.3019 0.3317 1.0005 22241 10023
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The priors for Model 2 are chosen to be weakly informative.
prior class coef group resp dpar nlpar lb ub source
normal(0, 1) b user
normal(0, 1) b log_n2_flowadj (vectorized)
normal(0, 100) Intercept user
lkj_corr_cholesky(1) L default
lkj_corr_cholesky(1) L site (vectorized)
student_t(3, 0, 2.5) sd 0 default
student_t(3, 0, 2.5) sd site 0 (vectorized)
student_t(3, 0, 2.5) sd Intercept site 0 (vectorized)
student_t(3, 0, 2.5) sd log_n2_flowadj site 0 (vectorized)
cauchy(0, 5) sigma 0 user
Inference for Stan model: anon_model.
4 chains, each with iter=4000; warmup=500; thin=1;
post-warmup draws per chain=3500, total post-warmup draws=14000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b_Intercept -7.6546 0.0092 0.6136 -8.8863 -8.0406 -7.6477 -7.2607 -6.4530 4477 1.0017
b_log_n2_flowadj 0.7506 0.0006 0.0446 0.6639 0.7222 0.7496 0.7783 0.8402 4826 1.0011
sd_site__Intercept 1.8337 0.0076 0.4975 1.0525 1.4860 1.7697 2.1038 2.9867 4329 1.0008
sd_site__log_n2_flowadj 0.1275 0.0006 0.0382 0.0664 0.1011 0.1230 0.1483 0.2163 4025 1.0005
cor_site__Intercept__log_n2_flowadj -0.9482 0.0006 0.0479 -0.9917 -0.9774 -0.9622 -0.9361 -0.8196 6018 1.0003
sigma 0.3164 0.0001 0.0076 0.3019 0.3112 0.3162 0.3214 0.3317 22004 1.0000
Intercept 1.3168 0.0028 0.1823 0.9542 1.2027 1.3170 1.4308 1.6785 4365 1.0014
r_site[SS1,Intercept] -2.3239 0.0103 0.9735 -4.3252 -2.9518 -2.2821 -1.6728 -0.5028 8962 1.0006
r_site[SS10,Intercept] 0.1749 0.0101 0.8655 -1.5518 -0.3844 0.1811 0.7366 1.8703 7367 1.0007
r_site[SS11,Intercept] -0.8238 0.0105 0.8803 -2.6069 -1.3962 -0.8051 -0.2325 0.8451 7074 1.0005
r_site[SS2,Intercept] 0.8720 0.0099 0.8095 -0.6867 0.3347 0.8510 1.4017 2.5017 6680 1.0006
r_site[SS3,Intercept] 0.1292 0.0104 0.8583 -1.5789 -0.4241 0.1295 0.6827 1.8228 6829 1.0004
r_site[SS4,Intercept] -1.4667 0.0101 0.8560 -3.1831 -2.0188 -1.4425 -0.8959 0.1655 7162 1.0006
r_site[SS5,Intercept] 1.9144 0.0099 0.8815 0.2718 1.3070 1.8920 2.4865 3.7186 7870 1.0005
r_site[SS6,Intercept] -1.1003 0.0099 0.8584 -2.7741 -1.6577 -1.1019 -0.5342 0.5699 7593 1.0008
r_site[SS7,Intercept] 1.3272 0.0101 0.8264 -0.2462 0.7745 1.2980 1.8526 3.0261 6714 1.0008
r_site[SS8,Intercept] 2.8467 0.0102 0.8606 1.2419 2.2556 2.8166 3.3954 4.6254 7175 1.0009
r_site[SS9,Intercept] -1.7206 0.0120 1.0169 -3.8258 -2.3625 -1.6857 -1.0279 0.1789 7187 1.0001
r_site[SS1,log_n2_flowadj] 0.1333 0.0007 0.0742 -0.0055 0.0834 0.1306 0.1812 0.2858 10032 1.0004
r_site[SS10,log_n2_flowadj] 0.0030 0.0007 0.0697 -0.1323 -0.0420 0.0021 0.0480 0.1431 8699 1.0003
r_site[SS11,log_n2_flowadj] 0.0858 0.0008 0.0698 -0.0442 0.0386 0.0838 0.1310 0.2297 8173 1.0001
r_site[SS2,log_n2_flowadj] -0.0616 0.0007 0.0642 -0.1920 -0.1034 -0.0600 -0.0182 0.0615 7885 1.0003
r_site[SS3,log_n2_flowadj] 0.0156 0.0008 0.0680 -0.1174 -0.0289 0.0143 0.0592 0.1522 7860 1.0001
r_site[SS4,log_n2_flowadj] 0.0900 0.0007 0.0659 -0.0358 0.0457 0.0880 0.1331 0.2227 8123 1.0003
r_site[SS5,log_n2_flowadj] -0.1213 0.0007 0.0702 -0.2648 -0.1667 -0.1192 -0.0732 0.0103 9368 1.0002
r_site[SS6,log_n2_flowadj] 0.0256 0.0007 0.0646 -0.1022 -0.0174 0.0269 0.0676 0.1521 8376 1.0005
r_site[SS7,log_n2_flowadj] -0.1159 0.0007 0.0649 -0.2503 -0.1568 -0.1138 -0.0721 0.0069 7641 1.0005
r_site[SS8,log_n2_flowadj] -0.1825 0.0008 0.0697 -0.3277 -0.2270 -0.1800 -0.1344 -0.0529 8570 1.0005
r_site[SS9,log_n2_flowadj] 0.1418 0.0009 0.0805 -0.0057 0.0864 0.1390 0.1925 0.3104 7844 1.0000
prior_Intercept 0.6414 0.8601 100.2844 -196.1053 -67.3463 0.8525 68.3187 194.6846 13594 1.0000
prior_b 0.0128 0.0083 0.9980 -1.9391 -0.6626 0.0144 0.7002 1.9554 14374 0.9999
prior_sigma 33.3944 5.0353 596.7249 0.1957 2.1099 4.9905 12.0436 128.9708 14044 1.0000
prior_sd_site 2.8048 0.0687 8.0890 0.0829 0.8827 1.8945 3.5310 10.1744 13856 1.0000
prior_cor_site 0.0063 0.0048 0.5768 -0.9504 -0.4895 0.0037 0.5082 0.9520 14345 0.9999
lprior -12.2781 0.0027 0.1798 -12.7290 -12.3631 -12.2428 -12.1528 -12.0357 4528 1.0008
lp__ -286.4031 0.0892 5.0099 -296.9129 -289.6187 -286.0476 -282.8270 -277.5582 3153 1.0010
z_1[1,1] -1.3213 0.0062 0.5680 -2.5017 -1.6919 -1.2953 -0.9307 -0.2749 8327 1.0006
z_1[1,2] 0.1004 0.0055 0.4828 -0.8550 -0.2189 0.1028 0.4198 1.0481 7840 1.0008
z_1[1,3] -0.4641 0.0057 0.4925 -1.4694 -0.7818 -0.4567 -0.1323 0.4704 7588 1.0005
z_1[1,4] 0.4961 0.0055 0.4553 -0.3757 0.1919 0.4834 0.7924 1.4202 6841 1.0004
z_1[1,5] 0.0779 0.0057 0.4803 -0.8664 -0.2381 0.0739 0.3857 1.0420 7167 1.0004
z_1[1,6] -0.8393 0.0058 0.4965 -1.8774 -1.1571 -0.8170 -0.5063 0.0889 7262 1.0008
z_1[1,7] 1.0878 0.0058 0.5064 0.1434 0.7425 1.0748 1.4164 2.1232 7708 1.0002
z_1[1,8] -0.6390 0.0060 0.5057 -1.6877 -0.9673 -0.6218 -0.2931 0.2996 7214 1.0009
z_1[1,9] 0.7523 0.0056 0.4685 -0.1380 0.4346 0.7318 1.0575 1.7101 6942 1.0007
z_1[1,10] 1.6264 0.0068 0.5353 0.6426 1.2534 1.6068 1.9745 2.7334 6247 1.0006
z_1[1,11] -0.9656 0.0062 0.5636 -2.1299 -1.3291 -0.9488 -0.5834 0.0994 8377 1.0002
z_1[2,1] -0.6156 0.0088 0.6582 -1.8278 -1.0650 -0.6395 -0.2072 0.7577 5640 1.0005
z_1[2,2] 0.4327 0.0061 0.5554 -0.6655 0.0780 0.4276 0.7845 1.5549 8371 1.0001
z_1[2,3] 0.9236 0.0060 0.5501 -0.0682 0.5445 0.8865 1.2744 2.0885 8497 1.0000
z_1[2,4] -0.1101 0.0058 0.5247 -1.2076 -0.4353 -0.0898 0.2420 0.8593 8185 1.0003
z_1[2,5] 0.7184 0.0061 0.5446 -0.3087 0.3624 0.7019 1.0629 1.8439 7941 1.0000
z_1[2,6] -0.2062 0.0068 0.5485 -1.2063 -0.5777 -0.2345 0.1266 0.9721 6488 1.0001
z_1[2,7] 0.1734 0.0078 0.6146 -1.0997 -0.2271 0.1965 0.5961 1.3110 6214 1.0005
z_1[2,8] -1.4170 0.0077 0.6014 -2.6100 -1.8146 -1.4134 -1.0186 -0.2281 6064 1.0000
z_1[2,9] -0.8377 0.0064 0.5497 -1.9876 -1.1861 -0.8026 -0.4591 0.1526 7385 1.0005
z_1[2,10] 0.1874 0.0090 0.7124 -1.2665 -0.2767 0.2116 0.6737 1.5221 6201 1.0003
z_1[2,11] 0.8069 0.0069 0.6013 -0.2725 0.3936 0.7745 1.1826 2.0850 7599 1.0003
L_1[1,1] 1.0000 NaN 0.0000 1.0000 1.0000 1.0000 1.0000 1.0000 NaN NaN
L_1[1,2] 0.0000 NaN 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 NaN NaN
L_1[2,1] -0.9482 0.0006 0.0479 -0.9917 -0.9774 -0.9622 -0.9361 -0.8196 6018 1.0003
L_1[2,2] 0.2927 0.0015 0.1140 0.1288 0.2115 0.2722 0.3518 0.5729 5642 1.0005
Samples were drawn using NUTS(diag_e) at Wed Feb 5 12:23:11 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
pp_check(mo2, type="loo_intervals",prob = 0.5,prob_outer= 0.95)+# ,ndraws = 50)
ggplot2::ylab("Cases per 100,000 (log10)") +
ggplot2::xlab("Index") +
ggplot2::theme(
axis.text = ggplot2::element_text(size = 12),
axis.title = ggplot2::element_text(size = 14)
)
Using all posterior draws for ppc type 'loo_intervals' by default.
CE.m2 <- conditional_effects(mo2,effects="log_n2_flowadj",re_formula = NULL, conditions = data.frame(site = unique(c19.data$site) ) )
#NB, the re_formula = NULL includes the random effects in the predictions, allowing site-specific plots
ggplot(CE.m2[[1]], aes(x = log_n2_flowadj, y = estimate__, group = site, color = site)) +
geom_line() +
geom_ribbon(aes(ymin = lower__, ymax = upper__, fill = site), alpha = 0.2) +
theme_minimal() +
labs(title = "Conditional Effects of log_n2_flowadj by Site",
x = "log_n2_flowadj",
y = "log_case100000") +
theme(legend.position = "right")
#library(tidybayes)
#define custom order for sewershed sites
site_order <- c("SS1", "SS2", "SS3", "SS4", "SS5", "SS6", "SS7", "SS8", "SS9", "SS10", "SS11")
#extract random intercepts and slopes
mo2 %>%
spread_draws(r_site[site,term]) %>%
filter(term %in% c("Intercept", "log_n2_flowadj")) %>%
ungroup() %>%
mutate(site = factor(site, levels = site_order, ordered = TRUE)) %>%
ggplot(aes(y = site, x = r_site)) +
stat_halfeye(.width = c(.50, .95)) +
geom_vline(xintercept = 0, linetype = "dashed") +
facet_wrap(~term, scales = "free_x") +
labs(x = "Random Effect", y = "Site",
title = "Random Effects by Site",
subtitle = "Intercepts and Slopes for log_n2_flowadj") +
scale_y_discrete(limits = rev(site_order)) #reverse site order for top-to-bottom layout
Model 2 predicts log_case100000 using log_n2_flowadj as a fixed effect, with random intercepts and slopes for each site. The formula for this model is: log_case100000 ~ 1 + log_n2_flowadj + (1 + log_n2_flowadj | site). Model 2 was fit using 4 MCMC chains, each with 4000 iterations (500 warmup each), resulting in 14,000 total post-warmup draws.
Fixed Effects
The fixed effects show the overall relationship between the predictor and response:
Intercept: 7.6546 (95% CI: -8.8863 to -6.4530)
log_n2_flowadj: 0.7506 (95% CI: 0.66389 to 0.8402)
This indicates that for every 1-unit increase in log_n2_flowadj, log_case100000 increases by approximately 0.75 on average.
Random Effects
The model includes random effects for 11 different sewershed locations:
sd(Intercept): The standard deviation of the intercept is 1.8337 (95% CI: 1.0525 to 2.9867)
sd(log_n2_flowadj): The standard deviation of the slope is 0.1275 (95% CI: 0.0664 to 0.2163)
cor(Intercept,log_n2_flowadj): The correlation between intercept and slope is -0.9482 (95% CI: -0.9917 to -0.8186)
This suggests considerable variation in intercepts across sites, with less variation in slopes. The strong negative correlation indicates that sites with higher intercepts tend to have lower slopes.
Model Fit
This represents the residual standard deviation, indicating the deviation of observed values from the Model 2’s predictions.
Model Diagnostics
The Rhat values are all close to 1, suggesting good convergence of the MCMC chains. The effective sample sizes (Bulk_ESS and Tail_ESS) are generally high, indicating reliable posterior estimates.
PPC and Conditional Effects
The observed data fell within both 50% and 95% LOO intervals and the conditional effects plot show a positive, linear relationship between flow-adjusted N2 and log cases per 100,000.
Random Effects for Intercepts
The model estimates site-specific intercepts, representing the baseline log cases per 100,000 when log_n2_flowadj is zero:
Highest intercepts: SS5, SS7, and SS8
Lowest intercepts: SS1, SS4, and SS9
Random Effects for Slopes
The relationship between flow-adjusted N2 and log case counts per 100000 varies across sites. The notable site-specific slopes for log_n2_flowadj are:
Strongest positive associations: SS1, SS4, SS9, and SS11
Strongest negative associations: SS5, SS7, and SS8
In summary, Model 2 results suggest a positive relationship between log_n2_flowadj and log_case100000, with significant variation across sites in both the baseline levels (intercepts) and slopes.
\[ y_{ij} = \beta_0 + \beta_1 x_{ij} + \beta_2 z_{ij} + \beta_3 (x_{ij} \times z_{ij}) + b_{0j} + b_{1j} x_{ij} + \epsilon_{ij} \tag{Eq. 4} \]
Explanation of Model 3 Terms
\(y_{ij}\): Response variable (log_case100000) for the i-th observation in the j-th group, where group is sewershed sites. See Model 1 for abbreviations.
\(\beta_0\): Fixed intercept, representing the overall mean effect across all groups.
\(\beta_1\): Fixed slope for the predictor \(x_{ij}\) (log_n2_flowadj), representing the effect of the predictor on the response variable.
\(\beta_2\): Fixed slope for the predictor \(z_{ij}\), representing the effect of the second predictor (VOC phase) on the response variable.
\(\beta_3\): Fixed slope for the interaction term \(x_{ij} \times z_{ij}\), representing the interactions of the two predictors (log_n2_flowadj and log_case100000) on the response variable.
\(b_{0j}\): Random intercept for the j-th group, representing the group-specific differences from the overall intercept.
\(b_{1j}\): Random slope for the predictor \(x_{ij}\) in the j-th group, representing the group-specific differences in the effect of the predictor.
\(\epsilon_{ij}\): Residual error for the i-th observation in the j-th group, representing the unexplained variability.
# # Define priors for the intercept and slope
# prior_intercept <- prior(normal(0, 100), class = Intercept)
# prior_slope <- prior(normal(0, 1), class = b)
#
# # Define sigma prior
# prior_sigma <- set_prior("cauchy(0, 5)", class = "sigma")
#
# brm.partpool.rand.phase <- brm(
# log_case100000 ~ 1+ log_n2_flowadj * phase + (1+log_n2_flowadj|site),
# data = c19.data,
# family =gaussian,
# prior = c(
# prior_intercept,
# prior_slope,
# prior_sigma
# ),
# control=list(adapt_delta =0.95,max_treedepth=10), # try this next
# warmup = 500, # comment this and see if the divergent trans go away
# iter = 4000, #increase
# chains =4,
# sample_prior = "yes",
# seed = 513,
# save_pars = save_pars(all = TRUE),
# save_ranef = TRUE,
# save_all_pars = TRUE,
# file ='brm_partpool_rand.phase.rds', # this should be deleted in the Project directory each time model rerun
# save_model = 'brm_partpool_rand_phase_model.txt',
# future=TRUE
# )
Family: gaussian
Links: mu = identity; sigma = identity
Formula: log_case100000 ~ 1 + log_n2_flowadj * phase + (1 + log_n2_flowadj | site)
Data: c19.data (Number of observations: 892)
Draws: 4 chains, each with iter = 4000; warmup = 500; thin = 1;
total post-warmup draws = 14000
Multilevel Hyperparameters:
~site (Number of levels: 11)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.5707 0.4482 0.8769 2.6264 1.0003 4673 6908
sd(log_n2_flowadj) 0.1099 0.0340 0.0563 0.1904 1.0005 4373 6769
cor(Intercept,log_n2_flowadj) -0.9582 0.0427 -0.9938 -0.8486 1.0008 6312 8770
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept -4.4266 0.6750 -5.7518 -3.0781 1.0005 4386 6338
log_n2_flowadj 0.5000 0.0512 0.3978 0.6008 1.0004 4327 6696
phasealpha -1.7622 0.4501 -2.6271 -0.8783 1.0001 5104 7414
phasedelta 0.1541 0.4702 -0.7615 1.0876 1.0001 5838 8290
phaseomicron -1.7788 0.4935 -2.7411 -0.8160 1.0003 6070 8365
log_n2_flowadj:phasealpha 0.1119 0.0370 0.0395 0.1829 1.0001 5171 7544
log_n2_flowadj:phasedelta -0.0259 0.0385 -0.1026 0.0489 1.0001 5850 8256
log_n2_flowadj:phaseomicron 0.1457 0.0403 0.0673 0.2249 1.0003 6116 8290
Further Distributional Parameters:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma 0.2736 0.0066 0.2612 0.2869 1.0002 20233 9925
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The priors for Model 3 are the same as Model 2 and are weakly informative.
prior class coef group resp dpar nlpar lb ub source
normal(0, 1) b user
normal(0, 1) b log_n2_flowadj (vectorized)
normal(0, 1) b log_n2_flowadj:phasealpha (vectorized)
normal(0, 1) b log_n2_flowadj:phasedelta (vectorized)
normal(0, 1) b log_n2_flowadj:phaseomicron (vectorized)
normal(0, 1) b phasealpha (vectorized)
normal(0, 1) b phasedelta (vectorized)
normal(0, 1) b phaseomicron (vectorized)
normal(0, 100) Intercept user
lkj_corr_cholesky(1) L default
lkj_corr_cholesky(1) L site (vectorized)
student_t(3, 0, 2.5) sd 0 default
student_t(3, 0, 2.5) sd site 0 (vectorized)
student_t(3, 0, 2.5) sd Intercept site 0 (vectorized)
student_t(3, 0, 2.5) sd log_n2_flowadj site 0 (vectorized)
cauchy(0, 5) sigma 0 user
Inference for Stan model: anon_model.
4 chains, each with iter=4000; warmup=500; thin=1;
post-warmup draws per chain=3500, total post-warmup draws=14000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
b_Intercept -4.4266 0.0102 0.6750 -5.7518 -4.8725 -4.4319 -3.9869 -3.0781 4379 1.0005
b_log_n2_flowadj 0.5000 0.0008 0.0512 0.3978 0.4665 0.5000 0.5339 0.6008 4311 1.0004
b_phasealpha -1.7622 0.0063 0.4501 -2.6271 -2.0723 -1.7684 -1.4555 -0.8783 5081 1.0001
b_phasedelta 0.1541 0.0062 0.4702 -0.7615 -0.1625 0.1538 0.4682 1.0876 5828 1.0001
b_phaseomicron -1.7788 0.0064 0.4935 -2.7411 -2.1171 -1.7829 -1.4469 -0.8160 6027 1.0003
b_log_n2_flowadj:phasealpha 0.1119 0.0005 0.0370 0.0395 0.0868 0.1123 0.1373 0.1829 5148 1.0001
b_log_n2_flowadj:phasedelta -0.0259 0.0005 0.0385 -0.1026 -0.0516 -0.0259 0.0001 0.0489 5836 1.0001
b_log_n2_flowadj:phaseomicron 0.1457 0.0005 0.0403 0.0673 0.1186 0.1461 0.1732 0.2249 6067 1.0003
sd_site__Intercept 1.5707 0.0066 0.4482 0.8769 1.2555 1.5079 1.8088 2.6264 4652 1.0004
sd_site__log_n2_flowadj 0.1099 0.0005 0.0340 0.0563 0.0862 0.1054 0.1280 0.1904 4436 1.0005
cor_site__Intercept__log_n2_flowadj -0.9582 0.0005 0.0427 -0.9938 -0.9831 -0.9707 -0.9489 -0.8486 6494 1.0004
sigma 0.2736 0.0000 0.0066 0.2612 0.2690 0.2735 0.2779 0.2869 20371 0.9999
Intercept 1.3075 0.0019 0.1435 1.0229 1.2173 1.3061 1.3957 1.6003 5463 1.0001
r_site[SS1,Intercept] -2.1159 0.0093 0.8576 -3.8755 -2.6574 -2.0791 -1.5361 -0.5029 8507 1.0007
r_site[SS10,Intercept] 0.0770 0.0085 0.7449 -1.4551 -0.4004 0.0870 0.5642 1.5058 7697 1.0000
r_site[SS11,Intercept] -0.5132 0.0090 0.7625 -2.0806 -1.0118 -0.4958 0.0021 0.9475 7217 1.0000
r_site[SS2,Intercept] 0.9893 0.0087 0.7194 -0.3906 0.5126 0.9720 1.4526 2.4670 6833 1.0004
r_site[SS3,Intercept] 0.3037 0.0085 0.7332 -1.1425 -0.1793 0.3068 0.7802 1.7434 7436 1.0003
r_site[SS4,Intercept] -1.5327 0.0084 0.7439 -3.0335 -2.0209 -1.5078 -1.0291 -0.1138 7810 1.0001
r_site[SS5,Intercept] 1.6220 0.0084 0.7696 0.2013 1.0972 1.5973 2.1104 3.2258 8450 1.0002
r_site[SS6,Intercept] -0.9404 0.0088 0.7640 -2.4419 -1.4414 -0.9419 -0.4400 0.5618 7513 1.0006
r_site[SS7,Intercept] 0.8500 0.0085 0.7204 -0.5146 0.3585 0.8329 1.3191 2.3140 7172 1.0006
r_site[SS8,Intercept] 2.3240 0.0087 0.7430 0.9403 1.8189 2.2987 2.7971 3.8613 7358 1.0003
r_site[SS9,Intercept] -1.1853 0.0095 0.8679 -3.0291 -1.7405 -1.1476 -0.5813 0.3818 8300 0.9998
r_site[SS1,log_n2_flowadj] 0.1291 0.0007 0.0656 0.0058 0.0848 0.1272 0.1711 0.2630 9139 1.0007
r_site[SS10,log_n2_flowadj] 0.0020 0.0006 0.0600 -0.1135 -0.0373 0.0016 0.0402 0.1254 9108 1.0000
r_site[SS11,log_n2_flowadj] 0.0576 0.0007 0.0603 -0.0559 0.0167 0.0555 0.0971 0.1828 8130 0.9999
r_site[SS2,log_n2_flowadj] -0.0776 0.0007 0.0569 -0.1941 -0.1144 -0.0761 -0.0390 0.0301 7477 1.0004
r_site[SS3,log_n2_flowadj] -0.0027 0.0006 0.0580 -0.1166 -0.0411 -0.0040 0.0352 0.1138 8436 1.0003
r_site[SS4,log_n2_flowadj] 0.1007 0.0006 0.0575 -0.0080 0.0622 0.0985 0.1377 0.2179 8783 1.0001
r_site[SS5,log_n2_flowadj] -0.1034 0.0006 0.0610 -0.2308 -0.1429 -0.1011 -0.0618 0.0105 9779 1.0001
r_site[SS6,log_n2_flowadj] 0.0276 0.0007 0.0576 -0.0867 -0.0097 0.0277 0.0653 0.1413 7834 1.0005
r_site[SS7,log_n2_flowadj] -0.0772 0.0006 0.0566 -0.1942 -0.1137 -0.0759 -0.0384 0.0296 8026 1.0005
r_site[SS8,log_n2_flowadj] -0.1514 0.0007 0.0602 -0.2777 -0.1897 -0.1492 -0.1107 -0.0381 8443 1.0003
r_site[SS9,log_n2_flowadj] 0.1008 0.0007 0.0686 -0.0210 0.0528 0.0974 0.1440 0.2463 9075 0.9998
prior_Intercept 0.4701 0.8750 101.1153 -199.4187 -66.8473 1.2687 68.9080 196.2551 13353 1.0000
prior_b -0.0015 0.0084 0.9940 -1.9466 -0.6720 0.0042 0.6816 1.9284 14072 1.0000
prior_sigma 36.0892 9.0622 1072.6967 0.1964 2.0905 5.0482 12.0938 124.0882 14011 1.0001
prior_sd_site 2.7879 0.0300 3.5746 0.0904 0.8735 1.8911 3.5847 10.6203 14154 0.9999
prior_cor_site 0.0011 0.0049 0.5774 -0.9486 -0.5016 0.0001 0.5016 0.9515 14008 0.9998
lprior -21.0501 0.0217 1.4914 -24.4567 -21.9145 -20.8633 -19.9398 -18.7751 4743 1.0004
lp__ -166.5522 0.0968 5.4523 -178.2222 -170.0345 -166.2070 -162.6789 -156.9892 3173 1.0005
z_1[1,1] -1.4023 0.0061 0.5694 -2.5932 -1.7716 -1.3834 -1.0154 -0.3266 8597 1.0006
z_1[1,2] 0.0522 0.0054 0.4857 -0.9232 -0.2672 0.0566 0.3755 1.0000 8195 1.0000
z_1[1,3] -0.3363 0.0055 0.4984 -1.3581 -0.6612 -0.3281 0.0014 0.6340 8172 1.0000
z_1[1,4] 0.6569 0.0055 0.4730 -0.2457 0.3358 0.6455 0.9647 1.6286 7288 1.0002
z_1[1,5] 0.2059 0.0056 0.4832 -0.7313 -0.1176 0.2048 0.5225 1.1641 7564 1.0001
z_1[1,6] -1.0252 0.0056 0.5097 -2.0830 -1.3588 -1.0034 -0.6727 -0.0694 8400 1.0002
z_1[1,7] 1.0806 0.0057 0.5191 0.1185 0.7234 1.0591 1.4186 2.1418 8329 0.9999
z_1[1,8] -0.6371 0.0060 0.5235 -1.7206 -0.9694 -0.6287 -0.2853 0.3553 7515 1.0006
z_1[1,9] 0.5635 0.0054 0.4758 -0.3358 0.2382 0.5504 0.8761 1.5104 7738 1.0004
z_1[1,10] 1.5580 0.0068 0.5439 0.5542 1.1812 1.5378 1.9074 2.7000 6471 1.0000
z_1[1,11] -0.7757 0.0056 0.5630 -1.9418 -1.1358 -0.7589 -0.3909 0.2620 10078 0.9998
z_1[2,1] -0.5081 0.0082 0.6767 -1.7291 -0.9756 -0.5436 -0.0849 0.9122 6797 1.0000
z_1[2,2] 0.2809 0.0053 0.5816 -0.8644 -0.0959 0.2790 0.6492 1.4412 12071 1.0000
z_1[2,3] 0.9018 0.0059 0.5622 -0.1222 0.5145 0.8737 1.2579 2.0830 9113 0.9998
z_1[2,4] -0.4208 0.0056 0.5585 -1.6153 -0.7695 -0.3870 -0.0372 0.5932 9884 1.0001
z_1[2,5] 0.6924 0.0060 0.5672 -0.4188 0.3272 0.6841 1.0465 1.8427 9024 1.0000
z_1[2,6] -0.0830 0.0066 0.5816 -1.1313 -0.4774 -0.1203 0.2758 1.1459 7713 1.0000
z_1[2,7] 0.2360 0.0066 0.6355 -1.0749 -0.1689 0.2599 0.6720 1.4099 9235 1.0000
z_1[2,8] -1.3990 0.0072 0.5941 -2.5999 -1.7855 -1.3869 -0.9977 -0.2534 6779 1.0000
z_1[2,9] -0.7841 0.0056 0.5513 -1.9667 -1.1315 -0.7470 -0.4043 0.1863 9763 1.0000
z_1[2,10] 0.2059 0.0079 0.7225 -1.3010 -0.2618 0.2340 0.6949 1.5464 8267 0.9999
z_1[2,11] 0.8010 0.0059 0.5956 -0.2729 0.3928 0.7668 1.1804 2.0640 10181 0.9999
L_1[1,1] 1.0000 NaN 0.0000 1.0000 1.0000 1.0000 1.0000 1.0000 NaN NaN
L_1[1,2] 0.0000 NaN 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 NaN NaN
L_1[2,1] -0.9582 0.0005 0.0427 -0.9938 -0.9831 -0.9707 -0.9489 -0.8486 6494 1.0004
L_1[2,2] 0.2610 0.0014 0.1093 0.1110 0.1833 0.2404 0.3156 0.5291 6218 1.0004
Samples were drawn using NUTS(diag_e) at Wed Feb 5 16:16:32 2025.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
pp_check(mo3, type="loo_intervals",prob = 0.5,prob_outer= 0.95) +
ggplot2::ylab("Cases per 100,000 (log10)") +
ggplot2::xlab("Index") +
ggplot2::theme(
axis.text = ggplot2::element_text(size = 12),
axis.title = ggplot2::element_text(size = 14)
)
Using all posterior draws for ppc type 'loo_intervals' by default.
#library(tidybayes)
#define custom order for sewershed sites
site_order <- c("SS1", "SS2", "SS3", "SS4", "SS5", "SS6", "SS7", "SS8", "SS9", "SS10", "SS11")
#extract random intercepts and slopes
mo3 %>%
spread_draws(r_site[site,term]) %>%
filter(term %in% c("Intercept", "log_n2_flowadj")) %>%
ungroup() %>%
mutate(site = factor(site, levels = site_order, ordered = TRUE)) %>%
ggplot(aes(y = site, x = r_site)) +
stat_halfeye(.width = c(.50, .95)) +
geom_vline(xintercept = 0, linetype = "dashed") +
facet_wrap(~term, scales = "free_x") +
labs(x = "Random Effect", y = "Sewershed"
#comment to format for inclusion of Figure 3 in main text of manuscript
# title = "Random Effects by Sewershed",
# subtitle = "Intercepts and Slopes"
) +
scale_y_discrete(limits = rev(site_order)) + #reverse site order for top-to-bottom layout
theme(strip.text = element_blank()) # remove facet labels
Model 3 is log_case100000 ~ 1 + log_n2_flowadj * phase + (1 + log_n2_flowadj | site) and was fit using 4 MCMC chains, each with 4000 iterations (500 warmup), resulting in 14,000 total post-warmup draws.
Random Effects
The standard deviation of the random intercepts is 1.5707 (95% CI: 0.8769 to 2.6264).
The standard deviation of the random slopes for log_n2_flowadj is 0.1099 (95% CI: 0.0563 to 0.1904).
Correlation: There is a strong negative correlation (-0.9582) between the random intercepts and slopes.
Fixed Effects
The main effect of log_n2_flowadj is 0.5000 (95% CI: 0.3978 to 0.6008), indicating a credible effect.
Compared to the reference phase (i.e., pre-VOC):
The Alpha phase has a credibly lower intercept (-1.7622; 95% CI: -2.6271 to -0.8783 ). This interval does not include 0.
The Delta phase is not credibly different (0.1541, 95% CI: -0.7615 to 1.0876). This interval includes 0.
The Omicron phase has a credibly lower intercept (-1.7788, 95% CI: -2.7411 to -0.8160). This interval does not include 0.
The interactions between log_n2_flowadj and SARS-CoV-2 VOCs:
Alpha: Credible positive interaction (0.1119, 95% CI: 0.0395 to 0.1829). This interval does not include 0.
Delta: No credible interaction (-0.0259, 95% CI: -0.1026 to 0.0489). This interval includes 0.
Omicron: Credible positive interaction (0.1457, 95% CI: 0.0673 to 0.2249). This interval does not include 0.
The interactions between log_n2_flowadj and the Alpha and Omicron phases is as follows:
Alpha: The interaction term log_n2_flowadj:phasealpha has an estimate of 0.1119. Thus, for every 1 unit increase in log_n2_flowadj during the Alpha phase, there is an additional 0.11 log unit increase in log_case100000 compared to the reference phase (i.e., pre_VOC). The total effect of log_n2_flowadj during the Alpha period is: 0.5000 (main effect) + 0.1119 (interaction) = 0.6119. This means that for every 1 unit increase in log_n2_flowadj during the Alpha phase, log_case100000 is expected to increase by 0.61 log units, holding other variables constant.
Omicron: The interaction term log_n2_flowadj:phaseomicron has an estimate of 0.1457. For every 1 unit increase in log_n2_flowadj during the Omicron phase, there is an additional 0.1454 log unit increase in log_case100000. This indicates a stronger relationship between N2 gene concentration and COVID-19 cases for Omicron than Alpha phases. The total effect of log_n2_flowadj for Omicron is: 0.5000 (main effect) + 0.1457 (interaction) = 0.6457. For every 1 unit increase in log_n2_flowadj during the Alpha phase, log_case100000 is expected to increase by 0.64 log units, holding other variables constant.
Model Fit
The residual standard deviation (sigma) is 0.2736 (95% CI: 0.2612 to 0.2869).
The Rhat values are all close to 1, indicating good convergence.
The effective sample sizes (ESS) are generally high, suggesting reliable posterior estimates.
PPC and Conditional Effects
The observed data fell within both 50% and 95% LOO intervals and the conditional effects plot show a positive, linear relationship between flow-adjusted N2 and COVID-19 cases, with visual differences in the slopes for Alpha and Omicron compared to pre-VOC.
In summary, Model 3 suggests that the relationship between flow-adjusted N2 and COVID-19 cases varies across different variant phases, with credible differences for Alpha and Omicron (but not Delta) variants in both intercepts and slopes when compared to pre-VOC (the reference phase).
Overview of Random Effects for Intercepts
These represent how much each site’s intercept deviates from the grand mean intercept:
Highest intercepts: SS2, SS5, SS7, and SS8
Lowest intercepts: SS1, SS4, SS6, and SS9
Overview of Random Effects for Slopes
These results show how the effect of log_n2_flowadj varies by site:
Highest slopes: SS1, SS4, SS9, and SS11
Lowest slopes: SS2, SS5, SS7, and SS8
Model Precision
The residual standard deviation (sigma) is estimated at 0.2736 (95% CI: 0.2612 to 0.2869), indicating the unexplained variability in the model.
Random Effects Correlation
The correlation between random intercepts and slopes (cor_site__Intercept__log_n2_flowadj) is strongly negative (-0.9582), suggesting that sites with higher baseline log case counts tend to have negative associations between log_n2_flowadj and population-normalized log case counts, and vice versa. In other words, the reciprocal relationship between the effect sizes of the intercepts and slopes for SS2, SS5, SS7, and SS8 suggests that populations with more severe outbreaks stabilize or decrease more quickly, while those with less severe outbreaks tend to increase more quickly over time.
Summary
In summary, Model 3 reveals credible variation across sites in both baseline log case counts and the strength of the association between N2 load interacting with VOCs and COVID-19 cases. The phase-specific effects of Alpha and Omicron varied across the sewersheds.
Computed from 14000 by 892 log-likelihood matrix.
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.3]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
Computed from 14000 by 892 log-likelihood matrix.
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.6, 1.4]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
Computed from 14000 by 892 log-likelihood matrix.
------
MCSE of elpd_loo is 0.1.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 1.2]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
elpd_diff se_diff
mo3 0.0 0.0
mo2 -126.4 15.1
mo1 -139.6 15.6
Summary of LOO-CV
The models are ordered from best to worst predictive performance.
elpd_diff: This column shows the difference in expected log predictive density (ELPD) between each model and the top-performing model. A higher ELPD indicates better predictive performance.
se_diff: This column shows the standard error of the difference in ELPD.
Model 3:
This is the best-performing model because it has the highest predictive accuracy for the models.
Model 2:
The ELPD difference: -126.4. This model performs worse than Model 3 because the difference is substantial (more than 2 SE away from 0).
Model 1:
2SE Rule
Using the “2SE rule” to estimate 95% confidence intervals, the second best performing model (Model 2) has a 95% CI of -126.4 + c(-2,2)*15.1 = (-156.6 and -0.96.2), which does not contain 0. The ratio of abs(elpd_diff/se_diff) = 8.371, which is greater than 2. A ratio of 2 is a common threshold for evaluating performance comparisons. Model 3 is the worst performing model with a CI of (-170.8, -108.4) that also does not contain 0. Model 3 clearly outperforms the other two models in terms of predictive accuracy because it accounts for the interaction between N2 gene and VOCs while allowing the slope and intercepts to vary by sewershed site.
// generated with brms 2.21.0
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
}
parameters {
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
array[M_1] vector[N_1] z_1; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
lprior += student_t_lpdf(Intercept | 3, 8.5, 2.5);
lprior += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
}
target += normal_lpdf(Y | mu, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept;
// additionally sample draws from priors
real prior_Intercept = student_t_rng(3,8.5,2.5);
real prior_sigma = student_t_rng(3,0,2.5);
real prior_sd_1 = student_t_rng(3,0,2.5);
// use rejection sampling for truncated priors
while (prior_sigma < 0) {
prior_sigma = student_t_rng(3,0,2.5);
}
while (prior_sd_1 < 0) {
prior_sd_1 = student_t_rng(3,0,2.5);
}
}
// generated with brms 2.21.0
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int<lower=1> Kc; // number of population-level effects after centering
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // regression coefficients
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
array[M_1] vector[N_1] z_1; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
lprior += student_t_lpdf(Intercept | 3, 8.5, 2.5);
lprior += student_t_lpdf(sigma | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
// additionally sample draws from priors
real prior_Intercept = student_t_rng(3,8.5,2.5);
real prior_sigma = student_t_rng(3,0,2.5);
real prior_sd_1 = student_t_rng(3,0,2.5);
// use rejection sampling for truncated priors
while (prior_sigma < 0) {
prior_sigma = student_t_rng(3,0,2.5);
}
while (prior_sd_1 < 0) {
prior_sd_1 = student_t_rng(3,0,2.5);
}
}
// generated with brms 2.21.0
functions {
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int<lower=1> Kc; // number of population-level effects after centering
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
int prior_only; // should the likelihood be ignored?
}
transformed data {
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // regression coefficients
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
array[M_1] vector[N_1] z_1; // standardized group-level effects
}
transformed parameters {
vector[N_1] r_1_1; // actual group-level effects
real lprior = 0; // prior contributions to the log posterior
r_1_1 = (sd_1[1] * (z_1[1]));
lprior += normal_lpdf(b | 0, 1);
lprior += normal_lpdf(Intercept | 0, 100);
lprior += cauchy_lpdf(sigma | 0, 5)
- 1 * cauchy_lccdf(0 | 0, 5);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 1 * student_t_lccdf(0 | 3, 0, 2.5);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(z_1[1]);
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
// additionally sample draws from priors
real prior_b = normal_rng(0,1);
real prior_Intercept = normal_rng(0,100);
real prior_sigma = cauchy_rng(0,5);
real prior_sd_1 = student_t_rng(3,0,2.5);
// use rejection sampling for truncated priors
while (prior_sigma < 0) {
prior_sigma = cauchy_rng(0,5);
}
while (prior_sd_1 < 0) {
prior_sd_1 = student_t_rng(3,0,2.5);
}
}
// generated with brms 2.21.0
functions {
/* compute correlated group-level effects
* Args:
* z: matrix of unscaled group-level effects
* SD: vector of standard deviation parameters
* L: cholesky factor correlation matrix
* Returns:
* matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int<lower=1> Kc; // number of population-level effects after centering
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
int<lower=1> NC_1; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
}
transformed data {
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // regression coefficients
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
matrix[M_1, N_1] z_1; // standardized group-level effects
cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}
transformed parameters {
matrix[N_1, M_1] r_1; // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_1;
vector[N_1] r_1_2;
real lprior = 0; // prior contributions to the log posterior
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_1 = r_1[, 1];
r_1_2 = r_1[, 2];
lprior += normal_lpdf(b | 0, 1);
lprior += normal_lpdf(Intercept | 0, 100);
lprior += cauchy_lpdf(sigma | 0, 5)
- 1 * cauchy_lccdf(0 | 0, 5);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 2 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(to_vector(z_1));
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// additionally sample draws from priors
real prior_b = normal_rng(0,1);
real prior_Intercept = normal_rng(0,100);
real prior_sigma = cauchy_rng(0,5);
real prior_sd_1 = student_t_rng(3,0,2.5);
real prior_cor_1 = lkj_corr_rng(M_1,1)[1, 2];
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
// use rejection sampling for truncated priors
while (prior_sigma < 0) {
prior_sigma = cauchy_rng(0,5);
}
while (prior_sd_1 < 0) {
prior_sd_1 = student_t_rng(3,0,2.5);
}
}
// generated with brms 2.21.0
functions {
/* compute correlated group-level effects
* Args:
* z: matrix of unscaled group-level effects
* SD: vector of standard deviation parameters
* L: cholesky factor correlation matrix
* Returns:
* matrix of scaled group-level effects
*/
matrix scale_r_cor(matrix z, vector SD, matrix L) {
// r is stored in another dimension order than z
return transpose(diag_pre_multiply(SD, L) * z);
}
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int<lower=1> Kc; // number of population-level effects after centering
// data for group-level effects of ID 1
int<lower=1> N_1; // number of grouping levels
int<lower=1> M_1; // number of coefficients per level
array[N] int<lower=1> J_1; // grouping indicator per observation
// group-level predictor values
vector[N] Z_1_1;
vector[N] Z_1_2;
int<lower=1> NC_1; // number of group-level correlations
int prior_only; // should the likelihood be ignored?
}
transformed data {
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // regression coefficients
real Intercept; // temporary intercept for centered predictors
real<lower=0> sigma; // dispersion parameter
vector<lower=0>[M_1] sd_1; // group-level standard deviations
matrix[M_1, N_1] z_1; // standardized group-level effects
cholesky_factor_corr[M_1] L_1; // cholesky factor of correlation matrix
}
transformed parameters {
matrix[N_1, M_1] r_1; // actual group-level effects
// using vectors speeds up indexing in loops
vector[N_1] r_1_1;
vector[N_1] r_1_2;
real lprior = 0; // prior contributions to the log posterior
// compute actual group-level effects
r_1 = scale_r_cor(z_1, sd_1, L_1);
r_1_1 = r_1[, 1];
r_1_2 = r_1[, 2];
lprior += normal_lpdf(b | 0, 1);
lprior += normal_lpdf(Intercept | 0, 100);
lprior += cauchy_lpdf(sigma | 0, 5)
- 1 * cauchy_lccdf(0 | 0, 5);
lprior += student_t_lpdf(sd_1 | 3, 0, 2.5)
- 2 * student_t_lccdf(0 | 3, 0, 2.5);
lprior += lkj_corr_cholesky_lpdf(L_1 | 1);
}
model {
// likelihood including constants
if (!prior_only) {
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept;
for (n in 1:N) {
// add more terms to the linear predictor
mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n];
}
target += normal_id_glm_lpdf(Y | Xc, mu, b, sigma);
}
// priors including constants
target += lprior;
target += std_normal_lpdf(to_vector(z_1));
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
// compute group-level correlations
corr_matrix[M_1] Cor_1 = multiply_lower_tri_self_transpose(L_1);
vector<lower=-1,upper=1>[NC_1] cor_1;
// additionally sample draws from priors
real prior_b = normal_rng(0,1);
real prior_Intercept = normal_rng(0,100);
real prior_sigma = cauchy_rng(0,5);
real prior_sd_1 = student_t_rng(3,0,2.5);
real prior_cor_1 = lkj_corr_rng(M_1,1)[1, 2];
// extract upper diagonal of correlation matrix
for (k in 1:M_1) {
for (j in 1:(k - 1)) {
cor_1[choose(k - 1, 2) + j] = Cor_1[j, k];
}
}
// use rejection sampling for truncated priors
while (prior_sigma < 0) {
prior_sigma = cauchy_rng(0,5);
}
while (prior_sd_1 < 0) {
prior_sd_1 = student_t_rng(3,0,2.5);
}
}
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
time zone: America/New_York
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rstan_2.32.6 StanHeaders_2.32.10 emmeans_1.10.4 moments_0.14.1 EnvStats_3.0.0 MASS_7.3-60.2
[7] logspline_2.1.22 bayestestR_0.14.0 glmmTMB_1.1.9 ggeffects_1.7.0 tidybayes_3.0.6 ggplot2_3.5.1
[13] posterior_1.6.0 bayesplot_1.11.1 loo_2.8.0 dplyr_1.1.4 brms_2.21.0 Rcpp_1.0.13
[19] future_1.34.0
loaded via a namespace (and not attached):
[1] RColorBrewer_1.1-3 tensorA_0.36.2.1 rstudioapi_0.16.0 jsonlite_1.8.8 shape_1.4.6.1 magrittr_2.0.3
[7] TH.data_1.1-2 estimability_1.5.1 nloptr_2.1.1 GlobalOptions_0.1.2 vctrs_0.6.5 minqa_1.2.7
[13] terra_1.7-78 htmltools_0.5.8.1 distributional_0.4.0 curl_5.2.1 raster_3.6-26 sass_0.4.9
[19] parallelly_1.38.0 bslib_0.8.0 sandwich_3.1-1 zoo_1.8-12 lubridate_1.9.3 cachem_1.1.0
[25] TMB_1.9.14 mime_0.12 lifecycle_1.0.4 iterators_1.0.14 pkgconfig_2.0.3 Matrix_1.7-0
[31] R6_2.5.1 fastmap_1.2.0 shiny_1.9.1 clue_0.3-65 digest_0.6.36 numDeriv_2016.8-1.1
[37] colorspace_2.1-1 S4Vectors_0.42.1 fansi_1.0.6 timechange_0.3.0 abind_1.4-5 mgcv_1.9-1
[43] compiler_4.4.1 withr_3.0.1 doParallel_1.0.17 backports_1.5.0 inline_0.3.19 QuickJSR_1.3.1
[49] hexbin_1.28.3 pkgbuild_1.4.4 rjson_0.2.21 rasterVis_0.51.6 prophet_1.0 tools_4.4.1
[55] httpuv_1.6.15 glue_1.7.0 nlme_3.1-164 promises_1.3.0 grid_4.4.1 checkmate_2.3.2
[61] cluster_2.1.6 generics_0.1.3 diffobj_0.3.5 gtable_0.3.5 tidyr_1.3.1 sp_2.1-4
[67] utf8_1.2.4 BiocGenerics_0.50.0 foreach_1.5.2 pillar_1.9.0 ggdist_3.3.2 stringr_1.5.1
[73] later_1.3.2 circlize_0.4.16 splines_4.4.1 lattice_0.22-6 survival_3.6-4 deldir_2.0-4
[79] tidyselect_1.2.1 ComplexHeatmap_2.20.0 knitr_1.48 arrayhelpers_1.1-0 gridExtra_2.3 V8_5.0.0
[85] IRanges_2.38.1 stats4_4.4.1 xfun_0.46 bridgesampling_1.1-2 matrixStats_1.3.0 stringi_1.8.4
[91] boot_1.3-30 codetools_0.2-20 interp_1.1-6 tibble_3.2.1 cli_3.6.3 RcppParallel_5.1.8
[97] xtable_1.8-4 munsell_0.5.1 jquerylib_0.1.4 globals_0.16.3 coda_0.19-4.1 png_0.1-8
[103] svUnit_1.0.6 parallel_4.4.1 rstantools_2.4.0 latticeExtra_0.6-30 jpeg_0.1-10 Brobdingnag_1.2-9
[109] lme4_1.1-35.5 listenv_0.9.1 viridisLite_0.4.2 mvtnorm_1.2-5 scales_1.3.0 insight_0.20.3
[115] purrr_1.0.2 crayon_1.5.3 GetoptLong_1.0.5 rlang_1.1.4 multcomp_1.4-26