---
title: "Nonlinear models in flocker"
author: "Jacob Socolar"
date: "2023-10-21"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Nonlinear models in flocker}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
Here we show how we can use `flocker` to fit nonlinear occupancy models via 
`brms`. In most occupancy models, occupancy and detection probabilities are 
modeled as logit-linear combinations of covariates. In some models (e.g. those 
with splines or Gaussian processes), probabilities are modeled as the sum of 
more flexible functions of covariates. These are straightforward to fit in
`flocker` using the `brms` functions `s()`, `t2()`, and `gp()`; see the [flocker tutorial vignette](https://jsocolar.github.io/flocker/articles/flocker_tutorial.html) for 
details.
This vignette focuses on more complicated nonlinear models that require the use
of special nonlinear `brms` formulas. We showcase two models. The first fits
a parametric nonlinear predictor. The second fits a model with a spatially
varying coefficient that is given a gaussian process prior.
## Parametric nonlinear predictor
In this scenario, we consider a model where the response is a specific nonlinear 
parametric function whose parameters are fitted and might or might not depend on covariates. Suppose for example that an expanding population of a territorial 
species undergoes logistic growth, and also that some unknown proportion of 
territories are unsuitable due to an unobserved factor, such that occupancy 
asymptotes at some probability less than one. Thus, occupancy probability 
changes through time as $\frac{L}{1 + e^{-k(t-t_0)}}$, where $L$ is the 
asymptote, $k$ is a growth rate, $t$ is time, and $t_0$ is the timing of the 
inflection point. At multiple discrete times, we randomly sample several sites 
to survey, and survey each of those sites over several repeat visits.
```r
library(flocker); library(brms)
set.seed(3)
L <- 0.5
k <- .1
t0 <- -5
t <- seq(-15, 15, 1)
n_site_per_time <- 30
n_visit <- 3
det_prob <- .3
data <- data.frame(
  t = rep(t, n_site_per_time)
)
data$psi <- L/(1 + exp(-k*(t - t0)))
data$Z <- rbinom(nrow(data), 1, data$psi)
data$v1 <- data$Z * rbinom(nrow(data), 1, det_prob)
data$v2 <- data$Z * rbinom(nrow(data), 1, det_prob)
data$v3 <- data$Z * rbinom(nrow(data), 1, det_prob)
fd <- make_flocker_data(
  obs = as.matrix(data[,c("v1", "v2", "v3")]),
  unit_covs = data.frame(t = data[,c("t")]),
  event_covs <- list(dummy = matrix(rnorm(n_visit*nrow(data)), ncol = 3))
)
```
We wish to fit an occupancy model that recovers the unknown parameters $L$, $k$, 
and $t_0$. We can achieve this using the nonlinear formula syntax provided by 
`brms` via `flocker`. 
`flocker` will always assume that the occupancy formula is provided on the logit 
scale. Thus, we need to convert our nonlinear function giving the occupancy 
probability to a function giving the logit occupancy probability. A bit of 
simplification via Wolfram Alpha and we arrive at 
$\log(\frac{L}{1 + e^{-k(t - t_0)} - L})$. We then write a `brms` formula 
representing occupancy via this function. To specify a formula wherein a 
distributional parameter (`occ` in this case, referring to occupancy) is 
nonlinear we need to use `brms::set_nl()` rather than merely providing the 
`nl = TRUE` argument to `brms::bf()`.
`flocker`'s main fitting function `flock()` accepts `brmsformula` inputs to its 
`f_det` argument. When supplying a `brmsformula` to `f_det` (rather than the 
typical one-sided detection formula), the following behaviors are triggered:
* Several input checks are turned off.  For example, `flocker` no longer checks 
to ensure that event covariates are absent from the occupancy formula. 
`flocker` also no longer explicitly checks that formulas are provided for all of 
the required distributional terms for a given family (detection, occupancy, 
colonization, extinction, and autologistic terms, depending on the family).
* All inputs to `f_occ`, `f_col`, `f_ex`, `f_auto` are silently ignored. It is 
obligatory to pass the entire formula for all distributional parameters as a 
single `brmsformula` object. This means in turn that the user must be familiar 
with `flocker`'s internal naming conventions for all of the relevant 
distributional parameters (`det` and one or more of `occ`, `colo`, `ex`, 
`autologistic`, `Omega`). If fitting a data-augmented model, it will be required 
to pass the `Omega ~ 1` formula within the `brmsformula` (When passing the 
traditional one-sided formula to `f_det`, `flocker` includes the formula for 
`Omega` internally and automatically).
* Nonlinear formulas that involve data that are required to be positive might 
fail! Internally, some irrelevant data positions get filled with `-99`, but 
these positions might still get evaluated by the nonlinear formula, even though 
they make no contribution to the likelihood.
With all of that said, we can go ahead and fit this model!
```r
fit <- flock(f_det = brms::bf(
                 det ~ 1 + dummy,
                 occ ~ log(L/(1 + exp(-k*(t - t0)) - L)),
                 L ~ 1,
                 k ~ 1,
                 t0 ~ 1
               ) +
               brms::set_nl(dpar = "occ"),
             prior = 
               c(
                 prior(normal(0, 5), nlpar = "t0"),
                 prior(normal(0, 1), nlpar = "k"), 
                 prior(beta(1, 1), nlpar = "L", lb = 0, ub = 1)
                ),
             flocker_data = fd, 
             control = list(adapt_delta = 0.9),
             cores = 4)
```
```r
summary(fit)
#>  Family: occupancy_single 
#>   Links: mu = identity; occ = identity 
#> Formula: ff_y | vint(ff_n_unit, ff_n_rep, ff_Q, ff_rep_index1, ff_rep_index2, ff_rep_index3) ~ 1 + dummy 
#>          occ ~ log(L/(1 + exp(-k * (t - t0)) - L))
#>          L ~ 1
#>          k ~ 1
#>          t0 ~ 1
#>    Data: data (Number of observations: 2790) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Population-Level Effects: 
#>              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept       -0.91      0.13    -1.18    -0.66 1.00     2535     2318
#> L_Intercept      0.50      0.10     0.38     0.77 1.00     1241      864
#> k_Intercept      0.19      0.08     0.07     0.36 1.00     1283     1433
#> t0_Intercept    -5.90      3.38   -10.34     3.17 1.00     1248      921
#> dummy            0.00      0.08    -0.15     0.15 1.00     2620     2236
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
```
It works! 
Note that if desired, we could fit more complicated formulas than `~ 1` for any of the nonlinear parameters. For more see the [brms nonlinear model vignette](https://paul-buerkner.github.io/brms/articles/brms_nonlinear.html).
## Spatially varying coefficients via a Gaussian process
The `gp()` function in `brms` includes a Gaussian process of arbitrary dimension
in the linear predictor. We can use the nonlinear formula syntax to tell `brms`
to include a Gaussian process prior on a coefficient as well.
First we simulate some data wherein the logit of the occupancy probability
depends on a covariate, and the slope of the dependency is modeled via a 
two-dimensional spatial Gaussian process. It turns out that we will need
quite a few of data points to constrain the standard deviation of the Gaussian
process, so we simulate with 2000 sites:
```r
set.seed(1)
n <- 2000 # sample size
lscale <- 0.3 # square root of l of the gaussian kernel
sigma_gp <- 1 # sigma of the gaussian kernel
intercept <- 0 # occupancy logit-intercept
det_intercept <- -1 # detection logit-intercept
n_visit <- 4
# covariate data for the model
gp_data <- data.frame(
  x = rnorm(n), 
  y = rnorm(n),
  covariate = rnorm(n)
  )
# get distance matrix
dist.mat <- as.matrix(
  stats::dist(gp_data[,c("x", "y")])
  )
# get covariance matrix
cov.mat <- sigma_gp^2 * exp(- (dist.mat^2)/(2*lscale^2))
# simulate occupancy data
gp_data$coef <- mgcv::rmvn(1, rep(0, n), cov.mat)
gp_data$lp <- intercept + gp_data$coef * gp_data$covariate
gp_data$psi <- boot::inv.logit(gp_data$lp)
gp_data$Z <- rbinom(n, 1, gp_data$psi)
# simulate visit data
obs <- matrix(nrow = n, ncol = n_visit)
for(j in 1:n_visit){
  obs[,j] <- gp_data$Z * rbinom(n, 1, boot::inv.logit(det_intercept))
}
```
And here's how we can fit this model in `flocker`! Because we have a large number of sites, we use a [Hilbert space approximate Gaussian process](https://arxiv.org/abs/2004.11408) for
computational efficiency.
```r
fd2 <- make_flocker_data(obs = obs, unit_covs = gp_data[, c("x", "y", "covariate")])
svc_mod <- flock(
  f_det = brms::bf(
                 det ~ 1,
                 occ ~ occint + g * covariate,
                 occint ~ 1,
                 g ~ 0 + gp(x, y, scale = FALSE, k = 20, c = 1.25)
               ) +
               brms::set_nl(dpar = "occ"),
  flocker_data = fd2,
  cores = 4
)
```
```r
summary(svc_mod)
#>  Family: occupancy_single_C 
#>   Links: mu = identity; occ = identity 
#> Formula: ff_n_suc | vint(ff_n_trial) ~ 1 
#>          occ ~ occint + g * covariate
#>          occint ~ 1
#>          g ~ 0 + gp(x, y, scale = FALSE, k = 20, c = 1.25)
#>    Data: data (Number of observations: 2000) 
#>   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
#>          total post-warmup draws = 4000
#> 
#> Gaussian Process Terms: 
#>                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> sdgp(g_gpxy)       1.66      0.75     0.73     3.66 1.00     2849     3317
#> lscale(g_gpxy)     0.23      0.11     0.08     0.50 1.00     3850     3141
#> 
#> Population-Level Effects: 
#>                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
#> Intercept           -1.04      0.06    -1.15    -0.94 1.00     5532     2810
#> occint_Intercept     0.09      0.09    -0.08     0.27 1.00     5736     3182
#> 
#> Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
```
Again, it worked!