In this example we demonstrate how to set correlation structures for latent variables and random effects in gllvm. The species abundances are often correlated in time or spatially in community ecology. In gllvm, such correlation structures can now be set either for latent variables or comunity level random row effects.
Denote the abundance of the \(j\)th species (\(j\in \{1,\dots, p\}\)) at the \(i\)th site (\(i\in \{1,\dots, n\}\)) as \(y_{ij}\). A set of k environmental variables, or experimental treatments, may also be recorded at each site and stored in the vector \(\boldsymbol{x}_i = (x_{i1}, ..., x_{ik})\). A common form for the GLLVM is then defined for the mean abundance \(\mu_{ij}\):
\[g(\mu_{ij}) = \eta_{ij} = r_{s_a(i)} + \beta_{0j} + \boldsymbol{x}_i'\boldsymbol{\beta}_j + \boldsymbol{u}_{s_u(i)}'\boldsymbol{\theta}_j,\]
Now we can specify a correlation structure for latent variables \(\boldsymbol{u}_{s_u(i)} = ({u}_{1,s_u(i)}, \dots,
{u}_{d,s_u(i)})'\) (\(d=\)
number of latent variables, defined by num.lv in
gllvm) and/or community level row effects \(r_{s_r(i)}\). The subscripts \(s_u(i)\) defines the structure of the
latent variables. For instance, assume that we have a hierarchical
sampling design with \(n_s<n\) sites
and sites are sampled \(n_t\) times in
consecutive years. Thus we could want to include a site level latent
variables such that \(s_u(i) = k\) if
sampling unit \(i\) is from site \(k\) (\(k\in
\{1,\dots, n_s\}\)). Similarly, we can set a structure for the
row effects defining subscript \(s_r(i)\), eg. for time \(s_r(i) = t\) if sampling unit \(i\) is from sampling year \(t\) (\(t\in
\{1,\dots, n_t\}\)). Then we can define the correlation
structures for latent variables and/or row effects: denote \(\boldsymbol{u}_{q.} = (u_{q1}, ...,
u_{qn_s})\) \[
{u}_{q.} \sim N(\boldsymbol{0}, \Sigma_q), \,  q=1,\dots, d.
\] The covariance matrix \(\Sigma_q\) defines the correlation
structure and has unit variances on diagonal. The options are identity
(independent), AR(1) correlation, exponentially decaying correlation
structure and Matern correlation. Currently we can only use the same
structure for all latent variables \(q=1,\dots,d\), but the parameters defining
the strength of the correlation is estimated uniquely for each latent
variable.
Similarly, we can define the correlation structure for random row effect, denote \(\boldsymbol{r} = (r_{1}, ..., r_{n_t})\), and set \[ {r} \sim N(\boldsymbol{0}, \Sigma_r), \] where the covariance matrix \(\Sigma_r\) is a \(n_t \times n_t\) matrix and defines the correlation structure for random row effect. For row effects we have the same options as we have for latent variables. The structure for the row effects can be different from the structure of the latent variables and this is illustrated in the case study below.
As an example, we use time series of sessile algal and invertebrate species from permanent transects (Arkema et al. 2009, Reed and Miller (2023)). Data consists of percent cover of 132 species from 11 sites which each have 2-8 transects, yearly recorded from 2000 to 2021. It also includes measurements of the environment; the number of stripes of giant kelp and percent rock. In total, data has 836 sampling units.
Species percent cover matrix is in Yabund, for the
analyses we consider only sessile invertebrates and include species
observed at least 10 times. The rarest species are modeled as a sum.
##    AB AL AMZO   ANSP AR ARUD     AS ATM     AU     BA BAEL BCAL     BF BLD     BN
## 1   0  0    0 0.0125  0    0 0.1125   0 0.0125 0.0000    0    0 0.0000   0 0.0125
## 2   0  0    0 0.0000  0    0 0.0000   0 0.0000 0.0000    0    0 0.0125   0 0.0000
## 3   0  0    0 0.0000  0    0 0.0750   0 0.0000 0.0000    0    0 0.0000   0 0.0000
## 4   0  0    0 0.0000  0    0 0.0125   0 0.0000 0.0000    0    0 0.0125   0 0.0000
## 5   0  0    0 0.0000  0    0 0.0000   0 0.0000 0.0000    0    0 0.0000   0 0.0000
## 6   0  0    0 0.0000  0    0 0.0000   0 0.0000 0.0000    0    0 0.0000   0 0.0000
## 7   0  0    0 0.0000  0    0 0.0000   0 0.0000 0.0000    0    0 0.0000   0 0.0000
## 8   0  0    0 0.0000  0    0 0.0000   0 0.0000 0.0000    0    0 0.0000   0 0.0000
## 9   0  0    0 0.0000  0    0 0.0500   0 0.0000 0.0125    0    0 0.0000   0 0.0000
## 10  0  0    0 0.0000  0    0 0.0250   0 0.0000 0.0000    0    0 0.0125   0 0.0000
##    BO BOW BPSE     BR    BRA
## 1   0   0    0 0.0250 0.0375
## 2   0   0    0 0.0250 0.0125
## 3   0   0    0 0.0375 0.0000
## 4   0   0    0 0.0000 0.0000
## 5   0   0    0 0.0125 0.0000
## 6   0   0    0 0.0000 0.0125
## 7   0   0    0 0.0000 0.0000
## 8   0   0    0 0.0000 0.0000
## 9   0   0    0 0.1000 0.0000
## 10  0   0    0 0.1000 0.0875Yinvert <- Yabund[SPinfo$GROUP == "INVERT"]
Yinvert10 = Yinvert[,colSums(Yinvert>0, na.rm = TRUE)>9]
# Sum species that were observed less than 9 times, 
Yinvert10$Sum_inv = rowSums(Yinvert[,(colSums(Yinvert>0, na.rm = TRUE)<=9)], na.rm = TRUE)There is information about the species:
##   SP_CODE  GROUP            COMMON_NAME     SCIENTIFIC_NAME TAXON_KINGDOM
## 1      AB INVERT Coarse Sea Fir Hydroid    Abietinaria spp.      Animalia
## 2      AL INVERT  Aggregating cup Coral    Astrangia haimei      Animalia
## 3    AMZO  ALGAE                   <NA> Amphiroa beauvoisii       Plantae
## 4    ANSP INVERT    Aggregating anemone    Anthopleura spp.      Animalia
## 5      AR INVERT          Sand Tunicate  Eudistoma psammion      Animalia
## 6    ARUD INVERT              Octocoral   Discophyton rudyi      Animalia
##   TAXON_PHYLUM     TAXON_CLASS     TAXON_ORDER    TAXON_FAMILY TAXON_GENUS
## 1     Cnidaria        Hydrozoa    Leptothecata   Sertulariidae Abietinaria
## 2     Cnidaria        Anthozoa    Scleractinia    Rhizangiidae   Astrangia
## 3   Rhodophyta Florideophyceae    Corallinales Lithophyllaceae    Amphiroa
## 4     Cnidaria        Anthozoa      Actiniaria      Actiniidae Anthopleura
## 5     Chordata      Ascidiacea Aplousobranchia   Polycitoridae   Eudistoma
## 6     Cnidaria        Anthozoa      Alcyonacea     Alcyoniidae DiscophytonSPinfo10 = SPinfo[SPinfo$SP_CODE %in% colnames(Yinvert10),]
SPinfo10 = rbind(SPinfo10,c("Sum_inv","INVERT", rep(NA, ncol(SPinfo10)-2)))And finally, information about the study design (site name, year, transect index) and environmental coefficients (average number of kelp fronds, percent rock). The distribution of kelp fronds is very skewed to the right so a log of the number of kelp fronds is used as a covariate instead. In addition, covariates are scaled.
##   SITE YEAR TRANSECT KELP_FRONDS PERCENT_ROCKY
## 1 ABUR 2001        1     13.8750          35.0
## 2 ABUR 2001        2      0.0000          37.5
## 3 ABUR 2002        1      8.5125          10.0
## 4 ABUR 2002        2      0.3875           2.5
## 5 ABUR 2003        1      0.6000           0.0
## 6 ABUR 2003        2      0.0000           0.0par(mfrow=c(1,2))
hist(Xenv$KELP_FRONDS, main = "KELP FRONDS")
Xenv$logKELP_FRONDS = log(Xenv$KELP_FRONDS+1)
Xenv$logKELP_FRONDSsc = scale(Xenv$logKELP_FRONDS)
Xenv$PERCENT_ROCKYsc = scale(Xenv$PERCENT_ROCKY)
hist(Xenv$logKELP_FRONDSsc, main = "log of KELP FRONDS")plot of chunk Xcovar
For modeling percent cover data, has three options, beta for values (0,1), ordered beta for [0,1] and beta hurdle model for [0,1). Here we use beta-hurdle response model.