Implicit dynamics occupancy models in R

In this tutorial, I cover:

Implicit dynamics occupancy models with the R package RPresence. These models estimate occupancy probability when it changes through time without estimating colonization and extinction parameters.

The code and sample data from this tutorial are available on GitHub.

Implicit vs Explicit Dynamics in Occupancy Models

Dynamic occupancy models are for scenarios where occupancy state at any site changes between primary survey periods (e.g., years). My previous tutorial on dynamic occupancy models used the explicit formulation of that modelling approach, which explicitly estimates colonization (\(\gamma\)) and extinction (\(\epsilon\)) probabilities. Recall that in explicit dynamic occupancy models, the occupancy probability in each primary survey period after the first season is derived (i.e., estimated using the parameters in the model but is not a parameter in the model). An alternative approach with fewer parameters is implicit dynamics occupancy models, which estimate detection probability and occupancy in each primary survey period.

The implicit dynamics occupancy model can be useful when there is limited data because there are fewer parameters to estimate (2 vs 4).

How Implicit Dynamics Occupancy Models Work

The two parameters are similar to the single-season occupancy models:

detection probability (\(p_i,_t\)), the probability of finding a species at site i if it is present at time t
occupancy (\(\Psi_i,t\)), the probability that a species occurs at site i in each primary survey period t.

This approach has a somewhat restrictive assumption:

\(\epsilon_i,_t\) = 1 - \(\gamma_i,_t\) For site i at primary survey period t.

Still, it can be a useful model when there are sparse data.

Let’s look at an example.

Loading data to use in RPresence

To get started, we need data describing on which surveys that a species was detected or not detected. The example data are surveys for frogs calling where we assume occupancy status does not change within a year (the primary survey period). There are 3 surveys per year at each site over 5 years (15 total site visits). The data set is the same example data I used to demonstrate explicit dynamic occupancy models.

The RPresence package uses a slightly different data object type than unmarked, but the basic approach is the same. We need:

detection history data of 0, 1, or NA with rows for sites and columns for surveys
survey covariate data with rows for sites and columns for surveys
a matrix for site and/or time covariates that affect \(\Psi_t\) that has rows = sites x number of primary surveys, and columns for each variable. This is different than the formatting used for unmarked occupancy models.

library(dplyr) # for data organization
library(ggplot2) # for plotting
# install.packages('RPresence', repo='https://www.mbr-pwrc.usgs.gov/mbrCRAN') # first time only
library(RPresence)

# Load detection history (100 sites with 15 visits each)
detection_history <- read.csv("dynamic_detection_history.csv", 
                              # First variable ("X") has row.names but not data
                              row.names = "X") 

# Examine data
head(detection_history)

##   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
## 1  1  0  0  0  0  0  0  0  0   0   0   0   0   0   0
## 2  0  0  0  0  0  0  1  0  1   0   0   0   0   0   0
## 3  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0
## 4  0  0  0  0  0  0  0  0  1   0   0   0   0   0   0
## 5  0  0  0  0  0  0  0  0  0   0   1   0   0   0   0
## 6  0  1  0  1  0  1  0  0  0   0   0   0   0   0   0

# Load covariate data
effort <- read.csv("dynamic_effort.csv",
                   # First variable ("X") has row.names but not data
                   row.names = "X") 
site_cov <- read.csv("dynamic_site_cov.csv",
                     # First variable ("X") has row.names but not data
                     row.names = "X")

# RPresence uses createPao to create the input object for fitting occupancy models
frog_pao <- createPao( # data = a matrix with observed detection history 
                                          # 0's and 1's, one row per site, one column per survey
                                      data = as.matrix(detection_history),
                                      # nsurveyseason is a vector where the length = number of primary surveys
                                      # and the values are the number of secondary surveys
                                      nsurveyseason = rep(3,5),
                                      # survcov = survey covariates; a list where each element 
                                      # is a matrix the same size as "data"
                                      survcov = list(effort = as.matrix(effort)),
                                      # seasonncov is used for Explicit models (which we will compare to)
                                      seasncov = list(wetland_explicit_t = as.matrix(site_cov[,c(2, 5:8)])),
                                      # unitcov = dataframe with site rows x column variables, the site covariates
                                      # the first column of site_cov is the site number
                                      # Used for Explicit models (which we will compare to)
                                      unitcov = site_cov[,2:4],
                                      unitnames = as.character(site_cov$sites),
                                      title = "Frogs example") 

# Summary of pao occupancy model data
summary(frog_pao)

## paoname=pres.pao
## title=Frogs example
## Naive occ=0.57
##        nunits      nsurveys      nseasons nsurveyseason       methods 
##         "100"          "15"           "5"   "3,3,3,3,3"           "1" 
##      nunitcov      nsurvcov 
##           "3"           "2" 
## unit covariates  : wetland temp prec 
## survey covariates: effort SURVEY

# Setting-up wetland change through time as an occupancy covariate for implicit dynamics occupancy model
# All site covariates used for Psi model must be included in this dataframe
# From occMod() documentation for implicit dynamics ("occMod_DO4"):
# psi.cov should be a data frame containing the unit-specific covariates to use for the occupancy
# component of the model, with number of rows = data$nunits or data$nunits*data$nseasons. If the 
# shorter version of the data frame is supplied the rows are recycled to the longer length.
implicit_occ_psi_covs <- data.frame(wetland_t = c(site_cov$wetland, site_cov$wetland.2,
                               site_cov$wetland.3, site_cov$wetland.4,
                               site_cov$wetland.5),
                               temp = rep(site_cov$temp, 5),
                               prec = rep(site_cov$prec, 5))

Now that the data are organized, we can build models using the function occMod() and the type = “do.4”. We’ll build a model where:

\(\Psi_t\) is related to wetland amount during the primary survey period (wetland_t), site temperature (temp), and site precipitation (prec)
\(p_t\) is related to search effort (scaled in our data to mean = 0, sd = 1)

The variables for each parameter are specified with equations and must match the names within our pao object (frog_pao), except for the psi.cov component.

# Fit models with occMod()
# type = "do.4" for implicit dynamics occupancy
implicit_occ_m1 <- occMod(model = list(psi ~wetland_t + temp + prec,
                                       p ~ effort), 
                          cov.list = list(psi.cov = implicit_occ_psi_covs),
                          data = frog_pao,
                          type = "do.4")

# Look at occupancy coefficients
implicit_occ_m1$beta$psi

##                                  est       se
## psi1_A1.Int                -1.726916 0.466688
## psi1_A1.Int.psi1.wetland_t  3.231776 1.082804
## psi1_A1.Int.psi1.temp       0.157848 0.047338
## psi1_A1.Int.psi1.prec      -0.001340 0.034049

# Look at detection coefficients
implicit_occ_m1$beta$p

##                              est       se
## P[1-1]_D1.Int          -1.387104 0.192540
## P[1-1]_D1.Int.p.effort  0.056659 0.021839

What do our model coefficients above mean?

Occupancy in each time period is positively related to the amount of wetland (estimate positive), positively related to temperature (estimate positive), and not strongly related to the amount of precipitation (estimate close to 0 and overlaps zero)
Detection probability is positively related to search effort

Remember: the coefficient estimates are on the logit scale.

Unlike in explicit dynamic models, ocupancy probabilities beyond the first primary survey are not ‘derived’ parameters. However, the colonization (gamma) and extinction (epsilon) probabilities are included in the model object as derived parameters.

# Show first 6 rows of derived colonization (gamma) parameter
# row names = site_primary survey period
head(implicit_occ_m1$derived$gamma)

##           est         se lower_0.95 upper_0.95
## 1_1 0.8504866 0.08899526  0.5906605  0.9573094
## 2_1 0.5603327 0.10993615  0.3470332  0.7534561
## 3_1 0.3898104 0.06745188  0.2681722  0.5268984
## 4_1 0.3865307 0.11767132  0.1923929  0.6249701
## 5_1 0.4160844 0.09034024  0.2558502  0.5962620
## 6_1 0.2996474 0.08646611  0.1602293  0.4896434

# Show first 6 rows of derived extinction (epsilon) parameter
head(implicit_occ_m1$derived$epsilon)

##           est         se lower_0.95 upper_0.95
## 1_1 0.1495134 0.08899526  0.0426906  0.4093395
## 2_1 0.4396673 0.10993615  0.2465439  0.6529668
## 3_1 0.6101896 0.06745188  0.4731016  0.7318278
## 4_1 0.6134693 0.11767132  0.3750299  0.8076071
## 5_1 0.5839156 0.09034024  0.4037380  0.7441498
## 6_1 0.7003526 0.08646611  0.5103566  0.8397707

# Test whether epsion = 1 - gamma for first row
implicit_occ_m1$derived$epsilon$est[1] == 1 - implicit_occ_m1$derived$gamma$est[1]

## [1] TRUE

We will use this one model to predict occupancy at each site through time.

# Mean predicted occupancy probabilities can be accessed using:
predicted_occupancy <- implicit_occ_m1$real$psi %>%
  mutate(site_time = row.names(.),
         site = rep(x = c(1:100), 5),
         time = rep(x = 1:5, each = 100))

predicted_occupancy_summary <- predicted_occupancy %>%
  group_by(time) %>%
  summarize(mean_occ = mean(est),
            lower_0.95 = mean(est) - 1.96*mean(se),
            upper_0.95 = mean(est) + 1.96*mean(se)) %>%
  mutate(type = "implicit dynamics")

##                               est       se
## psi1_A1.Int              0.661709 2.502828
## psi1_A1.Int.psi.wetland -1.147814 4.418273
## psi1_A1.Int.psi.temp     0.644175 0.328885
## psi1_A1.Int.psi.prec    -0.038753 0.090326

##                                                   est       se
## epsilon1_C1.Int                              2.027540 1.182048
## epsilon1_C1.Int.epsilon.wetland_explicit_t1 -6.469573 2.808100

Now we have an estimate (with uncertainty) of the occupancy probability in all study sites during each year (primary survey). Let’s plot it alongside the predictions from the equivalent explicit dynamic occupancy model (code not shown, but on Github):

You can see that overall the point estimates and uncertainty are similar for each model type, but that they are not exactly the same. The trade-offs will depend on a number of things, including:

how much data you have (more data means generally more likely to be able to accurately estimate colonization and extinction parameters with explicit models)
how linked colonization and extinction probabilities are. Remember in the implicit dynamics model that extinction = 1 - colonization.

And that’s it!

We fit implicit dynamics (multi-season) occupancy models, predicted occupancy probabilities at each site and for the sample of surveyed sites, plus we compared the output to predictions from the explicit dynamic occupancy model.

Have suggestions for a tutorial you want on spatial ecology, mark-recapture models, occupancy models, or another related topic? Let me know!

Implicit vs Explicit Dynamics in Occupancy Models

How Implicit Dynamics Occupancy Models Work

Loading data to use in RPresence

Further reading and references