Introduction to Cormack-Jolly-Seber mark-recapture models in R
James E Paterson
Mark-recapture studies are useful for estimating population size and survival in wildlife populations, but can be overwhelming because of the massive amount of literature on their development and application.
Let’s dip our toes into some capture-recapture models using the famous dipper
data set and the marked
package in R.
The syntax (and approach) is almost identical using RMark
to access the widely used MARK program. The code from this tutorial is available on my GitHub page for mark-recapture workshops
Let’s get started!
# Load packages
# install.packages("marked") # first time using the package
library(marked) # For building CJS models
What are Cormack-Jolly-Seber mark-recapture models for?
You went to a study site and sampled a wildlife population. You identified animals through marks, tags, patterns, or some other characteristic. Later, you went back and sampled the population again. Some of the animals in this sample will be recaptures of the previous sampling event, and some will be new. You want to estimate the probability of animals surviving between capture events.
There are two challenges with these data.
First, some animals died, some animals were born (or hatched or similar), some left the study area (emmigrated) and some moved into the study area (immigrated) between sampling events (open population).
Second, the probability of finding, catching, or detecting an animal in a capture event is less than one. Thus, some animals are alive and in the study area during a capture event, but aren’t detected.
How do we turn this information into estimates of survival?
Cormack-Jolly-Seber models are mark-recapture models used to estimate two parameters:
- detection probability (\(p_t\), the probability of encountering a live animal at time t)
- apparent survival (\(\Phi_t\), the probability of an animal surviving and remaining in the study area between time t and t + 1).
“Apparent” survival confounds surviving and remaining in the study area. For the rest of the post, I use “survival” when describing apparent survival for simplicity. But, be warned!
## ch sex
## 1 0000001 Female
## 2 0000001 Female
## 3 0000001 Female
## 4 0000001 Female
## 5 0000001 Female
## 6 0000001 Female
The capture histories are stored in a dataframe where:
- each row represents one individual
- the capture histories (0 for undetected, 1 for detected) are stored in a field named “ch”
- grouping variables or individual covariates can be added (e.g. “sex” in this data set)
Now we’re ready to build a basic CJS model.
Building CJS models
There are two components of the model (one for detection probability, one for survial probability). Each sub-model is a linear model (here on a logit scale to bound probabilities between 0 and 1). We’ll start with the default model of constant detection probability and constant survival probability.
##
## Elapsed time in minutes: 0.0068
##
## crm Model Summary
##
## Npar : 2
## -2lnL: 666.8377
## AIC : 670.8377
##
## Beta
## Estimate
## Phi.(Intercept) 0.2420302
## p.(Intercept) 2.2270627
##
## crm Model Summary
##
## Npar : 2
## -2lnL: 666.8377
## AIC : 670.8377
##
## Beta
## Estimate se lcl ucl
## Phi.(Intercept) 0.2420302 0.1020079 0.04209475 0.4419656
## p.(Intercept) 2.2270627 0.3252176 1.58963625 2.8644892
The estimates are on a logit scale, so let’s change them back to real estimates using the inverse logit. You can calculate:
- ‘by hand’,
- with the
plogis
function, or - with the
predict
function
All options should give you the same estimate.
# exp(x)/(1+exp(x)) or R function "plogis"
exp(cjs.m1$results$beta$Phi)/(1+exp(cjs.m1$results$beta$Phi)) # real Phi (survival) estimate by hand
## (Intercept)
## 0.5602139
## (Intercept)
## 0.5602139
# real estimates of Phi and p with the "predict" function
predict(cjs.m1,
newdata = data.frame(sex = c("Female", "Male")),
se = TRUE) # In this case there are no groups or covariates in the model, so "newdata" not used
## $Phi
## occ estimate se lcl ucl
## 1 1 0.5602139 0.02513211 0.5105221 0.6087273
##
## $p
## occ estimate se lcl ucl
## 1 2 0.9026536 0.0285769 0.8305649 0.9460628
Therefore, the probability of animal surviving between capture events was 0.56, and the probability of catching an animal during a capture event was 0.90.
In our example, we assumed all time intervals between captures were equal (so they default to 1). However, capture events are often not perfectly spaced in time. For example, capture events may have been delayed due to weather constraints or there are missing surveys because someone was sick.
cjs.m1.unequaltime <- crm(dipper,
# time.interval vector is the interval between each capture event.
# for 7 capture events, there are 6 intervals
time.intervals = c(1,3,1,1,4,5))
##
Number of evaluations: 100 -2lnl: 746.9322147
## Elapsed time in minutes: 0.0052
## $Phi
## occ estimate
## 1 6 0.81671
##
## $p
## occ estimate
## 1 7 0.8610409
# Notice the predicted survival is higher now because the time length between capture events is >= 1
Now we’re on our way to being mark-recapture professionals. However, we took a few shortcuts because the model structure we were using was so simple. Usually, there are multiple models that may fit the data, so we should fit candidate models and compare them.
A more flexible approach is to:
- Process data (using
process.data
) - Create design data (using
make.design.data
). This allows adding variables that depend on time (e.g. seasonal changes, weather, or catastrophic events) - Set-up and execute the entire candidate model set
The key to building multiple models is to use formulas for linear models. Each sub-model (detection and survival) is a linear model (on the logit scale).
logit(\(\Phi\)) = \(\beta_0\) + \(\beta_1\) * \(x_1\) + \(\beta_2\) * \(x_2\) +…
logit(\(p\)) = \(\beta_0\) + \(\beta_1\) * \(x_1\) + \(\beta_2\) * \(x_2\) +…
Where \(x_1\), …. are the predictor variables and \(\beta_1\)… are the estimates for how the variables are related to the responses.
Let’s use the same data set, but fit multiple models and compare them.
# It is good practice to follow these steps:
# 1. Process data
# 2. Create design data
# 3. Set-up and execute entire candidate model set
# The advantage is it is easier to add covariates (e.g. time-varying effects from weather or experiments)
# Process data (and set grouping variables)
dipper.proc <- process.data(dipper,
group = "sex") # group variable has to be in the data
# Make design data (from processed data)
dipper.ddl <- make.design.data(dipper.proc)
# Outine formulas for each parameter
Phi.dot <- list(formula=~1) # ~1 is always a constant (or single estimate)
Phi.sex <- list(formula=~sex) # This formula will have an intercept (for females) and an estimate for the difference between females and males
p.sex <- list(formula=~sex) # Be careful of case-sensitive names. Use the exact group column that was in data
# Make new model (using design data) with constant survival, but different detection probabilities between sexes
cjs.m2 <- crm(dipper.proc,
dipper.ddl,
model.parameters = list(Phi = Phi.dot,
p = p.sex),
accumulate = FALSE)
##
Number of evaluations: 100 -2lnl: 666.3128716
## Elapsed time in minutes: 0.0032
##
## crm Model Summary
##
## Npar : 3
## -2lnL: 666.1934
## AIC : 672.1934
##
## Beta
## Estimate
## Phi.(Intercept) 0.2440005
## p.(Intercept) 1.9933749
## p.sexMale 0.5089575
# Is cjs.m2 a better fit than Phi.dot.p.dot (i.e. "cjs.m1")? (hint: look at AIC)
cjs.m3 <- crm(dipper.proc,
dipper.ddl,
model.parameters = list(Phi = Phi.sex,
p = p.sex),
accumulate = FALSE)
##
Number of evaluations: 100 -2lnl: 666.1518679
Number of evaluations: 200 -2lnl: 666.151813
## Elapsed time in minutes: 0.0035
##
## crm Model Summary
##
## Npar : 4
## -2lnL: 666.1518
## AIC : 674.1518
##
## Beta
## Estimate
## Phi.(Intercept) 0.22319868
## Phi.sexMale 0.04163579
## p.(Intercept) 2.01109165
## p.sexMale 0.47516992
You can see how using equations for each parameter is a flexible way of generating different models. However, this approach takes a long time for even a modest (>5) numbers of models. An even better approach is to use a function to generate the subsets of models to fit and then compare the models in a table.
# You almost always will fit more than a few models, so manually creating each model
# would take a long time and have a high chance of making a mistake.
# It is more efficient (and tidy) to use a function to automate model creation
# and put them in a table to rank based on AIC or QAIC
fit.dipper.cjs.models <- function(){
# Apparent survival (Phi) formula
Phi.sex.time <- list(formula=~sex*time) # Just like in other linear models "*" includes main effects and an interaction
Phi.time <- list(formula=~time) # differs between discrete times
Phi.sex <- list(formula=~sex) # differs between males and females
Phi.dot <- list(formula=~1) # constant survival
# Detection probability (p) formula
p.sex <- list(formula=~sex) # differs between males and females
p.time <- list(formula=~time) # one discrete estimate of p per capture event
p.dot <- list(formula=~1) # constant detection
# Construct all combinations and put into one model table
cml <- create.model.list(c("Phi","p")) # makes all possibile combinations of those parameter formulas
results <- crm.wrapper(cml,
data = dipper.proc,
ddl = dipper.ddl,
external = FALSE,
accumulate = FALSE,
hessian = TRUE)
return(results)
}
# Run function
dipper.cjs.models <- fit.dipper.cjs.models()
## Phi.dot.p.dot
##
## Elapsed time in minutes: 0.0036
## Phi.dot.p.sex
##
Number of evaluations: 100 -2lnl: 666.3128716
## Elapsed time in minutes: 0.0039
## Phi.dot.p.time
##
Number of evaluations: 100 -2lnl: 664.6522471
Number of evaluations: 200 -2lnl: 664.4802715
Number of evaluations: 300 -2lnl: 664.5413058
Number of evaluations: 400 -2lnl: 664.4809584
Number of evaluations: 500 -2lnl: 664.4801712
Number of evaluations: 100 -2lnl: 664.496642
Number of evaluations: 200 -2lnl: 664.4810316
## Elapsed time in minutes: 0.0041
## Phi.sex.p.dot
##
Number of evaluations: 100 -2lnl: 666.6859736
## Elapsed time in minutes: 0.0034
## Phi.sex.p.sex
##
Number of evaluations: 100 -2lnl: 666.1518679
Number of evaluations: 200 -2lnl: 666.151813
## Elapsed time in minutes: 0.0035
## Phi.sex.p.time
##
Number of evaluations: 100 -2lnl: 665.2316223
Number of evaluations: 200 -2lnl: 664.3177221
Number of evaluations: 300 -2lnl: 664.3076764
Number of evaluations: 400 -2lnl: 664.3163199
Number of evaluations: 500 -2lnl: 664.3572816
Number of evaluations: 600 -2lnl: 664.3042702
Number of evaluations: 100 -2lnl: 664.3162866
Number of evaluations: 200 -2lnl: 664.3072392
## Elapsed time in minutes: 0.0046
## Phi.sex.time.p.dot
##
Number of evaluations: 100 -2lnl: 658.2743479
Number of evaluations: 200 -2lnl: 658.7359792
Number of evaluations: 300 -2lnl: 658.2430781
Number of evaluations: 400 -2lnl: 658.2412897
Number of evaluations: 500 -2lnl: 658.2409277
Number of evaluations: 600 -2lnl: 658.2418873
Number of evaluations: 700 -2lnl: 658.2423799
Number of evaluations: 800 -2lnl: 658.242275
Number of evaluations: 900 -2lnl: 658.2415775
Number of evaluations: 1000 -2lnl: 658.2622653
Number of evaluations: 1100 -2lnl: 658.2413231
Number of evaluations: 1200 -2lnl: 658.3884781
Number of evaluations: 1300 -2lnl: 658.2409177
Number of evaluations: 100 -2lnl: 658.7848452
Number of evaluations: 200 -2lnl: 658.241138
Number of evaluations: 300 -2lnl: 658.3359205
Number of evaluations: 400 -2lnl: 658.2604029
Number of evaluations: 500 -2lnl: 658.2448818
Number of evaluations: 600 -2lnl: 658.247566
Number of evaluations: 700 -2lnl: 658.7940624
## Elapsed time in minutes: 0.0065
## Phi.sex.time.p.sex
##
Number of evaluations: 100 -2lnl: 657.9639451
Number of evaluations: 200 -2lnl: 657.9124693
Number of evaluations: 300 -2lnl: 657.9040659
Number of evaluations: 400 -2lnl: 657.9007529
Number of evaluations: 500 -2lnl: 657.8996276
Number of evaluations: 600 -2lnl: 657.899594
Number of evaluations: 700 -2lnl: 658.0528476
Number of evaluations: 800 -2lnl: 657.9072046
Number of evaluations: 900 -2lnl: 657.9940957
Number of evaluations: 1000 -2lnl: 657.8996
Number of evaluations: 1100 -2lnl: 657.9018488
Number of evaluations: 1200 -2lnl: 657.9000604
Number of evaluations: 1300 -2lnl: 657.9907821
Number of evaluations: 1400 -2lnl: 657.9073193
Number of evaluations: 1500 -2lnl: 657.899591
Number of evaluations: 100 -2lnl: 658.3568935
Number of evaluations: 200 -2lnl: 657.9211854
Number of evaluations: 300 -2lnl: 658.0006531
Number of evaluations: 400 -2lnl: 657.9001002
Number of evaluations: 500 -2lnl: 658.0017745
Number of evaluations: 600 -2lnl: 657.9012363
Number of evaluations: 700 -2lnl: 658.3609701
Number of evaluations: 800 -2lnl: 657.9001076
## Elapsed time in minutes: 0.0062
## Phi.sex.time.p.time
##
Number of evaluations: 100 -2lnl: 656.8666384
Number of evaluations: 200 -2lnl: 655.8307533
Number of evaluations: 300 -2lnl: 655.6269719
Number of evaluations: 400 -2lnl: 655.5286275
Number of evaluations: 500 -2lnl: 655.4913761
Number of evaluations: 600 -2lnl: 655.4803468
Number of evaluations: 700 -2lnl: 655.4750079
Number of evaluations: 800 -2lnl: 655.4733072
Number of evaluations: 900 -2lnl: 655.4733619
Number of evaluations: 1000 -2lnl: 655.7305761
Number of evaluations: 1100 -2lnl: 655.4783407
Number of evaluations: 1200 -2lnl: 655.4912409
Number of evaluations: 1300 -2lnl: 655.4885978
Number of evaluations: 1400 -2lnl: 655.4768671
Number of evaluations: 1500 -2lnl: 655.474615
Number of evaluations: 1600 -2lnl: 655.6564642
Number of evaluations: 1700 -2lnl: 655.4764062
Number of evaluations: 1800 -2lnl: 655.5424416
Number of evaluations: 1900 -2lnl: 655.4769784
Number of evaluations: 2000 -2lnl: 655.5443691
Number of evaluations: 2100 -2lnl: 655.4889033
Number of evaluations: 2200 -2lnl: 655.5437948
Number of evaluations: 2300 -2lnl: 655.4732849
Number of evaluations: 100 -2lnl: 655.5892537
Number of evaluations: 200 -2lnl: 655.5413415
Number of evaluations: 300 -2lnl: 655.5241377
Number of evaluations: 400 -2lnl: 655.4921313
Number of evaluations: 500 -2lnl: 655.4792322
Number of evaluations: 600 -2lnl: 655.4753242
Number of evaluations: 700 -2lnl: 655.8477481
Number of evaluations: 800 -2lnl: 655.4892412
Number of evaluations: 900 -2lnl: 655.8065579
Number of evaluations: 1000 -2lnl: 655.4773135
Number of evaluations: 1100 -2lnl: 655.577421
Number of evaluations: 1200 -2lnl: 655.4783922
Number of evaluations: 1300 -2lnl: 655.4842987
## Elapsed time in minutes: 0.0098
## Phi.time.p.dot
##
Number of evaluations: 100 -2lnl: 659.7864808
Number of evaluations: 200 -2lnl: 659.7964259
Number of evaluations: 300 -2lnl: 659.7311214
Number of evaluations: 400 -2lnl: 659.8513169
Number of evaluations: 100 -2lnl: 659.7964376
Number of evaluations: 200 -2lnl: 659.7675953
## Elapsed time in minutes: 0.0034
## Phi.time.p.sex
##
Number of evaluations: 100 -2lnl: 659.1953389
Number of evaluations: 200 -2lnl: 659.1583364
Number of evaluations: 300 -2lnl: 659.6055011
Number of evaluations: 400 -2lnl: 659.1586396
Number of evaluations: 500 -2lnl: 659.1748988
Number of evaluations: 100 -2lnl: 659.2247344
Number of evaluations: 200 -2lnl: 659.1905395
## Elapsed time in minutes: 0.0036
## Phi.time.p.time
##
Number of evaluations: 100 -2lnl: 657.9114231
Number of evaluations: 200 -2lnl: 657.0242112
Number of evaluations: 300 -2lnl: 656.9585639
Number of evaluations: 400 -2lnl: 656.950212
Number of evaluations: 500 -2lnl: 657.2868122
Number of evaluations: 600 -2lnl: 656.9532409
Number of evaluations: 700 -2lnl: 656.9719899
Number of evaluations: 800 -2lnl: 656.9543594
Number of evaluations: 900 -2lnl: 656.9670165
Number of evaluations: 1000 -2lnl: 656.9507391
Number of evaluations: 1100 -2lnl: 656.950212
Number of evaluations: 100 -2lnl: 658.2954998
Number of evaluations: 200 -2lnl: 656.9623214
Number of evaluations: 300 -2lnl: 657.0383159
Number of evaluations: 400 -2lnl: 656.9668133
Number of evaluations: 500 -2lnl: 657.019537
Number of evaluations: 600 -2lnl: 656.9523229
## Elapsed time in minutes: 0.0048
## model npar AIC DeltaAIC weight neg2lnl
## 1 Phi(~1)p(~1) 2 670.8377 0.000000 4.021185e-01 666.8377
## 2 Phi(~1)p(~sex) 3 672.1934 1.355702 2.041583e-01 666.1934
## 4 Phi(~sex)p(~1) 3 672.6762 1.838535 1.603693e-01 666.6762
## 10 Phi(~time)p(~1) 7 673.7301 2.892416 9.468339e-02 659.7301
## 5 Phi(~sex)p(~sex) 4 674.1518 3.314144 7.668259e-02 666.1518
## 11 Phi(~time)p(~sex) 8 675.1583 4.320639 4.635954e-02 659.1583
## 3 Phi(~1)p(~time) 7 678.4802 7.642502 8.806549e-03 664.4802
## 6 Phi(~sex)p(~time) 8 680.3043 9.466601 3.537592e-03 664.3043
## 12 Phi(~time)p(~time) 12 680.9502 10.112543 2.561198e-03 656.9502
## 7 Phi(~sex * time)p(~1) 13 684.2409 13.403249 4.941690e-04 658.2409
## 8 Phi(~sex * time)p(~sex) 14 685.8996 15.061922 2.156250e-04 657.8996
## 9 Phi(~sex * time)p(~time) 18 691.4733 20.635616 1.328578e-05 655.4733
## convergence
## 1 0
## 2 0
## 4 0
## 10 0
## 5 0
## 11 0
## 3 0
## 6 0
## 12 0
## 7 0
## 8 0
## 9 0
From the model table, we can see that the simplest model (constant survival, constant detection probability) is the most supported (lowest AIC). We already looked at the predicted survival and detection probability for this model (cjs.m1
above).
Time-dependent variables
The last section I want to discuss is how to add variables that change through time. Time-dependent variables are easy to add using the design data matrices.
As an example, we will add a Flood
variable to the dipper
design data (dipper.ddl
), and create another CJS model testing if flooding affects survival or detection probability.
Let’s say there was a flood at the study site during capture events 3 and 4. Flooding affects survival from event 3 to 4, and survival from event 4 to 5. Flooding affects detection probability at events 3 and 4.
# Sometimes, you may want to model a variable that changes through time,
# rather than a variable that differs between groups or individuals (e.g. sex, or some other demographic variable)
# As an example, we will model how flooding affects survival in dippers
# time varying variables can be added to the design matrix, which is part of the reason
# I suggested to get into the practice of creating processed and design data
# We will model Flood as a variable with two levels (floods during events 3, and 4)
# Note there is a difference between "time" and "Time" (very tricky!)
# Phi flood
dipper.ddl$Phi$Flood <- "noflood" # New variable set as 0
dipper.ddl$Phi$Flood[dipper.ddl$Phi$time==3 | dipper.ddl$Phi$time==4] <- "flood" # Phi 1 is survival from time 1 to time 2
# p flood
dipper.ddl$p$Flood <- "noflood"
dipper.ddl$p$Flood[dipper.ddl$p$time==3 | dipper.ddl$p$time==4] <- "flood"
# Check it worked
head(dipper.ddl$Phi)
## id occ time cohort age sex freq group Time Cohort Age order Flood
## 1 1 1 1 6 0 Female 12 Female 0 5 0 1 noflood
## 2 1 2 2 6 0 Female 12 Female 1 5 0 2 noflood
## 3 1 3 3 6 0 Female 12 Female 2 5 0 3 flood
## 4 1 4 4 6 0 Female 12 Female 3 5 0 4 flood
## 5 1 5 5 6 0 Female 12 Female 4 5 0 5 noflood
## 6 1 6 6 6 0 Female 12 Female 5 5 0 6 noflood
## id occ time cohort age sex freq group Time Cohort Age fix order
## 1 1 2 2 6 0 Female 12 Female 0 5 0 0 1
## 2 1 3 3 6 0 Female 12 Female 1 5 0 0 2
## 3 1 4 4 6 0 Female 12 Female 2 5 0 0 3
## 4 1 5 5 6 0 Female 12 Female 3 5 0 0 4
## 5 1 6 6 6 0 Female 12 Female 4 5 0 1 5
## 6 1 7 7 6 1 Female 12 Female 5 5 1 NA 6
## Flood
## 1 noflood
## 2 flood
## 3 flood
## 4 noflood
## 5 noflood
## 6 noflood
# Make a model with Flood variable
Phi.Flood <- list(formula =~Flood)
p.Flood <- list(formula =~Flood)
dipper.cjs.flood <- crm(dipper.proc,
dipper.ddl,
model.parameters = list(Phi = Phi.Flood, p = p.Flood),
accumulate = FALSE, hessian = TRUE)
##
Number of evaluations: 100 -2lnl: 666.5218993
Number of evaluations: 200 -2lnl: 666.5218993
## Elapsed time in minutes: 0.003
## $Phi
## Flood occ estimate se lcl ucl
## 1 noflood 6 0.5664593 0.0328039 0.5014029 0.6293029
## 2 flood 4 0.5504444 0.0401976 0.4710516 0.6273480
##
## $p
## Flood occ estimate se lcl ucl
## 1 noflood 7 0.9092540 0.03165140 0.8253035 0.9550588
## 2 flood 4 0.8783819 0.06003167 0.7059450 0.9560022
The estimates for survival (and detection) are very similar for events where the study was flooded and not flooded. This shouldn’t be too surprising as we just invented this variable.
To review, in this tutorial we covered:
- creating Cormack-Jolly-Seber models using formulas
- estimating survival and detection probabilities from models
- comparing models using model-selection (AIC, but can be done with AICc, QAIC, QAICc)
- adding variables to design data matrices
One thing we have not yet addressed is whether the model adequately fit the data. Goodness-of-fit tests should be part of any mark-recapture analyses.
In the next post, we will test how well CJS models fit the data in R.
Further reading
Don’t already have your data in capture-history format? Check-out my past post on creating capture histories.
Program MARK: a gentle introduction (e-book; free). http://www.phidot.org/software/mark/docs/book/
Phidot Forum (for MARK and RMark): http://www.phidot.org/forum/index.php
Williams, Byron K., James D. Nichols, and Michael J. Conroy. 2002. Analysis and management of animal populations (usually available at University libraries)
Rachel S. McCrea, Byron J. T. Morgan. 2014. Analysis of Capture-Recapture Data. https://www.crcpress.com/Analysis-of-Capture-Recapture-Data/McCrea-Morgan/p/book/9781439836590 (usually available at University libraries)
marked Package Vignette https://cran.r-project.org/web/packages/marked/vignettes/markedVignette.pdf