Mark-recapture models are a useful framework to test hypotheses about what drives differences in wildlife survival and detection probability. However, it is important to assess the goodness-of-fit (GOF) for these models before we make inferences.
What causes lack of fit?
- assumption violations (more on that below!),
- model misspecification (e.g. missing important variables), or
- unmodelled heterogeneity (e.g. some ‘trap-happy’ or ‘trap-shy’ animals)
In this tutorial, I cover testing assumptions of the Cormack-Jolly-Seber model using the R2ucare package.
The code from this tutorial is available on my GitHub page for mark-recapture workshops.
Let’s get started!
What are Cormack-Jolly-Seber mark-recapture models for?
Cormack-Jolly-Seber models are mark-recapture models used to estimate two parameters:
- detection probability (\(p_t\), the probability of encountering a live animal at time t)
- apparent survival (\(\Phi_t\), the probability of an animal surviving and remaining in the study area between time t and t + 1).
For an introduction to fitting CJS models, check out my tutorial on CJS models in R.
Assumptions of the Cormack-Jolly-Seber model
The four basic assumptions of this model are:
- Every marked animal present in the population at time t has the same probability of recapture (\(p_t\)) (“equal detection assumption”)
- Every marked animal in the population immediately after time t has the same probability of surviving to time t+1 (“equal survival assumption”)
- Marks are not lost or missed.
- Marked animals are released right after sampling, and sampling is essentially instantaneous compared to the interval between occasions.
Generally, assumptions 3 and 4 aren’t tested formally, but assumptions 1 and 2 can be tested with U-CARE via the R2ucare package. (Since this always confused me and I’m frequently asked: CARE = CApture REcapture.) R2ucare is an R translation of U-CARE and doesn’t require installing U-CARE itself.
# install.packages("R2ucare") # first time only
library(R2ucare) # For Goodness of fit tests
library(dplyr) # for tidy data
library(magrittr) # for pipes
The names of the tests we will use (“Test 1”, “Test 2”, and “Test 3”) are confusing and uninformative, but they are enshrined in the mark-recapture world now, so we just have to deal with them.
R2ucare requires capture history data in a slightly different format than unmarked or RMark use for model fitting: we need a matrix with a column for each capture event and a row for each individual.
Let’s load the dipper data set we used for fitting CJS models, but turn the capture history into a matrix. We’ll create different matrices for females and males (the grouping factor) to better diagnose any sources of heterogeneity in each group.
# Load dipper data (in "marked", but also in R2ucare package in different format)
# I'm loading `marked` version to be consistent with other tutorials/workshops
# and for fitting models below in "marked".
data(dipper, package = "marked")
# Full data
dipper.ch.gof <- dipper$ch %>%
strsplit('') %>%
sapply(`[`) %>%
t() %>%
unlist() %>%
as.numeric %>%
matrix(nrow = nrow(dipper))
# Females only
dipper.fem.ch.gof <- dipper$ch[dipper$sex == "Female"] %>%
strsplit('') %>%
sapply(`[`) %>%
t() %>%
unlist() %>%
as.numeric %>%
matrix(nrow = nrow(dipper[dipper$sex == "Female",]))
# Males only
dipper.mal.ch.gof <- dipper$ch[dipper$sex == "Male"] %>%
strsplit('') %>%
sapply(`[`) %>%
t() %>%
unlist() %>%
as.numeric %>%
matrix(nrow = nrow(dipper[dipper$sex == "Male",]))
Alright, now what about these “tests”?
- Test 1 = the omnibus or overall test. Overall, is there evidence that animals have equal capture probabilities and equal survival? Tells us if there’s a problem, but not where (which events) or why (which assumption is violated).
- Test 2 = Does recapture depend on when an animal was first marked? Tests the equal catchability assumption.
- Test 3 = Does marking affect survival? Tests the equal survival assumption.
Generally, try Test 1, and if there is evidence of lack-of-fit, use Test 2 and Test 3 to determine which assumptions are violated. Note: These tests are always for time-dependent models, i.e. “Phi.time.p.time”.
Let’s start with Test 1, considering just the female capture data.
# first argument = capture history matrix, second argument = capture history frequency (vector of 1's for our example)
overall_CJS(dipper.fem.ch.gof, rep(1,nrow(dipper[dipper$sex == "Female",]))) # Females only
## chi2 degree_of_freedom p_value
## Gof test for CJS model: 10.276 12 0.592
Using the \(\chi^2\) value and an alpha level of 0.05, we fail to reject the null hypothesis. Thus, there is no strong evidence for lack-of-fit, but we can’t really see what we’re testing: Test 1 is actually a combination of Test 2 and Test 3.
Next, let’s perform Tests 2 and 3 on the female subset to better understand the components and how they’re related.
Test 2 asks the question: does recapture depend on when an animal was first marked? There are two components: Test2.CT and Test2.CL.
Test2.CT tests whether detection probability at t + 1 differs between animals that were and were not captured at t, among animals known to be alive because they are recaptured later in the study.
# first argument = capture history matrix, second argument is frequency of each capture history (1 for example)
test2ct_fem <- test2ct(dipper.fem.ch.gof, rep(1,nrow(dipper[dipper$sex == "Female",])))
test2ct_fem
## $test2ct
## stat df p_val sign_test
## 3.250 4.000 0.517 -0.902
##
## $details
## component dof stat p_val signed_test test_perf
## 1 2 1 0 1 0 Fisher
## 2 3 1 0 1 0 Fisher
## 3 4 1 0 1 0 Fisher
## 4 5 1 3.25 0.071 -1.803 Fisher
Notice that there were not enough individuals for some components, which is often the case, depending on the capture histories in your data set.
Next, we perform Test2.CL. This tests whether the expected time of next recapture differs between individuals captured and not captured at t, among animals known to be alive.
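The output below comes from `test2cl()`, called with the same two arguments as before (I store the result in a hypothetical `test2cl_fem` object, mirroring the naming used for Test2.CT):

```r
# Test2.CL: capture history matrix + frequency vector, as for test2ct
test2cl_fem <- test2cl(dipper.fem.ch.gof, rep(1, nrow(dipper[dipper$sex == "Female",])))
test2cl_fem
```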
## $test2cl
## stat df p_val
## 0 0 1
##
## $details
## component dof stat p_val test_perf
## 1 2 0 0 0 None
## 2 3 0 0 0 None
## 3 4 0 0 0 None
Uh-oh! No tests were performed for test2cl (test_perf = “None” for all components) because of low sample sizes. These tests use contingency tables, so higher cell counts (more captures and recaptures) will increase your ability to perform the tests.
Next, we perform Test 3, which tests whether marking affects survival (equal survival assumption). There are two components to Test 3 (Test3.SR and Test3.SM).
Test3.SR: Does marking affect survival? Do individuals with previous marks have different survival rates than first-time captures?
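The output below comes from `test3sr()`, again with the capture history matrix and frequency vector (the `test3sr_fem` name is just my choice, to match the pattern above):

```r
# Test3.SR: does survival differ between newly marked and previously marked animals?
test3sr_fem <- test3sr(dipper.fem.ch.gof, rep(1, nrow(dipper[dipper$sex == "Female",])))
test3sr_fem
```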
## $test3sr
## stat df p_val sign_test
## 4.985 5.000 0.418 1.428
##
## $details
## component stat p_val signed_test test_perf
## 1 2 0.858 0.354 0.926 Chi-square
## 2 3 3.586 0.058 1.894 Chi-square
## 3 4 0.437 0.509 0.661 Chi-square
## 4 5 0.103 0.748 -0.321 Chi-square
## 5 6 0.001 0.982 0.032 Chi-square
There is no evidence that individuals with previous marks have different survival rates than individuals caught for the first time.
The final test component is Test3.SM. For animals seen again, does the timing of recapture depend on whether they were marked on or before time t?
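The output below comes from `test3sm()`, called the same way (again, `test3sm_fem` is just a name I chose for consistency):

```r
# Test3.SM: for animals seen again, does time to recapture depend on marking history?
test3sm_fem <- test3sm(dipper.fem.ch.gof, rep(1, nrow(dipper[dipper$sex == "Female",])))
test3sm_fem
```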
## $test3sm
## stat df p_val
## 2.041 3.000 0.564
##
## $details
## component stat df p_val test_perf
## 1 2 1.542 1 0.214 Fisher
## 2 3 0 1 1 Fisher
## 3 4 0.499 1 0.48 Fisher
## 4 5 0 0 0 None
## 5 6 0 0 0 None
The time to recapture of individuals known to be alive does not depend on when they were first marked. Notice again that some components (5, 6) were not performed because of low cell counts.
The overall test statistic (Test 1) is the sum of the test statistics from all component tests (Test2.CT + Test2.CL + Test3.SR + Test3.SM). Looking back at our Test 1, the \(\chi^2\) value was 10.276.
Let’s check by adding the \(\chi^2\) statistic from each component:
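One way to do this check (assuming the female capture matrix from above is still in memory) is to pull the `stat` element from each component test’s summary vector and add them up:

```r
# Each component test returns a named vector with its statistic in "stat";
# their sum should equal the overall (Test 1) chi-square statistic
fem.freq <- rep(1, nrow(dipper[dipper$sex == "Female",]))
test2ct(dipper.fem.ch.gof, fem.freq)$test2ct[["stat"]] +
  test2cl(dipper.fem.ch.gof, fem.freq)$test2cl[["stat"]] +
  test3sr(dipper.fem.ch.gof, fem.freq)$test3sr[["stat"]] +
  test3sm(dipper.fem.ch.gof, fem.freq)$test3sm[["stat"]]
```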
## [1] 10.276
Review
Wow, that was a lot! To summarize:
- The CJS assumptions of equal catchability and equal survival can be tested with the R2ucare package using Tests 1, 2, and 3.
- Test 1 (the omnibus test) is sufficient when there is no evidence of lack-of-fit.
- Test 2 and Test 3 are very useful for diagnosing when (which events) and why (detection or survival) there is evidence for lack-of-fit.
- The sub-components often lack adequate sample sizes when the capture data are sparse (e.g. few recaptures, or capture events with small numbers of animals).
In the next tutorial, I’ll introduce the variance inflation factor (\(\hat{c}\)) and how to adjust the model selection approach when \(\hat{c}\) is greater than 1.
Further reading
Don’t already have your data in capture-history format? Check out my past post on creating capture histories.
Program MARK: a gentle introduction (e-book; free). http://www.phidot.org/software/mark/docs/book/. I strongly recommend reading Chapter 5 on goodness-of-fit tests.
Phidot Forum (for MARK and RMark): http://www.phidot.org/forum/index.php
Williams, Byron K., James D. Nichols, and Michael J. Conroy. 2002. Analysis and management of animal populations (usually available at University libraries)
Rachel S. McCrea, Byron J. T. Morgan. 2014. Analysis of Capture-Recapture Data. https://www.crcpress.com/Analysis-of-Capture-Recapture-Data/McCrea-Morgan/p/book/9781439836590 (usually available at University libraries)
marked Package Vignette https://cran.r-project.org/web/packages/marked/vignettes/markedVignette.pdf
R2ucare Package Vignette https://cran.r-project.org/web/packages/R2ucare/vignettes/vignette_R2ucare.html