Mark-recapture models are a useful framework to test hypotheses about what drives differences in wildlife survival and detection probability. However, it is important to assess the goodness-of-fit (GOF) for these models before we make inferences.

What causes lack of fit? I covered that question in a previous tutorial on goodness-of-fit tests (see Further reading).

In this tutorial, I deal with overdispersion by:

  1. Calculating the variance inflation factor (\(\hat{c}\), “c-hat”).
  2. Adjusting model selection when there is evidence of overdispersion.

The code from this tutorial is available on my GitHub page for mark-recapture workshops.

Let’s get started!

What is \(\hat{c}\) (“c-hat”) or the variance inflation factor?

The variance inflation factor (\(\hat{c}\)) can help adjust model output for extra-binomial noise, or overdispersion. Extra-binomial noise means there is more variance in the capture-history data (binomial strings of 0s and 1s) than the model expects.

The variance inflation factor is also a measure of how well the model fits the observed data.

One (simple!) estimate is: \(\hat{c} = \chi^2 / \text{df}\) (in maths)

Or (in words): the summed \(\chi^2\) statistics from Test 2 and Test 3, divided by their total degrees of freedom (reported as Test 1 in the U-CARE components).

This method of \(\hat{c}\) estimation only applies if you’re using a time-dependent model as the general model. By time-dependent, I mean that there are separate estimates of p and \(\Phi\) for each time period. If you’re using a simpler general model, you will want to use a different estimate of \(\hat{c}\).

For other approaches, look up the median and bootstrap methods in *Program MARK: A Gentle Introduction* (see Further reading).

Since we have two groups in our dipper example (females and males), our estimate is:

\(\hat{c} = \dfrac{\chi^2_{\text{male}} + \chi^2_{\text{female}}}{\text{df}_{\text{male}} + \text{df}_{\text{female}}}\)

This represents a goodness-of-fit test for the most general model, Phi(~sex * time)p(~sex * time). The two blocks of output below are the per-sex U-CARE overall tests; the final value is our \(\hat{c}\) estimate.
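In R, computing this with the R2ucare package might look like the following sketch. The object names (`male.ch`, `male.freq`, `female.ch`, `female.freq`) are assumptions: 0/1 capture-history matrices and their frequencies, split by sex.

```r
library(R2ucare)

# Overall U-CARE test (Test 2 + Test 3 combined) for each sex.
# male.ch / female.ch: 0/1 capture-history matrices (names assumed);
# male.freq / female.freq: counts of each history.
overall_m <- overall_CJS(male.ch, male.freq)
overall_f <- overall_CJS(female.ch, female.freq)

# c-hat: summed chi-square statistics over summed degrees of freedom
chat <- (overall_m$chi2 + overall_f$chi2) /
  (overall_m$degree_of_freedom + overall_f$degree_of_freedom)
chat
```

With the statistics printed below (11.062 with 9 df, and 10.276 with 12 df), this gives (11.062 + 10.276) / (9 + 12) ≈ 1.016.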

##                           chi2 degree_of_freedom p_value
## Gof test for CJS model: 11.062                 9   0.271
##                           chi2 degree_of_freedom p_value
## Gof test for CJS model: 10.276                12   0.592
## [1] 1.016095

A simple guide to interpreting \(\hat{c}\) values:

  • \(\hat{c}\) < 1: the data are underdispersed; slight deviations below 1 are generally ignored.
  • \(\hat{c}\) = 1: the variance in the data is as expected, given the model.
  • \(\hat{c}\) > 1: there is extra-binomial noise in the data. We can switch model selection to QAIC and inflate the variance around parameter estimates (widening confidence intervals).
  • \(\hat{c}\) > 3: generally represents poor fit and suggests the model structure should be revisited (is there an important part of the system we haven’t modelled?).

So, now we have an estimate of \(\hat{c}\), but what do we do with that?

Adjusting model selection when there is evidence of overdispersion

We can adjust our model selection to account for this extra noise and inflate the standard errors of our estimates.
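The model table below comes from the marked package. A hedged sketch of how such a 12-model set might be fitted, assuming `dipper.proc` and `dipper.ddl` were created earlier with `process.data(dipper, groups = "sex")` and `make.design.data()`:

```r
library(marked)

# Candidate formulas for survival (Phi) and detection (p); crossing
# 4 Phi structures with 3 p structures gives the 12 models in the table
fit.models <- function() {
  Phi.dot      <- list(formula = ~1)
  Phi.sex      <- list(formula = ~sex)
  Phi.time     <- list(formula = ~time)
  Phi.sex.time <- list(formula = ~sex * time)
  p.dot  <- list(formula = ~1)
  p.sex  <- list(formula = ~sex)
  p.time <- list(formula = ~time)
  cml <- create.model.list(c("Phi", "p"))
  crm.wrapper(cml, data = dipper.proc, ddl = dipper.ddl,
              external = FALSE, accumulate = FALSE)
}
dipper.cjs.models <- fit.models()
dipper.cjs.models  # prints the model table ranked by AIC
```

This follows the pattern in the marked package vignette; the exact processed-data objects are assumptions here.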

##                       model npar      AIC  DeltaAIC       weight  neg2lnl
## 1              Phi(~1)p(~1)    2 670.8377  0.000000 4.021185e-01 666.8377
## 2            Phi(~1)p(~sex)    3 672.1934  1.355702 2.041583e-01 666.1934
## 4            Phi(~sex)p(~1)    3 672.6762  1.838535 1.603693e-01 666.6762
## 10          Phi(~time)p(~1)    7 673.7301  2.892416 9.468339e-02 659.7301
## 5          Phi(~sex)p(~sex)    4 674.1518  3.314144 7.668259e-02 666.1518
## 11        Phi(~time)p(~sex)    8 675.1583  4.320639 4.635954e-02 659.1583
## 3           Phi(~1)p(~time)    7 678.4802  7.642502 8.806549e-03 664.4802
## 6         Phi(~sex)p(~time)    8 680.3043  9.466601 3.537592e-03 664.3043
## 12       Phi(~time)p(~time)   12 680.9502 10.112543 2.561198e-03 656.9502
## 7     Phi(~sex * time)p(~1)   13 684.2409 13.403249 4.941690e-04 658.2409
## 8   Phi(~sex * time)p(~sex)   14 685.8996 15.061922 2.156250e-04 657.8996
## 9  Phi(~sex * time)p(~time)   18 691.4733 20.635616 1.328578e-05 655.4733
##    convergence
## 1            0
## 2            0
## 4            0
## 10           0
## 5            0
## 11           0
## 3            0
## 6            0
## 12           0
## 7            0
## 8            0
## 9            0

We have a set of models ranked by AIC, but this ranking assumes that there is no overdispersion (\(\hat{c}\) = 1).

Let’s adjust our model table (dipper.cjs.models) to use QAIC with our \(\hat{c}\) estimate.
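The QAIC column can also be reproduced by hand: for each model, \(\text{QAIC} = -2\ln L / \hat{c} + 2K\), where \(K\) is the number of parameters. A sketch, assuming the fitted models live in `dipper.cjs.models$model.table` with the columns shown above:

```r
chat <- 1.016095  # our c-hat estimate from the U-CARE tests

# QAIC = (-2 log-likelihood) / c-hat + 2 * (number of parameters);
# the model.table location is an assumption about the fitted object
model.table <- dipper.cjs.models$model.table
model.table$chat <- chat
model.table$QAIC <- model.table$neg2lnl / chat + 2 * model.table$npar
model.table[order(model.table$QAIC), ]
```

For the top model, 666.8377 / 1.016095 + 2 × 2 ≈ 660.27, matching the QAIC column below.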

##                       model npar      AIC  DeltaAIC       weight  neg2lnl
## 1              Phi(~1)p(~1)    2 670.8377  0.000000 4.021185e-01 666.8377
## 2            Phi(~1)p(~sex)    3 672.1934  1.355702 2.041583e-01 666.1934
## 3            Phi(~sex)p(~1)    3 672.6762  1.838535 1.603693e-01 666.6762
## 4           Phi(~time)p(~1)    7 673.7301  2.892416 9.468339e-02 659.7301
## 5          Phi(~sex)p(~sex)    4 674.1518  3.314144 7.668259e-02 666.1518
## 6         Phi(~time)p(~sex)    8 675.1583  4.320639 4.635954e-02 659.1583
## 7           Phi(~1)p(~time)    7 678.4802  7.642502 8.806549e-03 664.4802
## 8         Phi(~sex)p(~time)    8 680.3043  9.466601 3.537592e-03 664.3043
## 9        Phi(~time)p(~time)   12 680.9502 10.112543 2.561198e-03 656.9502
## 10    Phi(~sex * time)p(~1)   13 684.2409 13.403249 4.941690e-04 658.2409
## 11  Phi(~sex * time)p(~sex)   14 685.8996 15.061922 2.156250e-04 657.8996
## 12 Phi(~sex * time)p(~time)   18 691.4733 20.635616 1.328578e-05 655.4733
##    convergence     chat     QAIC
## 1            0 1.016095 660.2748
## 2            0 1.016095 661.6407
## 3            0 1.016095 662.1159
## 4            0 1.016095 663.2798
## 5            0 1.016095 663.5998
## 6            0 1.016095 664.7171
## 7            0 1.016095 667.9546
## 8            0 1.016095 669.7815
## 9            0 1.016095 670.5439
## 10           0 1.016095 673.8142
## 11           0 1.016095 675.4783
## 12           0 1.016095 681.0904

Model selection favours simpler models more strongly as \(\hat{c}\) increases. In the current case, \(\hat{c}\) is so close to 1 that the ranking is unchanged; the models were already mostly ordered from simplest to most complex.
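To see why simpler models gain ground, note that dividing \(-2\ln L\) by \(\hat{c}\) shrinks the likelihood differences between models while the \(2K\) penalty stays fixed. A small illustration using the simplest and most complex models from the table (values copied from above):

```r
# QAIC as a function of c-hat
qaic <- function(neg2lnl, k, chat) neg2lnl / chat + 2 * k

# simplest model Phi(~1)p(~1) vs most complex Phi(~sex * time)p(~time);
# a negative difference means the simpler model is preferred
qaic(666.8377, 2, chat = 1) - qaic(655.4733, 18, chat = 1)  # about -20.6
qaic(666.8377, 2, chat = 3) - qaic(655.4733, 18, chat = 3)  # about -28.2
```

As \(\hat{c}\) grows, the gap in favour of the simpler model widens.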

The end

To review, we covered:

  • Estimating the variance inflation factor (\(\hat{c}\)), and
  • Adjusting model selection approach when \(\hat{c}\) is greater than 1.

In general, the other approaches for estimating \(\hat{c}\), especially the bootstrap and median-\(\hat{c}\) approaches, are recommended over the sum of Test 2 and Test 3 used here. Unfortunately, there is no simple way to do this within R; I recommend exporting the mark-recapture models and estimating \(\hat{c}\) in Program MARK.

Further reading

  1. Creating capture histories.

  2. Introduction to Cormack-Jolly-Seber mark-recapture models in R

  3. Goodness-of-fit tests for mark-recapture models in R

  4. Program MARK: a gentle introduction (e-book; free). http://www.phidot.org/software/mark/docs/book/. I strongly recommend reading Chapter 5 on goodness-of-fit tests.

  5. Phidot Forum (for MARK and RMark): http://www.phidot.org/forum/index.php

  6. Williams, Byron K., James D. Nichols, and Michael J. Conroy. 2002. Analysis and management of animal populations (usually available at University libraries)

  7. Rachel S. McCrea, Byron J. T. Morgan. 2014. Analysis of Capture-Recapture Data. https://www.crcpress.com/Analysis-of-Capture-Recapture-Data/McCrea-Morgan/p/book/9781439836590 (usually available at University libraries)

  8. marked Package Vignette https://cran.r-project.org/web/packages/marked/vignettes/markedVignette.pdf

  9. R2ucare Package Vignette https://cran.r-project.org/web/packages/R2ucare/vignettes/vignette_R2ucare.html