Estimating population size is one of the primary purposes we collect mark-recapture data, so how can we estimate that?

In this tutorial, I cover:

  1. Basics of Jolly-Seber models (using POPAN formulation),
  2. Fitting POPAN Jolly-Seber models with the R package marked
  3. Deriving population size using estimated parameters from POPAN Jolly-Seber models

The code from this tutorial is available on my GitHub page for mark-recapture models.

Let’s get started!

What are Jolly-Seber models?

Jolly-Seber models are a class of open population mark-recapture models that can be used to estimate population size, population growth, and recruitment (depending on the formulation).

In this tutorial, I will consider the POPAN formulation, which can be used to estimate population size. The POPAN Jolly-Seber model estimates four parameters:

  • detection probability (\(p_t\), the probability of detecting an individual at time t if it is alive)
  • apparent survival (\(\Phi_t\)) is the probability of an individual surviving to the next time step
  • Super population size (\(N_{super}\)) is the total number of individuals available to enter the study
  • Probability of new individuals from the super-population entering at time t (through births and immigration \(pent_t\))

The p and \(\Phi\) parameters are estimated using the same approach as in Cormack-Jolly-Seber models. The additional two parameters estimate how many new individuals enter the population between each time period.

New individuals enter the population between time steps. The number of individuals that enter at time t is the product of \(N_{super}\) and \(pent_t\).

Since the super-population size is the total number of individuals that can enter the study, then the sum of all \(pent_t\) values must be 1. Thus, the model estimates one less \(pent\) value and the last estimate is calculated by subtraction. This constraint also means the parameters need a special link function (the multinomial logit, mlogit, instead of the logit link used for \(p\) and \(\Phi\) parameters).

We will need to remember this when we estimate the population size in the first capture event.

Let’s load the dipper data set and then fit some Jolly-Seber models.

##        ch    sex
## 1 0000001 Female
## 2 0000001 Female
## 3 0000001 Female
## 4 0000001 Female
## 5 0000001 Female
## 6 0000001 Female

For the model fitting, we will use a function to fit a series of models that might be feasible (e.g. time-dependent differences in survival or constant survival, time-dependent entry probability or constant).

# Jolly-Seber models (POPAN formulation) are open population models, and 
# can be used to estimate abundance by including two more parameters than the CJS

# Additional parameters:
# Nsuper (or "superpopulation") = total number of individuals available to enter population throughout study
# pent ("probability of entry") =  the rate at which individuals enter the population from Nsuper (via births and immigration)

# WARNING: there is no adequate GOF tests for Jolly-Seber models. 
# One common method: Test equivalent structure of CJS model with R2ucare (previous tutorials).

# This tests *some* assumptions of Phi and p.
# Jolly-Seber models have an additional assumption:
# marked AND unmarked animals have same p (R2ucare doesn't test this)
# This assumption is required to estimate total abundance (sum of marked and unmarked animals in population)

# First, process data (Notice model = "JS", previous version = "CJS")
dipper.js.proc <- process.data(dipper, 
                               model = "JS", 
                               groups = "sex")

# Second, make design data (from processed data)
dipper.js.ddl <- make.design.data(dipper.js.proc)

fit.js.dipper.models <- function(){
  # Phi formulas
  Phi.dot <- list(formula=~1)
  Phi.time <- list(formula=~time)
  # p formulas
  p.dot <- list(formula=~1)
  # pent formulas. pent estimates MUST SUM to 1 (for each group).
  # This is constained using a Multinomial Logit link
  pent.time <- list(formula=~time)
  pent.sex <- list(formula=~sex)
  pent.dot <- list(formula=~1)
  # Nsuper formulas. Don't confuse "N" from model with predicted population size
  N.sex <- list(formula=~sex)
  N.dot <- list(formula=~1)
  cml <- create.model.list(c("Phi","p", "pent", "N"))
  results <- crm.wrapper(cml, data = dipper.js.proc, ddl = dipper.js.ddl,
                         external = FALSE, accumulate = FALSE, hessian = TRUE)
  
  return(results)
}

# Run function
dipper.js.models <- fit.js.dipper.models()
##                                model npar      AIC  DeltaAIC       weight
## 1          Phi(~1)p(~1)pent(~1)N(~1)    4 742.0136  0.000000 0.4077537375
## 3        Phi(~1)p(~1)pent(~sex)N(~1)    5 743.6466  1.632999 0.1802173625
## 2        Phi(~1)p(~1)pent(~1)N(~sex)    5 743.9901  1.976459 0.1517802792
## 7       Phi(~time)p(~1)pent(~1)N(~1)    9 745.0847  3.071070 0.0878058694
## 4      Phi(~1)p(~1)pent(~sex)N(~sex)    6 745.6211  3.607445 0.0671507949
## 9     Phi(~time)p(~1)pent(~sex)N(~1)   10 746.7239  4.710281 0.0386877491
## 8     Phi(~time)p(~1)pent(~1)N(~sex)   10 747.0611  5.047469 0.0326854214
## 10  Phi(~time)p(~1)pent(~sex)N(~sex)   11 748.6968  6.683216 0.0144263381
## 5       Phi(~1)p(~1)pent(~time)N(~1)    9 749.1438  7.130150 0.0115373337
## 6     Phi(~1)p(~1)pent(~time)N(~sex)   10 751.1204  9.106768 0.0042942617
## 11   Phi(~time)p(~1)pent(~time)N(~1)   14 752.0725 10.058881 0.0026677167
## 12 Phi(~time)p(~1)pent(~time)N(~sex)   15 754.0487 12.035103 0.0009931358
##     neg2lnl convergence
## 1  734.0136           0
## 3  733.6466           0
## 2  733.9901           0
## 7  727.0847           0
## 4  733.6211           0
## 9  726.7239           0
## 8  727.0611           0
## 10 726.6968           0
## 5  731.1438           0
## 6  731.1204           0
## 11 724.0725           0
## 12 724.0487           0

The model table (dipper.js.models) suggests the most supported model has constant survival, constant detection probability, and constant probability of entry (~1 = intercept only model). Although we can model-average the results, for simplicity we will just continue with the most-supported model.

Let’s look at the model output of the most supported model, and then estimate real parameters (remember the coefficients are on different scales depending on the link function).

We can use the predict function to estimate the real parameters, or calculate them ‘by hand’ using the link functions for each parameter.

## 
## crm Model Summary
## 
## Npar :  4
## -2lnL:  734.0136
## AIC  :  742.0136
## 
## Beta
##                   Estimate        se        lcl      ucl
## Phi.(Intercept)  0.2361657 0.1014104 0.03740139 0.434930
## p.(Intercept)    2.3455039 0.3412612 1.67663204 3.014376
## pent.(Intercept) 0.6703973 0.2232187 0.23288867 1.107906
## N.(Intercept)    1.9348641 0.4583174 1.03656213 2.833166
## $Phi
##   occ  estimate         se       lcl       ucl
## 1   1 0.5587685 0.02500235 0.5093493 0.6070503
## 
## $p
##   occ  estimate         se      lcl       ucl
## 1   1 0.9125762 0.02722612 0.842458 0.9532194
## 
## $pent
##   time occ  estimate          se       lcl       ucl
## 1    2   2 0.1535743 0.002692886 0.1483701 0.1589269
## 2    3   3 0.1535743 0.002692886 0.1483701 0.1589269
## 3    4   4 0.1535743 0.002692886 0.1483701 0.1589269
## 4    5   5 0.1535743 0.002692886 0.1483701 0.1589269
## 5    6   6 0.1535743 0.002692886 0.1483701 0.1589269
## 6    7   7 0.1535743 0.002692886 0.1483701 0.1589269
## 
## $N
##   estimate       se      lcl     ucl
## 1 6.923103 3.172978 2.819507 16.9992

The output shows that survival between capture events is 0.56, detection probability is 0.91, pent is 0.15 each capture event, and the number of unmarked individuals in the superpopulation is about 7 (so the super population is ~7 + 294 marked individuals =~ 301). Usually the estimated number of unmarked individuals will be higher, but since our detection probability is so high (0.91), there are few individuals that are not captured during the study.

So what about population size?

There is no direct estimate of population size in the model. The esitmate of “N” in the model output is for the number of unmarked individuals in the superpopulation.

To estimate population size, we can derive it using the model estimates.

\(N_{t+1}\) = (\(N_t\) * \(\Phi_t\) )+(\(N_{super}\) * \(pent_t\))

In words, this equation says that the population size is the previous population size times the survival rate (to calculate how many individuals survive) plus the number of new individuals from births and immigration.

To get the inital population size, we estimate \(pent_1\) by the constraint:

\(pent_1\) = 1 - \(\sum (pent_{2..t})\)

##   occ       Phi   Nsuper       pent         N
## 1   1 0.5587685 300.9231 0.07855408  23.63874
## 2   2 0.5587685 300.9231 0.15357432  59.42264
## 3   3 0.5587685 300.9231 0.15357432  79.41756
## 4   4 0.5587685 300.9231 0.15357432  90.59010
## 5   5 0.5587685 300.9231 0.15357432  96.83296
## 6   6 0.5587685 300.9231 0.15357432 100.32127
## 7   7        NA 300.9231 0.15357432 102.27043

We have now derived the population size (N) using the estimates form our Jolly-Seber model. But, how do we estimate standard error and confidence intervals?

How confident are we about these estimates?

To add 95% confidence intervals on these estimates, we will (for convenience) switch to another R package. Since the error from each parameter (\(\Phi\), pent, \(N_{super}\)) propagates throught the derived population size, it is not straight forward to estimate the standard error (and 95% CI) using marked.

Let’s use the RMark package, which makes computing the standard error and 95% CI much easier using the delta method. To do this by hand using our model results from the marked package is frustrating since it is difficult to get the variance-covariance matrix for our real estimtes (not the variance-covariance of the beta estimates). (if you find an easy way to do this, let me know!)

To use RMark you need to have the package installed and have the program MARK installed.

## This is RMark 2.2.6
##  Documentation available at http://www.phidot.org/software/mark/rmark/RMarkDocumentation.zip
##   Occasion         N       se      LCL       UCL
## 1        1  24.33323 5.060686 14.41429  34.25217
## 2        2  61.08258 2.529976 56.12383  66.04133
## 3        3  81.63592 2.694208 76.35527  86.91657
## 4        4  93.13108 3.849887 85.58530 100.67686
## 5        5  99.56014 4.849041 90.05602 109.06426
## 6        6 103.15582 5.568716 92.24113 114.07050
## 7        7 105.16682 6.056553 93.29598 117.03766

How close to the results from RMark line-up with our by-hand estimates from the marked package results?

Below, I plot the marked estimates in black, and the RMark estimates (and confidence intervals) in blue.

They are very similar, but not exactly the same. What gives?

The difference is because the estimates for \(N_{super}\) are slightly different in the models fit with different R packages (marked estimate = 300.9231, RMark estimate = 309.1735). The other parameter estimates are almost identical.

To review, we covered:

  • the basics of Jolly-Seber POPAN models
  • fitting POPAN models in R using marked (and RMark)
  • estimating population size (‘by-hand’ and using RMark’s popan.derived function)

Further reading

  1. Don’t already have your data in capture-history format? Check-out my past post on creating capture histories.

  2. Program MARK: a gentle introduction (e-book; free). http://www.phidot.org/software/mark/docs/book/.

  3. Phidot Forum (for MARK and RMark): http://www.phidot.org/forum/index.php

  4. Williams, Byron K., James D. Nichols, and Michael J. Conroy. 2002. Analysis and management of animal populations (usually available at University libraries)

  5. Rachel S. McCrea, Byron J. T. Morgan. 2014. Analysis of Capture-Recapture Data. https://www.crcpress.com/Analysis-of-Capture-Recapture-Data/McCrea-Morgan/p/book/9781439836590 (usually available at University libraries)

  6. marked Package Vignette https://cran.r-project.org/web/packages/marked/vignettes/markedVignette.pdf

  7. RMark Package Documentation https://cran.r-project.org/web/packages/RMark/RMark.pdf