Do you do mark-recapture analyses? The required input data are “capture histories”: binary strings that describe when animals (or plants) were found during a study.

For example, a capture history of “101” describes a study where an individual was:

  • caught on the first event,
  • not observed on the second event, and
  • observed on the third event.
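
To make the encoding concrete, here is a minimal sketch in base R that splits a capture history into one value per event. The object names (ch, detections) are only illustrative.

```r
# Split a capture history string into one 0/1 value per capture event
ch <- "101"
detections <- as.integer(strsplit(ch, "")[[1]])
detections
# [1] 1 0 1

# Events on which the individual was detected
which(detections == 1)
# [1] 1 3
```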

Data from mark-recapture and mark-resight studies are not usually recorded or stored in this format. Often data are stored in ‘long-form’, where each capture or resighting is a row. Plus, events when an individual is not resighted usually have no rows. A very simple example:

id event
a 1
b 1
c 1
b 2
b 3
c 3

Notice that individual “a” was not sighted in events 2 or 3, so it only has a row for event 1. Nothing in the data records the missed detections.
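
One quick way to see the missing combinations is to cross-tabulate the toy data. This is a rough sketch using base R only; the data frame name toy is made up for the illustration.

```r
# The toy long-form data from above: one row per sighting
toy <- data.frame(id = c("a", "b", "c", "b", "b", "c"),
                  event = c(1, 1, 1, 2, 3, 3))

# Cross-tabulate individuals (rows) by events (columns).
# Each 0 is an id/event combination that has no row in the long-form data.
detected <- table(toy$id, toy$event)
detected
```

Reading across each row of this table already gives the capture history for that individual (for example, 1 0 0 for “a”).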

Using the dplyr and tidyr packages, it is easy to create individual capture histories from this type of long-form data, so that you can spend more time on your data analyses, and less time manually messing with capture history data. Another reason to use this approach: manually creating capture histories will increase the chances of introducing errors into your data.

First, load the packages we’ll use:

library(dplyr)
library(tidyr)

Now, let’s simulate some long-form capture data:

# Simulate some data (long-form)
# "long.data" is a placeholder name for the simulated data frame
set.seed(1111) # makes it repeatable
long.data <- data.frame(id = c(sample(1:50, 20, replace = FALSE),
                               sample(1:50, 20, replace = FALSE),
                               sample(1:50, 20, replace = FALSE)),
                        event = c(rep(1, 20), rep(2, 20), rep(3, 20)),
                        detect = 1)

Take a look at the first few rows of our simulated data:

id event detect
24 1 1
21 1 1
44 1 1
7 1 1
34 1 1
48 1 1

To create our capture histories, we will use:

  • spread and unite from the tidyr package,
  • and distinct and group_by from the dplyr package.

# "long.data" is a placeholder name for the long-form data frame
capt.hist <- long.data %>%
  # Remove duplicates, which may occur when individuals are caught multiple times in an event.
  # For example, your event may be a year and an individual may be caught multiple times in a year.
  distinct() %>%
  # Spread out the data: one column per event.
  # fill = 0 puts a 0 in every combination of id and event where the individual was not observed.
  spread(event, detect, fill = 0) %>%
  # For every individual...
  group_by(id) %>%
  # Paste together the 0's and 1's.
  # unite is similar to paste. Here we paste the strings together from the second column
  # (first capture event) to the last capture event ("tail(names(.),1)").
  # We don't want any characters separating the 0's and 1's, so we use: sep = ""
  unite("ch", 2:tail(names(.),1), sep = "")
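
In current tidyr releases, spread is marked as superseded by pivot_wider. The sketch below shows roughly how the same steps might look with pivot_wider; it is one possible variant, not the method used above. The toy data frame and the name toy.hist are made up for the illustration, and values_fill is written in the named-list form on the assumption that this is accepted across tidyr versions that include pivot_wider.

```r
library(dplyr)
library(tidyr)

# Toy long-form data: one row per detection
toy <- data.frame(id = c("a", "b", "c", "b", "b", "c"),
                  event = c(1, 1, 1, 2, 3, 3),
                  detect = 1)

toy.hist <- toy %>%
  distinct() %>%
  # pivot_wider() creates columns in order of first appearance,
  # so sort by event first to keep the histories chronological
  arrange(event) %>%
  pivot_wider(names_from = event, values_from = detect,
              values_fill = list(detect = 0)) %>%
  # paste every column except id into one string
  unite("ch", -id, sep = "")

toy.hist
```

Selecting the columns with -id instead of by position avoids relying on where the first event column sits in the data frame.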

Let’s take a peek at the first few rows:

id ch
1 110
2 010
3 010
4 010
6 100
7 110
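
Before modelling, it can be worth running a few sanity checks on the result. The sketch below rebuilds the six rows shown above by hand so that it runs on its own; n.events is an illustrative name for the number of capture events.

```r
# First few capture histories from the output above
capt.hist <- data.frame(id = c(1, 2, 3, 4, 6, 7),
                        ch = c("110", "010", "010", "010", "100", "110"),
                        stringsAsFactors = FALSE)
n.events <- 3

# Every history should have exactly one character per event
stopifnot(all(nchar(capt.hist$ch) == n.events))
# Histories should contain only 0's and 1's
stopifnot(all(grepl("^[01]+$", capt.hist$ch)))
# Each individual should appear once
stopifnot(!any(duplicated(capt.hist$id)))
# Individuals that were never detected have no rows in the long-form data,
# so an all-zero history should not occur
stopifnot(!any(capt.hist$ch == strrep("0", n.events)))

# How often each distinct history occurs
table(capt.hist$ch)
```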

Now we’re ready to build some basic mark-recapture models, such as Cormack-Jolly-Seber models. To adapt the code to your own data, replace id with the column you use for identifying individuals, and event with whatever column identifies the capture session (e.g. year, month, etc.).
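
As a rough sketch of that adaptation, suppose your data have a tag column for individuals and a year column for capture sessions. Both column names, and the object names my.data and my.hist, are hypothetical. Renaming the columns up front lets the rest of the pipeline stay the same.

```r
library(dplyr)
library(tidyr)

# Hypothetical field data: one row per capture, no detect column yet
my.data <- data.frame(tag = c("A01", "A02", "A01", "A03", "A02"),
                      year = c(2019, 2019, 2020, 2020, 2021))

my.hist <- my.data %>%
  # map your own column names onto the ones used above
  rename(id = tag, event = year) %>%
  # every row is a detection
  mutate(detect = 1) %>%
  distinct() %>%
  spread(event, detect, fill = 0) %>%
  unite("ch", -id, sep = "")

my.hist
```

Note that spread only creates a column for event values that occur in the data, so a session in which nothing at all was caught would be missing from every history.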

When adapting this to your own data, remember that spread orders the new columns by the sorted values of the event column. For the capture histories to be ordered correctly, those values need to sort into chronological order, either numerically (e.g. 1, 2, 3 or 2019, 2020, 2021) or alphabetically (e.g. event a, event b, event c).
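
Text labels do not always sort chronologically. The sketch below shows the problem and one possible fix: converting event to a factor with explicit levels. The data frame obs and the season labels are made up, and the example assumes spring was the first session.

```r
library(tidyr)

# Numbers stored as text sort alphabetically, not numerically
sort(c("2", "10", "1"))
# [1] "1"  "10" "2"

# Hypothetical data where alphabetical order (autumn, spring)
# is not the order in which the sessions happened
obs <- data.frame(id = c("a", "a", "b"),
                  event = c("spring", "autumn", "autumn"),
                  detect = 1)

# Set the chronological order explicitly with factor levels
obs$event <- factor(obs$event, levels = c("spring", "autumn"))

wide <- spread(obs, event, detect, fill = 0)
obs.hist <- unite(wide, "ch", -id, sep = "")
obs.hist
```

With the levels set this way, the spring column comes before the autumn column, so each history reads in session order.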