Summarizing

Overview

Summarizing functions allow you to group images by deployment, location or taxon and compute detection or abundance statistics.

Here is a quick overview of the different summarizing functions and what they compute:

Function	Description
`compute_count_summary`	Computes a summary of images, records and taxa count by deployment.
`compute_detection`	Computes the detection (in terms of abundance or presence) of each taxon by deployment.
`compute_detection_history`	Computes the detection history (in terms of abundance or presence) by taxon and deployment, grouping observations into intervals of a given number of days.
`compute_general_count`	Computes the general abundance and number of deployments for each taxon.
`compute_hill_numbers`	Computes the Hill numbers of order q (also called effective number of species) by deployment or location for the given values of q.

Except for the compute_detection_history function, all the summarizing functions have a groupby argument to specify whether the results should be grouped by deployment (using the deployment_id column in the images file) or by location (using the placename column in the deployments file). By default, this argument is "deployment", but you might want to use "location" for projects where each location had multiple deployments over time.
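
To illustrate what grouping by location means, here is a minimal plain-Python sketch (not wiutils code): each image's deployment_id is mapped to the placename of its deployment before counting, which is why the deployments table is needed. The toy records, the placenames mapping and the count_images helper are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: one dict per image, plus a deployment_id -> placename
# map standing in for the deployments table.
images = [
    {"deployment_id": "dep-A1"},
    {"deployment_id": "dep-A2"},
    {"deployment_id": "dep-B1"},
    {"deployment_id": "dep-A2"},
]
placenames = {"dep-A1": "A", "dep-A2": "A", "dep-B1": "B"}


def count_images(images, placenames=None, groupby="deployment"):
    """Count images per deployment or, with groupby="location", per placename."""
    if groupby == "deployment":
        keys = (img["deployment_id"] for img in images)
    elif groupby == "location":
        if placenames is None:
            raise ValueError("placenames are required when groupby='location'")
        keys = (placenames[img["deployment_id"]] for img in images)
    else:
        raise ValueError("groupby must be 'deployment' or 'location'")
    return dict(Counter(keys))


print(count_images(images))  # {'dep-A1': 1, 'dep-A2': 2, 'dep-B1': 1}
print(count_images(images, placenames, groupby="location"))  # {'A': 3, 'B': 1}
```

Two deployments at the same location (dep-A1 and dep-A2) collapse into a single row once grouped by placename.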

Another important thing to mention is that, because images can have multiple objects (i.e. animals), abundance across summarizing functions is computed by summing the number_of_objects column of the images file rather than counting each image as an individual.
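
The difference between counting images and summing number_of_objects can be seen in this small sketch on hypothetical toy records (plain Python, not the wiutils implementation):

```python
from collections import defaultdict

# Hypothetical toy images; a single image can contain several animals.
images = [
    {"taxon": "Tinamus major", "number_of_objects": 1},
    {"taxon": "Pecari tajacu", "number_of_objects": 4},
    {"taxon": "Pecari tajacu", "number_of_objects": 2},
]

images_per_taxon = defaultdict(int)  # counts each image once
abundance = defaultdict(int)  # sums the animals in each image
for img in images:
    images_per_taxon[img["taxon"]] += 1
    abundance[img["taxon"]] += img["number_of_objects"]

print(dict(images_per_taxon))  # {'Tinamus major': 1, 'Pecari tajacu': 2}
print(dict(abundance))  # {'Tinamus major': 1, 'Pecari tajacu': 6}
```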

Note

All the summarizing functions group images by taxa to compute the different statistics. If you want to ignore images that do not have a species or genus level identification, make sure to use the remove_unidentified function on the images dataframe first.

For every snippet of code shown here, we will assume you have already run the following code:

import wiutils

cameras, deployments, images, projects = wiutils.load_demo("cajambre")

Computing count summary

The compute_count_summary function allows you to get a summary of images and identifications by deployment.

Note

The compute_count_summary function removes unidentified and duplicate images under the hood to get the number of identified images and records. Thus, you should avoid using these filters on the images dataframe before calling this function.

Here is an example:

>>> wiutils.compute_count_summary(images)

   deployment_id  total_images  identified_images  records  taxa
0    CTCAJ013743           501                397       33     9
1    CTCAJ023749            75                 59       10     8
2    CTCAJ033779           156                118       25    10
3    CTCAJ043772           351                291       26     6
4    CTCAJ053782           222                148       29     7
5    CTCAJ063750           186                153       31    11
6    CTCAJ073781           120                108       15     8
7    CTCAJ083775           600                544       56    11
8    CTCAJ093776           978                888       58    14
9    CTCAJ103744           174                118       22     8
10   CTCAJ113742           321                272       55     7
11   CTCAJ123777           210                138       38    11
12   CTCAJ133746           236                178       31    10
13   CTCAJ143747             1                  1        1     1
14   CTCAJ143748            75                 63       12     7
15   CTCAJ153745           111                 86       15     7
16   CTCAJ163747           327                288       54     9
17   CTCAJ183778           216                164       27     7
18   CTCAJ193741           393                322       56     8

By default, this function returns a dataframe with the following columns:

total_images: the number of images for each deployment.
identified_images: the number of identified images for each deployment.
records: the number of records (i.e. number of individuals of the same taxon after duplicate image removal) for each deployment.
taxa: the number of unique taxa for each deployment.

This function uses two filters to compute the last three columns: remove_unidentified and remove_duplicates. To control the behavior of these filters (i.e. change their default values), the compute_count_summary function has two arguments: remove_unidentified_kws and remove_duplicates_kws. Both accept a dictionary with the arguments that are passed on to the remove_unidentified and remove_duplicates functions, respectively. For example, you might want the summary to count every image identified down to at least the family level as identified, and to remove duplicate images within a five-day window:
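
The exact rule remove_duplicates applies is not described in this section. As an assumption for illustration only, the sketch below treats an image as a duplicate when a previously kept image of the same taxon at the same deployment was taken no more than interval earlier; it shows why a wider window yields fewer records. The count_records helper and the toy data are hypothetical, and wiutils may define duplicates differently.

```python
from datetime import datetime, timedelta

# Hypothetical toy images: (deployment_id, taxon, timestamp).
images = [
    ("dep-1", "Crax rubra", datetime(2014, 11, 1, 8, 0)),
    ("dep-1", "Crax rubra", datetime(2014, 11, 3, 9, 0)),
    ("dep-1", "Crax rubra", datetime(2014, 11, 7, 10, 0)),
    ("dep-1", "Cuniculus paca", datetime(2014, 11, 2, 22, 0)),
]


def count_records(images, interval):
    """Count records, treating as duplicates the images of the same taxon at
    the same deployment taken within `interval` of the last kept image."""
    last_kept = {}
    records = 0
    for deployment_id, taxon, timestamp in sorted(images, key=lambda img: img[2]):
        key = (deployment_id, taxon)
        if key in last_kept and timestamp - last_kept[key] <= interval:
            continue  # duplicate of an already counted record
        last_kept[key] = timestamp
        records += 1
    return records


print(count_records(images, timedelta(hours=1)))  # 4
print(count_records(images, timedelta(days=5)))  # 3
```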

>>> wiutils.compute_count_summary(images, remove_unidentified_kws={"rank": "family"}, remove_duplicates_kws={"interval": 5, "unit": "days"})

   deployment_id  total_images  identified_images  records  taxa
0    CTCAJ013743           501                379       16     8
1    CTCAJ023749            75                 57        8     7
2    CTCAJ033779           156                106       15     8
3    CTCAJ043772           351                291       12     6
4    CTCAJ053782           222                148       17     7
5    CTCAJ063750           186                153       19    11
6    CTCAJ073781           120                108       12     8
7    CTCAJ083775           600                535       24     8
8    CTCAJ093776           978                229       24    13
9    CTCAJ103744           174                118       14     8
10   CTCAJ113742           321                260       12     6
11   CTCAJ123777           210                135       19    10
12   CTCAJ133746           236                172       19     9
13   CTCAJ143747             1                  1        1     1
14   CTCAJ143748            75                 63       10     7
15   CTCAJ153745           111                 86       13     7
16   CTCAJ163747           327                270       14     7
17   CTCAJ183778           216                164       14     7
18   CTCAJ193741           393                319       14     7

Notice how every deployment now has the same or fewer identified images, records and unique taxa, with the number of records dropping the most.

The compute_count_summary function also offers two arguments, add_records_by_class and add_taxa_by_class, to add the number of records and taxa broken down by class:

>>> result = wiutils.compute_count_summary(images, add_records_by_class=True, add_taxa_by_class=True)
>>> result.columns

Index(['deployment_id', 'total_images', 'identified_images', 'records',
       'records_mammalia', 'records_aves', 'records_reptilia',
       'records_amphibia', 'taxa', 'taxa_mammalia', 'taxa_aves',
       'taxa_reptilia', 'taxa_amphibia'],
      dtype='object')

Notice how the result now has more columns, with records and taxa broken down by taxonomic class. The number of new columns depends on the number of unique classes present in the images file. For example, if you had only mammals in your images, only two new columns would have been created: records_mammalia and taxa_mammalia.
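
The per-class columns can be pictured as the records and taxa counts repeated within each class, with one column per class actually present in the data. A minimal plain-Python sketch on hypothetical toy records (not the wiutils implementation):

```python
from collections import Counter, defaultdict

# Hypothetical records left after duplicate removal: (class, taxon).
records = [
    ("Mammalia", "Cuniculus paca"),
    ("Mammalia", "Cuniculus paca"),
    ("Mammalia", "Pecari tajacu"),
    ("Aves", "Crax rubra"),
]

records_by_class = Counter(class_ for class_, _ in records)
taxa_by_class = defaultdict(set)
for class_, taxon in records:
    taxa_by_class[class_].add(taxon)

# One column per class present in the data, e.g. records_mammalia, taxa_aves.
columns = {f"records_{c.lower()}": n for c, n in records_by_class.items()}
columns.update({f"taxa_{c.lower()}": len(t) for c, t in taxa_by_class.items()})
print(columns)
# {'records_mammalia': 3, 'records_aves': 1, 'taxa_mammalia': 2, 'taxa_aves': 1}
```

Because no amphibians appear in the toy records, no records_amphibia or taxa_amphibia column is produced.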

If instead of grouping taxa by deployment you want to group taxa by location, use the groupby parameter (notice that you have to pass the deployments dataframe as well):

>>> wiutils.compute_count_summary(images, deployments, groupby="location")

   placename  total_images  identified_images  records  taxa
0    CTCAJ01           501                397       33     9
1    CTCAJ02            75                 59       10     8
2    CTCAJ03           156                118       25    10
3    CTCAJ04           351                291       26     6
4    CTCAJ05           222                148       29     7
5    CTCAJ06           186                153       31    11
6    CTCAJ07           120                108       15     8
7    CTCAJ08           600                544       56    11
8    CTCAJ09           978                888       58    14
9    CTCAJ10           174                118       22     8
10   CTCAJ11           321                272       55     7
11   CTCAJ12           210                138       38    11
12   CTCAJ13           236                178       31    10
13   CTCAJ14            76                 64       13     8
14   CTCAJ15           111                 86       15     7
15   CTCAJ16           327                288       54     9
16   CTCAJ18           216                164       27     7
17   CTCAJ19           393                322       56     8

Computing detection

The compute_detection function allows you to compute the abundance of each taxon by deployment (or location):

>>> wiutils.compute_detection(images)

             taxon deployment_id  value
0         Amphibia   CTCAJ013743      0
1         Amphibia   CTCAJ023749      0
2         Amphibia   CTCAJ033779      0
3         Amphibia   CTCAJ043772      0
4         Amphibia   CTCAJ053782      0
..             ...           ...    ...
565  Tinamus major   CTCAJ143748      0
566  Tinamus major   CTCAJ153745     48
567  Tinamus major   CTCAJ163747    127
568  Tinamus major   CTCAJ183778      6
569  Tinamus major   CTCAJ193741     13

[570 rows x 3 columns]
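
The 570 rows above are 30 taxa times 19 deployments, so a taxon that was not recorded in a deployment still gets a row with a value of 0. The plain-Python sketch below mimics that long format on hypothetical toy data and derives presence/absence as whether the abundance is greater than zero. It is an illustration only, not the wiutils implementation, and it assumes every deployment appears in the images.

```python
from itertools import product

# Hypothetical toy images: (taxon, deployment_id, number_of_objects).
images = [
    ("Aves", "dep-1", 2),
    ("Aves", "dep-1", 1),
    ("Tinamus major", "dep-2", 1),
]

taxa = sorted({taxon for taxon, _, _ in images})
deployment_ids = sorted({dep for _, dep, _ in images})

# Every taxon/deployment pair starts at zero, so absences are explicit rows.
abundance = {pair: 0 for pair in product(taxa, deployment_ids)}
for taxon, dep, n in images:
    abundance[(taxon, dep)] += n

# Presence/absence: 1 if the taxon was recorded in the deployment, else 0.
presence = {pair: int(value > 0) for pair, value in abundance.items()}

for (taxon, dep), value in abundance.items():
    print(taxon, dep, value, presence[(taxon, dep)])
```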

If, instead of computing the number of observations, you just want to compute presence and absence, set the compute_abundance parameter to False:

>>> wiutils.compute_detection(images, compute_abundance=False)

             taxon deployment_id  value
0         Amphibia   CTCAJ013743      0
1         Amphibia   CTCAJ023749      0
2         Amphibia   CTCAJ033779      0
3         Amphibia   CTCAJ043772      0
4         Amphibia   CTCAJ053782      0
..             ...           ...    ...
565  Tinamus major   CTCAJ143748      0
566  Tinamus major   CTCAJ153745      1
567  Tinamus major   CTCAJ163747      1
568  Tinamus major   CTCAJ183778      1
569  Tinamus major   CTCAJ193741      1

[570 rows x 3 columns]

If you prefer a wide-format table over a long-format table, use the pivot parameter:

>>> wiutils.compute_detection(images, pivot=True)

                       taxon  CTCAJ013743  ...  CTCAJ183778  CTCAJ193741
0                   Amphibia            0  ...            0            0
1                       Aves           18  ...            0            3
2           Canis familiaris            3  ...            0            0
3                 Crax rubra            0  ...            0            0
4    Crypturellus berlepschi            0  ...            0            0
5             Cuniculus paca            0  ...            0            0
6        Dasyprocta punctata            0  ...            0            0
7       Dasypus novemcinctus            5  ...            0            0
8                Didelphidae            0  ...            0          165
9      Didelphis marsupialis            6  ...            0            0
10              Eira barbara            0  ...            0           12
11  Herpailurus yagouaroundi            0  ...            0            0
12              Homo sapiens          287  ...            9           30
13        Leopardus pardalis            6  ...            0            0
14                 Leptotila            0  ...            0            0
15   Leptotrygon veraguensis            0  ...            0           16
16             Mazama temama            0  ...          110           81
17      Micrastur ruficollis           10  ...            0            0
18             Pecari tajacu            0  ...            0            0
19           Penelope ortoni            0  ...            0            0
20       Procyon cancrivorus            0  ...            0            0
21   Proechimys semispinosus            5  ...            9            0
22                  Rallidae            0  ...            9            0
23       Rhynchortyx cinctus            0  ...            0            0
24                  Rodentia            0  ...            0            0
25       Sciurus granatensis            0  ...            3            0
26                  Squamata            0  ...            0            0
27         Tamandua mexicana            0  ...            0            3
28        Tigrisoma lineatum            0  ...           30            0
29             Tinamus major           57  ...            6           13

[30 rows x 20 columns]
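
Reshaping from long to wide format turns each taxon into a row and each deployment into a column. A plain-Python sketch on hypothetical toy rows (in pandas, this kind of reshaping is typically done with DataFrame.pivot; how wiutils does it internally is not shown here):

```python
# Hypothetical long-format rows: (taxon, deployment_id, value).
long_rows = [
    ("Aves", "dep-1", 3),
    ("Aves", "dep-2", 0),
    ("Tinamus major", "dep-1", 0),
    ("Tinamus major", "dep-2", 1),
]

deployment_ids = sorted({dep for _, dep, _ in long_rows})
wide = {}
for taxon, dep, value in long_rows:
    # One row per taxon, one column per deployment (missing cells default to 0).
    wide.setdefault(taxon, dict.fromkeys(deployment_ids, 0))[dep] = value

print("taxon", *deployment_ids)
for taxon, row in wide.items():
    print(taxon, *(row[dep] for dep in deployment_ids))
```

The wide table holds exactly the same values as the long one; only the layout changes.
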

If instead of grouping observations by deployment you want to group observations by location, use the groupby parameter (notice that you have to pass the deployments dataframe as well):

>>> wiutils.compute_detection(images, deployments, groupby="location")

             taxon placename  value
0         Amphibia   CTCAJ01      0
1         Amphibia   CTCAJ02      0
2         Amphibia   CTCAJ03      0
3         Amphibia   CTCAJ04      0
4         Amphibia   CTCAJ05      0
..             ...       ...    ...
535  Tinamus major   CTCAJ14      0
536  Tinamus major   CTCAJ15     48
537  Tinamus major   CTCAJ16    127
538  Tinamus major   CTCAJ18      6
539  Tinamus major   CTCAJ19     13

[540 rows x 3 columns]

Computing detection history

Detection histories summarize taxa observations in each deployment over time. They are useful for building occupancy models (see the R package unmarked for examples). The compute_detection_history function allows you to create detection histories, grouping observations into intervals of an arbitrary number of days.
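
The core idea of grouping observations into intervals of a fixed number of days can be sketched in plain Python as below. The interval_starts and interval_index helpers and the dates are hypothetical; the start date is passed in explicitly here, whereas compute_detection_history chooses it for you as described further down.

```python
from datetime import date, timedelta


def interval_starts(start, end, days):
    """Start dates of consecutive `days`-long intervals covering start..end."""
    starts = []
    current = start
    while current <= end:
        starts.append(current)
        current += timedelta(days=days)
    return starts


def interval_index(day, start, days):
    """Position of the interval that `day` falls into."""
    return (day - start).days // days


start, end, days = date(2014, 10, 22), date(2014, 11, 5), 5
starts = interval_starts(start, end, days)
# -> 2014-10-22, 2014-10-27, 2014-11-01

# Hypothetical observation dates of one taxon at one deployment.
observations = [date(2014, 10, 23), date(2014, 10, 29), date(2014, 10, 30)]
counts = [0] * len(starts)
for day in observations:
    counts[interval_index(day, start, days)] += 1
print(counts)  # [1, 2, 0]
```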

Note

All images without an identification down to at least the class level are removed before computing the detection histories. You can call the remove_unidentified function beforehand if you want to restrict the images further to a more specific rank (e.g. genus).

By default, observations are grouped into one-day intervals, but the days parameter lets you change the interval length. For example, to group observations into five-day intervals:

>>> wiutils.compute_detection_history(images, deployments, days=5)

              taxon deployment_id  timestamp  value
0          Amphibia   CTCAJ013743 2014-10-22    0.0
1          Amphibia   CTCAJ013743 2014-10-27    0.0
2          Amphibia   CTCAJ013743 2014-11-01    0.0
3          Amphibia   CTCAJ013743 2014-11-06    0.0
4          Amphibia   CTCAJ013743 2014-11-11    0.0
             ...           ...        ...    ...
6265  Tinamus major   CTCAJ193741 2014-11-21    0.0
6266  Tinamus major   CTCAJ193741 2014-11-26    0.0
6267  Tinamus major   CTCAJ193741 2014-12-01    0.0
6268  Tinamus major   CTCAJ193741 2014-12-06    0.0
6269  Tinamus major   CTCAJ193741 2014-12-11    NaN

[6270 rows x 4 columns]

In this result, each row holds the observations (value column) of a taxon in a given deployment during a specific interval. Note that, by default, the first interval starts on the earliest start date across all deployments (in this case 2014-10-22). For reasons already explained in the extraction section, you might prefer the intervals to start on the date of the first image across all deployments instead. In that case, pass date_range="images" when calling the compute_detection_history function.

If you prefer a wide-format table over a long-format table, use the pivot parameter:

>>> wiutils.compute_detection_history(images, deployments, days=5, pivot=True)

             taxon deployment_id  ...  2014-12-06  2014-12-11
0         Amphibia   CTCAJ013743  ...         0.0         NaN
1         Amphibia   CTCAJ023749  ...         0.0         NaN
2         Amphibia   CTCAJ033779  ...         0.0         NaN
3         Amphibia   CTCAJ043772  ...         0.0         NaN
4         Amphibia   CTCAJ053782  ...         0.0         NaN
..             ...           ...  ...         ...         ...
565  Tinamus major   CTCAJ143748  ...         0.0         NaN
566  Tinamus major   CTCAJ153745  ...         3.0         NaN
567  Tinamus major   CTCAJ163747  ...        12.0         NaN
568  Tinamus major   CTCAJ183778  ...         0.0         NaN
569  Tinamus major   CTCAJ193741  ...         0.0         NaN

[570 rows x 13 columns]

Notice that there is now a column for each interval and the number of rows is much smaller.

If, instead of computing the number of observations, you just want to compute presence and absence, set the compute_abundance parameter to False:

>>> wiutils.compute_detection_history(images, deployments, days=5, compute_abundance=False, pivot=True)

             taxon deployment_id  ...  2014-12-06  2014-12-11
0         Amphibia   CTCAJ013743  ...         0.0         NaN
1         Amphibia   CTCAJ023749  ...         0.0         NaN
2         Amphibia   CTCAJ033779  ...         0.0         NaN
3         Amphibia   CTCAJ043772  ...         0.0         NaN
4         Amphibia   CTCAJ053782  ...         0.0         NaN
..             ...           ...  ...         ...         ...
565  Tinamus major   CTCAJ143748  ...         0.0         NaN
566  Tinamus major   CTCAJ153745  ...         1.0         NaN
567  Tinamus major   CTCAJ163747  ...         1.0         NaN
568  Tinamus major   CTCAJ183778  ...         0.0         NaN
569  Tinamus major   CTCAJ193741  ...         0.0         NaN

[570 rows x 13 columns]

In the examples above, there are multiple NaN values. These correspond to intervals that fall outside the date range of the corresponding deployment and are therefore masked.
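
A minimal sketch of that masking, assuming (this is an assumption, not documented wiutils behavior) that an interval is masked only when it does not overlap the deployment's date range at all. The mask_outside_deployment helper and the dates are hypothetical.

```python
import math
from datetime import date, timedelta


def mask_outside_deployment(starts, counts, days, deployment_start, deployment_end):
    """Replace counts of intervals that do not overlap the deployment with NaN."""
    masked = []
    for start, count in zip(starts, counts):
        last_day = start + timedelta(days=days - 1)
        if last_day < deployment_start or start > deployment_end:
            masked.append(math.nan)  # camera was not active: unknown, not zero
        else:
            masked.append(float(count))
    return masked


starts = [date(2014, 10, 22), date(2014, 10, 27), date(2014, 11, 1)]
counts = [0, 2, 0]
result = mask_outside_deployment(starts, counts, 5, date(2014, 10, 27), date(2014, 11, 3))
print(result)  # [nan, 2.0, 0.0]
```

Using NaN rather than 0 keeps "not surveyed" distinct from "surveyed but not detected", a distinction occupancy models rely on.
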

Computing general count

The compute_general_count function allows you to create a summary of observations by taxon:

>>> wiutils.compute_general_count(images)

                       taxon     n  deployments
0                   Amphibia     3            1
1                       Aves  1014            9
2           Canis familiaris     3            1
3                 Crax rubra   274            7
4    Crypturellus berlepschi    12            2
5             Cuniculus paca   248           10
6        Dasyprocta punctata    16            3
7       Dasypus novemcinctus   414           10
8                Didelphidae   213            4
9      Didelphis marsupialis   199           13
10              Eira barbara    18            2
11  Herpailurus yagouaroundi     6            1
12              Homo sapiens   478           16
13        Leopardus pardalis    23            3
14                 Leptotila    21            4
15   Leptotrygon veraguensis   285            5
16             Mazama temama   285            5
17      Micrastur ruficollis    15            2
18             Pecari tajacu   335            2
19           Penelope ortoni    12            1
20       Procyon cancrivorus    24            3
21   Proechimys semispinosus   389           17
22                  Rallidae     9            1
23       Rhynchortyx cinctus    54            3
24                  Rodentia     3            1
25       Sciurus granatensis    23            5
26                  Squamata    18            3
27         Tamandua mexicana    76            8
28        Tigrisoma lineatum    30            1
29             Tinamus major   457           16

The n column holds the number of individuals of each taxon, and the deployments column holds the number of deployments where that taxon was recorded at least once.
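
The two columns can be reproduced on hypothetical toy data with a few lines of plain Python (a sketch, not the wiutils implementation): n sums number_of_objects per taxon, and deployments counts the distinct deployments in which the taxon appears.

```python
from collections import defaultdict

# Hypothetical toy images: (taxon, deployment_id, number_of_objects).
images = [
    ("Crax rubra", "dep-1", 2),
    ("Crax rubra", "dep-2", 1),
    ("Crax rubra", "dep-2", 1),
    ("Eira barbara", "dep-3", 1),
]

n = defaultdict(int)  # total individuals per taxon
seen_in = defaultdict(set)  # distinct deployments per taxon
for taxon, deployment_id, objects in images:
    n[taxon] += objects
    seen_in[taxon].add(deployment_id)

general_count = [(taxon, n[taxon], len(seen_in[taxon])) for taxon in sorted(n)]
print(general_count)  # [('Crax rubra', 4, 2), ('Eira barbara', 1, 1)]
```
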

You can also add the higher taxonomic classification of each taxon using the add_taxonomy parameter:

>>> result = wiutils.compute_general_count(images, add_taxonomy=True)
>>> result.columns

Index(['taxon', 'n', 'deployments', 'class', 'order', 'family', 'genus',
       'species'],
      dtype='object')

If instead of getting the number of deployments where each taxon was recorded you want to get the number of locations, use the groupby parameter (notice that you have to pass the deployments dataframe as well):

>>> wiutils.compute_general_count(images, deployments, groupby="location")

                       taxon     n  locations
0                   Amphibia     3          1
1                       Aves  1014          9
2           Canis familiaris     3          1
3                 Crax rubra   274          7
4    Crypturellus berlepschi    12          2
5             Cuniculus paca   248         10
6        Dasyprocta punctata    16          3
7       Dasypus novemcinctus   414         10
8                Didelphidae   213          4
9      Didelphis marsupialis   199         13
10              Eira barbara    18          2
11  Herpailurus yagouaroundi     6          1
12              Homo sapiens   478         16
13        Leopardus pardalis    23          3
14                 Leptotila    21          4
15   Leptotrygon veraguensis   285          5
16             Mazama temama   285          5
17      Micrastur ruficollis    15          2
18             Pecari tajacu   335          2
19           Penelope ortoni    12          1
20       Procyon cancrivorus    24          3
21   Proechimys semispinosus   389         17
22                  Rallidae     9          1
23       Rhynchortyx cinctus    54          3
24                  Rodentia     3          1
25       Sciurus granatensis    23          5
26                  Squamata    18          3
27         Tamandua mexicana    76          8
28        Tigrisoma lineatum    30          1
29             Tinamus major   457         16

Computing Hill numbers

Hill numbers, also called the effective number of species, are diversity indices used to quantify the taxonomic diversity of a community. Hill numbers are parameterized by an order \(q\), which determines how sensitive they are to the relative abundances of species (or taxa).

For any value of \(q\) other than \(1\), the corresponding diversity index is computed as follows:

$$
^{q}D = \left(\sum_{i=1}^{R} p_{i}^{q}\right)^{1/(1-q)}
$$

When \(q=1\), the diversity index is computed as follows (this is the limit of the previous expression as \(q \to 1\)):

$$
^{1}D = \exp\left(-\sum_{i=1}^{R} p_{i} \ln(p_{i})\right)
$$

In both equations, \(R\) is the richness (i.e. the total number of species or taxa) and \(p_i\) is the relative abundance of species or taxon \(i\).
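
The two formulas translate directly into code. The hill_number function below is a hypothetical plain-Python sketch, not part of wiutils; it takes raw per-taxon abundances, converts them to relative abundances, and drops zero abundances so that absent taxa do not contribute (in Python, 0.0 ** 0 == 1.0, which would otherwise inflate richness for q = 0).

```python
import math


def hill_number(abundances, q):
    """Hill number of order q from per-taxon abundances (a sketch)."""
    total = sum(abundances)
    # Relative abundances; zero abundances are dropped because 0.0 ** 0 == 1.0
    # would otherwise count absent taxa when q = 0.
    p = [a / total for a in abundances if a > 0]
    if q == 1:
        # Exponential of the Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


abundances = [50, 30, 20, 0]
for q in (0, 1, 2):
    print(q, hill_number(abundances, q))
```

For q = 0 the result is 3.0, the number of taxa with a non-zero abundance. As q approaches 1 the general formula converges to the q = 1 one, which is why q = 1 is handled as a separate case instead of dividing by 1 - q = 0.
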

The most common values of \(q\) that are used to compute Hill numbers are:

\(0\), which corresponds to species richness.
\(1\), which corresponds to the Shannon diversity.
\(2\), which corresponds to the Simpson diversity.
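
Plugging these values of \(q\) into the formulas above shows where the interpretations come from (a worked example added for illustration):

$$
\begin{aligned}
^{0}D &= \left(\sum_{i=1}^{R} p_{i}^{0}\right)^{1/(1-0)} = \sum_{i=1}^{R} 1 = R \\
^{1}D &= \exp(H), \quad H = -\sum_{i=1}^{R} p_{i} \ln(p_{i}) \\
^{2}D &= \left(\sum_{i=1}^{R} p_{i}^{2}\right)^{1/(1-2)} = \frac{1}{\sum_{i=1}^{R} p_{i}^{2}}
\end{aligned}
$$

Here \(H\) is the Shannon entropy and \(\sum_{i} p_{i}^{2}\) is the Simpson concentration, so \(^{2}D\) is the inverse Simpson index. When all taxa are equally abundant (\(p_i = 1/R\)), every order gives \(^{q}D = R\).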

The compute_hill_numbers function allows you to compute these indices for any given values of \(q\), grouping taxa by deployment or location. By default, Hill numbers are computed using \(q\) values of \(0\), \(1\) and \(2\).

For example, we can compute species richness, Shannon diversity and Simpson diversity for the demo dataset:

>>> result = wiutils.compute_hill_numbers(images)
>>> result.head(9)  # Show just the first three deployments

  deployment_id  q          D
0   CTCAJ013743  0   9.000000
1   CTCAJ013743  1   2.773289
2   CTCAJ013743  2   1.828984
3   CTCAJ023749  0   8.000000
4   CTCAJ023749  1   5.444932
5   CTCAJ023749  2   4.080891
6   CTCAJ033779  0  10.000000
7   CTCAJ033779  1   7.020295
8   CTCAJ033779  2   5.701884

If you prefer wide-format tables over long-format tables, the compute_hill_numbers function has a pivot parameter:

>>> result = wiutils.compute_hill_numbers(images, pivot=True)
>>> result.head(3)  # Show just the first three deployments

  deployment_id     0         1         2
0   CTCAJ013743   9.0  2.773289  1.828984
1   CTCAJ023749   8.0  5.444932  4.080891
2   CTCAJ033779  10.0  7.020295  5.701884

If you only want to compute the Shannon and Simpson diversity, use the q_values parameter to change the default values:

>>> result = wiutils.compute_hill_numbers(images, q_values=[1, 2])
>>> result.head(6) # Show just the first three deployments

  deployment_id  q         D
0   CTCAJ013743  1  2.773289
1   CTCAJ013743  2  1.828984
2   CTCAJ023749  1  5.444932
3   CTCAJ023749  2  4.080891
4   CTCAJ033779  1  7.020295
5   CTCAJ033779  2  5.701884

If instead of grouping taxa by deployment you want to group taxa by location, use the groupby parameter (notice that you have to pass the deployments dataframe as well):

>>> result = wiutils.compute_hill_numbers(images, deployments, groupby="location")
>>> result.head(9)  # Show just the first three locations

   placename  q          D
0    CTCAJ01  0   9.000000
1    CTCAJ01  1   2.773289
2    CTCAJ01  2   1.828984
3    CTCAJ02  0   8.000000
4    CTCAJ02  1   5.444932
5    CTCAJ02  2   4.080891
6    CTCAJ03  0  10.000000