Filtering

Overview

Filtering functions allow you to remove specific images from a Wildlife Insights dataset. They are useful for exploring the images and for filtering them before using other functions, especially the summarizing ones.

Here is a quick overview of the different filtering functions and their description:

Function                   Description
remove_domestic            Removes images where the identification corresponds to a domestic species.
remove_duplicates          Removes duplicate records (images) from the same taxon in the same deployment given a time interval.
remove_inconsistent_dates  Removes images where the timestamp is outside the date range of the corresponding deployment.
remove_unidentified        Removes unidentified (up to a specific taxonomic rank) images.

Note

All the filtering functions have a reset_index parameter that is False by default. If you want to have a new consecutive index from 0 to n - 1 (n being the number of rows in the dataframe) instead of keeping the original index after filtering the images, pass reset_index=True.
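
To see what the index looks like in each case, here is a small pandas-only sketch. It does not call wiutils: the toy dataframe is made up, and it assumes that reset_index=True has the same effect as pandas' own reset_index(drop=True).

```python
import pandas as pd

# Toy dataframe standing in for an images table (hypothetical data).
df = pd.DataFrame({"genus": ["Tinamus", None, "Leptotila", None]})

# Filtering keeps the original row labels, which leaves gaps in the index.
kept = df[df["genus"].notna()]
print(list(kept.index))  # [0, 2]

# Resetting the index relabels the remaining rows from 0 to n - 1.
renumbered = kept.reset_index(drop=True)
print(list(renumbered.index))  # [0, 1]
```

Keeping the original index is handy when you want to trace a filtered row back to the unfiltered dataframe; a consecutive index is usually more convenient for positional access.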

For every code snippet shown here, we will assume you have already run the following code:

import wiutils

cameras, deployments, images, projects = wiutils.load_demo("cajambre")

Removing unidentified images

Camera traps are usually deployed to record animals in the wild. However, they are sometimes activated by a moving leaf or branch and register images without any wildlife. Furthermore, there might be cases where neither the Wildlife Insights computer vision algorithm nor the researcher was able to identify an animal in a given image.

If we check the images dataframe, we will see that it has 5253 rows (i.e. images):

>>> len(images)
5253

However, not all of these images have an identification:

>>> images["class"].isna().sum()
917
>>> images["genus"].isna().sum()
1881

Of all the images, 917 did not have any identification at all and 1881 were not identified down to at least the genus level.

We can use the remove_unidentified function to remove images without an identification down to a specific level. For example, we can remove all the images that did not have any identification at all:

>>> subset = wiutils.remove_unidentified(images, rank="class")
>>> len(subset)
4336

Likewise, we can remove all the images that were not identified down to the genus level:

>>> subset = wiutils.remove_unidentified(images, rank="genus")
>>> len(subset)
3372

The rank parameter accepts any of the following values:

  • "species"
  • "genus"
  • "family"
  • "order"
  • "class"

The more specific the rank, the more images are likely to be removed. For example, with rank="order" only images without an identification for order (and therefore for all the lower ranks: family, genus and species) will be removed. With rank="species", however, all the images that were not identified down to the species level (i.e. do not have an epithet) will be removed.

To better illustrate this, let's take a subset of the images dataframe:

>>> subset = images.loc[[1, 5, 639, 788]]
>>> columns = ["class", "order", "family", "genus", "species"]  # We will use this later.
>>> subset[columns]

        class             order       family      genus species
1        Aves  Struthioniformes    Tinamidae    Tinamus   major
5         NaN               NaN          NaN        NaN     NaN
639      Aves     Columbiformes   Columbidae  Leptotila     NaN
788  Mammalia   Didelphimorphia  Didelphidae        NaN     NaN

And now, let's remove unidentified images using different values for the rank parameter:

>>> wiutils.remove_unidentified(subset, rank="class")[columns]

        class             order       family      genus species
1        Aves  Struthioniformes    Tinamidae    Tinamus   major
639      Aves     Columbiformes   Columbidae  Leptotila     NaN
788  Mammalia   Didelphimorphia  Didelphidae        NaN     NaN

>>> wiutils.remove_unidentified(subset, rank="genus")[columns]

    class             order      family      genus species
1    Aves  Struthioniformes   Tinamidae    Tinamus   major
639  Aves     Columbiformes  Columbidae  Leptotila     NaN

>>> wiutils.remove_unidentified(subset, rank="species")[columns]

  class             order     family    genus species
1  Aves  Struthioniformes  Tinamidae  Tinamus   major

As you can see, when we pass rank="class" only one image is removed but when we pass rank="species" three images are removed.
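
The effect of the rank parameter can be reproduced with a plain-Python sketch over the same four rows. This is not how wiutils is implemented; keep_identified is a hypothetical helper, None stands in for NaN, and the sketch assumes that a missing value at one rank implies missing values at every lower rank, as in the example rows.

```python
# Accepted ranks, ordered from most general to most specific.
RANKS = ["class", "order", "family", "genus", "species"]

# The four example rows, keyed by their index; None stands for NaN.
rows = {
    1: dict(zip(RANKS, ["Aves", "Struthioniformes", "Tinamidae", "Tinamus", "major"])),
    5: dict(zip(RANKS, [None, None, None, None, None])),
    639: dict(zip(RANKS, ["Aves", "Columbiformes", "Columbidae", "Leptotila", None])),
    788: dict(zip(RANKS, ["Mammalia", "Didelphimorphia", "Didelphidae", None, None])),
}


def keep_identified(rows, rank):
    """Keep only the rows that have a value at the given rank."""
    if rank not in RANKS:
        raise ValueError(f"rank must be one of {RANKS}")
    return {idx: row for idx, row in rows.items() if row[rank] is not None}


for rank in ("class", "genus", "species"):
    print(rank, sorted(keep_identified(rows, rank)))
# class [1, 639, 788]
# genus [1, 639]
# species [1]
```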

Removing duplicate images

There are at least two different cases where you might consider a set of images as duplicates:

  • Your camera is configured to take multiple shots when activated, resulting in multiple images of the same individual each time an animal activates it.
  • The camera captured an animal that returned after a certain time (could be seconds, minutes or even days) and activated the camera again, resulting in multiple images of the same individual.

The remove_duplicates function lets you specify an arbitrary time window within which images from the same deployment with the same identification (i.e. lowest identified taxon) are removed, except for the first record.

Let's create a subset (with only one deployment and one taxon) of the images dataframe to illustrate the two cases described above:

>>> subset = images[(images["deployment_id"] == "CTCAJ093776") & (images["genus"] == "Leptotila")]
>>> subset = subset.sort_values("timestamp")
>>> columns = ["deployment_id", "genus", "species", "timestamp"]  # We will use this later.
>>> subset[columns]

     deployment_id      genus species           timestamp
4170   CTCAJ093776  Leptotila     NaN 2014-11-06 07:20:10
1229   CTCAJ093776  Leptotila     NaN 2014-11-06 07:20:12
1405   CTCAJ093776  Leptotila     NaN 2014-11-06 07:20:12
1431   CTCAJ093776  Leptotila     NaN 2014-11-11 08:49:02
4136   CTCAJ093776  Leptotila     NaN 2014-11-11 08:49:02
1196   CTCAJ093776  Leptotila     NaN 2014-11-11 08:49:04
3740   CTCAJ093776  Leptotila     NaN 2014-11-15 09:27:46
3846   CTCAJ093776  Leptotila     NaN 2014-11-15 09:27:48
4185   CTCAJ093776  Leptotila     NaN 2014-11-15 09:27:48
3894   CTCAJ093776  Leptotila     NaN 2014-11-18 10:02:16
1404   CTCAJ093776  Leptotila     NaN 2014-11-18 10:02:18
1443   CTCAJ093776  Leptotila     NaN 2014-11-18 10:02:18

Notice how some images are just a few seconds apart from each other; it is evident that there are four groups of images. Instead of overestimating the records of Leptotila, we might assume that each group of images corresponds to one individual and that there are four different individuals of that genus in total (see the highlighted rows). Because within each group the images are just two seconds apart from each other, we can use an arbitrary time window of five seconds to remove duplicate images:

>>> result = wiutils.remove_duplicates(subset, interval=5, unit="seconds")
>>> result[columns]

     deployment_id      genus species           timestamp
4170   CTCAJ093776  Leptotila     NaN 2014-11-06 07:20:10
1431   CTCAJ093776  Leptotila     NaN 2014-11-11 08:49:02
3740   CTCAJ093776  Leptotila     NaN 2014-11-15 09:27:46
3894   CTCAJ093776  Leptotila     NaN 2014-11-18 10:02:16

Now there are only four images for that genus.

For the second case, we might assume that images within a four-day interval correspond to the same individual. Thus, we are going to remove those images using an arbitrary window of four days.

>>> result = wiutils.remove_duplicates(subset, interval=4, unit="days")
>>> result[columns]

     deployment_id      genus species           timestamp
4170   CTCAJ093776  Leptotila     NaN 2014-11-06 07:20:10
1431   CTCAJ093776  Leptotila     NaN 2014-11-11 08:49:02
3740   CTCAJ093776  Leptotila     NaN 2014-11-15 09:27:46

Now there are only three images because the third and fourth groups are treated as one individual.
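
Both results can be reproduced with a short standard-library sketch over the twelve timestamps listed earlier. It is only an illustration: drop_within_window is a hypothetical helper, and it assumes that each image is compared against the last image that was kept, which the text above does not spell out. wiutils may measure the window differently, although this assumption matches both outputs shown here.

```python
from datetime import datetime, timedelta

# Timestamps of the twelve Leptotila images, already sorted.
timestamps = [
    datetime.fromisoformat(text)
    for text in [
        "2014-11-06 07:20:10", "2014-11-06 07:20:12", "2014-11-06 07:20:12",
        "2014-11-11 08:49:02", "2014-11-11 08:49:02", "2014-11-11 08:49:04",
        "2014-11-15 09:27:46", "2014-11-15 09:27:48", "2014-11-15 09:27:48",
        "2014-11-18 10:02:16", "2014-11-18 10:02:18", "2014-11-18 10:02:18",
    ]
]


def drop_within_window(sorted_times, window):
    """Keep a timestamp only if it is at least `window` after the last kept one."""
    kept = []
    for current in sorted_times:
        if not kept or current - kept[-1] >= window:
            kept.append(current)
    return kept


five_seconds = drop_within_window(timestamps, timedelta(seconds=5))
four_days = drop_within_window(timestamps, timedelta(days=4))
print(len(five_seconds))  # 4
print(len(four_days))     # 3
```

With the four-day window, the group from 2014-11-15 survives because it starts a little more than four days after the kept image from 2014-11-11, while the group from 2014-11-18 starts only about three days after the kept image from 2014-11-15 and is dropped.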

Note

The remove_duplicates function recognizes duplicates of the same taxon, regardless of the taxonomic rank. For example, if you have one deployment with images that were identified down to the Leptotila genus and images that were identified as Leptotila verreauxi, duplicates will be recognized independently for each taxon, regardless of the time window used.

By default, the remove_duplicates function uses a five-minute interval but depending on your project, you might want to use a different time window. The interval parameter accepts any positive integer and the unit parameter has to be one of:

  • "weeks"
  • "days"
  • "hours"
  • "minutes"
  • "seconds"
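
The five unit names happen to match keyword arguments of Python's datetime.timedelta, so an interval and unit pair can be turned into a time window in one line. The sketch below is a hypothetical helper (to_window is not part of wiutils) that also checks the two constraints described above.

```python
from datetime import timedelta

# Units listed above; each is also a valid timedelta keyword argument.
VALID_UNITS = ("weeks", "days", "hours", "minutes", "seconds")


def to_window(interval, unit):
    """Convert an (interval, unit) pair into a timedelta."""
    if unit not in VALID_UNITS:
        raise ValueError(f"unit must be one of {VALID_UNITS}")
    if not isinstance(interval, int) or interval <= 0:
        raise ValueError("interval must be a positive integer")
    return timedelta(**{unit: interval})


print(to_window(5, "seconds"))  # 0:00:05
print(to_window(4, "days"))     # 4 days, 0:00:00
```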

Removing images with domestic species

Depending on where the camera traps were deployed, it is not uncommon to register domestic species (e.g. dogs, cats or pigs). For some analyses, such as computing diversity indices, you might want to ignore these species. The remove_domestic function does this by removing images of the following species (or subspecies):

  • Cat: Felis catus
  • Cattle: Bos bubalis, Bos taurus, Bubalus bubalis
  • Chicken: Gallus domesticus, Gallus gallus domesticus
  • Dog: Canis familiaris, Canis familiaris domesticus, Canis lupus familiaris
  • Donkey: Equus asinus
  • Duck: Anas platyrhynchos, Anas platyrhynchos domesticus
  • Goat: Capra hircus
  • Goose: Anser anser, Anser cygnoides
  • Guinea fowl: Numida meleagris, Phasianus meleagris
  • Horse: Equus caballus, Equus ferus caballus
  • Human: Homo sapiens
  • Muscovy duck: Cairina moschata domestica
  • Pig: Sus domesticus, Sus scrofa domesticus
  • Sheep: Ovis aries
  • Turkey: Meleagris gallopavo

Note

This is not a comprehensive list of domestic species and it might be improved. Feel free to create a pull request if you want to add more species to the list.

Using the images dataframe, we can remove domestic species as follows:

>>> subset = wiutils.remove_domestic(images)
>>> len(subset)
4772
>>> len(images) - len(subset)
481

We can see that there were 481 images with identified domestic species.

In some cases, domestic species may have been identified only down to the genus level and not to the species level. By default, the remove_domestic function extracts the scientific names from the images and compares them to the list of domestic species. However, it also offers a broader strategy that compares only the genera from the images with the genera in the list of domestic species. To use this strategy, pass the broad parameter:

wiutils.remove_domestic(images, broad=True)

Warning

When passing broad=True to the remove_domestic function, there are some special cases where non-domestic species might be removed. For example, if you have images of both dogs and wolves (genus Canis), the records of both will be removed when using the broader strategy.
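
The difference between the two strategies can be sketched in plain Python. This is an illustration rather than the wiutils implementation: is_domestic is a hypothetical helper, the list is a short excerpt of the one above, and the records are made up.

```python
# Short excerpt of the domestic species list (not the full list).
DOMESTIC = {"Felis catus", "Canis familiaris", "Sus domesticus"}
# Genera derived from the excerpt: Felis, Canis and Sus.
DOMESTIC_GENERA = {name.split()[0] for name in DOMESTIC}


def is_domestic(genus, epithet, broad=False):
    """Match on the genus alone (broad) or on the full scientific name."""
    if broad:
        return genus in DOMESTIC_GENERA
    if epithet is None:
        return False  # No full scientific name to compare.
    return f"{genus} {epithet}" in DOMESTIC


# Hypothetical records as (genus, epithet); None means not identified.
records = [
    ("Canis", "familiaris"),  # dog
    ("Canis", "lupus"),       # wolf
    ("Canis", None),          # identified to genus only
    ("Leptotila", None),
]

strict = [r for r in records if not is_domestic(*r)]
broad = [r for r in records if not is_domestic(*r, broad=True)]
print(len(strict))  # 3: only the dog is removed
print(len(broad))   # 1: every Canis record is removed, wolf included
```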

Removing images with inconsistent dates

As shown in the extraction section, there might be cases where image dates do not coincide with their corresponding deployment dates. This is usually caused by camera misconfiguration and can lead to images having dates that fall outside the deployment range. In scenarios where the associated dates are essential (e.g. computing detection histories), it is probably a good idea to remove those images. The remove_inconsistent_dates function removes all the images whose date is outside the corresponding deployment range.

Warning

The remove_inconsistent_dates function assumes that the deployment date ranges coming from Wildlife Insights are correct. Wrong deployment dates could lead to the removal of consistent images.
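
The check itself can be sketched in plain Python. The deployment ID, date range and timestamps below are made up, has_consistent_date is a hypothetical helper, and the sketch assumes that the comparison is made on calendar dates with both ends of the range included, which may differ from what wiutils does.

```python
from datetime import date, datetime

# Hypothetical deployment with its start and end dates.
deployment_ranges = {"DEP01": (date(2014, 10, 22), date(2014, 12, 8))}

# Hypothetical images as (deployment ID, timestamp).
image_records = [
    ("DEP01", datetime(2014, 10, 21, 23, 59, 59)),  # day before the start
    ("DEP01", datetime(2014, 10, 22, 0, 0, 0)),     # first day
    ("DEP01", datetime(2014, 12, 8, 18, 30, 0)),    # last day
    ("DEP01", datetime(2014, 12, 9, 6, 0, 0)),      # day after the end
]


def has_consistent_date(deployment_id, timestamp):
    """True when the image date lies within its deployment's date range."""
    start, end = deployment_ranges[deployment_id]
    return start <= timestamp.date() <= end


consistent = [rec for rec in image_records if has_consistent_date(*rec)]
print(len(image_records) - len(consistent))  # 2 images removed
```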

The cajambre demo dataset does not have any inconsistent images, so we will have to modify some images' dates to show how this function works. Let's take the first deployment (CTCAJ013743) as an example, using the get_date_ranges function:

>>> date_ranges = wiutils.get_date_ranges(images, deployments, source="both", pivot=True)
>>> date_ranges.loc["CTCAJ013743"]

            source
start_date  deployments   2014-10-22
            images        2014-10-22
end_date    deployments   2014-12-08
            images        2014-12-08
Name: CTCAJ013743, dtype: datetime64[ns]

This particular deployment was working from 2014-10-22 to 2014-12-08 and its first and last images coincide with those dates. Let's subtract 20 days from the dates of all the images of that deployment so that some fall outside the range:

>>> import pandas as pd
>>> images.loc[images["deployment_id"] == "CTCAJ013743", "timestamp"]

0      2014-12-07 07:46:00
1      2014-11-21 05:58:38
2      2014-12-07 07:46:02
3      2014-11-21 05:58:22
4      2014-10-22 12:31:30
               ...
2857   2014-10-23 06:09:42
2858   2014-10-21 12:09:38
2859   2014-10-22 10:15:42
2860   2014-10-22 09:53:50
2861   2014-10-22 10:53:52
Name: timestamp, Length: 501, dtype: datetime64[ns]

>>> images_copy = images.copy()  # Copy the dataframe before modifying it.
>>> images_copy.loc[images_copy["deployment_id"] == "CTCAJ013743", "timestamp"] -= pd.DateOffset(days=20)
>>> images_copy.loc[images_copy["deployment_id"] == "CTCAJ013743", "timestamp"]

0      2014-11-18 07:46:00
1      2014-11-02 05:58:38
2      2014-11-18 07:46:02
3      2014-11-02 05:58:22
4      2014-10-03 12:31:30
               ...
2857   2014-10-04 06:09:42
2858   2014-10-02 12:09:38
2859   2014-10-03 10:15:42
2860   2014-10-03 09:53:50
2861   2014-10-03 10:53:52
Name: timestamp, Length: 501, dtype: datetime64[ns]

Now, let's check the date ranges again:

>>> date_ranges = wiutils.get_date_ranges(images_copy, deployments, source="both", pivot=True)
>>> date_ranges.loc["CTCAJ013743"]

            source
start_date  deployments   2014-10-22
            images        2014-10-02
end_date    deployments   2014-12-08
            images        2014-11-18
Name: CTCAJ013743, dtype: datetime64[ns]

There is now a 20-day difference between the deployment and image date ranges. We can now use the remove_inconsistent_dates function:

>>> subset = wiutils.remove_inconsistent_dates(images_copy, deployments)
>>> len(images_copy) - len(subset)
390

There were 390 inconsistent images in our modified images dataframe.