Skip to content

Darwin Core

Overview

Darwin Core functions allow you to convert tables from Wildlife Insights format to the Darwin Core Standard. This is useful to publish information from Wildlife Insights projects to different biodiversity information centers (e.g. GBIF).

Here is a quick overview of the different Darwin Core functions and their description:

Function Description
create_dwc_archive Creates a Darwin Core Archive consisting of four different cores and extensions: Event, Occurrence, Measurement or Facts and Simple Multimedia.
create_dwc_event Creates a Darwin Core Event dataframe from deployments and projects information.
create_dwc_measurement Creates a Darwin Core Measurement or Facts dataframe from cameras and deployments information.
create_dwc_multimedia Creates a Darwin Core Simple Multimedia dataframe from images and deployments information.
create_dwc_occurrence Creates a Darwin Core Occurrence dataframe from images, deployments and projects information

Note

The Darwin Core functions only map available information from Wildlife Insights to Darwin Core Standard terms across four different cores/extensions. However, it is possible that you might want to add more terms and complement this information before publishing it (e.g. adding geographic or taxonomic details to the events and occurrences). regi0, another Python package, might be useful for this.

For every snippet of code showed here, we will assume you have already run the following code:

import wiutils

cameras, deployments, images, projects = wiutils.load_demo("cajambre")

Creating the Event Core

The Darwin Core Event is the category of information pertaining to an action that occurs at some location during some time. In this context, events can be thought of as deployments.

Here is a map between Wildlife Insights fields from different tables (source) and Darwin Core Event terms. Some terms have constant values or are computed using existing information:

source (WI) field (WI) term (DwC) constant comments
deployments deployment_id eventID
deployments placename parentEventID
- - samplingProtocol camera trap
- - sampleSizeValue 1
- - sampleSizeUnit camera
- - samplingEffort Delta (in days) between end_date and start_date + "trap-nights".
deployments start_date, end_date eventDate Concatenation of both fields using "/".
deployments event_description eventRemarks
projects country_code countryCode Converted from ISO 3166-1 alpha-3 to ISO 3166-1 alpha-2.
deployments feature_type locationRemarks
deployments latitude decimalLatitude
deployments longitude decimalLongitude
- - geodeticDatum WGS84
projects project_admin_organization institutionCode

The create_dwc_event function allows you to create a dataframe with this information. You'll need to pass both the deployments and projects dataframes:

>>> event =  wiutils.create_dwc_event(deployments, projects)
>>> event

        eventID parentEventID  ... geodeticDatum     institutionCode
0   CTCAJ103744       CTCAJ10  ...         WGS84  Instituto Humboldt
1   CTCAJ033779       CTCAJ03  ...         WGS84  Instituto Humboldt
2   CTCAJ163747       CTCAJ16  ...         WGS84  Instituto Humboldt
3   CTCAJ193741       CTCAJ19  ...         WGS84  Instituto Humboldt
4   CTCAJ083775       CTCAJ08  ...         WGS84  Instituto Humboldt
5   CTCAJ123777       CTCAJ12  ...         WGS84  Instituto Humboldt
6   CTCAJ143748       CTCAJ14  ...         WGS84  Instituto Humboldt
7   CTCAJ133746       CTCAJ13  ...         WGS84  Instituto Humboldt
8   CTCAJ073781       CTCAJ07  ...         WGS84  Instituto Humboldt
9   CTCAJ183778       CTCAJ18  ...         WGS84  Instituto Humboldt
10  CTCAJ093776       CTCAJ09  ...         WGS84  Instituto Humboldt
11  CTCAJ113742       CTCAJ11  ...         WGS84  Instituto Humboldt
12  CTCAJ153745       CTCAJ15  ...         WGS84  Instituto Humboldt
13  CTCAJ043772       CTCAJ04  ...         WGS84  Instituto Humboldt
14  CTCAJ063750       CTCAJ06  ...         WGS84  Instituto Humboldt
15  CTCAJ023749       CTCAJ02  ...         WGS84  Instituto Humboldt
16  CTCAJ053782       CTCAJ05  ...         WGS84  Instituto Humboldt
17  CTCAJ013743       CTCAJ01  ...         WGS84  Instituto Humboldt
18  CTCAJ143747       CTCAJ14  ...         WGS84  Instituto Humboldt

[19 rows x 14 columns]

>>> event.columns

Index(['eventID', 'parentEventID', 'samplingProtocol', 'sampleSizeValue',
       'sampleSizeUnit', 'samplingEffort', 'eventDate', 'eventRemarks',
       'countryCode', 'locationRemarks', 'decimalLatitude', 'decimalLongitude',
       'geodeticDatum', 'institutionCode'],
      dtype='object')

You can then save the file to different formats:

>>> event.to_csv("Event.csv", index=False)  # csv
>>> event.to_csv("Event.txt", index=False, header=None, sep=" ")  # txt
>>> event.to_excel("Event.xlsx", index=False)  # xlsx

Creating the Occurrence Core / Extension

The Darwin Core Occurrence is the category of information pertaining to the existence of an Organism at a particular place at a particular time. In this context, occurrences can be thought of as biodiversity records (i.e. non-duplicate images with identified wildlife).

Here is a map between Wildlife Insights fields from different tables (source) and Darwin Core Occurrence terms. Some terms have constant values or are computed using existing information:

source (WI) field (WI) term (DwC) constant comments
images, deployments deployment_id eventID
deployments placename parentEventID
images timestamp eventDate Extracted date from timestamp
images timestamp eventTime Extracted time from timestamp
images identified_by identifiedBy
images uncertainty identificationRemarks
images image_id recordNumber
deployments recorded_by recordedBy
images number_of_objects organismQuantity
- - organismQuantityType individual(s)
images sex sex
images age lifeStage
- - preparations photograph
images location associatedMedia Because one occurrence can have multiple associated images, this field is a pipe separated list of those images location. Also, image locations are converted from Google Cloud Storage URI (gs://) to an HTTPS URL.
images individual_animal_notes occurrenceRemarks
images individual_id organismID
projects project_admin_organization institutionCode
- - basisOfRecord MachineObservation
images wi_taxon_id taxonID
- - scientificName Computed using wiutils.get_lowest_taxon.
- - kingdom Animalia
- - phylum Chordata
images class class
images order order
images family family
images genus genus
images species specificEpithet Corresponds to the first word in the species field.
images species infraspecificEpithet Corresponds to the second word in the species field.
- - taxonRank Computed using wiutils.get_lowest_taxon.
images common_name vernacularName

The create_dwc_occurrence function allows you to create a dataframe with this information. The create_dwc_event allow you to create a dataframe with this information. You'll need to pass the images, deployments and projects dataframes. Because occurrences only consider non-duplicate images, this function allows you to pass an arbitrary time window to remove duplicate images (using the remove_duplicates function under the hood) with the remove_duplicate_kws parameter. Here is an example to create the Occurrence Core / Extension using a one-hour window:

>>> occurrence = wiutils.create_dwc_occurrence(images, deployments, projects, remove_duplicate_kws={"interval": 1, "unit": "hours"})
>>> occurrence

         eventID parentEventID  ... taxonRank                vernacularName
0    CTCAJ013743       CTCAJ01  ...   species                         Human
1    CTCAJ013743       CTCAJ01  ...   species                 Great Tinamou
2    CTCAJ013743       CTCAJ01  ...   species                         Human
3    CTCAJ013743       CTCAJ01  ...   species              Tome's Spiny Rat
4    CTCAJ013743       CTCAJ01  ...   species                 Great Tinamou
..           ...           ...  ...       ...                           ...
567  CTCAJ193741       CTCAJ19  ...   species                         Tayra
568  CTCAJ193741       CTCAJ19  ...    family                 Possum Family
569  CTCAJ193741       CTCAJ19  ...    family                 Possum Family
570  CTCAJ193741       CTCAJ19  ...   species  Central American Red Brocket
571  CTCAJ193741       CTCAJ19  ...    family                 Possum Family

[572 rows x 30 columns]

>>> occurrence.columns

Index(['eventID', 'parentEventID', 'eventDate', 'eventTime', 'identifiedBy',
       'identificationRemarks', 'recordNumber', 'recordedBy',
       'organismQuantity', 'organismQuantityType', 'sex', 'lifeStage',
       'preparations', 'associatedMedia', 'occurrenceRemarks', 'organismID',
       'institutionCode', 'basisOfRecord', 'taxonID', 'scientificName',
       'kingdom', 'phylum', 'class', 'order', 'family', 'genus',
       'specificEpithet', 'infraspecificEpithet', 'taxonRank',
       'vernacularName'],
      dtype='object')

You can then save the file to different formats:

>>> occurrence.to_csv("Occurrence.csv", index=False)  # csv
>>> occurrence.to_csv("Occurrence.txt", index=False, header=None, sep=" ")  # txt
>>> occurrence.to_excel("Occurrence.xlsx", index=False)  # xlsx

Creating the Measurement or Facts Extension

The Darwin Core Simple Measurement or Facts is a support for measurements or facts, allowing links to any type of Core. In this context, it has relevant information about cameras and deployments and is linked to the Event Core.

Here is a map between Wildlife Insights fields from different tables (source) and four Darwin Core Measurement or Facts terms. The measurementType and measurementUnit terms have constant values:

term (DwC) measurementType measurementValue measurementUnit measurementRemarks
source (WI) constant field (WI) constant field (WI)
cameras camera make make
cameras camera serial number serial_number
cameras camera year purchased year_purchased
deployments bait type bait_type bait_description
deployments quiet period quiet_period seconds
deployments camera functioning camera_functioning
deployments sensor height sensor_height height_other
deployments sensor orientation sensor orientation orientation_other
deployments plot treatment plot_treatment plot_treatment_description
deployments detection distance detection_distance meters

The create_dwc_measurement function allows you to create a dataframe with this information. You'll need to pass both the deployments and cameras dataframes:

>>> measurement = wiutils.create_dwc_measurement(deployments, cameras)

         eventID     measurementType  ... measurementUnit measurementRemarks
0    CTCAJ103744         camera make  ...             NaN                NaN
1    CTCAJ033779         camera make  ...             NaN                NaN
2    CTCAJ163747         camera make  ...             NaN                NaN
3    CTCAJ193741         camera make  ...             NaN                NaN
4    CTCAJ083775         camera make  ...             NaN                NaN
..           ...                 ...  ...             ...                ...
109  CTCAJ063750  sensor orientation  ...             NaN                NaN
110  CTCAJ023749  sensor orientation  ...             NaN                NaN
111  CTCAJ053782  sensor orientation  ...             NaN                NaN
112  CTCAJ013743  sensor orientation  ...             NaN                NaN
113  CTCAJ143747  sensor orientation  ...             NaN                NaN

[114 rows x 5 columns]

Index(['eventID', 'measurementType', 'measurementValue', 'measurementUnit',
       'measurementRemarks'],
      dtype='object')

Notice that there is an extra column (eventID) which links the measurements to their respective camera/deployment. Even though this term is not on the [extension description]((https://rs.gbif.org/extension/dwc/measurements_or_facts_2022-02-02.xml), it is used to link to the Event Core when using tools such as the Integrated Publishing Toolkit (IPT).

>>> measurement.to_csv("MeasurementOrFacts.csv", index=False)  # csv
>>> measurement.to_csv("MeasurementOrFacts.txt", index=False, header=None, sep=" ")  # txt
>>> measurement.to_excel("MeasurementOrFacts.xlsx", index=False)  # xlsx

Creating the Simple Multimedia Extension

The Darwin Core Simple Multimedia is a simple extension for exchanging metadata about multimedia resources, in particular links to image, video and audio files. In this context, it has relevant information about all the images from a project, including those without identified wildlife.

Here is a map between Wildlife Insights fields from different tables (source) and Darwin Core Simple Multimedia terms. Some terms have constant values or are computed using existing information:

source (WI) field (WI) term (DwC) constant comments
images, deployments deployment_id eventID
- - type Image
- - format image/jpeg
images image_id identifier
images location references Image location is converted from Google Cloud Storage URI (gs://) to an HTTP URL.
images - title Computed using wiutils.get_lowest_taxon. For blank or unidentified images, title is 'Blank or unidentified'.
images timestamp created
deployments recorded_by creator
images identified_by contributor
- - publisher Wildlife Insights
images license license

The create_dwc_multimedia function allows you to create a dataframe with this information. You'll need to pass both the images and deployments dataframes:

>>> multimedia = wiutils.create_dwc_multimedia(images, deployments)
>>> multimedia

          eventID   type  ...          publisher   license
0     CTCAJ013743  Image  ...  Wildlife Insights  CC-BY-NC
1     CTCAJ013743  Image  ...  Wildlife Insights  CC-BY-NC
2     CTCAJ013743  Image  ...  Wildlife Insights  CC-BY-NC
3     CTCAJ013743  Image  ...  Wildlife Insights  CC-BY-NC
4     CTCAJ013743  Image  ...  Wildlife Insights  CC-BY-NC
           ...    ...  ...                ...       ...
5248  CTCAJ193741  Image  ...  Wildlife Insights  CC-BY-NC
5249  CTCAJ193741  Image  ...  Wildlife Insights  CC-BY-NC
5250  CTCAJ193741  Image  ...  Wildlife Insights  CC-BY-NC
5251  CTCAJ193741  Image  ...  Wildlife Insights  CC-BY-NC
5252  CTCAJ193741  Image  ...  Wildlife Insights  CC-BY-NC

[5253 rows x 11 columns]

>>> multimedia.columns

Index(['eventID', 'type', 'format', 'identifier', 'references', 'title',
       'created', 'creator', 'contributor', 'publisher', 'license'],
      dtype='object')

>>> multimedia.to_csv("SimpleMultimedia.csv", index=False)  # csv
>>> multimedia.to_csv("SimpleMultimedia.txt", index=False, header=None, sep=" ")  # txt
>>> multimedia.to_excel("SimpleMultimedia.xlsx", index=False)  # xlsx

Creating the Darwin Core Archive

The Darwin Core Archive is a structured collection of text files containing different cores and extensions (see more about this on https://www.gbif.org/darwin-core). In this context, we refer to the Darwin Core Archive as the four cores / extensions that result from Wildlife Insights information:

  • Darwin Core Event
  • Darwin Core Occurrence
  • Darwin Core Simple Multimedia
  • Darwin Core Measurement or Facts

By having these four files, you can use tools such as the Integrated Publishing Toolkit (IPT) to publish the project's information.

The create_dwc_archive function uses the other four Darwin Core functions described above to conveniently create these four dataframes at once. Notice that this function also has the remove_duplicate_kws parameter:

>>> event, occurrence, measurement, multimedia = wiutils.create_dwc_archive(cameras, deployments, images, projects, remove_duplicate_kws={"interval": 1, "unit": "hours"})