A large-scale reference dataset for bioacoustics
Please find the accompanying code at our official repository: github.com/livingingroups/animal2vec
Optionally, you can find the animal2vec model weights trained on the MeerKAT dataset here.
MeerKAT is a 1068 h large-scale dataset containing audio from boom microphones and audio-recording collars worn by free-ranging meerkats (Suricata suricatta) at the Kalahari Research Centre, South Africa. Of these 1068 h, 184 h are labeled with twelve time-resolved vocalization-type ground-truth target classes, each with millisecond resolution. The labeled 184 h subset exhibits realistic sparsity for a bioacoustic dataset (96% background noise or other signals, 4% vocalizations) and is dispersed across 66 398 10-second samples spanning 251 562 labeled events. It shows significant spectral and temporal variability, which makes it a large-scale reference point with real-world conditions for benchmarking pretraining and finetuning approaches in bioacoustics deep learning.
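As a quick consistency check, the figures above can be related with simple arithmetic. This sketch only uses numbers stated in this description; because the 4% vocalization share is rounded, the derived vocalization time is approximate.

```python
SECONDS_PER_SAMPLE = 10  # every MeerKAT sample is a 10-second clip


def clip_hours(n_clips: int) -> float:
    """Total duration in hours of n_clips 10-second samples."""
    return n_clips * SECONDS_PER_SAMPLE / 3600


labeled_h = clip_hours(66_398)          # labeled subset, about 184.4 h
events_per_clip = 251_562 / 66_398      # about 3.8 labeled events per clip
vocal_h = 0.04 * labeled_h              # about 7.4 h, using the rounded 4 % share

print(f"{labeled_h:.1f} h labeled, {events_per_clip:.1f} events/clip, "
      f"~{vocal_h:.1f} h of vocalizations")
# prints: 184.4 h labeled, 3.8 events/clip, ~7.4 h of vocalizations
```

In other words, although the labeled subset spans roughly 184 h of audio, only on the order of 7 h of it contains vocalizations, which is what makes the dataset sparse.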
The majority of the audio originates from acoustic collars (Edic Mini Tiny+ A77, Zelenograd, Russia, sampling at 8 kHz with 10-bit quantization) that were attached to the animals (41 individuals across both campaigns), where each file corresponds to a recording of a single individual on a single day. The remainder of the dataset was recorded using Marantz PMD661 digital recorders (Carlsbad, CA, U.S.) attached to directional Sennheiser ME66 microphones (Wedemark, Germany), sampling at 48 kHz with 32-bit quantization. When recording, field researchers held the microphones close to the animals (within 1 m). The data were recorded during periods when meerkats typically forage by digging in the ground for small prey. See our paper, [1], and [2] for more details.
MeerKAT is released as 384 592 10-second samples, amounting to 1068 h, of which 66 398 10-second samples (184 h) are labeled and ground-truth-complete: every call and recurring anthropogenic event in these 184 h is labeled. For further details, see [2]. All samples have been standardized to a sample rate of 8 kHz with 16-bit quantization, which is sufficient to capture the majority of meerkat vocalization frequencies (the first two formants lie below the Nyquist frequency of 4 kHz). The total dataset size of 59 GB (61 GB including the label files) is comparatively small, making MeerKAT easily accessible and portable despite its extensive length. Each 10-second file has an accompanying HDF5 label file that lists the label categories, the start and end time offsets (in seconds), and a "focal" designation indicating whether the call was given by the collar-wearing (or followed) individual.
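Because each label gives start and end offsets in seconds and all audio is stored at 8 kHz, placing an event on the waveform is a multiplication by the sample rate (1 ms corresponds to 8 samples, so sample indices are finer than the label resolution). The sketch below assumes the HDF5 label file has already been read into simple Python records; the `LabelEvent` field names and the `call_type_*` categories are hypothetical placeholders, not the actual HDF5 keys or class names.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 8000  # all MeerKAT samples are standardized to 8 kHz
CLIP_SECONDS = 10.0    # each sample is 10 seconds long


@dataclass
class LabelEvent:
    """One labeled event (hypothetical field names, not the HDF5 keys)."""
    category: str
    start_s: float  # start offset within the clip, in seconds
    end_s: float    # end offset within the clip, in seconds
    focal: bool     # True if given by the collar-wearing / followed individual


def to_sample_span(event: LabelEvent, sr: int = SAMPLE_RATE_HZ) -> tuple[int, int]:
    """Convert second offsets to [start, end) sample indices, clipped to the clip."""
    n = int(round(CLIP_SECONDS * sr))
    start = max(0, min(n, int(round(event.start_s * sr))))
    end = max(start, min(n, int(round(event.end_s * sr))))
    return start, end


def event_coverage(events: list[LabelEvent], sr: int = SAMPLE_RATE_HZ) -> float:
    """Fraction of the clip covered by at least one event (overlaps counted once)."""
    n = int(round(CLIP_SECONDS * sr))
    covered = bytearray(n)  # one 0/1 flag per audio sample
    for ev in events:
        s, e = to_sample_span(ev, sr)
        covered[s:e] = b"\x01" * (e - s)
    return sum(covered) / n


events = [
    LabelEvent("call_type_A", 1.0, 1.2, focal=True),
    LabelEvent("call_type_B", 1.1, 1.5, focal=False),
]
print(to_sample_span(events[0]))  # (8000, 9600)
print(event_coverage(events))     # 0.05
```

Counting overlapping events only once is a deliberate choice here: it measures how much of the clip contains any labeled signal, which is the quantity behind the 96%/4% sparsity split.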
By agreement with the Kalahari Research Centre (KRC), we have made these data available in a way that can further machine learning research without compromising the ability of the KRC to continue conducting valuable ecological research on these data.
Consequently, the filenames of the 10-second samples have been randomized, so their temporal order and the identity of the recorded individual cannot be recovered from the released data; this information can, however, be requested from us.
[1] Demartsev, V. et al. Signalling in groups: New tools for the integration of animal communication and collective movement. Methods Ecol. Evol. (2022).
[2] Demartsev, V. et al. Mapping vocal interactions in space and time differentiates signal broadcast versus signal exchange in meerkat groups. Philos. Trans. R. Soc. Lond. B Biol. Sci. 379 (2024).