Deep Learning Meets the Deep Sea

Bioacoustics for Conservation

By Matt James & Peter Bermant

How can we best understand and protect wild species?

The identification and monitoring of species is a critical aspect of conservation efforts—helping us answer fundamental questions and respond appropriately to contemporary challenges. A range of tools has emerged in recent decades, and changes in monitoring design can revolutionize the utility of data in conservation. Satellite imagery, for example, has recently transformed scientists’ understanding of species, but imaging modalities can neither see past obstructing layers such as a dense canopy nor transmit or receive signals underwater (among other limitations).


The promise of bioacoustics

Bioacoustics (bī-(ˌ)ō-ə-ˈkü-stiks). Noun. From Greek βίος (“bios”) for “life” and ἀκουστικός (“akoustikos”) for “ready to hear”.

Cue bioacoustics—the branch of acoustics concerned with the production, dispersion, and reception of sound in animals. Audio recorders can now non-invasively record many species over large areas, unobstructed. This means that vast volumes of bioacoustic data can be acquired autonomously and remotely, requiring little human intervention—and heralding a golden age for bioacoustics.


What are ongoing challenges to bioacoustic data processing?

Advances in deep learning have revolutionized the analysis of troves of bioacoustic data. But processing all this data still faces two major bottlenecks.

  1. Methods have relied heavily on the manual annotation of acoustic spectrograms to identify, say, whale, bird, or turtle sounds on a recording that could include hundreds of species—tedious and time-consuming work.
  2. Many approaches have treated acoustic event detection as a binary classification task, identifying the presence of an event rather than pinpointing its precise boundaries—limiting deeper analyses. 

The Colossal solution: Self-supervised deep learning for bioacoustic event detection

In a recent paper, we apply advanced self-supervised deep learning methods to enable the automated and precise detection of bioacoustic events. Relying on contrastive representation learning (1), we use a deep neural network trained on real-world bioacoustic data to uncover hidden or complex patterns and extract features relevant to optimizing the contrastive learning objective. Applied to a bioacoustic dataset of choice, a peak-finding algorithm then detects regions of high dissimilarity between the features of adjacent acoustic windows; these peaks correspond to the temporal boundaries of bioacoustic events.
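To make the boundary-detection step concrete: given per-window embeddings from any trained encoder, peaks in the dissimilarity between adjacent windows mark candidate event boundaries. The sketch below is illustrative, not the paper's implementation—the toy embeddings, the cosine dissimilarity measure, and the peak-prominence value are all assumptions.

```python
import numpy as np
from scipy.signal import find_peaks

def boundary_peaks(embeddings, prominence=0.1):
    """Detect event boundaries as peaks in the cosine dissimilarity
    between embeddings of adjacent acoustic windows."""
    a, b = embeddings[:-1], embeddings[1:]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    dissimilarity = 1.0 - cos
    peaks, _ = find_peaks(dissimilarity, prominence=prominence)
    # dissimilarity[i] compares windows i and i+1, so a peak at i
    # means window i+1 starts a new event
    return peaks + 1, dissimilarity

# Toy data: two acoustic "regimes" whose embeddings change at window 50.
rng = np.random.default_rng(0)
emb = np.concatenate([
    np.eye(16)[0] + rng.normal(scale=0.02, size=(50, 16)),
    np.eye(16)[1] + rng.normal(scale=0.02, size=(50, 16)),
])
peaks, dissim = boundary_peaks(emb)
```

Only relative dissimilarity matters here, which is what lets the scheme run without any labels: a boundary is wherever the learned representation changes sharply, whatever the species.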

As for background noise? This is a core challenge to the processing of real-world bioacoustic data using unsupervised schemes, as the noises of human enterprise are now ubiquitous, up to 10 times higher than just decades ago (3). We address this by integrating on-the-fly noise reduction layers into the model architecture (2). 
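In the paper the denoising is learned inside the network (2); as a rough classical illustration of the same idea, a spectral-gating pass suppresses time-frequency bins below an estimated noise floor. The window size, gating factor, and the assumption of a leading noise-only segment below are all illustrative choices, not the model's architecture.

```python
import numpy as np
from scipy.signal import stft, istft

def spectral_gate(x, fs, noise_seconds=0.5, factor=2.0):
    """Suppress time-frequency bins below a noise-floor estimate,
    a classical stand-in for learned denoising layers."""
    f, t, Z = stft(x, fs=fs, nperseg=256)
    n_noise = np.searchsorted(t, noise_seconds)  # assumes a leading noise-only segment
    noise_floor = np.mean(np.abs(Z[:, :n_noise]), axis=1, keepdims=True)
    mask = np.abs(Z) > factor * noise_floor      # keep only bins above the floor
    _, x_clean = istft(Z * mask, fs=fs, nperseg=256)
    return x_clean

# Toy data: one second of noise, then a tone buried in the same noise.
fs = 8000
t = np.arange(2 * fs) / fs
rng = np.random.default_rng(0)
noise = rng.normal(scale=0.3, size=t.size)
signal = np.where(t > 1.0, np.sin(2 * np.pi * 440 * t), 0.0)
clean = spectral_gate(signal + noise, fs)
```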

Most importantly, though? Our model requires no manual annotation. 

“The unprecedented quantities of high-quality ecological data combined with the shift toward more extreme approaches to conservation demand new techniques to automate data processing,” explains our study’s senior computational scientist Leandra Brickson.

Sperm whale coda click detection 

While the first bioacoustic recording of marine mammals—beluga whales in 1949—led to an explosion of whale research in the decades that followed, many species of whales remain listed as endangered under the Endangered Species Act. How can we best identify them in order to protect them?

Sperm whales communicate using extraordinarily idiosyncratic series of clicks called codas (4). Applying our model to 90 minutes of sperm whale click recordings from the Woods Hole Oceanographic Institution allows us to successfully detect these coda clicks—outperforming traditional threshold-based baseline methods on precision and recall.
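For context on what a threshold-based baseline looks like—the kind of method our model is compared against—here is a minimal energy-threshold click detector on synthetic data. The window length, threshold multiplier, and toy clicks are all assumptions for illustration, not the baseline used in the study.

```python
import numpy as np

def threshold_click_detector(x, fs, win_ms=2.0, k=5.0):
    """Flag short windows whose RMS energy exceeds k times the
    median RMS—a simple energy-threshold click detector."""
    win = max(1, int(fs * win_ms / 1000))
    n = len(x) // win
    frames = x[: n * win].reshape(n, win)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    hits = rms > k * np.median(rms)
    return np.flatnonzero(hits) * win / fs  # onset times in seconds

# Toy data: three synthetic clicks in one second of low-level noise.
fs = 48000
rng = np.random.default_rng(1)
x = rng.normal(scale=0.01, size=fs)
for onset in (0.2, 0.5, 0.8):
    i = int(onset * fs)
    x[i : i + 48] += np.sin(2 * np.pi * 8000 * np.arange(48) / fs)
onsets = threshold_click_detector(x, fs)
```

Such detectors work on clean toy data but degrade quickly as ocean noise rises, which is one reason learned features with explicit denoising compare favorably on real recordings.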


How is our model applied to a range of real-world data?

Bengalese finch song segmentation

Studying birdsongs has taught us about everything from how we learn to how we can avoid destroying an entire ecosystem.

We demonstrate the viability of our model in the analysis of Bengalese finch vocalizations. Bengalese finches produce charming if choppy melodies consisting of sequential vocal elements, or syllables. We reformulate the detection problem as a segmentation problem on a collection of songs from four Bengalese finches, detecting the onsets and offsets of signals that correspond to salient vocal units.
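A small utility illustrates the onset/offset bookkeeping such segmentation produces: converting a per-frame vocal-activity mask into (onset, offset) times. This is a generic sketch with a hypothetical 10 ms frame hop, not the model's actual output format.

```python
import numpy as np

def mask_to_segments(mask, hop_s):
    """Convert a per-frame activity mask into (onset, offset)
    pairs in seconds, one pair per vocal unit."""
    padded = np.concatenate(([0], np.asarray(mask, dtype=int), [0]))
    edges = np.diff(padded)               # +1 at onsets, -1 at offsets
    onsets = np.flatnonzero(edges == 1)
    offsets = np.flatnonzero(edges == -1)
    return [(on * hop_s, off * hop_s) for on, off in zip(onsets, offsets)]

# Toy data: two syllables in a 10-frame mask at an assumed 10 ms hop.
mask = np.array([0, 1, 1, 1, 0, 0, 1, 1, 0, 0])
segments = mask_to_segments(mask, hop_s=0.010)  # two segments
```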

What’s so powerful about this? This lays the groundwork for our model to be adapted to the communication architectures of a variety of species—past, present, and future.

Green sea turtle behavioral dynamics using movement data 

Nearly all species of sea turtle are now classified as endangered. Exploiting acceleration, gyroscopic, and depth information recorded from sensors attached to 13 turtles, our contrastive detection framework was able to predict transitions in green sea turtle behavioral dynamics. Such automatic segmentation of animal behavior data will help monitor and respond to the behaviors of free-ranging sea turtles across their entire life cycle, from mating and nesting to migration.
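The study's framework is contrastive; as a much simpler classical stand-in, transitions in multichannel movement data can be flagged as jumps in windowed summary statistics. The window length, threshold, and simulated "resting"/"swimming" data below are all assumptions for illustration, not the paper's method.

```python
import numpy as np

def behavior_transitions(sensors, win=50, z=5.0):
    """Flag candidate behavioral transitions in multichannel movement
    data (e.g. acceleration, gyroscope, depth) as large jumps in
    per-window mean and standard-deviation features."""
    n = (len(sensors) // win) * win
    frames = sensors[:n].reshape(-1, win, sensors.shape[1])
    feats = np.concatenate([frames.mean(axis=1), frames.std(axis=1)], axis=1)
    jump = np.linalg.norm(np.diff(feats, axis=0), axis=1)
    hits = np.flatnonzero(jump > z * np.median(jump))
    return (hits + 1) * win  # sample indices where a new behavior begins

# Toy data: "resting" then "swimming" in simulated 3-axis acceleration.
rng = np.random.default_rng(2)
rest = rng.normal(0.0, 0.05, size=(500, 3))
swim = rng.normal(0.0, 0.05, size=(500, 3)) + [0.0, 0.0, 1.0]
transitions = behavior_transitions(np.vstack([rest, swim]))
```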


How are bioacoustic data insights answering key conservation questions?

By extending our work to a host of real-world data, there are countless conservation questions we can continue, or even begin, to answer (5).


Animal identification

Animals emit sounds that can be used to uniquely identify them—think bioacoustic fingerprints. And a slew of cryptic species are being uncovered with the help of diagnostic vocalizations. Who knew there were novel types of bats across Madagascar and the Comoros islands? 

Animal monitoring and tracking

Bioacoustic data allows us to track animals across vast chunks of time and space—and accurate monitoring is essential if we are to save species. 

  • How can a grid of recording stations be optimized to precisely track species’ range changes, expansions and contractions? An elephant’s rumble, for example, is accompanied by seismic vibrations which can be harnessed to study elephant behavior. In light of Colossal’s conservation efforts, we continue to share our latest technologies to advance the mission of our partners such as Save the Elephants. 
  • Acoustic buoys off the coast of Savannah in a highly trafficked zone are relaying sound information to mariners to prevent ships from colliding with whales. How can we further enhance bioacoustic tracking to detect marine wildlife and guide vessels in real-time? Relatedly, how can our insights inform the design of wildlife corridors or nature preserves?
  • Declines in frog calls are a tragic harbinger of the spread of the fatal skin disease chytridiomycosis, “the worst infectious disease ever recorded among vertebrates … for its propensity to drive them to extinction” (6). Can we optimize the use of listening stations to track the spread of diseases before irreversible damage is done?
  • Poaching can be tracked by listening for gunshots, human voices, and animal alarm calls: One group set up 20 audio recorders to expose hunting in a conserved area of the Peruvian Amazon. How can we keep advancing bioacoustics to track defaunation?
  • Vocalizations of every acoustically communicating animal are threatened by climate change. Ocean acidification and noise pollution disrupt the communication of marine mammals, while changes in precipitation and temperature warp the sounds of terrestrial species—it turns out that urban birds are evolving to sing louder and at higher pitches than their rural counterparts. Eventually, entire auditory systems might change too. How can bioacoustic landscapes help keep tabs on these effects of climate change?

Animal interpretation

Acoustic communication is fundamental to a range of behaviors across species: A recent study found that acoustic communication in vertebrates with lungs is at least 407 million years old, and detected acoustic abilities in over 50 species previously considered mute (7).

  • Can we collect bioacoustic data to create not only a dictionary but also a grammar guide of animal communication to understand different types of vocalizations, from stress to mating calls (8)? One vision could be to generate audio recordings of natural soundscapes collected from a global network of stations (like we do for temperature data). Permanently archived for analysis, each clip would be like a museum specimen, but hosting many species. 
  • Singing helps birds attract mates—the more singing there is, the more likely birds are to breed in a specific spot. Can bioacoustic data help elucidate changes in reproductive patterns in response to anthropogenic pressures?
  • Animal vocalizations have important emotional cues. These are already being studied in industrial farming to monitor animal wellness. Can we use similar indicators of behavioral states to shed light on the well-being of species in the wild? 

The Colossal vision: A collaborative future for the well-being of all species

Whether harnessing the electromagnetic spectrum or sound pressure waves, human history has seen an extraordinary expansion of our repertoire of inquiry into the ways different species have evolved to make sense of, and interact with, each other and the world.

At Colossal, not only are we thrilled to keep tackling cutting-edge computational challenges, but we know it is our duty to keep digging deeper than ever into these extraordinary properties of nature—to advance conservation and to advance de-extinction. And to advance them now. 

As our study’s first author Peter Bermant concludes, “By better understanding animal behavior, communication, and ecology, these novel computational tools will help us design improved conservation and management strategies in the hopes of reversing the current human-driven extinction crisis and returning the Earth to a healthier state”.

All our data and code are available online, and we warmly encourage other teams to build on and expand our results.

Together, we will continue to shift the paradigm of bioacoustic data processing towards a fully automated pipeline for the dynamic monitoring of a range of real-world data (9)—while collectively tapping ever more creatively into the subtle patterns whispered (or clicked) through the physics of our faunal world. 



  1. Gutmann M, Hyvärinen A. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. Journal of Machine Learning Research. 2010.
  2. Bermant PC. BioCPPNet: automatic bioacoustic source separation with deep neural networks. Sci Rep. 2021.
  3. Hildebrand JA. Anthropogenic and natural sources of ambient noise in the ocean. Mar Ecol Prog Ser. 2009.
  4. Weilgart L, Whitehead H. Coda communication by sperm whales (Physeter macrocephalus) off the Galapagos Islands. Can J Zool. 1993.
  5. Guerra CA, Pendleton L, Drakou EG, Proença V, Appeltans W, Domingos T, et al. Finding the essential: Improving conservation monitoring across scales. Global Ecology and Conservation. 2019.
  6. Gascon C, Collins JP, Moore RD, Church DR, Mckay JE, Mendelson III JR. Amphibian Conservation Action Plan. Russian Journal of Herpetology. 2007.
  7. Jorgewich-Cohen G, Townsend SW, Padovese LR, Klein N, Praschag P, Ferrara CR, et al. Common evolutionary origin of acoustic communication in choanate vertebrates. Nat Commun. 2022 Oct 25;13(1):1–7.
  8. Teixeira D, Maron M, van Rensburg BJ. Bioacoustic monitoring of animal vocal behavior for conservation. Conserv Sci Pract. 2019 Aug 1;1(8).
  9. Sainburg T, Gentner TQ. Toward a Computational Neuroethology of Vocal Communication: From Bioacoustics to Neurophysiology, Emerging Tools and Future Directions. Frontiers in Behavioral Neuroscience. 2021.