The $10 Million Question Behind the Dolittle Prize: How AI Decoded a Chimp, Bonobo, Mouse, and Finch
- David Borish

- Jun 4

The most striking number in this year's race for the Dolittle Prize is not the $100,000 on the table or the $10 million grand prize waiting behind it. It is twelve. That is the size of the entire vocal repertoire that wild chimpanzees in Ivory Coast appear to draw on, according to one of the four finalist teams. With only a dozen basic calls, the chimps generate a far larger range of messages by combining those sounds in different orders, a flexibility that looks uncomfortably close to something we used to reserve for ourselves.
The prize, sponsored by the British financier Jeremy Coller and administered by Tel Aviv University, awards $100,000 each year to the research team that makes the most significant advance toward deciphering animal communication. This year's winner will be announced on June 25. Behind the annual award sits a much larger lure: a grand prize of either a $10 million equity investment or $500,000 in cash for any team that can demonstrate sustained two-way communication, defined as an animal initiating contact on its own without recognizing that a human is on the other end. The inaugural prize in 2025 went to a team studying bottlenose dolphins off Sarasota, Florida. This year the shortlist spans four very different animals, and what links them is less the species than the method.
The machine in the field
Every finalist this year leaned on the same family of tools that powers large language models. The pattern is consistent across the projects. Researchers gather an enormous volume of recordings, far more than any human could annotate by hand, then train neural networks to find the structure hiding inside the noise.
The clearest example comes from Nicolas Mathevon's work on the African striped mouse. In 2023 his team recorded 122,619 squeaks from dozens of wild mice over twelve days and nights, using 23 microphones spread across four nest bushes. The repertoire held at least seven distinct squeak types, some used inside the nest and others at the edges of the animals' territory. When the team fed those recordings to an artificial neural network, the same broad class of model that underlies systems like ChatGPT, it found that each nest carried its own vocal signature. Later analysis went finer still and identified signatures unique to individual mice. Mathevon is direct about why the machine matters here. With that many vocalizations, he says, a human researcher simply cannot manage the data, and machine learning becomes essential rather than optional.
What the mice encode, by his account, is largely static information about identity, a kind of acoustic name tag that does not change over time. That is a more modest finding than syntax, and it is worth holding onto as a reference point. Not every animal that produces a rich soundscape is necessarily saying complex things. Sometimes the signal is closer to a stable label than a sentence.
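One common way to make the idea of a nest "vocal signature" concrete is to ask whether a classifier can guess the nest from a squeak's acoustic features more often than chance on calls it has not seen. The sketch below illustrates that logic only. It swaps the team's neural network for a much simpler nearest-centroid rule, and everything in it is an assumption for illustration: the two features, the four nest labels, the feature means, and the synthetic squeaks themselves. None of it is the striped-mouse data.

```python
import random
from statistics import mean


def make_squeaks(rng, nest_means, per_nest, spread):
    """Generate synthetic (features, nest) pairs. NOT real recordings."""
    data = []
    for nest, (pitch, duration) in nest_means.items():
        for _ in range(per_nest):
            features = (rng.gauss(pitch, spread), rng.gauss(duration, spread))
            data.append((features, nest))
    return data


def fit_centroids(train):
    """Average the feature vectors of each nest to get one centroid per nest."""
    by_nest = {}
    for features, nest in train:
        by_nest.setdefault(nest, []).append(features)
    return {
        nest: tuple(mean(column) for column in zip(*rows))
        for nest, rows in by_nest.items()
    }


def predict(centroids, features):
    """Assign a squeak to the nest whose centroid is closest."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(centroids, key=lambda nest: squared_distance(centroids[nest]))


def accuracy(centroids, test):
    """Fraction of held-out squeaks assigned to their true nest."""
    hits = sum(predict(centroids, features) == nest for features, nest in test)
    return hits / len(test)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical standardized feature means for four nests (made-up numbers).
NEST_MEANS = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (1.0, 1.0)}

train = make_squeaks(rng, NEST_MEANS, per_nest=200, spread=0.4)
test = make_squeaks(rng, NEST_MEANS, per_nest=200, spread=0.4)

centroids = fit_centroids(train)
score = accuracy(centroids, test)
chance = 1 / len(NEST_MEANS)
print(f"held-out accuracy: {score:.2f} (chance: {chance:.2f})")
```

If held-out accuracy sits clearly above chance, the features carry nest-level information. That is the sense in which a signature is "found". It says nothing about what, if anything, the squeaks mean.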
Chimpanzees and the hint of syntax
The chimpanzee work pushes into more contested territory. Catherine Crockford, who runs the Ape Social Mind Lab in Lyon, and her collaborator Roman Wittig have spent years building an archive of roughly 20,000 hours of recordings from chimpanzees in Taï National Park in Ivory Coast. They know the animals individually. Wittig describes a population of about 150 chimpanzees the team recognizes by name, many followed from birth to death, now into a third generation of observation. That depth of context is what lets the acoustic data mean anything, because the researchers can pair a given call with what the animal was actually doing when it made it.
Their recent analysis focused on two-call combinations, the pairings linguists would call bigrams. Out of more than four thousand utterances, a clear pattern emerged: a single call carries one meaning, and the same call embedded in a pair can shift toward another. A "hoo" on its own tends to signal resting. A "pant" on its own tends to signal play. Put together, the combination is associated with a different activity again, building a nest.
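The logic of that comparison can be sketched in a few lines: tally the activities observed alongside each single call and each two-call combination, then see whether the combination's most common context differs from those of its parts. This is a minimal illustration under stated assumptions, not the team's analysis. The observation rows are invented, the order of calls inside the pair is arbitrary, and a real study would need far more data and a statistical test rather than a simple tally.

```python
from collections import Counter, defaultdict

# Hypothetical toy observations: (sequence of calls, observed activity).
# These rows are invented for illustration and are not the Taï dataset.
OBSERVATIONS = [
    (("hoo",), "resting"),
    (("hoo",), "resting"),
    (("hoo",), "feeding"),
    (("pant",), "play"),
    (("pant",), "play"),
    (("hoo", "pant"), "nesting"),
    (("hoo", "pant"), "nesting"),
    (("hoo", "pant"), "resting"),
]


def context_profiles(observations):
    """Count how often each call sequence co-occurs with each activity."""
    profiles = defaultdict(Counter)
    for calls, activity in observations:
        profiles[calls][activity] += 1
    return profiles


def dominant_context(profile):
    """Return the most frequent activity and its share of that sequence's uses."""
    activity, count = profile.most_common(1)[0]
    return activity, count / sum(profile.values())


profiles = context_profiles(OBSERVATIONS)
for calls, profile in profiles.items():
    activity, share = dominant_context(profile)
    print(" + ".join(calls), "->", activity, f"({share:.0%})")
```

In this toy table the pair's dominant context matches neither single call's dominant context, which is the shape of result the paragraph above describes.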
Earlier studies had found animals reshuffling calls mainly to raise alarms, such as warning about a snake. The Taï finding suggests the reshuffling reaches into ordinary daily life, which is a different and more general claim. Crockford frames the headline carefully. The chimps have only twelve call types, but the freedom to combine them lets them modify meaning or generate new meanings, which she calls a genuinely surprising capacity in a wild animal.
The word "syntax" has to be handled with care, and the researchers treat it that way.
Combining a small set of units into a larger set of meanings is one of the defining features of human language. Seeing even a partial version of it in a close relative does not collapse the distance between chimp calls and human speech, which remains vast. It does suggest the building blocks were available a long way back, perhaps in the common ancestor humans and chimpanzees shared millions of years ago.
Bonobos asking for peace, finches grading their own calls
The bonobo project, led by Mélissa Berthet with Martin Surbeck and Simon Townsend, worked from about 700 calls recorded at the Kokolopori site in the Democratic Republic of Congo. Berthet built a visual map of single calls and their combinations, then looked for cases where a pairing meant something neither call meant alone. One stood out. A "peep," which on its own proposes a course of action, combined with a "whistle," which on its own helps keep a traveling group together, produces a meaning tied to tense social moments. The combination shows up when one animal is threatening another. Berthet reads it as something close to a request to make peace, a way of defusing a situation rather than naming an object or an action.
The zebra finch work takes a different and clever angle on the hardest problem in this whole field, which is verification. Julie Elie, a neuroscientist at the University of California, Berkeley, catalogued eleven call types in the birds and linked them to meanings including hunger, danger, bonding, and social conflict. Her categories matched the descriptions the late ornithologist Richard Zann had recorded from wild finches. The interesting move was checking whether the birds agreed with the human sorting. Elie trained finches to peck a button that played a call, and rewarded a peck with a seed only when the call belonged to the category she was testing that day.
Over a few days, birds learned to identify the rewarded category, which means they were grouping the sounds by meaning rather than by raw acoustics. Published work from her group describes this as evidence that the birds form mental representations of their calls' meanings, a kind of semantic perception that resembles how humans handle word categories.
The Dolittle machine and the case for caution
The long-term goal behind the prize is a "Dolittle machine," a system that would let humans hold a real two-way exchange with another species. The argued payoff is conservation and welfare. If we could read distress, hunger, or social tension in a wild population, the thinking goes, we could manage habitats and interventions with far better information than we have now.
The scientists closest to the work are also the ones raising the loudest warnings, and that is the part of this story most likely to be lost in the excitement. Surbeck, who helped establish the bonobo site, is blunt about the risk of broadcasting signals back at animals we only partly understand. He describes wanting to avoid messing with their heads at any price, and notes that unsettling a wild group could have consequences no one can anticipate. Jonathan Birch, a philosopher at the London School of Economics who sits on the prize's judging panel, is plain about the distance still to cover, saying the vision of fluent two-way contact remains far off. Groups such as Project CETI, which studies sperm whales, are already working through the legal and ethical questions that would arrive if humans really did start understanding what animals say to each other.
There is a pattern worth naming here for anyone tracking how AI capabilities arrive. The decoding has run well ahead of the dialogue. We can now read structure in mouse squeaks and chimp pairings and finch categories long before we can responsibly say a word back. The analysis layer matures first, in the relative safety of recordings and trained networks, while the real-world act of two-way contact stays just out of reach and carries risks we have barely begun to map. The machine that understands is arriving before the machine that converses, and that order is not an accident. It is how this kind of capability tends to show up.
For now, the four teams have shown that a dozen calls, the right microphones, and a well-trained network can surface patterns that looked invisible a decade ago. Whether any of that becomes a conversation, and whether it should, are the questions the June 25 announcement will not answer.