A Multi-Agent AI System Diagnosed Swine Diseases With 94.5% Accuracy and Under-15-Second Response Times
- David Borish

- 2 days ago

Swine farming operates under a persistent pressure: disease moves fast through a herd, veterinarians are unevenly distributed, and the delay between spotting a sick pig and confirming a diagnosis can be costly for producers. Researchers at AXONS, working with Charoen Pokphand Foods, set out to address this directly by building a multi-agent AI diagnostic system trained on veterinary knowledge and capable of identifying swine diseases from clinical symptoms alone.
Their results, published in March 2025, are detailed enough to be practically useful. The system achieved 94.5% accuracy on disease diagnosis tasks, classified user queries correctly 95.23% of the time, and returned responses within 15 seconds on average. These aren't benchmark numbers in isolation; the researchers compared multiple leading language models against each other and identified where the gains come from.
How the System Works
The system is built around a three-stage pipeline. The first stage classifies incoming queries into one of four categories: Knowledge Retrieval Queries that seek general information like vaccination guidelines; Symptom-Based Diagnostic Queries that describe clinical signs; To-Be-Clarified Queries that need follow-up prompting; and General Queries that cover broad or casual questions. The classifier uses a probabilistic model that assigns likelihood scores to each category and routes the query accordingly.
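The routing step can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' classifier: the article says only that a probabilistic model assigns likelihood scores to each category, so the softmax normalization, the category identifiers, and the example scores below are all assumptions made for the sketch.

```python
import math

# The four categories named in the article; the identifiers are our own labels.
CATEGORIES = [
    "knowledge_retrieval",
    "symptom_based_diagnostic",
    "to_be_clarified",
    "general",
]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def route_query(raw_scores):
    """Pick the most likely category from hypothetical raw classifier scores."""
    probs = softmax([raw_scores[c] for c in CATEGORIES])
    by_category = dict(zip(CATEGORIES, probs))
    best = max(by_category, key=by_category.get)
    return best, by_category


# Made-up scores for a query such as "my pigs have stopped eating and are coughing".
example_scores = {
    "knowledge_retrieval": 0.2,
    "symptom_based_diagnostic": 2.5,
    "to_be_clarified": 0.4,
    "general": -1.0,
}
category, probabilities = route_query(example_scores)
print(category)
```

In a real deployment the raw scores would come from the classifier itself; the point here is only that the highest-probability category decides which processing stream handles the query.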
When a symptom-based query is identified, the second stage takes over. Rather than asking a single question and attempting a diagnosis from incomplete information, the system collects symptoms in layers. It begins with general health indicators, moves to external physical signs, and then narrows to specific symptom clusters tied to particular disease groups. This staged collection is designed to reduce the ambiguity that makes symptom-based diagnosis difficult, since many swine diseases share surface-level signs before diverging in their underlying pathology.
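A rough sketch of layered collection is shown below. The sign lists and the follow-up mapping are illustrative placeholders, not veterinary guidance and not the paper's question set; the only idea taken from the article is that later questions depend on what earlier layers turned up.

```python
# Layer 1 and layer 2 questions (placeholder examples).
GENERAL_SIGNS = ["reduced appetite", "lethargy"]
EXTERNAL_SIGNS = ["skin lesions", "labored breathing"]

# Layer 3: hypothetical follow-ups, asked only when the related sign was observed.
FOLLOW_UPS = {
    "skin lesions": ["lesions on snout or feet"],
    "labored breathing": ["persistent coughing"],
}


def collect_symptoms(has_sign):
    """Ask layer by layer; has_sign(sign) stands in for the user's yes/no answer."""
    asked = []

    def ask(sign):
        asked.append(sign)
        return has_sign(sign)

    observed = [s for s in GENERAL_SIGNS if ask(s)]
    external = [s for s in EXTERNAL_SIGNS if ask(s)]
    observed.extend(external)
    # Narrow down: only follow up on the external signs that were confirmed.
    for finding in external:
        observed.extend(s for s in FOLLOW_UPS[finding] if ask(s))
    return observed, asked


reported = {"reduced appetite", "skin lesions", "lesions on snout or feet"}
observed, asked = collect_symptoms(lambda sign: sign in reported)
print(observed)
```

Because labored breathing was not reported in this example, its follow-up question is never asked, which is the narrowing behavior the staged design is after.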
Once the symptom picture is assembled, multiple specialized agents analyze the data simultaneously. Each agent generates a confidence score for candidate diseases, and those scores are combined through a weighted fusion mechanism. A confidence threshold determines which diseases rise to the top of the prediction, and the system ranks them from very high to low confidence. The final output from this stage is a prioritized list of likely diagnoses based on accumulated clinical evidence.
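The fusion step lends itself to a short worked example. The sketch below assumes a simple weighted average of per-agent confidence scores; the agent names, weights, scores, threshold, and the cutoffs for the confidence labels are all invented for illustration, since the article does not give the actual values or the exact fusion formula.

```python
def fuse_scores(agent_scores, agent_weights):
    """Weighted average of each agent's confidence per disease."""
    total_weight = sum(agent_weights.values())
    fused = {}
    for agent, scores in agent_scores.items():
        weight = agent_weights[agent] / total_weight
        for disease, score in scores.items():
            fused[disease] = fused.get(disease, 0.0) + weight * score
    return fused


def rank_diagnoses(fused, threshold):
    """Keep diseases at or above the threshold, highest confidence first."""
    kept = [(disease, score) for disease, score in fused.items() if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def confidence_label(score):
    """Map a fused score to a label; the cutoffs are assumptions."""
    if score >= 0.9:
        return "very high"
    if score >= 0.75:
        return "high"
    if score >= 0.6:
        return "medium"
    return "low"


# Hypothetical agents, weights, and scores.
agent_weights = {"agent_a": 0.5, "agent_b": 0.3, "agent_c": 0.2}
agent_scores = {
    "agent_a": {"ASF": 0.9, "PED": 0.2, "PRRS": 0.4},
    "agent_b": {"ASF": 0.8, "PED": 0.3, "PRRS": 0.6},
    "agent_c": {"ASF": 0.7, "PED": 0.1, "PRRS": 0.5},
}

fused = fuse_scores(agent_scores, agent_weights)
ranked = rank_diagnoses(fused, threshold=0.4)
for disease, score in ranked:
    print(disease, round(score, 2), confidence_label(score))
```

With these made-up inputs, ASF fuses to about 0.83 and PRRS to about 0.48, while PED at about 0.21 falls below the threshold and is dropped from the list.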
The third stage generates treatment recommendations using Retrieval-Augmented Generation, or RAG. Rather than relying solely on what the language model learned during training, the system retrieves relevant documents from a domain-specific knowledge base and uses them to ground the recommendations in current veterinary guidance. If the initial output doesn't meet quality thresholds, the system retries, waiting longer between successive attempts, before producing a final response.
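To make the retrieve-then-retry pattern concrete, here is a toy version in Python. Everything in it is an assumption for the sake of illustration: the word-overlap retriever stands in for whatever retrieval method the researchers use, the knowledge-base passages are placeholders, the quality check is a simple length test, and the doubling delay is one plausible reading of a delay that grows between retries.

```python
import time

# Placeholder passages; a real knowledge base would hold vetted veterinary documents.
KNOWLEDGE_BASE = [
    "Isolate affected pens and restrict movement of animals and staff.",
    "Disinfect equipment and vehicles entering the farm.",
    "Review vaccination schedules with the herd veterinarian.",
]


def retrieve(query, documents, top_k=2):
    """Rank documents by words shared with the query (toy retriever)."""
    query_terms = set(query.lower().split())
    scored = [(len(query_terms & set(doc.lower().split())), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for overlap, doc in scored[:top_k] if overlap > 0]


def generate_with_retry(generate, passes_quality, max_attempts=3,
                        base_delay=1.0, factor=2.0, sleep=time.sleep):
    """Call generate() until the output passes, waiting longer after each failure."""
    delay = base_delay
    answer = None
    for attempt in range(1, max_attempts + 1):
        answer = generate()
        if passes_quality(answer):
            return answer, attempt
        if attempt < max_attempts:
            sleep(delay)
            delay *= factor  # lengthen the wait before the next attempt
    return answer, max_attempts


# Demo with a stand-in generator and a recorded (not real) sleep.
context = retrieve("how should I isolate affected animals", KNOWLEDGE_BASE)
drafts = iter(["", "too short", "Based on the guidance: " + " ".join(context)])
waits = []
answer, attempts = generate_with_retry(
    generate=lambda: next(drafts),
    passes_quality=lambda text: len(text) > 20,
    sleep=waits.append,
)
print(attempts, waits)
```

In the demo the first two drafts fail the check, so the loop waits 1.0 and then 2.0 (recorded rather than actually slept) before the third draft, which quotes the retrieved passage, is accepted.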
What the Numbers Show
The research team evaluated the system across three distinct tasks. On query classification, it reached 95.23% accuracy, meaning the system correctly identified what kind of question was being asked nearly all of the time. That matters because routing a symptom-based query to the wrong processing stream produces a fundamentally different and less useful output.
On disease diagnosis, the results varied somewhat by disease type. African Swine Fever (ASF) and Porcine Epidemic Diarrhea (PED) returned reliable diagnoses consistently. Porcine Reproductive and Respiratory Syndrome (PRRS) and Foot-and-Mouth Disease (FMD) showed more inconsistency, which the researchers attribute to symptom overlap and query ambiguity. These diseases share clinical presentations with other conditions, and the system's staged questioning approach helps but doesn't fully resolve the challenge.
The model comparison results are instructive. GPT-4o achieved 90.63% test accuracy with an average response time of 18.78 seconds. Gemini-1.5-Pro-002 reached 94.23% validation accuracy but dropped to 87.50% on the test set, a gap that suggests it may be overfitting to particular query patterns rather than generalizing across the full range of cases. The o1-mini model lagged on both accuracy and speed, with an average latency of 29.38 seconds, which the researchers flag as a practical limitation for real-world deployment. These differences in model behavior illustrate why the multi-agent design matters: pooling predictions across models with different strengths reduces the risk of any single model's failure modes driving the final output.
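The size of those gaps is easy to check from the figures quoted above. The short calculation below uses only numbers reported in the article; the variable names are ours.

```python
# Gemini-1.5-Pro-002 accuracy figures quoted in the article, in percent.
validation_accuracy = 94.23
test_accuracy = 87.50

# Drop from validation to test, in percentage points and relative to validation.
gap_points = round(validation_accuracy - test_accuracy, 2)
relative_drop = round(100 * gap_points / validation_accuracy, 1)

# Average latencies quoted in the article, in seconds.
gpt4o_latency = 18.78
o1_mini_latency = 29.38
latency_gap = round(o1_mini_latency - gpt4o_latency, 2)

print(gap_points, relative_drop, latency_gap)
```

That works out to a drop of 6.73 percentage points for Gemini between validation and test (roughly 7% in relative terms), and an average response from o1-mini that takes 10.6 seconds longer than GPT-4o's.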
The Veterinary Access Problem
The system is designed with a specific operational context in mind. Swine disease surveillance has long depended on trained veterinarians, and in many parts of the world, those veterinarians are concentrated in urban areas or are unavailable during the hours when disease first becomes visible. A pig farmer who notices reduced appetite, labored breathing, or skin lesions in a small number of animals in the early morning often can't get a veterinarian on site until hours later or the following day. By then, the disease may have spread through the pen.
An AI system capable of asking structured diagnostic questions and returning a high-confidence provisional diagnosis within 15 seconds changes that timeline. A farmer or farm manager can begin documentation, adjust biosecurity protocols, and prepare for veterinary follow-up before the vet arrives. In regions where veterinary access is severely limited, the system could serve as the primary triage mechanism, helping producers distinguish between conditions that require immediate isolation and those that can be monitored.
The researchers acknowledge that the system is intended to support veterinary decision-making, not replace it. The confidence ratings attached to each diagnosis give the user a clear signal about certainty level, and the treatment recommendations are grounded in retrievable veterinary literature rather than generated freely by the model. That grounding matters for a domain where incorrect guidance can accelerate harm.
The Broader Pattern
This research fits into a wider movement of applying multi-agent AI frameworks to specialized domains. The AXONS team drew explicitly on methodologies developed in human healthcare, where multi-agent diagnostic systems have shown similar advantages over single-model approaches. The translation to veterinary medicine is direct in some respects and complicated in others. Human diagnostic AI benefits from extensive labeled datasets built over decades of clinical records. Veterinary data, particularly for swine diseases in small or mid-scale farming operations, is thinner and less standardized.
The researchers point to several areas where the system still needs work. Symptom overlap remains a limiting factor, particularly for PRRS and FMD. The current disease coverage is finite, and expanding it will require additional structured veterinary knowledge. The system's performance on To-Be-Clarified Queries, where user inputs are vague or incomplete, is an area the team identifies for future improvement. Getting a user who is uncertain what they're observing to provide useful diagnostic information is a communication problem as much as a technical one.
Despite those limits, the core result stands. A system that asks structured questions in stages, routes the collected symptoms to specialized agents, fuses their outputs, and grounds its recommendations in a vetted knowledge base achieved diagnostic accuracy that outperformed several frontier language models operating on their own. The multi-agent design wasn't just an architectural choice; it produced measurably better results than any of the individual models tested.
For an industry where disease outbreaks can move from a few animals to a full herd in days, the ability to get a credible preliminary diagnosis in under 15 seconds, at any hour, without waiting for a veterinary visit, is practically significant. Whether this kind of AI system reduces losses in the field depends on deployment and adoption decisions that go beyond the research itself. But the technical foundation for it now exists and has been tested.