Researchers Create The First Quantum AI Experiments Run on Production Hardware

David Borish
1 day ago

When a Quantum Layer Changes the Answer

Ask Meta's Llama 3.1 8B which planets in our solar system have rings, and the base model gets it wrong. It selects only Saturn. Ask the same question after a team of researchers from Multiverse Computing has inserted a small quantum circuit block into the model's architecture and run inference on IBM's 156-qubit quantum processor, and the model identifies all four Jovian planets correctly. The model changed its answer. The only thing that changed was the addition of 6,000 parameters, representing a 0.000075 percent increase over the model's existing 8 billion.

That result, published May 7 on arXiv by researchers at Multiverse Computing, represents what the authors describe as the first end-to-end demonstration of quantum enhancement in a production-scale, widely deployed large language model running on real superconducting quantum hardware. The framing matters. Prior work on quantum-AI integration had operated at smaller scales, in simulation, or on toy tasks. This ran on a real quantum processor with a model people actually use.

How the Architecture Works

The mechanism involves a class of quantum circuit components called Cayley-parameterized unitary adapters, or CUAs. The name refers to the Cayley transform, which converts a matrix of freely trainable parameters into a unitary matrix, the kind of reversible operation a quantum circuit can implement. Researchers insert these adapters into a specific layer of the language model, train them on a classical computer while keeping the original model parameters frozen, and then execute the combined system on IBM's Quantum System Two, a 156-qubit superconducting processor.
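To make the idea concrete, here is a minimal sketch of the textbook Cayley transform in NumPy. It is not the paper's implementation: the function name `cayley_unitary`, the 4x4 size, and the random initialization are assumptions chosen for illustration. The sketch shows only the general property that any unconstrained parameter matrix can be mapped to a valid unitary.

```python
import numpy as np

def cayley_unitary(params: np.ndarray) -> np.ndarray:
    """Map an unconstrained square matrix to a unitary via the Cayley transform."""
    # Build a skew-Hermitian generator A (A^H = -A) from the free parameters.
    a = 0.5 * (params - params.conj().T)
    identity = np.eye(a.shape[0], dtype=complex)
    # U = (I + A)^{-1} (I - A). For skew-Hermitian A, I + A is always invertible.
    return np.linalg.solve(identity + a, identity - a)

# Illustrative 4x4 example (two qubits' worth of state space); the size is an assumption.
rng = np.random.default_rng(0)
raw = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
u = cayley_unitary(raw)

# How far U^H U is from the identity; should be at floating-point noise level.
unitarity_error = np.linalg.norm(u.conj().T @ u - np.eye(4))
print(f"unitarity error: {unitarity_error:.2e}")

# All-zero parameters give the identity, so an untrained adapter of this form is a no-op.
identity_check = cayley_unitary(np.zeros((4, 4), dtype=complex))
```

One convenient feature of this form is that gradient updates to the raw parameters can never push the result outside the set of unitary matrices, so no projection step is needed during training.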

The original model parameters are never modified. The quantum adapters act as a lightweight addition to the existing system, and the full hybrid model, containing both the classical weights and the trained Cayley parameters, runs on the quantum hardware during inference. The researchers measured prediction quality using perplexity, a standard metric in language modeling that captures how well a model predicts sequences of text. Lower perplexity means the model assigns higher probability, on average, to the text that actually appears. The hybrid system reduced Llama 3.1 8B's perplexity by 1.4 percent on the WikiText benchmark.
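For readers unfamiliar with the metric, perplexity is the exponential of the average negative log-probability a model assigns to each correct token. The sketch below uses made-up token probabilities (the values in `baseline_probs` are hypothetical, not figures from the paper) and then applies the reported 1.4 percent relative reduction to show what a change of that size looks like.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood per token), natural log."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Hypothetical probabilities a model assigned to the correct next token.
baseline_probs = [0.25, 0.10, 0.50, 0.05]
baseline_ppl = perplexity([math.log(p) for p in baseline_probs])

# A 1.4 percent relative reduction, the size of the change reported for the hybrid model.
reduced_ppl = baseline_ppl * (1 - 0.014)

print(f"baseline perplexity: {baseline_ppl:.3f}")
print(f"after 1.4% reduction: {reduced_ppl:.3f}")
```

A useful sanity check is that a model guessing uniformly among four options on every token has a perplexity of exactly four.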

In addition to the astronomy question about planetary rings, the enhanced model also corrected a biology error. The base Llama model, asked about the population-genetic consequences of gene flow, selected "Hardy-Weinberg disruption." The quantum-enhanced version correctly identified increased genetic homogeneity. Two different domains, two different types of factual reasoning, both corrected by the same architectural modification.

A secondary experiment using the smaller SmolLM2 model, at 135 million parameters, showed that perplexity improved consistently as researchers increased the size of the unitary block, up to the point where quantum noise overwhelmed the gains. That transition point, which the paper describes as a noise-expressivity phase transition, marks the threshold on current hardware beyond which a larger circuit stops delivering utility. It gives researchers a concrete target for what future hardware needs to achieve.

Why the Parameter Count Matters

The scaling dynamics of large language models are approaching a practical ceiling in certain respects. Every trainable parameter consumes classical memory, and the infrastructure required to run, fine-tune, or deploy a model scales accordingly. GPT-5.5, by public estimates, contains somewhere between two trillion and five trillion parameters. Each incremental improvement in performance at that scale requires a proportional increase in compute, memory, and energy.

The Multiverse Computing result suggests a different path. The 6,000 quantum parameters added to Llama 3.1 8B produced a measurable accuracy improvement that would require a much larger classical parameter addition to replicate, if it could be replicated at all. Quantum circuits operate over what researchers call a Hilbert space, a mathematical space that grows exponentially with each added qubit. That exponential expansion allows quantum systems to represent and process correlations that would require vastly more classical resources to approximate. The CUA approach captures some of that capacity without needing to redesign the underlying model.
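The numbers behind this argument are simple to check. The sketch below computes the parameter fraction cited earlier and the size of the state space for several qubit counts. The helper names and the 16-byte amplitude size (a complex128 value in a dense classical simulation) are assumptions for illustration. The count describes how large a classical description of the state would be, not how much useful computation a noisy device delivers.

```python
def hilbert_dimension(num_qubits: int) -> int:
    """Number of complex amplitudes in an n-qubit state vector: 2**n."""
    return 2 ** num_qubits

def statevector_bytes(num_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Memory for a dense state vector, assuming 16-byte complex128 amplitudes."""
    return hilbert_dimension(num_qubits) * bytes_per_amplitude

# Fraction of the model represented by the added parameters (figures from the article).
added_fraction = 6_000 / 8_000_000_000
print(f"added parameters: {added_fraction * 100:.6f} percent of the base model")

# Each extra qubit doubles the number of amplitudes.
for n in (10, 30, 50, 156):
    print(f"{n:>3} qubits: {hilbert_dimension(n):.3e} amplitudes, "
          f"{statevector_bytes(n):.3e} bytes as a dense vector")
```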

Borja Aizpurua, the paper's first author and a senior research scientist at Multiverse Computing, described the result as analogous in significance to early experimental demonstrations of Shor's algorithm. The point was not the magnitude of the improvement but the existence of the effect at all on hardware of this scale.

IonQ's Parallel Approach

Multiverse Computing is not the only group reporting measurable gains from quantum-classical integration. In May 2025, IonQ published research demonstrating a hybrid architecture for LLM fine-tuning that takes a different structural approach. Rather than inserting quantum adapters into an existing model layer, IonQ replaced the classical classification head of a pre-trained language model with a parameterized quantum circuit, a tunable circuit with adjustable parameters that function like weights in a neural network.
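As a rough illustration of what a parameterized quantum circuit used as a classification head can look like, here is a two-qubit toy simulated in NumPy. This is not IonQ's circuit. The gate choices (RY rotations and one CNOT), the angle encoding of inputs, the readout on the second qubit, and the name `pqc_head` are all assumptions made for the sketch.

```python
import numpy as np

def ry(angle: float) -> np.ndarray:
    """Single-qubit rotation about the Y axis."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]])

# CNOT with the first qubit as control and the second as target.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

# Pauli-Z observable acting on the second qubit only.
Z_ON_SECOND = np.kron(np.eye(2), np.diag([1.0, -1.0]))

def pqc_head(features, thetas) -> float:
    """Return P(class = 1) from a toy two-qubit parameterized circuit."""
    state = np.zeros(4)
    state[0] = 1.0                                              # start in |00>
    state = np.kron(ry(features[0]), ry(features[1])) @ state  # encode the inputs
    state = np.kron(ry(thetas[0]), ry(thetas[1])) @ state      # trainable rotations
    state = CNOT @ state                                        # entangle the qubits
    expectation_z = float(state @ Z_ON_SECOND @ state)         # value in [-1, 1]
    return (1.0 - expectation_z) / 2.0                          # map to [0, 1]

# Hypothetical two-feature input and rotation angles.
prob = pqc_head(features=[0.3, 1.2], thetas=[0.5, -0.4])
print(f"P(class = 1) = {prob:.3f}")
```

The `thetas` play the role that weights play in a classical output layer: training adjusts them so that the measured probability matches the labels.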

The resulting hybrid system was tested on sentiment analysis tasks, including the Stanford Sentiment Treebank benchmark. The quantum fine-tuned model outperformed classical methods using a comparable parameter count, and the researchers observed that accuracy increased as qubit count increased. The effect held even in low-data settings where classical models had difficulty generalizing, which matters for enterprise applications where large labeled datasets are often unavailable.

IonQ also ran a second set of experiments using quantum-enhanced generative adversarial networks for image synthesis in materials science, in collaboration with an automotive manufacturer. That system produced synthetic images of rare microstructure anomalies at higher quality scores than classical GAN baselines in up to 70 percent of test cases. The two projects together suggest that quantum enhancement is not limited to language modeling. The same class of hybrid architecture is producing consistent results across multiple AI task types.

The Noise Problem

Both research programs are operating in what quantum computing researchers call the NISQ era, short for Noisy Intermediate-Scale Quantum. Current quantum processors are not fault-tolerant. Errors accumulate through a variety of mechanisms: interactions between adjacent qubits, electromagnetic interference from the surrounding environment, cosmic ray events, and thermal fluctuations. The larger a quantum circuit, the more opportunities for noise to corrupt the computation.
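A back-of-the-envelope model shows why circuit size matters so much. The sketch below assumes every gate fails independently with the same probability, which is a deliberate simplification of real device noise, and the 1 percent error rate is an illustrative value rather than a measured figure for any processor.

```python
import math

def circuit_success_probability(num_gates: int, gate_error: float) -> float:
    """Probability that no gate fails, assuming independent, identical gate errors."""
    if not 0.0 <= gate_error <= 1.0:
        raise ValueError("gate_error must be between 0 and 1")
    return (1.0 - gate_error) ** num_gates

def max_gates_for_target(target: float, gate_error: float) -> int:
    """Largest gate count whose no-error probability stays at or above target."""
    if not (0.0 < target <= 1.0 and 0.0 < gate_error < 1.0):
        raise ValueError("target must be in (0, 1] and gate_error in (0, 1)")
    return math.floor(math.log(target) / math.log(1.0 - gate_error))

# Illustrative error rate; not a measured value for any specific device.
GATE_ERROR = 0.01
for gates in (10, 100, 1000):
    p = circuit_success_probability(gates, GATE_ERROR)
    print(f"{gates:>4} gates: {p:.4f} chance of an error-free run")

budget = max_gates_for_target(0.5, GATE_ERROR)
print(f"gate budget for a 50% error-free run: {budget}")
```

Under this model the error-free probability decays exponentially with gate count, which is one way to see why compact adapters are attractive on current hardware.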

This is why both Multiverse Computing and IonQ are working with relatively small quantum layers inserted into larger classical architectures rather than attempting fully quantum AI systems. The CUA adapters Multiverse Computing used are intentionally compact. Aizpurua noted that circuit size directly determines noise exposure, and the SmolLM2 experiments confirmed this by showing where additional qubit scale stops helping and starts hurting. That inflection point is the technical target for hardware improvement.

IBM has publicly stated that it expects to demonstrate quantum advantage, the point at which quantum systems perform tasks no classical computer can replicate, in 2026. IBM is also developing Starling, described as the first fault-tolerant quantum computer, with a target of 2029. Fault-tolerant hardware would allow the quantum portions of hybrid AI systems to scale without the noise ceiling that currently limits how much of a model's computation can run on quantum hardware.

What Comes Next

Aizpurua described the current work as a proof of concept and outlined two directions for follow-on research. First, the team wants to develop methods to encode the entire quantum circuit directly, rather than just the Cayley adapters. That would allow larger portions of the computation to benefit from quantum processing. Second, future work will attempt to expand the qubit count and circuit complexity, including large-scale tests planned for the Frontier supercomputer at Oak Ridge National Laboratory.

IonQ, meanwhile, is preparing to move its LLM fine-tuning research from simulation to full deployment on its Forte and Forte Enterprise quantum hardware systems, which support 36 algorithmic qubits. The company is also running joint quantum-AI research with Japan's National Institute of Advanced Industrial Science and Technology.

The results are consistent across multiple hardware types (IonQ's trapped-ion systems and IBM's superconducting processors use fundamentally different physical approaches) and across multiple task domains, from language modeling to sentiment classification to image generation. That consistency makes a stronger case than any single result could on its own. Quantum components appear to capture correlations that classical networks miss, and they do so while adding a fraction of the parameters that a purely classical approach would require.

The broader implication touches on one of the core patterns in modern AI development. Simulation has repeatedly preceded physical demonstration in this field. Architectures are validated in constrained environments before being deployed at scale. The quantum-AI research programs at Multiverse Computing and IonQ are now following a similar trajectory, moving from simulation to hardware, from toy benchmarks to production models, and from theoretical claims to results that change specific answers on specific questions. The hardware constraints are real and the gains are still small, but the direction of travel is clear.
