How to take AI from demo to real-world deployment

The difference between AI that works and AI that doesn’t sometimes comes down to a single second of latency. Éric Pinet’s team learned this while testing a voice AI prototype for medical clinics across Québec.

The system is designed to free up nurses for clinical care by answering and triaging patient calls. When responses came in under a second, patients stayed on the line. When responses took longer, patients asked for a human, which defeated the purpose of the system.

For Pinet, president of Québec City-based Unicorne, that experience captured something he sees in nearly every AI project. A system can perform well in a demo and still fall short when exposed to real workflows and real users. Whether it survives the transition often comes down to cost, compliance and how the system fits within business or regulated environments.

In his experience, the projects that survive that transition are built with deployment and scale in mind from the start, rather than having those concerns tacked on later.

“With AI and generative AI, it’s quite easy to create a beautiful demonstration. The hard part is putting it into production with real customers, and making it work at an efficient cost,” he told BetaKit.

Broken prototype promises

Pinet calls it suspended animation, the defining pattern of enterprise AI in 2024 and 2025. Most AI projects don’t fail outright, he says. They stall between promise and deployment, usually at one of two points.

The first is cost. A system that feels cheap in a demo can become hard to sustain once it’s handling real volume. Every interaction with a generative model carries a token cost, and unlike the costs of traditional SaaS, those costs are not amortized at scale. What looks efficient in a controlled environment can become unaffordable in production unless the system is designed from the start to manage how much information flows through the model.
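A rough way to see the cost problem is to multiply it out. The sketch below is illustrative only: the function name, the call volumes, the tokens per call and the price per thousand tokens are all hypothetical figures, not numbers from Unicorne or any vendor's price list.

```python
def monthly_token_cost(calls_per_day, tokens_per_call, price_per_1k_tokens, days=30):
    """Estimate monthly model spend. Cost grows linearly with call volume
    and with the number of tokens sent through the model on each call."""
    total_tokens = calls_per_day * days * tokens_per_call
    return total_tokens / 1000 * price_per_1k_tokens


# All inputs below are made-up values for illustration.
PRICE = 0.01  # hypothetical dollars per 1,000 tokens

demo = monthly_token_cost(calls_per_day=5, tokens_per_call=4000, price_per_1k_tokens=PRICE)
production = monthly_token_cost(calls_per_day=200, tokens_per_call=4000, price_per_1k_tokens=PRICE)
trimmed = monthly_token_cost(calls_per_day=200, tokens_per_call=1500, price_per_1k_tokens=PRICE)

print(f"demo: ${demo:.2f}, production: ${production:.2f}, trimmed prompts: ${trimmed:.2f}")
```

Under these assumptions the bill scales in step with volume, and the only lever that lowers it without cutting calls is the amount of information each call sends through the model.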

The second is security. Prototypes often rely on external APIs, which are difficult to defend in a regulated environment. The moment data leaves your infrastructure, you’re trusting someone else to keep it safe and decide where the data goes.

“When you can manage everything inside your own infrastructure, it’s much easier to meet compliance standards,” said Pinet.

That’s how Unicorne approaches its work. Those infrastructure questions—where the data lives, who sees it, what gets logged—come before the model questions. It’s a backwards order from how most teams build, but in regulated settings, infrastructure isn’t a backdrop to the product. It is the product.

Phone tag

The Québec health-tech project recently put Unicorne’s approach to the test. Clinics across the province are dealing with high volumes of calls and not enough staff to handle them. Under the old system, receptionists take messages without clinical context, and nurses call patients back in the order the calls came in. The reason for the call often becomes clear only once the patient conversation begins.

The solution Unicorne helped build addressed that first step. A voice-based AI answers the call, asks structured questions, and applies each clinic’s triage protocols. By the time a nurse calls back, they already have a thorough summary of the patient’s situation, and urgent cases can be prioritized earlier.

“The AI is only for the first triage,” Pinet said. “It’s not for diagnosis.”

The pipeline runs entirely inside AWS, with Connect handling calls, Nova Sonic on voice, and Bedrock doing the reasoning over the clinic’s triage protocols. Patient audio never leaves that secure environment, in keeping with Québec’s privacy rules. Equally important, each interaction is logged, and every decision is traceable.

Pinet’s team also worked with nurses and receptionists to map the scenarios in which a human should take over from the AI. A patient who sounds distressed, a symptom combination that falls outside the protocol, or a request to speak with a nurse will each trigger a handoff.
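Those three triggers can be pictured as a simple rule check. This is a minimal sketch, not Unicorne's implementation: the `CallState` fields and the `handoff_reasons` function are hypothetical names, and how a real system would detect distress or an out-of-protocol symptom is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class CallState:
    # Hypothetical flags; a real system would derive these from the call.
    sounds_distressed: bool = False
    symptoms_in_protocol: bool = True
    asked_for_nurse: bool = False


def handoff_reasons(state: CallState) -> list[str]:
    """Return every reason this call should go to a human.
    An empty list means the AI can continue the triage."""
    reasons = []
    if state.sounds_distressed:
        reasons.append("patient sounds distressed")
    if not state.symptoms_in_protocol:
        reasons.append("symptoms fall outside the triage protocol")
    if state.asked_for_nurse:
        reasons.append("patient asked to speak with a nurse")
    return reasons


call = CallState(asked_for_nurse=True)
reasons = handoff_reasons(call)
if reasons:
    print("Hand off to a human:", "; ".join(reasons))
```

Returning the reasons rather than a bare yes or no keeps a record of why each call was escalated, which fits a setting where decisions need to be traceable.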

“The whole process has to happen very quickly, to give the patient a quick answer,” said Pinet. “We had to find solutions to be more efficient.”

In practice, every response travels through several steps before the patient hears it, from speech to text, through a reasoning model, and back to speech, and any of those transitions can introduce a lag. To keep the ‘conversation’ flowing, the team built in short acknowledgments the model can deliver while it works on its next response—phrases like “OK, I understand,” the way a real person might.
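One way to picture the acknowledgment technique is as a timer racing the model. The sketch below assumes a hypothetical `respond` helper, with stand-in `generate_reply` and `speak` coroutines in place of the real speech and reasoning services. The half-second default is an arbitrary choice for illustration, not a figure from the project.

```python
import asyncio


async def respond(generate_reply, speak, filler_after=0.5, filler="OK, I understand."):
    """Start generating the reply; if it is not ready within `filler_after`
    seconds, speak a short acknowledgment while the reply finishes."""
    task = asyncio.create_task(generate_reply())
    done, _pending = await asyncio.wait({task}, timeout=filler_after)
    if not done:
        await speak(filler)  # fill the silence while the model is still working
    reply = await task
    await speak(reply)
    return reply


async def demo():
    spoken = []

    async def slow_reply():
        await asyncio.sleep(0.2)  # stand-in for a slow model response
        return "A nurse will call you back this afternoon."

    async def speak(text):
        spoken.append(text)  # stand-in for text-to-speech playback

    await respond(slow_reply, speak, filler_after=0.05)
    return spoken


spoken = asyncio.run(demo())
print(spoken)
```

When the reply arrives before the timer expires, the filler is skipped and the patient hears only the answer.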

The system now handles more than 200 calls a day across client clinics, according to Pinet, with most triage completed before a human becomes involved. Pinet said nurses have responded positively, noting that the intake summaries they receive through the system before each patient callback are particularly helpful in improving patient care and efficiency.

Pinet’s advice to founders is to ask the unglamorous questions early—the ones that rarely feel urgent until they are.

“The difficulty is getting to a real production system. What’s the cost? What’s the security? How do you manage access patterns?” he said. “It’s that part that you have to understand.”

PRESENTED BY

If you’re ready to turn AI ambition into action, talk to Unicorne.