Wrong answers only: industry continues to grapple with “inherent characteristic” of GenAI hallucinations

Koho and Cohere discussed outputs and feedback, while AWS previewed a proposed fix at re:Invent.

At the two-year mark after the launch of OpenAI’s large language model (LLM)-powered chatbot ChatGPT, and with it, the explosion of generative artificial intelligence (GenAI) into the public consciousness, the technology’s growing pains and future evolution dominated conversations at Amazon Web Services’ (AWS) re:Invent conference in Las Vegas.

“I think generative AI has the potential to transform every single industry, every single company out there, every single workflow out there, every single user experience out there,” Matt Garman, CEO of AWS, said during his keynote (BetaKit received travel and accommodation support from Amazon to attend re:Invent). 

Later in the day, though, he said that as enterprises look to integrate the technology into their operations, they are looking for firm boundaries around what AI tools can do and what their outputs look like. 


The stubborn problem of generative AI models hallucinating—or generating incorrect, misleading, or nonsensical text—is “one of the things that actually stops people from moving generative AI into real production,” Garman said. 

“In reality, as good as the models are today, sometimes they get things wrong. So when you did a proof-of-concept last year or the year before, 90 percent was okay. But when you get down into a production application, that’s not okay.” 

At re:Invent, Garman previewed a new feature called Automated Reasoning checks, meant to safeguard against hallucinations from the foundation models AWS makes available to customers through its Amazon Bedrock platform. These include Amazon’s own models and those made by Anthropic (in which Amazon is a major investor), Cohere, Meta, Mistral, and Stability AI. The company claims the feature will assess the responses a model generates for accuracy against information provided by the customer and, in the event of a possible hallucination, present its own answer alongside the potential mistake.
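AWS has not published implementation details beyond the preview, but the general pattern it describes is easy to illustrate. The Python sketch below mimics the described behaviour: a model’s answer is checked against customer-supplied reference facts, and a flagged answer is returned alongside the suggested correction. Every name here is hypothetical, and the naive string comparison stands in for the formal verification AWS describes; this is not the Bedrock API.

```python
# Hypothetical sketch of the pattern described for Automated Reasoning checks:
# validate a model's answer against customer-provided facts and, on a possible
# hallucination, surface the potential mistake alongside a suggested answer.
# All names and the naive string check are illustrative, not the Bedrock API.

REFERENCE_FACTS = {
    "daily e-transfer limit": "$3,000",   # customer-supplied source of truth
    "support hours": "9am to 9pm ET",
}

def check_answer(answer: str) -> dict:
    """Flag an answer that mentions a known topic but not its correct value."""
    flagged = {
        topic: value
        for topic, value in REFERENCE_FACTS.items()
        if topic in answer.lower() and value not in answer
    }
    if flagged:
        return {
            "status": "possible_hallucination",
            "model_answer": answer,       # the potential mistake...
            "suggested_facts": flagged,   # ...presented alongside the correction
        }
    return {"status": "validated", "model_answer": answer}

print(check_answer("Your daily e-transfer limit is $5,000."))
# -> {'status': 'possible_hallucination', ...}
```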

The announcement followed the release of similar features by Microsoft and Google earlier this year.

Matt Garman, CEO of AWS, speaking at re:Invent.

Hallucination is “really an inherent characteristic of LLMs itself,” said Pradeep Prabhakaran, a senior manager of solution architecture at Cohere, during a panel discussion at the conference. Prabhakaran was responding to a question about the factors that need to be addressed to advance enterprise adoption of generative AI, not discussing AWS’s Automated Reasoning checks feature. 

“As you’re taking [applications] from prototype to production, you need to build things so you can have a constant feedback loop … so if something doesn’t appear right, you still have a way to validate,” Prabhakaran said.
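That feedback loop has a simple shape in code. The Python sketch below is one hypothetical version of it, with stand-in generation and validation functions that are not Cohere or AWS APIs: responses that fail validation are logged and routed for review instead of being served.

```python
# Hypothetical sketch of a production feedback loop: every response is
# validated before it is served, and failures are logged for review so the
# system still has "a way to validate" when something doesn't appear right.
# generate() and looks_valid() are illustrative stubs, not vendor APIs.

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("genai.feedback")

def generate(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"stub answer to: {prompt}"

def looks_valid(response: str) -> bool:
    """Stand-in for real checks (grounding against sources, schema, etc.)."""
    return bool(response.strip())

def answer(prompt: str) -> str | None:
    response = generate(prompt)
    if looks_valid(response):
        return response
    # Failed responses feed the loop: log, queue for human review, improve.
    log.warning("validation failed, queued for review: %r", response)
    return None  # caller falls back to a safe default or a human agent
```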

For Canadian challenger bank Koho, the accuracy of outputs is an important factor as the company explores consumer-facing generative AI applications. 

Speaking to BetaKit on the sidelines of the event, David Kormushoff, the company’s vice-president of technology and AI, said Koho is interested in education-related use cases to teach clients to build their wealth and offer them insights into their spending habits. But “we don’t want to be … giving them bad information,” he added. “That’s the opposite of everything we believe in.” 

However, he said, “I think we’ll probably get to a point where we feel confident enough to put [these tools] in front of a customer.” 


Thomas Storwick is the co-founder and COO of Coastal Carbon, a Waterloo, Ont.-based startup that uses generative AI models to analyze geospatial data for applications including insurance, agriculture, and climate change mitigation and adaptation. In an interview with BetaKit, he said the company is still learning how geospatial foundation models behave, and what hallucinations look like in that context, as the models evolve. “It’s common sense and keeping humans in the loop, and making sure we work with clients who understand the type of data we’re giving them,” he said.

While many LLM companies have chased scale over the last few years, expecting that ever-larger models trained on ever-larger datasets would lead to capability improvements, industry players are now debating whether bigger is actually better. 

Hyperscalers, including Microsoft, Amazon, and Google, have rushed to invest in building more data centres and more powerful graphics processing units (GPUs) to support even larger foundation models. 

But for enterprise applications, using a large foundation model may not be the right fit, said Cohere’s Prabhakaran during the panel. “[We need to think about] how to build models on smaller infrastructure that still meet the constraints of latency, accuracy, and cost,” he said.

In a Dec. 5 letter to staff and shareholders, Cohere CEO Aidan Gomez said the company sees the future of enterprise AI as smaller plug-and-play tools. Gomez announced the company would launch a suite of workplace AI assistants that can be plugged into companies’ existing systems—such as their email platform and customer relationship management tool—and would customize smaller models for clients, which could be deployed privately. 

Stronger data privacy and security measures can help with AI adoption, particularly among regulated businesses, and improve the models’ outputs by training them on business-relevant data, Gomez wrote.

Patricia Nielsen, AWS’s head of startups for Canada, said in an interview with BetaKit that the company was seeing a “strong focus on the responsible use of data” among businesses.

“I think that companies are starting to realize more and more that they have to be cautious,” Nielsen said, adding that there is a need for transparency about the sources of training data and how models are trained on it. 

While generative AI adoption has increased, some industry leaders, including Canadian AI ‘godfathers’ Geoffrey Hinton and Yoshua Bengio, have expressed strong concerns about the possible future dangers of the technology, warning that it is a threat to humanity.

In a fireside chat, Andrew Ng, the founder of the Google Brain project and DeepLearning.AI, who is now an Amazon board member, dismissed those concerns as a “distraction.” 

While he said he worried “a bit about AI polluting the information ecosystem,” he said most threats are application-specific, and that he believes AI teams take the issue of bias in such systems “very seriously, and really work hard on it.”  

Images courtesy AWS.
