Translation experts at tech companies and certified translators in Canada are concerned about the fallout of machine-translated content being used in important documents without expert oversight.
Canadian businesses are increasingly incorporating artificial intelligence (AI) into their day-to-day operations, often to automate tasks currently performed by humans. Statistics Canada data shows that 6.1 percent of Canadian businesses used AI in producing goods or delivering services in the last 12 months.
Google Translate’s Isaac Caswell told BetaKit that AI models are great at spitting out “very plausible BS.”
AI-powered language translation is an attractive option for companies that want to access international markets more easily. But experts say it’s unwise, and even dangerous, to rely on machine translation without professional review.
Instead, translators and tech experts are encouraging a symbiotic relationship between translators and AI—one that safeguards the role of human translators while maintaining high information quality across languages.
This view is echoed among translators themselves, who have already lost work due to automation, according to Québec’s professional order for translators, terminologists, and interpreters. The Ordre des traducteurs, terminologues et interprètes agréés du Québec (OTTIAQ) recently warned against using AI translation without the supervision of certified translators, arguing that it can result in misinterpretation and misinformation with grave consequences.
Data dearth
Automated translation experts in the tech sector, including those at Google Canada, agree with this warning.
Google Translate recently unveiled two new additions to its free program used by over 500 million people worldwide: Canadian French and Inuktut, the family of Indigenous languages spoken in Canada’s Arctic. Developed in partnership with Inuit Tapiriit Kanatami, an Inuit advocacy organization, the program incorporates multiple dialects of the Inuktitut language.
Isaac Caswell, a senior software engineer at Google Translate, told BetaKit that AI translation tools should not be used “without checking in any professional, high-stakes situations.” AI models are great at spitting out “very plausible BS,” he added.
Adding support for Indigenous languages, particularly those with a relatively small number of speakers due to acts of cultural erasure perpetrated by the Canadian government, was more challenging because less data is available, Caswell explained.
Some companies are seeking to close the gap in multilingual support for large language models (LLMs). Cohere, one of Canada’s most valuable AI companies, recently rolled out an update to its Aya Expanse program, a multilingual LLM that supports 101 languages.
RELATED: Fujitsu launches Cohere-powered Japanese LLM for enterprise
Sara Hooker, vice president of research and head of Cohere For AI, the company’s non-profit research lab, explained that the lab’s research aims to improve the quality of LLMs in languages other than English, which suffers partly due to a dearth of high-quality training data.
“Available datasets treat many languages as invisible, and favour North American language and cultural perspectives,” Hooker wrote in an email to BetaKit. “This language gap can produce biases and undermine safety for users.”
Aya Expanse models were trained using data from over 3,000 professional translators, the company claims. However, that expertise cannot safeguard against some of the common inaccuracies that plague machine translation.
When asked whether Aya Expanse models could be used without input from a professional translator, a Cohere spokesperson said the models are intended for use by AI researchers and declined to comment further.
Google Translate notes that the following disclaimer should be used when a translated text is presented without vetting by a professional translator: “Reasonable efforts have been made to provide an accurate translation, however, no automated translation is perfect nor is it intended to replace human translators.”
Translators sound the alarm
Betty Cohen, president of OTTIAQ, said some translators lost clients when generative AI tools first began rolling out, but those clients came back quickly because the machine-generated results were not up to the same standard.
“As soon as you have a text that is a bit more sensitive or technical, or that uses a very specialized language, or that is only badly written in the original language, then the result is never very good,” Cohen said in an interview with BetaKit.
Cohen noted that translators have been using AI for years and are the best equipped to use these tools because they understand how different models approach the task.
One recent study by a Google DeepMind researcher found that combining translators’ skills with increasingly powerful AI tools can boost productivity. Scaling training compute, meaning increasing the amount of computation used to train a model, significantly boosted the productivity of translators working on tasks with the assistance of LLMs, and the effect was even larger for less skilled translators.
“Machines are not at a stage where professional translators can be skipped.”
Gary Kalaci
Alexa Translations
OTTIAQ argues that trusting the product without expertise from a professional translator can lead to misunderstandings, bad writing, and the erosion of a company’s message.
Cohen framed the irresponsible use of AI tools to translate text as a terrible marketing strategy for companies. At worst, she said, a company could be alienating an entire customer base with error-riddled marketing copy.
“It’s not just translating words,” she said. “It’s translating your culture, it’s translating your image, translating your message.”
A potential way to mitigate these issues is customization, which has been the business model of Alexa Translations for over 20 years. The legal translation company acquired an AI machine translation firm in 2017 and now offers the model to its clients, customized to fit each company’s internal legal jargon and to keep learning from it.
Despite that tailoring capacity, CEO Gary Kalaci told BetaKit that the model is “dangerous to use without professional intervention indiscriminately.”
“Machines are not at a stage where professional translators can be skipped,” he added.
But across the internet, they are being skipped—and it’s contributing to what American journalist Jason Koebler of 404 Media calls the Zombie Internet: the erosion of the information ecosystem as AI-generated slop clogs up websites and social media feeds.
Since 2023, the social media platform Reddit has been rolling out automated translation of its content into languages such as French, German, and Spanish without expert intervention. The company said in a quarterly earnings release this year that the move contributed to 44 percent year-over-year growth in its user base.
But as Cohen points out, large-scale automated translation with no supervision could compound small errors and lead to lower content quality and the spread of misinformation.
Large language liability
In some instances, faulty AI translation has led to public errors with varying degrees of severity, from embarrassing slogan mistakes to mistranslations of refugee and asylum claims in the US.
Argyri Panezi, the Canada Research Chair in digital information law and policy at the University of New Brunswick, explained that companies should be prepared to be held legally responsible for any AI-generated content they choose to use.
“If something goes wrong, and a case ends up being litigated before the court, they will look for a human to attribute this fault,” Panezi said in an interview with BetaKit.
Air Canada, for example, was found responsible for misinformation that a customer-service chatbot sent to a customer and was ordered to pay compensation.
Beyond costly lawsuits and reputational risk, data privacy is another concern when using free online programs for machine translation.
“For those kinds of freemium programs, typically, you are the product,” Kalaci said.
DeepL, a popular AI-powered translation service, guarantees that submitted texts are deleted and excluded from training its AI models only for paid Pro subscribers. Free users, then, could have their inputs stored and used for training.
The reputational, legal, and privacy-related risks of using imperfect AI models to translate material are serious, according to Cohen, which is all the more reason for tech companies to employ professionals who understand the strengths and limitations of both neural machine translation and LLMs.
“The professional translator today is your insurance against hallucinations,” Cohen said.