Lost in Translation: Why AI Still Doesn't Speak the World's Languages

POLICY-WIRE Exclusive: Ask an AI assistant a question in English and it can feel like magic. It drafts your email, summarizes a contract, walks you through a medical form. Ask the same question in Urdu, Yoruba, Tagalog, or Quechua, and the magic thins out fast. The answer arrives slower, shallower, sometimes confidently wrong, and every so often the system simply gives up on your language and replies in English, as if to say: we’ll do this on my terms, not yours.

I’ve spent my career building the autonomous AI systems, or “agents,” that companies now deploy to millions of people. The most uncomfortable thing I’ve learned is also the simplest: artificial intelligence does not speak the world’s languages. It speaks English, manages a few dozen others with a heavy accent, and falls largely silent across the thousands that remain. For a technology being sold as a universal tool, that is not a small footnote. It is a fault line running straight through who gets to benefit from the AI era and who gets left talking to a wall.

The performance cliff nobody markets

The numbers behind modern AI are lopsided in a way the marketing never mentions. The text these models learn from is overwhelmingly English and a handful of other well-resourced languages. Everything else is what engineers politely call “low-resource,” which in practice means the model has seen comparatively little of it and performs accordingly.

The result is a performance cliff. A model that scores brilliantly in English can become unreliable two languages over and barely usable a few more beyond that. It mistranslates idioms, mangles names, loses the thread in scripts it rarely encountered, and reasons less accurately because, under the hood, it is often translating your language into English, thinking in English, and translating back. Each of those hops is a place for meaning to leak out.

For a casual chat, that’s an annoyance. For someone using AI to understand a government benefit, a prescription or a loan agreement in their own language, it’s the difference between being served and being misled.

“Language drift” and why agents make it worse

There’s a specific failure mode I watch for constantly: language drift. You ask the system to operate in one language, and partway through a complex task it quietly slides into another — usually English. It starts a response in your language and finishes in mine. It mixes scripts mid-sentence. It “corrects” a perfectly good term into an English one nobody asked for.

This used to be a cosmetic problem when AI only chatted. It is a serious one now that AI acts. The shift everyone is racing toward is from assistants that talk to agents that do: systems that take multi-step actions on your behalf, calling tools, filling forms, completing transactions. In a chain of ten steps, a small language slip in step two doesn’t stay small. It compounds. By the final step you can have an action taken on a misread instruction, in a language the user never agreed to operate in. Multiply that across hundreds of thousands of daily interactions and “drift” stops being a quirk and becomes a reliability and trust problem.
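
To put rough numbers on how a slip compounds, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each step stays on track with the same probability and that steps fail independently; real agent errors are correlated, and the per-step rates below are made up, not measured. The function name chain_success is hypothetical.

```python
def chain_success(per_step_ok: float, steps: int) -> float:
    """Chance that every step in a chain stays on track.

    Assumes each step succeeds independently with the same probability,
    which is a simplification chosen only to illustrate compounding.
    """
    return per_step_ok ** steps


# Illustrative per-step rates (assumptions, not measurements).
for p in (0.99, 0.98, 0.95):
    print(f"per-step {p:.2f} -> ten-step chain {chain_success(p, 10):.3f}")
```

Under those assumptions, a step that holds the right language 98 percent of the time yields a ten-step chain that holds it only about 82 percent of the time.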

Bigger models won’t simply fix this

The reflex in my industry is to assume the next, larger model will quietly absorb every shortcoming. On language, I don’t believe it will, not on its own. The imbalance is in the world’s data and you cannot scale your way out of a gap you keep underfeeding. Waiting for the universal model to arrive is, for most of the world’s speakers, a polite way of asking them to wait their turn indefinitely.

What actually closes the gap is less glamorous than a bigger model and more honest about the problem: deliberate engineering around the language itself. In the systems I’ve built to hold consistent across 17+ languages, the difference came from controls layered on top of the model, not from hoping the model behaved. Deterministic guardrails that lock the system to the user’s chosen language and refuse to drift. Enforced terminology so that critical words — a medical term, a legal term, a product name — render the same way every time instead of being creatively re-translated. And a rigorous evaluation layer, including using AI to systematically judge AI’s output across every supported language, so failures are caught by design rather than by an angry customer.
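
As a rough illustration of what a deterministic guardrail can look like, the sketch below checks a drafted reply before it is sent. It is not the production design described above. The names (script_share, validate_reply, BANNED_VARIANTS), the 0.9 threshold, and the Urdu glossary entry are all assumptions made for the example. It also uses the writing script as a crude stand-in for the language, which cannot tell apart languages that share a script; a real system would pair it with a proper language identifier.

```python
import unicodedata


def script_share(text: str, script: str) -> float:
    """Fraction of letters whose Unicode name starts with `script`
    (for example "ARABIC" or "LATIN"). Non-letters are ignored."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 1.0  # nothing to judge, so do not flag
    hits = sum(1 for ch in letters if unicodedata.name(ch, "").startswith(script))
    return hits / len(letters)


# Hypothetical glossary: untranslated variant -> approved rendering.
BANNED_VARIANTS = {"insulin": "انسولین"}


def validate_reply(reply: str, expected_script: str,
                   banned: dict[str, str], min_share: float = 0.9) -> list[str]:
    """Return a list of violations; an empty list means the reply may be sent."""
    violations = []
    # Language lock: reject replies that drift out of the expected script.
    if script_share(reply, expected_script) < min_share:
        violations.append(f"language drift: expected mostly {expected_script} script")
    # Enforced terminology: reject ad hoc variants of glossary terms.
    lowered = reply.lower()
    for variant, approved in banned.items():
        if variant in lowered:
            violations.append(f"terminology: use '{approved}' instead of '{variant}'")
    return violations


print(validate_reply("آپ کی انسولین تیار ہے", "ARABIC", BANNED_VARIANTS))
print(validate_reply("آپ کی insulin تیار ہے", "ARABIC", BANNED_VARIANTS))
```

The first reply passes with no violations. The second mixes scripts and leaves a glossary term untranslated, so it is flagged on both counts and can be regenerated or blocked before a user or a downstream tool ever sees it. Because the check is plain code rather than another model call, it gives the same answer every time.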

None of that is exotic. It’s the unglamorous scaffolding that separates a demo that dazzles in English from a system you can actually trust in forty languages at scale. The technology to do this exists today. What’s frequently missing is the decision to treat non-English languages as a requirement rather than a “phase two” someone will fund later.

It matters who gets served

This is not just a technical detail. Language is how people reach almost everything AI is now being pushed into: public services, healthcare, education, banking. When a system works smoothly in English and stumbles in everything else, every institution that adopts it inherits that gap and passes it on. And the people on the receiving end are usually the ones already least served: those who don’t have a fluent-English alternative to fall back on.

A choice, not a destiny

The good news is that this is a choice, which means it can be chosen differently. Whether AI speaks a language well is not destiny handed down by a dataset; it is the product of where teams point their effort. Companies can demand multilingual reliability as a launch criterion. Buyers, governments and enterprises serving multilingual populations can refuse to accept “it works great in English” as a finished product. Engineers like me can keep building the controls that make breadth possible instead of treating it as a stretch goal.

A truly useful AI is not the one that astonishes a fluent English speaker in a demo. It’s the one that quietly, reliably does its job for a grandmother in Lahore, a farmer in Oaxaca and a nurse in Lagos, in their own words, without drifting back to mine. Until the industry treats that as the baseline rather than the bonus, the most universal technology we’ve ever built will keep being, for most of humanity, something that was almost talking to them.

Author

Ishween Kaur

Ishween Kaur is a software engineer specializing in agentic AI. As a founding engineer on Salesforce's AgentForce, she designed the agentic graph architecture, personalization memory, multi-step workflows, and security boundaries behind enterprise AI agents deployed at Fortune 500 scale. She writes about the engineering reality of AI systems and what it takes to make them reliable, multilingual and trustworthy at scale.
