For most of the world, artificial intelligence still sounds foreign, not just as a technology, but literally in the language it speaks. The models that now summarise documents, answer questions and write code are overwhelmingly trained on English and a handful of other global tongues, even as billions of people go about their lives in Wolof, Amharic, Sinhala, Kazakh or Urdu. In low- and middle-income countries, where the mobile phone has become the primary interface with the state and the market, this imbalance turns language from a matter of cultural pride into a hard economic fault line: an AI system that cannot understand how people actually talk cannot meaningfully advise a farmer, resolve a billing dispute or guide a young woman through the maze of digital bureaucracy. The result is a quiet but consequential divide: even where connectivity and devices exist, AI remains, in practice, a tool for those fluent in the right languages, and a distant curiosity for everyone else.
It is this gap that the GSMA’s report Bridging the Language Gap: The Role of Mobile Network Operators in AI Ecosystems sets out to map, and, more importantly, to politicise. Rather than treating local-language support as a cosmetic feature to be added once the “real” model is built elsewhere, the authors argue that language sits at the heart of AI’s uneven geography. They show that countries dominated by so-called low-resource languages adopt AI at markedly lower rates than otherwise similar, better-resourced peers, even after controlling for GDP and connectivity, and they connect this directly to how models are trained, governed and deployed. Around these stark statistics, they document a crowded but fragile ecosystem of grassroots researchers, community data initiatives and regional model builders who are trying to bend AI toward their own idioms, often on shoestring budgets and rented compute. Their conclusion is disarmingly simple: without deliberate effort, the AI revolution will speak in the languages of those who already dominate the internet, and everyone else will be left negotiating in translation.
The report builds its case on a layered understanding of the AI economy. At the base are digital-economy foundations (connectivity, devices, human capital and policy), on top of which sit three core “building blocks” of AI: data, compute and skills. Cross-cutting these are the familiar enablers of financing, partnerships and research and development. What the authors insist on, however, is that language is not just another layer; it is the medium through which data becomes meaningful, skills become usable and services become trustworthy. The internet itself is skewed: more than half of global web content is in English, even though only a small fraction of the world speaks it natively, while the overwhelming majority of the planet’s roughly 7,000 languages are effectively absent from large-scale AI training corpora. Benchmarks tell the same story in more technical terms: state-of-the-art language models deliver around 80 per cent accuracy on English tasks but often drop below 55 per cent for widely spoken African languages such as Yoruba, despite tens of millions of speakers. The AI boom, in other words, is riding on a very narrow linguistic base, and this base is tightly correlated with global economic and political power.
Against this backdrop, the report surveys a surprisingly rich landscape of efforts to counterbalance the dominance of English and other high-resource languages. It highlights regional initiatives like Masakhane and AfriBERTa in Africa, SEA-LION in Southeast Asia and AI4Bharat in India, all of which are building shared datasets, multilingual models and benchmarks to support under-represented languages at scale. Alongside them sit community-driven projects such as Mozilla Common Voice, Karya, Digital Umuganda, GhanaNLP, HausaNLP, gheero and others, which collect speech and text through participatory methods, often engaging thousands of volunteers to record, transcribe and validate data in their own dialects. On the applied side, startups such as Lesan and Pindo embed local-language machine translation and voice agents into sectors like health, government services and micro-finance. Common to all of these is a pragmatic technical strategy: rather than training giant models from scratch, they fine-tune existing multilingual architectures, exploit cross-lingual transfer to squeeze as much value as possible from a few thousand high-quality sentences, and use hybrid approaches such as retrieval-augmented generation and translation pipelines to adapt global models to local realities. This is a world of ingenuity under constraint, where the frontier is not measured in parameter counts but in how far a scarce resource (clean, consented language data) can be stretched.
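The translation-pipeline pattern mentioned above can be sketched in a few lines: a query in a low-resource language is translated into a high-resource pivot language, answered by a strong English-centric model, and translated back. Everything here (the `translate` and `english_model` functions) is an illustrative stub standing in for real machine-translation and LLM services, not any specific project’s API:

```python
# Sketch of a translation pipeline wrapped around an English-centric model.
# Both functions below are stand-in stubs, not real services.

def translate(text: str, src: str, dst: str) -> str:
    # Stand-in for a local MT system (e.g. a fine-tuned multilingual model).
    return f"[{src}->{dst}] {text}"

def english_model(prompt: str) -> str:
    # Stand-in for a large English-centric LLM.
    return f"answer to: {prompt}"

def answer_in_local_language(query: str, lang: str) -> str:
    pivot = translate(query, lang, "en")   # local language -> English
    reply = english_model(pivot)           # reason in the high-resource pivot
    return translate(reply, "en", lang)    # English -> local language

print(answer_in_local_language("salaam", "ur"))
```

The appeal of this design is exactly what the report emphasises: the scarce local-language data goes into the comparatively small translation step, while the expensive general-purpose reasoning is borrowed from an existing model.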
Yet the very richness of this ecosystem throws its vulnerabilities into sharp relief. The work of building language datasets is slow, expensive and ethically complex: it requires not only recruiting and paying contributors, but also managing consent, protecting privacy and navigating the power dynamics of “extracting” language data from communities that may already have histories of exploitation. Compute remains a chronic bottleneck: many African and Asian innovators can assemble data and design fine-tuning strategies, but they lack sustained access to GPUs and engineering resources to iterate and maintain models. Funding is precarious; most projects depend on time-limited grants rather than robust commercial models, which makes long-term maintenance difficult and exposes language infrastructure to the whims of donor priorities. Perhaps most importantly, distribution is weak: the report notes that many promising models and applications never reach last-mile users because they sit in standalone apps or research environments, disconnected from the channels people actually use, such as USSD codes, IVR lines, basic messaging and telco-branded super-apps. What emerges is a picture of linguistic innovation without a delivery mechanism, a set of engines without a chassis.
It is at this point that the report makes its most provocative move: it argues that mobile network operators, often seen as conservative utilities, occupy a structurally significant yet under-recognised place in this puzzle. Telcos, after all, sit at the convergence of infrastructure, data, services and regulation in most low- and middle-income countries. They operate the channels that matter to last-mile users (voice lines, SMS, USSD, IVR and low-end Android apps) and have deep relationships with regulators and governments. The report is clear about their constraints: they are bound by strict privacy and compliance requirements and are neither structurally nor legally positioned to “open up” raw customer data for model training, even when that data might seem linguistically valuable. Nor are they research labs inclined or equipped to invent new model architectures from scratch. But precisely because of their position, they can play three roles that most other actors cannot: they can integrate language technologies into mass-market services, they can convene coalitions around shared models and datasets, and, in some cases, they can invest in domestic AI infrastructure that reduces dependence on foreign platforms. Language, in this telling, becomes not only a cultural asset but a dimension of digital sovereignty.
The report fleshes out these roles through four case studies, each grounded in a different geography but linked by a common logic. In Senegal, Orange faces the everyday reality that while French is the official language, much of its customer base speaks Wolof at home. Rather than trying to build a frontier-scale model, the company assembles a hybrid system: rule-based workflows, modular natural-language understanding, speech recognition and synthesis, all carefully adapted to Wolof and other West African languages, and all wrapped in a human-in-the-loop process that allows linguists and customer-care agents to validate and correct outputs. Training and inference run on Orange’s own infrastructure, even when external partners like Meta or OpenAI are involved in earlier experimentation, because data protection and trust demand it. The immediate payoff is prosaic (fewer misrouted calls, shorter waiting times, more satisfied customers), but the longer-term effect is that Wolof becomes a first-class language in the digital interface of one of the country’s largest companies, rather than an afterthought.
In Sri Lanka, Dialog Axiata takes a different route but with a similar sensibility. The company runs Ideamart, a no-code platform that allows small businesses and individuals to build basic apps and services, but it finds that many potential women entrepreneurs are locked out by English-centric interfaces and documentation. Rather than investing in a Sinhala- or Tamil-specific model, Dialog partners with a local startup to layer a general-purpose large language model on top of AppMaker, Ideamart’s app-building tool. Through careful prompt design and what the report calls “hybrid language prompting,” they allow users to describe their desired app in Sinhala or Tamil, while the model internally reasons in English and then executes tasks via an API wrapper. No fine-tuning is required; the system relies on the base model’s multilingual capabilities plus a rich system prompt that encodes AppMaker’s rules and constraints. What matters here is not technical novelty but institutional pragmatism: within tight time and budget constraints, and without any change to the underlying model weights, Dialog is able to make its creation tools accessible to a segment of users who were previously excluded.
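A minimal sketch of this hybrid-language-prompting pattern might look as follows. The `SYSTEM_PROMPT`, `call_llm` stub and `AppMakerAPI` wrapper are hypothetical stand-ins invented for illustration; the report does not describe Dialog’s actual interfaces:

```python
# Sketch of hybrid language prompting: the user writes in their own language,
# a system prompt pins the platform's rules and output format, the model
# "reasons in English", and a thin wrapper executes its structured plan.
# No fine-tuning: only prompting plus an API layer. All names are illustrative.

import json

SYSTEM_PROMPT = (
    "You are an assistant for a no-code app builder. The user may write in "
    "Sinhala or Tamil. Plan in English, then reply ONLY with JSON of the form "
    '{"action": "create_app", "args": {...}}'
)

def call_llm(system: str, user: str) -> str:
    # Stand-in for a multilingual base model; returns a canned structured plan.
    return json.dumps({"action": "create_app", "args": {"name": "kade"}})

class AppMakerAPI:
    # Thin wrapper that turns the model's plan into platform actions.
    def create_app(self, name: str) -> str:
        return f"created app '{name}'"

def handle_request(user_text: str) -> str:
    plan = json.loads(call_llm(SYSTEM_PROMPT, user_text))
    return getattr(AppMakerAPI(), plan["action"])(**plan["args"])

# Sinhala input meaning roughly "build an app for my shop"
print(handle_request("මගේ කඩේට app එකක් හදන්න"))  # -> created app 'kade'
```

The design choice worth noticing is that all local-language handling is delegated to the base model’s existing multilingual capability, while correctness is enforced at the boundary: the wrapper only executes actions the system prompt has constrained the model to emit.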
The Kazakhstan and Indonesia cases push further into the realm of digital sovereignty, and they speak most directly to the report’s ambition to move beyond pilots. In Kazakhstan, Beeline’s innovation arm confronts the fact that existing global models perform poorly in Kazakh, a language central to national identity and increasingly prominent in education and public life. Working with the Ministry of Digital Development, universities and the Barcelona Supercomputing Center, they first build a smaller Kazakh-leaning model and then a full-scale large language model, KazLLM, by fine-tuning LLaMA on a multilingual corpus spanning Kazakh, Russian, Turkish and English. KazLLM is deployed in Beeline’s Janymda super-app as an AI Tutor for students and teachers, and its weights are released openly, positioning it as a public good as much as a corporate asset. In Indonesia, Indosat Ooredoo Hutchison goes beyond deployment to infrastructure: in partnership with Nvidia and AI Singapore, it builds GPU Merdeka, a domestic AI cloud, and the Sahabat-AI family of models, fine-tuned from Llama 3 and Gemma 2 on Bahasa Indonesia and regional languages such as Javanese and Sundanese. These models are offered as part of an AI Factory, compute-as-a-service and model-as-a-service, for enterprises and government agencies, positioning Indosat as a national AI enabler rather than merely a consumer of foreign clouds. In both countries, the telco becomes an infrastructural actor: not the only one, but one that matters.
Across these examples, the technical philosophy is consistent. The operators favour adaptation over invention: fine-tuning multilingual models when they have sufficient, high-quality local data; relying on prompt-based techniques and translation pipelines when they do not; training small, task-specific models for latency-sensitive or constrained environments such as IVR systems. Human oversight is non-negotiable, both for quality control and to manage reputational risk. The report summarises these strategies in a useful taxonomy, from prompt-based adaptation and hybrid language prompting to fine-tuning and small language models, and is explicit about the trade-offs between speed, cost, control and performance. More importantly, it uses these details to make a broader point: language-inclusive AI at scale is less about having the most sophisticated model and more about aligning technical choices with institutional roles and deployment realities. The frontier, in other words, is institutional as much as it is mathematical.
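The last rung of that taxonomy, small task-specific models for constrained channels, can be surprisingly modest in practice. The sketch below uses a simple keyword-based intent router as a stand-in for the kind of lightweight classifier an IVR front end might run before any large model is involved; the intents and keywords are assumptions for illustration, not taken from the report:

```python
# Illustrative stand-in for a small, task-specific IVR router: classify a
# caller's transcribed request into a handful of intents, falling back to a
# human agent when unsure (keeping a human in the loop, as the report urges).
# Intents and keywords are invented for this example.

INTENTS = {
    "billing": {"bill", "balance", "recharge"},
    "network": {"signal", "coverage", "outage"},
}

def route_call(transcript: str) -> str:
    words = set(transcript.lower().split())
    for intent, keywords in INTENTS.items():
        if words & keywords:  # any keyword present -> route to that queue
            return intent
    return "agent"  # unsure: escalate to a human agent

print(route_call("my signal keeps dropping"))  # -> network
```

A trained small language model would replace the keyword sets with learned weights, but the deployment logic, fast local classification first and escalation to a human or a larger model second, is the same trade-off between speed, cost and control that the report’s taxonomy describes.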
Seen from Pakistan, this report reads less like a distant set of case studies and more like an outline of a path that is already, in fragments, being traced here. Pakistan’s linguistic landscape (Urdu as a national lingua franca and symbol of unity, coexisting with Punjabi, Sindhi, Pashto, Balochi, Seraiki and others) is precisely the kind of complexity the GSMA describes in African and Indonesian contexts, where legal and digital monolingualism masks deep social multilingualism. English, as elsewhere, is over-represented in courts, bureaucracy and higher education, and consequently in the online text and code that feed the world’s largest models. But in the last two years Pakistan has quietly begun to assemble the ingredients of its own language-AI ecosystem: an Urdu-first large language model, Qalb, advertised as the world’s largest Urdu AI model and trained on nearly two billion tokens; a government-backed initiative in which NUST and the National Information Technology Board, with a telecom partner, have agreed to build an indigenous Urdu LLM with datasets for Pashto and Punjabi; and experimental Pashto-language models like Qehwa AI, developed by independent engineers. These developments are framed explicitly as attempts to bridge the technological divide for Urdu speakers and to bring generative AI into Pakistani languages rather than relying solely on English-first tools.
On top of this, Meta has launched ALIF, an Urdu AI capability presented as part of Pakistan’s national AI push, while Zong 4G has announced what it calls the country’s first locally developed large language model, designed around Pakistani linguistic and cultural nuances, and is marketing ozgpt, an AI assistant that promises to “speak local” in Urdu, Punjabi, Sindhi, Pashto, Balochi, Kashmiri and Siraiki. At present, these efforts are loosely connected. The Qalb team is pursuing its own trajectory, positioning the model for education and enterprise use; the NUST–NITB consortium is operating under government mandates and timelines; regional experiments are still at an early stage; global platforms are extending their multilingual reach according to their own corporate priorities; and telcos such as Zong are primarily focused on customer-care efficiencies and brand differentiation. What the GSMA report offers Pakistan is not a ready-made blueprint but a way of stitching these threads into a coherent strategy.
It suggests, first, that Pakistani operators can and should lean into the service-provider role: integrating Urdu and regional-language understanding into IVR, call centres, chatbots and super-app interfaces, not as side projects but as core operational systems, in the same way Orange has woven Wolof into its customer support. Doing so would make AI tangibly useful for people who are comfortable in Urdu or Punjabi but not in English, and for low-literacy users who rely on voice rather than text, particularly women and rural customers who are already under-represented in Pakistan’s digital public sphere. Second, the report invites Pakistani telcos to step into the role of ecosystem convener. Just as Beeline assembled a coalition of ministry, universities and foreign compute partners to build KazLLM, and Indosat worked with AI Singapore and Nvidia, a Pakistani operator or, better, a consortium could bring together NUST, other public universities, independent model builders and the relevant ministries to agree on shared datasets, benchmarks and governance structures for Urdu and regional languages. That coordination is especially important if Pakistan is to avoid reproducing its existing language hierarchies in digital form. A purely Urdu-centric AI strategy, even if locally built, risks amplifying Urdu’s dominance at the expense of Sindhi, Balochi or Seraiki, just as the GSMA warns that some digitisation efforts strengthen already powerful languages while leaving threatened ones further behind.
Bringing provincial universities, language departments and community organisations into the design of corpora, evaluation benchmarks and licensing terms would not only enrich the models but also help ensure that the benefits of AI are more evenly spread across linguistic communities. Finally, the Indonesian and Kazakh examples point toward a more ambitious possibility: Pakistan’s telcos could, over time, evolve into sovereign AI infrastructure providers. There is, as yet, no “GPU Pakistan” whose mission is to host domestic language models and provide AI-as-a-service to local firms and the state; compute for Pakistani models is mostly rented from global clouds, and the governance of those models is shaped by foreign terms of service. The GSMA report shows that it is possible, albeit politically and financially demanding, for an operator to anchor a national AI cloud, as Indosat has done with GPU Merdeka, and to treat local-language models as national digital infrastructure rather than merely as corporate assets.
For Pakistan, this could mean a regulated, in-country GPU facility, perhaps jointly owned by telcos, state entities and private investors, that hosts Urdu and regional-language LLMs under licences designed to preserve data sovereignty and community interests, and exposes them through APIs to startups, public-sector agencies and even foreign partners on Pakistani terms. It would not free the country from global AI entirely, and nor should it; but it would ensure that at least some of the models mediating how Pakistanis learn, trade and debate are trained on Pakistani languages, governed under Pakistani law and accountable, in some measure, to Pakistani institutions. Such a strategy would not be cheap or easy. It would demand investment in compute at a time of fiscal constraint, institutional cooperation in a political culture not always known for it, and a willingness by telcos to see themselves as more than spectrum licensees and tariff merchants. But the alternative is not a neutral equilibrium. It is a future in which the core linguistic infrastructure of Pakistan’s digital life is outsourced by default: where the most capable models for Urdu and regional languages are fine-tuned abroad on data scraped from Pakistani conversations; where public agencies rely on foreign APIs whose behaviour they cannot audit; and where the speech patterns of Karachi and Quetta become just another line item in someone else’s training dataset. The GSMA report does not mention Pakistan once, but it captures the stakes with uncomfortable clarity: language is where AI’s promises and inequalities meet, and mobile operators, by virtue of where they sit, can help decide whether their societies enter the AI age in their own voices or as a translated afterthought.
Follow the SPIN IDG WhatsApp Channel for updates across the Smart Pakistan Insights Network covering all of Pakistan’s technology ecosystem.