A Pakistani developer has independently built what is believed to be the world’s first large language model designed specifically for Pashto, filling a significant gap in artificial intelligence coverage for one of South Asia’s most widely spoken languages. The model, named Qehwa, was developed entirely by Junaid Ahmed as a solo effort, with no external funding, institutional backing, or team support, and is aimed at serving more than 60 million Pashto speakers worldwide. The project was inspired by Qalb, the Urdu large language model built by Taimoor Hassan. Qehwa is tailored specifically to the Peshawari dialect of Pashto, addressing the consistent shortcomings that existing global artificial intelligence systems have shown when processing Pashto text and its cultural context.
The development process involved two training phases built on top of Qwen2.5-7B, an open-source large language model with seven billion parameters developed by Alibaba Cloud, which provided a strong general foundation in logic, coding, and multilingual understanding. In the first phase, the model underwent continued pre-training on 3.4 million Pakistani Pashto documents to deepen its vocabulary, grammatical competence, and cultural awareness. To make training feasible for a solo developer without enterprise-grade hardware, Ahmed employed Low-Rank Adaptation (LoRA) at a rank of 64, a technique that fine-tunes a large model by updating only a small set of added low-rank parameters rather than all seven billion weights, sharply reducing the computational resources required. In the second phase, the model was further trained on more than 100,000 Pashto instruction pairs, teaching it to respond to prompts, handle question-and-answer tasks, perform translations, and carry on conversational exchanges. Ahmed acknowledged guidance from Faiza Ghaffar throughout the project and credited the broader open-source ecosystem, including tools from Unsloth AI and Hugging Face, as instrumental to the effort.
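The savings LoRA delivers can be sketched with simple arithmetic: instead of training a full weight matrix W, LoRA freezes W and learns a low-rank update B·A, so only the entries of the two small factors are trained. The matrix size below is a hypothetical single projection layer, not Qwen2.5-7B’s actual shapes, and the code is an illustration of the parameter count only, not Ahmed’s training script.

```python
# Back-of-the-envelope sketch of why rank-64 LoRA makes solo fine-tuning
# feasible. LoRA freezes a weight matrix W (d_out x d_in) and learns a
# low-rank update B @ A, where A is (r x d_in) and B is (d_out x r),
# so only r * (d_in + d_out) parameters are trained per matrix.
# The dimension below is illustrative, not Qwen2.5-7B's exact shape.

def full_params(d_out: int, d_in: int) -> int:
    """Trainable weights if the whole matrix were fine-tuned."""
    return d_out * d_in

def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable weights under a rank-r LoRA update."""
    return r * (d_in + d_out)

d = 4096   # hypothetical hidden size of one projection matrix
r = 64     # the rank reportedly used for Qehwa

full = full_params(d, d)      # 16,777,216 trainable weights
lora = lora_params(d, d, r)   # 524,288 trainable weights
reduction = lora / full       # 0.03125 -> about 3% of the full matrix
```

In practice this configuration would be expressed through a library rather than by hand, for example Hugging Face PEFT’s `LoraConfig(r=64, ...)` or Unsloth’s equivalent wrapper, applied to each attention and feed-forward projection.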
Qehwa was evaluated on a purpose-built benchmark of 150 tests spread across 15 categories, making it the first Pashto artificial intelligence model to be assessed against structured evaluation criteria of this kind. It achieved an overall accuracy of 85.3% across all categories. In the translation domain, English-to-Pashto conversion reached 90% accuracy and Urdu-to-Pashto scored 84%, while subject-specific categories including culture and history, health and daily life, and geography and nature each reached 90%. The model accepts prompts in Pashto, English, and Urdu and generates responses in Pashto. Qehwa is released as a free and open-source project, allowing researchers and developers to explore, adapt, and build upon it. For those looking to run the model locally, it supports deployment via Unsloth for faster inference, as well as four-bit quantisation through BitsAndBytes, which compresses the model enough to run on consumer-grade graphics cards rather than requiring expensive server-level hardware.
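The consumer-hardware claim can be sanity-checked with simple arithmetic, and the idea behind four-bit compression illustrated with a toy absmax quantiser. This is a sketch of the concept only: the actual deployment path mentioned above uses the BitsAndBytes library through Hugging Face Transformers (for example, passing a `BitsAndBytesConfig` with `load_in_4bit=True` to `AutoModelForCausalLM.from_pretrained`), not hand-rolled code like this.

```python
# Why 4-bit quantisation lets a 7B-parameter model fit on a consumer GPU.
# Weight memory scales linearly with bits per parameter, so halving the
# bit width halves the footprint (real loaders add some overhead).

def model_memory_gb(n_params: int, bits_per_param: float) -> float:
    """Approximate weight memory in gigabytes, ignoring overheads."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 7_000_000_000                   # roughly Qwen2.5-7B's weight count
fp16_gb = model_memory_gb(N_PARAMS, 16)    # 14.0 GB: beyond most consumer cards
int4_gb = model_memory_gb(N_PARAMS, 4)     # 3.5 GB: fits an 8 GB consumer GPU

def quantize_absmax_4bit(weights):
    """Toy symmetric absmax quantisation to 4-bit signed integers in [-8, 7]."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate weights from the 4-bit codes."""
    return [v * scale for v in q]

q, s = quantize_absmax_4bit([0.7, -0.35, 0.1, 0.0])
approx = dequantize(q, s)  # close to the originals, at a quarter of the bits
```

Real 4-bit schemes such as NF4 use smarter code points than this uniform toy, but the trade is the same: a small loss of precision in exchange for a roughly four-fold memory reduction versus 16-bit weights.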
Follow the SPIN IDG WhatsApp Channel for updates across the Smart Pakistan Insights Network covering all of Pakistan’s technology ecosystem.