CW Pakistan
Google EmbeddingGemma Leads Small Parameter Embedding Models With On-Device AI

  • September 6, 2025
Google has expanded its Gemma family of models with the launch of EmbeddingGemma, an open-source embedding model designed for on-device use across smartphones, laptops, and desktops. Built on the Gemma 3 architecture, EmbeddingGemma is a 308-million-parameter model trained on more than 100 languages and tailored to deliver efficient, private, high-quality embeddings. According to Google DeepMind product manager Min Choi and lead research engineer Sahil Dua, the model is built to integrate seamlessly with widely used tools such as Ollama, llama.cpp, MLX, LiteRT, LMStudio, LangChain, LlamaIndex, and Cloudflare, making it highly adaptable for developers deploying AI applications locally.

EmbeddingGemma has demonstrated strong results on the Massive Text Embedding Benchmark (MTEB) Multilingual v2, where it ranked as the top-performing model under 500 million parameters. This performance underscores Google’s focus on delivering models that can run natively on personal hardware without a cloud dependency. The model also supports customizable output dimensions and can be applied to a range of use cases, including Retrieval-Augmented Generation (RAG) and semantic search. These capabilities position EmbeddingGemma as a tool for building efficient AI-powered applications directly on user devices, preserving privacy and functionality even in offline environments.

One of the most significant applications of EmbeddingGemma is its role in enabling mobile RAG pipelines. Traditionally, RAG systems rely on cloud or on-premises infrastructure to process embeddings and generate context-aware responses. By shifting this capability to devices like laptops and smartphones, enterprises can empower employees to access and query information directly through their local hardware. This approach allows for faster, more secure interactions with data, while reducing reliance on internet connectivity. Choi and Dua emphasized that the quality of the initial retrieval step is crucial in such pipelines, noting that poor embeddings can lead to irrelevant or inaccurate answers. EmbeddingGemma addresses this challenge with its high-quality representations, which enhance the reliability of on-device RAG systems.
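The retrieval step described above can be sketched in a few lines. This is an illustrative example only: the `embed` function below is a deterministic stand-in for a real model call (in practice the vectors would come from EmbeddingGemma via a runtime such as Ollama or llama.cpp); the point is the ranking logic that decides which local documents feed the generation step.

```python
import zlib
import numpy as np

def embed(text: str, dim: int = 768) -> np.ndarray:
    """Deterministic stand-in for a real embedding model such as
    EmbeddingGemma. Returns a unit-length vector so that a dot
    product equals cosine similarity."""
    rng = np.random.default_rng(zlib.crc32(text.encode()))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """The retrieval step of a RAG pipeline: rank documents by
    cosine similarity to the query embedding, return the top k."""
    q = embed(query)
    doc_vecs = np.stack([embed(d) for d in docs])
    scores = doc_vecs @ q                 # cosine similarity (unit vectors)
    top = np.argsort(scores)[::-1][:k]    # indices of best matches
    return [docs[i] for i in top]

docs = ["battery life tips", "quarterly sales report", "wifi setup guide"]
print(retrieve("how do I configure wireless?", docs, k=1))
```

With real EmbeddingGemma vectors, the document embeddings would be computed once and cached on-device, so each query costs only one embedding call plus a similarity scan, which is what makes fully offline retrieval practical.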

To achieve this flexibility, EmbeddingGemma uses Matryoshka Representation Learning, a training technique that lets developers choose among different embedding vector sizes depending on their needs. For instance, developers may use the full 768-dimension vector for detailed tasks or truncate to smaller dimensions to prioritize speed and storage efficiency. This adaptability makes the model suitable for diverse scenarios, from advanced enterprise applications to lightweight mobile solutions. The release also reflects growing interest in the embedding model space, where Google faces competition from Cohere’s Embed 4, Mistral’s Codestral Embed, OpenAI’s Text Embedding 3 Large, and Qodo’s Qodo-Embed-1-1.5B.
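Concretely, Matryoshka-style truncation keeps only the leading components of the full vector and re-normalizes. A minimal sketch, using the 768-dimension figure from the article and a synthetic vector in place of a real model output:

```python
import numpy as np

def truncate_embedding(vec, dim):
    """Matryoshka-style truncation: keep the first `dim` components
    of a full embedding and re-normalize to unit length."""
    head = np.asarray(vec)[:dim]
    return head / np.linalg.norm(head)

rng = np.random.default_rng(42)
full = rng.normal(size=768)
full /= np.linalg.norm(full)          # stand-in for a full 768-d embedding

small = truncate_embedding(full, 256)  # 3x smaller index footprint
print(small.shape)  # (256,)
```

Smaller vectors shrink the on-device index and speed up similarity search at some cost in retrieval quality; Matryoshka training arranges the dimensions so the leading ones carry the most information, which is why truncation degrades gracefully rather than catastrophically.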

As interest in running AI applications natively on mobile devices continues to expand, hardware makers like Apple, Samsung, and Qualcomm are also working on ways to support models without compromising device performance or battery life. The arrival of EmbeddingGemma illustrates how embedding models are increasingly becoming a core component of enterprise AI strategies, with developers and organizations showing enthusiasm for integrating them into local workflows. Google’s emphasis on multilingual training, flexibility, and compatibility with popular AI frameworks positions EmbeddingGemma as an important entry in the embedding model market, particularly for developers seeking practical and private on-device solutions.


Related Topics
  • AI models
  • DeepMind
  • EmbeddingGemma
  • embeddings
  • Gemma 3
  • Google
  • MLX
  • mobile AI
  • Ollama
  • on-device AI
  • RAG
  • semantic search