Google is reportedly in discussions with Marvell Technology to develop a two-chip strategy aimed at improving artificial intelligence inference performance, signaling a shift in how large-scale AI workloads are optimized across cloud infrastructure. The proposed plan covers two distinct chips: one designed to enhance existing tensor processing units (TPUs) and another built as a next-generation TPU optimized specifically for inference. The development reflects a growing industry focus on efficiency in the deployment phase of AI models, where systems generate outputs rather than train on data.
The first component of the proposed architecture is a memory processing unit, which would work alongside Google’s existing TPUs by handling memory-intensive operations separately. The approach is intended to reduce bottlenecks caused by data movement and bandwidth limitations, freeing the primary processing unit to focus on computation. By offloading memory-related tasks, the system can achieve faster response times and better overall efficiency, particularly in large-scale AI applications that rely heavily on real-time inference.
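Why memory, rather than raw compute, is often the limiting factor in inference can be illustrated with a simple roofline-style calculation. The sketch below uses hypothetical accelerator figures (not numbers from the report): when a workload performs few arithmetic operations per byte moved, its throughput is capped by memory bandwidth, which is the bottleneck a separate memory processing unit would target.

```python
def attainable_tflops(peak_tflops: float, bandwidth_tbps: float,
                      flops: float, bytes_moved: float) -> float:
    """Roofline model: achievable throughput is the lesser of peak compute
    and memory bandwidth times arithmetic intensity (FLOPs per byte)."""
    intensity = flops / bytes_moved
    return min(peak_tflops, bandwidth_tbps * intensity)

# Hypothetical accelerator: 100 TFLOPs peak compute, 1 TB/s memory bandwidth.
# A batch-1 decoding step reads every weight once for roughly 1 FLOP per byte,
# so the chip is memory-bound: only ~1 of its 100 TFLOPs is usable.
decode = attainable_tflops(100.0, 1.0, flops=2e12, bytes_moved=2e12)

# A compute-dense workload (e.g. a large-batch matrix multiply) at
# 500 FLOPs per byte hits the compute roof instead of the bandwidth roof.
dense = attainable_tflops(100.0, 1.0, flops=1e15, bytes_moved=2e12)

print(decode, dense)  # → 1.0 100.0
```

In this toy model, adding compute to the memory-bound case changes nothing; only more bandwidth, or moving the memory-heavy work to a dedicated unit, raises effective throughput.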
The second chip in the plan is a new TPU designed specifically for inference workloads. Unlike traditional AI chips, which are often built to handle both training and inference, this specialized design focuses on optimizing cost, power consumption, and performance for serving AI models at scale. Industry reports suggest that inference is becoming the dominant cost factor in AI deployment, as real-world usage continues to grow across applications such as search, digital assistants, and enterprise software.
From a broader technology perspective, the move highlights Google’s strategy to diversify its semiconductor supply chain and reduce reliance on external GPU providers such as Nvidia. By working with multiple partners, including Marvell alongside existing collaborators like Broadcom and MediaTek, Google is building a more flexible and scalable custom silicon ecosystem. This multi-partner approach also allows the company to tailor chip designs to specific workloads, improving efficiency while managing production constraints in a highly competitive semiconductor market.
The proposed two-chip TPU model reflects a larger industry transition toward application-specific integrated circuits, with companies designing custom hardware tailored to their own AI workloads. As demand for AI services continues to rise, particularly in cloud computing environments, the ability to optimize inference performance is becoming a key differentiator. While the discussions between Google and Marvell have not yet been finalized, the initiative underscores how major technology companies are rethinking hardware design to support the next phase of AI-driven computing.