Why NVIDIA's $20 billion Groq acquisition is a bet on inference

By Shlomo Strauss · 2026-05-01

Almost unnoticed, the AI industry has undergone a quiet revolution that poses a serious threat to NVIDIA, and one of the clearest expressions of this shift is NVIDIA's $20 billion acquisition of Groq.

Many interpretations have been offered for this acquisition, and there is probably some truth in all of them.
But the real reason is tied to a deeper transformation sweeping the AI industry.

When artificial intelligence burst into our lives, the bulk of the effort and investment in the field went into training new models. Investors poured in mountains of cash, which was quickly burned up in the scorching cores of energy-hungry server farms.
Cost was of no concern to them. The goal was singular: to produce the smartest models possible, and to do so before the competition.

Since the launch of the latest round of models from Anthropic, OpenAI, and Google, something has changed.
The models are now good enough to deliver real value to users and to the business sector, and investors are cutting off the flow of easy cash and starting to look for returns.
This shift signals a more mature phase of the industry, and the fact that it arrived so quickly is as impressive as anything else associated with artificial intelligence.

NVIDIA was the queen of the training era.
Its processors were designed for parallel processing: splitting massive tasks across thousands of processing cores and executing them simultaneously.
But for inference, which means receiving a query from an individual user and generating a response using an existing model, that hardware architecture is inefficient. It responds slowly and consumes substantial resources.

During the training era, Google lagged behind with its TPU chips, which were originally developed to run Google's own workloads, such as answering search queries, as quickly and efficiently as possible, but were weaker in terms of parallel processing.
With the shift to the inference era, that very capability turned out to be a gold mine, as these chips allow Google to operate its Gemini models at a low cost per token and with high profit margins.
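
To make "cost per token" concrete, here is a minimal Python sketch of the arithmetic behind it. Every number in it is a made-up placeholder chosen for easy math, not a figure for any real chip or for Gemini, and the function names are purely illustrative.

```python
def cost_per_million_tokens(hourly_cost_usd, tokens_per_second):
    """Hardware cost (USD) to generate one million tokens."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def gross_margin(price_usd, cost_usd):
    """Share of the selling price left over after the hardware cost."""
    return (price_usd - cost_usd) / price_usd


# Hypothetical accelerators: (hourly cost in USD, tokens generated per second).
# These values are assumptions for illustration only.
CHIPS = {
    "general_purpose": (3.60, 1000),
    "inference_tuned": (1.80, 2000),
}
PRICE_PER_MILLION_USD = 2.00  # hypothetical selling price per million tokens

for name, (hourly_cost, tps) in CHIPS.items():
    cost = cost_per_million_tokens(hourly_cost, tps)
    margin = gross_margin(PRICE_PER_MILLION_USD, cost)
    print(f"{name}: ${cost:.2f} per million tokens, margin {margin:.1%}")
```

The point of the sketch is only the shape of the formula: halving the hourly cost while doubling throughput cuts the cost per token by a factor of four, and at a fixed selling price that difference goes straight into the margin.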

As if the threat from Google weren't enough, additional competitors are breathing down NVIDIA's neck, developing their own chips with the goal of reducing their dependence on NVIDIA and cutting costs.

This is where the acquisition of Groq and its LPU chips enters the picture. These chips are designed specifically for efficient inference. Groq achieved this by eliminating the hardware scheduler, the component that distributes tasks among the processing cores in NVIDIA's GPUs while the program is running.
In Groq's chip, the compiler, whose usual role is to translate source code into machine language, also plans in advance exactly which core will run which task and when, making the process particularly efficient.
In addition, Groq did away with external RAM and instead embedded small, simple memory (SRAM) directly inside the processor. In this way, it achieved a significant performance improvement at lower cost.
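
The idea of moving scheduling out of the hardware and into the compiler can be illustrated with a toy Python sketch. This is not Groq's compiler or anything close to it; the `Op` type, the `compile_schedule` function, the operation names, and the cycle counts are all hypothetical. The one assumption that matters is that the duration of every operation is known ahead of time, which is what lets the whole timetable be fixed before anything runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str
    cycles: int        # fixed, known duration (the key assumption)
    deps: tuple = ()   # names of ops that must finish first


def compile_schedule(ops, num_cores):
    """Assign every op a core and a start cycle before anything runs.

    Ops must be listed in dependency order (each op after its deps).
    Returns the timetable and the total number of cycles.
    """
    core_free = [0] * num_cores   # cycle at which each core becomes idle
    finish = {}                   # op name -> cycle at which it finishes
    schedule = []                 # entries of (start_cycle, core, op_name)
    for op in ops:
        # An op may start only once all of its inputs are finished.
        ready = max((finish[d] for d in op.deps), default=0)
        # Pick the core that lets it start earliest (ties: lowest index).
        core = min(range(num_cores), key=lambda c: max(core_free[c], ready))
        start = max(core_free[core], ready)
        finish[op.name] = start + op.cycles
        core_free[core] = finish[op.name]
        schedule.append((start, core, op.name))
    return schedule, max(finish.values())


# A made-up slice of a model's work, for illustration only.
OPS = [
    Op("load_a", 2),
    Op("load_b", 2),
    Op("matmul", 4, ("load_a", "load_b")),
    Op("bias", 1, ("matmul",)),
    Op("norm", 3, ("load_a",)),
]

schedule, total_cycles = compile_schedule(OPS, num_cores=2)
for start, core, name in sorted(schedule):
    print(f"cycle {start}: core {core} runs {name}")
print(f"total: {total_cycles} cycles")
```

At run time nothing has to decide what goes where: each core simply follows its precomputed timetable, so the same input always takes the same number of cycles. The trade-off in this toy model is that the plan only works when durations are predictable, which is why the design suits running a fixed, already-trained model.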

The acquisition of Groq prepares NVIDIA for the inference era and allows it to compete in a market where it has fallen behind.
Evidence of this could be seen at the recent CES, where Jensen Huang, NVIDIA's CEO, announced that Groq's technology would be integrated into the company's new AI server, called Vera Rubin.

I wrote about this topic in greater detail in an article published on the Channel 10 website — if the technical explanation interests you, you're welcome to head over there and read more.

Image: An LPU chip manufactured by Groq | Credit: Groq
