Jalapeño: OpenAI's first chip goes straight for inference (and NVIDIA)

OpenAI just unveiled Jalapeño, its first in-house chip, co-designed with Broadcom. The name is playful, but the move is serious: OpenAI stops being only a consumer of someone else’s silicon and starts building its own full stack. It’s worth being clear about what the chip is and what it isn’t.

What Jalapeño is (and isn’t)

First, to avoid the easy headline: Jalapeño is not a training chip. It is an inference accelerator optimized for LLMs, which OpenAI calls an “Intelligence Processor.” That is, it isn’t meant to train the next GPT from scratch; it’s meant to run already-trained models faster and, above all, cheaper. In practice, it’s designed so ChatGPT and the API can answer millions of queries with better economics.

That distinction matters. Training remains NVIDIA’s undisputed kingdom. But inference (serving the model in production, day after day) is where the bulk of recurring spend lives at scale. Going after that is a smart play.

The numbers they did give

OpenAI was sparing with hardware specs (no public figures for memory, HBM, TFLOPS or power draw), but it dropped three numbers that tell the story:

~50% cheaper inference. That’s the central promise: half the cost versus the current alternative. At OpenAI’s scale, that could add up to billions of dollars.
9-month development. A brutally short cycle for a custom chip, and they say they sped it up by using their own AI models in the design. It’s a powerful image: using AI to build the hardware that will serve more AI.
Deployment by end of 2026, scaling to gigawatt-scale data centers. This isn’t a lab experiment; it’s going to production.
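
To make the “~50% cheaper” claim concrete, here is a minimal back-of-the-envelope sketch. Only the 50% figure comes from OpenAI; the annual spend and the share of traffic moved to the new chip are hypothetical placeholders I picked for illustration, not reported numbers, and the function name is my own.

```python
def annual_savings(annual_inference_spend: float,
                   cost_reduction: float,
                   migrated_share: float) -> float:
    """Estimate yearly savings from moving part of an inference workload
    to cheaper hardware.

    annual_inference_spend: current yearly inference bill (any currency).
    cost_reduction: fractional cost cut on migrated workloads (0.5 = 50%).
    migrated_share: fraction of the workload actually moved (0.0 to 1.0).
    """
    for name, value in (("cost_reduction", cost_reduction),
                        ("migrated_share", migrated_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if annual_inference_spend < 0:
        raise ValueError("annual_inference_spend must be non-negative")
    # Only the migrated slice of the bill benefits from the lower unit cost.
    return annual_inference_spend * migrated_share * cost_reduction


# HYPOTHETICAL input: a 10-billion yearly inference bill (not a reported figure).
HYPOTHETICAL_SPEND = 10e9
CLAIMED_REDUCTION = 0.5  # the "~50% cheaper" figure from the announcement

for share in (0.25, 0.5, 1.0):
    saved = annual_savings(HYPOTHETICAL_SPEND, CLAIMED_REDUCTION, share)
    print(f"migrated share {share:.0%}: saves {saved / 1e9:.2f} billion per year")
```

The takeaway is that the headline percentage only applies to whatever fraction of traffic actually runs on the new chip, so the speed of the rollout matters as much as the per-query saving.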

The real target: NVIDIA

All the context points the same way. Jalapeño is, plainly, a strike at NVIDIA dependence. The press described it as a rival to Blackwell, as a way to cover the capacity NVIDIA can’t supply, and as a way to put NVIDIA’s pricing power in inference on notice.

It’s the same play Google (TPU), Amazon (Trainium/Inferentia) and Microsoft (Maia) have already made: if you depend on a single supplier for your most expensive input, you design your own silicon to claw back margin and control. OpenAI is late to that party, but with an inference volume that more than justifies the effort.

What we still don’t know

Let’s be honest about what’s missing. There are no public technical specs: no process node, no memory, no bandwidth, no raw performance. The “~50% cheaper” is OpenAI’s figure, with no independent benchmark yet. And announcing is one thing; deploying at gigawatt scale with reliability is another. The history of custom chips is full of brilliant announcements that took their time to deliver.

My take

Jalapeño looks like the right move in the right place. Going after inference rather than training is pragmatic: it’s where the most recurring spend is and where a 50% saving translates directly into margin. The 9-month cycle, accelerated with their own models, is also a sign of where all this is heading. I don’t expect NVIDIA to tremble tomorrow; training is still theirs. But inference is an increasingly contested field, and OpenAI just planted its flag with its own silicon. We’ll need to see the specs and the real deployments, but the direction is unmistakable.