
Jalapeño: OpenAI's first chip goes straight for inference (and NVIDIA)

OpenAI unveiled Jalapeño, its first custom chip co-designed with Broadcom. It is not for training: it is an LLM inference accelerator promising ~50% lower cost, built in 9 months, aimed squarely at NVIDIA's dominance.


Giovanni Moreno

AI/ML Engineer & Backend Architect

June 23, 2026 · 3 min read

OpenAI just unveiled Jalapeño, its first in-house chip, co-designed with Broadcom. The name is playful, but the move is a serious one: OpenAI stops being just a consumer of someone else’s silicon and starts building its own full stack. It’s worth being clear about what it is and what it isn’t.

What Jalapeño is (and isn’t)

First, to avoid the easy headline: Jalapeño is not a training chip. It is an inference accelerator optimized for LLMs, what OpenAI calls an “Intelligence Processor.” In other words, it isn’t meant to train the next GPT from scratch; it’s meant to run already-trained models faster and, above all, more cheaply. In practice, it’s designed so ChatGPT and the API can answer millions of queries with better economics.

That distinction matters. Training remains NVIDIA’s undisputed kingdom. But inference (serving the model in production, day after day) is where the bulk of recurring spend lives at scale. Going after that is a smart play.
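To make the “recurring spend” argument concrete, here is a minimal Python sketch of serving economics. Every input in it (price per query, cost per query, daily query volume) is a hypothetical placeholder, not an OpenAI figure; only the ~50% cost reduction echoes OpenAI’s own claim. The class name `ServingScenario` is likewise made up for illustration. The point it shows: at a fixed price, whatever you save per query flows straight into margin, and that saving repeats every day the model is served.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ServingScenario:
    """Hypothetical serving economics. None of these figures come from OpenAI."""
    price_per_query: float  # revenue per query in dollars (assumed)
    cost_per_query: float   # hardware + energy cost per query in dollars (assumed)
    queries_per_day: int    # daily query volume (assumed)

    def daily_margin(self) -> float:
        """Gross margin per day: (price - cost) * volume."""
        return (self.price_per_query - self.cost_per_query) * self.queries_per_day

    def with_cost_reduction(self, fraction: float) -> "ServingScenario":
        """Return a copy with the per-query cost cut by `fraction` (0.5 means 50%)."""
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be between 0 and 1")
        return replace(self, cost_per_query=self.cost_per_query * (1 - fraction))


# Made-up inputs, chosen only to make the arithmetic easy to follow.
baseline = ServingScenario(price_per_query=0.010, cost_per_query=0.006, queries_per_day=1_000_000_000)
cheaper = baseline.with_cost_reduction(0.5)  # the claimed ~50% cut, applied to serving cost only

print(f"baseline margin per day:      ${baseline.daily_margin():,.0f}")
print(f"with ~50% cheaper inference:  ${cheaper.daily_margin():,.0f}")
```

With these made-up inputs, halving the per-query cost takes daily gross margin from $4,000,000 to $7,000,000. The $3,000,000 saved per day is pure margin, and unlike a one-off training run it accrues for as long as the model stays in production.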

The numbers they did give

OpenAI was sparse on hardware specs (no public figures for memory, HBM, TFLOPs or power draw), but it dropped three numbers that tell the story:

The real target: NVIDIA

All the context points the same way. Jalapeño is, plainly, a move to reduce OpenAI’s dependence on NVIDIA. The press described it as a rival to Blackwell, as a way to cover the capacity NVIDIA can’t supply, and as a way to put NVIDIA’s pricing power in inference on notice.

It’s the same play Google (TPU), Amazon (Trainium/Inferentia) and Microsoft (Maia) have already made: if you depend on a single supplier for your most expensive input, you design your own silicon to claw back margin and control. OpenAI is late to that party, but it arrives with an inference volume that more than justifies the effort.

What we still don’t know

Let’s be honest about what’s missing. There are no public technical specs: no process node, no memory, no bandwidth, no raw performance. The “~50% cheaper” is OpenAI’s figure, with no independent benchmark yet. And announcing is one thing; deploying at gigawatt scale with reliability is another. The history of custom chips is full of brilliant announcements that took their time to deliver.

My take

Jalapeño looks like the right move in the right place. Going after inference, not training, is pragmatic: it’s where most of the recurring spend is and where a ~50% saving translates directly into margin. The 9-month design cycle, accelerated with OpenAI’s own models, is also a sign of where all this is heading. I don’t expect NVIDIA to tremble tomorrow; training is still theirs. But inference is an increasingly contested field, and OpenAI just planted its flag with its own silicon. We’ll need to see the specs and the real deployments, but the direction is unmistakable.


The author

Giovanni Moreno

Informatics Engineer with 3+ years building ML pipelines, NLP systems, and computer vision solutions. Currently engineering AIOps at IBM with Python, FastAPI, and Kubernetes on AWS.
