OpenAI and Broadcom Debut Inference Chip Jalapeño, Custom Silicon Era Begins in Earnest
OpenAI on Tuesday pulled the curtain back on Jalapeño, a custom-designed accelerator built in partnership with Broadcom and aimed squarely at one job: running large language models in production. Unlike the general-purpose GPUs that have powered the generative AI boom so far, Jalapeño is an inference-only chip, stripped of the heavy training circuitry that adds cost and complexity. The pitch is straightforward. The vast majority of compute that a company like OpenAI burns through is no longer spent teaching models but answering the billions of queries that flow in every day, and a chip tuned for exactly that workload promises to do it faster, cooler, and far more cheaply than the hardware it would replace.
The collaboration with Broadcom is telling. Rather than attempt to become a chip manufacturer overnight, OpenAI has leaned on a partner with deep experience in custom silicon and high-speed networking, the same playbook Google followed with its TPUs and Amazon with Inferentia. Broadcom supplies the design expertise and the connective tissue that lets thousands of these accelerators work in concert inside a data center, while OpenAI contributes intimate knowledge of how its own models actually behave under load. That feedback loop, the ability to shape the silicon around the quirks of GPT-class transformers instead of bending the software to fit someone else's hardware, is precisely what off-the-shelf parts cannot offer.
Beneath the engineering lies a clear strategic calculation. For years OpenAI has been one of Nvidia's largest and most visible customers, and that reliance has come with painful trade-offs: scarce supply, premium pricing, and a roadmap set by another company. By bringing inference in-house, OpenAI gains leverage over the single largest line item in its cost structure and insulates itself from the supply crunches that have repeatedly throttled the entire industry. It also joins a widening movement, with Google, Amazon, Microsoft, and Meta all racing to design their own accelerators, that is quietly redrawing the balance of power in AI infrastructure away from a single dominant supplier.
Whether Jalapeño lives up to its billing will depend on numbers OpenAI has not yet fully disclosed, chief among them real-world cost per token and how quickly the company can manufacture and deploy the chip at scale. Custom silicon is notoriously hard to get right, and a first-generation part rarely topples an incumbent on its own. But the direction is unmistakable. As inference becomes the dominant expense of operating frontier models, controlling the hardware that performs it is no longer a luxury but a competitive necessity, and OpenAI has just declared that it intends to own that layer of its own stack.