OpenAI's road to becoming a hyperscaler
TSMC's dilemma, OpenAI or Oracle, prediction on ambient computing
In this week’s post:
TSMC’s dilemma, OpenAI’s hyperscaler dreams, and ambient computing
Last week, OpenAI and Broadcom confirmed a custom AI chip partnership that had been rumored since 2023. OpenAI will buy 10GW of custom AI chips from Broadcom over the next four years, starting in the second half of 2026.
OpenAI will effectively be triple-sourcing its long-term AI chip supply: NVIDIA, AMD, and Broadcom.
Remember: NVIDIA, AMD, and Broadcom all have the vast majority of their AI accelerator chips fabricated by TSMC.
TSMC’s dilemma
If OpenAI needs one to two orders of magnitude more chips, from ~2GW today, to 20GW over the next five years, to 250GW in ten years, where will the new fabs come from?
As of last week’s earnings, TSMC has not yet announced any increase to its rate of capital expenditures since the OpenAI deals.
Tricky business for TSMC. It faces a few strategic dilemmas going forward:
Managing supply distribution to top customers: In the short term, TSMC continues to have an effective monopoly on leading-edge chip fabrication.
Now with one end-user (OpenAI) exerting demand pressure via 3 vendors (NVIDIA, AMD, and Broadcom):
How can TSMC effectively price and manage supply allocations to its top customers?
Will this start a bidding war?
Taiwan vs. U.S. domestic production: TSMC and Samsung (TSMC’s closest competitor) both have operating (or close-to-operating) chip fabs in the U.S.
Yet TSMC continues to call out that U.S. fabs are more expensive1 to operate, diluting gross margins by 200 to 400 basis points (roughly 2 to 4 percentage points).
How will TSMC manage CapEx discipline overseas vs. Taiwan in the face of a more competitive pricing environment?
Will TSMC be stuck investing in the U.S. for geopolitical reasons, giving up gross margin over the long term?
Intentional undersupply or risk management?: C.C. Wei (TSMC CEO & Chairman) recently said:
I believe we are just in the early stage of the AI application. So very hard to make right forecast at this moment.
For each 1GW increase in AI data center buildout, TSMC estimates it costs its customers ~$50 billion in investments.
Yet TSMC's 2025 CapEx guidance is ~$42 billion. If its 2026 guidance is in line with management's comments and historical trends, it will land closer to ~$42-50 billion2.
OpenAI alone expects to build out 20GW of capacity over the next four years, which implies roughly ~$1 trillion of total investment (rough arithmetic below).
Does TSMC believe in the AI demand ramp, and will that belief show up in its CapEx guidance next year? Or is it intentionally undersupplying as a risk-management strategy?
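As a minimal sketch of the arithmetic behind these figures: the inputs below are the rough estimates quoted in this post (TSMC's ~$50 billion-per-GW comment, OpenAI's 20GW target, TSMC's ~$42 billion 2025 CapEx), not official guidance, and the $50B/GW covers customers' total buildout (chips, systems, data centers), not just TSMC wafers, so this is only a comparison of scale.

```python
# Back-of-the-envelope check on the figures quoted above.
# All inputs are the rough estimates cited in this post, not official guidance.

COST_PER_GW_USD_B = 50        # TSMC estimate: ~$50B of customer investment per 1GW of AI data center
OPENAI_TARGET_GW = 20         # OpenAI's stated buildout target over the next ~4 years
TSMC_CAPEX_2025_USD_B = 42    # TSMC's ~$42B CapEx guidance for 2025

implied_openai_spend_b = OPENAI_TARGET_GW * COST_PER_GW_USD_B
print(f"Implied OpenAI buildout cost: ~${implied_openai_spend_b}B (~$1 trillion)")

# Note: the $50B/GW is total customer investment (chips, racks, data centers),
# not spend flowing to TSMC; the ratio below only compares orders of magnitude.
ratio = implied_openai_spend_b / TSMC_CAPEX_2025_USD_B
print(f"That is ~{ratio:.0f}x TSMC's 2025 CapEx guidance")
```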
OpenAI’s hyperscaler dreams via vertical integration
Broadcom is the chip supplier behind Google's TPUs, Meta's custom AI chips (MTIA), and Apple's not-yet-released AI server chips.
Custom chips for cloud workloads are not new. The cloud computing hyperscalers—Amazon Web Services, Microsoft Azure, and Google Cloud Platform—all deploy custom chips in their cloud services. These custom chips have mostly targeted CPU workloads, but increasingly target GPU workloads.
Effectively, OpenAI chose to pursue the same vertical integration strategy as the cloud computing giants. It is trying to become a hyperscaler.
But will Oracle be the new hyperscaler instead?
The hyperscalers (Amazon, Google, Microsoft, and Meta) all build their custom chips to their own specifications for maximum efficiency. The chip designs take into account the data center's design, configuration, and operations.
Ultimately, cloud computing and hyperscaling are economies-of-scale businesses. Managing this long-lead-time, capital-intensive business requires showing that compute investments have increasing returns to scale, meaning each additional dollar of investment should produce growing returns.
But OpenAI's cloud computing (including GPU compute) is primarily served via Microsoft Azure and Oracle's cloud services3; today, most of its workloads still run on Azure.
Oracle and Azure will no doubt accrue economies-of-scale savings as they expand and operate their data centers. But they have no mandate to pass those savings on to OpenAI.
Even if AI demand explodes as planned, OpenAI could be left holding the bag, kneecapping its long-term strategic position.
Instead, Oracle and Larry Ellison may emerge as the new hyperscaler, capturing a stranglehold on the AI computing value chain, just as they did in the database business three decades ago.
Always-on agents: ambient computing will need more compute
In OpenAI’s Broadcom announcement podcast (the full transcript is reproduced at the end of this post), Greg Brockman revealed some of his thinking on agents over the longer term:
Our intent is to turn ChatGPT into something that helps you achieve your goals. The thing is, we can only release [ChatGPT Pulse] to the pro tier because that’s the amount of compute that we have available. Ideally, everyone would have an agent that’s running for them 24/7 behind the scenes, helping them achieve their goals.
Ideally, everyone has their own accelerator, has their own compute power that’s just running constantly.
That means with 10 billion humans, we are nowhere near being able to build 10 billion chips.
Having an ambient AI agent running all the time seems like a no-brainer. Yet there are not nearly enough compute resources for it to become a reality, let alone be cost-effective.
For an ambient agent to be productive, it needs to observe, listen, and react in real-time.
Today, the closest offering to a realtime AI mode is OpenAI's Realtime API.
It is also one of OpenAI's most expensive API offerings, with output priced at nearly 6x GPT-5's rate.
For audio, it costs $32 per 1M input tokens and $64 per 1M output tokens, compared to GPT-5's $1.25 per 1M input and $10 per 1M output.
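To see why that pricing makes a 24/7 ambient agent hard to justify today, here is a rough sketch. The per-token prices are the ones quoted above; the tokens-per-hour usage rate is a hypothetical assumption chosen only for illustration.

```python
# Illustrative cost sketch: realtime audio pricing vs. GPT-5 text pricing.
# Prices are the published per-1M-token rates quoted in this post;
# the token-per-hour usage rate is an assumption for illustration only.

REALTIME_AUDIO = {"input": 32.0, "output": 64.0}   # $ per 1M tokens
GPT5_TEXT      = {"input": 1.25, "output": 10.0}   # $ per 1M tokens

print(f"Output price ratio: {REALTIME_AUDIO['output'] / GPT5_TEXT['output']:.1f}x")  # ~6.4x

# Hypothetical always-on agent: assume 100k input + 20k output tokens per hour, 24/7.
in_tok_per_day, out_tok_per_day = 100_000 * 24, 20_000 * 24

def daily_cost(prices: dict) -> float:
    # Convert the assumed daily token volume into dollars at the given rates.
    return (in_tok_per_day * prices["input"] + out_tok_per_day * prices["output"]) / 1_000_000

print(f"Realtime audio: ~${daily_cost(REALTIME_AUDIO):.2f}/day per user")
print(f"GPT-5 text:     ~${daily_cost(GPT5_TEXT):.2f}/day per user")
```

Under these assumed usage rates, an always-on realtime audio agent would cost on the order of $100 per user per day, versus single-digit dollars at text pricing, which is why only the Pro tier gets Pulse today.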
The high price reflects the amount of hardware that must be held back to serve an AI model in realtime (with very low latency).
As I’ve written before, when running AI models, there is a trade-off between latency (speed for the model to respond) and throughput (how many responses can be processed simultaneously).
Having higher throughput is more efficient, thus less costly for a model provider.
Faster responses (low latency) require very small batch sizes, and therefore very low throughput.
Physically, that means the model service provider is holding GPU chips idle (or below optimal utilization) in order to guarantee a very timely response.
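A toy model of that trade-off is sketched below. The GPU hourly price, decode speed, and sub-linear scaling exponent are all illustrative assumptions, not measurements of any real serving stack; the point is only the shape of the curve.

```python
# Toy model of the batching trade-off: larger batches raise total GPU throughput
# (and lower cost per token), but each individual request gets served more slowly.
# All numbers are illustrative assumptions, not benchmarks of any real system.

GPU_COST_PER_HOUR = 3.0          # assumed hourly cost of one GPU, in dollars
TOKENS_PER_SEC_SINGLE = 60       # assumed decode speed when serving one request alone

def serve(batch_size: int):
    # Assume total throughput grows sub-linearly with batch size (bandwidth limits),
    # so each request's share of the GPU shrinks as the batch grows.
    total_tokens_per_sec = TOKENS_PER_SEC_SINGLE * batch_size ** 0.8
    per_request_tokens_per_sec = total_tokens_per_sec / batch_size
    cost_per_million_tokens = GPU_COST_PER_HOUR / 3600 / total_tokens_per_sec * 1_000_000
    return per_request_tokens_per_sec, cost_per_million_tokens

for batch in (1, 8, 64):
    speed, cost = serve(batch)
    print(f"batch={batch:3d}  per-request speed={speed:5.1f} tok/s  cost=${cost:6.2f}/1M tok")
```

In this toy model, batch size 1 gives the fastest per-request responses but the highest cost per token, while large batches are far cheaper per token but slower for each user, which is the economics behind realtime pricing.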
Lastly, when the need for low latency (realtime) is combined with thinking (reasoning mode), the compute requirements for ambient AI compound quickly.
Today’s OpenAI realtime model has no reasoning function, partially for this reason.
Reasoning lengthens the conversation's context, and the compute required for the model to keep responding grows superlinearly with that context, roughly quadratically once attention over the history dominates.
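To make that scaling concrete, here is a minimal sketch using placeholder model dimensions (not any real OpenAI model). It counts only the attention-over-context cost, the component that grows with conversation length, using the standard rough FLOP approximation.

```python
# Rough sketch of why a growing context is punishing for an always-on agent.
# Uses the standard approximation that attention cost per generated token scales
# with the current context length, so total attention cost over a conversation
# grows roughly quadratically. Dimensions below are placeholders, not a real model.

D_MODEL, N_LAYERS = 4096, 48     # placeholder transformer dimensions

def attention_flops_for_conversation(total_tokens: int) -> float:
    # Generating token t attends over the t tokens already in context, costing
    # ~4 * d_model * t FLOPs per layer (scores + weighted value sum).
    # Summing over t = 1..T gives ~2 * d_model * T^2 per layer.
    return 2 * D_MODEL * N_LAYERS * total_tokens ** 2

base = attention_flops_for_conversation(10_000)
for ctx in (10_000, 50_000, 200_000):
    flops = attention_flops_for_conversation(ctx)
    print(f"context={ctx:7d} tokens  attention FLOPs ~ {flops:.2e}  ({flops / base:.0f}x the 10k case)")
```

Going from a 10k-token to a 200k-token conversation (a 20x increase in context) raises the attention cost by roughly 400x in this approximation, which is why long-running reasoning and realtime serving are such an expensive combination.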
Here, I make a prediction: always-on agents (ambient computing) are the missing UX (user experience) component in today's AI paradigm. But because the technology is bottlenecked by the availability and cost of compute, its effects can only diffuse slowly over time.
Once realtime agents can be served as cheaply as Google search (marginal cost of near zero for Google), then ambient computing can diffuse quickly across the internet.
The following is a transcript of an OpenAI Podcast episode.
The host’s questions are in bold. The other speakers are:
SA = Sam Altman (OpenAI)
GB = Greg Brockman (OpenAI)
HT = Hock Tan (Broadcom)
CK = Charlie Kawwas (Broadcom)
The transcript is lightly edited for content and clarity.
What are we talking about today? What brought you all together?
SA: Today we’re announcing a partnership between Broadcom and OpenAI. We’ve been working together for about the last 18 months designing a new custom chip.
More recently, we’ve also started working on a whole custom system; these things have gotten so complex you need the whole thing.
We will be starting in late next year, deploying 10 gigawatts of these racks of these systems and our chip, which is a gigantic amount of computing infrastructure to serve the needs of the world to use advanced intelligence.
So this is going to entail both compute and chip design and scaling out?
SA: This is a full system. We closely collaborated for a while on designing a chip that is specific for our workloads.
When it became clear to us just how much inference capacity the world was going to need, we began to think about whether we could do a chip that was meant just for that kind of a very specific workload. Broadcom is the best partner in the world for that, obviously.
Then to our great surprise—this was not the way we started—as we realized that we were going to need the whole system together to support this as this got more and more complex, it turns out Broadcom is also incredible at helping design systems. So we are working together on that entire package, and this will help us even further increase the amount of capacity we can offer for our services.
Hock, how did this come about? When did you first talk about working together on this?
HT: Other than the fact that Sam and Greg are great people to work with, it’s a natural fit because OpenAI has been doing and continues to do the most advanced frontier models in generative AI out there.
As part of it, you continue to need compute capacity—the best, latest compute capacity as you progress in a roadmap towards a better and better frontier model and towards super intelligence.
Compute is a key part, and that comes with semiconductors and, as Sam indicated, more than semiconductors. We are, even though I say it myself, probably the best semiconductor company out there. AI is a very exciting opportunity for us; my engineers are pushing the innovation envelope on newer and newer generations of semiconductor technology. For us, collaborating with the best generative AI company out there is a natural fit.
Vertical Integration and Scale
And this isn’t just chips, it’s going out to scale—like 10 gigawatts. I have trouble kind of even understanding that. What does that even mean when you’re talking about 10 gigawatts?
SA: The vertical integration point is really important. We are able to think from etching the transistors all the way up to the token that comes out when you ask ChatGPT a question and design the whole system: all of the stuff about the chip, the way we design these racks, the networking between them, how the algorithms that we’re using fit the inference chip itself—a lot of other stuff all the way to the end product. By being able to optimize across that entire stack, we can get huge efficiency gains, and that will lead to much better performance—faster models, cheaper models, all of that.
As you get that better performance and cheaper and smarter models, one thing that we have consistently seen is people just want to use way more. We used to think we’ll optimize things by 10x and we’ll solve all of our problems, but you optimize by 10x and there’s 20x more demand. So 10 incremental gigawatts—this is all on top of what we’re already doing with other partners and all the other data centers and silicon partnerships we’ve done. Ten gigawatts is a gigantic amount of capacity. And yet, if we do as good of a job as we hope, even though it’s vastly more than the world has today, we expect that with very high-quality intelligence delivered very fast and at very low price, the world will absorb it super fast and find incredible new things to use it for.
The hope is that the kinds of things people are doing now with this compute—writing code, automating more of enterprises, generating videos in Sora, whatever it is—they will be able to do that much more of it and with much smarter models.
AI Designing AI Chips
Greg and Charlie, when you think about historically when people have tried to develop chips or hardware to suit whatever was the current mode for using computing, what examples have you looked upon historically to figure out how to plan forward? What’s been inspiring you?
GB: The number one thing, honestly, is working with good partners. It’s very clear that we as a company are not able to do everything ourselves. Getting into actually building our own chips for our own specific workloads was not something we could do from a total standstill without working with Hock and Charlie and Broadcom. It’s been really incredible to lean on their expertise, together with our understanding of the workload.
It’s been very interesting to see the places where OpenAI is able to do things very differently from the rest of the industry or the way that things would historically be done.
For example, we’ve been able to apply our own models to designing this chip, which has been really cool. We’ve been able to pull in the schedule; we’ve been able to get massive area reductions. You take components that humans have already optimized and just pour compute into it, and the model comes up with its own optimizations.
We’re at the point now where I don’t think any of the optimizations we have are ones that human designers couldn’t have come up with. Usually our experts take a look at it later and say, “Yeah, this was on my list, but it was like 20 things that would have taken them another month to get to”. It’s really interesting that we were coming up on a deadline working with Charlie’s team and we were running optimizations. We had a choice: do we actually take a look at what those optimizations were, or do we just keep going until the deadline and then take a look after? We decided, of course, you got to just keep going.
We’ve really been building up this expertise in-house to understand this domain, and that’s something we think can help lift up the whole industry. I think we are heading to a world where AI intelligence is able to help humanity make new breakthroughs that just would not be possible otherwise, and we’re going to need just as much compute as possible to power that.
One very concrete example is that we are in a world now where ChatGPT is changing from something that you talk to interactively to something that can go do work for you behind the scenes.
If you’ve used features like Pulse, you wake up every morning and it has some really interesting things that are related to what you’re interested in; it’s very personalized.
Our intent is to turn ChatGPT into something that helps you achieve your goals. The thing is, we can only release this to the pro tier because that’s the amount of compute that we have available. Ideally, everyone would have an agent that’s running for them 24/7 behind the scenes, helping them achieve their goals.
Ideally, everyone has their own accelerator, has their own compute power that’s just running constantly.
That means with 10 billion humans, we are nowhere near being able to build 10 billion chips.
There’s a long way to go before we are able to saturate not just the demand, but what humanity really deserves.
CK: For us, it’s been absolutely exciting and refreshing because the beauty of the work we do together is the focus on a certain workload.
We started by first looking at the IP and AI accelerator, which is what we call the XPU. Then we realized very quickly that we can now go from the workload all the way down to the transistor. As Greg was just explaining, we can both work together to customize that platform for your workload, resulting in the best platform in the world. Then we realized, as Sam was saying, it’s not just that XPU or accelerator; it’s the networking that needs to scale it up, scale it out, and scale it across.
Suddenly we saw that we can actually drive the next level of standardization and openness that not only benefits us, but I think will benefit the entire ecosystem and get Gen AI to an AGI much faster. So, very excited about the technical capabilities of the teams we have, but also the vision and the speed at which we’ve been moving.
The Scale of the Mission
I’m still kind of wrapping my head around the scale of it. This is a global effort. What comparisons have you been able to draw for this to other examples in history?
SA: I always think the historical analogies are tough, but as far as I know... I don’t know what fraction of global GDP building the Great Wall was at the time, but in a lot of ways that you would look at the AI infrastructure build-out right now, you would say it’s the biggest joint industrial project in human history. This requires a lot of companies, a lot of countries, a lot of industries to come together. A lot of stuff has to happen at the same time, and we’ve all got to invest together.
At this point, given everything we see coming on the research front and all of the value we see being created on the business front, I think the whole industry has decided this is a very good bet to take. It is huge. You go to one of these one-gigawatt data centers, and you look at the scale of what’s happening there—it’s like a tiny city. It’s a big, complex thing.
GB: To the point of this being a massive collaborative project, whenever I call Charlie, he’s in a different part of the world trying to secure capacity, trying to find a way to help us build what we’re trying to do together.
CK: One of the coolest things I was thinking about is what we’re doing together in this wonderful partnership: we’re defining civilization’s next-generation operating system. And we’re doing it at the transistor level, building new fabs, new manufacturing sites, all the way to building these racks and ultimately the 10 gigawatts of data centers you’re talking about.
It’s an important thing to keep track of. Often people get fixated just on the chips themselves, and it’s kind of like thinking the National Highway Project was about selling asphalt, or railroads are about steel. In reality, it’s the things that become possible on top of that.
HT: This is like railroads, the internet. That’s what I think this is becoming over time: critical infrastructure or a critical utility. And more than just critical utility for, say, 10,000 enterprises; this is a critical utility over time for 8 billion people globally. It’s like the industrial revolution of a different sort. But it cannot be done with just one party, or we like to think it can be done with two, but it needs a lot of partnerships; it needs collaboration across an ecosystem. Because of that, it’s important to create, much as we say about developing chips for specific workloads and applications, standards that are open and more transparent for all to use, because you need to build up a whole infrastructure to become a critical utility for 6 billion people in the world. We’re very excited, which is why we think we make great partners, because I think we share the same conviction.
It is about scaling computing to create breakthroughs in super intelligence and models. It’s building the foundation of that.
The Drive for Custom Silicon
You guys have a lot on your plate. Why design chips now?
GB: This project, we’ve probably been working on it for 18 months now, and it’s moved incredibly quickly. We’ve hired some really amazing people. We found that we have a deep understanding of the workload, and we work with a number of parties across the ecosystem.
There’s a number of chips out there that I think are really incredible, and there’s a niche for each one. We’ve really been looking for specific workloads that we feel are underserved and how can we build something that will be able to accelerate what’s possible.
That ability to say that we are able to do the full vertical integration for something we see coming, but it’s hard for us to work through other partners—that’s a very clear use case for this kind of project.
HT: Yes, and more than that. Computing is a big part of what’s gating this journey towards super intelligence, towards creating better and better frontier models. A lot of it comes down to computing—and not just any computing, but computing that is effective, high performance, and efficient, especially on power. What Greg is saying is exactly what we learned and saw here.
For instance, if you want to train, you design chips that are much stronger in computing capacity, measured in TFLOPs, as well as network, because it’s not just one chip that makes it happen; it’s a cluster.
But if you want to do inference, you put in more memory and memory access relative to compute. So you are, over time, creating chips optimized for particular workloads and applications as we go along. That, at the end of the day, is what will create the most effective models on a platform that you want to create end-to-end.
GB: And also, one piece of historical context is that when we started OpenAI, we didn’t really have that much of a focus on compute. We felt that the path to AGI is really about ideas, it’s about tryouts and stuff.
Eventually, we’ll put the right conceptual pieces in place, and then AGI. About two years in, in 2017, we found that we were getting the best results out of scale. It wasn’t something we set out to prove; it was something we discovered empirically because everything else didn’t work nearly as well. The first results were scaling up our reinforcement learning in the context of the video game Dota 2.
GB: Did you guys pay attention to the Dota 2 project back in the day? It was a super cool project.
We really saw if you scale it by 2x, suddenly your agent is 2x better. It’s like, okay, we have to push this to the limit. At that point, we started paying attention to the whole ecosystem. There were all sorts of chip startups with novel approaches that were very different from GPUs. We started giving them a ton of feedback saying, here’s where we think things are going; it needs to be models of this shape. Honestly, a lot of them just didn’t listen to us. It’s very frustrating to be in this position where you say, “We see the direction the future should be going,” but we have no ability to really influence it besides trying to influence other people’s roadmaps.
By being able to take some of this in-house, we feel like we are able to actually realize that vision, and again, in a way that we hope we can show a direction and other people will fill in. Because the amount of compute required to bring our vision of AGI to the world—10 gigawatts is not enough. That is a drop in the bucket compared to where we need to go.
SA: It’s a big drop...
Looking Ahead
What becomes possible with this when you’re building your own chips for inference and for training? Where can you take this?
SA: To zoom out a little bit, if you simplify what we do in this whole process to: melt sand, run energy through it, and get intelligence out the other end...
What we want is the most intelligence we can get out of each unit of energy, because that will become the gate at some point. I hope what this whole process will show us—from the model we design to the chip to the rack—is that we will be able to wring out so much more intelligence per watt. Then everybody that’s using these models in all of these incredible ways will do so much with it.
HT: And you control your own destiny. If you do your own chips, you control your destiny.
It’s interesting to think about how the things that we’re doing today are pretty amazing, but we’re using stuff that wasn’t actually designed specifically for the way we’re doing it.
SA: The GPUs of today are incredible things. I’m very grateful, and we will continue to need a lot of those. The flexibility and the ability to let us do fast research is amazing. But you are right that as we get more confident in what the shape of the future is going to look like, a very optimized system to the workload will let us wring more out per watt.
CK: And it’s a long journey that takes decades. If you go back to Hock’s example, take railroads, it took about a century to roll it out as a critical infrastructure. The internet took about 30 years.
This is not going to take five years; it’s going to take a long time. As we collectively, especially with this partnership, continue to figure out ways to wring out more tokens, we’ll discover that for this training or research, maybe a GPU is great, or maybe we can take what we’re doing with Greg. It’s actually a platform that allows you, like a Lego block, to take things in and out. Suddenly we can get another XPU or an accelerator for next-gen that’s targeted at training or inference or research.
GB: To Sam's point that GPUs have come an incredible way: in 2017, when we started looking at all these other accelerators, it was very non-obvious what the landscape would look like in 5 or 10 years. I think it's really a testament to companies like NVIDIA and AMD for how much the GPU has just moved forward and continued to be the dominant accelerator. But at the same time, there's a massive design space out there, and what we see is workloads that are not served through existing platforms. That's where that full vertical integration is something unique.
SA: The first cluster OpenAI had, the first one that I can remember the energy size for, was 2 megawatts. We got things done with those two.
I don’t remember when we got to 20; I remember when we got to 200. We will finish this year a little bit over 2 gigawatts, and these recent partnerships will take us close to 30.
The world has done far more than I thought they were going to do. It turns out you can serve 10% of the world’s population with ChatGPT and do the research and do Sora and do our API and a few other things on 2 gigawatts. But think about how much more the world would like to do than they get to do right now.
If we had 30 gigawatts today with today’s quality of models, I think you would still saturate that relatively quickly in terms of what people would do, especially with the lower cost we’ll be able to do with this.
But the thing we have learned again and again is, let’s say we can push GPT-6 to feel like 30 IQ points past GPT-5—something big. And that it can work on problems not for a few hours, but for a few days, weeks, months, whatever. While we do that, we bring the cost per token down.
The amount of economic value and surplus demand that happens each time we’ve been able to do that goes up a crazy amount. To pick a well-known example, when ChatGPT could write a little bit of code, people actually used it for that. They would very painfully paste in their code and wait and say, “Do this for me,” and paste it back in. Models couldn’t do much, but they could do a few things. The models got better, the UX got better, and now we have Codex. Codex is growing unbelievably fast and can now do a few hours of work at a higher level of capability. When that’s possible, the demand increase is crazy. Maybe the next version of Codex can do a few days of work at the level of one of the best engineers you know—or maybe that takes a few more versions, whatever, it’ll get there. Think how much demand there will be just for that, and then do it for every knowledge work industry.
GB: One way I like to think of it is that intelligence is the fundamental driver of economic growth, of increasing the standard of living for everyone. What we’re doing with AI is actually bringing more intelligence and amplifying the intelligence of everyone. As these models get better, everyone’s going to become more productive, and the output of what is possible is going to be totally different from what exists today.
And is that a motivating factor for you, the fact that every time you create these new efficiencies, it just benefits so many more people?
HT: From our side on hardware and compute capacity, where the rubber hits the road on this, it’s really incumbent on us to keep optimizing, pushing the envelope on leading-edge technology. There’s still room to go, even from where we are as we go from two nanometers forward, and even smaller than two nanometers, as we start doing all kinds of different technology. It is really great, exciting times, especially for the hardware and the semiconductor industry.
SA: What Broadcom has done here is really quite incredible. It used to be extremely difficult for a company like ours to think about making a competitive chip; in fact, so hard we just wouldn’t have done it. I think a lot of other companies wouldn’t have done it as well. This customized chip and system to a workload just wouldn’t be a thing in the world. The fact that they have pushed so hard and so well on making it so that a company can partner with them and they can do a miracle of a technology chip quickly and at scale... unfortunately they do it for all of our competitors too, but hopefully our chip will be the best. It’s really quite incredible.
GB: And not just what they can do for us today, but looking at the upcoming roadmap, it’s so exciting the kinds of technologies that they’re going to be able to bring to bear for us to be able to utilize.
HT: It’s just the excitement of enabling and collaboratively building models—ChatGPT-5, 6, 7, on and on. Each of them will require a different chip, a better chip, a more developed, advanced chip that we haven’t even begun to figure out how to get to, but we will.
CK: We’re actually looking forward to that because my software engineers now already use that from a software point of view, and it’s delivering efficiencies of dozens of engineers. On the hardware side, we’re not there yet.
But with respect to compute, when we started building these XPUs, you can build a maximum certain number of compute units in 800 square millimeters. Today, we’re working together to ship multiple of these in a two-dimensional space. The next thing we’re talking about is stacking these into the same chip, so now we’re going in the Y or Z dimension. Then the last step we’re also talking about is bringing optics into this, which is what we just announced: 100 terabits of switching with optics integrated all into the same chip.
These are the technologies that will take compute, the size of the cluster, the total performance and wattage of the cluster to a whole new level. I think it will keep doubling at least every six to 12 months.
What kind of timeframe are we talking about? When are we going to first start to see what’s coming out of the relationship?
SA: End of next year, and then we’ll deploy very rapidly over the next three years.
CK: Greg and I are talking about this at least once a week. We just had a chat earlier today on this.
GB: We’re really excited to get silicon back starting very soon, actually. My view of this whole project is it’s not easy. It’s easy to just say, “Oh, yeah, 10 gigawatts,” but when you look at what is required to actually design a whole new chip and deliver this at scale, get the whole thing working end to end, it’s an astronomical amount of work. I would say that we’re very serious. Our mission is to ensure that AGI benefits all of humanity, and we’re very serious about “benefits everyone”. We really want this to be a technology that is accessible to the whole world, that lifts up everyone. You can really see that in trying to make the world be one of compute abundance, because by default, we’re heading towards one that is quite compute scarce.
My wife feels it when she’s trying to get more Sora credits; it feels very scarce.
GB: We feel it so concretely. Teams within OpenAI, their output is a direct function of how much compute they get. The intensity on who gets the compute allocation is so extreme. What we really want is to be in a world where if you have an idea, you want to create, you want to build something, you have the compute power behind you to make it happen.
Gentlemen, thank you very much for sharing this with us. It’s going to be very exciting to see where this goes, and I hope we can keep talking about this as it continues to develop.
SA: Thank you guys for the partnership.
HT: Thank you. Thank you for the partnership. We’re really enjoying it.
GB: We are too.
1. “The company said the cost of its overseas expansion continued to dilute its gross margin, which was 59.5 per cent.”
2. TSMC has not released 2026 CapEx guidance; it is expected to announce guidance in January 2026.
3. At this point, OpenAI sources its cloud computing services from multiple providers: Microsoft Azure, Oracle Cloud, and Google Cloud.