Comment For all the superlative-laden claims, OpenAI's new top model appears to be less of an advancement and more of a way to save on compute costs, something that hasn't exactly gone over well with the company's most devoted users.
As the flag bearer that kicked off the generative AI era, OpenAI is under considerable pressure not only to demonstrate technological advances, but also to justify its massive, multi-billion-dollar funding rounds by showing its business is growing.
To do that, OpenAI can either grow its user base, raise prices, or cut costs. Much of the industry has already aligned around its $20 and $200 a month pricing tiers, so OpenAI would need to offer something others can't to justify a premium, or risk losing customers to rivals such as Anthropic or Google.
With the academic year about to kick off, OpenAI is sure to pick up a fresh round of subscriptions as students file back into classrooms after the summer break. While more paying customers will mean more revenue, it also means higher compute costs.
Enter the cost-cutting era.
Perhaps the best evidence of cost-cutting is the fact that GPT-5 isn't actually one model. It's a collection of at least two: a lightweight LLM that can quickly respond to most requests, and a heavier-duty one designed to tackle more complex topics. Which model a prompt lands in is determined by a router model, which acts a bit like an intelligent load balancer for the platform as a whole. Image prompts go to an entirely different model, Image Gen 4o.
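To make the load-balancer analogy concrete, here is a toy sketch of the routing idea. This is not OpenAI's implementation; the scoring heuristic, thresholds, and model names are all invented for illustration. A real router would itself be a learned model, not a keyword check.

```python
# Toy illustration of prompt routing: score a prompt's apparent complexity,
# then dispatch it to a cheap fast model or an expensive reasoning model.
# All names and heuristics here are hypothetical.

def estimate_complexity(prompt: str) -> float:
    """Crude stand-in for a learned router: score a prompt from 0 to 1."""
    signals = ["prove", "step by step", "debug", "derive", "optimize"]
    score = min(len(prompt) / 2000, 0.5)  # longer prompts score higher
    score += 0.5 * any(s in prompt.lower() for s in signals)
    return min(score, 1.0)

def route(prompt: str) -> str:
    """Send easy prompts to the light model, hard ones to the heavy one."""
    return "heavy-reasoning-model" if estimate_complexity(prompt) > 0.4 else "light-fast-model"

print(route("What's the capital of France?"))       # light-fast-model
print(route("Prove this lemma step by step: ..."))  # heavy-reasoning-model
```

The economics only work if the threshold is tuned well: route too much traffic to the heavy model and the savings evaporate, route too little and quality suffers, which is exactly the failure mode described below.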
This is a departure from how OpenAI has operated in the past. Previously, Plus and Pro users could choose which model they wanted to use. If you wanted to ask o3 mundane questions that GPT-4o could have easily handled, you could.
In theory, OpenAI's router model should allow the bulk of GPT-5's traffic to be served by its smaller, less resource-intensive models.
We can see further evidence of cost-cutting in OpenAI's decision to toggle reasoning on and off automatically by default, depending on the complexity of the prompt. Freeloaders... we mean free-tier users, don't have the ability to toggle this themselves. The less reasoning the models do, the fewer tokens they generate and the cheaper they are to operate.
But while this approach may be smarter for OpenAI's bottom line, it doesn't seem to have made the models themselves all that much smarter. As we noted in our launch-day coverage, OpenAI's benchmarks show fairly modest gains compared to prior models. The biggest improvements were in tool calling and curbing hallucinations.

Your eyes aren't deceiving you: GPT-5 shows only iterative improvements in math benchmarks like AIME 2025
The new system relies on the routing model to direct prompts to the right language model, which, based on early feedback, hasn't been going all that well for OpenAI. According to Altman, on launch day GPT-5's routing functionality was broken, which made the model seem "way dumber" than it actually is.
Presumably that's why GPT-5 thought that "Blueberry" has only one B. It now appears that OpenAI has fixed that rather embarrassing mistake.
But since GPT-5's router is a separate model, the company can, at least, improve it.
Deprecating models
The router model isn't OpenAI's only cost-cutting measure. During the AI behemoth's launch event last week, execs revealed that they were so confident in GPT-5 that they were deprecating all prior models.
That didn't go over great with users, and CEO Sam Altman later admitted that OpenAI made a mistake when it elected to remove models like GPT-4o, which, despite its lack of reasoning capability and generally poorer benchmark performance, is apparently quite popular with end users and enterprises.
"If you have been following the GPT-5 rollout, one thing you might be noticing is how much of an attachment some people have to specific AI models. It feels different and stronger than the kinds of attachment people have had to previous kinds of technology (and so suddenly deprecating old models that users depended on in their workflows was a mistake)," he wrote.
Still, fewer models to wrangle means more resources to go around.
OpenAI doesn't disclose much technical detail about its internal (non-open-source) models, but if GPT-5 is anything like the dev's open-weights models, gpt-oss-20b and gpt-oss-120b, and it was quantized to MXFP4, OpenAI has good reason for wanting all those legacy GPTs gone.
As we recently explored, the data type can reduce the memory, bandwidth, and compute required by LLMs by up to 75 percent compared to using BF16.
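The 75 percent figure follows directly from the bit widths: BF16 stores 16 bits per weight, while MXFP4 stores roughly 4 (ignoring the small overhead of its shared block scales). A quick back-of-the-envelope calculation, using a 120-billion-parameter model purely as an example:

```python
# Weight-memory arithmetic for quantization. BF16 uses 16 bits per weight;
# MXFP4 uses roughly 4 (shared-scale overhead ignored for simplicity).

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Gigabytes needed to hold the weights alone, at the given precision."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

bf16 = weight_memory_gb(120, 16)   # a 120B-parameter model in BF16
mxfp4 = weight_memory_gb(120, 4)   # the same model in MXFP4
print(f"BF16: {bf16:.0f} GB, MXFP4: {mxfp4:.0f} GB")  # BF16: 240 GB, MXFP4: 60 GB
print(f"Savings: {1 - mxfp4 / bf16:.0%}")             # Savings: 75%
```

Serving every legacy model at full precision alongside GPT-5 would multiply that footprint, which is presumably why OpenAI wanted the old lineup gone.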
For now, OpenAI has restored GPT-4o for paying users, but we have little doubt that, once OpenAI figures out what makes the model so endearing and how to apply that to GPT-5, it will do just that.
Lack of context
In addition to architectural changes, OpenAI opted not to increase GPT-5's context window, which you can think of as its long-term memory. Free users are still limited to an 8,000-token context, while Plus and Pro users cap out at 128,000 tokens.
Compare that to Claude's Pro plan, which Anthropic prices similarly to OpenAI's Plus subscription, and which offers a 200,000-token context window. Google's Gemini supports contexts of up to 1 million tokens.
Larger contexts are great for searching or summarizing large volumes of text, but they also require huge amounts of memory. By sticking with smaller contexts, OpenAI can get by running its models on fewer GPUs.
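The memory pressure comes largely from the KV cache, which grows linearly with context length. A rough sketch of the math, using illustrative transformer dimensions (not GPT-5's actual architecture, which OpenAI hasn't disclosed):

```python
# Rough KV-cache arithmetic behind "larger contexts require huge amounts
# of memory". The layer count, head count, and head dimension below are
# illustrative placeholders, not GPT-5's real configuration.

def kv_cache_gb(seq_len: int, layers: int = 64, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """GB of KV cache for one sequence: 2x (keys and values) per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_elem / 1e9

for ctx in (8_000, 128_000, 1_000_000):
    print(f"{ctx:>9} tokens -> {kv_cache_gb(ctx):6.1f} GB per sequence")
```

Even with these modest placeholder dimensions, a million-token context costs over a hundred times the memory of an 8K one per concurrent user, so capping the window directly caps the GPU count.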
If OpenAI's claims about GPT-5 hallucinating up to 80 percent less than prior models are true, then we expect users will want larger context windows for document search.
With that said, if long contexts are important to you, the version of GPT-5 available via OpenAI's API supports context windows of up to 400,000 tokens, but you'll be paying a pretty penny if you actually want to take advantage of it.
Filling the context just once on GPT-5 will set you back about 50 cents, which can add up quickly if you plan to throw large documents at the model consistently.
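The 50-cent figure is straightforward arithmetic, assuming GPT-5's launch API price of $1.25 per million input tokens (check OpenAI's current pricing page, as rates change):

```python
# Cost of filling GPT-5's full API context window once.
# Price assumed from launch pricing; verify against current rates.

price_per_million_input = 1.25   # USD per million input tokens (assumed)
context_tokens = 400_000         # GPT-5's maximum API context window

cost = context_tokens / 1_000_000 * price_per_million_input
print(f"${cost:.2f} per full context fill")  # $0.50 per full context fill
```

Run a retrieval workload that refills most of that window on every query and the per-request pennies compound into a real bill.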
Altman waves his hands
Altman has been doing a fair bit of damage control in the days since GPT-5's debut.
In addition to bringing GPT-4o back, paid users can now select and adjust GPT-5's response speed among Auto, Fast, and Thinking. He's also boosted rate limits to 3,000 messages per week.
On Monday, Altman laid out OpenAI's strategy for allocating compute over the next few months, which will, unsurprisingly, prioritize paying customers.
Once ChatGPT's customers get their resources, Altman says, API use will take precedence, at least up to the currently allotted capacity. "For a rough sense, we can support about an additional ~30% new API growth from where we are today with this capacity," he wrote in an X post.
Only then will OpenAI look at improving the quality of ChatGPT's free tier or expanding API capacity. But worry not: if Altman is to be believed, OpenAI will have twice the compute to play with by the end of the year.
"We are doubling our compute fleet over the next 5 months (!) so this situation should get better," he wrote. ®