Good morning.

Someone ran a single research prompt last week and it spawned 339 agents in ten minutes, burned 846,000 output tokens, and wiped out a five-hour quota before the job finished.

Recursive spawning shipped as a feature, and it became a runaway bill overnight.

That's the sharp edge of the bigger story in this issue.

Bain projects token costs climbing from one or two percent of operating expense to twenty or thirty percent by 2028. Altman says enterprises are burning their whole 2026 AI budget in the first quarter.

AT&T cut its bill 90% by routing work to smaller models. Gartner, meanwhile, found that 80% of organizations piloting autonomous business technology have cut staff, and that the cuts show no correlation with better returns.

Every dollar you pull off payroll reappears as compute spend. The operators getting a return are metering tokens like cloud spend and redesigning the work, and the ones who only cut headcount are finding nothing on the other side.

Fourteen items from the last few weeks, each with my read on what it means for an operator at your scale.

— Sam

IN TODAY’S ISSUE 🤖

  • Token costs head toward 20-30% of opex

  • Altman: firms burned 2026 budgets in Q1

  • One prompt spawned 339 agents

  • How AT&T cut costs 90% by routing

  • Shopify grew 30% with 500 fewer people

  • Klarna's agent does 853 jobs

  • Why Intuit cut 3,000 roles

  • Entry-level hiring fell 16%

  • 80% cut staff, zero ROI

  • Tool failures break 31% of agents

  • A 30B model that runs locally

  • Salesforce bought an agent for $3.6B

  • Two AI giants filed to go public

  • OpenAI's plan: a personal AGI for everyone

Let’s get into it.

1. Bain Projects Token Costs Will Run 20-30% Of Operating Expense

Bain & Company projects that for AI frontrunners, token costs go from 1-2% of operating expense today to 20-30% by 2028-2029. The same report found the top 5% of users already consume more tokens than the other 95% combined. (Source)

Your P&L is reorganizing. The savings you pull off headcount land on a compute line that barely existed two years ago, and Bain says that line grows to between a fifth and nearly a third of operating expense before the decade turns.

Start tracking token spend per head now. Your power users are reshaping your cost structure whether you watch it or not, and the operators who treat compute as a budgeted, metered resource this year are the ones who won't be surprised by it in 2028.
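Tracking spend per head can start as a few lines over a usage export. Here's a minimal sketch, assuming a log of (user, tokens) pairs and a flat price per million tokens. The names, the price, and the sample data are all hypothetical, not any vendor's billing format.

```python
from collections import defaultdict

# Assumed flat price; real pricing varies by model and token type.
PRICE_PER_MILLION_USD = 10.0

def spend_per_head(usage_log):
    """Sum tokens per user and convert the totals to dollars."""
    totals = defaultdict(int)
    for user, tokens in usage_log:
        totals[user] += tokens
    return {u: t / 1_000_000 * PRICE_PER_MILLION_USD for u, t in totals.items()}

def top_share(spend, fraction=0.05):
    """Share of total spend that comes from the top `fraction` of users."""
    ranked = sorted(spend.values(), reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:top_n]) / total if total else 0.0

# Hypothetical sample: one power user and three light users.
usage_log = [
    ("ana", 9_000_000), ("ben", 500_000), ("cy", 300_000),
    ("ana", 1_000_000), ("dee", 200_000),
]
spend = spend_per_head(usage_log)
share = top_share(spend, fraction=0.25)  # top quarter of a four-person team
```

Run it monthly and watch `share`. If one or two people account for most of the bill, that's where the conversation about budgets starts.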

2. Altman: Companies Are Burning Their 2026 AI Budgets In Q1

OpenAI CEO Sam Altman said token costs have become a "huge issue" for enterprises, with a common refrain from customers being "my company spent my entire 2026 budget in Q1." Top spenders now run 100 billion tokens a month. (Source)

Treat token consumption the way a careful operator treated AWS in 2015:

  • Meter it. Know what each workflow costs to run.

  • Cap it. Set a per-team or per-tool budget before usage compounds.

  • Justify it. Require a reason for the workflows riding on expensive frontier models.

The teams blowing their year in a quarter told their people to use AI as much as possible and gave them no number to spend against. Give your team real leverage and a real budget in the same breath. The budget keeps the leverage honest, and it surfaces the runaway workflow before the invoice does.
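Meter, cap, and justify fit in one small object. This is a sketch under stated assumptions: `TeamBudget`, its fields, and the dollar figures are hypothetical, and a real deployment would enforce this at the API gateway rather than in application code.

```python
from dataclasses import dataclass, field

class BudgetExceeded(Exception):
    """Raised when a run would push a team past its cap."""

@dataclass
class TeamBudget:
    monthly_cap_usd: float
    spent_usd: float = 0.0
    by_workflow: dict = field(default_factory=dict)

    def charge(self, workflow, cost_usd, justification=None, frontier=False):
        # Justify: frontier-model runs need a stated reason.
        if frontier and not justification:
            raise ValueError(f"{workflow}: frontier model use needs a justification")
        # Cap: refuse the run rather than overshoot the budget.
        if self.spent_usd + cost_usd > self.monthly_cap_usd:
            raise BudgetExceeded(f"{workflow}: would exceed the monthly cap")
        # Meter: record what each workflow costs to run.
        self.spent_usd += cost_usd
        self.by_workflow[workflow] = self.by_workflow.get(workflow, 0.0) + cost_usd

budget = TeamBudget(monthly_cap_usd=100.0)
budget.charge("ticket-triage", 30.0)
budget.charge("contract-review", 50.0, justification="legal accuracy", frontier=True)
```

The point of the design is that the refusal happens before the spend, so the runaway workflow shows up as an error in someone's terminal and not as a line on the invoice.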

3. One Research Prompt Spawned 339 Subagents And Burned A Full Quota

Claude Code v2.1.172 shipped recursive subagent spawning up to five levels deep on June 10. Within days, a user reported a single research prompt triggering 339 subagents in ten minutes, burning 846,000 output tokens and 64 million cached-read tokens. 87% of those agents hit rate limits and returned nothing, and the user's entire five-hour quota was gone. (Source)

Recursive spawning is the new runaway cloud bill. Before you hand any agentic coding tool to your team, set hard caps on recursion depth and a token budget per workflow.

The defaults ship tuned for capability demos, not for production economics, and one unguarded prompt can clear a full usage quota in the time it takes to get coffee. This is a governance decision, and it belongs at the top. Set the caps centrally, then make them the default everyone inherits.
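To make "hard caps on recursion depth and a token budget per workflow" concrete, here's a toy sketch of a shared guard that every agent has to ask before it spawns a child. `SpawnGuard`, `run_agent`, and every number below are hypothetical illustrations. This is not how Claude Code or any other tool is configured, so check your tool's own settings for the real controls.

```python
class SpawnDenied(Exception):
    """Raised when a spawn request would break one of the caps."""

class SpawnGuard:
    """One guard per workflow, shared by every agent in the tree."""
    def __init__(self, max_depth, max_agents, token_budget):
        self.max_depth = max_depth
        self.max_agents = max_agents
        self.tokens_left = token_budget
        self.agents_spawned = 0

    def request_spawn(self, parent_depth, est_tokens):
        child_depth = parent_depth + 1
        if child_depth > self.max_depth:
            raise SpawnDenied("recursion depth cap reached")
        if self.agents_spawned >= self.max_agents:
            raise SpawnDenied("agent count cap reached")
        if est_tokens > self.tokens_left:
            raise SpawnDenied("token budget exhausted")
        self.agents_spawned += 1
        self.tokens_left -= est_tokens
        return child_depth

def run_agent(guard, depth=0, fanout=3, est_tokens=2_000):
    """Toy agent: tries to spawn `fanout` children and stops when refused."""
    for _ in range(fanout):
        try:
            child_depth = guard.request_spawn(depth, est_tokens)
        except SpawnDenied:
            return
        run_agent(guard, child_depth, fanout, est_tokens)

guard = SpawnGuard(max_depth=2, max_agents=20, token_budget=50_000)
run_agent(guard)
```

With a depth cap of two and a fanout of three, this tree stops at twelve agents and never gets near 339. The three caps are independent on purpose: whichever one trips first ends the run.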

4. AT&T Cut AI Costs 90% By Routing 27 Billion Daily Tokens To Smaller Models

Facing large inference bills, AT&T re-architected its AI stack to route tasks through "super agents" that hand work to smaller, domain-specific worker models instead of pushing everything through frontier models. Daily token volume rose from 8 billion to 27 billion while costs dropped 90% at three times the throughput. (Source)

If every automation you run pipes through a top-tier model out of habit, you're overpaying for most of it. The margin in agent deployment comes from matching the model to the job: cheap tasks run on cheap models, and you reserve the expensive reasoning for the small share that needs it.

AT&T tripled its volume and cut its bill at the same time by building that routing layer up front. Ask your technical lead where every prompt currently goes. The honest answer is usually "the most expensive option, every time."
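The routing idea is simple enough to sketch. The version below assumes two model tiers and a fixed list of task types the small model can handle. The tier names, prices, and task mix are made up for illustration and say nothing about AT&T's actual architecture.

```python
# Hypothetical tiers and prices in USD per million tokens.
MODELS = {
    "small": {"price": 0.5},
    "frontier": {"price": 15.0},
}

# Task types the small model is assumed to handle well.
CHEAP_TASKS = {"classify", "extract", "summarize"}

def route(task_type):
    """Send routine work to the small model, everything else to the frontier model."""
    return "small" if task_type in CHEAP_TASKS else "frontier"

def daily_cost(tasks, router):
    """tasks is a list of (task_type, tokens); returns total USD for the day."""
    return sum(
        tokens / 1_000_000 * MODELS[router(task_type)]["price"]
        for task_type, tokens in tasks
    )

# Hypothetical day: 90% of tokens are routine, 10% need real reasoning.
tasks = [
    ("classify", 6_000_000),
    ("extract", 3_000_000),
    ("multi_step_reasoning", 1_000_000),
]
routed = daily_cost(tasks, route)
unrouted = daily_cost(tasks, lambda _task_type: "frontier")
savings = 1 - routed / unrouted
```

The savings figure depends entirely on your own task mix and prices, which is why the first step is finding out where every prompt goes today.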

5. Shopify Grew Revenue 30% With 500 Fewer People

Following CEO Tobi Lütke's April 2025 memo requiring teams to prove AI couldn't do a job before requesting headcount, Shopify reported 2025 revenue of $11.6B (30% growth), with Q1 2026 growth accelerating to 34%. Headcount fell from 8,100 to 7,600, and revenue per employee climbed from $1.1M to $1.63M. (Source)

The mandate worked because Shopify built the internal AI infrastructure first, then rewrote the performance criteria to match. Telling your team to "use more AI" gets you nothing on its own. You give them the tools, then make the failure to use them a performance issue.

The number worth copying is revenue per employee, and Shopify moved it about half a million dollars per head in a year. Before your next requisition, ask what it would take for an agent to absorb the role instead.

6. Klarna's AI Agent Now Does The Work Of 853 Employees

Klarna's customer service AI handles the workload of 853 human employees and generates $60 million in annual savings, taking two-thirds of all customer inquiries and cutting response times by 82%. (Source)

The cost arbitrage in support stopped being theoretical a while ago. If you still scale your support team in a straight line with your customer count, you're carrying a structural disadvantage against anyone who doesn't. The benchmark customers compare you to now is instant resolution at close to zero marginal cost.

You won't reach 853 seats of leverage, and you don't need to. Take the highest-volume, most repetitive third of your support queue, route it to an agent first, then measure what it frees on the human side.

7. Intuit Cut 17% Of Its Workforce To Fund Its AI Pivot

Intuit announced a 17% workforce reduction, roughly 3,000 employees, with CEO Sasan Goodarzi framing the cuts as a step to accelerate the company's AI platform and mid-market growth, aiming to triple developer productivity and become a "builder powerhouse." (Source)

A $170B company is willing to take the cultural hit of cutting nearly a fifth of its staff to redirect capital toward AI capacity. Whether or not you'd make that trade, it tells you how the largest operators are reading the next three years.

Don't read this as a cue to cut your team. Read it as a cost question: keeping people on staff for work agents can now do is getting harder to justify. Look hard at the roles that exist mainly to absorb volume, and ask which of them an agent could carry.

8. Stanford: Entry-Level Employment Fell 16% In AI-Exposed Jobs

The Stanford Digital Economy Lab analyzed millions of payroll records and found a 16% relative employment decline for workers ages 22-25 in AI-exposed occupations since late 2022. Older workers in the same fields saw 6-9% employment growth, while entry-level hiring collapsed in what the researchers call the "fastest, broadest" workplace change they've measured. (Source)

You no longer hire juniors to do the grunt work while they learn the business, because AI does the grunt work now. That breaks the apprenticeship model every firm has relied on to grow its next generation of leaders.

The entry-level tasks people used to cut their teeth on are the exact tasks agents absorb first, so the path from junior to senior has a hole in the middle of it. If you want a leadership bench five years out, you need a deliberate answer for how people build judgment when the cheap reps are gone.

9. Gartner: 80% Of Orgs Cut Staff For AI, With No ROI To Show For It

A May 2026 Gartner survey of 350 global executives found that 80% of organizations piloting autonomous business technology have reduced their workforce. The study found no correlation between those headcount reductions and improved return on investment. (Source)

Read this against every layoff headline in this issue. Cutting people to fund AI is a capital allocation move, and on its own Gartner found it returns nothing measurable.

The return shows up only when you re-engineer the work so agentic speed changes the output. Cut the team without redesigning the workflow and you get a smaller company running at the same pace as the old one. The order matters: rebuild the process first, then let staffing follow what the new process needs.

10. Tool-Call Failures Drive 31% Of Agent Breakdowns In Production

An analysis of 73 AI agent production incidents between January and May 2026 found tool-call failures (31%) to be the leading cause of agent breakdowns, ahead of hallucinations. The same data shows 88% of AI proofs-of-concept never reach production scale, largely because of integration issues. (Source)

The leading cause of agent failure in production is now the tool call, not the model. Nearly a third of breakdowns trace to a failed call to an external tool or API, ahead of hallucination.

The work that gets an agent to production is error handling, retry logic, and fallback states for every tool it touches, which is plain engineering rather than prompt craft. The 88% of pilots that never ship mostly stalled on integration, which is what happens when reliability is an afterthought. When you move an agent from demo to daily use, budget most of your effort for the integration layer, because that's where it fails first.
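Error handling, retries, and a fallback state can share one wrapper. Here's a minimal sketch, assuming each tool is a plain Python callable. `call_with_retry` and `flaky_lookup` are hypothetical names, and production code would catch only the errors it knows to be transient.

```python
import time

def call_with_retry(tool, *args, retries=3, base_delay=0.5,
                    fallback=None, sleep=time.sleep):
    """Call `tool`, retrying with exponential backoff.

    Returns a dict so the agent always gets a defined state back,
    even when every attempt fails.
    """
    last_error = None
    for attempt in range(retries):
        try:
            return {"ok": True, "value": tool(*args)}
        except Exception as exc:  # narrow this to transient errors in real code
            last_error = exc
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    return {"ok": False, "value": fallback, "error": str(last_error)}

# Hypothetical tool that times out twice, then succeeds.
calls = {"n": 0}

def flaky_lookup(order_id):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("upstream timed out")
    return f"order {order_id}: shipped"

# The no-op sleep keeps the demo instant; drop it to get real backoff.
result = call_with_retry(flaky_lookup, "A17", sleep=lambda seconds: None)
```

The fallback is the part demos skip. An agent that gets back `ok: False` with a sensible default can hand off to a human, where an agent that gets an unhandled exception just stops.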

11. Cohere's North Mini Code Is A 30B Open Model That Runs Locally

Cohere released North Mini Code, an Apache 2.0 open-weights coding model: 30B total parameters with only 3B active via mixture-of-experts, a 256K context window, and 80.2% pass@10 on SWE-Bench. It runs locally on a single H100 or on Apple Silicon. (Source)

You no longer have to send your proprietary codebase to a third-party API to get capable coding help. Open-weight models are crossing the threshold for real engineering work, and this one runs on hardware you can already own.

If data privacy has been the reason you've held back an AI deployment, that reason is much weaker now. When a capable model runs on your own machine against your own data, your code never leaves your hardware, and the recurring API bill stops being the only path.

12. Salesforce Is Buying Fin For $3.6 Billion

Salesforce signed a definitive agreement to acquire Fin (formerly Intercom) for $3.6 billion, expected to close in Q4 of its fiscal 2027, folding Fin's autonomous agent technology into the Salesforce ecosystem for customer service resolution. (Source)

The big platforms are buying the agent layer rather than building it, and that carries two consequences for you:

  • The tools you already pay for get more capable.

  • They get more expensive as that capability gets absorbed and repriced.

If a core vendor is mid-acquisition, lock your pricing on a longer term before the new capabilities arrive with a new number attached. It's also the moment to look at independent agent frameworks, so vendor consolidation doesn't decide your cost structure for you.

13. OpenAI And Anthropic Both Filed Confidential IPO Paperwork

Both Anthropic (June 1) and OpenAI (June 8) filed confidential S-1 registrations to go public, opening a race to the public markets, though neither has set a firm timeline. (Source)

The frontier labs need public-market capital to fund their compute buildouts, which puts them under pressure to show revenue growth and a path to profit. For operators building on their APIs, read that as a forecast: expect pricing models to move and enterprise tiers to get more aggressive as these companies prepare their roadshows.

The cheap, founder-friendly pricing of the experimentation era ran on venture money. Public markets ask for margin, and the bill for that usually arrives on the customer's side of the invoice.

14. OpenAI's Phase Three: A Personal AGI For Everyone, By 2028

On June 8, Sam Altman and new CEO Jakub Pachocki published OpenAI's roadmap: build an automated AI researcher (targeting a "significant fraction" of research done by AI systems by March 2028), accelerate the economy through scientific progress, and give every person on Earth a personal AGI. They wrote that "entirely automating everything is not the future we want" and called for an international body able to slow frontier development when needed. (Source)

Read this as a product roadmap rather than a philosophy essay. "Personal AGI for everyone" means OpenAI will push hard on free and low-cost tiers to make its models the default layer under everything, so expect aggressive pricing aimed at distribution.

The March 2028 automated-researcher target is the number to mark: it tells you the window for human-only advantages in knowledge work is shorter than most operators are planning around. Move your moat to what compounds while raw capability gets cheap: the proprietary data and customer trust an agent can't copy.

Read the fourteen together and the same line runs through all of them.

The price of raw capability keeps falling per token, and total compute spend keeps climbing as a share of the P&L, on its way to between a fifth and nearly a third of operating expense.

Those two facts sound like a contradiction, but they hold together:

You use more of a thing as it gets cheaper, so the bill grows even as each unit costs less.
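The arithmetic is easy to check with made-up numbers. Everything below is illustrative only and comes from none of the sources in this issue.

```python
# Illustrative figures: price in USD per million tokens, volume in millions of tokens.
price_then, volume_then = 10.0, 1_000
price_now, volume_now = 2.0, 20_000

bill_then = price_then * volume_then   # total spend before
bill_now = price_now * volume_now      # total spend after

price_change = price_now / price_then  # 0.2, so each token costs 80% less
bill_change = bill_now / bill_then     # 4.0, so the bill still quadrupled
```

An 80% price cut loses to a twentyfold jump in usage.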

The operators getting a return on that are doing two things at once. They meter and route compute like the budgeted resource it now is, and they redesign the work so the savings show up as output.

The ones cutting staff and stopping there are the pattern in the Gartner data: 80% have reduced headcount, and the cuts alone show no link to better returns.

Until next time,
Sam Woods
The Editor
