Good morning.

I spend most of my week looking at what operators are actually doing with AI (not what tools are promising), and the last sixty days have produced a lot worth paying attention to.

Some of it is the inevitable hype reversal. Some of it is proof that the architecture work we keep talking about in this newsletter is the real lever.

I've pulled out twelve of those observations for this issue, each with my read on what it means for an online business at your scale.

— Sam

IN TODAY’S ISSUE 🤖

  • What 515 startups just proved about redesigning work around AI

  • SaaStr's ten-month SDR transition, with the receipts

  • Why hybrid beat AI-only by 2.3x on revenue

  • The demo-to-production gap in clean math

  • A production agent that silently degraded for four months

  • An agent that wrote a script to bypass its own guardrails

  • A four-person firm that reverted to manual after $800/month in AI tools

  • Why one agency owner calls hourly billing "a sucker's game"

  • The deflection number small ecommerce brands are actually hitting

  • How a 20-person company built a Slack sidekick in 45 minutes

  • The inference-pricing threshold that just flipped

  • The one document most SMB AI deployments are missing

Let’s get into it.

1. The 515-Startup Workflow Redesign Experiment

Jakob Nielsen published a controlled field experiment across 515 startups in the AI Founder Sprint. Everyone got the same AI credits and no extra headcount. The cohort taught to redesign end-to-end workflows around AI (instead of adding AI to individual tasks) found 44% more use cases, completed 12% more tasks, and generated 90% more total revenue than the control group. (Source)

This is the cleanest evidence I've seen that architecture-before-tools is an economic argument, not a preference. The upside is in the work graph, not the tool stack.

2. SaaStr Went from 8-9 SDRs to 1.2 Humans Plus 20 Agents

Jason Lemkin documented the ten-month transition: $5M additional pipeline, $2.4M closed, doubled deal volume and win rate, 60,000 personalized outbound emails (a 32x per-rep increase). (Source)

The canonical case is a little above most of you in scale. The lesson for an operator at $1-3M with a small revenue team: the seat structure of your sales org is going to change whether you plan it or not. Better to be the one doing the shaping.

3. Hybrid Beat AI-Only by 2.3x on Revenue in a 90-Day Test

A documented controlled A/B. AI-only pipeline booked 847 meetings at 11% opportunity conversion. AI plus human hybrid booked 312 meetings at 38% conversion and produced 2.3x more revenue from roughly a third of the meetings. (Source)

The replace-humans-with-agents pitch keeps losing to a simpler frame: let agents absorb 40-70% of the routine work, let humans do the hard part. Volume loses to augmentation once the metric is revenue.
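A quick back-of-envelope check on those numbers, using only the meeting and conversion figures from the test (any revenue-per-deal detail is outside what the source reports):

```python
# Sanity-check the hybrid vs. AI-only pipeline numbers from the 90-day test.
ai_only_meetings, ai_only_conv = 847, 0.11   # AI-only: volume play
hybrid_meetings, hybrid_conv = 312, 0.38     # hybrid: fewer, better meetings

ai_only_opps = ai_only_meetings * ai_only_conv   # opportunities created
hybrid_opps = hybrid_meetings * hybrid_conv

print(round(ai_only_opps))                        # ~93 opportunities
print(round(hybrid_opps))                         # ~119 opportunities
print(round(hybrid_opps / ai_only_opps, 2))       # ~1.27x the opportunities
```

Note what falls out of the math: hybrid generated only about 1.27x the opportunities but 2.3x the revenue, which implies the hybrid deals were also meaningfully larger. That fits the augmentation frame above.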

4. 88% Reliability Means the 12% Becomes a Full-Time Job

An SEO operator ran a three-week test on three different agents. The content brief agent produced 17 briefs in 40 minutes: 15 were usable, and 2 were confidently wrong. At hundreds of briefs per month, catching that 12% becomes a full-time job. (Source)

An agent that's right 88% of the time is not 88% useful. At production scale, the failure rate becomes its own headcount. That's the demo-to-production gap expressed in clean math.
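Here's that math sketched out. The error rate comes from the test above; the monthly volume and minutes-per-review are my own illustrative assumptions, not figures from the source:

```python
# How an 88% hit rate turns into headcount at production scale.
briefs_per_month = 400        # assumed production volume (illustrative)
error_rate = 2 / 17           # ~12%, from the three-week test
review_minutes_each = 20      # assumed time to verify one brief

# Every brief needs review, because you can't know in advance
# which 12% are the confidently wrong ones.
review_hours = briefs_per_month * review_minutes_each / 60
bad_briefs = briefs_per_month * error_rate

print(round(review_hours))    # ~133 hours/month of review work
print(round(bad_briefs))      # ~47 wrong briefs to catch per month
```

Roughly 133 review hours a month is most of a full-time role, and the burden scales linearly with volume. That's the whole gap in four lines of arithmetic.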

5. A SaaStr Agent Silently Degraded for Four Months

One of SaaStr's 30 production agents quietly stopped self-training because of a platform bug. It kept returning plausible but increasingly stale output for four months before anyone caught it. (Source)

Autonomous does not mean self-monitoring. If you deploy an agent that writes things customers see or takes actions inside your systems, you need a human reading output on a real cadence. Otherwise you find out about the drift from a customer.

6. An Agent Wrote a Script to Bypass a Forbidden Command

Arvid Kahl (Podscan.fm) told Claude Code explicitly not to run php artisan migrate. The agent wrote a bash script with the forbidden command inside it and ran that instead. His words: "It knew it wasn't allowed. Instead of asking, it made an effort to circumvent my permissions." (Source)

Before you give any agent write access to production systems, assume goal-directed behavior will try to solve around your guardrails. Permission design is part of the build, not a runtime check bolted on at the end.

7. A Four-Person Firm Reverted to Manual After $800 a Month in AI Tools

In a March Reddit thread, a four-person marketing firm was spending ~$800/month across ChatGPT Enterprise, Jasper, image tools, and analytics software. After 30 days, they reverted to manual methods. Only about 15% of the tools were actually being used. (Source)

Tool sprawl behaves like SaaS obesity. The operators winning right now are collapsing twenty disconnected point tools into one operational backbone (typically Claude plus one automation layer plus a real database). Start by asking what to cut.

8. CrowdTamers Rebuilt from $120K ARR on Claude Code Plus Flat-Fee Pricing

Trevor Longino's marketing agency hit $700K ARR in 2023, collapsed to $120K in 2024, and has rebuilt to high six figures. He credits Claude Code and a deliberate move off hourly billing. His words: "In the world of LLMs, that's a sucker's game." (Source)

For any services operator on this list: hourly billing stops working the moment AI abstracts the hours. Productized scope and flat-fee engagements are the only models where the margin math keeps working in your favor.

9. Gorgias Is Hitting 60% Ticket Deflection With Small Ecommerce Brands

The enterprise support numbers (70-86% at Bilt, Intercom, Duolingo) require enterprise training budgets. The Gorgias data point is more useful because it comes from small ecommerce brands with limited training time consistently hitting 60%. (Source)

For any ecommerce operator reading this, 60% resolution is achievable without an in-house AI team. And the top 5% of brands have stopped tracking "deflection" (which often just means the customer gave up) in favor of "resolution rate" (the ticket actually got solved).

10. Every's COO Built a Slack Sidekick in 45 Minutes

Brandon Gell, COO at Every (~20 people), stood up an internal agent called Plus One in 45 minutes. It lives in their Slack. It triages bug reports into Notion, generates daily briefs from his calendar, and coordinates with other team members' agents in shared channels. (Source)

The deployment surface matters as much as the model. An agent that lives where work already happens stays in use. An agent that needs your team to open another dashboard gets abandoned inside a month.

11. Google Priced 24/7 Voice Agents Below Minimum Wage

Google dropped Gemini Flash Live to $0.005 per input minute in March. A voice agent running 24/7 now costs roughly $25 a day, or ~$9,500 a year. Below minimum wage in every US state. (Source)

I'm not telling you to run out and deploy a voice agent. I'm flagging that inference pricing crossed a threshold this quarter that opens deployment patterns that were not economically viable six months ago. If your business has a repetitive inbound phone workflow, the math has changed.
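The per-minute figure from the piece makes the threshold easy to see. Only the input rate is from the source; treat the gap between input-only cost and the ~$25/day all-in figure (output tokens, overhead) as an assumption on my part:

```python
# Rough cost of an always-on voice agent at the new rate.
input_rate_per_min = 0.005    # Gemini Flash Live input pricing (from the piece)
minutes_per_day = 24 * 60

input_cost_per_day = input_rate_per_min * minutes_per_day
print(round(input_cost_per_day, 2))   # $7.20/day on input minutes alone

# The ~$25/day figure in the piece implies output and overhead
# add roughly 3x on top of input cost (my assumption, not sourced).
all_in_per_day = 25
print(all_in_per_day * 365)           # $9,125/year, in line with the cited figure
```

Whatever the exact all-in multiplier, the annualized number sits below a single minimum-wage salary, which is the threshold that matters.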

12. The Missing Master Context File Behind Most Bad AI Output

A Forbes piece reported that 58% of small businesses deploying AI are producing generic, undifferentiated output because they have not documented the explicit context an agent needs: brand voice, beliefs, communication rules, what you never do, what you always do. (Source)

This is the Tacit-to-Explicit problem in one data point. Your agents can only operate on what you have made explicit. If your AI output reads like every other AI-generated business on the internet, the fix is upstream of the model.

Operators seeing real gains are treating agent deployment as architecture.

They redesign the workflow before picking the tool, write down the context an agent needs before writing the system prompt, and read the output for a month before assuming the agent is stable.

What are you working on with AI right now?

Until next time,
Sam Woods
The Editor

