Good morning.

An operator I worked with last month had three agents wired up in two weeks. By the third week none of them were producing anything he could use.

When I asked to see the system prompts, two of them were a paragraph long and one was three sentences. None named what the agent should refuse to do. The access had been scoped in the same fifteen minutes as the prompts, and all three had been running on production tasks from the day they shipped.

The fix lives in the playbook he already has. Most operators would never let a new hire into the company Slack on day one without a job description, a scoped set of tools, a starter task they could review end to end, and a clear runway before adding a second responsibility.

Then they deploy an agent the same week and skip every one of those steps.

This issue is the playbook applied to the agent, end to end, with the six-step protocol you can run on the next one you ship.

— Sam

IN TODAY’S ISSUE 🤖

  • Why agents are middle-layer workers, not tools

  • Where the agent runs and how the protocol changes

  • What deployment looks like next to onboarding

  • Which agent to put through this protocol first

  • The Protocol: six steps from prompt to working agent

  • A SOUL.md fragment you can copy today

  • Performance management, the handoff, and the diagnostic table

  • Where this gets you in ninety days

Let’s get into it.

The Frame Change

Your business runs on three kinds of workers now. Humans hold strategy, taste, and the calls that need a person in the room. Software is the rules layer (Zapier, n8n, your CRM workflows, the deterministic if-X-then-Y automations you have already wired up).

Agents are the new layer between them. They reason, they adapt, and they take multi-step actions across your tools with ambiguity baked into the inputs.

The job an agent does is closer to a junior team member than to a Zapier zap, which means you manage it closer to a junior team member than to a Zapier zap.

Most agent deployments fail at this point. The operator picks a platform, writes a paragraph of instructions, hands the agent a real workflow, and reads the mediocre output as evidence that agents are not ready.

The actual problem is that no team member would produce good output under those conditions either.

A new hire with no job description, no scoped access, no review on day one, and three responsibilities by Friday would fail in exactly the same way. The agent is doing what an unmanaged employee does.

Where The Agent Runs

Before you onboard, name the environment. Agents fall into two architectural buckets, and the protocol surfaces differently in each.

Embedded agents run inside your cognitive folder. Cowork, Claude Code, Claude projects, any setup where the model reads markdown files from a folder you own and operates inside it.

The cognitive folder is not a knowledge source the agent reaches into. It is the agent's working environment. SOUL.md is a real file the agent loads. Skills sit in a skills/ directory and load on demand. Memory writes back to files in the same folder. The agent and the folder are one thing.

External agents run on a platform that owns the runtime. Lindy, Gumloop, n8n, Zapier, anything coded against an API. The cognitive folder is a separate asset the platform reaches into through a connector or a sync.

The SOUL.md content lives in the platform's system prompt field, not in a file the agent loads.

Context might be synced from your folder through Drive or GitHub, or it might be pasted in chunks. Memory lives in the platform's database, not in your folder.

The protocol works for both. The surface where each step happens is different.

| Step | Embedded (Cowork, Claude Code, project) | External (Lindy, n8n, coded) |
| --- | --- | --- |
| Write the job description | Create SOUL.md in agents/[name]/ | Paste the same content into the platform's system prompt field |
| Set up access | Files in the project, MCPs connected, scoped to the folder | Tools selected in the platform UI, credentials wired, OAuth scopes set |
| Run the first task | Run inside the folder, output to a draft file | Trigger manually, output to inbox or Slack |
| First recurring workflow | Scheduled task or trigger in Cowork or Code | Native platform trigger or cron |
| Expand the scope | Add a second SOUL.md or extend the existing one | New flow in the platform pointed at the same context |
| Move to sample audit | Friday review of the agent's output files | Friday review in the platform's run history |

Different surfaces. Same six moves. If you are running embedded agents in Cowork or Claude Code, the cognitive folder issue from two weeks ago is the prerequisite for this one. If you are running external agents in Lindy or n8n, the folder still earns its keep as the source of truth that gets pasted or synced into the platform.

Onboarding Versus Deployment

The two columns are doing the same job. One is muscle memory and the other is new.

| Hiring an employee | Onboarding an agent |
| --- | --- |
| Job description | System prompt |
| Tool access and credentials | Connector or MCP setup |
| First-week shadowing | Reviewing every output |
| Probation period | First recurring workflow with full review |
| Expanding responsibilities | Adding the second workflow |
| Performance review | Weekly sample audit |
| Promotion to autonomous work | Reducing review frequency |

Read the right column and you can see why most deployments stall. Operators do most of the left column for humans and almost none of the right column for agents.

The other practical implication: every cognitive layer you have already built (CLAUDE.md, the context folder, the voice guide, the agent-team file from the cognitive folder issue) is the equivalent of your employee handbook.

Your agent reads it before doing anything, the same way a new hire reads the onboarding doc on their first morning.

What To Onboard First

You can onboard any agent through this protocol, but the order matters. The first agent you put through it should teach you the protocol as much as it does the work. Pick a role with low blast radius and visible output, and you will calibrate the rest of your agent layer against it.

Intelligence is usually the right first role. It reads from external sources (saved searches, competitor feeds, news, your own analytics), writes a daily or weekly digest, and never touches anything customer-facing.

Low risk on access, fast loop on calibration, and the operator sees the output every morning. Two weeks of running this agent will teach you more about your own context folder than any planning exercise will.

QA is usually the right second. It reads from your own output (drafts, proposals, agent-produced work) and flags anything that misses voice, compliance, or factual standards.

Like the intelligence agent, it has no write access to production systems. Unlike the intelligence agent, it gives you confidence in every other agent you ship after it.

The third agent depends on the business:

Agencies tend to onboard a draft agent third (proposals, briefs, status updates).

Ecommerce tends to onboard a triage agent third (incoming tickets, refund requests, common questions).

SaaS tends to onboard an operations agent third (metric pulls, dashboard refreshes, anomaly flags).

The Protocol

Six steps. Each one is the agent equivalent of something you already do for human hires. Run this for one agent at a time, start to finish, before adding another.

Trying to onboard three agents in parallel is the same mistake as bringing on three new hires the same week and expecting all of them to be productive by Friday.

Step 1. Write the job description. Open the agent's SOUL.md (or the system prompt field if you are configuring directly in a platform) and capture five things:

  • Identity in one sentence.

  • Scope in three to five lines, including what the agent does not do.

  • The hard limits, meaning the actions or claims it must never produce.

  • The tone for any output that leaves the team.

  • The deliverable format the agent produces every time.

Twenty to thirty minutes if you have your business context already documented. Longer if you do not, in which case write the missing context first because the agent cannot do its job without it.

What fails when you skip this: the first run reads like a fresh chat session instead of like a member of the team, and every correction has to be made one prompt at a time.
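One way to make the five parts hard to skip is a small check you run before the agent ships. The sketch below is illustrative only. It assumes your prompt labels each part with a line such as `Identity:` or `Hard limits:`. The label names and the `missing_sections` helper are hypothetical, not part of any platform.

```python
import re

# Assumed labels for the five parts of the job description; rename to match your file.
REQUIRED_LABELS = ("Identity", "Scope", "Hard limits", "Tone", "Output")


def missing_sections(soul_text: str) -> list[str]:
    """Return the required labels that never start a line in the prompt text."""
    missing = []
    for label in REQUIRED_LABELS:
        # A label counts if some line starts with it followed by a colon, ignoring case.
        pattern = rf"^\s*{re.escape(label)}\s*:"
        if not re.search(pattern, soul_text, flags=re.IGNORECASE | re.MULTILINE):
            missing.append(label)
    return missing


draft = """Identity: You are the Intelligence Agent.
Scope:
- Produce one daily digest.
Hard limits:
- Never quote a source you have not read end to end.
Output: A daily markdown file.
"""

print(missing_sections(draft))  # prints ['Tone']
```

The check only confirms that a label exists. Whether the content under it is any good is still a human read.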

A real fragment of a working SOUL.md, for an intelligence agent in an ecommerce business:

Identity: You are the Intelligence Agent for [business name]. You scan, filter, and summarize external signals the operator needs to make weekly decisions.

Scope:
- Read three sources daily: saved search alerts, competitor blogs listed in context/competitors.md, and any threads the operator drops into the inbox folder.
- Produce one daily digest by 7am, formatted per the template below.
- Flag anything that needs the operator's attention before the next digest.

What you do not do:
- Make recommendations on pricing, hiring, or partnership decisions.
- Write any output that goes to a customer or external stakeholder.
- Access the CRM, billing systems, or shared drives outside /intelligence-inbox.

Hard limits:
- Never quote a source you have not read end to end.
- Never present a competitor claim as a fact without naming it as a claim.
- Never use the words listed in context/voice.md as banned.

Output: A daily markdown file in /digests/, named YYYY-MM-DD.md, with five sections: top signal, three secondary signals, competitor moves, what to ignore this week, what to watch next week.

That is half a page in the source file. It does the work of an employee handbook page plus a job description, and you write it once.
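The Output line is specific enough to turn into a template. As a rough sketch, assuming a relative `digests` folder and markdown `##` headings (both are my assumptions, as are the helper names), the file name and an empty skeleton could be generated like this:

```python
from datetime import date
from pathlib import PurePosixPath

# Section names taken from the Output line of the fragment above.
DIGEST_SECTIONS = (
    "Top signal",
    "Three secondary signals",
    "Competitor moves",
    "What to ignore this week",
    "What to watch next week",
)


def digest_path(day: date, root: str = "digests") -> PurePosixPath:
    """Build the digest path in YYYY-MM-DD.md form under the given root."""
    return PurePosixPath(root) / f"{day.isoformat()}.md"


def digest_skeleton(day: date) -> str:
    """Return an empty digest with one heading per required section."""
    lines = [f"# Digest for {day.isoformat()}", ""]
    for name in DIGEST_SECTIONS:
        lines += [f"## {name}", "", "(to be filled in by the agent)", ""]
    return "\n".join(lines)


day = date(2025, 1, 6)
print(digest_path(day))  # prints digests/2025-01-06.md
```

A fixed skeleton also makes review faster, because a missing section shows up as an empty heading rather than as something you have to notice is absent.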

Step 2. Set up access. Decide what tools and data the agent reads from and what it can write to. Default to read-only on day one. The agent can pull from your CRM, your analytics, your shared drives, your context folder. It writes only to its own output location (a draft file, a Slack channel, an email draft for your review). No write access to production systems on day one.

The same way you do not give a new hire write access to the customer database in the first week.

The surface this happens on depends on where the agent runs. Inside a Claude project, it is which files live in the project and which MCPs you connect. For Lindy, the same decision is the trigger plus the tool selection. On n8n or Zapier, it is the credentials and the node choice. Different surface, same access decision.

What fails when you skip this: you spend the next month rolling back actions in production systems and rebuilding trust with the team that has to clean up after the agent.
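If it helps to see the access decision written down, here is a minimal sketch of a day-one scope as data. The resource names and the `AccessScope` class are hypothetical. On a real platform this decision lives in the connector or credential settings, not in a Python object.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessScope:
    """Day-one access: many readable sources, one writable output location."""
    read: frozenset = field(default_factory=frozenset)
    write: frozenset = field(default_factory=frozenset)

    def can_read(self, resource: str) -> bool:
        # Anything the agent may write to, it may also read back.
        return resource in self.read or resource in self.write

    def can_write(self, resource: str) -> bool:
        return resource in self.write


# Hypothetical resource names; substitute your own systems.
day_one = AccessScope(
    read=frozenset({"crm", "analytics", "shared-drive", "context-folder"}),
    write=frozenset({"drafts"}),
)

print(day_one.can_read("crm"))      # True
print(day_one.can_write("crm"))     # False
print(day_one.can_write("drafts"))  # True
```

Keeping the write set to a single entry on day one makes any later expansion a visible, deliberate edit.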

Step 3. Run the first task. Give the agent one specific task that should take it ten to thirty minutes of work. Not a recurring workflow yet. A single, scoped piece of work you would otherwise do yourself this morning. Watch the run end to end. Read the output line by line. Note three things: where the output matched what you would have produced, where it missed in a way that points to a missing piece in the system prompt or context files, and where it missed in a way that points to a real limitation of the agent for this task.

Update the system prompt or context files based on what you saw. Run the same task again. The second run should be visibly better than the first. If it is not, the issue is usually in the system prompt or in your access scoping, not in the agent.

What fails when you skip this: the recurring workflow inherits every gap you would have caught on the first run, and you trace those misses for weeks before realizing you skipped the calibration step.

Step 4. Set up the first recurring workflow. Once the agent has produced two clean outputs on its first task, move it to a recurring version. Same task, repeated on a schedule or trigger. Daily, twice a week, weekly, whatever the natural cadence is. Review every output for the first two weeks of the recurring run.

Log what the agent flagged, what it missed, what it produced that you would not have. Two weeks of full review is the agent equivalent of a probation period. You are calibrating trust by watching production runs rather than theoretical capability.

What fails when you skip this: you never establish a baseline for what working looks like, so you have nothing to measure against when the output drifts later.
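The review log can be as simple as one row per run. The sketch below assumes a "clean" run means output you could use without edits, and it treats "two clean weeks" as a full 14-day window with no misses. Both definitions are mine, and the `Review` and `ready_to_expand` names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Review:
    run_date: date
    clean: bool     # operator's judgment: usable without edits
    note: str = ""  # what the agent flagged, missed, or added


def ready_to_expand(log: list[Review], today: date, days: int = 14) -> bool:
    """True when the log covers the full window and every run inside it was clean."""
    if not log:
        return False
    window_start = today - timedelta(days=days)
    # The log must reach back at least to the start of the window.
    covers_window = min(r.run_date for r in log) <= window_start
    recent = [r for r in log if r.run_date > window_start]
    return covers_window and bool(recent) and all(r.clean for r in recent)


# Fifteen daily runs, all clean, reviewed from 1 Jan through 15 Jan.
log = [Review(date(2025, 1, 1) + timedelta(days=i), clean=True) for i in range(15)]
print(ready_to_expand(log, today=date(2025, 1, 15)))  # True
```

A single miss inside the window flips the answer to False, which is the point: the window restarts from the fix, not from the calendar.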

Step 5. Expand the scope. After two clean weeks on the recurring workflow, add the second responsibility. One more task, one more workflow, scoped the same way as the first. Connect any new tools or data sources the second responsibility requires.

The mistake to avoid is expanding access at the same time as adding the responsibility. Add the responsibility first, run it on the existing access, then expand access only where the new work requires it.

The same logic you would use giving a new hire their second project before adding admin permissions to a system they have not earned access to yet.

What fails when you skip this: you bundle scope and access in the same move, and when something misses you cannot tell whether the prompt is wrong, the workflow is wrong, or the access is wrong.

Step 6. Move from reviewing every output to reviewing samples. By the time the agent has six to eight weeks of clean output across two workflows, the cost of reviewing every run starts to outweigh the value of catching the rare miss.

Move to a sample audit: each Friday, review every output from one of the two workflows, alternating which workflow you review each week. Spot-check anything the agent flags as low confidence.

Keep a running log of any miss that reaches a customer or a stakeholder, because that is the signal that scope expanded faster than capability and the agent needs to step back to full review on that workflow.

What fails when you skip this: you either pay the review tax forever, or you skip review entirely the first day the agent saves you ten hours, and you find out about the misses from a customer.
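The alternating audit is easy to lose track of after a few weeks, so here is one way to make it mechanical. This sketch counts 7-day blocks from the date you started sample audits. The function names and the workflow labels are hypothetical.

```python
from datetime import date


def workflow_to_audit(day: date, start: date, workflows: tuple[str, str]) -> str:
    """Pick the workflow under review for the week containing `day`.

    Weeks are counted in 7-day blocks from `start`, the date sample audits began.
    """
    weeks_elapsed = (day - start).days // 7
    return workflows[weeks_elapsed % 2]


def is_friday(day: date) -> bool:
    return day.weekday() == 4  # Monday is 0, so Friday is 4


start = date(2025, 1, 6)  # a Monday
pair = ("daily-digest", "weekly-competitor-brief")

print(workflow_to_audit(date(2025, 1, 10), start, pair))  # daily-digest
print(workflow_to_audit(date(2025, 1, 17), start, pair))  # weekly-competitor-brief
```

Counting from a fixed start date rather than from the calendar week number keeps the alternation intact across a year boundary.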

By the end of the sixth step the agent is doing two recurring jobs reliably enough that you trust the output most of the time, you have a written job description and access scope you can hand to the next operator on your team to maintain, and you have a calibrated review cadence that catches misses without consuming your week.

Every agent after the first goes through the same six steps, and each one takes less time because the cognitive layer is already in place.

Performance Management Once The Agent Is On The Team

A working agent is the start of the management work, not the end of it.

Every Friday, look at the agent's outputs the way you would review a junior team member's work. Find one place where the output landed and one place where it missed, then figure out what in the system prompt or the context files would have prevented the miss.

The answer goes in the prompt or the file, not in a one-off instruction for the next request. Adding instructions to individual runs is the agent equivalent of correcting a hire one task at a time without ever updating their job description, and the same misses will keep coming because the underlying instructions never changed.

Three months in, the system prompt and context files for any working agent should look noticeably different from the version you wrote in step one. If they look the same, the agent has not been managed. It has only been used.

The Friday ritual from the cognitive folder issue is the exact cadence to put around this work. Twenty minutes a week, one owner per file, edits going into the source rather than into individual prompts.

The agent gets better the way a junior team member gets better, through accumulated calibration that lives somewhere you can see.

The Handoff

The reason the protocol works is that the work is documented. The artifact you can hand to someone else is also what makes the agent feel like real headcount rather than a project that lives in your head.

A clean handoff to the team member taking over an agent has four pieces:

  1. The SOUL.md or the equivalent system prompt content, with a version date at the top.

  2. The access scope document. A short list of what the agent reads from, what it writes to, and what it cannot touch. For external agents, screenshots of the connector configuration belong here.

  3. The review log. Two weeks of full review notes from Step 4 and a running list of any miss that reached a customer or stakeholder. The new owner sees the failure history before they own the agent.

  4. The known failure modes. Three to five ways this specific agent tends to miss, what each symptom looks like, and which step of the protocol to revisit. Built from your own review log over the first six to eight weeks.
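A quick way to confirm that a handoff is complete is to check the four pieces against what is actually in the agent's folder. The file names below are hypothetical placeholders for the four pieces, and `handoff_gaps` is an illustrative helper, not a standard tool.

```python
# Hypothetical file names for the four handoff pieces; rename to fit your folder.
HANDOFF_FILES = {
    "SOUL.md": "job description with a version date at the top",
    "access-scope.md": "what the agent reads, writes, and cannot touch",
    "review-log.md": "full-review notes and any miss that reached a customer",
    "failure-modes.md": "three to five known ways this agent misses",
}


def handoff_gaps(present: set[str]) -> dict[str, str]:
    """Return the handoff pieces still missing, with what each should hold."""
    return {name: why for name, why in HANDOFF_FILES.items() if name not in present}


gaps = handoff_gaps({"SOUL.md", "review-log.md"})
for name, why in gaps.items():
    print(f"Missing {name}: {why}")
```

For an embedded agent, the `present` set could come from listing the agent's folder. For an external agent, you would fill it in by hand from wherever the exports and screenshots live.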

Operators who skip the handoff end up with a personal productivity tool. Operators who document the handoff end up with team capability that compounds as the team grows. The handoff is the move that turns the agent from yours into the business's.

Diagnostic Reference

Most things that go wrong with a working agent map back to one of the six steps. The signal is the symptom. The fix is the step.

| Symptom | What it usually means | Revisit |
| --- | --- | --- |
| Agent confidently produces wrong output | Scope is too broad, the agent is improvising past its real capability | Step 1 |
| First runs are good, output decays over weeks | The system prompt drifted out of date, or context files were not updated as the business changed | Step 1 and the Friday ritual |
| Output is fine but the agent took an action it should not have | Write access was scoped too aggressively, or read access pulled in production data it should not touch | Step 2 |
| The agent's second-run output is worse than its first | The system prompt added contradictory instructions during calibration, or an existing instruction was broken in the rewrite | Step 3 |
| Customer-facing miss reaches a stakeholder | You moved to sample audit before the agent earned it, or the second workflow expanded scope at the same time as access | Step 5 or Step 6 |
| You stopped reviewing entirely the day the agent saved you ten hours | You skipped the move from full review to sample audit and went straight to no review | Step 6 |

If a symptom does not show up here, the cause is almost always context the agent does not have. The fix lives in the cognitive folder, not in a one-off prompt.
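If you keep your review log in a script or a spreadsheet export, the table above reduces to a lookup. This is only a sketch. The short symptom keys are labels I made up for the six rows, and the fallback mirrors the rule that an unlisted symptom usually means missing context.

```python
# Short keys are hypothetical labels for the six symptoms in the table above.
REVISIT = {
    "confidently_wrong": ["Step 1"],
    "decays_over_weeks": ["Step 1", "Friday ritual"],
    "unwanted_action": ["Step 2"],
    "second_run_worse": ["Step 3"],
    "customer_facing_miss": ["Step 5", "Step 6"],
    "stopped_reviewing": ["Step 6"],
}


def where_to_look(symptom: str) -> list[str]:
    """Map a symptom to what to revisit; unknown symptoms point at missing context."""
    return REVISIT.get(symptom, ["Cognitive folder: missing context"])


print(where_to_look("unwanted_action"))  # ['Step 2']
print(where_to_look("something_else"))   # ['Cognitive folder: missing context']
```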

What This Looks Like At Ninety Days

An operator who runs the protocol for one agent in month one, the second in month two, and the third in month three ends the quarter with three working agents covering recurring work that previously sat on the operator or a team member.

Each agent has a documented job description, a calibrated access scope, and a known review cadence. Together they form the start of an agent layer the team can extend without the operator in the room every time.

The operator who tries to deploy three agents at once in week one ends the quarter with three half-working agents, no documented job descriptions, and scope confusion across all three, then pulls back from agents entirely until something forces the question open again.

That is the failure pattern I see most often, and it almost always traces back to skipping the onboarding step on every agent at the same time.

Treat the next one like a new hire, run it through the six steps, and watch how the output changes by week three.

If you have not built the cognitive folder yet, the previous issue ships the full file-and-folder structure plus a Claude Skill that builds it for you in one conversation.

Build the folder first, then run the next agent through this protocol.

Let me know how you make out.

Talk soon,
Sam Woods
The Editor
