
Good morning.
I've had this conversation 20+ times in the past 3 months alone:
An online business operator (often SaaS, ecommerce, or an agency; it's a mix) is building tools and wants agents running ads, either for their own business or, if they're an agency, for clients.
One friend of mine had a couple of CrewAI and Paperclip instances handling everything (these are just two different kinds of agent frameworks).
Usually, it was one agent and one set of instructions. And he wanted it to research audiences, analyze performance data, write copy, generate creative, and manage the whole campaign.
We spent about 40 minutes on a call, and by the end he had a five-agent architecture with an orchestrator, a folder structure, and a two-cycle approach to campaign management.
This issue is that conversation, reconstructed as a walkthrough. I'll show you what we built and explain why each decision was made.
If you've been running agents as solo operators doing everything, this is how you restructure them into a team.
And before you think "Well, I don't run ads, this doesn't apply to me," this kind of agent architecture works for any agent team managing any kind of online marketing. The ads are the example. The architecture is the point.
I want you to see the meta of agent teams, which is what this issue is really about.
— Sam

One Agent, Five Cognitive Tasks
My friend's instinct was reasonable. He'd had success with a single agent handling other workflows. Landing pages, order forms, email automations. Those are structured, sequential tasks. The agent gets an input, follows a process, and produces an output.
But ad operations is different. There are at least five distinct cognitive tasks involved in running ads well: ingesting and cleaning performance data, analyzing that data to find patterns, writing copy based on those patterns, creating visuals that match the copy, and directing the whole campaign toward a business objective.
Those are different kinds of thinking, and the differences aren't cosmetic. A human ad team wouldn't give one person all five jobs. The analyst and the copywriter think differently. They use different frameworks, different evaluation criteria, and different definitions of "good."
When you collapse all of that into a single agent, you get an agent that does each task at maybe 60% quality because it's context-switching constantly and holding too many competing objectives in one system prompt.
Here's what happens when you try it: the agent starts blending tasks. It writes copy that reads like an analysis report. It generates images based on metrics rather than emotional angles. It "analyzes" by rewriting the data in paragraph form instead of finding patterns. The solution is to separate the cognitive tasks and give each one to a specialist.
The Architecture
Here's what we built on the call. One primary orchestrator agent with four sub-agents (expandable to five or six depending on the operation).
The Orchestrator (Ad Director)
This is the primary agent. It doesn't write ads or analyze data. It takes business objectives, breaks them into tasks, delegates those tasks to sub-agents, and synthesizes what comes back into a coherent campaign recommendation. The closest human equivalent is a creative director, not a copywriter.
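To make the delegation pattern concrete, here's a minimal Python sketch. Everything in it (the `Task` shape, the agent names, the fixed plan) is hypothetical scaffolding, not a real framework API; a real orchestrator would reason over the objective instead of returning a fixed list.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    agent: str        # which sub-agent owns this piece of work
    deliverable: str  # the bounded output it must return

@dataclass
class Orchestrator:
    """Breaks a business objective into bounded tasks, one per specialist."""
    sub_agents: list[str] = field(default_factory=lambda: [
        "data-ingestion", "analyst", "copywriter", "creative"])

    def plan(self, objective: str) -> list[Task]:
        # Fixed campaign-cycle plan for illustration only.
        return [
            Task("data-ingestion", f"fresh, verified performance data for: {objective}"),
            Task("analyst", "facts + hypotheses from the latest data"),
            Task("copywriter", "copy variations for the leading hypothesis"),
            Task("creative", "image concepts matching the new copy"),
        ]

plan = Orchestrator().plan("lower CPA on the lead-gen campaign")
for t in plan:
    print(t.agent, "->", t.deliverable)
```

The point of the sketch: the orchestrator never produces copy or analysis itself. Its only outputs are tasks and, later, a synthesis of what comes back.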
Sub-Agent 1: Data Ingestion Manager
This agent does one thing: it pulls performance data from Meta, cleans it, verifies it, and maintains it in a shared spreadsheet. Think of it as the librarian of the operation.
A hard lesson here. Last year we used CrewAI for an agent project and didn't think to protect the source data. After about a month, agents had corrupted the performance data in the shared sheet. They'd been making small edits, reformatting numbers, filling in blanks with estimates. None of it was malicious. Models get creative. They see a gap and they fill it. The problem was that nobody told them not to, and by the time we noticed, the data was unreliable.
The fix: the data ingestion agent creates and maintains a source-of-truth sheet that no other agent touches. If other agents need to work with the data, the ingestion agent creates a working copy for them. Two sheets minimum: one is the source of truth, the other is shared.
This agent also runs a weekly verification routine. It pulls fresh data from Meta, compares it against the source sheet, and flags any discrepancies. Errors creep in and formats shift. The verification catches it before the analysis agent builds recommendations on bad numbers.
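The verification routine boils down to a diff between the source sheet and a fresh pull. Here's a hedged sketch with toy dicts standing in for both sheets; the function name and data shape are assumptions, not a real Meta API:

```python
def verify_source_sheet(source: dict[str, dict], fresh: dict[str, dict],
                        tolerance: float = 0.0) -> list[str]:
    """Compare the source-of-truth sheet against a fresh pull; return discrepancies."""
    issues = []
    for ad_id, fresh_row in fresh.items():
        if ad_id not in source:
            issues.append(f"{ad_id}: missing from source sheet")
            continue
        for metric, fresh_val in fresh_row.items():
            src_val = source[ad_id].get(metric)
            if src_val is None:
                issues.append(f"{ad_id}.{metric}: metric missing in source")
            elif abs(src_val - fresh_val) > tolerance:
                issues.append(f"{ad_id}.{metric}: source={src_val}, fresh={fresh_val}")
    return issues

source = {"ad1": {"spend": 100.0, "clicks": 40}}
fresh  = {"ad1": {"spend": 100.0, "clicks": 42},   # someone "fixed" the clicks
          "ad2": {"spend": 55.0, "clicks": 10}}    # new ad not yet ingested
print(verify_source_sheet(source, fresh))
```

Anything the check flags goes to the ingestion agent (or a human) before the analyst ever sees the numbers.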
Sub-Agent 2: Analyst
This agent exists to analyze creative performance and campaign metrics. It reads the data and produces interpretation. Which themes are winning, which angles are underperforming, and where the spend is efficient and where it's wasted. Also, what the data suggests about audience behavior.
The analyst does not write ads. That's a different cognitive task. The analyst produces hypotheses based on facts:
"Theme A is outperforming Theme C by 40% on CTR and 25% on CPA. The negative-angle expressions within Theme A are driving most of that performance. Hypothesis: the audience responds more to loss-framing than aspiration-framing for this offer."
That distinction between facts and hypotheses matters. The copywriter agent needs both. Facts tell it what happened. Hypotheses tell it what to try next. If the analyst only reports facts, the copywriter has to do its own interpretation (and it will do it badly, because interpretation isn't its specialty). If the analyst only reports hypotheses without grounding them in data, the copywriter might chase angles that aren't supported by performance.
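One way to keep that distinction from eroding is to hard-code it into the analyst's output format. A minimal sketch (the `AnalystReport` shape is an assumption; the example strings paraphrase the report above):

```python
from dataclasses import dataclass

@dataclass
class AnalystReport:
    """Separates observed data from interpretation, so the copywriter always
    knows which statements are grounded and which are bets."""
    facts: list[str]        # observed performance data only
    hypotheses: list[str]   # interpretations, each tied to a fact

report = AnalystReport(
    facts=[
        "Theme A beats Theme C by 40% on CTR and 25% on CPA",
        "Negative-angle expressions drive most of Theme A's lift",
    ],
    hypotheses=[
        "Audience responds to loss-framing over aspiration-framing for this offer",
    ],
)
# A report missing either half is incomplete by definition.
print(len(report.facts), "facts,", len(report.hypotheses), "hypotheses")
```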
Sub-Agent 3: Copywriter
This agent writes ad copy and nothing else. That means headlines, body text, hooks, calls to action, and test variations. It produces text and only text.
It references the analyst's output for direction. It references the client's knowledge base (audience avatar, offer details, best and worst performing ad examples) for voice and positioning. It produces multiple expressions of each theme, across different emotional angles.
Sub-Agent 4: Creative Director (Visuals)
This agent produces image concepts and visual direction for ads. It looks at the copy from the copywriter agent and generates visual concepts that match the messaging angle, the audience profile, and whatever direct response performance criteria you've established for creative.
If you eventually do video, that should be its own agent. Video is a different enough medium that the visual agent shouldn't be responsible for both static and motion creative.
Optional Sub-Agents:
An audience research agent, if you need ongoing research beyond the initial client brief. Its only job is gathering and organizing audience intelligence.
A landing page agent, if you want the system to also match landing pages to ad campaigns. Different enough from copywriting that it deserves its own agent.
The Folder Structure
Each agent needs its own folder, and inside each folder you put its configuration and skills MD files.
Skills files contain the methods and procedures for how that agent does its job. The copywriter agent might have a skill file for writing hooks, another for writing body copy, another for structuring A/B test variations. The analyst might have a skill file for weekly performance review, another for theme comparison analysis, another for budget allocation recommendations.
The system prompt for each agent references its skills. It doesn't contain them. The system prompt says "you have these skills available" and points to the folder. The agent reads the relevant skill file when it needs the method.
Why separate the skills from the system prompt? Two reasons.
First, system prompts should be concise. When you stuff a system prompt with detailed procedures, it becomes a wall of text that the model struggles to prioritize.
Second, skills can be updated independently. You can refine the copywriter's hook-writing method without touching its system prompt. You can add a new analysis framework to the analyst without restructuring its identity.
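Mechanically, "references its skills" just means the agent reads the relevant file at run time. A sketch of that lookup, using a throwaway temp directory in place of the real `ad-team/` folder (paths and file contents here are hypothetical):

```python
import tempfile
from pathlib import Path

def load_skill(agent_dir: Path, skill: str) -> str:
    """Read a skill file only when the agent needs the method,
    keeping the system prompt itself short."""
    return (agent_dir / "skills" / f"{skill}.md").read_text()

# Demo: a throwaway folder standing in for ad-team/copywriter/.
with tempfile.TemporaryDirectory() as tmp:
    agent = Path(tmp) / "copywriter"
    (agent / "skills").mkdir(parents=True)
    (agent / "skills" / "write-hooks.md").write_text("# Hook-writing method\n...")
    method = load_skill(agent, "write-hooks")

print(method.splitlines()[0])
```

Updating a skill is then just editing one markdown file; the system prompt never changes.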
This is one way, out of several, that you can structure this:
ad-team/
  orchestrator/
    system-prompt.md
  data-ingestion/
    system-prompt.md
    skills/
      pull-meta-data.md
      verify-source-sheet.md
      weekly-integrity-check.md
  analyst/
    system-prompt.md
    skills/
      weekly-performance-review.md
      theme-comparison.md
      budget-recommendations.md
  copywriter/
    system-prompt.md
    skills/
      write-hooks.md
      write-body-copy.md
      ab-test-variations.md
  creative/
    system-prompt.md
    skills/
      image-concepts.md
      ad-format-specs.md

Separately from the agent folders, you need a client knowledge base. This is a shared folder that contains reference material about the client and their audience. Not every agent needs access to every file in this folder.
The root folder is named for your business (whatever it's called) or, if you're an agency, for the client.
your-business-or-client/
  offers.md (what the ads are promoting)
  audience-avatar.md (who the ads are targeting)
  best-ads.md (high-performing examples with analysis of why they worked)
  worst-ads.md (low-performing examples with analysis of why they failed)
  landing-pages.md (reference for what the ads point to)
  brand-voice.md (tone, language patterns, words to use and avoid)

The data ingestion agent doesn't need to know the client's brand voice. It just pulls numbers. The copywriter agent needs the avatar, the offers, the best and worst ads, and the brand voice. The analyst needs the offers and probably the best and worst ads for context. The creative agent needs the avatar, brand voice, and best ads for visual reference.
In each agent's system prompt, you specify which files from the knowledge base it should reference. The orchestrator has access to everything because it needs the full picture to delegate effectively.
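You can make those access decisions explicit with a simple map before you write them into the system prompts. The table below is a hypothetical encoding of the assignments described above, not a framework feature:

```python
# Which knowledge-base files each agent may reference (hypothetical map).
KB_ACCESS = {
    "orchestrator":   ["offers.md", "audience-avatar.md", "best-ads.md",
                       "worst-ads.md", "landing-pages.md", "brand-voice.md"],
    "data-ingestion": [],  # pulls numbers only; no brand context needed
    "analyst":        ["offers.md", "best-ads.md", "worst-ads.md"],
    "copywriter":     ["offers.md", "audience-avatar.md", "best-ads.md",
                       "worst-ads.md", "brand-voice.md"],
    "creative":       ["audience-avatar.md", "brand-voice.md", "best-ads.md"],
}

def allowed(agent: str, kb_file: str) -> bool:
    return kb_file in KB_ACCESS.get(agent, [])

print(allowed("copywriter", "brand-voice.md"))      # copy needs voice
print(allowed("data-ingestion", "brand-voice.md"))  # ingestion does not
```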
As the campaign runs and produces results, the knowledge base grows:
Winning ads get added to best-ads.md.
Losing ads get added to worst-ads.md.
New audience insights from the analyst get incorporated into the avatar.
The knowledge base is a living document, so to speak.
The Orchestrator's System Prompt
The orchestrator is the most important agent to get right. Here are the five categories we use in the system prompt, with annotations.
1. Role (one line)
You are the Ad Campaign Director for [Client Name]'s Facebook advertising operations.
That's all you need. One line sets the agent's identity without overloading it.
2. Mission (two lines max)
Turn business goals into concrete advertising plans by delegating specialized work to sub-agents and synthesizing results into clear campaign recommendations. Ensure every recommendation connects back to the stated business objective.
The mission tells the agent what it's ultimately responsible for. Everything else flows from this.
3. Core Responsibilities
Clarify the campaign objective (lead generation, direct purchase, retargeting, brand awareness)
Decide which sub-agents to activate for each task
Break work into bounded tasks with clear deliverables for each sub-agent
Synthesize sub-agent outputs into unified campaign recommendations
Track which cycle the campaign is in (validation or optimization) and adjust delegation accordingly
These are the things the orchestrator actually does. Every responsibility is a concrete action, not a general principle.
4. Delegation Rules
Use Data Ingestion Agent for pulling, cleaning, verifying, and maintaining campaign performance data
Use Analyst Agent for metric interpretation, performance diagnosis, scaling decisions, and test prioritization
Use Copywriter Agent for hooks, headlines, body copy, calls to action, and test variations
Use Creative Agent for image concepts, visual direction, and ad format specifications
Keep each sub-agent inside its own role. The analyst produces analysis. The copywriter produces copy. Each agent stays in its lane.
The delegation rules map each sub-agent to its cognitive task. The last line is the most important one. Role contamination is the primary failure mode of multi-agent systems. The analyst starts suggesting headlines. The copywriter starts interpreting data. The orchestrator needs explicit instructions to prevent this.
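One way to think about the delegation rules is as a lookup table with no fallback: every task type maps to exactly one sub-agent, and anything unmapped gets escalated instead of improvised. A sketch (task-type names are hypothetical):

```python
# Hypothetical delegation table encoding the rules above.
DELEGATION = {
    "pull_data": "data-ingestion",
    "verify_data": "data-ingestion",
    "interpret_metrics": "analyst",
    "prioritize_tests": "analyst",
    "write_hooks": "copywriter",
    "write_body_copy": "copywriter",
    "image_concepts": "creative",
}

def delegate(task_type: str) -> str:
    """Route a task to exactly one sub-agent; unknown work is escalated,
    never handed to whichever agent seems closest."""
    agent = DELEGATION.get(task_type)
    if agent is None:
        raise ValueError(f"No sub-agent owns '{task_type}'; escalate to a human")
    return agent

print(delegate("write_hooks"))
```

Raising on unknown task types is the code-level version of "each agent stays in its lane": the system fails loudly rather than letting roles blur.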
5. Reasoning Rules
Start every campaign cycle by identifying the objective type (lead gen, purchase, retargeting, booking)
Connect every ad recommendation to a stated customer problem or desire from the audience avatar
Instruct the analyst to distinguish between facts (observed performance data) and hypotheses (interpretations of that data)
When synthesizing sub-agent outputs, flag any conflicts between what the analyst recommends and what the copywriter produced
Treat the brief.md file as the single source of truth for campaign objectives
The reasoning rules tell the orchestrator how to think. This is where you encode your own strategic judgment into the system. If you know that bottom-of-funnel metrics matter more than click-through rates for this client, say so here. If you know that the client's audience responds better to story-driven copy than benefit-driven copy, that reasoning goes here.
Two Cycles of Work
The orchestrator needs to know that ad campaigns have phases, and the work changes between them.
Cycle 1: Validation
This is where you start every new campaign. The goal is to find what works. You're testing which themes resonate, which offers convert, and which audiences respond. Everything at this stage is a hypothesis. The sub-agents are producing volume: multiple themes, multiple angles, multiple expressions. The analyst is looking for signal in the early data. The orchestrator is making decisions about what to test next based on what the analyst reports.
Cycle 2: Optimization
When validation produces a winner (however you define that), the orchestrator shifts the team into optimization. Now the sub-agents are refining instead of exploring. The copywriter is writing variations on the winning theme. The creative agent is testing visual approaches for the winning angle. The analyst is watching for diminishing returns and recommending when to expand.
Both cycles use the same brief.md file. The brief contains the business objectives, the audience, the offers, the budget, and the success criteria. In Cycle 1, the brief gives the team its starting hypotheses. In Cycle 2, the brief gives the team its optimization targets.
You define the transition trigger in the orchestrator's system prompt. Something like: "When a theme achieves a CPA below $X and maintains that CPA across 1,000+ impressions for 7 consecutive days, flag it as validated and recommend shifting to Cycle 2 for that theme."
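That trigger is easy to sketch as a function. This version reads "1,000+ impressions for 7 consecutive days" as a total across the window; if you mean 1,000 per day, tighten the check accordingly. All thresholds and the function name are illustrative:

```python
def validated(daily_cpa: list[float], daily_impressions: list[int],
              target_cpa: float, min_impressions: int = 1000,
              days: int = 7) -> bool:
    """Flag a theme as validated when CPA stays under target for `days`
    consecutive days with enough total impressions across the window."""
    if len(daily_cpa) < days:
        return False
    recent_cpa = daily_cpa[-days:]
    recent_imps = daily_impressions[-days:]
    return (all(c < target_cpa for c in recent_cpa)
            and sum(recent_imps) >= min_impressions)

cpa  = [22, 19, 18, 17, 16, 15, 15, 14]
imps = [180, 210, 250, 300, 320, 350, 400, 420]
print(validated(cpa, imps, target_cpa=20))
```

The orchestrator doesn't need to run code like this itself; the point is that the trigger in its system prompt should be this unambiguous.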
Without this two-cycle structure, the agent team either never stops testing (burning budget on exploration when they've already found a winner) or never starts testing (optimizing a single approach without knowing if better options exist).
The Ad Testing Method
This is the methodology I often share for structuring tests. It applies whether agents run the tests or humans do.
Start with four or five distinct themes. A theme is a big idea. An angle on the problem or the solution that's different enough from the other themes that there's no confusion between them.
Think of it this way: If you're selling a weight loss program, one theme might be built around GLP-1 medications. Another around nutrition. A third around exercise. A fourth around sleep and recovery. Each theme approaches the same outcome (weight loss) through a completely different lens. If someone reads the ad, they know immediately which theme it belongs to.
Within each theme, you write four or five expressions. An expression is a different way of communicating the same theme. For the GLP-1 theme, one expression might focus on the life you'll have after treatment. Another one on the daily struggles you'll leave behind. A third on the science of how it works. Same theme, different emotional approach.
So you're testing 4-5 themes times 4-5 expressions each. That's 16-25 ads in the initial validation batch. That's enough volume to find a signal without generating so many ads that you can't analyze the results.
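The batch is just the cross product of themes and expressions. A sketch using the weight-loss example (the fourth expression, "social-proof", is an added hypothetical to round out the grid):

```python
from itertools import product

themes = ["glp1", "nutrition", "exercise", "sleep"]
expressions = ["life-after", "struggles-left-behind", "the-science", "social-proof"]

# Each (theme, expression) pair becomes one ad in the validation batch.
batch = [f"{t}--{e}" for t, e in product(themes, expressions)]
print(len(batch))  # 4 themes x 4 expressions = 16 ads
```

Naming each ad `theme--expression` also makes the analyst's job easier later: performance rolls up by theme with a string split, no lookup table required.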
The critical decision is where you start in the funnel: top, middle, or bottom.
We always start at the bottom. Test for the purchase first (or whatever the final conversion is). Don't start with lead gen and hope the leads convert later. Start with the people who are ready to buy right now. Find the theme that gets purchases. Then widen the net.
Once you know which theme drives purchases, you can test lead gen campaigns for that same theme. Now you're generating leads who are aligned with the messaging that already converts. The math works forward from a known quantity instead of backward from a hope.
Defined Agents vs. Swarms
I often get asked about agent swarms, so here's a distinction that matters.
Defined agents are what we built above. Each agent has a specific role, a specific system prompt, specific skills, and specific boundaries. You control what each agent does and how it does it. The quality of output is consistent because the instructions are consistent.
Swarms are different. You define one or a few primary agents and give them the ability to spin up as many sub-agents as they need for a task. The primary agent decides how many sub-agents, what they do, and how they work. You get volume, but you lose control over how each sub-agent works.
Swarms work well for research and enrichment. If you need to research 500 companies and pull specific data points on each one, a swarm can spin up hundreds of research agents simultaneously. The variance in quality is acceptable because you're collecting data, not producing creative work.
Swarms are the wrong choice for ad operations because you need consistent output quality. You need the copywriter writing the same way every time, not however a dynamically-spun-up agent decides to write on a given run. Defined agents with clear boundaries are the right architecture for anything where the output quality matters to the end customer.
How to Start This Week
If you're running ads (or planning to) and want to build this system, here's the sequence:
Set up a dedicated OpenClaw, Paperclip, Hermes Agent, or any other agent framework instance for the ad team. Don't mix this with your other agents. This instance is purpose-built.
Then create the folder structure. Set up the agent folders with skills files inside each one, and create the client knowledge base folder with whatever reference material you have. Even if the skills files are rough drafts at first, the structure matters more than the polish.
Write the orchestrator's system prompt using the five categories above. Start with the delegation rules and reasoning rules, because those two sections do most of the work.
Build out one sub-agent at a time. Start with data ingestion (because everything else depends on good data) and the copywriter (because that's where you'll see the most immediate output). Add the analyst once you have data flowing. Add creative when the copy is producing results worth pairing with images.
Finally, write the brief.md file with your first campaign objectives. Define what Cycle 1 validation looks like and what triggers the shift to Cycle 2.
You'll iterate on all of this. The first version will be rough. The skills files will get rewritten after you see what the agents actually produce. The system prompts will get tighter as you learn what instructions the agents follow well and which ones they ignore. That's normal. The architecture holds even when the details change.

All three of these patterns share the same root:
Operators applying a hiring mental model to an infrastructure problem.
When you hire someone, you hand them a role and trust them to figure it out.
When you deploy infrastructure, you scope it to a specific function, monitor its output, and keep a human in the approval chain.
Agents are infrastructure. The operators who treat them that way are the ones whose systems actually compound.
If any of these patterns look familiar, the fix is an operating cadence.
Until next time,
Sam Woods
The Editor

