Research: How to Build Truly Expert AI Agents — Comprehensive Guide
A Comprehensive Research Document for WiderWings
Research compiled Feb 27, 2026 by Mark Wings
Sources: Anthropic, OpenAI, DeepLearning.AI (Andrew Ng), Lilian Weng (OpenAI), CrewAI, OpenAI Swarm
Executive Summary
Building expert AI agents is not about adding complexity — it is about the right complexity in the right places. The most successful agent implementations use simple, composable patterns rather than heavy frameworks. This document synthesizes best practices from the leading practitioners in the field across six dimensions: Identity & Persona, Instructions & Prompting, Memory, Tools, Workflow Patterns, and Multi-Agent Orchestration.
Key Takeaway: Start simple. Only add complexity when it demonstrably improves outcomes. A well-prompted single agent with the right tools will outperform a poorly-designed multi-agent system every time.
1. IDENTITY & PERSONA
Why It Matters
Setting a role in the system prompt focuses the model's behavior, tone, and decision-making. Even a single sentence ("You are a helpful coding assistant specializing in Python") makes a measurable difference (Anthropic docs). But truly expert agents go much deeper.
Best Practices
a) Define Role + Goal + Backstory (the CrewAI Model)
CrewAI's agent architecture uses three core identity fields that are now considered best practice:
- Role: Defines function and expertise (e.g., "Senior Frontend Developer specializing in Svelte and Tailwind")
- Goal: The individual objective that guides decision-making (e.g., "Build pixel-perfect, accessible UIs that load in under 2 seconds")
- Backstory: Provides context and personality, enriching interactions (e.g., "You've shipped 50+ production Svelte apps. You have strong opinions about component architecture and hate unnecessary dependencies.")
This trio creates a much stronger behavioral anchor than role alone.
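As a minimal sketch of how the three fields combine into a single behavioral anchor, the `Agent` dataclass below is illustrative only (it is not CrewAI's actual API):

```python
from dataclasses import dataclass

@dataclass
class Agent:
    """Illustrative identity container mirroring CrewAI's three fields."""
    role: str
    goal: str
    backstory: str

    def system_prompt(self) -> str:
        # Fold the three identity fields into one system prompt.
        return (
            f"You are a {self.role}.\n"
            f"Your goal: {self.goal}\n"
            f"Backstory: {self.backstory}"
        )

frontend = Agent(
    role="Senior Frontend Developer specializing in Svelte and Tailwind",
    goal="Build pixel-perfect, accessible UIs that load in under 2 seconds",
    backstory=(
        "You've shipped 50+ production Svelte apps. You have strong opinions "
        "about component architecture and hate unnecessary dependencies."
    ),
)
print(frontend.system_prompt())
```

In real CrewAI these fields are passed to its `Agent` constructor; the point here is simply that role, goal, and backstory each contribute a distinct line of steering.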
b) Personality Should Be Specific, Not Generic
- Bad: "You are a helpful assistant"
- Good: "You are a senior backend engineer who values clean APIs, writes comprehensive error handling, and pushes back on scope creep. You prefer PostgreSQL over NoSQL and always consider migration paths."
- The more specific the personality, the more consistent and expert the outputs
c) Include Anti-Patterns
Explicitly state what the agent should NOT do:
- "Never give generic advice. Always ground recommendations in our specific codebase."
- "Don't ask for permission on routine tasks. Just do them and report what you did."
- "Don't be sycophantic. If the proposed approach is wrong, say so directly."
d) Give Them Opinions
Expert humans have preferences and opinions. Expert agents should too:
- "You prefer functional components over class components"
- "You believe in testing the behavior, not the implementation"
- "When given a choice between speed and correctness, always choose correctness first"
Template for Agent Identity
# SOUL.md - [Agent Name]
## Core Role
[1-2 sentences defining what this agent does]
## Expertise
[Specific technologies, domains, skills — be granular]
## Operating Principles
[5-8 principles that guide behavior, including opinions and preferences]
## Anti-Patterns (What NOT to Do)
[3-5 explicit things to avoid]
## Communication Style
[How they talk: formal/casual, verbose/terse, emoji usage, etc.]
## Boundaries
[What's in scope vs. out of scope for this agent]
2. INSTRUCTIONS & PROMPTING
The Golden Rule (Anthropic)
"Show your prompt to a colleague with minimal context on the task and ask them to follow it. If they'd be confused, Claude will be too."
Key Techniques
a) Be Clear and Direct
- Think of the agent as "a brilliant but new employee who lacks context on your norms and workflows"
- The more precisely you explain what you want, the better the result
- Use numbered steps when order matters
- If you want "above and beyond" behavior, explicitly request it
b) Provide Context/Motivation
- Don't just say WHAT to do — say WHY
- Less effective: "Never use ellipses"
- More effective: "Your response will be read aloud by TTS, so never use ellipses since the TTS engine won't know how to pronounce them"
- The model generalizes from the explanation
c) Use Few-Shot Examples (3-5)
Examples are THE most reliable way to steer output. Make them:
- Relevant: Mirror actual use cases
- Diverse: Cover edge cases
- Structured: Wrap in `<example>` tags so the model distinguishes them from instructions
d) Structure with XML Tags
- Wrap different content types in their own tags: `<instructions>`, `<context>`, `<input>`
- Use consistent, descriptive tag names
- Nest tags for hierarchical content
- This is especially important for complex prompts mixing instructions, context, and examples
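The two techniques above compose naturally. A hedged sketch of prompt assembly (the helper and tag names are conventions, not a required API):

```python
# Assemble a prompt using XML tags to separate instructions, context,
# and few-shot examples so the model can tell them apart.
def build_prompt(instructions: str, context: str, examples: list[str]) -> str:
    example_block = "\n".join(f"<example>\n{e}\n</example>" for e in examples)
    return (
        f"<instructions>\n{instructions}\n</instructions>\n"
        f"<context>\n{context}\n</context>\n"
        f"<examples>\n{example_block}\n</examples>"
    )

prompt = build_prompt(
    instructions="Summarize the ticket in one sentence.",
    context="Internal bug tracker for the payments team.",
    examples=[
        "Input: 'Checkout 500s on retry' -> Output: "
        "'Retrying checkout triggers a server error.'"
    ],
)
print(prompt)
```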
e) Developer vs. User Message Hierarchy (OpenAI)
OpenAI's model spec defines a chain of command:
- Developer messages = system's rules and business logic (like a function definition)
- User messages = inputs and configuration (like function arguments)
- Developer messages take priority
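In the familiar chat-message shape, the chain of command looks like this (a sketch; some APIs use `"system"` where others accept a `"developer"` role):

```python
# Developer/system message carries the rules; user messages carry inputs.
messages = [
    {
        "role": "developer",  # "system" on APIs without a developer role
        "content": "You are a refund assistant. Never promise refunds over $100.",
    },
    {"role": "user", "content": "Can I get a $500 refund?"},
]
```

If the user message conflicts with the developer message, the developer message wins.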
f) Prompt Structure (OpenAI Recommended Order)
- Identity: Purpose, communication style, high-level goals
- Instructions: Rules, what to do and not do, tool usage guidance
- Examples: Possible inputs with desired outputs
- Context: Additional information (private data, etc.) — best near the end
g) Long Context: Put Data at Top, Questions at Bottom
- Place long documents/inputs at the top of the prompt
- Place queries, instructions, and examples below
- Queries at the end can improve response quality by up to 30% (Anthropic testing)
3. MEMORY ARCHITECTURE
The Three Types of Agent Memory (Lilian Weng / OpenAI)
Mapping human memory to AI systems:
| Human Memory | AI Equivalent | Implementation |
|---|---|---|
| Sensory Memory | Embedding representations | Raw input processing |
| Short-Term / Working Memory | In-context learning | The conversation window (finite, bounded by context length) |
| Long-Term Memory | External vector store | Database with fast retrieval (RAG, embeddings, structured storage) |
Best Practices for Agent Memory
a) Tiered Memory System
- Session Memory (Working): Current conversation context. Limited by token window.
- Short-Term Persistent: Daily notes, recent task logs. File-based or database.
- Long-Term Curated: Distilled insights, decisions, lessons learned. Regularly maintained.
- Shared Knowledge Base: Team-wide second brain (like our Second Brain at brain.widerwings.com)
b) Memory Maintenance is Critical
- Raw logs accumulate but lose value over time
- Periodic "memory maintenance" sessions where the agent reviews raw notes and distills key learnings into long-term memory
- This is analogous to how humans sleep and consolidate memories
- Schedule this during heartbeats or low-activity periods
c) Memory Search Before Action
Before answering questions about prior work, the agent should always search its memory first:
- Prevents contradictions
- Prevents duplicate work
- Builds on previous decisions rather than reinventing
d) Structured Memory Saves
When saving to memory, use consistent structure:
- Type/Category (research, decision, lesson, task, etc.)
- Title (searchable)
- Content (the substance)
- Project/Context (what it relates to)
- Importance (for prioritization during retrieval)
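A minimal sketch of such a record as a dataclass (the `MemoryEntry` schema and its field names are hypothetical, matching the list above):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class MemoryEntry:
    """Hypothetical structured record for consistent memory saves."""
    type: str        # research, decision, lesson, task, ...
    title: str       # searchable
    content: str     # the substance
    project: str     # what it relates to
    importance: int  # e.g. 1 (low) to 5 (high), used to rank retrieval
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

entry = MemoryEntry(
    type="decision",
    title="Chose PostgreSQL over NoSQL for the billing service",
    content="Relational constraints and migration paths outweighed flexibility.",
    project="billing",
    importance=4,
)
```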
e) Context Window Management
When conversations get long:
- Summarize older messages to free context space
- Use external memory retrieval instead of keeping everything in-context
- CrewAI's `respect_context_window` flag auto-summarizes when nearing limits
4. TOOLS & TOOL DESIGN
Anthropic's Key Insight
"It is crucial to design toolsets and their documentation clearly and thoughtfully." The agent-computer interface (ACI) is as important as the UI is for human users.
Best Practices
a) Prompt Engineer Your Tools
- Tool names should be clear and descriptive
- Tool descriptions should explain WHEN to use them, not just what they do
- Include examples in tool descriptions
- Define parameter types and constraints explicitly
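These points can be illustrated with a tool definition in the JSON-schema style used by most LLM tool APIs (the `search_memory` tool itself is hypothetical):

```python
# Note the description says WHEN to use the tool and includes an example call,
# and the schema constrains parameter types and ranges explicitly.
search_memory_tool = {
    "name": "search_memory",
    "description": (
        "Search the agent's long-term memory. Use this BEFORE answering any "
        "question about prior work or past decisions. "
        "Example: search_memory(query='postgres migration decision', limit=5)"
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text search query."},
            "limit": {
                "type": "integer",
                "minimum": 1,
                "maximum": 20,
                "description": "Max results to return (default 5).",
            },
        },
        "required": ["query"],
    },
}
```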
b) Right-Size the Toolset
- Too few tools = the agent can't accomplish the task
- Too many tools = the agent gets confused about which to use
- Start small, add tools as needed
- Group related tools logically
c) Tools Should Return Useful Feedback
- Success/failure messages should be informative
- Include enough context for the agent to decide next steps
- Error messages should suggest corrective actions
d) Categorize Tools by Risk Level
- Safe (auto-execute): Read files, search, calculate
- Moderate (log): Write files, make API calls
- High (ask permission): Send emails, delete data, external posts
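A sketch of how such risk gating might be enforced in a dispatcher (the tool names and `Risk` enum are illustrative):

```python
from enum import Enum

class Risk(Enum):
    SAFE = "safe"          # auto-execute
    MODERATE = "moderate"  # execute, but log for audit
    HIGH = "high"          # require human approval first

# Hypothetical mapping from tool name to risk level.
TOOL_RISK = {
    "read_file": Risk.SAFE,
    "write_file": Risk.MODERATE,
    "send_email": Risk.HIGH,
}

def dispatch(tool_name: str, approved: bool = False) -> str:
    # Unknown tools default to HIGH: fail closed, not open.
    risk = TOOL_RISK.get(tool_name, Risk.HIGH)
    if risk is Risk.HIGH and not approved:
        return "blocked: awaiting permission"
    if risk is Risk.MODERATE:
        print(f"audit: executing {tool_name}")
    return "executed"
```

Defaulting unknown tools to the highest risk tier keeps newly added tools safe until someone classifies them.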
5. WORKFLOW PATTERNS
The Anthropic Taxonomy (from "Building Effective Agents")
Anthropic identifies a progression of patterns, from simple to complex. Only increase complexity when it demonstrably improves outcomes.
Pattern 1: Prompt Chaining
- Decompose a task into sequential steps
- Each LLM call processes the output of the previous one
- Add programmatic checks ("gates") at intermediate steps
- Use when: Task cleanly decomposes into fixed subtasks
- Example: Generate marketing copy → translate it → review translation
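The marketing-copy example can be sketched as a chain with one programmatic gate; `call_llm` is a stand-in for a real model call, not any particular SDK:

```python
def call_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"[output for: {prompt}]"

def chain(task: str) -> str:
    copy = call_llm(f"Write marketing copy for: {task}")
    # Gate: a cheap programmatic check before the next (expensive) step.
    if len(copy) < 10:
        raise ValueError("Copy too short; halting the chain.")
    translation = call_llm(f"Translate to German: {copy}")
    review = call_llm(f"Review this translation for accuracy: {translation}")
    return review
```

Each step consumes the previous step's output, and the gate stops the chain early instead of wasting calls on a bad intermediate result.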
Pattern 2: Routing
- Classify input and direct to specialized handler
- Allows separation of concerns and specialized prompts
- Use when: Distinct categories that benefit from separate handling
- Example: Route easy questions to Haiku, hard ones to Opus
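The routing example can be sketched like so (the classifier heuristic and model names are placeholders; in practice classification is itself a cheap LLM call):

```python
def classify(question: str) -> str:
    # Stand-in classifier; real systems would use a small model here.
    hard_markers = ("architecture", "debug", "design")
    return "hard" if any(m in question.lower() for m in hard_markers) else "easy"

# Hypothetical model identifiers: cheap questions go to a small model.
MODEL_BY_ROUTE = {"easy": "claude-haiku", "hard": "claude-opus"}

def route(question: str) -> str:
    return MODEL_BY_ROUTE[classify(question)]
```

The win is separation of concerns: each handler gets its own specialized prompt, and cost tracks difficulty.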
Pattern 3: Parallelization
- Sectioning: Break task into independent subtasks run in parallel
- Voting: Run same task multiple times for diverse outputs
- Use when: Subtasks can be parallelized, or multiple perspectives improve confidence
- Example: One agent processes user query while another screens for inappropriate content
Pattern 4: Orchestrator-Workers
- Central LLM dynamically breaks down tasks and delegates to worker LLMs
- Key difference from parallelization: subtasks aren't pre-defined
- Use when: Can't predict subtasks needed (e.g., complex coding changes)
- Example: Coding products that modify multiple files per task
Pattern 5: Evaluator-Optimizer
- One LLM generates, another evaluates, in a loop
- Use when: Clear evaluation criteria exist and iteration provides measurable value
- Example: Literary translation with iterative refinement
Andrew Ng's Four Agentic Design Patterns
Reflection: LLM examines its own work and improves it. Simple to implement, surprising performance gains. Can be single-agent (self-critique) or multi-agent (generator + critic).
Tool Use: LLM uses web search, code execution, APIs to gather information and take action.
Planning: LLM creates and executes multi-step plans. More unpredictable but powerful for complex tasks.
Multi-Agent Collaboration: Different agents with different roles collaborate. More below.
Critical Insight: Iteration Beats Sophistication
Andrew Ng's HumanEval benchmark results:
- GPT-3.5 zero-shot: 48.1% correct
- GPT-4 zero-shot: 67.0% correct
- GPT-3.5 with agent loop: 95.1% correct
An older model with agentic workflow dramatically outperforms a newer model without one. The workflow architecture matters more than the model.
6. MULTI-AGENT ORCHESTRATION
Why Multiple Agents Work (Even with the Same LLM)
Andrew Ng offers three reasons why prompting the same LLM as different agents outperforms a single agent:
- It empirically works. Ablation studies (e.g., AutoGen paper) confirm superior performance.
- Focus beats breadth. Even with long context windows, LLMs understand focused tasks better. By decomposing into roles, you can optimize each subtask individually.
- Developer abstraction. Multi-agent design gives developers a natural framework for breaking down complex tasks — like splitting a project across team members with different specialties.
The Handoff Pattern (OpenAI Swarm)
A key concept: agents "hand off" conversations to other agents:
- Each agent has: name, model, instructions, tools
- When an agent needs help outside its domain, it transfers the conversation
- The receiving agent gets full conversation history
- Simple, powerful, controllable
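A minimal sketch of the handoff idea (this mirrors Swarm's shape but is not its API; the keyword-matching trigger is a placeholder for a real model decision):

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    instructions: str
    # Handoff targets this agent may transfer the conversation to.
    handoffs: dict[str, "Agent"] = field(default_factory=dict)

def run(agent: Agent, history: list[dict]) -> Agent:
    """Stand-in turn: if the last message hits another domain, hand off."""
    last = history[-1]["content"].lower()
    for domain, target in agent.handoffs.items():
        if domain in last:
            # The receiving agent gets the full conversation history.
            return target
    return agent

billing = Agent("billing", "Handle invoices and refunds.")
triage = Agent("triage", "Classify requests.", handoffs={"refund": billing})

history = [{"role": "user", "content": "I need a refund for last month."}]
active = run(triage, history)
```

Because the whole history transfers with the handoff, the receiving agent needs no re-briefing, which is what keeps the pattern simple and controllable.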
Practical Multi-Agent Design
a) Define Clear Boundaries
Each agent should have:
- A specific domain of expertise
- Clear input/output expectations
- Defined handoff conditions (when to escalate or transfer)
b) Avoid Over-Decomposition
- Don't create 10 agents when 3 will do
- Each agent should have enough scope to be useful independently
- If an agent's job description is one sentence, it's probably too narrow
c) Communication Protocol
- Standardize how agents share information (shared memory, message passing, structured handoffs)
- Define what context transfers with a handoff
- Use a coordinator/orchestrator agent for complex workflows
d) The Manager Analogy
Andrew Ng: "In many companies, managers decide what roles to hire, then how to split complex projects into smaller tasks assigned to employees with different specialties. Using multiple agents is analogous."
7. PRACTICAL RECOMMENDATIONS FOR WIDERWINGS
Based on this research, here's what we should consider for our agent team:
A. Enhance Agent Identity Files
Every agent should have a rich SOUL.md with:
- Specific role, goal, and backstory
- Strong opinions and preferences relevant to their domain
- Anti-patterns (what NOT to do)
- Communication style
- Clear boundaries
B. Add Few-Shot Examples
For each agent's core tasks, include 3-5 examples of ideal outputs:
- Kai: Example component implementations with ideal structure
- Atlas: Example API designs with ideal patterns
- Maya: Example blog posts with ideal tone and SEO structure
- Mark: Example research documents with ideal depth
C. Implement Reflection Loops
For important outputs, have agents self-critique before delivering:
- Code agents: Review own code for bugs, style, edge cases
- Content agents: Review own content for accuracy, tone, completeness
- This can be done within a single agent turn ("Now review what you just wrote...")
D. Strengthen Memory Practices
- All agents should search Second Brain before starting any task
- All agents should save results to Second Brain immediately when done
- Schedule periodic memory maintenance (distill daily notes into long-term insights)
- Standardize memory save format across agents
E. Right-Size Tool Access
- Each agent should have only the tools relevant to their domain
- Avoid giving every agent every tool
- Document tools thoroughly — tool descriptions are prompts too
F. Use the Simplest Pattern That Works
- Default to single-agent execution
- Use prompt chaining for multi-step tasks with clear sequence
- Use orchestrator-workers only for truly dynamic decomposition
- Add reflection when output quality is critical
G. Evaluate and Iterate
- Build evals: define what "good output" looks like for each agent
- Test prompts before deploying
- Pin model versions for consistency
- Measure before and after when making changes
Sources
- Anthropic — "Building Effective Agents" (2024): https://www.anthropic.com/engineering/building-effective-agents
- Anthropic — Claude Prompting Best Practices (2026): https://platform.claude.com/docs/en/build-with-claude/prompt-engineering/claude-prompting-best-practices
- OpenAI — Prompt Engineering Guide (2026): https://developers.openai.com/api/docs/guides/prompt-engineering
- OpenAI — Orchestrating Agents: Routines and Handoffs (Swarm): https://developers.openai.com/cookbook/examples/orchestrating_agents
- Lilian Weng (OpenAI) — LLM Powered Autonomous Agents (2023): https://lilianweng.github.io/posts/2023-06-23-agent/
- Andrew Ng / DeepLearning.AI — Agentic Design Patterns Parts 1-5 (2024): https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/
- CrewAI — Agent Documentation (2026): https://docs.crewai.com/concepts/agents
This is a living document. As we refine our agent architecture, update with lessons learned.
Created: Fri, Feb 27, 2026, 1:03 PM by mark
Updated: Fri, Feb 27, 2026, 1:03 PM