Content is user-generated and unverified.

Behavior Trees for AI Coding Agents

Core Concepts

A behavior tree is a hierarchical control structure that determines which actions an agent should take based on conditions and priorities. Unlike static task lists, behavior trees provide reactive decision-making with built-in fallback strategies.

Key Advantages for AI Agents

  1. Graceful failure handling - Fallback strategies encoded in the tree structure
  2. Reduced LLM calls - Logic flows through the tree without constant replanning
  3. Debuggability - Clear audit trail of decisions ("Why did it do that?")
  4. Modularity - Reusable subtrees for common patterns
  5. Reactivity - Responds to changing conditions without full replanning

Node Types

Selector (Fallback) Node

  • Symbol: ?
  • Behavior: Tries each child from left to right until one succeeds
  • Returns: Success if any child succeeds, Failure if all children fail
  • Use case: "Try plan A, failing that try B, failing that try C..."
Selector: Handle Test Failure
  → Try quick fix (syntax error, typo)
  → Try deeper analysis (logic error)
  → Add debug logging and report
  → Escalate to human
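
To make the selector semantics concrete, here is a minimal Python sketch. It assumes a node is simply a zero-argument function that returns a status. `Status`, `selector`, and the `strategy` helper are illustrative names, not any particular library's API, and the strategies are hypothetical stand-ins for the steps above.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


# Simplification for this sketch: a node is any zero-argument callable returning a Status.
Node = Callable[[], Status]


def selector(children: List[Node]) -> Node:
    """Fallback node: tick children left to right until one does not fail."""
    def tick() -> Status:
        for child in children:
            status = child()
            if status != Status.FAILURE:
                return status  # SUCCESS or RUNNING ends the search for an alternative
        return Status.FAILURE  # every alternative failed
    return tick


attempts: List[str] = []  # records which strategies were actually tried


def strategy(name: str, result: Status) -> Node:
    """Hypothetical leaf that logs its name and returns a fixed result."""
    def tick() -> Status:
        attempts.append(name)
        return result
    return tick


handle_test_failure = selector([
    strategy("quick fix", Status.FAILURE),
    strategy("deeper analysis", Status.SUCCESS),
    strategy("add debug logging and report", Status.SUCCESS),
    strategy("escalate to human", Status.SUCCESS),
])

result = handle_test_failure()
```

Here the quick fix fails, deeper analysis succeeds, and the last two strategies never run, so `attempts` ends up as `["quick fix", "deeper analysis"]`.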

Sequence Node

  • Symbol: →
  • Behavior: Executes each child in order until one fails
  • Returns: Success if all children succeed, Failure if any child fails
  • Use case: Multi-step processes where all steps must complete
Sequence: Implement Feature
  → Understand requirements
  → Check existing codebase
  → Write implementation
  → Run tests
  → Verify tests pass
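
A matching sketch of the sequence node, under the same simplification (nodes as zero-argument functions; all names are hypothetical). Here the test step fails, so the sequence stops before verification runs.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


Node = Callable[[], Status]  # simplification: nodes are zero-argument callables


def sequence(children: List[Node]) -> Node:
    """Sequence node: tick children in order until one does not succeed."""
    def tick() -> Status:
        for child in children:
            status = child()
            if status != Status.SUCCESS:
                return status  # FAILURE aborts the sequence; RUNNING pauses it
        return Status.SUCCESS  # every step completed
    return tick


executed: List[str] = []


def step(name: str, ok: bool = True) -> Node:
    """Hypothetical step that logs its name and succeeds unless ok=False."""
    def tick() -> Status:
        executed.append(name)
        return Status.SUCCESS if ok else Status.FAILURE
    return tick


implement_feature = sequence([
    step("understand requirements"),
    step("check existing codebase"),
    step("write implementation"),
    step("run tests", ok=False),  # simulate the test run itself failing
    step("verify tests pass"),
])

outcome = implement_feature()
```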

Condition Node

  • Behavior: Evaluates a condition, returns Success or Failure
  • Examples: "File exists?", "Tests passing?", "Syntax valid?"

Action Node

  • Behavior: Performs an action (might call LLM, run tool, execute code)
  • Examples: "Read file", "Write code", "Run tests", "Ask clarifying question"
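
Leaf nodes can be thin wrappers around ordinary functions. In this sketch (names are illustrative), a condition turns a boolean check into SUCCESS or FAILURE, and an action reports any exception as FAILURE so that a parent selector can fall back instead of crashing. Catching every exception is a simplification; a real agent would probably also log it.

```python
import os
import tempfile
from enum import Enum
from typing import Callable


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


Node = Callable[[], Status]


def condition(predicate: Callable[[], bool]) -> Node:
    """Condition leaf: evaluates a check without side effects."""
    def tick() -> Status:
        return Status.SUCCESS if predicate() else Status.FAILURE
    return tick


def action(effect: Callable[[], bool]) -> Node:
    """Action leaf: does the work; an exception is reported as FAILURE."""
    def tick() -> Status:
        try:
            return Status.SUCCESS if effect() else Status.FAILURE
        except Exception:
            return Status.FAILURE  # let a parent selector try another approach
    return tick


# Hypothetical example: a scratch file in a temporary directory.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "notes.txt")


def write_file() -> bool:
    with open(path, "w") as f:
        f.write("hello\n")
    return True


def read_missing_file() -> bool:
    with open(os.path.join(workdir, "missing.txt")) as f:  # raises FileNotFoundError
        return bool(f.read())


file_exists = condition(lambda: os.path.exists(path))
before = file_exists()                     # FAILURE: not written yet
write_status = action(write_file)()        # SUCCESS
after = file_exists()                      # SUCCESS
read_status = action(read_missing_file)()  # FAILURE, exception caught
```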

Architecture for Coding Agents

Top-Level Task Router

Selector: Choose Task Type
  → Sequence: Bug Fix Task
      • Is this a bug fix request?
      • Execute bug fix subtree
  → Sequence: New Feature Task
      • Is this a feature request?
      • Execute feature subtree
  → Sequence: Refactoring Task
      • Is this refactoring?
      • Execute refactor subtree
  → Sequence: Analysis Task
      • Is this analysis/explanation?
      • Execute analysis subtree
  → Action: Request clarification
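
A rough sketch of the router, assuming nodes read and write a shared dictionary (often called a blackboard). The keyword classifier is a hypothetical placeholder for whatever decides the request type, possibly an LLM call, and each subtree is a stub that only records that it ran.

```python
from enum import Enum
from typing import Callable, Dict, List


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


Node = Callable[[dict], Status]  # nodes share a mutable "blackboard" dict


def selector(children: List[Node]) -> Node:
    def tick(bb: dict) -> Status:
        for child in children:
            status = child(bb)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE
    return tick


def sequence(children: List[Node]) -> Node:
    def tick(bb: dict) -> Status:
        for child in children:
            status = child(bb)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS
    return tick


# Hypothetical, deliberately naive keyword classifier (substring matching).
KEYWORDS: Dict[str, List[str]] = {
    "bug_fix": ["bug", "error", "crash", "fails"],
    "feature": ["add", "implement", "support"],
    "refactor": ["refactor", "clean up", "rename"],
    "analysis": ["explain", "why", "how does"],
}


def is_task(kind: str) -> Node:
    """Condition: does the request look like this task type?"""
    def tick(bb: dict) -> Status:
        text = bb["request"].lower()
        return Status.SUCCESS if any(k in text for k in KEYWORDS[kind]) else Status.FAILURE
    return tick


def run_subtree(kind: str) -> Node:
    """Stub for the real subtree: just records which branch handled the request."""
    def tick(bb: dict) -> Status:
        bb["handled_by"] = kind
        return Status.SUCCESS
    return tick


def request_clarification(bb: dict) -> Status:
    bb["handled_by"] = "clarification"
    return Status.SUCCESS


ORDER = ["bug_fix", "feature", "refactor", "analysis"]  # branch order = priority
router = selector([sequence([is_task(k), run_subtree(k)]) for k in ORDER] + [request_clarification])

board = {"request": "The login page crashes on submit"}
router(board)
```

Because the selector tries branches in order, a request that matches several categories goes to the first one listed, so the branch order doubles as a priority ranking.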

Example: Bug Fix Subtree

Sequence: Fix Bug
  → Action: Read error message/description
  → Action: Locate relevant code
  → Selector: Identify Bug Type
      • Sequence: Syntax Error
          - Is syntax error?
          - Fix syntax
          - RETURN SUCCESS
      • Sequence: Import Error
          - Is import/dependency issue?
          - Install/fix import
          - RETURN SUCCESS
      • Sequence: Logic Error
          - Analyze logic
          - Identify fix
          - RETURN SUCCESS
      • Action: Complex - needs investigation subtree
  → Action: Apply fix
  → Action: Run tests
  → Selector: Handle Test Results
      • Condition: Tests pass? → SUCCESS
      • Execute debug subtree → Continue

Example: Feature Implementation Subtree

Sequence: Implement Feature
  → Selector: Requirements Clear
      • Condition: Requirements clear?
      • Action: Ask questions, then restart
  → Action: Read related existing code
  → Action: Plan implementation approach
  → Selector: Choose Implementation Strategy
      • Sequence: Extend Existing
          - Can extend existing code?
          - Extend and modify
      • Sequence: New Module
          - Create new file/module
          - Implement feature
      • Sequence: Hybrid
          - Create new + modify existing
  → Action: Write code
  → Action: Run tests
  → Selector: Verify Quality
      • Sequence: Tests Pass
          - Tests passing?
          - Meets requirements?
          - SUCCESS
      • Execute debug/fix subtree
  → Action: Format and return

Example: Debug Subtree

Selector: Debug Failed Tests
  → Sequence: Quick Fixes
      • Identify obvious errors (syntax, typos)
      • Apply fixes
      • Re-run tests
      • Tests pass? → SUCCESS
  → Sequence: Systematic Debug
      • Add logging/print statements
      • Run tests again
      • Analyze output
      • Identify root cause
      • Apply fix
      • Re-run tests
      • Tests pass? → SUCCESS
  → Sequence: Deeper Investigation
      • Check test expectations vs actual
      • Trace execution flow
      • Identify edge cases
      • Implement fix
      • Re-run tests
      • Tests pass? → SUCCESS
  → Action: Report unable to fix automatically (with analysis)

Execution Model

Tick-Based vs Event-Based

Tick-based (game AI style):

  • Tree evaluates from root every "tick" (iteration)
  • Good for reactive, real-time systems
  • Can respond to changing conditions mid-execution

Event-based (usually a better fit for coding agents):

  • Tree evaluates when triggered (new task, action complete, failure)
  • More efficient for non-real-time tasks
  • Still reactive, but skips evaluation when nothing has changed
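
The difference between the two models can be sketched as two runners for the same tree. The tree, the event names, and the simulated test run are all hypothetical; the point is how often each runner evaluates the tree.

```python
from collections import deque
from enum import Enum
from typing import Callable, Deque, Tuple


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


Tree = Callable[[dict], Status]


def wait_for_tests(state: dict) -> Status:
    """Hypothetical tree: RUNNING until a background test run reports back."""
    state["ticks"] += 1
    if not state["tests_finished"]:
        return Status.RUNNING
    return Status.SUCCESS if state["tests_passed"] else Status.FAILURE


def run_tick_based(tree: Tree, state: dict, finish_after: int, max_ticks: int = 1000) -> Status:
    """Game-AI style: re-evaluate from the root every iteration, changed or not."""
    for tick in range(1, max_ticks + 1):
        if tick == finish_after:  # simulate the test run completing in the background
            state.update(tests_finished=True, tests_passed=True)
        status = tree(state)
        if status != Status.RUNNING:
            return status
    return Status.RUNNING  # gave up without a result


def run_event_based(tree: Tree, state: dict, events: Deque[Tuple[str, dict]]) -> Status:
    """Evaluate only when an event arrives (task started, action finished, ...)."""
    status = Status.RUNNING
    while events and status == Status.RUNNING:
        _name, changes = events.popleft()
        state.update(changes)  # apply whatever the event reports
        status = tree(state)
    return status


def fresh_state() -> dict:
    return {"ticks": 0, "tests_finished": False, "tests_passed": False}


polled = fresh_state()
tick_result = run_tick_based(wait_for_tests, polled, finish_after=50)

evented = fresh_state()
event_result = run_event_based(wait_for_tests, evented, deque([
    ("task_started", {}),
    ("tests_finished", {"tests_finished": True, "tests_passed": True}),
]))
```

Both runners reach SUCCESS, but the polling runner evaluates the tree 50 times while the event-based runner evaluates it twice.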

State Management

Nodes can be:

  • Stateless: Evaluate fresh each time (conditions, simple actions)
  • Stateful: Remember progress (long-running actions, LLM calls)

Example: A "Write 500 lines of code" action might be stateful - it returns RUNNING while the LLM is generating, then SUCCESS when complete.
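
A minimal sketch of such a stateful action, assuming the long-running work (here a fake stand-in for an LLM call) is submitted to a thread pool. The node keeps the pending `Future` between ticks and returns RUNNING until it completes; `LongRunningAction` and `fake_generation` are illustrative names.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


class LongRunningAction:
    """Stateful action: starts its work on the first tick and reports RUNNING until it finishes."""

    def __init__(self, work: Callable[[], str], executor: ThreadPoolExecutor) -> None:
        self.work = work                      # e.g. a (hypothetical) LLM code-generation call
        self.executor = executor
        self.future: Optional[Future] = None  # progress remembered between ticks
        self.result: Optional[str] = None

    def tick(self) -> Status:
        if self.future is None:
            self.future = self.executor.submit(self.work)
        if not self.future.done():
            return Status.RUNNING
        try:
            self.result = self.future.result()
            return Status.SUCCESS
        except Exception:
            return Status.FAILURE  # the work raised an error
        finally:
            self.future = None  # finished either way; the next tick starts fresh


gate = threading.Event()


def fake_generation() -> str:
    """Stand-in for an LLM call; blocks until the demo opens the gate."""
    gate.wait()
    return "def add(a, b):\n    return a + b\n"


with ThreadPoolExecutor(max_workers=1) as pool:
    action = LongRunningAction(fake_generation, pool)
    first = action.tick()    # generation still in progress
    gate.set()               # let the "LLM" finish
    action.future.result()   # demo only: wait so the next tick sees a finished job
    second = action.tick()
```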

Return Values

Each node returns:

  • SUCCESS: Task completed successfully
  • FAILURE: Task failed, try next alternative
  • RUNNING: Task still in progress (for async operations)

Debugging and Traceability

Decision Audit Trail

Log every node evaluation as a structured record, for example:

```json
{
  "timestamp": "2025-11-25T14:32:01Z",
  "node_type": "Selector",
  "node_name": "Debug Failed Tests",
  "children_tried": [
    {
      "name": "Quick Fixes",
      "result": "FAILURE",
      "reason": "No obvious syntax errors found"
    },
    {
      "name": "Systematic Debug",
      "result": "SUCCESS",
      "details": "Added logging revealed null pointer in line 42"
    }
  ],
  "final_result": "SUCCESS",
  "path_taken": ["Debug Failed Tests", "Systematic Debug", "Add logging", "Analyze output"]
}
```
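
One way to produce records like this is to wrap each node so it logs itself when evaluated. The sketch below is an assumption about how such logging might be wired, not a standard API; it rebuilds the "Debug Failed Tests" example with hypothetical fixed outcomes.

```python
import json
from datetime import datetime, timezone
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


Node = Callable[[], Status]


def selector(children: List[Node]) -> Node:
    def tick() -> Status:
        for child in children:
            status = child()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE
    return tick


def sequence(children: List[Node]) -> Node:
    def tick() -> Status:
        for child in children:
            status = child()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS
    return tick


def traced(name: str, node_type: str, node: Node, log: List[dict]) -> Node:
    """Wrap a node so each evaluation appends a record to `log`."""
    def tick() -> Status:
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "node_type": node_type,
            "node_name": name,
        }
        log.append(record)  # appended on entry, so log order equals visit order
        status = node()
        record["result"] = status.value  # filled in once the node returns
        return status
    return tick


log: List[dict] = []


def leaf(name: str, result: Status) -> Node:
    """Hypothetical action with a fixed outcome."""
    return traced(name, "Action", lambda: result, log)


tree = traced("Debug Failed Tests", "Selector", selector([
    traced("Quick Fixes", "Sequence", sequence([
        leaf("Identify obvious errors", Status.FAILURE),
    ]), log),
    traced("Systematic Debug", "Sequence", sequence([
        leaf("Add logging", Status.SUCCESS),
        leaf("Analyze output", Status.SUCCESS),
    ]), log),
]), log)

final = tree()
path_taken = [r["node_name"] for r in log]
print(json.dumps({"final_result": final.value, "path_taken": path_taken, "records": log}, indent=2))
```

The `path_taken` computed here lists every visited node in order, including the failed "Quick Fixes" branch; filtering on `result` gives only the successful path, as in the record above.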

Backward Tracing

When something goes wrong, trace backwards:

Question: "Why did the agent delete my config file?"

Trace:

  1. Root Selector chose "Cleanup Task" branch
  2. Cleanup Sequence executed
  3. "Find unused files" action ran
  4. "Is file referenced?" condition returned FALSE, so the file was treated as unused
  5. "Delete file" action executed

Root cause: Reference checker didn't scan configuration directories.

This is far easier to audit than reconstructing the same decisions from LLM chain-of-thought output, because every step is a named node with a logged result.
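
With parent links in the log, the backward trace can be computed rather than reconstructed by hand. This sketch assumes each record stores an `id` in visit order and a `parent` id; the log itself is a hypothetical reconstruction of the config-file example.

```python
from typing import Dict, List, Optional

# Hypothetical audit log for the config-file incident. Ids follow visit order and
# each record points at its parent node.
LOG: List[Dict] = [
    {"id": 1, "parent": None, "name": "Root Selector"},
    {"id": 2, "parent": 1, "name": "Cleanup Task"},
    {"id": 3, "parent": 2, "name": "Find unused files"},
    {"id": 4, "parent": 2, "name": "Is file referenced?"},
    {"id": 5, "parent": 2, "name": "Delete file"},
    {"id": 6, "parent": 2, "name": "Report cleanup summary"},  # ran later; irrelevant here
]


def trace_back(log: List[Dict], node_id: int) -> List[str]:
    """Return the decisions that led to `node_id`: its ancestors plus the
    earlier siblings evaluated at each level, in execution order."""
    by_id = {r["id"]: r for r in log}
    steps: List[Dict] = []
    node: Optional[Dict] = by_id[node_id]
    while node is not None:
        earlier = [r for r in log if r["parent"] == node["parent"] and r["id"] < node["id"]]
        steps = earlier + [node] + steps
        node = by_id.get(node["parent"]) if node["parent"] is not None else None
    return [r["name"] for r in steps]


why_deleted = trace_back(LOG, 5)
```

Only the ancestors and their earlier siblings are kept, so the later summary node drops out. Finding the root cause still means inspecting the steps, for example asking why the reference check returned FALSE.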

Visualization

Create visual tree diagrams showing:

  • Nodes executed (highlighted in green)
  • Nodes failed (highlighted in red)
  • Nodes skipped (grayed out)
  • Decision points with conditions

Graphviz can render these diagrams automatically once a small script converts the logs into its DOT input format.
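
Graphviz reads a plain-text format called DOT, so the conversion step can be small. A sketch, assuming each logged node has an `id`, a `parent`, a `name`, and a `result` (None for nodes that never ran); the fill colors are standard Graphviz color names chosen to match the scheme above.

```python
from typing import Dict, List

# Colors mirror the scheme above; nodes that never ran (result None) are gray.
FILL = {"SUCCESS": "palegreen", "FAILURE": "lightcoral", "RUNNING": "khaki", None: "lightgray"}


def to_dot(nodes: List[Dict]) -> str:
    """Convert logged tree nodes (id, parent, name, result) into Graphviz DOT text."""
    lines = ["digraph behavior_tree {", "  node [shape=box, style=filled];"]
    for n in nodes:
        label = n["name"].replace('"', '\\"')  # escape quotes for DOT string syntax
        color = FILL.get(n.get("result"), "lightgray")
        lines.append(f'  n{n["id"]} [label="{label}", fillcolor={color}];')
    for n in nodes:
        if n["parent"] is not None:
            lines.append(f'  n{n["parent"]} -> n{n["id"]};')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical results for the "Debug Failed Tests" example.
nodes = [
    {"id": 1, "parent": None, "name": "Debug Failed Tests", "result": "SUCCESS"},
    {"id": 2, "parent": 1, "name": "Quick Fixes", "result": "FAILURE"},
    {"id": 3, "parent": 1, "name": "Systematic Debug", "result": "SUCCESS"},
    {"id": 4, "parent": 1, "name": "Deeper Investigation", "result": None},  # skipped
]

dot_text = to_dot(nodes)
```

Writing `dot_text` to `tree.dot` and running `dot -Tpng tree.dot -o tree.png` produces the image.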

Implementation Considerations

LLM Integration Points

The LLM should be called at specific nodes:

  • Planning nodes: "Analyze this bug and identify the issue"
  • Action nodes: "Write code to implement X"
  • Condition evaluation: "Does this code satisfy requirement Y?"

The tree structure itself should be deterministic and not require LLM calls to navigate.
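
A sketch of an LLM-backed condition node. `ask_llm` is an assumed callable (prompt in, text out) rather than any specific client library, and `fake_llm` is a toy stand-in so the example runs offline. The LLM answers the question, but the tree still decides what happens next.

```python
from enum import Enum
from typing import Callable


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


AskLLM = Callable[[str], str]  # assumption: any prompt-in, text-out client


def llm_condition(question_template: str, ask_llm: AskLLM) -> Callable[[dict], Status]:
    """Condition node whose answer comes from an LLM, forced into YES/NO."""
    def tick(bb: dict) -> Status:
        prompt = question_template.format(**bb) + "\nAnswer with exactly YES or NO."
        answer = ask_llm(prompt).strip().upper()
        # Anything other than a clear YES counts as FAILURE, keeping control flow predictable.
        return Status.SUCCESS if answer.startswith("YES") else Status.FAILURE
    return tick


def fake_llm(prompt: str) -> str:
    """Toy stand-in for a real model so the example runs offline."""
    return "yes" if "return a + b" in prompt else "no"


satisfies_requirement = llm_condition(
    "Does this code satisfy the requirement '{requirement}'?\n\n{code}", fake_llm
)

good = {"requirement": "add two numbers", "code": "def add(a, b):\n    return a + b"}
bad = {"requirement": "add two numbers", "code": "def add(a, b):\n    return a - b"}
good_status = satisfies_requirement(good)
bad_status = satisfies_requirement(bad)
```

Treating anything other than a clear YES as FAILURE keeps the tree's control flow deterministic even when the model's answer is vague.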

When to Use vs Task Lists

Use behavior trees when:

  • Complex conditional branching
  • Multiple fallback strategies needed
  • Debugging/traceability important
  • Reactive behavior valuable
  • You're writing lots of nested if-statements

Stick with task lists when:

  • Linear workflows
  • Simple error handling sufficient
  • Prototyping/early development
  • Tasks are naturally sequential

Hybrid Approach

Consider: Behavior tree for control flow, LLM-generated task list for planning.

Sequence: Execute Task
  → Action: LLM generates task list
  → Sequence: Execute Each Task
      • For each task in list:
          Selector:
            - Action: Execute task
            - Execute retry subtree (runs only if the task fails)

This combines the best of both: LLM for high-level planning, behavior tree for execution and error handling.
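
A minimal sketch of the hybrid approach, in which the planning step produces the task list and the execution subtree is assembled from it. `plan_tasks` is a hypothetical placeholder for the LLM call, and the retry subtree is reduced to a single extra attempt. Running every task calls for a sequence over the tasks; a selector would stop after the first task that succeeded.

```python
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


Node = Callable[[dict], Status]  # nodes share a mutable "blackboard" dict


def selector(children: List[Node]) -> Node:
    def tick(bb: dict) -> Status:
        for child in children:
            status = child(bb)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE
    return tick


def sequence(children: List[Node]) -> Node:
    def tick(bb: dict) -> Status:
        for child in children:
            status = child(bb)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS
    return tick


def plan_tasks(goal: str) -> List[str]:
    """Hypothetical stand-in for the LLM planning call."""
    return ["create module", "write function", "add tests"]


def execute(task: str) -> Node:
    """Hypothetical executor; tasks listed in bb["flaky"] fail on their first attempt."""
    def tick(bb: dict) -> Status:
        bb["attempts"].append(task)
        if task in bb["flaky"] and bb["attempts"].count(task) == 1:
            return Status.FAILURE
        return Status.SUCCESS
    return tick


def retry(task: str) -> Node:
    # Simplest possible retry subtree: one more attempt at the same task.
    return execute(task)


def execute_task(bb: dict) -> Status:
    tasks = plan_tasks(bb["goal"])                            # Action: LLM generates task list
    each = [selector([execute(t), retry(t)]) for t in tasks]  # retry runs only on failure
    return sequence(each)(bb)                                 # every task must eventually succeed


board = {"goal": "add a CSV exporter", "attempts": [], "flaky": {"write function"}}
status = execute_task(board)
```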

Common Patterns

Retry with Backoff

Selector: Attempt Task with Retries
  → Sequence: Try Once
      • Execute action
      • Success? → DONE
  → Sequence: Try with modifications
      • Modify parameters
      • Execute action
      • Success? → DONE
  → Sequence: Try alternative approach
      • Use different method
      • Execute action
      • Success? → DONE
  → Action: Report failure with context
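
The pattern above escalates through alternatives; a time-based backoff between attempts can be layered on top. In this sketch (all names are hypothetical), the delay doubles before each retry, and the `sleep` function is injectable so the example can record delays instead of waiting.

```python
import time
from enum import Enum
from typing import Callable, List


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


Node = Callable[[], Status]


def retry_with_backoff(attempts: List[Node], on_exhausted: Node, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> Node:
    """Selector over escalating attempts, waiting base_delay, 2*base_delay,
    4*base_delay, ... before each retry; runs on_exhausted if all of them fail."""
    def tick() -> Status:
        for k, attempt in enumerate(attempts):
            if k > 0:
                sleep(base_delay * 2 ** (k - 1))  # exponential backoff
            status = attempt()
            if status != Status.FAILURE:
                return status
        return on_exhausted()  # e.g. report failure with context
    return tick


delays: List[float] = []
reports: List[str] = []


def outcome(result: Status) -> Node:
    """Hypothetical attempt with a fixed result."""
    return lambda: result


def report() -> Status:
    reports.append("all attempts failed")
    return Status.FAILURE


# Try once, try with modifications, try an alternative approach.
eventually_ok = retry_with_backoff(
    [outcome(Status.FAILURE), outcome(Status.FAILURE), outcome(Status.SUCCESS)],
    on_exhausted=report, base_delay=0.5, sleep=delays.append)  # record instead of sleeping

status = eventually_ok()
```

Blocking inside a tick is acceptable for a sketch; in an event-based runner, the node could instead return RUNNING until a timer event arrives.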

Precondition Checking

Sequence: Safe File Operation
  → Condition: File exists?
  → Condition: Have write permissions?
  → Condition: Not a system file?
  → Action: Backup file
  → Action: Modify file
  → Selector: Verify
      • Tests pass? → SUCCESS
      • Sequence: Roll Back
          - Action: Restore backup
          - RETURN FAILURE
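
The same safeguards written as straight-line Python, where each early return plays the role of a failing child. The protected-path list and the `no_marker` check are illustrative assumptions; a real agent would run the project's tests in place of `verify`.

```python
import os
import shutil
import tempfile
from enum import Enum
from typing import Callable, Tuple


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


# Assumption: what counts as a "system file" is project-specific; this list is illustrative.
PROTECTED_PREFIXES: Tuple[str, ...] = ("/etc", "/usr", "/bin", "/sbin")


def is_protected(path: str) -> bool:
    real = os.path.realpath(path)
    return any(real == p or real.startswith(p + os.sep) for p in PROTECTED_PREFIXES)


def safe_modify(path: str, new_text: str, verify: Callable[[str], bool]) -> Status:
    """Straight-line version of the 'Safe File Operation' sequence."""
    # Condition nodes: any failing precondition aborts before the file is touched.
    if not os.path.isfile(path) or not os.access(path, os.W_OK) or is_protected(path):
        return Status.FAILURE
    backup = path + ".bak"
    shutil.copy2(path, backup)   # Action: Backup file
    with open(path, "w") as f:   # Action: Modify file
        f.write(new_text)
    if verify(path):             # Verify: tests pass -> SUCCESS
        os.remove(backup)
        return Status.SUCCESS
    shutil.copy2(backup, path)   # Roll back: restore the backup...
    os.remove(backup)
    return Status.FAILURE        # ...and still report FAILURE


def no_marker(path: str) -> bool:
    """Hypothetical stand-in for running the tests."""
    with open(path) as f:
        return "BROKEN" not in f.read()


target = os.path.join(tempfile.mkdtemp(), "settings.py")
with open(target, "w") as f:
    f.write("DEBUG = False\n")

bad = safe_modify(target, "BROKEN\n", no_marker)  # verification fails, file restored
with open(target) as f:
    after_bad = f.read()
good = safe_modify(target, "DEBUG = True\n", no_marker)
```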

Progressive Enhancement

Selector: Implement with Quality Levels
  → Sequence: Production Quality
      • Implement with tests
      • Add error handling
      • Add documentation
      • All checks pass? → SUCCESS
  → Sequence: Functional Quality
      • Basic implementation
      • Basic tests
      • Works? → SUCCESS
  → Sequence: Prototype Quality
      • Minimal implementation
      • Manual verification
      • SUCCESS

Comparison to Other Approaches

vs Prolog Backtracking

Similar: Both try alternatives until one succeeds

Different:

  • Prolog: Deep backtracking with variable unification
  • Behavior trees: Shallow, explicit control flow
  • Prolog: Can backtrack through the whole search space to enumerate every solution
  • Behavior trees: Real-time action selection that commits to the first alternative that succeeds

vs State Machines

Behavior trees advantages:

  • More modular (reusable subtrees)
  • Better for hierarchical decisions
  • Easier to extend

State machines advantages:

  • Better for systems with distinct modes
  • Clearer state transitions
  • Good for turn-based systems

vs Utility AI

Behavior trees: Priority-based (try in order)

Utility AI: Score-based (pick highest scoring action)

Behavior trees are simpler and more debuggable. Utility AI is better for nuanced decision-making with many factors.
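
A small sketch of the difference, with hypothetical options and hand-picked weights: the priority selector takes the first option whose guard holds, while the utility selector scores every option and picks the best.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, float]


def priority_select(options: List[Tuple[str, Callable[[State], bool]]], state: State) -> str:
    """Behavior-tree style: the first option whose guard holds wins."""
    for name, guard in options:
        if guard(state):
            return name
    return "idle"  # nothing applicable


def utility_select(options: List[Tuple[str, Callable[[State], float]]], state: State) -> str:
    """Utility-AI style: score every option and return the highest scorer."""
    return max(options, key=lambda option: option[1](state))[0]


# Hypothetical options and weights for a coding agent.
priority_options = [
    ("fix failing tests", lambda s: s["failing_tests"] > 0),
    ("fix lint warnings", lambda s: s["lint_warnings"] > 0),
]
utility_options = [
    ("fix failing tests", lambda s: 10.0 * s["failing_tests"]),
    ("fix lint warnings", lambda s: 0.5 * s["lint_warnings"]),
]

state: State = {"failing_tests": 1, "lint_warnings": 40}
by_priority = priority_select(priority_options, state)
by_utility = utility_select(utility_options, state)
```

With one failing test and 40 lint warnings, the priority version fixes the test first, while these particular weights make the utility version prefer the lint warnings. Tuning such weights is part of what makes utility AI more flexible but harder to debug.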

Further Reading

  • Game AI Pro chapters on behavior trees
  • Programming Game AI by Example by Mat Buckland
  • Unreal Engine behavior tree documentation
  • Papers on hierarchical task networks (HTNs) - related concept

Getting Started

  1. Start with simple sequences for linear tasks
  2. Add selectors when you need fallbacks
  3. Extract common patterns into reusable subtrees
  4. Add logging/tracing once the tree grows complex
  5. Build visualization tools when debugging gets hard

The key is to start simple and only add complexity when the pain of ad-hoc if-statements becomes greater than the pain of maintaining a tree structure.
