Agentic workflows at scale

Designing long-running agents that stay reliable across many tool calls.

Draft

Topics

Agents, Tooling, Planning, Evaluation

Long-running agents are systems, not scripts. This post outlines how I design plan loops, guardrails, and evaluation for stable agent behavior.

Outline

  • Agent loops that do not collapse
  • Planning, retries, and safe fallbacks
  • Tool call reliability and guardrails
  • Measuring agents like real systems

Plan-loop design

  • Task decomposition with checkpoints.
  • Explicit success criteria per step.
  • Graceful recovery when steps fail.
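The three points above can be sketched as a small plan loop. This is a minimal illustration, not a definitive implementation: `Step`, `PlanResult`, and `run_plan` are hypothetical names, and the state is assumed to be a plain dictionary that each step copies and extends.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

State = dict[str, Any]


@dataclass
class Step:
    name: str
    run: Callable[[State], State]        # does the work, returns the updated state
    succeeded: Callable[[State], bool]   # explicit success criterion for this step
    fallback: Optional[Callable[[State], State]] = None  # optional recovery path


@dataclass
class PlanResult:
    state: State
    completed: list[str] = field(default_factory=list)
    failed_step: Optional[str] = None


def run_plan(steps: list[Step], state: State, max_attempts: int = 2) -> PlanResult:
    """Run steps in order, checkpointing the state before each one."""
    result = PlanResult(state=dict(state))
    for step in steps:
        checkpoint = dict(result.state)  # shallow snapshot taken before the step
        candidates = [step.run] * max_attempts
        if step.fallback is not None:
            candidates.append(step.fallback)  # tried once, after the retries
        ok = False
        for attempt in candidates:
            try:
                candidate = attempt(dict(checkpoint))  # always start from the checkpoint
            except Exception:
                continue  # treat an exception as a failed attempt
            if step.succeeded(candidate):
                result.state, ok = candidate, True
                break
        if not ok:
            # Stop at the last good checkpoint instead of continuing on bad state.
            result.state = checkpoint
            result.failed_step = step.name
            return result
        result.completed.append(step.name)
    return result
```

Because every attempt starts from a copy of the checkpoint, a failed step cannot leave partial writes behind, and the caller can resume from `failed_step` with the returned state. The snapshot here is shallow, so nested mutable values would need a deep copy.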

Tooling reliability

  • Idempotent tools and safe retries.
  • Backoff strategies for flaky APIs.
  • Context pruning to keep prompts within the model's context window.
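A rough sketch of these ideas follows, under stated assumptions: `TransientError` is a hypothetical marker for retryable failures, the tool is assumed to accept an idempotency key and deduplicate on it, and character count stands in as a crude proxy for token count when pruning.

```python
import random
import time
import uuid
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def call_with_retries(
    tool: Callable[[str], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry a tool with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # The same key is sent on every attempt so the tool can deduplicate side effects.
    idempotency_key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return tool(idempotency_key)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # full jitter: wait a random time up to the cap
    raise AssertionError("unreachable")


def prune_context(messages: list[dict], max_chars: int) -> list[dict]:
    """Keep the first message plus the most recent messages that fit the budget."""
    if not messages:
        return []
    head, tail = messages[0], messages[1:]
    budget = max_chars - len(head["content"])
    kept: list[dict] = []
    for msg in reversed(tail):  # walk from newest to oldest
        budget -= len(msg["content"])
        if budget < 0:
            break
        kept.append(msg)
    return [head] + kept[::-1]  # restore chronological order
```

Retries are only safe because the key is stable across attempts; a tool that ignores the key can still apply a side effect twice. Passing `sleep` as a parameter keeps the backoff testable without real delays.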

Safety controls

  • Budget caps and stop conditions.
  • Human-in-the-loop review points.
  • Audit logs for full visibility.
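One way these controls might fit together is a single wrapper that every tool call passes through. The sketch below is illustrative only: `Guardrails`, `BudgetExceeded`, and `ApprovalDenied` are hypothetical names, the per-call cost is assumed to be an estimate supplied by the caller, and the audit log is an in-memory list standing in for durable storage.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable


class BudgetExceeded(Exception):
    """Raised when a call would exceed the call-count or cost cap."""


class ApprovalDenied(Exception):
    """Raised when a human reviewer rejects a call."""


@dataclass
class Guardrails:
    max_tool_calls: int
    max_cost_usd: float
    needs_review: Callable[[str, dict], bool]  # which calls require a human
    approve: Callable[[str, dict], bool]       # asks the human, returns the decision
    audit_log: list[str] = field(default_factory=list)
    tool_calls: int = 0
    cost_usd: float = 0.0

    def _audit(self, event: str, **details: Any) -> None:
        # One JSON line per event; default=str keeps odd argument types loggable.
        record = {"ts": time.time(), "event": event, **details}
        self.audit_log.append(json.dumps(record, sort_keys=True, default=str))

    def call(self, name: str, args: dict, tool: Callable[..., Any], est_cost_usd: float) -> Any:
        # Stop condition: refuse the call before it runs if it would break a cap.
        over_calls = self.tool_calls + 1 > self.max_tool_calls
        over_cost = self.cost_usd + est_cost_usd > self.max_cost_usd
        if over_calls or over_cost:
            self._audit("budget_stop", tool=name)
            raise BudgetExceeded(name)
        # Human-in-the-loop review for calls flagged as risky.
        if self.needs_review(name, args) and not self.approve(name, args):
            self._audit("denied", tool=name, args=args)
            raise ApprovalDenied(name)
        self.tool_calls += 1
        self.cost_usd += est_cost_usd
        try:
            result = tool(**args)
        except Exception as exc:
            self._audit("tool_error", tool=name, args=args, error=repr(exc))
            raise
        self._audit("tool_call", tool=name, args=args)
        return result
```

Denied and over-budget calls are logged as well as successful ones, so the audit trail shows what the agent attempted and not only what it did.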

Evaluation at scale

  • Success metrics aligned to business outcomes.
  • Replay logs for regression testing.
  • Continuous improvement driven by evals that run on every change.
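Replay-based regression testing can be sketched roughly as follows. The names `LoggedRun` and `replay` are hypothetical, the agent is assumed to be a callable from task input to output, and exact string equality is only a placeholder for whatever outcome check matches the business metric.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LoggedRun:
    task_id: str
    task_input: str
    expected: str  # outcome that was accepted when the run was recorded


def replay(
    runs: list[LoggedRun],
    agent: Callable[[str], str],
    matches: Callable[[str, str], bool] = lambda got, expected: got == expected,
) -> dict:
    """Re-run logged tasks against the current agent and report the success rate."""
    failures: list[tuple[str, str]] = []
    for run in runs:
        try:
            got = agent(run.task_input)
        except Exception as exc:
            failures.append((run.task_id, repr(exc)))  # a crash counts as a failure
            continue
        if not matches(got, run.expected):
            failures.append((run.task_id, got))
    total = len(runs)
    passed = total - len(failures)
    return {
        "total": total,
        "passed": passed,
        "success_rate": passed / total if total else 0.0,
        "failures": failures,
    }
```

Running this on every change and comparing `success_rate` against the previous release turns the replay log into a regression gate, and the `failures` list points at the tasks to inspect first.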
