Agentic workflows at scale

Designing long-running agents that stay reliable across many tool calls.

Draft

Topics

Agents, Tooling, Planning, Evaluation

Long-running agents are systems, not scripts. This post outlines how I design plan loops, guardrails, and evaluation for stable agent behavior.

Outline

Agent loops that do not collapse
Planning, retries, and safe fallbacks
Tool call reliability and guardrails
Measuring agents like real systems

Plan-loop design

Task decomposition with checkpoints.
Explicit success criteria per step.
Graceful recovery when steps fail.
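
The three points above can be sketched as a small plan runner. This is a minimal illustration, not a reference implementation: `Step`, `run_plan`, the JSON checkpoint file, and the `max_attempts` default are all hypothetical names and choices made for the example, and it assumes step results are JSON-serializable.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Step:
    """One unit of the plan (hypothetical structure for this sketch)."""
    name: str
    run: Callable[[Dict[str, Any]], Any]        # does the work, given prior results
    succeeded: Callable[[Any], bool]            # explicit success criterion
    fallback: Optional[Callable[[Dict[str, Any]], Any]] = None


def run_plan(steps: List[Step], checkpoint: Path, max_attempts: int = 2) -> Dict[str, Any]:
    # Resume from the checkpoint if an earlier run got partway through.
    state: Dict[str, Any] = json.loads(checkpoint.read_text()) if checkpoint.exists() else {}
    for step in steps:
        if step.name in state:
            continue  # already completed in an earlier run
        result, ok = None, False
        for _ in range(max_attempts):
            try:
                result = step.run(state)
                ok = step.succeeded(result)
            except Exception:
                ok = False  # treat any error as a failed attempt
            if ok:
                break
        if not ok and step.fallback is not None:
            # Recovery path: try the safe fallback once, held to the same criterion.
            result = step.fallback(state)
            ok = step.succeeded(result)
        if not ok:
            raise RuntimeError(f"step {step.name!r} failed; resume from {checkpoint}")
        state[step.name] = result
        checkpoint.write_text(json.dumps(state))  # checkpoint after every step
    return state
```

Because the checkpoint is written after each successful step, rerunning `run_plan` with the same file skips finished steps instead of repeating their side effects.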

Tooling reliability

Idempotent tools and safe retries.
Backoff strategies for flaky APIs.
Context pruning to keep prompts within the context window.
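
A rough sketch of these ideas, under stated assumptions: `TransientError`, `call_with_backoff`, and `prune_context` are hypothetical names, the tool is assumed to accept an idempotency key and deduplicate on it, and character counts stand in for token counts.

```python
import random
import time
import uuid
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are safe to retry."""


def call_with_backoff(
    tool: Callable[[str], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # The same key is sent on every attempt so the tool can deduplicate.
    key = str(uuid.uuid4())
    for attempt in range(max_attempts):
        try:
            return tool(key)
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # exponential backoff with full jitter


def prune_context(messages: List[str], max_chars: int) -> List[str]:
    """Keep the first message (the task) plus the newest messages that fit."""
    if not messages:
        return []
    head, tail = messages[0], messages[1:]
    budget = max_chars - len(head)
    kept: List[str] = []
    for msg in reversed(tail):
        if len(msg) > budget:
            break
        kept.append(msg)
        budget -= len(msg)
    return [head] + kept[::-1]
```

Retrying is only safe here because the key makes repeated calls idempotent on the tool side; without that, a retry after a timeout could apply the same action twice.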

Safety controls

Budget caps and stop conditions.
Human-in-the-loop review points.
Audit logs for full visibility.
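
One way these controls could fit together is a thin wrapper around every tool call. The sketch below is illustrative only: `Guard`, `BudgetExceeded`, the `approve` callback, and the per-call `cost` figure are hypothetical, and tool arguments are assumed to be JSON-serializable so they can be logged.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, FrozenSet, List


class BudgetExceeded(Exception):
    """Raised when a call would exceed the call or cost cap (stop condition)."""


@dataclass
class Guard:
    max_calls: int
    max_cost: float
    approve: Callable[[str, Dict[str, Any]], bool]  # human review hook
    risky_tools: FrozenSet[str] = frozenset()       # tools that need approval
    calls: int = 0
    cost: float = 0.0
    audit: List[str] = field(default_factory=list)  # JSON lines, append-only

    def _log(self, event: str, tool: str, **extra: Any) -> None:
        record = {"ts": time.time(), "event": event, "tool": tool, **extra}
        self.audit.append(json.dumps(record))

    def invoke(self, tool: str, fn: Callable[..., Any],
               args: Dict[str, Any], cost: float) -> Any:
        # Budget cap: refuse before doing any work.
        if self.calls + 1 > self.max_calls or self.cost + cost > self.max_cost:
            self._log("stopped", tool, reason="budget")
            raise BudgetExceeded(f"budget exhausted before calling {tool}")
        # Human-in-the-loop: risky tools run only with explicit approval.
        if tool in self.risky_tools and not self.approve(tool, args):
            self._log("rejected", tool, args=args)
            return None
        self.calls += 1
        self.cost += cost
        # Log before executing so the attempt is recorded even if fn raises.
        self._log("called", tool, args=args, cost=cost)
        return fn(**args)
```

In this sketch a rejected call returns `None` rather than raising, which lets the plan loop choose a fallback; a stricter design might raise instead.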

Evaluation at scale

Success metrics aligned to business outcomes.
Replay logs for regression testing.
Continuous improvement driven by recurring eval runs.
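
Replay-based regression testing might look like the sketch below. The episode format (`id`, `input`, `tool_calls`, `expected`), `ReplayTools`, and `run_regression` are assumptions made for this example, and exact string match is a stand-in for whatever success metric actually fits the task.

```python
from typing import Any, Callable, Dict, List

Episode = Dict[str, Any]
ToolCaller = Callable[[str, Dict[str, Any]], Any]


class ReplayMismatch(Exception):
    """The agent made a tool call that differs from the recorded one."""


class ReplayTools:
    """Serves recorded tool results instead of calling real tools."""

    def __init__(self, recorded: List[Dict[str, Any]]) -> None:
        self._recorded = list(recorded)
        self._next = 0

    def call(self, name: str, args: Dict[str, Any]) -> Any:
        if self._next >= len(self._recorded):
            raise ReplayMismatch(f"unexpected extra call to {name}")
        rec = self._recorded[self._next]
        if rec["name"] != name or rec["args"] != args:
            raise ReplayMismatch(f"expected {rec['name']}, got {name} with {args}")
        self._next += 1
        return rec["result"]


def run_regression(episodes: List[Episode],
                   agent: Callable[[str, ToolCaller], str]) -> Dict[str, Any]:
    failures: List[str] = []
    for ep in episodes:
        tools = ReplayTools(ep["tool_calls"])
        try:
            ok = agent(ep["input"], tools.call) == ep["expected"]
        except ReplayMismatch:
            ok = False  # behavior drifted from the recorded run
        if not ok:
            failures.append(ep["id"])
    total = len(episodes)
    pass_rate = (total - len(failures)) / total if total else 0.0
    return {"pass_rate": pass_rate, "failures": failures}
```

Tracking `pass_rate` per change gives a simple regression signal, and the `failures` list points at the specific logged episodes to inspect.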

Want a specific angle covered?

Tell me what you're building and I'll write it.