Designing escalation paths for blocked agents
How to design escalation paths when coding agents get stuck—identifying blockage types, matching responders, configuring timeouts, and building feedback loops.
A blocked coding agent is wasted compute and lost momentum. Without a clear escalation path, blocked agents either spin indefinitely or fire noisy alerts that nobody owns. Effective escalation design matches each type of blockage to the right human responder, applies appropriate timeouts, and creates feedback loops so the same problems stop recurring.
Identifying blockage types
Not all blocks are equal. Categorizing them lets you route each to the right responder and set appropriate urgency levels.
Dependency blockages
The agent needs something from another team or service—an API that is not deployed, a database migration that has not run, or a library update that is pending review. Dependency blocks are often the slowest to resolve because they cross team boundaries.
Permission blockages
The agent lacks access to a resource it needs—a repository, a secrets vault, a staging environment, or a third-party API key. Permission blocks are usually fast to resolve once the right person is notified, but they can sit for hours if the request goes to a generic queue.
Ambiguity blockages
The agent cannot proceed because the task specification is unclear, contradictory, or incomplete. These blocks require human judgment—a product owner or tech lead must clarify intent before the agent can resume.
Infrastructure blockages
The environment itself has a problem—CI is down, a container fails to build, tests time out due to resource limits, or a network partition isolates the agent from required services. Infrastructure blocks often affect multiple agents simultaneously.
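One way to make these four categories machine-readable is a small enumeration that the agent attaches to every block report. This is a minimal sketch; the names `BlockageType` and `Blockage`, and the example identifiers, are hypothetical and not part of any particular agent framework.

```python
from dataclasses import dataclass
from enum import Enum


class BlockageType(Enum):
    DEPENDENCY = "dependency"          # waiting on another team or service
    PERMISSION = "permission"          # missing access to a resource
    AMBIGUITY = "ambiguity"            # unclear, contradictory, or incomplete spec
    INFRASTRUCTURE = "infrastructure"  # CI, build, or network failure


@dataclass(frozen=True)
class Blockage:
    """A single block report raised by an agent (hypothetical shape)."""
    agent_id: str
    task_id: str
    kind: BlockageType
    summary: str


# Illustrative values only.
block = Blockage(
    agent_id="agent-7",
    task_id="TASK-123",
    kind=BlockageType.PERMISSION,
    summary="No read access to the staging secrets vault",
)
```

Forcing every report into exactly one category keeps routing and timeout lookups simple later on.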
Matching blockages to responders
Each blockage type should map to a specific role or person, documented in a single reference that the team can find without searching.
Dependency blocks route to the team that owns the blocking service. If that team uses Dailybot, cross-post the escalation to their channel with context so they can prioritize.
Permission blocks route to the team lead or security admin who can grant access. Include the specific resource and the reason for access so the approver can act without back-and-forth.
Ambiguity blocks route to the product owner, tech lead, or whoever wrote the original specification. Include the agent’s interpretation and the specific question it cannot resolve.
Infrastructure blocks route to the platform or DevOps team. Include error logs, environment identifiers, and whether other agents are also affected.
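The routing rules above could be captured as a lookup table that also checks the context each responder needs. The sketch below assumes hypothetical responder labels and context field names; substitute your own roles and channels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    responder: str                     # role or group to notify (placeholder labels)
    required_context: tuple[str, ...]  # fields the escalation must carry


ROUTES = {
    "dependency": Route("owning-team", ("blocking_service", "what_is_needed")),
    "permission": Route("team-lead-or-security-admin", ("resource", "reason_for_access")),
    "ambiguity": Route("spec-author", ("agent_interpretation", "open_question")),
    "infrastructure": Route(
        "platform-team", ("error_logs", "environment_id", "other_agents_affected")
    ),
}


def build_escalation(kind: str, context: dict[str, str]) -> dict:
    """Return an escalation message, refusing to send one that lacks context."""
    route = ROUTES[kind]
    missing = [field for field in route.required_context if field not in context]
    if missing:
        raise ValueError(f"missing context for {kind} block: {missing}")
    return {"to": route.responder, "kind": kind, "context": dict(context)}
```

Rejecting incomplete escalations at creation time is one way to avoid the back-and-forth that slows approvers down.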
Configuring timeout-based escalation
Timeouts prevent blocks from sitting silently. The right timeout depends on task criticality and blockage type.
Setting timeout windows
A reasonable starting framework:
- Critical tasks (sprint-blocking, production-related): escalate after 30 minutes
- Standard tasks (feature work, refactoring): escalate after 1-2 hours
- Low-priority tasks (nice-to-have improvements, documentation): escalate after 4 hours
Use these bands as a baseline, then adjust by blockage type. Permission blocks often have faster resolution paths than dependency blocks, so you might set a shorter timeout for permissions (30 minutes) and a longer one for dependencies (2 hours) even on the same priority level.
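As a rough illustration, the bands and per-type adjustments can live in two small tables. The priority labels and the `timeout_for` helper are assumptions made for this sketch; the durations mirror the starting framework above, with the standard band set to its upper end.

```python
from datetime import timedelta

# Baseline timeout per priority band.
BASE_TIMEOUT = {
    "critical": timedelta(minutes=30),
    "standard": timedelta(hours=2),  # upper end of the 1-2 hour band
    "low": timedelta(hours=4),
}

# Per-type overrides within a band, keyed by (priority, blockage type).
OVERRIDES = {
    ("standard", "permission"): timedelta(minutes=30),
    ("standard", "dependency"): timedelta(hours=2),
}


def timeout_for(priority: str, kind: str) -> timedelta:
    """Return the override if one exists, otherwise the band's baseline."""
    return OVERRIDES.get((priority, kind), BASE_TIMEOUT[priority])
```

For example, `timeout_for("standard", "permission")` yields 30 minutes, while `timeout_for("low", "ambiguity")` falls back to the 4-hour baseline.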
Escalation levels
Design at least two levels. The first level notifies the designated responder via DM or channel mention. If no response comes within an additional timeout window, the second level notifies a broader group—a manager, an ops channel, or an on-call rotation.
Avoid going beyond three levels. If a block still has no owner after three escalations, the problem is not the escalation system; it is the response culture.
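A two-level scheme reduces to a comparison of elapsed time against two windows. The function below is a sketch under that assumption; `escalation_level` is a hypothetical name, and a real system would also need to stop escalating once someone acknowledges the block.

```python
from datetime import timedelta


def escalation_level(
    elapsed: timedelta, first_timeout: timedelta, second_timeout: timedelta
) -> int:
    """Return 0 (not yet escalated), 1 (designated responder), or 2 (broader group).

    second_timeout is the additional window after the first escalation.
    """
    if elapsed < first_timeout:
        return 0
    if elapsed < first_timeout + second_timeout:
        return 1
    return 2
```

With two 30-minute windows, a block that has been open for 45 minutes sits at level 1, and one open for an hour reaches level 2.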
Building feedback loops
Escalation without learning is just firefighting. Every resolved blockage is data for preventing the next one.
Post-resolution documentation
After resolving a block, the responder or agent should record what caused the block, how it was resolved, and whether the fix is permanent or temporary. This data feeds a knowledge base that agents and humans can reference.
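A resolution record can be as small as one structured entry per block. The field names below are assumptions chosen to match the three questions above, and serializing to a JSON line is just one convenient storage choice; the values are illustrative.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ResolutionRecord:
    kind: str         # dependency, permission, ambiguity, or infrastructure
    resource: str     # what the agent was blocked on
    cause: str        # what caused the block
    resolution: str   # how it was resolved
    permanent: bool   # False if the fix is a temporary workaround


# Illustrative values only.
record = ResolutionRecord(
    kind="permission",
    resource="staging-vault",
    cause="Agent service account was not in the vault readers group",
    resolution="Granted temporary read access",
    permanent=False,
)

# One JSON object per line is easy to append to a log and to query later.
line = json.dumps(asdict(record))
```

Flagging temporary fixes explicitly makes it easier to find the blocks that are likely to come back.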
Recurrence tracking
Track which blockage types recur most frequently. If permission blocks fire three times a month for the same resource, invest in a standing permission grant rather than resolving each one individually. If ambiguity blocks cluster around a specific product area, the specifications for that area need improvement.
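Recurrence tracking can start as a simple count over resolved blocks. The sketch below assumes each record is a dict with hypothetical `kind` and `resource` keys and that the list has already been filtered to the period of interest, such as the last month.

```python
from collections import Counter


def recurring(records: list[dict], threshold: int = 3) -> list[tuple[str, str]]:
    """Return (kind, resource) pairs seen at least `threshold` times."""
    counts = Counter((r["kind"], r["resource"]) for r in records)
    return [key for key, n in counts.items() if n >= threshold]


# Illustrative data: three permission blocks on one resource, one unrelated block.
last_month = [
    {"kind": "permission", "resource": "staging-vault"},
    {"kind": "permission", "resource": "staging-vault"},
    {"kind": "infrastructure", "resource": "ci-runner"},
    {"kind": "permission", "resource": "staging-vault"},
]

hotspots = recurring(last_month)
```

Here `hotspots` contains only the staging vault permission block, which makes it a candidate for a standing grant.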
Agent learning
Some blockages can be prevented by giving agents better context upfront. If an agent consistently gets stuck on the same type of environment setup, add that setup to the agent’s initialization checklist. If ambiguity blocks arise from the same missing information, include that information in the task template.
Designing the escalation document
Create a single-page reference that the team can consult in thirty seconds:
- Blockage type (dependency, permission, ambiguity, infrastructure)
- First responder (role or named person)
- Timeout to first escalation (minutes)
- Second responder (role or group)
- Timeout to second escalation (minutes)
- Notification method (DM, channel, pager)
Post this document in your ops channel, link it from your agent configuration, and review it quarterly. When team structure changes—new hires, role rotations, team splits—update the escalation document the same week.
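If the escalation document is kept as data, the same source can feed both the agent configuration and the human-readable page. The sketch below renders the six fields as a plain-text table; every responder name and timeout in `ROWS` is a placeholder, not a recommendation.

```python
from dataclasses import astuple, dataclass, fields


@dataclass(frozen=True)
class EscalationRow:
    blockage_type: str
    first_responder: str
    first_timeout_min: int
    second_responder: str
    second_timeout_min: int
    notification: str


# Placeholder entries; replace with your own roles, timeouts, and channels.
ROWS = [
    EscalationRow("dependency", "owning team", 120, "engineering manager", 120, "channel"),
    EscalationRow("permission", "security admin", 30, "team lead", 30, "DM"),
    EscalationRow("ambiguity", "product owner", 60, "tech lead", 60, "DM"),
    EscalationRow("infrastructure", "platform team", 30, "on-call rotation", 30, "pager"),
]


def render(rows: list[EscalationRow]) -> str:
    """Render the rows as a fixed-width text table with a header line."""
    header = [f.name for f in fields(EscalationRow)]
    table = [header] + [[str(value) for value in astuple(row)] for row in rows]
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    return "\n".join(
        "  ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in table
    )


print(render(ROWS))
```

Keeping the table in version control also leaves a history of who owned each blockage type when the team structure changes.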
Making escalation invisible
The best escalation system is one that rarely fires because the underlying causes have been systematically eliminated. Use escalation data not as a performance metric but as a signal for where to invest in prevention. Over time, the goal is not faster escalation—it is fewer escalations needed in the first place.
FAQ
- What are the main types of agent blockages?
- Dependency blockages (waiting on another service or team), permission blockages (lacking access to a resource), ambiguity blockages (unclear requirements or conflicting specs), and infrastructure blockages (environment failures, resource exhaustion).
- How should escalation timeouts be configured?
- Set timeouts based on task criticality and blockage type. Critical tasks might escalate after 30 minutes, standard work after one to two hours, and low-priority work after about four hours. Each blockage type can have its own timeout to match the expected resolution speed.
- How do feedback loops prevent repeat escalations?
- After resolving a blockage, document the root cause and fix. If the same blockage type recurs three or more times, invest in a systemic fix—updating permissions, improving specs, or adding environment checks—so agents stop hitting the same wall.