Agent heartbeat for long-running tasks
A ready-to-use pattern for periodic agent check-ins so teams see progress during multi-hour refactors, migrations, and batch jobs.
Long-running agent work (multi-hour refactors, large schema migrations, batch ingestion, or exhaustive test fixes) creates a visibility gap. Humans step away, switch context, or assume the agent has failed when the chat goes quiet. A heartbeat template in Dailybot turns that silence into a predictable rhythm: short, structured updates that prove the agent is active, show progress, and surface problems early.
What this template includes
Periodic check-in schedule
The template defines when the agent must report, not just what it says. Typical patterns include a fixed interval (for example, every thirty minutes during heavy work), a phase-based cadence (a heartbeat after each migration chunk), or a hybrid (hourly during execution, then a final summary). The schedule should match task risk: a larger blast radius or a higher cost of failure warrants shorter intervals. Document the schedule in the same place as the workflow so anyone joining mid-run knows when to expect the next ping.
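One way to make the schedule concrete is to encode it as data the agent checks before each unit of work. The sketch below assumes a hybrid policy: report when the fixed interval has elapsed or when a phase boundary is crossed, whichever comes first. `HeartbeatSchedule` and `should_report` are hypothetical names for illustration, not a Dailybot API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class HeartbeatSchedule:
    """Hypothetical hybrid schedule: fixed interval plus phase-boundary reports."""
    interval: timedelta
    report_on_phase_change: bool = True


def should_report(
    schedule: HeartbeatSchedule,
    now: datetime,
    last_report: datetime,
    phase_changed: bool,
) -> bool:
    """Return True when a heartbeat is due under the hybrid policy."""
    # Fixed-interval rule: the full interval has elapsed since the last post.
    if now - last_report >= schedule.interval:
        return True
    # Phase-based rule: a migration chunk or phase has just finished.
    return schedule.report_on_phase_change and phase_changed


if __name__ == "__main__":
    sched = HeartbeatSchedule(interval=timedelta(minutes=30))
    start = datetime(2026, 3, 20, 9, 0)
    print(should_report(sched, start + timedelta(minutes=10), start, False))  # False
    print(should_report(sched, start + timedelta(minutes=10), start, True))   # True
    print(should_report(sched, start + timedelta(minutes=30), start, False))  # True
```

Setting `report_on_phase_change=False` turns this into a pure fixed-interval schedule; a purely phase-based cadence would simply ignore the interval check.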
Progress update format
Each heartbeat uses a compact, scannable block so channels stay readable. Recommended fields: run identifier, current phase, progress indicator (percentage, records processed, or files touched), elapsed time, any errors or retries, and the next planned action before the following heartbeat. Optional tags such as healthy, degraded, or blocked help routing rules and filters. Keeping the format consistent lets humans skim a thread and reconstruct state without opening logs.
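As a rough illustration, the fields above map naturally onto a small record plus a renderer that always emits them in the same order. This is a sketch; the `Heartbeat` class, its field names, and the `[status] run_id` header line are assumptions chosen for readability, not a format Dailybot prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class Heartbeat:
    """One progress update carrying the recommended fields (names are illustrative)."""
    run_id: str
    phase: str
    progress: str            # e.g. "42%" or "12,400 / 50,000 records"
    elapsed_minutes: int
    next_action: str
    status: str = "healthy"  # healthy | degraded | blocked
    errors: list[str] = field(default_factory=list)


def render(hb: Heartbeat) -> str:
    """Render a compact block: status tag and run ID first, then one field per line."""
    errors = ", ".join(hb.errors) if hb.errors else "none"
    return "\n".join([
        f"[{hb.status}] {hb.run_id}",
        f"Phase: {hb.phase}",
        f"Progress: {hb.progress}",
        f"Elapsed: {hb.elapsed_minutes} min",
        f"Errors/retries: {errors}",
        f"Next: {hb.next_action}",
    ])


if __name__ == "__main__":
    print(render(Heartbeat("migration-2026-03-20", "backfill orders", "42%", 95,
                           "rebuild indexes on orders")))
```

Because every block has the same six lines in the same order, a reader can skim a thread vertically and compare runs at a glance.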
Completion and failure notifications
The template ends with two explicit outcomes. Completion posts a final message: summary of what changed, links to artifacts (PRs, migration reports, job IDs), validation steps taken, and handoff notes for review. Failure posts include error class, last successful step, whether the run is safe to retry or requires manual intervention, and who was notified. Treat failure heartbeats as first-class: they should trigger the same escalation path as a missed heartbeat when severity warrants it.
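A minimal sketch of the two terminal messages follows. The function names are hypothetical, and the escalation rule (escalate whenever the run is not safe to retry) is an assumed stand-in for "when severity warrants it"; substitute your own severity criteria.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    """Fields a failure post should carry, per the template (names are illustrative)."""
    error_class: str
    last_successful_step: str
    safe_to_retry: bool
    notified: list[str]


def failure_message(run_id: str, f: Failure) -> str:
    action = "safe to retry" if f.safe_to_retry else "manual intervention required"
    return (
        f"[blocked] {run_id} FAILED\n"
        f"Error: {f.error_class}\n"
        f"Last successful step: {f.last_successful_step}\n"
        f"Next: {action}\n"
        f"Notified: {', '.join(f.notified) or 'nobody'}"
    )


def completion_message(run_id: str, summary: str, artifacts: list[str],
                       validation: str) -> str:
    # Artifacts are links (PRs, migration reports, job IDs), one per line.
    links = "\n".join(f"- {a}" for a in artifacts) or "- (none)"
    return (
        f"[healthy] {run_id} COMPLETED\n"
        f"Summary: {summary}\n"
        f"Artifacts:\n{links}\n"
        f"Validation: {validation}"
    )


def should_escalate(f: Failure) -> bool:
    # Assumed severity rule: escalate like a missed heartbeat when a human must act.
    return not f.safe_to_retry
```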
How to configure it
Heartbeat interval
Choose an interval that balances noise against risk. A fifteen-minute cadence may spam a busy channel; a two-hour gap may leave too much uncertainty during fragile operations. Start with thirty to sixty minutes for general development agents, tighten for production-impacting work, and widen for idempotent background jobs. If the agent supports dynamic pacing, document rules (for example, “accelerate to fifteen minutes after the first error”).
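The interval guidance and the example pacing rule can be expressed as a small lookup plus one adjustment. Only the thirty-minute general value and the fifteen-minute post-error cadence come from the text above; the production and idempotent values in `BASE_INTERVAL` are assumed placeholders to tune for your own risk tolerance.

```python
# Illustrative starting cadences in minutes; adjust to your own risk tolerance.
BASE_INTERVAL = {
    "production": 15,   # tighter for production-impacting work (assumed value)
    "general": 30,      # low end of the 30-60 minute starting range
    "idempotent": 120,  # wider for idempotent background jobs (assumed value)
}


def next_interval_minutes(kind: str, errors_seen: int, fast: int = 15) -> int:
    """Return the cadence to use for the next heartbeat."""
    base = BASE_INTERVAL[kind]
    # Dynamic pacing: after the first error, accelerate to `fast` minutes,
    # but never slow down a cadence that is already faster than that.
    if errors_seen > 0:
        return min(base, fast)
    return base
```

Taking `min(base, fast)` keeps the rule monotonic: an error can only make reporting more frequent, never less.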
Escalation on missed heartbeats
Define a miss as no message within the interval plus an agreed grace period (often one full interval). After one miss, optionally send an automated nudge to the agent owner or runner. After two consecutive misses, route an alert with run metadata to an ops or engineering channel. For on-call cultures, mirror the rule in your paging policy only when the underlying job is truly critical. Dailybot shines as the human-readable timeline, while paging should follow your existing incident standards.
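The miss-and-escalate rule can be sketched as two pure functions a watchdog could call on a timer. This assumes the first miss is counted once interval plus grace has passed without a post, and each further full interval adds one more; `missed_heartbeats` and `escalation_action` are hypothetical names, not part of any Dailybot API.

```python
from datetime import datetime, timedelta


def missed_heartbeats(last_seen: datetime, now: datetime,
                      interval: timedelta, grace: timedelta) -> int:
    """Count consecutive misses: the first lands at interval + grace,
    and each further full interval without a post adds one more."""
    overdue = now - last_seen - grace
    if overdue < interval:
        return 0
    return overdue // interval  # timedelta // timedelta yields an int


def escalation_action(misses: int, after_missed: int = 2) -> str:
    if misses >= after_missed:
        return "escalate"  # post run metadata to the ops/engineering channel
    if misses >= 1:
        return "nudge"     # optional automated nudge to the agent owner
    return "none"


if __name__ == "__main__":
    t0 = datetime(2026, 3, 20, 9, 0)
    iv, gr = timedelta(minutes=30), timedelta(minutes=15)
    for minutes in (44, 45, 75):
        n = missed_heartbeats(t0, t0 + timedelta(minutes=minutes), iv, gr)
        print(minutes, n, escalation_action(n))
```

With a 30-minute interval and 15-minute grace, the nudge fires at 45 minutes of silence and escalation at 75.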
Sample configuration
Use the following as a starting point for a workflow or form-driven checklist your agent updates via API, slash command, or scheduled automation:
```yaml
heartbeat:
  run_id: "migration-2026-03-20"
  interval_minutes: 30
  grace_minutes: 15
  channel: "#platform-migrations"
  template_fields:
    - phase
    - progress_note
    - items_processed
    - blockers
    - next_checkpoint
  status_emoji:
    healthy: "🟢"
    degraded: "🟡"
    blocked: "🔴"
  escalation:
    after_missed: 2
    notify: "#infra-oncall"
    include_last_known_state: true
  completion:
    post_summary: true
    attach_artifacts: true
```
Adjust names and channels to your workspace. The important part is that interval, grace, escalation, and field list are explicit so both humans and automation share the same contract.
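To keep that contract explicit in practice, a runner could check the parsed configuration before starting. The sketch below assumes the YAML has already been loaded into a plain dict by whatever loader you use (it is written out by hand here to stay dependency-free); `validate_heartbeat_config` and its rules are illustrative, not a Dailybot feature.

```python
REQUIRED_KEYS = {"run_id", "interval_minutes", "grace_minutes", "channel",
                 "template_fields", "escalation"}

# The sample configuration as a Python dict (status_emoji/completion omitted).
CONFIG = {
    "heartbeat": {
        "run_id": "migration-2026-03-20",
        "interval_minutes": 30,
        "grace_minutes": 15,
        "channel": "#platform-migrations",
        "template_fields": ["phase", "progress_note", "items_processed",
                            "blockers", "next_checkpoint"],
        "escalation": {"after_missed": 2, "notify": "#infra-oncall",
                       "include_last_known_state": True},
    }
}


def validate_heartbeat_config(cfg: dict) -> list[str]:
    """Return a list of problems; an empty list means the contract is explicit."""
    hb = cfg.get("heartbeat", {})
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(hb))]
    if hb.get("interval_minutes", 0) <= 0:
        problems.append("interval_minutes must be positive")
    if hb.get("grace_minutes", -1) < 0:
        problems.append("grace_minutes must be zero or more")
    if not hb.get("template_fields"):
        problems.append("template_fields must list at least one field")
    if hb.get("escalation", {}).get("after_missed", 0) < 1:
        problems.append("escalation.after_missed must be at least 1")
    return problems
```

Failing fast on an incomplete config means no run starts without a defined interval, grace window, and escalation target.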
Operating tips
Keep heartbeats idempotent-friendly: if a duplicate post slips through, readers should still understand current state. Prefer linking to a single source of truth (dashboard, job URL, or doc) rather than pasting huge logs. When multiple agents run in parallel, prefix messages with agent or task identifiers to avoid collisions.
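A minimal sketch of these tips, assuming each agent tracks a per-run sequence number: the key `agent:run:seq` lets a sender skip a retried duplicate, the prefix separates parallel agents, and the body links to a single source of truth. The function names and the example URL are hypothetical.

```python
from typing import Optional


def heartbeat_key(agent_id: str, run_id: str, seq: int) -> str:
    """Stable identity for a post; a retried send produces the same key."""
    return f"{agent_id}:{run_id}:{seq}"


def post_once(posted: set[str], agent_id: str, run_id: str, seq: int,
              body: str, dashboard_url: str) -> Optional[str]:
    key = heartbeat_key(agent_id, run_id, seq)
    if key in posted:
        return None  # duplicate delivery: skip instead of re-posting
    posted.add(key)
    # Prefix with agent, run, and sequence so parallel agents never collide
    # and a duplicate that slips through is obviously the same update.
    return f"[{agent_id} / {run_id} #{seq}] {body}\nDetails: {dashboard_url}"
```

Reporting cumulative state in `body` (for example "items processed: 12,400 total" rather than "+400 since last") keeps a duplicate harmless even when deduplication fails.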
Used well, this template turns long agent sessions from a black box into a shared operational narrative, which is exactly what developers and ops need when the work outlasts a single meeting or work block.
FAQ
- What problem does an agent heartbeat solve?
- It gives humans visibility when coding agents run for hours, so nobody mistakes a quiet but still-progressing run for a stuck or failed one.
- What should each heartbeat message include?
- Timestamp, phase or milestone, percent or qualitative progress, blockers, next expected heartbeat, and whether the run is healthy or degraded.
- How do teams handle missed heartbeats?
- Define an interval and grace window, then route an escalation to a designated channel or on-call surface after consecutive misses.