What eng leaders get wrong about observability

Common observability mistakes: confusing monitoring with observability, tracking vanity metrics, and treating it as a tool purchase rather than a practice. Here is the correct mental model.

Opinion · Leadership · Manager · 7 min read

Observability is one of those concepts that every engineering leader agrees is important and almost nobody implements well. The gap between “we value observability” and “we practice observability” is where most mistakes live. And the consequences are getting worse as coding agents introduce a new class of work that is even harder to observe than human contributions.

Here are the mistakes that keep showing up, and what to do instead.

Mistake one: confusing monitoring with observability

Monitoring and observability are not the same thing, even though most organizations treat them interchangeably.

Monitoring answers predefined questions. Is the server responding? Is latency below the threshold? Is the error rate within bounds? You decide what to watch, set up alerts for known failure modes, and get notified when something crosses a line.

Observability is different. It is the ability to ask new questions about system behavior — questions you did not know you needed to ask until something unexpected happened. An observable system lets you investigate novel problems. A monitored system only tells you about problems you anticipated.
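The difference is easy to see in code. A monitoring check evaluates a question decided in advance; an observable system records wide, structured events with enough context to answer questions invented after the fact. A minimal sketch in Python (the event fields and the ad-hoc query are illustrative assumptions, not any particular tool's API):

```python
# Monitoring: a predefined question, decided before the incident.
def latency_alert(latency_ms, threshold_ms=500):
    """Fires only for the failure mode we anticipated."""
    return latency_ms > threshold_ms

# Observability: emit a wide, structured event per request.
# The fields are illustrative; the point is capturing enough
# context to support questions nobody has asked yet.
def emit_event(log, **fields):
    log.append(fields)

log = []
emit_event(log, route="/checkout", latency_ms=820,
           region="eu-west", feature_flag="new_cart", status=200)
emit_event(log, route="/checkout", latency_ms=95,
           region="us-east", feature_flag="control", status=200)

# A question nobody planned for: is the slowness confined to one
# feature flag? Answerable only because the raw events exist.
slow_by_flag = {}
for e in log:
    if e["latency_ms"] > 500:
        slow_by_flag[e["feature_flag"]] = slow_by_flag.get(e["feature_flag"], 0) + 1

print(slow_by_flag)  # counts of slow requests grouped by flag
```

The alert on its own would have said only "latency is high"; the events let you slice the same incident by region, flag, or any field you had the foresight to record.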

The distinction matters because the most painful failures are the ones nobody predicted. If your observability strategy is a collection of dashboards for known metrics, you are well-monitored but poorly observable. You will catch the failures you expected and be blind to the ones that actually hurt.

Mistake two: tracking vanity metrics

Lines of code written. Number of commits. Pull requests merged per week. Story points completed. These are the observability equivalent of checking your step count and concluding you are healthy.

Vanity metrics measure activity, not outcomes. A developer who refactors 500 lines into 50 is doing more valuable work than one who adds 500 lines of redundant code, but the vanity metrics make the second developer look more productive. A team that merges twenty small PRs is not necessarily outperforming a team that merges five large, well-considered ones.

The problem compounds with coding agents. Agents can generate enormous volumes of code, commits, and pull requests. If your metrics reward volume, agents will look spectacularly productive — even when half their output needs rework. You end up optimizing for the wrong signal.

The correction is measuring outcomes rather than outputs. How many customer-facing bugs were resolved? How quickly did the team ship a working feature? What is the defect rate in agent-generated code versus human-generated code? These questions are harder to instrument, but they reveal whether the work is actually valuable.
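One of those questions, the defect rate of agent-generated versus human-generated code, reduces to a simple computation once changes are labeled by author type and linked to later defects. A hedged sketch with hypothetical records (the field names and sample data are invented for illustration, not drawn from any tracker's schema):

```python
# Hypothetical change records: who authored the change, and whether
# it was later linked to a customer-facing defect. Both fields are
# illustrative assumptions about what your tooling could record.
changes = [
    {"author_type": "human", "caused_defect": False},
    {"author_type": "human", "caused_defect": True},
    {"author_type": "agent", "caused_defect": False},
    {"author_type": "agent", "caused_defect": True},
    {"author_type": "agent", "caused_defect": False},
    {"author_type": "agent", "caused_defect": False},
]

def defect_rate(records, author_type):
    """Share of changes by this author type later tied to a defect."""
    subset = [r for r in records if r["author_type"] == author_type]
    if not subset:
        return 0.0
    return sum(r["caused_defect"] for r in subset) / len(subset)

print(defect_rate(changes, "human"))  # 0.5
print(defect_rate(changes, "agent"))  # 0.25
```

The hard part is not this arithmetic but the labeling: linking defects back to the changes that caused them requires deliberate instrumentation, which is exactly the investment vanity metrics let teams avoid.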

Mistake three: applying infrastructure observability to human work

The engineering world has excellent observability for infrastructure. We know how to trace requests through microservices, monitor memory usage, and alert on latency spikes. The temptation is to apply the same patterns to human and agent work — treating developers and agents like servers and measuring their “throughput” and “uptime.”

This does not work. Human work is not a request pipeline. A developer’s most valuable contribution might be a conversation that prevented a bad architectural decision — an event that generates zero metrics. An agent’s most valuable contribution might be a test that catches a bug before production — which looks identical to any other test in the metrics.

Infrastructure observability works because servers behave deterministically and have well-defined performance characteristics. Human and agent work is creative, variable, and context-dependent. The observability model has to be different: less about measuring throughput and more about maintaining shared awareness of what is happening and why.

Mistake four: treating observability as a tool purchase

“We bought Datadog, so now we have observability.” This is like saying “we bought a gym membership, so now we are fit.”

Tools are necessary but not sufficient. Observability is a practice — a sustained investment in the ability to understand what your system (and your team) is doing. That practice includes deciding what questions matter, instrumenting systems to answer those questions, building the habit of asking them, and evolving the questions as conditions change.

Many organizations buy excellent observability tools and then use them to build the same dashboards they had before. The tool changed; the practice did not. Real observability requires cultural commitment to investigation, not just data collection.

Mistake five: observing the system but not the work

Most observability strategies focus on the technical system — the servers, the services, the deployments. Almost none focus on the work — who is building what, whether human and agent contributions are visible to the team, and whether anyone has a coherent picture of progress.

This blind spot was manageable when all contributors were humans who talked to each other. It becomes critical when agents contribute significant output that nobody discusses in standup. The “system” now includes human developers and AI agents, and the observability practice has to expand to match.

Work observability means knowing, at any given time: what has been completed (by humans and agents), what is in progress, what is blocked, and whether the overall trajectory matches the plan. This is not project management — it is the engineering equivalent of the situational awareness that keeps complex systems safe.
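Those four questions can be answered from a single unified stream of work items, human and agent alike. A minimal sketch (the items, statuses, and field names are illustrative assumptions, not a real tool's data model):

```python
from collections import defaultdict

# Hypothetical work items from one combined stream of human and
# agent contributions; the fields and statuses are illustrative.
work_items = [
    {"title": "Fix checkout bug", "owner": "alice",   "kind": "human", "status": "done"},
    {"title": "Add retry logic",  "owner": "agent-1", "kind": "agent", "status": "in_progress"},
    {"title": "Migrate billing",  "owner": "bob",     "kind": "human", "status": "blocked"},
    {"title": "Write load tests", "owner": "agent-2", "kind": "agent", "status": "done"},
]

def snapshot(items):
    """Group work by status so done / in-progress / blocked is
    visible at a glance, with contributor kind attached."""
    view = defaultdict(list)
    for item in items:
        view[item["status"]].append(f'{item["title"]} ({item["kind"]})')
    return dict(view)

for status, entries in snapshot(work_items).items():
    print(status, "->", entries)
```

The structure is trivial; the discipline is feeding agent output into the same stream as human check-ins so the snapshot is actually complete.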

The correct mental model

Observability is the practice of maintaining the ability to understand system behavior, including the behavior of the people and agents who build the system. It is not dashboards. It is not metrics. It is not a tool. It is the organizational habit of staying curious, asking good questions, and building the infrastructure to answer them.

For teams using coding agents, this means bringing agent output into the same visibility stream as human work. Tools like Dailybot unify human check-ins and agent reports into a single timeline, giving leaders the observability they need across both types of contributors. Not as surveillance, but as the shared awareness that lets a team coordinate effectively.

Get observability right, and you can manage complexity that would be ungovernable otherwise. Get it wrong, and you will have more data than ever with less understanding than you need.

FAQ

What is the most common observability mistake engineering leaders make?
The most common mistake is confusing monitoring with observability. Monitoring answers predefined questions (“Is the server up?”). Observability lets you ask new questions about system behavior you did not anticipate. If you can only check known failure modes, you have monitoring, not observability.
Why are metrics like lines of code and commit counts considered vanity metrics?
These metrics measure activity, not outcomes. By lines-of-code written, a developer who deletes 500 lines of dead code looks less productive than one who adds 500 lines of bloated code, even though the deletion is the more valuable change. Vanity metrics create the illusion of insight without revealing whether the team is producing valuable work.
How should engineering leaders think about observability differently?
Observability is a practice, not a product. It requires investing in the ability to ask new questions about system and team behavior as conditions change. The goal is understanding, not data collection. Leaders should focus on what questions they need to answer, not what metrics they can collect.