Observability Is Not Monitoring — Here's the Difference That Matters
"We have observability — we use Grafana dashboards."
No. You have monitoring. And there's a critical difference that most teams miss until their first major incident.
"We have observability — we use Grafana dashboards."
No. You have monitoring. And there's a critical difference that most teams miss until their first major incident.
Your infrastructure was deployed with Terraform. It's version-controlled. It's reviewed. It's compliant.
Then someone SSH'd into a server and changed a config file. Someone used the Azure portal to add a firewall rule. Someone ran kubectl edit to patch a deployment in production.
Now your Terraform state says one thing. Reality says another. That gap is configuration drift — and it's the silent killer of reliable infrastructure.
3 AM. PagerDuty fires. You open the runbook. It says:
"Check the logs and restart if necessary."
That's not a runbook. That's a suggestion. And at 3 AM, it's useless.
Here's how to write runbooks that actually help during incidents.