Monitoring vs. Observability
Monitoring tells you when something is broken. Observability tells you why. A good DevOps setup needs both. In practice, that means collecting three types of data:
- Metrics — numbers over time (CPU usage, request rate, error rate)
- Logs — timestamped events from your applications and infrastructure
- Traces — the path a request takes through your system
In this guide, we'll set up Prometheus for metrics collection and Grafana for visualization using Docker Compose.
Step 1: Docker Compose Setup
Create a docker-compose.yml:
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # Rule file from Step 3. Create it before launching, or Docker will
      # create an empty directory with that name instead.
      - ./alerts.yml:/etc/prometheus/alerts.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=15d'
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      # Demo password only; change it for anything beyond local testing
      - GF_SECURITY_ADMIN_PASSWORD=devopspack
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      # Point the filesystem collector at the host root mounted above
      - '--path.rootfs=/rootfs'
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:
Step 2: Configure Prometheus
Create prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        # No Alertmanager configured yet; alerts are visible only in the Prometheus UI
        - targets: []

rule_files:
  - "alerts.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Placeholder: replace with your application's host and port, or remove this job
  - job_name: 'your-app'
    metrics_path: '/metrics'
    static_configs:
      - targets: ['your-app:8080']
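The 'your-app' job assumes your application serves Prometheus-format metrics at /metrics on port 8080. As a rough illustration of what such an endpoint returns, here is a minimal standard-library sketch. The metric name app_http_requests_total, the REQUEST_COUNTS dictionary, and the serve() helper are hypothetical names chosen for this example, not part of the original setup.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counters; a real app would update these per request.
REQUEST_COUNTS = {"200": 0, "500": 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_http_requests_total Total HTTP requests by status code.",
        "# TYPE app_http_requests_total counter",
    ]
    for status, value in sorted(counts.items()):
        lines.append(f'app_http_requests_total{{status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/metrics":
            # Prometheus scrapes this path (metrics_path in prometheus.yml)
            body = render_metrics(REQUEST_COUNTS).encode("utf-8")
            content_type = "text/plain; version=0.0.4; charset=utf-8"
        else:
            # Any other path stands in for ordinary application traffic
            REQUEST_COUNTS["200"] += 1
            body = b"ok\n"
            content_type = "text/plain; charset=utf-8"
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the demo server; blocks until interrupted."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint on port 8080. In a real service you would more likely use a maintained client library such as prometheus_client than hand-render the format.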
Step 3: Create Alerts
Create alerts.yml with three baseline alerts that are a sensible starting point for most production systems:
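groups:
  - name: infrastructure
    rules:
      - alert: HighCPUUsage
        # Average non-idle CPU per host over the last 5 minutes
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage on {{ $labels.instance }} has been above 85% for 5 minutes"

      - alert: LowDiskSpace
        # avail_bytes counts space usable by non-root users; pseudo-filesystems are excluded
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"} / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 < 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space"
          description: "Less than 10% disk space remaining on {{ $labels.instance }} ({{ $labels.mountpoint }})"

      - alert: ServiceDown
        # up is 0 for any scrape target that Prometheus cannot reach
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
          description: "{{ $labels.job }} has been down for more than 1 minute"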
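Each rule's `for` clause delays firing until the expression has matched continuously for that long. The sketch below is a simplified model of that behavior, not Prometheus's actual implementation. The function name alert_states and its parameters are invented for illustration, and the 15-second default mirrors evaluation_interval from Step 2.

```python
def alert_states(condition_by_eval, eval_interval_s=15, for_s=60):
    """Return the alert state after each rule evaluation.

    condition_by_eval holds one boolean per evaluation cycle
    (True means the alert expression returned a result).
    """
    states = []
    active_since = None  # time at which the condition first became true
    for i, matched in enumerate(condition_by_eval):
        now = i * eval_interval_s
        if not matched:
            # A single non-matching evaluation resets the timer
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held = now - active_since
        states.append("firing" if held >= for_s else "pending")
    return states
```

For example, alert_states([True] * 5) yields four "pending" states followed by "firing", because the condition has only held for a full 60 seconds by the fifth evaluation. This is why a brief CPU spike does not trigger HighCPUUsage.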
Step 4: Launch and Connect Grafana
docker compose up -d
# Check everything is running
docker compose ps
# Prometheus UI (open is macOS-only; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:9090
# Grafana (login: admin / devopspack)
open http://localhost:3000
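If you prefer a scripted check over opening the UIs, the following sketch queries the health endpoints and the `up` metric through the Prometheus HTTP API. It assumes the default ports from the Compose file; the helper names up_query_url, parse_up, and check_stack are made up for this example.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"
GRAFANA_URL = "http://localhost:3000"


def up_query_url(base_url):
    """Build the instant-query URL for the `up` metric."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": "up"})


def parse_up(payload):
    """Map each (job, instance) pair to True (up) or False (down)."""
    return {
        (r["metric"].get("job"), r["metric"].get("instance")): r["value"][1] == "1"
        for r in payload["data"]["result"]
    }


def check_stack():
    """Query the running stack; requires the containers started above."""
    for url in (f"{PROMETHEUS_URL}/-/healthy", f"{GRAFANA_URL}/api/health"):
        with urllib.request.urlopen(url, timeout=5) as resp:
            print(url, resp.status)
    with urllib.request.urlopen(up_query_url(PROMETHEUS_URL), timeout=5) as resp:
        for target, is_up in parse_up(json.load(resp)).items():
            print(target, "up" if is_up else "DOWN")
```

Run check_stack() once the containers are up. Expect the 'your-app' target to report DOWN until you point that job at a real service.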
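In Grafana, add Prometheus as a data source: Connections → Data sources → Add data source → Prometheus (older Grafana versions list this under Configuration → Data sources). Set the URL to http://prometheus:9090, which is the Compose service name as seen from inside the Grafana container, and click Save & test.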
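The same data source can also be created through Grafana's HTTP API, which is handy for repeatable setups. This is a sketch under the assumptions of this guide (admin credentials from the Compose file, basic auth enabled, as it is by default). The function name build_datasource_request is hypothetical, and the function only builds the request so you can inspect it before sending.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, user, password):
    """Build (but do not send) a POST to Grafana's /api/datasources endpoint."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        "url": "http://prometheus:9090",  # Compose service name, as in the UI steps
        "access": "proxy",  # Grafana's backend makes the requests
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen, for example urllib.request.urlopen(build_datasource_request("http://localhost:3000", "admin", "devopspack")). Grafana returns an error if a data source with the same name already exists.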
Step 5: Import a Dashboard
Instead of building dashboards from scratch, import the widely used community Node Exporter dashboard:
- Go to Dashboards → Import
- Enter dashboard ID 1860 (Node Exporter Full)
- Select your Prometheus data source
- Click Import
You now have a full system dashboard showing CPU, memory, disk, and network in near real time (Prometheus collects new samples every 15 seconds with the configuration above).
Key Metrics to Watch
- CPU — alert above 85% sustained for 5+ minutes
- Memory — alert above 90% used
- Disk — alert below 10% free (and below 5% as critical)
- HTTP error rate — alert if 5xx responses exceed 1% of traffic
- Response time (p99) — alert if the 99th percentile exceeds your SLA
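To make the last two thresholds concrete, here is a small sketch that computes a 5xx error rate and a nearest-rank p99 from raw samples. It only illustrates the arithmetic. Prometheus itself would normally derive these from counters and histogram buckets (for example with histogram_quantile), and the function names and the sla_ms parameter are invented for this example.

```python
def error_rate_percent(status_codes):
    """Share of responses with a 5xx status, as a percentage."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return 100.0 * errors / len(status_codes)


def percentile(values, pct):
    """Nearest-rank percentile; pct is an integer from 1 to 100."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer form of ceil(pct * n / 100)
    return ordered[rank - 1]


def breached(status_codes, latencies_ms, sla_ms, max_error_pct=1.0):
    """Return the names of the thresholds that the sample window exceeds."""
    alerts = []
    if error_rate_percent(status_codes) > max_error_pct:
        alerts.append("http_error_rate")
    if percentile(latencies_ms, 99) > sla_ms:
        alerts.append("p99_latency")
    return alerts
```

With 97 successful responses and three 500s, the error rate is 3%, which exceeds the 1% threshold. The p99 check fires only when the slowest 1% of requests exceed the SLA, so a handful of slow outliers in a large window will not trip it.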
Good monitoring catches problems before your users do. Set it up early, tune your thresholds over time, and make sure your alerts actually reach your team — a silent alert is no alert at all.
