Monitoring Your Infrastructure with Prometheus and Grafana

Monitoring vs. Observability

Monitoring tells you when something is broken. Observability tells you why. A good DevOps setup needs both. In practice, that means collecting three types of data:

Metrics — numbers over time (CPU usage, request rate, error rate)
Logs — timestamped events from your applications and infrastructure
Traces — the path a request takes through your system

In this guide, we'll set up Prometheus for metrics collection and Grafana for visualization using Docker Compose.

Step 1: Docker Compose Setup

Create a docker-compose.yml:

version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
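      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # rule_files in prometheus.yml is resolved relative to /etc/prometheus,
      # so the alert rules from Step 3 must be mounted there as well
      - ./alerts.yml:/etc/prometheus/alerts.yml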
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=15d'
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=devopspack
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - prometheus
    restart: unless-stopped

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      # Point filesystem metrics at the host root mounted at /rootfs
      - '--path.rootfs=/rootfs'
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:

Step 2: Configure Prometheus

Create prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: []

rule_files:
  - "alerts.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

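  # Placeholder: replace with your own service or remove this job.
  # While the target is unreachable, its up metric is 0.
  - job_name: 'your-app'
    static_configs:
      - targets: ['your-app:8080']
    metrics_path: '/metrics'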
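
The your-app job assumes a service that answers GET /metrics on port 8080 in the Prometheus text exposition format. As a rough illustration of what such an endpoint returns, here is a minimal sketch using only the Python standard library. The metric name app_http_requests_total and the in-memory REQUEST_COUNTS dictionary are made up for this example; a real service would more likely use an official client library.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory counters; a real app would update these per request.
REQUEST_COUNTS = {"200": 0, "500": 0}


def render_metrics(counts):
    """Render counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_http_requests_total Total HTTP requests by status code.",
        "# TYPE app_http_requests_total counter",
    ]
    for status, value in sorted(counts.items()):
        lines.append(f'app_http_requests_total{{status="{status}"}} {value}')
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the scrape path is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUEST_COUNTS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve /metrics on the port the scrape config expects (blocks forever)."""
    HTTPServer(("0.0.0.0", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. For Prometheus to reach it as your-app:8080, the process would need to run as a service named your-app on the same Compose network.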

Step 3: Create Alerts

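Create alerts.yml with three baseline alerts that most production systems need. Because the alertmanagers target list in prometheus.yml is empty, these alerts only appear on the Alerts page of the Prometheus UI; no notifications are sent until you add an Alertmanager.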

groups:
  - name: infrastructure
    rules:
      - alert: HighCPUUsage
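        # Average per host; without "by (instance)" every machine is merged into one value
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage detected"
          description: "CPU usage on {{ $labels.instance }} has been above 85% for 5 minutes"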

      - alert: LowDiskSpace
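        # avail_bytes is the space usable by non-root users; tmpfs mounts are excluded as noise
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 < 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space"
          description: "{{ $labels.mountpoint }} on {{ $labels.instance }} has less than 10% disk space remaining"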

      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service is down"
          description: "{{ $labels.job }} has been down for more than 1 minute"
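
To make the HighCPUUsage expression less opaque, the arithmetic can be sketched in plain Python. node_cpu_seconds_total{mode="idle"} is a counter of idle seconds per core, so its per-second rate is the fraction of time that core was idle. This is a simplified illustration with made-up sample values: Prometheus's real rate() also handles counter resets and extrapolates to the window edges, which this sketch ignores.

```python
def idle_rate(first, last):
    """Per-second increase of an idle-seconds counter between two samples.

    Each sample is a (timestamp_seconds, counter_value) pair.
    """
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def cpu_usage_percent(idle_samples_per_core):
    """Mirror 100 - avg(rate(idle)) * 100 for the cores of one host."""
    rates = [idle_rate(first, last) for first, last in idle_samples_per_core]
    return 100 - (sum(rates) / len(rates)) * 100


# Two cores over a 300 s window (made-up numbers):
# core 0 was idle for 15 s, core 1 for 45 s.
samples = [
    ((0, 1000.0), (300, 1015.0)),
    ((0, 2000.0), (300, 2045.0)),
]
usage = cpu_usage_percent(samples)
print(round(usage, 1))  # 90.0
print(usage > 85)       # True: the alert condition holds for this window
```

The alert only fires once the condition has held continuously for the duration given in the rule's for field.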

Step 4: Launch and Connect Grafana

docker compose up -d

# Check everything is running
docker compose ps

# Prometheus UI ("open" is macOS-only; on Linux use xdg-open or paste the URL into a browser)
open http://localhost:9090

# Grafana (login: admin / devopspack)
open http://localhost:3000
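
Besides checking the UI, you can confirm that targets are being scraped through the Prometheus HTTP API, which exposes instant queries at /api/v1/query. The sketch below, assuming Prometheus is reachable on localhost:9090 as published in the Compose file, runs the up query and lists targets that report 0. The function names are illustrative, not part of any library.

```python
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # port published in docker-compose.yml


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def down_targets(response):
    """Return (job, instance) pairs whose up sample is 0."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response}")
    return [
        (r["metric"].get("job"), r["metric"].get("instance"))
        for r in response["data"]["result"]
        if float(r["value"][1]) == 0  # sample values arrive as strings
    ]


def fetch_down_targets(base_url=PROMETHEUS_URL):
    """Run the up query against a live Prometheus and list down targets."""
    with urllib.request.urlopen(build_query_url(base_url, "up"), timeout=5) as resp:
        return down_targets(json.load(resp))
```

With the stack running, print(fetch_down_targets()) should show an empty list once every configured target is reachable.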

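In Grafana, add Prometheus as a data source: Connections → Data sources → Add data source → Prometheus (older Grafana versions list this under Configuration → Data sources). Set the URL to http://prometheus:9090, which is the Compose service name and resolves from inside the Grafana container, and click Save & test.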
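
If you would rather script this step than click through the UI, Grafana's HTTP API accepts a POST to /api/datasources. The following is a hedged sketch that only builds the request, assuming the admin credentials from the Compose file and basic authentication; the helper name datasource_request is made up for this example.

```python
import base64
import json
import urllib.request


def datasource_request(grafana_url, user, password):
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    payload = {
        "name": "Prometheus",
        "type": "prometheus",
        # Compose service name, resolved from inside the Grafana container
        "url": "http://prometheus:9090",
        "access": "proxy",
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


req = datasource_request("http://localhost:3000", "admin", "devopspack")
print(req.get_method(), req.full_url)  # POST http://localhost:3000/api/datasources
```

Passing req to urllib.request.urlopen sends it to a running Grafana. Creating a data source whose name already exists is rejected, so this is best run once against a fresh instance.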

Step 5: Import a Dashboard

Instead of building dashboards from scratch, import the widely used, community-maintained Node Exporter Full dashboard:

Go to Dashboards → Import
Enter dashboard ID 1860 (Node Exporter Full)
Select your Prometheus data source
Click Import

You now have a full system dashboard showing CPU, memory, disk, and network — all in real time.

Key Metrics to Watch

CPU — alert above 85% sustained for 5+ minutes
Memory — alert above 90% used
Disk — alert below 10% free (and below 5% as critical)
HTTP error rate — alert if 5xx responses exceed 1% of traffic
Response time (p99) — alert if the 99th percentile exceeds your SLA
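
The last two thresholds are easier to reason about with the arithmetic written out. The sketch below computes a 5xx error rate and a nearest-rank 99th percentile from raw samples. It is an illustration with hypothetical inputs, not how Prometheus evaluates these: in practice you would typically derive both from counters and histograms in PromQL, and histogram-based quantiles are estimates from buckets rather than exact values.

```python
def error_rate_percent(status_codes):
    """Share of responses with a 5xx status, as a percentage."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes) * 100


def p99(latencies_ms):
    """99th percentile using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer form of ceil(0.99 * n), avoiding floating-point rounding
    rank = (99 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical window: 1000 requests, 12 of them failed with a 5xx status
statuses = [200] * 988 + [503] * 12
print(error_rate_percent(statuses) > 1)  # True: above the 1% threshold

# Hypothetical latencies of 1..200 ms
print(p99(list(range(1, 201))))  # 198
```

Nearest-rank returns an observed value, so with few samples a single slow request can dominate the result; wider evaluation windows give steadier p99 figures.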

Good monitoring catches problems before your users do. Set it up early, tune your thresholds over time, and make sure your alerts actually reach your team — a silent alert is no alert at all.


Peter Gonda
