ops-demo/docs/06-monitoring.md
Paul Harkink ed5d39efa2 feat(ex06): bonus monitoring — Prometheus + Grafana via kube-prometheus-stack
- apps/monitoring/prometheus-grafana.yaml: ArgoCD Application (chart 68.4.4)
- manifests/monitoring/values.yaml: lightweight values, Grafana ingress, 6h retention
- docs/06-monitoring.md: Exercise 06 bonus participant guide
2026-02-28 15:34:47 +01:00


Exercise 06 (Bonus) — Monitoring: Prometheus + Grafana

Time: ~60 min
Goal: Deploy a full observability stack via ArgoCD and explore cluster + application metrics in Grafana.


What you'll learn

  • How to deploy a complex multi-component stack (kube-prometheus-stack) purely via GitOps
  • How Prometheus scrapes metrics from Kubernetes and applications
  • How to navigate Grafana dashboards for cluster and pod-level metrics

Prerequisites

Exercises 01–03 complete. Ingress-Nginx is running and nip.io URLs are reachable from your laptop.

Note: This exercise adds ~700 MB of memory usage. It works on an 8 GB VM but may be slow. If the VM feels sluggish, reduce replica counts or remove the Prometheus storageSpec in manifests/monitoring/values.yaml.
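
To check whether the VM has enough headroom before you start, a small helper like the one below can compare free memory against the ~700 MB estimate from the note. This is a minimal sketch for Linux: it reads MemAvailable from /proc/meminfo, and the function names `available_mb` and `check_headroom` are illustrative, not part of the workshop repository.

```shell
# Print available memory in MiB, read from a meminfo-style file.
# Defaults to /proc/meminfo; the optional argument only exists to make testing easy.
available_mb() {
  awk '/^MemAvailable:/ { print int($2 / 1024) }' "${1:-/proc/meminfo}"
}

# Return non-zero when fewer than NEED MiB (default 700) are available.
# Usage: check_headroom [NEED_MIB] [MEMINFO_FILE]
check_headroom() {
  need="${1:-700}"
  have="$(available_mb "${2:-/proc/meminfo}")"
  if [ -z "$have" ]; then
    echo "MemAvailable not found" >&2
    return 2
  fi
  if [ "$have" -lt "$need" ]; then
    echo "Only ${have} MiB available; the monitoring stack needs roughly ${need} MiB"
    return 1
  fi
  echo "OK: ${have} MiB available"
}
```

Run `check_headroom` inside the VM. The 700 MiB default is only the rough estimate quoted above, so treat a pass as a hint rather than a guarantee.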


Steps

1. Enable the monitoring Application

The ArgoCD Application manifest for the monitoring stack is already in apps/monitoring/. The root App-of-Apps watches this directory, so the application should already appear in ArgoCD as prometheus-grafana.

Check its sync status:

kubectl get application prometheus-grafana -n argocd

The initial sync takes 5–8 minutes because the kube-prometheus-stack chart is large and installs many CRDs.
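
If you would rather not re-run the status command by hand, the check can be wrapped in a polling loop. The sketch below assumes the Application lives in the argocd namespace (as in the command above) and reads the `.status.sync.status` and `.status.health.status` fields of the Application resource. The function name `wait_for_app` and the default timeout and interval are illustrative.

```shell
# Poll an ArgoCD Application until it reports Synced and Healthy.
# Usage: wait_for_app NAME [TIMEOUT_SECONDS] [POLL_INTERVAL_SECONDS]
wait_for_app() {
  app="$1"
  timeout="${2:-600}"
  interval="${3:-10}"
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # Prints e.g. "Synced/Healthy" or "OutOfSync/Progressing"
    state="$(kubectl get application "$app" -n argocd \
      -o jsonpath='{.status.sync.status}/{.status.health.status}' 2>/dev/null)"
    echo "${app}: ${state:-unknown}"
    if [ "$state" = "Synced/Healthy" ]; then
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  echo "Timed out after ${timeout}s waiting for ${app}" >&2
  return 1
}
```

For example, `wait_for_app prometheus-grafana` polls every 10 seconds for up to 10 minutes, which leaves some margin over the expected sync time.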


2. Watch the stack come up

kubectl get pods -n monitoring -w
# You'll see prometheus, grafana, kube-state-metrics, node-exporter pods appear
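
As a non-interactive alternative to watching the pod list, `kubectl wait` can block until the pods report Ready. This is a minimal sketch; the function name `wait_for_pods` and the 600s default timeout are assumptions. Pods left behind by completed Jobs never become Ready, so the command may time out if any remain in the namespace.

```shell
# Block until every pod in the namespace reports Ready, or the timeout expires.
# Usage: wait_for_pods [NAMESPACE] [TIMEOUT]
wait_for_pods() {
  ns="${1:-monitoring}"
  timeout="${2:-600s}"
  kubectl wait pod --all -n "$ns" --for=condition=Ready --timeout="$timeout"
}
```

Calling `wait_for_pods` with no arguments targets the monitoring namespace used in this exercise.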

Once all pods are Running:

kubectl get ingress -n monitoring
# NAME      CLASS   HOSTS                               ADDRESS
# grafana   nginx   grafana.192.168.56.200.nip.io       192.168.56.200

3. Open Grafana

From your laptop: http://grafana.192.168.56.200.nip.io

Login: admin / workshop123
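
If the page does not load, you can probe Grafana from a terminal before retrying in the browser. The sketch below calls Grafana's `/api/health` endpoint through the ingress URL shown above and succeeds only on HTTP 200. The function name `grafana_ready` is illustrative.

```shell
# Return 0 when Grafana's health endpoint answers with HTTP 200.
# Usage: grafana_ready [BASE_URL]
grafana_ready() {
  base="${1:-http://grafana.192.168.56.200.nip.io}"
  # -o /dev/null discards the body; -w prints only the HTTP status code
  code="$(curl -s -o /dev/null -w '%{http_code}' "${base}/api/health")"
  echo "GET ${base}/api/health -> ${code}"
  [ "$code" = "200" ]
}
```

A 502 here usually means the ingress is up but the Grafana pod is not ready yet, which matches the Troubleshooting entry further down.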


4. Explore dashboards

kube-prometheus-stack ships with pre-built dashboards. In the Grafana sidebar: Dashboards → Browse

Useful dashboards for this workshop:

| Dashboard | What to look at |
|---|---|
| Kubernetes / Compute Resources / Namespace (Pods) | CPU + memory per pod in podinfo namespace |
| Kubernetes / Compute Resources / Node (Pods) | Node-level resource view |
| Node Exporter / Full | VM-level CPU, memory, disk, network |

5. Generate some load on podinfo

In a new terminal, run a simple load loop:

# Inside the VM
while true; do curl -s http://podinfo.192.168.56.200.nip.io > /dev/null; sleep 0.2; done

Switch back to Grafana → Kubernetes / Compute Resources / Namespace (Pods) → set namespace to podinfo. You should see CPU usage climb for the podinfo pod.
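
The endless loop above has to be stopped with Ctrl+C. If you prefer a bounded run that also reports how many requests succeeded, something like the following works as a sketch. The function name `generate_load` and its defaults (300 requests, 0.2 s apart, about one minute) are illustrative choices.

```shell
# Send COUNT requests to URL, pausing DELAY seconds between them, then print a summary.
# Usage: generate_load URL [COUNT] [DELAY]
generate_load() {
  url="$1"
  count="${2:-300}"
  delay="${3:-0.2}"
  ok=0
  i=0
  while [ "$i" -lt "$count" ]; do
    # -f makes curl exit non-zero on HTTP error responses (4xx/5xx)
    if curl -sf -o /dev/null "$url"; then
      ok=$((ok + 1))
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "${ok}/${count} requests succeeded"
}
```

For example: `generate_load http://podinfo.192.168.56.200.nip.io 600 0.1`.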


6. Explore the GitOps aspect

Every configuration change to the monitoring stack goes through Git.

Try changing the Grafana admin password:

vim manifests/monitoring/values.yaml
# Change: adminPassword: workshop123
# To:     adminPassword: supersecret
git add manifests/monitoring/values.yaml
git commit -m "chore(monitoring): update grafana admin password"
git push

Watch ArgoCD sync the Helm release, then try logging into Grafana with the new password.
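
The edit in vim can also be scripted, which is handy if you want to repeat the change. The helper below is a sketch under two assumptions: the values file contains an `adminPassword:` key on a line of its own, and the new password contains no `|`, `&` or `\` characters, which are special to sed in this expression. The function name `set_admin_password` is hypothetical.

```shell
# Replace the value of the adminPassword key in a values file, keeping its indentation.
# Usage: set_admin_password FILE NEW_PASSWORD
set_admin_password() {
  file="$1"
  new="$2"
  tmp="$(mktemp)"
  # Capture leading whitespace plus the key, then rewrite only the value
  sed "s|^\([[:space:]]*adminPassword:\).*|\1 ${new}|" "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Usage: `set_admin_password manifests/monitoring/values.yaml supersecret`, followed by the same git add, commit and push as above.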


Expected outcome


Troubleshooting

| Symptom | Fix |
|---|---|
| Pods in Pending state | VM may be low on memory; run kubectl describe pod to confirm |
| Grafana 502 from Nginx | Grafana pod not ready yet; wait and retry |
| No data in dashboards | Prometheus needs ~2 min to scrape first metrics; wait and refresh |
| CRD conflict on sync | First sync installs CRDs; second sync applies resources; retry the sync |
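
For the Pending case, the scheduling events in the `kubectl describe` output usually show the reason. A minimal sketch that describes every Pending pod in one go is shown below; the function name `describe_pending` is illustrative.

```shell
# Describe every Pending pod in a namespace so its scheduling events are visible.
# Usage: describe_pending [NAMESPACE]
describe_pending() {
  ns="${1:-monitoring}"
  # -o name prints one "pod/<name>" entry per line
  pods="$(kubectl get pods -n "$ns" --field-selector=status.phase=Pending -o name)"
  if [ -z "$pods" ]; then
    echo "No Pending pods in ${ns}"
    return 0
  fi
  for pod in $pods; do
    kubectl describe -n "$ns" "$pod"
  done
}
```

Look at the Events section at the end of each description for the scheduler's message.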

Going further (at home)

  • Add a podinfo ServiceMonitor so Prometheus scrapes podinfo's /metrics endpoint
  • Create a custom Grafana dashboard for podinfo request rate and error rate
  • Alert on high memory usage with Alertmanager (enable it in values.yaml)
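
As a starting point for the first item, the helper below writes a ServiceMonitor manifest to a file you can commit. Treat it as a sketch with several assumptions that are not confirmed by this guide: the podinfo Service carries the label `app.kubernetes.io/name: podinfo`, its port is named `http`, and it lives in the podinfo namespace. Check them with `kubectl get svc -n podinfo --show-labels` and adjust as needed. The function name and any output path are hypothetical.

```shell
# Write a ServiceMonitor manifest for podinfo to the given path.
# Usage: write_podinfo_servicemonitor OUTPUT_FILE
write_podinfo_servicemonitor() {
  cat > "$1" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: podinfo
  namespace: podinfo
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: podinfo   # assumption: label on the podinfo Service
  endpoints:
    - port: http                        # assumption: name of the Service port
      path: /metrics
      interval: 30s
EOF
}
```

Commit the generated file to a directory that ArgoCD already syncs. Depending on the chart values, Prometheus may only pick up ServiceMonitors that carry a specific label, so if no podinfo target appears, compare the manifest's labels with the Prometheus serviceMonitorSelector.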