diff --git a/roadmap.md b/roadmap.md
new file mode 100644
index 0000000..1801083
--- /dev/null
+++ b/roadmap.md
@@ -0,0 +1,69 @@
+# Workshop Roadmap
+
+## Exercise Map
+
+| # | Exercise | Type | Est. Time | Status |
+|---|----------|------|-----------|--------|
+| 01 | Bootstrap ArgoCD | Core | 30 min | ✅ Implemented |
+| 02 | Deploy podinfo via GitOps | Core | 30 min | ✅ Implemented |
+| 03 | MetalLB + Ingress-Nginx (LAN exposure) | Core | 45 min | ✅ Implemented |
+| 04 | Tekton pipeline (image tag bump → GitOps loop) | Core | 45 min | ✅ Implemented |
+| 05 | App upgrade via GitOps | Core | 15 min | ✅ Implemented |
+| 06 | Monitoring: Prometheus + Grafana | Bonus | 60 min | ✅ Implemented |
+
+**Total core: ~2.5–3h. Beginners may stop after Exercise 03 (~1h45m).**
+
+---
+
+## Solution Branches
+
+Model: solution branches are **standalone per exercise** (not cumulative).
+
+| Branch | State |
+|--------|-------|
+| `solution/01-argocd-bootstrap` | ArgoCD running, root app applied |
+| `solution/02-deploy-podinfo` | podinfo synced via ArgoCD |
+| `solution/03-metallb-ingress` | MetalLB + Ingress-Nginx + podinfo reachable on LAN; CRD `caBundle` drift handling included |
+| `solution/04-tekton-pipeline` | Full Tekton GitOps loop working |
+| `solution/05-app-upgrade` | deployment.yaml bumped to 6.7.0 |
+| `solution/06-monitoring` | Prometheus + Grafana running |
+
+---
+
+## Verification Status
+
+| Exercise | Smoke-tested |
+|----------|-------------|
+| 01 | ✅ Validated (clean VM + bootstrap + root sync) |
+| 02 | ✅ Validated (podinfo app deploy + healthy) |
+| 03 | ✅ Validated (MetalLB + ingress + podinfo URL reachable) |
+| 04 | ✅ Validated after hardening fixes (PSA patch + pipeline runtime fixes) |
+| 05 | ✅ Validated (upgrade/drift workflow over working 04 stack) |
+| 06 | ✅ Validated (Prometheus/Grafana app healthy + Grafana ingress reachable) |
+
+Full end-to-end test: completed on `ops-demo-tryout` from clean baseline through 01–06.
+
+---
+
+## Recent Changes (2026-03-01)
+
+- End-to-end smoke test executed in clean tryout environment (`vagrant destroy && vagrant up`).
+- Exercise 04 hardening to make tutorial reproducible:
+  - Tekton namespace PodSecurity patch (`pod-security.kubernetes.io/enforce=privileged`)
+  - pipeline validate step switched to pure client-side `kubectl create --dry-run=client`
+  - clone task now ensures workspace writeability for later task images (`chmod -R a+rwX .`)
+  - git clone/push switched to HTTP auth header flow (no URL credential embedding)
+- Exercise 04 docs clarified with explicit PSA semantics and workshop trade-offs.
+- Assignment clarity improvements across docs/01..06:
+  - every shell snippet clearly marked as `VM` or `HOST`
+  - removed large per-page top callout blocks; context now lives at snippet level
+- Exercise 03 docs expanded with practical explanation around MetalLB manifests and key Kubernetes terms.
+- Exercise 04 docs expanded with:
+  - explicit mandatory credential step before PipelineRun
+  - clear distinction between Argo wrapper manifest vs full Tekton pipeline manifest
+  - Tekton Dashboard + ingress walkthrough
+- `scripts/vm/set-git-credentials.sh` now prints a context-correct PipelineRun path (`/vagrant/...` fallback included).
+- Earlier branch-level fixes remain in place:
+  - root recursive discovery
+  - MetalLB CRD `caBundle` drift handling
+  - Tekton empty `kustomize` drift fix in solution flow
diff --git a/sessions.md b/sessions.md
new file mode 100644
index 0000000..24e619a
--- /dev/null
+++ b/sessions.md
@@ -0,0 +1,208 @@
+# Sessions Log
+
+Per-session progress notes. Newest entry first.
+
+---
+
+## 2026-03-01 — Full end-to-end validation + out-of-box hardening (SESSION 5)
+
+**Session goal**: Run the complete workshop flow in a clean tryout repo/VM and close all blockers until 01→06 works out of the box.
+
+**Validated flow**:
+- Fresh baseline in `ops-demo-tryout`:
+  - force-reset to `upstream/main`
+  - `vagrant destroy -f && vagrant up`
+  - bootstrap + Argo repo registration + root commit
+- Progressive exercise validation completed through 01→06.
+- Final runtime state confirmed:
+  - all Argo apps `Synced/Healthy`
+  - podinfo image at `ghcr.io/stefanprodan/podinfo:6.7.0`
+  - URLs responding: podinfo `200`, tekton dashboard `200`, grafana `302` (login redirect)
+
+**Critical blockers found and fixed**:
+1. **Tekton TaskRuns rejected by Pod Security Admission**
+   Symptom: `PodAdmissionFailed` in `tekton-pipelines` namespace.
+   Fix:
+   - `manifests/ci/tekton/kustomization.yaml` now patches existing Namespace
+   - new `manifests/ci/tekton/namespace-podsecurity-patch.yaml`
+   - docs/04 updated with explicit rationale (what PSA means and why this trade-off is used in workshop)
+
+2. **Pipeline validate step required unintended RBAC**
+   Symptom: `validate` task failed with `Forbidden` on reads in `podinfo` namespace.
+   Fix:
+   - switched validate command from `kubectl apply --dry-run=client` to
+     `kubectl create --dry-run=client` (pure client-side validation)
+
+3. **Workspace file ownership/mode mismatch between task images**
+   Symptom: `bump-image-tag` failed with permission denied writing `deployment.yaml`.
+   Fix:
+   - clone task now runs `chmod -R a+rwX .` so subsequent task images/users can write.
+
+4. **Git push URL credential embedding failed**
+   Symptom: `git-commit-push` failed with URL parse error (`Port number was not a decimal number...`).
+   Fix:
+   - clone/push now use `http.extraHeader=Authorization: Basic ...`
+     instead of embedding credentials in remote URL.
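The auth-header fix in blocker 4 can be illustrated with a minimal sketch. It assumes a helper named `basic_auth_header` and the placeholder variables `GIT_USER`, `GIT_TOKEN` and `REPO_URL`, all of which are hypothetical and not taken from the workshop's Tekton tasks. Only `http.extraHeader` and the `Authorization: Basic` scheme come from the notes above.

```shell
#!/bin/sh
# Sketch only: basic_auth_header, GIT_USER, GIT_TOKEN and REPO_URL are
# hypothetical names for illustration, not the workshop's actual task code.

# Build the value for git's http.extraHeader option from a user and a token.
basic_auth_header() {
  # base64 may wrap long output, so strip newlines before using it in a header.
  encoded=$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')
  printf 'Authorization: Basic %s' "$encoded"
}

# Usage (not executed here, it needs network access and real credentials):
#   git -c "http.extraHeader=$(basic_auth_header "$GIT_USER" "$GIT_TOKEN")" \
#     clone "$REPO_URL"
```

Because the credential travels in a request header, the remote URL stays a plain `https://host/path` and git never has to parse a user or token out of it.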
+
+**Docs hardened**:
+- `docs/04-tekton-pipeline.md` on `main` expanded with practical explanations:
+  - clear PSA meaning (`enforce=privileged` does **not** mean pods must be privileged)
+  - why namespace patch is needed in this workshop
+  - task-level explanation and stronger troubleshooting guidance
+- removed obsolete troubleshooting about `validate Forbidden` after validate-step fix.
+
+**Branches updated**:
+- `main`:
+  - `f7a54b6` docs(ex04): clarify PodSecurity patch meaning and rationale
+  - `2ef3bae` docs(ex04): align validate explanation with client-side check
+- `solution/04-tekton-pipeline`:
+  - `acf6be0` fix(ex04): patch Tekton namespace pod-security label
+  - `09262dc` docs(ex04): clarify PodSecurity patch meaning and rationale
+  - includes validated pipeline runtime fixes (validate mode, workspace perms, auth header clone/push)
+
+**Notes**:
+- Tryout required repoURL substitutions to its fork URL where solution manifests referenced `ops-demo`.
+- No unresolved runtime blockers remained at end of session.
+
+---
+
+## 2026-03-01 — Assignment clarity pass + Tekton docs hardening (SESSION 4)
+
+**Session goal**: Remove ambiguity in exercise instructions and align docs with real execution flow.
+
+**Completed this session**:
+- Exercise 03 expanded with explanatory text around key manifests:
+  - MetalLB speaker/tolerations explanation
+  - IPAddressPool + L2Advertisement purpose
+  - Argo app split and sync-wave reasoning
+  - Ingress intent for podinfo and ArgoCD
+- Exercise 04 clarified and hardened:
+  - Explicitly states `apps/ci/pipeline.yaml` is only an Argo wrapper
+  - Makes `set-git-credentials.sh` a mandatory pre-step
+  - Added Tekton Dashboard + ingress walkthrough in assignment text
+  - Added troubleshooting for common Tekton/root drift
+- Command-context UX improved across assignments:
+  - Shell snippets now clearly labeled `VM` or `HOST` in quote-style blocks
+  - Removed oversized top callout blocks from exercise pages per user preference
+- `scripts/vm/set-git-credentials.sh` improved:
+  - Next-step output now prints a usable PipelineRun manifest path (`manifests/...` or `/vagrant/...`)
+    depending on where the script is run.
+
+**Key commits pushed (main)**:
+- `83d227a` docs(ex04): document tekton kustomize drift fix
+- `a2c15d6` docs(ex04): add Tekton Dashboard UI + ingress walkthrough
+- `0212f4b` docs: clarify command context and workshop flow
+
+**Open follow-up**:
+- If dashboard setup should be mandatory in `solution/04`, validate in tryout and backport explicitly to that branch.
+
+---
+
+## 2026-02-28 — Workflow hardening + docs alignment (SESSION 3)
+
+**Session goal**: Fix operator-facing workflow issues, prevent wrong-cluster mistakes, and align docs/solutions with real usage.
+
+**Completed this session**:
+- Host/VM script split is now the working model in docs and flow:
+  - host: `scripts/host/bootstrap-from-host.*`, `scripts/host/argocd-ui-tunnel.*`
+  - vm: `scripts/vm/bootstrap.sh`, `scripts/vm/set-git-credentials.sh`, `scripts/vm/argocd-port-forward.sh`
+- Bootstrap safety improved:
+  - cluster target checks enforced in `scripts/vm/bootstrap.sh`
+  - recursive app discovery fix merged (`cc0d36b`)
+- README and exercise docs updated multiple times for:
+  - `vagrant ssh` usage
+  - Argo repo registration requirement
+  - GitHub PAT guidance (fine-grained token path and permissions context)
+  - host/VM execution clarity and troubleshooting
+- MetalLB OutOfSync drift investigated and fixed:
+  - root cause: CRD webhook `caBundle` drift behavior in Argo comparison
+  - validated against `pms15-cluster` behavior
+  - `solution/03-metallb-ingress` updated to ignore CRD `caBundle` drift generically (not single CRD name)
+  - docs/03 troubleshooting updated on main
+- Formatting pass landed for markdown readability (`fc0eb1b`), then targeted wording corrections.
+- `CLAUDE.md` refreshed to current architecture and branch model.
+
+**Key commits pushed (main)**:
+- `c68292e` docs: clarify VM access via vagrant ssh only
+- `71c1f79` improve bootstrap safety + host-side Argo access scripts
+- `4d77c82` fix host KUBECONFIG leakage in host/vm scripts
+- `cb912cf` split host/vm scripts + Argo tunnel workflow fix
+- `d59818d` docs refinements (workshop flow + Argo repo credentials)
+- `cc0d36b` bootstrap: recursive app discovery in root app
+- `fc0eb1b` markdown formatting/readability
+- `0dc7062` ex03 docs: Metallb CRD drift troubleshooting
+
+**Key commit pushed (solution branch)**:
+- `solution/03-metallb-ingress`: `2e6b4fb` (ignore Metallb CRD `caBundle` drift across CRDs)
+
+**Notes / follow-up**:
+- Keep `sessions.md`/`roadmap.md` in sync after every significant change.
+- Verify all solution branches still obey "standalone per exercise" constraints before next content edits.
+
+---
+
+## 2026-02-28 — Branching restructure + Dutch translation (SESSION 2, INCOMPLETE)
+
+**Session goal**: Restructure branches, translate docs to Dutch, rebuild solution branches off thin main.
+
+**Completed this session**:
+- `reference-solution` branch created from old main (full working solution) ✓
+- Solution files removed from `main` (staged, NOT committed) ✓
+- `scripts/bootstrap.sh` rewritten: Dutch, auto-detects fork URL (SSH→HTTPS), generates apps/root.yaml ✓
+- `README.md` rewritten in Dutch ✓
+- `docs/vm-setup.md` rewritten in Dutch ✓
+- `docs/01-argocd-bootstrap.md` through `docs/06-monitoring.md` rewritten in Dutch ✓
+- `docs/presentation/final-talk.md` — **STILL EMPTY** (1 line) — NOT YET DONE
+- Old solution/NN-* branches NOT yet deleted/recreated
+- **NOTHING COMMITTED on new main yet**
+
+**Git status on `main`**:
+- STAGED: deletions of all solution files (apps/apps/, apps/ci/, etc., manifests/apps/, etc.)
+- UNSTAGED MODIFIED: README.md, all docs/*.md, scripts/bootstrap.sh
+- UNTRACKED: none relevant
+
+**What to do next session**:
+1. Write `docs/presentation/final-talk.md` in Dutch (translate from reference-solution branch, natural dev-Dutch)
+2. `git add -A` + ONE commit on main (all deletions + Dutch docs + new bootstrap.sh)
+3. Delete old solution branches: solution/01 through solution/06
+4. Recreate solution/01-argocd-bootstrap through solution/06-monitoring cumulatively off new thin main, each with ONE commit
+5. Push everything to GitHub (paulharkink/ops-demo)
+6. Continue smoke-testing exercises 02–05
+
+**Key Dutch translation rules** (user was very clear):
+- Natural dev-Dutch, written as if Paul wrote it
+- Technical terms stay English: "branches", "cluster", "pipeline", "deployment", "namespace", etc.
+- "takken" is NEVER acceptable
+- No Apple Silicon warnings
+- No "Co-Authored-By: Claude" in commits
+
+---
+
+## 2026-02-28 — Initial implementation (SESSION 1)
+
+**Session goal**: Full repo scaffold from implementation plan.
+
+**Completed**:
+- Phase 1: CLAUDE.md, sessions.md, roadmap.md, Vagrantfile, scripts/bootstrap.sh,
+  apps/root.yaml, apps/project.yaml, apps/argocd.yaml, manifests/argocd/values.yaml
+  → `solution/01-argocd-bootstrap` branch created
+- Phase 2: apps/apps/podinfo.yaml, manifests/apps/podinfo/, docs/01-argocd-bootstrap.md,
+  docs/02-deploy-podinfo.md
+  → `solution/02-deploy-podinfo` branch created
+- Phase 3: MetalLB + Ingress-Nginx apps/manifests, podinfo ingress, ArgoCD ingress,
+  docs/03-metallb-ingress.md
+  → `solution/03-metallb-ingress` branch created
+- Phase 4: Tekton app/manifests, pipeline resources, scripts/set-git-credentials.sh,
+  docs/04-tekton-pipeline.md
+  → `solution/04-tekton-pipeline` branch created
+- Phase 5: docs/05-app-upgrade.md → `solution/05-app-upgrade` branch (deployment at 6.7.0)
+- Phase 6: Monitoring app/manifests, docs/06-monitoring.md → `solution/06-monitoring` branch
+- Phase 7: docs/vm-setup.md, README.md, docs/presentation/final-talk.md
+
+**Vagrantfile fixes applied**:
+- yq arch-aware: ARCH=$(dpkg --print-architecture)
+- Tekton images: ghcr.io (not gcr.io)
+- Docker Hub images: docker.io/ prefix required for k3s ctr
+- kubeconfig chmod 600
+
+**Not yet verified**: Full end-to-end smoke test pending.
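The arch-aware yq fix listed under the Vagrantfile fixes can be sketched roughly as follows. This assumes the yq in question is the `mikefarah/yq` project, whose release assets are named `yq_linux_<arch>`. The helper names `yq_asset_name` and `yq_download_url` are hypothetical, and the actual Vagrantfile provisioning may look different.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative; the download URL assumes the
# mikefarah/yq release layout, which the session notes do not confirm.

# Map a Debian architecture name (as printed by dpkg) to a yq asset name.
yq_asset_name() {
  case "$1" in
    amd64|arm64) printf 'yq_linux_%s' "$1" ;;
    *) echo "unsupported architecture: $1" >&2; return 1 ;;
  esac
}

# Build the download URL for the latest release of that asset.
yq_download_url() {
  asset=$(yq_asset_name "$1") || return 1
  printf 'https://github.com/mikefarah/yq/releases/latest/download/%s' "$asset"
}

# Usage inside a Debian/Ubuntu VM (not executed here):
#   ARCH=$(dpkg --print-architecture)
#   curl -fsSL -o /usr/local/bin/yq "$(yq_download_url "$ARCH")"
```

Deriving the asset name from `dpkg --print-architecture` lets the same provisioning step work on both amd64 and arm64 hosts without hardcoding a binary.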