wip: k3s hardening: PodSecurity baseline + NetworkPolicy template + postgres loopback bind #682

Draft
lytedev wants to merge 3 commits from sec-k8s-platform into main
Owner

Security-audit follow-up. Hardens the beefcake k3s cluster with PodSecurity admission, provides a real (not no-op) NetworkPolicy baseline, closes a latent postgres exposure, and lays out a podman->k8s migration plan. Build-checked, not deployed.

What's in here

1. PodSecurity admission (implemented, cluster-wide)

  • New k3s.podSecurity module option feeds kube-apiserver an AdmissionConfiguration via --kube-apiserver-arg=admission-control-config-file= (a nix-store config file, referenced by path — the apiserver reads it in-process as root).
  • Enabled on beefcake: baseline enforce default, restricted warn/audit, kube-system exempt (traefik/coredns/metrics-server/local-path/helm-install would fail baseline otherwise).
  • PSA gates newly-admitted pods only — running pods (incl. the stale echo-server) are untouched until they restart. Safe to deploy.
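
For reference, a minimal sketch of the kind of AdmissionConfiguration the bullets above describe. This is an assumption about the shape of the generated nix-store file, not a copy of it; the `latest` version pins and the empty exemption lists are illustrative.

```yaml
# Sketch only: assumed shape of the file passed via admission-control-config-file.
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
  - name: PodSecurity
    configuration:
      apiVersion: pod-security.admission.config.k8s.io/v1
      kind: PodSecurityConfiguration
      defaults:
        enforce: "baseline"          # reject pods that violate baseline
        enforce-version: "latest"    # assumption: version pin not stated in the PR
        warn: "restricted"           # warn only, do not reject
        warn-version: "latest"
        audit: "restricted"          # audit-log annotation only
        audit-version: "latest"
      exemptions:
        usernames: []
        runtimeClasses: []
        namespaces: ["kube-system"]  # traefik/coredns/etc. would fail baseline
```

Namespaces can still override these defaults with their own `pod-security.kubernetes.io/*` labels; the file only sets what applies when no label is present.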

2. NetworkPolicy — audit premise corrected ⚠️

The audit said netpol would be a no-op on flannel. That's wrong for k3s. k3s embeds kube-router's netpol controller alongside flannel and enforces NetworkPolicy by default (docs: https://docs.k3s.io/networking/networking-services#network-policy-controller); this cluster does not pass --disable-network-policy (verified on the running server). So default-deny is real here.

  • Added lib/doc/k8s-networkpolicy-template.yaml: reusable per-namespace default-deny-ingress + allow-from-traefik + allow-dns.
  • Not auto-applied cluster-wide: netpol is namespaced and a blanket deny in kube-system would sever coredns/traefik. Each migrated app carries its own copy (per the migration doc).
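
A rough sketch of what such a per-namespace template could look like. The actual file is not reproduced here, so treat everything below as an assumption: the `example-app` namespace is hypothetical, and the traefik and coredns pod labels are the ones commonly seen on a stock k3s install and should be checked against the running cluster.

```yaml
# Sketch only. Namespace "example-app" is hypothetical; substitute per migrated app.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: example-app
spec:
  podSelector: {}            # selects every pod in the namespace
  policyTypes: ["Ingress"]   # no ingress rules listed, so all ingress is denied
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-from-traefik
  namespace: example-app
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
  ingress:
    - from:
        # namespaceSelector and podSelector in ONE entry: both must match.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: traefik   # assumed label
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
  namespace: example-app
spec:
  podSelector: {}
  # Assumption: allow-dns is written as an egress rule. Applying it also makes
  # all other egress from this namespace default-deny; drop it if egress
  # should stay open.
  policyTypes: ["Egress"]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns                 # assumed coredns label
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

NetworkPolicies are additive, so the two ingress policies combine to "deny everything except traefik" without any ordering concerns.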

3. postgres 0.0.0.0 -> loopback (implemented)

enableTCPIP = true set postgres listen_addresses = '*', so it listened on 0.0.0.0:5432. Investigated consumers: the only TCP client is the happy container, which runs --network=host and connects to localhost. Nothing authorized reaches postgres off-loopback (pg_hba trusts only 127.0.0.1/32 + ::1/128; firewall doesn't open 5432), so the 0.0.0.0 bind was pure latent exposure. Postgres is now bound to localhost (both v4/v6, so happy's hostname-based URL still resolves). Documented the flannel-bridge path (listen on 10.42.0.1 + scoped pg_hba) for future in-cluster pods.

4. Migration plan (lib/doc/podman-to-k8s-migration.md)

Thumbs-up but selective. Triage of the ~7 podman workloads:

  • ✅ Early: openobserve, actual, bulwark, hearth (HTTP, single-container, hostPath-friendly).
  • 🟡 Batch later: the matrix bridges (share a tuwunel+sops+sqlite shape).
  • ❌ Stay podman: music-assistant (host-net mDNS), mmrelay (host-net + mesh hardware), minecraft (raw TCP :25565; the cluster has no LAN port path by design), happy (host-net multi-service).
  • Covers images (k3s ctr images import), secrets (sops -> hostPath mount or kubectl-create-secret oneshot), storage (hostPath on /storage, already in restic), and a recommended order.

Action items for you (runtime state — not touched here)

  • Delete the stale default/echo-server Deployment+Service (169 days old, a test service). Left alone per the read-only-on-prod rule; flagging for you to kubectl delete.
  • Deploy is a config-only switch for postgres + k3s flags. Note the k3s ExecStart changes (new apiserver arg) — a k3s restart re-reads it; confirm PSA is active with a test privileged-pod rejection in a non-exempt namespace after deploy.
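
One way to do the post-deploy check, as a sketch: a deliberately privileged pod aimed at a non-exempt namespace. The pod name, image tag, and use of `default` are all assumptions; any namespace other than kube-system without its own PSA labels should behave the same.

```yaml
# Sketch only: should be REJECTED once baseline enforce is active.
apiVersion: v1
kind: Pod
metadata:
  name: psa-smoke-test       # hypothetical name
  namespace: default         # non-exempt (only kube-system is exempted)
spec:
  containers:
    - name: shell
      image: busybox:1.36    # assumption: any small image works
      command: ["sleep", "3600"]
      securityContext:
        privileged: true     # violates the baseline profile
```

Submitting it with `kubectl apply --dry-run=server -f <file>` should exercise admission without creating anything; a rejection that mentions the `baseline` PodSecurity profile would indicate PSA is active.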

🤖 Generated with Claude Code

feat(k3s): PodSecurity baseline + netpol template + postgres loopback bind
Some checks failed
/ check-format (push) Has been cancelled
/ build (push) Has been cancelled
8539e5e531
Security-audit follow-up hardening the beefcake k3s cluster and closing a
latent postgres exposure.

- PodSecurity admission: new k3s.podSecurity module option feeds kube-apiserver
  an AdmissionConfiguration (via
  --kube-apiserver-arg=admission-control-config-file). Enabled on beefcake with
  a cluster-wide 'baseline' enforce default (restricted warn/audit),
  kube-system exempt. Gates newly-admitted pods only; running pods (incl. stale
  echo-server) are untouched until restart.

- NetworkPolicy: corrected the audit premise -- k3s DOES enforce NetworkPolicy
  out of the box via its embedded kube-router netpol controller (not disabled
  here), so default-deny is real, not a flannel no-op. Added a reusable
  per-namespace default-deny + allow-from-traefik + allow-dns template
  (lib/doc/k8s-networkpolicy-template.yaml). Not auto-applied cluster-wide
  (netpol is namespaced; a blanket deny would sever kube-system) -- each
  migrated app carries its own copy.

- postgres: was binding 0.0.0.0:5432 (enableTCPIP=true -> listen_addresses='*').
  Nothing authorized reaches it off-loopback (pg_hba trusts only 127.0.0.1/::1,
  firewall doesn't open 5432), and the sole TCP client (happy) is --network=host
  on localhost. Bound to loopback ('localhost', both v4/v6) instead; documented
  the flannel-bridge path for future in-cluster pods.

- Added lib/doc/podman-to-k8s-migration.md: triage of the ~7 podman workloads
  (early candidates vs stay-podman for host-network/hardware/raw-TCP), the
  image/secret/storage/netpol mechanics, and a recommended order.

Build-checked (nixosConfigurations.beefcake.config.system.build.toplevel).
Not deployed.
feat(beefcake): delete retired happy server; refine k8s migration plan
All checks were successful
/ check-format (push) Successful in 1m2s
/ build (push) Successful in 16m27s
58f5a2f88d
- Delete packages/hosts/beefcake/happy.nix (self-hosted happy-coder server).
  Already retired since 2026-06 (import was commented out; podman-happy.service
  failed on activation) and unused, so this is dead-code removal — build is
  unchanged. Config only: the orphaned sops secrets (happy.env, garage.toml) and
  on-disk state (/storage/happy, /var/lib/happy, /var/lib/garage, postgres happy
  db) need a manual cleanup pass, noted in the PR + the import comment.

- Rework lib/doc/podman-to-k8s-migration.md per Daniel's triage: bulwark is the
  first HTTP move; hearth next; matrix bridges migrate *with* tuwunel as a unit
  (bidirectional appservice traffic must not straddle the host/cluster boundary);
  music-assistant / mmrelay / game servers are feasible on a single node but land
  in a privileged, netpol-exempt namespace (hostNetwork / hostPort); openobserve
  stays off-cluster (observability sink = circular dep); actual dropped; happy
  deleted.

- Add a 'NodePort exception' section: nodeport-addresses is node-global, so to
  LAN-expose one workload use a pod hostPort (bypasses kube-proxy, doesn't touch
  the :443 edge, needs a privileged namespace) or a caddy-l4 stream proxy (keeps
  caddy the sole edge). Never widen nodeport-addresses or re-enable ServiceLB.
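
A sketch of the hostPort option described above, assuming a dedicated namespace that opts out of the cluster-wide baseline default (baseline disallows host ports). The namespace name, pod name, and image are hypothetical; the port is the minecraft one mentioned in the PR description.

```yaml
# Sketch only: namespace opts into the 'privileged' PSA level via labels.
apiVersion: v1
kind: Namespace
metadata:
  name: lan-exposed          # hypothetical
  labels:
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/warn: privileged
    pod-security.kubernetes.io/audit: privileged
---
apiVersion: v1
kind: Pod
metadata:
  name: minecraft            # hypothetical
  namespace: lan-exposed
spec:
  containers:
    - name: server
      image: example.invalid/minecraft-server:latest   # placeholder image
      ports:
        - containerPort: 25565
          hostPort: 25565    # bound on the node directly, not via kube-proxy
          protocol: TCP
```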
lytedev changed title from k3s hardening: PodSecurity baseline + NetworkPolicy template + postgres loopback bind to wip: k3s hardening: PodSecurity baseline + NetworkPolicy template + postgres loopback bind 2026-07-01 11:25:56 -05:00
feat(beefcake/k3s): firewall-gated NodePorts (drop loopback pin); prune orphaned happy secrets
All checks were successful
/ check-format (push) Successful in 7s
/ build (push) Successful in 6m32s
af4e6ed8f4
Policy change per Daniel: allow plain NodePorts for simple LAN TCP services
instead of forcing everything through caddy/ingress.

- Remove the --kube-proxy-arg=nodeport-addresses=127.0.0.0/8 loopback pin.
  NodePorts now bind all interfaces (kube-proxy default) but the host firewall
  keeps 30000-32767 closed on the LAN, so a NodePort is only LAN-reachable once
  its port is explicitly opened (declaratively via networking.firewall,
  nodePort pinned so the rule stays stable). No reverse proxy needed.
  - The :443 edge is unaffected: NodePorts live in 30000-32767 and cannot bind
    :443, and --disable=servicelb (kept) is the actual guard against a
    LoadBalancer seizing host :80/:443. Dropping the pin only removes one layer
    of a two-layer defense; the closed firewall still gates LAN NodePort access.
  - caddy still reaches traefik's NodePort 30081 on loopback; updated the
    now-inaccurate 'loopback NodePort' comments accordingly.

- secrets/beefcake/secrets.yml: sops-unset the orphaned happy.env and
  garage.toml entries left behind by the happy deletion (re-encrypted + MAC
  recomputed for all recipients; other 49 secrets untouched).

- migration doc: NodePort section reworked to lead with 'NodePort + open the
  firewall port' as the default for simple TCP; hostPort and caddy-l4 demoted to
  specific cases. Anti-goal narrowed to 'never re-enable ServiceLB / create a
  LoadBalancer' (was over-broad 'never widen nodeport-addresses').
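
A sketch of the "NodePort + open the firewall port" default, assuming a hypothetical `games` namespace, an `app: minecraft` pod label, and an arbitrary pinned port. Only the Service side is shown; the matching networking.firewall rule lives in the host config.

```yaml
# Sketch only: names, labels, and the nodePort value are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: minecraft
  namespace: games
spec:
  type: NodePort
  selector:
    app: minecraft
  ports:
    - name: game
      protocol: TCP
      port: 25565            # in-cluster Service port
      targetPort: 25565      # container port
      nodePort: 30565        # pinned so the firewall rule stays stable
```

Until the host firewall opens the pinned port, the Service is reachable from the node itself but not from the LAN.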
This pull request has changes conflicting with the target branch.
  • secrets/beefcake/secrets.yml
Reference
lytedev/nix!682