feat(beefcake): alert disk health (smartd + ZED) to Matrix #615
What
Wires smartd (SMART pre-failure / failed self-tests) and the ZFS Event Daemon (vdev FAULTED/DEGRADED, spare activation, scrub/resilver errors) to post to a matrix-hookshot generic webhook. Both daemons already run on beefcake but notify nobody by default, which is why sde's SMART impending failure and the Jun 1 distributed-spare activation went unnoticed.

One shared best-effort disk-alert-notify script serves both:
- services.smartd.notifications.mail.mailer
- services.zfs.zed.settings.ZED_EMAIL_PROG

Why host-direct (not an OpenObserve alert)
OpenObserve (OO) keeps its alert rules in its UI/DB, not in this repo, so they are neither reviewable nor reproducible here. OO also stores its data on the very pool that may be failing, which is the wrong dependency for disk-failure alerts. So this pushes straight to the same hookshot webhook channel that jmap-matrix-notify already uses.

Bootstrap required before merge/deploy
- !hookshot webhook disk-alerts, copy the URL.
- nix develop -c sops secrets/beefcake/secrets.yml → add disk-alert-webhook-url.
- smartctl -t short /dev/sde (smartd reports result)
- zpool scrub zstorage (ZED reports completion, verbose on)

Not yet deployed or verified, pending the webhook secret.
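
For reviewers who want a concrete picture of what "best-effort" means for the shared disk-alert-notify script, here is a minimal sketch. It is not the script in this PR: it is written in Python purely for illustration, and the function names, the `[source]` prefix, and the 4000-character cap are all hypothetical. It assumes that both smartd's mailer hook and `ZED_EMAIL_PROG` hand the alert body to the program on stdin, and that the hookshot generic webhook renders the `text` key of a JSON POST body as the Matrix message.

```python
import json
import sys
import urllib.request

# Hypothetical cap so one long smartd/ZED report stays a single readable message.
MAX_CHARS = 4000


def build_payload(body: str, source: str = "disk-alert") -> bytes:
    """Wrap the alert text in a JSON object with a "text" key (assumed webhook shape)."""
    text = body.strip() or "(empty alert body)"
    if len(text) > MAX_CHARS:
        text = text[:MAX_CHARS] + "\n[truncated]"
    return json.dumps({"text": f"[{source}] {text}"}).encode("utf-8")


def notify(body: str, url: str, source: str = "disk-alert", timeout: float = 10.0) -> bool:
    """POST the alert to the webhook URL.

    Best-effort: any failure is logged to stderr and reported as False,
    so the calling daemon never sees an exception from the notifier.
    """
    try:
        request = urllib.request.Request(
            url,
            data=build_payload(body, source),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except Exception as error:  # deliberately broad: delivery must never raise
        print(f"disk-alert-notify: delivery failed: {error}", file=sys.stderr)
        return False
```

A wrapper along these lines would read the URL from wherever the disk-alert-webhook-url secret is decrypted at runtime (that path is not stated here), call `notify(sys.stdin.read(), url)`, and exit 0 regardless of the result, so a failed delivery cannot break smartd or ZED.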
🤖 Generated with Claude Code