feat(beefcake): alert disk health (smartd + ZED) to Matrix #615

Merged
lytedev merged 2 commits from beefcake-disk-alerts into main 2026-06-26 16:14:25 -05:00
Owner

DRAFT — do not merge/deploy until the webhook secret is seeded (see Bootstrap).

What

Wires smartd (SMART pre-failure / failed self-tests) and the ZFS Event Daemon (vdev FAULTED/DEGRADED, spare activation, scrub/resilver errors) to post to a matrix-hookshot generic webhook. Both daemons already run on beefcake but notify nobody by default, which is why sde's SMART impending failure and the Jun 1 distributed-spare activation went unnoticed.

One shared best-effort disk-alert-notify script serves both:

  • smartd → services.smartd.notifications.mail.mailer
  • ZED → services.zfs.zed.settings.ZED_EMAIL_PROG
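
For illustration, a shared best-effort notifier along these lines could look like the sketch below. It is not the script from this PR. The function names, the `DISK_ALERT_WEBHOOK_URL_FILE` variable, and the default secret path are hypothetical, and it assumes the caller passes an optional `-s SUBJECT` and the message on stdin, which may not match how each daemon actually invokes its mailer hook. It relies on hookshot's generic webhook using the `text` key of a JSON body as the Matrix message.

```shell
#!/bin/sh
# Hypothetical sketch of a best-effort disk alert notifier (not the PR's script).

# Escape a string for use inside a JSON string literal.
# Tabs and carriage returns become spaces; backslashes and double quotes are
# escaped; line breaks become \n. Other control characters are not handled.
json_escape() {
  printf '%s' "$1" \
    | tr '\t\r' '  ' \
    | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' \
    | awk 'NR > 1 { printf "%s", "\\n" } { printf "%s", $0 }'
}

# Build the JSON payload: $1 = subject, $2 = body.
build_payload() {
  printf '{"text":"%s"}' "$(json_escape "$(printf '%s\n\n%s' "$1" "$2")")"
}

# Read the message from stdin and post it; never fail the calling daemon.
notify() {
  subject='disk alert'
  while [ "$#" -gt 0 ]; do
    case $1 in
      -s) [ "$#" -ge 2 ] && subject=$2 && shift; shift ;;
      *) shift ;;  # ignore recipient addresses and unknown flags
    esac
  done
  body=$(cat)
  # Assumed location of the decrypted secret; adjust to the real sops path.
  url_file=${DISK_ALERT_WEBHOOK_URL_FILE:-/run/secrets/disk-alert-webhook-url}
  [ -r "$url_file" ] || return 0
  curl --silent --show-error --max-time 10 \
    --header 'Content-Type: application/json' \
    --data "$(build_payload "$subject" "$body")" \
    "$(cat "$url_file")" || true
}
```

A wrapper script would end with `notify "$@"`. Returning success when the secret is missing or the POST fails keeps a webhook outage from turning into a daemon error, at the cost of silently dropping that alert.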

Why host-direct (not an OpenObserve alert)

OO's alert rules live in its UI/DB, not this repo (not reviewable/reproducible here), and OO stores its data on the very pool that may be failing — the wrong dependency for disk-failure alerts. So this pushes straight to the same hookshot webhook channel jmap-matrix-notify already uses.

Bootstrap required before merge/deploy

  1. In the target Matrix room: invite hookshot, send !hookshot webhook disk-alerts, then copy the webhook URL.
  2. nix develop -c sops secrets/beefcake/secrets.yml → add disk-alert-webhook-url.
  3. Deploy, then prove end-to-end:
    • smartctl -t short /dev/sde (smartd reports result)
    • zpool scrub zstorage (ZED reports completion, verbose on)

Not yet deployed or verified — pending the webhook secret.

🤖 Generated with Claude Code

feat(beefcake): alert disk health (smartd + ZED) to Matrix
Some checks failed
/ check-format (push) Has been cancelled
/ build (push) Has been cancelled
a86027f678
smartd and the ZFS Event Daemon already run but, by default, notify
nobody — which is why sde's SMART 'impending failure' and the Jun 1
spare activation went unnoticed. Wire both to post to a matrix-hookshot
generic webhook (same mechanism as jmap-matrix-notify) via one shared
best-effort notify script:

  - smartd -> services.smartd.notifications.mail.mailer
  - ZED    -> services.zfs.zed.settings.ZED_EMAIL_PROG

Host-direct push rather than an OpenObserve alert rule on purpose: OO's
alert rules aren't in this repo, and OO stores its data on the very pool
that might be failing — the wrong dependency for disk alerts.

Requires a one-time webhook+sops bootstrap (disk-alert-webhook-url),
documented in the module header. Draft until that secret is seeded.
feat(beefcake): seed disk-alert-webhook-url secret (lyte.dev alerts room)
All checks were successful
/ check-format (push) Successful in 20s
/ build (push) Successful in 7m29s
37e4d6d8fc
lytedev deleted branch beefcake-disk-alerts 2026-06-26 16:14:26 -05:00
Reference
lytedev/nix!615