fix(migrate-daniel-to-kanidm): robust deps + smarter session guard #501

Closed
lytedev wants to merge 0 commits from fix-migrate-daniel-unit into main

## Summary

The migration oneshot introduced in #498 silently bailed on every boot on foxtrot (and anywhere else that got rebooted cleanly; it only worked on thinker because we ran it by hand). There were two bugs, one trivial and one structural:

1. **Missing PATH.** Systemd services start with a minimal PATH. The script used `${pkgs.x}/bin/y` absolute paths for some commands and bare names for others (`getent`, `grep`, `dirname`). A bare `getent` fails with `command not found` (exit status 127). Because that failure lands in a branch condition, `set -e` never aborts, and the failure can't be told apart from a failed lookup. The "daniel not resolvable via NSS" branch fires and the unit returns success with nothing done. The same goes for `grep` in the session guard.
2. **Session guard matched by name, not uid.** `loginctl list-sessions | awk '{print $3}' | grep -qx daniel` matches *any* session whose user is `daniel`. That includes both the pre-migration `user@1000.service` lingering from the old generation AND the new kanidm session once a user logs in. So after a normal reboot the guard always matches something, and the unit effectively never migrates.
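
To make bug 1 concrete, here is a minimal POSIX sh sketch. It is a hypothetical reconstruction, not the unit's actual script, and `naive_check` and `check_user` are illustrative names. A command that isn't on PATH exits with status 127. When that command is the condition of an `if`, `set -e` does not abort, so the "not resolvable" branch runs. The second function shows one way to tell a missing `getent` apart from a genuinely unknown user (`getent` returns 2 when the key isn't found).

```shell
#!/bin/sh
set -eu

# Hypothetical reconstruction of the pre-fix check, not the unit's script:
# every non-zero getent status, including 127 ("command not found"),
# lands in the "not resolvable" branch. set -e does not fire because the
# failing command is the condition of an `if`.
naive_check() {
  if ! getent passwd "$1" >/dev/null 2>&1; then
    echo "not resolvable"
  else
    echo "resolved"
  fi
}

# Hypothetical stricter variant: tell "no such user" apart from "no getent".
check_user() {
  rc=0
  getent passwd "$1" >/dev/null 2>&1 || rc=$?
  case "$rc" in
    0)   echo "resolved" ;;
    2)   echo "not found" ;;       # getent: key not found in the database
    127) echo "getent missing" ;;  # shell: command not found on PATH
    *)   echo "other error ($rc)" ;;
  esac
}

# Simulate the pre-fix unit: a PATH that contains no getent.
( PATH=/nonexistent; naive_check daniel )  # prints: not resolvable
( PATH=/nonexistent; check_user daniel )   # prints: getent missing
```

With `path` set correctly the 127 case shouldn't occur. Still, classifying the exit status makes the failure loud instead of silent if the dependency list ever regresses.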

## Changes

- Replace the per-command `let` block with `path = [ coreutils findutils gawk glibc.bin gnugrep rsync shadow systemd ]` so every binary resolves by name (and `getent`/`grep`/`dirname` actually exist).
- Rewrite the session guard: look up the post-migration uid with `getent passwd daniel`, then skip only if there's an active session at a *different* uid (i.e. the lingering uid=1000 session). A kanidm session at the new uid no longer blocks migration.
- Downgrade the "not resolvable" log to "will retry next boot". This makes it clearer that the message is expected before `kanidm-unixd` has started.
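
Here is a rough sketch of the guard change. It assumes `loginctl list-sessions --no-legend` output where field 2 is UID and field 3 is USER. `old_guard` is the pipeline quoted above. `has_stale_session` is a hypothetical rendering of the new rule (skip only for a `daniel` session at a uid other than the post-migration one), not the code in this PR.

```shell
#!/bin/sh
set -eu

# Sketch only: both guards read `loginctl list-sessions --no-legend`-style
# lines on stdin, where field 2 is UID and field 3 is USER.

# Old guard (the pipeline quoted in this PR): succeeds if ANY session's
# USER column is exactly "daniel", whatever its uid.
old_guard() {
  awk '{print $3}' | grep -qx daniel
}

# New guard, as described above (hypothetical name): succeeds only if $1 has
# a session at a uid other than the post-migration uid passed as $2.
has_stale_session() {
  awk -v u="$1" -v uid="$2" \
    '$3 == u && $2 != uid { found = 1 } END { exit !found }'
}

# Illustrative sample: only the new kanidm session (uid 2001) is present.
sessions='    5 2001 daniel seat0 tty2'

if printf '%s\n' "$sessions" | old_guard; then
  echo "old guard: skip"       # taken as soon as daniel has any session
fi
if ! printf '%s\n' "$sessions" | has_stale_session daniel 2001; then
  echo "new guard: migrate"    # no daniel session at a stale uid
fi
```

In the unit, the uid argument would come from something like `getent passwd daniel | cut -d: -f3`, and stdin from `loginctl list-sessions --no-legend`. With a sample containing only the kanidm session at uid 2001, the old guard still says skip while the new one lets the migration proceed.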

## Test plan

- [ ] Deploy to a freshly rebooted host that still has a nested `/home/daniel/.home` (foxtrot).
- [ ] First boot: the migration flattens `.home`, chowns `/home/daniel`, and writes `/var/lib/lyte/migrate-daniel-to-kanidm.done`.
- [ ] Second boot: the marker is present, so the unit exits 0 immediately.
- [ ] If a uid=1000 session somehow lingers (linger enabled, deploy via SSH, etc.), the unit logs a skip message with the remediation.
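
For running through the checklist by hand, here is a hypothetical helper (not part of this PR) that checks the first-boot outcomes. It confirms the marker is present, the nested `.home` is gone, and `/home/daniel` is owned by the expected uid. It assumes GNU `stat -c %u`. The paths in the comments are the ones from the test plan.

```shell
#!/bin/sh
set -eu

# Hypothetical post-boot check for the test plan (not part of the PR).
#   $1: home directory        (e.g. /home/daniel)
#   $2: marker file           (e.g. /var/lib/lyte/migrate-daniel-to-kanidm.done)
#   $3: expected owner uid    (e.g. `getent passwd daniel | cut -d: -f3`)
verify_migration() {
  if [ ! -e "$2" ]; then
    echo "FAIL: marker $2 missing"; return 1
  fi
  if [ -e "$1/.home" ]; then
    echo "FAIL: nested $1/.home still present"; return 1
  fi
  owner=$(stat -c %u "$1")  # GNU coreutils stat: numeric owner uid
  if [ "$owner" != "$3" ]; then
    echo "FAIL: $1 owned by uid $owner, expected $3"; return 1
  fi
  echo "OK: $1 looks migrated"
}

# Self-contained demo against a throwaway tree owned by the current user.
tmp=$(mktemp -d)
mkdir -p "$tmp/home/daniel" "$tmp/state"
touch "$tmp/state/migrate-daniel-to-kanidm.done"
verify_migration "$tmp/home/daniel" "$tmp/state/migrate-daniel-to-kanidm.done" "$(id -u)"
rm -rf "$tmp"
```

On foxtrot it would be invoked as `verify_migration /home/daniel /var/lib/lyte/migrate-daniel-to-kanidm.done "$(getent passwd daniel | cut -d: -f3)"`.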
fix(migrate-daniel-to-kanidm): robust deps + smarter session guard
All checks were successful
/ check-format (push) Successful in 7s
/ build (push) Successful in 5m43s
253263e774
Previous incarnation of the migration oneshot silently bailed on every
boot because:
- `getent`, `grep`, and `dirname` weren't in the service's PATH
  (systemd services start with a minimal PATH and the script only
  absolute-path'd about half its commands).
- The active-session guard matched on the *name* `daniel`, which
  fires even for the newly resolved kanidm session (uid 2001), and
  especially fires if the pre-migration user@1000.service is lingering.

Switch to `path = [...]` with every needed package so bare
`getent`/`grep` work, and switch the guard to checking session
UIDs: we skip only if there is an active session at the pre-migration
uid (1000) where chowning live files would stomp on that user.
lytedev force-pushed fix-migrate-daniel-unit from 253263e774
to ca5af6b042
All checks were successful
/ check-format (push) Successful in 7s
/ build (push) Successful in 5m37s
2026-04-20 11:14:27 -05:00
lytedev scheduled this pull request to auto merge when all checks succeed 2026-04-20 11:17:12 -05:00
lytedev canceled auto merging this pull request when all checks succeed 2026-04-20 11:17:17 -05:00
lytedev force-pushed fix-migrate-daniel-unit from ca5af6b042
to 5fb7f812dd
All checks were successful
/ check-format (push) Successful in 8s
/ build (push) Successful in 7m10s
2026-04-20 13:45:01 -05:00
lytedev closed this pull request 2026-04-21 11:07:15 -05:00

Reference
lytedev/nix!501