NAS Encryption Migration: A War Story
Migrating critical infrastructure under pressure, and the subtle ways it tried to betray me
Encrypt a 10TB NAS array. Migrate from Unraid to TrueNAS. Don’t lose any data. Do it before a 4-month travel sabbatical.
The Mission
Two matched 22TB drives in a mirror. ~10TB of data (photos, documents, projects, backups). Goal: native ZFS encryption with checksums (bit rot protection), proper snapshots, and the peace of mind that comes with encrypted-at-rest storage.
The pressure: Three weeks until I leave. A thousand things to get done before then - packing up an apartment, wrapping up work, coordinating with people across multiple countries. The NAS isn’t just storage; it’s the backbone of everything. My notes, my photos, my project files, my backups. It needs to be perfect the entire time, because I’m going to be hammering it constantly while preparing to disappear for four months.
This wasn’t “migrate whenever convenient.” This was “execute a flawless OS migration on critical infrastructure while actively using that infrastructure at maximum intensity, with zero margin for extended downtime or data loss.”
The migration had to be swift, it had to be perfect, and it had to be done in a way that let me keep working through it.
Critical Path Analysis
| Operation | Duration | Notes |
|---|---|---|
| Deduplicate files (czkawka) | 3-4 hrs | 726k duplicate files found, ~582 GB freed |
| Review & delete duplicates | 2-4 hrs | Manual verification |
| Backup to external drive | 2-3 hrs | restic to USB 3.0 HDD |
| TrueNAS install + encrypted pool | 1-2 hrs | Fresh OS install |
| Restore from backup | 19h 55m | 10.183 TiB, ran overnight |
| Dataset restructure | ~7 hrs | ZFS dataset per share |
| Container recreation | 4-8 hrs | Human-intensive |
Total: ~30 hours elapsed (restore runs unattended overnight)
The Bottleneck Question
Before committing, I tested restore throughput. The NAS has a Celeron N95 - would it be the bottleneck?
Result: N95 is NOT the bottleneck. Restore achieved ~100 MB/s sustained. The limit was I/O (USB 3.0 to backup drive + array write speed), not CPU. This meant I could run the restore directly on the NAS hardware rather than needing to involve another machine.
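The benchmark itself is simple: time a restic restore of a representative subset and divide bytes by seconds. A minimal sketch, with hypothetical repo and include paths:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pure helper: bytes and seconds -> integer MB/s (decimal megabytes).
throughput_mbps() {
  local bytes=$1 seconds=$2
  echo $(( bytes / seconds / 1000000 ))
}

# Restore a representative subdirectory and report sustained throughput.
# Repo location and --include path are placeholders; adjust to your setup.
bench_restore() {
  local repo=$1 target=$2
  local start end secs bytes
  start=$(date +%s)
  restic -r "$repo" restore latest --target "$target" \
    --include /mnt/user/Files/photos
  end=$(date +%s)
  secs=$(( end - start ))
  bytes=$(du -sb "$target" | awk '{print $1}')
  echo "Throughput: $(throughput_mbps "$bytes" "$secs") MB/s"
}
```

In this migration the result was ~100 MB/s sustained, i.e. I/O-bound rather than CPU-bound.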
The Failover System
During the ~24-hour migration window, I still needed my vault (Obsidian notes) accessible. The solution:
- Syncthing already syncs the vault to phone/laptop/PC
- Failover script on main PC:
- SSHs to NAS and disables SMB/NFS sharing
- Creates local symlink so apps see files in the same path
- When done: re-enables NAS shares and removes symlink
```shell
nas-failover local    # Switch to local copy, disable NAS sharing
nas-failover nas      # Switch back to NAS
nas-failover status   # Check current state
```
This made Obsidian and other tools work seamlessly during migration - they didn’t know the files had moved.
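A sketch of what such a script can look like. Hostname, paths, and service names are placeholders (on TrueNAS you would stop sharing via the UI or middleware rather than `systemctl`); `DRY_RUN=1` prints the plan instead of executing it:

```shell
#!/usr/bin/env bash
set -euo pipefail

NAS_HOST="nas"
VAULT_MOUNT="/mnt/nas/vault"    # where apps expect to find the vault
LOCAL_COPY="$HOME/sync/vault"   # Syncthing-maintained local copy

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

failover_local() {
  # Order matters: kill server-side shares FIRST so no client keeps
  # writing over a lingering SMB/NFS connection, THEN repoint the path.
  run ssh "$NAS_HOST" systemctl stop smbd nfs-server
  run ln -sfn "$LOCAL_COPY" "$VAULT_MOUNT"
}

failover_nas() {
  run ssh "$NAS_HOST" systemctl start smbd nfs-server
  run rm -f "$VAULT_MOUNT"   # drop the symlink; the share gets remounted
}

case "${1:-status}" in
  local)  failover_local ;;
  nas)    failover_nas ;;
  status) [ -L "$VAULT_MOUNT" ] && echo "local" || echo "nas" ;;
esac
```

The symlink trick is what keeps apps oblivious: the path they open never changes, only what it points to.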
What Actually Happened (The Mistakes)
Mistake 1: Wrong Order of Operations
What I did: Stopped Syncthing container FIRST, then activated failover.
The problem: There was ~1hr 15min where I was still editing files via SMB (existing connections persist even when you think you’ve switched). Those edits went to the NAS, not my local copy.
The fix: Had to SSH in and manually recover the files after discovering the gap.
The lesson: Disable SMB/NFS server-side FIRST, breaking all client connections. THEN stop sync services. The failover script now enforces this order.
Mistake 2: Persistent Network Mounts Are Sneaky
Any application with an open handle to an SMB/NFS mount will continue writing to the remote filesystem even after you “think” you’ve switched to local:
- Obsidian (vault open = persistent connection)
- VS Code / editors (workspace open)
- Terminal sessions (cwd on mount)
- File managers (directory open)
- Any app with “recent files” pointing to mount
The fix: The failover script now SSHs to the NAS and stops the SMB/NFS services entirely. This forcibly breaks all client connections. No sneaky persistence.
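Before trusting the switch, it is also worth verifying that nothing local still holds a handle on the mount. A small sketch (mount point hypothetical; `lsof` required for the count):

```shell
MOUNT=/mnt/nas   # hypothetical client-side mount point

# Count local processes with files open on the mounted filesystem.
# `lsof <mountpoint>` lists every open file on that filesystem; -t prints PIDs.
count_open_handles() { lsof -t "$1" 2>/dev/null | sort -u | wc -l; }

# Pure helper: only safe to switch when nothing is left holding the mount.
safe_to_switch() { [ "$1" -eq 0 ]; }

# Example usage:
# if safe_to_switch "$(count_open_handles "$MOUNT")"; then
#   echo "no stragglers - OK to repoint apps to the local copy"
# fi
```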
Mistake 3: Testing Shares Too Early
While in failover mode (using local copy), I configured SMB shares on the new TrueNAS install and tested mounts. This re-enabled NAS access, and autofs happily mounted the (stale) NAS data.
The lesson: Configure shares but don’t test mounts until you’re ready to fully switch back.
Mistake 4: Syncthing Doesn’t Use SMB
This one bit me twice.
Syncthing syncs over its own protocol (port 22000), completely independent of SMB/NFS. This means:
- Failover mode doesn’t block Syncthing. With SMB disabled and local symlink in place, Syncthing can still sync between NAS and local PC.
- This is actually useful: you can start Syncthing BEFORE unwinding failover.
What I did wrong: Unwound failover first (switching apps to stale NAS mount), then started Syncthing. This created a window where edits could go to stale NAS files and get overwritten when Syncthing caught up.
The correct sequence:
- Keep failover active (apps write to local via symlink)
- Start Syncthing container on NAS
- Wait for Syncthing to fully sync local → NAS
- THEN unwind failover (switch apps to NAS mount)
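Step 3 can be automated against Syncthing's REST API, which exposes per-device completion at `/rest/db/completion`. A sketch with placeholder URL, API key, folder ID, and device ID (`jq` assumed available):

```shell
SYNCTHING_URL="http://nas:8384"
API_KEY="changeme"          # placeholder; real key is in the Syncthing GUI
FOLDER="vault"              # placeholder folder ID
DEVICE="NAS-DEVICE-ID"      # placeholder device ID

# Pure helper: is a completion percentage "done"?
fully_synced() { awk -v c="$1" 'BEGIN { if (c >= 100) exit 0; exit 1 }'; }

wait_for_sync() {
  while :; do
    local pct
    pct=$(curl -s -H "X-API-Key: $API_KEY" \
      "$SYNCTHING_URL/rest/db/completion?folder=$FOLDER&device=$DEVICE" \
      | jq -r .completion)
    fully_synced "$pct" && break
    echo "still syncing: ${pct}%"
    sleep 30
  done
}

# wait_for_sync   # only after this returns is it safe to unwind failover
```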
I had to emergency re-activate failover when I realized the mistake, then wait for sync to complete.
Mistake 5: “Idle” Doesn’t Mean “Synced”
Discovered this one while writing this very post.
Local Syncthing status: `needFiles: 0, state: "idle"`. Great, fully synced!
NAS Syncthing status: `needFiles: 37421, state: "syncing"`. Very much not synced.
The lesson: “Idle” means “I have nothing to do right now” - it doesn’t mean all peers are caught up. The local device might be idle because it’s already sent everything, but the remote is still receiving. Always check the destination device’s status, not just the source.
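In practice that means querying `/rest/db/status` on BOTH instances and requiring `needFiles == 0` everywhere. A sketch with placeholder URLs and keys:

```shell
# Print needFiles for one Syncthing instance's folder.
# usage: check_side <url> <api-key> <folder-id>
check_side() {
  curl -s -H "X-API-Key: $2" "$1/rest/db/status?folder=$3" | jq -r .needFiles
}

# Pure helper: true only when BOTH counters are zero.
both_synced() { [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; }

# Example usage (keys and folder ID are placeholders):
# local_need=$(check_side http://localhost:8384 "$LOCAL_KEY" vault)
# nas_need=$(check_side http://nas:8384 "$NAS_KEY" vault)
# both_synced "$local_need" "$nas_need" && echo "safe to switch"
```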
Not A Mistake: The UPS
At 10:33 PM on day two of the migration, the electricity went out for a second.
The NAS kept humming. The UPS earned its keep.
If you’re doing anything like this without a UPS, you’re gambling. Don’t.
TrueNAS Learnings
Things I discovered along the way:
- No LUKS support: The TrueNAS SCALE kernel lacks `dm_mod`/`dm_crypt`, so LUKS-encrypted external drives can't be mounted directly on TrueNAS.
- Pool vs dataset encryption: Pool encryption is key-only. The passphrase option is only available at the dataset level.
- SSH quirks: Password auth needs BOTH the service setting AND a per-user checkbox. Easy to miss.
- Middleware vs ZFS: Pools created via `zpool create` on the CLI won't show in the web UI. Use the UI or the middleware API.
- NVMe SLOG isn't like Unraid cache: A SLOG only buffers sync writes for ~5 seconds. Not the same use case at all.
- Immutable rootfs: Can't `apt install` anything. Services like Tailscale need to run as Docker containers.
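Since the passphrase option exists only per-dataset, the raw OpenZFS form looks like this (on TrueNAS you would normally create the dataset through the UI so the middleware knows about it; the dataset name matches this setup):

```shell
# Passphrase encryption is set per-dataset at creation time; it can't be
# turned on later for an existing unencrypted dataset.
zfs create \
  -o encryption=aes-256-gcm \
  -o keyformat=passphrase \
  -o keylocation=prompt \
  tank/Files

# After a reboot the dataset stays locked until the key is loaded:
# zfs load-key tank/Files && zfs mount tank/Files
```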
The Restore
Restore landed data at /mnt/tank/data/mnt/user/* - the original Unraid paths came along for the ride.
Errors: 8306 symlink failures. All in cruft directories (`node_modules` and `.cache` left over from an old external drive backup). Vault intact.
Restructuring: Created proper ZFS datasets (tank/Files, tank/Backups, etc.) and rsynced data from the nested paths. Moving between ZFS datasets = full copy (not rename), so this added ~7 hours. Started with the vault (1.1TB) so I could get back to work fastest.
Pro tip: Run dataset migrations sequentially, not in parallel. Parallel rsync on the same HDD pool causes head thrashing and tanks total throughput.
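The sequential migration is just a loop, one rsync at a time. A sketch with illustrative share names (`DRY_RUN=1` prints the plan instead of copying):

```shell
set -euo pipefail

SRC=/mnt/tank/data/mnt/user   # where the restore landed (nested Unraid paths)
DST=/mnt/tank                 # per-share datasets created beforehand

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

migrate_shares() {
  # Strictly sequential: parallel rsyncs on one HDD mirror thrash the heads.
  for share in "$@"; do
    run rsync -aHAX --info=progress2 "$SRC/$share/" "$DST/$share/"
  done
}

# migrate_shares Files Backups Media   # share names are illustrative
```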
Timeline
| Date | Event |
|---|---|
| Jan 18 | Project started. Decision to encrypt. Started pre-migration backup. |
| Jan 24 | Critical path analysis. Identified restore as potential bottleneck. Started dedupe overnight. |
| Jan 25 AM | Dedupe complete (582 GB freed). Benchmarked restore. Created rollback image of old boot USB. |
| Jan 25 PM | Activated failover. Ran final backup. Installed TrueNAS. Started restore (~16hr ETA). |
| Jan 26 PM | Restore complete (19h 55m). Started restructuring. |
| Jan 26 evening | Datasets migrated. Syncthing running. Tailscale configured. Waiting for sync. |
The Stack
Before: Unraid with Docker containers (Syncthing, Backrest, Filestash), CIFS/NFS shares, parity drive.
After: TrueNAS Community Edition with:
- Encrypted ZFS mirror (2x22TB)
- Native ZFS snapshots
- Docker containers (Syncthing, Tailscale)
- Direct restic + cron (simpler than the Backrest GUI wrapper)
- Same CIFS/NFS shares
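The Backrest replacement amounts to one script plus one crontab line. A sketch with placeholder repo, password file, and retention policy:

```shell
#!/usr/bin/env bash
# nightly-backup.sh - plain restic driven by cron instead of a GUI wrapper.
# Repository location, password file, and source paths are placeholders.
set -euo pipefail
export RESTIC_REPOSITORY=/mnt/external/restic-repo
export RESTIC_PASSWORD_FILE=/root/.restic-pass

restic backup /mnt/tank/Files /mnt/tank/Backups --tag nightly

# Bounded history: 7 dailies, 4 weeklies, 6 monthlies; prune unreferenced data.
restic forget --tag nightly \
  --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
```

Scheduled with a single crontab entry, e.g. `0 3 * * * /root/nightly-backup.sh`.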
Was It Worth It?
Absolutely. Native ZFS encryption, checksums for bit rot protection, proper snapshots, and I learned a ton about failover orchestration and the subtleties of persistent mounts.
The mistakes were educational. The failover script is now bulletproof. And critically: it’s done. Not “I’ll do it someday when I have time” - done, tested, solid, with a week to spare before travel. The NAS will hum along perfectly while I’m on the other side of the world, and I won’t be lying awake wondering if I should have encrypted it before leaving my apartment empty for four months.
Could I have spread this over three weeks, working on it bit by bit? Absolutely not. The NAS can’t be half-broken while you leisurely tinker. It’s either working or it’s not, and I needed it working - constantly, reliably - while simultaneously preparing to leave the country. Rip the bandaid off. Execute in one intense push. Get back to a known-good state as fast as possible.
Total hands-on time: probably 12-15 hours over 3 days. Total elapsed time: ~48 hours (restore ran overnight). Coffee consumed: significant.