NAS Encryption Migration: A War Story
Migrating critical infrastructure under pressure, and the subtle ways it tried to betray me
Encrypt a 10TB NAS array. Migrate from Unraid to TrueNAS. Don’t lose any data. Do it before a 4-month travel sabbatical.
The Mission
Two matched 22TB drives in a mirror. ~10TB of data (photos, documents, projects, backups). Goal: native ZFS encryption with checksums (bit rot protection), proper snapshots, and the peace of mind that comes with encrypted-at-rest storage.
The pressure: Three weeks until I leave. A thousand things to get done before then - packing up an apartment, wrapping up work, coordinating with people across multiple countries. The NAS isn’t just storage; it’s the backbone of everything. My notes, my photos, my project files, my backups. It needs to be perfect the entire time, because I’m going to be hammering it constantly while preparing to disappear for four months.
This wasn’t “migrate whenever convenient.” This was “execute a flawless OS migration on critical infrastructure while actively using that infrastructure at maximum intensity, with zero margin for extended downtime or data loss.”
The migration had to be swift, it had to be perfect, and it had to be done in a way that let me keep working through it.
Critical Path Analysis
| Operation | Duration | Notes |
|---|---|---|
| Deduplicate files (czkawka) | 3-4 hrs | 726k duplicate files found, ~582 GB freed |
| Review & delete duplicates | 2-4 hrs | Manual verification |
| Backup to external drive | 2-3 hrs | restic to USB 3.0 HDD |
| TrueNAS install + encrypted pool | 1-2 hrs | Fresh OS install |
| Restore from backup | 19h 55m | 10.183 TiB, ran overnight |
| Dataset restructure | ~7 hrs | ZFS dataset per share |
| Container recreation | 4-8 hrs | Human-intensive |
Total: ~30 hours elapsed (restore runs unattended overnight)
The Bottleneck Question
Before committing, I tested restore throughput. The NAS has a Celeron N95 - would it be the bottleneck?
Result: N95 is NOT the bottleneck. Restore achieved ~100 MB/s sustained. The limit was I/O (USB 3.0 to backup drive + array write speed), not CPU. This meant I could run the restore directly on the NAS hardware rather than needing to involve another machine.
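The benchmark itself is simple: time a restic restore of a representative subset and divide bytes by seconds. A minimal sketch, with hypothetical repo and include paths:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Pure helper: bytes and seconds -> integer MB/s (decimal megabytes).
throughput_mbps() {
  local bytes=$1 seconds=$2
  echo $(( bytes / seconds / 1000000 ))
}

# Restore a representative subdirectory and report sustained throughput.
# Repo location and --include path are placeholders; adjust to your setup.
bench_restore() {
  local repo=$1 target=$2
  local start end secs bytes
  start=$(date +%s)
  restic -r "$repo" restore latest --target "$target" \
    --include /mnt/user/Files/photos
  end=$(date +%s)
  secs=$(( end - start ))
  bytes=$(du -sb "$target" | awk '{print $1}')
  echo "Throughput: $(throughput_mbps "$bytes" "$secs") MB/s"
}
```

In this migration the result was ~100 MB/s sustained, i.e. I/O-bound rather than CPU-bound.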
The Failover System
During the ~24-hour migration window, I still needed my vault (Obsidian notes) accessible. The solution:
- Syncthing already syncs the vault to phone/laptop/PC
- Failover script on main PC:
- SSHs to NAS and disables SMB/NFS sharing
- Creates local symlink so apps see files in the same path
- When done: re-enables NAS shares and removes symlink
```shell
nas-failover local    # Switch to local copy, disable NAS sharing
nas-failover nas      # Switch back to NAS
nas-failover status   # Check current state
```
This made Obsidian and other tools work seamlessly during migration - they didn’t know the files had moved.
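A sketch of what such a script can look like. Hostname, paths, and service names are placeholders (on TrueNAS you would stop sharing via the UI or middleware rather than `systemctl`); `DRY_RUN=1` prints the plan instead of executing it:

```shell
#!/usr/bin/env bash
set -euo pipefail

NAS_HOST="nas"
VAULT_MOUNT="/mnt/nas/vault"    # where apps expect to find the vault
LOCAL_COPY="$HOME/sync/vault"   # Syncthing-maintained local copy

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

failover_local() {
  # Order matters: kill server-side shares FIRST so no client keeps
  # writing over a lingering SMB/NFS connection, THEN repoint the path.
  run ssh "$NAS_HOST" systemctl stop smbd nfs-server
  run ln -sfn "$LOCAL_COPY" "$VAULT_MOUNT"
}

failover_nas() {
  run ssh "$NAS_HOST" systemctl start smbd nfs-server
  run rm -f "$VAULT_MOUNT"   # drop the symlink; the share gets remounted
}

case "${1:-status}" in
  local)  failover_local ;;
  nas)    failover_nas ;;
  status) [ -L "$VAULT_MOUNT" ] && echo "local" || echo "nas" ;;
esac
```

The symlink trick is what keeps apps oblivious: the path they open never changes, only what it points to.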
What Actually Happened (The Mistakes)
Mistake 1: Wrong Order of Operations
What I did: Stopped Syncthing container FIRST, then activated failover.
The problem: There was ~1hr 15min where I was still editing files via SMB (existing connections persist even when you think you’ve switched). Those edits went to the NAS, not my local copy.
The fix: Had to SSH in and manually recover the files after discovering the gap.
The lesson: Disable SMB/NFS server-side FIRST, breaking all client connections. THEN stop sync services. The failover script now enforces this order.
Mistake 2: Persistent Network Mounts Are Sneaky
Any application with an open handle to an SMB/NFS mount will continue writing to the remote filesystem even after you “think” you’ve switched to local:
- Obsidian (vault open = persistent connection)
- VS Code / editors (workspace open)
- Terminal sessions (cwd on mount)
- File managers (directory open)
- Any app with “recent files” pointing to mount
The fix: The failover script now SSHs to the NAS and stops the SMB/NFS services entirely. This forcibly breaks all client connections. No sneaky persistence.
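Before trusting the switch, it is also worth verifying that nothing local still holds a handle on the mount. A small sketch (mount point hypothetical; `lsof` required for the count):

```shell
MOUNT=/mnt/nas   # hypothetical client-side mount point

# Count local processes with files open on the mounted filesystem.
# `lsof <mountpoint>` lists every open file on that filesystem; -t prints PIDs.
count_open_handles() { lsof -t "$1" 2>/dev/null | sort -u | wc -l; }

# Pure helper: only safe to switch when nothing is left holding the mount.
safe_to_switch() { [ "$1" -eq 0 ]; }

# Example usage:
# if safe_to_switch "$(count_open_handles "$MOUNT")"; then
#   echo "no stragglers - OK to repoint apps to the local copy"
# fi
```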
Mistake 3: Testing Shares Too Early
While in failover mode (using local copy), I configured SMB shares on the new TrueNAS install and tested mounts. This re-enabled NAS access, and autofs happily mounted the (stale) NAS data.
The lesson: Configure shares but don’t test mounts until you’re ready to fully switch back.
Mistake 4: Syncthing Doesn’t Use SMB
This one bit me twice.
Syncthing syncs over its own protocol (port 22000), completely independent of SMB/NFS. This means:
- Failover mode doesn’t block Syncthing. With SMB disabled and local symlink in place, Syncthing can still sync between NAS and local PC.
- This is actually useful: you can start Syncthing BEFORE unwinding failover.
What I did wrong: Unwound failover first (switching apps to stale NAS mount), then started Syncthing. This created a window where edits could go to stale NAS files and get overwritten when Syncthing caught up.
The correct sequence:
- Keep failover active (apps write to local via symlink)
- Start Syncthing container on NAS
- Wait for Syncthing to fully sync local → NAS
- THEN unwind failover (switch apps to NAS mount)
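Step 3 can be automated against Syncthing's REST API, which exposes per-device completion at `/rest/db/completion`. A sketch with placeholder URL, API key, folder ID, and device ID (`jq` assumed available):

```shell
SYNCTHING_URL="http://nas:8384"
API_KEY="changeme"          # placeholder; real key is in the Syncthing GUI
FOLDER="vault"              # placeholder folder ID
DEVICE="NAS-DEVICE-ID"      # placeholder device ID

# Pure helper: is a completion percentage "done"?
fully_synced() { awk -v c="$1" 'BEGIN { if (c >= 100) exit 0; exit 1 }'; }

wait_for_sync() {
  while :; do
    local pct
    pct=$(curl -s -H "X-API-Key: $API_KEY" \
      "$SYNCTHING_URL/rest/db/completion?folder=$FOLDER&device=$DEVICE" \
      | jq -r .completion)
    fully_synced "$pct" && break
    echo "still syncing: ${pct}%"
    sleep 30
  done
}

# wait_for_sync   # only after this returns is it safe to unwind failover
```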
I had to emergency re-activate failover when I realized the mistake, then wait for sync to complete.
Mistake 5: “Idle” Doesn’t Mean “Synced”
Discovered this one while writing this very post.
Local Syncthing status: `needFiles: 0, state: "idle"`. Great, fully synced!
NAS Syncthing status: `needFiles: 37421, state: "syncing"`. Very much not synced.
The lesson: “Idle” means “I have nothing to do right now” - it doesn’t mean all peers are caught up. The local device might be idle because it’s already sent everything, but the remote is still receiving. Always check the destination device’s status, not just the source.
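In practice that means querying `/rest/db/status` on BOTH instances and requiring `needFiles == 0` everywhere. A sketch with placeholder URLs and keys:

```shell
# Print needFiles for one Syncthing instance's folder.
# usage: check_side <url> <api-key> <folder-id>
check_side() {
  curl -s -H "X-API-Key: $2" "$1/rest/db/status?folder=$3" | jq -r .needFiles
}

# Pure helper: true only when BOTH counters are zero.
both_synced() { [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; }

# Example usage (keys and folder ID are placeholders):
# local_need=$(check_side http://localhost:8384 "$LOCAL_KEY" vault)
# nas_need=$(check_side http://nas:8384 "$NAS_KEY" vault)
# both_synced "$local_need" "$nas_need" && echo "safe to switch"
```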
Not A Mistake: The UPS
At 10:33 PM on day two of the migration, the electricity went out for a second.
The NAS kept humming. The UPS earned its keep.
If you’re doing anything like this without a UPS, you’re gambling. Don’t.
TrueNAS Learnings
Things I discovered along the way:
- No LUKS support: The TrueNAS SCALE kernel lacks `dm_mod`/`dm_crypt`, so LUKS-encrypted external drives can't be mounted directly on TrueNAS.
- Pool vs dataset encryption: Pool encryption is key-only. The passphrase option is only available at the dataset level.
- SSH quirks: Password auth needs BOTH the service setting AND a per-user checkbox. Easy to miss.
- Middleware vs ZFS: Pools created via `zpool create` on the CLI won't show in the web UI. Use the UI or the middleware API.
- NVMe SLOG isn't like Unraid cache: A SLOG only buffers sync writes for ~5 seconds. Not the same use case at all.
- Immutable rootfs: Can't `apt install` anything. Services like Tailscale need to run as Docker containers.
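Since the passphrase option exists only per-dataset, the raw OpenZFS form looks like this (on TrueNAS you would normally create the dataset through the UI so the middleware knows about it; the dataset name matches this setup):

```shell
# Passphrase encryption is set per-dataset at creation time; it can't be
# turned on later for an existing unencrypted dataset.
zfs create \
  -o encryption=aes-256-gcm \
  -o keyformat=passphrase \
  -o keylocation=prompt \
  tank/Files

# After a reboot the dataset stays locked until the key is loaded:
# zfs load-key tank/Files && zfs mount tank/Files
```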
The Restore
Restore landed data at /mnt/tank/data/mnt/user/* - the original Unraid paths came along for the ride.
Errors: 8306 symlink failures. All in cruft directories (`node_modules` and `.cache` left over from an old external drive backup). Vault intact.
Restructuring: Created proper ZFS datasets (tank/Files, tank/Backups, etc.) and rsynced data from the nested paths. Moving between ZFS datasets = full copy (not rename), so this added ~7 hours. Started with the vault (1.1TB) so I could get back to work fastest.
Pro tip: Run dataset migrations sequentially, not in parallel. Parallel rsync on the same HDD pool causes head thrashing and tanks total throughput.
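The sequential migration is just a loop, one rsync at a time. A sketch with illustrative share names (`DRY_RUN=1` prints the plan instead of copying):

```shell
set -euo pipefail

SRC=/mnt/tank/data/mnt/user   # where the restore landed (nested Unraid paths)
DST=/mnt/tank                 # per-share datasets created beforehand

run() { if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi; }

migrate_shares() {
  # Strictly sequential: parallel rsyncs on one HDD mirror thrash the heads.
  for share in "$@"; do
    run rsync -aHAX --info=progress2 "$SRC/$share/" "$DST/$share/"
  done
}

# migrate_shares Files Backups Media   # share names are illustrative
```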
Timeline
| Date | Event |
|---|---|
| Jan 18 | Project started. Decision to encrypt. Started pre-migration backup. |
| Jan 24 | Critical path analysis. Identified restore as potential bottleneck. Started dedupe overnight. |
| Jan 25 AM | Dedupe complete (582 GB freed). Benchmarked restore. Created rollback image of old boot USB. |
| Jan 25 PM | Activated failover. Ran final backup. Installed TrueNAS. Started restore (~16hr ETA). |
| Jan 26 PM | Restore complete (19h 55m). Started restructuring. |
| Jan 26 evening | Datasets migrated. Syncthing running. Tailscale configured. Waiting for sync. |
The Stack
Before: Unraid with Docker containers (Syncthing, Backrest, Filestash), CIFS/NFS shares, parity drive.
After: TrueNAS Community Edition with:
- Encrypted ZFS mirror (2x22TB)
- Native ZFS snapshots
- Docker containers (Syncthing, Tailscale)
- Direct restic + cron (simpler than the Backrest GUI wrapper)
- Same CIFS/NFS shares
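The Backrest replacement amounts to one script plus one crontab line. A sketch with placeholder repo, password file, and retention policy:

```shell
#!/usr/bin/env bash
# nightly-backup.sh - plain restic driven by cron instead of a GUI wrapper.
# Repository location, password file, and source paths are placeholders.
set -euo pipefail
export RESTIC_REPOSITORY=/mnt/external/restic-repo
export RESTIC_PASSWORD_FILE=/root/.restic-pass

restic backup /mnt/tank/Files /mnt/tank/Backups --tag nightly

# Bounded history: 7 dailies, 4 weeklies, 6 monthlies; prune unreferenced data.
restic forget --tag nightly \
  --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --prune
```

Scheduled with a single crontab entry, e.g. `0 3 * * * /root/nightly-backup.sh`.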
Was It Worth It?
Absolutely. Native ZFS encryption, checksums for bit rot protection, proper snapshots, and I learned a ton about failover orchestration and the subtleties of persistent mounts.
The mistakes were educational. The failover script is now bulletproof. And critically: it’s done. Not “I’ll do it someday when I have time” - done, tested, solid, with a week to spare before travel. The NAS will hum along perfectly while I’m on the other side of the world, and I won’t be lying awake wondering if I should have encrypted it before leaving my apartment empty for four months.
Could I have spread this over three weeks, working on it bit by bit? Absolutely not. The NAS can’t be half-broken while you leisurely tinker. It’s either working or it’s not, and I needed it working - constantly, reliably - while simultaneously preparing to leave the country. Rip the bandaid off. Execute in one intense push. Get back to a known-good state as fast as possible.
Total hands-on time: probably 12-15 hours over 3 days. Total elapsed time: ~48 hours (restore ran overnight). Coffee consumed: significant.