Home AI Lab
A secondhand workstation running local LLMs for family and friends
The Problem
I got tired of API rate limits and having no say in which model runs. I wanted a local inference server my family could use from their phones — no accounts, no subscriptions, no data leaving my apartment.
The Build
Found an HP Z440 on Marktplaats, the Dutch classifieds site, for €320: Xeon E5-2680 v4, 32 GB ECC RAM, GTX 1080. I checked every component in person before handing over cash. Installed Ubuntu Server 24.04 headless, then added the NVIDIA drivers, GPU access for Docker containers, and Ollama with Open WebUI on top. Remote access runs through Tailscale, a mesh VPN: no port forwarding, nothing exposed to the internet. My mom chats with a local LLM from her phone now. She doesn't know what a model is. That's the point.
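For the curious, the container layer is roughly this shape. This is a sketch, not my exact config: the ports and volume names are illustrative, `<host-ip>` is a placeholder, and GPU access assumes the NVIDIA Container Toolkit is already installed.

```
# Join the tailnet once; authentication happens in the browser
sudo tailscale up

# Ollama with GPU access (needs the NVIDIA Container Toolkit)
docker run -d --gpus all \
  --name ollama \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# Open WebUI, pointed at the Ollama API on the host
docker run -d --name open-webui \
  -e OLLAMA_BASE_URL=http://<host-ip>:11434 \
  -v open-webui:/app/backend/data \
  -p 3000:8080 \
  ghcr.io/open-webui/open-webui:main
```

With that up, anything on the LAN or the tailnet reaches the web UI on port 3000, and nothing else is listening.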
The Lockout
Two weeks in, I lost SSH access. authorized_keys had been wiped by a silent save failure. It turned out LVM had only allocated 100 GB of the 256 GB NVMe, and 64 GB of AER (PCIe error) logs had quietly filled the root volume. On a full filesystem, a save can truncate a file and then fail to write the new contents, leaving it empty, and that is exactly what happened to authorized_keys. I recovered through one terminal session that happened to still be open. One session. That was all that stood between me and reformatting the drive and starting over. I spent that night extending the logical volume, clearing the logs, and hardening everything I could think of. The server hasn't gone down since.
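If you've ever used Ubuntu Server's guided LVM install, check your own box: by default it leaves most of the volume group unallocated. The fix is a one-time extend. The `ubuntu-vg`/`ubuntu-lv` names below are the installer defaults (yours may differ), and `resize2fs` assumes an ext4 root:

```
# Check how much of the volume group is still unallocated
sudo vgdisplay ubuntu-vg

# Grow the root logical volume into all remaining free space,
# then grow the filesystem to match
sudo lvextend -l +100%FREE /dev/ubuntu-vg/ubuntu-lv
sudo resize2fs /dev/ubuntu-vg/ubuntu-lv
```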
The Ops Layer
- LAN + Tailscale only. No direct public internet exposure.
- Key-only SSH, UFW, fail2ban, and unattended security upgrades.
- Journald capped, reserved disk blocks, and a daily disk-usage warning, all added after the lockout (the warning script is sketched after this list).
- Open WebUI stays the family-facing layer; admin stays in the terminal.
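The journald cap is one line (`SystemMaxUse=` in /etc/systemd/journald.conf). The disk-usage warning is a small daily cron script along these lines; the filename and the 80% threshold are arbitrary choices, and the alert goes to the journal, though anything that actually reaches you works:

```
#!/usr/bin/env bash
# /etc/cron.daily/disk-check (illustrative name): warn when the root
# filesystem crosses a usage threshold, so a full disk never sneaks up again.
THRESHOLD=80
usage=$(df --output=pcent / | tail -n 1 | tr -dc '0-9')
if [ "$usage" -ge "$THRESHOLD" ]; then
  echo "Root filesystem at ${usage}% on $(hostname)" \
    | systemd-cat -t disk-check -p warning
fi
```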
The Result
During bring-up, Llama 3.2 was running at about 75 tok/s on the original GTX 1080. Since then the box has kept growing: BIOS updated, the 1080 swapped for an RTX 3060, and the stack hardened enough that I trust it on the LAN and over the tailnet. It earns its electricity bill.
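If you want to check generation speed on your own hardware, Ollama prints timing stats with the `--verbose` flag; the "eval rate" line is the tok/s figure quoted above.

```
# Run a prompt and print timing stats; "eval rate" is generation speed
ollama run llama3.2 --verbose "Explain LVM in one paragraph."
```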