What if you could run the same data infrastructure that powers AI systems at major tech companies—right from your home office? In this series, you’ll build a complete AI-ready data platform on a single server, gaining hands-on experience with Kubernetes, data pipelines, experiment tracking, and model serving.
By the end, you’ll have a production-ready environment for ingesting data, training models, tracking experiments, and serving predictions—all running on hardware you own.
Why Build Your Own AI Platform?
Cloud platforms are excellent for production workloads, but they come with costs that add up quickly during learning and experimentation. A home server offers advantages:
- No recurring bills — Pay once for hardware, run experiments indefinitely
- Full control — No vendor lock-in, no artificial limitations
- Real-world experience — Learn infrastructure skills that transfer directly to production
- Always available — No need to spin up/down resources or worry about idle costs
Running your own infrastructure teaches you things documentation can’t: how Kubernetes orchestrates workloads, why engineers choose specific storage patterns, what happens when pipelines fail at 3 AM, and how to debug systems when logs aren’t enough.
The skills transfer directly. The same Kubernetes manifests run on cloud clusters. The same pipeline patterns scale to enterprise deployments.
Hardware Requirements
You don’t need enterprise hardware. A modern mini PC provides an excellent balance of performance, power efficiency, and cost.
Minimum specifications:
- RAM: 32GB minimum (64GB recommended)
- Storage: 1TB NVMe SSD
- CPU: Modern x86_64 processor with 8+ cores
- Network: Gigabit Ethernet
Why mini PCs work well:
- Silent operation — Quiet enough for an office
- Low power — 15-65W under load vs. 200W+ for towers
- Small footprint — Fits on a shelf
- Adequate performance — Modern CPUs handle most ML workloads
Models from Beelink, Intel NUC, or Minisforum are popular choices. Any mini PC meeting the specs above will work.
The single-node constraint is actually a feature: it forces efficient architectural decisions rather than throwing hardware at problems.
What We’ll Build
Throughout this series, you’ll deploy a complete AI data platform:
- Kubernetes — Container orchestration and infrastructure management
- Data lake — Medallion architecture for raw, cleaned, and curated data
- Pipeline orchestration — Automated data workflows with Dagster
- Data quality — Validation gates with Great Expectations
- Monitoring — Prometheus and Grafana for observability
- ML integration — Feature stores, experiment tracking, model serving
This first article covers the foundation: Ubuntu Server and Kubernetes installation.
Part 1: Ubuntu Server Installation
These instructions assume a dedicated machine where Ubuntu can use the entire disk. If you need a dual-boot setup, adjust the storage configuration accordingly.
Ubuntu Server 24.04 LTS has lower overhead than the desktop version—important when you want maximum resources for workloads.
Creating Installation Media
Download Ubuntu Server 24.04 LTS from ubuntu.com/download/server.
On macOS:
```bash
# Find your USB drive identifier
diskutil list

# Unmount the USB (replace diskN with your disk number)
diskutil unmountDisk /dev/diskN

# Write the ISO to USB (rdiskN, the raw device node, is much faster than diskN)
sudo dd if=~/Downloads/ubuntu-24.04.1-live-server-amd64.iso of=/dev/rdiskN bs=1m

# Eject when complete
diskutil eject /dev/diskN
```
On Windows, use Rufus or balenaEtcher.
Installing Ubuntu
Connect a monitor and keyboard to your mini PC for initial setup.
- Insert USB and power on
- Press F2, F12, or DEL during boot to open the BIOS boot menu (the exact key varies by manufacturer)
- Select USB as boot device
During installation:
| Setting | Recommendation |
|---|---|
| Installation type | Ubuntu Server (minimal) |
| Network | Use DHCP initially; note the IP |
| Storage | Use entire disk with default partitioning |
| Server name | Something memorable (e.g., mlserver) |
| SSH | Enable OpenSSH server |
| Additional packages | Skip |
After installation, remove the USB and reboot.
Initial Configuration
```bash
# Update the system
sudo apt update && sudo apt upgrade -y

# Install essential tools
sudo apt install -y \
    build-essential \
    curl \
    wget \
    git \
    vim \
    htop \
    net-tools

# Set your timezone (list valid names with: timedatectl list-timezones)
sudo timedatectl set-timezone Your/Timezone

# Enable automatic security updates
sudo apt install -y unattended-upgrades
sudo dpkg-reconfigure -plow unattended-upgrades
```
Find your server’s IP:
```bash
ip addr show
# Look for the inet address on your ethernet interface
```
Part 2: Remote Access Setup
SSH enables comfortable remote access from your main computer.
SSH Key Authentication
On your main computer:
```bash
# Check for existing keys
ls ~/.ssh/id_ed25519*

# Generate new key if needed
ssh-keygen -t ed25519 -C "your-email@example.com"

# Copy public key to server
ssh-copy-id your-username@server-ip-address

# Test passwordless login
ssh your-username@server-ip-address
```
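Once key login works, you can optionally disable password logins entirely. A minimal hardening sketch using Ubuntu's `sshd_config.d` drop-in directory (the filename is just an example; confirm key login works first, or you can lock yourself out):

```text
# /etc/ssh/sshd_config.d/90-hardening.conf (example filename)
PasswordAuthentication no
PermitRootLogin no
```

Apply the change with `sudo systemctl restart ssh`.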
VS Code Remote Development
VS Code’s Remote-SSH extension gives you a full local editor UI while the files you edit and the terminals you open live on the server.
- Install VS Code
- Install the “Remote - SSH” extension
- Press `Cmd+Shift+P` → “Remote-SSH: Open Configuration File”
- Add your server:
```text
Host mlserver
    HostName your-server-ip
    User your-username
    ForwardAgent yes
    ServerAliveInterval 60
```
Connect: Cmd+Shift+P → “Remote-SSH: Connect to Host” → select your server.
Part 3: Installing Kubernetes with MicroK8s
MicroK8s provides a lightweight but fully-featured Kubernetes distribution for single-node deployments.
Why MicroK8s over alternatives (k3s, kind, minikube)?
- Production-grade — Full Kubernetes API compatibility
- Addon system — Easy installation of common components
- Kubeflow integration — First-class ML pipeline support
- Low overhead — Runs well on a single node
Installation
```bash
# Install MicroK8s
sudo snap install microk8s --classic --channel=1.31/stable

# Add your user to the microk8s group
sudo usermod -a -G microk8s $USER
mkdir -p ~/.kube
chmod 0700 ~/.kube

# Apply the group change in the current shell (or log out and back in)
newgrp microk8s

# Wait until the node reports ready
microk8s status --wait-ready
```
Create a kubectl alias:
```bash
echo "alias kubectl='microk8s kubectl'" >> ~/.bashrc
source ~/.bashrc
```
Enabling Essential Addons
```bash
# DNS for service discovery
microk8s enable dns

# Storage for persistent data
microk8s enable hostpath-storage

# Load balancer for external access
# (the range can also be passed inline, e.g. microk8s enable metallb:192.168.1.210-192.168.1.250)
microk8s enable metallb
```
When enabling MetalLB, you’ll be prompted for an IP range. This must be:
- On the same subnet as your server
- Outside your router’s DHCP range
Example: if your router assigns 192.168.1.100-200, use 192.168.1.210-250 for MetalLB.
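If you're unsure whether your chosen pool collides with the DHCP range, a quick sanity check with Python's standard `ipaddress` module (a throwaway helper, not part of the cluster setup):

```python
import ipaddress

def ranges_overlap(a_start, a_end, b_start, b_end):
    """Return True if two inclusive IPv4 ranges share any address."""
    a0, a1 = int(ipaddress.ip_address(a_start)), int(ipaddress.ip_address(a_end))
    b0, b1 = int(ipaddress.ip_address(b_start)), int(ipaddress.ip_address(b_end))
    return a0 <= b1 and b0 <= a1

# The example above: router's DHCP pool vs. proposed MetalLB pool
dhcp = ("192.168.1.100", "192.168.1.200")
metallb = ("192.168.1.210", "192.168.1.250")
print(ranges_overlap(*dhcp, *metallb))  # False -> safe, no collision

# Both pools should also sit inside the server's subnet
subnet = ipaddress.ip_network("192.168.1.0/24")
print(all(ipaddress.ip_address(ip) in subnet for ip in dhcp + metallb))  # True
```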
Verify addons:
```bash
microk8s status
```
Part 4: Kubernetes Fundamentals
Core concepts you’ll use throughout this series:
Pods — Smallest deployable units. Run one or more containers sharing storage and network.
Deployments — Manage pods declaratively. Specify desired state, Kubernetes makes it happen.
Services — Provide stable network endpoints to pods.
PersistentVolumes — Storage that survives pod restarts.
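To make these concepts concrete, the smallest useful manifest is a bare Pod. A sketch (the image and names are purely illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
  labels:
    app: example        # Services select pods by labels like this one
spec:
  containers:
    - name: web
      image: nginx:1.27
      ports:
        - containerPort: 80
```

In practice you rarely create bare Pods; a Deployment creates and replaces them for you, which is exactly what we do in Part 5 below.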
Essential kubectl Commands
```bash
# View resources
kubectl get pods
kubectl get services
kubectl get deployments

# Detailed information
kubectl describe pod <pod-name>

# View logs
kubectl logs <pod-name>

# Execute commands in a pod
kubectl exec -it <pod-name> -- /bin/bash

# Apply configuration
kubectl apply -f manifest.yaml

# Delete resources
kubectl delete -f manifest.yaml
```
Part 5: Deploy Your First Application
Verify everything works with a simple deployment.
Create hello-app.yaml:
```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
  labels:
    app: hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: gcr.io/google-samples/hello-app:1.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "64Mi"
              cpu: "100m"
            limits:
              memory: "128Mi"
              cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
  name: hello-service
spec:
  type: LoadBalancer
  selector:
    app: hello
  ports:
    - protocol: TCP
      port: 80
      targetPort: 8080
```
Deploy:
```bash
kubectl apply -f hello-app.yaml

# Watch pods start
kubectl get pods

# Get external IP (may take 30 seconds)
kubectl get services

# Test
curl http://<external-ip>
```
You should see:
```text
Hello, world!
Version: 1.0.0
Hostname: hello-app-xxxxx
```
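One pattern worth knowing before moving on: a Service only routes traffic to pods that report themselves ready. A sketch of health probes you could add under the container entry in the Deployment above (path `/` matches hello-app's endpoint; the timing values are illustrative, not tuned):

```yaml
# Added at the same indentation level as `ports:` in the container spec
readinessProbe:
  httpGet:
    path: /
    port: 8080
  initialDelaySeconds: 2
  periodSeconds: 5
livenessProbe:
  httpGet:
    path: /
    port: 8080
  periodSeconds: 15
```

The readiness probe gates Service traffic; the liveness probe restarts a container that stops responding.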
Scaling and Updates
```bash
# Scale to 5 replicas
kubectl scale deployment hello-app --replicas=5

# Rolling update with zero downtime
kubectl set image deployment/hello-app hello=gcr.io/google-samples/hello-app:2.0
kubectl rollout status deployment/hello-app

# Roll back if the new version misbehaves
kubectl rollout undo deployment/hello-app
```
Clean up:
```bash
kubectl delete -f hello-app.yaml
```
Part 6: Testing Persistent Storage
ML workloads need storage that persists beyond pod lifecycles.
Create storage-test.yaml:
```yaml
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: microk8s-hostpath
---
apiVersion: v1
kind: Pod
metadata:
  name: storage-test
spec:
  containers:
    - name: test
      image: busybox
      command: ["/bin/sh", "-c"]
      args:
        - |
          echo "Pod started: $(date)" >> /data/timestamps.txt
          echo "All recorded starts:"
          cat /data/timestamps.txt
          sleep 3600
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc
```
Deploy and verify:
```bash
kubectl apply -f storage-test.yaml
kubectl logs storage-test
```
First run output:
```text
All recorded starts:
Pod started: Mon Jan 13 10:15:32 UTC 2025
```
Now delete the pod (but not the PVC) and recreate it:
```bash
kubectl delete pod storage-test
kubectl apply -f storage-test.yaml
kubectl logs storage-test
```
Second run output:
```text
All recorded starts:
Pod started: Mon Jan 13 10:15:32 UTC 2025
Pod started: Mon Jan 13 10:18:45 UTC 2025
```
The original timestamp survives—proving that data written to a PersistentVolumeClaim persists across pod restarts.
Once the storage is no longer needed, delete both the pod and the PVC:

```bash
kubectl delete -f storage-test.yaml
```
Verifying Your Setup
You should now have:
- Ubuntu Server running with SSH access
- VS Code connected for remote development
- MicroK8s with DNS, storage, and load balancer addons
- Successful test deployments
Check resource usage:
```bash
htop
```
With just Kubernetes running:
- RAM: ~4-5 GB used
- CPU: ~5-10% used
- Disk: ~25-30 GB used
Plenty of headroom for the tools we’ll deploy next.
What’s Next
With the foundation in place, the next chapter deploys the core data infrastructure: MinIO for S3-compatible storage, Dagster for pipeline orchestration, and Prometheus with Grafana for monitoring. Stay tuned!
Feel free to post comments and reactions below.