What if you could run the same data infrastructure that powers AI systems at major tech companies—right from your home office? In this series, you’ll build a complete AI-ready data platform on a single server, gaining hands-on experience with Kubernetes, data pipelines, experiment tracking, and model serving.

By the end, you’ll have a production-ready environment for ingesting data, training models, tracking experiments, and serving predictions—all running on hardware you own.

Why Build Your Own AI Platform?

Cloud platforms are excellent for production workloads, but they come with costs that add up quickly during learning and experimentation. A home server offers advantages:

  • No recurring bills — Pay once for hardware, run experiments indefinitely
  • Full control — No vendor lock-in, no artificial limitations
  • Real-world experience — Learn infrastructure skills that transfer directly to production
  • Always available — No need to spin up/down resources or worry about idle costs

Running your own infrastructure teaches you things documentation can’t: how Kubernetes orchestrates workloads, why engineers choose specific storage patterns, what happens when pipelines fail at 3 AM, and how to debug systems when logs aren’t enough.

The skills transfer directly. The same Kubernetes manifests run on cloud clusters. The same pipeline patterns scale to enterprise deployments.

Hardware Requirements

You don’t need enterprise hardware. A modern mini PC provides an excellent balance of performance, power efficiency, and cost.

Minimum specifications:

  • RAM: 32GB minimum (64GB recommended)
  • Storage: 1TB NVMe SSD
  • CPU: Modern x86_64 processor with 8+ cores
  • Network: Gigabit Ethernet

Why mini PCs work well:

  • Silent operation — Quiet enough for an office
  • Low power — 15-65W under load vs. 200W+ for towers
  • Small footprint — Fits on a shelf
  • Adequate performance — Modern CPUs handle most ML workloads

Models from Beelink, Intel NUC, or Minisforum are popular choices. Any mini PC meeting the specs above will work.

The single-node constraint is actually a feature: it forces efficient architectural decisions rather than throwing hardware at problems.

What We’ll Build

Throughout this series, you’ll deploy a complete AI data platform:

  • Kubernetes — Container orchestration and infrastructure management
  • Data lake — Medallion architecture for raw, cleaned, and curated data
  • Pipeline orchestration — Automated data workflows with Dagster
  • Data quality — Validation gates with Great Expectations
  • Monitoring — Prometheus and Grafana for observability
  • ML integration — Feature stores, experiment tracking, model serving

This first article covers the foundation: Ubuntu Server and Kubernetes installation.

Part 1: Ubuntu Server Installation

These instructions assume a dedicated machine where Ubuntu can use the entire disk. If you need a dual-boot setup, adjust the storage configuration accordingly.

Ubuntu Server 24.04 LTS has lower overhead than the desktop version—important when you want maximum resources for workloads.

Creating Installation Media

Download Ubuntu Server 24.04 LTS from ubuntu.com/download/server.

On macOS:

# Find your USB drive identifier
diskutil list

# Unmount the USB (replace diskN with your disk number)
diskutil unmountDisk /dev/diskN

# Write the ISO to USB
sudo dd if=~/Downloads/ubuntu-24.04.1-live-server-amd64.iso of=/dev/rdiskN bs=1m

# Eject when complete
diskutil eject /dev/diskN

On Windows, use Rufus or balenaEtcher.

Installing Ubuntu

Connect a monitor and keyboard to your mini PC for initial setup.

  1. Insert USB and power on
  2. Press F2, F12, or DEL during boot (the key varies by manufacturer) to access the boot menu
  3. Select USB as boot device

During installation:

Setting                 Recommendation
Installation type       Ubuntu Server (minimal)
Network                 Use DHCP initially; note the IP
Storage                 Use entire disk with default partitioning
Server name             Something memorable (e.g., mlserver)
SSH                     Enable OpenSSH server
Additional packages     Skip

After installation, remove the USB and reboot.

Initial Configuration

# Update the system
sudo apt update && sudo apt upgrade -y

# Install essential tools
sudo apt install -y \
    build-essential \
    curl \
    wget \
    git \
    vim \
    htop \
    net-tools

# Set your timezone
sudo timedatectl set-timezone Your/Timezone

# Enable automatic security updates
sudo apt install -y unattended-upgrades
sudo dpkg-reconfigure -plow unattended-upgrades
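The dpkg-reconfigure step writes a small APT configuration file, and it is worth knowing what "enabled" actually means here. On a default Ubuntu 24.04 install, these two settings are the whole mechanism:

```
# Contents of /etc/apt/apt.conf.d/20auto-upgrades after enabling:
# refresh package lists daily, and apply unattended (security) upgrades
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";
```

Which packages qualify for automatic upgrade is controlled separately in /etc/apt/apt.conf.d/50unattended-upgrades.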

Find your server’s IP:

ip addr show
# Look for the inet address on your ethernet interface
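If you want the address itself for scripting the SSH steps below, a small awk filter over the `ip addr` output does the job. This is a sketch demonstrated against a captured sample (the interface name and address are assumptions); on the server you would pipe `ip addr show` in directly:

```shell
# Captured sample of `ip addr show` output (interface and address will differ)
sample='2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 192.168.1.42/24 brd 192.168.1.255 scope global enp3s0'

# Print the first IPv4 address, stripping the /24 prefix length
server_ip=$(echo "$sample" | awk '/inet / {split($2, a, "/"); print a[1]; exit}')
echo "$server_ip"
```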

Part 2: Remote Access Setup

SSH enables comfortable remote access from your main computer.

SSH Key Authentication

On your main computer:

# Check for existing keys
ls ~/.ssh/id_ed25519*

# Generate new key if needed
ssh-keygen -t ed25519 -C "your-email@example.com"

# Copy public key to server
ssh-copy-id your-username@server-ip-address

# Test passwordless login
ssh your-username@server-ip-address
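Once passwordless login works, a common optional hardening step is disabling password authentication entirely. A sketch of a drop-in config; the filename is an assumption, but any `.conf` file under `sshd_config.d` is read on Ubuntu 24.04:

```
# /etc/ssh/sshd_config.d/99-hardening.conf (hypothetical filename)
PasswordAuthentication no
PermitRootLogin no
```

Apply with `sudo systemctl restart ssh`, and keep an existing session open while you test so you can't lock yourself out.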

VS Code Remote Development

VS Code’s Remote-SSH extension gives you a local editing experience while the files and execution live on the server.

  1. Install VS Code
  2. Install the “Remote - SSH” extension
  3. Press Cmd+Shift+P (Ctrl+Shift+P on Windows/Linux) → “Remote-SSH: Open Configuration File”
  4. Add your server:
Host mlserver
    HostName your-server-ip
    User your-username
    ForwardAgent yes
    ServerAliveInterval 60

Connect: Cmd+Shift+P (Ctrl+Shift+P on Windows/Linux) → “Remote-SSH: Connect to Host” → select your server.

Part 3: Installing Kubernetes with MicroK8s

MicroK8s is a lightweight but fully featured Kubernetes distribution that is well suited to single-node deployments.

Why MicroK8s over alternatives (k3s, kind, minikube)?

  • Production-grade — Full Kubernetes API compatibility
  • Addon system — Easy installation of common components
  • Kubeflow integration — First-class ML pipeline support
  • Low overhead — Runs well on a single node

Installation

# Install MicroK8s
sudo snap install microk8s --classic --channel=1.31/stable

# Add your user to the microk8s group
sudo usermod -a -G microk8s $USER
mkdir -p ~/.kube
chmod 0700 ~/.kube

# Apply group changes in the current shell (or log out and back in)
newgrp microk8s

# Wait for ready
microk8s status --wait-ready

Create a kubectl alias:

echo "alias kubectl='microk8s kubectl'" >> ~/.bashrc
source ~/.bashrc
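The alias can get tab completion too. This follows the pattern from the kubectl documentation and assumes bash; the `__start_kubectl` function is defined by the sourced completion script:

```
echo "source <(microk8s kubectl completion bash)" >> ~/.bashrc
echo "complete -o default -F __start_kubectl kubectl" >> ~/.bashrc
source ~/.bashrc
```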

Enabling Essential Addons

# DNS for service discovery
microk8s enable dns

# Storage for persistent data
microk8s enable hostpath-storage

# Load balancer for external access
microk8s enable metallb

When enabling MetalLB, you’ll be prompted for an IP range. This must be:

  • On the same subnet as your server
  • Outside your router’s DHCP range

Example: if your router assigns 192.168.1.100-200, use 192.168.1.210-250 for MetalLB.
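MicroK8s also accepts the range non-interactively, e.g. `microk8s enable metallb:192.168.1.210-192.168.1.250`. If you want to sanity-check a candidate address against your DHCP pool first, a crude last-octet comparison covers the common home /24 case. This helper is a sketch under that assumption, not a general CIDR tool:

```shell
# Succeeds if the last octet of $1 is greater than $2
# (only valid when everything shares one /24 subnet)
outside_dhcp() {
  last_octet=${1##*.}
  [ "$last_octet" -gt "$2" ]
}

outside_dhcp 192.168.1.210 200 && echo "192.168.1.210 is free for MetalLB"
outside_dhcp 192.168.1.150 200 || echo "192.168.1.150 collides with DHCP"
```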

Verify addons:

microk8s status

Part 4: Kubernetes Fundamentals

Core concepts you’ll use throughout this series:

Pods — Smallest deployable units. Run one or more containers sharing storage and network.

Deployments — Manage pods declaratively. Specify desired state, Kubernetes makes it happen.

Services — Provide stable network endpoints to pods.

PersistentVolumes — Storage that survives pod restarts.

Essential kubectl Commands

# View resources
kubectl get pods
kubectl get services
kubectl get deployments

# Detailed information
kubectl describe pod <pod-name>

# View logs
kubectl logs <pod-name>

# Execute commands in a pod
kubectl exec -it <pod-name> -- /bin/bash

# Apply configuration
kubectl apply -f manifest.yaml

# Delete resources
kubectl delete -f manifest.yaml

Part 5: Deploy Your First Application

Verify everything works with a simple deployment.

Create hello-app.yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app
  labels:
    app: hello
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: gcr.io/google-samples/hello-app:1.0
        ports:
        - containerPort: 8080
        resources:
          requests:
            memory: "64Mi"
            cpu: "100m"
          limits:
            memory: "128Mi"
            cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
  name: hello-service
spec:
  type: LoadBalancer
  selector:
    app: hello
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

Deploy:

kubectl apply -f hello-app.yaml

# Watch pods start
kubectl get pods

# Get external IP (may take 30 seconds)
kubectl get services

# Test
curl http://<external-ip>

You should see:

Hello, world!
Version: 1.0.0
Hostname: hello-app-xxxxx
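With two replicas behind the Service, repeated requests should rotate across pod hostnames. A quick way to watch the load balancing in action (substitute your external IP):

```
for i in 1 2 3 4 5; do curl -s http://<external-ip> | grep Hostname; done
```

You should see at least two distinct hostnames in the output, one per pod.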

Scaling and Updates

# Scale to 5 replicas
kubectl scale deployment hello-app --replicas=5

# Rolling update with zero downtime
kubectl set image deployment/hello-app hello=gcr.io/google-samples/hello-app:2.0
kubectl rollout status deployment/hello-app
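If an update misbehaves, Kubernetes keeps the previous ReplicaSet around, so rolling back is a single command:

```
kubectl rollout undo deployment/hello-app
```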

Clean up:

kubectl delete -f hello-app.yaml

Part 6: Testing Persistent Storage

ML workloads need storage that persists beyond pod lifecycles.

Create storage-test.yaml:

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: microk8s-hostpath
---
apiVersion: v1
kind: Pod
metadata:
  name: storage-test
spec:
  containers:
  - name: test
    image: busybox
    command: ["/bin/sh", "-c"]
    args:
      - |
        echo "Pod started: $(date)" >> /data/timestamps.txt
        echo "All recorded starts:"
        cat /data/timestamps.txt
        sleep 3600
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: test-pvc

Deploy and verify:

kubectl apply -f storage-test.yaml
kubectl logs storage-test

First run output:

All recorded starts:
Pod started: Mon Jan 13 10:15:32 UTC 2025

Now delete the pod (but not the PVC) and recreate it:

kubectl delete pod storage-test
kubectl apply -f storage-test.yaml
kubectl logs storage-test

Second run output:

All recorded starts:
Pod started: Mon Jan 13 10:15:32 UTC 2025
Pod started: Mon Jan 13 10:18:45 UTC 2025

The original timestamp survives—proving that data written to a PersistentVolumeClaim persists across pod restarts.
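With the hostpath-storage addon, the “volume” is just a directory on the node’s own disk. The path below is the MicroK8s default at the time of writing; verify on your install:

```
ls /var/snap/microk8s/common/default-storage/
# Each PVC gets a subdirectory; one of them should contain timestamps.txt
```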

Once we no longer need the storage, we can clean up the pod and PVC:

kubectl delete -f storage-test.yaml

Verifying Your Setup

You should now have:

  • Ubuntu Server running with SSH access
  • VS Code connected for remote development
  • MicroK8s with DNS, storage, and load balancer addons
  • Successful test deployments

Check resource usage:

htop

With just Kubernetes running:

  • RAM: ~4-5 GB used
  • CPU: ~5-10% used
  • Disk: ~25-30 GB used

Plenty of headroom for the tools we’ll deploy next.

What’s Next

With the foundation in place, the next chapter deploys the core data infrastructure: MinIO for S3-compatible storage, Dagster for pipeline orchestration, and Prometheus with Grafana for monitoring. Stay tuned!

Feel free to post comments and reactions below.