Migrating from RunPod to Local Whisper Inference with MLX and a DGX Spark

Sun, 05 Apr 2026 00:00:00 +0000

In my previous article, I described how we use OpenAI’s Whisper model to transcribe radio and TV broadcasts for Monitorea, our media monitoring platform. At the time, we were running inference on RunPod, a serverless GPU platform that lets you deploy ML models without managing hardware. It was the right call for getting started quickly, but as we scaled, the economics stopped making sense.

Here’s how we migrated to fully local inference in about a weekend, using MLX on Apple Silicon and a DGX Spark we call Sparky.

Infrastructure on David Bartolomei-Guzmán
