<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Infrastructure on David Bartolomei-Guzmán</title><link>https://www.davidbartolomei.com/categories/infrastructure/</link><description>Recent content in Infrastructure on David Bartolomei-Guzmán</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Sun, 05 Apr 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://www.davidbartolomei.com/categories/infrastructure/feed.xml" rel="self" type="application/rss+xml"/><item><title>Migrating from RunPod to Local Whisper Inference with MLX and a DGX Spark</title><link>https://www.davidbartolomei.com/migrating-from-runpod-to-local-whisper-inference/</link><pubDate>Sun, 05 Apr 2026 00:00:00 +0000</pubDate><guid>https://www.davidbartolomei.com/migrating-from-runpod-to-local-whisper-inference/</guid><description>&lt;p&gt;In my &lt;a href="https://www.davidbartolomei.com/case-study-leveraging-machine-learning-for-spoken-media-analysis-share-of-voice-of-puerto-ricos-political-figures-in-2024/"&gt;previous article&lt;/a&gt;, I described how we use OpenAI&amp;rsquo;s Whisper model to transcribe radio and TV broadcasts for Monitorea, our media monitoring platform. At the time, we were running inference on RunPod, a serverless GPU platform that lets you deploy ML models without managing hardware. It was the right call to get started quickly. But as we scaled, the economics stopped making sense.&lt;/p&gt;
&lt;p&gt;Here&amp;rsquo;s how we migrated to fully local inference in about a weekend, using MLX on Apple Silicon and a DGX Spark we call Sparky.&lt;/p&gt;</description></item></channel></rss>