A year ago I published a case study analyzing Share of Voice across Puerto Rico’s AM radio stations using Whisper transcriptions. The article ended with a long list of “future work” — fine-tuning, entity recognition, segment classification, summarization. At the time, those were ideas I wanted to explore. As of this month, most of them are running in production.
Monitorea is now in private beta. Here’s what changed and how we got here.
From Mention Detection to Discussion Detection
The original system was simple: record radio, transcribe with Whisper, match keywords with fuzzy search at 95% confidence. It worked for the case study, but in practice it generated an overwhelming amount of noise. A station might mention a politician’s name 50 times in a day across news, ads, casual conversation, and background chatter. Not all of those are worth flagging.
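For reference, the kind of fuzzy keyword matching described above can be sketched in a few lines of Python. This is an illustration, not the original implementation: it uses the standard library's `difflib.SequenceMatcher` as the similarity measure, and the function name `fuzzy_mentions`, the sliding-window approach, and the sample transcript are all assumptions.

```python
from difflib import SequenceMatcher


def fuzzy_mentions(transcript: str, keyword: str, threshold: float = 0.95):
    """Return (word_index, matched_text, score) for each fuzzy keyword hit."""
    # Normalize: lowercase and strip surrounding punctuation from each word.
    words = [w.strip(".,;:!?\"'()") for w in transcript.lower().split()]
    target = keyword.lower()
    window = len(target.split())  # compare against runs of the same word count
    hits = []
    for i in range(len(words) - window + 1):
        candidate = " ".join(words[i:i + window])
        score = SequenceMatcher(None, target, candidate).ratio()
        if score >= threshold:
            hits.append((i, candidate, score))
    return hits


# Hypothetical transcript with a one-letter transcription error ("enerjia").
text = "Hoy la autoridad de enerjia electrica anuncio cambios."
for index, matched, score in fuzzy_mentions(text, "autoridad de energia electrica"):
    print(index, matched, round(score, 3))
```

Every hit counts the same here, whether it comes from a news block, an ad, or background chatter, which is where the noise comes from.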
The core insight that shaped the product was this: what matters isn’t that a keyword was mentioned — it’s that an actual discussion happened. A 30-second mention of a name in passing is noise. A 4-minute segment where two commentators debate someone’s policy position is signal.
So instead of building a smarter keyword matcher, we built something different — AI agents that watch broadcast streams, detect when real discussions are happening, capture the full context, and deliver a complete package: the clip, the transcript, an AI-generated summary, and sentiment analysis. All within minutes of the discussion airing.
During our beta testing, this approach reduced alert volume from 578 daily keyword matches to 77 complete discussions — 87% less noise while maintaining full visibility into what actually matters.
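The post does not spell out how agents decide that a discussion is happening, so the following is only a toy stand-in: a time-gap heuristic that clusters keyword hits and keeps the clusters that last long enough. The function names and every threshold (`max_gap`, `min_duration`, `min_mentions`) are assumptions chosen for illustration. The second helper simply reproduces the noise-reduction arithmetic from the beta numbers.

```python
def group_into_discussions(mention_times, max_gap=60.0, min_duration=120.0, min_mentions=3):
    """Cluster mention timestamps (seconds) and keep only sustained clusters."""
    clusters, current = [], []
    for t in sorted(mention_times):
        # A long silence since the previous mention starts a new cluster.
        if current and t - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(t)
    if current:
        clusters.append(current)
    # Passing mentions (short or sparse clusters) are dropped as noise.
    return [
        (c[0], c[-1])
        for c in clusters
        if c[-1] - c[0] >= min_duration and len(c) >= min_mentions
    ]


def noise_reduction(raw_alerts: int, delivered: int) -> float:
    """Fraction of raw alerts that were filtered out."""
    return 1 - delivered / raw_alerts


# Hypothetical mention timestamps: two passing mentions and one sustained segment.
print(group_into_discussions([10, 500, 530, 575, 610, 660, 2000]))
print(f"{noise_reduction(578, 77):.1%}")
```

With the beta figures, 1 - 77/578 is roughly 0.867, which rounds to the 87% quoted above.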
The Pipeline
The case study’s pipeline was straightforward: record → transcribe → pattern match. The production pipeline has a few more steps:
Recording: Edge recorders run 24/7 on each station, capturing audio (radio) and video (TV) in segments. These get buffered locally and uploaded to S3.
Transcription: Each segment is sent to a Whisper endpoint for speech-to-text. We’re still using RunPod for GPU inference, running the Whisper medium model.
Transcript Correction: An LLM pass cleans up transcription errors. This is especially useful for Puerto Rican Spanish, where local names, slang, and code-switching trip up the base Whisper model.
Segmentation: Another AI pass splits the broadcast into logical segments such as news blocks, interviews, commercial breaks, and music. This is the segment classification I described as future work in the case study. It turns out LLMs handle it well without needing a dedicated fine-tuned model.
Entity Extraction: Identifies who’s being discussed, what topics are covered, and which organizations are mentioned. This replaces the simple fuzzy matching from the original system.
Summarization and Sentiment: Each segment gets an AI-generated summary with sentiment analysis, so users can scan dozens of discussions without reading full transcripts.
Embeddings: Everything gets vectorized and stored in PostgreSQL with pgvector, enabling semantic search. Users can search by meaning, not just keywords: “discussions about energy policy” returns relevant results even if the exact phrase never appears in the transcript.
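To make the semantic search step concrete, here is a small sketch of ranking by cosine distance, which is what pgvector's `<=>` operator computes. The table name `segments`, the columns `summary` and `embedding`, and the three-dimensional toy vectors are all assumptions; real embeddings have hundreds or thousands of dimensions and come from an embedding model.

```python
import math

# Hypothetical query shape (psycopg-style placeholders); the schema is assumed.
SEMANTIC_QUERY = """
SELECT id, summary
FROM segments
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(limit)s
"""


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_meaning(query_vec, segments, limit=5):
    """In-memory equivalent of the ORDER BY ... LIMIT query above."""
    return sorted(segments, key=lambda s: cosine_distance(query_vec, s[1]))[:limit]


# Toy vectors standing in for real embeddings.
segments = [
    ("traffic report", [0.9, 0.1, 0.0]),
    ("power grid outages", [0.1, 0.9, 0.2]),
    ("sports recap", [0.0, 0.1, 0.9]),
]
query = [0.2, 0.8, 0.1]  # pretend embedding of "discussions about energy policy"
for title, _ in rank_by_meaning(query, segments):
    print(title)
```

The query text never has to share a word with the transcript; only the vectors need to point in a similar direction.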
Each step is a Celery task that runs asynchronously. If a step has already been completed (say, during a reprocess), it skips automatically. The whole chain runs in about 5-10 minutes from the moment a segment airs to when the intelligence is delivered.
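The skip-if-done behavior can be sketched without any Celery machinery. In this simplified version each step is a plain function and completion is tracked on the segment record itself; in production each function would be a Celery task and the state would live in the database. The names `run_pipeline` and `completed_steps`, and the placeholder steps, are assumptions made for the example.

```python
from typing import Callable

Step = Callable[[dict], dict]


def run_pipeline(segment: dict, steps: list[tuple[str, Step]]) -> list[str]:
    """Run each step in order, skipping any the segment has already completed."""
    done = segment.setdefault("completed_steps", [])
    executed = []
    for name, step in steps:
        if name in done:
            continue  # already processed, e.g. during an earlier run
        segment.update(step(segment))  # merge the step's output into the record
        done.append(name)
        executed.append(name)
    return executed


# Placeholder steps; the real ones would call Whisper and an LLM.
def transcribe(segment: dict) -> dict:
    return {"transcript": "el gobernador hablo de energia"}


def correct(segment: dict) -> dict:
    return {"transcript": segment["transcript"].capitalize()}


def summarize(segment: dict) -> dict:
    return {"summary": segment["transcript"][:20]}


segment = {"id": "seg-1"}
print(run_pipeline(segment, [("transcribe", transcribe), ("correct", correct)]))
# A reprocess with one new step only runs the step that is missing.
print(run_pipeline(segment, [("transcribe", transcribe), ("correct", correct), ("summarize", summarize)]))
```

Because each step checks its own completion state, a reprocess can be triggered freely without paying for transcription or LLM calls twice.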
The Tech Stack
The case study ran on a fairly minimal setup: FastAPI, Postgres, Celery, a Next.js frontend, and RunPod for transcription. The production stack evolved from there:
- Backend: FastAPI with SQLAlchemy 2.0, PostgreSQL with pgvector, Redis + Celery for task orchestration
- Frontend: Next.js with TypeScript, Tailwind CSS, shadcn/ui
- Recording: Custom Python edge recorders with FFmpeg, running on local hardware (a mini PC for Puerto Rico radio, another for TV via HLS streams)
- AI Pipeline: Whisper on RunPod for transcription, Claude and GPT-4 for correction/segmentation/extraction/summarization
- Infrastructure: DigitalOcean App Platform for the web services, local hardware for recording and transcription
- Search: Hybrid semantic + full-text search powered by pgvector embeddings
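The post does not say how the semantic and full-text results are combined. One common option is reciprocal rank fusion (RRF), sketched below purely as an assumption: each result list contributes 1 / (k + rank) to a document's score, and documents that rank well in both lists rise to the top. The segment IDs and the constant `k = 60` are illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of IDs into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result lists from the two search modes.
semantic_hits = ["seg-12", "seg-7", "seg-3"]
fulltext_hits = ["seg-7", "seg-40", "seg-12"]
print(reciprocal_rank_fusion([semantic_hits, fulltext_hits]))
```

RRF only needs ranks, not comparable scores, which makes it a convenient fit when one list comes from vector distance and the other from a text-relevance score.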
One thing I’m happy with is how the recording infrastructure turned out. Each edge recorder manages its own buffer, processing queue, and upload pipeline. If the network drops, segments accumulate locally and catch up when connectivity returns. If a segment fails to process, it can be backfilled on demand. For a system that needs to record 24/7 without gaps, this resilience matters.
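The buffer-and-catch-up behavior can be illustrated with a short sketch. It assumes segments are written to a local directory with timestamped filenames (so sorting by name is chronological) and that the uploader is passed in as a function; in production that function would push to S3. The name `flush_buffer` and the `.mp3` extension are assumptions, not details from the actual recorders.

```python
from pathlib import Path
from typing import Callable


def flush_buffer(buffer_dir: Path, upload: Callable[[Path], None]) -> int:
    """Upload buffered segments oldest first; stop at the first network failure."""
    uploaded = 0
    # sorted() materializes the listing, so deleting files mid-loop is safe.
    for segment in sorted(buffer_dir.glob("*.mp3")):
        try:
            upload(segment)
        except OSError:
            # Covers ConnectionError and timeouts: leave this segment and all
            # later ones on disk so the next flush can catch up.
            break
        segment.unlink()  # only delete the local copy after a successful upload
        uploaded += 1
    return uploaded
```

Calling `flush_buffer` on a timer gives the catch-up behavior for free: while the network is down nothing is deleted, and the first successful pass drains the backlog in order.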
Agents, Not Alerts
The product positioning went through a few iterations, but we landed on something that resonates: AI agents, not keyword alerts.
A Monitorea agent is a topic, person, or organization you want to monitor — “Governor,” “energy reform,” “your brand.” Each agent watches all monitored stations continuously. When it detects a real discussion (not just a passing mention), it captures the complete context and delivers it as a clip with transcript, summary, and analysis.
The mental model shift matters. Legacy media monitoring services charge per clip and deliver fragmented keyword matches. Human analysts cost $60K+/year and work business hours. Monitorea agents cost a fraction of that and work 24/7. During our beta, a single customer’s agents processed an energy crisis in real time, detecting 77 relevant discussions across multiple stations and delivering clips with analysis within minutes. The customer estimated that this saved about 52 hours of manual review.
What’s Live Today
We’re launching the private beta focused on Puerto Rico, monitoring:
- Radio: NotiUno 630 AM, WAPA Radio 680 AM, WKAQ 580 AM, Radio Isla 1320 AM
- TV: Telemundo Puerto Rico, TeleOnce, WAPA TV, MegaTV
- Podcasts: 11 major Puerto Rican podcasts
The platform includes:
- Agent deployment — Configure what to monitor with multiple keywords per agent
- Discussion clips — AI-generated clips with adjustable timestamps for manual refinement
- Shareable pages — One link per clip with the recording, transcript, and AI analysis, ready to share with clients or leadership
- Semantic search — Search broadcast archives by meaning, not just keywords
- Daily summaries — Automated AI-generated digests delivered by email
- Trends — Track which entities and topics are gaining traction over time
We’re starting with a handful of pilot organizations — PR agencies and communications teams in Puerto Rico — and will expand coverage to Washington D.C. and other markets as we validate.
Full Circle
In 2018, my co-founder and I built Monito Media — a media monitoring platform that reached $10K in MRR before we shut it down. The transcription quality wasn’t there, the false positive rate was too high, and the manual review model didn’t scale. We were too early.
Seven years later, the technology caught up. Whisper gives us 90-98% transcription accuracy in Spanish. LLMs handle segmentation, entity extraction, and summarization without needing custom fine-tuned models for each task. Vector databases enable semantic search that would have been a research project in 2018. And the cost of inference has dropped enough that monitoring dozens of stations is economically viable.
The case study was my way of testing whether the thesis still held. It did. Monitorea is the result.
If you’re in PR, communications, or media monitoring and want early access, reach out at hello@monitorea.ai or book a demo.