Signal Loom AI
Media In · Machine Intelligence Out

Watch a File Become Intelligence

A live demonstration of exactly what happens — step by step — when you process audio through Signal Loom AI.

20s audio input
4.3s processing time
4 segments
4× faster than realtime
0 cloud API costs
Step 1: File Goes In
Any audio or video format
MP3, MP4, M4A, WAV, MOV, FLAC, AVI, MKV — all handled identically. Video files have audio extracted automatically before processing.
Input file: signal-loom-api/uploads/93e2e7cb...test2.wav
Size: 640 KB · Format: WAV, 16 kHz mono, 256 kb/s
Step 2: Audio Extraction + Normalization
ffmpeg → 16kHz mono WAV
All input is normalized to a consistent format before Whisper runs. This is where quality is locked in — Whisper gets clean, standardized input regardless of how the original was recorded.
Processing log
[ffmpeg] Input: 640 KB WAV, 20.0 seconds
Stream #0:0 -> #0:0 (pcm_s16le -> pcm_s16le)
Output: 625 KB, 16 kHz mono WAV
Speed: 4,293× realtime
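The normalization step above boils down to a single ffmpeg invocation. A minimal sketch in Python — the function only builds the argument list, so the flags can be inspected without running anything; the exact flags Signal Loom uses internally are an assumption inferred from the log above:

```python
def normalize_cmd(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that converts any audio/video input
    to 16 kHz mono 16-bit PCM WAV, the format Whisper expects."""
    return [
        "ffmpeg", "-y",       # overwrite output if it already exists
        "-i", src,
        "-vn",                # drop video streams (extracts audio from MP4/MOV/MKV)
        "-ac", "1",           # downmix to mono
        "-ar", "16000",       # resample to 16 kHz
        "-c:a", "pcm_s16le",  # 16-bit PCM, matching the codec in the log
        dst,
    ]

# Running it is one subprocess call (assumes ffmpeg is on PATH):
# subprocess.run(normalize_cmd("input.mp4", "normalized.wav"), check=True)
```

Because ffmpeg handles demuxing, the same command works for every format listed in Step 1 — audio files pass through, video files get their audio track extracted.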
Step 3: Whisper Transcription — MLX on Apple Silicon
mlx-community/whisper-large-v3-turbo · Local inference
Whisper runs entirely on your hardware via Apple's MLX framework. No cloud. No API calls. No per-minute costs. The audio never leaves your infrastructure. Privacy by architecture.
Console output
Detected language: English
[00:00.480 --> 00:05.640] Hello, this is Travis Brady with AIM-T Pulse...
[00:06.460 --> 00:11.160] More extemporaneous narrative through the slides today.
[00:12.980 --> 00:14.900] Excited to tell you about our company.
[00:16.340 --> 00:16.880] Slide two.
Segments: 4 · Chars: 186 · Time: 4.28s
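The transcription call itself is small. A sketch using the open-source `mlx_whisper` package (an assumption about the underlying stack — it requires `pip install mlx-whisper` on Apple Silicon; the import is deferred so the sketch loads on any machine):

```python
def transcribe_local(wav_path: str) -> dict:
    """Run Whisper large-v3-turbo locally via Apple's MLX framework.
    After the first run caches the model weights, no network access
    is needed — inference is entirely on-device."""
    import mlx_whisper  # deferred: only needed when actually transcribing

    return mlx_whisper.transcribe(
        wav_path,
        path_or_hf_repo="mlx-community/whisper-large-v3-turbo",
        word_timestamps=True,  # per-word timing, not just per-segment
    )

# result = transcribe_local("normalized.wav")
# result["language"] is the detected language code,
# result["segments"] a list of dicts with start/end times and text.
```

The returned dict mirrors the console output above: detected language, plus a list of timestamped segments that downstream steps turn into the four output formats.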
Step 4: Structured Output — Four Formats, One Pass
JSON · SRT · VTT · TXT
Every run produces all four output formats simultaneously. JSON for AI systems, SRT and VTT for subtitles and captions, TXT for simple archival. Timestamps on every word in every format.
Structured JSON output
{
  "title": "93e2e7cb...test2",
  "source_kind": "local_av",
  "media_kind": "audio",
  "language": "en",
  "model": "mlx-community/whisper-large-v3-turbo",
  "duration_seconds": 20.0,
  "segments": [
    {
      "segment_id": "S1",
      "start_seconds": 0.48,
      "end_seconds": 5.64,
      "start_time": "00:00:00",
      "end_time": "00:00:06",
      "text": "Hello, this is Travis Brady with AIM-T Pulse and AIM Elemental Health Solutions."
    },
    {
      "segment_id": "S2",
      "start_seconds": 6.46,
      "end_seconds": 11.16,
      "text": "More extemporaneous narrative through the slides today."
    }
  ]
}
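Deriving the subtitle formats from those JSON segments is pure string work, which is why all four formats come from one pass. A sketch (field names follow the JSON above; the timestamp syntax follows the SRT and WebVTT conventions — SRT separates milliseconds with a comma, VTT with a period):

```python
def fmt_ts(seconds: float, sep: str = ",") -> str:
    """Format seconds as HH:MM:SS<sep>mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """SRT: numbered cues, comma before milliseconds."""
    blocks = [
        f"{i}\n{fmt_ts(s['start_seconds'])} --> {fmt_ts(s['end_seconds'])}\n{s['text']}"
        for i, s in enumerate(segments, 1)
    ]
    return "\n\n".join(blocks) + "\n"

def to_vtt(segments: list[dict]) -> str:
    """WebVTT: WEBVTT header, period before milliseconds."""
    cues = [
        f"{fmt_ts(s['start_seconds'], '.')} --> {fmt_ts(s['end_seconds'], '.')}\n{s['text']}"
        for s in segments
    ]
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"

def to_txt(segments: list[dict]) -> str:
    """Plain text: one segment per line, no timing."""
    return "\n".join(s["text"] for s in segments) + "\n"
```

For the S1 segment above, `to_srt` emits a cue starting `00:00:00,480 --> 00:00:05,640` and `to_vtt` the same span with periods — the same timing data, re-serialized per format.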
This is the difference that matters.
A transcript is for humans. Structured, timestamped JSON is for machines.

Every word has a timestamp. Every segment has a time boundary. Every output carries the metadata that makes it useful to AI systems — not just readable by people. That's Signal Loom.