Signal Loom AI
Don't Transcribe it — Loom it.

Watch audio become structured intelligence

A live walkthrough of what Signal Loom actually delivers — not a transcript, but a machine-readable knowledge object your AI systems can search, cite, and act on.

Audio input: 20s
Time to structured output: 4.3s
Timestamped segments: 4
Knowledge objects produced: 1
Output formats: 4
Step 1: Source Received, Provenance Attached
YouTube · File upload · Direct URL · Live stream
Any media source (a YouTube URL, an uploaded file, a direct link, or a live microphone stream) enters the pipeline with its provenance metadata attached. Before processing begins, Signal Loom records what the media is, where it came from, and which platform hosts it.
Source metadata attached
source_ref: youtube.com/watch?v=dQw4w9WgXcQ
source_kind: youtube_url
media_kind: audio
platform: youtube
title: [extracted from video metadata]
Provenance metadata attached
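To make the provenance step concrete, here is a minimal Python sketch of how an input reference could be classified into the fields shown above. The field names (`source_ref`, `source_kind`, `media_kind`, `platform`, `title`) come from the example; the `describe_source` function, the `direct_url` and `file_upload` labels, and the host-matching rules are hypothetical, and live microphone input is left out. Signal Loom's real detection logic may differ.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass
class SourceMetadata:
    # Field names mirror the provenance block above; values are illustrative.
    source_ref: str
    source_kind: str
    media_kind: str = "audio"
    platform: Optional[str] = None
    title: Optional[str] = None  # would be filled later from platform metadata


# Hypothetical list of hosts treated as YouTube.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def describe_source(ref: str) -> SourceMetadata:
    """Hypothetical classifier: record where the media came from before processing."""
    candidate = ref
    # Bare references such as "youtube.com/watch?v=..." have no scheme;
    # add one only when the text starts with a known YouTube host.
    if "://" not in ref and ref.split("/", 1)[0].lower() in YOUTUBE_HOSTS:
        candidate = "https://" + ref
    parsed = urlparse(candidate)
    host = (parsed.hostname or "").lower()
    if parsed.scheme in ("http", "https") and host in YOUTUBE_HOSTS:
        return SourceMetadata(ref, "youtube_url", platform="youtube")
    if parsed.scheme in ("http", "https") and host:
        return SourceMetadata(ref, "direct_url")
    # Anything that is not a web URL is treated as an uploaded file.
    return SourceMetadata(ref, "file_upload")


meta = describe_source("youtube.com/watch?v=dQw4w9WgXcQ")
print(meta.source_kind, meta.platform)  # youtube_url youtube
```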
Step 2: Signal Loom Schema, Temporal + Semantic Intelligence
timestamped segments · language · duration · source reference
Audio is transformed into structured data as soon as it enters the pipeline. Every segment gets a start and end timestamp, and the source gets a provenance reference. What was once unsearchable audio becomes a citable, queryable knowledge object.
Schema dimensions extracted
Provenance: source_kind=youtube_url | platform=youtube
Temporal: 4 segments mapped | start=0.48s | end=16.88s
Language: English (detected, no explicit language flag)
Duration: 20.0 seconds processed in 4.3s (4.6× realtime)
Format: JSON / SRT / VTT / TXT, all from a single pass
Structured intelligence delivered
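The temporal dimension and the throughput figure can both be derived from segment boundaries. This sketch assumes segments are dicts keyed like the JSON example in step 4 (`segment_id`, `start_seconds`, `end_seconds`); the `temporal_summary` helper is hypothetical, and the demo uses only the two segments (S1, S2) whose boundaries appear on this page. The realtime factor is simply audio duration divided by processing time.

```python
from typing import Dict, List


def temporal_summary(segments: List[Dict], duration_seconds: float,
                     processing_seconds: float) -> Dict:
    """Hypothetical helper: summarize segment boundaries and throughput."""
    # Segment windows must be ordered and non-overlapping to be citable.
    for prev, cur in zip(segments, segments[1:]):
        if cur["start_seconds"] < prev["end_seconds"]:
            raise ValueError(f"{cur['segment_id']} overlaps {prev['segment_id']}")
    return {
        "segments": len(segments),
        "start": segments[0]["start_seconds"],
        "end": segments[-1]["end_seconds"],
        # Realtime factor: seconds of audio handled per second of processing.
        "realtime_factor": duration_seconds / processing_seconds,
    }


segments = [
    {"segment_id": "S1", "start_seconds": 0.48, "end_seconds": 5.64},
    {"segment_id": "S2", "start_seconds": 6.46, "end_seconds": 11.16},
]
summary = temporal_summary(segments, duration_seconds=20.0, processing_seconds=4.3)
print(summary["segments"], summary["start"], summary["end"])  # 2 0.48 11.16
```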
Step 3: Don't Transcribe it. Loom it.
Loose threads become the lasting fabric of AI context.
schema-defined JSON · 4 formats · provenance-rich
The output isn't a transcript for humans to read — it's a structured knowledge object for AI systems to reason about. Timestamps enable citation. Segments enable retrieval. Provenance enables source-grounding. This is what makes audio useful to agents, RAG pipelines, and enterprise workflows.
Signal Loom Schema — JSON output
Schema: "Signal Loom Schema v1" (schema.signalloomai.com)
Source: source_kind=youtube_url | source_ref=youtube.com/watch?v=...
Temporal: 4 segments | chars=186 | 4.3s processing
Language: English | model=mlx-community/whisper-large-v3-turbo
What AI systems get:
→ Timestamps for citation: "see segment S2, 6.46s–11.16s"
→ Segments for retrieval: chunk by topic, not by token count
→ Provenance for grounding: always know what video a quote came from
→ Multi-format output: JSON (AI) · SRT (subtitles) · VTT (web) · TXT (archive)
Four formats — one pipeline pass
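One way an AI system could use these fields is to turn each segment into a retrieval chunk that carries its own citation and source. This is a sketch, not Signal Loom's API: the input keys follow the JSON example in step 4, while the `to_chunks` helper and the chunk's `id` and `citation` fields are hypothetical. The citation string copies the "see segment S2, 6.46s–11.16s" pattern above.

```python
from typing import Dict, List


def to_chunks(obj: Dict) -> List[Dict]:
    """Hypothetical helper: one retrieval chunk per segment, with provenance attached."""
    chunks = []
    for seg in obj["segments"]:
        chunks.append({
            "id": f"{obj['source_ref']}#{seg['segment_id']}",
            "text": seg["text"],
            # Provenance travels with the chunk so answers can be grounded.
            "source_ref": obj["source_ref"],
            "citation": (f"see segment {seg['segment_id']}, "
                         f"{seg['start_seconds']:.2f}s–{seg['end_seconds']:.2f}s"),
        })
    return chunks


knowledge_object = {
    "source_ref": "youtube.com/watch?v=...",
    "segments": [
        {"segment_id": "S1", "start_seconds": 0.48, "end_seconds": 5.64,
         "text": "Hello, this is Travis Brady with AIM-T Pulse..."},
    ],
}
for chunk in to_chunks(knowledge_object):
    print(chunk["citation"])  # see segment S1, 0.48s–5.64s
```

Keeping the citation on the chunk itself means a RAG answer can quote the segment and its time window without a second lookup.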
Step 4: Structured Output, Four Formats in One Pass
JSON · SRT · VTT · TXT
Every run produces all four output formats at once: JSON for AI systems, SRT and VTT for subtitles and captions, and TXT for simple archival. The timed formats (JSON, SRT, VTT) all carry the same segment timestamps.
Signal Loom Schema — JSON output (truncated)
{
  "schema": "Signal Loom Schema v1",
  "schema_url": "https://signalloomai.com/schema",
  "source_ref": "youtube.com/watch?v=...",
  "source_kind": "youtube_url",
  "media_kind": "audio",
  "language": "en",
  "duration_seconds": 20.0,
  "segments": [
    {
      "segment_id": "S1",
      "start_seconds": 0.48,
      "end_seconds": 5.64,
      "start_time": "00:00:00",
      "end_time": "00:00:06",
      "text": "Hello, this is Travis Brady with AIM-T Pulse..."
    }
  ],
  "metadata": {
    "schema_dimensions": [
      {
        "dimension": "temporal",
        "description": "timestamped segments with start/end seconds"
      },
      {
        "dimension": "provenance",
        "description": "source reference + platform metadata"
      }
    ]
  }
}
This is the difference that matters.
A transcript is for humans. Structured, timestamped, provenance-rich JSON is for AI systems.

Every segment has a time boundary, and every output carries its source. That makes audio citable in a RAG pipeline, retrievable by timestamp, and groundable in a knowledge graph. Signal Loom doesn't just transcribe: it structures what was said and records where it came from.
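Retrieval by timestamp reduces to finding the segment whose window contains a given time. A minimal sketch using Python's standard `bisect` module, assuming segments are sorted, non-overlapping, and keyed as in the JSON example; `segment_at` is a hypothetical helper, and the demo uses the S1 and S2 boundaries shown on this page.

```python
from bisect import bisect_right
from typing import Dict, List, Optional


def segment_at(segments: List[Dict], t: float) -> Optional[Dict]:
    """Hypothetical lookup: return the segment whose [start, end] window contains t."""
    # Assumes segments are sorted by start time and do not overlap.
    starts = [seg["start_seconds"] for seg in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t <= segments[i]["end_seconds"]:
        return segments[i]
    return None


segments = [
    {"segment_id": "S1", "start_seconds": 0.48, "end_seconds": 5.64},
    {"segment_id": "S2", "start_seconds": 6.46, "end_seconds": 11.16},
]
print(segment_at(segments, 7.0)["segment_id"])  # S2
print(segment_at(segments, 6.0))                # None
```

A time that falls in a gap between segments (6.0s sits between S1 and S2) returns None. For many lookups, the `starts` list would be built once rather than on every call.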