Speaker Recognition: How dijin Knows Who's Talking

In a meeting with five people, a raw transcript is almost useless without knowing who said what. dijin solves this with on-device speaker diarization.

Speaker diarization answers the question "who said what?" — turning a wall of text into an attributed conversation with names, timestamps, and context.

How Voice Enrollment Works

Record a Short Sample

Speak for about 10 seconds. dijin captures your unique vocal characteristics using the on-device microphone.

Voice Embedding Generated

A compact mathematical representation of your voice is created entirely on-device. No server involved.

Stored Locally

The embedding stays on your device and is never uploaded. It is stored in the Keychain with hardware-backed encryption.

Recognition Across Sessions

Days, weeks, months later — dijin matches the voice automatically. No re-enrollment needed.
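The enrollment steps above can be sketched in a few lines of Python. This is a minimal illustration, not dijin's implementation: `toy_embedding` is a hypothetical stand-in for a real speaker encoder (it only summarizes frame energy and zero-crossing rate), and the JSON file is an unencrypted placeholder for the Keychain storage described above. The names `enroll`, `toy_embedding`, and `store_path` are assumptions made for this example.

```python
import json
import math
import os


def l2_normalize(vec):
    """Scale a vector to unit length so embeddings are comparable."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def toy_embedding(samples, frame_len=160):
    """Hypothetical stand-in for a speaker encoder.

    Summarizes per-frame energy and zero-crossing rate as a 4-value,
    unit-length vector. A real system would use a trained model.
    """
    if len(samples) < frame_len:
        raise ValueError("sample is too short to embed")
    energies, zcrs = [], []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        zcrs.append(crossings / (frame_len - 1))

    def mean(xs):
        return sum(xs) / len(xs)

    def std(xs):
        m = mean(xs)
        return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

    return l2_normalize([mean(energies), std(energies), mean(zcrs), std(zcrs)])


def enroll(name, samples, store_path):
    """Compute an embedding locally and persist it under the speaker's name."""
    store = {}
    if os.path.exists(store_path):
        with open(store_path) as f:
            store = json.load(f)
    store[name] = toy_embedding(samples)
    with open(store_path, "w") as f:
        json.dump(store, f)  # placeholder: a real app would use encrypted storage
    return store[name]
```

Nothing in this sketch touches the network: the audio samples go in, a short vector comes out, and only that vector is written to local storage.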

The Diarization Pipeline

Speaker Diarization Pipeline

1. Audio Input
2. VAD (Voice Activity Detection) ──▶ Split into segments
3. Embedding Computation ──▶ Per-segment voice vector
4. Clustering ──▶ Group similar voices
5. Matching ──▶ Compare vs enrolled voiceprints
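To make the segmentation and clustering stages concrete, here is a rough Python sketch. It assumes a simple energy threshold for voice activity detection and a greedy cosine-similarity clustering pass; the article does not say which algorithms dijin uses, so both are swapped-in illustrations, and the function names and thresholds are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def vad_segments(samples, frame_len=160, threshold=0.01):
    """Energy-based VAD: return (start, end) sample ranges that contain speech."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy >= threshold:
            if start is None:
                start = i * frame_len  # speech begins
        elif start is not None:
            segments.append((start, i * frame_len))  # speech ends
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments


def cluster(embeddings, threshold=0.8):
    """Greedy clustering: join the most similar cluster, or open a new one."""
    centroids = []  # running sum of member embeddings per cluster
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            centroids[best] = [c + e for c, e in zip(centroids[best], emb)]
            labels.append(best)
    return labels
```

Each segment returned by `vad_segments` would be passed through the embedding step, and `cluster` then groups the resulting vectors so that segments from the same voice share a cluster index. The final matching stage compares each cluster against the enrolled voiceprints.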

Enrollment: ~10 seconds
Processing: 100% on-device
Persistence: Cross-session
Privacy: Embeddings never uploaded

Privacy by Design

Voice embeddings are one-way mathematical representations. They cannot be reversed into audio. Even if someone accessed the embedding data, they could not reconstruct your voice. Embeddings never leave your device.

Cross-Session Memory

Because dijin stores embeddings locally, speaker identity persists across sessions. Meeting with Ayse on Monday and again on Thursday? dijin recognizes Ayse in both — no re-enrollment needed.

Scenario	What Happens
First meeting with Ayse	dijin clusters the voice; you label it "Ayse"
Meeting 3 days later	dijin auto-recognizes Ayse from the stored embedding
New unknown speaker	Labeled "Speaker 2" until you assign a name
Multiple speakers at once	Each speaker gets a separate embedding and label
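The scenarios in the table can be illustrated with a short matching sketch in Python. It assumes cosine similarity against locally stored voiceprints with a fixed acceptance threshold; the function `label_speakers`, the threshold value, and the example vectors are all hypothetical, and dijin's actual matching logic may differ.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def label_speakers(cluster_embeddings, enrolled, threshold=0.75):
    """Name each session cluster from enrolled voiceprints.

    Clusters with no match above the threshold get a positional
    placeholder such as "Speaker 2" until the user assigns a name.
    """
    labels = []
    for position, emb in enumerate(cluster_embeddings, start=1):
        best_name, best_sim = None, threshold
        for name, voiceprint in enrolled.items():
            sim = cosine_similarity(emb, voiceprint)
            if sim >= best_sim:
                best_name, best_sim = name, sim
        labels.append(best_name if best_name else f"Speaker {position}")
    return labels


# Illustrative values only: one enrolled voiceprint, two voices in the session.
enrolled = {"Ayse": [0.6, 0.8]}
session = [[0.59, 0.81], [0.9, -0.4]]
labels = label_speakers(session, enrolled)

# Assigning a name to the unknown speaker stores their embedding for next time.
enrolled["Mert"] = session[1]
relabelled = label_speakers(session, enrolled)
```

In this sketch, naming a speaker simply adds their embedding to the local store, which is what lets the next session recognize them without re-enrollment.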

This separates a transcription tool from a memory system. dijin doesn't just record words — it remembers who said them.