Speaker Recognition: How dijin Knows Who's Talking
In a meeting with five people, a raw transcript is almost useless without knowing who said what. dijin solves this with on-device speaker diarization.
How Voice Enrollment Works
Record a Short Sample
Speak for about 10 seconds. dijin captures your unique vocal characteristics using the on-device microphone.
Voice Embedding Generated
A compact mathematical representation of your voice is created entirely on-device. No server involved.
Stored Locally
The embedding stays on your device. Not uploaded anywhere. Hardware-encrypted in Keychain.
Recognition Across Sessions
Days, weeks, or months later, dijin matches the voice automatically. No re-enrollment needed.
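The matching step can be sketched as a similarity search over stored embeddings. The source does not describe how dijin compares voices, so this is a minimal illustration that assumes cosine similarity with a fixed threshold, a common way to compare voice embeddings. The names `cosine_similarity` and `match_speaker`, the 0.75 threshold, and the 3-dimensional toy vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no voice information
    return dot / (norm_a * norm_b)


def match_speaker(embedding, enrolled, threshold=0.75):
    """Return the best-matching enrolled name, or None if no score reaches the threshold.

    The threshold value is an assumption chosen for this example only.
    """
    best_name, best_score = None, threshold
    for name, stored in enrolled.items():
        score = cosine_similarity(embedding, stored)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy embeddings for illustration; real voice embeddings are much longer vectors.
enrolled = {"you": [0.9, 0.1, 0.2]}          # created once at enrollment
same_voice_later = [0.85, 0.15, 0.25]        # a later session, same speaker
different_voice = [0.1, 0.9, 0.3]            # someone who never enrolled

print(match_speaker(same_voice_later, enrolled))
print(match_speaker(different_voice, enrolled))
```

Because the comparison only needs the stored vector and the new one, a scheme like this can run without sending audio or embeddings off the device.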
The Diarization Pipeline
Privacy by Design
Cross-Session Memory
Because dijin stores embeddings locally, speaker identity persists across sessions. Meeting with Ayse on Monday and again on Thursday? dijin recognizes Ayse in both, with no re-enrollment needed.
| Scenario | What Happens |
|---|---|
| First meeting with Ayse | dijin clusters the voice; you label it "Ayse" |
| Meeting 3 days later | dijin auto-recognizes Ayse from the stored embedding |
| New unknown speaker | Labeled "Speaker 2" until you assign a name |
| Multiple speakers at once | Each speaker gets a separate embedding and label |
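The scenarios in the table can be walked through with a small in-memory registry. This is a sketch under stated assumptions, not dijin's implementation: the `SpeakerRegistry` class, its `identify` and `rename` methods, the cosine-similarity comparison, and the 0.75 threshold are all hypothetical, and persistence (the Keychain storage described earlier) is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class SpeakerRegistry:
    """Hypothetical store of labeled embeddings that outlives a single meeting."""

    def __init__(self, threshold=0.75):
        self.threshold = threshold  # assumed value, for illustration only
        self.speakers = []          # list of [label, embedding] pairs

    def identify(self, embedding):
        """Return the label of a known speaker, or register a new placeholder."""
        best_index, best_score = None, self.threshold
        for i, (_, stored) in enumerate(self.speakers):
            score = cosine_similarity(embedding, stored)
            if score >= best_score:
                best_index, best_score = i, score
        if best_index is None:
            # Unknown voice: store it under "Speaker N" until the user names it.
            placeholder = f"Speaker {len(self.speakers) + 1}"
            self.speakers.append([placeholder, list(embedding)])
            best_index = len(self.speakers) - 1
        return self.speakers[best_index][0]

    def rename(self, old_label, new_label):
        """Replace a label; returns False if the old label is not found."""
        for entry in self.speakers:
            if entry[0] == old_label:
                entry[0] = new_label
                return True
        return False


registry = SpeakerRegistry()

# Monday: first meeting, the voice is stored and then labeled by the user.
first_label = registry.identify([0.2, 0.9, 0.1])
registry.rename(first_label, "Ayse")

# Thursday: the same voice is recognized, and a new voice gets a placeholder.
thursday_label = registry.identify([0.25, 0.85, 0.15])
stranger_label = registry.identify([0.9, 0.1, 0.4])
print(first_label, thursday_label, stranger_label)
```

Keeping the label separate from the embedding means a placeholder can be renamed later without touching the stored voice data.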
Cross-session recognition is what separates a transcription tool from a memory system. dijin doesn't just record words; it remembers who said them.