99 Languages, One Model: How dijin Handles Every Accent
dijin supports 99 languages for transcription, not by downloading 99 different models, but by running a single on-device speech engine with per-language audio processing profiles.
One Model, Many Profiles
The on-device speech engine natively supports 99 languages. dijin adds a layer of per-language optimization:
| Tier | Languages | Profile Type | Tuning |
|---|---|---|---|
| Optimized (16) | en, tr, ja, zh, ko, es, fr, de, it, pt, ru, nl, ar, sv, no, fi | Fine-tuned | Custom VAD, AGC, quality thresholds |
| Default (83) | vi, th, id, pl, uk, hi, cs, el, hu, ro, da, sk, and 71 more | Robust default | Works well for most accents and environments |
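
The two tiers above amount to a lookup with a fallback. The snippet below is a minimal sketch of that idea, not dijin's actual code: the names `OPTIMIZED_LANGUAGES` and `profile_tier` are hypothetical, and the region-suffix handling is an assumption added for illustration.

```python
# Hypothetical sketch: names are illustrative, not dijin's API.
# The 16 codes are the "Optimized" tier listed in the table above.
OPTIMIZED_LANGUAGES = frozenset({
    "en", "tr", "ja", "zh", "ko", "es", "fr", "de",
    "it", "pt", "ru", "nl", "ar", "sv", "no", "fi",
})


def profile_tier(language_code: str) -> str:
    """Return 'optimized' for the 16 fine-tuned languages, else 'default'."""
    # Assumption: normalize case and drop a region suffix ("pt-BR" -> "pt").
    base = language_code.strip().lower().replace("_", "-").split("-")[0]
    return "optimized" if base in OPTIMIZED_LANGUAGES else "default"


print(profile_tier("ja"))     # optimized
print(profile_tier("vi"))     # default
print(profile_tier("pt-BR"))  # optimized
```

Falling back to the default tier for any unlisted code means an unexpected language never fails the lookup; it simply gets the robust default profile.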
How Language Detection Works
1. Audio Segment Captured: Voice Activity Detection (VAD) identifies speech segments in the audio stream.
2. Language Auto-Detection: The speech engine automatically detects the spoken language from the audio content.
3. Profile Loading: The appropriate per-language audio processing profile is loaded for optimal results.
4. Transcription Output: Text is produced in the detected language with language-specific optimizations applied.
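
The four stages above can be sketched as a small loop. This is only an illustration under stated assumptions: `transcribe_segments` and the three `stub_*` functions are hypothetical placeholders, not dijin's API, and the input is assumed to be speech segments that a VAD stage has already cut out of the audio stream.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Segment = List[float]      # assumed: raw audio samples for one speech segment
Profile = Dict[str, str]   # assumed: a bag of per-language settings


def transcribe_segments(
    segments: Iterable[Segment],
    detect_language: Callable[[Segment], str],
    load_profile: Callable[[str], Profile],
    transcribe: Callable[[Segment, str, Profile], str],
) -> List[Tuple[str, str]]:
    """Run detection, profile loading, and transcription per segment."""
    results = []
    for segment in segments:                           # stage 1: VAD output
        language = detect_language(segment)            # stage 2: auto-detect
        profile = load_profile(language)               # stage 3: load profile
        text = transcribe(segment, language, profile)  # stage 4: transcribe
        results.append((language, text))
    return results


# Placeholder stages so the sketch runs; a real engine replaces all three.
def stub_detect(segment: Segment) -> str:
    return "ja"


def stub_profile(language: str) -> Profile:
    return {"language": language, "tier": "optimized"}


def stub_transcribe(segment: Segment, language: str, profile: Profile) -> str:
    return f"<{len(segment)} samples, {language}, {profile['tier']} profile>"


print(transcribe_segments([[0.1, 0.2, 0.3]], stub_detect, stub_profile, stub_transcribe))
```

Passing the stages in as callables keeps the ordering explicit and lets each one be swapped or tested on its own.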
Per-Language Audio Profiles
Different languages have different acoustic characteristics: Japanese has distinct pitch patterns, Arabic has emphatic consonants, and tonal languages like Mandarin need different frequency analysis. dijin's profiles tune:
| Parameter | What It Controls | Why It Matters |
|---|---|---|
| VAD Sensitivity | Speech vs silence detection | Tonal languages need different thresholds |
| AGC Parameters | Automatic gain control | Language dynamics vary (volume, pace) |
| Quality Thresholds | Confidence calibration | Per-language accuracy optimization |
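
To make the three parameters concrete, here is a rough sketch of how a profile could be represented and applied. Everything in it is an assumption for illustration: the `AudioProfile` fields, the simple RMS energy threshold standing in for VAD, the capped-gain AGC, and all numeric values are invented, and none of it is confirmed to match dijin's internals.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AudioProfile:
    """Hypothetical per-language profile; field names are illustrative."""
    vad_threshold: float    # RMS level at or above which a frame is speech
    agc_target_rms: float   # loudness the gain stage aims for
    agc_max_gain: float     # cap so quiet noise is not over-amplified
    min_confidence: float   # results below this count as low quality


def rms(frame: List[float]) -> float:
    """Root-mean-square level of a frame (0.0 for an empty frame)."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def is_speech(frame: List[float], profile: AudioProfile) -> bool:
    """Simple energy-threshold VAD: compare frame RMS with the profile."""
    return rms(frame) >= profile.vad_threshold


def apply_agc(frame: List[float], profile: AudioProfile) -> List[float]:
    """Scale the frame toward the target RMS, limited by the maximum gain."""
    level = rms(frame)
    if level == 0.0:
        return list(frame)  # pure silence: nothing to scale
    gain = min(profile.agc_target_rms / level, profile.agc_max_gain)
    return [s * gain for s in frame]


def accept(confidence: float, profile: AudioProfile) -> bool:
    """Keep a result only if it meets the profile's confidence threshold."""
    return confidence >= profile.min_confidence


# Invented numbers, for illustration only.
DEFAULT_PROFILE = AudioProfile(
    vad_threshold=0.02, agc_target_rms=0.1, agc_max_gain=8.0, min_confidence=0.5
)
TUNED_PROFILE = AudioProfile(
    vad_threshold=0.005, agc_target_rms=0.1, agc_max_gain=8.0, min_confidence=0.6
)

quiet_frame = [0.01, 0.01, 0.01, 0.01]
print(is_speech(quiet_frame, DEFAULT_PROFILE), is_speech(quiet_frame, TUNED_PROFILE))
```

The same quiet frame is treated as silence under one profile and as speech under the other, which is the kind of per-language difference the table describes.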