
Natural-sounding real-time TTS with voice cloning and sub-60 ms latency, available on-prem or via API. Grammar-aware normalization reads phone numbers, IBANs, addresses, and medication names naturally across 25+ languages, with word-level timestamps and IPA support. Adapters for LiveKit, Pipecat, and Vapi. Built by a team of four in Berlin.
KugelAudio is a real-time text-to-speech model with voice cloning and sub-60 ms latency, available for self-hosting or via API. It supports more than 25 languages and includes grammar-aware normalization and word-level timestamps.
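
To make "normalization" concrete, here is a minimal Python sketch of the kind of rewriting a TTS front end does before synthesis: reading a phone number digit by digit and splitting an IBAN into groups of four. It is an illustration only, not KugelAudio's implementation or API; the helper names `spell_phone_number` and `group_iban` are hypothetical, and a grammar-aware system would also handle context, inflection, and language-specific rules that this sketch ignores.

```python
import re

# English digit names; a real system would need one table per language.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_phone_number(text: str) -> str:
    """Hypothetical helper: read a phone number digit by digit.

    A leading "+" is spoken as "plus"; spaces, dashes, and other
    separators are dropped.
    """
    spoken = []
    for ch in text:
        if ch == "+":
            spoken.append("plus")
        elif ch in DIGIT_WORDS:
            spoken.append(DIGIT_WORDS[ch])
    return " ".join(spoken)


def group_iban(iban: str) -> list[str]:
    """Hypothetical helper: split an IBAN into groups of four characters,
    the way it is usually printed and read aloud."""
    compact = re.sub(r"\s+", "", iban).upper()
    return [compact[i:i + 4] for i in range(0, len(compact), 4)]


if __name__ == "__main__":
    print(spell_phone_number("+49 30 12"))
    print(group_iban("DE89 3704 0044 0532 0130 00"))
```

Without a step like this, a synthesizer might read "+49 30 12" as a sum or as the cardinal numbers forty-nine, thirty, and twelve, which is rarely what a caller expects to hear.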