Build ultra-realistic voice agents with Sesame CSM-1B

We're excited to announce that Vogent now supports ultra-realistic voices from Sesame's CSM-1B model. This wasn't a simple integration: our team re-architected Sesame's voice model from the ground up to make it low-latency. In our testing, Vogent's CSM-1B implementation generates audio within 200-400 milliseconds, which is faster than even the fastest text-to-speech vendors on the market. And because Vogent hosts the re-architected model, Sesame voices are available at no additional cost and are included in HIPAA-compliant workspaces.

Accessing Sesame Voices

To use a Sesame voice, choose a voice with the Sesame badge in the Voice dropdown on your agent's Config page.

Vogent's Sesame inference runs in the US, which is where the latency benchmarks were measured. Users outside the US may see additional latency.
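If you want to check latency from your own region, one approach is to time how long a streaming synthesis call takes to return its first audio chunk. The sketch below assumes the 200-400 ms figure refers to time until the first audio arrives, and uses a hypothetical `synthesize_stream` callable (text in, audio byte chunks out) as a stand-in for whatever client you use. It is not Vogent's API.

```python
import statistics
import time
from typing import Callable, Iterable

# Hypothetical signature: takes text, yields raw audio chunks as they are generated.
StreamFn = Callable[[str], Iterable[bytes]]


def time_to_first_audio(synthesize_stream: StreamFn, text: str) -> float:
    """Return seconds elapsed until the first non-empty audio chunk arrives."""
    start = time.perf_counter()
    for chunk in synthesize_stream(text):
        if chunk:  # skip empty chunks (e.g. keep-alives) that carry no audio
            return time.perf_counter() - start
    raise RuntimeError("stream ended without producing audio")


def median_latency_ms(synthesize_stream: StreamFn, text: str, runs: int = 5) -> float:
    """Median time-to-first-audio over several runs, in milliseconds."""
    samples = [time_to_first_audio(synthesize_stream, text) for _ in range(runs)]
    return statistics.median(samples) * 1000.0
```

Taking the median over several runs smooths out one-off network spikes; running the same measurement from inside and outside the US would show how much of the latency is network round-trip.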

Private Beta Features

Voice Cloning

Vogent also supports creating Sesame voice clones. This feature is in beta. To access it, email jagath@vogent.ai with the subject line "Sesame Voice Clone", and we'll enable it in your workspace.

Text-to-speech API

We're also releasing our low-latency CSM-1B implementation as a TTS API. The API is in private beta and will be released soon. For early access, email jagath@vogent.ai with the subject line "Sesame TTS API", and we'll provide you with endpoints and keys.
