Airgentic Help

Deep-Dive: Voice Mode

This module covers voice interaction capabilities in Airgentic. Read this when your service has voice mode enabled or when you're planning to enable it.


Who this is for

  • Platform administrators configuring voice settings
  • Anyone planning to add voice capabilities to their service
  • Content owners who want to understand how voice affects responses

What you'll learn

  • How voice mode works
  • Configuration options for voice interactions
  • How voice affects response style
  • Best practices for voice-enabled services

How voice mode works

The user experience

When voice mode is enabled:

  1. A microphone button appears in the chat widget
  2. User clicks/taps to speak their question
  3. Speech is converted to text
  4. The AI processes the question normally
  5. Response is returned as both text and audio
  6. User can continue by voice or switch to text

Voice and text can be mixed freely — users aren't locked into one mode.
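The flow above can be sketched as a simple pipeline. This is a minimal illustration only — the functions `speech_to_text`, `run_agent`, and `text_to_speech` are hypothetical stand-ins, not the Airgentic API:

```python
# Hypothetical sketch of the voice turn described above.
# Each helper stands in for a platform step we don't see directly.

def speech_to_text(audio: bytes) -> str:
    """Stand-in for step 3: speech recognition."""
    return "What are your opening hours?"

def run_agent(question: str) -> str:
    """Stand-in for step 4: normal AI processing (same agents, routing, search)."""
    return "We're open 9am to 5pm, Monday to Friday."

def text_to_speech(text: str) -> bytes:
    """Stand-in for step 5: audio synthesis of the response."""
    return text.encode("utf-8")

def handle_voice_turn(audio: bytes) -> tuple[str, bytes]:
    question = speech_to_text(audio)
    answer = run_agent(question)
    # Step 5: the response is returned as both text and audio.
    return answer, text_to_speech(answer)
```

The point of the sketch: only the first and last steps are voice-specific; the middle is the same processing a text question gets.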

Behind the scenes

Voice mode uses the same:
- Agents and routing
- Knowledge sources and search
- Functions and capabilities

The main difference is how responses are phrased — voice responses may be adapted for spoken delivery.


Configuration options

Voice settings are found in Service Configuration → Voice Mode.

Enable/disable

Toggle voice mode on or off. When disabled, the microphone button doesn't appear.

Voice preamble prompt

A block of instructions prepended to every voice interaction:

What to include:
- Language and accent preferences
- Tone adjustments for spoken delivery
- Instructions to keep responses concise
- Guidance on pronunciation or terminology

Example:

Speak in Australian English with a friendly, conversational tone. 
Keep responses brief and easy to follow when spoken aloud.
Avoid long lists; summarise instead.

The preamble ensures consistent voice personality across all interactions.
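As a rough illustration of how the preamble behaves, the settings above could be modelled like this — the field names and `build_voice_prompt` helper are invented for the example, not Airgentic's actual configuration schema:

```python
# Illustrative only: hypothetical keys, not Airgentic's real schema.
voice_config = {
    "enabled": True,  # when False, the microphone button doesn't appear
    "preamble": (
        "Speak in Australian English with a friendly, conversational tone. "
        "Keep responses brief and easy to follow when spoken aloud. "
        "Avoid long lists; summarise instead."
    ),
}

def build_voice_prompt(config: dict, user_question: str) -> str:
    """The preamble is prepended to every voice interaction."""
    return f"{config['preamble']}\n\n{user_question}"
```

Because the preamble is prepended to every turn, the voice personality stays consistent no matter which agent handles the question.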

Voice welcome prompt

The initial prompt for voice sessions — defines the greeting and initial behaviour:

What to include:
- How to greet the user
- What to offer or ask

Example:

You are a helpful assistant for {{title}}. Greet the user warmly and ask how you can help them today.

The {{title}} substitution inserts your service name.
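The substitution is a simple template replacement. A sketch, with "Acme Support" as a made-up service name:

```python
# Minimal sketch of the {{title}} substitution; the service name is an example.
def render_welcome_prompt(template: str, title: str) -> str:
    return template.replace("{{title}}", title)

prompt = render_welcome_prompt(
    "You are a helpful assistant for {{title}}. "
    "Greet the user warmly and ask how you can help them today.",
    "Acme Support",
)
# prompt now begins "You are a helpful assistant for Acme Support."
```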


Voice vs text differences

Response style

Voice responses often need different phrasing:

Text → Voice
  • Long, detailed answers → Shorter, summarised answers
  • Bullet lists → Numbered or flowing sequences
  • Exact URLs → "I'll show you a link" (the URL still appears in the text version)
  • Technical precision → Plain language explanations

The AI naturally adapts when in voice mode, but you can guide this further in your preamble.
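To make the "bullet lists → flowing sequences" adaptation concrete, here is a toy illustration of the idea. The AI does this itself; `bullets_to_spoken` is not a real function, just a sketch of the transformation:

```python
# Toy illustration: turn a bulleted text answer into one flowing
# sentence that is easier to follow when spoken aloud.
def bullets_to_spoken(text: str) -> str:
    items = [line.lstrip("- ").strip() for line in text.splitlines() if line.strip()]
    if len(items) <= 1:
        return text
    return "You have a few options: " + ", ".join(items[:-1]) + ", and " + items[-1] + "."

# bullets_to_spoken("- Red\n- Green\n- Blue")
# → "You have a few options: Red, Green, and Blue."
```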

Handling complexity

Some content works better in text:
- Long lists of options
- Detailed specifications
- Step-by-step instructions with many steps

Consider how your content will sound when spoken. Very technical or list-heavy content may be harder to follow in voice.


Best practices

Design for listening

  • Assume users can't easily re-read — be clear the first time
  • Avoid jargon that's hard to understand when heard
  • Use natural pauses and structure in responses

Keep it conversational

Voice interactions feel more personal. The AI should sound like a helpful colleague, not a document being read aloud.

Test with voice

When testing in Admin Chat:
- Enable voice mode
- Actually speak your test questions
- Listen to the responses
- Check if they sound natural and helpful

Consider your audience

Voice may be particularly useful for:
- Mobile users
- Accessibility needs
- Hands-busy scenarios
- Users who prefer speaking to typing

Not every service needs voice. Enable it when it adds value for your users.


Limitations

What voice mode doesn't change

  • Same knowledge sources and search
  • Same agent routing
  • Same functions and capabilities
  • Same curated answers

Technical considerations

  • Requires user's microphone access
  • Speech recognition accuracy varies by environment
  • Audio playback needs speakers or headphones
  • May use more data than text-only

Content considerations

  • Very long responses are hard to follow in audio
  • Complex information may be better delivered in text
  • Users can always read the text version alongside audio


Back to: Optional Deep-Dives
