Outbound Call AI Streaming

Lê Đức Tuệ · 6/2/2026

WebSocket protocol for AI voice bot integration with Voice Gateway. The vendor provides a WebSocket server according to the spec below; the Gateway will connect when the call starts and exchange audio/control events throughout the call. Reference model: Twilio Media Streams.

Version 1.0 · Audience: AI Voice Bot Vendor

Table of Contents

  1. Overview & Architecture

  2. Processing Flow — Sequence diagrams

  3. REST API — Initiate call

  4. WebSocket Connection

  5. Audio Format

  6. Events — Gateway → Bot

  7. Events — Bot → Gateway

  8. Contact


01. Overview & Architecture

The integration has two parts: a REST API that the vendor calls to ask the Gateway to dial, and a WebSocket over which the Gateway exchanges audio and control events with the bot after the caller picks up.

Half-duplex in v1: The bot sends its audio response first, then the Gateway forwards the caller's audio. The bot does not need to handle barge-in in this version.

1.1 Overall Architecture

          VENDOR SIDE                                    PLATFORM SIDE                   PSTN
┌──────────────────────────┐              ┌──────────────────────────────────┐         ┌────────┐
│                          │              │                                  │         │        │
│  ┌────────────────────┐  │   1. REST    │  ┌────────────────────────────┐  │   SIP   │        │
│  │                    │──┼──────────────┼─►│                            │─────────── ──►      │
│  │  REST client       │  │   POST       │  │     Voice Gateway          │  │         │ Caller │
│  │  (vendor app)      │◄─┼──────────────┼──│     (REST server)          │◄─────────────       │
│  │                    │  │   200 OK     │  │                            │  │         │        │
│  └────────────────────┘  │              │  └──────────────┬─────────────┘  │         │        │
│                          │              │                 │                │         │        │
│  ┌────────────────────┐  │              │                 │ 2. WS connect  │         │        │
│  │                    │◄─┼──────────────┼─────── WSS ─────┘ (after caller  │         │        │
│  │  AI Voice Bot      │  │              │                   picks up)     │         │        │
│  │  (WebSocket        │──┼──────────────┼─────── WSS ─────────────────────►│         │        │
│  │   server)          │  │              │                                  │         └────────┘
│  │                    │  │              │                                  │
│  │  STT → LLM → TTS   │  │              │                                  │
│  └────────────────────┘  │              │                                  │
│                          │              │                                  │
└──────────────────────────┘              └──────────────────────────────────┘

| Component | Role |
| --- | --- |
| Voice Gateway | WebSocket client (to the bot), also a REST server (receiving dial commands from the vendor) |
| AI Voice Bot | WebSocket server (receiving connections from the Gateway), also a REST client (calling the dial API) |

1.2 Flow Summary

| Step | Action | Description |
| --- | --- | --- |
| 1 | Vendor requests dialing via REST API | Call POST /v1/voice/callbot with phone, campaignId, transactionId, socketUrl, and personalized metadata. The API is asynchronous and responds immediately with 200 OK. |
| 2 | Gateway dials via SIP/PSTN | The Gateway places an outbound call to the caller. If the caller does not pick up, no WebSocket is opened. |
| 3 | Gateway opens WebSocket to bot | After the caller picks up, the Gateway connects to the vendor's socketUrl and sends connected, then start with the call metadata. |
| 4 | Exchange audio/control | The bot sends its audio response (media) followed by a mark. The Gateway plays the audio for the caller and echoes the mark back when playback finishes. The bot then activates ASR, and the Gateway forwards the caller's audio (media, track inbound). |
| 5 | End the call | The caller hangs up, the bot sends stop, or the bot sends transfer to hand the call to an agent. The Gateway sends stop and closes the WS with close code 1000. |


02. Processing Flow — Sequence diagrams

Three common scenarios: happy path with transfer, bot actively ends, caller does not pick up.

2.1 Happy path — from dialing to call end


    AI Voice Bot                                 Voice Gateway                                   Caller
    ════════════                                 ═════════════                                   ══════
          │                                            │                                            │
════  PHASE 0 · Vendor requests dialing (REST API)  ══════════════════════════════════════════════════════════
          │                                            │                                            │
          │   POST /v1/voice/callbot                   │                                            │
          ├───────────────────────────────────────────►│                                            │
          │phone, campaignId, transactionId, socketUrl │                                            │
          │                                            │                                            │
          │   200 OK  {error_code: "success"}          │                                            │
          │◄───────────────────────────────────────────┤                                            │
          │                                            │                                            │
          │                                            │   Dial (SIP / PSTN)                        │
          │                                            ├───────────────────────────────────────────►│
          │                                            │   Picks up                                 │
          │                                            │◄───────────────────────────────────────────┤
          │                                            │                                            │
════  PHASE 1 · WebSocket handshake & start  ═════════════════════════════════════════════════════════════════
          │                                            │                                            │
          │   WS connect                               │                                            │
          │◄───────────────────────────────────────────┤                                            │
          │      wss://.../ws/voice?api_key=...        │                                            │
          │                                            │                                            │
          │   101 Switching Protocols                  │                                            │
          ├───────────────────────────────────────────►│                                            │
          │                                            │                                            │
          │   connected                                │                                            │
          │◄───────────────────────────────────────────┤                                            │
          │                                            │                                            │
          │   start                                    │                                            │
          │◄───────────────────────────────────────────┤                                            │
          │         call_sid, metadata.custom          │                                            │
          │                                            │                                            │
════  PHASE 2 · Conversation loop (once per turn)  ═══════════════════════════════════════════════════════════
          │                                            │                                            │
            ╭── [ BOT SPEAKS ] ───────────────────────────────────────────────────────────────────╮
          │                                            │                                            │
          │   media                                    │                                            │
          ├───────────────────────────────────────────►│                                            │
          │          audio chunks (PCM 8kHz)           │                                            │
          │                                            │                                            │
          │                                            │   play audio                               │
          │                                            ├┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈►│
          │                                            │                                            │
          │   mark  name: "turn_N_done"                │                                            │
          ├───────────────────────────────────────────►│                                            │
          │                                  [ wait for playback ]                                  │
          │   mark (echo)                              │                                            │
          │◄───────────────────────────────────────────┤                                            │
            ╰─────────────────────────────────────────────────────────────────────────────────────╯
          │                                            │                                            │
            ╭── [ CALLER SPEAKS ] ────────────────────────────────────────────────────────────────╮
          │                                            │                                            │
          │                                            │   caller speaks                            │
          │                                            │◄┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┤
          │                                            │                                            │
          │   media  track: "inbound"                  │                                            │
          │◄───────────────────────────────────────────┤                                            │
            ╰─────────────────────────────────────────────────────────────────────────────────────╯
          │                                            │                                            │
          │                      … bot processes, then back to "Bot speaks" …                       │
          │                                            │                                            │
════  PHASE 3 · Transfer the call to an agent  ═══════════════════════════════════════════════════════════════
          │                                            │                                            │
          │   transfer  target: "agent_01"             │                                            │
          ├───────────────────────────────────────────►│                                            │
          │                                            │                                            │
          │                                            │   SIP REFER / bridge → agent               │
          │                                            ├───────────────────────────────────────────►│
          │                                            │                                            │
          │   stop  reason: "transferred"              │                                            │
          │◄───────────────────────────────────────────┤                                            │
          │                                            │                                            │

Regarding mark: The bot sends a mark after each audio segment, and the Gateway echoes the mark once the caller has finished hearing that audio. The bot uses this signal to know when to activate ASR and process the caller's response.
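The mark round trip can be tracked with a small state holder. The following is a minimal sketch, not part of the spec: the class name `TurnTracker` and its methods are hypothetical, and the mark names simply follow the `turn_N_done` pattern shown in the diagram.

```python
from typing import Optional


class TurnTracker:
    """Tracks the mark the bot is waiting for (hypothetical helper, not defined by the spec)."""

    def __init__(self) -> None:
        self.turn = 0
        self.pending_mark: Optional[str] = None
        self.asr_active = False

    def bot_finished_sending(self) -> dict:
        """Call after the last media chunk of a turn; returns the mark event to send."""
        self.turn += 1
        self.pending_mark = f"turn_{self.turn}_done"
        self.asr_active = False  # the caller is still listening to the bot's audio
        return {"event": "mark", "mark": {"name": self.pending_mark}}

    def on_gateway_event(self, event: dict) -> None:
        """Activate ASR only when the Gateway echoes the mark we are waiting for."""
        if event.get("event") != "mark":
            return
        if event.get("mark", {}).get("name") == self.pending_mark:
            self.pending_mark = None
            self.asr_active = True
```

Keeping ASR off until the echo arrives matches the half-duplex model of v1, where the bot never has to listen while its own audio is still playing.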

2.2 Bot actively ends the call


    AI Voice Bot                                 Voice Gateway
    ════════════                                 ═════════════
          │                                            │
════  The bot actively ends the call  ════════════════════════════════════════════════════════════════════════
          │                                            │
          │   media  (last chunk)                      │
          ├───────────────────────────────────────────►│
          │                                            │
          │   stop  reason: "conversation_complete"    │
          ├───────────────────────────────────────────►│
          │                                            │
          │             [ drain buffer ]               │
          │       play remaining audio to caller       │
          │                                            │
          │   stop (ack)  close code 1000              │
          │◄───────────────────────────────────────────┤
          │                                            │

2.3 Caller does not pick up


    AI Voice Bot                                 Voice Gateway                                   Caller
    ════════════                                 ═════════════                                   ══════
          │                                            │                                            │
          │   POST /v1/voice/callbot                   │                                            │
          ├───────────────────────────────────────────►│                                            │
          │                                            │                                            │
          │   200 OK  {error_code: "success"}          │                                            │
          │◄───────────────────────────────────────────┤                                            │
          │                                            │                                            │
          │                                            │   Dial (SIP / PSTN)                        │
          │                                            ├───────────────────────────────────────────►│
          │                                            │                                            │
                    ╔════════════════════════════════════════════════════════════════════╗
                    ║                                                                    ║
                    ║           No answer  ·  line busy  ·  phone switched off           ║
                    ║                                                                    ║
                    ╚════════════════════════════════════════════════════════════════════╝

                                       → No WebSocket connection is opened

Note on dial result feedback: In v1, there is no webhook that returns the dial result to the vendor. If the caller does not pick up, the vendor is not notified. Use transactionId for reconciliation, or contact the integration team to obtain the call_sid when debugging is needed.
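Because no dial result is pushed back, one vendor-side option is to remember each dial request and treat it as unanswered if no WebSocket session appears within a waiting window. The sketch below is only an illustration: the `PendingDials` class is hypothetical, the 90-second window is an assumed value rather than one from this spec, and how a WebSocket session is matched back to a transactionId is not defined here and should be confirmed with the integration team.

```python
import time
from typing import Dict, List, Optional


class PendingDials:
    """Remembers dial requests until a WebSocket session shows up (hypothetical helper)."""

    def __init__(self, wait_seconds: float = 90.0) -> None:
        # wait_seconds is an assumed ring-time budget, not a value from the spec.
        self.wait_seconds = wait_seconds
        self._deadlines: Dict[str, float] = {}

    def dial_requested(self, transaction_id: str, now: Optional[float] = None) -> None:
        """Call right after the REST API accepted the dial request."""
        now = time.monotonic() if now is None else now
        self._deadlines[transaction_id] = now + self.wait_seconds

    def session_started(self, transaction_id: str) -> None:
        """Call when a WebSocket session for this transaction begins."""
        self._deadlines.pop(transaction_id, None)

    def presumed_unanswered(self, now: Optional[float] = None) -> List[str]:
        """Transactions whose waiting window passed without any WebSocket session."""
        now = time.monotonic() if now is None else now
        return sorted(t for t, deadline in self._deadlines.items() if deadline <= now)
```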


03. REST API — Initiate Call

The vendor calls this API when it wants the Gateway to dial out to the caller (e.g., in an outbound campaign). The API is asynchronous: it returns as soon as the request is received, and the dialing process and WebSocket connection happen afterward.

3.1 Endpoint

POST {{baseUrl}}/v1/voice/callbot

3.2 Authentication

The value of the X-Api-Key header is provided when the vendor registers the bot.

X-Api-Key: <api_key>
Content-Type: application/json

3.3 Request

cURL:

curl --location '{{baseUrl}}/v1/voice/callbot' \
  --header 'X-Api-Key: {{api_key}}' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "phone": "0123456789",
    "campaignId": 1,
    "transactionId": "TXN_004",
    "socketUrl": "wss://bot.vendor.com/ws/voice?api_key=xxx",
    "name": "John Doe",
    "email": "john.doe@example.com",
    "address": "123 Main St, Anytown, USA",
    "pField1": "Personalized info 1",
    "pField2": "Personalized info 2",
    "pField3": "Personalized info 3",
    "pField4": "Personalized info 4",
    "pField5": "Personalized info 5",
    "pField6": "Personalized info 6"
  }'

Node.js (axios):

const axios = require('axios')

// Top-level await is not available in CommonJS, so the call is wrapped in an async function.
async function main() {
  const response = await axios.post(
    '{{baseUrl}}/v1/voice/callbot',
    {
      phone: '0123456789',
      campaignId: 1,
      transactionId: 'TXN_004',
      socketUrl: 'wss://bot.vendor.com/ws/voice?api_key=xxx',
      name: 'John Doe',
      email: 'john.doe@example.com',
      address: '123 Main St, Anytown, USA',
      pField1: 'Personalized info 1',
      pField2: 'Personalized info 2',
      pField3: 'Personalized info 3',
      pField4: 'Personalized info 4',
      pField5: 'Personalized info 5',
      pField6: 'Personalized info 6'
    },
    {
      headers: {
        'X-Api-Key': '{{api_key}}',
        'Content-Type': 'application/json'
      }
    }
  )

  console.log(response.data)
}

main().catch((err) => console.error(err.message))

Python (requests):

import requests

response = requests.post(
  '{{baseUrl}}/v1/voice/callbot',
  json={
    'phone': '0123456789',
    'campaignId': 1,
    'transactionId': 'TXN_004',
    'socketUrl': 'wss://bot.vendor.com/ws/voice?api_key=xxx',
    'name': 'John Doe',
    'email': 'john.doe@example.com',
    'address': '123 Main St, Anytown, USA',
    'pField1': 'Personalized info 1',
    'pField2': 'Personalized info 2',
    'pField3': 'Personalized info 3',
    'pField4': 'Personalized info 4',
    'pField5': 'Personalized info 5',
    'pField6': 'Personalized info 6'
  },
  headers={
    'X-Api-Key': '{{api_key}}',
    'Content-Type': 'application/json'
  },
  timeout=10  # requests has no default timeout
)

print(response.json())

Request fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| phone | string | Yes | Caller’s phone number |
| campaignId | int | Yes | ID of the campaign configured on the Gateway |
| transactionId | string | Yes | Vendor’s transaction ID, used for reconciliation |
| socketUrl | string | Yes | Bot’s WebSocket URL; the Gateway connects to it after the caller picks up |
| name | string | No | Customer name |
| email | string | No | Email |
| address | string | No | Address |
| pField1 … pField6 | string | No | Personalization fields, forwarded into metadata.custom when the Gateway opens the WebSocket to the bot |
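Checking the request body before sending it avoids a round trip for obvious mistakes. The following is a minimal sketch based only on the field table above; the function name `build_callbot_payload` is hypothetical, and rejecting unknown keys is an assumption on the client side, since the document does not say how the Gateway treats extra fields.

```python
from typing import Any, Dict

REQUIRED_FIELDS = ("phone", "campaignId", "transactionId", "socketUrl")
OPTIONAL_FIELDS = ("name", "email", "address") + tuple(f"pField{i}" for i in range(1, 7))


def build_callbot_payload(**fields: Any) -> Dict[str, Any]:
    """Validate the fields against the table above and return the JSON body as a dict."""
    missing = [name for name in REQUIRED_FIELDS if fields.get(name) in (None, "")]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    # Assumption: unknown keys are rejected locally; the spec does not define this.
    unknown = sorted(set(fields) - set(REQUIRED_FIELDS) - set(OPTIONAL_FIELDS))
    if unknown:
        raise ValueError("unknown fields: " + ", ".join(unknown))
    if not isinstance(fields["campaignId"], int):
        raise ValueError("campaignId must be an int")
    return dict(fields)
```

The returned dict can be passed directly as the `json=` argument of `requests.post`.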

3.4 Response

{
  "error_code": "success",
  "message": "OK"
}

Response semantics: HTTP 200 with error_code = "success" means the Gateway has accepted the request and will dial. Any other error_code value indicates an error (wrong api_key, missing field, campaign does not exist, …).
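The acceptance rule can be expressed as a single check. This is a small sketch with a hypothetical function name; the concrete non-success error_code values are not listed in this document, so the code only distinguishes "success" from everything else.

```python
from typing import Any, Mapping


def dial_accepted(http_status: int, body: Mapping[str, Any]) -> bool:
    """True only for HTTP 200 together with error_code == "success"."""
    return http_status == 200 and body.get("error_code") == "success"
```

A True result only means the request was accepted; it says nothing about whether the caller will pick up.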


04. WebSocket Connection

The Gateway actively opens a WebSocket to the socketUrl provided by the vendor during the call initiation step.

4.1 URL & Authentication

The Gateway will connect to the socketUrl provided by the vendor in the API request. The URL must include api_key as a query param:

wss://bot.vendor.com/ws/voice?api_key=<KEY>

The bot verifies api_key during the handshake: if it is incorrect, the bot closes the WebSocket with close code 1008 (policy violation).
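The key check itself needs only the request target of the incoming connection. The sketch below is framework-agnostic and makes some assumptions: the function name `close_code_for` is hypothetical, the expected key is supplied by the vendor's own configuration, and the comparison uses a constant-time function as a general precaution rather than a requirement of this spec.

```python
import hmac
from typing import Optional
from urllib.parse import parse_qs, urlsplit

POLICY_VIOLATION = 1008  # close code named in this section


def close_code_for(request_target: str, expected_key: str) -> Optional[int]:
    """Return None when api_key matches, otherwise the close code to use."""
    query = parse_qs(urlsplit(request_target).query)
    supplied = query.get("api_key", [""])[0]
    # Constant-time comparison; an empty expected key never matches.
    ok = bool(expected_key) and hmac.compare_digest(
        supplied.encode("utf-8"), expected_key.encode("utf-8")
    )
    return None if ok else POLICY_VIOLATION
```

For example, `close_code_for("/ws/voice?api_key=wrong", "secret")` returns 1008, which the WebSocket server would then pass to its own close call.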

4.2 Technical Requirements

| Requirement | Value |
| --- | --- |
| Protocol | WebSocket (RFC 6455) |
| Scheme | wss:// (TLS required in production) |
| Message format | JSON, text frames, UTF-8 |
| Audio transport | Base64 in field media.payload |

4.3 Timeouts

| Timeout | Default value |
| --- | --- |
| Connect timeout | 5 seconds |
| Idle timeout (no media) | 30 seconds |
| Max session duration | 900 seconds |

End of call: When the call ends, the Gateway sends a stop event and then closes the WS with close code 1000 (normal). The vendor does not need to reconnect.
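A bot may want to know how close a session is to one of these Gateway limits. The following sketch mirrors the default values from the table; the function name is hypothetical, and which direction of media resets the idle timer is not stated in this document, so the caller decides what counts as "last media".

```python
IDLE_TIMEOUT_S = 30   # idle timeout (no media), default from the table
MAX_SESSION_S = 900   # max session duration, default from the table


def seconds_until_gateway_timeout(started_at: float, last_media_at: float, now: float) -> float:
    """Seconds left before the idle or max-session limit is reached, whichever comes first."""
    idle_left = IDLE_TIMEOUT_S - (now - last_media_at)
    session_left = MAX_SESSION_S - (now - started_at)
    return max(0.0, min(idle_left, session_left))
```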


05. Audio Format

The audio format is fixed in v1. Both the Gateway and the bot must use exactly this format.

| Property | Value |
| --- | --- |
| Codec | PCM signed 16-bit little-endian (pcm_s16le) |
| Sample rate | 8000 Hz |
| Channels | 1 (mono) |
| Frame size | 20 ms per chunk (160 samples = 320 bytes) |
| Transport | Base64 string in JSON |
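The frame size follows directly from the other properties: 8000 samples/s × 0.020 s = 160 samples, and 160 samples × 2 bytes = 320 bytes. The sketch below derives that number and splits raw PCM into Base64-encoded 20 ms frames. The helper name `iter_frames` is hypothetical, and padding a short final frame with zero bytes (silence) is an assumption, not something this spec requires.

```python
import base64
from typing import Iterator

SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 2   # signed 16-bit
CHANNELS = 1
FRAME_MS = 20
# 8000 * 20 // 1000 = 160 samples per frame, times 2 bytes, times 1 channel = 320 bytes
FRAME_BYTES = SAMPLE_RATE_HZ * FRAME_MS // 1000 * BYTES_PER_SAMPLE * CHANNELS


def iter_frames(pcm: bytes) -> Iterator[str]:
    """Yield one Base64 string per 20 ms frame of pcm_s16le audio."""
    for offset in range(0, len(pcm), FRAME_BYTES):
        frame = pcm[offset:offset + FRAME_BYTES]
        # Assumption: a short last frame is padded with zero bytes, which is silence in s16le.
        frame = frame.ljust(FRAME_BYTES, b"\x00")
        yield base64.b64encode(frame).decode("ascii")
```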


06. Events — Gateway → Bot

Every message is UTF-8 JSON sent in a WebSocket text frame.

6.1 connected

GATEWAY → BOT: Sent immediately after the WebSocket handshake succeeds.

{
  "event": "connected",
  "protocol": "voice_stream",
  "version": "1.0"
}

6.2 start

GATEWAY → BOT: Sent once, after connected. Contains the call metadata. The vendor should cache this information for the duration of the session.

{
  "event": "start",
  "sequence_number": 1,
  "start": {
    "stream_sid": "MZxxxxxxxxxxxxxxxxx",
    "call_sid": "call-abc123",
    "media_format": {
      "encoding": "pcm_s16le",
      "sample_rate": 8000,
      "channels": 1
    },
    "metadata": {
      "phone_number": "0900000000",
      "direction": "outbound",
      "custom": {
        "key1": "value1",
        "key2": "value2"
      }
    }
  }
}

| Field | Type | Description |
| --- | --- | --- |
| stream_sid | string | Unique ID of the WebSocket stream |
| call_sid | string | Call ID, used for logging / tracing |
| media_format | object | Always PCM 8 kHz mono s16le in v1 |
| metadata.phone_number | string | Caller’s phone number |
| metadata.direction | string | outbound / inbound |
| metadata.custom | object | Dynamic fields depending on each bot's configuration |

metadata.custom: Dynamic fields agreed upon by the vendor and the Gateway team before integration. The schema is defined when the bot is registered (e.g., a mapping from pField1…pField6 in the REST API).
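Caching the start metadata is easiest with a small immutable record. The following is a minimal sketch using the field names from the table above; the `CallSession` class and `parse_start` function are hypothetical names, and rejecting an unexpected media_format is a defensive choice rather than a stated requirement.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class CallSession:
    """Per-call data cached from the start event (hypothetical container)."""
    stream_sid: str
    call_sid: str
    phone_number: str
    direction: str
    custom: Dict[str, Any] = field(default_factory=dict)


def parse_start(event: Dict[str, Any]) -> CallSession:
    """Validate a start event and extract the fields the bot needs for the session."""
    if event.get("event") != "start":
        raise ValueError("not a start event")
    start = event["start"]
    fmt = start["media_format"]
    if (fmt["encoding"], fmt["sample_rate"], fmt["channels"]) != ("pcm_s16le", 8000, 1):
        raise ValueError(f"unexpected media_format: {fmt}")
    meta = start.get("metadata", {})
    return CallSession(
        stream_sid=start["stream_sid"],
        call_sid=start["call_sid"],
        phone_number=meta.get("phone_number", ""),
        direction=meta.get("direction", ""),
        custom=dict(meta.get("custom", {})),
    )
```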

6.3 media

GATEWAY → BOT: Audio streamed from the caller to the bot. Sent continuously, roughly one chunk every 20 ms.

{
  "event": "media",
  "sequence_number": 42,
  "media": {
    "track": "inbound",
    "chunk": 41,
    "timestamp": 1776326027630,
    "payload": "<base64_pcm_data>"
  }
}

| Field | Type | Description |
| --- | --- | --- |
| track | string | Always "inbound" |
| chunk | int | Chunk order number, starting from 0 |
| timestamp | int | Unix time in milliseconds |
| payload | string | Base64 of raw PCM bytes (320 bytes per chunk) |
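On the bot side, each inbound media event is decoded and appended to the audio passed on to ASR. The sketch below also counts skipped chunk numbers. It assumes that chunk increases by one per media event for the whole session, which the table suggests but does not state outright; the `InboundAudio` class is a hypothetical name.

```python
import base64
from typing import Any, Dict


class InboundAudio:
    """Collects caller audio and counts gaps in chunk numbering (hypothetical helper)."""

    def __init__(self) -> None:
        self.next_chunk = 0
        self.missing = 0
        self.pcm = bytearray()

    def on_media(self, event: Dict[str, Any]) -> None:
        media = event["media"]
        if media.get("track") != "inbound":
            return
        chunk = media["chunk"]
        # Assumption: chunk numbers increase by one per event, so a jump means lost audio.
        if chunk > self.next_chunk:
            self.missing += chunk - self.next_chunk
        self.next_chunk = max(self.next_chunk, chunk + 1)
        self.pcm += base64.b64decode(media["payload"])
```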

6.4 mark

GATEWAY → BOT: Echoes a marker that the bot sent. The Gateway sends the mark back to the bot when the corresponding audio has finished playing for the caller.

{
  "event": "mark",
  "sequence_number": 80,
  "mark": {
    "name": "greeting_done"
  }
}

6.5 stop

GATEWAY → BOT: The call has ended. The Gateway sends this event and then closes the WS.

{
  "event": "stop",
  "sequence_number": 999,
  "stop": {
    "reason": "caller_hangup",
    "call_sid": "call-abc123"
  }
}

| reason | Description |
| --- | --- |
| caller_hangup | Caller hangs up |
| ai_hangup | Bot requests to stop |
| transferred | Call has been successfully transferred |
| timeout | Idle timeout |
| error | System error |
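The five Gateway → Bot events can be routed by their event field. The following is a minimal sketch of such a dispatcher; `make_dispatcher` is a hypothetical name, and silently ignoring unknown event names is an assumption made for forward compatibility, not a rule from this spec.

```python
import json
from typing import Callable, Dict


def make_dispatcher(handlers: Dict[str, Callable[[dict], None]]) -> Callable[[str], bool]:
    """Build a function that routes one WebSocket text frame to its handler."""

    def dispatch(text_frame: str) -> bool:
        """Handle one frame; returns False once a stop event has been seen."""
        message = json.loads(text_frame)
        name = message.get("event")
        handler = handlers.get(name)
        if handler is not None:  # assumption: unknown events are ignored
            handler(message)
        return name != "stop"

    return dispatch
```

A receive loop can call `dispatch` for each incoming text frame and leave the loop when it returns False, since the Gateway closes the WS right after stop.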


07. Events — Bot → Gateway

The bot sends audio responses, markers, transfer commands, or a stop request to the Gateway.

7.1 media

BOT → GATEWAY: Audio response that the bot plays for the caller.

{
  "event": "media",
  "media": {
    "payload": "<base64_pcm_data>"
  }
}

  • Must be PCM 8 kHz mono s16le (matching start.media_format)

  • Chunk size should be 20–100 ms to reduce latency

  • No need for chunk / timestamp: the Gateway sequences the audio automatically

7.2 mark

BOT → GATEWAY: Sets a checkpoint. The Gateway echoes it back when the audio sent before the mark has finished playing for the caller.

{
  "event": "mark",
  "mark": {
    "name": "question_1_done"
  }
}

The bot uses this to know when the caller has finished listening to the audio, so that it can activate ASR and listen for the response.
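One bot turn is therefore a run of media messages followed by a single mark. The sketch below builds those text frames from raw PCM. The function name `build_turn_messages` is hypothetical, and enforcing the 20–100 ms range and an even byte count locally is a defensive assumption rather than documented Gateway behavior.

```python
import base64
import json
from typing import List

BYTES_PER_MS = 8000 * 2 // 1000  # 8 kHz, 16-bit mono: 16 bytes per millisecond


def build_turn_messages(pcm: bytes, mark_name: str, chunk_ms: int = 20) -> List[str]:
    """Return the JSON text frames for one bot turn: media chunks, then one mark."""
    if not 20 <= chunk_ms <= 100:
        raise ValueError("chunk_ms should be between 20 and 100")
    if len(pcm) % 2:
        raise ValueError("s16le audio must contain an even number of bytes")
    size = chunk_ms * BYTES_PER_MS
    messages = [
        json.dumps({
            "event": "media",
            "media": {"payload": base64.b64encode(pcm[i:i + size]).decode("ascii")},
        })
        for i in range(0, len(pcm), size)
    ]
    messages.append(json.dumps({"event": "mark", "mark": {"name": mark_name}}))
    return messages
```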

7.3 transfer

BOT → GATEWAY: Transfers the call to an agent or queue.

{
  "event": "transfer",
  "transfer": {
    "target": "agent_extension_or_queue",
    "context": "default",
    "on_complete": "hangup_bot"
  }
}

| Field | Type | Description |
| --- | --- | --- |
| target | string | Extension / queue ID / phone number |
| context | string | Routing context, received from Gateway during registration |
| on_complete | string | hangup_bot (close the bot's WS) / keep_alive (keep the WS open) |

After receiving transfer, the Gateway will:

  1. Play the remaining audio in the buffer (drain)

  2. Perform transfer

  3. Send stop with reason transferred to the bot

  4. Close WS
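A transfer request can be built with a small helper that only accepts the two documented on_complete values. This is a sketch with a hypothetical function name; the default context "default" is taken from the example above, and real values come from bot registration.

```python
import json

ON_COMPLETE_VALUES = ("hangup_bot", "keep_alive")


def build_transfer(target: str, context: str = "default", on_complete: str = "hangup_bot") -> str:
    """Return the JSON text frame for a transfer event."""
    if not target:
        raise ValueError("target is required")
    if on_complete not in ON_COMPLETE_VALUES:
        raise ValueError("on_complete must be one of: " + ", ".join(ON_COMPLETE_VALUES))
    return json.dumps({
        "event": "transfer",
        "transfer": {"target": target, "context": context, "on_complete": on_complete},
    })
```

After sending this frame, the bot can expect the drain, transfer, and stop sequence listed above.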

7.4 stop

BOT → GATEWAY: The bot actively ends the call.

{
  "event": "stop",
  "stop": {
    "reason": "conversation_complete"
  }
}

The Gateway will play the remaining audio in the buffer then end the call.
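The bot-initiated ending has two moments: the bot's stop request, and the Gateway's own stop that follows once the buffer has drained. The sketch below tracks both. The `Hangup` class is a hypothetical name, and treating any Gateway stop event as the confirmation is an assumption based on the sequence diagram in section 2.2.

```python
import json


class Hangup:
    """Tracks a bot-initiated hangup until the Gateway confirms it (hypothetical helper)."""

    def __init__(self) -> None:
        self.requested = False
        self.confirmed = False

    def request(self, reason: str = "conversation_complete") -> str:
        """Return the stop frame to send after the last media chunk."""
        self.requested = True
        return json.dumps({"event": "stop", "stop": {"reason": reason}})

    def on_gateway_event(self, event: dict) -> None:
        """Mark the hangup as confirmed when the Gateway sends its own stop."""
        if event.get("event") == "stop":
            self.confirmed = True
```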


08. Contact

Contact the Alohub integration team when support is needed:

  • Implementation review: Sanity-check the spec and the event processing logic before going to production.

  • api_key and URL configuration: For each bot, on the staging and production environments.

  • Debugging real calls: Provide the call_sid or transactionId so that logs can be traced.

Updated: 6/2/2026