Voice AI · Technology
The technology · Patent pending · India

Built where voice AI actually breaks.

The Indian phone call — 8 kHz, packet loss, Hinglish, accents the training data never saw — breaks most voice stacks. We rebuilt the stack around it.

Patent pending

Method & system for low-latency, code-switched conversational AI over low-bitrate telephony.

Filed with the Indian Patent Office. The application covers the system end to end: the audio-native inference path, the code-switch recognition head, and the telephony-first acoustic conditioning that makes it work on an 8 kHz line.

Application filed · 2026 · Indian Patent Office · Method claim · System claim

Audio-native inference
No STT → LLM → TTS relay. A single audio-in / audio-out model.

Code-switch detection
Mid-utterance language change resolved in the acoustic pathway.

Telephony conditioning
Robust to 8 kHz codecs, packet loss, jitter. Not studio audio.

Memory compression
Tiered RAG fits 10k-document KBs in a 700ms* budget.

The stack

Six layers. Each one rebuilt.

Where off-the-shelf pieces lost latency, fidelity, or compliance footing — we built the layer ourselves. This is the full path from the PSTN trunk to the agent response.

LAYER
L1

Edge telephony

PSTN → SIP → our edge
Carrier-grade SIP ingress in Mumbai
Regulated India-only routing
DID + outbound CLI pools
Codecs: G.711, G.722, Opus
LAYER
L2

Real-time audio pipe

Packet loss-tolerant streaming
VAD + barge-in handling
Jitter buffer + forward error correction
Acoustic echo cancellation
Sub-50ms transport overhead
LAYER
L3

Audio-native inference

Patent core
The patent-pending core
Single-model audio → audio
22-language code-switch head
Prosody-aware response shaping
Streaming generation with early barge-in
LAYER
L4

Knowledge & memory

Tiered retrieval in a latency budget
In-model short-term memory
Hot cache: call-turn context
Tiered RAG: summaries → chunks
Vector + BM25 hybrid retrieval
LAYER
L5

Orchestration

Agent logic, tools, fallbacks
Tool-calling with audit trail
Warm human handoff
CRM + LMS + WhatsApp webhooks
Compliance guardrails at runtime
LAYER
L6

Observability & trust

Logged, replayable, auditable
Every call transcribed + logged
Forensic export (7-year retention)
DPDP-compliant encryption at rest
Real-time QA + drift monitoring
What we invented

Six problems we had to solve ourselves.

None of these was available off the shelf. Each one is either covered by our pending patent application or is a production-only technique you cannot buy.

01
Architecture

Audio-native inference on the phone

Most voice AIs stitch three models: speech-to-text, language model, text-to-speech. Each hop adds latency and discards paralinguistic signal (hesitation, laughter, urgency). Our model reasons directly in the audio domain — so the response carries the tone of the input, not a reconstruction of it.

02
Linguistics

Code-switching in the acoustic path

Detecting "mujhe EMI ka balance chahiye" ("I need my EMI balance") is not a translation problem; it is an acoustic one. The language signal lives in prosody and phoneme patterns, not punctuation. We handle code-switches inside the acoustic encoder, not after transcription.

03
Training

Telephony-first conditioning

Most open-source voice models are trained largely on clean, studio-quality audio. Phone calls deliver 8 kHz, lossy, jitter-ridden audio over cellular networks. We fine-tuned on more than 2M hours of real Indian phone calls, so the model is robust to exactly the conditions it will meet in production.
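To make the gap between studio and phone audio concrete, here is a minimal sketch that degrades a clean 16 kHz signal the way a narrowband call might: decimation to 8 kHz, mu-law companding to 8 bits (one of the two companding laws defined by G.711), and random frame drops. It illustrates the conditions only and does not describe the training pipeline. The function names, the 20 ms frame size, the 5% loss rate, and the crude averaging filter are all assumptions chosen for illustration.

```python
import math
import random

MU = 255  # mu-law parameter; continuous formula, approximating G.711 mu-law


def mu_law_roundtrip(x: float) -> float:
    """Compress a sample in [-1, 1] with mu-law, quantise to 8 bits, expand back."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    q = round((y + 1) / 2 * 255)              # one of 256 levels
    y_hat = q / 255 * 2 - 1
    return math.copysign(math.expm1(abs(y_hat) * math.log1p(MU)) / MU, y_hat)


def downsample(samples, factor):
    """Crude decimation: average each block of `factor` samples (not a proper anti-alias filter)."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples) - factor + 1, factor)]


def drop_frames(samples, frame_len, loss_rate, rng):
    """Zero out whole frames to mimic lost packets with no concealment."""
    out = list(samples)
    for start in range(0, len(out), frame_len):
        if rng.random() < loss_rate:
            for i in range(start, min(start + frame_len, len(out))):
                out[i] = 0.0
    return out


def degrade(samples_16k, loss_rate=0.05, seed=0):
    """Hypothetical helper: 16 kHz clean audio in, 8 kHz 'phone-like' audio out."""
    rng = random.Random(seed)
    narrow = downsample(samples_16k, 2)                   # 16 kHz -> 8 kHz
    companded = [mu_law_roundtrip(s) for s in narrow]     # 8-bit companding noise
    return drop_frames(companded, 160, loss_rate, rng)    # 160 samples = 20 ms at 8 kHz


# One second of a 440 Hz tone at 16 kHz, then degraded.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
degraded = degrade(tone)
```

A model trained only on the clean `tone` side of this transformation never sees the quantisation noise or the silent gaps that appear on the `degraded` side.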

04
Retrieval

Tiered RAG inside the latency budget

A 10,000-document knowledge base cannot be searched inside a 700ms budget the naive way. We use summary-first retrieval: the model queries compressed chunk summaries, then pulls full chunks only on demand. Average retrieval overhead: under 90ms.
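The summary-first idea can be sketched in a few lines. This is a toy under stated assumptions, not the production retriever: the `Doc` type, the `tiered_retrieve` function, and the token-overlap score are hypothetical stand-ins for a real ranker such as the vector + BM25 hybrid listed in Layer 4.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Doc:
    doc_id: str
    summary: str        # short, compressed description of the document
    chunks: List[str]   # full-text chunks, opened only on demand


def overlap(query: str, text: str) -> int:
    """Toy relevance score: number of shared lowercase tokens."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def tiered_retrieve(query: str, docs: List[Doc],
                    top_docs: int = 2, top_chunks: int = 2) -> List[Tuple[str, str]]:
    # Tier 1: rank every document using its summary only (cheap).
    shortlisted = sorted(docs, key=lambda d: overlap(query, d.summary),
                         reverse=True)[:top_docs]
    # Tier 2: score full chunks, but only inside the shortlisted documents.
    candidates = [(overlap(query, c), d.doc_id, c)
                  for d in shortlisted for c in d.chunks]
    candidates.sort(key=lambda t: t[0], reverse=True)
    return [(doc_id, chunk) for score, doc_id, chunk in candidates[:top_chunks]
            if score > 0]


docs = [
    Doc("emi", "loan emi balance and repayment schedule",
        ["your emi balance is shown in the app under loans",
         "emi due date is the fifth of each month"]),
    Doc("kyc", "kyc documents and address proof",
        ["upload aadhaar for address proof", "pan card is required for kyc"]),
    Doc("cards", "credit card limits and rewards",
        ["reward points expire after two years"]),
]
results = tiered_retrieve("what is my emi balance", docs)
```

The saving comes from tier 2 touching only the chunks of the shortlisted documents, so the expensive comparison scales with `top_docs` rather than with the size of the knowledge base.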

05
Real-time

Interruptible generation with barge-in

A human says "stop, stop" — the AI must hear it and actually stop talking. We stream generation in sub-100ms frames with a side-channel VAD on the caller. The moment the caller speaks, the output halts and new audio enters the context.
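A minimal sketch of the control flow, assuming a synchronous loop for readability: output is emitted frame by frame, and a caller-side voice-activity check runs before each frame. The names `speak_with_barge_in` and `caller_is_speaking`, and the 80 ms frame length, are assumptions for illustration; the text above only commits to sub-100ms frames.

```python
from typing import Callable, Iterable, List, Tuple

FRAME_MS = 80  # assumed frame length, chosen to be under 100 ms


def speak_with_barge_in(
    frames: Iterable[bytes],
    caller_is_speaking: Callable[[int], bool],
) -> Tuple[List[bytes], bool]:
    """Play frames until the side-channel VAD reports caller speech.

    Returns the frames actually played and whether playback was interrupted.
    """
    played: List[bytes] = []
    for index, frame in enumerate(frames):
        # Check the caller channel before committing to the next frame.
        if caller_is_speaking(index * FRAME_MS):
            return played, True      # halt output; caller audio takes over
        played.append(frame)
    return played, False


# Ten frames of silence as a stand-in for generated audio.
response = [b"\x00" * 640 for _ in range(10)]

# Simulated VAD: the caller starts talking 250 ms into the response.
played, interrupted = speak_with_barge_in(response, lambda t_ms: t_ms >= 250)
```

Because the check runs once per frame, the worst-case delay between the caller speaking and the output stopping is roughly one frame length, which is why short frames matter here.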

06
Compliance

Compliance guardrails at runtime

Regulatory constraints (RBI Fair Practices, DPDP, DND, time-of-day rules) are enforced at the dispatch layer, not through a post-hoc audit. A call to a blocklisted number or after 8pm never makes it to the dialer, by policy.
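As a rough sketch of what a dispatch-layer check looks like, the gate below runs before a call is ever handed to the dialer. `DispatchPolicy` and `may_dial` are hypothetical names, the 8am window start is an assumption (the text only specifies the 8pm cutoff), and `local_now` is assumed to be the time in the called party's time zone.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class DispatchPolicy:
    blocklist: FrozenSet[str]
    window_start: time = time(8, 0)    # assumed start of the calling window
    window_end: time = time(20, 0)     # 8pm cutoff mentioned in the text


def may_dial(number: str, local_now: datetime,
             policy: DispatchPolicy) -> Tuple[bool, str]:
    """Return (allowed, reason). Runs before the call reaches the dialer."""
    if number in policy.blocklist:
        return False, "blocklisted"
    if not (policy.window_start <= local_now.time() < policy.window_end):
        return False, "outside calling window"
    return True, "ok"


policy = DispatchPolicy(blocklist=frozenset({"+910000000001"}))

allowed = may_dial("+910000000002", datetime(2026, 1, 5, 10, 0), policy)
blocked = may_dial("+910000000001", datetime(2026, 1, 5, 10, 0), policy)
too_late = may_dial("+910000000002", datetime(2026, 1, 5, 20, 30), policy)
```

Returning a reason alongside the decision means every refused call can be written to the audit trail with the rule that stopped it.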

Numbers

Measured, not marketed.

All numbers come from production traffic or from a documented benchmark methodology, which we share on request.

700ms* · first-word response · p95, last 30 days
2.2s · cascaded stack baseline · Industry typical
2M+ · hours training audio · Indian phone calls
22 · languages in production · + Hinglish code-switch
8 kHz · phone-native · Not studio audio
60s · signup to first call · End-to-end

Sovereignty

Indian data. Indian infrastructure. Indian accountability.

Enterprises regulated by the RBI, IRDAI, or SEBI cannot send customer conversation data to foreign inference endpoints. Neither can DPDP-compliant operators. We built for that reality from day one.

🇮🇳
India-hosted
Production traffic runs on Indian regions. No traffic leaves the country for inference by default.
⚙︎
No third-party models for core
The inference core is our model, served from our infrastructure. No US-based LLM vendor in the audio path.
🔒
Private deployment path
Enterprise customers can host the inference core in a dedicated VPC with full cryptographic control of voice data.
📜
DPDP-ready, SOC 2 partner
Full data-processing agreement available. Grievance Officer + DPO appointed and publicly listed.

Live in 5 minutes. Really.

Sign up, describe your agent in one sentence, test-call your own phone. If you're not live inside 5 minutes, we'll credit your account.

₹100 credits · no card · 100 min free · OTP signup