Built where voice AI actually breaks.
The Indian phone call — 8 kHz, packet loss, Hinglish, accents the training data never saw — breaks most voice stacks. We rebuilt the stack around it.
Method & system for low-latency, code-switched conversational AI over low-bitrate telephony.
Filed with the Indian Patent Office. Covers end-to-end: the audio-native inference path, the code-switch recognition head, and the telephony-first acoustic conditioning that makes it work on an 8 kHz line.
Six layers. Each one rebuilt.
Where off-the-shelf pieces cost us latency, fidelity, or compliance footing, we built the layer ourselves. This is the full path from the PSTN trunk to the agent response.
Edge telephony
Real-time audio pipe
Audio-native inference (patent core)
Knowledge & memory
Orchestration
Observability & trust
Six problems we had to solve ourselves.
None of these were available off the shelf. Each one is either patented, patent-pending, or a production-only technique you cannot buy.
Audio-native inference on the phone
Most voice AIs stitch three models: speech-to-text, language model, text-to-speech. Each hop adds latency and discards paralinguistic signal (hesitation, laughter, urgency). Our model reasons directly in the audio domain — so the response carries the tone of the input, not a reconstruction of it.
Code-switching in the acoustic path
Detecting "mujhe EMI ka balance chahiye" is not a translation problem — it is an acoustic one. The language signal lives in prosody and phoneme patterns, not punctuation. We handle code-switches inside the acoustic encoder, not after transcription.
Telephony-first conditioning
Open-source voice models are trained on studio audio. Phones send 8 kHz, lossy, jitter-ridden audio over cellular. We fine-tuned on >2M hours of real Indian phone calls — so the model is robust to exactly the conditions it will meet in production.
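The conditioning idea can be illustrated with a toy channel simulator: band-limit clean studio audio to 8 kHz, then zero out random frames to mimic packet loss and jitter. Everything here (the function name, the crude decimation without a low-pass filter, the 2% drop rate) is an illustrative sketch, not the production augmentation pipeline.

```python
import numpy as np

def simulate_phone_channel(audio: np.ndarray, sr: int = 16000,
                           drop_prob: float = 0.02,
                           frame_ms: int = 20,
                           seed: int = 0) -> np.ndarray:
    """Degrade clean audio toward telephony conditions."""
    rng = np.random.default_rng(seed)
    # Crude decimation to 8 kHz (a real pipeline would low-pass filter first).
    factor = sr // 8000
    narrow = audio[::factor].copy()
    # Zero out ~2% of 20 ms frames to mimic packet loss on a cellular leg.
    frame = 8000 * frame_ms // 1000  # samples per frame at 8 kHz
    for start in range(0, len(narrow) - frame, frame):
        if rng.random() < drop_prob:
            narrow[start:start + frame] = 0.0
    return narrow

# One second of a 440 Hz tone standing in for studio speech.
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
phone = simulate_phone_channel(clean)
print(len(phone))  # 8000: one second at 8 kHz
```

Training on audio degraded like this, rather than on the clean original, is what makes a model robust to the line it will actually hear.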
Tiered RAG inside the latency budget
A 10,000-document knowledge base cannot be searched inside a 700ms budget the naive way. We use summary-first retrieval: the model queries compressed chunk summaries, then pulls full chunks only on demand. Average retrieval overhead: under 90ms.
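The summary-first shape of that retrieval can be sketched in a few lines. This is a minimal illustration with toy lexical overlap standing in for a real vector index; the chunk contents, field names, and scoring function are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    summary: str    # compressed representation, searched first (tier 1)
    full_text: str  # pulled only for the top survivors (tier 2)

def score(query: str, text: str) -> int:
    # Toy relevance: word overlap. Production would use an embedding index.
    return len(set(query.lower().split()) & set(text.lower().split()))

def tiered_retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[str]:
    # Tier 1: rank every chunk by its short summary only.
    ranked = sorted(chunks, key=lambda c: score(query, c.summary), reverse=True)
    # Tier 2: fetch full text only for the top-k, keeping the hot path small.
    return [c.full_text for c in ranked[:k]]

kb = [
    Chunk("emi",  "EMI balance and due dates",   "Your EMI balance is read from the loan ledger ..."),
    Chunk("kyc",  "KYC document requirements",   "KYC needs PAN and address proof ..."),
    Chunk("card", "credit card limit queries",   "Card limit queries route to ..."),
]
print(tiered_retrieve("what is my EMI balance", kb, k=1))
```

The latency win comes from searching only the compressed tier: full chunks never enter the ranking step, so the cost of the hot path scales with summary size, not document size.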
Interruptible generation with barge-in
A human says "stop, stop" — the AI must hear it and actually stop talking. We stream generation in sub-100ms frames with a side-channel VAD on the caller. The moment the caller speaks, the output halts and new audio enters the context.
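The control flow reduces to a frame loop with a VAD check before every emit. A minimal sketch, assuming a callable VAD on the caller's side channel; the frame strings and the toy VAD below are stand-ins, not the streaming implementation.

```python
def stream_with_barge_in(frames, caller_is_speaking):
    """Emit output frames one at a time; halt the moment the
    side-channel VAD reports caller speech (barge-in)."""
    emitted = []
    for frame in frames:
        if caller_is_speaking():   # checked before every sub-100ms frame
            break                  # stop talking; caller audio re-enters context
        emitted.append(frame)
    return emitted

# Toy VAD: the caller starts speaking after the third frame.
clock = {"t": 0}
def vad():
    clock["t"] += 1
    return clock["t"] > 3

out = stream_with_barge_in([f"frame{i}" for i in range(10)], vad)
print(out)  # ['frame0', 'frame1', 'frame2']
```

Because the check sits inside the frame loop rather than between utterances, worst-case reaction time is bounded by one frame, not by the length of the sentence being spoken.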
Compliance guardrails at runtime
Regulatory constraints (RBI Fair Practices, DPDP, DND, time-of-day rules) are enforced at the dispatch layer, not through a post-hoc audit. A call to a blocklisted number, or one placed after 8 pm, never reaches the dialer, by policy.
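A dispatch-layer gate of that kind can be sketched as a pure policy check that runs before the dialer is ever invoked. The blocklist contents, the 9 am start of the window, and the function name are illustrative assumptions; only the "never reaches the dialer" shape is the point.

```python
from datetime import time

BLOCKLIST = {"+911234567890"}             # DND / opt-out numbers (illustrative)
CALL_WINDOW = (time(9, 0), time(20, 0))   # no outbound calls after 8 pm

def may_dispatch(number: str, now: time) -> bool:
    """Policy gate evaluated before the call is handed to the dialer."""
    if number in BLOCKLIST:
        return False
    start, end = CALL_WINDOW
    return start <= now < end

print(may_dispatch("+919999999999", time(19, 59)))  # True: inside the window
print(may_dispatch("+919999999999", time(20, 1)))   # False: after 8 pm
print(may_dispatch("+911234567890", time(12, 0)))   # False: blocklisted
```

Putting the rule in the dispatch path means a violation is structurally impossible rather than merely detectable after the fact.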
Measured, not marketed.
Every number comes from production traffic or a public benchmark methodology, and we share that methodology on request.
Indian data. Indian infrastructure. Indian accountability.
Enterprises regulated by the RBI, IRDAI, or SEBI cannot send customer conversation data to foreign inference endpoints. Neither can DPDP-compliant operators. We built for that reality from day one.
Live in 5 minutes. Really.
Sign up, describe your agent in one sentence, test-call your own phone. If you're not live inside 5 minutes, we'll credit your account.