helix-engage/docs/superpowers/specs/2026-03-21-live-call-assist-design.md
saridsa2 3064eeb444 feat: CC agent features, live call assist, worklist redesign, brand tokens
CC Agent:
- Call transfer (CONFERENCE + KICK_CALL) with inline transfer dialog
- Recording pause/resume during active calls
- Missed calls API (Ozonetel abandonCalls)
- Call history API (Ozonetel fetchCDRDetails)

Live Call Assist:
- Deepgram Nova STT via raw WebSocket
- OpenAI suggestions every 10s with lead context
- LiveTranscript component in sidebar during calls
- Browser audio capture from remote WebRTC stream

Worklist:
- Redesigned table: clickable phones, context menu (Call/SMS/WhatsApp)
- Last interaction sub-line, source column, improved SLA
- Filtered out rows without phone numbers
- New missed call notifications

Brand:
- Logo on login page
- Blue scale rebuilt from logo blue rgb(32, 96, 160)
- FontAwesome duotone CSS variables set globally
- Profile menu icons switched to duotone

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 10:36:10 +05:30

Live Call Assist — Design Spec

Problem

CC agents have no real-time intelligence during calls. The AI sidebar shows only a static pre-call summary and a chat interface that requires manual typing, neither of which helps while the agent is on the phone. The agent is left to recall lead history, doctor availability, and past interactions from memory.

Solution

Stream the call's remote audio (customer voice) to the sidecar, transcribe via Deepgram Nova, and every 10 seconds feed the accumulated transcript + full lead context to OpenAI for real-time suggestions. Display a scrolling transcript with AI suggestion cards in the sidebar.

Architecture

Browser (WebRTC call)
  │
  ├─ Remote audio track (customer) ──► AudioWorklet (PCM 16-bit, 16kHz)
  │                                         │
  │                                    WebSocket to sidecar
  │                                         │
  │                              ┌──────────▼──────────┐
  │                              │  Sidecar Gateway    │
  │                              │ ws://api/call-assist│
  │                              └──────────┬──────────┘
  │                                         │
  │                          ┌──────────────┼──────────────┐
  │                          ▼                             ▼
  │                   Deepgram Nova WS              Every 10s: OpenAI
  │                   (audio → text)               (transcript + context
  │                                                 → suggestions)
  │                          │                             │
  │                          ▼                             ▼
  │                   Transcript lines              AI suggestion cards
  │                          │                             │
  │                          └──────────────┬──────────────┘
  │                                         │
  │                                    WebSocket to browser
  │                                         │
  └─────────────────────────────────────────▼
                                      AI Sidebar
                                   (transcript + suggestions)

Components

1. Browser: Audio capture + WebSocket client

Audio capture: When call becomes active, grab the remote audio track from the peer connection. Use an AudioWorklet to downsample to 16-bit PCM at 16kHz (Deepgram's preferred format). Send raw audio chunks (~100ms each) over WebSocket.
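
The worklet's essential transformation is a format conversion; a minimal sketch of the float-to-PCM step (the function name and wiring are illustrative, not taken from audio-capture.ts):

```typescript
// Core conversion an AudioWorklet processor would apply per render
// quantum: Web Audio delivers Float32 samples in [-1, 1], and
// Deepgram's linear16 encoding wants signed 16-bit integers.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const output = new Int16Array(input.length);
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])); // clamp to valid range
    // Asymmetric scale: 32768 steps below zero, 32767 above
    output[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return output;
}
```

Downsampling from the graph's native rate (typically 48kHz) to 16kHz would happen before this step, e.g. by decimation or resampling in the worklet.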

WebSocket client: Connects to wss://engage-api.srv1477139.hstgr.cloud/api/call-assist. Sends:

  • Initial message: { type: "start", ucid, leadId, callerPhone }
  • Audio chunks: binary PCM data
  • End: { type: "stop" }

Receives:

  • { type: "transcript", text: "...", isFinal: boolean } — real-time transcript lines
  • { type: "suggestion", text: "...", action?: "book_appointment" | "transfer" } — AI suggestions
  • { type: "context_loaded", leadName: "...", summary: "..." } — confirmation that lead context was loaded
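
The protocol above maps naturally onto TypeScript discriminated unions that the browser hook and gateway could share; the `describe` helper is purely illustrative:

```typescript
// Wire protocol from the spec, typed as discriminated unions on `type`.
type ClientMessage =
  | { type: "start"; ucid: string; leadId: string; callerPhone: string }
  | { type: "stop" };

type ServerMessage =
  | { type: "transcript"; text: string; isFinal: boolean }
  | { type: "suggestion"; text: string; action?: "book_appointment" | "transfer" }
  | { type: "context_loaded"; leadName: string; summary: string };

// Hypothetical dispatcher showing how the union narrows per case.
function describe(msg: ServerMessage): string {
  switch (msg.type) {
    case "transcript":
      return (msg.isFinal ? "final: " : "interim: ") + msg.text;
    case "suggestion":
      return "suggest: " + msg.text;
    default:
      return "context for " + msg.leadName;
  }
}
```

Binary audio chunks travel outside this union, as raw WebSocket binary frames.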

2. Sidecar: WebSocket Gateway

NestJS WebSocket Gateway at /api/call-assist. On connection:

  1. Receives start message with ucid, leadId, callerPhone
  2. Loads lead context from platform: lead details, past calls, appointments, doctors, follow-ups
  3. Opens Deepgram Nova WebSocket (wss://api.deepgram.com/v1/listen)
  4. Pipes incoming audio chunks to Deepgram
  5. Deepgram returns transcript chunks — forwards to browser
  6. Every 10 seconds, sends accumulated transcript + lead context to OpenAI gpt-4o-mini for suggestions
  7. Returns suggestions to browser
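
Steps 5 and 6 imply a small piece of shared state: a transcript buffer that the Deepgram handler appends to and the 10-second timer drains. A minimal sketch (the skip-when-idle behavior is an assumption, not stated above):

```typescript
// Accumulates final transcript lines between suggestion ticks. flush()
// returns the full conversation only when new lines have arrived since
// the last tick, so a silent 10 s window triggers no OpenAI call.
class TranscriptBuffer {
  private lines: string[] = [];
  private dirty = false;

  append(line: string): void {
    this.lines.push(line);
    this.dirty = true;
  }

  // Called by the interval timer; null means "nothing new, skip OpenAI".
  flush(): string | null {
    if (!this.dirty) return null;
    this.dirty = false;
    return this.lines.join("\n");
  }
}
```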

System prompt for OpenAI (loaded once with lead context):

You are a real-time call assistant for Global Hospital Bangalore.
You listen to the conversation and provide brief, actionable suggestions.

CALLER CONTEXT:
- Name: {leadName}
- Phone: {phone}
- Source: {source} ({campaign})
- Previous calls: {callCount} (last: {lastCallDate}, disposition: {lastDisposition})
- Appointments: {appointmentHistory}
- Interested in: {interestedService}
- AI Summary: {aiSummary}

AVAILABLE RESOURCES:
- Doctors: {doctorList with departments and clinics}
- Next available slots: {availableSlots}

RULES:
- Keep suggestions under 2 sentences
- Focus on actionable next steps
- If customer mentions a doctor/department, show available slots
- If customer wants to cancel, note the appointment ID
- Flag if customer sounds upset or mentions a complaint
- Do NOT repeat information the agent already said
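
Filling the template is worth pinning down, since `{placeholders}` with no matching context field should survive rather than render as `undefined`. A sketch (the function name is hypothetical; call-assist.service.ts may structure this differently):

```typescript
// Replaces {key} placeholders with values from the loaded lead context.
// Unknown placeholders are left intact so a missing field is visible in
// the prompt instead of silently becoming "undefined".
function buildSystemPrompt(template: string, ctx: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match, key) => ctx[key] ?? match);
}
```

Multi-word placeholders such as {doctorList with departments and clinics} would need to be normalized to single identifiers before this regex matches them.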

OpenAI call (every 10 seconds):

const response = await openai.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [
        { role: 'system', content: systemPrompt },
        { role: 'user', content: `Conversation so far:\n${transcript}\n\nProvide a brief suggestion for the agent.` },
    ],
    max_tokens: 150,
});

3. Frontend: Live transcript sidebar

Replace the AI chat tab content during active calls with a live transcript view:

  • Scrolling transcript with timestamps
  • Customer lines in one color, suggestions in a highlighted card
  • Auto-scroll to bottom as new lines arrive
  • Suggestions appear as colored cards between transcript lines
  • When call ends, transcript stays visible for reference during disposition

4. Context loading

On start message, the sidecar queries the platform for:

# Lead details
{ leads(filter: { id: { eq: "{leadId}" } }) { edges { node { ... } } } }

# Past appointments
{ appointments(filter: { patientId: { eq: "{leadId}" } }) { edges { node { ... } } } }

# Doctors
{ doctors(first: 20) { edges { node { id fullName department clinic } } } }

This context is loaded once and injected into the system prompt. No mid-call refresh needed.
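
Interpolating the leadId into those queries can be centralized in one helper; this sketch uses a minimal node selection as a stand-in for the fields elided above:

```typescript
// Builds the three platform queries from the spec. The node selections
// here ("id", etc.) are placeholders; the real service would request
// the full field sets its prompt builder needs.
function buildContextQueries(leadId: string): { lead: string; appointments: string; doctors: string } {
  return {
    lead: `{ leads(filter: { id: { eq: "${leadId}" } }) { edges { node { id } } } }`,
    appointments: `{ appointments(filter: { patientId: { eq: "${leadId}" } }) { edges { node { id } } } }`,
    doctors: `{ doctors(first: 20) { edges { node { id fullName department clinic } } } }`,
  };
}
```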

File structure

Sidecar (helix-engage-server)

| File | Responsibility |
| --- | --- |
| src/call-assist/call-assist.gateway.ts | WebSocket gateway — handles audio streaming, Deepgram connection, OpenAI calls |
| src/call-assist/call-assist.module.ts | Module registration |
| src/call-assist/call-assist.service.ts | Context loading from platform, OpenAI prompt building |

Frontend (helix-engage)

| File | Responsibility |
| --- | --- |
| src/lib/audio-capture.ts | AudioWorklet to capture + downsample remote audio track |
| src/hooks/use-call-assist.ts | WebSocket connection to sidecar, manages transcript + suggestion state |
| src/components/call-desk/live-transcript.tsx | Scrolling transcript + suggestion cards UI |
| src/components/call-desk/context-panel.tsx | Modify: show LiveTranscript instead of AiChatPanel during active calls |
| src/pages/call-desk.tsx | Modify: remove CallPrepCard during active calls |

Dependencies

  • Deepgram SDK: @deepgram/sdk in sidecar (or raw WebSocket)
  • DEEPGRAM_API_KEY: environment variable in sidecar
  • AudioWorklet: browser API, no dependencies (supported in all modern browsers)
  • OpenAI: already configured in sidecar (gpt-4o-mini)

Cost estimate

Per 5-minute call:

  • Deepgram Nova: ~$0.02 (at $0.0043/min)
  • OpenAI gpt-4o-mini: ~$0.005 (30 calls × ~500 tokens each)
  • Total: $0.025 per call (₹2)
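
The arithmetic above, reproduced as a function (the 10-second cadence and both rates come from this spec; scaling linearly with call length is an assumption):

```typescript
// Per-call cost in USD for a call of the given length in minutes.
function perCallCostUSD(minutes: number): number {
  const deepgram = minutes * 0.0043;             // Nova streaming, $/min
  const suggestionCalls = (minutes * 60) / 10;   // one OpenAI call per 10 s
  const openai = 0.005 * (suggestionCalls / 30); // scaled from the 5-min estimate
  return deepgram + openai;
}
```

The unrounded 5-minute total is ~$0.0265; the $0.025 above rounds the Deepgram figure down to $0.02 first.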

Out of scope

  • Agent mic transcription (only customer audio for now — agent's words are visible in the AI suggestions context)
  • Voice response from AI (text only)
  • Persistent transcript storage (future: save to Call record after call ends)
  • Multi-language support (English only for now)