# Live Call Assist — Design Spec

## Problem

CC agents have no real-time intelligence during calls. The AI sidebar shows a static pre-call summary and a chat interface that requires manual typing — useless when the agent is on the phone. The agent must rely on memory for lead history, doctor availability, and past interactions.

## Solution

Stream the call's remote audio (the customer's voice) to the sidecar, transcribe it via Deepgram Nova, and every 10 seconds feed the accumulated transcript plus full lead context to OpenAI for real-time suggestions. Display a scrolling transcript with AI suggestion cards in the sidebar.

## Architecture

```
Browser (WebRTC call)
│
├─ Remote audio track (customer) ──► AudioWorklet (PCM 16-bit, 16kHz)
│                                              │
│                                    WebSocket to sidecar
│                                              │
│                                   ┌──────────▼──────────┐
│                                   │   Sidecar Gateway   │
│                                   │ ws://api/call-assist│
│                                   └──────────┬──────────┘
│                                              │
│                      ┌───────────────────────┼───────────────────────┐
│                      ▼                                               ▼
│              Deepgram Nova WS                              Every 10s: OpenAI
│              (audio → text)                                (transcript + context
│                      │                                      → suggestions)
│                      ▼                                               ▼
│              Transcript lines                              AI suggestion cards
│                      │                                               │
│                      └───────────────────┬───────────────────────────┘
│                                          │
│                               WebSocket to browser
│                                          │
└──────────────────────────────────────────▼
                AI Sidebar (transcript + suggestions)
```

## Components

### 1. Browser: Audio capture + WebSocket client

**Audio capture**: When the call becomes `active`, grab the remote audio track from the peer connection. Use an `AudioWorklet` to downsample to 16-bit PCM at 16kHz (Deepgram's preferred format). Send raw audio chunks (~100ms each) over WebSocket.

**WebSocket client**: Connects to `wss://engage-api.srv1477139.hstgr.cloud/api/call-assist`.

Sends:

- Initial message: `{ type: "start", ucid, leadId, callerPhone }`
- Audio chunks: binary PCM data
- End: `{ type: "stop" }`

Receives:

- `{ type: "transcript", text: "...", isFinal: boolean }` — real-time transcript lines
- `{ type: "suggestion", text: "...", action?: "book_appointment" | "transfer" }` — AI suggestions
- `{ type: "context_loaded", leadName: "...", summary: "..." }` — confirmation that lead context was loaded
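The downsampling step in the capture path can be sketched as a pure function, roughly what the `AudioWorklet` processor would run per block of samples. This is a sketch under assumptions: the 48 kHz input rate (the usual WebRTC rate), the function name, and the naive every-Nth-sample decimation are all illustrative; a production processor would buffer across 128-frame render quanta and apply a low-pass filter before decimating.

```typescript
/**
 * Convert one Float32 audio frame (assumed 48 kHz) to 16-bit PCM at 16 kHz.
 * Hypothetical helper, not the actual audio-capture.ts API.
 */
function downsampleTo16kPcm(
  input: Float32Array,
  inputRate = 48000,
  targetRate = 16000,
): Int16Array {
  const ratio = inputRate / targetRate; // 3 for 48 kHz → 16 kHz
  const outLength = Math.floor(input.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, input[Math.floor(i * ratio)]));
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return out;
}
```

At 16 kHz / 16-bit mono, a ~100ms chunk is 3,200 bytes, which is what each binary WebSocket frame would carry.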
### 2. Sidecar: WebSocket Gateway

**NestJS WebSocket Gateway** at `/api/call-assist`. On connection:

1. Receives `start` message with `ucid`, `leadId`, `callerPhone`
2. Loads lead context from the platform: lead details, past calls, appointments, doctors, follow-ups
3. Opens a Deepgram Nova WebSocket (`wss://api.deepgram.com/v1/listen`)
4. Pipes incoming audio chunks to Deepgram
5. Deepgram returns transcript chunks — forwards them to the browser
6. Every 10 seconds, sends the accumulated transcript + lead context to OpenAI `gpt-4o-mini` for suggestions
7. Returns suggestions to the browser

**System prompt for OpenAI** (loaded once with lead context):

```
You are a real-time call assistant for Global Hospital Bangalore.
You listen to the conversation and provide brief, actionable suggestions.

CALLER CONTEXT:
- Name: {leadName}
- Phone: {phone}
- Source: {source} ({campaign})
- Previous calls: {callCount} (last: {lastCallDate}, disposition: {lastDisposition})
- Appointments: {appointmentHistory}
- Interested in: {interestedService}
- AI Summary: {aiSummary}

AVAILABLE RESOURCES:
- Doctors: {doctorList with departments and clinics}
- Next available slots: {availableSlots}

RULES:
- Keep suggestions under 2 sentences
- Focus on actionable next steps
- If customer mentions a doctor/department, show available slots
- If customer wants to cancel, note the appointment ID
- Flag if customer sounds upset or mentions a complaint
- Do NOT repeat information the agent already said
```

**OpenAI call** (every 10 seconds):

```typescript
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: systemPrompt },
    {
      role: 'user',
      content: `Conversation so far:\n${transcript}\n\nProvide a brief suggestion for the agent.`,
    },
  ],
  max_tokens: 150,
});
```
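The 10-second loop in steps 4–7 hinges on a transcript accumulator the gateway drains on a timer. A minimal sketch, with the caveat that the class and method names are illustrative rather than the sidecar's real API:

```typescript
/**
 * Illustrative accumulator for the gateway's suggestion loop: keep only
 * Deepgram's finalized transcript chunks, and let a 10-second interval
 * drain the conversation-so-far into the OpenAI user message.
 */
class TranscriptBuffer {
  private lines: string[] = [];

  /** Called per Deepgram result; interim (non-final) chunks are not kept. */
  append(text: string, isFinal: boolean): void {
    if (isFinal && text.trim().length > 0) this.lines.push(text.trim());
  }

  /** The "Conversation so far" body for the periodic OpenAI call. */
  toPrompt(): string {
    return this.lines.join('\n');
  }

  get isEmpty(): boolean {
    return this.lines.length === 0;
  }
}

// The gateway would wire it up roughly like (askOpenAiForSuggestion is hypothetical):
//   setInterval(() => {
//     if (!buffer.isEmpty) askOpenAiForSuggestion(buffer.toPrompt());
//   }, 10_000);
```

Keeping only finalized chunks matters because Deepgram's streaming API revises interim results as more audio arrives; feeding half-corrected words to the model every 10 seconds would produce noisier suggestions.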
### 3. Frontend: Live transcript sidebar

Replace the AI chat tab content during active calls with a live transcript view:

- Scrolling transcript with timestamps
- Customer lines in one color, suggestions in a highlighted card
- Auto-scroll to the bottom as new lines arrive
- Suggestions appear as colored cards between transcript lines
- When the call ends, the transcript stays visible for reference during disposition

### 4. Context loading

On the `start` message, the sidecar queries the platform for:

```graphql
# Lead details
{ leads(filter: { id: { eq: "{leadId}" } }) { edges { node { ... } } } }

# Past appointments
{ appointments(filter: { patientId: { eq: "{leadId}" } }) { edges { node { ... } } } }

# Doctors
{ doctors(first: 20) { edges { node { id fullName department clinic } } } }
```

This context is loaded once and injected into the system prompt. No mid-call refresh is needed.

## File structure

### Sidecar (helix-engage-server)

| File | Responsibility |
|------|----------------|
| `src/call-assist/call-assist.gateway.ts` | WebSocket gateway — handles audio streaming, Deepgram connection, OpenAI calls |
| `src/call-assist/call-assist.module.ts` | Module registration |
| `src/call-assist/call-assist.service.ts` | Context loading from platform, OpenAI prompt building |

### Frontend (helix-engage)

| File | Responsibility |
|------|----------------|
| `src/lib/audio-capture.ts` | AudioWorklet to capture + downsample the remote audio track |
| `src/hooks/use-call-assist.ts` | WebSocket connection to sidecar; manages transcript + suggestion state |
| `src/components/call-desk/live-transcript.tsx` | Scrolling transcript + suggestion cards UI |
| `src/components/call-desk/context-panel.tsx` | Modify: show LiveTranscript instead of AiChatPanel during active calls |
| `src/pages/call-desk.tsx` | Modify: remove CallPrepCard during active calls |

## Dependencies

- **Deepgram SDK**: `@deepgram/sdk` in sidecar (or a raw WebSocket)
- **DEEPGRAM_API_KEY**: environment variable in sidecar
- **AudioWorklet**: browser API, no dependencies (supported in all modern browsers)
- **OpenAI**: already configured in sidecar (`gpt-4o-mini`)

## Cost estimate

Per 5-minute call:

- Deepgram Nova: ~$0.02 (at $0.0043/min)
- OpenAI gpt-4o-mini: ~$0.005 (30 calls × ~500 tokens each)
- Total: ~$0.025 per call (~₹2)

## Out of scope

- Agent mic transcription (only customer audio for now — agent's words are visible in the AI suggestions context)
- Voice response from AI (text only)
- Persistent transcript storage (future: save to the Call record after the call ends)
- Multi-language support (English only for now)
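As a closing sketch, the state that `src/hooks/use-call-assist.ts` manages can be modeled as a pure reducer over the three message types the gateway sends. The message shapes come from the spec; the state shape, `interim` field, and function names are assumptions for illustration:

```typescript
// Messages as defined in the WebSocket protocol of section 1.
type AssistMessage =
  | { type: 'transcript'; text: string; isFinal: boolean }
  | { type: 'suggestion'; text: string; action?: 'book_appointment' | 'transfer' }
  | { type: 'context_loaded'; leadName: string; summary: string };

// Hypothetical sidebar state: finalized lines plus one overwritable interim fragment.
interface AssistState {
  leadName?: string;
  lines: { kind: 'transcript' | 'suggestion'; text: string }[];
  interim?: string;
}

function reduce(state: AssistState, msg: AssistMessage): AssistState {
  switch (msg.type) {
    case 'transcript':
      // Interim results replace each other; final results append a line.
      return msg.isFinal
        ? { ...state, interim: undefined, lines: [...state.lines, { kind: 'transcript', text: msg.text }] }
        : { ...state, interim: msg.text };
    case 'suggestion':
      // Suggestions interleave with transcript lines, rendered as cards.
      return { ...state, lines: [...state.lines, { kind: 'suggestion', text: msg.text }] };
    case 'context_loaded':
      return { ...state, leadName: msg.leadName };
  }
}
```

Keeping the reducer pure makes the auto-scroll and "transcript stays visible after the call" behaviors trivial: the component only ever re-renders from `lines`, and nothing clears the state on `stop`.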