helix-engage/docs/superpowers/specs/2026-03-21-live-call-assist-design.md
saridsa2 3064eeb444 feat: CC agent features, live call assist, worklist redesign, brand tokens
CC Agent:
- Call transfer (CONFERENCE + KICK_CALL) with inline transfer dialog
- Recording pause/resume during active calls
- Missed calls API (Ozonetel abandonCalls)
- Call history API (Ozonetel fetchCDRDetails)

Live Call Assist:
- Deepgram Nova STT via raw WebSocket
- OpenAI suggestions every 10s with lead context
- LiveTranscript component in sidebar during calls
- Browser audio capture from remote WebRTC stream

Worklist:
- Redesigned table: clickable phones, context menu (Call/SMS/WhatsApp)
- Last interaction sub-line, source column, improved SLA
- Filtered out rows without phone numbers
- New missed call notifications

Brand:
- Logo on login page
- Blue scale rebuilt from logo blue rgb(32, 96, 160)
- FontAwesome duotone CSS variables set globally
- Profile menu icons switched to duotone

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 10:36:10 +05:30

# Live Call Assist — Design Spec
## Problem
CC agents have no real-time intelligence during calls. The AI sidebar shows a static pre-call summary and a chat interface that requires manual typing — useless while the agent is on the phone. The agent has to recall lead history, doctor availability, and past interactions from memory.
## Solution
Stream the call's remote audio (customer voice) to the sidecar, transcribe via Deepgram Nova, and every 10 seconds feed the accumulated transcript + full lead context to OpenAI for real-time suggestions. Display a scrolling transcript with AI suggestion cards in the sidebar.
## Architecture
```
Browser (WebRTC call)
 └─ Remote audio track (customer) ──► AudioWorklet (PCM 16-bit, 16 kHz)
                                                │
                                                │ WebSocket to sidecar
                                                ▼
                                     ┌─────────────────────┐
                                     │   Sidecar Gateway   │
                                     │ ws://api/call-assist│
                                     └──────────┬──────────┘
                                                │
                              ┌─────────────────┴─────────────────┐
                              ▼                                   ▼
                     Deepgram Nova WS                   Every 10s: OpenAI
                     (audio → text)                 (transcript + context
                              │                         → suggestions)
                              ▼                                   ▼
                     Transcript lines                 AI suggestion cards
                              │                                   │
                              └─────────────────┬─────────────────┘
                                                │
                                                │ WebSocket to browser
                                                ▼
                                           AI Sidebar
                                  (transcript + suggestions)
```
## Components
### 1. Browser: Audio capture + WebSocket client
**Audio capture**: When call becomes `active`, grab the remote audio track from the peer connection. Use an `AudioWorklet` to downsample to 16-bit PCM at 16kHz (Deepgram's preferred format). Send raw audio chunks (~100ms each) over WebSocket.
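The downsample-and-quantize step can be sketched as a pure function (illustrative names, not the actual worklet; a production `AudioWorkletProcessor` would run this inside `process()` and apply a low-pass filter before decimating):

```typescript
// Convert a Float32 audio buffer at the source sample rate to 16-bit PCM at
// 16 kHz by nearest-sample decimation. Sketch only — names are hypothetical.
function downsampleToPCM16(
  input: Float32Array,
  inputRate: number,
  targetRate = 16000,
): Int16Array {
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Clamp to [-1, 1], then scale to the signed 16-bit range.
    const sample = Math.max(-1, Math.min(1, input[Math.floor(i * ratio)]));
    out[i] = sample < 0 ? sample * 0x8000 : sample * 0x7fff;
  }
  return out;
}
```

At 48 kHz input, ~100 ms of audio is 4800 float samples in and 1600 PCM samples (3200 bytes) out per WebSocket frame.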
**WebSocket client**: Connects to `wss://engage-api.srv1477139.hstgr.cloud/api/call-assist`. Sends:
- Initial message: `{ type: "start", ucid, leadId, callerPhone }`
- Audio chunks: binary PCM data
- End: `{ type: "stop" }`
Receives:
- `{ type: "transcript", text: "...", isFinal: boolean }` — real-time transcript lines
- `{ type: "suggestion", text: "...", action?: "book_appointment" | "transfer" }` — AI suggestions
- `{ type: "context_loaded", leadName: "...", summary: "..." }` — confirmation that lead context was loaded
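The wire protocol above can be captured as a discriminated union on the client, with a small guard for incoming JSON frames (binary frames are audio and bypass this path; the helper name is illustrative):

```typescript
// Message shapes mirroring the protocol described above.
type ClientMessage =
  | { type: 'start'; ucid: string; leadId: string; callerPhone: string }
  | { type: 'stop' };

type ServerMessage =
  | { type: 'transcript'; text: string; isFinal: boolean }
  | { type: 'suggestion'; text: string; action?: 'book_appointment' | 'transfer' }
  | { type: 'context_loaded'; leadName: string; summary: string };

// Narrow an incoming text frame to a known server message, or null.
function parseServerMessage(raw: string): ServerMessage | null {
  try {
    const msg = JSON.parse(raw);
    if (msg && ['transcript', 'suggestion', 'context_loaded'].includes(msg.type)) {
      return msg as ServerMessage;
    }
    return null;
  } catch {
    return null;
  }
}
```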
### 2. Sidecar: WebSocket Gateway
**NestJS WebSocket Gateway** at `/api/call-assist`. On connection:
1. Receives `start` message with `ucid`, `leadId`, `callerPhone`
2. Loads lead context from platform: lead details, past calls, appointments, doctors, follow-ups
3. Opens Deepgram Nova WebSocket (`wss://api.deepgram.com/v1/listen`)
4. Pipes incoming audio chunks to Deepgram
5. Deepgram returns transcript chunks — forwards to browser
6. Every 10 seconds, sends accumulated transcript + lead context to OpenAI `gpt-4o-mini` for suggestions
7. Returns suggestions to browser
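Steps 4–6 hinge on parsing Deepgram result frames and accumulating only finalized segments for the 10-second OpenAI window. A minimal sketch, assuming Deepgram's documented `/v1/listen` response shape (`channel.alternatives[0].transcript`, `is_final`):

```typescript
// Accumulated finalized transcript, joined into the OpenAI prompt every 10 s.
const transcript: string[] = [];

// Parse one Deepgram result frame; accumulate final segments only, since
// interim hypotheses are superseded by later frames. Names are illustrative.
function handleDeepgramFrame(frame: string): { text: string; isFinal: boolean } | null {
  const msg = JSON.parse(frame);
  const text = msg.channel?.alternatives?.[0]?.transcript;
  if (typeof text !== 'string' || text.length === 0) return null;
  const isFinal = Boolean(msg.is_final);
  if (isFinal) transcript.push(text);
  return { text, isFinal };
}
```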
**System prompt for OpenAI** (loaded once with lead context):
```
You are a real-time call assistant for Global Hospital Bangalore.
You listen to the conversation and provide brief, actionable suggestions.
CALLER CONTEXT:
- Name: {leadName}
- Phone: {phone}
- Source: {source} ({campaign})
- Previous calls: {callCount} (last: {lastCallDate}, disposition: {lastDisposition})
- Appointments: {appointmentHistory}
- Interested in: {interestedService}
- AI Summary: {aiSummary}
AVAILABLE RESOURCES:
- Doctors: {doctorList with departments and clinics}
- Next available slots: {availableSlots}
RULES:
- Keep suggestions under 2 sentences
- Focus on actionable next steps
- If customer mentions a doctor/department, show available slots
- If customer wants to cancel, note the appointment ID
- Flag if customer sounds upset or mentions a complaint
- Do NOT repeat information the agent already said
```
**OpenAI call** (every 10 seconds):
```typescript
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: `Conversation so far:\n${transcript}\n\nProvide a brief suggestion for the agent.` },
  ],
  max_tokens: 150,
});
```
### 3. Frontend: Live transcript sidebar
Replace the AI chat tab content during active calls with a live transcript view:
- Scrolling transcript with timestamps
- Customer lines in one color, suggestions in a highlighted card
- Auto-scroll to bottom as new lines arrive
- Suggestions appear as colored cards between transcript lines
- When call ends, transcript stays visible for reference during disposition
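One behavior worth pinning down in `use-call-assist` state: interim Deepgram results should overwrite the previous interim line rather than stack up, while final results stay. A sketch of that update rule (hypothetical names; immutable returns so React re-renders):

```typescript
// One entry in the sidebar feed: a transcript line or a suggestion card.
type FeedItem = { kind: 'line' | 'suggestion'; text: string; interim?: boolean };

// Append a transcript update. A trailing interim line is replaced by the
// newer hypothesis; anything else gets a fresh entry.
function pushTranscript(feed: FeedItem[], text: string, isFinal: boolean): FeedItem[] {
  const last = feed[feed.length - 1];
  const next: FeedItem = { kind: 'line', text, interim: !isFinal };
  if (last && last.kind === 'line' && last.interim) {
    return [...feed.slice(0, -1), next];
  }
  return [...feed, next];
}
```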
### 4. Context loading
On `start` message, the sidecar queries the platform for:
```graphql
# Lead details
{ leads(filter: { id: { eq: "{leadId}" } }) { edges { node { ... } } } }
# Past appointments
{ appointments(filter: { patientId: { eq: "{leadId}" } }) { edges { node { ... } } } }
# Doctors
{ doctors(first: 20) { edges { node { id fullName department clinic } } } }
```
This context is loaded once and injected into the system prompt. No mid-call refresh needed.
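The injection step is plain template substitution over the `{placeholder}` names in the system prompt. A minimal sketch, assuming simple single-word placeholders (the helper name and fallback are illustrative):

```typescript
// Fill {placeholder} slots in the prompt template from loaded lead context.
// Missing keys fall back to "unknown" so the prompt never ships raw braces.
function buildSystemPrompt(template: string, ctx: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (_, key: string) => ctx[key] ?? 'unknown');
}
```

Usage: `buildSystemPrompt(promptTemplate, { leadName, phone, source, ... })`, run once per call on the `start` message, before the first OpenAI request.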
## File structure
### Sidecar (helix-engage-server)
| File | Responsibility |
|------|---------------|
| `src/call-assist/call-assist.gateway.ts` | WebSocket gateway — handles audio streaming, Deepgram connection, OpenAI calls |
| `src/call-assist/call-assist.module.ts` | Module registration |
| `src/call-assist/call-assist.service.ts` | Context loading from platform, OpenAI prompt building |
### Frontend (helix-engage)
| File | Responsibility |
|------|---------------|
| `src/lib/audio-capture.ts` | AudioWorklet to capture + downsample remote audio track |
| `src/hooks/use-call-assist.ts` | WebSocket connection to sidecar, manages transcript + suggestion state |
| `src/components/call-desk/live-transcript.tsx` | Scrolling transcript + suggestion cards UI |
| `src/components/call-desk/context-panel.tsx` | Modify: show LiveTranscript instead of AiChatPanel during active calls |
| `src/pages/call-desk.tsx` | Modify: remove CallPrepCard during active calls |
## Dependencies
- **Deepgram SDK**: `@deepgram/sdk` in sidecar (or raw WebSocket)
- **DEEPGRAM_API_KEY**: environment variable in sidecar
- **AudioWorklet**: browser API, no dependencies (supported in all modern browsers)
- **OpenAI**: already configured in sidecar (`gpt-4o-mini`)
## Cost estimate
Per 5-minute call:
- Deepgram Nova: ~$0.02 (at $0.0043/min)
- OpenAI gpt-4o-mini: ~$0.005 (30 calls × ~500 tokens each)
- Total: ~$0.025 per call (~₹2)
## Out of scope
- Agent mic transcription (only customer audio for now — agent's words are visible in the AI suggestions context)
- Voice response from AI (text only)
- Persistent transcript storage (future: save to Call record after call ends)
- Multi-language support (English only for now)