mirror of
https://dev.azure.com/globalhealthx/EMR/_git/helix-engage
synced 2026-04-11 18:28:15 +00:00
feat: CC agent features, live call assist, worklist redesign, brand tokens
CC Agent:
- Call transfer (CONFERENCE + KICK_CALL) with inline transfer dialog
- Recording pause/resume during active calls
- Missed calls API (Ozonetel abandonCalls)
- Call history API (Ozonetel fetchCDRDetails)

Live Call Assist:
- Deepgram Nova STT via raw WebSocket
- OpenAI suggestions every 10s with lead context
- LiveTranscript component in sidebar during calls
- Browser audio capture from remote WebRTC stream

Worklist:
- Redesigned table: clickable phones, context menu (Call/SMS/WhatsApp)
- Last interaction sub-line, source column, improved SLA
- Filtered out rows without phone numbers
- New missed call notifications

Brand:
- Logo on login page
- Blue scale rebuilt from logo blue rgb(32, 96, 160)
- FontAwesome duotone CSS variables set globally
- Profile menu icons switched to duotone

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
docs/superpowers/specs/2026-03-21-live-call-assist-design.md — 173 lines, new file
# Live Call Assist — Design Spec

## Problem

CC agents have no real-time intelligence during calls. The AI sidebar shows a static pre-call summary and a chat interface that requires manual typing — useless when the agent is on the phone. The agent must recall lead history, doctor availability, and past interactions from memory.

## Solution

Stream the call's remote audio (customer voice) to the sidecar, transcribe it via Deepgram Nova, and every 10 seconds feed the accumulated transcript plus full lead context to OpenAI for real-time suggestions. Display a scrolling transcript with AI suggestion cards in the sidebar.

## Architecture

```
Browser (WebRTC call)
│
├─ Remote audio track (customer) ──► AudioWorklet (PCM 16-bit, 16kHz)
│                                               │
│                                      WebSocket to sidecar
│                                               │
│                                    ┌──────────▼──────────┐
│                                    │   Sidecar Gateway   │
│                                    │ ws://api/call-assist│
│                                    └──────────┬──────────┘
│                                               │
│                                ┌──────────────┼──────────────┐
│                                ▼                             ▼
│                        Deepgram Nova WS             Every 10s: OpenAI
│                        (audio → text)              (transcript + context
│                                │                      → suggestions)
│                                ▼                             ▼
│                        Transcript lines             AI suggestion cards
│                                │                             │
│                                └──────────────┬──────────────┘
│                                               │
│                                     WebSocket to browser
│                                               │
└───────────────────────────────────────────────▼
                                            AI Sidebar
                                   (transcript + suggestions)
```

## Components

### 1. Browser: Audio capture + WebSocket client

**Audio capture**: When the call becomes `active`, grab the remote audio track from the peer connection. Use an `AudioWorklet` to downsample to 16-bit PCM at 16kHz (Deepgram's preferred format). Send raw audio chunks (~100ms each) over WebSocket.
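As a sketch of the downsampling step (the helper name is illustrative, not the actual `src/lib/audio-capture.ts` API), the worklet's core conversion from Float32 samples to 16-bit PCM at 16kHz could look like:

```typescript
// Downsample Float32 samples from the AudioContext rate (typically 48kHz)
// to 16-bit PCM at 16kHz by decimation — the format this spec sends to the sidecar.
// Hypothetical helper for illustration only.
function downsampleToPcm16k(
  input: Float32Array,
  inputRate: number,
  targetRate: number = 16000,
): Int16Array {
  const ratio = inputRate / targetRate;
  const outLength = Math.floor(input.length / ratio);
  const out = new Int16Array(outLength);
  for (let i = 0; i < outLength; i++) {
    // Nearest-sample decimation; a production worklet would low-pass filter first
    // to avoid aliasing.
    const sample = input[Math.floor(i * ratio)];
    const clamped = Math.max(-1, Math.min(1, sample));
    out[i] = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
  }
  return out;
}
```

At 48kHz input, a ~100ms chunk is 4800 floats in and 1600 PCM samples out, which matches the chunk cadence described above.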

**WebSocket client**: Connects to `wss://engage-api.srv1477139.hstgr.cloud/api/call-assist`. Sends:

- Initial message: `{ type: "start", ucid, leadId, callerPhone }`
- Audio chunks: binary PCM data
- End: `{ type: "stop" }`

Receives:

- `{ type: "transcript", text: "...", isFinal: boolean }` — real-time transcript lines
- `{ type: "suggestion", text: "...", action?: "book_appointment" | "transfer" }` — AI suggestions
- `{ type: "context_loaded", leadName: "...", summary: "..." }` — confirmation that lead context was loaded
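These server messages can be modeled as a discriminated union on the client. A minimal sketch (the type and function names are assumptions, not the actual `use-call-assist` hook API):

```typescript
// Server → client messages, as listed in this spec.
type ServerMessage =
  | { type: 'transcript'; text: string; isFinal: boolean }
  | { type: 'suggestion'; text: string; action?: 'book_appointment' | 'transfer' }
  | { type: 'context_loaded'; leadName: string; summary: string };

// Hypothetical client-side state shape for the sidebar.
interface AssistState {
  finalLines: string[]; // finalized transcript lines
  interim: string;      // current in-progress (non-final) line
  suggestions: string[];
  leadName?: string;
}

// Pure reducer: non-final transcripts overwrite the interim line,
// final ones are appended; suggestions accumulate as cards.
function reduceMessage(state: AssistState, msg: ServerMessage): AssistState {
  switch (msg.type) {
    case 'transcript':
      return msg.isFinal
        ? { ...state, finalLines: [...state.finalLines, msg.text], interim: '' }
        : { ...state, interim: msg.text };
    case 'suggestion':
      return { ...state, suggestions: [...state.suggestions, msg.text] };
    case 'context_loaded':
      return { ...state, leadName: msg.leadName };
  }
}
```

Keeping this as a pure reducer makes the interim-vs-final transcript behavior easy to unit test independently of the WebSocket.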

### 2. Sidecar: WebSocket Gateway

**NestJS WebSocket Gateway** at `/api/call-assist`. On connection:

1. Receives the `start` message with `ucid`, `leadId`, `callerPhone`
2. Loads lead context from the platform: lead details, past calls, appointments, doctors, follow-ups
3. Opens a Deepgram Nova WebSocket (`wss://api.deepgram.com/v1/listen`)
4. Pipes incoming audio chunks to Deepgram
5. Forwards Deepgram's transcript chunks to the browser
6. Every 10 seconds, sends the accumulated transcript + lead context to OpenAI `gpt-4o-mini` for suggestions
7. Returns suggestions to the browser
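The 10-second cadence in steps 6–7 can be isolated from the timer itself. A sketch (the class name is an assumption, not the actual gateway code): accumulate final transcript chunks and, on each tick, only call OpenAI when new material has arrived.

```typescript
// Hypothetical scheduler for the gateway's suggestion loop.
// Accumulates finalized transcript lines; tick() is driven by a 10s interval
// and returns the transcript to send to OpenAI, or null to skip the call.
class SuggestionScheduler {
  private lines: string[] = [];
  private lastSentCount = 0;

  addFinalTranscript(text: string): void {
    if (text.trim()) this.lines.push(text.trim());
  }

  tick(): string | null {
    // Nothing new since the last tick → no OpenAI call, no wasted tokens.
    if (this.lines.length === this.lastSentCount) return null;
    this.lastSentCount = this.lines.length;
    return this.lines.join('\n');
  }
}
```

Skipping silent intervals keeps the worst case at 30 OpenAI calls per 5-minute call (the figure used in the cost estimate below) rather than guaranteeing it.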

**System prompt for OpenAI** (loaded once with lead context):

```
You are a real-time call assistant for Global Hospital Bangalore.
You listen to the conversation and provide brief, actionable suggestions.

CALLER CONTEXT:
- Name: {leadName}
- Phone: {phone}
- Source: {source} ({campaign})
- Previous calls: {callCount} (last: {lastCallDate}, disposition: {lastDisposition})
- Appointments: {appointmentHistory}
- Interested in: {interestedService}
- AI Summary: {aiSummary}

AVAILABLE RESOURCES:
- Doctors: {doctorList with departments and clinics}
- Next available slots: {availableSlots}

RULES:
- Keep suggestions under 2 sentences
- Focus on actionable next steps
- If customer mentions a doctor/department, show available slots
- If customer wants to cancel, note the appointment ID
- Flag if customer sounds upset or mentions a complaint
- Do NOT repeat information the agent already said
```
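Filling the template above is a simple substitution. A sketch (the helper name is illustrative; the real builder would live in `call-assist.service.ts`): replace each simple `{word}` placeholder with its context value, leaving unmatched or multi-word placeholders intact.

```typescript
// Hypothetical template filler for the system prompt above.
// Only simple {word} placeholders are substituted; anything without a value
// in the context map (or with spaces, like the doctorList placeholder as
// written above) passes through unchanged.
function fillPromptTemplate(
  template: string,
  context: Record<string, string>,
): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) =>
    key in context ? context[key] : match,
  );
}
```

Leaving unknown placeholders visible (rather than substituting an empty string) makes missing context easy to spot in logs.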

**OpenAI call** (every 10 seconds):

```typescript
const response = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'system', content: systemPrompt },
    { role: 'user', content: `Conversation so far:\n${transcript}\n\nProvide a brief suggestion for the agent.` },
  ],
  max_tokens: 150,
});
```

### 3. Frontend: Live transcript sidebar

Replace the AI chat tab content during active calls with a live transcript view:

- Scrolling transcript with timestamps
- Customer lines in one color, suggestions in a highlighted card
- Auto-scroll to bottom as new lines arrive
- Suggestions appear as colored cards between transcript lines
- When the call ends, the transcript stays visible for reference during disposition

### 4. Context loading

On the `start` message, the sidecar queries the platform for:

```graphql
# Lead details
{ leads(filter: { id: { eq: "{leadId}" } }) { edges { node { ... } } } }

# Past appointments
{ appointments(filter: { patientId: { eq: "{leadId}" } }) { edges { node { ... } } } }

# Doctors
{ doctors(first: 20) { edges { node { id fullName department clinic } } } }
```

This context is loaded once and injected into the system prompt. No mid-call refresh is needed.
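Mapping the query results onto the prompt's placeholder variables is straightforward. A sketch under assumed shapes (the interfaces and helper name are illustrative, not the platform schema):

```typescript
// Hypothetical node shapes for the queries above — illustration only.
interface LeadNode {
  fullName: string;
  phone: string;
  source: string;
  campaign: string;
}
interface DoctorNode {
  fullName: string;
  department: string;
  clinic: string;
}

// Flatten query results into the {placeholder} variables used by the system prompt.
function buildContextVariables(
  lead: LeadNode,
  doctors: DoctorNode[],
): Record<string, string> {
  return {
    leadName: lead.fullName,
    phone: lead.phone,
    source: lead.source,
    campaign: lead.campaign,
    doctorList: doctors
      .map((d) => `${d.fullName} (${d.department}, ${d.clinic})`)
      .join('; '),
  };
}
```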

## File structure

### Sidecar (helix-engage-server)

| File | Responsibility |
|------|---------------|
| `src/call-assist/call-assist.gateway.ts` | WebSocket gateway — handles audio streaming, Deepgram connection, OpenAI calls |
| `src/call-assist/call-assist.module.ts` | Module registration |
| `src/call-assist/call-assist.service.ts` | Context loading from platform, OpenAI prompt building |

### Frontend (helix-engage)

| File | Responsibility |
|------|---------------|
| `src/lib/audio-capture.ts` | AudioWorklet to capture + downsample the remote audio track |
| `src/hooks/use-call-assist.ts` | WebSocket connection to sidecar; manages transcript + suggestion state |
| `src/components/call-desk/live-transcript.tsx` | Scrolling transcript + suggestion cards UI |
| `src/components/call-desk/context-panel.tsx` | Modify: show LiveTranscript instead of AiChatPanel during active calls |
| `src/pages/call-desk.tsx` | Modify: remove CallPrepCard during active calls |

## Dependencies

- **Deepgram SDK**: `@deepgram/sdk` in the sidecar (or a raw WebSocket)
- **DEEPGRAM_API_KEY**: environment variable in the sidecar
- **AudioWorklet**: browser API, no dependencies (supported in all modern browsers)
- **OpenAI**: already configured in the sidecar (`gpt-4o-mini`)

## Cost estimate

Per 5-minute call:

- Deepgram Nova: ~$0.02 (at $0.0043/min)
- OpenAI gpt-4o-mini: ~$0.005 (30 calls × ~500 tokens each)
- Total: ~$0.025 per call (~₹2)
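The arithmetic behind these figures, using the rates as stated in this spec:

```typescript
// Sanity check of the per-call cost estimate above.
const callMinutes = 5;
const deepgramRatePerMin = 0.0043;                 // Deepgram Nova, $/min (as stated above)
const sttCost = callMinutes * deepgramRatePerMin;  // 0.0215 → "~$0.02"
const suggestionCalls = (callMinutes * 60) / 10;   // one OpenAI call every 10s → 30
const llmCost = 0.005;                             // ~30 calls × ~500 tokens, gpt-4o-mini (as stated above)
const totalCost = sttCost + llmCost;               // ≈ $0.026, i.e. "~$0.025" after rounding
```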

## Out of scope

- Agent mic transcription (customer audio only for now — the agent's words are visible in the AI suggestions context)
- Voice response from the AI (text only)
- Persistent transcript storage (future: save to the Call record after the call ends)
- Multi-language support (English only for now)