# Live Call Assist — Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Stream customer audio during calls to Deepgram for transcription, feed the transcript plus lead context to OpenAI every 10 seconds for suggestions, and display the live transcript and AI suggestions in the sidebar.

**Architecture:** The browser captures the remote WebRTC audio track via the Web Audio API, streams 16 kHz PCM over Socket.IO to the sidecar. The sidecar pipes the audio to a Deepgram Nova WebSocket for STT, accumulates the transcript, and every 10 seconds sends the transcript plus pre-loaded lead context to OpenAI `gpt-4o-mini` for suggestions. Results stream back to the browser via Socket.IO.

**Tech Stack:** Socket.IO (already installed), `@deepgram/sdk` (Nova-2), OpenAI via Vercel AI SDK (already installed), Web Audio API (browser — ScriptProcessorNode for now, see Notes)

---

## File Map

### Sidecar (helix-engage-server)

| File | Action |
|------|--------|
| `src/call-assist/call-assist.gateway.ts` | Create: Socket.IO gateway handling audio stream, Deepgram + OpenAI orchestration |
| `src/call-assist/call-assist.service.ts` | Create: lead context loading from platform, OpenAI prompt building |
| `src/call-assist/call-assist.module.ts` | Create: module registration |
| `src/app.module.ts` | Modify: import `CallAssistModule` |
| `package.json` | Modify: add `@deepgram/sdk` |

### Frontend (helix-engage)

| File | Action |
|------|--------|
| `src/lib/audio-capture.ts` | Create: capture remote audio track, downsample to 16 kHz PCM, emit chunks |
| `src/hooks/use-call-assist.ts` | Create: Socket.IO connection, manages transcript + suggestions state |
| `src/components/call-desk/live-transcript.tsx` | Create: scrolling transcript + AI suggestion cards |
| `src/components/call-desk/context-panel.tsx` | Modify: show LiveTranscript during active calls instead of AiChatPanel |
| `src/pages/call-desk.tsx` | Modify: remove CallPrepCard during active calls |

---

## Task 1: Sidecar — Call Assist service (context loading + OpenAI)

**Files:**
- Create: `helix-engage-server/src/call-assist/call-assist.service.ts`

- [ ] **Step 1: Create the service**

```typescript
import { Injectable, Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { generateText } from 'ai';
import { PlatformGraphqlService } from '../platform/platform-graphql.service';
import { createAiModel } from '../ai/ai-provider';
import type { LanguageModel } from 'ai';

@Injectable()
export class CallAssistService {
  private readonly logger = new Logger(CallAssistService.name);
  private readonly aiModel: LanguageModel | null;
  private readonly platformApiKey: string;

  constructor(
    private config: ConfigService,
    private platform: PlatformGraphqlService,
  ) {
    this.aiModel = createAiModel(config);
    this.platformApiKey = config.get('platform.apiKey') ?? '';
  }

  async loadCallContext(leadId: string | null, callerPhone: string | null): Promise<string> {
    const authHeader = this.platformApiKey ? `Bearer ${this.platformApiKey}` : '';
    if (!authHeader) return 'No platform context available.';

    try {
      const parts: string[] = [];

      // Load lead details
      if (leadId) {
        const leadResult = await this.platform.queryWithAuth(
          `{ leads(filter: { id: { eq: "${leadId}" } }) { edges { node {
            id name
            contactName { firstName lastName }
            contactPhone { primaryPhoneNumber }
            source status interestedService lastContacted contactAttempts
            aiSummary aiSuggestedAction
          } } } }`,
          undefined,
          authHeader,
        );
        const lead = leadResult.leads.edges[0]?.node;
        if (lead) {
          const name = lead.contactName
            ? `${lead.contactName.firstName} ${lead.contactName.lastName}`.trim()
            : lead.name;
          parts.push(`CALLER: ${name}`);
          parts.push(`Phone: ${lead.contactPhone?.primaryPhoneNumber ?? callerPhone}`);
          parts.push(`Source: ${lead.source ?? 'Unknown'}`);
          parts.push(`Interested in: ${lead.interestedService ??
            'Not specified'}`);
          parts.push(`Contact attempts: ${lead.contactAttempts ?? 0}`);
          if (lead.aiSummary) parts.push(`AI Summary: ${lead.aiSummary}`);
        }

        // Load past appointments
        const apptResult = await this.platform.queryWithAuth(
          `{ appointments(filter: { patientId: { eq: "${leadId}" } }, first: 10, orderBy: [{ scheduledAt: DescNullsLast }]) { edges { node {
            id scheduledAt appointmentStatus doctorName department reasonForVisit
          } } } }`,
          undefined,
          authHeader,
        );
        const appts = apptResult.appointments.edges.map((e: any) => e.node);
        if (appts.length > 0) {
          parts.push(`\nPAST APPOINTMENTS:`);
          for (const a of appts) {
            const date = a.scheduledAt ? new Date(a.scheduledAt).toLocaleDateString('en-IN') : '?';
            parts.push(`- ${date}: ${a.doctorName ?? '?'} (${a.department ?? '?'}) — ${a.appointmentStatus}`);
          }
        }
      } else if (callerPhone) {
        parts.push(`CALLER: Unknown (${callerPhone})`);
        parts.push('No lead record found — this may be a new enquiry.');
      }

      // Load doctors
      const docResult = await this.platform.queryWithAuth(
        `{ doctors(first: 20) { edges { node {
          fullName { firstName lastName }
          department specialty
          clinic { clinicName }
        } } } }`,
        undefined,
        authHeader,
      );
      const docs = docResult.doctors.edges.map((e: any) => e.node);
      if (docs.length > 0) {
        parts.push(`\nAVAILABLE DOCTORS:`);
        for (const d of docs) {
          const name = d.fullName
            ? `Dr. ${d.fullName.firstName} ${d.fullName.lastName}`.trim()
            : 'Unknown';
          parts.push(`- ${name} — ${d.department ?? '?'} — ${d.clinic?.clinicName ?? '?'}`);
        }
      }

      return parts.join('\n') || 'No context available.';
    } catch (err) {
      this.logger.error(`Failed to load call context: ${err}`);
      return 'Context loading failed.';
    }
  }

  async getSuggestion(transcript: string, context: string): Promise<string> {
    if (!this.aiModel || !transcript.trim()) return '';
    try {
      const { text } = await generateText({
        model: this.aiModel,
        system: `You are a real-time call assistant for Global Hospital Bangalore.
You listen to the customer's words and provide brief, actionable suggestions for the CC agent.

${context}

RULES:
- Keep suggestions under 2 sentences
- Focus on actionable next steps the agent should take NOW
- If customer mentions a doctor or department, suggest available slots
- If customer wants to cancel or reschedule, note relevant appointment details
- If customer sounds upset, suggest empathetic response
- Do NOT repeat what the agent already knows`,
        prompt: `Conversation transcript so far:\n${transcript}\n\nProvide a brief suggestion for the agent based on what was just said.`,
        maxTokens: 150,
      });
      return text;
    } catch (err) {
      this.logger.error(`AI suggestion failed: ${err}`);
      return '';
    }
  }
}
```

- [ ] **Step 2: Type check and commit**

```
feat: add CallAssistService for context loading and AI suggestions
```

---

## Task 2: Sidecar — Call Assist WebSocket gateway

**Files:**
- Create: `helix-engage-server/src/call-assist/call-assist.gateway.ts`
- Create: `helix-engage-server/src/call-assist/call-assist.module.ts`
- Modify: `helix-engage-server/src/app.module.ts`
- Modify: `helix-engage-server/package.json`

- [ ] **Step 1: Install Deepgram SDK**

```bash
cd helix-engage-server && npm install @deepgram/sdk
```

- [ ] **Step 2: Create the gateway**

```typescript
import {
  WebSocketGateway,
  SubscribeMessage,
  MessageBody,
  ConnectedSocket,
  OnGatewayDisconnect,
} from '@nestjs/websockets';
import { Logger } from '@nestjs/common';
import { ConfigService } from '@nestjs/config';
import { Socket } from 'socket.io';
import { createClient, LiveTranscriptionEvents } from '@deepgram/sdk';
import { CallAssistService } from './call-assist.service';

type SessionState = {
  deepgramConnection: any;
  transcript: string;
  context: string;
  suggestionTimer: NodeJS.Timeout | null;
};

@WebSocketGateway({
  cors: { origin: process.env.CORS_ORIGIN ??
    '*', credentials: true },
  namespace: '/call-assist',
})
export class CallAssistGateway implements OnGatewayDisconnect {
  private readonly logger = new Logger(CallAssistGateway.name);
  private readonly sessions = new Map<string, SessionState>();
  private readonly deepgramApiKey: string;

  constructor(
    private readonly callAssist: CallAssistService,
    private readonly config: ConfigService,
  ) {
    this.deepgramApiKey = process.env.DEEPGRAM_API_KEY ?? '';
  }

  @SubscribeMessage('call-assist:start')
  async handleStart(
    @ConnectedSocket() client: Socket,
    @MessageBody() data: { ucid: string; leadId?: string; callerPhone?: string },
  ) {
    this.logger.log(`Call assist start: ucid=${data.ucid} lead=${data.leadId ?? 'none'}`);

    // Load lead context
    const context = await this.callAssist.loadCallContext(
      data.leadId ?? null,
      data.callerPhone ?? null,
    );
    client.emit('call-assist:context', { context: context.substring(0, 200) + '...' });

    // Connect to Deepgram
    if (!this.deepgramApiKey) {
      this.logger.warn('DEEPGRAM_API_KEY not set — transcription disabled');
      client.emit('call-assist:error', { message: 'Transcription not configured' });
      return;
    }
    const deepgram = createClient(this.deepgramApiKey);
    const dgConnection = deepgram.listen.live({
      model: 'nova-2',
      language: 'en',
      smart_format: true,
      interim_results: true,
      endpointing: 300,
      sample_rate: 16000,
      encoding: 'linear16',
      channels: 1,
    });

    const session: SessionState = {
      deepgramConnection: dgConnection,
      transcript: '',
      context,
      suggestionTimer: null,
    };

    dgConnection.on(LiveTranscriptionEvents.Open, () => {
      this.logger.log(`Deepgram connected for ${data.ucid}`);
    });

    dgConnection.on(LiveTranscriptionEvents.Transcript, (result: any) => {
      const text = result.channel?.alternatives?.[0]?.transcript;
      if (!text) return;
      const isFinal = result.is_final;
      client.emit('call-assist:transcript', { text, isFinal });
      if (isFinal) {
        session.transcript += `Customer: ${text}\n`;
      }
    });

    dgConnection.on(LiveTranscriptionEvents.Error, (err: any) => {
      this.logger.error(`Deepgram error: ${err.message}`);
    });

    dgConnection.on(LiveTranscriptionEvents.Close, () => {
      this.logger.log(`Deepgram closed for ${data.ucid}`);
    });

    // AI suggestion every 10 seconds
    session.suggestionTimer = setInterval(async () => {
      if (!session.transcript.trim()) return;
      const suggestion = await this.callAssist.getSuggestion(session.transcript, session.context);
      if (suggestion) {
        client.emit('call-assist:suggestion', { text: suggestion });
      }
    }, 10000);

    this.sessions.set(client.id, session);
  }

  @SubscribeMessage('call-assist:audio')
  handleAudio(
    @ConnectedSocket() client: Socket,
    @MessageBody() audioData: ArrayBuffer,
  ) {
    const session = this.sessions.get(client.id);
    if (session?.deepgramConnection) {
      session.deepgramConnection.send(Buffer.from(audioData));
    }
  }

  @SubscribeMessage('call-assist:stop')
  handleStop(@ConnectedSocket() client: Socket) {
    this.cleanup(client.id);
    this.logger.log(`Call assist stopped: ${client.id}`);
  }

  handleDisconnect(client: Socket) {
    this.cleanup(client.id);
  }

  private cleanup(clientId: string) {
    const session = this.sessions.get(clientId);
    if (session) {
      if (session.suggestionTimer) clearInterval(session.suggestionTimer);
      if (session.deepgramConnection) {
        try { session.deepgramConnection.finish(); } catch {}
      }
      this.sessions.delete(clientId);
    }
  }
}
```

- [ ] **Step 3: Create the module**

```typescript
import { Module } from '@nestjs/common';
import { CallAssistGateway } from './call-assist.gateway';
import { CallAssistService } from './call-assist.service';
import { PlatformModule } from '../platform/platform.module';

@Module({
  imports: [PlatformModule],
  providers: [CallAssistGateway, CallAssistService],
})
export class CallAssistModule {}
```

- [ ] **Step 4: Register in app.module.ts**

Add `CallAssistModule` to the `imports` array.

- [ ] **Step 5: Add DEEPGRAM_API_KEY to docker-compose env**

The env var needs to be set in the VPS docker-compose for the sidecar container.
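A minimal sketch of the compose change — the service name below is illustrative and should be matched to the actual sidecar service in the VPS compose file:

```yaml
# docker-compose.yml on the VPS (service name is hypothetical)
services:
  helix-engage-server:
    environment:
      # Forwarded from the host environment or an .env file next to the compose file
      - DEEPGRAM_API_KEY=${DEEPGRAM_API_KEY}
```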
- [ ] **Step 6: Type check and commit**

```
feat: add call assist WebSocket gateway with Deepgram STT + OpenAI suggestions
```

---

## Task 3: Frontend — Audio capture utility

Capture the remote audio track from WebRTC, downsample to 16 kHz 16-bit PCM, and provide chunks via callback.

**Files:**
- Create: `helix-engage/src/lib/audio-capture.ts`

- [ ] **Step 1: Create the audio capture module**

```typescript
type AudioChunkCallback = (chunk: ArrayBuffer) => void;

let audioContext: AudioContext | null = null;
let mediaStreamSource: MediaStreamAudioSourceNode | null = null;
let scriptProcessor: ScriptProcessorNode | null = null;

export function startAudioCapture(remoteStream: MediaStream, onChunk: AudioChunkCallback): void {
  stopAudioCapture();

  audioContext = new AudioContext({ sampleRate: 16000 });
  mediaStreamSource = audioContext.createMediaStreamSource(remoteStream);

  // Use ScriptProcessorNode (deprecated but universally supported).
  // AudioWorklet would be better but requires a separate file.
  scriptProcessor = audioContext.createScriptProcessor(4096, 1, 1);

  scriptProcessor.onaudioprocess = (event) => {
    const inputData = event.inputBuffer.getChannelData(0);
    // Convert Float32 samples to Int16 PCM
    const pcm = new Int16Array(inputData.length);
    for (let i = 0; i < inputData.length; i++) {
      const s = Math.max(-1, Math.min(1, inputData[i]));
      pcm[i] = s < 0 ?
        s * 0x8000 : s * 0x7FFF;
    }
    onChunk(pcm.buffer);
  };

  mediaStreamSource.connect(scriptProcessor);
  scriptProcessor.connect(audioContext.destination);
}

export function stopAudioCapture(): void {
  if (scriptProcessor) {
    scriptProcessor.disconnect();
    scriptProcessor = null;
  }
  if (mediaStreamSource) {
    mediaStreamSource.disconnect();
    mediaStreamSource = null;
  }
  if (audioContext) {
    audioContext.close().catch(() => {});
    audioContext = null;
  }
}
```

- [ ] **Step 2: Commit**

```
feat: add audio capture utility for remote WebRTC stream
```

---

## Task 4: Frontend — useCallAssist hook

Manages the Socket.IO connection to `/call-assist`, sends audio, and receives transcript + suggestions.

**Files:**
- Create: `helix-engage/src/hooks/use-call-assist.ts`

- [ ] **Step 1: Create the hook**

```typescript
import { useEffect, useRef, useState, useCallback } from 'react';
import { io, Socket } from 'socket.io-client';
import { startAudioCapture, stopAudioCapture } from '@/lib/audio-capture';
import { getSipClient } from '@/state/sip-manager';

const SIDECAR_URL = import.meta.env.VITE_SIDECAR_URL ?? 'http://localhost:4100';

type TranscriptLine = { id: string; text: string; isFinal: boolean; timestamp: Date };
type Suggestion = { id: string; text: string; timestamp: Date };

export const useCallAssist = (
  active: boolean,
  ucid: string | null,
  leadId: string | null,
  callerPhone: string | null,
) => {
  const [transcript, setTranscript] = useState<TranscriptLine[]>([]);
  const [suggestions, setSuggestions] = useState<Suggestion[]>([]);
  const [connected, setConnected] = useState(false);
  const socketRef = useRef<Socket | null>(null);
  const idCounter = useRef(0);

  const nextId = useCallback(() => `ca-${++idCounter.current}`, []);

  useEffect(() => {
    if (!active || !ucid) return;

    const socket = io(`${SIDECAR_URL}/call-assist`, {
      transports: ['websocket'],
    });
    socketRef.current = socket;

    socket.on('connect', () => {
      setConnected(true);
      socket.emit('call-assist:start', { ucid, leadId, callerPhone });

      // Start capturing remote audio from the SIP session
      const sipClient = getSipClient();
      const audioElement = (sipClient as any)?.audioElement as HTMLAudioElement | null;
      if (audioElement?.srcObject) {
        startAudioCapture(audioElement.srcObject as MediaStream, (chunk) => {
          socket.emit('call-assist:audio', chunk);
        });
      }
    });

    socket.on('call-assist:transcript', (data: { text: string; isFinal: boolean }) => {
      if (!data.text.trim()) return;
      setTranscript(prev => {
        if (!data.isFinal) {
          // Replace last interim line
          const withoutLastInterim = prev.filter(l => l.isFinal);
          return [...withoutLastInterim, { id: nextId(), text: data.text, isFinal: false, timestamp: new Date() }];
        }
        // Add final line, remove interims
        const finals = prev.filter(l => l.isFinal);
        return [...finals, { id: nextId(), text: data.text, isFinal: true, timestamp: new Date() }];
      });
    });

    socket.on('call-assist:suggestion', (data: { text: string }) => {
      setSuggestions(prev => [...prev, { id: nextId(), text: data.text, timestamp: new Date() }]);
    });

    socket.on('disconnect', () => setConnected(false));

    return () => {
      stopAudioCapture();
      socket.emit('call-assist:stop');
      socket.disconnect();
      socketRef.current = null;
      setConnected(false);
    };
  }, [active, ucid, leadId, callerPhone, nextId]);

  // Reset state when the call ends
  useEffect(() => {
    if (!active) {
      setTranscript([]);
      setSuggestions([]);
    }
  }, [active]);

  return { transcript, suggestions, connected };
};
```

- [ ] **Step 2: Install socket.io-client in frontend**

```bash
cd helix-engage && npm install socket.io-client
```

- [ ] **Step 3: Expose audioElement in SIPClient**

In `helix-engage/src/lib/sip-client.ts`, `audioElement` is private. Add a public getter:

```typescript
getAudioElement(): HTMLAudioElement | null {
  return this.audioElement;
}
```

Then update the hook to access it via `getSipClient()?.getAudioElement()?.srcObject` instead of the `(sipClient as any)` cast.

- [ ] **Step 4: Type check and commit**

```
feat: add useCallAssist hook for live transcription WebSocket
```

---

## Task 5: Frontend — LiveTranscript component

**Files:**
- Create: `helix-engage/src/components/call-desk/live-transcript.tsx`

- [ ] **Step 1: Create the component**

Scrolling list of transcript lines with AI suggestion cards interspersed. Auto-scrolls to the bottom.
```typescript
import { useEffect, useRef } from 'react';
import { FontAwesomeIcon } from '@fortawesome/react-fontawesome';
import { faSparkles, faMicrophone } from '@fortawesome/pro-duotone-svg-icons';
import { cx } from '@/utils/cx';

type TranscriptLine = { id: string; text: string; isFinal: boolean; timestamp: Date };
type Suggestion = { id: string; text: string; timestamp: Date };

type LiveTranscriptProps = {
  transcript: TranscriptLine[];
  suggestions: Suggestion[];
  connected: boolean;
};

export const LiveTranscript = ({ transcript, suggestions, connected }: LiveTranscriptProps) => {
  const scrollRef = useRef<HTMLDivElement>(null);

  // Auto-scroll to bottom
  useEffect(() => {
    if (scrollRef.current) {
      scrollRef.current.scrollTop = scrollRef.current.scrollHeight;
    }
  }, [transcript.length, suggestions.length]);

  // Merge transcript and suggestions by timestamp
  const items = [
    ...transcript.map(t => ({ ...t, kind: 'transcript' as const })),
    ...suggestions.map(s => ({ ...s, kind: 'suggestion' as const, isFinal: true })),
  ].sort((a, b) => a.timestamp.getTime() - b.timestamp.getTime());

  // Note: styling classes below are representative — adjust to the app's design system.
  return (
    <div className="flex h-full flex-col">
      {/* Header */}
      <div className="flex items-center gap-2 border-b px-3 py-2">
        <span className={cx('h-2 w-2 rounded-full', connected ? 'bg-green-500' : 'bg-gray-300')} />
        <span className="text-sm font-medium">Live Assist</span>
      </div>
      {/* Transcript body */}
      <div ref={scrollRef} className="flex-1 overflow-y-auto p-3">
        {items.length === 0 && (
          <div className="flex flex-col items-center gap-1 py-8 text-center text-gray-400">
            <FontAwesomeIcon icon={faMicrophone} />
            <p>Listening to customer...</p>
            <p className="text-xs">Transcript will appear here</p>
          </div>
        )}
        {items.map(item => {
          if (item.kind === 'suggestion') {
            return (
              <div key={item.id} className="my-2 rounded-md border border-amber-200 bg-amber-50 p-2">
                <div className="flex items-center gap-1 text-xs font-medium text-amber-700">
                  <FontAwesomeIcon icon={faSparkles} />
                  AI Suggestion
                </div>
                <p className="text-sm">{item.text}</p>
              </div>
            );
          }
          return (
            <div key={item.id} className={cx('py-1 text-sm', !item.isFinal && 'text-gray-400 italic')}>
              <span className="mr-2 text-xs text-gray-400">
                {item.timestamp.toLocaleTimeString('en-IN', { hour: '2-digit', minute: '2-digit', second: '2-digit' })}
              </span>
              {item.text}
            </div>
          );
        })}
      </div>
    </div>
  );
};
```

- [ ] **Step 2: Commit**

```
feat: add LiveTranscript component for call sidebar
```

---

## Task 6: Wire live transcript into the call desk

**Files:**
- Modify: `helix-engage/src/components/call-desk/context-panel.tsx`
- Modify: `helix-engage/src/pages/call-desk.tsx`

- [ ] **Step 1: Update context-panel.tsx to show LiveTranscript during calls**

Import the hook and component:

```typescript
import { useCallAssist } from '@/hooks/use-call-assist';
import { LiveTranscript } from './live-transcript';
```

Accept new props:

```typescript
interface ContextPanelProps {
  selectedLead: Lead | null;
  activities: LeadActivity[];
  callerPhone?: string;
  isInCall?: boolean;
  callUcid?: string | null;
}
```

Inside the component, use the hook:

```typescript
const { transcript, suggestions, connected } = useCallAssist(
  isInCall ?? false,
  callUcid ?? null,
  selectedLead?.id ?? null,
  callerPhone ?? null,
);
```

When `isInCall` is true, replace the AI Assistant tab content with LiveTranscript:

```typescript
{activeTab === 'ai' && (
  isInCall ? (
    <LiveTranscript transcript={transcript} suggestions={suggestions} connected={connected} />
  ) : (
    <AiChatPanel /* existing props unchanged */ />
  )
)}
```

- [ ] **Step 2: Pass isInCall and callUcid to ContextPanel in call-desk.tsx**

```typescript
<ContextPanel
  selectedLead={selectedLead}
  activities={activities}
  callerPhone={callerNumber}
  isInCall={isInCall}
  callUcid={callUcid}
/>
```

Also get `callUcid` from `useSip()`:

```typescript
const { connectionStatus, isRegistered, callState, callerNumber, callUcid } = useSip();
```

- [ ] **Step 3: Remove CallPrepCard during active calls**

In `call-desk.tsx`, delete the `<CallPrepCard … />` element from the `{isInCall && ( … )}` block in the active call area. Keep the CallPrepCard import for now — it might be useful in other contexts later.

- [ ] **Step 4: Type check and commit**

```
feat: wire live transcript into call desk sidebar
```

---

## Task 7: Deploy and verify

- [ ] **Step 1: Get a Deepgram API key**

Sign up at deepgram.com — the free tier includes $200 of credit. Set `DEEPGRAM_API_KEY` in the sidecar's docker-compose env.

- [ ] **Step 2: Build and deploy the sidecar**

```bash
cd helix-engage-server && npm install && npm run build
```

- [ ] **Step 3: Build and deploy the frontend**

```bash
cd helix-engage && npm install && npm run build
```

- [ ] **Step 4: Test end-to-end**

1. Log in as a CC agent
2. Place or receive a call
3. The sidebar should show "Live Assist" with a green dot
4. Customer speaks → transcript appears in real time
5. Every 10 seconds → an AI suggestion card appears with contextual advice
6. Call ends → transcript stays visible during disposition

---

## Notes

- **ScriptProcessorNode is deprecated** but universally supported. AudioWorklet would require a separate JS file served via a URL. Can upgrade later.
- **Deepgram `interim_results: true`** streams partial results that update as words are recognized; results with `is_final` set are the confirmed transcription.
- **Socket.IO binary support** — `socket.emit('call-assist:audio', chunk)` sends an ArrayBuffer natively. No base64 encoding needed.
- **`audioElement.srcObject`** is the remote MediaStream — the customer's audio only. We don't send the agent's mic, to avoid echo/feedback in transcription.
- **Cost**: roughly ₹2 per 5-minute call (Deepgram + OpenAI combined).
- **If DEEPGRAM_API_KEY is not set**, the gateway logs a warning and sends an error event to the client. Transcription is disabled gracefully — the app still works without it.
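The Float32 → Int16 mapping used in `audio-capture.ts` can be sanity-checked in isolation. This is a standalone sketch — `floatToInt16` is a helper name invented for the example, not part of the plan's files:

```typescript
// Standalone version of the sample conversion in audio-capture.ts.
// Math.trunc mirrors the truncation an Int16Array assignment performs.
function floatToInt16(samples: number[]): number[] {
  return samples.map((v) => {
    const s = Math.max(-1, Math.min(1, v)); // clamp to [-1, 1]
    return Math.trunc(s < 0 ? s * 0x8000 : s * 0x7fff);
  });
}

// Full-scale negative maps to -32768, full-scale positive to 32767,
// and out-of-range input is clamped rather than wrapped.
console.log(floatToInt16([-1, -0.5, 0, 1, 2])); // → [-32768, -16384, 0, 32767, 32767]
```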
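The interim/final replacement logic in `useCallAssist` reduces to a small pure function that can be tested without React. A sketch under that assumption — `mergeTranscript` is an illustrative name, not part of the plan's files:

```typescript
type Line = { text: string; isFinal: boolean };

// Mirrors the setTranscript updater in use-call-assist.ts: any pending interim
// line is dropped, then the incoming line (interim or final) is appended.
function mergeTranscript(prev: Line[], incoming: Line): Line[] {
  return [...prev.filter((l) => l.isFinal), incoming];
}

let t: Line[] = [];
t = mergeTranscript(t, { text: 'i want to', isFinal: false });             // interim shown
t = mergeTranscript(t, { text: 'i want to book', isFinal: false });        // replaces interim
t = mergeTranscript(t, { text: 'I want to book a scan.', isFinal: true }); // confirmed line
console.log(t.map((l) => l.text)); // → ['I want to book a scan.']
```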