Helix Engage — Weekly Status Update
Period: April 6 – April 11, 2026
Team: Engineering
Executive Summary
Major infrastructure milestone — Helix Engage is now running on AWS EC2 with multi-tenant architecture supporting both Ramaiah Hospitals and Global Hospital on a single instance. A full CI/CD pipeline with automated E2E testing and Teams notifications is operational. 17 defects from QA were triaged, 8 fixed and deployed, and a cross-tenant security vulnerability in the telephony layer was discovered and patched.
1. AWS EC2 Deployment (Multi-Tenant)
Status: Live
Migrated from single-tenant VPS to multi-tenant EC2 architecture:
- Instance: m6i.xlarge, Mumbai (ap-south-1), 15GB RAM
- 14 Docker containers running: platform, 2 sidecars, telephony dispatcher, 4 Redis instances, Caddy, PostgreSQL, ClickHouse, Redpanda, MinIO
- Strict tenant isolation: each hospital has its own sidecar container, Redis instance, and data volume
- Host-routed Caddy: each hostname maps to exactly one tenant's sidecar, so a webhook sent to one hospital's hostname cannot be routed to the other hospital
URLs deployed:
- ramaiah.engage.healix360.net (Ramaiah Hospitals)
- global.engage.healix360.net (Global Hospital)
- ramaiah.app.healix360.net / global.app.healix360.net (Platform)
- telephony.engage.healix360.net (Event dispatcher)
- operations.healix360.net (CI/CD dashboard)
- git.healix360.net (Git forge)
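The host-based isolation described above can be illustrated with a small lookup. This is a TypeScript sketch of the idea, not the actual Caddy configuration; the upstream names and port (`sidecar-ramaiah:4100`, `sidecar-global:4100`) and the function name are hypothetical.

```typescript
// Hypothetical upstream addresses: the real container names and ports
// are not stated in this update.
const UPSTREAMS: ReadonlyMap<string, string> = new Map([
  ["ramaiah.engage.healix360.net", "sidecar-ramaiah:4100"],
  ["global.engage.healix360.net", "sidecar-global:4100"],
]);

// Resolve a request's Host header to exactly one tenant upstream.
// Unknown hosts resolve to nothing, so there is no shared fallback route.
function resolveUpstream(hostHeader: string): string | undefined {
  const host = hostHeader.toLowerCase().replace(/:\d+$/, ""); // drop an optional port
  return UPSTREAMS.get(host);
}
```

Because the map has no default entry, a request for an unrecognized hostname is rejected rather than delivered to some tenant by accident.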
2. Telephony Event Dispatcher
Status: Live
Built a NestJS service that routes Ozonetel agent/call events to the correct hospital's sidecar:
- Ozonetel event subscriptions are account-level (not per-campaign) — one URL for all agents
- Dispatcher receives all events, looks up `agentId` in Redis, forwards to the correct sidecar
- Sidecars self-register on boot with their agent list; heartbeat every 30s, TTL 90s
- No manual configuration needed when adding new hospitals
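The register, heartbeat, and lookup flow above might look roughly like the following. This is a minimal sketch, assuming an in-memory `Map` in place of Redis; `AgentRegistry`, `routeEvent`, and the agent IDs and sidecar URLs in the usage are hypothetical names, not taken from the service.

```typescript
// In-memory stand-in for the Redis registry (assumption: the real service
// stores agentId -> sidecar mappings in Redis with a 90s TTL).
const TTL_MS = 90_000;

interface Registration {
  sidecarUrl: string;
  expiresAt: number; // epoch milliseconds
}

class AgentRegistry {
  private readonly entries = new Map<string, Registration>();

  // Called by a sidecar on boot and again on every 30s heartbeat.
  register(agentIds: string[], sidecarUrl: string, now: number): void {
    for (const agentId of agentIds) {
      this.entries.set(agentId, { sidecarUrl, expiresAt: now + TTL_MS });
    }
  }

  // Returns the owning sidecar, or undefined if the agent is unknown or expired.
  lookup(agentId: string, now: number): string | undefined {
    const entry = this.entries.get(agentId);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(agentId);
      return undefined;
    }
    return entry.sidecarUrl;
  }
}

// Pick the forward target for one event; undefined means no live sidecar claims the agent.
function routeEvent(
  registry: AgentRegistry,
  event: { agentId: string },
  now: number,
): string | undefined {
  return registry.lookup(event.agentId, now);
}
```

With a 30s heartbeat and a 90s TTL, a sidecar has to miss three consecutive heartbeats before its agents drop out of the registry.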
3. Cross-Tenant Security Fix (defaultAgentId)
Status: Fixed and deployed
Discovered that 6 sidecar endpoints used a hardcoded OZONETEL_AGENT_ID env var as a fallback when agentId wasn't provided by the frontend. In a multi-tenant setup, this caused Ramaiah sidecar operations to silently affect Global Hospital's agent.
Impact: Agent state changes, call disposition, outbound dialing, performance metrics, and maintenance commands could operate on the wrong hospital's agent with no error or warning.
Fix:
- Removed `defaultAgentId` getter and all hardcoded fallbacks (`agent3`, `Test123$`, `521814`)
- All 6 endpoints now require `agentId` from the caller (400 if missing)
- Frontend updated to send `agentId` from `localStorage.helix_agent_config` in all calls
- `OZONETEL_AGENT_ID` removed from env config entirely
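The "require `agentId`, no fallback" rule can be sketched as a small validation helper. This is an illustration under assumptions, not the sidecar's actual code: `requireAgentId`, the result type, and the error message are hypothetical.

```typescript
// Result of validating a request body: either the agentId or an HTTP-style error.
type AgentIdResult =
  | { ok: true; agentId: string }
  | { ok: false; status: number; message: string };

// No environment fallback: a missing, non-string, or blank agentId is a client error.
function requireAgentId(body: { agentId?: unknown }): AgentIdResult {
  const value = body.agentId;
  if (typeof value !== "string" || value.trim() === "") {
    return { ok: false, status: 400, message: "agentId is required" };
  }
  return { ok: true, agentId: value.trim() };
}
```

Failing loudly with a 400 is what closes the cross-tenant hole: a request without an explicit agent can no longer act on whichever agent a shared default happened to name.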
4. Defect Fixes (8 of 17)
| Bug | Title | Status |
|---|---|---|
| #527 | Appointment creation updates existing patient incorrectly | Fixed |
| #529 | Break/Training status doesn't block outbound calls | Fixed |
| #531 | Agent can log out during active call | Fixed |
| #533 | Redundant "Call History" header | Fixed |
| #534 | Redundant "Patients" header | Fixed |
| #536 | My Performance shows wrong agent's data | Fixed |
| #538 | Supervisor dashboard metrics incorrect | Fixed |
| #540 | Ghost calls visible for logged-out agents | Fixed |
| #547 | SLA rules not reflected in Call Desk | Fixed (config seeded) |
Deferred (by product): #516 (recordings real-time), #517/#548 (AI transcription), #519 (supervisor call — needs SIP seat), #539 (missed calls real-time), #541 (whisper/barge/listen)
5. E2E Test Suite (Playwright)
Status: 40 tests, all passing
Automated smoke tests covering every page for both hospitals:
- Login (4): branding, invalid creds, supervisor login, auth guard
- Ramaiah CC Agent (10): call desk, call history, patients, appointments, my performance, sidebar, sign-out
- Ramaiah Supervisor (12): dashboard, team performance, live monitor, leads, patients, appointments, call log, recordings, missed calls, campaigns, settings, sidebar
- Global CC Agent (7): all pages + sign-out
- Global Supervisor (5): all pages
Self-healing: auto-clears agent session locks before login, completes sign-out after tests.
6. CI/CD Pipeline (Woodpecker + Gitea)
Status: Operational
End-to-end CI/CD on EC2:
- Gitea mirrors Azure DevOps repos every 15 minutes
- Woodpecker CI triggers pipelines on push or manual run
- Frontend pipeline: TypeScript typecheck → 40 E2E tests → HTML report published to MinIO → Teams notification
- Sidecar pipeline: Jest unit tests → Teams notification
- Reports: Playwright HTML reports with screenshots at `operations.healix360.net/reports/{run}/index.html`
- Teams notifications: Adaptive Cards to "Deployment updates" channel with pass/fail summary + report link
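The notification step at the end of the frontend pipeline could be assembled along these lines. This is a sketch only: the `RunSummary` shape, the card wording, and the `https` scheme on the report link are assumptions, while the outer envelope follows the Adaptive Card attachment format that Teams incoming webhooks accept.

```typescript
interface RunSummary {
  run: string; // pipeline run identifier used in the report path
  passed: number;
  failed: number;
}

// Report location follows the pattern operations.healix360.net/reports/{run}/index.html.
// The https scheme is an assumption.
function reportUrl(run: string): string {
  return `https://operations.healix360.net/reports/${encodeURIComponent(run)}/index.html`;
}

// Build the JSON body for a Teams incoming webhook carrying one Adaptive Card.
function buildTeamsCard(summary: RunSummary) {
  const status = summary.failed === 0 ? "PASSED" : "FAILED";
  return {
    type: "message",
    attachments: [
      {
        contentType: "application/vnd.microsoft.card.adaptive",
        content: {
          $schema: "http://adaptivecards.io/schemas/adaptive-card.json",
          type: "AdaptiveCard",
          version: "1.4",
          body: [
            { type: "TextBlock", weight: "Bolder", text: `E2E run ${summary.run}: ${status}` },
            { type: "TextBlock", text: `${summary.passed} passed, ${summary.failed} failed` },
          ],
          actions: [
            { type: "Action.OpenUrl", title: "Open report", url: reportUrl(summary.run) },
          ],
        },
      },
    ],
  };
}
```

The returned object would be serialized with `JSON.stringify` and sent as the POST body to the channel's webhook URL, which is deliberately left out of the sketch.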
7. Documentation
Three docs committed to the repo:
- architecture.md — Multi-tenant topology with Mermaid diagram, telephony dispatcher, failure modes
- developer-operations-runbook.md — SSH access, accounts, deploy steps, Redis ops, DB access, troubleshooting
- ci-cd-operations.md — Gitea, Woodpecker, MinIO, Teams notification setup and troubleshooting
8. Data Seeding
- Ramaiah: 195 real doctors scraped from msrmh.com, clinics, visit slots, campaign data
- Global: CC agent accounts (rekha.cc, ganesh.cc), marketing (sanjay), supervisor (dr.ramesh) created with proper roles
- Rules engine: 6 priority scoring rules seeded (missed call, follow-up, campaign lead, 2nd/3rd attempt, spam deprioritize)
- Seed script: idempotent `mkMember`, cleanup phase before seeding, runs against any workspace via env vars
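An idempotent member helper can be sketched as a create-or-update keyed on a stable identifier. The name `mkMember` comes from the seed script, but its signature here, the `Workspace` class, and the choice of email as the key are all assumptions made for illustration.

```typescript
interface Member {
  email: string;
  role: string;
}

// In-memory stand-in for a workspace (assumption: the real script talks to
// the platform, selected through environment variables).
class Workspace {
  readonly members = new Map<string, Member>();
}

// Idempotent create: keyed on email, so re-running the seed never duplicates a member.
function mkMember(ws: Workspace, member: Member): "created" | "updated" | "unchanged" {
  const existing = ws.members.get(member.email);
  if (!existing) {
    ws.members.set(member.email, { ...member });
    return "created";
  }
  if (existing.role === member.role) return "unchanged";
  existing.role = member.role; // converge on the seeded role
  return "updated";
}
```

Running the same seed twice leaves the workspace in the same state as running it once, which is what makes it safe to rerun against any workspace.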
9. Other Improvements
- SIP agent tracing: Browser console logs `agent=ramaiahadmin ext=524435` on every SIP connect/disconnect/state change for multi-agent debugging
- ACW 3-layer protection: beforeunload warning → sendBeacon auto-dispose → server 30s timer
- Maint endpoints: `force-ready` and `unlock-agent` now accept `agentId` from body (was hardcoded)
- Security group automation: SSH IP auto-updated via AWS CLI when ISP changes
Metrics
| Metric | Value |
|---|---|
| Commits (frontend) | 35 |
| Commits (sidecar) | 20 |
| Commits (SDK app) | 2 |
| Bugs fixed | 9 |
| E2E tests | 40 |
| Docker containers | 17 (14 app + 3 CI) |
| DNS records | 6 |
| Uptime | EC2 live since Apr 9 |
Next Week Priorities
- Merge `feature/omnichannel-widget` → `master` (frontend)
- Frontend Docker image (stop rsync, bake into image)
- Appointment date validation (no past dates, auto-tomorrow after hours)
- Pre-built CI Docker image (skip `yarn install` on every run)
- Deferred defects: #516, #539 (real-time updates)