# Helix Engage: Weekly Status Update

**Period:** April 6 – April 11, 2026

**Team:** Engineering

---

## Executive Summary

Major infrastructure milestone: Helix Engage is now running on AWS EC2 with a multi-tenant architecture supporting both Ramaiah Hospitals and Global Hospital on a single instance. A full CI/CD pipeline with automated E2E testing and Teams notifications is operational. 17 defects from QA were triaged, 9 were fixed and deployed, and a cross-tenant security vulnerability in the telephony layer was discovered and patched.

---

## 1. AWS EC2 Deployment (Multi-Tenant)

**Status: Live**

Migrated from single-tenant VPS to multi-tenant EC2 architecture:

- **Instance:** m6i.xlarge, Mumbai (ap-south-1), 15GB RAM
- **14 Docker containers** running: platform, 2 sidecars, telephony dispatcher, 4 Redis instances, Caddy, PostgreSQL, ClickHouse, Redpanda, MinIO
- **Strict tenant isolation:** each hospital has its own sidecar container, Redis instance, and data volume
- **Host-routed Caddy:** requests are routed by hostname, so webhooks cannot cross tenants

**URLs deployed:**

- ramaiah.engage.healix360.net (Ramaiah Hospitals)
- global.engage.healix360.net (Global Hospital)
- ramaiah.app.healix360.net / global.app.healix360.net (Platform)
- telephony.engage.healix360.net (Event dispatcher)
- operations.healix360.net (CI/CD dashboard)
- git.healix360.net (Git forge)

---

## 2. Telephony Event Dispatcher

**Status: Live**

Built a NestJS service that routes Ozonetel agent/call events to the correct hospital's sidecar:

- Ozonetel event subscriptions are **account-level** (not per-campaign), so one URL receives events for all agents
- Dispatcher receives all events, looks up `agentId` in Redis, and forwards each event to the correct sidecar
- Sidecars self-register on boot with their agent list; heartbeat every 30s, TTL 90s
- No manual configuration needed when adding new hospitals

---

## 3. Cross-Tenant Security Fix (defaultAgentId)

**Status: Fixed and deployed**

Discovered that 6 sidecar endpoints used a hardcoded `OZONETEL_AGENT_ID` env var as a fallback when `agentId` wasn't provided by the frontend. In a multi-tenant setup, this caused Ramaiah sidecar operations to silently affect Global Hospital's agent.

**Impact:** Agent state changes, call disposition, outbound dialing, performance metrics, and maintenance commands could operate on the wrong hospital's agent with no error or warning.

**Fix:**

- Removed `defaultAgentId` getter and all hardcoded fallbacks (`agent3`, `Test123$`, `521814`)
- All 6 endpoints now require `agentId` from the caller (400 if missing)
- Frontend updated to send `agentId` from `localStorage.helix_agent_config` in all calls
- `OZONETEL_AGENT_ID` removed from env config entirely

---

## 4. Defect Fixes (9 of 17)

| Bug | Title | Status |
|-----|-------|--------|
| #527 | Appointment creation updates existing patient incorrectly | Fixed |
| #529 | Break/Training status doesn't block outbound calls | Fixed |
| #531 | Agent can log out during active call | Fixed |
| #533 | Redundant "Call History" header | Fixed |
| #534 | Redundant "Patients" header | Fixed |
| #536 | My Performance shows wrong agent's data | Fixed |
| #538 | Supervisor dashboard metrics incorrect | Fixed |
| #540 | Ghost calls visible for logged-out agents | Fixed |
| #547 | SLA rules not reflected in Call Desk | Fixed (config seeded) |

**Deferred (by product):** #516 (recordings real-time), #517/#548 (AI transcription), #519 (supervisor call, needs SIP seat), #539 (missed calls real-time), #541 (whisper/barge/listen)

---

## 5. E2E Test Suite (Playwright)

**Status: 40 tests, all passing**

Automated smoke tests covering every page for both hospitals:

- **Login (4):** branding, invalid creds, supervisor login, auth guard
- **Ramaiah CC Agent (10):** call desk, call history, patients, appointments, my performance, sidebar, sign-out
- **Ramaiah Supervisor (12):** dashboard, team performance, live monitor, leads, patients, appointments, call log, recordings, missed calls, campaigns, settings, sidebar
- **Global CC Agent (7):** all pages + sign-out
- **Global Supervisor (5):** all pages

Self-healing: the suite auto-clears agent session locks before login and completes sign-out after tests.

---

## 6. CI/CD Pipeline (Woodpecker + Gitea)

**Status: Operational**

End-to-end CI/CD on EC2:

- **Gitea** mirrors Azure DevOps repos every 15 minutes
- **Woodpecker CI** triggers pipelines on push or manual run
- **Frontend pipeline:** TypeScript typecheck → 40 E2E tests → HTML report published to MinIO → Teams notification
- **Sidecar pipeline:** Jest unit tests → Teams notification
- **Reports:** Playwright HTML reports with screenshots at `operations.healix360.net/reports/{run}/index.html`
- **Teams notifications:** Adaptive Cards to "Deployment updates" channel with pass/fail summary + report link

---

## 7. Documentation

Three docs committed to the repo:

- **architecture.md**: Multi-tenant topology with Mermaid diagram, telephony dispatcher, failure modes
- **developer-operations-runbook.md**: SSH access, accounts, deploy steps, Redis ops, DB access, troubleshooting
- **ci-cd-operations.md**: Gitea, Woodpecker, MinIO, Teams notification setup and troubleshooting

---

## 8. Data Seeding

- **Ramaiah:** 195 real doctors scraped from msrmh.com, clinics, visit slots, campaign data
- **Global:** CC agent accounts (rekha.cc, ganesh.cc), marketing (sanjay), supervisor (dr.ramesh) created with proper roles
- **Rules engine:** 6 priority scoring rules seeded (missed call, follow-up, campaign lead, 2nd/3rd attempt, spam deprioritize)
- **Seed script:** idempotent `mkMember`, cleanup phase before seeding, runs against any workspace via env vars

---

## 9. Other Improvements

- **SIP agent tracing:** Browser console logs `agent=ramaiahadmin ext=524435` on every SIP connect/disconnect/state change for multi-agent debugging
- **ACW 3-layer protection:** beforeunload warning → sendBeacon auto-dispose → server 30s timer
- **Maintenance endpoints:** `force-ready` and `unlock-agent` now accept `agentId` from body (was hardcoded)
- **Security group automation:** SSH IP auto-updated via AWS CLI when ISP changes

---

## Metrics

| Metric | Value |
|--------|-------|
| Commits (frontend) | 35 |
| Commits (sidecar) | 20 |
| Commits (SDK app) | 2 |
| Bugs fixed | 9 |
| E2E tests | 40 |
| Docker containers | 17 (14 app + 3 CI) |
| DNS records | 6 |
| Uptime | EC2 live since Apr 9 |

---

## Next Week Priorities

1. Merge `feature/omnichannel-widget` → `master` (frontend)
2. Frontend Docker image (stop rsync, bake into image)
3. Appointment date validation (no past dates, auto-tomorrow after hours)
4. Pre-built CI Docker image (skip `yarn install` on every run)
5. Deferred defects: #516, #539 (real-time updates)
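
---

## Appendix: Dispatcher Routing Sketch

For readers new to the telephony dispatcher in section 2, the register-then-lookup flow can be sketched roughly as below. This is a minimal illustration, not the deployed service: an in-memory map stands in for Redis, the clock is passed in explicitly so expiry is easy to follow, and the names `AgentRegistry`, `routeEvent`, `RouteResult`, and the sidecar URLs in the usage note are all hypothetical. Only the 90s TTL and the `agentId` lookup come from the report.

```typescript
// Hypothetical stand-in for the Redis agent registry (not the real service).
type Registration = { sidecarUrl: string; expiresAt: number };

type RouteResult =
  | { kind: "forward"; target: string }
  | { kind: "drop"; reason: string };

// From the report: registrations expire after 90s without a heartbeat.
const TTL_MS = 90_000;

class AgentRegistry {
  private readonly entries = new Map<string, Registration>();

  // Called by a sidecar on boot and again on every heartbeat.
  // Re-registering refreshes the expiry, as a Redis SET with a TTL would.
  register(sidecarUrl: string, agentIds: string[], now: number): void {
    for (const agentId of agentIds) {
      this.entries.set(agentId, { sidecarUrl, expiresAt: now + TTL_MS });
    }
  }

  // Returns the owning sidecar URL, or undefined if unknown or expired.
  resolve(agentId: string, now: number): string | undefined {
    const entry = this.entries.get(agentId);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(agentId);
      return undefined;
    }
    return entry.sidecarUrl;
  }
}

// Decides where an incoming account-level event should go.
// There is deliberately no default agent: an event without a known
// agentId is dropped rather than sent to some fallback tenant.
function routeEvent(
  registry: AgentRegistry,
  event: { agentId?: string },
  now: number,
): RouteResult {
  if (!event.agentId) {
    return { kind: "drop", reason: "missing agentId" };
  }
  const target = registry.resolve(event.agentId, now);
  if (target === undefined) {
    return { kind: "drop", reason: "no live sidecar for agent" };
  }
  return { kind: "forward", target };
}
```

In use, each sidecar would call `register("http://sidecar-ramaiah:4100", agentIds, Date.now())` on boot and on each 30s heartbeat, and the dispatcher would call `routeEvent(registry, event, Date.now())` per incoming event. Dropping unknown agents instead of falling back to a default mirrors the reasoning behind the fix in section 3.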