# Helix Engage — Weekly Status Update

**Period:** April 6 – April 11, 2026
**Team:** Engineering

---

## Executive Summary

Major infrastructure milestone: Helix Engage is now running on AWS EC2 with a multi-tenant architecture that serves both Ramaiah Hospitals and Global Hospital from a single instance. A full CI/CD pipeline with automated E2E testing and Teams notifications is operational. 17 defects from QA were triaged: 8 were fixed in code and deployed, and a ninth was resolved by seeding configuration. A cross-tenant security vulnerability in the telephony layer was also discovered and patched.

---

## 1. AWS EC2 Deployment (Multi-Tenant)

**Status: Live**

Migrated from a single-tenant VPS to a multi-tenant EC2 architecture:

- **Instance:** m6i.xlarge, Mumbai (ap-south-1), 15GB RAM
- **14 Docker containers** running: platform, 2 sidecars, telephony dispatcher, 4 Redis instances, Caddy, PostgreSQL, ClickHouse, Redpanda, MinIO
- **Strict tenant isolation:** each hospital has its own sidecar container, Redis instance, and data volume
- **Host-routed Caddy:** each hostname maps to a single tenant's sidecar, so a webhook cannot be routed to the other tenant

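
The host-routing idea can be illustrated outside Caddy with a small TypeScript sketch. This is not the deployed Caddy configuration: the upstream addresses and the names `HOST_TABLE` and `resolveUpstream` are assumptions made for illustration, and only the two hostnames come from this report. It shows that each hostname resolves to exactly one tenant and that an unknown host is rejected rather than sent to a default tenant.

```typescript
type Tenant = "ramaiah" | "global";

interface Upstream {
  tenant: Tenant;
  target: string; // assumed internal container address, not taken from the real config
}

// Hostnames are from the report; the targets are hypothetical.
const HOST_TABLE: Record<string, Upstream> = {
  "ramaiah.engage.healix360.net": { tenant: "ramaiah", target: "http://sidecar-ramaiah:4100" },
  "global.engage.healix360.net": { tenant: "global", target: "http://sidecar-global:4100" },
};

// Resolve a Host header to exactly one tenant upstream.
// Unknown hosts yield undefined so the caller can reject the request
// instead of falling back to some default tenant.
function resolveUpstream(hostHeader: string): Upstream | undefined {
  const colon = hostHeader.indexOf(":");
  const host = (colon === -1 ? hostHeader : hostHeader.slice(0, colon)).toLowerCase();
  return Object.prototype.hasOwnProperty.call(HOST_TABLE, host) ? HOST_TABLE[host] : undefined;
}
```

A caller would respond with an error status whenever `resolveUpstream` returns `undefined`, which keeps the "no default tenant" property explicit.
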
**URLs deployed:**

- ramaiah.engage.healix360.net (Ramaiah Hospitals)
- global.engage.healix360.net (Global Hospital)
- ramaiah.app.healix360.net / global.app.healix360.net (Platform)
- telephony.engage.healix360.net (Event dispatcher)
- operations.healix360.net (CI/CD dashboard)
- git.healix360.net (Git forge)

---

## 2. Telephony Event Dispatcher

**Status: Live**

Built a NestJS service that routes Ozonetel agent/call events to the correct hospital's sidecar:

- Ozonetel event subscriptions are **account-level** (not per-campaign), so one URL receives events for all agents
- The dispatcher receives all events, looks up `agentId` in Redis, and forwards each event to the correct sidecar
- Sidecars self-register on boot with their agent list and send a heartbeat every 30s; registrations expire after a 90s TTL
- No manual configuration is needed when adding new hospitals

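
The register, heartbeat, and lookup flow described above can be sketched as follows. This is a minimal in-memory stand-in for the Redis-backed registry, not the dispatcher's actual code: the class name `AgentRegistry`, its method names, and the injected clock are all hypothetical. Only the 90s TTL and the `agentId` lookup key come from this report.

```typescript
interface Registration {
  sidecarUrl: string;
  expiresAtMs: number;
}

const HEARTBEAT_TTL_MS = 90000; // 90s TTL, as stated in the report

class AgentRegistry {
  private entries: Record<string, Registration> = {};
  private nowMs: () => number;

  // The clock is injected so expiry can be exercised without real waiting.
  constructor(nowMs: () => number) {
    this.nowMs = nowMs;
  }

  // Called on sidecar boot and again on every heartbeat; each call renews the TTL.
  register(sidecarUrl: string, agentIds: string[]): void {
    const expiresAtMs = this.nowMs() + HEARTBEAT_TTL_MS;
    agentIds.forEach((agentId) => {
      this.entries[agentId] = { sidecarUrl: sidecarUrl, expiresAtMs: expiresAtMs };
    });
  }

  // Returns the sidecar URL for an agent, or undefined if unknown or expired.
  lookup(agentId: string): string | undefined {
    if (!Object.prototype.hasOwnProperty.call(this.entries, agentId)) {
      return undefined;
    }
    const entry = this.entries[agentId];
    if (entry === undefined || entry.expiresAtMs <= this.nowMs()) {
      delete this.entries[agentId];
      return undefined;
    }
    return entry.sidecarUrl;
  }
}
```

With a 30s heartbeat and a 90s TTL, a sidecar can miss two consecutive heartbeats before its agents drop out of the registry.
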

---

## 3. Cross-Tenant Security Fix (defaultAgentId)

**Status: Fixed and deployed**

Discovered that 6 sidecar endpoints fell back to a hardcoded `OZONETEL_AGENT_ID` env var when the frontend did not provide `agentId`. In a multi-tenant setup, this caused Ramaiah sidecar operations to silently affect Global Hospital's agent.

**Impact:** Agent state changes, call disposition, outbound dialing, performance metrics, and maintenance commands could operate on the wrong hospital's agent with no error or warning.

**Fix:**

- Removed the `defaultAgentId` getter and all hardcoded fallbacks (`agent3`, `Test123$`, `521814`)
- All 6 endpoints now require `agentId` from the caller and return 400 if it is missing
- Frontend updated to send `agentId` from `localStorage.helix_agent_config` in all calls
- `OZONETEL_AGENT_ID` removed from env config entirely

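
The "require `agentId`, no fallback" rule can be sketched as a small validation function. This is an illustration under assumptions, not the sidecar's code: `requireAgentId`, `HttpResult`, and the error message are hypothetical names. In a NestJS controller the failure branch would more likely throw a `BadRequestException` than return a result object.

```typescript
interface HttpResult {
  statusCode: number;
  agentId?: string;
  error?: string;
}

// Validate the caller-supplied agentId.
// There is deliberately no env-var or hardcoded fallback: a missing value is a 400.
function requireAgentId(payload: { agentId?: unknown }): HttpResult {
  const raw = payload.agentId;
  if (typeof raw !== "string" || raw.trim() === "") {
    return { statusCode: 400, error: "agentId is required" };
  }
  return { statusCode: 200, agentId: raw.trim() };
}
```

Failing loudly here is the design point: a wrong-tenant operation becomes a visible client error instead of a silent action on another hospital's agent.
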

---

## 4. Defect Fixes (9 of 17)

| Bug | Title | Status |
|-----|-------|--------|
| #527 | Appointment creation updates existing patient incorrectly | Fixed |
| #529 | Break/Training status doesn't block outbound calls | Fixed |
| #531 | Agent can log out during active call | Fixed |
| #533 | Redundant "Call History" header | Fixed |
| #534 | Redundant "Patients" header | Fixed |
| #536 | My Performance shows wrong agent's data | Fixed |
| #538 | Supervisor dashboard metrics incorrect | Fixed |
| #540 | Ghost calls visible for logged-out agents | Fixed |
| #547 | SLA rules not reflected in Call Desk | Fixed (config seeded) |

**Deferred (by product):** #516 (recordings real-time), #517/#548 (AI transcription), #519 (supervisor call; needs SIP seat), #539 (missed calls real-time), #541 (whisper/barge/listen)

---

## 5. E2E Test Suite (Playwright)

**Status: 40 tests, all passing**

Automated smoke tests covering every page for both hospitals:

- **Login (4):** branding, invalid creds, supervisor login, auth guard
- **Ramaiah CC Agent (10):** call desk, call history, patients, appointments, my performance, sidebar, sign-out
- **Ramaiah Supervisor (12):** dashboard, team performance, live monitor, leads, patients, appointments, call log, recordings, missed calls, campaigns, settings, sidebar
- **Global CC Agent (7):** all pages + sign-out
- **Global Supervisor (5):** all pages

Self-healing: the suite auto-clears agent session locks before login and completes sign-out after tests.

---

## 6. CI/CD Pipeline (Woodpecker + Gitea)

**Status: Operational**

End-to-end CI/CD on EC2:

- **Gitea** mirrors Azure DevOps repos every 15 minutes
- **Woodpecker CI** triggers pipelines on push or manual run
- **Frontend pipeline:** TypeScript typecheck → 40 E2E tests → HTML report published to MinIO → Teams notification
- **Sidecar pipeline:** Jest unit tests → Teams notification
- **Reports:** Playwright HTML reports with screenshots at `operations.healix360.net/reports/{run}/index.html`
- **Teams notifications:** Adaptive Cards to "Deployment updates" channel with pass/fail summary + report link

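
A notification payload of the kind described above might be built as in the sketch below. It assumes the channel's webhook accepts the standard Adaptive Card attachment envelope, and it assumes `https` for the report URL, since the report gives the path without a scheme. `RunSummary`, `reportUrl`, and `buildTeamsCard` are hypothetical names, and the card wording is illustrative rather than copied from the pipeline.

```typescript
interface RunSummary {
  runId: string;
  passed: number;
  failed: number;
}

// Report path pattern from this document; the https scheme is an assumption.
function reportUrl(runId: string): string {
  return "https://operations.healix360.net/reports/" + encodeURIComponent(runId) + "/index.html";
}

// Build a message containing one Adaptive Card with a pass/fail summary and a report link.
function buildTeamsCard(run: RunSummary) {
  const verdict = run.failed === 0 ? "PASSED" : "FAILED";
  return {
    type: "message",
    attachments: [
      {
        contentType: "application/vnd.microsoft.card.adaptive",
        content: {
          $schema: "http://adaptivecards.io/schemas/adaptive-card.json",
          type: "AdaptiveCard",
          version: "1.4",
          body: [
            { type: "TextBlock", weight: "Bolder", text: verdict + ": E2E run " + run.runId },
            { type: "TextBlock", text: run.passed + " passed, " + run.failed + " failed" },
          ],
          actions: [{ type: "Action.OpenUrl", title: "Open report", url: reportUrl(run.runId) }],
        },
      },
    ],
  };
}
```

The pipeline step would serialize this object with `JSON.stringify` and POST it to the channel's webhook URL, which is kept out of the sketch because it is a secret.
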

---

## 7. Documentation

Three docs committed to the repo:

- **architecture.md:** Multi-tenant topology with Mermaid diagram, telephony dispatcher, failure modes
- **developer-operations-runbook.md:** SSH access, accounts, deploy steps, Redis ops, DB access, troubleshooting
- **ci-cd-operations.md:** Gitea, Woodpecker, MinIO, Teams notification setup and troubleshooting

---

## 8. Data Seeding

- **Ramaiah:** 195 real doctors scraped from msrmh.com, clinics, visit slots, campaign data
- **Global:** CC agent accounts (rekha.cc, ganesh.cc), marketing (sanjay), supervisor (dr.ramesh) created with proper roles
- **Rules engine:** 6 priority scoring rules seeded (missed call, follow-up, campaign lead, 2nd/3rd attempt, spam deprioritize)
- **Seed script:** idempotent `mkMember`, cleanup phase before seeding, runs against any workspace via env vars

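
What "idempotent `mkMember`" means in practice can be shown with a short sketch. The real seed script talks to the platform; here an in-memory workspace stands in for it, and the `Member` shape, the role strings, and the return values are assumptions made for illustration. Only the function name `mkMember` and the account name `rekha.cc` come from this report.

```typescript
interface Member {
  username: string;
  role: string; // hypothetical role identifier
}

interface InMemoryWorkspace {
  members: Record<string, Member>;
}

type SeedResult = "created" | "updated" | "unchanged";

// Create-or-update keyed on username, so re-running the seed never duplicates a member.
function mkMember(ws: InMemoryWorkspace, member: Member): SeedResult {
  const key = member.username.toLowerCase();
  const existing = Object.prototype.hasOwnProperty.call(ws.members, key)
    ? ws.members[key]
    : undefined;
  if (existing === undefined) {
    ws.members[key] = { username: key, role: member.role };
    return "created";
  }
  if (existing.role === member.role) {
    return "unchanged";
  }
  existing.role = member.role;
  return "updated";
}
```

Running the seed twice against the same workspace leaves exactly one record per username, which is what allows the script to be pointed at any workspace repeatedly.
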

---

## 9. Other Improvements

- **SIP agent tracing:** Browser console logs `agent=ramaiahadmin ext=524435` on every SIP connect/disconnect/state change for multi-agent debugging
- **ACW 3-layer protection:** beforeunload warning → sendBeacon auto-dispose → server 30s timer
- **Maint endpoints:** `force-ready` and `unlock-agent` now accept `agentId` from body (was hardcoded)
- **Security group automation:** SSH IP auto-updated via AWS CLI when ISP changes

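
The third ACW layer, the server-side 30s timer, can be sketched as a deadline tracker. This is an interpretation rather than the sidecar's implementation: it assumes the timer starts when a call ends and is cancelled when a disposition arrives, whether from the form or from a `sendBeacon` request on page unload. `AcwFallback` and its method names are hypothetical, and the clock is injected so the behavior can be checked without waiting.

```typescript
const ACW_LIMIT_MS = 30000; // 30s server timer, as stated in the report

class AcwFallback {
  private deadlines: Record<string, number> = {};
  private nowMs: () => number;

  constructor(nowMs: () => number) {
    this.nowMs = nowMs;
  }

  // Call ended: the agent enters after-call work and the countdown starts.
  callEnded(agentId: string): void {
    this.deadlines[agentId] = this.nowMs() + ACW_LIMIT_MS;
  }

  // A disposition arrived in time: cancel the countdown.
  disposed(agentId: string): void {
    delete this.deadlines[agentId];
  }

  // Periodic sweep: returns agents whose ACW window expired so the caller
  // can auto-dispose them, and forgets them afterwards.
  sweep(): string[] {
    const now = this.nowMs();
    const expired: string[] = [];
    Object.keys(this.deadlines).forEach((agentId) => {
      const deadline = this.deadlines[agentId];
      if (deadline !== undefined && deadline <= now) {
        expired.push(agentId);
        delete this.deadlines[agentId];
      }
    });
    return expired;
  }
}
```

The first two layers run in the browser and can be skipped by a crash or a lost network connection, so a server-side deadline is the only layer that does not depend on the client.
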

---

## Metrics

| Metric | Value |
|--------|-------|
| Commits (frontend) | 35 |
| Commits (sidecar) | 20 |
| Commits (SDK app) | 2 |
| Bugs fixed | 9 |
| E2E tests | 40 |
| Docker containers | 17 (14 app + 3 CI) |
| DNS records | 6 |
| Uptime | EC2 live since Apr 9 |

---

## Next Week Priorities

1. Merge `feature/omnichannel-widget` → `master` (frontend)
2. Frontend Docker image (stop rsync, bake into image)
3. Appointment date validation (no past dates, auto-tomorrow after hours)
4. Pre-built CI Docker image (skip `yarn install` on every run)
5. Deferred defects: #516, #539 (real-time updates)