Files
helix-engage/docs/developer-operations-runbook.md
saridsa2 f09250f3ef docs: update developer runbook for EC2, remove duplicate
Rewrote developer-operations-runbook.md to reflect the current EC2
multi-tenant deployment (was VPS-only). Covers SSH key setup, all
containers, accounts, deploy steps, E2E tests, Redis ops, DB access,
and troubleshooting. Removed duplicate runbook.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-10 21:06:49 +05:30

390 lines
11 KiB
Markdown

# Helix Engage — Developer Operations Runbook
## Architecture
See [architecture.md](./architecture.md) for the full multi-tenant topology diagram.
```
Browser (India)
↓ HTTPS
Caddy (reverse proxy, TLS, host-routed)
├── ramaiah.engage.healix360.net → sidecar-ramaiah:4100
├── global.engage.healix360.net → sidecar-global:4100
├── telephony.engage.healix360.net → telephony:4200
├── *.app.healix360.net → server:4000 (platform)
└── engage.healix360.net → 404 (no catchall)
Docker Compose stack (EC2 — 13.234.31.194):
├── caddy — Reverse proxy + TLS (Let's Encrypt)
├── server — FortyTwo platform (NestJS, port 4000)
├── worker — BullMQ background jobs
├── sidecar-ramaiah — Ramaiah sidecar (NestJS, port 4100)
├── sidecar-global — Global sidecar (NestJS, port 4100)
├── telephony — Event dispatcher (NestJS, port 4200)
├── redis-ramaiah — Ramaiah sidecar Redis
├── redis-global — Global sidecar Redis
├── redis-telephony — Telephony dispatcher Redis
├── redis — Platform Redis
├── db — PostgreSQL 16 (workspace-per-schema)
├── clickhouse — Analytics
├── minio — S3-compatible object storage
└── redpanda — Event bus (Kafka-compatible)
```
---
## EC2 Access
```bash
# SSH into EC2
ssh -i /tmp/ramaiah-ec2-key -o StrictHostKeyChecking=no ubuntu@13.234.31.194
```
| Detail | Value |
|---|---|
| Host | `13.234.31.194` |
| User | `ubuntu` |
| SSH key | `/tmp/ramaiah-ec2-key` (decrypted from `~/Downloads/fortytwoai_hostinger`) |
| Docker compose dir | `/opt/fortytwo` |
| Frontend static files | `/opt/fortytwo/helix-engage-frontend` |
| Caddyfile | `/opt/fortytwo/Caddyfile` |
### SSH Key Setup
The key at `~/Downloads/fortytwoai_hostinger` is passphrase-protected (`SasiSuman@2007`).
Create a decrypted copy for non-interactive use:
```bash
# One-time setup
openssl pkey -in ~/Downloads/fortytwoai_hostinger -out /tmp/ramaiah-ec2-key
chmod 600 /tmp/ramaiah-ec2-key
# Verify
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 hostname
```
### Handy alias
```bash
alias ec2="ssh -i /tmp/ramaiah-ec2-key -o StrictHostKeyChecking=no ubuntu@13.234.31.194"
```
---
## URLs
| Service | URL |
|---|---|
| Ramaiah Engage (Frontend + API) | `https://ramaiah.engage.healix360.net` |
| Global Engage (Frontend + API) | `https://global.engage.healix360.net` |
| Ramaiah Platform | `https://ramaiah.app.healix360.net` |
| Global Platform | `https://global.app.healix360.net` |
| Telephony Dispatcher | `https://telephony.engage.healix360.net` |
---
## Login Credentials
### Ramaiah Workspace
| Role | Email | Password |
|---|---|---|
| Marketing Executive | `marketing@ramaiahcare.com` | `AdRamaiah@2026` |
| Marketing Executive | `supervisor@ramaiahcare.com` | `MrRamaiah@2026` |
| CC Agent | `ccagent@ramaiahcare.com` | `CcRamaiah@2026` |
| Platform Admin | `dev@fortytwo.dev` | `tim@apple.dev` |
### Ozonetel
| Field | Value |
|---|---|
| API Key | `KK8110e6c3de02527f7243ffaa924fa93e` |
| Username | `global_healthx` |
| Ramaiah Campaign | `Inbound_918041763400` |
| Ramaiah Agent | `ramaiahadmin` / ext `524435` |
---
## Local Development
### Frontend (Vite dev server)
```bash
cd helix-engage
npm run dev # http://localhost:5173
npx tsc --noEmit # Type check
npm run build # Production build
```
The `.env.local` controls which sidecar the frontend talks to:
```bash
# Remote (default — uses EC2 backend)
VITE_API_URL=https://ramaiah.engage.healix360.net
# Local sidecar
# VITE_API_URL=http://localhost:4100
```
### Sidecar (NestJS dev server)
```bash
cd helix-engage-server
npm run start:dev # http://localhost:4100 (watch mode)
npm run build # Build only
```
Sidecar `.env` must have:
```bash
PLATFORM_GRAPHQL_URL=https://ramaiah.app.healix360.net/graphql
PLATFORM_API_KEY=<Ramaiah workspace API key>
PLATFORM_WORKSPACE_SUBDOMAIN=ramaiah
REDIS_URL=redis://localhost:6379
```
### Pre-deploy checklist
1. `npx tsc --noEmit` — passes (frontend)
2. `npm run build` — succeeds (sidecar)
3. Test the changed feature locally
4. Check `package.json` for new dependencies → decides quick vs full deploy
---
## Deployment
### Frontend
```bash
cd helix-engage && npm run build
rsync -avz -e "ssh -i /tmp/ramaiah-ec2-key -o StrictHostKeyChecking=no" \
dist/ ubuntu@13.234.31.194:/opt/fortytwo/helix-engage-frontend/
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"cd /opt/fortytwo && sudo docker compose restart caddy"
```
### Sidecar (quick — code only, no new dependencies)
```bash
cd helix-engage-server
aws ecr get-login-password --region ap-south-1 | \
docker login --username AWS --password-stdin 043728036361.dkr.ecr.ap-south-1.amazonaws.com
docker buildx build --platform linux/amd64 \
-t 043728036361.dkr.ecr.ap-south-1.amazonaws.com/fortytwo-eap/helix-engage-sidecar:alpha \
--push .
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"cd /opt/fortytwo && sudo docker compose pull sidecar-ramaiah sidecar-global && sudo docker compose up -d sidecar-ramaiah sidecar-global"
```
### How to decide
```
Did package.json change?
├── YES → ECR build + push + pull (above)
└── NO → Same steps (ECR is the only deploy path for EC2)
```
---
## Post-Deploy: E2E Smoke Tests
```bash
cd helix-engage
npx playwright test
```
27 tests covering login, all CC Agent pages, all Supervisor pages, and sign-out.
The last test completes sign-out so the agent session is released for the next run.
---
## Checking Logs
```bash
# Ramaiah sidecar
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker logs ramaiah-prod-sidecar-ramaiah-1 --tail 30 2>&1"
# Follow live
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker logs ramaiah-prod-sidecar-ramaiah-1 -f --tail 10 2>&1"
# Filter errors
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker logs ramaiah-prod-sidecar-ramaiah-1 --tail 100 2>&1" | grep -i "error\|fail"
# Telephony dispatcher
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker logs ramaiah-prod-telephony-1 --tail 30 2>&1"
# Caddy
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker logs ramaiah-prod-caddy-1 --tail 20 2>&1"
# Platform server
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker logs ramaiah-prod-server-1 --tail 30 2>&1"
# All container status
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker ps --format 'table {{.Names}}\t{{.Status}}'"
```
### Healthy startup
Look for these in sidecar logs:
```
[NestApplication] Nest application successfully started
Helix Engage Server running on port 4100
[SessionService] Redis connected
```
### Common failure patterns
| Log pattern | Meaning | Fix |
|---|---|---|
| `Cannot find module 'xxx'` | Missing npm dependency | Rebuild ECR image |
| `UndefinedModuleException` | Circular dependency | Fix code, redeploy |
| `ECONNREFUSED redis:6379` | Redis not ready | `docker compose up -d redis-ramaiah` |
| `Forbidden resource` | Platform permission issue | Check user roles |
| `429 Too Many Requests` | Ozonetel rate limit | Wait, reduce polling |
---
## Redis Operations
```bash
SSH="ssh -i /tmp/ramaiah-ec2-key -o StrictHostKeyChecking=no ubuntu@13.234.31.194"
REDIS="docker exec ramaiah-prod-redis-ramaiah-1 redis-cli"
# Clear agent session lock (fixes "already logged in from another device")
$SSH "$REDIS DEL agent:session:ramaiahadmin"
# List all keys
$SSH "$REDIS KEYS '*'"
# Clear caller cache (stale patient names)
$SSH "$REDIS --scan --pattern 'caller:*' | xargs -r docker exec -i ramaiah-prod-redis-ramaiah-1 redis-cli DEL"
# Clear masterdata cache (departments/doctors/clinics/slots)
$SSH "$REDIS --scan --pattern 'masterdata:*' | xargs -r docker exec -i ramaiah-prod-redis-ramaiah-1 redis-cli DEL"
# Clear recording analysis cache
$SSH "$REDIS --scan --pattern 'call:analysis:*' | xargs -r docker exec -i ramaiah-prod-redis-ramaiah-1 redis-cli DEL"
# Clear agent name cache
$SSH "$REDIS --scan --pattern 'agent:name:*' | xargs -r docker exec -i ramaiah-prod-redis-ramaiah-1 redis-cli DEL"
# Nuclear: flush all sidecar Redis
$SSH "$REDIS FLUSHDB"
```
---
## Database Access
```bash
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker exec -it ramaiah-prod-db-1 psql -U fortytwo -d fortytwo_eap"
```
### Useful queries
```sql
-- List workspace schemas
SELECT schema_name FROM information_schema.schemata WHERE schema_name LIKE 'workspace_%';
-- List custom entities
SELECT "nameSingular", "isCustom" FROM core."objectMetadata" ORDER BY "nameSingular";
-- List users
SELECT u.email, u."firstName", u."lastName", uw.id as workspace_id
FROM core."user" u
JOIN core."userWorkspace" uw ON uw."userId" = u.id;
-- List roles
SELECT r.label, rt."userWorkspaceId"
FROM core."roleTarget" rt
JOIN core."role" r ON r.id = rt."roleId";
```
---
## Troubleshooting
### "Already logged in from another device"
Single-session enforcement per Ozonetel agent. Clear the lock:
```bash
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker exec ramaiah-prod-redis-ramaiah-1 redis-cli DEL agent:session:ramaiahadmin"
```
### Agent stuck in ACW / Wrapping Up
```bash
curl -X POST https://ramaiah.engage.healix360.net/api/maint/force-ready \
-H "Content-Type: application/json" \
-d '{"agentId": "ramaiahadmin"}'
```
### Telephony events not routing
```bash
# Check dispatcher logs
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker logs ramaiah-prod-telephony-1 --tail 30 2>&1"
# Check service discovery registry
ssh -i /tmp/ramaiah-ec2-key ubuntu@13.234.31.194 \
"docker exec ramaiah-prod-redis-telephony-1 redis-cli KEYS '*'"
```
### Theme/branding reset after Redis flush
```bash
curl -X PUT https://ramaiah.engage.healix360.net/api/config/theme \
-H "Content-Type: application/json" \
-d '{"defaults": {"brandName": "Helix Engage", "hospitalName": "Ramaiah Hospitals"}}'
```
---
## Rollback
### Frontend
Checkout previous commit → `npm run build` → rsync to EC2.
### Sidecar
Checkout previous commit → ECR build + push → pull on EC2.
For immediate rollback, re-tag a known-good ECR image as `:alpha` and pull.
---
## Git Repositories
| Repo | Azure DevOps | Branch |
|---|---|---|
| Frontend | `helix-engage` in Patient Engagement Platform | `feature/omnichannel-widget` |
| Sidecar | `helix-engage-server` in Patient Engagement Platform | `master` |
| SDK App | `FortyTwoApps/helix-engage/` (monorepo) | `dev` |
| Telephony | `helix-engage-telephony` in Patient Engagement Platform | `master` |
---
## ECR Details
| Detail | Value |
|---|---|
| Registry | `043728036361.dkr.ecr.ap-south-1.amazonaws.com` |
| Sidecar repo | `fortytwo-eap/helix-engage-sidecar` |
| Tag | `alpha` |
| Region | `ap-south-1` (Mumbai) |