ADR-007: AEGIS AWS Deployment Architecture
Status: Accepted
Date: 2026-02-10
Decision Makers: joshuakirby
Supersedes: None (first deployment architecture decision)
Context
AEGIS Governance is a production-complete SDK (1859 tests, 94.55% coverage at the time of this ADR) with CDK infrastructure defined but not yet deployed. All code — CLI, MCP server, Prometheus exporter, Grafana dashboards, Docker Compose — exists and works locally. Five ROADMAP items (17-20a) are blocked on "infrastructure" because no AWS resources have been provisioned.
Key Constraints
- MCP server cannot run on Lambda — MCP servers use stdio or long-lived HTTP connections incompatible with Lambda's request-response model (confirmed via AWS re:Post and Cloudflare MCP docs)
- Gate evaluation is request-response: `pcw_decide()` takes a context, evaluates gates, returns a decision in <1 second
- Expected traffic: ~1,770 requests/month (Lambda cost breakeven vs ECS is ~145K requests/month)
- AEGIS must be a standalone product — not a Libertas-Core subsystem
- Libertas-Core has existing VPC/KMS/AMP — shared infrastructure is available
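As a quick check on the traffic constraint, the two volume figures quoted above can be compared directly. The sketch below only divides the numbers stated in this ADR; the variable names are illustrative and not part of the SDK.

```python
# Figures quoted in the Key Constraints list.
expected_requests_per_month = 1_770
breakeven_requests_per_month = 145_000  # approximate Lambda-vs-ECS breakeven

# How many times traffic would have to grow before ECS becomes cheaper.
headroom = breakeven_requests_per_month / expected_requests_per_month

# Share of the breakeven volume that projected traffic represents.
utilization = expected_requests_per_month / breakeven_requests_per_month

print(f"headroom: ~{headroom:.0f}x, utilization: {utilization:.1%}")
```

Projected traffic sits at a little over 1% of the breakeven volume, so the Lambda choice for gate evaluation holds unless volume grows by roughly two orders of magnitude.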
Decision
Deploy AEGIS as a hybrid Lambda + ECS architecture:
- Tier 1 (Lambda): Gate evaluation (`pcw_decide()`) behind API Gateway with IAM auth
- Tier 2 (ECS Fargate): MCP server as a long-lived container with HTTP transport, internal ALB, and ADOT sidecar for Prometheus → AMP
- Tier 3 (pip): SDK distribution via PyPI (`pip install aegis-governance`)
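To make Tier 1 concrete, here is a minimal sketch of a Lambda handler behind an API Gateway proxy integration. It is not the project's actual handler. The `pcw_decide()` stub, its signature, the `risk_flags` field, and the `decision` result shape are all hypothetical stand-ins for the SDK call. Only the event and response envelope (`body`, `statusCode`) follow the standard API Gateway Lambda proxy format.

```python
import json
from typing import Any


def pcw_decide(context: dict[str, Any]) -> dict[str, Any]:
    """Stand-in for the SDK's gate evaluation (hypothetical signature and result)."""
    # Placeholder logic: proceed only when no risk flags are present.
    flagged = bool(context.get("risk_flags"))
    return {"decision": "halt" if flagged else "proceed"}


def handler(event: dict[str, Any], lambda_context: Any = None) -> dict[str, Any]:
    """Parse the JSON request body, evaluate the gates, return the decision."""
    try:
        payload = json.loads(event.get("body") or "{}")
    except json.JSONDecodeError:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid JSON"})}
    if not isinstance(payload, dict):
        return {"statusCode": 400, "body": json.dumps({"error": "expected an object"})}

    decision = pcw_decide(payload)
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(decision),
    }
```

Because the handler holds no state between invocations, it fits Lambda's request-response model. The MCP server does not, which is why it is placed on ECS.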
Architecture Diagram
API Gateway (REST, IAM auth)
├── POST /evaluate → Lambda (aegis-evaluate)
├── POST /risk-check → Lambda
└── GET /health → Lambda
Internal ALB (:80) → ECS Fargate
├── aegis-mcp-server (HTTP :8080, metrics :9090)
│ ├── POST /mcp (JSON-RPC 2.0, single + batch)
│ ├── GET /health (200 OK)
│ └── GET /mcp (405 — SSE not implemented)
└── adot-collector (sidecar → AMP remote write) [optional]
DynamoDB (workflow state)
Secrets Manager (BIP-322 keys)
S3 (audit logs)
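The `POST /mcp` endpoint in the diagram accepts JSON-RPC 2.0 in single and batch form. The sketch below shows what the two request bodies could look like. The envelope fields (`jsonrpc`, `id`, `method`, `params`) come from the JSON-RPC 2.0 specification, and `tools/list` and `tools/call` are method names from the MCP specification. Whether this server exposes them is an assumption, and `example_tool` is a hypothetical tool name.

```python
import json
from itertools import count

_ids = count(1)  # monotonically increasing request ids


def rpc_request(method, params=None):
    """Build one JSON-RPC 2.0 request object."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# Single call: one JSON object in the POST body.
single_body = json.dumps(rpc_request("tools/list"))

# Batch call: a JSON array of request objects in one POST body.
batch_body = json.dumps([
    rpc_request("tools/list"),
    rpc_request("tools/call", {"name": "example_tool", "arguments": {}}),
])

print(single_body)
print(batch_body)
```

Either body would be sent to the internal ALB with `Content-Type: application/json`. A `GET` on the same path returns 405 because SSE is not implemented.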
Alternatives Considered
1. Lambda-Only
- Pro: Simplest, cheapest for low traffic
- Con: Cannot run MCP server (stdio/streaming incompatible with Lambda)
- Rejected: MCP is a core deployment target
2. ECS-Only
- Pro: Single compute model, simpler architecture
- Con: $30-46/month for ECS vs $11/month Lambda at 2K requests; idle compute waste
- Rejected: Lambda is 85x cheaper for gate evaluation at projected volume
3. EKS (Kubernetes)
- Pro: Industry standard for containerized workloads
- Con: Massive overhead for single-service deployment; control plane ~$73/month
- Rejected: Overkill for current scale
4. App Runner
- Pro: Zero infrastructure management
- Con: No VPC integration (needed for DynamoDB VPC endpoints); limited observability
- Rejected: Missing required features
Consequences
Positive
- Cost-efficient: ~$51/month total (Lambda + ECS + storage)
- Separation of concerns: Gate eval (stateless, fast) vs MCP (stateful, long-lived)
- Standalone product: Any repo can use the governance gate action
- Shared infrastructure: Leverages Libertas VPC/KMS/AMP without duplication
Negative
- Two compute models: Lambda + ECS increases operational surface
- CDK complexity: Four stacks instead of one
- Cold starts: Lambda cold start (3-5s with scipy) affects first request
Mitigations
- CDK stacks are self-contained with clear dependency chain
- Lambda cold start acceptable for async governance gates (not latency-critical)
- Provisioned concurrency available if cold start becomes an issue ($11/mo)
Cost Analysis
| Component | Monthly | Annual |
|---|---|---|
| Lambda (512MB, ~2K invocations) | $11 | $132 |
| ECS Fargate (0.25 vCPU, 24/7) | $30 | $360 |
| API Gateway (2K requests) | $0.01 | $0.12 |
| DynamoDB (on-demand, <1GB) | $5 | $60 |
| Secrets Manager (5 secrets) | $2 | $24 |
| CloudWatch Logs | $2 | $24 |
| S3 (audit logs, <10GB) | $0.23 | $2.76 |
| AMP (shared, incremental) | $0.30 | $3.60 |
| Total | ~$51/mo | ~$606/yr |
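The totals can be reproduced by summing the component rows. The sketch below uses the monthly figures exactly as listed in the table and `Decimal` to avoid floating-point rounding. The dictionary keys are shortened labels, not identifiers from the codebase.

```python
from decimal import Decimal

# Monthly cost per component, copied from the table above (USD).
monthly_costs = {
    "Lambda": Decimal("11"),
    "ECS Fargate": Decimal("30"),
    "API Gateway": Decimal("0.01"),
    "DynamoDB": Decimal("5"),
    "Secrets Manager": Decimal("2"),
    "CloudWatch Logs": Decimal("2"),
    "S3": Decimal("0.23"),
    "AMP": Decimal("0.30"),
}

monthly_total = sum(monthly_costs.values(), Decimal("0"))
annual_total = monthly_total * 12

print(f"monthly: ${monthly_total}, annual: ${annual_total}")
```

The monthly sum is $50.54, which rounds to the ~$51/mo figure. ECS Fargate and Lambda together account for $41 of it.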
Implementation
Infrastructure is defined in CDK (Python) under infra/:
- `AegisSharedStack`: DynamoDB, Secrets Manager, S3, KMS
- `AegisLambdaStack`: Lambda function, API Gateway, IAM
- `AegisMcpStack`: ECS cluster, Fargate service, ADOT sidecar
- `AegisMonitoringStack`: CloudWatch alarms, dashboard, SNS topic
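One way to reason about deployment order for the four stacks is a topological sort over their dependencies. The edges below are an assumption made for illustration: this ADR names the stacks but does not spell out which depends on which. The sketch assumes the compute stacks consume shared resources and that monitoring observes both compute stacks. It uses only the standard-library `graphlib` module, not CDK itself.

```python
from graphlib import TopologicalSorter

# Stack -> stacks it depends on (ASSUMED edges, not taken from the ADR).
stack_dependencies = {
    "AegisSharedStack": set(),
    "AegisLambdaStack": {"AegisSharedStack"},
    "AegisMcpStack": {"AegisSharedStack"},
    "AegisMonitoringStack": {"AegisLambdaStack", "AegisMcpStack"},
}

# static_order() yields each stack only after all of its dependencies.
deploy_order = list(TopologicalSorter(stack_dependencies).static_order())

print(" -> ".join(deploy_order))
```

Under these assumed edges the shared stack deploys first and monitoring last. The two compute stacks have no ordering constraint between them.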