Version: 1.0.0 | Updated: 2026-02-09 | Status: Active
Performance targets, resource baselines, and measurement methodology for AEGIS production deployments. Latency targets are sourced from 002-performance-load-testing.md (authoritative).
1. Decision Latency Targets
Source: docs/implementation-plans/002-performance-load-testing.md section 2.2.
| Percentile | Target | Alert Threshold | Critical Threshold |
|---|---|---|---|
| p50 | < 100 ms | > 150 ms | > 250 ms |
| p95 | < 500 ms | > 500 ms | > 750 ms |
| p99 | < 1000 ms | > 1000 ms | > 1500 ms |
These targets apply to end-to-end pcw_decide() evaluation, measured by the aegis_decision_latency_seconds Prometheus histogram.
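The targets above can be checked programmatically. A minimal, stdlib-only sketch with synthetic samples (the nearest-rank helper is illustrative, not the AEGIS implementation; in production these percentiles come from the aegis_decision_latency_seconds histogram):

```python
# Sketch: checking recorded pcw_decide() latencies against the p50/p95/p99
# targets in the table above. Sample data is synthetic.
import random

TARGETS = {0.50: 0.100, 0.95: 0.500, 0.99: 1.000}  # seconds

def percentile(samples, q):
    """Nearest-rank percentile of a list of latencies in seconds."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]

def check_latency_slo(samples):
    """Return {quantile: (observed_seconds, within_target)} per SLO row."""
    return {
        q: (percentile(samples, q), percentile(samples, q) < target)
        for q, target in TARGETS.items()
    }

random.seed(0)
samples = [random.uniform(0.005, 0.080) for _ in range(1000)]
report = check_latency_slo(samples)
assert all(ok for _, ok in report.values())
```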
2. Throughput Targets
| Metric | Target | Alert Threshold | Measurement Window |
|---|---|---|---|
| Minimum throughput | 100 evaluations/sec | < 80 eval/s | 60 seconds |
| Burst throughput | 500 evaluations/sec | N/A | 60 seconds |
| Error rate | < 0.1% | > 0.5% | 5 minutes |
Throughput is measured as rate(aegis_decision_latency_seconds_count[1m]).
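Conceptually, that PromQL expression is the per-second increase of a cumulative counter over the window. A simplified sketch (real PromQL rate() additionally handles counter resets and extrapolation; values below are illustrative):

```python
# Sketch of what rate(aegis_decision_latency_seconds_count[1m]) computes:
# the per-second growth of a monotonic counter across the sample window.
def counter_rate(samples):
    """samples: [(timestamp_seconds, counter_value), ...] within the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Two scrapes 60 s apart: 6,600 new evaluations -> 110 eval/s.
window = [(0.0, 120_000.0), (60.0, 126_600.0)]
assert counter_rate(window) == 110.0
```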
3. Resource Baselines
| Resource | Target | Alert Threshold | Notes |
|---|---|---|---|
| CPU utilization | < 70% | > 80% | Per-process |
| Memory utilization | < 80% | > 85% | Per-process |
| Per-evaluation memory | ~2 MB peak | N/A | Working set |
| Startup time | < 5 seconds | N/A | Process initialization |
4. Component Latency Budget
Breakdown of per-evaluation latency budget:
| Component | Budget | Notes |
|---|---|---|
| Gate evaluation (6 gates) | < 50 ms | Pure computation: risk, profit, novelty, complexity, quality, utility |
| Bayesian posterior | < 20 ms | With scipy (engine optional group) |
| Utility calculation | < 10 ms | With scipy z-score computation |
| Telemetry emission | < 5 ms | Async, non-blocking via pipeline |
| Crypto signing | < 15 ms | Ed25519 (~0.5 ms); ML-DSA-44 adds ~2-5 ms; HSM adds variable latency |
| Total overhead | < 100 ms | Target p50 |
Crypto Latency by Algorithm
| Algorithm | Sign | Verify | Notes |
|---|---|---|---|
| Ed25519 | ~0.5 ms | ~0.5 ms | Software implementation |
| ML-DSA-44 | ~2-5 ms | ~1-3 ms | Post-quantum (liboqs) |
| Hybrid (Ed25519 + ML-DSA-44) | ~3-6 ms | ~2-4 ms | Combined |
| HSM Ed25519 | ~10-50 ms | ~10-50 ms | Hardware dependent |
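The Ed25519 row can be reproduced locally with a micro-benchmark. The sketch below uses the pyca "cryptography" package, which is an assumption; this document does not name the signing library AEGIS uses. ML-DSA-44 and HSM timings require liboqs or hardware and are not reproduced here.

```python
# Micro-benchmark sketch for software Ed25519 sign latency.
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

key = Ed25519PrivateKey.generate()
message = b"decision-record-0001"

N = 1000
start = time.perf_counter()
for _ in range(N):
    signature = key.sign(message)
elapsed_ms = (time.perf_counter() - start) * 1000 / N
print(f"Ed25519 sign: {elapsed_ms:.3f} ms/op")

# Verification raises InvalidSignature on failure.
key.public_key().verify(signature, message)
```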
5. Measurement Methodology
Prometheus Histogram Buckets
[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
Corresponding to: 5 ms, 10 ms, 25 ms, 50 ms, 100 ms, 250 ms, 500 ms, 1000 ms, 2500 ms, 5000 ms.
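Percentiles are estimated from these cumulative buckets the way Prometheus histogram_quantile() does: locate the bucket containing the target rank, then interpolate linearly within it. A sketch with illustrative cumulative counts (le -> count), matching the bucket list above:

```python
# Simplified histogram_quantile(): rank lookup plus linear interpolation
# inside the matching bucket.
BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]

def histogram_quantile(q, cumulative_counts, bounds=BOUNDS):
    total = cumulative_counts[-1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in zip(bounds, cumulative_counts):
        if count >= rank:
            # Linear interpolation within the bucket, as Prometheus does.
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * frac
        prev_bound, prev_count = bound, count
    return bounds[-1]

counts = [100, 300, 700, 900, 960, 990, 998, 1000, 1000, 1000]
p50 = histogram_quantile(0.50, counts)  # lands in the 0.01-0.025 bucket
assert 0.01 <= p50 <= 0.025
```

Note the precision limit this implies: a reported p50 is only as precise as the bucket it falls in, which is why the bucket boundaries bracket the 100 ms target.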
Measurement Rules
- Exclude cold starts: ignore the first 30 seconds after process start
- Measure end-to-end: pcw_decide() entry to return
- Prometheus metric: aegis_decision_latency_seconds (histogram)
- Recording rule: aegis:p99_latency_5m (pre-computed)
Statistical Requirements
- Minimum sample size: 1000 evaluations before reporting percentiles
- Measurement window: Rolling 5-minute windows for alerting
- Baseline: Establish on identical hardware with standardized input
6. Benchmark Baselines
Run benchmarks to establish local baselines:
pytest tests/benchmarks/ --benchmark-only -v
Benchmark Tests
| Test File | Functions | What It Measures |
|---|---|---|
| test_gate_benchmarks.py | Gate evaluation for each of the 6 gates | Individual gate latency |
| test_pcw_decide_benchmark.py | Full pcw_decide() evaluation | End-to-end decision latency |
| test_bayesian_benchmark.py | Bayesian posterior computation | Mathematical kernel performance |
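For environments without pytest-benchmark, the same min/median/mean statistics can be collected with the stdlib. The sketch below times a stand-in gate function (hypothetical; the real gates live in the test files listed above):

```python
# Stdlib sketch of recording a local baseline in the style of
# pytest-benchmark: repeated timing, then summary statistics.
import statistics
import time

def complexity_gate(payload):
    # Stand-in for a real gate: pure computation over the input.
    return sum(payload) / len(payload) < 0.7

def record_baseline(fn, arg, rounds=2000):
    timings = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        fn(arg)
        timings.append(time.perf_counter() - t0)
    return {
        "min_ns": min(timings) * 1e9,
        "median_ns": statistics.median(timings) * 1e9,
        "mean_ns": statistics.fmean(timings) * 1e9,
    }

baseline = record_baseline(complexity_gate, [0.2, 0.4, 0.6])
print(baseline)
```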
Recorded Baselines
Environment: Python 3.12.11, macOS (Darwin), pytest-benchmark 5.2.3
Date: 2026-02-09
Note: Run pytest tests/benchmarks/ --benchmark-only on your target hardware to record environment-specific baselines. Results vary by CPU, Python version, and available dependencies.
Bayesian Computation
| Benchmark | Min | Median | Mean | Max | OPS (Kops/s) |
|---|---|---|---|---|---|
| test_posterior_calculation | 416 ns | 500 ns | 537 ns | 57.9 us | 1,862 |
| test_posterior_with_overrides | 433 ns | 510 ns | 511 ns | 4.6 us | 1,958 |
| test_posterior_predictive | 498 ns | 583 ns | 587 ns | 6.3 us | 1,705 |
| test_compute_full | 667 ns | 833 ns | 840 ns | 46.3 us | 1,191 |
| test_update_prior | 1.17 us | 1.42 us | 1.43 us | 49.0 us | 701 |
Gate Evaluation
| Benchmark | Min | Median | Mean | Max | OPS (Kops/s) |
|---|---|---|---|---|---|
| test_complexity_gate_evaluation | 354 ns | 410 ns | 421 ns | 74.2 us | 2,374 |
| test_quality_gate_evaluation | 500 ns | 625 ns | 650 ns | 30.9 us | 1,538 |
| test_utility_gate_evaluation | 625 ns | 792 ns | 811 ns | 98.2 us | 1,232 |
| test_novelty_gate_evaluation | 708 ns | 875 ns | 912 ns | 35.9 us | 1,096 |
| test_risk_gate_evaluation | 1.04 us | 1.25 us | 1.27 us | 63.2 us | 788 |
| test_profit_gate_evaluation | 1.54 us | 1.87 us | 1.88 us | 65.1 us | 533 |
| test_full_gate_pipeline | 7.04 us | 7.21 us | 7.97 us | 53.2 us | 125 |
End-to-End Decision (pcw_decide)
| Benchmark | Min | Median | Mean | Max | OPS (Kops/s) |
|---|---|---|---|---|---|
| test_pcw_decide_passing | 14.25 us | 15.04 us | 16.29 us | 133.9 us | 61.4 |
| test_pcw_decide_failing | 14.54 us | 16.63 us | 17.15 us | 1,217 us | 58.3 |
| test_pcw_decide_with_prometheus | 15.38 us | 17.54 us | 17.97 us | 379.5 us | 55.7 |
Summary
- Full gate pipeline: ~7 us median (well within 50 ms budget)
- End-to-end pcw_decide: ~15-18 us median (well within 100 ms p50 target)
- Single-process throughput: ~55-61K ops/s (exceeds 100 eval/s target by orders of magnitude)
- Bayesian posterior: ~500 ns median (negligible latency contribution)
7. Scaling Characteristics
| Configuration | Expected Throughput | Notes |
|---|---|---|
| Single process | ~100 eval/s | Baseline |
| Multi-worker (2 workers) | ~200 eval/s | Near-linear scaling |
| Multi-worker (4 workers) | ~350-400 eval/s | I/O contention begins |
| Multi-worker (8 workers) | ~500-600 eval/s | Diminishing returns |
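The multi-worker pattern behind this table can be sketched with a process pool; near-linear scaling holds while workers stay CPU-bound, and I/O (persistence, HSM) erodes it. The evaluate() body below is a hypothetical stand-in for a full pcw_decide() call:

```python
# Sketch: spreading CPU-bound evaluations across worker processes.
from concurrent.futures import ProcessPoolExecutor

def evaluate(payload):
    # Stand-in for a full pcw_decide() evaluation (pure computation).
    return sum(i * i for i in range(payload)) % 97

def run_batch(payloads, workers):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, payloads))

if __name__ == "__main__":
    # Guard required so worker processes can re-import this module safely.
    results = run_batch([10_000] * 8, workers=2)
    assert len(results) == 8
```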
Latency Impact by Feature
| Feature | Additional Latency | Notes |
|---|---|---|
| PostgreSQL persistence | +5-10 ms | Async write per evaluation |
| PII encryption (12 fields) | +1-3 ms | AES-256-GCM per field |
| ML-DSA-44 signing | +2-5 ms | vs Ed25519 ~0.5 ms |
| ML-KEM-768 encryption | +1-3 ms | Key encapsulation |
| Prometheus metrics | < 1 ms | Counter/histogram increment |
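The per-field AES-256-GCM cost comes from one AEAD operation per PII field. A sketch using the pyca "cryptography" AEAD API (an assumption; this document does not name the encryption library, and key management is elided here):

```python
# Sketch: per-field AES-256-GCM encryption, binding the field name as
# associated data so a ciphertext cannot be swapped between fields.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

def encrypt_field(value: str, field_name: str) -> bytes:
    nonce = os.urandom(12)  # unique per encryption; never reuse with a key
    ct = aead.encrypt(nonce, value.encode(), field_name.encode())
    return nonce + ct       # store the nonce alongside the ciphertext

def decrypt_field(blob: bytes, field_name: str) -> str:
    nonce, ct = blob[:12], blob[12:]
    return aead.decrypt(nonce, ct, field_name.encode()).decode()

blob = encrypt_field("alice@example.com", "email")
assert decrypt_field(blob, "email") == "alice@example.com"
```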
Bottlenecks
- CPU-bound: Gate evaluation and Bayesian computation (mitigate with multi-worker)
- I/O-bound: Database persistence and HSM communication (mitigate with async and connection pooling)
- Memory: Large batch evaluations (mitigate with streaming)
8. SLA Monitoring
Prometheus Alerting Rules
Defined in monitoring/prometheus/alerting-rules.yaml:
| Alert | Expression | For | Severity |
|---|---|---|---|
| AegisHighLatency | aegis:p99_latency_5m > 1.0 | 5m | Warning |
| AegisErrorRate | aegis:error_rate_5m > 0.05 | 5m | Warning |
| AegisHighGateFailRate | aegis:gate_pass_rate_5m < 0.5 | 10m | Warning |
| AegisOverrideSpike | Override rate > 0.1/min | 15m | Critical |
| AegisDriftCritical | KL divergence = critical | 5m | Critical |
| AegisOverrideStalePartial | Partial override stuck | 2h | Warning |
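The threshold side of the first three alerts can be modeled directly; the "For" duration means the breach must hold for the whole window, which the sketch below approximates by requiring every sample to breach. Thresholds follow the table above; sample values are illustrative:

```python
# Sketch: evaluating alert conditions against sampled recording-rule values.
import operator

ALERTS = {
    "AegisHighLatency":      ("aegis:p99_latency_5m",    operator.gt, 1.0),
    "AegisErrorRate":        ("aegis:error_rate_5m",     operator.gt, 0.05),
    "AegisHighGateFailRate": ("aegis:gate_pass_rate_5m", operator.lt, 0.5),
}

def firing(alert, window_samples):
    """True only if every sample in the window breaches the threshold."""
    _, op, threshold = ALERTS[alert]
    return all(op(v, threshold) for v in window_samples)

assert firing("AegisHighLatency", [1.2, 1.4, 1.1])
assert not firing("AegisErrorRate", [0.06, 0.01, 0.07])  # breach not sustained
```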
Grafana Dashboards
| Dashboard | File | Key Panels |
|---|---|---|
| AEGIS Overview | monitoring/grafana/overview-dashboard.json | Decision rate, gate pass rate, latency p50/p95/p99 |
| Risk Analysis | monitoring/grafana/risk-analysis-dashboard.json | KL divergence, Bayesian posteriors, override history |
Recording Rules
Pre-computed queries in monitoring/prometheus/recording-rules.yaml:
| Rule | Expression | Interval |
|---|---|---|
| aegis:gate_pass_rate_5m | Pass rate by gate | 30s |
| aegis:decision_rate_5m | Decision rate by status | 30s |
| aegis:p99_latency_5m | p99 latency by operation | 30s |
| aegis:override_rate_1h | Override rate by outcome | 30s |
| aegis:error_rate_5m | Error rate by component | 30s |
References