AEGIS Performance SLAs

Version: 1.0.0 | Updated: 2026-02-09 | Status: Active

Performance targets, resource baselines, and measurement methodology for AEGIS production deployments. Latency targets are sourced from 002-performance-load-testing.md (authoritative).


1. Decision Latency Targets

Source: docs/implementation-plans/002-performance-load-testing.md section 2.2.

| Percentile | Target | Alert Threshold | Critical Threshold |
|---|---|---|---|
| p50 | < 100 ms | > 150 ms | > 250 ms |
| p95 | < 500 ms | > 500 ms | > 750 ms |
| p99 | < 1000 ms | > 1000 ms | > 1500 ms |

These targets apply to end-to-end pcw_decide() evaluation, measured by the aegis_decision_latency_seconds Prometheus histogram.
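End-to-end timing can be mirrored outside Prometheus to reason about the bucket math. Below is a minimal stdlib sketch — `pcw_decide_stub` and `LatencyHistogram` are hypothetical stand-ins, not AEGIS code — that records wall-clock latency into buckets matching the `aegis_decision_latency_seconds` layout in section 5.

```python
import bisect
import time

# Bucket upper bounds in seconds (section 5); Prometheus adds +Inf implicitly.
BUCKETS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]

class LatencyHistogram:
    """Tiny stand-in for the aegis_decision_latency_seconds histogram."""

    def __init__(self, buckets=BUCKETS):
        self.buckets = list(buckets)
        self.counts = [0] * (len(self.buckets) + 1)  # final slot is +Inf
        self.total = 0
        self.sum = 0.0

    def observe(self, seconds):
        # First bucket whose upper bound is >= the observation, matching
        # Prometheus "le" (less-than-or-equal) bucket semantics.
        self.counts[bisect.bisect_left(self.buckets, seconds)] += 1
        self.total += 1
        self.sum += seconds

def pcw_decide_stub():
    """Hypothetical stand-in for pcw_decide(): a little pure computation."""
    return sum(i * i for i in range(200))

hist = LatencyHistogram()
start = time.perf_counter()
pcw_decide_stub()
hist.observe(time.perf_counter() - start)
```

In a real deployment the `prometheus_client` histogram remains the source of truth; this only illustrates how observations land in buckets.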


2. Throughput Targets

| Metric | Target | Alert Threshold | Measurement Window |
|---|---|---|---|
| Minimum throughput | 100 evaluations/sec | < 80 eval/s | 60 seconds |
| Burst throughput | 500 evaluations/sec | N/A | 60 seconds |
| Error rate | < 0.1% | > 0.5% | 5 minutes |

Throughput is measured as rate(aegis_decision_latency_seconds_count[1m]).
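The PromQL above can be sanity-checked by hand: rate() is essentially the counter delta divided by the window. A stdlib sketch with made-up counter readings (ignoring counter resets, which rate() handles automatically):

```python
def counter_rate(count_start, count_end, window_seconds):
    """Per-second rate from two monotonic counter readings."""
    return (count_end - count_start) / window_seconds

# Hypothetical readings of aegis_decision_latency_seconds_count 60 s apart:
throughput = counter_rate(120_000, 126_300, 60)   # 105.0 eval/s
meets_minimum = throughput >= 100                  # minimum-throughput target
```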


3. Resource Baselines

| Resource | Target | Alert Threshold | Notes |
|---|---|---|---|
| CPU utilization | < 70% | > 80% | Per-process |
| Memory utilization | < 80% | > 85% | Per-process |
| Per-evaluation memory | ~2 MB peak | N/A | Working set |
| Startup time | < 5 seconds | N/A | Process initialization |

4. Component Latency Budget

Breakdown of per-evaluation latency budget:

| Component | Budget | Notes |
|---|---|---|
| Gate evaluation (6 gates) | < 50 ms | Pure computation: risk, profit, novelty, complexity, quality, utility |
| Bayesian posterior | < 20 ms | With scipy (engine optional group) |
| Utility calculation | < 10 ms | With scipy z-score computation |
| Telemetry emission | < 5 ms | Async, non-blocking via pipeline |
| Crypto signing | < 15 ms | Ed25519 (~0.5 ms); ML-DSA-44 adds ~2-5 ms; HSM adds variable latency |
| Total overhead | < 100 ms | Target p50 |

Crypto Latency by Algorithm

| Algorithm | Sign | Verify | Notes |
|---|---|---|---|
| Ed25519 | ~0.5 ms | ~0.5 ms | Software implementation |
| ML-DSA-44 | ~2-5 ms | ~1-3 ms | Post-quantum (liboqs) |
| Hybrid (Ed25519 + ML-DSA-44) | ~3-6 ms | ~2-4 ms | Combined |
| HSM Ed25519 | ~10-50 ms | ~10-50 ms | Hardware dependent |
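Per-algorithm figures like those above come from simple wall-clock micro-benchmarks. This sketch shows the harness shape only: it times an arbitrary callable and reports the median; the hash-based `stand_in_sign` is a placeholder, not a real signer — swap in an actual Ed25519 or ML-DSA sign call to compare against the table.

```python
import hashlib
import statistics
import time

def median_latency_ms(fn, iterations=200):
    """Median wall-clock latency of fn() in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        fn()
        samples.append((time.perf_counter_ns() - start) / 1e6)
    return statistics.median(samples)

message = b"decision-payload"
# Placeholder for a sign operation (no signing library assumed here).
stand_in_sign = lambda: hashlib.sha512(message).digest()
latency = median_latency_ms(stand_in_sign)
```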

5. Measurement Methodology

Prometheus Histogram Buckets

[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]

Corresponding to: 5 ms, 10 ms, 25 ms, 50 ms, 100 ms, 250 ms, 500 ms, 1000 ms, 2500 ms, 5000 ms.
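The percentile targets in section 1 are estimated from these buckets: Prometheus's histogram_quantile interpolates linearly inside the bucket where the target rank falls. A simplified sketch of that math, using per-bucket (non-cumulative) counts and hypothetical data:

```python
def histogram_quantile(q, upper_bounds, bucket_counts):
    """Estimate the q-quantile (0 < q < 1) from per-bucket counts.

    bucket_counts[i] counts observations in (previous bound, upper_bounds[i]].
    Ranks falling above the last bound are clamped to it, as Prometheus does.
    """
    total = sum(bucket_counts)
    rank = q * total
    cumulative = 0.0
    lower = 0.0
    for bound, count in zip(upper_bounds, bucket_counts):
        if count and cumulative + count >= rank:
            # Linear interpolation within the containing bucket.
            return lower + (bound - lower) * (rank - cumulative) / count
        cumulative += count
        lower = bound
    return upper_bounds[-1]

buckets = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
# Hypothetical distribution, heavily weighted toward fast decisions:
counts = [400, 300, 150, 80, 40, 20, 7, 2, 1, 0]
p50 = histogram_quantile(0.50, buckets, counts)
```

Note that the real histogram_quantile operates on cumulative `le` series; the interpolation is the same.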

Measurement Rules

  • Exclude cold starts: First 30 seconds after process start
  • Measure end-to-end: pcw_decide() entry to return
  • Prometheus metric: aegis_decision_latency_seconds (histogram)
  • Recording rule: aegis:p99_latency_5m (pre-computed)

Statistical Requirements

  • Minimum sample size: 1000 evaluations before reporting percentiles
  • Measurement window: Rolling 5-minute windows for alerting
  • Baseline: Establish on identical hardware with standardized input
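The minimum-sample-size rule above can be enforced directly in a reporting harness. A stdlib sketch with synthetic latencies (the 1000-sample floor is from this document; everything else is illustrative):

```python
import random
import statistics

MIN_SAMPLES = 1000  # per the statistical requirements above

def report_percentiles(samples_ms):
    """Return (p50, p95, p99) in ms, or None if the sample is too small."""
    if len(samples_ms) < MIN_SAMPLES:
        return None  # refuse to report percentiles on thin data
    # n=100 yields 99 cut points: index 49 = p50, 94 = p95, 98 = p99.
    q = statistics.quantiles(samples_ms, n=100)
    return q[49], q[94], q[98]

random.seed(7)
samples = [random.uniform(5, 90) for _ in range(1500)]  # synthetic latencies
p50, p95, p99 = report_percentiles(samples)
```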

6. Benchmark Baselines

Run benchmarks to establish local baselines:

pytest tests/benchmarks/ --benchmark-only -v

Benchmark Tests

| Test File | Functions | What It Measures |
|---|---|---|
| test_gate_benchmarks.py | Gate evaluation for each of the 6 gates | Individual gate latency |
| test_pcw_decide_benchmark.py | Full pcw_decide() evaluation | End-to-end decision latency |
| test_bayesian_benchmark.py | Bayesian posterior computation | Mathematical kernel performance |

Recorded Baselines

Environment: Python 3.12.11, macOS (Darwin), pytest-benchmark 5.2.3
Date: 2026-02-09
Note: Run pytest tests/benchmarks/ --benchmark-only on your target hardware to record environment-specific baselines. Results vary by CPU, Python version, and available dependencies.

Bayesian Computation

| Benchmark | Min | Median | Mean | Max | OPS (Kops/s) |
|---|---|---|---|---|---|
| test_posterior_calculation | 416 ns | 500 ns | 537 ns | 57.9 us | 1,862 |
| test_posterior_with_overrides | 433 ns | 510 ns | 511 ns | 4.6 us | 1,958 |
| test_posterior_predictive | 498 ns | 583 ns | 587 ns | 6.3 us | 1,705 |
| test_compute_full | 667 ns | 833 ns | 840 ns | 46.3 us | 1,191 |
| test_update_prior | 1.17 us | 1.42 us | 1.43 us | 49.0 us | 701 |

Gate Evaluation

| Benchmark | Min | Median | Mean | Max | OPS (Kops/s) |
|---|---|---|---|---|---|
| test_complexity_gate_evaluation | 354 ns | 410 ns | 421 ns | 74.2 us | 2,374 |
| test_quality_gate_evaluation | 500 ns | 625 ns | 650 ns | 30.9 us | 1,538 |
| test_utility_gate_evaluation | 625 ns | 792 ns | 811 ns | 98.2 us | 1,232 |
| test_novelty_gate_evaluation | 708 ns | 875 ns | 912 ns | 35.9 us | 1,096 |
| test_risk_gate_evaluation | 1.04 us | 1.25 us | 1.27 us | 63.2 us | 788 |
| test_profit_gate_evaluation | 1.54 us | 1.87 us | 1.88 us | 65.1 us | 533 |
| test_full_gate_pipeline | 7.04 us | 7.21 us | 7.97 us | 53.2 us | 125 |

End-to-End Decision (pcw_decide)

| Benchmark | Min | Median | Mean | Max | OPS (Kops/s) |
|---|---|---|---|---|---|
| test_pcw_decide_passing | 14.25 us | 15.04 us | 16.29 us | 133.9 us | 61.4 |
| test_pcw_decide_failing | 14.54 us | 16.63 us | 17.15 us | 1,217 us | 58.3 |
| test_pcw_decide_with_prometheus | 15.38 us | 17.54 us | 17.97 us | 379.5 us | 55.7 |

Summary

  • Full gate pipeline: ~7 us median (well within 50 ms budget)
  • End-to-end pcw_decide: ~15-18 us median (well within 100 ms p50 target)
  • Single-process throughput: ~55-61K ops/s (exceeds 100 eval/s target by orders of magnitude)
  • Bayesian posterior: ~500 ns median (negligible latency contribution)

7. Scaling Characteristics

| Configuration | Expected Throughput | Notes |
|---|---|---|
| Single process | ~100 eval/s | Baseline |
| Multi-worker (2 workers) | ~200 eval/s | Near-linear scaling |
| Multi-worker (4 workers) | ~350-400 eval/s | I/O contention begins |
| Multi-worker (8 workers) | ~500-600 eval/s | Diminishing returns |
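The near-linear-then-diminishing shape of the table above roughly matches a geometric efficiency model. This is an illustrative toy only — base_eval_s and efficiency are assumed parameters, not measured constants:

```python
def projected_throughput(workers, base_eval_s=100.0, efficiency=0.9):
    """Toy model: the n-th worker adds base_eval_s * efficiency**(n-1) eval/s."""
    return sum(base_eval_s * efficiency ** n for n in range(workers))

# With these assumed parameters:
#   1 worker  -> 100.0 eval/s
#   2 workers -> 190.0 eval/s
#   4 workers -> ~344 eval/s   (table: ~350-400)
#   8 workers -> ~570 eval/s   (table: ~500-600)
```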

Latency Impact by Feature

| Feature | Additional Latency | Notes |
|---|---|---|
| PostgreSQL persistence | +5-10 ms | Async write per evaluation |
| PII encryption (12 fields) | +1-3 ms | AES-256-GCM per field |
| ML-DSA-44 signing | +2-5 ms | vs Ed25519 ~0.5 ms |
| ML-KEM-768 encryption | +1-3 ms | Key encapsulation |
| Prometheus metrics | < 1 ms | Counter/histogram increment |

Bottlenecks

  1. CPU-bound: Gate evaluation and Bayesian computation (mitigate with multi-worker)
  2. I/O-bound: Database persistence and HSM communication (mitigate with async and connection pooling)
  3. Memory: Large batch evaluations (mitigate with streaming)

8. SLA Monitoring

Prometheus Alerting Rules

Defined in monitoring/prometheus/alerting-rules.yaml:

| Alert | Expression | For | Severity |
|---|---|---|---|
| AegisHighLatency | aegis:p99_latency_5m > 1.0 | 5m | Warning |
| AegisErrorRate | aegis:error_rate_5m > 0.05 | 5m | Warning |
| AegisHighGateFailRate | aegis:gate_pass_rate_5m < 0.5 | 10m | Warning |
| AegisOverrideSpike | Override rate > 0.1/min | 15m | Critical |
| AegisDriftCritical | KL divergence = critical | 5m | Critical |
| AegisOverrideStalePartial | Partial override stuck | 2h | Warning |
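The AegisHighLatency row corresponds to a standard Prometheus alerting rule. A sketch of how such a rule might look in monitoring/prometheus/alerting-rules.yaml — field names follow Prometheus conventions, but the group name, labels, and annotations here are illustrative and may differ from the real file:

```yaml
groups:
  - name: aegis-slas
    rules:
      - alert: AegisHighLatency
        expr: aegis:p99_latency_5m > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "AEGIS p99 decision latency above 1 s for 5 minutes"
```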

Grafana Dashboards

| Dashboard | File | Key Panels |
|---|---|---|
| AEGIS Overview | monitoring/grafana/overview-dashboard.json | Decision rate, gate pass rate, latency p50/p95/p99 |
| Risk Analysis | monitoring/grafana/risk-analysis-dashboard.json | KL divergence, Bayesian posteriors, override history |

Recording Rules

Pre-computed queries in monitoring/prometheus/recording-rules.yaml:

| Rule | Expression | Interval |
|---|---|---|
| aegis:gate_pass_rate_5m | Pass rate by gate | 30s |
| aegis:decision_rate_5m | Decision rate by status | 30s |
| aegis:p99_latency_5m | p99 latency by operation | 30s |
| aegis:override_rate_1h | Override rate by outcome | 30s |
| aegis:error_rate_5m | Error rate by component | 30s |

References