Version: 1.0.0 | Updated: 2026-02-09 | Status: Active
Performance targets, resource baselines, and measurement methodology for AEGIS production deployments. Latency targets are sourced from 002-performance-load-testing.md (authoritative).
1. Decision Latency Targets
Source: docs/implementation-plans/002-performance-load-testing.md section 2.2.
| Percentile | Target | Alert Threshold | Critical Threshold |
|---|---|---|---|
| p50 | < 100 ms | > 150 ms | > 250 ms |
| p95 | < 500 ms | > 500 ms | > 750 ms |
| p99 | < 1000 ms | > 1000 ms | > 1500 ms |
These targets apply to end-to-end pcw_decide() evaluation, measured by the aegis_decision_latency_seconds Prometheus histogram.
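The targets above can be checked programmatically. A minimal, stdlib-only sketch with synthetic samples (the nearest-rank helper is illustrative, not the AEGIS implementation; in production these percentiles come from the aegis_decision_latency_seconds histogram):

```python
# Sketch: checking recorded pcw_decide() latencies against the p50/p95/p99
# targets in the table above. Sample data is synthetic.
import random

TARGETS = {0.50: 0.100, 0.95: 0.500, 0.99: 1.000}  # seconds

def percentile(samples, q):
    """Nearest-rank percentile of a list of latencies in seconds."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[idx]

def check_latency_slo(samples):
    """Return {quantile: (observed_seconds, within_target)} per SLO row."""
    return {
        q: (percentile(samples, q), percentile(samples, q) < target)
        for q, target in TARGETS.items()
    }

random.seed(0)
samples = [random.uniform(0.005, 0.080) for _ in range(1000)]
report = check_latency_slo(samples)
assert all(ok for _, ok in report.values())
```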
2. Throughput Targets
| Metric | Target | Alert Threshold | Measurement Window |
|---|---|---|---|
| Minimum throughput | 100 evaluations/sec | < 80 eval/s | 60 seconds |
| Burst throughput | 500 evaluations/sec | N/A | 60 seconds |
| Error rate | < 0.1% | > 0.5% | 5 minutes |
Throughput is measured as rate(aegis_decision_latency_seconds_count[1m]).
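Conceptually, that PromQL expression is the per-second increase of a cumulative counter over the window. A simplified sketch (real PromQL rate() additionally handles counter resets and extrapolation; values below are illustrative):

```python
# Sketch of what rate(aegis_decision_latency_seconds_count[1m]) computes:
# the per-second growth of a monotonic counter across the sample window.
def counter_rate(samples):
    """samples: [(timestamp_seconds, counter_value), ...] within the window."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# Two scrapes 60 s apart: 6,600 new evaluations -> 110 eval/s.
window = [(0.0, 120_000.0), (60.0, 126_600.0)]
assert counter_rate(window) == 110.0
```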
3. Resource Baselines
| Resource | Target | Alert Threshold | Notes |
|---|---|---|---|
| CPU utilization | < 70% | > 80% | Per-process |
| Memory utilization | < 80% | > 85% | Per-process |
| Per-evaluation memory | ~2 MB peak | N/A | Working set |
| Startup time | < 5 seconds | N/A | Process initialization |
4. Component Latency Budget
Breakdown of per-evaluation latency budget:
| Component | Budget | Notes |
|---|---|---|
| Gate evaluation (6 gates) | < 50 ms | Pure computation: risk, profit, novelty, complexity, quality, utility |
| Bayesian posterior | < 20 ms | With scipy (engine optional group) |
| Utility calculation | < 10 ms | With scipy z-score computation |
| Telemetry emission | < 5 ms | Async, non-blocking via pipeline |
| Crypto signing | < 15 ms | Ed25519 (~0.5 ms); ML-DSA-44 adds ~2-5 ms; HSM adds variable latency |
| Total overhead | < 100 ms | Target p50 |
Crypto Latency by Algorithm
| Algorithm | Sign | Verify | Notes |
|---|---|---|---|
| Ed25519 | ~0.5 ms | ~0.5 ms | Software implementation |
| ML-DSA-44 | ~2-5 ms | ~1-3 ms | Post-quantum (liboqs) |
| Hybrid (Ed25519 + ML-DSA-44) | ~3-6 ms | ~2-4 ms | Combined |
| HSM Ed25519 | ~10-50 ms | ~10-50 ms | Hardware dependent |
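The Ed25519 row can be reproduced locally with a micro-benchmark. The sketch below uses the pyca "cryptography" package, which is an assumption; this document does not name the signing library AEGIS uses. ML-DSA-44 and HSM timings require liboqs or hardware and are not reproduced here.

```python
# Micro-benchmark sketch for software Ed25519 sign latency.
import time
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

key = Ed25519PrivateKey.generate()
message = b"decision-record-0001"

N = 1000
start = time.perf_counter()
for _ in range(N):
    signature = key.sign(message)
elapsed_ms = (time.perf_counter() - start) * 1000 / N
print(f"Ed25519 sign: {elapsed_ms:.3f} ms/op")

# Verification raises InvalidSignature on failure.
key.public_key().verify(signature, message)
```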
5. Measurement Methodology
Prometheus Histogram Buckets
[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]
Corresponding to: 5 ms, 10 ms, 25 ms, 50 ms, 100 ms, 250 ms, 500 ms, 1000 ms, 2500 ms, 5000 ms.
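Percentiles are estimated from these cumulative buckets the way Prometheus histogram_quantile() does: locate the bucket containing the target rank, then interpolate linearly within it. A sketch with illustrative cumulative counts (le -> count), matching the bucket list above:

```python
# Simplified histogram_quantile(): rank lookup plus linear interpolation
# inside the matching bucket.
BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0]

def histogram_quantile(q, cumulative_counts, bounds=BOUNDS):
    total = cumulative_counts[-1]
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in zip(bounds, cumulative_counts):
        if count >= rank:
            # Linear interpolation within the bucket, as Prometheus does.
            frac = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * frac
        prev_bound, prev_count = bound, count
    return bounds[-1]

counts = [100, 300, 700, 900, 960, 990, 998, 1000, 1000, 1000]
p50 = histogram_quantile(0.50, counts)  # lands in the 0.01-0.025 bucket
assert 0.01 <= p50 <= 0.025
```

Note the precision limit this implies: a reported p50 is only as precise as the bucket it falls in, which is why the bucket boundaries bracket the 100 ms target.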
Measurement Rules
- Exclude cold starts: ignore the first 30 seconds after process start
- Measure end-to-end: pcw_decide() entry to return
- Prometheus metric: aegis_decision_latency_seconds (histogram)
- Recording rule: aegis:p99_latency_5m (pre-computed)
Statistical Requirements
- Minimum sample size: 1000 evaluations before reporting percentiles
- Measurement window: Rolling 5-minute windows for alerting
- Baseline: Establish on identical hardware with standardized input
6. Benchmark Baselines
Run benchmarks to establish local baselines:
pytest tests/benchmarks/ --benchmark-only -v
Benchmark Tests
| Test File | Functions | What It Measures |
|---|---|---|
| test_gate_benchmarks.py | Gate evaluation for each of the 6 gates | Individual gate latency |
| test_pcw_decide_benchmark.py | Full pcw_decide() evaluation | End-to-end decision latency |
| test_bayesian_benchmark.py | Bayesian posterior computation | Mathematical kernel performance |
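For environments without pytest-benchmark, the same min/median/mean statistics can be collected with the stdlib. The sketch below times a stand-in gate function (hypothetical; the real gates live in the test files listed above):

```python
# Stdlib sketch of recording a local baseline in the style of
# pytest-benchmark: repeated timing, then summary statistics.
import statistics
import time

def complexity_gate(payload):
    # Stand-in for a real gate: pure computation over the input.
    return sum(payload) / len(payload) < 0.7

def record_baseline(fn, arg, rounds=2000):
    timings = []
    for _ in range(rounds):
        t0 = time.perf_counter()
        fn(arg)
        timings.append(time.perf_counter() - t0)
    return {
        "min_ns": min(timings) * 1e9,
        "median_ns": statistics.median(timings) * 1e9,
        "mean_ns": statistics.fmean(timings) * 1e9,
    }

baseline = record_baseline(complexity_gate, [0.2, 0.4, 0.6])
print(baseline)
```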
Recorded Baselines
Environment: Python 3.12.11, macOS (Darwin), pytest-benchmark 5.2.3
Date: 2026-02-09
Note: Run pytest tests/benchmarks/ --benchmark-only on your target hardware to record environment-specific baselines. Results vary by CPU, Python version, and available dependencies.
Bayesian Computation
| Benchmark | Min | Median | Mean | Max | OPS (Kops/s) |
|---|---|---|---|---|---|
| test_posterior_calculation | 416 ns | 500 ns | 537 ns | 57.9 us | 1,862 |
| test_posterior_with_overrides | 433 ns | 510 ns | 511 ns | 4.6 us | 1,958 |
| test_posterior_predictive | 498 ns | 583 ns | 587 ns | 6.3 us | 1,705 |
| test_compute_full | 667 ns | 833 ns | 840 ns | 46.3 us | 1,191 |
| test_update_prior | 1.17 us | 1.42 us | 1.43 us | 49.0 us | 701 |
Gate Evaluation
| Benchmark | Min | Median | Mean | Max | OPS (Kops/s) |
|---|---|---|---|---|---|
| test_complexity_gate_evaluation | 354 ns | 410 ns | 421 ns | 74.2 us | 2,374 |
| test_quality_gate_evaluation | 500 ns | 625 ns | 650 ns | 30.9 us | 1,538 |
| test_utility_gate_evaluation | 625 ns | 792 ns | 811 ns | 98.2 us | 1,232 |
| test_novelty_gate_evaluation | 708 ns | 875 ns | 912 ns | 35.9 us | 1,096 |
| test_risk_gate_evaluation | 1.04 us | 1.25 us | 1.27 us | 63.2 us | 788 |
| test_profit_gate_evaluation | 1.54 us | 1.87 us | 1.88 us | 65.1 us | 533 |
| test_full_gate_pipeline | 7.04 us | 7.21 us | 7.97 us | 53.2 us | 125 |
End-to-End Decision (pcw_decide)
| Benchmark | Min | Median | Mean | Max | OPS (Kops/s) |
|---|---|---|---|---|---|
| test_pcw_decide_passing | 14.25 us | 15.04 us | 16.29 us | 133.9 us | 61.4 |
| test_pcw_decide_failing | 14.54 us | 16.63 us | 17.15 us | 1,217 us | 58.3 |
| test_pcw_decide_with_prometheus | 15.38 us | 17.54 us | 17.97 us | 379.5 us | 55.7 |
Summary
- Full gate pipeline: ~7 us median (well within 50 ms budget)
- End-to-end pcw_decide: ~15-18 us median (well within 100 ms p50 target)
- Single-process throughput: ~55-61K ops/s (exceeds 100 eval/s target by orders of magnitude)
- Bayesian posterior: ~500 ns median (negligible latency contribution)
7. Scaling Characteristics
| Configuration | Expected Throughput | Notes |
|---|---|---|
| Single process | ~100 eval/s | Baseline |
| Multi-worker (2 workers) | ~200 eval/s | Near-linear scaling |
| Multi-worker (4 workers) | ~350-400 eval/s | I/O contention begins |
| Multi-worker (8 workers) | ~500-600 eval/s | Diminishing returns |
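The multi-worker pattern behind this table can be sketched with a process pool; near-linear scaling holds while workers stay CPU-bound, and I/O (persistence, HSM) erodes it. The evaluate() body below is a hypothetical stand-in for a full pcw_decide() call:

```python
# Sketch: spreading CPU-bound evaluations across worker processes.
from concurrent.futures import ProcessPoolExecutor

def evaluate(payload):
    # Stand-in for a full pcw_decide() evaluation (pure computation).
    return sum(i * i for i in range(payload)) % 97

def run_batch(payloads, workers):
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, payloads))

if __name__ == "__main__":
    # Guard required so worker processes can re-import this module safely.
    results = run_batch([10_000] * 8, workers=2)
    assert len(results) == 8
```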
Latency Impact by Feature
| Feature | Additional Latency | Notes |
|---|---|---|
| PostgreSQL persistence | +5-10 ms | Async write per evaluation |
| PII encryption (12 fields) | +1-3 ms | AES-256-GCM per field |
| ML-DSA-44 signing | +2-5 ms | vs Ed25519 ~0.5 ms |
| ML-KEM-768 encryption | +1-3 ms | Key encapsulation |
| Prometheus metrics | < 1 ms | Counter/histogram increment |
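The per-field AES-256-GCM cost comes from one AEAD operation per PII field. A sketch using the pyca "cryptography" AEAD API (an assumption; this document does not name the encryption library, and key management is elided here):

```python
# Sketch: per-field AES-256-GCM encryption, binding the field name as
# associated data so a ciphertext cannot be swapped between fields.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

def encrypt_field(value: str, field_name: str) -> bytes:
    nonce = os.urandom(12)  # unique per encryption; never reuse with a key
    ct = aead.encrypt(nonce, value.encode(), field_name.encode())
    return nonce + ct       # store the nonce alongside the ciphertext

def decrypt_field(blob: bytes, field_name: str) -> str:
    nonce, ct = blob[:12], blob[12:]
    return aead.decrypt(nonce, ct, field_name.encode()).decode()

blob = encrypt_field("alice@example.com", "email")
assert decrypt_field(blob, "email") == "alice@example.com"
```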
Bottlenecks
- CPU-bound: Gate evaluation and Bayesian computation (mitigate with multi-worker)
- I/O-bound: Database persistence and HSM communication (mitigate with async and connection pooling)
- Memory: Large batch evaluations (mitigate with streaming)
8. SLA Monitoring
Prometheus Alerting Rules
Defined in monitoring/prometheus/alerting-rules.yaml:
| Alert | Expression | For | Severity |
|---|---|---|---|
| AegisHighLatency | aegis:p99_latency_5m > 1.0 | 5m | Warning |
| AegisErrorRate | aegis:error_rate_5m > 0.05 | 5m | Warning |
| AegisHighGateFailRate | aegis:gate_pass_rate_5m < 0.5 | 10m | Warning |
| AegisOverrideSpike | Override rate > 0.1/min | 15m | Critical |
| AegisDriftCritical | KL divergence = critical | 5m | Critical |
| AegisOverrideStalePartial | Partial override stuck | 2h | Warning |
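The threshold side of the first three alerts can be modeled directly; the "For" duration means the breach must hold for the whole window, which the sketch below approximates by requiring every sample to breach. Thresholds follow the table above; sample values are illustrative:

```python
# Sketch: evaluating alert conditions against sampled recording-rule values.
import operator

ALERTS = {
    "AegisHighLatency":      ("aegis:p99_latency_5m",    operator.gt, 1.0),
    "AegisErrorRate":        ("aegis:error_rate_5m",     operator.gt, 0.05),
    "AegisHighGateFailRate": ("aegis:gate_pass_rate_5m", operator.lt, 0.5),
}

def firing(alert, window_samples):
    """True only if every sample in the window breaches the threshold."""
    _, op, threshold = ALERTS[alert]
    return all(op(v, threshold) for v in window_samples)

assert firing("AegisHighLatency", [1.2, 1.4, 1.1])
assert not firing("AegisErrorRate", [0.06, 0.01, 0.07])  # breach not sustained
```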
Grafana Dashboards
| Dashboard | File | Key Panels |
|---|---|---|
| AEGIS Overview | monitoring/grafana/overview-dashboard.json | Decision rate, gate pass rate, latency p50/p95/p99 |
| Risk Analysis | monitoring/grafana/risk-analysis-dashboard.json | KL divergence, Bayesian posteriors, override history |
Recording Rules
Pre-computed queries in monitoring/prometheus/recording-rules.yaml:
| Rule | Expression | Interval |
|---|---|---|
| aegis:gate_pass_rate_5m | Pass rate by gate | 30s |
| aegis:decision_rate_5m | Decision rate by status | 30s |
| aegis:p99_latency_5m | p99 latency by operation | 30s |
| aegis:override_rate_1h | Override rate by outcome | 30s |
| aegis:error_rate_5m | Error rate by component | 30s |
References