AEGIS: Autonomous Engineering Governance Integration System¶
Version: 1.1.0
Created: 2025-12-27
Updated: 2025-12-30
Status: Accepted (Post-Quantum Complete)
Release Tag: aegis-v1.0.0-pq-complete
Classification: Architecture Specification

Note (2026-02-07): This is a founding design specification, a point-in-time architectural blueprint that guided the AEGIS v1.0 implementation. The core architecture, interfaces, and mathematical formulas remain authoritative. For current implementation status, gap inventory, and evolution history, see gap-analysis.md, ROADMAP.md, and the changelog.
1. Executive Summary¶
AEGIS (Autonomous Engineering Governance Integration System) unifies five complementary frameworks into a cohesive autonomous engineering decision and execution engine. This specification defines the architecture, interfaces, and integration patterns that enable fully autonomous code evolution with quantitative governance controls.
1.1 Component Overview¶
| Component | Role | Version | Source |
|---|---|---|---|
| Guardrails | Risk & Invariant Layer | v1.1.1 | This repository |
| DOS | Policy-as-Code Engine | v2.1.0 | Universal Decision Operating System |
| Rubric | Mathematical Kernel | v2.1 | Universal Decision Rubric |
| LIBERTAS OPUS | Orchestration & Collaboration | v1.0 | github.com/ThermoclineLeviathan/libertas-pcw |
| AFA | Autonomous Execution Engine | v3-RO | Autonomous Function Amplifier |
1.2 Design Principles¶
- Defense in Depth: Multiple independent gates prevent unsafe changes
- Quantitative Governance: All decisions grounded in measurable metrics
- Graceful Degradation: System fails safe, never fails open
- Human-in-Loop Escape: Override mechanisms preserve human agency
- Continuous Calibration: Thresholds adapt to observed outcomes
2. Layered Architecture¶
┌─────────────────────────────────────────────────────────────────────────────────┐
│ AUTONOMOUS ENGINEERING GOVERNANCE SYSTEM │
│ (AEGIS) │
├─────────────────────────────────────────────────────────────────────────────────┤
│ LAYER 0: INVARIANTS │
│ ├─ Hard stops that MUST never be violated │
│ ├─ Security gates (SAST/DAST/SCA) │
│ ├─ Supply chain protection (SLSA Level 3+) │
│ └─ Source: Guardrails + AFA SecurityGate │
├─────────────────────────────────────────────────────────────────────────────────┤
│ LAYER 1: POLICY ENGINE │
│ ├─ Exchange rate definitions (profit, risk, complexity) │
│ ├─ Investment vs Refactoring path routing │
│ ├─ Three-point estimation aggregation │
│ └─ Source: DOS + Rubric v2.1 │
├─────────────────────────────────────────────────────────────────────────────────┤
│ LAYER 2: QUANTITATIVE GATES │
│ ├─ Bayesian confidence thresholds (P(Δ≥2|data) > 0.95) │
│ ├─ Complexity floor enforcement (C_norm ≥ 0.5) │
│ ├─ Quality score validation (≥ 0.7, no zero sub-scores) │
│ ├─ Novelty gate filtering (G(N) ≥ 0.8) │
│ └─ Source: Guardrails + Rubric v2.1 │
├─────────────────────────────────────────────────────────────────────────────────┤
│ LAYER 3: ORCHESTRATION │
│ ├─ Workflow definitions and state machines │
│ ├─ Actor assignment (AI, Human, Hybrid) │
│ ├─ Handoff protocols (review, escalation, override) │
│ ├─ Task decomposition and parallelization │
│ └─ Source: LIBERTAS OPUS │
├─────────────────────────────────────────────────────────────────────────────────┤
│ LAYER 4: EXECUTION │
│ ├─ Control Plane: RepoGuard, QualityMetrics, CostLimiter │
│ ├─ Data Plane: SecurityAnalyzer, PerformanceAnalyzer, OptimizationGenerator │
│ ├─ Autonomous enhancement cycles │
│ ├─ Anti-gaming and entropy analysis │
│ └─ Source: AFA v3-RO │
├─────────────────────────────────────────────────────────────────────────────────┤
│ LAYER 5: FEEDBACK │
│ ├─ KL divergence drift detection (τ_warn, τ_crit) │
│ ├─ Threshold recalibration loop │
│ ├─ Learning and evolution recording │
│ ├─ Performance telemetry aggregation │
│ └─ Source: All components (closed loop) │
└─────────────────────────────────────────────────────────────────────────────────┘
3. Component Specifications¶
3.1 Layer 0: Invariants (Guardrails + AFA)¶
Purpose: Hard stops that cannot be overridden by normal operation.
Invariant Classes¶
| Class | Description | Enforcement | Override |
|---|---|---|---|
| Security | No introduction of OWASP Top 10 vulnerabilities | SAST/DAST/SCA blocking | None (absolute) |
| Supply Chain | SLSA Level 3+ provenance | Signed builds, verified deps | None (absolute) |
| Data Integrity | No deletion of audit logs | Write-once storage | None (absolute) |
| Cryptographic | Two-key override requirement | Hybrid post-quantum signatures (Ed25519+ML-DSA-44) or BIP-322 | Dual approval only |
Security Gate Integration (AFA → Guardrails)¶
```python
class SecurityGate:
    """Unified security invariant enforcer."""

    def evaluate(self, proposal: Proposal) -> GateResult:
        # Layer 0 checks - no bypass possible
        sast_result = self.run_sast(proposal.diff)
        if sast_result.critical_findings > 0:
            return GateResult.HARD_BLOCK

        sca_result = self.run_sca(proposal.dependencies)
        if sca_result.known_vulnerabilities > 0:
            return GateResult.HARD_BLOCK

        slsa_result = self.verify_provenance(proposal)
        if not slsa_result.level_3_compliant:
            return GateResult.HARD_BLOCK

        return GateResult.PASS
```
3.2 Layer 1: Policy Engine (DOS + Rubric)¶
Purpose: Define exchange rates between value dimensions and route decisions.
Unified Utility Function¶
The core decision kernel from Rubric v2.1:
| Symbol | Description | Default | Source |
|---|---|---|---|
| ΔP_H | Change in high-confidence profit | measured | Guardrails |
| ΔV_L | Change in low-confidence value | measured | DOS |
| γ | Discount for uncertain value | 0.3 | Rubric |
| κ | Risk coefficient (when R < 0) | 1.0 | Rubric |
| ΔR | Change in risk exposure | measured | Guardrails |
| φ_S | Static complexity cost rate | $500/point* | Rubric |
| φ_D | Dynamic complexity cost rate | $10,000/point* | Rubric |
| ΔC_S | Change in static complexity | measured | DOS |
| ΔC_D | Change in dynamic complexity | measured | DOS |
| ΔOPEX | Change in operational cost | measured | DOS |

Note: *Specification values ($500/$10,000) represent theoretical maximums. Implementation defaults in schema/interface-contract.yaml use $100/$2,000 for calibrated production use. See the frozen parameters section of README.md.
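To make the gap between the two rate sets concrete, the sketch below computes a complexity tax under both. It assumes the tax takes the form φ_S·C_S + φ_D·C_D, as listed for the `tax` field of the L1 → L2 interface in Section 4.1. The function name, dictionary keys, and the sample complexity values are hypothetical and not part of any AEGIS component.

```python
# Illustrative only: names and sample inputs are assumptions, not AEGIS APIs.
SPEC_RATES = {"phi_s": 500.0, "phi_d": 10_000.0}  # specification maximums ($/point)
IMPL_RATES = {"phi_s": 100.0, "phi_d": 2_000.0}   # implementation defaults ($/point)


def complexity_tax(c_static: float, c_dynamic: float, rates: dict) -> float:
    """Return phi_S * C_S + phi_D * C_D in dollars."""
    return rates["phi_s"] * c_static + rates["phi_d"] * c_dynamic


# Hypothetical proposal: 2.0 static points, 0.5 dynamic points.
spec_tax = complexity_tax(2.0, 0.5, SPEC_RATES)
impl_tax = complexity_tax(2.0, 0.5, IMPL_RATES)
print(f"spec: ${spec_tax:,.0f}  impl: ${impl_tax:,.0f}")
```

Under these sample inputs the specification rates charge five times what the implementation defaults do, because both rates are scaled by the same factor.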
Decision Paths¶
```text
               ┌─────────────────────┐
               │   Proposal Intake   │
               └──────────┬──────────┘
                          │
               ┌──────────▼──────────┐
               │  Three-Point Est.   │
               │ E = (a + 4m + b)/6  │
               └──────────┬──────────┘
                          │
         ┌────────────────┴────────────────┐
         │                                 │
┌────────▼────────┐               ┌────────▼────────┐
│   INVESTMENT    │               │   REFACTORING   │
│      PATH       │               │      PATH       │
│    (ΔC < 0)     │               │    (ΔC ≥ 0)     │
└────────┬────────┘               └────────┬────────┘
         │                                 │
┌────────▼────────┐               ┌────────▼────────┐
│  Full utility   │               │  Relaxed gates  │
│  calculation    │               │ MIGRATION_BUDGET│
│  LCB(U) > θ     │               │    = $5,000     │
└────────┬────────┘               └────────┬────────┘
         │                                 │
         └────────────────┬────────────────┘
                          │
               ┌──────────▼──────────┐
               │   Gate Evaluation   │
               └─────────────────────┘
```
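The three-point estimation step can be sketched as follows. The expected value uses the formula shown in the diagram, and the variance uses the ((b - a) / 6)^2 form given for `three_point_estimate` in Section 4.1. Mapping a to the best case and b to the worst case is an assumption here; the result does not depend on it, since the mean is symmetric in a and b and the difference is squared. The function name and sample values are hypothetical.

```python
def pert_estimate(best: float, likely: float, worst: float) -> tuple[float, float]:
    """Return (expected, variance) for a three-point (PERT) estimate."""
    expected = (best + 4.0 * likely + worst) / 6.0
    variance = ((worst - best) / 6.0) ** 2
    return expected, variance


# Hypothetical estimate: best 2.0, most likely 5.0, worst 14.0.
expected, variance = pert_estimate(best=2.0, likely=5.0, worst=14.0)
print(expected, variance)
```

The most-likely value carries four times the weight of either extreme, so a long tail on one side shifts the expected value only moderately while widening the variance.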
3.3 Layer 2: Quantitative Gates (Guardrails + Rubric)¶
Purpose: Statistical validation of proposal safety.
Gate Definitions¶
| Gate | Formula | Threshold | Confidence |
|---|---|---|---|
| Risk | ΔRisk_norm = (R_prop - R_base) / max(R_base, ε_R) | ≤ 2.0× | P(Δ≥2\|data) > 0.95 |
| Profit | ΔProfit_norm = (P_prop - P_base) / max(P_base, ε_P) | ≤ 2.0× | P(Δ≥2\|data) > 0.95 |
| Novelty | G(N) = 1 / (1 + e^{-k(N - N_0)}) | G(N) ≥ 0.8 | Logistic function |
| Complexity | C_norm = C_S + C_D | ≥ 0.5 | Hard floor |
| Quality | Q = Σ(w_i · q_i) | ≥ 0.7 | No zero sub-scores |
| Utility | LCB(U) = μ_U - z_α · σ_U | LCB(U) > θ | Lower confidence bound |
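Three of the gate formulas above can be sketched as small functions. The table does not give values for k, N_0, z_α, or the ε floors, so every number passed in below is an assumption chosen only to make the arithmetic visible. The function names are hypothetical.

```python
import math


def normalized_delta(proposed: float, baseline: float, epsilon: float) -> float:
    """(X_prop - X_base) / max(X_base, epsilon), as in the Risk and Profit rows."""
    return (proposed - baseline) / max(baseline, epsilon)


def novelty_gate(n: float, k: float, n0: float) -> float:
    """Logistic gate G(N) = 1 / (1 + e^{-k (N - N_0)})."""
    return 1.0 / (1.0 + math.exp(-k * (n - n0)))


def utility_lcb(mean: float, std: float, z_alpha: float) -> float:
    """Lower confidence bound LCB(U) = mu_U - z_alpha * sigma_U."""
    return mean - z_alpha * std


# Assumed parameters: k = 10, N_0 = 0.5, z_alpha = 1.645 (one-sided 95%).
print(normalized_delta(150.0, 100.0, 1e-6))   # relative change against baseline
print(novelty_gate(0.7, k=10.0, n0=0.5))      # clears the 0.8 gate under these values
print(novelty_gate(0.6, k=10.0, n0=0.5))      # falls short of 0.8 under these values
print(utility_lcb(1000.0, 200.0, 1.645))      # compared against theta by the gate
```

For the logistic gate, G(N) ≥ 0.8 is equivalent to k(N - N_0) ≥ ln 4, so the choice of k directly sets how far above N_0 a proposal must score.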
Bayesian Posterior Computation¶
```python
import math

import scipy.stats


def compute_posterior_probability(
    delta_observed: float,
    prior_mean: float,
    prior_variance: float,
    likelihood_variance: float,
) -> float:
    """Compute P(Δ ≥ 2.0 | observed data) using Bayesian updating."""
    # Posterior parameters (normal prior combined with a normal likelihood)
    posterior_precision = (1 / prior_variance) + (1 / likelihood_variance)
    posterior_variance = 1 / posterior_precision
    posterior_mean = posterior_variance * (
        (prior_mean / prior_variance)
        + (delta_observed / likelihood_variance)
    )

    # Probability that true delta exceeds threshold
    z = (2.0 - posterior_mean) / math.sqrt(posterior_variance)
    return float(1 - scipy.stats.norm.cdf(z))
```
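A worked example helps check the update by hand. The sketch below restates the same normal-normal update using only the standard library (`statistics.NormalDist`), so it runs without SciPy. The function name, the `threshold` parameter, and all input values are assumptions for illustration.

```python
import math
from statistics import NormalDist


def posterior_exceedance(
    delta_observed: float,
    prior_mean: float,
    prior_variance: float,
    likelihood_variance: float,
    threshold: float = 2.0,
) -> float:
    """P(delta >= threshold | data) under a normal prior and normal likelihood."""
    precision = 1.0 / prior_variance + 1.0 / likelihood_variance
    post_var = 1.0 / precision
    post_mean = post_var * (
        prior_mean / prior_variance + delta_observed / likelihood_variance
    )
    return 1.0 - NormalDist(post_mean, math.sqrt(post_var)).cdf(threshold)


# Equal variances: the posterior mean is the midpoint of prior (1.0) and
# observation (3.0), which sits exactly on the threshold.
p_even = posterior_exceedance(3.0, prior_mean=1.0, prior_variance=1.0,
                              likelihood_variance=1.0)

# A precise observation well above the threshold dominates the prior.
p_high = posterior_exceedance(4.0, prior_mean=2.0, prior_variance=1.0,
                              likelihood_variance=0.25)
print(p_even, p_high)
```

In the first case the posterior is centred on 2.0, so the exceedance probability is one half. In the second the posterior mean is 3.6 with variance 0.2, and the probability exceeds the 0.95 level named in the gate table.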
3.4 Layer 3: Orchestration (LIBERTAS OPUS)¶
Purpose: Coordinate workflows, actors, and handoffs.
Actor Types (Extended)¶
| Type | Description | Capabilities | Trust Level |
|---|---|---|---|
| AI | Fully autonomous agent | All automated tasks | Computed |
| HUMAN | Human operator | Review, override, approve | High |
| HYBRID | AI with human oversight | Propose + validate | Computed |
| GOVERNANCE | Risk/Security leads | Override approval | Highest |
| CALIBRATOR | Statistical reviewer | Threshold tuning | High |
Core Workflow: Proposal Evaluation¶
```yaml
workflow: aegis_proposal_evaluation
version: "1.0"
description: "End-to-end proposal evaluation through AEGIS gates"

actors:
  - id: ai_analyzer
    type: AI
    capabilities: [analyze, score, recommend]
  - id: human_reviewer
    type: HUMAN
    capabilities: [review, approve, reject, override]
  - id: governance_pair
    type: GOVERNANCE
    capabilities: [two_key_override]

tasks:
  - id: layer0_security
    name: "Security Invariant Check"
    actor: ai_analyzer
    type: gate
    inputs: [proposal]
    outputs: [security_result]
    on_failure: terminate  # Hard stop

  - id: layer1_policy
    name: "Policy Evaluation"
    actor: ai_analyzer
    type: computation
    inputs: [proposal, security_result]
    outputs: [utility_score, decision_path]
    depends_on: [layer0_security]

  - id: layer2_gates
    name: "Quantitative Gate Evaluation"
    actor: ai_analyzer
    type: gate
    inputs: [proposal, utility_score]
    outputs: [gate_results, confidence_scores]
    depends_on: [layer1_policy]

  - id: decision_routing
    name: "Route to Approval Path"
    type: decision
    inputs: [gate_results, confidence_scores]
    branches:
      - condition: "all_gates_pass AND confidence >= 0.95"
        next: auto_approve
      - condition: "any_gate_fail AND is_override_eligible"
        next: human_review
      - condition: "any_gate_fail AND NOT is_override_eligible"
        next: auto_reject
    depends_on: [layer2_gates]

  - id: human_review
    name: "Human Review"
    actor: human_reviewer
    type: review
    handoff: AI_TO_HUMAN_REVIEW
    inputs: [proposal, gate_results, utility_score]
    outputs: [review_decision]

  - id: two_key_override
    name: "Two-Key Override"
    actor: governance_pair
    type: override
    handoff: TWO_KEY_OVERRIDE
    inputs: [proposal, review_decision]
    outputs: [override_decision]
    requires:
      - dual_signature: true
      - signature_format: "Hybrid-PQ (Ed25519+ML-DSA-44) or BIP-322"
      - roles: [risk_lead, security_lead]
      - pq_encryption: "ML-KEM-768 for key storage"

  - id: auto_approve
    name: "Automatic Approval"
    actor: ai_analyzer
    type: terminal
    outputs: [approved_proposal]

  - id: auto_reject
    name: "Automatic Rejection"
    actor: ai_analyzer
    type: terminal
    outputs: [rejection_reason]
```
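The three `decision_routing` branches can be read as a small routing function. The sketch below is illustrative only; the function name and signature are hypothetical. One case is not covered by the listed conditions: every gate passes but confidence is below 0.95. The sketch routes that case to human review as an assumption, in keeping with the fail-safe design principle, and the workflow definition itself does not say so.

```python
def route_decision(all_gates_pass: bool, confidence: float,
                   override_eligible: bool) -> str:
    """Return the id of the next task for a proposal."""
    if all_gates_pass and confidence >= 0.95:
        return "auto_approve"
    if not all_gates_pass and override_eligible:
        return "human_review"
    if not all_gates_pass:
        return "auto_reject"
    # Assumption: gates passed but confidence < 0.95 is not covered by the
    # listed branches; escalate to a human rather than approve or reject.
    return "human_review"


print(route_decision(True, 0.97, override_eligible=False))
print(route_decision(False, 0.97, override_eligible=True))
print(route_decision(False, 0.97, override_eligible=False))
print(route_decision(True, 0.80, override_eligible=False))
```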
Handoff Protocol: Two-Key Override¶
```python
from datetime import datetime, timezone


class TwoKeyOverrideProtocol(HandoffProtocol):
    """
    Dual-signature override mechanism with post-quantum cryptography.

    Supports:
    - Hybrid Ed25519 + ML-DSA-44 signatures (FIPS 204) - GAP-Q1
    - Legacy BIP-322 Schnorr signatures - GAP-M4
    - ML-KEM-768 encrypted key storage (FIPS 203) - GAP-Q2
    """

    protocol_type = HandoffProtocolType.TWO_KEY_OVERRIDE

    def validate_handoff(self, context: HandoffContext) -> bool:
        # Require exactly 2 governance actors
        if len(context.approvers) != 2:
            return False

        # Verify distinct roles: one risk lead and one security lead
        roles = {a.role for a in context.approvers}
        required = {"risk_lead", "security_lead"}
        if roles != required:
            return False

        # Verify signatures (hybrid PQ or BIP-322)
        for approver in context.approvers:
            if not self.verify_signature(
                approver.signature,
                context.proposal_hash,
                approver.public_key,
                algorithm=approver.signature_algorithm,  # "hybrid-pq" or "bip322"
            ):
                return False

        return True

    def execute_handoff(self, context: HandoffContext) -> HandoffResult:
        # Immutable audit entry; timestamp is timezone-aware UTC
        # (datetime.utcnow() is deprecated since Python 3.12)
        audit_entry = AuditEntry(
            event_type="TWO_KEY_OVERRIDE",
            proposal_id=context.proposal_id,
            approvers=[a.id for a in context.approvers],
            signatures=[a.signature for a in context.approvers],
            timestamp=datetime.now(timezone.utc).isoformat(),
            rationale=context.override_rationale,
        )
        self.audit_store.append_immutable(audit_entry)

        return HandoffResult(
            success=True,
            new_owner=context.execution_actor,
            audit_id=audit_entry.id,
        )
```
3.5 Layer 4: Execution (AFA v3-RO)¶
Purpose: Autonomous code analysis and evolution.
Control Plane Architecture¶
┌──────────────────────────────────────────────────────────────────┐
│ AFA CONTROL PLANE │
├──────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ RepoGuard │ │ Quality │ │ Cost │ │
│ │ (Gating) │ │ Metrics │ │ Limiter │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Security │ │
│ │ Gate │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Learning │ │
│ │ Feedback │ │
│ └─────────────┘ │
└──────────────────────────────────────────────────────────────────┘
│
│ Gate Decisions
▼
┌──────────────────────────────────────────────────────────────────┐
│ AFA DATA PLANE │
├──────────────────────────────────────────────────────────────────┤
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Security │ │ Performance │ │ Maintain- │ │
│ │ Analyzer │ │ Analyzer │ │ ability │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ ┌─────────────┐ ┌─────────────┐ │
│ │ Documen- │ │ Optimi- │ │
│ │ tation │ │ zation Gen │ │
│ └─────────────┘ └─────────────┘ │
└──────────────────────────────────────────────────────────────────┘
pcw_decide() Integration with LIBERTAS¶
The core AFA decision function maps to LIBERTAS workflows:
```python
from datetime import datetime, timezone
from typing import List


async def pcw_decide(
    candidates: List[CodeProposal],
    context: AEGISContext,
) -> DecisionResult:
    """
    AFA's primary decision function integrated with LIBERTAS orchestration.

    This function bridges the AFA Data Plane (candidate generation) with
    LIBERTAS workflow orchestration for evaluation and approval.
    """
    # Create LIBERTAS workflow context
    workflow_context = WorkflowContext(
        workflow_id="aegis_proposal_evaluation",
        initial_data={
            "candidates": candidates,
            "context": context,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        },
    )

    # Initialize workflow engine with AEGIS configuration
    engine = WorkflowEngine(
        workflow=load_workflow("aegis_proposal_evaluation"),
        context=workflow_context,
    )

    # Execute through all layers
    result = await engine.run()

    # Extract decision: approve only on a completed workflow with an approved proposal
    if result.status == WorkflowStatus.COMPLETED:
        approved = result.outputs.get("approved_proposal")
        if approved:
            return DecisionResult(
                decision=Decision.APPROVE,
                proposal=approved,
                confidence=result.outputs.get("confidence_score"),
                audit_trail=result.audit_entries,
            )

    # Any other outcome is a rejection (fail safe)
    return DecisionResult(
        decision=Decision.REJECT,
        reason=result.outputs.get("rejection_reason"),
        audit_trail=result.audit_entries,
    )
```
3.6 Layer 5: Feedback (Closed Loop)¶
Purpose: Continuous calibration and drift detection.
KL Divergence Monitoring¶
```python
import numpy as np


class DriftMonitor:
    """Monitors distribution drift using KL divergence."""

    def __init__(
        self,
        tau_warning: float = 0.3,   # P90 of calibration data
        tau_critical: float = 0.5,  # P99 of calibration data
    ):
        self.tau_warning = tau_warning
        self.tau_critical = tau_critical
        self.baseline_distribution = None  # must be set before evaluate()

    def compute_kl_divergence(
        self,
        current: np.ndarray,
        baseline: np.ndarray,
    ) -> float:
        """D_KL(current || baseline)"""
        # Add small epsilon to prevent log(0)
        epsilon = 1e-10
        current = current + epsilon
        baseline = baseline + epsilon

        # Normalize to probability distributions
        current = current / current.sum()
        baseline = baseline / baseline.sum()

        return float(np.sum(current * np.log(current / baseline)))

    def evaluate(self, current_window: np.ndarray) -> DriftResult:
        if self.baseline_distribution is None:
            raise ValueError("baseline_distribution must be set before evaluate()")

        kl = self.compute_kl_divergence(current_window, self.baseline_distribution)

        if kl >= self.tau_critical:
            return DriftResult(
                status=DriftStatus.CRITICAL,
                kl_divergence=kl,
                action=DriftAction.HALT_AND_RECALIBRATE,
            )
        elif kl >= self.tau_warning:
            return DriftResult(
                status=DriftStatus.WARNING,
                kl_divergence=kl,
                action=DriftAction.ALERT_AND_CONTINUE,
            )
        else:
            return DriftResult(
                status=DriftStatus.NORMAL,
                kl_divergence=kl,
                action=DriftAction.CONTINUE,
            )
```
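To see how the default thresholds (0.3 and 0.5) behave, the following standard-library sketch computes D_KL(current || baseline) on two-bin histograms. It mirrors the epsilon smoothing and normalization described above but is not the monitor itself. The function names and the histogram counts are hypothetical.

```python
import math


def kl_divergence(current, baseline, epsilon=1e-10):
    """D_KL(current || baseline) over histogram counts, with epsilon smoothing."""
    cur = [c + epsilon for c in current]
    base = [b + epsilon for b in baseline]
    cur_total, base_total = sum(cur), sum(base)
    return sum(
        (c / cur_total) * math.log((c / cur_total) / (b / base_total))
        for c, b in zip(cur, base)
    )


def classify_drift(kl, tau_warning=0.3, tau_critical=0.5):
    """Map a KL value onto the NORMAL / WARNING / CRITICAL bands."""
    if kl >= tau_critical:
        return "CRITICAL"
    if kl >= tau_warning:
        return "WARNING"
    return "NORMAL"


baseline = [90, 10]  # hypothetical baseline: 90% of decisions in bin 0
for window in ([90, 10], [75, 25], [60, 40], [50, 50]):
    kl = kl_divergence(window, baseline)
    print(window, round(kl, 4), classify_drift(kl))
```

With this baseline, a shift to 75/25 stays in the normal band, 60/40 crosses the warning threshold, and 50/50 crosses the critical threshold. KL divergence is asymmetric, so swapping the arguments would give different values.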
Recalibration Workflow¶
```yaml
workflow: aegis_recalibration
version: "1.0"
description: "Threshold recalibration when drift detected"
trigger: drift_status == CRITICAL

actors:
  - id: calibration_agent
    type: AI
    capabilities: [analyze, compute]
  - id: calibrator
    type: CALIBRATOR
    capabilities: [review, approve]

tasks:
  - id: collect_telemetry
    name: "Collect 30-Day Telemetry Window"
    actor: calibration_agent
    outputs: [telemetry_window]

  - id: compute_new_thresholds
    name: "Compute Proposed Thresholds"
    actor: calibration_agent
    inputs: [telemetry_window]
    outputs: [proposed_thresholds, statistical_justification]

  - id: human_review
    name: "Calibrator Review"
    actor: calibrator
    handoff: AI_TO_HUMAN_REVIEW
    inputs: [proposed_thresholds, statistical_justification]
    outputs: [approved_thresholds]

  - id: deploy_thresholds
    name: "Deploy New Thresholds"
    actor: calibration_agent
    inputs: [approved_thresholds]
    requires:
      - parameter_freeze_tag: true
      - git_tag_format: "aegis-v{VERSION}-recal"
```
4. Interface Contracts¶
4.1 Layer Boundary Interfaces¶
L0 → L1: Security Result¶
```typescript
interface SecurityResult {
  status: "PASS" | "HARD_BLOCK";
  sast_findings: Finding[];
  sca_findings: Finding[];
  slsa_level: number;
  timestamp: string;
  hash: string; // SHA-256 of result for audit
}
```
L1 → L2: Policy Decision¶
```typescript
interface PolicyDecision {
  utility_score: number;
  utility_variance: number;
  decision_path: "INVESTMENT" | "REFACTORING";
  three_point_estimate: {
    best: number;
    likely: number;
    worst: number;
    expected: number; // PERT: (a + 4m + b) / 6
    variance: number; // ((b - a) / 6)^2
  };
  complexity_breakdown: {
    static: number;  // C_S
    dynamic: number; // C_D
    tax: number;     // φ_S·C_S + φ_D·C_D
  };
}
```
L2 → L3: Gate Results¶
```typescript
interface GateResults {
  all_passed: boolean;
  gates: {
    risk: GateResult;
    profit: GateResult;
    novelty: GateResult;
    complexity: GateResult;
    quality: GateResult;
    utility: GateResult;
  };
  override_eligible: boolean;
  confidence: number; // Minimum confidence across gates
}

interface GateResult {
  passed: boolean;
  value: number;
  threshold: number;
  confidence: number;
  posterior_probability?: number; // For Bayesian gates
}
```
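The relationship between the per-gate results and the two summary fields (`all_passed` and the minimum `confidence`) can be sketched in Python. The dataclass mirrors the `GateResult` fields above. The `summarize_gates` helper, its `failed` output, and the sample values are hypothetical additions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    passed: bool
    value: float
    threshold: float
    confidence: float


def summarize_gates(gates: dict[str, GateResult]) -> dict:
    """Derive all_passed and the minimum confidence across gates."""
    return {
        "all_passed": all(g.passed for g in gates.values()),
        "confidence": min(g.confidence for g in gates.values()),
        # Hypothetical convenience field: names of failing gates.
        "failed": sorted(name for name, g in gates.items() if not g.passed),
    }


# Sample values are made up for illustration.
summary = summarize_gates({
    "risk": GateResult(passed=True, value=0.4, threshold=2.0, confidence=0.98),
    "quality": GateResult(passed=True, value=0.82, threshold=0.7, confidence=0.96),
    "novelty": GateResult(passed=False, value=0.73, threshold=0.8, confidence=0.99),
})
print(summary)
```

Taking the minimum means a single low-confidence gate caps the overall confidence, which is the conservative choice for an auto-approval threshold.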
L3 → L4: Execution Directive¶
```typescript
interface ExecutionDirective {
  action: "EXECUTE" | "HOLD" | "REJECT";
  proposal_id: string;
  approved_by: ApprovalPath;
  constraints: ExecutionConstraints;
  audit_chain: AuditEntry[];
}

type ApprovalPath =
  | { type: "AUTO"; confidence: number }
  | { type: "HUMAN_REVIEW"; reviewer_id: string }
  | { type: "TWO_KEY_OVERRIDE"; approvers: [string, string]; signatures: [string, string] };
```
4.2 Telemetry Schema (Extended)¶
All proposals MUST emit the following telemetry:
```yaml
telemetry_schema:
  version: "2.0"

  # Core identification
  proposal_id: string              # UUID
  timestamp: string                # ISO-8601
  param_snapshot_id: string        # e.g., "aegis-v1.0-freeze"

  # Layer 0: Invariants
  security_gate_status: string     # PASS | HARD_BLOCK
  sast_finding_count: integer
  sca_vulnerability_count: integer
  slsa_level: integer

  # Layer 1: Policy
  decision_path: string            # INVESTMENT | REFACTORING
  utility_raw: float
  utility_lcb: float
  three_point_best: float
  three_point_likely: float
  three_point_worst: float
  three_point_expected: float
  complexity_static: float
  complexity_dynamic: float
  complexity_tax_dollars: float

  # Layer 2: Gates
  risk_score: float
  risk_posterior: float
  profit_score: float
  profit_posterior: float
  novelty_score: float
  novelty_gate_value: float
  complexity_floor_met: boolean
  quality_score: float
  quality_subscore_min: float

  # Layer 3: Orchestration
  workflow_id: string
  actor_assignments: object
  handoff_count: integer
  approval_path: string            # AUTO | HUMAN_REVIEW | TWO_KEY_OVERRIDE

  # Layer 4: Execution
  execution_status: string
  changes_applied: integer
  lines_modified: integer
  tests_passed: boolean

  # Layer 5: Feedback
  kl_divergence: float
  drift_status: string             # NORMAL | WARNING | CRITICAL
  baseline_feed_hash: string

  # Audit
  human_decision: string           # Actual outcome for calibration
  override_rationale: string       # If override used
  audit_chain_hash: string         # Merkle root of audit entries
```
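The `audit_chain_hash` field is described as a Merkle root of audit entries. One way such a root could be computed is sketched below. The schema does not specify the hash function, leaf encoding, or odd-node handling, so the choices here are assumptions: SHA-256, raw-byte leaves, and duplication of the last node on odd levels. The function name is hypothetical, and a production design would likely add domain separation between leaf and interior hashes.

```python
import hashlib


def merkle_root(entries: list[bytes]) -> str:
    """Return the hex Merkle root over serialized audit entries."""
    if not entries:
        raise ValueError("at least one audit entry is required")
    # Leaf level: hash each serialized entry.
    level = [hashlib.sha256(e).digest() for e in entries]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate last node when odd
        # Parent = SHA-256 of the concatenated pair of child digests.
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


entries = [b"layer0:PASS", b"layer2:all_passed", b"approval:AUTO"]
root = merkle_root(entries)
tampered = merkle_root([b"layer0:PASS", b"layer2:all_passed", b"approval:HUMAN"])
print(root != tampered)
```

Any change to an entry, or to the order of entries, yields a different root, which is what lets a single stored hash attest to the whole audit chain.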
5. Integration Patterns¶
5.1 Synchronous Decision Flow¶
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ AFA │ │ Security│ │ DOS │ │Guardrail│ │LIBERTAS │
│ Data │───▶│ Gate │───▶│ Policy │───▶│ Gates │───▶│ Workflow│
│ Plane │ │ (L0) │ │ (L1) │ │ (L2) │ │ (L3) │
└─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘
│ │
│ │
▼ ▼
┌─────────┐ ┌─────────┐
│Candidate│ │ Execute │
│Proposals│ │ or │
│ │ │ Reject │
└─────────┘ └─────────┘
5.2 Async Feedback Loop¶
┌─────────────────────────────────────────────────────────────────────┐
│ CONTINUOUS FEEDBACK │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Telemetry │ │ Drift │ │Recalibrate│ │
│ │Collection│─────▶│Detection │─────▶│ Workflow │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │ │ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ │
│ │ Baseline │◀───────────────────────│ Deploy │ │
│ │ Update │ │ New Tags │ │
│ └──────────┘ └──────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────┘
6. Gap Analysis¶
Based on comprehensive analysis, the following gaps require bridging:
6.1 Critical Gaps¶
| ID | Gap | Impact | Bridging Solution |
|---|---|---|---|
| GAP-C1 | Decision Logic Divergence | Guardrails uses Bayesian posterior; Rubric uses LCB | Implement dual validation: compute both, require both to pass |
| GAP-C2 | Override Mechanism Incompatibility | AFA lacks two-key override; LIBERTAS needs extension | Extend LIBERTAS HandoffProtocol with BIP-322 implementation |
6.2 High Priority Gaps¶
| ID | Gap | Impact | Bridging Solution |
|---|---|---|---|
| GAP-H1 | Parameter Naming | Inconsistent names across components | Create unified parameter registry in /schema/ |
| GAP-H2 | Telemetry Schema Extension | AFA telemetry doesn't include all fields | Extend AFA telemetry to AEGIS schema |
| GAP-H3 | RBAC Model Reconciliation | Different role definitions | Map roles to unified hierarchy |
6.3 Medium Priority Gaps¶
| ID | Gap | Impact | Bridging Solution |
|---|---|---|---|
| GAP-M1 | Feedback Loop Timing | Different calibration cadences | Standardize to 30-day rolling window |
| GAP-M2 | Actor Type Extension | LIBERTAS needs GOVERNANCE, CALIBRATOR | Extend ActorType enum |
| GAP-M3 | Workflow State Persistence | Need durable workflow state | Add persistence layer to LIBERTAS |
6.4 Bridging Implementation Priority¶
Phase 1 (Immediate):
├── GAP-C1: Dual validation logic
├── GAP-C2: Two-key override in LIBERTAS
└── GAP-H1: Unified parameter registry
Phase 2 (Short-term):
├── GAP-H2: Telemetry schema extension
├── GAP-H3: RBAC reconciliation
└── GAP-M2: Actor type extension
Phase 3 (Medium-term):
├── GAP-M1: Feedback loop standardization
└── GAP-M3: Workflow persistence
7. Deployment Architecture¶
7.1 Infrastructure Topology¶
┌──────────────────────────────────────────────────────────────────┐
│ AWS DEPLOYMENT │
├──────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ VPC: aegis-prod │ │
│ │ ┌───────────────────────────────────────────────────────┐ │ │
│ │ │ Private Subnets │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │
│ │ │ │ ECS: │ │ ECS: │ │ ECS: │ │ │ │
│ │ │ │ Guardrail│ │ LIBERTAS │ │ AFA │ │ │ │
│ │ │ │ Service │ │ Engine │ │ Executor │ │ │ │
│ │ │ └──────────┘ └──────────┘ └──────────┘ │ │ │
│ │ │ │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │
│ │ │ │ Lambda: │ │ SageMaker│ │ RDS: │ │ │ │
│ │ │ │ Gates │ │ Inference│ │ Telemetry│ │ │ │
│ │ │ └──────────┘ └──────────┘ └──────────┘ │ │ │
│ │ │ │ │ │
│ │ └───────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌───────────────────────────────────────────────────────┐ │ │
│ │ │ Monitoring │ │ │
│ │ │ CloudWatch │ X-Ray │ Grafana │ PagerDuty │ │ │
│ │ └───────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
7.2 SLA Targets¶
| Metric | Target | Measurement |
|---|---|---|
| Availability | 99.9% | Monthly uptime |
| p50 Latency | < 100ms | Gate evaluation |
| p95 Latency | < 500ms | Full workflow |
| p99 Latency | < 2000ms | Including overrides |
| RPO | ≤ 15 minutes | Max data loss |
| RTO | ≤ 60 minutes | Max downtime |
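The availability target translates into a monthly downtime budget that can be checked with simple arithmetic. The sketch below assumes a 30-day measurement window; the function name is hypothetical.

```python
def downtime_budget_minutes(availability: float, window_days: int = 30) -> float:
    """Minutes of allowed downtime in the window for a given availability target."""
    return (1.0 - availability) * window_days * 24 * 60


budget = downtime_budget_minutes(0.999)
print(round(budget, 1))  # minutes of downtime allowed per 30-day window
```

At 99.9% over 30 days the budget is roughly 43 minutes. Under that assumption, a single outage that uses the full 60-minute RTO would by itself exceed the monthly availability target, so the two targets are worth reading together.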
8. Security Model¶
8.1 Trust Boundaries¶
┌─────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY: EXTERNAL │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Proposal Intake (untrusted input) ││
│ └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
│
│ Validation
▼
┌─────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY: EVALUATION │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Security Gate + Policy Engine + Quantitative Gates ││
│ └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
│
│ Approval
▼
┌─────────────────────────────────────────────────────────────────┐
│ TRUST BOUNDARY: EXECUTION (highest privilege) │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ Code modification, deployment, production access ││
│ └─────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────┘
8.2 RBAC Hierarchy¶
```yaml
roles:
  viewer:
    permissions: [read_proposals, read_telemetry, read_dashboards]

  analyst:
    inherits: viewer
    permissions: [run_queries, export_reports]

  developer:
    inherits: analyst
    permissions: [submit_proposals, view_own_decisions]

  reviewer:
    inherits: developer
    permissions: [approve_proposals, request_override]

  risk_lead:
    inherits: reviewer
    permissions: [first_key_override, adjust_thresholds_propose]

  security_lead:
    inherits: reviewer
    permissions: [second_key_override, security_gate_bypass_never]

  calibrator:
    inherits: analyst
    permissions: [propose_recalibration, deploy_thresholds]
    requires: [human_approval_for_deploy]

  admin:
    inherits: [risk_lead, security_lead, calibrator]
    permissions: [manage_roles, audit_all, system_config]
    constraints: [no_unilateral_override, four_eyes_principle]
```
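How inherited permissions resolve can be sketched with a short recursive lookup. This assumes `inherits` means the union of a role's own permissions with those of every ancestor, which the hierarchy implies but does not state. The sketch covers a subset of the roles, normalizes `inherits` to a list, and omits `requires` and `constraints`. The function and variable names are hypothetical.

```python
ROLES = {
    "viewer": {"inherits": [],
               "permissions": ["read_proposals", "read_telemetry", "read_dashboards"]},
    "analyst": {"inherits": ["viewer"],
                "permissions": ["run_queries", "export_reports"]},
    "developer": {"inherits": ["analyst"],
                  "permissions": ["submit_proposals", "view_own_decisions"]},
    "reviewer": {"inherits": ["developer"],
                 "permissions": ["approve_proposals", "request_override"]},
    "risk_lead": {"inherits": ["reviewer"],
                  "permissions": ["first_key_override", "adjust_thresholds_propose"]},
    "calibrator": {"inherits": ["analyst"],
                   "permissions": ["propose_recalibration", "deploy_thresholds"]},
}


def effective_permissions(role: str, roles: dict = ROLES) -> set[str]:
    """Union of a role's permissions with those of all ancestors.

    Assumes the inheritance graph is acyclic; no cycle detection is done.
    """
    spec = roles[role]
    perms = set(spec["permissions"])
    for parent in spec["inherits"]:
        perms |= effective_permissions(parent, roles)
    return perms


print(sorted(effective_permissions("calibrator")))
print("approve_proposals" in effective_permissions("calibrator"))
```

Because calibrator inherits from analyst rather than reviewer, it can deploy thresholds without gaining proposal approval rights, which keeps threshold tuning and proposal approval separated.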
9. Success Criteria¶
9.1 System Operational¶
- [ ] All five components integrated and communicating
- [ ] End-to-end proposal flow functional
- [ ] Two-key override mechanism operational
- [ ] Drift detection active
- [ ] Telemetry pipeline complete
9.2 Performance Validated¶
- [ ] p95 latency < 500ms for the full workflow (p50 < 100ms for gate evaluation), per the SLA targets in 7.2
- [ ] 99.9% availability achieved over 30 days
- [ ] Zero undetected security gate bypasses
- [ ] Drift alerts triggered within 5 minutes of threshold breach
9.3 Governance Established¶
- [ ] RBAC roles deployed and tested
- [ ] Audit trail complete and immutable
- [ ] Recalibration workflow tested
- [ ] Disaster recovery drill passed
10. References¶
Internal Documents¶
- /docs/architecture/afa-libertas-integration.md - Detailed pcw_decide() mapping
- /docs/architecture/repository-structure.md - File organization
- /docs/architecture/gap-analysis.md - Full gap inventory
- /docs/architecture/adr/ADR-003-hybrid-post-quantum-signatures.md - Post-quantum signatures (GAP-Q1)
- /docs/architecture/adr/ADR-004-hybrid-post-quantum-encryption.md - Post-quantum encryption (GAP-Q2)
- spec/guardrails/Hardened_Quantitative_Guardrail_Framework_Specification.md - Guardrails v1.1.1
- spec/guardrails/Guardrail_v1.1.1_Interface_Contract_and_Addenda.md - Parameter schema
External Sources¶
- Universal Decision Operating System (DOS) v2.1.0
- Universal Decision Rubric v2.1
- AFA v3-RO Unified Specification
- LIBERTAS OPUS: github.com/ThermoclineLeviathan/libertas-pcw
Changelog¶
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.1.0 | 2025-12-30 | Claude Code | Post-Quantum Complete: Updated cryptographic requirements with hybrid Ed25519+ML-DSA-44 signatures (GAP-Q1) and ML-KEM-768 encryption (GAP-Q2); Updated TwoKeyOverrideProtocol to support hybrid PQ signatures; Added ADR-003/004 references; Updated release tag to aegis-v1.0.0-pq-complete |
| 1.0.0 | 2025-12-27 | Claude Code | Initial specification |