
AEGIS Domain Integration Templates

Version: 1.0.0 | Updated: 2026-02-12 | Status: Active

Four worked examples showing how to map domain-specific metrics to AEGIS parameters. Each template includes a scenario, parameter mapping, complete JSON input, and expected decision walkthrough.

Prerequisite: Read Parameter Reference for detailed parameter semantics.

Interactive access: These templates are also available via the aegis_get_scoring_guide MCP tool — call with domain set to trading, cicd, moderation, agents, or generic.


Template 1: Algorithmic Trading

Scenario

A quantitative trading team wants to deploy a new mean-reversion strategy on the S&P 500 E-mini futures market. The strategy has been backtested for 2 years but has never been traded live. Current portfolio risk is moderate.

Parameter Mapping

| AEGIS Parameter | Domain Metric | Derivation |
|---|---|---|
| risk_baseline | Current portfolio VaR / limit | $45K daily VaR / $500K limit = 0.09 |
| risk_proposed | Projected portfolio VaR / limit | $78K projected VaR / $500K limit = 0.156 |
| profit_baseline | Current Sharpe ratio | 1.2 (trailing 6-month) |
| profit_proposed | Backtest Sharpe ratio | 1.8 (2-year backtest) |
| novelty_score | 1 - cosine_sim(strategy, nearest) | New market regime = 0.65 |
| complexity_score | 1 - (instruments * markets / max) | 1 instrument, 1 market = 0.9 |
| quality_score | Backtest quality composite | [data_quality=0.85, code_review=0.9, stress_test=0.8] avg = 0.85 |
| estimated_impact | Position sizing | < 10% of portfolio = "medium" |
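The derivations in the mapping table can be sketched directly. This is illustrative arithmetic only; the max-scope normalizer of 10 is inferred from the worked complexity value (0.9 for 1 instrument, 1 market), not from any AEGIS-mandated constant.

```python
# Trading-template derivations, assuming a max scope of 10 for the
# complexity normalizer (inferred from the table's worked value).
var_limit = 500_000
risk_baseline = 45_000 / var_limit            # daily VaR / limit = 0.09
risk_proposed = 78_000 / var_limit            # projected VaR / limit = 0.156

instruments, markets, max_scope = 1, 1, 10
complexity_score = 1 - (instruments * markets / max_scope)   # 0.9

quality_subscores = [0.85, 0.9, 0.8]          # data quality, code review, stress test
quality_score = sum(quality_subscores) / len(quality_subscores)  # ~0.85
```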

JSON Input

{
  "proposal_summary": "Deploy mean-reversion strategy on ES futures (backtest Sharpe 1.8, 2yr)",
  "estimated_impact": "medium",
  "risk_baseline": 0.09,
  "risk_proposed": 0.156,
  "profit_baseline": 1.2,
  "profit_proposed": 1.8,
  "novelty_score": 0.65,
  "complexity_score": 0.9,
  "quality_score": 0.85,
  "quality_subscores": [0.85, 0.9, 0.8],
  "agent_id": "quant-desk-deployer",
  "reversible": true,
  "drift_baseline_data": [0.08, 0.09, 0.07, 0.11, 0.09, 0.08, 0.10, 0.09, 0.12, 0.08,
                          0.09, 0.07, 0.10, 0.09, 0.08, 0.11, 0.09, 0.10, 0.08, 0.09,
                          0.07, 0.10, 0.11, 0.09, 0.08, 0.09, 0.10, 0.08, 0.09, 0.11]
}

Expected Decision Walkthrough

| Gate | Value | Threshold | Result |
|---|---|---|---|
| Risk | delta = (0.156-0.09)/0.09 = 0.73 | P(delta >= 2.0) < 0.95 | PASS — risk increased but not by 2x |
| Profit | delta = (1.8-1.2)/1.2 = 0.5 | Performance improved | PASS — positive improvement |
| Novelty | G(0.65) = 1/(1+exp(-10*(0.65-0.7))) ≈ 0.38 | 0.38 < 0.6 | FAIL — novelty below threshold |
| Complexity | 0.9 >= 0.5 | Floor check | PASS — well above floor |
| Quality | 0.85 >= 0.7, no zeros | Min score + subscore check | PASS |
| Drift | KL against baseline | KL < 0.3 expected | PASS — within normal range |
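The drift gate compares recent observations against drift_baseline_data with a KL bound of 0.3. AEGIS's exact divergence computation is not documented in this guide; the following is a hypothetical sketch of one plausible discretized KL check, using the baseline series from the JSON input above.

```python
import math

def kl_divergence(baseline, recent, bins=5, eps=1e-6):
    """Approximate KL(recent || baseline) over shared histogram bins.

    Hypothetical sketch only — AEGIS's real drift computation may bin,
    smooth, or window the data differently.
    """
    lo, hi = min(baseline + recent), max(baseline + recent)
    width = (hi - lo) / bins or 1.0

    def hist(xs):
        counts = [eps] * bins          # eps-smoothing avoids log(0)
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p, q = hist(recent), hist(baseline)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

baseline = [0.08, 0.09, 0.07, 0.11, 0.09, 0.08, 0.10, 0.09, 0.12, 0.08,
            0.09, 0.07, 0.10, 0.09, 0.08, 0.11, 0.09, 0.10, 0.08, 0.09,
            0.07, 0.10, 0.11, 0.09, 0.08, 0.09, 0.10, 0.08, 0.09, 0.11]
recent = [0.09, 0.08, 0.10, 0.09, 0.11, 0.08, 0.09, 0.10]
print(f"KL = {kl_divergence(baseline, recent):.3f} (gate expects < 0.3)")
```

A recent window drawn from the same regime stays well under the 0.3 bound; a shifted window (e.g., risk readings around 0.2) blows past it.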

Expected status: PAUSE (novelty gate fails — proposal lacks sufficient novelty)

The novelty gate fails because G(0.65) ≈ 0.38, which is below the 0.6 threshold. This means the proposal is not sufficiently novel. Since estimated_impact is "medium", the decision pauses rather than escalates. The next_steps will recommend reviewing the novelty assessment.
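The logistic check can be sketched directly; the steepness (10) and midpoint (0.7) are read off the formula in the walkthrough table, and the 0.6 pass threshold is applied to G(x), not to the raw novelty_score.

```python
import math

# Novelty gate logistic from the walkthrough: G(x) = 1/(1+exp(-10*(x-0.7))).
def novelty_gate(novelty_score, steepness=10.0, midpoint=0.7, threshold=0.6):
    g = 1.0 / (1.0 + math.exp(-steepness * (novelty_score - midpoint)))
    return g, g >= threshold

g, passed = novelty_gate(0.65)
print(f"G(0.65) = {g:.2f}, pass = {passed}")   # ≈ 0.38, False
g, passed = novelty_gate(0.85)
print(f"G(0.85) = {g:.2f}, pass = {passed}")   # ≈ 0.82, True
```

This makes the mitigation concrete: the curve is steep around the 0.7 midpoint, so moving novelty_score from 0.65 to 0.85 flips the gate.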

Mitigation: If the strategy is genuinely novel (e.g., new market regime), increase novelty_score to 0.85+ to reflect the true novelty level. If the strategy is routine, the PAUSE is appropriate governance.


Template 2: CI/CD Pipeline Deployment

Scenario

A platform team is deploying a database migration that changes the schema of the users table (5M rows) across 3 microservices. The deployment requires a 10-minute maintenance window. Recent deployment error rate has been stable at 2%.

Parameter Mapping

| AEGIS Parameter | Domain Metric | Derivation |
|---|---|---|
| risk_baseline | Current error rate | 0.02 (2% error rate) |
| risk_proposed | Estimated post-deploy error rate | 0.08 (8% during migration window) |
| profit_baseline | Deploy throughput (deploys/day) | 12 deploys/day normalized: 12/50 = 0.24 |
| profit_proposed | Expected post-migration throughput | 14 deploys/day normalized: 14/50 = 0.28 |
| novelty_score | Change type classification | Schema migration = 0.6 |
| complexity_score | 1 - (services * tables / max) | 1 - (3 * 1 / 20) = 0.85 |
| quality_score | CI pipeline metrics | [test_pass=0.98, lint=1.0, review=0.9] avg = 0.96 |
| estimated_impact | Services affected | 3 services = "high" |

JSON Input

{
  "proposal_summary": "Schema migration: users table (5M rows), 3 services, 10-min maintenance window",
  "estimated_impact": "high",
  "risk_baseline": 0.02,
  "risk_proposed": 0.08,
  "profit_baseline": 0.24,
  "profit_proposed": 0.28,
  "novelty_score": 0.6,
  "complexity_score": 0.85,
  "quality_score": 0.96,
  "quality_subscores": [0.98, 1.0, 0.9],
  "agent_id": "deploy-bot-prod",
  "reversible": false,
  "requires_human_approval": true,
  "time_sensitive": true
}

Expected Decision Walkthrough

| Gate | Value | Threshold | Result |
|---|---|---|---|
| Risk | delta = (0.08-0.02)/0.02 = 3.0 | P(delta >= 2.0) > 0.95 | FAIL — risk more than doubled |
| Profit | delta = (0.28-0.24)/0.24 = 0.17 | Performance improved | PASS |
| Novelty | G(0.6) ≈ 0.27 | 0.27 < 0.6 | FAIL — insufficient novelty |
| Complexity | 0.85 >= 0.5 | Floor check | PASS |
| Quality | 0.96 >= 0.7, no zeros | Min score + subscore check | PASS |

Expected status: ESCALATE (high impact + multiple gate failures)

Both the risk gate and the novelty gate fail. The risk gate fails because the error rate quadruples from 2% to 8%, a relative delta of 3.0 that exceeds the trigger_factor of 2.0. The novelty gate fails because G(0.6) ≈ 0.27 < 0.6. Because estimated_impact is "high", the decision escalates rather than merely pausing. The next_steps will include "Obtain human approval" (from requires_human_approval=true) and rollback planning guidance (from reversible=false).
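The risk gate's core comparison can be sketched as a point check. Note this is a simplification: per the walkthrough tables, the real gate is probabilistic (P(delta >= 2.0) against a 0.95 bound), not a single-point comparison.

```python
# Relative risk delta vs trigger_factor, as used in the walkthroughs.
def risk_delta(baseline, proposed):
    return (proposed - baseline) / baseline

def risk_gate_passes(baseline, proposed, trigger_factor=2.0):
    # Simplified deterministic stand-in for P(delta >= trigger) < 0.95.
    return risk_delta(baseline, proposed) < trigger_factor

print(round(risk_delta(0.02, 0.08), 2))   # 3.0 — CI/CD template, gate fails
print(risk_gate_passes(0.02, 0.08))       # False
print(risk_gate_passes(0.09, 0.156))      # True (trading template, delta ≈ 0.73)
```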


Template 3: Content Moderation Policy Update

Scenario

A trust & safety team is updating the content moderation policy to add a new category for AI-generated misinformation. This is a new policy area with no direct precedent. The team has high confidence in the rule definitions but limited data on false positive rates.

Parameter Mapping

| AEGIS Parameter | Domain Metric | Derivation |
|---|---|---|
| risk_baseline | Current false positive rate | 0.03 (3% FPR) |
| risk_proposed | Estimated FPR with new rules | 0.05 (5% estimated, uncertain) |
| profit_baseline | Precision | 0.95 |
| profit_proposed | Expected precision | 0.92 (slightly lower due to new category) |
| novelty_score | Policy precedent | No precedent = 0.85 |
| complexity_score | 1 - (rules / total_rules) | 1 - (15 new / 200 total) = 0.925 |
| quality_score | Review composite | [rule_clarity=0.9, legal_review=0.8, annotator_agreement=0.75] avg = 0.82 |
| estimated_impact | User base affected | All users = "critical" |

JSON Input

{
  "proposal_summary": "Add AI-generated misinformation category to content moderation policy (15 new rules)",
  "estimated_impact": "critical",
  "risk_baseline": 0.03,
  "risk_proposed": 0.05,
  "profit_baseline": 0.95,
  "profit_proposed": 0.92,
  "novelty_score": 0.85,
  "complexity_score": 0.925,
  "quality_score": 0.82,
  "quality_subscores": [0.9, 0.8, 0.75],
  "agent_id": "trust-safety-reviewer",
  "reversible": true,
  "requires_human_approval": true
}

Expected Decision Walkthrough

| Gate | Value | Threshold | Result |
|---|---|---|---|
| Risk | delta = (0.05-0.03)/0.03 = 0.67 | P(delta >= 2.0) < 0.95 | PASS — risk increase below trigger |
| Profit | delta = (0.92-0.95)/0.95 = -0.03 | Small decrease | PASS — decrease is minimal |
| Novelty | G(0.85) ≈ 0.82 | 0.82 >= 0.6 | PASS — high novelty meets threshold |
| Complexity | 0.925 >= 0.5 | Floor check | PASS |
| Quality | 0.82 >= 0.7, no zeros | Min score + subscore check | PASS |

Expected status: ESCALATE (critical impact — all gates pass but critical always escalates)

All gates pass, including the novelty gate — G(0.85) ≈ 0.82 comfortably clears the 0.6 threshold, meaning the proposal demonstrates sufficient novelty. However, with estimated_impact=critical, the decision automatically escalates regardless of gate outcomes. The rationale will note that all gates passed but human review is required due to critical blast radius.

Mitigation: Deploy to 5% of traffic first (reduces estimated_impact to "low"), then gradually expand. With all gates passing, a low-impact deployment would receive PROCEED.


Template 4: Autonomous Agent Self-Governance

Scenario

An autonomous coding agent (e.g., Claude Code, Codex) is evaluating whether to proceed with a large refactoring task that touches 15 files across 3 modules. The agent has been tracking its own success/failure rates for the past month and wants to use shadow mode for calibration.

Parameter Mapping

| AEGIS Parameter | Domain Metric | Derivation |
|---|---|---|
| risk_baseline | Recent failure rate | 0.05 (5% task failure rate) |
| risk_proposed | Estimated failure rate for this task | 0.12 (complex refactor) |
| profit_baseline | Code quality before | 0.7 (maintainability index) |
| profit_proposed | Expected quality after | 0.85 (cleaner architecture) |
| novelty_score | Action confidence | 1 - 0.75 confidence = 0.25 (familiar pattern) |
| complexity_score | 1 - (files * modules / max) | 1 - (15 * 3 / 100) = 0.55 |
| quality_score | Self-assessment | [plan_coherence=0.8, test_coverage=0.7, safety=0.9] avg = 0.8 |
| estimated_impact | Files changed | 15 files, 3 modules = "high" |

JSON Input (Shadow Mode Calibration)

{
  "proposal_summary": "Refactor authentication module: 15 files across 3 modules for improved maintainability",
  "estimated_impact": "high",
  "risk_baseline": 0.05,
  "risk_proposed": 0.12,
  "profit_baseline": 0.7,
  "profit_proposed": 0.85,
  "novelty_score": 0.25,
  "complexity_score": 0.55,
  "quality_score": 0.8,
  "quality_subscores": [0.8, 0.7, 0.9],
  "agent_id": "claude-code-refactor",
  "shadow_mode": true,
  "reversible": true,
  "drift_baseline_data": [0.04, 0.06, 0.05, 0.03, 0.07, 0.05, 0.04, 0.06, 0.05, 0.04,
                          0.05, 0.03, 0.06, 0.05, 0.04, 0.07, 0.05, 0.06, 0.04, 0.05,
                          0.03, 0.05, 0.06, 0.04, 0.05, 0.07, 0.05, 0.04, 0.06, 0.05,
                          0.04, 0.05, 0.03]
}

Expected Decision Walkthrough

| Gate | Value | Threshold | Result |
|---|---|---|---|
| Risk | delta = (0.12-0.05)/0.05 = 1.4 | P(delta >= 2.0) < 0.95 | PASS — increased but below 2x |
| Profit | delta = (0.85-0.7)/0.7 = 0.21 | Performance improved | PASS |
| Novelty | G(0.25) ≈ 0.01 | 0.01 < 0.6 | FAIL — insufficient novelty (familiar pattern scores low) |
| Complexity | 0.55 >= 0.5 | Floor check | PASS — barely above floor |
| Quality | 0.8 >= 0.7, no zeros | Min score + subscore check | PASS |
| Drift | KL against baseline | KL < 0.3 expected | PASS |

Expected status: ESCALATE (high impact + novelty gate failure — but in shadow mode, advisory only)

The novelty gate fails because G(0.25) ≈ 0.01 is well below the 0.6 threshold. A low novelty score means the proposal lacks sufficient novelty — the agent's high confidence (0.75) translates to low novelty (0.25), which the gate interprets as insufficient. With estimated_impact=high and a failing gate, the decision escalates. Because shadow_mode=true, the decision is advisory. The response includes a shadow_result object with:

  • shadow_only: true
  • observation_values: current metric values for future drift comparison
  • baseline_hash: SHA-256 of the baseline data (tamper detection)

Calibration insight: The ESCALATE result reveals that mapping novelty_score = 1 - confidence may be too aggressive for routine refactoring tasks. If the agent is confident, that should not trigger escalation. Consider mapping novelty differently — e.g., only use high novelty scores for genuinely novel actions (new language, unfamiliar codebase, first-time architecture change).
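One possible refinement along these lines is to derive novelty from categorical familiarity signals rather than from 1 - confidence. The signal names and weights below are purely illustrative assumptions, not part of AEGIS.

```python
# Hypothetical novelty mapping from familiarity signals (illustrative
# names and weights — not an AEGIS-defined formula).
def novelty_from_familiarity(new_language, unfamiliar_codebase,
                             first_time_architecture):
    signals = [new_language, unfamiliar_codebase, first_time_architecture]
    # Routine work in a known codebase stays near 0; genuinely novel
    # actions accumulate toward 1.0.
    return min(1.0, sum(0.4 if s else 0.0 for s in signals))

print(novelty_from_familiarity(False, False, False))  # 0.0 — routine refactor
print(novelty_from_familiarity(True, True, False))    # 0.8 — substantially novel
```

Under this mapping, a confident agent doing routine work scores near 0 novelty and the gate's FAIL correctly flags the proposal as unremarkable, while truly novel actions clear the threshold.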

Calibration workflow:

  1. Run in shadow mode for 30+ days, collecting decisions
  2. Review shadow results — are the gates producing sensible outcomes?
  3. Adjust estimated_impact or metric derivations if gates are too strict/lenient (e.g., this example suggests refining the novelty mapping)
  4. Switch shadow_mode to false to enforce decisions

Note on complexity_score: At 0.55, this is barely above the 0.5 floor. If the refactor scope grows to 20+ files, recalculate — dropping below 0.5 would cause an unoverridable HALT.
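The recalculation in the note above follows from the template's own formula (with the max scope of 100 from the mapping table):

```python
# complexity_score = 1 - (files * modules / max), max = 100 per the table.
def complexity(files, modules, max_scope=100):
    return 1 - (files * modules / max_scope)

print(complexity(15, 3))   # 0.55 — barely above the 0.5 floor
print(complexity(20, 3))   # 0.4  — below the floor: unoverridable HALT
```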


Common Integration Patterns

Pattern: Gradual Rollout

Start with shadow mode, then enforce on low-risk changes first:

Phase 1 (Week 1-4):   shadow_mode=true,  all proposals
Phase 2 (Week 5-8):   shadow_mode=false, estimated_impact=low only
Phase 3 (Week 9-12):  shadow_mode=false, low + medium
Phase 4 (Week 13+):   shadow_mode=false, all proposals

Pattern: Pre-flight Check

Call aegis_check_thresholds before submitting a proposal to understand what gate values will be evaluated:

{"method": "tools/call", "params": {"name": "aegis_check_thresholds", "arguments": {}}}

Pattern: Quick Risk Guard

For simple actions where only risk matters, use the simplified API:

{"method": "tools/call", "params": {"name": "aegis_quick_risk_check", "arguments": {"action_description": "Delete staging database", "risk_score": 0.8}}}

This returns safe: false (0.8 >= 0.5 threshold) without full gate evaluation. Use aegis_evaluate_proposal for actual governance decisions.

