
AEGIS Domain Integration Templates

Version: 1.0.0 | Updated: 2026-02-12 | Status: Active

Four worked examples showing how to map domain-specific metrics to AEGIS parameters. Each template includes a scenario, parameter mapping, complete JSON input, and expected decision walkthrough.

Prerequisite: Read Parameter Reference for detailed parameter semantics.

Interactive access: These templates are also available via the aegis_get_scoring_guide MCP tool — call with domain set to trading, cicd, moderation, agents, or generic.


Template 1: Algorithmic Trading

Scenario

A quantitative trading team wants to deploy a new mean-reversion strategy on the S&P 500 E-mini futures market. The strategy has been backtested for 2 years but has never been traded live. Current portfolio risk is moderate.

Parameter Mapping

| AEGIS Parameter | Domain Metric | Derivation |
|---|---|---|
| risk_baseline | Current portfolio VaR / limit | $45K daily VaR / $500K limit = 0.09 |
| risk_proposed | Projected portfolio VaR / limit | $78K projected VaR / $500K limit = 0.156 |
| profit_baseline | Current Sharpe ratio | 1.2 (trailing 6-month) |
| profit_proposed | Backtest Sharpe ratio | 1.8 (2-year backtest) |
| novelty_score | 1 - cosine_sim(strategy, nearest) | New market regime = 0.65 |
| complexity_score | 1 - (instruments * markets / max) | 1 instrument, 1 market = 0.9 |
| quality_score | Backtest quality composite | [data_quality=0.85, code_review=0.9, stress_test=0.8] avg = 0.85 |
| estimated_impact | Position sizing | < 10% of portfolio = "medium" |
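The derivations in the mapping table can be sketched directly. This is illustrative arithmetic only; the max-scope normalizer of 10 is inferred from the worked complexity value (0.9 for 1 instrument, 1 market), not from any AEGIS-mandated constant.

```python
# Trading-template derivations, assuming a max scope of 10 for the
# complexity normalizer (inferred from the table's worked value).
var_limit = 500_000
risk_baseline = 45_000 / var_limit            # daily VaR / limit = 0.09
risk_proposed = 78_000 / var_limit            # projected VaR / limit = 0.156

instruments, markets, max_scope = 1, 1, 10
complexity_score = 1 - (instruments * markets / max_scope)   # 0.9

quality_subscores = [0.85, 0.9, 0.8]          # data quality, code review, stress test
quality_score = sum(quality_subscores) / len(quality_subscores)  # ~0.85
```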

JSON Input

{
  "proposal_summary": "Deploy mean-reversion strategy on ES futures (backtest Sharpe 1.8, 2yr)",
  "estimated_impact": "medium",
  "risk_baseline": 0.09,
  "risk_proposed": 0.156,
  "profit_baseline": 1.2,
  "profit_proposed": 1.8,
  "novelty_score": 0.65,
  "complexity_score": 0.9,
  "quality_score": 0.85,
  "quality_subscores": [0.85, 0.9, 0.8],
  "agent_id": "quant-desk-deployer",
  "reversible": true,
  "drift_baseline_data": [0.08, 0.09, 0.07, 0.11, 0.09, 0.08, 0.10, 0.09, 0.12, 0.08,
                          0.09, 0.07, 0.10, 0.09, 0.08, 0.11, 0.09, 0.10, 0.08, 0.09,
                          0.07, 0.10, 0.11, 0.09, 0.08, 0.09, 0.10, 0.08, 0.09, 0.11]
}

Expected Decision Walkthrough

| Gate | Value | Threshold | Result |
|---|---|---|---|
| Risk | delta = (0.156-0.09)/0.09 = 0.73 | P(delta >= 2.0) < 0.95 | PASS — risk increased but not by 2x |
| Profit | delta = (1.8-1.2)/1.2 = 0.5 | Performance improved | PASS — positive improvement |
| Novelty | G(0.65) = 1/(1+exp(-10*(0.65-0.7))) ≈ 0.38 | 0.38 < 0.6 | FAIL — novelty below threshold |
| Complexity | 0.9 >= 0.5 | Floor check | PASS — well above floor |
| Quality | 0.85 >= 0.7, no zeros | Min score + subscore check | PASS |
| Drift | KL against baseline | KL < 0.3 expected | PASS — within normal range |
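The drift gate compares recent observations against drift_baseline_data with a KL bound of 0.3. AEGIS's exact divergence computation is not documented in this guide; the following is a hypothetical sketch of one plausible discretized KL check, using the baseline series from the JSON input above.

```python
import math

def kl_divergence(baseline, recent, bins=5, eps=1e-6):
    """Approximate KL(recent || baseline) over shared histogram bins.

    Hypothetical sketch only — AEGIS's real drift computation may bin,
    smooth, or window the data differently.
    """
    lo, hi = min(baseline + recent), max(baseline + recent)
    width = (hi - lo) / bins or 1.0

    def hist(xs):
        counts = [eps] * bins          # eps-smoothing avoids log(0)
        for x in xs:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        total = sum(counts)
        return [c / total for c in counts]

    p, q = hist(recent), hist(baseline)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

baseline = [0.08, 0.09, 0.07, 0.11, 0.09, 0.08, 0.10, 0.09, 0.12, 0.08,
            0.09, 0.07, 0.10, 0.09, 0.08, 0.11, 0.09, 0.10, 0.08, 0.09,
            0.07, 0.10, 0.11, 0.09, 0.08, 0.09, 0.10, 0.08, 0.09, 0.11]
recent = [0.09, 0.08, 0.10, 0.09, 0.11, 0.08, 0.09, 0.10]
print(f"KL = {kl_divergence(baseline, recent):.3f} (gate expects < 0.3)")
```

A recent window drawn from the same regime stays well under the 0.3 bound; a shifted window (e.g., risk readings around 0.2) blows past it.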

Expected status: PAUSE (novelty gate fails — proposal lacks sufficient novelty)

The novelty gate fails because G(0.65) ≈ 0.38, which is below the 0.6 threshold. This means the proposal is not sufficiently novel. Since estimated_impact is "medium", the decision pauses rather than escalates. The next_steps will recommend reviewing the novelty assessment.
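The logistic check can be sketched directly; the steepness (10) and midpoint (0.7) are read off the formula in the walkthrough table, and the 0.6 pass threshold is applied to G(x), not to the raw novelty_score.

```python
import math

# Novelty gate logistic from the walkthrough: G(x) = 1/(1+exp(-10*(x-0.7))).
def novelty_gate(novelty_score, steepness=10.0, midpoint=0.7, threshold=0.6):
    g = 1.0 / (1.0 + math.exp(-steepness * (novelty_score - midpoint)))
    return g, g >= threshold

g, passed = novelty_gate(0.65)
print(f"G(0.65) = {g:.2f}, pass = {passed}")   # ≈ 0.38, False
g, passed = novelty_gate(0.85)
print(f"G(0.85) = {g:.2f}, pass = {passed}")   # ≈ 0.82, True
```

This makes the mitigation concrete: the curve is steep around the 0.7 midpoint, so moving novelty_score from 0.65 to 0.85 flips the gate.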

Mitigation: If the strategy is genuinely novel (e.g., new market regime), increase novelty_score to 0.85+ to reflect the true novelty level. If the strategy is routine, the PAUSE is appropriate governance.


Template 2: CI/CD Pipeline Deployment

Scenario

A platform team is deploying a database migration that changes the schema of the users table (5M rows) across 3 microservices. The deployment requires a 10-minute maintenance window. Recent deployment error rate has been stable at 2%.

Parameter Mapping

| AEGIS Parameter | Domain Metric | Derivation |
|---|---|---|
| risk_baseline | Current error rate | 0.02 (2% error rate) |
| risk_proposed | Estimated post-deploy error rate | 0.08 (8% during migration window) |
| profit_baseline | Deploy throughput (deploys/day) | 12 deploys/day normalized: 12/50 = 0.24 |
| profit_proposed | Expected post-migration throughput | 14 deploys/day normalized: 14/50 = 0.28 |
| novelty_score | Change type classification | Schema migration = 0.6 |
| complexity_score | 1 - (services * tables / max) | 1 - (3 * 1 / 20) = 0.85 |
| quality_score | CI pipeline metrics | [test_pass=0.98, lint=1.0, review=0.9] avg = 0.96 |
| estimated_impact | Services affected | 3 services = "high" |

JSON Input

{
  "proposal_summary": "Schema migration: users table (5M rows), 3 services, 10-min maintenance window",
  "estimated_impact": "high",
  "risk_baseline": 0.02,
  "risk_proposed": 0.08,
  "profit_baseline": 0.24,
  "profit_proposed": 0.28,
  "novelty_score": 0.6,
  "complexity_score": 0.85,
  "quality_score": 0.96,
  "quality_subscores": [0.98, 1.0, 0.9],
  "agent_id": "deploy-bot-prod",
  "reversible": false,
  "requires_human_approval": true,
  "time_sensitive": true
}

Expected Decision Walkthrough

| Gate | Value | Threshold | Result |
|---|---|---|---|
| Risk | delta = (0.08-0.02)/0.02 = 3.0 | P(delta >= 2.0) > 0.95 | FAIL — risk more than doubled |
| Profit | delta = (0.28-0.24)/0.24 = 0.17 | Performance improved | PASS |
| Novelty | G(0.6) ≈ 0.27 | 0.27 < 0.6 | FAIL — insufficient novelty |
| Complexity | 0.85 >= 0.5 | Floor check | PASS |
| Quality | 0.96 >= 0.7, no zeros | Min score + subscore check | PASS |

Expected status: ESCALATE (high impact + multiple gate failures)

Both the risk gate and the novelty gate fail. The risk gate fails because the error rate quadruples from 2% to 8%, a relative delta of 3.0 that exceeds the trigger_factor of 2.0. The novelty gate fails because G(0.6) ≈ 0.27 < 0.6. Because estimated_impact is "high", the decision escalates rather than merely pausing. The next_steps will include "Obtain human approval" (from requires_human_approval=true) and rollback planning guidance (from reversible=false).
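The risk gate's core comparison can be sketched as a point check. Note this is a simplification: per the walkthrough tables, the real gate is probabilistic (P(delta >= 2.0) against a 0.95 bound), not a single-point comparison.

```python
# Relative risk delta vs trigger_factor, as used in the walkthroughs.
def risk_delta(baseline, proposed):
    return (proposed - baseline) / baseline

def risk_gate_passes(baseline, proposed, trigger_factor=2.0):
    # Simplified deterministic stand-in for P(delta >= trigger) < 0.95.
    return risk_delta(baseline, proposed) < trigger_factor

print(round(risk_delta(0.02, 0.08), 2))   # 3.0 — CI/CD template, gate fails
print(risk_gate_passes(0.02, 0.08))       # False
print(risk_gate_passes(0.09, 0.156))      # True (trading template, delta ≈ 0.73)
```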


Template 3: Content Moderation Policy Update

Scenario

A trust & safety team is updating the content moderation policy to add a new category for AI-generated misinformation. This is a new policy area with no direct precedent. The team has high confidence in the rule definitions but limited data on false positive rates.

Parameter Mapping

| AEGIS Parameter | Domain Metric | Derivation |
|---|---|---|
| risk_baseline | Current false positive rate | 0.03 (3% FPR) |
| risk_proposed | Estimated FPR with new rules | 0.05 (5% estimated, uncertain) |
| profit_baseline | Precision | 0.95 |
| profit_proposed | Expected precision | 0.92 (slightly lower due to new category) |
| novelty_score | Policy precedent | No precedent = 0.85 |
| complexity_score | 1 - (rules / total_rules) | 1 - (15 new / 200 total) = 0.925 |
| quality_score | Review composite | [rule_clarity=0.9, legal_review=0.8, annotator_agreement=0.75] avg = 0.82 |
| estimated_impact | User base affected | All users = "critical" |

JSON Input

{
  "proposal_summary": "Add AI-generated misinformation category to content moderation policy (15 new rules)",
  "estimated_impact": "critical",
  "risk_baseline": 0.03,
  "risk_proposed": 0.05,
  "profit_baseline": 0.95,
  "profit_proposed": 0.92,
  "novelty_score": 0.85,
  "complexity_score": 0.925,
  "quality_score": 0.82,
  "quality_subscores": [0.9, 0.8, 0.75],
  "agent_id": "trust-safety-reviewer",
  "reversible": true,
  "requires_human_approval": true
}

Expected Decision Walkthrough

| Gate | Value | Threshold | Result |
|---|---|---|---|
| Risk | delta = (0.05-0.03)/0.03 = 0.67 | P(delta >= 2.0) < 0.95 | PASS — risk increase below trigger |
| Profit | delta = (0.92-0.95)/0.95 = -0.03 | Small decrease | PASS — decrease is minimal |
| Novelty | G(0.85) ≈ 0.82 | 0.82 >= 0.6 | PASS — high novelty meets threshold |
| Complexity | 0.925 >= 0.5 | Floor check | PASS |
| Quality | 0.82 >= 0.7, no zeros | Min score + subscore check | PASS |

Expected status: ESCALATE (critical impact — all gates pass but critical always escalates)

All gates pass, including the novelty gate — G(0.85) ≈ 0.82 comfortably clears the 0.6 threshold, meaning the proposal demonstrates sufficient novelty. However, with estimated_impact=critical, the decision automatically escalates regardless of gate outcomes. The rationale will note that all gates passed but human review is required due to critical blast radius.

Mitigation: Deploy to 5% of traffic first (reduces estimated_impact to "low"), then gradually expand. With all gates passing, a low-impact deployment would receive PROCEED.


Template 4: Autonomous Agent Self-Governance

Scenario

An autonomous coding agent (e.g., Claude Code, Codex) is evaluating whether to proceed with a large refactoring task that touches 15 files across 3 modules. The agent has been tracking its own success/failure rates for the past month and wants to use shadow mode for calibration.

Parameter Mapping

| AEGIS Parameter | Domain Metric | Derivation |
|---|---|---|
| risk_baseline | Recent failure rate | 0.05 (5% task failure rate) |
| risk_proposed | Estimated failure rate for this task | 0.12 (complex refactor) |
| profit_baseline | Code quality before | 0.7 (maintainability index) |
| profit_proposed | Expected quality after | 0.85 (cleaner architecture) |
| novelty_score | Action confidence | 1 - 0.75 confidence = 0.25 (familiar pattern) |
| complexity_score | 1 - (files * modules / max) | 1 - (15 * 3 / 100) = 0.55 |
| quality_score | Self-assessment | [plan_coherence=0.8, test_coverage=0.7, safety=0.9] avg = 0.8 |
| estimated_impact | Files changed | 15 files, 3 modules = "high" |

JSON Input (Shadow Mode Calibration)

{
  "proposal_summary": "Refactor authentication module: 15 files across 3 modules for improved maintainability",
  "estimated_impact": "high",
  "risk_baseline": 0.05,
  "risk_proposed": 0.12,
  "profit_baseline": 0.7,
  "profit_proposed": 0.85,
  "novelty_score": 0.25,
  "complexity_score": 0.55,
  "quality_score": 0.8,
  "quality_subscores": [0.8, 0.7, 0.9],
  "agent_id": "claude-code-refactor",
  "shadow_mode": true,
  "reversible": true,
  "drift_baseline_data": [0.04, 0.06, 0.05, 0.03, 0.07, 0.05, 0.04, 0.06, 0.05, 0.04,
                          0.05, 0.03, 0.06, 0.05, 0.04, 0.07, 0.05, 0.06, 0.04, 0.05,
                          0.03, 0.05, 0.06, 0.04, 0.05, 0.07, 0.05, 0.04, 0.06, 0.05,
                          0.04, 0.05, 0.03]
}

Expected Decision Walkthrough

| Gate | Value | Threshold | Result |
|---|---|---|---|
| Risk | delta = (0.12-0.05)/0.05 = 1.4 | P(delta >= 2.0) < 0.95 | PASS — increased but below 2x |
| Profit | delta = (0.85-0.7)/0.7 = 0.21 | Performance improved | PASS |
| Novelty | G(0.25) ≈ 0.01 | 0.01 < 0.6 | FAIL — insufficient novelty (familiar pattern scores low) |
| Complexity | 0.55 >= 0.5 | Floor check | PASS — barely above floor |
| Quality | 0.8 >= 0.7, no zeros | Min score + subscore check | PASS |
| Drift | KL against baseline | KL < 0.3 expected | PASS |

Expected status: ESCALATE (high impact + novelty gate failure — but in shadow mode, advisory only)

The novelty gate fails because G(0.25) ≈ 0.01 is well below the 0.6 threshold. A low novelty score means the proposal lacks sufficient novelty — the agent's high confidence (0.75) translates to low novelty (0.25), which the gate interprets as insufficient. With estimated_impact=high and a failing gate, the decision escalates. Because shadow_mode=true, the decision is advisory. The response includes a shadow_result object with:

  • shadow_only: true
  • observation_values: current metric values for future drift comparison
  • baseline_hash: SHA-256 of the baseline data (tamper detection)

Calibration insight: The ESCALATE result reveals that mapping novelty_score = 1 - confidence may be too aggressive for routine refactoring tasks. If the agent is confident, that should not trigger escalation. Consider mapping novelty differently — e.g., only use high novelty scores for genuinely novel actions (new language, unfamiliar codebase, first-time architecture change).
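One possible refinement along these lines is to derive novelty from categorical familiarity signals rather than from 1 - confidence. The signal names and weights below are purely illustrative assumptions, not part of AEGIS.

```python
# Hypothetical novelty mapping from familiarity signals (illustrative
# names and weights — not an AEGIS-defined formula).
def novelty_from_familiarity(new_language, unfamiliar_codebase,
                             first_time_architecture):
    signals = [new_language, unfamiliar_codebase, first_time_architecture]
    # Routine work in a known codebase stays near 0; genuinely novel
    # actions accumulate toward 1.0.
    return min(1.0, sum(0.4 if s else 0.0 for s in signals))

print(novelty_from_familiarity(False, False, False))  # 0.0 — routine refactor
print(novelty_from_familiarity(True, True, False))    # 0.8 — substantially novel
```

Under this mapping, a confident agent doing routine work scores near 0 novelty and the gate's FAIL correctly flags the proposal as unremarkable, while truly novel actions clear the threshold.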

Calibration workflow:

  1. Run in shadow mode for 30+ days, collecting decisions
  2. Review shadow results — are the gates producing sensible outcomes?
  3. Adjust estimated_impact or metric derivations if gates are too strict/lenient (e.g., this example suggests refining the novelty mapping)
  4. Switch shadow_mode to false to enforce decisions

Note on complexity_score: At 0.55, this is barely above the 0.5 floor. If the refactor scope grows to 20+ files, recalculate — dropping below 0.5 would cause an unoverridable HALT.
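The recalculation in the note above follows from the template's own formula (with the max scope of 100 from the mapping table):

```python
# complexity_score = 1 - (files * modules / max), max = 100 per the table.
def complexity(files, modules, max_scope=100):
    return 1 - (files * modules / max_scope)

print(complexity(15, 3))   # 0.55 — barely above the 0.5 floor
print(complexity(20, 3))   # 0.4  — below the floor: unoverridable HALT
```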


Common Integration Patterns

Pattern: Gradual Rollout

Start with shadow mode, then enforce on low-risk changes first:

Phase 1 (Week 1-4):   shadow_mode=true,  all proposals
Phase 2 (Week 5-8):   shadow_mode=false, estimated_impact=low only
Phase 3 (Week 9-12):  shadow_mode=false, low + medium
Phase 4 (Week 13+):   shadow_mode=false, all proposals

Pattern: Pre-flight Check

Call aegis_check_thresholds before submitting a proposal to understand what gate values will be evaluated:

{"method": "tools/call", "params": {"name": "aegis_check_thresholds", "arguments": {}}}

Pattern: Quick Risk Guard

For simple actions where only risk matters, use the simplified API:

{"method": "tools/call", "params": {"name": "aegis_quick_risk_check", "arguments": {"action_description": "Delete staging database", "risk_score": 0.8}}}

This returns safe: false (0.8 >= 0.5 threshold) without full gate evaluation. Use aegis_evaluate_proposal for actual governance decisions.

