
Shadow Mode Prerequisites

Version: 1.3.0
Date: 2026-02-09
Status: Phase 1 code-complete — pcw_decide(shadow_mode=True) available


Overview

Shadow mode (Phase 1 of the Implementation Plan) runs the Guardrail Framework in parallel with an existing decision-making system. Before deploying shadow mode, the following prerequisites must be satisfied.

This document answers three critical questions:

  1. What is the guardrail evaluating? (Proposals)
  2. What is it running alongside? (Existing Decision System)
  3. What data does it need to start? (Baseline)


1. Define Your Proposals

What Is a "Proposal"?

A proposal is any entity that requires a risk/reward evaluation before approval. The Guardrail Framework is domain-agnostic; you must define what constitutes a proposal in your context.

Domain      | Example Proposals                  | Evaluation Question
Trading     | Trade orders, position changes     | Should this trade be executed?
ML/AI       | Model predictions, agent actions   | Is this prediction/action safe?
Business    | Investment proposals, projects     | Should we approve this project?
Engineering | Code changes, deployments          | Is this change safe to deploy?
Content     | Generated content, recommendations | Should this content be shown?

Proposal Requirements

Each proposal MUST have:

Requirement       | Description                          | Example
Unique ID         | Identifier for tracking              | proposal_12345
Timestamp         | When the proposal was created        | 2025-12-26T14:30:00Z
Risk Inputs       | Data needed to compute risk score    | Revenue at risk, probability of failure
Profit Inputs     | Data needed to compute profit score  | Expected gain, strategic value
Complexity Inputs | Data for complexity scoring          | Lines of code, new dependencies
Novelty Inputs    | Data for novelty assessment          | Similarity to past proposals
Quality Inputs    | Data for quality sub-metrics         | Validity, consistency, empirical support

Define Your Schema

Before shadow mode, create a Proposal Schema document that maps your domain entities to framework inputs:

# Example: Trading Domain Proposal Schema
proposal_schema:
  domain: "trading"
  entity: "trade_order"

  mapping:
    proposal_id: "order_id"
    timestamp: "order_timestamp"

    risk_inputs:
      R_base: "historical_var_95"        # Value-at-Risk from past 30 days
      R_prop: "projected_var_95"         # VaR if this trade executes

    profit_inputs:
      P_base: "baseline_expected_return"
      P_prop: "projected_expected_return"

    complexity_inputs:
      C_S: "instrument_count"            # Number of instruments
      C_D: "counterparty_count"          # New counterparties

    novelty_inputs:
      features: ["asset_class", "notional", "tenor", "strategy_type"]
      similarity_method: "cosine_distance_to_centroid"

    quality_inputs:
      validity: "model_validation_score"
      consistency: "internal_limit_check"
      support: "backtesting_score"
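A minimal sketch of how a schema mapping like the one above could be applied in code, assuming the raw order record is a plain dict. The helper `map_proposal` and all field names are illustrative, mirroring the example schema rather than a fixed framework API:

```python
# Hypothetical sketch: translate a raw domain record into framework inputs
# using the schema mapping. Field names mirror the example above.

def map_proposal(raw: dict, mapping: dict) -> dict:
    """Translate a domain record into framework inputs via the schema mapping."""
    return {
        "proposal_id": raw[mapping["proposal_id"]],
        "timestamp": raw[mapping["timestamp"]],
        "risk": {key: raw[field] for key, field in mapping["risk_inputs"].items()},
        "profit": {key: raw[field] for key, field in mapping["profit_inputs"].items()},
    }

mapping = {
    "proposal_id": "order_id",
    "timestamp": "order_timestamp",
    "risk_inputs": {"R_base": "historical_var_95", "R_prop": "projected_var_95"},
    "profit_inputs": {"P_base": "baseline_expected_return",
                      "P_prop": "projected_expected_return"},
}

order = {
    "order_id": "ord_001",
    "order_timestamp": "2025-12-26T14:30:00Z",
    "historical_var_95": 0.12,
    "projected_var_95": 0.18,
    "baseline_expected_return": 0.05,
    "projected_expected_return": 0.07,
}

proposal = map_proposal(order, mapping)
print(proposal["risk"])  # {'R_base': 0.12, 'R_prop': 0.18}
```

Keeping the mapping as data (rather than hard-coded field access) lets the same shadow service serve multiple domains by swapping the schema document.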

2. Existing Decision System

What Must Already Exist

Shadow mode requires an existing decision-making process that the guardrail runs alongside. This can be:

Type         | Description                              | Integration Point
Human Review | People evaluate and approve proposals    | Hook into review workflow
Rules Engine | Existing automated rules/limits          | Wrap or intercept rule evaluation
ML Model     | Existing prediction/classification model | Add guardrail as post-processor
Hybrid       | Automated screening + human escalation   | Integrate at both stages

Integration Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                        PROPOSAL INTAKE                               │
│                              │                                       │
│                              ▼                                       │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                    EXISTING SYSTEM                             │  │
│  │  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐        │  │
│  │  │   Rules     │───►│   Human     │───►│  Decision   │        │  │
│  │  │   Engine    │    │   Review    │    │  (ACTUAL)   │────────┼──┼──► Production
│  │  └─────────────┘    └─────────────┘    └──────┬──────┘        │  │
│  └───────────────────────────────────────────────┼───────────────┘  │
│                                                  │                   │
│                                                  │ Log for           │
│                                                  │ comparison        │
│                                                  ▼                   │
│  ┌───────────────────────────────────────────────────────────────┐  │
│  │                    SHADOW GUARDRAIL                            │  │
│  │  ┌─────────────┐    ┌─────────────┐    ┌─────────────┐        │  │
│  │  │  Compute    │───►│   Apply     │───►│  Decision   │        │  │
│  │  │  Metrics    │    │ Thresholds  │    │  (SHADOW)   │────────┼──┼──► Telemetry Only
│  │  └─────────────┘    └─────────────┘    └─────────────┘        │  │
│  └───────────────────────────────────────────────────────────────┘  │
│                                                                      │
│                              │                                       │
│                              ▼                                       │
│                    ┌─────────────────┐                              │
│                    │   COMPARISON    │                              │
│                    │   & TELEMETRY   │                              │
│                    └─────────────────┘                              │
└─────────────────────────────────────────────────────────────────────┘

Integration Checklist

Before shadow mode deployment:

  • [ ] Decision point identified: Where in your system are decisions made?
  • [ ] Hook mechanism defined: How will you intercept proposals?
  • [ ] Outcome capture: How will you record the actual decision (human/system)?
  • [ ] Non-blocking guarantee: Shadow evaluation cannot slow down production path
  • [ ] Failure isolation: Shadow service failure cannot impact production
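The last two checklist items (non-blocking guarantee, failure isolation) can be sketched as a fire-and-forget wrapper. `shadow_evaluate` is a placeholder for the real guardrail call, and in production the exception would be logged rather than discarded:

```python
import concurrent.futures

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def shadow_evaluate(proposal: dict) -> str:
    # Placeholder for the real guardrail evaluation.
    return "pass"

def decide(proposal: dict) -> str:
    # Fire-and-forget: the shadow call runs on a worker thread, so its
    # latency and any exception stay off the production path.
    future = _executor.submit(shadow_evaluate, proposal)
    future.add_done_callback(lambda f: f.exception())  # log, never raise
    # Production decision proceeds immediately, unaffected by the shadow path.
    return "approve"

print(decide({"proposal_id": "p1"}))  # approve
```

The production decision returns before the shadow result exists; the two are joined later in the telemetry store, not in the request path.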

3. Baseline Data

What Is Baseline?

The baseline is the historical distribution of metrics against which new proposals are compared. It answers: "What does normal look like?"

Baseline Requirements

Component            | Minimum               | Recommended       | Purpose
Historical proposals | 100                   | 1,000+            | Statistical significance
Time period          | 30 days               | 90 days           | Capture variability
Coverage             | 80% of proposal types | 95%+              | Avoid blind spots
Completeness         | All required fields   | + optional fields | Full metric computation

Baseline Data Structure

# Baseline snapshot requirements
baseline_snapshot:
  version: "1.0"
  created_at: "2025-12-26T00:00:00Z"
  sha256_hash: "<computed at creation>"

  period:
    start: "2025-09-26"
    end: "2025-12-25"
    days: 91

  statistics:
    total_proposals: 2847
    approved: 2102
    rejected: 745

  distributions:
    risk_scores:
      mean: 0.23
      std: 0.15
      p50: 0.19
      p90: 0.42
      p99: 0.71
      histogram_bins: [0.0, 0.1, 0.2, ..., 1.0]
      histogram_counts: [142, 387, 521, ...]

    profit_scores:
      mean: 0.45
      std: 0.22
      # ... similar structure

    novelty_scores:
      # ...

    complexity_scores:
      # ...

Creating Your Baseline

Step 1: Export Historical Data

-- Example: Export proposals from last 90 days
SELECT
    proposal_id,
    created_at,
    risk_score,
    profit_score,
    novelty_score,
    complexity_score,
    quality_score,
    decision,
    decision_maker
FROM proposals
WHERE created_at >= CURRENT_DATE - INTERVAL '90 days'
  AND status = 'completed';

Step 2: Compute Distributions

import numpy as np
import hashlib
import json
from datetime import datetime  # used for the created_at timestamp below

def create_baseline(proposals_df):
    baseline = {
        "version": "1.0",
        "created_at": datetime.now().isoformat(),
        "period": {
            "start": proposals_df['created_at'].min().isoformat(),
            "end": proposals_df['created_at'].max().isoformat(),
            "days": (proposals_df['created_at'].max() -
                    proposals_df['created_at'].min()).days
        },
        "statistics": {
            "total_proposals": len(proposals_df),
            "approved": len(proposals_df[proposals_df['decision'] == 'approved']),
            "rejected": len(proposals_df[proposals_df['decision'] == 'rejected'])
        },
        "distributions": {}
    }

    for metric in ['risk_score', 'profit_score', 'novelty_score',
                   'complexity_score', 'quality_score']:
        values = proposals_df[metric].dropna()
        hist, bins = np.histogram(values, bins=10, range=(0, 1))

        baseline["distributions"][metric] = {
            "mean": float(np.mean(values)),
            "std": float(np.std(values)),
            "p50": float(np.percentile(values, 50)),
            "p90": float(np.percentile(values, 90)),
            "p99": float(np.percentile(values, 99)),
            "histogram_bins": bins.tolist(),
            "histogram_counts": hist.tolist()
        }

    # Compute hash for integrity verification
    baseline_json = json.dumps(baseline, sort_keys=True)
    baseline["sha256_hash"] = hashlib.sha256(baseline_json.encode()).hexdigest()

    return baseline

Step 3: Freeze and Version

# Save baseline snapshot
python create_baseline.py --output baseline-v1.0.json

# Create immutable reference
sha256sum baseline-v1.0.json > baseline-v1.0.sha256

# Store in version control or immutable storage
aws s3 cp baseline-v1.0.json s3://guardrail-baselines/ --metadata sha256=$(cat baseline-v1.0.sha256)
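At service startup the frozen snapshot's integrity can be re-checked before use. This sketch assumes the hashing convention from `create_baseline` above, where the digest is computed over the snapshot without its own `sha256_hash` field:

```python
import hashlib
import json

def verify_baseline(baseline: dict) -> bool:
    """Recompute the digest over everything except the stored hash."""
    body = {k: v for k, v in baseline.items() if k != "sha256_hash"}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return digest == baseline.get("sha256_hash")

# Tiny example snapshot, hashed the same way create_baseline does.
snapshot = {"version": "1.0", "statistics": {"total_proposals": 2847}}
snapshot["sha256_hash"] = hashlib.sha256(
    json.dumps(snapshot, sort_keys=True).encode()
).hexdigest()

print(verify_baseline(snapshot))  # True
```

Refusing to start (or failing open with an alert) on a hash mismatch prevents silently comparing live proposals against a corrupted or stale baseline.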


4. Infrastructure Prerequisites

Telemetry Storage

Shadow mode generates telemetry for every proposal. You need:

Requirement      | Specification                                           | Notes
Database         | Time-series DB (InfluxDB, TimescaleDB) or data warehouse | Must handle high write volume
Retention        | Minimum 90 days                                          | For calibration and audit
Schema           | Per Interface Contract telemetry fields                  | See §Shared Telemetry
Query capability | Aggregations, percentiles, joins                         | For analysis and dashboards

Compute Resources

Component      | Minimum                            | Recommended
Shadow service | 1 vCPU, 2GB RAM                    | 2 vCPU, 4GB RAM
Latency budget | < 100ms p95                        | < 50ms p95
Availability   | 99%                                | 99.9%
Failure mode   | Fail open (log error, don't block) | Circuit breaker pattern
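The fail-open and circuit-breaker failure modes can be sketched as follows; the threshold of three consecutive failures is an illustrative choice, not a framework requirement:

```python
class CircuitBreaker:
    """Skip shadow evaluation after repeated failures; never block production."""

    def __init__(self, max_failures: int = 3):
        self.max_failures = max_failures
        self.failures = 0

    def call(self, fn, *args):
        if self.failures >= self.max_failures:
            return None  # breaker open: skip the shadow call entirely
        try:
            result = fn(*args)
            self.failures = 0  # success closes the breaker again
            return result
        except Exception:
            self.failures += 1
            return None  # fail open: log in real code, never raise

breaker = CircuitBreaker()

def flaky_shadow(_proposal):
    raise RuntimeError("shadow service down")

for _ in range(5):
    breaker.call(flaky_shadow, {})

print(breaker.failures)  # 3 (calls 4 and 5 were skipped)
```

A production breaker would also track an open-duration timer and emit a health alert when it trips, so an outage of the shadow path is visible without ever touching production latency.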

Monitoring

Before shadow mode, deploy:

  • [ ] Metrics dashboard: Guardrail decision distribution, latency
  • [ ] Comparison dashboard: Shadow vs. actual decision agreement rate
  • [ ] Drift dashboard: KL divergence over time
  • [ ] Alerting: Service health, error rates, decision anomalies
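The drift dashboard's KL divergence can be computed directly from the frozen baseline histogram and a live histogram over the same bins. This sketch measures KL(baseline ‖ live) with a small smoothing epsilon; both the direction and the epsilon are illustrative choices to settle during calibration:

```python
import numpy as np

def kl_divergence(baseline_counts, live_counts, eps=1e-9):
    """KL(P || Q) between two histograms defined over the same bins."""
    p = np.asarray(baseline_counts, dtype=float)
    q = np.asarray(live_counts, dtype=float)
    p = p / p.sum() + eps  # smooth so empty bins don't produce log(0)
    q = q / q.sum() + eps
    return float(np.sum(p * np.log(p / q)))

# Example counts shaped like the baseline snapshot's histogram_counts field.
baseline_hist = [142, 387, 521, 300, 120, 50, 20, 5, 3, 1]
live_hist = [150, 380, 510, 310, 115, 55, 18, 6, 4, 1]

print(kl_divergence(baseline_hist, baseline_hist))  # 0.0 (identical, no drift)
print(round(kl_divergence(baseline_hist, live_hist), 5))
```

Near-zero values mean the live distribution matches the baseline; alert thresholds for the 'warning' and 'critical' drift statuses are then set from the observed day-to-day variation.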

5. Operational Prerequisites

Roles and Responsibilities

Role           | Responsibility                                   | Access Level
Risk Analyst   | Monitor guardrail decisions, calibrate thresholds | Read telemetry, propose changes
DevOps         | Deploy and maintain shadow service               | Infrastructure access
Data Scientist | Analyze comparison data, recommend calibration   | Read telemetry, run analysis
Risk Lead      | Approve threshold changes, escalation point      | Approve parameter changes

Runbooks

Before shadow mode, document:

  • [ ] Shadow service deployment: How to deploy, configure, scale
  • [ ] Telemetry troubleshooting: Missing data, lag, errors
  • [ ] Comparison analysis: How to interpret agreement/disagreement
  • [ ] Escalation path: Who to contact for anomalies

Success Criteria

Shadow mode is ready for Phase 2 (Red Team) when:

Criterion     | Target                   | Measurement
Data coverage | 100% of proposals logged | Count in telemetry vs. production
Uptime        | > 99%                    | Shadow service availability
Latency       | < 100ms p95              | Service response time
Data quality  | < 1% null fields         | Telemetry completeness
Duration      | 30+ days                 | Calendar time with data

6. Checklist Summary

Before Deployment

  • [ ] Proposals defined: Schema mapping domain entities to framework inputs
  • [ ] Existing system identified: Decision point and integration mechanism
  • [ ] Baseline created: 30+ days historical data, frozen and hashed
  • [ ] Telemetry storage ready: Database provisioned with correct schema
  • [ ] Compute provisioned: Shadow service deployed (can be stub initially)
  • [ ] Monitoring deployed: Dashboards and alerting configured
  • [ ] Runbooks written: Operational procedures documented
  • [ ] Roles assigned: Team members know their responsibilities

Day 1 Validation

  • [ ] First proposal successfully logged to telemetry
  • [ ] Guardrail decision computed (even if naive)
  • [ ] Actual decision captured for comparison
  • [ ] No impact on production latency
  • [ ] Dashboard shows data flowing

7. Concrete Example: AI Customer Service Agent

This section walks through a complete example of applying the prerequisites to a real use case.

Scenario

Company: FinServe Inc., a financial services company
System: AI-powered customer service agent that handles account inquiries
Risk: Agent might provide incorrect financial advice, expose sensitive data, or take unauthorized actions

Step 1: Define the Proposals

What is a "proposal" here?

Every action the AI agent wants to take is a proposal:

  • Responding to a customer query
  • Looking up account information
  • Initiating a transaction
  • Escalating to a human agent

Proposal Schema:

proposal_schema:
  domain: "customer_service_agent"
  entity: "agent_action"

  mapping:
    proposal_id: "action_id"           # UUID for each agent action
    timestamp: "action_timestamp"

    risk_inputs:
      R_base: 0.15                      # Historical average risk for this action type
      R_prop:                           # Computed from:
        - data_sensitivity_score        # 0-1: How sensitive is the data accessed?
        - financial_impact_score        # 0-1: Could this cost money?
        - compliance_risk_score         # 0-1: Regulatory implications?
        # R_prop = weighted average of above

    profit_inputs:
      P_base: 0.60                      # Historical customer satisfaction baseline
      P_prop:
        - resolution_probability        # 0-1: Will this resolve the issue?
        - customer_sentiment_score      # 0-1: Predicted customer satisfaction
        - efficiency_score              # 0-1: Time/cost savings

    complexity_inputs:
      C_S: "systems_accessed_count"     # Number of backend systems touched
      C_D: "external_api_calls"         # External services called

    novelty_inputs:
      features:
        - action_type                   # "respond", "lookup", "transact", "escalate"
        - query_category                # "balance", "payment", "dispute", etc.
        - customer_tier                 # "standard", "premium", "vip"
        - account_type                  # "checking", "savings", "investment"
      similarity_method: "embedding_cosine_distance"
      embedding_model: "text-embedding-3-small"

    quality_inputs:
      validity: "response_format_valid"           # Does response follow required format?
      consistency: "no_contradictions_detected"   # Consistent with previous statements?
      support: "citations_provided"               # Are claims backed by data?
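The `R_prop = weighted average of above` comment in the schema can be sketched as below. The weights are illustrative (they match the 0.4/0.35/0.25 split used in the integration code in Step 5) and would be set during calibration:

```python
def compute_r_prop(data_sensitivity: float, financial_impact: float,
                   compliance_risk: float,
                   weights=(0.4, 0.35, 0.25)) -> float:
    """Combine the three 0-1 risk sub-scores into one proposed-risk value."""
    scores = (data_sensitivity, financial_impact, compliance_risk)
    return sum(w * s for w, s in zip(weights, scores))

# High data sensitivity, moderate financial impact, low compliance risk:
print(round(compute_r_prop(0.8, 0.5, 0.2), 3))  # 0.545
```

P_prop can be combined the same way from its three sub-scores; keeping both as explicit weighted sums makes the calibration surface (the weight vectors) easy to freeze alongside the baseline.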

Step 2: Identify the Existing Decision System

Current Process:

  1. Customer submits inquiry via chat
  2. AI agent generates proposed response
  3. Human supervisor reviews high-risk responses (current guardrail)
  4. Response sent to customer

Decision Point: Between steps 2 and 3 - after AI generates response, before it's sent.

Integration Architecture:

Customer ──► AI Agent ──► [Proposed Response]
                ┌───────────────┴───────────────┐
                │                               │
                ▼                               ▼
    ┌───────────────────┐           ┌───────────────────┐
    │  EXISTING SYSTEM  │           │  SHADOW GUARDRAIL │
    │                   │           │                   │
    │  Risk Score Check │           │  Compute Metrics  │
    │  (simple rules)   │           │  (full framework) │
    │        │          │           │        │          │
    │        ▼          │           │        ▼          │
    │  High Risk?       │           │  Log Decision     │
    │  ├─ Yes: Human    │           │  + All Metrics    │
    │  └─ No: Auto-send │           │                   │
    └─────────┬─────────┘           └─────────┬─────────┘
              │                               │
              ▼                               ▼
    [Actual Decision]              [Telemetry Database]
            │
            ▼
        Customer

Existing Rules (what shadow mode runs alongside):

# Current simple rule-based system
def existing_risk_check(action):
    """Score an agent action with the incumbent rules.

    `action` is a dict carrying the fields the rules inspect:
    'dollar_amount', 'action_type', and 'topic'.
    """
    risk_score = 0.0

    # Rule 1: Financial amounts over $1,000 trigger review
    if (action.get("dollar_amount") or 0) > 1000:
        risk_score += 0.4

    # Rule 2: Account changes trigger review
    if action.get("action_type") in ("transact", "modify_account"):
        risk_score += 0.3

    # Rule 3: Sensitive topics
    if action.get("topic") in ("fraud", "complaint", "legal"):
        risk_score += 0.3

    return "human_review" if risk_score >= 0.5 else "auto_approve"

Step 3: Create the Baseline

Data Collection Query:

-- Export 90 days of agent actions for baseline
SELECT
    action_id,
    action_timestamp,
    action_type,
    query_category,
    customer_tier,

    -- Risk components
    data_sensitivity_score,
    financial_impact_score,
    compliance_risk_score,

    -- Profit components
    resolution_probability,
    customer_sentiment_score,
    efficiency_score,

    -- Complexity
    systems_accessed_count,
    external_api_calls,

    -- Quality
    response_format_valid,
    no_contradictions_detected,
    citations_provided,

    -- Outcomes
    existing_system_decision,    -- 'auto_approve' or 'human_review'
    human_decision,              -- 'approve', 'reject', 'modify', NULL if auto
    customer_feedback_score,     -- 1-5 stars, NULL if not provided
    escalation_occurred          -- Did customer ask for human after?

FROM agent_actions
WHERE action_timestamp >= CURRENT_DATE - INTERVAL '90 days'
  AND action_status = 'completed';

Baseline Statistics (example output):

baseline_snapshot:
  version: "1.0"
  created_at: "2025-12-26T00:00:00Z"
  sha256_hash: "a1b2c3d4e5f6..."

  period:
    start: "2025-09-27"
    end: "2025-12-25"
    days: 90

  statistics:
    total_actions: 127543
    auto_approved: 98234        # 77%
    human_reviewed: 29309       # 23%
    human_approved: 27891       # 95% of reviewed
    human_rejected: 1418        # 5% of reviewed

  distributions:
    risk_scores:
      mean: 0.23
      std: 0.18
      p50: 0.18
      p90: 0.47
      p99: 0.72
      # Note: Existing system uses 0.5 threshold

    novelty_scores:
      mean: 0.31
      std: 0.22
      p50: 0.26
      p90: 0.61
      p99: 0.89
      # High novelty often correlates with escalations

  # Key insights from the baseline:
  insights:
    - "23% of actions go to human review"
    - "Only 5% of reviewed actions are rejected"
    - "High novelty (>0.6) correlates with 3x escalation rate"
    - "Actions accessing 3+ systems have 2x rejection rate"

Step 4: Infrastructure Setup

Telemetry Schema (for this use case):

CREATE TABLE guardrail_telemetry (
    -- Identifiers
    telemetry_id        UUID PRIMARY KEY,
    action_id           UUID NOT NULL,
    timestamp           TIMESTAMPTZ NOT NULL,

    -- Proposal Context
    action_type         VARCHAR(50),
    query_category      VARCHAR(100),
    customer_tier       VARCHAR(20),

    -- Guardrail Metrics
    risk_score          FLOAT,
    profit_score        FLOAT,
    novelty_score       FLOAT,
    complexity_score    FLOAT,
    quality_score       FLOAT,

    -- Sub-metrics (for debugging)
    data_sensitivity    FLOAT,
    financial_impact    FLOAT,
    compliance_risk     FLOAT,

    -- Decisions
    guardrail_decision  VARCHAR(20),  -- 'pass' or 'fail'
    existing_decision   VARCHAR(20),  -- 'auto_approve' or 'human_review'
    human_decision      VARCHAR(20),  -- 'approve', 'reject', 'modify', NULL

    -- KL Drift Tracking
    kl_divergence       FLOAT,
    drift_status        VARCHAR(20),  -- 'normal', 'warning', 'critical'

    -- Traceability
    param_snapshot_id   VARCHAR(50),  -- 'guardrail-v1.1-freeze'
    baseline_hash       VARCHAR(64),
    embedding_hash      VARCHAR(64)
);

-- Indexes (created separately; inline INDEX clauses are not standard SQL)
CREATE INDEX idx_timestamp ON guardrail_telemetry (timestamp);
CREATE INDEX idx_action_id ON guardrail_telemetry (action_id);
CREATE INDEX idx_drift_status ON guardrail_telemetry (drift_status);

Dashboard Queries:

-- Agreement Rate: How often does guardrail match existing system?
SELECT
    CAST(timestamp AS DATE) AS date,
    COUNT(*) as total,
    SUM(CASE WHEN guardrail_decision = 'fail'
             AND existing_decision = 'human_review' THEN 1 ELSE 0 END) as both_flagged,
    SUM(CASE WHEN guardrail_decision = 'pass'
             AND existing_decision = 'auto_approve' THEN 1 ELSE 0 END) as both_passed,
    SUM(CASE WHEN guardrail_decision = 'fail'
             AND existing_decision = 'auto_approve' THEN 1 ELSE 0 END) as guardrail_stricter,
    SUM(CASE WHEN guardrail_decision = 'pass'
             AND existing_decision = 'human_review' THEN 1 ELSE 0 END) as guardrail_lenient
FROM guardrail_telemetry
WHERE timestamp >= CURRENT_DATE - INTERVAL '7 days'
GROUP BY CAST(timestamp AS DATE)
ORDER BY date;

-- False Negative Detection: Actions guardrail passed but humans rejected
SELECT *
FROM guardrail_telemetry
WHERE guardrail_decision = 'pass'
  AND human_decision = 'reject'
  AND timestamp >= CURRENT_DATE - INTERVAL '30 days';

Step 5: Integration Code

Shadow Service Wrapper:

from dataclasses import dataclass
from datetime import datetime
import hashlib
import json

@dataclass
class AgentAction:
    action_id: str
    action_type: str
    query_category: str
    customer_tier: str
    proposed_response: str
    context: dict

class ShadowGuardrail:
    def __init__(self, config):
        self.param_snapshot_id = config['param_snapshot_id']
        self.baseline = self.load_baseline(config['baseline_path'])
        # TelemetryClient: project-specific async writer (implementation not shown)
        self.telemetry_client = TelemetryClient(config['telemetry_endpoint'])

    def evaluate(self, action: AgentAction) -> dict:
        """
        Evaluate action through guardrail framework.
        Returns decision and metrics (does NOT block action).
        """
        start_time = datetime.now()

        # Compute all metrics
        risk_score = self.compute_risk_score(action)
        profit_score = self.compute_profit_score(action)
        novelty_score = self.compute_novelty_score(action)
        complexity_score = self.compute_complexity_score(action)
        quality_score = self.compute_quality_score(action)

        # Apply guardrail logic
        guardrail_decision = self.apply_thresholds(
            risk_score, profit_score, novelty_score,
            complexity_score, quality_score
        )

        # Compute KL divergence for drift detection
        kl_divergence = self.compute_kl_divergence(risk_score)
        drift_status = self.get_drift_status(kl_divergence)

        # Build telemetry record
        telemetry = {
            "action_id": action.action_id,
            "timestamp": datetime.now().isoformat(),
            "action_type": action.action_type,
            "query_category": action.query_category,
            "customer_tier": action.customer_tier,
            "risk_score": risk_score,
            "profit_score": profit_score,
            "novelty_score": novelty_score,
            "complexity_score": complexity_score,
            "quality_score": quality_score,
            "guardrail_decision": guardrail_decision,
            "kl_divergence": kl_divergence,
            "drift_status": drift_status,
            "param_snapshot_id": self.param_snapshot_id,
            "baseline_hash": self.baseline['sha256_hash'],
            "latency_ms": (datetime.now() - start_time).total_seconds() * 1000
        }

        # Log telemetry (async, non-blocking)
        self.telemetry_client.log_async(telemetry)

        return {
            "decision": guardrail_decision,
            "metrics": telemetry
        }

    def compute_risk_score(self, action: AgentAction) -> float:
        # Implementation based on proposal schema
        data_sensitivity = self.score_data_sensitivity(action)
        financial_impact = self.score_financial_impact(action)
        compliance_risk = self.score_compliance_risk(action)

        # Weighted combination
        weights = [0.4, 0.35, 0.25]
        raw_score = (
            weights[0] * data_sensitivity +
            weights[1] * financial_impact +
            weights[2] * compliance_risk
        )

        # Normalize relative to baseline
        baseline_mean = self.baseline['distributions']['risk_scores']['mean']
        epsilon = 0.01
        normalized = (raw_score - baseline_mean) / max(baseline_mean, epsilon)

        return max(0, min(1, (normalized + 1) / 2))  # Scale to 0-1

    # ... other scoring methods ...


# Integration point: wrap existing decision function
def process_agent_action(action: AgentAction):
    """
    Main processing function - integrates shadow guardrail.
    """
    # 1. Run shadow guardrail (non-blocking)
    shadow_result = shadow_guardrail.evaluate(action)

    # 2. Run existing system (this is what actually decides)
    existing_decision = existing_risk_check(action)

    # 3. Log comparison (for calibration analysis)
    shadow_guardrail.telemetry_client.log_comparison(
        action_id=action.action_id,
        guardrail_decision=shadow_result['decision'],
        existing_decision=existing_decision
    )

    # 4. Return existing system's decision (shadow doesn't affect outcome)
    return existing_decision

Step 6: Day 1 Validation

Checklist for FinServe Inc.:

Check                      | Status                  | Notes
First action logged        | Verify in telemetry DB  | Should see within minutes of deployment
All fields populated       | Query for NULL counts   | SELECT COUNT(*) FROM guardrail_telemetry WHERE risk_score IS NULL
Latency acceptable         | Check p95 < 100ms       | Monitor service metrics
No production impact       | Compare response times  | Before/after shadow deployment
Existing system unaffected | Verify decision rates   | Same approval/rejection ratio
Dashboard working          | View live data          | Grafana/similar shows telemetry

Week 1 Analysis Questions:

  1. Agreement Rate: How often do guardrail and existing system agree?
     Target: > 80% initially (will improve with calibration)

  2. Stricter Cases: When guardrail would reject but existing system approves.
     Review these: Are they actually risky?

  3. Lenient Cases: When guardrail would pass but existing system flags.
     Review these: Is existing system too conservative?

  4. Novelty Correlation: Do high-novelty actions correlate with human rejections?
     Validates novelty scoring.

  5. Drift Baseline: Is KL divergence stable?
     Should be near 0 in first week (no drift from fresh baseline)

Appendix: Common Pitfalls

Pitfall 1: No Existing System

Problem: "We don't have an existing decision system - we want the guardrail to BE the system."

Solution: You still need a comparison point for calibration. Options:

  • Use human review as the baseline (all proposals go to humans initially)
  • Use a simple rules-based system as the baseline
  • Accept that calibration will take longer without comparison data

Pitfall 2: Insufficient Historical Data

Problem: "We don't have 30 days of historical proposals."

Solution: Options in order of preference:

  1. Wait and collect data (delays deployment)
  2. Use a synthetic baseline based on domain expertise (document assumptions)
  3. Start with conservative thresholds and calibrate as data accumulates

Pitfall 3: Schema Mismatch

Problem: "Our proposals don't have all the fields the framework expects."

Solution:

  • Map existing fields to framework inputs (may require derived calculations)
  • For missing inputs, use conservative defaults or disable that guardrail dimension
  • Document gaps for future iteration

Pitfall 4: Integration Complexity

Problem: "We can't easily hook into our existing decision system."

Solution:

  • Consider async integration (read from the decision log rather than inline)
  • Start with batch processing (daily analysis) before real-time
  • Build the comparison manually initially to prove value


References

  • Implementation Plan: Hardened_Guardrail_Framework_Implementation_Plan.md (Phase 1)
  • Interface Contract: Guardrail_v1.1.1_Interface_Contract_and_Addenda.md (Telemetry Schema)
  • Baseline requirements: Interface Contract §baseline_feed_hash