AEGIS CLAUDE.md Changelog

Relocated from: CLAUDE.md Section 9 (Change Management)
Purpose: Preserve full version history while keeping CLAUDE.md concise for agent system prompt consumption
Coverage: v2.1.0 (2025-12-26) through v4.5.59 (2026-02-25)
Active CLAUDE.md: See /CLAUDE.md for current rolling changelog (latest 2 versions)

v4.5.59 — Utility Scale & Sign Convention Fixes (2026-02-25)

Scope: Lambda auto-utility scale mismatch and sign convention fixes, E2E validation
Test Coverage: 3041 tests passing, ~94.9% coverage (2 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

Complexity tax scale fix: Zeroed ComplexityBreakdown(static=0.0, dynamic=0.0) — phi_S=100.0 $/pt coefficient overwhelmed normalized [0,1] profit deltas (LCB=-64.9 for a 0.10 profit improvement)
Risk term sign fix: Zeroed risk delta via _make_pert(0.0) — risk_term = kappa * delta_R produces negative values when risk decreases (delta_R < 0), counteracting the profit gain (LCB=-0.015 for a beneficial proposal)
Design rationale: Risk and complexity gates evaluate independently; the utility gate now measures pure profit uplift, avoiding scale mismatches between normalized advisor inputs and dollar-denominated calculator coefficients
E2E validation: 8 API scenarios via browser fetch (PROCEED, PAUSE, HALT, multi-domain) + 3 full wizard walkthroughs (Engineering low-impact, Engineering worst-case, Life Decision) confirm correct behavior

Deployed

Lambda redeployed via aegis-deploy.yml workflow_dispatch (dev stage)
CI green: Python CI + docs deploy + CDK deploy + smoke test all passing

v4.5.58 — Advisor Utility & Novelty Gate Fixes (2026-02-25)

Scope: Lambda auto-utility computation, advisor novelty step reframe
Test Coverage: 3041 tests passing, ~94.9% coverage (2 skipped, +12 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

Lambda auto-utility: Added _make_pert() and _compute_utility() — synthesizes PERT three-point estimates from flat risk/profit/complexity parameters, auto-computes UtilityResult when not explicitly provided. Utility gate now produces real values instead of N/A for advisor proposals.
Advisor novelty reframe: Step 7 changed from "How new is this?" to "How well-documented is this type of change?" — inverted value mapping so well-documented precedent = high score = passes gate. Radio values: 0.95/0.88/0.72/0.40 (was 0.2/0.5/0.7/0.95). Tool modifiers: +0.02/-0.03/-0.08 (was 0/+0.1/+0.2).
Gate explanations: Novelty pass/fail messages updated to match precedent framing
Review screen: Label changed from "Novelty" to "Precedent"
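
The auto-utility entry above can be pictured with a minimal sketch. The names `ThreePoint` and `make_pert` and the 20% relative spread are hypothetical stand-ins, since this changelog does not show the real `_make_pert()` signature or spread. Only the PERT mean formula, (low + 4 × mode + high) / 6, is standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreePoint:
    """Hypothetical three-point (PERT) estimate: optimistic, most likely, pessimistic."""

    low: float
    mode: float
    high: float

    @property
    def mean(self) -> float:
        # Standard PERT mean: the mode is weighted four times.
        return (self.low + 4.0 * self.mode + self.high) / 6.0


def make_pert(value: float, spread: float = 0.2) -> ThreePoint:
    """Synthesize a symmetric three-point estimate around a flat value.

    The 20% relative spread is an assumption made for illustration only.
    """
    half_width = abs(value) * spread
    return ThreePoint(low=value - half_width, mode=value, high=value + half_width)


profit = make_pert(0.10)
zero_delta = make_pert(0.0)  # a zeroed delta collapses to a single point
```

Because the spread is relative to the flat value, a zeroed input yields a degenerate estimate with no width, which is one way a term could be neutralized without removing it from the calculation.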

v4.5.57 — Advisor & Lambda CORS Fixes (2026-02-25)

Scope: Advisor evaluate button fix, Lambda CORS headers, custom domain migration
Test Coverage: 3029 tests passing, ~94.8% coverage (2 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

Advisor evaluate() DOM collision: Renamed evaluate() to runEvaluation() — inline onclick handler resolved to document.evaluate() (XPath), not the custom function (3 call sites)
Lambda CORS headers: Added Access-Control-Allow-Origin: *, Allow-Headers, Allow-Methods to _response() — API Gateway proxy integration passes Lambda response as-is, so CORS headers must be in Lambda response (not just OPTIONS preflight)
Custom domain migration: Updated all undercurrentai.github.io URLs to aegis.undercurrentholdings.com in advisor HTML (3 links), pyproject.toml Documentation URL, and docs/api/rest.md CORS headers table
CDK deploy: Redeployed Lambda via aegis-deploy.yml workflow — smoke test passed
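
The CORS entry above relies on the fact that, with API Gateway proxy integration, whatever the Lambda returns is what the browser sees. A minimal sketch of a response builder follows. `build_response` is a hypothetical name (the changelog only names `_response()`), and the `Allow-Headers` and `Allow-Methods` values are assumptions, since the source lists the header names but not their values.

```python
import json
from typing import Any, Dict

# Attached to every response, not only the OPTIONS preflight.
# Values other than Allow-Origin are assumptions for illustration.
CORS_HEADERS: Dict[str, str] = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Content-Type,X-Api-Key",
    "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
}


def build_response(status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build a proxy-integration response dict that carries CORS headers."""
    headers = {"Content-Type": "application/json", **CORS_HEADERS}
    return {
        "statusCode": status_code,
        "headers": headers,
        "body": json.dumps(body),  # proxy integration expects a string body
    }


resp = build_response(200, {"status": "ok"})
```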

v4.5.56 — Bug Hunt #45 (2026-02-25)

Scope: Hybrid bug hunt (Codex gpt-5.3-codex xhigh + 3 Claude sweep agents)
Test Coverage: 3029 tests passing, ~94.8% coverage (2 skipped, +31 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

BH45-Codex-M1: ProposalWorkflow.transition_to() metadata stored by reference — copy.deepcopy fix prevents caller mutation corrupting audit trail
BH45-M1: MCP risk_score eager evaluation — conditional fallback matching Lambda BH13-M1 fix (transport parity)
BH45-M2: BayesianPosterior.update_prior missing current_mean/current_std finiteness validation — added isfinite + validate_positive
BH45-T1: BayesianPosterior.update_prior current_mean missing bool guard (ultrathink finding) — added isinstance(bool) check
BH45-L1: PipelineConfig missing retention_days/drift_window_size validation — added type/range checks matching buffer_size pattern
BH45-L2: PipelineConfig pii_encryption_on_error/storage_on_error accept arbitrary strings — added enum validation (PII safety)
Deferred: 3 LOW findings (drift next_steps recomputation, schema_revisions counter, consensus QUORUM_MET semantics)
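
The stored-by-reference problem in BH45-Codex-M1 can be reproduced in a few lines. The class below is a hypothetical stand-in, not the real `ProposalWorkflow`; it only shows why a deep copy at transition time protects a recorded history from later caller mutation.

```python
import copy
from typing import Any, Dict, List, Optional


class WorkflowSketch:
    """Hypothetical workflow that records each state transition."""

    def __init__(self) -> None:
        self.history: List[Dict[str, Any]] = []

    def transition_to(
        self, state: str, metadata: Optional[Dict[str, Any]] = None
    ) -> None:
        # Deep-copy so later caller mutations, including nested values,
        # cannot rewrite what was recorded at transition time.
        snapshot = copy.deepcopy(metadata) if metadata is not None else {}
        self.history.append({"state": state, "metadata": snapshot})


wf = WorkflowSketch()
meta = {"approver": "alice", "tags": ["urgent"]}
wf.transition_to("approved", meta)
meta["tags"].append("tampered")  # caller mutates its own dict afterwards
```

A shallow `dict(metadata)` would not be enough here, because the nested `tags` list would still be shared with the caller.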

v4.5.55 — Scoring Guide MCP Tool + Advisor v2 (2026-02-25)

Scope: Scoring Guide MCP tool and Advisor v2 rewrite
Test Coverage: 2998 tests passing, ~94.8% coverage (2 skipped, +31 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

Scoring Guide MCP Tool: New aegis_get_scoring_guide tool with 5-domain derivation guidance (trading, cicd, moderation, agents, generic); surfaces parameter formulas, range guides, common mistakes, and worked examples through the MCP protocol
Advisor v2: Complete rewrite with domain funnel (6 domains), 8-step factual scoring rubric replacing vibes-based sliders, real API calls with provisioned demo key
Ultrathink Fixes: Defensive copy for get_scoring_guide() return value, HTML escaping for gate detail values, null guard for override_requires

v4.5.54 — SaaS Commercialization Sprint (2026-02-24)

Scope: Full commercial readiness transformation — API key auth, customer provisioning, docs site, PyPI workflows
Test Coverage: 2967 tests passing, ~94.8% coverage (2 skipped, +9 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

Phase 1 — API Key Auth: CDK lambda_stack.py switched from IAM to API key auth + usage plans; per-stage throttling (dev 50 req/s, staging 200 req/s, prod 500 req/s); /health fully public; CORS widened to *; optional custom domain via CDK context; UsagePlanId CfnOutput added
Phase 1 — Tenant Context: Lambda handler extracts tenant_id from requestContext.identity.apiKeyId; injects _tenant_id and _request_id into response body; adds X-AEGIS-Tenant and X-AEGIS-Request-Id headers; 404 responses include tenant context (UT-1 fix)
Phase 1 — Provisioning: New scripts/provision-customer.py — boto3 script for API key creation + usage plan attachment + usage queries; show-once key value; rollback on failure; env var fallbacks
Phase 2 — OpenAPI: New docs/api/openapi.yaml (OpenAPI 3.1.0) — 3 endpoints, 9 component schemas, ApiKeyAuth security scheme
Phase 2 — REST Docs: New docs/getting-started/quickstart-rest.md and docs/api/rest.md — curl examples, field tables, error codes
Phase 3 — Docs Site: New mkdocs.yml + 10 new docs pages (index, installation, SDK/CLI/REST/MCP quickstarts, onboarding, GitHub Action, AI governance); docs-deploy.yml GitHub Pages workflow
Phase 3 — PyPI: New .github/workflows/pypi-publish.yml — OIDC trusted publishing; multi-version smoke test (3.9, 3.11, 3.12)
Phase 4 — Polish: New SECURITY.md (vulnerability disclosure), CHANGELOG.md (customer-facing); pyproject.toml bumped to v1.1.0 with mkdocs-minify-plugin + pymdown-extensions docs deps; Documentation + Changelog URLs added
New files: 22 files created (scripts, docs, workflows, config)
Governance invariants preserved: src/engine/, src/integration/pcw_decide.py, src/crypto/, schema/interface-contract.yaml, schema/rbac-definitions.yaml, .github/workflows/python-ci.yml all untouched

v4.5.53 — Transport Parity Fix (2026-02-24)

Scope: Comprehensive transport parity audit closing 15 of 22 gaps across CLI, MCP, and Lambda
Test Coverage: 2958 tests passing, ~94.8% coverage (2 skipped, +35 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

GAP 1: metadata input extraction + validation in CLI and MCP (parity with Lambda)
GAP 2-4 (CRITICAL): requires_human_approval, time_sensitive, reversible bool flags in MCP — previously missing, silently bypassing human oversight controls
GAP 6-7: MCP inputSchema updated with 5 new documented properties (bool flags, metadata, session_id)
GAP 8: Lambda telemetry_emitter wired via _wire_lambda_telemetry() helper
GAP 12: MCP estimated_impact now strict — rejects non-string values (parity with CLI/Lambda)
GAP 15: CLI session_id changed from static "cli-session" to dynamic uuid.uuid4()
GAP 17: CLI SSRF validation via shared telemetry/url_validation.py module
GAP 18-19: MCP output now includes constraints and override_requires fields
GAP 20: MCP output now includes timestamp field
GAP 21: MCP output now includes per-gate confidence field
GAP 22: Lambda shadow drift dict includes message field

New Module

src/telemetry/url_validation.py — shared SSRF-safe URL validation extracted from MCP server; uses resolve-then-validate pattern with not addr.is_global (consistent across Python 3.9-3.12)
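
The validate half of the resolve-then-validate pattern can be sketched with the standard `ipaddress` module. The function names below are hypothetical and not taken from `url_validation.py`; the resolved addresses are passed in as literals so the sketch stays offline, whereas a real caller would presumably obtain them from DNS resolution first.

```python
import ipaddress
from typing import Iterable


def is_safe_address(ip_text: str) -> bool:
    """Return True only for globally routable addresses."""
    addr = ipaddress.ip_address(ip_text)
    # Rejecting `not addr.is_global` covers loopback, private,
    # and link-local ranges with a single check.
    return addr.is_global


def all_resolved_addresses_safe(resolved: Iterable[str]) -> bool:
    """Require every address a hostname resolved to be global.

    An empty resolution result is rejected as well (fail closed).
    """
    addresses = list(resolved)
    return bool(addresses) and all(is_safe_address(a) for a in addresses)
```

Checking every resolved address, rather than only the first, matters because a hostname may resolve to a mix of public and internal addresses.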

v4.5.52 — Bug Hunt #44 (2026-02-23)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2923 tests passing, ~94.8% coverage (2 skipped, +15 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (4 total: 1 Codex + 2M, 1L)

BH44-Codex-M1 (Codex) — crypto/schema_signer.py: sign_tools_list() chain state committed before manifest signing — partial state corruption on failure; reordered to sign manifest before committing chain state
BH44-M1 (Claude) — actors/calibrator.py: _NONNEGATIVE_GATE_PARAMS incorrectly includes utility_threshold — rejects negative values that GateEvaluator accepts; moved to appropriate parameter set
BH44-M2 (Claude) — actors/proposer.py: create_draft() does not catch TypeError from _validate_pert_estimates() — raw exception escapes instead of ActionResult failure; added try/except TypeError wrapper
BH44-L1 (Claude) — integration/pcw_decide.py: _evaluate_drift_policy returns aliased constraints list when drift_monitor is None — inconsistent with non-None path which returns a new list; changed to return list(constraints) defensive copy
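
The ordering fix in BH44-Codex-M1 follows a general rule: do the work that can fail first, and commit shared state only afterwards. The sketch below is a hypothetical illustration of that rule and is not the real `schema_signer.py`; the signing function is injected so a failure can be simulated.

```python
from typing import Callable, Dict, List


class ChainSignerSketch:
    """Hypothetical digest chain with an injectable signing function."""

    def __init__(self, sign: Callable[[str], str]) -> None:
        self._sign = sign
        self._prev_digests: Dict[str, str] = {}

    def sign_all(self, statements: Dict[str, str]) -> List[str]:
        pending: Dict[str, str] = {}
        signatures: List[str] = []
        for name, digest in statements.items():
            # An exception here leaves self._prev_digests untouched.
            signatures.append(self._sign(digest))
            pending[name] = digest
        # Commit chain state only after every signature succeeded.
        self._prev_digests.update(pending)
        return signatures


def flaky_sign(digest: str) -> str:
    """Simulated signer that fails on one specific input."""
    if digest == "bad":
        raise ValueError("signing failed")
    return "sig:" + digest


signer = ChainSignerSketch(flaky_sign)
try:
    signer.sign_all({"tool_a": "d1", "tool_b": "bad"})
except ValueError:
    pass
state_after_failure = dict(signer._prev_digests)
signer.sign_all({"tool_a": "d1"})
```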

v4.5.51 — Bug Hunt #43 (2026-02-23)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2908 tests passing, ~94.8% coverage (2 skipped, +31 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (11 total: 2 Codex + 5M, 4L)

BH43-Codex-M1 (Codex) — actors/analyst.py: Analyst missing try/except around gate evaluations — unhandled exceptions crash instead of returning ActionResult failure; wrapped gate evaluation calls in try/except with structured error return
BH43-Codex-M2 (Codex) — actors/analyst.py: Analyst missing else TypeError for non-list quality_subscores — silently ignores invalid type; added explicit isinstance(list) guard with TypeError for non-list/non-None input
BH43-M1 (Claude) — cli.py: CLI quality_subscores=null raises TypeError — Lambda/MCP default to [0.7, 0.7, 0.7] but CLI crashes on None; added null-coalesce parity with other transports
BH43-M2 (Claude) — engine/utility.py: ComplexityBreakdown accepts bool fields without validation — True/False silently coerce to 1/0 via bool-is-int; added isinstance(bool) guard in __post_init__
BH43-M3 (Claude) — engine/utility.py: UtilityCalculator.calculate() value_variance negative silently floored to 0 — produces overly optimistic LCB; changed to raise ValueError for negative variance
BH43-M4+M5 (Claude) — telemetry/pipeline.py: TelemetryPipeline.ingest() no defensive copy — bidirectional aliasing between caller dict and internal buffer; caller mutations corrupt queued events; added copy.copy() on ingest
BH43-L1 (Claude) — cli.py: CLI metric defaults ignores simplified alias when canonical key is null — dict.get("key", default) returns None (not default) for explicit JSON null; added explicit null-check before alias fallback
BH43-L2 (Claude) — engine/utility.py: UtilityCalculator.calculate() value_low_conf/opex_delta NaN/Inf not validated — non-finite values propagate through utility computation; added math.isfinite() guards
BH43-L3 (Claude) — engine/utility.py: UtilityCalculator.calculate() covariance terms NaN/Inf not validated — same pattern as L2; added math.isfinite() guards for cov_risk_value and cov_complexity_quality
BH43-L4 (Claude) — workflows/proposal.py: ProposalWorkflow.from_dict() uses cls() instead of cls.__new__() — constructor generates spurious created_at/updated_at timestamps that overwrite deserialized values; switched to cls.__new__(cls) pattern (matching ConsensusWorkflow)
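
The `dict.get` behavior behind BH43-M1 and BH43-L1 is easy to reproduce: the default applies only when the key is absent, not when it is present with a JSON `null`. The helper name `get_or_default` is hypothetical; the `[0.7, 0.7, 0.7]` default is the one quoted in BH43-M1.

```python
import json
from typing import Any, Dict, List

DEFAULT_SUBSCORES: List[float] = [0.7, 0.7, 0.7]


def get_or_default(payload: Dict[str, Any], key: str, default: Any) -> Any:
    """Treat a missing key and an explicit JSON null the same way."""
    value = payload.get(key)
    return default if value is None else value


payload = json.loads('{"quality_subscores": null}')

# The key exists, so dict.get ignores the default and returns None.
naive = payload.get("quality_subscores", DEFAULT_SUBSCORES)

# The explicit None check restores the intended fallback.
safe = get_or_default(payload, "quality_subscores", DEFAULT_SUBSCORES)
```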

Quality Gate — Ultrathink Findings (1)

QG-T1 — workflows/proposal.py: ProposalWorkflow.from_dict() missing evaluation_result=None for workflows that had no evaluation — deserialized workflows without evaluation results got stale/incorrect default; added explicit None assignment
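
The two `from_dict()` findings (BH43-L4 and QG-T1) combine into one rule: when bypassing the constructor with `cls.__new__(cls)`, every attribute must be assigned explicitly, including those that default to `None`. The class below is a hypothetical sketch, not the real `ProposalWorkflow`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


class ProposalSketch:
    """Hypothetical workflow whose constructor stamps the current time."""

    def __init__(self, proposal_id: str) -> None:
        self.proposal_id = proposal_id
        self.created_at = datetime.now(timezone.utc).isoformat()
        self.evaluation_result: Optional[Dict[str, Any]] = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ProposalSketch":
        # Bypass __init__ so no fresh timestamp is generated, then set
        # every attribute explicitly, including ones that may be absent.
        obj = cls.__new__(cls)
        obj.proposal_id = data["proposal_id"]
        obj.created_at = data["created_at"]
        obj.evaluation_result = data.get("evaluation_result")  # None if never evaluated
        return obj


stored = {"proposal_id": "p-1", "created_at": "2026-01-01T00:00:00+00:00"}
restored = ProposalSketch.from_dict(stored)
```

The trade-off of this pattern is that any attribute added to `__init__` later must also be added to `from_dict()`, or the restored object will be missing it.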

v4.5.50 — Bug Hunt #42 (2026-02-23)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2877 tests passing, 94.81% coverage (2 skipped, +29 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (13 total: 3 Codex + 6M, 2L + 2 ultrathink)

BH42-M1 (Claude) — engine/complexity.py: ComplexityDecomposer mutable default weights dict — self.weights = weights or self.DEFAULT_WEIGHTS created shared reference; mutation of instance weights corrupted class-level DEFAULT_WEIGHTS for all future instances; changed to dict(weights) if weights else dict(self.DEFAULT_WEIGHTS) (defensive copy)
BH42-M2 (Claude) — actors/calibrator.py: novelty_k was in _NONZERO_GATE_PARAMS (rejects ==0) but GateEvaluator requires strictly positive; negative values trivially bypassed novelty gate; moved to _POSITIVE_GATE_PARAMS
BH42-M3 (Claude) — telemetry/prometheus_exporter.py: NaN/Inf latency permanently corrupts Prometheus histogram _sum (irreversible without process restart); added math.isfinite() guards to record_gate_evaluation(), record_decision(), and MetricsTimer.__exit__()
BH42-M4 (Claude) — telemetry/prometheus_exporter.py: NaN KL divergence set via set_kl_divergence() disables drift alerting permanently; added math.isfinite() guard with warning log
BH42-M5 (Claude) — telemetry/emitter.py: root emit() method used correlation_id or self.default_correlation_id — empty string "" (valid sentinel) replaced by default; changed to is not None pattern for both correlation_id and parent_event_id
BH42-M6 (Claude) — lambda_handler.py: shadow_mode = body.get("shadow_mode") is True silently ignored non-bool values (int 1, string "true"); added explicit isinstance(bool) validation matching other bool fields (requires_human_approval, reversible)
BH42-L1 (Claude) — integration/pcw_decide.py: risk_posterior = risk_gate.posterior_probability or 0.0 — zero posterior probability treated as falsy and replaced; changed to if is not None else 0.0
BH42-L2 (Claude) — integration/afa_bridge.py: same posterior_probability or 0.0 pattern as BH42-L1; fixed to is not None
BH42-Codex-M1 (Codex) — integration/afa_bridge.py: required = context.get("required_authorizations") or [] — explicit falsy values (empty list, 0) coerced to default; changed to explicit None check
BH42-Codex-M2 (Codex) — workflows/consensus.py: ConsensusConfig.allow_abstain accepted non-bool via coercion (1, "yes"); added isinstance(self.allow_abstain, bool) guard in __post_init__
BH42-Codex-L1 (Codex) — workflows/persistence/repository.py: concurrent save_checkpoint() calls caused IntegrityError on checkpoint_number uniqueness collision; added bounded retry loop (3 attempts) with _is_checkpoint_number_conflict() classifier
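
The shared-reference bug in BH42-M1 comes from assigning a class-level dict directly to an instance. The sketch below uses the fix expression quoted in that entry; the class name and the weight keys are hypothetical.

```python
from typing import Dict, Optional


class DecomposerSketch:
    """Hypothetical stand-in showing the defensive-copy fix."""

    DEFAULT_WEIGHTS: Dict[str, float] = {"static": 0.5, "dynamic": 0.5}

    def __init__(self, weights: Optional[Dict[str, float]] = None) -> None:
        # Copy in both branches so neither the caller's dict nor the
        # class-level default is shared with this instance.
        self.weights = dict(weights) if weights else dict(self.DEFAULT_WEIGHTS)


first = DecomposerSketch()
first.weights["static"] = 0.9  # mutate one instance only
second = DecomposerSketch()
```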

Quality Gate — Ultrathink Findings (2)

QG-T1 — aegis_governance/mcp_server.py: MCP _tool_evaluate_proposal() used shadow_mode = arguments.get("shadow_mode") is True — transport parity violation with Lambda; added isinstance(bool) validation returning error dict
QG-T2 — actors/analyst.py: 6 instances of gate_result.confidence or 0.0 — confidence=0.0 treated as falsy; replaced all with if gate_result.confidence is not None else 0.0
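
Several entries in this release replace an `or` fallback with an explicit `is not None` check. The difference shows up whenever a falsy value is meaningful. The helper name `coalesce` is hypothetical, and the non-zero default `0.5` is chosen only to make the numeric case visible.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def coalesce(value: Optional[T], default: T) -> T:
    """Fall back to default only when value is None, never for other falsy values."""
    return value if value is not None else default


# An empty string can be a valid sentinel; `or` silently replaces it.
with_or = "" or "default-correlation-id"
with_none_check = coalesce("", "default-correlation-id")

# Zero can be a valid probability; `or` with a non-zero default hides it.
zero_with_or = 0.0 or 0.5
zero_with_none_check = coalesce(0.0, 0.5)
```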

v4.5.49 — Bug Hunt #41 (2026-02-22)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2848 tests passing, 94.82% coverage (2 skipped, +33 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (7 total: 1 Codex + 4M, 2L)

BH41-M1 (Claude) — actors/analyst.py: _run_quality_analysis() included None elements in quality_subscores without saw_non_null guard; [None, 0.8] silently dropped None but [None, None] defaulted inconsistently; added saw_non_null filter matching afa_bridge pattern so all-None list defaults and partially-valid list filters correctly
BH41-M2 (Claude) — engine/validation.py: validate_range() default check_nan=False — NaN inputs silently passed range checks; changed default to True; all production callers already passed check_nan=True explicitly, so no behavior change for existing call sites
BH41-M3 (Claude) — crypto/schema_signer.py: create_tool_statement() mutated _prev_digests on each call — partial sign_statement() failure left chain in inconsistent state; sign_tools_list() now collects pending_digests first, commits atomically only after all sign_statement() calls succeed
BH41-M4 (Claude) — workflows/consensus.py: get_required_missing() included DEFER voters in voted_roles accumulation — a single DeferredVoter trivially satisfied role coverage for any role; DEFER now excluded consistent with votes_cast exclusion
BH41-L1 (Claude) — actors/calibrator.py: list_proposals() iterated live proposal dict while serializing enum .value — concurrent propose() mutations caused AttributeError on partially-mutated proposals; snapshot status enum values under self._lock
BH41-L2 (Claude) — telemetry/emitter.py: emit_mcp_invocation() used correlation_id or self.default_correlation_id — empty-string "" (valid sentinel) silently replaced with default; changed to Optional[str] = None with explicit is not None guard
BH41-Codex (Codex) — engine/complexity.py: ComplexityDecomposer.decompose() accepted bool as complexity_floor — True/False passed validate_range numeric check via bool-is-int coercion; added isinstance(complexity_floor, bool) guard with TypeError before validate_range
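
The bool-is-int coercion noted in BH41-Codex recurs throughout this changelog. A minimal sketch of the guard follows; `validate_unit_interval` is a hypothetical function, not the project's `validate_range`, and the [0, 1] bounds are chosen for illustration.

```python
import math


def validate_unit_interval(value: object, name: str) -> float:
    """Accept real numbers in [0, 1]; reject bools, NaN, and out-of-range values."""
    # bool is a subclass of int, so it must be rejected before the numeric check.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"{name} must be a number, got {type(value).__name__}")
    if math.isnan(value) or not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be within [0, 1], got {value!r}")
    return float(value)
```

Without the first check, `True` would pass the range test because `True == 1` in Python, and a flag could silently become a threshold.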

Quality Gate Fixes (Phase 1 — Verify)

ruff B017 — tests/test_bug_hunt_41.py lines 189, 205: pytest.raises(Exception) narrowed to pytest.raises(ValueError) (invalid Ed25519 key raises ValueError)
black — Auto-formatted 5 files: test_bug_hunt_41.py, calibrator.py, emitter.py, test_schema_signer.py, test_engine.py
mypy attr-defined — calibrator.py:847: added proposals_snapshot: list[dict[str, Any]] = [] type annotation to fix "object" has no attribute "value" inference error

v4.5.48 — Bug Hunt #40 (2026-02-22)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2815 tests passing, 94.78% coverage (2 skipped, +40 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (9 total: 4M, 5L)

BH40-M1 (Codex+Claude) — integration/afa_bridge.py: _validate_quality_subscores() failed to distinguish all-null list [None,None] from explicit empty list []; both defaulted to [0.7,0.7,0.7], bypassing quality gate fail-closed behavior; added saw_non_null tracker so [] returns [] (fail-closed) while all-null correctly defaults
BH40-M2 (Both) — telemetry/emitter.py: BatchHTTPSink.stop() read self._thread outside the lock — race with concurrent start(); extract ref + clear under lock, join outside lock (same pattern as BH38-M4, BH39-M1)
BH40-M3 (Claude) — engine/validation.py: validate_normalized() missing isinstance(value, bool) guard; True/False passed [0,1] range check via bool-is-int coercion
BH40-M4 (Claude) — config.py: _parse_mcp_rate_limit checked isinstance(value, float) on original value; string "3.5" bypassed check and was silently truncated to 3 via int(); fix: convert first, then check fv.is_integer()
BH40-L1 (Claude) — engine/gates.py: GateEvaluator accepted negative values for novelty_N0, novelty_threshold, complexity_floor, quality_min_score; negative thresholds trivially disable governance gates; added non-negativity guard
BH40-L2 (Claude) — config.py: _parse_kl_drift_dict window_days: same string-fractional truncation pattern as BH40-M4; separated ValueError (non-finite) from TypeError (fractional) to preserve existing test semantics
BH40-L3 (Claude) — aegis_governance/mcp_server.py: stdio size guard used len(line) (Unicode code-point count) instead of len(line.encode("utf-8")) (byte count); multi-byte characters allowed oversized requests through
BH40-L4 (Claude) — integration/afa_bridge.py: get_decision_history() used truthy if agent_id: — empty string "" was treated as no-filter, bypassing agent_id filtering; fixed to if agent_id is not None:
BH40-L5 (Claude) — telemetry/encryption.py: DEKRotator.get_decryptor() and list_versions() read self._deks without self._lock; only write paths (BH39-M2) had been protected; added lock acquisition for all read paths
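
The "convert first, then check" fix from BH40-M4 and BH40-L2 can be sketched as below. `parse_whole_number` is a hypothetical helper, not the project's `_parse_mcp_rate_limit`. The split between `ValueError` for non-finite input and `TypeError` for fractional input mirrors the wording of BH40-L2, but the exact exception types used for other cases are assumptions.

```python
import math


def parse_whole_number(value: object, name: str) -> int:
    """Convert to float first, then reject fractions, so "3.5" fails like 3.5."""
    if isinstance(value, bool):
        raise TypeError(f"{name} must be an integer, not bool")
    try:
        fv = float(value)  # type: ignore[arg-type]
    except (TypeError, ValueError) as exc:
        raise TypeError(f"{name} must be numeric, got {value!r}") from exc
    if not math.isfinite(fv):
        raise ValueError(f"{name} must be finite, got {value!r}")
    if not fv.is_integer():
        raise TypeError(f"{name} must be a whole number, got {value!r}")
    return int(fv)
```

Checking `isinstance(value, float)` on the original input misses the string `"3.5"`, which is why the fractional check has to run on the converted value.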

v4.5.47 — Bug Hunt #39 (2026-02-21)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2775 tests passing, 94.77% coverage (2 skipped, +54 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (13 total: 1H, 6M, 6L)

BH39-H1 (High) — workflows/persistence/models.py: verify_chain_link() chain root forgery — root link was not self-consistency checked, allowing crafted from_link/to_hash chains to pass verification; added root hash self-consistency guard
BH39-M1 (Medium) — telemetry/pipeline.py: TelemetryPipeline.stop() held _lock during thread.join() — blocked concurrent start() for up to N seconds; lock released before join
BH39-M2 (Medium) — telemetry/encryption.py: DEKRotator.generate_dek_version() TOCTOU race — version ID computed, then written without holding lock; version now pinned under lock before write
BH39-M3 (Medium) — workflows/persistence/key_store.py: KeyStore audit_lock held during kek.decrypt() (blocking I/O) — lock released before decrypt call
BH39-M4 (Medium) — engine/gates.py: GateEvaluator accepts float('inf') for trigger factors — silently disables Bayesian gate (posterior probability of exceeding infinity is always 0, gate permanently passes); added math.isinf() guard
BH39-M5 (Medium) — engine/utility.py: UtilityResult accepts NaN for variance/raw/lcb — NaN propagates to silent gate failure (NaN > threshold is always False); added __post_init__ NaN rejection; -inf allowed for raw/lcb (signals extremely low utility)
BH39-M6 (Medium) — config.py: _parse_kl_drift_dict silently truncates float window_days (30.7 → 30); reject fractional floats before int() coercion
BH39-L1 (Low) — workflows/consensus.py: ConsensusWorkflow.from_dict() used cls() constructor mid-deserialization — inconsistent intermediate state if constructor validates; switched to cls.__new__(cls) pattern
BH39-L2 (Low) — engine/gates.py: GateEvaluator novelty_k=0.0 makes logistic gate score-insensitive (denominator always ≥1 regardless of score); added > 0 guard with clear error message
BH39-L3 (Low) — aegis_governance/mcp_server.py: JSON-RPC notification with non-string method received error response, violating JSON-RPC 2.0 §4.1; is_notification now computed before method type check so malformed notifications are silently dropped
BH39-L4 (Low) — crypto/bip322_provider.py: encode_simple() crashes with cryptic OverflowError for signatures ≥256 bytes (bytes([256]) overflow); added upfront len(signature) != SIGNATURE_SIZE check with clear ValueError
BH39-L5 (Low) — config.py: _parse_mcp_rate_limit silently truncates float mcp_rate_limit (e.g., 60.9 → 60); reject fractional floats
BH39-Codex-2 (Low) — telemetry/emitter.py: memory_sink accepts maxlen=0 for list sinks — del events[0] on empty list; TelemetryEmitter.emit() swallows sink exceptions → silent telemetry loss; added maxlen ≥ 1 guard for list-backed sinks
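
BH39-M5 rests on an IEEE 754 property: every ordered comparison involving NaN is false, so a NaN never raises an error on its own. The sketch below shows the property and one way to reject NaN up front while still allowing `-inf`, as the entry describes. `passes_utility_gate` is a hypothetical function, not the project's gate evaluator.

```python
import math


def passes_utility_gate(lcb: float, threshold: float) -> bool:
    """Compare a lower confidence bound to a threshold, rejecting NaN explicitly."""
    if math.isnan(lcb) or math.isnan(threshold):
        raise ValueError("utility gate inputs must not be NaN")
    # -inf is allowed: it compares normally and signals extremely low utility.
    return lcb > threshold


nan = float("nan")
nan_greater = nan > 0.0  # False, with no exception raised
nan_less = nan < 2       # also False
```

Which direction the failure takes depends on how the comparison is written, which is why rejecting NaN at construction time is more robust than relying on the comparison result.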

Test Files Added

tests/test_bug_hunt_39.py — 21 tests (BH39-H1 chain root, BH39-M1/M2/M3 concurrency/TOCTOU, BH39-L1 ConsensusWorkflow)
tests/test_bug_hunt_39b.py — 31 tests (BH39-M4/M5/M6/L2/L3/L4/L5)
tests/telemetry/test_emitter.py — +1 test (BH39-Codex-2 maxlen=0)

v4.5.46 (2026-02-21)

Bug Hunt #38 (Hybrid): 6 bugs (1H, 4M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 35 regression tests
BH38-H1: key_store.py uses Python 3.10+ parenthesized async-with syntax — SyntaxError on Python 3.9 (declared minimum); replaced with comma-separated form + # fmt: off guards for black stability
BH38-M1: UtilityCalculator accepts bool for phi_S, phi_D, gamma, kappa, migration_budget (bool-is-int bypass)
BH38-M2: GateEvaluator accepts bool for risk_trigger_factor, profit_trigger_factor, and all threshold params (bool-is-int bypass)
BH38-M3: CalibrationProposal accepts bool for current_value/proposed_value; _validate_gate_param accepts bool without rejection
BH38-M4: MetricsServer.stop() held _lock during thread.join() — blocked concurrent start() for up to 5 seconds; fixed by extracting refs under lock, releasing, then shutdown+join outside
BH38-L1 (Codex): BatchHTTPSink accepts float/non-int for integer params — upgraded bool-only check to full not isinstance(int) or isinstance(bool) pattern
QG-UT1: GateEvaluator(trigger_confidence_prob=True) silently accepted via validate_range inclusive upper bound (True==1.0); added explicit bool guard before validate_range in gates.py
Test Coverage: 2721 tests passing, 94.78% coverage (2 skipped, +36 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
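
The locking fix described in BH38-M4 (extract references under the lock, release, then join) can be sketched as follows. `ServiceSketch` is a hypothetical class, not the real `MetricsServer`. Using a separate stop event per worker thread is a design choice of this sketch, made so that a concurrent `start()` cannot reset the signal an older thread is waiting on.

```python
import threading
from typing import Optional


class ServiceSketch:
    """Hypothetical background service with a non-blocking-lock stop()."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None
        self._stop_event: Optional[threading.Event] = None

    def start(self) -> None:
        with self._lock:
            if self._thread is not None:
                return  # already running
            stop_event = threading.Event()
            # The worker simply blocks until asked to stop.
            thread = threading.Thread(target=stop_event.wait, daemon=True)
            self._thread, self._stop_event = thread, stop_event
            thread.start()

    def stop(self, timeout: float = 5.0) -> None:
        # Take the references and clear shared state while holding the lock.
        with self._lock:
            thread, stop_event = self._thread, self._stop_event
            self._thread = None
            self._stop_event = None
        # Signal and join outside the lock so start() is never blocked by join().
        if thread is not None and stop_event is not None:
            stop_event.set()
            thread.join(timeout)
```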

v4.5.45 (2026-02-20)

Bug Hunt #37 (Hybrid): 6 bugs (3M, 3L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 26 regression tests
BH37-M1: BayesianPosterior compute_posterior/compute_full NaN/Inf validation — added math.isfinite() guards + fail-closed GateResult for non-finite deltas
BH37-M2: emergency_halt() audit trail completeness — terminal overrides now tracked in already_terminal_overrides
BH37-M3: Calibrator novelty_N0 range constraint — added to _NONNEGATIVE_GATE_PARAMS
BH37-L1 (Codex): PipelineConfig accepts float for integer sizing fields
BH37-L2: ThreePointEstimate accepts bool values — added isinstance guard
BH37-L3: DriftMonitor window_days missing integer type check
Test Coverage: 2685 tests passing, 94.76% coverage (2 skipped, +26 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.44 (2026-02-20)

Bug Hunt #36 (Hybrid): 6 bugs (4M, 2L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 17 regression tests

BH36-M1 (Codex): lambda_handler.py — or pattern for estimated_impact/phase bypasses type check for falsy non-string values (False, 0)
BH36-M2: repository.py — mark_completed(final_state="aborted") injects non-enum state into serialized request.state, breaking from_dict() deserialization
BH36-M3: cli.py — or pattern for estimated_impact — same class as M1 (transport parity)
BH36-M4: mcp_server.py — or pattern for estimated_impact — same class as M1 (transport parity)
BH36-L1: complexity.py — compute_complexity_tax missing bool guard for phi_S/phi_D
BH36-L2: lambda_handler.py/cli.py — proposal_summary or pattern accepts falsy non-strings

QG Ultrathink: 2 findings (2L) — Lambda + MCP action_description or "" patterns replaced with type-safe defaults; no new tests (non-governance documentation field)

Test metrics: 2659 tests, 94.74% coverage (2 skipped)

v4.5.43 (2026-02-20)

Bug Hunt #35 (Hybrid): 6 bugs (4M, 2L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 22 regression tests

BH35-M1 (Codex): override.py — check_and_mark_expired() downgrades APPROVED/REJECTED terminal states to EXPIRED after wall-clock expiry
BH35-M2: rbac.py — NO_UNILATERAL_OVERRIDE constraint passes for NaN signer_count (IEEE 754: NaN < 2 is False)
BH35-M3: pipeline.py — PipelineConfig.flush_interval_seconds has no validation (zero/NaN/bool degrades flushing)
BH35-M4: emitter.py — BatchHTTPSink.flush_interval_seconds has no validation (same pattern as M3)
BH35-L1: pipeline.py — PipelineConfig.buffer_size/flush_threshold accept bool (isinstance(True, int) is True)
BH35-L2: encryption.py — DEKCache.ttl_seconds accepts zero/negative/bool without validation

QG Ultrathink (BH35 session): 4 findings (4L) — BatchHTTPSink/HTTPEventSink missing bool guards + timeout/retry_delay validation; +19 regression tests

Test metrics: 2642 tests, 94.79% coverage (0 skipped)

v4.5.42 (2026-02-20)

Bug Hunt #34 — Hybrid Architecture

Bug Hunt #34 (Hybrid): 5 bugs (4M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 14 regression tests
BH34-M1: DriftMonitor(num_bins=float) accepted, crashes later (Codex finding)
BH34-M2: CLI cmd_evaluate missing TypeError in config catch
BH34-M3: DualSignatureValidator.expiration_hours missing upper bound
BH34-M4: TelemetryPipeline._worker_loop inconsistent state on raise error
BH34-L1: AegisConfig.from_dict() telemetry_url type coercion gap
Test Coverage: 2601 tests passing, 94.79% coverage (0 skipped, +14 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.41 (2026-02-20)

Bug Hunt #33 — Hybrid Architecture

Bug Hunt #33 (Hybrid): 5 bugs (5M) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 15 regression tests
BH33-M1: config._parse_flat_numeric silently accepts non-numeric types (list, dict) — isinstance(x, (int, float)) guard missing; list/dict crash float() in _DIRECT flat numeric parsing
BH33-M2: config._from_raw_dict silently accepts non-numeric types for DIRECT params — YAML schema validates but direct dict construction bypasses; epsilon_R/beta/etc accept list/dict
BH33-M3: DriftMonitor.evaluate() passes unfiltered current_window to _to_histogram() — NaN/Inf values corrupt histogram bins; math.isfinite() filter missing (upstream filter only in set_baseline)
BH33-M4: OverrideWorkflow.__init__ doesn't defensive-copy failed_gates list — external mutation of input list modifies workflow state; use list(failed_gates)
BH33-M5: mark_completed() doesn't sync state_data with final_state (Codex finding) — state_data["status"] remains old value after final_state update; audit trail inconsistency
Test Coverage: 2587 tests passing, 94.80% coverage (0 skipped, +15 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.40 (2026-02-20)

Bug Hunt #32 — Hybrid Architecture

Bug Hunt #32 (Hybrid): 3 bugs (2M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 20 regression tests
BH32-M1: DriftMonitor constructor accepts negative/Inf thresholds — update_thresholds() validates isfinite + non-negative but __init__() only calls validate_threshold_ordering(); negative tau causes false CRITICAL, Inf tau disables detection
BH32-M2: Calibrator _validate_gate_param allows negative threshold params (complexity_floor, quality_min_score, novelty_threshold, utility_threshold) — negative values make gates trivially pass; complexity_floor is non-overridable (governance bypass via calibration)
BH32-L1: KLDriftConfig __post_init__ missing window_days validation — _parse_kl_drift_dict validates >= 1 for YAML loading but direct construction accepts 0 or negative
Test Coverage: 2572 tests passing, 94.80% coverage (0 skipped, +20 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.39 (2026-02-20)

Bug Hunt #31 — Hybrid Architecture + QG73

Bug Hunt #31 (Hybrid): 4 bugs (1M, 3L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 15 regression tests
BH31-M1: MCP emit_mcp_invocation caller_id non-string guard — null/non-string passed to structured audit log
BH31-L1: Lambda _coalesce_thresholds dict.get() null gotcha — JSON null bypasses default value
BH31-L2: ConsensusConfig timeout_hours fractional minimum — accepts Inf/0.0, should require > 0
BH31-L3: DualSignatureValidator expiration_hours fractional minimum — accepts Inf/0.0, should require > 0
Quality Gate QG73 Ultrathink: 2 findings (1M, 1L), 7 regression tests
QG73-L1: CLI agent_id/session_id isinstance guard — transport parity with MCP/Lambda non-string rejection
QG73-M1: AFABridge default_timeout_hours fractional minimum — accepts Inf/0.0, should require > 0
Quality Gate QG74 Ultrathink: 1 cosmetic fix
QG74-L1: MCP tool_name dict.get() null gotcha — params.get("name", "") returns None for null key
Test Coverage: 2552 tests passing, 94.80% coverage (0 skipped, +22 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.38 (2026-02-19)

Bug Hunt #30 — Hybrid Architecture + QG72

Bug Hunt #30 (Hybrid): 6 bugs (3M, 3L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 8 regression tests
BH30-M1: Lambda proposal_summary dict.get() null gotcha — JSON null bypasses default (transport parity)
BH30-M2: AFABridge get_decision_history() accepts float limit — crashes during list slicing (Codex finding)
BH30-M3: CLI risk_score null → float(None) TypeError — extracted _coerce_risk_score() helper
BH30-L1: Lambda action_description dict.get() null gotcha — same pattern as BH30-M1
BH30-L2: Lambda estimated_impact dict.get() null gotcha — None passes to isinstance check
BH30-L3: TelemetryPipeline config mutation — shared PipelineConfig mutated by pii_encryptor setup; defensive copy.copy()
Quality Gate QG72 Ultrathink: 4 findings (2M, 2L), 4 regression tests
QG72-M1: CLI proposal_summary dict.get() null gotcha — transport parity with Lambda/MCP
QG72-M2: MCP action_description dict.get() null gotcha — None passed to risk check
QG72-L1: CLI estimated_impact dict.get() null gotcha — None bypasses string type check
QG72-L2: Lambda phase dict.get() null gotcha — None passed to .lower() TypeError
Test Coverage: 2530 tests passing, 94.76% coverage (0 skipped, +12 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
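
The "dict.get() null gotcha" that recurs in this release is standard Python behaviour and can be reproduced in a few lines. The field name is taken from BH30-M1; the payload itself is made up for illustration.

```python
import json

body = json.loads('{"proposal_summary": null}')

# The default is ignored because the key exists, even though its value is null.
naive = body.get("proposal_summary", "")

# An `or` guard replaces None (and any other falsy value) with the default.
guarded = body.get("proposal_summary") or ""

print(repr(naive))    # None
print(repr(guarded))  # ''
```

The `or` guard suits string fields. For fields where `0`, `0.0`, or `[]` are meaningful values, an explicit `is None` check is the safer choice.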

v4.5.37 (2026-02-18)

Bug Hunt #29 — Hybrid Architecture + QG71

Bug Hunt #29 (Hybrid): 8 bugs (3M, 5L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 21 regression tests
BH29-M1: estimated_impact case-sensitive comparison — "HIGH" or "Critical" bypass human oversight gates silently
BH29-M2: Executor start_execution reads progress.status outside lock after publishing — TOCTOU race
BH29-M3: Calibrator novelty_k missing from _NONZERO_GATE_PARAMS — zero value weakens governance without validation error
BH29-L1: Config mcp_rate_limit missing math.isfinite() guard — NaN/Inf pass through to int() silently
BH29-L2: MCP phase string not lowercased before _PHASE_MAP lookup — transport parity gap with CLI/Lambda
BH29-L3: Lambda/CLI non-string estimated_impact silently bypasses impact classification — transport parity fix
BH29-L4: Pipeline stop() drain loop aborts on first storage error — remaining queued events lost
BH29-L5: PipelineConfig.flush_threshold accepts 0 or negative — triggers flush on every ingest
Quality Gate QG71 Ultrathink: 3 findings (3L), 5 regression tests
QG71-L1: MCP estimated_impact str() cast silently accepts null/non-string — replaced with or guard
QG71-L2: Pipeline drain except RuntimeError too narrow — enricher exceptions crash stop(); broadened to except Exception
QG71-L3: MCP proposal_summary null → "None" string via dict.get() gotcha — added or "" guard
Test Coverage: 2518 tests passing, 94.76% coverage (0 skipped, +26 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
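
BH29-M1 and BH29-L3 both concern how an impact label is classified. A minimal sketch of case-insensitive matching follows. The function name, the label set, and the fail-closed handling of non-string input are assumptions for illustration, not the project's actual code.

```python
HIGH_IMPACT = {"high", "critical"}  # assumed labels, based on the BH29-M1 examples


def requires_human_oversight(estimated_impact):
    """Return True when the impact label is high or critical, ignoring case."""
    if not isinstance(estimated_impact, str):
        # Assumption: fail closed, so an unclassifiable impact needs oversight.
        return True
    return estimated_impact.strip().lower() in HIGH_IMPACT


print(requires_human_oversight("HIGH"))      # True
print(requires_human_oversight("Critical"))  # True
print(requires_human_oversight("low"))       # False
```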

v4.5.36 (2026-02-18)

Bug Hunt #28 — Hybrid Architecture + QG70

Bug Hunt #28 (Hybrid): 5 bugs (3M, 2L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 17 regression tests
BH28-M1: ConsensusWorkflow QUORUM_MET state never reverts to PENDING when quorum lost via vote overwrite to DEFER
BH28-M2: Governance initiate_override blocked by expired stale override — evict expired before rejecting
BH28-M3: CLI risk_score alias overwrites risk simplified name — priority chain: risk_proposed > risk > risk_score
BH28-L1: DriftMonitor.evaluate() window_size includes non-finite values that don't contribute to KL divergence
BH28-L2: Config quality_no_zero_subscore string "false" treated as truthy — extracted _coerce_bool() helper
Quality Gate QG70 Ultrathink: 3 findings (3L), 5 regression tests
QG70-L1: _from_raw_dict skips _coerce_bool() for bool-allowed YAML fields
QG70-L2: _coerce_bool() accepts NaN/Inf floats
QG70-L3: set_baseline() passes raw data (including Inf) to histogram
Test Coverage: 2492 tests passing, 94.73% coverage (0 skipped, +22 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
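
BH28-L2 stems from Python treating any non-empty string as truthy. The sketch below shows the pitfall and one possible shape for a strict coercion helper. `coerce_bool` and its accepted spellings are illustrative assumptions, not the project's `_coerce_bool()`.

```python
def coerce_bool(value):
    """Map common config spellings to bool; reject anything ambiguous."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in {"true", "1", "yes", "on"}:
            return True
        if text in {"false", "0", "no", "off"}:
            return False
    # Floats (including NaN/Inf), None, and unknown strings all land here.
    raise ValueError(f"cannot interpret {value!r} as a boolean")


print(bool("false"))         # True: the pitfall
print(coerce_bool("false"))  # False
```

Rejecting everything that is not a bool or a recognised string also covers the NaN/Inf case noted in QG70-L2.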

v4.5.35 (2026-02-17)

Quality Gate QG69 Ultrathink

Quality Gate QG69 Ultrathink: 1 finding (1M), 7 regression tests
QG69-M1: MCP + CLI drift_baseline_data missing math.isfinite() — transport parity violation with Lambda (BH27-L4 was fixed there but not in MCP/CLI)
Test Coverage: 2470 tests passing, 94.73% coverage (0 skipped, +7 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.34 (2026-02-17)

Bug Hunt #27 — Hybrid Architecture

Bug Hunt #27 (Hybrid): 4 bugs (3M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 13 regression tests
BH27-M1: resume_or_create() ID propagation — creation path now inspects constructor params and injects workflow_id
BH27-M2: _from_raw_dict string-to-float coercion — YAML quoted numerics passed through as str instead of float
BH27-M3: Lambda/MCP agent_id/session_id null bypass — dict.get() returns None for explicit JSON null
BH27-L4: Lambda drift_baseline_data missing math.isfinite() — NaN/Inf caused 500 instead of 400
Test Coverage: 2470 tests passing, 94.73% coverage (0 skipped, +13 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.33 (2026-02-17)

Scaffold Adoption — Compliance Infrastructure

Scaffold Adoption: Integrated Engineering Standards ai_scaffold_package v2.1.1 (50 new files, zero conflicts)
New directories: ai/ (governance artifacts), docs/compliance/ (operational runbooks), schemas/ (log schema), tools/ci/ (9 validators), policy/ (OPA), benchmarks/
AI Governance: 8 artifacts pre-filled with AEGIS content (system-register, risk-register mapping to OWASP Agentic Top 10, model/data cards, oversight-plan with kill switch, postmarket-monitoring, AIMS-POLICY, technical_file/)
Compliance Runbooks: 7 docs customized for AEGIS (system-description with Lambda+ECS architecture, BCP-DRP with RTO/RPO Tier 2, IRP with governance erosion containment, ACCESS-REVIEW with 4 AWS service accounts, VENDOR-RISK complete AWS assessment, CHANGE-MANAGEMENT with frozen parameter process, DSR-PRIVACY with 12 encrypted PII fields)
Automation: compliance-evidence-scheduler.yml (monthly/quarterly/annual GitHub issue creation), 15 labels, PR + 7 issue templates
CI/CD: 4 new workflows (scaffold-gates, codeql, compliance-nightly, compliance-evidence-scheduler), Makefile, .pre-commit-config.yaml (ELITE tier)
Type Safety: Added type annotations to 9 tools/ci/*.py validators (mypy strict mode compliance)
pyproject.toml: Added [tool.standards] (tier: ELITE, v2.1.1), tools/ci per-file ignores for scaffold style preferences
Test Coverage: 2448 tests passing, 94.83% coverage (0 skipped, no new tests — operational changes only)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest, AI RMF validators, AI Act lint, scaffold adoption validator)
Placeholder Elimination: 274 → 0 in scaffold files (100% customization: compliance docs, AI artifacts, GitHub templates all AEGIS-specific)

v4.5.32 (2026-02-16)

Bug Hunt #26 (Hybrid)

Method: 3 Claude sweep agents + Codex gpt-5.3-codex
Findings: 4 bugs (3M, 1L), 18 regression tests
BH26-M1: validation.py — validate_positive() accepts bool (True→1) due to Python bool⊂int (Codex finding)
BH26-M2: bayesian.py — update_prior variance overflow: sum((x-mean)**2) → inf silently bypasses guard
BH26-M3: rbac.py — _check_bool_constraint None value fail-open for pass_when_true=False constraints (security)
BH26-L1: complexity.py — compute_complexity_tax NaN/Inf propagation via delta dict values
0 deferred bugs
Test Coverage: 2463 tests passing, 94.83% coverage (0 skipped, +18 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
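
The bool-is-int issue in BH26-M1 follows from `bool` being a subclass of `int` in Python. A minimal sketch of a guard, using a hypothetical signature for `validate_positive`:

```python
def validate_positive(name, value):
    """Hypothetical validator that rejects bool before the numeric check."""
    # isinstance(True, int) is True, so bool must be excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"{name} must be int or float, got {type(value).__name__}")
    # `not value > 0` also rejects NaN, because NaN > 0 is False.
    if not value > 0:
        raise ValueError(f"{name} must be positive")
    return value


print(isinstance(True, int))  # True
print(True == 1)              # True
```

The bool check has to come first; once the `isinstance(value, (int, float))` test runs, `True` is indistinguishable from `1`.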

v4.5.31 (2026-02-16)

Bug Hunt #25 (Hybrid)

Method: 3 Claude sweep agents + Codex gpt-5.3-codex
Findings: 6 bugs (3M, 3L), 18 regression tests
BH25-M1: analyst.py — _evaluate_utility_gate UtilityComponents fields crash on explicit None (JSON null)
BH25-M2: cli.py — _extract_risk_alias_and_subscores key-presence vs value-presence transport parity gap with Lambda/MCP
BH25-M3: drift.py — _to_histogram constant-range ±1.0 adjustment no-op for values > 2^53 (IEEE 754 precision) → ZeroDivisionError
BH25-L1: analyst.py — risk_delta/profit_delta = None crashes with TypeError (Codex finding)
BH25-L2: bayesian.py — update_prior returns (inf, inf) when sum() overflows (no OverflowError from sum())
BH25-L3: config.py — from_dict accepts string "nan"/"inf" bypassing isinstance(val, (int, float)) NaN check
PLR0912 fix: extracted _parse_flat_numeric() static helper from from_dict()
0 deferred bugs
Test Coverage: 2430 tests passing, 94.81% coverage (0 skipped, +18 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
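
Two of the floating-point effects behind BH25-M3 and BH25-L2 can be reproduced directly. The `widen_constant_range` helper is a hypothetical illustration of one possible remedy (padding relative to magnitude); it is not the fix that was actually applied.

```python
import math


def widen_constant_range(lo, hi):
    """Ensure hi > lo for a constant-valued sample, even at large magnitudes."""
    if hi > lo:
        return lo, hi
    # A fixed 1.0 is absorbed by rounding at large magnitudes, so scale the pad.
    pad = max(1.0, abs(lo) * 1e-9)
    return lo - pad, hi + pad


big = 1e17
print(big + 1.0 == big)                  # True: adding 1.0 changes nothing here
print(widen_constant_range(big, big))    # a range with non-zero width
print(math.isinf(sum([1e308, 1e308])))   # True: sum() overflows to inf silently
```

Float addition overflows to `inf` without raising, which is why an `except OverflowError` around `sum()` never fires and an explicit `math.isfinite()` check on the result is needed.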

v4.5.30 (2026-02-16)

Bug Hunt #24 (Hybrid)

Method: 3 Claude sweep agents + Codex gpt-5.3-codex (1 iteration, 1 novel finding)
Findings: 10 bugs (4M, 6L), 26 regression tests
BH24-M1: mcp_server.py — handle_request() returns response for JSON-RPC notifications (requests without "id") violating JSON-RPC 2.0 §4.1 (Codex finding)
BH24-M2: rbac.py — NO_UNILATERAL_OVERRIDE signer_count=None crashes with TypeError; guards None, bool, non-numeric (fail-closed)
BH24-M3: analyst.py — _evaluate_quality_gate missing null guard for quality_score (dict.get returns None for explicit null)
BH24-M4: analyst.py — _evaluate_risk_gate missing null guard for risk_baseline
BH24-L1: afa_bridge.py — _extract_metrics missing isinstance(list) check for quality_subscores (transport parity gap)
BH24-L2: afa_bridge.py — _extract_metrics missing isinstance(UtilityResult) check for utility_result
BH24-L3: config.py — KLDriftConfig.__post_init__ accepts NaN/Inf tau_warning/tau_critical via direct constructor
BH24-L4: analyst.py — _evaluate_novelty_gate missing null guard for novelty_score
BH24-L5: analyst.py — _evaluate_complexity_gate missing null guard for complexity_score
BH24-L6: analyst.py — _evaluate_profit_gate missing null guard for profit_baseline
PLR0912 fix: extracted _validate_quality_subscores() static helper
0 deferred bugs
Test Coverage: 2412 tests passing, 94.80% coverage (0 skipped, +26 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
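
BH24-M1 relates to the JSON-RPC 2.0 rule that a request without an "id" member is a notification and must not receive a response. A minimal sketch of that rule follows. This `handle_request` is a simplified stand-in with a placeholder dispatch, not the server's real handler.

```python
import json


def handle_request(raw):
    """Return a JSON response string, or None for notifications."""
    msg = json.loads(raw)
    # Test for key presence: {"id": null} is still a request, not a notification.
    is_notification = "id" not in msg
    result = {"echo": msg.get("method")}  # placeholder for real dispatch
    if is_notification:
        return None
    return json.dumps({"jsonrpc": "2.0", "id": msg["id"], "result": result})


print(handle_request('{"jsonrpc": "2.0", "method": "ping"}'))  # None
```

The caller is expected to write nothing to the transport when `None` is returned.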

v4.5.29 (2026-02-16)

AMTSS Protocol v1 — MCP Tool Schema Signing

ROADMAP Item 20a(e) complete: Cryptographic signing of MCP tool schemas to detect rug pull attacks (CoSAI MCP-T6)
Design via Claude-GPT dialogical collaboration (GPT 5.2 Pro xhigh reasoning, 3 substantive rounds)
Research document: docs/research/004-mcp-schema-signing-design.md (465 lines)
New module: src/crypto/schema_signer.py — ToolSchemaSigner, SigningKeyPair, compute_tool_digest()
Protocol: per-tool + manifest dual signing, Ed25519, RFC 8785 canonicalization, _meta inline delivery
MCP integration: _handle_tools_list() embeds proofs in _meta[com.aegis.governance/toolSchemaSigning]
MCP integration: _handle_initialize() advertises keyset in capabilities.experimental.toolSchemaSigning
Graceful degradation: signer is best-effort (None if cryptography not installed)
Updated crypto/__init__.py exports: ToolSchemaSigner, SigningKeyPair, compute_tool_digest, AMTSS_* constants, SCHEMA_SIGNER_AVAILABLE
All 5 sub-items of ROADMAP 20a now complete (audit logging, rate limiting, TLS enforcement, CoSAI cross-reference, schema signing)
Quality-Gate Ultrathink (prior session): 5 findings fixed (3 MEDIUM, 2 LOW) — manifest duplicate-name bypass, _meta stripping in digest, statement type validation, _prev_digests chain wiring, strict base64url decode; +7 regression tests
Quality-Gate QG67 Ultrathink: 4 additional findings fixed (2 MEDIUM, 2 LOW) — null sig crash in verify methods (isinstance guard), NaN/Inf canonicalization (allow_nan=False), manifest revision never incremented, MCP signing error log level (DEBUG→WARNING); +7 regression tests
Test Coverage: 2386 tests passing, 94.74% coverage (0 skipped, +82 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
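
The digest-related findings above (NaN/Inf canonicalization, `_meta` handling) can be illustrated with a rough sketch. It approximates canonical JSON using the standard library only and is not a full RFC 8785 implementation (number formatting and key ordering rules differ in edge cases). The function name `tool_digest`, the use of SHA-256, and the choice to drop a top-level `_meta` key are assumptions for illustration.

```python
import hashlib
import json


def tool_digest(schema):
    """SHA-256 over a deterministic JSON encoding of a tool schema."""
    # Assumption: drop _meta so embedded proofs do not feed back into the digest.
    body = {k: v for k, v in schema.items() if k != "_meta"}
    encoded = json.dumps(
        body,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
        allow_nan=False,  # NaN/Inf raise ValueError instead of emitting invalid JSON
    ).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


a = tool_digest({"name": "t", "inputSchema": {"type": "object"}})
b = tool_digest({"inputSchema": {"type": "object"}, "name": "t", "_meta": {"sig": "x"}})
print(a == b)  # True: key order and _meta do not affect the digest
```

With `allow_nan=False`, a schema containing `float("nan")` fails loudly at digest time rather than producing a non-standard `NaN` token that verifiers may parse differently.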

v4.5.28 (2026-02-16)

CoSAI MCP-T Cross-Reference

ROADMAP Item 20a(d) complete: Added §11.4.1 CoSAI MCP Threat Model (MCP-T1..T12) cross-reference to CLAUDE.md
Maps all 12 CoSAI MCP-specific threats to OWASP Agentic risks and AEGIS controls
Coverage: 9/12 STRONG, 2/12 MODERATE, 1/12 PARTIAL (MCP-T7 transport security)
Source: docs/research/003-mcp-security-ecosystem-review.md
ROADMAP 20a(d) checkbox marked complete
No code changes — documentation only
Test Coverage: 2304 tests passing, 94.63% coverage (unchanged)

v4.5.27 (2026-02-16)

Bug Hunt #23 (Hybrid)

7 bugs found (3M, 4L) by 3 Claude sweep agents; Codex gpt-5.3-codex (1 iteration, 0 novel findings — 1 duplicate of QG66-UT2)
29 regression tests added
Scope: Transport parity (CLI), input validation, thread safety, consensus logic, key store TOCTOU
BH23-M1: cli.py — _load_drift_monitor missing bool guard for drift baseline elements (transport parity with Lambda/MCP)
BH23-M2: cli.py — _extract_risk_alias_and_subscores treats explicit [] as falsy → returns defaults instead of empty list (transport parity)
BH23-M3: calibrator.py — _evict_old_proposals() called outside self._lock — race condition on self.proposals dict
BH23-L1: cli.py — _extract_risk_alias_and_subscores missing isinstance(list) type check for quality_subscores
BH23-L2: engine/bayesian.py — BayesianPosterior accepts NaN/Inf prior_mean in constructor and override paths
BH23-L3: consensus.py — check_timeout() returns True for finalized (APPROVED/REJECTED) workflows past deadline
BH23-L4: key_store.py — get_private_key/get_public_key missing _audit_lock — TOCTOU race with revoke_key()
0 deferred bugs
Test Coverage: 2304 tests passing, 94.63% coverage (0 skipped, +29 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.26 (2026-02-15)

Quality-Gate QG66 Ultrathink

2 findings (2L), 2 regression tests
UT-1: mcp_server.py — _validate_quality_subscores treats empty list [] as default [0.7, 0.7, 0.7] (transport parity with Lambda)
UT-2: mcp_server.py — _validate_quality_subscores missing try/except on float() — non-numeric strings crash server
Test Coverage: 2275 tests passing, 94.63% coverage (0 skipped, +2 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.25 (2026-02-15)

Bug Hunt #22 (Hybrid)

8 bugs found (4M, 4L) by 3 Claude sweep agents; Codex gpt-5.3-codex (3 iterations, 0 novel findings)
20 regression tests added
Scope: Override lifecycle, transport parity, input validation, bounded collections, type safety
BH22-M1: override.py — reject() missing wall-clock expiration check before state mutation (Codex)
BH22-M2: mcp_server.py — quality_subscores missing extraction (transport parity with Lambda/CLI)
BH22-M3: drift.py — DriftMonitor.update_thresholds() missing finiteness + non-negativity validation
BH22-M4: persistence/repository.py — mark_completed() allows re-completing already-completed workflows
BH22-L1: lambda_handler.py + mcp_server.py — drift_baseline_data bool guard (transport parity)
BH22-L2: governance.py — active_overrides dict grows unbounded (expired overrides never evicted)
BH22-L3: afa_bridge.py — _evaluate_authorization string-as-iterable (set("admin") → char explosion)
BH22-L4: analyst.py — _evaluate_quality_gate crashes on explicit None quality_subscores (Codex)
0 deferred bugs
Test Coverage: 2273 tests passing, 94.64% coverage (0 skipped, +20 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
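
The string-as-iterable problem in BH22-L3 arises because `set()` iterates over whatever it is given, and a string iterates character by character. A minimal sketch of a normalizer, where `as_role_set` is a hypothetical helper name:

```python
def as_role_set(value):
    """Normalize a role field to a set of role names."""
    if value is None:
        return set()
    if isinstance(value, str):
        return {value}  # a bare string is one role, not an iterable of characters
    return set(value)


print(sorted(set("admin")))          # ['a', 'd', 'i', 'm', 'n']
print(as_role_set("admin"))          # {'admin'}
print(sorted(as_role_set(["a", "b"])))  # ['a', 'b']
```

Sorting before display keeps any output derived from the set deterministic.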

v4.5.24 (2026-02-15)

Bug Hunt #21 (Hybrid)

8 bugs found (3M, 5L) by 3 Claude sweep agents; Codex gpt-5.3-codex (3 iterations, 0 novel findings)
16 regression tests added
Scope: Config validation, transport parity, bounded collections, telemetry, HTTP compliance
BH21-M1: config.py — KLDriftConfig missing __post_init__ threshold ordering validation
BH21-M2: lambda_handler.py — _validate_subscores missing bool guard (transport parity)
BH21-M3: afa_bridge.py — _extract_metrics missing quality_subscores element validation (bool/NaN/Inf)
BH21-L1: drift.py — DriftMonitor window_days accepts zero/negative
BH21-L2: calibrator.py — proposals dict grows unbounded (added _evict_old_proposals)
BH21-L3: emitter.py — emit_shadow_evaluation payload key collision (status overwrite via dict.update)
BH21-L4: prometheus_exporter.py — set_drift_status unbounded Prometheus label cardinality
BH21-L5: mcp_server.py — MCP HTTP 405 missing Allow header (RFC 9110 §15.5.6)
0 deferred bugs
Test Coverage: 2253 tests passing, 94.63% coverage (0 skipped, +17 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.23 (2026-02-15)

Bug Hunt #20 (Hybrid)

9 bugs found (7M, 2L) by Codex gpt-5.3-codex (3 iterations) + 3 Claude sweep agents
18 regression tests added
BH20-M1: persistence/durable.py — resume_all_pending() crashes on non-dict state_data
BH20-M2: override.py — from_dict() mutable failed_gates list sharing
BH20-M3: override.py (8 sites) — non-strict base64.b64decode accepts garbage
BH20-M4: consensus.py — eligible_voters set stored by reference
BH20-M5: consensus.py — timeout_hours unbounded → timedelta OverflowError
BH20-M6: pcw_decide.py — _hash_summary/_build_decision_trace crash on non-string/non-mapping inputs
BH20-M7: encryption.py — EncryptedField.from_dict() permissive base64 decode
BH20-L1: config.py — window_days missing negative/overflow validation
BH20-L2: Lambda/MCP/CLI (4 files) — transport parity: float helpers missing bool guard

QG65 Ultrathink

5 additional fixes from deep analysis phase
CLI risk_score alias path missing bool guard — True silently became 1.0
CLI quality_subscores elements missing bool guard — booleans passed as floats
Lambda _parse_body base64 decode missing validate=True — non-base64 chars accepted
Crypto providers strict base64: ed25519_provider.py (3 calls), bip322_provider.py (1 call), kek_provider.py (2 calls) — all upgraded to validate=True
PLR0912 fix: extracted _extract_risk_alias_and_subscores() helper in cli.py
4 regression tests added
Test Coverage: 2236 tests passing, 94.68% coverage (0 skipped, +22 new from BH20+QG65)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
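
The strict base64 changes in BH20-M3, BH20-M7, and QG65 rely on the `validate=True` flag of the standard library's `base64.b64decode`. By default, characters outside the base64 alphabet are discarded before decoding. The wrapper name `decode_strict` is hypothetical.

```python
import base64
import binascii


def decode_strict(text):
    """Decode base64, rejecting any character outside the alphabet."""
    try:
        return base64.b64decode(text, validate=True)
    except binascii.Error as exc:
        raise ValueError("invalid base64 input") from exc


# Non-strict decoding silently drops the stray "!!" characters.
print(base64.b64decode("aGVs!!bG8="))  # b'hello'
print(decode_strict("aGVsbG8="))       # b'hello'
```

Calling `decode_strict("aGVs!!bG8=")` raises `ValueError`, so tampered or garbage input is rejected instead of being decoded as if it were clean.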

v4.5.22 (2026-02-15)

Rigor: Resolve All Deferred Bugs

Rigor: Resolve All Deferred Bugs: Fixed BH16-L5, closed BH15-L6 — 0 deferred bugs remaining
BH16-L5 FIXED: WorkflowTransition.verify_hash() standalone false negatives for non-first transitions — added previous_hash column to model, updated _record_transition() to populate it, verify_hash() now falls back to stored value
BH15-L6 CLOSED (by-design): Lambda telemetry_emitter wiring gap — Lambda uses AWS-native observability (CloudWatch Logs, X-Ray); adding HTTPEventSink would create redundant double-logging and SSRF attack surface
8 regression tests added (6 model-level + 2 repository integration)
Test Coverage: 2214 tests passing, 94.68% coverage (0 skipped, +8 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.21 (2026-02-14)

Bug Hunt #19 (Hybrid)

Bug Hunt #19 (Hybrid): 5 bugs (2M, 3L) found by Codex gpt-5.3-codex (3 iterations) + 3 Claude sweep agents; 12 regression tests
BH19-M1: proposal.py — from_dict() shares mutable references with input dict (list aliasing for tags/related_proposals, dict aliasing for transition metadata/gate_results) — caller mutations corrupt workflow state
BH19-M2: override.py + key_store.py — sign_with_stored_key() key rotation TOCTOU — private/public keys fetched in separate calls without version pinning; record_usage() also lacked version parameter
BH19-L1: afa_bridge.py — _coalesce_float missing bool guard (bool-is-int subclass, True silently becomes 1.0)
BH19-L2: afa_bridge.py — _evaluate_execution accepted non-boolean proposal_approved/has_execution_plan (Python truthiness)
BH19-L3: afa_bridge.py — _evaluate_authorization crashes with set(None) when authorization lists are JSON null
0 deferred bugs
Test Coverage: 2206 tests passing, 94.68% coverage (0 skipped, +12 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.20 (2026-02-14)

Bug Hunt #18 (Hybrid)

Bug Hunt #18 (Hybrid): 7 bugs (3M, 4L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 25 regression tests
BH18-M1: lambda_handler.py — non-boolean control flags accepted via raw body.get() (Python truthiness confusion)
BH18-M2: config.py — from_dict/_from_raw_dict flat keys lack NaN/Inf validation (parity gap with nested sections)
BH18-M3: cli.py — CLI non-boolean control flags (transport parity with Lambda)
BH18-L1: bayesian.py — BayesianPosterior.update_prior() accepted ddof=True (bool is int subclass)
BH18-L2: config.py — from_dict novelty keys lack NaN/Inf validation
BH18-L3: consensus.py — ConsensusConfig missing bool guards on quorum_percentage/approval_threshold
BH18-L4: afa_bridge.py — AFABridge(default_timeout_hours=True) bypasses isfinite() check
0 deferred bugs
Test Coverage: 2194 tests passing, 94.61% coverage (0 skipped, +25 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.19 (2026-02-14)

Bug Hunt #17 (Hybrid)

Bug Hunt #17 (Hybrid): 6 bugs (1M, 5L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 13 regression tests
BH17-M1: afa_bridge.py — _evaluate_risk_check used raw context.get() instead of _coalesce_float(), allowing None/NaN/Inf through (transport parity gap)
BH17-L1: config.py — _extract_nested_floats missing isfinite() validation after float() cast
BH17-L2: config.py — _from_raw_dict kl_drift parsing lacked NaN/Inf validation; replaced inline code with _parse_kl_drift_dict() helper for parity
BH17-L4: serialization.py — ensure_utc preserved non-UTC timezone offsets instead of converting to UTC
BH17-L5: emitter.py — BatchHTTPSink accepted negative max_retries (silent event drops on flush)
BH17-L6: governance.py — emergency_halt reported already-rejected overrides as cancelled
0 deferred bugs
Test Coverage: 2169 tests passing, 94.60% coverage (0 skipped, +13 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
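
BH17-L4 concerns the difference between keeping an aware datetime's offset and converting it to UTC. A minimal sketch using only the standard library; treating naive datetimes as already being in UTC is an assumption here, not something the entry specifies.

```python
from datetime import datetime, timedelta, timezone


def ensure_utc(dt):
    """Return an aware datetime expressed in UTC."""
    if dt.tzinfo is None:
        # Assumption: naive values are already UTC wall-clock time.
        return dt.replace(tzinfo=timezone.utc)
    # astimezone converts the instant; replace() would only relabel it.
    return dt.astimezone(timezone.utc)


local = datetime(2026, 2, 14, 12, 0, tzinfo=timezone(timedelta(hours=5)))
print(ensure_utc(local).isoformat())  # 2026-02-14T07:00:00+00:00
```

The converted value compares equal to the original, since both denote the same instant; only the offset used for serialization changes.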

v4.5.18 (2026-02-14)

Quality Gate #62 (Ultrathink)

Quality Gate #62 (Ultrathink): 6 findings (1M, 5L) from BH16 post-fix audit; 11 regression tests
QG62-M1: afa_bridge.py — _coalesce_float() missing isfinite() guard (transport parity with CLI/Lambda); risk_proposed/profit_proposed bare float() without validation
QG62-L1: config.py — from_dict kl_drift float fields lacked NaN/Inf validation; window_days not coerced to int (parity with _from_raw_dict); extracted _parse_kl_drift_dict() helper
QG62-L2: lambda_handler.py — quality_subscores null guard missing (dict.get returns None, not default, when key exists with null)
0 deferred bugs
Test Coverage: 2156 tests passing, 94.58% coverage (0 skipped, +11 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.17 (2026-02-14)

Bug Hunt #16 (Hybrid)

Bug Hunt #16 (Hybrid): 9 bugs (4M, 5L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 22 regression tests
BH16-M1: lambda_handler.py — non-dict metadata → 500 instead of 400; added type validation
BH16-M2: config.py — from_dict missing bool guard for kl_drift values (parity with _from_raw_dict)
BH16-M3: afa_bridge.py — _evaluate_proposal crashes on None context values; extracted _coalesce_float() + _extract_metrics() helpers
BH16-M4: consensus.py — deadlock when all voters voted but neither threshold met; added rejection fallback
BH16-L1: mcp_server.py — _drain_request_body partial drain without close_connection = True
BH16-L2: config.py — mcp_rate_limit missing bool guard in both from_dict and _from_raw_dict
BH16-L3: afa_bridge.py — _evaluate_authorization non-deterministic set ordering in rationale/next_steps
BH16-L4: complexity.py — negative normalized values not clamped to [0, 1]
BH16-L5: persistence/models.py — WorkflowTransition.verify_hash() false negatives (deferred — requires schema change)
1 deferred bug (BH16-L5)
Test Coverage: 2145 tests passing, 94.56% coverage (0 skipped, +22 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.16 (2026-02-14)

Bug Hunt #15 (Hybrid) + Quality Gate #61 (Ultrathink)

Bug Hunt #15 (Hybrid): 8 bugs (2M, 6L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 22 regression tests
Quality Gate #61 (Ultrathink): 7 findings (4M, 3L) — 5 fixed + 8 regression tests
Transport parity: CLI observation_values sanitization
0 deferred bugs
Test Coverage: 2123 tests passing, 94.53% coverage (0 skipped, +22 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.15 (2026-02-13)

Bug Hunt #14 (Hybrid)

3 bugs (3M) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 10 regression tests
BH14-M1: consensus.py — ConsensusConfig(timeout_hours=True) silently creates 1-hour deadline (Python bool is subclass of int, True == 1); added isinstance(bool) guard in __post_init__
BH14-M2: override.py — DualSignatureValidator(expiration_hours=0) silently creates instantly-expired overrides; NaN/Inf crash downstream at OverrideWorkflow.__init__; added full validation (bool, isfinite, positivity) in __init__
BH14-M3: lambda_handler.py — quality_subscores lacked isfinite() guard (CLI had it, Lambda didn't); extracted _validate_subscores() helper for transport parity
0 deferred bugs
Test Coverage: 2101 tests passing, 94.54% coverage (0 skipped, +10 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.14 (2026-02-13)

Rigor Close Deferrals v3

Closed all 5 deferred bugs (1 fixed, 4 documented/accepted-risk)
BH12-L2: afa_bridge.py — default_timeout_hours NaN/Inf passthrough; math.isfinite() guard added. 3 regression tests.
QG60-6: emitter.py — BatchHTTPSink stats counters outside lock. CLOSED: CPython GIL atomic; inline documentation added.
QG60-7: consensus.py — Rejection threshold indeterminate. CLOSED: By design; the timeout handles this case.
QG60-8: consensus.py — cast_vote() no thread lock. CLOSED: Single-threaded by design; thread-safety note added to docstring.
QG60-9: governance.py — No un-halt mechanism. CLOSED: Intentional one-way safety mechanism.
0 deferred bugs remaining
Test Coverage: 2091 tests passing, 94.52% coverage (0 skipped, +3 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.13 (2026-02-13)

Bug Hunt #13 (Hybrid)

7 bugs (4M, 3L) found by 3 Claude sweep agents; Codex gpt-5.3-codex xhigh timed out (60+ min, no output); 16 regression tests
BH13-M1: lambda_handler.py — Eager evaluation of risk_score fallback causes crash when risk_proposed is valid but risk_score is NaN/Inf
BH13-M2: config.py — from_dict null kl_drift values pass through to KLDriftConfig, causing TypeError in DriftMonitor construction
BH13-M3: cli.py — Transport-layer parity gap: missing isfinite() guard on float conversions (Lambda/MCP have it, CLI didn't)
BH13-M4: mcp_server.py — POST 403 origin rejection doesn't drain request body (same class as QG60-3)
BH13-L1: config.py — from_dict mcp_rate_limit=None causes int(None) TypeError
BH13-L2: override.py — to_dict() leaks mutable failed_gates reference (same class as BH12-C1)
BH13-L3: mcp_server.py — Invalid/negative Content-Length doesn't close connection
Deferred: BH12-L2 (AFABridge.default_timeout_hours NaN/Inf — carried forward)
Test Coverage: 2088 tests passing, 94.52% coverage (0 skipped, +16 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.12 (2026-02-13)

Quality-Gate Ultrathink (QG60)

5 fixes from 9 findings (3M, 2L fixed; 4L deferred)
QG60-1: validation.py — validate_positive() accepts Inf (FAIL-OPEN: Inf epsilon disables risk/profit gates); isnan → isfinite
QG60-2: utility.py — UtilityCalculator gamma/kappa/migration_budget no Inf guard (NaN contamination via Inf * 0.0); isfinite validation
QG60-3: mcp_server.py — POST 404 catch-all doesn't consume request body (HTTP/1.1 persistent connection corruption); _drain_request_body() helper
QG60-4: mcp_server.py — 413 oversized request doesn't consume body; close_connection = True to force connection close
QG60-5: utility.py — ThreePointEstimate accepts Inf values (OverflowError or NaN LCB); isfinite __post_init__
SDK facade: Added Calibrator and Governance actor exports to aegis_governance.__init__
Deferred: QG60-6 (BatchHTTPSink stats counters outside lock — advisory), QG60-7 (ConsensusWorkflow rejection threshold — by-design), QG60-8 (cast_vote no thread lock — single-threaded), QG60-9 (no un-halt — intentional one-way)
Resolves deferred BH12-L1 (MCP HTTP POST body drain)
Test Coverage: 2072 tests passing, 94.50% coverage (0 skipped, +19 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
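
The isnan-to-isfinite change in QG60-1 and the "Inf * 0.0" contamination noted in QG60-2 can both be reproduced with the standard `math` module. The two guard functions are hypothetical and exist only to contrast the checks.

```python
import math


def passes_isnan_guard(value):
    """Older style check: only NaN is rejected."""
    return not math.isnan(value) and value > 0


def passes_isfinite_guard(value):
    """Stricter check: NaN, +Inf, and -Inf are all rejected."""
    return math.isfinite(value) and value > 0


inf = float("inf")
print(passes_isnan_guard(inf))     # True: Inf slips through
print(passes_isfinite_guard(inf))  # False
print(math.isnan(inf * 0.0))       # True: Inf becomes NaN once multiplied by zero
```

An infinite threshold that passes validation makes every comparison against it trivially succeed, which is the fail-open behaviour the entry describes.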

v4.5.11 (2026-02-12)

Bug Hunt #12 (Hybrid)

10 bugs (1H, 7M, 2L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 22 regression tests
BH12-H1: gates.py — GateEvaluator NaN threshold params cause governance lockout (complexity_floor NaN blocks all proposals with no override path)
BH12-M1: gates.py — NaN for novelty_N0/k/threshold, quality_min_score, utility_threshold (isfinite validation loop)
BH12-M2: complexity.py — analyze() NaN metric silent 1.0 via min() order-dependence (isfinite guard)
BH12-M3: complexity.py — complexity_floor validate_range missing check_nan=True
BH12-M4: lambda_handler.py — _float helper NaN/Inf passthrough (parity gap with MCP _float_arg)
BH12-M5: lambda_handler.py — _handle_risk_check NaN/Inf produces invalid JSON response
BH12-M6: executor.py — ExecutionPlan.timeout_seconds NaN/Inf bypass validation (IEEE 754 comparison semantics)
BH12-M7: calibrator.py — CalibrationProposal.data_window no type/range validation
BH12-M8: config.py — _from_raw_dict null YAML values for _DIRECT params + from_dict null novelty/flat-key params
BH12-C1: proposal.py — to_dict() leaks mutable references to internal state (Codex)
Deferred: BH12-L1 (MCP HTTP POST body on 404/403), BH12-L2 (AFABridge.default_timeout_hours NaN/Inf)
Test Coverage: 2053 tests passing, 94.52% coverage (0 skipped, +22 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.10 (2026-02-12)

Quality-Gate Ultrathink (QG59)

12 fixes from 21 findings (8M, 4L); 9 deferred (3M design-gap, 6L advisory)
QG59-P1-1: gates.py — NaN trigger_factor bypasses zero-check (NaN guard)
QG59-P1-2: gates.py — trigger_confidence_prob > 1.0 disables governance FAIL-OPEN (validate_range)
QG59-P1-4: config.py — YAML null values crash _extract_nested_floats (None guard + KL null-coalesce)
QG59-P2-1: calibrator.py — CalibrationProposal accepts NaN/Inf (__post_init__ validation)
QG59-P2-2: analyst.py — _coerce_to_float accepts "nan"/"inf" strings + _calculate_confidence averages NaN (isfinite guards)
QG59-P2-3: proposer.py — PERT estimates NaN/Inf passthrough (isfinite guard)
QG59-P3-1: mcp_server.py — _float_arg passes NaN/Inf (isfinite guard)
QG59-P3-3: emitter.py — wrong event counted as dropped (track evicted oldest)
Test Coverage: 2031 tests passing, 94.52% coverage (0 skipped, +22 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.9 (2026-02-12)

Bug Hunt #11 (Hybrid)

10 bugs (8M, 2L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 12 regression tests
BH11-M1: cli.py — quality_subscores null crash (null-coalesce + fallback)
BH11-M2: cli.py — non-string phase crash (isinstance guard)
BH11-M3: calibrator.py — wrong capability check (can_evaluate_gates → can_configure)
BH11-M4: governance.py — emergency_halt doesn't cancel active overrides
BH11-M5: consensus.py — NaN timeout_hours passthrough (math.isfinite guard)
BH11-M6: mcp_server.py — POST /health unconsumed body corruption (removed route)
BH11-M7: pipeline.py — _encrypt_pii_fields T-5 bypass (pass captured encryptor)
BH11-M8: emitter.py — BatchHTTPSink batch_size=0 silent data loss (Codex)
BH11-L1: utility.py — lcb_alpha NaN passthrough (check_nan=True)
BH11-L2: mcp_server.py — stdio size check includes newline (strip first)
Test Coverage: 2009 tests passing, 94.49% coverage (0 skipped, +12 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
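
The null crash in BH11-M1 comes from a common dict.get pitfall: the default only applies when the key is absent, not when it is present with a JSON null. A small sketch of the null-coalesce pattern, using a hypothetical helper get_or_default and made-up payload values:

```python
import json
from typing import Any, Dict


def get_or_default(data: Dict[str, Any], key: str, default: Any) -> Any:
    """Return data[key], treating an explicit None like a missing key.

    Hypothetical helper for illustration only.
    """
    value = data.get(key)
    return default if value is None else value


# Illustrative payload; the field values are invented for this example.
payload = json.loads('{"quality_subscores": null, "risk_score": 0.2}')

# Plain .get() returns None here because the key exists.
plain = payload.get("quality_subscores", [])

# The null-coalescing helper falls back to the default instead.
safe = get_or_default(payload, "quality_subscores", [])
```

Checking `is None` rather than using `or` keeps legitimate falsy values such as 0.0 or an empty list intact.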

Changelog (4.5.8)¶

Quality-Gate Ultrathink (QG58): Docs sync phase — comprehensive test metric update across all documentation files
Updated version numbers: CLAUDE.md v4.5.7 → v4.5.8, ROADMAP v1.44.0 → v1.45.0, gap-analysis v1.49.0 → v1.50.0, repository-structure v2.10.0 → v2.11.0
Test metrics synchronized: 1987 tests, 94.45% coverage → 1997 tests, 94.47% coverage (10 new tests, +0.02pp coverage)
Updated 9 documentation files: CLAUDE.md, README.md, ROADMAP.md, gap-analysis.md, KNOWN_ISSUES.md, repository-structure.md, test-count-methodology.md, comprehensive-todo-discovery.md, changelog.md
Changelog entries added to CLAUDE.md §9, docs/claude/changelog.md, and ROADMAP.md §Changelog
Test Coverage: 1997 tests passing, 94.47% coverage (0 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.5.7)¶

Bug Hunt #10 (Hybrid): 7 bugs (5M, 2L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 9 regression tests
BH10-M1: validation.py — validate_positive NaN pass-through (IEEE 754: NaN <= 0 is False). Fix: math.isnan() guard
BH10-M2: validation.py — validate_threshold_ordering NaN pass-through. Fix: math.isnan() guard on both values
BH10-M3: mcp_server.py — stdio transport missing _MAX_REQUEST_BYTES size limit (HTTP had it). Fix: len(line) check
BH10-M4: cli.py — null JSON metric values crash (data.get(key, default) returns None, not default, when the key is present with a null value). Fix: null-coalesce
BH10-M5: lambda_handler.py — non-string phase value causes AttributeError on .lower(). Fix: isinstance(str) guard
BH10-L1: governance.py — emergency_halt non-atomic state mutation (no lock). Fix: with self._lock: on write + read
BH10-M7: lambda_handler.py — non-numeric drift_baseline_data crash on float(). Fix: try/except ValueError/TypeError
Quality-Gate Ultrathink (QG57): 2 additional fixes from Phase 2 ultrathink
QG57-M1: mcp_server.py — drift baseline non-numeric crash (same pattern as lambda, not propagated to MCP). Fix: try/except
QG57-M2: governance.py — TOCTOU in initiate_override/add_override_signature (halt check outside lock). Fix: moved inside lock
Test Coverage: 1987 tests passing, 94.45% coverage (0 skipped, +9 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
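
BH10-M1 and BH10-M2 rest on the IEEE 754 rule that every ordered comparison involving NaN is False, so a check like `value <= 0` never rejects NaN. A minimal sketch of the guard, assuming a simplified signature (the real validate_positive in validation.py may differ):

```python
import math


def validate_positive(value: float, name: str = "value") -> float:
    """Illustrative stand-in; the signature and messages are assumptions."""
    # NaN must be rejected explicitly: NaN <= 0 is False, so the range
    # check below would otherwise let it through.
    if math.isnan(value):
        raise ValueError(f"{name} must not be NaN")
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value
```

Using math.isfinite instead of math.isnan would also reject infinities, which is the variant the later QG59 fixes describe.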

Changelog (4.5.6)¶

Quality-Gate Ultrathink (QG56): 4 fixes from 13 findings (1H, 5M, 5L, 2I)
QG56-M2: stdio transport now supports JSON-RPC batch arrays via handle_batch()
QG56-M3: WebhookAlertSink TLS enforcement via _validate_sink_url() + allow_insecure param (breaking: http:// URLs now require allow_insecure=True)
QG56-M4: _validate_sink_url() strips URL whitespace before urlparse() to prevent hostname TOCTOU
QG56-L5: mcp_rate_limit clamped to max(0, ...) in both from_dict() and _from_raw_dict()
Test Coverage: 1978 tests passing, 94.47% coverage (0 skipped, +14 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
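
QG56-M3 and QG56-M4 describe two steps: strip whitespace first so the validated string is the one used later, then require HTTPS unless the caller opts out. A sketch under those assumptions, with a hypothetical validate_sink_url (not the project's _validate_sink_url; error messages are invented):

```python
from urllib.parse import urlparse


def validate_sink_url(url: str, *, allow_insecure: bool = False) -> str:
    """Return the cleaned URL, or raise ValueError if it is not acceptable.

    Hypothetical helper for illustration only.
    """
    cleaned = url.strip()  # validate exactly the string that is returned
    parsed = urlparse(cleaned)
    if not parsed.hostname:
        raise ValueError("sink URL has no hostname")
    if parsed.scheme == "https":
        return cleaned
    if parsed.scheme == "http" and allow_insecure:
        return cleaned  # explicit opt-in, e.g. for local development
    raise ValueError("sink URL must use https:// (or http:// with allow_insecure=True)")
```

Returning the cleaned URL, rather than a boolean, nudges callers to use the value that was actually checked.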

Changelog (4.5.5)¶

TLS Enforcement (ROADMAP Item 20a(c)): _validate_sink_url() helper enforces HTTPS on HTTPEventSink and BatchHTTPSink with allow_insecure: bool = False keyword-only escape hatch for local development; MCP _ALLOWED_TELEMETRY_SCHEMES restricted from {"http", "https"} to {"https"}; CLI catches ValueError in _build_telemetry_emitter(); production guide TLS section added; Research 003 G2 status → ADDRESSED. Closes CoSAI MCP-T7 (Transport Security) gap.
Parameter Cookbook (ROADMAP Item 16): (a) docs/integration/parameter-reference.md — comprehensive parameter reference with derivation guidance, domain examples, boundary behavior for all inputs; (b) docs/integration/domain-templates.md — 4 worked examples (trading, CI/CD, content moderation, autonomous agent) with parameter mapping tables, JSON inputs, gate-by-gate walkthroughs; (c) MCP tool descriptions enriched with semantic context, minimum/maximum JSON Schema constraints, instructions field in initialize response
Test Coverage: 1964 tests passing, 94.47% coverage (0 skipped, +12 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.5.4)¶

MCP Hardening Phase 1 (ROADMAP Item 20a): Token bucket rate limiter + structured audit logging for all MCP tool invocations
_MCPRateLimiter: stdlib-only token bucket (capacity/rate), thread-safe via threading.Lock, configurable via AegisConfig.mcp_rate_limit (default: 60 req/min, 0 to disable)
emit_mcp_invocation(): structured audit event on every tools/call — ALLOW/DENY/ERROR decision, SHA-256 params_hash (PII-safe), latency, caller_id
Telemetry schema v2.2.0: added mcp.tool_invocation event definition (6 fields)
Closes CoSAI MCP-T10 (resource management) and MCP-T12 (logging/audit) gaps
Test Coverage: 1948 tests passing, 94.59% coverage (0 skipped, +25 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
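
The rate limiter above is described as a stdlib-only token bucket guarded by threading.Lock. The following is a generic sketch of that technique, not the _MCPRateLimiter implementation; the class name, constructor parameters, and injectable clock are assumptions made for illustration:

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Generic token bucket: holds up to `capacity` tokens, refilled at `rate_per_sec`."""

    def __init__(
        self,
        capacity: float,
        rate_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._rate = rate_per_sec
        self._clock = clock  # injectable so tests can control time
        self._tokens = capacity
        self._last = clock()
        self._lock = threading.Lock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        with self._lock:
            now = self._clock()
            elapsed = now - self._last
            self._last = now
            # Refill for the elapsed time, never exceeding capacity.
            self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False
```

Under this sketch, a limit of 60 requests per minute would correspond to rate_per_sec=1.0; how the real limiter maps mcp_rate_limit onto capacity and rate is not stated in this changelog.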

Changelog (4.5.3)¶

MCP Streamable HTTP Transport (ROADMAP Item 23): Implemented MCP Streamable HTTP transport per the 2025-03-26 spec using stdlib http.server (zero new dependencies); protocol version updated to 2025-03-26
CLI: --transport http flag with --host, --port, --allowed-origins options
Endpoints: POST /mcp supports JSON-RPC single + batch dispatch; GET /mcp returns 405 (SSE not implemented); /health endpoint for container health checks
Origin validation: fail-closed for non-localhost, permissive for localhost
Infrastructure: Dockerfile exposes 8080 with HTTP CMD; ECS stack replaces keepalive loop with HTTP server; internal ALB (:80 → :8080); ALB 5xx CloudWatch alarm
Deferred: SSE streaming, session management, resumability (all tools are synchronous/stateless)
Docs: KNOWN_ISSUES.md ECS limitation marked RESOLVED; ADR-007 diagram updated
Security Hardening (Ultrathink): 8 findings fixed (1 HIGH, 3 MEDIUM, 4 LOW) with 18 regression tests — U-1 SSRF protection on telemetry_url (scheme whitelist + private IP blocking), U-2/U-3 Content-Length validation (non-numeric, negative), U-4 error message sanitization (no exception details to clients), U-7 non-dict batch item rejection, U-8 empty batch [] error response, U-9 Origin validation on all endpoints (GET+POST), U-10 handle_request return type correctness
H-1 SSRF Hex/Decimal IP Bypass Fix: _validate_telemetry_url() now uses resolve-then-validate via socket.getaddrinfo() in the except ValueError branch, which blocks hex (0x7f000001), decimal (2130706433), and DNS-to-private bypasses; extracted _is_forbidden_ip() helper using not addr.is_global (covers the CGNAT 100.64.0.0/10 range missed by the 4-property check)
M-3 Slowloris timeout: timeout = 30 class attribute on _MCPHTTPHandler + server self.timeout = 30
14 regression tests for H-1 and M-3
Test Coverage: 1923 tests passing, 94.62% coverage (0 skipped, +64 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
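
The H-1 fix relies on two behaviours of the stdlib ipaddress module: hex and decimal host strings are not valid address literals (so they only become an IP after resolution), and is_global is False for the CGNAT range even though is_private is also False there. A sketch of the check itself, with DNS resolution left out; the function name is_forbidden_ip is a stand-in for the project's _is_forbidden_ip, whose exact signature is not shown in this changelog:

```python
import ipaddress
from typing import Union


def is_forbidden_ip(raw: Union[str, int]) -> bool:
    """True when the address is not globally routable (loopback, private, CGNAT, ...)."""
    addr = ipaddress.ip_address(raw)  # raises ValueError for non-literal input
    return not addr.is_global


# The hex and decimal forms from the changelog are the same integer,
# and that integer is the loopback address 127.0.0.1.
loopback_from_hex = ipaddress.ip_address(0x7F000001)
loopback_from_decimal = ipaddress.ip_address(2130706433)

# 100.64.0.0/10 (CGNAT) is neither "private" nor "global" in ipaddress terms.
cgnat = ipaddress.ip_address("100.64.0.1")
```

Passing the string "0x7f000001" to ipaddress.ip_address raises ValueError, which is why the changelog describes resolving the host with socket.getaddrinfo() in that branch and validating the resolved address instead.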

Changelog (4.5.2)¶

Security Hardening (Quality-Gate Ultrathink): 17 findings fixed (3 HIGH, 11 MEDIUM, 3 LOW) across 6 files
CORS restricted from ALL_ORIGINS to *.amazonaws.com
aegis-gate script injection fixes (function_name to env var, GITHUB_OUTPUT heredoc delimiters, ::error:: env vars)
Error message sanitization (no exception details in 500s)
dynamodb:Scan removed from IAM policies; s3:PutObjectAcl removed
ADOT collector pinned to v0.41.2
CDK deploy --require-approval broadening
Billing alarm enabled for all stages (dev=$100, staging=$150, prod=$200)
Deploy workflow test gate added
ECS keepalive logs failures; health check rejects degraded
_safe() None guard; quality_subscores empty-list fallback; ECS config path cleared
Test Coverage: 1859 tests passing, 94.54% coverage (0 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.5.1)¶

AWS Deployment Complete (ROADMAP Items 16-20): All 4 CDK stacks successfully deployed to AWS us-west-2 (account 164171672016)
AegisSharedStack-dev: DynamoDB aegis-governance-state-dev, KMS, S3 aegis-governance-audit-dev-164171672016, Secrets Manager aegis/signing-keys-dev
AegisLambdaStack-dev: Lambda aegis-evaluate-proposal-dev + API Gateway https://yd1xm4ahcg.execute-api.us-west-2.amazonaws.com/dev/
AegisMcpStack-dev: ECS Fargate cluster aegis-governance-dev, service aegis-mcp-dev 1/1 running
AegisMonitoringStack-dev: SNS aegis-governance-alarms-dev, CloudWatch dashboard AEGIS-Governance-dev, 4 alarms
Deployment Bug Fixes (7): cdk.json literal string bug (account context), pyproject.toml py-modules for standalone .py modules, Dockerfile.lambda numpy/scipy pins + explicit COPY for standalone modules, ECS ALB removal (MCP uses stdio not HTTP) + keepalive loop + lightweight health check, Lambda cross-stack cyclic refs (inline IAM policies), CloudWatch math expression MAX() to IF(), CDK protocol error (dict context to env kwarg)
ECS Architecture Note: MCP server uses stdio transport; ECS container runs keepalive loop pending HTTP/SSE transport implementation (see KNOWN_ISSUES.md)
Test Coverage: 1859 tests passing, 94.55% coverage (0 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.5.0)¶

AWS Deployment Infrastructure (ROADMAP Items 16-20): Hybrid Lambda + ECS architecture via CDK Python; estimated $51/mo
4 CDK stacks (infra/): AegisSharedStack (DynamoDB, Secrets Manager, S3, KMS), AegisLambdaStack (Lambda container image + API Gateway REST with IAM auth), AegisMcpStack (ECS Fargate + ADOT sidecar for AMP), AegisMonitoringStack (CloudWatch alarms + dashboard + billing protection)
src/lambda_handler.py wrapping pcw_decide() with 3 routes (POST /evaluate, POST /risk-check, GET /health)
Dockerfile.lambda for scipy-enabled container image
.github/workflows/aegis-deploy.yml (OIDC deploy pipeline)
.github/actions/aegis-gate/action.yml (reusable governance gate composite action)
ADR-007 documenting architecture decision
Ultrathink Hardening: U-1 quality_subscores null filter (prevents TypeError), U-2 aegis-gate script injection fix (env vars), U-4 Dockerfile.lambda editable install removed
Coverage Boost: 8 error-path tests raising lambda_handler.py from 86% to 94%
Test Coverage: 1859 tests passing, 94.55% coverage (0 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.4.0)¶

Drift Detection → Policy Connection (ROADMAP Item 15): Wired DriftMonitor KL divergence detector into the production decision path of pcw_decide()
Policy: CRITICAL drift → HALT (non-overridable), WARNING drift → advisory constraint, NORMAL drift → no change
Backward compatible: drift_monitor=None (default) → identical behavior to previous versions
New drift_result field on PCWDecision
_evaluate_drift_policy() and _apply_drift_overrides() helpers extracted for PLR0912 compliance
DRIFT_POLICY_ENFORCED telemetry event type; AegisConfig.create_drift_monitor() factory
CLI --drift-baseline flag with null-value filtering; MCP drift_baseline_data array parameter with null-value filtering
DriftAction and DriftResult re-exported from SDK facade and engine
Drift-specific next_steps for CRITICAL HALT
39 new tests including 6 quality-gate regression tests
Test Coverage: 1817 tests passing, 94.56% coverage (0 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
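
The drift-to-policy mapping in this release (CRITICAL forces a non-overridable HALT, WARNING adds an advisory constraint, NORMAL or no monitor changes nothing) can be pictured as a small pure function. All names below (DriftStatus, PolicyOutcome, apply_drift_policy) are hypothetical and do not mirror the real DriftAction, DriftResult, or PCWDecision types:

```python
from enum import Enum
from typing import List, NamedTuple, Optional


class DriftStatus(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    CRITICAL = "critical"


class PolicyOutcome(NamedTuple):
    decision: str
    constraints: List[str]
    overridable: bool


def apply_drift_policy(decision: str, status: Optional[DriftStatus]) -> PolicyOutcome:
    """Apply the drift policy to an already-computed decision.

    status=None models "no drift monitor configured" and leaves the decision as-is.
    """
    if status is DriftStatus.CRITICAL:
        # Critical drift wins over whatever the gates decided.
        return PolicyOutcome("HALT", ["critical drift detected"], overridable=False)
    if status is DriftStatus.WARNING:
        # Warning keeps the decision but attaches an advisory constraint.
        return PolicyOutcome(decision, ["advisory: drift warning"], overridable=True)
    return PolicyOutcome(decision, [], overridable=True)
```

Keeping the None case identical to NORMAL is what makes the feature backward compatible for callers that never pass a monitor.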

Changelog (4.3.1)¶

HTTP Telemetry Sink (ROADMAP Item 14): HTTPEventSink (per-event fire-and-forget POST), BatchHTTPSink (batching with retry and background flush daemon), http_sink() factory; stdlib-only (urllib.request, matching WebhookAlertSink pattern); AegisConfig.telemetry_url optional field; CLI --telemetry-url flag on aegis evaluate; MCP telemetry_url string parameter on aegis_evaluate_proposal; SDK facade re-exports (BatchHTTPSink, HTTPEventSink, http_sink); telemetry __init__.py re-exports; 45 new tests
Test Coverage: 1778 tests passing, 94.44% coverage (0 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.3.0)¶

Shadow Mode (ROADMAP Item 13): Added shadow_mode keyword parameter to pcw_decide() for KL divergence calibration data collection without enforcing decisions; new ShadowResult dataclass with drift evaluation, observation values, and baseline hash; DriftMonitor integration via optional drift_monitor parameter; TelemetryEmitter integration via optional telemetry_emitter parameter with SHADOW_EVALUATION event type; Prometheus mode label on decision_latency_seconds histogram ("production"/"shadow"), new aegis_shadow_evaluations_total counter; CLI --shadow flag on aegis evaluate; MCP shadow_mode boolean parameter on aegis_evaluate_proposal; ShadowResult re-exported from SDK facade; alerting/recording rules filtered to {mode="production"} to exclude shadow data; 44 new tests
Test Coverage: 1733 tests passing, 94.48% coverage (0 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.2.3)¶

ROADMAP Items 10-12: Production deployment guide (docs/deployment/production-guide.md), migration guide (docs/deployment/migration-guide.md), performance SLAs with recorded benchmark baselines (docs/deployment/performance-slas.md); Dockerfile (multi-stage, non-root), docker-compose.yaml (AEGIS + Prometheus + Grafana), monitoring/prometheus/prometheus.yml (scrape config); no code changes
CALIBRATOR Actor (ROADMAP Item 7): New Calibrator actor type — statistical threshold tuning for drift thresholds, Bayesian priors, gate parameters; approval-gated workflow (PROPOSED→APPROVED→APPLIED); _RECOGNIZED_PARAMETERS whitelist (16 params); ultrathink-hardened (U-1 ValueError propagation, U-2 setattr validation, U-3 double-apply TOCTOU, U-4 derived ID collision, U-5 emit simplification); 69 new tests including 12 regression tests; 1689 tests / 94.60% coverage
GOVERNANCE Actor (ROADMAP Item 6): New Governance actor type — override orchestration (initiate/sign/approve/reject/expire), compliance checking (complexity gate non-overridable, fail-closed), emergency halt, thread-safe with threading.Lock; ultrathink-hardened (U-1/U-2 halt guards, U-3 fail-closed compliance, U-4 terminal cleanup, U-7 thread safety); 41 new tests including 6 regression tests; 1620 tests / 94.36% coverage
DRY Extraction (ROADMAP Items 8 & 9): Extracted ensure_utc() to src/workflows/serialization.py (3 workflows), 4 validation helpers to src/engine/validation.py (5 engine modules); 26 new tests; deferred persistence/telemetry timezone consolidation; 1620 tests / 94.36% coverage
Dependency Fix: Moved scipy/prometheus_client from dev to dedicated engine/telemetry optional groups with graceful ImportError at point of use; 4 regression tests; 1552 tests / 94.27% coverage
Quality-Gate Ultrathink #10: 5 MEDIUM bugs fixed — Bayesian overflow/NaN guard (B10-1/B10-2), pipeline validator exception propagation + per-event counting (T-2/T-3), executor rollback retry (T10-1) — 7 regression tests; 1471 tests / 94.23% coverage
Rigor Close Deferrals v2: 4 bugs fixed + 3 closed as intentional; 6 regression tests; 1466 tests / 94.22% coverage
FIX-1: Pipeline validator short-circuit break skipped remaining validators when drop_invalid=False
FIX-2: Bayesian update_prior() linear std interpolation → variance-space combination (Jensen's inequality fix)
FIX-3: Decryption _decrypt_dict() dotted-path field filtering (e.g., "nested.actor_id")
FIX-4: Consensus approval_threshold default 0.67 → 2/3 (float precision fix for 2/3 majority)
CLOSE-1: Kappa discontinuity (intentional design), CLOSE-2: Prometheus private API (no alternative), CLOSE-3: Override rejection schema-code gap (intentional architecture)
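
FIX-4 is easy to reproduce: with three voters, two approvals give a ratio of 0.666..., which is below a literal 0.67 threshold, so an exact two-thirds majority would be rejected. A short sketch, using a hypothetical meets_threshold helper rather than the consensus module's own code:

```python
def meets_threshold(approvals: int, voters: int, threshold: float) -> bool:
    """Hypothetical helper: True when the approval ratio reaches the threshold."""
    return approvals / voters >= threshold


# Two of three approvals against the old and new defaults.
old_default_passes = meets_threshold(2, 3, 0.67)
new_default_passes = meets_threshold(2, 3, 2 / 3)
```

Here old_default_passes is False and new_default_passes is True, because 2 / 3 on both sides of the comparison evaluates to the same float.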

Changelog (4.2.2)¶

Bug-Hunt #9 + Ultrathink: 8 bugs fixed (4M, 4L) + 2 ultrathink findings; 19 regression tests; 1466 tests / 94.22% coverage
Rigor: Close Deferrals: M6 (import normalization) + L47 (UtilityCalculator phi_S/phi_D validation) closed; T-1 ComplexityDecomposer NaN guard; 15 regression tests; 1441 tests / 94.14% coverage
Quality-Gate: DEKEntry frozen dataclass (cache immutability), schema closure (theta in interface contract), 1426 tests / 94.14% coverage
Docs-Sync #1: Comprehensive documentation audit — 9 files updated with 1398→1417 test metric sync
Docs-Sync #2: Changelog header relabeled (4.2.1 → 4.2.2), ROADMAP/test-count-methodology dates fixed (2026-02-08 → 2026-02-07), gap-analysis bumped to v1.30.0, CLAUDE.md §10 added 3 missing modules (config.py, cli.py, aegis_governance/), §4.10 updated from 4 → 7 optional dependency groups
Stale references fixed: gap-analysis.md date, test-count-methodology.md date, repository-structure.md CLAUDE.md annotation (v4.0.0 → v4.2.2), KNOWN_ISSUES.md version (4.2.1 → 4.2.2)
CLAUDE.md: Telemetry schema version reference corrected (v2.0.0 → v2.1.0)
ROADMAP.md: Version ordering anomaly fixed (v1.15.0 → v1.17.0 to restore monotonic ordering)
comprehensive-todo-discovery.md: Stale metrics corrected (1375/15 skipped → 1398/0 skipped, 91/91 → 103/103 bugs)
gap-analysis.md: GAP-L1 In-Progress table updated (66% → 100% code-complete), changelog entry added

Changelog (4.2.1)¶

Bug-Hunt Sessions #3 & #4: Hybrid Codex+Claude sweeps — 14 bugs fixed (11 MEDIUM, 3 LOW)
Session #3 MEDIUM: Four-eyes violation in override workflow, confidence fallback using or instead of is not None, current_state property returning stale derived values, terminal state guard in sign_with_stored_key, MCP server replying to JSON-RPC notifications
Session #3 LOW: RBAC _resolve_permissions return type set/frozenset inconsistency, KL divergence length guard, WebhookAlertSink Content-Type header loss with custom headers
Session #4 MEDIUM: Buffer overflow silent discard in pipeline, encryption/decryption error path mutation (2 fixes), PostgreSQL URL encoding in persistence, executor rollback audit gap, quality_no_zero_subscore config flag ignored
Ultrathink hardening: PIIManifest set → frozenset (immutable PII fields), decryption TypeError handler (crypto resilience), pipeline warn-path copy-on-error, PostgreSQL URL quote_plus → SQLAlchemy URL.create() — 4 regression tests
18 new regression tests across 12 test files
6 LOW-severity bugs deferred (documented in KNOWN_ISSUES.md)
Rigor Protocol: 13 deferred ultrathink findings fixed (T-1..T-6, W-1..W-10) — 2 MEDIUM + 11 LOW with 18 regression tests
Bug-Hunt Session #5: 11 ultrathink findings fixed (5 MEDIUM, 6 LOW) with 9 regression tests — KL divergence re-normalization, inf histogram handling, import path fix, None guards, defensive copies, MappingProxyType immutability, exception guards
Bug-Hunt Session #6: 6 bugs fixed (3 MEDIUM, 3 LOW) with 10 regression tests — RBAC fail-open constraint (Codex), MCP non-dict JSON crash, drift inf baseline, CLI simplified names, pcw_decide empty next_steps, executor re-execution guard
Quality-Gate Ultrathink: 5 bugs fixed (3 MEDIUM, 2 LOW) with 5 regression tests — NaN confidence propagation in gate evaluation, CLI risk_score priority override, pipeline stop/start race condition, dead rationale_parts code in afa_bridge, decision_path case inconsistency
Benchmarks Enabled: --benchmark-skip → --benchmark-disable — 15 benchmark tests now execute (0 skipped)
Bug-Hunt Session #8: 6 bugs fixed (3 MEDIUM, 3 LOW) with 8 regression tests — config utility_threshold YAML drop, drift histogram ignoring baseline range, Bayesian NaN propagation, consensus premature rejection, pipeline buffer_size=0 infinite loop, repository async lazy-load crash
Test Coverage: 1398 tests passing, 94.13% coverage
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.2.0)¶

Gap Closure Sprint (issues #24, #2, #7, #5, #8, #9): Major gap closure addressing RBAC enforcement, performance testing, override audit, DR drill, and monitoring dashboard gaps
Schema Alignment: Resolved three-way naming drift in telemetry override fields across schema YAML, OverrideInfo dataclass, and TelemetryEmitter payloads
New modules: src/rbac.py (RBAC enforcement engine), src/telemetry/alert.py (alerting rules), src/telemetry/metrics_server.py (metrics HTTP server)
New test suite: tests/test_schema_consistency.py (13 tests for schema-code consistency)
Wired RBAC into override workflow and pcw_decide decision flow
Added monitoring/ configs (Prometheus recording/alerting rules, Grafana dashboards)
Added to_schema_dict() on OverrideInfo for schema-compliant serialization
Added stale partial override Prometheus alert (AegisOverrideStalePartial)
128 new tests across RBAC, alerting, metrics server, schema consistency, DR, benchmarks
Test Coverage: 1309 tests passing, 94.17% coverage
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.1.0)¶

v1.0 SDK Merge (PR #23, commit cfa3783): AEGIS v1.0 Governance Decision SDK
New modules: src/config.py (AegisConfig), src/cli.py, src/aegis_governance/__init__.py (facade), src/aegis_governance/mcp_server.py
79 new tests: test_config.py, test_cli.py, test_facade.py, test_mcp_server.py
4 runnable examples in examples/
README rewritten for SDK positioning
pyproject.toml: Added [project.scripts] entries (aegis, aegis-mcp-server)
Updated §1 entry points to reflect SDK surfaces (CLI, MCP, Python import)
Test Coverage: 1172 tests passing, 94.61% coverage
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.0.0)¶

CLAUDE.md Audit & Regeneration: Full v4.0.0 rewrite with agentic AI hardening
Relocated 900-line changelog (v2.1–v3.36) to this file
NEW Section 11: Agentic AI Hardening (OWASP Agentic Top 10 mapping)
Added 5 developer playbooks (setup, quality gates, add gate, add workflow, optional deps)
Added Python code standards (type annotations, dataclass patterns, thread safety)
Added governance invariant protection protocol
Created 3 custom slash commands (/quality-gate, /sync-metrics, /governance-verify)
Enhanced ask-first triggers with agentic safety triggers
Updated audit: docs/claude/audits/aegis-root-v4.0.md
Reduced CLAUDE.md from 69KB (~1340 lines) to ~20KB (~620 lines)
Test Coverage: 1132 tests passing, 93.24% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.36.0)¶

Hybrid Bug-Hunt Session: Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
Bug Fixes (1 HIGH, 4 MEDIUM):
Bug 1 (MEDIUM): gates.py:77-86 - Missing epsilon validation
- Added validation in __init__ to reject non-positive epsilon_R/epsilon_P
- Raises ValueError with descriptive message preventing division by zero
- Regression tests: 4 tests for epsilon validation
Bug 2 (MEDIUM): gates.py:423,507 - Incorrect tail for negative trigger factor
- Risk gate now computes left-tail P(Δ≤t) when trigger_factor < 0
- Profit gate uses abs(trigger_factor) for symmetric both-tails check
- Regression tests: 3 tests for negative trigger factor behavior
Bug 3 (MEDIUM): gates.py:674 - Utility confidence ignores threshold
- Changed confidence calculation to use distance from threshold (margin = lcb - threshold)
- Sigmoid function now reflects how far above/below threshold the utility is
- Regression tests: 4 tests for confidence calculation
Bug 4 (MEDIUM): pipeline.py:282 - Queue not drained on stop
- Added queue drain loop before final flush in stop() method
- Prevents data loss when events are still queued during shutdown
- Regression tests: 1 test for queue drain on stop
Bug 5 (HIGH): encryption.py:543 - PII encryption bypass for lists
- _encrypt_dict didn't recurse into lists, leaving PII unencrypted
- Added _encrypt_list() method for recursive list processing
- Also fixed decryption.py with _decrypt_list() and _verify_list_integrity()
- Regression tests: 4 tests for list encryption/decryption
Test Coverage: 1053 tests (+16 from v3.34.0), 93.83% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)
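
Bug 5 is a traversal gap: a walker that only descends into dicts never reaches PII stored inside lists. The sketch below shows the recursive shape of the fix with a stand-in transform (upper-casing) in place of real encryption; transform_pii and its parameters are hypothetical and are not the _encrypt_dict or _encrypt_list methods:

```python
from typing import Any, Callable, FrozenSet


def transform_pii(
    node: Any,
    pii_fields: FrozenSet[str],
    transform: Callable[[str], str],
) -> Any:
    """Return a copy of node with transform applied to string values of PII keys."""
    if isinstance(node, dict):
        return {
            key: transform(value)
            if key in pii_fields and isinstance(value, str)
            else transform_pii(value, pii_fields, transform)
            for key, value in node.items()
        }
    if isinstance(node, list):
        # The branch that was missing: descend into every list element.
        return [transform_pii(item, pii_fields, transform) for item in node]
    return node  # scalars pass through unchanged


# Illustrative event; the field names are invented for this example.
event = {
    "actor_id": "alice",
    "events": [{"actor_id": "bob", "n": 1}, [{"actor_id": "carol"}]],
}
masked = transform_pii(event, frozenset({"actor_id"}), str.upper)
```

Building a new structure instead of mutating in place also leaves the input untouched if the transform raises partway through.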

Changelog (3.35.0)¶

Hybrid Bug-Hunt Session: Second session with Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
Bugs Identified: 6 total (1 HIGH, 5 MEDIUM) - fixed in v3.36.0
Test Coverage: 1037 tests, 94.11% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.34.0)¶

All Deferred Bugs Fixed: Resolved 17 deferred issues from hybrid bug-hunt sessions
MEDIUM Severity (1):
B2-1: emitter.py:334 - memory_sink unbounded growth - added maxlen parameter
LOW Severity (16):
B1-1,B1-2,B1-3: proposer.py - PERT validation and TOCTOU fixes
B2-4,B2-10,L47: gates.py, utility.py, complexity.py - validation improvements
B1-4,B1-5: bip322_provider.py, hybrid_kem.py - documentation and edge case handling
B2-11,B2-12: schema.py, afa_bridge.py - nested validation and hash confidence
B3-1,B3-2,B3-5,B3-7: consensus.py, override.py, durable.py, models.py - workflow validation and chain integrity
Test Coverage: 1037 tests (+81 from v3.33.0), 94.11% coverage (+0.48pp)
Quality Gates: All passing (mypy --strict, ruff, black, bandit)
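
B2-1 bounds an in-memory sink with a maxlen parameter. One stdlib way to get that behaviour is collections.deque, which evicts the oldest item once full. A sketch with a hypothetical BoundedMemorySink class; whether the real memory_sink uses a deque is an assumption:

```python
from collections import deque
from typing import Any, Deque, List


class BoundedMemorySink:
    """Keeps only the most recent `maxlen` events (illustrative class)."""

    def __init__(self, maxlen: int = 1000) -> None:
        self._events: Deque[Any] = deque(maxlen=maxlen)
        self.dropped = 0

    def emit(self, event: Any) -> None:
        if len(self._events) == self._events.maxlen:
            # The deque is full, so this append evicts the oldest event.
            self.dropped += 1
        self._events.append(event)

    def events(self) -> List[Any]:
        return list(self._events)
```

Counting evictions at append time makes it clear that the oldest event is the one lost, not the one being emitted.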

Changelog (3.33.0)¶

Workflow Bug Fixes (4 MEDIUM Severity): Continued bug-hunt session fixes
B3-1: consensus.py - Empty eligible_voters and actor_roles validation
Added validation to reject empty eligible_voters with positive quorum_percentage
Previously: quorum could never be met, causing confusing runtime behavior
Added warning when get_required_missing() called with empty actor_roles dict
Regression tests: 6 tests for empty voters and actor_roles scenarios
B3-2: override.py - is_expired TOCTOU race condition documentation
Enhanced is_expired docstring to document advisory-only nature (TOCTOU risk)
Added check_and_mark_expired() method for atomic check-and-update operation
Signature operations already perform atomic expiration check internally
Regression tests: 5 tests for expiration handling and atomic marking
B3-5: durable.py - resume_or_create() ID mismatch detection
Added strict_id: bool = True parameter to detect workflow ID mismatches
Raises ValueError when resumed workflow ID differs from requested ID (indicates caller bug)
Use strict_id=False for legacy behavior (logs warning, returns stored workflow)
Regression tests: 3 tests for strict_id behavior
B3-7: models.py - verify_chain_link() method for chain validation
Added verify_chain_link(previous_transition) to WorkflowTransition
Validates: hash integrity, workflow ID match, state continuity, temporal ordering
Returns (is_valid, error_message) tuple for detailed error reporting
Regression tests: 7 tests for chain link validation scenarios
Test Coverage: 956 tests passing (+10 from v3.32.0), 93.63% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)
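
B3-2 separates an advisory expiry check from an atomic check-and-update. The idea, sketched with a hypothetical OverrideWindow class (not the project's OverrideRequest), is to do the comparison and the state change under one lock so no other thread can act between them:

```python
import threading
from datetime import datetime, timezone
from typing import Optional


class OverrideWindow:
    """Illustrative holder of an expiry time with an atomic expire operation."""

    def __init__(self, expires_at: datetime) -> None:
        self._expires_at = expires_at
        self._expired = False
        self._lock = threading.Lock()

    def check_and_mark_expired(self, now: Optional[datetime] = None) -> bool:
        """Check expiry and record it in one critical section; expiry is sticky."""
        with self._lock:
            if self._expired:
                return True
            current = datetime.now(timezone.utc) if now is None else now
            if current >= self._expires_at:
                self._expired = True
            return self._expired
```

The optional now parameter exists only to make the sketch testable without waiting for real time to pass.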

Changelog (3.32.0)¶

Hybrid Bug-Hunt Session: Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
Lane A (Codex) - Bayesian Zero Override Fix (src/engine/bayesian.py:357-358):
Python's truthiness-based or fallback ignored explicit zero overrides in update_prior() (0.0 is falsy, so the prior was used instead)
Fix: Replaced current_mean or self.prior_mean with self.prior_mean if current_mean is None else current_mean
Regression test: test_update_prior_empty_observations_respects_overrides
Lane B (Claude) - 4 MEDIUM Severity Fixes:
B2-3: prometheus_exporter.py - Duplicate metric registration on multi-instantiation
- Added _get_or_create_counter(), _get_or_create_histogram(), _get_or_create_gauge() factory methods
- Makes metric registration idempotent via REGISTRY lookup
B3-3: override.py - reject() discards actor_id/reason parameters
- Added rejected_by, rejection_reason, rejected_at fields to OverrideRequest
- Updated reject() to record rejection metadata
- Added serialization/deserialization for rejection fields
B3-4: proposal.py - from_dict() loses prometheus exporter reference
- Added from_dict_with_exporter() factory method for DI during deserialization
- Added set_prometheus_exporter() method for post-construction injection
Test Coverage: 956 tests passing (+10 from v3.31.0), 93.63% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit, pip-audit)
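
The Lane A fix comes down to the difference between "falsy" and "None". A side-by-side sketch with two hypothetical functions (the real logic lives inside update_prior() and is not reproduced here):

```python
from typing import Optional


def resolve_mean_buggy(current_mean: Optional[float], prior_mean: float) -> float:
    # `or` falls back whenever the left side is falsy, which includes 0.0.
    return current_mean or prior_mean


def resolve_mean_fixed(current_mean: Optional[float], prior_mean: float) -> float:
    # Fall back only when no override was supplied at all.
    return prior_mean if current_mean is None else current_mean
```

With an explicit override of 0.0 and a prior of 5.0, the first version returns 5.0 and the second returns 0.0.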

Changelog (3.31.0)¶

Claude-GPT Dialogue Recommendations: Implemented 4 changes from multi-model consensus session
phi_S/phi_D Single Source of Truth:
Updated docs/architecture/afa-libertas-integration.md lines 753-754
Updated docs/architecture/repository-structure.md lines 546-549
All phi_S/phi_D values now reference schema/interface-contract.yaml as authoritative source
Corrected values: phi_S=100 (was 500), phi_D=2000 (was 10000)
KNOWN_ISSUES.md Cleanup:
Removed L45 from LOW Severity (intentional design, not a bug)
Moved L45 to "Intentional Patterns" section with explanation
Reclassified L7 from "Python limitation" to "Known Limitation" with HSM mitigation path
Added detailed HSM/KMS integration guidance for production deployments
docs-consistency.yml CI Workflow: New GitHub Actions workflow
Validates test count consistency across CLAUDE.md, README.md, ROADMAP.md, gap-analysis.md
Extracts version and coverage metrics from CLAUDE.md as source of truth
Weekly scheduled runs + push/PR triggers on documentation changes
Advisory warnings (non-blocking) for mismatches
Test Coverage: 946 tests passing, 93.48% coverage (unchanged)
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.30.0)¶

Deferred Bug Fix Complete: Fixed 2 remaining deferred issues (L44, L49)
L44 Fix: analyst.py:345 - Type coercion validation in _evaluate_utility_gate()
Added _coerce_to_float() method with try-except float() pattern
Validates all 8 numeric fields (mean, variance, lcb, ucb for both value/risk)
Regression tests: 9 tests for type coercion edge cases
L49 Fix: hybrid_provider.py:324 - Timing side-channel mitigation
Added audit_mode: bool = False parameter to HybridSignatureProvider
Detailed timing logs only when audit_mode=True for debugging
Generic error messages in default mode prevent timing leakage
Regression tests: 6 tests for audit_mode behavior
Research Verification: Fixes validated via ExaSearch
L44: try-except float() is standard Python type coercion idiom
L49: Configurable audit modes per Intel security guidance
Test Coverage: 946 tests passing (+15 from v3.29.0), 93.48% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.29.0)¶

Hybrid Bug-Hunt Session: Implementation of 4 priority fixes from bug-hunt plan v3.0
HIGH Severity Fixes (2):
H-WF-001: consensus.py:228 - All-abstain stuck state in QUORUM_MET
- When quorum met with only abstention votes, workflow stayed stuck indefinitely
- Fixed: Rejects immediately when quorum met with zero decisive votes
- Regression tests: 4 tests for partial quorum scenarios
H-WF-003: pipeline.py - Thread safety race conditions
- Stats counters not lock-protected, reset_stats() object replacement race
- Fixed: Added _stats_lock, get_stats() returns snapshot, stop() timeout warning
- Regression tests: 5 thread safety tests
MEDIUM Severity Fixes (3):
M24: hybrid_kem.py:279 - Empty plaintext now raises ValueError (with allow_empty opt-in)
M25: bip322_provider.py:109 - Keygen max retry limit (1000 attempts)
M-ENG-005: pcw_decide.py:187 - Added AttributeError catch for malformed context
Test Coverage: 931 tests passing (+15 from v3.28.0), 93.48% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.28.0)¶

Deferred Bug Cleanup: Fixed 16 issues from hybrid bug-hunt (5 MEDIUM, 11 LOW)
MEDIUM Severity Fixes (5):
M19: drift.py:68 - Added num_bins > 0 and epsilon > 0 validation
M20: bip322_provider.py:173 - Added empty message_hash validation
M21: kek_provider.py:224 - Documented version=0 alias for "current"
M22: pipeline.py:321 - Added _start_lock to fix TOCTOU race in start()
M23: key_store.py:516 - Added logging when key not found in record_usage()
LOW Severity Fixes (11):
L41-L43: gates.py - Input validation with warnings for out-of-range scores
L46: approver.py:139 - Removed redundant validation (covered by __post_init__)
L48: proposer.py:152 - Better error for already-submitted proposals
L50: hybrid_kem.py:279 - Warning for empty plaintext in encrypt()
L51: mldsa.py:98 - Empty message validation in sign()
L52: hybrid_provider.py:225 - Empty message_hash validation
L53: afa_bridge.py:467 - Positive limit validation in get_decision_history()
L54: consensus.py:217 - Fixed all-abstain stuck state
L55: bayesian.py:362 - Extracted magic number to _FALLBACK_STD constant
Remaining Deferred (2): L45 (intentional), L47 (extreme values valid)
Test Coverage: 916 tests passing (+4 regression tests), 93.39% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)
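
The TOCTOU race addressed by M22 is the classic check-then-act problem: two threads can both see "not running" before either sets the flag. A sketch of the lock-based remedy, assuming a boolean running flag; apart from `_start_lock` and `start()`, the names here (`Pipeline`, `_running`, `start_count`) are hypothetical.

```python
import threading


class Pipeline:
    """Illustrative pipeline whose start() is safe to call concurrently."""

    def __init__(self) -> None:
        self._running = False
        self._start_lock = threading.Lock()
        self.start_count = 0  # counts how many times startup work actually ran

    def start(self) -> bool:
        # The check and the state change happen atomically under the lock,
        # so only one caller can move the pipeline from stopped to running.
        with self._start_lock:
            if self._running:
                return False
            self._running = True
            self.start_count += 1
            return True


pipeline = Pipeline()
results: list[bool] = []
threads = [
    threading.Thread(target=lambda: results.append(pipeline.start()))
    for _ in range(16)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```
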

Changelog (3.27.0)

Hybrid Bug-Hunt Session: Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
HIGH Severity Fixes (2 - Codex Lane A):
H1: pcw_decide.py:243 - Human approval bypass when all gates pass
- CRITICAL: pcw_decide() returned PROCEED even with requires_human_approval=True
- Added human_required gating to enforce PAUSE/ESCALATE
H2: consensus.py:337 - Workflow ID collision with ProposalWorkflow
- Namespaced to consensus:{proposal_id} to avoid persistence conflicts
New Issues Identified (Claude Lane B): 5 MEDIUM, 15 LOW (documented in KNOWN_ISSUES.md)
Test Coverage: 912 tests passing (+2 regression tests), 93.48% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)
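
The two HIGH fixes in this entry can be sketched as below, under stated assumptions. `decide()` is a hypothetical, much-reduced stand-in for `pcw_decide()`: its parameters, the choice of PAUSE over ESCALATE, and HALT on gate failure are illustrative. The `consensus:{proposal_id}` format is the one named in H2.

```python
from enum import Enum


class Decision(Enum):
    PROCEED = "PROCEED"
    PAUSE = "PAUSE"
    HALT = "HALT"


def decide(all_gates_pass: bool, requires_human_approval: bool) -> Decision:
    """Reduced decision logic showing the human-approval gate."""
    if not all_gates_pass:
        return Decision.HALT  # assumption: failed gates halt in this sketch
    if requires_human_approval:
        # Passing every gate is not sufficient when a human must sign off.
        return Decision.PAUSE
    return Decision.PROCEED


def consensus_workflow_id(proposal_id: str) -> str:
    """Namespace the ID so it cannot collide with a ProposalWorkflow record."""
    return f"consensus:{proposal_id}"
```

Before the fix, the equivalent of the second branch was missing, so a proposal that passed all gates proceeded without waiting for approval.
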

Changelog (3.26.0)

Rigor Protocol Phase 3: Fixed 13 remaining issues (5 MEDIUM, 8 LOW)
MEDIUM Severity Fixes (5):
M14: proposal.py:359 - Documented is_terminal excludes ROLLED_BACK
M15: pipeline.py:90 - PipelineStats.errors bounded to deque(maxlen=1000)
M16: drift.py:279 - calibrate_thresholds() requires at least 2 values
M17: override.py:976 - Documented from_dict() maintenance contract
M18: pipeline.py:390 - Replaced broad except Exception in validate_timestamp()
LOW Severity Fixes (8):
L33: approver.py:24 - Added __post_init__ validation for ApprovalVote.vote
L34: emitter.py:164 - Fixed auto-linking parent_event_id (correlation chain only)
L35: schema.py:240 - Removed redundant local import dataclasses
L36: proposal.py:58 - Added __post_init__ validation to ProposalMetadata
L38: gates.py:62 - Enhanced GateResult.margin docstring
L39: pcw_decide.py:28 - Moved UtilityComponents import to module level
L40: pcw_decide.py:471 - Added debug logging to quick_risk_check()
M13: proposer.py:118 - Enhanced PERT validation comment (reclassified)
Test Updates: Updated test_cast_vote_invalid and test_stats_defaults for new validation
Test Coverage: 910 tests passing, 93.48% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Remaining Issues: 2 (M6 non-relative imports; L7 secure memory erase)
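
As an illustration of M15, a `deque` with `maxlen` drops the oldest entry on overflow, so the error log cannot grow without bound. This is a sketch only: the `errors` field and the limit of 1000 come from the entry above, while the element type and the rest of the dataclass are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field

MAX_ERRORS = 1000  # the maxlen named in M15


@dataclass
class PipelineStats:
    """Reduced stats object; the real class likely has more fields."""
    # default_factory gives every instance its own bounded deque.
    errors: deque = field(default_factory=lambda: deque(maxlen=MAX_ERRORS))


stats = PipelineStats()
for i in range(1500):
    stats.errors.append(f"error {i}")

# Only the most recent MAX_ERRORS entries are retained.
oldest, newest = stats.errors[0], stats.errors[-1]
```
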

Changelog (3.25.0)

Rigor Protocol Phase 2: Fixed 17 issues (4 MEDIUM, 13 LOW) with 25 regression tests
Test Coverage: 910 tests passing (+25), 93.48% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.24.0)

Rigor Protocol Phase 1: Fixed 7 issues (2 MEDIUM, 4 LOW, 1 documentation)
Claude-GPT Dialogue: Resolved M15 retry logic architecture (Hybrid approach)
Test Coverage: 885 tests passing, 93.48% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.23.0)

Minor Documented Issues Fixed: 11 issues from KNOWN_ISSUES.md resolved
Test Coverage: 885 tests passing (+18 from 867), 93.48% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.22.0)

Hybrid Bug-Hunt Session Complete: Codex gpt-5.2-codex (3 iterations) + 3 Claude debugger agents
Test Coverage: 867 tests passing (+9 from 858), 93.79% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.21.0)

Hybrid Bug-Hunt Session: 3 Claude debugger agents reviewing 42 source files in parallel
Test Coverage: 858 tests passing (+4 from 854), 93.79% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.20.0)

All LOW Severity Bugs Fixed: Complete resolution of L1-L9 from hybrid bug-hunt
Test Coverage: 854 tests passing (+8 from 846), 93.79% coverage
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.19.0)

Hybrid Bug-Hunt Session: Codex gpt-5.2-codex + 3 Claude debugger agents
Test Coverage: 845 tests passing (+6 from 839)
Quality Gates: All passing (mypy --strict, ruff, black)

Changelog (3.18.0)

Quality Gate Fixes: Full quality-gate execution with all 8 phases
Test Coverage: 839 tests passing
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.17.0)

Bug Fixes (KNOWN_ISSUES.md): Fixed 4 MEDIUM severity bugs from hybrid bug-hunt
Test Coverage: 839 tests passing (+2 from 837)
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.16.0)

Bug Hunt Session: Hybrid bug-hunt with 3 Claude debugger agents
Test Coverage: 837 tests passing (maintained)
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.15.0)

Bug Fixes (Rigor Protocol): Fixed 9 except Exception patterns from hybrid bug-hunt session
Test Coverage: 837 tests passing (+9 from 828)
Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.14.0)

Bug Fixes (Rigor Protocol): Fixed 3 MEDIUM severity bugs from hybrid bug-hunt session
Test Coverage: 828 tests passing (+7 from 821)
Quality Gates: All passing (mypy --strict, ruff, black)

Changelog (3.13.0)

Mathematical Coherence Review: Addressed 4 design decisions from rigor protocol review
Test Coverage: 821 tests passing (+14 from 807), 93.34% coverage

Changelog (3.12.0)

Optional Dependencies Installed: Full post-quantum cryptography and BIP-322 now active
Test Coverage Milestone: All tests now pass with 0 skipped (821 passed)

Changelog (3.11.0)

Mathematical Coherence Fixes: Implemented 4 critical fixes from multi-model coherence review
Test Coverage: Added 20 new tests (506 total passing, 282 skipped)

Changelog (3.10.0)

ADR Consolidation Complete: Finished consolidating ADR directories

Changelog (3.9.0)

Logic Coherence Fixes: Public API improvement, factory method, overflow protection

Changelog (3.8.0)

Documentation Synchronization & Future Work Roadmap

Changelog (3.7.0)

GAP-L1 PHASE 1 IMPLEMENTED: Prometheus Metrics Foundation

Changelog (3.6.0)

Index & Cross-Reference Update: Systematic update of all TOCs, indexes, and cross-references

Changelog (3.5.0)

Documentation Enhancement & Quality Gates: CI/CD badges, test count methodology, dependency visualization

Changelog (3.4.0)

Repository Nomenclature Update: Clarified AEGIS vs Guardrails naming

Changelog (3.3.0)

GAP-Q2 PHASE 2 COMPLETE: Full post-quantum encryption implementation
TEST COVERAGE MILESTONE: 846 tests passing (444 new), 93.60% coverage

Changelog (3.2.0)

GAP-Q1 IMPLEMENTED: Post-quantum hybrid signatures (Ed25519 + ML-DSA-44)

Changelog (3.1.0)

GAP-M4 IMPLEMENTED: Full BIP-322 signature format support

Changelog (3.0.0)

AEGIS v1.0.0 RELEASED: Production-ready release with full CI/CD validation

Changelog (2.9.0)

Documentation Synchronization: Comprehensive audit and alignment of all documentation

Changelog (2.8.0)

Repository Migration: Restructured for implementation-ready architecture

Changelog (2.7.0)

AEGIS Integration: Unified five frameworks into Autonomous Engineering Governance System

Changelog (2.6.0)

Documentation synchronization & cleanup review completed

Changelog (2.5.0)

Added EPCC methodology documentation

Changelog (2.4.0)

Fixed markdown linting issues

Changelog (2.3.0)

Documentation synchronization audit completed

Changelog (2.2.0)

Added framework comparison analysis

Changelog (2.1.0)

Removed [PROVISIONAL] tags - tooling now configured