AEGIS CLAUDE.md Changelog

Relocated from: CLAUDE.md Section 9 (Change Management)
Purpose: Preserve full version history while keeping CLAUDE.md concise for agent system prompt consumption
Coverage: v2.1.0 (2025-12-26) through v4.5.59 (2026-02-25)
Active CLAUDE.md: See /CLAUDE.md for current rolling changelog (latest 2 versions)


v4.5.59 — Utility Scale & Sign Convention Fixes (2026-02-25)

Scope: Lambda auto-utility scale mismatch and sign convention fixes, E2E validation
Test Coverage: 3041 tests passing, ~94.9% coverage (2 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

  • Complexity tax scale fix: Zeroed ComplexityBreakdown(static=0.0, dynamic=0.0) because the phi_S=100.0 $/pt coefficient overwhelmed normalized [0,1] profit deltas (LCB=-64.9 for a 0.10 profit improvement)
  • Risk term sign fix: Zeroed risk delta via _make_pert(0.0) because risk_term = kappa * delta_R produces negative values when risk decreases (delta_R < 0), counteracting the profit gain (LCB=-0.015 for a beneficial proposal)
  • Design rationale: Risk and complexity gates evaluate independently; the utility gate now measures pure profit uplift, avoiding scale mismatches between normalized advisor inputs and dollar-denominated calculator coefficients
  • E2E validation: 8 API scenarios via browser fetch (PROCEED, PAUSE, HALT, multi-domain) + 3 full wizard walkthroughs (Engineering low-impact, Engineering worst-case, Life Decision) confirm correct behavior

Deployed

  • Lambda redeployed via aegis-deploy.yml workflow_dispatch (dev stage)
  • CI green: Python CI + docs deploy + CDK deploy + smoke test all passing

v4.5.58 — Advisor Utility & Novelty Gate Fixes (2026-02-25)

Scope: Lambda auto-utility computation, advisor novelty step reframe
Test Coverage: 3041 tests passing, ~94.9% coverage (2 skipped, +12 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

  • Lambda auto-utility: Added _make_pert() and _compute_utility() — synthesizes PERT three-point estimates from flat risk/profit/complexity parameters, auto-computes UtilityResult when not explicitly provided. Utility gate now produces real values instead of N/A for advisor proposals.
  • Advisor novelty reframe: Step 7 changed from "How new is this?" to "How well-documented is this type of change?" — inverted value mapping so well-documented precedent = high score = passes gate. Radio values: 0.95/0.88/0.72/0.40 (was 0.2/0.5/0.7/0.95). Tool modifiers: +0.02/-0.03/-0.08 (was 0/+0.1/+0.2).
  • Gate explanations: Novelty pass/fail messages updated to match precedent framing
  • Review screen: Label changed from "Novelty" to "Precedent"

v4.5.57 — Advisor & Lambda CORS Fixes (2026-02-25)

Scope: Advisor evaluate button fix, Lambda CORS headers, custom domain migration
Test Coverage: 3029 tests passing, ~94.8% coverage (2 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

  • Advisor evaluate() DOM collision: Renamed evaluate() to runEvaluation() — inline onclick handler resolved to document.evaluate() (XPath), not the custom function (3 call sites)
  • Lambda CORS headers: Added Access-Control-Allow-Origin: *, Allow-Headers, Allow-Methods to _response() — API Gateway proxy integration passes Lambda response as-is, so CORS headers must be in Lambda response (not just OPTIONS preflight)
  • Custom domain migration: Updated all undercurrentai.github.io URLs to aegis.undercurrentholdings.com in advisor HTML (3 links), pyproject.toml Documentation URL, and docs/api/rest.md CORS headers table
  • CDK deploy: Redeployed Lambda via aegis-deploy.yml workflow — smoke test passed
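
The CORS fix in this release can be sketched as follows. This is a minimal illustration, not the project's code: `build_response` is a hypothetical stand-in for `_response()`, and the `Allow-Headers` and `Allow-Methods` values are assumptions (the changelog only states `Access-Control-Allow-Origin: *`).

```python
import json
from typing import Any, Dict

# Assumed values: only "Access-Control-Allow-Origin: *" is stated in the changelog.
CORS_HEADERS: Dict[str, str] = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Content-Type,X-Api-Key",
    "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
}


def build_response(status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Build an API Gateway proxy-integration response (hypothetical helper).

    With proxy integration the Lambda response is passed through as-is,
    so the CORS headers have to be attached to every response here.
    """
    headers = {"Content-Type": "application/json"}
    headers.update(CORS_HEADERS)
    return {
        "statusCode": status_code,
        "headers": headers,
        "body": json.dumps(body),
    }
```

Attaching the headers in one shared helper means error responses carry them too, which is usually what a browser client needs in order to read the error body.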

v4.5.56 — Bug Hunt #45 (2026-02-25)

Scope: Hybrid bug hunt (Codex gpt-5.3-codex xhigh + 3 Claude sweep agents)
Test Coverage: 3029 tests passing, ~94.8% coverage (2 skipped, +31 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

  • BH45-Codex-M1: ProposalWorkflow.transition_to() metadata stored by reference — copy.deepcopy fix prevents caller mutation corrupting audit trail
  • BH45-M1: MCP risk_score eager evaluation — conditional fallback matching Lambda BH13-M1 fix (transport parity)
  • BH45-M2: BayesianPosterior.update_prior missing current_mean/current_std finiteness validation — added isfinite + validate_positive
  • BH45-T1: BayesianPosterior.update_prior current_mean missing bool guard (ultrathink finding) — added isinstance(bool) check
  • BH45-L1: PipelineConfig missing retention_days/drift_window_size validation — added type/range checks matching buffer_size pattern
  • BH45-L2: PipelineConfig pii_encryption_on_error/storage_on_error accept arbitrary strings — added enum validation (PII safety)
  • Deferred: 3 LOW findings (drift next_steps recomputation, schema_revisions counter, consensus QUORUM_MET semantics)
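
The store-by-reference problem behind BH45-Codex-M1 can be illustrated with a small sketch. `AuditTrail` and its fields are hypothetical names chosen for this example; only the use of `copy.deepcopy` comes from the changelog entry.

```python
import copy
from typing import Any, Dict, List


class AuditTrail:
    """Hypothetical stand-in for a workflow that records transition metadata."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def record(self, state: str, metadata: Dict[str, Any]) -> None:
        # Deep copy so nested lists/dicts are not shared with the caller.
        self.entries.append({"state": state, "metadata": copy.deepcopy(metadata)})


trail = AuditTrail()
meta = {"approver": "alice", "tags": ["urgent"]}
trail.record("approved", meta)

# The caller mutates its own dict afterwards; the recorded entry is unaffected.
meta["tags"].append("tampered")
```

A shallow copy would not be enough here, because the nested `tags` list would still be shared between the caller and the stored entry.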

v4.5.55 — Scoring Guide MCP Tool + Advisor v2 (2026-02-25)

Scope: Scoring Guide MCP tool and Advisor v2 rewrite
Test Coverage: 2998 tests passing, ~94.8% coverage (2 skipped, +31 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

  • Scoring Guide MCP Tool: New aegis_get_scoring_guide tool with 5-domain derivation guidance (trading, cicd, moderation, agents, generic); surfaces parameter formulas, range guides, common mistakes, and worked examples through the MCP protocol
  • Advisor v2: Complete rewrite with domain funnel (6 domains), 8-step factual scoring rubric replacing vibes-based sliders, real API calls with provisioned demo key
  • Ultrathink Fixes: Defensive copy for get_scoring_guide() return value, HTML escaping for gate detail values, null guard for override_requires

v4.5.54 — SaaS Commercialization Sprint (2026-02-24)

Scope: Full commercial readiness transformation — API key auth, customer provisioning, docs site, PyPI workflows
Test Coverage: 2967 tests passing, ~94.8% coverage (2 skipped, +9 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

  • Phase 1 — API Key Auth: CDK lambda_stack.py switched from IAM to API key auth + usage plans; per-stage throttling (dev 50 req/s, staging 200 req/s, prod 500 req/s); /health fully public; CORS widened to *; optional custom domain via CDK context; UsagePlanId CfnOutput added
  • Phase 1 — Tenant Context: Lambda handler extracts tenant_id from requestContext.identity.apiKeyId; injects _tenant_id and _request_id into response body; adds X-AEGIS-Tenant and X-AEGIS-Request-Id headers; 404 responses include tenant context (UT-1 fix)
  • Phase 1 — Provisioning: New scripts/provision-customer.py — boto3 script for API key creation + usage plan attachment + usage queries; show-once key value; rollback on failure; env var fallbacks
  • Phase 2 — OpenAPI: New docs/api/openapi.yaml (OpenAPI 3.1.0) — 3 endpoints, 9 component schemas, ApiKeyAuth security scheme
  • Phase 2 — REST Docs: New docs/getting-started/quickstart-rest.md and docs/api/rest.md — curl examples, field tables, error codes
  • Phase 3 — Docs Site: New mkdocs.yml + 10 new docs pages (index, installation, SDK/CLI/REST/MCP quickstarts, onboarding, GitHub Action, AI governance); docs-deploy.yml GitHub Pages workflow
  • Phase 3 — PyPI: New .github/workflows/pypi-publish.yml — OIDC trusted publishing; multi-version smoke test (3.9, 3.11, 3.12)
  • Phase 4 — Polish: New SECURITY.md (vulnerability disclosure), CHANGELOG.md (customer-facing); pyproject.toml bumped to v1.1.0 with mkdocs-minify-plugin + pymdown-extensions docs deps; Documentation + Changelog URLs added
  • New files: 22 files created (scripts, docs, workflows, config)
  • Governance invariants preserved: src/engine/, src/integration/pcw_decide.py, src/crypto/, schema/interface-contract.yaml, schema/rbac-definitions.yaml, .github/workflows/python-ci.yml all untouched
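
The tenant-context step in Phase 1 can be sketched roughly as below. The function names are hypothetical and the handling of a missing API key ID (skipping the tenant fields) is an assumption; the event path `requestContext.identity.apiKeyId`, the `_tenant_id`/`_request_id` body fields, and the two header names come from the changelog entry.

```python
from typing import Any, Dict, Optional, Tuple


def extract_tenant_id(event: Dict[str, Any]) -> Optional[str]:
    """Return requestContext.identity.apiKeyId if it is a non-empty string."""
    request_context = event.get("requestContext")
    if not isinstance(request_context, dict):
        return None
    identity = request_context.get("identity")
    if not isinstance(identity, dict):
        return None
    api_key_id = identity.get("apiKeyId")
    if isinstance(api_key_id, str) and api_key_id:
        return api_key_id
    return None


def attach_tenant_context(
    body: Dict[str, Any],
    headers: Dict[str, str],
    tenant_id: Optional[str],
    request_id: str,
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Return copies of body and headers with tenant/request context added."""
    new_body = dict(body)
    new_headers = dict(headers)
    new_body["_request_id"] = request_id
    new_headers["X-AEGIS-Request-Id"] = request_id
    if tenant_id is not None:  # assumption: omit tenant fields when unknown
        new_body["_tenant_id"] = tenant_id
        new_headers["X-AEGIS-Tenant"] = tenant_id
    return new_body, new_headers
```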

v4.5.53 — Transport Parity Fix (2026-02-24)

Scope: Comprehensive transport parity audit closing 15 of 22 gaps across CLI, MCP, and Lambda
Test Coverage: 2958 tests passing, ~94.8% coverage (2 skipped, +35 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changes

  • GAP 1: metadata input extraction + validation in CLI and MCP (parity with Lambda)
  • GAP 2-4 (CRITICAL): requires_human_approval, time_sensitive, reversible bool flags in MCP — previously missing, silently bypassing human oversight controls
  • GAP 6-7: MCP inputSchema updated with 5 new documented properties (bool flags, metadata, session_id)
  • GAP 8: Lambda telemetry_emitter wired via _wire_lambda_telemetry() helper
  • GAP 12: MCP estimated_impact now strict — rejects non-string values (parity with CLI/Lambda)
  • GAP 15: CLI session_id changed from static "cli-session" to dynamic uuid.uuid4()
  • GAP 17: CLI SSRF validation via shared telemetry/url_validation.py module
  • GAP 18-19: MCP output now includes constraints and override_requires fields
  • GAP 20: MCP output now includes timestamp field
  • GAP 21: MCP output now includes per-gate confidence field
  • GAP 22: Lambda shadow drift dict includes message field

New Module

  • src/telemetry/url_validation.py — shared SSRF-safe URL validation extracted from MCP server; uses resolve-then-validate pattern with not addr.is_global (consistent across Python 3.9-3.12)
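
A resolve-then-validate check of the kind described for this module might look like the sketch below. It is not the contents of `url_validation.py`: the function names and the injectable `resolver` parameter are assumptions made so the example is testable without network access. Only the `not addr.is_global` rejection rule is taken from the entry.

```python
import ipaddress
import socket
from typing import Callable, List
from urllib.parse import urlparse


def is_public_address(host_ip: str) -> bool:
    """Return True only for globally routable addresses (fail closed on bad input)."""
    try:
        addr = ipaddress.ip_address(host_ip)
    except ValueError:
        return False
    return addr.is_global


def resolve_host(hostname: str) -> List[str]:
    """Resolve a hostname to the distinct IP addresses it maps to."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def validate_url(url: str, resolver: Callable[[str], List[str]] = resolve_host) -> bool:
    """Resolve first, then reject if ANY resolved address is not global."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        addresses = resolver(parsed.hostname)
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
    return bool(addresses) and all(is_public_address(a) for a in addresses)
```

Checking every resolved address, rather than only the first, avoids accepting a hostname that maps to both a public and an internal address.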

v4.5.52 — Bug Hunt #44 (2026-02-23)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2923 tests passing, ~94.8% coverage (2 skipped, +15 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (4 total: 1 Codex + 2M, 1L)

  • BH44-Codex-M1 (Codex) — crypto/schema_signer.py: sign_tools_list() chain state committed before manifest signing — partial state corruption on failure; reordered to sign manifest before committing chain state
  • BH44-M1 (Claude) — actors/calibrator.py: _NONNEGATIVE_GATE_PARAMS incorrectly includes utility_threshold — rejects negative values that GateEvaluator accepts; moved to appropriate parameter set
  • BH44-M2 (Claude) — actors/proposer.py: create_draft() does not catch TypeError from _validate_pert_estimates() — raw exception escapes instead of ActionResult failure; added try/except TypeError wrapper
  • BH44-L1 (Claude) — integration/pcw_decide.py: _evaluate_drift_policy returns aliased constraints list when drift_monitor is None — inconsistent with non-None path which returns a new list; changed to return list(constraints) defensive copy

v4.5.51 — Bug Hunt #43 (2026-02-23)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2908 tests passing, ~94.8% coverage (2 skipped, +31 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (11 total: 2 Codex + 5M, 4L)

  • BH43-Codex-M1 (Codex) — actors/analyst.py: Analyst missing try/except around gate evaluations — unhandled exceptions crash instead of returning ActionResult failure; wrapped gate evaluation calls in try/except with structured error return
  • BH43-Codex-M2 (Codex) — actors/analyst.py: Analyst missing else TypeError for non-list quality_subscores — silently ignores invalid type; added explicit isinstance(list) guard with TypeError for non-list/non-None input
  • BH43-M1 (Claude) — cli.py: CLI quality_subscores=null raises TypeError — Lambda/MCP default to [0.7, 0.7, 0.7] but CLI crashes on None; added null-coalesce parity with other transports
  • BH43-M2 (Claude) — engine/utility.py: ComplexityBreakdown accepts bool fields without validation — True/False silently coerce to 1/0 via bool-is-int; added isinstance(bool) guard in __post_init__
  • BH43-M3 (Claude) — engine/utility.py: UtilityCalculator.calculate() value_variance negative silently floored to 0 — produces overly optimistic LCB; changed to raise ValueError for negative variance
  • BH43-M4+M5 (Claude) — telemetry/pipeline.py: TelemetryPipeline.ingest() no defensive copy — bidirectional aliasing between caller dict and internal buffer; caller mutations corrupt queued events; added copy.copy() on ingest
  • BH43-L1 (Claude) — cli.py: CLI metric defaults ignores simplified alias when canonical key is null — dict.get("key", default) returns None (not default) for explicit JSON null; added explicit null-check before alias fallback
  • BH43-L2 (Claude) — engine/utility.py: UtilityCalculator.calculate() value_low_conf/opex_delta NaN/Inf not validated — non-finite values propagate through utility computation; added math.isfinite() guards
  • BH43-L3 (Claude) — engine/utility.py: UtilityCalculator.calculate() covariance terms NaN/Inf not validated — same pattern as L2; added math.isfinite() guards for cov_risk_value and cov_complexity_quality
  • BH43-L4 (Claude) — workflows/proposal.py: ProposalWorkflow.from_dict() uses cls() instead of cls.__new__() — constructor generates spurious created_at/updated_at timestamps that overwrite deserialized values; switched to cls.__new__(cls) pattern (matching ConsensusWorkflow)
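
The explicit-null behavior behind BH43-M1 and BH43-L1 is easy to reproduce. The helper name `get_or_default` is hypothetical; the `[0.7, 0.7, 0.7]` default is the one quoted in BH43-M1.

```python
import json
from typing import Any, Dict


def get_or_default(data: Dict[str, Any], key: str, default: Any) -> Any:
    """Return default when the key is missing OR explicitly null."""
    value = data.get(key)
    return default if value is None else value


payload = json.loads('{"quality_subscores": null}')

# dict.get only applies its default when the key is absent,
# so an explicit JSON null comes back as None.
naive = payload.get("quality_subscores", [0.7, 0.7, 0.7])

# The explicit None check treats null the same as a missing key.
safe = get_or_default(payload, "quality_subscores", [0.7, 0.7, 0.7])
```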

Quality Gate — Ultrathink Findings (1)

  • QG-T1 (workflows/proposal.py): ProposalWorkflow.from_dict() missing evaluation_result=None for workflows that had no evaluation — deserialized workflows without evaluation results got stale/incorrect default; added explicit None assignment

v4.5.50 — Bug Hunt #42 (2026-02-23)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2877 tests passing, 94.81% coverage (2 skipped, +29 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (13 total: 3 Codex + 6M, 2L + 2 ultrathink)

  • BH42-M1 (Claude) — engine/complexity.py: ComplexityDecomposer mutable default weights dict — self.weights = weights or self.DEFAULT_WEIGHTS created shared reference; mutation of instance weights corrupted class-level DEFAULT_WEIGHTS for all future instances; changed to dict(weights) if weights else dict(self.DEFAULT_WEIGHTS) (defensive copy)
  • BH42-M2 (Claude) — actors/calibrator.py: novelty_k was in _NONZERO_GATE_PARAMS (rejects ==0) but GateEvaluator requires strictly positive; negative values trivially bypassed novelty gate; moved to _POSITIVE_GATE_PARAMS
  • BH42-M3 (Claude) — telemetry/prometheus_exporter.py: NaN/Inf latency permanently corrupts Prometheus histogram _sum (irreversible without process restart); added math.isfinite() guards to record_gate_evaluation(), record_decision(), and MetricsTimer.__exit__()
  • BH42-M4 (Claude) — telemetry/prometheus_exporter.py: NaN KL divergence set via set_kl_divergence() disables drift alerting permanently; added math.isfinite() guard with warning log
  • BH42-M5 (Claude) — telemetry/emitter.py: root emit() method used correlation_id or self.default_correlation_id — empty string "" (valid sentinel) replaced by default; changed to is not None pattern for both correlation_id and parent_event_id
  • BH42-M6 (Claude) — lambda_handler.py: shadow_mode = body.get("shadow_mode") is True silently ignored non-bool values (int 1, string "true"); added explicit isinstance(bool) validation matching other bool fields (requires_human_approval, reversible)
  • BH42-L1 (Claude) — integration/pcw_decide.py: risk_posterior = risk_gate.posterior_probability or 0.0 — zero posterior probability treated as falsy and replaced; changed to if is not None else 0.0
  • BH42-L2 (Claude) — integration/afa_bridge.py: same posterior_probability or 0.0 pattern as BH42-L1; fixed to is not None
  • BH42-Codex-M1 (Codex) — integration/afa_bridge.py: required = context.get("required_authorizations") or [] — explicit falsy values (empty list, 0) coerced to default; changed to explicit None check
  • BH42-Codex-M2 (Codex) — workflows/consensus.py: ConsensusConfig.allow_abstain accepted non-bool via coercion (1, "yes"); added isinstance(self.allow_abstain, bool) guard in __post_init__
  • BH42-Codex-L1 (Codex) — workflows/persistence/repository.py: concurrent save_checkpoint() calls caused IntegrityError on checkpoint_number uniqueness collision; added bounded retry loop (3 attempts) with _is_checkpoint_number_conflict() classifier
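
The shared-reference bug in BH42-M1 and its defensive-copy fix can be shown in a few lines. `Decomposer` and the weight keys are hypothetical; the `dict(weights) if weights else dict(self.DEFAULT_WEIGHTS)` expression is the one quoted in the entry.

```python
from typing import Dict, Optional


class Decomposer:
    """Hypothetical class with a class-level default weights dict."""

    DEFAULT_WEIGHTS: Dict[str, float] = {"static": 0.5, "dynamic": 0.5}

    def __init__(self, weights: Optional[Dict[str, float]] = None) -> None:
        # Copy in both branches so no instance shares a dict with the
        # class-level default or with the caller's argument.
        self.weights = dict(weights) if weights else dict(self.DEFAULT_WEIGHTS)


first = Decomposer()
first.weights["static"] = 0.9  # mutate one instance only

second = Decomposer()  # still sees the pristine defaults
```

With `self.weights = weights or self.DEFAULT_WEIGHTS`, the mutation on `first` would have changed `DEFAULT_WEIGHTS` itself and leaked into `second`.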

Quality Gate — Ultrathink Findings (2)

  • QG-T1 (aegis_governance/mcp_server.py): MCP _tool_evaluate_proposal() used shadow_mode = arguments.get("shadow_mode") is True — transport parity violation with Lambda; added isinstance(bool) validation returning error dict
  • QG-T2 (actors/analyst.py): 6 instances of gate_result.confidence or 0.0, where confidence=0.0 was treated as falsy; replaced all with if gate_result.confidence is not None else 0.0
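
Several entries in this release replace `x or default` with an explicit `is not None` check. The sketch below uses hypothetical helper names and non-zero defaults chosen for illustration, so that the difference between the two forms is visible.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def coalesce_with_or(value: Optional[T], default: T) -> T:
    """Buggy pattern: any falsy value ("", 0, 0.0, []) is replaced."""
    return value or default


def coalesce_explicit(value: Optional[T], default: T) -> T:
    """Fixed pattern: only None is replaced; falsy values are preserved."""
    return value if value is not None else default


# An empty-string correlation ID is a valid sentinel but gets overwritten by `or`.
overwritten = coalesce_with_or("", "default-id")
preserved = coalesce_explicit("", "default-id")
```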

v4.5.49 — Bug Hunt #41 (2026-02-22)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2848 tests passing, 94.82% coverage (2 skipped, +33 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (7 total: 1 Codex + 4M, 2L)

  • BH41-M1 (Claude) — actors/analyst.py: _run_quality_analysis() included None elements in quality_subscores without saw_non_null guard; [None, 0.8] silently dropped None but [None, None] defaulted inconsistently; added saw_non_null filter matching afa_bridge pattern so all-None list defaults and partially-valid list filters correctly
  • BH41-M2 (Claude) — engine/validation.py: validate_range() default check_nan=False — NaN inputs silently passed range checks; changed default to True; all production callers already passed check_nan=True explicitly, so no behavior change for existing call sites
  • BH41-M3 (Claude) — crypto/schema_signer.py: create_tool_statement() mutated _prev_digests on each call — partial sign_statement() failure left chain in inconsistent state; sign_tools_list() now collects pending_digests first, commits atomically only after all sign_statement() calls succeed
  • BH41-M4 (Claude) — workflows/consensus.py: get_required_missing() included DEFER voters in voted_roles accumulation — a single DeferredVoter trivially satisfied role coverage for any role; DEFER now excluded consistent with votes_cast exclusion
  • BH41-L1 (Claude) — actors/calibrator.py: list_proposals() iterated live proposal dict while serializing enum .value — concurrent propose() mutations caused AttributeError on partially-mutated proposals; snapshot status enum values under self._lock
  • BH41-L2 (Claude) — telemetry/emitter.py: emit_mcp_invocation() used correlation_id or self.default_correlation_id — empty-string "" (valid sentinel) silently replaced with default; changed to Optional[str] = None with explicit is not None guard
  • BH41-Codex (Codex) — engine/complexity.py: ComplexityDecomposer.decompose() accepted bool as complexity_floor; True/False passed validate_range numeric check via bool-is-int coercion; added isinstance(complexity_floor, bool) guard with TypeError before validate_range
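
BH41-M2 and BH41-Codex both concern values that slip through a plain range comparison. The following is a sketch of the two guards, not the project's `validate_range`: the name `validate_range_sketch`, its signature, and its error messages are assumptions.

```python
import math
from typing import Union

Number = Union[int, float]


def validate_range_sketch(
    value: Number, low: float, high: float, name: str = "value", check_nan: bool = True
) -> float:
    """Range check with a bool guard and an optional NaN check (hypothetical)."""
    # bool is a subclass of int, so True/False would otherwise pass as 1/0.
    if isinstance(value, bool):
        raise TypeError(f"{name} must be a number, not bool")
    if not isinstance(value, (int, float)):
        raise TypeError(f"{name} must be a number")
    # Every comparison with NaN is False, so NaN is never "out of range".
    if check_nan and math.isnan(value):
        raise ValueError(f"{name} must not be NaN")
    if value < low or value > high:
        raise ValueError(f"{name} must be in [{low}, {high}]")
    return float(value)
```

With `check_nan=False` the function returns NaN unchanged, which is the silent pass-through that BH41-M2 removed by changing the default.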

Quality Gate Fixes (Phase 1 — Verify)

  • ruff B017 (tests/test_bug_hunt_41.py lines 189, 205): pytest.raises(Exception) narrowed to pytest.raises(ValueError) (invalid Ed25519 key raises ValueError)
  • black — Auto-formatted 5 files: test_bug_hunt_41.py, calibrator.py, emitter.py, test_schema_signer.py, test_engine.py
  • mypy attr-defined (calibrator.py:847): added proposals_snapshot: list[dict[str, Any]] = [] type annotation to fix "object" has no attribute "value" inference error

v4.5.48 — Bug Hunt #40 (2026-02-22)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2815 tests passing, 94.78% coverage (2 skipped, +40 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (9 total: 4M, 5L)

  • BH40-M1 (Codex+Claude) — integration/afa_bridge.py: _validate_quality_subscores() failed to distinguish all-null list [None,None] from explicit empty list []; both defaulted to [0.7,0.7,0.7], bypassing quality gate fail-closed behavior; added saw_non_null tracker so [] returns [] (fail-closed) while all-null correctly defaults
  • BH40-M2 (Both) — telemetry/emitter.py: BatchHTTPSink.stop() read self._thread outside the lock — race with concurrent start(); extract ref + clear under lock, join outside lock (same pattern as BH38-M4, BH39-M1)
  • BH40-M3 (Claude) — engine/validation.py: validate_normalized() missing isinstance(value, bool) guard; True/False passed [0,1] range check via bool-is-int coercion
  • BH40-M4 (Claude) — config.py: _parse_mcp_rate_limit checked isinstance(value, float) on original value; string "3.5" bypassed check and was silently truncated to 3 via int(); fix: convert first, then check fv.is_integer()
  • BH40-L1 (Claude) — engine/gates.py: GateEvaluator accepted negative values for novelty_N0, novelty_threshold, complexity_floor, quality_min_score; negative thresholds trivially disable governance gates; added non-negativity guard
  • BH40-L2 (Claude) — config.py: _parse_kl_drift_dict window_days: same string-fractional truncation pattern as BH40-M4; separated ValueError (non-finite) from TypeError (fractional) to preserve existing test semantics
  • BH40-L3 (Claude) — aegis_governance/mcp_server.py: stdio size guard used len(line) (Unicode code-point count) instead of len(line.encode("utf-8")) (byte count); multi-byte characters allowed oversized requests through
  • BH40-L4 (Claude) — integration/afa_bridge.py: get_decision_history() used truthy if agent_id: — empty string "" was treated as no-filter, bypassing agent_id filtering; fixed to if agent_id is not None:
  • BH40-L5 (Claude) — telemetry/encryption.py: DEKRotator.get_decryptor() and list_versions() read self._deks without self._lock; only write paths (BH39-M2) had been protected; added lock acquisition for all read paths
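
The difference between code-point count and byte count in BH40-L3 can be seen directly. The limit value and function name below are hypothetical, chosen small so the example is easy to follow.

```python
MAX_LINE_BYTES = 16  # hypothetical limit for illustration


def within_size_limit(line: str, max_bytes: int = MAX_LINE_BYTES) -> bool:
    """Compare the UTF-8 encoded size, not the number of code points."""
    return len(line.encode("utf-8")) <= max_bytes


line = "é" * 10  # 10 code points, 2 bytes each in UTF-8

code_points = len(line)
byte_count = len(line.encode("utf-8"))
```

A guard based on `len(line)` would accept this line against a 16-byte limit even though it occupies 20 bytes on the wire.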

v4.5.47 — Bug Hunt #39 (2026-02-21)

Method: Hybrid — 3 Claude sweep agents + Codex gpt-5.3-codex (xhigh reasoning)
Test Coverage: 2775 tests passing, 94.77% coverage (2 skipped, +54 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Bugs Fixed (13 total: 1H, 6M, 6L)

  • BH39-H1 (High) — workflows/persistence/models.py: verify_chain_link() chain root forgery — root link was not self-consistency checked, allowing crafted from_link/to_hash chains to pass verification; added root hash self-consistency guard
  • BH39-M1 (Medium) — telemetry/pipeline.py: TelemetryPipeline.stop() held _lock during thread.join() — blocked concurrent start() for up to N seconds; lock released before join
  • BH39-M2 (Medium) — telemetry/encryption.py: DEKRotator.generate_dek_version() TOCTOU race — version ID computed, then written without holding lock; version now pinned under lock before write
  • BH39-M3 (Medium) — workflows/persistence/key_store.py: KeyStore audit_lock held during kek.decrypt() (blocking I/O) — lock released before decrypt call
  • BH39-M4 (Medium) — engine/gates.py: GateEvaluator accepts float('inf') for trigger factors — silently disables Bayesian gate (posterior probability of exceeding infinity is always 0, gate permanently passes); added math.isinf() guard
  • BH39-M5 (Medium) — engine/utility.py: UtilityResult accepts NaN for variance/raw/lcb — NaN propagates to silent gate failure (NaN > threshold is always False); added __post_init__ NaN rejection; -inf allowed for raw/lcb (signals extremely low utility)
  • BH39-M6 (Medium) — config.py: _parse_kl_drift_dict silently truncates float window_days (30.7 → 30); reject fractional floats before int() coercion
  • BH39-L1 (Low) — workflows/consensus.py: ConsensusWorkflow.from_dict() used cls() constructor mid-deserialization — inconsistent intermediate state if constructor validates; switched to cls.__new__(cls) pattern
  • BH39-L2 (Low) — engine/gates.py: GateEvaluator novelty_k=0.0 makes logistic gate score-insensitive (denominator always ≥1 regardless of score); added > 0 guard with clear error message
  • BH39-L3 (Low) — aegis_governance/mcp_server.py: JSON-RPC notification with non-string method received error response, violating JSON-RPC 2.0 §4.1; is_notification now computed before method type check so malformed notifications are silently dropped
  • BH39-L4 (Low) — crypto/bip322_provider.py: encode_simple() crashes with cryptic OverflowError for signatures ≥256 bytes (bytes([256]) overflow); added upfront len(signature) != SIGNATURE_SIZE check with clear ValueError
  • BH39-L5 (Low) — config.py: _parse_mcp_rate_limit silently truncates float mcp_rate_limit (e.g., 60.9 → 60); reject fractional floats
  • BH39-Codex-2 (Low) — telemetry/emitter.py: memory_sink accepts maxlen=0 for list sinks — del events[0] on empty list; TelemetryEmitter.emit() swallows sink exceptions → silent telemetry loss; added maxlen ≥ 1 guard for list-backed sinks
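
BH39-M6 and BH39-L5 both reject fractional values before `int()` coercion. One way to do that is sketched below; `parse_strict_int` is a hypothetical helper, and the choice of exception type for each failure mode is an assumption.

```python
import math
from typing import Any


def parse_strict_int(value: Any, name: str = "value") -> int:
    """Convert to int only when no information would be lost (hypothetical)."""
    if isinstance(value, bool):
        raise TypeError(f"{name} must be an integer, not bool")
    try:
        fv = float(value)  # convert first, so strings like "3.5" are checked too
    except (TypeError, ValueError) as exc:
        raise TypeError(f"{name} must be numeric") from exc
    if not math.isfinite(fv):
        raise ValueError(f"{name} must be finite")
    if not fv.is_integer():
        raise TypeError(f"{name} must be a whole number, got {value!r}")
    return int(fv)
```

Converting to float before checking `is_integer()` covers both a float `30.7` and a string `"3.5"`, which a check on the original value's type would miss.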

Test Files Added

  • tests/test_bug_hunt_39.py — 21 tests (BH39-H1 chain root, BH39-M1/M2/M3 concurrency/TOCTOU, BH39-L1 ConsensusWorkflow)
  • tests/test_bug_hunt_39b.py — 31 tests (BH39-M4/M5/M6/L2/L3/L4/L5)
  • tests/telemetry/test_emitter.py — +1 test (BH39-Codex-2 maxlen=0)

v4.5.46 (2026-02-21)

  • Bug Hunt #38 (Hybrid): 6 bugs (1H, 4M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 35 regression tests
  • BH38-H1: key_store.py uses Python 3.10+ parenthesized async-with syntax — SyntaxError on Python 3.9 (declared minimum); replaced with comma-separated form + # fmt: off guards for black stability
  • BH38-M1: UtilityCalculator accepts bool for phi_S, phi_D, gamma, kappa, migration_budget (bool-is-int bypass)
  • BH38-M2: GateEvaluator accepts bool for risk_trigger_factor, profit_trigger_factor, and all threshold params (bool-is-int bypass)
  • BH38-M3: CalibrationProposal accepts bool for current_value/proposed_value; _validate_gate_param accepts bool without rejection
  • BH38-M4: MetricsServer.stop() held _lock during thread.join() — blocked concurrent start() for up to 5 seconds; fixed by extracting refs under lock, releasing, then shutdown+join outside
  • BH38-L1 (Codex): BatchHTTPSink accepts float/non-int for integer params — upgraded bool-only check to full not isinstance(int) or isinstance(bool) pattern
  • QG-UT1: GateEvaluator(trigger_confidence_prob=True) silently accepted via validate_range inclusive upper bound (True==1.0); added explicit bool guard before validate_range in gates.py
  • Test Coverage: 2721 tests passing, 94.78% coverage (2 skipped, +36 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
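
The lock-handling fix in BH38-M4 (extract references under the lock, then join outside it) can be sketched as follows. `Worker` is a hypothetical class, not `MetricsServer`; the per-thread stop event is a design choice made for this example.

```python
import threading
from typing import Optional


class Worker:
    """Hypothetical background worker illustrating stop() without join-under-lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._thread: Optional[threading.Thread] = None
        self._stop_event: Optional[threading.Event] = None

    def start(self) -> None:
        with self._lock:
            if self._thread is not None:
                return
            stop_event = threading.Event()  # one event per thread
            thread = threading.Thread(target=stop_event.wait, daemon=True)
            self._thread, self._stop_event = thread, stop_event
            thread.start()

    def is_running(self) -> bool:
        with self._lock:
            return self._thread is not None and self._thread.is_alive()

    def stop(self, timeout: float = 5.0) -> None:
        # Extract and clear the references while holding the lock...
        with self._lock:
            thread, stop_event = self._thread, self._stop_event
            self._thread, self._stop_event = None, None
        # ...then signal and join outside it, so a concurrent start()
        # is not blocked for the duration of the join.
        if stop_event is not None:
            stop_event.set()
        if thread is not None:
            thread.join(timeout)
```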

v4.5.45 (2026-02-20)

  • Bug Hunt #37 (Hybrid): 6 bugs (3M, 3L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 26 regression tests
  • BH37-M1: BayesianPosterior compute_posterior/compute_full NaN/Inf validation — added math.isfinite() guards + fail-closed GateResult for non-finite deltas
  • BH37-M2: emergency_halt() audit trail completeness — terminal overrides now tracked in already_terminal_overrides
  • BH37-M3: Calibrator novelty_N0 range constraint — added to _NONNEGATIVE_GATE_PARAMS
  • BH37-L1 (Codex): PipelineConfig accepts float for integer sizing fields
  • BH37-L2: ThreePointEstimate accepts bool values — added isinstance guard
  • BH37-L3: DriftMonitor window_days missing integer type check
  • Test Coverage: 2685 tests passing, 94.76% coverage (2 skipped, +26 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.44 (2026-02-20)

Bug Hunt #36 (Hybrid): 6 bugs (4M, 2L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 17 regression tests

  • BH36-M1 (Codex): in lambda_handler.py, the or pattern for estimated_impact/phase bypasses the type check for falsy non-string values (False, 0)
  • BH36-M2: in repository.py, mark_completed(final_state="aborted") injects a non-enum state into serialized request.state, breaking from_dict() deserialization
  • BH36-M3: in cli.py, the or pattern for estimated_impact — same class as M1 (transport parity)
  • BH36-M4: in mcp_server.py, the or pattern for estimated_impact — same class as M1 (transport parity)
  • BH36-L1: in complexity.py, compute_complexity_tax is missing a bool guard for phi_S/phi_D
  • BH36-L2: in lambda_handler.py/cli.py, the proposal_summary or pattern accepts falsy non-strings

QG Ultrathink: 2 findings (2L) — Lambda + MCP action_description or "" patterns replaced with type-safe defaults; no new tests (non-governance documentation field)

Test metrics: 2659 tests, 94.74% coverage (2 skipped)


v4.5.43 (2026-02-20)

Bug Hunt #35 (Hybrid): 6 bugs (4M, 2L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 22 regression tests

  • BH35-M1 (Codex): in override.py, check_and_mark_expired() downgrades APPROVED/REJECTED terminal states to EXPIRED after wall-clock expiry
  • BH35-M2: in rbac.py, the NO_UNILATERAL_OVERRIDE constraint passes for NaN signer_count (IEEE 754: NaN < 2 is False)
  • BH35-M3: in pipeline.py, PipelineConfig.flush_interval_seconds has no validation (zero/NaN/bool degrades flushing)
  • BH35-M4: in emitter.py, BatchHTTPSink.flush_interval_seconds has no validation (same pattern as M3)
  • BH35-L1: in pipeline.py, PipelineConfig.buffer_size/flush_threshold accept bool (isinstance(True, int) is True)
  • BH35-L2: in encryption.py, DEKCache.ttl_seconds accepts zero/negative/bool without validation
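
The NaN bypass in BH35-M2 follows from IEEE 754 comparison rules. The sketch below contrasts a reject-if-too-few check with a fail-closed one; both function names are hypothetical and the threshold of 2 is taken from the entry.

```python
import math
from typing import Any


def passes_naive(signer_count: Any) -> bool:
    """Buggy form: rejects only when signer_count < 2, which is False for NaN."""
    if signer_count < 2:
        return False
    return True


def passes_fail_closed(signer_count: Any) -> bool:
    """Fail-closed form: anything that is not a finite number >= 2 is rejected."""
    if isinstance(signer_count, bool) or not isinstance(signer_count, (int, float)):
        return False
    if not math.isfinite(signer_count):
        return False
    return signer_count >= 2


nan = float("nan")
```

Writing the check as a positive condition (`>= 2`) would also reject NaN on its own; the explicit finiteness guard additionally rejects infinity and makes the intent visible.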

QG Ultrathink (BH35 session): 4 findings (4L) — BatchHTTPSink/HTTPEventSink missing bool guards + timeout/retry_delay validation; +19 regression tests

Test metrics: 2642 tests, 94.79% coverage (0 skipped)


v4.5.42 (2026-02-20)

Bug Hunt #34 — Hybrid Architecture

  • Bug Hunt #34 (Hybrid): 5 bugs (4M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 14 regression tests
  • BH34-M1: DriftMonitor(num_bins=float) accepted, crashes later (Codex finding)
  • BH34-M2: CLI cmd_evaluate missing TypeError in config catch
  • BH34-M3: DualSignatureValidator.expiration_hours missing upper bound
  • BH34-M4: TelemetryPipeline._worker_loop inconsistent state on raise error
  • BH34-L1: AegisConfig.from_dict() telemetry_url type coercion gap
  • Test Coverage: 2601 tests passing, 94.79% coverage (0 skipped, +14 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.41 (2026-02-20)

Bug Hunt #33 — Hybrid Architecture

  • Bug Hunt #33 (Hybrid): 5 bugs (5M) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 15 regression tests
  • BH33-M1: config._parse_flat_numeric silently accepts non-numeric types (list, dict) — isinstance(x, (int, float)) guard missing; list/dict crash float() in _DIRECT flat numeric parsing
  • BH33-M2: config._from_raw_dict silently accepts non-numeric types for DIRECT params — YAML schema validates but direct dict construction bypasses; epsilon_R/beta/etc accept list/dict
  • BH33-M3: DriftMonitor.evaluate() passes unfiltered current_window to _to_histogram() — NaN/Inf values corrupt histogram bins; math.isfinite() filter missing (upstream filter only in set_baseline)
  • BH33-M4: OverrideWorkflow.__init__ doesn't defensive-copy failed_gates list — external mutation of input list modifies workflow state; use list(failed_gates)
  • BH33-M5: mark_completed() doesn't sync state_data with final_state (Codex finding) — state_data["status"] remains old value after final_state update; audit trail inconsistency
  • Test Coverage: 2587 tests passing, 94.80% coverage (0 skipped, +15 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.40 (2026-02-20)

Bug Hunt #32 — Hybrid Architecture

  • Bug Hunt #32 (Hybrid): 3 bugs (2M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 20 regression tests
  • BH32-M1: DriftMonitor constructor accepts negative/Inf thresholds — update_thresholds() validates isfinite + non-negative but __init__() only calls validate_threshold_ordering(); negative tau causes false CRITICAL, Inf tau disables detection
  • BH32-M2: Calibrator _validate_gate_param allows negative threshold params (complexity_floor, quality_min_score, novelty_threshold, utility_threshold) — negative values make gates trivially pass; complexity_floor is non-overridable (governance bypass via calibration)
  • BH32-L1: KLDriftConfig __post_init__ missing window_days validation — _parse_kl_drift_dict validates >= 1 for YAML loading but direct construction accepts 0 or negative
  • Test Coverage: 2572 tests passing, 94.80% coverage (0 skipped, +20 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.39 (2026-02-20)

Bug Hunt #31 — Hybrid Architecture + QG73

  • Bug Hunt #31 (Hybrid): 4 bugs (1M, 3L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 15 regression tests
  • BH31-M1: MCP emit_mcp_invocation caller_id non-string guard — null/non-string passed to structured audit log
  • BH31-L1: Lambda _coalesce_thresholds dict.get() null gotcha — JSON null bypasses default value
  • BH31-L2: ConsensusConfig timeout_hours fractional minimum — accepts Inf/0.0, should require > 0
  • BH31-L3: DualSignatureValidator expiration_hours fractional minimum — accepts Inf/0.0, should require > 0
  • Quality Gate QG73 Ultrathink: 2 findings (1M, 1L), 7 regression tests
  • QG73-L1: CLI agent_id/session_id isinstance guard — transport parity with MCP/Lambda non-string rejection
  • QG73-M1: AFABridge default_timeout_hours fractional minimum — accepts Inf/0.0, should require > 0
  • Quality Gate QG74 Ultrathink: 1 cosmetic fix
  • QG74-L1: MCP tool_name dict.get() null gotcha — params.get("name", "") returns None for null key
  • Test Coverage: 2552 tests passing, 94.80% coverage (0 skipped, +22 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.38 (2026-02-19)

Bug Hunt #30 — Hybrid Architecture + QG72

  • Bug Hunt #30 (Hybrid): 5 bugs (2M, 3L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 8 regression tests
  • BH30-M1: Lambda proposal_summary dict.get() null gotcha — JSON null bypasses default (transport parity)
  • BH30-M2: AFABridge get_decision_history() accepts float limit — crashes during list slicing (Codex finding)
  • BH30-M3: CLI risk_score null → float(None) TypeError — extracted _coerce_risk_score() helper
  • BH30-L1: Lambda action_description dict.get() null gotcha — same pattern as BH30-M1
  • BH30-L2: Lambda estimated_impact dict.get() null gotcha — None passes to isinstance check
  • BH30-L3: TelemetryPipeline config mutation — shared PipelineConfig mutated by pii_encryptor setup; defensive copy.copy()
  • Quality Gate QG72 Ultrathink: 4 findings (2M, 2L), 4 regression tests
  • QG72-M1: CLI proposal_summary dict.get() null gotcha — transport parity with Lambda/MCP
  • QG72-M2: MCP action_description dict.get() null gotcha — None passed to risk check
  • QG72-L1: CLI estimated_impact dict.get() null gotcha — None bypasses string type check
  • QG72-L2: Lambda phase dict.get() null gotcha — None passed to .lower() TypeError
  • Test Coverage: 2530 tests passing, 94.76% coverage (0 skipped, +12 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
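The recurring "dict.get() null gotcha" comes from standard Python behavior: the default argument applies only when the key is absent, so an explicit JSON null comes back as `None`. A minimal sketch, where `get_str` is a hypothetical helper and not one of the project's functions:

```python
import json

body = json.loads('{"proposal_summary": null}')

# The default is used only when the key is missing; an explicit null yields None.
naive = body.get("proposal_summary", "")


def get_str(payload, key, default=""):
    """Return payload[key] when it is a string, else the default (hypothetical helper)."""
    value = payload.get(key)
    return value if isinstance(value, str) else default


safe = get_str(body, "proposal_summary")
print(repr(naive), repr(safe))  # None ''
```

Whether a non-string value should fall back to a default or be rejected outright is a per-field decision; the sketch only shows the fallback variant.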

v4.5.37 (2026-02-18)

Bug Hunt #29 — Hybrid Architecture + QG71

  • Bug Hunt #29 (Hybrid): 8 bugs (3M, 5L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 21 regression tests
  • BH29-M1: estimated_impact case-sensitive comparison — "HIGH" or "Critical" bypass human oversight gates silently
  • BH29-M2: Executor start_execution reads progress.status outside lock after publishing — TOCTOU race
  • BH29-M3: Calibrator novelty_k missing from _NONZERO_GATE_PARAMS — zero value weakens governance without validation error
  • BH29-L1: Config mcp_rate_limit missing math.isfinite() guard — NaN/Inf pass through to int() silently
  • BH29-L2: MCP phase string not lowercased before _PHASE_MAP lookup — transport parity gap with CLI/Lambda
  • BH29-L3: Lambda/CLI non-string estimated_impact silently bypasses impact classification — transport parity fix
  • BH29-L4: Pipeline stop() drain loop aborts on first storage error — remaining queued events lost
  • BH29-L5: PipelineConfig.flush_threshold accepts 0 or negative — triggers flush on every ingest
  • Quality Gate QG71 Ultrathink: 3 findings (3L), 5 regression tests
  • QG71-L1: MCP estimated_impact str() cast silently accepts null/non-string — replaced with or guard
  • QG71-L2: Pipeline drain except RuntimeError too narrow — enricher exceptions crash stop(); broadened to except Exception
  • QG71-L3: MCP proposal_summary null → "None" string via dict.get() gotcha — added or "" guard
  • Test Coverage: 2518 tests passing, 94.76% coverage (0 skipped, +26 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
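BH29-L4 and QG71-L2 both concern a drain loop that stops too early. The shape of a drain that keeps going can be sketched as below; `drain` and `flaky_store` are illustrative names, and the real pipeline's queue and storage interfaces are assumptions here.

```python
from collections import deque


def drain(queue, store):
    """Try to store every queued event; collect failures instead of aborting."""
    stored, failed = 0, []
    while queue:
        event = queue.popleft()
        try:
            store(event)
            stored += 1
        except Exception as exc:  # broad on purpose: one bad event must not strand the rest
            failed.append((event, exc))
    return stored, failed


def flaky_store(event):
    # Stand-in for a storage backend that fails on one event.
    if event == "b":
        raise RuntimeError("storage unavailable")


stored, failed = drain(deque(["a", "b", "c"]), flaky_store)
print(stored, [event for event, _ in failed])  # 2 ['b']
```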

v4.5.36 (2026-02-18)

Bug Hunt #28 — Hybrid Architecture + QG70

  • Bug Hunt #28 (Hybrid): 5 bugs (3M, 2L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 17 regression tests
  • BH28-M1: ConsensusWorkflow QUORUM_MET state never reverts to PENDING when quorum lost via vote overwrite to DEFER
  • BH28-M2: Governance initiate_override blocked by expired stale override — evict expired before rejecting
  • BH28-M3: CLI risk_score alias overwrites risk simplified name — priority chain: risk_proposed > risk > risk_score
  • BH28-L1: DriftMonitor.evaluate() window_size includes non-finite values that don't contribute to KL divergence
  • BH28-L2: Config quality_no_zero_subscore string "false" treated as truthy — extracted _coerce_bool() helper
  • Quality Gate QG70 Ultrathink: 3 findings (3L), 5 regression tests
  • QG70-L1: _from_raw_dict skips _coerce_bool() for bool-allowed YAML fields
  • QG70-L2: _coerce_bool() accepts NaN/Inf floats
  • QG70-L3: set_baseline() passes raw data (including Inf) to histogram
  • Test Coverage: 2492 tests passing, 94.73% coverage (0 skipped, +22 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
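BH28-L2 and QG70-L2 hinge on the fact that any non-empty string is truthy in Python, so `bool("false")` is `True`. A sketch of a stricter coercion follows; `coerce_bool` and the accepted spellings are assumptions for illustration, not the project's `_coerce_bool()`.

```python
import math

_TRUE = {"true", "yes", "on", "1"}    # assumed spellings
_FALSE = {"false", "no", "off", "0"}  # assumed spellings


def coerce_bool(value):
    """Coerce a config value to bool, rejecting ambiguous input (hypothetical helper)."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in _TRUE:
            return True
        if text in _FALSE:
            return False
        raise ValueError(f"not a boolean string: {value!r}")
    if isinstance(value, (int, float)):
        # NaN/Inf and anything other than 0 or 1 are rejected rather than treated as truthy.
        if not math.isfinite(value) or value not in (0, 1):
            raise ValueError(f"not a boolean number: {value!r}")
        return bool(value)
    raise TypeError(f"cannot coerce {type(value).__name__} to bool")


print(bool("false"), coerce_bool("false"))  # True False
```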

v4.5.35 (2026-02-17)

Quality Gate QG69 Ultrathink

  • Quality Gate QG69 Ultrathink: 1 finding (1M), 7 regression tests
  • QG69-M1: MCP + CLI drift_baseline_data missing math.isfinite() — transport parity violation with Lambda (BH27-L4 was fixed there but not in MCP/CLI)
  • Test Coverage: 2470 tests passing, 94.73% coverage (0 skipped, +7 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.34 (2026-02-17)

Bug Hunt #27 — Hybrid Architecture

  • Bug Hunt #27 (Hybrid): 4 bugs (3M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 13 regression tests
  • BH27-M1: resume_or_create() ID propagation — creation path now inspects constructor params and injects workflow_id
  • BH27-M2: _from_raw_dict string-to-float coercion — YAML quoted numerics passed through as str instead of float
  • BH27-M3: Lambda/MCP agent_id/session_id null bypass — dict.get() returns None for explicit JSON null
  • BH27-L4: Lambda drift_baseline_data missing math.isfinite() — NaN/Inf caused 500 instead of 400
  • Test Coverage: 2470 tests passing, 94.73% coverage (0 skipped, +13 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.33 (2026-02-17)

Scaffold Adoption — Compliance Infrastructure

  • Scaffold Adoption: Integrated Engineering Standards ai_scaffold_package v2.1.1 (50 new files, zero conflicts)
  • New directories: ai/ (governance artifacts), docs/compliance/ (operational runbooks), schemas/ (log schema), tools/ci/ (9 validators), policy/ (OPA), benchmarks/
  • AI Governance: 8 artifacts pre-filled with AEGIS content (system-register, risk-register mapping to OWASP Agentic Top 10, model/data cards, oversight-plan with kill switch, postmarket-monitoring, AIMS-POLICY, technical_file/)
  • Compliance Runbooks: 7 docs customized for AEGIS (system-description with Lambda+ECS architecture, BCP-DRP with RTO/RPO Tier 2, IRP with governance erosion containment, ACCESS-REVIEW with 4 AWS service accounts, VENDOR-RISK complete AWS assessment, CHANGE-MANAGEMENT with frozen parameter process, DSR-PRIVACY with 12 encrypted PII fields)
  • Automation: compliance-evidence-scheduler.yml (monthly/quarterly/annual GitHub issue creation), 15 labels, PR + 7 issue templates
  • CI/CD: 4 new workflows (scaffold-gates, codeql, compliance-nightly, compliance-evidence-scheduler), Makefile, .pre-commit-config.yaml (ELITE tier)
  • Type Safety: Added type annotations to 9 tools/ci/*.py validators (mypy strict mode compliance)
  • pyproject.toml: Added [tool.standards] (tier: ELITE, v2.1.1), tools/ci per-file ignores for scaffold style preferences
  • Test Coverage: 2448 tests passing, 94.83% coverage (0 skipped, no new tests — operational changes only)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest, AI RMF validators, AI Act lint, scaffold adoption validator)
  • Placeholder Elimination: 274 → 0 in scaffold files (100% customization: compliance docs, AI artifacts, GitHub templates all AEGIS-specific)

v4.5.32 (2026-02-16)

Bug Hunt #26 (Hybrid)

  • Method: 3 Claude sweep agents + Codex gpt-5.3-codex
  • Findings: 4 bugs (3M, 1L), 18 regression tests
  • BH26-M1: validation.py — validate_positive() accepts bool (True→1) due to Python bool⊂int (Codex finding)
  • BH26-M2: bayesian.py — update_prior variance overflow: sum((x-mean)**2) → inf silently bypasses guard
  • BH26-M3: rbac.py — _check_bool_constraint None value fail-open for pass_when_true=False constraints (security)
  • BH26-L1: complexity.py — compute_complexity_tax NaN/Inf propagation via delta dict values
  • 0 deferred bugs
  • Test Coverage: 2463 tests passing, 94.83% coverage (0 skipped, +18 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
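The bool guard behind BH26-M1 follows from `bool` being a subclass of `int`, so `isinstance(True, (int, float))` is `True`. A minimal sketch, where `require_positive` is a hypothetical stand-in and not the project's `validate_positive()`:

```python
import math


def require_positive(name, value):
    """Accept only finite, strictly positive real numbers (hypothetical validator)."""
    # Check bool first: it would otherwise pass the (int, float) test as 1 or 0.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"{name} must be a real number, got {type(value).__name__}")
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"{name} must be finite and > 0, got {value!r}")
    return float(value)


print(isinstance(True, int), require_positive("epsilon", 0.5))  # True 0.5
```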

v4.5.31 (2026-02-16)

Bug Hunt #25 (Hybrid)

  • Method: 3 Claude sweep agents + Codex gpt-5.3-codex
  • Findings: 6 bugs (3M, 3L), 18 regression tests
  • BH25-M1: analyst.py — _evaluate_utility_gate UtilityComponents fields crash on explicit None (JSON null)
  • BH25-M2: cli.py — _extract_risk_alias_and_subscores key-presence vs value-presence transport parity gap with Lambda/MCP
  • BH25-M3: drift.py — _to_histogram constant-range ±1.0 adjustment no-op for values > 2^53 (IEEE 754 precision) → ZeroDivisionError
  • BH25-L1: analyst.py — risk_delta/profit_delta = None crashes with TypeError (Codex finding)
  • BH25-L2: bayesian.py — update_prior returns (inf, inf) when sum() overflows (no OverflowError from sum())
  • BH25-L3: config.py — from_dict accepts string "nan"/"inf" bypassing isinstance(val, (int, float)) NaN check
  • PLR0912 fix: extracted _parse_flat_numeric() static helper from from_dict()
  • 0 deferred bugs
  • Test Coverage: 2430 tests passing, 94.81% coverage (0 skipped, +18 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
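Two of the floating-point behaviors behind BH25-M3 and BH25-L2 can be reproduced in a few lines. The `padded_range` helper and its relative pad are assumptions for illustration; the actual `_to_histogram` fix may differ.

```python
import math

v = 2.0 ** 60                      # far above 2**53; adjacent floats here are 256 apart
naive_lo, naive_hi = v - 1.0, v + 1.0
naive_width = naive_hi - naive_lo  # 0.0: both adjustments round back to v


def padded_range(value, rel=1e-6):
    """Widen a constant-valued range by a pad that scales with magnitude (hypothetical)."""
    pad = max(1.0, abs(value) * rel)
    return value - pad, value + pad


lo, hi = padded_range(v)

# Float addition overflows to inf silently; sum() raises no OverflowError.
total = sum([1e308, 1e308])
print(naive_width, hi > lo, math.isinf(total))  # 0.0 True True
```

A zero-width range is what turns into a division by zero when computing bin widths, and a silent `inf` is why an `OverflowError` handler alone does not catch the overflow.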

v4.5.30 (2026-02-16)

Bug Hunt #24 (Hybrid)

  • Method: 3 Claude sweep agents + Codex gpt-5.3-codex (1 iteration, 1 novel finding)
  • Findings: 10 bugs (4M, 6L), 26 regression tests
  • BH24-M1: mcp_server.py — handle_request() returns response for JSON-RPC notifications (requests without "id") violating JSON-RPC 2.0 §4.1 (Codex finding)
  • BH24-M2: rbac.py — NO_UNILATERAL_OVERRIDE signer_count=None crashes with TypeError; guards None, bool, non-numeric (fail-closed)
  • BH24-M3: analyst.py — _evaluate_quality_gate missing null guard for quality_score (dict.get returns None for explicit null)
  • BH24-M4: analyst.py — _evaluate_risk_gate missing null guard for risk_baseline
  • BH24-L1: afa_bridge.py — _extract_metrics missing isinstance(list) check for quality_subscores (transport parity gap)
  • BH24-L2: afa_bridge.py — _extract_metrics missing isinstance(UtilityResult) check for utility_result
  • BH24-L3: config.py — KLDriftConfig.__post_init__ accepts NaN/Inf tau_warning/tau_critical via direct constructor
  • BH24-L4: analyst.py — _evaluate_novelty_gate missing null guard for novelty_score
  • BH24-L5: analyst.py — _evaluate_complexity_gate missing null guard for complexity_score
  • BH24-L6: analyst.py — _evaluate_profit_gate missing null guard for profit_baseline
  • PLR0912 fix: extracted _validate_quality_subscores() static helper
  • 0 deferred bugs
  • Test Coverage: 2412 tests passing, 94.80% coverage (0 skipped, +26 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
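For BH24-M1, JSON-RPC 2.0 defines a notification as a request object without an "id" member, and the server must not reply to it, not even with an error. A minimal dispatcher sketch; `handle_message` and the `methods` table are hypothetical and do not reflect the structure of `mcp_server.py`.

```python
import json


def handle_message(raw, methods):
    """Dispatch one JSON-RPC 2.0 request; return a response dict, or None for notifications."""
    request = json.loads(raw)
    is_notification = "id" not in request  # "id": null still counts as a request
    handler = methods.get(request.get("method"))
    if handler is None:
        if is_notification:
            return None  # errors are suppressed for notifications too
        return {
            "jsonrpc": "2.0",
            "id": request["id"],
            "error": {"code": -32601, "message": "Method not found"},
        }
    result = handler(request.get("params"))
    if is_notification:
        return None
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


methods = {"ping": lambda params: "pong"}
reply = handle_message('{"jsonrpc": "2.0", "id": 1, "method": "ping"}', methods)
silent = handle_message('{"jsonrpc": "2.0", "method": "ping"}', methods)
print(reply, silent)
```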

v4.5.29 (2026-02-16)

AMTSS Protocol v1 — MCP Tool Schema Signing

  • ROADMAP Item 20a(e) complete: Cryptographic signing of MCP tool schemas to detect rug pull attacks (CoSAI MCP-T6)
  • Design via Claude-GPT dialogical collaboration (GPT 5.2 Pro xhigh reasoning, 3 substantive rounds)
  • Research document: docs/research/004-mcp-schema-signing-design.md (465 lines)
  • New module: src/crypto/schema_signer.py — ToolSchemaSigner, SigningKeyPair, compute_tool_digest()
  • Protocol: per-tool + manifest dual signing, Ed25519, RFC 8785 canonicalization, _meta inline delivery
  • MCP integration: _handle_tools_list() embeds proofs in _meta[com.aegis.governance/toolSchemaSigning]
  • MCP integration: _handle_initialize() advertises keyset in capabilities.experimental.toolSchemaSigning
  • Graceful degradation: signer is best-effort (None if cryptography not installed)
  • Updated crypto/__init__.py exports: ToolSchemaSigner, SigningKeyPair, compute_tool_digest, AMTSS_* constants, SCHEMA_SIGNER_AVAILABLE
  • All 5 sub-items of ROADMAP 20a now complete (audit logging, rate limiting, TLS enforcement, CoSAI cross-reference, schema signing)
  • Quality-Gate Ultrathink (prior session): 5 findings fixed (3 MEDIUM, 2 LOW) — manifest duplicate-name bypass, _meta stripping in digest, statement type validation, _prev_digests chain wiring, strict base64url decode; +7 regression tests
  • Quality-Gate QG67 Ultrathink: 4 additional findings fixed (2 MEDIUM, 2 LOW) — null sig crash in verify methods (isinstance guard), NaN/Inf canonicalization (allow_nan=False), manifest revision never incremented, MCP signing error log level (DEBUG→WARNING); +7 regression tests
  • Test Coverage: 2386 tests passing, 94.74% coverage (0 skipped, +82 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
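The digest-related findings above (stripping `_meta` before hashing, rejecting NaN/Inf with `allow_nan=False`) can be illustrated with the standard library alone. This is a rough sketch under stated assumptions: `tool_digest` is a hypothetical function, SHA-256 is an assumed hash, and `json.dumps` with sorted keys only approximates RFC 8785 canonicalization (number formatting and key ordering rules differ in edge cases). It is not the AMTSS `compute_tool_digest()`.

```python
import hashlib
import json


def tool_digest(tool):
    """Hash a tool schema deterministically, ignoring its top-level _meta (hypothetical)."""
    body = {key: value for key, value in tool.items() if key != "_meta"}
    canonical = json.dumps(
        body,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no insignificant whitespace
        ensure_ascii=False,
        allow_nan=False,         # NaN/Infinity are not valid JSON; raise ValueError instead
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


signed = {"name": "evaluate", "inputSchema": {"type": "object"}, "_meta": {"sig": "abc"}}
unsigned = {"inputSchema": {"type": "object"}, "name": "evaluate"}
print(tool_digest(signed) == tool_digest(unsigned))  # True
```

Excluding `_meta` matters because the proof is delivered inside `_meta`; hashing it would make the digest depend on its own signature.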

v4.5.28 (2026-02-16)

CoSAI MCP-T Cross-Reference

  • ROADMAP Item 20a(d) complete: Added §11.4.1 CoSAI MCP Threat Model (MCP-T1..T12) cross-reference to CLAUDE.md
  • Maps all 12 CoSAI MCP-specific threats to OWASP Agentic risks and AEGIS controls
  • Coverage: 9/12 STRONG, 2/12 MODERATE, 1/12 PARTIAL (MCP-T7 transport security)
  • Source: docs/research/003-mcp-security-ecosystem-review.md
  • ROADMAP 20a(d) checkbox marked complete
  • No code changes — documentation only
  • Test Coverage: 2304 tests passing, 94.63% coverage (unchanged)

v4.5.27 (2026-02-16)

Bug Hunt #23 (Hybrid)

  • 7 bugs found (3M, 4L) by 3 Claude sweep agents; Codex gpt-5.3-codex (1 iteration, 0 novel findings — 1 duplicate of QG66-UT2)
  • 29 regression tests added
  • Scope: Transport parity (CLI), input validation, thread safety, consensus logic, key store TOCTOU
  • BH23-M1: cli.py — _load_drift_monitor missing bool guard for drift baseline elements (transport parity with Lambda/MCP)
  • BH23-M2: cli.py — _extract_risk_alias_and_subscores treats explicit [] as falsy → returns defaults instead of empty list (transport parity)
  • BH23-M3: calibrator.py — _evict_old_proposals() called outside self._lock — race condition on self.proposals dict
  • BH23-L1: cli.py — _extract_risk_alias_and_subscores missing isinstance(list) type check for quality_subscores
  • BH23-L2: engine/bayesian.py — BayesianPosterior accepts NaN/Inf prior_mean in constructor and override paths
  • BH23-L3: consensus.py — check_timeout() returns True for finalized (APPROVED/REJECTED) workflows past deadline
  • BH23-L4: key_store.py — get_private_key/get_public_key missing _audit_lock — TOCTOU race with revoke_key()
  • 0 deferred bugs
  • Test Coverage: 2304 tests passing, 94.63% coverage (0 skipped, +29 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.26 (2026-02-15)

Quality-Gate QG66 Ultrathink

  • 2 findings (2L), 2 regression tests
  • UT-1: mcp_server.py — _validate_quality_subscores treats empty list [] as default [0.7, 0.7, 0.7] (transport parity with Lambda)
  • UT-2: mcp_server.py — _validate_quality_subscores missing try/except on float() — non-numeric strings crash server
  • Test Coverage: 2275 tests passing, 94.63% coverage (0 skipped, +2 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
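UT-1 is an instance of a common Python pitfall: `value or default` replaces every falsy value, including an explicit empty list. A minimal sketch of the difference, using the default from UT-1; both function names are hypothetical.

```python
DEFAULT_SUBSCORES = [0.7, 0.7, 0.7]


def subscores_truthy(payload):
    # Pitfall: [] is falsy, so an explicit empty list is silently replaced.
    return payload.get("quality_subscores") or list(DEFAULT_SUBSCORES)


def subscores_explicit(payload):
    """Fall back to the default only when the value is missing or null."""
    value = payload.get("quality_subscores")
    if value is None:
        return list(DEFAULT_SUBSCORES)
    if not isinstance(value, list):
        raise TypeError("quality_subscores must be a list")
    return value


payload = {"quality_subscores": []}
print(subscores_truthy(payload), subscores_explicit(payload))  # [0.7, 0.7, 0.7] []
```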

v4.5.25 (2026-02-15)

Bug Hunt #22 (Hybrid)

  • 8 bugs found (4M, 4L) by 3 Claude sweep agents; Codex gpt-5.3-codex (3 iterations, 0 novel findings)
  • 20 regression tests added
  • Scope: Override lifecycle, transport parity, input validation, bounded collections, type safety
  • BH22-M1: override.py — reject() missing wall-clock expiration check before state mutation (Codex)
  • BH22-M2: mcp_server.py — quality_subscores missing extraction (transport parity with Lambda/CLI)
  • BH22-M3: drift.py — DriftMonitor.update_thresholds() missing finiteness + non-negativity validation
  • BH22-M4: persistence/repository.py — mark_completed() allows re-completing already-completed workflows
  • BH22-L1: lambda_handler.py + mcp_server.py — drift_baseline_data bool guard (transport parity)
  • BH22-L2: governance.py — active_overrides dict grows unbounded (expired overrides never evicted)
  • BH22-L3: afa_bridge.py — _evaluate_authorization string-as-iterable (set("admin") → char explosion)
  • BH22-L4: analyst.py — _evaluate_quality_gate crashes on explicit None quality_subscores (Codex)
  • 0 deferred bugs
  • Test Coverage: 2273 tests passing, 94.64% coverage (0 skipped, +20 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
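The "char explosion" in BH22-L3 happens because a string is itself an iterable of characters, so `set("admin")` yields five single-letter members. A small sketch of a normalizing helper; `as_name_set` is a hypothetical name.

```python
def as_name_set(value):
    """Normalize None, a single string, or an iterable of strings into a set (hypothetical)."""
    if value is None:
        return set()          # set(None) would raise TypeError
    if isinstance(value, str):
        return {value}        # treat a bare string as one name, not as characters
    return set(value)


exploded = set("admin")
print(sorted(exploded), as_name_set("admin"))  # ['a', 'd', 'i', 'm', 'n'] {'admin'}
```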

v4.5.24 (2026-02-15)

Bug Hunt #21 (Hybrid)

  • 8 bugs found (3M, 5L) by 3 Claude sweep agents; Codex gpt-5.3-codex (3 iterations, 0 novel findings)
  • 16 regression tests added
  • Scope: Config validation, transport parity, bounded collections, telemetry, HTTP compliance
  • BH21-M1: config.py — KLDriftConfig missing __post_init__ threshold ordering validation
  • BH21-M2: lambda_handler.py — _validate_subscores missing bool guard (transport parity)
  • BH21-M3: afa_bridge.py — _extract_metrics missing quality_subscores element validation (bool/NaN/Inf)
  • BH21-L1: drift.py — DriftMonitor window_days accepts zero/negative
  • BH21-L2: calibrator.py — proposals dict grows unbounded (added _evict_old_proposals)
  • BH21-L3: emitter.py — emit_shadow_evaluation payload key collision (status overwrite via dict.update)
  • BH21-L4: prometheus_exporter.py — set_drift_status unbounded Prometheus label cardinality
  • BH21-L5: mcp_server.py — MCP HTTP 405 missing Allow header (RFC 9110 §15.5.6)
  • 0 deferred bugs
  • Test Coverage: 2253 tests passing, 94.63% coverage (0 skipped, +17 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
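Constructor-level validation of the kind described in BH21-M1 and BH21-L1 can be sketched with a dataclass `__post_init__`, so that direct construction gets the same checks as YAML loading. `DriftThresholds` is a hypothetical stand-in for `KLDriftConfig`; the field names come from the entries in this changelog, and the rule that the warning threshold may equal the critical one is an assumption.

```python
import math
from dataclasses import dataclass


def _finite_non_negative(name, value):
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"{name} must be a real number, got {type(value).__name__}")
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"{name} must be finite and non-negative, got {value!r}")


@dataclass(frozen=True)
class DriftThresholds:
    tau_warning: float
    tau_critical: float
    window_days: int

    def __post_init__(self):
        # Runs on every construction path, including direct instantiation.
        _finite_non_negative("tau_warning", self.tau_warning)
        _finite_non_negative("tau_critical", self.tau_critical)
        if self.tau_warning > self.tau_critical:
            raise ValueError("tau_warning must not exceed tau_critical")
        if isinstance(self.window_days, bool) or not isinstance(self.window_days, int):
            raise TypeError("window_days must be an int")
        if self.window_days < 1:
            raise ValueError("window_days must be >= 1")


config = DriftThresholds(tau_warning=0.1, tau_critical=0.5, window_days=30)
print(config)
```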

v4.5.23 (2026-02-15)

Bug Hunt #20 (Hybrid)

  • 9 bugs found (7M, 2L) by Codex gpt-5.3-codex (3 iterations) + 3 Claude sweep agents
  • 18 regression tests added
  • BH20-M1: persistence/durable.py — resume_all_pending() crashes on non-dict state_data
  • BH20-M2: override.py — from_dict() mutable failed_gates list sharing
  • BH20-M3: override.py (8 sites) — non-strict base64.b64decode accepts garbage
  • BH20-M4: consensus.py — eligible_voters set stored by reference
  • BH20-M5: consensus.py — timeout_hours unbounded → timedelta OverflowError
  • BH20-M6: pcw_decide.py — _hash_summary/_build_decision_trace crash on non-string/non-mapping inputs
  • BH20-M7: encryption.py — EncryptedField.from_dict() permissive base64 decode
  • BH20-L1: config.py — window_days missing negative/overflow validation
  • BH20-L2: Lambda/MCP/CLI (4 files) — transport parity: float helpers missing bool guard

QG65 Ultrathink

  • 5 additional fixes from deep analysis phase
  • CLI risk_score alias path missing bool guard — True silently became 1.0
  • CLI quality_subscores elements missing bool guard — booleans passed as floats
  • Lambda _parse_body base64 decode missing validate=True — non-base64 chars accepted
  • Crypto providers strict base64: ed25519_provider.py (3 calls), bip322_provider.py (1 call), kek_provider.py (2 calls) — all upgraded to validate=True
  • PLR0912 fix: extracted _extract_risk_alias_and_subscores() helper in cli.py
  • 4 regression tests added
  • Test Coverage: 2236 tests passing, 94.68% coverage (0 skipped, +22 new from BH20+QG65)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
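The strict base64 fixes rely on documented `base64.b64decode` behavior: by default, characters outside the alphabet are discarded before decoding, while `validate=True` raises `binascii.Error`. A short demonstration; `strict_b64decode` is a hypothetical wrapper name.

```python
import base64
import binascii

tampered = "aGVs!!bG8="  # "hello" in base64 with two junk characters inserted

# Default mode silently drops the junk and still decodes.
lenient = base64.b64decode(tampered)


def strict_b64decode(text):
    """Decode base64, rejecting any character outside the alphabet (hypothetical wrapper)."""
    return base64.b64decode(text, validate=True)


try:
    strict_b64decode(tampered)
    rejected = False
except binascii.Error:
    rejected = True

print(lenient, rejected)  # b'hello' True
```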

v4.5.22 (2026-02-15)

Rigor: Resolve All Deferred Bugs

  • Rigor: Resolve All Deferred Bugs: Fixed BH16-L5, closed BH15-L6 — 0 deferred bugs remaining
  • BH16-L5 FIXED: WorkflowTransition.verify_hash() standalone false negatives for non-first transitions — added previous_hash column to model, updated _record_transition() to populate it, verify_hash() now falls back to stored value
  • BH15-L6 CLOSED (by-design): Lambda telemetry_emitter wiring gap — Lambda uses AWS-native observability (CloudWatch Logs, X-Ray); adding HTTPEventSink would create redundant double-logging and SSRF attack surface
  • 8 regression tests added (6 model-level + 2 repository integration)
  • Test Coverage: 2214 tests passing, 94.68% coverage (0 skipped, +8 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.21 (2026-02-14)

Bug Hunt #19 (Hybrid)

  • Bug Hunt #19 (Hybrid): 5 bugs (2M, 3L) found by Codex gpt-5.3-codex (3 iterations) + 3 Claude sweep agents; 12 regression tests
  • BH19-M1: proposal.py — from_dict() shares mutable references with input dict (list aliasing for tags/related_proposals, dict aliasing for transition metadata/gate_results) — caller mutations corrupt workflow state
  • BH19-M2: override.py + key_store.py — sign_with_stored_key() key rotation TOCTOU — private/public keys fetched in separate calls without version pinning; record_usage() also lacked version parameter
  • BH19-L1: afa_bridge.py — _coalesce_float missing bool guard (bool-is-int subclass, True silently becomes 1.0)
  • BH19-L2: afa_bridge.py — _evaluate_execution accepted non-boolean proposal_approved/has_execution_plan (Python truthiness)
  • BH19-L3: afa_bridge.py — _evaluate_authorization crashes with set(None) when authorization lists are JSON null
  • 0 deferred bugs
  • Test Coverage: 2206 tests passing, 94.68% coverage (0 skipped, +12 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
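The aliasing problem in BH19-M1 (and its `to_dict()` counterpart, BH12-C1) can be shown with a small dataclass that copies on the way in and on the way out. `ProposalSketch` and its two fields are a simplified, hypothetical model, not the project's `Proposal` class.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class ProposalSketch:
    tags: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        # Copy on the way in so later caller mutations cannot reach internal state.
        return cls(
            tags=list(data.get("tags") or []),
            metadata=copy.deepcopy(data.get("metadata") or {}),
        )

    def to_dict(self):
        # Copy on the way out so consumers cannot mutate internal state either.
        return {"tags": list(self.tags), "metadata": copy.deepcopy(self.metadata)}


source = {"tags": ["infra"], "metadata": {"gate": {"passed": True}}}
proposal = ProposalSketch.from_dict(source)
source["tags"].append("injected")
source["metadata"]["gate"]["passed"] = False
print(proposal.tags, proposal.metadata)  # ['infra'] {'gate': {'passed': True}}
```

A shallow `list(...)` is enough for a flat list of strings; nested structures need `copy.deepcopy`, which is why the two fields are treated differently.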

v4.5.20 (2026-02-14)

Bug Hunt #18 (Hybrid)

  • Bug Hunt #18 (Hybrid): 7 bugs (3M, 4L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 25 regression tests
  • BH18-M1: lambda_handler.py — non-boolean control flags accepted via raw body.get() (Python truthiness confusion)
  • BH18-M2: config.py — from_dict/_from_raw_dict flat keys lack NaN/Inf validation (parity gap with nested sections)
  • BH18-M3: cli.py — CLI non-boolean control flags (transport parity with Lambda)
  • BH18-L1: bayesian.py — BayesianPosterior.update_prior() accepted ddof=True (bool is int subclass)
  • BH18-L2: config.py — from_dict novelty keys lack NaN/Inf validation
  • BH18-L3: consensus.py — ConsensusConfig missing bool guards on quorum_percentage/approval_threshold
  • BH18-L4: afa_bridge.py — AFABridge(default_timeout_hours=True) bypasses isfinite() check
  • 0 deferred bugs
  • Test Coverage: 2194 tests passing, 94.61% coverage (0 skipped, +25 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.19 (2026-02-14)

Bug Hunt #17 (Hybrid)

  • Bug Hunt #17 (Hybrid): 6 bugs (1M, 5L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 13 regression tests
  • BH17-M1: afa_bridge.py — _evaluate_risk_check used raw context.get() instead of _coalesce_float(), allowing None/NaN/Inf through (transport parity gap)
  • BH17-L1: config.py — _extract_nested_floats missing isfinite() validation after float() cast
  • BH17-L2: config.py — _from_raw_dict kl_drift parsing lacked NaN/Inf validation; replaced inline code with _parse_kl_drift_dict() helper for parity
  • BH17-L4: serialization.py — ensure_utc preserved non-UTC timezone offsets instead of converting to UTC
  • BH17-L5: emitter.py — BatchHTTPSink accepted negative max_retries (silent event drops on flush)
  • BH17-L6: governance.py — emergency_halt reported already-rejected overrides as cancelled
  • 0 deferred bugs
  • Test Coverage: 2169 tests passing, 94.60% coverage (0 skipped, +13 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
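BH17-L4 distinguishes between keeping a timestamp's offset and actually converting it to UTC. A minimal sketch of the conversion; `to_utc` is a hypothetical name, and treating naive datetimes as already being in UTC is an assumption that may not match the project's `ensure_utc`.

```python
from datetime import datetime, timedelta, timezone


def to_utc(dt):
    """Return an aware datetime in UTC (hypothetical helper)."""
    if dt.tzinfo is None:
        # Assumption: naive timestamps are already UTC and only need labeling.
        return dt.replace(tzinfo=timezone.utc)
    # astimezone() shifts the wall-clock fields; replace() would only relabel them.
    return dt.astimezone(timezone.utc)


local = datetime(2026, 2, 14, 12, 0, tzinfo=timezone(timedelta(hours=2)))
converted = to_utc(local)
print(converted.isoformat())  # 2026-02-14T10:00:00+00:00
```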

v4.5.18 (2026-02-14)

Quality Gate #62 (Ultrathink)

  • Quality Gate #62 (Ultrathink): 6 findings (1M, 5L) from BH16 post-fix audit; 11 regression tests
  • QG62-M1: afa_bridge.py — _coalesce_float() missing isfinite() guard (transport parity with CLI/Lambda); risk_proposed/profit_proposed bare float() without validation
  • QG62-L1: config.py — from_dict kl_drift float fields lacked NaN/Inf validation; window_days not coerced to int (parity with _from_raw_dict); extracted _parse_kl_drift_dict() helper
  • QG62-L2: lambda_handler.py — quality_subscores null guard missing (dict.get returns None, not default, when key exists with null)
  • 0 deferred bugs
  • Test Coverage: 2156 tests passing, 94.58% coverage (0 skipped, +11 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.17 (2026-02-14)

Bug Hunt #16 (Hybrid)

  • Bug Hunt #16 (Hybrid): 9 bugs (4M, 5L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 22 regression tests
  • BH16-M1: lambda_handler.py — non-dict metadata → 500 instead of 400; added type validation
  • BH16-M2: config.py — from_dict missing bool guard for kl_drift values (parity with _from_raw_dict)
  • BH16-M3: afa_bridge.py — _evaluate_proposal crashes on None context values; extracted _coalesce_float() + _extract_metrics() helpers
  • BH16-M4: consensus.py — deadlock when all voters voted but neither threshold met; added rejection fallback
  • BH16-L1: mcp_server.py — _drain_request_body partial drain without close_connection = True
  • BH16-L2: config.py — mcp_rate_limit missing bool guard in both from_dict and _from_raw_dict
  • BH16-L3: afa_bridge.py — _evaluate_authorization non-deterministic set ordering in rationale/next_steps
  • BH16-L4: complexity.py — negative normalized values not clamped to [0, 1]
  • BH16-L5: persistence/models.py — WorkflowTransition.verify_hash() false negatives (deferred — requires schema change)
  • 1 deferred bug (BH16-L5)
  • Test Coverage: 2145 tests passing, 94.56% coverage (0 skipped, +22 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
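The deadlock in BH16-M4 arises when every eligible voter has voted but neither threshold is reached, so nothing can change the outcome any more. A rough sketch of a resolution function with a rejection fallback; the function name, vote labels other than DEFER, and the 0.6 thresholds are all assumptions and do not reflect `ConsensusWorkflow` internals.

```python
def resolve(votes, eligible, approve_at=0.6, reject_at=0.6):
    """Return APPROVED, REJECTED, or PENDING for a vote tally (hypothetical sketch)."""
    total = len(eligible)
    if total == 0:
        return "REJECTED"  # nobody can approve; fail closed
    counted = {voter: vote for voter, vote in votes.items() if voter in eligible}
    approvals = sum(1 for vote in counted.values() if vote == "APPROVE")
    rejections = sum(1 for vote in counted.values() if vote == "REJECT")
    if approvals / total >= approve_at:
        return "APPROVED"
    if rejections / total >= reject_at:
        return "REJECTED"
    if len(counted) == total:
        # Everyone has voted and no threshold was met: no further vote can arrive.
        return "REJECTED"
    return "PENDING"


eligible = {"alice", "bob", "carol"}
split = {"alice": "APPROVE", "bob": "REJECT", "carol": "DEFER"}
print(resolve(split, eligible))  # REJECTED
```

Without the final fallback the split vote above would stay PENDING until a timeout, which is the stuck state the entry describes.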

v4.5.16 (2026-02-14)

Bug Hunt #15 (Hybrid) + Quality Gate #61 (Ultrathink)

  • Bug Hunt #15 (Hybrid): 8 bugs (2M, 6L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 22 regression tests
  • Quality Gate #61 (Ultrathink): 7 findings (4M, 3L) — 5 fixed + 8 regression tests
  • Transport parity: CLI observation_values sanitization
  • 0 deferred bugs
  • Test Coverage: 2123 tests passing, 94.53% coverage (0 skipped, +22 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.15 (2026-02-13)

Bug Hunt #14 (Hybrid)

  • 3 bugs (3M) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 10 regression tests
  • BH14-M1: consensus.py — ConsensusConfig(timeout_hours=True) silently creates 1-hour deadline (Python bool is subclass of int, True == 1); added isinstance(bool) guard in __post_init__
  • BH14-M2: override.py — DualSignatureValidator(expiration_hours=0) silently creates instantly-expired overrides; NaN/Inf crash downstream at OverrideWorkflow.__init__; added full validation (bool, isfinite, positivity) in __init__
  • BH14-M3: lambda_handler.py — quality_subscores lacked isfinite() guard (CLI had it, Lambda didn't); extracted _validate_subscores() helper for transport parity
  • 0 deferred bugs
  • Test Coverage: 2101 tests passing, 94.54% coverage (0 skipped, +10 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.14 (2026-02-13)

Rigor Close Deferrals v3

  • Closed all 5 deferred bugs (1 fixed, 4 documented/accepted-risk)
  • BH12-L2: afa_bridge.py — default_timeout_hours NaN/Inf passthrough; math.isfinite() guard added. 3 regression tests.
  • QG60-6: emitter.py — BatchHTTPSink stats counters outside lock. CLOSED: CPython GIL atomic; inline documentation added.
  • QG60-7: consensus.py — Rejection threshold indeterminate. CLOSED: By-design; timeout handles.
  • QG60-8: consensus.py — cast_vote() no thread lock. CLOSED: Single-threaded by design; thread-safety note added to docstring.
  • QG60-9: governance.py — No un-halt mechanism. CLOSED: Intentional one-way safety mechanism.
  • 0 deferred bugs remaining
  • Test Coverage: 2091 tests passing, 94.52% coverage (0 skipped, +3 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.13 (2026-02-13)

Bug Hunt #13 (Hybrid)

  • 7 bugs (4M, 3L) found by 3 Claude sweep agents; Codex gpt-5.3-codex xhigh timed out (60+ min, no output); 16 regression tests
  • BH13-M1: lambda_handler.py — Eager evaluation of risk_score fallback causes crash when risk_proposed is valid but risk_score is NaN/Inf
  • BH13-M2: config.py — from_dict null kl_drift values pass through to KLDriftConfig, causing TypeError in DriftMonitor construction
  • BH13-M3: cli.py — Transport-layer parity gap: missing isfinite() guard on float conversions (Lambda/MCP have it, CLI didn't)
  • BH13-M4: mcp_server.py — POST 403 origin rejection doesn't drain request body (same class as QG60-3)
  • BH13-L1: config.py — from_dict mcp_rate_limit=None causes int(None) TypeError
  • BH13-L2: override.py — to_dict() leaks mutable failed_gates reference (same class as BH12-C1)
  • BH13-L3: mcp_server.py — Invalid/negative Content-Length doesn't close connection
  • Deferred: BH12-L2 (AFABridge.default_timeout_hours NaN/Inf — carried forward)
  • Test Coverage: 2088 tests passing, 94.52% coverage (0 skipped, +16 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

v4.5.12 (2026-02-13)

Quality-Gate Ultrathink (QG60)

  • 5 fixes from 9 findings (3M, 2L fixed; 4L deferred)
  • QG60-1: validation.py — validate_positive() accepts Inf (FAIL-OPEN: Inf epsilon disables risk/profit gates); isnan → isfinite
  • QG60-2: utility.py — UtilityCalculator gamma/kappa/migration_budget no Inf guard (NaN contamination via Inf * 0.0); isfinite validation
  • QG60-3: mcp_server.py — POST 404 catch-all doesn't consume request body (HTTP/1.1 persistent connection corruption); _drain_request_body() helper
  • QG60-4: mcp_server.py — 413 oversized request doesn't consume body; close_connection = True to force connection close
  • QG60-5: utility.py — ThreePointEstimate accepts Inf values (OverflowError or NaN LCB); isfinite __post_init__
  • SDK facade: Added Calibrator and Governance actor exports to aegis_governance.__init__
  • Deferred: QG60-6 (BatchHTTPSink stats counters outside lock — advisory), QG60-7 (ConsensusWorkflow rejection threshold — by-design), QG60-8 (cast_vote no thread lock — single-threaded), QG60-9 (no un-halt — intentional one-way)
  • Resolves deferred BH12-L1 (MCP HTTP POST body drain)
  • Test Coverage: 2072 tests passing, 94.50% coverage (0 skipped, +19 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
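The move from `isnan` to `isfinite` in QG60-1 and the "NaN contamination via Inf * 0.0" in QG60-2 both follow from IEEE 754 arithmetic, which plain Python reproduces:

```python
import math

inf = float("inf")

# An isnan-only guard lets infinity through.
passes_isnan_guard = not math.isnan(inf)

# Infinity then turns into NaN as soon as it meets a zero factor.
contaminated = inf * 0.0

# isfinite rejects NaN and both infinities in one check.
rejected_by_isfinite = [not math.isfinite(x) for x in (inf, -inf, float("nan"))]

print(passes_isnan_guard, math.isnan(contaminated), rejected_by_isfinite)
# True True [True, True, True]
```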

v4.5.11 (2026-02-12)

Bug Hunt #12 (Hybrid)

  • 10 bugs (1H, 7M, 2L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 22 regression tests
  • BH12-H1: gates.py — GateEvaluator NaN threshold params cause governance lockout (complexity_floor NaN blocks all proposals with no override path)
  • BH12-M1: gates.py — NaN for novelty_N0/k/threshold, quality_min_score, utility_threshold (isfinite validation loop)
  • BH12-M2: complexity.py — analyze() NaN metric silent 1.0 via min() order-dependence (isfinite guard)
  • BH12-M3: complexity.py — complexity_floor validate_range missing check_nan=True
  • BH12-M4: lambda_handler.py — _float helper NaN/Inf passthrough (parity gap with MCP _float_arg)
  • BH12-M5: lambda_handler.py — _handle_risk_check NaN/Inf produces invalid JSON response
  • BH12-M6: executor.py — ExecutionPlan.timeout_seconds NaN/Inf bypass IEEE 754
  • BH12-M7: calibrator.py — CalibrationProposal.data_window no type/range validation
  • BH12-M8: config.py — _from_raw_dict null YAML values for _DIRECT params + from_dict null novelty/flat-key params
  • BH12-C1: proposal.py — to_dict() leaks mutable references to internal state (Codex)
  • Deferred: BH12-L1 (MCP HTTP POST body on 404/403), BH12-L2 (AFABridge.default_timeout_hours NaN/Inf)
  • Test Coverage: 2053 tests passing, 94.52% coverage (0 skipped, +22 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
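The "min() order-dependence" in BH12-M2 and the lockout in BH12-H1 share one cause: every ordered comparison with NaN is false. A short demonstration, with `clamp01` as a hypothetical guarded clamp:

```python
import math

nan = float("nan")

first = min(1.0, nan)    # 1.0: "nan < 1.0" is False, so the first argument is kept
second = min(nan, 1.0)   # nan: "1.0 < nan" is also False, so nan is kept

# A NaN threshold makes a ">=" gate fail for every input.
gate_passes = 0.99 >= nan


def clamp01(value):
    """Clamp to [0, 1], refusing NaN/Inf instead of guessing (hypothetical helper)."""
    if not math.isfinite(value):
        raise ValueError(f"expected a finite number, got {value!r}")
    return min(1.0, max(0.0, value))


print(first, math.isnan(second), gate_passes, clamp01(1.7))  # 1.0 True False 1.0
```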

v4.5.10 (2026-02-12)

Quality-Gate Ultrathink (QG59)

  • 12 fixes from 21 findings (8M, 4L); 9 deferred (3M design-gap, 6L advisory)
  • QG59-P1-1: gates.py — NaN trigger_factor bypasses zero-check (NaN guard)
  • QG59-P1-2: gates.py — trigger_confidence_prob > 1.0 disables governance FAIL-OPEN (validate_range)
  • QG59-P1-4: config.py — YAML null values crash _extract_nested_floats (None guard + KL null-coalesce)
  • QG59-P2-1: calibrator.py — CalibrationProposal accepts NaN/Inf (__post_init__ validation)
  • QG59-P2-2: analyst.py — _coerce_to_float accepts "nan"/"inf" strings + _calculate_confidence averages NaN (isfinite guards)
  • QG59-P2-3: proposer.py — PERT estimates NaN/Inf passthrough (isfinite guard)
  • QG59-P3-1: mcp_server.py — _float_arg passes NaN/Inf (isfinite guard)
  • QG59-P3-3: emitter.py — wrong event counted as dropped (track evicted oldest)
  • Test Coverage: 2031 tests passing, 94.52% coverage (0 skipped, +22 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
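
Several QG59 items share one root cause: `float()` happily parses the strings "nan" and "inf". The sketch below shows the general shape of an isfinite guard; `coerce_to_finite_float` is a hypothetical name and returning `None` on failure is an assumption, not the behavior of the AEGIS helpers.

```python
import math
from typing import Any, Optional

def coerce_to_finite_float(raw: Any) -> Optional[float]:
    """Hypothetical coercion helper: return a finite float, or None if unusable."""
    try:
        value = float(raw)  # float("nan") and float("inf") both succeed here
    except (TypeError, ValueError, OverflowError):
        return None
    # Without this check, NaN/Inf would flow into later averages and comparisons.
    return value if math.isfinite(value) else None
```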

v4.5.9 (2026-02-12)

Bug Hunt #11 (Hybrid)

  • 10 bugs (8M, 2L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 12 regression tests
  • BH11-M1: cli.py — quality_subscores null crash (null-coalesce + fallback)
  • BH11-M2: cli.py — non-string phase crash (isinstance guard)
  • BH11-M3: calibrator.py — wrong capability check (can_evaluate_gates → can_configure)
  • BH11-M4: governance.py — emergency_halt doesn't cancel active overrides
  • BH11-M5: consensus.py — NaN timeout_hours passthrough (math.isfinite guard)
  • BH11-M6: mcp_server.py — POST /health unconsumed body corruption (removed route)
  • BH11-M7: pipeline.py — _encrypt_pii_fields T-5 bypass (pass captured encryptor)
  • BH11-M8: emitter.py — BatchHTTPSink batch_size=0 silent data loss (Codex)
  • BH11-L1: utility.py — lcb_alpha NaN passthrough (check_nan=True)
  • BH11-L2: mcp_server.py — stdio size check includes newline (strip first)
  • Test Coverage: 2009 tests passing, 94.49% coverage (0 skipped, +12 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.5.8)

  • Quality-Gate Ultrathink (QG58): Docs sync phase — comprehensive test metric update across all documentation files
  • Updated version numbers: CLAUDE.md v4.5.7 → v4.5.8, ROADMAP v1.44.0 → v1.45.0, gap-analysis v1.49.0 → v1.50.0, repository-structure v2.10.0 → v2.11.0
  • Test metrics synchronized: 1987 tests, 94.45% coverage → 1997 tests, 94.47% coverage (10 new tests, +0.02% coverage)
  • Updated 9 documentation files: CLAUDE.md, README.md, ROADMAP.md, gap-analysis.md, KNOWN_ISSUES.md, repository-structure.md, test-count-methodology.md, comprehensive-todo-discovery.md, changelog.md
  • Changelog entries added to CLAUDE.md §9, docs/claude/changelog.md, and ROADMAP.md §Changelog
  • Test coverage: 1997 tests passing, 94.47% coverage (0 skipped)
  • Quality gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.5.7)

  • Bug Hunt #10 (Hybrid): 7 bugs (5M, 2L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 9 regression tests
  • BH10-M1: validation.py — validate_positive NaN pass-through (IEEE 754: NaN <= 0 is False). Fix: math.isnan() guard
  • BH10-M2: validation.py — validate_threshold_ordering NaN pass-through. Fix: math.isnan() guard on both values
  • BH10-M3: mcp_server.py — stdio transport missing _MAX_REQUEST_BYTES size limit (HTTP had it). Fix: len(line) check
  • BH10-M4: cli.py — null JSON metric values crash (data.get(key, default) returns None, not default). Fix: null-coalesce
  • BH10-M5: lambda_handler.py — non-string phase value causes AttributeError on .lower(). Fix: isinstance(str) guard
  • BH10-L1: governance.py — emergency_halt non-atomic state mutation (no lock). Fix: with self._lock: on write + read
  • BH10-M7: lambda_handler.py — non-numeric drift_baseline_data crash on float(). Fix: try/except ValueError/TypeError
  • Quality-Gate Ultrathink (QG57): 2 additional fixes from Phase 2 ultrathink
  • QG57-M1: mcp_server.py — drift baseline non-numeric crash (same pattern as lambda, not propagated to MCP). Fix: try/except
  • QG57-M2: governance.py — TOCTOU in initiate_override/add_override_signature (halt check outside lock). Fix: moved inside lock
  • Test Coverage: 1987 tests passing, 94.45% coverage (0 skipped, +9 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
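
Two of the pitfalls in this release (BH10-M1 and BH10-M4) are plain Python behavior and easy to demonstrate. The sketch below is illustrative only: the function body, the `risk_score` key, and the 0.5 default are assumptions, not the AEGIS implementation.

```python
import math

def validate_positive(value: float, name: str = "value") -> float:
    """Illustrative validator: reject NaN explicitly, then non-positive values."""
    if math.isnan(value):
        raise ValueError(f"{name} must not be NaN")
    if value <= 0:  # NaN <= 0 is False, so this check alone lets NaN through
        raise ValueError(f"{name} must be positive, got {value}")
    return value

# A key that is present with a JSON null defeats dict.get's default.
data = {"risk_score": None}
raw = data.get("risk_score", 0.5)   # returns None, not 0.5
risk = 0.5 if raw is None else raw  # explicit null-coalesce
```

The `is None` test is used instead of `raw or 0.5` so that a legitimate value of 0.0 is not replaced by the default.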

Changelog (4.5.6)

  • Quality-Gate Ultrathink (QG56): 4 fixes from 13 findings (1H, 5M, 5L, 2I)
  • QG56-M2: stdio transport now supports JSON-RPC batch arrays via handle_batch()
  • QG56-M3: WebhookAlertSink TLS enforcement via _validate_sink_url() + allow_insecure param (breaking: http:// URLs now require allow_insecure=True)
  • QG56-M4: _validate_sink_url() strips URL whitespace before urlparse() to prevent hostname TOCTOU
  • QG56-L5: mcp_rate_limit clamped to max(0, ...) in both from_dict() and _from_raw_dict()
  • Test Coverage: 1978 tests passing, 94.47% coverage (0 skipped, +14 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.5.5)

  • TLS Enforcement (ROADMAP Item 20a(c)): _validate_sink_url() helper enforces HTTPS on HTTPEventSink and BatchHTTPSink with allow_insecure: bool = False keyword-only escape hatch for local development; MCP _ALLOWED_TELEMETRY_SCHEMES restricted from {"http", "https"} to {"https"}; CLI catches ValueError in _build_telemetry_emitter(); production guide TLS section added; Research 003 G2 status → ADDRESSED. Closes CoSAI MCP-T7 (Transport Security) gap.
  • Parameter Cookbook (ROADMAP Item 16): (a) docs/integration/parameter-reference.md — comprehensive parameter reference with derivation guidance, domain examples, boundary behavior for all inputs; (b) docs/integration/domain-templates.md — 4 worked examples (trading, CI/CD, content moderation, autonomous agent) with parameter mapping tables, JSON inputs, gate-by-gate walkthroughs; (c) MCP tool descriptions enriched with semantic context, minimum/maximum JSON Schema constraints, instructions field in initialize response
  • Test Coverage: 1964 tests passing, 94.47% coverage (0 skipped, +12 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
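
The HTTPS-by-default rule with a keyword-only escape hatch could look roughly like the following. This is a sketch under stated assumptions: the function name drops the leading underscore to mark it as hypothetical, the error message is invented, and the whitespace strip mirrors the later QG56-M4 note rather than the 4.5.5 code.

```python
from urllib.parse import urlparse

def validate_sink_url(url: str, *, allow_insecure: bool = False) -> str:
    """Hypothetical sketch: accept https:// always, http:// only on explicit opt-in."""
    cleaned = url.strip()  # strip first so the URL that is checked is the URL that is used
    parsed = urlparse(cleaned)
    allowed = {"https", "http"} if allow_insecure else {"https"}
    if parsed.scheme not in allowed or not parsed.hostname:
        raise ValueError(f"sink URL must use one of {sorted(allowed)} and name a host")
    return cleaned
```

Making `allow_insecure` keyword-only means a caller cannot enable plain HTTP by accident through a positional argument; local development would pass `allow_insecure=True` explicitly.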

Changelog (4.5.4)

  • MCP Hardening Phase 1 (ROADMAP Item 20a): Token bucket rate limiter + structured audit logging for all MCP tool invocations
  • _MCPRateLimiter: stdlib-only token bucket (capacity/rate), thread-safe via threading.Lock, configurable via AegisConfig.mcp_rate_limit (default: 60 req/min, 0 to disable)
  • emit_mcp_invocation(): structured audit event on every tools/call — ALLOW/DENY/ERROR decision, SHA-256 params_hash (PII-safe), latency, caller_id
  • Telemetry schema v2.2.0: added mcp.tool_invocation event definition (6 fields)
  • Closes CoSAI MCP-T10 (resource management) and MCP-T12 (logging/audit) gaps
  • Test Coverage: 1948 tests passing, 94.59% coverage (0 skipped, +25 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
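
For readers unfamiliar with the technique, a stdlib-only token bucket guarded by a `threading.Lock` can be sketched as below. `TokenBucket`, its constructor parameters, and the injectable clock are assumptions for illustration; this is not the `_MCPRateLimiter` source.

```python
import threading
import time
from typing import Callable

class TokenBucket:
    """Hypothetical token bucket: `capacity` burst size, refilled at a steady rate."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = float(capacity)
        self._rate = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start full
        self._last = clock()
        self._lock = threading.Lock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the bucket is empty."""
        with self._lock:
            now = self._clock()
            elapsed = now - self._last
            self._last = now
            # Refill for the elapsed time, never exceeding capacity.
            self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False
```

Under these assumptions, a limit of 60 requests per minute corresponds to `TokenBucket(capacity=60, refill_per_second=1.0)`. The injectable clock exists only to make the behavior testable without sleeping.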

Changelog (4.5.3)

  • MCP Streamable HTTP Transport (ROADMAP Item 23): Implemented MCP Streamable HTTP transport per 2025-03-26 spec using stdlib http.server (zero new dependencies). Protocol version updated to 2025-03-26. --transport http flag with --host, --port, --allowed-origins options. POST /mcp supports JSON-RPC single + batch dispatch. GET /mcp returns 405 (SSE not implemented). /health endpoint for container health checks. Origin validation: fail-closed for non-localhost, permissive for localhost. Infrastructure: Dockerfile exposes 8080 with HTTP CMD; ECS stack replaces keepalive loop with HTTP server; internal ALB (:80 → :8080); ALB 5xx CloudWatch alarm. Deferred: SSE streaming, session management, resumability (all tools are synchronous/stateless). KNOWN_ISSUES.md ECS limitation marked RESOLVED. ADR-007 diagram updated.
  • Security Hardening (Ultrathink): 8 findings fixed (1 HIGH, 3 MEDIUM, 4 LOW) with 18 regression tests — U-1 SSRF protection on telemetry_url (scheme whitelist + private IP blocking), U-2/U-3 Content-Length validation (non-numeric, negative), U-4 error message sanitization (no exception details to clients), U-7 non-dict batch item rejection, U-8 empty batch [] error response, U-9 Origin validation on all endpoints (GET+POST), U-10 handle_request return type correctness
  • H-1 SSRF Hex/Decimal IP Bypass Fix: _validate_telemetry_url() now uses resolve-then-validate via socket.getaddrinfo() in the except ValueError branch — blocks hex (0x7f000001), decimal (2130706433), and DNS-to-private bypasses; extracted _is_forbidden_ip() helper using not addr.is_global (covers CGNAT 100.64/10 range missed by 4-property check); M-3 Slowloris timeout: timeout = 30 class attribute on _MCPHTTPHandler + server self.timeout = 30; 14 regression tests
  • Test Coverage: 1923 tests passing, 94.62% coverage (0 skipped, +64 new)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
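
The H-1 entry rests on two standard-library behaviors: alternative spellings of an address collapse to the same IP once parsed or resolved, and `is_global` is stricter than a handful of "is private" style checks. A minimal sketch follows; the function names are hypothetical and the error message is invented.

```python
import ipaddress
import socket

def is_forbidden_ip(text: str) -> bool:
    """Hypothetical check: anything that is not globally routable is forbidden."""
    return not ipaddress.ip_address(text).is_global

def resolve_then_validate(host: str) -> list:
    """Resolve the host as the OS would, then reject any non-global result."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    addresses = sorted({info[4][0] for info in infos})
    if any(is_forbidden_ip(address) for address in addresses):
        raise ValueError("host resolves to a non-public address")
    return addresses

# The decimal and hex spellings from the entry are the same loopback address.
loopback_from_decimal = str(ipaddress.ip_address(2130706433))
loopback_from_hex = str(ipaddress.ip_address(0x7F000001))
```

Validating the resolved addresses rather than the URL text is what closes the hex, decimal, and DNS-to-private routes; the CGNAT range 100.64.0.0/10 is rejected because it is not global even though `is_private` is false for it.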

Changelog (4.5.2)

  • Security Hardening (Quality-Gate Ultrathink): 17 findings fixed (3 HIGH, 11 MEDIUM, 3 LOW) across 6 files — CORS restricted from ALL_ORIGINS to *.amazonaws.com, aegis-gate script injection fixes (function_name to env var, GITHUB_OUTPUT heredoc delimiters, ::error:: env vars), error message sanitization (no exception details in 500s), dynamodb:Scan removed from IAM policies, s3:PutObjectAcl removed, ADOT collector pinned to v0.41.2, CDK deploy --require-approval broadening, billing alarm enabled for all stages (dev=$100, staging=$150, prod=$200), deploy workflow test gate added, ECS keepalive logs failures, health check rejects degraded, _safe() None guard, quality_subscores empty-list fallback, ECS config path cleared
  • Test Coverage: 1859 tests passing, 94.54% coverage (0 skipped)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.5.1)

  • AWS Deployment Complete (ROADMAP Items 16-20): All 4 CDK stacks successfully deployed to AWS us-west-2 (account 164171672016): AegisSharedStack-dev (DynamoDB aegis-governance-state-dev, KMS, S3 aegis-governance-audit-dev-164171672016, Secrets Manager aegis/signing-keys-dev), AegisLambdaStack-dev (Lambda aegis-evaluate-proposal-dev + API Gateway https://yd1xm4ahcg.execute-api.us-west-2.amazonaws.com/dev/), AegisMcpStack-dev (ECS Fargate cluster aegis-governance-dev, service aegis-mcp-dev 1/1 running), AegisMonitoringStack-dev (SNS aegis-governance-alarms-dev, CloudWatch dashboard AEGIS-Governance-dev, 4 alarms)
  • Deployment Bug Fixes (7): cdk.json literal string bug (account context), pyproject.toml py-modules for standalone .py modules, Dockerfile.lambda numpy/scipy pins + explicit COPY for standalone modules, ECS ALB removal (MCP uses stdio not HTTP) + keepalive loop + lightweight health check, Lambda cross-stack cyclic refs (inline IAM policies), CloudWatch math expression MAX() to IF(), CDK protocol error (dict context to env kwarg)
  • ECS Architecture Note: MCP server uses stdio transport; ECS container runs keepalive loop pending HTTP/SSE transport implementation (see KNOWN_ISSUES.md)
  • Test Coverage: 1859 tests passing, 94.55% coverage (0 skipped)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.5.0)

  • AWS Deployment Infrastructure (ROADMAP Items 16-20): Hybrid Lambda + ECS architecture via CDK Python; 4 CDK stacks (infra/): AegisSharedStack (DynamoDB, Secrets Manager, S3, KMS), AegisLambdaStack (Lambda container image + API Gateway REST with IAM auth), AegisMcpStack (ECS Fargate + ADOT sidecar for AMP), AegisMonitoringStack (CloudWatch alarms + dashboard + billing protection); src/lambda_handler.py wrapping pcw_decide() with 3 routes (POST /evaluate, POST /risk-check, GET /health); Dockerfile.lambda for scipy-enabled container image; .github/workflows/aegis-deploy.yml (OIDC deploy pipeline); .github/actions/aegis-gate/action.yml (reusable governance gate composite action); ADR-007 documenting architecture decision; estimated $51/mo
  • Ultrathink Hardening: U-1 quality_subscores null filter (prevents TypeError), U-2 aegis-gate script injection fix (env vars), U-4 Dockerfile.lambda editable install removed
  • Coverage Boost: 8 error-path tests raising lambda_handler.py from 86% to 94%
  • Test Coverage: 1859 tests passing, 94.55% coverage (0 skipped)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.4.0)

  • Drift Detection → Policy Connection (ROADMAP Item 15): Wired DriftMonitor KL divergence detector into the production decision path of pcw_decide() — CRITICAL drift → HALT (non-overridable), WARNING drift → advisory constraint, NORMAL drift → no change; drift_monitor=None (default) → identical behavior to previous versions (backward compatible); new drift_result field on PCWDecision; _evaluate_drift_policy() and _apply_drift_overrides() helpers extracted for PLR0912 compliance; DRIFT_POLICY_ENFORCED telemetry event type; AegisConfig.create_drift_monitor() factory; CLI --drift-baseline flag with null-value filtering; MCP drift_baseline_data array parameter with null-value filtering; DriftAction and DriftResult re-exported from SDK facade and engine; drift-specific next_steps for CRITICAL HALT; 39 new tests including 6 quality-gate regression tests
  • Test Coverage: 1817 tests passing, 94.56% coverage (0 skipped)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.3.1)

  • HTTP Telemetry Sink (ROADMAP Item 14): HTTPEventSink (per-event fire-and-forget POST), BatchHTTPSink (batching with retry and background flush daemon), http_sink() factory; stdlib-only (urllib.request, matching WebhookAlertSink pattern); AegisConfig.telemetry_url optional field; CLI --telemetry-url flag on aegis evaluate; MCP telemetry_url string parameter on aegis_evaluate_proposal; SDK facade re-exports (BatchHTTPSink, HTTPEventSink, http_sink); telemetry __init__.py re-exports; 45 new tests
  • Test Coverage: 1778 tests passing, 94.44% coverage (0 skipped)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.3.0)

  • Shadow Mode (ROADMAP Item 13): Added shadow_mode keyword parameter to pcw_decide() for KL divergence calibration data collection without enforcing decisions; new ShadowResult dataclass with drift evaluation, observation values, and baseline hash; DriftMonitor integration via optional drift_monitor parameter; TelemetryEmitter integration via optional telemetry_emitter parameter with SHADOW_EVALUATION event type; Prometheus mode label on decision_latency_seconds histogram ("production"/"shadow"), new aegis_shadow_evaluations_total counter; CLI --shadow flag on aegis evaluate; MCP shadow_mode boolean parameter on aegis_evaluate_proposal; ShadowResult re-exported from SDK facade; alerting/recording rules filtered to {mode="production"} to exclude shadow data; 44 new tests
  • Test Coverage: 1733 tests passing, 94.48% coverage (0 skipped)
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.2.3)

  • ROADMAP Items 10-12: Production deployment guide (docs/deployment/production-guide.md), migration guide (docs/deployment/migration-guide.md), performance SLAs with recorded benchmark baselines (docs/deployment/performance-slas.md); Dockerfile (multi-stage, non-root), docker-compose.yaml (AEGIS + Prometheus + Grafana), monitoring/prometheus/prometheus.yml (scrape config); no code changes
  • CALIBRATOR Actor (ROADMAP Item 7): New Calibrator actor type — statistical threshold tuning for drift thresholds, Bayesian priors, gate parameters; approval-gated workflow (PROPOSED→APPROVED→APPLIED); _RECOGNIZED_PARAMETERS whitelist (16 params); ultrathink-hardened (U-1 ValueError propagation, U-2 setattr validation, U-3 double-apply TOCTOU, U-4 derived ID collision, U-5 emit simplification); 69 new tests including 12 regression tests; 1689 tests / 94.60% coverage
  • GOVERNANCE Actor (ROADMAP Item 6): New Governance actor type — override orchestration (initiate/sign/approve/reject/expire), compliance checking (complexity gate non-overridable, fail-closed), emergency halt, thread-safe with threading.Lock; ultrathink-hardened (U-1/U-2 halt guards, U-3 fail-closed compliance, U-4 terminal cleanup, U-7 thread safety); 41 new tests including 6 regression tests; 1620 tests / 94.36% coverage
  • DRY Extraction (ROADMAP Items 8 & 9): Extracted ensure_utc() to src/workflows/serialization.py (3 workflows), 4 validation helpers to src/engine/validation.py (5 engine modules); 26 new tests; deferred persistence/telemetry timezone consolidation; 1620 tests / 94.36% coverage
  • Dependency Fix: Moved scipy/prometheus_client from dev to dedicated engine/telemetry optional groups with graceful ImportError at point of use; 4 regression tests; 1552 tests / 94.27% coverage
  • Quality-Gate Ultrathink #10: 5 MEDIUM bugs fixed — Bayesian overflow/NaN guard (B10-1/B10-2), pipeline validator exception propagation + per-event counting (T-2/T-3), executor rollback retry (T10-1) — 7 regression tests; 1471 tests / 94.23% coverage
  • Rigor Close Deferrals v2: 4 bugs fixed + 3 closed as intentional; 6 regression tests; 1466 tests / 94.22% coverage
  • FIX-1: Pipeline validator short-circuit break skipped remaining validators when drop_invalid=False
  • FIX-2: Bayesian update_prior() linear std interpolation → variance-space combination (Jensen's inequality fix)
  • FIX-3: Decryption _decrypt_dict() dotted-path field filtering (e.g., "nested.actor_id")
  • FIX-4: Consensus approval_threshold default 0.67 → 2/3 (float precision fix for 2/3 majority)
  • CLOSE-1: Kappa discontinuity (intentional design), CLOSE-2: Prometheus private API (no alternative), CLOSE-3: Override rejection schema-code gap (intentional architecture)
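
The reason FIX-4 matters is easy to check: a rounded 0.67 threshold sits above a true two-thirds ratio. The snippet below is a standalone illustration with made-up vote counts, not the consensus workflow code.

```python
from fractions import Fraction

approvals, voters = 2, 3
ratio = approvals / voters

rejected_by_rounded = ratio >= 0.67  # False: a real 2/3 majority fails the check
accepted_by_exact = ratio >= 2 / 3   # True: the same expression on both sides

# Exact rational arithmetic avoids depending on how a literal was rounded.
accepted_by_fraction = Fraction(approvals, voters) >= Fraction(2, 3)

print(rejected_by_rounded, accepted_by_exact, accepted_by_fraction)
```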

Changelog (4.2.2)

  • Bug-Hunt #9 + Ultrathink: 8 bugs fixed (4M, 4L) + 2 ultrathink findings; 19 regression tests; 1466 tests / 94.22% coverage
  • Rigor: Close Deferrals: M6 (import normalization) + L47 (UtilityCalculator phi_S/phi_D validation) closed; T-1 ComplexityDecomposer NaN guard; 15 regression tests; 1441 tests / 94.14% coverage
  • Quality-Gate: DEKEntry frozen dataclass (cache immutability), schema closure (theta in interface contract), 1426 tests / 94.14% coverage
  • Docs-Sync #1: Comprehensive documentation audit — 9 files updated with 1398→1417 test metric sync
  • Docs-Sync #2: Changelog header relabeled (4.2.1 → 4.2.2), ROADMAP/test-count-methodology dates fixed (2026-02-08 → 2026-02-07), gap-analysis bumped to v1.30.0, CLAUDE.md §10 added 3 missing modules (config.py, cli.py, aegis_governance/), §4.10 updated from 4 → 7 optional dependency groups
  • Stale references fixed: gap-analysis.md date, test-count-methodology.md date, repository-structure.md CLAUDE.md annotation (v4.0.0 → v4.2.2), KNOWN_ISSUES.md version (4.2.1 → 4.2.2)
  • CLAUDE.md: Telemetry schema version reference corrected (v2.0.0 → v2.1.0)
  • ROADMAP.md: Version ordering anomaly fixed (v1.15.0 → v1.17.0 to restore monotonic ordering)
  • comprehensive-todo-discovery.md: Stale metrics corrected (1375/15 skipped → 1398/0 skipped, 91/91 → 103/103 bugs)
  • gap-analysis.md: GAP-L1 In-Progress table updated (66% → 100% code-complete), changelog entry added

Changelog (4.2.1)

  • Bug-Hunt Sessions #3 & #4: Hybrid Codex+Claude sweeps — 14 bugs fixed (11 MEDIUM, 3 LOW)
  • Session #3 MEDIUM: Four-eyes violation in override workflow, confidence fallback or vs is not None, current_state property returning stale derived values, terminal state guard in sign_with_stored_key, MCP server replying to JSON-RPC notifications
  • Session #3 LOW: RBAC _resolve_permissions return type set/frozenset inconsistency, KL divergence length guard, WebhookAlertSink Content-Type header loss with custom headers
  • Session #4 MEDIUM: Buffer overflow silent discard in pipeline, encryption/decryption error path mutation (2 fixes), PostgreSQL URL encoding in persistence, executor rollback audit gap, quality_no_zero_subscore config flag ignored
  • Ultrathink hardening: PIIManifest set → frozenset (immutable PII fields), decryption TypeError handler (crypto resilience), pipeline warn-path copy-on-error, PostgreSQL URL quote_plus → SQLAlchemy URL.create() — 4 regression tests
  • 18 new regression tests across 12 test files
  • 6 LOW-severity bugs deferred (documented in KNOWN_ISSUES.md)
  • Rigor Protocol: 13 deferred ultrathink findings fixed (T-1..T-6, W-1..W-10) — 2 MEDIUM + 11 LOW with 18 regression tests
  • Bug-Hunt Session #5: 11 ultrathink findings fixed (5 MEDIUM, 6 LOW) with 9 regression tests — KL divergence re-normalization, inf histogram handling, import path fix, None guards, defensive copies, MappingProxyType immutability, exception guards
  • Bug-Hunt Session #6: 6 bugs fixed (3 MEDIUM, 3 LOW) with 10 regression tests — RBAC fail-open constraint (Codex), MCP non-dict JSON crash, drift inf baseline, CLI simplified names, pcw_decide empty next_steps, executor re-execution guard
  • Quality-Gate Ultrathink: 5 bugs fixed (3 MEDIUM, 2 LOW) with 5 regression tests — NaN confidence propagation in gate evaluation, CLI risk_score priority override, pipeline stop/start race condition, dead rationale_parts code in afa_bridge, decision_path case inconsistency
  • Benchmarks Enabled: --benchmark-skip → --benchmark-disable — 15 benchmark tests now execute (0 skipped)
  • Bug-Hunt Session #8: 6 bugs fixed (3 MEDIUM, 3 LOW) with 8 regression tests — config utility_threshold YAML drop, drift histogram ignoring baseline range, Bayesian NaN propagation, consensus premature rejection, pipeline buffer_size=0 infinite loop, repository async lazy-load crash
  • Test Coverage: 1398 tests passing, 94.13% coverage
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.2.0)

  • Gap Closure Sprint (issues #24, #2, #7, #5, #8, #9): Major gap closure addressing RBAC enforcement, performance testing, override audit, DR drill, and monitoring dashboard gaps
  • Schema Alignment: Resolved three-way naming drift in telemetry override fields across schema YAML, OverrideInfo dataclass, and TelemetryEmitter payloads
  • New modules: src/rbac.py (RBAC enforcement engine), src/telemetry/alert.py (alerting rules), src/telemetry/metrics_server.py (metrics HTTP server)
  • New test suite: tests/test_schema_consistency.py (13 tests for schema-code consistency)
  • Wired RBAC into override workflow and pcw_decide decision flow
  • Added monitoring/ configs (Prometheus recording/alerting rules, Grafana dashboards)
  • Added to_schema_dict() on OverrideInfo for schema-compliant serialization
  • Added stale partial override Prometheus alert (AegisOverrideStalePartial)
  • 128 new tests across RBAC, alerting, metrics server, schema consistency, DR, benchmarks
  • Test Coverage: 1309 tests passing, 94.17% coverage
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.1.0)

  • v1.0 SDK Merge (PR #23, commit cfa3783): AEGIS v1.0 Governance Decision SDK
  • New modules: src/config.py (AegisConfig), src/cli.py, src/aegis_governance/__init__.py (facade), src/aegis_governance/mcp_server.py
  • 79 new tests: test_config.py, test_cli.py, test_facade.py, test_mcp_server.py
  • 4 runnable examples in examples/
  • README rewritten for SDK positioning
  • pyproject.toml: Added [project.scripts] entries (aegis, aegis-mcp-server)
  • Updated §1 entry points to reflect SDK surfaces (CLI, MCP, Python import)
  • Test Coverage: 1172 tests passing, 94.61% coverage
  • Quality Gates: All passing (ruff, black, mypy, bandit, pytest)

Changelog (4.0.0)

  • CLAUDE.md Audit & Regeneration: Full v4.0.0 rewrite with agentic AI hardening
  • Relocated 900-line changelog (v2.1–v3.36) to this file
  • NEW Section 11: Agentic AI Hardening (OWASP Agentic Top 10 mapping)
  • Added 5 developer playbooks (setup, quality gates, add gate, add workflow, optional deps)
  • Added Python code standards (type annotations, dataclass patterns, thread safety)
  • Added governance invariant protection protocol
  • Created 3 custom slash commands (/quality-gate, /sync-metrics, /governance-verify)
  • Enhanced ask-first triggers with agentic safety triggers
  • Updated audit: docs/claude/audits/aegis-root-v4.0.md
  • Reduced CLAUDE.md from 69KB (~1340 lines) to ~20KB (~620 lines)
  • Test Coverage: 1132 tests passing, 93.24% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.36.0)

  • Hybrid Bug-Hunt Session: Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
  • Bug Fixes (1 HIGH, 4 MEDIUM):
  • Bug 1 (MEDIUM): gates.py:77-86 - Missing epsilon validation
    • Added validation in __init__ to reject non-positive epsilon_R/epsilon_P
    • Raises ValueError with descriptive message preventing division by zero
    • Regression tests: 4 tests for epsilon validation
  • Bug 2 (MEDIUM): gates.py:423,507 - Incorrect tail for negative trigger factor
    • Risk gate now computes left-tail P(Δ≤t) when trigger_factor < 0
    • Profit gate uses abs(trigger_factor) for symmetric both-tails check
    • Regression tests: 3 tests for negative trigger factor behavior
  • Bug 3 (MEDIUM): gates.py:674 - Utility confidence ignores threshold
    • Changed confidence calculation to use distance from threshold (margin = lcb - threshold)
    • Sigmoid function now reflects how far above/below threshold the utility is
    • Regression tests: 4 tests for confidence calculation
  • Bug 4 (MEDIUM): pipeline.py:282 - Queue not drained on stop
    • Added queue drain loop before final flush in stop() method
    • Prevents data loss when events are still queued during shutdown
    • Regression tests: 1 test for queue drain on stop
  • Bug 5 (HIGH): encryption.py:543 - PII encryption bypass for lists
    • _encrypt_dict didn't recurse into lists, leaving PII unencrypted
    • Added _encrypt_list() method for recursive list processing
    • Also fixed decryption.py with _decrypt_list() and _verify_list_integrity()
    • Regression tests: 4 tests for list encryption/decryption
  • Test Coverage: 1053 tests (+16 from v3.34.0), 93.83% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)
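
Bug 5 is a traversal gap: a walker that only descends into dicts never sees PII nested inside lists. The sketch below shows the recursive shape of the fix under loud assumptions: `encrypt_value` and `PII_FIELDS` are hypothetical names, and the `encrypt` callable is a stand-in supplied by the caller, not real cryptography.

```python
from typing import Any, Callable

PII_FIELDS = frozenset({"actor_id", "email"})  # illustrative field names

def encrypt_value(value: Any, encrypt: Callable[[str], str]) -> Any:
    """Walk dicts and lists; apply `encrypt` to string values stored under PII keys."""
    if isinstance(value, dict):
        return {
            key: encrypt(item) if key in PII_FIELDS and isinstance(item, str)
            else encrypt_value(item, encrypt)
            for key, item in value.items()
        }
    if isinstance(value, list):
        # The step the bug skipped: descend into every list element.
        return [encrypt_value(item, encrypt) for item in value]
    return value

event = {"votes": [{"actor_id": "alice", "weight": 1}], "note": "ok"}
masked = encrypt_value(event, lambda text: "enc:" + text)
```

The function builds new containers instead of mutating its input, so a failure partway through cannot leave the original event half-encrypted.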

Changelog (3.35.0)

  • Hybrid Bug-Hunt Session: Second session with Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
  • Bugs Identified: 6 total (1 HIGH, 5 MEDIUM) - fixed in v3.36.0
  • Test Coverage: 1037 tests, 94.11% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.34.0)

  • All Deferred Bugs Fixed: Resolved 17 deferred issues from hybrid bug-hunt sessions
  • MEDIUM Severity (1):
  • B2-1: emitter.py:334 - memory_sink unbounded growth - added maxlen parameter
  • LOW Severity (16):
  • B1-1,B1-2,B1-3: proposer.py - PERT validation and TOCTOU fixes
  • B2-4,B2-10,L47: gates.py, utility.py, complexity.py - validation improvements
  • B1-4,B1-5: bip322_provider.py, hybrid_kem.py - documentation and edge case handling
  • B2-11,B2-12: schema.py, afa_bridge.py - nested validation and hash confidence
  • B3-1,B3-2,B3-5,B3-7: consensus.py, override.py, durable.py, models.py - workflow validation and chain integrity
  • Test Coverage: 1037 tests (+81 from v3.33.0), 94.11% coverage (+0.48pp)
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.33.0)

  • Workflow Bug Fixes (4 MEDIUM Severity): Continued bug-hunt session fixes
  • B3-1: consensus.py - Empty eligible_voters and actor_roles validation
  • Added validation to reject empty eligible_voters with positive quorum_percentage
  • Previously: quorum could never be met, causing confusing runtime behavior
  • Added warning when get_required_missing() called with empty actor_roles dict
  • Regression tests: 6 tests for empty voters and actor_roles scenarios
  • B3-2: override.py - is_expired TOCTOU race condition documentation
  • Enhanced is_expired docstring to document advisory-only nature (TOCTOU risk)
  • Added check_and_mark_expired() method for atomic check-and-update operation
  • Signature operations already perform atomic expiration check internally
  • Regression tests: 5 tests for expiration handling and atomic marking
  • B3-5: durable.py - resume_or_create() ID mismatch detection
  • Added strict_id: bool = True parameter to detect workflow ID mismatches
  • Raises ValueError when resumed workflow ID differs from requested ID (indicates caller bug)
  • Use strict_id=False for legacy behavior (logs warning, returns stored workflow)
  • Regression tests: 3 tests for strict_id behavior
  • B3-7: models.py - verify_chain_link() method for chain validation
  • Added verify_chain_link(previous_transition) to WorkflowTransition
  • Validates: hash integrity, workflow ID match, state continuity, temporal ordering
  • Returns (is_valid, error_message) tuple for detailed error reporting
  • Regression tests: 7 tests for chain link validation scenarios
  • Test Coverage: 956 tests passing (+10 from v3.32.0), 93.63% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.32.0)

  • Hybrid Bug-Hunt Session: Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
  • Lane A (Codex) - Bayesian Zero Override Fix (src/engine/bayesian.py:357-358):
  • Python truthy or fallback ignored explicit zero overrides in update_prior()
  • Fix: Changed current_mean or self.prior_mean -> self.prior_mean if current_mean is None else current_mean
  • Regression test: test_update_prior_empty_observations_respects_overrides
  • Lane B (Claude) - 4 MEDIUM Severity Fixes:
  • B2-3: prometheus_exporter.py - Duplicate metric registration on multi-instantiation
    • Added _get_or_create_counter(), _get_or_create_histogram(), _get_or_create_gauge() factory methods
    • Makes metric registration idempotent via REGISTRY lookup
  • B3-3: override.py - reject() discards actor_id/reason parameters
    • Added rejected_by, rejection_reason, rejected_at fields to OverrideRequest
    • Updated reject() to record rejection metadata
    • Added serialization/deserialization for rejection fields
  • B3-4: proposal.py - from_dict() loses prometheus exporter reference
    • Added from_dict_with_exporter() factory method for DI during deserialization
    • Added set_prometheus_exporter() method for post-construction injection
  • Test Coverage: 956 tests passing (+10 from v3.31.0), 93.63% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit, pip-audit)
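
The Lane A fix is an instance of a common Python trap: `or` falls back on any falsy value, including a legitimate 0.0. A minimal reproduction, with hypothetical function names and an illustrative prior of 0.3:

```python
from typing import Optional

PRIOR_MEAN = 0.3  # illustrative default, not an AEGIS value

def pick_mean_buggy(current_mean: Optional[float]) -> float:
    # 0.0 is falsy, so an explicit zero override is silently replaced.
    return current_mean or PRIOR_MEAN

def pick_mean_fixed(current_mean: Optional[float]) -> float:
    # Only a missing value (None) falls back to the prior.
    return PRIOR_MEAN if current_mean is None else current_mean
```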

Changelog (3.31.0)

  • Claude-GPT Dialogue Recommendations: Implemented 4 changes from multi-model consensus session
  • phi_S/phi_D Single Source of Truth:
  • Updated docs/architecture/afa-libertas-integration.md line 753-754
  • Updated docs/architecture/repository-structure.md line 546-549
  • All phi_S/phi_D values now reference schema/interface-contract.yaml as authoritative source
  • Corrected values: phi_S=100 (was 500), phi_D=2000 (was 10000)
  • KNOWN_ISSUES.md Cleanup:
  • Removed L45 from LOW Severity (intentional design, not a bug)
  • Moved L45 to "Intentional Patterns" section with explanation
  • Reclassified L7 from "Python limitation" to "Known Limitation" with HSM mitigation path
  • Added detailed HSM/KMS integration guidance for production deployments
  • docs-consistency.yml CI Workflow: New GitHub Actions workflow
  • Validates test count consistency across CLAUDE.md, README.md, ROADMAP.md, gap-analysis.md
  • Extracts version and coverage metrics from CLAUDE.md as source of truth
  • Weekly scheduled runs + push/PR triggers on documentation changes
  • Advisory warnings (non-blocking) for mismatches
  • Test Coverage: 946 tests passing, 93.48% coverage (unchanged)
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.30.0)

  • Deferred Bug Fix Complete: Fixed 2 remaining deferred issues (L44, L49)
  • L44 Fix: analyst.py:345 - Type coercion validation in _evaluate_utility_gate()
  • Added _coerce_to_float() method with try-except float() pattern
  • Validates all 8 numeric fields (mean, variance, lcb, ucb for both value/risk)
  • Regression tests: 9 tests for type coercion edge cases
  • L49 Fix: hybrid_provider.py:324 - Timing side-channel mitigation
  • Added audit_mode: bool = False parameter to HybridSignatureProvider
  • Detailed timing logs only when audit_mode=True for debugging
  • Generic error messages in default mode prevent timing leakage
  • Regression tests: 6 tests for audit_mode behavior
  • Research Verification: Fixes validated via ExaSearch
  • L44: try-except float() is a standard Python type-coercion idiom
  • L49: Configurable audit modes per Intel security guidance
  • Test Coverage: 946 tests passing (+15 from v3.29.0), 93.48% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)
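
The try-except float() pattern named in the L44 fix can be sketched as follows. This is an illustration only: the standalone function `coerce_to_float`, its `field_name` parameter, and the error message are assumptions, not the actual `_coerce_to_float()` method in analyst.py.

```python
from typing import Any


def coerce_to_float(value: Any, field_name: str) -> float:
    """Coerce value to float, raising ValueError that names the field.

    Hypothetical helper: float() raises TypeError for unsupported types
    (e.g. None), ValueError for unparsable strings, and OverflowError for
    integers too large to represent, so all three are caught here.
    """
    try:
        return float(value)
    except (TypeError, ValueError, OverflowError) as exc:
        raise ValueError(
            f"{field_name} must be numeric, got {value!r}"
        ) from exc
```

Normalizing every failure to a single exception type lets the caller validate all numeric fields in one loop with one handler.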

Changelog (3.29.0)

  • Hybrid Bug-Hunt Session: Implementation of 4 priority fixes from bug-hunt plan v3.0
  • HIGH Severity Fixes (2):
  • H-WF-001: consensus.py:228 - All-abstain stuck state in QUORUM_MET
    • When quorum was met with only abstention votes, the workflow stayed stuck indefinitely
    • Fixed: Rejects immediately when quorum is met with zero decisive votes
    • Regression tests: 4 tests for partial quorum scenarios
  • H-WF-003: pipeline.py - Thread safety race conditions
    • Stats counters were not lock-protected; reset_stats() raced by replacing the stats object
    • Fixed: Added _stats_lock; get_stats() now returns a snapshot; stop() warns on timeout
    • Regression tests: 5 thread safety tests
  • MEDIUM Severity Fixes (3):
  • M24: hybrid_kem.py:279 - Empty plaintext now raises ValueError (with allow_empty opt-in)
  • M25: bip322_provider.py:109 - Keygen max retry limit (1000 attempts)
  • M-ENG-005: pcw_decide.py:187 - Added AttributeError catch for malformed context
  • Test Coverage: 931 tests passing (+15 from v3.28.0), 93.48% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)
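
A sketch of the all-abstain rule from H-WF-001, under stated assumptions: the `Vote` enum, the `resolve` function, the string outcomes, and the simple-majority rule for decisive votes are all hypothetical and do not reflect the real consensus.py API.

```python
from enum import Enum
from typing import List, Optional


class Vote(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ABSTAIN = "abstain"


def resolve(votes: List[Vote], quorum: int) -> Optional[str]:
    """Return 'approved' or 'rejected', or None while quorum is not met."""
    if len(votes) < quorum:
        return None  # still collecting votes
    decisive = [v for v in votes if v is not Vote.ABSTAIN]
    if not decisive:
        # Quorum met with zero decisive votes: reject immediately rather
        # than waiting for a decision that can never arrive.
        return "rejected"
    approvals = sum(1 for v in decisive if v is Vote.APPROVE)
    # Assumed rule: strict majority of decisive votes approves.
    return "approved" if approvals > len(decisive) / 2 else "rejected"
```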

Changelog (3.28.0)

  • Deferred Bug Cleanup: Fixed 16 issues from hybrid bug-hunt (5 MEDIUM, 11 LOW)
  • MEDIUM Severity Fixes (5):
  • M19: drift.py:68 - Added num_bins > 0 and epsilon > 0 validation
  • M20: bip322_provider.py:173 - Added empty message_hash validation
  • M21: kek_provider.py:224 - Documented version=0 alias for "current"
  • M22: pipeline.py:321 - Added _start_lock to fix TOCTOU race in start()
  • M23: key_store.py:516 - Added logging when key not found in record_usage()
  • LOW Severity Fixes (11):
  • L41-L43: gates.py - Input validation with warnings for out-of-range scores
  • L46: approver.py:139 - Removed redundant validation (covered by __post_init__)
  • L48: proposer.py:152 - Better error for already-submitted proposals
  • L50: hybrid_kem.py:279 - Warning for empty plaintext in encrypt()
  • L51: mldsa.py:98 - Empty message validation in sign()
  • L52: hybrid_provider.py:225 - Empty message_hash validation
  • L53: afa_bridge.py:467 - Positive limit validation in get_decision_history()
  • L54: consensus.py:217 - Fixed all-abstain stuck state
  • L55: bayesian.py:362 - Extracted magic number to _FALLBACK_STD constant
  • Remaining Deferred (2): L45 (intentional), L47 (extreme values valid)
  • Test Coverage: 916 tests passing (+4 regression tests), 93.39% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)
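
The M22 fix closes a time-of-check-to-time-of-use (TOCTOU) window in start(). A minimal sketch of that locking pattern, assuming a simple running flag; `PipelineSketch` and `start_count` are illustrative names, and only `_start_lock` comes from the entry above.

```python
import threading


class PipelineSketch:
    """Illustrative stand-in for a pipeline with an idempotent start()."""

    def __init__(self) -> None:
        self._start_lock = threading.Lock()
        self._running = False
        self.start_count = 0  # counts how many times startup actually ran

    def start(self) -> bool:
        """Start once; return True only for the call that performed it."""
        # The check and the state change happen under the same lock, so
        # two threads cannot both observe _running == False.
        with self._start_lock:
            if self._running:
                return False
            self._running = True
            self.start_count += 1
            return True
```

Without the lock, two threads could both pass the `_running` check before either sets the flag, and startup would run twice.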

Changelog (3.27.0)

  • Hybrid Bug-Hunt Session: Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
  • HIGH Severity Fixes (2 - Codex Lane A):
  • H1: pcw_decide.py:243 - Human approval bypass when all gates pass
    • CRITICAL: pcw_decide() returned PROCEED even with requires_human_approval=True
    • Added human_required gating to enforce PAUSE/ESCALATE
  • H2: consensus.py:337 - Workflow ID collision with ProposalWorkflow
    • Namespaced to consensus:{proposal_id} to avoid persistence conflicts
  • New Issues Identified (Claude Lane B): 5 MEDIUM, 15 LOW (documented in KNOWN_ISSUES.md)
  • Test Coverage: 912 tests passing (+2 regression tests), 93.48% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)
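
The H1 bypass can be illustrated with a small decision function. This is a sketch under assumptions: `decide`, its boolean parameters, and the HALT outcome for failed gates are hypothetical simplifications, and the real pcw_decide() may return PAUSE or ESCALATE depending on context.

```python
def decide(all_gates_pass: bool, requires_human_approval: bool) -> str:
    """Return a decision status string for a proposal.

    Assumed statuses: HALT when gates fail, PAUSE when a human must
    approve, PROCEED otherwise.
    """
    if not all_gates_pass:
        return "HALT"
    # The human-approval check must run even when every gate passes;
    # skipping it in that case was the bypass described in H1.
    if requires_human_approval:
        return "PAUSE"
    return "PROCEED"
```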

Changelog (3.26.0)

  • Rigor Protocol Phase 3: Fixed 13 remaining issues (5 MEDIUM, 8 LOW)
  • MEDIUM Severity Fixes (5):
  • M14: proposal.py:359 - Documented that is_terminal excludes ROLLED_BACK
  • M15: pipeline.py:90 - PipelineStats.errors bounded to deque(maxlen=1000)
  • M16: drift.py:279 - calibrate_thresholds() requires at least 2 values
  • M17: override.py:976 - Documented from_dict() maintenance contract
  • M18: pipeline.py:390 - Replaced broad except Exception in validate_timestamp()
  • LOW Severity Fixes (8):
  • L33: approver.py:24 - Added __post_init__ validation for ApprovalVote.vote
  • L34: emitter.py:164 - Fixed auto-linking parent_event_id (correlation chain only)
  • L35: schema.py:240 - Removed redundant local import dataclasses
  • L36: proposal.py:58 - Added __post_init__ validation to ProposalMetadata
  • L38: gates.py:62 - Enhanced GateResult.margin docstring
  • L39: pcw_decide.py:28 - Moved UtilityComponents import to module level
  • L40: pcw_decide.py:471 - Added debug logging to quick_risk_check()
  • M13: proposer.py:118 - Enhanced PERT validation comment (reclassified)
  • Test Updates: Updated test_cast_vote_invalid and test_stats_defaults for new validation
  • Test Coverage: 910 tests passing, 93.48% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)
  • Remaining Issues: 2 (M6 non-relative imports; L7 secure memory erase)
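
The M15 change bounds the error list with deque(maxlen=1000). A short sketch of that behavior, assuming errors are stored as strings; `PipelineStatsSketch` and `record_error` are illustrative names, not the actual PipelineStats definition.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque


@dataclass
class PipelineStatsSketch:
    """Illustrative stats holder whose error history cannot grow unbounded."""

    # default_factory gives each instance its own deque; once 1000 entries
    # are held, each append silently discards the oldest entry.
    errors: Deque[str] = field(default_factory=lambda: deque(maxlen=1000))

    def record_error(self, message: str) -> None:
        self.errors.append(message)
```

The trade-off is that older errors are dropped without notice, so a separate total counter would be needed if the overall error count matters.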

Changelog (3.25.0)

  • Rigor Protocol Phase 2: Fixed 17 issues (4 MEDIUM, 13 LOW) with 25 regression tests
  • Test Coverage: 910 tests passing (+25), 93.48% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.24.0)

  • Rigor Protocol Phase 1: Fixed 7 issues (2 MEDIUM, 4 LOW, 1 documentation)
  • Claude-GPT Dialogue: Resolved M15 retry logic architecture (Hybrid approach)
  • Test Coverage: 885 tests passing, 93.48% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.23.0)

  • Minor Documented Issues Fixed: 11 issues from KNOWN_ISSUES.md resolved
  • Test Coverage: 885 tests passing (+18 from 867), 93.48% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.22.0)

  • Hybrid Bug-Hunt Session Complete: Codex gpt-5.2-codex (3 iterations) + 3 Claude debugger agents
  • Test Coverage: 867 tests passing (+9 from 858), 93.79% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.21.0)

  • Hybrid Bug-Hunt Session: 3 Claude debugger agents reviewing 42 source files in parallel
  • Test Coverage: 858 tests passing (+4 from 854), 93.79% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.20.0)

  • All LOW Severity Bugs Fixed: Complete resolution of L1-L9 from hybrid bug-hunt
  • Test Coverage: 854 tests passing (+8 from 846), 93.79% coverage
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.19.0)

  • Hybrid Bug-Hunt Session: Codex gpt-5.2-codex + 3 Claude debugger agents
  • Test Coverage: 845 tests passing (+6 from 839)
  • Quality Gates: All passing (mypy --strict, ruff, black)

Changelog (3.18.0)

  • Quality Gate Fixes: Full quality-gate execution with all 8 phases
  • Test Coverage: 839 tests passing
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.17.0)

  • Bug Fixes (KNOWN_ISSUES.md): Fixed 4 MEDIUM severity bugs from hybrid bug-hunt
  • Test Coverage: 839 tests passing (+2 from 837)
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.16.0)

  • Bug Hunt Session: Hybrid bug-hunt with 3 Claude debugger agents
  • Test Coverage: 837 tests passing (maintained)
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.15.0)

  • Bug Fixes (Rigor Protocol): Fixed 9 except Exception patterns from hybrid bug-hunt session
  • Test Coverage: 837 tests passing (+9 from 828)
  • Quality Gates: All passing (mypy --strict, ruff, black, bandit)

Changelog (3.14.0)

  • Bug Fixes (Rigor Protocol): Fixed 3 MEDIUM severity bugs from hybrid bug-hunt session
  • Test Coverage: 828 tests passing (+7 from 821)
  • Quality Gates: All passing (mypy --strict, ruff, black)

Changelog (3.13.0)

  • Mathematical Coherence Review: Addressed 4 design decisions from rigor protocol review
  • Test Coverage: 821 tests passing (+14 from 807), 93.34% coverage

Changelog (3.12.0)

  • Optional Dependencies Installed: Full post-quantum cryptography and BIP-322 now active
  • Test Coverage Milestone: All tests now pass with 0 skipped (821 passed)

Changelog (3.11.0)

  • Mathematical Coherence Fixes: Implemented 4 critical fixes from multi-model coherence review
  • Test Coverage: Added 20 new tests (506 total passing, 282 skipped)

Changelog (3.10.0)

  • ADR Consolidation Complete: Finished consolidating ADR directories

Changelog (3.9.0)

  • Logic Coherence Fixes: Public API improvement, factory method, overflow protection

Changelog (3.8.0)

  • Documentation Synchronization & Future Work Roadmap

Changelog (3.7.0)

  • GAP-L1 PHASE 1 IMPLEMENTED: Prometheus Metrics Foundation

Changelog (3.6.0)

  • Index & Cross-Reference Update: Systematic update of all TOCs, indexes, and cross-references

Changelog (3.5.0)

  • Documentation Enhancement & Quality Gates: CI/CD badges, test count methodology, dependency visualization

Changelog (3.4.0)

  • Repository Nomenclature Update: Clarified AEGIS vs Guardrails naming

Changelog (3.3.0)

  • GAP-Q2 PHASE 2 COMPLETE: Full post-quantum encryption implementation
  • TEST COVERAGE MILESTONE: 846 tests passing (444 new), 93.60% coverage

Changelog (3.2.0)

  • GAP-Q1 IMPLEMENTED: Post-quantum hybrid signatures (Ed25519 + ML-DSA-44)

Changelog (3.1.0)

  • GAP-M4 IMPLEMENTED: Full BIP-322 signature format support

Changelog (3.0.0)

  • AEGIS v1.0.0 RELEASED: Production-ready release with full CI/CD validation

Changelog (2.9.0)

  • Documentation Synchronization: Comprehensive audit and alignment of all documentation

Changelog (2.8.0)

  • Repository Migration: Restructured for implementation-ready architecture

Changelog (2.7.0)

  • AEGIS Integration: Unified five frameworks into Autonomous Engineering Governance System

Changelog (2.6.0)

  • Documentation synchronization & cleanup review completed

Changelog (2.5.0)

  • Added EPCC methodology documentation

Changelog (2.4.0)

  • Fixed markdown linting issues

Changelog (2.3.0)

  • Documentation synchronization audit completed

Changelog (2.2.0)

  • Added framework comparison analysis

Changelog (2.1.0)

  • Removed [PROVISIONAL] tags - tooling now configured