AEGIS CLAUDE.md Changelog
Relocated from: `CLAUDE.md` Section 9 (Change Management)
Purpose: Preserve full version history while keeping CLAUDE.md concise for agent system prompt consumption
Coverage: v2.1.0 (2025-12-26) through v4.5.59 (2026-02-25)
Active CLAUDE.md: See `/CLAUDE.md` for current rolling changelog (latest 2 versions)
v4.5.59 - Utility Scale & Sign Convention Fixes (2026-02-25)
Scope: Lambda auto-utility scale mismatch and sign convention fixes, E2E validation
Test Coverage: 3041 tests passing, ~94.9% coverage (2 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changes
- Complexity tax scale fix: Zeroed `ComplexityBreakdown(static=0.0, dynamic=0.0)`; the `phi_S=100.0 $/pt` coefficient overwhelmed normalized [0,1] profit deltas (LCB=-64.9 for a 0.10 profit improvement)
- Risk term sign fix: Zeroed risk delta via `_make_pert(0.0)`; `risk_term = kappa * delta_R` produces negative values when risk decreases (`delta_R < 0`), counteracting the profit gain (LCB=-0.015 for a beneficial proposal)
- Design rationale: Risk and complexity gates evaluate independently; the utility gate now measures pure profit uplift, avoiding scale mismatches between normalized advisor inputs and dollar-denominated calculator coefficients
- E2E validation: 8 API scenarios via browser fetch (PROCEED, PAUSE, HALT, multi-domain) + 3 full wizard walkthroughs (Engineering low-impact, Engineering worst-case, Life Decision) confirm correct behavior
Deployed
- Lambda redeployed via `aegis-deploy.yml` workflow_dispatch (dev stage)
- CI green: Python CI + docs deploy + CDK deploy + smoke test all passing
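The scale and sign problems described in this release can be illustrated with a toy calculation. This is only a sketch: `toy_utility`, its additive form, and every numeric input are hypothetical and do not reproduce the AEGIS calculator or the LCB figures quoted above.

```python
def toy_utility(profit_delta, delta_r, static_points, kappa=1.0, phi_s=100.0):
    """Toy utility (assumed form): profit plus risk term minus complexity tax."""
    risk_term = kappa * delta_r              # negative when risk decreases
    complexity_tax = phi_s * static_points   # dollars per point, not normalized
    return profit_delta + risk_term - complexity_tax


# A 0.10 normalized profit gain is swamped by a dollar-scale complexity tax.
with_tax = toy_utility(profit_delta=0.10, delta_r=0.0, static_points=0.5)

# A risk reduction (delta_r < 0) lowers utility under this sign convention.
with_risk = toy_utility(profit_delta=0.10, delta_r=-0.2, static_points=0.0)

# Zeroing both terms leaves pure profit uplift.
pure_profit = toy_utility(profit_delta=0.10, delta_r=0.0, static_points=0.0)
```

With both terms zeroed, the result depends only on the profit delta, which matches the stated design rationale of letting the risk and complexity gates evaluate independently.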
v4.5.58 - Advisor Utility & Novelty Gate Fixes (2026-02-25)
Scope: Lambda auto-utility computation, advisor novelty step reframe
Test Coverage: 3041 tests passing, ~94.9% coverage (2 skipped, +12 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changes
- Lambda auto-utility: Added `_make_pert()` and `_compute_utility()`, which synthesize PERT three-point estimates from flat risk/profit/complexity parameters and auto-compute UtilityResult when not explicitly provided. Utility gate now produces real values instead of N/A for advisor proposals.
- Advisor novelty reframe: Step 7 changed from "How new is this?" to "How well-documented is this type of change?"; inverted value mapping so well-documented precedent = high score = passes gate. Radio values: 0.95/0.88/0.72/0.40 (was 0.2/0.5/0.7/0.95). Tool modifiers: +0.02/-0.03/-0.08 (was 0/+0.1/+0.2).
- Gate explanations: Novelty pass/fail messages updated to match precedent framing
- Review screen: Label changed from "Novelty" to "Precedent"
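As a rough illustration of synthesizing a three-point estimate from a single flat value, the sketch below builds a symmetric (low, mode, high) triple and computes the classic PERT mean. The `ThreePoint` type, the `make_pert` helper, and the 20% spread are assumptions for illustration; the actual `_make_pert()` may differ.

```python
from typing import NamedTuple


class ThreePoint(NamedTuple):
    low: float
    mode: float
    high: float

    @property
    def mean(self) -> float:
        # Classic PERT (beta) mean: (low + 4 * mode + high) / 6
        return (self.low + 4.0 * self.mode + self.high) / 6.0


def make_pert(value: float, spread: float = 0.2) -> ThreePoint:
    """Hypothetical helper: build a symmetric estimate around a flat value."""
    margin = abs(value) * spread
    return ThreePoint(low=value - margin, mode=value, high=value + margin)


zero_estimate = make_pert(0.0)   # collapses to a zero delta
unit_estimate = make_pert(1.0)   # symmetric around 1.0
```

A flat value of 0.0 collapses to an all-zero estimate, which is how a call such as `_make_pert(0.0)` can be used to neutralize a term.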
v4.5.57 - Advisor & Lambda CORS Fixes (2026-02-25)
Scope: Advisor evaluate button fix, Lambda CORS headers, custom domain migration
Test Coverage: 3029 tests passing, ~94.8% coverage (2 skipped)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changes
- Advisor evaluate() DOM collision: Renamed `evaluate()` to `runEvaluation()`; the inline `onclick` handler resolved to `document.evaluate()` (XPath), not the custom function (3 call sites)
- Lambda CORS headers: Added `Access-Control-Allow-Origin: *`, `Allow-Headers`, `Allow-Methods` to `_response()`; API Gateway proxy integration passes the Lambda response as-is, so CORS headers must be in the Lambda response (not just the OPTIONS preflight)
- Custom domain migration: Updated all `undercurrentai.github.io` URLs to `aegis.undercurrentholdings.com` in advisor HTML (3 links), pyproject.toml Documentation URL, and docs/api/rest.md CORS headers table
- CDK deploy: Redeployed Lambda via `aegis-deploy.yml` workflow; smoke test passed
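The CORS fix can be sketched as a response builder that attaches the headers to every proxy-integration response. `build_response` and the specific `Allow-Headers`/`Allow-Methods` values are assumptions; only the wildcard origin is stated in the changelog.

```python
import json
from typing import Any, Dict

# Header values other than the wildcard origin are illustrative assumptions.
CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Headers": "Content-Type,X-Api-Key",
    "Access-Control-Allow-Methods": "GET,POST,OPTIONS",
}


def build_response(status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Return an API Gateway proxy-style response with CORS headers attached."""
    headers = {"Content-Type": "application/json", **CORS_HEADERS}
    return {"statusCode": status_code, "headers": headers, "body": json.dumps(body)}


response = build_response(200, {"decision": "PROCEED"})
```

Because the proxy integration forwards the Lambda response unchanged, building the headers in one place keeps success and error responses consistent.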
v4.5.56 - Bug Hunt #45 (2026-02-25)
Scope: Hybrid bug hunt (Codex gpt-5.3-codex xhigh + 3 Claude sweep agents)
Test Coverage: 3029 tests passing, ~94.8% coverage (2 skipped, +31 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changes
- BH45-Codex-M1: ProposalWorkflow.transition_to() metadata stored by reference; `copy.deepcopy` fix prevents caller mutation from corrupting the audit trail
- BH45-M1: MCP risk_score eager evaluation; conditional fallback matching Lambda BH13-M1 fix (transport parity)
- BH45-M2: BayesianPosterior.update_prior missing current_mean/current_std finiteness validation — added isfinite + validate_positive
- BH45-T1: BayesianPosterior.update_prior current_mean missing bool guard (ultrathink finding) — added isinstance(bool) check
- BH45-L1: PipelineConfig missing retention_days/drift_window_size validation — added type/range checks matching buffer_size pattern
- BH45-L2: PipelineConfig pii_encryption_on_error/storage_on_error accept arbitrary strings — added enum validation (PII safety)
- Deferred: 3 LOW findings (drift next_steps recomputation, schema_revisions counter, consensus QUORUM_MET semantics)
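The stored-by-reference issue in BH45-Codex-M1 can be shown with a small stand-in. `AuditTrail` and its `record` method are hypothetical; the sketch only demonstrates why a deep copy protects recorded history from later caller mutation.

```python
import copy
from typing import Any, Dict, List, Optional


class AuditTrail:
    """Toy audit trail that snapshots metadata at record time."""

    def __init__(self) -> None:
        self.entries: List[Dict[str, Any]] = []

    def record(self, state: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        # Deep-copy so later caller mutations (even nested ones) cannot alter history.
        snapshot = copy.deepcopy(metadata) if metadata is not None else {}
        self.entries.append({"state": state, "metadata": snapshot})


trail = AuditTrail()
meta = {"approver": "alice", "tags": ["urgent"]}
trail.record("approved", meta)
meta["tags"].append("tampered")  # mutate the caller's dict after the fact
```

A shallow copy would not be enough here, because the nested `tags` list would still be shared with the caller.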
v4.5.55 - Scoring Guide MCP Tool + Advisor v2 (2026-02-25)
Scope: Scoring Guide MCP tool and Advisor v2 rewrite
Test Coverage: 2998 tests passing, ~94.8% coverage (2 skipped, +31 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changes
- Scoring Guide MCP Tool: New `aegis_get_scoring_guide` tool with 5-domain derivation guidance (trading, cicd, moderation, agents, generic); surfaces parameter formulas, range guides, common mistakes, and worked examples through the MCP protocol
- Advisor v2: Complete rewrite with domain funnel (6 domains), 8-step factual scoring rubric replacing vibes-based sliders, real API calls with provisioned demo key
- Ultrathink Fixes: Defensive copy for `get_scoring_guide()` return value, HTML escaping for gate detail values, null guard for `override_requires`
- Test Coverage: 2998 tests passing, ~94.8% coverage (2 skipped, +31 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.54 - SaaS Commercialization Sprint (2026-02-24)
Scope: Full commercial readiness transformation (API key auth, customer provisioning, docs site, PyPI workflows)
Test Coverage: 2967 tests passing, ~94.8% coverage (2 skipped, +9 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changes
- Phase 1 - API Key Auth: CDK `lambda_stack.py` switched from IAM to API key auth + usage plans; per-stage throttling (dev 50 req/s, staging 200 req/s, prod 500 req/s); `/health` fully public; CORS widened to `*`; optional custom domain via CDK context; `UsagePlanId` CfnOutput added
- Phase 1 - Tenant Context: Lambda handler extracts tenant_id from `requestContext.identity.apiKeyId`; injects `_tenant_id` and `_request_id` into response body; adds `X-AEGIS-Tenant` and `X-AEGIS-Request-Id` headers; 404 responses include tenant context (UT-1 fix)
- Phase 1 - Provisioning: New `scripts/provision-customer.py`, a boto3 script for API key creation + usage plan attachment + usage queries; show-once key value; rollback on failure; env var fallbacks
- Phase 2 - OpenAPI: New `docs/api/openapi.yaml` (OpenAPI 3.1.0) with 3 endpoints, 9 component schemas, ApiKeyAuth security scheme
- Phase 2 - REST Docs: New `docs/getting-started/quickstart-rest.md` and `docs/api/rest.md` with curl examples, field tables, error codes
- Phase 3 - Docs Site: New `mkdocs.yml` + 10 new docs pages (index, installation, SDK/CLI/REST/MCP quickstarts, onboarding, GitHub Action, AI governance); `docs-deploy.yml` GitHub Pages workflow
- Phase 3 - PyPI: New `.github/workflows/pypi-publish.yml` with OIDC trusted publishing; multi-version smoke test (3.9, 3.11, 3.12)
- Phase 4 - Polish: New `SECURITY.md` (vulnerability disclosure), `CHANGELOG.md` (customer-facing); `pyproject.toml` bumped to v1.1.0 with mkdocs-minify-plugin + pymdown-extensions docs deps; Documentation + Changelog URLs added
- New files: 22 files created (scripts, docs, workflows, config)
- Governance invariants preserved: `src/engine/`, `src/integration/pcw_decide.py`, `src/crypto/`, `schema/interface-contract.yaml`, `schema/rbac-definitions.yaml`, `.github/workflows/python-ci.yml` all untouched
v4.5.53 - Transport Parity Fix (2026-02-24)
Scope: Comprehensive transport parity audit closing 15 of 22 gaps across CLI, MCP, and Lambda
Test Coverage: 2958 tests passing, ~94.8% coverage (2 skipped, +35 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changes
- GAP 1: metadata input extraction + validation in CLI and MCP (parity with Lambda)
- GAP 2-4 (CRITICAL): requires_human_approval, time_sensitive, reversible bool flags in MCP; previously missing, silently bypassing human oversight controls
- GAP 6-7: MCP inputSchema updated with 5 new documented properties (bool flags, metadata, session_id)
- GAP 8: Lambda telemetry_emitter wired via `_wire_lambda_telemetry()` helper
- GAP 12: MCP estimated_impact now strict; rejects non-string values (parity with CLI/Lambda)
- GAP 15: CLI session_id changed from static `"cli-session"` to dynamic `uuid.uuid4()`
- GAP 17: CLI SSRF validation via shared `telemetry/url_validation.py` module
- GAP 18-19: MCP output now includes `constraints` and `override_requires` fields
- GAP 20: MCP output now includes `timestamp` field
- GAP 21: MCP output now includes per-gate `confidence` field
- GAP 22: Lambda shadow drift dict includes `message` field
New Module
- `src/telemetry/url_validation.py`: shared SSRF-safe URL validation extracted from MCP server; uses resolve-then-validate pattern with `not addr.is_global` (consistent across Python 3.9-3.12)
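A minimal sketch of the resolve-then-validate pattern follows, using only the standard library. The function names are hypothetical and this is not the contents of `url_validation.py`; it shows the idea of resolving a host first and then rejecting the target unless every resolved address is globally routable.

```python
import ipaddress
import socket
from typing import Iterable, List


def all_addresses_global(addresses: Iterable[str]) -> bool:
    """True only if at least one address is given and every one is globally routable."""
    parsed = [ipaddress.ip_address(a) for a in addresses]
    return bool(parsed) and all(addr.is_global for addr in parsed)


def resolve_host(host: str, port: int = 443) -> List[str]:
    """Resolve a hostname to its IP address strings (performs a DNS lookup)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_safe_target(host: str) -> bool:
    """Resolve first, then validate every resolved address."""
    try:
        return all_addresses_global(resolve_host(host))
    except (socket.gaierror, ValueError):
        return False  # fail closed on resolution or parsing errors
```

Validating the resolved addresses rather than the hostname string is what blocks names that point at loopback, private, or link-local ranges.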
v4.5.52 - Bug Hunt #44 (2026-02-23)
Method: Hybrid (3 Claude sweep agents + Codex gpt-5.3-codex with xhigh reasoning)
Test Coverage: 2923 tests passing, ~94.8% coverage (2 skipped, +15 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Bugs Fixed (4 total: 1 Codex + 2M, 1L)
- BH44-Codex-M1 (Codex) - `crypto/schema_signer.py`: `sign_tools_list()` chain state committed before manifest signing, causing partial state corruption on failure; reordered to sign manifest before committing chain state
- BH44-M1 (Claude) - `actors/calibrator.py`: `_NONNEGATIVE_GATE_PARAMS` incorrectly includes `utility_threshold`, rejecting negative values that `GateEvaluator` accepts; moved to appropriate parameter set
- BH44-M2 (Claude) - `actors/proposer.py`: `create_draft()` does not catch `TypeError` from `_validate_pert_estimates()`, so the raw exception escapes instead of an `ActionResult` failure; added try/except TypeError wrapper
- BH44-L1 (Claude) - `integration/pcw_decide.py`: `_evaluate_drift_policy` returns aliased constraints list when `drift_monitor is None`, inconsistent with the non-None path which returns a new list; changed to return `list(constraints)` defensive copy
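The sign-before-commit ordering from BH44-Codex-M1 can be sketched with a toy chain. `ToolChain`, `sign_all`, and the signer callables are hypothetical stand-ins; the point is that state is updated only after every signing call has succeeded.

```python
from typing import Callable, Dict, List


class ToolChain:
    """Tracks the last signed digest per tool (toy stand-in for chain state)."""

    def __init__(self) -> None:
        self.prev_digests: Dict[str, str] = {}

    def sign_all(self, tools: Dict[str, str], sign: Callable[[str], str]) -> List[str]:
        pending: Dict[str, str] = {}
        signatures: List[str] = []
        for name, digest in tools.items():
            signatures.append(sign(digest))  # may raise; nothing committed yet
            pending[name] = digest
        self.prev_digests.update(pending)    # commit only after all signatures succeed
        return signatures


def failing_signer(digest: str) -> str:
    raise RuntimeError("signing backend unavailable")


chain = ToolChain()
try:
    chain.sign_all({"tool_a": "digest-1"}, failing_signer)
except RuntimeError:
    pass
state_after_failure = dict(chain.prev_digests)
signatures = chain.sign_all({"tool_a": "digest-1"}, lambda d: "sig:" + d)
```

After the failed call the chain state is still empty, so a retry starts from a consistent point.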
v4.5.51 - Bug Hunt #43 (2026-02-23)
Method: Hybrid (3 Claude sweep agents + Codex gpt-5.3-codex with xhigh reasoning)
Test Coverage: 2908 tests passing, ~94.8% coverage (2 skipped, +31 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Bugs Fixed (11 total: 2 Codex + 5M, 4L)
- BH43-Codex-M1 (Codex) - `actors/analyst.py`: Analyst missing try/except around gate evaluations, so unhandled exceptions crash instead of returning ActionResult failure; wrapped gate evaluation calls in try/except with structured error return
- BH43-Codex-M2 (Codex) - `actors/analyst.py`: Analyst missing else TypeError for non-list `quality_subscores`, silently ignoring invalid type; added explicit `isinstance(list)` guard with `TypeError` for non-list/non-None input
- BH43-M1 (Claude) - `cli.py`: CLI `quality_subscores=null` raises `TypeError`; Lambda/MCP default to `[0.7, 0.7, 0.7]` but CLI crashes on `None`; added null-coalesce parity with other transports
- BH43-M2 (Claude) - `engine/utility.py`: `ComplexityBreakdown` accepts bool fields without validation; `True`/`False` silently coerce to `1`/`0` via bool-is-int; added `isinstance(bool)` guard in `__post_init__`
- BH43-M3 (Claude) - `engine/utility.py`: `UtilityCalculator.calculate()` `value_variance` negative silently floored to `0`, producing overly optimistic LCB; changed to raise `ValueError` for negative variance
- BH43-M4+M5 (Claude) - `telemetry/pipeline.py`: `TelemetryPipeline.ingest()` no defensive copy; bidirectional aliasing between caller dict and internal buffer; caller mutations corrupt queued events; added `copy.copy()` on ingest
- BH43-L1 (Claude) - `cli.py`: CLI metric defaults ignores simplified alias when canonical key is null; `dict.get("key", default)` returns `None` (not default) for explicit JSON null; added explicit null-check before alias fallback
- BH43-L2 (Claude) - `engine/utility.py`: `UtilityCalculator.calculate()` `value_low_conf`/`opex_delta` NaN/Inf not validated; non-finite values propagate through utility computation; added `math.isfinite()` guards
- BH43-L3 (Claude) - `engine/utility.py`: `UtilityCalculator.calculate()` covariance terms NaN/Inf not validated (same pattern as L2); added `math.isfinite()` guards for `cov_risk_value` and `cov_complexity_quality`
- BH43-L4 (Claude) - `workflows/proposal.py`: `ProposalWorkflow.from_dict()` uses `cls()` instead of `cls.__new__()`; constructor generates spurious `created_at`/`updated_at` timestamps that overwrite deserialized values; switched to `cls.__new__(cls)` pattern (matching ConsensusWorkflow)
Quality Gate - Ultrathink Findings (1)
- QG-T1 - `workflows/proposal.py`: `ProposalWorkflow.from_dict()` missing `evaluation_result=None` for workflows that had no evaluation; deserialized workflows without evaluation results got stale/incorrect default; added explicit `None` assignment
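The `cls.__new__(cls)` deserialization pattern mentioned in BH43-L4 and QG-T1 can be sketched as follows. The `Workflow` class and its fields are hypothetical; the sketch shows how bypassing `__init__` preserves stored timestamps, and why every attribute then has to be assigned explicitly.

```python
import time
from typing import Any, Dict


class Workflow:
    def __init__(self, name: str) -> None:
        self.name = name
        self.created_at = time.time()  # fresh timestamp on normal construction
        self.evaluation_result = None

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Workflow":
        obj = cls.__new__(cls)  # bypass __init__ so no new timestamp is generated
        obj.name = data["name"]
        obj.created_at = data["created_at"]
        # __init__ did not run, so set every attribute, including optional ones.
        obj.evaluation_result = data.get("evaluation_result")
        return obj


restored = Workflow.from_dict({"name": "wf-1", "created_at": 1700000000.0})
```

The trade-off is that nothing set in `__init__` exists on the restored object unless `from_dict` assigns it, which is the gap the explicit `None` assignment closes.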
v4.5.50 - Bug Hunt #42 (2026-02-23)
Method: Hybrid (3 Claude sweep agents + Codex gpt-5.3-codex with xhigh reasoning)
Test Coverage: 2877 tests passing, 94.81% coverage (2 skipped, +29 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Bugs Fixed (13 total: 3 Codex + 6M, 2L + 2 ultrathink)
- BH42-M1 (Claude) - `engine/complexity.py`: `ComplexityDecomposer` mutable default weights dict; `self.weights = weights or self.DEFAULT_WEIGHTS` created shared reference; mutation of instance weights corrupted class-level DEFAULT_WEIGHTS for all future instances; changed to `dict(weights) if weights else dict(self.DEFAULT_WEIGHTS)` (defensive copy)
- BH42-M2 (Claude) - `actors/calibrator.py`: `novelty_k` was in `_NONZERO_GATE_PARAMS` (rejects ==0) but GateEvaluator requires strictly positive; negative values trivially bypassed novelty gate; moved to `_POSITIVE_GATE_PARAMS`
- BH42-M3 (Claude) - `telemetry/prometheus_exporter.py`: NaN/Inf latency permanently corrupts Prometheus histogram `_sum` (irreversible without process restart); added `math.isfinite()` guards to `record_gate_evaluation()`, `record_decision()`, and `MetricsTimer.__exit__()`
- BH42-M4 (Claude) - `telemetry/prometheus_exporter.py`: NaN KL divergence set via `set_kl_divergence()` disables drift alerting permanently; added `math.isfinite()` guard with warning log
- BH42-M5 (Claude) - `telemetry/emitter.py`: root `emit()` method used `correlation_id or self.default_correlation_id`; empty string `""` (valid sentinel) replaced by default; changed to `is not None` pattern for both `correlation_id` and `parent_event_id`
- BH42-M6 (Claude) - `lambda_handler.py`: `shadow_mode = body.get("shadow_mode") is True` silently ignored non-bool values (int 1, string "true"); added explicit `isinstance(bool)` validation matching other bool fields (`requires_human_approval`, `reversible`)
- BH42-L1 (Claude) - `integration/pcw_decide.py`: `risk_posterior = risk_gate.posterior_probability or 0.0`; zero posterior probability treated as falsy and replaced; changed to `if is not None else 0.0`
- BH42-L2 (Claude) - `integration/afa_bridge.py`: same `posterior_probability or 0.0` pattern as BH42-L1; fixed to `is not None`
- BH42-Codex-M1 (Codex) - `integration/afa_bridge.py`: `required = context.get("required_authorizations") or []`; explicit falsy values (empty list, 0) coerced to default; changed to explicit `None` check
- BH42-Codex-M2 (Codex) - `workflows/consensus.py`: `ConsensusConfig.allow_abstain` accepted non-bool via coercion (`1`, `"yes"`); added `isinstance(self.allow_abstain, bool)` guard in `__post_init__`
- BH42-Codex-L1 (Codex) - `workflows/persistence/repository.py`: concurrent `save_checkpoint()` calls caused `IntegrityError` on checkpoint_number uniqueness collision; added bounded retry loop (3 attempts) with `_is_checkpoint_number_conflict()` classifier
Quality Gate - Ultrathink Findings (2)
- QG-T1 - `aegis_governance/mcp_server.py`: MCP `_tool_evaluate_proposal()` used `shadow_mode = arguments.get("shadow_mode") is True`, a transport parity violation with Lambda; added `isinstance(bool)` validation returning error dict
- QG-T2 - `actors/analyst.py`: 6 instances of `gate_result.confidence or 0.0`; `confidence=0.0` treated as falsy; replaced all with `if gate_result.confidence is not None else 0.0`
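Several fixes in this release replace a truthiness-based fallback with an explicit `None` check, or replace an `is True` test with strict type validation. The sketch below contrasts the patterns; the function names and the default constant are hypothetical.

```python
from typing import Optional

DEFAULT_CORRELATION_ID = "default-correlation"  # illustrative default


def pick_with_or(correlation_id: Optional[str]) -> str:
    # Buggy pattern: "" and other falsy values fall through to the default.
    return correlation_id or DEFAULT_CORRELATION_ID


def pick_with_none_check(correlation_id: Optional[str]) -> str:
    # Only a missing value (None) selects the default.
    return correlation_id if correlation_id is not None else DEFAULT_CORRELATION_ID


def parse_shadow_mode(body: dict) -> bool:
    # Strict validation: 1 or "true" raise instead of being silently read as False.
    value = body.get("shadow_mode", False)
    if not isinstance(value, bool):
        raise TypeError("shadow_mode must be a boolean")
    return value
```

For example, `pick_with_or("")` returns the default while `pick_with_none_check("")` keeps the empty-string sentinel.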
v4.5.49 - Bug Hunt #41 (2026-02-22)
Method: Hybrid (3 Claude sweep agents + Codex gpt-5.3-codex with xhigh reasoning)
Test Coverage: 2848 tests passing, 94.82% coverage (2 skipped, +33 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Bugs Fixed (7 total: 1 Codex + 4M, 2L)
- BH41-M1 (Claude) - `actors/analyst.py`: `_run_quality_analysis()` included `None` elements in `quality_subscores` without `saw_non_null` guard; `[None, 0.8]` silently dropped None but `[None, None]` defaulted inconsistently; added `saw_non_null` filter matching `afa_bridge` pattern so all-None list defaults and partially-valid list filters correctly
- BH41-M2 (Claude) - `engine/validation.py`: `validate_range()` default `check_nan=False`, so NaN inputs silently passed range checks; changed default to `True`; all production callers already passed `check_nan=True` explicitly, so no behavior change for existing call sites
- BH41-M3 (Claude) - `crypto/schema_signer.py`: `create_tool_statement()` mutated `_prev_digests` on each call; partial `sign_statement()` failure left chain in inconsistent state; `sign_tools_list()` now collects `pending_digests` first, commits atomically only after all `sign_statement()` calls succeed
- BH41-M4 (Claude) - `workflows/consensus.py`: `get_required_missing()` included `DEFER` voters in `voted_roles` accumulation; a single `DeferredVoter` trivially satisfied role coverage for any role; `DEFER` now excluded consistent with `votes_cast` exclusion
- BH41-L1 (Claude) - `actors/calibrator.py`: `list_proposals()` iterated live proposal dict while serializing enum `.value`; concurrent `propose()` mutations caused `AttributeError` on partially-mutated proposals; snapshot status enum values under `self._lock`
- BH41-L2 (Claude) - `telemetry/emitter.py`: `emit_mcp_invocation()` used `correlation_id or self.default_correlation_id`; empty-string `""` (valid sentinel) silently replaced with default; changed to `Optional[str] = None` with explicit `is not None` guard
- BH41-Codex (Codex) - `engine/complexity.py`: `ComplexityDecomposer.decompose()` accepted `bool` as `complexity_floor`; `True`/`False` passed `validate_range` numeric check via bool-is-int coercion; added `isinstance(complexity_floor, bool)` guard with `TypeError` before `validate_range`
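The bool-is-int and NaN pitfalls above can be reproduced with a small range validator. This `validate_range` is an illustrative sketch with an assumed signature, not the one in `engine/validation.py`.

```python
import math


def validate_range(value, low, high, name="value", check_nan=True):
    """Reject bools, non-numbers, NaN (by default), and out-of-range values."""
    # bool is a subclass of int, so it must be rejected before the numeric check.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"{name} must be a number, got {type(value).__name__}")
    if check_nan and math.isnan(value):
        raise ValueError(f"{name} must not be NaN")
    # Comparisons with NaN are always False, so this check alone lets NaN through.
    if value < low or value > high:
        raise ValueError(f"{name} must be in [{low}, {high}]")
    return float(value)


def range_check_rejects(value) -> bool:
    try:
        validate_range(value, 0.0, 1.0)
    except (TypeError, ValueError):
        return True
    return False


# With the NaN check disabled, NaN passes the range comparison unchanged.
leaked = validate_range(float("nan"), 0.0, 1.0, check_nan=False)
```

Defaulting `check_nan` to `True` makes the safe behavior the one callers get when they forget the argument.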
Quality Gate Fixes (Phase 1 - Verify)
- ruff B017 - `tests/test_bug_hunt_41.py` lines 189, 205: `pytest.raises(Exception)` narrowed to `pytest.raises(ValueError)` (invalid Ed25519 key raises ValueError)
- black - Auto-formatted 5 files: `test_bug_hunt_41.py`, `calibrator.py`, `emitter.py`, `test_schema_signer.py`, `test_engine.py`
- mypy attr-defined - `calibrator.py:847`: added `proposals_snapshot: list[dict[str, Any]] = []` type annotation to fix `"object" has no attribute "value"` inference error
v4.5.48 - Bug Hunt #40 (2026-02-22)
Method: Hybrid (3 Claude sweep agents + Codex gpt-5.3-codex with xhigh reasoning)
Test Coverage: 2815 tests passing, 94.78% coverage (2 skipped, +40 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Bugs Fixed (9 total: 4M, 5L)
- BH40-M1 (Codex+Claude) - `integration/afa_bridge.py`: `_validate_quality_subscores()` failed to distinguish all-null list `[None,None]` from explicit empty list `[]`; both defaulted to `[0.7,0.7,0.7]`, bypassing quality gate fail-closed behavior; added `saw_non_null` tracker so `[]` returns `[]` (fail-closed) while all-null correctly defaults
- BH40-M2 (Both) - `telemetry/emitter.py`: `BatchHTTPSink.stop()` read `self._thread` outside the lock, racing with concurrent `start()`; extract ref + clear under lock, join outside lock (same pattern as BH38-M4, BH39-M1)
- BH40-M3 (Claude) - `engine/validation.py`: `validate_normalized()` missing `isinstance(value, bool)` guard; `True`/`False` passed `[0,1]` range check via bool-is-int coercion
- BH40-M4 (Claude) - `config.py`: `_parse_mcp_rate_limit` checked `isinstance(value, float)` on original value; string `"3.5"` bypassed check and was silently truncated to `3` via `int()`; fix: convert first, then check `fv.is_integer()`
- BH40-L1 (Claude) - `engine/gates.py`: `GateEvaluator` accepted negative values for `novelty_N0`, `novelty_threshold`, `complexity_floor`, `quality_min_score`; negative thresholds trivially disable governance gates; added non-negativity guard
- BH40-L2 (Claude) - `config.py`: `_parse_kl_drift_dict` `window_days`: same string-fractional truncation pattern as BH40-M4; separated `ValueError` (non-finite) from `TypeError` (fractional) to preserve existing test semantics
- BH40-L3 (Claude) - `aegis_governance/mcp_server.py`: stdio size guard used `len(line)` (Unicode code-point count) instead of `len(line.encode("utf-8"))` (byte count); multi-byte characters allowed oversized requests through
- BH40-L4 (Claude) - `integration/afa_bridge.py`: `get_decision_history()` used truthy `if agent_id:`; empty string `""` was treated as no-filter, bypassing agent_id filtering; fixed to `if agent_id is not None:`
- BH40-L5 (Claude) - `telemetry/encryption.py`: `DEKRotator.get_decryptor()` and `list_versions()` read `self._deks` without `self._lock`; only write paths (BH39-M2) had been protected; added lock acquisition for all read paths
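Two of the fixes above lend themselves to short sketches: converting before checking for fractional values (BH40-M4, BH40-L2) and measuring request size in bytes rather than code points (BH40-L3). The helper names are hypothetical, and the choice of `TypeError` versus `ValueError` only loosely mirrors the changelog.

```python
import math


def parse_strict_int(value, name="value"):
    """Convert int/float/str to int, rejecting bools, non-finite and fractional input."""
    if isinstance(value, bool):
        raise TypeError(f"{name} must not be a bool")
    try:
        as_float = float(value)  # convert first, so "3.5" is checked like 3.5
    except (TypeError, ValueError):
        raise TypeError(f"{name} must be numeric, got {value!r}") from None
    if not math.isfinite(as_float):
        raise ValueError(f"{name} must be finite")
    if not as_float.is_integer():
        raise TypeError(f"{name} must be a whole number, got {value!r}")
    return int(as_float)


def int_parse_rejects(value) -> bool:
    try:
        parse_strict_int(value)
    except (TypeError, ValueError):
        return True
    return False


def exceeds_byte_limit(line: str, limit: int) -> bool:
    # Count encoded bytes; len(line) would count code points instead.
    return len(line.encode("utf-8")) > limit
```

As a usage note, a three-character string of accented letters is six bytes in UTF-8, so a guard based on `len(line)` would undercount it by half.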
v4.5.47 - Bug Hunt #39 (2026-02-21)
Method: Hybrid (3 Claude sweep agents + Codex gpt-5.3-codex with xhigh reasoning)
Test Coverage: 2775 tests passing, 94.77% coverage (2 skipped, +54 new)
Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Bugs Fixed (13 total: 1H, 6M, 6L)
- BH39-H1 (High) - `workflows/persistence/models.py`: `verify_chain_link()` chain root forgery; root link was not self-consistency checked, allowing crafted `from_link`/`to_hash` chains to pass verification; added root hash self-consistency guard
- BH39-M1 (Medium) - `telemetry/pipeline.py`: `TelemetryPipeline.stop()` held `_lock` during `thread.join()`, blocking concurrent `start()` for up to N seconds; lock released before join
- BH39-M2 (Medium) - `telemetry/encryption.py`: `DEKRotator.generate_dek_version()` TOCTOU race; version ID computed, then written without holding lock; version now pinned under lock before write
- BH39-M3 (Medium) - `workflows/persistence/key_store.py`: KeyStore `audit_lock` held during `kek.decrypt()` (blocking I/O); lock released before decrypt call
- BH39-M4 (Medium) - `engine/gates.py`: `GateEvaluator` accepts `float('inf')` for trigger factors, silently disabling the Bayesian gate (posterior probability of exceeding infinity is always 0, gate permanently passes); added `math.isinf()` guard
- BH39-M5 (Medium) - `engine/utility.py`: `UtilityResult` accepts NaN for `variance`/`raw`/`lcb`; NaN propagates to silent gate failure (NaN > threshold is always False); added `__post_init__` NaN rejection; `-inf` allowed for `raw`/`lcb` (signals extremely low utility)
- BH39-M6 (Medium) - `config.py`: `_parse_kl_drift_dict` silently truncates float `window_days` (30.7 → 30); reject fractional floats before `int()` coercion
- BH39-L1 (Low) - `workflows/consensus.py`: `ConsensusWorkflow.from_dict()` used `cls()` constructor mid-deserialization, leaving inconsistent intermediate state if constructor validates; switched to `cls.__new__(cls)` pattern
- BH39-L2 (Low) - `engine/gates.py`: `GateEvaluator` `novelty_k=0.0` makes logistic gate score-insensitive (denominator always ≥1 regardless of score); added `> 0` guard with clear error message
- BH39-L3 (Low) - `aegis_governance/mcp_server.py`: JSON-RPC notification with non-string method received error response, violating JSON-RPC 2.0 §4.1; `is_notification` now computed before method type check so malformed notifications are silently dropped
- BH39-L4 (Low) - `crypto/bip322_provider.py`: `encode_simple()` crashes with cryptic `OverflowError` for signatures ≥256 bytes (`bytes([256])` overflow); added upfront `len(signature) != SIGNATURE_SIZE` check with clear `ValueError`
- BH39-L5 (Low) - `config.py`: `_parse_mcp_rate_limit` silently truncates float `mcp_rate_limit` (e.g., `60.9 → 60`); reject fractional floats
- BH39-Codex-2 (Low) - `telemetry/emitter.py`: `memory_sink` accepts `maxlen=0` for list sinks; `del events[0]` on empty list; `TelemetryEmitter.emit()` swallows sink exceptions → silent telemetry loss; added `maxlen ≥ 1` guard for list-backed sinks
Test Files Added
- `tests/test_bug_hunt_39.py`: 21 tests (BH39-H1 chain root, BH39-M1/M2/M3 concurrency/TOCTOU, BH39-L1 ConsensusWorkflow)
- `tests/test_bug_hunt_39b.py`: 31 tests (BH39-M4/M5/M6/L2/L3/L4/L5)
- `tests/telemetry/test_emitter.py`: +1 test (BH39-Codex-2 maxlen=0)
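The "release the lock before join" fix recurring in this release (BH39-M1, and referenced by BH38-M4 and BH40-M2) can be sketched with a generic background worker. The `Worker` class is hypothetical and far simpler than the telemetry pipeline.

```python
import threading
from typing import Optional


class Worker:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stop_event = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        with self._lock:
            if self._thread is not None:
                return  # already running
            self._stop_event.clear()
            self._thread = threading.Thread(target=self._run, daemon=True)
            self._thread.start()

    def _run(self) -> None:
        self._stop_event.wait()  # placeholder for real work

    def stop(self, timeout: float = 5.0) -> None:
        with self._lock:
            thread, self._thread = self._thread, None  # take the ref under the lock
            self._stop_event.set()
        if thread is not None:
            thread.join(timeout)  # join outside the lock so start() is not blocked


worker = Worker()
worker.start()
was_running = worker._thread is not None
worker.stop()
```

Holding the lock only long enough to swap the thread reference keeps the critical section short, so a concurrent `start()` waits for a pointer swap rather than a full join timeout.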
v4.5.46 (2026-02-21)
- Bug Hunt #38 (Hybrid): 6 bugs (1H, 4M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 35 regression tests
- BH38-H1: key_store.py uses Python 3.10+ parenthesized async-with syntax, a SyntaxError on Python 3.9 (declared minimum); replaced with comma-separated form + `# fmt: off` guards for black stability
- BH38-M1: UtilityCalculator accepts bool for phi_S, phi_D, gamma, kappa, migration_budget (bool-is-int bypass)
- BH38-M2: GateEvaluator accepts bool for risk_trigger_factor, profit_trigger_factor, and all threshold params (bool-is-int bypass)
- BH38-M3: CalibrationProposal accepts bool for current_value/proposed_value; _validate_gate_param accepts bool without rejection
- BH38-M4: MetricsServer.stop() held _lock during thread.join(), blocking concurrent start() for up to 5 seconds; fixed by extracting refs under lock, releasing, then shutdown+join outside
- BH38-L1 (Codex): BatchHTTPSink accepts float/non-int for integer params; upgraded bool-only check to full `not isinstance(int) or isinstance(bool)` pattern
- QG-UT1: GateEvaluator(trigger_confidence_prob=True) silently accepted via validate_range inclusive upper bound (True==1.0); added explicit bool guard before validate_range in gates.py
- Test Coverage: 2721 tests passing, 94.78% coverage (2 skipped, +36 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.45 (2026-02-20)
- Bug Hunt #37 (Hybrid): 6 bugs (3M, 3L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 26 regression tests
- BH37-M1: BayesianPosterior compute_posterior/compute_full NaN/Inf validation — added math.isfinite() guards + fail-closed GateResult for non-finite deltas
- BH37-M2: emergency_halt() audit trail completeness — terminal overrides now tracked in already_terminal_overrides
- BH37-M3: Calibrator novelty_N0 range constraint — added to _NONNEGATIVE_GATE_PARAMS
- BH37-L1 (Codex): PipelineConfig accepts float for integer sizing fields
- BH37-L2: ThreePointEstimate accepts bool values — added isinstance guard
- BH37-L3: DriftMonitor window_days missing integer type check
- Test Coverage: 2685 tests passing, 94.76% coverage (2 skipped, +26 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.44 (2026-02-20)
Bug Hunt #36 (Hybrid): 6 bugs (4M, 2L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 17 regression tests
- BH36-M1 (Codex): `lambda_handler.py` - `or` pattern for `estimated_impact`/`phase` bypasses type check for falsy non-string values (False, 0)
- BH36-M2: `repository.py` - `mark_completed(final_state="aborted")` injects non-enum state into serialized `request.state`, breaking `from_dict()` deserialization
- BH36-M3: `cli.py` - `or` pattern for `estimated_impact`; same class as M1 (transport parity)
- BH36-M4: `mcp_server.py` - `or` pattern for `estimated_impact`; same class as M1 (transport parity)
- BH36-L1: `complexity.py` - `compute_complexity_tax` missing bool guard for `phi_S`/`phi_D`
- BH36-L2: `lambda_handler.py`/`cli.py` - `proposal_summary` `or` pattern accepts falsy non-strings
QG Ultrathink: 2 findings (2L); Lambda + MCP `action_description or ""` patterns replaced with type-safe defaults; no new tests (non-governance documentation field)
Test metrics: 2659 tests, 94.74% coverage (2 skipped)
v4.5.43 (2026-02-20)
Bug Hunt #35 (Hybrid): 6 bugs (4M, 2L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 22 regression tests
- BH35-M1 (Codex): `override.py` - `check_and_mark_expired()` downgrades APPROVED/REJECTED terminal states to EXPIRED after wall-clock expiry
- BH35-M2: `rbac.py` - `NO_UNILATERAL_OVERRIDE` constraint passes for NaN `signer_count` (IEEE 754: `NaN < 2` is `False`)
- BH35-M3: `pipeline.py` - `PipelineConfig.flush_interval_seconds` has no validation (zero/NaN/bool degrades flushing)
- BH35-M4: `emitter.py` - `BatchHTTPSink.flush_interval_seconds` has no validation (same pattern as M3)
- BH35-L1: `pipeline.py` - `PipelineConfig.buffer_size`/`flush_threshold` accept bool (`isinstance(True, int)` is True)
- BH35-L2: `encryption.py` - `DEKCache.ttl_seconds` accepts zero/negative/bool without validation
QG Ultrathink (BH35 session): 4 findings (4L); BatchHTTPSink/HTTPEventSink missing bool guards + timeout/retry_delay validation; +19 regression tests
Test metrics: 2642 tests, 94.79% coverage (0 skipped)
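The NaN behavior behind BH35-M2 can be reproduced directly: a check written as "reject if the count is below the minimum" never fires for NaN. Both function names below are hypothetical, and the strict variant is only one way to fail closed.

```python
import math


def naive_has_enough_signers(signer_count: float, minimum: int = 2) -> bool:
    # Rejects only when the comparison is True; NaN < 2 is False, so NaN passes.
    if signer_count < minimum:
        return False
    return True


def strict_has_enough_signers(signer_count, minimum: int = 2) -> bool:
    # Fail closed: bools, non-numbers, and non-finite values never satisfy the constraint.
    if isinstance(signer_count, bool) or not isinstance(signer_count, (int, float)):
        return False
    if not math.isfinite(signer_count):
        return False
    return signer_count >= minimum
```

Writing the final comparison as a positive condition (`>=`) also helps, because any comparison involving NaN evaluates to False and therefore denies by default.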
v4.5.42 (2026-02-20)
Bug Hunt #34 - Hybrid Architecture
- Bug Hunt #34 (Hybrid): 5 bugs (4M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 14 regression tests
- BH34-M1: DriftMonitor(num_bins=float) accepted, crashes later (Codex finding)
- BH34-M2: CLI cmd_evaluate missing TypeError in config catch
- BH34-M3: DualSignatureValidator.expiration_hours missing upper bound
- BH34-M4: TelemetryPipeline._worker_loop inconsistent state on raise error
- BH34-L1: AegisConfig.from_dict() telemetry_url type coercion gap
- Test Coverage: 2601 tests passing, 94.79% coverage (0 skipped, +14 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.41 (2026-02-20)
Bug Hunt #33 - Hybrid Architecture
- Bug Hunt #33 (Hybrid): 5 bugs (5M) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 15 regression tests
- BH33-M1: config._parse_flat_numeric silently accepts non-numeric types (list, dict); `isinstance(x, (int, float))` guard missing; list/dict crash `float()` in `_DIRECT` flat numeric parsing
- BH33-M2: config._from_raw_dict silently accepts non-numeric types for DIRECT params; YAML schema validates but direct dict construction bypasses; `epsilon_R`/`beta`/etc accept list/dict
- BH33-M3: DriftMonitor.evaluate() passes unfiltered current_window to _to_histogram(); NaN/Inf values corrupt histogram bins; `math.isfinite()` filter missing (upstream filter only in `set_baseline`)
- BH33-M4: `OverrideWorkflow.__init__` doesn't defensive-copy failed_gates list; external mutation of input list modifies workflow state; use `list(failed_gates)`
- BH33-M5: mark_completed() doesn't sync state_data with final_state (Codex finding); `state_data["status"]` remains old value after `final_state` update; audit trail inconsistency
- Test Coverage: 2587 tests passing, 94.80% coverage (0 skipped, +15 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.40 (2026-02-20)
Bug Hunt #32 - Hybrid Architecture
- Bug Hunt #32 (Hybrid): 3 bugs (2M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 20 regression tests
- BH32-M1: DriftMonitor constructor accepts negative/Inf thresholds; `update_thresholds()` validates `isfinite` + non-negative but `__init__()` only calls `validate_threshold_ordering()`; negative tau causes false CRITICAL, Inf tau disables detection
- BH32-M2: Calibrator `_validate_gate_param` allows negative threshold params (`complexity_floor`, `quality_min_score`, `novelty_threshold`, `utility_threshold`); negative values make gates trivially pass; `complexity_floor` is non-overridable (governance bypass via calibration)
- BH32-L1: KLDriftConfig `__post_init__` missing `window_days` validation; `_parse_kl_drift_dict` validates `>= 1` for YAML loading but direct construction accepts 0 or negative
- Test Coverage: 2572 tests passing, 94.80% coverage (0 skipped, +20 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.39 (2026-02-20)
Bug Hunt #31 - Hybrid Architecture + QG73
- Bug Hunt #31 (Hybrid): 4 bugs (1M, 3L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 15 regression tests
- BH31-M1: MCP `emit_mcp_invocation` caller_id non-string guard; null/non-string passed to structured audit log
- BH31-L1: Lambda `_coalesce_thresholds` dict.get() null gotcha; JSON null bypasses default value
- BH31-L2: ConsensusConfig `timeout_hours` fractional minimum; accepts Inf/0.0, should require > 0
- BH31-L3: DualSignatureValidator `expiration_hours` fractional minimum; accepts Inf/0.0, should require > 0
- Quality Gate QG73 Ultrathink: 2 findings (1M, 1L), 7 regression tests
- QG73-L1: CLI `agent_id`/`session_id` isinstance guard; transport parity with MCP/Lambda non-string rejection
- QG73-M1: AFABridge `default_timeout_hours` fractional minimum; accepts Inf/0.0, should require > 0
- Quality Gate QG74 Ultrathink: 1 cosmetic fix
- QG74-L1: MCP `tool_name` dict.get() null gotcha; `params.get("name", "")` returns None for null key
- Test Coverage: 2552 tests passing, 94.80% coverage (0 skipped, +22 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.38 (2026-02-19)¶
Bug Hunt #30 — Hybrid Architecture + QG72¶
- Bug Hunt #30 (Hybrid): 5 bugs (2M, 3L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 8 regression tests
- BH30-M1: Lambda `proposal_summary` `dict.get()` null gotcha — JSON null bypasses default (transport parity)
- BH30-M2: AFABridge `get_decision_history()` accepts float `limit` — crashes during list slicing (Codex finding)
- BH30-M3: CLI `risk_score` null → `float(None)` TypeError — extracted `_coerce_risk_score()` helper
- BH30-L1: Lambda `action_description` `dict.get()` null gotcha — same pattern as BH30-M1
- BH30-L2: Lambda `estimated_impact` `dict.get()` null gotcha — None passes to isinstance check
- BH30-L3: TelemetryPipeline config mutation — shared PipelineConfig mutated by pii_encryptor setup; defensive `copy.copy()`
- Quality Gate QG72 Ultrathink: 4 findings (2M, 2L), 4 regression tests
- QG72-M1: CLI `proposal_summary` `dict.get()` null gotcha — transport parity with Lambda/MCP
- QG72-M2: MCP `action_description` `dict.get()` null gotcha — None passed to risk check
- QG72-L1: CLI `estimated_impact` `dict.get()` null gotcha — None bypasses string type check
- QG72-L2: Lambda `phase` `dict.get()` null gotcha — None passed to `.lower()` TypeError
- Test Coverage: 2530 tests passing, 94.76% coverage (0 skipped, +12 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
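The `dict.get()` null gotcha that recurs in these entries can be reproduced in a few lines. The following is a minimal sketch, not the project's code: `coalesce_str` is a hypothetical helper name, and the payload is illustrative.

```python
import json


def coalesce_str(payload: dict, key: str, default: str = "") -> str:
    """Hypothetical helper: treat a missing key, explicit null, and non-strings alike."""
    value = payload.get(key)
    return value if isinstance(value, str) else default


body = json.loads('{"proposal_summary": null}')

# The key is present, so the default is never consulted and None leaks through.
leaked = body.get("proposal_summary", "")

# The guarded lookup falls back to the default for an explicit JSON null.
guarded = coalesce_str(body, "proposal_summary")
```

Checking the value rather than key presence is what keeps the behavior identical whether the caller omits the field or sends `null`.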
v4.5.37 (2026-02-18)¶
Bug Hunt #29 — Hybrid Architecture + QG71¶
- Bug Hunt #29 (Hybrid): 8 bugs (3M, 5L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 21 regression tests
- BH29-M1: `estimated_impact` case-sensitive comparison — `"HIGH"` or `"Critical"` bypass human oversight gates silently
- BH29-M2: Executor `start_execution` reads `progress.status` outside lock after publishing — TOCTOU race
- BH29-M3: Calibrator `novelty_k` missing from `_NONZERO_GATE_PARAMS` — zero value weakens governance without validation error
- BH29-L1: Config `mcp_rate_limit` missing `math.isfinite()` guard — NaN/Inf pass through to int() silently
- BH29-L2: MCP phase string not lowercased before `_PHASE_MAP` lookup — transport parity gap with CLI/Lambda
- BH29-L3: Lambda/CLI non-string `estimated_impact` silently bypasses impact classification — transport parity fix
- BH29-L4: Pipeline `stop()` drain loop aborts on first storage error — remaining queued events lost
- BH29-L5: `PipelineConfig.flush_threshold` accepts 0 or negative — triggers flush on every ingest
- Quality Gate QG71 Ultrathink: 3 findings (3L), 5 regression tests
- QG71-L1: MCP `estimated_impact` `str()` cast silently accepts null/non-string — replaced with `or` guard
- QG71-L2: Pipeline drain `except RuntimeError` too narrow — enricher exceptions crash `stop()`; broadened to `except Exception`
- QG71-L3: MCP `proposal_summary` null → `"None"` string via `dict.get()` gotcha — added `or ""` guard
- Test Coverage: 2518 tests passing, 94.76% coverage (0 skipped, +26 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.36 (2026-02-18)¶
Bug Hunt #28 — Hybrid Architecture + QG70¶
- Bug Hunt #28 (Hybrid): 5 bugs (3M, 2L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 17 regression tests
- BH28-M1: ConsensusWorkflow `QUORUM_MET` state never reverts to `PENDING` when quorum lost via vote overwrite to DEFER
- BH28-M2: Governance `initiate_override` blocked by expired stale override — evict expired before rejecting
- BH28-M3: CLI `risk_score` alias overwrites `risk` simplified name — priority chain: `risk_proposed` > `risk` > `risk_score`
- BH28-L1: `DriftMonitor.evaluate()` `window_size` includes non-finite values that don't contribute to KL divergence
- BH28-L2: Config `quality_no_zero_subscore` string `"false"` treated as truthy — extracted `_coerce_bool()` helper
- Quality Gate QG70 Ultrathink: 3 findings (3L), 5 regression tests
- QG70-L1: `_from_raw_dict` skips `_coerce_bool()` for bool-allowed YAML fields
- QG70-L2: `_coerce_bool()` accepts NaN/Inf floats
- QG70-L3: `set_baseline()` passes raw data (including Inf) to histogram
- Test Coverage: 2492 tests passing, 94.73% coverage (0 skipped, +22 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
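BH28-L2 and QG70-L2 both concern boolean coercion: a plain `bool("false")` is `True`, and NaN/Inf are truthy floats. A strict coercion might look like the sketch below; the function name `coerce_bool` and the accepted spellings are assumptions for illustration, not the actual `_coerce_bool()` implementation.

```python
import math

# Assumed spellings; the real helper may accept a different set.
_TRUE_STRINGS = frozenset({"true", "1", "yes", "on"})
_FALSE_STRINGS = frozenset({"false", "0", "no", "off"})


def coerce_bool(value: object) -> bool:
    """Hypothetical strict coercion that rejects ambiguous inputs."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text in _TRUE_STRINGS:
            return True
        if text in _FALSE_STRINGS:
            return False
        raise ValueError(f"not a boolean string: {value!r}")
    if isinstance(value, int):
        return value != 0
    if isinstance(value, float) and math.isfinite(value):
        return value != 0.0
    # NaN, Inf, None, and containers all land here.
    raise ValueError(f"cannot coerce {value!r} to bool")


naive = bool("false")          # non-empty string, so truthy
strict = coerce_bool("false")  # parsed by content instead
```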
v4.5.35 (2026-02-17)¶
Quality Gate QG69 Ultrathink¶
- Quality Gate QG69 Ultrathink: 1 finding (1M), 7 regression tests
- QG69-M1: MCP + CLI `drift_baseline_data` missing `math.isfinite()` — transport parity violation with Lambda (BH27-L4 was fixed there but not in MCP/CLI)
- Test Coverage: 2470 tests passing, 94.73% coverage (0 skipped, +7 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.34 (2026-02-17)¶
Bug Hunt #27 — Hybrid Architecture¶
- Bug Hunt #27 (Hybrid): 4 bugs (3M, 1L) found by 3 Claude sweep agents + Codex gpt-5.3-codex; 13 regression tests
- BH27-M1: `resume_or_create()` ID propagation — creation path now inspects constructor params and injects workflow_id
- BH27-M2: `_from_raw_dict` string-to-float coercion — YAML quoted numerics passed through as str instead of float
- BH27-M3: Lambda/MCP `agent_id`/`session_id` null bypass — `dict.get()` returns None for explicit JSON null
- BH27-L4: Lambda `drift_baseline_data` missing `math.isfinite()` — NaN/Inf caused 500 instead of 400
- Test Coverage: 2470 tests passing, 94.73% coverage (0 skipped, +13 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.33 (2026-02-17)¶
Scaffold Adoption — Compliance Infrastructure¶
- Scaffold Adoption: Integrated Engineering Standards ai_scaffold_package v2.1.1 (50 new files, zero conflicts)
- New directories: `ai/` (governance artifacts), `docs/compliance/` (operational runbooks), `schemas/` (log schema), `tools/ci/` (9 validators), `policy/` (OPA), `benchmarks/`
- AI Governance: 8 artifacts pre-filled with AEGIS content (system-register, risk-register mapping to OWASP Agentic Top 10, model/data cards, oversight-plan with kill switch, postmarket-monitoring, AIMS-POLICY, technical_file/)
- Compliance Runbooks: 7 docs customized for AEGIS (system-description with Lambda+ECS architecture, BCP-DRP with RTO/RPO Tier 2, IRP with governance erosion containment, ACCESS-REVIEW with 4 AWS service accounts, VENDOR-RISK complete AWS assessment, CHANGE-MANAGEMENT with frozen parameter process, DSR-PRIVACY with 12 encrypted PII fields)
- Automation: compliance-evidence-scheduler.yml (monthly/quarterly/annual GitHub issue creation), 15 labels, PR + 7 issue templates
- CI/CD: 4 new workflows (scaffold-gates, codeql, compliance-nightly, compliance-evidence-scheduler), Makefile, .pre-commit-config.yaml (ELITE tier)
- Type Safety: Added type annotations to 9 tools/ci/*.py validators (mypy strict mode compliance)
- pyproject.toml: Added `[tool.standards]` (tier: ELITE, v2.1.1), tools/ci per-file ignores for scaffold style preferences
- Test Coverage: 2448 tests passing, 94.83% coverage (0 skipped, no new tests — operational changes only)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest, AI RMF validators, AI Act lint, scaffold adoption validator)
- Placeholder Elimination: 274 → 0 in scaffold files (100% customization: compliance docs, AI artifacts, GitHub templates all AEGIS-specific)
v4.5.32 (2026-02-16)¶
Bug Hunt #26 (Hybrid)¶
- Method: 3 Claude sweep agents + Codex gpt-5.3-codex
- Findings: 4 bugs (3M, 1L), 18 regression tests
- BH26-M1: `validation.py` — `validate_positive()` accepts `bool` (`True` → 1) due to Python `bool` ⊂ `int` (Codex finding)
- BH26-M2: `bayesian.py` — `update_prior` variance overflow: `sum((x-mean)**2)` → `inf` silently bypasses guard
- BH26-M3: `rbac.py` — `_check_bool_constraint` None value fail-open for `pass_when_true=False` constraints (security)
- BH26-L1: `complexity.py` — `compute_complexity_tax` NaN/Inf propagation via delta dict values
- 0 deferred bugs
- Test Coverage: 2463 tests passing, 94.83% coverage (0 skipped, +18 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.31 (2026-02-16)¶
Bug Hunt #25 (Hybrid)¶
- Method: 3 Claude sweep agents + Codex gpt-5.3-codex
- Findings: 6 bugs (3M, 3L), 18 regression tests
- BH25-M1: `analyst.py` — `_evaluate_utility_gate` UtilityComponents fields crash on explicit `None` (JSON null)
- BH25-M2: `cli.py` — `_extract_risk_alias_and_subscores` key-presence vs value-presence transport parity gap with Lambda/MCP
- BH25-M3: `drift.py` — `_to_histogram` constant-range +-1.0 adjustment no-op for values > 2^53 (IEEE 754 precision) → ZeroDivisionError
- BH25-L1: `analyst.py` — `risk_delta`/`profit_delta=None` crashes with TypeError (Codex finding)
- BH25-L2: `bayesian.py` — `update_prior` returns `(inf, inf)` when `sum()` overflows (no OverflowError from `sum()`)
- BH25-L3: `config.py` — `from_dict` accepts string `"nan"`/`"inf"` bypassing `isinstance(val, (int, float))` NaN check
- PLR0912 fix: extracted `_parse_flat_numeric()` static helper from `from_dict()`
- 0 deferred bugs
- Test Coverage: 2430 tests passing, 94.81% coverage (0 skipped, +18 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
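Two of the floating-point behaviors behind BH25-M3 and BH25-L2 are easy to demonstrate in isolation. This is an illustrative sketch with made-up values; `checked_sum` is a hypothetical guard, not a function from `bayesian.py`.

```python
import math

# Each square is about 1.44e308 and still finite, but their sum exceeds the
# largest double. Float addition saturates to inf instead of raising.
deviations = [1.2e154, -1.2e154]
squares = [d * d for d in deviations]
total = sum(squares)

# Above 2**53 the spacing between adjacent doubles is larger than 1.0, so
# widening a constant range by +-1.0 can leave both bounds unchanged.
big = 2.0 ** 54
low, high = big - 1.0, big + 1.0
width = high - low  # 0.0, so a later division by the width would fail


def checked_sum(values):
    """Hypothetical guard: refuse to return a non-finite accumulation."""
    result = sum(values)
    if not math.isfinite(result):
        raise ValueError("sum is not finite")
    return result
```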
v4.5.30 (2026-02-16)¶
Bug Hunt #24 (Hybrid)¶
- Method: 3 Claude sweep agents + Codex gpt-5.3-codex (1 iteration, 1 novel finding)
- Findings: 10 bugs (4M, 6L), 26 regression tests
- BH24-M1: `mcp_server.py` — `handle_request()` returns response for JSON-RPC notifications (requests without `"id"`) violating JSON-RPC 2.0 §4.1 (Codex finding)
- BH24-M2: `rbac.py` — `NO_UNILATERAL_OVERRIDE` signer_count=None crashes with TypeError; guards None, bool, non-numeric (fail-closed)
- BH24-M3: `analyst.py` — `_evaluate_quality_gate` missing null guard for `quality_score` (dict.get returns None for explicit null)
- BH24-M4: `analyst.py` — `_evaluate_risk_gate` missing null guard for `risk_baseline`
- BH24-L1: `afa_bridge.py` — `_extract_metrics` missing `isinstance(list)` check for `quality_subscores` (transport parity gap)
- BH24-L2: `afa_bridge.py` — `_extract_metrics` missing `isinstance(UtilityResult)` check for `utility_result`
- BH24-L3: `config.py` — `KLDriftConfig.__post_init__` accepts NaN/Inf `tau_warning`/`tau_critical` via direct constructor
- BH24-L4: `analyst.py` — `_evaluate_novelty_gate` missing null guard for `novelty_score`
- BH24-L5: `analyst.py` — `_evaluate_complexity_gate` missing null guard for `complexity_score`
- BH24-L6: `analyst.py` — `_evaluate_profit_gate` missing null guard for `profit_baseline`
- PLR0912 fix: extracted `_validate_quality_subscores()` static helper
- 0 deferred bugs
- Test Coverage: 2412 tests passing, 94.80% coverage (0 skipped, +26 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
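For BH24-M1, the JSON-RPC 2.0 rule is that a request object without an `id` member is a notification and must not receive a response. A minimal sketch of that rule follows; the dispatcher body and the echo result are placeholders, not the real `handle_request()`.

```python
import json
from typing import Optional


def handle_request(raw: str) -> Optional[str]:
    """Hypothetical dispatcher: returns a JSON response, or None for notifications."""
    message = json.loads(raw)
    # Presence of the member matters, not its value: {"id": null} is still a request.
    is_notification = "id" not in message
    result = {"echo": message.get("method")}  # placeholder for real dispatch
    if is_notification:
        return None  # the server must not reply to a notification
    return json.dumps({"jsonrpc": "2.0", "id": message["id"], "result": result})


reply = handle_request('{"jsonrpc": "2.0", "id": 1, "method": "ping"}')
silent = handle_request('{"jsonrpc": "2.0", "method": "ping"}')
```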
v4.5.29 (2026-02-16)¶
AMTSS Protocol v1 — MCP Tool Schema Signing¶
- ROADMAP Item 20a(e) complete: Cryptographic signing of MCP tool schemas to detect rug pull attacks (CoSAI MCP-T6)
- Design via Claude-GPT dialogical collaboration (GPT 5.2 Pro xhigh reasoning, 3 substantive rounds)
- Research document: `docs/research/004-mcp-schema-signing-design.md` (465 lines)
- New module: `src/crypto/schema_signer.py` — `ToolSchemaSigner`, `SigningKeyPair`, `compute_tool_digest()`
- Protocol: per-tool + manifest dual signing, Ed25519, RFC 8785 canonicalization, `_meta` inline delivery
- MCP integration: `_handle_tools_list()` embeds proofs in `_meta[com.aegis.governance/toolSchemaSigning]`
- MCP integration: `_handle_initialize()` advertises keyset in `capabilities.experimental.toolSchemaSigning`
- Graceful degradation: signer is best-effort (None if `cryptography` not installed)
- Updated `crypto/__init__.py` exports: `ToolSchemaSigner`, `SigningKeyPair`, `compute_tool_digest`, `AMTSS_*` constants, `SCHEMA_SIGNER_AVAILABLE`
- All 5 sub-items of ROADMAP 20a now complete (audit logging, rate limiting, TLS enforcement, CoSAI cross-reference, schema signing)
- Quality-Gate Ultrathink (prior session): 5 findings fixed (3 MEDIUM, 2 LOW) — manifest duplicate-name bypass, `_meta` stripping in digest, statement type validation, `_prev_digests` chain wiring, strict base64url decode; +7 regression tests
- Quality-Gate QG67 Ultrathink: 4 additional findings fixed (2 MEDIUM, 2 LOW) — null sig crash in verify methods (`isinstance` guard), NaN/Inf canonicalization (`allow_nan=False`), manifest revision never incremented, MCP signing error log level (DEBUG→WARNING); +7 regression tests
- Test Coverage: 2386 tests passing, 94.74% coverage (0 skipped, +82 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
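The `allow_nan=False` fix from QG67 relies on standard `json` module behavior, shown below with an illustrative payload. This is only a sketch of the NaN rejection; `sort_keys=True` is a stand-in and does not implement full RFC 8785 canonicalization.

```python
import json
import math

payload = {"name": "tool", "threshold": math.nan}

# By default json.dumps emits the non-standard token NaN, which is not valid JSON
# and would make the signed digest depend on a value other parsers reject.
lenient = json.dumps(payload, sort_keys=True)

# With allow_nan=False the same call raises ValueError instead.
try:
    json.dumps(payload, sort_keys=True, allow_nan=False)
    rejected = False
except ValueError:
    rejected = True
```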
v4.5.28 (2026-02-16)¶
CoSAI MCP-T Cross-Reference¶
- ROADMAP Item 20a(d) complete: Added §11.4.1 CoSAI MCP Threat Model (MCP-T1..T12) cross-reference to CLAUDE.md
- Maps all 12 CoSAI MCP-specific threats to OWASP Agentic risks and AEGIS controls
- Coverage: 9/12 STRONG, 2/12 MODERATE, 1/12 PARTIAL (MCP-T7 transport security)
- Source: `docs/research/003-mcp-security-ecosystem-review.md`
- ROADMAP 20a(d) checkbox marked complete
- No code changes — documentation only
- Test Coverage: 2304 tests passing, 94.63% coverage (unchanged)
v4.5.27 (2026-02-16)¶
Bug Hunt #23 (Hybrid)¶
- 7 bugs found (3M, 4L) by 3 Claude sweep agents; Codex gpt-5.3-codex (1 iteration, 0 novel findings — 1 duplicate of QG66-UT2)
- 29 regression tests added
- Scope: Transport parity (CLI), input validation, thread safety, consensus logic, key store TOCTOU
- BH23-M1: `cli.py` — `_load_drift_monitor` missing bool guard for drift baseline elements (transport parity with Lambda/MCP)
- BH23-M2: `cli.py` — `_extract_risk_alias_and_subscores` treats explicit `[]` as falsy → returns defaults instead of empty list (transport parity)
- BH23-M3: `calibrator.py` — `_evict_old_proposals()` called outside `self._lock` — race condition on `self.proposals` dict
- BH23-L1: `cli.py` — `_extract_risk_alias_and_subscores` missing `isinstance(list)` type check for quality_subscores
- BH23-L2: `engine/bayesian.py` — `BayesianPosterior` accepts NaN/Inf `prior_mean` in constructor and override paths
- BH23-L3: `consensus.py` — `check_timeout()` returns `True` for finalized (APPROVED/REJECTED) workflows past deadline
- BH23-L4: `key_store.py` — `get_private_key`/`get_public_key` missing `_audit_lock` — TOCTOU race with `revoke_key()`
- 0 deferred bugs
- Test Coverage: 2304 tests passing, 94.63% coverage (0 skipped, +29 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.26 (2026-02-15)¶
Quality-Gate QG66 Ultrathink¶
- 2 findings (2L), 2 regression tests
- UT-1: `mcp_server.py` — `_validate_quality_subscores` treats empty list `[]` as default `[0.7, 0.7, 0.7]` (transport parity with Lambda)
- UT-2: `mcp_server.py` — `_validate_quality_subscores` missing try/except on `float()` — non-numeric strings crash server
- Test Coverage: 2275 tests passing, 94.63% coverage (0 skipped, +2 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.25 (2026-02-15)¶
Bug Hunt #22 (Hybrid)¶
- 8 bugs found (4M, 4L) by 3 Claude sweep agents; Codex gpt-5.3-codex (3 iterations, 0 novel findings)
- 20 regression tests added
- Scope: Override lifecycle, transport parity, input validation, bounded collections, type safety
- BH22-M1: `override.py` — `reject()` missing wall-clock expiration check before state mutation (Codex)
- BH22-M2: `mcp_server.py` — `quality_subscores` missing extraction (transport parity with Lambda/CLI)
- BH22-M3: `drift.py` — `DriftMonitor.update_thresholds()` missing finiteness + non-negativity validation
- BH22-M4: `persistence/repository.py` — `mark_completed()` allows re-completing already-completed workflows
- BH22-L1: `lambda_handler.py` + `mcp_server.py` — `drift_baseline_data` bool guard (transport parity)
- BH22-L2: `governance.py` — `active_overrides` dict grows unbounded (expired overrides never evicted)
- BH22-L3: `afa_bridge.py` — `_evaluate_authorization` string-as-iterable (`set("admin")` → char explosion)
- BH22-L4: `analyst.py` — `_evaluate_quality_gate` crashes on explicit `None` quality_subscores (Codex)
- 0 deferred bugs
- Test Coverage: 2273 tests passing, 94.64% coverage (0 skipped, +20 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
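The string-as-iterable problem in BH22-L3 comes from `set()` iterating a string character by character. One way to normalize the input is sketched below; `as_role_set` is a hypothetical name, and treating a bare string as a single role (rather than rejecting it) is an assumption about the desired behavior.

```python
def as_role_set(value: object) -> set:
    """Hypothetical normalizer for an authorization list from untrusted JSON."""
    if value is None:
        return set()          # explicit JSON null: no roles
    if isinstance(value, str):
        return {value}        # one role, not one entry per character
    return {str(item) for item in value}


exploded = set("admin")           # five single-character "roles"
normalized = as_role_set("admin")
```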
v4.5.24 (2026-02-15)¶
Bug Hunt #21 (Hybrid)¶
- 8 bugs found (3M, 5L) by 3 Claude sweep agents; Codex gpt-5.3-codex (3 iterations, 0 novel findings)
- 16 regression tests added
- Scope: Config validation, transport parity, bounded collections, telemetry, HTTP compliance
- BH21-M1: `config.py` — `KLDriftConfig` missing `__post_init__` threshold ordering validation
- BH21-M2: `lambda_handler.py` — `_validate_subscores` missing bool guard (transport parity)
- BH21-M3: `afa_bridge.py` — `_extract_metrics` missing `quality_subscores` element validation (bool/NaN/Inf)
- BH21-L1: `drift.py` — `DriftMonitor` `window_days` accepts zero/negative
- BH21-L2: `calibrator.py` — `proposals` dict grows unbounded (added `_evict_old_proposals`)
- BH21-L3: `emitter.py` — `emit_shadow_evaluation` payload key collision (`status` overwrite via `dict.update`)
- BH21-L4: `prometheus_exporter.py` — `set_drift_status` unbounded Prometheus label cardinality
- BH21-L5: `mcp_server.py` — MCP HTTP 405 missing `Allow` header (RFC 9110 §15.5.6)
- 0 deferred bugs
- Test Coverage: 2253 tests passing, 94.63% coverage (0 skipped, +17 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.23 (2026-02-15)¶
Bug Hunt #20 (Hybrid)¶
- 9 bugs found (7M, 2L) by Codex gpt-5.3-codex (3 iterations) + 3 Claude sweep agents
- 18 regression tests added
- BH20-M1: `persistence/durable.py` — `resume_all_pending()` crashes on non-dict `state_data`
- BH20-M2: `override.py` — `from_dict()` mutable `failed_gates` list sharing
- BH20-M3: `override.py` (8 sites) — non-strict `base64.b64decode` accepts garbage
- BH20-M4: `consensus.py` — `eligible_voters` set stored by reference
- BH20-M5: `consensus.py` — `timeout_hours` unbounded → `timedelta` OverflowError
- BH20-M6: `pcw_decide.py` — `_hash_summary`/`_build_decision_trace` crash on non-string/non-mapping inputs
- BH20-M7: `encryption.py` — `EncryptedField.from_dict()` permissive base64 decode
- BH20-L1: `config.py` — `window_days` missing negative/overflow validation
- BH20-L2: Lambda/MCP/CLI (4 files) — transport parity: float helpers missing bool guard
QG65 Ultrathink¶
- 5 additional fixes from deep analysis phase
- CLI `risk_score` alias path missing bool guard — `True` silently became `1.0`
- CLI `quality_subscores` elements missing bool guard — booleans passed as floats
- Lambda `_parse_body` base64 decode missing `validate=True` — non-base64 chars accepted
- Crypto providers strict base64: `ed25519_provider.py` (3 calls), `bip322_provider.py` (1 call), `kek_provider.py` (2 calls) — all upgraded to `validate=True`
- PLR0912 fix: extracted `_extract_risk_alias_and_subscores()` helper in `cli.py`
- 4 regression tests added
- Test Coverage: 2236 tests passing, 94.68% coverage (0 skipped, +22 new from BH20+QG65)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
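The strict base64 change (BH20-M3, BH20-M7, and the QG65 provider fixes) rests on the `validate` flag of `base64.b64decode`. The input below is illustrative: it is the encoding of `hello` with two stray characters inserted.

```python
import base64
import binascii

tampered = "aGVs!!bG8="

# Default mode silently discards characters outside the base64 alphabet,
# so the tampered input still decodes to the original bytes.
lenient = base64.b64decode(tampered)

# validate=True rejects any non-alphabet character.
try:
    base64.b64decode(tampered, validate=True)
    rejected = False
except binascii.Error:
    rejected = True
```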
v4.5.22 (2026-02-15)¶
Rigor: Resolve All Deferred Bugs¶
- Rigor: Resolve All Deferred Bugs: Fixed BH16-L5, closed BH15-L6 — 0 deferred bugs remaining
- BH16-L5 FIXED: `WorkflowTransition.verify_hash()` standalone false negatives for non-first transitions — added `previous_hash` column to model, updated `_record_transition()` to populate it, `verify_hash()` now falls back to stored value
- BH15-L6 CLOSED (by-design): Lambda telemetry_emitter wiring gap — Lambda uses AWS-native observability (CloudWatch Logs, X-Ray); adding HTTPEventSink would create redundant double-logging and SSRF attack surface
- 8 regression tests added (6 model-level + 2 repository integration)
- Test Coverage: 2214 tests passing, 94.68% coverage (0 skipped, +8 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.21 (2026-02-14)¶
Bug Hunt #19 (Hybrid)¶
- Bug Hunt #19 (Hybrid): 5 bugs (2M, 3L) found by Codex gpt-5.3-codex (3 iterations) + 3 Claude sweep agents; 12 regression tests
- BH19-M1: `proposal.py` — `from_dict()` shares mutable references with input dict (list aliasing for tags/related_proposals, dict aliasing for transition metadata/gate_results) — caller mutations corrupt workflow state
- BH19-M2: `override.py` + `key_store.py` — `sign_with_stored_key()` key rotation TOCTOU — private/public keys fetched in separate calls without version pinning; `record_usage()` also lacked version parameter
- BH19-L1: `afa_bridge.py` — `_coalesce_float` missing bool guard (bool-is-int subclass, `True` silently becomes `1.0`)
- BH19-L2: `afa_bridge.py` — `_evaluate_execution` accepted non-boolean `proposal_approved`/`has_execution_plan` (Python truthiness)
- BH19-L3: `afa_bridge.py` — `_evaluate_authorization` crashes with `set(None)` when authorization lists are JSON null
- 0 deferred bugs
- Test Coverage: 2206 tests passing, 94.68% coverage (0 skipped, +12 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
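The aliasing described in BH19-M1 (and the related `to_dict()` leaks elsewhere in this changelog) can be shown with a toy model. The `Proposal` class below is a hypothetical stand-in with a single `tags` field, not the class from `proposal.py`.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Proposal:
    tags: list = field(default_factory=list)

    @classmethod
    def from_dict_shared(cls, data: dict) -> "Proposal":
        # Stores the caller's list object directly: later caller mutations show up here.
        return cls(tags=data.get("tags") or [])

    @classmethod
    def from_dict_copied(cls, data: dict) -> "Proposal":
        # Deep copy so nested containers are isolated from the input dict.
        return cls(tags=copy.deepcopy(data.get("tags") or []))


source = {"tags": ["infra"]}
shared = Proposal.from_dict_shared(source)
copied = Proposal.from_dict_copied(source)
source["tags"].append("injected")  # caller mutates its own dict afterwards
```

After the append, `shared.tags` contains the injected value while `copied.tags` does not.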
v4.5.20 (2026-02-14)¶
Bug Hunt #18 (Hybrid)¶
- Bug Hunt #18 (Hybrid): 7 bugs (3M, 4L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 25 regression tests
- BH18-M1: `lambda_handler.py` — non-boolean control flags accepted via raw `body.get()` (Python truthiness confusion)
- BH18-M2: `config.py` — `from_dict`/`_from_raw_dict` flat keys lack NaN/Inf validation (parity gap with nested sections)
- BH18-M3: `cli.py` — CLI non-boolean control flags (transport parity with Lambda)
- BH18-L1: `bayesian.py` — `BayesianPosterior.update_prior()` accepted `ddof=True` (bool is int subclass)
- BH18-L2: `config.py` — `from_dict` novelty keys lack NaN/Inf validation
- BH18-L3: `consensus.py` — `ConsensusConfig` missing bool guards on `quorum_percentage`/`approval_threshold`
- BH18-L4: `afa_bridge.py` — `AFABridge(default_timeout_hours=True)` bypasses `isfinite()` check
- 0 deferred bugs
- Test Coverage: 2194 tests passing, 94.61% coverage (0 skipped, +25 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.19 (2026-02-14)¶
Bug Hunt #17 (Hybrid)¶
- Bug Hunt #17 (Hybrid): 6 bugs (1M, 5L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 13 regression tests
- BH17-M1: `afa_bridge.py` — `_evaluate_risk_check` used raw `context.get()` instead of `_coalesce_float()`, allowing None/NaN/Inf through (transport parity gap)
- BH17-L1: `config.py` — `_extract_nested_floats` missing `isfinite()` validation after `float()` cast
- BH17-L2: `config.py` — `_from_raw_dict` kl_drift parsing lacked NaN/Inf validation; replaced inline code with `_parse_kl_drift_dict()` helper for parity
- BH17-L4: `serialization.py` — `ensure_utc` preserved non-UTC timezone offsets instead of converting to UTC
- BH17-L5: `emitter.py` — `BatchHTTPSink` accepted negative `max_retries` (silent event drops on flush)
- BH17-L6: `governance.py` — `emergency_halt` reported already-rejected overrides as cancelled
- 0 deferred bugs
- Test Coverage: 2169 tests passing, 94.60% coverage (0 skipped, +13 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
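For BH17-L4, converting an aware datetime to UTC is done with `astimezone`, which changes the offset without changing the instant. The sketch below reuses the name `ensure_utc` for readability, but the body is an assumption; in particular, treating naive values as already being UTC is a guess at the intended policy.

```python
from datetime import datetime, timedelta, timezone


def ensure_utc(dt: datetime) -> datetime:
    """Hypothetical sketch: naive values are assumed to already be in UTC."""
    if dt.tzinfo is None:
        return dt.replace(tzinfo=timezone.utc)
    # Same instant, re-expressed with a zero offset.
    return dt.astimezone(timezone.utc)


local = datetime(2026, 2, 14, 12, 0, tzinfo=timezone(timedelta(hours=-5)))
converted = ensure_utc(local)
```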
v4.5.18 (2026-02-14)¶
Quality Gate #62 (Ultrathink)¶
- Quality Gate #62 (Ultrathink): 6 findings (1M, 5L) from BH16 post-fix audit; 11 regression tests
- QG62-M1: `afa_bridge.py` — `_coalesce_float()` missing `isfinite()` guard (transport parity with CLI/Lambda); `risk_proposed`/`profit_proposed` bare `float()` without validation
- QG62-L1: `config.py` — `from_dict` kl_drift float fields lacked NaN/Inf validation; `window_days` not coerced to int (parity with `_from_raw_dict`); extracted `_parse_kl_drift_dict()` helper
- QG62-L2: `lambda_handler.py` — `quality_subscores` null guard missing (`dict.get` returns None, not default, when key exists with null)
- 0 deferred bugs
- Test Coverage: 2156 tests passing, 94.58% coverage (0 skipped, +11 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.17 (2026-02-14)¶
Bug Hunt #16 (Hybrid)¶
- Bug Hunt #16 (Hybrid): 9 bugs (4M, 5L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 22 regression tests
- BH16-M1: `lambda_handler.py` — non-dict metadata → 500 instead of 400; added type validation
- BH16-M2: `config.py` — `from_dict` missing bool guard for `kl_drift` values (parity with `_from_raw_dict`)
- BH16-M3: `afa_bridge.py` — `_evaluate_proposal` crashes on None context values; extracted `_coalesce_float()` + `_extract_metrics()` helpers
- BH16-M4: `consensus.py` — deadlock when all voters voted but neither threshold met; added rejection fallback
- BH16-L1: `mcp_server.py` — `_drain_request_body` partial drain without `close_connection = True`
- BH16-L2: `config.py` — `mcp_rate_limit` missing bool guard in both `from_dict` and `_from_raw_dict`
- BH16-L3: `afa_bridge.py` — `_evaluate_authorization` non-deterministic set ordering in rationale/next_steps
- BH16-L4: `complexity.py` — negative normalized values not clamped to [0, 1]
- BH16-L5: `persistence/models.py` — WorkflowTransition.verify_hash() false negatives (deferred — requires schema change)
- 1 deferred bug (BH16-L5)
- Test Coverage: 2145 tests passing, 94.56% coverage (0 skipped, +22 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.16 (2026-02-14)¶
Bug Hunt #15 (Hybrid) + Quality Gate #61 (Ultrathink)¶
- Bug Hunt #15 (Hybrid): 8 bugs (2M, 6L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 22 regression tests
- Quality Gate #61 (Ultrathink): 7 findings (4M, 3L) — 5 fixed + 8 regression tests
- Transport parity: CLI `observation_values` sanitization
- 0 deferred bugs
- Test Coverage: 2123 tests passing, 94.53% coverage (0 skipped, +22 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.15 (2026-02-13)¶
Bug Hunt #14 (Hybrid)¶
- 3 bugs (3M) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 10 regression tests
- BH14-M1: `consensus.py` — `ConsensusConfig(timeout_hours=True)` silently creates 1-hour deadline (Python `bool` is subclass of `int`, `True == 1`); added `isinstance(bool)` guard in `__post_init__`
- BH14-M2: `override.py` — `DualSignatureValidator(expiration_hours=0)` silently creates instantly-expired overrides; NaN/Inf crash downstream at `OverrideWorkflow.__init__`; added full validation (bool, isfinite, positivity) in `__init__`
- BH14-M3: `lambda_handler.py` — `quality_subscores` lacked `isfinite()` guard (CLI had it, Lambda didn't); extracted `_validate_subscores()` helper for transport parity
- 0 deferred bugs
- Test Coverage: 2101 tests passing, 94.54% coverage (0 skipped, +10 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
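The "bool, isfinite, positivity" validation named in BH14-M1 and BH14-M2 can be sketched as a single helper. The name `validate_positive_hours` and the exception types are assumptions for illustration; the actual guards live in `__post_init__` and `__init__`.

```python
import math


def validate_positive_hours(value: object, name: str = "timeout_hours") -> float:
    """Hypothetical validator: rejects bool, non-numeric, NaN/Inf, and values <= 0."""
    # bool is checked first because isinstance(True, int) is True.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"{name} must be a real number, got {type(value).__name__}")
    number = float(value)
    if not math.isfinite(number):
        raise ValueError(f"{name} must be finite")
    if number <= 0:
        raise ValueError(f"{name} must be > 0")
    return number


hours = validate_positive_hours(2)
```

The order of checks matters: without the explicit bool test, `True` would pass as `1.0`, and without `isfinite`, NaN would slip past the `<= 0` comparison because every comparison with NaN is false.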
v4.5.14 (2026-02-13)¶
Rigor Close Deferrals v3¶
- Closed all 5 deferred bugs (1 fixed, 4 documented/accepted-risk)
- BH12-L2: `afa_bridge.py` — `default_timeout_hours` NaN/Inf passthrough; `math.isfinite()` guard added. 3 regression tests.
- QG60-6: `emitter.py` — BatchHTTPSink stats counters outside lock. CLOSED: CPython GIL atomic; inline documentation added.
- QG60-7: `consensus.py` — Rejection threshold indeterminate. CLOSED: By-design; timeout handles.
- QG60-8: `consensus.py` — `cast_vote()` no thread lock. CLOSED: Single-threaded by design; thread-safety note added to docstring.
- QG60-9: `governance.py` — No un-halt mechanism. CLOSED: Intentional one-way safety mechanism.
- 0 deferred bugs remaining
- Test Coverage: 2091 tests passing, 94.52% coverage (0 skipped, +3 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.13 (2026-02-13)¶
Bug Hunt #13 (Hybrid)¶
- 7 bugs (4M, 3L) found by 3 Claude sweep agents; Codex gpt-5.3-codex xhigh timed out (60+ min, no output); 16 regression tests
- BH13-M1: `lambda_handler.py` — Eager evaluation of `risk_score` fallback causes crash when `risk_proposed` is valid but `risk_score` is NaN/Inf
- BH13-M2: `config.py` — `from_dict` null `kl_drift` values pass through to `KLDriftConfig`, causing TypeError in DriftMonitor construction
- BH13-M3: `cli.py` — Transport-layer parity gap: missing `isfinite()` guard on float conversions (Lambda/MCP have it, CLI didn't)
- BH13-M4: `mcp_server.py` — POST 403 origin rejection doesn't drain request body (same class as QG60-3)
- BH13-L1: `config.py` — `from_dict` `mcp_rate_limit=None` causes `int(None)` TypeError
- BH13-L2: `override.py` — `to_dict()` leaks mutable `failed_gates` reference (same class as BH12-C1)
- BH13-L3: `mcp_server.py` — Invalid/negative Content-Length doesn't close connection
- Deferred: BH12-L2 (AFABridge.default_timeout_hours NaN/Inf — carried forward)
- Test Coverage: 2088 tests passing, 94.52% coverage (0 skipped, +16 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.12 (2026-02-13)¶
Quality-Gate Ultrathink (QG60)¶
- 5 fixes from 9 findings (3M, 2L fixed; 4L deferred)
- QG60-1: `validation.py` — `validate_positive()` accepts Inf (FAIL-OPEN: Inf epsilon disables risk/profit gates); `isnan` → `isfinite`
- QG60-2: `utility.py` — `UtilityCalculator` gamma/kappa/migration_budget no Inf guard (NaN contamination via `Inf * 0.0`); isfinite validation
- QG60-3: `mcp_server.py` — POST 404 catch-all doesn't consume request body (HTTP/1.1 persistent connection corruption); `_drain_request_body()` helper
- QG60-4: `mcp_server.py` — 413 oversized request doesn't consume body; `close_connection = True` to force connection close
- QG60-5: `utility.py` — `ThreePointEstimate` accepts Inf values (OverflowError or NaN LCB); isfinite `__post_init__`
- SDK facade: Added `Calibrator` and `Governance` actor exports to `aegis_governance.__init__`
- Deferred: QG60-6 (BatchHTTPSink stats counters outside lock — advisory), QG60-7 (ConsensusWorkflow rejection threshold — by-design), QG60-8 (cast_vote no thread lock — single-threaded), QG60-9 (no un-halt — intentional one-way)
- Resolves deferred BH12-L1 (MCP HTTP POST body drain)
- Test Coverage: 2072 tests passing, 94.50% coverage (0 skipped, +19 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.11 (2026-02-12)¶
Bug Hunt #12 (Hybrid)¶
- 10 bugs (1H, 7M, 2L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 22 regression tests
- BH12-H1: `gates.py` — GateEvaluator NaN threshold params cause governance lockout (complexity_floor NaN blocks all proposals with no override path)
- BH12-M1: `gates.py` — NaN for novelty_N0/k/threshold, quality_min_score, utility_threshold (isfinite validation loop)
- BH12-M2: `complexity.py` — `analyze()` NaN metric silent 1.0 via `min()` order-dependence (isfinite guard)
- BH12-M3: `complexity.py` — `complexity_floor` validate_range missing `check_nan=True`
- BH12-M4: `lambda_handler.py` — `_float` helper NaN/Inf passthrough (parity gap with MCP `_float_arg`)
- BH12-M5: `lambda_handler.py` — `_handle_risk_check` NaN/Inf produces invalid JSON response
- BH12-M6: `executor.py` — `ExecutionPlan.timeout_seconds` NaN/Inf bypass IEEE 754
- BH12-M7: `calibrator.py` — `CalibrationProposal.data_window` no type/range validation
- BH12-M8: `config.py` — `_from_raw_dict` null YAML values for `_DIRECT` params + `from_dict` null novelty/flat-key params
- BH12-C1: `proposal.py` — `to_dict()` leaks mutable references to internal state (Codex)
- Deferred: BH12-L1 (MCP HTTP POST body on 404/403), BH12-L2 (AFABridge.default_timeout_hours NaN/Inf)
- Test Coverage: 2053 tests passing, 94.52% coverage (0 skipped, +22 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
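The `min()` order-dependence behind BH12-M2 follows from every comparison with NaN being false, so whichever argument comes first survives. The clamp below is a hypothetical sketch of an isfinite guard, not the code in `complexity.py`.

```python
import math

nan = float("nan")

# min() keeps its first argument unless a later one compares strictly smaller.
# No value compares smaller than NaN and NaN compares smaller than nothing.
first_wins_one = min(1.0, nan)   # NaN is silently replaced by 1.0
first_wins_nan = min(nan, 1.0)   # same inputs, other order: NaN survives


def clamp_unit(x: float) -> float:
    """Hypothetical guard: reject non-finite input before clamping to [0, 1]."""
    if not math.isfinite(x):
        raise ValueError("metric must be finite")
    return max(0.0, min(1.0, x))
```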
v4.5.10 (2026-02-12)¶
Quality-Gate Ultrathink (QG59)¶
- 12 fixes from 21 findings (8M, 4L); 9 deferred (3M design-gap, 6L advisory)
- QG59-P1-1: `gates.py` — NaN trigger_factor bypasses zero-check (NaN guard)
- QG59-P1-2: `gates.py` — `trigger_confidence_prob > 1.0` disables governance FAIL-OPEN (validate_range)
- QG59-P1-4: `config.py` — YAML null values crash `_extract_nested_floats` (None guard + KL null-coalesce)
- QG59-P2-1: `calibrator.py` — CalibrationProposal accepts NaN/Inf (`__post_init__` validation)
- QG59-P2-2: `analyst.py` — `_coerce_to_float` accepts "nan"/"inf" strings + `_calculate_confidence` averages NaN (isfinite guards)
- QG59-P2-3: `proposer.py` — PERT estimates NaN/Inf passthrough (isfinite guard)
- QG59-P3-1: `mcp_server.py` — `_float_arg` passes NaN/Inf (isfinite guard)
- QG59-P3-3: `emitter.py` — wrong event counted as dropped (track evicted oldest)
- Test Coverage: 2031 tests passing, 94.52% coverage (0 skipped, +22 new)
emitter.py— wrong event counted as dropped (track evicted oldest) - Test Coverage: 2031 tests passing, 94.52% coverage (0 skipped, +22 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
v4.5.9 (2026-02-12)¶
Bug Hunt #11 (Hybrid)¶
- 10 bugs (8M, 2L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 12 regression tests
- BH11-M1: `cli.py` — `quality_subscores` null crash (null-coalesce + fallback)
- BH11-M2: `cli.py` — non-string `phase` crash (`isinstance` guard)
- BH11-M3: `calibrator.py` — wrong capability check (`can_evaluate_gates` → `can_configure`)
- BH11-M4: `governance.py` — `emergency_halt` doesn't cancel active overrides
- BH11-M5: `consensus.py` — NaN `timeout_hours` passthrough (`math.isfinite` guard)
- BH11-M6: `mcp_server.py` — POST `/health` unconsumed body corruption (removed route)
- BH11-M7: `pipeline.py` — `_encrypt_pii_fields` T-5 bypass (pass captured encryptor)
- BH11-M8: `emitter.py` — `BatchHTTPSink` `batch_size=0` silent data loss (Codex)
- BH11-L1: `utility.py` — `lcb_alpha` NaN passthrough (`check_nan=True`)
- BH11-L2: `mcp_server.py` — stdio size check includes newline (strip first)
- Test Coverage: 2009 tests passing, 94.49% coverage (0 skipped, +12 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.5.8)¶
- Quality-Gate Ultrathink (QG58): Docs sync phase — comprehensive test metric update across all documentation files
- Updated version numbers: CLAUDE.md v4.5.7 → v4.5.8, ROADMAP v1.44.0 → v1.45.0, gap-analysis v1.49.0 → v1.50.0, repository-structure v2.10.0 → v2.11.0
- Test metrics synchronized: 1987 tests, 94.45% coverage → 1997 tests, 94.47% coverage (10 new tests, +0.02% coverage)
- Updated 9 documentation files: CLAUDE.md, README.md, ROADMAP.md, gap-analysis.md, KNOWN_ISSUES.md, repository-structure.md, test-count-methodology.md, comprehensive-todo-discovery.md, changelog.md
- Changelog entries added to CLAUDE.md §9, docs/claude/changelog.md, and ROADMAP.md §Changelog
- Test coverage: 1997 tests passing, 94.47% coverage (0 skipped)
- Quality gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.5.7)¶
- Bug Hunt #10 (Hybrid): 7 bugs (5M, 2L) found by Codex gpt-5.3-codex + 3 Claude sweep agents; 9 regression tests
- BH10-M1: `validation.py` — `validate_positive` NaN pass-through (IEEE 754: `NaN <= 0` is False). Fix: `math.isnan()` guard
- BH10-M2: `validation.py` — `validate_threshold_ordering` NaN pass-through. Fix: `math.isnan()` guard on both values
- BH10-M3: `mcp_server.py` — stdio transport missing `_MAX_REQUEST_BYTES` size limit (HTTP had it). Fix: `len(line)` check
- BH10-M4: `cli.py` — null JSON metric values crash (`data.get(key, default)` returns None, not default). Fix: null-coalesce
- BH10-M5: `lambda_handler.py` — non-string `phase` value causes `AttributeError` on `.lower()`. Fix: `isinstance(str)` guard
- BH10-L1: `governance.py` — `emergency_halt` non-atomic state mutation (no lock). Fix: `with self._lock:` on write + read
- BH10-M7: `lambda_handler.py` — non-numeric `drift_baseline_data` crash on `float()`. Fix: try/except ValueError/TypeError
- Quality-Gate Ultrathink (QG57): 2 additional fixes from Phase 2 ultrathink
- QG57-M1: `mcp_server.py` — drift baseline non-numeric crash (same pattern as lambda, not propagated to MCP). Fix: try/except
- QG57-M2: `governance.py` — TOCTOU in `initiate_override`/`add_override_signature` (halt check outside lock). Fix: moved inside lock
- Test Coverage: 1987 tests passing, 94.45% coverage (0 skipped, +9 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
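The NaN pass-through (BH10-M1) and null-coalesce (BH10-M4) patterns can be illustrated with a minimal sketch. The function names and signatures below are hypothetical stand-ins, not the actual code in `validation.py` or `cli.py`.

```python
import math


def validate_positive_naive(value: float, name: str) -> float:
    # Hypothetical pre-fix shape: under IEEE 754, NaN <= 0 is False,
    # so NaN is not rejected here.
    if value <= 0:
        raise ValueError(f"{name} must be positive, got {value}")
    return value


def validate_positive(value: float, name: str) -> float:
    # Hypothetical post-fix shape: explicit math.isnan() guard first.
    if math.isnan(value) or value <= 0:
        raise ValueError(f"{name} must be a positive number, got {value}")
    return value


def get_metric(data: dict, key: str, default: float) -> float:
    # dict.get(key, default) returns None when the key exists with a null
    # value, so the default must be coalesced explicitly.
    value = data.get(key)
    return default if value is None else value
```

With these definitions, `validate_positive_naive(float("nan"), "x")` returns NaN unchanged, while `validate_positive` raises `ValueError` for the same input.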
Changelog (4.5.6)¶
- Quality-Gate Ultrathink (QG56): 4 fixes from 13 findings (1H, 5M, 5L, 2I)
- QG56-M2: stdio transport now supports JSON-RPC batch arrays via `handle_batch()`
- QG56-M3: `WebhookAlertSink` TLS enforcement via `_validate_sink_url()` + `allow_insecure` param (breaking: `http://` URLs now require `allow_insecure=True`)
- QG56-M4: `_validate_sink_url()` strips URL whitespace before `urlparse()` to prevent hostname TOCTOU
- QG56-L5: `mcp_rate_limit` clamped to `max(0, ...)` in both `from_dict()` and `_from_raw_dict()`
- Test Coverage: 1978 tests passing, 94.47% coverage (0 skipped, +14 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.5.5)¶
- TLS Enforcement (ROADMAP Item 20a(c)): `_validate_sink_url()` helper enforces HTTPS on `HTTPEventSink` and `BatchHTTPSink` with `allow_insecure: bool = False` keyword-only escape hatch for local development; MCP `_ALLOWED_TELEMETRY_SCHEMES` restricted from `{"http", "https"}` to `{"https"}`; CLI catches `ValueError` in `_build_telemetry_emitter()`; production guide TLS section added; Research 003 G2 status → ADDRESSED. Closes CoSAI MCP-T7 (Transport Security) gap.
- Parameter Cookbook (ROADMAP Item 16): (a) `docs/integration/parameter-reference.md` — comprehensive parameter reference with derivation guidance, domain examples, boundary behavior for all inputs; (b) `docs/integration/domain-templates.md` — 4 worked examples (trading, CI/CD, content moderation, autonomous agent) with parameter mapping tables, JSON inputs, gate-by-gate walkthroughs; (c) MCP tool descriptions enriched with semantic context, `minimum`/`maximum` JSON Schema constraints, `instructions` field in initialize response
- Test Coverage: 1964 tests passing, 94.47% coverage (0 skipped, +12 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
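A rough sketch of the HTTPS-by-default check described for `_validate_sink_url()`, including the whitespace strip from QG56-M4. The function name, return value, and error messages here are assumptions for illustration; only the `allow_insecure` keyword-only flag and the strip-before-`urlparse()` ordering come from the entries above.

```python
from urllib.parse import urlparse


def validate_sink_url(url: str, *, allow_insecure: bool = False) -> str:
    """Return the cleaned URL, or raise ValueError if it is not acceptable."""
    # Strip first, so the string that is validated is the string that is used.
    cleaned = url.strip()
    parsed = urlparse(cleaned)
    if not parsed.hostname:
        raise ValueError("sink URL must include a hostname")
    if parsed.scheme == "https":
        return cleaned
    if parsed.scheme == "http" and allow_insecure:
        # Escape hatch intended for local development only.
        return cleaned
    raise ValueError(f"sink URL must use https:// (got scheme {parsed.scheme!r})")
```

Under this sketch, `validate_sink_url("http://localhost:9000/events")` raises, and the same call with `allow_insecure=True` succeeds.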
Changelog (4.5.4)¶
- MCP Hardening Phase 1 (ROADMAP Item 20a): Token bucket rate limiter + structured audit logging for all MCP tool invocations
  - `_MCPRateLimiter`: stdlib-only token bucket (capacity/rate), thread-safe via `threading.Lock`, configurable via `AegisConfig.mcp_rate_limit` (default: 60 req/min, 0 to disable)
  - `emit_mcp_invocation()`: structured audit event on every `tools/call` — ALLOW/DENY/ERROR decision, SHA-256 params_hash (PII-safe), latency, caller_id
- Telemetry schema v2.2.0: added `mcp.tool_invocation` event definition (6 fields)
- Closes CoSAI MCP-T10 (resource management) and MCP-T12 (logging/audit) gaps
- Test Coverage: 1948 tests passing, 94.59% coverage (0 skipped, +25 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
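For readers unfamiliar with the technique, a stdlib-only, lock-protected token bucket can look roughly like the following. This is an illustrative sketch, not the `_MCPRateLimiter` implementation: the class name, constructor parameters, and injectable clock are assumptions, and the "0 to disable" behavior is not modeled.

```python
import threading
import time
from typing import Callable


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `rate_per_sec`."""

    def __init__(
        self,
        capacity: float,
        rate_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._capacity = capacity
        self._rate = rate_per_sec
        self._tokens = capacity  # start full
        self._clock = clock
        self._last = clock()
        self._lock = threading.Lock()

    def allow(self) -> bool:
        """Consume one token if available; return False when rate-limited."""
        with self._lock:
            now = self._clock()
            elapsed = now - self._last
            self._last = now
            # Refill in proportion to elapsed time, capped at capacity.
            self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False
```

A limit of 60 requests per minute would correspond to `TokenBucket(capacity=60, rate_per_sec=1.0)` in this sketch.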
Changelog (4.5.3)¶
- MCP Streamable HTTP Transport (ROADMAP Item 23): Implemented MCP Streamable HTTP transport per 2025-03-26 spec using stdlib `http.server` (zero new dependencies). Protocol version updated to `2025-03-26`. `--transport http` flag with `--host`, `--port`, `--allowed-origins` options. POST `/mcp` supports JSON-RPC single + batch dispatch. GET `/mcp` returns 405 (SSE not implemented). `/health` endpoint for container health checks. Origin validation: fail-closed for non-localhost, permissive for localhost. Infrastructure: Dockerfile exposes 8080 with HTTP CMD; ECS stack replaces keepalive loop with HTTP server; internal ALB (:80 → :8080); ALB 5xx CloudWatch alarm. Deferred: SSE streaming, session management, resumability (all tools are synchronous/stateless). KNOWN_ISSUES.md ECS limitation marked RESOLVED. ADR-007 diagram updated.
- Security Hardening (Ultrathink): 8 findings fixed (1 HIGH, 3 MEDIUM, 4 LOW) with 18 regression tests — U-1 SSRF protection on `telemetry_url` (scheme whitelist + private IP blocking), U-2/U-3 Content-Length validation (non-numeric, negative), U-4 error message sanitization (no exception details to clients), U-7 non-dict batch item rejection, U-8 empty batch `[]` error response, U-9 Origin validation on all endpoints (GET+POST), U-10 `handle_request` return type correctness
- H-1 SSRF Hex/Decimal IP Bypass Fix: `_validate_telemetry_url()` now uses resolve-then-validate via `socket.getaddrinfo()` in the `except ValueError` branch — blocks hex (0x7f000001), decimal (2130706433), and DNS-to-private bypasses; extracted `_is_forbidden_ip()` helper using `not addr.is_global` (covers CGNAT 100.64/10 range missed by 4-property check); M-3 Slowloris timeout: `timeout = 30` class attribute on `_MCPHTTPHandler` + server `self.timeout = 30`; 14 regression tests
- Test Coverage: 1923 tests passing, 94.62% coverage (0 skipped, +64 new)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
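The resolve-then-validate idea in H-1 can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, the real `_validate_telemetry_url()` also checks the URL scheme, and only the use of `socket.getaddrinfo()` in the `except ValueError` branch and the `not addr.is_global` test are taken from the entry above.

```python
import ipaddress
import socket
from typing import Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def is_forbidden_ip(addr: IPAddress) -> bool:
    # is_global is False for loopback, private, link-local, reserved and the
    # 100.64.0.0/10 shared (CGNAT) range, so one property covers them all.
    return not addr.is_global


def check_host(host: str) -> None:
    """Raise ValueError if `host` is, or resolves to, a non-global address."""
    try:
        addrs = [ipaddress.ip_address(host)]
    except ValueError:
        # Not a standard IP literal: it may be a hex or decimal encoding, or a
        # DNS name. Let the resolver interpret it, then validate each result.
        infos = socket.getaddrinfo(host, None)
        addrs = [ipaddress.ip_address(info[4][0]) for info in infos]
    for addr in addrs:
        if is_forbidden_ip(addr):
            raise ValueError(f"host {host!r} maps to forbidden address {addr}")
```

Validating the resolved addresses rather than the raw host string is what closes the encoding bypasses: a string that `ipaddress` cannot parse is never trusted on its own.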
Changelog (4.5.2)¶
- Security Hardening (Quality-Gate Ultrathink): 17 findings fixed (3 HIGH, 11 MEDIUM, 3 LOW) across 6 files — CORS restricted from ALL_ORIGINS to `*.amazonaws.com`, aegis-gate script injection fixes (function_name to env var, GITHUB_OUTPUT heredoc delimiters, `::error::` env vars), error message sanitization (no exception details in 500s), `dynamodb:Scan` removed from IAM policies, `s3:PutObjectAcl` removed, ADOT collector pinned to `v0.41.2`, CDK deploy `--require-approval broadening`, billing alarm enabled for all stages (dev=$100, staging=$150, prod=$200), deploy workflow test gate added, ECS keepalive logs failures, health check rejects `degraded`, `_safe()` None guard, quality_subscores empty-list fallback, ECS config path cleared
- Test Coverage: 1859 tests passing, 94.54% coverage (0 skipped)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.5.1)¶
- AWS Deployment Complete (ROADMAP Items 16-20): All 4 CDK stacks successfully deployed to AWS us-west-2 (account 164171672016): `AegisSharedStack-dev` (DynamoDB `aegis-governance-state-dev`, KMS, S3 `aegis-governance-audit-dev-164171672016`, Secrets Manager `aegis/signing-keys-dev`), `AegisLambdaStack-dev` (Lambda `aegis-evaluate-proposal-dev` + API Gateway `https://yd1xm4ahcg.execute-api.us-west-2.amazonaws.com/dev/`), `AegisMcpStack-dev` (ECS Fargate cluster `aegis-governance-dev`, service `aegis-mcp-dev` 1/1 running), `AegisMonitoringStack-dev` (SNS `aegis-governance-alarms-dev`, CloudWatch dashboard `AEGIS-Governance-dev`, 4 alarms)
- Deployment Bug Fixes (7): cdk.json literal string bug (account context), pyproject.toml py-modules for standalone .py modules, Dockerfile.lambda numpy/scipy pins + explicit COPY for standalone modules, ECS ALB removal (MCP uses stdio not HTTP) + keepalive loop + lightweight health check, Lambda cross-stack cyclic refs (inline IAM policies), CloudWatch math expression MAX() to IF(), CDK protocol error (dict context to env kwarg)
- ECS Architecture Note: MCP server uses stdio transport; ECS container runs keepalive loop pending HTTP/SSE transport implementation (see KNOWN_ISSUES.md)
- Test Coverage: 1859 tests passing, 94.55% coverage (0 skipped)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.5.0)¶
- AWS Deployment Infrastructure (ROADMAP Items 16-20): Hybrid Lambda + ECS architecture via CDK Python; 4 CDK stacks (`infra/`): `AegisSharedStack` (DynamoDB, Secrets Manager, S3, KMS), `AegisLambdaStack` (Lambda container image + API Gateway REST with IAM auth), `AegisMcpStack` (ECS Fargate + ADOT sidecar for AMP), `AegisMonitoringStack` (CloudWatch alarms + dashboard + billing protection); `src/lambda_handler.py` wrapping `pcw_decide()` with 3 routes (POST /evaluate, POST /risk-check, GET /health); `Dockerfile.lambda` for scipy-enabled container image; `.github/workflows/aegis-deploy.yml` (OIDC deploy pipeline); `.github/actions/aegis-gate/action.yml` (reusable governance gate composite action); ADR-007 documenting architecture decision; estimated $51/mo
- Ultrathink Hardening: U-1 `quality_subscores` null filter (prevents TypeError), U-2 aegis-gate script injection fix (env vars), U-4 Dockerfile.lambda editable install removed
- Coverage Boost: 8 error-path tests raising `lambda_handler.py` from 86% to 94%
- Test Coverage: 1859 tests passing, 94.55% coverage (0 skipped)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.4.0)¶
- Drift Detection → Policy Connection (ROADMAP Item 15): Wired `DriftMonitor` KL divergence detector into the production decision path of `pcw_decide()` — CRITICAL drift → HALT (non-overridable), WARNING drift → advisory constraint, NORMAL drift → no change; `drift_monitor=None` (default) → identical behavior to previous versions (backward compatible); new `drift_result` field on `PCWDecision`; `_evaluate_drift_policy()` and `_apply_drift_overrides()` helpers extracted for PLR0912 compliance; `DRIFT_POLICY_ENFORCED` telemetry event type; `AegisConfig.create_drift_monitor()` factory; CLI `--drift-baseline` flag with null-value filtering; MCP `drift_baseline_data` array parameter with null-value filtering; `DriftAction` and `DriftResult` re-exported from SDK facade and engine; drift-specific next_steps for CRITICAL HALT; 39 new tests including 6 quality-gate regression tests
- Test Coverage: 1817 tests passing, 94.56% coverage (0 skipped)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
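The drift-to-policy mapping above can be pictured with a small sketch. The enum, function name, return shape, and constraint text are all hypothetical; only the three-way mapping (CRITICAL → HALT and non-overridable, WARNING → advisory constraint, NORMAL or no monitor → no change) comes from the entry. Treating non-critical outcomes as overridable is an assumption of this sketch.

```python
from enum import Enum
from typing import List, Optional, Tuple


class DriftStatus(Enum):
    NORMAL = "normal"
    WARNING = "warning"
    CRITICAL = "critical"


def apply_drift_policy(
    decision: str,
    constraints: List[str],
    status: Optional[DriftStatus],
) -> Tuple[str, List[str], bool]:
    """Return (decision, constraints, overridable) after applying drift policy."""
    if status is DriftStatus.CRITICAL:
        # Critical drift forces a HALT that cannot be overridden.
        return "HALT", list(constraints), False
    if status is DriftStatus.WARNING:
        # Warning drift keeps the decision and attaches an advisory constraint.
        return decision, list(constraints) + ["drift warning: review baseline"], True
    # NORMAL drift, or no drift monitor configured (status is None): unchanged.
    return decision, list(constraints), True
```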
Changelog (4.3.1)¶
- HTTP Telemetry Sink (ROADMAP Item 14): `HTTPEventSink` (per-event fire-and-forget POST), `BatchHTTPSink` (batching with retry and background flush daemon), `http_sink()` factory; stdlib-only (`urllib.request`, matching `WebhookAlertSink` pattern); `AegisConfig.telemetry_url` optional field; CLI `--telemetry-url` flag on `aegis evaluate`; MCP `telemetry_url` string parameter on `aegis_evaluate_proposal`; SDK facade re-exports (`BatchHTTPSink`, `HTTPEventSink`, `http_sink`); telemetry `__init__.py` re-exports; 45 new tests
- Test Coverage: 1778 tests passing, 94.44% coverage (0 skipped)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.3.0)¶
- Shadow Mode (ROADMAP Item 13): Added `shadow_mode` keyword parameter to `pcw_decide()` for KL divergence calibration data collection without enforcing decisions; new `ShadowResult` dataclass with drift evaluation, observation values, and baseline hash; `DriftMonitor` integration via optional `drift_monitor` parameter; `TelemetryEmitter` integration via optional `telemetry_emitter` parameter with `SHADOW_EVALUATION` event type; Prometheus `mode` label on `decision_latency_seconds` histogram (`"production"`/`"shadow"`), new `aegis_shadow_evaluations_total` counter; CLI `--shadow` flag on `aegis evaluate`; MCP `shadow_mode` boolean parameter on `aegis_evaluate_proposal`; `ShadowResult` re-exported from SDK facade; alerting/recording rules filtered to `{mode="production"}` to exclude shadow data; 44 new tests
- Test Coverage: 1733 tests passing, 94.48% coverage (0 skipped)
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.2.3)¶
- ROADMAP Items 10-12: Production deployment guide (`docs/deployment/production-guide.md`), migration guide (`docs/deployment/migration-guide.md`), performance SLAs with recorded benchmark baselines (`docs/deployment/performance-slas.md`); `Dockerfile` (multi-stage, non-root), `docker-compose.yaml` (AEGIS + Prometheus + Grafana), `monitoring/prometheus/prometheus.yml` (scrape config); no code changes
- CALIBRATOR Actor (ROADMAP Item 7): New `Calibrator` actor type — statistical threshold tuning for drift thresholds, Bayesian priors, gate parameters; approval-gated workflow (`PROPOSED→APPROVED→APPLIED`); `_RECOGNIZED_PARAMETERS` whitelist (16 params); ultrathink-hardened (U-1 ValueError propagation, U-2 setattr validation, U-3 double-apply TOCTOU, U-4 derived ID collision, U-5 emit simplification); 69 new tests including 12 regression tests; 1689 tests / 94.60% coverage
- GOVERNANCE Actor (ROADMAP Item 6): New `Governance` actor type — override orchestration (initiate/sign/approve/reject/expire), compliance checking (complexity gate non-overridable, fail-closed), emergency halt, thread-safe with `threading.Lock`; ultrathink-hardened (U-1/U-2 halt guards, U-3 fail-closed compliance, U-4 terminal cleanup, U-7 thread safety); 41 new tests including 6 regression tests; 1620 tests / 94.36% coverage
- DRY Extraction (ROADMAP Items 8 & 9): Extracted `ensure_utc()` to `src/workflows/serialization.py` (3 workflows), 4 validation helpers to `src/engine/validation.py` (5 engine modules); 26 new tests; deferred persistence/telemetry timezone consolidation; 1620 tests / 94.36% coverage
- Dependency Fix: Moved scipy/prometheus_client from `dev` to dedicated `engine`/`telemetry` optional groups with graceful `ImportError` at point of use; 4 regression tests; 1552 tests / 94.27% coverage
- Quality-Gate Ultrathink #10: 5 MEDIUM bugs fixed — Bayesian overflow/NaN guard (B10-1/B10-2), pipeline validator exception propagation + per-event counting (T-2/T-3), executor rollback retry (T10-1) — 7 regression tests; 1471 tests / 94.23% coverage
- Rigor Close Deferrals v2: 4 bugs fixed + 3 closed as intentional; 6 regression tests; 1466 tests / 94.22% coverage
- FIX-1: Pipeline validator short-circuit `break` skipped remaining validators when `drop_invalid=False`
- FIX-2: Bayesian `update_prior()` linear std interpolation → variance-space combination (Jensen's inequality fix)
- FIX-3: Decryption `_decrypt_dict()` dotted-path field filtering (e.g., `"nested.actor_id"`)
- FIX-4: Consensus `approval_threshold` default `0.67` → `2/3` (float precision fix for 2/3 majority)
- CLOSE-1: Kappa discontinuity (intentional design), CLOSE-2: Prometheus private API (no alternative), CLOSE-3: Override rejection schema-code gap (intentional architecture)
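To see why FIX-4 matters, consider a three-voter panel with two approvals. The helper below is a hypothetical illustration, not the consensus workflow's actual check; it assumes the comparison is "approval ratio >= threshold".

```python
def approval_met(approvals: int, total: int, threshold: float) -> bool:
    """Hypothetical majority check: ratio of approvals against a threshold."""
    if total <= 0:
        return False
    return approvals / total >= threshold


# With the old default, an exact two-thirds majority is rejected,
# because 2/3 = 0.666... is less than 0.67.
old_default = approval_met(2, 3, 0.67)

# With the threshold expressed as 2/3, both sides of the comparison are the
# same floating-point value, so the majority is accepted.
new_default = approval_met(2, 3, 2 / 3)
```

Here `old_default` is `False` and `new_default` is `True`.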
Changelog (4.2.2)¶
- Bug-Hunt #9 + Ultrathink: 8 bugs fixed (4M, 4L) + 2 ultrathink findings; 19 regression tests; 1466 tests / 94.22% coverage
- Rigor: Close Deferrals: M6 (import normalization) + L47 (UtilityCalculator phi_S/phi_D validation) closed; T-1 ComplexityDecomposer NaN guard; 15 regression tests; 1441 tests / 94.14% coverage
- Quality-Gate: DEKEntry frozen dataclass (cache immutability), schema closure (theta in interface contract), 1426 tests / 94.14% coverage
- Docs-Sync #1: Comprehensive documentation audit — 9 files updated with 1398→1417 test metric sync
- Docs-Sync #2: Changelog header relabeled (4.2.1 → 4.2.2), ROADMAP/test-count-methodology dates fixed (2026-02-08 → 2026-02-07), gap-analysis bumped to v1.30.0, CLAUDE.md §10 added 3 missing modules (config.py, cli.py, aegis_governance/), §4.10 updated from 4 → 7 optional dependency groups
- Stale references fixed: gap-analysis.md date, test-count-methodology.md date, repository-structure.md CLAUDE.md annotation (v4.0.0 → v4.2.2), KNOWN_ISSUES.md version (4.2.1 → 4.2.2)
- CLAUDE.md: Telemetry schema version reference corrected (v2.0.0 → v2.1.0)
- ROADMAP.md: Version ordering anomaly fixed (v1.15.0 → v1.17.0 to restore monotonic ordering)
- comprehensive-todo-discovery.md: Stale metrics corrected (1375/15 skipped → 1398/0 skipped, 91/91 → 103/103 bugs)
- gap-analysis.md: GAP-L1 In-Progress table updated (66% → 100% code-complete), changelog entry added
Changelog (4.2.1)¶
- Bug-Hunt Sessions #3 & #4: Hybrid Codex+Claude sweeps — 14 bugs fixed (11 MEDIUM, 3 LOW)
- Session #3 MEDIUM: Four-eyes violation in override workflow, confidence fallback `or` vs `is not None`, current_state property returning stale derived values, terminal state guard in sign_with_stored_key, MCP server replying to JSON-RPC notifications
- Session #3 LOW: RBAC `_resolve_permissions` return type set/frozenset inconsistency, KL divergence length guard, WebhookAlertSink Content-Type header loss with custom headers
- Session #4 MEDIUM: Buffer overflow silent discard in pipeline, encryption/decryption error path mutation (2 fixes), PostgreSQL URL encoding in persistence, executor rollback audit gap, quality_no_zero_subscore config flag ignored
- Ultrathink hardening: PIIManifest `set` → `frozenset` (immutable PII fields), decryption `TypeError` handler (crypto resilience), pipeline warn-path copy-on-error, PostgreSQL URL `quote_plus` → SQLAlchemy `URL.create()` — 4 regression tests
- 18 new regression tests across 12 test files
- 6 LOW-severity bugs deferred (documented in KNOWN_ISSUES.md)
- Rigor Protocol: 13 deferred ultrathink findings fixed (T-1..T-6, W-1..W-10) — 2 MEDIUM + 11 LOW with 18 regression tests
- Bug-Hunt Session #5: 11 ultrathink findings fixed (5 MEDIUM, 6 LOW) with 9 regression tests — KL divergence re-normalization, inf histogram handling, import path fix, None guards, defensive copies, MappingProxyType immutability, exception guards
- Bug-Hunt Session #6: 6 bugs fixed (3 MEDIUM, 3 LOW) with 10 regression tests — RBAC fail-open constraint (Codex), MCP non-dict JSON crash, drift inf baseline, CLI simplified names, pcw_decide empty next_steps, executor re-execution guard
- Quality-Gate Ultrathink: 5 bugs fixed (3 MEDIUM, 2 LOW) with 5 regression tests — NaN confidence propagation in gate evaluation, CLI risk_score priority override, pipeline stop/start race condition, dead rationale_parts code in afa_bridge, decision_path case inconsistency
- Benchmarks Enabled: `--benchmark-skip` → `--benchmark-disable` — 15 benchmark tests now execute (0 skipped)
- Bug-Hunt Session #8: 6 bugs fixed (3 MEDIUM, 3 LOW) with 8 regression tests — config utility_threshold YAML drop, drift histogram ignoring baseline range, Bayesian NaN propagation, consensus premature rejection, pipeline buffer_size=0 infinite loop, repository async lazy-load crash
- Test Coverage: 1398 tests passing, 94.13% coverage
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.2.0)¶
- Gap Closure Sprint (issues #24, #2, #7, #5, #8, #9): Major gap closure addressing RBAC enforcement, performance testing, override audit, DR drill, and monitoring dashboard gaps
- Schema Alignment: Resolved three-way naming drift in telemetry override fields across schema YAML, OverrideInfo dataclass, and TelemetryEmitter payloads
- New modules: `src/rbac.py` (RBAC enforcement engine), `src/telemetry/alert.py` (alerting rules), `src/telemetry/metrics_server.py` (metrics HTTP server)
- New test suite: `tests/test_schema_consistency.py` (13 tests for schema-code consistency)
- Wired RBAC into override workflow and pcw_decide decision flow
- Added `monitoring/` configs (Prometheus recording/alerting rules, Grafana dashboards)
- Added `to_schema_dict()` on OverrideInfo for schema-compliant serialization
- Added stale partial override Prometheus alert (AegisOverrideStalePartial)
- 128 new tests across RBAC, alerting, metrics server, schema consistency, DR, benchmarks
- Test Coverage: 1309 tests passing, 94.17% coverage
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.1.0)¶
- v1.0 SDK Merge (PR #23, commit `cfa3783`): AEGIS v1.0 Governance Decision SDK
- New modules: `src/config.py` (AegisConfig), `src/cli.py`, `src/aegis_governance/__init__.py` (facade), `src/aegis_governance/mcp_server.py`
- 79 new tests: `test_config.py`, `test_cli.py`, `test_facade.py`, `test_mcp_server.py`
- 4 runnable examples in `examples/`
- README rewritten for SDK positioning
- `pyproject.toml`: Added `[project.scripts]` entries (`aegis`, `aegis-mcp-server`)
- Updated §1 entry points to reflect SDK surfaces (CLI, MCP, Python import)
- Test Coverage: 1172 tests passing, 94.61% coverage
- Quality Gates: All passing (ruff, black, mypy, bandit, pytest)
Changelog (4.0.0)¶
- CLAUDE.md Audit & Regeneration: Full v4.0.0 rewrite with agentic AI hardening
- Relocated 900-line changelog (v2.1–v3.36) to this file
- NEW Section 11: Agentic AI Hardening (OWASP Agentic Top 10 mapping)
- Added 5 developer playbooks (setup, quality gates, add gate, add workflow, optional deps)
- Added Python code standards (type annotations, dataclass patterns, thread safety)
- Added governance invariant protection protocol
- Created 3 custom slash commands (`/quality-gate`, `/sync-metrics`, `/governance-verify`)
- Enhanced ask-first triggers with agentic safety triggers
- Updated audit: `docs/claude/audits/aegis-root-v4.0.md`
- Reduced CLAUDE.md from 69KB (~1340 lines) to ~20KB (~620 lines)
- Test Coverage: 1132 tests passing, 93.24% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.36.0)¶
- Hybrid Bug-Hunt Session: Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
- Bug Fixes (1 HIGH, 4 MEDIUM):
- Bug 1 (MEDIUM): `gates.py:77-86` - Missing epsilon validation
  - Added validation in `__init__` to reject non-positive epsilon_R/epsilon_P
  - Raises `ValueError` with descriptive message preventing division by zero
  - Regression tests: 4 tests for epsilon validation
- Bug 2 (MEDIUM): `gates.py:423,507` - Incorrect tail for negative trigger factor
  - Risk gate now computes left-tail P(Δ≤t) when trigger_factor < 0
  - Profit gate uses `abs(trigger_factor)` for symmetric both-tails check
  - Regression tests: 3 tests for negative trigger factor behavior
- Bug 3 (MEDIUM): `gates.py:674` - Utility confidence ignores threshold
  - Changed confidence calculation to use distance from threshold (`margin = lcb - threshold`)
  - Sigmoid function now reflects how far above/below threshold the utility is
  - Regression tests: 4 tests for confidence calculation
- Bug 4 (MEDIUM): `pipeline.py:282` - Queue not drained on stop
  - Added queue drain loop before final flush in `stop()` method
  - Prevents data loss when events are still queued during shutdown
  - Regression tests: 1 test for queue drain on stop
- Bug 5 (HIGH): `encryption.py:543` - PII encryption bypass for lists
  - `_encrypt_dict` didn't recurse into lists, leaving PII unencrypted
  - Added `_encrypt_list()` method for recursive list processing
  - Also fixed `decryption.py` with `_decrypt_list()` and `_verify_list_integrity()`
  - Regression tests: 4 tests for list encryption/decryption
- Test Coverage: 1053 tests (+16 from v3.34.0), 93.83% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
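The shape of the Bug 5 fix (recursing into lists as well as dicts) can be sketched as follows. This is an assumption-laden illustration: the field set, function names, and the `protect` callback are hypothetical, and a placeholder transform stands in for real encryption.

```python
from typing import Any, Callable

# Hypothetical PII field names, for illustration only.
PII_FIELDS = frozenset({"email", "actor_id"})


def protect_value(value: Any, protect: Callable[[str], str]) -> Any:
    """Dispatch on container type so nested dicts and lists are both visited."""
    if isinstance(value, dict):
        return protect_dict(value, protect)
    if isinstance(value, list):
        return protect_list(value, protect)
    return value


def protect_list(items: list, protect: Callable[[str], str]) -> list:
    # The step that was missing before the fix: walk every list element.
    return [protect_value(item, protect) for item in items]


def protect_dict(data: dict, protect: Callable[[str], str]) -> dict:
    out = {}
    for key, value in data.items():
        if key in PII_FIELDS and isinstance(value, str):
            out[key] = protect(value)
        else:
            out[key] = protect_value(value, protect)
    return out
```

Without the list branch, a payload such as `{"voters": [{"email": "..."}]}` would pass through with the nested `email` untouched.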
Changelog (3.35.0)¶
- Hybrid Bug-Hunt Session: Second session with Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
- Bugs Identified: 6 total (1 HIGH, 5 MEDIUM) - fixed in v3.36.0
- Test Coverage: 1037 tests, 94.11% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.34.0)¶
- All Deferred Bugs Fixed: Resolved 17 deferred issues from hybrid bug-hunt sessions
- MEDIUM Severity (1):
- B2-1: `emitter.py:334` - `memory_sink` unbounded growth - added `maxlen` parameter
- LOW Severity (16):
- B1-1,B1-2,B1-3: `proposer.py` - PERT validation and TOCTOU fixes
- B2-4,B2-10,L47: `gates.py`, `utility.py`, `complexity.py` - validation improvements
- B1-4,B1-5: `bip322_provider.py`, `hybrid_kem.py` - documentation and edge case handling
- B2-11,B2-12: `schema.py`, `afa_bridge.py` - nested validation and hash confidence
- B3-1,B3-2,B3-5,B3-7: `consensus.py`, `override.py`, `durable.py`, `models.py` - workflow validation and chain integrity
- Test Coverage: 1037 tests (+81 from v3.33.0), 94.11% coverage (+0.48pp)
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.33.0)¶
- Workflow Bug Fixes (4 MEDIUM Severity): Continued bug-hunt session fixes
- B3-1: `consensus.py` - Empty `eligible_voters` and `actor_roles` validation
- Added validation to reject empty `eligible_voters` with positive `quorum_percentage`
- Previously: quorum could never be met, causing confusing runtime behavior
- Added warning when `get_required_missing()` called with empty `actor_roles` dict
- Regression tests: 6 tests for empty voters and actor_roles scenarios
- B3-2: `override.py` - `is_expired` TOCTOU race condition documentation
- Enhanced `is_expired` docstring to document advisory-only nature (TOCTOU risk)
- Added `check_and_mark_expired()` method for atomic check-and-update operation
- Signature operations already perform atomic expiration check internally
- Regression tests: 5 tests for expiration handling and atomic marking
- B3-5: `durable.py` - `resume_or_create()` ID mismatch detection
- Added `strict_id: bool = True` parameter to detect workflow ID mismatches
- Raises `ValueError` when resumed workflow ID differs from requested ID (indicates caller bug)
- Use `strict_id=False` for legacy behavior (logs warning, returns stored workflow)
- Regression tests: 3 tests for strict_id behavior
- B3-7: `models.py` - `verify_chain_link()` method for chain validation
- Added `verify_chain_link(previous_transition)` to `WorkflowTransition`
- Validates: hash integrity, workflow ID match, state continuity, temporal ordering
- Returns `(is_valid, error_message)` tuple for detailed error reporting
- Regression tests: 7 tests for chain link validation scenarios
- Test Coverage: 956 tests passing (+10 from v3.32.0), 93.63% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.32.0)¶
- Hybrid Bug-Hunt Session: Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
- Lane A (Codex) - Bayesian Zero Override Fix (`src/engine/bayesian.py:357-358`):
- Python truthy `or` fallback ignored explicit zero overrides in `update_prior()`
- Fix: Changed `current_mean or self.prior_mean` -> `self.prior_mean if current_mean is None else current_mean`
- Regression test: `test_update_prior_empty_observations_respects_overrides`
- Lane B (Claude) - 4 MEDIUM Severity Fixes:
- B2-3: `prometheus_exporter.py` - Duplicate metric registration on multi-instantiation
  - Added `_get_or_create_counter()`, `_get_or_create_histogram()`, `_get_or_create_gauge()` factory methods
  - Makes metric registration idempotent via REGISTRY lookup
- B3-3: `override.py` - `reject()` discards actor_id/reason parameters
  - Added `rejected_by`, `rejection_reason`, `rejected_at` fields to OverrideRequest
  - Updated `reject()` to record rejection metadata
  - Added serialization/deserialization for rejection fields
- B3-4: `proposal.py` - `from_dict()` loses prometheus exporter reference
  - Added `from_dict_with_exporter()` factory method for DI during deserialization
  - Added `set_prometheus_exporter()` method for post-construction injection
- Test Coverage: 956 tests passing (+10 from v3.31.0), 93.63% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit, pip-audit)
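The Lane A fix is an instance of a common Python pitfall: `x or default` treats every falsy value, including an explicit `0.0`, as missing. A minimal standalone illustration (the function names are hypothetical; the two expressions mirror the before and after forms quoted above):

```python
from typing import Optional


def pick_mean_buggy(current_mean: Optional[float], prior_mean: float) -> float:
    # Truthiness check: an explicit 0.0 override is silently replaced.
    return current_mean or prior_mean


def pick_mean_fixed(current_mean: Optional[float], prior_mean: float) -> float:
    # Identity check: only None falls back to the prior.
    return prior_mean if current_mean is None else current_mean
```

With `prior_mean=1.5`, the buggy form maps an override of `0.0` to `1.5`, while the fixed form keeps `0.0` and still falls back to `1.5` when the override is `None`.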
Changelog (3.31.0)¶
- Claude-GPT Dialogue Recommendations: Implemented 4 changes from multi-model consensus session
- phi_S/phi_D Single Source of Truth:
- Updated `docs/architecture/afa-libertas-integration.md` line 753-754
- Updated `docs/architecture/repository-structure.md` line 546-549
- All phi_S/phi_D values now reference `schema/interface-contract.yaml` as authoritative source
- Corrected values: phi_S=100 (was 500), phi_D=2000 (was 10000)
- KNOWN_ISSUES.md Cleanup:
- Removed L45 from LOW Severity (intentional design, not a bug)
- Moved L45 to "Intentional Patterns" section with explanation
- Reclassified L7 from "Python limitation" to "Known Limitation" with HSM mitigation path
- Added detailed HSM/KMS integration guidance for production deployments
- docs-consistency.yml CI Workflow: New GitHub Actions workflow
- Validates test count consistency across CLAUDE.md, README.md, ROADMAP.md, gap-analysis.md
- Extracts version and coverage metrics from CLAUDE.md as source of truth
- Weekly scheduled runs + push/PR triggers on documentation changes
- Advisory warnings (non-blocking) for mismatches
- Test Coverage: 946 tests passing, 93.48% coverage (unchanged)
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.30.0)¶
- Deferred Bug Fix Complete: Fixed 2 remaining deferred issues (L44, L49)
- L44 Fix: `analyst.py:345` - Type coercion validation in `_evaluate_utility_gate()`
- Added `_coerce_to_float()` method with try-except float() pattern
- Validates all 8 numeric fields (mean, variance, lcb, ucb for both value/risk)
- Regression tests: 9 tests for type coercion edge cases
- L49 Fix: `hybrid_provider.py:324` - Timing side-channel mitigation
- Added `audit_mode: bool = False` parameter to HybridSignatureProvider
- Detailed timing logs only when `audit_mode=True` for debugging
- Generic error messages in default mode prevent timing leakage
- Regression tests: 6 tests for audit_mode behavior
- Research Verification: Fixes validated via ExaSearch
- L44: try-except float() is standard Python type coercion idiom
- L49: Configurable audit modes per Intel security guidance
- Test Coverage: 946 tests passing (+15 from v3.29.0), 93.48% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.29.0)¶
- Hybrid Bug-Hunt Session: Implementation of 4 priority fixes from bug-hunt plan v3.0
- HIGH Severity Fixes (2):
- H-WF-001: `consensus.py:228` - All-abstain stuck state in QUORUM_MET
  - When quorum met with only abstention votes, workflow stayed stuck indefinitely
  - Fixed: Rejects immediately when quorum met with zero decisive votes
  - Regression tests: 4 tests for partial quorum scenarios
- H-WF-003: `pipeline.py` - Thread safety race conditions
  - Stats counters not lock-protected, `reset_stats()` object replacement race
  - Fixed: Added `_stats_lock`, `get_stats()` returns snapshot, `stop()` timeout warning
  - Regression tests: 5 thread safety tests
- MEDIUM Severity Fixes (3):
- M24: `hybrid_kem.py:279` - Empty plaintext now raises ValueError (with allow_empty opt-in)
- M25: `bip322_provider.py:109` - Keygen max retry limit (1000 attempts)
- M-ENG-005: `pcw_decide.py:187` - Added AttributeError catch for malformed context
- Test Coverage: 931 tests passing (+15 from v3.28.0), 93.48% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.28.0)¶
- Deferred Bug Cleanup: Fixed 16 issues from hybrid bug-hunt (5 MEDIUM, 11 LOW)
- MEDIUM Severity Fixes (5):
- M19: `drift.py:68` - Added `num_bins > 0` and `epsilon > 0` validation
- M20: `bip322_provider.py:173` - Added empty message_hash validation
- M21: `kek_provider.py:224` - Documented version=0 alias for "current"
- M22: `pipeline.py:321` - Added `_start_lock` to fix TOCTOU race in `start()`
- M23: `key_store.py:516` - Added logging when key not found in `record_usage()`
- LOW Severity Fixes (11):
- L41-L43: `gates.py` - Input validation with warnings for out-of-range scores
- L46: `approver.py:139` - Removed redundant validation (covered by `__post_init__`)
- L48: `proposer.py:152` - Better error for already-submitted proposals
- L50: `hybrid_kem.py:279` - Warning for empty plaintext in `encrypt()`
- L51: `mldsa.py:98` - Empty message validation in `sign()`
- L52: `hybrid_provider.py:225` - Empty message_hash validation
- L53: `afa_bridge.py:467` - Positive limit validation in `get_decision_history()`
- L54: `consensus.py:217` - Fixed all-abstain stuck state
- L55: `bayesian.py:362` - Extracted magic number to `_FALLBACK_STD` constant
- Remaining Deferred (2): L45 (intentional), L47 (extreme values valid)
- Test Coverage: 916 tests passing (+4 regression tests), 93.39% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.27.0)¶
- Hybrid Bug-Hunt Session: Codex gpt-5.2-codex (xhigh) + 3 Claude debugger agents
- HIGH Severity Fixes (2 - Codex Lane A):
  - H1: `pcw_decide.py:243` - Human approval bypass when all gates pass
    - CRITICAL: `pcw_decide()` returned PROCEED even with `requires_human_approval=True`
    - Added `human_required` gating to enforce PAUSE/ESCALATE
  - H2: `consensus.py:337` - Workflow ID collision with ProposalWorkflow
    - Namespaced to `consensus:{proposal_id}` to avoid persistence conflicts
- New Issues Identified (Claude Lane B): 5 MEDIUM, 15 LOW (documented in KNOWN_ISSUES.md)
- Test Coverage: 912 tests passing (+2 regression tests), 93.48% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
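The two HIGH fixes above can be illustrated with a small sketch. It is not the real `pcw_decide()` logic: the `Status` enum, the `decide_sketch()` function, the choice of PAUSE over ESCALATE, and the HALT result for failed gates are all assumptions made for illustration. The `consensus:{proposal_id}` ID format is taken from the changelog entry.

```python
from enum import Enum


class Status(Enum):
    PROCEED = "PROCEED"
    PAUSE = "PAUSE"
    HALT = "HALT"


def decide_sketch(gate_results: list[bool], requires_human_approval: bool) -> Status:
    """Hypothetical decision function showing the human-approval gating."""
    if not all(gate_results):
        # Assumption: a failed gate halts; the real mapping may differ.
        return Status.HALT
    if requires_human_approval:
        # The H1 bug: this branch was missing, so passing gates meant PROCEED.
        return Status.PAUSE
    return Status.PROCEED


def consensus_workflow_id(proposal_id: str) -> str:
    """Namespace the ID so it cannot collide with a plain proposal workflow ID."""
    return f"consensus:{proposal_id}"
```

The key point for H1 is ordering: the human-approval check runs after the gates pass but before PROCEED can be returned.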
Changelog (3.26.0)¶
- Rigor Protocol Phase 3: Fixed 13 remaining issues (5 MEDIUM, 8 LOW)
- MEDIUM Severity Fixes (5):
  - M14: `proposal.py:359` - Documented `is_terminal` excludes ROLLED_BACK
  - M15: `pipeline.py:90` - `PipelineStats.errors` bounded to `deque(maxlen=1000)`
  - M16: `drift.py:279` - `calibrate_thresholds()` requires at least 2 values
  - M17: `override.py:976` - Documented `from_dict()` maintenance contract
  - M18: `pipeline.py:390` - Replaced bare `except Exception` in `validate_timestamp()`
- LOW Severity Fixes (8):
  - L33: `approver.py:24` - Added `__post_init__` validation for `ApprovalVote.vote`
  - L34: `emitter.py:164` - Fixed auto-linking `parent_event_id` (correlation chain only)
  - L35: `schema.py:240` - Removed redundant local `import dataclasses`
  - L36: `proposal.py:58` - Added `__post_init__` validation to `ProposalMetadata`
  - L38: `gates.py:62` - Enhanced `GateResult.margin` docstring
  - L39: `pcw_decide.py:28` - Moved `UtilityComponents` import to module level
  - L40: `pcw_decide.py:471` - Added debug logging to `quick_risk_check()`
  - M13: `proposer.py:118` - Enhanced PERT validation comment (reclassified)
- Test Updates: Updated `test_cast_vote_invalid` and `test_stats_defaults` for new validation
- Test Coverage: 910 tests passing, 93.48% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
- Remaining Issues: 2 (M6 non-relative imports; L7 secure memory erase)
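Two of the fixes above, the bounded error list (M15) and `__post_init__` validation (L33), follow common standard-library patterns. The sketch below shows them under stated assumptions: the `StatsSketch` and `VoteSketch` classes and the set of allowed vote values are hypothetical. Only `deque(maxlen=1000)` and the use of `__post_init__` come from the changelog.

```python
from collections import deque
from dataclasses import dataclass, field

# Assumption: the real set of valid votes is not listed in the changelog.
ALLOWED_VOTES = {"approve", "reject", "abstain"}


@dataclass
class StatsSketch:
    # A bounded deque drops the oldest entry once 1000 errors are stored,
    # so memory use stays flat in a long-running process.
    errors: deque = field(default_factory=lambda: deque(maxlen=1000))


@dataclass
class VoteSketch:
    vote: str

    def __post_init__(self) -> None:
        # Runs right after the generated __init__, so an invalid object
        # can never be constructed.
        if self.vote not in ALLOWED_VOTES:
            raise ValueError(f"invalid vote: {self.vote!r}")
```

Validating in `__post_init__` is also why a separate check elsewhere can become redundant, as in the L46 cleanup listed under 3.28.0.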
Changelog (3.25.0)¶
- Rigor Protocol Phase 2: Fixed 17 issues (4 MEDIUM, 13 LOW) with 25 regression tests
- Test Coverage: 910 tests passing (+25), 93.48% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.24.0)¶
- Rigor Protocol Phase 1: Fixed 7 issues (2 MEDIUM, 4 LOW, 1 documentation)
- Claude-GPT Dialogue: Resolved M15 retry logic architecture (Hybrid approach)
- Test Coverage: 885 tests passing, 93.48% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.23.0)¶
- Minor Documented Issues Fixed: 11 issues from KNOWN_ISSUES.md resolved
- Test Coverage: 885 tests passing (+18 from 867), 93.48% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.22.0)¶
- Hybrid Bug-Hunt Session Complete: Codex gpt-5.2-codex (3 iterations) + 3 Claude debugger agents
- Test Coverage: 867 tests passing (+9 from 858), 93.79% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.21.0)¶
- Hybrid Bug-Hunt Session: 3 Claude debugger agents reviewing 42 source files in parallel
- Test Coverage: 858 tests passing (+4 from 854), 93.79% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.20.0)¶
- All LOW Severity Bugs Fixed: Complete resolution of L1-L9 from hybrid bug-hunt
- Test Coverage: 854 tests passing (+8 from 846), 93.79% coverage
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.19.0)¶
- Hybrid Bug-Hunt Session: Codex gpt-5.2-codex + 3 Claude debugger agents
- Test Coverage: 845 tests passing (+6 from 839)
- Quality Gates: All passing (mypy --strict, ruff, black)
Changelog (3.18.0)¶
- Quality Gate Fixes: Full quality-gate execution with all 8 phases
- Test Coverage: 839 tests passing
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.17.0)¶
- Bug Fixes (KNOWN_ISSUES.md): Fixed 4 MEDIUM severity bugs from hybrid bug-hunt
- Test Coverage: 839 tests passing (+2 from 837)
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.16.0)¶
- Bug Hunt Session: Hybrid bug-hunt with 3 Claude debugger agents
- Test Coverage: 837 tests passing (maintained)
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
Changelog (3.15.0)¶
- Bug Fixes (Rigor Protocol): Fixed 9 `except Exception` patterns from hybrid bug-hunt session
- Test Coverage: 837 tests passing (+9 from 828)
- Quality Gates: All passing (mypy --strict, ruff, black, bandit)
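The changelog does not list the nine call sites, but the general shape of narrowing a broad `except Exception` can be sketched as below. The function `parse_timestamp_sketch()` is hypothetical and is not the project's `validate_timestamp()`. It assumes ISO 8601 input and catches only the exceptions that `datetime.fromisoformat()` raises for bad input.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp_sketch(value: object) -> Optional[datetime]:
    """Return the parsed timestamp, or None when the input is malformed."""
    try:
        return datetime.fromisoformat(value)  # type: ignore[arg-type]
    except (TypeError, ValueError):
        # TypeError: value is not a string. ValueError: not ISO 8601.
        # Any other exception propagates instead of being silently hidden.
        return None
```

Catching only the expected exception types keeps genuine bugs, such as an unexpected `AttributeError`, visible to callers and to tests.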
Changelog (3.14.0)¶
- Bug Fixes (Rigor Protocol): Fixed 3 MEDIUM severity bugs from hybrid bug-hunt session
- Test Coverage: 828 tests passing (+7 from 821)
- Quality Gates: All passing (mypy --strict, ruff, black)
Changelog (3.13.0)¶
- Mathematical Coherence Review: Addressed 4 design decisions from rigor protocol review
- Test Coverage: 821 tests passing (+14 from 807), 93.34% coverage
Changelog (3.12.0)¶
- Optional Dependencies Installed: Full post-quantum cryptography and BIP-322 now active
- Test Coverage Milestone: All tests now pass with 0 skipped (821 passed)
Changelog (3.11.0)¶
- Mathematical Coherence Fixes: Implemented 4 critical fixes from multi-model coherence review
- Test Coverage: Added 20 new tests (506 total passing, 282 skipped)
Changelog (3.10.0)¶
- ADR Consolidation Complete: Finished consolidating ADR directories
Changelog (3.9.0)¶
- Logic Coherence Fixes: Public API improvement, factory method, overflow protection
Changelog (3.8.0)¶
- Documentation Synchronization & Future Work Roadmap
Changelog (3.7.0)¶
- GAP-L1 PHASE 1 IMPLEMENTED: Prometheus Metrics Foundation
Changelog (3.6.0)¶
- Index & Cross-Reference Update: Systematic update of all TOCs, indexes, and cross-references
Changelog (3.5.0)¶
- Documentation Enhancement & Quality Gates: CI/CD badges, test count methodology, dependency visualization
Changelog (3.4.0)¶
- Repository Nomenclature Update: Clarified AEGIS vs Guardrails naming
Changelog (3.3.0)¶
- GAP-Q2 PHASE 2 COMPLETE: Full post-quantum encryption implementation
- TEST COVERAGE MILESTONE: 846 tests passing (444 new), 93.60% coverage
Changelog (3.2.0)¶
- GAP-Q1 IMPLEMENTED: Post-quantum hybrid signatures (Ed25519 + ML-DSA-44)
Changelog (3.1.0)¶
- GAP-M4 IMPLEMENTED: Full BIP-322 signature format support
Changelog (3.0.0)¶
- AEGIS v1.0.0 RELEASED: Production-ready release with full CI/CD validation
Changelog (2.9.0)¶
- Documentation Synchronization: Comprehensive audit and alignment of all documentation
Changelog (2.8.0)¶
- Repository Migration: Restructured for implementation-ready architecture
Changelog (2.7.0)¶
- AEGIS Integration: Unified five frameworks into Autonomous Engineering Governance System
Changelog (2.6.0)¶
- Documentation synchronization & cleanup review completed
Changelog (2.5.0)¶
- Added EPCC methodology documentation
Changelog (2.4.0)¶
- Fixed markdown linting issues
Changelog (2.3.0)¶
- Documentation synchronization audit completed
Changelog (2.2.0)¶
- Added framework comparison analysis
Changelog (2.1.0)¶
- Removed [PROVISIONAL] tags - tooling now configured