# EPCC Implementation Methodology

**Version**: 1.1.0
**Created**: 2025-12-26
**Scope**: Implementation issues requiring architectural decisions
**Applicable Issues**: #1, #2, #5, #6, #7, #8, #9
## Overview

The EPCC (Explore, Plan, Code, Commit) methodology provides a structured approach to implementing complex features that require research validation, architectural decisions, and cross-functional coordination.
**When to Use This Methodology:**
- Building new services or systems
- Implementing security controls (RBAC, encryption)
- Creating monitoring/observability infrastructure
- Performance optimization requiring benchmarking
- Disaster recovery implementation
- Any work requiring architectural decisions
**When NOT to Use:**
- Specification/documentation updates (use simpler approach)
- Bug fixes with clear solutions
- Configuration changes
- Minor enhancements
## Phase 1: Research & Validation

### 1.1 Current State Analysis

```markdown
## Current State
- [ ] Document existing implementation (if any)
- [ ] Identify pain points and gaps
- [ ] Gather stakeholder requirements
- [ ] Review related issues and prior decisions
```
### 1.2 Industry Research

**Required Research Queries** (use Exa Search, Context7, web search):
| Topic | Query Pattern | Sources to Check |
|---|---|---|
| Best practices | {topic} best practices 2025 | Official docs, tech blogs |
| Patterns | {topic} design patterns production | GitHub, architecture guides |
| Benchmarks | {topic} performance benchmarks | Cloud provider docs, papers |
| Security | {topic} security considerations OWASP | OWASP, NIST, CIS |
| Case studies | {topic} implementation case study | Engineering blogs |
### 1.3 Technology Evaluation

```markdown
## Technology Options
| Option | Pros | Cons | Fit Score |
|--------|------|------|-----------|
| Option A | | | /10 |
| Option B | | | /10 |
| Option C | | | /10 |

**Selected**: [Option] because [rationale]
```
### 1.4 Research Documentation

Create `/docs/research/{issue-number}-{topic}.md`:

```markdown
# Research: {Topic}
**Issue**: #{number}
**Date**: 2026-02-10 (example — replace with actual research date)
**Researcher**: [name]

## Questions Investigated
1. ...

## Sources Consulted
<!-- Replace placeholder URLs below with actual research sources -->
- [Source 1](url) - Key finding
- [Source 2](url) - Key finding

## Findings Summary
...

## Recommendations
...
```
## Phase 2: Architecture Overview

### 2.1 High-Level Design

```markdown
## Architecture Overview

### Components
- Component A: [responsibility]
- Component B: [responsibility]

### Interactions
[Mermaid diagram or description]

### Data Flow
[Sequence diagram or description]
```
### 2.2 Architecture Decision Record (ADR)

Create `/docs/adr/{number}-{title}.md`:

```markdown
# ADR-{number}: {Title}
**Status**: Proposed | Accepted | Deprecated | Superseded
**Date**: 2026-02-10 (example — replace with actual decision date)
**Decision Makers**: [names]

## Context
What is the issue we're addressing?

## Decision Drivers
- Driver 1
- Driver 2
- Driver 3

## Considered Options
1. Option A
2. Option B
3. Option C

## Decision Outcome
**Chosen Option**: Option X

### Rationale
Why this option was selected.

### Consequences
**Positive:**
- ...

**Negative:**
- ...

**Risks:**
- ...

## Validation
How will we verify this decision was correct?

## References
- [Link 1]
- [Link 2]
```
### 2.3 Technology Stack

```markdown
## Technology Stack
| Layer | Technology | Version | Rationale |
|-------|------------|---------|-----------|
| Runtime | | | |
| Framework | | | |
| Database | | | |
| Messaging | | | |
| Monitoring | | | |
```
## Phase 3: Implementation Strategy (EPCC Plan)

### 3.1 Milestone Breakdown

```markdown
## Implementation Milestones

### Milestone 1: Foundation
- [ ] Task 1.1
- [ ] Task 1.2

**Exit Criteria**: [measurable outcome]

### Milestone 2: Core Implementation
- [ ] Task 2.1
- [ ] Task 2.2

**Exit Criteria**: [measurable outcome]

### Milestone 3: Integration
- [ ] Task 3.1
- [ ] Task 3.2

**Exit Criteria**: [measurable outcome]

### Milestone 4: Hardening
- [ ] Task 4.1
- [ ] Task 4.2

**Exit Criteria**: [measurable outcome]
```
### 3.2 Dependencies

```markdown
## Dependencies

### Prerequisites (must exist before starting)
- [ ] Dependency 1
- [ ] Dependency 2

### Parallel Work (can proceed simultaneously)
- [ ] Work stream A
- [ ] Work stream B

### Blocking Dependencies (gates progress)
- [ ] Gate 1 → unlocks Milestone 2
- [ ] Gate 2 → unlocks Milestone 3
```
### 3.3 Risk Mitigation

```markdown
## Risk Register
| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| R1 | H/M/L | H/M/L | Strategy | Name |
| R2 | H/M/L | H/M/L | Strategy | Name |

## Contingency Plans

### If [Risk 1] occurs:
1. Step 1
2. Step 2

### If [Risk 2] occurs:
1. Step 1
2. Step 2
```
## Phase 4: Technical Excellence

### 4.1 Design Patterns

```markdown
## Design Patterns Applied
| Pattern | Purpose | Location |
|---------|---------|----------|
| Pattern 1 | Why used | Where applied |
| Pattern 2 | Why used | Where applied |
```
### 4.2 Performance Requirements

```markdown
## Performance Targets
| Metric | Target | Measurement Method |
|--------|--------|-------------------|
| Latency (p50) | X ms | APM tool |
| Latency (p95) | Y ms | APM tool |
| Latency (p99) | Z ms | APM tool |
| Throughput | N req/s | Load test |
| Error rate | <X% | Monitoring |
| Availability | X.XX% | Uptime tracking |
```
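The percentile targets above can be checked directly against raw latency samples, e.g. from a local load test. A minimal sketch using only the standard library; the metric names and the `targets_ms` shape are illustrative, not part of the methodology:

```python
import statistics

def latency_percentiles(samples_ms):
    """Compute p50/p95/p99 from a list of latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # statistics.quantiles with n=100 yields the 99 percentile cut points
    qs = statistics.quantiles(samples_ms, n=100)
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

def meets_targets(samples_ms, targets_ms):
    """Check measured percentiles against targets, e.g. {"p95": 500}."""
    measured = latency_percentiles(samples_ms)
    return all(measured[k] <= limit for k, limit in targets_ms.items())
```

In practice these numbers come from the APM tool named in the table; a sketch like this is only useful for quick offline analysis of captured samples.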
### 4.3 Security Measures

```markdown
## Security Checklist

### Authentication & Authorization
- [ ] Authentication mechanism defined
- [ ] Authorization model documented
- [ ] RBAC roles mapped

### Data Protection
- [ ] Encryption at rest
- [ ] Encryption in transit
- [ ] PII handling documented

### Audit & Compliance
- [ ] Audit logging implemented
- [ ] Compliance requirements mapped
- [ ] Security review scheduled
```
### 4.4 Reliability Measures

```markdown
## Reliability Design

### Failure Modes
| Component | Failure Mode | Detection | Recovery |
|-----------|--------------|-----------|----------|
| A | Crash | Health check | Auto-restart |
| B | Timeout | Circuit breaker | Fallback |

### Redundancy
- [ ] Multi-AZ deployment
- [ ] Database replication
- [ ] Load balancing

### Disaster Recovery
- RPO: X minutes
- RTO: Y minutes
- Backup strategy: [description]
```
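The timeout/fallback row in the failure-mode table pairs a circuit breaker with a fallback path. A deliberately minimal sketch of that pattern; the failure threshold and reset window are placeholder values, not recommendations:

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker: opens after N consecutive failures,
    serves the fallback while open, and retries after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()  # open: skip the protected call entirely
            self.opened_at = None  # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        return result
```

Production implementations would add per-error-type policies and metrics; the point here is only the state transition closed → open → half-open.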
## Phase 5: Development Workflow

### 5.1 Environment Setup

```markdown
## Development Environment

### Prerequisites
- Tool 1: version X.Y
- Tool 2: version X.Y

### Setup Steps
1. Clone repository
2. Install dependencies
3. Configure environment
4. Verify setup

### Configuration
| Variable | Purpose | Default |
|----------|---------|---------|
| VAR_1 | Description | value |
| VAR_2 | Description | value |
```
### 5.2 Testing Strategy

```markdown
## Testing Strategy

### Unit Tests
- Coverage target: X%
- Framework: [name]
- Location: `tests/unit/`

### Integration Tests
- Scope: [description]
- Framework: [name]
- Location: `tests/integration/`

### End-to-End Tests
- Scenarios: [list]
- Framework: [name]
- Location: `tests/e2e/`

### Performance Tests
- Tool: [name]
- Scenarios: [list]
- Baselines: [metrics]
```
### 5.3 CI/CD Pipeline

```markdown
## Pipeline Stages
1. **Build**
   - Compile/transpile
   - Dependency resolution
   - Artifact creation
2. **Test**
   - Unit tests
   - Integration tests
   - Security scan
3. **Deploy (Staging)**
   - Infrastructure provisioning
   - Application deployment
   - Smoke tests
4. **Deploy (Production)**
   - Canary deployment
   - Health verification
   - Rollback capability
```
### 5.4 Monitoring & Observability

```markdown
## Observability Stack

### Metrics
- Platform: [CloudWatch/Prometheus/etc.]
- Key metrics: [list]
- Dashboards: [locations]

### Logging
- Platform: [CloudWatch/ELK/etc.]
- Log levels: [policy]
- Retention: [duration]

### Tracing
- Platform: [X-Ray/Jaeger/etc.]
- Sampling rate: [percentage]
- Key spans: [list]

### Alerting
| Alert | Condition | Severity | Runbook |
|-------|-----------|----------|---------|
| A | threshold | P1/P2/P3 | link |
| B | threshold | P1/P2/P3 | link |
```
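The alerting table maps naturally onto threshold rules evaluated against the latest metric values. A hypothetical sketch of that evaluation; the rule shape, metric names, and operators are invented for illustration and would normally live in the alerting platform, not application code:

```python
def evaluate_alerts(metrics, rules):
    """Return names of alerts whose threshold condition currently holds.

    metrics: mapping of metric name -> latest value
    rules:   list of dicts like
             {"name": ..., "metric": ..., "op": ">", "threshold": ...}
    """
    ops = {">": lambda a, b: a > b, "<": lambda a, b: a < b}
    fired = []
    for rule in rules:
        value = metrics.get(rule["metric"])
        # Missing metrics are skipped here; a real system would likely
        # treat absence of data as its own alert condition.
        if value is not None and ops[rule["op"]](value, rule["threshold"]):
            fired.append(rule["name"])
    return fired
```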
## Phase 6: Success Criteria & Metrics

### 6.1 Definition of Done

```markdown
## Implementation Complete When:

### Functional
- [ ] All acceptance criteria met
- [ ] Edge cases handled
- [ ] Error handling implemented

### Quality
- [ ] Code review approved
- [ ] Test coverage ≥ X%
- [ ] No critical/high security issues

### Operational
- [ ] Monitoring configured
- [ ] Alerts defined
- [ ] Runbooks created

### Documentation
- [ ] API documentation complete
- [ ] Architecture diagrams updated
- [ ] ADR finalized
```
### 6.2 Performance Benchmarks

```markdown
## Benchmark Results
| Metric | Target | Achieved | Status |
|--------|--------|----------|--------|
| p95 latency | <500ms | Xms | ✅/❌ |
| Throughput | >1000/s | X/s | ✅/❌ |
| Error rate | <0.1% | X% | ✅/❌ |
```
### 6.3 Quality Gates

```markdown
## Quality Gates
| Gate | Requirement | Verification |
|------|-------------|--------------|
| Code Quality | Lint pass, no critical issues | CI pipeline |
| Test Coverage | ≥80% | Coverage report |
| Security | No high/critical vulns | Security scan |
| Performance | Meets SLOs | Load test |
| Documentation | Complete | Review checklist |
```
## Issue-Specific Application
### Issue #1: GAP-DriftThreshold

**Phase Focus**: Research (30-day data analysis), Validation (threshold tuning)

**Research Queries**:
- "KL divergence threshold calibration ML production"
- "distribution drift detection false positive rate"
- "concept drift monitoring best practices 2025"
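A drift check of the kind these queries investigate can be sketched as a discrete KL divergence over binned feature distributions. The smoothing constant and the threshold below are placeholders; determining a well-calibrated threshold is precisely the research task for this issue:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) for two aligned discrete probability distributions.

    eps smooths zero bins so the log stays finite; real calibration work
    would choose the binning and smoothing deliberately.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def drift_detected(baseline, current, threshold=0.1):
    """Flag drift when KL(current || baseline) exceeds the threshold."""
    return kl_divergence(current, baseline) > threshold
```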
### Issue #2: GAP-PerfTest

**Phase Focus**: Testing Strategy, Performance Benchmarks

**Research Queries**:
- "load testing ML inference service"
- "p95 latency optimization techniques"
- "performance testing guardrail systems"
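A first-pass load test for this issue can be a concurrent loop that records per-request latency. A sketch only; the request function, request count, and concurrency level are placeholders, and real benchmarking would use a dedicated tool as the testing strategy specifies:

```python
import concurrent.futures
import time

def run_load(request_fn, total_requests=100, concurrency=10):
    """Fire requests from a thread pool; return per-request latencies in seconds."""
    def timed_call(_):
        start = time.monotonic()
        request_fn()  # caller supplies the actual request against the service
        return time.monotonic() - start

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(total_requests)))
```

The resulting latency list feeds directly into percentile analysis against the p95/p99 targets.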
### Issue #5: GAP-OverrideAudit

**Phase Focus**: Security Measures, Audit & Compliance

**Research Queries**:
- "audit logging best practices immutable"
- "two-person rule implementation patterns"
- "security event alerting architecture"
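One common pattern the "immutable audit logging" query surfaces is hash chaining: each entry commits to its predecessor, so any silent edit breaks the chain. A minimal sketch with illustrative field names; durable storage and key management are out of scope here:

```python
import hashlib
import json

def append_entry(log, event):
    """Append an event dict to the log, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(event, sort_keys=True)  # canonical form for hashing
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"event": event, "prev": prev_hash, "hash": entry_hash})

def verify_chain(log):
    """Recompute every hash; any tampered or reordered entry fails verification."""
    prev = "0" * 64
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```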
### Issue #6: GAP-TelemetryPrivacy

**Phase Focus**: Security Measures, Data Protection

**Research Queries**:
- "PII redaction telemetry pipelines"
- "field-level encryption logging"
- "GDPR compliant logging architecture"
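Pattern-based redaction is the simplest building block the PII-redaction research will encounter. A sketch with two illustrative patterns only; production redaction needs far broader coverage (names, addresses, tokens) and is usually done with a vetted library rather than hand-rolled regexes:

```python
import re

# Illustrative patterns only — NOT a complete PII inventory.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(record):
    """Replace matched PII in a log line with a labeled redaction marker."""
    for name, pattern in PATTERNS.items():
        record = pattern.sub(f"[REDACTED:{name}]", record)
    return record
```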
### Issue #7: GAP-RBAC-Enforcement

**Phase Focus**: Full methodology (complex security implementation)

**Research Queries**:
- "RBAC implementation patterns cloud native"
- "NIST RBAC model implementation"
- "fine-grained access control telemetry"
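At its core, RBAC enforcement is a per-request permission lookup. A deliberately simplified sketch; the role and permission names are invented, and the real model would follow the NIST RBAC patterns the queries above point to (role hierarchies, sessions, constraints):

```python
# Hypothetical role-to-permission mapping; a real system would load this
# from a policy store, not hard-code it.
ROLE_PERMISSIONS = {
    "viewer": {"metrics:read"},
    "operator": {"metrics:read", "alerts:ack"},
    "admin": {"metrics:read", "alerts:ack", "config:write"},
}

def is_authorized(role, permission):
    """Deny by default: unknown roles and unmapped permissions both fail."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```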
### Issue #8: GAP-DR-Drill

**Phase Focus**: Reliability Measures, Contingency Plans

**Research Queries**:
- "disaster recovery drill methodology"
- "RTO RPO validation testing"
- "chaos engineering DR testing"
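RTO validation is, at its core, a timed exercise. A sketch of drill-harness logic that records whether a restore procedure met its target; `restore_fn` and the target value are placeholders for whatever the DR plan defines:

```python
import time

def timed_drill(restore_fn, rto_target_s):
    """Run a restore procedure and time it against the RTO target in seconds."""
    start = time.monotonic()
    restore_fn()  # the actual restore steps from the DR runbook
    elapsed = time.monotonic() - start
    return {"elapsed_s": elapsed, "rto_met": elapsed <= rto_target_s}
```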
### Issue #9: GAP-MonitoringDashboard

**Phase Focus**: Observability Stack, Architecture Overview

**Research Queries**:
- "ML model monitoring dashboard design"
- "guardrail metrics visualization"
- "Grafana CloudWatch dashboard patterns"
## Templates Location

Templates and implementation plans are organized as follows:

```text
docs/
├── implementation-plans/ # EPCC plans per issue (ACTIVE)
│   ├── 001-drift-threshold-calibration.md
│   ├── 002-performance-load-testing.md
│   ├── 005-override-audit-logging.md
│   ├── 006-telemetry-pii-redaction.md
│   ├── 007-rbac-enforcement.md
│   ├── 008-disaster-recovery-drill.md
│   └── 009-monitoring-dashboard.md
├── adr/ # Architecture Decision Records (create as needed)
└── research/ # Research documentation (create as needed)
```
Note: The adr/ and research/ directories should be created when needed for formal architecture decisions or research documentation. See the implementation plans above for examples of the EPCC format in practice.
## Changelog
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.1.0 | 2025-12-26 | Claude Code | Updated templates location to reflect actual plans |
| 1.0.0 | 2025-12-26 | Claude Code | Initial creation |