# EPCC Implementation Methodology

**Version**: 1.1.0
**Created**: 2025-12-26
**Scope**: Implementation issues requiring architectural decisions
**Applicable Issues**: #1, #2, #5, #6, #7, #8, #9
## Overview

The EPCC (Explore, Plan, Code, Commit) methodology provides a structured approach to implementing complex features that require research validation, architectural decisions, and cross-functional coordination.
**When to Use This Methodology:**
- Building new services or systems
- Implementing security controls (RBAC, encryption)
- Creating monitoring/observability infrastructure
- Performance optimization requiring benchmarking
- Disaster recovery implementation
- Any work requiring architectural decisions
**When NOT to Use:**
- Specification/documentation updates (use simpler approach)
- Bug fixes with clear solutions
- Configuration changes
- Minor enhancements
## Phase 1: Research & Validation

### 1.1 Current State Analysis

```markdown
## Current State
- [ ] Document existing implementation (if any)
- [ ] Identify pain points and gaps
- [ ] Gather stakeholder requirements
- [ ] Review related issues and prior decisions
```
### 1.2 Industry Research

**Required Research Queries** (use Exa Search, Context7, web search):
| Topic | Query Pattern | Sources to Check |
|---|---|---|
| Best practices | {topic} best practices 2025 | Official docs, tech blogs |
| Patterns | {topic} design patterns production | GitHub, architecture guides |
| Benchmarks | {topic} performance benchmarks | Cloud provider docs, papers |
| Security | {topic} security considerations OWASP | OWASP, NIST, CIS |
| Case studies | {topic} implementation case study | Engineering blogs |
### 1.3 Technology Evaluation

```markdown
## Technology Options
| Option | Pros | Cons | Fit Score |
|--------|------|------|-----------|
| Option A | | | /10 |
| Option B | | | /10 |
| Option C | | | /10 |

**Selected**: [Option] because [rationale]
```
### 1.4 Research Documentation

Create `/docs/research/{issue-number}-{topic}.md`:

```markdown
# Research: {Topic}
**Issue**: #{number}
**Date**: 2026-02-10 (example — replace with actual research date)
**Researcher**: [name]

## Questions Investigated
1. ...

## Sources Consulted
<!-- Replace placeholder URLs below with actual research sources -->
- [Source 1](url) - Key finding
- [Source 2](url) - Key finding

## Findings Summary
...

## Recommendations
...
```
## Phase 2: Architecture Overview

### 2.1 High-Level Design

```markdown
## Architecture Overview

### Components
- Component A: [responsibility]
- Component B: [responsibility]

### Interactions
[Mermaid diagram or description]

### Data Flow
[Sequence diagram or description]
```
### 2.2 Architecture Decision Record (ADR)

Create `/docs/adr/{number}-{title}.md`:

```markdown
# ADR-{number}: {Title}
**Status**: Proposed | Accepted | Deprecated | Superseded
**Date**: 2026-02-10 (example — replace with actual decision date)
**Decision Makers**: [names]

## Context
What is the issue we're addressing?

## Decision Drivers
- Driver 1
- Driver 2
- Driver 3

## Considered Options
1. Option A
2. Option B
3. Option C

## Decision Outcome
**Chosen Option**: Option X

### Rationale
Why this option was selected.

### Consequences
**Positive:**
- ...

**Negative:**
- ...

**Risks:**
- ...

## Validation
How will we verify this decision was correct?

## References
- [Link 1]
- [Link 2]
```
### 2.3 Technology Stack

```markdown
## Technology Stack
| Layer | Technology | Version | Rationale |
|-------|------------|---------|-----------|
| Runtime | | | |
| Framework | | | |
| Database | | | |
| Messaging | | | |
| Monitoring | | | |
```
## Phase 3: Implementation Strategy (EPCC Plan)

### 3.1 Milestone Breakdown

```markdown
## Implementation Milestones

### Milestone 1: Foundation
- [ ] Task 1.1
- [ ] Task 1.2

**Exit Criteria**: [measurable outcome]

### Milestone 2: Core Implementation
- [ ] Task 2.1
- [ ] Task 2.2

**Exit Criteria**: [measurable outcome]

### Milestone 3: Integration
- [ ] Task 3.1
- [ ] Task 3.2

**Exit Criteria**: [measurable outcome]

### Milestone 4: Hardening
- [ ] Task 4.1
- [ ] Task 4.2

**Exit Criteria**: [measurable outcome]
```
### 3.2 Dependencies

```markdown
## Dependencies

### Prerequisites (must exist before starting)
- [ ] Dependency 1
- [ ] Dependency 2

### Parallel Work (can proceed simultaneously)
- [ ] Work stream A
- [ ] Work stream B

### Blocking Dependencies (gates progress)
- [ ] Gate 1 → unlocks Milestone 2
- [ ] Gate 2 → unlocks Milestone 3
```
### 3.3 Risk Mitigation

```markdown
## Risk Register
| Risk | Likelihood | Impact | Mitigation | Owner |
|------|------------|--------|------------|-------|
| R1 | H/M/L | H/M/L | Strategy | Name |
| R2 | H/M/L | H/M/L | Strategy | Name |

## Contingency Plans

### If [Risk 1] occurs:
1. Step 1
2. Step 2

### If [Risk 2] occurs:
1. Step 1
2. Step 2
```
## Phase 4: Technical Excellence

### 4.1 Design Patterns

```markdown
## Design Patterns Applied
| Pattern | Purpose | Location |
|---------|---------|----------|
| Pattern 1 | Why used | Where applied |
| Pattern 2 | Why used | Where applied |
```
### 4.2 Performance Requirements

```markdown
## Performance Targets
| Metric | Target | Measurement Method |
|--------|--------|-------------------|
| Latency (p50) | X ms | APM tool |
| Latency (p95) | Y ms | APM tool |
| Latency (p99) | Z ms | APM tool |
| Throughput | N req/s | Load test |
| Error rate | <X% | Monitoring |
| Availability | X.XX% | Uptime tracking |
```
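The percentile targets above can be checked directly against raw latency samples, e.g. from a local load test. A minimal sketch using only the standard library; the metric names and the `targets_ms` shape are illustrative, not part of the methodology:

```python
import statistics

def latency_percentiles(samples_ms):
    """Compute p50/p95/p99 from a list of latency samples in milliseconds."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples")
    # statistics.quantiles with n=100 yields the 99 percentile cut points
    qs = statistics.quantiles(samples_ms, n=100)
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

def meets_targets(samples_ms, targets_ms):
    """Check measured percentiles against targets, e.g. {"p95": 500}."""
    measured = latency_percentiles(samples_ms)
    return all(measured[k] <= limit for k, limit in targets_ms.items())
```

In practice these numbers come from the APM tool named in the table; a sketch like this is only useful for quick offline analysis of captured samples.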
### 4.3 Security Measures

```markdown
## Security Checklist

### Authentication & Authorization
- [ ] Authentication mechanism defined
- [ ] Authorization model documented
- [ ] RBAC roles mapped

### Data Protection
- [ ] Encryption at rest
- [ ] Encryption in transit
- [ ] PII handling documented

### Audit & Compliance
- [ ] Audit logging implemented
- [ ] Compliance requirements mapped
- [ ] Security review scheduled
```
### 4.4 Reliability Measures

```markdown
## Reliability Design

### Failure Modes
| Component | Failure Mode | Detection | Recovery |
|-----------|--------------|-----------|----------|
| A | Crash | Health check | Auto-restart |
| B | Timeout | Circuit breaker | Fallback |

### Redundancy
- [ ] Multi-AZ deployment
- [ ] Database replication
- [ ] Load balancing

### Disaster Recovery
- RPO: X minutes
- RTO: Y minutes
- Backup strategy: [description]
```
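The timeout/fallback row in the failure-mode table pairs a circuit breaker with a fallback path. A deliberately minimal sketch of that pattern; the failure threshold and reset window are placeholder values, not recommendations:

```python
import time

class CircuitBreaker:
    """Illustrative circuit breaker: opens after N consecutive failures,
    serves the fallback while open, and retries after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()  # open: skip the protected call entirely
            self.opened_at = None  # half-open: allow one trial call
            self.failures = 0
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0
        return result
```

Production implementations would add per-error-type policies and metrics; the point here is only the state transition closed → open → half-open.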
## Phase 5: Development Workflow

### 5.1 Environment Setup

```markdown
## Development Environment

### Prerequisites
- Tool 1: version X.Y
- Tool 2: version X.Y

### Setup Steps
1. Clone repository
2. Install dependencies
3. Configure environment
4. Verify setup

### Configuration
| Variable | Purpose | Default |
|----------|---------|---------|
| VAR_1 | Description | value |
| VAR_2 | Description | value |
```
### 5.2 Testing Strategy

```markdown
## Testing Strategy

### Unit Tests
- Coverage target: X%
- Framework: [name]
- Location: `tests/unit/`

### Integration Tests
- Scope: [description]
- Framework: [name]
- Location: `tests/integration/`

### End-to-End Tests
- Scenarios: [list]
- Framework: [name]
- Location: `tests/e2e/`

### Performance Tests
- Tool: [name]
- Scenarios: [list]
- Baselines: [metrics]
```
### 5.3 CI/CD Pipeline

```markdown
## Pipeline Stages
1. **Build**
   - Compile/transpile
   - Dependency resolution
   - Artifact creation
2. **Test**
   - Unit tests
   - Integration tests
   - Security scan
3. **Deploy (Staging)**
   - Infrastructure provisioning
   - Application deployment
   - Smoke tests
4. **Deploy (Production)**
   - Canary deployment
   - Health verification
   - Rollback capability
```
### 5.4 Monitoring & Observability

```markdown
## Observability Stack

### Metrics
- Platform: [CloudWatch/Prometheus/etc.]
- Key metrics: [list]
- Dashboards: [locations]

### Logging
- Platform: [CloudWatch/ELK/etc.]
- Log levels: [policy]
- Retention: [duration]

### Tracing
- Platform: [X-Ray/Jaeger/etc.]
- Sampling rate: [percentage]
- Key spans: [list]

### Alerting
| Alert | Condition | Severity | Runbook |
|-------|-----------|----------|---------|
| A | threshold | P1/P2/P3 | link |
| B | threshold | P1/P2/P3 | link |
```
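The alerting table maps naturally onto threshold rules evaluated against the latest metric values. A hypothetical sketch of that evaluation; the rule shape, metric names, and operators are invented for illustration and would normally live in the alerting platform, not application code:

```python
def evaluate_alerts(metrics, rules):
    """Return names of alerts whose threshold condition currently holds.

    metrics: mapping of metric name -> latest value
    rules:   list of dicts like
             {"name": ..., "metric": ..., "op": ">", "threshold": ...}
    """
    ops = {">": lambda a, b: a > b, "<": lambda a, b: a < b}
    fired = []
    for rule in rules:
        value = metrics.get(rule["metric"])
        # Missing metrics are skipped here; a real system would likely
        # treat absence of data as its own alert condition.
        if value is not None and ops[rule["op"]](value, rule["threshold"]):
            fired.append(rule["name"])
    return fired
```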
## Phase 6: Success Criteria & Metrics

### 6.1 Definition of Done

```markdown
## Implementation Complete When:

### Functional
- [ ] All acceptance criteria met
- [ ] Edge cases handled
- [ ] Error handling implemented

### Quality
- [ ] Code review approved
- [ ] Test coverage ≥ X%
- [ ] No critical/high security issues

### Operational
- [ ] Monitoring configured
- [ ] Alerts defined
- [ ] Runbooks created

### Documentation
- [ ] API documentation complete
- [ ] Architecture diagrams updated
- [ ] ADR finalized
```
### 6.2 Performance Benchmarks

```markdown
## Benchmark Results
| Metric | Target | Achieved | Status |
|--------|--------|----------|--------|
| p95 latency | <500ms | Xms | ✅/❌ |
| Throughput | >1000/s | X/s | ✅/❌ |
| Error rate | <0.1% | X% | ✅/❌ |
```
### 6.3 Quality Gates

```markdown
## Quality Gates
| Gate | Requirement | Verification |
|------|-------------|--------------|
| Code Quality | Lint pass, no critical issues | CI pipeline |
| Test Coverage | ≥80% | Coverage report |
| Security | No high/critical vulns | Security scan |
| Performance | Meets SLOs | Load test |
| Documentation | Complete | Review checklist |
```
## Issue-Specific Application
### Issue #1: GAP-DriftThreshold

**Phase Focus**: Research (30-day data analysis), Validation (threshold tuning)

**Research Queries**:
- "KL divergence threshold calibration ML production"
- "distribution drift detection false positive rate"
- "concept drift monitoring best practices 2025"
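A drift check of the kind these queries investigate can be sketched as a discrete KL divergence over binned feature distributions. The smoothing constant and the threshold below are placeholders; determining a well-calibrated threshold is precisely the research task for this issue:

```python
import math

def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) for two aligned discrete probability distributions.

    eps smooths zero bins so the log stays finite; real calibration work
    would choose the binning and smoothing deliberately.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def drift_detected(baseline, current, threshold=0.1):
    """Flag drift when KL(current || baseline) exceeds the threshold."""
    return kl_divergence(current, baseline) > threshold
```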
### Issue #2: GAP-PerfTest

**Phase Focus**: Testing Strategy, Performance Benchmarks

**Research Queries**:
- "load testing ML inference service"
- "p95 latency optimization techniques"
- "performance testing guardrail systems"
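A first-pass load test for this issue can be a concurrent loop that records per-request latency. A sketch only; the request function, request count, and concurrency level are placeholders, and real benchmarking would use a dedicated tool as the testing strategy specifies:

```python
import concurrent.futures
import time

def run_load(request_fn, total_requests=100, concurrency=10):
    """Fire requests from a thread pool; return per-request latencies in seconds."""
    def timed_call(_):
        start = time.monotonic()
        request_fn()  # caller supplies the actual request against the service
        return time.monotonic() - start

    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(total_requests)))
```

The resulting latency list feeds directly into percentile analysis against the p95/p99 targets.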
### Issue #5: GAP-OverrideAudit

**Phase Focus**: Security Measures, Audit & Compliance

**Research Queries**:
- "audit logging best practices immutable"
- "two-person rule implementation patterns"
- "security event alerting architecture"
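One common pattern the "immutable audit logging" query surfaces is hash chaining: each entry commits to its predecessor, so any silent edit breaks the chain. A minimal sketch with illustrative field names; durable storage and key management are out of scope here:

```python
import hashlib
import json

def append_entry(log, event):
    """Append an event dict to the log, chaining it to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    payload = json.dumps(event, sort_keys=True)  # canonical form for hashing
    entry_hash = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
    log.append({"event": event, "prev": prev_hash, "hash": entry_hash})

def verify_chain(log):
    """Recompute every hash; any tampered or reordered entry fails verification."""
    prev = "0" * 64
    for entry in log:
        payload = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```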
### Issue #6: GAP-TelemetryPrivacy

**Phase Focus**: Security Measures, Data Protection

**Research Queries**:
- "PII redaction telemetry pipelines"
- "field-level encryption logging"
- "GDPR compliant logging architecture"
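Pattern-based redaction is the simplest building block the PII-redaction research will encounter. A sketch with two illustrative patterns only; production redaction needs far broader coverage (names, addresses, tokens) and is usually done with a vetted library rather than hand-rolled regexes:

```python
import re

# Illustrative patterns only — NOT a complete PII inventory.
PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(record):
    """Replace matched PII in a log line with a labeled redaction marker."""
    for name, pattern in PATTERNS.items():
        record = pattern.sub(f"[REDACTED:{name}]", record)
    return record
```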
### Issue #7: GAP-RBAC-Enforcement

**Phase Focus**: Full methodology (complex security implementation)

**Research Queries**:
- "RBAC implementation patterns cloud native"
- "NIST RBAC model implementation"
- "fine-grained access control telemetry"
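At its core, RBAC enforcement is a per-request permission lookup. A deliberately simplified sketch; the role and permission names are invented, and the real model would follow the NIST RBAC patterns the queries above point to (role hierarchies, sessions, constraints):

```python
# Hypothetical role-to-permission mapping; a real system would load this
# from a policy store, not hard-code it.
ROLE_PERMISSIONS = {
    "viewer": {"metrics:read"},
    "operator": {"metrics:read", "alerts:ack"},
    "admin": {"metrics:read", "alerts:ack", "config:write"},
}

def is_authorized(role, permission):
    """Deny by default: unknown roles and unmapped permissions both fail."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```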
### Issue #8: GAP-DR-Drill

**Phase Focus**: Reliability Measures, Contingency Plans

**Research Queries**:
- "disaster recovery drill methodology"
- "RTO RPO validation testing"
- "chaos engineering DR testing"
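RTO validation is, at its core, a timed exercise. A sketch of drill-harness logic that records whether a restore procedure met its target; `restore_fn` and the target value are placeholders for whatever the DR plan defines:

```python
import time

def timed_drill(restore_fn, rto_target_s):
    """Run a restore procedure and time it against the RTO target in seconds."""
    start = time.monotonic()
    restore_fn()  # the actual restore steps from the DR runbook
    elapsed = time.monotonic() - start
    return {"elapsed_s": elapsed, "rto_met": elapsed <= rto_target_s}
```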
### Issue #9: GAP-MonitoringDashboard

**Phase Focus**: Observability Stack, Architecture Overview

**Research Queries**:
- "ML model monitoring dashboard design"
- "guardrail metrics visualization"
- "Grafana CloudWatch dashboard patterns"
## Templates Location

Templates and implementation plans are organized as follows:

```text
docs/
├── implementation-plans/ # EPCC plans per issue (ACTIVE)
│   ├── 001-drift-threshold-calibration.md
│   ├── 002-performance-load-testing.md
│   ├── 005-override-audit-logging.md
│   ├── 006-telemetry-pii-redaction.md
│   ├── 007-rbac-enforcement.md
│   ├── 008-disaster-recovery-drill.md
│   └── 009-monitoring-dashboard.md
├── adr/ # Architecture Decision Records (create as needed)
└── research/ # Research documentation (create as needed)
```
Note: The adr/ and research/ directories should be created when needed for formal architecture decisions or research documentation. See the implementation plans above for examples of the EPCC format in practice.
## Changelog
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.1.0 | 2025-12-26 | Claude Code | Updated templates location to reflect actual plans |
| 1.0.0 | 2025-12-26 | Claude Code | Initial creation |