NATSBridge/docs/SDD_FRAMEWORK.md
2026-03-13 09:47:10 +07:00

SDD + GitOps Documentation Framework

Overview

The SDD (Software Design Documentation) + GitOps Documentation Framework is a structured approach to software development documentation. It aligns technical work with business outcomes through a clear separation of concerns: each document type answers one question for one audience.

Every document in the framework serves a specific purpose, targets a defined audience, and is measured against explicit KPIs or SLOs.


The Documentation Matrix

| Document | Purpose & Rationale (The "Why") | Audience | Format / Content | Measurement (KPI/SLO) | Example (SaaS Context) |
|---|---|---|---|---|---|
| Requirements | The Business North Star. Defines exactly what problem the user has and what success looks like. It prevents "feature creep" by setting hard boundaries on what we will NOT build. | Founder, Team, PM | Format: Shared Wiki (Notion/GitHub Wiki). Content: User stories, business constraints, competitive context, and success metrics. | KPI: Business Outcomes. Measured by User Retention, Conversion Rates, and Monthly Recurring Revenue (MRR). | "The system must process high-volume math so clients see reports instantly. Goal: 15% increase in daily active users." |
| Spec | The Technical Contract. A machine-readable, strictly typed definition of all data interfaces. It is the "Single Source of Truth" that prevents bugs caused by communication gaps between services. | Developers, QA, Automation | Format: OpenAPI/YAML or Protobuf. Content: API endpoints, snake_case key naming, data validation rules, and error response codes. | SLA/SLO: System Performance. Measured by API Uptime (99.9%), Response Latency (<100ms), and Error Rates. | A contract.yaml defining exactly how Julia sends Arrow data to Node.js. It forces user_id to be a UUID. |
| Architecture | The Structural Blueprint. A visual map of how the components (services, DBs, networks) fit together. It shows how the data flows through the 6-node cluster and where bottlenecks live. | Senior Devs, DevOps | Format: Diagrams-as-code (Mermaid.js). Content: System Context diagrams, Database ERDs, Network Security Policies, and Infrastructure maps. | Efficiency Metrics: Resource utilization. Measured by CPU Load (<70%), RAM per pod, and internal network throughput. | A diagram showing the data path: Caddy (Proxy) → Node.js (API) → NATS (Queue) → Julia (Math Engine). |
| Walkthrough | The Intuition & Logic. A narrative guide that explains the "steps" and "rationale" behind end-to-end flows. It's about building a mental model so devs understand why the sequence matters. | The Team, New Hires | Format: TOUR.md file or Loom Video. Content: Step-by-step traces of core features, explanation of architectural trade-offs, and "The Big Picture" flow. | Quality: Developer Velocity. Measured by "Time-to-First-Commit" for new hires and reduction in conceptual bugs. | "End-to-End Trace:" 1. UI sends JSON. 2. API wraps it in Claim-Check. 3. Julia pulls it. Rationale: To avoid NATS memory spikes. |
| Implementation | The Functional Reality. The actual code that does the work. In SDD, the "boring" parts (types/routes) are auto-generated from the Spec to ensure the code never lies. | Developers, Reviewers | Format: Git Repository. Content: Business logic, internal helper functions, Unit Tests, and a README.md for local environment setup. | Code Health: Internal Quality. Measured by Test Coverage (90%+), Linting compliance, and Cyclomatic Complexity. | The SvelteKit frontend components and the specific Julia math-processing functions. |
| Validation | The Enforcement Layer. Automated gates that prove the Implementation matches the Spec. It prevents human error (like changing a key name) from reaching production. | CI/CD Pipeline, QA | Format: GitHub Actions / Tests. Content: Contract tests (Dredd/Prism), Integration tests, and Security scans that run on every pull request. | Compliance: Safety Metrics. Measured by Build Success Rate and 0 "Contract Violations" in the production logs. | A CI job that blocks a Pull Request because a developer used camelCase in a database field instead of snake_case. |
| Maintenance | The Health & Evolution. Defines how to upgrade dependencies, manage technical debt, and rotate secrets. It's the guide for "future-proofing" the software over time. | The Team, DevOps | Format: MAINTENANCE.md. Content: Dependency update schedules, Secret rotation steps, DB Migration logs, and Tech Debt "Graveyard" tracking. | Sustainability: System Longevity. Measured by "Package Age," "Security Vulnerabilities Found," and "Migration Success Rate." | "Steps to upgrade the Julia version across all 6 nodes without downtime using a Blue-Green deployment strategy." |
| Runbook | The Operational Life-Support. The instructions for when the system is alive (or dying). In GitOps, this is the "Desired State" of the infrastructure. | DevOps, SRE, On-call Devs | Format: K8s Manifests (Flux/Argo). Content: Deployment steps, Scaling triggers, Backup/Restore procedures, and "3:00 AM" troubleshooting guides. | Reliability: Operational Health. Measured by MTTR (Mean Time to Recovery) and Error-Free Deployments. | A Flux manifest that ensures 6 replicas of the Julia service are always healthy and restarts them if they hit 80% RAM. |
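
Two of the measurements in the matrix are plain arithmetic, and a small sketch may make them concrete. The code below assumes a 30-day window for the uptime SLO and uses made-up outage durations; neither assumption comes from the framework itself, and the function names are illustrative.

```python
def allowed_downtime_minutes(slo_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime an uptime SLO permits over a window (assumed: 30 days)."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_percent / 100)


def mean_time_to_recovery(outage_minutes: list[float]) -> float:
    """MTTR: the average duration of the recorded outages, in minutes."""
    if not outage_minutes:
        return 0.0
    return sum(outage_minutes) / len(outage_minutes)


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.9)
    mttr = mean_time_to_recovery([12.0, 30.0, 6.0])  # hypothetical outages
    print(f"error budget: {budget:.1f} min, MTTR: {mttr:.1f} min")
```

Under these assumptions, a 99.9% uptime target leaves about 43.2 minutes of downtime per 30 days, which gives the Runbook's MTTR figure something to be compared against.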

Detailed Breakdown of Each Document Type

1. Requirements

Purpose: Establish the Business North Star

The Requirements document is your anchor point. It answers the fundamental question: "What problem are we solving, and how do we know we've succeeded?"

Key Characteristics:

  • Business-Focused: Written in business terms, not technical jargon
  • Boundary-Setting: Explicitly defines what we will NOT build
  • Outcome-Oriented: Focuses on user outcomes, not features

Best Practices:

  • Include user stories that describe the user's perspective
  • Document business constraints (regulatory, legal, compliance)
  • Define competitive context and market positioning
  • Establish clear success metrics from day one

Common Pitfalls to Avoid:

  • Vague descriptions like "improve user experience"
  • Changing requirements without updating the document
  • Not defining what's out of scope

2. Spec (Specification)

Purpose: Create the Technical Contract

The Spec serves as the Single Source of Truth for all data interfaces. It's a machine-readable definition that ensures consistency across services.

Key Characteristics:

  • Machine-Readable: Can be parsed by tools for validation and code generation
  • Strictly Typed: Enforces data types and validation rules
  • Comprehensive: Covers all endpoints, request/response formats, and error codes

Best Practices:

  • Use OpenAPI/Swagger for REST APIs or Protobuf for gRPC
  • Enforce consistent naming conventions (e.g., snake_case)
  • Define validation rules for all data fields
  • Document all possible error responses

Common Pitfalls to Avoid:

  • Letting the spec diverge from the implementation
  • Incomplete error handling documentation
  • Not versioning the API spec
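
To illustrate what "strictly typed" can mean in practice, here is a minimal sketch of a payload check. The `CONTRACT` dictionary is a hypothetical, hand-written stand-in for a real OpenAPI or Protobuf definition, and the field names other than `user_id` are invented for the example; a real project would more likely rely on generated validators.

```python
import uuid

# Hypothetical stand-in for a machine-readable contract: field name -> type.
CONTRACT = {
    "user_id": "uuid",
    "report_name": "string",
    "row_count": "integer",
}


def validate_payload(payload: dict, contract: dict = CONTRACT) -> list[str]:
    """Return human-readable contract violations; an empty list means valid."""
    errors = []
    for field, kind in contract.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
            continue
        value = payload[field]
        if kind == "uuid":
            try:
                uuid.UUID(str(value))
            except ValueError:
                errors.append(f"{field}: not a valid UUID")
        elif kind == "string" and not isinstance(value, str):
            errors.append(f"{field}: expected string")
        elif kind == "integer" and (isinstance(value, bool) or not isinstance(value, int)):
            errors.append(f"{field}: expected integer")
    # Fields the contract does not know about are also violations.
    for field in payload:
        if field not in contract:
            errors.append(f"unexpected field: {field}")
    return errors
```

Returning a list of violations rather than raising on the first one lets a caller report every problem in a single response, which fits the advice to document all possible error responses.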

3. Architecture

Purpose: Visualize the System Structure

The Architecture document provides a visual map of how components fit together. It helps identify bottlenecks and understand data flow.

Key Characteristics:

  • Visual: Uses diagrams to represent complex relationships
  • Comprehensive: Covers system context, data flow, and infrastructure
  • Living Document: Updated as the system evolves

Best Practices:

  • Use Mermaid.js for diagrams-as-code (versionable in Git)
  • Include multiple views: C4 model diagrams (starting with System Context), ERDs, network topology
  • Document trade-offs and architectural decisions
  • Show data flow through the system

Common Pitfalls to Avoid:

  • Over-engineering diagrams with unnecessary detail
  • Not updating diagrams when the architecture changes
  • Using static images instead of diagrams-as-code
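
One way to keep diagrams versionable is to generate the Mermaid text from a small data structure that lives in the repository. The sketch below does this for the data path named in the matrix example; the `to_mermaid` helper and the generated node ids are assumptions made for illustration, not part of Mermaid.js.

```python
# Hops taken from the data path in the matrix example.
DATA_PATH = [
    ("Caddy (Proxy)", "Node.js (API)"),
    ("Node.js (API)", "NATS (Queue)"),
    ("NATS (Queue)", "Julia (Math Engine)"),
]


def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render directed edges as a left-to-right Mermaid flowchart."""
    ids: dict[str, str] = {}
    lines = ["flowchart LR"]
    # Declare each node once, with a short generated id and a quoted label.
    for src, dst in edges:
        for label in (src, dst):
            if label not in ids:
                ids[label] = f"n{len(ids)}"
                lines.append(f'    {ids[label]}["{label}"]')
    # Then emit one arrow per hop.
    for src, dst in edges:
        lines.append(f"    {ids[src]} --> {ids[dst]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_mermaid(DATA_PATH))
```

Because the edge list is ordinary data, a pull request that adds a service shows up as a one-line diff, which is the main advantage over a static image.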

4. Walkthrough

Purpose: Build Mental Models

The Walkthrough document explains the "why" behind the "how." It helps developers understand the rationale behind design decisions.

Key Characteristics:

  • Narrative-Driven: Tells a story about how the system works
  • Context-Rich: Explains trade-offs and decisions
  • End-to-End: Traces flows from user input to system output

Best Practices:

  • Document step-by-step traces of core features
  • Explain architectural trade-offs and why each option was chosen
  • Include "The Big Picture" context
  • Use real examples and data flows

Common Pitfalls to Avoid:

  • Only documenting the happy path
  • Assuming developers will figure out the "why"
  • Not explaining the rationale behind decisions
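
The End-to-End Trace in the matrix relies on the Claim-Check pattern: the large payload is parked in shared storage and only a small reference travels through the queue. The sketch below simulates the three steps in memory. `BlobStore`, `api_wrap`, and `worker_pull` are hypothetical names, and no NATS client is involved; the aim is only to show why the queued message stays small.

```python
import json
import uuid


class BlobStore:
    """In-memory stand-in for shared storage (for example an object store)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = str(uuid.uuid4())
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def api_wrap(store: BlobStore, ui_json: str) -> str:
    """Step 2: park the payload and build a small claim-check message."""
    payload = ui_json.encode("utf-8")
    claim = {"claim_id": store.put(payload), "size_bytes": len(payload)}
    return json.dumps(claim)


def worker_pull(store: BlobStore, message: str) -> dict:
    """Step 3: the worker redeems the claim and loads the full payload."""
    claim = json.loads(message)
    return json.loads(store.get(claim["claim_id"]))


if __name__ == "__main__":
    store = BlobStore()
    ui_json = json.dumps({"values": list(range(10_000))})  # Step 1: UI sends JSON
    message = api_wrap(store, ui_json)
    print(len(ui_json), "bytes of payload vs", len(message), "bytes on the queue")
    assert worker_pull(store, message)["values"][-1] == 9_999
```

The message that would cross the queue has a fixed, small size regardless of the payload, which is the rationale the trace gives for avoiding memory spikes in the broker.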

5. Implementation

Purpose: The Functional Reality

The Implementation is the actual code that does the work. In SDD, the "boring" parts are auto-generated from the Spec to ensure consistency.

Key Characteristics:

  • Machine-Generated: Types and routes auto-generated from Spec
  • Human-Written: Business logic and helper functions
  • Tested: Includes unit and integration tests

Best Practices:

  • Auto-generate boring parts (types, routes) from the Spec
  • Keep business logic separate from boilerplate
  • Maintain comprehensive test coverage
  • Document the local development setup

Common Pitfalls to Avoid:

  • Hand-writing types that should be auto-generated
  • Inconsistent code style
  • Insufficient test coverage
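
As a rough illustration of generating the "boring" parts, the sketch below turns a tiny spec fragment into Python dataclass source. The `SPEC` dictionary, the `ReportRequest` type, and the `TYPE_MAP` are all hypothetical; real projects typically use an existing OpenAPI or Protobuf code generator rather than a hand-rolled one.

```python
# Hypothetical spec fragment: type name -> {field name -> spec type}.
SPEC = {
    "ReportRequest": {
        "user_id": "uuid",
        "report_name": "string",
        "row_count": "integer",
    }
}

# Assumed mapping from spec types to Python annotations.
TYPE_MAP = {"uuid": "UUID", "string": "str", "integer": "int"}


def generate_types(spec: dict) -> str:
    """Emit Python source defining one frozen dataclass per spec entry."""
    lines = [
        "# AUTO-GENERATED from the Spec. Do not edit by hand.",
        "from dataclasses import dataclass",
        "from uuid import UUID",
    ]
    for type_name, fields in spec.items():
        lines += ["", "", "@dataclass(frozen=True)", f"class {type_name}:"]
        lines += [f"    {name}: {TYPE_MAP[kind]}" for name, kind in fields.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_types(SPEC))
```

Writing the generated source to a file that is rebuilt from the Spec in CI means a hand edit to a type is overwritten on the next build, so the Spec remains the only place where a field can change.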

6. Validation

Purpose: Enforce the Contract

The Validation layer provides automated gates that ensure the Implementation matches the Spec. It prevents human error from reaching production.

Key Characteristics:

  • Automated: Runs on every commit/Pull Request
  • Comprehensive: Covers contract tests, integration tests, and security scans
  • Blocking: Prevents merges that violate the contract

Best Practices:

  • Use contract testing tools (Dredd, Prism) to validate API contracts
  • Run integration tests on every commit
  • Include security scans in the CI pipeline
  • Fail builds on contract violations

Common Pitfalls to Avoid:

  • Not running tests on every commit
  • Allowing manual overrides of validation gates
  • Not updating tests when the Spec changes
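
The matrix example of a CI job that rejects camelCase keys can be sketched as a small gate script. This is an assumption-laden illustration: the regular expression is one possible definition of snake_case, and `find_key_violations` and `gate` are hypothetical helpers rather than part of Dredd or Prism.

```python
import re
import sys

# One possible definition of snake_case: lowercase words joined by underscores.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def find_key_violations(value, path: str = "") -> list[str]:
    """Recursively collect the paths of dict keys that are not snake_case."""
    violations = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else str(key)
            if not SNAKE_CASE.match(str(key)):
                violations.append(child_path)
            violations += find_key_violations(child, child_path)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            violations += find_key_violations(child, f"{path}[{index}]")
    return violations


def gate(payload: dict) -> int:
    """Return a process exit code: 0 passes the build, 1 blocks it."""
    violations = find_key_violations(payload)
    for violation in violations:
        print(f"contract violation: key '{violation}' is not snake_case", file=sys.stderr)
    return 1 if violations else 0


if __name__ == "__main__":
    sample = {"user_id": 1, "items": [{"rowCount": 2}]}  # hypothetical response body
    sys.exit(gate(sample))
```

A CI system treats the non-zero exit code as a failed step, which is what makes the gate blocking rather than advisory.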

7. Maintenance

Purpose: Ensure Long-Term Health

The Maintenance document defines how to upgrade dependencies, manage technical debt, and rotate secrets. It's the guide for "future-proofing" the software.

Key Characteristics:

  • Procedural: Step-by-step instructions for common tasks
  • Scheduled: Includes regular maintenance windows
  • Documented: Tracks technical debt and migration history

Best Practices:

  • Document dependency update schedules
  • Create secret rotation procedures
  • Track technical debt in a "Graveyard"
  • Document migration history and rollback procedures

Common Pitfalls to Avoid:

  • Ad-hoc upgrades without documentation
  • Ignoring technical debt until it becomes critical
  • Not testing upgrades in staging first
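
The "Package Age" measurement from the matrix can be computed with simple date arithmetic. The sketch below assumes the release dates are already known (for example, collected from a package registry) and uses a one-year threshold; the package names, dates, and threshold are all hypothetical.

```python
from datetime import date


def package_age_days(released: date, today: date) -> int:
    """Days elapsed between a package's release date and today."""
    return (today - released).days


def stale_packages(
    release_dates: dict[str, date], today: date, max_age_days: int = 365
) -> list[str]:
    """Names of packages older than the threshold, sorted alphabetically."""
    return sorted(
        name
        for name, released in release_dates.items()
        if package_age_days(released, today) > max_age_days
    )


if __name__ == "__main__":
    releases = {  # hypothetical packages and release dates
        "pkg_a": date(2024, 1, 1),
        "pkg_b": date(2026, 1, 1),
    }
    print(stale_packages(releases, today=date(2026, 3, 13)))
```

Passing `today` explicitly instead of calling `date.today()` inside the function keeps the check deterministic, so it can be unit tested and rerun against historical data.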

8. Runbook

Purpose: Operational Life-Support

The Runbook provides instructions for when the system is alive (or dying). In GitOps, this is the "Desired State" of the infrastructure.

Key Characteristics:

  • Action-Oriented: Step-by-step instructions for common operations
  • Automated: Infrastructure as code defines the desired state
  • Crisis-Ready: Includes "3:00 AM" troubleshooting guides

Best Practices:

  • Document deployment procedures
  • Define scaling triggers and procedures
  • Include backup and restore procedures
  • Create troubleshooting guides for common issues

Common Pitfalls to Avoid:

  • Not documenting procedures for common issues
  • Not testing runbook procedures
  • Not versioning runbooks with the infrastructure
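
The idea of a "Desired State" can be illustrated with a toy reconciliation function that compares what is declared with what is observed and lists the actions needed to close the gap. This is only a conceptual sketch: it is not how Flux, Argo, or Kubernetes implement reconciliation, and the `Pod` type, pod names, and `reconcile` function are hypothetical. The 6 replicas and the 80% RAM threshold come from the matrix example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pod:
    name: str
    healthy: bool
    ram_fraction: float  # share of the pod's memory limit in use, 0.0 to 1.0


def reconcile(
    pods: list[Pod], desired_replicas: int = 6, ram_limit: float = 0.80
) -> list[str]:
    """Return the actions that move the observed state toward the desired state."""
    actions = []
    for pod in pods:
        # Restart pods that are unhealthy or at/above the RAM threshold.
        if not pod.healthy or pod.ram_fraction >= ram_limit:
            actions.append(f"restart {pod.name}")
    missing = desired_replicas - len(pods)
    if missing > 0:
        actions.append(f"scale up by {missing}")
    elif missing < 0:
        actions.append(f"scale down by {-missing}")
    return actions


if __name__ == "__main__":
    observed = [
        Pod("julia-0", healthy=True, ram_fraction=0.50),
        Pod("julia-1", healthy=True, ram_fraction=0.85),
        Pod("julia-2", healthy=False, ram_fraction=0.10),
    ]
    for action in reconcile(observed):
        print(action)
```

The function only reports actions and never performs them, which mirrors the GitOps split between the declared state kept in Git and the controller that applies it.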

How to Use This Approach Effectively

Phase 1: Foundation (Weeks 1-2)

  1. Create Requirements Document

    • Define the Business North Star
    • Establish success metrics
    • Define out-of-scope items
  2. Write the Spec

    • Define all data interfaces
    • Establish naming conventions
    • Document validation rules
  3. Design Architecture

    • Create system diagrams
    • Document data flow
    • Identify potential bottlenecks

Phase 2: Development (Week 3+)

  1. Write Walkthrough

    • Document end-to-end flows
    • Explain architectural trade-offs
    • Create mental models for developers
  2. Implement Code

    • Auto-generate boring parts from Spec
    • Write business logic
    • Implement tests

Phase 3: Quality Assurance

  1. Set Up Validation

    • Configure CI/CD pipeline
    • Set up contract testing
    • Configure security scans
  2. Create Runbook

    • Document deployment procedures
    • Define scaling triggers
    • Create troubleshooting guides

Phase 4: Maintenance

  1. Document Maintenance
    • Create dependency update schedule
    • Document secret rotation
    • Track technical debt

Key Principles for Success

  1. Separation of Concerns: Keep business concerns separate from technical concerns
  2. Machine-Readable Contracts: Use OpenAPI/Protobuf for specs to enable automation
  3. Automation: Automate boring parts and validation to reduce human error
  4. Measurability: Every document should have measurable outcomes
  5. Version Control: Keep all documentation in Git for history and collaboration
  6. Living Documents: Update documentation as the system evolves
  7. Audience-Focused: Write for the intended audience's needs and knowledge level

Conclusion

The SDD + GitOps Documentation Framework provides a comprehensive, structured approach to software development documentation. By following this framework, teams can ensure that:

  • Business goals are clearly defined and measurable
  • Technical contracts are machine-readable and enforced
  • System architecture is visualized and understood
  • Developers have clear mental models of the system
  • Code quality is maintained through automation
  • Operations are reliable and repeatable

This framework is not just about documentation. It is about creating a shared understanding across the entire team and ensuring that every decision is aligned with business goals.