update requirement doc
@@ -1,402 +0,0 @@
# SDD + GitOps Documentation Framework

This document defines the documentation framework for the NATSBridge project. It establishes a structured approach to creating, maintaining, and evolving technical documentation in alignment with GitOps principles—ensuring that documentation is versioned, auditable, and continuously validated alongside the codebase.

---

## The SDD Framework: Seven Pillars of Documentation

| Document | Purpose (Rationale) | Primary Audience | Format / Content | Example (SaaS Context) | Measurement (KPI) |
|----------|---------------------|-----------------|------------------|------------------------|-------------------|
| **Requirements** | Capture the **business intent** — why we're building this and what success looks like. Defines boundaries and user-visible outcomes. | Stakeholders, Product Owners, Lead Developers | User stories, PRDs, acceptance criteria, non-functional constraints. | "System must process tabular data from Julia to SvelteKit UI with <200ms latency for 5-member teams." | 95% of requests complete <200ms (synthetic monitoring). |
| **Specification** | The **technical contract** — precise rules for inputs, outputs, and data shape. Ensures consistency across dev and test. | Developers, QA Engineers, CI/CD pipelines | OpenAPI, Protobuf, AsyncAPI. Endpoint definitions, schemas, error codes. | `contract.yaml` defining a NATS subject that accepts Arrow streams with snake_case headers. | 100% of messages validated against spec (CI block rate). |
| **Architecture** | The **blueprint** — how components fit together, interact, and scale. Guides system structure and trade-offs. | Architects, Senior Developers, DevOps | C4 diagrams, Mermaid.js, component/network/storage models. | Diagram showing 6-node cluster routing traffic via Caddy → Node.js API → Julia pods. | 100% of major decisions logged with trade-off analysis. |
| **Walkthrough** | The **story of flow** — shows how pieces connect end-to-end and why steps are sequenced. Builds intuition for new devs. | New Developers, Team Members | TOUR.md, Loom videos, sequence diagrams. Step-by-step traces with rationale. | "UI sends JSON → Node.js wraps Claim-Check → Julia pulls Arrow data (prevents NATS overflow)." | New developers ship their first feature in <2 days (PR timeline). |
| **Implementation** | The **real code** — business logic, helpers, tests, configs. Where design becomes executable. | Developers, Code Reviewers | Source code, README.md, unit tests, setup scripts. | Julia function for matrix calculation + SvelteKit component rendering table. | >80% unit test coverage, <5% drift from spec. |
| **Validation** | The **enforcer** — ensures implementation matches the spec. Blocks drift and human error. | Automation servers, QA, Lead Developers | CI jobs, contract tests, linting, integration checks. | CI job rejects PR with camelCase field not allowed by YAML spec. | <1% of PRs bypass validation gates. |
| **Runbook** | The **operational manual** — how the system lives in production, scales, and recovers. Guides on-call engineers. | DevOps, SREs, On-call Developers | K8s manifests, Helm charts, Markdown guides. Deployment, scaling, backup/restore, troubleshooting. | GitOps manifest ensuring 6 Julia replicas restart if memory >80%. | MTTR <15 minutes for P1 incidents. |

---

## Detailed Document Descriptions

### 1. Requirements

**Purpose**: Capture the *business intent* — why we're building this and what success looks like. Defines boundaries and user-visible outcomes.

**Why It Matters**:
- Aligns engineering efforts with business goals
- Provides a north star for feature development
- Establishes acceptance criteria before implementation begins
- Creates a contract between product and engineering

**Content Guidelines**:
- User stories with clear acceptance criteria (As a X, I want Y so that Z)
- Product Requirements Documents (PRDs) with success metrics
- Non-functional requirements (performance, security, scalability)
- Boundary definitions (what's in scope vs. out of scope)

**Best Practices**:
- Link each requirement to a measurable KPI
- Keep requirements testable and verifiable
- Maintain backward compatibility with existing requirements
- Review and update requirements as business context changes
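
As an illustration of linking a requirement to a measurable KPI, the sketch below turns the example requirement from the framework table ("95% of requests complete <200ms") into a check that a test or monitoring job could run. It assumes latency samples are already collected in milliseconds and uses the nearest-rank percentile method; `percentile` and `meets_latency_requirement` are hypothetical helpers, not part of NATSBridge.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty sample (pct in (0, 100])."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: smallest value such that at least pct% of samples are <= it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_requirement(latencies_ms: Sequence[float],
                              pct: float = 95.0,
                              limit_ms: float = 200.0) -> bool:
    """True when at least pct% of requests finished in under limit_ms."""
    return percentile(latencies_ms, pct) < limit_ms


if __name__ == "__main__":
    # Hypothetical synthetic-monitoring samples, in milliseconds.
    samples = [120, 135, 150, 90, 180, 199, 160, 140, 110, 250]
    print(percentile(samples, 95))             # 250
    print(meets_latency_requirement(samples))  # False
```

A single slow outlier in ten samples is enough to fail the 95% target here, which is why the sample window should be sized before the KPI is agreed on.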

---

### 2. Specification

**Purpose**: The *technical contract* — precise rules for inputs, outputs, and data shape. Ensures consistency across dev and test.

**Why It Matters**:
- Prevents implementation drift between components
- Enables contract testing in CI/CD pipelines
- Provides a single source of truth for data structures
- Facilitates integration between teams

**Content Guidelines**:
- API endpoint definitions (methods, paths, parameters)
- Request/response schemas (JSON, XML, Protobuf, AsyncAPI)
- Error codes and their meanings
- Data validation rules and constraints
- Rate limiting and quota definitions

**Best Practices**:
- Use formal specification languages (OpenAPI 3.0+, AsyncAPI)
- Version specifications alongside code
- Generate client SDKs from specifications
- Block CI on specification violations
- Document edge cases and error scenarios
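
A minimal sketch of what a contract check can look like once the specification is machine-readable. It assumes the contract has already been reduced to a mapping of required field names to Python types (in practice this mapping would be derived from `contract.yaml`) and enforces the snake_case naming rule used in the framework table's example. `CONTRACT`, its field names, and `validate_message` are hypothetical.

```python
import re
from typing import Any

# Hypothetical contract: required field name -> expected Python type.
# In a real pipeline this mapping would be generated from contract.yaml.
CONTRACT: dict[str, type] = {
    "correlation_id": str,
    "sender_name": str,
    "payload_count": int,
}

# Lowercase words separated by single underscores, e.g. "sender_name".
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def validate_message(message: dict[str, Any],
                     contract: dict[str, type] = CONTRACT) -> list[str]:
    """Return a list of contract violations; an empty list means the message is valid."""
    errors: list[str] = []
    for name in message:
        if not SNAKE_CASE.fullmatch(name):
            errors.append(f"field '{name}' is not snake_case")
        elif name not in contract:
            errors.append(f"field '{name}' is not defined in the contract")
    for name, expected in contract.items():
        if name not in message:
            errors.append(f"missing required field '{name}'")
            continue
        value = message[name]
        # bool is a subclass of int in Python, so reject it explicitly for non-bool fields.
        if not isinstance(value, expected) or (isinstance(value, bool) and expected is not bool):
            errors.append(f"field '{name}' should be {expected.__name__}, "
                          f"got {type(value).__name__}")
    return errors


if __name__ == "__main__":
    good = {"correlation_id": "abc-123", "sender_name": "julia-worker", "payload_count": 2}
    bad = {"correlationId": "abc-123", "sender_name": "julia-worker", "payload_count": "2"}
    print(validate_message(good))  # []
    for problem in validate_message(bad):
        print(problem)
```

Returning every violation instead of stopping at the first one keeps CI logs useful: a single run shows the author all the fields that drifted from the spec.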

---

### 3. Architecture

**Purpose**: The *blueprint* — how components fit together, interact, and scale. Guides system structure and trade-offs.

**Why It Matters**:
- Provides a mental model for system design
- Guides technical decision-making and trade-off analysis
- Facilitates onboarding of new architects and senior developers
- Documents scaling and performance considerations

**Content Guidelines**:
- C4 diagrams (Context, Container, Component levels)
- Mermaid.js flowcharts and sequence diagrams
- Component interaction diagrams
- Network topology and data flow
- Storage and caching strategies
- Scaling and resilience patterns

**Best Practices**:
- Use diagrams that are easy to update (Mermaid.js over static images)
- Record each trade-off decision in a rationale document (for example, `trade-offs.md`)
- Include scaling considerations for each component
- Document failure modes and recovery strategies
- Keep architecture diagrams versioned with code

---

### 4. Walkthrough

**Purpose**: The *story of flow* — shows how pieces connect end-to-end and why steps are sequenced. Builds intuition for new devs.

**Why It Matters**:
- Reduces onboarding time for new developers
- Provides context that code comments alone cannot convey
- Explains the "why" behind architectural decisions
- Helps identify gaps in the system design

**Content Guidelines**:
- Step-by-step flow descriptions with rationale
- Sequence diagrams showing request/response patterns
- "Tour of the codebase" guides
- Video walkthroughs (Loom, internal recordings)
- Debugging and tracing examples

**Best Practices**:
- Walk through real user journeys, not just technical flows
- Include "what could go wrong" scenarios
- Link walkthroughs to relevant code locations
- Keep walkthroughs updated with architecture changes
- Make walkthroughs interactive where possible

---

### 5. Implementation

**Purpose**: The *real code* — business logic, helpers, tests, configs. Where design becomes executable.

**Why It Matters**:
- This is the actual artifact that runs in production
- Code is the ultimate source of truth (when it matches spec)
- Tests validate correctness and prevent regressions
- Configuration files define runtime behavior

**Content Guidelines**:
- Business logic implementation
- Helper functions and utilities
- Unit and integration tests
- Configuration files (YAML, JSON, environment)
- Setup and development scripts
- Code organization and module structure

**Best Practices**:
- Follow consistent code style and conventions
- Write tests before or alongside implementation (TDD/BDD)
- Document complex logic with inline comments
- Keep configuration externalized and versioned
- Use type annotations where applicable

---

### 6. Validation

**Purpose**: The *enforcer* — ensures implementation matches the spec. Blocks drift and human error.

**Why It Matters**:
- Prevents breaking changes from reaching production
- Catches specification violations early in the CI pipeline
- Maintains data integrity and API consistency
- Reduces manual QA effort through automation

**Content Guidelines**:
- CI/CD pipeline configurations
- Contract testing scripts
- Linting rules and configurations
- Integration test suites
- Schema validation jobs
- Security scanning and audit jobs

**Best Practices**:
- Fail CI on specification violations
- Run validation jobs on every commit and PR
- Use automated code review tools
- Maintain a health dashboard for validation jobs
- Document validation failure remediation steps
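
The "fail CI on specification violations" practice boils down to a job that inspects the spec and exits non-zero when a rule is broken. The sketch below walks an OpenAPI-style document (assumed to be already loaded into a dict, for example by a YAML parser) and reports every schema property name that is not snake_case, mirroring the camelCase example in the framework table. The function names and the example schema are illustrative assumptions.

```python
import re
import sys
from typing import Any, Iterator

SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def property_names(node: Any, path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (location, name) for every key of every 'properties' map in the document."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}/{key}"
            if key == "properties" and isinstance(value, dict):
                for name in value:
                    yield child, name
            yield from property_names(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from property_names(item, f"{path}/{index}")


def naming_violations(spec: dict[str, Any]) -> list[str]:
    return [f"{where}: '{name}' is not snake_case"
            for where, name in property_names(spec)
            if not SNAKE_CASE.fullmatch(name)]


def run_gate(spec: dict[str, Any]) -> int:
    """Print violations to stderr and return a process exit code (0 = pass)."""
    violations = naming_violations(spec)
    for violation in violations:
        print(f"spec violation {violation}", file=sys.stderr)
    return 1 if violations else 0


if __name__ == "__main__":
    # Hypothetical schema fragment; a CI job would load this from contract.yaml.
    spec = {"components": {"schemas": {"Request": {
        "type": "object",
        "properties": {"sender_name": {"type": "string"},
                       "dataType": {"type": "string"}},
    }}}}
    exit_code = run_gate(spec)
    # In CI, pass the code to sys.exit() so a non-zero value fails the job.
    print(f"exit code: {exit_code}")
```

Keeping the rule check (`naming_violations`) separate from the exit-code wrapper (`run_gate`) makes the rule itself easy to unit-test outside the pipeline.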

---

### 7. Runbook

**Purpose**: The *operational manual* — how the system lives in production, scales, and recovers. Guides on-call engineers.

**Why It Matters**:
- Reduces Mean Time To Recovery (MTTR) for incidents
- Provides step-by-step guidance for common issues
- Documents scaling and deployment procedures
- Ensures operational knowledge is not siloed

**Content Guidelines**:
- Deployment procedures (manual and automated)
- Scaling instructions (horizontal/vertical)
- Backup and restore procedures
- Troubleshooting guides for common issues
- Runbook entries for specific error codes
- Contact information and escalation paths

**Best Practices**:
- Write runbooks for every P1/P2 incident
- Include exact commands and configuration snippets
- Test runbooks periodically (chaos engineering)
- Link runbook entries to relevant documentation
- Keep runbooks updated when the system changes

---

## How to Use This Approach Effectively

### 1. Start with Requirements

Before writing any code or documentation, establish clear requirements. Ask:
- What business problem are we solving?
- How will we measure success?
- What are the non-negotiable constraints?

**Action**: Create a `docs/requirements/` directory and start with `PRD.md` and `KPIs.md`.

### 2. Define the Specification First

Once requirements are stable, define the technical specification. This becomes the contract for implementation.

**Action**: Create `docs/specification/` with `contract.yaml` (or another appropriate format) and `error-codes.md`.

### 3. Design the Architecture

With requirements and specification in place, design the architecture. Document trade-off decisions explicitly.

**Action**: Create `docs/architecture/` with Mermaid diagrams and `trade-offs.md`.

### 4. Create Walkthroughs Early

As soon as the architecture is defined, create walkthroughs. This helps identify gaps and provides onboarding material.

**Action**: Create `docs/walkthrough/` with `TOUR.md` and sequence diagrams.

### 5. Implement with Validation in Mind

Write implementation code that adheres to the specification. Build validation into the CI pipeline from day one.

**Action**: Ensure test files are co-located with implementation and run on every commit.

### 6. Automate Validation

Build automated validation that runs in CI/CD. This ensures spec compliance and prevents drift.

**Action**: Configure CI jobs to validate against the specification and block PRs on violations.

### 7. Document Operations from Day One

Create runbook entries as soon as deployment procedures are established. Update them when incidents occur.

**Action**: Create `docs/runbook/` with entries for deployment, scaling, and common issues.

---

## GitOps Integration

This documentation framework aligns with GitOps principles:

| GitOps Principle | Documentation Alignment |
|-----------------|------------------------|
| **Versioned** | All documentation lives in git, with history and audit trail |
| **Declarative** | Specifications and architecture are declarative contracts |
| **Automated** | Validation jobs automate spec compliance checks |
| **Self-Service** | Walkthroughs and runbooks enable self-service onboarding and operations |
| **Observability** | KPIs and metrics are defined for each documentation artifact |

**Git Structure**:
```
docs/
├── requirements/     # PRDs, user stories, KPIs
├── specification/    # OpenAPI, Protobuf, AsyncAPI specs
├── architecture/     # C4 diagrams, Mermaid, trade-off docs
├── walkthrough/      # TOUR.md, sequence diagrams
├── implementation/   # Source code (in src/)
├── validation/       # CI configs, test suites
└── runbook/          # Deployment, scaling, troubleshooting
```
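
The Git structure and the per-step actions in "How to Use This Approach Effectively" can be scaffolded with a short script. This sketch assumes the directory names from the tree and the starter files named in those actions; it never overwrites existing files, and it drops a `.gitkeep` into folders that would otherwise be empty, because git does not track empty directories. The `scaffold` function is hypothetical, not an existing project tool.

```python
from pathlib import Path

# Folders from the "Git Structure" tree, with the starter files named in the
# "How to Use This Approach Effectively" actions. Folders without starter files
# get a .gitkeep, because git does not track empty directories.
LAYOUT: dict[str, list[str]] = {
    "requirements": ["PRD.md", "KPIs.md"],
    "specification": ["contract.yaml", "error-codes.md"],
    "architecture": ["trade-offs.md"],
    "walkthrough": ["TOUR.md"],
    "implementation": [],
    "validation": [],
    "runbook": [],
}


def scaffold(repo_root: Path) -> list[Path]:
    """Create docs/ under repo_root without overwriting files; return the files created."""
    created: list[Path] = []
    for folder, files in LAYOUT.items():
        directory = repo_root / "docs" / folder
        directory.mkdir(parents=True, exist_ok=True)
        for name in files or [".gitkeep"]:
            target = directory / name
            if not target.exists():
                target.touch()
                created.append(target)
    return created
```

Calling `scaffold(Path("."))` from the repository root creates the tree and returns the files it added; running it again returns an empty list, so it is safe to keep in a setup script.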

---

## Metrics and Continuous Improvement

Each documentation artifact has associated KPIs. Track these to ensure quality:

| Document | KPI | Target |
|----------|-----|--------|
| Requirements | Requirement coverage | 100% of features have associated requirements |
| Specification | Spec compliance rate | 100% of messages validate against spec |
| Architecture | Decision documentation | 100% of major decisions logged with trade-offs |
| Walkthrough | New dev time-to-first-PR | <2 days from onboarding to first contribution |
| Implementation | Test coverage | >80% unit test coverage |
| Validation | Bypass rate | <1% of PRs bypass validation gates |
| Runbook | MTTR | <15 minutes for P1 incidents |
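
Several of these targets can be computed directly from raw records. The sketch below derives MTTR from incident timestamps and the validation bypass rate from PR counts, then compares them (plus a coverage figure) with the targets in the table. It assumes MTTR is measured from detection to resolution and that the caller passes only P1 incidents; the `Incident` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    detected_at: datetime
    resolved_at: datetime


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time to recovery, measured here from detection to resolution."""
    if not incidents:
        raise ValueError("no incidents to average")
    total = sum((i.resolved_at - i.detected_at for i in incidents), timedelta())
    return total / len(incidents)


def bypass_rate(bypassed_prs: int, total_prs: int) -> float:
    """Share of PRs merged without passing the validation gates."""
    return bypassed_prs / total_prs if total_prs else 0.0


def kpi_report(p1_incidents: list[Incident], bypassed_prs: int, total_prs: int,
               coverage_pct: float) -> dict[str, bool]:
    """Compare measured values against the targets listed in the table."""
    return {
        "runbook_mttr_under_15_min": mttr(p1_incidents) < timedelta(minutes=15),
        "validation_bypass_under_1_pct": bypass_rate(bypassed_prs, total_prs) < 0.01,
        "implementation_coverage_over_80_pct": coverage_pct > 80.0,
    }


if __name__ == "__main__":
    t0 = datetime(2026, 3, 1, 9, 0)
    incidents = [Incident(t0, t0 + timedelta(minutes=10)),
                 Incident(t0, t0 + timedelta(minutes=30))]
    print(mttr(incidents))  # 0:20:00
    print(kpi_report(incidents, bypassed_prs=1, total_prs=200, coverage_pct=84.5))
```

Emitting one boolean per KPI keeps the weekly dashboard review mechanical: any `False` entry is a documentation or process gap to discuss.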

**Review Cadence**:
- Weekly: Review KPI dashboards and documentation gaps
- Monthly: Update documentation based on incident learnings
- Quarterly: Full framework review and improvement

---

## Template Examples

### Requirements Template
```markdown
# PRD: Feature Name

## Business Goal
[What problem are we solving?]

## Success Metrics
- [Metric 1]: Target [value]
- [Metric 2]: Target [value]

## User Stories
- As a [role], I want [feature] so that [benefit]
- Acceptance Criteria: [details]

## Non-Functional Requirements
- Performance: [details]
- Security: [details]
- Scalability: [details]

## Out of Scope
- [What's explicitly excluded]
```

### Specification Template
```yaml
# contract.yaml
openapi: 3.0.0
info:
  title: NATSBridge API
  version: 1.0.0
paths:
  /api/v1/endpoint:
    post:
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/Request'
      responses:
        '200':
          description: Success
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/Response'
# Placeholder schemas so the $refs above resolve; replace with the real contract.
components:
  schemas:
    Request:
      type: object
    Response:
      type: object
```

### Architecture Template
```mermaid
%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#3b82f6'}}}%%
flowchart TD
    A[Client] --> B[Caddy]
    B --> C[Node.js API]
    C --> D[Julia Worker]
    D --> E[NATS Cluster]
    E --> F[Storage]

    style A fill:#f9f9f9,stroke:#333
    style E fill:#e0e7ff,stroke:#3b82f6
```

### Runbook Template
```markdown
# Runbook: Service Restart

**Severity**: P2
**Estimated Time**: 5 minutes

## Symptoms
- Service is unresponsive
- Health checks are failing

## Steps
1. SSH to the host
2. Run: `kubectl rollout restart deployment/natsbridge`
3. Monitor: `kubectl get pods -l app=natsbridge -w`

## Rollback
- Run: `kubectl rollout undo deployment/natsbridge`

## Post-Incident
- [ ] Review logs for root cause
- [ ] Update runbook if needed
```

---

## Conclusion

This SDD + GitOps Documentation Framework ensures that documentation is:
- **Structured**: Seven distinct artifacts with clear purposes
- **Automated**: Validation and CI/CD integration
- **Versioned**: All documentation in git with history
- **Measurable**: KPIs for quality and effectiveness
- **Actionable**: Practical templates and examples

Use this framework as a living document—update it as your team's needs evolve.

@@ -1,33 +1,34 @@
# Requirements Document: NATSBridge

**Version**: 1.0.0
**Date**: 2026-03-23
**Status**: Active
**Ground Truth**: [`src/NATSBridge.jl`](../src/NATSBridge.jl)

---

## 1. Business Context & Success Metrics

### 1.1 Business Goal

NATSBridge is a cross-platform, bidirectional data bridge that enables seamless communication between **Julia**, **JavaScript**, **Python**, and **MicroPython** applications using NATS as the message bus. The system implements the **Claim-Check pattern** for large payloads (≥0.5MB): instead of sending the raw binary data over NATS, it uploads the payload to an HTTP file server and sends only the resulting URL.
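
The size-based choice at the heart of the Claim-Check pattern can be sketched as follows. The 500,000-byte threshold is an assumption about how "0.5MB" is interpreted, and the `upload` callable stands in for whichever file-server handler is configured (Plik, S3, or custom); `Envelope` and `choose_transport` are illustrative names, not the `smartsend()` API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: "0.5MB" is read as 500,000 bytes. Check the NATSBridge source for
# the exact threshold before relying on this value.
SIZE_THRESHOLD_BYTES = 500_000


@dataclass
class Envelope:
    transport: str          # "direct" or "link"
    data: Optional[bytes]   # inline payload (direct transport only)
    url: Optional[str]      # file-server URL (link transport only)


def choose_transport(payload: bytes,
                     upload: Callable[[bytes], str],
                     threshold: int = SIZE_THRESHOLD_BYTES) -> Envelope:
    """Claim-Check: send small payloads inline, upload large ones and send only a URL."""
    if len(payload) < threshold:
        return Envelope(transport="direct", data=payload, url=None)
    # Hypothetical upload handler (e.g. Plik, S3, or a custom server) returning a URL.
    return Envelope(transport="link", data=None, url=upload(payload))


if __name__ == "__main__":
    def fake_upload(blob: bytes) -> str:
        return f"https://files.example/{len(blob)}"

    for size in (1_000, 600_000):
        env = choose_transport(b"x" * size, fake_upload)
        print(size, env.transport, env.url)
    # 1000 direct None
    # 600000 link https://files.example/600000
```

Passing the uploader in as a function keeps the transport decision independent of any particular file server, which matches the handler-function abstraction described in the user stories.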

#### Primary Objectives

1. **Cross-Platform Interoperability**: Enable seamless data exchange between Julia, JavaScript (for both server-side-rendered and client-side-rendered web apps), Python, and MicroPython applications without platform-specific barriers.
2. **Efficient Large Payload Handling**: Implement intelligent transport selection based on payload size:
   - **Direct Transport**: Small payloads (<0.5MB) sent directly via NATS
   - **Link Transport**: Large payloads (≥0.5MB) uploaded to HTTP file server, URL sent via NATS
3. **Unified API Across Platforms**: Provide consistent `smartsend()` and `smartreceive()` functions across all supported platforms while maintaining idiomatic implementations.
4. **Developer Productivity**: Reduce onboarding time and simplify integration through comprehensive documentation and test examples.

### 1.2 User Stories (with acceptance criteria)

| Story | Priority | Acceptance Criteria |
|-------|----------|---------------------|
| **As a Julia developer**, I want to send text messages to JavaScript applications running either on a server or in a browser | P1 | Text messages are serialized, encoded, and received correctly across platforms |
| **As a Python developer**, I want to send tabular data to Julia applications | P1 | DataFrame exchange works with both Arrow IPC and JSON formats |
| **As a JavaScript developer**, I want to send large files (≥0.5MB) from JavaScript applications running either on a server or in a browser to other applications | P1 | Large files are automatically uploaded to the file server and their URLs are sent via NATS |
| **As a MicroPython developer**, I want to send sensor data with minimal memory usage | P1 | Direct transport works for payloads <100KB on memory-constrained devices |
| **As a developer**, I want to send mixed-content messages (text + image + file) | P1 | NATSBridge accepts a list of (dataname, data, type) tuples and handles each payload appropriately |
| **As a developer**, I want to receive multi-payload messages | P1 | NATSBridge returns payloads as a list of tuples with correct types preserved |
| **As a developer**, I want to use Plik as the file server | P2 | Plik one-shot upload mode is supported with upload ID and token handling |
| **As a developer**, I want to use custom HTTP file servers | P2 | Handler function abstraction allows plugging in AWS S3 or custom implementations |
| **As a developer**, I want automatic retry on file server download failures | P1 | Exponential backoff with configurable retries (default: 5, base_delay: 100ms, max_delay: 5000ms) |
| **As a developer**, I want message tracing across distributed systems | P1 | Correlation ID is propagated through all message processing steps |

### 1.3 KPIs & Targets

| Metric | Target | Measurement Method |
|--------|--------|-------------------|
@@ -40,77 +41,48 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless

---

## 2. Technical Boundaries

### 2.1 In Scope

| Feature | Description |
|---------|-------------|
| Cross-platform interoperability | Seamless data exchange between Julia, JavaScript, Python, and MicroPython |
| Intelligent transport selection | Direct transport (<0.5MB) vs Link transport (≥0.5MB) based on payload size |
| Unified API | Consistent `smartsend()` and `smartreceive()` functions across all platforms |
| Multi-payload support | List of (dataname, data, type) tuples with appropriate handling |
| File server integration | Plik one-shot upload and custom HTTP server support |
| Reliability features | Exponential backoff retry and correlation ID propagation |
| Message serialization | Converts data types to binary format (Base64, JSON, Arrow IPC) |
| NATS communication | Publishing and subscription via NATS subjects |
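
To illustrate multi-payload support, correlation IDs, and per-payload encoding together, here is a dependency-free sketch that packs a list of `(dataname, data, type)` tuples into one JSON message and unpacks it again with types preserved. The type tags (`"text"`, `"json"`, `"binary"`) and the envelope field names are hypothetical, and Arrow IPC for tabular data is omitted because it would need `pyarrow`; the real wire format is whatever the NATSBridge specification defines.

```python
import base64
import json
import uuid
from typing import Any, Optional

# (dataname, data, type) as described in the requirements. The type tags and
# envelope field names below are hypothetical, not the NATSBridge wire format.
Payload = tuple[str, Any, str]


def _to_bytes(data: Any, kind: str) -> bytes:
    if kind == "text":
        return data.encode("utf-8")
    if kind == "json":
        return json.dumps(data).encode("utf-8")
    if kind == "binary":
        return bytes(data)
    raise ValueError(f"unsupported payload type: {kind}")


def _from_bytes(raw: bytes, kind: str) -> Any:
    if kind == "text":
        return raw.decode("utf-8")
    if kind == "json":
        return json.loads(raw)
    if kind == "binary":
        return raw
    raise ValueError(f"unsupported payload type: {kind}")


def build_message(payloads: list[Payload], correlation_id: Optional[str] = None) -> str:
    """Pack several payloads into one JSON message that shares a correlation ID."""
    return json.dumps({
        # Reuse an incoming ID when relaying so the trace stays connected.
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "payloads": [
            {"dataname": name, "type": kind,
             "data": base64.b64encode(_to_bytes(data, kind)).decode("ascii")}
            for name, data, kind in payloads
        ],
    })


def read_message(message: str) -> tuple[str, list[Payload]]:
    """Inverse of build_message: return the correlation ID and the original tuples."""
    doc = json.loads(message)
    items = [(p["dataname"], _from_bytes(base64.b64decode(p["data"]), p["type"]), p["type"])
             for p in doc["payloads"]]
    return doc["correlation_id"], items


if __name__ == "__main__":
    msg = build_message([("greeting", "hello", "text"),
                         ("stats", {"rows": 3}, "json"),
                         ("thumbnail", b"\x89PNG", "binary")])
    cid, items = read_message(msg)
    print(cid)    # a freshly generated UUID4
    print(items)
```

Carrying the type tag next to each payload is what lets the receiver return the same `(dataname, data, type)` tuples the sender passed in.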

### 2.2 Out of Scope

| Feature | Reason |
|---------|--------|
| NATS JetStream support | Core NATS is sufficient for current use cases |
| Message compression | Compression adds complexity without clear benefit |
| Message encryption | Payload encryption is an application-layer concern |
| Persistent message queues | The NATS request-reply pattern is sufficient |
| Advanced routing rules | Simple NATS subject matching is sufficient |

### 2.3 Dependencies

| Platform | Package | Version |
|----------|---------|---------|
| Julia | NATS.jl | Latest stable |
| Julia | JSON.jl | Latest stable |
| Julia | Arrow.jl | Latest stable |
| Julia | HTTP.jl | Latest stable |
| Julia | UUIDs.jl | Latest stable |
| Node.js | nats | Latest stable |
| Node.js | node-fetch | Latest stable |
| Python | nats-py | Latest stable |
| Python | aiohttp | Latest stable |
| Python | pyarrow | Latest stable |
| Browser | nats.ws | Latest stable |

### Security Requirements

| Requirement | Specification |
|-------------|---------------|
| Payload integrity | SHA-256 checksum support via metadata |
| Transport security | TLS support for NATS connections |
| File server security | Authentication token for file uploads |
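
For the payload-integrity requirement, a sender can attach a SHA-256 digest to the message metadata and a receiver can recompute it. This sketch uses Python's standard `hashlib`; the `sha256` metadata key is a hypothetical name, and treating a missing checksum as acceptable reflects the optional "checksum support" wording rather than a documented rule.

```python
import hashlib


def attach_checksum(payload: bytes, metadata: dict[str, str]) -> dict[str, str]:
    """Return a copy of metadata with the payload's hex SHA-256 digest added."""
    # "sha256" is a hypothetical metadata key, not a documented NATSBridge field.
    return {**metadata, "sha256": hashlib.sha256(payload).hexdigest()}


def verify_checksum(payload: bytes, metadata: dict[str, str]) -> bool:
    """True if no checksum was sent, or if the received bytes match it."""
    expected = metadata.get("sha256")
    if expected is None:
        return True  # design choice in this sketch: checksums are optional
    return hashlib.sha256(payload).hexdigest() == expected


if __name__ == "__main__":
    meta = attach_checksum(b"temperature=21.5", {"dataname": "sensor"})
    print(verify_checksum(b"temperature=21.5", meta))  # True
    print(verify_checksum(b"temperature=99.9", meta))  # False
```

For link transport, the digest would be computed over the bytes that were uploaded and checked after download, so a truncated or corrupted file-server response is caught before deserialization.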

### 2.4 Platform Compatibility

| Platform | Minimum Version | Notes |
|----------|-----------------|-------|
@@ -122,61 +94,90 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless

### Future Considerations

| Feature | Future Phase |
|---------|--------------|
| JetStream streams and consumers | Phase 2 |
| Message TTL and dead-letter queues | Phase 3 |
| Message tracing with OpenTelemetry | Phase 3 |
| Rate limiting and quota management | Phase 4 |

---

## 3. Functional Requirements (FR)

| ID | Requirement | Description |
|----|-------------|-------------|
| **FR-001** | Cross-platform text messaging | System shall allow users to send text messages between Julia, JavaScript, Python, and MicroPython applications |
| **FR-002** | Cross-platform tabular data | System shall support DataFrame exchange between Julia and Python applications using Arrow IPC format |
| **FR-003** | Large file handling | System shall automatically detect payloads ≥0.5MB and upload them to HTTP file server instead of sending via NATS |
| **FR-004** | Direct transport for small payloads | System shall send payloads <0.5MB directly via NATS without file server upload |
| **FR-005** | MicroPython support | System shall support payloads <100KB on MicroPython devices using direct transport |
| **FR-006** | Multi-payload messages | System shall accept and process lists of (dataname, data, type) tuples |
| **FR-007** | Payload type preservation | System shall preserve payload types when returning multi-payload messages |
| **FR-008** | Plik file server integration | System shall support Plik one-shot upload mode with upload ID and token handling |
| **FR-009** | Custom file server support | System shall provide handler function abstraction for custom HTTP file server implementations |
| **FR-010** | Exponential backoff retry | System shall implement exponential backoff with configurable retries (default: 5, base_delay: 100ms, max_delay: 5000ms) for file server download failures |
| **FR-011** | Correlation ID propagation | System shall propagate correlation IDs through all message processing steps |
| **FR-012** | Message serialization | System shall serialize data types using Base64, JSON, or Arrow IPC encoding |
| **FR-013** | NATS publishing | System shall publish messages to NATS subjects |
| **FR-014** | NATS subscription | System shall receive and process NATS messages |
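
FR-010's retry policy can be read as: before each retry, wait `base_delay * 2^attempt`, capped at `max_delay`. The sketch below assumes "5 retries" means five retries after the initial attempt and adds no jitter; `backoff_delays`, `download_with_retry`, and the injectable `sleep` argument are illustrative, not the NATSBridge API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int = 5, base_delay_ms: int = 100,
                   max_delay_ms: int = 5000) -> list[int]:
    """Delay (ms) before each retry: base_delay_ms * 2**attempt, capped at max_delay_ms."""
    return [min(base_delay_ms * 2 ** attempt, max_delay_ms)
            for attempt in range(max_retries)]


def download_with_retry(fetch: Callable[[], T],
                        max_retries: int = 5,
                        base_delay_ms: int = 100,
                        max_delay_ms: int = 5000,
                        sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); after each failure wait per the backoff schedule and try again."""
    for delay_ms in backoff_delays(max_retries, base_delay_ms, max_delay_ms):
        try:
            return fetch()
        except Exception:  # a real client would retry only transient errors
            sleep(delay_ms / 1000)
    # Final attempt after the last retry delay; its exception reaches the caller.
    return fetch()


if __name__ == "__main__":
    print(backoff_delays())   # [100, 200, 400, 800, 1600]
    print(backoff_delays(7))  # [100, 200, 400, 800, 1600, 3200, 5000]
```

With the defaults the delays never reach the 5000 ms cap; it only takes effect when `max_retries` is raised. Injecting `sleep` lets tests run the retry loop without real waiting.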
|
||||
|
||||
---

## Boundary Definitions

### What NATSBridge Handles

| Function | Description |
|----------|-------------|
| Message serialization | Converts data types to binary format |
| Message encoding | Base64, JSON, or Arrow IPC encoding |
| Transport selection | Direct vs. link transport, based on the size threshold |
| NATS publishing | Publishes messages to NATS subjects |
| NATS subscription | Receives and processes NATS messages |
| File server upload | Uploads large payloads to an HTTP server |
| File server download | Downloads payloads from an HTTP server with retry |
| Correlation ID generation | Creates and propagates UUIDs |
| Data deserialization | Converts binary format back to native types |

### What NATSBridge Does NOT Handle

| Function | Handled By |
|----------|------------|
| NATS server management | External NATS deployment |
| File server management | External HTTP server deployment |
| Application business logic | Application code using NATSBridge |
| Message encryption | Application layer |
| Message compression | Application layer |
| Authentication/Authorization | NATS server configuration |

---

## 4. Non-Functional Requirements (NFRs)

### 4.1 Performance & Scalability

| ID | Requirement | Specification | Test Method |
|----|-------------|---------------|-------------|
| **NFR-101** | Message serialization overhead | <50ms for 10KB payload | Benchmark tests |
| **NFR-102** | Message deserialization overhead | <50ms for 10KB payload | Benchmark tests |
| **NFR-103** | NATS connection establishment | <100ms | Connection pool benchmarks |
| **NFR-104** | File upload latency | <1s for 0.5MB file | Integration tests |
| **NFR-105** | File download latency | <1s for 0.5MB file | Integration tests |
| **NFR-106** | Concurrent connections | Support 100+ simultaneous NATS connections | Scale testing |
| **NFR-107** | Message throughput | Handle 1000+ messages/second per instance | Load testing |
| **NFR-108** | File server scalability | Support horizontal scaling of file server backend | Architecture review |

### 4.2 Availability & Reliability

| ID | Requirement | Specification |
|----|-------------|---------------|
| **NFR-201** | Message delivery | At-least-once delivery semantics via NATS |
| **NFR-202** | File server availability | Graceful degradation when file server is unavailable |
| **NFR-203** | Connection recovery | Auto-reconnect on NATS connection failure |

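FR-010 fixes the retry defaults (5 retries, 100 ms base delay, 5000 ms cap), and NFR-404 expects an alert once those retries are exhausted. A minimal sketch of that policy, assuming the delay doubles per failed attempt without jitter and treating the retry count as the total number of download attempts; `with_backoff`, `backoff_delay`, and the injected `sleep_fn` are hypothetical names, not the NATSBridge source. Injecting `sleep_fn` keeps the policy testable without real waits.

```julia
# Hypothetical sketch of the FR-010 retry policy (not the NATSBridge source).
# Delay before retry n is base_delay * 2^(n-1) seconds, capped at max_delay; no jitter.
backoff_delay(attempt; base_delay = 0.1, max_delay = 5.0) =
    min(base_delay * 2.0^(attempt - 1), max_delay)

# Calls `fetch` up to `max_retries` times (treated here as total attempts),
# sleeping between failed attempts. `sleep_fn` is injectable for testing.
function with_backoff(fetch; max_retries = 5, base_delay = 0.1, max_delay = 5.0,
                      sleep_fn = sleep)
    last_err = nothing
    for attempt in 1:max_retries
        try
            return fetch()
        catch err
            last_err = err
            attempt == max_retries && break
            sleep_fn(backoff_delay(attempt; base_delay, max_delay))
        end
    end
    # NFR-404: a distinct message so monitoring can alert on exhausted retries.
    error("download_retry_exceeded after $max_retries attempts: $last_err")
end
```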

### 4.3 Privacy & Security

| ID | Requirement | Specification |
|----|-------------|---------------|
| **NFR-301** | Payload integrity | SHA-256 checksum support via metadata |
| **NFR-302** | Transport security | TLS support for NATS connections |
| **NFR-303** | File server security | Authentication token for file uploads |

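NFR-301 calls for SHA-256 checksums carried in metadata. A minimal sketch using Julia's `SHA` standard library, assuming the hex digest is stored under a metadata key named `sha256`; the key name and both helpers are hypothetical, and treating a missing checksum as "nothing to verify" is a design assumption.

```julia
using SHA  # Julia standard library

# Hypothetical metadata key for the payload digest (an assumption, not from the spec).
const CHECKSUM_KEY = "sha256"

# Sender side: store the hex-encoded SHA-256 digest of the payload bytes.
function add_checksum!(metadata::Dict{String,Any}, data::Vector{UInt8})
    metadata[CHECKSUM_KEY] = bytes2hex(sha256(data))
    return metadata
end

# Receiver side: recompute and compare; absent checksum means nothing to verify.
function verify_checksum(metadata::Dict{String,Any}, data::Vector{UInt8})
    expected = get(metadata, CHECKSUM_KEY, nothing)
    expected === nothing && return true
    return expected == bytes2hex(sha256(data))
end
```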
### 4.4 Observability & Telemetry

| ID | Requirement | Specification |
|----|-------------|---------------|
| **NFR-401** | Required logs | `correlation_id`, `msg_id`, `timestamp`, `sender_name`, `receiver_name`, `payload_type`, `transport` |
| **NFR-402** | Critical metrics | `messages_sent_total`, `messages_received_total`, `file_upload_duration_seconds`, `file_download_duration_seconds`, `retry_attempts_total` |
| **NFR-403** | Tracing | Correlation ID propagation for request tracing |
| **NFR-404** | Alerting | `download_retry_exceeded` triggers alert when max retries exceeded |
| **NFR-405** | Retention | Logs: 30 days, Metrics: 1 year |

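FR-011 and NFR-403 rely on a correlation ID that is created once and then carried through every step, and NFR-401 lists the fields each log line must contain. A minimal sketch with the `UUIDs` and `Dates` standard libraries; the helper names and the example sender and receiver names are hypothetical.

```julia
using UUIDs, Dates  # Julia standard libraries

# Reuse an incoming correlation ID when there is one; otherwise mint a UUIDv4.
new_correlation_id(existing::Union{Nothing,AbstractString} = nothing) =
    existing === nothing ? string(uuid4()) : String(existing)

# Build a structured log record carrying the NFR-401 fields.
function log_record(correlation_id, msg_id, sender_name, receiver_name,
                    payload_type, transport)
    return (correlation_id = correlation_id, msg_id = msg_id,
            timestamp = string(now(UTC)),   # ISO 8601 timestamp in UTC
            sender_name = sender_name, receiver_name = receiver_name,
            payload_type = payload_type, transport = transport)
end

cid = new_correlation_id()
rec = log_record(cid, string(uuid4()), "julia-worker", "svelte-ui", "binary", "direct")
@info "message sent" correlation_id = rec.correlation_id transport = rec.transport
```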
---

## 5. Acceptance Conditions

| Condition | Description |
|-----------|-------------|
| **AC-001** | All functional requirements FR-001 through FR-014 are implemented and tested |
| **AC-002** | All non-functional requirements NFR-101 through NFR-405 meet specified targets |
| **AC-003** | Cross-platform text message test passes (Julia ↔ JavaScript ↔ Python) |
| **AC-004** | Cross-platform tabular data test passes with Arrow IPC round-trip (Desktop) |
| **AC-005** | Cross-platform tabular data test passes with JSON table round-trip (Browser) |
| **AC-006** | Large file transfer test passes with file server upload/download |
| **AC-007** | Multi-payload mixed content test passes with all payload types in one message |
| **AC-008** | CI validation gates block PRs on specification violations |
| **AC-009** | Unit test coverage exceeds 80% |
| **AC-010** | Documentation is complete and includes walkthroughs, architecture, and runbook |

---

## 6. Payload Type Requirements

### 6.1 Supported Payload Types

| Type | Julia | JavaScript | Python | MicroPython | Description |
|------|-------|------------|--------|-------------|-------------|
| … | … | … | … | … | … |
| `video` | `Vector{UInt8}` | `Uint8Array`, `Buffer` | `bytes` | `bytearray` | Video binary data |
| `binary` | `Vector{UInt8}`, `IOBuffer` | `Uint8Array`, `Buffer` | `bytes`, `bytearray` | `bytearray` | Generic binary data |

### 6.2 Encoding Requirements

| Payload Type | Encoding Method | Notes |
|--------------|-----------------|-------|
| … | … | … |

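FR-012 lists Base64 among the encodings; it lets raw bytes travel inside a text (JSON) envelope at the cost of roughly one third more size. A minimal round-trip sketch with Julia's `Base64` standard library, assuming the payload is a byte vector; `encode_binary`, `decode_binary`, and `encoded_size` are hypothetical helper names.

```julia
using Base64  # Julia standard library

# Hypothetical helpers: raw payload bytes <-> Base64 text for a JSON envelope.
encode_binary(data::AbstractVector{UInt8}) = base64encode(data)
decode_binary(text::AbstractString) = base64decode(text)

# Padded Base64 emits 4 output characters per (possibly partial) 3-byte input group.
encoded_size(nbytes::Integer) = 4 * cld(nbytes, 3)

raw = UInt8[0x00, 0xff, 0x10, 0x80]
encoded = encode_binary(raw)
@assert decode_binary(encoded) == raw                  # lossless round trip
@assert length(encoded) == encoded_size(length(raw))   # 4 bytes -> 8 characters
```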

---

## 7. Size Threshold Requirements

### 7.1 Direct Transport Threshold

| Platform | Threshold | Notes |
|----------|-----------|-------|
| Desktop (Julia/JS/Python) | 0.5MB | Default size threshold |
| MicroPython | 100KB | Lower threshold for memory constraints |

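The thresholds in 7.1 reduce to a single size check at send time (FR-004, FR-005). A minimal sketch, assuming decimal units (0.5MB = 500,000 bytes, 100KB = 100,000 bytes) and that a payload exactly at the threshold goes via `link`; the `DIRECT_THRESHOLD_BYTES` table and `select_transport` are hypothetical, not the NATSBridge API.

```julia
# Hypothetical sketch of the 7.1 rule (FR-004, FR-005).
# Assumes decimal units and that sizes at or above the threshold use "link".
const DIRECT_THRESHOLD_BYTES = Dict(
    :desktop     => 500_000,  # Julia / JavaScript / Python: 0.5MB
    :micropython => 100_000,  # MicroPython: 100KB
)

# Returns "direct" (payload inside the NATS message) or "link" (file server upload).
function select_transport(nbytes::Integer; platform::Symbol = :desktop)
    threshold = DIRECT_THRESHOLD_BYTES[platform]   # KeyError for an unknown platform
    return nbytes < threshold ? "direct" : "link"
end
```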

### 7.2 Maximum Payload Size

| Platform | Maximum | Notes |
|----------|---------|-------|
| … | … | … |

---

## 8. Message Envelope Requirements

### 8.1 Required Fields

| Field | Type | Purpose |
|-------|------|---------|
| … | … | … |
| `metadata` | Dict | Message-level metadata |
| `payloads` | Array | List of payload objects |

### 8.2 Payload Fields

| Field | Type | Purpose |
|-------|------|---------|
| … | … | … |

---

## 9. Error Handling Requirements

### 9.1 Error Codes

| Error | Condition | Response |
|-------|-----------|----------|
| … | … | … |
| `Unknown transport` | Invalid transport type | Throw error |
| `NATS connection failed` | NATS unavailable | Throw error |

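On the receiving side, the `Unknown transport` row amounts to a dispatch on the envelope's transport value that throws on anything other than the two supported modes. A minimal sketch; `resolve_payload` and the injected `fetch_from_url` downloader are hypothetical, and the assumption that a `link` payload carries a file-server URL is illustrative only.

```julia
# Hypothetical receive-side dispatch (not NATSBridge API) for the 9.1 error rows.
# `fetch_from_url` stands in for the file-server download and is injected by the caller.
function resolve_payload(transport::AbstractString, data;
                         fetch_from_url = url -> error("no downloader configured"))
    if transport == "direct"
        return data                    # payload bytes arrived inside the NATS message
    elseif transport == "link"
        return fetch_from_url(data)    # assumed: `data` holds the file-server URL
    else
        error("Unknown transport: $transport")   # 9.1: invalid transport type -> throw
    end
end
```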

### 9.2 Exception Handling

| Scenario | Handler |
|----------|---------|
| … | … |

---

## 10. Testing Requirements

### 10.1 Unit Tests

| Test Category | Coverage | Files |
|---------------|----------|-------|
| … | … | … |
| File server upload | Plik integration | Platform-specific |
| File server download | Exponential backoff | Platform-specific |

### 10.2 Integration Tests

| Test Scenario | Success Criteria |
|---------------|------------------|
| … | … |

---

## 11. API Contract

### 11.1 smartsend Signature

```julia
function smartsend(
    # …
)::Tuple{msg_envelope_v1, String}
```

### 11.2 smartreceive Signature

```julia
function smartreceive(
    # …
```

---

## Dependencies

### Required Dependencies

| Platform | Package | Version |
|----------|---------|---------|
| Julia | NATS.jl | Latest stable |
| Julia | JSON.jl | Latest stable |
| Julia | Arrow.jl | Latest stable |
| Julia | HTTP.jl | Latest stable |
| Julia | UUIDs.jl | Latest stable |
| Node.js | nats | Latest stable |
| Node.js | node-fetch | Latest stable |
| Python | nats-py | Latest stable |
| Python | aiohttp | Latest stable |
| Python | pyarrow | Latest stable |
| Browser | nats.ws | Latest stable |

### Optional Dependencies

| Platform | Package | Use Case |
|----------|---------|----------|
| Julia | DataFrames.jl | DataFrame support for arrowtable |
| Python | pandas | DataFrame support for arrowtable |

---

## 12. Deployment Requirements

### 12.1 Minimum Infrastructure

| Component | Minimum | Notes |
|-----------|---------|-------|
| … | … | … |
| Client Memory | 50MB | Desktop platforms |
| Client Memory | 256KB | MicroPython devices |

### 12.2 Environment Variables

| Variable | Default | Description |
|----------|---------|-------------|
| … | … | … |

---

## 13. Versioning

### 13.1 Current Version

- **Major**: 1 (Breaking changes require major version bump)
- **Minor**: 0 (Feature additions)
- **Patch**: 0 (Bug fixes)

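The bump rules in 13.1 map directly onto Julia's built-in `VersionNumber` type. A minimal sketch; `next_version` and `is_compatible` are hypothetical helpers, and treating peers with the same major version as compatible is an assumption drawn from the rule that only breaking changes bump the major number.

```julia
# Hypothetical helpers applying the 13.1 rules with Base's VersionNumber.
function next_version(v::VersionNumber; breaking::Bool = false, feature::Bool = false)
    breaking && return VersionNumber(v.major + 1, 0, 0)       # breaking change
    feature  && return VersionNumber(v.major, v.minor + 1, 0)  # feature addition
    return VersionNumber(v.major, v.minor, v.patch + 1)        # bug fix
end

# Assumed rule: only a major bump breaks compatibility between peers.
is_compatible(a::VersionNumber, b::VersionNumber) = a.major == b.major

current = v"1.0.0"
next_version(current; feature = true)
```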

### 13.2 Version Compatibility

| Version | Supported Platforms |
|---------|---------------------|
| … | … |

---

## 14. Change Log

| Date | Version | Changes |
|------|---------|---------|
| 2026-03-13 | 1.0.0 | Initial requirements document |
| 2026-03-23 | 1.0.0 | Updated to ASG Framework requirements structure |

---

## 15. References

- [`src/NATSBridge.jl`](../src/NATSBridge.jl) - Ground truth implementation
- [`README.md`](../README.md) - Project overview
- [`docs/specification.md`](./specification.md) - Technical specification
- [`docs/ui-specification.md`](./ui-specification.md) - UI specification
- [`docs/walkthrough.md`](./walkthrough.md) - End-to-end walkthrough and usage examples
- [`docs/architecture.md`](./architecture.md) - Architecture documentation
- [`docs/implementation.md`](./implementation.md) - Implementation details
- [`docs/validation.md`](./validation.md) - Validation and CI/CD
- [`docs/runbook.md`](./runbook.md) - Operational runbook