diff --git a/AI_prompt.md b/AI_prompt.md index 4318e8c..9a57bcf 100644 --- a/AI_prompt.md +++ b/AI_prompt.md @@ -143,12 +143,19 @@ Since I develop src folder before I adopt SDD_FRAMEWORK.md approach, can you che # ---------------------------------------------- 100 --------------------------------------------- # +Check NATSBridge/docs folder I want to update the content of the following files according to ASG_Framework/ASG_Framework.md: +- NATSBridge/docs/requirements.md +- NATSBridge/docs/specification.md +- NATSBridge/docs/ui-specification.md (you'll need to create this one) +- NATSBridge/docs/walkthrough.md +- NATSBridge/docs/architecture.md +I'll do the other docs not listed here later myself. - + diff --git a/docs/SDD_FRAMEWORK.md b/docs/SDD_FRAMEWORK.md deleted file mode 100644 index f278dcf..0000000 --- a/docs/SDD_FRAMEWORK.md +++ /dev/null @@ -1,402 +0,0 @@ -# SDD + GitOps Documentation Framework - -This document defines the documentation framework for the NATSBridge project. It establishes a structured approach to creating, maintaining, and evolving technical documentation in alignment with GitOps principles—ensuring that documentation is versioned, auditable, and continuously validated alongside the codebase. - ---- - -## The SDD Framework: Seven Pillars of Documentation - -| Document | Purpose (Rationale) | Primary Audience | Format / Content | Example (SaaS Context) | Measurement (KPI) | -|----------|---------------------|-----------------|------------------|------------------------|-------------------| -| **Requirements** | Capture the **business intent** — why we're building this and what success looks like. Defines boundaries and user-visible outcomes. | Stakeholders, Product Owners, Lead Developers | User stories, PRDs, acceptance criteria, non-functional constraints. | "System must process tabular data from Julia to SvelteKit UI with <200ms latency for 5-member teams." | 95% of requests complete <200ms (synthetic monitoring). 
| -| **Specification** | The **technical contract** — precise rules for inputs, outputs, and data shape. Ensures consistency across dev and test. | Developers, QA Engineers, CI/CD pipelines | OpenAPI, Protobuf, AsyncAPI. Endpoint definitions, schemas, error codes. | `contract.yaml` defining a NATS subject that accepts Arrow streams with snake_case headers. | 100% of messages validated against spec (CI block rate). | -| **Architecture** | The **blueprint** — how components fit together, interact, and scale. Guides system structure and trade-offs. | Architects, Senior Developers, DevOps | C4 diagrams, Mermaid.js, component/network/storage models. | Diagram showing 6-node cluster routing traffic via Caddy → Node.js API → Julia pods. | 100% of major decisions logged with trade-off analysis. | -| **Walkthrough** | The **story of flow** — shows how pieces connect end-to-end and why steps are sequenced. Builds intuition for new devs. | New Developers, Team Members | TOUR.md, Loom videos, sequence diagrams. Step-by-step traces with rationale. | "UI sends JSON → Node.js wraps Claim-Check → Julia pulls Arrow data (prevents NATS overflow)." | New developers ship feature in <2 days (PR timeline). | -| **Implementation** | The **real code** — business logic, helpers, tests, configs. Where design becomes executable. | Developers, Code Reviewers | Source code, README.md, unit tests, setup scripts. | Julia function for matrix calculation + SvelteKit component rendering table. | >80% unit test coverage, <5% drift from spec. | -| **Validation** | The **enforcer** — ensures implementation matches the spec. Blocks drift and human error. | Automation servers, QA, Lead Developers | CI jobs, contract tests, linting, integration checks. | CI job rejects PR with camelCase field not allowed by YAML spec. | <1% of PRs bypass validation gates. | -| **Runbook** | The **operational manual** — how the system lives in production, scales, and recovers. Guides on-call engineers. 
| DevOps, SREs, On-call Developers | K8s manifests, Helm charts, Markdown guides. Deployment, scaling, backup/restore, troubleshooting. | GitOps manifest ensuring 6 Julia replicas restart if memory >80%. | MTTR <15 minutes for P1 incidents. | - ---- - -## Detailed Document Descriptions - -### 1. Requirements - -**Purpose**: Capture the *business intent* — why we're building this and what success looks like. Defines boundaries and user-visible outcomes. - -**Why It Matters**: -- Aligns engineering efforts with business goals -- Provides a north star for feature development -- Establishes acceptance criteria before implementation begins -- Creates a contract between product and engineering - -**Content Guidelines**: -- User stories with clear acceptance criteria (As a X, I want Y so that Z) -- Product Requirements Documents (PRDs) with success metrics -- Non-functional requirements (performance, security, scalability) -- Boundary definitions (what's in scope vs. out of scope) - -**Best Practices**: -- Link each requirement to a measurable KPI -- Keep requirements testable and verifiable -- Maintain backward compatibility with existing requirements -- Review and update requirements as business context changes - ---- - -### 2. Specification - -**Purpose**: The *technical contract* — precise rules for inputs, outputs, and data shape. Ensures consistency across dev and test. 
- -**Why It Matters**: -- Prevents implementation drift between components -- Enables contract testing in CI/CD pipelines -- Provides a single source of truth for data structures -- Facilitates integration between teams - -**Content Guidelines**: -- API endpoint definitions (methods, paths, parameters) -- Request/response schemas (JSON, XML, Protobuf, AsyncAPI) -- Error codes and their meanings -- Data validation rules and constraints -- Rate limiting and quota definitions - -**Best Practices**: -- Use formal specification languages (OpenAPI 3.0+, AsyncAPI) -- Version specifications alongside code -- Generate client SDKs from specifications -- Block CI on specification violations -- Document edge cases and error scenarios - ---- - -### 3. Architecture - -**Purpose**: The *blueprint* — how components fit together, interact, and scale. Guides system structure and trade-offs. - -**Why It Matters**: -- Provides a mental model for system design -- Guides technical decision-making and trade-off analysis -- Facilitates onboarding of new architects and senior developers -- Documents scaling and performance considerations - -**Content Guidelines**: -- C4 diagrams (Context, Container, Component levels) -- Mermaid.js flowcharts for sequence diagrams -- Component interaction diagrams -- Network topology and data flow -- Storage and caching strategies -- Scaling and resilience patterns - -**Best Practices**: -- Use diagrams that are easy to update (Mermaid.js over static images) -- Document trade-off decisions with Rationale Documents -- Include scaling considerations for each component -- Document failure modes and recovery strategies -- Keep architecture diagrams versioned with code - ---- - -### 4. Walkthrough - -**Purpose**: The *story of flow* — shows how pieces connect end-to-end and why steps are sequenced. Builds intuition for new devs. 
- -**Why It Matters**: -- Reduces onboarding time for new developers -- Provides context that code comments alone cannot convey -- Explains the "why" behind architectural decisions -- Helps identify gaps in the system design - -**Content Guidelines**: -- Step-by-step flow descriptions with rationale -- Sequence diagrams showing request/response patterns -- "Tour of the codebase" guides -- Video walkthroughs (Loom, internal recordings) -- Debugging and tracing examples - -**Best Practices**: -- Walk through real user journeys, not just technical flows -- Include "what could go wrong" scenarios -- Link walkthroughs to relevant code locations -- Keep walkthroughs updated with architecture changes -- Make walkthroughs interactive where possible - ---- - -### 5. Implementation - -**Purpose**: The *real code* — business logic, helpers, tests, configs. Where design becomes executable. - -**Why It Matters**: -- This is the actual artifact that runs in production -- Code is the ultimate source of truth (when it matches spec) -- Tests validate correctness and prevent regressions -- Configuration files define runtime behavior - -**Content Guidelines**: -- Business logic implementation -- Helper functions and utilities -- Unit and integration tests -- Configuration files (YAML, JSON, environment) -- Setup and development scripts -- Code organization and module structure - -**Best Practices**: -- Follow consistent code style and conventions -- Write tests before or alongside implementation (TDD/BDD) -- Document complex logic with inline comments -- Keep configuration externalized and versioned -- Use type annotations where applicable - ---- - -### 6. Validation - -**Purpose**: The *enforcer* — ensures implementation matches the spec. Blocks drift and human error. 
- -**Why It Matters**: -- Prevents breaking changes from reaching production -- Catches specification violations early in the CI pipeline -- Maintains data integrity and API consistency -- Reduces manual QA effort through automation - -**Content Guidelines**: -- CI/CD pipeline configurations -- Contract testing scripts -- Linting rules and configurations -- Integration test suites -- Schema validation jobs -- Security scanning and audit jobs - -**Best Practices**: -- Fail CI on specification violations -- Run validation jobs on every commit and PR -- Use automated code review tools -- Maintain validation job health dashboard -- Document validation failure remediation steps - ---- - -### 7. Runbook - -**Purpose**: The *operational manual* — how the system lives in production, scales, and recovers. Guides on-call engineers. - -**Why It Matters**: -- Reduces Mean Time To Recovery (MTTR) for incidents -- Provides step-by-step guidance for common issues -- Documents scaling and deployment procedures -- Ensures operational knowledge is not siloed - -**Content Guidelines**: -- Deployment procedures (manual and automated) -- Scaling instructions (horizontal/vertical) -- Backup and restore procedures -- Troubleshooting guides for common issues -- Runbook entries for specific error codes -- Contact information and escalation paths - -**Best Practices**: -- Write runbooks for every P1/P2 incident -- Include exact commands and configuration snippets -- Test runbooks periodically (chaos engineering) -- Link runbook entries to relevant documentation -- Keep runbooks updated when system changes - ---- - -## How to Use This Approach Effectively - -### 1. Start with Requirements - -Before writing any code or documentation, establish clear requirements. Ask: -- What business problem are we solving? -- How will we measure success? -- What are the non-negotiable constraints? - -**Action**: Create a `docs/requirements/` directory and start with `PRD.md` and `KPIs.md`. - -### 2. 
Define the Specification First - -Once requirements are stable, define the technical specification. This becomes the contract for implementation. - -**Action**: Create `docs/specification/` with `contract.yaml` (or appropriate format) and `error-codes.md`. - -### 3. Design the Architecture - -With requirements and specification in place, design the architecture. Document trade-off decisions explicitly. - -**Action**: Create `docs/architecture/` with Mermaid diagrams and `trade-offs.md`. - -### 4. Create Walkthroughs Early - -As soon as the architecture is defined, create walkthroughs. This helps identify gaps and provides onboarding material. - -**Action**: Create `docs/walkthrough/` with `TOUR.md` and sequence diagrams. - -### 5. Implement with Validation in Mind - -Write implementation code that adheres to the specification. Build validation into the CI pipeline from day one. - -**Action**: Ensure test files are co-located with implementation and run on every commit. - -### 6. Automate Validation - -Build automated validation that runs in CI/CD. This ensures spec compliance and prevents drift. - -**Action**: Configure CI jobs to validate against specification and block PRs on violations. - -### 7. Document Operations from Day One - -Create runbook entries as soon as deployment procedures are established. Update them when incidents occur. - -**Action**: Create `docs/runbook/` with entries for deployment, scaling, and common issues. 
- ---- - -## GitOps Integration - -This documentation framework aligns with GitOps principles: - -| GitOps Principle | Documentation Alignment | -|-----------------|------------------------| -| **Versioned** | All documentation lives in git, with history and audit trail | -| ** declarative** | Specifications and architecture are declarative contracts | -| **Automated** | Validation jobs automate spec compliance checks | -| **Self-Service** | Walkthroughs and runbooks enable self-service onboarding and operations | -| **Observability** | KPIs and metrics are defined for each documentation artifact | - -**Git Structure**: -``` -docs/ -├── requirements/ # PRDs, user stories, KPIs -├── specification/ # OpenAPI, Protobuf, AsyncAPI specs -├── architecture/ # C4 diagrams, Mermaid, trade-off docs -├── walkthrough/ # TOUR.md, sequence diagrams -├── implementation/ # Source code (in src/) -├── validation/ # CI configs, test suites -└── runbook/ # Deployment, scaling, troubleshooting -``` - ---- - -## Metrics and Continuous Improvement - -Each documentation artifact has associated KPIs. 
Track these to ensure quality: - -| Document | KPI | Target | -|----------|-----|--------| -| Requirements | Requirement coverage | 100% of features have associated requirements | -| Specification | Spec compliance rate | 100% of messages validate against spec | -| Architecture | Decision documentation | 100% of major decisions logged with trade-offs | -| Walkthrough | New dev time-to-first-PR | <2 days from onboarding to first contribution | -| Implementation | Test coverage | >80% unit test coverage | -| Validation | Bypass rate | <1% of PRs bypass validation gates | -| Runbook | MTTR | <15 minutes for P1 incidents | - -**Review Cadence**: -- Weekly: Review KPI dashboards and documentation gaps -- Monthly: Update documentation based on incident learnings -- Quarterly: Full framework review and improvement - ---- - -## Template Examples - -### Requirements Template -```markdown -# PRD: Feature Name - -## Business Goal -[What problem are we solving?] - -## Success Metrics -- [Metric 1]: Target [value] -- [Metric 2]: Target [value] - -## User Stories -- As a [role], I want [feature] so that [benefit] - - Acceptance Criteria: [details] - -## Non-Functional Requirements -- Performance: [details] -- Security: [details] -- Scalability: [details] - -## Out of Scope -- [What's explicitly excluded] -``` - -### Specification Template -```yaml -# contract.yaml -openapi: 3.0.0 -info: - title: NATSBridge API - version: 1.0.0 -paths: - /api/v1/endpoint: - post: - requestBody: - content: - application/json: - schema: - $ref: '#/components/schemas/Request' - responses: - '200': - description: Success - content: - application/json: - schema: - $ref: '#/components/schemas/Response' -``` - -### Architecture Template -```mermaid -%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#3b82f6'}}}%% -flowchart TD - A[Client] --> B[Caddy] - B --> C[Node.js API] - C --> D[Julia Worker] - D --> E[NATS Cluster] - E --> F[Storage] - - style A fill:#f9f9f9,stroke:#333 - style E 
fill:#e0e7ff,stroke:#3b82f6 -``` - -### Runbook Template -```markdown -# Runbook: Service Restart - -**Severity**: P2 -**Estimated Time**: 5 minutes - -## Symptoms -- Service is unresponsive -- Health checks are failing - -## Steps -1. SSH to the host -2. Run: `kubectl rollout restart deployment/natsbridge` -3. Monitor: `kubectl get pods -l app=natsbridge -w` - -## Rollback -- Run: `kubectl rollout undo deployment/natsbridge` - -## Post-Incident -- [ ] Review logs for root cause -- [ ] Update runbook if needed -``` - ---- - -## Conclusion - -This SDD + GitOps Documentation Framework ensures that documentation is: -- **Structured**: Seven distinct artifacts with clear purposes -- **Automated**: Validation and CI/CD integration -- **Versioned**: All documentation in git with history -- **Measurable**: KPIs for quality and effectiveness -- **Actionable**: Practical templates and examples - -Use this framework as a living document—update it as your team's needs evolve. \ No newline at end of file diff --git a/docs/requirements.md b/docs/requirements.md index a0cfaea..4efd51e 100644 --- a/docs/requirements.md +++ b/docs/requirements.md @@ -1,33 +1,34 @@ # Requirements Document: NATSBridge **Version**: 1.0.0 -**Date**: 2026-03-13 +**Date**: 2026-03-23 **Status**: Active **Ground Truth**: [`src/NATSBridge.jl`](../src/NATSBridge.jl) --- -## Executive Summary +## 1. Business Context & Success Metrics + +### 1.1 Business Goal NATSBridge is a cross-platform, bi-directional data bridge that enables seamless communication between **Julia**, **JavaScript**, **Python**, and **MicroPython** applications using NATS as the message bus. The system implements the **Claim-Check pattern** for efficient handling of large payloads (>0.5MB) by uploading them to an HTTP file server instead of sending raw binary data over NATS. 
---- +### 1.2 User Stories (with acceptance criteria) -## Business Goals +| Story | Priority | Acceptance Criteria | +|-------|----------|---------------------| +| **As a Julia developer**, I want to send text messages to JavaScript applications that run on a server or in a browser | P1 | Text messages are serialized, encoded, and received correctly across platforms | +| **As a Python developer**, I want to send tabular data to Julia applications | P1 | DataFrame exchange works with both Arrow IPC and JSON formats | +| **As a JavaScript developer**, I want to send large files (>0.5MB) from JavaScript applications that run on a server or in a browser to other applications | P1 | Large files are automatically uploaded to the file server and URLs are sent via NATS | +| **As a MicroPython developer**, I want to send sensor data with minimal memory usage | P1 | Direct transport works for payloads <100KB on memory-constrained devices | +| **As a developer**, I want to send mixed-content messages (text + image + file) | P1 | NATSBridge accepts a list of (dataname, data, type) tuples and handles each payload appropriately | +| **As a developer**, I want to receive multi-payload messages | P1 | NATSBridge returns payloads as a list of tuples with correct types preserved | +| **As a developer**, I want to use Plik as the file server | P2 | Plik one-shot upload mode is supported with upload ID and token handling | +| **As a developer**, I want to use custom HTTP file servers | P2 | Handler function abstraction allows plugging in AWS S3 or custom implementations | +| **As a developer**, I want automatic retry on file server download failures | P1 | Exponential backoff with configurable retries (default: 5, base_delay: 100ms, max_delay: 5000ms) | +| **As a developer**, I want message tracing across distributed systems | P1 | Correlation ID is propagated through all message processing steps | -### Primary Objectives - -1. 
**Cross-Platform Interoperability**: Enable seamless data exchange between Julia, JavaScript (for both Server-Side rendering and Client-Side rendering webapp), Python, and MicroPython applications without platform-specific barriers. - -2. **Efficient Large Payload Handling**: Implement intelligent transport selection based on payload size: - - **Direct Transport**: Small payloads (<0.5MB) sent directly via NATS - - **Link Transport**: Large payloads (≥0.5MB) uploaded to HTTP file server, URL sent via NATS - -3. **Unified API Across Platforms**: Provide consistent `smartsend()` and `smartreceive()` functions across all supported platforms while maintaining idiomatic implementations. - -4. **Developer Productivity**: Reduce onboarding time and simplify integration through comprehensive documentation and test examples. - -### Success Metrics +### 1.3 KPIs & Targets | Metric | Target | Measurement Method | |--------|--------|-------------------| @@ -40,77 +41,48 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless --- -## User Stories +## 2. 
Technical Boundaries -### Core Functionality +### 2.1 In Scope -| Story | Priority | Acceptance Criteria | -|-------|----------|---------------------| -| **As a Julia developer**, I want to send text messages to JavaScript applications that lives on a server and also on a browser | P1 | Text messages are serialized, encoded, and received correctly across platforms | -| **As a Python developer**, I want to send tabular data to Julia applications | P1 | DataFrame exchange works with both Arrow IPC and JSON formats | -| **As a JavaScript developer**, I want to send large files (>0.5MB) from JavaScript applications that lives on a server and also on a browser to other applications | P1 | Large files are automatically uploaded to file server and URLs are sent via NATS | -| **As a MicroPython developer**, I want to send sensor data with minimal memory usage | P1 | Direct transport works for payloads <100KB on memory-constrained devices | +| Feature | Description | +|---------|-------------| +| Cross-platform interoperability | Seamless data exchange between Julia, JavaScript, Python, and MicroPython | +| Intelligent transport selection | Direct transport (<0.5MB) vs Link transport (≥0.5MB) based on payload size | +| Unified API | Consistent `smartsend()` and `smartreceive()` functions across all platforms | +| Multi-payload support | List of (dataname, data, type) tuples with appropriate handling | +| File server integration | Plik one-shot upload and custom HTTP server support | +| Reliability features | Exponential backoff retry and correlation ID propagation | +| Message serialization | Converts data types to binary format (Base64, JSON, Arrow IPC) | +| NATS communication | Publishing and subscription via NATS subjects | -### Multi-Payload Support +### 2.2 Out of Scope -| Story | Priority | Acceptance Criteria | -|-------|----------|---------------------| -| **As a developer**, I want to send mixed-content messages (text + image + file) | P1 | NATSBridge accepts list 
of (dataname, data, type) tuples and handles each payload appropriately | -| **As a developer**, I want to receive multi-payload messages | P1 | NATSBridge returns payloads as list of tuples with correct types preserved | +| Feature | Reason | +|---------|--------| +| NATS JetStream support | Core NATS sufficient for current use cases | +| Message compression | Compression adds complexity without clear benefit | +| Message encryption | Payload encryption is application-layer concern | +| Persistent message queues | NATS request-reply pattern sufficient | +| Advanced routing rules | Simple NATS subject matching sufficient | -### File Server Integration +### 2.3 Dependencies -| Story | Priority | Acceptance Criteria | -|-------|----------|---------------------| -| **As a developer**, I want to use Plik as the file server | P2 | Plik one-shot upload mode is supported with upload ID and token handling | -| **As a developer**, I want to use custom HTTP file servers | P2 | Handler function abstraction allows plugging in AWS S3 or custom implementations | +| Platform | Package | Version | +|----------|---------|---------| +| Julia | NATS.jl | Latest stable | +| Julia | JSON.jl | Latest stable | +| Julia | Arrow.jl | Latest stable | +| Julia | HTTP.jl | Latest stable | +| Julia | UUIDs.jl | Latest stable | +| Node.js | nats | Latest stable | +| Node.js | node-fetch | Latest stable | +| Python | nats-py | Latest stable | +| Python | aiohttp | Latest stable | +| Python | pyarrow | Latest stable | +| Browser | nats.ws | Latest stable | -### Reliability Features - -| Story | Priority | Acceptance Criteria | -|-------|----------|---------------------| -| **As a developer**, I want automatic retry on file server download failures | P1 | Exponential backoff with configurable retries (default: 5, base_delay: 100ms, max_delay: 5000ms) | -| **As a developer**, I want message tracing across distributed systems | P1 | Correlation ID is propagated through all message processing steps | 
- ---- - -## Non-Functional Requirements - -### Performance Requirements - -| Requirement | Specification | Test Method | -|-------------|---------------|-------------| -| Message serialization overhead | <50ms for 10KB payload | Benchmark tests | -| Message deserialization overhead | <50ms for 10KB payload | Benchmark tests | -| NATS connection establishment | <100ms | Connection pool benchmarks | -| File upload latency | <1s for 0.5MB file | Integration tests | -| File download latency | <1s for 0.5MB file | Integration tests | - -### Scalability Requirements - -| Requirement | Specification | -|-------------|---------------| -| Concurrent connections | Support 100+ simultaneous NATS connections | -| Message throughput | Handle 1000+ messages/second per instance | -| File server scalability | Support horizontal scaling of file server backend | - -### Reliability Requirements - -| Requirement | Specification | -|-------------|---------------| -| Message delivery | At-least-once delivery semantics via NATS | -| File server availability | Graceful degradation when file server is unavailable | -| Connection recovery | Auto-reconnect on NATS connection failure | - -### Security Requirements - -| Requirement | Specification | -|-------------|---------------| -| Payload integrity | SHA-256 checksum support via metadata | -| Transport security | TLS support for NATS connections | -| File server security | Authentication token for file uploads | - -### Compatibility Requirements +### 2.4 Platform Compatibility | Platform | Minimum Version | Notes | |----------|-----------------|-------| @@ -122,61 +94,90 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless --- -## Out of Scope +## 3. 
Functional Requirements (FR) -### Phase 1 (Current Implementation) - -| Feature | Reason | -|---------|--------| -| NATS JetStream support | Core NATS sufficient for current use cases | -| Message compression | Compression adds complexity without clear benefit | -| Message encryption | Payload encryption is application-layer concern | -| Persistent message queues | NATS request-reply pattern sufficient | -| Advanced routing rules | Simple NATS subject matching sufficient | - -### Future Considerations - -| Feature | Future Phase | -|---------|--------------| -| JetStream streams and consumers | Phase 2 | -| Message TTL and dead-letter queues | Phase 3 | -| Message tracing with OpenTelemetry | Phase 3 | -| Rate limiting and quota management | Phase 4 | +| ID | Requirement | Description | +|----|-------------|-------------| +| **FR-001** | Cross-platform text messaging | System shall allow users to send text messages between Julia, JavaScript, Python, and MicroPython applications | +| **FR-002** | Cross-platform tabular data | System shall support DataFrame exchange between Julia and Python applications using Arrow IPC format | +| **FR-003** | Large file handling | System shall automatically detect payloads ≥0.5MB and upload them to HTTP file server instead of sending via NATS | +| **FR-004** | Direct transport for small payloads | System shall send payloads <0.5MB directly via NATS without file server upload | +| **FR-005** | MicroPython support | System shall support payloads <100KB on MicroPython devices using direct transport | +| **FR-006** | Multi-payload messages | System shall accept and process lists of (dataname, data, type) tuples | +| **FR-007** | Payload type preservation | System shall preserve payload types when returning multi-payload messages | +| **FR-008** | Plik file server integration | System shall support Plik one-shot upload mode with upload ID and token handling | +| **FR-009** | Custom file server support | System shall provide handler 
function abstraction for custom HTTP file server implementations |
+| **FR-010** | Exponential backoff retry | System shall implement exponential backoff with configurable retries (default: 5, base_delay: 100ms, max_delay: 5000ms) for file server download failures |
+| **FR-011** | Correlation ID propagation | System shall propagate correlation IDs through all message processing steps |
+| **FR-012** | Message serialization | System shall serialize data types using Base64, JSON, or Arrow IPC encoding |
+| **FR-013** | NATS publishing | System shall publish messages to NATS subjects |
+| **FR-014** | NATS subscription | System shall receive and process NATS messages |

 ---

-## Boundary Definitions
+## 4. Non-Functional Requirements (NFRs)

-### What NATSBridge Handles
+### 4.1 Performance & Scalability

-| Function | Description |
-|----------|-------------|
-| Message serialization | Converts data types to binary format |
-| Message encoding | Base64, JSON, Arrow IPC encoding |
-| Transport selection | Direct vs link based on size threshold |
-| NATS publishing | Publishes messages to NATS subjects |
-| NATS subscription | Receives and processes NATS messages |
-| File server upload | Uploads large payloads to HTTP server |
-| File server download | Downloads payloads from HTTP server with retry |
-| Correlation ID generation | Creates and propagates UUIDs |
-| Data deserialization | Converts binary format back to native types |
+| ID | Requirement | Specification | Test Method |
+|----|-------------|---------------|-------------|
+| **NFR-101** | Message serialization overhead | <50ms for 10KB payload | Benchmark tests |
+| **NFR-102** | Message deserialization overhead | <50ms for 10KB payload | Benchmark tests |
+| **NFR-103** | NATS connection establishment | <100ms | Connection pool benchmarks |
+| **NFR-104** | File upload latency | <1s for 0.5MB file | Integration tests |
+| **NFR-105** | File download latency | <1s for 0.5MB file | Integration tests |
+| **NFR-106** | Concurrent connections | Support 100+ simultaneous NATS connections | Scale testing |
+| **NFR-107** | Message throughput | Handle 1000+ messages/second per instance | Load testing |
+| **NFR-108** | File server scalability | Support horizontal scaling of file server backend | Architecture review |

-### What NATSBridge Does NOT Handle
+### 4.2 Availability & Reliability

-| Function | Handled By |
-|----------|------------|
-| NATS server management | External NATS deployment |
-| File server management | External HTTP server deployment |
-| Application business logic | Application code using NATSBridge |
-| Message encryption | Application layer |
-| Message compression | Application layer |
-| Authentication/Authorization | NATS server configuration |
+| ID | Requirement | Specification |
+|----|-------------|---------------|
+| **NFR-201** | Message delivery | At-least-once delivery semantics via NATS |
+| **NFR-202** | File server availability | Graceful degradation when file server is unavailable |
+| **NFR-203** | Connection recovery | Auto-reconnect on NATS connection failure |
+
+### 4.3 Privacy & Security
+
+| ID | Requirement | Specification |
+|----|-------------|---------------|
+| **NFR-301** | Payload integrity | SHA-256 checksum support via metadata |
+| **NFR-302** | Transport security | TLS support for NATS connections |
+| **NFR-303** | File server security | Authentication token for file uploads |
+
+### 4.4 Observability & Telemetry
+
+| ID | Requirement | Specification |
+|----|-------------|---------------|
+| **NFR-401** | Required logs | `correlation_id`, `msg_id`, `timestamp`, `sender_name`, `receiver_name`, `payload_type`, `transport` |
+| **NFR-402** | Critical metrics | `messages_sent_total`, `messages_received_total`, `file_upload_duration_seconds`, `file_download_duration_seconds`, `retry_attempts_total` |
+| **NFR-403** | Tracing | Correlation ID propagation for request tracing |
+| **NFR-404** | Alerting | `download_retry_exceeded` triggers alert when max retries exceeded |
+| **NFR-405** | Retention | Logs: 30 days, Metrics: 1 year |

 ---

-## Payload Type Requirements
+## 5. Acceptance Conditions

-### Supported Payload Types
+| Condition | Description |
+|-----------|-------------|
+| **AC-001** | All functional requirements FR-001 through FR-014 are implemented and tested |
+| **AC-002** | All non-functional requirements NFR-101 through NFR-405 meet specified targets |
+| **AC-003** | Cross-platform text message test passes (Julia ↔ JavaScript ↔ Python) |
+| **AC-004** | Cross-platform tabular data test passes with Arrow IPC round-trip (Desktop) |
+| **AC-005** | Cross-platform tabular data test passes with JSON table round-trip (Browser) |
+| **AC-006** | Large file transfer test passes with file server upload/download |
+| **AC-007** | Multi-payload mixed content test passes with all payload types in one message |
+| **AC-008** | CI validation gates block PRs on specification violations |
+| **AC-009** | Unit test coverage exceeds 80% |
+| **AC-010** | Documentation is complete and includes walkthroughs, architecture, and runbook |
+
+---
+
+## 6. Payload Type Requirements
+
+### 6.1 Supported Payload Types

 | Type | Julia | JavaScript | Python | MicroPython | Description |
 |------|-------|------------|--------|-------------|-------------|
@@ -189,7 +190,7 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless
 | `video` | `Vector{UInt8}` | `Uint8Array`, `Buffer` | `bytes` | `bytearray` | Video binary data |
 | `binary` | `Vector{UInt8}`, `IOBuffer` | `Uint8Array`, `Buffer` | `bytes`, `bytearray` | `bytearray` | Generic binary data |

-### Encoding Requirements
+### 6.2 Encoding Requirements

 | Payload Type | Encoding Method | Notes |
 |--------------|-----------------|-------|
@@ -201,16 +202,16 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless

 ---

-## Size Threshold Requirements
+## 7. Size Threshold Requirements

-### Direct Transport Threshold
+### 7.1 Direct Transport Threshold

 | Platform | Threshold | Notes |
 |----------|-----------|-------|
 | Desktop (Julia/JS/Python) | 0.5MB | Default size threshold |
 | MicroPython | 100KB | Lower threshold for memory constraints |

-### Maximum Payload Size
+### 7.2 Maximum Payload Size

 | Platform | Maximum | Notes |
 |----------|---------|-------|
@@ -219,9 +220,9 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless

 ---

-## Message Envelope Requirements
+## 8. Message Envelope Requirements

-### Required Fields
+### 8.1 Required Fields

 | Field | Type | Purpose |
 |-------|------|---------|
@@ -240,7 +241,7 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless
 | `metadata` | Dict | Message-level metadata |
 | `payloads` | Array | List of payload objects |

-### Payload Fields
+### 8.2 Payload Fields

 | Field | Type | Purpose |
 |-------|------|---------|
@@ -255,9 +256,9 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless

 ---

-## Error Handling Requirements
+## 9. Error Handling Requirements

-### Error Codes
+### 9.1 Error Codes

 | Error | Condition | Response |
 |-------|-----------|----------|
@@ -267,7 +268,7 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless
 | `Unknown transport` | Invalid transport type | Throw error |
 | `NATS connection failed` | NATS unavailable | Throw error |

-### Exception Handling
+### 9.2 Exception Handling

 | Scenario | Handler |
 |----------|---------|
@@ -278,9 +279,9 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless

 ---

-## Testing Requirements
+## 10. Testing Requirements

-### Unit Tests
+### 10.1 Unit Tests

 | Test Category | Coverage | Files |
 |---------------|----------|-------|
@@ -290,7 +291,7 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless
 | File server upload | Plik integration | Platform-specific |
 | File server download | Exponential backoff | Platform-specific |

-### Integration Tests
+### 10.2 Integration Tests

 | Test Scenario | Success Criteria |
 |-------------|-----------------|
@@ -302,9 +303,9 @@ NATSBridge is a cross-platform, bi-directional data bridge that enables seamless

 ---

-## API Contract
+## 11. API Contract

-### smartsend Signature
+### 11.1 smartsend Signature

 ```julia
 function smartsend(
@@ -328,7 +329,7 @@ function smartsend(
 )::Tuple{msg_envelope_v1, String}
 ```

-### smartreceive Signature
+### 11.2 smartreceive Signature

 ```julia
 function smartreceive(
@@ -342,36 +343,9 @@ function smartreceive(

 ---

-## Dependencies
+## 12. Deployment Requirements

-### Required Dependencies
-
-| Platform | Package | Version |
-|----------|---------|---------|
-| Julia | NATS.jl | Latest stable |
-| Julia | JSON.jl | Latest stable |
-| Julia | Arrow.jl | Latest stable |
-| Julia | HTTP.jl | Latest stable |
-| Julia | UUIDs.jl | Latest stable |
-| Node.js | nats | Latest stable |
-| Node.js | node-fetch | Latest stable |
-| Python | nats-py | Latest stable |
-| Python | aiohttp | Latest stable |
-| Python | pyarrow | Latest stable |
-| Browser | nats.ws | Latest stable |
-
-### Optional Dependencies
-
-| Platform | Package | Use Case |
-|----------|---------|----------|
-| Julia | DataFrames.jl | DataFrame support for arrowtable |
-| Python | pandas | DataFrame support for arrowtable |
-
----
-
-## Deployment Requirements
-
-### Minimum Infrastructure
+### 12.1 Minimum Infrastructure

 | Component | Minimum | Notes |
 |-----------|---------|-------|
@@ -380,7 +354,7 @@ function smartreceive(
 | Client Memory | 50MB | Desktop platforms |
 | Client Memory | 256KB | MicroPython devices |

-### Environment Variables
+### 12.2 Environment Variables

 | Variable | Default | Description |
 |----------|---------|-------------|
@@ -390,15 +364,15 @@ function smartreceive(

 ---

-## Versioning
+## 13. Versioning

-### Current Version
+### 13.1 Current Version

 - **Major**: 1 (Breaking changes require major version bump)
 - **Minor**: 0 (Feature additions)
 - **Patch**: 0 (Bug fixes)

-### Version Compatibility
+### 13.2 Version Compatibility

 | Version | Supported Platforms |
 |---------|---------------------|
@@ -406,18 +380,21 @@ function smartreceive(

 ---

-## Change Log
+## 14. Change Log

 | Date | Version | Changes |
 |------|---------|---------|
-| 2026-03-13 | 1.0.0 | Initial requirements document |
+| 2026-03-23 | 1.0.0 | Updated to ASG Framework requirements structure |

 ---

-## References
+## 15. References

 - [`src/NATSBridge.jl`](../src/NATSBridge.jl) - Ground truth implementation
 - [`README.md`](../README.md) - Project overview
+- [`docs/specification.md`](./specification.md) - Technical specification
+- [`docs/ui-specification.md`](./ui-specification.md) - UI specification
+- [`docs/walkthrough.md`](./walkthrough.md) - End-to-end walkthrough
 - [`docs/architecture.md`](./architecture.md) - Architecture documentation
-- [`docs/implementation.md`](./implementation.md) - Implementation details
-- [`docs/walkthrough.md`](./walkthrough.md) - Usage examples
\ No newline at end of file
+- [`docs/validation.md`](./validation.md) - Validation and CI/CD
+- [`docs/runbook.md`](./runbook.md) - Operational runbook
\ No newline at end of file
diff --git a/docs/spec.md b/docs/specification.md
similarity index 100%
rename from docs/spec.md
rename to docs/specification.md
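
FR-010 in the requirements hunk above fixes the retry defaults (5 retries, base delay 100 ms, max delay 5000 ms) but does not state the growth factor. The following is a minimal sketch of the delay schedule those defaults would produce, assuming the delay doubles on each attempt and is capped at the maximum. The function name `backoff_delays` and the doubling factor are illustrative assumptions, not taken from `src/NATSBridge.jl`.

```python
def backoff_delays(max_retries=5, base_delay_ms=100, max_delay_ms=5000):
    """Return the wait in milliseconds before each retry attempt.

    Hypothetical helper: FR-010 fixes only the retry count, base delay,
    and maximum delay; doubling per attempt is an assumption made here.
    """
    return [
        min(base_delay_ms * 2 ** attempt, max_delay_ms)  # double, then cap
        for attempt in range(max_retries)
    ]


if __name__ == "__main__":
    # With the FR-010 defaults the cap is never reached.
    print(backoff_delays())  # [100, 200, 400, 800, 1600]
```

Under these assumptions the 5000 ms cap only takes effect from the seventh attempt onward, so with the default of 5 retries the total wait before giving up would be 3100 ms.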