# Requirements Document: NATSBridge
**Version**: 1.0.0
**Date**: 2026-03-13
**Status**: Active
**Ground Truth**: [`src/NATSBridge.jl`](../src/NATSBridge.jl)

---
## Executive Summary
NATSBridge is a cross-platform, bidirectional data bridge that lets **Julia**, **JavaScript**, **Python**, and **MicroPython** applications exchange data using NATS as the message bus. It implements the **Claim-Check pattern** for large payloads (≥0.5MB): the payload is uploaded to an HTTP file server and only its URL is sent over NATS, instead of the raw binary data.

---
## Business Goals
### Primary Objectives
1. **Cross-Platform Interoperability**: Enable seamless data exchange between Julia, JavaScript (both server-side and client-side web applications), Python, and MicroPython applications without platform-specific barriers.
2. **Efficient Large Payload Handling**: Implement intelligent transport selection based on payload size:
   - **Direct Transport**: Small payloads (<0.5MB) sent directly via NATS
   - **Link Transport**: Large payloads (≥0.5MB) uploaded to HTTP file server, URL sent via NATS
3. **Unified API Across Platforms**: Provide consistent `smartsend()` and `smartreceive()` functions across all supported platforms while maintaining idiomatic implementations.
4. **Developer Productivity**: Reduce onboarding time and simplify integration through comprehensive documentation and test examples.
### Success Metrics
| Metric | Target | Measurement Method |
|--------|--------|-------------------|
| Messages completing within 200ms | 95% | Synthetic monitoring |
| Time from onboarding to first PR | <2 days | PR timeline tracking |
| Messages validating against spec | 100% | CI block rate |
| Unit test coverage | >80% | Test coverage tools |
| PRs bypassing validation gates | <1% | CI gate analysis |
| MTTR for P1 incidents | <15 minutes | Incident tracking |
---
## User Stories
### Core Functionality
| Story | Priority | Acceptance Criteria |
|-------|----------|---------------------|
| **As a Julia developer**, I want to send text messages to JavaScript applications running on a server or in a browser | P1 | Text messages are serialized, encoded, and received correctly across platforms |
| **As a Python developer**, I want to send tabular data to Julia applications | P1 | DataFrame exchange works with both Arrow IPC and JSON formats |
| **As a JavaScript developer**, I want to send large files (≥0.5MB) from JavaScript applications running on a server or in a browser to other applications | P1 | Large files are automatically uploaded to the file server and their URLs are sent via NATS |
| **As a MicroPython developer**, I want to send sensor data with minimal memory usage | P1 | Direct transport works for payloads <100KB on memory-constrained devices |
### Multi-Payload Support
| Story | Priority | Acceptance Criteria |
|-------|----------|---------------------|
| **As a developer**, I want to send mixed-content messages (text + image + file) | P1 | NATSBridge accepts list of (dataname, data, type) tuples and handles each payload appropriately |
| **As a developer**, I want to receive multi-payload messages | P1 | NATSBridge returns payloads as list of tuples with correct types preserved |
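
The `(dataname, data, type)` tuple shape in the stories above could be checked before sending. The following is a minimal sketch in Python, not the NATSBridge implementation: `validate_payloads` is a hypothetical helper, and the type names are copied from the Supported Payload Types table in this document.

```python
from typing import Any, List, Tuple

# Payload type names taken from the "Supported Payload Types" table.
SUPPORTED_TYPES = {
    "text", "dictionary", "arrowtable", "jsontable",
    "image", "audio", "video", "binary",
}

Payload = Tuple[str, Any, str]  # (dataname, data, type)


def validate_payloads(payloads: List[Payload]) -> List[Payload]:
    """Hypothetical helper: check each tuple's shape and type name."""
    for item in payloads:
        if len(item) != 3:
            raise ValueError(f"expected (dataname, data, type), got {item!r}")
        dataname, _data, payload_type = item
        if not isinstance(dataname, str) or not dataname:
            raise ValueError("dataname must be a non-empty string")
        if payload_type not in SUPPORTED_TYPES:
            raise ValueError(f"Unknown payload_type: {payload_type}")
    return payloads


# A mixed-content message: text + dictionary + image bytes.
mixed = [
    ("greeting", "hello", "text"),
    ("settings", {"retries": 5}, "dictionary"),
    ("thumbnail", b"\x89PNG", "image"),
]
validate_payloads(mixed)
```

Validating up front means an unsupported type fails before any payload is uploaded or published.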
### File Server Integration
| Story | Priority | Acceptance Criteria |
|-------|----------|---------------------|
| **As a developer**, I want to use Plik as the file server | P2 | Plik one-shot upload mode is supported with upload ID and token handling |
| **As a developer**, I want to use custom HTTP file servers | P2 | Handler function abstraction allows plugging in AWS S3 or custom implementations |
### Reliability Features
| Story | Priority | Acceptance Criteria |
|-------|----------|---------------------|
| **As a developer**, I want automatic retry on file server download failures | P1 | Exponential backoff with configurable retries (default: 5, base_delay: 100ms, max_delay: 5000ms) |
| **As a developer**, I want message tracing across distributed systems | P1 | Correlation ID is propagated through all message processing steps |
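
The retry defaults above (5 retries, 100ms base delay, 5000ms cap) can be illustrated with a short Python sketch. It assumes the delay doubles after each failure and that "5 retries" means one initial attempt plus up to five more; neither detail is stated in this document. `fetch_with_backoff` and `backoff_delays` are hypothetical names, and the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int = 5, base_delay_ms: int = 100,
                   max_delay_ms: int = 5000) -> List[int]:
    """Delay (ms) before each retry: doubles each time, capped at max_delay_ms."""
    return [min(base_delay_ms * 2 ** i, max_delay_ms) for i in range(max_retries)]


def fetch_with_backoff(fetch: Callable[[], T], max_retries: int = 5,
                       base_delay_ms: int = 100, max_delay_ms: int = 5000,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); on failure wait and retry, re-raising after the last retry."""
    delays = backoff_delays(max_retries, base_delay_ms, max_delay_ms)
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the original error
            sleep(delays[attempt] / 1000.0)
    raise RuntimeError("unreachable")  # loop always returns or raises
```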
---
## Non-Functional Requirements
### Performance Requirements
| Requirement | Specification | Test Method |
|-------------|---------------|-------------|
| Message serialization overhead | <50ms for 10KB payload | Benchmark tests |
| Message deserialization overhead | <50ms for 10KB payload | Benchmark tests |
| NATS connection establishment | <100ms | Connection pool benchmarks |
| File upload latency | <1s for 0.5MB file | Integration tests |
| File download latency | <1s for 0.5MB file | Integration tests |
### Scalability Requirements
| Requirement | Specification |
|-------------|---------------|
| Concurrent connections | Support 100+ simultaneous NATS connections |
| Message throughput | Handle 1000+ messages/second per instance |
| File server scalability | Support horizontal scaling of file server backend |
### Reliability Requirements
| Requirement | Specification |
|-------------|---------------|
| Message delivery | At-most-once delivery semantics via core NATS (at-least-once requires JetStream, which is out of scope for Phase 1) |
| File server availability | Graceful degradation when file server is unavailable |
| Connection recovery | Auto-reconnect on NATS connection failure |
### Security Requirements
| Requirement | Specification |
|-------------|---------------|
| Payload integrity | SHA-256 checksum support via metadata |
| Transport security | TLS support for NATS connections |
| File server security | Authentication token for file uploads |
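
One way to carry a SHA-256 checksum in payload metadata is sketched below in Python. The metadata key `sha256` and both helper names are assumptions made for illustration; this document does not specify how the checksum is stored.

```python
import hashlib


def attach_checksum(data: bytes, metadata: dict) -> dict:
    """Return a copy of metadata with a hex SHA-256 digest of data."""
    return {**metadata, "sha256": hashlib.sha256(data).hexdigest()}


def verify_checksum(data: bytes, metadata: dict) -> bool:
    """True when data matches the digest recorded in metadata."""
    expected = metadata.get("sha256")
    return expected is not None and hashlib.sha256(data).hexdigest() == expected


meta = attach_checksum(b"abc", {"source": "sensor-1"})
```

The receiver would call `verify_checksum` on the decoded bytes and treat a mismatch as a corrupted payload.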
### Compatibility Requirements
| Platform | Minimum Version | Notes |
|----------|-----------------|-------|
| Julia | 1.7+ | Arrow.jl required for arrowtable support |
| Node.js | 16+ | nats.js required, Arrow IPC supported |
| Python | 3.8+ | pyarrow required for arrowtable support |
| Browser | Latest | No Arrow IPC (uses jsontable only) |
| MicroPython | 1.19+ | Limited to direct transport |
---
## Out of Scope
### Phase 1 (Current Implementation)
| Feature | Reason |
|---------|--------|
| NATS JetStream support | Core NATS sufficient for current use cases |
| Message compression | Compression adds complexity without clear benefit |
| Message encryption | Payload encryption is application-layer concern |
| Persistent message queues | NATS request-reply pattern sufficient |
| Advanced routing rules | Simple NATS subject matching sufficient |
### Future Considerations
| Feature | Future Phase |
|---------|--------------|
| JetStream streams and consumers | Phase 2 |
| Message TTL and dead-letter queues | Phase 3 |
| Message tracing with OpenTelemetry | Phase 3 |
| Rate limiting and quota management | Phase 4 |
---
## Boundary Definitions
### What NATSBridge Handles
| Function | Description |
|----------|-------------|
| Message serialization | Converts data types to binary format |
| Message encoding | Base64, JSON, Arrow IPC encoding |
| Transport selection | Direct vs link based on size threshold |
| NATS publishing | Publishes messages to NATS subjects |
| NATS subscription | Receives and processes NATS messages |
| File server upload | Uploads large payloads to HTTP server |
| File server download | Downloads payloads from HTTP server with retry |
| Correlation ID generation | Creates and propagates UUIDs |
| Data deserialization | Converts binary format back to native types |
### What NATSBridge Does NOT Handle
| Function | Handled By |
|----------|------------|
| NATS server management | External NATS deployment |
| File server management | External HTTP server deployment |
| Application business logic | Application code using NATSBridge |
| Message encryption | Application layer |
| Message compression | Application layer |
| Authentication/Authorization | NATS server configuration |
---
## Payload Type Requirements
### Supported Payload Types
| Type | Julia | JavaScript | Python | MicroPython | Description |
|------|-------|------------|--------|-------------|-------------|
| `text` | `String` | `string` | `str` | `str` | Plain text strings |
| `dictionary` | `Dict`, `NamedTuple` | `Object`, `Array` | `dict`, `list` | `dict` | JSON-serializable data |
| `arrowtable` | `DataFrame`, `Arrow.Table` | ❌ (Browser), ✅ (Node.js) | `pandas.DataFrame` | ❌ | Tabular data (Arrow IPC) |
| `jsontable` | `Vector{NamedTuple}` | `Array<Object>` | `list[dict]` | ⚠️ | Tabular data (JSON) - **Only table type in Browser** |
| `image` | `Vector{UInt8}` | `Uint8Array`, `Buffer` | `bytes` | `bytearray` | Image binary data |
| `audio` | `Vector{UInt8}` | `Uint8Array`, `Buffer` | `bytes` | `bytearray` | Audio binary data |
| `video` | `Vector{UInt8}` | `Uint8Array`, `Buffer` | `bytes` | `bytearray` | Video binary data |
| `binary` | `Vector{UInt8}`, `IOBuffer` | `Uint8Array`, `Buffer` | `bytes`, `bytearray` | `bytearray` | Generic binary data |
### Encoding Requirements
| Payload Type | Encoding Method | Notes |
|--------------|-----------------|-------|
| `text` | UTF-8 → Base64 | Text must be String type |
| `dictionary` | JSON → Base64 | JSON.jl for Julia |
| `arrowtable` | Arrow IPC → Base64 | Requires Arrow.jl/pyarrow (Desktop only) |
| `jsontable` | JSON → Base64 | Human-readable format - **Browser uses this only** |
| `image`/`audio`/`video`/`binary` | Direct → Base64 | Binary data preserved |
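
The encoding table above maps each payload type to a serialization step followed by Base64. A minimal Python sketch of that mapping follows; `encode_payload` and `decode_payload` are hypothetical names, and `arrowtable` is deliberately left out because it needs pyarrow. It is not the NATSBridge implementation.

```python
import base64
import json

_JSON_TYPES = ("dictionary", "jsontable")
_BINARY_TYPES = ("image", "audio", "video", "binary")


def encode_payload(data, payload_type: str) -> str:
    """Serialize data to bytes per payload type, then Base64-encode."""
    if payload_type == "text":
        if not isinstance(data, str):
            raise TypeError("text payloads must be str")
        raw = data.encode("utf-8")
    elif payload_type in _JSON_TYPES:
        raw = json.dumps(data).encode("utf-8")
    elif payload_type in _BINARY_TYPES:
        raw = bytes(data)  # accepts bytes and bytearray
    elif payload_type == "arrowtable":
        raise NotImplementedError("arrowtable needs pyarrow; omitted here")
    else:
        raise ValueError(f"Unknown payload_type: {payload_type}")
    return base64.b64encode(raw).decode("ascii")


def decode_payload(encoded: str, payload_type: str):
    """Reverse encode_payload for the types handled above."""
    raw = base64.b64decode(encoded)
    if payload_type == "text":
        return raw.decode("utf-8")
    if payload_type in _JSON_TYPES:
        return json.loads(raw)
    if payload_type in _BINARY_TYPES:
        return raw
    raise ValueError(f"Unknown payload_type: {payload_type}")
```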
---
## Size Threshold Requirements
### Direct Transport Threshold
| Platform | Threshold | Notes |
|----------|-----------|-------|
| Desktop (Julia/JS/Python) | 0.5MB | Default size threshold |
| MicroPython | 100KB | Lower threshold for memory constraints |
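
Transport selection by size reduces to a single comparison. The Python sketch below assumes "0.5MB" means 500,000 bytes and "100KB" means 100,000 bytes; the document does not say whether decimal or binary units are intended. The threshold stays a parameter because the `smartsend` signature in this document lists a default `size_threshold` of `1_000_000`.

```python
DESKTOP_THRESHOLD = 500_000      # assumption: "0.5MB" read as 500,000 bytes
MICROPYTHON_THRESHOLD = 100_000  # assumption: "100KB" read as 100,000 bytes


def select_transport(size_bytes: int, threshold: int = DESKTOP_THRESHOLD) -> str:
    """Return "direct" below the threshold, "link" at or above it."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "direct" if size_bytes < threshold else "link"
```

A payload exactly at the threshold goes to link transport, matching the "<0.5MB direct, ≥0.5MB link" rule under Business Goals.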
### Maximum Payload Size
| Platform | Maximum | Notes |
|----------|---------|-------|
| Desktop | No fixed limit | Bounded in practice by NATS server configuration |
| MicroPython | 50KB | Hard limit due to 256KB-1MB of device memory |
---
## Message Envelope Requirements
### Required Fields
| Field | Type | Purpose |
|-------|------|---------|
| `correlation_id` | String (UUID) | Track message flow across systems |
| `msg_id` | String (UUID) | Unique message identifier |
| `timestamp` | String (ISO 8601) | Message publication timestamp |
| `send_to` | String | NATS subject to publish to |
| `msg_purpose` | String | Message purpose (ACK, NACK, updateStatus, shutdown, chat) |
| `sender_name` | String | Sender application name |
| `sender_id` | String (UUID) | Sender unique identifier |
| `receiver_name` | String | Receiver application name (empty = broadcast) |
| `receiver_id` | String (UUID) | Receiver unique identifier (empty = broadcast) |
| `reply_to` | String | Topic for reply messages |
| `reply_to_msg_id` | String | Message ID being replied to |
| `broker_url` | String | NATS server URL |
| `metadata` | Dict | Message-level metadata |
| `payloads` | Array | List of payload objects |
### Payload Fields
| Field | Type | Purpose |
|-------|------|---------|
| `id` | String (UUID) | Unique payload identifier |
| `dataname` | String | Name of the payload |
| `payload_type` | String | Type: text, dictionary, arrowtable, etc. |
| `transport` | String | direct or link |
| `encoding` | String | none, json, base64, arrow-ipc |
| `size` | Integer | Payload size in bytes |
| `data` | Any | Base64 string or URL |
| `metadata` | Dict | Payload-level metadata |
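
To make the two field tables concrete, the Python sketch below assembles an envelope with one direct text payload and serializes it to JSON. `build_envelope` is a hypothetical helper, the subject `chat.room1` is made up, and `size` is assumed to count the raw bytes before Base64 encoding. The defaults mirror those in the `smartsend` signature in this document.

```python
import json
import uuid
from datetime import datetime, timezone


def build_envelope(send_to, payloads, sender_name="NATSBridge",
                   msg_purpose="chat", correlation_id=None):
    """Hypothetical helper: fill every required envelope field."""
    return {
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "msg_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601
        "send_to": send_to,
        "msg_purpose": msg_purpose,
        "sender_name": sender_name,
        "sender_id": str(uuid.uuid4()),
        "receiver_name": "",   # empty = broadcast
        "receiver_id": "",     # empty = broadcast
        "reply_to": "",
        "reply_to_msg_id": "",
        "broker_url": "nats://localhost:4222",
        "metadata": {},
        "payloads": payloads,
    }


payload = {
    "id": str(uuid.uuid4()),
    "dataname": "greeting",
    "payload_type": "text",
    "transport": "direct",
    "encoding": "base64",
    "size": 5,                 # assumption: raw byte length of "hello"
    "data": "aGVsbG8=",        # Base64 of "hello"
    "metadata": {},
}
envelope = build_envelope("chat.room1", [payload])
wire = json.dumps(envelope)    # JSON text that would be published
```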
---
## Error Handling Requirements
### Error Codes
| Error | Condition | Response |
|-------|-----------|----------|
| `Unknown payload_type` | Unsupported type | Throw error |
| `Failed to upload` | File server error | Throw error |
| `Failed to fetch` | File server unavailable | Retry with exponential backoff |
| `Unknown transport` | Invalid transport type | Throw error |
| `NATS connection failed` | NATS unavailable | Throw error |
### Exception Handling
| Scenario | Handler |
|----------|---------|
| File server unavailable | Retry up to 5 times with exponential backoff |
| NATS publish failure | Connection auto-reconnect |
| Deserialization error | Log correlation ID and throw error |
| Memory overflow (MicroPython) | Reject payloads >50KB |
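
The MicroPython size guard in the last row could look like the Python sketch below. It assumes "50KB" means 50,000 bytes; `PayloadTooLargeError` and `check_payload_size` are hypothetical names that do not come from NATSBridge.

```python
MICROPYTHON_MAX_BYTES = 50_000  # assumption: "50KB" read as 50,000 bytes


class PayloadTooLargeError(ValueError):
    """Raised when a payload exceeds the device's hard size limit."""


def check_payload_size(data: bytes, limit: int = MICROPYTHON_MAX_BYTES) -> int:
    """Return the payload size, or raise if it is larger than the limit."""
    size = len(data)
    if size > limit:
        raise PayloadTooLargeError(
            f"payload is {size} bytes; limit is {limit} bytes"
        )
    return size
```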
---
## Testing Requirements
### Unit Tests
| Test Category | Coverage | Files |
|---------------|----------|-------|
| Serialization | All payload types | `test/test_*_sender.*` |
| Deserialization | All payload types | `test/test_*_receiver.*` |
| Transport selection | Direct vs link | `test/test_*_mix_payloads.*` |
| File server upload | Plik integration | Platform-specific |
| File server download | Exponential backoff | Platform-specific |
### Integration Tests
| Test Scenario | Success Criteria |
|-------------|-----------------|
| Cross-platform text message | Julia ↔ JavaScript ↔ Python |
| Cross-platform tabular data (Desktop) | Arrow IPC round-trip |
| Cross-platform tabular data (Browser) | JSON table round-trip |
| Large file transfer | File server upload/download |
| Multi-payload mixed content | All payload types in one message |
---
## API Contract
### smartsend Signature
```julia
function smartsend(
    subject::String,
    data::AbstractArray{Tuple{String, Any, String}};
    broker_url::String = "nats://localhost:4222",
    fileserver_url::String = "http://localhost:8080",
    fileserver_upload_handler::Function = plik_oneshot_upload,
    size_threshold::Int = 1_000_000,
    correlation_id::String = string(uuid4()),
    msg_purpose::String = "chat",
    sender_name::String = "NATSBridge",
    receiver_name::String = "",
    receiver_id::String = "",
    reply_to::String = "",
    reply_to_msg_id::String = "",
    is_publish::Bool = true,
    NATS_connection::Union{NATS.Connection, Nothing} = nothing,
    msg_id::String = string(uuid4()),
    sender_id::String = string(uuid4())
)::Tuple{msg_envelope_v1, String}
```
### smartreceive Signature
```julia
function smartreceive(
    msg::NATS.Msg;
    fileserver_download_handler::Function = _fetch_with_backoff,
    max_retries::Int = 5,
    base_delay::Int = 100,
    max_delay::Int = 5000
)::JSON.Object{String, Any}
```
---
## Dependencies
### Required Dependencies
| Platform | Package | Version |
|----------|---------|---------|
| Julia | NATS.jl | Latest stable |
| Julia | JSON.jl | Latest stable |
| Julia | Arrow.jl | Latest stable |
| Julia | HTTP.jl | Latest stable |
| Julia | UUIDs (standard library) | Bundled with Julia |
| Node.js | nats | Latest stable |
| Node.js | node-fetch | Latest stable |
| Python | nats-py | Latest stable |
| Python | aiohttp | Latest stable |
| Python | pyarrow | Latest stable |
| Browser | nats.ws | Latest stable |
### Optional Dependencies
| Platform | Package | Use Case |
|----------|---------|----------|
| Julia | DataFrames.jl | DataFrame support for arrowtable |
| Python | pandas | DataFrame support for arrowtable |
---
## Deployment Requirements
### Minimum Infrastructure
| Component | Minimum | Notes |
|-----------|---------|-------|
| NATS Server | 1 instance | Single node for development |
| File Server | 1 instance | HTTP server for large payloads |
| Client Memory | 50MB | Desktop platforms |
| Client Memory | 256KB | MicroPython devices |
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `NATS_URL` | `nats://localhost:4222` | NATS server URL |
| `FILESERVER_URL` | `http://localhost:8080` | HTTP file server URL |
| `SIZE_THRESHOLD` | `1000000` | Size threshold in bytes |
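
Reading the three variables above with their documented defaults might look like this in Python. `load_config` is a hypothetical helper, and the optional `env` argument is there only so the function can be exercised without touching the real process environment.

```python
import os


def load_config(env=None) -> dict:
    """Read NATSBridge settings from the environment, applying the defaults above."""
    env = os.environ if env is None else env
    return {
        "nats_url": env.get("NATS_URL", "nats://localhost:4222"),
        "fileserver_url": env.get("FILESERVER_URL", "http://localhost:8080"),
        "size_threshold": int(env.get("SIZE_THRESHOLD", "1000000")),  # bytes
    }
```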
---
## Versioning
### Current Version
- **Major**: 1 (Breaking changes require major version bump)
- **Minor**: 0 (Feature additions)
- **Patch**: 0 (Bug fixes)
### Version Compatibility
| Version | Supported Platforms |
|---------|---------------------|
| v1.0.x | Julia 1.7+, Node.js 16+, Python 3.8+, Browser (latest), MicroPython 1.19+ |
---
## Change Log
| Date | Version | Changes |
|------|---------|---------|
| 2026-03-13 | 1.0.0 | Initial requirements document |
---
## References
- [`src/NATSBridge.jl`](../src/NATSBridge.jl) - Ground truth implementation
- [`README.md`](../README.md) - Project overview
- [`docs/architecture.md`](./architecture.md) - Architecture documentation
- [`docs/implementation.md`](./implementation.md) - Implementation details
- [`docs/walkthrough.md`](./walkthrough.md) - Usage examples