# Walkthrough: NATSBridge **Version**: 1.0.0 **Date**: 2026-03-13 **Status**: Active **Ground Truth**: [`src/NATSBridge.jl`](../src/NATSBridge.jl) --- ## Executive Summary This document provides the **story of flow** for NATSBridge - the cross-platform bi-directional data bridge that enables seamless communication between **Julia**, **JavaScript**, **Python**, and **MicroPython** applications using NATS as the message bus. This walkthrough serves as the primary onboarding guide for new developers and explains: - **User scenarios** - Real-world use cases from developer perspective - **Why steps are sequenced** - The rationale behind architectural decisions - **What could go wrong** - Common failure scenarios and recovery strategies --- ## Overview: The Big Picture NATSBridge implements the **Claim-Check pattern** for efficient handling of large payloads (>0.5MB): ``` ┌─────────────────────────────────────────────────────────────────────┐ │ NATSBridge Architecture │ ├─────────────────────────────────────────────────────────────────────┤ │ │ │ ┌──────────────┐ ┌──────────────┐ │ │ │ Sender │ │ Receiver │ │ │ │ │ │ │ │ │ │ ┌──────────┐ │ │ ┌──────────┐ │ │ │ │ │smartsend │◀─────────┤ │smartreceive│ │ │ │ │ └────┬─────┘ │ │ └────┬─────┘ │ │ │ │ │ │ │ │ │ │ │ │ ▼ │ │ ▼ │ │ │ │ ┌──────────┐ │ │ ┌──────────┐ │ │ │ │ │Serialize │◀─────────┤ │Deserialize│ │ │ │ │ └────┬─────┘ │ │ └────┬─────┘ │ │ │ │ │ │ │ │ │ │ │ │ ▼ │ │ ▼ │ │ │ │ ┌──────────┐ │ │ ┌──────────┐ │ │ │ │ │Transport │◀─────────┤ │Transport │ │ │ │ │ │Selection │ │ │ │Selection │ │ │ │ │ └────┬─────┘ │ │ └────┬─────┘ │ │ │ │ │ │ │ │ │ │ │ │ ▼ │ │ ▼ │ │ │ │ ┌──────────┐ │ │ ┌──────────┐ │ │ │ │ │ NATS │◀─────────┤ │ NATS │ │ │ │ │ │Publish │ │ │ │Subscribe │ │ │ │ │ └──────────┘ │ │ └──────────┘ │ │ │ │ │ │ │ │ │ │ ┌──────────┐ │ │ ┌──────────┐ │ │ │ │ │File Server│◀─────────┤ │File Server│ │ │ │ │ │Upload │ │ │ │Download │ │ │ │ │ └──────────┘ │ │ └──────────┘ │ │ │ └──────────────┘ └──────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────┘ ``` ### Key Design Principles | Principle | Description | Rationale | |-----------|-------------|-----------| | **Claim-Check Pattern** | Large payloads uploaded to HTTP server, URL sent via NATS | NATS has message size limits; avoids NATS overflow | | **Automatic Transport Selection** | Direct (< threshold) vs Link (≥ threshold) based on size | Optimizes memory vs network I/O trade-off | | **Cross-Platform API** | Consistent `smartsend()`/`smartreceive()` across all platforms | Simplifies developer experience | | **Exponential Backoff** | Retry downloads with increasing delays | Handles transient failures gracefully | --- ## User Scenario 1: Chat Webapp ↔ Julia Backend ### Scenario Description A JavaScript chat webapp wants to send mixed payloads (text message + user avatar image) to a Julia backend, and receive mixed payloads (text response + AI-generated image) back. ### Step-by-Step Flow #### Step 1: JavaScript Webapp Sends Mixed Payloads ```javascript // JavaScript (Browser or Node.js) const [env, msgJson] = await NATSBridge.smartsend( "/agent/wine/api/v1/prompt", [ ["msg", "Hello! I'm Ton.", "text"], ["avatar", avatarImageData, "image"] ], { broker_url: "ws://localhost:4222", receiver_name: "agent-backend", msg_purpose: "chat" } ); ``` **Rationale**: - **Why mixed payloads?** Real chat apps often send both text and images together - **Why text first?** Text is smaller, sent via direct transport (fast, no file server needed) - **Why image second?** Images may trigger link transport if >0.5MB #### Step 2: Transport Selection For each payload, NATSBridge determines transport: | Payload | Size | Transport | Reason | |---------|------|-----------|--------| | `"msg"` (text) | ~20 bytes | direct | < 0.5MB threshold | | `"avatar"` (image) | ~150KB | direct | < 0.5MB threshold | **Rationale**: - Direct transport is faster for small payloads (no file server round-trip) - Link transport is used when payload ≥ 0.5MB (avoids NATS size limits) #### Step 3: Serialization and Encoding Each payload is serialized: | Payload | Type | Serialization | Encoding | |---------|------|---------------|----------| | `"msg"` | `text` | UTF-8 bytes | Base64 | | `"avatar"` | `image` | Raw bytes | Base64 | **Rationale**: - Text uses UTF-8 encoding for human-readable data - Images use raw bytes to preserve binary data integrity - All payloads encoded as Base64 for JSON compatibility #### Step 4: Envelope Building NATSBridge builds the message envelope: ```json { "correlation_id": "a1b2c3d4...", "msg_id": "e5f6g7h8...", "timestamp": "2026-03-13T16:30:00.000Z", "send_to": "/agent/wine/api/v1/prompt", "msg_purpose": "chat", "sender_name": "chat-webapp", "sender_id": "sender-uuid...", "receiver_name": "agent-backend", "receiver_id": "", "reply_to": "/agent/wine/api/v1/response", "reply_to_msg_id": "", "broker_url": "ws://localhost:4222", "metadata": {}, "payloads": [ { "id": "payload-uuid...", "dataname": "msg", "payload_type": "text", "transport": "direct", "encoding": "base64", "size": 20, "data": "SGVsbG8hIEknIHRlbCB5b3UgSW4gZW5nbGlzaC4=", "metadata": {"payload_bytes": 20} }, { "id": "payload-uuid...", "dataname": "avatar", "payload_type": "image", "transport": "direct", "encoding": "base64", "size": 150000, "data": "iVBORw0KGgoAAAANSUhEUgAA...", "metadata": {"payload_bytes": 150000} } ] } ``` **Rationale**: - **correlation_id**: Tracks this chat session across all systems - **reply_to**: Tells backend where to send response - **payloads array**: Contains all data with metadata for proper handling #### Step 5: Publish to NATS ```javascript await NATSBridge.NATSClient.connect("ws://localhost:4222"); await NATSBridge.NATSClient.publish("/agent/wine/api/v1/prompt", msgJson); ``` **Rationale**: - NATS provides low-latency message delivery - JSON format ensures cross-platform compatibility #### Step 6: Julia Backend Receives Message ```julia # Julia backend msg = NATS.subscription.next() # Get message from NATS env = smartreceive(msg) # env["payloads"] is now: # [ # ("msg", "Hello! I'm Ton.", "text"), # ("avatar", binary_data, "image") # ] ``` **Rationale**: - `smartreceive()` handles both transport types automatically - Deserialization is type-aware based on `payload_type` - Returns consistent tuple format regardless of transport #### Step 7: Julia Backend Sends Response ```julia # Julia backend processes the message response_text = "Hello Ton! I'm the AI assistant." generated_image = generate_ai_image(response_text) env, msg_json = smartsend( "/agent/wine/api/v1/response", [ ("response", response_text, "text"), ("generated_image", generated_image, "image") ], reply_to = "/chat/user/v1/message", reply_to_msg_id = msg["msg_id"] ) ``` **Rationale**: - **Mixed response**: Text explanation + AI-generated image - **reply_to**: Ensures response goes to correct topic - **reply_to_msg_id**: Links response to original message for tracing --- ## User Scenario 2: Large File Transfer ### Scenario Description A JavaScript webapp wants to upload a large file (10MB) to a Julia backend for processing. ### Step-by-Step Flow #### Step 1: JavaScript Webapp Sends Large File ```javascript const [env, msgJson] = await NATSBridge.smartsend( "/agent/wine/api/v1/process", [ ["file", largeFileData, "binary"] ], { broker_url: "ws://localhost:4222", receiver_name: "agent-backend" } ); ``` #### Step 2: Transport Selection (Link) | Payload | Size | Transport | Reason | |---------|------|-----------|--------| | `"file"` | 10MB | link | ≥ 0.5MB threshold | **Rationale**: - Link transport used for large payloads - File server handles large file upload - NATS only sends URL (small message) #### Step 3: File Server Upload ```javascript // NATSBridge internally calls: const response = await plikOneshotUpload( "http://localhost:8080", "file", largeFileData ); // Response: // { // status: 200, // uploadid: "UPLOAD_ID", // fileid: "FILE_ID", // url: "http://localhost:8080/file/UPLOAD_ID/FILE_ID/file" // } ``` **Rationale**: - Plik handles multipart upload - One-shot mode simplifies API - Returns URL for download #### Step 4: Envelope with Link Transport ```json { "correlation_id": "a1b2c3d4...", "payloads": [ { "id": "payload-uuid...", "dataname": "file", "payload_type": "binary", "transport": "link", "encoding": "none", "size": 10000000, "data": "http://localhost:8080/file/UPLOAD_ID/FILE_ID/file", "metadata": {} } ] } ``` **Rationale**: - `data` field contains URL instead of Base64 - `transport: "link"` signals URL-based download - `encoding: "none"` indicates no additional encoding #### Step 5: Julia Backend Receives and Downloads ```julia # Julia backend msg = NATS.subscription.next() env = smartreceive(msg) # NATSBridge automatically: # 1. Extracts URL from payload # 2. Downloads with exponential backoff # 3. Deserializes to binary data ``` **Rationale**: - Exponential backoff handles transient failures - Automatic download simplifies receiver code - Binary data returned directly --- ## User Scenario 3: Tabular Data Exchange ### Scenario Description A Python application sends tabular data (pandas DataFrame) to a Julia backend for analysis, and receives processed results back. ### Step-by-Step Flow #### Step 1: Python Sends Tabular Data ```python # Python import pandas as pd from natsbridge import smartsend df = pd.DataFrame({ "id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"], "score": [95, 88, 92] }) env, msg_json = await smartsend( "/agent/wine/api/v1/analyze", [("data", df, "arrowtable")], broker_url="nats://localhost:4222", receiver_name="agent-backend" ) ``` **Rationale**: - `arrowtable` type for efficient tabular data transfer - Arrow IPC format preserves data types - Much faster than JSON serialization #### Step 2: Serialization to Arrow IPC ```python # NATSBridge internally: import pyarrow as pa import pyarrow.ipc as ipc table = pa.Table.from_pandas(df) buf = io.BytesIO() sink = ipc.new_file(buf, table.schema) ipc.write_table(table, sink) arrow_bytes = buf.getvalue() ``` **Rationale**: - Arrow IPC preserves column types - Binary format is compact - No schema information loss #### Step 3: Julia Receives and Deserializes ```julia # Julia backend msg = NATS.subscription.next() env = smartreceive(msg) # env["payloads"][1] is now: # ("data", DataFrame with id, name, score columns, "arrowtable") ``` **Rationale**: - Arrow.jl reads IPC format directly - DataFrame returned with correct types - No manual parsing needed #### Step 4: Julia Sends Results ```julia # Julia backend results = analyze_data(env["payloads"][1][2]) # Send results back env, msg_json = smartsend( "/agent/wine/api/v1/results", [("results", results, "arrowtable")], reply_to = "/python/worker/v1/results" ) ``` **Rationale**: - Arrow IPC format for efficient round-trip - Results preserve DataFrame structure - Python can deserialize to pandas DataFrame --- ## User Scenario 4: MicroPython Device ### Scenario Description A MicroPython sensor device sends sensor readings to a Python backend. ### Step-by-Step Flow #### Step 1: MicroPython Sends Sensor Data ```python # MicroPython from natsbridge import smartsend sensor_data = { "temperature": 25.5, "humidity": 60.0, "pressure": 1013.25 } env, msg_json = smartsend( "/sensor/device/v1/readings", [("data", sensor_data, "dictionary")], broker_url="nats://localhost:4222", size_threshold=100000 # 100KB for MicroPython ) ``` **Rationale**: - `dictionary` type for JSON-serializable sensor data - Smaller threshold (100KB) for memory constraints - Direct transport only (no file server support) #### Step 2: Serialization ```python # NATSBridge internally: json_str = json.dumps(sensor_data) json_bytes = json_str.encode('utf-8') payload_b64 = base64.b64encode(json_bytes).decode('ascii') ``` **Rationale**: - JSON format for human-readable data - Base64 for NATS compatibility - UTF-8 for text encoding #### Step 3: Python Backend Receives ```python # Python backend msg = await nats_consumer.next() env = await smartreceive(msg) # env["payloads"][0] is now: # ("data", {"temperature": 25.5, "humidity": 60.0, ...}, "dictionary") ``` **Rationale**: - JSON deserialization - Dictionary returned directly - No Arrow support (memory constraints) --- ## User Scenario 5: Cross-Platform Chat with Mixed Payloads ### Scenario Description Multiple platforms (JavaScript, Python, Julia) communicate in a chat application with mixed payload types. ### Step-by-Step Flow #### Step 1: JavaScript Sends Chat Message ```javascript // JavaScript (Frontend) const [env, msgJson] = await NATSBridge.smartsend( "/chat/user/v1/message", [ ["text", "Check this out!", "text"], ["image", imageData, "image"] ], { broker_url: "ws://localhost:4222", receiver_name: "", msg_purpose: "chat" } ); ``` **Rationale**: - Empty `receiver_name` = broadcast to all subscribers - Chat messages often include text + images - NATS wildcard subscriptions route to correct recipients #### Step 2: Python Backend Receives ```python # Python (Backend) msg = await nats_consumer.next() env = await smartreceive(msg) # env["payloads"] is now: # [ # ("text", "Check this out!", "text"), # ("image", binary_data, "image") # ] ``` **Rationale**: - Consistent API across platforms - Same payload structure regardless of sender - Type information preserved #### Step 3: Julia Backend Receives ```julia # Julia (Backend) msg = NATS.subscription.next() env = smartreceive(msg) # env["payloads"] is now: # [ # ("text", "Check this out!", "text"), # ("image", binary_data, "image") # ] ``` **Rationale**: - Cross-platform API parity - Same function signature across platforms - Type information enables proper deserialization #### Step 4: All Platforms Reply Each platform can reply using the same API: ```python # Python reply await smartsend( "/chat/user/v1/reply", [("response", "Nice!", "text")], reply_to="/chat/user/v1/message" ) ``` ```julia # Julia reply smartsend( "/chat/user/v1/reply", [("response", "Nice!", "text")], reply_to="/chat/user/v1/message" ) ``` ```javascript // JavaScript reply await NATSBridge.smartsend( "/chat/user/v1/reply", [["response", "Nice!", "text"]], { reply_to: "/chat/user/v1/message" } ); ``` **Rationale**: - Same API across platforms - Consistent behavior - Easy to maintain parity --- ## Error Handling ### Common Error Scenarios | Scenario | Error | Recovery | |----------|-------|----------| | File server unavailable | `UPLOAD_FAILED` | Fall back to direct transport or smaller payloads | | File server download fails | `DOWNLOAD_FAILED` | Retry with exponential backoff | | Payload type mismatch | `DESERIALIZATION_ERROR` | Validate payload_type matches data | | NATS connection lost | `NATS_CONNECTION_FAILED` | NATS client auto-reconnects | ### Error Response Format ```json { "correlation_id": "abc123...", "error": { "code": "DOWNLOAD_FAILED", "message": "Failed to fetch data after 5 attempts", "details": { "url": "http://localhost:8080/file/...", "correlation_id": "abc123..." } } } ``` --- ## Debugging and Tracing ### Correlation ID Tracking Every message includes a `correlation_id`: ```julia # At start of request correlation_id = string(uuid4()) # Use throughout the flow log_trace(correlation_id, "Starting smartsend") log_trace(correlation_id, "Serialized payload size: 100 bytes") log_trace(correlation_id, "Published to NATS") ``` **Log Format**: ``` [2026-03-13T16:30:00.000Z] [Correlation: abc123...] Starting smartsend [2026-03-13T16:30:00.001Z] [Correlation: abc123...] Serialized payload size: 100 bytes [2026-03-13T16:30:00.002Z] [Correlation: abc123...] Published to NATS ``` --- ## Performance Considerations ### Optimization Strategies | Strategy | Description | When to Use | |----------|-------------|-------------| | Pre-create NATS connection | Reuse connection for multiple sends | High-throughput scenarios | | Adjust size threshold | Increase threshold if file server slow | File server bottleneck | | Use direct transport | Avoid file server for small payloads | Low latency requirements | ### Size Threshold by Platform | Platform | Threshold | Notes | |----------|-----------|-------| | Desktop (Julia/JS/Python) | 500,000 bytes (0.5MB) | Default threshold | | MicroPython | 100,000 bytes (100KB) | Lower threshold for memory constraints | --- ## Deployment Considerations ### Minimum Infrastructure | Component | Minimum | Notes | |-----------|---------|-------| | NATS Server | 1 instance | Single node for development | | File Server | 1 instance | HTTP server for large payloads | | Client Memory | 50MB | Desktop platforms | | Client Memory | 256KB | MicroPython devices | ### Environment Variables | Variable | Default | Description | |----------|---------|-------------| | `NATS_URL` | `nats://localhost:4222` | NATS server URL | | `FILESERVER_URL` | `http://localhost:8080` | HTTP file server URL | | `SIZE_THRESHOLD` | `1000000` | Size threshold in bytes | --- ## Change Log | Date | Version | Changes | |------|---------|---------| | 2026-03-13 | 1.0.0 | Initial walkthrough documentation | --- ## References - [`docs/requirements.md`](./requirements.md) - Business requirements and user stories - [`docs/spec.md`](./spec.md) - Technical specification and contracts - [`docs/architecture.md`](./architecture.md) - System architecture diagrams - [`src/NATSBridge.jl`](../src/NATSBridge.jl) - Ground truth implementation - [`README.md`](../README.md) - Project overview --- *This walkthrough document is versioned and maintained in git alongside the codebase. All implementations must adhere to this documentation.* [x] Analyze existing documentation (requirements.md, spec.md, architecture.md) [x] Read all source files in src/ folder [x] Write docs/walkthrough.md according to SDD framework with user scenarios