Files
NATSBridge/docs/walkthrough.md
2026-03-23 14:50:00 +07:00

23 KiB

Walkthrough: NATSBridge

Version: 1.0.0
Date: 2026-03-23
Status: Active
Ground Truth: src/NATSBridge.jl


1. Executive Summary

This document provides the end-to-end trace for NATSBridge - the cross-platform bi-directional data bridge that enables seamless communication between Julia, JavaScript, Python, Dart, and MicroPython applications using NATS as the message bus.

This walkthrough serves as the primary onboarding guide for new developers and explains:

  • User scenarios - Real-world use cases from developer perspective
  • Why steps are sequenced - The rationale behind architectural decisions
  • What could go wrong - Common failure scenarios and recovery strategies

1.1 Specification Traceability

Walkthrough Section Specification Reference Requirement ID(s) Description
Section 2 (Big Picture) specification.md:2, specification.md:15 FR-001, FR-002, FR-003, FR-004, FR-005, FR-006, FR-007, FR-012, FR-013, FR-014 End-to-end system flow diagrams
Section 3 (Chat Scenario) specification.md:2, specification.md:3, specification.md:5, specification.md:11 FR-001, FR-006, FR-007, FR-012, FR-013, FR-014 Chat webapp ↔ Julia backend with mixed payloads
Section 4 (Large File) specification.md:6, specification.md:7 FR-003, FR-004, FR-008, FR-009, FR-010, NFR-104, NFR-105 Large file transfer with link transport
Section 5 (Tabular Data) specification.md:5, specification.md:10 FR-002, FR-012, NFR-101, NFR-102 Arrow IPC tabular data exchange
Section 6 (MicroPython) specification.md:13, specification.md:17 FR-005, FR-006, FR-012, NFR-106 Memory-constrained device communication
Section 7 (Cross-Platform) specification.md:3, specification.md:4, specification.md:5, specification.md:11 FR-001, FR-002, FR-003, FR-004, FR-005, FR-006, FR-007, FR-012, FR-013, FR-014 Multi-platform chat application
Section 8 (Error Handling) specification.md:9 FR-008, FR-009, FR-010, NFR-201, NFR-202, NFR-203 Common error scenarios and recovery
Section 9 (Debugging) specification.md:4, specification.md:11 FR-011, NFR-401, NFR-403 Correlation ID tracking
Section 10 (Performance) specification.md:7, specification.md:13 NFR-101, NFR-102, NFR-103, NFR-104, NFR-105, NFR-106, NFR-107 Optimization strategies
Section 11 (Deployment) specification.md:12, specification.md:18 FR-013, FR-014, NFR-201, NFR-203 Infrastructure requirements

2. Overview: The Big Picture

Overview: The Big Picture

NATSBridge implements the Claim-Check pattern for efficient handling of large payloads (>0.5MB):

flowchart TB
    subgraph NATSBridge["NATSBridge Module"]
        direction TB
        
        subgraph Sender["Sender (smartsend)"]
            direction LR
            S1["Data Tuples<br/>[(dataname, data, type)]"]
            S2["Serialize Data"]
            S3["Size Check"]
            S4["Transport Selection"]
            S5["Build Envelope"]
            S6["Publish to NATS"]
            
            S1 --> S2
            S2 --> S3
            S3 --> S4
            S4 --> S5
            S5 --> S6
        end
        
        subgraph Receiver["Receiver (smartreceive)"]
            direction LR
            R1["Subscribe to NATS"]
            R2["Parse Envelope"]
            R3["Check Transport"]
            R4["Deserialize Data"]
            R5["Return Payloads"]
            
            R1 --> R2
            R2 --> R3
            R3 --> R4
            R4 --> R5
        end
        
        S6 -.->|Message| R1
    end
    
    subgraph FileServer["HTTP File Server (Plik)"]
        direction TB
        FS1["Upload URL"]
        FS2["Download URL"]
        
        S4 -.->|Large Payload| FS1
        FS1 -.->|URL| S5
        R3 -.->|Fetch URL| FS2
    end
    
    style NATSBridge fill:#e1f5fe,stroke:#0288d1,stroke-width:2px
    style Sender fill:#b3e5fc,stroke:#0288d1
    style Receiver fill:#b3e5fc,stroke:#0288d1
    style FileServer fill:#ffe0b2,stroke:#f57c00

Key Design Principles

Key Design Principles

Principle Description Rationale
Claim-Check Pattern Large payloads uploaded to HTTP server, URL sent via NATS NATS has message size limits; avoids NATS overflow
Automatic Transport Selection Direct (< threshold) vs Link (≥ threshold) based on size Optimizes memory vs network I/O trade-off
Cross-Platform API Consistent smartsend()/smartreceive() across all platforms Simplifies developer experience
Exponential Backoff Retry downloads with increasing delays Handles transient failures gracefully

User Scenario 1: Chat Webapp ↔ Julia Backend

Scenario Description

A JavaScript chat webapp wants to send mixed payloads (text message + user avatar image) to a Julia backend, and receive mixed payloads (text response + AI-generated image) back.

Step-by-Step Flow

Step 1: JavaScript Webapp Sends Mixed Payloads

// JavaScript (Browser or Node.js)
const [env, msgJson] = await NATSBridge.smartsend(
    "/agent/wine/api/v1/prompt",
    [
        ["msg", "Hello! I'm Ton.", "text"],
        ["avatar", avatarImageData, "image"]
    ],
    {
        broker_url: "ws://localhost:4222",
        receiver_name: "agent-backend",
        msg_purpose: "chat"
    }
);

Rationale:

  • Why mixed payloads? Real chat apps often send both text and images together
  • Why text first? Text is smaller, sent via direct transport (fast, no file server needed)
  • Why image second? Images may trigger link transport if >0.5MB

Step 2: Transport Selection

For each payload, NATSBridge determines transport:

Payload Size Transport Reason
"msg" (text) ~20 bytes direct < 0.5MB threshold
"avatar" (image) ~150KB direct < 0.5MB threshold

Rationale:

  • Direct transport is faster for small payloads (no file server round-trip)
  • Link transport is used when payload ≥ 0.5MB (avoids NATS size limits)

Step 3: Serialization and Encoding

Each payload is serialized:

Payload Type Serialization Encoding
"msg" text UTF-8 bytes Base64
"avatar" image Raw bytes Base64

Rationale:

  • Text uses UTF-8 encoding for human-readable data
  • Images use raw bytes to preserve binary data integrity
  • All payloads encoded as Base64 for JSON compatibility

Step 4: Envelope Building

NATSBridge builds the message envelope:

{
  "correlation_id": "a1b2c3d4...",
  "msg_id": "e5f6g7h8...",
  "timestamp": "2026-03-13T16:30:00.000Z",
  "send_to": "/agent/wine/api/v1/prompt",
  "msg_purpose": "chat",
  "sender_name": "chat-webapp",
  "sender_id": "sender-uuid...",
  "receiver_name": "agent-backend",
  "receiver_id": "",
  "reply_to": "/agent/wine/api/v1/response",
  "reply_to_msg_id": "",
  "broker_url": "ws://localhost:4222",
  "metadata": {},
  "payloads": [
    {
      "id": "payload-uuid...",
      "dataname": "msg",
      "payload_type": "text",
      "transport": "direct",
      "encoding": "base64",
      "size": 20,
      "data": "SGVsbG8hIEknIHRlbCB5b3UgSW4gZW5nbGlzaC4=",
      "metadata": {"payload_bytes": 20}
    },
    {
      "id": "payload-uuid...",
      "dataname": "avatar",
      "payload_type": "image",
      "transport": "direct",
      "encoding": "base64",
      "size": 150000,
      "data": "iVBORw0KGgoAAAANSUhEUgAA...",
      "metadata": {"payload_bytes": 150000}
    }
  ]
}

Rationale:

  • correlation_id: Tracks this chat session across all systems
  • reply_to: Tells backend where to send response
  • payloads array: Contains all data with metadata for proper handling

Step 5: Publish to NATS

await NATSBridge.NATSClient.connect("ws://localhost:4222");
await NATSBridge.NATSClient.publish("/agent/wine/api/v1/prompt", msgJson);

Rationale:

  • NATS provides low-latency message delivery
  • JSON format ensures cross-platform compatibility

Step 6: Julia Backend Receives Message

# Julia backend
msg = NATS.subscription.next()  # Get message from NATS
env = smartreceive(msg)

# env["payloads"] is now:
# [
#     ("msg", "Hello! I'm Ton.", "text"),
#     ("avatar", binary_data, "image")
# ]

Rationale:

  • smartreceive() handles both transport types automatically
  • Deserialization is type-aware based on payload_type
  • Returns consistent tuple format regardless of transport

Step 7: Julia Backend Sends Response

# Julia backend processes the message
response_text = "Hello Ton! I'm the AI assistant."
generated_image = generate_ai_image(response_text)

env, msg_json = smartsend(
    "/agent/wine/api/v1/response",
    [
        ("response", response_text, "text"),
        ("generated_image", generated_image, "image")
    ],
    reply_to = "/chat/user/v1/message",
    reply_to_msg_id = msg["msg_id"]
)

Rationale:

  • Mixed response: Text explanation + AI-generated image
  • reply_to: Ensures response goes to correct topic
  • reply_to_msg_id: Links response to original message for tracing

User Scenario 2: Large File Transfer

Scenario Description

A JavaScript webapp wants to upload a large file (10MB) to a Julia backend for processing.

Step-by-Step Flow

Step 1: JavaScript Webapp Sends Large File

const [env, msgJson] = await NATSBridge.smartsend(
    "/agent/wine/api/v1/process",
    [
        ["file", largeFileData, "binary"]
    ],
    {
        broker_url: "ws://localhost:4222",
        receiver_name: "agent-backend"
    }
);
Payload Size Transport Reason
"file" 10MB link ≥ 0.5MB threshold

Rationale:

  • Link transport used for large payloads
  • File server handles large file upload
  • NATS only sends URL (small message)

Step 3: File Server Upload

// NATSBridge internally calls:
const response = await plikOneshotUpload(
    "http://localhost:8080",
    "file",
    largeFileData
);

// Response:
// {
//   status: 200,
//   uploadid: "UPLOAD_ID",
//   fileid: "FILE_ID",
//   url: "http://localhost:8080/file/UPLOAD_ID/FILE_ID/file"
// }

Rationale:

  • Plik handles multipart upload
  • One-shot mode simplifies API
  • Returns URL for download
{
  "correlation_id": "a1b2c3d4...",
  "payloads": [
    {
      "id": "payload-uuid...",
      "dataname": "file",
      "payload_type": "binary",
      "transport": "link",
      "encoding": "none",
      "size": 10000000,
      "data": "http://localhost:8080/file/UPLOAD_ID/FILE_ID/file",
      "metadata": {}
    }
  ]
}

Rationale:

  • data field contains URL instead of Base64
  • transport: "link" signals URL-based download
  • encoding: "none" indicates no additional encoding

Step 5: Julia Backend Receives and Downloads

# Julia backend
msg = NATS.subscription.next()
env = smartreceive(msg)

# NATSBridge automatically:
# 1. Extracts URL from payload
# 2. Downloads with exponential backoff
# 3. Deserializes to binary data

Rationale:

  • Exponential backoff handles transient failures
  • Automatic download simplifies receiver code
  • Binary data returned directly

User Scenario 3: Tabular Data Exchange

Scenario Description

A Python application sends tabular data (pandas DataFrame) to a Julia backend for analysis, and receives processed results back.

Step-by-Step Flow

Step 1: Python Sends Tabular Data

# Python
import pandas as pd
from natsbridge import smartsend

df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
    "score": [95, 88, 92]
})

env, msg_json = await smartsend(
    "/agent/wine/api/v1/analyze",
    [("data", df, "arrowtable")],
    broker_url="nats://localhost:4222",
    receiver_name="agent-backend"
)

Rationale:

  • arrowtable type for efficient tabular data transfer
  • Arrow IPC format preserves data types
  • Much faster than JSON serialization

Step 2: Serialization to Arrow IPC

# NATSBridge internally:
import pyarrow as pa
import pyarrow.ipc as ipc

table = pa.Table.from_pandas(df)
buf = io.BytesIO()
sink = ipc.new_file(buf, table.schema)
ipc.write_table(table, sink)
arrow_bytes = buf.getvalue()

Rationale:

  • Arrow IPC preserves column types
  • Binary format is compact
  • No schema information loss

Step 3: Julia Receives and Deserializes

# Julia backend
msg = NATS.subscription.next()
env = smartreceive(msg)

# env["payloads"][1] is now:
# ("data", DataFrame with id, name, score columns, "arrowtable")

Rationale:

  • Arrow.jl reads IPC format directly
  • DataFrame returned with correct types
  • No manual parsing needed

Step 4: Julia Sends Results

# Julia backend
results = analyze_data(env["payloads"][1][2])

# Send results back
env, msg_json = smartsend(
    "/agent/wine/api/v1/results",
    [("results", results, "arrowtable")],
    reply_to = "/python/worker/v1/results"
)

Rationale:

  • Arrow IPC format for efficient round-trip
  • Results preserve DataFrame structure
  • Python can deserialize to pandas DataFrame

User Scenario 4: MicroPython Device

Scenario Description

A MicroPython sensor device sends sensor readings to a Python backend.

Step-by-Step Flow

Step 1: MicroPython Sends Sensor Data

# MicroPython
from natsbridge import smartsend

sensor_data = {
    "temperature": 25.5,
    "humidity": 60.0,
    "pressure": 1013.25
}

env, msg_json = smartsend(
    "/sensor/device/v1/readings",
    [("data", sensor_data, "dictionary")],
    broker_url="nats://localhost:4222",
    size_threshold=100000  # 100KB for MicroPython
)

Rationale:

  • dictionary type for JSON-serializable sensor data
  • Smaller threshold (100KB) for memory constraints
  • Direct transport only (no file server support)

Step 2: Serialization

# NATSBridge internally:
json_str = json.dumps(sensor_data)
json_bytes = json_str.encode('utf-8')
payload_b64 = base64.b64encode(json_bytes).decode('ascii')

Rationale:

  • JSON format for human-readable data
  • Base64 for NATS compatibility
  • UTF-8 for text encoding

Step 3: Python Backend Receives

# Python backend
msg = await nats_consumer.next()
env = await smartreceive(msg)

# env["payloads"][0] is now:
# ("data", {"temperature": 25.5, "humidity": 60.0, ...}, "dictionary")

Rationale:

  • JSON deserialization
  • Dictionary returned directly
  • No Arrow support (memory constraints)

User Scenario 5: Cross-Platform Chat with Mixed Payloads

Scenario Description

Multiple platforms (JavaScript, Python, Julia) communicate in a chat application with mixed payload types.

Step-by-Step Flow

Step 1: JavaScript Sends Chat Message

// JavaScript (Frontend)
const [env, msgJson] = await NATSBridge.smartsend(
    "/chat/user/v1/message",
    [
        ["text", "Check this out!", "text"],
        ["image", imageData, "image"]
    ],
    {
        broker_url: "ws://localhost:4222",
        receiver_name: "",
        msg_purpose: "chat"
    }
);

Rationale:

  • Empty receiver_name = broadcast to all subscribers
  • Chat messages often include text + images
  • NATS wildcard subscriptions route to correct recipients

Step 2: Python Backend Receives

# Python (Backend)
msg = await nats_consumer.next()
env = await smartreceive(msg)

# env["payloads"] is now:
# [
#     ("text", "Check this out!", "text"),
#     ("image", binary_data, "image")
# ]

Rationale:

  • Consistent API across platforms
  • Same payload structure regardless of sender
  • Type information preserved

Step 3: Julia Backend Receives

# Julia (Backend)
msg = NATS.subscription.next()
env = smartreceive(msg)

# env["payloads"] is now:
# [
#     ("text", "Check this out!", "text"),
#     ("image", binary_data, "image")
# ]

Rationale:

  • Cross-platform API parity
  • Same function signature across platforms
  • Type information enables proper deserialization

Step 4: All Platforms Reply

Each platform can reply using the same API:

# Python reply
await smartsend(
    "/chat/user/v1/reply",
    [("response", "Nice!", "text")],
    reply_to="/chat/user/v1/message"
)
# Julia reply
smartsend(
    "/chat/user/v1/reply",
    [("response", "Nice!", "text")],
    reply_to="/chat/user/v1/message"
)
// JavaScript reply
await NATSBridge.smartsend(
    "/chat/user/v1/reply",
    [["response", "Nice!", "text"]],
    { reply_to: "/chat/user/v1/message" }
);

Rationale:

  • Same API across platforms
  • Consistent behavior
  • Easy to maintain parity

Error Handling

Common Error Scenarios

Scenario Error Recovery
File server unavailable UPLOAD_FAILED Fall back to direct transport or smaller payloads
File server download fails DOWNLOAD_FAILED Retry with exponential backoff
Payload type mismatch DESERIALIZATION_ERROR Validate payload_type matches data
NATS connection lost NATS_CONNECTION_FAILED NATS client auto-reconnects

Error Response Format

{
  "correlation_id": "abc123...",
  "error": {
    "code": "DOWNLOAD_FAILED",
    "message": "Failed to fetch data after 5 attempts",
    "details": {
      "url": "http://localhost:8080/file/...",
      "correlation_id": "abc123..."
    }
  }
}

Debugging and Tracing

Correlation ID Tracking

Every message includes a correlation_id:

# At start of request
correlation_id = string(uuid4())

# Use throughout the flow
log_trace(correlation_id, "Starting smartsend")
log_trace(correlation_id, "Serialized payload size: 100 bytes")
log_trace(correlation_id, "Published to NATS")

Log Format:

[2026-03-13T16:30:00.000Z] [Correlation: abc123...] Starting smartsend
[2026-03-13T16:30:00.001Z] [Correlation: abc123...] Serialized payload size: 100 bytes
[2026-03-13T16:30:00.002Z] [Correlation: abc123...] Published to NATS

Performance Considerations

Optimization Strategies

Strategy Description When to Use
Pre-create NATS connection Reuse connection for multiple sends High-throughput scenarios
Adjust size threshold Increase threshold if file server slow File server bottleneck
Use direct transport Avoid file server for small payloads Low latency requirements

Size Threshold by Platform

Platform Threshold Notes
Desktop (Julia/JS/Python/Dart) 500,000 bytes (0.5MB) Default threshold
Dart Desktop 500,000 bytes (0.5MB) Default threshold
Dart Flutter 500,000 bytes (0.5MB) Default threshold
Dart Web 500,000 bytes (0.5MB) Default threshold
MicroPython 100,000 bytes (100KB) Lower threshold for memory constraints

Deployment Considerations

Minimum Infrastructure

Component Minimum Notes
NATS Server 1 instance Single node for development
File Server 1 instance HTTP server for large payloads
Client Memory 50MB Desktop platforms (Julia/JS/Python/Dart)
Client Memory 256KB MicroPython devices

Environment Variables

Variable Default Description
NATS_URL nats://localhost:4222 NATS server URL
FILESERVER_URL http://localhost:8080 HTTP file server URL
SIZE_THRESHOLD 1000000 Size threshold in bytes

Change Log

Date Version Changes
2026-03-13 1.0.0 Initial walkthrough documentation

12. References

12.1 Documentation Artifacts

Document Purpose Specification Traceability
docs/requirements.md Business requirements and user stories FR-001 through FR-014, NFR-101 through NFR-405
docs/specification.md Technical contract for NATSBridge specification.md:2-19 (all sections)
docs/ui-specification.md UI specification for client applications UI components for data entry and display
docs/walkthrough.md End-to-end system flow This document
docs/architecture.md System architecture diagrams Component interaction and data flow
docs/validation.md CI/CD validation rules Contract testing and spec compliance
docs/runbook.md Operational runbook Deployment, scaling, and troubleshooting

12.2 Implementation Files

File Platform Features Specification Traceability
src/NATSBridge.jl Julia Full feature set, Arrow IPC, multiple dispatch specification.md:2-19 (all sections)
src/natsbridge_ssr.js Node.js Arrow IPC, async/await specification.md:2-19 (all sections)
src/natsbridge_csr.js Browser JSON table only, WebSocket NATS specification.md:2-19 (all sections)
src/natsbridge.py Python Arrow IPC, async/await specification.md:2-19 (all sections)
src/natsbridge.dart Dart Full feature set, Arrow IPC, async/await specification.md:2-19 (all sections)
src/natsbridge_mpy.py MicroPython Limited to direct transport specification.md:2-19 (all sections)

13. Change Log

Date Version Changes Specification Reference
2026-03-23 1.0.0 Updated to ASG Framework walkthrough guidelines All sections
2026-03-13 1.0.0 Initial walkthrough documentation specification.md:2-19 (all sections)

14. Gap-Check Validation

Stage Transition Gap-Check Question Status
Requirements → Specification Does the Specification define all edge cases and conflict scenarios from the Requirements? Verified - All FR-XXX requirements have corresponding spec rules
Specification → UI Specification Does the UI Specification expose all the data and states defined in the Specification? Pending - UI spec not yet created
UI Specification → Walkthrough Does the Walkthrough reflect the complete flow including error states and timing? Pending - UI spec not yet created
Walkthrough → Architecture Does the Architecture support the performance and integration requirements defined in the Walkthrough? Pending - Architecture not yet created

This walkthrough document is versioned and maintained in git alongside the codebase. All implementations must adhere to this documentation.


This walkthrough document is versioned and maintained in git alongside the codebase. All implementations must adhere to this documentation.

<tool_call> <function=update_todo_list> <parameter=todos> [x] Analyze existing documentation (requirements.md, spec.md, architecture.md) [x] Read all source files in src/ folder [x] Write docs/walkthrough.md according to SDD framework with user scenarios