Files
NATSBridge/docs/walkthrough.md
2026-03-14 07:50:00 +07:00

738 lines
18 KiB
Markdown

# Walkthrough: NATSBridge
**Version**: 1.0.0
**Date**: 2026-03-13
**Status**: Active
**Ground Truth**: [`src/NATSBridge.jl`](../src/NATSBridge.jl)
---
## Executive Summary
This document provides the **story of flow** for NATSBridge - the cross-platform bi-directional data bridge that enables seamless communication between **Julia**, **JavaScript**, **Python**, and **MicroPython** applications using NATS as the message bus.
This walkthrough serves as the primary onboarding guide for new developers and explains:
- **User scenarios** - Real-world use cases from developer perspective
- **Why steps are sequenced** - The rationale behind architectural decisions
- **What could go wrong** - Common failure scenarios and recovery strategies
---
## Overview: The Big Picture
NATSBridge implements the **Claim-Check pattern** for efficient handling of large payloads (>0.5MB):
```mermaid
flowchart TB
subgraph NATSBridge["NATSBridge Module"]
direction TB
subgraph Sender["Sender (smartsend)"]
direction LR
S1["Data Tuples<br/>[(dataname, data, type)]"]
S2["Serialize Data"]
S3["Size Check"]
S4["Transport Selection"]
S5["Build Envelope"]
S6["Publish to NATS"]
S1 --> S2
S2 --> S3
S3 --> S4
S4 --> S5
S5 --> S6
end
subgraph Receiver["Receiver (smartreceive)"]
direction LR
R1["Subscribe to NATS"]
R2["Parse Envelope"]
R3["Check Transport"]
R4["Deserialize Data"]
R5["Return Payloads"]
R1 --> R2
R2 --> R3
R3 --> R4
R4 --> R5
end
S6 -.->|Message| R1
end
subgraph FileServer["HTTP File Server (Plik)"]
direction TB
FS1["Upload URL"]
FS2["Download URL"]
S4 -.->|Large Payload| FS1
FS1 -.->|URL| S5
R3 -.->|Fetch URL| FS2
end
style NATSBridge fill:#e1f5fe,stroke:#0288d1,stroke-width:2px
style Sender fill:#b3e5fc,stroke:#0288d1
style Receiver fill:#b3e5fc,stroke:#0288d1
style FileServer fill:#ffe0b2,stroke:#f57c00
```
### Key Design Principles
### Key Design Principles
| Principle | Description | Rationale |
|-----------|-------------|-----------|
| **Claim-Check Pattern** | Large payloads uploaded to HTTP server, URL sent via NATS | NATS has message size limits; avoids NATS overflow |
| **Automatic Transport Selection** | Direct (< threshold) vs Link (≥ threshold) based on size | Optimizes memory vs network I/O trade-off |
| **Cross-Platform API** | Consistent `smartsend()`/`smartreceive()` across all platforms | Simplifies developer experience |
| **Exponential Backoff** | Retry downloads with increasing delays | Handles transient failures gracefully |
---
## User Scenario 1: Chat Webapp ↔ Julia Backend
### Scenario Description
A JavaScript chat webapp wants to send mixed payloads (text message + user avatar image) to a Julia backend, and receive mixed payloads (text response + AI-generated image) back.
### Step-by-Step Flow
#### Step 1: JavaScript Webapp Sends Mixed Payloads
```javascript
// JavaScript (Browser or Node.js)
const [env, msgJson] = await NATSBridge.smartsend(
"/agent/wine/api/v1/prompt",
[
["msg", "Hello! I'm Ton.", "text"],
["avatar", avatarImageData, "image"]
],
{
broker_url: "ws://localhost:4222",
receiver_name: "agent-backend",
msg_purpose: "chat"
}
);
```
**Rationale**:
- **Why mixed payloads?** Real chat apps often send both text and images together
- **Why text first?** Text is smaller, sent via direct transport (fast, no file server needed)
- **Why image second?** Images may trigger link transport if >0.5MB
#### Step 2: Transport Selection
For each payload, NATSBridge determines transport:
| Payload | Size | Transport | Reason |
|---------|------|-----------|--------|
| `"msg"` (text) | ~20 bytes | direct | < 0.5MB threshold |
| `"avatar"` (image) | ~150KB | direct | < 0.5MB threshold |
**Rationale**:
- Direct transport is faster for small payloads (no file server round-trip)
- Link transport is used when payload ≥ 0.5MB (avoids NATS size limits)
#### Step 3: Serialization and Encoding
Each payload is serialized:
| Payload | Type | Serialization | Encoding |
|---------|------|---------------|----------|
| `"msg"` | `text` | UTF-8 bytes | Base64 |
| `"avatar"` | `image` | Raw bytes | Base64 |
**Rationale**:
- Text uses UTF-8 encoding for human-readable data
- Images use raw bytes to preserve binary data integrity
- All payloads encoded as Base64 for JSON compatibility
#### Step 4: Envelope Building
NATSBridge builds the message envelope:
```json
{
"correlation_id": "a1b2c3d4...",
"msg_id": "e5f6g7h8...",
"timestamp": "2026-03-13T16:30:00.000Z",
"send_to": "/agent/wine/api/v1/prompt",
"msg_purpose": "chat",
"sender_name": "chat-webapp",
"sender_id": "sender-uuid...",
"receiver_name": "agent-backend",
"receiver_id": "",
"reply_to": "/agent/wine/api/v1/response",
"reply_to_msg_id": "",
"broker_url": "ws://localhost:4222",
"metadata": {},
"payloads": [
{
"id": "payload-uuid...",
"dataname": "msg",
"payload_type": "text",
"transport": "direct",
"encoding": "base64",
"size": 20,
"data": "SGVsbG8hIEknIHRlbCB5b3UgSW4gZW5nbGlzaC4=",
"metadata": {"payload_bytes": 20}
},
{
"id": "payload-uuid...",
"dataname": "avatar",
"payload_type": "image",
"transport": "direct",
"encoding": "base64",
"size": 150000,
"data": "iVBORw0KGgoAAAANSUhEUgAA...",
"metadata": {"payload_bytes": 150000}
}
]
}
```
**Rationale**:
- **correlation_id**: Tracks this chat session across all systems
- **reply_to**: Tells backend where to send response
- **payloads array**: Contains all data with metadata for proper handling
#### Step 5: Publish to NATS
```javascript
await NATSBridge.NATSClient.connect("ws://localhost:4222");
await NATSBridge.NATSClient.publish("/agent/wine/api/v1/prompt", msgJson);
```
**Rationale**:
- NATS provides low-latency message delivery
- JSON format ensures cross-platform compatibility
#### Step 6: Julia Backend Receives Message
```julia
# Julia backend
msg = NATS.subscription.next() # Get message from NATS
env = smartreceive(msg)
# env["payloads"] is now:
# [
# ("msg", "Hello! I'm Ton.", "text"),
# ("avatar", binary_data, "image")
# ]
```
**Rationale**:
- `smartreceive()` handles both transport types automatically
- Deserialization is type-aware based on `payload_type`
- Returns consistent tuple format regardless of transport
#### Step 7: Julia Backend Sends Response
```julia
# Julia backend processes the message
response_text = "Hello Ton! I'm the AI assistant."
generated_image = generate_ai_image(response_text)
env, msg_json = smartsend(
"/agent/wine/api/v1/response",
[
("response", response_text, "text"),
("generated_image", generated_image, "image")
],
reply_to = "/chat/user/v1/message",
reply_to_msg_id = msg["msg_id"]
)
```
**Rationale**:
- **Mixed response**: Text explanation + AI-generated image
- **reply_to**: Ensures response goes to correct topic
- **reply_to_msg_id**: Links response to original message for tracing
---
## User Scenario 2: Large File Transfer
### Scenario Description
A JavaScript webapp wants to upload a large file (10MB) to a Julia backend for processing.
### Step-by-Step Flow
#### Step 1: JavaScript Webapp Sends Large File
```javascript
const [env, msgJson] = await NATSBridge.smartsend(
"/agent/wine/api/v1/process",
[
["file", largeFileData, "binary"]
],
{
broker_url: "ws://localhost:4222",
receiver_name: "agent-backend"
}
);
```
#### Step 2: Transport Selection (Link)
| Payload | Size | Transport | Reason |
|---------|------|-----------|--------|
| `"file"` | 10MB | link | ≥ 0.5MB threshold |
**Rationale**:
- Link transport used for large payloads
- File server handles large file upload
- NATS only sends URL (small message)
#### Step 3: File Server Upload
```javascript
// NATSBridge internally calls:
const response = await plikOneshotUpload(
"http://localhost:8080",
"file",
largeFileData
);
// Response:
// {
// status: 200,
// uploadid: "UPLOAD_ID",
// fileid: "FILE_ID",
// url: "http://localhost:8080/file/UPLOAD_ID/FILE_ID/file"
// }
```
**Rationale**:
- Plik handles multipart upload
- One-shot mode simplifies API
- Returns URL for download
#### Step 4: Envelope with Link Transport
```json
{
"correlation_id": "a1b2c3d4...",
"payloads": [
{
"id": "payload-uuid...",
"dataname": "file",
"payload_type": "binary",
"transport": "link",
"encoding": "none",
"size": 10000000,
"data": "http://localhost:8080/file/UPLOAD_ID/FILE_ID/file",
"metadata": {}
}
]
}
```
**Rationale**:
- `data` field contains URL instead of Base64
- `transport: "link"` signals URL-based download
- `encoding: "none"` indicates no additional encoding
#### Step 5: Julia Backend Receives and Downloads
```julia
# Julia backend
msg = NATS.subscription.next()
env = smartreceive(msg)
# NATSBridge automatically:
# 1. Extracts URL from payload
# 2. Downloads with exponential backoff
# 3. Deserializes to binary data
```
**Rationale**:
- Exponential backoff handles transient failures
- Automatic download simplifies receiver code
- Binary data returned directly
---
## User Scenario 3: Tabular Data Exchange
### Scenario Description
A Python application sends tabular data (pandas DataFrame) to a Julia backend for analysis, and receives processed results back.
### Step-by-Step Flow
#### Step 1: Python Sends Tabular Data
```python
# Python
import pandas as pd
from natsbridge import smartsend
df = pd.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"],
"score": [95, 88, 92]
})
env, msg_json = await smartsend(
"/agent/wine/api/v1/analyze",
[("data", df, "arrowtable")],
broker_url="nats://localhost:4222",
receiver_name="agent-backend"
)
```
**Rationale**:
- `arrowtable` type for efficient tabular data transfer
- Arrow IPC format preserves data types
- Much faster than JSON serialization
#### Step 2: Serialization to Arrow IPC
```python
# NATSBridge internally:
import pyarrow as pa
import pyarrow.ipc as ipc
table = pa.Table.from_pandas(df)
buf = io.BytesIO()
sink = ipc.new_file(buf, table.schema)
ipc.write_table(table, sink)
arrow_bytes = buf.getvalue()
```
**Rationale**:
- Arrow IPC preserves column types
- Binary format is compact
- No schema information loss
#### Step 3: Julia Receives and Deserializes
```julia
# Julia backend
msg = NATS.subscription.next()
env = smartreceive(msg)
# env["payloads"][1] is now:
# ("data", DataFrame with id, name, score columns, "arrowtable")
```
**Rationale**:
- Arrow.jl reads IPC format directly
- DataFrame returned with correct types
- No manual parsing needed
#### Step 4: Julia Sends Results
```julia
# Julia backend
results = analyze_data(env["payloads"][1][2])
# Send results back
env, msg_json = smartsend(
"/agent/wine/api/v1/results",
[("results", results, "arrowtable")],
reply_to = "/python/worker/v1/results"
)
```
**Rationale**:
- Arrow IPC format for efficient round-trip
- Results preserve DataFrame structure
- Python can deserialize to pandas DataFrame
---
## User Scenario 4: MicroPython Device
### Scenario Description
A MicroPython sensor device sends sensor readings to a Python backend.
### Step-by-Step Flow
#### Step 1: MicroPython Sends Sensor Data
```python
# MicroPython
from natsbridge import smartsend
sensor_data = {
"temperature": 25.5,
"humidity": 60.0,
"pressure": 1013.25
}
env, msg_json = smartsend(
"/sensor/device/v1/readings",
[("data", sensor_data, "dictionary")],
broker_url="nats://localhost:4222",
size_threshold=100000 # 100KB for MicroPython
)
```
**Rationale**:
- `dictionary` type for JSON-serializable sensor data
- Smaller threshold (100KB) for memory constraints
- Direct transport only (no file server support)
#### Step 2: Serialization
```python
# NATSBridge internally:
json_str = json.dumps(sensor_data)
json_bytes = json_str.encode('utf-8')
payload_b64 = base64.b64encode(json_bytes).decode('ascii')
```
**Rationale**:
- JSON format for human-readable data
- Base64 for NATS compatibility
- UTF-8 for text encoding
#### Step 3: Python Backend Receives
```python
# Python backend
msg = await nats_consumer.next()
env = await smartreceive(msg)
# env["payloads"][0] is now:
# ("data", {"temperature": 25.5, "humidity": 60.0, ...}, "dictionary")
```
**Rationale**:
- JSON deserialization
- Dictionary returned directly
- No Arrow support (memory constraints)
---
## User Scenario 5: Cross-Platform Chat with Mixed Payloads
### Scenario Description
Multiple platforms (JavaScript, Python, Julia) communicate in a chat application with mixed payload types.
### Step-by-Step Flow
#### Step 1: JavaScript Sends Chat Message
```javascript
// JavaScript (Frontend)
const [env, msgJson] = await NATSBridge.smartsend(
"/chat/user/v1/message",
[
["text", "Check this out!", "text"],
["image", imageData, "image"]
],
{
broker_url: "ws://localhost:4222",
receiver_name: "",
msg_purpose: "chat"
}
);
```
**Rationale**:
- Empty `receiver_name` = broadcast to all subscribers
- Chat messages often include text + images
- NATS wildcard subscriptions route to correct recipients
#### Step 2: Python Backend Receives
```python
# Python (Backend)
msg = await nats_consumer.next()
env = await smartreceive(msg)
# env["payloads"] is now:
# [
# ("text", "Check this out!", "text"),
# ("image", binary_data, "image")
# ]
```
**Rationale**:
- Consistent API across platforms
- Same payload structure regardless of sender
- Type information preserved
#### Step 3: Julia Backend Receives
```julia
# Julia (Backend)
msg = NATS.subscription.next()
env = smartreceive(msg)
# env["payloads"] is now:
# [
# ("text", "Check this out!", "text"),
# ("image", binary_data, "image")
# ]
```
**Rationale**:
- Cross-platform API parity
- Same function signature across platforms
- Type information enables proper deserialization
#### Step 4: All Platforms Reply
Each platform can reply using the same API:
```python
# Python reply
await smartsend(
"/chat/user/v1/reply",
[("response", "Nice!", "text")],
reply_to="/chat/user/v1/message"
)
```
```julia
# Julia reply
smartsend(
"/chat/user/v1/reply",
[("response", "Nice!", "text")],
reply_to="/chat/user/v1/message"
)
```
```javascript
// JavaScript reply
await NATSBridge.smartsend(
"/chat/user/v1/reply",
[["response", "Nice!", "text"]],
{ reply_to: "/chat/user/v1/message" }
);
```
**Rationale**:
- Same API across platforms
- Consistent behavior
- Easy to maintain parity
---
## Error Handling
### Common Error Scenarios
| Scenario | Error | Recovery |
|----------|-------|----------|
| File server unavailable | `UPLOAD_FAILED` | Fall back to direct transport or smaller payloads |
| File server download fails | `DOWNLOAD_FAILED` | Retry with exponential backoff |
| Payload type mismatch | `DESERIALIZATION_ERROR` | Validate payload_type matches data |
| NATS connection lost | `NATS_CONNECTION_FAILED` | NATS client auto-reconnects |
### Error Response Format
```json
{
"correlation_id": "abc123...",
"error": {
"code": "DOWNLOAD_FAILED",
"message": "Failed to fetch data after 5 attempts",
"details": {
"url": "http://localhost:8080/file/...",
"correlation_id": "abc123..."
}
}
}
```
---
## Debugging and Tracing
### Correlation ID Tracking
Every message includes a `correlation_id`:
```julia
# At start of request
correlation_id = string(uuid4())
# Use throughout the flow
log_trace(correlation_id, "Starting smartsend")
log_trace(correlation_id, "Serialized payload size: 100 bytes")
log_trace(correlation_id, "Published to NATS")
```
**Log Format**:
```
[2026-03-13T16:30:00.000Z] [Correlation: abc123...] Starting smartsend
[2026-03-13T16:30:00.001Z] [Correlation: abc123...] Serialized payload size: 100 bytes
[2026-03-13T16:30:00.002Z] [Correlation: abc123...] Published to NATS
```
---
## Performance Considerations
### Optimization Strategies
| Strategy | Description | When to Use |
|----------|-------------|-------------|
| Pre-create NATS connection | Reuse connection for multiple sends | High-throughput scenarios |
| Adjust size threshold | Increase threshold if file server slow | File server bottleneck |
| Use direct transport | Avoid file server for small payloads | Low latency requirements |
### Size Threshold by Platform
| Platform | Threshold | Notes |
|----------|-----------|-------|
| Desktop (Julia/JS/Python) | 500,000 bytes (0.5MB) | Default threshold |
| MicroPython | 100,000 bytes (100KB) | Lower threshold for memory constraints |
---
## Deployment Considerations
### Minimum Infrastructure
| Component | Minimum | Notes |
|-----------|---------|-------|
| NATS Server | 1 instance | Single node for development |
| File Server | 1 instance | HTTP server for large payloads |
| Client Memory | 50MB | Desktop platforms |
| Client Memory | 256KB | MicroPython devices |
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `NATS_URL` | `nats://localhost:4222` | NATS server URL |
| `FILESERVER_URL` | `http://localhost:8080` | HTTP file server URL |
| `SIZE_THRESHOLD` | `1000000` | Size threshold in bytes |
---
## Change Log
| Date | Version | Changes |
|------|---------|---------|
| 2026-03-13 | 1.0.0 | Initial walkthrough documentation |
---
## References
- [`docs/requirements.md`](./requirements.md) - Business requirements and user stories
- [`docs/spec.md`](./spec.md) - Technical specification and contracts
- [`docs/architecture.md`](./architecture.md) - System architecture diagrams
- [`src/NATSBridge.jl`](../src/NATSBridge.jl) - Ground truth implementation
- [`README.md`](../README.md) - Project overview
---
*This walkthrough document is versioned and maintained in git alongside the codebase. All implementations must adhere to this documentation.*
<tool_call>
<function=update_todo_list>
<parameter=todos>
[x] Analyze existing documentation (requirements.md, spec.md, architecture.md)
[x] Read all source files in src/ folder
[x] Write docs/walkthrough.md according to SDD framework with user scenarios