# Architecture Documentation: Bi-Directional Data Bridge (Julia ↔ JavaScript)

## Overview

This document describes the architecture for a high-performance, bi-directional data bridge between a Julia service and a JavaScript (Node.js) service using NATS (Core & JetStream). It implements the Claim-Check pattern for large payloads: the payload is uploaded to a file server and only its URL travels over NATS.

### File Server Handler Architecture

The system uses **handler functions** to abstract file server operations, allowing support for different file server implementations (e.g., Plik, AWS S3, custom HTTP server).

**Handler Function Signatures:**

```julia
# Upload handler - uploads data to the file server and returns a Dict describing the upload (including the URL)
fileserverUploadHandler(fileserver_url::String, dataname::String, data::Vector{UInt8})::Dict{String, Any}

# Download handler - fetches data from file server URL
fileserverDownloadHandler(fileserver_url::String, url::String, max_retries::Int, base_delay::Int, max_delay::Int)::Vector{UInt8}
```

This design allows the system to support multiple file server backends without changing the core messaging logic.
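
As an illustration of the handler contract on the JavaScript side, here is a minimal sketch of a pluggable backend that keeps uploads in memory. The names `memoryUploadHandler` and `memoryDownloadHandler`, the URL layout, and the returned `{ url, size }` shape are assumptions made for this example; only the argument order is taken from the signatures above.

```javascript
// Hypothetical in-memory backend; a real handler would call Plik, S3, or an HTTP server.
const store = new Map();

// Upload handler: same argument order as the Julia signature, returns an object with the URL.
async function memoryUploadHandler(fileserverUrl, dataname, data) {
  const id = String(store.size + 1);
  const url = `${fileserverUrl}/file/${id}/${encodeURIComponent(dataname)}`;
  store.set(url, Buffer.from(data));
  return { url, size: data.length };
}

// Download handler: the retry arguments are accepted but unused,
// because a Map lookup cannot fail transiently.
async function memoryDownloadHandler(fileserverUrl, url, maxRetries, baseDelay, maxDelay) {
  if (!store.has(url)) throw new Error(`not found: ${url}`);
  return store.get(url);
}
```

Because the messaging code only ever calls the two handlers, swapping this backend for a real one should not require changes elsewhere.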

### Multi-Payload Support (Standard API)

The system uses a **standardized list-of-tuples format** for all payload operations. **Even when sending a single payload, the user must wrap it in a list.**

**API Standard:**
```julia
# Input format for smartsend (always a list of tuples)
[(dataname1, data1), (dataname2, data2), ...]

# Output format for smartreceive (always returns a list of tuples)
[(dataname1, data1), (dataname2, data2), ...]
```

**Examples:**

```julia
# Single payload - still wrapped in a list
smartsend(
    "/test",
    [("dataname1", data1)],  # List with one tuple
    nats_url="nats://localhost:4222",
    fileserverUploadHandler=plik_oneshot_upload,
    metadata=user_provided_envelope_level_metadata
)

# Multiple payloads in one message
smartsend(
    "/test",
    [("dataname1", data1), ("dataname2", data2)],
    nats_url="nats://localhost:4222",
    fileserverUploadHandler=plik_oneshot_upload
)

# Receive always returns a list (the options are keyword arguments, so they follow `;`)
payloads = smartreceive(msg; fileserverDownloadHandler, max_retries, base_delay, max_delay)
# payloads = [("dataname1", data1), ("dataname2", data2), ...]
```
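
JavaScript has no tuple type, so a natural equivalent of the list-of-tuples format is an array of two-element arrays. The following sketch shows one way a caller-facing check could enforce the "always a list" rule. `assertPayloadList` is a hypothetical helper and is not part of the API described here.

```javascript
// Hypothetical validator: throws unless `data` is an array of [dataname, value] pairs.
function assertPayloadList(data) {
  if (!Array.isArray(data)) {
    throw new TypeError("payloads must be a list, even for a single payload");
  }
  data.forEach((entry, i) => {
    if (!Array.isArray(entry) || entry.length !== 2 || typeof entry[0] !== "string") {
      throw new TypeError(`payload ${i} must be a [dataname, data] pair`);
    }
  });
  return data;
}

// A single payload is still wrapped in a list.
assertPayloadList([["dataname1", { answer: 42 }]]);
```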

## Architecture Diagram

```mermaid
flowchart TD
    subgraph Client
        JS[JavaScript Client]
        JSApp[Application Logic]
    end

    subgraph Server
        Julia[Julia Service]
        NATS[NATS Server]
        FileServer[HTTP File Server]
    end

    JS -->|Control/Small Data| JSApp
    JSApp -->|NATS| NATS
    NATS -->|NATS| Julia
    Julia -->|NATS| NATS
    Julia -->|HTTP POST| FileServer
    JS -->|HTTP GET| FileServer

    style JS fill:#e1f5fe
    style Julia fill:#e8f5e9
    style NATS fill:#fff3e0
    style FileServer fill:#f3e5f5
```

## System Components

### 1. msgEnvelope_v1 - Message Envelope

The `msgEnvelope_v1` structure provides a comprehensive message format for bidirectional communication between Julia and JavaScript services.

**Julia Structure:**
```julia
struct msgEnvelope_v1
    correlationId::String   # Unique identifier to track messages across systems
    msgId::String           # This message id
    timestamp::String       # Message published timestamp

    sendTo::String          # Topic/subject the sender sends to
    msgPurpose::String      # Purpose of this message (ACK | NACK | updateStatus | shutdown | ...)
    senderName::String      # Sender name (e.g., "agent-wine-web-frontend")
    senderId::String        # Sender id (uuid4)
    receiverName::String    # Message receiver name (e.g., "agent-backend")
    receiverId::Union{String, Nothing}  # Message receiver id (uuid4 or nothing for broadcast)
    replyTo::String         # Topic to reply to
    replyToMsgId::String    # Message id this message is replying to
    brokerURL::String       # NATS server address

    metadata::Dict{String, Any}
    payloads::AbstractArray{msgPayload_v1}  # Multiple payloads stored here
end
```

**JSON Schema:**
```json
{
  "correlationId": "uuid-v4-string",
  "msgId": "uuid-v4-string",
  "timestamp": "2024-01-15T10:30:00Z",

  "sendTo": "topic/subject",
  "msgPurpose": "ACK | NACK | updateStatus | shutdown | chat",
  "senderName": "agent-wine-web-frontend",
  "senderId": "uuid4",
  "receiverName": "agent-backend",
  "receiverId": "uuid4",
  "replyTo": "topic",
  "replyToMsgId": "uuid4",
  "brokerURL": "nats://localhost:4222",

  "metadata": {
    "content_type": "application/octet-stream",
    "content_length": 123456
  },

  "payloads": [
    {
      "id": "uuid4",
      "dataname": "login_image",
      "type": "image",
      "transport": "direct",
      "encoding": "base64",
      "size": 15433,
      "data": "base64-encoded-string",
      "metadata": {
        "checksum": "sha256_hash"
      }
    },
    {
      "id": "uuid4",
      "dataname": "large_data",
      "type": "table",
      "transport": "link",
      "encoding": "none",
      "size": 524288,
      "data": "http://localhost:8080/file/UPLOAD_ID/FILE_ID/data.arrow",
      "metadata": {
        "checksum": "sha256_hash"
      }
    }
  ]
}
```
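
To make the envelope layout concrete, the sketch below assembles an object with the fields listed above and serializes it for publishing. `buildEnvelope` is a hypothetical helper, and the choice to default unspecified string fields to `""` is an assumption; the document does not say how missing fields should be represented.

```javascript
const { randomUUID } = require("node:crypto");

// Hypothetical helper: fills the msgEnvelope_v1 fields; unspecified fields default to "".
function buildEnvelope(fields, payloads) {
  return {
    correlationId: fields.correlationId ?? randomUUID(),
    msgId: randomUUID(),
    timestamp: new Date().toISOString(),
    sendTo: fields.sendTo,
    msgPurpose: fields.msgPurpose ?? "",
    senderName: fields.senderName ?? "",
    senderId: fields.senderId ?? "",
    receiverName: fields.receiverName ?? "",
    receiverId: fields.receiverId ?? "",
    replyTo: fields.replyTo ?? "",
    replyToMsgId: fields.replyToMsgId ?? "",
    brokerURL: fields.brokerURL ?? "nats://localhost:4222",
    metadata: fields.metadata ?? {},
    payloads,
  };
}

const envelope = buildEnvelope({ sendTo: "/test", msgPurpose: "chat" }, []);
const wire = JSON.stringify(envelope); // the string that would be published on NATS
```

Passing an existing `correlationId` when replying keeps a request and its responses traceable under one identifier, while `msgId` is always fresh.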

### 2. msgPayload_v1 - Payload Structure

The `msgPayload_v1` structure provides flexible payload handling for various data types.

**Julia Structure:**
```julia
struct msgPayload_v1
    id::String                    # Id of this payload (e.g., "uuid4")
    dataname::String              # Name of this payload (e.g., "login_image")
    type::String                  # "text | json | table | image | audio | video | binary"
    transport::String             # "direct | link"
    encoding::String              # "none | json | base64 | arrow-ipc"
    size::Integer                 # Data size in bytes
    data::Any                     # Payload data in case of direct transport or a URL in case of link
    metadata::Dict{String, Any}   # Dict("checksum" => "sha256_hash", ...)
end
```

**Key Features:**
- Supports multiple data types: text, json, table, image, audio, video, binary
- Flexible transport: "direct" (NATS) or "link" (HTTP fileserver)
- Multiple payloads per message (essential for chat with mixed content)
- Per-payload and per-envelope metadata support
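
A short sketch of how a `direct` payload entry might be produced from raw bytes, including the SHA-256 checksum carried in the per-payload metadata. `buildDirectPayload` is a hypothetical name, and treating `size` as the byte count before Base64 encoding is an assumption, since the document does not state which size is meant.

```javascript
const { createHash, randomUUID } = require("node:crypto");

// Hypothetical helper: wraps raw bytes as a "direct" msgPayload_v1 entry.
function buildDirectPayload(dataname, type, bytes) {
  return {
    id: randomUUID(),
    dataname,
    type,
    transport: "direct",
    encoding: "base64",
    size: bytes.length, // assumption: size of the raw bytes, before Base64 expansion
    data: bytes.toString("base64"),
    metadata: { checksum: createHash("sha256").update(bytes).digest("hex") },
  };
}

const payload = buildDirectPayload("greeting", "text", Buffer.from("hello", "utf8"));
```

A receiver can recompute the digest over the decoded bytes and compare it with `metadata.checksum` to detect corruption.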

### 3. Transport Strategy Decision Logic

```
┌─────────────────────────────────────────────────────────────┐
│                     smartsend Function                      │
│   Accepts: [(dataname1, data1), (dataname2, data2), ...]    │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                   Is payload size < 1MB?                    │
└─────────────────────────────────────────────────────────────┘
                               │
              ┌────────────────┴────────────────┐
              ▼                                 ▼
     ┌─────────────────┐               ┌─────────────────┐
     │ Direct Path     │               │ Link Path       │
     │ (< 1MB)         │               │ (>= 1MB)        │
     │                 │               │                 │
     │ • Serialize to  │               │ • Serialize to  │
     │   IOBuffer      │               │   IOBuffer      │
     │ • Base64 encode │               │ • Upload to     │
     │ • Publish to    │               │   HTTP Server   │
     │   NATS          │               │ • Publish to    │
     │   (with payload │               │   NATS with URL │
     │   in envelope)  │               │   (in envelope) │
     └─────────────────┘               └─────────────────┘
```
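
The decision in the diagram reduces to a single comparison. The sketch below shows it together with a helper that computes how large a payload becomes after Base64 encoding. The names `chooseTransport` and `base64Length` are hypothetical; the threshold value is the 1,000,000-byte default used by `smartsend`.

```javascript
const SIZE_THRESHOLD = 1_000_000; // bytes

// Below the threshold the payload travels inside the NATS message; otherwise by URL.
function chooseTransport(sizeInBytes, threshold = SIZE_THRESHOLD) {
  return sizeInBytes < threshold ? "direct" : "link";
}

// Length of the Base64 text for `n` raw bytes (with padding): 4 characters per 3 bytes.
function base64Length(n) {
  return 4 * Math.ceil(n / 3);
}

console.log(chooseTransport(999_999), base64Length(999_999));
```

Base64 inflates data by about one third, so a payload just under the threshold grows to roughly 1.33 MB once encoded. A NATS server's default `max_payload` is 1 MB, so a deployment would likely need either a lower threshold (around 750 KB of raw data) or a raised server limit for the direct path to work at the upper end.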

### 4. Julia Module Architecture

```mermaid
graph TD
    subgraph JuliaModule
        smartsendJulia[smartsend Julia]
        SizeCheck[Size Check]
        DirectPath[Direct Path]
        LinkPath[Link Path]
        HTTPClient[HTTP Client]
    end

    smartsendJulia --> SizeCheck
    SizeCheck -->|< 1MB| DirectPath
    SizeCheck -->|>= 1MB| LinkPath
    LinkPath --> HTTPClient

    style JuliaModule fill:#c5e1a5
```

### 5. JavaScript Module Architecture

```mermaid
graph TD
    subgraph JSModule
        smartsendJS[smartsend JS]
        smartreceiveJS[smartreceive JS]
        JetStreamConsumer[JetStream Pull Consumer]
        ApacheArrow[Apache Arrow]
    end

    smartsendJS --> NATS
    smartreceiveJS --> JetStreamConsumer
    JetStreamConsumer --> ApacheArrow

    style JSModule fill:#f3e5f5
```

## Implementation Details

### Julia Implementation

#### Dependencies
- `NATS.jl` - Core NATS functionality
- `Arrow.jl` - Arrow IPC serialization
- `JSON3.jl` - JSON parsing
- `HTTP.jl` - HTTP client for file server
- `Dates` (Julia standard library) - Timestamps for logging

#### smartsend Function

```julia
function smartsend(
    subject::String,
    # `<:` is needed because Julia's parametric types are invariant:
    # Vector{Tuple{String, Int}} is not an AbstractArray{Tuple{String, Any}}
    data::AbstractArray{<:Tuple{String, Any}},
    type::String = "json";
    nats_url::String = "nats://localhost:4222",
    fileserverUploadHandler::Function = plik_oneshot_upload,
    size_threshold::Int = 1_000_000  # 1MB
)
    # Body omitted; see the flow below
end
```

**Input Format:**
- `data::AbstractArray{<:Tuple{String, Any}}` - **Must be a list of tuples**: `[("dataname1", data1), ("dataname2", data2), ...]`
- Even for single payloads: `[(dataname1, data1)]`

**Flow:**
1. Iterate through the list of `("dataname", data)` tuples
2. For each payload: serialize to Arrow IPC stream (if table) or JSON
3. Check payload size
4. If < threshold: publish directly to NATS with Base64-encoded payload
5. If >= threshold: upload to HTTP server, publish NATS with URL

#### smartreceive Handler

```julia
function smartreceive(
    msg::NATS.Message;
    fileserverDownloadHandler::Function,
    max_retries::Int = 5,
    base_delay::Int = 100,
    max_delay::Int = 5000
)
    # Parse envelope
    # Iterate through all payloads
    # For each payload: check transport type
    # If direct: decode Base64 payload
    # If link: fetch from URL with exponential backoff using fileserverDownloadHandler
    # Deserialize payload based on type
    # Return list of (dataname, data) tuples
end
```

**Output Format:**
- Always returns a list of tuples: `[(dataname1, data1), (dataname2, data2), ...]`
- Even for single payloads: `[(dataname1, data1)]`

**Process Flow:**
1. Parse the JSON envelope to extract the `payloads` array
2. Iterate through each payload in `payloads`
3. For each payload:
   - Determine transport type (`direct` or `link`)
   - If `direct`: decode Base64 data from the message
   - If `link`: fetch data from URL using exponential backoff
   - Deserialize based on payload type (`json`, `table`, `binary`, etc.)
4. Return list of `(dataname, data)` tuples

### JavaScript Implementation

#### Dependencies
- `nats.js` - Core NATS functionality
- `apache-arrow` - Arrow IPC serialization
- `uuid` - Correlation ID generation

#### smartsend Function

```javascript
async function smartsend(subject, data, type = 'json', options = {}) {
  // options object should include:
  // - fileserverUploadHandler: function to upload data to file server
  // - fileserver_url: base URL of the file server
}
```

**Flow:**
1. Serialize data to Arrow IPC buffer (if table)
2. Check payload size
3. If < threshold: publish directly to NATS
4. If >= threshold: upload to HTTP server, publish NATS with URL
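
The flow above can be sketched end to end without a running NATS server by injecting the publish and upload functions. Everything named here (`smartsendSketch`, the `publish` and `uploadHandler` options, the `{ url }` return shape) is an assumption for illustration, and only JSON serialization is shown; a table payload would be written as an Arrow IPC buffer instead.

```javascript
const { randomUUID } = require("node:crypto");

// Sketch of the send flow. `publish` and `uploadHandler` are injected stand-ins
// (hypothetical), so no NATS connection or file server is needed to run this.
async function smartsendSketch(subject, data, opts) {
  const { publish, uploadHandler, fileserverUrl, threshold = 1_000_000 } = opts;
  const payloads = [];
  for (const [dataname, value] of data) {
    const bytes = Buffer.from(JSON.stringify(value), "utf8");
    const common = { id: randomUUID(), dataname, type: "json", size: bytes.length };
    if (bytes.length < threshold) {
      // Direct path: the Base64 text is embedded in the envelope.
      payloads.push({ ...common, transport: "direct", encoding: "base64",
                      data: bytes.toString("base64") });
    } else {
      // Link path: upload first, then reference the returned URL.
      const { url } = await uploadHandler(fileserverUrl, dataname, bytes);
      payloads.push({ ...common, transport: "link", encoding: "none", data: url });
    }
  }
  const envelope = { correlationId: randomUUID(), msgId: randomUUID(),
                     timestamp: new Date().toISOString(), sendTo: subject, payloads };
  await publish(subject, JSON.stringify(envelope));
  return envelope;
}
```

Note that the transport is chosen per payload, so one message can mix direct and link entries.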

#### smartreceive Handler

```javascript
async function smartreceive(msg, options = {}) {
  // options object should include:
  // - fileserverDownloadHandler: function to fetch data from file server URL
  // - fileserver_url: base URL of the file server
  // - max_retries: maximum retry attempts for fetching URL
  // - base_delay: initial delay for exponential backoff in ms
  // - max_delay: maximum delay for exponential backoff in ms
}
```

**Process Flow:**
1. Parse the JSON envelope to extract the `payloads` array
2. Iterate through each payload in `payloads`
3. For each payload:
   - Determine transport type (`direct` or `link`)
   - If `direct`: decode Base64 data from the message
   - If `link`: fetch data from URL using exponential backoff
   - Deserialize based on payload type (`json`, `table`, `binary`, etc.)
4. Return list of `(dataname, data)` tuples
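
A minimal sketch of the process flow above, taking the raw JSON string rather than a NATS message object so that it runs standalone. `smartreceiveSketch` and the injected `downloadHandler` are hypothetical; only `json` payloads are deserialized here, and every other type is returned as raw bytes, which simplifies the real type handling.

```javascript
// Sketch of the receive flow. `downloadHandler` is an injected stand-in (hypothetical)
// that is expected to apply the retry policy itself.
async function smartreceiveSketch(rawMessage, opts) {
  const { downloadHandler, fileserverUrl, maxRetries = 5, baseDelay = 100, maxDelay = 5000 } = opts;
  const envelope = JSON.parse(rawMessage);
  const results = [];
  for (const p of envelope.payloads) {
    let bytes;
    if (p.transport === "direct") {
      bytes = Buffer.from(p.data, "base64");
    } else if (p.transport === "link") {
      bytes = await downloadHandler(fileserverUrl, p.data, maxRetries, baseDelay, maxDelay);
    } else {
      throw new Error(`unknown transport: ${p.transport}`);
    }
    // Only JSON is decoded in this sketch; other types stay as bytes.
    const value = p.type === "json" ? JSON.parse(Buffer.from(bytes).toString("utf8")) : bytes;
    results.push([p.dataname, value]);
  }
  return results;
}
```

The result keeps the order of `payloads`, so callers can rely on position as well as on `dataname`.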

## Scenario Implementations

### Scenario 1: Command & Control (Small JSON)

**Julia (Receiver):**
```julia
# Subscribe to control subject
# Parse JSON envelope
# Execute simulation with parameters
# Send acknowledgment
```

**JavaScript (Sender):**
```javascript
// Create small JSON config
// Send via smartsend with type="json"
```

### Scenario 2: Deep Dive Analysis (Large Arrow Table)

**Julia (Sender):**
```julia
# Create large DataFrame
# Convert to Arrow IPC stream
# Check size (> 1MB)
# Upload to HTTP server
# Publish NATS with URL
```

**JavaScript (Receiver):**
```javascript
// Receive NATS message with URL
// Fetch data from HTTP server
// Parse Arrow IPC with zero-copy
// Load into Perspective.js or D3
```

### Scenario 3: Live Audio Processing

**JavaScript (Sender):**
```javascript
// Capture audio chunk
// Send as binary with metadata headers
// Use smartsend with type="audio"
```

**Julia (Receiver):**
```julia
# Receive audio data
# Perform FFT or AI transcription
# Send results back (JSON + Arrow table)
```

### Scenario 4: Catch-Up (JetStream)

**Julia (Producer):**
```julia
# Publish to JetStream
# Include metadata for temporal tracking
```

**JavaScript (Consumer):**
```javascript
// Connect to JetStream
// Request replay from last 10 minutes
// Process historical and real-time messages
```

### Scenario 5: Selection (Low Bandwidth)

**Focus:** Small Arrow tables sent from Julia to JavaScript. Julia sends a small DataFrame to a JavaScript dashboard, where the user chooses from the displayed options.

**Julia (Sender):**
```julia
# Create small DataFrame (e.g., 50KB - 500KB)
# Convert to Arrow IPC stream
# Check payload size (< 1MB threshold)
# Publish directly to NATS with Base64-encoded payload
# Include metadata for dashboard selection context
```

**JavaScript (Receiver):**
```javascript
// Receive NATS message with direct transport
// Decode Base64 payload
// Parse Arrow IPC with zero-copy
// Load into selection UI component (e.g., dropdown, table)
// User makes selection
// Send selection back to Julia
```

**Use Case:** The Julia server generates a list of available options (e.g., file selections, configuration presets) as a small DataFrame and sends it to the JavaScript dashboard for user selection. The selection is then sent back to Julia for processing.

### Scenario 6: Chat System

**Focus:** Each chat message can combine any number of components of any size: text, images, audio, video, tables, and files, ranging from brief text snippets to high-resolution images, large audio files, extensive tables, and massive documents. The scenario exercises claim-check delivery and full bi-directional messaging.

**Multi-Payload Support:** The system supports mixed-payload messages where a single message can contain multiple payloads with different transport strategies. The `smartreceive` function iterates through all payloads in the envelope and processes each according to its transport type.

**Julia (Sender/Receiver):**
```julia
# Build chat message with mixed payloads:
# - Text: direct transport (Base64)
# - Small images: direct transport (Base64)
# - Large images: link transport (HTTP URL)
# - Audio/video: link transport (HTTP URL)
# - Tables: direct or link depending on size
# - Files: link transport (HTTP URL)
#
# Each payload uses appropriate transport strategy:
# - Size < 1MB → direct (NATS + Base64)
# - Size >= 1MB → link (HTTP upload + NATS URL)
#
# Include claim-check metadata for delivery tracking
# Support bidirectional messaging with replyTo fields
```

**JavaScript (Sender/Receiver):**
```javascript
// Build chat message with mixed content:
// - User input text: direct transport
// - Selected image: check size, use appropriate transport
// - Audio recording: link transport for large files
// - File attachment: link transport
//
// Parse received message:
// - Direct payloads: decode Base64
// - Link payloads: fetch from HTTP with exponential backoff
// - Deserialize all payloads appropriately
//
// Render mixed content in chat interface
// Support bidirectional reply with claim-check delivery confirmation
```

**Use Case:** Full-featured chat system supporting rich media. Users can send text and small images directly, while large files are uploaded to the HTTP server and referenced by URL. The Claim-Check pattern supports reliable delivery tracking for all message components.

**Implementation Note:** `msgEnvelope_v1` (System Components, Section 1) stores its payloads as `AbstractArray{msgPayload_v1}`, which is what allows a single message to carry multiple payloads.

## Performance Considerations

### Zero-Copy Reading
- Use Arrow's memory-mapped file reading
- Avoid unnecessary data copying during deserialization
- Use Apache Arrow's native IPC reader

### Exponential Backoff
- Implement exponential backoff for HTTP link fetching
- Maximum retry count: 5
- Base delay: 100ms, max delay: 5000ms
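
The retry policy above can be sketched as a delay function plus a generic retry wrapper. `backoffDelay` and `fetchWithBackoff` are hypothetical names. The sketch assumes that "maximum retry count" means retries after the first failed attempt and that the delay doubles on each retry without jitter; the document specifies neither.

```javascript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelay.
function backoffDelay(attempt, baseDelay = 100, maxDelay = 5000) {
  return Math.min(baseDelay * 2 ** attempt, maxDelay);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `fetchOnce` (any async function) and retries up to maxRetries times on failure.
async function fetchWithBackoff(fetchOnce, maxRetries = 5, baseDelay = 100, maxDelay = 5000) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fetchOnce();
    } catch (err) {
      if (attempt >= maxRetries) throw err; // retries exhausted: surface the last error
      await sleep(backoffDelay(attempt, baseDelay, maxDelay));
    }
  }
}
```

With the default values the five delays are 100, 200, 400, 800, and 1600 ms, so the 5000 ms cap only takes effect if the retry count is raised to seven or more.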

### Correlation ID Logging
- Log `correlationId` at every stage
- Include: send, receive, serialize, deserialize
- Use structured logging format
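
One possible structured format is a single JSON object per log line, which most log collectors can parse and filter by field. `logStage` and the field names `ts` and `stage` are assumptions made for this sketch; the UUID and payload values are placeholders.

```javascript
// Hypothetical logger: one JSON object per line, always carrying the correlationId.
function logStage(correlationId, stage, extra = {}) {
  const line = JSON.stringify({ ts: new Date().toISOString(), correlationId, stage, ...extra });
  console.log(line);
  return line;
}

logStage("3f2b6c1e-0000-4000-8000-000000000000", "serialize", { dataname: "large_data", size: 524288 });
```

Searching the logs of both services for one `correlationId` then reconstructs the path of a message across send, serialize, receive, and deserialize.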

## Testing Strategy

### Unit Tests
- Test smartsend with various payload sizes
- Test smartreceive with direct and link transport
- Test Arrow IPC serialization/deserialization

### Integration Tests
- Test full flow with NATS server
- Test large data transfer (> 100MB)
- Test audio processing pipeline

### Performance Tests
- Measure throughput for small payloads
- Measure throughput for large payloads