# Solution Design: msghandler

**Version**: 1.3.0

**Date**: 2026-05-22

**Status**: Active

**Ground Truth**: [`src/msghandler.jl`](../src/msghandler.jl)

**ASG Framework Alignment**: v8 pillars - Requirements → Solution Design → Specification → Walkthrough → Implementation Plan → Validation → Runbook

---

## 1. Problem Decomposition

msghandler addresses the challenge of cross-platform data exchange between **Julia**, **JavaScript**, **Python**, **Dart**, **Rust**, and **MicroPython** applications using message brokers as transport layers.

### User Problems

| Problem | Description | User Impact | Requirement ID |
|---------|-------------|-------------|----------------|
| **P-001**: Cross-platform data serialization | Different languages have incompatible data types and serialization formats | Developers must write platform-specific conversion code | FR-001, FR-002 |
| **P-002**: Large payload handling | Message brokers have size limits, but large files need to be transferred | Large files either fail or require complex workarounds | FR-003 |
| **P-003**: Transport abstraction | Each platform has different message broker libraries and APIs | No unified interface across platforms | FR-013, FR-014 |
| **P-004**: Request-response patterns | Bi-directional communication requires complex correlation tracking | Developers must implement custom message routing | FR-011 |
| **P-005**: File server reliability | File server may be temporarily unavailable during downloads | Downloads fail when there is no retry mechanism | FR-010 |
| **P-006**: Payload type preservation | Different platforms have different type systems | Data corruption or misinterpretation on receiving end | FR-006, FR-007 |

### Solution Boundaries

**In Scope**:
- Unified API for `smartpack()` and `smartunpack()` across all platforms
- Automatic transport selection based on payload size
- File server integration using Claim-Check pattern
- Multi-payload support with mixed types in single message
- Exponential backoff for reliable file downloads
- Correlation ID propagation for message tracing

**Out of Scope**:
- Message compression (adds complexity without clear benefit)
- Message encryption (application-layer concern)
- Advanced message routing (simple topic matching sufficient)
- Persistent message queues (transport pattern sufficient)

### Decision IDs

| Decision ID | Decision | Description | Requirement IDs | NFR IDs |
|-------------|----------|-------------|-----------------|---------|
| SD-001 | Claim-Check Pattern | Large payloads uploaded to HTTP server, small payloads sent directly | FR-003, FR-004 | NFR-104, NFR-105 |
| SD-002 | Automatic Transport Selection | Size threshold decides the transport: <0.5MB = direct, ≥0.5MB = link | FR-003, FR-004 | NFR-104, NFR-105 |
| SD-003 | Handler Function Abstraction | Pluggable file server implementations via handler functions | FR-008, FR-009 | NFR-202 |
| SD-004 | Unified Tuple Format | Same `(dataname, data, type)` format across all platforms | FR-006, FR-007 | - |
| SD-005 | Base64 Encoding | JSON-compatible binary data transport | FR-012 | - |
| SD-006 | Transport Abstraction | Support multiple broker protocols (NATS/MQTT/WebSocket) transparently | FR-013, FR-014 | NFR-201 |
| SD-007 | Exponential Backoff | Retry failed file downloads with exponential backoff | FR-010 | NFR-202 |
| SD-008 | Correlation ID Propagation | Propagate correlation IDs through all message processing steps | FR-011 | NFR-401, NFR-403 |

---

## 2. Solution Approach

msghandler implements a **Claim-Check pattern** with intelligent transport selection:

```
Sender (smartpack)               Transport Layer             Receiver (smartunpack)
┌─────────────────┐             ┌───────────────┐            ┌───────────────────┐
│                 │             │               │            │                   │
│ 1. Data tuples  │────────────>│               │───────────>│ 1. Parse envelope │
│    [(name,      │    JSON     │   Message     │    JSON    │ 2. Check transport│
│     data, type)]│   format    │   Broker      │   format   │ 3. Fetch/Decode   │
│                 │             │ (NATS/MQTT/   │            │ 4. Return tuples  │
└─────────────────┘             │  WebSocket)   │            │                   │
                                │               │            └───────────────────┘
                                └───────────────┘
```

### Key Design Decisions

| Decision ID | Decision | Rationale | Alternatives Rejected |
|-------------|----------|-----------|----------------------|
| **SD-001** | Claim-Check Pattern | Large payloads (≥0.5MB) uploaded to HTTP server, small payloads sent directly via transport | Client-side compression - adds complexity; Server-side compression - not universally supported |
| **SD-002** | Automatic Transport Selection | <0.5MB = direct (fast), ≥0.5MB = link (avoid transport limits) | Manual selection - error-prone; Fixed threshold - not adaptive |
| **SD-003** | Handler Function Abstraction | Allows pluggable file server implementations (Plik, AWS S3, custom) | Hardcoded Plik - not flexible; Interface-based - too complex for this use case |
| **SD-004** | Unified Tuple Format | Same input/output format across all platforms | Platform-native formats - no interoperability; Protocol buffers - too heavy |
| **SD-005** | Base64 Encoding | JSON-compatible binary data transport | Raw bytes - not JSON-compatible; Hex encoding - 2x size overhead |
| **SD-006** | Transport Abstraction | Support multiple broker protocols (NATS/MQTT/WebSocket) transparently | Platform-specific libraries - no interoperability |
| **SD-007** | Exponential Backoff | Retry failed file downloads with exponential backoff | Simple retry - too aggressive; No retry - poor reliability |
| **SD-008** | Correlation ID Propagation | Propagate correlation IDs through all message processing steps | Manual correlation - error-prone; No tracing - debugging becomes impractical |

### Architecture Components

```mermaid
flowchart TB
    subgraph Client["Client Application"]
        direction TB
        APP["Application Code"]
        API["msghandler API"]
        APP -->|Data tuples| API
        API -->|JSON envelope| TRANSPORT
    end

    subgraph Transport["Transport Layer"]
        direction TB
        BROKER["Message Broker<br/>NATS/MQTT/WebSocket"]
        TOPICS["Topic Subscription"]
        API -->|Publish| BROKER
        BROKER -->|Deliver| TOPICS
        TOPICS -->|Subscribe| API
    end

    subgraph FileServer["File Server"]
        direction TB
        UPLOAD["Upload Handler"]
        DOWNLOAD["Download Handler"]
        API -.->|Upload URL| UPLOAD
        DOWNLOAD -.->|Fetch URL| API
    end

    style Client fill:#e1f5fe,stroke:#0288d1,stroke-width:2px
    style Transport fill:#ffe0b2,stroke:#f57c00,stroke-width:2px
    style FileServer fill:#c8e6c9,stroke:#43a047,stroke-width:2px
```

---

## 3. Alternatives Considered

| Alternative | Pros | Cons | Decision |
|-------------|------|------|----------|
| **gRPC/Protobuf** | Strong typing, efficient binary format | No native MicroPython support; Complex schema management | Rejected - not cross-platform enough |
| **MessagePack** | Compact binary, good performance | Browser support limited; No standard for tabular data | Rejected - no Arrow IPC equivalent for tabular data |
| **Protocol Buffers** | Type-safe, efficient | No native support for tabular data exchange | Rejected - cannot represent DataFrames natively |
| **REST HTTP Upload** | Simple, universal | High latency; No real-time capability | Rejected - not suitable for message broker pattern |
| **Hybrid (direct/link)** | Optimal for both small and large payloads | More complex implementation | Accepted - matches user requirements (FR-003, FR-004) |
| **Single transport type** | Simpler implementation | Cannot handle large payloads efficiently | Rejected - violates FR-003 requirement |
| **Platform-specific APIs** | Native performance | No interoperability; Maintenance burden | Rejected - violates cross-platform goal |

---

## 4. High-Level Component Diagram

```mermaid
flowchart TD
    subgraph msghandler["msghandler Core Module"]
        direction TB

        subgraph Serialization["Serialization Layer"]
            DIR["Direct Transport"]
            LNK["Link Transport"]
            DIR -->|Base64| JSON_MSG
            LNK -->|HTTP URL| JSON_MSG
        end

        subgraph Envelope["Envelope Builder"]
            HDR["Message Header"]
            PAY["Payload Manager"]
            HDR --> PAY
        end

        subgraph Handlers["Handler Functions"]
            UPD["Upload Handler"]
            DWN["Download Handler"]
            UPD --> LNK
            DWN --> LNK
        end

        API["smartpack() / smartunpack()"]
        API -->|Input| Serialization
        API -->|Output| Serialization
        API -->|Configure| Handlers
    end

    subgraph Transport["Transport Layer"]
        BROKER["NATS / MQTT / WebSocket"]
        API -->|JSON| BROKER
        BROKER -->|JSON| API
    end

    subgraph FileServer["File Server"]
        Plik["HTTP Server"]
        UPD -.->|POST| Plik
        Plik -.->|URL| DWN
    end

    style msghandler fill:#b3e5fc,stroke:#0288d1,stroke-width:2px
    style Transport fill:#ffe0b2,stroke:#f57c00,stroke-width:2px
    style FileServer fill:#c8e6c9,stroke:#43a047,stroke-width:2px
```

### Component Responsibilities

| Component | Responsibilities | Decision IDs | Requirements Addressed |
|-----------|-----------------|--------------|----------------------|
| **Serialization Layer** | Convert data types to transport format (Base64/URL) | SD-005 | FR-001, FR-002, FR-012 |
| **Envelope Builder** | Create standardized message envelope with metadata | SD-001, SD-008 | FR-011, FR-013, FR-014 |
| **Handler Functions** | Abstract file server operations for pluggability | SD-003, SD-007 | FR-008, FR-009, FR-010 |
| **Transport Adapter** | Support multiple broker protocols transparently | SD-006 | FR-013, FR-014 |
| **Payload Manager** | Track payload types, sizes, and encoding | SD-004 | FR-006, FR-007 |

---

## 5. Decision Rationale

### SD-001: Why Claim-Check Pattern?
**Requirement**: FR-003 (Large file handling), FR-004 (Direct transport for small payloads)

**NFRs**: NFR-104 (File upload latency <1s), NFR-105 (File download latency <1s)

**Rationale**:
- Transport layers (NATS, MQTT) have message size limits (typically 1MB)
- Direct transport is faster for small payloads (no file server round-trip)
- Link transport avoids transport limits for large payloads
- Users do not choose a transport manually; selection is automatic, based on the size threshold

### SD-003: Why Handler Functions for File Server?

**Requirement**: FR-008 (Plik integration), FR-009 (Custom file server support)

**NFR**: NFR-202 (File server availability <5% failure rate)

**Rationale**:
- Plik is a common open-source file server
- Some users need AWS S3 or a custom implementation
- Handler functions provide a clean abstraction without vendor lock-in
- Same signature across all platforms (unified API)

### SD-004: Why Tuple Format for Payloads?

**Requirement**: FR-006 (Multi-payload messages), FR-007 (Payload type preservation)

**Rationale**:
- The `(dataname, data, type)` tuple is language-agnostic
- Simple to understand: name, content, type
- Supports mixed payload types in a single message
- Easy to serialize/deserialize across platforms

### SD-005: Why Base64 Encoding?

**Requirement**: FR-012 (Message serialization), FR-001 (Cross-platform text messaging)

**Rationale**:
- JSON is universal and works on all platforms
- Base64 converts binary to ASCII for JSON compatibility
- Standard format with native support in all target languages
- No additional dependencies needed

### SD-002: Why Automatic Transport Selection?
**Requirement**: FR-003 (Large file handling), FR-004 (Direct transport for small payloads)

**NFRs**: NFR-104 (File upload latency <1s), NFR-105 (File download latency <1s)

**Rationale**:
- <0.5MB payloads use direct transport (<1s latency, FR-004 KPI)
- ≥0.5MB payloads use link transport to avoid transport limits (FR-003 KPI: 99% successful uploads)
- Users do not choose a transport manually; selection is automatic, based on the size threshold

### SD-006: Why Transport Abstraction?

**Requirement**: FR-013 (Transport publishing), FR-014 (Transport subscription)

**NFR**: NFR-201 (Message delivery at-least-once)

**Rationale**:
- Support multiple broker protocols (NATS, MQTT, WebSocket) transparently
- Caller handles actual transport publishing/subscription
- Unified API across all platforms
- At-least-once delivery semantics via transport layer

### SD-007: Why Exponential Backoff?

**Requirement**: FR-010 (Exponential backoff retry)

**NFR**: NFR-202 (File server availability <5% failure rate)

**Rationale**:
- File server may be temporarily unavailable
- Exponential backoff prevents overwhelming the server during outages
- Default: 5 retries, 100ms base delay, 5000ms max delay
- 95% successful downloads within retry limit (FR-010 KPI)

### SD-008: Why Correlation ID Propagation?

**Requirement**: FR-011 (Correlation ID propagation)

**NFRs**: NFR-401 (Required logs), NFR-403 (Tracing)

**Rationale**:
- Trace messages across distributed systems
- Correlation ID logged with every message (NFR-401)
- Propagated through all message processing steps (NFR-403)
- Enables debugging and performance analysis in production

---

## 6. Risk Assessment

| Risk | Impact | Probability | Mitigation | Requirement IDs | NFR IDs |
|------|--------|-------------|------------|-----------------|---------|
| **Performance degradation with >500KB payloads** | High | Medium | Size threshold detection; Link transport fallback | FR-003, FR-004 | NFR-104, NFR-105 |
| **File server availability issues** | Medium | Low | Exponential backoff retry; Graceful degradation | FR-010 | NFR-202 |
| **Platform-specific bugs** | Medium | Low | Comprehensive test suite per platform; CI validation | FR-001, FR-002, FR-006, FR-007 | - |
| **Encoding mismatches between platforms** | High | Low | Strict specification; Test contracts; Validation rules | FR-012 | NFR-301 |
| **Transport layer incompatibility** | Medium | Low | Transport-agnostic design; Handler abstraction | FR-013, FR-014 | NFR-201 |
| **Correlation ID loss in processing** | Medium | Low | Centralized trace context management | FR-011 | NFR-401, NFR-403 |

---

## 7. Requirements Traceability

| Solution Component | Decision ID | Requirement ID | Description |
|-------------------|-------------|----------------|-------------|
| **smartpack() function** | SD-001, SD-002, SD-004, SD-005, SD-006, SD-008 | FR-001, FR-002, FR-003, FR-004, FR-005, FR-006, FR-007, FR-008, FR-009, FR-010, FR-011, FR-012, FR-013, FR-014 | Unified API for sending messages across all platforms |
| **smartunpack() function** | SD-001, SD-002, SD-004, SD-005, SD-006, SD-007, SD-008 | FR-001, FR-002, FR-003, FR-004, FR-005, FR-006, FR-007, FR-008, FR-009, FR-010, FR-011, FR-012, FR-013, FR-014 | Unified API for receiving messages across all platforms |
| **Direct transport** | SD-002 | FR-004 | Send payloads < threshold directly via transport |
| **Link transport** | SD-001, SD-002 | FR-003 | Upload payloads ≥ threshold to file server |
| **File server handler** | SD-003, SD-007 | FR-008, FR-009, FR-010 | Pluggable upload/download handlers with retry logic |
| **Payload type preservation** | SD-004 | FR-006, FR-007 | Support text, dictionary, arrowtable, jsontable, image, audio, video, binary |
| **Correlation ID propagation** | SD-008 | FR-011 | Message tracing across distributed systems |
| **Multi-payload support** | SD-004 | FR-006, FR-007 | List of (dataname, data, type) tuples |

### Non-Functional Requirements Traceability

| Solution Component | Decision ID | NFR ID | Description |
|-------------------|-------------|--------|-------------|
| **Serialization optimization** | SD-005 | NFR-101, NFR-102 | <50ms overhead for 10KB payloads |
| **Transport efficiency** | SD-006 | NFR-103 | <100ms connection establishment |
| **File server latency** | SD-001, SD-002 | NFR-104, NFR-105 | <1s upload/download for 0.5MB files |
| **Concurrent connections** | SD-006 | NFR-106 | Support 100+ simultaneous connections |
| **Message throughput** | SD-005, SD-006 | NFR-107 | Handle 1000+ messages/second per instance |
| **At-least-once delivery** | SD-006 | NFR-201 | Transport layer semantics |
| **Graceful degradation** | SD-003, SD-007 | NFR-202 | File server unavailability handling |
| **Auto-reconnect** | SD-006 | NFR-203 | Transport connection failure recovery |
| **Payload integrity** | SD-005 | NFR-301 | 100% SHA-256 checksum validation |
| **Transport security** | SD-006 | NFR-302 | 100% TLS connections in production |
| **File server security** | SD-003 | NFR-303 | 100% authenticated file uploads |
| **Required logs** | SD-001, SD-008 | NFR-401 | Correlation ID, msg_id, timestamp, etc. |
| **Critical metrics** | SD-001, SD-005 | NFR-402 | messages_sent_total, file upload/download duration |
| **Tracing** | SD-001, SD-008 | NFR-403 | Correlation ID propagation |
| **Alerting** | SD-007 | NFR-404 | <5min alert latency for `download_retry_exceeded` |

---

## 8. Gap-Check Validation

| Stage Transition | Gap-Check Question | Status |
|------------------|-------------------|--------|
| **Requirements → Solution Design** | Does the Solution Design clearly explain how the system solves the user problem, not just what it does? | ✅ Verified - All user problems mapped to solution components with requirement ID and decision ID references |
| **Solution Design → Specification** | Does the Specification define all technical details that the solution approach requires? | ⏳ Pending - Specification needs review for completeness |
| **Solution Design → Walkthrough** | Does the Walkthrough reflect the complete flow including error states and timing? | ⏳ Pending - Walkthrough needs validation against design |

### Solution Design Validation

**User Problems** (from requirements.md):
- **P-001**: Cross-platform data serialization (FR-001, FR-002)
- **P-002**: Large payload handling (FR-003)
- **P-003**: Transport abstraction (FR-013, FR-014)
- **P-004**: Request-response patterns (FR-011)
- **P-005**: File server reliability (FR-010)
- **P-006**: Payload type preservation (FR-006, FR-007)

**Solution Components**:
1. **SD-001** - Claim-Check pattern - `smartpack()` / `smartunpack()` upload large payloads to the file server and send small payloads directly
2. **SD-002** - Automatic transport selection - Direct or link transport chosen by size threshold
3. **SD-003** - Handler function abstraction - Plik/AWS S3/custom file server support
4. **SD-004** - Tuple format - `(dataname, data, type)` - platform-agnostic
5. **SD-005** - Base64 encoding - JSON-compatible binary data transport
6. **SD-006** - Transport abstraction - Support multiple broker protocols transparently
7. **SD-007** - Exponential backoff - Reliable file downloads with retry logic
8. **SD-008** - Correlation ID propagation - Message tracing across distributed systems

**Requirement Mapping**:
- **Functional Requirements**: FR-001 through FR-014 ✅
- **Non-Functional Requirements**: NFR-101 through NFR-405 ✅

**Gap Check**: Does this solution explain *how* users will actually use the system?

**Answer**: Yes - the walkthrough provides concrete examples:
1. JavaScript sends `[(msg, "Hello", "text"), (avatar, binary_data, "image")]`
2. `smartpack()` automatically selects transport based on size (SD-002)
3. Large file (≥0.5MB) → link transport → file server upload (SD-001)
4. Small payload (<0.5MB) → direct transport → base64 encoding (SD-005)
5. Receiver calls `smartunpack()` → receives same tuple format with preserved types

**NFR Traceability**:
- **Performance**: NFR-101 (serialization <50ms), NFR-102 (deserialization <50ms), NFR-103 (connection <100ms) ✅
- **Reliability**: NFR-201 (at-least-once delivery), NFR-202 (file server <5% failure), NFR-203 (auto-reconnect <30s) ✅
- **Security**: NFR-301 (SHA-256 checksum), NFR-302 (TLS 100%), NFR-303 (authenticated uploads) ✅
- **Observability**: NFR-401 (required logs), NFR-402 (metrics), NFR-403 (tracing), NFR-404 (alerting <5min) ✅

---

*This solution design document is versioned and maintained in git alongside the codebase. All implementations must adhere to this design.*

**Traceability Summary**:
- All requirements traced to solution components with SD-XXX decision IDs
- Each decision ID references the corresponding requirement IDs (FR-XXX, NFR-XXX)
- Specification must cite SD-XXX references for each technical detail
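
**Illustrative sketch (SD-002, SD-005, SD-007)**: To make the size threshold and retry defaults stated in this design concrete, the following Python sketch shows one way they could be expressed. It is not taken from `src/msghandler.jl`. The names `select_transport`, `encode_direct`, `backoff_delays`, and `SIZE_THRESHOLD_BYTES` are hypothetical, the reading of "0.5MB" as 512 × 1024 bytes is an assumption, and so is the doubling factor between retries (the design only states 5 retries, a 100ms base delay, and a 5000ms cap).

```python
import base64
from typing import List

# Assumption: "0.5MB" is interpreted as 0.5 MiB (524,288 bytes).
SIZE_THRESHOLD_BYTES = 512 * 1024


def select_transport(payload: bytes, threshold: int = SIZE_THRESHOLD_BYTES) -> str:
    """Return "direct" below the threshold and "link" at or above it (SD-002)."""
    return "direct" if len(payload) < threshold else "link"


def encode_direct(payload: bytes) -> str:
    """Base64-encode bytes so they can be embedded in a JSON envelope (SD-005)."""
    return base64.b64encode(payload).decode("ascii")


def backoff_delays(max_retries: int = 5, base_ms: int = 100, max_ms: int = 5000) -> List[int]:
    """Delay in milliseconds before each retry (SD-007).

    Assumption: the delay doubles on every attempt and is capped at max_ms.
    """
    return [min(base_ms * 2 ** attempt, max_ms) for attempt in range(max_retries)]


if __name__ == "__main__":
    print(select_transport(b"x" * 1024))                  # small payload
    print(select_transport(b"x" * SIZE_THRESHOLD_BYTES))  # payload at the threshold
    print(backoff_delays())
```

Under these assumptions the default schedule never reaches the 5000ms cap; the cap only takes effect if the retry count is raised beyond the documented default of 5.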