Solution Design: msghandler
Version: 1.3.0
Date: 2026-05-22
Status: Active
Ground Truth: src/msghandler.jl
ASG Framework Alignment: v8 pillars - Requirements → Solution Design → Specification → Walkthrough → Implementation Plan → Validation → Runbook
1. Problem Decomposition
msghandler addresses the challenge of cross-platform data exchange between Julia, JavaScript, Python, Dart, Rust, and MicroPython applications using message brokers as transport layers.
User Problems
| Problem | Description | User Impact | Requirement ID |
|---|---|---|---|
| P-001: Cross-platform data serialization | Different languages have incompatible data types and serialization formats | Developers must write platform-specific conversion code | FR-001, FR-002 |
| P-002: Large payload handling | Message brokers have size limits, but large files need to be transferred | Large files either fail or require complex workarounds | FR-003 |
| P-003: Transport abstraction | Each platform has different message broker libraries and APIs | No unified interface across platforms | FR-013, FR-014 |
| P-004: Request-response patterns | Bi-directional communication requires complex correlation tracking | Developers must implement custom message routing | FR-011 |
| P-005: File server reliability | File server may be temporarily unavailable during downloads | Failed downloads without retry mechanism | FR-010 |
| P-006: Payload type preservation | Different platforms have different type systems | Data corruption or misinterpretation on receiving end | FR-006, FR-007 |
Solution Boundaries
In Scope:
- Unified API for smartpack() and smartunpack() across all platforms
- Automatic transport selection based on payload size
- File server integration using Claim-Check pattern
- Multi-payload support with mixed types in single message
- Exponential backoff for reliable file downloads
- Correlation ID propagation for message tracing
Out of Scope:
- Message compression (adds complexity without clear benefit)
- Message encryption (application-layer concern)
- Advanced message routing (simple topic matching sufficient)
- Persistent message queues (transport pattern sufficient)
Decision IDs
| Decision ID | Decision | Description | Requirement IDs | NFR IDs |
|---|---|---|---|---|
| SD-001 | Claim-Check Pattern | Large payloads uploaded to HTTP server, small payloads sent directly | FR-003, FR-004 | NFR-104, NFR-105 |
| SD-002 | Automatic Transport Selection | <0.5MB = direct, ≥0.5MB = link based on size threshold | FR-003, FR-004 | NFR-104, NFR-105 |
| SD-003 | Handler Function Abstraction | Pluggable file server implementations via handler functions | FR-008, FR-009 | NFR-202 |
| SD-004 | Unified Tuple Format | Same (dataname, data, type) format across all platforms | FR-006, FR-007 | - |
| SD-005 | Base64 Encoding | JSON-compatible binary data transport | FR-012 | - |
| SD-006 | Transport Abstraction | Support multiple broker protocols (NATS/MQTT/WebSocket) transparently | FR-013, FR-014 | NFR-201 |
| SD-007 | Exponential Backoff | Retry failed file downloads with exponential backoff | FR-010 | NFR-202 |
| SD-008 | Correlation ID Propagation | Propagate correlation IDs through all message processing steps | FR-011 | NFR-401, NFR-403 |
2. Solution Approach
msghandler implements a Claim-Check pattern with intelligent transport selection:
Sender (smartpack)              Transport Layer              Receiver (smartunpack)
┌─────────────────┐             ┌───────────────┐            ┌───────────────────┐
│                 │             │               │            │                   │
│ 1. Data tuples  │────────────>│               │───────────>│ 1. Parse envelope │
│    [(name,      │    JSON     │   Message     │   JSON     │ 2. Check transport│
│    data, type)] │   format    │   Broker      │  format    │ 3. Fetch/Decode   │
│                 │             │  (NATS/MQTT/  │            │ 4. Return tuples  │
└─────────────────┘             │   WebSocket)  │            │                   │
                                │               │            └───────────────────┘
                                └───────────────┘
Key Design Decisions
| Decision ID | Decision | Rationale | Alternatives Rejected |
|---|---|---|---|
| SD-001 | Claim-Check Pattern | Large payloads (≥0.5MB) uploaded to HTTP server, small payloads sent directly via transport | Client-side compression - adds complexity; Server-side compression - not universally supported |
| SD-002 | Automatic Transport Selection | <0.5MB = direct (fast), ≥0.5MB = link (avoid transport limits) | Manual selection - error-prone; Fixed threshold - not adaptive |
| SD-003 | Handler Function Abstraction | Allows pluggable file server implementations (Plik, AWS S3, custom) | Hardcoded Plik - not flexible; Interface-based - too complex for this use case |
| SD-004 | Unified Tuple Format | Same input/output format across all platforms | Platform-native formats - no interoperability; Protocol buffers - too heavy |
| SD-005 | Base64 Encoding | JSON-compatible binary data transport | Raw bytes - not JSON-compatible; Hex encoding - 2x size overhead |
| SD-006 | Transport Abstraction | Support multiple broker protocols (NATS/MQTT/WebSocket) transparently | Platform-specific libraries - no interoperability |
| SD-007 | Exponential Backoff | Retry failed file downloads with exponential backoff | Simple retry - too aggressive; No retry - poor reliability |
| SD-008 | Correlation ID Propagation | Propagate correlation IDs through all message processing steps | Manual correlation - error-prone; No tracing - debug impossible |
Architecture Components
flowchart TB
subgraph Client["Client Application"]
direction TB
APP["Application Code"]
API["msghandler API"]
APP -->|Data tuples| API
API -->|JSON envelope| TRANSPORT
end
subgraph Transport["Transport Layer"]
direction TB
BROKER["Message Broker<br/>NATS/MQTT/WebSocket"]
TOPICS["Topic Subscription"]
API -->|Publish| BROKER
BROKER -->|Deliver| TOPICS
TOPICS -->|Subscribe| API
end
subgraph FileServer["File Server"]
direction TB
UPLOAD["Upload Handler"]
DOWNLOAD["Download Handler"]
API -.->|Upload URL| UPLOAD
DOWNLOAD -.->|Fetch URL| API
end
style Client fill:#e1f5fe,stroke:#0288d1,stroke-width:2px
style Transport fill:#ffe0b2,stroke:#f57c00,stroke-width:2px
style FileServer fill:#c8e6c9,stroke:#43a047,stroke-width:2px
3. Alternatives Considered
| Alternative | Pros | Cons | Decision |
|---|---|---|---|
| gRPC/Protobuf | Strong typing, efficient binary format | No native MicroPython support; Complex schema management | Rejected - not cross-platform enough |
| MessagePack | Compact binary, good performance | Browser support limited; No standard for tabular data | Rejected - no standard tabular format comparable to Arrow IPC |
| Protocol Buffers | Type-safe, efficient | No native support for tabular data exchange | Rejected - cannot represent DataFrames natively |
| REST HTTP Upload | Simple, universal | High latency; No real-time capability | Rejected - not suitable for message broker pattern |
| Hybrid (direct/link) | Optimal for both small and large payloads | More complex implementation | Accepted - matches user requirements (FR-003, FR-004) |
| Single transport type | Simpler implementation | Cannot handle large payloads efficiently | Rejected - violates FR-003 requirement |
| Platform-specific APIs | Native performance | No interoperability; Maintenance burden | Rejected - violates cross-platform goal |
4. High-Level Component Diagram
flowchart TD
subgraph msghandler["msghandler Core Module"]
direction TB
subgraph Serialization["Serialization Layer"]
DIR["Direct Transport"]
LNK["Link Transport"]
DIR -->|Base64| JSON_MSG
LNK -->|HTTP URL| JSON_MSG
end
subgraph Envelope["Envelope Builder"]
HDR["Message Header"]
PAY["Payload Manager"]
HDR --> PAY
end
subgraph Handlers["Handler Functions"]
UPD["Upload Handler"]
DWN["Download Handler"]
UPD --> LNK
DWN --> LNK
end
API["smartpack() / smartunpack()"]
API -->|Input| Serialization
API -->|Output| Serialization
API -->|Configure| Handlers
end
subgraph Transport["Transport Layer"]
BROKER["NATS / MQTT / WebSocket"]
API -->|JSON| BROKER
BROKER -->|JSON| API
end
subgraph FileServer["File Server"]
Plik["HTTP Server"]
UPD -.->|POST| Plik
Plik -.->|URL| DWN
end
style msghandler fill:#b3e5fc,stroke:#0288d1,stroke-width:2px
style Transport fill:#ffe0b2,stroke:#f57c00,stroke-width:2px
style FileServer fill:#c8e6c9,stroke:#43a047,stroke-width:2px
Component Responsibilities
| Component | Responsibilities | Decision IDs | Requirements Addressed |
|---|---|---|---|
| Serialization Layer | Convert data types to transport format (Base64/URL) | SD-005 | FR-001, FR-002, FR-012 |
| Envelope Builder | Create standardized message envelope with metadata | SD-001, SD-008 | FR-011, FR-013, FR-014 |
| Handler Functions | Abstract file server operations for pluggability | SD-003, SD-007 | FR-008, FR-009, FR-010 |
| Transport Adapter | Support multiple broker protocols transparently | SD-006 | FR-013, FR-014 |
| Payload Manager | Track payload types, sizes, and encoding | SD-004 | FR-006, FR-007 |
5. Decision Rationale
SD-001: Why Claim-Check Pattern?
Requirement: FR-003 (Large file handling), FR-004 (Direct transport for small payloads)
NFRs: NFR-104 (File upload latency <1s), NFR-105 (File download latency <1s)
Rationale:
- Transport layers (NATS, MQTT) impose message size limits (typically 1MB; NATS, for example, defaults to a 1MB maximum payload)
- Direct transport is faster for small payloads (no file server round-trip)
- Link transport avoids transport limits for large payloads
- User doesn't need to manually choose - automatic selection based on threshold
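The routing rule described above can be sketched in a few lines. This is a minimal Python illustration, not the code in src/msghandler.jl: the function names, the dictionary keys, and the in-memory file server are hypothetical, and the byte value used for the 0.5MB threshold is an assumption.

```python
import base64
from typing import Callable, Dict

# Assumption: "0.5MB" is taken as 512 KiB; the real threshold constant may differ.
SIZE_THRESHOLD = 512 * 1024


def pack_payload(data: bytes, upload: Callable[[bytes], str],
                 threshold: int = SIZE_THRESHOLD) -> Dict[str, str]:
    """Return a hypothetical payload entry: inline Base64 or a claim-check URL."""
    if len(data) < threshold:
        # Small payload: travels inside the broker message itself.
        return {"transport": "direct",
                "data": base64.b64encode(data).decode("ascii")}
    # Large payload: store it on the file server and send only the URL.
    return {"transport": "link", "data": upload(data)}


def unpack_payload(entry: Dict[str, str],
                   download: Callable[[str], bytes]) -> bytes:
    """Recover the original bytes from either transport."""
    if entry["transport"] == "direct":
        return base64.b64decode(entry["data"])
    return download(entry["data"])


# In-memory stand-in for a file server, for illustration only.
_store: Dict[str, bytes] = {}


def fake_upload(data: bytes) -> str:
    url = f"http://files.example/{len(_store)}"
    _store[url] = data
    return url


def fake_download(url: str) -> bytes:
    return _store[url]
```

The receiver never needs to know the size of the original data: the transport field in each entry tells it whether to decode inline or redeem the claim check.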
SD-003: Why Handler Functions for File Server?
Requirement: FR-008 (Plik integration), FR-009 (Custom file server support)
NFR: NFR-202 (File server availability <5% failure rate)
Rationale:
- Plik is a common open-source file server
- Some users need AWS S3 or custom implementation
- Handler functions provide clean abstraction without vendor lock-in
- Same signature across all platforms (unified API)
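To make the pluggability argument concrete, here is a small Python sketch of two interchangeable handler pairs. The handler signatures and the factory names are assumptions made for illustration; the actual signatures are defined by the specification, and the directory backend merely stands in for Plik or S3.

```python
import tempfile
from pathlib import Path
from typing import Callable, Dict, Tuple

# Hypothetical handler signatures; the real ones come from the specification.
UploadHandler = Callable[[str, bytes], str]   # (filename, data) -> URL
DownloadHandler = Callable[[str], bytes]      # URL -> data


def make_memory_handlers() -> Tuple[UploadHandler, DownloadHandler]:
    """Handlers backed by a dict, useful in tests."""
    store: Dict[str, bytes] = {}

    def upload(filename: str, data: bytes) -> str:
        url = f"mem://{filename}"
        store[url] = data
        return url

    def download(url: str) -> bytes:
        return store[url]

    return upload, download


def make_directory_handlers(root: Path) -> Tuple[UploadHandler, DownloadHandler]:
    """Handlers backed by a local directory, standing in for Plik or S3."""

    def upload(filename: str, data: bytes) -> str:
        path = root / filename
        path.write_bytes(data)
        return path.as_uri()

    def download(url: str) -> bytes:
        # Inverse of as_uri() for the simple file names used here.
        return (root / url.rsplit("/", 1)[-1]).read_bytes()

    return upload, download


def round_trip(upload: UploadHandler, download: DownloadHandler, data: bytes) -> bytes:
    """Caller code is identical whichever backend is plugged in."""
    return download(upload("blob.bin", data))


# Usage: the same call works with either backend.
mem_up, mem_down = make_memory_handlers()
dir_up, dir_down = make_directory_handlers(Path(tempfile.mkdtemp()))
```

Plain functions keep the contract small enough to reproduce on every platform, including MicroPython, where class-based interfaces would be heavier.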
SD-004: Why Tuple Format for Payloads?
Requirement: FR-006 (Multi-payload messages), FR-007 (Payload type preservation)
Rationale:
- (dataname, data, type) tuple is language-agnostic
- Simple to understand: name, content, type
- Supports mixed payload types in single message
- Easy to serialize/deserialize across platforms
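As an illustration of the tuple format, the Python sketch below validates a mixed-type message. The type names are the ones listed in the requirements traceability table of this document; the validate_payloads helper itself is hypothetical.

```python
from typing import Any, List, Tuple

# Payload types listed in the traceability table of this document.
PAYLOAD_TYPES = {"text", "dictionary", "arrowtable", "jsontable",
                 "image", "audio", "video", "binary"}

Payload = Tuple[str, Any, str]  # (dataname, data, type)


def validate_payloads(payloads: List[Payload]) -> List[Payload]:
    """Check each (dataname, data, type) tuple; hypothetical helper."""
    for item in payloads:
        if len(item) != 3:
            raise ValueError(f"expected (dataname, data, type), got {item!r}")
        dataname, _data, ptype = item
        if not isinstance(dataname, str) or not dataname:
            raise ValueError("dataname must be a non-empty string")
        if ptype not in PAYLOAD_TYPES:
            raise ValueError(f"unknown payload type: {ptype!r}")
    return payloads


# One message carrying three payloads of different types.
message = validate_payloads([
    ("msg", "Hello", "text"),
    ("settings", {"theme": "dark"}, "dictionary"),
    ("avatar", b"\x89PNG...", "image"),
])
```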
SD-005: Why Base64 Encoding?
Requirement: FR-012 (Message serialization), FR-001 (Cross-platform text messaging)
Rationale:
- JSON is universal - works on all platforms
- Base64 converts binary to ASCII for JSON compatibility
- Standard format with native support in all languages
- No additional dependencies needed
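The following Python sketch shows binary data surviving a JSON round trip via Base64, along with the size cost of doing so. The helper names and the "data" key are illustrative assumptions; only the standard library is used.

```python
import base64
import json


def encode_binary(data: bytes) -> str:
    """Binary -> ASCII text that can sit inside a JSON string."""
    return base64.b64encode(data).decode("ascii")


def decode_binary(text: str) -> bytes:
    """Inverse of encode_binary."""
    return base64.b64decode(text)


def encoded_size(n: int) -> int:
    """Length of padded Base64 for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


# Every possible byte value survives the JSON round trip.
raw = bytes(range(256))
wire = json.dumps({"data": encode_binary(raw)})
restored = decode_binary(json.loads(wire)["data"])
```

Base64 grows a payload by about one third. If the 0.5MB threshold is read as 512 KiB (an assumption), a payload just under the threshold encodes to roughly 683 KiB, which still fits within a 1MB broker limit.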
SD-002: Why Automatic Transport Selection?
Requirement: FR-003 (Large file handling), FR-004 (Direct transport for small payloads)
NFRs: NFR-104 (File upload latency <1s), NFR-105 (File download latency <1s)
Rationale:
- <0.5MB payloads use direct transport (<1s latency, FR-004 KPI)
- ≥0.5MB payloads use link transport to avoid transport limits (FR-003 KPI: 99% successful uploads)
- User doesn't need to manually choose - automatic selection based on threshold
SD-006: Why Transport Abstraction?
Requirement: FR-013 (Transport publishing), FR-014 (Transport subscription)
NFR: NFR-201 (Message delivery at-least-once)
Rationale:
- Support multiple broker protocols (NATS, MQTT, WebSocket) transparently
- Caller handles actual transport publishing/subscription
- Unified API across all platforms
- At-least-once delivery semantics via transport layer
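Because the caller owns publishing and subscription, the library side only needs to produce and consume JSON. The Python sketch below illustrates that split with an in-memory broker standing in for NATS, MQTT, or WebSocket. The Publish signature, the send helper, and the envelope shape are assumptions for illustration.

```python
import json
from collections import defaultdict
from typing import Callable, DefaultDict, List

Publish = Callable[[str, bytes], None]  # hypothetical: (topic, message bytes)


class InMemoryBroker:
    """Stand-in for NATS, MQTT, or WebSocket; illustration only."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[bytes], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[bytes], None]) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, message: bytes) -> None:
        for callback in self._subs[topic]:
            callback(message)


def send(envelope: dict, topic: str, publish: Publish) -> None:
    """Serialize once, then hand the bytes to whatever publish the caller supplies."""
    publish(topic, json.dumps(envelope).encode("utf-8"))


broker = InMemoryBroker()
received: List[dict] = []
broker.subscribe("chat", lambda raw: received.append(json.loads(raw)))
send({"payloads": [["msg", "Hello", "text"]]}, "chat", broker.publish)
```

Swapping brokers means passing a different publish callable; the envelope and the serialization step stay the same.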
SD-007: Why Exponential Backoff?
Requirement: FR-010 (Exponential backoff retry)
NFR: NFR-202 (File server availability <5% failure rate)
Rationale:
- File server may be temporarily unavailable
- Exponential backoff prevents overwhelming server during outages
- Default: 5 retries, 100ms base delay, 5000ms max delay
- 95% successful downloads within retry limit (FR-010 KPI)
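The defaults above imply a concrete delay schedule. The Python sketch below uses the stated defaults (5 retries, 100ms base, 5000ms cap) and assumes a doubling factor, no jitter, that "5 retries" means five attempts after the initial one, and that failures surface as OSError. None of these assumptions is confirmed by this document.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int = 5, base_ms: int = 100, max_ms: int = 5000) -> List[int]:
    """Delay before each retry, doubling from base_ms and capped at max_ms."""
    return [min(base_ms * 2 ** attempt, max_ms) for attempt in range(retries)]


def with_retry(operation: Callable[[], T], retries: int = 5, base_ms: int = 100,
               max_ms: int = 5000, sleep: Callable[[float], None] = time.sleep) -> T:
    """Run operation once, then retry up to `retries` times on OSError."""
    delays = backoff_delays(retries, base_ms, max_ms)
    for attempt in range(retries + 1):
        try:
            return operation()
        except OSError:
            if attempt == retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt] / 1000.0)
    raise AssertionError("unreachable")
```

Under these assumptions the default schedule is 100, 200, 400, 800, and 1600 ms, about 3.1 seconds of waiting in total. The 5000ms cap only takes effect when more than six retries are configured.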
SD-008: Why Correlation ID Propagation?
Requirement: FR-011 (Correlation ID propagation)
NFRs: NFR-401 (Required logs), NFR-403 (Tracing)
Rationale:
- Trace messages across distributed systems
- Correlation ID logged with every message (NFR-401)
- Propagated through all message processing steps (NFR-403)
- Enables debugging and performance analysis in production
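A short Python sketch of the propagation rule: a reply gets its own message ID but reuses the correlation ID of the request it answers. The envelope layout and the function names are hypothetical; msg_id and timestamp are taken from the required-log fields named under NFR-401.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def new_envelope(payloads: list, correlation_id: Optional[str] = None) -> dict:
    """Build a hypothetical envelope; reuse correlation_id when one is given."""
    return {
        "msg_id": str(uuid.uuid4()),  # unique per message
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payloads": payloads,
    }


def reply_to(request: dict, payloads: list) -> dict:
    """A reply gets a fresh msg_id but carries the request's correlation_id."""
    return new_envelope(payloads, correlation_id=request["correlation_id"])


def log_line(envelope: dict, event: str) -> str:
    """Every log line includes the correlation ID so one flow can be filtered."""
    return (f"{event} correlation_id={envelope['correlation_id']} "
            f"msg_id={envelope['msg_id']}")
```

Filtering logs on a single correlation ID then yields the full request-response exchange across every service that touched it.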
6. Risk Assessment
| Risk | Impact | Probability | Mitigation | Requirement IDs | NFR IDs |
|---|---|---|---|---|---|
| Performance degradation with >500KB payloads | High | Medium | Size threshold detection; Link transport fallback | FR-003, FR-004 | NFR-104, NFR-105 |
| File server availability issues | Medium | Low | Exponential backoff retry; Graceful degradation | FR-010 | NFR-202 |
| Platform-specific bugs | Medium | Low | Comprehensive test suite per platform; CI validation | FR-001, FR-002, FR-006, FR-007 | - |
| Encoding mismatches between platforms | High | Low | Strict specification; Test contracts; Validation rules | FR-012 | NFR-301 |
| Transport layer incompatibility | Medium | Low | Transport-agnostic design; Handler abstraction | FR-013, FR-014 | NFR-201 |
| Correlation ID loss in processing | Medium | Low | Centralized trace context management | FR-011 | NFR-401, NFR-403 |
7. Requirements Traceability
| Solution Component | Decision ID | Requirement ID | Description |
|---|---|---|---|
| smartpack() function | SD-001, SD-002, SD-004, SD-005, SD-006, SD-008 | FR-001, FR-002, FR-003, FR-004, FR-005, FR-006, FR-007, FR-008, FR-009, FR-010, FR-011, FR-012, FR-013, FR-014 | Unified API for sending messages across all platforms |
| smartunpack() function | SD-001, SD-002, SD-004, SD-005, SD-006, SD-007, SD-008 | FR-001, FR-002, FR-003, FR-004, FR-005, FR-006, FR-007, FR-008, FR-009, FR-010, FR-011, FR-012, FR-013, FR-014 | Unified API for receiving messages across all platforms |
| Direct transport | SD-002 | FR-004 | Send payloads < threshold directly via transport |
| Link transport | SD-001, SD-002 | FR-003 | Upload payloads ≥ threshold to file server |
| File server handler | SD-003, SD-007 | FR-008, FR-009, FR-010 | Pluggable upload/download handlers with retry logic |
| Payload type preservation | SD-004 | FR-006, FR-007 | Support text, dictionary, arrowtable, jsontable, image, audio, video, binary |
| Correlation ID propagation | SD-008 | FR-011 | Message tracing across distributed systems |
| Multi-payload support | SD-004 | FR-006, FR-007 | List of (dataname, data, type) tuples |
Non-Functional Requirements Traceability
| Solution Component | Decision ID | NFR ID | Description |
|---|---|---|---|
| Serialization optimization | SD-005 | NFR-101, NFR-102 | <50ms overhead for 10KB payloads |
| Transport efficiency | SD-006 | NFR-103 | <100ms connection establishment |
| File server latency | SD-001, SD-002 | NFR-104, NFR-105 | <1s upload/download for 0.5MB files |
| Concurrent connections | SD-006 | NFR-106 | Support 100+ simultaneous connections |
| Message throughput | SD-005, SD-006 | NFR-107 | Handle 1000+ messages/second per instance |
| At-least-once delivery | SD-006 | NFR-201 | Transport layer semantics |
| Graceful degradation | SD-003, SD-007 | NFR-202 | File server unavailability handling |
| Auto-reconnect | SD-006 | NFR-203 | Transport connection failure recovery |
| Payload integrity | SD-005 | NFR-301 | 100% SHA-256 checksum validation |
| Transport security | SD-006 | NFR-302 | 100% TLS connections in production |
| File server security | SD-003 | NFR-303 | 100% authenticated file uploads |
| Required logs | SD-001, SD-008 | NFR-401 | Correlation ID, msg_id, timestamp, etc. |
| Critical metrics | SD-001, SD-005 | NFR-402 | messages_sent_total, file upload/download duration |
| Tracing | SD-001, SD-008 | NFR-403 | Correlation ID propagation |
| Alerting | SD-007 | NFR-404 | <5min alert latency for download_retry_exceeded |
8. Gap-Check Validation
| Stage Transition | Gap-Check Question | Status |
|---|---|---|
| Requirements → Solution Design | Does the Solution Design clearly explain how the system solves the user problem, not just what it does? | ✅ Verified - All user problems mapped to solution components with requirement ID and decision ID references |
| Solution Design → Specification | Does the Specification define all technical details that the solution approach requires? | ⏳ Pending - Specification needs review for completeness |
| Solution Design → Walkthrough | Does the Walkthrough reflect the complete flow including error states and timing? | ⏳ Pending - Walkthrough needs validation against design |
Solution Design Validation
User Problems (from requirements.md):
- P-001: Cross-platform data serialization (FR-001, FR-002)
- P-002: Large payload handling (FR-003)
- P-003: Transport abstraction (FR-013, FR-014)
- P-004: Request-response patterns (FR-011)
- P-005: File server reliability (FR-010)
- P-006: Payload type preservation (FR-006, FR-007)
Solution Components (exposed through the unified smartpack()/smartunpack() API):
- SD-001 - Claim-Check pattern - Large payloads uploaded to file server, small payloads sent directly
- SD-002 - Automatic transport selection - Direct or link transport chosen by size threshold
- SD-003 - Handler function abstraction - Plik/AWS S3/custom file server support
- SD-004 - Tuple format - (dataname, data, type) - platform-agnostic
- SD-005 - Base64 encoding - JSON-compatible binary data transport
- SD-006 - Transport abstraction - Support multiple broker protocols transparently
- SD-007 - Exponential backoff - Reliable file downloads with retry logic
- SD-008 - Correlation ID propagation - Message tracing across distributed systems
Requirement Mapping:
- Functional Requirements: FR-001 through FR-014 ✅
- Non-Functional Requirements: NFR-101 through NFR-405 ✅
Gap Check: Does this solution explain how users will actually use the system?
Answer: Yes - the walkthrough provides concrete examples:
- JavaScript sends [(msg, "Hello", "text"), (avatar, binary_data, "image")]
- smartpack() automatically selects transport based on size (SD-002)
- Large file (≥0.5MB) → link transport → file server upload (SD-001)
- Small payload (<0.5MB) → direct transport → base64 encoding (SD-005)
- Receiver calls smartunpack() → receives same tuple format with preserved types
NFR Traceability:
- Performance: NFR-101 (serialization <50ms), NFR-102 (deserialization <50ms), NFR-103 (connection <100ms) ✅
- Reliability: NFR-201 (at-least-once delivery), NFR-202 (file server <5% failure), NFR-203 (auto-reconnect <30s) ✅
- Security: NFR-301 (SHA-256 checksum), NFR-302 (TLS 100%), NFR-303 (authenticated uploads) ✅
- Observability: NFR-401 (required logs), NFR-402 (metrics), NFR-403 (tracing), NFR-404 (alerting <5min) ✅
This solution design document is versioned and maintained in git alongside the codebase. All implementations must adhere to this design.
Traceability Summary:
- All requirements traced to solution components with SD-XXX decision IDs
- Each decision ID references the corresponding requirement IDs (FR-XXX, NFR-XXX)
- Specification must cite SD-XXX references for each technical detail