
Solution Design: msghandler

Version: 1.3.0
Date: 2026-05-22
Status: Active
Ground Truth: src/msghandler.jl
ASG Framework Alignment: v8 pillars - Requirements → Solution Design → Specification → Walkthrough → Implementation Plan → Validation → Runbook


1. Problem Decomposition

msghandler addresses the challenge of cross-platform data exchange between Julia, JavaScript, Python, Dart, Rust, and MicroPython applications using message brokers as transport layers.

User Problems

| Problem | Description | User Impact | Requirement ID |
|---|---|---|---|
| P-001: Cross-platform data serialization | Different languages have incompatible data types and serialization formats | Developers must write platform-specific conversion code | FR-001, FR-002 |
| P-002: Large payload handling | Message brokers have size limits, but large files need to be transferred | Large files either fail or require complex workarounds | FR-003 |
| P-003: Transport abstraction | Each platform has different message broker libraries and APIs | No unified interface across platforms | FR-013, FR-014 |
| P-004: Request-response patterns | Bi-directional communication requires complex correlation tracking | Developers must implement custom message routing | FR-011 |
| P-005: File server reliability | File server may be temporarily unavailable during downloads | Downloads fail with no retry mechanism | FR-010 |
| P-006: Payload type preservation | Different platforms have different type systems | Data corruption or misinterpretation on the receiving end | FR-006, FR-007 |

Solution Boundaries

In Scope:

  • Unified API for smartpack() and smartunpack() across all platforms
  • Automatic transport selection based on payload size
  • File server integration using Claim-Check pattern
  • Multi-payload support with mixed types in single message
  • Exponential backoff for reliable file downloads
  • Correlation ID propagation for message tracing

Out of Scope:

  • Message compression (adds complexity without clear benefit)
  • Message encryption (application-layer concern)
  • Advanced message routing (simple topic matching sufficient)
  • Persistent message queues (transport pattern sufficient)

Decision IDs

| Decision ID | Decision | Description | Requirement IDs | NFR IDs |
|---|---|---|---|---|
| SD-001 | Claim-Check Pattern | Large payloads uploaded to HTTP server, small payloads sent directly | FR-003, FR-004 | NFR-104, NFR-105 |
| SD-002 | Automatic Transport Selection | Size-based: <0.5MB = direct, ≥0.5MB = link | FR-003, FR-004 | NFR-104, NFR-105 |
| SD-003 | Handler Function Abstraction | Pluggable file server implementations via handler functions | FR-008, FR-009 | NFR-202 |
| SD-004 | Unified Tuple Format | Same (dataname, data, type) format across all platforms | FR-006, FR-007 | - |
| SD-005 | Base64 Encoding | JSON-compatible binary data transport | FR-012 | - |
| SD-006 | Transport Abstraction | Support multiple broker protocols (NATS/MQTT/WebSocket) transparently | FR-013, FR-014 | NFR-201 |
| SD-007 | Exponential Backoff | Retry failed file downloads with exponential backoff | FR-010 | NFR-202 |
| SD-008 | Correlation ID Propagation | Propagate correlation IDs through all message processing steps | FR-011 | NFR-401, NFR-403 |

2. Solution Approach

msghandler implements a Claim-Check pattern with size-based transport selection:

Sender (smartpack)              Transport Layer              Receiver (smartunpack)
┌─────────────────┐             ┌───────────────┐            ┌───────────────────┐
│                 │             │               │            │                   │
│ 1. Data tuples  │────────────>│               │───────────>│ 1. Parse envelope │
│    [(name,      │   JSON      │  Message      │   JSON     │ 2. Check transport│
│     data, type)]│   format    │  Broker       │   format   │ 3. Fetch/Decode   │
│                 │             │  (NATS/MQTT/  │            │ 4. Return tuples  │
└─────────────────┘             │  WebSocket)   │            │                   │
                                │               │            └───────────────────┘
                                └───────────────┘
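The sender/receiver flow above can be sketched in Python, one of the supported platforms. This is a minimal, non-authoritative sketch. Several details are assumptions for illustration, not the wire format or signatures defined in src/msghandler.jl:

- the envelope field names (`correlation_id`, `payloads`, `transport`, `data`, `url`);
- reading 0.5MB as 500,000 bytes;
- the `upload`/`fetch` callables.

For simplicity, every payload is carried as bytes.

```python
import base64
import json
import uuid
from typing import Callable, Dict, List, Tuple

# (dataname, data, type); data is bytes here to keep the sketch small.
Payload = Tuple[str, bytes, str]

# Assumption: "0.5MB" read as 500,000 bytes; the real constant may differ.
THRESHOLD_BYTES = 500_000


def smartpack(payloads: List[Payload], upload: Callable[[str, bytes], str],
              correlation_id: str = "") -> str:
    """Build a JSON envelope; large payloads are uploaded and sent as a URL."""
    entries = []
    for dataname, data, ptype in payloads:
        entry: Dict[str, object] = {"dataname": dataname, "type": ptype, "size": len(data)}
        if len(data) < THRESHOLD_BYTES:
            entry["transport"] = "direct"
            entry["data"] = base64.b64encode(data).decode("ascii")
        else:
            entry["transport"] = "link"
            entry["url"] = upload(dataname, data)  # the claim check
        entries.append(entry)
    cid = correlation_id or str(uuid.uuid4())
    return json.dumps({"correlation_id": cid, "payloads": entries})


def smartunpack(message: str, fetch: Callable[[str], bytes]) -> List[Payload]:
    """Reverse smartpack: parse, check transport, fetch/decode, return tuples."""
    envelope = json.loads(message)                      # 1. parse envelope
    result: List[Payload] = []
    for entry in envelope["payloads"]:
        if entry["transport"] == "direct":              # 2. check transport
            data = base64.b64decode(entry["data"])      # 3a. decode inline data
        else:
            data = fetch(entry["url"])                  # 3b. fetch from file server
        result.append((entry["dataname"], data, entry["type"]))
    return result                                       # 4. return tuples


# Usage: an in-memory dict stands in for the HTTP file server.
store: Dict[str, bytes] = {}


def upload(dataname: str, data: bytes) -> str:
    url = f"http://fileserver.invalid/{len(store)}/{dataname}"
    store[url] = data
    return url


message = smartpack([("msg", b"Hello", "text"), ("avatar", bytes(600_000), "image")], upload)
unpacked = smartunpack(message, store.__getitem__)
```

Only the small envelope crosses the broker. The 600,000-byte avatar travels through the file server, and the receiver still gets the same tuples back.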

Key Design Decisions

| Decision ID | Decision | Rationale | Alternatives Rejected |
|---|---|---|---|
| SD-001 | Claim-Check Pattern | Large payloads (≥0.5MB) uploaded to an HTTP file server; small payloads sent directly via the transport | Client-side compression - adds complexity; Server-side compression - not universally supported |
| SD-002 | Automatic Transport Selection | <0.5MB = direct (fast), ≥0.5MB = link (avoids transport size limits) | Manual selection - error-prone; Fixed threshold - not adaptive |
| SD-003 | Handler Function Abstraction | Allows pluggable file server implementations (Plik, AWS S3, custom) | Hardcoded Plik - not flexible; Interface-based - too complex for this use case |
| SD-004 | Unified Tuple Format | Same input/output format across all platforms | Platform-native formats - no interoperability; Protocol buffers - too heavy |
| SD-005 | Base64 Encoding | JSON-compatible binary data transport (about 33% size overhead) | Raw bytes - not JSON-compatible; Hex encoding - 2x size overhead |
| SD-006 | Transport Abstraction | Support multiple broker protocols (NATS/MQTT/WebSocket) transparently | Platform-specific libraries - no interoperability |
| SD-007 | Exponential Backoff | Retry failed file downloads with exponential backoff | Simple retry - too aggressive; No retry - poor reliability |
| SD-008 | Correlation ID Propagation | Propagate correlation IDs through all message processing steps | Manual correlation - error-prone; No tracing - debugging impractical |

Architecture Components

flowchart TB
    subgraph Client["Client Application"]
        direction TB
        APP["Application Code"]
        API["msghandler API"]
        
        APP -->|Data tuples| API
    end
    
    subgraph Transport["Transport Layer"]
        direction TB
        BROKER["Message Broker<br/>NATS/MQTT/WebSocket"]
        TOPICS["Topic Subscription"]
        
        API -->|Publish JSON envelope| BROKER
        BROKER -->|Deliver| TOPICS
        TOPICS -->|Subscribe| API
    end
    
    subgraph FileServer["File Server"]
        direction TB
        UPLOAD["Upload Handler"]
        DOWNLOAD["Download Handler"]
        
        API -.->|Upload URL| UPLOAD
        DOWNLOAD -.->|Fetch URL| API
    end
    
    style Client fill:#e1f5fe,stroke:#0288d1,stroke-width:2px
    style Transport fill:#ffe0b2,stroke:#f57c00,stroke-width:2px
    style FileServer fill:#c8e6c9,stroke:#43a047,stroke-width:2px

3. Alternatives Considered

| Alternative | Pros | Cons | Decision |
|---|---|---|---|
| gRPC/Protobuf | Strong typing, efficient binary format | No native MicroPython support; Complex schema management | Rejected - not cross-platform enough |
| MessagePack | Compact binary, good performance | Browser support limited; No standard for tabular data | Rejected - no equivalent of Arrow IPC for tabular data |
| Protocol Buffers | Type-safe, efficient | No native support for tabular data exchange | Rejected - cannot represent DataFrames natively |
| REST HTTP Upload | Simple, universal | High latency; No real-time capability | Rejected - not suitable for the message broker pattern |
| Hybrid (direct/link) | Optimal for both small and large payloads | More complex implementation | Accepted - matches user requirements (FR-003, FR-004) |
| Single transport type | Simpler implementation | Cannot handle large payloads efficiently | Rejected - violates FR-003 |
| Platform-specific APIs | Native performance | No interoperability; Maintenance burden | Rejected - violates the cross-platform goal |

4. High-Level Component Diagram

flowchart TD
    subgraph msghandler["msghandler Core Module"]
        direction TB
        
        subgraph Serialization["Serialization Layer"]
            DIR["Direct Transport"]
            LNK["Link Transport"]
            
            DIR -->|Base64| JSON_MSG
            LNK -->|HTTP URL| JSON_MSG
        end
        
        subgraph Envelope["Envelope Builder"]
            HDR["Message Header"]
            PAY["Payload Manager"]
            
            HDR --> PAY
        end
        
        subgraph Handlers["Handler Functions"]
            UPD["Upload Handler"]
            DWN["Download Handler"]
            
            UPD --> LNK
            DWN --> LNK
        end
        
        API["smartpack() / smartunpack()"]
        
        API -->|Input| Serialization
        API -->|Output| Serialization
        API -->|Configure| Handlers
    end
    
    subgraph Transport["Transport Layer"]
        BROKER["NATS / MQTT / WebSocket"]
        API -->|JSON| BROKER
        BROKER -->|JSON| API
    end
    
    subgraph FileServer["File Server"]
        Plik["HTTP Server"]
        UPD -.->|POST| Plik
        Plik -.->|URL| DWN
    end
    
    style msghandler fill:#b3e5fc,stroke:#0288d1,stroke-width:2px
    style Transport fill:#ffe0b2,stroke:#f57c00,stroke-width:2px
    style FileServer fill:#c8e6c9,stroke:#43a047,stroke-width:2px

Component Responsibilities

| Component | Responsibilities | Decision IDs | Requirements Addressed |
|---|---|---|---|
| Serialization Layer | Convert data types to transport format (Base64/URL) | SD-005 | FR-001, FR-002, FR-012 |
| Envelope Builder | Create standardized message envelope with metadata | SD-001, SD-008 | FR-011, FR-013, FR-014 |
| Handler Functions | Abstract file server operations for pluggability | SD-003, SD-007 | FR-008, FR-009, FR-010 |
| Transport Adapter | Support multiple broker protocols transparently | SD-006 | FR-013, FR-014 |
| Payload Manager | Track payload types, sizes, and encoding | SD-004 | FR-006, FR-007 |

5. Decision Rationale

SD-001: Why Claim-Check Pattern?

Requirements: FR-003 (Large file handling), FR-004 (Direct transport for small payloads)
NFRs: NFR-104 (File upload latency <1s), NFR-105 (File download latency <1s)

Rationale:

  • Transport layers have message size limits (for example, the NATS server's default max_payload is 1MB; MQTT brokers typically enforce configurable limits)
  • Direct transport is faster for small payloads (no file server round-trip)
  • Link transport avoids transport limits for large payloads
  • User doesn't need to manually choose - automatic selection based on threshold
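A Python sketch of the claim-check idea itself. The large payload is checked into a store, and only a reference crosses the broker, so the message stays far below the broker limit regardless of payload size. `InMemoryClaimStore`, its content-addressed URLs, and the 1MB limit constant are illustrative assumptions. They stand in for the HTTP file server and the broker configuration.

```python
import hashlib
import json

# Assumed broker limit; the NATS server's default max_payload is 1MB.
BROKER_LIMIT_BYTES = 1024 * 1024


class InMemoryClaimStore:
    """Hypothetical stand-in for the HTTP file server used by link transport."""

    def __init__(self) -> None:
        self._blobs = {}

    def check_in(self, data: bytes) -> str:
        # Content-addressed key: the URL is the "claim check" handed to the receiver.
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return f"http://fileserver.invalid/{key}"

    def check_out(self, url: str) -> bytes:
        return self._blobs[url.rsplit("/", 1)[-1]]


store = InMemoryClaimStore()
big = bytes(5 * 1024 * 1024)  # 5 MiB: too large to send inline
message = json.dumps({"dataname": "video", "type": "video",
                      "transport": "link", "url": store.check_in(big)})
print(len(message.encode()), "bytes on the broker for a", len(big), "byte payload")
```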

SD-003: Why Handler Functions for File Server?

Requirements: FR-008 (Plik integration), FR-009 (Custom file server support)
NFR: NFR-202 (File server availability <5% failure rate)

Rationale:

  • Plik is a common open-source file server
  • Some deployments need AWS S3 or a custom implementation
  • Handler functions provide clean abstraction without vendor lock-in
  • Same signature across all platforms (unified API)
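A minimal Python sketch of the handler-function contract. The signatures here (`upload(dataname, data) -> url`, `download(url) -> bytes`) are hypothetical simplifications, not the actual handler signatures in src/msghandler.jl. A Plik or S3 handler would wrap its own HTTP client behind the same two callables.

```python
from typing import Callable, Dict, Tuple

# Hypothetical contract: upload returns a URL; download returns the bytes at that URL.
UploadHandler = Callable[[str, bytes], str]
DownloadHandler = Callable[[str], bytes]


def make_memory_handlers() -> Tuple[UploadHandler, DownloadHandler]:
    """In-memory pair standing in for Plik, AWS S3, or a custom server."""
    blobs: Dict[str, bytes] = {}

    def upload(dataname: str, data: bytes) -> str:
        url = f"mem://{len(blobs)}/{dataname}"
        blobs[url] = data
        return url

    def download(url: str) -> bytes:
        return blobs[url]

    return upload, download


def link_entry(dataname: str, data: bytes, upload: UploadHandler) -> dict:
    # The library depends only on the handler contract, never on the storage backend.
    return {"dataname": dataname, "transport": "link", "url": upload(dataname, data)}


upload, download = make_memory_handlers()
entry = link_entry("report", b"\x00" * 10, upload)
```

Swapping backends means passing a different pair of callables. Nothing in `link_entry` changes.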

SD-004: Why Tuple Format for Payloads?

Requirements: FR-006 (Multi-payload messages), FR-007 (Payload type preservation)

Rationale:

  • (dataname, data, type) tuple is language-agnostic
  • Simple to understand: name, content, type
  • Supports mixed payload types in single message
  • Easy to serialize/deserialize across platforms
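The tuple format can be illustrated with a short Python sketch. Each payload keeps its type tag on the wire, and the receiver uses that tag to rebuild the original value. Only `text`, `dictionary`, and byte-oriented types are handled, and the per-type encoding rules are assumptions; `arrowtable` and `jsontable` are omitted.

```python
import base64
import json
from typing import Any, List, Tuple

Payload = Tuple[str, Any, str]  # (dataname, data, type)
INLINE_TYPES = ("text", "dictionary")  # assumed JSON-native; other types carry bytes


def encode_payload(payload: Payload) -> dict:
    dataname, data, ptype = payload
    # Byte-oriented types (image, audio, video, binary) go through Base64.
    body = data if ptype in INLINE_TYPES else base64.b64encode(data).decode("ascii")
    return {"dataname": dataname, "type": ptype, "data": body}


def decode_payload(entry: dict) -> Payload:
    ptype, body = entry["type"], entry["data"]
    # The type tag tells the receiver how to restore the original value.
    data = body if ptype in INLINE_TYPES else base64.b64decode(body)
    return (entry["dataname"], data, ptype)


payloads: List[Payload] = [
    ("greeting", "Hello", "text"),
    ("config", {"retries": 5}, "dictionary"),
    ("avatar", b"\x89PNG\r\n", "image"),
]
wire = json.dumps([encode_payload(p) for p in payloads])
restored = [decode_payload(e) for e in json.loads(wire)]
```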

SD-005: Why Base64 Encoding?

Requirements: FR-012 (Message serialization), FR-001 (Cross-platform text messaging)

Rationale:

  • JSON is universal - works on all platforms
  • Base64 converts binary to ASCII for JSON compatibility
  • Standard format (RFC 4648) with built-in support on most target platforms
  • No additional dependencies on most platforms (Rust typically uses the base64 crate)
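A quick Python check of the Base64 trade-off. The encoded form is plain ASCII that survives a JSON round trip. Its length is 4 * ceil(n / 3) characters for n input bytes, roughly 33% overhead versus 100% for hex. The 1,000-byte random payload is only an example.

```python
import base64
import json
import os

raw = os.urandom(1000)
encoded = base64.b64encode(raw).decode("ascii")  # ASCII only, safe inside a JSON string
message = json.dumps({"dataname": "blob", "type": "binary", "data": encoded})
decoded = base64.b64decode(json.loads(message)["data"])


def b64_len(n: int) -> int:
    """Length of padded Base64 output for n input bytes: 4 * ceil(n / 3)."""
    return 4 * ((n + 2) // 3)


print(len(raw), len(encoded), len(raw.hex()))  # 1000 1336 2000
```

By the same formula, a 500,000-byte payload at the 0.5MB threshold encodes to 666,668 characters.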

SD-002: Why Automatic Transport Selection?

Requirements: FR-003 (Large file handling), FR-004 (Direct transport for small payloads)
NFRs: NFR-104 (File upload latency <1s), NFR-105 (File download latency <1s)

Rationale:

  • <0.5MB payloads use direct transport (<1s latency, FR-004 KPI)
  • ≥0.5MB payloads use link transport to avoid transport limits (FR-003 KPI: 99% successful uploads)
  • User doesn't need to manually choose - automatic selection based on threshold
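The selection rule reduces to a single comparison. This Python sketch makes two assumptions:

- 0.5MB means 500,000 bytes (the implementation might use 524,288 instead);
- the size is measured on the raw payload before Base64 encoding.

```python
# Assumption: "0.5MB" taken as 500,000 bytes, measured on the raw (pre-Base64) payload.
THRESHOLD_BYTES = 500_000


def select_transport(size_bytes: int, threshold: int = THRESHOLD_BYTES) -> str:
    """Return "direct" below the threshold and "link" at or above it."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return "direct" if size_bytes < threshold else "link"


for size in (1_024, 499_999, 500_000, 2_000_000):
    print(size, select_transport(size))
```

The boundary value of 500,000 bytes goes to link transport, matching the ≥0.5MB rule.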

SD-006: Why Transport Abstraction?

Requirements: FR-013 (Transport publishing), FR-014 (Transport subscription)
NFR: NFR-201 (Message delivery at-least-once)

Rationale:

  • Support multiple broker protocols (NATS, MQTT, WebSocket) transparently
  • Caller handles actual transport publishing/subscription
  • Unified API across all platforms
  • At-least-once delivery depends on the transport configuration (for example, MQTT QoS 1 or NATS JetStream; core NATS publish/subscribe is at-most-once)
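Because the envelope is plain JSON text, the transport can stay behind a small publish/subscribe interface supplied by the caller. In this Python sketch, `Transport` and `InMemoryBroker` are hypothetical. A real adapter would wrap an actual NATS, MQTT, or WebSocket client.

```python
import json
from collections import defaultdict
from typing import Callable, Dict, List, Protocol


class Transport(Protocol):
    def publish(self, topic: str, message: str) -> None: ...
    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None: ...


class InMemoryBroker:
    """Stand-in for a NATS, MQTT, or WebSocket client wrapper (hypothetical)."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def publish(self, topic: str, message: str) -> None:
        for callback in self._subs[topic]:
            callback(message)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        self._subs[topic].append(callback)


def send(transport: Transport, topic: str, envelope: dict) -> None:
    # The envelope is plain JSON text, so any broker client can carry it.
    transport.publish(topic, json.dumps(envelope))


received: List[dict] = []
broker = InMemoryBroker()
broker.subscribe("sensors.temp", lambda msg: received.append(json.loads(msg)))
send(broker, "sensors.temp", {"correlation_id": "abc", "payloads": []})
```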

SD-007: Why Exponential Backoff?

Requirement: FR-010 (Exponential backoff retry)
NFR: NFR-202 (File server availability <5% failure rate)

Rationale:

  • File server may be temporarily unavailable
  • Exponential backoff prevents overwhelming server during outages
  • Default: 5 retries, 100ms base delay, 5000ms max delay
  • 95% successful downloads within retry limit (FR-010 KPI)
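A Python sketch of the retry policy using the stated defaults (5 retries, 100ms base delay, 5000ms cap). The delay before retry n is min(100 * 2^n, 5000) ms. Three details are assumptions:

- "5 retries" means one initial attempt plus up to five retries;
- only `OSError` triggers a retry;
- there is no jitter.

`fetch` is any zero-argument download callable.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int = 5, base_ms: int = 100, max_ms: int = 5000) -> List[int]:
    """Delay (ms) before each retry: base_ms * 2**n, capped at max_ms."""
    return [min(base_ms * 2 ** n, max_ms) for n in range(max_retries)]


def download_with_retry(fetch: Callable[[], T], max_retries: int = 5,
                        base_ms: int = 100, max_ms: int = 5000,
                        sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch(); on OSError wait and retry, up to max_retries extra attempts."""
    delays = backoff_delays(max_retries, base_ms, max_ms)
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except OSError:  # connection errors, timeouts, urllib's URLError
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delays[attempt] / 1000)
    raise RuntimeError("unreachable")


print(backoff_delays())  # [100, 200, 400, 800, 1600]
```

With the defaults, the 5000ms cap is never reached (the fifth delay is 1600ms). It only takes effect if the retry count is raised.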

SD-008: Why Correlation ID Propagation?

Requirement: FR-011 (Correlation ID propagation)
NFRs: NFR-401 (Required logs), NFR-403 (Tracing)

Rationale:

  • Trace messages across distributed systems
  • Correlation ID logged with every message (NFR-401)
  • Propagated through all message processing steps (NFR-403)
  • Enables debugging and performance analysis in production
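A minimal Python sketch of correlation ID propagation:

- a new trace starts only when no upstream ID exists;
- every envelope gets its own `msg_id`;
- replies inherit the request's `correlation_id`.

The envelope layout and the `new_envelope`/`make_reply` helpers are hypothetical. `msg_id` and the correlation ID are taken from the required log fields (NFR-401).

```python
import logging
import uuid
from typing import Optional

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("msghandler.sketch")


def new_envelope(payloads: list, correlation_id: Optional[str] = None) -> dict:
    # Start a new trace only when no upstream correlation ID exists.
    cid = correlation_id or str(uuid.uuid4())
    env = {"correlation_id": cid, "msg_id": str(uuid.uuid4()), "payloads": payloads}
    log.info("correlation_id=%s msg_id=%s packed", cid, env["msg_id"])
    return env


def make_reply(request: dict, payloads: list) -> dict:
    # The reply inherits the request's correlation ID, linking both hops in the logs.
    return new_envelope(payloads, correlation_id=request["correlation_id"])


request = new_envelope([("query", "status?", "text")])
reply = make_reply(request, [("answer", "ok", "text")])
```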

6. Risk Assessment

| Risk | Impact | Probability | Mitigation | Requirement IDs | NFR IDs |
|---|---|---|---|---|---|
| Performance degradation with ≥0.5MB payloads | High | Medium | Size threshold detection; Link transport fallback | FR-003, FR-004 | NFR-104, NFR-105 |
| File server availability issues | Medium | Low | Exponential backoff retry; Graceful degradation | FR-010 | NFR-202 |
| Platform-specific bugs | Medium | Low | Comprehensive test suite per platform; CI validation | FR-001, FR-002, FR-006, FR-007 | - |
| Encoding mismatches between platforms | High | Low | Strict specification; Test contracts; Validation rules | FR-012 | NFR-301 |
| Transport layer incompatibility | Medium | Low | Transport-agnostic design; Handler abstraction | FR-013, FR-014 | NFR-201 |
| Correlation ID loss in processing | Medium | Low | Centralized trace context management | FR-011 | NFR-401, NFR-403 |

7. Requirements Traceability

| Solution Component | Decision ID | Requirement ID | Description |
|---|---|---|---|
| smartpack() function | SD-001, SD-002, SD-003, SD-004, SD-005, SD-006, SD-008 | FR-001, FR-002, FR-003, FR-004, FR-005, FR-006, FR-007, FR-008, FR-009, FR-011, FR-012, FR-013, FR-014 | Unified API for sending messages across all platforms |
| smartunpack() function | SD-001, SD-002, SD-003, SD-004, SD-005, SD-006, SD-007, SD-008 | FR-001, FR-002, FR-003, FR-004, FR-005, FR-006, FR-007, FR-008, FR-009, FR-010, FR-011, FR-012, FR-013, FR-014 | Unified API for receiving messages across all platforms |
| Direct transport | SD-002 | FR-004 | Send payloads below the threshold directly via the transport |
| Link transport | SD-001, SD-002 | FR-003 | Upload payloads at or above the threshold to the file server |
| File server handler | SD-003, SD-007 | FR-008, FR-009, FR-010 | Pluggable upload/download handlers with retry logic |
| Payload type preservation | SD-004 | FR-006, FR-007 | Support text, dictionary, arrowtable, jsontable, image, audio, video, binary |
| Correlation ID propagation | SD-008 | FR-011 | Message tracing across distributed systems |
| Multi-payload support | SD-004 | FR-006, FR-007 | List of (dataname, data, type) tuples |

Non-Functional Requirements Traceability

| Solution Component | Decision ID | NFR ID | Description |
|---|---|---|---|
| Serialization optimization | SD-005 | NFR-101, NFR-102 | <50ms overhead for 10KB payloads |
| Transport efficiency | SD-006 | NFR-103 | <100ms connection establishment |
| File server latency | SD-001, SD-002 | NFR-104, NFR-105 | <1s upload/download for 0.5MB files |
| Concurrent connections | SD-006 | NFR-106 | Support 100+ simultaneous connections |
| Message throughput | SD-005, SD-006 | NFR-107 | Handle 1000+ messages/second per instance |
| At-least-once delivery | SD-006 | NFR-201 | Transport layer semantics |
| Graceful degradation | SD-003, SD-007 | NFR-202 | File server unavailability handling |
| Auto-reconnect | SD-006 | NFR-203 | Transport connection failure recovery |
| Payload integrity | SD-005 | NFR-301 | 100% SHA-256 checksum validation |
| Transport security | SD-006 | NFR-302 | 100% TLS connections in production |
| File server security | SD-003 | NFR-303 | 100% authenticated file uploads |
| Required logs | SD-001, SD-008 | NFR-401 | Correlation ID, msg_id, timestamp, etc. |
| Critical metrics | SD-001, SD-005 | NFR-402 | messages_sent_total, file upload/download duration |
| Tracing | SD-001, SD-008 | NFR-403 | Correlation ID propagation |
| Alerting | SD-007 | NFR-404 | <5min alert latency for download_retry_exceeded |
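NFR-301 calls for SHA-256 checksum validation. One way to sketch it in Python: the sender records the hex digest of the raw payload bytes in the envelope. The receiver then recomputes and compares it after decoding or downloading. The `sha256` field name and both helpers are assumptions.

```python
import hashlib


def with_checksum(entry: dict, data: bytes) -> dict:
    """Return a copy of entry with the SHA-256 hex digest of data attached."""
    return {**entry, "sha256": hashlib.sha256(data).hexdigest()}


def verify_checksum(entry: dict, data: bytes) -> bytes:
    """Return data unchanged if its digest matches; raise ValueError otherwise."""
    if hashlib.sha256(data).hexdigest() != entry["sha256"]:
        raise ValueError(f"checksum mismatch for payload {entry.get('dataname')!r}")
    return data


entry = with_checksum({"dataname": "avatar", "type": "image"}, b"abc")
print(entry["sha256"])
```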

8. Gap-Check Validation

| Stage Transition | Gap-Check Question | Status |
|---|---|---|
| Requirements → Solution Design | Does the Solution Design clearly explain how the system solves the user problem, not just what it does? | Verified - All user problems mapped to solution components with requirement ID and decision ID references |
| Solution Design → Specification | Does the Specification define all technical details that the solution approach requires? | Pending - Specification needs review for completeness |
| Solution Design → Walkthrough | Does the Walkthrough reflect the complete flow including error states and timing? | Pending - Walkthrough needs validation against design |

Solution Design Validation

User Problems (from requirements.md):

  • P-001: Cross-platform data serialization (FR-001, FR-002)
  • P-002: Large payload handling (FR-003)
  • P-003: Transport abstraction (FR-013, FR-014)
  • P-004: Request-response patterns (FR-011)
  • P-005: File server reliability (FR-010)
  • P-006: Payload type preservation (FR-006, FR-007)

Solution Components:

  1. SD-001 - Claim-Check pattern - Large payloads uploaded to the file server, small payloads sent directly, behind the unified smartpack() / smartunpack() API
  2. SD-002 - Automatic transport selection - Direct or link transport chosen by size threshold
  3. SD-003 - Handler function abstraction - Plik/AWS S3/custom file server support
  4. SD-004 - Tuple format - (dataname, data, type) - platform-agnostic
  5. SD-005 - Base64 encoding - JSON-compatible binary data transport
  6. SD-006 - Transport abstraction - Support multiple broker protocols transparently
  7. SD-007 - Exponential backoff - Reliable file downloads with retry logic
  8. SD-008 - Correlation ID propagation - Message tracing across distributed systems

Requirement Mapping:

  • Functional Requirements: FR-001 through FR-014
  • Non-Functional Requirements: NFR-101 through NFR-405

Gap Check: Does this solution explain how users will actually use the system?

Answer: Yes - the walkthrough provides concrete examples:

  1. JavaScript sends [("msg", "Hello", "text"), ("avatar", binary_data, "image")]
  2. smartpack() automatically selects transport based on size (SD-002)
  3. Large file (≥0.5MB) → link transport → file server upload (SD-001)
  4. Small payload (<0.5MB) → direct transport → base64 encoding (SD-005)
  5. Receiver calls smartunpack() → receives same tuple format with preserved types

NFR Traceability:

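  • Performance: NFR-101 (serialization <50ms), NFR-102 (deserialization <50ms), NFR-103 (connection <100ms), NFR-104/NFR-105 (upload/download <1s for 0.5MB files), NFR-106 (100+ concurrent connections), NFR-107 (1000+ messages/second)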
  • Reliability: NFR-201 (at-least-once delivery), NFR-202 (file server <5% failure), NFR-203 (auto-reconnect <30s)
  • Security: NFR-301 (SHA-256 checksum), NFR-302 (TLS 100%), NFR-303 (authenticated uploads)
  • Observability: NFR-401 (required logs), NFR-402 (metrics), NFR-403 (tracing), NFR-404 (alerting <5min)

This solution design document is versioned and maintained in git alongside the codebase. All implementations must adhere to this design.

Traceability Summary:

  • All requirements traced to solution components with SD-XXX decision IDs
  • Each decision ID references the corresponding requirement IDs (FR-XXX, NFR-XXX)
  • Specification must cite SD-XXX references for each technical detail