SHM Specification

Introduction

This document specifies the shared memory (SHM) hub transport binding for roam. The hub topology supports one host and multiple guests (1:N), designed for plugin systems where a host application loads guest plugins that communicate via shared memory.

shm.scope

This binding encodes Core Semantics over shared memory. It does NOT redefine the meaning of calls, channels, errors, or flow control — only their representation in this transport.

shm.architecture

This binding assumes:

  • All processes sharing the segment run on the same architecture (same endianness, same word size, same atomic semantics)
  • Cross-process atomics are valid (typically true on modern OSes)
  • The shared memory region is cache-coherent

Cross-architecture SHM is not supported.

Topology

Hub (1:N)

shm.topology.hub

The hub topology has exactly one host and zero or more guests. The host creates and owns the shared memory segment. Guests attach to communicate with the host.

         ┌─────────┐
         │  Host   │
         └────┬────┘
              │
    ┌─────────┼─────────┐
    │         │         │
┌───┴───┐ ┌───┴───┐ ┌───┴───┐
│Guest 1│ │Guest 2│ │Guest 3│
└───────┘ └───────┘ └───────┘
shm.topology.hub.communication

Guests communicate only with the host, not with each other. Each guest has its own rings and slot pool within the shared segment.

shm.topology.hub.calls

Either the host or a guest can initiate calls. The host can call methods on any guest; a guest can call methods on the host.

Peer Identification

shm.topology.peer-id

A guest's peer_id (u8) is 1 + the index of its entry in the peer table. Peer table entry 0 corresponds to peer_id = 1, entry 1 to peer_id = 2, etc. The host does not have a peer_id (it is not in the peer table).

shm.topology.max-guests

The maximum number of guests is limited to 255 (peer IDs 1-255). The max_guests field in the segment header MUST be ≤ 255. The peer table has exactly max_guests entries.

ID Widths

Core defines request_id and channel_id as u64. SHM uses narrower encodings to fit in the 64-byte descriptor:

shm.id.request-id

SHM encodes request_id as u32. The upper 32 bits of Core's u64 request_id are implicitly zero. Implementations MUST NOT use request IDs ≥ 2^32.

shm.id.channel-id

SHM encodes channel_id as u32. The upper 32 bits of Core's u64 channel_id are implicitly zero.

Channel ID Allocation

shm.id.channel-scope

Channel IDs are scoped to the guest-host pair. Two different guests may independently use the same channel_id value without collision because they have separate channel tables.

shm.id.channel-parity

Within a guest-host pair, channel IDs use odd/even parity to prevent collisions:

  • The host allocates even channel IDs (2, 4, 6, ...)
  • The guest allocates odd channel IDs (1, 3, 5, ...)

Channel ID 0 is reserved and MUST NOT be used.
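
For illustration (non-normative), a minimal sketch of a parity-respecting allocator. ChannelIdAllocator is a hypothetical local helper kept by each side; the max_channels bound and ID reuse (see Flow Control) are omitted here.

// Hypothetical per-side allocator; not part of the shared segment.
struct ChannelIdAllocator {
    next: u32,
}

impl ChannelIdAllocator {
    fn for_host() -> Self { Self { next: 2 } }   // even IDs
    fn for_guest() -> Self { Self { next: 1 } }  // odd IDs

    fn allocate(&mut self) -> u32 {
        let id = self.next;
        self.next = self.next.wrapping_add(2);   // stays on the same parity
        if self.next == 0 {
            self.next = 2;                       // 0 is reserved; even sequence restarts at 2
        }
        id
    }
}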

Request ID Scope

shm.id.request-scope

Request IDs are scoped to the guest-host pair. Two different guests may use the same request_id value without collision because their rings are separate.

Handshake

Core Semantics require a Hello exchange to negotiate connection parameters. SHM replaces this with the segment header:

shm.handshake

SHM does not use Hello messages. Instead, the segment header fields (max_payload_size, initial_credit, max_channels) serve as the host's unilateral configuration. Guests accept these values by attaching to the segment.

shm.handshake.no-negotiation

Unlike networked transports, SHM has no negotiation — the host's values are authoritative. A guest that cannot operate within these limits MUST NOT attach.

Segment Layout

The host creates a shared memory segment containing all communication state for all guests.

Segment Header

shm.segment.header

The segment MUST begin with a header:

Offset  Size   Field                Description
──────  ────   ─────                ───────────
0       8      magic                Magic bytes: "RAPAHUB\x01"
8       4      version              Segment format version (1)
12      4      header_size          Size of this header
16      8      total_size           Total segment size in bytes
24      4      max_payload_size     Maximum payload per message
28      4      initial_credit       Initial channel credit (bytes)
32      4      max_guests           Maximum number of guests (≤ 255)
36      4      ring_size            Descriptor ring capacity (power of 2)
40      8      peer_table_offset    Offset to peer table
48      8      slot_region_offset   Offset to payload slot region
56      4      slot_size            Size of each payload slot
60      4      slots_per_guest      Number of slots per guest
64      4      max_channels         Max concurrent channels per guest
68      4      host_goodbye         Host goodbye flag (0 = active)
72      8      heartbeat_interval   Heartbeat interval in nanoseconds (0 = disabled)
80      48     reserved             Reserved for future use (zero)
shm.segment.header-size

The segment header is 128 bytes.

shm.segment.magic

The magic field MUST be exactly RAPAHUB\x01 (8 bytes).

Peer Table

shm.segment.peer-table

The peer table contains one entry per potential guest:

#[repr(C)]
struct PeerEntry {
    state: AtomicU32,           // 0=Empty, 1=Attached, 2=Goodbye, 3=Reserved
    epoch: AtomicU32,           // Incremented on attach
    guest_to_host_head: AtomicU32,  // Ring head (guest writes)
    guest_to_host_tail: AtomicU32,  // Ring tail (host reads)
    host_to_guest_head: AtomicU32,  // Ring head (host writes)
    host_to_guest_tail: AtomicU32,  // Ring tail (guest reads)
    last_heartbeat: AtomicU64,  // Monotonic tick count (see r[shm.crash.heartbeat-clock])
    ring_offset: u64,           // Offset to this guest's descriptor rings
    slot_pool_offset: u64,      // Offset to this guest's slot pool
    channel_table_offset: u64,  // Offset to this guest's channel table
    reserved: [u8; 8],          // Reserved (zero)
}
// Total: 64 bytes per entry
shm.segment.peer-state

Peer states:

  • Empty (0): Slot available for a new guest
  • Attached (1): Guest is active
  • Goodbye (2): Guest is shutting down or has crashed
  • Reserved (3): Host has allocated slot, guest not yet attached (see r[shm.spawn.reserved-state])

Per-Guest Rings

shm.segment.guest-rings

Each guest has two descriptor rings:

  • Guest→Host ring: Guest produces, host consumes
  • Host→Guest ring: Host produces, guest consumes

Each ring is an array of ring_size descriptors. Head/tail indices are stored in the peer table entry.

shm.ring.layout

At the offset specified by PeerEntry.ring_offset:

  1. Guest→Host ring: ring_size * 64 bytes (ring_size descriptors)
  2. Host→Guest ring: ring_size * 64 bytes (ring_size descriptors)

Both rings are contiguous. Total size per guest: 2 * ring_size * 64 bytes.

shm.ring.alignment

Each ring MUST be aligned to 64 bytes (cache line). Since descriptors are 64 bytes and rings are contiguous, this is naturally satisfied if ring_offset is 64-byte aligned.

shm.ring.initialization

On segment creation, all ring memory MUST be zeroed. On guest attach, the guest MUST NOT assume ring contents are valid — it should wait for head != tail before reading.

shm.ring.capacity

A ring can hold at most ring_size - 1 descriptors. The ring is full when (head + 1) % ring_size == tail. The ring is empty when head == tail.
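
Expressed as code (a sketch; head and tail are the u32 indices stored in the peer table entry, kept in the range 0..ring_size):

// Ring occupancy predicates (sketch). ring_size comes from the segment header.
fn ring_is_empty(head: u32, tail: u32) -> bool {
    head == tail
}

fn ring_is_full(head: u32, tail: u32, ring_size: u32) -> bool {
    (head + 1) % ring_size == tail
}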

shm.ring.full

If the ring is full, the producer MUST wait before enqueueing. Implementations SHOULD use futex on the tail index to avoid busy-wait. Ring fullness is not a protocol error — it indicates backpressure from a slow consumer.

Slot Pools

shm.segment.slot-pools

Each guest has a dedicated pool of slots_per_guest payload slots. Slots are used for payloads that exceed inline capacity.

shm.segment.slot-ownership

Slots from a guest's pool are used for messages sent by that guest. After the host processes a message, the slot is returned to the guest's pool.

shm.segment.pool-size

Each slot pool (host or guest) has the same size: pool_size = slot_pool_header_size + slots_per_guest * slot_size where slot_pool_header_size is the bitmap header rounded up to 64 bytes (see r[shm.slot.pool-header-size]).

shm.segment.host-slots

The host has its own slot pool for messages it sends to guests. The host slot pool is located at offset slot_region_offset in the segment (position 0), before the per-guest slot pools.

shm.segment.guest-slot-offset

A guest with peer_id = P (where P ≥ 1) has its slot pool at: slot_region_offset + P * pool_size
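
A sketch of the offset arithmetic implied by the two rules above, using the segment header fields and treating the host pool as position 0 and guest P as position P:

// Slot pool sizing and placement (sketch). peer 0 = host pool, peer P = guest P.
fn slot_pool_header_size(slots_per_guest: u32) -> u64 {
    let bitmap_bytes = ((slots_per_guest as u64 + 63) / 64) * 8; // one bit per slot
    (bitmap_bytes + 63) / 64 * 64                                // round up to 64 bytes
}

fn pool_size(slots_per_guest: u32, slot_size: u32) -> u64 {
    slot_pool_header_size(slots_per_guest) + slots_per_guest as u64 * slot_size as u64
}

fn pool_offset(slot_region_offset: u64, peer: u8, slots_per_guest: u32, slot_size: u32) -> u64 {
    slot_region_offset + peer as u64 * pool_size(slots_per_guest, slot_size)
}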

Message Encoding

All abstract messages from Core are encoded as 64-byte descriptors.

MsgDesc (64 bytes)

shm.desc.size

Message descriptors MUST be exactly 64 bytes (one cache line).

#[repr(C, align(64))]
pub struct MsgDesc {
    // Identity (16 bytes)
    pub msg_type: u8,             // Message type
    pub flags: u8,                // Message flags (reserved, must be 0)
    pub _reserved: [u8; 2],       // Reserved (must be zero)
    pub id: u32,                  // request_id or channel_id
    pub method_id: u64,           // Method ID (for Request only, else 0)

    // Payload location (16 bytes)
    pub payload_slot: u32,        // Slot index (0xFFFFFFFF = inline)
    pub payload_generation: u32,  // ABA counter (0 for inline payloads)
    pub payload_offset: u32,      // Offset in payload area (0 for inline)
    pub payload_len: u32,         // Payload length in bytes

    // Inline payload (32 bytes)
    pub inline_payload: [u8; 32], // Used when payload_slot == 0xFFFFFFFF
}
shm.desc.flags

The flags field is reserved for future use and MUST be zero. Receivers SHOULD ignore this field (do not reject non-zero values) to allow forward compatibility.

shm.desc.inline-fields

For inline payloads (payload_slot == 0xFFFFFFFF):

  • payload_generation MUST be 0
  • payload_offset MUST be 0
  • payload_len indicates bytes used in inline_payload

Metadata Encoding

The abstract Message type (see [CORE-SPEC]) has separate metadata and payload fields. SHM's 64-byte descriptor cannot carry both separately, so they are combined:

shm.metadata.in-payload

For Request and Response messages, the descriptor's payload contains both metadata and arguments/result, encoded as a single [POSTCARD] value:

struct RequestPayload<T> {
    metadata: Vec<(String, MetadataValue)>,
    arguments: T,  // method arguments tuple
}

struct ResponsePayload<T, E> {
    metadata: Vec<(String, MetadataValue)>,
    result: Result<T, RoamError<E>>,
}

This differs from other transports where metadata and payload are separate fields in the Message enum.

shm.metadata.limits

The limits from r[call.metadata.limits] apply: at most 128 keys, each value at most 16 KB. Violations are connection errors.

Message Types

shm.desc.msg-type

The msg_type field identifies the abstract message:

Value  Message   id Field Contains
─────  ───────   ─────────────────
1      Request   request_id
2      Response  request_id
3      Cancel    request_id
4      Data      channel_id
5      Close     channel_id
6      Reset     channel_id
7      Goodbye   (unused)

Note: There is no Credit message type. Credit is conveyed via shared counters (see Flow Control).

Payload Encoding

shm.payload.encoding

Payloads MUST be [POSTCARD]-encoded.

shm.payload.inline

If payload_len <= 32, the payload MUST be stored inline and payload_slot MUST be 0xFFFFFFFF.

shm.payload.slot

If payload_len > 32, the payload MUST be stored in a slot from the sender's pool.
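
A sketch of the sender-side placement decision, assuming the payload has already been [POSTCARD]-encoded. The alloc_slot closure is a hypothetical stand-in for the sender's slot allocation (it returns the slot index, the new generation, and a pointer to the slot's payload area per the slot rules below):

// Filling the payload fields of a MsgDesc (sketch).
const INLINE_MAX: usize = 32;
const INLINE_SENTINEL: u32 = 0xFFFF_FFFF;

fn place_payload(
    desc: &mut MsgDesc,
    encoded: &[u8],
    alloc_slot: impl FnOnce(usize) -> (u32, u32, *mut u8),
) {
    desc.payload_len = encoded.len() as u32;
    if encoded.len() <= INLINE_MAX {
        desc.payload_slot = INLINE_SENTINEL;
        desc.payload_generation = 0;
        desc.payload_offset = 0;
        desc.inline_payload[..encoded.len()].copy_from_slice(encoded);
    } else {
        let (slot, generation, payload_area) = alloc_slot(encoded.len());
        unsafe { payload_area.copy_from_nonoverlapping(encoded.as_ptr(), encoded.len()) };
        desc.payload_slot = slot;
        desc.payload_generation = generation;
        desc.payload_offset = 0; // payload begins at byte 4 of the slot
    }
}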

Slot Pool Structure

shm.slot.pool-layout

A slot pool is an array of slots, each slot_size bytes. Before the slots is a slot header:

#[repr(C)]
struct SlotPoolHeader {
    free_bitmap: [AtomicU64; N],  // 1 bit per slot, 1 = free
}

The bitmap size N = ceil(slots_per_guest / 64). Slots are numbered 0 to slots_per_guest - 1. Bit i % 64 of word i / 64 represents slot i.

shm.slot.pool-header-size

The slot pool header is padded to a multiple of 64 bytes for alignment. Slot 0 begins immediately after the header.

Slot Lifecycle

shm.slot.allocate

To allocate a slot (see the sketch after these steps):

  1. Scan the free_bitmap for a set bit (any strategy: linear, random)
  2. Atomically clear the bit (CAS from 1 to 0)
  3. If CAS fails, retry with another slot
  4. Increment the slot's generation counter
  5. Write payload to the slot
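
A sketch of steps 1-3, assuming the pool's free bitmap is visible as a slice of AtomicU64 words. The generation bump and payload write (steps 4-5) happen after this returns.

use std::sync::atomic::{AtomicU64, Ordering};

// Linear scan + CAS over the free bitmap (sketch). Returns a slot index, or
// None if the pool currently looks exhausted (caller waits per r[shm.slot.exhaustion]).
fn try_allocate_slot(bitmap: &[AtomicU64], slots_per_guest: u32) -> Option<u32> {
    for (word_idx, word) in bitmap.iter().enumerate() {
        let mut current = word.load(Ordering::Relaxed);
        while current != 0 {
            let bit = current.trailing_zeros();        // lowest set bit = lowest free slot
            let slot = word_idx as u32 * 64 + bit;
            if slot >= slots_per_guest {
                break;                                 // padding bits beyond the last slot
            }
            let desired = current & !(1u64 << bit);    // clear the bit (mark allocated)
            match word.compare_exchange_weak(current, desired, Ordering::AcqRel, Ordering::Relaxed) {
                Ok(_) => return Some(slot),
                Err(observed) => current = observed,   // lost a race; retry this word
            }
        }
    }
    None
}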
shm.slot.free

To free a slot:

  1. Set the corresponding bit in free_bitmap (atomic OR)

The receiver frees slots after processing the message. This returns the slot to the sender's pool.

shm.slot.generation

Each slot's first 4 bytes are an AtomicU32 generation counter, incremented on allocation. The usable payload area is slot_size - 4 bytes starting at byte 4 of the slot. The receiver verifies payload_generation matches to detect ABA issues.

shm.slot.payload-offset

The payload_offset field in MsgDesc is relative to the payload area (after the generation counter), not the slot start. A payload_offset of 0 means the payload begins at byte 4 of the slot.
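
A receiver-side sketch combining the two rules above: verify the generation, then read payload_len bytes starting at byte 4 plus payload_offset. The raw-pointer access is illustrative only.

use std::sync::atomic::{AtomicU32, Ordering};

// Receiver-side slot read (sketch). slot_base points at the start of the slot;
// bytes 0..4 hold the generation counter, the payload area starts at byte 4.
unsafe fn read_slot_payload(slot_base: *const u8, desc: &MsgDesc) -> Option<Vec<u8>> {
    let generation = &*(slot_base as *const AtomicU32);
    if generation.load(Ordering::Acquire) != desc.payload_generation {
        return None; // stale descriptor (ABA detected); treat as a protocol error
    }
    let start = slot_base.add(4 + desc.payload_offset as usize);
    Some(std::slice::from_raw_parts(start, desc.payload_len as usize).to_vec())
}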

shm.slot.exhaustion

If no free slots are available, the sender MUST wait. Use futex on a bitmap word or poll with backoff. Slot exhaustion is not a protocol error — it indicates backpressure.

Ordering and Synchronization

Memory Ordering

shm.ordering.ring-publish

When enqueueing a descriptor:

  1. Write descriptor and payload with Release ordering
  2. Increment ring head with Release ordering
shm.ordering.ring-consume

When dequeueing a descriptor:

  1. Load head with Acquire ordering
  2. If head != tail, load descriptor with Acquire ordering
  3. Process message
  4. Increment tail with Release ordering
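
A combined sketch of both sides, assuming direct access to one ring's descriptor array and its head/tail atomics. Waiting and wakeup are handled elsewhere (see Wakeup Mechanism).

use std::sync::atomic::{AtomicU32, Ordering};

// Producer (sketch): write the descriptor, then publish it by advancing head.
unsafe fn enqueue(
    ring: *mut MsgDesc,
    ring_size: u32,
    head: &AtomicU32,
    tail: &AtomicU32,
    desc: MsgDesc,
) -> bool {
    let h = head.load(Ordering::Relaxed);
    if (h + 1) % ring_size == tail.load(Ordering::Acquire) {
        return false;                                   // full: wait per r[shm.ring.full]
    }
    ring.add(h as usize).write_volatile(desc);          // descriptor written before publish
    head.store((h + 1) % ring_size, Ordering::Release); // publish
    true
}

// Consumer (sketch): observe head, copy the descriptor out, then release the entry.
unsafe fn dequeue(
    ring: *const MsgDesc,
    ring_size: u32,
    head: &AtomicU32,
    tail: &AtomicU32,
) -> Option<MsgDesc> {
    let t = tail.load(Ordering::Relaxed);
    if head.load(Ordering::Acquire) == t {
        return None;                                    // empty
    }
    let desc = ring.add(t as usize).read_volatile();
    tail.store((t + 1) % ring_size, Ordering::Release); // entry may now be reused
    Some(desc)
}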

Wakeup Mechanism

On Linux, use futex for efficient waiting. Each wait site has a corresponding wake site:

shm.wakeup.consumer-wait

Consumer waiting for messages (ring empty):

  • Wait: futex_wait on ring head when head == tail
  • Wake: Producer calls futex_wake on head after incrementing it
shm.wakeup.producer-wait

Producer waiting for space (ring full):

  • Wait: futex_wait on ring tail when (head + 1) % ring_size == tail
  • Wake: Consumer calls futex_wake on tail after incrementing it
shm.wakeup.credit-wait

Sender waiting for credit (zero remaining):

  • Wait: futex_wait on ChannelEntry.granted_total
  • Wake: Receiver calls futex_wake on granted_total after updating
shm.wakeup.slot-wait

Sender waiting for slots (pool exhausted):

  • Wait: futex_wait on a bitmap word (implementation-defined which)
  • Wake: Receiver calls futex_wake on that word after freeing a slot
shm.wakeup.fallback

On non-Linux platforms, use polling with exponential backoff or platform-specific primitives (e.g., WaitOnAddress on Windows).
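
For the Linux wait/wake sites above, a minimal sketch of shared (non-private) futex calls, assuming the libc crate. Error handling, timeouts, and spurious-wakeup loops are omitted.

use std::sync::atomic::AtomicU32;

// Shared futex wait/wake (Linux sketch). The futex word must live inside the
// MAP_SHARED segment, e.g. a ring head or a granted_total counter.
#[cfg(target_os = "linux")]
fn futex_wait(word: &AtomicU32, expected: u32) {
    unsafe {
        libc::syscall(
            libc::SYS_futex,
            word as *const AtomicU32 as *const u32,
            libc::FUTEX_WAIT,                   // no FUTEX_PRIVATE_FLAG: cross-process
            expected,
            std::ptr::null::<libc::timespec>(), // no timeout
        );
    }
}

#[cfg(target_os = "linux")]
fn futex_wake(word: &AtomicU32, waiters: i32) {
    unsafe {
        libc::syscall(
            libc::SYS_futex,
            word as *const AtomicU32 as *const u32,
            libc::FUTEX_WAKE,
            waiters,                            // e.g. 1, or i32::MAX for all waiters
        );
    }
}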

Flow Control

SHM uses shared counters for flow control instead of explicit Credit messages.

Channel Metadata Table

shm.flow.channel-table

Each guest-host pair has a channel metadata table for tracking active channels. The table is located at a fixed offset within the guest's region:

#[repr(C)]
struct ChannelEntry {
    state: AtomicU32,        // 0=Free, 1=Active, 2=Closed
    granted_total: AtomicU32, // Cumulative bytes authorized
    _reserved: [u8; 8],      // Reserved (zero)
}
// 16 bytes per entry
shm.flow.channel-table-location

Each guest's channel table offset is stored in PeerEntry.channel_table_offset. The table size is max_channels * 16 bytes.

shm.flow.channel-table-indexing

The channel_id directly indexes the channel table: channel N uses entry N. This means:

  • Channel IDs MUST be < max_channels
  • Channel ID 0 is reserved; entry 0 is unused
  • Usable channel IDs are 1 to max_channels - 1
shm.flow.channel-activate

When opening a new channel, the allocator MUST initialize the entry:

  1. Set granted_total = initial_credit (from segment header)
  2. Set state = Active (with Release ordering)

The sender maintains its own sent_total counter locally (not in shared memory).
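
A sketch of the activation order, which matters: the credit must be visible before the Active state is published.

use std::sync::atomic::Ordering;

// Channel activation (sketch). `entry` is the ChannelEntry indexed by the new
// channel_id; initial_credit comes from the segment header.
fn activate_channel(entry: &ChannelEntry, initial_credit: u32) {
    entry.granted_total.store(initial_credit, Ordering::Relaxed);
    entry.state.store(1 /* Active */, Ordering::Release); // credit visible before Active
}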

shm.flow.channel-id-reuse

A channel ID MAY be reused after the channel is closed (Close or Reset received by both peers). To reuse:

  1. Sender sends Close or Reset
  2. Receiver sets ChannelEntry.state = Free (with Release ordering)
  3. Allocator polls for state == Free before reusing

On reuse, the allocator reinitializes per r[shm.flow.channel-activate].

Implementations SHOULD delay reuse to avoid races (e.g., wait for the entry to be Free before reallocating).

Credit Counters

shm.flow.counter-per-channel

Each active channel has a granted_total: AtomicU32 counter in its channel table entry. The receiver publishes; the sender reads.

Counter Semantics

shm.flow.granted-total

granted_total is cumulative bytes authorized by the receiver. Monotonically increasing (modulo wrap).

shm.flow.remaining-credit

remaining = granted_total - sent_total (wrapping subtraction). Sender MUST NOT send if remaining < payload size.

shm.flow.wrap-rule

Interpret granted_total - sent_total as a signed i32. A negative result (equivalently, an unsigned difference ≥ 2^31) indicates corruption.
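
The counter arithmetic as a sketch; granted_total is loaded from the shared entry and sent_total is the sender's local counter:

// Wrapping credit arithmetic (sketch).
fn remaining_credit(granted_total: u32, sent_total: u32) -> Option<u32> {
    let diff = granted_total.wrapping_sub(sent_total);
    if (diff as i32) < 0 {
        None // unsigned difference >= 2^31: corruption
    } else {
        Some(diff)
    }
}

// Sender-side check before sending a Data descriptor of payload_len bytes.
fn may_send(granted_total: u32, sent_total: u32, payload_len: u32) -> bool {
    matches!(remaining_credit(granted_total, sent_total), Some(r) if r >= payload_len)
}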

Memory Ordering for Credit

shm.flow.ordering.receiver

Update granted_total with Release after consuming data.

shm.flow.ordering.sender

Load granted_total with Acquire before deciding to send.

Initial Credit

shm.flow.initial

Channels start with granted_total = initial_credit from segment header. Sender's sent_total starts at 0.

Zero Credit

shm.flow.zero-credit

Sender waits. Use futex on the counter to avoid busy-wait. Receiver wakes after granting credit.

Credit and Reset

shm.flow.reset

After Reset, stop accessing the channel's credit counter. Values after Reset are undefined.

Guest Lifecycle

Attaching

shm.guest.attach

To attach, a guest (see the sketch after these steps):

  1. Opens the shared memory segment
  2. Validates magic and version
  3. Finds an Empty peer table entry
  4. Atomically sets state from Empty to Attached (CAS)
  5. Increments epoch
  6. Begins processing
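
A sketch of steps 3-5, assuming the mapped peer table is visible as a slice of PeerEntry.

use std::sync::atomic::Ordering;

// Claiming a peer table entry (sketch). States as in PeerState (Empty = 0, Attached = 1).
fn claim_peer_entry(peers: &[PeerEntry]) -> Option<u8> {
    for (index, entry) in peers.iter().enumerate() {
        if entry
            .state
            .compare_exchange(0 /* Empty */, 1 /* Attached */, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            entry.epoch.fetch_add(1, Ordering::Release);
            return Some(index as u8 + 1); // peer_id = 1 + index
        }
    }
    None // hub is full
}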
shm.guest.attach-failure

If no Empty slots exist, the guest cannot attach (hub is full).

Detaching

shm.guest.detach

To detach gracefully:

  1. Set state to Goodbye
  2. Drain remaining messages
  3. Complete or cancel in-flight work
  4. Unmap segment

Host Observing Guests

shm.host.poll-peers

The host periodically checks peer states. On observing Goodbye or epoch change (crash), the host cleans up that guest's resources.

Failure and Goodbye

Goodbye

shm.goodbye.guest

A guest signals shutdown by setting its peer state to Goodbye. It MAY send a Goodbye descriptor with reason first.

shm.goodbye.host

The host signals shutdown by setting host_goodbye in the header to a non-zero value. Guests MUST poll this field and detach when it becomes non-zero.

shm.goodbye.payload

A Goodbye descriptor's payload is a [POSTCARD]-encoded String containing the reason. Per r[core.error.goodbye-reason], the reason MUST contain the rule ID that was violated.

shm.goodbye.host-atomic

The host_goodbye field MUST be accessed atomically (load/store with at least Relaxed ordering). It is written by the host and read by all guests.

Crash Detection

The host is responsible for detecting crashed guests. Epoch-based detection only works when a new guest attaches; the host needs additional mechanisms to detect a guest that crashed while attached.

shm.crash.host-owned

The host MUST use an out-of-band mechanism to detect crashed guests. Common approaches:

  • Hold a process handle (e.g., pidfd on Linux, process handle on Windows) and detect termination
  • Require guests to update a heartbeat field periodically
  • Use OS-specific death notifications
shm.crash.heartbeat

If using heartbeats: each PeerEntry contains a last_heartbeat: AtomicU64 field. Guests MUST update this at least every heartbeat_interval nanoseconds (from segment header). The host declares a guest crashed if heartbeat is stale by more than 2 * heartbeat_interval.

shm.crash.heartbeat-clock

Heartbeat values are monotonic clock readings, not wall-clock time. All processes read from the same system monotonic clock, so values are directly comparable without synchronization.

Each process writes its current monotonic clock reading (in nanoseconds) to last_heartbeat. The host compares the guest's value against its own clock reading: if host_now - guest_heartbeat > 2 * heartbeat_interval, the guest is considered crashed.

Platform clock sources:

  • Linux: CLOCK_MONOTONIC (via clock_gettime or Instant)
  • Windows: QueryPerformanceCounter
  • macOS: mach_absolute_time
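
A POSIX sketch of both sides of the heartbeat, assuming the libc crate for raw CLOCK_MONOTONIC readings (std::time::Instant does not expose a cross-process-comparable value). The heartbeat_interval = 0 (disabled) case is not handled here.

// Both sides write raw CLOCK_MONOTONIC readings in nanoseconds so the values
// are directly comparable across processes.
fn monotonic_nanos() -> u64 {
    let mut ts: libc::timespec = unsafe { std::mem::zeroed() };
    unsafe { libc::clock_gettime(libc::CLOCK_MONOTONIC, &mut ts) };
    ts.tv_sec as u64 * 1_000_000_000 + ts.tv_nsec as u64
}

// Guest side: refresh at least every heartbeat_interval nanoseconds.
fn beat(entry: &PeerEntry) {
    entry.last_heartbeat.store(monotonic_nanos(), std::sync::atomic::Ordering::Relaxed);
}

// Host side: stale by more than 2 * heartbeat_interval means crashed.
fn guest_is_stale(entry: &PeerEntry, heartbeat_interval_ns: u64) -> bool {
    let last = entry.last_heartbeat.load(std::sync::atomic::Ordering::Relaxed);
    monotonic_nanos().wrapping_sub(last) > 2 * heartbeat_interval_ns
}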
shm.crash.epoch

Guests increment epoch on attach. If epoch changes unexpectedly, the previous instance crashed and was replaced.

shm.crash.recovery

On detecting a crashed guest, the host MUST:

  1. Set the peer state to Goodbye
  2. Treat all in-flight operations as failed
  3. Reset rings to empty (head = tail = 0)
  4. Return all slots to free
  5. Reset channel table entries to Free
  6. Set state to Empty (allowing new guest to attach)

Byte Accounting

shm.bytes.what-counts

For flow control, "bytes" = payload_len of Data descriptors (the [POSTCARD]-encoded element size). Descriptor overhead and slot padding do NOT count.

File-Backed Segments

For cross-process communication, the SHM segment must be backed by a file that can be memory-mapped by multiple processes.

Segment File

shm.file.path

The host creates the segment as a regular file at a path known to both host and guests. Common locations:

  • /dev/shm/<name> (Linux tmpfs, recommended)
  • /tmp/<name> (portable but may be disk-backed)
  • Application-specific directory

The path MUST be communicated to guests out-of-band (e.g., via command-line argument or environment variable).

shm.file.create

To create a segment file (see the sketch after these steps):

  1. Open or create the file with read/write permissions
  2. Truncate to the required total_size
  3. Memory-map the entire file with MAP_SHARED
  4. Initialize all data structures (header, peer table, rings, slots)
  5. Write header magic last (signals segment is ready)
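
A sketch of these steps, assuming the memmap2 crate for the MAP_SHARED mapping; initialization of the remaining header fields and tables (step 4) is elided.

use std::fs::OpenOptions;
use std::io;
use std::path::Path;

// Creating and mapping the segment file (sketch). The magic is written last.
fn create_segment(path: &Path, total_size: u64) -> io::Result<memmap2::MmapMut> {
    let file = OpenOptions::new().read(true).write(true).create(true).open(path)?;
    file.set_len(total_size)?;                                   // step 2: size the file
    let mut map = unsafe { memmap2::MmapMut::map_mut(&file)? };  // step 3: shared mapping
    let bytes: &mut [u8] = &mut map;
    bytes.fill(0);                                               // zero rings, bitmaps, ...
    // ... step 4: write header fields, peer table, slot pools ...
    bytes[0..8].copy_from_slice(b"RAPAHUB\x01");                 // step 5: magic last
    Ok(map)
}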
shm.file.attach

To attach to an existing segment:

  1. Open the file read/write
  2. Memory-map with MAP_SHARED
  3. Validate magic and version
  4. Read configuration from header
  5. Proceed with guest attachment per r[shm.guest.attach]
shm.file.permissions

The segment file SHOULD have permissions that allow all intended guests to read and write. On POSIX systems, mode 0600 or 0660 is typical, with the host and guests running as the same user or group.

shm.file.cleanup

The host SHOULD delete the segment file on graceful shutdown. On crash, stale segment files may remain; implementations SHOULD handle this (e.g., by deleting and recreating on startup).

Platform Mapping

shm.file.mmap-posix

On POSIX systems, use mmap() with:

  • PROT_READ | PROT_WRITE
  • MAP_SHARED (required for cross-process visibility)
  • File descriptor from open() or shm_open()
shm.file.mmap-windows

On Windows, use:

  • CreateFileMapping() to create a file mapping object
  • MapViewOfFile() to map it into the process address space
  • Named mappings can use Global\<name> for cross-session access

Peer Spawning

The host typically spawns guest processes and provides them with the information needed to attach to the segment.

Spawn Ticket

shm.spawn.ticket

Before spawning a guest, the host:

  1. Allocates a peer table entry (finds Empty slot, sets to Reserved)
  2. Creates a doorbell pair (see Doorbell section)
  3. Prepares a "spawn ticket" containing:
    • hub_path: Path to the segment file
    • peer_id: Assigned peer ID (1-255)
    • doorbell_fd: Guest's end of the doorbell (Unix only)
shm.spawn.reserved-state

The peer entry state during spawning:

  • Host sets state to Reserved before spawn
  • Guest sets state to Attached after successful attach
  • If spawn fails, host resets state to Empty

The Reserved state prevents other guests from claiming the slot.

#[repr(u32)]
pub enum PeerState {
    Empty = 0,
    Attached = 1,
    Goodbye = 2,
    Reserved = 3,  // Host has allocated, guest not yet attached
}

Command-Line Arguments

shm.spawn.args

The canonical way to pass spawn ticket information to a guest process is via command-line arguments:

--hub-path=<path>    # Path to segment file
--peer-id=<id>       # Assigned peer ID (1-255)
--doorbell-fd=<fd>   # Doorbell file descriptor (Unix only)
shm.spawn.fd-inheritance

On Unix, the doorbell file descriptor MUST be inheritable by the child process. The host MUST NOT set O_CLOEXEC / FD_CLOEXEC on the guest's doorbell fd before spawning. After spawn, the host closes its copy of the guest's doorbell fd (keeping only its own end).
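
A Unix sketch of spawning a guest with its ticket, assuming the libc crate to clear FD_CLOEXEC on the guest's doorbell end; spawn_guest and its parameters are illustrative names.

use std::io;
use std::os::fd::{AsRawFd, OwnedFd};
use std::process::{Child, Command};

// Spawn a guest and pass the ticket via command-line arguments (sketch).
// `guest_fd` is the guest's end of the doorbell socketpair, wrapped as an OwnedFd.
fn spawn_guest(binary: &str, hub_path: &str, peer_id: u8, guest_fd: OwnedFd) -> io::Result<Child> {
    let raw = guest_fd.as_raw_fd();
    // Rust-created fds are CLOEXEC by default; clear it so the child inherits the fd.
    unsafe { libc::fcntl(raw, libc::F_SETFD, 0) };
    let child = Command::new(binary)
        .arg(format!("--hub-path={hub_path}"))
        .arg(format!("--peer-id={peer_id}"))
        .arg(format!("--doorbell-fd={raw}"))
        .spawn()?;
    drop(guest_fd); // host closes its copy of the guest's doorbell end
    Ok(child)
}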

Guest Initialization

shm.spawn.guest-init

A spawned guest process:

  1. Parses command-line arguments to extract ticket info
  2. Opens and maps the segment file
  3. Validates segment header
  4. Locates its peer entry using peer_id
  5. Verifies state is Reserved (set by host)
  6. Atomically sets state from Reserved to Attached
  7. Initializes doorbell from the inherited fd
  8. Begins message processing

Doorbell Mechanism

Doorbells provide instant cross-process wakeup and death detection, complementing the futex-based wakeup for ring operations.

Purpose

shm.doorbell.purpose

A doorbell is a bidirectional notification channel between host and guest that provides:

  • Wakeup: Signal the other side to check for work
  • Death detection: Detect when the other process terminates

Unlike futex (which requires polling shared memory), a doorbell allows blocking on I/O that unblocks immediately when the peer dies.

Implementation

shm.doorbell.socketpair

On Unix, doorbells are implemented using socketpair():

int fds[2];
socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
// fds[0] = host end, fds[1] = guest end

The host keeps fds[0] and passes fds[1] to the guest via the spawn ticket.

shm.doorbell.signal

To signal the peer, write a single byte to the socket:

char byte = 1;
write(doorbell_fd, &byte, 1);

The byte value is ignored; only the wakeup matters.

shm.doorbell.wait

To wait for a signal (with optional timeout):

struct pollfd pfd = { .fd = doorbell_fd, .events = POLLIN };
poll(&pfd, 1, timeout_ms);
if (pfd.revents & POLLIN) {
    // Peer signaled - check for work
    char buf[16];
    read(doorbell_fd, buf, sizeof(buf));  // drain
}
if (pfd.revents & (POLLHUP | POLLERR)) {
    // Peer died
}
shm.doorbell.death

When a process terminates, its end of the socketpair is closed by the kernel. The surviving process sees POLLHUP or POLLERR on its end, providing immediate death notification without polling.

Integration with Rings

shm.doorbell.ring-integration

Doorbells complement ring-based messaging:

  • After enqueueing a descriptor and updating head, signal the doorbell
  • The receiver can poll() both the doorbell fd and other I/O
  • On doorbell signal, check rings for new messages

This avoids busy-waiting and integrates with async I/O frameworks.

shm.doorbell.optional

Doorbell support is OPTIONAL. Implementations MAY use only futex-based wakeup (per r[shm.wakeup.*]). Doorbells are recommended when:

  • Death detection latency is critical
  • Integration with async I/O (epoll/kqueue/IOCP) is desired
  • Busy-waiting must be avoided entirely

Death Notification

The host needs to detect when guest processes crash or hang so it can clean up resources and optionally restart them.

Notification Callback

shm.death.callback

When adding a peer, the host MAY register a death callback:

type DeathCallback = Arc<dyn Fn(PeerId) + Send + Sync>;

struct AddPeerOptions {
    peer_name: Option<String>,
    on_death: Option<DeathCallback>,
}

The callback is invoked when the guest's doorbell indicates death (POLLHUP/POLLERR) or when heartbeat timeout is exceeded.

shm.death.callback-context

The death callback:

  • Is called from the host's I/O or monitor thread
  • Receives the peer_id of the dead guest
  • SHOULD NOT block for long (schedule cleanup asynchronously)
  • MAY trigger guest restart logic

Detection Methods

shm.death.detection-methods

Implementations SHOULD use multiple detection methods:

Method             Latency      Reliability  Platform
──────             ───────      ───────────  ────────
Doorbell POLLHUP   Immediate    High         Unix
Heartbeat timeout  2× interval  Medium       All
Process handle     Immediate    High         All
Epoch change       On reattach  Low          All

Doorbell provides the best latency on Unix. Process handles (pidfd on Linux, process handle on Windows) provide immediate notification on all platforms.

shm.death.process-handle

On Linux 5.3+, use pidfd_open() to get a pollable fd for the child process. On Windows, the process handle from CreateProcess() is waitable. This provides kernel-level death notification without relying on doorbells.

Recovery Actions

shm.death.recovery

On guest death detection, per r[shm.crash.recovery]:

  1. Invoke the death callback (if registered)
  2. Set peer state to Goodbye, then Empty
  3. Reset rings and free slots
  4. Close host's doorbell end
  5. Optionally respawn the guest

Variable-Size Slot Pools

For applications with diverse payload sizes (e.g., small RPC arguments vs. large binary blobs), a single fixed slot size is inefficient. Variable-size pools use multiple size classes.

Size Classes

shm.varslot.classes

A variable-size pool consists of multiple size classes, each with its own slot size and count. Example configuration:

Class  Slot Size  Count  Total  Use Case
─────  ─────────  ─────  ─────  ────────
0      1 KB       1024   1 MB   Small RPC args
1      16 KB      256    4 MB   Typical payloads
2      256 KB     32     8 MB   Images, CSS
3      4 MB       8      32 MB  Compressed fonts
4      16 MB      4      64 MB  Decompressed fonts

The specific configuration is application-dependent.

shm.varslot.selection

To allocate a slot for a payload of size N (see the sketch after these steps):

  1. Find the smallest size class where slot_size >= N
  2. Allocate from that class's free list
  3. If exhausted, try the next larger class (optional)
  4. If all classes exhausted, block or return error
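
A sketch of step 1; classes holds the slot sizes in ascending order, as in the example table:

// Smallest size class whose slot_size fits the payload (sketch).
fn pick_size_class(classes: &[u32], payload_len: u32) -> Option<usize> {
    classes.iter().position(|&slot_size| slot_size >= payload_len)
}

// With the example table (1 KB, 16 KB, 256 KB, 4 MB, 16 MB), a 20 000-byte
// payload selects class 2 (256 KB):
// pick_size_class(&[1 << 10, 16 << 10, 256 << 10, 4 << 20, 16 << 20], 20_000) == Some(2)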

Shared Pool Architecture

shm.varslot.shared

Unlike fixed-size per-guest pools, variable-size pools are typically shared across all guests:

  • One pool region for the entire hub
  • All guests allocate from the same size classes
  • Slot ownership is tracked per-allocation

This allows efficient use of memory when different guests have different payload size distributions.

shm.varslot.ownership

Each slot tracks its current owner:

struct SlotMeta {
    generation: AtomicU32,  // ABA counter
    state: AtomicU32,       // Free=0, Allocated=1, InFlight=2
    owner_peer: AtomicU32,  // Peer ID that allocated (0 = host)
    next_free: AtomicU32,   // Free list link
}

When a guest crashes, slots with owner_peer == crashed_peer_id are returned to their respective free lists.

Extent-Based Growth

shm.varslot.extents

Size classes can grow dynamically via extents:

  • Each size class starts with one extent of slot_count slots
  • When exhausted, additional extents can be allocated
  • Extents are appended to the segment file (requires remap)
struct SizeClassHeader {
    slot_size: u32,
    slots_per_extent: u32,
    extent_count: AtomicU32,
    extent_offsets: [AtomicU64; MAX_EXTENTS],
}
shm.varslot.extent-layout

Each extent contains:

  1. Extent header (class, index, slot count, offsets)
  2. Slot metadata array (one SlotMeta per slot)
  3. Slot data array (actual payload storage)
┌─────────────────────────────────────────────────┐
│ ExtentHeader (64 bytes)                         │
├─────────────────────────────────────────────────┤
│ SlotMeta[0] │ SlotMeta[1] │ ... │ SlotMeta[N-1] │
├─────────────────────────────────────────────────┤
│ Slot[0] data │ Slot[1] data │ ... │ Slot[N-1]   │
└─────────────────────────────────────────────────┘

Free List Management

shm.varslot.freelist

Each size class maintains a lock-free free list using a Treiber stack:

struct SizeClassHeader {
    // ...
    free_head: AtomicU64,  // Packed (index, generation)
}

Allocation pops from the head; freeing pushes to the head. The generation counter prevents ABA problems.

shm.varslot.allocation

To allocate from a size class (see the sketch after these steps):

  1. Load free_head with Acquire
  2. If empty (sentinel value), class is exhausted
  3. Load the slot's next_free pointer
  4. CAS free_head from current to next
  5. On success, increment slot's generation, set state to Allocated
  6. On failure, retry from step 1
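
A sketch of the pop, assuming free_head packs the generation in the high 32 bits and the slot index in the low 32 bits, with 0xFFFFFFFF as the empty sentinel; next_of stands in for reading SlotMeta[index].next_free.

use std::sync::atomic::{AtomicU64, Ordering};

const SENTINEL: u32 = 0xFFFF_FFFF;

// Treiber-stack pop (sketch). Returns the allocated slot index, or None if exhausted.
fn pop_free(free_head: &AtomicU64, next_of: impl Fn(u32) -> u32) -> Option<u32> {
    loop {
        let head = free_head.load(Ordering::Acquire);
        let (generation, index) = ((head >> 32) as u32, head as u32);
        if index == SENTINEL {
            return None; // class exhausted
        }
        let next = next_of(index);
        // Bump the packed generation so a concurrent pop/push cycle cannot
        // make this CAS succeed on a stale head (ABA).
        let new_head = ((generation.wrapping_add(1) as u64) << 32) | next as u64;
        if free_head
            .compare_exchange_weak(head, new_head, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
        {
            return Some(index);
        }
    }
}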
shm.varslot.freeing

To free a slot:

  1. Verify generation matches (detect double-free)
  2. Set slot state to Free
  3. Load current free_head
  4. Set slot's next_free to current head
  5. CAS free_head to point to this slot
  6. On failure, retry from step 3

References