Retry
The retry layer defines how vox handles ambiguous failures: cases where the client does not know whether the server received, started, completed, or lost an operation. It sits above transport/session continuity and below application logic.
This layer assumes the runtime stack Link -> Conduit -> Session -> Connection.
Conduit continuity and session continuity are separate concerns:
- A conduit may hide some link replacement internally.
- A session may survive conduit failure and later continue on a replacement conduit.
- Retry decisions are evaluated against the session's operation table, not against an individual conduit attachment.
Retry semantics are defined in terms of operations, not request attempts. A retry creates a new request attempt for an existing operation. Retry does not retroactively preserve the original request attempt.
The fundamental ambiguity
After any communication failure, the client faces irreducible uncertainty. The previous attempt is in one of these conditions, and the client cannot distinguish them:
- The request never left the client's outbound buffer
- The request arrived but the handler never started
- The handler started and is still running
- The handler completed but the response was lost in transit
- The handler started and then failed, with or without partial side effects
Any retry model must handle all five as possible realities behind a single "unknown" from the client's perspective.
Operation identity
Every RPC is bound to an operation ID: a client-generated identifier that names one logical operation across multiple request attempts.
The client MUST mint a unique operation ID for each logical operation. Every request attempt for that operation carries the same ID. A new intention, even with identical arguments, gets a new ID.
Operation IDs are scoped to a session. When a session ends cleanly, operation records for that session may be evicted.
If a session resumes on a replacement conduit, the operation ID scope does not change. The same operation table continues to govern retries for that session. However, request attempts that were in flight on the failed attachment are not preserved by session resumption.
If the same operation ID arrives with a different method or different serialized arguments, the server MUST reject it as a conflict.
Operation state machine
The server maintains a record mapping operation IDs to states:
┌──────────────┐
│ Absent │ ← no record exists
└──────┬───────┘
│ first attempt arrives
▼
┌──────────────┐
┌────│ Live │────┐
│ └──────┬───────┘ │
│ │ │
release/cancel │ handler │ crash / lost state
(volatile only) │ returns │ before a recoverable
│ │ │ terminal outcome
▼ ▼ ▼
┌──────────┐ ┌──────────┐ ┌──────────────┐
│ Released │ │ Sealed │ │Indeterminate │
└──────────┘ └──────────┘ └──────────────┘Absent. No record of this operation ID. A new attempt triggers normal admission and transitions to Live.
Live. The operation has been admitted and is not yet terminal. There MUST be at most one live execution owner for any operation ID.
Live state belongs to the session operation table, not to any one conduit attachment. Conduit replacement alone MUST NOT discard Live state.
Released. The runtime has relinquished responsibility for the operation without a sealed outcome. This state is only valid for volatile methods.
Sealed(outcome). A terminal outcome has been recorded. Retries observe that same outcome rather than creating a second independent execution.
Indeterminate. The server lost enough state that it cannot prove whether the logical operation reached a terminal outcome.
Static method retry policy
Retry behavior is fixed at admission time. There is no handler-visible
mid-flight commit() point. Instead, each method has a static retry policy
described by two dimensions.
Volatile vs persist
Volatile is the default. A volatile method may be released on client drop, disconnect, timeout, or cancellation. Once released, the server is no longer responsible for carrying that logical operation to completion.
Persist is the opposite of volatile. A persist method creates a durability obligation from the instant it is admitted. The runtime MUST NOT release the operation for any reason after admission.
A method without persist is volatile.
A method declared persist is non-volatile. Once admitted, the runtime MUST
NOT release it merely because the caller dropped interest, disconnected, or
sent a cancellation request.
Admitting a persist operation creates a durability obligation. From
admission onward, the runtime and method implementation MUST preserve enough
durable state to distinguish Sealed from Indeterminate after crash recovery.
A persist operation MUST NOT be reported as Sealed unless the sealed outcome
has been durably recorded in a form that can be recovered after process
restart.
After admitting a persist operation, an implementation MUST NOT later treat
that operation ID as Absent unless it can prove that neither the admission nor
any externally visible effect survived the crash.
Idem vs non-idem
idem declares that re-executing the same logical operation is semantically
safe. idem is orthogonal to persist.
A method declared idem may be re-executed for the same logical operation
when the retry state machine permits it.
A method without idem is non-idem. The runtime MUST NOT re-execute the same
logical operation unless some stronger proof outside this spec makes it safe.
The four combinations
The two dimensions produce four static method classes:
- volatile + non-idem: the runtime may release the operation on client drop, disconnect, timeout, or cancellation. Re-execution of the same logical operation is not permitted.
- volatile + idem: the runtime may release the operation, but if it does, it may later re-execute the same logical operation under the same operation ID.
- persist + non-idem: once admitted, the runtime MUST preserve enough state to reach a terminal outcome or report Indeterminate after recovery. Re-execution is not permitted.
- persist + idem: once admitted, the runtime MUST preserve enough state to reach a terminal outcome or report Indeterminate. If the operation reaches Indeterminate, re-execution remains safe.
Duplicate attempt handling
A retry never creates a new logical operation. It creates a new request attempt for the same operation.
If the operation is Absent, the server admits the attempt and starts the operation.
If the operation is Live, the server MUST NOT start a second independent execution owner for the same operation ID.
A duplicate arriving while Live MUST attach to the existing live operation and observe the same eventual resolution.
If the operation is Sealed, the server MUST replay the sealed terminal outcome.
If the operation is Released and the method is idem, the runtime MAY treat
the retry as a fresh execution of the same logical operation.
If the operation is Released and the method is not idem, the runtime MUST
fail closed with Indeterminate. It MUST NOT silently turn the same
operation ID into a fresh re-execution.
If the operation is Indeterminate and the method is idem, the runtime MAY
re-execute the operation under the same operation ID.
If the operation is Indeterminate and the method is not idem, the runtime
MUST either recover a sealed outcome from durable state or fail closed with
Indeterminate.
Sealed outcomes
Returning from the handler seals the operation.
A handler return produces the terminal sealed outcome for that logical operation.
A sealed failure MUST be replayed on retry. A retry of the same logical operation MUST NOT turn a sealed failure into a second independent attempt.
Sealed is absorbing. Once an operation is Sealed, later cancel, drop, or retry attempts MUST NOT unseal it.
For a persist operation, the sealed outcome record MUST outlive any crash
point after which the method's externally visible effects might survive.
Cancellation and dropped interest
Cancellation is an event in the retry state machine, not rollback magic.
For a volatile method, an explicit cancellation request MAY release the operation.
For a persist method, explicit cancellation does not authorize release. The operation remains Live until it seals or becomes Indeterminate.
For a volatile method, disconnect or dropped interest MAY release the operation.
For a persist method, disconnect or dropped interest detaches the observer but does not release the operation.
Cancellation races with sealing. Whichever state transition wins first determines the result observed by later retries.
Caller-visible runtime outcomes
A sealed application failure is part of the logical operation's terminal outcome. It MUST be replayed as that same sealed failure, not reclassified as a transport or recovery error.
Indeterminate is a first-class runtime outcome. It means the runtime
refused to guess because the logical operation could not be safely continued,
replayed, or re-executed under the method's retry policy.
Indeterminate MUST be distinguishable from:
- a sealed application failure
- a transport failure before recovery
- cancellation
When session recovery completes but an in-flight operation cannot safely
continue under the operation table and method policy, the caller MUST observe
Indeterminate.
Session resumption and retry
Transport/session continuity can hide some failures, but it does not define logical operation semantics.
StableConduit continuity is below the retry layer. It may hide a conduit break, but it does not by itself authorize re-executing an operation.
Session resumption keeps operation identity alive across a conduit break. A session outlives any individual conduit attachment.
Session resumption is runtime-managed. It MUST NOT require application-level peer collaboration.
After session resumption, the runtime MUST evaluate every still-in-flight operation against the session's operation table before making any retry decision.
If the operation is still Live after session resumption, later duplicates of that operation ID MUST attach to the live operation rather than starting a second execution owner.
If an in-flight operation cannot safely continue after session resumption, the runtime MUST terminate it with the caller-visible outcome required by the operation state machine and method retry policy.
Lower transport/session layers MAY retransmit bytes they know were not delivered. They MUST NOT silently re-execute RPC operations once delivery has become ambiguous.
Operation record lifetime
Operation records MUST outlive the maximum retry window by a comfortable margin.
A Live operation MUST NOT be evicted.
Evicting a persist operation record is only valid after the runtime can no
longer be required to distinguish Sealed from Absent for that session's retry
window.
If the runtime can recognize that an operation ID has expired, it MUST reject the retry rather than treating the ID as Absent.
Channels and retry
Channels are connection-bound and do not compose with retry the same way a plain request/response pair does.
Channel bindings are attached to a concrete connection, not directly to an operation ID.
A method with channel arguments MUST NOT be declared persist until the
runtime defines channel continuity for sticky operations.
Ordinary channels do not have transparent resume semantics. Session resumption, conduit replacement, or operation retry MUST NOT be interpreted as implicit continuation of the exact prior channel transcript.
If the connection carrying a channel breaks, channel handles from that attempt become invalid. The runtime MUST surface them as closed, failed, or otherwise unusable; it MUST NOT silently bind those handles to a replacement stream.
If a channel-bearing method becomes ambiguous and the method is not idem,
the runtime MUST fail closed with Indeterminate. It MUST NOT silently
re-execute the method to recreate channels.
When an idem volatile method with channels is re-executed on retry, the
runtime MAY rebind fresh channel handles for the new execution attempt.
Rebinding creates a new channel set for the retried execution. The runtime MUST NOT promise preservation of prior channel IDs, credit state, partial delivery state, close/reset races, or any exact stream position from the interrupted attempt.
If the parent operation has already Sealed, retry replays the sealed operation outcome in the ordinary way. That replay does not imply that channels from the earlier attempt are resurrected.
Stronger stream-continuity semantics, if introduced in the future, MUST be a separate channel abstraction or explicit opt-in mode. They MUST NOT be retroactively implied for ordinary channels.
Summary
The retry contract is split across three parties:
- The caller mints an operation ID and reuses it when addressing the same logical operation again.
- The runtime tracks operation state, honors method retry policy, prevents duplicate live execution owners, and replays sealed outcomes.
- The method declaration tells the runtime whether the operation is volatile or persist, and whether same-operation re-execution is safe.