Self-Reflective Language Models (Alizadeh et al., 2026)

URL: https://arxiv.org/abs/2603.15653

The paper extends RLM's recursive context-interaction scheme with uncertainty-aware program-trajectory selection. SRLM operates in the same sandboxed Python REPL environment RLM defines; it adds three intrinsic uncertainty signals (self-consistency across K=8 sampled programs, verbalized confidence via JSON self-assessment, and reasoning-trace length as a behavioral uncertainty proxy) that select among candidate context-interaction programs without recursive sub-calls. The empirical claim is a 22.6-point accuracy improvement over RLM on BrowseComp+ (37.1 percent to 59.7 percent with Qwen3-Coder) under identical wall-clock budgets, and the surprising finding that "self-reflection can actually outperform recursion in both performance and cost (wall-clock time) under long-context settings."
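
To make the selection mechanism concrete, here is a minimal sketch of combining the three signals into one uncertainty score per candidate program. The weights, the trace-length normalization, and the candidate dict shape are illustrative assumptions, not the paper's specification; only the three signal types come from the source.

```python
from collections import Counter

def self_consistency(answers):
    # Fraction of the K sampled answers that agree with the modal answer.
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / len(answers)

def select_trajectory(candidates):
    """Pick the candidate program with the lowest combined uncertainty.

    Each candidate is a dict with:
      'answers'   : final answers from K sampled runs (self-consistency signal)
      'confidence': verbalized confidence in [0, 1] from a JSON self-assessment
      'trace_len' : reasoning-trace length in tokens (behavioral uncertainty)
    The 0.5 / 0.3 / 0.2 weights below are hypothetical, chosen only to
    illustrate that the three signals are combined into a single score.
    """
    max_len = max(c["trace_len"] for c in candidates) or 1

    def uncertainty(c):
        u_consistency = 1.0 - self_consistency(c["answers"])
        u_confidence = 1.0 - c["confidence"]
        u_length = c["trace_len"] / max_len  # longer trace -> more uncertain
        return 0.5 * u_consistency + 0.3 * u_confidence + 0.2 * u_length

    return min(candidates, key=uncertainty)
```

The key point for this note: nothing here touches the environment itself; selection consumes trajectories the substrate already produced.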

Adopted

The paper's confirmation that RLM's substrate shape (Python REPL with prompt-as-variable plus LM-as-tool inside the environment) generalizes across reflective and recursive variants is supporting evidence for the harness-as-tool inversion this graph's [[Agent Harnesses Drive the Runtime, Not the Reverse]] Conviction names. SRLM does not change the substrate; it changes how the LM picks among trajectories the substrate makes available. The substrate-vs-glue diagnostic still applies to SRLM's substrate (Python REPL) just as it did to RLM's.
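
The substrate shape named above (prompt-as-variable plus LM-as-tool inside a Python REPL) can be sketched in a few lines. All names here (`prompt`, `llm`, `peek`, `make_repl_env`) are illustrative assumptions, not the paper's API; the point is only the shape: the context is an ordinary variable in the namespace, the LM is a callable tool, and a context-interaction "program" is code executed against that namespace.

```python
def make_repl_env(prompt, llm):
    """Build a minimal REPL namespace in the RLM/SRLM substrate shape.

    `prompt` is exposed as a plain variable; `llm` is a callable tool;
    `peek` lets a program inspect slices of a long context instead of
    reading it whole. Hypothetical names for illustration only.
    """
    def peek(start, end):
        return prompt[start:end]
    return {"prompt": prompt, "llm": llm, "peek": peek}

# A context-interaction "program" is just code run against this namespace.
env = make_repl_env(
    "long document ... question at the end",
    llm=lambda text: "stub answer",  # stand-in for a real model call
)
exec("answer = llm(peek(0, 13))", env)
```

Reflective and recursive variants differ only in what code gets run here, which is why the substrate itself carries across both.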

Not adopted (yet)

The uncertainty-aware trajectory-selection mechanism itself is downstream of substrate concerns: it is a harness-side or evaluator-side choice that operates on whatever environment the substrate provides. eOS Continuum's substrate-layer position is upstream of this choice; the substrate makes the trajectories available, and SRLM-style selection (or any other method) then picks among them.

Sources

Relations