Tiling Without Omniscience: Dynamizing Faith in Joint Argmax
Abstract
The tiling theorem of Demski, Hsia, and Rapoport proves that an Updateless Decision Theory 1.0 agent has no strict preference for self-modification, given a coherent prior and logical omniscience. In the present document the Faith in Joint Argmax assumption is weakened to a one-sided additive gap that vanishes in the Cesàro mean under iterative learning. All four places where the tiling proof consumes omniscience are reduced to a single condition on the inductor over a minimal subalgebra, and the result is reflectively stable. The only structural axiom this condition retains is the independence of branch probabilities. That axiom can be dropped if one accepts bounded optimality, replacing it with an internally verifiable domination condition. If global optimality is required, the agent is instead placed under an obligation to explore the environment; this removes the entire argmax machinery and yields tiling without structural assumptions about the value table, and hence without omniscience.
Theorem 3 of DHR25 states: a UDT 1.0 agent has no strict preference for a self-modifying action over the best non-self-modifying one. Formally, for each and each :
Throughout we adopt DHR25's notation:
is the observation space,
the action space,
the actions available at ,
the self-modifying actions,
the non-self-modifying actions,
, denotes its externally indistinguishable non-self-modifying version,
denotes the unique policy-point it forces by Limited Self-Modification (LSM; Assumption 3), and all expectations are taken with respect to the agent's subjective prior ,
indicate a value substituted from outer context,
indicate a free variable in the inner context,
Utility is bounded, ,
The agent's policy is ,
The optimal policy is ; denotes the policy's action under observation .
We work with a propositionally coherent logical inductor with true prices , transformed into a Bayesian logical inductor with round prices ; the Boundedly Rational Inductive Agent (BRIA) round is identified with a step of deductive time. The agent's estimate at round is , with . Fix a non-decreasing budget ; an event is -decidable if its truth value is decidable in time.
We define:
Cross-block conditional laws:
2. Dynamization
The proof proceeds along the chain Fine-Grained Fairness (FGF) → Faith in Joint Argmax (FGJ) → Action Coordination (AC) → Knowledge of the Decision Procedure (KoDP). Each step consumes omniscience, since the agent must simultaneously not know the concrete action of its argmax and know its value. To remove this dependence on omniscience, we check how robust the DHR25 chain is to computational constraints. We reproduce the chain while tracking explicitly which steps are equalities and which are inequalities, since this determines where and how the assumption can be weakened. If the agent is not omniscient and its estimates contain errors, we show that these errors are not amplified as they pass through the steps of the proof.
FGF states that self-modification does not change value if it is equivalent to a non-self-modifying action.
FGJ links the current choice to the future argmax, guaranteeing that the agent believes in the consistency of its optimal decisions across different policy points.
AC allows collapsing nested maximization operations into a single decision process.
KoDP removes the need to predict the future argmax, allowing the agent to use knowledge of its own decision-making algorithm.
Formally, with fixed and , :
Theorem 1.
Proof. Fix an arbitrary under observation , with . Introduce the intermediate quantities
The chain is:
Here the equality is FGF; — since ; — taking the maximum over the second condition ( is the right-hand side of FGJ); the step is -FGJ in the direction ; AC collapses the iterated double maximum inside the argmax into a single one (); KoDP removes the condition on the argmax ().
Note that FGJ is used exactly once, only on the lower side, and is sandwiched between exact equalities. The AC and KoDP steps after FGJ apply to and are exact equalities, i.e. any additive gap introduced at the FGJ step passes straight through without amplification.
The -FGJ used here, at , is exactly Assumption 5 of DHR25.
Thus we have shown that any error in knowing the future argmax is transferred additively to the final result. This does not guarantee that the agent will be able to achieve tiling at all. For the error not to be a permanent obstacle, it must tend to zero in the course of learning. Since varies with self-modification, to obtain "no self-modifying action at is preferable by more than ", the assumption must hold uniformly: over all relevant pairs.
We transfer the problem to the BRIA setting (Oesterheld, Demski, Conitzer, 2023). At round the agent holds estimates of the quantities , chooses and , realizing the pair with true value and estimation regret . The realization schedules are -decidable and partition .
Theorem 2. Let be a covering BRIA over , where recommends the pair and promises . Under reward-value tracking (R2) on each :
Proof. We use two properties of the covering BRIA from Oesterheld et al. (2023):
(i) covering (Def. 6): if the empirical record of a hypothesis does not diverge to , then the set of rounds in which is rejected is finite, and hence for all but finitely many ;
(ii)no overestimation: .
Fix and . The test set of the hypothesis is -decidable; value tracking (R2) on it gives . Consequently the empirical record has mean and does not diverge to . By property (i), for all but finitely many . Letting and taking the supremum over all pairs (finitely many), , whence . By property (ii), , so .
We partition by realized pairs. For each with infinite , value tracking (R2) on gives . Summing over the finitely many pairs, . Since , we obtain .
Combining: ; from the ceiling identity , whence .
Corollary. Set
If , then the best response is chosen with limiting frequency : since , we have , whence
In the absence of a gap () only the Cesàro statement of Theorem 2 holds. (The gap is taken on the -scale of realized pairs , consistently with the indicator.)
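The gap argument of the Corollary and the distinction between Cesàro-zero and pointwise convergence can be illustrated numerically. The sketch below is a toy under stated assumptions: the three-entry value table, the gap `delta`, and the perfect-square schedule of suboptimal choices are all hypothetical and are not taken from the construction above. Regret is positive infinitely often, so it does not converge to zero pointwise, yet its Cesàro mean does; with a strict gap, the frequency of suboptimal choices is bounded by the mean regret divided by the gap.

```python
import math


def cesaro_mean(seq):
    """(1/T) * sum of the first T terms, for the finite prefix given."""
    return sum(seq) / len(seq)


def is_square(t):
    r = math.isqrt(t)
    return r * r == t


# Hypothetical value table: arm 0 is the best response, and the gap
# delta = best value - second-best value is strictly positive.
values = [1.0, 0.5, 0.3]
delta = values[0] - max(values[1:])  # 0.5

T = 1_000_000
# Hypothetical choice sequence: a suboptimal arm on every perfect-square
# round (so regret > 0 infinitely often), the best arm on all other rounds.
choices = [1 + math.isqrt(t) % 2 if is_square(t) else 0 for t in range(1, T + 1)]
regret = [values[0] - values[a] for a in choices]
suboptimal = [1 if a != 0 else 0 for a in choices]

print(cesaro_mean(regret))      # small, although regret does not converge to 0
print(cesaro_mean(suboptimal))  # 0.001 (1000 perfect squares up to 10**6)

# Gap bound: each round 1[suboptimal] <= regret / delta, so the same holds
# for the Cesàro means; Cesàro-zero regret forces frequency-one optimality.
assert cesaro_mean(suboptimal) <= cesaro_mean(regret) / delta
```

Without a gap (`delta` equal to zero) the last inequality is unavailable, which is why only the Cesàro statement about regret survives in that case.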
At this point we have shown that tiling is attainable in values. However, we have implicitly assumed that the AC and KoDP steps are exact equalities, whereas so far they too consume omniscience. In reality the agent may err not only in values (as at the FGJ step) but also in selecting the maximum itself; misidentifying the argmax introduces a new error. To guarantee that these errors also vanish, we introduce Lemma 1, which links errors in probabilities to errors in expectations.
Lemma 1. Let be events of positive -probability, an arbitrary event, . If , then:
Proof. By total expectation and : . From :
Finally, , since the bracket lies in .
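The mechanism behind Lemma 1 can be checked on a finite probability space. The sketch below assumes utility in [0, 1] and nested events A ⊆ B, a special case chosen for illustration rather than a restatement of Lemma 1's exact hypotheses; under these assumptions the total-expectation argument gives |E[U | B] − E[U | A]| ≤ P(B \ A | B), because the bracket in the decomposition lies in [−1, 1]. The helper names are hypothetical.

```python
import random


def cond_exp(p, u, event):
    """E[U | event] on a finite space with point masses p and utilities u."""
    mass = sum(p[w] for w in event)
    return sum(p[w] * u[w] for w in event) / mass


def check_once(rng, n=8):
    """Draw a random space, U in [0, 1], and nested events A subset of B;
    return (|E[U|B] - E[U|A]|, P(B minus A | B))."""
    weights = [rng.random() + 1e-9 for _ in range(n)]  # strictly positive masses
    total = sum(weights)
    p = [x / total for x in weights]
    u = [rng.random() for _ in range(n)]              # bounded utility in [0, 1]
    B = set(rng.sample(range(n), rng.randint(2, n)))
    A = set(rng.sample(sorted(B), rng.randint(1, len(B))))
    leak = sum(p[w] for w in B - A) / sum(p[w] for w in B)
    gap = abs(cond_exp(p, u, B) - cond_exp(p, u, A))
    return gap, leak


rng = random.Random(0)
results = [check_once(rng) for _ in range(10_000)]
assert all(gap <= leak + 1e-12 for gap, leak in results)

# The bound is tight: uniform masses on 4 points, U = 1 only on point 0,
# A = {0}, B = {0, 1}. Then E[U|A] = 1, E[U|B] = 1/2 and P(B minus A | B) = 1/2.
p4, u4 = [0.25] * 4, [1.0, 0.0, 0.0, 0.0]
tight_gap = abs(cond_exp(p4, u4, {0, 1}) - cond_exp(p4, u4, {0}))
print(tight_gap)  # 0.5
```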
Good events for AC and KoDP:
Let , , — the conditioning after the AC step. We define (at round ; in the idealized form of Theorem 3 the same residuals are written without the index ):
Now we combine the results obtained.
Theorem 3. Under the residual bounds , , :
Proof. By Theorem 1 the steps before FGJ are exact; FGJ contributes and yields .
We apply Lemma 1 with . The hypothesis holds: on the iterated double maximum inside the argmax equals the single one, so the conditioning on coincides, while the parts are identical. Hence ; taking : .
We apply Lemma 1 with : on knowledge of the procedure makes the argmax condition redundant, so . Hence .
Combining: , and adding the FGJ gap closes the chain.
Thus we have carried out the dynamization of FGJ via BRIA.
3. The Minimal Condition on the Inductor
If the agent's errors in value estimates tend to zero, then tiling holds. But what exactly must the inductor ensure for these errors to actually vanish? We cannot demand that the agent be perfect everywhere; instead we impose requirements on those aspects of its logic that directly affect decision-making. We call this set of requirements the Minimal Condition for Rationality (MCR).
Let the decision-relevant generators be the propositions about policy points , value comparisons , and reward variables on the realization schedules. Let be the "modification–target" pairs.
Definition (Minimal Condition for Rationality — MCR). The true prices satisfy under budget if:
(R1) for each with , the centered sequence has vanishing Cesàro mean along each -decidable subschedule ; equivalently, its true-price frequency equals , inexploitably;
(R2) for each realization schedule generated within , the reward sequence on is -boundedly van Mises–Wald–Church (vMWC) random with its true conditional mean.
R1 guarantees that the agent's credence that "event will occur" matches the event's true frequency along every decidable subschedule, so that no trader can exploit it (necessity). R2 guarantees that the agent actually observes the rewards and that they are no noisier than the budget permits (sufficiency). Together they form the foundation for convergence.
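R1 asks for a vanishing centered Cesàro mean along each decidable subschedule, not merely on average over all rounds. The toy sketch below (hypothetical event, prices, and schedules; exact arithmetic via `fractions.Fraction`) shows why the stronger form matters: a constant price of 1/2 is calibrated over all rounds, but along the decidable subschedule of even rounds the centered mean stays at 1/2, which is the kind of violation Lemma 3 turns into exploitability.

```python
from fractions import Fraction


def centered_cesaro(indicators, prices, schedule):
    """Average of (1_phi(t) - price(t)) over the rounds t in `schedule`."""
    terms = [indicators[t] - prices[t] for t in schedule]
    return sum(terms) / len(terms)


T = 10_000
# Hypothetical event phi that holds exactly on even rounds,
# priced at 1/2 on every round.
indicators = [1 if t % 2 == 0 else 0 for t in range(T)]
prices = [Fraction(1, 2)] * T

all_rounds = range(T)
even_rounds = range(0, T, 2)  # a decidable subschedule: "t is even"

print(centered_cesaro(indicators, prices, all_rounds))   # 0
print(centered_cesaro(indicators, prices, even_rounds))  # 1/2
```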
In the special case, two-block vMWC randomness for a pair is exactly MCR restricted to the two-block cross algebra of a single pair of observations. MCR lifts it to an arbitrary decision-relevant algebra.
Lemma 2. Suppose and the auxiliary coherence constraints (inherited from DHR25 §6). Let be a -decidable event with , given by a finite Boolean combination of strict comparisons from , and a -decidable event of positive true price. Then:
Proof. Write , , with true gaps (, since is finite).
By R2 the estimates track the true values on the corresponding schedules; real-valued coherence pins the credences to these estimates. Since the gap is strict, the estimated comparison coincides with the true one with limiting frequency , whence . By subadditivity , so .
Since and , there exist and with for . The auxiliary coherence constraints license the monotonicity for self-referential .
We split the sum at : the first terms contribute . For the rest: , where , which completes the proof.
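The step in Lemma 2 from value tracking to pinned comparisons rests on a simple fact: if every estimate is within half the smallest strict gap of its true value, every pairwise comparison, and hence the argmax, agrees with the true one. The sketch below checks this with a hypothetical three-cell table and a hypothetical estimation error that shrinks like 1/sqrt(t); mismatches can occur only while the error bound is at least half the gap, so their Cesàro frequency tends to zero.

```python
import random


def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])


# Hypothetical true values with strict gaps between all entries.
true_values = [0.30, 0.55, 0.70]
ordered = sorted(true_values)
gap = min(hi - lo for lo, hi in zip(ordered, ordered[1:]))  # about 0.15

rng = random.Random(1)
T = 20_000
mismatches = 0
for t in range(1, T + 1):
    radius = t ** -0.5  # hypothetical estimation error bound, shrinking with t
    est = [v + rng.uniform(-radius, radius) for v in true_values]
    if max(abs(e - v) for e, v in zip(est, true_values)) < gap / 2:
        # All errors below half the gap: every comparison matches the truth.
        assert argmax(est) == argmax(true_values)
    mismatches += argmax(est) != argmax(true_values)

# Mismatches are only possible while radius >= gap / 2, i.e. t <= (2 / gap) ** 2.
print(mismatches / T)
```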
Lemma 3. If R1 is violated for : there exist an infinite -decidable subschedule and such that
for infinitely many . Then the inductor is -exploitable.
The proof of Lemma 3 is a standard trader argument and, for readability, is given in Appendix A.
MCR is thus a sensible constraint to impose; to understand how strict it must be, however, we need to delimit its domain of application. Demanding MCR for every proposition in the agent's language would be excessive: it suffices for MCR to hold for those propositions the agent uses to compare actions and choose a policy. This leads to the notion of a minimal subalgebra.
3.1 The Minimal Subalgebra
The tiling chain conditions and compares values only through the pairs . This allows defining the minimal algebra, with :
restricted to the pairs . (The value layer is the joint -algebra of rewards and schedule labels , since the schedules themselves partition and restricting the rewards to their union yields all .)
MCR guarantees that the agent learns correctly, but not that the structure of the world (or of the policy) helps it make the right choice. For the action-coordination step (AC) in our chain to become exact, we may need some structural regularity in the value table. We introduce two such structural notions: branch independence (VR1) and argmax stability (CBA).
Definition. A pair satisfies VR1 under if the joint policy law factorizes: and for all .
Definition.
A pair satisfies CBA if there exists a uniform dominator with for every .
The pair satisfies sCBA (strict CBA) if it satisfies CBA and, additionally, for each there exists with and .
Thus , and both properties have distinct notations.
If VR1 and CBA hold, then MCR automatically guarantees the exactness of the AC and KoDP steps. We prove this with the following lemmas, under Action-Coordination Honesty (ACH):
Proof. (i) By VR1 the weight does not depend on , so:
For the strictness of sCBA gives an with and , making the inequality strict. Thus .
(ii) By ACH, ; since maximizes over , it also maximizes over , and uniqueness from (i) forces .
(iii) Determinism of the policy gives ; by VR1, , whence by (ii).
(iv) By CBA, ; by (iii), . The two argmaxes coincide — this is .
Clearly, without VR1 the conclusion fails: with , and , we have , even though dominates pointwise. A comparison of VR1 and full branch independence is in Appendix B.
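The numbers of the counterexample above are not recoverable from this text, so the sketch below uses hypothetical numbers of the same shape: action `a` dominates action `b` in every branch, yet when the branch law depends on the action (VR1 violated), the conditional value of `b` is higher. With a shared branch law (VR1 holding), pointwise dominance carries over to the conditional values.

```python
from fractions import Fraction as F

# Hypothetical value table U[action][branch]; action "a" dominates "b"
# pointwise (it is better in every branch).
U = {"a": {"high": F(10, 10), "low": F(2, 10)},
     "b": {"high": F(9, 10), "low": F(1, 10)}}
assert all(U["a"][br] > U["b"][br] for br in ("high", "low"))


def value(action, branch_law):
    """E[U | action] = sum over branches of P(branch | action) * U(action, branch)."""
    law = branch_law[action]
    return sum(law[br] * U[action][br] for br in law)


# VR1 violated: the branch weights depend on the action taken.
coupled = {"a": {"high": F(1, 10), "low": F(9, 10)},
           "b": {"high": F(9, 10), "low": F(1, 10)}}
# VR1 holds: one and the same branch law for both actions.
shared = {"high": F(1, 2), "low": F(1, 2)}
factorized = {"a": shared, "b": shared}

print(value("a", coupled), value("b", coupled))        # 7/25 41/50
print(value("a", factorized), value("b", factorized))  # 3/5 1/2
```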
Lemma 5. Suppose value tracking of the -block (R2) and MCR over the comparison events of the -block. Then:
and similarly with in place of .
Proof. By the BRIA rule: . Consequently — this is the -decidable event "the argmax of the estimate does not coincide with the argmax of the true value in the -block".
is a finite Boolean combination of -block comparisons, -decidable. Under true prices : the agent chooses if and only if is its estimate-argmax, which by limiting value tracking is the argmax of the true value. Lemma 2 yields the Cesàro-zero conclusion.
The only potential self-reference is in the hypothesis . Universal instantiation transfers to the concrete even under conditioning on ; real-valued coherence forces to cohere with the learned . These are exactly the auxiliary coherence constraints from DHR25 §6; no new generators are consumed.
Now we can assemble everything. We have:
the error in values, which vanishes by R2/Theorem 2,
the error in probabilities (comparisons), which vanishes by R1/Lemma 2,
the errors in the AC and KoDP steps, which vanish if the world possesses the VR1/CBA structure.
The synthesis of these results gives the main theorem of the section.
Theorem 4. Let the agent be a BRIA over a Bayesian logical inductor. Suppose LSM, the structural premises making , exact under (Lemma 4, Lemma 5), and . Then:
Proof. The static chain (Theorem 3) is exact on the first step (FGF) under true prices and becomes an -chain with residuals , arising outside the inner and not amplified by the exact steps. The conclusion, however, is denominated in the round estimates , whereas the chain is written in idealized prices; transferring to introduces one more residual at the FGF step, which we first isolate explicitly.
For : the self-modification event and the non-modification event have the same true value, but the agent's estimates of the two may differ. We define this gap as:
The cell is an -block generator of the value layer. By value tracking (R2), the agent's estimates of these table cells track their true values over time, so .
Similarly, the estimation residual for the non-self-modifying -block is Cesàro-zero by R2. Hence the difference between the estimated and the true form of the bound is Cesàro-zero.
For : Theorem 2 guarantees convergence of the error to .
For : by Lemma 4 is exact under ; Lemma 2 gives for each ; finiteness of and subadditivity close it.
For : Lemma 5 guarantees that, thanks to MCR, the error converges to .
Since each of these error sources is Cesàro-zero, their sum also tends to zero.
Theorem 5. Under LSM, FGF, ACH, VR1, and sCBA under ,
Proof. This is exactly Theorem 4: with the skeleton fixed, supplies R1 and R2, the structural premises give exactness of the good events under , and the assembly of Theorem 4 gives .
We show by contraposition that abandoning either layer breaks the conclusion.
Suppose R2 is violated on some realization schedule : there exists an infinite -decidable with
Then does not converge to along the subsequence on which the pair is chosen with positive frequency, and the regret has nonzero Cesàro mean along it; by Theorem 2 this is exactly a violation of the Cesàro-zeroness of , so the tiling term does not tend to zero. Hence the value layer is indispensable.
Suppose R1 is violated for for some : there exist a -decidable and with
for infinitely many . By Lemma 3 (Appendix A) the inductor is -exploitable, which is incompatible with the logical-induction criterion; equivalently, excluding the comparison layer admits an inductor whose true prices violate CBA, so fails, the AC step collapses by a nonzero amount, and an -trader betting on the mismatch of argmaxes exploits the agent. Hence the comparison layer is indispensable.
Thus both layers are necessary and sufficient, and is minimal.
3.2 Reflective Stability
Theorem 4 was proved by reasoning about the agent's prices. For tiling to be self-applicable, the predecessor must refrain from modification on grounds available to the predecessor itself.
Proposition 1. Every quantity in the tiling bound of Theorem 4 is an entry in the agent's price system . In particular, all residuals are the agent's own credences, not external quantities.
Proof. The proof of Theorem 4 is built along the scheme: transfer of FGF to estimates (the residual is an R2-tracked quantity in ) -chain in Lemma 1 introduces the residuals Lemma 2 gives their Cesàro-zeroness. Every term at every step is a round price of the agent.
Consequently, to act, the agent does not need a credence about its own price of the form . It takes the of , which is itself a price. The tower of beliefs about beliefs never starts, since the action consumes a price, not a judgment about a price.
Lemma 6. Denote the modification successor and the unmodified successor . The reflective gap . Then
where UDT 1.0 is the updateless optimization rule .
Proof. The updatelessness of UDT 1.0 chooses the policy once; in the dynamized version, at each round all coordinates of the committed policy are co-determined by one and the same price system . In particular the -coordinate (and the entire lower part) is evaluated at the round- prices, not by a later estimate with . This follows from updatelessness itself — the predecessor at round values the committed policy as a single object under — and requires neither factorization of the branch probabilities (VR1) nor invariance of values across branches (VR2): the localization of the optimization onto the -term (the content of Theorem 4 of the companion FGJ document, which uses VR1/VR2) is needed only to compute that term, whereas for the gap identity it suffices to have the qualitative fact that both valuations — and — are taken at the prices of one and the same round . Therefore the valuation "keep UDT" is not a nested with , but a single-level round conditional expectation. This is exactly what makes Lemma 6 applicable to the branches of §4, where VR1 is dropped.
Therefore "impose " is by definition, and "keep UDT" is with the co-determined lower part. The downstream is a single-level round coordination event (of type ), not a separately trusted future estimate. The correspondence between the form of the action-at- ( versus ) and the form of the modification ("impose" versus "keep") is exactly the FGF + LSM bundle.
Theorem 6. Under the premises of Theorem 4 and without any additional self-trust assumption:
Proof. By Lemma 6, is the summand of Theorem 4; we apply Theorem 4. The only residual self-reference — the downstream coordination event — is for the -block (Lemma 5), -decidable, ; its residual is Cesàro-zero by Lemma 2. No new generator beyond the value and comparison layers is introduced. In the absence of a value gap, when may have ties and , is controlled directly by the value regret (ties do not lose value), which is Cesàro-zero by Theorem 2; the indicator of is then not needed.
As a result, the minimal tiling condition retains only VR1 as an external axiom. VR1 is needed for Lemma 4, since without it the pointwise dominator may fail to be the agent's marginally optimal action. The sections below consider several directions for removing this constraint.
4. Working with VR1
We see two ways to deal with VR1. On the one hand, we can impose a condition on the structure of prices under the given policy; this essentially means accepting that the agent's globally optimal policy is not guaranteed, since that policy may require the condition to fail. This option is acceptable from the standpoint of pure UDT: it does not presuppose a significant change to the theory, and it simply translates the omniscience problem into the much more realistic problem of branch independence.
On the other hand, we can impose a condition on the agent's action schedule over time. This approach has a goal similar to GU's: by having the agent make an agreement with itself about its own future actions, we solve a significant problem of the decision-making system. In exchange for this constraint, we can guarantee that the optimum is attained without the agent being omniscient, which is why we consider this route promising.
4.1 Bounded Optimality
Definition. A regime satisfies realized cross-block domination if:
Clearly, is weaker than the bundle VR1 + sCBA: there (Lemma 4(ii)), so is a strict weakening. The converse is false: does not entail and may hold under coupled conditional laws.
Definition. A regime is admissible if it satisfies ACH () and under . The set of admissible regimes is ; it is finite ( finite).
Lemma 7. If a regime satisfies ACH and under , then holds under .
Proof. Denote .
(i) By : for every .
(ii) Under the policy is deterministic and with -probability . For any with the event has full probability, so , whence . Here only the determinism of the realized policy is used, not the factorization VR1.
From combined: for all . Coincidence of the functions entails coincidence of their argmaxes — this is .
Note that the cross-block conditional law for is nowhere computed: this is an off-policy counterfactual, conditioning on a probability-zero event. Lemma 7 bypasses it entirely.
Lemma 8. Let be admissible and the event of holding together with the argmax condition:
Then for any -decidable of positive price:
and a violation of R1 for entails -exploitability.
Proof. is a finite Boolean combination of comparisons from the comparison layer : -block comparisons and cross-block comparisons . All of them are -decidable and pinned to the true order by value tracking (R2). Since is admissible, . Lemma 2 gives the vanishing. Necessity: a violation of R1 for entails exploitability by a trader (Lemma 3).
Theorem 7.
Proof. We reproduce the tiling chain of Theorem 4, replacing the single place where VR1 is used (the AC step) with Lemma 7. By admissibility of the regime, is exact under (Lemma 7); Lemma 2 gives for , so is Cesàro-zero. The residuals (Theorem 2) and (Lemma 5), as well as the transfer residual (Step 0 of Theorem 4), are Cesàro-zero independently of VR1. The sum of Cesàro-zero sequences is Cesàro-zero.
Theorem 8. Under the premises of Theorem 7 over a finite :
(i) Membership is established in the Cesàro mean inexploitably (Lemma 8).
(ii) On each tiling with vanishing Cesàro residual holds and is reflectively stable (Theorems 7, 6).
(iii) There exists ; the values and their order are certified by tracking (R2) and are uncontaminable by R1. On tiling is reflectively stable. Moreover , with equality if and only if the -achieving regime is admissible.
Proof. (i) Lemma 8. (ii) Theorem 7 and Theorem 6, quantified over . (iii) is finite, the maximum of is attained; the pairwise comparisons are generators of the comparison layer , pinned to the true order and inexploitable by R1. On we apply (ii). The inequality is by definition of the supremum.
If the environment satisfies VR1, the globally optimal regime falls into (Lemma 4(ii)), and . If VR1 is violated, the agent may end up in an admissible component that does not contain the global optimum, and tiling is guaranteed only within ; this is the price of replacing an axiom about the environment with an internally verifiable condition about the agent.
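Theorem 8(iii) compares the best admissible value with the global ceiling over a finite regime set. The sketch below is a minimal illustration with hypothetical regimes and values; admissibility is supplied as a flag rather than computed, and the admissible set is assumed nonempty, as Theorem 8(iii) requires. When the global maximizer is not admissible, the inequality is strict; when it is admissible, the two values coincide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Regime:
    name: str
    value: float      # hypothetical true long-run value of the regime
    admissible: bool  # hypothetical flag: ACH and realized domination hold


def best_values(regimes):
    """Return (V_adm, V_global) for a finite regime set with a nonempty admissible part."""
    v_global = max(r.value for r in regimes)
    v_adm = max(r.value for r in regimes if r.admissible)
    return v_adm, v_global


# The global maximizer r3 is not admissible, so V_adm < V_global.
regimes = [Regime("r1", 0.6, True), Regime("r2", 0.75, True), Regime("r3", 0.9, False)]
print(best_values(regimes))  # (0.75, 0.9)

# If the maximizer is admissible (as when VR1 holds), the two coincide.
regimes_vr1 = [Regime("r1", 0.6, True), Regime("r3", 0.9, True)]
print(best_values(regimes_vr1))  # (0.9, 0.9)
```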
4.2 Global Optimality
Note that if the agent explores enough, that is, visits every cell of the table infinitely often, then it reaches the global ceiling , and tiling follows from optimality without structural requirements on the table.
We add three setup assumptions about the agent and the environment:
Sequentiality (S) — realization proceeds round by round; note that S does not switch the decision rule from updateless argmax to Bayesian conditioning.
Stationarity (St) — on each the reward is vMWC random with constant mean ; this constraint is a direct consequence of the agent's boundedness.
Long-run criterion (L) — the agent maximizes the Cesàro mean .
Definition. A GLIE (greedy in the limit with infinite exploration) commitment is a -decidable commitment under which: (a) every schedule is infinite: ; (b) the exploration frequency vanishes: .
Lemma 9. Under S, St, and a GLIE commitment, for each the schedule is infinite and -decidable; by St and R2:
Proof. Infiniteness of each — from GLIE(a). Tracking follows from St and R2: by St the reward on is an -boundedly vMWC-random sequence with constant mean ; by the definition of vMWC-randomness (Def. 8 of Oesterheld et al., 2023), for each infinite -decidable subschedule the mean of centered rewards tends to zero, which is the statement of the lemma for .
The cross-block coupling is nowhere computed; the cells are realized and measured directly.
Lemma 10. For each :
The cell is realized under GLIE (Lemma 9), so the GLIE commitment does not require executing any irreversible self-modification.
Proof. The equality is FGF; by LSM, so is a table cell and . Realization is Lemma 9.
Lemma 11. Under S, St, L, and GLIE:
No value gap is assumed.
Proof. By Lemma 9 all cells are covered, in particular the cell of the global maximum . Then the premises of Theorem 2 (Cesàro optimality) are satisfied: a covering BRIA exists, the test sets are infinite and -decidable. We apply Theorem 2. The exploration rounds GLIE(b) have vanishing frequency and contribute .
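The GLIE route of Lemmas 9 to 11 can be illustrated by a toy bandit simulation. This is a sketch under stated assumptions, not the BRIA construction: rewards are stationary Bernoulli draws with hypothetical means (standing in for St), exploration happens on perfect-square rounds (a hypothetical decidable schedule satisfying GLIE(a) and GLIE(b)), and greedy rounds play the cell with the best empirical mean. Every cell is visited infinitely often, the exploration frequency is about 1/sqrt(T), and the average reward approaches the global ceiling without the agent using any structural assumption about the table.

```python
import math
import random


def run_glie(means, T, seed=0):
    """Greedy play with a decidable exploration schedule: on every
    perfect-square round k*k the agent explores cell k mod n; otherwise it
    plays the cell with the highest empirical mean reward."""
    rng = random.Random(seed)
    n = len(means)
    counts = [0] * n
    sums = [0.0] * n
    explore_rounds = 0
    total_reward = 0.0
    for t in range(1, T + 1):
        k = math.isqrt(t)
        if k * k == t:  # exploration round; frequency about 1/sqrt(t)
            cell = k % n
            explore_rounds += 1
        else:           # greedy round on empirical means (0.0 if unvisited)
            cell = max(range(n), key=lambda i: sums[i] / counts[i] if counts[i] else 0.0)
        reward = 1.0 if rng.random() < means[cell] else 0.0  # stationary Bernoulli reward
        counts[cell] += 1
        sums[cell] += reward
        total_reward += reward
    estimates = [s / c for s, c in zip(sums, counts)]
    return total_reward / T, explore_rounds / T, counts, estimates


means = [0.2, 0.5, 0.8]  # hypothetical table; the global ceiling is 0.8
avg, explore_freq, counts, estimates = run_glie(means, 200_000)
print(avg, explore_freq, counts, estimates)
```

Every cell, including the suboptimal ones, keeps receiving exploration samples, so its estimate tracks its true mean, while the exploration rounds contribute only a vanishing share of the average.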
Theorem 9. Under FGF, LSM, S, St, L, , and a GLIE commitment:
No structural assumptions about the table (gap, uniqueness of the maximum, dominator, VR1, ) are used.
Proof. At round the agent realizes with value . The non-self-modifying choice realizes by determinism . By Lemma 10 the value of any modification . Therefore the true preference for modification:
uniformly over (any modifying cell ; is finite). By Lemma 11 , so the true preference is Cesàro-zero.
We must show that , where is the FGF estimation residual.
By FGF (a premise of the problem) the true values coincide exactly: the value of the event equals . The estimation residual is the distance between the agent's estimate and the cell's true value . By Lemma 9 the cell is realized infinitely often under GLIE; by R2 the estimates on the schedule track the true value, so .
Similarly the estimation residual for the non-self-modifying -block is Cesàro-zero by R2 on the -block schedules.
Hence the difference between the estimated and the true form is Cesàro-zero; summing up gives the conclusion.
Theorem 10. Under the premises of Theorem 9:
Proof. By Lemma 6 is the summand of Theorem 9 (the identity holds without VR1, see the proof of Lemma 6); we apply Theorem 9. All quantities are denominated in (Proposition 1); the downstream coordination event for the -block (Lemma 5) is Cesàro-zero by Lemma 2.
For a one-step agent this route is clearly unavailable. However, the GLIE commitment is probably not the only one of its kind, and a broader class of analogous constraints on the agent's behavior is likely possible.
5. Open Questions
Weakening LSM. The present results rely on self-modification pinning a single policy point on the same inductor. Price closedness (Proposition 1) and the identity of Lemma 6 use a single ledger. If the successor merely reproduces the algebra on its own inductor, two price systems arise, and Lemma 6 must be rewritten with an inter-agent valuation. The communication mechanism of Demski (2025) may be relevant here: for example, the predecessor could transmit not a prediction of the successor's actions but the structure of the algebra and the MCR commitment over it.
Controllability of the coupling in the branch. Can one, by choice of policy, place the globally optimal regime into , that is, is the cross-block coupling controllable enough for the realized action to become the global dominator? In other words, it is not yet known whether the restriction imposed by RD can be removed, so that tiling reaches the global optimum without internal commitments.
6. Conclusion
The results apply to different models. If the environment satisfies VR1, the omniscience assumption is removed completely. If VR1 is violated and the agent is one-step, we can guarantee tiling that is optimal among the admissible regimes. If the agent is sequential and committed to exploration, the internal GLIE agreement gives unconditional global tiling.
Appendix A: Proof of Lemma 3
Statement. If R1 is violated for , there exist an infinite -decidable and with
for infinitely many . Then the inductor is -exploitable.
Proof. Let . The trader shorts the conditional contract (payoff , price , settled on ) by a fixed fraction of wealth at each . The wealth multiplier lies in , so everywhere. Using for :
The second term . Along the witnessing subsequence with :
Hence with everywhere. The prices are the agent's own -bounded credences, is -decidable, so the trader is an -trader. Contradiction with the logical-induction criterion. The symmetric case (mean ) — by a long position.
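The trader argument above can be sketched as fractional betting against a miscalibrated price. The toy below assumes a constant price of 0.5 for an event that actually holds on 3 rounds out of every 10 along the subschedule (centered mean −0.2, so the trader goes short); the event pattern, prices, and fraction f = 0.1 are hypothetical, and the function illustrates only the wealth dynamics, not the formal logical-induction criterion. Because the multiplier stays in [1 − f, 1 + f], wealth never reaches zero, yet it grows exponentially.

```python
def short_trader(indicators, prices, f=0.1):
    """Bet a fixed fraction f of wealth against the contract each round:
    wealth *= 1 + f * (price - indicator). With prices and indicators in
    [0, 1] the multiplier stays in [1 - f, 1 + f], so wealth stays positive."""
    wealth, lowest = 1.0, 1.0
    for ind, p in zip(indicators, prices):
        wealth *= 1 + f * (p - ind)
        lowest = min(lowest, wealth)
    return wealth, lowest


T = 1000
# Hypothetical R1 violation on a subschedule: phi holds on 3 rounds out of
# every 10 (frequency 0.3) while the inductor keeps pricing it at 0.5.
indicators = [1 if t % 10 < 3 else 0 for t in range(T)]
prices = [0.5] * T
final, lowest = short_trader(indicators, prices)
print(final, lowest)
```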
Appendix B: sCBA + VR1 is Strictly Weaker than Full Branch Independence
Full branch independence (VR1, VR2, and additive separation with uniquely maximized on ) entails, under ACH, all the hypotheses of Lemma 4; the converse is false. Consequently, is strictly weaker than branch separability.
Proof. Forward. Additive separation entails CBA: does not depend on ; uniqueness of ensures strictness sCBA. VR1 is a direct assumption. ACH is an external premise. All the hypotheses of Lemma 4 are satisfied.
Converse is false. Let:
,
,
, ,
, ,
, ,
, .
Then is the strict dominator sCBA, VR1 is violated (), but
so and holds — without VR1, only through sCBA and the specific coupling . Additive separation does not hold here.
References
A. Demski, N. Hsia, and P. Rapoport. Understanding Trust. Proceedings of ILIAD, Berkeley, 2025. (DHR25)
C. Oesterheld, A. Demski, and V. Conitzer. A Theory of Bounded Inductive Rationality. TARK 2023, EPTCS 379, pp. 421–440, 2023.
S. Garrabrant, T. Benson-Tilsen, A. Critch, N. Soares, and J. Taylor. Logical Induction. arXiv:1609.03543, 2016.
A. Demski. Communication & Trust. Manuscript, September 16, 2025.