Self-Funded Pigouvian Matchmaker as a Mechanism for De-Escalating the AGI Race

Abstract

A formal mechanism is presented in which a willing regulator-matchmaker fosters cooperation on resources among participants in the AGI race, collects a Pigouvian tax based on the speed-up it induces, and invests the proceeds into alignment research. The construction is derived in the continuous-time options framework of Tan (2025) in which cooperation is treated as a jump in the underlying asset value of participating players, the Pigouvian component is matched to the marginal effect of increasing expected loss, and the total collected fund endogenizes the rate of learning on safety. It is shown how the framework allows for determining participation and optimal activity levels.

Conditions under which it is optimal to enter the market are derived. It is proven that if the supported portfolio is orthogonal to the capabilities component, the Suicide Region closes in finite time, and an upper bound on this time is derived as the sum of a deterministic and a random term. Finally, it is proven that if orthogonality is violated, enhancing matchmaker capacity does not restore the result. The construction links research areas including two-sided markets, Pigouvian taxes, self-regulatory organizations, private law enforcement, evolutionary modeling of AI races, real options and option games, measurement of comparative progress, and analysis of the Suicide Region.

Motivation

Tan (2025) sets up the AGI race as a symmetric preemption game in continuous time, developing the earlier framework of Armstrong, Bostrom & Shulman (2016). Each player’s asset value follows a geometric Brownian motion (GBM), and the safety learning function is determined by the length of time safety research runs before the tool is released and by an exogenous learning rate.

The leader and the follower have payoff functions that share an expected-loss component stemming from misalignment. Under the equal-payoff condition that defines the preemption threshold this term cancels, so the preemption threshold is independent of the expected loss. The survival threshold, however, contains the expected loss, and for large losses the preemption threshold falls below the survival threshold. The Suicide Region is defined as the interval of asset values within which the game prescribes a race even though the risk-adjusted present value is negative.

Three types of intervention have been identified:

Privatization of liability (raises the preemption threshold)
Windfall clauses (O'Keefe et al. 2020) (equal split of the gains between follower and leader)
Sprint verification

All of these require an exogenous agent with coercive power, which runs into the problem that no global sovereign exists. Related approaches (e.g., Han et al. 2020, 2021; Cimpeanu et al. 2022) likewise rely on a regulator but leave the central question unanswered: how can a regulator be funded and legitimized when no central authority exists?

We propose a new mechanism with four properties: it involves no coercion (participation is voluntary); it is self-financed from the race surplus itself; the regulator does not take part in the race (this is ensured by transparency and a democratic governance system); and the accumulated funds endogenously raise the learning rate, which shifts the survival threshold downward and eventually closes the Suicide Region.

Formal Model

Let $N \geq 2$ be the number of participants in the AGI race, indexed $i \in \{1, \dots, N\}$ . Each player is associated with an asset-value process $V_{t}^{i}$ :

d V_{t}^{i} = μ V_{t}^{i} d t + σ V_{t}^{i} d Z_{t}^{i}, μ = r_{f} - δ .

The safety learning function is given by $π (τ) = 1 - e^{- λ τ}$ , where $τ$ is the duration of safety research prior to deployment and $λ > 0$ is the exogenous learning rate. All players have the same sunk cost $I$ and face a common misalignment damage $D$ . In the payoffs below, $S$ is the follower's share of the deployed value.

Leader payoff:

L (V_{τ}, τ) = (1 - S) π (τ) V_{τ} - (1 - π (τ)) D - I,

Follower payoff:

F (V_{τ}, τ) = S π (τ) V_{τ} - (1 - π (τ)) D .

Preemption threshold:

V_{P}^{*} = \frac{I}{(1 - 2 S) π (τ)},

Survival threshold:

V_{S}^{*} = \frac{I + (1 - π (τ)) D}{(1 - S) π (τ)} .

The time until deployment by player $i$ is denoted $τ_{i}$ ; deployment is triggered when $V_{t}^{i}$ crosses the strategically active threshold. In Tan's equilibrium this is the preemption threshold and in the social optimum the survival threshold; from here on the two are written $V_{P}^{-}$ and $V_{S}^{-}$ .

Assume that any pair $(i, j)$ may at time $t$ enter into a cooperative deal. We observe a simultaneous jump in both asset values:

V_{t}^{i} \to V_{t}^{i} + \frac{1}{2} (k - 1) (V_{t}^{i} + V_{t}^{j}), V_{t}^{j} \to V_{t}^{j} + \frac{1}{2} (k - 1) (V_{t}^{i} + V_{t}^{j}),

where $k > 1$ is the synergy multiplier. To simplify modeling, we assume the total increment $(k - 1) (V_{t}^{i} + V_{t}^{j})$ is distributed symmetrically and includes joint use of compute, exchange of data, and exchange of non-conflicting architectural discoveries, raising the capitalized value of each participant without splitting it into shares.

Direct cooperation without an intermediary requires payment of search costs $s > 0$ . In context, search means not so much finding a partner and establishing contact (which is non-critical for a small number of participants), but rather finding compatible sets of complementary assets (datasets, compute, non-conflicting architectural discoveries) within each lab's declared portfolio. Also involved are agreement on the structure of IP sharing and the gains from joint work, legal formalization of the deal without disclosing competitively-sensitive information prior to signing, and the establishment of bilateral verification infrastructure for compliance, not relying on a third-party arbiter. Assume that aggregate $s$ is sufficiently large: at typical $(V_{t}^{i}, V_{t}^{j})$

s > β \cdot (k - 1) (V_{t}^{i} + V_{t}^{j}), β \in (0, 1),

that is, direct cooperation is unprofitable at the current equilibrium and empirically rare (Williamson 1979 on transaction costs; Enabling Frontier Lab Coordination 2025 on the current state of inter-lab interaction). This is consistent with small $N$ as an enforcement mechanism: a small number of labs ensures identification of potential partners and observability of agreement violations, but does not eliminate the costs of establishing and verifying each specific deal.

Cooperation is socially negative without internalization of the externality. Increase of $V_{t}^{i}, V_{t}^{j}$ brings the crossing of the fixed thresholds $V_{P}^{-}, V_{S}^{-}$ closer, but does not shift the thresholds themselves. Consequently, expected $τ$ until deployment shrinks, $π (τ)$ decreases, and $(1 - π (τ)) D$ grows. If this acceleration is not internalized via a fee, cooperation increases aggregate expected damage.

The condition for the non-emptiness of the Suicide Region $S = \{V : V_{P}^{-} < V < V_{S}^{-}\}$ follows directly from $V_{P}^{-} < V_{S}^{-}$ :

\begin{matrix} (*) & \frac{I}{(1 - 2 S) π} < \frac{I + (1 - π) D}{(1 - S) π} ⟺ D > \frac{I S}{(1 - 2 S) (1 - π)} . \end{matrix}

The parametric subset

S (π, I, D, S) = \{V : V_{P}^{-} (π) < V < V_{S}^{-} (π)\}

is non-empty if and only if condition $(*)$ holds.

In the "winner-takes-all" limit ( $S \to 0$ ), condition $(*)$ reduces to $(1 - π) D > 0$ , i.e., the Suicide Region is non-empty for any non-zero expected damage and any $π < 1$ .
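As a numerical illustration of the two thresholds and of condition $(*)$ , the following minimal sketch evaluates them directly. The function names and the parameter values are assumptions chosen for illustration only; they are not taken from Tan (2025).

```python
def preemption_threshold(I, S, pi):
    """V_P = I / ((1 - 2S) * pi); assumes 0 <= S < 1/2 and 0 < pi <= 1."""
    return I / ((1.0 - 2.0 * S) * pi)


def survival_threshold(I, S, pi, D):
    """V_S = (I + (1 - pi) * D) / ((1 - S) * pi)."""
    return (I + (1.0 - pi) * D) / ((1.0 - S) * pi)


def suicide_region_nonempty(I, S, pi, D):
    """Condition (*), written without division so that pi = 1 is allowed."""
    return D * (1.0 - 2.0 * S) * (1.0 - pi) > I * S


# Hypothetical parameter values, for illustration only.
I, S, pi, D = 1.0, 0.2, 0.5, 10.0
v_p = preemption_threshold(I, S, pi)
v_s = survival_threshold(I, S, pi, D)
print(v_p, v_s, suicide_region_nonempty(I, S, pi, D))
```

With these values the preemption threshold lies below the survival threshold, so the region is non-empty; lowering $D$ far enough reverses the ordering, in agreement with $(*)$ .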

Proposed Mechanism

The regulator-matchmaker $M$ pairs labs for resource cooperation such as joint compute use, data exchange, and exchange of non-conflicting architectural discoveries, after which it levies a fee $τ_{M} = δ_{M} + r$ per deal. The part $δ_{M}$ covers operational cost and is assumed negligibly small hereafter for simplicity; the part $r$ , the Pigouvian component, is directed into alignment research. The accumulated fund $R (t)$ raises the safety learning rate $λ$ , which shifts the survival threshold $V_{S}^{-}$ downward. At sufficient intensity, $V_{S}^{-}$ descends to $V_{P}^{-}$ and the Suicide Region closes.

For such a construction to work, three conditions must hold.

First, it must be profitable for participants to go through $M$ rather than forgo cooperation or seek a partner around it.
Second, the fee $r$ must be calibrated against the race acceleration that cooperation introduces; otherwise the mechanism subsidizes the very damage it works against.
Third, the accumulated fund must convert into $λ$ rather than leak into the capability drift $μ$ .

This section develops the first two conditions and formalizes the dynamics of $λ (t)$ under the assumption of benign conversion; the third condition is left for §7.

Consider player $i$ 's decision at time $t$ given an available pair $j$ . Cooperation generates a jump of total value $(k - 1) (V_{t}^{i} + V_{t}^{j})$ , distributed symmetrically among participants: labs jointly use compute, exchange datasets, share architectural results that are not in direct competitive conflict. The player has three options, each with its own net jump $Δ V_{i}$ :

Δ V_{i}^{coop, M} = \frac{1}{2} [(k - 1) (V_{t}^{i} + V_{t}^{j}) - (δ_{M} + r)],

Δ V_{i}^{coop, direct} = \frac{1}{2} [(k - 1) (V_{t}^{i} + V_{t}^{j}) - s],

Δ V_{i}^{none} = 0.

Direct cooperation without an intermediary incurs the search costs $s$ , which are empirically high in the sense developed in §2: the difficulty lies not in learning that partners exist but in finding specific complementarities, agreeing on IP sharing, formalizing the deal legally, and verifying compliance bilaterally without a third-party arbiter. Williamson (1979) provides the theoretical framework for these transaction costs; Enabling Frontier Lab Coordination (2025) provides a recent empirical estimate of their magnitude for frontier labs. The regulator reduces them to $s_{M} ≪ s$ , in the limit to zero, since it specializes in precisely these four subtasks. Participation in the $M$ -deal then dominates direct cooperation when

\begin{matrix} (P1) & τ_{M} = δ_{M} + r < s \end{matrix}

and dominates declining cooperation when

\begin{matrix} (P2) & (k - 1) (V_{t}^{i} + V_{t}^{j}) > δ_{M} + r . \end{matrix}

At typical $V_{t}^{i}, V_{t}^{j}$ of large labs and moderate values of $τ_{M}$ , both conditions hold with substantial margin. This is the basic operational hypothesis of the work: for participants with large capitalization, the cooperation gain substantially exceeds any reasonable fee, and the remaining substantive question is the calibration of $r$ .
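The three-way choice can be sketched as a comparison of the net jumps defined above. The function names and all numbers below are hypothetical and serve only to show how $(P1)$ and $(P2)$ select the option.

```python
def net_jumps(k, v_i, v_j, fee, search_cost):
    """Net jump in V_i for each of the three options of player i.

    fee is tau_M = delta_M + r; search_cost is s. The surplus
    (k - 1) * (v_i + v_j) is split symmetrically between the two labs.
    """
    surplus = (k - 1.0) * (v_i + v_j)
    return {
        "via_M": 0.5 * (surplus - fee),
        "direct": 0.5 * (surplus - search_cost),
        "none": 0.0,
    }


def best_option(k, v_i, v_j, fee, search_cost):
    """Option with the largest net jump."""
    jumps = net_jumps(k, v_i, v_j, fee, search_cost)
    return max(jumps, key=jumps.get)


# Hypothetical values: fee below search cost (P1) and below surplus (P2).
print(best_option(1.1, 100.0, 100.0, fee=2.0, search_cost=15.0))
```

When the fee exceeds the search cost the same comparison selects direct cooperation, and when both exceed the surplus it selects no cooperation.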

Cooperation is socially negative without internalization of the externality. An upward jump in $V_{t}^{i}, V_{t}^{j}$ brings the crossing of the fixed thresholds $V_{P}^{-}, V_{S}^{-}$ closer without shifting the thresholds themselves: the expected time until deployment shrinks, $π (τ)$ decreases, and the expected damage $(1 - π) D$ grows. This is the externality to be internalized. It can be quantified as follows. For a process $V_{t}$ following geometric Brownian motion with $μ > σ^{2} / 2$ , the expected first-passage time to a threshold $a > V_{t}$ is $\ln (a / V_{t}) / (μ - σ^{2} / 2)$ , so its sensitivity to the current value is $\partial E τ / \partial V \approx - 1 / (μ V_{t})$ to leading order in $σ$ . In a deal between $i$ and $j$ the total jump $Δ V_{coop} = (k - 1) (V_{t}^{i} + V_{t}^{j})$ is split equally; for labs of comparable size, $V_{t}^{i} \approx V_{t}^{j}$ , each lab's expected waiting time shortens by about $(k - 1) / μ$ , so the combined shortening is $Δ τ \approx - 2 (k - 1) / μ$ . The corresponding marginal increment in expected damage, by the chain rule, equals

Δ SC = - \frac{\partial π}{\partial τ} \cdot Δ τ \cdot D = λ (t) e^{- λ (t) τ} \cdot \frac{2 (k - 1)}{μ} \cdot D .

By the Pigouvian taxation principle (Pigou 1920; Sandmo 1975), the fee must equal the marginal social damage:

\begin{matrix} (P3) & r^{*} = λ (t) e^{- λ (t) τ} \cdot D \cdot \frac{2 (k - 1)}{μ} . \end{matrix}

In simplified intuitive form, $(P3)$ reads $r^{*} \approx (\partial p_{r} / \partial C) \cdot D \cdot Δ C$ : the fee is proportional to the marginal increase in the probability of misalignment per unit increase in capabilities. In Tan's formalism this increase is expressed through the derivative of the safety function with respect to time and through the shortening of the waiting time induced by the jump in $V$ . In aggregate, the expected increment in $(1 - π) D$ caused by the deal is fully covered by the levied fee.
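A minimal sketch of the fee in $(P3)$ , compared with the exact change in expected damage that it linearizes. The function names and parameter values are assumptions for illustration; the comparison only checks that the first-order fee tracks the exact increment when $k - 1$ is small.

```python
import math


def pigouvian_fee(lam, tau, D, k, mu):
    """r* = lam * exp(-lam * tau) * D * 2 (k - 1) / mu, as in (P3)."""
    return lam * math.exp(-lam * tau) * D * 2.0 * (k - 1.0) / mu


def damage_increase(lam, tau, D, k, mu):
    """Exact change in (1 - pi(tau)) * D when tau shrinks by 2 (k - 1) / mu."""
    d_tau = 2.0 * (k - 1.0) / mu
    return D * (math.exp(-lam * (tau - d_tau)) - math.exp(-lam * tau))


# Hypothetical values, for illustration only.
lam, tau, D, k, mu = 0.5, 2.0, 100.0, 1.001, 0.05
print(pigouvian_fee(lam, tau, D, k, mu), damage_increase(lam, tau, D, k, mu))
```

Because $e^{- λ τ}$ is convex in $τ$ , the linearized fee slightly understates the exact increment; the gap shrinks as the synergy $k - 1$ becomes small.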

The fees aggregate into a fund

R (t) = r \int_{t_{0}}^{t} M (s) d s,

where $M (s)$ is the flow of match-deals. In Tan, the safety learning rate $λ$ is exogenous; in our setting, the fund finances alignment research, and we endogenize $λ$ via

\begin{matrix} (3.1, 3.2) & λ (t) = λ_{0} + α R (t), \dot{λ} (t) = α r M (t), \end{matrix}

where $α > 0$ is the efficiency of converting alignment capital into safety learning rate. Under a sustained flow $M (t) \to \bar{M}$ this yields linear growth $λ (t) \sim λ_{0} + α r \bar{M} (t - t_{0})$ and, consequently, $π (τ; t) \to 1$ as $t \to \infty$ for any fixed $τ > 0$ .

Equation $(3.1)$ requires two caveats. The first concerns the interpretation of $π$ . Canonically $π (τ) = 1 - e^{- λ τ}$ is the probability of safe deployment given fixed duration $τ$ and constant $λ$ . With time-varying $λ (t)$ , two interpretations are possible.

The myopic one applies the current $λ (t)$ retroactively to a fixed $τ$ , $π (τ; t) = 1 - e^{- λ (t) τ}$ , and ignores the fact that past smaller values of $λ (s)$ for $s < t$ contributed less to accumulated safety.
The integral one uses the hazard-rate representation $π (t) = 1 - \exp (- \int_{t_{0}}^{t} λ (s) d s)$ , internally consistent and recovering the myopic one under constant $λ$ .

The myopic form overstates $π$ under growing $λ$ , and hence myopic estimates of Suicide Region closure times are lower bounds relative to the integral ones. The main text below is written in myopic notation for compatibility with Tan's formalism; central claims are additionally verified in the integral form.
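The gap between the two interpretations can be seen in a small sketch. It assumes a linearly growing rate $λ (s) = λ_{0} + a (s - t_{0})$ , with $a$ standing for $α r \bar{M}$ , and identifies $τ$ with the elapsed time $t - t_{0}$ so that the two forms are comparable; both choices are illustrative assumptions.

```python
import math


def pi_myopic(lam0, a, elapsed):
    """Apply the current rate lam0 + a * elapsed to the whole elapsed time."""
    lam_now = lam0 + a * elapsed
    return 1.0 - math.exp(-lam_now * elapsed)


def pi_integral(lam0, a, elapsed):
    """Hazard-rate form: integrate the linear rate over the elapsed time."""
    cumulative = lam0 * elapsed + 0.5 * a * elapsed ** 2
    return 1.0 - math.exp(-cumulative)


# Hypothetical values: the myopic form is never below the integral form.
for elapsed in (0.5, 1.0, 3.0):
    print(elapsed, pi_myopic(0.1, 0.2, elapsed), pi_integral(0.1, 0.2, elapsed))
```

With $a = 0$ the two functions coincide, which is the constant-rate case.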

The second caveat concerns the scalar $α$ itself. Contemporary alignment research suffers from a dual-use problem: part of the funded work, nominally directed into $λ$ , actually shifts $μ$ . Hence $α$ in $(3.1)$ implicitly assumes a portfolio orthogonal to the capabilities component, in a sense made precise later through the correlation metric $ρ$ of Ren et al. (2024). Without orthogonality, the sign of the net effect in $(3.1)$ is no longer guaranteed. Orthogonality is therefore treated as an idealization; the later analysis gives it operational content and delineates the regime in which it applies.

With these caveats, conditions $(P1)$ , $(P2)$ , $(P3)$ jointly characterize the mechanism at the micro level: they specify when a participant goes through $M$ and how much $M$ collects from them. The next section verifies that these conditions survive the transition to the equilibrium of the repeated game, that is, that participants do not learn to bypass the regulator after the first match.

Mechanism Robustness in Equilibrium

We showed that in a single round, participation in the $M$ -deal is dominant under $(P1)$–$(P3)$ . But participants do not play a single round. A pair $(i, j)$ , having once established contact through the regulator, can later cooperate directly, saving $τ_{M}$ net of residual search costs. If such bypasses become an equilibrium, the fund $R (t)$ does not fill and the central scheme collapses.

This is a standard private-enforcement problem, and it has a canonical answer. In a discounted repeated game with discount factor $γ \in (0, 1)$ and expected flow $μ_{M}$ of match-deals per unit time, the per-period payoff of a participant in terms of $V_{t}$ -jumps is

Π_{period} \approx μ_{M} [(k - 1) \bar{V} - \frac{1}{2} τ_{M}],

where $\bar{V} = (V_{t}^{i} + V_{t}^{j}) / 2$ . The one-shot gain from bypassing in a single deal is the saved half of the fee, $τ_{M} / 2$ . The cost of bypassing, if the regulator excludes violators upon detection, is the present value of the foregone deal flow,

Φ (γ, μ_{M}) = μ_{M} [(k - 1) \bar{V} - \frac{1}{2} τ_{M}] \cdot \frac{γ}{1 - γ} .

The condition for suppressing bypass:

\begin{matrix} (P4) & \frac{τ_{M}}{2} < μ_{M} [(k - 1) \bar{V} - \frac{1}{2} τ_{M}] \cdot \frac{γ}{1 - γ} . \end{matrix}

As $γ \to 1$ , with $μ_{M} \geq 1$ and $(k - 1) \bar{V} ≫ τ_{M}$ , this condition holds with substantial margin. This formally justifies the intuition about small $N$ as an enforcement mechanism: a small number of frontier labs implies a high effective $γ$ , because future deals within a small group are nearly guaranteed (Bernstein 1992 on the Diamond District; Greif 1993 on the Maghribi). Transparency of $M$ 's decisions and financial flows, developed in §6, ensures observability of violations: bypass becomes common knowledge, and the exclusion threat is backed by coordinated exclusion on the part of the remaining participants.
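Condition $(P4)$ can be checked numerically, and solved for the smallest discount factor that deters bypass. The sketch below uses hypothetical function names and values; the closed form for the critical $γ$ follows from rearranging $(P4)$ and assumes a positive per-period payoff.

```python
def per_period_payoff(tau_M, mu_M, k, v_bar):
    """mu_M * ((k - 1) * v_bar - tau_M / 2), the per-period payoff."""
    return mu_M * ((k - 1.0) * v_bar - 0.5 * tau_M)


def bypass_deterred(tau_M, mu_M, k, v_bar, gamma):
    """Condition (P4): one-shot saving tau_M / 2 is below the forgone flow."""
    flow = per_period_payoff(tau_M, mu_M, k, v_bar)
    return 0.5 * tau_M < flow * gamma / (1.0 - gamma)


def critical_discount_factor(tau_M, mu_M, k, v_bar):
    """Smallest gamma satisfying (P4) with equality; needs a positive payoff."""
    flow = per_period_payoff(tau_M, mu_M, k, v_bar)
    if flow <= 0.0:
        raise ValueError("per-period payoff must be positive")
    return 0.5 * tau_M / (flow + 0.5 * tau_M)


# Hypothetical values: a small fee relative to the cooperation gain.
print(critical_discount_factor(2.0, 1.0, 1.1, 100.0))
```

For a fee that is small relative to $(k - 1) \bar{V}$ the critical discount factor is far below one, which is the sense in which the condition holds with substantial margin.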

A second question arises here. The standard literature on two-sided platforms (Caillaud & Jullien 2003; Rochet & Tirole 2003) indicates that below critical mass such platforms die: network ties do not form, network effects do not activate, and platform value collapses to zero. This creates a classic bootstrap problem: even a socially useful platform may fail to launch if the first participants lack the incentive to join.

The proposed mechanism differs qualitatively. The Pigouvian component $r$ is directed into alignment per transaction, not proportionally to total volume or network effect. Each individual match-deal, including the first, strictly increases $λ (t)$ :

\frac{\partial λ (t)}{\partial M (t)} = α r > 0 \forall M (t) \geq 1.

Hence the regulator brings social benefit from the moment of formation: a single match-deal suffices for $\dot{λ} > 0$ , which begins shifting $V_{S}^{-}$ downward. Critical mass determines the completeness of the deterrence effect, that is, how firmly bypass is blocked under $(P4)$ , not the presence of a safety effect as such. Below critical mass the exclusion threat is weak and bypass is possible, but even a marginally successful regulator remains net positive in terms of social damage.

This argument, however, requires care. All three consequences (bootstrap from one match-deal, monotonic growth of $λ$ , net-positive contribution of each transaction) rest on the positive sign of $\partial λ / \partial M$ . Under dual-use this sign is not guaranteed: part of the fund, nominally directed into alignment, actually shifts not $λ$ but the drift $μ$ of the process $V_{t}$ . When such leakage is large, each additional match-deal accelerates capability more than it raises safety, and the regulator's net contribution is negative. The bootstrap argument holds when the funded portfolio is orthogonal to the capabilities component, and does not hold outside this regime. The binding early-stage constraint is therefore not "finding sufficient seed capital" but "finding a seed portfolio that does not accelerate the race," and calibrating the matchmaker's throughput without control over this condition is useless.

Classic solutions to the critical-mass problem — subsidizing early participants (CAIF, LTFF, Open Philanthropy), cross-side network effects (compute access, liability shield), coordination signals (Frontier Model Forum, MLCommons) — remain applicable as accelerators but not as necessary launch conditions. The very fact of the first deal gives a positive contribution to $λ$ if that deal finances an orthogonal portfolio; without orthogonality the same network effect works on race acceleration. The next section proves that within the orthogonal regime the Suicide Region closes in finite time; the section after formalizes what makes the regime orthogonal and what happens upon exiting it.

Closing the Suicide Region

The mechanism passes muster only if it actually closes the Suicide Region in finite time, otherwise the entire construction remains a rhetorical gesture. This section proves the corresponding statement in three layers: a deterministic estimate of the closure time, a stochastic estimate of the probability of premature entry of $V_{t}$ into the region before its closure, and a formulation of the condition under which the fund attains the critical safety learning rate earlier than the capability leader crosses the preemption threshold.

The canonical safety function $π (τ) = 1 - e^{- λ τ}$ specifies the probability of safe deployment under fixed duration $τ$ and constant $λ$ . With endogenously varying $λ (t)$ , the correct form is the hazard-rate integral $π (t) = 1 - \exp (- Λ (t))$ , $Λ (t) = \int_{t_{0}}^{t} λ (s) d s$ , recovering the canonical expression under constant $λ$ . The myopic form $π (τ; t) = 1 - e^{- λ (t) τ}$ , applying the current $λ (t)$ retroactively, overstates $π$ under growing $λ$ and gives lower bounds on closure times.

We fix the assumptions:

(A1) The function $λ : [t_{0}, \infty) \to \mathbb{R}_{+}$ is continuous, $λ (t_{0}) = λ_{0} \geq 0$ .
(A2) The flow of match-deals $M (t) \geq \bar{M} > 0$ for all $t \geq t_{0}$ .
(A3) Parameters $α, r > 0$ are fixed.
(A4) The fund flow $r M (t)$ bifurcates into safety- and capability- components as

\begin{matrix} (5.0) & \dot{λ} (t) = α (1 - ρ) r M (t), μ (t) = μ_{0} + κ ρ r M (t), \end{matrix}

where $ρ \in [0, 1]$ is the correlation of the funded work with the first principal component of the "model $\times$ benchmark" space (Ren et al. 2024), $κ \geq 0$ is the leakage gain into drift, and $ρ < ρ_{crit}$ (the explicit form is given in formula (6.2) below). At $ρ = 0$ the pure dynamics of §3 is recovered.

(A5) $S \in (0, 1 / 2)$ , asymmetric "leader–follower" structure.

Under (A1)–(A5), from (5.0) we obtain the lower bound

\begin{matrix} (5.1) & \dot{Λ} (t) = λ (t) \geq λ_{0} + α (1 - ρ) r \bar{M} (t - t_{0}) . \end{matrix}

At each $t$ with $π (t) \in (0, 1)$ , time-dependent thresholds are defined

V_{P}^{-} (t) = \frac{I}{(1 - 2 S) π (t)}, V_{S}^{-} (t) = \frac{I + (1 - π (t)) D}{(1 - S) π (t)},

and the Suicide Region $S (t) = (V_{P}^{-} (t), V_{S}^{-} (t))$ , if non-empty. The closure of $S$ is conveniently rewritten as a condition on $π$ directly.

Lemma 1 (algebraic equivalence). $S (t) \neq \emptyset$ if and only if $(1 - π (t)) D > I S / (1 - 2 S)$ .

Proof. $V_{P}^{-} (t) < V_{S}^{-} (t)$ is equivalent to

\frac{I}{(1 - 2 S) π} < \frac{I + (1 - π) D}{(1 - S) π} .

Multiplying by $π (1 - S) (1 - 2 S) > 0$ we obtain $I (1 - S) < (1 - 2 S) I + (1 - 2 S) (1 - π) D$ , whence $I S < (1 - 2 S) (1 - π) D$ . $◼$

Lemma 2 (monotonicity of $π$ ). Under (A1)–(A4) the function $t \mapsto π (t)$ is strictly increasing on $[t_{0}, \infty)$ and $π (t) \to 1$ as $t \to \infty$ .

Proof. $\dot{π} (t) = λ (t) e^{- Λ (t)} > 0$ under (A2)–(A4). From (5.1) it follows that $Λ (t) \geq λ_{0} (t - t_{0}) + \frac{1}{2} α (1 - ρ) r \bar{M} (t - t_{0})^{2} \to \infty$ , whence $π (t) \to 1$ . $◼$

Combining the two lemmas yields the main statement.

Theorem 1 (closure of the Suicide Region). Under (A1)–(A5) there exists $t^{*} < \infty$ such that $S (t) = \emptyset$ for all $t \geq t^{*}$ . In integral form

\begin{matrix} (5.2) & t^{*} \leq t_{0} + \frac{- λ_{0} + \sqrt{λ_{0}^{2} + 2 α (1 - ρ) r \bar{M} L^{*}}}{α (1 - ρ) r \bar{M}}, L^{*} = {[\ln \frac{D (1 - 2 S)}{I S}]}_{+}; \end{matrix}

in myopic form

\begin{matrix} (5.3) & t^{*} \leq t_{0} + \frac{1}{α (1 - ρ) r \bar{M}} {[\frac{L^{*}}{τ} - λ_{0}]}_{+} . \end{matrix}

Proof. When $D (1 - 2 S) \leq I S$ the case is trivial: $L^{*} = 0$ , the condition of Lemma 1 fails already at $π = 0$ , and $S (t) = \emptyset$ for all $t$ . In the non-trivial case $L^{*} > 0$ , closure by Lemma 1 is equivalent to $Λ (t) \geq L^{*}$ . From (5.1) we obtain the quadratic inequality

λ_{0} (t - t_{0}) + \frac{1}{2} α (1 - ρ) r \bar{M} (t - t_{0})^{2} \geq L^{*};

the positive root gives (5.2). The myopic estimate follows from the requirement $λ (t) τ \geq L^{*}$ with linear lower bound $λ (t) \geq λ_{0} + α (1 - ρ) r \bar{M} (t - t_{0})$ . $◼$

The behavior of estimate (5.2) unfolds in two regimes. When the seed rate $λ_{0}$ is large relative to the mechanism's contribution, $λ_{0}^{2} ≫ 2 α (1 - ρ) r \bar{M} L^{*}$ , the expression under the root is dominated by the first term, and $t^{*} - t_{0} \approx L^{*} / λ_{0}$ , a linear dependence on the inverse seed rate. When the seed rate is small, $λ_{0}^{2} ≪ 2 α (1 - ρ) r \bar{M} L^{*}$ , the second term dominates, and $t^{*} - t_{0} \approx \sqrt{2 L^{*} / [α (1 - ρ) r \bar{M}]}$ , an inverse square-root dependence on throughput. The second regime is the one the mechanism is designed for ( $λ_{0}$ small, $\bar{M}$ substantial), and in it doubling the matchmaker's throughput shortens $t^{*} - t_{0}$ only by a factor of $\sqrt{2} \approx 1.41$ , not by half.
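The bounds (5.2) and (5.3) and the square-root scaling can be evaluated with a short sketch. Here the shorthand $a = α (1 - ρ) r \bar{M}$ , the function names, and the numerical values are assumptions introduced for illustration.

```python
import math


def closure_gap(D, S, I):
    """L* = [ln(D (1 - 2S) / (I S))]_+ ; assumes 0 < S < 1/2, D > 0, I > 0."""
    return max(0.0, math.log(D * (1.0 - 2.0 * S) / (I * S)))


def closure_time_integral(lam0, a, L_star):
    """Upper bound on t* - t0 from (5.2); a = alpha (1 - rho) r M_bar > 0."""
    return (-lam0 + math.sqrt(lam0 ** 2 + 2.0 * a * L_star)) / a


def closure_time_myopic(lam0, a, L_star, tau):
    """Upper bound on t* - t0 from (5.3) for a fixed research duration tau."""
    return max(0.0, L_star / tau - lam0) / a


# Hypothetical values, for illustration only.
L_star = closure_gap(D=10.0, S=0.2, I=1.0)
slow = closure_time_integral(0.0, 0.4, L_star)
fast = closure_time_integral(0.0, 0.8, L_star)   # doubled throughput
print(L_star, slow, fast, slow / fast)
```

With $λ_{0} = 0$ the printed ratio is $\sqrt{2}$ , the small-seed regime described above.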

A remark on the boundary of applicability of these results is in order. The thresholds $V_{P}^{-} (t), V_{S}^{-} (t)$ are derived from an instantaneous comparison of payoffs at the moment of deployment under the current $λ (t)$ , without accounting for the option value of waiting under growing $λ$ . In the full option game with time-varying $λ$ , the thresholds ${\tilde{V}}_{P}^{-} (t), {\tilde{V}}_{S}^{-} (t)$ are solutions to an HJB problem with non-constant coefficients under strategic coordination of leader and follower; monotonicity of the waiting option predicts ${\tilde{V}}_{P}^{-} \geq V_{P}^{-}$ , and standard convexity of the value function (Grenadier 2002) gives closure of $\tilde{S}$ no later than the static $S$ under constant parameters. The transfer to non-stationary $λ (t)$ in a strategic environment nevertheless requires a separate proof, since under strong asymmetry ${\tilde{V}}_{P}^{-} < V_{P}^{-}$ is in principle not excluded.

Closure of $S$ as a set in $V$ -space is necessary but not sufficient. A situation is possible in which $V_{t}$ falls into $S (t)$ at some $t < t^{*}$ , and the racing equilibrium triggers deployment before the mechanism closes the region. The stochastic complement estimates the probability of such entry. Denote $ρ_{S} (t_{0}; T) = P [\exists t \in [t_{0}, T] : V_{t} \in S (t)]$ .

Lemma 3 (boundary estimate). $ρ_{S} (t_{0}; t^{*}) \leq P [τ_{V_{P}^{-} (t^{*})} \leq t^{*}]$ , where $τ_{a} = \inf \{t \geq t_{0} : V_{t} \geq a\}$ .

Proof. By Lemma 2, $π (t)$ is increasing, whence $V_{P}^{-} (t) = I / [(1 - 2 S) π (t)]$ is decreasing, and its minimum on $[t_{0}, t^{*}]$ is attained at $t = t^{*}$ . Entry into $S (t)$ requires $V_{t} \geq V_{P}^{-} (t) \geq V_{P}^{-} (t^{*})$ , so the event is a sub-event of $\{τ_{V_{P}^{-} (t^{*})} \leq t^{*}\}$ . $◼$

The standard first-passage formula for GBM (Karatzas & Shreve 1991, §3.5.C; Borodin & Salminen 2002, II.1.1.4) with $V_{t_{0}} = v < a$ gives

\begin{matrix} (5.4) & P [τ_{a} \leq T] = Φ (\frac{\ln (v / a) + μ^{'} T}{σ \sqrt{T}}) + {(\frac{v}{a})}^{1 - 2 μ / σ^{2}} Φ (\frac{\ln (v / a) - μ^{'} T}{σ \sqrt{T}}), \end{matrix}

where $μ^{'} = μ - σ^{2} / 2$ , and the exponent in the second term is written via $μ$ without prime ( $- 2 μ^{'} / σ^{2} = 1 - 2 μ / σ^{2}$ — the canonical Karatzas–Shreve form). Substituting into Lemma 3:

Theorem 2 (stochastic complement). Under (A1)–(A5) and provided $V_{t_{0}} < V_{P}^{-} (t^{*})$

\begin{matrix} (5.5) & ρ_{S} (t_{0}; t^{*}) \leq Φ (d_{+}) + {(\frac{V_{t_{0}}}{V_{P}^{-} (t^{*})})}^{1 - 2 μ / σ^{2}} Φ (d_{-}), \end{matrix}

where $d_{\pm} = [\ln (V_{t_{0}} / V_{P}^{-} (t^{*})) \pm μ^{'} T] / [σ \sqrt{T}]$ , $T = t^{*} - t_{0}$ , $t^{*}$ given by (5.2).
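The first-passage probability (5.4), and with it the bound (5.5), can be evaluated with the standard library alone. The sketch below is illustrative: the function names and parameter values are assumptions, and the normal distribution function is built from `math.erf`.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hit_probability(v, a, mu, sigma, T):
    """P[tau_a <= T] for GBM started at v < a, following (5.4)."""
    mu_prime = mu - 0.5 * sigma ** 2
    scale = sigma * math.sqrt(T)
    log_ratio = math.log(v / a)
    first = norm_cdf((log_ratio + mu_prime * T) / scale)
    weight = (v / a) ** (1.0 - 2.0 * mu / sigma ** 2)
    second = weight * norm_cdf((log_ratio - mu_prime * T) / scale)
    return first + second


# Hypothetical values: start at half the threshold, two horizons.
print(hit_probability(1.0, 2.0, 0.05, 0.2, 1.0))
print(hit_probability(1.0, 2.0, 0.05, 0.2, 10.0))
```

To obtain the bound (5.5), pass $V_{t_{0}}$ as `v`, $V_{P}^{-} (t^{*})$ as `a`, and $T = t^{*} - t_{0}$ as `T`. A longer horizon or a closer threshold both raise the probability, which is why the launch must precede substantial capability progress.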

Thus, the Suicide Region is effectively closed under two conditions:

First, the initial $V_{t_{0}}$ must be substantially below the preemption threshold even after its descent, $V_{t_{0}} ≪ V_{P}^{-} (t^{*})$ . This is a policy-relevant requirement: the mechanism's launch must precede substantial capability progress.
Second, the product $μ^{'} T$ must be bounded above, which couples $\bar{M}$ to $σ, μ$ and provides operational targeting. Under non-stationary volatility (the natural case if recursive self-improvement accelerates capability progress), formula (5.4) generalizes by replacing $σ^{2} T$ with the integrated variance $\int_{t_{0}}^{T} σ^{2} (s) d s$ via the time-change representation of Brownian motion (Mörters & Peres 2010, §1.2). If $σ (t)$ grows linearly or faster, the integrated variance diverges faster than $T$ , and (5.5) yields a growing probability of entry into the Suicide Region even with shrinking $T$ . This is the formal residual we return to in §7.

In other words, when $λ (t)$ reaches the level at which the Suicide Region is closed, alignment becomes a public good, and the strategic advantage of first deployment loses meaning. In Tan's formalism the leader's advantage is given by the parameter $S$ , and as $S \to 0$ the Suicide Region is empty only when $D = 0$ . But in our model, if $π (τ; t^{*}) \approx 1$ for all relevant $τ$ , the expected damage $(1 - π) D \to 0$ , and the effective leader–follower asymmetry vanishes. This is equivalent to the simultaneous action of two interventions: a de facto reduction of $D_{private}$ and the endogenous emergence of the windfall effect ( $S \to 1 / 2$ ) through liquidation of the leader's monopoly premium.

To operationalize this effect, it is convenient to view the regulator as a meta-participant of the race. The accumulated $R (t)$ functions as the resource base of a quasi-player with the goal of maximizing $λ$ . This player does not compete for $V$ but competes to reach the critical level $λ^{*}$ earlier than the capability leader crosses $V_{P}^{-} (λ^{*})$ . From Lemma 1, closure occurs when $Λ (t) \geq L^{*}$ , or equivalently $λ (t) τ \geq L^{*}$ in myopic form. Accordingly

\begin{matrix} (5.6) & λ^{*} = \frac{L^{*}}{τ} = \frac{1}{τ} \ln \frac{D (1 - 2 S)}{I S} \end{matrix}

is a constant independent of ${\bar{V}}_{max}$ . The preemption threshold at $λ = λ^{*}$ equals

\begin{matrix} (5.7) & V_{P}^{-} (λ^{*}) = \frac{I}{(1 - 2 S) π^{*}}, π^{*} = 1 - \frac{I S}{D (1 - 2 S)} . \end{matrix}

The race between the fund and the capability leader is conducted as a comparison of two times. The time from $t_{0}$ until $λ (t)$ reaches $λ^{*}$ we denote $T_{λ}$ ; by Theorem 1, it is bounded above by $t^{*} - t_{0}$ from (5.2) or (5.3). The time from $t_{0}$ until ${\bar{V}}_{max} (t) = E [\max_{i} V_{t}^{i}]$ reaches $V_{P}^{-} (λ^{*})$ we denote $T_{V}$ ; in the deterministic limit ${\bar{V}}_{max} (t) \approx V_{t_{0}} e^{μ (t - t_{0})}$ , whence

\begin{matrix} (5.8) & T_{V} = \frac{Γ}{μ}, Γ = \ln \frac{V_{P}^{-} (λ^{*})}{V_{t_{0}}} . \end{matrix}

The "fund outpaces capability leader" condition:

\begin{matrix} (P5′) & T_{λ} < T_{V} . \end{matrix}

In myopic form with $λ_{0} = 0$ condition (P5′) unfolds as

\begin{matrix} (P5‴) & α (1 - ρ) r \bar{M} > \frac{L^{*} μ}{τ Γ} . \end{matrix}

With (P5‴) the deterministic part is complete: within the orthogonal regime the mechanism closes the Suicide Region in finite time, and with sufficient throughput this occurs before the capability leader crosses the preemption threshold. Everything rests on $(A4)$ , the assumption whose explicit form has not yet been written out. The next section derives this form and proves that when it is violated, scaling $\bar{M}$ does not save the result.
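The comparison of the two times can be sketched numerically from (5.6)–(5.8) in the myopic form with $λ_{0} = 0$ . The function names, the shorthand $a = α (1 - ρ) r \bar{M}$ , and the parameter values are assumptions for illustration; the sketch also assumes $L^{*} > 0$ and $V_{t_{0}} < V_{P}^{-} (λ^{*})$ .

```python
import math


def race_times_myopic(a, tau, mu, v0, I, S, D):
    """Return (T_lambda, T_V) for lambda_0 = 0 in the myopic form.

    a stands for alpha * (1 - rho) * r * M_bar; mu is the drift of V.
    """
    L_star = math.log(D * (1.0 - 2.0 * S) / (I * S))      # from (5.6)
    pi_star = 1.0 - I * S / (D * (1.0 - 2.0 * S))         # from (5.7)
    v_p_star = I / ((1.0 - 2.0 * S) * pi_star)            # from (5.7)
    gap = math.log(v_p_star / v0)                         # Gamma in (5.8)
    return L_star / (tau * a), gap / mu


def fund_outpaces_leader(a, tau, mu, v0, I, S, D):
    """Condition (P5'): the fund reaches lambda* before the leader reaches V_P."""
    t_lambda, t_v = race_times_myopic(a, tau, mu, v0, I, S, D)
    return t_lambda < t_v


# Hypothetical values, for illustration only.
print(race_times_myopic(0.4, 2.0, 0.05, 0.5, 1.0, 0.2, 10.0))
print(fund_outpaces_leader(0.4, 2.0, 0.05, 0.5, 1.0, 0.2, 10.0))
```

Lowering `a` far enough reverses the comparison, which is the throughput requirement expressed by (P5‴).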

Robustness to Dual-Use and the Role of the Orthogonal Regime

Capability and alignment are not canonically distinguishable: interpretability research is dual-use (Bostrom 2002; CAIF funding principles), and the scalar $α$ in the pure form of §3 conceals the share of the fund that is nominally directed into alignment but actually shifts $μ$ .

The decomposition (5.0) fixes $ρ \in [0, 1]$ as the correlation of the funded portfolio with the first principal component of "model $\times$ benchmark" space in the sense of Ren et al. (2024, Safetywashing). At $ρ = 0$ the portfolio is orthogonal to the capabilities component (pure differential progress; Hendrycks & Mazeika 2022). As $ρ \to 1$ the mechanism converts cooperation surplus directly into drift and is strictly worse than inaction.

Substituting (5.0) into condition (P5‴) with now explicit $μ = μ_{0} + κ ρ r \bar{M}$ :

α (1 - ρ) r \bar{M} > \frac{L^{*} (μ_{0} + κ ρ r \bar{M})}{τ Γ},

whence after regrouping

\begin{matrix} (6.1) & r \bar{M} [τ Γ α (1 - ρ) - L^{*} κ ρ] > L^{*} μ_{0} . \end{matrix}

For the existence of $\bar{M} > 0$ satisfying (6.1), the coefficient in brackets must be positive:

\begin{matrix} (6.2) & τ Γ α (1 - ρ) > L^{*} κ ρ ⟺ ρ < ρ_{c r i t} := \frac{τ Γ α}{τ Γ α + L^{*} κ} . \end{matrix}

This gives the threshold through which (A4) is operationalized. Substantively, $ρ_{crit}$ depends on three dimensionless quantities and one dimensional coefficient:

The logarithmic capability gap $Γ$ raises $ρ_{c r i t}$ ; if deployment is far off, leakage is less dangerous.
The safety gap $L^{*}$ lowers $ρ_{c r i t}$ ; more work for the fund, more sensitivity to leakage.
The ratio $α / κ$ is the efficiency of safety to leakage gain: raises $ρ_{c r i t}$ as it grows. The dimensional coefficient $τ$ enters through the product $τ Γ α$ .
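The comparative statics listed above can be checked directly on (6.2). The following is a minimal sketch; the function name and the numerical values are hypothetical and serve only to show the direction of each effect.

```python
def rho_crit(tau, Gamma, alpha, L_star, kappa):
    """Threshold from (6.2): tau*Gamma*alpha / (tau*Gamma*alpha + L* * kappa)."""
    safety_side = tau * Gamma * alpha
    return safety_side / (safety_side + L_star * kappa)


# Hypothetical baseline values, for illustration only.
base = dict(tau=4.0, Gamma=1.0, alpha=0.5, L_star=2.0, kappa=1.0)

print("baseline:        ", rho_crit(**base))
print("larger Gamma:    ", rho_crit(**{**base, "Gamma": 2.0}))
print("larger L*:       ", rho_crit(**{**base, "L_star": 4.0}))
# Scaling alpha and kappa by the same factor leaves the ratio alpha/kappa,
# and therefore rho_crit, unchanged.
print("same alpha/kappa:", rho_crit(**{**base, "alpha": 1.0, "kappa": 2.0}))
```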

Theorem 3 (non-rescuability by scaling). Under (A1)–(A5) in the myopic safety approximation:

(a) If $ρ < ρ_{crit}$ , there exists ${\bar{M}}_{0} (ρ) < \infty$ such that (P5′) holds for all $\bar{M} > {\bar{M}}_{0} (ρ)$ , with

\begin{matrix} (6.3) & {\bar{M}}_{0} (ρ) = \frac{L^{*} μ_{0}}{r [τ Γ α (1 - ρ) - L^{*} κ ρ]} . \end{matrix}

(b) If $ρ \geq ρ_{crit}$ , for any $\bar{M} > 0$ (P5′) fails. Increasing the matchmaker's throughput does not restore the condition.

(c) If $ρ < ρ_{crit}$ , the required throughput ${\bar{M}}_{0} (ρ)$ is strictly increasing in $ρ$ :

\frac{\partial {\bar{M}}_{0}}{\partial ρ} = \frac{L^{*} μ_{0} (τ Γ α + L^{*} κ)}{r [τ Γ α (1 - ρ) - L^{*} κ ρ]^{2}} > 0.

The only restoration lever is reduction of $ρ$ .

Proof. (b) At $ρ \geq ρ_{crit}$ the coefficient $τ Γ α (1 - ρ) - L^{*} κ ρ \leq 0$ , so the LHS of (6.1) is non-positive for any $\bar{M} > 0$ , while the RHS $L^{*} μ_{0} > 0$ . Condition (P5′) fails. (a) At $ρ < ρ_{crit}$ the coefficient is strictly positive, and (6.1) holds for $\bar{M} > {\bar{M}}_{0} (ρ)$ from (6.3). (c) Direct differentiation of (6.3). $◼$
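The three parts of Theorem 3 can be exercised numerically. The sketch below reads the numerator of (6.3) as the product $L^{*} μ_{0}$, which is what substitution of $μ = μ_{0} + κ ρ r \bar{M}$ into (P5‴) yields. It returns an infinite threshold in case (b) and compares the closed-form derivative with a central finite difference. Function names and parameter values are hypothetical.

```python
import math


def bracket(rho, tau, Gamma, alpha, L_star, kappa):
    """Coefficient in (6.1): tau*Gamma*alpha*(1-rho) - L* * kappa * rho."""
    return tau * Gamma * alpha * (1.0 - rho) - L_star * kappa * rho


def M_bar_0(rho, r, mu0, tau, Gamma, alpha, L_star, kappa):
    """Minimum throughput from (6.3); math.inf when rho >= rho_crit (case b)."""
    b = bracket(rho, tau, Gamma, alpha, L_star, kappa)
    if b <= 0.0:
        return math.inf
    return L_star * mu0 / (r * b)


def dM_bar_0_drho(rho, r, mu0, tau, Gamma, alpha, L_star, kappa):
    """Closed-form derivative of (6.3) with respect to rho, valid for rho < rho_crit."""
    b = bracket(rho, tau, Gamma, alpha, L_star, kappa)
    if b <= 0.0:
        raise ValueError("derivative is only defined for rho < rho_crit")
    return L_star * mu0 * (tau * Gamma * alpha + L_star * kappa) / (r * b ** 2)


# Hypothetical parameter values; with these, rho_crit = 0.5.
kw = dict(r=0.1, mu0=0.5, tau=4.0, Gamma=1.0, alpha=0.5, L_star=2.0, kappa=1.0)

for rho in (0.0, 0.25, 0.4, 0.5, 0.9):
    print(rho, M_bar_0(rho, **kw))

# Central finite difference as an independent check of the derivative.
h = 1e-6
numeric = (M_bar_0(0.25 + h, **kw) - M_bar_0(0.25 - h, **kw)) / (2.0 * h)
print(numeric, dM_bar_0_drho(0.25, **kw))
```

The required throughput rises steeply as $ρ$ approaches $ρ_{crit}$ from below and is infinite from that point on, which is the numerical face of the statement that only a reduction of $ρ$ restores the condition.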

Theorem 3 is formulated in the myopic safety approximation, and its transfer to the strict hazard-rate form qualitatively changes the picture of the asymptotics of $\bar{M}$ . In the hazard-rate form with $λ_{0} = 0$ we have $T_{λ} \sim \sqrt{2 L^{*} / [α (1 - ρ) r \bar{M}]} = O ({\bar{M}}^{- 1 / 2})$ (square-root reduction due to the quadratic accumulation $Λ (t) = \int λ (s) d s$ ), while $T_{V} = Γ / (μ_{0} + κ ρ r \bar{M}) = O ({\bar{M}}^{- 1})$ when $κ ρ > 0$ . Then

T_{λ} / T_{V} = O ({\bar{M}}^{1 / 2}) \to \infty \quad \text{as } \bar{M} \to \infty,

that is, in the strict form asymptotic increase of matchmaker throughput does not satisfy (P5′) when $ρ > 0$ — on the contrary, as $\bar{M} \to \infty$ the gap $T_{V} - T_{λ}$ becomes increasingly negative. Dual-use leakage in (5.0) scales linearly with $\bar{M}$ through the drift $κ ρ r \bar{M}$ , whereas safety accumulation through the integrated $Λ (t)$ gives only a sublinear reduction in $T_{λ}$ (early $λ (s)$ are small and drag the average down), and at large $\bar{M}$ leakage outpaces accumulation.
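The asymptotic statement can be illustrated with the two expressions for $T_{λ}$ and $T_{V}$ given above. This is a minimal sketch under the same hazard-rate approximation with $λ_{0} = 0$; the function names and parameter values are hypothetical.

```python
import math


def T_lambda_hazard(L_star, alpha, rho, r, M_bar):
    """Hazard-rate form: T_lambda ~ sqrt(2 L* / (alpha (1-rho) r M_bar))."""
    return math.sqrt(2.0 * L_star / (alpha * (1.0 - rho) * r * M_bar))


def T_V_with_leakage(Gamma, mu0, kappa, rho, r, M_bar):
    """T_V = Gamma / (mu0 + kappa rho r M_bar), drift including dual-use leakage."""
    return Gamma / (mu0 + kappa * rho * r * M_bar)


def time_ratio(M_bar, rho, L_star=2.0, alpha=0.5, r=0.1,
               Gamma=1.0, mu0=0.5, kappa=1.0):
    """T_lambda / T_V; (P5') requires this ratio to be below 1."""
    return (T_lambda_hazard(L_star, alpha, rho, r, M_bar)
            / T_V_with_leakage(Gamma, mu0, kappa, rho, r, M_bar))


# Hypothetical grid of throughputs: with leakage (rho = 0.2) and without (rho = 0).
for M_bar in (1.0, 1e2, 1e4, 1e6):
    print(M_bar, time_ratio(M_bar, rho=0.2), time_ratio(M_bar, rho=0.0))
```

In this sketch the ratio is proportional to $μ_{0} {\bar{M}}^{- 1 / 2} + κ ρ r {\bar{M}}^{1 / 2}$. For $ρ = 0$ it falls monotonically in $\bar{M}$, while for $ρ > 0$ it falls only up to $\bar{M} = μ_{0} / (κ ρ r)$ and grows as ${\bar{M}}^{1 / 2}$ beyond that point.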

Substantively the practical conclusion is preserved, and scaling $\bar{M}$ as a lever loses on two independent counts:

realistic limits on matchmaker throughput bound $\bar{M}$ above empirically
in the strict hazard-rate form, asymptotic scaling is counterproductive at any $ρ > 0$ and requires pointwise calibration within a finite window.

In essence, control over $ρ$ becomes the only reliable lever, which transforms the matchmaker calibration problem into the problem of controlling $ρ$ . Control is operationalized as a disbursement mandate: $M$ disburses $R (t)$ only into projects with measured ${\hat{ρ}}_{p r o j} \leq \bar{ρ}$ , where $\bar{ρ} < ρ_{c r i t}$ is a chosen threshold with a margin, $\hat{ρ}$ is the correlation of the funded work with the capabilities component according to Ren et al. (2024). This turns Bostrom's differential progress criterion from an intuitive directive into a falsifiable metric with per-grant verification. The effectiveness of the gate depends on how publicly observable $\hat{ρ}$ is and how verifiable disbursement decisions are, i.e., on institutional structure.
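The disbursement mandate can be sketched as a simple filter. The example below is illustrative only: the `Project` record, the function names, and the funding-weighted average used as a portfolio-level $ρ$ are assumptions of this sketch, not definitions from the text or from Ren et al. (2024).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    name: str
    rho_hat: float    # measured correlation with the capabilities component
    requested: float  # requested funding, arbitrary units


def disbursement_gate(projects, rho_bar, rho_crit):
    """Keep only projects with rho_hat <= rho_bar, requiring a margin rho_bar < rho_crit."""
    if not rho_bar < rho_crit:
        raise ValueError("the mandate threshold rho_bar must lie strictly below rho_crit")
    return [p for p in projects if p.rho_hat <= rho_bar]


def portfolio_rho(funded):
    """Funding-weighted mean of rho_hat (a hypothetical aggregation rule)."""
    total = sum(p.requested for p in funded)
    if total <= 0.0:
        raise ValueError("no funded volume to aggregate")
    return sum(p.rho_hat * p.requested for p in funded) / total


# Hypothetical applications, for illustration only.
projects = [
    Project("A", rho_hat=0.1, requested=100.0),
    Project("B", rho_hat=0.5, requested=50.0),
    Project("C", rho_hat=0.3, requested=100.0),
]
funded = disbursement_gate(projects, rho_bar=0.3, rho_crit=0.5)
print([p.name for p in funded], portfolio_rho(funded))
```

Because every funded project satisfies $\hat{ρ} \leq \bar{ρ}$, any weighted average of the funded set also stays at or below $\bar{ρ}$, so the choice of aggregation rule does not affect whether the portfolio remains inside the margin.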

The operational gate shifts $ρ$ but does not zero it out. As of 2026 there is no canonical criterion of differential interpretability: the safety value of interpretability remains a matter of open dispute, the symmetry between detection of jailbreaks and their construction is not operationalized. Part of the funded portfolio has a lower bound $ρ_{min} > 0$ not reducible by available methods. At first glance this is a critical vulnerability, since formulas (6.1)–(6.3) are derived for $ρ < ρ_{c r i t}$ , and the subspace with $ρ_{min}$ falls outside the scope of Theorems 1–3. The substantive question, however, is not "does the subspace fall out" but "relative to what benchmark is the loss assessed". The improvement bound in (6.2) is defined relative to the idealized point $ρ = 0$ , and relative to it $ρ_{min}$ does indeed bound the achievable, but in reality $ρ = 0$ is not the empirical counterfactual. As of 2026, alignment research is financed by philanthropic foundations, government grants, and lab internal budgets without unified $\hat{ρ}$ -filtering; allocation occurs across projects with unmeasured correlation to the capabilities component, and part of this funding empirically works in the dual-use direction, which Ren et al. (2024) record as the motivation for their metric. Accordingly, the relevant benchmark for evaluating the mechanism is the status quo.

Relative to the status quo, the subspace with $ρ_{min}$ ceases to be a subspace of pure loss. In the worst case, with a generously set $\bar{ρ}$ and weak measurement protocol, the mechanism reproduces the existing distribution of alignment funding: $\dot{λ}$ grows approximately as it does under current philanthropic channels. In the typical case, per-grant measurement of $\hat{ρ}$ provides finer resolution than the disjoint decisions of individual funds, and allocation monotonically improves relative to baseline. What we record in the formal part as a rollback to exogenous interventions ( $D_{p r i v a t e}$ , verification) on the subspace with $ρ_{min}$ in fact means a rollback to the same instruments which on this subspace already work in the status quo by default: the guarantee of Theorems 1–3 fails, but what would be in place without it is not made worse either.

The apparent asymmetry, that undirected alignment funding is credited as a plus in the current literature, while in our analysis the same channel figures as a constraint, is an artifact of the reference point. The same flow of funds receives three different signs depending on the benchmark: relative to inaction unconditionally positive, relative to the ideal $ρ = 0$ bounded above, relative to the status quo a Pareto improvement, which turns this result from a formal one into an operational one.

Regulator Governance

The operational gate requires institutional conditions:

$\hat{ρ}$ must be publicly observable
disbursement decisions must be verifiable
expertise to evaluate $\hat{ρ}$ must be embedded in decision-making; the organization's structure must be formed by its participants directly.

Transparency works as a commitment device (Schelling 1960; Kydland & Prescott 1977): all flows $R (t)$ , all allocation decisions, all participant-admission decisions are public; deviation from the declared mandate is immediately observable. This creates necessary but not sufficient conditions for resolving the Trust AI Regulation problem (Alalawi et al. 2026): transparency makes deviation observable, but coordinated response of participants (exit from the platform) itself is subject to the collective action problem, since a lab continuing to receive cooperation surplus has an individual incentive to stay even after public deviation by the regulator. Sufficiency is achieved by the same structural condition that suppresses bypass in (P4): small $N$ , repeated interactions with high effective $γ$ , and focal coordination in exclusionary behavior (Schelling 1960; Bernstein 1992; Greif 1993). Transparency of $\hat{ρ}$ plays a dual role here: in addition to controlling the orthogonal regime, it transforms the exit decision from a private choice into a publicly observable signal, making the coordinated response equilibrium by the same mechanism that supports (P4). Under these conditions, the de-escalating equilibrium is attainable. The bypass-suppression condition from $(P 4)$ is combined with the publicity of $\hat{ρ}$ in one signal: "the mechanism works in its declared regime" is equivalent to " $\hat{ρ} \leq \bar{ρ}$ is publicly confirmed".

$M$ 's decisions on calibration of $r^{*}$ , allocation of $R (t)$ , and admission of participants are made through a delegative-voting procedure. The literature on liquid democracy (Kahng, Mackenzie & Procaccia 2021; Brill & Talmon 2018; Christoff & Grossi 2017) gives formal conditions for its stability. Participants have heterogeneous expertise in alignment relative to capabilities; the delegative structure allows peripheral participants to delegate decisions to experts, and transparency of delegations preserves accountability. Among classes of governance (liquid democracy; futarchy, Hanson 2013; quadratic voting, Weyl 2017; sortition, Landemore 2020), delegative structures form a family of focal Schelling equilibria under the conditions of transparency and voluntary participation. The same expert delegation is the institutional carrier of the $\hat{ρ}$ evaluation. This is useful when adding new participants and when forming the most accurate possible estimates of $\hat{ρ}$ through aggregation of diversified expert judgments (Hong & Page 2004; Surowiecki 2004), which gives the operational gate a strict advantage over the status quo of disjoint decisions by individual funds. With subsequent narrowing of $\bar{ρ}$ as measurement standards consolidate, the distribution of funds improves monotonically relative to baseline; moreover, such a structure facilitates external and internal audits and is convenient for the application of benchmarks.

There are direct institutional analogs confirming feasibility. Self-regulatory organizations in the financial sector (DeMarzo, Fishman & Hagerty 2005) implement the same structure of financing and legitimation without coercion. Patent pools (Lerner & Tirole 2004) give an analog for the problem of joint IP use, with the caveat that Lerner & Tirole list the direct interest of participants in blocking the rest as a condition of pool failure. In our case, this condition is mitigated by (P4) and the orthogonal regime: blocking others through $M$ requires bypassing the gate, which is observable under transparency and triggers exclusion from the subsequent flow of deals. A carbon tax with revenue recycling (Bovenberg & van der Ploeg 1994) gives an analog of the double dividend: the fee simultaneously internalizes the externality and finances a public good.

Governance closes the gap between the formal mechanism and its operational implementation. Under a transparent delegative structure, the gate $\hat{ρ} \leq \bar{ρ}$ works, (P4) holds, and the central results of §5–§6 acquire an institutional carrier. What remains to verify is robustness of the mechanism to two classes of threats unrelated to governance: nonlinear capability acceleration and transition to a full-information regime.

Non-stationary Capability and Information Regime

Under a correctly functioning orthogonal regime and governance, two classes of residual threats require separate analysis.

Non-stationarity of capability progress: if $μ$ and $σ$ grow over time, the race of two monotonically growing quantities ( $λ$ and ${\bar{V}}_{max}$ ) is decided by the ratio of their derivatives, and Theorem 1 does not automatically guarantee the fund's win.
Verification systems can move the game to a full-information regime, which in the endgame may cause breakout instability.

We address both classes and show that the first partially self-corrects via Pigouvian rate recalibration, leaving only the volatility channel as an open residual, and the second adds no independent fragility on top of the first. Non-stationarity of capability decomposes into two channels:

Drift channel: when $μ$ and ${\bar{V}}_{max}$ grow, the Pigouvian rate is calibrated proportionally to the magnitude of the externality itself, $r^{*} \propto (k - 1) (V_{i} + V_{j})$ from (P3). Then $\dot{λ} = α r^{*} M (t) \propto {\bar{V}}_{max} (t) M (t)$ , while the time to reach the preemption threshold under ${\dot{\bar{V}}}_{max} \propto μ {\bar{V}}_{max}$ gives $T_{V} \propto \ln (V_{P}^{*} / V_{t_{0}}) / μ$ . Both sides of (P5‴) scale with ${\bar{V}}_{max}$ — the tax base grows together with what the race is against. Pure drift acceleration does not flip the sign of (P5′): the drift channel self-corrects when $ρ = 0$ . When $ρ > 0$ the same ${\bar{V}}_{max}$ -normalization works against the fund: leakage rides on the same base from the opposite side, and self-correction does not occur outside the orthogonal regime — this is already formalized by Theorem 3.
Volatility channel: a residual that survives §5 and is not closed by recalibration. Under non-stationary $σ (t)$ , crossing of $V_{P}^{-} (t)$ occurs as the first passage of a fluctuation whose intensity is uncoupled from the flow of match-deals $M (t)$ and from the accumulated $R (t)$ . The breakthrough becomes a tail event not correlated with $\int M$ , and monotonicity of $λ (t)$ does not provide protection against such a sprint; what is missing is a bound on $P [\text{first passage before } λ (t) \geq λ^{*}]$ under non-stationary $σ (t)$ . This is a stochastic first-passage problem with a time-varying boundary and self-damping dynamics of $λ$ , not covered by the deterministic race of rates in (P5′) and in Theorem 3. It is moved to open questions.
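The quantity at stake in the volatility channel can at least be estimated by simulation. The sketch below is a strongly simplified illustration and not a substitute for the missing bound. It assumes a single GBM path for the leader, a fixed rather than time-varying barrier, a deterministic horizon standing in for $T_{λ}$, and a hypothetical volatility schedule `sigma_fn`. All names and values are placeholders.

```python
import math
import random


def first_passage_prob(log_gap, mu, sigma_fn, horizon,
                       n_paths=2000, dt=0.01, seed=0):
    """Monte Carlo estimate of P[V hits the barrier before `horizon`].

    log_gap  : ln(barrier / V_0), assumed fixed in this sketch
    sigma_fn : callable t -> sigma(t), the (hypothetical) volatility schedule
    The log-price is advanced with an Euler step of the GBM dynamics.
    """
    if log_gap <= 0.0:
        return 1.0  # already at or above the barrier
    rng = random.Random(seed)
    steps = int(round(horizon / dt))
    hits = 0
    for _ in range(n_paths):
        x = 0.0  # x = ln(V_t / V_0)
        for i in range(steps):
            s = sigma_fn(i * dt)
            x += (mu - 0.5 * s * s) * dt + s * math.sqrt(dt) * rng.gauss(0.0, 1.0)
            if x >= log_gap:
                hits += 1
                break
    return hits / n_paths


# Hypothetical comparison: constant versus growing volatility over the same horizon.
p_const = first_passage_prob(1.0, 0.5, lambda t: 0.3, horizon=1.0)
p_grow = first_passage_prob(1.0, 0.5, lambda t: 0.3 * (1.0 + t), horizon=1.0)
print(p_const, p_grow)
```

With zero volatility the estimate reduces to the deterministic race: the barrier is hit if and only if $μ$ times the horizon reaches the log gap.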

Here we also describe the internal tension of Pigouvian recalibration. Under continuous recalibration $r^{*} (λ (t))$ from (P3), the safety dynamics becomes a self-damping nonlinear ODE:

\dot{λ} (t) = α M (t) D \frac{2 (k - 1)}{μ} λ (t) e^{- λ (t) τ},

where the factor $e^{- λ τ}$ chokes growth at large $λ$ . The linear growth from §3 holds only under fixed $r$ . This simultaneously strengthens the volatility residual (a slowing $λ$ wins the race against accelerating capability less well) and provides a constructive lever: fixing $r$ at the project level without downward recalibration preserves linear growth at the cost of moderate over-funding in alignment at late stages. This is an error in a benign direction. Thus, non-stationarity of capability progress narrows down to analysis of the volatility channel.
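The contrast between continuous recalibration and a fixed rate can be seen by integrating the ODE numerically. The following is a minimal forward-Euler sketch that assumes constant $M$; the function names, step size, and parameter values are hypothetical. In this sketch $λ = 0$ is a fixed point of the recalibrated ODE, so a small positive seed value is used.

```python
import math


def recalibrated_rate(lam, D, k, mu, tau):
    """r*(lambda) as it enters the ODE: D * 2(k-1)/mu * lambda * exp(-lambda*tau)."""
    return D * 2.0 * (k - 1.0) / mu * lam * math.exp(-lam * tau)


def integrate_lambda(lam0, alpha, M, D, k, mu, tau, T, dt=0.01, fixed_r=None):
    """Forward-Euler path of lambda on [0, T].

    fixed_r=None  : continuous recalibration, d(lambda)/dt = alpha * M * r*(lambda)
    fixed_r=value : rate frozen at `value`, giving linear growth of lambda
    """
    lam = lam0
    path = [lam]
    for _ in range(int(round(T / dt))):
        r = fixed_r if fixed_r is not None else recalibrated_rate(lam, D, k, mu, tau)
        lam += alpha * M * r * dt
        path.append(lam)
    return path


# Hypothetical parameter values, for illustration only.
common = dict(alpha=0.5, M=1.0, D=10.0, k=1.2, mu=0.5, tau=4.0, T=10.0)
peak_rate = recalibrated_rate(1.0 / common["tau"], D=10.0, k=1.2, mu=0.5, tau=4.0)

recalibrated = integrate_lambda(0.05, **common)
frozen = integrate_lambda(0.05, fixed_r=peak_rate, **common)
print(recalibrated[-1], frozen[-1])
```

The multiplier $λ e^{- λ τ}$ peaks at $λ = 1 / τ$, so freezing the rate at or above that level keeps $λ$ on a straight line while the recalibrated path flattens.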

Verification systems can move the game to a full-information regime, which in the endgame accelerates the sprint to deployment — breakout instability. A straightforward reading as "full information requires a stricter $r^{*}$ and earlier bootstrap" errs in the sign of the effect. By the transparency requirement of §7, the state of the fund $(R (t), λ (t))$ is public by construction in all information regimes — it is not a private signal disclosed by verification, but by definition common knowledge. Rivals' capability leads $V^{i}$ are private without verification and public with it. The transition to full information removes the private informational advantage of rivals without changing the fund's position. The fund's relative informational position improves monotonically with growing openness.

Mechanically this sharpens the inversion of the leader's advantage described in §5. Under incomplete information, the collapse of the preemption incentive after $λ (t) \geq λ^{*}$ is delayed and noised by strategic uncertainty: a player continues racing without common knowledge that rivals will not deploy unsafely. Under public $λ (t)$ , the moment of crossing $λ^{*}$ becomes common knowledge, and the preemption game unwinds by iterated dominance immediately and in coordination. Full information accelerates breakout, but equally accelerates the coordinated stand-down on which §5 is built; accordingly, the information regime adds no independent fragility.

After the two corrections, the residual reduces to the conjunction of the timing residual of the volatility channel with the interpretability residual of §6: common knowledge of $λ (t)$ is useful exactly to the extent that it is common knowledge that $R (t)$ buys safety, not capability — i.e., to the extent that $\hat{ρ}$ is publicly credible.

Open Problems

Government programs (Manhattan-style national AI initiatives of the US and China) can in principle bypass a voluntary regulator. This is an analog of the problem of international climate negotiations (Barrett 2003) and resistance of sovereign actors to soft-law regimes. However, there is a substantial mitigation specific to the AI domain: even in the absence of $M$ , a direct match between major government programs is highly unlikely under current political conditions. Export restrictions, compute controls, outbound investment controls (Executive Order 14105 and analogs), CFIUS reviews, Chinese counter-symmetric restrictions turn any R&D cooperation into an act requiring multi-level approval with a predictably negative outcome. The public rhetoric of both sides frames AI development as geopolitical competition with a zero-sum component. State actors are de facto already in a non-cooperation equilibrium relative to each other, and the potential synergy multiplier for them is already lost — but not converted into alignment. The threat of bypass by state actors is less than it appears on a naive reading: they do not bypass $M$ while keeping the cooperation gain, they are already outside of cooperation regardless of $M$ . A more realistic picture is to view $M$ as a network superstructure over those participants for whom inter-bloc matching is possible without violating export controls. Within a single bloc (US: OpenAI, Anthropic, Google DeepMind, xAI, Meta AI), state pressure or direct financing may induce ignoring $M$ , especially under liability shield in exchange for coordination in a federal structure. On this reading, this direction remains open. One should also be wary that actors with vast unilateral resources may independently accelerate capability drift, potentially exceeding the preemption threshold.

Calibration of the Pigouvian rate requires estimation of $λ e^{- λ τ} / μ$ and $D$ , for which there is no canonical metric. This is an analog of the problem of estimating the social cost of carbon (Nordhaus 2017; Stern 2007). Robust calibration is possible through a principles-based approach with public revision. Calibration of $r^{*}$ is coupled with the estimate of $ρ_{c r i t}$ from §6: the orthogonal-regime threshold $\bar{ρ}$ must be revised together with $r^{*}$ . The operational formula from (6.3) gives the minimum matchmaker throughput under given parameters:

\begin{matrix} (9.1) & {\bar{M}}_{0} (ρ) = \frac{L^{*} μ_{0}}{r [τ Γ α (1 - ρ) - L^{*} κ ρ]}, \quad ρ < ρ_{crit}, \end{matrix}

or equivalently normalized by baseline drift:

\begin{matrix} (9.2) & \frac{r {\bar{M}}_{0}}{μ_{0}} = \frac{L^{*}}{τ Γ α (1 - ρ) - L^{*} κ ρ} . \end{matrix}

For empirical calibration the following are required: $L^{*} = \ln [D (1 - 2 S) / (I S)]$ from scenario estimates of $D, I, S$ ; $Γ = \ln (V_{P}^{*} / V_{t_{0}})$ from current market capitalization of frontier labs and (5.7); $τ$ from researcher surveys; $α$ from the historical effect of alignment investments; $κ$ from the average capability-spillover per dollar of interpretability research from Ren et al. (2024); $μ_{0}$ from compute scaling laws; $ρ$ from the distribution of $\hat{ρ}$ across the portfolio, auditable per-grant. In the endgame regime as $λ (t) \to λ^{*}$ , the Pigouvian rate takes the form $r_{end}^{*} = λ^{*} I S \cdot 2 (k - 1) / [(1 - 2 S) μ]$ via the substitution $e^{- L^{*}} = I S / [D (1 - 2 S)]$ . $D$ drops out via this relation, and $r_{end}^{*}$ depends only on $λ^{*}, I, S, k, μ$ . This provides a calibration point; intermediate values are interpolated via (P3) with time-varying $λ (t)$ . The minimum subsidy $R (0)$ that ensures transition to a regime protected against bypass is obtained by substituting $\bar{M} = {\bar{M}}_{0} (ρ)$ into the computation of the expected volume of match-deals needed to reach the exclusion threshold (P4); it is therefore a sub-problem of the calibration of ${\bar{M}}_{0}$ .
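The cancellation of $D$ in the endgame rate can be verified numerically. The sketch below assumes the general rate has the form that appears in the self-damping ODE of the preceding section, $r^{*} (λ) = D \cdot 2 (k - 1) λ e^{- λ τ} / μ$, and that the endgame point satisfies $λ^{*} τ = L^{*}$, which is what the substitution $e^{- L^{*}} = I S / [D (1 - 2 S)]$ implies. Function names and scenario values are hypothetical.

```python
import math


def L_star(D, I, S):
    """Safety gap L* = ln[D (1 - 2S) / (I S)]."""
    return math.log(D * (1.0 - 2.0 * S) / (I * S))


def pigouvian_rate(lam, D, k, mu, tau):
    """General rate, assumed form: D * 2(k-1)/mu * lambda * exp(-lambda*tau)."""
    return D * 2.0 * (k - 1.0) / mu * lam * math.exp(-lam * tau)


def endgame_rate(lam_star, I, S, k, mu):
    """r*_end = lambda* I S 2(k-1) / [(1 - 2S) mu]; D does not appear explicitly."""
    return lam_star * I * S * 2.0 * (k - 1.0) / ((1.0 - 2.0 * S) * mu)


# Hypothetical scenario values, for illustration only.
I, S, k, mu, tau = 10.0, 0.1, 1.2, 0.5, 4.0
for D in (50.0, 100.0, 1000.0):
    lam_star = L_star(D, I, S) / tau  # assumption: lambda* tau = L*
    print(D, pigouvian_rate(lam_star, D, k, mu, tau), endgame_rate(lam_star, I, S, k, mu))
```

The two columns agree for every $D$, while the common value still moves with $D$ through $λ^{*}$; the closed form removes only the explicit dependence.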

Theorems 1–3 are formulated in the static formulation of thresholds. In the full option game with time-varying $λ (t)$ the thresholds are given by an HJB problem with non-constant coefficients. Although the heuristic gives a plausible sign, a rigorous generalization of value-function convexity (Grenadier 2002) to the non-stationary strategic case is not obtained in the present work.

Also, the two irreducible robustness residuals compound, and the orthogonal regime (formula 6.2) is the only construction in the work that breaks this feedback, and its effectiveness is bounded above precisely by $ρ_{min}$ . The early-stage binding constraint, separated from rate calibration, is consolidated here: at early stages, what binds is not the size of $R (0)$ but the finding of a seed-funded portfolio within the orthogonal regime, $\hat{ρ} < ρ_{c r i t}$ . A well-capitalized regulator outside the regime is strictly worse than inaction (Theorem 3); a poorly capitalized one within it remains net-positive. The philanthropic seed is optimized by $\hat{ρ}$ , not just by volume.

Conclusion

We demonstrate that the proposed mechanism is capable of effectively de-escalating the AGI race, operating at several levels: from the individual benefit of particular labs to global change in safety dynamics.

At the micro level, we have shown that under characteristic capitalization parameters, participation in the matchmaker's work strategically dominates both direct cooperation and its absence. This holds under the assumptions that the costs of establishing a direct deal, agreeing on IP-sharing, and verifying it bilaterally are high under current conditions, and that the benefit of joint use of compute and data substantially outweighs the levied fee. So that the mechanism does not undermine public safety, we computed the optimal size of the Pigouvian fee. It is calibrated so that the sum of collected funds exactly covers the additional risk of catastrophe arising from acceleration of progress under cooperation. Moreover, under repeated interaction, labs will not attempt to bypass the matchmaker, since the risk of exclusion from future profitable deals makes honest play strategically dominant.

At the macro level, we proved closure of the Suicide Region in the static formulation of thresholds in finite time and bounded above the probability of premature entry into the region before its closure. To minimize the probability of a technological sprint, the mechanism's launch must occur before progress in capabilities becomes too rapid.

Sources

Alalawi, F., Han, T. A., Zisis, I., Lenaerts, T. & Santos, F. C. (2026). Trust AI Regulation? Discerning users are vital to build trust and effective AI regulation. Applied Mathematics and Computation.
Armstrong, S., Bostrom, N. & Shulman, C. (2016). Racing to the precipice: A model of artificial intelligence development. AI & Society, 31(2), 201–206.
Barrett, S. (2003). Environment and Statecraft: The Strategy of Environmental Treaty-Making. Oxford University Press.
Bernstein, L. (1992). Opting out of the legal system: Extralegal contractual relations in the diamond industry. Journal of Legal Studies, 21(1), 115–157.
Borodin, A. N. & Salminen, P. (2002). Handbook of Brownian Motion: Facts and Formulae. Birkhäuser.
Bostrom, N. (2002). Existential risks. Journal of Evolution and Technology, 9.
Bovenberg, A. L. & van der Ploeg, F. (1994). Environmental policy, public finance and the labour market in a second-best world. Journal of Public Economics, 55(3), 349–390.
Brill, M. & Talmon, N. (2018). Pairwise liquid democracy. IJCAI, 137–143.
Caillaud, B. & Jullien, B. (2003). Chicken & egg: Competition among intermediation service providers. RAND Journal of Economics, 34(2), 309–328.
Christoff, Z. & Grossi, D. (2017). Binary voting with delegable proxy: An analysis of liquid democracy. TARK.
Cimpeanu, T., Santos, F. C., Pereira, L. M., Lenaerts, T. & Han, T. A. (2022). Artificial intelligence development races in heterogeneous settings. Scientific Reports, 12, 1723.
Clifton, J. & Martin, S. Differential Progress in Cooperative AI: Motivation and Measurement. Cooperative AI Foundation seminar / working note.
DeMarzo, P. M., Fishman, M. J. & Hagerty, K. M. (2005). Self-regulation and government oversight. Review of Economic Studies, 72(3), 687–706.
Dixit, A. K. & Pindyck, R. S. (1994). Investment under Uncertainty. Princeton University Press.
Greif, A. (1993). Contract enforceability and economic institutions in early trade. American Economic Review, 83(3), 525–548.
Grenadier, S. R. (2002). Option exercise games: An application to the equilibrium investment strategies of firms. Review of Financial Studies, 15(3), 691–721.
Han, T. A., Pereira, L. M., Santos, F. C. & Lenaerts, T. (2020). To regulate or not: A social dynamics analysis of an idealised AI race. JAIR, 69, 881–921.
Han, T. A., Lenaerts, T., Santos, F. C. & Pereira, L. M. (2021). Voluntary safety commitments provide an escape from over-regulation in AI development. Technology in Society. arXiv:2104.03741.
Hanson, R. (2013). Shall we vote on values, but bet on beliefs? Journal of Political Philosophy, 21(2), 151–178.
Hendrycks, D. & Mazeika, M. (2022). X-Risk Analysis for AI Research / Pragmatic AI Safety. arXiv:2206.05862.
Hong, L. & Page, S. E. (2004). Groups of diverse problem solvers can outperform groups of high-ability problem solvers. PNAS, 101(46), 16385–16389.
Kahng, A., Mackenzie, S. & Procaccia, A. D. (2021). Liquid democracy: An algorithmic perspective. JAIR, 70, 1223–1252.
Karatzas, I. & Shreve, S. E. (1991). Brownian Motion and Stochastic Calculus. Springer.
Kydland, F. E. & Prescott, E. C. (1977). Rules rather than discretion. Journal of Political Economy, 85(3), 473–491.
Landemore, H. (2020). Open Democracy. Princeton University Press.
Lerner, J. & Tirole, J. (2004). Efficient patent pools. American Economic Review, 94(3), 691–711.
Mörters, P. & Peres, Y. (2010). Brownian Motion. Cambridge University Press.
Nordhaus, W. D. (2017). Revisiting the social cost of carbon. PNAS, 114(7), 1518–1523.
O'Keefe, C., Cihon, P., Garfinkel, B., Flynn, C., Leung, J. & Dafoe, A. (2020). The windfall clause: Distributing the benefits of AI for the common good. Centre for the Governance of AI.
Pigou, A. C. (1920). The Economics of Welfare. Macmillan.
Ren, R., Basart, S., Khoja, A., Gatti, A., Phan, L., Yin, X., Mazeika, M., Pan, A., Mukobi, G., Kim, R. H., Fitz, S. & Hendrycks, D. (2024). Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? NeurIPS 2024 Datasets & Benchmarks Track. arXiv:2407.21792.
Rochet, J.-C. & Tirole, J. (2003). Platform competition in two-sided markets. JEEA, 1(4), 990–1029.
Sandmo, A. (1975). Optimal taxation in the presence of externalities. Swedish Journal of Economics, 77(1), 86–98.
Schelling, T. C. (1960). The Strategy of Conflict. Harvard University Press.
Stern, N. (2007). The Economics of Climate Change: The Stern Review. Cambridge University Press.
Surowiecki, J. (2004). The Wisdom of Crowds. Doubleday.
Tan, D. (2025). The Suicide Region: Option Games and the Race to Artificial General Intelligence. Working paper.
Weeds, H. (2002). Strategic delay in a real options model of R&D competition. Review of Economic Studies, 69(3), 729–747.
Weyl, E. G. (2017). The robustness of quadratic voting. Public Choice, 172(1–2), 75–107.
Williamson, O. E. (1979). Transaction-cost economics: The governance of contractual relations. Journal of Law and Economics, 22(2), 233–261.
Enabling Frontier Lab Coordination to Mitigate AI Safety Risks (2025). arXiv:2511.08631.