Self-Funded Pigovian Matchmaker as a Mechanism for De-Escalating the AGI Race

Abstract

A formal mechanism is presented in which a regulator-matchmaker with voluntary participation facilitates resource cooperation among participants in the AGI race, levies a Pigouvian tax calibrated to the acceleration that cooperation induces, and invests the proceeds in alignment research. The construction is derived in the continuous-time options framework of Tan (2025). Cooperation is treated as a jump in the underlying asset value of the participating players, the Pigouvian component is set equal to the marginal increase in expected loss, and the accumulated fund endogenizes the safety learning rate. It is shown how the framework determines participation and optimal activity levels.

Conditions under which it is optimal to enter the market are derived. It is proven that if the funded portfolio is orthogonal to the capabilities component, the Suicide Region closes in finite time, and an upper bound on this time is derived as the sum of a deterministic and a random term. Finally, it is proven that if orthogonality is violated, increasing matchmaker throughput does not restore the mechanism's advantage. The construction links several research areas: two-sided markets, Pigouvian taxes, self-regulatory organizations, private law enforcement, evolutionary modeling of AI races, real options and option games, measurement of comparative progress, and analysis of the Suicide Region.

Motivation

Tan (2025) models the AGI race as a symmetric continuous-time preemption game, extending the earlier framework of Armstrong, Bostrom & Shulman (2016). Each player's asset value follows a geometric Brownian motion (GBM), and the safety learning function depends on how long safety research runs before the system is released and on an exogenous learning rate.

The leader and the follower have identical payoff functions that include an expected-loss component arising from misalignment. In the equal-payoff condition this term cancels, so the preemption threshold is independent of the expected loss. The survival threshold, however, does contain the expected loss, and when losses are large the preemption threshold falls below the survival threshold. The Suicide Region is defined as the interval of asset values within which the game prescribes racing even though the risk-adjusted present value is negative.

Three types of interventions are identified:

All of these require an exogenous agent with coercive power, which runs into the problem that no global sovereign exists. Related approaches (e.g., Han et al. 2020, 2021; Cimpeanu et al. 2022) likewise rely on a regulator but leave the central question open: how can a regulator be financed and legitimized when no central authority exists?

We propose a mechanism with four properties: it involves no coercion (participation is voluntary); it is self-financed from the race surplus itself; the regulator does not take part in the race (ensured by transparency and a democratic governance system); and the accumulated funds endogenously raise the safety learning rate, shifting the survival threshold downward and eventually closing the Suicide Region.

Formal Model

Let $N \ge 2$ be the number of participants in the AGI race, indexed by $i \in \{1, \dots, N\}$. Each player is associated with a process $V_t^i$:

$$dV_t^i = \mu V_t^i\,dt + \sigma V_t^i\,dZ_t^i,\qquad \mu = r_f - \delta.$$

The safety learning function is $\pi(\tau) = 1 - e^{-\lambda\tau}$, where $\tau$ is the duration of safety research prior to deployment and $\lambda > 0$ is the exogenous learning rate. All players have the same sunk cost $I$ and face a common misalignment damage $D$.

Leader payoff:

$$L(V_\tau, \tau) = (1-S)\,\pi(\tau)\,V_\tau - (1-\pi(\tau))\,D - I,$$

Follower payoff:

$$F(V_\tau, \tau) = S\,\pi(\tau)\,V_\tau - (1-\pi(\tau))\,D.$$

Preemption threshold:

$$V_P = \frac{I}{(1-2S)\,\pi(\tau)},$$

Survival threshold:

$$V_S = \frac{I + (1-\pi(\tau))\,D}{(1-S)\,\pi(\tau)}.$$
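The two payoffs and thresholds can be checked numerically: $V_P$ should equalize the leader and follower payoffs, and $V_S$ should set the leader payoff to zero. A minimal Python sketch, in which the function names and the sample parameter values are illustrative assumptions rather than part of Tan's (2025) model:

```python
def leader_payoff(V, pi, S, D, I):
    """Leader payoff L(V, tau) = (1 - S) * pi * V - (1 - pi) * D - I."""
    return (1.0 - S) * pi * V - (1.0 - pi) * D - I


def follower_payoff(V, pi, S, D):
    """Follower payoff F(V, tau) = S * pi * V - (1 - pi) * D."""
    return S * pi * V - (1.0 - pi) * D


def preemption_threshold(pi, S, I):
    """V_P = I / ((1 - 2S) * pi), the value at which L = F (needs S < 1/2)."""
    return I / ((1.0 - 2.0 * S) * pi)


def survival_threshold(pi, S, D, I):
    """V_S = (I + (1 - pi) * D) / ((1 - S) * pi), the value at which L = 0."""
    return (I + (1.0 - pi) * D) / ((1.0 - S) * pi)


# Hypothetical parameters; pi stands for pi(tau) at some fixed tau.
pi, S, D, I = 0.6, 0.2, 50.0, 10.0
V_P = preemption_threshold(pi, S, I)
V_S = survival_threshold(pi, S, D, I)
print(V_P, V_S)  # roughly 27.8 and 62.5: V_P < V_S, so the Suicide Region is non-empty
```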

The time until deployment of player $i$ is denoted $\tau_i$; deployment is triggered when $V_t^i$ crosses the strategically active threshold ($V_P$ in Tan's equilibrium, $V_S$ in the social optimum).

Assume that any pair $(i, j)$ may enter into a cooperative deal at time $t$. The deal produces a simultaneous jump in both asset values:

$$V_t^i \mapsto V_t^i + \tfrac12(k-1)(V_t^i + V_t^j),\qquad V_t^j \mapsto V_t^j + \tfrac12(k-1)(V_t^i + V_t^j),$$

where $k > 1$ is the synergy multiplier. For simplicity, we assume that the total increment $(k-1)(V_t^i + V_t^j)$ is split symmetrically. It comprises joint use of compute, data exchange, and exchange of non-conflicting architectural discoveries, and it raises the capitalized value of each participant without splitting it into shares.

Direct cooperation without an intermediary requires paying search costs $s > 0$. In this context, search means not so much finding a partner and establishing contact (which is not critical when participants are few) as finding compatible sets of complementary assets (datasets, compute, non-conflicting architectural discoveries) within each lab's declared portfolio. It also involves agreeing on how IP and the gains from joint work are shared, formalizing the deal legally without disclosing competitively sensitive information before signing, and building bilateral verification infrastructure for compliance that does not rely on a third-party arbiter. Assume that aggregate $s$ is sufficiently large: at typical $(V_t^i, V_t^j)$,

$$s > \beta\,(k-1)(V_t^i + V_t^j),\qquad \beta \in (0,1),$$

that is, direct cooperation is unprofitable at the current equilibrium and empirically rare (Williamson 1979 on transaction costs; Enabling Frontier Lab Coordination 2025 on the current state of inter-lab interaction). This is consistent with small N as an enforcement mechanism: a small number of labs ensures identification of potential partners and observability of agreement violations, but does not eliminate the costs of establishing and verifying each specific deal.

Cooperation is socially negative unless the externality is internalized. An increase in $V_t^i, V_t^j$ brings the crossing of the fixed thresholds $V_P, V_S$ closer but does not shift the thresholds themselves. Consequently, the expected time $\tau$ until deployment shrinks, $\pi(\tau)$ decreases, and $(1-\pi(\tau))D$ grows. If this acceleration is not internalized through a fee, cooperation increases aggregate expected damage.

The condition for the non-emptiness of the Suicide Region $\mathcal{S} = \{V_P < V < V_S\}$ is derived directly from $V_P < V_S$:

$$(\ast)\qquad \frac{I}{(1-2S)\,\pi} < \frac{I + (1-\pi)D}{(1-S)\,\pi} \iff D > \frac{IS}{(1-2S)(1-\pi)}.$$

The parametric subset

$$\mathcal{S}(\pi, I, D, S) = \{V : V_P(\pi) < V < V_S(\pi)\}$$

is non-empty if and only if condition $(\ast)$ holds.

In the "winner-takes-all" limit ($S \to 0$), condition $(\ast)$ reduces to $(1-\pi)D > 0$, i.e., the Suicide Region is non-empty for any non-zero expected damage and any $\pi < 1$.
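The equivalence in $(\ast)$ can be spot-checked by comparing the direct threshold comparison with the closed-form damage condition over random parameter draws. In this sketch the sampling ranges are arbitrary assumptions, chosen only to keep $S < 1/2$ and $\pi \in (0,1)$; draws that land numerically on the boundary are skipped because rounding decides them.

```python
import random


def nonempty_direct(pi, S, D, I):
    """Suicide Region non-empty iff V_P < V_S (thresholds as defined above)."""
    V_P = I / ((1.0 - 2.0 * S) * pi)
    V_S = (I + (1.0 - pi) * D) / ((1.0 - S) * pi)
    return V_P < V_S


def damage_threshold(pi, S, I):
    """Right-hand side of (*): I*S / ((1 - 2S)(1 - pi))."""
    return I * S / ((1.0 - 2.0 * S) * (1.0 - pi))


rng = random.Random(42)
mismatches = 0
for _ in range(10_000):
    pi = rng.uniform(0.01, 0.99)
    S = rng.uniform(0.0, 0.49)
    D = rng.uniform(0.0, 200.0)
    I = rng.uniform(0.1, 50.0)
    D_star = damage_threshold(pi, S, I)
    if abs(D - D_star) < 1e-9 * (1.0 + D_star):
        continue  # boundary draw: floating-point rounding decides the comparison
    if nonempty_direct(pi, S, D, I) != (D > D_star):
        mismatches += 1
print("mismatches:", mismatches)

# Winner-takes-all limit S -> 0: any positive damage keeps the region open.
print(nonempty_direct(0.5, 0.0, 1e-6, 10.0))
```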

Proposed Mechanism

The regulator-matchmaker M pairs labs for resource cooperation (joint compute use, data exchange, exchange of non-conflicting architectural discoveries) and then levies a fee $\tau_M = \delta_M + r$ per deal. The part $\delta_M$ covers operating costs and is assumed negligible hereafter for simplicity; the Pigouvian part $r$ is directed into alignment research. The accumulated fund $R(t)$ raises the safety learning rate $\lambda$, which shifts the survival threshold $V_S$ downward. At sufficient intensity, $V_S$ descends to $V_P$ and the Suicide Region closes.

For such a construction to work, three conditions must hold.

Consider player $i$'s decision at time $t$ given an available partner $j$. Cooperation generates a jump of total value $(k-1)(V_t^i + V_t^j)$, split symmetrically between the participants: the labs jointly use compute, exchange datasets, and share architectural results that are not in direct competitive conflict. The player has three options, each with its own net jump $\Delta V_i$:

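$$\Delta V_i^{\text{coop},M} = \tfrac12\big[(k-1)(V_t^i + V_t^j) - (\delta_M + r)\big],\qquad \Delta V_i^{\text{coop,direct}} = \tfrac12\big[(k-1)(V_t^i + V_t^j) - s\big],\qquad \Delta V_i^{\text{none}} = 0.$$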

Direct cooperation without an intermediary requires search costs $s$, which are empirically high in the sense developed in §2. The obstacle is not lack of knowledge that partners exist, but the effort of finding specific complementarities, agreeing on IP sharing, formalizing the deal legally, and verifying compliance bilaterally without a third-party arbiter. Williamson (1979) provides the theoretical framework for these transaction costs; Enabling Frontier Lab Coordination (2025) provides a recent empirical estimate of their magnitude for frontier labs. Because the regulator specializes in precisely these four subtasks, it reduces the costs to $s_M \ll s$, in the limit to zero. Participation in the M-deal then dominates direct cooperation when

$$\tau_M = \delta_M + r < s \tag{P1}$$

and dominates declining cooperation when

$$(k-1)(V_t^i + V_t^j) > \delta_M + r. \tag{P2}$$

At typical $V_t^i, V_t^j$ of large labs and moderate values of $\tau_M$, both conditions hold with a substantial margin. This is the basic operational hypothesis of this work: for participants with large capitalization, the cooperation gain substantially exceeds any reasonable fee, and the remaining substantive question is how to calibrate $r$.
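The three-way choice and conditions (P1)–(P2) can be written as a small decision rule. This sketch takes the per-player net jumps from the display above, treats $\delta_M$, $r$ and $s$ as given numbers, and uses hypothetical magnitudes; it is an illustration, not a calibration.

```python
def net_jumps(V_i, V_j, k, delta_M, r, s):
    """Per-player net jumps for the three options (fee and search cost split equally)."""
    surplus = (k - 1.0) * (V_i + V_j)
    return {
        "via_M": 0.5 * (surplus - (delta_M + r)),
        "direct": 0.5 * (surplus - s),
        "none": 0.0,
    }


def best_option(V_i, V_j, k, delta_M, r, s):
    """Option with the largest net jump."""
    jumps = net_jumps(V_i, V_j, k, delta_M, r, s)
    return max(jumps, key=jumps.get)


def p1_holds(delta_M, r, s):
    """(P1): going through M beats direct cooperation."""
    return delta_M + r < s


def p2_holds(V_i, V_j, k, delta_M, r):
    """(P2): going through M beats declining cooperation."""
    return (k - 1.0) * (V_i + V_j) > delta_M + r


# Hypothetical magnitudes: two large labs, 5% synergy, fee well below search cost.
V_i, V_j, k = 100.0, 80.0, 1.05
delta_M, r, s = 0.1, 2.0, 12.0
print(best_option(V_i, V_j, k, delta_M, r, s))  # "via_M" when (P1) and (P2) both hold
```

Lowering $s$ below the fee, or shrinking the synergy $k-1$ until the surplus no longer covers the fee, flips the choice to "direct" or "none" respectively.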

Cooperation is socially negative unless the externality is internalized. An upward jump in $V_t^i, V_t^j$ brings the crossing of the fixed thresholds $V_P, V_S$ closer without shifting the thresholds themselves: the expected time until deployment shrinks, $\pi(\tau)$ decreases, and the expected damage $(1-\pi)D$ grows. This is the externality to be internalized, and it can be estimated as follows. For a process $V_t$ following geometric Brownian motion with $\mu > \sigma^2/2$, the expected first-passage time to an upper threshold decreases in the current value, with sensitivity $\partial \mathbb{E}\tau/\partial V \approx -1/(\mu V_t)$ to leading order in $\sigma$. For comparable labs ($V_t^i \approx V_t^j \approx V_t$), the total jump $\Delta V^{\text{coop}} = (k-1)(V_t^i + V_t^j)$ in a deal between $i$ and $j$ therefore shortens the expected time until deployment by $\Delta\tau \approx -2(k-1)/\mu$, and by the chain rule the corresponding marginal increment in expected damage equals

$$\Delta SC = -\frac{\partial \pi}{\partial \tau}\,\Delta\tau\,D = \lambda(t)\,e^{-\lambda(t)\tau}\,\frac{2(k-1)}{\mu}\,D.$$

By the Pigouvian taxation principle (Pigou 1920; Sandmo 1975), the fee must equal the marginal social damage:

$$r = \lambda(t)\,e^{-\lambda(t)\tau}\,D\,\frac{2(k-1)}{\mu}. \tag{P3}$$

In a simplified intuitive form, (P3) reads $r \approx (\partial p_r/\partial C)\,D\,\Delta C$, where $p_r$ is the probability of misalignment and $C$ the capability level: the fee is proportional to the marginal increase in the probability of misalignment per unit increase in capabilities. In Tan's formalism, this increase is expressed through the derivative of the safety function with respect to time and through the shortening of the waiting time induced by the GBM jump. In aggregate, the expected increment in $(1-\pi)D$ caused by the deal is fully covered by the levied fee.
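Both ingredients of (P3) can be checked in a few lines: the first-order shortening of the waiting time, and the fee itself. The sketch uses the standard expected first-passage time of a GBM to an upper level, $\ln(a/v)/(\mu - \sigma^2/2)$ for $v < a$ and $\mu > \sigma^2/2$. The function names and parameter values are illustrative assumptions.

```python
import math


def expected_hitting_time(v, a, mu, sigma):
    """Expected first-passage time of a GBM from v up to a (requires v < a, mu > sigma^2 / 2)."""
    drift = mu - 0.5 * sigma ** 2
    if not (v < a and drift > 0):
        raise ValueError("requires v < a and mu > sigma^2 / 2")
    return math.log(a / v) / drift


def first_order_shortening(v, dv, mu):
    """Leading-order change in the expected hitting time after a jump dv: -dv / (mu * v)."""
    return -dv / (mu * v)


def pigouvian_fee(lam, tau, D, k, mu):
    """Fee from (P3): r = lam * exp(-lam * tau) * D * 2 (k - 1) / mu."""
    return lam * math.exp(-lam * tau) * D * 2.0 * (k - 1.0) / mu


# Small jump and small volatility: the first-order rule should track the exact change.
v, a, mu, sigma, dv = 1.0, 10.0, 0.1, 0.01, 0.01
exact = expected_hitting_time(v + dv, a, mu, sigma) - expected_hitting_time(v, a, mu, sigma)
approx = first_order_shortening(v, dv, mu)
print(exact, approx)
print(pigouvian_fee(lam=0.5, tau=2.0, D=100.0, k=1.1, mu=0.05))
```

The factor $e^{-\lambda\tau}$ makes the fee fall as the research window lengthens, which is the source of the self-damping behavior under continuous recalibration.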

The fees aggregate into a fund

$$R(t) = r\int_{t_0}^{t} M(s)\,ds,$$

where $M(s)$ is the flow of match-deals. In Tan, the safety learning rate $\lambda$ is exogenous; in our setting, the fund finances alignment research, and we endogenize $\lambda$ via

$$\lambda(t) = \lambda_0 + \alpha R(t), \tag{3.1}$$
$$\dot\lambda(t) = \alpha r M(t), \tag{3.2}$$

where $\alpha > 0$ is the efficiency of converting alignment capital into safety learning rate. Under a sustained flow $M(t) \equiv \bar M$, this yields linear growth $\lambda(t) = \lambda_0 + \alpha r \bar M (t - t_0)$ and, consequently, $\pi(\tau; t) \to 1$ as $t \to \infty$ for any fixed $\tau > 0$.
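To illustrate (3.1)–(3.2) under a sustained flow, the sketch below assumes, which the text does not specify, that match-deals arrive as a Poisson process with rate $\bar M$ and that each deal adds $r$ to the fund. Averaged over simulated paths, $\lambda(t)$ should track the linear path $\lambda_0 + \alpha r \bar M (t - t_0)$. All parameter values are hypothetical.

```python
import math
import random


def simulate_lambda(lam0, alpha, r, M_bar, T, rng):
    """One path: count Poisson(M_bar) deal arrivals on [0, T]; lambda(T) = lam0 + alpha * R(T)."""
    t, n_deals = 0.0, 0
    while True:
        t += rng.expovariate(M_bar)  # exponential inter-arrival times (assumed Poisson flow)
        if t > T:
            break
        n_deals += 1
    fund = r * n_deals  # R(T) = r * (number of deals so far)
    return lam0 + alpha * fund


def myopic_safety(lam, tau):
    """pi(tau; t) = 1 - exp(-lambda(t) * tau)."""
    return 1.0 - math.exp(-lam * tau)


lam0, alpha, r, M_bar, T, tau = 0.05, 0.01, 2.0, 20.0, 10.0, 1.0
rng = random.Random(0)
runs = [simulate_lambda(lam0, alpha, r, M_bar, T, rng) for _ in range(500)]
mean_lam = sum(runs) / len(runs)
linear = lam0 + alpha * r * M_bar * T  # closed-form linear growth
print(mean_lam, linear, myopic_safety(linear, tau))
```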

Equation (3.1) requires two caveats. The first concerns the interpretation of $\pi$. Canonically, $\pi(\tau) = 1 - e^{-\lambda\tau}$ is the probability of safe deployment given a fixed duration $\tau$ and constant $\lambda$. With time-varying $\lambda(t)$, two interpretations are possible.

The myopic form overstates $\pi$ under growing $\lambda$, so myopic estimates of Suicide Region closure times are lower bounds relative to the integral ones. The main text below uses myopic notation for compatibility with the formalism; the central claims are additionally verified in integral form.

The second caveat concerns the scalar $\alpha$ itself. Contemporary alignment research suffers from a dual-use problem: part of the funded work, nominally directed into $\lambda$, actually shifts $\mu$. Hence $\alpha$ in (3.1) implicitly assumes a portfolio orthogonal to the capabilities component, in a sense made precise later through the correlation metric $\rho$ of Ren et al. (2024). Without orthogonality, the sign of the effect in (3.1) is no longer guaranteed. Orthogonality is taken here as an idealization; the later analysis gives it operational content and delineates the regime in which it applies.

With these caveats, conditions (P1), (P2), (P3) jointly characterize the mechanism at the micro level: they specify when a participant goes through M and how much M collects from them. The next section verifies that these conditions survive the transition to the equilibrium of the repeated game, that is, that participants do not learn to bypass the regulator after the first match.

Mechanism Robustness in Equilibrium

We showed that in a single round, participation in the M-deal is dominant under (P1)–(P3). But participants do not play a single round. A pair $(i, j)$ that has once established contact through the regulator can later cooperate directly, saving $\tau_M$ net of residual search costs. If such bypasses become an equilibrium, the fund $R(t)$ does not fill and the whole scheme collapses.

This is a standard private-enforcement problem with a canonical answer. In a discounted repeated game with discount factor $\gamma \in (0,1)$ and expected flow $\mu_M$ of match-deals per unit time, the per-period payoff of a participant in terms of $V_t$-jumps is

$$\Pi_{\text{period}} \approx \mu_M\big[(k-1)\bar V - \tfrac12\tau_M\big],$$

where $\bar V = (V_t^i + V_t^j)/2$. The one-shot gain from bypassing in a single deal is the saved half of the fee, $\tau_M/2$. If the regulator excludes violators upon detection, the cost of bypassing is the present value of the foregone deal flow,

$$\Phi(\gamma, \mu_M) = \mu_M\big[(k-1)\bar V - \tfrac12\tau_M\big]\frac{\gamma}{1-\gamma}.$$

The condition for suppressing bypass:

$$\frac{\tau_M}{2} < \mu_M\big[(k-1)\bar V - \tfrac12\tau_M\big]\frac{\gamma}{1-\gamma}. \tag{P4}$$

At typical values $\gamma \approx 1$, $\mu_M \sim 1$, and $(k-1)\bar V \gg \tau_M$, this condition holds with a substantial margin. This gives a formal justification for the intuition that small $N$ acts as an enforcement mechanism: a small number of frontier labs implies a high effective $\gamma$, because future deals within a small group are nearly guaranteed (Bernstein 1992 on the Diamond District; Greif 1993 on the Maghribi traders). Transparency of M's decisions and financial flows, developed in §6, makes violations observable: a bypass becomes common knowledge, and the exclusion threat is backed by the remaining participants coordinating on exclusion.
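Because (P4) is linear in $\gamma/(1-\gamma)$, it can be solved for the smallest discount factor that still deters bypass: with $A = \mu_M[(k-1)\bar V - \tau_M/2] > 0$, (P4) holds exactly when $\gamma > (\tau_M/2)/(A + \tau_M/2)$. A minimal sketch with hypothetical parameter values:

```python
def per_period_payoff(mu_M, k, V_bar, tau_M):
    """Pi_period ~= mu_M * [(k - 1) * V_bar - tau_M / 2]."""
    return mu_M * ((k - 1.0) * V_bar - 0.5 * tau_M)


def p4_holds(gamma, mu_M, k, V_bar, tau_M):
    """(P4): one-shot saving tau_M / 2 < discounted value of the foregone deal flow."""
    return 0.5 * tau_M < per_period_payoff(mu_M, k, V_bar, tau_M) * gamma / (1.0 - gamma)


def critical_gamma(mu_M, k, V_bar, tau_M):
    """Smallest gamma for which (P4) holds: (tau_M / 2) / (A + tau_M / 2)."""
    A = per_period_payoff(mu_M, k, V_bar, tau_M)
    if A <= 0:
        raise ValueError("no deterrence: the per-period payoff is non-positive")
    return 0.5 * tau_M / (A + 0.5 * tau_M)


# Hypothetical: one deal per period, 5% synergy on V_bar = 90, total fee 2.1.
g_star = critical_gamma(mu_M=1.0, k=1.05, V_bar=90.0, tau_M=2.1)
print(g_star)
```

Even these modest numbers give a critical $\gamma$ well below one, consistent with the claim that (P4) holds with a wide margin when $\gamma$ is close to one.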

A second question arises here. The standard two-sided platforms literature (Caillaud & Jullien 2003; Rochet & Tirole 2003) indicates that below critical mass such platforms die: network ties do not form, network effects do not activate, and platform value collapses to zero. This is the classic bootstrap problem: even a socially useful platform may fail to launch if the first participants lack an incentive to join.

The proposed mechanism differs qualitatively. The Pigouvian component r is directed into alignment per transaction, not proportionally to total volume or network effect. Each individual match-deal, including the first, strictly increases λ(t):

$$\frac{\partial \lambda(t)}{\partial M(t)} = \alpha r > 0 \qquad \text{for all } M(t) \ge 1.$$

Hence the regulator delivers social benefit from the moment it is formed: a single match-deal suffices for $\dot\lambda > 0$, which begins shifting $V_S$ downward. Critical mass determines how complete the deterrence effect is (how firmly bypass is blocked by (P4)), not whether a safety effect exists at all. Below critical mass the exclusion threat is weak and bypass is possible, but even a marginally successful regulator remains net positive in terms of social damage.

This argument, however, requires care. All three consequences (bootstrap from a single match-deal, monotonic growth of $\lambda$, and a net-positive contribution from each transaction) rest on the positive sign of $\partial\lambda/\partial M$. Under dual use this sign is not guaranteed: part of the fund, nominally directed into alignment, actually shifts not $\lambda$ but the drift $\mu$ of the process $V_t$. When such leakage is large, each additional match-deal accelerates capability more than it raises safety, and the regulator is net negative. The bootstrap argument holds when the funded portfolio is orthogonal to the capabilities component and fails outside this regime. The binding early-stage constraint is therefore not "finding sufficient seed capital" but "finding a seed portfolio that does not accelerate the race," and calibrating the matchmaker's throughput without control over this condition is useless.

Classic solutions to the critical-mass problem — subsidizing early participants (CAIF, LTFF, Open Philanthropy), cross-side network effects (compute access, liability shield), coordination signals (Frontier Model Forum, MLCommons) — remain applicable as accelerators but not as necessary launch conditions. The very fact of the first deal gives a positive contribution to λ if that deal finances an orthogonal portfolio; without orthogonality the same network effect works on race acceleration. The next section proves that within the orthogonal regime the Suicide Region closes in finite time; the section after formalizes what makes the regime orthogonal and what happens upon exiting it.

Closing the Suicide Region

The mechanism passes muster only if it actually closes the Suicide Region in finite time; otherwise the entire construction remains a rhetorical gesture. This section proves the corresponding statement in three layers: a deterministic estimate of the closure time, a stochastic estimate of the probability that $V_t$ enters the region before it closes, and a condition under which the fund attains the critical safety learning rate before the capability leader crosses the preemption threshold.

The canonical safety function $\pi(\tau) = 1 - e^{-\lambda\tau}$ specifies the probability of safe deployment under fixed duration $\tau$ and constant $\lambda$. With endogenously varying $\lambda(t)$, the correct form is the hazard-rate integral $\pi(t) = 1 - \exp(-\Lambda(t))$, $\Lambda(t) = \int_{t_0}^{t} \lambda(s)\,ds$, which recovers the canonical expression under constant $\lambda$. The myopic form $\pi(\tau; t) = 1 - e^{-\lambda(t)\tau}$, which applies the current $\lambda(t)$ retroactively, overstates $\pi$ under growing $\lambda$ and gives lower bounds on closure times.

We fix the assumptions:

$$\dot\lambda(t) = \alpha(1-\rho)\,r M(t),\qquad \mu(t) = \mu_0 + \kappa\rho\, r M(t), \tag{5.0}$$

where $\rho \in [0,1]$ is the correlation of the funded work with the first principal component of the "model × benchmark" space (Ren et al. 2024), $\kappa \ge 0$ is the leakage gain into drift, and $\rho < \rho_{\text{crit}}$ (explicit form in (6.2) below). At $\rho = 0$ the pure dynamics of §3 is recovered.

Under (A1)–(A5), from (5.0) we obtain the lower bound

$$\dot\Lambda(t) = \lambda(t) \ge \lambda_0 + \alpha(1-\rho)\,r\bar M\,(t - t_0). \tag{5.1}$$

At each $t$ with $\pi(t) \in (0,1)$, time-dependent thresholds are defined

$$V_P(t) = \frac{I}{(1-2S)\,\pi(t)},\qquad V_S(t) = \frac{I + (1-\pi(t))\,D}{(1-S)\,\pi(t)},$$

and the Suicide Region $\mathcal{S}(t) = (V_P(t), V_S(t))$, if non-empty. Closure of $\mathcal{S}$ is conveniently rewritten as a condition on $\pi$ directly.

Lemma 1 (algebraic equivalence). $\mathcal{S}(t) \neq \emptyset$ if and only if $(1-\pi(t))\,D > IS/(1-2S)$.

Proof. VP(t)<VS(t) is equivalent to

$$\frac{I}{(1-2S)\,\pi} < \frac{I + (1-\pi)D}{(1-S)\,\pi}.$$

Multiplying by $\pi(1-S)(1-2S) > 0$, we obtain $I(1-S) < (1-2S)I + (1-2S)(1-\pi)D$, whence $IS < (1-2S)(1-\pi)D$.

Lemma 2 (monotonicity of $\pi$). Under (A1)–(A4), the function $t \mapsto \pi(t)$ is strictly increasing on $[t_0, \infty)$ and $\pi(t) \to 1$ as $t \to \infty$.

Proof. $\dot\pi(t) = \lambda(t)\,e^{-\Lambda(t)} > 0$ under (A2)–(A4). From (5.1) it follows that $\Lambda(t) \ge \lambda_0(t - t_0) + \tfrac12\alpha(1-\rho)\,r\bar M\,(t - t_0)^2$, whence $\pi(t) \to 1$.

Combining the two lemmas yields the main statement.

Theorem 1 (closure of the Suicide Region). Under (A1)–(A5) there exists $t^\ast < \infty$ such that $\mathcal{S}(t) = \emptyset$ for all $t \ge t^\ast$. In integral form,

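$$t^\ast \le t_0 + \frac{-\lambda_0 + \sqrt{\lambda_0^2 + 2\alpha(1-\rho)\,r\bar M L}}{\alpha(1-\rho)\,r\bar M},\qquad L = \Big[\ln\frac{D(1-2S)}{IS}\Big]_+; \tag{5.2}$$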

in myopic form

$$t^\ast \le t_0 + \frac{1}{\alpha(1-\rho)\,r\bar M}\Big[\frac{L}{\tau} - \lambda_0\Big]_+. \tag{5.3}$$

Proof. When $D(1-2S) \le IS$ the case is trivial: $L = 0$, the condition of Lemma 1 fails already at $\pi = 0$, and $\mathcal{S}(t) = \emptyset$ for all $t$. In the non-trivial case $L > 0$, closure is, by Lemma 1, equivalent to $\Lambda(t) \ge L$. From (5.1) we obtain the quadratic inequality

$$\lambda_0(t - t_0) + \tfrac12\alpha(1-\rho)\,r\bar M\,(t - t_0)^2 \ge L;$$

the positive root gives (5.2). The myopic estimate follows from the requirement $\lambda(t)\,\tau \ge L$ with the linear lower bound $\lambda(t) \ge \lambda_0 + \alpha(1-\rho)\,r\bar M\,(t - t_0)$.

Estimate (5.2) has two regimes. When the seed rate $\lambda_0$ is large relative to the mechanism's contribution, $\lambda_0^2 \gg 2\alpha(1-\rho)\,r\bar M L$, the first term under the root dominates and $t^\ast - t_0 \approx L/\lambda_0$, a linear dependence on the inverse seed rate. When the seed rate is small, $\lambda_0^2 \ll 2\alpha(1-\rho)\,r\bar M L$, the second term dominates and $t^\ast - t_0 \approx \sqrt{2L/[\alpha(1-\rho)\,r\bar M]}$, an inverse square-root dependence on throughput. The mechanism is designed for the second regime ($\lambda_0$ small, $\bar M$ substantial), and there doubling the matchmaker's throughput shortens $t^\ast - t_0$ only by a factor of $\sqrt 2 \approx 1.41$, not by half.
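Bounds (5.2)–(5.3) and the two regimes can be checked directly. The sketch writes $a = \alpha(1-\rho) r \bar M$, confirms that the integral bound makes the quadratic lower bound on $\Lambda$ reach exactly $L$, and reproduces the $\sqrt 2$ effect of doubling throughput when $\lambda_0 = 0$ as well as the $L/\lambda_0$ limit for a large seed rate. Parameter values are hypothetical.

```python
import math


def closure_exponent(D, S, I):
    """L = [ln(D (1 - 2S) / (I S))]_+ ; L = 0 means the region is empty from the start."""
    return max(0.0, math.log(D * (1.0 - 2.0 * S) / (I * S)))


def closure_time_integral(lam0, a, L):
    """(5.2): x = t* - t0 solving lam0 * x + a * x**2 / 2 = L (positive root)."""
    return (-lam0 + math.sqrt(lam0 ** 2 + 2.0 * a * L)) / a


def closure_time_myopic(lam0, a, L, tau):
    """(5.3): smallest x >= 0 with (lam0 + a * x) * tau >= L."""
    return max(0.0, L / tau - lam0) / a


# Hypothetical inputs; a = alpha * (1 - rho) * r * M_bar.
alpha, rho, r, M_bar = 0.01, 0.1, 2.0, 20.0
a = alpha * (1.0 - rho) * r * M_bar
L = closure_exponent(D=500.0, S=0.2, I=10.0)

x_small_seed = closure_time_integral(0.0, a, L)
print(L, x_small_seed)
# Square-root regime: doubling throughput (a -> 2a) shortens the bound by sqrt(2).
print(x_small_seed / closure_time_integral(0.0, 2.0 * a, L))
# Linear regime: a large seed rate gives x close to L / lam0.
print(closure_time_integral(100.0, a, L), L / 100.0)
print(closure_time_myopic(0.5, a, L, tau=1.0))
```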

A remark on the limits of applicability of these results is in order. The thresholds $V_P(t), V_S(t)$ are derived from an instantaneous comparison of payoffs at the moment of deployment under the current $\lambda(t)$, without accounting for the option value of waiting under growing $\lambda$. In the full option game with time-varying $\lambda$, the thresholds $\tilde V_P(t), \tilde V_S(t)$ solve an HJB problem with non-constant coefficients under strategic coordination of leader and follower. Monotonicity of the waiting option predicts $\tilde V_P \ge V_P$, and standard convexity of the value function (Grenadier 2002) gives closure of $\tilde{\mathcal{S}}$ no later than closure of the static $\mathcal{S}$ under constant parameters. Technically, however, the transfer to non-stationary $\lambda(t)$ in a strategic environment requires a separate proof, since under strong asymmetry $\tilde V_P < V_P$ cannot be ruled out in principle.

Closure of $\mathcal{S}$ as a set in $V$-space is necessary but not sufficient. $V_t$ may fall into $\mathcal{S}(t)$ at some $t < t^\ast$, in which case the racing equilibrium triggers deployment before the mechanism closes the region. The stochastic complement estimates the probability of such entry. Denote $\rho_{\mathcal{S}}(t_0; T) = \mathbb{P}[\exists\, t \in [t_0, T] : V_t \in \mathcal{S}(t)]$.

Lemma 3 (boundary estimate). $\rho_{\mathcal{S}}(t_0; t^\ast) \le \mathbb{P}[\tau_{V_P(t^\ast)} \le t^\ast]$, where $\tau_a = \inf\{t \ge t_0 : V_t \ge a\}$.

Proof. By Lemma 2, $\pi(t)$ is increasing, whence $V_P(t) = I/[(1-2S)\,\pi(t)]$ is decreasing. The minimum of $V_P(t)$ on $[t_0, t^\ast]$ is attained at $t = t^\ast$. Entry into $\mathcal{S}(t)$ requires $V_t > V_P(t) \ge V_P(t^\ast)$, so the event is a sub-event of $\{\tau_{V_P(t^\ast)} \le t^\ast\}$.

The standard first-passage formula for GBM (Karatzas & Shreve 1991, §3.5.C; Borodin & Salminen 2002, II.1.1.4) with $V_{t_0} = v < a$ gives

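$$\mathbb{P}[\tau_a \le T] = \Phi\!\left(\frac{\ln(v/a) + \mu' T}{\sigma\sqrt{T}}\right) + \left(\frac{v}{a}\right)^{1 - 2\mu/\sigma^2}\Phi\!\left(\frac{\ln(v/a) - \mu' T}{\sigma\sqrt{T}}\right), \tag{5.4}$$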

where $\mu' = \mu - \sigma^2/2$, and the exponent in the second term is written via $\mu$ without the prime ($-2\mu'/\sigma^2 = 1 - 2\mu/\sigma^2$, the canonical Karatzas–Shreve form). Substituting into Lemma 3:

Theorem 2 (stochastic complement). Under (A1)–(A5) and provided $V_{t_0} < V_P(t^\ast)$,

$$\rho_{\mathcal{S}}(t_0; t^\ast) \le \Phi(d_+) + \left(\frac{V_{t_0}}{V_P(t^\ast)}\right)^{1 - 2\mu/\sigma^2}\Phi(d_-), \tag{5.5}$$

where $d_\pm = [\ln(V_{t_0}/V_P(t^\ast)) \pm \mu' T]/(\sigma\sqrt{T})$, $T = t^\ast - t_0$, and $t^\ast$ is given by (5.2).
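Formula (5.4) and bound (5.5) can be evaluated with the standard normal CDF built from `math.erf`. As a sanity check, the sketch uses the case $\mu' = 0$ ($\mu = \sigma^2/2$), where the exponent vanishes and (5.4) reduces to the reflection-principle value $2\Phi(\ln(v/a)/(\sigma\sqrt T))$. All inputs are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gbm_hit_prob(v, a, mu, sigma, T):
    """(5.4): P[GBM started at v < a reaches a within time T], with mu' = mu - sigma^2 / 2."""
    mu_p = mu - 0.5 * sigma ** 2
    s = sigma * math.sqrt(T)
    x = math.log(v / a)
    return (norm_cdf((x + mu_p * T) / s)
            + (v / a) ** (1.0 - 2.0 * mu / sigma ** 2) * norm_cdf((x - mu_p * T) / s))


def premature_entry_bound(V0, VP_star, mu, sigma, T):
    """(5.5): upper bound on rho_S(t0; t*) with a = V_P(t*) and T = t* - t0."""
    if V0 >= VP_star:
        return 1.0  # Theorem 2 requires V0 < V_P(t*); otherwise the bound is vacuous
    return gbm_hit_prob(V0, VP_star, mu, sigma, T)


sigma, T = 0.3, 4.0
# mu = sigma^2 / 2 gives mu' = 0: compare with the reflection principle.
p = gbm_hit_prob(1.0, 2.0, 0.5 * sigma ** 2, sigma, T)
reflection = 2.0 * norm_cdf(math.log(0.5) / (sigma * math.sqrt(T)))
print(p, reflection)
print(premature_entry_bound(V0=1.0, VP_star=2.0, mu=0.08, sigma=sigma, T=5.0))
```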

Thus, the Suicide Region is effectively closed under two conditions:

In other words, once $\lambda(t)$ reaches the level at which the Suicide Region is closed, alignment becomes a public good and the strategic advantage of first deployment loses its meaning. In Tan's formalism the leader's advantage is governed by the parameter $S$, and for $S \to 0$ the Suicide Region is empty only when $D = 0$. In our model, however, if $\pi(\tau; t) \to 1$ for all relevant $\tau$, the expected damage $(1-\pi)D \to 0$ and the effective leader–follower asymmetry vanishes. This is equivalent to two interventions acting simultaneously: a de facto reduction of $D_{\text{private}}$ and an endogenous windfall effect ($S \to 1/2$) through liquidation of the leader's monopoly premium.

To operationalize this effect, it is convenient to view the regulator as a meta-participant in the race. The accumulated $R(t)$ functions as the resource base of a quasi-player whose goal is to maximize $\lambda$. This player does not compete for $V$; it competes to reach the critical level $\lambda^\ast$ before the capability leader crosses $V_P(\lambda^\ast)$. By Lemma 1, closure occurs when $\Lambda(t) \ge L$, or equivalently $\lambda(t)\,\tau \ge L$ in myopic form. Accordingly,

$$\lambda^\ast = \frac{L}{\tau} = \frac{1}{\tau}\ln\frac{D(1-2S)}{IS} \tag{5.6}$$

is a constant independent of $\bar V_{\max}$. The preemption threshold at $\lambda = \lambda^\ast$ equals

$$V_P(\lambda^\ast) = \frac{I}{(1-2S)\,\pi^\ast},\qquad \pi^\ast = 1 - \frac{IS}{D(1-2S)}. \tag{5.7}$$

The race between the fund and the capability leader reduces to a comparison of two times. Denote by $T_\lambda$ the time from $t_0$ until $\lambda(t)$ reaches $\lambda^\ast$; by Theorem 1, it is bounded above by $t^\ast - t_0$ from (5.2) or (5.3). Denote by $T_V$ the time from $t_0$ until $\bar V_{\max}(t) = \mathbb{E}[\max_i V_t^i]$ reaches $V_P(\lambda^\ast)$; in the deterministic limit $\bar V_{\max}(t) \approx V_{t_0}\,e^{\mu(t - t_0)}$, whence

$$T_V = \frac{\Gamma}{\mu},\qquad \Gamma = \ln\frac{V_P(\lambda^\ast)}{V_{t_0}}. \tag{5.8}$$

The "fund outpaces capability leader" condition:

$$T_\lambda < T_V. \tag{P5′}$$

In myopic form with $\lambda_0 = 0$, condition (P5′) becomes

$$\alpha(1-\rho)\,r\bar M > \frac{L\mu}{\tau\Gamma}. \tag{P5‴}$$
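The race comparison in (5.6)–(5.8) and its myopic form (P5‴) can be assembled into a single check. This sketch sets $\lambda_0 = 0$ as in the text, uses the deterministic limit for $T_V$, and writes $a = \alpha(1-\rho) r\bar M$. The numbers are hypothetical; they only illustrate that the time comparison (P5′) and the throughput inequality (P5‴) agree.

```python
import math


def race_quantities(D, S, I, tau, V0, mu):
    """L, lambda*, pi*, V_P(lambda*), Gamma and T_V from (5.6)-(5.8)."""
    L = math.log(D * (1.0 - 2.0 * S) / (I * S))
    lam_star = L / tau                                # (5.6)
    pi_star = 1.0 - I * S / (D * (1.0 - 2.0 * S))    # (5.7)
    VP_star = I / ((1.0 - 2.0 * S) * pi_star)
    Gamma = math.log(VP_star / V0)                    # (5.8), needs V0 < V_P(lambda*)
    if Gamma <= 0:
        raise ValueError("V0 already exceeds V_P(lambda*)")
    return L, lam_star, pi_star, VP_star, Gamma, Gamma / mu


def fund_wins_myopic(a, L, tau, Gamma, mu):
    """(P5''') with lambda_0 = 0: a > L * mu / (tau * Gamma)."""
    return a > L * mu / (tau * Gamma)


D, S, I, tau, V0, mu = 500.0, 0.2, 10.0, 1.0, 5.0, 0.08
L, lam_star, pi_star, VP_star, Gamma, T_V = race_quantities(D, S, I, tau, V0, mu)
for a in (0.2, 0.36, 0.5):
    T_lam = L / (tau * a)  # myopic: lambda(t) = a (t - t0) reaches lambda* = L / tau
    print(a, T_lam, T_V, T_lam < T_V, fund_wins_myopic(a, L, tau, Gamma, mu))
```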

With (P5′) the deterministic part is complete: within the orthogonal regime the mechanism closes the Suicide Region in finite time, and with sufficient throughput it does so before the capability leader crosses the preemption threshold. Everything rests on (A4), the assumption whose explicit form has not yet been written out. The next section derives this form and proves that when it is violated, scaling $\bar M$ does not rescue the result.

Robustness to Dual-Use and the Role of the Orthogonal Regime

Capability and alignment are not canonically distinguishable: interpretability research is dual-use (Bostrom 2002; CAIF funding principles), and the scalar $\alpha$ in the pure form of §3 silently absorbs the share of the fund that is nominally directed into alignment but actually shifts $\mu$.

The decomposition (5.0) fixes $\rho \in [0,1]$ as the correlation of the funded portfolio with the first principal component of the "model × benchmark" space in the sense of Ren et al. (2024, Safetywashing). At $\rho = 0$ the portfolio is orthogonal to the capabilities component (pure differential progress; Hendrycks & Mazeika 2022). As $\rho \to 1$, the mechanism converts cooperation surplus directly into drift and is strictly worse than inaction.

Substituting (5.0) into condition (P5‴), now with explicit $\mu = \mu_0 + \kappa\rho\, r\bar M$:

$$\alpha(1-\rho)\,r\bar M > \frac{L(\mu_0 + \kappa\rho\, r\bar M)}{\tau\Gamma},$$

whence after regrouping

$$r\bar M\big[\tau\Gamma\alpha(1-\rho) - L\kappa\rho\big] > L\mu_0. \tag{6.1}$$

For the existence of $\bar M > 0$ satisfying (6.1), the coefficient in brackets must be positive:

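$$\tau\Gamma\alpha(1-\rho) > L\kappa\rho \iff \rho < \rho_{\text{crit}} := \frac{\tau\Gamma\alpha}{\tau\Gamma\alpha + L\kappa}. \tag{6.2}$$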

This is the threshold that operationalizes (A4). Substantively, $\rho_{\text{crit}}$ depends on three dimensionless ratios and one dimensional ratio:

Theorem 3 (non-rescuability by scaling). Under (A1)–(A5) in the myopic safety approximation:

(a) If $\rho < \rho_{\text{crit}}$, there exists $\bar M_0(\rho) < \infty$ such that (P5′) holds for all $\bar M > \bar M_0(\rho)$, with

$$\bar M_0(\rho) = \frac{L\mu_0}{r\big[\tau\Gamma\alpha(1-\rho) - L\kappa\rho\big]}. \tag{6.3}$$

(b) If $\rho \ge \rho_{\text{crit}}$, (P5′) fails for every $\bar M > 0$. Increasing the matchmaker's throughput does not restore the condition.

(c) $\bar M_0$ is strictly increasing in $\rho$:

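$$\frac{\partial \bar M_0}{\partial \rho} = \frac{L\mu_0(\tau\Gamma\alpha + L\kappa)}{r\big[\tau\Gamma\alpha(1-\rho) - L\kappa\rho\big]^2} > 0.$$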

The only restoration lever is reduction of ρ.

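Proof. (b) For $\rho \ge \rho_{\text{crit}}$ the coefficient $\tau\Gamma\alpha(1-\rho) - L\kappa\rho \le 0$, so the left-hand side of (6.1) is non-positive for any $\bar M > 0$, while the right-hand side $L\mu_0 > 0$; condition (P5′) fails. (a) For $\rho < \rho_{\text{crit}}$ the coefficient is strictly positive, and (6.1) holds for $\bar M > \bar M_0(\rho)$ from (6.3). (c) Direct differentiation.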
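Theorem 3 reduces to a sign check on the bracket in (6.1). A minimal sketch under the same myopic assumptions, with hypothetical parameter values, computes $\rho_{\text{crit}}$ from (6.2) and $\bar M_0(\rho)$ from (6.3). It confirms that (6.1) holds just above $\bar M_0$, that $\bar M_0$ rises with $\rho$, and that at $\rho_{\text{crit}}$ no tested throughput satisfies (6.1).

```python
import math


def bracket(rho, tau, Gamma, alpha, L, kappa):
    """Coefficient tau*Gamma*alpha*(1 - rho) - L*kappa*rho in (6.1)."""
    return tau * Gamma * alpha * (1.0 - rho) - L * kappa * rho


def rho_crit(tau, Gamma, alpha, L, kappa):
    """(6.2): rho_crit = tau*Gamma*alpha / (tau*Gamma*alpha + L*kappa)."""
    return tau * Gamma * alpha / (tau * Gamma * alpha + L * kappa)


def M0(rho, r, mu0, tau, Gamma, alpha, L, kappa):
    """(6.3): minimal throughput, or infinity when the bracket is non-positive."""
    b = bracket(rho, tau, Gamma, alpha, L, kappa)
    return L * mu0 / (r * b) if b > 0 else math.inf


def condition_61(M_bar, rho, r, mu0, tau, Gamma, alpha, L, kappa):
    """(6.1): r * M_bar * bracket > L * mu0."""
    return r * M_bar * bracket(rho, tau, Gamma, alpha, L, kappa) > L * mu0


# Hypothetical parameter values, chosen only for illustration.
params = dict(tau=1.0, Gamma=1.2, alpha=0.01, L=5.0, kappa=0.002)
r, mu0 = 2.0, 0.08
rc = rho_crit(**params)
print("rho_crit:", rc)
for rho in (0.0, 0.25, 0.5):
    print(rho, M0(rho, r, mu0, **params))
```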

Theorem 3 is formulated in the myopic safety approximation, and its transfer to the strict hazard-rate form qualitatively changes the picture of the asymptotics in $\bar M$. In the hazard-rate form with $\lambda_0 = 0$ we have $T_\lambda \approx \sqrt{2L/[\alpha(1-\rho)\,r\bar M]} = O(\bar M^{-1/2})$ (a square-root reduction due to the quadratic accumulation $\Lambda(t) = \int \lambda(s)\,ds$), while $T_V = \Gamma/(\mu_0 + \kappa\rho\, r\bar M) = O(\bar M^{-1})$ when $\kappa\rho > 0$. Then

$$T_\lambda / T_V = \Theta(\bar M^{1/2}) \to \infty \quad \text{as } \bar M \to \infty,$$

that is, in the strict form, increasing matchmaker throughput asymptotically cannot secure (P5′) when $\rho > 0$; on the contrary, as $\bar M \to \infty$ the fund falls ever further behind in relative terms, with $T_\lambda$ exceeding $T_V$ by a growing factor. Dual-use leakage in (5.0) scales linearly with $\bar M$ through the drift term $\kappa\rho\, r\bar M$, whereas safety accumulation through the integrated $\Lambda(t)$ gives only a sublinear reduction in $T_\lambda$ (early $\lambda(s)$ are small and drag the average down), so at large $\bar M$ leakage outpaces accumulation.
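The asymptotic claim can be checked numerically with the two closed forms quoted above: $T_\lambda \approx \sqrt{2L/[\alpha(1-\rho) r\bar M]}$ (hazard-rate form, $\lambda_0 = 0$) and $T_V = \Gamma/(\mu_0 + \kappa\rho r\bar M)$. With the hypothetical parameters in this sketch, the fund wins at low throughput ($T_\lambda/T_V < 1$), loses at high throughput, and quadrupling $\bar M$ roughly doubles the ratio once leakage dominates $\mu_0$.

```python
import math


def T_lambda_hazard(M_bar, L, alpha, rho, r):
    """Hazard-rate form with lambda_0 = 0: T_lambda = sqrt(2 L / (alpha (1 - rho) r M_bar))."""
    return math.sqrt(2.0 * L / (alpha * (1.0 - rho) * r * M_bar))


def T_V_leaky(M_bar, Gamma, mu0, kappa, rho, r):
    """T_V = Gamma / (mu0 + kappa * rho * r * M_bar): leakage speeds up the capability leader."""
    return Gamma / (mu0 + kappa * rho * r * M_bar)


def ratio(M_bar, L=5.0, alpha=0.01, rho=0.2, r=2.0, Gamma=1.2, mu0=0.08, kappa=0.002):
    """T_lambda / T_V; values above 1 mean (P5') fails."""
    return T_lambda_hazard(M_bar, L, alpha, rho, r) / T_V_leaky(M_bar, Gamma, mu0, kappa, rho, r)


for M in (1e2, 1e4, 1e6):
    print(M, ratio(M), ratio(4 * M) / ratio(M))
```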

Substantively, the practical conclusion is preserved, and scaling $\bar M$ as a lever loses on two independent counts:

In essence, control over $\rho$ becomes the only reliable lever, which turns the matchmaker calibration problem into a problem of controlling $\rho$. Control is operationalized as a disbursement mandate: M disburses $R(t)$ only into projects with measured $\hat\rho_{\text{proj}} \le \bar\rho$, where $\bar\rho < \rho_{\text{crit}}$ is a threshold chosen with a margin and $\hat\rho$ is the correlation of the funded work with the capabilities component according to Ren et al. (2024). This turns Bostrom's differential-progress criterion from an intuitive directive into a falsifiable metric with per-grant verification. The effectiveness of the gate depends on how publicly observable $\hat\rho$ is and how verifiable disbursement decisions are, i.e., on institutional structure.
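The disbursement mandate amounts to a filter over candidate projects. A minimal sketch, assuming each project carries a measured $\hat\rho$ and a requested amount. The `Project` record, its fields, the sample projects, and the amount-weighted $\rho$ summary are hypothetical illustrations, not part of Ren et al. (2024); the weighted average is only a simple proxy for portfolio-level correlation.

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical grant record."""
    name: str
    rho_hat: float  # measured correlation with the capabilities component
    amount: float   # requested disbursement from R(t)


def gate(projects, rho_bar, rho_crit):
    """Fund only projects with rho_hat <= rho_bar, where rho_bar is set below rho_crit."""
    if not rho_bar < rho_crit:
        raise ValueError("rho_bar must be set with a margin below rho_crit")
    return [p for p in projects if p.rho_hat <= rho_bar]


def weighted_rho(projects):
    """Amount-weighted average rho_hat of a funded portfolio (0 if nothing is funded)."""
    total = sum(p.amount for p in projects)
    return sum(p.rho_hat * p.amount for p in projects) / total if total > 0 else 0.0


candidates = [
    Project("A", 0.05, 3.0),
    Project("B", 0.30, 5.0),
    Project("C", 0.62, 4.0),
]
funded = gate(candidates, rho_bar=0.35, rho_crit=0.55)
print([p.name for p in funded], weighted_rho(funded))
```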

The operational gate shifts $\rho$ but does not drive it to zero. As of 2026 there is no canonical criterion of differential interpretability: the safety value of interpretability remains openly disputed, and the symmetry between detecting jailbreaks and constructing them has not been operationalized. Part of the funded portfolio therefore has a lower bound $\rho_{\min} > 0$ that available methods cannot reduce. At first glance this is a critical vulnerability, since (6.1)–(6.3) are derived for $\rho < \rho_{\text{crit}}$, and the subspace with $\rho_{\min}$ falls outside the scope of Theorems 1–3. The substantive question, however, is not whether the subspace falls out but relative to what benchmark the loss is assessed. The improvement bound in (6.2) is defined relative to the idealized point $\rho = 0$, and relative to it $\rho_{\min}$ does indeed bound what is achievable; but in reality $\rho = 0$ is not the empirical counterfactual. As of 2026, alignment research is financed by philanthropic foundations, government grants, and labs' internal budgets without unified $\hat\rho$-filtering; funds are allocated across projects with unmeasured correlation to the capabilities component, and part of this funding empirically works in the dual-use direction, which Ren et al. (2024) cite as the motivation for their metric. The relevant benchmark for evaluating the mechanism is therefore the status quo.

Relative to the status quo, the subspace with $\rho_{\min}$ ceases to be a subspace of pure loss. In the worst case, with a generously set $\bar\rho$ and a weak measurement protocol, the mechanism reproduces the existing distribution of alignment funding: $\dot\lambda$ grows roughly as it does under current philanthropic channels. In the typical case, per-grant measurement of $\hat\rho$ provides finer resolution than the uncoordinated decisions of individual funds, and allocation improves monotonically relative to the baseline. What the formal part records as a fallback to exogenous interventions ($D_{\text{private}}$, verification) on the subspace with $\rho_{\min}$ in fact means a fallback to the same instruments that already operate on this subspace in the status quo: the guarantee of Theorems 1–3 fails there, but the outcome is no worse than it would be without the mechanism.

The apparent asymmetry, that undirected alignment funding is credited as a plus in the current literature, while in our analysis the same channel figures as a constraint, is an artifact of the reference point. The same flow of funds receives three different signs depending on the benchmark: relative to inaction unconditionally positive, relative to the ideal ρ=0 bounded above, relative to the status quo a Pareto improvement, which turns this result from a formal one into an operational one.

Regulator Governance

The operational gate requires institutional conditions:

Transparency works as a commitment device (Schelling 1960; Kydland & Prescott 1977): all flows $R(t)$, all allocation decisions, and all participant-admission decisions are public, so any deviation from the declared mandate is immediately observable. This creates necessary but not sufficient conditions for resolving the Trust AI Regulation problem (Alalawi et al. 2026). Transparency makes deviation observable, but the coordinated response of participants (exit from the platform) is itself subject to a collective action problem, since a lab that keeps receiving cooperation surplus has an individual incentive to stay even after a public deviation by the regulator. Sufficiency comes from the same structural condition that suppresses bypass in (P4): small $N$, repeated interaction with a high effective $\gamma$, and focal coordination on exclusionary behavior (Schelling 1960; Bernstein 1992; Greif 1993). Transparency of $\hat\rho$ plays a dual role here: besides controlling the orthogonal regime, it turns the exit decision from a private choice into a publicly observable signal, making the coordinated response an equilibrium through the same mechanism that supports (P4). Under these conditions the de-escalating equilibrium is attainable. The bypass-suppression condition (P4) and the publicity of $\hat\rho$ combine into a single signal: "the mechanism works in its declared regime" is equivalent to "$\hat\rho \le \bar\rho$ is publicly confirmed".

M's decisions on calibrating $r$, allocating $R(t)$, and admitting participants are made through a delegative-voting procedure. The literature on liquid democracy (Kahng, Mackenzie & Procaccia 2021; Brill & Talmon 2018; Christoff & Grossi 2017) gives formal conditions for its stability. Participants differ in their expertise in alignment relative to capabilities; the delegative structure lets peripheral participants delegate decisions to experts, and transparency of delegations preserves accountability. Among governance classes (liquid democracy; futarchy, Hanson 2013; quadratic voting, Weyl 2017; sortition, Landemore 2020), delegative structures form a family of focal Schelling equilibria under transparency and voluntary participation. The same expert delegation is the institutional carrier of the $\hat\rho$ evaluation. It is useful when admitting new participants and for forming the most accurate possible estimates of $\hat\rho$ by aggregating diverse expert judgments (Hong & Page 2004; Surowiecki 2004), which is strictly better than the status quo of uncoordinated decisions by individual funds. As $\bar\rho$ is narrowed while measurement standards consolidate, the distribution of funds improves monotonically relative to the baseline; such a structure also facilitates external and internal audits and lends itself to benchmarking.

Direct institutional analogs support feasibility. Self-regulatory organizations in the financial sector (DeMarzo, Fishman & Hagerty 2005) implement the same structure of financing and legitimation without coercion. Patent pools (Lerner & Tirole 2004) offer an analog for the problem of joint IP use, with the caveat that Lerner & Tirole identify participants' direct interest in blocking the others as a condition for pool failure. In our case this condition is mitigated by (P4) and the orthogonal regime: blocking others through M requires bypassing the gate, which is observable under transparency and entails exclusion from the subsequent flow of deals. A carbon tax with revenue recycling (Bovenberg & van der Ploeg 1994) is the analog of the double dividend: the fee simultaneously internalizes the externality and finances a public good.

Governance closes the gap between the formal mechanism and its operational implementation. Under a transparent delegative structure, the gate $\hat\rho \le \bar\rho$ operates, (P4) holds, and the central results of §5–§6 acquire an institutional carrier. It remains to verify the robustness of the mechanism to two classes of threats unrelated to governance: nonlinear capability acceleration and the transition to a full-information regime.

Non-stationary Capability and Information Regime

Under a correctly functioning orthogonal regime and governance, two classes of residual threats require separate analysis.

We address both classes and show that the first, nonlinear capability acceleration, partially self-corrects via Pigouvian rate recalibration. Capability non-stationarity decomposes into two channels, and after recalibration only the volatility channel remains as an open residual. The second class, the transition to a full-information regime, adds no independent fragility on top of the first.

Pigouvian recalibration also carries an internal tension. Under continuous recalibration $r(\lambda(t))$ from (P3), the safety dynamics become a self-damping nonlinear ODE:

$$\dot\lambda(t) = \alpha M(t)\,\frac{2(k-1)D}{\mu}\,\lambda(t)\,e^{-\lambda(t)\tau},$$

where the factor $e^{-\lambda\tau}$ chokes growth at large $\lambda$. The linear growth of §3 holds only under fixed $r$. This both aggravates the volatility residual (a decelerating $\lambda$ is less able to outpace accelerating capability) and provides a constructive lever: fixing $r$ at the project level, without downward recalibration, preserves linear growth at the cost of moderate over-funding of alignment at late stages, an error in the benign direction. Thus, the analysis of non-stationary capability progress reduces to the volatility channel.
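A minimal numerical sketch of the two regimes. It assumes the recalibrated rate has the form $r(\lambda) = 2(k-1)D\lambda e^{-\lambda\tau}/\mu$ implied by the ODE above, a constant matchmaker throughput $M$, and forward-Euler integration; all parameter values are hypothetical. Freezing $r$ at its peak value $r(1/\tau)$ keeps it at or above $r(\lambda)$ for every $\lambda$, so the fixed-rate path grows linearly and is never overtaken by the recalibrated path, which decelerates.

```python
import math


def pigouvian_rate(lam: float, D: float, k: float, mu: float, tau: float) -> float:
    """Assumed recalibrated rate r(lambda) = 2(k-1) D lambda exp(-lambda tau) / mu."""
    return 2.0 * (k - 1.0) * D * lam * math.exp(-lam * tau) / mu


def simulate_lambda(alpha, M, D, k, mu, tau, lam0, T, dt, r_fixed=None):
    """Forward-Euler path of lambda(t) up to time T.

    r_fixed=None: continuous recalibration, dlam/dt = alpha*M*r(lam) (self-damping).
    r_fixed=c:    frozen rate, dlam/dt = alpha*M*c (linear growth).
    """
    lam = lam0
    for _ in range(int(round(T / dt))):
        r = pigouvian_rate(lam, D, k, mu, tau) if r_fixed is None else r_fixed
        lam += alpha * M * r * dt
    return lam


# Hypothetical parameters; r is frozen at its peak, reached at lambda = 1/tau.
params = dict(alpha=1.0, M=1.0, D=1.0, k=2.0, mu=1.0, tau=1.0)
lam0, T, dt = 1.0, 5.0, 1e-3
r_peak = pigouvian_rate(1.0 / params["tau"], params["D"], params["k"],
                        params["mu"], params["tau"])
lam_recal = simulate_lambda(**params, lam0=lam0, T=T, dt=dt)
lam_fixed = simulate_lambda(**params, lam0=lam0, T=T, dt=dt, r_fixed=r_peak)
print(f"recalibrated: {lam_recal:.3f}, fixed r: {lam_fixed:.3f}")
```

Freezing at the peak is only one illustrative project-level choice; any frozen value gives linear growth, and the late-stage difference $r_{\text{fixed}} - r(\lambda(t))$ corresponds to the over-funding of alignment that the text calls an error in the benign direction.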

Verification systems can move the game into a full-information regime, which in the endgame accelerates the sprint to deployment (breakout instability). The naive reading, "full information requires a stricter $r$ and an earlier bootstrap", gets the sign of the effect wrong. By the transparency requirement of §7, the state of the fund $(R(t), \lambda(t))$ is public by construction in every information regime: it is not a private signal revealed by verification but common knowledge by definition. Rivals' capability leads $V_i$, by contrast, are private without verification and public with it. The transition to full information therefore removes rivals' private informational advantage without changing the fund's position, so the fund's relative informational position improves monotonically with growing openness.

Mechanically, this sharpens the inversion of the leader's advantage described in §5. Under incomplete information, the collapse of the preemption incentive once $\lambda(t) \ge \lambda^*$ is delayed and blurred by strategic uncertainty: a player keeps racing without common knowledge that rivals will not deploy unsafely. With $\lambda(t)$ public, the moment of crossing $\lambda^*$ becomes common knowledge, and the preemption game unwinds by iterated dominance, immediately and in a coordinated way. Full information accelerates breakout, but it equally accelerates the coordinated stand-down on which §5 is built; the information regime therefore adds no independent fragility.

After these two corrections, the residual reduces to the conjunction of the timing residual of the volatility channel and the interpretability residual of §6: common knowledge of $\lambda(t)$ is useful exactly to the extent that it is common knowledge that $R(t)$ buys safety rather than capability, that is, to the extent that $\hat\rho$ is publicly credible.

Open Problems

Government programs (Manhattan-style national AI initiatives of the US and China) can in principle bypass a voluntary regulator. This is analogous to the problem of international climate negotiations (Barrett 2003) and to the resistance of sovereign actors to soft-law regimes. There is, however, a substantial mitigation specific to the AI domain: even in the absence of M, a direct match between major government programs is highly unlikely under current political conditions. Export restrictions, compute controls, outbound investment controls (Executive Order 14105 and analogs), CFIUS reviews, and reciprocal Chinese restrictions turn any R&D cooperation into an act requiring multi-level approval with a predictably negative outcome. The public rhetoric of both sides frames AI development as geopolitical competition with a zero-sum component. State actors are thus de facto already in a non-cooperation equilibrium with respect to each other: the potential synergy multiplier is already lost to them, but it is not converted into alignment either. The threat of bypass by state actors is therefore smaller than a naive reading suggests: they do not bypass M while keeping the cooperation gain; they are outside cooperation regardless of M. A more realistic picture treats M as a network superstructure over those participants for whom inter-bloc matching is possible without violating export controls. Within a single bloc (US: OpenAI, Anthropic, Google DeepMind, xAI, Meta AI), state pressure or direct financing may induce labs to ignore M, especially if a liability shield is offered in exchange for coordination within a federal structure. This direction therefore remains open. Actors with vast unilateral resources may also independently accelerate capability drift, potentially exceeding the preemption threshold.

Calibration of the Pigouvian rate requires estimates of $\lambda e^{-\lambda\tau}/\mu$ and $D$, for which there is no canonical metric. This is analogous to the problem of estimating the social cost of carbon (Nordhaus 2017; Stern 2007). Robust calibration is possible through a principles-based approach with public revision. Calibration of $r$ is coupled with the estimate of $\rho_{\text{crit}}$ from §6: the orthogonal-regime threshold $\bar\rho$ must be revised together with $r$. The operational formula (6.3) gives the minimum matchmaker throughput for given parameters:

$$\bar M_0(\rho) = \frac{L\mu_0}{r\,[\tau\Gamma\alpha(1-\rho) - L\kappa\rho]}, \qquad \rho < \rho_{\text{crit}}, \tag{9.1}$$

or equivalently normalized by baseline drift:

$$\frac{r\bar M_0}{\mu_0} = \frac{L}{\tau\Gamma\alpha(1-\rho) - L\kappa\rho}. \tag{9.2}$$
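A small sketch of (9.1) and (9.2) as functions, convenient for sanity-checking a calibration. The bracket in (9.1) is positive exactly when $\rho < \tau\Gamma\alpha/(\tau\Gamma\alpha + L\kappa)$; the sketch assumes this bound is the $\rho_{\text{crit}}$ of §6, and all parameter values below are hypothetical. As $\rho$ approaches the bound the required throughput diverges, and for larger $\rho$ (9.1) yields no positive finite throughput.

```python
def rho_crit(tau: float, Gamma: float, alpha: float, L: float, kappa: float) -> float:
    """rho at which the bracket tau*Gamma*alpha*(1-rho) - L*kappa*rho in (9.1) vanishes."""
    return tau * Gamma * alpha / (tau * Gamma * alpha + L * kappa)


def min_throughput(rho, r, mu0, L, tau, Gamma, alpha, kappa):
    """(9.1): minimum matchmaker throughput M0_bar(rho), defined only for rho < rho_crit."""
    bracket = tau * Gamma * alpha * (1.0 - rho) - L * kappa * rho
    if bracket <= 0.0:
        raise ValueError("rho >= rho_crit: (9.1) has no finite positive solution")
    return L * mu0 / (r * bracket)


# Hypothetical calibration values.
p = dict(r=0.1, mu0=0.05, L=3.0, tau=2.0, Gamma=1.5, alpha=0.5, kappa=0.2)
print(rho_crit(p["tau"], p["Gamma"], p["alpha"], p["L"], p["kappa"]))  # about 0.714
for rho in (0.0, 0.3, 0.6, 0.7):
    m = min_throughput(rho, **p)
    normalized = p["r"] * m / p["mu0"]  # left-hand side of (9.2)
    print(rho, m, normalized)
```

The normalized column reproduces (9.2), so the same function serves both forms; only $r$ and $\mu_0$ separate them.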

For empirical calibration, the following are required: $L = \ln[D(1-2S)/(IS)]$ from scenario estimates of $D$, $I$, $S$; $\Gamma = \ln(V_P/V_{t_0})$ from the current market capitalization of frontier labs and (5.7); $\tau$ from researcher surveys; $\alpha$ from the historical effect of alignment investments; $\kappa$ from the average capability spillover per dollar of interpretability research (Ren et al. 2024); $\mu_0$ from compute scaling laws; and $\rho$ from the distribution of $\hat\rho$ across the portfolio, auditable per grant. In the endgame regime, as $\lambda(t) \to \lambda^*$, the Pigouvian rate takes the form $r_{\text{end}} = 2(k-1)\lambda^* IS/[(1-2S)\mu]$ via the substitution $e^{-\lambda^*\tau} = e^{-L} = IS/[D(1-2S)]$. Through this relation $D$ drops out, and $r_{\text{end}}$ depends only on $\lambda^*$, $I$, $S$, $k$, $\mu$. This provides a calibration point; intermediate values are interpolated via (P3) with time-varying $\lambda(t)$. The minimum subsidy $R(0)$ ensuring the transition to a bypass-protected regime is obtained by substituting $\bar M = \bar M_0(\rho)$ into the computation of the expected volume of matched deals needed to reach the exclusion threshold of (P4); it is thus a sub-problem of the calibration of $\bar M_0$.
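A numerical check of the endgame substitution, under two assumptions not stated explicitly in this section: the recalibrated rate has the form $r(\lambda) = 2(k-1)D\lambda e^{-\lambda\tau}/\mu$ implied by the self-damping ODE, and the threshold satisfies $\lambda^*\tau = L$, which is what replacing $e^{-\lambda^*\tau}$ by $e^{-L}$ requires. Holding $\lambda^*$ fixed and varying $D$ (with $\tau = L/\lambda^*$ adjusting accordingly), the full rate and $r_{\text{end}}$ should coincide and not depend on $D$. Parameter values are hypothetical; the sketch requires $0 < S < 1/2$ and $D(1-2S) > IS$ so that $L > 0$.

```python
import math


def loss_log(D: float, I: float, S: float) -> float:
    """L = ln[D(1-2S)/(IS)]; positive when 0 < S < 1/2 and D(1-2S) > IS."""
    return math.log(D * (1.0 - 2.0 * S) / (I * S))


def pigouvian_rate(lam, D, k, mu, tau):
    """Assumed recalibrated rate r(lambda) = 2(k-1) D lambda exp(-lambda tau) / mu."""
    return 2.0 * (k - 1.0) * D * lam * math.exp(-lam * tau) / mu


def r_end(lam_star, I, S, k, mu):
    """Endgame rate 2(k-1) lambda* I S / [(1-2S) mu]; D does not appear."""
    return 2.0 * (k - 1.0) * lam_star * I * S / ((1.0 - 2.0 * S) * mu)


# Hypothetical scenario values.
I, S, k, mu, lam_star = 2.0, 0.1, 3.0, 0.2, 0.5
for D in (50.0, 500.0, 5000.0):
    tau = loss_log(D, I, S) / lam_star  # assumed threshold condition lambda* tau = L
    print(D, pigouvian_rate(lam_star, D, k, mu, tau), r_end(lam_star, I, S, k, mu))
```

Each row prints the same pair of values for every $D$, which is the numerical content of "$D$ drops out": scenario uncertainty in $D$ moves the timing $\tau$ but not the endgame rate.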

Theorems 1–3 are stated in the static formulation of thresholds. In the full option game with time-varying $\lambda(t)$, the thresholds are given by an HJB problem with non-constant coefficients; the heuristic argument gives a plausible sign, but a rigorous generalization of value-function convexity (Grenadier 2002) to the non-stationary strategic case is not obtained in the present work.

Moreover, the two irreducible robustness residuals compound; the orthogonal regime (formula 6.2) is the only construction in this work that breaks this feedback, and its effectiveness is bounded above precisely by $\rho_{\min}$. We also consolidate here the early-stage binding constraint, which is separate from rate calibration: at early stages, what binds is not the size of $R(0)$ but finding a seed-funded portfolio within the orthogonal regime, $\hat\rho < \rho_{\text{crit}}$. A well-capitalized regulator outside the regime is strictly worse than inaction (Theorem 3), whereas a poorly capitalized one within it remains net-positive. The philanthropic seed should therefore be optimized for low $\hat\rho$, not merely for volume.

Conclusion

We have shown that the proposed mechanism can de-escalate the AGI race at several levels, from the individual benefit of particular labs to a global change in safety dynamics.

At the micro level, we have shown that under characteristic capitalization parameters, participation in the matchmaker's work strategically dominates both direct cooperation and non-cooperation. This holds under the assumptions that the costs of establishing a direct deal, agreeing on IP sharing, and verifying it bilaterally are high under current conditions, and that the benefit of joint use of compute and data substantially outweighs the levied fee. To ensure that the mechanism does not undermine public safety, we derived the optimal size of the Pigouvian fee. It is calibrated so that the collected funds exactly cover the additional catastrophe risk arising from the acceleration of progress under cooperation. Moreover, under repeated interaction, labs will not attempt to bypass the matchmaker, since the risk of exclusion from future profitable deals makes honest play strategically dominant.

At the macro level, we proved that, in the static formulation of thresholds, the Suicide Region closes in finite time, and we bounded from above the probability of premature entry into the region before it closes. To minimize the probability of a technological sprint, the mechanism must be launched before capability progress becomes too rapid.

Sources