Effective planning horizon class

#TODO move this to math section after SPAR end.

Let JR:ΠR be a return function, π=argmaxπJR(π), the sequence {πk}kN is fixed, and JR(π)JR(π0). We define:

Heff(R)=min{k:JR(π)JR(πk)JR(π)JR(π0)ε}

Then we prove that:

(ε>0:Heff(R)=Heff(R))a0,bR:JR(π)=aJR(π)+bπ{π,π0}{πk}kN

Lemma

Let JR(π)JR(π0)0. Then:

(ε>0:Heff(R)=Heff(R))kN:f(k,R)=f(k,R)

where f(k,R):=JR(π)JR(πk)JR(π)JR(π0).

Proof

(): Suppose k:f(k,R)=f(k,R). Then ε:{k:f(k,R)ε}={k:f(k,R)ϵ}Heff(R)=Heff(R).

(): Assume there exists k0 such that f(k0,R)f(k0,R). Without loss of generality, let f(k0,R)<f(k0,R) (the case > is symmetric). Choose:

ε0(f(k0,R),f(k0,R))

Then k0 is included in the set {k:f(k,R)ε0}, but not in {k:f(k,R)ε0}. Consequently:

Heff(R)k0<Heff(R)ε=ε0:Heff(R)Heff(R)

From necessity and sufficiency, the lemma is proven.


Now, based on the lemma, it suffices to prove:

(k:f(k,R)=f(k,R)) a0,b:JR(π)=aJR(π)+b π{π,π0}{πk}kN

Let's introduce the notation: J:=JR, J:=JR, Δk:=J(π)J(πk), Δ0:=J(π)J(π0)0.

(): Suppose J(π)=aJ(π)+b for all π{π,π0}{πk}, with a0. Then for any k:

J(π)J(πk)=(aJ(π)+b)(aJ(πk)+b)=a(J(π)J(πk))=aΔkJ(π)J(π0)=a(J(π)J(π0))=aΔ0

Since a0, the denominator aΔ00, and:

f(k,R)=aΔkaΔ0=ΔkΔ0=f(k,R)

(): Suppose kN:f(k,R)=f(k,R). This means:

(1)J(π)J(πk)J(π)J(π0)=J(π)J(πk)J(π)J(π0)

Where J(π)J(π0), since otherwise f(k,R) would be undefined for all k, contradicting the assumption k:f(k,R)=f(k,R).

Let's define:

a:=J(π)J(π0)J(π)J(π0)b:=J(π)aJ(π)

From (1), by multiplying both sides by aΔ0:

J(π)J(πk)=a(J(π)J(πk))J(πk)=J(π)aJ(π)+aJ(πk)J(πk)=aJ(πk)+b

We also separately check the cases with π and π0:
aJ(π)+b=aJ(π)+J(π)aJ(π)=J(π)
From the definition of a, we have J(π0)=J(π)aΔ0=J(π)a(J(π)J(π0))=aJ(π0)+b.


Thus, from necessity and sufficiency, the relation J(π)=aJ(π)+b is proven for all π{π,π0}{πk}kN, where a0, meaning J is an affine transformation of J on the specified set.

Since the proof establishes that ε>0:Heff(R)=Heff(R) if and only if JR is an affine transformation of JR (a0) on π,π0πkkN, and STARC-equivalence requires JR=aJR+b with a>0 on the entirety of Π, it follows that STARC-equivalence is a strictly stronger condition. In particular, [R]STARC[R]Heff, because any a>0 is a special case of a0, and the condition on the entirety of Π implies the condition on a subset.