SLIDE 1

Motivating Benford’s law by rotating a circle

Motivating Benford’s law by rotating a circle 1 / 6

SLIDE 2

Consider a set of naturally-occurring data

Motivating Benford’s law by rotating a circle 2 / 6

SLIDE 3

Consider a set of naturally-occurring data—river lengths, for instance:

Motivating Benford’s law by rotating a circle 2 / 6

SLIDE 4

Consider a set of naturally-occurring data—river lengths, for instance: Benford’s law is the probability distribution that describes the first-digit frequency of a data set like this.

Motivating Benford’s law by rotating a circle 2 / 6

SLIDE 5

Consider a set of naturally-occurring data (river lengths, for instance). Benford’s law is the probability distribution that describes the first-digit frequency of a data set like this:

$$p(\ell) = \log_{10} \frac{\ell + 1}{\ell}, \qquad \ell \in \{1, \dots, 9\}.$$
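As a quick numerical illustration of the formula above, the following minimal sketch tabulates the nine Benford probabilities. The names `benford_probability` and `benford` are chosen here for illustration and are not part of the presentation.

```python
import math


def benford_probability(digit: int) -> float:
    """Benford probability p(l) = log10((l + 1) / l) for a first digit l in 1..9."""
    if not 1 <= digit <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return math.log10((digit + 1) / digit)


# Table of probabilities for the digits 1, ..., 9 (illustrative helper).
benford = {digit: benford_probability(digit) for digit in range(1, 10)}

for digit, prob in benford.items():
    print(f"{digit}: {prob:.4f}")
```

The probabilities telescope to $\log_{10} 10 = 1$, and they decrease with the digit, so 1 is the most likely first digit.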

Motivating Benford’s law by rotating a circle 2 / 6

SLIDE 6

In this presentation, we will focus on the powers of 2: $\{2^k : k \in \mathbb{N}_0\}$.

Motivating Benford’s law by rotating a circle 3 / 6

SLIDE 7

In this presentation, we will focus on the powers of 2: $\{2^k : k \in \mathbb{N}_0\}$. We will show that Benford’s law governs their first-digit frequency.

Motivating Benford’s law by rotating a circle 3 / 6

SLIDE 8

In this presentation, we will focus on the powers of 2: $\{2^k : k \in \mathbb{N}_0\}$. We will show that Benford’s law governs their first-digit frequency. The heart of the argument relies on a theorem, and we need some definitions to understand it...

Motivating Benford’s law by rotating a circle 3 / 6

SLIDE 9

12. Probability Preserving Dynamical Systems

Much of our investigation of probability theory has revolved around the long term behavior of sequences of random variables. We continue this theme with a cursory look at ergodic theory. Roughly speaking, ergodic theorems assert that under certain stability and irreducibility conditions time averages converge to space averages. As usual, we begin with some definitions.

Definition. A sequence $X_0, X_1, \dots$ is said to be stationary if $(X_0, X_1, \dots) =_d (X_k, X_{k+1}, \dots)$ for all $k \in \mathbb{N}$. Equivalently, $X_0, X_1, \dots$ is stationary if for every $n, k \in \mathbb{N}_0$, we have $(X_0, \dots, X_n) =_d (X_k, \dots, X_{n+k})$.

We have already seen several examples of stationary sequences. For instance, i.i.d. sequences are stationary, and more generally, so are exchangeable sequences. Another example of a stationary sequence is a Markov chain $X_0, X_1, \dots$ started in equilibrium. To treat the general case, we introduce the following construct.

Definition. Given a probability space $(\Omega, \mathcal{F}, P)$, a measurable map $T : \Omega \to \Omega$ is said to be probability preserving if $P(T^{-1}A) = P(A)$ for all $A \in \mathcal{F}$, where $T^{-1}A = \{\omega \in \Omega : T\omega \in A\}$ denotes the preimage of $A$ under $T$. We say that the tuple $(\Omega, \mathcal{F}, P, T)$ is a probability preserving dynamical system. Iterates of $T$ and $T^{-1}$ are defined inductively by $T^0\omega = \omega$ and, for $n \ge 1$,

$$T^n = T \circ T^{n-1}, \qquad T^{-n} = T^{-1} \circ T^{-(n-1)} = (T^n)^{-1}.$$

* We use the inverse image in our definitions because $A \in \mathcal{F}$ does not necessarily imply that $TA \in \mathcal{F}$. Also, beware that some authors say that “$P$ is an invariant measure for $T$” rather than “$T$ preserves $P$.” Finally, observe that since the push-forward measure $T_*P = P \circ T^{-1}$ is equal to $P$, the change of variables formula shows that

$$\int_\Omega f \circ T \, dP = \int_\Omega f \, dT_*P = \int_\Omega f \, dP$$

for all $f$ for which the latter integral is defined.

If $X$ is a random variable on $(\Omega, \mathcal{F}, P)$ and $T : \Omega \to \Omega$ is probability preserving, then $X_n(\omega) = X(T^n\omega)$ defines a stationary sequence since for any $n, k \in \mathbb{N}$ and any Borel set $B \in \mathcal{B}^{n+1}$, if $A = \{\omega : (X_0(\omega), \dots, X_n(\omega)) \in B\}$, then

$$P((X_k, \dots, X_{n+k}) \in B) = P(T^{-k}A) = P(A) = P((X_0, \dots, X_n) \in B).$$

In fact, every stationary sequence taking values in a nice space can be expressed in this form: If $Y_0, Y_1, \dots$ is a stationary sequence of random variables taking values in a nice space $(S, \mathcal{S})$, then the Kolmogorov extension theorem gives a measure $P$ on $(S^{\mathbb{N}_0}, \mathcal{S}^{\mathbb{N}_0})$ such that the coordinate projections $X_n(\omega) = \omega_n$ satisfy $(X_0, X_1, \dots) =_d (Y_0, Y_1, \dots)$. If we let $X = X_0$ and $T = \theta$ (the shift map), then $T$ is probability preserving and $X_n(\omega) = \omega_n = (\theta^n\omega)_0 = X(T^n\omega)$.

SLIDE 10

In light of the preceding observations, we will assume henceforth that we are working with stationary sequences of the form $X_n(\omega) = X(T^n\omega)$ for some $(S, \mathcal{S})$-valued random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ and some probability preserving map $T : \Omega \to \Omega$. For subsequent results, we will need a few more definitions.

Definition. Let $(\Omega, \mathcal{F}, P, T)$ be a probability preserving dynamical system. We say that an event $A \in \mathcal{F}$ is invariant if $T^{-1}A = A$ up to null sets, that is, $P(T^{-1}A \,\triangle\, A) = 0$ where $\triangle$ denotes the symmetric difference, $E \triangle F = (E \setminus F) \cup (F \setminus E)$. A random variable $X$ is called invariant if $X \circ T = X$ a.s.

It is left as an exercise to show

Proposition 12.1. $\mathcal{I} = \{A \in \mathcal{F} : A \text{ is invariant}\}$ is a sub-$\sigma$-field of $\mathcal{F}$, and $X \in \mathcal{I}$ if and only if $X$ is invariant.

Definition. We say that $T$ is ergodic if for every invariant event $A \in \mathcal{I}$, we have $P(A) \in \{0, 1\}$.

Ergodicity is a kind of irreducibility requirement: $T$ is ergodic if $\Omega$ cannot be decomposed as $\Omega = A \sqcup B$ with $A, B \in \mathcal{I}$ and $P(A), P(B) > 0$. Ergodic maps are good at mixing things up in the sense that they don't fix nontrivial subsets. A useful test for ergodicity is given by

Proposition 12.2. $T$ is ergodic if and only if every invariant $X : \Omega \to \mathbb{R}$ is a.s. constant.

Proof. Suppose that $T$ is ergodic and $X \circ T = X$ a.s. For any $a \in \mathbb{R}$, the set $E_a = \{\omega \in \Omega : X(\omega) < a\}$ is clearly invariant since $T^{-1}E_a = \{\omega : X(T\omega) < a\} = E_a$ a.s. Ergodicity implies that $P(E_a) \in \{0, 1\}$ for all $a \in \mathbb{R}$, hence $X$ is a.s. constant. Conversely, suppose that the only invariant random variables are a.s. constant, and let $A \in \mathcal{I}$. Then $1_A$ is an invariant random variable, and thus is a.s. constant, and we conclude that $P(A) \in \{0, 1\}$.

Note that the above proof also holds if we restrict our attention to classes of random variables containing the indicator functions, such as $X \in L^p(\Omega, \mathcal{F}, P)$.

* For the sake of clarity, we will sometimes use the notation of functions rather than random variables, but ultimately the two are the same.

Example 12.1 (Rotation of the Circle). Let $(\Omega, \mathcal{F}, P)$ be $[0, 1)$ with the Borel sets and Lebesgue measure. For $\alpha \in (0, 1)$, define $T_\alpha : \Omega \to \Omega$ by $T_\alpha x = x + \alpha \pmod 1$. $T_\alpha$ is clearly measurable and probability preserving.

* We consider $T_\alpha$ a rotation since $[0, 1)$ may be identified with $\mathbb{T} = \{z \in \mathbb{C} : |z| = 1\}$ via the map $x \mapsto e^{2\pi i x}$.

If $\alpha \in \mathbb{Q}$, then $\alpha = \frac{m}{n}$ for some $m, n \in \mathbb{N}$ with $(m, n) = 1$, so for any measurable $B \subseteq \left[0, \frac{1}{2n}\right)$ with $P(B) > 0$, the set

$$O(B) = \bigcup_{k=0}^{n-1} \left( B + \frac{k}{n} \pmod 1 \right)$$

is invariant with $P(O(B)) \in (0, 1)$, thus $T_\alpha$ is not ergodic. (Alternatively, the function $f(x) = e^{2\pi i n x}$ is nonconstant and invariant.)
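To make the rational case of Example 12.1 concrete, here is a small sketch using exact rational arithmetic. The choices $\alpha = 1/4$, $B = [0, 1/8)$, the two starting points, and the helper names are assumptions made for illustration only. The orbit of a rational rotation is periodic, so the time average of an indicator depends on where the orbit starts and need not equal the Lebesgue measure of the set.

```python
from fractions import Fraction


def rotate(x: Fraction, alpha: Fraction) -> Fraction:
    """One step of the circle rotation T_alpha(x) = x + alpha (mod 1)."""
    return (x + alpha) % 1


def time_average_of_indicator(x0, alpha, lo, hi, steps):
    """Fraction of the first `steps` orbit points T^k(x0) that land in [lo, hi)."""
    x, hits = x0, 0
    for _ in range(steps):
        if lo <= x < hi:
            hits += 1
        x = rotate(x, alpha)
    return Fraction(hits, steps)


alpha = Fraction(1, 4)                  # rational angle m/n with n = 4 (illustrative)
lo, hi = Fraction(0), Fraction(1, 8)    # B = [0, 1/(2n)), Lebesgue measure 1/8

# The orbit of 0 is {0, 1/4, 1/2, 3/4}; the orbit of 1/5 never enters B.
avg_from_zero = time_average_of_indicator(Fraction(0), alpha, lo, hi, steps=4000)
avg_from_fifth = time_average_of_indicator(Fraction(1, 5), alpha, lo, hi, steps=4000)
print(avg_from_zero, avg_from_fifth)
```

Neither time average equals the measure $1/8$ of $B$, which is consistent with $T_\alpha$ failing to be ergodic when $\alpha$ is rational.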

SLIDE 11

The Pointwise Ergodic Theorem (Birkhoff’s)

Suppose $T$ is probability preserving and ergodic. Then, for any $f \in L^1$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k(x) = E[f] \quad \text{a.s. and in } L^1.$$
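The statement "time averages converge to space averages" can be checked numerically. The sketch below assumes the circle rotation $T x = x + \alpha \pmod 1$ with the irrational angle $\alpha = \log_{10} 2$ (irrational rotations are ergodic for Lebesgue measure), and compares time averages with $\int_0^1 f(x)\,dx$ for two test functions. The starting point 0.3, the test functions, and the helper names are illustrative choices, not taken from the slides.

```python
import math

ALPHA = math.log10(2)  # irrational rotation angle (assumed example)


def orbit_point(x0: float, k: int, alpha: float = ALPHA) -> float:
    """k-th iterate T^k(x0) of the rotation x -> x + alpha (mod 1)."""
    return (x0 + k * alpha) % 1.0


def time_average(f, x0: float, n: int) -> float:
    """Birkhoff average (1/n) * sum_{k=0}^{n-1} f(T^k x0)."""
    return sum(f(orbit_point(x0, k)) for k in range(n)) / n


# Both test functions have space average 1/2 on [0, 1).
identity_avg = time_average(lambda x: x, x0=0.3, n=100_000)
cosine_avg = time_average(lambda x: math.cos(2 * math.pi * x) ** 2, x0=0.3, n=100_000)
print(identity_avg, cosine_avg)
```

Each orbit point is computed directly as $x_0 + k\alpha \pmod 1$ rather than by repeated addition, which keeps floating-point drift from accumulating over many steps.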

Motivating Benford’s law by rotating a circle 4 / 6

SLIDE 13

Intuitively,

The Pointwise Ergodic Theorem (Birkhoff’s)

Suppose $T$ is probability preserving and ergodic. Then, for any $f \in L^1$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k(x) = E[f] \quad \text{a.s. and in } L^1$$

is like

The Fundamental Theorem of Calculus

For any $f \in C^0$,

$$\lim_{h \to 0} \frac{1}{h} \int_{[x, x+h]} f(y) \, dy = \lim_{h \to 0} \frac{F(x + h) - F(x)}{h} = f(x).$$

Motivating Benford’s law by rotating a circle 5 / 6

SLIDE 15

Let $N_\ell(n) = \{\, 2^0, 2^1, \dots, 2^{n-1} : \ell \text{ is the first digit} \,\}$.

Motivating Benford’s law by rotating a circle 6 / 6

SLIDE 16

Let $N_\ell(n) = \{\, 2^0, 2^1, \dots, 2^{n-1} : \ell \text{ is the first digit} \,\}$.

◮ For example, if $n = 3$ and $\ell = 2$, $N_2(3) = \{2^1\}$.

Motivating Benford’s law by rotating a circle 6 / 6

SLIDE 17

Let $N_\ell(n) = \{\, 2^0, 2^1, \dots, 2^{n-1} : \ell \text{ is the first digit} \,\}$.

◮ For example, if $n = 3$ and $\ell = 2$, $N_2(3) = \{2^1\}$.

For the first $n$ powers of 2, the frequency of $\ell$ as a first digit is

$$\frac{|N_\ell(n)|}{n} = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}\{2^k \text{ has first digit } \ell\}.$$

Motivating Benford’s law by rotating a circle 6 / 6

SLIDE 18

Let $N_\ell(n) = \{\, 2^0, 2^1, \dots, 2^{n-1} : \ell \text{ is the first digit} \,\}$.

◮ For example, if $n = 3$ and $\ell = 2$, $N_2(3) = \{2^1\}$.

For the first $n$ powers of 2, the frequency of $\ell$ as a first digit is

$$\frac{|N_\ell(n)|}{n} = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}\{2^k \text{ has first digit } \ell\}.$$

Of course, $2^k$ has first digit $\ell$ if, for the appropriate $m$,

$$\ell \cdot 10^m \le 2^k < (\ell + 1) \cdot 10^m.$$

Motivating Benford’s law by rotating a circle 6 / 6

SLIDE 19

Let $N_\ell(n) = \{\, 2^0, 2^1, \dots, 2^{n-1} : \ell \text{ is the first digit} \,\}$.

◮ For example, if $n = 3$ and $\ell = 2$, $N_2(3) = \{2^1\}$.

For the first $n$ powers of 2, the frequency of $\ell$ as a first digit is

$$\frac{|N_\ell(n)|}{n} = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}\{2^k \text{ has first digit } \ell\}.$$

Of course, $2^k$ has first digit $\ell$ if, for the appropriate $m$,

$$\ell \cdot 10^m \le 2^k < (\ell + 1) \cdot 10^m \iff \log_{10}(\ell) + m \le k \log_{10} 2 < \log_{10}(\ell + 1) + m.$$

Motivating Benford’s law by rotating a circle 6 / 6

SLIDE 21

Let $N_\ell(n) = \{\, 2^0, 2^1, \dots, 2^{n-1} : \ell \text{ is the first digit} \,\}$.

◮ For example, if $n = 3$ and $\ell = 2$, $N_2(3) = \{2^1\}$.

For the first $n$ powers of 2, the frequency of $\ell$ as a first digit is

$$\frac{|N_\ell(n)|}{n} = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}\{2^k \text{ has first digit } \ell\}.$$

Of course, $2^k$ has first digit $\ell$ if, for the appropriate $m$,

$$\ell \cdot 10^m \le 2^k < (\ell + 1) \cdot 10^m \iff \log_{10}(\ell) \le k \log_{10} 2 \pmod 1 < \log_{10}(\ell + 1).$$

Motivating Benford’s law by rotating a circle 6 / 6

SLIDE 22

Let $N_\ell(n) = \{\, 2^0, 2^1, \dots, 2^{n-1} : \ell \text{ is the first digit} \,\}$.

◮ For example, if $n = 3$ and $\ell = 2$, $N_2(3) = \{2^1\}$.

For the first $n$ powers of 2, the frequency of $\ell$ as a first digit is

$$\frac{|N_\ell(n)|}{n} = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}\{2^k \text{ has first digit } \ell\}.$$

Of course, $2^k$ has first digit $\ell$ if, for the appropriate $m$,

$$\ell \cdot 10^m \le 2^k < (\ell + 1) \cdot 10^m \iff \log_{10}(\ell) \le T_\alpha^k 0 < \log_{10}(\ell + 1), \qquad \alpha = \log_{10} 2.$$

Motivating Benford’s law by rotating a circle 6 / 6

SLIDE 23

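Let $N_\ell(n) = \{\, 2^0, 2^1, \dots, 2^{n-1} : \ell \text{ is the first digit} \,\}$.

◮ For example, if $n = 3$ and $\ell = 2$, $N_2(3) = \{2^1\}$.

For the first $n$ powers of 2, the frequency of $\ell$ as a first digit is

$$\frac{|N_\ell(n)|}{n} = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}\left\{T_\alpha^k(0) \in \big[\log_{10}(\ell), \log_{10}(\ell + 1)\big)\right\}$$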

Motivating Benford’s law by rotating a circle 6 / 6

SLIDE 24

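Let $N_\ell(n) = \{\, 2^0, 2^1, \dots, 2^{n-1} : \ell \text{ is the first digit} \,\}$.

◮ For example, if $n = 3$ and $\ell = 2$, $N_2(3) = \{2^1\}$.

For the first $n$ powers of 2, the frequency of $\ell$ as a first digit is

$$\frac{|N_\ell(n)|}{n} = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}\left\{T_\alpha^k(0) \in \big[\log_{10}(\ell), \log_{10}(\ell + 1)\big)\right\},$$

which $\to m\big(\big[\log_{10}(\ell), \log_{10}(\ell + 1)\big)\big) = \log_{10}(\ell + 1) - \log_{10}(\ell)$,

by the pointwise ergodic theorem.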

Motivating Benford’s law by rotating a circle 6 / 6

SLIDE 25

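Let $N_\ell(n) = \{\, 2^0, 2^1, \dots, 2^{n-1} : \ell \text{ is the first digit} \,\}$.

◮ For example, if $n = 3$ and $\ell = 2$, $N_2(3) = \{2^1\}$.

For the first $n$ powers of 2, the frequency of $\ell$ as a first digit is

$$\frac{|N_\ell(n)|}{n} = \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{1}\left\{T_\alpha^k(0) \in \big[\log_{10}(\ell), \log_{10}(\ell + 1)\big)\right\},$$

which $\to m\big(\big[\log_{10}(\ell), \log_{10}(\ell + 1)\big)\big) = \log_{10}(\ell + 1) - \log_{10}(\ell) = p(\ell)$,

by the pointwise ergodic theorem.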
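The limit can be observed empirically. The following sketch computes $|N_\ell(n)|/n$ for the first $n$ powers of 2 with exact integer arithmetic and prints it next to $p(\ell)$. The cutoff $n = 1000$ and the function names are illustrative assumptions, not part of the presentation.

```python
import math
from collections import Counter


def first_digit_frequencies(n: int) -> dict:
    """Relative frequency |N_l(n)| / n of each first digit l among 2^0, ..., 2^(n-1)."""
    counts = Counter(int(str(2 ** k)[0]) for k in range(n))
    return {digit: counts[digit] / n for digit in range(1, 10)}


def benford(digit: int) -> float:
    """Benford probability p(l) = log10(l + 1) - log10(l)."""
    return math.log10(digit + 1) - math.log10(digit)


n = 1000
observed = first_digit_frequencies(n)
for digit in range(1, 10):
    print(f"{digit}: observed {observed[digit]:.4f}, Benford {benford(digit):.4f}")
```

Reading the first digit off the decimal string of the exact integer $2^k$ avoids any floating-point ambiguity near the interval endpoints $\log_{10}(\ell)$.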

Motivating Benford’s law by rotating a circle 6 / 6

SLIDE 26

Ergodic Theorem Details

Ergodic Theorem Details 1 / 7

SLIDE 27

More generally,

The Pointwise Ergodic Theorem (Birkhoff’s)

Suppose $T$ is probability preserving. Then, for any $f \in L^1$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k(x) = E[f \mid \mathcal{I}] \quad \text{a.s. and in } L^1.$$

Ergodic Theorem Details 2 / 7

SLIDE 28

More generally,

The Pointwise Ergodic Theorem (Birkhoff’s)

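Suppose $T$ is probability preserving. Then, for any $f \in L^1$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k(x) = E[f \mid \mathcal{I}] \quad \text{a.s. and in } L^1.$$

Recall: $T$ ergodic $\implies$ $A \in \mathcal{I}$ has $P(A) = 0$ or $1$.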

Ergodic Theorem Details 2 / 7

SLIDE 29

More generally,

The Pointwise Ergodic Theorem (Birkhoff’s)

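Suppose $T$ is probability preserving. Then, for any $f \in L^1$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k(x) = E[f \mid \mathcal{I}] \quad \text{a.s. and in } L^1.$$

Recall: $T$ ergodic $\implies$ $A \in \mathcal{I}$ has $P(A) = 0$ or $1$. $\Omega \in \mathcal{I}$ trivially, so $T$ ergodic $\implies E[f \mid \mathcal{I}] = E[f]$.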

Ergodic Theorem Details 2 / 7

SLIDE 30

Proof.

By drawing an analogy to:

Ergodic Theorem Details 3 / 7

SLIDE 31

Proof.

By drawing an analogy to:

Lebesgue’s Differentiation Theorem

For any $f \in L^1$,

$$\lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B f(y) \, dy = f(x) \quad \text{a.s.}$$

Ergodic Theorem Details 3 / 7

SLIDE 32

Proof.

By drawing an analogy to:

Lebesgue’s Differentiation Theorem

For any $f \in L^1$,

$$\lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B f(y) \, dy = f(x) \quad \text{a.s.}$$

Generalizes the Fundamental Theorem of Calculus. Concerns the limiting behavior of an average, i.e., the “sum” of $f$ over a set containing $x$ divided by the set’s size, as $m(B) \to 0$.

Ergodic Theorem Details 3 / 7

SLIDE 33

Proof.

By drawing an analogy to:

Lebesgue’s Differentiation Theorem

For any $f \in L^1$,

$$\lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B f(y) \, dy = f(x) \quad \text{a.s.}$$

Generalizes the Fundamental Theorem of Calculus. Concerns the limiting behavior of an average, i.e., the “sum” of $f$ over a set containing $x$ divided by the set’s size, as $m(B) \to 0$. Similarly, Birkhoff’s theorem concerns the limiting behavior of the sum of $f$ over a “$T$ chain” started at $x$ divided by the chain’s length.

Ergodic Theorem Details 3 / 7

SLIDE 34

Lebesgue’s Differentiation Theorem

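For any $f \in L^1$,

$$\lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B f(y) \, dy = f(x) \quad \text{a.s.}$$

Birkhoff’s Ergodic Theorem (Theorem 13.3)

Suppose $T$ is probability preserving. Then, for any $f \in L^1$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k(x) = E[f \mid \mathcal{I}] \quad \text{a.s. and in } L^1.$$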

Ergodic Theorem Details 4 / 7

SLIDE 35

Lebesgue’s Differentiation Theorem

For any $f \in L^1$,

$$\lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B f(y) \, dy = f(x) \quad \text{a.s.}$$

Birkhoff’s Ergodic Theorem (Theorem 13.3)

Suppose $T$ is probability preserving. Then, for any $f \in L^1$,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} f \circ T^k(x) = E[f \mid \mathcal{I}] \quad \text{a.s. and in } L^1.$$

Given the statement similarities, it might not be completely surprising that their proofs parallel one another...

Ergodic Theorem Details 4 / 7

SLIDE 36

1. Define a maximal operator.

◮ Stein and Shakarchi’s Real Analysis is on the left, and our class notes are on the right. (See Chapter 6 of the former for an even closer proof.)

Chapter 3. DIFFERENTIATION AND INTEGRATION

Suppose $f$ is integrable on $\mathbb{R}^d$. Is it true that

$$\lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B f(y) \, dy = f(x), \quad \text{for a.e. } x?$$

The limit is taken as the volume of open balls $B$ containing $x$ goes to 0. We shall refer to this question as the averaging problem. We remark that if $B$ is any ball of radius $r$ in $\mathbb{R}^d$, then $m(B) = v_d r^d$, where $v_d$ is the measure of the unit ball. (See Exercise 14 in the previous chapter.)

Note of course that in the special case when $f$ is continuous at $x$, the limit does converge to $f(x)$. Indeed, given $\epsilon > 0$, there exists $\delta > 0$ such that $|f(x) - f(y)| < \epsilon$ whenever $|x - y| < \delta$. Since

$$f(x) - \frac{1}{m(B)} \int_B f(y) \, dy = \frac{1}{m(B)} \int_B (f(x) - f(y)) \, dy,$$

we find that whenever $B$ is a ball of radius $< \delta/2$ that contains $x$, then

$$\left| f(x) - \frac{1}{m(B)} \int_B f(y) \, dy \right| \le \frac{1}{m(B)} \int_B |f(x) - f(y)| \, dy < \epsilon,$$

as desired.

The averaging problem has an affirmative answer, but to establish that fact, which is qualitative in nature, we need to make some quantitative estimates bearing on the overall behavior of the averages of $f$. This will be done in terms of the maximal averages of $|f|$, to which we now turn.

1.1 The Hardy-Littlewood maximal function

The maximal function that we consider below arose first in the one-dimensional situation treated by Hardy and Littlewood. It seems that they were led to the study of this function by toying with the question of how a batsman’s score in cricket may best be distributed to maximize his satisfaction. As it turns out, the concepts involved have a universal significance in analysis. The relevant definition is as follows.

If $f$ is integrable on $\mathbb{R}^d$, we define its maximal function $f^*$ by

$$f^*(x) = \sup_{x \in B} \frac{1}{m(B)} \int_B |f(y)| \, dy, \quad x \in \mathbb{R}^d,$$

where the supremum is taken over all balls containing the point $x$. In other words, we replace the limit in the statement of the averaging problem by a supremum, and $f$ by its absolute value.

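Because $F_N(x) = \max_{1 \le k \le N} f^k(x)$ for $x \in M_N(f)$, this shows that $f(x) \ge F_N(x) - F_N(Tx)$ for $x \in M_N(f)$. As $F_N$ is nonnegative and equal to 0 on $\Omega \setminus M_N(f)$, we have

$$\int_{M_N(f)} f \, dP \ge \int_{M_N(f)} F_N \, dP - \int_{M_N(f)} F_N \circ T \, dP = \int_\Omega F_N \, dP - \int_{M_N(f)} F_N \circ T \, dP \ge \int_\Omega F_N \, dP - \int_\Omega F_N \circ T \, dP = 0.$$

Finally, since $M_N(f) \nearrow M(f)$, the dominated convergence theorem shows that

$$\int_{M(f)} f \, dP = \lim_{N \to \infty} \int_{M_N(f)} f \, dP \ge 0.$$

Corollary 13.2. If $M_\alpha(f) = \{x : \sup_{n \ge 1} \frac{1}{n} f^n(x) > \alpha\}$, then $\int_{M_\alpha(f)} f \, dP \ge \alpha P(M_\alpha(f))$.

Proof. Let $g = f - \alpha$. Then

$$g^n = \sum_{k=0}^{n-1} (f - \alpha) \circ T^k = \sum_{k=0}^{n-1} \left( f \circ T^k - \alpha \right) = f^n - n\alpha,$$

so $\frac{1}{n} g^n = \frac{1}{n} f^n - \alpha$, and thus

$$M_\alpha(f) = \left\{x : \sup_{n \ge 1} \frac{1}{n} f^n(x) > \alpha\right\} = \left\{x : \sup_{n \ge 1} \left(\frac{1}{n} f^n(x) - \alpha\right) > 0\right\} = \left\{x : \sup_{n \ge 1} \frac{1}{n} g^n(x) > 0\right\} = \left\{x : \sup_{n \ge 1} g^n(x) > 0\right\} = M(g).$$

Therefore, the maximal ergodic theorem implies

$$0 \le \int_{M(g)} g \, dP = \int_{M_\alpha(f)} (f - \alpha) \, dP = \int_{M_\alpha(f)} f \, dP - \alpha P(M_\alpha(f)).$$

Corollary 13.3. If $A \subseteq M(f)$ is $T$-invariant, then $\int_A f \, dP \ge 0$.

Proof. Since $T^{-1}A = A$ up to a null set, $1_A \circ T = 1_A$ a.s., so if $g = f \cdot 1_A$, then

$$g^n = \sum_{k=0}^{n-1} (f \cdot 1_A) \circ T^k = \left( \sum_{k=0}^{n-1} f \circ T^k \right) \cdot 1_A = f^n \cdot 1_A.$$

It follows that

$$M(g) = \left\{x \in \Omega : \sup_{n \ge 1} g^n(x) > 0\right\} = \left\{x \in A : \sup_{n \ge 1} f^n(x) > 0\right\} = A \cap M(f) = A,$$

and thus

$$\int_A f \, dP = \int_{M(g)} g \, dP \ge 0.$$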

$$f^*(x) = \sup_{n \ge 1} \frac{1}{n} \sum_{k=0}^{n-1} |f \circ T^k(x)|$$

Ergodic Theorem Details 5 / 7

SLIDE 37

2. Derive a weak-type inequality.

◮ A familiar example is:

$\alpha P(|X| > \alpha) \le \|X\|_{L^1}$ (Chebyshev’s).

1. Differentiation of the integral

The main properties of $f^*$ we shall need are summarized in a theorem.

Theorem 1.1 Suppose $f$ is integrable on $\mathbb{R}^d$. Then:
(i) $f^*$ is measurable.
(ii) $f^*(x) < \infty$ for a.e. $x$.
(iii) $f^*$ satisfies

$$m(\{x \in \mathbb{R}^d : f^*(x) > \alpha\}) \le \frac{A}{\alpha} \|f\|_{L^1(\mathbb{R}^d)} \qquad (1)$$

for all $\alpha > 0$, where $A = 3^d$, and $\|f\|_{L^1(\mathbb{R}^d)} = \int_{\mathbb{R}^d} |f(x)| \, dx$.

Before we come to the proof we want to clarify the nature of the main conclusion (iii). As we shall observe, one has that $f^*(x) \ge |f(x)|$ for a.e. $x$; the effect of (iii) is that, broadly speaking, $f^*$ is not much larger than $|f|$. From this point of view, we would have liked to conclude that $f^*$ is integrable, as a result of the assumed integrability of $f$. However, this is not the case, and (iii) is the best substitute available (see Exercises 4 and 5).

An inequality of the type (1) is called a weak-type inequality because it is weaker than the corresponding inequality for the $L^1$-norms. Indeed, this can be seen from the Tchebychev inequality (Exercise 9 in Chapter 2), which states that for an arbitrary integrable function $g$,

$$m(\{x : |g(x)| > \alpha\}) \le \frac{1}{\alpha} \|g\|_{L^1(\mathbb{R}^d)}, \quad \text{for all } \alpha > 0.$$

We should add that the exact value of $A$ in the inequality (1) is unimportant for us. What matters is that this constant be independent of $\alpha$ and $f$.

The only simple assertion in the theorem is that $f^*$ is a measurable function. Indeed, the set $E_\alpha = \{x \in \mathbb{R}^d : f^*(x) > \alpha\}$ is open, because if $\bar{x} \in E_\alpha$, there exists a ball $B$ such that $\bar{x} \in B$ and

$$\frac{1}{m(B)} \int_B |f(y)| \, dy > \alpha.$$

Now any point $x$ close enough to $\bar{x}$ will also belong to $B$; hence $x \in E_\alpha$ as well.

The two other properties of $f^*$ in the theorem are deeper, with (ii) being a consequence of (iii). This follows at once if we observe that $\{x : f^*(x) = \infty\} \subset \{x : f^*(x) > \alpha\}$

$$\alpha \, m(|f^*| > \alpha) \le A \int |f| \, dm$$


$$\alpha P(f^* > \alpha) \le \int_{\{f^* > \alpha\}} f \, dP$$

Ergodic Theorem Details 6 / 7

SLIDE 38

3. Use this inequality to show µ({x :}) = 0.

Since the balls $B_{i_1}, \dots, B_{i_k}$ are disjoint and satisfy (2) as well as (3), we find that

$$m(K) \le m\left( \bigcup_{\ell=1}^{N} B_\ell \right) \le 3^d \sum_{j=1}^{k} m(B_{i_j}) \le \frac{3^d}{\alpha} \sum_{j=1}^{k} \int_{B_{i_j}} |f(y)| \, dy = \frac{3^d}{\alpha} \int_{\bigcup_{j=1}^{k} B_{i_j}} |f(y)| \, dy \le \frac{3^d}{\alpha} \int_{\mathbb{R}^d} |f(y)| \, dy.$$

Since this inequality is true for all compact subsets $K$ of $E_\alpha$, the proof of the weak type inequality for the maximal operator is complete.

1.2 The Lebesgue differentiation theorem

The estimate obtained for the maximal function now leads to a solution of the averaging problem.

Theorem 1.3 If $f$ is integrable on $\mathbb{R}^d$, then

$$\lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B f(y) \, dy = f(x) \quad \text{for a.e. } x. \qquad (4)$$

Proof. It suffices to show that for each $\alpha > 0$ the set

$$E_\alpha = \left\{ x : \limsup_{\substack{m(B) \to 0 \\ x \in B}} \left| \frac{1}{m(B)} \int_B f(y) \, dy - f(x) \right| > 2\alpha \right\}$$

has measure zero, because this assertion then guarantees that the set $E = \bigcup_{n=1}^{\infty} E_{1/n}$ has measure zero, and the limit in (4) holds at all points of $E^c$.

We fix $\alpha$, and recall Theorem 2.4 in Chapter 2, which states that for each $\epsilon > 0$ we may select a continuous function $g$ of compact support with $\|f - g\|_{L^1(\mathbb{R}^d)} < \epsilon$. As we remarked earlier, the continuity of $g$ implies that

$$\lim_{\substack{m(B) \to 0 \\ x \in B}} \frac{1}{m(B)} \int_B g(y) \, dy = g(x), \quad \text{for all } x.$$

Since we may write the difference $\frac{1}{m(B)} \int_B f(y) \, dy - f(x)$ as

$$\frac{1}{m(B)} \int_B (f(y) - g(y)) \, dy + \frac{1}{m(B)} \int_B g(y) \, dy - g(x) + g(x) - f(x)$$

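We are now in a position to prove

Theorem 13.3 (Pointwise Ergodic Theorem). For any $f \in L^1$, $\lim_{n \to \infty} \frac{1}{n} f^n = \bar{f}$ a.s. where $\bar{f} \in L^1$ is invariant with $\int_\Omega \bar{f} \, dP = \int_\Omega f \, dP$.

Proof. Set

$$f^+(x) = \limsup_{n \to \infty} \frac{1}{n} f^n(x), \qquad f^-(x) = \liminf_{n \to \infty} \frac{1}{n} f^n(x).$$

Clearly $f^+$ and $f^-$ are invariant with $f^-(x) \le f^+(x)$ for all $x$. We wish to show that $f^+ = f^-$ a.s., so we need to show that $M = \{x : f^-(x) < f^+(x)\}$ has $P(M) = 0$. For $\alpha, \beta \in \mathbb{Q}$, let $M_{\alpha,\beta} = \{x : f^-(x) < \alpha, f^+(x) > \beta\}$. Then $M = \bigcup_{\alpha, \beta \in \mathbb{Q},\, \alpha < \beta} M_{\alpha,\beta}$, so it suffices to show that $P(M_{\alpha,\beta}) = 0$ for all $\alpha, \beta \in \mathbb{Q}$ with $\alpha < \beta$. We note at the outset that the invariance of $f^+$ and $f^-$ implies that $M_{\alpha,\beta}$ is an invariant set.

Now let $M_\beta^+ = \{x : f^+(x) > \beta\}$. If $x \in M_\beta^+$, then there is some $n \in \mathbb{N}$ such that $\frac{1}{n} f^n(x) > \beta$, so $(f - \beta)^n(x) = f^n(x) - n\beta > 0$, hence $x \in M(f - \beta)$. Since $M_{\alpha,\beta} \subseteq M_\beta^+ \subseteq M(f - \beta)$ is invariant, Corollary 13.3 shows that $\int_{M_{\alpha,\beta}} (f - \beta) \, dP \ge 0$, hence $\int_{M_{\alpha,\beta}} f \, dP \ge \beta P(M_{\alpha,\beta})$.

Similarly, if $x \in M_\alpha^- := \{x : f^-(x) < \alpha\}$, then there is an $m$ with $\frac{1}{m} f^m < \alpha$, so $(\alpha - f)^m > 0$. It follows that $M_{\alpha,\beta} \subseteq M_\alpha^- \subseteq M(\alpha - f)$, so that $\int_{M_{\alpha,\beta}} f \, dP \le \alpha P(M_{\alpha,\beta})$.

Thus we have shown that

$$\beta P(M_{\alpha,\beta}) \le \int_{M_{\alpha,\beta}} f \, dP \le \alpha P(M_{\alpha,\beta}),$$

so since $\alpha < \beta$, we conclude that $P(M_{\alpha,\beta}) = 0$ as desired. This shows that $\frac{1}{n} f^n$ has an almost sure limit $f^*$.

By Corollary 13.1, we also have that $\frac{1}{n} f^n \to \bar{f}$ in $L^1$, and Fatou's lemma gives

$$\int \left| f^* - \bar{f} \right| dP = \int \liminf_n \left| \frac{1}{n} f^n - \bar{f} \right| dP \le \liminf_n \int \left| \frac{1}{n} f^n - \bar{f} \right| dP = 0,$$

so $f^* = \bar{f}$ a.s. In light of previous observations, this shows that $\frac{1}{n} f^n$ converges a.s. to $\bar{f} = E[f \mid \mathcal{I}]$.

It is left as a homework exercise to use Theorem 13.3 to extend the mean ergodic theorem to

Theorem 13.4. If $f \in L^p$, $1 \le p < \infty$, then $\frac{1}{n} f^n \to E[f \mid \mathcal{I}]$ in $L^p$.

We conclude our discussion of ergodic theorems with some examples.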

Ergodic Theorem Details 7 / 7

SLIDE 39

3. Use this inequality to show µ({x :}) = 0.

1. Differentiation of the integral

we find that

$$\limsup_{\substack{m(B) \to 0 \\ x \in B}} \left| \frac{1}{m(B)} \int_B f(y) \, dy - f(x) \right| \le (f - g)^*(x) + |g(x) - f(x)|,$$

where the symbol $*$ indicates the maximal function. Consequently, if

$$F_\alpha = \{x : (f - g)^*(x) > \alpha\} \quad \text{and} \quad G_\alpha = \{x : |f(x) - g(x)| > \alpha\}$$

then $E_\alpha \subset (F_\alpha \cup G_\alpha)$, because if $u_1$ and $u_2$ are positive, then $u_1 + u_2 > 2\alpha$ only if $u_i > \alpha$ for at least one $u_i$. On the one hand, Tchebychev’s inequality yields

$$m(G_\alpha) \le \frac{1}{\alpha} \|f - g\|_{L^1(\mathbb{R}^d)},$$

and on the other hand, the weak type estimate for the maximal function gives

$$m(F_\alpha) \le \frac{A}{\alpha} \|f - g\|_{L^1(\mathbb{R}^d)}.$$

The function $g$ was selected so that $\|f - g\|_{L^1(\mathbb{R}^d)} < \epsilon$. Hence we get

$$m(E_\alpha) \le \frac{A}{\alpha} \epsilon + \frac{1}{\alpha} \epsilon.$$

Since $\epsilon$ is arbitrary, we must have $m(E_\alpha) = 0$, and the proof of the theorem is complete.

Note that as an immediate consequence of the theorem applied to $|f|$, we see that $f^*(x) \ge |f(x)|$ for a.e. $x$, with $f^*$ the maximal function.

We have worked so far under the assumption that $f$ is integrable. This “global” assumption is slightly out of place in the context of a “local” notion like differentiability. Indeed, the limit in Lebesgue’s theorem is taken over balls that shrink to the point $x$, so the behavior of $f$ far from $x$ is irrelevant. Thus, we expect the result to remain valid if we simply assume integrability of $f$ on every ball.

To make this precise, we say that a measurable function $f$ on $\mathbb{R}^d$ is locally integrable, if for every ball $B$ the function $f(x)\chi_B(x)$ is integrable. We shall denote by $L^1_{\mathrm{loc}}(\mathbb{R}^d)$ the space of all locally integrable functions. Loosely speaking, the behavior at infinity does not affect the local integrability of a function. For example, the functions $e^{|x|}$ and $|x|^{-1/2}$ are both locally integrable, but not integrable on $\mathbb{R}^d$.

Clearly, the conclusion of the last theorem holds under the weaker assumption that $f$ is locally integrable.


Ergodic Theorem Details 7 / 7