Rates of Estimation for Discrete Determinantal Point Processes

Apr 25, 2023

SLIDE 1

Rates of Estimation for Discrete Determinantal Point Processes

V.-E. Brunel, A. Moitra, P. Rigollet, J. Urschel

COLT 2017, Amsterdam

SLIDE 2

Discrete DPPs

1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 ↔ {1,4,5,7,9,10,12,15,19}
0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 0 1 0 ↔ {3,4,6,8,9,12,15,19}
1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 0 1 ↔ {1,4,8,12,14,17,18,20}
…
0 0 1 0 0 1 0 1 1 0 0 0 0 0 1 1 0 1 0 0 ↔ {3,6,8,9,15,16,18}

Random variables on the hypercube $\{0,1\}^N$, represented as subsets of $[N]$.

SLIDE 3

Discrete DPPs

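Probabilistic model for correlated Bernoulli random variables.
Feature repulsion (negative association).
$K_{i,j}$ ↬ repulsion between items $i$ and $j$.
PMF:

$\mathbb{P}[Y = J] = |\det(K - I_{\bar J})|$, where $I_{\bar J}$ is the diagonal matrix with ones on the complement of $J$ and zeros elsewhere.

Definition: Random subset $Y \subseteq [N]$ such that $\mathbb{P}[J \subseteq Y] = \det(K_J)$, $\forall J \subseteq [N]$, where $K_J$ is the principal submatrix of $K$ indexed by $J$.

$K \in \mathbb{R}^{N \times N}$, symmetric, $0 \preceq K \preceq I$.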
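The two formulas on this slide can be checked numerically on a small example. The sketch below assumes NumPy, writes `K` for the kernel matrix and `J` for a subset of items (0-based in code), and uses a hypothetical 3×3 kernel chosen only so that it is symmetric with eigenvalues in [0, 1]; none of these names come from the slides.

```python
import itertools
import numpy as np

def inclusion_prob(K, J):
    """P[J ⊆ Y] = det(K_J), the principal minor of K indexed by J."""
    J = list(J)
    if not J:
        return 1.0  # the empty minor is 1 by convention
    return float(np.linalg.det(K[np.ix_(J, J)]))

def pmf(K, J):
    """P[Y = J] = |det(K - I_{J^c})|, I_{J^c} diagonal with ones off J."""
    N = K.shape[0]
    mask = np.array([0.0 if i in J else 1.0 for i in range(N)])
    return float(abs(np.linalg.det(K - np.diag(mask))))

# Hypothetical kernel: symmetric, eigenvalues in [0, 1].
K = np.array([[0.5, 0.2, 0.0],
              [0.2, 0.5, 0.1],
              [0.0, 0.1, 0.4]])

subsets = [J for r in range(4) for J in itertools.combinations(range(3), r)]
total = sum(pmf(K, J) for J in subsets)       # close to 1 up to rounding
pair = inclusion_prob(K, [0, 1])              # P[{0, 1} ⊆ Y]
product = inclusion_prob(K, [0]) * inclusion_prob(K, [1])
print(total, pair, product)                   # repulsion: pair <= product
```

Enumerating all subsets costs 2^N determinants, so this is only meant for small N.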

SLIDE 4

Goal

Given $Y_1, Y_2, \dots, Y_n \overset{\text{iid}}{\sim} \mathrm{DPP}(K^*)$, estimate $K^*$.

Approach: Maximum Likelihood Estimator (MLE).
Question: What is the rate of convergence of the MLE?

SLIDE 5

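Identification

$\mathrm{DPP}(K) = \mathrm{DPP}(K^*) \Leftrightarrow \det(K_J) = \det(K^*_J),\ \forall J \subseteq [N]$

$\Leftrightarrow K = D K^* D$ for some $D = \mathrm{diag}(\pm 1, \pm 1, \dots, \pm 1)$.

E.g. (sign patterns only):

K∗ =
+ + + +
+ + + +
+ + + +
+ + + +

⇝ DK∗D =
+ − − +
− + + −
− + + −
+ − − +

Here D = diag(+1, −1, −1, +1) flips the signs of rows 2, 3 and columns 2, 3.

Measure of the error of an estimator $\hat K$:

$\ell(\hat K, K^*) = \min_D \|\hat K - D K^* D\|_F$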
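The identification statement and the error measure can be illustrated on a toy kernel. This is a minimal sketch assuming NumPy; `principal_minors`, `loss`, and the example matrices are hypothetical names and values, and the minimum over sign matrices is taken by brute force over all 2^N choices, which is only feasible for small N.

```python
import itertools
import numpy as np

def principal_minors(K):
    """All principal minors det(K_J) for nonempty J ⊆ {0, ..., N-1}."""
    N = K.shape[0]
    return np.array([np.linalg.det(K[np.ix_(J, J)])
                     for r in range(1, N + 1)
                     for J in itertools.combinations(range(N), r)])

def loss(K_hat, K_star):
    """min over D = diag(±1) of ||K_hat - D K_star D||_F, by brute force."""
    N = K_star.shape[0]
    best = np.inf
    for signs in itertools.product([1.0, -1.0], repeat=N):
        d = np.array(signs)
        flipped = K_star * np.outer(d, d)  # entrywise form of D K_star D
        best = min(best, np.linalg.norm(K_hat - flipped))  # Frobenius norm
    return float(best)

# Hypothetical kernel and a sign-flipped copy of it.
K_star = np.array([[0.5, 0.2, 0.0],
                   [0.2, 0.5, 0.1],
                   [0.0, 0.1, 0.4]])
D = np.diag([1.0, -1.0, 1.0])
K_flip = D @ K_star @ D

same_minors = np.allclose(principal_minors(K_star), principal_minors(K_flip))
naive = float(np.linalg.norm(K_flip - K_star))  # plain Frobenius distance
aligned = loss(K_flip, K_star)                  # distance up to sign flips
print(same_minors, naive, aligned)
```

The plain Frobenius distance is positive even though both matrices define the same DPP, which is why the error is measured after minimizing over D.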

SLIDE 6

Maximum likelihood estimation

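Log-likelihood:

$\hat\Psi(K) = \sum_{J \subseteq [N]} \hat p_J \ln|\det(K - I_{\bar J})|$, where $\hat p_J$ is the empirical frequency of the subset $J$ among $Y_1, \dots, Y_n$.

MLE:

$\hat K \in \operatorname{argmax}_K \hat\Psi(K)$

Expected log-likelihood:

$\Psi(K) \triangleq \mathbb{E}[\hat\Psi(K)] = \sum_{J \subseteq [N]} p^*_J \ln|\det(K - I_{\bar J})| = \Psi(K^*) - \mathrm{KL}(\mathrm{DPP}(K^*), \mathrm{DPP}(K))$, where $p^*_J$ is the probability of $J$ under $\mathrm{DPP}(K^*)$.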
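The empirical log-likelihood can be evaluated directly from the observed subsets. The sketch below assumes NumPy; `log_likelihood`, the sample list, and the kernel are hypothetical and only illustrate the formula (it evaluates the objective, it does not maximize it).

```python
from collections import Counter
import numpy as np

def log_likelihood(K, samples):
    """Empirical log-likelihood: sum over observed J of p_hat_J * ln|det(K - I_{J^c})|."""
    N = K.shape[0]
    n = len(samples)
    counts = Counter(frozenset(Y) for Y in samples)
    value = 0.0
    for J, count in counts.items():
        mask = np.array([0.0 if i in J else 1.0 for i in range(N)])
        prob = abs(np.linalg.det(K - np.diag(mask)))
        value += (count / n) * np.log(prob)
    return float(value)

# Hypothetical data: four observed subsets of {0, 1, 2}.
samples = [{0}, {0, 2}, {1}, {0, 2}]
K = np.array([[0.5, 0.2, 0.0],
              [0.2, 0.5, 0.1],
              [0.0, 0.1, 0.4]])
D = np.diag([1.0, -1.0, 1.0])

ll = log_likelihood(K, samples)
ll_flipped = log_likelihood(D @ K @ D, samples)  # same value as ll
print(ll, ll_flipped)
```

Only subsets that actually occur in the sample contribute, so the sum has at most n terms rather than 2^N. The two printed values agree because the likelihood cannot distinguish a kernel from its sign-flipped versions.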

SLIDE 7

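Likelihood geometry

Fisher information: $-\nabla^2\Psi(K^*)$.
What is the order of the first non-degenerate derivative of $\Psi$ at $K = K^*$?

[Figure: two sketches of $\Psi(K)$ against $K$ near $K^*$, one with $\nabla^2\Psi(K^*) < 0$ and one with $\nabla^2\Psi(K^*) = 0$.]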

SLIDE 8

Determinantal Graphs & Irreducibility

Definition: $G = ([N], E)$, where $\{i, j\} \in E \Leftrightarrow K^*_{i,j} \neq 0$.

$K^*$ is irreducible iff $G$ is connected.
Otherwise, $K^*$ is block diagonal (up to a permutation of the items).
Remark: $K^*$ is block diagonal ⇒ $Y$ is a union of independent DPPs.
Write $i \sim j$ when $i$ and $j$ are connected in $G$.
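Irreducibility as defined here is a graph connectivity check on the nonzero pattern of the kernel. A minimal sketch, assuming NumPy; `blocks`, `is_irreducible`, and the tolerance `tol` are hypothetical choices, and the connected components are found with a plain depth-first search.

```python
import numpy as np

def blocks(K, tol=1e-12):
    """Connected components of the determinantal graph (edge iff |K[i, j]| > tol)."""
    N = K.shape[0]
    seen, components = set(), []
    for start in range(N):
        if start in seen:
            continue
        seen.add(start)
        component, stack = [], [start]
        while stack:
            i = stack.pop()
            component.append(i)
            for j in range(N):
                if j not in seen and abs(K[i, j]) > tol:
                    seen.add(j)
                    stack.append(j)
        components.append(sorted(component))
    return components

def is_irreducible(K, tol=1e-12):
    """True iff the determinantal graph of K is connected."""
    return len(blocks(K, tol)) == 1

# Path graph 0 - 1 - 2: K[0, 2] = 0, yet the graph is connected.
K_path = np.array([[0.5, 0.2, 0.0],
                   [0.2, 0.5, 0.1],
                   [0.0, 0.1, 0.4]])
# Item 2 is isolated: two blocks.
K_block = np.array([[0.5, 0.2, 0.0],
                    [0.2, 0.5, 0.0],
                    [0.0, 0.0, 0.4]])
print(is_irreducible(K_path), blocks(K_block))
```

A zero entry alone does not make a kernel reducible: in the path example items 0 and 2 are still connected through item 1.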

SLIDE 9

Main Results: Irreducible case

Theorem 1: $K^*$ is irreducible ⇔ $\nabla^2\Psi(K^*)$ is negative definite.

Statistical consequences:

$\ell(\hat K, K^*) = O_{\mathbb{P}}(n^{-1/2})$

SLIDE 10

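Main Results: Block diagonal case (1)

Theorem 2: $\mathrm{Ker}\,\nabla^2\Psi(K^*) = \{H \in \mathbb{R}^{N \times N} : H_{i,j} = 0,\ \forall i \sim j\}$.
$\nabla^2\Psi(K^*)$ is negative definite along directions supported on the blocks of $K^*$.

Theorem 3: For $H \in \mathrm{Ker}\,\nabla^2\Psi(K^*) \setminus \{0\}$:
$\nabla^3\Psi(K^*)(H^{\otimes 3}) = 0$,
$\nabla^4\Psi(K^*)(H^{\otimes 4}) < 0$.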
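The degenerate directions can be seen numerically in the smallest reducible case. The sketch below assumes NumPy and a hypothetical kernel made of two 1×1 blocks, diag(0.5, 0.5), perturbed along the off-block direction `H`. For this toy kernel a hand expansion gives a drop in expected log-likelihood of (1/2) ln(1 − 16 t⁴), so the change is of fourth order in t; the code checks only this one example, not the theorems.

```python
import numpy as np

def pmf(K, J):
    """P[Y = J] = |det(K - I_{J^c})|."""
    N = K.shape[0]
    mask = np.array([0.0 if i in J else 1.0 for i in range(N)])
    return abs(np.linalg.det(K - np.diag(mask)))

def psi(K, K_star, subsets):
    """Expected log-likelihood: sum_J p*_J * ln p_J(K)."""
    return float(sum(pmf(K_star, J) * np.log(pmf(K, J)) for J in subsets))

# Hypothetical reducible kernel: items 0 and 1 are not connected.
K_star = np.diag([0.5, 0.5])
# Direction that vanishes inside the blocks, hence in the kernel of the Hessian.
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])
subsets = [(), (0,), (1,), (0, 1)]

base = psi(K_star, K_star, subsets)
for t in (0.1, 0.05, 0.025):
    drop = psi(K_star + t * H, K_star, subsets) - base
    # drop / t**2 shrinks with t; drop / t**4 settles near a negative constant
    print(t, drop / t**2, drop / t**4)
```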

SLIDE 11

Main Results: Block diagonal case (2)

Statistical consequences:

$\ell(\hat K, K^*) = O_{\mathbb{P}}(n^{-1/6})$

$\ell(\hat K_S, K^*_S) = O_{\mathbb{P}}(n^{-1/2})$

for all blocks $S$ of $K^*$.

SLIDE 12

Conclusions

Rates of convergence of the MLE:

$n^{-1/2}$ if $K^*$ is irreducible;
$n^{-1/6}$ otherwise.
Rate determined only by the connectedness of the determinantal graph.
Hidden constants can be arbitrarily large in $N$, e.g., if $G$ is a path graph.
In another paper*, we show that the sample complexity of a method-of-moments estimator is determined by the cycle sparsity of $G$.
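As rough arithmetic on what the two rates mean in practice, suppose the error behaves like $C\,n^{-\alpha}$ for some constant $C$ (a hypothetical simplification; as noted above, the hidden constants can be very large in $N$). Solving for the sample size needed to reach accuracy $\varepsilon$ gives:

```latex
C\,n^{-1/2} \le \varepsilon \iff n \ge (C/\varepsilon)^{2},
\qquad
C\,n^{-1/6} \le \varepsilon \iff n \ge (C/\varepsilon)^{6}.
```

With $C = 1$ and $\varepsilon = 0.1$ this is $10^2$ samples in the irreducible case against $10^6$ in the block diagonal case.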

* Learning Determinantal Point Processes from Moments and Cycles, J. Urschel, V.-E. Brunel, A. Moitra, P. Rigollet, ICML 2017