Rates of Estimation for Discrete Determinantal Point Processes


V.-E. Brunel, A. Moitra, P. Rigollet, J. Urschel. COLT 2017, Amsterdam. Discrete DPPs: random variables on the hypercube $\{0,1\}^N$, represented as subsets of $[N]$.


SLIDE 1

Rates of Estimation for Discrete Determinantal Point Processes

V.-E. Brunel, A. Moitra, P. Rigollet, J. Urschel

COLT 2017, Amsterdam

SLIDE 2

Discrete DPPs

1 0 0 1 1 0 1 0 1 1 0 1 0 0 1 0 0 0 1 0 ↔ {1,4,5,7,9,10,12,15,19}
0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 0 0 0 1 0 ↔ {3,4,6,8,9,12,15,19}
1 0 0 1 0 0 0 1 0 0 0 1 0 1 0 0 1 1 0 1 ↔ {1,4,8,12,14,17,18,20}
…
0 0 1 0 0 1 0 1 1 0 0 0 0 0 1 1 0 1 0 0 ↔ {3,6,8,9,15,16,18}

Random variables on the hypercube $\{0,1\}^N$, represented as subsets of $[N]$.
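The correspondence between 0/1 vectors and subsets can be sketched in a few lines of Python. The function names `bits_to_subset` and `subset_to_bits` are illustrative choices, not taken from the slides. Items are numbered from 1, as on the slide.

```python
def bits_to_subset(bits):
    """Return the 1-indexed positions of the ones in a 0/1 vector."""
    return {i for i, b in enumerate(bits, start=1) if b == 1}


def subset_to_bits(subset, n):
    """Return the 0/1 indicator vector of length n for a subset of {1, ..., n}."""
    return [1 if i in subset else 0 for i in range(1, n + 1)]


# First row of the slide's example (N = 20).
first_row = [1, 0, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
print(sorted(bits_to_subset(first_row)))  # [1, 4, 5, 7, 9, 10, 12, 15, 19]
```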

SLIDE 3

Discrete DPPs

  • Probabilistic model for correlated Bernoulli r.v.
  • Feature repulsion (negative association)
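  • $K_{i,j}$ encodes the repulsion between items $i$ and $j$.
  • PMF: $\mathbb{P}[Y = J] = |\det(K - I_{\bar{J}})|$, where $I_{\bar{J}}$ is the diagonal matrix with ones at the indices outside $J$ and zeros elsewhere.

Definition. A random subset $Y \subseteq [N]$ is a DPP with kernel $K$ if $\mathbb{P}[J \subseteq Y] = \det(K_J)$ for all $J \subseteq [N]$, where $K_J$ is the principal submatrix of $K$ indexed by $J$ and $K \in \mathbb{R}^{N \times N}$ is symmetric with $0 \preceq K \preceq I$.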
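As a sanity check on the two formulas, the following minimal sketch enumerates all subsets for a small kernel. It checks that the PMF sums to one and that summing the PMF over supersets of $J$ recovers $\det(K_J)$. The 3×3 kernel and all function names are assumptions made for illustration. Items are 0-indexed in the code, and the Laplace-expansion determinant is only suitable for tiny matrices.

```python
from itertools import combinations


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    n = len(m)
    if n == 0:
        return 1.0  # determinant of the empty matrix
    total = 0.0
    for c in range(n):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * m[0][c] * det(minor)
    return total


def subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for r in range(n + 1):
        for combo in combinations(range(n), r):
            yield frozenset(combo)


def pmf(K, J):
    """P[Y = J] = |det(K - I_Jbar)|: subtract 1 on the diagonal outside J."""
    n = len(K)
    M = [[K[i][j] - (1.0 if i == j and i not in J else 0.0) for j in range(n)]
         for i in range(n)]
    return abs(det(M))


def marginal(K, J):
    """P[J is a subset of Y] = det(K_J), the principal minor indexed by J."""
    idx = sorted(J)
    return det([[K[i][j] for j in idx] for i in idx])


# Hypothetical kernel: symmetric, with eigenvalues in [0, 1].
K = [[0.5, 0.2, 0.0],
     [0.2, 0.4, 0.1],
     [0.0, 0.1, 0.3]]

print(sum(pmf(K, J) for J in subsets(3)))  # total probability
print(marginal(K, {0, 1}), marginal(K, {0}) * marginal(K, {1}))  # joint vs. product
```

The last line illustrates the repulsion: the probability that items 0 and 1 both appear is smaller than the product of their individual inclusion probabilities.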

SLIDE 4

Goal

  • Given $Y_1, Y_2, \ldots, Y_n \overset{\text{iid}}{\sim} \mathrm{DPP}(K^*)$, estimate $K^*$.
  • Approach: maximum likelihood estimator (MLE).
  • Question: what is the rate of convergence of the MLE?

SLIDE 5

Identification

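  • $\mathrm{DPP}(K) = \mathrm{DPP}(K^*)$ ⇔ $\det(K_J) = \det(K^*_J)$ for all $J \subseteq [N]$ ⇔ $K = D K^* D$ for some $D = \mathrm{diag}(\pm 1, \ldots, \pm 1)$.
  • E.g., flipping the signs of rows and columns 2 and 3, i.e. $D = \mathrm{diag}(+1, -1, -1, +1)$, changes the sign pattern as follows:

$K^* = \begin{pmatrix} + & + & + & + \\ + & + & + & + \\ + & + & + & + \\ + & + & + & + \end{pmatrix} \rightsquigarrow D K^* D = \begin{pmatrix} + & - & - & + \\ - & + & + & - \\ - & + & + & - \\ + & - & - & + \end{pmatrix}$

Measure of the error of an estimator $\hat{K}$:

$\ell(\hat{K}, K^*) = \min_D \|\hat{K} - D K^* D\|_F$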
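A brute-force sketch of this loss follows. It minimises the Frobenius distance over all $2^N$ diagonal sign matrices, so it is only practical for small $N$. The names `flip`, `frobenius` and `loss`, and the example kernel, are assumptions made for illustration.

```python
import math
from itertools import product


def flip(K, d):
    """Return D K D for D = diag(d), where d is a sequence of +1/-1 signs."""
    n = len(K)
    return [[d[i] * K[i][j] * d[j] for j in range(n)] for i in range(n)]


def frobenius(A, B):
    """Frobenius norm of A - B."""
    return math.sqrt(sum((a - b) ** 2
                         for row_a, row_b in zip(A, B)
                         for a, b in zip(row_a, row_b)))


def loss(K_hat, K_star):
    """min over sign matrices D of ||K_hat - D K_star D||_F (enumerates 2^N choices)."""
    n = len(K_star)
    return min(frobenius(K_hat, flip(K_star, d))
               for d in product((1, -1), repeat=n))


# Hypothetical kernel and an "estimate" that differs from it only by a sign flip.
K_star = [[0.5, 0.2, 0.0],
          [0.2, 0.4, 0.1],
          [0.0, 0.1, 0.3]]
K_hat = flip(K_star, (1, -1, 1))

print(frobenius(K_hat, K_star))  # plain distance is positive
print(loss(K_hat, K_star))       # sign-invariant loss vanishes
```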

SLIDE 6

Maximum likelihood estimation

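  • Log-likelihood: $\hat{\Psi}(K) = \sum_{J \subseteq [N]} \hat{p}_J \ln |\det(K - I_{\bar{J}})|$, where $\hat{p}_J$ is the fraction of samples equal to $J$ and $I_{\bar{J}}$ is the identity restricted to the complement of $J$.
  • MLE: $\hat{K} \in \operatorname{argmax}_K \hat{\Psi}(K)$.

Expected log-likelihood: $\Psi(K) \triangleq \mathbb{E}[\hat{\Psi}(K)] = \sum_{J \subseteq [N]} p^*_J \ln |\det(K - I_{\bar{J}})| = \Psi(K^*) - \mathrm{KL}(\mathrm{DPP}(K^*), \mathrm{DPP}(K))$, where $p^*_J = \mathbb{P}_{K^*}[Y = J]$.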
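The empirical log-likelihood can be evaluated directly from a list of observed subsets, as in the sketch below. It does not perform the maximisation; it only evaluates the objective at a given kernel. The kernel, the four samples and the function names are hypothetical, and items are 0-indexed. The last two lines show that the objective cannot distinguish $K$ from $DKD$, which is why the MLE is only defined up to sign flips.

```python
import math
from collections import Counter


def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(m) == 0:
        return 1.0
    return sum((-1) ** c * m[0][c] * det([row[:c] + row[c + 1:] for row in m[1:]])
               for c in range(len(m)))


def pmf(K, J):
    """P[Y = J] = |det(K - I_Jbar)|."""
    n = len(K)
    M = [[K[i][j] - (1.0 if i == j and i not in J else 0.0) for j in range(n)]
         for i in range(n)]
    return abs(det(M))


def empirical_log_likelihood(K, samples):
    """Psi_hat(K) = sum over observed J of p_hat_J * ln P_K[Y = J]."""
    counts = Counter(frozenset(s) for s in samples)
    n = len(samples)
    return sum(c / n * math.log(pmf(K, J)) for J, c in counts.items())


# Hypothetical 2x2 kernel and toy sample of four subsets of {0, 1}.
K = [[0.5, 0.2],
     [0.2, 0.4]]
K_flipped = [[0.5, -0.2],
             [-0.2, 0.4]]  # D K D with D = diag(1, -1)
samples = [{0}, {0, 1}, set(), {0}]

print(empirical_log_likelihood(K, samples))
print(empirical_log_likelihood(K_flipped, samples))  # same value
```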

SLIDE 7

Likelihood geometry

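Fisher information: $-\nabla^2 \Psi(K^*)$.

What is the order of the first non-degenerate derivative of $\Psi$ at $K = K^*$?

[Figure: two sketches of $\Psi(K)$ against $K$, each with $K^*$ marked: one with $\nabla^2 \Psi(K^*) < 0$, the other with $\nabla^2 \Psi(K^*) = 0$.]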

SLIDE 8

Determinantal Graphs & Irreducibility

Definition. $G = ([N], E)$, where $\{i, j\} \in E$ ⇔ $K^*_{i,j} \neq 0$.

  • $K^*$ is irreducible iff $G$ is connected.
  • Otherwise, $K^*$ is block diagonal (up to a permutation of the items).
  • Remark: $K^*$ block diagonal ⇒ $Y$ is a union of independent DPPs.
  • Write $i \sim j$ when $i$ and $j$ are connected in $G$.
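The blocks of a kernel are the connected components of its determinantal graph, which a depth-first search recovers. The sketch below is one possible way to do this. The function names, the `tol` threshold for treating an entry as zero, and the example kernels are all assumptions. Items are 0-indexed.

```python
def determinantal_components(K, tol=0.0):
    """Connected components of the graph with an edge {i, j} iff |K[i][j]| > tol."""
    n = len(K)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            i = stack.pop()
            if i in comp:
                continue
            comp.add(i)
            stack.extend(j for j in range(n)
                         if j != i and abs(K[i][j]) > tol and j not in comp)
        seen |= comp
        components.append(sorted(comp))
    return components


def is_irreducible(K, tol=0.0):
    """A kernel is irreducible iff its determinantal graph is connected."""
    return len(determinantal_components(K, tol)) == 1


# Hypothetical kernels: a path graph 0 - 1 - 2, and a block-diagonal kernel.
K_path = [[0.5, 0.2, 0.0],
          [0.2, 0.4, 0.1],
          [0.0, 0.1, 0.3]]
K_blocks = [[0.5, 0.2, 0.0],
            [0.2, 0.4, 0.0],
            [0.0, 0.0, 0.3]]

print(determinantal_components(K_path))    # [[0, 1, 2]]
print(determinantal_components(K_blocks))  # [[0, 1], [2]]
```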
SLIDE 9

Main Results: Irreducible case

Theorem 1. $K^*$ is irreducible ⇔ $\nabla^2 \Psi(K^*)$ is negative definite.

Statistical consequences:

  • $\ell(\hat{K}, K^*) = O_{\mathbb{P}}(n^{-1/2})$
  • CLT
SLIDE 10

Main Results: Block diagonal case (1)

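Theorem 2. $\mathrm{Ker}\, \nabla^2 \Psi(K^*) = \{H \in \mathbb{R}^{N \times N} : H_{i,j} = 0, \ \forall i \sim j\}$. In particular, $\nabla^2 \Psi(K^*)$ is negative definite along directions supported on the blocks of $K^*$.

Theorem 3. For $H \in \mathrm{Ker}\, \nabla^2 \Psi(K^*) \setminus \{0\}$: $\nabla^3 \Psi(K^*)(H^{\otimes 3}) = 0$ and $\nabla^4 \Psi(K^*)(H^{\otimes 4}) < 0$.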
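The smallest block-diagonal case illustrates both theorems and can be worked out by hand. The setting below is an assumed example, not one from the slides: $N = 2$, $K^* = \mathrm{diag}(a, b)$ with $0 < a, b < 1$, and the off-diagonal direction $H$, which lies in the kernel because items 1 and 2 are not connected. Here $p^*_\emptyset = (1-a)(1-b)$, $p^*_{\{1\}} = a(1-b)$, $p^*_{\{2\}} = (1-a)b$ and $p^*_{\{1,2\}} = ab$.

```latex
% Worked example (N = 2), assuming K^* = diag(a, b) with 0 < a, b < 1.
\begin{align*}
K_t &= K^* + tH, \qquad H = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \\
p_J(t) &= |\det(K_t - I_{\bar J})| = p^*_J + s_J t^2, \qquad
  s_\emptyset = s_{\{1,2\}} = -1, \quad s_{\{1\}} = s_{\{2\}} = +1, \\
\Psi(K_t) - \Psi(K^*) &= \sum_J p^*_J \ln\!\Big(1 + \frac{s_J t^2}{p^*_J}\Big)
  = t^2 \sum_J s_J - \frac{t^4}{2} \sum_J \frac{1}{p^*_J} + O(t^6) \\
 &= -\frac{t^4}{2} \sum_J \frac{1}{p^*_J} + O(t^6)
  \qquad \text{since } \sum_J s_J = 0 .
\end{align*}
```

In this example the second and third directional derivatives along $H$ vanish, and the fourth equals $-12 \sum_J 1/p^*_J < 0$, in line with Theorems 2 and 3.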

SLIDE 11

Main Results: Block diagonal case (2)

Statistical consequences:

$\ell(\hat{K}, K^*) = O_{\mathbb{P}}(n^{-1/6})$

$\ell(\hat{K}_S, K^*_S) = O_{\mathbb{P}}(n^{-1/2})$ for all blocks $S$ of $K^*$.

SLIDE 12

Conclusions

  • Rates of convergence of the MLE: $n^{-1/2}$ if $K^*$ is irreducible, $n^{-1/6}$ otherwise.
  • The rate is determined only by the connectedness of the determinantal graph.
  • Hidden constants can be arbitrarily large in $N$, e.g., if $G$ is a path graph.
  • In another paper*, we show that the sample complexity of a method-of-moments estimator is determined by the cycle sparsity of $G$.

* Learning Determinantal Point Processes from Moments and Cycles, J. Urschel, V.-E. Brunel, A. Moitra, P. Rigollet, ICML 2017.