ECLIPSE: An Extreme-Scale Linear Program Solver for Web-Applications



slide-1
SLIDE 1

ECLIPSE: An Extreme-Scale Linear Program Solver for Web-Applications

Kinjal Basu

LinkedIn AI

Amol Ghoting

LinkedIn AI

Rahul Mazumder

MIT

Yao Pan

LinkedIn AI

slide-2
SLIDE 2

Agenda

  1. Overview
  2. ECLIPSE: Extreme Scale LP Solver
  3. Applications
  4. System Architecture
  5. Experimental Results

slide-3
SLIDE 3

Overview

slide-4
SLIDE 4

Introduction

Large-scale linear programs (LPs) have several applications on the web

slide-5
SLIDE 5

Problems of Extreme Scale

  • Billions to Trillions of Variables
  • Ad-hoc Solutions
  • Splitting the problem into smaller sub-problems → no guarantee of optimality
  • Exploit the Structure of the Problem
  • Solve a Perturbation of the Primal Problem.
  • Smooth Gradient
  • Efficient computation
slide-6
SLIDE 6

Motivating Example

Friend or Connection Matching Problem

  • Maximize value
  • Total invites sent is greater than a threshold
  • Limit on invitations per member to prevent overwhelming members
  • $p^{1}$: value model
  • $p^{2}$: invitation model
  • $x_{ij}$: probability of showing user $j$ to user $i$

Scale:

  • $I$ users, $J$ items
  • $n = IJ \approx 10^{12}$ (1 trillion decision variables)
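
One way to write the matching problem above as an LP is sketched below. The slide states only the objective and the two kinds of constraints in words, so the exact form is an assumption. Here $p^{1}_{ij}$ is the value model, $p^{2}_{ij}$ the invitation model, and $x_{ij}$ the probability of showing user $j$ to user $i$. The threshold $T$, the per-member limits $L_j$, and the box bounds on $x_{ij}$ are hypothetical names introduced for illustration.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{i,j} p^{1}_{ij}\, x_{ij} \\
\text{s.t.} \quad & \sum_{i,j} p^{2}_{ij}\, x_{ij} \;\ge\; T
    && \text{(total invites sent above a threshold)} \\
& \sum_{i} p^{2}_{ij}\, x_{ij} \;\le\; L_j \quad \text{for every member } j
    && \text{(limit on invitations per member)} \\
& 0 \le x_{ij} \le 1 \quad \text{for all } (i, j).
\end{aligned}
```

Negating the objective and the first constraint puts this in the "minimize subject to $Ax \le b$" form used in the rest of the deck.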

slide-7
SLIDE 7

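General Framework

$$\min_{x} \; c^T x \quad \text{s.t.} \quad Ax \le b, \quad x_i \in \mathcal{C}_i, \; i \in [I]$$

$$A = \begin{pmatrix} A^{(1)} \\ A^{(2)} \end{pmatrix}, \qquad A^{(2)} = \begin{pmatrix} D_{11} & \cdots & D_{1I} \\ \vdots & \ddots & \vdots \\ D_{m_2 1} & \cdots & D_{m_2 I} \end{pmatrix}$$

  • $A^{(1)}$: global constraints and cohort level constraints. E.g.: total invite constraint
  • $A^{(2)}$: item level constraints. E.g.: limits on invitations per user
  • Users $i$, items $j$, and $x_{ij}$ is the association between $(i, j)$
  • $n = IJ$ can range from 100s of millions to 10s of trillions
  • $\mathcal{C}_i$ are simple constraints (i.e., they allow for efficient projections)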

slide-8
SLIDE 8

ECLIPSE: Extreme Scale LP Solver

slide-9
SLIDE 9

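Solving The Problem

Key Observation: length(λ) is small

Primal LP:

$$P_0^* := \min_{x} \; c^T x \quad \text{s.t.} \quad Ax \le b, \quad x_i \in \mathcal{C}_i, \; i \in [I]$$

Old idea: Perturbation of the LP (Mangasarian & Meyer ’79; Nesterov ’05; Osher et al. ’11, ...)

Primal QP:

$$P_\gamma^* := \min_{x} \; c^T x + \frac{\gamma}{2} x^T x \quad \text{s.t.} \quad Ax \le b, \quad x_i \in \mathcal{C}_i, \; i \in [I]$$

Dualize:

Dual QP:

$$g_\gamma(\lambda) := \min_{x \in \prod_i \mathcal{C}_i} \left\{ c^T x + \frac{\gamma}{2} x^T x + \lambda^T (Ax - b) \right\}$$

Solve the Dual QP:

$$g_\gamma^* := \max_{\lambda \ge 0} g_\gamma(\lambda) = P_\gamma^* \quad \text{(strong duality)}$$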

slide-10
SLIDE 10

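Solving The Problem

Primal:

$$P_0^* := \min_{x} \; c^T x \quad \text{s.t.} \quad Ax \le b, \quad x_i \in \mathcal{C}_i, \; i \in [I]$$

$$x_\gamma^* \in \operatorname*{argmin}_{x} \; c^T x + \frac{\gamma}{2} x^T x \quad \text{s.t.} \quad Ax \le b, \quad x_i \in \mathcal{C}_i, \; i \in [I]$$

Dual:

$$g_\gamma^* := \max_{\lambda \ge 0} g_\gamma(\lambda), \qquad g_\gamma(\lambda) := \min_{x \in \prod_i \mathcal{C}_i} \left\{ c^T x + \frac{\gamma}{2} x^T x + \lambda^T (Ax - b) \right\}$$

  • Observation-1: Exact Regularization (Mangasarian & Meyer ’79; Friedlander & Tseng ’08): $\exists\, \bar{\gamma} > 0$ such that $x_\gamma^*$ solves the LP for all $\gamma \le \bar{\gamma}$
  • Observation-2: Error Bound (Nesterov ’05): $|g_\gamma^* - P_0^*| = O(\gamma)$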
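
A one-variable worked example may make the two observations concrete. The instance below is hypothetical and chosen only because everything can be computed by hand: $c = 1$, $A = -1$, $b = -1$ (so the constraint reads $x \ge 1$) and $\mathcal{C} = [0, 2]$.

```latex
\begin{aligned}
\text{LP:}\quad & P_0^* = \min_{x \in [0,2]} \; x \quad \text{s.t.} \quad -x \le -1
   \;\;\Longrightarrow\;\; x^* = 1, \quad P_0^* = 1. \\[4pt]
\text{QP:}\quad & P_\gamma^* = \min_{x \in [0,2]} \; x + \tfrac{\gamma}{2} x^2 \quad \text{s.t.} \quad -x \le -1
   \;\;\Longrightarrow\;\; x_\gamma^* = 1, \quad P_\gamma^* = 1 + \tfrac{\gamma}{2}. \\[4pt]
\text{Dual:}\quad & g_\gamma(\lambda) = \min_{x \in [0,2]} \Big\{ x + \tfrac{\gamma}{2} x^2 + \lambda (1 - x) \Big\},
   \qquad \hat{x}(\lambda) = \Pi_{[0,2]}\!\Big( \tfrac{\lambda - 1}{\gamma} \Big). \\[4pt]
& \text{At } \lambda = 1 + \gamma: \quad \hat{x}(\lambda) = 1, \quad g_\gamma(\lambda) = 1 + \tfrac{\gamma}{2} = P_\gamma^*. \\[4pt]
\text{Gap:}\quad & |g_\gamma^* - P_0^*| = \tfrac{\gamma}{2} = O(\gamma).
\end{aligned}
```

The QP objective is increasing on $[1, 2]$, so $x_\gamma^* = 1$ for every $\gamma > 0$: the perturbed problem recovers the LP solution exactly, while the optimal values differ by $\gamma/2$.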

slide-11
SLIDE 11

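Solving The Problem

$$g_\gamma^* = \max_{\lambda \ge 0} g_\gamma(\lambda)$$

  • Observation-1: Dual objective is smooth (implicitly defined) [Nesterov ’05]: $\lambda \mapsto g_\gamma(\lambda)$ is $O(1/\gamma)$-smooth.
  • Observation-2: Gradient expression (Danskin’s Theorem):

$$\nabla g_\gamma(\lambda) = A\hat{x}(\lambda) - b, \qquad \hat{x}(\lambda) \in \operatorname*{argmin}_{x \in \prod_i \mathcal{C}_i} \left\{ c^T x + \frac{\gamma}{2} x^T x + \lambda^T (Ax - b) \right\}$$

$$\hat{x}_i(\lambda) = \Pi_{\mathcal{C}_i}\!\left( -\frac{1}{\gamma} (A^T \lambda + c)_i \right)$$

ECLIPSE Algorithm

  • Proximal gradient based methods (acceleration, restarts)
  • Optimal convergence rates
  • Key bottleneck: matrix-vector multiplication
  • Simple projection operation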
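
The two observations on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the ECLIPSE implementation: the dense list-of-lists matrix, the function names, and the choice of the unit box $[0, 1]$ as the simple set $\mathcal{C}_i$ are all assumptions made for the example.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def project_box(v: Sequence[float], lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Euclidean projection onto the box [lo, hi]^d (assumed stand-in for the sets C_i)."""
    return [min(max(t, lo), hi) for t in v]


def primal_from_dual(A: Matrix, c: Sequence[float],
                     lam: Sequence[float], gamma: float) -> List[float]:
    """x_hat(lam) = Proj_C( -(A^T lam + c) / gamma )."""
    m, n = len(A), len(c)
    at_lam = [sum(A[r][j] * lam[r] for r in range(m)) for j in range(n)]
    return project_box([-(at_lam[j] + c[j]) / gamma for j in range(n)])


def dual_gradient(A: Matrix, b: Sequence[float], x_hat: Sequence[float]) -> List[float]:
    """Gradient of the dual objective: A x_hat(lam) - b."""
    n = len(x_hat)
    return [sum(A[r][j] * x_hat[j] for j in range(n)) - b[r] for r in range(len(A))]


# Toy instance (hypothetical): min -x1 - 2*x2  s.t.  x1 + x2 <= 1,  x in [0, 1]^2
A = [[1.0, 1.0]]
b = [1.0]
c = [-1.0, -2.0]

x0 = primal_from_dual(A, c, lam=[0.0], gamma=0.5)   # constraint is ignored at lam = 0
g0 = dual_gradient(A, b, x0)                        # positive entry: constraint violated
x1 = primal_from_dual(A, c, lam=[1.5], gamma=0.5)
g1 = dual_gradient(A, b, x1)
```

A positive gradient entry means the corresponding constraint is violated at the current primal estimate, so the dual variable for that constraint should increase.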

slide-12
SLIDE 12

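Overall Algorithm

  • Input
  • At iteration $k$, given the dual $\lambda_k$:
      • Get primal: $\hat{x}(\lambda_k)$
      • Compute gradient: $\nabla g_\gamma(\lambda_k) = A\hat{x}(\lambda_k) - b$
      • Update dual: GD or AGD step
  • Next iteration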
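
The loop on this slide (get primal, compute gradient, update dual, repeat) can be sketched as plain projected gradient ascent on the dual. Only the GD variant is shown; acceleration and restarts are left out. The step size $\gamma / \|A\|_F^2$, the box $[0, 1]$ used for the sets $\mathcal{C}_i$, the fixed iteration count, and all function names are assumptions for this sketch, since the slide's own update formulas did not survive extraction.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def clip01(t: float) -> float:
    """Projection onto [0, 1]; the box is an assumed stand-in for the sets C_i."""
    return min(max(t, 0.0), 1.0)


def primal(A: Matrix, c: Sequence[float], lam: Sequence[float], gamma: float) -> List[float]:
    """Get primal: x_hat(lam) = Proj(-(A^T lam + c) / gamma), coordinate by coordinate."""
    m = len(A)
    return [clip01(-(sum(A[r][j] * lam[r] for r in range(m)) + c[j]) / gamma)
            for j in range(len(c))]


def solve_dual_gd(A: Matrix, b: Sequence[float], c: Sequence[float],
                  gamma: float, iters: int = 200) -> Tuple[List[float], List[float]]:
    """Projected gradient ascent on the dual; returns (lam, x_hat(lam))."""
    m, n = len(A), len(c)
    # Assumed step size: the dual is (||A||_2^2 / gamma)-smooth and ||A||_F >= ||A||_2,
    # so gamma / ||A||_F^2 is a conservative choice.
    step = gamma / sum(a * a for row in A for a in row)
    lam = [0.0] * m
    for _ in range(iters):
        x = primal(A, c, lam, gamma)                                        # get primal
        grad = [sum(A[r][j] * x[j] for j in range(n)) - b[r] for r in range(m)]  # A x - b
        lam = [max(0.0, lam[r] + step * grad[r]) for r in range(m)]         # keep lam >= 0
    return lam, primal(A, c, lam, gamma)


# Toy instance (hypothetical): min -x1 - 2*x2  s.t.  x1 + x2 <= 1,  x in [0, 1]^2
lam, x = solve_dual_gd(A=[[1.0, 1.0]], b=[1.0], c=[-1.0, -2.0], gamma=0.5)
```

On this toy instance the iterates approach $x = (0, 1)$, which is also the solution of the unperturbed LP, in line with the exact regularization observation.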

slide-13
SLIDE 13

Applications

slide-14
SLIDE 14

Volume Optimization

Maximize Sessions

  • Total number of emails / notifications bounded

  • Clicks above a threshold
  • Disablement below a threshold

Generalized from global to cohort level systems and member level systems

slide-15
SLIDE 15

Multi-Objective Optimization

  • Maximize Metric 1
  • Metric 2 is greater than a minimum
  • Metric 3 is bounded
  • Most Product Applications
  • Engagement vs Revenue
  • Sessions vs Notification / Email Volume
  • Member Value vs Annoyance
slide-16
SLIDE 16

System Infrastructure

slide-19
SLIDE 19

System Architecture

  • Data is collected from different sources and restructured to form the input $A$, $b$, $c$
  • The solver is called, which runs the overall iterations
  • The data is split across multiple executors, which perform matrix-vector multiplications in parallel
  • The driver collects the dual and broadcasts it back to continue the iterations
  • On convergence, the final duals are returned and used in online serving
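
The driver and executor roles described above can be mimicked on a single machine. The sketch below is not Spark code and not the authors' implementation. It assumes each partition holds one column block of $A$ together with the matching block of $c$, that the simple sets are the box $[0, 1]$, and that partitions are visited one after another in a loop where a real cluster would run them in parallel. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One partition: (column block A_i with m rows, cost block c_i). Assumed layout.
Partition = Tuple[List[List[float]], List[float]]


def executor_step(part: Partition, lam: Sequence[float], gamma: float) -> List[float]:
    """Runs on one executor: local primal estimate, then the partial product A_i x_i."""
    A_i, c_i = part
    m, d = len(lam), len(c_i)
    x_i = [min(max(-(sum(A_i[r][j] * lam[r] for r in range(m)) + c_i[j]) / gamma, 0.0), 1.0)
           for j in range(d)]
    return [sum(A_i[r][j] * x_i[j] for j in range(d)) for r in range(m)]


def driver_iteration(parts: List[Partition], b: Sequence[float],
                     lam: Sequence[float], gamma: float, step: float) -> List[float]:
    """Runs on the driver: broadcast lam, sum the partial products, update the dual."""
    partials = [executor_step(p, lam, gamma) for p in parts]  # parallel on a real cluster
    grad = [sum(p[r] for p in partials) - b[r] for r in range(len(b))]
    return [max(0.0, lam[r] + step * grad[r]) for r in range(len(b))]


# Toy instance split into two single-column partitions (hypothetical data).
parts = [([[1.0]], [-1.0]), ([[1.0]], [-2.0])]
new_lam = driver_iteration(parts, b=[1.0], lam=[0.0], gamma=0.5, step=0.25)
```

Only the dual vector and the length-$m$ partial products cross the network in this scheme, which is small when the number of constraints is small.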

slide-20
SLIDE 20

Detailed Spark Implementation

Data Representation

  • Customized DistributedMatrix API
  • BlockMatrix API from Apache Spark MLlib
  • Leverage diagonal structure and implement DistributedVector API using RDD (index, Vector)

Estimating Primal

  • Component-wise matrix multiplications and projections are done in parallel
  • We cache $A$ on the executors and broadcast the duals to minimize communication cost
  • The overall complexity to get the primal is $O(J)$

Estimating Gradient

  • Most computationally expensive step
  • The worst-case complexity is $O(n)$, where $n = IJ$
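
To see why diagonal structure keeps the gradient step at $O(n)$, consider one block row $[D_1 \; \cdots \; D_I]$ applied to $x = (x_1, \dots, x_I)$. The sketch below assumes each $D_i$ is a $J \times J$ diagonal matrix stored only as its diagonal; that storage layout and the function name are assumptions for illustration, not the DistributedVector API itself.

```python
from typing import List


def diag_block_row_matvec(diags: List[List[float]],
                          x_blocks: List[List[float]]) -> List[float]:
    """Compute sum_i D_i x_i, where D_i = diag(diags[i]) and x_blocks[i] = x_i.

    Each of the I blocks costs J multiplications, so the total work is I * J = n.
    """
    J = len(diags[0])
    out = [0.0] * J
    for d_i, x_i in zip(diags, x_blocks):
        for j in range(J):
            out[j] += d_i[j] * x_i[j]
    return out


# Two users (I = 2), two items (J = 2); hypothetical numbers.
result = diag_block_row_matvec(
    diags=[[1.0, 2.0], [1.0, 1.0]],
    x_blocks=[[0.25, 0.5], [0.5, 0.25]],
)
```

A dense representation of the same block row would need $J \times n$ entries, so storing only the diagonals is what makes trillion-variable instances feasible to hold and multiply.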

slide-21
SLIDE 21

Experimental Results

slide-22
SLIDE 22

Comparative Results

Please see the full paper for other comparisons

  • We compare with a technique of splitting the problem (SOTA):

slide-23
SLIDE 23

Real Data Results

  • Test on large-scale volume optimization and matching problems
  • Spark 2.3 with up to 800 executors
  • The 1 trillion variable use case converged within 12 hours

SCS: O’Donoghue et al (2016)

slide-24
SLIDE 24

Key Takeaways

slide-25
SLIDE 25

Key Takeaways

  • A framework for solving structured LP problems arising in several applications in the internet industry
  • Most multi-objective optimization problems can be framed this way.
  • Given sufficient computational resources, we can scale to extremely large problems.
  • We can easily scale up to 1 trillion variables on real data.
slide-26
SLIDE 26

Thank you