Projective Splitting Methods for Decomposing Convex Optimization Problems

Jonathan Eckstein, Rutgers University, New Jersey, USA

Various portions of this talk describe joint work with Patrick Combettes, NC State University, USA
Introductory Remarks

- I did some of the earlier work on an optimization algorithm called the ADMM (the Alternating Direction Method of Multipliers)
  - But not the earliest work
- I know that the ADMM has been used in image processing, because about 15 years ago I started being asked to referee a deluge of papers with this picture: [image omitted]
- Today I want to talk about an algorithm that uses similar building blocks to the ADMM but is much more flexible
More General Problem Setting

The algorithms in this talk can work for monotone inclusion problems of the form

$$0 \in \sum_{i=1}^n G_i^* T_i(G_i x)$$

where
- $\mathcal{H}_0, \mathcal{H}_1, \ldots, \mathcal{H}_n$ are real Hilbert spaces
- $T_i : \mathcal{H}_i \rightrightarrows \mathcal{H}_i$ are (generally set-valued) maximal monotone operators, $i = 1, \ldots, n$
- $G_i : \mathcal{H}_0 \to \mathcal{H}_i$ are bounded linear maps, $i = 1, \ldots, n$

However, for this talk we will restrict ourselves to...
A General Convex Optimization Problem

$$\min_x\; \sum_{i=1}^n f_i(G_i x)$$

- For $i = 1, \ldots, n$, $f_i : \mathbb{R}^{p_i} \to \mathbb{R} \cup \{+\infty\}$ is closed proper convex
- For $i = 1, \ldots, n$, $G_i$ is a $p_i \times m$ real matrix
- Assume you have a class of such problems that is not suitable for standard LP/NLP solvers because either
  - The problems are very large, or
  - They are fairly large but also dense
Subgradient Maps of Convex Functions, Monotonicity

The subgradient map $\partial f$ of a convex function $f : \mathbb{R}^p \to \mathbb{R} \cup \{+\infty\}$ is given by

$$\partial f(x) = \left\{\, y \;\middle|\; f(x') \ge f(x) + \langle y, x' - x \rangle \;\; \forall x' \in \mathbb{R}^p \,\right\}.$$

This has the property that

$$y \in \partial f(x),\;\; y' \in \partial f(x') \;\;\Rightarrow\;\; \langle x - x',\, y - y' \rangle \ge 0.$$

Proof: $f(x') - f(x) \ge \langle y, x' - x \rangle$ and $f(x) - f(x') \ge \langle y', x - x' \rangle$; adding these two inequalities and rearranging gives $\langle x - x',\, y - y' \rangle \ge 0$.
Normal Cone Maps

The indicator function of a nonempty closed convex set $C$ is

$$\delta_C(x) = \begin{cases} 0, & x \in C \\ +\infty, & x \notin C. \end{cases}$$

Its subgradient map is the normal cone map $N_C$ of $C$:

$$\partial\delta_C(x) = N_C(x) = \begin{cases} \{\, y \mid \langle y, x' - x \rangle \le 0 \;\; \forall x' \in C \,\}, & x \in C \\ \emptyset, & x \notin C. \end{cases}$$

[Figure: boundary points $x, x'$ of $C$ with normal vectors $y \in N_C(x)$ and $y' \in N_C(x')$, illustrating the defining inequalities and the monotonicity of the normal cone map.]
A Subgradient Chain Rule

- Suppose $f : \mathbb{R}^p \to \mathbb{R} \cup \{+\infty\}$ is closed proper convex
- Suppose $G$ is a $p \times m$ real matrix

Then for any $x$,

$$\partial (f \circ G)(x) \;\supseteq\; G^\top \partial f(Gx) = \left\{\, G^\top y \mid y \in \partial f(Gx) \,\right\}$$

and "usually"

$$\partial (f \circ G)(x) = G^\top \partial f(Gx).$$
An Optimality Condition

Let's go back to

$$\min_x\; \sum_{i=1}^n f_i(G_i x)$$

Suppose we have $z \in \mathbb{R}^m$ and $w_1 \in \mathbb{R}^{p_1}, \ldots, w_n \in \mathbb{R}^{p_n}$ such that

$$w_i \in \partial f_i(G_i z), \;\; i = 1, \ldots, n, \qquad \sum_{i=1}^n G_i^\top w_i = 0.$$

The chain rule then implies that $0 \in \partial\left(\sum_{i=1}^n f_i \circ G_i\right)(z)$, so…

$z$ is a solution to our problem.

- This is always a sufficient optimality condition
- It's "usually" necessary as well
- The $w_i$ are the Lagrange multipliers / dual variables
The Primal-Dual Solution Set (Kuhn-Tucker Set)

$$\mathcal{S} = \left\{ (z, w_1, \ldots, w_n) \;\middle|\; (\forall\, i = 1, \ldots, n)\;\; w_i \in \partial f_i(G_i z), \;\; \sum_{i=1}^n G_i^\top w_i = 0 \right\}$$

Or, if we assume that $p_n = m$ and $G_n = \mathrm{Id}$,

$$\mathcal{S} = \left\{ (z, w_1, \ldots, w_{n-1}) \;\middle|\; (\forall\, i = 1, \ldots, n-1)\;\; w_i \in \partial f_i(G_i z), \;\; -\sum_{i=1}^{n-1} G_i^\top w_i \in \partial f_n(z) \right\}$$

- This is the set of points satisfying the optimality conditions
- Standing assumption: $\mathcal{S}$ is nonempty
- Essentially in E & Svaiter 2009: $\mathcal{S}$ is a closed convex set
- In the $p_n = m$, $G_n = \mathrm{Id}$ case, streamline notation: for $\mathbf{w} = (w_1, \ldots, w_{n-1}) \in \mathbb{R}^{p_1} \times \cdots \times \mathbb{R}^{p_{n-1}}$, let $w_n \triangleq -\sum_{i=1}^{n-1} G_i^\top w_i$
Valid Inequalities for $\mathcal{S}$

- Take some $x_i, y_i \in \mathbb{R}^{p_i}$ such that $y_i \in \partial f_i(x_i)$ for $i = 1, \ldots, n$
- If $(z, \mathbf{w}) \in \mathcal{S}$, then $w_i \in \partial f_i(G_i z)$ for $i = 1, \ldots, n$
- So, by monotonicity, $\langle x_i - G_i z,\, y_i - w_i \rangle \ge 0$ for $i = 1, \ldots, n$
- Negate and add up:

$$\varphi(z, \mathbf{w}) = \sum_{i=1}^n \langle G_i z - x_i,\, y_i - w_i \rangle \le 0 \quad \forall\, (z, \mathbf{w}) \in \mathcal{S},$$

so $\mathcal{S}$ lies in the halfspace $\{\, p \mid \varphi(p) \le 0 \,\}$, whose boundary is the hyperplane $H = \{\, p \mid \varphi(p) = 0 \,\}$.
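To make the construction concrete, here is a minimal NumPy sketch (the function name and argument layout are illustrative, not from the talk) of evaluating $\varphi$ for given cut data $(x_i, y_i)$:

```python
import numpy as np

def phi(z, w, xs, ys, Gs):
    """Evaluate phi(z, w) = sum_i <G_i z - x_i, y_i - w_i>.

    z      : primal point, shape (m,)
    w      : list of dual points w_i, each of shape (p_i,)
    xs, ys : cut data, with y_i a subgradient of f_i at x_i
    Gs     : list of p_i-by-m matrices G_i
    """
    return sum((G @ z - x) @ (y - wi)
               for G, x, y, wi in zip(Gs, xs, ys, w))
```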
Confirming that $\varphi$ is Affine

The quadratic terms in $\varphi(z, \mathbf{w})$ take the form

$$\sum_{i=1}^n \langle G_i z,\, -w_i \rangle = -\sum_{i=1}^n \langle z,\, G_i^\top w_i \rangle = -\left\langle z,\; \sum_{i=1}^n G_i^\top w_i \right\rangle = -\langle z, 0 \rangle = 0,$$

since we work in the primal-dual space where $\sum_{i=1}^n G_i^\top w_i = 0$.

- Also true in the $p_n = m$, $G_n = \mathrm{Id}$ case, where we drop the $n$th index
- Slightly different proof, same basic idea
Generic Projection Method for a Closed Convex Set $\mathcal{S}$ in a Hilbert Space

Apply the following general template:
- Given $p^k$, choose some affine function $\varphi_k$ with $\varphi_k(p) \le 0 \;\; \forall\, p \in \mathcal{S}$
- Project $p^k$ onto $H_k = \{\, p \mid \varphi_k(p) = 0 \,\}$, possibly with an overrelaxation factor $\lambda_k \in [\epsilon, 2 - \epsilon]$, giving $p^{k+1}$, and repeat…

In our case: the space is $\mathbb{R}^m \times \mathbb{R}^{p_1} \times \cdots \times \mathbb{R}^{p_n}$, and we find $\varphi_k$ by picking some $x_i^k \in \mathbb{R}^{p_i}$, $y_i^k \in \partial f_i(x_i^k)$, $i = 1, \ldots, n$, and using the construction above.

[Figure: $p^k$ with $\varphi_k(p^k) > 0$ is projected across the hyperplane $H_k = \{\, p \mid \varphi_k(p) = 0 \,\}$ to $p^{k+1}$; $\mathcal{S}$ lies in the halfspace where $\varphi_k \le 0$.]
General Properties of Projection Algorithms

- Proposition. In such algorithms, assuming that $\mathcal{S} \neq \emptyset$:
  - $\{\, \| p^k - p^* \| \,\}$ is nonincreasing for all $p^* \in \mathcal{S}$
  - $\{ p^k \}$ is bounded
  - $\| p^{k+1} - p^k \| \to 0$
  - If $\{ \nabla \varphi_k \}$ is bounded, then $\limsup_{k \to \infty} \varphi_k(p^k) \le 0$
  - If all limit points of $\{ p^k \}$ are in $\mathcal{S}$, then $\{ p^k \}$ converges to a point in $\mathcal{S}$

The first three properties hold no matter how badly we choose $\varphi_k$. The idea is to pick $\varphi_k$ so that the stipulations of the last two properties hold – then we have a convergent algorithm. If we pick $\varphi_k$ badly, we may "stall".
Selecting the Right $\varphi_k$

- Selecting $\varphi_k$ involves picking some $x_i^k \in \mathbb{R}^{p_i}$, $y_i^k \in \partial f_i(x_i^k)$, $i = 1, \ldots, n$
- It turns out there are many ways to pick $x_i^k, y_i^k$ so that the last two properties of the proposition are satisfied
- One fundamental thing we would like is

$$\varphi_k(z^k, \mathbf{w}^k) = \sum_{i=1}^n \langle G_i z^k - x_i^k,\, y_i^k - w_i^k \rangle \ge 0,$$

with strict inequality if $(z^k, \mathbf{w}^k) \notin \mathcal{S}$
- The oldest suggestion is "prox" (E & Svaiter 2008 & 2009)
The Prox Operation

- Suppose we have a convex function $f : \mathbb{R}^p \to \mathbb{R} \cup \{+\infty\}$
- Take any vector $r \in \mathbb{R}^p$ and scalar $c > 0$ and solve

$$x = \operatorname*{arg\,min}_{x' \in \mathbb{R}^p} \left\{ f(x') + \tfrac{1}{2c} \| x' - r \|^2 \right\}$$

- The optimality condition for this minimization is $0 \in \partial f(x) + \tfrac{1}{c}(x - r)$
- So we have $y \triangleq \tfrac{1}{c}(r - x) \in \partial f(x)$
- And $x + cy = x + c \cdot \tfrac{1}{c}(r - x) = r$
- So, we just found $x, y \in \mathbb{R}^p$ such that $y \in \partial f(x)$ and $x + cy = r$
- Call this $\operatorname{Prox}_{c\,\partial f}(r)$
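For instance, with $f = \lambda\|\cdot\|_1$ the minimization has the familiar soft-thresholding solution, and the pair $(x, y)$ can be computed and checked in a few lines (a sketch, not from the talk):

```python
import numpy as np

def prox_pair_l1(r, c, lam):
    """Prox_{c * subdiff(lam*||.||_1)}(r): return (x, y) with
    y in subdiff(lam*||x||_1) and x + c*y = r (soft thresholding)."""
    x = np.sign(r) * np.maximum(np.abs(r) - c * lam, 0.0)
    y = (r - x) / c              # a valid subgradient at x
    return x, y

r = np.array([3.0, -0.2, 1.0])
x, y = prox_pair_l1(r, c=2.0, lam=0.5)
assert np.allclose(x + 2.0 * y, r)   # the defining identity x + c*y = r
```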
Picture

- The choice of $x, y \in \mathbb{R}^p$ such that $y \in \partial f(x)$ and $x + cy = r$ must be unique; otherwise $\partial f$ would not be monotone
- If $f$ is closed and proper, then this solution must exist
- Any vector $r \in \mathbb{R}^p$ can then be written in a unique way as $r = x + cy$, where $y \in \partial f(x)$
- Generalizes projection onto a subspace and its complement

[Figure: the graph of $\partial f$ in $(x, y)$ space; the line $x + cy = r$, through $(r, 0)$ and $(0, c^{-1} r)$, meets the graph at the unique point $(x, y)$.]
Prox Does the Job!

- We have an iterate $p^k = (z^k, \mathbf{w}^k) = (z^k, w_1^k, \ldots, w_n^k)$
- Take any $c_{1k}, \ldots, c_{nk} > 0$ and consider

$$(x_i^k, y_i^k) = \operatorname{Prox}_{c_{ik}\,\partial f_i}\!\left( G_i z^k + c_{ik} w_i^k \right)$$

- Then $x_i^k + c_{ik} y_i^k = G_i z^k + c_{ik} w_i^k \;\Leftrightarrow\; c_{ik}(y_i^k - w_i^k) = G_i z^k - x_i^k$
- Implying

$$\langle G_i z^k - x_i^k,\, y_i^k - w_i^k \rangle = c_{ik}^{-1} \| G_i z^k - x_i^k \|^2 = c_{ik} \| y_i^k - w_i^k \|^2 \ge 0$$

[Figure: the graph of $T_i$; the line $x + c_{ik} y = G_i z^k + c_{ik} w_i^k$ through the point $(G_i z^k, w_i^k)$ meets the graph at $(x_i^k, y_i^k)$.]
Prox Finishes the Job

From

$$\langle G_i z^k - x_i^k,\, y_i^k - w_i^k \rangle = c_{ik}^{-1} \| G_i z^k - x_i^k \|^2 = c_{ik} \| y_i^k - w_i^k \|^2 \ge 0$$

we have that

$$\sum_{i=1}^n \langle G_i z^k - x_i^k,\, y_i^k - w_i^k \rangle \ge 0,$$

and this inequality is strict unless $G_i z^k = x_i^k$ and $y_i^k = w_i^k$ for all $i$, which means that $(z^k, \mathbf{w}^k) \in \mathcal{S}$.

The entire convergence proof follows from this same relationship.
A First Algorithm

- These conditions allow one to prove that the cuts are "deep enough", and we obtain convergence

Starting with an arbitrary $(z^1, w_1^1, \ldots, w_n^1)$: for $k = 1, 2, \ldots$

1. For $i = 1, \ldots, n$, compute $(x_i^k, y_i^k) = \operatorname{Prox}_{c_{i,k} T_i}\!\left( G_i z^k + c_{i,k} w_i^k \right)$ (process operators: decomposition step)
2. Define $\varphi_k(z, w_1, \ldots, w_n) = \sum_{i=1}^n \langle G_i z - x_i^k,\, y_i^k - w_i \rangle$
3. Compute $(z^{k+1}, w_1^{k+1}, \ldots, w_n^{k+1})$ by projecting $(z^k, w_1^k, \ldots, w_n^k)$ onto the halfspace $\{\, \varphi_k(z, w_1, \ldots, w_n) \le 0 \,\}$, possibly with some overrelaxation (coordination step)

- This simple algorithm combines aspects of E & Svaiter 2009 and Alotaibi et al. 2014
Including the Details (Version 1: general case)

- Choose any $0 < \lambda_{\min} \le \lambda_{\max} < 2$
- For $k = 1, 2, \ldots$:
  - Process operators to find $x_i^k, y_i^k$: pick any $x_i^k \in \mathbb{R}^{p_i}$, $y_i^k \in \partial f_i(x_i^k)$, $i = 1, \ldots, n$
  - $(u_1^k, \ldots, u_n^k) = \operatorname{proj}_{\mathcal{W}}(x_1^k, \ldots, x_n^k)$, where $\mathcal{W} = \left\{ (w_1, \ldots, w_n) \,\middle|\, \sum_{i=1}^n G_i^\top w_i = 0 \right\}$
  - $v^k = \sum_{i=1}^n G_i^\top y_i^k$
  - $\theta_k = \dfrac{\max\left\{ \sum_{i=1}^n \langle G_i z^k - x_i^k,\, y_i^k - w_i^k \rangle,\; 0 \right\}}{\| v^k \|^2 + \sum_{i=1}^n \| u_i^k \|^2}$
  - Pick any $\lambda_k \in [\lambda_{\min}, \lambda_{\max}]$
  - $z^{k+1} = z^k - \lambda_k \theta_k v^k, \qquad w_i^{k+1} = w_i^k - \lambda_k \theta_k u_i^k, \quad i = 1, \ldots, n$
Including the Details (Version 2: $p_n = m$, $G_n = \mathrm{Id}$)

- Choose any $0 < \lambda_{\min} \le \lambda_{\max} < 2$
- For $k = 1, 2, \ldots$:
  - Process operators to find $x_i^k, y_i^k$: pick any $x_i^k \in \mathbb{R}^{p_i}$, $y_i^k \in \partial f_i(x_i^k)$, $i = 1, \ldots, n$
  - $u_i^k = x_i^k - G_i x_n^k, \quad i = 1, \ldots, n-1$
  - $v^k = \sum_{i=1}^{n-1} G_i^\top y_i^k + y_n^k$
  - $\theta_k = \dfrac{\max\left\{ \sum_{i=1}^n \langle G_i z^k - x_i^k,\, y_i^k - w_i^k \rangle,\; 0 \right\}}{\| v^k \|^2 + \sum_{i=1}^{n-1} \| u_i^k \|^2}$
  - Pick any $\lambda_k \in [\lambda_{\min}, \lambda_{\max}]$
  - $z^{k+1} = z^k - \lambda_k \theta_k v^k, \qquad w_i^{k+1} = w_i^k - \lambda_k \theta_k u_i^k, \quad i = 1, \ldots, n-1$
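To make Version 2 concrete, here is a self-contained NumPy sketch on a small LASSO instance with $n = 2$ and $G_1 = G_2 = \mathrm{Id}$ (my own illustration with synthetic data, fixed $c_{ik} = c$, and $\lambda_k = 1$; not code from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
rows, d, lam, c = 40, 100, 0.1, 1.0
Q = rng.standard_normal((rows, d))
b = rng.standard_normal(rows)

# n = 2 blocks, G_1 = G_2 = Id:
#   f_1(x) = 0.5*||Qx - b||^2   (prox via a cached linear solve)
#   f_2(x) = lam*||x||_1        (prox via soft thresholding)
M = np.linalg.inv(np.eye(d) + c * (Q.T @ Q))

z, w1 = np.zeros(d), np.zeros(d)        # w_2 = -w_1 implicitly
for k in range(500):
    w2 = -w1
    # process operators (decomposition step)
    r1 = z + c * w1
    x1 = M @ (r1 + c * (Q.T @ b))       # Prox_{c T_1}(r1)
    y1 = (r1 - x1) / c
    r2 = z + c * w2
    x2 = np.sign(r2) * np.maximum(np.abs(r2) - c * lam, 0.0)
    y2 = (r2 - x2) / c                  # Prox_{c T_2}(r2)
    # coordination step (projection, lambda_k = 1)
    u1 = x1 - x2                        # u_i^k = x_i^k - G_i x_n^k
    v = y1 + y2                         # v^k = sum_i G_i^T y_i^k + y_n^k
    phi = (z - x1) @ (y1 - w1) + (z - x2) @ (y2 - w2)
    denom = v @ v + u1 @ u1
    theta = max(phi, 0.0) / denom if denom > 0 else 0.0
    z = z - theta * v
    w1 = w1 - theta * u1

print(0.5 * np.linalg.norm(Q @ z - b) ** 2 + lam * np.abs(z).sum())
```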
Many Variations Possible in "Process Operators"

1. Inexact processing: the prox operations may be performed approximately, using a relative error criterion
   - E & Svaiter 2009
2. Block iterations: you do not have to process every operator at every iteration; you may process some subset and let $(x_i^k, y_i^k) = (x_i^{k-1}, y_i^{k-1})$ for the rest, so long as you process each operator at least once every $M$ iterations
   - Combettes & E 2018, E 2017
3. Asynchrony: you may process operators using (boundedly) old information $(z^{d(i,k)}, w_i^{d(i,k)})$, where $k \ge d(i,k) \ge k - K$
   - Combettes & E 2018, E 2017
4. Non-prox steps: for Lipschitz continuous gradients, procedures using one or two gradient steps may be substituted for the prox operations (see the sketch below)
   - Johnstone and E 2018, 2019; also see Tran-Dinh and Vũ 2015

+ "mix and match"
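A hedged sketch of item 4, in the spirit of the forward-step variants (simplified here to a fixed stepsize $\rho < 1/L$ rather than the backtracking line searches used in the cited papers):

```python
import numpy as np

def forward_pair(grad, z, w, rho):
    """Two forward (gradient) steps replacing a prox computation
    for a smooth block f_i with L-Lipschitz gradient.

    With rho < 1/L one can check that
        <z - x, y - w> >= rho*(1 - rho*L)*||grad(z) - w||^2 >= 0,
    so the resulting pair (x, y) still yields a valid cut.
    """
    x = z - rho * (grad(z) - w)   # forward step from the current iterate
    y = grad(x)                   # second evaluation gives y = grad f_i(x)
    return x, y
```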
Another Variation: Primal-Dual Scaling

- The method performs projections in primal-dual space
- Consider scaling the problem: $f_i \to \alpha f_i$, $\alpha > 0$
- If $\alpha$ is large, dual convergence will be emphasized over primal
- If $\alpha$ is small, primal convergence will be emphasized over dual
- To compensate, use the inner product on $\mathbb{R}^m \times \mathbb{R}^{p_1} \times \cdots \times \mathbb{R}^{p_n}$ given by

$$\left\langle (z, w_1, \ldots, w_n),\, (z', w_1', \ldots, w_n') \right\rangle_\gamma = \gamma \langle z, z' \rangle + \sum_{i=1}^n \langle w_i, w_i' \rangle$$

and the corresponding norm, for any scalar $\gamma > 0$
- In the ADMM and related methods the penalty parameter can compensate for problem scaling, but projective splitting is different
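For concreteness, here is the effect of the $\gamma$-scaled inner product on the coordination step (my own derivation from the standard halfspace-projection formula; the talk states only the inner product). Under $\langle\cdot,\cdot\rangle_\gamma$ the gradient of $\varphi_k$ acquires a $\gamma^{-1}$ factor in its primal component, so the step becomes

$$\theta_k = \frac{\max\{\varphi_k(z^k, \mathbf{w}^k),\, 0\}}{\gamma^{-1}\|v^k\|^2 + \sum_i \|u_i^k\|^2}, \qquad z^{k+1} = z^k - \frac{\lambda_k \theta_k}{\gamma}\, v^k, \qquad w_i^{k+1} = w_i^k - \lambda_k \theta_k\, u_i^k.$$

Large $\gamma$ thus damps the primal movement relative to the dual, which is how the scaling compensation operates.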
An Implementation Idea: Greedy Block Selection

- Our separating hyperplane is

$$\varphi_k(z, w_1, \ldots, w_{n-1}) = \sum_{i=1}^n \langle G_i z - x_i^k,\, y_i^k - w_i \rangle = 0$$

- If we project without any overrelaxation, we will have

$$\varphi_k(z^{k+1}, w_1^{k+1}, \ldots, w_{n-1}^{k+1}) = \sum_{i=1}^n \langle G_i z^{k+1} - x_i^k,\, y_i^k - w_i^{k+1} \rangle = 0$$

[Figure: $p^k$ projected onto the hyperplane $H = \{\, p \mid \varphi_k(p) = 0 \,\}$, landing at $p^{k+1}$ on the hyperplane.]
Greedy Block Selection (2)

$$\sum_{i=1}^n \langle G_i z^{k+1} - x_i^k,\, y_i^k - w_i^{k+1} \rangle = 0$$

- If all the $\varphi_{ik} = \langle G_i z^{k+1} - x_i^k,\, y_i^k - w_i^{k+1} \rangle$ are zero, we are in $\mathcal{S}$
- Otherwise, some are positive and some are negative
- Pick a block with $\varphi_{ik} < 0$
- Processing block $i$ results in $\varphi_{ik} \ge 0$
- This will make the entire sum positive again
- ⇒ We can cut off the current point by processing just one block
Greedy Block Selection (3)

- A simple "greedy" heuristic: prioritize the block $i$ with the most negative $\varphi_{ik}$

This ignores several things:
- How large will $\varphi_{ik}$ become after we process the block?
- The projection formula onto the hyperplane is

$$p^{k+1} = p^k - \frac{\varphi_k(p^k)}{\| \nabla \varphi_k \|^2} \nabla \varphi_k,$$

so the length of the step is $\varphi_k(p^k) / \| \nabla \varphi_k \|$. The heuristic makes some attempt to obtain a large numerator, but ignores the denominator.
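A minimal sketch of the heuristic (illustrative names): compute each block's term $\varphi_{ik}$ at the current iterate and pick the most negative one.

```python
import numpy as np

def pick_greedy_block(z, w, xs, ys, Gs):
    """Return the index of the block with the most negative
    phi_ik = <G_i z - x_i, y_i - w_i>, i.e. the most violated term."""
    phi_blocks = np.array([(G @ z - x) @ (y - wi)
                           for G, x, y, wi in zip(Gs, xs, ys, w)])
    return int(np.argmin(phi_blocks))
```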
Computational Experiments: LASSO

LASSO problems:

$$\min_{x \in \mathbb{R}^d} \left\{ \tfrac12 \| Qx - b \|^2 + \lambda \| x \|_1 \right\}$$

Partition $Q$ into $r$ blocks of rows, set $n = r + 1$:

$$\min_{x \in \mathbb{R}^d}\; \sum_{i=1}^r \tfrac12 \| Q_i x - b_i \|^2 + \lambda \| x \|_1$$

So we can set

$$T_i(x) = Q_i^\top (Q_i x - b_i) \;\; \forall\, i \in 1..n-1, \qquad T_n = \partial\big( \lambda \| \cdot \|_1 \big)$$

- At each iteration, process blocks $\{i, n\}$, where $i \in 1..n-1$ is selected randomly or greedily
- Measure the number of "Q-equivalent" matrix multiplies
Augmented Cancer RNA Data: Dense, 3,204 × 20,531; 526MB of data
- "PS For": forward steps for $i = 1, \ldots, r$
- "PS Back": proximal steps
- "(10,G)": $r = 10$, greedy selection

[Performance plot omitted.]

Hand Gesture Data: Dense, 1,500 × 3,000; 36MB of data

[Performance plot omitted.]

drivFace Data: Dense, 606 × 6,400; 31MB of data

[Performance plot omitted.]

Randomly Generated Data: Dense, 1,000 × 100,000; 800MB of data

[Performance plot omitted.]
A (not Very Realistic) Portfolio Selection Application

$$\min\; x^\top Q x \quad \text{s.t.}\;\; r^\top x \ge R,\;\; \sum_{i=1}^m x_i = 1,\;\; x \ge 0$$

- $Q$ is a 10,000 × 10,000 dense positive semidefinite matrix
- Model as minimizing the sum of three functions $f_1 + f_2 + f_3$:

$$f_1(x) = x^\top Q x, \qquad f_2(x) = \begin{cases} 0, & \sum_{i=1}^m x_i = 1,\; x \ge 0 \\ +\infty, & \text{otherwise} \end{cases} \qquad f_3(x) = \begin{cases} 0, & r^\top x \ge R \\ +\infty, & r^\top x < R \end{cases}$$

- $f_1$ has a Lipschitz/cocoercive gradient
- $f_2, f_3$ have simple, linear-time prox operators
- The size and density of $Q$ make this problem hard for standard QP solvers
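As a hedged sketch of those prox operators (projection onto the halfspace is exact and one-line; for the simplex I use the standard sort-and-threshold projection, which is $O(m \log m)$ rather than strictly linear but conveys the idea):

```python
import numpy as np

def proj_halfspace(x, r, R):
    """Project x onto {x : r^T x >= R} (prox of f_3's indicator)."""
    gap = R - r @ x
    return x + (max(gap, 0.0) / (r @ r)) * r

def proj_simplex(x):
    """Project x onto {x : sum(x) = 1, x >= 0} (prox of f_2's
    indicator), via the standard sort-and-threshold method."""
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, len(x) + 1) > 0)[0][-1]
    tau = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(x - tau, 0.0)
```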
Run Time Results (Mixed)

- $R$ = (Rfac) × (average value of the $r_i$)
- Average run time over 10 problem instances (NumPy implementation), for Rfac ∈ {0.5, 0.8, 1, 1.5}

Methods compared:
- Projective, one forward step for $f_1$
- Pedregosa & Gidel three-operator splitting
- Chambolle-Pock primal-dual (product space)
- Primal-dual Tseng (Combettes & Pesquet)
- Malitsky & Tam forward-reflected-backward (primal-dual)

[Bar chart of run times omitted.]
Sparse Group-Regularized Logistic Regression, $\lambda_1 = \lambda_2 = 0.05$

$$\min_{x \in \mathbb{R}^d,\, x_0}\; \sum_{i=1}^n \log\!\left( 1 + \exp\!\left( -y_i (x^\top a_i + x_0) \right) \right) + \lambda_1 \| x \|_1 + \lambda_2 \sum_{G \in \mathcal{G}} \| x_G \|_2,$$

where $\mathcal{G}$ is a disjoint collection of subsets of $\{1, \ldots, d\}$

Breast cancer gene expression dataset (7,705 genes × 60 patients)

[Performance plot omitted.]
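The regularizer's prox is block-separable over the disjoint groups; a sketch (my illustration, using the known result that this prox is elementwise soft thresholding followed by groupwise shrinkage):

```python
import numpy as np

def prox_sparse_group(v, c, lam1, lam2, groups):
    """Prox of c*(lam1*||x||_1 + lam2*sum_G ||x_G||_2) for disjoint
    groups: soft threshold, then shrink each group toward zero."""
    x = np.sign(v) * np.maximum(np.abs(v) - c * lam1, 0.0)
    for G in groups:                       # G: array of indices
        nrm = np.linalg.norm(x[G])
        x[G] *= max(1.0 - c * lam2 / nrm, 0.0) if nrm > 0 else 0.0
    return x
```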
Sparse Group-Regularized Logistic Regression, $\lambda_1 = \lambda_2 = 0.5$

[Performance plot omitted.]

Sparse Group-Regularized Logistic Regression, $\lambda_1 = \lambda_2 = 0.85$

[Performance plot omitted.]
Another Application: Stochastic Programming

- Multi-stage linear programming problem over an unfolding tree of scenarios
- Application of projective splitting in a working paper by E, Watson and Woodruff
- None of the $G_i$ are the identity
- Subproblems are quadratic programming problems for a single (multi-stage) scenario
- Results in a method resembling Rockafellar and Wets' progressive hedging (PH) method (blocks = scenarios)
  - PH is synchronous and processes every scenario at every iteration
  - Our method is asynchronous and can process as few as one scenario per iteration
- Implemented within the Python-based PySP modeling/solution environment (Watson, Woodruff & Hart 2012)
Preliminary Results on a 32-Core Workstation (Woodruff)

$N = 10{,}000$ scenarios in $n = 20$ bundles, times in seconds. Blue points are PH on the same scenarios (and bundles).

[Scatter plot of run times omitted.]

- CPLEX cannot solve the extensive form of this problem in 3 days with 96 cores and 1TB RAM
Something to Keep in Mind

The projection operations, e.g.

$$\theta_k = \frac{\max\left\{ \sum_{i=1}^n \langle G_i z^k - x_i^k,\, y_i^k - w_i^k \rangle,\; 0 \right\}}{\| v^k \|^2 + \sum_{i=1}^{n-1} \| u_i^k \|^2}, \qquad z^{k+1} = z^k - \lambda_k \theta_k v^k, \qquad w_i^{k+1} = w_i^k - \lambda_k \theta_k u_i^k, \;\; i = 1, \ldots, n-1,$$

- Require linear time (less in a parallel implementation)
- But do touch every primal and dual variable
- If processing an operator requires only a simple linear-time operation, one might as well do it every iteration
- Higher-complexity operations (matrix multiplication, quadratic programming) are different
Conclusions

- Projective splitting is a powerful framework for decomposing convex optimization problems
- Numerous variations are possible
- Does not care how many operators there are
- Accomplishes "full splitting" when linear coupling matrices $G_i$ are present
- Has applications in
  - Data analysis / statistics
  - Multistage stochastic programming
  - Vision and imaging?