Computational Optimization: Mathematical Programming Fundamentals (revised)
If you don’t know where you are going, you probably won’t get there.
- from some book I read in eighth grade
Mathematical Programming Theory tells us:
- How to formulate a model.
- Strategies for solving the model.
- How to know when we have found an optimal solution.
- How hard it is to solve the model.
Let's start with the basics…
If you do get there, you won’t know it.
- Dr. Bennett’s amendment
Line Segment
Let x, y ∈ R^n. The points on the line segment joining x and y are { z | z = λx + (1−λ)y, 0 ≤ λ ≤ 1 }.
[Figure: a line segment with endpoints x and y]
Convex Sets
A set S is convex if the line segment joining any two points in the set is also in the set, i.e., for any x, y ∈ S, λx + (1−λ)y ∈ S for all 0 ≤ λ ≤ 1.
[Figure: examples of convex and non-convex sets]
Favorite Convex Sets
- Circle (ball) with center c and radius r: { x | ‖x − c‖ ≤ r }
- Linear equalities (planes): { x | Ax = b }, where A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n
- Linear inequalities (polyhedrons): { x | Ax ≤ b }, with A, b, x as above
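As a quick illustration (not from the slides, and the helper names are made up), a minimal numpy sketch of membership tests for these favorite convex sets:

```python
import numpy as np

def in_ball(x, c, r):
    # Membership in the ball { x : ||x - c|| <= r }
    return np.linalg.norm(x - c) <= r

def in_polyhedron(x, A, b, tol=1e-9):
    # Membership in the polyhedron { x : Ax <= b }
    return bool(np.all(A @ x <= b + tol))

# The unit square as a polyhedron: 0 <= x1 <= 1, 0 <= x2 <= 1
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
print(in_polyhedron(np.array([0.5, 0.5]), A, b))  # True
print(in_ball(np.array([2.0, 0.0]), np.zeros(2), 1.0))  # False
```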
Convex Sets
Is the intersection of two convex sets convex? Yes. Is the union of two convex sets convex? No: for example, [0,1] ∪ [2,3] contains 1 and 2 but not their midpoint 1.5.
Convex Functions
A function f is (strictly) convex on a convex set S if and only if for any x, y ∈ S, f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y) for all 0 ≤ λ ≤ 1 (with strict inequality for strict convexity, when x ≠ y and 0 < λ < 1).
[Figure: for a convex f, the chord from (x, f(x)) to (y, f(y)) lies above the graph, so f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y)]
Concave Functions
A function f is (strictly) concave on a convex set S if and only if −f is (strictly) convex on S.
[Figure: a concave function f and its convex reflection −f]
(Strictly) Convex, Concave, or none of the above?
[Figure: five example graphs; answers: none of the above, concave, convex, concave, strictly convex]
Favorite Convex Functions
- Linear functions: f(x) = w′x = Σ_{i=1}^n w_i x_i, where x ∈ R^n. For example, f(x1, x2) = x1 + 2x2.
- Certain quadratic functions: f(x) = x′Qx + w′x + c, depending on the choice of Q (the Hessian matrix). For example, f(x1, x2) = x1² + 2x2².
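Whether a quadratic is convex can be checked from the eigenvalues of the symmetric part of Q. A small numpy sketch (illustrative, not part of the slides):

```python
import numpy as np

def quadratic_convexity(Q, tol=1e-10):
    # f(x) = x'Qx + w'x + c is convex iff the symmetric part of Q is
    # positive semidefinite, and strictly convex iff positive definite.
    eigvals = np.linalg.eigvalsh((Q + Q.T) / 2)
    if np.all(eigvals > tol):
        return "strictly convex"
    if np.all(eigvals >= -tol):
        return "convex"
    return "not convex"

print(quadratic_convexity(np.array([[1.0, 0.0], [0.0, 2.0]])))   # x1^2 + 2*x2^2: strictly convex
print(quadratic_convexity(np.array([[1.0, 0.0], [0.0, -1.0]])))  # x1^2 - x2^2: not convex
```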
Convexity of the objective function affects the optimization algorithm. Convexity of the constraints affects the optimization algorithm.
min f(x) subject to x∈S
[Figure: direction of steepest descent on a convex S versus a nonconvex S]
Convex Program
min f(x) subject to x ∈ S, where f and S are convex.
- Convexity makes optimization nice.
- Many practical problems are convex problems.
- Convex programs are used as subproblems for nonconvex programs.
Theorem: Global Solution of a Convex Program
If x* is a local minimizer of a convex programming problem, then x* is also a global minimizer. Furthermore, if the objective is strictly convex, then x* is the unique global minimizer. Proof: by contradiction.
[Figure: a local minimizer x* and a point y with f(y) < f(x*)]
Proof by contradiction
Suppose x* is a local but not a global minimizer, i.e., there exists y such that f(y) < f(x*). Then for all 0 < ε < 1,
f(εx* + (1−ε)y) ≤ εf(x*) + (1−ε)f(y) < εf(x*) + (1−ε)f(x*) = f(x*).
Taking ε close to 1 produces points arbitrarily close to x* with strictly smaller objective value, so x* is not a local minimizer. Contradiction. You try: uniqueness in the strict case.
Problems with nonconvex objective

Min f(x) subject to x ∈ [a, b]

[Figure: f strictly convex on [a, b]; the problem has a unique global minimum x*]
[Figure: f not convex on [a, b]; the problem has two local minima, x* and x′]
Problems with nonconvex set

Min f(x) subject to x ∈ [a, b] ∪ [c, d]

[Figure: feasible set of two disjoint intervals; the problem has a global minimum x* and a second local minimum x′]
Multivariate Calculus
For x ∈ R^n, f(x) = f(x1, x2, x3, x4, …, xn).

The gradient of f:
∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, …, ∂f(x)/∂xn )′

The Hessian of f:
∇²f(x) =
⎡ ∂²f(x)/∂x1²    ∂²f(x)/∂x1∂x2  …  ∂²f(x)/∂x1∂xn ⎤
⎢ ∂²f(x)/∂x2∂x1  ∂²f(x)/∂x2²    …  ∂²f(x)/∂x2∂xn ⎥
⎢        ⋮               ⋮        ⋱        ⋮       ⎥
⎣ ∂²f(x)/∂xn∂x1  ∂²f(x)/∂xn∂x2  …  ∂²f(x)/∂xn²    ⎦
For example

[Worked example: a function f(x1, x2) containing an e^{x1} term; the slide computes ∇f(x) and ∇²f(x) and evaluates them at x = [0, 1]′.]
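In the same spirit, gradients and Hessians can be computed symbolically; a minimal sympy sketch, using a hypothetical function (chosen only to mirror the slide's mix of polynomial and exponential terms, not the slide's original f):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 3*x1 + 4*x1*x2 + x2**2 + sp.exp(x1)  # hypothetical example function

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])  # gradient vector
hess = sp.hessian(f, (x1, x2))                       # matrix of second partials

point = {x1: 0, x2: 1}
print(grad.subs(point))  # gradient at x = [0, 1]'
print(hess.subs(point))  # Hessian at x = [0, 1]'
```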
Quadratic Functions
Form:
f(x) = ½ x′Qx − b′x = ½ Σ_{i=1}^n Σ_{j=1}^n Q_ij x_i x_j − Σ_{j=1}^n b_j x_j,
where x ∈ R^n, Q ∈ R^{n×n}, b ∈ R^n.

Gradient and Hessian:
∇f(x) = Qx − b,  ∇²f(x) = Q

Componentwise, for the k-th partial derivative:
∂f(x)/∂x_k = Q_kk x_k + ½ Σ_{i≠k} Q_ik x_i + ½ Σ_{j≠k} Q_kj x_j − b_k
           = Σ_{j=1}^n Q_kj x_j − b_k, assuming Q symmetric.
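A quick numerical sanity check of ∇f(x) = Qx − b (an illustrative sketch, not from the slides), comparing the closed form against a central finite-difference approximation:

```python
import numpy as np

def f(x, Q, b):
    # Quadratic form f(x) = 0.5 x'Qx - b'x
    return 0.5 * x @ Q @ x - b @ x

def numeric_grad(x, Q, b, h=1e-6):
    # Central-difference approximation of the gradient
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e, Q, b) - f(x - e, Q, b)) / (2 * h)
    return g

Q = np.array([[2.0, 1.0], [1.0, 4.0]])  # symmetric Q
b = np.array([1.0, -1.0])
x = np.array([0.3, -0.7])
print(numeric_grad(x, Q, b))  # ~ [-1.1, -1.5]
print(Q @ x - b)              # closed form: Qx - b
```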
Taylor Series Expansion about x* - 1D Case
Let x = x* + p. Then
f(x) = f(x* + p) = f(x*) + p f′(x*) + (1/2) p² f″(x*) + (1/3!) p³ f‴(x*) + … + (1/n!) pⁿ f⁽ⁿ⁾(x*) + …
Equivalently,
f(x) = f(x*) + (x − x*) f′(x*) + (1/2)(x − x*)² f″(x*) + (1/3!)(x − x*)³ f‴(x*) + … + (1/n!)(x − x*)ⁿ f⁽ⁿ⁾(x*) + …
Taylor Series Example
Let f(x) = exp(−x); compute the Taylor series expansion about x* = 0.
f(x) = f(x*) + (x − x*) f′(x*) + (1/2)(x − x*)² f″(x*) + (1/3!)(x − x*)³ f‴(x*) + … + (1/n!)(x − x*)ⁿ f⁽ⁿ⁾(x*) + …
Since every derivative of e^{−x} is ±e^{−x} and x* = 0:
e^{−x} = 1 − x + x²/2 − x³/3! + … + (−1)ⁿ xⁿ/n! + …
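A short numerical check of this expansion (illustrative sketch): truncate the series after n terms and compare with exp(−x):

```python
import math

def taylor_exp_neg(x, n):
    # Partial sum: sum_{k=0}^{n} (-1)^k x^k / k!
    return sum((-1)**k * x**k / math.factorial(k) for k in range(n + 1))

x = 0.5
for n in (1, 2, 4, 8):
    approx = taylor_exp_neg(x, n)
    print(n, approx, abs(approx - math.exp(-x)))  # error shrinks quickly as n grows
```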
First Order Taylor Series Approximation
Let x = x* + p. This says that a linear approximation of a function works well locally:
f(x) = f(x* + p) = f(x*) + p′∇f(x*) + ‖p‖ α(x*, p), where lim_{p→0} α(x*, p) = 0.
Dropping the remainder term gives
f(x) ≈ f(x* + p) = f(x*) + p′∇f(x*), i.e., f(x) ≈ f(x*) + (x − x*)′∇f(x*).
[Figure: f(x) and its tangent (first-order) approximation at x*]
Second Order Taylor Series Approximation

Let x = x* + p. This says that a quadratic approximation of a function works even better locally:
f(x) = f(x* + p) = f(x*) + p′∇f(x*) + ½ p′∇²f(x*) p + ‖p‖² α(x*, p), where lim_{p→0} α(x*, p) = 0.
Dropping the remainder term gives
f(x) ≈ f(x*) + (x − x*)′∇f(x*) + ½ (x − x*)′∇²f(x*)(x − x*).
[Figure: f(x) and its quadratic (second-order) approximation at x*]
Theorem 2.1 – Taylor's Theorem, version 2

Suppose f is continuously differentiable. Then
f(x + p) = f(x) + ∇f(x + tp)′p for some t ∈ [0, 1].
If f is twice continuously differentiable, then
f(x + p) = f(x) + ∇f(x)′p + ½ p′∇²f(x + tp) p for some t ∈ [0, 1].
The first form is also called the Mean Value Theorem.
Taylor Series Approximation Exercise
Consider the function below and x* = [−2, 3]′.
- Compute the gradient and Hessian.
- What is the first-order TSA about x*?
- What is the second-order TSA about x*?
- Evaluate both TSAs at y = [−1.9, 3.2]′ and compare with f(y).
[Equation: f(x1, x2), a low-degree polynomial in x1 and x2 with coefficients 5, 7, and 2]
Exercise

Solution template:
function: f(x1, x2) as above
gradient: ∇f(x), ∇f(x*) = [ , ]′
Hessian: ∇²f(x), ∇²f(x*)
first-order TSA: g(x) = f(x*) + (x − x*)′∇f(x*)
second-order TSA: h(x) = f(x*) + (x − x*)′∇f(x*) + ½ (x − x*)′∇²f(x*)(x − x*)
compare: |f(y) − g(y)| = , |f(y) − h(y)| =

Exercise

Filling in the computations for this f and evaluating at y = [−1.9, 3.2]′ gives
|f(y) − g(y)| = |64.811 − 64.9| = 0.089
|f(y) − h(y)| = 0.039,
so the second-order TSA is the more accurate local approximation.
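The same procedure can be scripted; a sympy sketch with a made-up cubic standing in for the slide's f (the original coefficients are not fully legible here, so this function is hypothetical):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 5*x1**3 + 7*x1**2*x2 + 2*x1*x2**2  # hypothetical cubic, not the slide's exact f

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])
hess = sp.hessian(f, (x1, x2))

xs = sp.Matrix([-2, 3])                                    # expansion point x*
y = sp.Matrix([sp.Rational(-19, 10), sp.Rational(16, 5)])  # y = [-1.9, 3.2]'
d = y - xs

at_xs = {x1: xs[0], x2: xs[1]}
g_y = f.subs(at_xs) + (d.T * grad.subs(at_xs))[0]                # first-order TSA at y
h_y = g_y + sp.Rational(1, 2) * (d.T * hess.subs(at_xs) * d)[0]  # second-order TSA at y
f_y = f.subs({x1: y[0], x2: y[1]})

print(sp.N(abs(f_y - g_y)), sp.N(abs(f_y - h_y)))  # second-order error is smaller
```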
General Optimization algorithm
Specify some initial guess x0.
For k = 0, 1, …
- If xk is optimal, then stop.
- Determine a descent direction pk.
- Determine an improved estimate of the solution: x_{k+1} = xk + λk pk.
The last step is a one-dimensional search problem called a line search.
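A minimal sketch of this loop (illustrative only: steepest descent with a backtracking line search and a gradient-norm stopping test, details the slides have not specified yet):

```python
import numpy as np

def gradient_descent(f, grad, x0, tol=1e-6, max_iter=500):
    # General algorithm: check optimality, pick a descent direction,
    # then line-search along it.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # "if xk is optimal then stop"
            break
        p = -g                        # descent direction pk (negative gradient)
        lam = 1.0                     # backtracking line search for lambda_k
        while f(x + lam * p) > f(x) - 1e-4 * lam * (g @ g):
            lam *= 0.5
        x = x + lam * p               # x_{k+1} = x_k + lambda_k * p_k
    return x

# Minimize f(x) = (x1 - 1)^2 + 2*x2^2; the minimizer is [1, 0]
f = lambda x: (x[0] - 1)**2 + 2 * x[1]**2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * x[1]])
print(gradient_descent(f, grad, [5.0, -3.0]))
```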
Descent Directions
If the directional derivative along d is negative, i.e. ∇f(x)′d < 0, then a line search along d will lead to a decrease in the function.
[Figure: example with gradient ∇f(x) = [8, 2]′, direction d = [0, −1]′ (so ∇f(x)′d < 0), and the steepest-descent direction −∇f(x)]
Descent directions create decrease
Proof
Let ∇f(x)′d < 0. Then there exists λ̄ > 0 such that f(x + λd) < f(x) for all 0 < λ ≤ λ̄.
By first-order Taylor expansion, f(x + λd) = f(x) + λ∇f(x)′d + ‖λd‖ α(x, λd). Hence
( f(x + λd) − f(x) ) / λ = ∇f(x)′d + ‖d‖ α(x, λd),
so f(x + λd) − f(x) < 0 for sufficiently small λ, since ∇f(x)′d < 0 and α(x, λd) → 0.
Negative Gradient
An important fact to know is that the negative gradient always points downhill.
Proof: Let d = −∇f(x). Then there exists λ̄ > 0 such that f(x + λd) < f(x) for all 0 < λ ≤ λ̄. The argument is the same as for descent directions: f(x + λd) = f(x) + λ∇f(x)′d + ‖λd‖ α(x, λd), so f(x + λd) − f(x) < 0 for sufficiently small λ, since ∇f(x)′d = −‖∇f(x)‖² < 0 and α(x, λd) → 0.
Notes on negative gradient
If the gradient is nonzero, then the negative gradient defines a descent direction:
d′∇f(x) = −∇f(x)′∇f(x) = −‖∇f(x)‖² < 0 if ∇f(x) ≠ 0, by substitution of d = −∇f(x).
Directional Derivative
The directional derivative always exists when the function is convex:
f′(x, d) = lim_{λ→0} ( f(x + λd) − f(x) ) / λ
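As a small numerical illustration (a sketch with a made-up differentiable f): when f is differentiable, the directional derivative equals ∇f(x)′d, which can be checked against the limit quotient:

```python
import numpy as np

f = lambda x: x[0]**2 + 3 * x[0] * x[1]              # made-up smooth function
grad = lambda x: np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x = np.array([1.0, 2.0])
d = np.array([0.0, -1.0])

lam = 1e-6
quotient = (f(x + lam * d) - f(x)) / lam             # difference quotient at small lambda
print(quotient, grad(x) @ d)                         # both approximately -3
```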