Computational Optimization: Mathematical Programming Fundamentals (revised)
If you don’t know where you are going, you probably won’t get there.
- from some book I read in eighth grade
Mathematical Programming Theory tells us:
- How to formulate a model.
- Strategies for solving the model.
- How to know when we have found an optimal solution.
- How hard it is to solve the model.
Let's start with the basics…
If you do get there, you won’t know it.
- Dr. Bennett’s amendment
Line Segment
Let x, y ∈ R^n. The points on the line segment joining x and y are { z | z = λx + (1−λ)y, 0 ≤ λ ≤ 1 }.
[Figure: a line segment with endpoints x and y]
Convex Sets
A set S is convex if the line segment joining any two points in the set is also in the set, i.e., for any x, y ∈ S, λx + (1−λ)y ∈ S for all 0 ≤ λ ≤ 1.
[Figure: examples of convex and non-convex sets]
Favorite Convex Sets
- Circle (ball) with center c and radius r: { x | ‖x − c‖ ≤ r }
- Linear equalities (planes): { x | Ax = b }, where A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n
- Linear inequalities (polyhedrons): { x | Ax ≤ b }, with A, b, x as above
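As a quick illustration (not from the slides, and the helper names are made up), a minimal numpy sketch of membership tests for these favorite convex sets:

```python
import numpy as np

def in_ball(x, c, r):
    # Membership in the ball { x : ||x - c|| <= r }
    return np.linalg.norm(x - c) <= r

def in_polyhedron(x, A, b, tol=1e-9):
    # Membership in the polyhedron { x : Ax <= b }
    return bool(np.all(A @ x <= b + tol))

# The unit square as a polyhedron: 0 <= x1 <= 1, 0 <= x2 <= 1
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([1.0, 0.0, 1.0, 0.0])
print(in_polyhedron(np.array([0.5, 0.5]), A, b))  # True
print(in_ball(np.array([2.0, 0.0]), np.zeros(2), 1.0))  # False
```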
Convex Sets
Is the intersection of two convex sets convex? Yes. Is the union of two convex sets convex? No: for example, [0,1] ∪ [2,3] contains 1 and 2 but not their midpoint 1.5.
Convex Functions
A function f is (strictly) convex on a convex set S if and only if for any x, y ∈ S, f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y) for all 0 ≤ λ ≤ 1 (with strict inequality for strict convexity, when x ≠ y and 0 < λ < 1).
[Figure: for a convex f, the chord from (x, f(x)) to (y, f(y)) lies above the graph, so f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y)]
Concave Functions
A function f is (strictly) concave on a convex set S if and only if −f is (strictly) convex on S.
[Figure: a concave function f and its convex reflection −f]
(Strictly) Convex, Concave, or none of the above?
[Figure: five example graphs; answers: none of the above, concave, convex, concave, strictly convex]
Favorite Convex Functions
- Linear functions: f(x) = w′x = Σ_{i=1}^n w_i x_i, where x ∈ R^n. For example, f(x1, x2) = x1 + 2x2.
- Certain quadratic functions: f(x) = x′Qx + w′x + c, depending on the choice of Q (the Hessian matrix). For example, f(x1, x2) = x1² + 2x2².
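Whether a quadratic is convex can be checked from the eigenvalues of the symmetric part of Q. A small numpy sketch (illustrative, not part of the slides):

```python
import numpy as np

def quadratic_convexity(Q, tol=1e-10):
    # f(x) = x'Qx + w'x + c is convex iff the symmetric part of Q is
    # positive semidefinite, and strictly convex iff positive definite.
    eigvals = np.linalg.eigvalsh((Q + Q.T) / 2)
    if np.all(eigvals > tol):
        return "strictly convex"
    if np.all(eigvals >= -tol):
        return "convex"
    return "not convex"

print(quadratic_convexity(np.array([[1.0, 0.0], [0.0, 2.0]])))   # x1^2 + 2*x2^2: strictly convex
print(quadratic_convexity(np.array([[1.0, 0.0], [0.0, -1.0]])))  # x1^2 - x2^2: not convex
```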
Convexity of the objective function affects the optimization algorithm. Convexity of the constraints affects the optimization algorithm.
min f(x) subject to x∈S
[Figure: direction of steepest descent on a convex S versus a nonconvex S]
Convex Program
min f(x) subject to x ∈ S, where f and S are convex.
- Convexity makes optimization nice.
- Many practical problems are convex problems.
- Convex programs are used as subproblems for nonconvex programs.
Theorem: Global Solution of a Convex Program
If x* is a local minimizer of a convex programming problem, then x* is also a global minimizer. Furthermore, if the objective is strictly convex, then x* is the unique global minimizer. Proof: by contradiction.
[Figure: a local minimizer x* and a point y with f(y) < f(x*)]
Proof by contradiction
Suppose x* is a local but not a global minimizer, i.e., there exists y such that f(y) < f(x*). Then for all 0 < ε < 1,
f(εx* + (1−ε)y) ≤ εf(x*) + (1−ε)f(y) < εf(x*) + (1−ε)f(x*) = f(x*).
Taking ε close to 1 produces points arbitrarily close to x* with strictly smaller objective value, so x* is not a local minimizer. Contradiction. You try: uniqueness in the strict case.
Problems with nonconvex objective

Min f(x) subject to x ∈ [a, b]

[Figure: f strictly convex on [a, b]; the problem has a unique global minimum x*]
[Figure: f not convex on [a, b]; the problem has two local minima, x* and x′]
Problems with nonconvex set

Min f(x) subject to x ∈ [a, b] ∪ [c, d]

[Figure: feasible set of two disjoint intervals; the problem has a global minimum x* and a second local minimum x′]
Multivariate Calculus
For x ∈ R^n, f(x) = f(x1, x2, x3, x4, …, xn).

The gradient of f:
∇f(x) = ( ∂f(x)/∂x1, ∂f(x)/∂x2, …, ∂f(x)/∂xn )′

The Hessian of f:
∇²f(x) =
⎡ ∂²f(x)/∂x1²    ∂²f(x)/∂x1∂x2  …  ∂²f(x)/∂x1∂xn ⎤
⎢ ∂²f(x)/∂x2∂x1  ∂²f(x)/∂x2²    …  ∂²f(x)/∂x2∂xn ⎥
⎢        ⋮               ⋮        ⋱        ⋮       ⎥
⎣ ∂²f(x)/∂xn∂x1  ∂²f(x)/∂xn∂x2  …  ∂²f(x)/∂xn²    ⎦
For example

[Worked example: a function f(x1, x2) containing an e^{x1} term; the slide computes ∇f(x) and ∇²f(x) and evaluates them at x = [0, 1]′.]
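In the same spirit, gradients and Hessians can be computed symbolically; a minimal sympy sketch, using a hypothetical function (chosen only to mirror the slide's mix of polynomial and exponential terms, not the slide's original f):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 3*x1 + 4*x1*x2 + x2**2 + sp.exp(x1)  # hypothetical example function

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])  # gradient vector
hess = sp.hessian(f, (x1, x2))                       # matrix of second partials

point = {x1: 0, x2: 1}
print(grad.subs(point))  # gradient at x = [0, 1]'
print(hess.subs(point))  # Hessian at x = [0, 1]'
```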
Quadratic Functions
Form:
f(x) = ½ x′Qx − b′x = ½ Σ_{i=1}^n Σ_{j=1}^n Q_ij x_i x_j − Σ_{j=1}^n b_j x_j,
where x ∈ R^n, Q ∈ R^{n×n}, b ∈ R^n.

Gradient and Hessian:
∇f(x) = Qx − b,  ∇²f(x) = Q

Componentwise, for the k-th partial derivative:
∂f(x)/∂x_k = Q_kk x_k + ½ Σ_{i≠k} Q_ik x_i + ½ Σ_{j≠k} Q_kj x_j − b_k
           = Σ_{j=1}^n Q_kj x_j − b_k, assuming Q symmetric.
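A quick numerical sanity check of ∇f(x) = Qx − b (an illustrative sketch, not from the slides), comparing the closed form against a central finite-difference approximation:

```python
import numpy as np

def f(x, Q, b):
    # Quadratic form f(x) = 0.5 x'Qx - b'x
    return 0.5 * x @ Q @ x - b @ x

def numeric_grad(x, Q, b, h=1e-6):
    # Central-difference approximation of the gradient
    g = np.zeros_like(x)
    for k in range(len(x)):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e, Q, b) - f(x - e, Q, b)) / (2 * h)
    return g

Q = np.array([[2.0, 1.0], [1.0, 4.0]])  # symmetric Q
b = np.array([1.0, -1.0])
x = np.array([0.3, -0.7])
print(numeric_grad(x, Q, b))  # ~ [-1.1, -1.5]
print(Q @ x - b)              # closed form: Qx - b
```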
Taylor Series Expansion about x* - 1D Case
Let x = x* + p. Then
f(x) = f(x* + p) = f(x*) + p f′(x*) + (1/2) p² f″(x*) + (1/3!) p³ f‴(x*) + … + (1/n!) pⁿ f⁽ⁿ⁾(x*) + …
Equivalently,
f(x) = f(x*) + (x − x*) f′(x*) + (1/2)(x − x*)² f″(x*) + (1/3!)(x − x*)³ f‴(x*) + … + (1/n!)(x − x*)ⁿ f⁽ⁿ⁾(x*) + …
Taylor Series Example
Let f(x) = exp(−x); compute the Taylor series expansion about x* = 0.
f(x) = f(x*) + (x − x*) f′(x*) + (1/2)(x − x*)² f″(x*) + (1/3!)(x − x*)³ f‴(x*) + … + (1/n!)(x − x*)ⁿ f⁽ⁿ⁾(x*) + …
Since every derivative of e^{−x} is ±e^{−x} and x* = 0:
e^{−x} = 1 − x + x²/2 − x³/3! + … + (−1)ⁿ xⁿ/n! + …
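A short numerical check of this expansion (illustrative sketch): truncate the series after n terms and compare with exp(−x):

```python
import math

def taylor_exp_neg(x, n):
    # Partial sum: sum_{k=0}^{n} (-1)^k x^k / k!
    return sum((-1)**k * x**k / math.factorial(k) for k in range(n + 1))

x = 0.5
for n in (1, 2, 4, 8):
    approx = taylor_exp_neg(x, n)
    print(n, approx, abs(approx - math.exp(-x)))  # error shrinks quickly as n grows
```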
First Order Taylor Series Approximation
Let x = x* + p. This says that a linear approximation of a function works well locally:
f(x) = f(x* + p) = f(x*) + p′∇f(x*) + ‖p‖ α(x*, p), where lim_{p→0} α(x*, p) = 0.
Dropping the remainder term gives
f(x) ≈ f(x* + p) = f(x*) + p′∇f(x*), i.e., f(x) ≈ f(x*) + (x − x*)′∇f(x*).
[Figure: f(x) and its tangent (first-order) approximation at x*]
Second Order Taylor Series Approximation

Let x = x* + p. This says that a quadratic approximation of a function works even better locally:
f(x) = f(x* + p) = f(x*) + p′∇f(x*) + ½ p′∇²f(x*) p + ‖p‖² α(x*, p), where lim_{p→0} α(x*, p) = 0.
Dropping the remainder term gives
f(x) ≈ f(x*) + (x − x*)′∇f(x*) + ½ (x − x*)′∇²f(x*)(x − x*).
[Figure: f(x) and its quadratic (second-order) approximation at x*]
Theorem 2.1 – Taylor's Theorem, version 2

Suppose f is continuously differentiable. Then
f(x + p) = f(x) + ∇f(x + tp)′p for some t ∈ [0, 1].
If f is twice continuously differentiable, then
f(x + p) = f(x) + ∇f(x)′p + ½ p′∇²f(x + tp) p for some t ∈ [0, 1].
The first form is also called the Mean Value Theorem.
Taylor Series Approximation Exercise
Consider the function below and x* = [−2, 3]′.
- Compute the gradient and Hessian.
- What is the first-order TSA about x*?
- What is the second-order TSA about x*?
- Evaluate both TSAs at y = [−1.9, 3.2]′ and compare with f(y).
[Equation: f(x1, x2), a low-degree polynomial in x1 and x2 with coefficients 5, 7, and 2]
Exercise

Solution template:
function: f(x1, x2) as above
gradient: ∇f(x), ∇f(x*) = [ , ]′
Hessian: ∇²f(x), ∇²f(x*)
first-order TSA: g(x) = f(x*) + (x − x*)′∇f(x*)
second-order TSA: h(x) = f(x*) + (x − x*)′∇f(x*) + ½ (x − x*)′∇²f(x*)(x − x*)
compare: |f(y) − g(y)| = , |f(y) − h(y)| =

Exercise

Filling in the computations for this f and evaluating at y = [−1.9, 3.2]′ gives
|f(y) − g(y)| = |64.811 − 64.9| = 0.089
|f(y) − h(y)| = 0.039,
so the second-order TSA is the more accurate local approximation.
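The same procedure can be scripted; a sympy sketch with a made-up cubic standing in for the slide's f (the original coefficients are not fully legible here, so this function is hypothetical):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = 5*x1**3 + 7*x1**2*x2 + 2*x1*x2**2  # hypothetical cubic, not the slide's exact f

grad = sp.Matrix([sp.diff(f, v) for v in (x1, x2)])
hess = sp.hessian(f, (x1, x2))

xs = sp.Matrix([-2, 3])                                    # expansion point x*
y = sp.Matrix([sp.Rational(-19, 10), sp.Rational(16, 5)])  # y = [-1.9, 3.2]'
d = y - xs

at_xs = {x1: xs[0], x2: xs[1]}
g_y = f.subs(at_xs) + (d.T * grad.subs(at_xs))[0]                # first-order TSA at y
h_y = g_y + sp.Rational(1, 2) * (d.T * hess.subs(at_xs) * d)[0]  # second-order TSA at y
f_y = f.subs({x1: y[0], x2: y[1]})

print(sp.N(abs(f_y - g_y)), sp.N(abs(f_y - h_y)))  # second-order error is smaller
```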
General Optimization algorithm
Specify some initial guess x0.
For k = 0, 1, …
- If xk is optimal, then stop.
- Determine a descent direction pk.
- Determine an improved estimate of the solution: x_{k+1} = xk + λk pk.
The last step is a one-dimensional search problem called a line search.
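A minimal sketch of this loop (illustrative only: steepest descent with a backtracking line search and a gradient-norm stopping test, details the slides have not specified yet):

```python
import numpy as np

def gradient_descent(f, grad, x0, tol=1e-6, max_iter=500):
    # General algorithm: check optimality, pick a descent direction,
    # then line-search along it.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # "if xk is optimal then stop"
            break
        p = -g                        # descent direction pk (negative gradient)
        lam = 1.0                     # backtracking line search for lambda_k
        while f(x + lam * p) > f(x) - 1e-4 * lam * (g @ g):
            lam *= 0.5
        x = x + lam * p               # x_{k+1} = x_k + lambda_k * p_k
    return x

# Minimize f(x) = (x1 - 1)^2 + 2*x2^2; the minimizer is [1, 0]
f = lambda x: (x[0] - 1)**2 + 2 * x[1]**2
grad = lambda x: np.array([2 * (x[0] - 1), 4 * x[1]])
print(gradient_descent(f, grad, [5.0, -3.0]))
```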
Descent Directions
If the directional derivative along d is negative, i.e. ∇f(x)′d < 0, then a line search along d will lead to a decrease in the function.
[Figure: example with gradient ∇f(x) = [8, 2]′, direction d = [0, −1]′ (so ∇f(x)′d < 0), and the steepest-descent direction −∇f(x)]
Descent directions create decrease
Proof
Let ∇f(x)′d < 0. Then there exists λ̄ > 0 such that f(x + λd) < f(x) for all 0 < λ ≤ λ̄.
By first-order Taylor expansion, f(x + λd) = f(x) + λ∇f(x)′d + ‖λd‖ α(x, λd). Hence
( f(x + λd) − f(x) ) / λ = ∇f(x)′d + ‖d‖ α(x, λd),
so f(x + λd) − f(x) < 0 for sufficiently small λ, since ∇f(x)′d < 0 and α(x, λd) → 0.
Negative Gradient
An important fact to know is that the negative gradient always points downhill.
Proof: Let d = −∇f(x). Then there exists λ̄ > 0 such that f(x + λd) < f(x) for all 0 < λ ≤ λ̄. The argument is the same as for descent directions: f(x + λd) = f(x) + λ∇f(x)′d + ‖λd‖ α(x, λd), so f(x + λd) − f(x) < 0 for sufficiently small λ, since ∇f(x)′d = −‖∇f(x)‖² < 0 and α(x, λd) → 0.
Notes on negative gradient
If the gradient is nonzero, then the negative gradient defines a descent direction:
d′∇f(x) = −∇f(x)′∇f(x) = −‖∇f(x)‖² < 0 if ∇f(x) ≠ 0, by substitution of d = −∇f(x).
Directional Derivative
The directional derivative always exists when the function is convex:
f′(x, d) = lim_{λ→0} ( f(x + λd) − f(x) ) / λ
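As a small numerical illustration (a sketch with a made-up differentiable f): when f is differentiable, the directional derivative equals ∇f(x)′d, which can be checked against the limit quotient:

```python
import numpy as np

f = lambda x: x[0]**2 + 3 * x[0] * x[1]              # made-up smooth function
grad = lambda x: np.array([2 * x[0] + 3 * x[1], 3 * x[0]])

x = np.array([1.0, 2.0])
d = np.array([0.0, -1.0])

lam = 1e-6
quotient = (f(x + lam * d) - f(x)) / lam             # difference quotient at small lambda
print(quotient, grad(x) @ d)                         # both approximately -3
```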