SLIDE 1 Quantitative inconsistent feasibility for averaged mappings
Andrei Sipoș
Technische Universität Darmstadt / Institute of Mathematics of the Romanian Academy
January 31, 2020, Days in Logic 2020, Lisboa, Portugal
SLIDES 2–5
The general situation
In nonlinear analysis and optimization, one is typically given a metric space X... (you can imagine here a Hilbert space – that is all that we’ll need today) ...and wants to find some special kind of point in it, let’s say a fixed point of a self-mapping T : X → X. We denote the fixed point set of T by Fix(T).
SLIDES 6–9
Iterations
One typically does this by building iterative sequences (xn), e.g. the Picard iteration: let x ∈ X be arbitrary and set, for any n, xn := T^n x. We know that if T is a contraction, this converges strongly to a fixed point of T, but in other cases we’ll have only weaker forms of convergence... ...like weak convergence itself... ...but most importantly asymptotic regularity:
lim_{n→∞} ‖xn − Txn‖ = 0.
Intuition:
convergence: “close to a fixed point”
asymptotic regularity: “close to being a fixed point” (the iteration is then an approximate fixed point sequence)
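As a toy numerical illustration (my own example, not from the talk): a plane rotation is nonexpansive but not asymptotically regular, while its 1/2-averaged version is. A minimal Python sketch:

```python
import numpy as np

def picard(T, x0, n):
    # Picard iteration: x_{k+1} = T(x_k); returns the n-th iterate.
    x = x0
    for _ in range(n):
        x = T(x)
    return x

# A nonexpansive map with fixed point 0 that is NOT a contraction:
# rotation by 90 degrees around the origin.
theta = np.pi / 2
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
T = lambda x: rot @ x
R = lambda x: 0.5 * x + 0.5 * T(x)   # the 1/2-averaged version of T

x0 = np.array([1.0, 0.0])
xn = picard(T, x0, 100)
# For the raw rotation, ||x_n - T x_n|| stays constant at sqrt(2):
print(np.linalg.norm(xn - T(xn)))
yn = picard(R, x0, 100)
# For the averaged version it tends to 0 (asymptotic regularity):
print(np.linalg.norm(yn - R(yn)))
```

The averaging step is what buys asymptotic regularity here; the underlying rotation just cycles at constant displacement.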
SLIDES 10–12
A more elaborate problem
Consider now n ≥ 1 and let C1, . . . , Cn be closed, convex, nonempty subsets of X such that
⋂_{i=1}^{n} Ci ≠ ∅.
This configuration is called a (consistent) convex feasibility problem. The problem here is to find a point in the intersection.
Bregman proved in 1965 that the Picard iteration of T := P_{Cn} ◦ . . . ◦ P_{C1} from an arbitrary point x converges weakly to a point in Fix(T), a set that coincides with the above intersection.
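Bregman’s scheme can be sketched in a few lines of Python on a toy consistent problem of my own choosing: two lines through the origin in R², whose intersection is {0}.

```python
import numpy as np

# Metric projection onto the line spanned by direction d.
def proj_line(d):
    d = d / np.linalg.norm(d)
    return lambda x: np.dot(x, d) * d

P1 = proj_line(np.array([1.0, 0.0]))   # the x-axis
P2 = proj_line(np.array([1.0, 1.0]))   # the line y = x
T = lambda x: P2(P1(x))                # composition of the projections

x = np.array([3.0, 4.0])
for _ in range(200):                   # Picard iteration of T
    x = T(x)
print(x)  # approaches the intersection point (0, 0)
```

In this finite-dimensional toy case the weak convergence of Bregman’s theorem is of course just ordinary convergence.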
SLIDES 13–16
Inconsistent feasibility
What happens when the intersection is empty? (This is called a problem of inconsistent feasibility.) (Of course, one doesn’t care here about convergence, since there is nothing interesting to converge to...)
Conjecture (Bauschke/Borwein/Lewis ’95): asymptotic regularity still holds. This was proved by Bauschke (Proc. AMS ’03).
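A small Python sketch of the inconsistent situation (again my own toy example): two disjoint closed balls. The intersection is empty, yet the composition of projections is still asymptotically regular; here one can check by hand that (3, 0) is a fixed point of the composition.

```python
import numpy as np

# Metric projection onto the closed ball of radius r around center.
def proj_ball(center, r):
    center = np.asarray(center, dtype=float)
    def P(x):
        v = x - center
        n = np.linalg.norm(v)
        return x if n <= r else center + (r / n) * v
    return P

P1 = proj_ball([0.0, 0.0], 1.0)   # unit ball at the origin
P2 = proj_ball([4.0, 0.0], 1.0)   # disjoint unit ball at (4, 0)
T = lambda x: P2(P1(x))

x = np.array([0.0, 3.0])
for _ in range(100):
    x = T(x)
print(np.linalg.norm(x - T(x)))   # displacement tends to 0
```

Here the gap between the two sets is attained, so Fix(T) is nonempty; the interesting (and harder) cases behind the conjecture are those where it is not.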
SLIDES 17–22
More developments
The result of Bauschke was then generalized:
- from projections onto convex sets to firmly nonexpansive mappings
  - a well-behaved class of mappings which is important in convex optimization, as primary examples include: projections onto closed, convex, nonempty subsets; resolvents (of nonexpansive mappings, of convex lsc functions)
  - P_C becomes R, C becomes Fix(R)
  - one assumes even less: each mapping needs to have only approximate fixed points
  - this was done by Bauschke/Martín-Márquez/Moffat/Wang in 2012
- even more, from firmly nonexpansive mappings to α-averaged mappings – where α ∈ (0, 1)
  - done by Bauschke/Moursi in 2018
  - firmly nonexpansive mappings are exactly the 1/2-averaged mappings
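Firm nonexpansiveness can be checked numerically on the primary example above. The following sketch (my own sanity check, not from the talk) samples random pairs and verifies the defining inequality ‖Px − Py‖² ≤ ⟨x − y, Px − Py⟩ for the projection onto the unit ball:

```python
import numpy as np

rng = np.random.default_rng(0)

# Metric projection onto the closed unit ball in R^3.
def proj_unit_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

# Firm nonexpansiveness: ||Px - Py||^2 <= <x - y, Px - Py> for all x, y.
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    Px, Py = proj_unit_ball(x), proj_unit_ball(y)
    lhs = np.dot(Px - Py, Px - Py)
    rhs = np.dot(x - y, Px - Py)
    assert lhs <= rhs + 1e-12
print("firm nonexpansiveness verified on all samples")
```

Random sampling is of course no proof, but it makes the inequality concrete.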
SLIDES 23–28
Proof mining
What does logic have to do with this? The answer is proof mining: an applied subfield of mathematical logic first suggested by G. Kreisel in the 1950s (under the name “proof unwinding”), then brought to maturity by U. Kohlenbach and his collaborators starting in the 1990s.
- goals: to find explicit and uniform witnesses or bounds and to remove superfluous premises from concrete mathematical statements by analyzing their proofs
- tools used: primarily proof interpretations (modified realizability, negative translation, functional interpretation)
- the adequacy of the tools to the goals is guaranteed by general logical metatheorems
- a short and accessible introduction may be found in Kohlenbach’s survey for ICM 2018
SLIDES 29–30
Rates
In the case of asymptotic regularity:
∀ε ∃N ∀n ≥ N ‖xn − Txn‖ ≤ ε,
we would like to find a rate of asymptotic regularity: an explicit formula for N in terms of ε and of (as few as possible of) the other parameters of the problem.
The statement is Π3, a case generally excluded by the metatheorems which pertain to classical logic, and rightfully so, since there exist explicit examples (“Specker sequences”) of sequences of computable reals with no computable limit and thus with no computable rate of convergence.
SLIDES 31–32
Rates continued
In the cases we shall discuss, however, the sequence (‖xn − Txn‖) is nonincreasing, which gets rid of the last ∀. At the other end of logical complexity, purely universal (Π1) sentences help us when they show up in proofs that we’re analyzing, since they lack computational content and thus it doesn’t matter whether their subproofs conform to the requirements of the metatheorems (an observation first due to Kreisel).
SLIDES 33–36
Past work
Kohlenbach analyzed (Found. Comput. Math., 2019) the proofs of Bauschke ’03 and of Bauschke/Martín-Márquez/Moffat/Wang ’12. These proofs are organized as follows:
- first one shows that T has arbitrarily small displacements:
  ∀ε ∃p ‖p − Tp‖ ≤ ε
- this fact, in conjunction with the fact that T is strongly nonexpansive, yields asymptotic regularity (Bruck/Reich ’77)
  - strongly nonexpansive mappings subsume firmly nonexpansive mappings and are closed under composition
The analysis of the second part relies on previous work of Kohlenbach on strongly nonexpansive mappings (Israel J. Math., 2016).
SLIDE 37
On strongly nonexpansive mappings I
Definition (Kohlenbach, 2016). Let T : X → X and ω : (0, ∞) × (0, ∞) → (0, ∞). Then T is called strongly nonexpansive with modulus ω if for any b, ε > 0 and x, y ∈ X with ‖x − y‖ ≤ b and ‖x − y‖ − ‖Tx − Ty‖ < ω(b, ε), we have that ‖(x − y) − (Tx − Ty)‖ < ε.
SLIDE 38
On strongly nonexpansive mappings II
Theorem (Kohlenbach, 2019). Define, for any ε, b, d > 0, α : (0, ∞) → (0, ∞) and ω : (0, ∞) × (0, ∞) → (0, ∞),
ϕ(ε, b, d, α, ω) := ⌈(18b + 12α(ε/6))/ε − 1⌉ · ⌈d / ω(27b + 18α(ε/6), ε² / (27b + 18α(ε/6)))⌉.
Let T : X → X and ω : (0, ∞) × (0, ∞) → (0, ∞) be such that T is strongly nonexpansive with modulus ω. Let α : (0, ∞) → (0, ∞) be such that for any δ > 0 there is a p ∈ X with ‖p‖ ≤ α(δ) and ‖p − Tp‖ ≤ δ. Then for any ε, b, d > 0 and any x ∈ X with ‖x‖ ≤ b and ‖x − Tx‖ ≤ d, we have that for any n ≥ ϕ(ε, b, d, α, ω), ‖T^n x − T^{n+1} x‖ ≤ ε.
Thus, one needs a bound on the p obtained in the first part and an SNE-modulus for T.
SLIDES 39–41
Extraction details
The first part of the proof provides the most intricate portion of the analysis, since it uses deep results such as Minty’s theorem. Fortunately, these kinds of arguments only enter the proof through ∀-lemmas, so the resulting rate is of low complexity (polynomial of degree eight). What we want is to update these techniques in order to analyze the proof of Bauschke/Moursi ’18 for averaged mappings.
SLIDES 42–44
On averaged mappings
The division into two parts from before remains valid. For the second part, we first need some knowledge on averaged mappings.
- a mapping T : X → X is called nonexpansive if for all x, y ∈ X, ‖Tx − Ty‖ ≤ ‖x − y‖; if α ∈ (0, 1), a mapping R : X → X is called α-averaged if there is a nonexpansive mapping T such that for all x ∈ X, Rx = (1 − α)x + αTx
- for any α, β ∈ (0, 1), one defines α ⋆ β to be equal to
  (α + β − 2αβ) / (1 − αβ) = 1 / (1 + 1 / (α/(1−α) + β/(1−β)))
- using the expression on the right-hand side, we may immediately derive that this operation is associative and commutative and that for any m ≥ 2 and any α1, . . . , αm ∈ (0, 1),
  α1 ⋆ · · · ⋆ αm = 1 / (1 + 1 / (Σ_{i=1}^{m} αi/(1−αi)))
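The two expressions for ⋆ and its algebraic properties are easy to check with exact rational arithmetic; a short Python sketch (my own verification aid):

```python
from fractions import Fraction

# The star operation on averagedness constants, in its two equivalent forms.
def star(a, b):
    return (a + b - 2 * a * b) / (1 - a * b)

def star_list(alphas):
    # closed form: a_1 * ... * a_m = 1 / (1 + 1 / sum(a_i / (1 - a_i)))
    s = sum(a / (1 - a) for a in alphas)
    return 1 / (1 + 1 / s)

a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)
assert star(a, b) == star_list([a, b])              # the two forms agree
assert star(star(a, b), c) == star(a, star(b, c))   # associativity
assert star(a, b) == star(b, a)                     # commutativity
print(star_list([a, b, c]))                         # -> 9/11
```

Using `Fraction` instead of floats makes the identities exact rather than approximate.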
SLIDES 45–47
On averaged mappings being SNE
It is a classical result that for any m ≥ 2, α1, . . . , αm ∈ (0, 1) and R1, . . . , Rm : X → X such that for each i, Ri is αi-averaged, one has that Rm ◦ . . . ◦ R1 is (α1 ⋆ · · · ⋆ αm)-averaged.
Proposition. Define, for any α ∈ (0, 1) and b, ε > 0,
ω_α(b, ε) := (α(1 − α) / (4b)) · ε².
Let α ∈ (0, 1) and R : X → X be an α-averaged mapping. Then R is strongly nonexpansive with modulus ω_α.
Thus, we may use ω_{α1 ⋆ ... ⋆ αm} as our required SNE-modulus.
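Reading the modulus as ω_α(b, ε) = (α(1 − α)/(4b))·ε², the SNE implication can be spot-checked on a concrete α-averaged map; in this sketch (my own toy instance) the nonexpansive part is a plane rotation, and pairs are sampled at many distance scales so that the hypothesis of the implication is actually triggered:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.25
theta = 1.0
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
R = lambda x: (1 - alpha) * x + alpha * (rot @ x)   # an alpha-averaged map

def omega(b, eps):
    # the SNE-modulus from the proposition
    return alpha * (1 - alpha) / (4 * b) * eps ** 2

b, eps = 10.0, 0.5
hits = 0
for _ in range(2000):
    x = rng.uniform(-3, 3, size=2)
    y = x + rng.normal(size=2) * 10.0 ** rng.uniform(-4, 0)
    if np.linalg.norm(x - y) <= b and \
       np.linalg.norm(x - y) - np.linalg.norm(R(x) - R(y)) < omega(b, eps):
        hits += 1
        # the conclusion of strong nonexpansiveness must then hold:
        assert np.linalg.norm((x - y) - (R(x) - R(y))) < eps
print("SNE modulus check passed on", hits, "triggered samples")
```

Again a sanity check rather than a proof, but it shows what the modulus quantifies: a small loss of distance under R forces x − y and Rx − Ry to be close.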
SLIDES 48–51
The analysis of the new first part
The first part of this new proof is more natural than the ad hoc one in Bauschke ’03, making direct use of properties like cocoercivity and rectangularity. Let us see how these arise from averaged mappings.
- if β > 0, then A : X → X is β-cocoercive if for all x, y ∈ X, ⟨x − y, Ax − Ay⟩ ≥ β‖Ax − Ay‖²
- for any α ∈ (0, 1) and any α-averaged mapping R : X → X there is an A : X → X which is maximally monotone and (α⁻¹ − 1)-cocoercive and R = 2(id_X + A)⁻¹ − id_X (R is the “reflected resolvent” of A)
- a mapping A : X → X is rectangular if for all b, c ∈ X,
  sup_{a ∈ X} ⟨a − c, Ab − Aa⟩ < ∞
- any cocoercive mapping is rectangular
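Cocoercivity is easy to make concrete in the linear case (my own standard example): for a symmetric positive semidefinite Q with largest eigenvalue L, the map A(x) = Qx is (1/L)-cocoercive. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3))
Q = M.T @ M                      # symmetric positive semidefinite
L = np.linalg.eigvalsh(Q)[-1]    # largest eigenvalue
beta = 1.0 / L

# Check <x - y, Ax - Ay> >= beta * ||Ax - Ay||^2 for A = Q (linear).
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    d = x - y
    Ad = Q @ d                   # Ax - Ay for the linear map A(x) = Qx
    assert np.dot(d, Ad) >= beta * np.dot(Ad, Ad) - 1e-9
print("cocoercivity verified")
```

This A is the gradient of the convex quadratic x ↦ ½⟨x, Qx⟩, which is the typical source of cocoercive maps in optimization.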
SLIDE 52
Quantitative rectangularity
The following is a quantitative version of that last part.
Proposition. Put, for all β, L1, L2, L3 > 0,
Θ(β, L1, L2, L3) := (L1 + L2) · (L3 + (L1 + L2 + 2βL3 + √(L1² + L2² + 2L1L2 + 8βL1L3 + 4βL2L3)) / (2β)).
Let β, L1, L2, L3 > 0 and A : X → X be β-cocoercive. Let b, c ∈ X with ‖b‖ ≤ L1, ‖c‖ ≤ L2 and ‖Ab‖ ≤ L3. Then for all a ∈ X, ⟨a − c, Ab − Aa⟩ ≤ Θ(β, L1, L2, L3).
This was obtained, among others, by analyzing the arguments in Brézis/Haraux (Israel J. Math., ’76), where rectangularity was first introduced.
SLIDE 53
The bound on the point for two mappings
To get our bound for the point of small displacement, we first analyze the Bauschke/Moursi proof for two mappings.
Theorem (1/2). For all α1, α2 ∈ (0, 1), δ > 0 and K : (0, ∞) → (0, ∞), one first defines B(α2, K, δ), an explicit expression in δ and in Θ evaluated at a cocoercivity parameter determined by α2 and at bounds of the form K(δ/c) and δ/c for small numerical constants c, and then Φ(α1, α2, K, δ), an explicit expression of the shape B(α2, K, δ) · max(√2, 4B(α2, K, δ)/δ), combined with δ and the quantities 1/(1 − α1) and α1/(1 − α1); for the exact formulas, see arXiv:2001.01513.
SLIDES 54–55
The bound on the point for two mappings
Theorem (2/2). Let α1, α2 ∈ (0, 1) and R1, R2 : X → X be such that for each i, Ri is αi-averaged. Put R := R2 ◦ R1. Let K : (0, ∞) → (0, ∞) be such that for all i and all ε > 0 there is a p ∈ X with ‖p‖ ≤ K(ε) and ‖p − Rip‖ ≤ ε. Then for all δ > 0 there is a p ∈ X with ‖p‖ ≤ Φ(α1, α2, K, δ) and ‖p − Rp‖ ≤ δ.
As can be seen from the form of the bound, the analysis uses the previously established links of averagedness with cocoercivity and rectangularity. Minty’s theorem appears again, only in the form of a ∀-lemma. The crucial ingredient of our proof (containing a bound which is not apparent in their paper) uses the fact that a β-cocoercive mapping is (1/β)-Lipschitz.
SLIDE 56
The general bound on the point
By induction, we get the bound that was required of us.
Theorem. Define, for all m ≥ 2, δ > 0, K : (0, ∞) → (0, ∞) and suitable finite sequences {αi}_i ⊆ (0, 1), recursively,
Ψ(2, {αi}_{i=1}^{2}, K, δ) := Φ(α1, α2, K, δ)
and
Ψ(m + 1, {αi}_{i=1}^{m+1}, K, δ) := Φ(α1 ⋆ . . . ⋆ αm, α_{m+1}, ρ → max(Ψ(m, {αi}_{i=1}^{m}, K, ρ), K(ρ)), δ).
Let m ≥ 2, α1, . . . , αm ∈ (0, 1) and R1, . . . , Rm : X → X be such that for each i, Ri is αi-averaged. Put R := Rm ◦ . . . ◦ R1. Let K : (0, ∞) → (0, ∞) be such that for all i and all ε > 0 there is a p ∈ X with ‖p‖ ≤ K(ε) and ‖p − Rip‖ ≤ ε. Then for all δ > 0 there is a p ∈ X with ‖p‖ ≤ Ψ(m, {αi}_{i=1}^{m}, K, δ) and ‖p − Rp‖ ≤ δ.
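The recursion defining Ψ can be sketched in Python with the two-mapping bound Φ passed in as a black box; the `Phi_dummy` below is a placeholder of my own, just to exercise the recursion, NOT the actual bound from the theorem:

```python
from functools import reduce
from fractions import Fraction

def star(a, b):
    # the star operation on averagedness constants
    return (a + b - 2 * a * b) / (1 - a * b)

def make_psi(Phi):
    # Psi(m, alphas, K, delta) by the recursion of the theorem,
    # with the two-mapping bound Phi supplied as a black box.
    def Psi(m, alphas, K, delta):
        if m == 2:
            return Phi(alphas[0], alphas[1], K, delta)
        head = reduce(star, alphas[:m - 1])          # alpha_1 * ... * alpha_{m-1}
        K_new = lambda rho: max(Psi(m - 1, alphas[:m - 1], K, rho), K(rho))
        return Phi(head, alphas[m - 1], K_new, delta)
    return Psi

# Dummy stand-in for Phi (hypothetical, for structure only).
Phi_dummy = lambda a1, a2, K, d: K(d) + 1
Psi = make_psi(Phi_dummy)
print(Psi(3, [Fraction(1, 2)] * 3, lambda e: 1, Fraction(1, 4)))  # -> 3
```

The point of the sketch is the shape of the induction: the first m − 1 mappings are merged via ⋆, and the approximate-fixed-point bound K is upgraded to the max of K and the previously obtained Ψ.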
SLIDE 57
The rate of asymptotic regularity
Putting together everything we’ve obtained, we get the rate of asymptotic regularity.
Theorem. Define, for all m ≥ 2, ε, b, d > 0, K : (0, ∞) → (0, ∞) and {αi}_{i=1}^{m} ⊆ (0, 1),
Σ_{m, {αi}_{i=1}^{m}, K, b, d}(ε) := ϕ(ε, b, d, δ → Ψ(m, {αi}_{i=1}^{m}, K, δ), ω_{α1 ⋆ ... ⋆ αm}).
Let m ≥ 2, α1, . . . , αm ∈ (0, 1) and R1, . . . , Rm : X → X be such that for each i, Ri is αi-averaged. Put R := Rm ◦ . . . ◦ R1. Let K : (0, ∞) → (0, ∞) be such that for all i and all ε > 0 there is a p ∈ X with ‖p‖ ≤ K(ε) and ‖p − Rip‖ ≤ ε. Then for any b, d > 0 and any x ∈ X with ‖x‖ ≤ b and ‖x − Rx‖ ≤ d, we have that for any ε > 0 and n ≥ Σ_{m, {αi}_{i=1}^{m}, K, b, d}(ε), ‖R^n x − R^{n+1} x‖ ≤ ε.
SLIDE 58
All this can be found in:
A. Sipoș, Quantitative inconsistent feasibility for averaged mappings. arXiv:2001.01513 [math.OC], 2020.
SLIDE 59
Thank you for your attention.