SLIDE 1 Quantitative inconsistent feasibility for averaged mappings
Andrei Sipoș
Technische Universität Darmstadt / Institute of Mathematics of the Romanian Academy
January 31, 2020, Days in Logic 2020, Lisboa, Portugal
SLIDES 2–5
The general situation
In nonlinear analysis and optimization, one is typically given a metric space X... (you can imagine here a Hilbert space – that is all that we’ll need today) ...and wants to find some special kind of point in it, let’s say a fixed point of a self-mapping T : X → X. We denote the fixed point set of T by Fix(T).
SLIDES 6–9
Iterations
One typically does this by building iterative sequences (xn), e.g. the Picard iteration: let x ∈ X be arbitrary and set, for any n, xn := T^n x. We know that if T is a contraction, this converges strongly to a fixed point of T, but in other cases we’ll have only weaker forms of convergence... ...like weak convergence itself... ...but most importantly asymptotic regularity:
lim_{n→∞} ‖xn − Txn‖ = 0.
Intuition:
convergence: “close to a fixed point”
asymptotic regularity: “close to being a fixed point” (the iteration is then an approximate fixed point sequence)
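As a toy numerical illustration (my own example, not from the talk): a plane rotation is nonexpansive but not asymptotically regular, while its 1/2-averaged version is. A minimal Python sketch:

```python
import numpy as np

def picard(T, x0, n):
    # Picard iteration: x_{k+1} = T(x_k); returns the n-th iterate.
    x = x0
    for _ in range(n):
        x = T(x)
    return x

# A nonexpansive map with fixed point 0 that is NOT a contraction:
# rotation by 90 degrees around the origin.
theta = np.pi / 2
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
T = lambda x: rot @ x
R = lambda x: 0.5 * x + 0.5 * T(x)   # the 1/2-averaged version of T

x0 = np.array([1.0, 0.0])
xn = picard(T, x0, 100)
# For the raw rotation, ||x_n - T x_n|| stays constant at sqrt(2):
print(np.linalg.norm(xn - T(xn)))
yn = picard(R, x0, 100)
# For the averaged version it tends to 0 (asymptotic regularity):
print(np.linalg.norm(yn - R(yn)))
```

The averaging step is what buys asymptotic regularity here; the underlying rotation just cycles at constant displacement.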
SLIDES 10–12
A more elaborate problem
Consider now n ≥ 1 and let C1, . . . , Cn be closed, convex, nonempty subsets of X such that
⋂_{i=1}^{n} Ci ≠ ∅.
This configuration is called a (consistent) convex feasibility problem. The problem here is to find a point in the intersection.
Bregman proved in 1965 that the Picard iteration of T := P_{Cn} ◦ . . . ◦ P_{C1} from an arbitrary point x converges weakly to a point in Fix(T), a set that coincides with the above intersection.
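Bregman’s scheme can be sketched in a few lines of Python on a toy consistent problem of my own choosing: two lines through the origin in R², whose intersection is {0}.

```python
import numpy as np

# Metric projection onto the line spanned by direction d.
def proj_line(d):
    d = d / np.linalg.norm(d)
    return lambda x: np.dot(x, d) * d

P1 = proj_line(np.array([1.0, 0.0]))   # the x-axis
P2 = proj_line(np.array([1.0, 1.0]))   # the line y = x
T = lambda x: P2(P1(x))                # composition of the projections

x = np.array([3.0, 4.0])
for _ in range(200):                   # Picard iteration of T
    x = T(x)
print(x)  # approaches the intersection point (0, 0)
```

In this finite-dimensional toy case the weak convergence of Bregman’s theorem is of course just ordinary convergence.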
SLIDES 13–16
Inconsistent feasibility
What happens when the intersection is empty? (This is called a problem of inconsistent feasibility.) (Of course, one doesn’t care here about convergence, since there is nothing interesting to converge to...)
Conjecture (Bauschke/Borwein/Lewis ’95): asymptotic regularity still holds. This was proved by Bauschke (Proc. AMS ’03).
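A small Python sketch of the inconsistent situation (again my own toy example): two disjoint closed balls. The intersection is empty, yet the composition of projections is still asymptotically regular; here one can check by hand that (3, 0) is a fixed point of the composition.

```python
import numpy as np

# Metric projection onto the closed ball of radius r around center.
def proj_ball(center, r):
    center = np.asarray(center, dtype=float)
    def P(x):
        v = x - center
        n = np.linalg.norm(v)
        return x if n <= r else center + (r / n) * v
    return P

P1 = proj_ball([0.0, 0.0], 1.0)   # unit ball at the origin
P2 = proj_ball([4.0, 0.0], 1.0)   # disjoint unit ball at (4, 0)
T = lambda x: P2(P1(x))

x = np.array([0.0, 3.0])
for _ in range(100):
    x = T(x)
print(np.linalg.norm(x - T(x)))   # displacement tends to 0
```

Here the gap between the two sets is attained, so Fix(T) is nonempty; the interesting (and harder) cases behind the conjecture are those where it is not.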
SLIDES 17–22
More developments
The result of Bauschke was then generalized:
- from projections onto convex sets to firmly nonexpansive mappings
  - a well-behaved class of mappings which is important in convex optimization, as primary examples include: projections onto closed, convex, nonempty subsets; resolvents (of nonexpansive mappings, of convex lsc functions)
  - P_C becomes R, C becomes Fix(R)
  - one assumes even less: each mapping needs to have only approximate fixed points
  - this was done by Bauschke/Martín-Márquez/Moffat/Wang in 2012
- even more, from firmly nonexpansive mappings to α-averaged mappings – where α ∈ (0, 1)
  - done by Bauschke/Moursi in 2018
  - firmly nonexpansive mappings are exactly the 1/2-averaged mappings
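Firm nonexpansiveness can be checked numerically on the primary example above. The following sketch (my own sanity check, not from the talk) samples random pairs and verifies the defining inequality ‖Px − Py‖² ≤ ⟨x − y, Px − Py⟩ for the projection onto the unit ball:

```python
import numpy as np

rng = np.random.default_rng(0)

# Metric projection onto the closed unit ball in R^3.
def proj_unit_ball(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

# Firm nonexpansiveness: ||Px - Py||^2 <= <x - y, Px - Py> for all x, y.
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    Px, Py = proj_unit_ball(x), proj_unit_ball(y)
    lhs = np.dot(Px - Py, Px - Py)
    rhs = np.dot(x - y, Px - Py)
    assert lhs <= rhs + 1e-12
print("firm nonexpansiveness verified on all samples")
```

Random sampling is of course no proof, but it makes the inequality concrete.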
SLIDES 23–28
Proof mining
What does logic have to do with this? The answer is proof mining: an applied subfield of mathematical logic first suggested by G. Kreisel in the 1950s (under the name “proof unwinding”), then brought to maturity by U. Kohlenbach and his collaborators starting in the 1990s.
- goals: to find explicit and uniform witnesses or bounds and to remove superfluous premises from concrete mathematical statements by analyzing their proofs
- tools used: primarily proof interpretations (modified realizability, negative translation, functional interpretation)
- the adequacy of the tools to the goals is guaranteed by general logical metatheorems
- a short and accessible introduction may be found in Kohlenbach’s survey for ICM 2018
SLIDES 29–30
Rates
In the case of asymptotic regularity:
∀ε ∃N ∀n ≥ N ‖xn − Txn‖ ≤ ε,
we would like to find a rate of asymptotic regularity: an explicit formula for N in terms of ε and of (as few as possible of) the other parameters of the problem.
The statement is Π3, a case generally excluded by the metatheorems which pertain to classical logic, and rightfully so, since there exist explicit examples (“Specker sequences”) of sequences of computable reals with no computable limit and thus with no computable rate of convergence.
SLIDES 31–32
Rates continued
In the cases we shall discuss, however, the sequence (‖xn − Txn‖) is nonincreasing, which gets rid of the last ∀. At the other end of logical complexity, purely universal (Π1) sentences help us when they show up in proofs that we’re analyzing, since they lack computational content and thus it doesn’t matter whether their subproofs conform to the requirements of the metatheorems (an observation first due to Kreisel).
SLIDES 33–36
Past work
Kohlenbach analyzed (Found. Comput. Math., 2019) the proofs of Bauschke ’03 and of Bauschke/Martín-Márquez/Moffat/Wang ’12. These proofs are organized as follows:
- first one shows that T has arbitrarily small displacements:
  ∀ε ∃p ‖p − Tp‖ ≤ ε
- this fact, in conjunction with the fact that T is strongly nonexpansive, yields asymptotic regularity (Bruck/Reich ’77)
  - strongly nonexpansive mappings subsume firmly nonexpansive mappings and are closed under composition
The analysis of the second part relies on previous work of Kohlenbach on strongly nonexpansive mappings (Israel J. Math., 2016).
SLIDE 37
On strongly nonexpansive mappings I
Definition (Kohlenbach, 2016). Let T : X → X and ω : (0, ∞) × (0, ∞) → (0, ∞). Then T is called strongly nonexpansive with modulus ω if for any b, ε > 0 and x, y ∈ X with ‖x − y‖ ≤ b and ‖x − y‖ − ‖Tx − Ty‖ < ω(b, ε), we have that ‖(x − y) − (Tx − Ty)‖ < ε.
SLIDE 38
On strongly nonexpansive mappings II
Theorem (Kohlenbach, 2019). Define, for any ε, b, d > 0, α : (0, ∞) → (0, ∞) and ω : (0, ∞) × (0, ∞) → (0, ∞),
ϕ(ε, b, d, α, ω) := ⌈(18b + 12α(ε/6))/ε − 1⌉ · ⌈d / ω(27b + 18α(ε/6), ε² / (27b + 18α(ε/6)))⌉.
Let T : X → X and ω : (0, ∞) × (0, ∞) → (0, ∞) be such that T is strongly nonexpansive with modulus ω. Let α : (0, ∞) → (0, ∞) be such that for any δ > 0 there is a p ∈ X with ‖p‖ ≤ α(δ) and ‖p − Tp‖ ≤ δ. Then for any ε, b, d > 0 and any x ∈ X with ‖x‖ ≤ b and ‖x − Tx‖ ≤ d, we have that for any n ≥ ϕ(ε, b, d, α, ω), ‖T^n x − T^{n+1} x‖ ≤ ε.
Thus, one needs a bound on the p obtained in the first part and an SNE-modulus for T.
SLIDES 39–41
Extraction details
The first part of the proof provides the most intricate portion of the analysis, since it uses deep results such as Minty’s theorem. Fortunately, these kinds of arguments only enter the proof through ∀-lemmas, so the resulting rate is of low complexity (polynomial of degree eight). What we want is to update these techniques in order to analyze the proof of Bauschke/Moursi ’18 for averaged mappings.
SLIDES 42–44
On averaged mappings
The division into two parts from before remains valid. For the second part, we first need some knowledge on averaged mappings.
- a mapping T : X → X is called nonexpansive if for all x, y ∈ X, ‖Tx − Ty‖ ≤ ‖x − y‖; if α ∈ (0, 1), a mapping R : X → X is called α-averaged if there is a nonexpansive mapping T such that for all x ∈ X, Rx = (1 − α)x + αTx
- for any α, β ∈ (0, 1), one defines α ⋆ β to be equal to
  (α + β − 2αβ) / (1 − αβ) = 1 / (1 + 1 / (α/(1−α) + β/(1−β)))
- using the expression on the right-hand side, we may immediately derive that this operation is associative and commutative and that for any m ≥ 2 and any α1, . . . , αm ∈ (0, 1),
  α1 ⋆ · · · ⋆ αm = 1 / (1 + 1 / (Σ_{i=1}^{m} αi/(1−αi)))
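The two expressions for ⋆ and its algebraic properties are easy to check with exact rational arithmetic; a short Python sketch (my own verification aid):

```python
from fractions import Fraction

# The star operation on averagedness constants, in its two equivalent forms.
def star(a, b):
    return (a + b - 2 * a * b) / (1 - a * b)

def star_list(alphas):
    # closed form: a_1 * ... * a_m = 1 / (1 + 1 / sum(a_i / (1 - a_i)))
    s = sum(a / (1 - a) for a in alphas)
    return 1 / (1 + 1 / s)

a, b, c = Fraction(1, 2), Fraction(1, 3), Fraction(3, 4)
assert star(a, b) == star_list([a, b])              # the two forms agree
assert star(star(a, b), c) == star(a, star(b, c))   # associativity
assert star(a, b) == star(b, a)                     # commutativity
print(star_list([a, b, c]))                         # -> 9/11
```

Using `Fraction` instead of floats makes the identities exact rather than approximate.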
SLIDES 45–47
On averaged mappings being SNE
It is a classical result that for any m ≥ 2, α1, . . . , αm ∈ (0, 1) and R1, . . . , Rm : X → X such that for each i, Ri is αi-averaged, one has that Rm ◦ . . . ◦ R1 is (α1 ⋆ · · · ⋆ αm)-averaged.
Proposition. Define, for any α ∈ (0, 1) and b, ε > 0,
ω_α(b, ε) := (α(1 − α) / (4b)) · ε².
Let α ∈ (0, 1) and R : X → X be an α-averaged mapping. Then R is strongly nonexpansive with modulus ω_α.
Thus, we may use ω_{α1 ⋆ ... ⋆ αm} as our required SNE-modulus.
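Reading the modulus as ω_α(b, ε) = (α(1 − α)/(4b))·ε², the SNE implication can be spot-checked on a concrete α-averaged map; in this sketch (my own toy instance) the nonexpansive part is a plane rotation, and pairs are sampled at many distance scales so that the hypothesis of the implication is actually triggered:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.25
theta = 1.0
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
R = lambda x: (1 - alpha) * x + alpha * (rot @ x)   # an alpha-averaged map

def omega(b, eps):
    # the SNE-modulus from the proposition
    return alpha * (1 - alpha) / (4 * b) * eps ** 2

b, eps = 10.0, 0.5
hits = 0
for _ in range(2000):
    x = rng.uniform(-3, 3, size=2)
    y = x + rng.normal(size=2) * 10.0 ** rng.uniform(-4, 0)
    if np.linalg.norm(x - y) <= b and \
       np.linalg.norm(x - y) - np.linalg.norm(R(x) - R(y)) < omega(b, eps):
        hits += 1
        # the conclusion of strong nonexpansiveness must then hold:
        assert np.linalg.norm((x - y) - (R(x) - R(y))) < eps
print("SNE modulus check passed on", hits, "triggered samples")
```

Again a sanity check rather than a proof, but it shows what the modulus quantifies: a small loss of distance under R forces x − y and Rx − Ry to be close.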
SLIDES 48–51
The analysis of the new first part
The first part of this new proof is more natural than the ad hoc one in Bauschke ’03, making direct use of properties like cocoercivity and rectangularity. Let us see how these arise from averaged mappings.
- if β > 0, then A : X → X is β-cocoercive if for all x, y ∈ X, ⟨x − y, Ax − Ay⟩ ≥ β‖Ax − Ay‖²
- for any α ∈ (0, 1) and any α-averaged mapping R : X → X there is an A : X → X which is maximally monotone and (α⁻¹ − 1)-cocoercive and R = 2(id_X + A)⁻¹ − id_X (R is the “reflected resolvent” of A)
- a mapping A : X → X is rectangular if for all b, c ∈ X,
  sup_{a ∈ X} ⟨a − c, Ab − Aa⟩ < ∞
- any cocoercive mapping is rectangular
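Cocoercivity is easy to make concrete in the linear case (my own standard example): for a symmetric positive semidefinite Q with largest eigenvalue L, the map A(x) = Qx is (1/L)-cocoercive. A quick numerical check:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3))
Q = M.T @ M                      # symmetric positive semidefinite
L = np.linalg.eigvalsh(Q)[-1]    # largest eigenvalue
beta = 1.0 / L

# Check <x - y, Ax - Ay> >= beta * ||Ax - Ay||^2 for A = Q (linear).
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    d = x - y
    Ad = Q @ d                   # Ax - Ay for the linear map A(x) = Qx
    assert np.dot(d, Ad) >= beta * np.dot(Ad, Ad) - 1e-9
print("cocoercivity verified")
```

This A is the gradient of the convex quadratic x ↦ ½⟨x, Qx⟩, which is the typical source of cocoercive maps in optimization.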
SLIDE 52
Quantitative rectangularity
The following is a quantitative version of that last part.
Proposition. Put, for all β, L1, L2, L3 > 0,
Θ(β, L1, L2, L3) := (L1 + L2) · (L3 + (L1 + L2 + 2βL3 + √(L1² + L2² + 2L1L2 + 8βL1L3 + 4βL2L3)) / (2β)).
Let β, L1, L2, L3 > 0 and A : X → X be β-cocoercive. Let b, c ∈ X with ‖b‖ ≤ L1, ‖c‖ ≤ L2 and ‖Ab‖ ≤ L3. Then for all a ∈ X, ⟨a − c, Ab − Aa⟩ ≤ Θ(β, L1, L2, L3).
This was obtained, among others, by analyzing the arguments in Brézis/Haraux (Israel J. Math., ’76), where rectangularity was first introduced.
SLIDE 53
The bound on the point for two mappings
To get our bound for the point of small displacement, we first analyze the Bauschke/Moursi proof for two mappings.
Theorem (1/2). For all α1, α2 ∈ (0, 1), δ > 0 and K : (0, ∞) → (0, ∞), one first defines B(α2, K, δ), an explicit expression in δ and in Θ evaluated at a cocoercivity parameter determined by α2 and at bounds of the form K(δ/c) and δ/c for small numerical constants c, and then Φ(α1, α2, K, δ), an explicit expression of the shape B(α2, K, δ) · max(√2, 4B(α2, K, δ)/δ), combined with δ and the quantities 1/(1 − α1) and α1/(1 − α1); for the exact formulas, see arXiv:2001.01513.
SLIDES 54–55
The bound on the point for two mappings
Theorem (2/2). Let α1, α2 ∈ (0, 1) and R1, R2 : X → X be such that for each i, Ri is αi-averaged. Put R := R2 ◦ R1. Let K : (0, ∞) → (0, ∞) be such that for all i and all ε > 0 there is a p ∈ X with ‖p‖ ≤ K(ε) and ‖p − Rip‖ ≤ ε. Then for all δ > 0 there is a p ∈ X with ‖p‖ ≤ Φ(α1, α2, K, δ) and ‖p − Rp‖ ≤ δ.
As can be seen from the form of the bound, the analysis uses the previously established links of averagedness with cocoercivity and rectangularity. Minty’s theorem appears again, only in the form of a ∀-lemma. The crucial ingredient of our proof (containing a bound which is not apparent in their paper) uses the fact that a β-cocoercive mapping is (1/β)-Lipschitz.
SLIDE 56
The general bound on the point
By induction, we get the bound that was required of us.
Theorem. Define, for all m ≥ 2, δ > 0, K : (0, ∞) → (0, ∞) and suitable finite sequences {αi}_i ⊆ (0, 1), recursively,
Ψ(2, {αi}_{i=1}^{2}, K, δ) := Φ(α1, α2, K, δ)
and
Ψ(m + 1, {αi}_{i=1}^{m+1}, K, δ) := Φ(α1 ⋆ . . . ⋆ αm, α_{m+1}, ρ → max(Ψ(m, {αi}_{i=1}^{m}, K, ρ), K(ρ)), δ).
Let m ≥ 2, α1, . . . , αm ∈ (0, 1) and R1, . . . , Rm : X → X be such that for each i, Ri is αi-averaged. Put R := Rm ◦ . . . ◦ R1. Let K : (0, ∞) → (0, ∞) be such that for all i and all ε > 0 there is a p ∈ X with ‖p‖ ≤ K(ε) and ‖p − Rip‖ ≤ ε. Then for all δ > 0 there is a p ∈ X with ‖p‖ ≤ Ψ(m, {αi}_{i=1}^{m}, K, δ) and ‖p − Rp‖ ≤ δ.
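The recursion defining Ψ can be sketched in Python with the two-mapping bound Φ passed in as a black box; the `Phi_dummy` below is a placeholder of my own, just to exercise the recursion, NOT the actual bound from the theorem:

```python
from functools import reduce
from fractions import Fraction

def star(a, b):
    # the star operation on averagedness constants
    return (a + b - 2 * a * b) / (1 - a * b)

def make_psi(Phi):
    # Psi(m, alphas, K, delta) by the recursion of the theorem,
    # with the two-mapping bound Phi supplied as a black box.
    def Psi(m, alphas, K, delta):
        if m == 2:
            return Phi(alphas[0], alphas[1], K, delta)
        head = reduce(star, alphas[:m - 1])          # alpha_1 * ... * alpha_{m-1}
        K_new = lambda rho: max(Psi(m - 1, alphas[:m - 1], K, rho), K(rho))
        return Phi(head, alphas[m - 1], K_new, delta)
    return Psi

# Dummy stand-in for Phi (hypothetical, for structure only).
Phi_dummy = lambda a1, a2, K, d: K(d) + 1
Psi = make_psi(Phi_dummy)
print(Psi(3, [Fraction(1, 2)] * 3, lambda e: 1, Fraction(1, 4)))  # -> 3
```

The point of the sketch is the shape of the induction: the first m − 1 mappings are merged via ⋆, and the approximate-fixed-point bound K is upgraded to the max of K and the previously obtained Ψ.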
SLIDE 57
The rate of asymptotic regularity
Putting together everything we’ve obtained, we get the rate of asymptotic regularity.
Theorem. Define, for all m ≥ 2, ε, b, d > 0, K : (0, ∞) → (0, ∞) and {αi}_{i=1}^{m} ⊆ (0, 1),
Σ_{m, {αi}_{i=1}^{m}, K, b, d}(ε) := ϕ(ε, b, d, δ → Ψ(m, {αi}_{i=1}^{m}, K, δ), ω_{α1 ⋆ ... ⋆ αm}).
Let m ≥ 2, α1, . . . , αm ∈ (0, 1) and R1, . . . , Rm : X → X be such that for each i, Ri is αi-averaged. Put R := Rm ◦ . . . ◦ R1. Let K : (0, ∞) → (0, ∞) be such that for all i and all ε > 0 there is a p ∈ X with ‖p‖ ≤ K(ε) and ‖p − Rip‖ ≤ ε. Then for any b, d > 0 and any x ∈ X with ‖x‖ ≤ b and ‖x − Rx‖ ≤ d, we have that for any ε > 0 and n ≥ Σ_{m, {αi}_{i=1}^{m}, K, b, d}(ε), ‖R^n x − R^{n+1} x‖ ≤ ε.
SLIDE 58
All this can be found in:
A. Sipoș, Quantitative inconsistent feasibility for averaged mappings. arXiv:2001.01513 [math.OC], 2020.
SLIDE 59
Thank you for your attention.