Algorithms with provable guarantees for clustering problems
Ola Svensson
Where to place rescue centers?
Build k centers so as to minimize the sum of travel distances or, more generally, to optimize some objective.
Median and Center
MEDIAN: Open a point/facility on the real line so as to minimize the sum of distances from the clients.
In the example, moving the facility one way decreases the distance for 3 clients but increases it for 6 clients; moving it the other way decreases the distance for 6 clients and increases it for 3. Moving toward the side with more clients always lowers the sum, so the optimum is a median of the client positions.
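A minimal Python sketch of the median argument, assuming the clients are given as a list of real coordinates (the names `median_location` and `sum_of_distances` are illustrative, not from the slides). With an even number of clients, any point between the two middle clients is optimal; the sketch returns the lower median.

```python
def median_location(clients: list[float]) -> float:
    """Return a point on the real line minimizing the sum of distances to the clients.

    Any median is optimal; this returns the lower median of the sorted positions.
    """
    xs = sorted(clients)
    return xs[(len(xs) - 1) // 2]


def sum_of_distances(facility: float, clients: list[float]) -> float:
    """MEDIAN objective: total distance from all clients to the facility."""
    return sum(abs(facility - c) for c in clients)


clients = [0.0, 1.0, 2.0, 7.0, 8.0, 9.0, 20.0]
x = median_location(clients)
print(x, sum_of_distances(x, clients))  # prints: 7.0 34.0
```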
CENTER: Open a point/facility on the real line so as to minimize the maximum distance over all clients. The optimum is the midpoint between the leftmost and the rightmost client.
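A minimal Python sketch of the CENTER case on the real line, assuming clients are real coordinates (function names are illustrative): only the two extreme clients matter, so the optimum is their midpoint.

```python
def center_location(clients: list[float]) -> float:
    """CENTER on the real line: the midpoint of the leftmost and rightmost clients."""
    return (min(clients) + max(clients)) / 2


def max_distance(facility: float, clients: list[float]) -> float:
    """CENTER objective: distance to the farthest client."""
    return max(abs(facility - c) for c in clients)


clients = [0.0, 1.0, 2.0, 7.0, 8.0, 9.0, 20.0]
c = center_location(clients)
print(c, max_distance(c, clients))  # prints: 10.0 10.0
```

For comparison, the median of the same list is 7.0: the single far client at 20 moves the center but not the median.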
K-Median and K-Center
K-MEDIAN: Open k points/facilities in a metric space so as to minimize the sum of distances from the clients.
K-CENTER: Open k points/facilities in a metric space so as to minimize the maximum distance over all clients.
Mathematical formulation of objective functions
General problem parameterized by p ≥ 1: Find a set S of k points/facilities in a metric space so as to minimize
$$\Big(\sum_{j} d(j,S)^p\Big)^{1/p},$$
where d(j,S) is the distance from client j to the closest facility in S.
- K-MEDIAN: p = 1
- K-CENTER: p = ∞
- K-MEANS: p = 2 (actually, minimize $\sum_j d(j,S)^2$ in a Euclidean metric)
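The parameterized objective can be made concrete with a small Python sketch. It assumes points in the plane with the Euclidean metric and, unlike real k-means, restricts facilities to a given candidate list; `objective` and `best_k_subset` are hypothetical names. The exhaustive search over all k-subsets is exponential, which fits the intractability of these problems.

```python
import itertools
import math

Point = tuple[float, float]


def dist_to_set(j: Point, S: tuple[Point, ...]) -> float:
    """d(j, S): distance from client j to the closest facility in S (Euclidean metric)."""
    return min(math.dist(j, i) for i in S)


def objective(S: tuple[Point, ...], clients: list[Point], p: float) -> float:
    """(sum_j d(j, S)^p)^(1/p); p = math.inf gives max_j d(j, S)."""
    ds = [dist_to_set(j, S) for j in clients]
    if math.isinf(p):
        return max(ds)  # k-center
    return sum(d ** p for d in ds) ** (1 / p)  # finite p (p = 1: k-median)


def best_k_subset(clients: list[Point], candidates: list[Point], k: int, p: float):
    """Exact optimum by enumerating all k-subsets of candidates: tiny inputs only."""
    return min(itertools.combinations(candidates, k), key=lambda S: objective(S, clients, p))


clients = [(0, 0), (1, 0), (2, 0), (10, 0)]
for p in (1, 2, math.inf):
    S = best_k_subset(clients, clients, 2, p)
    print(p, S, objective(S, clients, p))
```

On this instance all three objectives pick {(1, 0), (10, 0)}. For k-means, minimizing the sum of squared distances selects the same S as p = 2, since the square root is monotone.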
Facility Location
Facility Location: Open facilities in a metric space so as to minimize sum of distances from clients + opening costs
ALL THESE PROBLEMS ARE INTRACTABLE (NP-HARD) IN THE WORST CASE
Solving intractable problems
- Heuristics
- good for "typical" instances
- bad instances do not happen too often
[Figure: record instance sizes by decade, 1950s to 2000s, on a log scale from 1 to 16384]
Dantzig, Fulkerson, and Johnson solve a 49-city instance to optimality; Applegate, Bixby, Chvátal, Cook, and Helsgaun solve a 24978-city instance!
Sweden has only 9 million inhabitants: about 360 persons per city.
Solving intractable problems
- Approximation Algorithms
- Perhaps we can efficiently find a reasonably good solution?
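Approximation ratio α: the worst case, over all instances, of the ratio between the cost of the algorithm's solution and the optimal cost
- α = 1 is an exact polynomial-time algorithm
- α = 1.01: the algorithm finds a solution with at most 1% higher cost than the optimum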
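Written out for a minimization problem, the ratio compares the algorithm's cost ALG(I) with the optimal cost OPT(I) on every instance I (the notation ALG and OPT is introduced here for illustration):

$$\alpha \;=\; \sup_{I}\ \frac{\mathrm{ALG}(I)}{\mathrm{OPT}(I)} \;\ge\; 1, \qquad\text{so}\qquad \mathrm{ALG}(I) \;\le\; \alpha \cdot \mathrm{OPT}(I) \ \text{ for every instance } I.$$

With α = 1.01 this reads ALG(I) ≤ 1.01 · OPT(I), i.e., at most 1% above optimal on every instance.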
GOAL: Complete understanding of worst case behavior
State of the Art
Problem | Approximation | Hardness
Facility Location | 1.488 [Li'11] | 1.463 [Guha & Khuller'98]
K-Center | 2 [Gonzalez'85, Hochbaum & Shmoys'85] | 2 [Hsu & Nemhauser'79]
K-Median | 2.67 [Byrka et al.'15] | 1+2/e [Jain et al.'02]
K-Means | 9 [Kanungo et al.'04] | 1.0013 [Lee, Schmidt, Wright'15]
Even better: the approximation algorithms can be achieved using standard LP relaxations, and the techniques transfer between problems.
A 2-APPROXIMATION ALGORITHM FOR K-CENTER
Greedy K-Center
Open any point
For i = 2, …, k: open the point farthest away from the already opened points
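The greedy rule (often called farthest-first traversal) can be sketched in Python as follows, assuming points in the plane with the Euclidean metric; `greedy_k_center`, `radius`, and the brute-force `optimal_radius` are illustrative names, and the brute force only serves to compare against OPT on tiny inputs.

```python
import itertools
import math

Point = tuple[float, float]


def greedy_k_center(points: list[Point], k: int, first: int = 0) -> list[Point]:
    """Open points[first], then k-1 times open the point farthest from the opened ones."""
    opened = [points[first]]
    # closest[q] = distance from points[q] to its nearest opened point so far
    closest = [math.dist(q, opened[0]) for q in points]
    for _ in range(2, k + 1):
        far = max(range(len(points)), key=closest.__getitem__)
        opened.append(points[far])
        closest = [min(dq, math.dist(q, points[far])) for q, dq in zip(points, closest)]
    return opened


def radius(points: list[Point], opened) -> float:
    """k-center objective: largest distance from a point to its nearest opened point."""
    return max(min(math.dist(q, c) for c in opened) for q in points)


def optimal_radius(points: list[Point], k: int) -> float:
    """Brute-force OPT with centers chosen among the points (tiny inputs only)."""
    return min(radius(points, S) for S in itertools.combinations(points, k))


pts = [(0, 0), (1, 0), (10, 0), (11, 0), (5, 8)]
S = greedy_k_center(pts, 3)
print(S, radius(pts, S), optimal_radius(pts, 3))
```

Each round updates every point's distance to its nearest opened point, so the sketch runs in O(nk) distance evaluations.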
Analysis
Consider an optimal solution and the corresponding Voronoi diagram: each client belongs to the cell of its closest optimal center, so every client is within distance OPT of the center of its cell.
Case I: We opened exactly one point in each cell.
A client j and the opened point o in its cell are both within distance OPT of the cell's optimal center, so d(j, o) ≤ OPT + OPT = 2·OPT. In this case any client is connected within distance ≤ 2·OPT.
Case II: We did not open one point in each cell, so (since k points were opened into k cells) we opened two points in a single cell.
These two points are within distance OPT + OPT = 2·OPT of each other. When the later of them was opened, it was the point farthest away from the already opened points, so at that moment every point was within distance 2·OPT of an opened point; opening more points only decreases distances. Also in this case any client is connected within distance ≤ 2·OPT.
THEOREM:
The above greedy algorithm is a 2-approximation for k-Center
Gonzalez'85, Hochbaum & Shmoys'85
ALGORITHMS FOR FACILITY LOCATION AND K-MEDIAN
LINEAR PROGRAMMING RELAXATION
LP Relaxation for Facility Location
LINEAR PROGRAM:
- y_i takes value 1 if facility i is opened and 0 otherwise
- x_ij takes value 1 if client j is connected to facility i and 0 otherwise
Here F is the set of facilities, D the set of clients, f_i the opening cost of facility i, and d(i,j) the distance between i and j.
$$
\begin{aligned}
\text{minimize}\quad & \underbrace{\sum_{i\in F} f_i\, y_i}_{\text{opening cost}} + \underbrace{\sum_{i\in F,\, j\in D} d(i,j)\, x_{ij}}_{\text{connection cost}} \\
\text{subject to}\quad & \sum_{i\in F} x_{ij} = 1 && j\in D \quad \text{(every client is connected)}\\
& x_{ij} \le y_i && i\in F,\ j\in D \quad \text{(clients connected to open facilities)}\\
& x_{ij},\, y_i \in [0,1] && i\in F,\ j\in D
\end{aligned}
$$
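A small Python sketch that evaluates the LP objective and checks the constraints for a candidate (possibly fractional) solution. It does not solve the LP, which would need an off-the-shelf LP solver; the data layout `d[i][j]`, `x[i][j]` (facility i, client j) and the function names are assumptions of this sketch.

```python
def lp_value(f: list[float], d: list[list[float]], y: list[float], x: list[list[float]]) -> float:
    """Opening cost sum_i f_i * y_i plus connection cost sum_{i,j} d(i,j) * x_ij."""
    opening = sum(fi * yi for fi, yi in zip(f, y))
    connection = sum(d[i][j] * x[i][j] for i in range(len(f)) for j in range(len(d[i])))
    return opening + connection


def is_feasible(y: list[float], x: list[list[float]], tol: float = 1e-9) -> bool:
    """Check the three groups of LP constraints for a candidate solution."""
    n_fac, n_cli = len(y), len(x[0])
    # every client is connected: sum_i x_ij = 1
    connected = all(abs(sum(x[i][j] for i in range(n_fac)) - 1) <= tol for j in range(n_cli))
    # clients only use open facilities: x_ij <= y_i
    only_open = all(x[i][j] <= y[i] + tol for i in range(n_fac) for j in range(n_cli))
    # box constraints: all variables in [0, 1]
    in_box = all(-tol <= v <= 1 + tol for v in y + [v for row in x for v in row])
    return connected and only_open and in_box


f = [2.0, 2.0]
d = [[1.0, 3.0], [3.0, 1.0]]
y_half, x_half = [0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]
print(is_feasible(y_half, x_half), lp_value(f, d, y_half, x_half))  # prints: True 6.0
```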
ALGORITHMS USING RELAXATION
Randomized Rounding
Interpret y_i as the probability that facility i is opened.
Open each facility i independently with probability y_i. Connect each client to the closest opened facility.
PROBLEM:
- With constant probability, a client has no facility opened close to it. For example, if a client is connected fractionally to m facilities with y_i = 1/m each, none of them opens with probability (1 − 1/m)^m ≈ 1/e.
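A short Python sketch of independent rounding, assuming a client fractionally connected to m = 10 facilities with y_i = 1/m (a hypothetical instance); it compares the exact probability that none of them opens with a simulation and with 1/e.

```python
import math
import random


def independent_rounding(y: list[float], rng: random.Random) -> set[int]:
    """Open each facility i independently with probability y[i]."""
    return {i for i, yi in enumerate(y) if rng.random() < yi}


def prob_none_open(y_support: list[float]) -> float:
    """Probability that none of the given facilities opens under independent rounding."""
    return math.prod(1 - yi for yi in y_support)


m = 10
y = [1 / m] * m  # client fractionally connected to m facilities, x_ij = y_i = 1/m
exact = prob_none_open(y)
rng = random.Random(0)
trials = 20000
empirical = sum(not independent_rounding(y, rng) for _ in range(trials)) / trials
print(exact, empirical, 1 / math.e)
```

Since (1 − 1/m)^m increases toward 1/e ≈ 0.368, the failure probability stays bounded away from 0 however finely the client's fractional connection is split.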
Dependent Rounding
- Grow and select balls: while possible, select the ball with the smallest radius that is disjoint from the already selected balls
- Open each facility i with probability y_i, subject to a facility being opened in each selected ball
- Connect each client to the closest opened facility
=> Every client has a "fall back" path of length at most 3 times its radius: either its own ball was selected, or its ball intersects a selected ball whose radius is no larger, and that ball contains an opened facility (radius + 2 · radius ≤ 3 · radius).
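A simplified Python sketch of ball selection plus dependent rounding. It assumes a "complete" LP solution (x[i][j] is either 0 or y[i]), takes a client's ball to be the facilities with x[i][j] > 0 and its radius to be the largest distance in that ball; these are simplifying assumptions, not necessarily the exact choices of Shmoys, Tardos, Aardal or Chudak & Shmoys. Under them each ball has total y-mass 1, so picking one facility per selected ball with probability y_i keeps every facility's opening probability at y_i.

```python
import math
import random

Point = tuple[float, float]


def dependent_rounding(fac: list[Point], cli: list[Point], y: list[float],
                       x: list[list[float]], rng: random.Random):
    """Ball selection + dependent rounding for a complete LP solution (x[i][j] in {0, y[i]})."""
    n_f, n_c = len(fac), len(cli)
    ball = [{i for i in range(n_f) if x[i][j] > 0} for j in range(n_c)]
    radius = [max(math.dist(fac[i], cli[j]) for i in ball[j]) for j in range(n_c)]
    # Grow and select balls: smallest radius first, keep those disjoint from selected ones.
    selected, covered = [], set()
    for j in sorted(range(n_c), key=radius.__getitem__):
        if not ball[j] & covered:
            selected.append(j)
            covered |= ball[j]
    # Open exactly one facility per selected ball; facility i is picked with probability y[i].
    opened = set()
    for j in selected:
        members = sorted(ball[j])
        opened.add(rng.choices(members, weights=[y[i] for i in members])[0])
    # Facilities outside all selected balls open independently with probability y[i].
    opened |= {i for i in range(n_f) if i not in covered and rng.random() < y[i]}
    # Connect each client to its closest opened facility.
    assign = [min(opened, key=lambda i: math.dist(fac[i], cli[j])) for j in range(n_c)]
    return opened, assign, radius


fac = [(0, 0), (2, 0), (10, 0), (12, 0)]
cli = [(1, 0), (11, 0), (6, 0)]
y = [0.5, 0.5, 0.5, 0.5]
x = [[0.5, 0, 0], [0.5, 0, 0.5], [0, 0.5, 0.5], [0, 0.5, 0]]
opened, assign, radius = dependent_rounding(fac, cli, y, x, random.Random(1))
print(opened, assign, radius)
```

In this hypothetical instance the third client's ball is not selected, yet it always finds an opened facility within 3 times its radius, as the fall-back argument predicts.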
First constant approximation algorithm
THEOREM:
"Dependent rounding" gives a 3.16-approximation algorithm
Shmoys, Tardos, Aardal'97
Impressive progress based on the same LP
THEOREM:
"Dependent rounding" gives a (1+2/e)-approximation algorithm
Chudak & Shmoys'99
THEOREM:
Primal-dual gives a 3-approximation algorithm, later improved to 1.6 and then to 1.52
Jain & Vazirani'01, Jain et al.'03, Mahdian et al.'02
THEOREM:
"Dependent rounding" + primal-dual gives a 1.5-approximation algorithm
Byrka'07
THEOREM:
"Dependent rounding" + primal-dual gives a 1.488-approximation algorithm
Li'11
ALMOST TIGHT: It is NP-hard to do better than 1.463
Guha & Khuller'99
Relation to k-Median
K-MEDIAN: same as facility location but with the hard constraint that at most k facilities are opened.
Relationship to facility location: simple economy
- If the price of opening facilities is cheap, many facilities will be opened
- If the price of opening facilities is expensive, few facilities will be opened
=> Find a price so that ≈ k facilities are opened
First exploited by Jain & Vazirani'01 to give fast and elegant approximation algorithms for k-median based on algorithms for facility location
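A minimal Python sketch of the price search, assuming a uniform opening price for all facilities and using a hypothetical brute-force facility location solver as the black box (tiny inputs only). For an exact solver, the number of opened facilities never increases as the price rises, so bisection applies; with approximate facility location algorithms the count can jump over k, which is why only ≈ k facilities can be guaranteed.

```python
import itertools
import math

Point = tuple[float, float]


def fl_cost(S, fac: list[Point], cli: list[Point], price: float) -> float:
    """Facility location cost with the same opening price for every facility."""
    return price * len(S) + sum(min(math.dist(fac[i], c) for i in S) for c in cli)


def solve_fl(fac: list[Point], cli: list[Point], price: float):
    """Stand-in for a facility location algorithm: exact brute force (tiny inputs only)."""
    subsets = (S for r in range(1, len(fac) + 1)
               for S in itertools.combinations(range(len(fac)), r))
    return min(subsets, key=lambda S: fl_cost(S, fac, cli, price))


def price_for_k(fac: list[Point], cli: list[Point], k: int,
                hi: float = 1e6, iters: int = 60):
    """Bisect the opening price until the solution opens at most k facilities.
    Assumes `hi` is expensive enough that at most k facilities are opened."""
    lo, best = 0.0, solve_fl(fac, cli, hi)
    for _ in range(iters):
        mid = (lo + hi) / 2
        S = solve_fl(fac, cli, mid)
        if len(S) <= k:
            hi, best = mid, S  # at most k opened at this price: try a lower price
        else:
            lo = mid  # too many facilities opened: raise the price
    return hi, best


pts = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0)]
price, S = price_for_k(pts, pts, k=2)
print(round(price, 6), S)
```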
Relaxing hard constraint for k-Median
- Difficulty is the hard constraint that we can open at most k facilities
THEOREM:
An r-pseudo-approximation algorithm that opens k+c facilities can be turned into an (r+ε)-approximation algorithm that opens k facilities and runs in time n^{O(c/ε)}
Li & S.'12
Together with an improved "pseudo-approximation", this gives
THEOREM:
There is a 2.73-approximation algorithm for k-Median
Li & S.'12
THEOREM:
There is a 2.67-approximation algorithm for k-Median
Byrka et al.'15
State of the Art
Problem | Approximation | Hardness
Facility Location | 1.488 [Li'11] | 1.463 [Guha & Khuller'98]
K-Center | 2 [Gonzalez'85, Hochbaum & Shmoys'85] | 2 [Hsu & Nemhauser'79]
K-Median | 2.67 [Byrka et al.'15] | 1+2/e [Jain et al.'02]
K-Means | 9 [Kanungo et al.'04] | 1.0013 [Lee, Schmidt, Wright'15]
The techniques developed transfer to the different problems.
What is his problem?
Facilities have Capacities
Each potential facility i has a capacity U_i that regulates how many clients the facility can accept
[Figure: example facilities, each labeled with capacity 3]
State of the Art
Capacitated problem | Approximation | Hardness
Facility Location | 5 [Bansal, Garg, Gupta'12] | 1.463 [Guha & Khuller'98]
K-Center | 9 [An et al.'14] | 3 [Cygan et al.'12]
K-Median | - | 1+2/e [Jain et al.'02]
K-Means | - | 1.0013 [Lee, Schmidt, Wright'15]
No "uniform" approach
Standard LP has unbounded integrality gap
APPRECIATE THE DIFFICULTY
Special case of Capacitated Facility Location: all distances are 0
INPUT: n clients, set of facilities with capacities and opening costs
GOAL: find a subset of facilities so that
1. Total capacity is at least n
2. Opening costs are minimized
This is the Minimum Knapsack Problem.
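A minimal Python sketch of this special case as a 0/1 dynamic program over capacity capped at n (pseudo-polynomial time). The example reads the labels in the next slide's figure ("€2 ≤8", …) as (opening cost, capacity) pairs; that reading and the name `min_knapsack` are assumptions.

```python
def min_knapsack(costs: list[float], caps: list[int], n: int) -> float:
    """Cheapest subset of facilities with total capacity >= n (all distances 0).
    0/1 DP over capacity capped at n: O(len(caps) * n) time."""
    INF = float("inf")
    best = [INF] * (n + 1)  # best[c] = min cost to reach capped capacity exactly c
    best[0] = 0
    for cost, cap in zip(costs, caps):
        for c in range(n, -1, -1):  # downward, so each facility is used at most once
            if best[c] < INF:
                nc = min(n, c + cap)
                best[nc] = min(best[nc], best[c] + cost)
    return best[n]


# 20 clients; the cheap facilities only reach capacity 8 + 5 + 3 + 2 = 18 < 20,
# so the expensive facility with capacity 19 must be opened.
print(min_knapsack([2, 0, 1, 10, 0], [8, 5, 3, 19, 2], 20))  # prints: 10
```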
Standard LP has bad integrality gap; strengthened using knapsack-cover inequalities
Add a constraint for each subset of facilities "that we suppose to open"
Knapsack-Cover Inequalities (Wolsey'75)
[Figure: clients 1, …, 20 and facilities labeled €2 ≤8, €0 ≤5, €1 ≤3, €10 ≤19, €0 ≤2 (opening cost, capacity); a subset S is highlighted]
- Suppose a subset S of facilities was already included in the solution
- Among the remaining facilities we must open capacity at least n − U(S), where U(S) is the total capacity of S
- Strengthen: since no facility needs a higher capacity than this right-hand side, cap each capacity at n − U(S):
$$\sum_{i \notin S} \min\{U_i,\; n - U(S)\}\, y_i \;\ge\; n - U(S) \qquad \text{for all } S \text{ with } U(S) < n$$
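A Python sketch showing how the knapsack-cover inequalities cut off a bad fractional solution. The instance is hypothetical: facility 0 has capacity n − 1 and cost 0, facility 1 has capacity n and cost 1. The standard LP accepts y = (1, 1/n) at cost 1/n, while every integral solution must open facility 1 at cost 1; the inequality for S = {facility 0} forces y_1 ≥ 1. The subset enumeration in `violated_kc` is exponential and meant for tiny instances only.

```python
from itertools import combinations


def standard_lp_ok(y: list[float], caps: list[int], n: int, tol: float = 1e-9) -> bool:
    """Standard LP constraint: sum_i U_i * y_i >= n."""
    return sum(u * yi for u, yi in zip(caps, y)) >= n - tol


def violated_kc(y: list[float], caps: list[int], n: int, tol: float = 1e-9):
    """Return a set S whose knapsack-cover inequality
    sum_{i not in S} min(U_i, n - U(S)) * y_i >= n - U(S) is violated, or None."""
    idx = range(len(caps))
    for r in range(len(caps) + 1):
        for S in combinations(idx, r):
            rest = n - sum(caps[i] for i in S)
            if rest <= 0:
                continue  # S alone already covers the demand
            lhs = sum(min(caps[i], rest) * y[i] for i in idx if i not in S)
            if lhs < rest - tol:
                return set(S)
    return None


n, caps, costs = 20, [19, 20], [0, 1]
y = [1.0, 1 / 20]  # fractional solution accepted by the standard LP
lp_cost = sum(c * yi for c, yi in zip(costs, y))
print(standard_lp_ok(y, caps, n), violated_kc(y, caps, n), lp_cost)  # prints: True {0} 0.05
```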
Non-Trivial to Generalize to Facility Location
- Several proposed inequalities
- Leung and Magnanti'89, Cornuéjols, Sridharan, Thizy'91, Aardal'92, Aardal, Pochet and Wolsey'93, Deng and Simchi-Levi'93
- Many recently proved insufficient: Kolliopoulos & Moysoglou'13
- Sequence of local search algorithms that give a 5-approximation algorithm
- Uniform capacities: Korupolu, Plaxton, Rajaraman'00, Chudak & Williamson'05, Aggarwal et al.'13
- General capacities: Pál, Tardos, Wexler'01, Bansal, Garg, Gupta'12
Recent progress
THEOREM:
A generalization of the knapsack-cover inequalities yields a "good" LP relaxation for capacitated facility location: a polynomial-time rounding algorithm finds a solution whose cost is at most a constant times LP-OPT.
An, Singh, Svensson'14
- The constant should be improved; the (not optimized) constant is 288
- No known large lower bound on the integrality gap
- Rich family of techniques to tap into to analyze the relaxation
- Are the techniques flexible enough to apply to related problems?
TIME TO SUMMARIZE
- Many interesting techniques developed by studying these problems
- Quite good understanding of uncapacitated problems
- Increased understanding of capacitated ones
Better algorithms for k-Median and Facility Location?
More uniform treatment of capacitated problems?
- Integrality gap of the relaxation for capacitated facility location?
- Is there a "good" compact relaxation?
- Constant factor for capacitated k-Median?
What about k-Means?