
CPSC 320: Intermediate Algorithm Design and Analysis

July 28, 2014


Course Outline

  • Introduction and basic concepts
  • Asymptotic notation
  • Greedy algorithms
  • Graph theory
  • Amortized analysis
  • Recursion
  • Divide-and-conquer algorithms
  • Randomized algorithms
  • Dynamic programming algorithms
  • NP-completeness

Dynamic Programming


Dynamic Programming Components

  • Analyse the structure of an optimal solution
  • Separate one choice (usually the last) from a subproblem
  • Phrase the value of a choice as a function of the choice and the subproblem
  • Phrase an optimal solution as the value of the best choice
  • Usually a max/min result
  • Implement the calculation of the optimal value
  • Memoization: save optimal values as we compute them
  • Bottom-up: evaluate smaller problems and use them for bigger problems
  • Top-down: evaluate the big problem by calling smaller problems recursively and saving the results
  • Keep a record of the choice made at each level
  • Rebuild the optimal solution from the optimal value result
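The bottom-up and top-down bookkeeping above can be contrasted on a toy subproblem. The sketch below uses Fibonacci numbers purely as an illustration (not a problem from these slides): the bottom-up version fills in smaller answers first, while the top-down version recurses and memoizes.

```python
from functools import lru_cache

def fib_bottom_up(n):
    # Bottom-up: evaluate smaller problems first and reuse them.
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

@lru_cache(maxsize=None)
def fib_top_down(n):
    # Top-down: recurse on smaller problems; lru_cache saves each
    # result as it is computed (memoization).
    return n if n < 2 else fib_top_down(n - 1) + fib_top_down(n - 2)
```

Both compute the same values; the difference is only in which subproblem is evaluated first and where the saved results live.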

Knapsack Problem

Algorithm Knapsack(x, q, N) -- x is the array of weights, q is the array of values, N is the weight limit
  For n = 0, 1, 2, …, N Do s[0, n] ← 0, m[0, n] ← false
  For j ← 1 To |x| Do
    For n ← 1 To N Do
      If x[j] > n Or s[j - 1, n] > s[j - 1, n - x[j]] + q[j] Then
        s[j, n] ← s[j - 1, n], m[j, n] ← m[j - 1, n]
      Else
        s[j, n] ← s[j - 1, n - x[j]] + q[j], m[j, n] ← j
  t ← ∅, y ← |x|
  While N > 0 And m[y, N] is not false Do
    y ← m[y, N], t ← t ∪ {y}, N ← N - x[y], y ← y - 1
  Return t
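A runnable sketch of the same table-filling idea in Python (function and variable names are mine, not from the slides): `s[j][w]` stores the best value using the first `j` items within weight `w`, and the chosen items are rebuilt by walking the table backwards.

```python
def knapsack(weights, values, limit):
    """0/1 knapsack by dynamic programming.
    Returns (best_value, chosen) where chosen lists item indices."""
    n = len(weights)
    # s[j][w] = best value using items 0..j-1 within capacity w
    s = [[0] * (limit + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for w in range(limit + 1):
            s[j][w] = s[j - 1][w]  # default: skip item j-1
            if weights[j - 1] <= w:
                take = s[j - 1][w - weights[j - 1]] + values[j - 1]
                if take > s[j][w]:
                    s[j][w] = take
    # Rebuild the chosen set by walking the table backwards
    chosen, w = [], limit
    for j in range(n, 0, -1):
        if s[j][w] != s[j - 1][w]:  # item j-1 was taken
            chosen.append(j - 1)
            w -= weights[j - 1]
    return s[n][limit], chosen
```

Note the two phases mirror the pseudocode: fill the value table, then trace the recorded choices to recover the solution itself.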


Knapsack Algorithm - Complexity

  • What is the time complexity of the knapsack algorithm?
  • O(n · N) (number of items times the weight limit)
  • This algorithm is called pseudo-polynomial
  • Time complexity depends on the value of the input (the limit N), not just its size
  • There is no known polynomial-time algorithm that solves the knapsack problem

Algorithm Strategies - Review

  • Dynamic programming algorithms:
  • Choice is made based on evaluation of all possible results
  • Time and space complexity are usually higher
  • Greedy algorithms:
  • Choice is made based on locally optimal solution
  • Usually faster, but may not result in globally optimal solution
  • Divide and conquer algorithms:
  • Choice of input division is made based on the assumption that merging the results of subproblems is optimal


Global Sequence Alignment Problem

  • Problem: given two sequences, analyse how similar they are
  • Allow both gaps and mismatches
  • Application:
  • Finding suggestions for misspelled words (comparing strings)
  • Comparing files (diff)
  • Analyse if two pieces of DNA match
  • Example: β€œocurrance” vs β€œoccurrence”
  • There is a letter β€œc” missing (gap)
  • An β€œa” was used instead of an β€œe” (mismatch)
  • Mismatches may be seen as gaps on both sides (at different positions)
  • β€œoc-urra-nce” vs β€œoccurr-ence”

Formal Definition

  • We represent a gap with a hyphen "-"
  • A sequence alignment of (Y, Z) is a pair (Y', Z') of sequences, such that:
  • Y' minus the gaps is Y; Z' minus the gaps is Z
  • |Y'| = |Z'| (the size is the same for both sides)
  • If Y'_j = "-", then Z'_j ≠ "-" (you can't have gaps on both sides at the same position)
  • A parameter ε > 0 defines the gap penalty (penalty if one side has a gap)
  • A parameter β_qr defines the mismatch penalty of matching q and r (β_qq = 0)
  • The cost of an alignment (Y', Z') is Σ_{j=1..|Y'|} cost(y'_j, z'_j), where cost is ε if either side is a gap, and β_{y'_j z'_j} otherwise


Finding the Best Alignment

  • What is the choice to be made?
  • The last character could be a gap on either side, or a potential mismatch
  • Assume G(j, k) is the penalty for the best alignment of y_1..y_j and z_1..z_k:

  G(j, k) = k · ε                                                           if j = 0
  G(j, k) = j · ε                                                           if k = 0
  G(j, k) = min{ G(j-1, k-1) + β_{y_j z_k}, G(j-1, k) + ε, G(j, k-1) + ε }  otherwise

Algorithm (Smith-Waterman)

Algorithm SmithWaterman(Y, Z, ε, β)
  For j ← 0 To |Y| Do G[j, 0] ← j · ε, I[j, 0] ← "gap in Z"
  For k ← 1 To |Z| Do G[0, k] ← k · ε, I[0, k] ← "gap in Y"
  For j ← 1 To |Y| Do
    For k ← 1 To |Z| Do
      n ← G[j - 1, k - 1] + β[Y[j], Z[k]]          -- matching cost
      hy ← G[j, k - 1] + ε, hz ← G[j - 1, k] + ε   -- gap penalty in Y, Z
      If n ≤ hy And n ≤ hz Then
        G[j, k] ← n, I[j, k] ← "match"
      Else If hy ≤ hz Then
        G[j, k] ← hy, I[j, k] ← "gap in Y"
      Else
        G[j, k] ← hz, I[j, k] ← "gap in Z"


Algorithm (cont.)

…
Y' ← "", Z' ← ""
j ← |Y|, k ← |Z|
While j > 0 Or k > 0 Do
  If I[j, k] = "match" Then
    Y' ← Y[j] . Y', Z' ← Z[k] . Z'
    j ← j - 1, k ← k - 1
  Else If I[j, k] = "gap in Y" Then
    Y' ← "-" . Y', Z' ← Z[k] . Z'
    k ← k - 1
  Else -- gap in Z
    Y' ← Y[j] . Y', Z' ← "-" . Z'
    j ← j - 1
Return Y', Z', G[|Y|, |Z|]
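The two slides above can be combined into one runnable sketch (Python, with my naming; `beta` is assumed to be a function with `beta(a, a) == 0`, and ties prefer a match, as in the pseudocode):

```python
def align(Y, Z, eps, beta):
    """Global sequence alignment with traceback; returns (Y', Z', cost)."""
    nY, nZ = len(Y), len(Z)
    G = [[0.0] * (nZ + 1) for _ in range(nY + 1)]   # G[j][k]: best penalty
    I = [[None] * (nZ + 1) for _ in range(nY + 1)]  # I[j][k]: choice made
    for j in range(1, nY + 1):
        G[j][0], I[j][0] = j * eps, "gap in Z"
    for k in range(1, nZ + 1):
        G[0][k], I[0][k] = k * eps, "gap in Y"
    for j in range(1, nY + 1):
        for k in range(1, nZ + 1):
            m = G[j - 1][k - 1] + beta(Y[j - 1], Z[k - 1])  # match/mismatch
            gy = G[j][k - 1] + eps                          # gap in Y
            gz = G[j - 1][k] + eps                          # gap in Z
            if m <= gy and m <= gz:
                G[j][k], I[j][k] = m, "match"
            elif gy <= gz:
                G[j][k], I[j][k] = gy, "gap in Y"
            else:
                G[j][k], I[j][k] = gz, "gap in Z"
    # Rebuild the aligned strings from the recorded choices
    Yp, Zp, j, k = "", "", nY, nZ
    while j > 0 or k > 0:
        if I[j][k] == "match":
            Yp, Zp, j, k = Y[j - 1] + Yp, Z[k - 1] + Zp, j - 1, k - 1
        elif I[j][k] == "gap in Y":
            Yp, Zp, k = "-" + Yp, Z[k - 1] + Zp, k - 1
        else:  # gap in Z
            Yp, Zp, j = Y[j - 1] + Yp, "-" + Zp, j - 1
    return Yp, Zp, G[nY][nZ]
```

Filling the table is Θ(|Y| · |Z|); the traceback adds only Θ(|Y| + |Z|).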


Longest Common Subsequence

  • Subsequence: any sequence of items that is contained in the original sequence in the same order (but not necessarily consecutively)
  • Example: ⟨C, D, E, C⟩ is a subsequence of ⟨B, C, D, C, E, B, C⟩
  • Problem: Given two sequences Y and Z, find the longest common subsequence of Y and Z

  • Application:
  • Find common DNA sequences in different organisms
  • Video compression (inter-frame comparison)

Characterizing the LCS

  • Define Y_j as the sequence Y limited to its first j elements
  • Given two sequences Y = ⟨y_1, …, y_n⟩ and Z = ⟨z_1, …, z_m⟩, let A = ⟨a_1, …, a_l⟩ be the longest common subsequence (LCS) of Y and Z
  • If y_n = z_m, then a_l = y_n = z_m, and A_{l-1} is an LCS of Y_{n-1} and Z_{m-1}
  • If y_n ≠ z_m, then A is either an LCS of Y_n and Z_{m-1}, or an LCS of Y_{n-1} and Z_m
  • Define c(j, k), the length of the LCS of Y_j and Z_k, as:

  c(j, k) = 0                               if j = 0 or k = 0
  c(j, k) = c(j-1, k-1) + 1                 if j, k > 0 and y_j = z_k
  c(j, k) = max{ c(j, k-1), c(j-1, k) }     otherwise

Algorithm

Algorithm LongestCommonSubsequence(Y, Z)
  For j ← 0 To |Y| Do c[j, 0] ← 0
  For k ← 1 To |Z| Do c[0, k] ← 0
  For j ← 1 To |Y| Do
    For k ← 1 To |Z| Do
      If Y[j] = Z[k] Then
        c[j, k] ← c[j - 1, k - 1] + 1, h[j, k] ← "+"
      Else If c[j - 1, k] > c[j, k - 1] Then
        c[j, k] ← c[j - 1, k], h[j, k] ← "X"
      Else
        c[j, k] ← c[j, k - 1], h[j, k] ← "Y"
  PrintLCS(h, Y, |Y|, |Z|)
  Return c[|Y|, |Z|]


Algorithm (cont.)

Algorithm PrintLCS(h, Y, j, k)
  If j = 0 Or k = 0 Then Return
  If h[j, k] = "+" Then
    PrintLCS(h, Y, j - 1, k - 1)
    Print Y[j]
  Else If h[j, k] = "X" Then
    PrintLCS(h, Y, j - 1, k)
  Else
    PrintLCS(h, Y, j, k - 1)
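A compact runnable sketch of the same c-table in Python (my naming), which rebuilds the subsequence directly from `c` instead of keeping the separate `h` table:

```python
def lcs(Y, Z):
    """Longest common subsequence by dynamic programming;
    returns the subsequence itself as a string."""
    n, m = len(Y), len(Z)
    # c[j][k] = length of the LCS of Y[:j] and Z[:k]
    c = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, n + 1):
        for k in range(1, m + 1):
            if Y[j - 1] == Z[k - 1]:
                c[j][k] = c[j - 1][k - 1] + 1
            else:
                c[j][k] = max(c[j - 1][k], c[j][k - 1])
    # Rebuild the subsequence by walking the table backwards
    out, j, k = [], n, m
    while j > 0 and k > 0:
        if Y[j - 1] == Z[k - 1]:
            out.append(Y[j - 1])
            j, k = j - 1, k - 1
        elif c[j - 1][k] >= c[j][k - 1]:
            j -= 1
        else:
            k -= 1
    return "".join(reversed(out))
```

On the earlier example, `lcs("BCDCEBC", "CDEC")` recovers `"CDEC"`, since the second string is itself a subsequence of the first.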


NP Complexity


Time Complexity for Decision Problems

  • From this point on we analyse time complexity for problems, not algorithms
  • We want to know the best possible complexity for the problem
  • Our focus now is on decision problems, not optimization problems
  • Decision problems: Yes/No answer
  • Optimization: β€œfind best”, β€œfind maximum”, β€œfind minimum”
  • We also need to distinguish β€œfinding” and β€œchecking” a solution

Time Complexity - Classes

  • A problem is solvable in polynomial time if there is an algorithm that solves it and runs in O(n^k), where k ∈ Θ(1) and n is the size of the input representation
  • Examples: sorting (O(n log n) ⊂ O(n²)), selection (O(n)), longest common subsequence (O(n²)), matrix multiplication (O(n³) or better)
  • P: set of all decision problems that are solvable in polynomial time
  • NP (non-deterministic P): set of all decision problems for which a given certificate can be checked in polynomial time


Example: Hamiltonian Path

  • Problem: given a graph, is there a path that goes through every node exactly once?
  • Decision problem: answer is yes or no
  • Optimization problem: find a path with minimum cost, etc.; not required
  • Is this problem in NP?
  • Given a path, can we verify that the path is correct in polynomial time?
  • Is this problem in P?
  • Can we solve it in polynomial time?
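The "checking in polynomial time" question can be made concrete with a small certificate verifier (a Python sketch; the adjacency-set graph format is my assumption, not from the slides):

```python
def is_hamiltonian_path(graph, path):
    """Check a Hamiltonian-path certificate in polynomial time:
    the path must visit every node exactly once, and consecutive
    nodes must be adjacent.
    graph: dict mapping each node to the set of its neighbours."""
    if len(path) != len(graph) or set(path) != set(graph):
        return False  # must visit every node exactly once
    return all(b in graph[a] for a, b in zip(path, path[1:]))
```

Verifying a given path is easy; no comparably fast way to *find* one is known.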

Example: Satisfiability

  • Given a set of boolean variables y_1, …, y_n and a set of clauses, where each clause is a disjunction of variables or their complements, such as: (y_1 ∨ y_2 ∨ y_4), (y_1 ∨ y_3 ∨ y_4), (y_2 ∨ y_3 ∨ y_5 ∨ y_7)
  • Problem: can we assign True/False values to each variable so that every clause is satisfied?
  • Problem simplification: every clause has at most 3 literals (3-SAT)
  • Is this problem in NP?
  • Is this problem in P?
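Checking a satisfiability certificate is likewise easy. A sketch (Python; the encoding of a literal as a `(variable, negated)` pair is my assumption, not the slides' notation):

```python
def check_sat(clauses, assignment):
    """Verify a truth assignment in polynomial time: every clause must
    contain at least one literal that evaluates to True.
    clauses: list of clauses; a clause is a list of (var, negated) pairs.
    assignment: dict mapping var -> bool."""
    return all(
        any(assignment[var] != negated for var, negated in clause)
        for clause in clauses
    )
```

This runs in time linear in the total number of literals, so SAT is in NP; whether it is in P is exactly the open question on this slide.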

NP Complete Problem

  • Turns out nobody knows!
  • There is no known algorithm that runs in polynomial time
  • There is no proof that such an algorithm doesn’t exist
  • NP complete: set of all decision problems that:
  • Are in NP (solution can be verified in polynomial time)
  • Are at least as hard as any problem in NP
  • NP hard: set of all problems for which the second rule applies
  • Includes non-decision problems (e.g., optimization)

Problem Reduction

  • Sometimes the solution to a problem is to reduce it to another
  • Reduction: given a problem A, reduce it to B by:
  • providing a way to transform the input (instance) of A into a valid input of B in polynomial time
  • providing a translation from the result of B into the result of A
  • proving that this transformation results in a valid solution for A
  • Example: stable matching problem, co-op setting, multiple jobs in a company
  • transform a company with l jobs into l companies with one job each, all with the same preferences
  • each student lists the l companies instead of the one company, in any order
  • in the result, translate matches for these l companies into matches for the original company


NP Complete

  • How do we prove that a problem q is NP complete?
  • Reduce any known NP complete problem q′ into q in polynomial time
  • If q′ is NP complete, and we can solve q′ by reducing it to q, then q is at least as hard as q′


Graph Coloring

  • Graph coloring: can we color all nodes in a graph using at most k colors, such that adjacent nodes (i.e., nodes linked by an edge) have different colors?
  • Application: exam scheduling, map coloring
  • This is in NP, since, given a coloring, we can verify it's valid in O(|E|) time (check every edge once)
  • Is this NP complete?
  • Let’s reduce the satisfiability problem into it
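Membership in NP can again be shown with a certificate checker. A sketch (Python, my naming), which verifies a proposed coloring by scanning every edge once:

```python
def is_valid_coloring(edges, color, k):
    """Verify a k-coloring certificate in O(|E|) time (plus a scan of
    the colors): every node uses one of k colors, and no edge joins
    two nodes of the same color.
    edges: list of (u, v) pairs; color: dict mapping node -> 0..k-1."""
    if any(not (0 <= c < k) for c in color.values()):
        return False
    return all(color[u] != color[v] for u, v in edges)
```

Verifying is fast; deciding whether *any* valid k-coloring exists is the hard direction, which the reduction from satisfiability is meant to establish.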