Lecture 01 – Part 01 Algorithms
Recall DSC 40A...
▶ How do we formalize learning from data?
▶ How do we turn it into something a computer can do?
Example: Predicting Salary
The End
$\vec{w} = (X^T X)^{-1} X^T \vec{b}$
Wait...
▶ We actually need to compute the answer...
▶ We need an algorithm.
An Algorithm?
>>> import numpy as np
>>> w = np.linalg.solve(X.T @ X, X.T @ b)
▶ Will it work for 1,000,000 data points?
▶ What about for 1,000,000 features?
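To make the one-liner above concrete, here is a minimal sketch on synthetic data. The sizes, random seed, noise level, and `true_w` are assumptions chosen only for illustration; they do not come from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n points with d features each (sizes are illustrative).
n, d = 1000, 3
X = rng.normal(size=(n, d))
true_w = np.array([2.0, -1.0, 0.5])
b = X @ true_w + 0.01 * rng.normal(size=n)

# Solve the normal equations (X^T X) w = X^T b without forming an explicit inverse.
w = np.linalg.solve(X.T @ X, X.T @ b)

# np.linalg.lstsq solves the same least-squares problem directly from X and b.
w_lstsq, *_ = np.linalg.lstsq(X, b, rcond=None)
```

Forming `X.T @ X` takes on the order of n·d² operations and solving the resulting d × d system roughly d³, which is one way to start thinking about the two questions on the slide.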
Example: Minimize Error
▶ Goal: summarize a collection of numbers, $x_1, \ldots, x_n$.
▶ Idea: find number $M$ minimizing the total absolute error: $\sum_{i=1}^{n} |M - x_i|$
Example: Minimize Error
▶ Solution: The median of $x_1, \ldots, x_n$.
▶ But how do we actually compute the median?
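One straightforward way to compute a median is to sort and take the middle element. The sketch below does that and checks, on a small made-up list, that no data point gives a smaller total absolute error. The function names and the sample numbers are illustrative assumptions, and sorting is only one possible approach, not necessarily the one the course has in mind.

```python
def total_absolute_error(M, xs):
    """Sum of |M - x| over all numbers x in xs."""
    return sum(abs(M - x) for x in xs)

def median_by_sorting(xs):
    """Return a middle element of the sorted list.

    For an even number of values this returns the upper of the two
    middle values, which also minimizes the total absolute error.
    """
    ordered = sorted(xs)
    return ordered[len(ordered) // 2]

xs = [3, 9, 1, 7, 5]  # hypothetical data
M = median_by_sorting(xs)

# Compare against the error obtained by using each data point as the summary.
best_error = min(total_absolute_error(c, xs) for c in xs)
```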
Lecture 01 – Part 02 Example: Clustering
Clustering
▶ Given a pile of data, discover similar groups.
▶ Examples:
▶ Find political groups within social network data.
▶ Given data on COVID-19 symptoms, discover groups that are affected differently.
▶ Find the similar regions of an image (segmentation).
▶ Most useful when data is high dimensional...
Example: Old Faithful
Clustering
▶ Goal: for computer to identify the two groups in the data.
Clustering
▶ How do we turn this into something a computer can do?
▶ DSC 40A says: “Turn it into an optimization problem”.
▶ Idea: develop a way of quantifying the “goodness” of a clustering; find the best.
Quantifying Separation
Define the “separation” $\delta(B, R)$ to be the smallest distance between a blue point and a red point.
The Problem
▶ Given: $n$ points $\vec{x}^{(1)}, \ldots, \vec{x}^{(n)}$.
▶ Find: an assignment of points to clusters R and B so as to maximize $\delta(B, R)$.
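The separation of one particular assignment can be computed directly from its definition. The sketch below assumes Euclidean distance (the slides do not name a distance) and uses a hypothetical function name, `separation`, with made-up points.

```python
import math

def separation(blue, red):
    """Smallest Euclidean distance between a blue point and a red point.

    Assumes both clusters are non-empty and every point is a tuple of
    coordinates with the same dimension.
    """
    return min(math.dist(p, q) for p in blue for q in red)

# Hypothetical 2-D points.
blue = [(0.0, 0.0), (1.0, 0.0)]
red = [(4.0, 0.0), (5.0, 4.0)]
sep = separation(blue, red)
```

Here the closest blue/red pair is (1, 0) and (4, 0), so `sep` is 3.0.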
The End
The “Brute Force” Algorithm
▶ There are finitely-many possible clusterings.
▶ Algorithm: Try each possible clustering, return that with largest separation, $\delta(B, R)$.
▶ This is called a brute force algorithm.
best_separation = float('-inf')  # Python for "negative infinity"
best_clustering = None
for clustering in all_clusterings(data):
    sep = calculate_separation(clustering)
    if sep > best_separation:  # keep the clustering with the largest separation
        best_separation = sep
        best_clustering = clustering
print(best_clustering)
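The slide names `all_clusterings` and `calculate_separation` without defining them. The sketch below fills them in with hypothetical implementations so the loop can actually be run: clusterings are enumerated with `itertools.product`, distance is assumed to be Euclidean, and assignments that leave a cluster empty are skipped. Because the goal is to maximize separation, the search starts from negative infinity and keeps the larger value.

```python
import math
from itertools import product

def all_clusterings(data):
    """Yield every assignment of the points to two non-empty clusters (blue, red)."""
    for labels in product("BR", repeat=len(data)):
        blue = [p for p, lab in zip(data, labels) if lab == "B"]
        red = [p for p, lab in zip(data, labels) if lab == "R"]
        if blue and red:  # separation is undefined when a cluster is empty
            yield blue, red

def calculate_separation(clustering):
    """Smallest Euclidean distance between a blue point and a red point."""
    blue, red = clustering
    return min(math.dist(p, q) for p in blue for q in red)

def brute_force_clustering(data):
    """Try every clustering and return the one with the largest separation."""
    best_separation = float("-inf")
    best_clustering = None
    for clustering in all_clusterings(data):
        sep = calculate_separation(clustering)
        if sep > best_separation:
            best_separation = sep
            best_clustering = clustering
    return best_clustering, best_separation

# Hypothetical data: two well-separated pairs of points.
data = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
clustering, sep = brute_force_clustering(data)
```

On these four points the search puts the left pair in one cluster and the right pair in the other. The loop body runs once per assignment, so its cost doubles with every added point.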
The End
Wait...
▶ How long will this take to run if there are $n$ points?
▶ How many clusterings of $n$ things are there?
Combinatorics
▶ How many ways are there to assign R or B to $n$ objects?
▶ Two choices¹ for each object: $2 \times 2 \times \ldots \times 2 = 2^n$.
¹ Small nitpick: actual color doesn’t matter, $2^{n-1}$.
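Both counts can be checked by enumeration for small n. In this sketch the function names are hypothetical, and an assignment and its color-swapped twin are treated as the same clustering, which is the nitpick in the footnote.

```python
from itertools import product

def count_assignments(n):
    """Number of ways to assign R or B to n objects."""
    return sum(1 for _ in product("RB", repeat=n))

def count_distinct_clusterings(n):
    """Number of assignments when swapping the two colors does not count as new."""
    seen = set()
    for labels in product("RB", repeat=n):
        swapped = tuple("B" if lab == "R" else "R" for lab in labels)
        # Store one canonical representative of each {labels, swapped} pair.
        seen.add(min(labels, swapped))
    return len(seen)

counts = [(n, count_assignments(n), count_distinct_clusterings(n)) for n in range(1, 9)]
```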
Time
▶ Suppose it takes at least 1 nanosecond to check a single clustering.
▶ One billionth of a second.
▶ If there are $n$ points, it will take at least $2^n$ nanoseconds to check all clusterings.
Time Needed
$n$     Time
1     1 nanosecond
10    1 microsecond
20    1 millisecond
30    1 second
40    18 minutes
50    13 days
60    36 years
70    37,000 years
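The entries in this table follow from converting 2^n nanoseconds into larger units. The sketch below redoes that arithmetic under the slide's assumption of 1 nanosecond per clustering. The helper names and the 365.25-day year are assumptions made here, so the results are approximate.

```python
# Unit sizes in seconds, largest first.
UNITS = [
    ("years", 365.25 * 24 * 3600),
    ("days", 24 * 3600),
    ("hours", 3600),
    ("minutes", 60),
    ("seconds", 1),
    ("milliseconds", 1e-3),
    ("microseconds", 1e-6),
]

def brute_force_seconds(n, ns_per_clustering=1):
    """Seconds needed to check all 2**n assignments of n points."""
    return (2 ** n) * ns_per_clustering / 1e9

def humanize(seconds):
    """Express a duration in the largest unit that is at least 1."""
    for unit, size in UNITS:
        if seconds >= size:
            return f"{seconds / size:.3g} {unit}"
    return f"{seconds * 1e9:.3g} nanoseconds"

for n in (1, 10, 20, 30, 40, 50, 60, 70):
    print(n, humanize(brute_force_seconds(n)))
```

Each additional 10 points multiplies the running time by 2^10 = 1024, which is why the units jump by roughly a factor of a thousand from row to row.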
Example: Old Faithful
▶ The Old Faithful data set has 270 points.
▶ Brute force algorithm will finish in $6 \times 10^{64}$ years.
Algorithm Design
▶ Often, most obvious algorithm is unusably slow.
▶ Does this mean our problem is too hard?
▶ We’ll see an efficient solution by the end of the quarter.
DSC 40B
▶ Assess the efficiency of algorithms.
▶ Understand why and how common algorithms work.
▶ Develop faster algorithms using design strategies and data structures.