Jan 06, 2024

SLIDE 1

Numpy: Vectorize your brain

Ekaterina Tuzova

SLIDE 2

K nearest neighbors

https://archive.ics.uci.edu/ml/datasets/Wine

SLIDE 3

NumPy

SLIDE 4

What is NumPy?

NumPy is the fundamental package for scientific computing with Python.

SLIDE 5

IPython

SLIDE 6

Python and Performance

SLIDE 7

Python is fast

SLIDE 8

Python is slow

SLIDE 9

Euclidean distance
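
A minimal pure-Python sketch of a loop-based Euclidean distance, the kind of baseline that is worth timing and profiling before vectorizing. The function name `euclidean_distance` and the sample points are assumptions for illustration.

```python
import math

def euclidean_distance(x, y):
    """Loop-based Euclidean distance between two equal-length sequences."""
    total = 0.0
    for a, b in zip(x, y):
        total += (a - b) ** 2   # accumulate squared differences one pair at a time
    return math.sqrt(total)

print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

Every iteration of the loop runs in the interpreter, which is where the cost comes from on large inputs.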

SLIDE 10

“Magic” timeit

SLIDE 11

Euclidean distance. C

SLIDE 12

Euclidean distance. C

SLIDE 13

Euclidean distance

SLIDE 14

line_profiler and “magic” lprun

SLIDE 15

Euclidean distance

SLIDE 16

Compiled languages

SLIDE 17

Interpreted languages

SLIDE 18

What can be done?

SLIDE 19

NumPy

SLIDE 20

Ufuncs

SLIDE 21

Universal functions

A special type of function defined in the NumPy library that operates element-wise on arrays.
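
A short sketch of what "element-wise" means in practice, using two ufuncs that ship with NumPy (`np.sqrt` and `np.add`). The array values are arbitrary and chosen only for illustration.

```python
import numpy as np

x = np.array([1.0, 4.0, 9.0])

# np.sqrt and np.add are ufuncs: they act on every element without a Python loop.
roots = np.sqrt(x)
sums = np.add(x, 10)   # equivalent to x + 10

print(isinstance(np.sqrt, np.ufunc))  # True
print(roots)
print(sums)
```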

SLIDE 22

Arithmetic operations

SLIDE 23

Arithmetic operations

SLIDE 24

Arithmetic operations

SLIDE 25

Arithmetic operations

SLIDE 26

Ufuncs available

Arithmetic
Bitwise
Comparison
Trigonometric
Floating

…

SLIDE 27

Slicing and indexing

SLIDE 28

Slicing and indexing

SLIDE 29

Slicing and indexing

SLIDE 30

Multidimensional arrays

SLIDE 31

Multidimensional arrays

SLIDE 32

Index arrays

SLIDE 33

Index arrays

SLIDE 34

Index arrays

SLIDE 35

Masking

SLIDE 36

Masking

SLIDE 37

Test train split

SLIDE 38

Test train split
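
A minimal sketch of one way to split data with index arrays: shuffle the row indices once, then use the two halves of that index array to select rows. The toy arrays stand in for a real dataset such as Wine, and all names (`X`, `y`, `n_test`) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(20).reshape(10, 2)   # 10 toy samples, 2 features each
y = np.arange(10)                  # toy labels

indices = rng.permutation(len(X))  # shuffled row indices
n_test = 3
test_idx, train_idx = indices[:n_test], indices[n_test:]

# Index arrays select the same rows from the features and the labels.
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```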

SLIDE 39

Broadcasting

SLIDE 40

Broadcasting

Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations.

SLIDE 41

Broadcasting rules

1. If two arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with ones on its leading (left) side.
2. If the shapes of two arrays do not match in some dimension, the array with size 1 in that dimension is stretched to match the other shape.
3. If in any dimension the sizes differ and neither is equal to 1, NumPy raises ValueError: operands could not be broadcast together with shapes
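
The rules above can be checked on two small cases, one that broadcasts and one that does not. The shapes are arbitrary and chosen only for illustration.

```python
import numpy as np

a = np.ones((3, 4))

b = np.arange(4)        # shape (4,) is padded to (1, 4), then stretched to (3, 4)
print((a + b).shape)    # (3, 4)

c = np.arange(3)        # shape (3,) is padded to (1, 3); 3 != 4 and neither is 1
try:
    a + c
except ValueError as err:
    print("ValueError:", err)
```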

SLIDE 42

Broadcasting. Example

SLIDE 43

np.newaxis

SLIDE 44

np.newaxis

SLIDE 45

np.newaxis

SLIDE 46

Aggregations

SLIDE 47

Aggregations

SLIDE 48

Aggregations
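
A small sketch of aggregations with and without the `axis` argument. The matrix is an arbitrary example.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())            # 21, over all elements
print(m.sum(axis=0))      # [5 7 9], one value per column
print(m.min(axis=1))      # [1 4], one value per row
print(m.argmax(axis=1))   # [2 2], position of the largest value in each row
```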

SLIDE 49

NumPy summary

Basic ideas to make your code faster:

Ufuncs
Slicing and indexing
Broadcasting
Aggregations

SLIDE 50

k-means

SLIDE 51

Algorithm

1. Cluster the data into k groups, where k is predefined.
2. Select k points at random as cluster centers.
3. Assign objects to their closest cluster center according to the Euclidean distance function.
4. Calculate the centroid or mean of all objects in each cluster.
5. Repeat steps 3 and 4 until the same points are assigned to each cluster in consecutive rounds.
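
The steps above can be sketched in NumPy roughly as follows. This is one possible implementation under stated assumptions, not necessarily the one shown in the talk: the function name `kmeans`, the `max_iter` cap, and the choice to leave an empty cluster's center unchanged are all illustrative.

```python
import numpy as np

def kmeans(X, k, rng, max_iter=100):
    """Plain k-means on an (n, d) array; returns (centers, labels)."""
    X = np.asarray(X, dtype=float)
    # Step 2: pick k distinct data points as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 3: squared distances via broadcasting, shape (n, k).
        d2 = ((X[:, np.newaxis, :] - centers[np.newaxis, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        # Step 5: stop when assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the mean of its points
        # (an empty cluster keeps its previous center).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels

# Two well-separated synthetic blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)),
               rng.normal(5.0, 0.1, size=(20, 2))])
centers, labels = kmeans(X, k=2, rng=rng)
print(centers)
```

Only the loop over the k clusters remains in Python; the distance computation and the assignment step each run as a single array expression.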

SLIDE 52

Synthetic data

SLIDE 53

SLIDE 54

Vectorized Euclidean distance
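
A minimal sketch of a vectorized distance computation that combines `np.newaxis`, broadcasting, ufuncs and an aggregation. The function name `pairwise_distances` and the sample points are assumptions for illustration.

```python
import numpy as np

def pairwise_distances(A, B):
    """Euclidean distances between every row of A (n, d) and every row of B (m, d)."""
    diff = A[:, np.newaxis, :] - B[np.newaxis, :, :]   # shape (n, m, d) via broadcasting
    return np.sqrt((diff ** 2).sum(axis=2))            # shape (n, m)

A = np.array([[0.0, 0.0], [3.0, 4.0]])
B = np.array([[0.0, 0.0], [6.0, 8.0], [3.0, 0.0]])
D = pairwise_distances(A, B)
print(D)
```

The intermediate array has shape (n, m, d), so this trades memory for speed; for very large inputs it may be worth processing the rows in chunks.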

SLIDE 55

k-means

SLIDE 56

SLIDE 57

Thank you.

@ktisha