Numpy: Vectorize your brain


slide-1
SLIDE 1

Numpy: Vectorize your brain

Ekaterina Tuzova

slide-2
SLIDE 2

K nearest neighbors

https://archive.ics.uci.edu/ml/datasets/Wine

slide-3
SLIDE 3

NumPy

slide-4
SLIDE 4

What is NumPy?

NumPy is the fundamental package for scientific computing with Python.

slide-5
SLIDE 5

IPython

slide-6
SLIDE 6

Python and Performance

slide-7
SLIDE 7

Python is fast

slide-8
SLIDE 8

Python is slow

slide-9
SLIDE 9

Euclidean distance
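
A minimal pure-Python sketch of the distance that k nearest neighbors relies on. The function name `euclidean_distance` and the sample points are illustrative assumptions, not taken from the slides.

```python
import math


def euclidean_distance(x, y):
    """Euclidean distance between two equal-length sequences, using a plain loop."""
    total = 0.0
    for a, b in zip(x, y):
        diff = a - b
        total += diff * diff
    return math.sqrt(total)


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```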

slide-10
SLIDE 10

“Magic” timeit
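
In IPython, `%timeit` measures how long a statement takes. The sketch below does roughly the same with the standard-library `timeit` module so that it runs outside IPython; the function and input sizes are assumptions chosen for illustration.

```python
import math
import timeit


def euclidean_distance(x, y):
    # Plain-Python version: one interpreted operation per element.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


x = [float(i) for i in range(1000)]
y = [float(i + 1) for i in range(1000)]

# In an IPython session the equivalent is: %timeit euclidean_distance(x, y)
runs = 100
seconds = timeit.timeit(lambda: euclidean_distance(x, y), number=runs)
print(f"{seconds / runs * 1e6:.1f} microseconds per call")
```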

slide-11
SLIDE 11

Euclidean distance in C

slide-12
SLIDE 12

Euclidean distance in C

slide-13
SLIDE 13

Euclidean distance

slide-14
SLIDE 14

line_profiler and “magic” lprun

slide-15
SLIDE 15

Euclidean distance

slide-16
SLIDE 16

Compiled languages

slide-17
SLIDE 17

Interpreted languages

slide-18
SLIDE 18

What can be done?

slide-19
SLIDE 19

NumPy

slide-20
SLIDE 20

Ufuncs

slide-21
SLIDE 21

Universal functions

A universal function (ufunc) is a special type of function defined within the NumPy library that operates element-wise on arrays.

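
As a small illustration of element-wise operation, the sketch below compares a Python-level loop with the `np.add` ufunc. The array values are made up for the example.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

# Loop version: one Python-level addition per element.
looped = np.array([x + y for x, y in zip(a, b)])

# Ufunc version: np.add performs the same loop in compiled code.
# The operator form a + b dispatches to the same ufunc.
vectorized = np.add(a, b)

print(vectorized)  # [11. 22. 33.]
```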
slide-22
SLIDE 22

Arithmetic operations

slide-23
SLIDE 23

Arithmetic operations

slide-24
SLIDE 24

Arithmetic operations

slide-25
SLIDE 25

Arithmetic operations

slide-26
SLIDE 26

Ufuncs available

  • Arithmetic
  • Bitwise
  • Comparison
  • Trigonometric
  • Floating
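
One ufunc from each of the categories above, as a rough sketch. The choice of functions and inputs is an assumption made for illustration.

```python
import numpy as np

x = np.array([1, 2, 3, 4])

arithmetic = np.multiply(x, 2)                      # same as x * 2
bitwise = np.bitwise_and(x, 1)                      # same as x & 1
comparison = np.greater(x, 2)                       # same as x > 2
trigonometric = np.sin(np.array([0.0, np.pi / 2]))  # approximately [0, 1]
floating = np.isfinite(np.array([1.0, np.inf, np.nan]))

print(arithmetic)  # [2 4 6 8]
print(bitwise)     # [1 0 1 0]
print(comparison)  # [False False  True  True]
print(floating)    # [ True False False]
```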

slide-27
SLIDE 27

Slicing and indexing

slide-28
SLIDE 28

Slicing and indexing

slide-29
SLIDE 29

Slicing and indexing
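
A short sketch of basic slicing on a one-dimensional array, with made-up values. It also shows that a basic slice is a view onto the original data rather than a copy.

```python
import numpy as np

a = np.arange(10)        # 0, 1, ..., 9

first_three = a[:3]      # elements 0, 1, 2
every_other = a[::2]     # elements 0, 2, 4, 6, 8
last_two = a[-2:]        # elements 8, 9
reversed_a = a[::-1]     # elements 9, 8, ..., 0

# Basic slices are views: writing through the slice changes the original.
b = np.arange(5)
view = b[1:3]
view[0] = 100
print(b[1])  # 100
```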

slide-30
SLIDE 30

Multidimensional arrays

slide-31
SLIDE 31

Multidimensional arrays
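
The same slicing syntax extends to several dimensions, with one index or slice per axis. A minimal sketch on an assumed 3 x 4 array:

```python
import numpy as np

m = np.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]

element = m[1, 2]       # row 1, column 2
second_row = m[1, :]    # whole row 1
third_col = m[:, 2]     # whole column 2
block = m[:2, 1:3]      # rows 0-1, columns 1-2

print(m.shape)          # (3, 4)
print(element)          # 6
```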

slide-32
SLIDE 32

Index arrays

slide-33
SLIDE 33

Index arrays

slide-34
SLIDE 34

Index arrays
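
Indexing with an integer array picks elements in the order given by that array, and repeats are allowed. A small sketch with assumed values:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])
idx = np.array([3, 0, 3])
picked = a[idx]             # elements at positions 3, 0, 3

m = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])
cols = np.array([1, 3])
pairs = m[rows, cols]       # elements m[0, 1] and m[2, 3]

print(picked)  # [40 10 40]
print(pairs)   # [ 1 11]
```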

slide-35
SLIDE 35

Masking

slide-36
SLIDE 36

Masking
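
A boolean array of the same shape can be used as a mask to select or overwrite elements. A minimal sketch, assuming a small made-up array:

```python
import numpy as np

a = np.array([3, -1, 4, -1, 5, -9])

mask = a > 0              # True where the element is positive
positives = a[mask]       # keep only the selected elements

clipped = a.copy()
clipped[~mask] = 0        # overwrite the non-selected elements

count = int(mask.sum())   # True counts as 1

print(positives)  # [3 4 5]
print(clipped)    # [3 0 4 0 5 0]
print(count)      # 3
```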

slide-37
SLIDE 37

Test train split

slide-38
SLIDE 38

Test train split
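
One way to split data into train and test parts with index arrays is to shuffle the row indices and slice them. The sketch below uses random stand-in data rather than the Wine dataset; the names `X`, `y`, `test_fraction`, and the 30% split are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 10 samples with 3 features and a binary label.
X = rng.normal(size=(10, 3))
y = rng.integers(0, 2, size=10)

test_fraction = 0.3
n_test = round(test_fraction * len(X))

# Shuffle the row indices once, then cut them into two groups.
indices = rng.permutation(len(X))
test_idx, train_idx = indices[:n_test], indices[n_test:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape, X_test.shape)  # (7, 3) (3, 3)
```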

slide-39
SLIDE 39

Broadcasting

slide-40
SLIDE 40

Broadcasting

Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations.

slide-41
SLIDE 41

Broadcasting rules

  • 1. If the two arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with ones on its leading (left) side.

  • 2. If the shapes of the two arrays do not match in some dimension, the array with size 1 in that dimension is stretched to match the other shape.

  • 3. If the sizes disagree in some dimension and neither is equal to 1, NumPy raises ValueError: operands could not be broadcast together with shapes
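
The three rules can be checked on small arrays. This is a sketch with assumed shapes; only the resulting shapes and the error matter here.

```python
import numpy as np

a = np.ones((3, 4))
b = np.arange(4)                     # shape (4,)

# Rule 1: b is treated as shape (1, 4).
# Rule 2: its length-1 axis is stretched to 3, giving (3, 4).
print((a + b).shape)                 # (3, 4)

col = np.arange(3).reshape(3, 1)     # shape (3, 1)
row = np.arange(4).reshape(1, 4)     # shape (1, 4)
print((col + row).shape)             # (3, 4), both operands are stretched

# Rule 3: trailing sizes 4 and 3 differ and neither is 1.
error_message = None
try:
    np.ones((3, 4)) + np.arange(3)
except ValueError as err:
    error_message = str(err)
print(error_message)
```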

slide-42
SLIDE 42

Broadcasting: example

slide-43
SLIDE 43

np.newaxis

slide-44
SLIDE 44

np.newaxis

slide-45
SLIDE 45

np.newaxis
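
`np.newaxis` inserts an axis of length 1, which is often what makes two arrays broadcastable. A minimal sketch with an assumed vector:

```python
import numpy as np

v = np.array([1, 2, 3])       # shape (3,)

column = v[:, np.newaxis]     # shape (3, 1)
row = v[np.newaxis, :]        # shape (1, 3)

# (3, 1) * (1, 3) broadcasts to (3, 3): every pairwise product.
table = column * row

print(column.shape, row.shape, table.shape)  # (3, 1) (1, 3) (3, 3)
```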

slide-46
SLIDE 46

Aggregations

slide-47
SLIDE 47

Aggregations

slide-48
SLIDE 48

Aggregations
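
Aggregations reduce an array along one axis or over all elements. A short sketch on an assumed 2 x 3 array:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()                    # over all elements
col_sums = m.sum(axis=0)           # collapse rows, one value per column
row_means = m.mean(axis=1)         # collapse columns, one value per row
largest = m.max()
argmin_per_row = m.argmin(axis=1)  # position of the minimum in each row

print(total)      # 21
print(col_sums)   # [5 7 9]
print(row_means)  # [2. 5.]
```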

slide-49
SLIDE 49

NumPy summary

Basic ideas to make your code faster:

  • Ufuncs
  • Slicing and indexing
  • Broadcasting
  • Aggregations
slide-50
SLIDE 50

k-means

slide-51
SLIDE 51

Algorithm

  • 1. The goal is to cluster the data into k groups, where k is predefined.

  • 2. Select k points at random as the initial cluster centers.
  • 3. Assign each object to its closest cluster center according to the Euclidean distance function.

  • 4. Calculate the centroid (mean) of all objects in each cluster and use it as the new cluster center.

  • 5. Repeat steps 3 and 4 until the same points are assigned to each cluster in consecutive rounds.
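
The steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the code from the slides: the function name `kmeans`, the `max_iter` cap, the handling of empty clusters, and the two synthetic blobs are all assumptions.

```python
import numpy as np


def kmeans(points, k, rng, max_iter=100):
    """Return (centers, labels) for an (n, d) float array of points."""
    # Step 2: pick k distinct data points as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Step 3: squared distance from every point to every center, shape (n, k).
        diff = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
        new_labels = (diff ** 2).sum(axis=2).argmin(axis=1)
        # Step 5: stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 4: move each center to the mean of its assigned points.
        # An empty cluster keeps its previous center (an assumption of this sketch).
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels


# Two well-separated synthetic blobs of 50 points each.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=0.0, scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=10.0, scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

centers, labels = kmeans(points, k=2, rng=rng)
print(centers.shape, labels.shape)  # (2, 2) (100,)
```

Comparing squared distances is enough for the assignment step, since the square root does not change which center is closest.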

slide-52
SLIDE 52

Synthetic data

slide-53
SLIDE 53
slide-54
SLIDE 54

Vectorized Euclidean distance
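
One way to vectorize the distance is to combine `np.newaxis`, broadcasting, a ufunc, and an aggregation. The function name `pairwise_distances` and the sample points are assumptions made for this sketch.

```python
import numpy as np


def pairwise_distances(a, b):
    """Distances between every row of a (n, d) and every row of b (m, d)."""
    # (n, 1, d) - (1, m, d) broadcasts to (n, m, d).
    diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]
    # Sum the squared differences over the feature axis, giving (n, m).
    return np.sqrt((diff ** 2).sum(axis=2))


a = np.array([[0.0, 0.0], [1.0, 1.0]])
b = np.array([[3.0, 4.0]])

distances = pairwise_distances(a, b)
print(distances.shape)  # (2, 1)
```

The intermediate array has n * m * d elements, so for large inputs the memory cost is worth keeping in mind.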

slide-55
SLIDE 55

k-means

slide-56
SLIDE 56
slide-57
SLIDE 57

Thank you.

@ktisha