SLIDE 1
Numpy: Vectorize your brain
Ekaterina Tuzova
SLIDE 2 K nearest neighbors
https://archive.ics.uci.edu/ml/datasets/Wine
SLIDE 3
NumPy
SLIDE 4
What is NumPy?
NumPy is the fundamental package for scientific computing with Python.
SLIDE 5
IPython
SLIDE 6
Python and Performance
SLIDE 7
Python is fast
SLIDE 8
Python is slow
SLIDE 9
Euclidean distance
SLIDE 10
“Magic” timeit
SLIDE 11
Euclidean distance. C
SLIDE 12
Euclidean distance. C
SLIDE 13
Euclidean distance
SLIDE 14
line_profiler and “magic” lprun
SLIDE 15
Euclidean distance
SLIDE 16
Compiled languages
SLIDE 17
Interpreted languages
SLIDE 18
What can be done?
SLIDE 19
NumPy
SLIDE 20
Ufuncs
SLIDE 21 Universal functions
A special type of function defined in the NumPy library that operates element-wise on arrays.
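As a minimal sketch of what "element-wise" means here, the snippet below applies a few ufuncs to small arrays. The array values and variable names are illustrative assumptions, not taken from the talk.

```python
import numpy as np

# Illustrative data (an assumption, not from the slides).
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([10.0, 20.0, 30.0, 40.0])

# Arithmetic operators on arrays dispatch to ufuncs.
total = a + b        # equivalent to np.add(a, b)
scaled = a * 2       # equivalent to np.multiply(a, 2)
roots = np.sqrt(a)   # unary ufunc applied to every element

print(total)
print(scaled)
print(roots)
```

Each call processes the whole array in compiled code, so no explicit Python loop over the elements is needed.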
SLIDE 22
Arithmetic operations
SLIDE 23
Arithmetic operations
SLIDE 24
Arithmetic operations
SLIDE 25
Arithmetic operations
SLIDE 26 Ufuncs available
- Arithmetic
- Bitwise
- Comparison
- Trigonometric
- Floating
…
SLIDE 27
Slicing and indexing
SLIDE 28
Slicing and indexing
SLIDE 29
Slicing and indexing
SLIDE 30
Multidimensional arrays
SLIDE 31
Multidimensional arrays
SLIDE 32
Index arrays
SLIDE 33
Index arrays
SLIDE 34
Index arrays
SLIDE 35
Masking
SLIDE 36
Masking
SLIDE 37
Test train split
SLIDE 38
Test train split
SLIDE 39
Broadcasting
SLIDE 40
Broadcasting
Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations.
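A small sketch of this behaviour, assuming two toy arrays whose shapes differ: a row of shape (3,) and a column of shape (3, 1). The names `row`, `col`, and `grid` are hypothetical.

```python
import numpy as np

row = np.arange(3)                 # shape (3,)
col = np.arange(3).reshape(3, 1)   # shape (3, 1)

# A scalar is combined with every element of the array.
shifted = row + 5

# Shapes (3, 1) and (3,) are combined into a (3, 3) result
# where grid[i, j] == col[i, 0] + row[j].
grid = col + row

print(shifted)
print(grid.shape)
```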
SLIDE 41 Broadcasting rules
- 1. If the two arrays differ in their number of dimensions,
the shape of the array with fewer dimensions is padded with ones on its leading (left) side.
- 2. If the shapes of the two arrays do not match in some
dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
- 3. If in any dimension the sizes disagree and neither is equal to 1, NumPy raises a
ValueError: operands could not be broadcast together with shapes
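The three rules can be traced on a concrete case. This is only a sketch: the shapes (2, 3), (3,), and (2,) are chosen for illustration and do not come from the slides.

```python
import numpy as np

m = np.ones((2, 3))
v = np.arange(3)   # shape (3,)

# Rule 1: v's shape (3,) is padded on the left and treated as (1, 3).
# Rule 2: the size-1 dimension is stretched to 2, so the result is (2, 3).
ok = m + v
print(ok.shape)

# Rule 3: w's shape (2,) is treated as (1, 2); the last dimension is
# 3 versus 2 and neither is 1, so the operation fails.
w = np.arange(2)
try:
    m + w
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```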
SLIDE 43
np.newaxis
SLIDE 44
np.newaxis
SLIDE 45
np.newaxis
SLIDE 46
Aggregations
SLIDE 47
Aggregations
SLIDE 48
Aggregations
SLIDE 49 NumPy summary
Basic ideas to make your code faster:
- Ufuncs
- Slicing and indexing
- Broadcasting
- Aggregations
SLIDE 50
k-means
SLIDE 51 Algorithm
- 1. Cluster the data into k groups, where k is
predefined.
- 2. Select k points at random as cluster centers.
- 3. Assign objects to their closest cluster center
according to the Euclidean distance function.
- 4. Calculate the centroid or mean of all objects
in each cluster.
- 5. Repeat steps 3 and 4 until the same points
are assigned to each cluster in consecutive rounds.
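The steps above can be sketched with the NumPy tools covered earlier (broadcasting, masking, aggregations). This is a minimal illustration, not the code from the talk: the function name `kmeans`, its parameters, the handling of empty clusters, and the synthetic two-blob data are all assumptions.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Hypothetical helper: plain k-means on an (n, d) array."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Pick k distinct data points at random as the initial centers.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Squared Euclidean distance from every point to every center,
        # computed with broadcasting: (n, 1, d) - (1, k, d) -> (n, k, d).
        diff = points[:, np.newaxis, :] - centers[np.newaxis, :, :]
        dist2 = (diff ** 2).sum(axis=2)
        new_labels = dist2.argmin(axis=1)
        # Stop when the assignment did not change since the last round.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Recompute each center as the mean of its members
        # (an empty cluster keeps its previous center: an assumption).
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels

# Synthetic data: two Gaussian blobs (illustrative values).
data_rng = np.random.default_rng(42)
data = np.vstack([
    data_rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    data_rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
centers, labels = kmeans(data, k=2)
print(centers.shape, labels.shape)
```

Squared distances are enough for the assignment step because the square root does not change which center is closest.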
SLIDE 52
Synthetic data
SLIDE 53
SLIDE 54
Vectorized Euclidean distance
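The slide's own code is not preserved in this outline, so the following is only a sketch of the idea: a pure-Python loop next to a vectorized version built from ufuncs and an aggregation, plus a pairwise variant using np.newaxis. The function names are hypothetical.

```python
import math
import numpy as np

def euclidean_loop(x, y):
    # Pure-Python baseline: one interpreted iteration per coordinate.
    total = 0.0
    for xi, yi in zip(x, y):
        total += (xi - yi) ** 2
    return math.sqrt(total)

def euclidean_vectorized(x, y):
    # Subtraction and squaring are ufuncs; sum() is an aggregation.
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.sqrt((diff ** 2).sum())

def pairwise_distances(a, b):
    # a has shape (n, d), b has shape (m, d); broadcasting gives (n, m, d).
    diff = a[:, np.newaxis, :] - b[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

x = np.array([0.0, 0.0])
y = np.array([3.0, 4.0])
print(euclidean_loop(x, y), euclidean_vectorized(x, y))
```

In IPython, the two versions can be compared on larger arrays with the `%timeit` magic mentioned earlier in the talk.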
SLIDE 55
k-means
SLIDE 56
SLIDE 57
Thank you.
@ktisha