Indexing DataFrames: MANIPULATING DATAFRAMES WITH PANDAS


SLIDE 1

Indexing DataFrames

MANIPULATING DATAFRAMES WITH PANDAS

Anaconda

Instructor

SLIDE 2

MANIPULATING DATAFRAMES WITH PANDAS

A simple DataFrame

import pandas as pd
df = pd.read_csv('sales.csv', index_col='month')
df
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20
May     132   NaN    52
Jun     205  60.0    55

SLIDE 3

MANIPULATING DATAFRAMES WITH PANDAS

Indexing using square brackets

df
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20
May     132   NaN    52
Jun     205  60.0    55
df['salt']['Jan']
12.0

SLIDE 4

MANIPULATING DATAFRAMES WITH PANDAS

Using column attribute and row label

df
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20
May     132   NaN    52
Jun     205  60.0    55
df.eggs['Mar']
221

SLIDE 5

MANIPULATING DATAFRAMES WITH PANDAS

Using the .loc accessor

df
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20
May     132   NaN    52
Jun     205  60.0    55
df.loc['May', 'spam']
52

SLIDE 6

MANIPULATING DATAFRAMES WITH PANDAS

Using the .iloc accessor

df
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20
May     132   NaN    52
Jun     205  60.0    55
df.iloc[4, 2]
52
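The four scalar lookups shown so far can be compared side by side. This is a minimal sketch that assumes the sales data can be rebuilt inline from the table on the slides, since sales.csv itself is not available here; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the slide's table inline instead of reading sales.csv.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month"),
)

by_brackets = df["salt"]["Jan"]    # column first, then row label
by_attribute = df.eggs["Mar"]      # column as attribute, then row label
by_label = df.loc["May", "spam"]   # row label and column label together
by_position = df.iloc[4, 2]        # row position 4, column position 2

print(by_brackets, by_attribute, by_label, by_position)
```

Here .loc and .iloc address the same cell, once by label and once by zero-based position.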

SLIDE 7

MANIPULATING DATAFRAMES WITH PANDAS

Selecting only some columns

df_new = df[['salt','eggs']]
df_new
       salt  eggs
month
Jan    12.0    47
Feb    50.0   110
Mar    89.0   221
Apr    87.0    77
May     NaN   132
Jun    60.0   205

SLIDE 8

Let's practice!

MANIPULATING DATAFRAMES WITH PANDAS

SLIDE 9

Slicing DataFrames

MANIPULATING DATAFRAMES WITH PANDAS

Anaconda

Instructor

SLIDE 10

MANIPULATING DATAFRAMES WITH PANDAS

sales DataFrame

df
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20
May     132   NaN    52
Jun     205  60.0    55

SLIDE 11

MANIPULATING DATAFRAMES WITH PANDAS

Selecting a column (i.e., Series)

df['eggs']
month
Jan     47
Feb    110
Mar    221
Apr     77
May    132
Jun    205
Name: eggs, dtype: int64
type(df['eggs'])
pandas.core.series.Series

SLIDE 12

MANIPULATING DATAFRAMES WITH PANDAS

Slicing and indexing a Series

df['eggs'][1:4] # Part of the eggs column
month
Feb    110
Mar    221
Apr     77
Name: eggs, dtype: int64
df['eggs'].iloc[4] # The value associated with May, selected by position
132
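Because the eggs Series is indexed by month labels, it helps to say explicitly whether a lookup is by position or by label. The sketch below assumes the sales table is rebuilt inline (sales.csv is not available here) and uses .iloc for positions and .loc for labels; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the slide's table inline instead of reading sales.csv.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month"),
)

eggs = df["eggs"]

position_slice = eggs.iloc[1:4]       # positions 1, 2, 3 (end excluded)
label_slice = eggs.loc["Feb":"Apr"]   # labels Feb through Apr (end included)

may_by_position = eggs.iloc[4]
may_by_label = eggs.loc["May"]

print(position_slice.equals(label_slice), may_by_position, may_by_label)
```

Spelling out .iloc or .loc avoids relying on how a bare integer inside square brackets is interpreted, which has changed between pandas versions.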

SLIDE 13

MANIPULATING DATAFRAMES WITH PANDAS

Using .loc[]

df.loc[:, 'eggs':'salt'] # All rows, some columns
       eggs  salt
month
Jan      47  12.0
Feb     110  50.0
Mar     221  89.0
Apr      77  87.0
May     132   NaN
Jun     205  60.0

SLIDE 14

MANIPULATING DATAFRAMES WITH PANDAS

Using .loc[]

df.loc['Jan':'Apr',:] # Some rows, all columns
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20

SLIDE 15

MANIPULATING DATAFRAMES WITH PANDAS

Using .loc[]

df.loc['Mar':'May', 'salt':'spam']
       salt  spam
month
Mar    89.0    72
Apr    87.0    20
May     NaN    52

SLIDE 16

MANIPULATING DATAFRAMES WITH PANDAS

Using .iloc[]

df.iloc[2:5, 1:] # A block from middle of the DataFrame
       salt  spam
month
Mar    89.0    72
Apr    87.0    20
May     NaN    52
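The .loc and .iloc slices above select the same block even though their endpoints look different: label slices include the end label, while positional slices exclude the end position. A small sketch, assuming the sales table is rebuilt inline because sales.csv is not available here:

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the slide's table inline instead of reading sales.csv.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month"),
)

# Label-based: 'May' and 'spam' are both included.
by_label = df.loc["Mar":"May", "salt":"spam"]

# Position-based: row 5 is excluded, so rows 2, 3, 4 are returned.
by_position = df.iloc[2:5, 1:]

print(by_label.equals(by_position))
```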

SLIDE 17

MANIPULATING DATAFRAMES WITH PANDAS

Using lists rather than slices

df.loc['Jan':'May', ['eggs', 'spam']]
       eggs  spam
month
Jan      47    17
Feb     110    31
Mar     221    72
Apr      77    20
May     132    52

SLIDE 18

MANIPULATING DATAFRAMES WITH PANDAS

Using lists rather than slices

df.iloc[[0,4,5], 0:2]
       eggs  salt
month
Jan      47  12.0
May     132   NaN
Jun     205  60.0

SLIDE 19

MANIPULATING DATAFRAMES WITH PANDAS

Series versus 1-column DataFrame

# A Series by column name
df['eggs']
month
Jan     47
Feb    110
Mar    221
...
type(df['eggs'])
pandas.core.series.Series

# A DataFrame w/single column
df[['eggs']]
       eggs
month
Jan      47
Feb     110
Mar     221
...
type(df[['eggs']])
pandas.core.frame.DataFrame
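The difference between single and double brackets shows up in the type and shape of the result. A short sketch, assuming the sales table is rebuilt inline because sales.csv is not available here:

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the slide's table inline instead of reading sales.csv.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month"),
)

as_series = df["eggs"]    # a string key returns a one-dimensional Series
as_frame = df[["eggs"]]   # a list key returns a two-dimensional DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```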

SLIDE 20

Let's practice!

MANIPULATING DATAFRAMES WITH PANDAS

SLIDE 21

Filtering DataFrames

MANIPULATING DATAFRAMES WITH PANDAS

Anaconda

Instructor

SLIDE 22

MANIPULATING DATAFRAMES WITH PANDAS

Creating a Boolean Series

df.salt > 60
month
Jan    False
Feb    False
Mar     True
Apr     True
May    False
Jun    False
Name: salt, dtype: bool

SLIDE 23

MANIPULATING DATAFRAMES WITH PANDAS

Filtering with a Boolean Series

df[df.salt > 60]
       eggs  salt  spam
month
Mar     221  89.0    72
Apr      77  87.0    20

enough_salt_sold = df.salt > 60
df[enough_salt_sold]
       eggs  salt  spam
month
Mar     221  89.0    72
Apr      77  87.0    20

SLIDE 24

MANIPULATING DATAFRAMES WITH PANDAS

Combining filters

df[(df.salt >= 50) & (df.eggs < 200)] # Both conditions
       eggs  salt  spam
month
Feb     110  50.0    31
Apr      77  87.0    20

df[(df.salt >= 50) | (df.eggs < 200)] # Either condition
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20
May     132   NaN    52
Jun     205  60.0    55
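One detail behind the combined filters is how the missing salt value for May behaves: a comparison against NaN evaluates to False, so May passes a filter only through the eggs condition. The sketch below assumes the sales table is rebuilt inline because sales.csv is not available here; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the slide's table inline instead of reading sales.csv.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month"),
)

salt_ok = df.salt >= 50     # False for May, because NaN >= 50 is False
eggs_low = df.eggs < 200

both = df[salt_ok & eggs_low]     # parentheses matter when conditions are inline
either = df[salt_ok | eggs_low]
not_salt_ok = df[~salt_ok]        # negation keeps May, since its test was False

print(list(both.index), len(either), list(not_salt_ok.index))
```

The parentheses on the slide are needed because & and | bind more tightly than comparison operators in Python.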

SLIDE 25

MANIPULATING DATAFRAMES WITH PANDAS

DataFrames with zeros and NaNs

df2 = df.copy()
df2['bacon'] = [0, 0, 50, 60, 70, 80]
df2
       eggs  salt  spam  bacon
month
Jan      47  12.0    17      0
Feb     110  50.0    31      0
Mar     221  89.0    72     50
Apr      77  87.0    20     60
May     132   NaN    52     70
Jun     205  60.0    55     80

SLIDE 26

MANIPULATING DATAFRAMES WITH PANDAS

Select columns with all nonzeros

df2.loc[:, df2.all()]
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20
May     132   NaN    52
Jun     205  60.0    55

SLIDE 27

MANIPULATING DATAFRAMES WITH PANDAS

Select columns with any nonzeros

df2.loc[:, df2.any()]
       eggs  salt  spam  bacon
month
Jan      47  12.0    17      0
Feb     110  50.0    31      0
Mar     221  89.0    72     50
Apr      77  87.0    20     60
May     132   NaN    52     70
Jun     205  60.0    55     80
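The column selections with .all() and .any() work because each method reduces the DataFrame to one Boolean per column, which .loc then uses as a column mask. A sketch under the assumption that the sales table is rebuilt inline (sales.csv is not available here):

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the slide's table inline instead of reading sales.csv.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month"),
)

df2 = df.copy()
df2["bacon"] = [0, 0, 50, 60, 70, 80]

all_nonzero = df2.all()   # one Boolean per column: True if no entry is zero
any_nonzero = df2.any()   # one Boolean per column: True if some entry is nonzero

cols_all = df2.loc[:, all_nonzero]
cols_any = df2.loc[:, any_nonzero]

print(list(cols_all.columns), list(cols_any.columns))
```

The salt column survives the .all() selection despite its NaN because missing values are skipped by default (skipna=True).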

SLIDE 28

MANIPULATING DATAFRAMES WITH PANDAS

Select columns with any NaNs

df.loc[:, df.isnull().any()]
       salt
month
Jan    12.0
Feb    50.0
Mar    89.0
Apr    87.0
May     NaN
Jun    60.0

SLIDE 29

MANIPULATING DATAFRAMES WITH PANDAS

Select columns without NaNs

df.loc[:, df.notnull().all()]
       eggs  spam
month
Jan      47    17
Feb     110    31
Mar     221    72
Apr      77    20
May     132    52
Jun     205    55

SLIDE 30

MANIPULATING DATAFRAMES WITH PANDAS

Drop rows with any NaNs

df.dropna(how='any')
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     221  89.0    72
Apr      77  87.0    20
Jun     205  60.0    55
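The three NaN-handling selections can be run together to see that two of them filter columns while dropna filters rows. A minimal sketch, assuming the sales table is rebuilt inline because sales.csv is not available here:

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the slide's table inline instead of reading sales.csv.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month"),
)

# Column masks: one Boolean per column.
cols_with_nan = df.loc[:, df.isnull().any()]
cols_without_nan = df.loc[:, df.notnull().all()]

# Row filter: drop every row that contains at least one NaN.
rows_without_nan = df.dropna(how="any")

print(list(cols_with_nan.columns))
print(list(cols_without_nan.columns))
print(list(rows_without_nan.index))
```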

SLIDE 31

MANIPULATING DATAFRAMES WITH PANDAS

Filtering a column based on another

df.eggs[df.salt > 55]
month
Mar    221
Apr     77
Jun    205
Name: eggs, dtype: int64

SLIDE 32

MANIPULATING DATAFRAMES WITH PANDAS

Modifying a column based on another

df.loc[df.salt > 55, 'eggs'] += 5
df
       eggs  salt  spam
month
Jan      47  12.0    17
Feb     110  50.0    31
Mar     226  89.0    72
Apr      82  87.0    20
May     132   NaN    52
Jun     210  60.0    55
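When a column is modified based on another, writing through a single .loc call is the more robust spelling. Chained indexing such as df.eggs[mask] += 5 can trigger chained-assignment warnings and, depending on the pandas version and its copy-on-write settings, may leave df unchanged. The sketch assumes the sales table is rebuilt inline because sales.csv is not available here.

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the slide's table inline instead of reading sales.csv.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month"),
)

mask = df["salt"] > 55       # True for Mar, Apr, Jun; False for the NaN in May

# One indexing step: rows chosen by the mask, column chosen by label.
df.loc[mask, "eggs"] += 5

print(df["eggs"].tolist())
```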

SLIDE 33

Let's practice!

MANIPULATING DATAFRAMES WITH PANDAS

SLIDE 34

Transforming DataFrames

MANIPULATING DATAFRAMES WITH PANDAS

Anaconda

Instructor

SLIDE 35

MANIPULATING DATAFRAMES WITH PANDAS

DataFrame vectorized methods

df.floordiv(12) # Convert to dozens unit
       eggs  salt  spam
month
Jan       3   1.0     1
Feb       9   4.0     2
Mar      18   7.0     6
Apr       6   7.0     1
May      11   NaN     4
Jun      17   5.0     4

SLIDE 36

MANIPULATING DATAFRAMES WITH PANDAS

NumPy vectorized functions

import numpy as np
np.floor_divide(df, 12) # Convert to dozens unit
       eggs  salt  spam
month
Jan     3.0   1.0   1.0
Feb     9.0   4.0   2.0
Mar    18.0   7.0   6.0
Apr     6.0   7.0   1.0
May    11.0   NaN   4.0
Jun    17.0   5.0   4.0

SLIDE 37

MANIPULATING DATAFRAMES WITH PANDAS

Plain Python functions

def dozens(n):
    return n // 12

df.apply(dozens) # Convert to dozens unit
       eggs  salt  spam
month
Jan       3   1.0     1
Feb       9   4.0     2
Mar      18   7.0     6
Apr       6   7.0     1
May      11   NaN     4
Jun      17   5.0     4

SLIDE 38

MANIPULATING DATAFRAMES WITH PANDAS

Plain Python functions

df.apply(lambda n: n // 12)
       eggs  salt  spam
month
Jan       3   1.0     1
Feb       9   4.0     2
Mar      18   7.0     6
Apr       6   7.0     1
May      11   NaN     4
Jun      17   5.0     4
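The method, named-function, and lambda spellings of the dozens conversion can be checked against each other. This sketch assumes the sales table is rebuilt inline because sales.csv is not available here; the // operator form is added for comparison and is not on the slides.

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the slide's table inline instead of reading sales.csv.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month"),
)


def dozens(n):
    """Floor-divide by 12; apply passes each column in as a Series."""
    return n // 12


via_method = df.floordiv(12)
via_operator = df // 12                     # operator form, added for comparison
via_function = df.apply(dozens)
via_lambda = df.apply(lambda n: n // 12)

print(via_method.equals(via_operator))
print(via_method.equals(via_function))
print(via_method.equals(via_lambda))
```

Note that apply calls the function once per column here, not once per cell, so the work inside each call is still vectorized.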

SLIDE 39

MANIPULATING DATAFRAMES WITH PANDAS

Storing a transformation

df['dozens_of_eggs'] = df.eggs.floordiv(12)
df
       eggs  salt  spam  dozens_of_eggs
month
Jan      47  12.0    17               3
Feb     110  50.0    31               9
Mar     221  89.0    72              18
Apr      77  87.0    20               6
May     132   NaN    52              11
Jun     205  60.0    55              17

SLIDE 40

MANIPULATING DATAFRAMES WITH PANDAS

The DataFrame index

df
       eggs  salt  spam  dozens_of_eggs
month
Jan      47  12.0    17               3
Feb     110  50.0    31               9
Mar     221  89.0    72              18
Apr      77  87.0    20               6
May     132   NaN    52              11
Jun     205  60.0    55              17
df.index
Index(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun'], dtype='object', name='month')

SLIDE 41

MANIPULATING DATAFRAMES WITH PANDAS

Working with string values

df.index = df.index.str.upper()
df
       eggs  salt  spam  dozens_of_eggs
month
JAN      47  12.0    17               3
FEB     110  50.0    31               9
MAR     221  89.0    72              18
APR      77  87.0    20               6
MAY     132   NaN    52              11
JUN     205  60.0    55              17

SLIDE 42

MANIPULATING DATAFRAMES WITH PANDAS

Working with string values

df.index = df.index.map(str.lower)
df
     eggs  salt  spam  dozens_of_eggs
jan    47  12.0    17               3
feb   110  50.0    31               9
mar   221  89.0    72              18
apr    77  87.0    20               6
may   132   NaN    52              11
jun   205  60.0    55              17
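The two ways of transforming index labels, the vectorized .str accessor and .map with a plain Python function, produce the same labels. A small sketch, assuming the index from the slides is rebuilt inline; whether the index name is displayed afterwards can depend on the pandas version, so only the labels are compared.

```python
import pandas as pd

# Assumption: rebuild the month index from the slides inline.
index = pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month")

upper = index.str.upper()           # vectorized string method on the Index
lower_map = index.map(str.lower)    # plain Python function applied per label
lower_str = index.str.lower()       # vectorized equivalent of the map call

print(list(upper))
print(list(lower_map))
```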

SLIDE 43

MANIPULATING DATAFRAMES WITH PANDAS

Defining columns using other columns

df['salty_eggs'] = df.salt + df.dozens_of_eggs
df
     eggs  salt  spam  dozens_of_eggs  salty_eggs
jan    47  12.0    17               3        15.0
feb   110  50.0    31               9        59.0
mar   221  89.0    72              18       107.0
apr    77  87.0    20               6        93.0
may   132   NaN    52              11         NaN
jun   205  60.0    55              17        77.0
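Column arithmetic is aligned on the index and propagates missing values, which is why the May entry of salty_eggs is NaN. A sketch, assuming the sales table is rebuilt inline because sales.csv is not available here:

```python
import numpy as np
import pandas as pd

# Assumption: rebuild the slide's table inline instead of reading sales.csv.
df = pd.DataFrame(
    {
        "eggs": [47, 110, 221, 77, 132, 205],
        "salt": [12.0, 50.0, 89.0, 87.0, np.nan, 60.0],
        "spam": [17, 31, 72, 20, 52, 55],
    },
    index=pd.Index(["Jan", "Feb", "Mar", "Apr", "May", "Jun"], name="month"),
)

# Store a transformation as a new column, then build on it.
df["dozens_of_eggs"] = df["eggs"].floordiv(12)
df["salty_eggs"] = df["salt"] + df["dozens_of_eggs"]   # NaN + 11 stays NaN

print(df["salty_eggs"].tolist())
```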

SLIDE 44

Let's practice!

MANIPULATING DATAFRAMES WITH PANDAS