SLIDE 9 MapReduce-based implementation
[Diagram: data flow of the MapReduce-based implementation]
Preprocessing phase: the training data and the testing data are each split (train_split_0, train_split_1, ..., train_split_n; test_split_0, test_split_1, ..., test_split_k), and a Cartesian join pairs every training split with every testing split.
Map phase: each (train_i, test_j) pair is passed to its own Map procedure, followed by a Sort. The output is a set of key:local_top_list records (key1:local_top_list1, key1:local_top_list2, ..., keyT:local_top_list1, keyT:local_top_list2, ...).
Shuffle phase: the local top lists are routed to the Reduce procedures by key.
Reduce phase: the Reduce procedures merge the local lists of each key into a single record (key1:global_top_list, key2:global_top_list, ..., keyT:global_top_list), which forms the output data.
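The flow in the diagram can be sketched in plain Python without a MapReduce framework. This is a minimal illustration, not the authors' implementation. The slide does not define the record layout, so the sketch assumes that each key identifies one test sample, that each local_top_list holds the k nearest training samples found in one training split, and that the distance is Euclidean. All function names (split, map_procedure, shuffle, reduce_procedure, knn_mapreduce) are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import product


def split(data, n_parts):
    """Preprocessing: cut a list into at most n_parts contiguous chunks."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def distance(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def map_procedure(train_split, test_split, k):
    """Map: for one (train, test) split pair, emit key:local_top_list."""
    for test_key, test_vec in test_split:
        scored = [(distance(test_vec, vec), target) for vec, target in train_split]
        # nsmallest returns the k closest candidates already sorted by distance
        yield test_key, heapq.nsmallest(k, scored)


def shuffle(mapped):
    """Shuffle: group all local top lists that share a key."""
    grouped = defaultdict(list)
    for key, local_top_list in mapped:
        grouped[key].append(local_top_list)
    return grouped


def reduce_procedure(local_top_lists, k):
    """Reduce: merge the local lists of one key into the global top list."""
    candidates = (item for lst in local_top_lists for item in lst)
    return heapq.nsmallest(k, candidates)


def knn_mapreduce(train, test, n_train_parts, n_test_parts, k):
    """Run the whole pipeline: split, Cartesian join, map, shuffle, reduce."""
    pairs = product(split(train, n_train_parts), split(test, n_test_parts))
    mapped = (rec for tr, te in pairs for rec in map_procedure(tr, te, k))
    return {key: reduce_procedure(lists, k)
            for key, lists in shuffle(mapped).items()}
```

Here train is a list of (feature_vector, target) pairs and test is a list of (key, feature_vector) pairs. Because every local list already contains the k best candidates of its training split, merging the local lists in the reduce step yields the same top k as a search over the unsplit training data.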
A. Agafonov, A. Yumaganov. Spatial-Temporal K Nearest Neighbors Model on MapReduce for Traffic Flow Prediction. 9 / 15