Spatial-Temporal K Nearest Neighbors Model on MapReduce for Traffic Flow Prediction

SLIDE 1

Spatial-Temporal K Nearest Neighbors Model on MapReduce for Traffic Flow Prediction

A. Agafonov, A. Yumaganov

Samara National Research University

The 19th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2018)

SLIDE 2

Task definition

  • Forecast the traffic flow 10 minutes ahead.
  • Take into account the spatial and temporal characteristics of the traffic flow.
  • Develop a distributed forecasting model.
  • Efficiently process large-scale traffic data.

The task requires real-time processing and high accuracy.

SLIDE 3

Problem formulation

$G = (N, E)$ is a directed graph representing the road network, where $N$ is the set of nodes (road intersections) and $E$ is the set of edges (road segments). $V_t^j$ is an observed traffic flow characteristic on edge $j \in E$ at time $t$.

Given the graph $G = (N, E)$ and traffic flow data $V_t^j$, $j \in E$, $t = 1, 2, \ldots, T$, predict the traffic flow characteristic at time $(t + \Delta)$ for a predefined prediction horizon $\Delta$.

SLIDE 4

Proposed model

A short-term traffic flow forecasting model based on the non-parametric k nearest neighbors regression algorithm is proposed.

The k nearest neighbors model has three components:

  • Feature vector
  • Distance metric
  • Prediction function

SLIDE 5

Feature vector

Time-Domain Upstream / Downstream (TDUD) feature vector:

$$\left( V_{t-T}^{j}, \ldots, V_{t-1}^{j}, V_{t}^{j}, \; V_{t-T}^{j-1}, \ldots, V_{t-1}^{j-1}, V_{t}^{j-1}, \; V_{t-T}^{j+1}, \ldots, V_{t-1}^{j+1}, V_{t}^{j+1} \right)$$

Proposed feature vector:

  • Partition the transportation network graph into several spatially compact clusters $\{G_i\}$ and define the cluster feature vector

    $$\{V_t^j\}, \quad j \in G_i, \quad t = t_{cur} - T, \ldots, t_{cur}$$

  • Reduce the dimensionality of the cluster feature vector using the PCA procedure:

    $$\{X_n\}_i, \quad n = 1, \ldots, N$$

  • Define the resulting feature vector for each road segment $j \in E$:

    $$S^j = \left( \{V_t^j\}, \{X_n\}_i \right), \quad i : j \in G_i, \quad t = t_{cur} - T, \ldots, t_{cur}, \quad n = 1, \ldots, N.$$
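To make the window indexing concrete, here is a minimal Python sketch of how the TDUD vector and the raw (pre-PCA) cluster vector could be assembled. It is an illustration under stated assumptions, not the authors' code: the function names, the `series` dictionary layout, and the reading of segment j−1 as upstream and j+1 as downstream are all hypothetical. The PCA step is not shown.

```python
def window(series, segment, t, T):
    """Return the T+1 observations V_{t-T}, ..., V_t of one road segment."""
    if t - T < 0:
        raise ValueError("not enough history for the requested window")
    return series[segment][t - T : t + 1]


def tdud_vector(series, j, upstream, downstream, t, T):
    """Concatenate the windows of segment j, its upstream and its downstream
    neighbour, in the order used by the TDUD formula (j, j-1, j+1).
    Which neighbour counts as upstream is an assumption of this sketch."""
    return (window(series, j, t, T)
            + window(series, upstream, t, T)
            + window(series, downstream, t, T))


def cluster_vector(series, cluster, t_cur, T):
    """Concatenate the windows of every segment in one cluster G_i.
    Segments are sorted so the vector layout is reproducible."""
    return [v for seg in sorted(cluster) for v in window(series, seg, t_cur, T)]


# Toy data: three segments, four time steps each (hypothetical values).
series = {"a": [1, 2, 3, 4], "b": [5, 6, 7, 8], "c": [9, 10, 11, 12]}
print(tdud_vector(series, "b", "a", "c", t=3, T=1))
print(cluster_vector(series, {"a", "b"}, t_cur=3, T=1))
```

In the proposed model the cluster vector would then be passed through PCA and its first N components appended to the segment's own window.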

SLIDE 6

Graph partitioning

  • Partitioning by area: $G^{area}$
  • Partitioning by distance: $G^{dist}$

$$G_i^{dist} = \{ j \in E : r(i, j) \le R \},$$

where $r(i, j)$ is the distance between road segments $i \in E$ and $j \in E$.
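The distance-based cluster can be sketched in a few lines of Python. The slides do not say how r(i, j) is measured, so this sketch assumes it is the hop count between segments in a segment-adjacency graph. That choice, the function name `dist_cluster`, and the `adjacency` layout are all assumptions made for illustration.

```python
from collections import deque


def dist_cluster(adjacency, i, R):
    """Return {j : r(i, j) <= R}, with r taken to be the hop count
    between segments (an assumption; the source leaves r unspecified)."""
    hops = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if hops[u] == R:
            continue  # neighbours of u would be farther than R
        for v in adjacency.get(u, ()):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return set(hops)


# Toy chain of four segments: a - b - c - d (hypothetical data).
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(sorted(dist_cluster(adjacency, "a", 2)))
```

With a metric distance instead of hop count, the breadth-first search would be replaced by a radius query, but the resulting set has the same definition.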

SLIDE 7

Proximity measure

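Weighted Euclidean distance with the trend adjustment:

$$d(S, \bar{S}^i) = d_{link}(V, \bar{V}^i) + \gamma \, d_{pca}(X, \bar{X}^i),$$

$$d_{link}(V, \bar{V}^i) = \alpha \sum_{t=1}^{T} w^{T-t+1} \left( V_t - \bar{V}_t^i \right)^2 + (1 - \alpha) \sum_{t=2}^{T} \sum_{\delta=1}^{t-1} \left( (V_t - V_\delta) - (\bar{V}_t^i - \bar{V}_\delta^i) \right)^2,$$

$$d_{pca}(X, \bar{X}^i) = \sum_{n=1}^{N} \left( X_n - \bar{X}_n^i \right)^2.$$

Here $w$ stands for the base of the temporal weight, whose symbol is not legible in the source.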
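The following Python sketch evaluates the proximity measure term by term. It is a reading of the formula as it survives in the slide text, not the authors' implementation. The base of the temporal weight raised to the power T−t+1 is not legible in the source, so the sketch takes it as a parameter `w`. That parameter and all function names are assumptions.

```python
def d_link(v, v_bar, alpha, w):
    """Level term plus trend term over one segment's history.
    v and v_bar hold V_1..V_T at indices 0..T-1."""
    T = len(v)
    # Level term: weight w^(T-t+1) for 1-based t, i.e. w^(T-t0) for 0-based t0.
    level = sum(w ** (T - t0) * (v[t0] - v_bar[t0]) ** 2 for t0 in range(T))
    # Trend term: compare pairwise increments V_t - V_delta for delta < t.
    trend = sum(
        ((v[t0] - v[d0]) - (v_bar[t0] - v_bar[d0])) ** 2
        for t0 in range(1, T)
        for d0 in range(t0)
    )
    return alpha * level + (1 - alpha) * trend


def d_pca(x, x_bar):
    """Squared difference of the PCA components of the cluster vector."""
    return sum((a - b) ** 2 for a, b in zip(x, x_bar))


def distance(v, x, v_bar, x_bar, alpha, gamma, w):
    """d(S, S_bar) = d_link + gamma * d_pca."""
    return d_link(v, v_bar, alpha, w) + gamma * d_pca(x, x_bar)


# Hypothetical toy values, chosen so each term is easy to check by hand.
print(distance([1, 2], [0], [1, 3], [2], alpha=0.5, gamma=0.25, w=1.0))
```

Setting `alpha` to 1 reduces `d_link` to a purely level-based weighted distance, while smaller values give more influence to how the two histories change over time.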

SLIDE 8

Prediction function

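Prediction function by the weighted average:

$$\hat{V}_{T+1} = \sum_{k=1}^{K} \frac{d_k^{-1}}{\sum_{l=1}^{K} d_l^{-1}} V_{T+1}^{k}$$

Prediction function that combines the weighted average and the trend adjustment:

$$\hat{V}_{T+1} = \theta \sum_{k=1}^{K} \frac{d_k^{-1}}{\sum_{l=1}^{K} d_l^{-1}} V_{T+1}^{k} + (1 - \theta) \left( V_T + \frac{1}{KT} \sum_{k=1}^{K} \sum_{t=1}^{T} \left( V_{T+1}^{k} - V_{t}^{k} \right) \right)$$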
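Both prediction functions can be sketched in plain Python as follows. This is an illustrative reading of the formulas, not the authors' code. The function names and argument layout are hypothetical, and the small `eps` added to each distance to avoid division by zero is an assumption that does not appear in the slides.

```python
def weighted_average(distances, next_values, eps=1e-9):
    """Inverse-distance weighted mean of the neighbours' next values V^k_{T+1}."""
    inv = [1.0 / (d + eps) for d in distances]  # eps is an added safeguard
    total = sum(inv)
    return sum(wk * v for wk, v in zip(inv, next_values)) / total


def trend_adjusted(v_T, distances, histories, next_values, theta, eps=1e-9):
    """theta * weighted average + (1 - theta) * (V_T + mean neighbour increment).
    histories[k] holds V^k_1..V^k_T for neighbour k."""
    K = len(histories)
    T = len(histories[0])
    trend = sum(
        nxt - v for hist, nxt in zip(histories, next_values) for v in hist
    ) / (K * T)
    avg = weighted_average(distances, next_values, eps)
    return theta * avg + (1 - theta) * (v_T + trend)


# Two hypothetical neighbours at equal distance from the query.
distances = [1.0, 1.0]
histories = [[8.0, 9.0], [18.0, 19.0]]
next_values = [10.0, 20.0]
print(weighted_average(distances, next_values))
print(trend_adjusted(12.0, distances, histories, next_values, theta=0.5))
```

With `theta = 1` the second function reduces to the plain weighted average, which makes the two variants easy to compare on the same neighbour set.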

      

SLIDE 9

MapReduce-based implementation

Figure: data flow of the MapReduce-based implementation.

  • Preprocessing phase (input data): the training data and the testing data are each split (train_split_0, ..., train_split_n and test_split_0, ..., test_split_k), and the splits are combined by a Cartesian join into (train_i, test_j) pairs.
  • Map phase: a map procedure processes each (train_i, test_j) pair and sorts the result, emitting key:local_top_list records.
  • Shuffle phase: the local top lists are grouped by key (key1, ..., keyT).
  • Reduce phase (output data): a reduce procedure combines the local top lists of each key into one key:global_top_list record.
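The data flow can be imitated in a single Python process to show how local top lists combine into a global one. This is only a sketch: the slides describe an Apache Spark implementation, whereas the code below uses no Spark API at all. The function names are hypothetical, and a squared Euclidean distance stands in for the model's proximity measure.

```python
import heapq
from collections import defaultdict
from itertools import product


def map_pair(train_split, test_split, k):
    """Map: for every test record, emit (key, local top-k list) computed
    against one training split. Each list item is (distance, target)."""
    for key, query in test_split:
        scored = (
            (sum((a - b) ** 2 for a, b in zip(query, feats)), target)
            for feats, target in train_split
        )
        yield key, heapq.nsmallest(k, scored)  # already sorted by distance


def reduce_key(local_lists, k):
    """Reduce: merge all local top lists of one key into the global top-k."""
    return heapq.nsmallest(k, (item for lst in local_lists for item in lst))


def knn_mapreduce(train_splits, test_splits, k):
    grouped = defaultdict(list)
    # The Cartesian join pairs every training split with every test split.
    for train_split, test_split in product(train_splits, test_splits):
        for key, local_top in map_pair(train_split, test_split, k):
            grouped[key].append(local_top)  # stands in for the shuffle phase
    return {key: reduce_key(lists, k) for key, lists in grouped.items()}


# Hypothetical toy data: (feature tuple, target) training records in two splits.
train_splits = [[((0.0,), 1.0), ((5.0,), 2.0)], [((1.0,), 3.0), ((9.0,), 4.0)]]
test_splits = [[("q", (0.5,))]]
print(knn_mapreduce(train_splits, test_splits, k=2))
```

Keeping only k candidates per (train, test) pair bounds the data sent through the shuffle, which is the reason for producing local top lists before the reduce step.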

SLIDE 10

Model analysis

Comparison:

  • Proposed kNN model
  • TDUD-KNN
  • SARIMA

$$MAE = \frac{1}{n} \sum_{t=1}^{n} \left| V_t - \hat{V}_t \right|, \qquad MAPE = \frac{1}{n} \sum_{t=1}^{n} \frac{\left| V_t - \hat{V}_t \right|}{V_t} \times 100\%$$

Data set:

  • Transportation network with 26018 road segments
  • Average speed over a period of 60 days
  • New data every 10 minutes
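The two error measures translate directly into Python. The sketch below assumes every observed value is non-zero, which holds for positive average speeds but is not stated in the slides. The function names and sample numbers are hypothetical.

```python
def mae(actual, predicted):
    """Mean absolute error, in the units of the data (km/h for speeds)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    return 100.0 * sum(
        abs(a - p) / a for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical observed and predicted speeds.
actual = [10.0, 20.0]
predicted = [12.0, 18.0]
print(mae(actual, predicted), mape(actual, predicted))
```

MAPE weights an error on a slow segment more heavily than the same absolute error on a fast one, which is why the two measures can rank models differently.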

SLIDE 11

Model analysis. MAE / MAPE

Figure: MAPE (%) as a function of the number of neighbors k.

Table: Algorithms Comparison

Model              MAE, km/h   MAPE, %
Proposed, R = 1    2.378       10.61
Proposed, R = 2    2.374       10.598
Proposed, R = 3    2.372       10.593
Proposed, Garea    2.379       10.596
TDUD-KNN           2.387       10.611
SARIMA             2.399       10.77

SLIDE 12

Model analysis. MAE / MAPE by days

Figure: MAE (km/h) and MAPE (%) by day of year (days 213 to 227) for TDUD-KNN, the proposed model with R = 3, and SARIMA.

SLIDE 13

Model analysis. Execution time

Cluster of up to 6 PCs, each with an Intel Core i5-3740 3.20 GHz CPU and 8 GB RAM.

Figure: execution time and scaleup as functions of the number of nodes.

Number of nodes   Execution time, sec   Scaleup
1                 346                   1
2                 176                   0.93
3                 139                   0.86
4                 101                   0.82
5                 88                    0.81
6                 74                    0.72
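As a reading aid, the execution times reported on this slide (346 s on one node down to 74 s on six) can be converted into speedup and parallel efficiency. These are derived quantities that the slides do not report. They also differ from the scaleup values, which cannot be recomputed from the execution times alone. The function names are hypothetical.

```python
def speedup(times):
    """Return T_1 / T_n for each entry; times[0] is the single-node time."""
    base = times[0]
    return [base / t for t in times]


def efficiency(times):
    """Return speedup divided by node count, for node counts 1..len(times)."""
    return [s / n for n, s in enumerate(speedup(times), start=1)]


times_sec = [346, 176, 139, 101, 88, 74]  # from the slide, nodes 1..6
for nodes, (s, e) in enumerate(
    zip(speedup(times_sec), efficiency(times_sec)), start=1
):
    print(f"{nodes} nodes: speedup {s:.2f}, efficiency {e:.2f}")
```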

SLIDE 14

Conclusion

The distributed spatial-temporal model of short-term traffic flow forecasting has the following advantages:

  • The model takes into account spatial and temporal characteristics of the traffic flow.
  • The implementation is based on the MapReduce processing model in the open-source cluster-computing framework Apache Spark for distributed Big Data processing.
  • The proposed model has a high prediction accuracy and a reasonable execution time, sufficient for real-time prediction.

SLIDE 15

Thank you!

Anton Agafonov
ant.agafonov@gmail.com

The work was supported by the Ministry of Science and Higher Education of the Russian Federation (project no. RFMEFI57518X0177).