SLIDE 1

SOFTWARE IMPLEMENTATION OF THE IEEE 802.11A/P PHYSICAL LAYER

SDR'12 – WInnComm Europe, 27-29 June 2012, Brussels, Belgium

T. Cupaiuolo, D. Lo Iacono, M. Siti and M. Odoni

Advanced System Technologies STMicroelectronics, Agrate Brianza, Italy

Daniele Lo Iacono

SLIDE 2

Outline

The system

Wireless Access in Vehicular Environments (WAVE): IEEE 802.11p
Comparison with IEEE 802.11a/g
A Software Defined Radio (SDR) implementation approach: the BPE baseband communication platform

Digital baseband implementation
Reference system model
802.11p: Data-aided channel estimation

Customization and code profiling
Results


SLIDE 3

IEEE 802.11p WAVE

Requirements
Fast access as a priority (latency <50 ms)
Mobility (>60 km/h) and range (~1 km)
Robustness and reliability
Security
Applications
Vehicle safety (emergency warning systems, intersection collision avoidance, forward collision warning)
Tolling
Infotainment
Traffic management
Cooperative Adaptive Cruise Control
Comparison with 802.11a/g
10 MHz OFDM bandwidth (vs. 20 MHz): max PHY data rate 27 Mbit/s (vs. 54 Mbit/s)
5.9 GHz carrier frequency
Digital baseband: added support for mobility → data-aided channel estimation
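The halved bandwidth explains the halved peak rate. Both modes use a 64-point FFT, so halving the bandwidth doubles the OFDM symbol duration. The sketch below reproduces the 54 and 27 Mbit/s figures. It assumes the standard 802.11a numerology (48 data subcarriers, guard interval of 1/4 of the useful symbol) and the highest MCS (64-QAM, rate 3/4). The function names are illustrative only.

```python
from fractions import Fraction

N_FFT = 64  # FFT size, identical in 802.11a/g and 802.11p
N_SD = 48   # data subcarriers per OFDM symbol (4 more carry pilots)


def ofdm_symbol_duration(bandwidth_hz: int) -> Fraction:
    """OFDM symbol duration in seconds: useful part plus a 1/4 guard interval."""
    spacing = Fraction(bandwidth_hz, N_FFT)  # subcarrier spacing in Hz
    t_fft = 1 / spacing                      # useful symbol duration
    t_gi = t_fft / 4                         # guard interval
    return t_fft + t_gi


def phy_rate(bandwidth_hz: int, bits_per_subcarrier: int, code_rate: Fraction) -> Fraction:
    """PHY data rate in bit/s: data bits per OFDM symbol over the symbol duration."""
    n_dbps = N_SD * bits_per_subcarrier * code_rate
    return n_dbps / ofdm_symbol_duration(bandwidth_hz)


# Highest MCS: 64-QAM (6 bits per subcarrier) with a rate-3/4 code
for name, bw in (("802.11a/g", 20_000_000), ("802.11p", 10_000_000)):
    t_us = float(ofdm_symbol_duration(bw) * 1_000_000)
    rate_mbps = float(phy_rate(bw, 6, Fraction(3, 4)) / 1_000_000)
    print(name, t_us, "us,", rate_mbps, "Mbit/s")
# 802.11a/g 4.0 us, 54.0 Mbit/s
# 802.11p 8.0 us, 27.0 Mbit/s
```

`Fraction` keeps the arithmetic exact, so the 4 µs and 8 µs symbol periods come out without rounding noise.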


SLIDE 4

The BPE baseband communication platform

Customizable distributed reconfigurable data-path:
customizable coarse-grain hardware operators
embedded memory
scheduler and dispatcher

[Figure: BPE block diagram: d-unit mesh, d-unit routing, d-memory bank, instruction memory, d-instruction scheduler, memory management, fetch & decoding, b-instruction execution, data-port registers space, system bus interface, data port]


SLIDE 5

Flow-control: b-instruction

out = opcode(in0,in1,in2)

b-instructions are also used to set the way d-instructions access the memory bank.

[Figure: BPE block diagram with the b-instruction path highlighted: instruction memory, fetch & decoding, b-instruction execution unit, register file, memory bank, data-port registers space, system bus interface, data port]
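The slide gives two roles for b-instructions: scalar operations of the form out = opcode(in0,in1,in2), and configuring how d-instructions walk the memory bank. The sketch below models both in a toy form. The opcodes (`add3`, `mac`), the strided `AccessPattern` (base, stride, length) and the class names are all hypothetical. They stand in for the real BPE instruction set, which the slide does not detail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical scalar opcodes: each takes three register operands.
OPCODES: Dict[str, Callable[[int, int, int], int]] = {
    "add3": lambda a, b, c: a + b + c,
    "mac": lambda a, b, c: a * b + c,
}


@dataclass
class AccessPattern:
    """Hypothetical strided access descriptor for a memory bank."""
    base: int
    stride: int
    length: int

    def addresses(self) -> List[int]:
        return [self.base + i * self.stride for i in range(self.length)]


class BUnit:
    """Toy flow-control unit: a register file plus the current access pattern."""

    def __init__(self, n_regs: int = 16):
        self.regs = [0] * n_regs
        self.pattern = AccessPattern(0, 1, 0)

    def execute(self, opcode: str, out: int, in0: int, in1: int, in2: int) -> None:
        """out = opcode(in0, in1, in2), with operands given as register indices."""
        self.regs[out] = OPCODES[opcode](self.regs[in0], self.regs[in1], self.regs[in2])

    def set_access(self, base: int, stride: int, length: int) -> None:
        """Configure how the next d-instruction walks the memory bank."""
        self.pattern = AccessPattern(base, stride, length)


def d_read(memory: List[int], pattern: AccessPattern) -> List[int]:
    """Operand fetch of a d-instruction, following the configured pattern."""
    return [memory[a] for a in pattern.addresses()]


b = BUnit()
b.regs[1], b.regs[2], b.regs[3] = 3, 4, 5
b.execute("mac", 0, 1, 2, 3)              # r0 = 3 * 4 + 5
b.set_access(base=2, stride=4, length=4)
print(b.regs[0], d_read(list(range(32)), b.pattern))  # 17 [2, 6, 10, 14]
```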


SLIDE 6

Vector processing: d-instruction

out = unit1.opcode(unit0,in0,in1)

d-memory: bank of static memories for vector allocation
d-unit mesh: processing units performing parallel and pipelined vector processing
routing mesh: dynamically configures unit-memory and unit-unit connections
d-instruction scheduling unit

[Figure: BPE block diagram with the d-instruction path highlighted: d-unit mesh, d-unit routing, d-memory bank, d-instruction scheduler, memory management, system bus interface, data port]
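In the d-instruction form out = unit1.opcode(unit0,in0,in1), the first operand of `unit1` is the output of another unit rather than a memory vector. The routing mesh chains the two units so that elements stream through without an intermediate write to memory. The sketch below models this with Python generators. The `d_unit` helper and the multiply / fused multiply-add opcodes are assumptions chosen for illustration.

```python
import operator
from typing import Callable, Iterable, Iterator, List


def d_unit(op: Callable[..., float], *inputs: Iterable[float]) -> Iterator[float]:
    """Toy d-unit: applies `op` element by element to its input streams.

    An input may be a memory vector or the output stream of another unit;
    the latter models a unit-unit connection set up by the routing mesh.
    """
    for operands in zip(*inputs):
        yield op(*operands)


def fma(x: float, a: float, b: float) -> float:
    """Hypothetical opcode: fused multiply-add."""
    return x * a + b


a, b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]        # operands of unit0
in0, in1 = [2.0, 2.0, 2.0], [1.0, 1.0, 1.0]    # memory operands of unit1

# unit0 = mul(a, b); out = unit1.fma(unit0, in0, in1)
unit0 = d_unit(operator.mul, a, b)
out: List[float] = list(d_unit(fma, unit0, in0, in1))
print(out)  # [9.0, 21.0, 37.0]
```

Because `unit0` is a generator, each product is consumed by `unit1` as soon as it is produced. This mirrors pipelined vector processing, with no temporary vector held in the d-memory.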


SLIDE 7

Algorithm mapping: macros

Example macro made of two parallel branches, each performing pipelined processing across different units: parallel and pipelined processing reduces execution time and memory accesses.

arith0.mul(v0,v1);
arith1.mul(v0,v2);
v7 = comm0.ed(arith0,v3);
v8 = arith2.sub(v4,arith1);
comm1.qt(arith2,v5);
v9 = arith3.mul(comm1,v6);

[Figure: macro dataflow graph: inputs v0-v6 feed arith0.mul, arith1.mul, comm0.ed, arith2.sub, comm1.qt and arith3.mul, producing outputs v7, v8 and v9]
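The sketch below evaluates the macro element by element to show why it saves memory traffic. Only v0..v6 are read and only v7, v8, v9 are written. The intermediate results arith0, arith1, arith2 and comm1 never leave the data-path. The slide does not define the `ed` and `qt` opcodes, so they are passed in as placeholder functions. `sub(a, b)` is assumed to compute a - b. The access counts compare the macro against six separate d-instructions, each assumed to read two operands from memory and write one result.

```python
from typing import Callable, Dict, List, Tuple

Op = Callable[[float, float], float]


def run_macro(v: Dict[str, List[float]], ed: Op, qt: Op) -> Tuple[List[float], List[float], List[float]]:
    """Evaluate the two-branch macro; `ed` and `qt` are placeholder opcodes."""
    v7, v8, v9 = [], [], []
    for v0, v1, v2, v3, v4, v5, v6 in zip(*(v[f"v{i}"] for i in range(7))):
        arith0 = v0 * v1            # arith0.mul(v0,v1)
        arith1 = v0 * v2            # arith1.mul(v0,v2)
        v7.append(ed(arith0, v3))   # v7 = comm0.ed(arith0,v3)
        arith2 = v4 - arith1        # v8 = arith2.sub(v4,arith1)
        v8.append(arith2)
        comm1 = qt(arith2, v5)      # comm1.qt(arith2,v5)
        v9.append(comm1 * v6)       # v9 = arith3.mul(comm1,v6)
    return v7, v8, v9


def memory_accesses(n: int) -> Tuple[int, int]:
    """Memory accesses for n-element vectors: fused macro vs. six separate d-instructions."""
    fused = n * (7 + 3)        # read v0..v6, write v7..v9
    unfused = n * 6 * (2 + 1)  # each instruction reads 2 operands and writes 1 result
    return fused, unfused


v = {f"v{i}": [x] for i, x in enumerate([1.0, 2.0, 3.0, 4.0, 10.0, 2.0, 3.0])}
v7, v8, v9 = run_macro(v, ed=lambda a, b: a + b, qt=lambda a, b: a * b)  # placeholder opcodes
print(v7, v8, v9, memory_accesses(64))  # [6.0] [7.0] [42.0] (640, 1152)
```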


SLIDE 8

Pipeline of macros

Shared intermediate memory: v0 → macro #0 → v2 → macro #1 → v3 → macro #2 → v1
Conflicts on shared memory access inhibit fully pipelined processing.

Memory aliases (ping-pong mechanism) at the intermediate stages of processing: v0 → macro #0 → r0 → macro #1 → r1 → macro #2 → v1
Successive executions of macro #0, macro #1 and macro #2 overlap in time.

[Figure: execution timelines of macro #0, #1 and #2 for the two cases]
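The ping-pong idea can be sketched for one producer/consumer pair, such as macro #0 writing r0 and macro #1 reading it. The intermediate buffer has two aliases. In time slot k the producer writes alias k % 2 while the consumer reads alias (k - 1) % 2, so the two never access the same memory in the same slot. This is a generic double-buffering sketch under that assumption, not the BPE's actual aliasing mechanism. The `trace` it returns makes the conflict-free schedule checkable.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def ping_pong(blocks: Sequence[Any],
              produce: Callable[[Any], Any],
              consume: Callable[[Any], Any]) -> Tuple[List[Any], List[Tuple[int, Optional[int], Optional[int]]]]:
    """Two-stage pipeline whose intermediate buffer has two aliases."""
    buffers: List[Any] = [None, None]
    results: List[Any] = []
    trace = []  # (time slot, alias written, alias read)
    for k in range(len(blocks) + 1):
        write = k % 2 if k < len(blocks) else None
        read = (k - 1) % 2 if k >= 1 else None
        # Sequential here, concurrent in hardware: safe because write != read.
        if read is not None:
            results.append(consume(buffers[read]))
        if write is not None:
            buffers[write] = produce(blocks[k])
        trace.append((k, write, read))
    return results, trace


results, trace = ping_pong([[1, 2], [3, 4], [5, 6]], lambda blk: [2 * v for v in blk], sum)
print(results)  # [6, 14, 22]
```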

SLIDE 9

Multi-thread

A function is a pipeline of macros (e.g., macro #0 → macro #1 → macro #2).

Single-thread execution: for each OFDM symbol, function #1, function #2 and function #3 run back to back; OFDM symbol #2 starts only after OFDM symbol #1 is complete.

Multi-thread (3) execution: the three functions run as concurrent threads, each on a different OFDM symbol, so OFDM symbols #1 to #4 are processed in a staggered, overlapped fashion.
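The sketch below compares the two execution modes with an idealized model. Function f (0-based) processes OFDM symbol n in slot n + f. It assumes the threads run fully concurrently, so each slot lasts as long as the slowest function. Real threads share BPE resources, so the measured speed-up is lower. The function durations are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def single_thread_time(durations: Sequence[int], n_symbols: int) -> int:
    """Each OFDM symbol runs function #1, #2, #3 back to back."""
    return n_symbols * sum(durations)


def multi_thread_schedule(durations: Sequence[int], n_symbols: int) -> Tuple[Dict[int, List[Tuple[int, int]]], int]:
    """Idealized software pipeline: function f works on symbol n in slot n + f."""
    schedule: Dict[int, List[Tuple[int, int]]] = {}
    for n in range(n_symbols):
        for f in range(len(durations)):
            schedule.setdefault(n + f, []).append((f + 1, n + 1))  # (function #, OFDM #)
    total = (n_symbols + len(durations) - 1) * max(durations)
    return schedule, total


durations = [3, 2, 4]  # hypothetical costs of function #1, #2, #3
schedule, total = multi_thread_schedule(durations, 4)
print(single_thread_time(durations, 4), total)  # 36 24
print(schedule[2])  # [(3, 1), (2, 2), (1, 3)]
```

Slot 2 shows the overlap: function #3 finishes OFDM symbol #1 while function #2 works on #2 and function #1 starts #3.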


SLIDE 10

IEEE 802.11a/p reference system model

Transmitter: source bits → encoder → puncturer + interleaver → mapper → IFFT → upsampling/filtering → D/A → channel
Receiver: A/D → filter → synchronizer → FFT (discard pilot and virtual sub-carriers) → equalizer, driven by channel estimation → de-map → de-int / de-punct → Viterbi decoding → decoded bits
802.11p: data-aided channel estimation
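A stripped-down version of this chain helps show what the FFT/equalizer part of the receiver does. The sketch below makes several simplifying assumptions. It uses uncoded QPSK on all 64 subcarriers: no encoder, puncturing, interleaving, pilots, virtual carriers, filtering or synchronization. The cyclic prefix is 16 samples, the channel is noise-free, the multipath taps are toy values and are known at the receiver, and a naive DFT stands in for the FFT.

```python
import cmath
import random
from typing import List

N = 64   # FFT size
CP = 16  # guard interval (cyclic prefix) length in samples

# Gray-mapped QPSK: first bit -> sign of I, second bit -> sign of Q
QPSK = {(0, 0): 1 + 1j, (0, 1): 1 - 1j, (1, 0): -1 + 1j, (1, 1): -1 - 1j}


def dft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Naive O(N^2) DFT standing in for the FFT / IFFT blocks."""
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / N) for t in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out


def transmit(bits: List[int]) -> List[complex]:
    """Mapper -> IFFT -> cyclic prefix; 2 * N bits fill one OFDM symbol."""
    symbols = [QPSK[(bits[i], bits[i + 1])] / 2 ** 0.5 for i in range(0, 2 * N, 2)]
    x = dft(symbols, inverse=True)
    return x[-CP:] + x


def channel(tx: List[complex], taps: List[complex]) -> List[complex]:
    """Multipath channel shorter than the cyclic prefix (no noise)."""
    return [sum(taps[d] * tx[n - d] for d in range(len(taps)) if n >= d)
            for n in range(len(tx))]


def receive(rx: List[complex], taps: List[complex]) -> List[int]:
    """Remove CP -> FFT -> zero-forcing equalizer (known channel) -> de-mapper."""
    y = dft(rx[CP:CP + N])
    h = dft(taps + [0j] * (N - len(taps)))
    bits: List[int] = []
    for yk, hk in zip(y, h):
        s = yk / hk
        bits += [int(s.real < 0), int(s.imag < 0)]
    return bits


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(2 * N)]
taps = [1.0, 0.5, 0.2j]  # toy channel impulse response
print(receive(channel(transmit(bits), taps), taps) == bits)  # True
```

The cyclic prefix turns the linear multipath convolution into a circular one. That is why a single complex division per subcarrier undoes the channel once its frequency response is known. Estimating that response is the job of the channel-estimation block.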

SLIDE 11

Data-aided channel estimation (1/2)

Data-aided channel estimation basic idea:

1. Data detection of the current received OFDM symbol, using the channel estimate of the previous OFDM symbol.
2. Estimation of the channel for the current OFDM symbol, using the detected data QAM symbols.

Data detection through simple hard decision detection (HDD): low extra complexity and low extra latency with respect to 802.11a/g.
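The two steps above can be sketched per subcarrier. The sketch makes these simplifying assumptions: QPSK instead of a general QAM, a hard decision to the nearest constellation point, and a plain least-squares re-estimate H = Y / X̂ without the pilot combining and time-domain filtering of the next slide. The toy channel is noise-free and its phase drifts by 0.05 rad per OFDM symbol. All names are hypothetical.

```python
import cmath
import random
from typing import List, Tuple

SQRT_HALF = 2 ** -0.5


def hdd_qpsk(z: complex) -> complex:
    """Hard decision detection: nearest unit-energy QPSK point."""
    return complex(SQRT_HALF if z.real >= 0 else -SQRT_HALF,
                   SQRT_HALF if z.imag >= 0 else -SQRT_HALF)


def data_aided_ce(received: List[List[complex]], h_init: List[complex]) -> Tuple[List[List[complex]], List[complex]]:
    """Track the channel symbol by symbol from hard-detected data."""
    h_prev = list(h_init)
    decisions = []
    for y in received:  # y: one OFDM symbol after the FFT, one value per subcarrier
        # Step 1: detect symbol n with the channel estimate of symbol n-1.
        x_hat = [hdd_qpsk(yk / hk) for yk, hk in zip(y, h_prev)]
        # Step 2: estimate the channel of symbol n from the detected symbols.
        h_prev = [yk / xk for yk, xk in zip(y, x_hat)]
        decisions.append(x_hat)
    return decisions, h_prev


def simulate(n_symbols: int = 40, n_sc: int = 8, drift: float = 0.05, seed: int = 1):
    """Noise-free toy channel whose phase drifts by `drift` rad per OFDM symbol."""
    rng = random.Random(seed)
    h0 = [complex(rng.uniform(0.5, 1.5), rng.uniform(-0.5, 0.5)) for _ in range(n_sc)]
    tx, rx = [], []
    for n in range(n_symbols):
        x = [hdd_qpsk(complex(rng.choice((-1, 1)), rng.choice((-1, 1)))) for _ in range(n_sc)]
        rot = cmath.exp(1j * drift * (n + 1))
        tx.append(x)
        rx.append([h * rot * xk for h, xk in zip(h0, x)])
    return tx, rx, h0


tx, rx, h0 = simulate()
tracked, _ = data_aided_ce(rx, h0)
static = [[hdd_qpsk(yk / hk) for yk, hk in zip(y, h0)] for y in rx]  # initial CE only
print(tracked == tx, static == tx)  # True False
```

With only the initial estimate, the accumulated phase drift eventually pushes every point into a wrong quadrant. The data-aided update follows the drift at the cost of one extra division per subcarrier.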


SLIDE 12

Data-aided channel estimation (2/2)

Initial CE based on the LTS field:
received sequence (FFT) → LTS-based least-squares frequency-domain CE → time-domain filtering (IFFT, filtering, FFT) → initial CE

For successive OFDM symbols (SIG and DATA), the CE is tracked by exploiting both the pilots and the estimated data symbols:
received sequence (FFT) → HDD using the previous CE → data-aided least-squares frequency-domain CE → time-domain filtering (IFFT, filtering, FFT) → updated CE
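The LS-plus-time-domain-filtering step can be sketched as follows. The filter here is a simple truncation: after the IFFT, only the first `n_taps` samples are kept, on the assumption that the channel impulse response is shorter than that. The slide does not specify the actual filter. The sketch also uses a toy BPSK training symbol on all 64 subcarriers rather than the real LTS with its null carriers.

```python
import cmath
import random
from typing import List

N = 64  # FFT size


def dft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Naive DFT standing in for the FFT / IFFT blocks."""
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / N) for t in range(N))
           for k in range(N)]
    return [v / N for v in out] if inverse else out


def ls_estimate(y: List[complex], x_ref: List[complex]) -> List[complex]:
    """Least-squares frequency-domain CE: H[k] = Y[k] / X[k]."""
    return [yk / xk for yk, xk in zip(y, x_ref)]


def time_domain_filter(h_freq: List[complex], n_taps: int) -> List[complex]:
    """IFFT, keep the first n_taps samples (assumed max channel length), FFT."""
    h_time = dft(h_freq, inverse=True)
    return dft(h_time[:n_taps] + [0j] * (N - n_taps))


def mse(a: List[complex], b: List[complex]) -> float:
    return sum(abs(u - v) ** 2 for u, v in zip(a, b)) / len(a)


def demo(seed: int = 7, n_taps: int = 8, noise_std: float = 0.1):
    rng = random.Random(seed)
    taps = [0.9, 0.4 - 0.2j, 0.1j]                            # toy 3-tap channel
    h_true = dft(taps + [0j] * (N - len(taps)))
    x_lts = [complex(rng.choice((-1, 1))) for _ in range(N)]  # toy BPSK training symbol
    y = [h * x + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
         for h, x in zip(h_true, x_lts)]
    h_ls = ls_estimate(y, x_lts)
    h_filt = time_domain_filter(h_ls, n_taps)
    return mse(h_ls, h_true), mse(h_filt, h_true)


mse_ls, mse_filtered = demo()
print(mse_ls, mse_filtered)
```

Truncating in time is an orthogonal projection that leaves a short channel untouched. It can therefore only remove noise, roughly by a factor `n_taps / N`. For the data-aided update, the same two functions would apply with `x_ref` built from the pilots and the HDD decisions instead of the LTS.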

SLIDE 13

Multimode 11a/p receiver data pipeline

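Receiver chain: filter → synchronizer → FFT → equalizer (with channel estimation) → de-map → de-int / de-punct → Viterbi decoding

11a: within one symbol processing time, successive stages work on different OFDM symbols (e.g., N+1, N, N-1, N-2), so the receiver runs as a pipeline.

11p: the data-aided channel estimation is a recursive update (the CE used for symbol N+1 requires the detected data of symbol N), so several stages must work on the same symbol N within one symbol processing time. This recursive update is a processing bottleneck that does not allow building a pipeline as for 802.11a.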
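A simple model shows why the recursive update limits throughput. Assume enough parallel resources that each feed-forward stage can work on a different symbol. The slowest stage then sets the initiation interval, meaning the minimum time between successive OFDM symbols. Stages inside the recursive loop (equalizer and data-aided CE) must all finish symbol N before symbol N+1 can enter, so their costs add. The stage costs below are hypothetical, chosen only to illustrate the effect.

```python
from typing import Dict, Iterable


def initiation_interval(stage_cycles: Dict[str, int], recursive_loop: Iterable[str] = ()) -> int:
    """Minimum cycles between two OFDM symbols entering the receiver pipeline.

    Feed-forward stages work on different symbols, so the slowest one sets the
    pace; stages inside a recursive loop must finish the same symbol in
    sequence, so their costs add up.
    """
    loop_cycles = sum(stage_cycles[s] for s in recursive_loop)
    return max(max(stage_cycles.values()), loop_cycles)


# Hypothetical per-stage costs in clock cycles (illustrative only).
stages_11a = {"fft": 200, "equalizer": 100, "channel_estimation": 50,
              "de_map": 350, "de_int_de_punct": 400}
stages_11p = dict(stages_11a, channel_estimation=500)

print(initiation_interval(stages_11a))                                       # 400
print(initiation_interval(stages_11p, ("equalizer", "channel_estimation")))  # 600
```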

SLIDE 14

Multimode 11a/p receiver code profiling

[Figure: receiver chain: filter, synchronizer, FFT, channel estimation, equalizer, de-map, de-int / de-punct, Viterbi decoding]

Function | 11a/g (clock cycles) | 11p (clock cycles)
filter | 162 |
synchronizer | 1536 (latency) |
FFT (IFFT) | 200 (radix-4) |
channel estimation | 64 (@LTS) | 759 (data-aided, hard detection)
equalizer | 64 |
de-mapper | 348 (MCS #7) |
de-interleaver / de-puncturer | 408 (MCS #7) |
OFDM symbol, single-thread (@250 MHz) | 1432 (5.7 µs) | 2150 (8.6 µs)
OFDM symbol, multi-thread (@250 MHz) | 596 (2.4 µs) | 1490 (5.9 µs)
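The cycle counts translate into real-time margins as follows. At 250 MHz, one cycle is 4 ns, so 1432 cycles take 5.728 µs. Real-time operation requires processing one OFDM symbol within one OFDM symbol period: 4 µs for 20 MHz 802.11a/g and 8 µs for 10 MHz 802.11p. The per-symbol cycle counts are taken from the profiling above. The helper names are illustrative.

```python
from fractions import Fraction

CLOCK_HZ = 250_000_000                                          # BPE clock used for profiling
SYMBOL_PERIOD_US = {"11a/g": Fraction(4), "11p": Fraction(8)}   # 20 MHz vs. 10 MHz OFDM

PROFILE = {  # clock cycles per OFDM symbol, from the profiling table
    ("11a/g", "single-thread"): 1432,
    ("11a/g", "multi-thread"): 596,
    ("11p", "single-thread"): 2150,
    ("11p", "multi-thread"): 1490,
}


def cycles_to_us(cycles: int, clock_hz: int = CLOCK_HZ) -> Fraction:
    """Processing time in microseconds."""
    return Fraction(cycles * 1_000_000, clock_hz)


def real_time(mode: str, cycles: int) -> bool:
    """True if one OFDM symbol is processed within one OFDM symbol period."""
    return cycles_to_us(cycles) <= SYMBOL_PERIOD_US[mode]


for (mode, threading), cycles in PROFILE.items():
    print(f"{mode:6} {threading:14} {float(cycles_to_us(cycles)):.3f} us  real-time: {real_time(mode, cycles)}")
```

Only the multi-thread configurations stay within the symbol period. Note that 1490 cycles correspond to 5.96 µs, which the table reports as 5.9 µs.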

SLIDE 15

Conclusions

The BPE software programmable architecture has support for:
macro building
(macro-) instructions pipelining
emulation of memory ping-pong access
Multi-threading

Algorithm profiling on the BPE
Translate the algorithm steps into macros
Build the macro-pipeline
PHY profiling on the BPE (MCS #7, @250 MHz)
802.11a/g: 5.7 µs (single thread) → 2.4 µs (three threads) (i.e. 54 Mbit/s)
802.11p: 8.6 µs (single thread) → 5.9 µs (two threads) (i.e. 27 Mbit/s)
Future steps
802.11p 20 MHz optional mode
Soft decision directed DA CE (FEC based, i.e. Viterbi decoding)
To address these and other issues: investigating architectural enhancements (including the idea of a “cluster of BPE”)
