SLIDE 1

ESA UNCLASSIFIED - For Official Use | ESA | 01/01/2016 | Slide 1

RTEMS-SMP Improvement for LEON multi-core

  • Contract No: 4000116175/15/NL/FE/as
  • Contractor: embedded brains GmbH (Germany)
  • TRP (95k Euro)
  • Duration: 12 months (KO: Feb 2016, FR: May 2017)
  • TO: M. Verhoef / T. Tsiodras
SLIDE 2

RTEMS SMP - Ready for Launch

Sebastian Huber

embedded brains GmbH

May 8, 2017

Sebastian Huber (embedded brains GmbH) RTEMS SMP - Ready for Launch May 8, 2017 1 / 30

SLIDE 3

Overview

Topics of this Presentation

  • What is RTEMS?
  • Overall RTEMS features
  • Some RTEMS SMP details

SLIDE 4

What is RTEMS?

Real-Time Operating System for Multiprocessor Systems (RTEMS)

  • Operating system
  • Multi-threaded
  • Single address-space
  • No kernel-space/user-space separation
  • Real-time
  • Permissive open source license (GPLv2 with linking exception, no obligations for application code)

SLIDE 5

RTEMS History

  • 1988: RTEMS development started by On-Line Applications Research Corporation (OAR)
    ◮ Classic real-time operating system
    ◮ O(1) priority scheduler
    ◮ Non-transitive priority inheritance
    ◮ Priority ceiling
  • 2008: EDISOFT tailors RTEMS 4.8.0, now used in over 20 missions and qualified to DAL-B
  • 2009: Astrium uses a tailored RTEMS 4.6.1 for space applications
  • 2014: Start of Symmetric Multiprocessing (SMP) support development
    ◮ Sponsored by ESA with two parallel projects (Gaisler/Airbus/OAR and SpaceBel/EB/UoP) and by other RTEMS users
  • 2017: State-of-the-art SMP support available as a result of this project (RTEMS 4.12)
    ◮ System initialization via constructors
    ◮ Scalable timer/timeout support
    ◮ Giant lock removal
    ◮ OMIP implementation

SLIDE 6

RTEMS Features - SMP Platforms

SMP Platforms

SPARC

  ◮ GR712RC
  ◮ GR740

PowerPC

  ◮ QorIQ (e.g. P1020, P2020, T2080, T4240, etc.)

ARMv7-A

  ◮ Altera Cyclone V
  ◮ Xilinx Zynq
  ◮ Raspberry Pi 2

Other (ARMv8, RISC-V, x86) - just ask for support

SLIDE 7

RTEMS Features - APIs

APIs

  • Classic
  • POSIX (pthreads)
  • C11 threads
  • C++11 threads
  • Newlib and GCC internal
  • Futex (synchronization via user-space atomic operations combined with futex system calls)

A broad range of standard software runs on RTEMS

SLIDE 8

RTEMS Features - Programming Languages/Compiler

Programming Languages

  • C/C++/OpenMP (RTEMS Source Builder, RSB)
  • Ada
  • Google Go
  • Fortran (RSB)
  • Erlang
  • Python and MicroPython

Available Compilers

  • GCC (default, best supported and recommended)
  • LLVM/clang (works, but currently not available via RSB)
  • Other (not out of the box)

SLIDE 9

RTEMS Features - Devices

Devices

  • Termios (serial interfaces)
  • I2C (Linux user-space API compatible)
  • SPI (Linux user-space API compatible)
  • Network stacks (legacy, libbsd, lwIP)
  • USB stack (libbsd)
  • SD/MMC card stack (libbsd)

libbsd

  • Port of FreeBSD user-space and kernel-space components to RTEMS
  • Easy access to FreeBSD software for RTEMS
  • Support for staying synchronized with FreeBSD

SLIDE 10

RTEMS Features - Basic Infrastructure

Basic Infrastructure

  • C11/C++11 thread-local storage
  • Lock-free timestamps (FreeBSD timecounters)
  • Scalable timer and timeout support
  • Link-time configuration (RTEMS is a library)
  • System initialization via constructors (linker sets, similar to global C++ constructors)

SLIDE 11

RTEMS Features - Schedulers and Locking Protocols

Clustered Scheduling

  • Independent scheduler instances for processor subsets (cache topology)
  • Flexible link-time configuration
  • Fixed-priority scheduler
  • Job-level fixed-priority scheduler (EDF)

Locking Protocols for Mutual Exclusion

  • Transitive priority inheritance
  • O(m) Independence-Preserving Protocol (OMIP)
  • Priority ceiling
  • Multiprocessor Resource-Sharing Protocol (MrsP)

SLIDE 12

What is new?

Symmetric Multiprocessing (SMP) Support for RTEMS

SMP machines consist of a set of processors (players) attached to a common memory (field). The operating system provides means to ensure fair play.

SLIDE 13

Why use SMP?

Solve the same problem faster: Amdahl's law

$$\mathrm{Speedup}(n) = \frac{1}{(1 - p) + \frac{p}{n}}$$

Solve a larger problem in the same time: Gustafson's law

$$\mathrm{Speedup}(n) = 1 - p + np$$

where p is the parallelizable fraction of the work and n is the processor count.

Special case: Time and Space Partitioning (TSP)

No reason for SMP

Simplify application development: you use SMP because you must
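As a small worked example, both laws can be evaluated directly. The C functions below are an illustrative sketch; their names are hypothetical and not part of RTEMS.

```c
/* Speedup for a fixed problem size (Amdahl's law).
   p: parallelizable fraction of the work (0..1), n: processor count. */
double amdahl_speedup(double p, unsigned n)
{
  return 1.0 / ((1.0 - p) + p / n);
}

/* Speedup when the problem size grows with the processor count
   (Gustafson's law). */
double gustafson_speedup(double p, unsigned n)
{
  return (1.0 - p) + (double)n * p;
}
```

For p = 0.95, Amdahl's law gives about 11.2 on 24 processors and can never exceed 1/(1 - p) = 20, however many processors are added, while Gustafson's law gives 0.05 + 0.95 · 24 = 22.85 on 24 processors if the problem size scales with them.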

SLIDE 14

RTEMS SMP Details

Topics

  • Timestamps
  • Timer/Timeout Support
  • System Initialization
  • Clustered Scheduling
  • Locking Protocols

Plot Data: Testsuite Results

All plots are generated with Python Matplotlib from data obtained from standard RTEMS testsuite results (XML).

SLIDE 15

Lock-Free Timestamps

[Figure: Timestamp Performance (Software Timecounter) and Timestamp Performance (Hardware Timecounter): operation count vs. active workers (1 to 24)]

void worker(void)
{
  while (true) {
    timestamp();
  }
}

  • Timestamps for uptime and wall clock time
  • Port of FreeBSD timecounters
  • Time synchronization via NTP and PPS possible
  • Timestamp performance obtained by the SPTIMECOUNTER 2 test program
  • Example platform: QorIQ T4240 running at 1.5 GHz
  • With the software timecounter: approximately 79 processor cycles per timestamp
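FreeBSD timecounters let readers obtain a timestamp without taking a lock: the writer publishes the clock state together with a generation number, and a reader retries if the generation changed while it was reading. The sketch below shows this generation-count protocol with C11 atomics under simplifying assumptions: a single state record instead of FreeBSD's ring of timehands, a hypothetical fixed conversion factor NS_PER_TICK, and a counter value passed in by the caller instead of read from hardware. All names are illustrative, not the RTEMS or FreeBSD API.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NS_PER_TICK 10u /* hypothetical: 100 MHz counter, 10 ns per tick */

/* Simplified clock state, loosely modeled on FreeBSD's timehands. */
struct timehands {
  atomic_uint generation;      /* 0 while an update is in progress */
  _Atomic uint64_t offset_ns;  /* time at the last update */
  _Atomic uint64_t last_count; /* counter value at the last update */
};

/* Reader: takes no lock and never delays the writer; it retries if an
   update started or completed while it was reading. */
uint64_t timehands_read_ns(struct timehands *th, uint64_t counter)
{
  unsigned gen;
  uint64_t ns;

  do {
    gen = atomic_load_explicit(&th->generation, memory_order_acquire);
    ns = atomic_load_explicit(&th->offset_ns, memory_order_relaxed) +
         (counter - atomic_load_explicit(&th->last_count, memory_order_relaxed)) *
             NS_PER_TICK;
    atomic_thread_fence(memory_order_acquire);
  } while (gen == 0 ||
           gen != atomic_load_explicit(&th->generation, memory_order_relaxed));

  return ns;
}

/* Writer (single updater, e.g. a clock tick handler): folds the elapsed
   ticks into the offset and publishes a new generation number. */
void timehands_update(struct timehands *th, uint64_t counter)
{
  unsigned gen = atomic_load_explicit(&th->generation, memory_order_relaxed);
  uint64_t off = atomic_load_explicit(&th->offset_ns, memory_order_relaxed);
  uint64_t last = atomic_load_explicit(&th->last_count, memory_order_relaxed);

  atomic_store_explicit(&th->generation, 0, memory_order_relaxed);
  atomic_thread_fence(memory_order_release);
  atomic_store_explicit(&th->offset_ns, off + (counter - last) * NS_PER_TICK,
                        memory_order_relaxed);
  atomic_store_explicit(&th->last_count, counter, memory_order_relaxed);

  if (++gen == 0) /* skip 0, which marks "update in progress" */
    gen = 1;
  atomic_store_explicit(&th->generation, gen, memory_order_release);
}
```

FreeBSD additionally rotates through several timehands records and switches a pointer after each update, so a reader almost never has to retry.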

SLIDE 16

Timer/Timeout Support

Timer

Perform an action at a certain time in the future. Timers usually expire.

Timeouts

Set time limits to actions. Timeouts hopefully expire rarely.

Timer/Timeout Implementations

Priority queues (expiration time as key), e.g. red-black tree

  ◮ O(log(n)) insert and cancel operations (n: active timer count)
  ◮ O(m · log(n)) expire operation (m: count of timers to expire)
  ◮ Used by RTEMS

Timer wheel (hash table)

  ◮ O(1) insert and cancel operations
  ◮ Unpredictable expiration operation runtime
  ◮ Used by the network stack
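To make the contrast concrete, here is a minimal timer wheel in C, the hash-table alternative listed above (RTEMS itself uses the red-black tree variant). It assumes a hypothetical tick-based time base, a fixed wheel size of 8 slots, and intrusive doubly linked lists. Insert and cancel touch only one slot, but every tick must scan its whole slot, including timers that belong to later wheel rounds; how long that scan takes depends on how the armed timers happen to hash, which is why the expiration runtime is hard to bound.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WHEEL_SIZE 8u /* hypothetical slot count */

struct timer {
  uint64_t expire;            /* absolute expiration tick */
  struct timer *next, *prev;  /* links within one slot list */
  bool armed;
  void (*fn)(struct timer *); /* optional callback, may be NULL */
};

struct wheel {
  struct timer *slot[WHEEL_SIZE];
  uint64_t now; /* current tick */
};

/* O(1): push onto the slot selected by the expiration tick.
   The caller must pass expire > w->now. */
void wheel_insert(struct wheel *w, struct timer *t, uint64_t expire)
{
  struct timer **head = &w->slot[expire % WHEEL_SIZE];

  t->expire = expire;
  t->prev = NULL;
  t->next = *head;
  if (*head != NULL)
    (*head)->prev = t;
  *head = t;
  t->armed = true;
}

/* O(1): unlink from the slot list; cancelling an unarmed timer is a no-op. */
void wheel_cancel(struct wheel *w, struct timer *t)
{
  if (!t->armed)
    return;
  if (t->prev != NULL)
    t->prev->next = t->next;
  else
    w->slot[t->expire % WHEEL_SIZE] = t->next;
  if (t->next != NULL)
    t->next->prev = t->prev;
  t->next = t->prev = NULL;
  t->armed = false;
}

/* Advance one tick and fire the due timers; returns how many fired.
   The whole slot is scanned, including timers due in later rounds. */
unsigned wheel_tick(struct wheel *w)
{
  unsigned fired = 0;
  struct timer *t;

  w->now++;
  t = w->slot[w->now % WHEEL_SIZE];
  while (t != NULL) {
    struct timer *next = t->next;

    if (t->expire == w->now) {
      wheel_cancel(w, t);
      if (t->fn != NULL)
        t->fn(t);
      ++fired;
    }
    t = next;
  }
  return fired;
}
```

For example, timers due at ticks 3 and 11 share slot 3; the tick-3 scan fires the first and skips the second, which stays in the slot until tick 11.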

SLIDE 17

Timer Support - Scalable with Active Timer Count

[Figure: Timer Operation Performance (T4240) and Timer Operation Performance (GR740): timer insert and cancel time (µs) vs. active timers (log scale); series: Earliest, Middle, and Latest Expiration Time]

  • Timer implementation based on red-black trees
  • Timer performance obtained by the TMTIMER 1 test program
  • Example platform: QorIQ T4240 running at 1.5 GHz (left)
  • Example platform: GR740 running at 250 MHz (right)

SLIDE 18

Timer Support - Scalable with Processor Count

Per-Processor Timer Maintenance

  • Each processor has its own data set to maintain timers
  • Thread operation timeouts use the current processor
  • Timers use a dedicated processor, set during timer creation

SLIDE 19

System Initialization via Constructors (1)

Standard System Initialization without Constructors

void system_init(void)
{
  init_subsystem_a();
  init_subsystem_b();
  init_subsystem_c();
  init_subsystem_d();
  init_subsystem_e();
}

Disadvantage

In case a subsystem is not required by the application, it is still initialized

SLIDE 20

System Initialization via Constructors (2)

System Initialization via Constructors

void system_init(void)
{
  constructor *c = constructor_begin;

  while (c != constructor_end) {
    (*c->init)();
    ++c;
  }
}

Subsystem X

void subsystem_x_init(void)
{
  /* Some init stuff */
}

REGISTER_CONSTRUCTOR(subsystem_x_init, ORDER_X);

Advantage

Only subsystems used by the application are initialized and present in the executable

Disadvantage

Requires linker and object file format support (mitigated by its use in major software systems, e.g. C++, Linux, FreeBSD)
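A runnable approximation of the constructor scheme above can be built with the GCC/Clang section attribute on an ELF target, where GNU ld and lld provide __start_<section> and __stop_<section> symbols for sections whose names are valid C identifiers. This is a sketch, not the RTEMS implementation: the section name mysysinit, the struct sysinit_item layout, and this REGISTER_CONSTRUCTOR definition are hypothetical, and the entries are sorted by their order key at run time with qsort(), whereas a production system would more likely order them at link time.

```c
#include <stdlib.h>

/* Hypothetical linker-set entry: initialization function plus order key. */
struct sysinit_item {
  int order;
  void (*init)(void);
};

/* Put a pointer to each item into the section "mysysinit"; "used" keeps
   the otherwise unreferenced variable from being discarded. */
#define REGISTER_CONSTRUCTOR(fn, ord)                   \
  static struct sysinit_item fn##_item = { (ord), fn }; \
  static struct sysinit_item *fn##_ptr                  \
      __attribute__((used, section("mysysinit"))) = &fn##_item

/* Provided by the linker: bounds of the "mysysinit" section. */
extern struct sysinit_item *__start_mysysinit[];
extern struct sysinit_item *__stop_mysysinit[];

static int compare_order(const void *a, const void *b)
{
  const struct sysinit_item *x = *(struct sysinit_item *const *)a;
  const struct sysinit_item *y = *(struct sysinit_item *const *)b;

  return (x->order > y->order) - (x->order < y->order);
}

/* Run every registered initializer in ascending order key. */
void system_init(void)
{
  size_t n = (size_t)(__stop_mysysinit - __start_mysysinit);

  qsort(__start_mysysinit, n, sizeof(__start_mysysinit[0]), compare_order);
  for (struct sysinit_item **c = __start_mysysinit; c != __stop_mysysinit; ++c)
    (*c)->init();
}

/* Example subsystems that record the order in which they were initialized. */
char init_log[4];
static int init_len;

static void subsystem_c_init(void) { init_log[init_len++] = 'c'; }
static void subsystem_a_init(void) { init_log[init_len++] = 'a'; }
static void subsystem_b_init(void) { init_log[init_len++] = 'b'; }

REGISTER_CONSTRUCTOR(subsystem_c_init, 30);
REGISTER_CONSTRUCTOR(subsystem_a_init, 10);
REGISTER_CONSTRUCTOR(subsystem_b_init, 20);
```

When such registrations live in a static library, an object file that the application never references is not linked, so its table entry disappears together with the subsystem. That is the advantage stated above, and the dependence on linker-provided symbols is the stated disadvantage.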

SLIDE 21

Low-Level Synchronization - SMP Locks

[Figure: SMP Lock Performance (operation count vs. active workers, 1 to 24) and SMP Lock Fairness (normed coefficient of variation vs. active workers, 2 to 24, log scale); series: Ticket Lock, MCS Lock, TAS Lock, TTAS Lock]

Several options exist for low-level synchronization in SMP systems:

  • Test-and-set (TAS) locks
  • Test and test-and-set (TTAS) locks
  • Ticket locks
  • Mellor-Crummey and Scott (MCS) locks

SMP lock performance obtained by the SMPLOCK 1 test program. Example platform: QorIQ T4240 running at 1.5 GHz.

Basic Requirement: FIFO Fairness

The ticket lock was selected as the standard SMP lock for RTEMS SMP
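A ticket lock grants the lock in the order in which processors requested it, which provides the FIFO fairness required above. The following C11 sketch shows the idea; it is not the RTEMS SMP lock code, the type and function names are hypothetical, and it omits the back-off, profiling, and interrupt handling a real kernel lock would need.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
  atomic_uint next_ticket; /* ticket handed to the next acquirer */
  atomic_uint now_serving; /* ticket currently allowed to enter */
} ticket_lock;

#define TICKET_LOCK_INITIALIZER { 0, 0 }

void ticket_lock_acquire(ticket_lock *lock)
{
  /* Draw a ticket: fetch-and-add gives every acquirer a unique number,
     so waiters are served in arrival (FIFO) order. */
  unsigned my_ticket =
      atomic_fetch_add_explicit(&lock->next_ticket, 1, memory_order_relaxed);

  /* Spin until it is our turn; acquire pairs with the release below. */
  while (atomic_load_explicit(&lock->now_serving, memory_order_acquire) !=
         my_ticket) {
    /* busy wait */
  }
}

void ticket_lock_release(ticket_lock *lock)
{
  /* Only the owner writes now_serving, so load + store is sufficient. */
  unsigned next =
      atomic_load_explicit(&lock->now_serving, memory_order_relaxed) + 1;

  atomic_store_explicit(&lock->now_serving, next, memory_order_release);
}

/* Snapshot only; the answer may be stale as soon as it is returned. */
bool ticket_lock_is_locked(ticket_lock *lock)
{
  return atomic_load(&lock->next_ticket) != atomic_load(&lock->now_serving);
}
```

Because all waiters spin on the same now_serving word, every release causes cache traffic to all waiting processors; MCS locks avoid this by letting each waiter spin on its own queue node, at the price of a more complex implementation.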

SLIDE 22

Clustered Scheduling (1)

Clustered Scheduling

Independent scheduler instances for pair-wise disjoint processor subsets
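One way to picture the pair-wise disjoint condition is as processor bit masks, one per scheduler instance: no bit may appear in two masks. The check below is a hypothetical helper for illustration only; in RTEMS the processor-to-scheduler assignment is part of the link-time configuration, and this function is not part of its API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if no processor (bit) belongs to two scheduler instances.
   cpu_masks[i] is the set of processors owned by scheduler instance i. */
bool clusters_pairwise_disjoint(const uint32_t *cpu_masks, size_t count)
{
  uint32_t seen = 0; /* union of all masks checked so far */

  for (size_t i = 0; i < count; ++i) {
    if ((seen & cpu_masks[i]) != 0)
      return false; /* some processor is already owned by another instance */
    seen |= cpu_masks[i];
  }
  return true;
}
```

For example, a four-processor system split into two instances of two processors each uses the masks 0x3 and 0xc, which pass the check; 0x3 and 0x6 fail because processor 1 appears in both.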

SLIDE 23

Clustered Scheduling (2)

Advantages

  • Keep worst-case execution time (WCET) under control: SMP lock FIFO fairness ⇒ WCET increases linearly with the processor count
  • Scheduler instances based on cache topology minimize thread migration overhead (important for priority-based schedulers)
  • Optimal choice of scheduler algorithms
  • Easier implementation compared to schedulers with local run queues and load balancing

Disadvantage

Thread assignment to scheduler instance is a system design decision (bin-packing problem)
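Because thread-to-instance assignment is a bin-packing problem, simple heuristics are a common starting point. The sketch below applies first-fit decreasing: threads are sorted by utilization and each one goes to the first scheduler instance with enough remaining capacity. The utilization model (integer percent of one processor, 100 per processor of an instance) and all names are assumptions for illustration; a real design would use the schedulability test of the chosen scheduler algorithm instead of a plain utilization sum.

```c
#include <stdlib.h>

/* Hypothetical thread description: id in 0..n-1, utilization in percent. */
struct thread_load {
  int id;
  int util;
};

static int by_util_desc(const void *a, const void *b)
{
  const struct thread_load *x = a, *y = b;

  return (y->util > x->util) - (y->util < x->util);
}

/* First-fit decreasing.  capacity[k] is the remaining capacity of scheduler
   instance k and is reduced in place; cluster_of[id] receives the chosen
   instance.  Sorts threads[] in place.  Returns 0 on success, -1 if some
   thread fits nowhere. */
int assign_first_fit_decreasing(struct thread_load *threads, size_t n,
                                int *capacity, size_t m, int *cluster_of)
{
  qsort(threads, n, sizeof threads[0], by_util_desc);
  for (size_t i = 0; i < n; ++i) {
    size_t k = 0;

    while (k < m && capacity[k] < threads[i].util)
      ++k;
    if (k == m)
      return -1;
    capacity[k] -= threads[i].util;
    cluster_of[threads[i].id] = (int)k;
  }
  return 0;
}
```

With two single-processor instances (capacity 100 each) and utilizations 30, 60, 40, and 50 for threads 0 to 3, the 60 % and 40 % threads end up on instance 0 and the 50 % and 30 % threads on instance 1, leaving 0 and 20 percent spare.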

SLIDE 24

Locking Protocols for Mutual Exclusion (1)

Clustered Scheduling

Temporary thread migration is required to minimize latency

SLIDE 25

Locking Protocols for Mutual Exclusion (2)

M0 --owner--> T0(P0)

Mutex M0 with owner thread T0 (thread priority P0)

SLIDE 26

Locking Protocols for Mutual Exclusion (2)

T1(P1) --wait--> M0 --owner--> T0(P0, P1)

Mutex M0 with owner thread T0 and priority inheritance due to waiting thread T1

SLIDE 27

Locking Protocols for Mutual Exclusion (2)

T2(P2) --wait--> M1 --owner--> T1(P1, P2) --wait--> M0 --owner--> T0(P0, P1)

Non-transitive priority inheritance: thread priority P2 is not propagated to thread T0

SLIDE 28

Locking Protocols for Mutual Exclusion (2)

T2(P2) --wait--> M1 --owner--> T1(P1, P2) --wait--> M0 --owner--> T0(P0, P1, P2)

Transitive priority inheritance: thread priority P2 is propagated to thread T0 via thread T1
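The difference between the two inheritance variants can be reproduced with a small C model of the scenario above (T0 owns M0, T1 owns M1 and waits for M0, T2 waits for M1). The sketch is a uniprocessor simplification with hypothetical types: each thread keeps only its single highest priority rather than the priority set shown in the diagrams, lower numbers mean higher priority (as in the RTEMS Classic API), wait-for cycles (deadlocks) are assumed absent, and priority restoration on mutex release is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

struct mutex;

struct thread {
  int priority;             /* current, possibly inherited; lower = higher */
  struct mutex *waiting_on; /* mutex this thread is blocked on, or NULL */
};

struct mutex {
  struct thread *owner;
};

/* Block t on m and let the owner inherit t's priority.  With transitive
   set, the priority is propagated along the chain of blocked owners. */
void block_on(struct thread *t, struct mutex *m, bool transitive)
{
  int prio = t->priority;

  t->waiting_on = m;
  while (m != NULL && m->owner != NULL) {
    struct thread *owner = m->owner;

    if (owner->priority <= prio)
      break; /* owner already runs at least this urgently */
    owner->priority = prio;
    if (!transitive)
      break; /* non-transitive: stop at the direct owner */
    m = owner->waiting_on; /* the owner may itself be blocked */
  }
}

/* Scenario of the slides with P0 = 3, P1 = 2, P2 = 1.
   Returns the resulting priority of T0. */
int demo_priority_chain(bool transitive)
{
  struct thread t0 = { 3, NULL }, t1 = { 2, NULL }, t2 = { 1, NULL };
  struct mutex m0 = { &t0 }, m1 = { &t1 };

  block_on(&t1, &m0, transitive); /* T1 waits for M0 owned by T0 */
  block_on(&t2, &m1, transitive); /* T2 waits for M1 owned by T1 */
  return t0.priority;
}
```

In this model the transitive variant raises T0 to priority 1 (P2), while the non-transitive variant leaves T0 at priority 2 (P1).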

SLIDE 29

Locking Protocols for Mutual Exclusion (2)

T2(P2) --wait--> M1 --owner--> T1(P1, P2) --wait--> M0 --owner--> T0(P0, P1, P2)

Transitive priority inheritance and clustered scheduling with three scheduler instances (magenta, red, and blue). Thread T0 has access to all three scheduler instances while owning mutex M0.

Implementation Challenge: Fine Grained Locking

Synchronization objects, threads, and schedulers have dedicated SMP locks.

SLIDE 30

Locking Protocols for Mutual Exclusion (3)

O(m) Independence-Preserving Protocol (OMIP)

  • Generalization of transitive priority inheritance to clustered scheduling
  • Suitable for general-purpose libraries

Multiprocessor Resource-Sharing Protocol (MrsP)

  • Generalization of priority ceiling to clustered scheduling
  • User must specify ceiling priorities per scheduler instance
  • Protocol design had schedulability analysis in mind

SLIDE 31

Fine Grained Locking

[Figure: Uncontested Mutex Performance: total operation count (series: Self-Contained Mutex, Classic Mutex) and individual operation counts vs. active workers (1 to 24)]

void worker(void)
{
  mutex mtx;

  while (true) {
    mtx.acquire();
    mtx.release();
  }
}

Each synchronization object (mutex, message queue, counting semaphore, etc.) has its own SMP lock

  • Very important for average-case performance
  • Mutex performance obtained by the TMFINE 1 test program
  • Example platform: QorIQ T4240 running at 1.5 GHz
  • Classic API objects are subject to false cache line sharing
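False cache line sharing occurs when independent objects stored next to each other in a table share a cache line, so acquiring one object's lock evicts that line from the caches of processors working on a neighbor. The C11 sketch below contrasts such a table entry with a self-contained object aligned to its own cache line. The 64-byte line size and all names are assumptions; the actual line size depends on the target processor.

```c
#include <stdalign.h>
#include <stdatomic.h>

#define CACHE_LINE_SIZE 64 /* assumption: check the target processor */

/* Objects allocated back-to-back in a table: several share one cache line,
   so locking one object disturbs processors that use its neighbors. */
struct table_object {
  atomic_flag lock;
  unsigned counter;
};

/* Self-contained variant: each object starts on its own cache line. */
struct padded_object {
  alignas(CACHE_LINE_SIZE) atomic_flag lock;
  unsigned counter;
};

/* Simple test-and-set acquire/release on the object's own lock. */
void padded_acquire(struct padded_object *o)
{
  while (atomic_flag_test_and_set_explicit(&o->lock, memory_order_acquire)) {
    /* busy wait */
  }
}

void padded_release(struct padded_object *o)
{
  atomic_flag_clear_explicit(&o->lock, memory_order_release);
}
```

The padding trades memory for independence: an array of padded_object never places two locks in the same cache line.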

SLIDE 32

OpenMP

OpenMP

  • Compiler-supported parallelization using a fork-join model
  • OpenMP 4.5 support via the GCC-provided libgomp
  • Highly optimized RTEMS configuration of libgomp
  • Uses the libgomp barrier implementation for Linux, based on the futex system call
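A minimal fork-join example: the parallel region forks a team of threads, the loop iterations are divided among them, and the reduction clause joins the partial sums at the end of the region. This is generic OpenMP C code, assumed to be built with GCC and -fopenmp; the RTEMS-specific build and configuration steps are not shown.

```c
/* Sum of i * i for i = 1..n, computed by an OpenMP thread team.
   Without -fopenmp the pragma is ignored and the loop runs serially,
   producing the same result. */
long sum_of_squares(int n)
{
  long sum = 0;

#pragma omp parallel for reduction(+ : sum)
  for (int i = 1; i <= n; ++i)
    sum += (long)i * i;

  return sum;
}
```

Each thread accumulates into a private copy of sum, so no lock is needed inside the loop; libgomp combines the copies when the team joins.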

SLIDE 33

Embedded Multicore Building Blocks (EMB2)/MTAPI

EMB2

Set of C/C++ libraries providing:

  ◮ Task management
  ◮ Dataflow
  ◮ Algorithms
  ◮ Containers

  • Initially designed for embedded systems
  • 2-clause BSD license
  • Developed and used by Siemens
  • Fully supported by RTEMS

Multicore Task Management API (MTAPI)

  • Open source reference implementation contained in EMB2
  • Custom implementation available from Gaisler

SLIDE 34

Status and Future Work

Status

  • RTEMS SMP is the result of test-driven development (the RTEMS testsuite contains more than 600 test programs)
  • RTEMS 4.12 release is planned for Q2-Q3 2017
  • RTEMS SMP is available on the GR712RC and GR740
  • Used in production systems on Altera Cyclone V, Xilinx Zynq, and QorIQ T4240

Next Step

Space qualification according to ECSS standards (potential GSTP G617-254SW, maybe available in 2019).

SLIDE 35

Questions or Lunch?
