
Design and Implementation of a TriCore Backend for the LLVM Compiler Framework

Studienarbeit (student research project)
Christoph Erhardt

Friedrich-Alexander-Universität Erlangen-Nürnberg

November 20, 2009

A TriCore Backend for LLVM (November 20, 2009) 1 – 25


Overview

  • Overview
  • The TriCore Processor Architecture
  • The LLVM Compiler Infrastructure
  • Design and Implementation of the Backend
  • Evaluation & Conclusion


Motivation

What do we need it for?

TriCore chips are omnipresent around here:

  • Quadcopter
  • High striker
  • Carolo Cup
  • ...

The RTSC (Real-Time Systems Compiler) project:

  • Operating system aware compiler for real-time applications
  • Processes atomic basic blocks
  • Based on LLVM

RTSC should be able to generate TriCore machine code


The TriCore Processor Architecture

Overview

Three-in-one architecture

  • Real-time microcontroller unit
  • DSP
  • Superscalar RISC processor

Basic features

  • Load/store architecture
  • 32-bit data, address, and instruction words
  • Some special 16-bit instruction words for higher code density
  • Little-endian byte order
  • 16 data + 16 address registers


Peculiarities

Some things that TriCore handles in an unusual way

  • Strict distinction between data and address registers:
      • Also reflected in the calling conventions
      • Serious problem for the compiler!
  • Data registers are also used for floating-point operands
  • Special DSP-oriented instructions and addressing modes
  • Task/context model:
      • Automatic context save/restore upon call/return
      • Context save areas (linked lists managed by hardware)
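
The task/context model can be pictured as two singly linked lists that share one pool of save areas: a free list and a list of saved contexts. The following is a minimal Python sketch of that idea, not a model of the actual hardware. The names `ContextPool`, `free_head`, and `prev_head` are made up for illustration, and a saved context is reduced to a plain dictionary.

```python
class ContextPool:
    """Toy model of hardware-managed context save areas (CSAs).

    All names are hypothetical; real hardware keeps the list heads in
    dedicated registers and stores fixed-size register blocks.
    """

    def __init__(self, size):
        # link[i] is the index of the next CSA in whichever list i is on.
        self.link = [i + 1 for i in range(size)]
        self.link[-1] = None
        self.saved = [None] * size
        self.free_head = 0      # head of the free list
        self.prev_head = None   # head of the list of saved contexts

    def call(self, registers):
        """Save the caller's context: move one CSA from free to saved list."""
        csa = self.free_head
        if csa is None:
            raise RuntimeError("free context list exhausted")
        self.free_head = self.link[csa]
        self.saved[csa] = dict(registers)
        self.link[csa] = self.prev_head
        self.prev_head = csa

    def ret(self):
        """Restore the most recently saved context and free its CSA."""
        csa = self.prev_head
        if csa is None:
            raise RuntimeError("no saved context to restore")
        self.prev_head = self.link[csa]
        registers, self.saved[csa] = self.saved[csa], None
        self.link[csa] = self.free_head
        self.free_head = csa
        return registers


pool = ContextPool(size=4)
pool.call({"d8": 1, "a12": 0x1000})   # outer call
pool.call({"d8": 2, "a12": 0x2000})   # nested call
inner = pool.ret()
outer = pool.ret()
```

Because both lists are updated on every call and return, the compiler does not have to emit explicit save/restore code for the registers that belong to the saved context.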


The LLVM Compiler Infrastructure

Overview

  • Open-source compiler infrastructure project started in 2000
  • Main sponsor: Apple Inc.
  • Written in C++


Basic Architecture

The classical three tiers of a compiler

[Figure: C and Fortran sources pass through language-specific frontends (LLVM-GCC, Clang, ...) to LLVM assembly/bitcode, through the optimizer to LLVM assembly/bitcode again, and then through per-target code generators (x86, SPARC, ...) to x86 assembly, SPARC assembly, ...]

  • Language-specific frontends
  • Optimizer: generic IR, analysis/transformation passes
  • Several backends for machine code generation


Unique Characteristics

What does LLVM have that others don’t?

  • Not merely a compiler, but a compiler infrastructure:
      • Static compilation
      • Just-in-time compilation
  • Strictly modular, library-based architecture:
      • Easily extendible
      • Possibility to incorporate parts of LLVM in other projects
  • BSD-style licence
  • Produces highly optimized machine code in an efficient way:
      • Memory-efficient
      • Time-efficient


Design and Implementation of the Backend

Overview

  • Extensive generic code generation framework:
      • Makes work a lot easier ...
      • ... but also poses some problems in specific cases
  • Fixed class hierarchy
  • Many target-independent algorithms:
      • Instruction scheduling
      • Register colouring
      • ...
  • Code generation process executed by a series of passes


Code Generation Process

[Figure: the code generation pipeline, with the data passed between stages]

  LLVM code (SSA form)
    → DAG Lowering → DAGs, not legalized
    → DAG Legalization → DAGs, legalized
    → Instruction Selection → DAGs of native instructions
    → Scheduling → list in SSA form
    → SSA-Based Optimization → list in SSA form
    → Register Allocation → list with physical registers
    → Post-Allocation Passes → list with physical registers
    → Pro-/Epilogue Insertion → list with resolved stack references
    → Peephole Optimization → list with resolved stack references
    → Assembly Printing → text assembly code

Backend classes shown next to the stages: TriCoreTargetLowering, TriCoreDAGToDAGISel, TriCoreInstrInfo, TriCoreRegisterInfo, TriCoreVirtInstrResolver, TriCoreLoadStoreOpt, TriCoreAsmPrinter


TableGen

One tool to rule them all...

Problem

  • Backend contains large portions of descriptive data
  • C++ obviously not suitable

TableGen

  • Language for domain-specific modelling
  • Similar to object-oriented approach:
      • Classes, records (objects), attributes
      • Inheritance
  • Definition files (.td) preprocessed by tblgen tool → Auto-generation of C++ code
  • Used for description of:
      • Subtargets, registers
      • Calling conventions
      • Instruction set
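
The idea of keeping descriptive data in records and generating C++ from them can be illustrated with a small Python sketch. This is neither TableGen syntax nor tblgen's actual output: the record fields, the `AR` class name, and the emitted `RegDesc` table are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Register:
    """Base 'class' of the toy description: every register has a name."""
    name: str
    encoding: int


@dataclass
class DataReg(Register):
    """Specialised record kind; inherits the fields of Register."""
    reg_class: str = "DR"


@dataclass
class AddrReg(Register):
    """Second record kind with a different default attribute."""
    reg_class: str = "AR"


def emit_cpp(records):
    """Turn the records into the text of a C++ table (format is made up)."""
    lines = ["static const RegDesc Regs[] = {"]
    for r in records:
        lines.append(f'  {{ "{r.name}", {r.encoding}, {r.reg_class} }},')
    lines.append("};")
    return "\n".join(lines)


records = [DataReg(f"D{i}", i) for i in range(2)]
records += [AddrReg(f"A{i}", i) for i in range(2)]
print(emit_cpp(records))
```

The point of the split is that the description stays declarative, while everything repetitive is produced by the generator.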


SelectionDAG Construction

Largely automated

Directed acyclic graph

  • Per basic block
  • Nodes: instructions
  • Edges:
      • Data dependencies
      • Control flow dependencies

Example

    %mul = mul i32 %a, %a
    %mul4 = mul i32 %b, %b
    %add = add nsw i32 %mul4, %mul
    ret i32 %add

[Figure: SelectionDAG "isel input for euclidSquare:entry". An EntryToken feeds two CopyFromReg nodes that read %reg1024 and %reg1025; their values go into two mul nodes, whose results feed an add; a CopyToReg writes the sum to %D2, and TriCoreISD::RET_FLAG is the graph root.]
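
How a basic block turns into a graph can be sketched in a few lines of Python. This is a simplified illustration that only tracks data dependencies; the chain edges that order side effects in a real SelectionDAG are left out, and `Node` and `build_dag` are hypothetical names, not LLVM classes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    opcode: str
    operands: list = field(default_factory=list)


def build_dag(block):
    """Build a data-dependency DAG from (result, opcode, operand names) tuples.

    Names with no defining instruction in the block become 'input' leaves.
    Returns the node of the last instruction, which acts as the root.
    """
    defs = {}
    root = None
    for result, opcode, operand_names in block:
        operands = []
        for name in operand_names:
            if name not in defs:
                defs[name] = Node(f"input {name}")
            operands.append(defs[name])
        root = Node(opcode, operands)
        if result is not None:
            defs[result] = root
    return root


# The basic block from the euclidSquare example.
block = [
    ("%mul", "mul", ["%a", "%a"]),
    ("%mul4", "mul", ["%b", "%b"]),
    ("%add", "add", ["%mul4", "%mul"]),
    (None, "ret", ["%add"]),
]
root = build_dag(block)
```

Each value is represented by exactly one node, so `%a` and `%b` are shared by both operands of their respective multiplications. That sharing is what makes the structure a DAG rather than a tree.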


Troubles

The integer vs. pointer problem

Problem

  • TriCore strictly distinguishes between addresses and data integers
  • Have to be put into separate register files → calling conventions!
  • LLVM’s backend framework treats pointers just like integers...

Solution

  • Annotation of “pointer / no pointer” flag in value type class
  • Propagation of this flag throughout the DAG construction phase (required some hacks...)
  • Case distinctions in all relevant situations
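
A rough Python sketch of the flag-based approach follows. It is only an illustration of the principle: `ValueType`, `register_class`, and `result_type` are hypothetical names, the register class name `AR` is an assumption (only `DR` appears in the instruction description shown in this deck), and the propagation rule is deliberately simplistic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueType:
    bits: int
    is_pointer: bool = False   # the extra "pointer / no pointer" flag


def register_class(vt):
    """Pick a register file from the value type (DR = data, AR = address)."""
    return "AR" if vt.is_pointer else "DR"


def result_type(opcode, operand_types):
    """Propagate the pointer flag through a node while the DAG is built.

    Hypothetical rule: adding something to a pointer yields a pointer,
    every other operation yields plain data.
    """
    bits = max(vt.bits for vt in operand_types)
    if opcode == "add" and any(vt.is_pointer for vt in operand_types):
        return ValueType(bits, is_pointer=True)
    return ValueType(bits)


i32 = ValueType(32)
ptr = ValueType(32, is_pointer=True)
```

With the flag in place, two values of the same width can still end up in different register files, which is exactly the case distinction the calling conventions need.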


Instruction Selection

Largely auto-generated

[Figure: SelectionDAG "isel input for euclidSquare:entry", containing two mul nodes that feed an add, a CopyToReg into %D2, and TriCoreISD::RET_FLAG as the graph root.]

Pattern matching →

    def MULrr2 : Rr2Instr<0x0a, (outs DR:$c), (ins DR:$a, DR:$b),
                          "mul\t$c, $a, $b",
                          [(set DR:$c, (mul DR:$a, DR:$b))]>;

[Figure: SelectionDAG "scheduler input for euclidSquare:entry". One mul has been selected as MULrr2, the other mul and the add have been combined into a single MADDrrr2, and TriCoreISD::RET_FLAG has become RETsys.]
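
The effect of pattern matching on this example, where one multiplication is folded into a multiply-add, can be imitated with a small tree matcher. This Python sketch is not how the generated selector works internally; the tuple encoding and the names `MADD`, `MUL`, and `ADD` are simplifications chosen for the example.

```python
def select(node):
    """Select instructions for a tiny expression tree, bottom-up.

    A node is either a string (a register name) or a tuple
    (opcode, lhs, rhs). Larger patterns are tried before smaller ones.
    """
    if isinstance(node, str):
        return node
    opcode, lhs, rhs = node
    # Pattern: (add x, (mul a, b))  ->  multiply-add
    if opcode == "add":
        for acc, prod in ((lhs, rhs), (rhs, lhs)):
            if isinstance(prod, tuple) and prod[0] == "mul":
                return ("MADD", select(acc), select(prod[1]), select(prod[2]))
    # Fallback patterns: one native instruction per generic node
    if opcode in ("add", "mul"):
        return (opcode.upper(), select(lhs), select(rhs))
    raise ValueError(f"no pattern matches {opcode!r}")


# add(mul(b, b), mul(a, a)), as in the euclidSquare example
dag = ("add", ("mul", "b", "b"), ("mul", "a", "a"))
selected = select(dag)
```

Trying the larger pattern first is what lets the add and one of the multiplications collapse into a single instruction, leaving only one separate multiply.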


Scheduling & Register Allocation

Target-independent algorithms

Scheduling

  • DAGs → list (SSA form)
  • Target-independent algorithm using data from the instruction description table

Register Allocation

  • Virtual registers → physical registers
  • SSA deconstruction
  • Target-independent colouring algorithm using the register information table
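
To make the term "colouring" concrete, here is a generic greedy graph-colouring sketch in Python. It is a textbook-style illustration, not LLVM's allocator: the interference graph is given by hand, spilling is not handled, and the function name `colour` is made up.

```python
def colour(interference, registers):
    """Greedy colouring: give each virtual register a physical register
    that none of its already-coloured neighbours uses.

    interference maps each virtual register to the set of virtual
    registers that are live at the same time.
    """
    assignment = {}
    # Handle the most constrained nodes first; ties are broken by name.
    order = sorted(interference, key=lambda v: (-len(interference[v]), v))
    for vreg in order:
        taken = {assignment[n] for n in interference[vreg] if n in assignment}
        free = [r for r in registers if r not in taken]
        if not free:
            raise RuntimeError(f"{vreg} would have to be spilled")
        assignment[vreg] = free[0]
    return assignment


interference = {
    "v0": {"v1", "v2"},
    "v1": {"v0"},
    "v2": {"v0"},
}
assignment = colour(interference, registers=["d8", "d9"])
```

Here `v1` and `v2` never interfere with each other, so they can share `d9`, and two physical registers suffice for three virtual ones.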


Final passes

Manual work

Virtual Instruction Resolution

  • Some instructions (e.g., moves) had operands of unknown register classes at the time of their creation
  • Now that physical registers have been assigned, these instructions can be resolved
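
A sketch of what such a resolution step might look like, in Python. The placeholder opcode `VMOV` and the helper names are hypothetical, and the mnemonic table reflects my reading of the TriCore instruction set (`mov`, `mov.a`, `mov.d`, `mov.aa`); it should be checked against the architecture manual before being relied on.

```python
# Move opcode by (destination class, source class); D = data, A = address.
MOVE_OPCODES = {
    ("D", "D"): "mov",
    ("A", "D"): "mov.a",
    ("D", "A"): "mov.d",
    ("A", "A"): "mov.aa",
}


def reg_class(reg):
    """Classify a physical register by its name, e.g. '%d8' or '%a4'."""
    return reg.lstrip("%")[0].upper()


def resolve(instructions):
    """Replace each placeholder ('VMOV', dst, src) with a concrete move."""
    resolved = []
    for opcode, dst, src in instructions:
        if opcode == "VMOV":
            opcode = MOVE_OPCODES[(reg_class(dst), reg_class(src))]
        resolved.append((opcode, dst, src))
    return resolved


code = [("VMOV", "%d2", "%d8"), ("VMOV", "%a4", "%d9"), ("VMOV", "%d3", "%a15")]
resolved = resolve(code)
```

The step has to run after register allocation because only then is it known which register file each operand ended up in.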

Pro-/Epilogue Insertion

  • Insertion of prologue/epilogue code at the entry and exits of all functions
  • Virtual stack slots → physical stack frame references


Code Emission

Partly auto-generated

Peephole Optimization

Merging of two consecutive 32-bit loads/stores into a single 64-bit load/store

    # Before:
    st.w [%a10]4, %d9
    st.w [%a10]0, %d8
    # After:
    st.d [%a10]0, %e8
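
The store-merging rule can be approximated on assembly text with a short Python sketch. It is a simplified stand-in for the actual pass: it only handles `st.w` with the exact operand syntax shown here, assumes that `%e(2n)` names the pair `%d(2n)`/`%d(2n+1)` with the even register at the lower address, and leaves out alignment and other legality checks.

```python
import re

STORE = re.compile(r"st\.w \[(%a\d+)\](\d+), %d(\d+)$")


def merge_stores(lines):
    """Fuse adjacent word stores of an even/odd register pair into st.d.

    Simplified conditions: same base register, the even register d(2n)
    at offset k, and the odd register d(2n+1) at offset k + 4.
    """
    out = []
    i = 0
    while i < len(lines):
        first = STORE.match(lines[i])
        second = STORE.match(lines[i + 1]) if i + 1 < len(lines) else None
        if first and second and first.group(1) == second.group(1):
            # Map register number -> offset for the two candidate stores.
            stores = {int(m.group(3)): int(m.group(2)) for m in (first, second)}
            low = min(stores)
            if (low % 2 == 0 and low + 1 in stores
                    and stores[low + 1] == stores[low] + 4):
                out.append(f"st.d [{first.group(1)}]{stores[low]}, %e{low}")
                i += 2
                continue
        out.append(lines[i])
        i += 1
    return out


before = ["st.w [%a10]4, %d9", "st.w [%a10]0, %d8"]
after = merge_stores(before)
```

Pairs that do not line up, for example `%d9` at offset 0 and `%d8` at offset 4, are left untouched.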

Assembly Printing

  • Output of assembly code in text form
  • Large parts auto-generated from the instruction description table


In practice

How do I use it?

Required software

  • Clang compiler frontend (support has been integrated)
  • GNU Binutils for TriCore:
      • Assembler
      • Linker
  • Headers and libraries from TriCore-GCC
  • Small Perl wrapper script


Comparison to GCC

Criteria

  1. Compilation speed
  2. Code size
  3. Code performance

Testing system

  • Benchmark application: CoreMark
  • Compilation PC: Core 2 Quad Q6600 (2.40 GHz)
  • Runtime system: TC1796 board (40 MHz)


The CoreMark Benchmark

  • Alternative to the well-known Dhrystone benchmark
  • C source code publicly available (albeit not open source software)
  • Easily portable
  • Prevents compilers from “cheating” by optimizing away unused computation results
  • Operations:
      • Linked list processing
      • Matrix manipulation
      • State machine operations
      • CRC computation
  • Results can be validated


Compilation Time

[Figure: bar chart of compilation time in milliseconds (axis up to 500) for GCC and LLVM at optimization levels -O0, -O1, -O2, -O3, and -Os.]

  • LLVM takes about 10 % less time than GCC
  • Even faster relative to GCC when compiling at -O0


Code Size

[Figure: bar chart of text segment size in KiB (axis up to 30) for GCC and LLVM at optimization levels -O0, -O1, -O2, -O3, and -Os.]

  • LLVM generates slightly smaller code than GCC


Code Performance

[Figure: bar chart of CoreMark iterations per second (axis up to 40) for GCC and LLVM at optimization levels -O0, -O1, -O2, -O3, and -Os.]

  • LLVM-generated code is 12–20 % slower than the code generated by GCC
  • Further work needed to become fully competitive


Summary

  • All of the basic functionality is there and is working reliably
  • Space for further optimizations and extensions
  • First TriCore compiler to be released under a BSD-style licence!
  • End goal: inclusion into LLVM’s repository


The End

Any Questions?
