SLIDE 1 Superinstructions and Replication in the Cacao JVM interpreter
Christian Thalinger, Andreas Krall, TU Wien
SLIDE 2
Why interpreters?
[Figure: Execution Time plotted against Porting/Retargeting Effort (d), both on logarithmic axes, with two groups labeled Interpreters and JITs]
Architecture Mono Cacao
Alpha interp. JIT
AMD64 JIT JIT
ARM JIT JIT
HP-PA interp.
IA32 JIT JIT
IA64 JIT
MIPS JIT
MIPS64 JIT
PowerPC JIT JIT
PowerPC64 interp.
s390 JIT
s390x JIT
SPARC
SPARC64
SLIDE 3
Threaded Code
[Figure: VM code (imul, iadd, iadd, ...) in which each VM instruction points to its VM instruction routine; each routine consists of the machine code for the instruction (iadd, imul) followed by a dispatch of the next instruction]
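The dispatch scheme in the figure can be sketched in a few lines of C. This is an illustrative toy and not Cacao code: the opcodes, the name `run_threaded`, and the fixed buffer sizes are assumptions, and the `&&label` / `goto *` syntax is the GCC/Clang "labels as values" extension rather than standard C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy opcodes, chosen only for this sketch. */
enum { OP_ICONST, OP_IADD, OP_IMUL, OP_HALT };

#define MAX_CODE 64

/* Translates bytecode into threaded code (an array of routine addresses
   and inline operands) and runs it. Assumes a well-formed program that
   ends in OP_HALT and never needs more than 16 stack slots. */
long run_threaded(const int *bytecode, int n)
{
    static void *routines[] = { &&do_iconst, &&do_iadd, &&do_imul, &&do_halt };
    void *code[MAX_CODE];
    long stack[16];
    long *sp = stack;
    void **ip;
    int i;

    assert(n > 0 && n <= MAX_CODE);

    /* Each opcode becomes the address of its routine; the operand of
       OP_ICONST is copied into the following slot. */
    for (i = 0; i < n; i++) {
        int op = bytecode[i];
        code[i] = routines[op];
        if (op == OP_ICONST) {
            i++;
            code[i] = (void *)(intptr_t)bytecode[i];
        }
    }

    ip = code;
/* "Dispatch next instruction": fetch the next routine address and jump. */
#define NEXT goto **ip++
    NEXT;

do_iconst:
    *sp++ = (long)(intptr_t)*ip++;
    NEXT;
do_iadd:
    sp--;
    sp[-1] += sp[0];
    NEXT;
do_imul:
    sp--;
    sp[-1] *= sp[0];
    NEXT;
do_halt:
    return sp[-1];
#undef NEXT
}
```

Every routine ends in its own indirect jump, which is the "Dispatch next instruction" box in the figure. Calling `run_threaded` on the program `ICONST 2, ICONST 3, IADD, ICONST 4, IMUL, HALT` evaluates (2 + 3) * 4.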
SLIDE 4 Dynamic Superinstructions
[Figure: the threaded VM code (iload b, iload c, isub, istore a) lies in the data segment; the VM routine templates (machine code for iload, isub, istore, each followed by Dispatch next) lie in the code segment. For the dynamic superinstruction, the machine code for iload, iload, isub, and istore is copied back to back into the data segment, followed by a single Dispatch next.]
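The copying step can be sketched as plain byte concatenation. This is a simplified model under stated assumptions: the `Template` struct, the `build_superinstruction` name, and the placeholder `DISPATCH` bytes are hypothetical, and a real implementation would copy the interpreter's own routine bodies into executable memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a VM routine template. In a real interpreter, body would
   point at the machine code of the routine without its dispatch. */
typedef struct {
    const char *name;
    const unsigned char *body;
    size_t body_len;
} Template;

/* Placeholder bytes standing in for the dispatch code (an assumption). */
static const unsigned char DISPATCH[] = { 0xFF, 0xE0 };

/* Concatenates the bodies of seq[0..n-1] into out and appends a single
   dispatch. Returns the number of bytes written, or 0 if out is too small. */
size_t build_superinstruction(const Template *seq, size_t n,
                              unsigned char *out, size_t cap)
{
    size_t len = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (len + seq[i].body_len > cap)
            return 0;
        memcpy(out + len, seq[i].body, seq[i].body_len);
        len += seq[i].body_len;
    }
    if (len + sizeof DISPATCH > cap)
        return 0;
    memcpy(out + len, DISPATCH, sizeof DISPATCH);
    return len + sizeof DISPATCH;
}
```

The saving comes from the dispatches that are no longer executed: a sequence of four VM instructions runs with one dispatch instead of four.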
SLIDE 5
Replication
[Figure, two panels. No Replication: the sequences iload b, iload c, isub, istore a and iload e, iload f, isub, istore d share one copy of the superinstruction code (machine code for iload, iload, isub, istore, then Dispatch next). Replication: each of the two sequences gets its own copy of that code.]
+ Increases BTB prediction accuracy
+ Simpler
− Increases code size
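The code-size side of this trade-off can be made concrete with a small counting sketch. It is only a model: describing a basic block by a string of opcode names, the `count_copies` name, and the linear search are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BLOCKS 32

/* Counts how many superinstruction code copies are generated for n basic
   blocks. Each block is described by its opcode sequence without operands,
   e.g. "iload iload isub istore" (a hypothetical encoding).
   replicate == 0: blocks with the same opcode sequence share one copy.
   replicate != 0: every block gets its own copy. */
size_t count_copies(const char *const *blocks, size_t n, int replicate)
{
    const char *seen[MAX_BLOCKS];
    size_t copies = 0;
    size_t i, j;

    assert(n <= MAX_BLOCKS);
    for (i = 0; i < n; i++) {
        int found = 0;
        if (!replicate) {
            /* Sharing needs a lookup of already generated sequences. */
            for (j = 0; j < copies; j++) {
                if (strcmp(seen[j], blocks[i]) == 0) {
                    found = 1;
                    break;
                }
            }
        }
        if (!found)
            seen[copies++] = blocks[i];
    }
    return copies;
}
```

The lookup is needed only in the sharing case, which is one way to read "Simpler" for replication. With replication each copy also ends in its own dispatch jump, so each can have its own BTB entry.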
SLIDE 6 JVM and .NET problems
- Quickening
- Potential exception-throwing instructions
- How much benefit?
SLIDE 7
Quickable Instructions
ACONST, ARRAYCHECKCAST, CHECKCAST,
GETFIELD_CELL, GETFIELD_INT, GETFIELD_LONG,
GETSTATIC_CELL, GETSTATIC_INT, GETSTATIC_LONG,
INSTANCEOF,
INVOKEINTERFACE, INVOKESPECIAL, INVOKESTATIC, INVOKEVIRTUAL,
MULTIANEWARRAY, NATIVECALL,
PUTFIELD_CELL, PUTFIELD_INT, PUTFIELD_LONG,
PUTSTATIC_CELL, PUTSTATIC_INT, PUTSTATIC_LONG
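What makes an instruction quickable can be sketched with a toy interpreter: the slow version resolves its operand on first execution and then rewrites itself into a quick version. The opcodes, the `resolve_field` helper, and the use of switch dispatch instead of threaded code are assumptions made to keep the example short; this is not Cacao's implementation.

```c
#include <assert.h>

/* Hypothetical toy opcodes. */
enum { GETFIELD, GETFIELD_QUICK, HALT };

/* Counts resolutions so the effect of quickening is observable. */
static int resolve_calls;

/* Hypothetical resolver: maps a symbolic field index to a byte offset.
   A real VM would look up the class and field here. */
static int resolve_field(int field_index)
{
    resolve_calls++;
    return field_index * 4;
}

/* code holds (opcode, operand) pairs and ends with HALT.
   Returns the value loaded last, or -1 on an unknown opcode. */
int run(int *code, const int *object)
{
    int *ip = code;
    int value = 0;

    for (;;) {
        switch (ip[0]) {
        case GETFIELD:
            ip[1] = resolve_field(ip[1]); /* slow path, taken once */
            ip[0] = GETFIELD_QUICK;       /* rewrite the instruction in place */
            break;                        /* re-dispatch the rewritten slot */
        case GETFIELD_QUICK:
            value = object[ip[1] / 4];    /* operand is now a byte offset */
            ip += 2;
            break;
        case HALT:
            return value;
        default:
            return -1;
        }
    }
}
```

Running the same code twice resolves the field only once. The in-place rewrite is what clashes with dynamic superinstructions: once a quickable instruction's machine code has been copied into a superinstruction, rewriting the threaded code slot no longer changes which code runs.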
SLIDE 8 Simple Solution
[Figure: threaded VM code iload b, getfield Example.i, istore a, shown before and after executing getfield. The copied code in the data segment is split around the quickable instruction: machine code for iload followed by Dispatch next, and machine code for istore ... followed by Dispatch next. Before execution, getfield runs from its VM routine template in the code segment (machine code for getfield, Dispatch next). After execution, the threaded code holds getfield_quick, which runs as machine code for getfield_quick followed by Dispatch next.]
SLIDE 9 SableVM’s Sophisticated Solution
[Figure: threaded VM code with a slot labeled super|goto, followed by iload b, getfield_quick, istore a, and a separate preparation sequence (prepseq) containing iload b, getfield Example.i, istore a, ... together with replace and goto routines. The superinstruction code in the data segment consists of machine code for skip_operand, iload, getfield_quick, istore ..., then Dispatch next. The figure shows the states before and after executing prepseq, with the annotations "replace super inst-ptr" and "goto behind unused slot".]
SLIDE 10 Cacao’s Sophisticated Solution
[Figure: threaded VM code with a slot labeled super|iload b, followed by getfield Example.i and istore a, shown before and after executing getfield. The superinstruction code in the data segment consists of machine code for iload, getfield_quick, istore ..., then Dispatch next. A superstart table records, per superinstruction, the last quickable instruction, the threaded code start, and the real-machine code.]
SLIDE 11
Potential Exception-Throwing Instructions
IALOAD, LALOAD, AALOAD, BALOAD, CALOAD, SALOAD,
IASTORE, LASTORE, BASTORE, CASTORE,
IDIV, IREM,
GETFIELD_CELL, GETFIELD_INT, GETFIELD_LONG,
PUTFIELD_CELL, PUTFIELD_INT, PUTFIELD_LONG,
INVOKEVIRTUAL, INVOKESPECIAL, INVOKEINTERFACE,
ARRAYLENGTH, CHECKNULL
SLIDE 12
Problem and Solution

Problem (relative jump to throw):
getfield_cell:
    mov (%edi),%eax
    add $0x8,%edi
    test %ebp,%ebp
    je throw
    add $0x4,%edi
    mov (%eax,%ebp,1),%ebp
    jmp *-4(%edi)

Solution (indirect jump):
getfield_cell:
    mov (%edi),%eax
    add $0x8,%edi
    test %ebp,%ebp
    jne no_throw
    jmp *0x2a0(%esp)
no_throw:
    add $0x4,%edi
    mov (%eax,%ebp,1),%ebp
    jmp *-4(%edi)
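Why the relative jump breaks when the routine is copied can be checked with a little address arithmetic. The helper names and the example addresses are hypothetical. The one fact relied on is that an x86 relative jump encodes its target as a displacement from the address of the following instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a relative jump: address of the next instruction plus the
   encoded displacement. Copying the jump elsewhere moves the target too. */
uintptr_t relative_target(uintptr_t next_insn_addr, int32_t disp)
{
    return next_insn_addr + (uintptr_t)(intptr_t)disp;
}

/* Target of an indirect jump through a memory slot: whatever address the
   slot holds, independent of where the jump instruction itself lives. */
uintptr_t indirect_target(const uintptr_t *slot)
{
    return *slot;
}
```

For example, if the throw code sits at 0x1800 and the instruction after `je throw` at 0x1000, the displacement is 0x800. Once the routine is copied so that this instruction lands at 0x5000, the same displacement points to 0x5800, which is not the throw code. The indirect `jmp *0x2a0(%esp)` reads its target from a stack slot and therefore survives copying unchanged.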
SLIDE 13 Speedup over plain threaded code
[Figure: bar chart, Pentium 4. x-axis: compress, jess, db, javac, mpegaudio, mtrt, jack. y-axis: speedup, logarithmic, from 1.0 to 4. Series: plain threaded code; -throw simple -repl; -throw soph -repl; -throw simple +repl; -throw soph +repl; +throw simple -repl; +throw soph -repl; +throw simple +repl; +throw soph +repl.]
SLIDE 14 Speedup of various JVMs over Cacao with Superinstructions
[Figure: bar chart, Pentium 4. x-axis: compress, jess, db, javac, mpegaudio, mtrt, jack. y-axis: speedup, logarithmic, from 0.02 to 10. Series: Kaffe int, JamVM, gij, HotSpot int, J9 int, SableVM, cacao threaded, cacao +throw soph +repl, cacao jit, HotSpot mixed, J9 mixed, Jikes RVM, jrockit, kaffe jit.]
SLIDE 15 Conclusion
- Superinstructions can provide big speedups
- Replication has little impact
- Quickening:
  New sophisticated solution, but the simple solution performs well in a JIT setting
- Relocatability of throwing VM instructions:
  Big performance impact
  Solution: replace relative with indirect jumps