CPL 2016, week 7: Performance considerations
Oleg Batrashev
Institute of Computer Science, Tartu, Estonia
March 21, 2016
Overview
Studied so far:
- 1. Inter-thread visibility: JMM
- 2. Inter-thread synchronization: locks and monitors
- 3. Thread management: executors, tasks, cancelation
- 4. Inter-thread communication: confinements, queues, back pressure
- 5. Inter-thread collaboration: actors, inboxes, state diagrams
- 6. Asynchronous execution: callbacks, Pyramid of Doom, Java 8 promises
Today:
◮ Performance considerations: asynchronous IO, Java 8 streams.
Performance considerations 140/160 Context switch -
Outline
Performance considerations
◮ Context switch
◮ Green threads
◮ Asynchronous IO
  ◮ Java NIO
Declarative concurrency
◮ Java 8 streams
Variants of context switch
Context switch may refer to different things:
◮ application changes the CPU privilege level (kernel/user) of running code
  ◮ system calls – the set of basic operations supported by the OS that applications use to open a file/socket, write/read it, ...
◮ CPU registers, stack, ... are reloaded in the core
  ◮ OS changes the thread that runs on a core
  ◮ OS changes the process that runs on a core
Our main interest is in switching threads, e.g.:
◮ if a lock is taken, the thread must be suspended until it is released
◮ if a queue is empty, the consumer must be suspended
◮ if there is no more data in a socket, the reader must be suspended
Too many context switches may degrade performance!
Context switch test
A thread/process context switch takes 1-10 microseconds (system dependent).
- 1. Two actors with own thread: producer writes 1 million integer values to the consumer actor, which sums them up.
- 2. Two actors with own thread: ping-pong of 0.5 million values.
- 3. Two actors with shared thread: ping-pong of 0.5 million values.

Case                              Total time
Two actors: producer-consumer     0.41 s
Two actors with own threads       6 s
Two actors with shared thread     0.18 s

◮ ping-pong between two threads causes the expected decline in efficiency (about 6 µs per context switch, i.e. per ping or pong)
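The ping-pong case can be approximated with two plain threads that hand a token back and forth through hand-off queues, so that each message typically forces a thread switch. This is a minimal sketch and not the actor code measured above: the class name PingPong, the use of SynchronousQueue and the round count are assumptions, and the timing it prints depends on the system.

```java
import java.util.concurrent.SynchronousQueue;

// Hypothetical micro-benchmark: two threads exchange a token through
// hand-off queues, so every message typically requires a thread switch.
class PingPong {
    // Runs `rounds` ping-pong exchanges and returns the number of correct pongs.
    static int run(int rounds) {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();
        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < rounds; i++) {
                    pong.put(ping.take());   // wait for a ping, answer with a pong
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.start();
        int received = 0;
        try {
            for (int i = 0; i < rounds; i++) {
                ping.put(i);                 // blocks until the echo thread takes it
                if (pong.take() == i) {
                    received++;
                }
            }
            echo.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        int rounds = 100_000;
        long start = System.nanoTime();
        int received = run(rounds);
        long elapsedNanos = System.nanoTime() - start;
        // Two messages (one ping, one pong) per round.
        System.out.printf("%d rounds, %.2f us per message%n",
                received, elapsedNanos / 1000.0 / (2.0 * rounds));
    }
}
```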
Solutions to context switch
- 1. Let the same thread do most of the work
  ◮ from queue/actor model back to wandering threads
- 2. Make sure a single thread does enough work before switching
  ◮ make message processing expensive (in terms of computation)
  ◮ keep queues full enough for consumers/transducers/actors – handle several messages in a row before switching to another thread
  ◮ not always possible
- 3. Do not switch threads when switching actors, consumers, and/or transducers
  ◮ use green threads
This problem is only relevant when there are many actors and/or many messages that cannot be batched!
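Handling several messages in a row can be sketched with a consumer that, once woken, drains everything already queued before it may block again. This is an illustration under assumptions: BatchingConsumer and its methods are hypothetical names, and BlockingQueue.drainTo is just one way to batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical consumer that processes messages in batches.
class BatchingConsumer {
    // Sums `total` integers from the queue, taking all available ones per wake-up.
    static long consume(BlockingQueue<Integer> queue, int total) {
        List<Integer> batch = new ArrayList<>();
        long sum = 0;
        int seen = 0;
        try {
            while (seen < total) {
                batch.add(queue.take());   // may suspend the thread if the queue is empty
                queue.drainTo(batch);      // takes whatever else is queued, never blocks
                for (int value : batch) {
                    sum += value;
                }
                seen += batch.size();
                batch.clear();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }

    // Starts a producer thread that sends 1..total and returns the consumer's sum.
    static long run(int total) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Thread producer = new Thread(() -> {
            for (int i = 1; i <= total; i++) {
                queue.add(i);
            }
        });
        producer.start();
        long sum = consume(queue, total);
        try {
            producer.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(run(1_000_000));
    }
}
```

The fuller the queue is when the consumer wakes up, the fewer suspensions are needed per message.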
Green threads
Idea
Green threads (library threads, user-level threads):
◮ user-level thread is maintained outside the OS, at the user level
  ◮ implemented by a library or VM
◮ 1 kernel-level (OS) thread per n user-level threads
  ◮ OS resources are allocated for 1 thread
  ◮ cheaper scheduling – no context switch needed
◮ m kernel threads per n user threads
Problems:
◮ need a way to suspend execution and save/restore the thread stack
  ◮ i.e. preempt the executing thread
  ◮ non-preemptable threads need to yield periodically
◮ IO may block the OS thread, which is needed by other green threads
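The 1:n idea and the need to yield can be illustrated with a toy cooperative scheduler that runs many tasks on the single calling thread. This is only a sketch under assumptions: Task, Counter and CooperativeScheduler are hypothetical names, each task keeps its state in fields instead of a saved stack, and returning from step() plays the role of a voluntary yield. Real green-thread runtimes save and restore actual stacks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy cooperative scheduler: n tasks share the one OS thread that calls runAll().
class CooperativeScheduler {
    // A hypothetical "green thread": one small unit of work per call.
    interface Task {
        boolean step();   // returns false when the task has finished
    }

    // Task state lives in fields, because there is no per-task stack to save.
    static class Counter implements Task {
        private final String name;
        private final int limit;
        private final List<String> log;
        private int next = 0;   // saved position between steps

        Counter(String name, int limit, List<String> log) {
            this.name = name;
            this.limit = limit;
            this.log = log;
        }

        @Override
        public boolean step() {
            log.add(name + next);
            next++;
            return next < limit;   // returning is the voluntary yield
        }
    }

    // Round-robin over the run queue; no OS context switch is involved.
    static void runAll(Deque<Task> runQueue) {
        while (!runQueue.isEmpty()) {
            Task task = runQueue.poll();
            if (task.step()) {
                runQueue.add(task);   // unfinished: back to the end of the queue
            }
        }
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>();
        Deque<Task> runQueue = new ArrayDeque<>();
        runQueue.add(new Counter("a", 2, log));
        runQueue.add(new Counter("b", 3, log));
        runAll(runQueue);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [a0, b0, a1, b1, b2]
    }
}
```

A task whose step() never returns would starve all the others, which is the preemption problem listed above.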
Implementations
Languages/VMs:
◮ Java 1.1 had green threads as the main implementation
◮ Erlang VM uses green threads with no shared state
◮ Go, Smalltalk
Libraries/frameworks/engines:
◮ Akka (Java) uses m-n model (specify dispatcher for an actor)
◮ CPython greenlet, eventlet, gevent
◮ Quasar (Java) modifies your code to save the stack (location and local variables)
See also:
◮ fibers, coroutines
Asynchronous IO
Blocking IO problem
◮ IO may block OS thread that is used for many green threads
Solutions:
- 1. Use dedicated thread pool for blocking IO (Clojure)
- 2. Use asynchronous IO (Erlang)
Some frameworks:
◮ Netty is a non-blocking I/O (NIO) client-server framework for the development of Java network applications
◮ Asynchronous servlets in Servlet 3.0
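The first solution, a dedicated thread pool for blocking IO, could look roughly like the sketch below. The names BlockingIoPool and blockingRead are hypothetical, the sleep merely stands in for a real blocking read, and the pool size of 4 is an arbitrary assumption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: blocking calls are confined to their own pool of OS threads.
class BlockingIoPool {
    // Hypothetical stand-in for a blocking call such as a socket or file read.
    static String blockingRead(String source) {
        try {
            Thread.sleep(50);   // simulates waiting for data
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "data from " + source;
    }

    static int demo() {
        // Only threads of this pool are allowed to block on IO.
        ExecutorService ioPool = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<String> a =
                    CompletableFuture.supplyAsync(() -> blockingRead("a"), ioPool);
            CompletableFuture<String> b =
                    CompletableFuture.supplyAsync(() -> blockingRead("b"), ioPool);
            // The combining step runs once both reads have completed.
            return a.thenCombine(b, (x, y) -> x.length() + y.length()).join();
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints 22
    }
}
```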
Ideas
◮ Synchronous IO suspends if no data is yet available
◮ Asynchronous IO – use callbacks that are executed when IO is readable/writeable
  ◮ does not block on IO operations
  ◮ may read multiple sockets by a single thread (selectors)
Advantages:
◮ avoids context switch when reading from multiple sockets
◮ solves green thread blocking IO problem
Disadvantages:
◮ requires more code to handle IO
◮ code becomes more scattered
Buffers and channels
http://tutorials.jenkov.com/java-nio/index.html
◮ buffers are much like arrays
  ◮ provide typical write-flip-read sequence
  ◮ used for Java NIO channels
  ◮ ByteBuffer.allocate(100)
◮ channels are much like streams, but
  ◮ both readable/writeable
  ◮ support asynchronous operation, e.g. read in AsynchronousByteChannel:
    Future<Integer> read(ByteBuffer dst)
    <A> void read(ByteBuffer dst, A attachment, CompletionHandler<Integer,? super A> handler)
◮ write also supports these 2 forms: future and callback
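Both read forms and the write-flip-read sequence can be tried without a network by reading a temporary file. Note the substitution: this sketch uses AsynchronousFileChannel, which is not an AsynchronousByteChannel and takes an extra position argument, but it offers the same future and callback forms. The class name AsyncReadDemo and the helper drain are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.channels.CompletionHandler;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

class AsyncReadDemo {
    // The channel has written into the buffer; flip it and read the bytes out.
    static String drain(ByteBuffer buf) {
        buf.flip();                                   // switch from writing to reading
        String text = StandardCharsets.UTF_8.decode(buf).toString();
        buf.clear();                                  // ready to be written again
        return text;
    }

    // Returns the file content read via the future form and via the callback form.
    static String[] demo() {
        try {
            Path file = Files.createTempFile("nio-demo", ".txt");
            try {
                Files.write(file, "hello".getBytes(StandardCharsets.UTF_8));
                try (AsynchronousFileChannel channel =
                             AsynchronousFileChannel.open(file, StandardOpenOption.READ)) {
                    // Form 1: future. get() waits for the read to complete.
                    ByteBuffer buf1 = ByteBuffer.allocate(100);
                    Future<Integer> pending = channel.read(buf1, 0);
                    pending.get();
                    String viaFuture = drain(buf1);

                    // Form 2: callback. The buffer is passed along as the attachment.
                    ByteBuffer buf2 = ByteBuffer.allocate(100);
                    CompletableFuture<String> viaCallback = new CompletableFuture<>();
                    channel.read(buf2, 0, buf2, new CompletionHandler<Integer, ByteBuffer>() {
                        @Override
                        public void completed(Integer bytesRead, ByteBuffer attachment) {
                            viaCallback.complete(drain(attachment));
                        }

                        @Override
                        public void failed(Throwable exc, ByteBuffer attachment) {
                            viaCallback.completeExceptionally(exc);
                        }
                    });
                    return new String[] { viaFuture, viaCallback.get() };
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String[] results = demo();
        System.out.println(results[0] + " " + results[1]);   // prints hello hello
    }
}
```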
Selectors
◮ may register a callback for each channel we are interested in
◮ easier way is to use selectors
◮ register as many channels as we want, select desired operation:
    channel.configureBlocking(false);
    SelectionKey key = channel.register(selector, SelectionKey.OP_READ);
◮ supported operations: OP_CONNECT, OP_ACCEPT, OP_READ, OP_WRITE
◮ use selector.select() – blocks until at least one channel is ready for the events you registered for
◮ selector.selectedKeys() – returns the keys of the channels that are ready
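The register/select/selectedKeys cycle can be sketched with a java.nio.channels.Pipe, which provides a selectable channel without any network setup. SelectorDemo is a hypothetical name, and a server would normally run the select loop repeatedly over many socket channels instead of once over a single pipe.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectorDemo {
    // Writes "ping" into a pipe and reads it back once the selector reports readiness.
    static String demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink();
                 Pipe.SourceChannel source = pipe.source()) {
                source.configureBlocking(false);               // required before register
                source.register(selector, SelectionKey.OP_READ);

                sink.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));

                StringBuilder received = new StringBuilder();
                selector.select();                             // blocks until a channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();                             // handled keys are removed by hand
                    if (key.isReadable()) {
                        ByteBuffer buf = ByteBuffer.allocate(100);
                        ((ReadableByteChannel) key.channel()).read(buf);
                        buf.flip();
                        received.append(StandardCharsets.UTF_8.decode(buf));
                    }
                }
                return received.toString();
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```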
Summary
◮ context switch is changing executing mode, thread or process
◮ context switch is quite expensive on OS (kernel) level
◮ green threads (user-level threads) may mitigate the cost
◮ green threads have problems with preemption, saving stack and blocking IO
◮ blocking IO may be solved by:
  ◮ using dedicated thread pool
  ◮ using asynchronous IO
Declarative concurrency
Ideas
◮ Java <8 lacked functional style
◮ declarative = pure functional (see later Erlang, Clojure)
  ◮ single assignment variables, lock-step execution
  ◮ deterministic, no side effects, no race conditions
  ◮ laziness, dataflow programming
◮ interest in performance (utilizing cores)
◮ structured declarative concurrency
  ◮ parallel map/filter/reduce
Java 8 streams
Like usual streams:
◮ sequence of values.
Unlike usual streams:
◮ do not have state, only for data transformation
◮ support map/filter/reduce transformations
◮ lazy – do not execute until data is needed
Create stream:
    Stream<E> Collection<E>.stream()
    Arrays.stream(Object[])
    Stream.of(Object[])
    static <T> Stream<T> generate(Supplier<T> s)
    static <T> Stream<T> iterate(T seed, UnaryOperator<T> f)
◮ last 2 produce infinite streams
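A short sketch of the creation methods listed above. CreateStreams and powersOfTwo are hypothetical names; the infinite streams are cut with limit() so that the terminal operation can finish.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class CreateStreams {
    // iterate() is infinite; limit(n) keeps only the first n elements.
    static List<Integer> powersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("a", "b", "c");
        System.out.println(names.stream().count());                           // from a collection
        System.out.println(Arrays.stream(new String[] {"x", "y"}).count());   // from an array
        System.out.println(Stream.of("p", "q", "r").count());                 // from given values
        System.out.println(Stream.generate(() -> "z").limit(4).count());      // infinite, then cut
        System.out.println(powersOfTwo(5));                                   // prints [1, 2, 4, 8, 16]
    }
}
```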
Collecting stream
◮ streams are not executed until their results are needed
◮ terminal operation – one that produces the result
Some terminal operations:
    long count()
    Optional<T> max(Comparator<? super T> comparator)
    Optional<T> reduce(BinaryOperator<T> accumulator)
    void forEach(Consumer<? super T> action)
    Object[] toArray()
    <R,A> R collect(Collector<? super T,A,R> collector)
◮ Collector interface is very general
◮ Collectors class contains a lot of standard implementations
  ◮ toList(), toSet(), ...
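The sketch below illustrates both points: nothing in the pipeline runs until a terminal operation is invoked, and a few of the listed terminal operations in use. TerminalOps and lazinessDemo are hypothetical names; the side effect inside map() is there only to make the laziness observable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TerminalOps {
    // Returns {elements mapped before collect, elements mapped after, result size}.
    static int[] lazinessDemo() {
        List<Integer> seen = new ArrayList<>();
        Stream<Integer> pipeline = Stream.of(1, 2, 3).map(x -> {
            seen.add(x);          // side effect used only to observe execution
            return x * 10;
        });
        int before = seen.size();                                       // map has not run yet
        List<Integer> result = pipeline.collect(Collectors.toList());   // terminal operation
        return new int[] { before, seen.size(), result.size() };
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(3, 1, 4, 1, 5);
        long count = values.stream().count();
        Optional<Integer> max = values.stream().max(Integer::compare);
        Optional<Integer> sum = values.stream().reduce(Integer::sum);
        Set<Integer> distinct = values.stream().collect(Collectors.toSet());
        System.out.println(count + " " + max.get() + " " + sum.get() + " " + distinct.size());
        System.out.println(Arrays.toString(lazinessDemo()));            // prints [0, 3, 3]
    }
}
```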
Transforming stream
◮ map – transform each element and return new stream
    <R> Stream<R> map(Function<? super T,? extends R> mapper)
◮ filter – select only some elements from the stream
    Stream<T> filter(Predicate<? super T> predicate)
◮ reduce – aggregate stream into the final result
    Optional<T> reduce(BinaryOperator<T> accumulator)
    T reduce(T identity, BinaryOperator<T> accumulator)
◮ flatMap – like map but combining resulting streams
    <R> Stream<R> flatMap(Function<? super T, ? extends Stream<? extends R>> mapper)
  ◮ analogue of thenCompose in CompletableFuture
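A small illustration of the four transformations on plain integers and strings. The class and method names (Transformations, sumOfEvenSquares, words) are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class Transformations {
    // map, filter and reduce with an identity value.
    static int sumOfEvenSquares(List<Integer> xs) {
        return xs.stream()
                .map(x -> x * x)              // square each element
                .filter(x -> x % 2 == 0)      // keep even squares only
                .reduce(0, Integer::sum);     // aggregate into one value
    }

    // flatMap: each line becomes a stream of words, and the streams are concatenated.
    static List<String> words(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvenSquares(Arrays.asList(1, 2, 3, 4)));   // prints 20
        System.out.println(words(Arrays.asList("a b", "c")));              // prints [a, b, c]
    }
}
```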
Example: explicit
Collect travelers that have speed > 20, take their max and min temperatures
◮ types of intermediate streams are given explicitly
◮ see next slide for more compact version
◮ limit() takes first n elements
    // infinite stream of generated travelers, cut to the first 10000
    Stream<Traveler> travelers = Stream.generate(Traveler::generate);
    Stream<Traveler> trav10000 = travelers.limit(10000);
    // keep fast-moving travelers only
    Stream<Traveler> carTrav = trav10000.filter(t -> t.speed > 20.0);
    List<Double> carTemps = carTrav.map(t -> t.temperature)
        .collect(Collectors.toList());
    Optional<Double> minT = carTemps.stream().min(Double::compare);
    Optional<Double> maxT = carTemps.stream().max(Double::compare);
    System.out.println(minT + " " + maxT);
Example: inline
◮ more readable than explicit version
    double[] temps2 = Stream.generate(Traveler::generate)
        .limit(10000)
        .filter(t -> t.speed > 20.0)        // fast moving
        .mapToDouble(t -> t.temperature)    // take temperature
        .toArray();
    System.out.println(DoubleStream.of(temps2).min() + " "
        + DoubleStream.of(temps2).max());
◮ do not overuse – many anonymous intermediate results may confuse about what is actually happening
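The examples above rely on a Traveler class that the slides do not show. To try them, a stand-in along the following lines could be used. Everything in it is an assumption: the field types, the value ranges, the fixed random seed, and the helper class TravelerDemo.

```java
import java.util.Random;
import java.util.stream.DoubleStream;
import java.util.stream.Stream;

// Hypothetical stand-in for the Traveler class assumed by the slides.
class Traveler {
    private static final Random RANDOM = new Random(42);   // fixed seed: an assumption

    final double speed;         // made-up range [0, 100)
    final double temperature;   // made-up range [-20, 30)

    Traveler(double speed, double temperature) {
        this.speed = speed;
        this.temperature = temperature;
    }

    // Factory matching the Traveler::generate method reference.
    static Traveler generate() {
        return new Traveler(RANDOM.nextDouble() * 100.0,
                            -20.0 + RANDOM.nextDouble() * 50.0);
    }
}

class TravelerDemo {
    // Same pipeline as the inline example, with the element count as a parameter.
    static double[] fastTemps(int n) {
        return Stream.generate(Traveler::generate)
                .limit(n)
                .filter(t -> t.speed > 20.0)        // fast moving
                .mapToDouble(t -> t.temperature)    // take temperature
                .toArray();
    }

    public static void main(String[] args) {
        double[] temps = fastTemps(10000);
        System.out.println(DoubleStream.of(temps).min() + " "
                + DoubleStream.of(temps).max());
    }
}
```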
Parallel execution
Advantages over the usual for loop:
◮ easily parallelizable, e.g. run the example in parallel
    double[] temps3 = Stream.generate(Traveler::generate)
        .parallel()
        .limit(10000)
        .filter(t -> t.speed > 20.0)        // fast moving
        .mapToDouble(t -> t.temperature)    // take temperature
        .toArray();
    System.out.println(DoubleStream.of(temps3).min() + " "
        + DoubleStream.of(temps3).max());
◮ streams are composable, e.g. may write
    Stream<Double> collectTempOfCarTravelers(Stream<Traveler>)
  ◮ combine it in different contexts
◮ no operation is executed until terminal operation for the whole