Tobias Wrigstad, for students

Master Theses at Oracle in 2018

As part of an on-going collaboration with Oracle, they are offering master thesis work for CS master students and IT civil engineering students at Uppsala University. These are carried out at the Oracle offices in Stockholm, where the Java VM is built, and supervised by members of the JVM team. I am the academic partner and will serve as the reviewer for these theses. If you are a student at UU and interested in these theses subjects, you should contact me.

Thesis 1: NUMA support for G1

On multi-socket systems memory access time depends on the memory location relative to the processor (locality group): "closer" memory access latency is significantly smaller than memory that is located with a different processor. Currently G1 does not exploit this by improving or at least keeping access locality the same.

Tentative goals:

implement the common heuristics used in literature that keeps objects in the same locality group as long as possible: a) let G1 keep data in the same locality group in the young generation and try to evenly spread data across locality groups in old generation; try to keep locality in both young and old generation
measure the impact of these strategies across a set of industry benchmarks
analyze other areas in e garbage collector that might benefit from NUMA awareness and potentially suggest changes.

Thesis 2: Throughput barrier

G1 is an incremental, generational garbage collector. It stores the remembered sets that allow incremental evacuation as plain sets of cards for all regions. This is different to other (non-incremental) generational collectors (like Parallel GC) that use a card table. The latter has shown to have clear advantages in throughput at the cost of latency.

A hybrid approach for storing remembered sets in G1, using a card table for the young generation like Parallel GC and the existing set-of-card approach for the old generation areas may exhibit a very good trade-off between throughput and latency.

Tentative goals:

Implement and explore a throughput-optimized mode for the G1 garbage collector to improve throughput on certain workloads that are not very latency sensitive.
Measure the impact in throughput and latency on several industry benchmarks.
Evaluate this trade-off to find out under which circumstances the one or other approach seems favorable.

Thesis 3: Region pinning

G1 is a region-based incremental garbage collector that can evacuate (move) around objects at will. While this allows you to fight fragmentation, moving around objects does not help with interfacing with non-garbage collected code like C.

For this reason, while some C code accesses Java memory, the VM prevents the garbage collector to perform its task. This does not work well in situations where memory is already tight. One alternative approach usable with G1 would be to, instead of preventing any garbage collection work, only prevent garbage collection work for heap areas which contain Java objects that are currently in use with native code.

Tentative goals:

implement pinning of any kind of region in the G1 collector. Consider implementing techniques to mitigate wasting complete regions for a few pinned objects by allowing simple allocation into them.
evaluate the change and compare it to the existing mechanism of providing non-moving memory

First Batch of Bachelor Students in 2017

In 2017, the Uppsala Encore team (led by myself and Dave Clarke) took on a number of bachelor students to do work on Encore. If you are not familiar with Encore, it is a concurrency-oriented programming language built around actors and active objects initially funded by the UPSCALE FP7 grant (Future and Emerging Technologies, Open X-Track) running from February 2014 until end of April 2017.

Encore shares some of its runtime with the Pony programming language, but makes several extensions to deal with active objects (support for e.g., blocking on futures without blocking the underlying thread), garbage collection that handles shared mutable state, etc.

The first batch of students presented on June 13th 2017 and included (in order of final presentations). Links to their final presentations will be added here as soon as they are officially in Diva.

Jonas Olander

Jonas' topic was the addition of separate compilation to Encore. Separate compilation is an important feature of any programming system, but one that is hard to fund on research grants. Jonas looked at several ways in which separate compilation could be added to Encore, given its current design where the Encore compiler spits out a C program that is linked with the run-time system into an executable.

Joel Wallin

Joel has been working on adding two new features to Encore: bestow and atomic, following a paper by Elias Castegren and Tobias Wrigstad. With these features, Encore allows an actor to hand out direct references to its internal state to other actors, but operations on these references are lifted into asynchronous operations, which are delegated to the containing actor, guaranteeing that only a single thread of control ever operates on bestowed objects. A similar concept exists in the E language, where it is the main source of concurrency control. Encore's type system makes operations on bestowed objects syntactically tractable and therefore easier to reason about with respect to performance.

The atomic operation allows pruning interleaving of messages in a message queue. This can be done by grouping messages which are sent as one, but by temporarily establishing a private “mailbox channel” between two actors causing all concurrent messages to be buffered in the public mailbox until the end of the atomic block. This is reminiscent of Pi calculus, but operates in a block-structured fashion.

Sahand Samal Taher

Sahand's work has been extending Encore with exception handling, yet another crucial feature that there were not enough cycles to implement during the UPSCALE project. How to handle exceptions is interesting both from a language design standpoint as inter-actor operations are asynchronous, and is also slightly technically challenging to implement due to the many moving parts in the run-time, PThreads and green threads, etc.

As a first increment, Sahand implemented support for Java-like try/catch/finally blocks, and means of communicating exceptions via futures as is the case for other languages e.g. asynchronous Java Script.

Noa Herngren

Noa's work has been in the realm of optimisation, focusing on the future construct in Encore. Since futures allow actors (and tasks, etc.) to block, they interact heavily with management of user-space “green threads” and because futures can be shared by multiple actors, they involve operations on shared mutable state, which naturally must be handled with care. For pragmatic reasons, the first stable implementation of futures in Encore relied on PThread mutexes to prune unwanted concurrent behaviour with respect to futures (e.g., one actor blocking on a futrue concurrently with another actor fulfilling the future can easily put the blocking actor to sleep indefinitely if not handled with care).

Noa has worked on removing the mutexes from futures and rewriting them in a lock-free fashion. By basing all the concurrency control around a lock-free Treiber stack, the future code became both lock-free and simpler at the same time (because the Treiber stack contains all the tricky bits). As a result of Noa's rewrite, the futures perform better -- both in the single-producer--single-consumer scenario, and when scaling to multiple consumers. Memory foorpra int, branch misprediction and cache misses both in data and instruction-level is significantly lower. In addition, Noa also created a family of futures which optimise for scenarios where only some subset of a future's interface is used.

Erik Fransson

Erik's work was centered around the implementation of asynchronous I/O in Encore. Erik designed and implemented a file system actor as a library which is able to centralise all file I/O ensuring that at most one of the PThreads driving the execution of actors in Encore is blocking waiting on I/O. (Naturally, many file system actors can be created as part of tuning, which may or may not execute in parallel depending on the overall system load and behaviour.)

Josef Hansson Karacoca

Josef's Encore work concerns the interplay between concurrency and parallelism. Josef has been using the building blocks provided by the Encore programming language to construct “locally distributed” data structures where normal passive (aka non-actor) objects acts as front-ends to a collection of actors that allow operations on the data structure to be performaed in parallel, provided that they are sufficiently disjoint. For example, it is possible to implement a large array constructed by a collection of subarrays each owned by one actor, and the passive front-end simply lifts sychronous requests into asynchronous requests delegated to the actor(s) that holds the relevant subarray(s).

Building on these data structures, Josef implemented several map reduce-style algorithms, both serving as evaluation and adding new and relevant building blocks to the Encore libraries.

Student Project Work 2017: Karolina Nikamo, Lowe Fredriksson Eklund, Casper Strömberg

As part of the independent project course, Karolina, Lowe and Casper sought us out to do some work on Encore. In the spirit of independency, they worked almost completely isolated from the core Encore team, and learned on their own, how to hack the Encore compiler, and how to extend the language. They were considering a number of features, and eventually picked default parameters and field initialisation. As a consequence, the following Encore program:

                        class Person
                        val name : String
                        val age : int
                        def init(age : int) : unit
                        this.name = "John Doe"
                        this.age = age
                        end
                        end

can be rewritten as

                        class Person
                        val name : String = "John Doe"
                        val age : int
                        def init(age : int = 42) : unit
                        this.age = age
                        end
                        end

which allows writing new Person() or new Person(42) instead of new Person(42, "John Doe").

For Uppsala/Prospective Students

My main research interest is design and implementation of programming languages and the connection between programming languages and software engineering.

With respect to programming languages, I am interested in programming abstractions — ways to express and model properties of interest in programs, and how to fortify abstractions and how to capture and enforce programmer intent. Abstraction is an essential task of programming and involves hiding details and emphasising rules and "interfaces". For example, a stack is an abstraction that presents a certain interface: elements are inserted and removed from the same end, and the only way to manipulate the order of elements on a stack are by popping them and pushing them back in a different order. The stack concept abstracts from the concrete implementation and emphasises the rules by which a stack can be manipulated to maintain its "stack-ness". A classic text book-example of a stack implementation is through a set of linked nodes, each holding an element. Unless the implementation of the stack abstraction is properly fortified, it could for example be possible for a user of a stack to gain access to its stack nodes and through direct manipulation of the stack nodes reorder elements or delete elements, or replace them, etc. — clear violations of "stack-ness".

A properly fortified abstraction is possible to replace by another — for example a stack implemented as an array. The ability to swap entire building blocks of a program for others is important because it allows us to alter non-functional aspects of a program (a classic example of a non-functional aspect is performance, another is security). This example also illustrates a connection with software engineering.

Related is the capturing and enforcing of programmer intent — programming languages typically focus on helping programmers perform calculations but do not understand non-functional aspects of the system, and are therefore unable to help programmers write code that correspond to their intentions. Mistakes and misconceptions surface in testing (if the tests are good enough, of course), where they may show up as errors whose root causes are hard to determine. A classic example is thread safety — the ability to use share something between threads that concurrently access it. Determining whether a piece of code is thread-safe can be very hard, and require lots of so-called non-local reasoning, i.e., checking many other places of the code that may somehow interplay with the code of interest.

Going back to the stack example before — whether it is safe to share a stack abstraction across threads either requires knowledge of its concrete implementation (e.g., we know it uses some internal form of synchronisation), or access to documentation. Proper abstraction dictates that we should not rely on one specific implementation (for software engineering reasons, for example so that we can swap one stack implementation for for another). Documentation however can be wrong, for example because the programmer thinks something is thread-safe that really isn't. This can happen for many reasons. For example, imagine the stack relies on some library L in its implementation. When the programmer implemented the stack, she checked the functions in L for thread-safety, but since then, L has changed and some functions are no longer thread-safe. Because most programming languages lack the ability to talk about thread-safety (as it is a non-functional property of the code) the compiler cannot warn us about this important change and consequently, the documentation of the stack incorrectly states it is thread-safe.

Enabling the programmer to state her intent about thread-safety allows the compiler to check that the implementation of the stack is thread-safe, and also captures the error of the changing properties of some functions in library L when the stack is recompiled against the updated L library.

Static checking (i.e., at compile-time) is conservative by nature, meaning that some programs will be incorrectly rejected (or raise warnings) because the compiler is unable to prove that they satisfy the intended properties. This is unfortunate, but I believe that a small number of false positives is a small price to pay for machine-checked proofs of fulfilment of relevant properties of source code. If something is labelled thread-safe, we should be able to rely on it.

This brings me to programming languages implementation. If compilers become powerful enough to guarantee certain properties of interest of a program, it is possible to compile and/or execute these programs more efficiently. For example, if I know that some object O is only visible through this one pointer, I can temporarily copy O into CPU registers and make lots of updates on O without slow and costly updates of main memory so that a possible (however unlikely) concurrent reader of O is guaranteed to see a up-to-date value. Similar arguments can be made for garbage collection, which can be greatly simplified if only some basic properties (that happen to hold for a majority of all objects and pointers, but not all) can be guaranteed. A good illustrating example of this can be found in Java: each method call in Java must start by performing a null-check on the target even though less than 10% of all variables ever contain null. If it was possible to statically exclude null values with a compile-time guarantee, much time (and energy) would be saved every day on a global scale. A similar argument can be made for thread-safety.

Theses Supervision

I have supervised a number of theses at Uppsala University, related to programming languages. I supervise both bachelor and master students.

Tobias Wrigstad

Master Theses at Oracle in 2018

Thesis 1: NUMA support for G1

Thesis 2: Throughput barrier

Thesis 3: Region pinning

Second Batch of Students 2017

First Batch of Bachelor Students in 2017

Jonas Olander

Joel Wallin

Sahand Samal Taher

Noa Herngren

Erik Fransson

Josef Hansson Karacoca

Student Project Work 2017: Karolina Nikamo, Lowe Fredriksson Eklund, Casper Strömberg

For Uppsala/Prospective Students

Theses Supervision

Recent Topics I have Supervised/Reviewed

Contents