JVM ReCo Project

About

This page documents the output of the JVM ReCo project between Uppsala University's programming language lab (UPLANG) and Oracle. This work has enjoyed financing from The Swedish Foundation for Strategic Research, The Swedish Research Council, and corporate donations from Oracle. This research has focused mostly on the area of garbage collection, energy efficiency of garbage collection, and concurrent garbage collection. Part of this work serves as background for a Java Enhancement Proposal for automatic heap sizing.

Project Publications

Mark-Scavenge: Waiting for Trash to Take Itself Our (OOPSLA, 2024 -- to appear)
Mutator-Driven Object Placement using Load Barriers (MPLR, 2024 -- to appear)
Scheduling Garbage Collection for Energy Efficiency on Asymmetric Multicore Processors. (Art Science and Engineering of Programs 8(3), 2024)
Heap Size Adjustment with CPU Control (MPLR, 2023)
Analyzing and Predicting Energy Consumption of Garbage Collectors in OpenJDK (MPLR, 2022)
Compressed Forwarding Tables Reconsidered (MPLR, 2022)
Deep Dive into ZGC: A Modern Garbage Collector in OpenJDK (TOPLAS, 2022)
ThinGC: Complete Isolation with Marginal Overhead (ISMM, 2020)
Improving Program Locality in the GC using Hotness (PLDI, 2020)

Hackathon 2024

PhD Theses

Marina Shimchenko: Optimizing Energy Efficiency of Concurrent Garbage Collection

Master Theses

As part of an on-going collaboration with Oracle, we are offering master thesis work for CS master students and IT civil engineering students in the greater Stockholm area (Stockholm, Uppsala, Mälardalen, etc.). These are carried out at the Oracle offices in Stockholm, where the Java VM is built, and supervised by members of the JVM team. Professor Tobias Wrigstad (Uppsala University) is the academic partner. If you are a student in Sweden and is interested in these theses subjects, you should contact me.

Deep Dive into ZGC: A Modern Garbage Collector in OpenJDK

Journal paper by Albert Mingkun Yang and Tobias Wrigstad accepted for ACM Transactions on Programming Languages and Systems, 2022

ABSTRACT: ZGC is a modern, non-generational, region-based, mostly concurrent, parallel, mark-evacuate collector recently added to OpenJDK. It aims to have GC pauses that do not grow as the heap size increases, ofering low latency even with large heap sizes. The ZGC C++ source code is readily accessible in the OpenJDK repository, but reading it (25 KLOC) can be very intimidating, and one might easily get lost in low-level implementation details, obscuring the key concepts. To make the ZGC algorithm more approachable, this work provides a thorough description on a high level, focusing on the overall design with moderate implementation details. To explain the concurrency aspects, we provide a SPIN model that allows studying races between mutators and GC threads, and how they are resolved in ZGC. Such a model is not only useful for learning the current design (ofering a deterministic and interactive experience) but also beneicial for prototyping new ideas and extensions. Our hope is that our detailed description and the SPIN model will enable the use of ZGC as a building block for future GC research, and research ideas implemented on top of it could even be adopted in the industry more readily, bridging the gap between academia and industry in the context of GC research.

Download Albert's paper here (Gold Open Access)

ThinGC: Complete Isolation with Marginal Overhead

Conference paper by Albert Mingkun Yang, Erik Österlund, Hanna Nyblom, Jesper Wilhelmsson and Tobias Wrigstad accepted for International Symposium on Memory Management, 2020

ABSTRACT: Previous works on leak-tolerating GC and write-rationing GC show that most reads/writes in an application are concentrated to a small number of objects. This suggests that many applications enjoy a clear and stable clustering of hot (recently read and/or written) and cold (the inverse of hot) objects. These results have been shown in the context of Jikes RVM, for stop-the-world collectors. This paper explores a similar design for a concurrent collector in the context of OpenJDK, plus a separate collector to manage cold objects in their own subheap. We evaluate the design and implementation of ThinGC using algorithms from JGraphT and the DaCapo suite. The results show that ThinGC considers fewer objects cold than previous work, and maintaining separate subheaps of hot and cold objects induces marginal overhead for most benchmarks except one, where large slowdown due

Download Albert's paper here (Preprint)

Improving Program Locality in the GC using Hotness

Conference paper by Albert Mingkun Yang, Erik Österlund, Tobias Wrigstad accepted for Programming Languages Design and Implementation, 2020

Note: this paper won the distinguished artefact award.

ABSTRACT: The hierarchical memory system with increasingly small and increasingly fast memory closer to the CPU has for long been at the heart of hiding, or mitigating the performance gap between memories and processors. To utilise this hardware, programs must be written to exhibit good object locality. In languages like C/C++, programmers can carefully plan how objects should be laid out (albeit time consuming and error-prone); for managed languages, especially ones with moving garbage collectors, a manually created optimal layout may be destroyed in the process of object relocation. For managed languages that present an abstract view of memory, the solution lies in making the garbage collector aware of object locality, and strive to achieve and maintain good locality, even in the face of multi-phased programs that exhibit different behaviour across different phases.

This paper presents a GC design that dynamically reorganises objects in the order mutators access them, and additionally strives to separate frequently and infrequently used objects in memory. This improves locality and the efficiency of hardware prefetching. Identifying frequently used objects is done at run-time, with small overhead. HCSGC also offers tunability, e.g., for shifting relocation work towards mutators, or for more or less aggressive object relocation.

The ideas are evaluated in the context of the ZGC collector on OpenJDK and yields performance improvements of 5% (tradebeans), 9% (h2) and an impressive 25-45% (JGraphT), all with 95% confidence. For SPECjbb, results are inconclusive due to a fluctuating baseline.

Download Albert's paper here (Preprint)

Improving relocation performance in ZGC by identifying the size of small objects

MSC Thesis, Jinyu Yu

ABSTRACT: Modern Garbage Collectors provide performance improvements by increasing program locality to utilize the faster CPU cache. A common approach is to move objects together according to the mutators’ access order, which brings more relocations during GC. In most cases, more relocations would not impact performance when using concurrent Garbage Collectors such as ZGC. However, in constrained environments with fewer CPU cores or less memory, bad relocation performance will cause overall performance degradation. In this thesis, we investigated why larger objects do not benefit from better program locality, then proposed a new design to reduce the number of relocations by efficiently identifying and ignoring larger objects. As a result, the relocation performance can be improved. In constrained environments, this can lead to an increase in overall throughput.

In the new design, we introduce an extra page type, the tiny page. If an object is considerably small that it could benefit from relocation, it will be placed on the tiny page when allocating. As a result, we could replace the time‐consuming size check of objects with a faster page type check. Memory fragmentation also can be reduced by this design.

To evaluate this design, we add the size identification procedure into a locality improvement implementation named HCSGC. The results of benchmarks show a slight improvement in constrained environments. In the JGraphT benchmark, we see a 3‐5% speedup in different configurations with memory limitations. In the SPECjbb2015 benchmark, we see a ~1% increase in performance on average, but with overlapping confidence intervals. In the DaCapo benchmark suite, we see a 1% improvement in the sunflow benchmark with CPU constraint. For other benchmarks in DaCapo, no significant difference is discovered. The results suggest that the proposed new design is a feasible way of filtering out larger objects, and doing so can further improve the relocation and overall performance.

Resources: Download Jinyu's thesis | Popular presentation

Addressing Fragmentation in ZGC through Custom Allocators: Leveraging a Lean, Mean, Free-List Machine

MSC Thesis, Joel Sikström

The Java programming language manages memory automatically through the use of a garbage collector (GC). The Java Virtual Machine provides several GCs tuned for different usage scenarios. One such GC is ZGC. Both ZGC and other GCs utilize bump-pointer allocation, which allocates objects compactly but leads to the creation of unusable memory gaps over time, known as fragmentation. ZGC handles fragmentation through relocation, a process which is costly. This thesis proposes an alternative memory allocation method leveraging free-lists to reduce the need for relocation to manage fragmentation.We design and develop a new allocator tailored for ZGC, based on the TLSF allocator by Masmano et al. Previous research on the customization of allocators shows varying results and does not fully investigate usage in complex environments like a GC.Opportunities for enhancements in performance and memory efficiency are identified and implemented through the exploration of ZGC's operational boundaries. The most significant adaptation is the introduction of a 0-byte header, which leverages information within ZGC to significantly reduce internal fragmentation of the allocator. We evaluate the performance of our adapted allocator and compare it to a reference implementation of TLSF. Results show that the adapted allocator performs on par with the reference implementation for single allocations but is slightly slower for single frees and when applying allocation patterns from real-world programs. The findings of this work suggest that customizing allocators for garbage collection is worth considering and may be useful for future integration.

Resources: Link to thesis

Compressing Pointers for the Z Garbage Collector: Runtime compression of pointers in a concurrent setting

MSC Thesis, Linus Shoravi

Pointers in 64-bit architectures are unlikely to exhaust their vast address range, and are as such needlessly big. Reducing the amount of memory a pointer occupies leads to reduced memory demands, better usage of memory, and better locality. Pointer compression is a term that encompasses techniques that aim to make pointers occupy less memory, often to 32-bit for the sake of word alignment. Pointers that are 32-bit embody the opposite problem of having too restricted of an address range, being able to address only 4 GB. Z is a garbage collector in the HotSpot JVM which does not support pointer compression. Partly because the aforementioned address range restriction, and partly because the implementation of compressed pointers which exist in HotSpot would clash with the goals of the garbage collector. This project explores ways of implementing pointer compression for Z that isn't detrimental to the goals of the garbage collector, and aims to find where problems may occur. The outset was to explore compressing speculatively during runtime. The result is a design that relies on a custom bit layout for compressed pointers, inspecting bit layouts of the pointers on each read and write to detect the compression status. This seems to be the most promising in terms of code maintainability and ease of implementation.

Resources: Link to thesis

Moving Garbage Collection with Low-Variation Memory Overhead

MSC Thesis, Jonas Norlinder

ABSTRACT: A parallel and concurrent garbage collector offers low latency spikes. A common approach in such collectors is to move objects around in memory without stopping the application. This imposes additional overhead on an application in the form of tracking objects' movements, so that all pointers to them, can eventually be updated to the new locations. Typical ways of storing this information suffer from pathological cases where the size of this "forwarding information" can theoretically become as big as the heap itself. If we dimension the application for the pathological case this would be a waste of resources, since the memory usage is usually significantly less. This makes it hard to determine an application's memory requirements.

In this thesis, we propose a new design that trades memory for CPU, with a maximum memory overhead of less than 3.2% memory overhead. To evaluate the impact of this trade-off, measurements on application execution time was performed using the benchmarks from the DaCapo suite and SPECjbb2015. For 6 configurations in DaCapo a statistically significant negative effect on execution time in the range of 1-3% was found for the new design. For 10 configurations in DaCapo no change in execution times was shown in statistically significant data and two configurations in DaCapo showed a statistically significant shorter execution times for the new design on 6% and 22%, respectively. In SPECjbb2015, both max-jOPS and critical-jOPS has, for the new design, a statistically significant performance regression of ~2%. This suggests that for applications were stable and predictable memory footprint is important, this approach could be something to consider.

Resources: Download Jonas's thesis | Popular presentation

Direct Heap Snapshotting in the Java HotSpot VM: a Prototype

MSC Thesis, Ludvig Janiuk

ABSTRACT: The Java programming language is widely used across the world, powering a diverse range of technologies. However, the Java Virtual Machine suffers from long startup time and a large memory footprint. This becomes a problem when Java is used in short-lived programs such as microservices, in which the long initialization time might dominate the program runtime and even violate service level agreements. Checkpoint/Restore (C/R) is a technique which has reduced startup times for other applications, as well as reduced memory footprint. This thesis presents a prototype of a variant of C/R on the OpenJDK JVM, which saves a snapshot of the Java heap at some time during initialization. The primary goal was to see whether this was possible. The implementation successfully skips parts of initialization and the resulting program still seems to execute correctly under unit tests and test programs. It also reduces runtime by a minuscule amount under certain conditions. The portion of initialization being snapshotted would need to be further extended in order to result in larger time savings, which is a promising avenue for future work.

Resources: Download Ludvig's thesis here | Popular presentation

An Experimental Study on the Behavioural Tendencies of Objects Classified As Hot and Cold by a Java Virtual Machine Garbage Collector

MSC Thesis, Hanna Nyblom

ABSTRACT: A constitutive hypothesis of the Java Virtual Machine garbage collector ”ThinGC”, presented by Albert Mingkun Yang et al., an extension of Oracle’s ”ZGC”, is that capitalising on possible temporal locality could optimise collection by limiting the total number of objects to manage. To achieve GC optimisations, ThinGC classifies objects as hot (recently referenced) or cold, separates the hot and cold objects into distinct memory spaces, and collects the spaces separately using two garbage collectors.

In order to examine to what extent this temporal locality can actually be observed, this thesis analyses the behaviour of objects classified as hot and cold by ThinGC. Reviewed behaviour includes: tendency of cold objects to remain cold, expected length of cold streaks, and if the tendency to remain hot or cold is related to the type of the object.

In order to examine object behaviour, hotness information for all objects in selected benchmarks from the DaCapo benchmark suite is logged in each GC cycle of ThinGC. The hotness information of each object is then compiled following address forwardings, and metrics estimating the behaviour of each object is calculated.

Analysis of the charts and tables presenting the results of the metric calculations show for instance that ”reheats” of objects are uncommon, cold objects usually stay steadily cold, and long cold streaks are more common than long hot streaks.

The results highlight distinctly different behaviours of hot and cold objects and indicate that the concept of classifying objects by hotness and treating cold objects separately could be well founded.

The results also show some classes of objects being more or less likely to stay cold. If these class behaviours could eventually be proven to be reliable by examining a larger set of programs, the information could be useful as a baseline for GC tuning. Also, if hotness information were collected and similar metrics were calculated concurrently, this information could aid in live GC decision making.

Resources: Download Hanna's thesis | Popular presentation

JVM ReCo: UU+Oracle Collaboration