Thursday, April 13, 2023

Serious Progress in Low-Pause Garbage Collectors (2023)

I was recently surfing some random stuff about TypeScript and V8, and it reminded me to check in on my passion for low-pause garbage collectors.

Back in 2005, the only zero-pause game in town was Azul's commercial JVM with its C4 pauseless collector. Lua 5.1 (2006) also introduced an incremental collector, and since it was mostly used with small heaps in games, it was probably fairly low-pause in practice, but I couldn't find any data.

In 2023, it looks like what was once a fringe area has hit the mainstream in many runtimes. Most of the effort seems focused on server-side ultra-large heaps, where pause times were most extreme. Sadly, the progress for front-end interactive systems has been less impressive.

Google Go was ahead of the game here, introducing changes that dramatically lowered worst-case pause times. In 2015, one Twitter server with a >1TB heap saw pause times drop from ~300ms to ~30ms thanks to Go GC improvements. In 2016, a new set of changes dropped this to ~2.5ms on average (on an 18TB heap). In 2017, it dropped again, to ~0.6ms. This is with memory consumption at roughly 2x the live managed data, which is a bit wasteful, but hardly a problem for many applications.
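The ~2x memory figure is a direct consequence of Go's default GOGC=100 setting, which lets the heap grow to roughly double the live data before the next collection kicks in. As a rough sketch of my own (not from the Go posts above), here's how you can watch pause times and play with that knob using the standard runtime/debug package:

    package main

    import (
        "fmt"
        "runtime"
        "runtime/debug"
    )

    func main() {
        // GOGC=100 (the default) targets a heap of ~2x the live data,
        // which is the "2x" overhead mentioned above. Lowering it trades
        // CPU for memory; raising it does the opposite.
        debug.SetGCPercent(100)

        // Churn some garbage so there is something to collect.
        var keep []byte
        for i := 0; i < 1000; i++ {
            keep = make([]byte, 1<<20) // each iteration orphans the previous 1MB slice
        }
        _ = keep
        runtime.GC()

        // Print the most recent stop-the-world pause times (most recent first).
        var stats debug.GCStats
        debug.ReadGCStats(&stats)
        fmt.Println("collections:", stats.NumGC)
        for i, p := range stats.Pause {
            if i >= 5 {
                break
            }
            fmt.Printf("recent pause %d: %v\n", i+1, p)
        }
    }

As I understand it, ReadGCStats reports only the stop-the-world pauses, which is exactly the number these announcements are bragging about.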

Java introduced the new experimental "Z" Garbage Collector (ZGC) in Java 11 in 2018, and it became production-ready in JDK 15 in 2020. ZGC claims <10ms worst-case pause times on >1TB heaps. Average pause times are on the order of 0.25ms! This comes at roughly a 10% reduction in throughput, which seems modest given the benefits. I couldn't find any hard data on memory overhead, but casual mentions put it on the order of 30% more than the G1 collector.
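For reference, these are just the standard HotSpot flags (MyServer is a placeholder class name, and the heap size is arbitrary); turning ZGC on looks something like:

    # JDK 15 and later, where ZGC is production-ready:
    java -XX:+UseZGC -Xmx16g -Xlog:gc MyServer

    # JDK 11-14, where it was still experimental:
    java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx16g -Xlog:gc MyServer

Bumping the logging up to -Xlog:gc* shows the individual pause phases, which is where the sub-millisecond numbers show up.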


JavaScript's V8 introduced a number of improvements in 2018 that lowered average pause times to around 50ms. That's nowhere near the sub-1ms level of Go and Java's ZGC, but it's progress. It's ironic that the most popular user-interactive GC has higher pause times on tiny desktop and mobile heaps than servers do on many-terabyte heaps, but at least it's moving in the right direction.


Microsoft's CLR, as of 2023, has no low-pause collector, nor does the open-source Mono runtime. I believe the CLR is mostly used for server-side and client-side business apps that are not particularly sensitive to pause times (though users will notice them). Mono's SGen collector lowered pause times, but only from abysmal to 2005-state-of-the-art levels: still ~20ms even on the very small heaps in mobile apps and Unity, which blows well past the 16.6ms frame budget at 60fps.


