Premature optimization, where software thrives unless you kill it first - a tale of Java GC
Before going head-on into Java and the ways to tackle interference, either from the garbage collector or from context switching, let's first glance over the fundamentals of writing code for your future self.
Premature optimization is the root of all evil.
You've heard it before: premature optimization is the root of all evil. Well, sometimes. When writing software, I'm a firm believer in being:
1) as descriptive as possible; you should try to narrate intentions as if you were writing a story.
2) as optimal as possible; which means that you should know the fundamentals of the language and apply them accordingly.
As descriptive as possible
Your code should speak intention, and a lot of it pertains to the way you name methods and variables.
While numItems is abstract, backPackItems tells you a lot about expected behaviour.
Or say you have this method:
Can we do better? We definitely can!
Imagine you're reading the above code for the first time and stumble on the guard clause that checks whether the user has actually visited any countries. Now imagine it's buried in a lengthy class: reading Collections.emptyList() is definitely more descriptive than new ArrayList<>(0). It also returns an immutable list, so client code can't modify it.
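As a minimal sketch of the immutability point, and not the listing discussed above: the Traveller class, its field, and the visit method are hypothetical names chosen for illustration. Only Collections.emptyList() and the visitedCountries() name come from the post.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class for illustration only.
final class Traveller {
    private final List<String> countries = new ArrayList<>();

    void visit(String country) {
        countries.add(country);
    }

    List<String> visitedCountries() {
        // Guard clause: nothing visited yet, so hand back the shared immutable empty list.
        if (countries.isEmpty()) {
            return Collections.emptyList();
        }
        // Read-only view, so callers can't mutate the internal list in this case either.
        return Collections.unmodifiableList(countries);
    }
}
```

Any attempt by client code to call add on the returned list throws an UnsupportedOperationException, whichever branch was taken.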
As optimal as possible
Know your language and use it accordingly. If you need a double, there's no need to wrap it in a Double object. The same goes for using a List if all you actually need is an array.
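To make the difference concrete, here is a small sketch (the BoxingSketch class and its method names are made up for this example). Both methods compute the same sum, but the second one has to unbox a Double object on every iteration.

```java
import java.util.List;

// Hypothetical helper for illustration only.
final class BoxingSketch {
    // Primitive array: values are stored inline, with no per-element object.
    static double sumPrimitive(double[] values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    // List<Double>: every element is a boxed Double that is unboxed on each read.
    static double sumBoxed(List<Double> values) {
        double total = 0.0;
        for (Double v : values) {
            total += v; // implicit v.doubleValue()
        }
        return total;
    }
}
```

Whether the extra allocations matter depends on the workload; the point is only that the primitive version never creates them in the first place.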
Know that you should concatenate Strings using StringBuilder, or StringBuffer if you're sharing state between threads.
    // don't do this
    String votesByCounty = "";
    for (County county : counties) {
        votesByCounty += county.toString();
    }

    // do this instead
    StringBuilder votesByCounty = new StringBuilder();
    for (County county : counties) {
        votesByCounty.append(county.toString());
    }
Know how to index your database. Anticipate bottlenecks and cache accordingly. All the above are optimizations. They are the kind of optimizations that you should be aware of and implement as first-class citizens.
How do you kill it first?
I'll never forget a hack I read about a couple of years ago. Truth be told, the author backtracked quickly, but it goes to show how a lot of evil can spring from good intentions.
You can read more on why and how the above code works in the original article and, while the exploit is definitely interesting, this is one of those things you should never ever do.
- Works by side effects, Thread.sleep(0) has no purpose in this block
- Works by exploiting a deficiency of code downstream
- For anyone inheriting this code, it's obscure and magical
Only start forging something a bit more involved if, after writing with all the default optimizations the language provides, you've hit a bottleneck. But steer away from concoctions such as the above.
How to tackle that Garbage Collector?
If, after all that's done, the Garbage Collector is still the piece that's offering resistance, these are some of the things you may try:
- If your service is so latency sensitive that you can't allow for GC, run with "Epsilon GC" and avoid GC altogether: -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC. This will obviously grow your memory until you get an OOM error, so either it's a short-lived scenario or your program is optimized not to create objects.
- If your service is somewhat latency sensitive, but the allowed tolerance permits some leeway, run G1 and feed it something like -XX:MaxGCPauseMillis=100 (the default is 200 ms).
- If the issue stems from external libraries, say one of them calls System.gc() or Runtime.getRuntime().gc(), which typically trigger a stop-the-world full collection, you can override the offending behaviour by running with -XX:+DisableExplicitGC.
- If you're running on JDK 11 or later, do try the Z Garbage Collector (ZGC); the performance improvements are monumental! -XX:+UnlockExperimentalVMOptions -XX:+UseZGC. You may also want to check this JDK 21 GC benchmark.
Version Start | Version End | Default GC |
---|---|---|
Java 1 | Java 4 | Serial Garbage Collector |
Java 5 | Java 8 | Parallel Garbage Collector |
Java 9 | ongoing | G1 Garbage Collector |
Note 1: since Java 15, ZGC is production ready, but you still have to explicitly activate it with -XX:+UseZGC.
Note 2: The VM considers a machine server-class if it detects at least two processors and at least 1792 MB of physical memory. If the machine is not server-class, the VM defaults to the Serial GC.
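Since the default depends on both the Java version and the machine, it can be worth checking which collector the JVM actually picked. A minimal sketch using the standard java.lang.management API follows; the ActiveCollectors class name is made up, and the exact names reported are JVM-specific.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class ActiveCollectors {
    // Collects the names of the garbage collectors registered in the running JVM.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // On HotSpot, G1 typically reports names starting with "G1",
        // while the Serial GC reports "Copy" and "MarkSweepCompact".
        System.out.println(names());
    }
}
```

Running it once with your production flags and once without is a cheap way to confirm that a flag such as -XX:+UseZGC took effect.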
In essence, opt for GC tuning when it's clear that the application's performance constraints are directly tied to garbage collection behavior and you have the necessary expertise to make informed adjustments. Otherwise, trust the JVM's default settings and focus on optimizing application-level code.
u/shiphe - you'll want to read the full comment
Other relevant libraries you may want to explore:
Java Microbenchmark Harness (JMH)
If you're optimizing out of feeling without any real benchmarking, you're doing yourself a disservice. JMH is the de facto Java library to test your algorithms' performance. Use it.
Java-Thread-Affinity
Pinning a thread to a specific core may improve cache hits. It will depend on the underlying hardware and how your routine is dealing with data. Nonetheless, this library makes it so easy to implement that, if a CPU-intensive method is dragging you down, you'll want to test it.
LMAX Disruptor
This is one of those libraries that, even if you don't need it, you'll want to study. The idea is to allow for ultra-low-latency concurrency. But the way it's implemented, from mechanical sympathy to the ring buffer, brings a lot of new concepts. I still remember when I first discovered it, seven years ago, pulling an all-nighter to digest it.
Netflix jvmquake
The premise of jvmquake is that when things go sideways with the JVM, you want it to die and not hang. A couple of years ago, I was running simulations on an HTCondor cluster that was under tight memory constraints, and sometimes jobs would get stuck due to "out of memory" errors. This library forces the JVM to die, allowing you to deal with the actual error. In this specific case, HTCondor would automatically reschedule the job.
Final thoughts
The code that made me write this post? I've written way worse. I still do. The best we can hope for is to continuously mess up less.
I'm expecting to be disgruntled looking at my own code a few years down the road.
And that's a good sign.
Edits & Thank You:
- to FlorianSchaetz for catching the mutability error in visitedCountries() and the detailed explanation.
- to u/brunocborges and u/BikingSquirrel for explaining that on lower end machines, you get Serial GC
- to u/shiphe for taking the time to better explain when you should mess with the GC and when you shouldn't
- to u/tomwhoiscontrary for putting me on the right track regarding what should be considered standard practice
- to u/BikingSquirrel (again) for providing the link to JDK 21 garbage collectors benchmark