Going Beyond Java 8: Compact Strings

Originally Posted February 6, 2021

Introduction

According to some surveys such as that of JetBrains, version 8 of Java is currently the most used by developers all over the world, despite being a 2014 release.

What you are reading is the first in a series of articles titled “Going beyond Java 8”, inspired by the contents of my book “Java for Aliens”. These articles will guide the reader step by step to explore the most important features introduced starting from version 9. The aim is to make the reader aware of how important it is to move forward from Java 8, explaining the enormous advantages that the latest versions of the language offer.

In this article, we will talk about compact strings, a mechanism introduced with Java 9, which represents one of the most valid reasons to abandon Java 8 and upgrade to one of the most recent versions.

Spoiler Alert

The String class is statistically the most used class in Java programming. Therefore, it seems important to ask ourselves how efficient the objects of this class are. The good news is that starting from Java 9, these objects perform significantly better than in previous versions. Moreover, this advantage is obtained practically without effort: it is enough to launch our program with a JVM version 9 (or higher), without changing anything in our code. So, let’s understand what compact strings are and how to use them.

Behind the Scenes

Figure 1 – Location of the src.zip file inside the JDK version 8 installation folder.

Up to Java 8, an array of char was used within the String class to store the characters that made up the string. It is possible to verify this by reading the source code of the String class. To do this, simply search for the String.java file in the src.zip file located in the installation folder of JDK version 8.

This file contains all the source files of the standard Java library.

So, after unzipping it, we can find the String.java source file in the java/lang path (in fact the String class belongs to the java.lang package). If we open this file with any editor, we can verify that the String class is declared as follows (we have removed some comments and other elements not useful for our discussion):

Up to Java 8, therefore, the existence of the value character array implied that 16 bits (2 bytes) of memory were assigned for each character of a string.
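To make the arithmetic concrete, here is a minimal sketch of that layout. CharBackedText is a hypothetical class written only for illustration, not the JDK source; it simply mirrors the idea of a char array named value and counts the bytes taken by the characters alone (object and array headers are ignored).

```java
// Hypothetical illustration of a char-backed string, in the style of Java 8.
// This is NOT the JDK's String class.
final class CharBackedText {
    // One 16-bit char per character, like the "value" field described above.
    private final char[] value;

    CharBackedText(String s) {
        this.value = s.toCharArray();
    }

    // Bytes used by the characters alone (headers are not counted).
    int payloadBytes() {
        return value.length * Character.BYTES;
    }

    public static void main(String[] args) {
        // Five characters at 2 bytes each.
        System.out.println(new CharBackedText("hello").payloadBytes()); // prints 10
    }
}
```

Even a plain ASCII word such as "hello" costs two bytes per character with this layout.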

Actually, in most applications, we use characters that can be stored in only 8 bits (1 byte). So, to get better performance in terms of speed and memory usage in our programs, in Java 9 the implementation of the String class was revised to be backed by a byte array instead of a char array. Following is the initial part of the declaration of the String class in version 15 of Java, stripped of uninteresting elements:

Figure 2 – Location of the src.zip file within the JDK version 15 installation folder.

From JDK 9, the src.zip file has been moved to the lib directory, and the packages have been included in the folders that represent the modules. So, the String.java source is now under the java.base/java/lang folders. In fact, java.base is the name of the module that contains the java.lang package.

However, it is always possible to use less common characters that need to be stored in 16 bits (2 bytes). In fact, a mechanism based on the coder variable has been implemented inside the String class: it records whether the string is stored with one byte per character or, when at least one character does not fit in a single byte, with two bytes per character. This mechanism is known as compact strings, and since version 9 of Java it is used by default by the JVM. Nothing changes programmatically: we will use strings as we have always used them. However, Java applications will perform better.
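The idea behind the coder can be sketched in a few lines. The class below is a hypothetical illustration and not the JDK implementation: the names CompactText, fitsInLatin1 and payloadBytes are assumptions made for this example, and only the value/coder pairing is taken from the description above.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the compact strings idea; NOT the JDK's code.
final class CompactText {
    static final byte LATIN1 = 0; // one byte per character
    static final byte UTF16 = 1;  // two bytes per character

    private final byte[] value;
    private final byte coder;

    CompactText(String s) {
        if (fitsInLatin1(s)) {
            this.value = s.getBytes(StandardCharsets.ISO_8859_1);
            this.coder = LATIN1;
        } else {
            // A single wide character switches the whole string to 2 bytes per char.
            this.value = s.getBytes(StandardCharsets.UTF_16BE);
            this.coder = UTF16;
        }
    }

    // True when every character can be stored in 8 bits.
    static boolean fitsInLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    byte coder() {
        return coder;
    }

    int payloadBytes() {
        return value.length;
    }

    public static void main(String[] args) {
        CompactText ascii = new CompactText("hello");
        CompactText greek = new CompactText("\u03b1\u03b2\u03b3"); // three Greek letters
        System.out.println(ascii.coder() + " " + ascii.payloadBytes()); // prints 0 5
        System.out.println(greek.coder() + " " + greek.payloadBytes()); // prints 1 6
    }
}
```

In this sketch the choice is made once per string, so a text made only of common Western characters takes half the bytes it would need in a char array.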

Are We Really Going to Use Half the Memory for Strings?

Although we have noticed that today the String class is supported by a byte array instead of a char array as in version 8, unfortunately with Java it is not possible to determine a priori how much memory a program will use. In fact, it is automatically managed by the complex mechanisms of the Garbage Collector, and at each execution, our program could use very different amounts of memory. Furthermore, there is no way in Java to know exactly how much memory is being used for a certain object at any given time as is possible with other languages.

With a strategy based on the Instrumentation interface of the java.lang.instrument package, it is possible to obtain an approximation of the size of an object, but this is of little help with strings: the reported value covers the String object itself and not the array that holds its characters. So, even if the compact strings mechanism seems to imply a memory saving, this is hard to demonstrate from inside a program. So, let’s see with a code example what advantage using a JDK version 9 or higher brings.
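For reference, a strategy of this kind could be sketched as follows. SizeAgent and shallowSizeOf are hypothetical names chosen for this example. The sketch assumes the class is packaged in a jar whose manifest declares it as Premain-Class and that the JVM is started with the -javaagent option pointing at that jar; without the agent, the method simply refuses to answer.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class, for illustration only.
final class SizeAgent {
    // Set by the JVM through premain when the agent is loaded.
    private static volatile Instrumentation instrumentation;

    // Called by the JVM before main when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        instrumentation = inst;
    }

    // Returns the implementation-specific size estimate for the object itself.
    static long shallowSizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("Agent not loaded: start the JVM with -javaagent");
        }
        return inst.getObjectSize(o);
    }
}
```

As far as I know, on HotSpot the value returned by getObjectSize does not include objects reachable through fields, so for a string it would not reflect the length of its text.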

Example

Let’s consider the following example:

In this class, 100,000 strings are instantiated, each containing one of the first 100,000 numbers, and concatenated. Furthermore, the milliseconds it takes to create these instances and concatenate them are calculated and printed.
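A minimal sketch matching this description could look like the following. It is a reconstruction from the prose, not the original listing: the class name StringConcatTimer, the method concatNumbers, the choice to start counting from 0 and the output text are all assumptions.

```java
// Sketch reconstructed from the description above; NOT the original listing.
class StringConcatTimer {
    // Concatenates the decimal representations of 0, 1, ..., count - 1.
    static String concatNumbers(int count) {
        String result = "";
        for (int i = 0; i < count; i++) {
            // Each iteration creates a new number string and a new, longer result.
            result += String.valueOf(i);
        }
        return result;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        String result = concatNumbers(100_000);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Elapsed ms: " + elapsed + " (result length " + result.length() + ")");
    }
}
```

After compiling, the same class can be launched normally with java StringConcatTimer, or with compact strings switched off using java -XX:-CompactStrings StringConcatTimer. Timings will vary with the machine and the JDK build.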

Let’s try to launch this application 5 times using the JDK version 15.1, and analyze the outputs:

We can observe that the execution time of the application is almost constant from launch to launch, at around 3.5 seconds.

So let’s try to disable compact strings using the -XX:-CompactStrings option, and try to run the same application 5 times and then analyze the results:

Again, the execution time is almost constant, but much worse than when we use compact strings. In fact, the average execution time of this application without compact strings turns out to be about 8.5 seconds, while with compact strings the average was only about 3.5 seconds: a significant advantage that saved us almost 60% of the time.

If we then recompile and relaunch the program directly with the latest build of Java 8 (JDK 1.8.0_261), the difference is even more evident:

This time the deterioration in performance is even greater: with JDK 15 and compact strings, the performance of the application was almost 10 times better! Of course, this does not mean that all programs will see such great improvements, because our example was based exclusively on the allocation and concatenation of strings.

As for memory savings, although they are probable, as we have said they cannot easily be proved, since the Garbage Collector performs a complex job based on the current situation.

Conclusions

In this article, we have seen the first valid reason to move forward from Java 8. The compact strings introduced in version 9 allow our programs to be more efficient when strings are used. Since the String class is statistically the most used class in Java programs, we can conclude that simply using a JDK with a version greater than 8 is likely to give our applications a faster execution speed. We also found that, in our example, JDK 15 even with compact strings disabled still performs significantly better than the latest build of JDK 8.

Updating the JDK seems like the first step.

Author Notes

Even ignoring the increased security offered by the latest versions of the JDK, there are plenty of reasons to upgrade your knowledge of Java, or at least your own Java runtime installations. My book “Java for Aliens”, which inspired the “Going beyond Java 8” series, contains all the information you need to learn Java from scratch, and uses a well-tested teaching method that has been perfected over 20 years of experience, which makes learning simple and exciting. It is also structured to deepen the topics and have superior knowledge that can make a difference in your career.
