
Going Beyond Java 8: Compact Strings

According to some surveys such as that of JetBrains, version 8 of Java is currently the most used by developers all over the world, despite being a 2014 release.
What you are reading is the first in a series of articles titled “Going beyond Java 8”, inspired by the contents of my book “Java for Aliens”. These articles will guide the reader step by step to explore the most important features introduced starting from version 9. The aim is to make the reader aware of how important it is to move forward from Java 8, explaining the enormous advantages that the latest versions of the language offer.

In this article we will talk about compact strings, a mechanism introduced with Java 9, which is one of the best reasons to abandon Java 8 and upgrade to a more recent version.

Spoiler Alert

The String class is statistically the most used class in Java programming. Therefore, it is important to ask ourselves how efficient the objects of this class are. The good news is that, starting from Java 9, these objects perform significantly better than in previous versions. Moreover, this advantage comes practically without effort: it is enough to launch our program with a JVM version 9 (or higher), without changing our code in any way. So, let’s understand what compact strings are and how to use them.

Behind the scenes

Up to Java 8, the String class used an array of char to store the characters that made up the string. We can verify this by reading the source code of the String class. To do this, simply look for the String.java file in the src.zip file located in the installation folder of JDK version 8 (see figure 1).

Figure 1 - JDK 8: src.zip file location.

This file contains all the source files of the standard Java library. So, after unzipping it, we can find the source of the String.java class in the java/lang path (in fact the String class belongs to the java.lang package).
If we open this file with any editor, we can verify that the String class is declared as follows (we have removed some comments and other elements not useful for our discussion):

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final char value[];
    // omitted the rest of the code

Up to Java 8, therefore, the existence of the value character array implied that 16 bits (2 bytes) of memory were allocated for each character of a string.
In most applications, however, the majority of strings contain only characters that can be stored in 8 bits (1 byte), namely those of the Latin-1 (ISO-8859-1) character set. So, to improve the speed and memory usage of our programs, in Java 9 the implementation of the String class was revised to be backed by a byte array instead of a char array. The following is the initial part of the declaration of the String class in version 15 of Java, stripped of uninteresting elements:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {
    /** The value is used for character storage. */
    private final byte[] value;

    /**
     * The identifier of the encoding used to encode the bytes in
     * {@code value}.
     */
    private final byte coder;
    // omitted the rest of the code

From JDK 9, the src.zip file has been moved to the lib folder (see figure 2), and the packages have been included in the folders that represent the modules. So, the String.java source is now under the java.base/java/lang folders. In fact, java.base is the name of the module that contains the java.lang package.

Figure 2 - JDK 15: src.zip file location.

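However, it is always possible to use less common characters that need to be stored in 16 bits (2 bytes). For this reason, a mechanism based on the coder variable has been implemented inside the String class, which takes care of allocating the right number of bytes per character. The choice is made for the string as a whole: if all of its characters fit in Latin-1, each one is stored in a single byte; otherwise the entire string is stored as UTF-16, with 2 bytes per character.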

This mechanism is known as compact strings, and since version 9 of Java it has been enabled by default in the JVM. Nothing changes programmatically: we will use strings as we have always used them. However, Java applications will perform better.
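The idea behind the coder variable can be illustrated with a small model. What follows is only a minimal sketch, not the JDK's actual implementation: the CompactTextSketch class, its methods and the big-endian byte order are assumptions made for illustration. It stores a text in a byte array and records in a coder field whether one or two bytes per character were used.

```java
import java.nio.charset.StandardCharsets;

// Illustrative model only: NOT the real java.lang.String code.
// All names below are hypothetical.
final class CompactTextSketch {
    static final byte LATIN1 = 0; // one byte per character
    static final byte UTF16 = 1;  // two bytes per character

    private final byte[] value;
    private final byte coder;

    CompactTextSketch(String text) {
        if (fitsInLatin1(text)) {
            coder = LATIN1;
            value = text.getBytes(StandardCharsets.ISO_8859_1);
        } else {
            coder = UTF16;
            // Assumption: big-endian order, chosen only to keep the sketch simple.
            value = text.getBytes(StandardCharsets.UTF_16BE);
        }
    }

    // True when every char of the text can be stored in a single byte.
    static boolean fitsInLatin1(String text) {
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    byte coder() {
        return coder;
    }

    int storedBytes() {
        return value.length;
    }

    // Decodes the bytes back into a String according to the coder.
    @Override
    public String toString() {
        return new String(value,
                coder == LATIN1 ? StandardCharsets.ISO_8859_1 : StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        CompactTextSketch latin = new CompactTextSketch("Java");
        CompactTextSketch greek = new CompactTextSketch("Jav\u03B1"); // ends with a Greek alpha
        System.out.println("coder " + latin.coder() + ", bytes " + latin.storedBytes());
        System.out.println("coder " + greek.coder() + ", bytes " + greek.storedBytes());
    }
}
```

In this model a single character outside Latin-1 is enough to double the storage of the whole text: "Java" takes 4 bytes, while the same word ending with a Greek alpha takes 8.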

Are we really going to use half the memory for strings?

Although we have seen that today the String class is backed by a byte array instead of a char array as in version 8, unfortunately with Java it is not possible to determine a priori how much memory a program will use. Memory is managed automatically by the complex mechanisms of the Garbage Collector, and at each execution our program could use very different amounts of memory. Furthermore, Java offers no standard way to know precisely how much memory a certain object is using at any given time, as is possible with other languages. With a strategy based on the Instrumentation interface of the java.lang.instrument package, it is possible to obtain an approximation of the size of an object, but for strings this is of little help: the value returned is an implementation-specific estimate of the String object itself and does not include the array that holds its characters. So, even if the compact strings mechanism implies a saving in character storage, the overall saving for a whole application is hard to predict or to measure precisely. So, let’s see with a code example what advantage we get from using JDK version 9 or higher.
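For the character data alone, the expected effect can at least be estimated with simple arithmetic. The following is a rough sketch under stated assumptions: the PayloadEstimate class and its methods are hypothetical, and the calculation deliberately ignores object headers, array headers, alignment and anything the Garbage Collector does, so it says nothing about the real footprint of a program.

```java
// Back-of-the-envelope estimate of character storage only.
// Hypothetical helper, not a measurement of real heap usage.
final class PayloadEstimate {

    // Java 8 layout: a char[] always uses 2 bytes per character.
    static long charArrayBytes(String s) {
        return 2L * s.length();
    }

    // Compact strings: 1 byte per character if every char fits in
    // Latin-1 (value <= 0xFF), otherwise 2 bytes per character.
    static long compactBytes(String s) {
        boolean latin1 = s.chars().allMatch(c -> c <= 0xFF);
        return (latin1 ? 1L : 2L) * s.length();
    }

    public static void main(String[] args) {
        String[] samples = {
            "Going beyond Java 8",
            "compact strings",
            "\u65E5\u672C\u8A9E" // three characters outside Latin-1
        };
        long before = 0;
        long after = 0;
        for (String s : samples) {
            before += charArrayBytes(s);
            after += compactBytes(s);
        }
        System.out.println("char[] storage:  " + before + " bytes");
        System.out.println("compact storage: " + after + " bytes");
    }
}
```

The saving depends entirely on the data: strings made only of Latin-1 characters need half the character storage, while strings containing even one other character need exactly as much as before.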

Example

Let’s consider the following example:

public class CompactStringsDemo {
    public static void main(String[] args) {
        // Record the start time in milliseconds
        long initialTime = System.currentTimeMillis();
        long limit = 100_000;
        String s = "";
        for (int i = 0; i < limit; i++) {
            // Each iteration appends the text "100000" and creates a new String
            s += limit;
        }
        long totalTime = System.currentTimeMillis() - initialTime;
        System.out.println("Created " + limit + " strings in " + totalTime +
                               " milliseconds");
    }
}

In this class 100,000 concatenations are performed: at each iteration the text representation of limit (that is, "100000") is appended to s, and since strings are immutable, each concatenation creates a new, longer String object. The program then calculates and prints the number of milliseconds it took to create all these instances.
Let’s try to launch this application 5 times using the JDK version 15.1, and analyze the outputs:

java CompactStringsDemo
Created 100000 strings in 3539 milliseconds

java CompactStringsDemo
Created 100000 strings in 3548 milliseconds

java CompactStringsDemo
Created 100000 strings in 3564 milliseconds

java CompactStringsDemo
Created 100000 strings in 3561 milliseconds

java CompactStringsDemo
Created 100000 strings in 3609 milliseconds

We can observe that the execution time is almost constant across launches, at around 3.5 seconds.
Now let’s disable compact strings using the -XX:-CompactStrings option, run the same application 5 more times and analyze the results:

java -XX:-CompactStrings CompactStringsDemo
Created 100000 strings in 8731 milliseconds

java -XX:-CompactStrings CompactStringsDemo
Created 100000 strings in 8263 milliseconds

java -XX:-CompactStrings CompactStringsDemo
Created 100000 strings in 8547 milliseconds

java -XX:-CompactStrings CompactStringsDemo
Created 100000 strings in 8602 milliseconds

java -XX:-CompactStrings CompactStringsDemo
Created 100000 strings in 8353 milliseconds

Again, the execution time is almost constant, but much worse than when we used compact strings. In fact, the average execution time of this application without compact strings turns out to be about 8.5 seconds, while with compact strings the average was only about 3.5 seconds: a significant advantage that saved us almost 60% of the time.
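To confirm which setting a given run actually used, the flag can be read from inside the program. This is a minimal sketch that assumes a HotSpot-based JVM, where the com.sun.management.HotSpotDiagnosticMXBean interface is available; the CompactStringsFlag class and the "unavailable" fallback value are hypothetical choices made for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Assumes a HotSpot JVM; other JVMs may not provide this MXBean.
final class CompactStringsFlag {

    // Returns the flag value reported by the JVM (for example "true" or
    // "false"), or "unavailable" if the JVM does not expose the flag,
    // as is the case on Java 8, where it does not exist.
    static String compactStringsSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption("CompactStrings").getValue();
        } catch (RuntimeException e) {
            // Thrown, for example, when the option is unknown to this JVM
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        System.out.println("CompactStrings = " + compactStringsSetting());
    }
}
```

Running this class with and without the -XX:-CompactStrings option is a quick way to check that the option was really picked up before comparing timings.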
If we then recompile and relaunch the program with the latest build of Java 8 (JDK 1.8.0_261), the advantages are even more evident:

"C:\Program Files\Java\jdk1.8.0_261\bin\java" CompactStringsDemo
Created 100000 strings in 31113 milliseconds

"C:\Program Files\Java\jdk1.8.0_261\bin\java" CompactStringsDemo
Created 100000 strings in 30376 milliseconds

"C:\Program Files\Java\jdk1.8.0_261\bin\java" CompactStringsDemo
Created 100000 strings in 32868 milliseconds

"C:\Program Files\Java\jdk1.8.0_261\bin\java" CompactStringsDemo
Created 100000 strings in 32508 milliseconds

"C:\Program Files\Java\jdk1.8.0_261\bin\java" CompactStringsDemo
Created 100000 strings in 35328 milliseconds

The deterioration in performance this time is truly remarkable: with JDK 8 the average is about 32 seconds, so with JDK 15 and compact strings the application was about 9 times faster! Of course, this does not mean that all programs will see such great improvements, because our example was based exclusively on the allocation and concatenation of strings.
Regarding memory usage, a saving is likely but, as we have said, it is hard to quantify for a whole application, since the Garbage Collector performs a complex job based on the current situation.
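The averages and ratios quoted in this section can be checked directly from the outputs reported above. The sketch below simply redoes the arithmetic on those fifteen timings; the TimingSummary class and its average method are hypothetical names used only for this calculation.

```java
import java.util.Arrays;

// Recomputes the averages from the timings printed in the runs above.
final class TimingSummary {

    // Arithmetic mean of the given timings, in milliseconds.
    static double average(long... millis) {
        return Arrays.stream(millis).average().orElse(Double.NaN);
    }

    public static void main(String[] args) {
        double compact = average(3539, 3548, 3564, 3561, 3609);      // JDK 15, default
        double noCompact = average(8731, 8263, 8547, 8602, 8353);    // JDK 15, -XX:-CompactStrings
        double java8 = average(31113, 30376, 32868, 32508, 35328);   // JDK 1.8.0_261

        System.out.printf("JDK 15 with compact strings:    %.1f ms%n", compact);
        System.out.printf("JDK 15 without compact strings: %.1f ms%n", noCompact);
        System.out.printf("JDK 8:                          %.1f ms%n", java8);
        System.out.printf("Time saved by compact strings:  %.1f%%%n",
                100 * (1 - compact / noCompact));
        System.out.printf("JDK 8 time / JDK 15 time:       %.1f%n", java8 / compact);
    }
}
```

The result is a saving of roughly 58% from compact strings alone, and a JDK 8 run that takes about 9 times as long as the JDK 15 run with compact strings.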

Conclusions

In this article we have seen the first valid reason to move forward from Java 8. Compact strings, introduced in version 9, allow our programs to be more efficient whenever strings are used. Since the String class is statistically the most used class in Java programs, we can expect that simply running our applications on a JDK newer than version 8 will make them faster, to a degree that depends on how heavily they work with strings. We also found that, in our test, JDK 15 without compact strings was still significantly faster than the latest build of JDK 8.
Updating the JDK seems like the first step.

 

Author Notes

Even ignoring the increased security offered by the latest versions of the JDK, there are plenty of reasons to upgrade your knowledge of Java, or at least your own Java runtime installations. My book “Java for Aliens”, which inspired the “Going beyond Java 8” series, contains all the information you need to learn Java from scratch, and uses a well-tested teaching method, perfected over 20 years of experience, that makes learning simple and exciting. It is also structured to let you explore the topics in depth and gain knowledge that can make a difference to your career.

For more information visit https://www.javaforaliens.com.
