Going beyond Java 8: Text Block

According to some surveys such as that of JetBrains, version 8 of Java is currently the most used by developers all over the world, despite being a 2014 release.
What you are reading is an article in a series titled “Going beyond Java 8”, inspired by the contents of my book “Java for Aliens”. These articles will guide the reader step by step to explore the most important features introduced starting from version 9. The aim is to make the reader aware of how important it is to move forward from Java 8, explaining the enormous advantages that the latest versions of the language offer.

String is undoubtedly the most used class in Java, and it is an exception among the classes of the standard library. Its objects are always immutable, and they can be instantiated with a simplified syntax that avoids the verbosity of the new operator and the constructor call required by almost all other classes. In addition, String objects are managed through an internal pool that reuses instances already created. Recent versions keep improving this fundamental class to make it more efficient, simpler and less verbose. The compact strings introduced in Java 9 improved the performance of strings. Then Java 13 introduced a new feature called text blocks, which allows strings to be defined on multiple lines using a new syntax. The formatting of multiline strings is more natural than in the past: it is no longer necessary to use string concatenation, escape characters such as \n, or complex management of quotes and spaces. In this way the verbosity of the code decreases, and both readability and ease of writing improve. In Java 13 and Java 14, text blocks could be used as a preview feature. Starting with Java 15 they have become a standard feature of the language.

 

What are they for?

Since Java usually interfaces with other languages and technologies, we often have to embed in strings code written in other languages such as SQL, JPQL, XML, JavaScript, JSON, HTML etc. As we know, formatting is essential to understand code written in these languages.

For example, suppose you want to use the following HTML code, in a Java program:

<HTML>
  <BODY>
    <H1>Hello World!</H1>
  </BODY>
</HTML>

Before Java 13, to format HTML code like this inside a string, we were forced to use escape characters like \n to go to the next line:

String html = "<HTML>\n  <BODY>\n    <H1>Hello World!</H1>\n  </BODY>\n</HTML>";

To make everything readable, we also need to concatenate multiple strings with the + operator:

String htmlFile = "<HTML>\n" +
                  "  <BODY>\n" +
                  "    <H1>Hello World!</H1>\n" +
                  "  </BODY>\n" +
                  "</HTML>";

Formatting the code as in the previous example requires a lot of attention from the programmer. For example, suppose we want to add to the previous H1 tag an attribute that uses double quotes, like this:

<H1 style="color: blue;">Hello World!</H1>

then we have to escape the double quotes (see Stranger things article on Java char type) to avoid syntax errors, as follows:

String htmlFile = "<HTML>\n" +
                  "  <BODY>\n" +
                  "    <H1 style=\"color: blue;\">Hello World!</H1>\n" +
                  "  </BODY>\n" +
                  "</HTML>";

From version 13 we can instead use a text block equivalently, which is similar to a normal String literal, but spans multiple lines and is delimited by sequences of three double quotes:

String htmlFile = """
                  <HTML>
                    <BODY>
                      <H1 style="color: blue;">Hello World!</H1>
                    </BODY>
                  </HTML>""";

We can see how the readability of the HTML code has improved, and how it is now easy to copy and paste content from an HTML file into the text block, or from the text block into an HTML file. Furthermore, there is no need to use escape characters for the HTML quotes. However, there are some points to clarify, as we will see starting from the next section.
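As a quick check of the equivalence described above, the following sketch compares the concatenated literal with the text block. The class name TextBlockEquivalence and its method names are made up for this illustration, and the code assumes Java 15 or later.

```java
class TextBlockEquivalence {

    // The older style: explicit \n, escaped quotes and concatenation.
    static String concatenated() {
        return "<HTML>\n" +
               "  <BODY>\n" +
               "    <H1 style=\"color: blue;\">Hello World!</H1>\n" +
               "  </BODY>\n" +
               "</HTML>";
    }

    // The same content written as a text block.
    static String textBlock() {
        return """
               <HTML>
                 <BODY>
                   <H1 style="color: blue;">Hello World!</H1>
                 </BODY>
               </HTML>""";
    }

    public static void main(String[] args) {
        // Both forms are expected to contain exactly the same characters.
        System.out.println(concatenated().equals(textBlock()));
    }
}
```

The comparison uses equals, so it checks the characters only; the alignment of the text block with the surrounding Java code does not affect the result.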

 

Syntax

As we saw in the previous example, a text block was defined inside an opening delimiter and a closing delimiter, represented by a sequence of three double quotes """. Actually, the situation is a little more complex, so let’s clarify by defining in detail the three parts that make up a text block: the opening delimiter, the text block content and the closing delimiter.

The opening delimiter is defined by a sequence of three double quotes, followed by zero or more white spaces and a line terminator. The content of the text block starts from the first character after the line terminator. Therefore, any white spaces between the three double quotes and the line terminator are not taken into consideration.

With the term white space, we mean the non-visible characters, identifiable invoking the static method boolean isWhitespace(int codepoint) of the Character class.

The closing delimiter, on the other hand, is defined only by a sequence of three double quotes. The content of the text block ends with the character preceding the first double quote of the closing delimiter.

Finally, the text block content is equivalent to an ordinary String literal at runtime. Once compiled, a text block therefore becomes a full-fledged String literal, and at runtime it is stored in the string pool as usual. The JVM has no way to distinguish at runtime ordinary string literals from those created through a text block. Remember that the content of the text block starts from the first character after the line terminator; the next sections explain the remaining rules needed to master text blocks.
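A minimal sketch of this equivalence follows. The class TextBlockPool and its members are hypothetical names chosen for the example. The identity comparison with == is used on purpose here, only to show that both constants refer to the same pooled instance; it is not a recommendation for comparing strings in general.

```java
class TextBlockPool {

    // An ordinary String literal with an explicit line terminator.
    static final String LITERAL = "Hello\nWorld";

    // A text block with the same content.
    static final String BLOCK = """
            Hello
            World""";

    // Identity check: both constants should come from the string pool.
    static boolean sameInstance() {
        return LITERAL == BLOCK;
    }

    public static void main(String[] args) {
        System.out.println(LITERAL.equals(BLOCK)); // same characters
        System.out.println(sameInstance());        // same pooled object
    }
}
```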

 

Compiling a Text Block

There are three phases that are performed during compile-time:

  1. Normalization of line terminators.
  2. Removal of white spaces that were introduced to align the text block with the Java code.
  3. Interpretation of escape characters.

Before talking about normalization, however, let’s make a brief but fundamental premise. The content of a text block is usually made up of several lines formatted according to some criterion. This involves managing both horizontal and vertical formatting. Horizontal formatting is usually supported by the space character and the horizontal tab character. The latter is obtained by pressing the “TAB” key on the keyboard, and can be represented by the escape character \t and by the Unicode code point \u0009. To support vertical formatting, instead, we need the characters that move to the next line, the so-called line terminators. In a text block these are no longer made explicit with the escape character \n, as is usually done in a normal String literal, but are implicitly defined in the source code by simply going to the next line. However, line terminators differ between platforms. Unix-based platforms (for example Linux and macOS) use the Line Feed character (abbreviated LF), which can be represented in Java with the escape character \n and with the code point \u000A. Windows systems, on the other hand, use the Carriage Return and Line Feed character sequence. The Carriage Return (abbreviated CR) can be represented in Java with the escape character \r and with the code point \u000D. We can therefore say that on Windows the line terminator is the combination CRLF (i.e., \u000D\u000A).

 

Normalization of line terminators

Normalization for text blocks always transforms all line terminators into LF, regardless of the platform on which the source file was written. This process is essential because, when a source file is carried from one platform to another, the number of characters may change. Suppose we have two Java source files that define the same text block, one edited on a Linux system (where the line terminator is LF) and the other on a Windows system (where the line terminator is CRLF). Without normalization, a check using the equals method between the two text blocks would return false, even if to the naked eye they seem identical: the file edited on Windows would contain an extra character (\r) for each line. Thanks to normalization, the two text blocks are equal.
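The effect can be sketched as follows. Since a snippet cannot carry its own line endings reliably, the Windows-style content is simulated with an ordinary literal containing \r\n; the class name LineTerminatorDemo and its methods are assumptions made for the illustration.

```java
class LineTerminatorDemo {

    // Whatever line terminators the source file uses, the compiler stores LF only.
    static String block() {
        return """
               first
               second""";
    }

    static boolean containsCarriageReturn() {
        return block().indexOf('\r') >= 0;
    }

    public static void main(String[] args) {
        String unixStyle = "first\nsecond";
        String windowsStyle = "first\r\nsecond";

        // Two strings that look identical but differ by one \r per line break.
        System.out.println(unixStyle.equals(windowsStyle));
        // The text block matches the LF version.
        System.out.println(block().equals(unixStyle));
        System.out.println(containsCarriageReturn());
    }
}
```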

 

Removing superfluous white spaces

After the normalization process, a text block will clearly consist of one or more lines. The algorithm for removing superfluous white spaces (i.e., the spaces introduced to align the text block with the Java code) includes:

  • The removal of all white spaces that are at the end of each line.
  • The removal of all white spaces that are at the beginning of each line, common to all lines.

As for the first point, the white spaces at the end of a line are removed by default, because they are usually useless for formatting purposes.

As for the second point, if all non-empty lines begin with one or more white spaces, the compiler examines them and finds the minimum number of leading white spaces common to all lines. Then, exactly this number of white spaces is removed from each line. This is because it is assumed that these white spaces have been introduced to align the text block with the Java code that defines it. For example, consider the following code:

public class TextBlockDemo {
    public static void main(String args[]) {
        String htmlFile = """
                          <HTML>
                            <BODY>
                              <H1>Hello World!</H1>
                            </BODY>
                          </HTML>  """;
        System.out.println(htmlFile);   
    }
}

In this case, the HTML code defined in the text block has clearly been written with different initial white spaces for each line, just for the purpose of aligning the content of the text block (the HTML code) with its opening delimiter (see figure 1).

Figure 1 - The superfluous white spaces are highlighted.

Even the white spaces that precede the closing delimiter of the text block in the last line are removed by the compiler.

For this reason, all the white spaces that are common to each line will be removed by the compiler, and the output of the previous class will be:

<HTML>
  <BODY>
    <H1>Hello World!</H1>
  </BODY>
</HTML>

and not:

                          <HTML>
                            <BODY>
                              <H1>Hello World!</H1>
                            </BODY>
                          </HTML>

If the closing delimiter of the text block had been placed on the next line, we would have had another line in the output. For example, the following text block:

String htmlFile = """
                  <HTML>
                    <BODY>
                      <H1>Hello World!</H1>
                    </BODY>
                  </HTML>  
                  """;

would have been printed with an extra blank trailing line, because moving the closing delimiter to the next line adds a final line terminator to the content:

<HTML>
  <BODY>
    <H1>Hello World!</H1>
  </BODY>
</HTML>
_

while the following text block:

        String htmlFile = """
                          <HTML>
                            <BODY>
                              <H1>Hello World!</H1>
                            </BODY>
                          </HTML>  
""";

would have produced the following output:

                          <HTML>
                            <BODY>
                              <H1>Hello World!</H1>
                            </BODY>
                          </HTML>

In fact, the last line would have had zero leading white spaces, and this number would have been considered by the compiler as the number of white spaces to remove for all lines.

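The algorithm described in this section is also available at runtime through the stripIndent method of the String class. It is an instance method (not a static one) that was added together with text blocks in Java 13 and became a standard API in Java 15.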
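To see the algorithm at work on ordinary strings, here is a small sketch that calls stripIndent directly. The class name StripIndentDemo and the sample strings are invented for the example; the second method mirrors the case where the closing delimiter sits at the start of its own line.

```java
class StripIndentDemo {

    // Every line shares four leading spaces; one line also has trailing spaces.
    static String commonIndentRemoved() {
        return "    <HTML>\n      <BODY>  \n    </HTML>".stripIndent();
    }

    // The string ends with a line terminator, so the last (empty) line has
    // zero leading white spaces and nothing is removed from the other lines.
    static String lastLineAtColumnZero() {
        return "    <HTML>\n    </HTML>\n".stripIndent();
    }

    public static void main(String[] args) {
        System.out.println(commonIndentRemoved());
        System.out.println(lastLineAtColumnZero());
    }
}
```

In the first case the result should be the three lines without the common indentation and without the trailing spaces; in the second the indentation is preserved.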

 

Interpretation of escape characters

Within the text block, it is also possible to use escape characters (see Stranger Things article on the Java char type). Technically it is also possible to use the escape characters \n, and \", but it is useless and therefore not recommended. In fact, \n is used to go to the next line within String literals, but text blocks are multiline in nature.

We can directly use the character " instead of the escape character \", since the delimiter of a text block does not consist of a single character ". In practice there is no risk of confusing the characters " belonging to the content with the delimiters of the text block itself.

There is only one case in which it is necessary to escape the double quotes character: when the last character of the content of a text block is itself a ", which would otherwise be attached to the closing delimiter of the text block, compromising its definition. In this case you need to use the escape character \".
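A minimal sketch of this corner case, with a made-up class name and sample sentence: the quotes inside the content need no escape, while the one that would touch the closing delimiter does.

```java
class TrailingQuoteDemo {

    // The first quote is written as-is; the last one is escaped because it
    // immediately precedes the three quotes of the closing delimiter.
    static String quoted() {
        return """
               She said "Hello World!\"""";
    }

    public static void main(String[] args) {
        System.out.println(quoted());
    }
}
```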

However, there are other escape characters that can be used.

It is important to note that the interpretation of escape characters takes place after the first two phases, i.e., after the normalization of line terminators and the removal of superfluous white spaces. As a consequence, escape characters like \n, \r and \f are not touched by the first phase, while \b (BACKSPACE) and \t (TAB) are not removed as white spaces in the second phase.
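The order of the phases can be observed with a sketch like the following, where the class name EscapeOrderDemo and the sample content are assumptions for the illustration. The escaped tab at the beginning of the first line is still a backslash followed by a letter when the indentation is stripped, so it survives and becomes a real tab only afterwards.

```java
class EscapeOrderDemo {

    static String block() {
        return """
               \tcolumn
               a\nb""";
    }

    public static void main(String[] args) {
        // The leading tab is kept, and \n is turned into a line terminator
        // only after the indentation has been removed.
        System.out.println(block());
    }
}
```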

 

New escape characters

With Java 14, two new escape characters were introduced. The first consists of the backslash \ symbol placed at the end of a line, and it makes the compiler ignore the line terminator that follows it. In fact, when a string is too long and we do not want it to contain line breaks, we usually use string concatenation to improve its readability. For example:

String lyrics1 = "The smile of dawn arrived early May, "
    + "she carried a gift from her home. " 
    + "The night shed a tear to tell her of fear " 
    + "and of sorrow and pain she'll never outgrow.";

Now we can rewrite the same string with the following text block:

String lyrics2 = """
    The smile of dawn arrived early May, \
    she carried a gift from her home. \
    The night shed a tear to tell her of fear \
    and of sorrow and pain she'll never outgrow.""";

The \ character can only be used within the text block, and not in string literals.
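A shortened version of the lyrics example can be used to check that the two forms produce the same string. The class and method names below are hypothetical.

```java
class LineContinuationDemo {

    static String concatenated() {
        return "The smile of dawn arrived early May, "
             + "she carried a gift from her home.";
    }

    // The backslash at the end of the first line suppresses the line terminator,
    // and the space before it is kept because it is no longer at the end of the line.
    static String textBlock() {
        return """
               The smile of dawn arrived early May, \
               she carried a gift from her home.""";
    }

    public static void main(String[] args) {
        System.out.println(concatenated().equals(textBlock()));
    }
}
```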

The other escape character introduced with Java 14 is \s, and unlike the escape character \, it can now also be used in string literals. It is equivalent to the space character (identified with the code point \u0020), but used within a text block, it prevents the removal of white spaces at the end of the line that we have described in the “Removing superfluous white spaces” section. For example, we can write a text block, where each line always consists of 4 spaces. Figure 2 shows the detail of the execution with EJE (a Java editor created by me) of a program that uses this text block, where the selection of the output highlights the spaces that have been kept at the end of the line.

Figure 2 - The selection shows the spaces at the end of a line of a program launched with EJE.
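Since the original snippet is only visible in the figure, the following is a separate sketch of the same idea, not the author's program: a text block in which \s closes every line, so that each line is padded to six characters. The class name TrailingSpaceDemo and the content are invented for the example.

```java
class TrailingSpaceDemo {

    // Without \s the spaces after "red" and "blue" would be stripped as
    // trailing white spaces; \s itself is translated into one space.
    static String colors() {
        return """
               red  \s
               green\s
               blue \s
               """;
    }

    public static void main(String[] args) {
        // Brackets make the preserved trailing spaces visible.
        colors().lines()
                .forEach(line -> System.out.println("[" + line + "] length=" + line.length()));
    }
}
```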

 

Text Block Concatenation

It is technically possible to concatenate text blocks with other text blocks, string literals, variables or method calls. Basically, text blocks can be used in all cases where String literals can be used. With concatenation, however, the readability may get worse. For example, consider the following snippet that defines and prints a string representing a JavaScript function:

String functionName = "alert";
String jsFunction = "function dynamicFunction() {\n"+
                    "\t"+functionName+"(msg);\n" +
                    "}";
System.out.println(jsFunction);

Notice how we used concatenation to parameterize the function name.

The output will be:

function dynamicFunction() {
    alert(msg);
}

but the readability of the code is not very good! So let’s try to use a text block to improve the readability:

String functionName = "alert";
String jsFunction = """
                    function dynamicFunction() {
                    \t"""  + functionName + """
                    (msg);
                    }""";
System.out.println(jsFunction);

the output will be identical to the previous one, but the readability has even worsened! In fact, each text block spans at least two lines, because the opening delimiter must be followed by a line terminator.

 

Best Practice

In cases like this, it is preferable to use a single text block on which to call the replace method of the String class, for example, as in the following snippet:

String functionName = "alert";
String jsFunction = """
                    function dynamicFunction() {
                    \t$functionParameter(msg);
                    }""".replace("$functionParameter", functionName);
System.out.println(jsFunction);

More simply, we can use the formatted method of the String class, added together with text blocks in Java 13 and standard since Java 15, as follows:

String jsFunction = """
                    function dynamicFunction() {
                    \t%s(msg);
                    }""".formatted("alert");
System.out.println(jsFunction); 

We can see how the formatted method has the same functionality on text blocks that the format method has on String literals. In fact, the above code is equivalent to the following snippet:

String jsFunction = String.format("""
                                  function dynamicFunction() {
                                  \t%s(msg);
                                  }""","alert");
System.out.println(jsFunction);

In the standard library, other String methods support working with text blocks: indent (available since Java 12), stripIndent and translateEscapes.
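A brief sketch of these three methods on ordinary strings follows; the class name StringHelpersDemo and the sample values are assumptions made for the illustration.

```java
class StringHelpersDemo {

    // indent(n) adds n spaces to each line and ends every line with \n.
    static String indented() {
        return "a\nb".indent(2);
    }

    // stripIndent() removes the indentation common to all lines.
    static String stripped() {
        return "  a\n    b".stripIndent();
    }

    // translateEscapes() turns the two characters \ and t into a real tab.
    static String translated() {
        return "a\\tb".translateEscapes();
    }

    public static void main(String[] args) {
        System.out.print(indented());
        System.out.println(stripped());
        System.out.println(translated());
    }
}
```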

 

Conclusions

Text blocks allow us to create formatted multiline strings, which makes it easy to bring formatted code from other languages into Java code. After being introduced as a preview feature in version 13, text blocks were officially promoted to a standard feature of Java in version 15. The rules governing text blocks are not so simple, and the changes compared to previous versions are considerable in terms of syntax, new escape characters and new methods. However, once you understand some concepts, using text blocks is simple and particularly useful. Just think of the use we will make of them with languages such as SQL, JPQL, JavaScript, JSON, XML, HTML, CSS etc.

 

Author Notes

Even ignoring the increased security offered by the latest versions of the JDK, there are plenty of reasons to upgrade your knowledge of Java, or at least your own Java runtime installations. My book “Java for Aliens”, which inspired the “Going beyond Java 8” series, contains all the information you need to learn Java from scratch, and uses a well-tested teaching method that has been perfected over 20 years of experience, which makes learning simple and exciting. It is also structured to explore the topics in depth and provide superior knowledge that can make a difference to your career.
This article is mainly inspired by section 13.5.5 of chapter 13 of my book “Java for Aliens”. You can freely download this section as a PDF file from the “Samples” section of the official website https://www.javaforaliens.com. In that sample you can also find a brief explanation of the new String methods that support text blocks.
