What you are reading is the first in a series of articles titled “Stranger things in Java”, inspired by the contents of my book “Java for Aliens”. These articles are dedicated to insights into the Java language. Exploring in depth the topics we use every day will allow us to master Java coding even in the strangest scenarios.
Do you know that the following is a valid Java statement?
\u0069\u006E\u0074 \u0069 \u003D \u0038\u003B
You can copy it into the main method of any class and compile it. If you then add the following statement after the previous instruction:
System.out.println(i);
the program will print 8!
And do you know that this comment instead produces a syntax error at compile time?
/*
 * The file will be generated inside the C:\users\claudio folder
 */
char
.
The mystery of the comment error and other stories…
Primitive Character Data Type
As everyone knows, the char type is one of the eight primitive Java types. It allows us to store characters, one at a time. Below is a simple example where a character value is assigned to a char variable:
char aCharacter = 'a';
String s = "Java melius semper quam latinam linguam est";
There are three ways to assign a literal value to a char type, and all three modes require the value to be enclosed in single quotes:
- use a single printable character on the keyboard (for example '&');
- use the Unicode format with hexadecimal notation (for example '\u0061', which is equivalent to the decimal number 97 and which identifies the 'a' character);
- use a special escape character (for example '\n', which indicates the line feed character).
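The three forms listed above can be put side by side in a minimal sketch. The class and method names are made up for this illustration and are not part of the original text:

```java
class CharLiteralForms {

    // 1. A single printable keyboard character.
    static char printable() {
        return '&';
    }

    // 2. Unicode format, hexadecimal notation: 0x0061 is 97 in decimal, i.e. 'a'.
    static char unicodeFormat() {
        return '\u0061';
    }

    // 3. A special escape character: the line feed.
    static char escapeCharacter() {
        return '\n';
    }

    public static void main(String[] args) {
        System.out.println(printable());              // prints &
        System.out.println(unicodeFormat());          // prints a
        System.out.println((int) unicodeFormat());    // prints 97
        System.out.println((int) escapeCharacter());  // prints 10
    }
}
```

Casting a char to int, as done in the last two lines, is a quick way to see the numeric value behind a literal.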
Printable Keyboard Characters
We can assign to a char variable any character found on our keyboard, provided that our system settings support the required character and that the character is printable (for example, the Delete and Enter keys are not printable). In any case, the literal assignable to a char primitive type is always enclosed in two single quotes. Here are some examples:
char aUppercase = 'A';
char minus = '-';
char at = '@';
Unicode Format (Hexadecimal Notation)
The char primitive type is stored in 16 bits, and therefore can represent as many as 65536 different values. The Unicode standard aims to encode all the characters (and also symbols, emojis, ideograms, etc.) that exist on this planet. Its first 256 code points coincide with the 8-bit ISO 8859-1 (Latin-1) standard, which in turn contains the oldest standard, known as ASCII (acronym for American Standard Code for Information Interchange). UTF-8, by contrast, is not a predecessor of Unicode but one of the encodings used to store Unicode characters as bytes.
We can directly assign to a char a Unicode value in hexadecimal format using 4 digits, which uniquely identifies a given character, prefixing it with \u (always lower case). For example:
char phiCharacter = '\u03A6'; // Capital Greek letter Φ
char nonIdentifiedUnicodeCharacter = '\uABC8';
For characters that cannot be represented in a single char type, we usually use classes like String and Character, but since this is a very rare case and not interesting for the purpose of this article, we will not talk about it.
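As a minimal sketch of the 16-bit limit mentioned above, the following example prints the numeric value of a char and shows that a character beyond the 16-bit range occupies two char values inside a String. The class name and the choice of the code point U+1F600 are assumptions made only for this illustration:

```java
class CharRangeDemo {

    // A char is an unsigned 16-bit value, so it widens to int without loss.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        char phiCharacter = '\u03A6';
        System.out.println(codeOf(phiCharacter));       // prints 934 (hexadecimal 3A6)
        System.out.println((int) Character.MAX_VALUE);  // prints 65535

        // U+1F600 does not fit in 16 bits: a String stores it as a surrogate pair.
        String beyond16Bits = "\uD83D\uDE00";
        System.out.println(beyond16Bits.length());                                // prints 2
        System.out.println(beyond16Bits.codePointCount(0, beyond16Bits.length())); // prints 1
    }
}
```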
Special Escape Characters
In a char type it is also possible to store special escape characters, that is, sequences of characters that cause particular behaviors when printed:
- \b is equivalent to a backspace, that is, a deletion to the left (equivalent to the Backspace key)
- \n is equivalent to a line feed (equivalent to the Enter key)
- \\ equals only one \ (just because the \ character is used for escape characters)
- \t is equivalent to a horizontal tab (equivalent to the TAB key)
- \' is equivalent to a single quote (a single quote delimits the literal of a character)
- \" is equivalent to a double quote (a double quote delimits the literal of a string)
- \r represents a carriage return (special character that moves the cursor to the beginning of the line)
- \f represents a form feed (disused special character representing the cursor moving to the next page of the document)
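A few of the escape characters listed above can be tried out with a short sketch like the following. The class name, method names and sample strings are invented for this illustration:

```java
class EscapeCharacters {

    // A horizontal tab between two words.
    static String tabbedLine() {
        return "key\tvalue";
    }

    // Two backslashes in the source produce a single backslash in the string.
    static String windowsPath() {
        return "C:\\temp";
    }

    // Double quotes must be escaped inside a string, single quotes need not be.
    static String quoted() {
        return "\"IQ\" and 'IQ'";
    }

    public static void main(String[] args) {
        System.out.println(tabbedLine());
        System.out.println(windowsPath());           // prints C:\temp
        System.out.println(quoted());                // prints "IQ" and 'IQ'
        System.out.println(windowsPath().length());  // prints 7
    }
}
```

Each escape sequence counts as one character, which is why the path above has a length of 7 and not 8.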
Note that assigning the literal '"' to a character is perfectly legal, so the following statement:
System.out.println('"');
which is equivalent to the following code:
char doubleQuotes = '"';
System.out.println(doubleQuotes);
will print:
"
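A single quote, on the other hand, must be escaped inside a character literal, so the following statement:
System.out.println(''');
produces the following compile-time errors:
error: empty character literal
System.out.println(''');
                   ^
error: unclosed character literal
System.out.println(''');
                     ^
2 errors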
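Conversely, a single quote needs no escape inside a string, so the following statement:
System.out.println("'IQ'");
prints:
'IQ'
On the other hand, we must use the \" escape character to use double quotes within a string. So, the following statement: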
System.out.println(""IQ"");
will cause the following compile-time errors:
error: ')' expected
System.out.println(""IQ"");
                     ^
error: ';' expected
System.out.println(""IQ"");
                       ^
2 errors
while the following statement is correct:
System.out.println("\"IQ\"");
and prints:
"IQ"
Write Java Code with the Unicode Format
The Unicode literal format can also be used to replace any line of our code. In fact, the compiler first transforms the Unicode format into a character, and then evaluates the syntax. For example, we could rewrite the following statement:
int i = 8;
in the following way:
\u0069\u006E\u0074 \u0069 \u003D \u0038\u003B
If we then print the variable with the following statement:
System.out.println("i = " + i);
the output will be:
i = 8
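The same experiment can be packaged as a complete class. This is only a sketch, and the class and method names are made up for this example. The escaped line is an ordinary declaration as far as the compiler is concerned:

```java
class UnicodeSource {

    static int escapedDeclaration() {
        // The next line is the escaped spelling of: int i = 8;
        \u0069\u006E\u0074 \u0069 \u003D \u0038\u003B
        return i;
    }

    public static void main(String[] args) {
        System.out.println("i = " + escapedDeclaration()); // prints i = 8
    }
}
```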
Unicode Format for Escape Characters
The fact that the Unicode hexadecimal format is transformed by the compiler before it evaluates the code has some consequences, and it justifies the existence of escape characters. For example, let’s consider the line feed character, which can be represented with the escape character \n. In the Unicode encoding, the line feed is associated with the decimal number 10 (which corresponds to the hexadecimal number A). But, if we try to define it using the Unicode format:
char lineFeed = '\u000A';
we will get the following compile-time error:
error: illegal line end in character literal
char lineFeed = '\u000A';
                ^
1 error
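This happens because the compiler converts the Unicode format before evaluating the code, so what it actually sees is a character literal broken across two lines:
char lineFeed = '
';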
Likewise, the single quote character ('), which corresponds to the decimal number 39 (equivalent to the hexadecimal number 27) and which we can represent with the escape character \', cannot be represented with the Unicode format:
char singleQuote = '\u0027';
because the compiler transforms it into:
char singleQuote = ''';
which produces the following compile-time errors:
error: empty character literal
char singleQuote = '\u0027';
                   ^
error: unclosed character literal
char singleQuote = '\u0027';
                          ^
2 errors
Problems also arise with the carriage return character, represented by the hexadecimal number D (corresponding to the decimal number 13) and already representable with the escape character \r. In fact, if we write:
char carriageReturn = '\u000d';
we will get the following compile-time error:
error: illegal line end in character literal
char carriageReturn = '\u000d';
                      ^
1 error
As for the backslash character \, represented by the decimal number 92 (corresponding to the hexadecimal number 5C) and by the escape character \\, if we write:
char backSlash = '\u005C';
we will get the following compile-time error:
error: unclosed character literal
char backSlash = '\u005C';
                 ^
1 error
This is because the previous code is transformed into:
char backSlash = '\';
where the \' pair of characters is considered as an escape character corresponding to a single quote ', and therefore the literal is missing its closing single quote.
On the other hand, if we consider the character ", represented by the hexadecimal number 22 (corresponding to the decimal number 34) and by the escape character \", if we write:
char quotationMark = '\u0022';
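there will be no problem. But if we use this character inside a string:
String quotationMarkString = "\u0022";
we will get the following compile-time error:
error: unclosed string literal
String quotationMarkString = "\u0022";
                                    ^
1 error
since the previous code is transformed into:
String quotationMarkString = """;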
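The safe spellings of the characters discussed in this section can be checked with a minimal sketch like the one below, which relies on the escape characters instead of the Unicode format. The class and method names are assumptions made for this illustration:

```java
class EscapeCodes {

    // Widening a char to int exposes its numeric value.
    static int codeOf(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println(codeOf('\n'));  // prints 10 (line feed)
        System.out.println(codeOf('\r'));  // prints 13 (carriage return)
        System.out.println(codeOf('\''));  // prints 39 (single quote)
        System.out.println(codeOf('\\'));  // prints 92 (backslash)
        System.out.println(codeOf('"'));   // prints 34 (double quote)

        // Inside a character literal the Unicode format of the double quote is legal.
        System.out.println('\u0022' == '"');  // prints true

        // Inside a string the double quote needs its escape character.
        String quotationMarkString = "\"";
        System.out.println(quotationMarkString.length());  // prints 1
    }
}
```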
The mystery of the comment error
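An even stranger situation is found when using single-line comments for Unicode formats such as carriage return or line feed. For example, despite being commented out, both of the following statements would give rise to compile-time errors!
// char lineFeed = '\u000A';
// char carriageReturn = '\u000d';
This is because the compiler converts the Unicode format into a line terminator, which ends the single-line comment, so the rest of each line falls outside the comment. A multi-line comment does not have this problem:
/*
char lineFeed = '\u000A';
char carriageReturn = '\u000d';
*/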
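A side effect of this behavior is that code placed after a line feed in Unicode format inside a single-line comment is compiled and executed. The following sketch illustrates it. The class, field and method names are invented for this example, and it is a curiosity to be aware of rather than a technique to use:

```java
class CommentTrick {

    static int executed = 0;

    static int run() {
        executed = 0;
        // This single-line comment ends at the escape: \u000A executed++;
        return executed;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 1, because the increment is not commented out
    }
}
```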
A similar problem arises when the sequence \u is used in a comment. For example, with the following comment, we will get a compile-time error:
/*
 * The file will be generated inside the C:\users\claudio folder
 */
If the compiler does not find a valid sequence of 4 hexadecimal digits after \u, it will print the following error:
error: illegal unicode escape
 * The file will be generated inside the C:\users\claudio folder
                                             ^
1 error
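To spot this kind of problem before compiling, a text can be scanned for a backslash followed by u that is not followed by 4 hexadecimal digits. The following is a simplified sketch and not how the compiler is implemented. The class and method names are hypothetical, and it assumes that a backslash starts a Unicode format only when it is preceded by an even number of backslashes:

```java
import java.util.ArrayList;
import java.util.List;

class UnicodeEscapeChecker {

    // Returns the indexes of the backslashes that start a malformed Unicode escape.
    static List<Integer> findIllegalEscapes(String source) {
        List<Integer> illegal = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            if (source.charAt(i) != '\\') {
                i++;
                continue;
            }
            // Measure the run of contiguous backslashes starting at i.
            int run = 0;
            while (i + run < source.length() && source.charAt(i + run) == '\\') {
                run++;
            }
            int last = i + run - 1;
            // Only the last backslash of an odd-length run can start an escape.
            if (run % 2 == 1 && last + 1 < source.length() && source.charAt(last + 1) == 'u') {
                int j = last + 1;
                while (j < source.length() && source.charAt(j) == 'u') {
                    j++; // any number of 'u' characters is allowed
                }
                if (!hasFourHexDigits(source, j)) {
                    illegal.add(last);
                }
            }
            i = last + 1;
        }
        return illegal;
    }

    private static boolean hasFourHexDigits(String source, int start) {
        if (start + 4 > source.length()) {
            return false;
        }
        for (int k = start; k < start + 4; k++) {
            if (Character.digit(source.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The comment text from the article, written here as a Java string.
        String comment = "The file will be generated inside the C:\\users\\claudio folder";
        System.out.println(findIllegalEscapes(comment)); // prints [40]
    }
}
```

In a real comment, writing the path with forward slashes or with doubled backslashes avoids the error.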
Conclusions
In this article we have seen that the use of the char type in Java hides some truly surprising special cases. In particular, we have seen that it is possible to write Java code using the Unicode format. This is because the compiler first transforms the Unicode format into a character, and then evaluates the syntax. This implies that programmers can find syntax errors where they would never expect them, especially inside comments.
Author Notes
This article is a short excerpt from section 3.3.5, Primitive Character Data Type, of Volume 1 of my book “Java for Aliens”. For more information, please visit www.javaforaliens.com (you can download section 3.3.5 from the Samples area). This article has also been published on DZone. An Italian version is also available.