Everything you need to know about semicolon insertion in JavaScript

Thursday, 21/11/2019

Tram Ho

Preamble

Automatic semicolon insertion is one of the most controversial syntax features of JavaScript. There are also many misconceptions surrounding it.
Some JavaScript programmers put a semicolon at the end of every statement, while others use them only where strictly required. Most do one or the other as a matter of style.
Even if you end every statement with a semicolon, some constructs are still parsed in non-obvious ways. Whichever style you prefer, you need to know the rules to write JavaScript professionally. This article explains all of them, so that you can understand how any program you encounter will be parsed. After reading it, I hope you will be an expert in JavaScript automatic semicolon insertion, or ASI.

Where semicolons are allowed

In the formal grammar given in the ECMAScript specification, semicolons are shown at the end of each kind of statement in which they can appear. For example, here is the do-while statement:

do Statement while ( Expression ) ;

Semicolons also appear in the grammar at the end of the var statement, the expression statement (such as “4 + 4;” or “f();”), and the continue, return, break, throw and debugger statements.
The empty statement is just a semicolon, and is a valid statement in JavaScript. For this reason, “;;;” is a valid JavaScript program. It parses as three empty statements and runs by doing nothing three times.
Sometimes empty statements are genuinely useful, at least syntactically. For example, to write an infinite loop, one could write while (1);, in which the semicolon is parsed as an empty statement, making the while statement syntactically valid. If the semicolon is omitted, the while statement is incomplete, because a statement is required after the loop condition.
Finally, semicolons appear in the header of the for loop:

for ( Expression ; Expression ; Expression ) Statement

and of course, they can appear inside strings and regular expressions.

Where semicolons can be omitted

In the formal grammar used in the ECMAScript specification, semicolons are included, as described above. However, the specification then gives rules that describe how actual parsing differs from the formal grammar.
This section outlines the three basic rules, followed by two exceptions. The rules are:

When the program contains a token that the formal grammar does not allow at that point, a semicolon is inserted if (a) there is a line break at that point, or (b) the unexpected token is a closing brace.
When the end of the file is reached and the program cannot be parsed otherwise, a semicolon is inserted.
When a “restricted production” is encountered and a line terminator appears in the place where the grammar carries the annotation “[no LineTerminator here]”, a semicolon is inserted.

Roughly, these rules say that a statement can be terminated without a semicolon (a) before a closing brace, (b) at the end of the program, or (c) when the next token cannot otherwise be parsed. In addition, there are certain places in the grammar where a line break, if present, always terminates the statement.
The exceptions are that semicolons are never inserted in the header of a for loop:

for ( Expression ; Expression ; Expression ) Statement

and semicolons are never inserted if they would be parsed as an empty statement.

42; "hello!" is a valid program, and so is 42 \n "hello!" (with “\n” representing an actual line break), but 42 "hello!" is not. A line break triggers semicolon insertion, but a space does not. “if (x) {y()}” is also valid. Here “y()” is an expression statement, which can be terminated with a semicolon, but since the next token is a closing brace, the semicolon is optional even though there is no line break.
The two exceptions, for the for loop header and for empty statements, can be demonstrated together:
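
The claims in this paragraph can be checked with a small sketch. The parses helper is a hypothetical name, not part of any library, and it assumes an engine where new Function is available (for example Node.js). It only reports whether a string is accepted as a function body.

```javascript
// Hypothetical helper: true if `source` is syntactically valid as a function body.
function parses(source) {
  try {
    new Function(source) // compiles the body without running it
    return true
  } catch (err) {
    if (err instanceof SyntaxError) return false
    throw err
  }
}

console.log(parses('42; "hello!"'))  // true: explicit semicolon
console.log(parses('42\n"hello!"'))  // true: line break, semicolon is inserted
console.log(parses('42 "hello!"'))   // false: only a space, nothing is inserted
console.log(parses('if (x) {y()}'))  // true: closing brace, semicolon is inserted
```

Only the third program is rejected, because neither a line break nor a closing brace separates the two tokens.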

for (node=getNode();
     node.parent;
     node=node.parent) ;

This for loop repeatedly moves from a node to its parent until it reaches a node that has no parent. All of this is done in the for loop header, so there is nothing left for the loop body to do. However, the for loop syntax requires a statement, so we use an empty statement. Although all three semicolons in this example appear at the end of a line, all three are required, because a semicolon is never inserted into a for loop header or to create an empty statement.
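
Both exceptions can be observed directly. The sketch below is only an illustration: isValid is a hypothetical helper built on new Function, and the loops are simplified stand-ins for the node example above.

```javascript
// Hypothetical helper: true if `source` compiles as a function body.
const isValid = (source) => {
  try {
    new Function(source)
    return true
  } catch (err) {
    return false
  }
}

// All three semicolons written out: valid.
console.log(isValid("for (var i = 0; i < 3; i++) ;"))    // true

// Line breaks instead of semicolons inside the header: never repaired.
console.log(isValid("for (var i = 0\n i < 3\n i++) ;"))  // false

// Missing loop body: ASI will not supply an empty statement.
console.log(isValid("for (var i = 0; i < 3; i++)\n"))    // false
```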

Restricted Productions

Introduction

A restricted production is a grammar rule in which a line break is not allowed at a particular position. If a line break appears there, the program is not parsed the way the author intended, although it may still parse in some other way.

Classification

There are five restricted productions in the grammar: the postfix ++ and -- operators, and the continue, break, return and throw statements. The break and continue statements can be followed by a label that names the loop to exit or continue. If a label is given, it must be on the same line as the break or continue keyword. The following is a valid program:

var c,i,l,quitchars
quitchars=['q','Q']
charloop:while(c=getc()){
    for (i=0; i<quitchars.length; i++){
        if (c==quitchars[i]) break charloop
    }
    /* ... more code to handle other characters here ... */
}

Here getc() reads a character from an input device and returns it. The program reads characters one at a time and checks each against the quitchars array; when it finds a match, it ends the loop. Because the break statement carries the label charloop, it exits the while loop, not just the inner for loop.

The following program, which differs only in whitespace, is parsed differently and does not behave the same way:

var c,i,l,quitchars
quitchars=['q','Q']
charloop:while(c=getc()){
    for (i=0; i<quitchars.length; i++){
        if (c==quitchars[i])
            break
                charloop
    }
    /* ... more code to handle other characters here ... */
}

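Specifically, the label charloop is no longer part of the break statement. A semicolon is automatically inserted after break, so it exits only the inner for loop, and reading a quit character no longer ends the while loop. The charloop token on the next line is parsed as a separate expression statement, a bare variable reference. It is evaluated whenever the comparison is false, and since no variable of that name exists, it throws a ReferenceError at that point.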
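
The difference is easy to observe at runtime. The following sketch is a simplified stand-in for the program above, not the original code: the function names and the outer label are made up for illustration, and the second function declares a variable named outer only so that the stray reference does not throw.

```javascript
function labelOnSameLine() {
  let steps = 0
  outer: for (let i = 0; i < 3; i++) {
    for (let j = 0; j < 3; j++) {
      steps++
      if (j === 1) break outer // exits both loops
    }
  }
  return steps
}

function labelOnNextLine() {
  const outer = null // assumption: lets the stray `outer` below evaluate harmlessly
  let steps = 0
  outer: for (let i = 0; i < 3; i++) {
    for (let j = 0; j < 3; j++) {
      steps++
      if (j === 1)
        break        // a semicolon is inserted here: exits the inner loop only
          outer      // parsed as a separate expression statement
    }
  }
  return steps
}

console.log(labelOnSameLine()) // 2
console.log(labelOnNextLine()) // 6
```

Labels and variables live in separate namespaces, which is why the same name can be used for both in the second function.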

Here are examples that illustrate the other four restricted productions :

// PostfixExpression :                                            
//              LeftHandSideExpression [no LineTerminator here] ++
//              LeftHandSideExpression [no LineTerminator here] --
var i=1;
i
++;

This is a syntax error; it does not parse as “i++”. A line terminator cannot appear before a postfix increment or decrement operator, so a “++” or “--” at the beginning of a line is never parsed as part of the previous line.

i
++
j

This is not a syntax error; it parses as “i; ++j”. The prefix increment and decrement operators are not restricted: a line break after “++” or “--” has no effect, and the operator is still parsed with the expression it modifies.
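
As a quick check of this behaviour, the sketch below runs the same three lines with both variables initialised to 1 (the initial values are an assumption made for the demonstration).

```javascript
let i = 1
let j = 1

// Parsed as `i; ++j;` and not as `i++; j;`
i
++
j

console.log(i, j) // 1 2
```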

// ReturnStatement: return [no LineTerminator here] Expressionopt ;
return
  {i:i, j:j}

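The line break ends the return statement, so this does not return the object. The function is left with an empty return statement followed by code that is never reached, and the braces on the next line are read as a block rather than an object literal. With a single property such as {i:i}, that block parses as a labeled statement and the function silently returns undefined; the two-property form shown here is rejected with a syntax error at the second colon. Each of the following forms is parsed as a single return statement that returns the object: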

return {
  i:i, j:j}
return (
  {i:i, j:j})
return {i:i
       ,j:j}

Note that a return statement may contain line breaks inside the expression, just not between the return token and the start of the expression. When semicolons are intentionally omitted, this restricted production is convenient, because it lets the programmer write an empty return statement without accidentally returning the value on the next line:

function initialize(a){
  // if already initialized, do nothing
  if(a.initialized) return
  a.initialized = true
  /* ... initialize a ... */
}
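
The effect of a line break after return can also be seen at runtime. In the sketch below the function names are hypothetical, and a single-property object is used because a block containing “i: i, j: j” would not parse at all.

```javascript
function lineBreakAfterReturn(i) {
  return      // a semicolon is inserted here
    {i: i}    // unreachable block containing the labeled statement `i: i`
}

function braceOnReturnLine(i) {
  return {
    i: i}     // the object literal starts on the return line
}

console.log(lineBreakAfterReturn(1)) // undefined
console.log(braceOnReturnLine(1).i)  // 1
```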

The continue and throw statements behave like break and return:

continue innerloop // correct
 
continue
    innerloop;     // incorrect
// ThrowStatement : throw [no LineTerminator here] Expression ;
throw                                          // parse error
  new MyComplexError(a, b, c, more, args);
// Unlike the return, break, and continue statements, 
// the expression after "throw" is not optional, 
// so the above will not parse at all.
throw new MyComplexError(a, b, c, more, args); // correct
throw new MyComplexError(
    a, b, c, more, args);                      // also correct
// Any variation with 'new' and 'throw' on the same line is correct.

Note that indentation has no effect on how ECMAScript programs are parsed, but the presence or absence of line breaks does. Therefore, a tool that processes JavaScript source code can strip leading whitespace from lines (except inside strings) without changing the meaning of the program, but line breaks cannot be indiscriminately removed, or replaced with spaces or semicolons.
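
A rough sketch of what this means for a source-processing tool follows. The three transformations are deliberately naive and stand in for a hypothetical minifier; run is a made-up helper that executes a string as a function body.

```javascript
const source = "var total = 1\n  + 2\nreturn total"
const run = (body) => new Function(body)()

// Stripping indentation keeps the meaning.
const stripped = source.split("\n").map((line) => line.trimStart()).join("\n")
// Replacing every line break with a semicolon changes the meaning.
const withSemicolons = source.split("\n").join(";")
// Replacing every line break with a space breaks the program.
const withSpaces = source.split("\n").join(" ")

console.log(run(source))         // 3
console.log(run(stripped))       // 3
console.log(run(withSemicolons)) // 1

try {
  run(withSpaces)
} catch (err) {
  console.log(err instanceof SyntaxError) // true
}
```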

Common mistake

The most common mistake programmers make is to put the return value on the line after the return keyword, which is especially tempting when the value is a large object literal or a multi-line string. Misplaced line breaks with the postfix operators, break, continue and throw are rarely seen in practice, for the simple reason that they look unnatural to most programmers and so are unlikely to be written.

Attention

The final subtlety of ASI arises from the first rule, which requires the program to contain a token that the grammar does not allow before a semicolon is inserted. When writing code with optional semicolons omitted, keep this rule in mind so that a required semicolon is not omitted by accident. This rule is also what makes it possible to extend a statement across multiple lines, as in the following examples:

return obj.method('abc')
          .method('xyz')
          .method('pqr')
 
return "a long string\n"
     + "continued across\n"
     + "several lines"
 
totalArea = rect_a.height * rect_a.width
          + rect_b.height * rect_b.width
          + circ.radius * circ.radius * Math.PI

The rule looks only at the first token of the following line. If that token can be parsed as part of the statement, the statement continues. If it cannot extend the statement, a new statement begins, and a semicolon is automatically inserted, in the specification's terms.

There is potential for error whenever there is a pair of statements A and B such that both A and B are valid statements on their own, but the first token of B can also be accepted as an extension of A. In such cases, if no semicolon is provided, the parser will not treat B as a separate statement, and will either reject the program or parse it in a way the author did not intend. So when semicolons are omitted, the programmer must be careful with any pair of statements separated by a line break, such as:

A
B

For example, the following snippet produces unexpected results because the semicolon is missing:

a = b + c 
(d + e).print()

It is parsed as:

a = b + c(d + e).print();

The specification goes on to state: “In the circumstance that an assignment statement must begin with a left parenthesis, it is a good idea for the programmer to provide an explicit semicolon at the end of the preceding statement rather than to rely on automatic semicolon insertion.” A more robust alternative, when semicolons are intentionally omitted, is to put a semicolon at the beginning of the line, immediately before the token that creates the potential ambiguity:

a = b + c
;(d + e).print()
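
To see that the leading semicolon really changes the parse, the sketch below runs both variants with a spy in place of c. Everything here is an assumption made for the demonstration: callsC is a hypothetical harness, the values of b, d and e are arbitrary, and toString() stands in for print(), which numbers do not have.

```javascript
// Hypothetical harness: runs `body` with a spy `c` and reports whether it was called.
function callsC(body) {
  let called = false
  const c = function () {
    called = true
    return 0
  }
  new Function("b", "c", "d", "e", body)(1, c, 2, 3)
  return called
}

// Without the semicolon, the second line is parsed as a call: c(d + e)
console.log(callsC("var a = b + c\n(d + e).toString()"))  // true

// With the leading semicolon, the second line is a statement of its own
console.log(callsC("var a = b + c\n;(d + e).toString()")) // false
```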

The last tricky token is the slash, which can produce surprising results in code like this:

var i,s
s="here is a string"
i=0
/[a-z]/g.exec(s)

On lines 1-3, we declare and assign some variables, and on line 4 we construct a regular expression /[a-z]/g that matches any character from a to z, and then evaluate it against the string s using the exec method. Because the return value of exec() is not used, this code is not very useful, but one would expect it to parse as written. However, a slash can not only begin a regular expression, it can also act as the division operator. That means the leading slash on line 4 is actually parsed as a continuation of the assignment on the previous line. Lines 3 and 4 together are parsed as “i equals 0 divided by [a-z] divided by g.exec(s)”.
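
One way to confirm the division reading is to supply values for a, z and g and watch whether g.exec is called. This is only a sketch: usesGExec is a hypothetical helper, and the values passed for a, z and g are arbitrary stand-ins that let the division evaluate without error.

```javascript
// Hypothetical helper: reports whether `body` ends up calling g.exec().
function usesGExec(body) {
  let used = false
  const g = {
    exec() {
      used = true
      return 1
    },
  }
  new Function("s", "a", "z", "g", body)("here is a string", 2, 1, g)
  return used
}

// Parsed as i = 0 / [a - z] / g.exec(s): the variable g is used
console.log(usesGExec("var i\ni = 0\n/[a-z]/g.exec(s)"))  // true

// With a leading semicolon the line is a regular expression literal
console.log(usesGExec("var i\ni = 0\n;/[a-z]/g.exec(s)")) // false
```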

Misconceptions

Many new JavaScript programmers are advised to use semicolons everywhere, and expect that if they never intentionally rely on semicolon insertion, they can ignore the existence of this entire language feature. That is not the case, because of the restricted productions described above, most notably the return statement. Once they become aware of restricted productions, programmers may become overly wary of line breaks and avoid them even where they would make the code clearer. Ideally, you should be familiar with all the ASI rules, so that you can read any piece of code regardless of how it is written, and write code that is as clear as possible.
Another misconception concerns browser compatibility. There is no reason to worry about it where semicolon insertion is concerned: all browsers follow the same rules, which are those given by the ECMAScript specification and explained above.

Conclusion

Should you omit optional semicolons? The answer depends on your personal preference, but it should be an informed choice rather than one based on vague fears of unknown syntax traps or non-existent browser bugs. If you remember the rules given here, you are well equipped to make your own choice and to read any JavaScript code easily.
If you choose to omit semicolons wherever possible, my advice is to insert one immediately before the opening parenthesis or square bracket of any statement that begins with one of those tokens, and before any statement that begins with one of the arithmetic operators “/”, “+” or “-”.
Whether you omit semicolons or not, you must keep the restricted productions in mind (return, break, continue, throw and the postfix operators), and you should feel free to use line breaks everywhere else to make your code more readable.
Good luck! Reference source: http://inimino.org/~inimino/blog/javascript_semicolons

Source : Viblo