JavaScript. The core

http://dmitrysoshnikov.com/ecmascript/javascript-the-core/#closures

 

 


This note is an overview and summary of the “ECMA-262-3 in detail” series. Every section references the matching chapters, so you can read them to get a deeper understanding.

Intended audience: experienced programmers, professionals.

We start out by considering the concept of an object, which is fundamental to ECMAScript.

ECMAScript, being a highly abstracted object-oriented language, deals with objects. There are also primitives, but they, when needed, are also converted to objects.

An object is a collection of properties and has a single prototype object. The prototype may be either an object or the null value.

Let’s take a basic example of an object. A prototype of an object is referenced by the internal [[Prototype]] property. However, in figures we will use __<internal-property>__ underscore notation instead of the double brackets, particularly for the prototype object: __proto__ (which is a real, but non-standard, feature in some engines, e.g. SpiderMonkey).

For the code:

var foo = {
   x: 10,
   y: 20
};

we have a structure with two explicit own properties and one implicit __proto__ property, which is a reference to the prototype of foo:

Figure 1. A basic object with a prototype.


Why are these prototypes needed? Let’s consider the prototype chain concept to answer this question.

Prototype objects are also just simple objects and may have their own prototypes. If a prototype has a non-null reference to its own prototype, and so on, this is called the prototype chain.

A prototype chain is a finite chain of objects which is used to implement inheritance and shared properties.

Consider the case when we have two objects which differ only in some small part, while all the rest is the same for both. Obviously, in a well-designed system we would like to reuse that common functionality/code without repeating it in every single object. In class-based systems this style of code reuse is called class-based inheritance: you put the common functionality into a class A, and provide classes B and C which inherit from A and have their own small additional changes.

ECMAScript, in the ES3 edition this series covers, has no concept of a class. However, the style of code reuse does not differ much (in some respects it is even more flexible than the class-based one) and is achieved via the prototype chain. This kind of inheritance is called delegation-based inheritance (or, closer to ECMAScript, prototype-based inheritance).

Similarly to the example with classes A, B and C, in ECMAScript you create objects a, b and c. Object a stores the part common to both b and c, and b and c store only their own additional properties or methods.

var a = {
   x: 10,
   calculate: function (z) {
     return this.x + this.y + z;
   }
};
 
var b = {
   y: 20,
   __proto__: a
};
 
var c = {
   y: 30,
   __proto__: a
};
 
// call the inherited method
b.calculate(30); // 60
c.calculate(40); // 80

Easy enough, isn’t it? We see that b and c have access to the calculate method, which is defined in the a object. This is achieved precisely via the prototype chain.

The rule is simple: if a property or a method is not found in the object itself (i.e. the object has no such own property), then there is an attempt to find this property/method in the prototype chain. If the property is not found in the prototype, then the prototype of the prototype is considered, and so on through the whole prototype chain (exactly the same happens in class-based inheritance when resolving an inherited method: there we go through the class chain). The first property/method found with the given name is used; such a property is called an inherited property. If the property is not found after the whole prototype chain has been searched, the undefined value is returned.
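The lookup rule can be written out by hand. The following is only a sketch: the helper name lookup and the objects base and derived are made up for illustration, and it relies on the ES5 functions Object.create and Object.getPrototypeOf, which are not part of ES3. It also ignores accessor properties, which a real property read handles differently.

```javascript
// Hypothetical helper: walks the prototype chain by hand,
// roughly the way a property read does internally.
function lookup(object, name) {
  var current = object;
  while (current !== null) {
    // is it an own property of the current link?
    if (Object.prototype.hasOwnProperty.call(current, name)) {
      return current[name]; // the first match wins
    }
    // otherwise go one step up the chain
    current = Object.getPrototypeOf(current);
  }
  // the whole chain was searched without success
  return undefined;
}

var base = { x: 10 };
var derived = Object.create(base); // the prototype of "derived" is "base"
derived.y = 20;

console.log(lookup(derived, 'y'));       // own property of "derived"
console.log(lookup(derived, 'x'));       // inherited from "base"
console.log(lookup(derived, 'missing')); // found nowhere in the chain
```

For these objects the three calls give 20, 10 and undefined, the same results as the ordinary reads derived.y, derived.x and derived.missing.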

Notice that when an inherited method is called, the this value is set to the original object, not to the (prototype) object in which the method was found. I.e. in the example above this.y is taken from b and c, not from a. However, this.x is taken from a, again via the prototype chain mechanism.

If a prototype is not specified for an object explicitly, then the default value for __proto__ is used: Object.prototype. The Object.prototype object itself also has a __proto__, which is the final link of the chain and is set to null.
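Both statements about the default prototype can be checked directly. This is a small sketch that assumes an ES5 engine (for Object.getPrototypeOf and Object.create); the helper chainLength is a made-up name used only here.

```javascript
var plain = {};

// an object literal gets Object.prototype by default
var defaultProto = Object.getPrototypeOf(plain);
console.log(defaultProto === Object.prototype);

// Object.prototype is the last link: its prototype is null
var endOfChain = Object.getPrototypeOf(Object.prototype);
console.log(endOfChain === null);

// Hypothetical helper: counts the links above an object
function chainLength(object) {
  var count = 0;
  var current = Object.getPrototypeOf(object);
  while (current !== null) {
    count++;
    current = Object.getPrototypeOf(current);
  }
  return count;
}

console.log(chainLength(plain));                // one link: Object.prototype
console.log(chainLength(Object.create(plain))); // two links: plain, Object.prototype
```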

The next figure shows the inheritance hierarchy of our a, b and c objects:

Figure 2. A prototype chain.


Often we need objects with the same or similar state structure (i.e. the same set of properties) but with different state values. In this case we may use a constructor function, which produces objects by a specified pattern.

Besides creating objects by a specified pattern, a constructor function does another useful thing: it automatically sets a prototype object for newly created objects. This prototype object is stored in the ConstructorFunction.prototype property.
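What the new operator does with the prototype can be approximated by hand. The sketch below is a simplified model, not the algorithm from the specification: the names Point and construct are made up, and Object.create is an ES5 function that ES3 does not have.

```javascript
// Hypothetical constructor used only for this illustration
function Point(x) {
  this.x = x;
}

Point.prototype.describe = function () {
  return 'x = ' + this.x;
};

// Simplified model of "new Ctor(arg)"
function construct(Ctor, arg) {
  // 1. create an object whose prototype is Ctor.prototype
  var instance = Object.create(Ctor.prototype);
  // 2. run the constructor with "this" set to the new object
  var result = Ctor.call(instance, arg);
  // 3. if the constructor returned an object of its own, that
  //    object is used instead of the freshly created one
  var returnedObject =
    (typeof result === 'object' && result !== null) ||
    typeof result === 'function';
  return returnedObject ? result : instance;
}

var viaNew = new Point(1);
var viaModel = construct(Point, 2);

console.log(Object.getPrototypeOf(viaNew) === Point.prototype);
console.log(Object.getPrototypeOf(viaModel) === Point.prototype);
console.log(viaModel.describe());
```

Both objects end up with Point.prototype as their prototype, so both find describe via delegation.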

E.g., we may rewrite the previous example with the b and c objects using a constructor function. The role of the object a (the prototype) is then played by Foo.prototype:

// a constructor function
function Foo(y) {
   // which may create objects
   // by a specified pattern: after creation
   // they have their own "y" property
   this.y = y;
}
 
// "Foo.prototype" also stores a reference
// to the prototype of newly created objects,
// so we may use it to define shared/inherited
// properties or methods; as in the
// previous example, we have:
 
// inherited property "x"
Foo.prototype.x = 10;
 
// and inherited method "calculate"
Foo.prototype.calculate = function (z) {
   return this.x + this.y + z;
};
 
// now create our "b" and "c"
// objects using "pattern" Foo
var b = new Foo(20);
var c = new Foo(30);
 
// call the inherited method
b.calculate(30); // 60
c.calculate(40); // 80
 
// let's show that we reference
// properties we expect
 
console.log(
 
   b.__proto__ === Foo.prototype, // true
   c.__proto__ === Foo.prototype, // true
 
   // also "Foo.prototype" automatically gets
   // a special property "constructor", which is a
   // reference to the constructor function itself;
   // instances "b" and "c" may find it via
   // delegation and use it to check their constructor

   b.constructor === Foo, // true
   c.constructor === Foo, // true
   Foo.prototype.constructor === Foo, // true

   b.calculate === b.__proto__.calculate, // true
   b.__proto__.calculate === Foo.prototype.calculate // true
 
);

This code may be presented as the following relationship:

Figure 3. A constructor and objects relationship.


This figure again shows that every object has a prototype. The constructor function Foo also has its own __proto__, which is Function.prototype, and which in turn references Object.prototype via its own __proto__ property. To repeat, Foo.prototype is just an explicit property of Foo that refers to the prototype of the b and c objects.
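The links described for this figure can be verified one by one. This is a sketch that uses the ES5 function Object.getPrototypeOf in place of the __proto__ notation from the figures; the variable names for the intermediate results are made up.

```javascript
function Foo(y) {
  this.y = y;
}

var b = new Foo(20);

// the instance delegates to Foo.prototype
var protoOfInstance = Object.getPrototypeOf(b);

// the constructor itself is an object and delegates to Function.prototype
var protoOfConstructor = Object.getPrototypeOf(Foo);

// both chains continue to Object.prototype
var afterFunctionPrototype = Object.getPrototypeOf(Function.prototype);
var afterFooPrototype = Object.getPrototypeOf(Foo.prototype);

console.log(
  protoOfInstance === Foo.prototype,
  protoOfConstructor === Function.prototype,
  afterFunctionPrototype === Object.prototype,
  afterFooPrototype === Object.prototype,
  // the explicit "prototype" property and the constructor's
  // own prototype are two different objects
  Foo.prototype !== protoOfConstructor
);
```

All five comparisons hold, which is the distinction the paragraph above stresses: Foo.prototype is the prototype of the instances, not of Foo itself.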

Formally, if we consider the concept of classification (and we have just classified a new separate thing, Foo), the combination of a constructor function and a prototype object may be called a “class”. In fact, e.g. Python’s first-class dynamic classes resolve properties and methods in essentially the same way. From this viewpoint, Python’s classes are just syntactic sugar for the delegation-based inheritance used in ECMAScript.

The complete and detailed explanation of this topic may be found in Chapter 7 of the ES3 series. There are two parts: Chapter 7.1. OOP. The general theory, where you will find a description of various OOP paradigms and styles and their comparison with ECMAScript, and Chapter 7.2. OOP. ECMAScript implementation, devoted specifically to OOP in ECMAScript.

Now that we know the basic aspects of objects, let’s see how runtime program execution is implemented in ECMAScript. This is what is called an execution context stack, every element of which may abstractly be represented as an object as well. Yes, ECMAScript operates with the concept of an object almost everywhere ;)

There are three types of ECMAScript code: global code, function code and eval code. Each kind of code is evaluated in its own execution context. There is only one global context, and there may be many instances of function and eval execution contexts. Every call of a function enters a function execution context and evaluates the function code. Every call of the eval function enters an eval execution context and evaluates its code.

Notice that one function may generate an unbounded number of contexts, because every call to a function (even if the function calls itself recursively) produces a new context with a new context state:

function foo(bar) {}
 
// call the same function,
// generate three different
// contexts in each call, with
// different context state (e.g. value
// of the "bar" argument)
 
foo(10);
foo(20);
foo(30);

An execution context may activate another context, e.g. a function calls another function (or the global context calls a global function), and so on. Logically, this is implemented as a stack, which is called the execution context stack.

A context that activates another context is called a caller. A context being activated is called a callee. A callee may at the same time be a caller of some other callee (e.g. a function called from the global context then calls some inner function).

When a caller activates (calls) a callee, the caller suspends its execution and passes the control flow to the callee. The callee is pushed onto the stack and becomes the running (active) execution context. After the callee’s context ends, it returns control to the caller, and the evaluation of the caller’s context proceeds (it may then activate other contexts) until its end, and so on. A callee may simply return or exit with an exception. A thrown but uncaught exception may exit (pop from the stack) one or more contexts.
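The push and pop behaviour can be modelled with an ordinary array. The following sketch is only a model of the idea, not how an engine is implemented: ecStack, snapshots and run are made-up names, and the string labels stand in for real execution contexts.

```javascript
// Hypothetical model of the execution context stack
var ecStack = ['global'];
var snapshots = [];

function run(name, body) {
  ecStack.push(name);                  // the callee becomes the active context
  snapshots.push(ecStack.join(' > ')); // record the stack at this moment
  try {
    return body();
  } finally {
    ecStack.pop();                     // popped on normal return and on exception
  }
}

// a normal nested call: global calls "outer", which calls "inner"
run('outer', function () {
  run('inner', function () {});
});

// an exception thrown in "b" is not caught in "b" or "a",
// so it exits both contexts before it is handled here
try {
  run('a', function () {
    run('b', function () {
      throw new Error('boom');
    });
  });
} catch (e) {
  // back in the global context
}

console.log(snapshots);
console.log(ecStack);
```

The recorded snapshots are "global > outer", "global > outer > inner", "global > a" and "global > a > b", and at the end only the global context remains on the stack.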

I.e. the whole ECMAScript program runtime is presented as the execution context (EC) stack, where the top of the stack is the active context:

Figure 4. An execution context stack.


When a program begins, it enters the global execution context, which is the bottom and the first element of the stack. Then the global code performs some initialization and creates the needed objects and functions. During the execution of the global context, its code may activate other (already created) functions, which enter their own execution contexts, pushing new elements onto the stack, and so on. After the initialization is done, the runtime system waits for some event (e.g. a user’s mouse click) that will activate some function, which in turn will enter a new execution context.

In the next figure, with some function context as EC1 and the global context as Global EC, we have the following stack modifications on entering and exiting EC1 from the global context:

Figure 5. An execution context stack changes.


This is exactly how the runtime system of ECMAScript manages the execution of code.

More information on execution contexts in ECMAScript may be found in Chapter 1. Execution context.

As we said, every execution context in the stack may be presented as an object. Let’s look at its structure and at what kind of state (which properties) a context needs in order to execute its code.

An execution context may abstractly be represented as a simple object. Every execution context has a set of properties (which we may call the context’s state) necessary to track the execution progress of its associated code. The next figure shows the structure of a context:

Figure 6. An execution context structure.


Besides these three required properties (a variable object, a this value and a scope chain), an execution context may have any additional state, depending on the implementation.

Let’s consider these important properties of a context in detail.

A variable object is a scope of data related to the execution context. It’s a special object associated with the context, which stores the variables and function declarations defined within the context.

Notice that function expressions (in contrast with function declarations) are not included in the variable object.

A variable object is an abstract concept. In different context types it is physically represented by different objects. For example, in the global context the variable object is the global object itself (that’s why we are able to refer to global variables via property names of the global object).

Let’s consider the following example in the global execution context:

var foo = 10;
 
function bar() {} // function declaration, FD
(function baz() {}); // function expression, FE

console.log(
   this.foo == foo, // true
   window.bar == bar // true
);
 
console.log(baz); // ReferenceError, "baz" is not defined

Then the global context’s variable object (VO) will have the following properties:

Figure 7. The global variable object.


Notice again that the function baz, being a function expression, is not included in the variable object. That’s why we get a ReferenceError when trying to access it outside the function itself.

Notice that, in contrast with other languages (e.g. C/C++), in ECMAScript only functions create a new scope. Variables and inner functions defined within the scope of a function are not directly visible outside it and do not pollute the global variable object.
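A short sketch of this rule for var declarations follows; the variable names are made up for illustration. Note that it describes var as in ES3; the let and const declarations added in ES2015 are block-scoped and behave differently.

```javascript
// a block does not create a scope for "var"
if (true) {
  var insideBlock = 'visible';
}

// the loop counter also belongs to the enclosing scope
for (var counter = 0; counter < 3; counter++) {}

// a function does create a scope
(function () {
  var insideFunction = 'hidden';
})();

console.log(insideBlock);           // still accessible after the block
console.log(counter);               // still accessible after the loop
console.log(typeof insideFunction); // not visible outside the function
```

Here insideBlock is 'visible' and counter is 3 after their blocks end, while typeof insideFunction gives 'undefined' because the name does not exist in the outer scope.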

Using eval we also enter a new (eval’s) execution context. However, eval uses either the global variable object or the variable object of the caller (e.g. a function from which eval is called).

And what about functions and their variable objects? In a function context, the variable object is presented as an activation object.

When a function is activated (called) by the caller, a special object called an activation object is created. It’s filled with the formal parameters and the special arguments object (a map of the formal parameters, but with index properties). The activation object is then used as the variable object of the function context.

I.e. a function’s variable object is the same simple variable object, but besides variables and function declarations it also stores the formal parameters and the arguments object, and it is called the activation object.

For the following example:

function foo(x, y) {
   var z = 30;
   function bar() {} // FD
   (function baz() {}); // FE
}
 
foo(10, 20);

we have the following activation object (AO) of the foo function context:

Figure 8. An activation object.


And again the function expression baz is not included in the variable/activation object.
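The contents of the activation object cannot be inspected directly, but its effect is observable from inside the function. The sketch below probes which names already resolve at the very start of the function body; the seen object and its property names are made up for this illustration.

```javascript
function foo(x, y) {
  var seen = {
    argumentCount: arguments.length, // the "arguments" object is available
    firstArgument: arguments[0],     // same value as the formal parameter "x"
    barType: typeof bar,             // "bar" is a declaration, already a function here
    zIsDeclared: true,
    bazIsDeclared: true
  };

  // "z" is declared further down: the name already exists
  // (with the value undefined), so reading it does not throw
  try { z; } catch (e) { seen.zIsDeclared = false; }

  // "baz" is a function expression and never enters the
  // activation object, so reading it throws a ReferenceError
  try { baz; } catch (e) { seen.bazIsDeclared = false; }

  var z = 30;
  function bar() {}   // FD
  (function baz() {}); // FE

  return seen;
}

var seen = foo(10, 20);
console.log(seen);
```

The result reports two arguments, 10 as the first one, 'function' for bar, true for z and false for baz.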

The complete description of this topic, with all its subtle cases (such as “hoisting” of variables and function declarations), may be found in the chapter of the same name, Chapter 2. Variable object.

And we are moving forward to the next section. As is known, in ECMAScript we may use inner functions, and in these inner functions we may refer to variables of parent functions or variables of the global context. Since we called a variable object the scope object of a context, then, similarly to the prototype chain discussed above, there is a so-called scope chain.

A scope chain is a list of objects that are searched for the identifiers that appear in the code of the context.

The rule is again simple and similar to that of a prototype chain: if a variable is not found in its own scope (in its own variable/activation object), the lookup proceeds in the parent’s variable object, and so on.

Regarding contexts, identifiers are the names of variables, function declarations, formal parameters, etc. When a function refers in its code to an identifier that is not a local variable (or a local function or a formal parameter), such a variable is called a free variable. And it is exactly the scope chain that is used to look up these free variables.

In the general case, a scope chain is a list of all those parent variable objects, plus (at the front of the scope chain) the function’s own variable/activation object. However, the scope chain may also contain any other object, e.g. objects dynamically added to the scope chain during the execution of the context, such as with-objects or the special objects of catch-clauses.

When resolving (looking up) an identifier, the scope chain is searched starting from the activation object and then (if the identifier isn’t found in the function’s own activation object) up to the top of the scope chain; to repeat, just as with a prototype chain.

var x = 10;
 
(function foo() {
   var y = 20;
   (function bar() {
     var z = 30;
     // "x" and "y" are "free variables"
     // and are found in the next (after
     // bar's activation object) object
     // of the bar's scope chain
     console.log(x + y + z);
   })();
})();

We may assume that the scope chain objects are linked via an implicit __parent__ property, which refers to the next object in the chain. This approach may be tested in real Rhino code, and exactly this technique is used in ES5 lexical environments (where it is called an outer link). Another representation of a scope chain may be a simple array. Using the __parent__ concept, we may represent the example above with the following figure (the parent variable objects are saved in the [[Scope]] property of a function):

Figure 9. A scope chain.


During code execution, a scope chain may be augmented with the objects of with statements and catch clauses. And since these objects are simple objects, they may have prototypes (and prototype chains). As a result, scope chain lookup is two-dimensional: (1) first a scope chain link is considered, and then (2) on every link of the scope chain, the lookup goes deep into the link’s prototype chain (if the link has a prototype, of course).

For this example:

Object.prototype.x = 10;

var w = 20;
var y = 30;

// in SpiderMonkey the global object,
// i.e. the variable object of the global
// context, inherits from "Object.prototype",
// so we may refer to the "not defined global
// variable x", which is found in
// the prototype chain

console.log(x); // 10

(function foo() {

   // "foo" local variables
   var w = 40;
   var x = 100;

   // "x" is found in the
   // "Object.prototype", because
   // {z: 50} inherits from it

   with ({z: 50}) {
     console.log(w, x, y, z); // 40, 10, 30, 50
   }

   // after the "with" object is removed
   // from the scope chain, "x" is
   // again found in the AO of the "foo" context;
   // variable "w" is also local
   console.log(x, w); // 100, 40

   // and that's how we may refer to
   // the shadowed global "w" variable in
   // the browser host environment
   console.log(window.w); // 20

})();

we have the following structure (that is, before we go to the __parent__ link, the __proto__ chain is considered first):

Figure 10. A "with-augmented" scope chain.


Notice that the global object does not inherit from Object.prototype in all implementations. The behavior shown in the figure (referencing the “non-defined” variable x from the global context) may be tested e.g. in SpiderMonkey.
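The two-dimensional lookup can be modelled with plain objects. In the sketch below everything is hypothetical: resolve is a made-up helper, __parent__ is an ordinary property used to link the scope objects, sharedProto plays the role of Object.prototype from the example, and the activation object is modelled without a prototype, which is an assumption of this model. Object.create is an ES5 function.

```javascript
// Hypothetical resolver: walks the scope chain via "__parent__" and,
// on every link, looks into that link's prototype chain
function resolve(name, scope) {
  for (var link = scope; link !== null; link = link.__parent__) {
    if (name in link) { // "in" also searches the prototype chain of the link
      return link[name];
    }
  }
  throw new ReferenceError(name + ' is not defined');
}

var sharedProto = { x: 10 };

// global variable object: inherits "x" from the shared prototype
var globalVO = Object.create(sharedProto);
globalVO.w = 20;
globalVO.y = 30;
globalVO.__parent__ = null;

// activation object of "foo": own "w" and "x", no prototype
var fooAO = Object.create(null);
fooAO.w = 40;
fooAO.x = 100;
fooAO.__parent__ = globalVO;

// the object added by "with": own "z", inherits "x"
var withObject = Object.create(sharedProto);
withObject.z = 50;
withObject.__parent__ = fooAO;

console.log(
  resolve('w', withObject),
  resolve('x', withObject),
  resolve('y', withObject),
  resolve('z', withObject)
);
console.log(resolve('x', fooAO), resolve('w', fooAO));
```

Inside the modelled with block the names resolve to 40, 10, 30 and 50; starting from the activation object alone, x and w resolve to 100 and 40, matching the comments in the example above.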

As long as all parent variable objects exist, there is nothing special in getting parent data from an inner function: we just traverse the scope chain, resolving (searching for) the needed variable. However, as we mentioned above, after a context ends, all its state and the context itself are destroyed. At the same time, an inner function may be returned from the parent function. Moreover, this returned function may later be activated from another context. What happens with such an activation if the context of some free variable is already “gone”? In the general theory, the concept that helps to solve this issue is called a (lexical) closure, which in ECMAScript is directly related to the scope chain concept.

In ECMAScript, functions are first-class objects. This term means that functions may be passed as arguments to other functions (in which case they are called “funargs”, short for “functional arguments”). Functions that receive “funargs” are called higher-order functions or, closer to mathematics, operators. Functions may also be returned from other functions. Functions that return other functions are called function-valued functions (or functions with a functional value).

There are two conceptual problems related to “funargs” and “functional values”. These two sub-problems are generalized into one, which is called the “funarg problem” (or “the problem of a functional argument”). And it was exactly to solve the complete “funarg problem” that the concept of closures was invented. Let’s describe these two sub-problems in more detail (we’ll see that both of them are solved in ECMAScript using the [[Scope]] property of a function, mentioned in the figures).

The first subtype of the “funarg problem” is the “upward funarg problem”. It appears when a function is returned “up” (to the outside) from another function and uses the free variables already mentioned above. To be able to access variables of the parent context even after the parent context ends, the inner function saves the parent’s scope chain in its [[Scope]] property at the moment of creation. Then, when the function is activated, the scope chain of its context is formed as the combination of the activation object and this [[Scope]] property (which is what we have just seen in the figures above):

Scope chain = Activation object + [[Scope]]

Notice again the main point: a function saves the parent’s scope chain exactly at the moment of creation, because it is this saved scope chain that will be used for variable lookup in all further calls of the function.

function foo() {
   var x = 10;
   return function bar() {
     console.log(x);
   };
}
 
// "foo" returns also a function
// and this returned function uses
// free variable "x"
 
var returnedFunction = foo();
 
// global variable "x"
var x = 20;
 
// execution of the returned function
returnedFunction(); // 10, but not 20

This style of scope is called static (or lexical) scope. We see that the variable x is found in the saved [[Scope]] of the returned bar function. In the general theory there is also dynamic scope, under which the variable x in the example above would be resolved as 20, not 10. However, dynamic scope is not used in ECMAScript.

The second part of the “funarg problem” is the “downward funarg problem”. In this case the parent context may exist, but there may be an ambiguity in resolving an identifier. The problem is: from which scope should the value of an identifier be taken, the one statically saved at the function’s creation or the one dynamically formed at execution (i.e. the scope of the caller)? To avoid this ambiguity and to form a closure, static scope is used:

// global "x"
var x = 10;
 
// global function
function foo() {
   console.log(x);
}
 
(function (funArg) {
 
   // local "x"
   var x = 20;
 
   // there is no ambiguity,
   // because we use global "x",
   // which was statically saved in
   // [[Scope]] of the "foo" function,
   // but not the "x" of the caller's scope,
   // which activates the "funArg"
 
   funArg(); // 10, but not 20
 
})(foo); // pass "down" foo as a "funarg"

We may conclude that static scope is an obligatory requirement for having closures in a language. However, some languages provide a combination of dynamic and static scope, allowing the programmer to choose what to close over and what not. Since in ECMAScript only static scope is used (i.e. we have solutions for both subtypes of the “funarg problem”), the conclusion is: ECMAScript has complete support for closures, which technically are implemented using the [[Scope]] property of functions. Now we may give a correct definition of a closure:

A closure is a combination of a code block (in ECMAScript this is a function) and all the parent scopes, saved statically/lexically. Via these saved scopes a function may easily refer to free variables.

Notice that since every (normal) function saves [[Scope]] at creation, theoretically all functions in ECMAScript are closures.

Another important thing to note is that several functions may have the same parent scope (this is quite a normal situation, e.g. when we have two inner/global functions). In this case the variables stored in the [[Scope]] property are shared between all functions that have the same parent scope chain. Changes to variables made by one closure are visible when these variables are read in another closure:

function baz() {
   var x = 1;
   return {
     foo: function foo() { return ++x; },
     bar: function bar() { return --x; }
   };
}
 
var closures = baz();
 
console.log(
   closures.foo(), // 2
   closures.bar()  // 1
);

This code may be illustrated with the following figure:

Figure 11. A shared [[Scope]].


The confusion around creating several functions in a loop is related to exactly this feature. When using a loop counter inside the created functions, some programmers get unexpected results: all the functions see the same value of the counter. Now it should be clear why: all these functions have the same [[Scope]], in which the loop counter holds the last assigned value.

var data = [];
 
for (var k = 0; k < 3; k++) {
   data[k] = function () {
     alert(k);
   };
}
 
data[0](); // 3, but not 0
data[1](); // 3, but not 1
data[2](); // 3, but not 2

There are several techniques that may solve this issue. One of them is to provide an additional object in the scope chain, e.g. by using an additional function:

var data = [];
 
for (var k = 0; k < 3; k++) {
   data[k] = (function (x) {
     return function () {
       alert(x);
     };
   })(k); // pass "k" value
}
 
// now it is correct
data[0](); // 0
data[1](); // 1
data[2](); // 2
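Another technique exists outside of ES3, which is what this note covers: in engines that support ES2015, a let declaration in the loop header gives every iteration its own binding of the counter, so no wrapper function is needed. The sketch below assumes such an engine; the array is named perIteration here only to keep it apart from the examples above, and the functions return the counter instead of calling alert so that the code also runs outside a browser.

```javascript
var perIteration = [];

// "let" creates a fresh binding of "k" for every iteration,
// so each function closes over its own copy of the counter
for (let k = 0; k < 3; k++) {
  perIteration[k] = function () {
    return k;
  };
}

console.log(perIteration[0](), perIteration[1](), perIteration[2]()); // 0 1 2
```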

Those who are interested in a deeper look at the theory of closures and their practical application may find additional information in Chapter 6. Closures. And to get more information about the scope chain, take a look at the chapter of the same name, Chapter 4. Scope chain.

And we’re moving to the next section, considering the last property of an execution context. This is the concept of the this value.

A this value is a special object related to the execution context. Therefore, it may be called a context object (i.e. the object in whose context the execution context is activated).

Any object may be used as the this value of a context. I’d like to clear up once more a misconception that sometimes appears in descriptions of ECMAScript execution contexts, and of the this value in particular. Often a this value is, incorrectly, described as a property of the variable object. A recent example of this mistake was in this book (though the mentioned chapter of the book is quite good). Remember once again:

a this value is a property of the execution context, but not a property of the variable object.

This feature is very important because, in contrast to variables, the this value never participates in the identifier resolution process. I.e. when this is accessed in code, its value is taken directly from the execution context, without any scope chain lookup. The value of this is determined only once, on entering the context.

By the way, in contrast with ECMAScript, e.g. Python has the self argument of methods as a simple variable, which is resolved like any other variable and may even be changed to another value during execution. In ECMAScript it is not possible to assign a new value to this because, to repeat, it’s not a variable and is not placed in the variable object.
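That this cannot be assigned is easy to confirm. In the sketch below the offending code is built from a string with the Function constructor, so that the rest of the script still parses; the variable name assignmentRejected is made up. The exact error type is not checked, since it may depend on the engine.

```javascript
var assignmentRejected = false;

try {
  // an attempt to assign to "this"; the function is compiled
  // from a string and then called, so the error surfaces here
  new Function('this = 1;')();
} catch (error) {
  assignmentRejected = true;
}

console.log(assignmentRejected);
```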

In the global context, the this value is the global object itself (which means that here the this value equals the variable object):

var x = 10;
 
console.log(
   x, // 10
   this.x, // 10
   window.x // 10
);

In the case of a function context, the this value may be different in every single function call. Here the this value is provided by the caller via the form of the call expression (i.e. the way the function is activated). For example, the function foo below is a callee, called from the global context, which is the caller. Let’s see in the example how, for the same function code, the this value is provided differently by the caller in different calls (different ways of activating the function):

// the code of the "foo" function
// never changes, but the "this" value
// differs in every activation
 
function foo() {
   alert(this);
}
 
// caller activates "foo" (callee) and
// provides "this" for the callee
 
foo(); // global object
foo.prototype.constructor(); // foo.prototype
 
var bar = {
   baz: foo
};
 
bar.baz(); // bar
 
(bar.baz)(); // also bar
(bar.baz = bar.baz)(); // but here is global object
(bar.baz, bar.baz)(); // also global object
(false || bar.baz)(); // also global object
 
var otherFoo = bar.baz;
otherFoo(); // again global object
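Besides the form of the call expression, a caller may also supply the this value explicitly with the standard call and apply methods of functions, which the example above does not use. A small sketch follows; the names whoAmI, first and second are made up for illustration.

```javascript
function whoAmI() {
  return this;
}

var first = { name: 'first', whoAmI: whoAmI };
var second = { name: 'second' };

// property access form: "this" is the object before the dot
var viaProperty = first.whoAmI();

// explicit forms: "this" is the first argument of call/apply
var viaCall = whoAmI.call(second);
var viaApply = whoAmI.apply(second, []);

// the explicit value wins even when the function
// is reached through another object's property
var viaBoth = first.whoAmI.call(second);

console.log(
  viaProperty === first,
  viaCall === second,
  viaApply === second,
  viaBoth === second
);
```

All four comparisons hold: the code of whoAmI never changes, only the way it is activated.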

To understand in depth why (and, more essentially, how) the this value may change in every function call, you may read Chapter 3. This, where all the cases mentioned above are discussed in detail.

At this step we finish this brief overview, though it turned out to be not so “brief” ;) However, a full explanation of all these topics would require a complete book. We didn’t touch on two major topics, though: functions (and the difference between some types of functions, e.g. function declaration and function expression) and the evaluation strategy used in ECMAScript. Both topics may be found in the corresponding chapters of the ES3 series: Chapter 5. Functions and Chapter 8. Evaluation strategy.

If you have comments, questions or additions, I’ll be glad to discuss them in comments.

Good luck in studying ECMAScript!

Written by: Dmitry A. Soshnikov
Published on: 2010-09-02
