1、AST
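An abstract syntax tree (AST) is a tree representation of the abstract syntactic structure of source code. Each node in the tree represents a construct that occurs in the source. The tree is called abstract because it does not record every detail of the concrete syntax: nested parentheses, for example, are implied by the shape of the tree rather than appearing as nodes of their own.
Once source code has been converted into an AST, the tree can be analyzed and manipulated in many ways, some of them surprising, and those operations are the basis of a wide range of tools. Understanding compiler principles opens up a lot of possibilities.
An introduction to a minimal compiler demo written in JavaScript: the-super-tiny-compiler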
- Most compilers break down into three primary stages: Parsing, Transformation, and Code Generation.
(1) Parsing is taking raw code and turning it into a more abstract representation of the code.
(2) Transformation takes this abstract representation and manipulates it to do whatever the compiler wants it to.
(3) Code Generation takes the transformed representation of the code and turns it into new code.
Parsing
Parsing typically gets broken down into two phases: Lexical Analysis and Syntactic Analysis.
(1) Lexical Analysis takes the raw code and splits it apart into these things called tokens by a thing called a tokenizer (or lexer).
(2) Syntactic Analysis takes the tokens and reformats them into a representation that describes each part of the syntax and their relation to one another. This is known as an intermediate representation or Abstract Syntax Tree.
An Abstract Syntax Tree, or AST for short, is a deeply nested object that represents code in a way that is both easy to work with and tells us a lot of information.
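The two parsing phases can be sketched in a few dozen lines. This is a minimal sketch, not the demo project's actual source: it assumes a Lisp-style input such as (add 2 (subtract 4 2)), and the function names tokenizer and parser, the token shapes, and the error messages are all illustrative choices.

```javascript
// Assumed input syntax: Lisp-style calls such as (add 2 (subtract 4 2)).

// Lexical analysis: split the raw code into a flat list of tokens.
function tokenizer(input) {
  const tokens = [];
  let current = 0;
  while (current < input.length) {
    const char = input[current];
    if (char === "(" || char === ")") {
      tokens.push({ type: "paren", value: char });
      current++;
      continue;
    }
    if (/\s/.test(char)) {
      current++; // whitespace only separates tokens
      continue;
    }
    if (/[0-9]/.test(char)) {
      let value = "";
      while (current < input.length && /[0-9]/.test(input[current])) {
        value += input[current++];
      }
      tokens.push({ type: "number", value });
      continue;
    }
    if (/[a-z]/i.test(char)) {
      let value = "";
      while (current < input.length && /[a-z]/i.test(input[current])) {
        value += input[current++];
      }
      tokens.push({ type: "name", value });
      continue;
    }
    throw new TypeError("Unexpected character: " + char);
  }
  return tokens;
}

// Syntactic analysis: turn the token list into a nested AST.
function parser(tokens) {
  let current = 0;
  function walk() {
    let token = tokens[current];
    if (token === undefined) throw new SyntaxError("Unexpected end of input");
    if (token.type === "number") {
      current++;
      return { type: "NumberLiteral", value: token.value };
    }
    if (token.type === "paren" && token.value === "(") {
      token = tokens[++current]; // the function name follows "("
      if (!token || token.type !== "name") {
        throw new SyntaxError("Expected a function name");
      }
      const node = { type: "CallExpression", name: token.value, params: [] };
      token = tokens[++current];
      // Collect parameters until the matching ")" is reached.
      while (token && !(token.type === "paren" && token.value === ")")) {
        node.params.push(walk());
        token = tokens[current];
      }
      if (!token) throw new SyntaxError("Missing closing parenthesis");
      current++; // skip ")"
      return node;
    }
    throw new SyntaxError("Unexpected token: " + token.value);
  }
  const ast = { type: "Program", body: [] };
  while (current < tokens.length) ast.body.push(walk());
  return ast;
}

const ast = parser(tokenizer("(add 2 (subtract 4 2))"));
console.log(JSON.stringify(ast, null, 2));
```

Keeping the tokenizer and the parser separate means the parser only ever deals with a small, fixed set of token types instead of raw characters.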
Transformation
- The next type of stage for a compiler is transformation. Again, this just takes the AST from the last step and makes changes to it. It can manipulate the AST in the same language or it can translate it into an entirely new language.
- When transforming the AST we can manipulate nodes by adding/removing/replacing properties, we can add new nodes, remove nodes, or we could leave the existing AST alone and create an entirely new one based on it.
- Since we’re targeting a new language, we’re going to focus on creating an entirely new AST that is specific to the target language.
AST tree demo:
{
  type: 'Program',
  body: [{
    type: 'CallExpression',
    name: 'add',
    params: [{
      type: 'NumberLiteral',
      value: '2'
    }, {
      type: 'CallExpression',
      name: 'subtract',
      params: [{
        type: 'NumberLiteral',
        value: '4'
      }, {
        type: 'NumberLiteral',
        value: '2'
      }]
    }]
  }]
}
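As a rough illustration of the "create an entirely new AST" approach described in the Transformation notes, the sketch below walks the demo tree and builds a second tree in a more C-like shape. The target node shapes (ExpressionStatement, callee, arguments) and the function name transformNode are assumptions made for this example, not something the notes prescribe.

```javascript
// The demo AST from above, written as a plain object literal.
const lispAst = {
  type: "Program",
  body: [{
    type: "CallExpression",
    name: "add",
    params: [
      { type: "NumberLiteral", value: "2" },
      {
        type: "CallExpression",
        name: "subtract",
        params: [
          { type: "NumberLiteral", value: "4" },
          { type: "NumberLiteral", value: "2" },
        ],
      },
    ],
  }],
};

// Build a new tree instead of mutating the old one (hypothetical target shape).
function transformNode(node) {
  switch (node.type) {
    case "Program":
      // Each top-level expression becomes a statement in the new tree.
      return {
        type: "Program",
        body: node.body.map((child) => ({
          type: "ExpressionStatement",
          expression: transformNode(child),
        })),
      };
    case "CallExpression":
      // "name" and "params" are renamed to "callee" and "arguments".
      return {
        type: "CallExpression",
        callee: { type: "Identifier", name: node.name },
        arguments: node.params.map((param) => transformNode(param)),
      };
    case "NumberLiteral":
      return { type: "NumberLiteral", value: node.value };
    default:
      throw new TypeError("Unknown node type: " + node.type);
  }
}

const newAst = transformNode(lispAst);
console.log(JSON.stringify(newAst, null, 2));
```

Because transformNode returns fresh objects at every level, the original tree stays untouched and can still be inspected or transformed again.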
Code Generation
- The final phase of a compiler is code generation. Sometimes compilers will do things that overlap with transformation, but for the most part code generation just means take our AST and string-ify code back out.
- Code generators work several different ways, some compilers will reuse the tokens from earlier, others will have created a separate representation of the code so that they can print nodes linearly, but from what I can tell most will use the same AST we just created, which is what we’re going to focus on.
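A code generator that prints straight from the AST can be sketched as one recursive function. This is only an illustration: it assumes the node shapes from the AST demo above (Program, CallExpression with name and params, NumberLiteral) and assumes the target is C-like call syntax with a trailing semicolon; the name codeGenerator is illustrative.

```javascript
// Recursively turn each node into a string of target code.
function codeGenerator(node) {
  switch (node.type) {
    case "Program":
      // One statement per line, each terminated by a semicolon.
      return node.body.map((child) => codeGenerator(child) + ";").join("\n");
    case "CallExpression":
      // name(arg1, arg2, ...)
      return (
        node.name +
        "(" +
        node.params.map((param) => codeGenerator(param)).join(", ") +
        ")"
      );
    case "NumberLiteral":
      return node.value;
    default:
      throw new TypeError("Unknown node type: " + node.type);
  }
}

const demoAst = {
  type: "Program",
  body: [{
    type: "CallExpression",
    name: "add",
    params: [
      { type: "NumberLiteral", value: "2" },
      {
        type: "CallExpression",
        name: "subtract",
        params: [
          { type: "NumberLiteral", value: "4" },
          { type: "NumberLiteral", value: "2" },
        ],
      },
    ],
  }],
};

const output = codeGenerator(demoAst);
console.log(output); // add(2, subtract(4, 2));
```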
2、Common JavaScript parsers:
Acorn: A tiny, fast JavaScript parser written in JavaScript.
Babylon: A JavaScript parser used in Babel, heavily based on acorn and acorn-jsx (since renamed @babel/parser).
About the Acorn library:
(1) Interface:
parse(input, options) is the main interface to the library. The return value will be an abstract syntax tree object as specified by the ESTree spec.
let acorn = require("acorn")
// JSON.stringify prints the whole tree in the JSON form shown below
console.log(JSON.stringify(acorn.parse("let a = 3", {ecmaVersion: 2020}), null, 2))
Output:
{
  "type": "Program",
  "start": 0,
  "end": 9,
  "body": [
    {
      "type": "VariableDeclaration",
      "start": 0,
      "end": 9,
      "declarations": [
        {
          "type": "VariableDeclarator",
          "start": 4,
          "end": 9,
          "id": {
            "type": "Identifier",
            "start": 4,
            "end": 5,
            "name": "a"
          },
          "init": {
            "type": "Literal",
            "start": 8,
            "end": 9,
            "value": 3,
            "raw": "3"
          }
        }
      ],
      "kind": "let"
    }
  ],
  "sourceType": "script"
}
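To get a feel for how such a tree is consumed, the sketch below walks the output above and prints the slice of source text each node covers. It is a generic recursive walker written for illustration, not part of Acorn's API. The tree is pasted in as a plain object so the example runs without installing anything, and it assumes the parsed source was let a = 3, which is consistent with the start/end offsets shown.

```javascript
// The ESTree-shaped output from above, as a plain object.
const tree = {
  type: "Program", start: 0, end: 9,
  body: [{
    type: "VariableDeclaration", start: 0, end: 9,
    declarations: [{
      type: "VariableDeclarator", start: 4, end: 9,
      id: { type: "Identifier", start: 4, end: 5, name: "a" },
      init: { type: "Literal", start: 8, end: 9, value: 3, raw: "3" },
    }],
    kind: "let",
  }],
  sourceType: "script",
};

// Visit every object that has a string "type" property, parents first.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
    return;
  }
  if (node === null || typeof node !== "object") return;
  if (typeof node.type === "string") visit(node);
  for (const key of Object.keys(node)) walk(node[key], visit);
}

const source = "let a = 3"; // assumed input, matches the offsets above
const visited = [];
walk(tree, (node) => {
  visited.push(node.type + ": " + source.slice(node.start, node.end));
});
console.log(visited.join("\n"));
```

The start and end fields are character offsets into the source, which is what lets a tool map any node back to the exact text it came from.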
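(2) Acorn source code
Get familiar with the complete lists of JS keywords and reserved words below; they give a good overview of everything the language provides.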
// Reserved word lists for various dialects of the language
var reservedWords = {
  3: "abstract boolean byte char class double enum export extends final float goto implements import int interface long native package private protected public short static super synchronized throws transient volatile",
  5: "class enum extends super const export import",
  6: "enum",
  strict: "implements interface let package private protected public static yield",
  strictBind: "eval arguments"
};
// And the keywords
var ecma5AndLessKeywords = "break case catch continue debugger default do else finally for function if return switch throw try var while with null true false instanceof typeof void delete new in this";
var keywords = {
  5: ecma5AndLessKeywords,
  "5module": ecma5AndLessKeywords + " export import",
  6: ecma5AndLessKeywords + " const class extends export import super"
};
var keywordRelationalOperator = /^in(stanceof)?$/;
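One way to put space-separated word lists like these to work is to compile each list into an anchored regular expression and test identifiers against it. The sketch below is an assumption about usage rather than a copy of Acorn's internals: the helper name makeWordMatcher is hypothetical, and only a few of the lists above are repeated so that the example runs on its own.

```javascript
// Excerpts of the lists above, repeated so the example is self-contained.
const reservedWords = {
  5: "class enum extends super const export import",
  strict: "implements interface let package private protected public static yield",
};
const ecma5AndLessKeywords =
  "break case catch continue debugger default do else finally for function if " +
  "return switch throw try var while with null true false instanceof typeof " +
  "void delete new in this";
const keywordRelationalOperator = /^in(stanceof)?$/;

// Hypothetical helper: "a b c" becomes /^(?:a|b|c)$/.
function makeWordMatcher(words) {
  return new RegExp("^(?:" + words.replace(/ /g, "|") + ")$");
}

const isEs5Keyword = makeWordMatcher(ecma5AndLessKeywords);
const isEs5Reserved = makeWordMatcher(reservedWords[5]);
const isStrictReserved = makeWordMatcher(reservedWords.strict);

console.log(isEs5Keyword.test("typeof"));                   // true
console.log(isEs5Keyword.test("class"));                    // false: not a keyword in ES5...
console.log(isEs5Reserved.test("class"));                   // true: ...but reserved for future use
console.log(isStrictReserved.test("let"));                  // true
console.log(keywordRelationalOperator.test("instanceof"));  // true
console.log(keywordRelationalOperator.test("typeof"));      // false
```

The ^ and $ anchors matter here: without them, an ordinary identifier such as "index" would match because it contains "in".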