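Monarch is the syntax highlighting library bundled with Monaco Editor. It lets you implement syntax highlighting for a custom language using a JSON-like syntax. This article introduces Monarch by writing a simple custom highlighter for MIPS assembly.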
1. Initialization
First, we need to define a language; here we name it asm.
// Register a new language
monaco.languages.register({ id: "asm" }); // ignoreCase belongs to the Monarch definition, not to ILanguageExtensionPoint
The official Monaco documentation describes it as follows:
### register
register(language: ILanguageExtensionPoint): void
Defined in monaco.d.ts:4659
Register information about a new language.
#### Parameters
* language: ILanguageExtensionPoint
#### Returns void
Here ILanguageExtensionPoint is an object of the following shape:
{
aliases?: string[],
configuration?: Uri,
extensions?: string[], // source file extensions
filenamePatterns?: string[],
filenames?: string[],
firstLine?: string,
id: string, // name (id) of the language
mimetypes?: string[]
}
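As a sketch of how the optional fields might be used, the helper below registers asm together with file extensions and aliases. The `.s` and `.asm` extensions, the aliases, and the helper name `registerAsmLanguage` are illustrative assumptions and not part of the original setup; the `monacoApi` parameter stands for the global `monaco` object.

```javascript
// Hypothetical helper: registers the "asm" language on a Monaco API object.
// The extensions and aliases below are assumptions chosen for illustration.
function registerAsmLanguage(monacoApi) {
  const language = {
    id: "asm",                          // language id used by all later calls
    extensions: [".s", ".asm"],         // assumed source file extensions
    aliases: ["MIPS Assembly", "asm"],  // assumed display names
  };
  monacoApi.languages.register(language);
  return language;
}

// In the browser: registerAsmLanguage(monaco);
```

Passing the API object as a parameter keeps the helper usable in any page that already loads Monaco.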
2. Monarch Tokens Provider
Next, we need to register a tokens provider for the language. Here we make the language case-sensitive and give it a tokenizer.
// Register a tokens provider for the language
monaco.languages.setMonarchTokensProvider("asm", {
ignoreCase: false,
tokenizer: {...}
});
Tokenizer
The official documentation describes it as follows:
(object with states) This defines the tokenization rules. The tokenizer attribute describes how lexical analysis takes place, and how the input is divided into tokens. Each token is given a CSS class name which is used to render each token in the editor.
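In other words, it is the set of rules that turns source code into tokens (keywords, strings, comments). Concretely, tokenizer describes a set of states and their rules, and can be viewed as a lexing state machine: each rule specifies what to match in that state, the action to take, and the next state (next).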
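To make the state-machine idea concrete, here is a toy interpreter for rules in the [regex, token, next] form. It is a simplified sketch, not Monarch's actual implementation: it supports only plain token names, "@state" pushes and "@pop", and the names `toyRules` and `tokenizeLine` are hypothetical.

```javascript
// Toy rule set: each state maps to a list of [regex, token, next?] rules.
const toyRules = {
  root: [
    [/#.*$/, "comment"],
    [/"/, "string.quote", "@string"],   // enter the string state
    [/\$\w+/, "variable.register"],
    [/[a-z][\w.]*/, "identifier"],
    [/\s+/, "white"],
  ],
  string: [
    [/[^"]+/, "string"],
    [/"/, "string.quote", "@pop"],      // leave the string state
  ],
};

// Tokenize one line, trying the current state's rules in order at each position.
function tokenizeLine(rules, line) {
  const stack = ["root"];
  const tokens = [];
  let pos = 0;
  while (pos < line.length) {
    const state = stack[stack.length - 1];
    const rest = line.slice(pos);
    let matched = false;
    for (const [regex, token, next] of rules[state]) {
      // Anchor the rule at the current position, as a lexer would.
      const m = new RegExp("^(?:" + regex.source + ")", regex.flags).exec(rest);
      if (!m || m[0].length === 0) continue;
      tokens.push({ text: m[0], token });
      pos += m[0].length;
      if (next === "@pop") {
        if (stack.length > 1) stack.pop();   // never pop the root state
      } else if (next) {
        stack.push(next.slice(1));           // "@string" -> "string"
      }
      matched = true;
      break;
    }
    if (!matched) {
      // No rule applies: emit one character as plain source and move on.
      tokens.push({ text: line[pos], token: "source" });
      pos += 1;
    }
  }
  return tokens;
}
```

Calling `tokenizeLine(toyRules, 'li $t0, "hi" # c')` walks from root into string at the opening quote and back to root at the closing quote, which is the same push and pop behaviour the real rules express through next.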
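https://microsoft.github.io/monaco-editor/monarch.html contains many samples, so this article does not explain what each configuration option means and goes straight to the tokenizer for the asm language.
The final result is as follows: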
{
storage_type_kw: /\.(ascii|asciiz|byte|data|double|float|half|kdata|ktext|space|text|word|set\s*(noat|at|noreorder|reorder))\b/,
function_normal: ["abs.d", "abs.s", "add", "add.d", "add.s", ..., "xor", "xori"],
function_pseudo: ["mul", "abs", "div", "divu", ..., "sd", "ush", "usw", "move", "mfc1.d", "l.d", "l.s", "s.d", "s.s"],
tokenizer: {
root: [
[/^\s*?/, "line.line", "@line_pre"],
{ include: "@normal" }
],
normal: [
[/#.*$/, "comment", "@popall"],
[/"/, { token: "string.quote", bracket: "@open", next: "@string" }],
[/[\w\.\-]+/, {
cases: {
"-?\\d+": { token: "number", next: "@popall" },
"-?\\d+\\.\\d+": { token: "number.float", next: "@popall" },
"0[xX]([0-9a-fA-F]*)": { token: "number.hex", next: "@popall" },
"0[bB]([01]*)": { token: "number.binary", next: "@popall" },
"@default": { token: "source", next: "@popall" },
"@eos": { token: "line.line", next: "@popall" }
}
}],
{ include: "@register" }
],
line_pre: [
[/([a-zA-Z_]\w*):/, "tag.label.$1", "@line_fun"],
{ include: "@line_fun" },
{ include: "@normal" },
],
line_fun: [
[/[a-z][\w\.]*/, {
cases: {
"@function_normal": { token: "function.normal.$0", next: "@popall" },
"@function_pseudo": { token: "function.pseudo.$0", next: "@popall" },
"@default": { token: "source", next: "@popall" },
"@eos": { token: "line.line", next: "@popall" }
}
}],
[/@storage_type_kw/, "constructor.storage.type", "@popall"],
[/\.(align|extern|globl)\b/, "constructor.storage.modifier", "@popall"],
{ include: "@normal" },
],
register: [
[/(\$)(0|[2-9]|1[0-9]|2[0-589]|3[0-1])\b/, "variable.register.by-number", "@popall"],
[/(\$)(zero|v[01]|a[0-3]|t[0-9]|s[0-7]|gp|sp|fp|ra)\b/, "variable.register.by-name", "@popall"],
[/(\$)(at|k[01]|1|2[67])\b/, "variable.register.reserved", "@popall"],
[/(\$)f([0-9]|1[0-9]|2[0-9]|3[0-1])\b/, "variable.register.floating-point", "@popall"]
],
string: [
[/[^\\"&]+/, "string"],
{ include: "@string_common" },
[/"/, { token: 'string.quote', bracket: '@close', next: '@popall' }]
],
string_common: [
[/\\[rnt\\']/, "string.escape"],
[/&\w+;/, 'string.escape'],
[/[\\&]/, 'string']
]
}
}
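The register rules are ordinary regular expressions, so they can be sanity-checked outside the editor. The sketch below copies the four patterns from the register state and reports which token class a register operand falls into. The helper name `classifyRegister` is hypothetical, and the leading `^` is added here only for the standalone check.

```javascript
// Patterns copied from the "register" state, in the same order.
// The leading "^" is an addition for this standalone check.
const registerClasses = [
  ["variable.register.by-number", /^(\$)(0|[2-9]|1[0-9]|2[0-589]|3[0-1])\b/],
  ["variable.register.by-name", /^(\$)(zero|v[01]|a[0-3]|t[0-9]|s[0-7]|gp|sp|fp|ra)\b/],
  ["variable.register.reserved", /^(\$)(at|k[01]|1|2[67])\b/],
  ["variable.register.floating-point", /^(\$)f([0-9]|1[0-9]|2[0-9]|3[0-1])\b/],
];

// Returns the token name of the first matching pattern, or null if none match.
function classifyRegister(text) {
  for (const [token, pattern] of registerClasses) {
    if (pattern.test(text)) return token;
  }
  return null;
}

// Example: log the class of a few operands.
for (const reg of ["$8", "$t0", "$at", "$26", "$f12", "$32"]) {
  console.log(reg, classifyRegister(reg));
}
```

Because the by-number pattern deliberately skips 1, 26 and 27, those fall through to the reserved class, while an out-of-range operand such as `$32` matches nothing.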
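The entry point of the rules is tokenizer.root. The keyword tables sit at the same level as tokenizer, and the children of tokenizer are the rule tables, one per state.
include pulls in the rules of another state under tokenizer, for example:
root: [ { include: "@normal" } ]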
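One way to picture include is as inlining: the included state's rules are spliced in at that position. The helper below sketches that idea for illustration only and is not how Monarch is implemented internally; `flattenState` is a hypothetical name, and the small `includeDemo` table is a made-up stand-in for the real tokenizer.

```javascript
// Hypothetical helper: returns the rules of `stateName` with every
// { include: "@other" } entry replaced by that state's flattened rules.
function flattenState(tokenizer, stateName, active = new Set()) {
  if (active.has(stateName)) return [];   // guard against include cycles
  active.add(stateName);
  const flat = [];
  for (const rule of tokenizer[stateName]) {
    if (rule && typeof rule.include === "string") {
      const target = rule.include.replace(/^@/, "");
      flat.push(...flattenState(tokenizer, target, active));
    } else {
      flat.push(rule);
    }
  }
  active.delete(stateName);               // allow the state to be included again elsewhere
  return flat;
}

// Made-up miniature tokenizer with the same include structure.
const includeDemo = {
  root: [[/^\s+/, "white"], { include: "@normal" }],
  normal: [[/#.*$/, "comment"], { include: "@register" }],
  register: [[/\$\w+/, "variable.register"]],
};

// Three plain rules, in the order they would be tried.
const flatRoot = flattenState(includeDemo, "root");
console.log(flatRoot.map((rule) => rule[1]));
```

Seen this way, the order of include entries matters, because rules are tried from top to bottom.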
Inspecting Tokens
Monaco provides an Inspect Tokens tool in browsers to help identify the tokens parsed from source code.
To activate:
- Press F1 while focused on a Monaco instance (or right-click and choose Command Palette).
- Trigger the Developer: Inspect Tokens option.
This will show a display over the currently selected token for its language, token type, basic font style and colors, and selector you can target in your editor themes.
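The same information can also be read programmatically. As a sketch, `monaco.editor.tokenize(text, languageId)` returns one array of tokens per line, each carrying an `offset` and a `type`. The helper name `describeTokens` and the shape of its result are assumptions of this example, and `monacoApi` stands for the global `monaco` object.

```javascript
// Hypothetical helper: pairs every token with the text it covers.
function describeTokens(monacoApi, source, languageId) {
  const lines = source.split(/\r?\n/);
  const tokenLines = monacoApi.editor.tokenize(source, languageId);
  const out = [];
  tokenLines.forEach((tokens, lineIndex) => {
    const text = lines[lineIndex] || "";
    tokens.forEach((tok, i) => {
      // A token runs until the next token's offset, or to the end of the line.
      const end = i + 1 < tokens.length ? tokens[i + 1].offset : text.length;
      out.push({
        line: lineIndex + 1,
        text: text.slice(tok.offset, end),
        type: tok.type,
      });
    });
  });
  return out;
}

// In the browser, after the tokens provider is registered:
// console.table(describeTokens(monaco, "li $t0, 1 # load", "asm"));
```

This can be handy for checking a rule change against a handful of sample lines without clicking through the inspector.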
3. Theme
4. Completion Item Provider
[To be continued]