[Monarch] Notes on Developing asm/MIPS Syntax Highlighting - Monaco Editor

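Monarch is the syntax highlighting library that ships with Monaco Editor. It lets you implement syntax highlighting for a custom language with a declarative, JSON-like definition. This post introduces Monarch by walking through a simple syntax highlighter for MIPS assembly.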

1. Initialization

First, define the language. Here we give it the id `asm`:

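```javascript
// Register a new language. Case sensitivity is not set here:
// ILanguageExtensionPoint has no ignoreCase field; it belongs to the Monarch definition.
monaco.languages.register({ id: "asm" });
```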

The official Monaco API documentation describes it as follows:

### register
register(language: ILanguageExtensionPoint): void

Defined in monaco.d.ts:4659
Register information about a new language.

#### Parameters
* language: ILanguageExtensionPoint

#### Returns void

where `ILanguageExtensionPoint` is the following object:

```typescript
interface ILanguageExtensionPoint {
    aliases?: string[];
    configuration?: Uri;
    extensions?: string[]; // source file extensions
    filenamePatterns?: string[];
    filenames?: string[];
    firstLine?: string;
    id: string; // the language id
    mimetypes?: string[];
}
```
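The extension point can carry more than the id. The following sketch shows one possible registration for MIPS sources. The `.asm`/`.s` extensions and the alias names are illustrative assumptions, not part of the original setup. The Monaco call is guarded so the snippet also runs where Monaco is not loaded.

```javascript
// Illustrative ILanguageExtensionPoint for MIPS assembly.
// The extensions and aliases are assumptions for this sketch.
const asmLanguage = {
  id: "asm",                         // the id used by every later API call
  aliases: ["MIPS Assembly", "asm"], // human-readable names
  extensions: [".asm", ".s"],        // files with these extensions map to "asm"
};

// Only call into Monaco when it is actually loaded (e.g. in the browser page).
if (typeof monaco !== "undefined") {
  monaco.languages.register(asmLanguage);
}
```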

2. Monarch Tokens Provider

Next, register a Monarch tokens provider for the language. Here we make the language case-sensitive (`ignoreCase: false`) and give it a `tokenizer`:

```javascript
// Register a tokens provider for the language
monaco.languages.setMonarchTokensProvider("asm", {
    ignoreCase: false, // the language is case-sensitive
    tokenizer: { /* ... */ }
});
```

Tokenizer

The official documentation describes it as follows:

(object with states) This defines the tokenization rules. The tokenizer attribute describes how lexical analysis takes place, and how the input is divided into tokens. Each token is given a CSS class name which is used to render each token in the editor.

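In other words, the tokenizer holds the rules that turn source code into tokens such as keywords, strings, and comments. It describes a set of states and their rules, and you can view it as a state machine for lexical analysis. Each rule in a state specifies what to match, the action to take (the token to emit), and optionally the next state (`next`).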
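The following toy line tokenizer in plain JavaScript makes the state-machine view concrete. It is a deliberately simplified model, not Monaco's implementation. Each state is an ordered list of `[regex, token, next]` rules, and the first rule that matches at the current position wins. `next` either pushes a state (`@string`), pops back (`@pop`), or returns to the start state (`@popall`). The names `toyTokenizer` and `tokenizeLine` are hypothetical.

```javascript
// Toy tokenizer: a simplified model of Monarch's state machine, not Monaco's code.
// Each state is an ordered list of [regex, token, next?] rules.
const toyTokenizer = {
  root: [
    [/#.*$/, "comment"],
    [/"/, "string.quote", "@string"], // enter the "string" state
    [/\$\w+/, "variable.register"],
    [/-?\d+/, "number"],
    [/[a-z][\w.]*/, "source"],
    [/\s+/, "white"],
    [/./, "delimiter"],
  ],
  string: [
    [/[^"]+/, "string"],
    [/"/, "string.quote", "@pop"], // leave the "string" state
  ],
};

function tokenizeLine(line, rules) {
  const stack = ["root"]; // the current state is the top of the stack
  const tokens = [];
  let pos = 0;
  while (pos < line.length) {
    const state = stack[stack.length - 1];
    const rest = line.slice(pos);
    let matched = false;
    for (const [regex, token, next] of rules[state]) {
      const m = rest.match(regex);
      // The first rule that matches exactly at the current position wins.
      if (!m || m.index !== 0 || m[0].length === 0) continue;
      tokens.push({ text: m[0], token });
      pos += m[0].length;
      if (next === "@pop") stack.pop();
      else if (next === "@popall") stack.length = 1; // back to the start state
      else if (next) stack.push(next.slice(1)); // "@string" -> "string"
      matched = true;
      break;
    }
    if (!matched) throw new Error(`no rule matched at column ${pos}`);
  }
  return tokens;
}

console.log(tokenizeLine("li $t0, 5 # load", toyTokenizer).map((t) => t.token));
// → ["source", "white", "variable.register", "delimiter", "white", "number", "white", "comment"]
```

The real Monarch adds much more (cases, brackets, includes, per-line state), but the core loop follows the same principle: match, emit a token, move to the next state.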

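The Monarch documentation at https://microsoft.github.io/monaco-editor/monarch.html has many examples, so I won't explain every option here. Instead, here is the tokenizer for the asm language.

Without further ado, here is the final result: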

```javascript
{
    storage_type_kw: /\.(ascii|asciiz|byte|data|double|float|half|kdata|ktext|space|text|word|set\s*(noat|at|noreorder|reorder))\b/,
    function_normal: ["abs.d", "abs.s", "add", "add.d", "add.s", /* ... */ "xor", "xori"],
    function_pseudo: ["mul", "abs", "div", "divu", /* ... */ "sd", "ush", "usw", "move", "mfc1.d", "l.d", "l.s", "s.d", "s.s"],

    tokenizer: {
        root: [
            [/^\s*?/, "line.line", "@line_pre"],
            { include: "@normal" }
        ],
        normal: [
            [/#.*$/, "comment", "@popall"],
            [/"/, { token: "string.quote", bracket: "@open", next: "@string" }],
            [/[\w\.\-]+/, {
                cases: {
                    "-?\\d+": { token: "number", next: "@popall" },
                    "-?\\d+\\.\\d+": { token: "number.float", next: "@popall" },
                    "0[xX]([0-9a-fA-F]*)": { token: "number.hex", next: "@popall" },
                    "0[bB]([01]*)": { token: "number.binary", next: "@popall" },
                    "@default": { token: "source", next: "@popall" },
                    "@eos": { token: "line.line", next: "@popall" }
                }
            }],
            { include: "register" }
        ],

        line_pre: [
            [/([a-zA-Z_]\w*):/, "tag.label.$1", "@line_fun"],
            { include: "@line_fun" },
            { include: "@normal" },
        ],

        line_fun: [
            [/[a-z][\w\.]*/, {
                cases: {
                    "@function_normal": { token: "function.normal.$0", next: "@popall" },
                    "@function_pseudo": { token: "function.pseudo.$0", next: "@popall" },
                    "@default": { token: "source", next: "@popall" },
                    "@eos": { token: "line.line", next: "@popall" }
                }
            }],
            [/@storage_type_kw/, "constructor.storage.type", "@popall"],
            [/\.(align|extern|globl)\b/, "constructor.storage.modifier", "@popall"],
            { include: "@normal" },
        ],

        register: [
            [/(\$)(0|[2-9]|1[0-9]|2[0-589]|3[0-1])\b/, "variable.register.by-number", "@popall"],
            [/(\$)(zero|v[01]|a[0-3]|t[0-9]|s[0-7]|gp|sp|fp|ra)\b/, "variable.register.by-name", "@popall"],
            [/(\$)(at|k[01]|1|2[67])\b/, "variable.register.reserved", "@popall"],
            [/(\$)f([0-9]|1[0-9]|2[0-9]|3[0-1])\b/, "variable.register.floating-point", "@popall"]
        ],

        string: [
            [/[^\\"&]+/, "string"],
            { include: "@string_common" },
            [/"/, { token: 'string.quote', bracket: '@close', next: '@popall' }]
        ],

        string_common: [
            [/\\[rnt\\']/, "string.escape"],
            [/&\w+;/, 'string.escape'],
            [/[\\&]/, 'string']
        ]
    }
}
```
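The register rules are plain JavaScript regexes, so you can check them outside the editor. The following sketch copies the four `register` rules verbatim and tries them in order, the same way rules inside one state are tried. `classifyRegister` is a hypothetical helper, not part of Monaco.

```javascript
// The four rules of the `register` state above, copied verbatim (without "@popall").
const registerRules = [
  [/(\$)(0|[2-9]|1[0-9]|2[0-589]|3[0-1])\b/, "variable.register.by-number"],
  [/(\$)(zero|v[01]|a[0-3]|t[0-9]|s[0-7]|gp|sp|fp|ra)\b/, "variable.register.by-name"],
  [/(\$)(at|k[01]|1|2[67])\b/, "variable.register.reserved"],
  [/(\$)f([0-9]|1[0-9]|2[0-9]|3[0-1])\b/, "variable.register.floating-point"],
];

// Hypothetical helper: try the rules in order and return the first token
// whose regex matches at the very start of `text`, or null if none does.
function classifyRegister(text) {
  for (const [regex, token] of registerRules) {
    const m = text.match(regex);
    if (m && m.index === 0) return token;
  }
  return null;
}

for (const reg of ["$t0", "$sp", "$1", "$26", "$31", "$f12", "$32"]) {
  console.log(reg, classifyRegister(reg));
}
// $t0 and $sp -> by-name, $1 and $26 -> reserved, $31 -> by-number,
// $f12 -> floating-point, $32 -> null
```

The results show the intent of the split. `$1`, `$26`, and `$27` (the numeric names of `$at`, `$k0`, and `$k1`) are deliberately left out of the by-number rule and fall through to the reserved rule. A nonexistent register such as `$32` matches nothing.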

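The entry point of the rules is `tokenizer.root`. The properties at the same level as `tokenizer` (`storage_type_kw`, `function_normal`, `function_pseudo`) are the keyword tables. Each child of `tokenizer` is the rule table for one state.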
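The keyword tables come into play through `cases`. According to the Monarch documentation, guards are tested in order:

  • `@name` succeeds when the matched text is in the array attribute `name`.
  • `@default` always succeeds.
  • `@eos` succeeds at the end of the line.
  • Any other key is a regex that must match the entire matched text.

The function below is a simplified model of those rules, not Monaco's code, run against trimmed copies of the tables above. `pickCase` is a hypothetical name.

```javascript
// Simplified model of Monarch's `cases` selection (illustrative, not Monaco's source).
function pickCase(cases, matched, attrs, atEndOfLine) {
  // Object.entries keeps insertion order for these non-numeric keys,
  // so guards are checked in the order they were written.
  for (const [guard, action] of Object.entries(cases)) {
    if (guard === "@default") return action; // always succeeds
    if (guard === "@eos") {
      if (atEndOfLine) return action;
      continue;
    }
    if (guard.startsWith("@")) {
      // "@function_normal" -> membership test in attrs.function_normal
      const list = attrs[guard.slice(1)];
      if (Array.isArray(list) && list.includes(matched)) return action;
      continue;
    }
    // Any other guard is a regex that must match the whole matched text.
    if (new RegExp(`^(?:${guard})$`).test(matched)) return action;
  }
  return null;
}

// Trimmed keyword tables (the full lists are elided in the post).
const attrs = {
  function_normal: ["add", "beq", "xor", "xori"],
  function_pseudo: ["mul", "move"],
};

// Same guard order as `line_fun` and `normal` in the tokenizer above.
const lineFunCases = {
  "@function_normal": "function.normal.$0",
  "@function_pseudo": "function.pseudo.$0",
  "@default": "source",
  "@eos": "line.line",
};
const numberCases = {
  "-?\\d+": "number",
  "-?\\d+\\.\\d+": "number.float",
  "0[xX]([0-9a-fA-F]*)": "number.hex",
  "0[bB]([01]*)": "number.binary",
  "@default": "source",
  "@eos": "line.line",
};

console.log(pickCase(lineFunCases, "beq", attrs, false));  // function.normal.$0
console.log(pickCase(lineFunCases, "move", attrs, false)); // function.pseudo.$0
console.log(pickCase(numberCases, "0x1F", attrs, false));  // number.hex
console.log(pickCase(numberCases, "3.14", attrs, false));  // number.float
console.log(pickCase(numberCases, "loop", attrs, true));   // source, not line.line
```

Under this model, the last call shows that an `@eos` guard written after `@default` (as in `normal` and `line_fun` above) is never reached, because `@default` always succeeds first. If the `line.line` token is wanted at the end of a line, `@eos` presumably needs to come before `@default`.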

include

Pulls the rules of another state under `tokenizer` into the current state, for example:

```javascript
root: [ { include: "@normal" } ]
```
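The Monarch documentation describes `include` as pre-expanded: the rules of the included state are inlined in place, so `include` only affects how the rules are organized, not how they match. The following sketch performs that expansion on a cut-down copy of the asm states, with regexes abbreviated. `expandIncludes` and `miniTokenizer` are hypothetical names.

```javascript
// Inline `{ include: ... }` entries recursively, mimicking the documented
// pre-expansion (a sketch, not Monaco's implementation).
function expandIncludes(tokenizer, stateName, visiting = new Set()) {
  if (visiting.has(stateName)) throw new Error(`include cycle at "${stateName}"`);
  if (!tokenizer[stateName]) throw new Error(`unknown state "${stateName}"`);
  const rules = [];
  for (const rule of tokenizer[stateName]) {
    if (!Array.isArray(rule) && rule.include !== undefined) {
      // Accept both "@name" and "name", since both forms appear in the tokenizer above.
      const target = rule.include.replace(/^@/, "");
      const nested = new Set(visiting).add(stateName);
      rules.push(...expandIncludes(tokenizer, target, nested));
    } else {
      rules.push(rule);
    }
  }
  return rules;
}

// Cut-down copy of root -> normal -> register from the asm tokenizer.
const miniTokenizer = {
  root: [[/^\s*?/, "line.line"], { include: "@normal" }],
  normal: [[/#.*$/, "comment"], [/"/, "string.quote"], { include: "register" }],
  register: [[/\$\w+/, "variable.register"]],
};

const flat = expandIncludes(miniTokenizer, "root");
console.log(flat.map(([, token]) => token));
// → ["line.line", "comment", "string.quote", "variable.register"]
```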

Inspecting Tokens

Monaco provides an Inspect Tokens tool in browsers to help identify the tokens parsed from source code.

To activate:

  • Press F1 while focused on a Monaco instance (or right-click and choose Command Palette).
  • Trigger the Developer: Inspect Tokens option.

This will show a display over the currently selected token for its language, token type, basic font style and colors, and selector you can target in your editor themes.

As the inspector shows, the token for `beq` is `function.normal.beq.asm`.
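Going by the Monarch documentation, the name is built in two steps. First, `$0` in the token is replaced by the matched text, and `$1` to `$9` by the capture groups. Then `tokenPostfix`, which defaults to `.` followed by the language id, is appended. The following is a sketch of that composition; `composeToken` is a hypothetical helper.

```javascript
// Compose a token name the way the Inspect Tokens output suggests
// (sketch based on the documented $n substitution and default tokenPostfix).
function composeToken(template, match, languageId) {
  // "$0" -> whole match, "$1".."$9" -> capture groups (missing groups -> "")
  const name = template.replace(/\$(\d)/g, (_, i) => match[Number(i)] ?? "");
  // Default tokenPostfix is "." + language id.
  return `${name}.${languageId}`;
}

const op = "beq".match(/[a-z][\w\.]*/); // rule from line_fun
console.log(composeToken("function.normal.$0", op, "asm")); // function.normal.beq.asm

const label = "loop:".match(/([a-zA-Z_]\w*):/); // rule from line_pre
console.log(composeToken("tag.label.$1", label, "asm")); // tag.label.loop.asm
```

This is also why themes can target a single instruction: a rule for `function.normal.beq` would apply only to `beq`.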

3. Theme

4. Completion Item Provider

[To be continued]
