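Preface

In the first article of this series, we covered how VuePress lets Markdown support Vue components, but we did not explain how the parts that are not Vue components get parsed.

Today, let's look at how VuePress uses markdown-it to parse Markdown source.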

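Introduction to markdown-it

markdown-it is a library that helps parse Markdown; it can convert # test into `<h1>test</h1>`.

It runs in both the browser and Node. In essence it is similar to Babel; the difference is that Babel parses JavaScript.

Speaking of parsing, the more familiar terms are interpreting (interpreter) and compiling (compiler). Either way, two stages are unavoidable: lexical analysis and syntax analysis.

markdown-it provides an official online demo that shows the result of parsing Markdown directly. Taking # test as the example again, we get the following: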

[
  {
    "type": "heading_open",
    "tag": "h1",
    "attrs": null,
    "map": [
      0,
      1
    ],
    "nesting": 1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "#",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "inline",
    "tag": "",
    "attrs": null,
    "map": [
      0,
      1
    ],
    "nesting": 0,
    "level": 1,
    "children": [
      {
        "type": "text",
        "tag": "",
        "attrs": null,
        "map": null,
        "nesting": 0,
        "level": 0,
        "children": null,
        "content": "test",
        "markup": "",
        "info": "",
        "meta": null,
        "block": false,
        "hidden": false
      }
    ],
    "content": "test",
    "markup": "",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  },
  {
    "type": "heading_close",
    "tag": "h1",
    "attrs": null,
    "map": null,
    "nesting": -1,
    "level": 0,
    "children": null,
    "content": "",
    "markup": "#",
    "info": "",
    "meta": null,
    "block": true,
    "hidden": false
  }
]

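Lexical analysis, put simply, splits a piece of code into basic units (tokens), which can then be classified further. This process is called tokenizing.

Syntax analysis represents the code to be generated as a tree (AST) whose nodes are the token objects produced by lexical analysis. Strictly speaking, markdown-it keeps these tokens as a flat list and encodes the hierarchy through the nesting and children fields, but the tree can be read off directly. From the output above, we get an AST like this:

[Figure: AST for # test]

We can also run the following code by hand to get the same result: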

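const MarkdownIt = require('markdown-it')

const md = new MarkdownIt()
// parse() returns the token array; the second argument is the env sandbox
const tokens = md.parse('# test', {})
console.log(tokens)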
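To make the link between the token list and the tree concrete, here is a minimal sketch that needs no dependencies. It assumes simplified copies of the three tokens printed above, and `toTree` is a hypothetical helper written for this illustration, not part of markdown-it.

```javascript
// Simplified copies of the three block-level tokens shown above
// (only the fields this sketch needs are kept).
const tokens = [
  { type: 'heading_open', tag: 'h1', nesting: 1, content: '' },
  {
    type: 'inline',
    tag: '',
    nesting: 0,
    content: 'test',
    children: [{ type: 'text', tag: '', nesting: 0, content: 'test' }]
  },
  { type: 'heading_close', tag: 'h1', nesting: -1, content: '' }
]

// Hypothetical helper: fold a flat token list into a tree using `nesting`
// (1 opens a node, -1 closes the current node, 0 is a leaf).
function toTree (flatTokens) {
  const root = { type: 'root', children: [] }
  const stack = [root]
  for (const token of flatTokens) {
    const parent = stack[stack.length - 1]
    if (token.nesting === 1) {
      const node = {
        type: token.type.replace(/_open$/, ''),
        tag: token.tag,
        children: []
      }
      parent.children.push(node)
      stack.push(node)
    } else if (token.nesting === -1) {
      stack.pop()
    } else {
      parent.children.push({
        type: token.type,
        content: token.content,
        // inline tokens carry their own flat list of child tokens
        children: token.children ? toTree(token.children).children : []
      })
    }
  }
  return root
}

const tree = toTree(tokens)
console.log(JSON.stringify(tree, null, 2))
```

The result is a root node holding one heading node, which holds the inline node, which in turn holds the text node with the content test.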

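Main APIs

Modes

markdown-it provides three presets: commonmark, default, and zero. commonmark is the strictest and follows the CommonMark spec; default is close to GFM; zero disables all rules so that you can enable only the ones you need.

Parsing

markdown-it's parsing rules fall broadly into two kinds: block and inline. Concretely, MarkdownIt.block is the ParserBlock that applies block rules and MarkdownIt.inline is the ParserInline that applies inline rules, while MarkdownIt.renderer.render and MarkdownIt.renderer.renderInline generate HTML from block-level and inline-level tokens respectively.

Rules

MarkdownIt.renderer has a special property, rules, which holds the rendering rules for tokens. Users can update or extend it: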

var md = require('markdown-it')();

md.renderer.rules.strong_open  = function () { return ''; };
md.renderer.rules.strong_close = function () { return ''; };

var result = md.renderInline(...);

For example, this code replaces the rules used to render the strong_open and strong_close tokens.
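The idea of a rules table keyed by token type can be modelled without the library. The sketch below is a simplified model, not markdown-it's source: the tokens are hand-written for the input `**bold** text`, `renderTokens` and `defaultRule` are hypothetical names, rules here receive only `(tokens, idx)`, and HTML escaping is left out.

```javascript
// Hand-written inline tokens for the source `**bold** text`
// (simplified; real markdown-it tokens carry more fields).
const inlineTokens = [
  { type: 'strong_open', tag: 'strong', nesting: 1, content: '' },
  { type: 'text', tag: '', nesting: 0, content: 'bold' },
  { type: 'strong_close', tag: 'strong', nesting: -1, content: '' },
  { type: 'text', tag: '', nesting: 0, content: ' text' }
]

// Hypothetical fallback used when no rule is registered for a token type.
function defaultRule (tokens, idx) {
  const token = tokens[idx]
  if (token.nesting === 1) return `<${token.tag}>`
  if (token.nesting === -1) return `</${token.tag}>`
  return token.content
}

// Hypothetical renderer: look up a rule by token type, else fall back.
function renderTokens (tokens, rules) {
  return tokens
    .map((token, idx) => (rules[token.type] || defaultRule)(tokens, idx))
    .join('')
}

const withDefaults = renderTokens(inlineTokens, {})
const withOverrides = renderTokens(inlineTokens, {
  strong_open: () => '',
  strong_close: () => ''
})

console.log(withDefaults) // <strong>bold</strong> text
console.log(withOverrides) // bold text
```

Overriding a rule only changes how that token type is printed; the token list itself is untouched, which is why this kind of customization is cheap.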

Plugin system

The markdown-it maintainers have said:

We do a markdown parser. It should keep the "markdown spirit". Other things should be kept separate, in plugins, for example. We have no clear criteria, sorry.
Probably, you will find CommonMark forum a useful read to understand us better.

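In short, markdown-it only does pure Markdown parsing; if you want more, you have to write a plugin yourself.

So they provide an API: MarkdownIt.use

It loads the given plugin into the current parser instance: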

var iterator = require('markdown-it-for-inline');
var md = require('markdown-it')()
            .use(iterator, 'foo_replace', 'text', function (tokens, idx) {
              tokens[idx].content = tokens[idx].content.replace(/foo/g, 'bar');
            });

This sample replaces every foo in the text tokens of the Markdown source with bar.
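The mechanism behind use is small: the plugin is called with the parser instance followed by any extra parameters, and the instance is returned so that calls can be chained. Below is a minimal sketch of that pattern under stated assumptions; `TinyParser` and `replacePlugin` are hypothetical stand-ins invented for illustration, not markdown-it classes.

```javascript
// Hypothetical stand-in for a parser instance; only `use` is modelled.
class TinyParser {
  constructor () {
    this.replacements = []
  }

  // Call the plugin with the instance followed by any extra params,
  // then return the instance so calls can be chained.
  use (plugin, ...params) {
    plugin(this, ...params)
    return this
  }

  // Apply every registered replacement to the source text, in order.
  render (src) {
    return this.replacements.reduce(
      (text, [pattern, value]) => text.replace(pattern, value),
      src
    )
  }
}

// A plugin is just a function that receives the instance and its params.
function replacePlugin (parser, pattern, value) {
  parser.replacements.push([pattern, value])
}

const parser = new TinyParser()
  .use(replacePlugin, /foo/g, 'bar')
  .use(replacePlugin, /baz/g, 'qux')

console.log(parser.render('foo and baz')) // bar and qux
```

In the markdown-it-for-inline example above, the same shape applies: iterator is the plugin, and 'foo_replace', 'text', and the callback are the extra parameters passed through to it.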

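markdown-it in VuePress

VuePress relies on many community markdown-it plugins, for example for code highlighting, code block wrapping, and emoji. It also ships many markdown-it plugins of its own, for example for recognizing Vue components and for rendering internal and external links differently.

Entry point

Source code
It mainly does five things:

Uses community plugins, such as emoji recognition, anchors, and toc.
Uses custom plugins, described in detail later.
Uses markdown-it-chain to configure markdown-it through chained calls, similar to the webpack-chain I mentioned in the second article.
Accepts two hooks as options, beforeInstantiate and afterInstantiate, which makes it easy to expose the markdown-it instance to outside code.
Overrides render with dataReturnable: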

module.exports.dataReturnable = function dataReturnable (md) {
  // override render to allow custom plugins return data
  const render = md.render
  md.render = (...args) => {
    md.__data = {}
    const html = render.call(md, ...args)
    return {
      html,
      data: md.__data
    }
  }
}

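This effectively turns __data into a global variable on the md instance: it is reset at the start of every render and stores whatever data the various plugins need to pass back.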
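A quick way to see the effect of this wrapper is to apply it to a stub. In the sketch below, `dataReturnable` is the function quoted above written as a local function, while `fakeMd` is a hypothetical object standing in for a markdown-it instance; its render only mimics a plugin that records headings in `__data`.

```javascript
// Same logic as the snippet above, written as a local function.
function dataReturnable (md) {
  const render = md.render
  md.render = (...args) => {
    md.__data = {}
    const html = render.call(md, ...args)
    return { html, data: md.__data }
  }
}

// Hypothetical stub standing in for a markdown-it instance: its render()
// records every heading line in this.__data, the way a plugin might.
const fakeMd = {
  render (src) {
    const lines = src.split('\n')
    this.__data.headings = lines
      .filter(line => line.startsWith('# '))
      .map(line => line.slice(2))
    return lines.map(line => `<p>${line}</p>`).join('')
  }
}

dataReturnable(fakeMd)

const first = fakeMd.render('# Intro\nhello')
const second = fakeMd.render('plain text')

console.log(first.data) // { headings: [ 'Intro' ] }
console.log(second.data) // { headings: [] }
```

Because `__data` is replaced with a fresh object on each call, data collected for one document does not leak into the next render.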

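Recognizing Vue components

Source code

It does just one thing: it replaces the default html_block rule so that custom Vue components can be used at the root level.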

module.exports = md => {
  md.block.ruler.at('html_block', htmlBlock)
}

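So what is the key difference between this htmlBlock function and markdown-it's native html_block rule?

The answer is that two entries were added to HTML_SEQUENCES, the array of regular expressions: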

// PascalCase Components
[/^<[A-Z]/, />/, true],
// custom elements with hyphens
[/^<\w+\-/, />/, true],
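The two opening patterns can be tried on their own. This is only a sketch of the matching step, assuming the line has already been stripped of leading indentation; `startsComponentBlock` is a hypothetical helper, and the closing pattern and the boolean in each entry are not exercised here.

```javascript
// The two opening patterns quoted above.
const pascalCaseOpen = /^<[A-Z]/
const hyphenatedOpen = /^<\w+\-/

// Hypothetical helper: does a line start a component-like HTML block?
function startsComponentBlock (line) {
  return pascalCaseOpen.test(line) || hyphenatedOpen.test(line)
}

console.log(startsComponentBlock('<MyComponent/>')) // true
console.log(startsComponentBlock('<my-component>')) // true
console.log(startsComponentBlock('<div>')) // false
console.log(startsComponentBlock('text <MyComponent/>')) // false
```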

Clearly, the two added patterns are meant to match PascalCase tags (such as
