webpack source code: JS minification

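Based on webpack 4.x.x.
Because tapable was rewritten, the plugin mechanism in 4.x differs significantly from 3.x.
If you are not familiar with tapable, you can pretend it is an event publish/subscribe system, although tapable is not quite that simple.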
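To make the publish/subscribe analogy concrete, here is a minimal sketch of two hook types. The class names ToySyncHook and ToyAsyncSeriesHook are hypothetical. The method names tap/call and tapAsync/callAsync mirror the ones used later in this article, but this is not how tapable is implemented internally; it only models "subscribe, then publish in order".

```javascript
// Toy hooks in the spirit of tapable's SyncHook and AsyncSeriesHook.
// Not tapable's implementation: they only model subscribe (tap) and publish (call).
class ToySyncHook {
  constructor() {
    this.taps = [];
  }
  // Subscribe a named handler.
  tap(name, fn) {
    this.taps.push({ name, fn });
  }
  // Publish: call every handler in registration order.
  call(...args) {
    for (const { fn } of this.taps) fn(...args);
  }
}

class ToyAsyncSeriesHook {
  constructor() {
    this.taps = [];
  }
  // The handler receives the hook arguments plus an error-first callback.
  tapAsync(name, fn) {
    this.taps.push({ name, fn });
  }
  // Run handlers one after another; the last argument is the final callback.
  // Stops at the first error.
  callAsync(...args) {
    const done = args.pop();
    let i = 0;
    const next = (err) => {
      if (err || i >= this.taps.length) return done(err);
      const { fn } = this.taps[i++];
      fn(...args, next);
    };
    next();
  }
}

const log = [];
const hook = new ToySyncHook();
hook.tap("A", (x) => log.push(`A:${x}`));
hook.tap("B", (x) => log.push(`B:${x}`));
hook.call(1);
console.log(log); // [ 'A:1', 'B:1' ]
```

The async variant is the shape that matters later: webpack's seal phase waits for every handler tapped with tapAsync to call its callback before moving on.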

Two important objects in webpack

The compiler object

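The compiler is essentially the object that webpack itself exposes for us to use. When we start webpack from our own Node script, calling webpack() gives us the compiler object. Generally this object is globally unique; any compiler created afterwards is a child compiler (childCompiler).
For example: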

  const compiler = webpack(config,[callback]);

This gives us the compiler instance; the square brackets mean that callback is optional.

The compilation instance

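A compilation instance is the object you get for each build. In watch mode, every rebuild produces a new compilation instance. So the compiler is the globally unique object obtained once when webpack starts, while a fresh compilation is obtained on every update. You can get the compilation like this: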

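// webpack 4.x (name is your plugin's name, e.g. 'MyPlugin')
compiler.hooks.compilation.tap(name, function (compilation) {
   // Congratulations, you have the compilation object
})

// webpack 3.x
compiler.plugin('compilation', function (compilation) {
   // Congratulations, you have the compilation object
})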
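The lifecycle difference can be modeled with a toy sketch. ToyCompiler, ToyCompilation and run() are hypothetical names, not webpack's API; run() stands in for one build (for example a watch-mode rebuild). The only point shown is that the same compiler fires its compilation hook with a brand-new compilation object on every build.

```javascript
// Hypothetical toy model: one long-lived compiler, a fresh compilation per build.
class ToyCompilation {
  constructor(buildId) {
    this.buildId = buildId;
    this.assets = {};
  }
}

class ToyCompiler {
  constructor() {
    this.builds = 0;
    // Minimal stand-in for compiler.hooks.compilation (a sync hook).
    const taps = [];
    this.hooks = {
      compilation: {
        tap: (name, fn) => taps.push(fn),
        call: (compilation) => taps.forEach((fn) => fn(compilation)),
      },
    };
  }
  // Each (re)build creates a new compilation and announces it to plugins.
  run() {
    const compilation = new ToyCompilation(++this.builds);
    this.hooks.compilation.call(compilation);
    return compilation;
  }
}

const compiler = new ToyCompiler();
const seen = [];
compiler.hooks.compilation.tap("MyPlugin", (compilation) => seen.push(compilation));
compiler.run();
compiler.run(); // simulate a watch-mode rebuild
console.log(seen.length, seen[0] !== seen[1]); // 2 true
```

This is why plugins that work on assets tap compiler.hooks.compilation first and then register on the compilation's own hooks inside that handler: the compilation they need only exists once a build starts.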

The compilation.seal method

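In webpack, generating chunks from modules and then optimizing and minifying them mainly happens in the seal phase. Since our topic is asset minification, we mainly want to find where the minification-related code sits inside seal.
The seal code lives in the seal method of compilation.js. The part we need to focus on is: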

/**
 * @param {Callback} callback signals when the seal method finishes
 * @returns {void}
 */
seal(callback) {
    // ...other code
    this.hooks.additionalAssets.callAsync(err => {
        if (err) {
            return callback(err);
        }
        // ......
        // This is where JS minification happens. webpack itself does not minify;
        // the actual work is done by plugins tapped into this hook.
        this.hooks.optimizeChunkAssets.callAsync(this.chunks, err => {
            if (err) {
                return callback(err);
            }
            // ......
            // This signals that the seal method has finished
            return this.hooks.afterSeal.callAsync(callback);
        });
    });
}

The actual JS minification is implemented in uglifyjs-webpack-plugin, which taps this hook like so:

 compilation.hooks.optimizeChunkAssets.tapAsync(
     plugin,
     optimizeFn.bind(this, compilation)
 );
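To make the hand-off concrete, here is a minimal, self-contained sketch under stated assumptions: assets are plain strings rather than webpack Source objects, asyncSeriesHook is a hand-rolled stand-in rather than tapable, and the toy "minifier" only trims lines. Only the hook names additionalAssets, optimizeChunkAssets and afterSeal, and the optimizeFn.bind(..., compilation) registration pattern, come from the code above.

```javascript
// Hand-rolled async series hook (a stand-in for tapable, not its implementation).
function asyncSeriesHook() {
  const taps = [];
  return {
    tapAsync: (name, fn) => taps.push(fn),
    callAsync(...args) {
      const done = args.pop();
      let i = 0;
      const next = (err) => {
        if (err || i >= taps.length) return done(err);
        taps[i++](...args, next);
      };
      next();
    },
  };
}

class ToyCompilation {
  constructor(assets) {
    // Assets are plain strings here; real webpack uses Source objects.
    this.assets = assets;
    this.chunks = [{ files: Object.keys(assets) }];
    this.hooks = {
      additionalAssets: asyncSeriesHook(),
      optimizeChunkAssets: asyncSeriesHook(),
      afterSeal: asyncSeriesHook(),
    };
  }

  // Same hook order as the seal excerpt above, with everything else removed.
  seal(callback) {
    this.hooks.additionalAssets.callAsync((err) => {
      if (err) return callback(err);
      this.hooks.optimizeChunkAssets.callAsync(this.chunks, (err) => {
        if (err) return callback(err);
        return this.hooks.afterSeal.callAsync(callback);
      });
    });
  }
}

// Toy "minifier plugin": trims every line and joins them. Not a real minifier.
function optimizeFn(compilation, chunks, callback) {
  for (const chunk of chunks) {
    for (const file of chunk.files) {
      compilation.assets[file] = compilation.assets[file]
        .split("\n")
        .map((line) => line.trim())
        .join("");
    }
  }
  callback();
}

const compilation = new ToyCompilation({ "main.js": "var a = 1;\n  var b = 2;\n" });
// Registered the same way the plugin excerpt above registers optimizeFn.
compilation.hooks.optimizeChunkAssets.tapAsync(
  "ToyMinify",
  optimizeFn.bind(null, compilation)
);
compilation.seal((err) => {
  if (err) throw err;
  console.log(compilation.assets["main.js"]); // var a = 1;var b = 2;
});
```

Because the hook is a series hook, seal cannot reach afterSeal until the plugin calls its callback; an error passed to that callback short-circuits straight to seal's own callback.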

uglifyjs-webpack-plugin

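In the previous part we only scratched the surface of the uglifyjs-webpack-plugin source. But why does an analysis of webpack's JS minification flow suddenly require studying a third-party package like uglifyjs-webpack-plugin? Life really is full of surprises.
Next, let's see what optimizeFn in uglifyjs-webpack-plugin actually does (see the uglifyjs-webpack-plugin source code).
First, the opening piece of code: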

      const taskRunner = new TaskRunner({
        // whether to cache results
        cache: this.options.cache,
        // whether to minify in parallel (multiple processes)
        parallel: this.options.parallel,
      });

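taskRunner is, as its name suggests, a task runner that can execute tasks across multiple processes. It comes from TaskRunner.js, which is also the core of uglifyjs-webpack-plugin. taskRunner has a method called run that takes two arguments: the first is an array of task objects, and the second is an error-first callback invoked once all tasks have finished. The tasks here are, of course, minification tasks.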
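The run(tasks, callback) contract can be sketched without webpack. The function runTasks and its concurrency parameter are hypothetical: in parallel mode the plugin hands tasks to worker processes, while here a "worker" is just an async function. So the sketch only models scheduling, how a failed task is stored as { error }, and how the error-first callback receives results in task order.

```javascript
// Hypothetical runTasks: models the run(tasks, callback) contract with a concurrency cap.
function runTasks(tasks, worker, concurrency, callback) {
  if (!tasks.length) return callback(null, []);
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start
  let remaining = tasks.length;

  const startOne = () => {
    const index = next++;
    worker(tasks[index], (error, data) => {
      // As in the simplified TaskRunner, a failed task is stored as { error }.
      results[index] = error ? { error } : data;
      remaining -= 1;
      if (remaining === 0) return callback(null, results);
      if (next < tasks.length) startOne();
    });
  };

  // Start at most `concurrency` tasks; each finished task starts the next one.
  for (let i = 0; i < concurrency && next < tasks.length; i++) startOne();
}

// Usage: a fake async minifier that records how many tasks run at once.
let inFlight = 0;
let maxInFlight = 0;
const fakeMinify = (task, cb) => {
  inFlight += 1;
  maxInFlight = Math.max(maxInFlight, inFlight);
  setTimeout(() => {
    inFlight -= 1;
    if (task.input.includes("syntax error")) cb(new Error(`cannot minify ${task.file}`));
    else cb(null, { code: task.input.trim() });
  }, 5);
};

runTasks(
  [
    { file: "a.js", input: " a(); " },
    { file: "b.js", input: "syntax error" },
    { file: "c.js", input: " c(); " },
  ],
  fakeMinify,
  2,
  (err, results) => {
    console.log(maxInFlight, results.map((r) => (r.error ? r.error.message : r.code)));
    // 2 [ 'a();', 'cannot minify b.js', 'c();' ]
  }
);
```

The loop guard next < tasks.length matters when a worker calls back synchronously: in that case the first startOne call already drains the whole queue.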

The long block of code that follows exists just to assemble the tasks argument.

      const processedAssets = new WeakSet();
      // The main purpose of this code is to assemble tasks
      const tasks = [];

      const { chunkFilter } = this.options;
      // Filter the chunks to minify according to the options; not important for the main flow.
      Array.from(chunks)
        .filter((chunk) => chunkFilter && chunkFilter(chunk))
        .reduce((acc, chunk) => acc.concat(chunk.files || []), [])
        .concat(compilation.additionalChunkAssets || [])
        .filter(ModuleFilenameHelpers.matchObject.bind(null, this.options))
        .forEach((file) => {
          let inputSourceMap;
          // compilation.assets holds the final output of every file generated from the chunks
          const asset = compilation.assets[file];
          // Avoid minifying the same asset twice
          if (processedAssets.has(asset)) {
            return;
          }

          try {
            let input;
            // Source map handling; you can skip this part.
            if (this.options.sourceMap && asset.sourceAndMap) {
              const { source, map } = asset.sourceAndMap();

              input = source;

              if (UglifyJsPlugin.isSourceMap(map)) {
                inputSourceMap = map;
              } else {
                inputSourceMap = map;

                compilation.warnings.push(
                  new Error(`${file} contains invalid source map`)
                );
              }
            } else {
              // input is the code as a string, before minification
              input = asset.source();
              inputSourceMap = null;
            }

            // Handling comment extraction
            let commentsFile = false;

            if (this.options.extractComments) {
              commentsFile =
                this.options.extractComments.filename || `${file}.LICENSE`;

              if (typeof commentsFile === 'function') {
                commentsFile = commentsFile(file);
              }
            }

            const task = {
              file,
              input,
              inputSourceMap,
              commentsFile,
              extractComments: this.options.extractComments,
              uglifyOptions: this.options.uglifyOptions,
              minify: this.options.minify,
            };

            if (this.options.cache) {
              const defaultCacheKeys = {
                'uglify-js': uglifyJsPackageJson.version,
                node_version: process.version,
                // eslint-disable-next-line global-require
                'uglifyjs-webpack-plugin': require('../package.json').version,
                'uglifyjs-webpack-plugin-options': this.options,
                hash: crypto
                  .createHash('md4')
                  .update(input)
                  .digest('hex'),
              };

              task.cacheKeys = this.options.cacheKeys(defaultCacheKeys, file);
            }

            tasks.push(task);
          } catch (error) {
            compilation.errors.push(
              UglifyJsPlugin.buildError(
                error,
                file,
                UglifyJsPlugin.buildSourceMap(inputSourceMap),
                new RequestShortener(compiler.context)
              )
            );
          }
        });

The code is long and, frankly, ugly. Ahem, but we still have to read it, so let's go straight to the key part:

    const task = {
      file,
      input,
      inputSourceMap,
      commentsFile,
      extractComments: this.options.extractComments,
      uglifyOptions: this.options.uglifyOptions,
      minify: this.options.minify,
    };

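There are a few important properties here: file is the name of the output file, input is the file's content as a string, and inputSourceMap is the content of the corresponding source map.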
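The essence of the loop above (skip already-processed assets, read the source, attach cache keys) fits in a few lines. This sketch makes assumptions: assets are mocks exposing source(), buildTasks and pluginVersion are hypothetical names, and sha256 is swapped in for the plugin's md4 because some newer Node/OpenSSL builds reject md4. It also marks an asset as processed right away, whereas the excerpt above only checks the WeakSet at this point.

```javascript
const crypto = require("crypto");

// Hypothetical buildTasks: a simplified version of the task-assembly loop above.
// Assets are mocks with a source() method. sha256 replaces the plugin's md4.
function buildTasks(files, assets, options) {
  const processedAssets = new WeakSet();
  const tasks = [];
  for (const file of files) {
    const asset = assets[file];
    // Unlike the excerpt above, this sketch marks the asset as processed immediately.
    if (processedAssets.has(asset)) continue;
    processedAssets.add(asset);

    const input = asset.source();
    const task = { file, input, inputSourceMap: null, uglifyOptions: options.uglifyOptions };

    if (options.cache) {
      // A change to the input, plugin version, or options changes the cache keys.
      task.cacheKeys = {
        "uglifyjs-webpack-plugin": options.pluginVersion,
        "uglifyjs-webpack-plugin-options": options.uglifyOptions,
        hash: crypto.createHash("sha256").update(input).digest("hex"),
      };
    }
    tasks.push(task);
  }
  return tasks;
}

const sharedAsset = { source: () => "var a = 1;" };
const assets = { "a.js": sharedAsset, "b.js": { source: () => "var b = 2;" } };
const tasks = buildTasks(["a.js", "a.js", "b.js"], assets, {
  cache: true,
  pluginVersion: "x.y.z", // placeholder, not a real version number
  uglifyOptions: {},
});
console.log(tasks.map((t) => t.file)); // [ 'a.js', 'b.js' ]
```

Hashing the input rather than the file name is the key design choice: a file whose content did not change produces the same key on the next build, so its cached minified output can be reused.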

Now the tasks are assembled. Remember the taskRunner from earlier? We can now happily call its run method to do the minification.

  taskRunner.run(tasks, (tasksError, results) => {
    // endless code
  })

TaskRunner.js

To reduce the mental load, let's assume TaskRunner.js has no caching and no multi-process execution.
The whole taskRunner.run can then be simplified to the following:

run(tasks, callback) {
    if (!tasks.length) {
      callback(null, []);
      return;
    }
    // minify comes from minify.js (covered below)
    this.boundWorkers = (options, cb) => {
      try {
        // Minify the JS code
        cb(null, minify(options));
      } catch (error) {
        cb(error);
      }
    };
    // Number of tasks still to run
    let toRun = tasks.length;
    // Result set: stores the result for every file
    const results = [];
    const step = (index, data) => {
      toRun -= 1;
      results[index] = data;
      // Once every JS file has been minified, hand all results to the callback
      if (!toRun) {
        callback(null, results);
      }
    };
    // Start all JS minification tasks
    tasks.forEach((task, index) => {
      const enqueue = () => {
        this.boundWorkers(task, (error, data) => {
          const result = error ? { error } : data;
          const done = () => step(index, result);
          done();
        });
      };

      enqueue();
    });
  }

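The rough flow here: boundWorkers is the worker that performs JS minification, results stores the results, and we kick off all minification tasks at once. Note that this is not multi-process minification; in this simplified version minify is synchronous, so the tasks actually run one after another. Each time a task finishes, its callback calls done, whose main job is to close over index so that every result is stored at its own position in results, regardless of completion order. This is very much like Promise.all, so if you ever implement Promise.all yourself, this is a pattern worth considering.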
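Following that Promise.all suggestion, here is a minimal sketch (the name promiseAll is hypothetical) that reuses the same two ideas as run: a countdown of unfinished tasks and a closure over index, so results land in input order rather than completion order. Unlike the simplified run, it rejects on the first failure, as Promise.all does.

```javascript
// A minimal Promise.all-like helper built on the same idea as TaskRunner.run:
// count the remaining tasks and write each result at its own index.
function promiseAll(items) {
  return new Promise((resolve, reject) => {
    const list = Array.from(items);
    let toRun = list.length;
    const results = new Array(list.length);
    if (!toRun) {
      resolve(results);
      return;
    }
    list.forEach((item, index) => {
      // Promise.resolve also accepts plain values, as Promise.all does.
      Promise.resolve(item).then((value) => {
        results[index] = value; // index is captured by this closure
        toRun -= 1;
        if (!toRun) resolve(results);
      }, reject);
    });
  });
}

const delay = (ms, value) => new Promise((r) => setTimeout(() => r(value), ms));
promiseAll([delay(30, "slow"), delay(10, "fast"), 42]).then((values) => {
  console.log(values); // [ 'slow', 'fast', 42 ]
});
```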
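Once all tasks have finished, run's callback is called, i.e. the taskRunner callback inside UglifyJsPlugin's optimizeFn. That callback mainly puts the results onto compilation.assets and then calls the optimizeChunkAssets callback, which brings us back into webpack's seal flow. Next, let's see what minification minify.js actually performs.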

minify.js

Setting everything else aside, let's extract the main flow of minify. After extraction it looks like this:

import uglify from 'uglify-js';

// Builds the options passed to uglify.minify
const buildUglifyOptions = () => {/*.......*/};

const minify = (options) => {
  const {
    file,
    input,
    inputSourceMap,
    extractComments,
    minify: minifyFn,
  } = options;
  // If a custom minify function was provided, call it instead
  if (minifyFn) {
    return minifyFn({ [file]: input }, inputSourceMap);
  }
  // Get the final uglify options
  const uglifyOptions = buildUglifyOptions(options.uglifyOptions);
  // Comments extracted according to extractComments (collection logic omitted here)
  const extractedComments = [];
  // Get the minification result
  const { error, map, code, warnings } = uglify.minify(
    { [file]: input },
    uglifyOptions
  );

  return { error, map, code, warnings, extractedComments };
};

export default minify;
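The dispatch logic can be exercised without uglify-js. In this sketch, defaultMinify is a hypothetical toy stand-in for uglify.minify that only removes // comments and blank lines (it would break code containing // inside a string). Only the precedence rule, where a user-supplied minify function wins, follows the code above.

```javascript
// Toy stand-in for uglify.minify: drops // comments and blank lines. Not safe for real code.
function defaultMinify(files) {
  const [input] = Object.values(files);
  const code = input
    .split("\n")
    .map((line) => line.replace(/\/\/.*$/, "").trim())
    .filter(Boolean)
    .join("\n");
  return { code, map: null, warnings: [], error: undefined };
}

function minify(options) {
  const { file, input, inputSourceMap, minify: minifyFn } = options;
  // A user-supplied minify function takes priority, as in the excerpt above.
  if (minifyFn) {
    return minifyFn({ [file]: input }, inputSourceMap);
  }
  const { error, map, code, warnings } = defaultMinify({ [file]: input });
  return { error, map, code, warnings, extractedComments: [] };
}

const input = "var a = 1; // first\n\nvar b = 2;\n";
console.log(minify({ file: "a.js", input }).code);
// var a = 1;
// var b = 2;
console.log(minify({ file: "a.js", input, minify: () => ({ code: "custom" }) }).code); // custom
```

Passing files as a { [file]: input } object keeps the same calling convention for both paths, so a custom function receives exactly what uglify.minify would.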

The core of the minify code above is this part:

  const { error, map, code, warnings } = uglify.minify(
    { [file]: input },
    uglifyOptions
  );

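So in the end, all of the minification is done by uglify.minify, and uglify comes from uglify-js. At this point we can't really chase the source any further. But we can try out the uglify-js package directly, like this: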

  const uglify = require('uglify-js');

  const input = `var name = 123;
                var age = "123";
                function say(name,age){return name+age};
                say(name,age);`

  const { code } = uglify.minify({
      'index.js':input
  });

  console.log(code)
  // var name=123,age="123";function say(a,e){return a+e}say(name,age);
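That output shows two typical transformations: the two var declarations were merged, and the parameters of say were renamed (mangled) to single letters. The sketch below illustrates only the renaming half; shortName and mangleParams are hypothetical names, and the reserved-word list is deliberately partial.

```javascript
// Hypothetical sketch of short-name generation for mangling local names.
// UglifyJS orders its alphabet by character frequency in the code (which is why
// the output above used a and e); this sketch uses a fixed order instead.
const FIRST = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ$_"; // 54 chars
const REST = FIRST + "0123456789"; // digits are allowed after the first character

// Maps 0, 1, 2, ... to a, b, c, ..., _, aa, ba, ... without duplicates.
function shortName(n) {
  let name = FIRST[n % FIRST.length];
  n = Math.floor(n / FIRST.length);
  while (n > 0) {
    n -= 1; // bijective numbering, so "aa" is reachable
    name += REST[n % REST.length];
    n = Math.floor(n / REST.length);
  }
  return name;
}

// Rename parameters in order, skipping reserved words (only a partial list here).
// A real mangler must also avoid names still referenced from enclosing scopes.
const RESERVED = new Set(["do", "if", "in", "for", "let", "new", "try", "var"]);

function mangleParams(params) {
  const mapping = new Map();
  let counter = 0;
  for (const param of params) {
    let candidate;
    do {
      candidate = shortName(counter++);
    } while (RESERVED.has(candidate));
    mapping.set(param, candidate);
  }
  return mapping;
}

console.log(Object.fromEntries(mangleParams(["name", "age"]))); // { name: 'a', age: 'b' }
```

Top-level names such as name, age and say are left alone in the uglify output above because they could be used by other scripts; only names confined to a function scope are safe to shorten.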

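We have now more or less worked through the whole flow. Let's try (or daydream about) writing a minifier demo of our own that implements only part of the functionality.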

Let's try writing a minifier

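The principle behind a JavaScript obfuscator is not complicated: at its core it performs AST transformation (rewriting the abstract syntax tree) on the target code. With an existing JavaScript AST parser library, we can implement our own obfuscator fairly easily. Below we use acorn to rewrite an if statement.
Suppose we have this code snippet: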

for(var i = 0; i < 100; i++){
    if(i % 2 == 0){
        console.log("foo");
    }else{
        console.log("bar");
    }
}

Then we process it like this:

const {Parser} = require("acorn")
const MyUglify = Parser.extend();

const codeStr = `
for(var i = 0; i < 100; i++){
    if(i % 2 == 0){
        console.log("foo");
    }else{
        console.log("bar");
    }
}
`;

function transform(node){
    const { type } = node;
    switch(type){
        case 'Program': 
        case 'BlockStatement':{
            const { body } = node;
            return body.map(transform).join('');
        }
        case 'ForStatement':{
            const results = ['for', '('];
            const { init, test, update, body } = node;
            results.push(transform(init), ';');
            results.push(transform(test), ';');
            results.push(transform(update), ')');
            results.push(transform(body));
            return results.join('');
        }
        case 'VariableDeclaration': {
            const { kind, declarations } = node;
            // e.g. "var i=0"; multiple declarators are joined with commas
            return kind + ' ' + declarations.map(transform).join(',');
        }
        case 'VariableDeclarator':{
            const {id, init} = node;
            // a declarator without an initializer (e.g. "var i") has init === null
            return init ? id.name + '=' + transform(init) : id.name;
        }
        case 'UpdateExpression': {
            const {argument, operator} = node;
            return argument.name + operator;
        }
        case 'BinaryExpression': {
            const {left, operator, right} = node;
            return transform(left) + operator + transform(right);
        }
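        case 'IfStatement': {
            // Rewrite if/else as a ternary. This only works when each branch holds a
            // single expression statement; an if without else becomes "test&&consequent".
            const { test, consequent, alternate } = node;
            if (!alternate) {
                return transform(test) + '&&' + transform(consequent);
            }
            return transform(test) + '?' + transform(consequent) + ':' + transform(alternate);
        }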
        case 'MemberExpression':{
            const {object, property} = node;
            return object.name + '.' + property.name;
        }
        case 'CallExpression': {
            // "arguments" is renamed to args: it cannot be used as a binding name in strict mode
            const { callee, arguments: args } = node;
            return transform(callee) + '(' + args.map(transform).join(',') + ')';
        }
        case 'ExpressionStatement':{
            return transform(node.expression);
        }
        case 'Literal':
            return node.raw;
        case 'Identifier':
            return node.name;
        default:
            throw new Error('unimplemented operations');
    }
}

const ast = MyUglify.parse(codeStr, { ecmaVersion: 2015 });
console.log(transform(ast));
// for(var i=0;i<100;i++)i%2==0?console.log("foo"):console.log("bar")

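Of course, the implementation above is only a simple illustration. A real obfuscator is far more complex and has to handle a great many syntactic details; this is offered only as a starting point for further study.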

Summary of the minification flow

Run the seal phase
Call compilation.hooks.optimizeChunkAssets
Enter uglifyjs-webpack-plugin
Run optimizeFn
Run taskRunner.run
Run the taskRunner.run callback
Run the optimizeFn callback
Run the compilation.hooks.optimizeChunkAssets callback

Taking multi-process execution and caching into account, the flow chart looks like this:

[Figure: flowchart of the minification flow with multi-process execution and caching]
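The flowchart itself is not available here. As a hedged stand-in for its cache branch, the sketch below assumes each task carries cacheKeys (as assembled earlier in the article) and uses a hypothetical createCachedRunner backed by an in-memory Map. The plugin itself persists cached results on disk, and the multi-process branch is not modeled.

```javascript
const crypto = require("crypto");

// Hypothetical cache-aware runner: derive a key from each task's cacheKeys and
// only call the minifier on a cache miss.
function createCachedRunner(minifyFn) {
  const cache = new Map(); // in-memory stand-in for the plugin's on-disk cache
  let misses = 0;
  return {
    run(tasks) {
      return tasks.map((task) => {
        const key = crypto
          .createHash("sha256")
          .update(JSON.stringify(task.cacheKeys))
          .digest("hex");
        if (cache.has(key)) return cache.get(key); // cache hit: skip minification
        misses += 1;
        const result = minifyFn(task);
        cache.set(key, result);
        return result;
      });
    },
    get misses() {
      return misses;
    },
  };
}

const runner = createCachedRunner((task) => ({ code: task.input.trim() }));
const tasks = [{ input: " a(); ", cacheKeys: { hash: "h1" } }];
runner.run(tasks);
runner.run(tasks); // second build: served from the cache
console.log(runner.misses); // 1
```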

References (a.k.a. what I copied from)

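webpack Principles (webpack原理)
webpack Heroes, Part 7: Code Minification and Caching (webpack群侠传（七）：代码压缩和缓存)
A Comprehensive Look at Front-end Core Code Protection Techniques (前端核心代码保护技术面面观)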