TensorFlow 2.2.0-rc0: This Update Is a Pleasant Surprise!

AI Editor: 我是小将

Google has just announced TensorFlow 2.2 at the TensorFlow Dev Summit. The 2.2 release includes updates in many areas, and I think two of them in particular will delight everyone:

1. A synchronized BatchNormalization layer

The synchronized BN layer, tf.keras.layers.experimental.SyncBatchNormalization, is a great helper for distributed training. Its interface is similar to the existing BatchNormalization layer:

tf.keras.layers.experimental.SyncBatchNormalization(
    axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,
    beta_initializer='zeros', gamma_initializer='ones',
    moving_mean_initializer='zeros', moving_variance_initializer='ones',
    beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
    gamma_constraint=None, renorm=False, renorm_clipping=None, renorm_momentum=0.99,
    trainable=True, adjustment=None, name=None, **kwargs
)

It is used as follows:

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
  model = tf.keras.Sequential()
  model.add(tf.keras.layers.Dense(16))
  model.add(tf.keras.layers.experimental.SyncBatchNormalization())
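For completeness, here is a minimal sketch (not from the release notes) of compiling and training such a model under the same strategy scope; the layer sizes, optimizer, loss, and random data below are placeholder assumptions:

import numpy as np
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

with strategy.scope():
  model = tf.keras.Sequential([
      tf.keras.layers.Dense(16, activation='relu'),
      # Batch statistics of this layer are synchronized across all replicas each step.
      tf.keras.layers.experimental.SyncBatchNormalization(),
      tf.keras.layers.Dense(1),
  ])
  model.compile(optimizer='adam', loss='mse')

# Placeholder data purely for illustration.
x = np.random.rand(256, 8).astype('float32')
y = np.random.rand(256, 1).astype('float32')
model.fit(x, y, batch_size=32, epochs=1)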

2. Model.fit supports custom training and test logic

Model.fit now supports overriding Model.train_step, which lets us implement custom training logic. Here is the default implementation:

  def train_step(self, data):
    """The logic for one training step.
    This method can be overridden to support custom training logic.
    This method is called by `Model._make_train_function`.
    This method should contain the mathematical logic for one step of training.
    This typically includes the forward pass, loss calculation, backpropagation,
    and metric updates.
    Configuration details for *how* this logic is run (e.g. `tf.function` and
    `tf.distribute.Strategy` settings), should be left to
    `Model._make_train_function`, which can also be overridden.
    Arguments:
      data: A nested structure of `Tensor`s.
    Returns:
      A `dict` containing values that will be passed to
      `tf.keras.callbacks.CallbackList.on_train_batch_end`. Typically, the
      values of the `Model`'s metrics are returned. Example:
      `{'loss': 0.2, 'accuracy': 0.7}`.
    """
    # These are the only transformations `Model.fit` applies to user-input
    # data when a `tf.data.Dataset` is provided. These utilities will be exposed
    # publicly.
    data = data_adapter.expand_1d(data)
    x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data)

    with backprop.GradientTape() as tape:
      y_pred = self(x, training=True)
      loss = self.compiled_loss(
          y, y_pred, sample_weight, regularization_losses=self.losses)
    # For custom training steps, users can just write:
    #   trainable_variables = self.trainable_variables
    #   gradients = tape.gradient(loss, trainable_variables)
    #   self.optimizer.apply_gradients(zip(gradients, trainable_variables))
    # The _minimize call does a few extra steps unnecessary in most cases,
    # such as loss scaling and gradient clipping.
    _minimize(tape, self.optimizer, loss, self.trainable_variables)

    self.compiled_metrics.update_state(y, y_pred, sample_weight)
    return {m.name: m.result() for m in self.metrics}

One benefit this brings is that we can use Model.fit much more flexibly to train our own models. Model also provides Model.test_step and Model.predict_step for customizing evaluation and prediction logic. I think this will definitely appeal to TensorFlow users.
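As an illustration, below is a minimal sketch (not taken from the TensorFlow source) of what an overridden Model.train_step might look like, using the simple three-line gradient update suggested in the comments above; the class name CustomModel, the model architecture, and the compile arguments are assumptions for demonstration only:

import tensorflow as tf

class CustomModel(tf.keras.Model):
  def train_step(self, data):
    # Assumes batches of (inputs, targets); adapt the unpacking if sample weights are used.
    x, y = data
    with tf.GradientTape() as tape:
      y_pred = self(x, training=True)
      # compiled_loss combines the loss passed to compile() with regularization losses.
      loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
    # The straightforward gradient update mentioned in the default implementation's comments.
    trainable_vars = self.trainable_variables
    gradients = tape.gradient(loss, trainable_vars)
    self.optimizer.apply_gradients(zip(gradients, trainable_vars))
    self.compiled_metrics.update_state(y, y_pred)
    return {m.name: m.result() for m in self.metrics}

# Usage: build with the functional API and train with the usual fit() call.
inputs = tf.keras.Input(shape=(8,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer='adam', loss='mse', metrics=['mae'])
model.fit(tf.random.normal((64, 8)), tf.random.normal((64, 1)), epochs=1)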

The main updates and improvements are as follows:

  • Replaced the scalar type for string tensors from std::string to tensorflow::tstring which is now ABI stable.

  • A new Profiler for TF 2 for CPU/GPU/TPU. It offers both device and host performance analysis, including input pipeline and TF Ops. Optimization advisory is provided whenever possible. Please see this tutorial for usage guidelines.

  • Export C++ functions to Python using pybind11 as opposed to SWIG, as part of the effort to deprecate SWIG.

  • tf.distribute:

    • Update NVIDIA NCCL to 2.5.7-1 for better performance and performance tuning. Please see nccl developer guide for more information on this.

    • Support gradient allreduce in float16. See this example usage.

    • Experimental support of all reduce gradient packing to allow overlapping gradient aggregation with backward path computation.

    • Support added for global sync BatchNormalization by using the newly added tf.keras.layers.experimental.SyncBatchNormalization layer. This layer will sync BatchNormalization statistics every step across all replicas taking part in sync training.

    • Performance improvements for GPU multi-worker distributed training using tf.distribute.experimental.MultiWorkerMirroredStrategy

  • tf.keras:

    • Model.fit major improvements:

      • You can now use custom training logic with Model.fit by overriding Model.train_step.

      • Easily write state-of-the-art training loops without worrying about all of the features Model.fit handles for you (distribution strategies, callbacks, data formats, looping logic, etc.)

      • See the default Model.train_step for an example of what this function should look like.

      • The same applies for validation and inference via Model.test_step and Model.predict_step.

    • The SavedModel format now supports all Keras built-in layers (including metrics, preprocessing layers, and stateful RNN layers).

  • tf.lite:

    • Enable TFLite experimental new converter by default.

  • XLA

    • XLA now builds and works on Windows. All prebuilt packages come with XLA available.

    • XLA can be enabled for a tf.function with “compile or throw exception” semantics on CPU and GPU.

Breaking Changes

  • tf.keras:

    • In tf.keras.applications the name of the "top" layer has been standardized to "predictions". This is only a problem if your code relies on the exact name of the layer.

    • Huber loss function has been updated to be consistent with other Keras losses. It now computes mean over the last axis of per-sample losses before applying the reduction function.

  • AutoGraph no longer converts functions passed to tf.py_function, tf.py_func and tf.numpy_function.

  • Deprecating XLA_CPU and XLA_GPU devices with this release.

  • Increasing the minimum bazel version to build TF to 1.2.1 to use Bazel's cc_experimental_shared_library.

Known Caveats

  • macOS binaries are not available in the tensorflow-cpu project on PyPI, but they are identical to the binaries in the tensorflow project, since macOS has no GPU support.

Bug Fixes and Other Changes

  • tf.data:

    • Removed autotune_algorithm from experimental optimization options.

  • TF Core:

    • tf.constant always creates CPU tensors irrespective of the current device context.

    • Eager TensorHandles maintain a list of mirrors for any copies to local or remote devices. This avoids any redundant copies due to op execution.

    • For tf.Tensor & tf.Variable, .experimental_ref() is no longer experimental and is available as simply .ref().

    • Support matrix inverse and solves in pfor/vectorized_map.

    • Set as much partial shape as we can infer statically within the gradient impl of the gather op.

    • Gradient of tf.while_loop emits StatelessWhile op if cond and body functions are stateless. This allows multiple gradient while-loop ops to run in parallel under a distribution strategy.

    • Speed up GradientTape in eager mode by auto-generating list of op inputs/outputs which are unused and hence not cached for gradient functions.

    • Support back_prop=False in while_v2 but mark it as deprecated.

    • Improve error message when attempting to use None in data-dependent control flow.

    • Add RaggedTensor.numpy().

    • Update RaggedTensor.__getitem__ to preserve uniform dimensions & allow indexing into uniform dimensions.

    • Update tf.expand_dims to always insert the new dimension as a non-ragged dimension.

    • Update tf.embedding_lookup to use partition_strategy and max_norm when ids is ragged.

    • Allow batch_dims==rank(indices) in tf.gather.

    • Add support for bfloat16 in tf.print.

  • tf.distribute:

    • Support embedding_column with variable-length input features for MultiWorkerMirroredStrategy.

  • tf.keras:

    • Added all_reduce_sum_gradients argument to tf.keras.optimizers.Optimizer.apply_gradients. This allows custom gradient aggregation and processing of aggregated gradients in a custom training loop.

    • Allow pathlib.Path paths for loading models via Keras API.

  • tf.function/AutoGraph:

    • AutoGraph is now available in ReplicaContext.merge_call, Strategy.extended.update and Strategy.extended.update_non_slot.

    • Experimental support for shape invariants has been enabled in tf.function. See the API docs for tf.autograph.experimental.set_loop_options for additional info.

    • AutoGraph error messages now exclude frames corresponding to APIs internal to AutoGraph.

    • Improve shape inference for tf.function input arguments to unlock more Grappler optimizations in TensorFlow 2.x.

    • Improve automatic control dependency management of resources by allowing resource reads to occur in parallel and synchronizing only on writes.

    • Fix execution order of multiple stateful calls to experimental_run_v2 in tf.function.

    • You can now iterate over RaggedTensors using a for loop inside tf.function.

  • tf.lite:

    • Migrated the tf.lite C inference API out of experimental into lite/c.

    • Add an option to disallow NNAPI CPU / partial acceleration on Android 10

    • TFLite Android AARs now include the C headers and APIs that are required to use TFLite from native code.

    • Refactors the delegate and delegate kernel sources to allow usage in the linter.

    • Limit delegated ops to actually supported ones if a device name is specified or NNAPI CPU Fallback is disabled.

    • TFLite now supports the tf.math.reciprocal op by lowering it to the tf.div op.

    • TFLite's unpack op now supports boolean tensor inputs.

    • Microcontroller and embedded code moved from experimental to main TensorFlow Lite folder

    • Check for large TFLite tensors.

    • Fix GPU delegate crash with C++17.

    • Add 5D support to TFLite strided_slice.

    • Fix error in delegation of DEPTH_TO_SPACE to NNAPI causing op not to be accelerated.

    • Fix segmentation fault when running a model with LSTM nodes using NNAPI Delegate

    • Fix NNAPI delegate failure when an operand for Maximum/Minimum operation is a scalar.

    • Fix NNAPI delegate failure when Axis input for reduce operation is a scalar.

    • Expose option to limit the number of partitions that will be delegated to NNAPI.

    • If a target accelerator is specified, use its feature level to determine operations to delegate instead of SDK version.

  • tf.random:

    • Add a fast path for default random_uniform

    • random_seed documentation improvement.

    • RandomBinomial broadcasts and appends the sample shape to the left rather than the right.

    • Various random number generation improvements:

      • Added tf.random.stateless_binomial, tf.random.stateless_gamma, tf.random.stateless_poisson.

      • tf.random.stateless_uniform now supports unbounded sampling of int types.

  • Math and Linear Algebra:

    • Add tf.linalg.LinearOperatorTridiag.

    • Add LinearOperatorBlockLowerTriangular

    • Add broadcasting support to tf.linalg.triangular_solve (#26204) and tf.math.invert_permutation.

    • Add tf.math.sobol_sample op.

    • Add tf.math.xlog1py.

    • Add tf.math.special.{dawsn,expi,fresnel_cos,fresnel_sin,spence}.

    • Add a Modified Discrete Cosine Transform (MDCT) and its inverse to tf.signal.

  • TPU Enhancements:

    • Refactor TpuClusterResolver to move shared logic to a separate pip package.

    • Support configuring TPU software version from cloud tpu client.

    • Allowed TPU embedding weight decay factor to be multiplied by learning rate.

  • XLA Support:

    • Add standalone XLA AOT runtime target + relevant .cc sources to pip package.

    • Add check for memory alignment to MemoryAllocation::MemoryAllocation() on 32-bit ARM. This ensures a deterministic early exit instead of a hard to debug bus error later.

    • saved_model_cli aot_compile_cpu allows you to compile saved models to XLA header+object files and include them in your C++ programs.

    • Enable Igamma, Igammac for XLA.

    • XLA reduction emitter is deterministic when the environment variable TF_DETERMINISTIC_OPS is set.

  • Tracing and Debugging:

    • Add source, destination name to _send traceme to allow easier debugging.

    • Add traceme event to fastpathexecute.

  • Other:

    • Fix an issue with AUC.reset_states for multi-label AUC (#35852).

    • Fix the TF upgrade script to not delete files when there is a parsing error and the output mode is in-place.

    • Move tensorflow/core:framework/*_pyclif rules to tensorflow/core/framework:*_pyclif.


Reference: TensorFlow releases https://github.com/tensorflow/tensorflow/releases

