AI editor: Xiaojiang
Google has just released TensorFlow 2.2 at the TensorFlow Dev Summit. The 2.2 release updates many things, but two changes in particular should delight everyone:
1. A synchronized BatchNormalization layer
The synchronized BN layer, tf.keras.layers.experimental.SyncBatchNormalization, is a great helper for distributed training, and its interface is similar to the existing BatchNormalization layer:
tf.keras.layers.experimental.SyncBatchNormalization(
axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True,
beta_initializer='zeros', gamma_initializer='ones',
moving_mean_initializer='zeros', moving_variance_initializer='ones',
beta_regularizer=None, gamma_regularizer=None, beta_constraint=None,
gamma_constraint=None, renorm=False, renorm_clipping=None, renorm_momentum=0.99,
trainable=True, adjustment=None, name=None, **kwargs
)
Usage is as follows:
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
  model = tf.keras.Sequential()
  model.add(tf.keras.layers.Dense(16))
  model.add(tf.keras.layers.experimental.SyncBatchNormalization())
2. Model.fit supports custom training and testing logic
Model.fit now supports overriding the Model.train_step method, which lets us implement custom training logic. Here is the default implementation for reference:
def train_step(self, data):
  """The logic for one training step.

  This method can be overridden to support custom training logic.
  This method is called by `Model._make_train_function`.

  This method should contain the mathematical logic for one step of training.
  This typically includes the forward pass, loss calculation, backpropagation,
  and metric updates.

  Configuration details for *how* this logic is run (e.g. `tf.function` and
  `tf.distribute.Strategy` settings) should be left to
  `Model._make_train_function`, which can also be overridden.

  Arguments:
    data: A nested structure of `Tensor`s.

  Returns:
    A `dict` containing values that will be passed to
    `tf.keras.callbacks.CallbackList.on_train_batch_end`. Typically, the
    values of the `Model`'s metrics are returned. Example:
    `{'loss': 0.2, 'accuracy': 0.7}`.
  """
  # These are the only transformations `Model.fit` applies to user-input
  # data when a `tf.data.Dataset` is provided. These utilities will be
  # exposed publicly.
  data = data_adapter.expand_1d(data)
  x, y, sample_weight = data_adapter.unpack_x_y_sample_weight(data)

  with backprop.GradientTape() as tape:
    y_pred = self(x, training=True)
    loss = self.compiled_loss(
        y, y_pred, sample_weight, regularization_losses=self.losses)
  # For custom training steps, users can just write:
  #   trainable_variables = self.trainable_variables
  #   gradients = tape.gradient(loss, trainable_variables)
  #   self.optimizer.apply_gradients(zip(gradients, trainable_variables))
  # The _minimize call does a few extra steps unnecessary in most cases,
  # such as loss scaling and gradient clipping.
  _minimize(tape, self.optimizer, loss, self.trainable_variables)

  self.compiled_metrics.update_state(y, y_pred, sample_weight)
  return {m.name: m.result() for m in self.metrics}
The benefit is that we can now use Model.fit far more flexibly to train our own models. Model also provides Model.test_step and Model.predict_step for customizing evaluation and prediction logic. I think this will definitely appeal to TFers.
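For example, here is a minimal sketch of a subclass that overrides train_step with a plain GradientTape loop; the tiny network and random data below are just placeholders, while callbacks, distribution strategies and epoch looping are still handled by fit():

import tensorflow as tf

class CustomModel(tf.keras.Model):
    def train_step(self, data):
        # Assumes batches of (features, labels) without sample weights.
        x, y = data
        with tf.GradientTape() as tape:
            y_pred = self(x, training=True)
            loss = self.compiled_loss(y, y_pred, regularization_losses=self.losses)
        gradients = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(gradients, self.trainable_variables))
        self.compiled_metrics.update_state(y, y_pred)
        return {m.name: m.result() for m in self.metrics}

inputs = tf.keras.Input(shape=(8,))
outputs = tf.keras.layers.Dense(1)(inputs)
model = CustomModel(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(tf.random.normal((32, 8)), tf.random.normal((32, 1)), epochs=1)

The same pattern applies to Model.test_step and Model.predict_step.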
Below are more highlights from the official TensorFlow 2.2.0 release notes:
- Replaced the scalar type for string tensors from std::string to tensorflow::tstring, which is now ABI stable.
- A new Profiler for TF 2 for CPU/GPU/TPU. It offers both device and host performance analysis, including input pipeline and TF Ops. Optimization advisory is provided whenever possible. Please see this tutorial for usage guidelines.
- Export C++ functions to Python using pybind11 as opposed to SWIG as a part of our deprecation of swig efforts.
- tf.distribute:
  - Update NVIDIA NCCL to 2.5.7-1 for better performance and performance tuning. Please see the NCCL developer guide for more information on this.
  - Support gradient allreduce in float16. See this example usage.
  - Experimental support of all-reduce gradient packing to allow overlapping gradient aggregation with backward path computation.
  - Support added for global sync BatchNormalization by using the newly added tf.keras.layers.experimental.SyncBatchNormalization layer. This layer will sync BatchNormalization statistics every step across all replicas taking part in sync training.
  - Performance improvements for GPU multi-worker distributed training using tf.distribute.experimental.MultiWorkerMirroredStrategy.
- tf.keras:
  - Model.fit major improvements:
    - You can now use custom training logic with Model.fit by overriding Model.train_step.
    - Easily write state-of-the-art training loops without worrying about all of the features Model.fit handles for you (distribution strategies, callbacks, data formats, looping logic, etc.).
    - See the default Model.train_step for an example of what this function should look like.
    - Same applies for validation and inference via Model.test_step and Model.predict_step.
  - The SavedModel format now supports all Keras built-in layers (including metrics, preprocessing layers, and stateful RNN layers).
- tf.lite:
  - Enable TFLite experimental new converter by default.
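The conversion entry points themselves are unchanged; a minimal sketch (the Keras model below is just a placeholder) that exercises the new default converter:

import tensorflow as tf

# Placeholder model; any Keras model works the same way.
model = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,))])

# In TF 2.2 this goes through the new (MLIR-based) converter by default.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_bytes = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)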
- XLA:
  - XLA now builds and works on Windows. All prebuilt packages come with XLA available.
  - XLA can be enabled for a tf.function with "compile or throw exception" semantics on CPU and GPU.
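A minimal sketch of the "compile or throw exception" behaviour, assuming the TF 2.2-era experimental_compile argument of tf.function (renamed jit_compile in later releases):

import tensorflow as tf

# experimental_compile=True asks XLA to compile the function and raises
# an error if XLA compilation is not possible.
@tf.function(experimental_compile=True)
def dense_layer(x, w, b):
    return tf.nn.relu(tf.matmul(x, w) + b)

x = tf.random.normal((4, 8))
w = tf.random.normal((8, 16))
b = tf.zeros((16,))
print(dense_layer(x, w, b).shape)  # (4, 16)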
- tf.keras:
  - In tf.keras.applications the name of the "top" layer has been standardized to "predictions". This is only a problem if your code relies on the exact name of the layer.
  - The Huber loss function has been updated to be consistent with other Keras losses. It now computes the mean over the last axis of per-sample losses before applying the reduction function.
- AutoGraph no longer converts functions passed to tf.py_function, tf.py_func and tf.numpy_function.
- Deprecating XLA_CPU and XLA_GPU devices with this release.
- Increasing the minimum bazel version to build TF to 1.2.1 to use Bazel's cc_experimental_shared_library.
- MacOS binaries are not available on PyPI under the tensorflow-cpu project; the binaries in the tensorflow project are identical anyway, since MacOS has no GPU.
- tf.data:
  - Removed autotune_algorithm from experimental optimization options.
- TF Core:
  - tf.constant always creates CPU tensors irrespective of the current device context.
  - Eager TensorHandles maintain a list of mirrors for any copies to local or remote devices. This avoids any redundant copies due to op execution.
  - For tf.Tensor & tf.Variable, .experimental_ref() is no longer experimental and is available as simply .ref() (see the sketch after this list).
  - Support matrix inverse and solves in pfor/vectorized_map.
  - Set as much partial shape as we can infer statically within the gradient impl of the gather op.
  - Gradient of tf.while_loop emits StatelessWhile op if cond and body functions are stateless. This allows multiple gradient while ops to run in parallel under distribution strategy.
  - Speed up GradientTape in eager mode by auto-generating the list of op inputs/outputs which are unused and hence not cached for gradient functions.
  - Support back_prop=False in while_v2 but mark it as deprecated.
  - Improve error message when attempting to use None in data-dependent control flow.
  - Add RaggedTensor.numpy() (see the sketch after this list).
  - Update RaggedTensor.__getitem__ to preserve uniform dimensions & allow indexing into uniform dimensions.
  - Update tf.expand_dims to always insert the new dimension as a non-ragged dimension.
  - Update tf.embedding_lookup to use partition_strategy and max_norm when ids is ragged.
  - Allow batch_dims==rank(indices) in tf.gather (see the sketch after this list).
  - Add support for bfloat16 in tf.print.
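A few of the TF Core items above are easiest to see in a short snippet; the sketch below (with made-up values) shows .ref(), RaggedTensor.numpy(), and tf.gather with batch_dims == rank(indices):

import tensorflow as tf

# .ref() gives a hashable reference, e.g. for using tensors as dict keys.
t = tf.constant([1.0, 2.0])
cache = {t.ref(): "some value"}
assert cache[t.ref()] == "some value"
assert t.ref().deref() is t  # deref() returns the original tensor

# RaggedTensor.numpy() materializes ragged rows as nested numpy arrays.
rt = tf.ragged.constant([[1, 2, 3], [4]])
print(rt.numpy())  # array of per-row arrays

# batch_dims == rank(indices): pick one element per batch row.
params = tf.constant([[1, 2, 3], [4, 5, 6]])
indices = tf.constant([2, 0])
print(tf.gather(params, indices, batch_dims=1))  # [3, 4]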
- tf.distribute:
  - Support embedding_column with variable-length input features for MultiWorkerMirroredStrategy.
- tf.keras:
  - Added all_reduce_sum_gradients argument to tf.keras.optimizer.Optimizer.apply_gradients. This allows custom gradient aggregation and processing of aggregated gradients in a custom training loop.
  - Allow pathlib.Path paths for loading models via the Keras API.
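A minimal sketch of the pathlib.Path change; the model and directory name below are just placeholders:

import pathlib
import tensorflow as tf

# Build and save a tiny model first (placeholder path).
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.save("tiny_model", save_format="tf")

# pathlib.Path objects are now accepted directly when loading.
restored = tf.keras.models.load_model(pathlib.Path("tiny_model"))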
- tf.function/AutoGraph:
  - AutoGraph is now available in ReplicaContext.merge_call, Strategy.extended.update and Strategy.extended.update_non_slot.
  - Experimental support for shape invariants has been enabled in tf.function. See the API docs for tf.autograph.experimental.set_loop_options for additional info.
  - AutoGraph error messages now exclude frames corresponding to APIs internal to AutoGraph.
  - Improve shape inference for tf.function input arguments to unlock more Grappler optimizations in TensorFlow 2.x.
  - Improve automatic control dependency management of resources by allowing resource reads to occur in parallel and synchronizing only on writes.
  - Fix execution order of multiple stateful calls to experimental_run_v2 in tf.function.
  - You can now iterate over RaggedTensors using a for loop inside tf.function.
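A minimal sketch of ragged iteration under tf.function, assuming each iterated row arrives as a dense 1-D tensor (the data is made up for illustration):

import tensorflow as tf

@tf.function
def total_sum(rt):
    # AutoGraph converts this Python for loop over ragged rows into graph ops.
    total = tf.constant(0, dtype=rt.dtype)
    for row in rt:
        total += tf.reduce_sum(row)
    return total

rt = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])
print(total_sum(rt))  # 21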
- tf.lite:
  - Migrated the tf.lite C inference API out of experimental into lite/c.
  - Add an option to disallow NNAPI CPU / partial acceleration on Android 10.
  - TFLite Android AARs now include the C headers and APIs required to use TFLite from native code.
  - Refactors the delegate and delegate kernel sources to allow usage in the linter.
  - Limit delegated ops to actually supported ones if a device name is specified or NNAPI CPU fallback is disabled.
  - TFLite now supports the tf.math.reciprocal1 op by lowering to the tf.div op.
  - TFLite's unpack op now supports boolean tensor inputs.
  - Microcontroller and embedded code moved from experimental to the main TensorFlow Lite folder.
  - Check for large TFLite tensors.
  - Fix GPU delegate crash with C++17.
  - Add 5D support to TFLite strided_slice.
  - Fix error in delegation of DEPTH_TO_SPACE to NNAPI causing the op not to be accelerated.
  - Fix segmentation fault when running a model with LSTM nodes using the NNAPI delegate.
  - Fix NNAPI delegate failure when an operand for a Maximum/Minimum operation is a scalar.
  - Fix NNAPI delegate failure when the Axis input for a reduce operation is a scalar.
  - Expose option to limit the number of partitions that will be delegated to NNAPI.
  - If a target accelerator is specified, use its feature level to determine operations to delegate instead of the SDK version.
- tf.random:
  - Add a fast path for default random_uniform.
  - random_seed documentation improvement.
  - RandomBinomial broadcasts and appends the sample shape to the left rather than the right.
  - Various random number generation improvements:
    - Added tf.random.stateless_binomial, tf.random.stateless_gamma, tf.random.stateless_poisson.
    - tf.random.stateless_uniform now supports unbounded sampling of int types.
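A minimal sketch of the new stateless samplers; the shapes, seeds, and distribution parameters below are arbitrary, and the argument names reflect my reading of the TF 2.2 API, so treat them as assumptions:

import tensorflow as tf

seed = [1, 2]  # stateless RNGs take an explicit [int, int] seed

# Same seed -> same samples, which makes results reproducible across calls.
binom = tf.random.stateless_binomial(shape=[5], seed=seed, counts=10., probs=0.3)
gamma = tf.random.stateless_gamma(shape=[5], seed=seed, alpha=2.0)
poisson = tf.random.stateless_poisson(shape=[5], seed=seed, lam=4.0)
print(binom, gamma, poisson, sep="\n")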
- Math and Linear Algebra:
  - Add tf.linalg.LinearOperatorTridiag.
  - Add LinearOperatorBlockLowerTriangular.
  - Add broadcasting support to tf.linalg.triangular_solve (#26204) and tf.math.invert_permutation.
  - Add tf.math.sobol_sample op (see the sketch after this list).
  - Add tf.math.xlog1py.
  - Add tf.math.special.{dawsn,expi,fresnel_cos,fresnel_sin,spence}.
  - Add a Modified Discrete Cosine Transform (MDCT) and its inverse to tf.signal.
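A short sketch of tf.math.xlog1py, tf.math.sobol_sample, and the new MDCT pair; the argument names and defaults here reflect my reading of the TF 2.2 API, so treat the details as assumptions:

import tensorflow as tf

# xlog1py(x, y) computes x * log1p(y), with the convention 0 * log1p(y) = 0.
print(tf.math.xlog1py(3.0, 2.0))  # 3 * log(3)

# Quasi-random Sobol points: 4 samples in the 2-D unit hypercube.
print(tf.math.sobol_sample(dim=2, num_results=4))

# MDCT and its inverse; frame_length is assumed to need to be a multiple of 4.
signal = tf.random.normal([1, 64])
coeffs = tf.signal.mdct(signal, frame_length=16)
recovered = tf.signal.inverse_mdct(coeffs)
print(coeffs.shape, recovered.shape)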
- TPU Enhancements:
  - Refactor TpuClusterResolver to move shared logic to a separate pip package.
  - Support configuring TPU software version from cloud tpu client.
  - Allowed TPU embedding weight decay factor to be multiplied by learning rate.
- XLA Support:
  - Add standalone XLA AOT runtime target + relevant .cc sources to pip package.
  - Add check for memory alignment to MemoryAllocation::MemoryAllocation() on 32-bit ARM. This ensures a deterministic early exit instead of a hard-to-debug bus error later.
  - saved_model_cli aot_compile_cpu allows you to compile saved models to XLA header+object files and include them in your C++ programs.
  - Enable Igamma, Igammac for XLA.
  - The XLA reduction emitter is deterministic when the environment variable TF_DETERMINISTIC_OPS is set.
- Tracing and Debugging:
  - Add source, destination name to _send traceme to allow easier debugging.
  - Add traceme event to fastpathexecute.
- Other:
  - Fix an issue with AUC.reset_states for multi-label AUC (#35852).
  - Fix the TF upgrade script to not delete files when there is a parsing error and the output mode is in-place.
  - Move tensorflow/core:framework/*_pyclif rules to tensorflow/core/framework:*_pyclif.
Reference: TensorFlow releases https://github.com/tensorflow/tensorflow/releases