Previously, accessing the `op` attribute on the return value of AutoCastVariable.assign in eager mode would raise an AttributeError instead of returning None. Accessing the `op` attribute on an AutoCastVariable itself (i.e. not on the return value of `assign`) still raises an AttributeError, for consistency with tf.Variable.
Resolves https://github.com/tensorflow/tensorflow/issues/45536.
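A minimal sketch of the fixed behavior (assumes the mixed_float16 policy so that the layer's kernel is an AutoCastVariable):

import tensorflow as tf

tf.keras.mixed_precision.set_global_policy("mixed_float16")
layer = tf.keras.layers.Dense(2)
layer.build((None, 2))  # layer.kernel is an AutoCastVariable under this policy.

result = layer.kernel.assign(tf.zeros((2, 2)))
print(result.op)  # Previously raised AttributeError in eager mode; now prints None.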
PiperOrigin-RevId: 347524886
Change-Id: I663731c0ff4c557608eae352096a527e4dcabb18
The docstrings have been reworded to make them clearer and more concise. Some clarifying information has also been added.
I removed some examples for uncommon use cases in order to shorten the docstrings, and rewrote or shortened other examples to make them easier and faster to read.
All references to experimental mixed precision APIs have been changed to use the non-experimental APIs. Examples now use the newly added attribute Layer.dtype_policy as well.
The section "How to use float64 in a Keras model" has been removed. Float64 can be enabled by setting floatx to float64 so I don't think its necessary to mention it in the policy section. Also, it's fairly obvious after reading the Policy docstring how to use float64 using policies: Just set the global policy to "float64".
I intend on cherrypicking this change into TF 2.4.
PiperOrigin-RevId: 339780420
Change-Id: I5f6ad44f54964114c398c306d2d7a4da39bb1c54
The metadata field will no longer be used by Keras. Since Keras is the only consumer of the metadata field, the field will be deprecated shortly.
PiperOrigin-RevId: 338353130
Change-Id: I762b7b223255966c78b5b362b0d07ec27351bb42
This way, if the user calls set_shape on the inputs within the function body, the serialized signature is not affected.
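A minimal sketch of the behavior (module and path names are illustrative):

import tensorflow as tf

class MyModule(tf.Module):
  @tf.function(input_signature=[tf.TensorSpec([None, 3], tf.float32)])
  def f(self, x):
    x.set_shape([1, 3])  # Refining the input shape inside the function body...
    return x * 2.0

m = MyModule()
tf.saved_model.save(m, "/tmp/my_module")
# ...no longer affects the serialized signature, which still advertises [None, 3].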
PiperOrigin-RevId: 338345710
Change-Id: Ie31b9e2206de57aca4e592bbd43fafbff0d2bda6
This is a purely mechanical change. All that is done is:
* Deleted python/keras/mixed_precision/experimental/__init__.py
* All other files in python/keras/mixed_precision/experimental/ are moved one directory up, out of the experimental/ folder
* All Python imports, BUILD dependencies, and other references to the old experimental files are adjusted to refer to the new location
This changes the API golden files, but there is no API change. The golden files referred to the full paths of the classes in "is_instance" sections, and the full paths have changed.
PiperOrigin-RevId: 338345459
Change-Id: I9eefc2bea49b71f26ef7ec3563364a3f1d54abe6
The current tracing implementation has a limited scope of models and layers that can be traced. When users add a custom layer or model that is unsupported (e.g. with multiple tensor arguments), they'll come across an error that prevents them from saving the model entirely. With this option, those models can now be saved to the SavedModel format for serving or retraining.
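A sketch of how the option might be used (assuming it is exposed as a save_traces argument to model.save; the commit message does not name the flag here):

import tensorflow as tf

class AddLayer(tf.keras.layers.Layer):
  # A custom layer with multiple tensor arguments, which the default tracing
  # logic cannot handle.
  def call(self, a, b):
    return a + b

a_in = tf.keras.Input(shape=(4,))
b_in = tf.keras.Input(shape=(4,))
model = tf.keras.Model([a_in, b_in], AddLayer()(a_in, b_in))

# Skip tracing so the model can still be saved to the SavedModel format.
model.save("/tmp/model_without_traces", save_traces=False)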
PiperOrigin-RevId: 338295627
Change-Id: Ieea88ecaa1b8665df4ab45c96e882867e4308d88
tf.function tracing depends on the inputs to the function. For a typical training loop:
x, y = next(iterator)
train_fn(x, y)
it may retrace when it gets partial batches. This is problematic for multi-client training since different clients may retrace at different times. We assign collective instance keys when tracing a function, so retracing results in different sets of instance keys.
This change overrides the PerReplica type spec, which is used to calculate the function cache key. It tries to avoid retracing in common cases, but it doesn't guarantee that retracing won't happen.
Note that after this change, the function also gets partial shape information. This is the reason we only do it for multi-client strategies (MWMS), to avoid a performance penalty for e.g. TPU.
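A minimal sketch of the retracing behavior being mitigated (plain tensors are used for illustration; the actual change applies to PerReplica values):

import tensorflow as tf

@tf.function
def train_fn(x, y):
  return tf.reduce_sum(x) + tf.reduce_sum(y)

train_fn(tf.ones((32, 4)), tf.ones((32,)))  # First call traces the function.
train_fn(tf.ones((17, 4)), tf.ones((17,)))  # A partial batch has a different shape,
                                            # which triggers a retrace.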
PiperOrigin-RevId: 338203534
Change-Id: Iae9d6c3c82113d623707e19142fbebe5597d7898
Before this CL, Keras rebuilds objects from a SavedModel by overriding steps in the internal loader so that the loader calls a different function to load Keras layers and models. Other objects and variables are loaded using the core loading functions.
With load_partial added in cl/333129134, Keras objects and core objects can be loaded in separate steps. The Keras objects are loaded first and then passed as inputs to load_partial, which loads the other nodes in the SavedModel.
PiperOrigin-RevId: 338167915
Change-Id: I4d46d844552cf10d5d3cdcd32bcec60cc529c31e
This change exports the following class symbols, and adds relevant documentation and example code for them (a usage sketch follows the list):
tf.distribute.experimental.ParameterServerStrategy
tf.distribute.experimental.coordinator.ClusterCoordinator
tf.distribute.experimental.coordinator.PerWorkerValues
tf.distribute.experimental.coordinator.RemoteValue
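A usage sketch of the newly exported symbols (cluster setup is omitted; cluster_resolver is assumed to be configured elsewhere):

import tensorflow as tf

strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)
coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)

@tf.function
def worker_fn():
  return tf.constant(1.0)

remote_value = coordinator.schedule(worker_fn)  # Returns a RemoteValue.
coordinator.join()                              # Wait for all scheduled functions.
print(remote_value.fetch())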
PiperOrigin-RevId: 338151262
Change-Id: If2d1c513d30a999c728cecc2e73b75adda1948c2
The previous change may cause a use-after-free since StartAbort() runs in a separate thread but accesses resources owned by CollectiveExecutorMgr. Once all cancellation callbacks finish, the CollectiveExecutorMgr may already be deallocated while StartAbort() is in progress. Fixing the ownership is not trivial, so we now call StartAbort() in the cancellation callback instead to ensure all resources are valid. Note that with this we need to use TryDeregisterCallback in done() instead of DeregisterCallback(), because the latter blocks until all cancellation callbacks are done.
We used to always abort collective ops in the executor when there are errors in graph execution. However, some errors are intended for the user to catch, and if we abort collective ops, the user program cannot continue. It's also not necessary to abort collective ops if there are no active ones.
Ideally we should have a cancellation story for collectives. Before that, we can at least abort collectives only when necessary, i.e. when there are pending or failed collective ops.
To make the EOF-catching workflow work, we also need to make all collectives in gather depend on the input tensors, so there is a better chance they fire after the iterator's GetNext. Without that, the shape gathering may run in parallel with GetNext.
PiperOrigin-RevId: 337997169
Change-Id: I4a374f9ff00bdba38e012a96fb7f5837e049c85c
Currently both the Generator and CompositeTensor handlers could handle it, which causes errors like https://github.com/tensorflow/tensorflow/pull/43874.
PiperOrigin-RevId: 337987774
Change-Id: I706079fbe57e0e87687ceeb10e14e265a754e08e
`tf.distribute.Strategy.gather` and `tf.distribute.ReplicaContext.all_gather` are APIs to gather and concatenate `tf.distribute.DistributedValues` objects across workers and devices. They are counterparts in the cross-replica context and the replica context, respectively. These methods are implemented for all strategies except ParameterServerStrategy.
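A minimal sketch of the two methods (MirroredStrategy is used for illustration):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Cross-replica context: gather the per-replica values to the current device.
per_replica = strategy.run(lambda: tf.constant([[1.0, 2.0]]))
gathered = strategy.gather(per_replica, axis=0)

# Replica context: every replica receives the concatenation of all replicas' values.
def replica_fn():
  ctx = tf.distribute.get_replica_context()
  return ctx.all_gather(tf.constant([[3.0]]), axis=0)

all_gathered = strategy.run(replica_fn)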
PiperOrigin-RevId: 337972679
Change-Id: I1d61c96b830683da135d5b4e89da29693c51262c
Additionally, the following attributes are added to Layer: `dtype_policy`, `compute_dtype`, `variable_dtype`.
The `inner_optimizer` attribute is added to LossScaleOptimizer.
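A short sketch of the new attributes (assuming the global policy is set to "mixed_float16"):

import tensorflow as tf

tf.keras.mixed_precision.set_global_policy("mixed_float16")

layer = tf.keras.layers.Dense(4)
print(layer.dtype_policy)    # The layer's Policy, here "mixed_float16".
print(layer.compute_dtype)   # float16
print(layer.variable_dtype)  # float32

opt = tf.keras.mixed_precision.LossScaleOptimizer(tf.keras.optimizers.SGD())
print(opt.inner_optimizer)   # The wrapped SGD optimizer.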
This change follows the mixed precision RFC: https://github.com/tensorflow/community/pull/293
I'll move the mixed_precision folder out of the experimental folder in a subsequent change. That change will have no functional impact.
I also removed the "About the layer's `dtype` attribute" section from the base Layer docstring since it didn't properly describe mixed precision. I added some of the information to the Arguments section, which links to the Policy docstring for a complete description of layer dtypes. In a future change, I'll add a paragraph which better describes how layers use dtypes.
PiperOrigin-RevId: 337968442
Change-Id: I2738862faaabec14fe6675ea9f34075a5e56426a
The RFC does have this property (https://github.com/tensorflow/community/pull/293) but I don't think this property is very useful, and there are no uses of it within Google outside Keras, so it should be removed.
PiperOrigin-RevId: 337950640
Change-Id: I64c27589e87e4bf8f3f9c7fe38150703d914e804
LossScale and its subclasses are deprecated and will be removed from the TF 2 namespace in TensorFlow 2.5. LossScale will still be accessible under the tf.compat.v1 namespace, and this change makes it non-experimental there, exporting it as `tf.compat.v1.mixed_precision.LossScale`. LossScale cannot be removed from the tf.compat.v1 namespace since it's used by the V1-only class tf.compat.v1.train.experimental.MixedPrecisionLossScaleOptimizer.
LossScaleOptimizer previously used a LossScale, but now it directly performs loss scaling within the class itself. Additionally a new non-experimental `tf.keras.mixed_precision.LossScaleOptimizer` has been introduced. Unlike the experimental LossScaleOptimizer, the non-experimental LossScaleOptimizer does not accept a LossScale but instead has different constructor arguments to specify the type of loss scaling to be done. The old experimental LossScaleOptimizer will be removed in TensorFlow 2.5, at which point a LossScale cannot be used with any Keras LossScaleOptimizer.
Internally, LossScaleOptimizer uses a fork of DynamicLossScale called _DynamicLossScaleState, but this is not exposed to the user. In the future, _DynamicLossScaleState will be merged into LossScaleOptimizer.
LossScaleOptimizer now exposes some attributes that DynamicLossScale previously did. "increment_period" is renamed to "dynamic_growth_steps" for consistency with `ExponentialDecay.decay_steps`. `num_good_steps` is replaced by `dynamic_counter`.
LossScaleOptimizer.loss_scale is now a tensor, not a LossScale. This means the previous way of getting the loss scale as a tensor (calling `optimizer.loss_scale()`) will raise an error instead. I don't know of any users who do this, so I do not anticipate any breakages.
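A brief sketch of the new constructor arguments and renamed attributes:

import tensorflow as tf

opt = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(), dynamic=True,
    initial_scale=2 ** 15, dynamic_growth_steps=2000)

print(opt.loss_scale)            # A tensor (e.g. 32768.0), not a LossScale object.
print(opt.dynamic_growth_steps)  # 2000, formerly "increment_period".
print(opt.dynamic_counter)       # 0, formerly "num_good_steps".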
Policy previously had an instance of a LossScale, and optionally took a LossScale in the constructor. By default, the "mixed_float16" policy had a DynamicLossScale, while all other policies had no loss scale. Now, Policy no longer has a loss scale or takes an instance of a loss scale. To temporarily preserve backwards compatibility with the old API, the symbol `tf.keras.mixed_precision.experimental.Policy` still takes and holds a LossScale, as it did before. A new non-experimental symbol, `tf.keras.mixed_precision.Policy`, removes the use of the LossScale. The old experimental symbol will be removed in the future.
When deserializing a layer or model with an old experimental policy, it will be restored as the new policy and the loss scale will be silently dropped. This is to preserve SavedModel compatibility with models saved in TensorFlow 2.3 and restored in future versions of TensorFlow once the old experimental Policy is removed. Luckily, dropping the loss scale is unlikely to break anyone, as a bug in the mixed precision API causes models to not save their dtype policies at all when being serialized. Similarly, when deserializing a model with the old experimental LossScaleOptimizer, it will be restored as the new LossScaleOptimizer but unlike the policy case, nothing is silently dropped.
This change is different from what is described in the mixed precision RFC (https://github.com/tensorflow/community/pull/293), but I think this API is a lot clearer and simpler than the API in the RFC. The RFC forked the LossScale classes into Keras, but I now think it's better to simply not use them and expose LossScale under tf.compat.v1 only. This new API was designed based on feedback from @fchollet and @omalleyt12. I will retroactively update the RFC to reflect this API.
PiperOrigin-RevId: 337938270
Change-Id: Id7bb3bb89eb2143e5fadabeb2f57d1f8267379b3
This allows us the flexibility to later remove AutoCastVariable and instead have a mechanism so that individual ops will cast variables (and potentially other tensors) to the correct dtype. See the last paragraph of this section of the mixed precision RFC (8563574455/rfcs/20200929-keras-mixed-precision.md (op-based-autocasting-api)) for an example of how this could be done.
PiperOrigin-RevId: 337793570
Change-Id: I8e56f7d276117a9a81070ab0984369e8a4490eea
The current tracing implementation has a limited scope of models and layers that can be traced. When users add a custom layer or model that is unsupported (e.g. with multiple tensor arguments), they'll come across an error that prevents them from saving the model entirely. With this option, those models can now be saved to the SavedModel format for serving or retraining.
PiperOrigin-RevId: 337540358
Change-Id: I81f1436d10ae7e9de1b6597c3b8d1be7f8a386f8
We used to always abort collective ops in the executor when there are errors in graph execution. However, some errors are intended for the user to catch, and if we abort collective ops, the user program cannot continue. It's also not necessary to abort collective ops if there are no active ones.
Ideally we should have a cancellation story for collectives. Before that, we can at least abort collectives only when necessary, i.e. when there are pending or failed collective ops.
To make the EOF-catching workflow work, we also need to make all collectives in gather depend on the input tensors, so there is a better chance they fire after the iterator's GetNext. Without that, the shape gathering may run in parallel with GetNext.
PiperOrigin-RevId: 337440792
Change-Id: I7caea917c858bcf99f6eb471abf46d94d5c255b3