Commit Graph

4950 Commits

Author SHA1 Message Date
Mihai Maruseac
e9c0ef3064
Merge pull request from geetachavan1/cherrypicks_ZX1AI
[Cherrypick:r2.4] Fix issue when using mixed precision with RMSprop.
2021-01-19 09:21:17 -08:00
Lukas Geiger
2239f71ef0 Fix unittest error message assertion 2021-01-11 23:59:15 +01:00
Reed Wanderman-Milne
ef97e90d60 Fix issue when using mixed precision with RMSprop.
Before, accessing the `op` attribute on the return value of AutoCastVariable.assign in Eager mode would raise an AttributeError instead of returning None. Accessing the `op` attribute on an AutoCastVariable itself (that is not the return value of `assign`) still raises an AttributeError, to be consistent with tf.Variable.

Resolves https://github.com/tensorflow/tensorflow/issues/45536.
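
A minimal sketch of the fixed behavior, assuming TF 2.4 with this cherrypick applied (the Dense layer is just an easy way to obtain an AutoCastVariable):

import tensorflow as tf

tf.keras.mixed_precision.set_global_policy('mixed_float16')

layer = tf.keras.layers.Dense(1)
layer.build((None, 4))
var = layer.kernel                      # an AutoCastVariable under mixed precision

result = var.assign(tf.zeros((4, 1)))
print(result.op)                        # eager mode: None instead of AttributeError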

PiperOrigin-RevId: 347524886
Change-Id: I663731c0ff4c557608eae352096a527e4dcabb18
2021-01-11 12:39:18 -08:00
Lukas Geiger
c4e6c635de Keras SavedModel: Ignore custom metrics failure when compile=False 2021-01-11 12:03:01 +01:00
Kathy Wu
45967fee60 Revert "Warn users when saving SavedModel with metadata."
This reverts commit 0ed710fb76.
2020-12-03 16:55:31 -08:00
Kathy Wu
5190387363 Revert "Save Keras metadata in a separate proto and raise deprecation warnings when loading a SavedModel with tf.saved_model.save()."
This reverts commit 87fc5a0509.
2020-12-03 16:55:21 -08:00
Katherine Wu
87fc5a0509 Save Keras metadata in a separate proto and raise deprecation warnings when loading a SavedModel with tf.saved_model.save().
PiperOrigin-RevId: 339760831
Change-Id: I8980807eb4f2f0f1a8c4420b7e4c386842f5ebf9
2020-12-01 13:31:01 -08:00
Reed Wanderman-Milne
2cf3cc466c Merging 2020-11-16 16:53:26 -08:00
Reed Wanderman-Milne
9765b5ab7f Improve mixed precision docstrings.
The docstrings have been reworded to make them more clear and concise. Some clarifying information is also added.

I removed some examples for uncommon use cases in order to shorten the docstrings, and rewrote or shortened other examples to make them easier and faster to read.

All references to experimental mixed precision APIs have been changed to use the non-experimental APIs. Examples now use the newly added attribute Layer.dtype_policy as well.

The section "How to use float64 in a Keras model" has been removed. Float64 can be enabled by setting floatx to float64 so I don't think its necessary to mention it in the policy section. Also, it's fairly obvious after reading the Policy docstring how to use float64 using policies: Just set the global policy to "float64".

I intend to cherry-pick this change into TF 2.4.

PiperOrigin-RevId: 339780420
Change-Id: I5f6ad44f54964114c398c306d2d7a4da39bb1c54
2020-11-04 11:22:31 -08:00
Yanhui Liang
9d266e05ac Create benchmarks for Conv2D and LSTM layers.
PiperOrigin-RevId: 338385224
Change-Id: I53fa885addc52c70109ff1ad44fc974d259a60b1
2020-10-21 20:04:26 -07:00
A. Unique TensorFlower
4e1e1499fe Disable a few failing tests on py3.8 version
PiperOrigin-RevId: 338381183
Change-Id: I95b58ff09033376936e05364c6ec35ec38389ea0
2020-10-21 19:31:09 -07:00
A. Unique TensorFlower
ec8ef1a4f2 Save Keras metadata in a separate folder and raise deprecation warnings when loading a SavedModel with tf.saved_model.save().
PiperOrigin-RevId: 338374188
Change-Id: I884ca90e9e3ed75e3b091dff6acb67c8db0d7e7b
2020-10-21 18:36:59 -07:00
Chenkai Kuang
239fe406d3 Modify some v2 initializers to be able to return a value that corresponds to a partition of the entire value. This is useful for efficiently initializing sharded variables where only a shard of the initial value is necessary at a time.
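
A hedged sketch of the idea, assuming the modified initializers accept partition_shape/partition_offset keyword arguments (the exact kwarg names and the set of supported initializers are assumptions based on this description):

import tensorflow as tf

init = tf.random_uniform_initializer()
# Materialize only the first of two row-shards of a (20, 4) value; the full
# (20, 4) tensor is never created.
shard = init(shape=(20, 4), partition_shape=(10, 4), partition_offset=(0, 0))
print(shard.shape)  # (10, 4)
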
PiperOrigin-RevId: 338371904
Change-Id: Ib4320d73cbaec30f5a61793debe7755026175781
2020-10-21 18:20:24 -07:00
Sanjoy Das
f8f820a77b Enable tensorflow/python/keras/layers:gru_v2_test on CUDA 11
PiperOrigin-RevId: 338363201
Change-Id: I41e8e421d72f581b2a701a06e4e8e19ca54afff1
2020-10-21 17:12:59 -07:00
Katherine Wu
8366f2ecea Save Keras metadata in a separate folder and raise deprecation warnings when loading a SavedModel with tf.saved_model.save().
PiperOrigin-RevId: 338359077
Change-Id: I93d8c345efb323cd8d4fd1fda4c8e5e86b37d620
2020-10-21 16:47:25 -07:00
Katherine Wu
0ed710fb76 Warn users when saving SavedModel with metadata.
The metadata field will no longer be used by Keras. Since Keras is the only consumer of the metadata field, it will be deprecated shortly.

PiperOrigin-RevId: 338353130
Change-Id: I762b7b223255966c78b5b362b0d07ec27351bb42
2020-10-21 16:04:49 -07:00
Katherine Wu
12d00c3e34 Serialize concrete function signature using structured_input_signature instead of function inputs.
This way, if the user calls set_shape on the inputs within the function body, the serialized signature is not affected.
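
A short illustration (a sketch, not the serialization code itself): structured_input_signature keeps the spec the function was traced with, even when the body tightens shapes with set_shape:

import tensorflow as tf

@tf.function
def f(x):
  x.set_shape([2])  # tighten the shape inside the body
  return x * 2.0

cf = f.get_concrete_function(tf.TensorSpec([None], tf.float32))
print(cf.structured_input_signature)
# ((TensorSpec(shape=(None,), dtype=tf.float32, name='x'),), {}) -- the traced spec, not [2]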

PiperOrigin-RevId: 338345710
Change-Id: Ie31b9e2206de57aca4e592bbd43fafbff0d2bda6
2020-10-21 15:25:57 -07:00
Reed Wanderman-Milne
642c3e8498 Move mixed precision files out of experimental/ directory.
This is a purely mechanical change. All that is done is:
* Deleted python/keras/mixed_precision/experimental/__init__.py
* All other files in python/keras/mixed_precision/experimental/ are moved one directory up, out of the experimental/ folder
* All Python imports, BUILD dependencies, and other references to the old experimental files are adjusted to refer to the new location

This changes the API golden files, but there is no API change. The golden files referred to the full paths of the classes in "is_instance" sections, and the full paths have changed.

PiperOrigin-RevId: 338345459
Change-Id: I9eefc2bea49b71f26ef7ec3563364a3f1d54abe6
2020-10-21 15:19:21 -07:00
Geeta Chavan
72fa7b2108 Removing skip condition as Windows py35 builds are no longer built.
PiperOrigin-RevId: 338343962
Change-Id: I0fdfd83dd1cf633613f7ec490456e236d1cfc1a9
2020-10-21 15:14:05 -07:00
A. Unique TensorFlower
08bf3f83f7 Disabling test failing under ASAN.
PiperOrigin-RevId: 338328301
Change-Id: I4c9dbe259a93e56adb92456a713d0eefeb0d813b
2020-10-21 13:41:12 -07:00
Yuefeng Zhou
898c3b6d33 Add a KPL test for PSStrategy with precomputed states.
PiperOrigin-RevId: 338319527
Change-Id: I91edcb84994b7d053dcf35c14fa791de26374bcf
2020-10-21 12:49:34 -07:00
Rick Chao
dbf191bb17 PSv2: Dedup the legacy ParameterServerStrategy class (as the estimator usage of it uses ParameterServerStrategyV1).
PiperOrigin-RevId: 338310081
Change-Id: Icff445e322b22ee4ac7f3e69327c7969444eeb93
2020-10-21 12:16:22 -07:00
Katherine Wu
259ffa9ea6 (rollforward) Add option to not save the traces when exporting to the SavedModel format.
The current tracing implementation has a limited scope of models and layers that can be traced. When users add a custom layer or model that is unsupported (e.g. with multiple tensor arguments), they'll come across an error that prevents them from saving the model entirely. With this option, those models can now be saved to the SavedModel format for serving or retraining.
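
A minimal sketch of the option, assuming it is exposed as a save_traces argument on Model.save (the model itself is an illustrative assumption):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
# Skip serializing traced call functions; custom layers then need
# get_config/from_config implemented to be restorable.
model.save('/tmp/my_model', save_traces=False)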

PiperOrigin-RevId: 338295627
Change-Id: Ieea88ecaa1b8665df4ab45c96e882867e4308d88
2020-10-21 11:28:48 -07:00
Mihai Maruseac
816a1177dc Disable broken Windows test
PiperOrigin-RevId: 338284225
Change-Id: I0ef2196b09d2ae4cfb9b9890e878db1d927fd74a
2020-10-21 11:00:35 -07:00
Ran Chen
9f51b98f0b [retry]DistributedDataset creates elements with fixed spec to help avoid retracing
tf.function tracing depends on the inputs to the function. For a typical training loop:

x, y = next(iterator)
train_fn(x, y)

it may retrace when getting partial batches. This is problematic for multi-client training, since different clients may retrace at different times. We assign collective instance keys when tracing a function, so retracing results in different sets of instance keys.

This change overrides the PerReplica type spec, which is used to calculate the function cache key. It tries to avoid retracing in common cases, but it doesn't guarantee that retracing won't happen.

Note that after this change, the function also gets partial shape information. This is the reason we only do it for multi-client strategies (MWMS): to avoid a performance penalty to e.g. TPU.

PiperOrigin-RevId: 338203534
Change-Id: Iae9d6c3c82113d623707e19142fbebe5597d7898
2020-10-20 22:48:17 -07:00
Katherine Wu
32acc39360 Refactor Keras SavedModel loading implementation.
Before this CL, Keras rebuilt objects from a SavedModel by overwriting steps in the internal loader, so that the loader called a different function to load Keras layers and models. Other objects and variables were loaded using the core loading functions.

With load_partial added in cl/333129134, Keras objects and core objects can be loaded in separate steps. The Keras objects are loaded first and passed as inputs to load_partial, which then loads the other nodes in the SavedModel.

PiperOrigin-RevId: 338167915
Change-Id: I4d46d844552cf10d5d3cdcd32bcec60cc529c31e
2020-10-20 17:18:56 -07:00
Meghna Natraj
e3b2f635b9 Refactor keras dependency to a common utility.
PiperOrigin-RevId: 338155812
Change-Id: Ibbd933514dbb76032a7c982a9233e183d311ab37
2020-10-20 16:06:05 -07:00
Rick Chao
32f35aabce PSv2: Export a few tf.distribute symbols related to TF2 parameter server training.
This change exports the following class symbols, and adds relevant documentation and example code for them (a brief usage sketch follows the list below):

tf.distribute.experimental.ParameterServerStrategy
tf.distribute.experimental.coordinator.ClusterCoordinator
tf.distribute.experimental.coordinator.PerWorkerValues
tf.distribute.experimental.coordinator.RemoteValue
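
A hedged sketch of how the pieces fit together (the cluster resolver, variable, and step function are illustrative assumptions; this requires a running cluster described by TF_CONFIG):

import tensorflow as tf

cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)
coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)

with strategy.scope():
  v = tf.Variable(0.0)

@tf.function
def step():
  v.assign_add(1.0)
  return v.read_value()

remote_value = coordinator.schedule(step)  # returns a RemoteValue
coordinator.join()                         # wait for all scheduled functions
print(remote_value.fetch())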

PiperOrigin-RevId: 338151262
Change-Id: If2d1c513d30a999c728cecc2e73b75adda1948c2
2020-10-20 15:42:17 -07:00
Ran Chen
d345c40688 [retry]Use cancellation manager to abort collectives
The previous change may cause a use-after-free, since StartAbort() runs in a separate thread but accesses resources owned by CollectiveExecutorMgr. Once all cancellation callbacks finish, the CollectiveExecutorMgr may already be deallocated while StartAbort() is in progress. Fixing the ownership is not trivial, so we now call StartAbort() in the cancellation callback instead to ensure all resources are valid. Note that with this we need to use TryDeregisterCallback() in done() instead of DeregisterCallback(), because the latter blocks until all cancellation callbacks are done.

We used to always abort collective ops in the executor when there are errors in graph execution. However, some errors are intended for the user to catch, and if we abort collective ops, the user program cannot continue. It's also not necessary to abort collective ops if there are no active ones.

Ideally we should have a cancellation story for collectives. Before that, we can at least abort collectives only when necessary, i.e. when there are pending or failed collective ops.

To make the EOF-catching workflow work, we also need to make all collectives in gather depend on the input tensors, so there's a better chance they fire after the iterator's GetNext. Without that, the shape gathering may run in parallel with GetNext.

PiperOrigin-RevId: 337997169
Change-Id: I4a374f9ff00bdba38e012a96fb7f5837e049c85c
2020-10-19 22:13:03 -07:00
Scott Zhu
bbefe66945 Fix the data_adapter for dataset.Iterator.
Currently both the Generator and CompositeTensor handlers can handle it, which causes errors like https://github.com/tensorflow/tensorflow/pull/43874.

PiperOrigin-RevId: 337987774
Change-Id: I706079fbe57e0e87687ceeb10e14e265a754e08e
2020-10-19 20:40:51 -07:00
Reed Wanderman-Milne
ab9b5f5b05 Support saving mixed precision models with activity regularizers.
Fixes https://github.com/tensorflow/tensorflow/issues/43978

PiperOrigin-RevId: 337973257
Change-Id: Iefb46a31ff6a45d4e5d84fb7cb8822e4ae1cd039
2020-10-19 18:15:57 -07:00
Xinyi Wang
a7467d5d51 Adding new APIs under tf.distribute: gather and all_gather.
`tf.distribute.Strategy.gather` and `tf.distribute.ReplicaContext.all_gather` are APIs to gather and concatenate `tf.distribute.DistributedValues` objects across workers and devices. They are counterparts in cross-replica context and replica context, respectively. These methods are implemented for all strategies except ParameterServerStrategy.
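
A hedged sketch of both APIs (the strategy and shapes are illustrative assumptions):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()

# Cross-replica context: Strategy.gather
per_replica = strategy.run(lambda: tf.random.uniform((2, 3)))
gathered = strategy.gather(per_replica, axis=0)  # concatenated along axis 0

# Replica context: ReplicaContext.all_gather
@tf.function
def step():
  local = tf.random.uniform((2, 3))
  return tf.distribute.get_replica_context().all_gather(local, axis=0)

result = strategy.run(step)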

PiperOrigin-RevId: 337972679
Change-Id: I1d61c96b830683da135d5b4e89da29693c51262c
2020-10-19 18:09:08 -07:00
Reed Wanderman-Milne
bdc71a74f7 Make the rest of the mixed precision API non-experimental.
Additionally, the following attributes are added to Layer: `dtype_policy`, `compute_dtype`, `variable_dtype`.

The `inner_optimizer` attribute is added to LossScaleOptimizer.

This change follows the mixed precision RFC: https://github.com/tensorflow/community/pull/293

I'll move the mixed_precision folder out of the experimental folder in a subsequent change. That change will have no functional impact.

I also removed the "About the layer's `dtype` attribute" section from the base Layer docstring since it didn't properly describe mixed precision. I added some of the information to the Arguments section, which links to the Policy docstring for a complete description of layer dtypes. In a future change, I'll add a paragraph which better describes how layers use dtypes.

PiperOrigin-RevId: 337968442
Change-Id: I2738862faaabec14fe6675ea9f34075a5e56426a
2020-10-19 17:33:50 -07:00
Yanhui Liang
73b709743a Disable the new codepath of LSTM/GRU.
PiperOrigin-RevId: 337966066
Change-Id: I87b9533acdc04342d317196a77323016ee96352e
2020-10-19 17:17:01 -07:00
Katherine Wu
63f17d0fe1 (rollforward of cl/337218666) Add method to partially load a SavedModel.
PiperOrigin-RevId: 337950896
Change-Id: Idd0a9e963b34671bdf1d7b87389e2325848e5eea
2020-10-19 16:03:44 -07:00
Reed Wanderman-Milne
d119ff9af8 Remove Policy.should_cast_variables property.
The RFC does have this property (https://github.com/tensorflow/community/pull/293), but I don't think it is very useful, and there are no uses of it within Google outside Keras, so it should be removed.

PiperOrigin-RevId: 337950640
Change-Id: I64c27589e87e4bf8f3f9c7fe38150703d914e804
2020-10-19 15:53:12 -07:00
Reed Wanderman-Milne
51fbc48cef Deprecate LossScale and modify Keras APIs to not use it.
LossScale and its subclasses are deprecated and will be removed from the TF 2 namespace in TensorFlow 2.5. They will still be accessible under the tf.compat.v1 namespace, and this change makes LossScale non-experimental there, exporting it as `tf.compat.v1.mixed_precision.LossScale`. LossScale cannot be removed from the tf.compat.v1 namespace since it's used by the V1-only class tf.compat.v1.train.experimental.MixedPrecisionLossScaleOptimizer.

LossScaleOptimizer previously used a LossScale, but now it directly performs loss scaling within the class itself. Additionally a new non-experimental `tf.keras.mixed_precision.LossScaleOptimizer` has been introduced. Unlike the experimental LossScaleOptimizer, the non-experimental LossScaleOptimizer does not accept a LossScale but instead has different constructor arguments to specify the type of loss scaling to be done. The old experimental LossScaleOptimizer will be removed in TensorFlow 2.5, at which point a LossScale cannot be used with any Keras LossScaleOptimizer.

Internally, LossScaleOptimizer uses a fork of DynamicLossScale called _DynamicLossScaleState, but this is not exposed to the user. In the future, _DynamicLossScaleState will be merged into LossScaleOptimizer.

LossScaleOptimizer now exposes some attributes that DynamicLossScale previously did. "increment_period" is renamed to "dynamic_growth_steps" for consistency with `ExponentialDecay.decay_steps`. `num_good_steps` is replaced by `dynamic_counter`.

LossScaleOptimizer.loss_scale is now a tensor, not a LossScale. This means the previous way of getting the loss scale as a tensor (calling `optimizer.loss_scale()`) will raise an error instead. I don't know of any users who do this, so I do not anticipate any breakages.

Policy previously had an instance of a LossScale, and optionally took a LossScale in the constructor. By default, the "mixed_float16" policy had a DynamicLossScale, while all other policies had no loss scale. Now, Policy no longer has a loss scale or takes an instance of a loss scale. To temporarily preserve backwards compatibility with the old API, the symbol `tf.keras.mixed_precision.experimental.Policy` still takes and holds a LossScale, as it did before. A new non-experimental symbol, `tf.keras.mixed_precision.Policy`, removes the use of the LossScale. The old experimental symbol will be removed in the future.

When deserializing a layer or model with an old experimental policy, it will be restored as the new policy and the loss scale will be silently dropped. This is to preserve SavedModel compatibility with models saved in TensorFlow 2.3 and restored in future versions of TensorFlow once the old experimental Policy is removed. Luckily, dropping the loss scale is unlikely to break anyone, as a bug in the mixed precision API causes models to not save their dtype policies at all when being serialized. Similarly, when deserializing a model with the old experimental LossScaleOptimizer, it will be restored as the new LossScaleOptimizer but unlike the policy case, nothing is silently dropped.

This change is different from what is described in the mixed precision RFC (https://github.com/tensorflow/community/pull/293), but I think this API is a lot clearer and simpler than the API in the RFC. The RFC forked the LossScale classes into Keras, but I now think it's better to simply not use them and expose LossScale under tf.compat.v1 only. This new API was designed based on feedback from @fchollet and @omalleyt12. I will retroactively update the RFC to reflect this API.
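
A hedged sketch of the new constructor and attributes described above (the argument values are illustrative assumptions):

import tensorflow as tf

opt = tf.keras.mixed_precision.LossScaleOptimizer(
    tf.keras.optimizers.SGD(),
    dynamic=True,                # dynamic loss scaling
    initial_scale=2 ** 15,
    dynamic_growth_steps=2000)   # renamed from DynamicLossScale's increment_period

print(opt.loss_scale)            # now a tensor, not a LossScale object
print(opt.dynamic_counter)       # replaces num_good_steps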

PiperOrigin-RevId: 337938270
Change-Id: Id7bb3bb89eb2143e5fadabeb2f57d1f8267379b3
2020-10-19 14:55:55 -07:00
TensorFlower Gardener
3a593e474c Merge pull request from YoavRamon:patch-1
PiperOrigin-RevId: 337909899
Change-Id: If75888ac2ff45d436019a49996744e3fe788cfcc
2020-10-19 12:39:23 -07:00
TensorFlower Gardener
ba79107f74 Merge pull request from ROCmSoftwarePlatform:google-upstream-disabled-rocm-tests
PiperOrigin-RevId: 337844806
Change-Id: I4847456394a8e2a7c4fad542ec360e77833bef99
2020-10-19 06:54:01 -07:00
Reed Wanderman-Milne
861f63a327 Have AutoCastVariable.dtype refer to the variable dtype.
This allows us the flexibility to later remove AutoCastVariable and instead have a mechanism so that individual ops will cast variables (and potentially other tensors) to the correct dtype. See the last paragraph of this section of the mixed precision RFC (8563574455/rfcs/20200929-keras-mixed-precision.md (op-based-autocasting-api)) for an example of how this could be done.
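
A sketch of the resulting behavior under the 'mixed_float16' policy (the Dense layer is only used here to obtain an AutoCastVariable):

import tensorflow as tf

tf.keras.mixed_precision.set_global_policy('mixed_float16')

layer = tf.keras.layers.Dense(4)
layer.build((None, 4))
print(layer.kernel.dtype)   # float32: dtype now refers to the variable dtype
y = layer(tf.ones((1, 4)))
print(y.dtype)              # float16: ops reading the variable cast it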

PiperOrigin-RevId: 337793570
Change-Id: I8e56f7d276117a9a81070ab0984369e8a4490eea
2020-10-18 22:33:55 -07:00
A. Unique TensorFlower
7529bc18e8 Use float32 instead of float64 for confusion matrix computations to make them compatible with TPUs.
PiperOrigin-RevId: 337636936
Change-Id: I06295e65d089d35bff638cd7502f6975a15dfc40
2020-10-17 00:13:53 -07:00
Abdullah Rashwan
7a636941b3 Use float32 instead of float64 for confusion matrix computations to make them compatible with TPUs.
PiperOrigin-RevId: 337635801
Change-Id: I2f9c282ca010168d19c316e1f0c6c86997567dec
2020-10-16 23:53:11 -07:00
TensorFlower Gardener
c4c72171dc Merge pull request from yil532:yiwen_branch
PiperOrigin-RevId: 337623926
Change-Id: I24caa92146de8b4045b14e41f7246c37098c2dd7
2020-10-16 21:08:18 -07:00
Tomer Kaftan
7e282e6090 Add support for RaggedTensor properties, instance methods, and class methods in the Keras functional API.
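
A hedged sketch of the feature (the particular RaggedTensor methods used are illustrative assumptions):

import tensorflow as tf

inputs = tf.keras.Input(shape=(None,), ragged=True)
lengths = inputs.row_lengths()   # RaggedTensor instance method on a symbolic input
dense = inputs.to_tensor()       # another RaggedTensor instance method
model = tf.keras.Model(inputs, [lengths, dense])

rt = tf.ragged.constant([[1., 2.], [3.]])
out_lengths, out_dense = model(rt)
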
PiperOrigin-RevId: 337548766
Change-Id: I9cd9f3ac5a6aa115f4b7b22ea7801bd86dfd39a3
2020-10-16 12:11:06 -07:00
A. Unique TensorFlower
52473a84f4 Add option to not save the traces when exporting to the SavedModel format.
The current tracing implementation has a limited scope of models and layers that can be traced. When users add a custom layer or model that is unsupported (e.g. with multiple tensor arguments), they'll come across an error that prevents them from saving the model entirely. With this option, those models can now be saved to the SavedModel format for serving or retraining.

PiperOrigin-RevId: 337540358
Change-Id: I81f1436d10ae7e9de1b6597c3b8d1be7f8a386f8
2020-10-16 11:12:36 -07:00
Mihai Maruseac
4c5d73cb63 Rollback: Use cancellation manager to abort collectives
PiperOrigin-RevId: 337512779
Change-Id: Iced42d2245b4362bfe23fa4e3a9e2d86a8dd3d4e
2020-10-16 08:58:57 -07:00
Mihai Maruseac
3fc46e145e Disable broken TSAN test
PiperOrigin-RevId: 337512317
Change-Id: Ie735e813519165e17f3623ded1570504b8b7afed
2020-10-16 08:54:03 -07:00
Scott Zhu
52a388a6b7 Remove unused code in tests.
PiperOrigin-RevId: 337447002
Change-Id: I61c4dedb3cb0c5438b1602ab57c091d018f91fc1
2020-10-15 22:32:06 -07:00
Ran Chen
f0844f4065 Use cancellation manager to abort collectives
We used to always abort collective ops in the executor when there are errors in graph execution. However, some errors are intended for the user to catch, and if we abort collective ops, the user program cannot continue. It's also not necessary to abort collective ops if there are no active ones.

Ideally we should have a cancellation story for collectives. Before that, we can at least abort collectives only when necessary, i.e. when there are pending or failed collective ops.

To make the EOF-catching workflow work, we also need to make all collectives in gather depend on the input tensors, so there's a better chance they fire after the iterator's GetNext. Without that, the shape gathering may run in parallel with GetNext.

PiperOrigin-RevId: 337440792
Change-Id: I7caea917c858bcf99f6eb471abf46d94d5c255b3
2020-10-15 21:37:14 -07:00
Eugene Kuznetsov
91c3c9ed6b Disabling unit tests that fail with ROCm 2020-10-16 00:02:57 +00:00