When objects are loaded from the SavedModel, they don't retain their `_gather_saveables_for_checkpoint` functions, which can result in values not being loaded from the checkpoint.
This CL adds a field in the SavedModel proto that stores a save and restore function for each SaveableObject in each node. When loading into Python, the SaveableObjects are restored using the functions.
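A rough sketch of the pattern whose restore logic used to be dropped (class and attribute names here are hypothetical):

    import tensorflow as tf
    from tensorflow.python.training.saving import saveable_object

    class _StateSaveable(saveable_object.SaveableObject):
      """Hypothetical SaveableObject wrapping a module's extra state."""

      def __init__(self, obj, name):
        spec = saveable_object.SaveSpec(
            tensor=obj.state, slice_spec="", name=name)
        super(_StateSaveable, self).__init__(obj.state, [spec], name)
        self._obj = obj

      def restore(self, restored_tensors, restored_shapes):
        # This is the logic that used to vanish after a SavedModel round
        # trip; the new proto field records it as save/restore functions.
        return self._obj.state.assign(restored_tensors[0])

    class Stateful(tf.Module):

      def __init__(self):
        self.state = tf.Variable(1.0)

      def _gather_saveables_for_checkpoint(self):
        return {"state": lambda name="state": _StateSaveable(self, name)}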
PiperOrigin-RevId: 318549786
Change-Id: I688c72d7658e1bca98abf373a13a0e15a7fb83e2
This option enables saving and restoring models to or from filesystems
that are only accessible from localhost when using multiple devices.
The option is available to:
- Save models: tf.saved_model.save()
- Checkpoints: tf.train.Checkpoint()
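Example usage (option name as exposed in the TF 2.x API):

    import tensorflow as tf

    module = tf.Module()

    # SavedModel: write files through the localhost job's filesystem.
    tf.saved_model.save(
        module, "/tmp/saved_model",
        options=tf.saved_model.SaveOptions(
            experimental_io_device="/job:localhost"))

    # Checkpoints: same option on tf.train.CheckpointOptions.
    ckpt = tf.train.Checkpoint(module=module)
    ckpt.write(
        "/tmp/ckpt",
        options=tf.train.CheckpointOptions(
            experimental_io_device="/job:localhost"))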
PiperOrigin-RevId: 307858098
Change-Id: I4cd0a81424e306f0eac40bfb30d5067dfc02d1be
Address feedback
Add test for the Python method has_atomic_move
Remove old comment and fix indentation
Remove unnecessary imports
Remove the test which checks for reference cycles when saving. The file system check introduces a conditional op, which itself creates a reference cycle, so the check no longer applies.
Fighting lint
Fix lint errors
Use returned status of hasAtomicMove
Also adding support for saving iterators in a sharded fashion to avoid unnecessary copying during checkpointing.
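Sketch of sharded iterator checkpointing from the user's side (standard tf.data and tf.train.Checkpoint APIs):

    import tensorflow as tf

    dataset = tf.data.Dataset.range(10)
    iterator = iter(dataset)
    next(iterator)  # the iterator's position is part of its state

    # The iterator state is written shard-by-shard next to the variables
    # instead of being funneled through a single device first.
    ckpt = tf.train.Checkpoint(iterator=iterator)
    path = ckpt.save("/tmp/iterator_ckpt")
    ckpt.restore(path)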
PiperOrigin-RevId: 286310419
Change-Id: I1a957af783f7f69753992ce220b59eb43df2c02f
Registers a single constant tensor in order to conform to the SaveableObject API; I feel that's cleaner than special casing SaveableHook throughout the codebase.
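Sketched shape of such a hook (details hypothetical):

    import tensorflow as tf
    from tensorflow.python.training.saving import saveable_object

    class LoggingHook(saveable_object.SaveableObject):
      """Hypothetical hook: runs callbacks around save and restore."""

      def __init__(self, name):
        # The single constant below carries no real state; it exists only
        # so this object is a well-formed SaveableObject.
        dummy = tf.constant(0)
        spec = saveable_object.SaveSpec(
            tensor=dummy, slice_spec="", name=name)
        super(LoggingHook, self).__init__(dummy, [spec], name)

      def before_save(self):
        print("about to save")

      def after_restore(self):
        print("just restored")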
PiperOrigin-RevId: 280708433
Change-Id: I5872949eca35c7fe3dcc401c52a63b66a141d865
Calling ops.internal_convert_to_tensor is more efficient than calling
ops.convert_to_tensor, since it skips deprecated_argument_lookup and has
less Python function-call overhead. We therefore swap the names of these
functions so that most code paths can be optimized.
PiperOrigin-RevId: 274321742
1) Test central storage strategy against a numpy dataset.
2) Implement non-overridden methods from Variable in AggregatingVariable (see the delegation sketch below).
3) Make ResourceVariableSaveable support `is_resource_variable` instances.
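The delegation in item 2 looks roughly like this (illustrative, not the real class):

    class AggregatingVariable(object):
      """Sketch: wraps a variable, forwarding non-overridden behavior."""

      def __init__(self, v):
        self._v = v

      def assign(self, value, **kwargs):
        # Overridden methods insert aggregation-specific logic here.
        return self._v.assign(value, **kwargs)

      def __getattr__(self, name):
        # Anything not explicitly overridden falls through to the wrapped
        # variable, so this behaves like a tf.Variable everywhere else.
        return getattr(self._v, name)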
PiperOrigin-RevId: 243518258
In approximately decreasing order of significance:
1) Cache various to_string, from_string, and string-to-string functionality in device.py (see the sketch after this list).
2) Optimize DeviceSpec.to_string to reduce unnecessary string copies.
3) Skip no-op device assignments when creating ops (when possible).
4) Remove hash caching in DeviceSpec (since it can now be computed much more cheaply) which allows less aggressive locking.
5) Misc finesse around high-traffic functions (millions of calls).
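The caching in item 1 is plain memoization; a minimal sketch (not the actual device.py code):

    import functools

    # Device strings repeat heavily across op creation, so parse each
    # distinct string only once and reuse the result.
    @functools.lru_cache(maxsize=None)
    def _parse_device_string(spec_string):
      components = {}
      for part in spec_string.split("/"):
        if ":" in part:
          key, _, value = part.partition(":")
          components[key] = value
      return components

    _parse_device_string("/job:worker/replica:0/task:0/device:GPU:0")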
PiperOrigin-RevId: 242996847
Each worker will do its own local read and write operations rather than copying to one device. This assumes a shared filesystem for all tasks.
Largely a copy-and-paste job from tf.train.Saver, except that to figure out the proper sharding when executing eagerly we need a device up front when restoring, so SaveSpecs with callable ops now require devices. Previously we evaluated the save Tensor and checked its device in order to shard restores, but when executing eagerly that means allocating lots of unused memory.
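The SaveSpec change looks roughly like this (constructor details hedged):

    import tensorflow as tf
    from tensorflow.python.training.saving import saveable_object

    v = tf.Variable(1.0)
    # The tensor is a callable evaluated only at save time; declaring the
    # device up front lets restores be sharded without materializing the
    # value eagerly just to inspect its device.
    spec = saveable_object.SaveSpec(
        tensor=v.read_value,  # callable, not a Tensor
        slice_spec="",
        name="v",
        dtype=tf.float32,
        device=v.device)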
PiperOrigin-RevId: 240629173
We can't update the Checkpoint proto state (e.g. for tf.train.latest_checkpoint), but this at least throws an informative error and gives the user the option to write a checkpoint from a function without updating the Checkpoint proto.
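A sketch of the write-from-a-function path this leaves open:

    import tensorflow as tf

    ckpt = tf.train.Checkpoint(v=tf.Variable(1.0))

    @tf.function
    def checkpoint_step():
      # save() would have to update the Checkpoint proto consumed by
      # tf.train.latest_checkpoint, which can't happen here; write()
      # skips that bookkeeping, so it works inside a function.
      ckpt.write("/tmp/func_ckpt")

    checkpoint_step()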
Restore is more complicated, since we do restore-on-create. I think the right thing to do is to inherit the eagerness of the context where restore() was first called. Too much for this change.
PiperOrigin-RevId: 233082153
In general, if a Python string is in the checkpoint but not used directly in the saving program, assert_consumed() will pass even if the attribute is totally absent on restore.
This should fix checkpoint compatibility with saved_model.load() even without proper Keras revived types. There's no reason it should fail if those aren't available for some reason.
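Typical use of the restore status (standard API):

    import tensorflow as tf

    path = tf.train.Checkpoint(v=tf.Variable(0.0)).save("/tmp/ckpt")

    status = tf.train.Checkpoint(v=tf.Variable(1.0)).restore(path)
    # Passes even when a Python-string attribute recorded in the
    # checkpoint has no counterpart in the restoring program.
    status.assert_consumed()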
PiperOrigin-RevId: 229448783
Should be faster when reading from distributed file systems. Does not affect cases where restore-on-create is necessary, but as long as variable objects have been created and tracked before restore() their reads should be batched together.
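Concretely, creating and tracking the variables before restore() is what enables the batching:

    import tensorflow as tf

    # Both variables exist and are tracked before restore(), so their
    # reads from the (possibly remote) filesystem can be issued together
    # rather than one at a time.
    root = tf.train.Checkpoint(a=tf.Variable(0.0), b=tf.Variable(0.0))
    root.restore(tf.train.latest_checkpoint("/tmp/ckpts"))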
PiperOrigin-RevId: 227911381
Just omits the function decorator for now. This is pretty terrible and we should fix it, but it will need some work on the TPU side.
Spoke to iga@. Apparently the CPU annotations don't work because the function captures a resource which is on the TPU (and so the eager placer puts the call op on the TPU). One option is to then XLA-compile the function, although that fails right now because we're trying to save strings and XLA doesn't have a kernel for that.
I should also follow up with TPU+checkpointing integration tests.
PiperOrigin-RevId: 226390521
Replaces the restore() code with tf.train.Saver's bulk restore logic, which was its default. I only noticed because apparently the other path fails on some saveables, and the restore code gets more thoroughly tested via to_proto.
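The bulk path boils down to one restore op for everything, roughly (tensor keys illustrative):

    import tensorflow as tf

    # One RestoreV2 op returns all tensors at once instead of issuing a
    # separate restore op per SaveableObject.
    tensors = tf.raw_ops.RestoreV2(
        prefix="/tmp/ckpt-1",
        tensor_names=["v1/.ATTRIBUTES/VARIABLE_VALUE",
                      "v2/.ATTRIBUTES/VARIABLE_VALUE"],
        shape_and_slices=["", ""],
        dtypes=[tf.float32, tf.float32])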
PiperOrigin-RevId: 226077043
Pulls some utilities out of saver.py which are necessary to actually use it. The functional saver takes only SaveableObjects, so these are utilities for taking a list of whatever users pass in and converting them to those.
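Roughly how the moved utilities get used (module and function names as in this refactor; treat as illustrative):

    import tensorflow as tf
    from tensorflow.python.training.saving import saveable_object_util

    v = tf.Variable(1.0, name="v")
    # Map whatever the user passed (variables here) to checkpoint names,
    # then expand to the SaveableObjects the functional saver requires.
    names_to_saveables = saveable_object_util.op_list_to_dict([v])
    saveables = saveable_object_util.validate_and_slice_inputs(
        names_to_saveables)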
One other code move for object-based checkpointing to avoid circular imports.
Applications which need a SaverDef still use the old Saver. Serialization to SaverDef will be added to this saver in a followup.
Does not actually wrap the new Saver's methods in @tf.function yet, since there are memory issues which need to be fixed first.
PiperOrigin-RevId: 224561069