Commit Graph

1596 Commits

Yujing Zhang
58f1434ed4 Disable multi_worker_continuous_run_test on asan
PiperOrigin-RevId: 358344741
Change-Id: I41b0536dfbde6168085477306cab6a8b7c7f65a6
2021-02-18 23:30:57 -08:00
Yuefeng Zhou
426558e017 Remove cluster_coordinator_test.py from pip, add it to oss.
PiperOrigin-RevId: 358113811
Change-Id: I3e04a42ba6e3c264c3a725d0b13cdd3565bbe6a5
2021-02-17 23:43:11 -08:00
Rick Chao
18bfd69f92 PSv2: Enable cluster_coordinator_test in OSS as it has been previously fixed.
PiperOrigin-RevId: 358055546
Change-Id: Idc309a81d052e03bdc44c2165295c11528a39564
2021-02-17 16:14:47 -08:00
Chenkai Kuang
9663abe4c9 Document limitation of using tf.data.Options in coordinator.create_per_worker_dataset.
PiperOrigin-RevId: 357907240
Change-Id: Ib6ce02efeca322409969af14be3318248ce928b4
2021-02-17 02:37:47 -08:00
Rick Chao
e66d5d56a8 Multi-worker testing: Reenable multi_worker_continuous_run_test on TAP as it has been fixed.
PiperOrigin-RevId: 357889388
Change-Id: Ib713df996bd150c1ef1fcd1e33cd69fd9689612c
2021-02-17 00:10:06 -08:00
Rick Chao
7db445b3fb PSv2: Enable cluster_coordinator_test in OSS as it has been previously fixed.
PiperOrigin-RevId: 357655428
Change-Id: I0fdfbb6c6664058b25a8c29e21447b0f4e01ef4a
2021-02-15 23:30:39 -08:00
Chenkai Kuang
aa9bd19fe3 Fix a multi-gpu test failure.
The test uses tf.constant as input to all_reduce in pure eager mode; however, in eager mode tf.constant always creates host tensors regardless of the enclosing device scope, which leads to an NCCL error.

PiperOrigin-RevId: 357252841
Change-Id: Iddaf5f52fe6634ec29dd385a9fa034761f3df91f
2021-02-12 13:12:33 -08:00
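
A minimal repro-style sketch of the issue described in the entry above, assuming a machine with two or more GPUs so NCCL is in play; the values and helper choices here are illustrative, not the actual test code:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()  # assumes >= 2 GPUs so NCCL is used

    def replica_fn():
        ctx = tf.distribute.get_replica_context()
        # In pure eager mode a bare tf.constant is placed on the host regardless
        # of the enclosing device scope; copying it onto the replica device
        # (here via tf.identity) gives NCCL a device tensor to reduce.
        value = tf.identity(tf.constant(1.0))
        return ctx.all_reduce(tf.distribute.ReduceOp.SUM, value)

    result = strategy.run(replica_fn)
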
Yunxing Dai
443f13e41a [xla_compiler] Do not promote TF shape constant folding when input is an XLA dynamic shape.
PiperOrigin-RevId: 356904310
Change-Id: Iff329b8f81777f895333726e8ca98e2d3ad4ddb5
2021-02-10 22:36:47 -08:00
Rick Chao
4835f6ec4f PSv2/cfit: tf.distribute changes to accompany compile-fit support.
1) Single instance of ClusterCoordinator given a Strategy object
2) Circular references of ClusterCoordinator and ParameterServerStrategy
3) Attribute of a Strategy indicating if it is supposed to be used with a ClusterCoordinator

PiperOrigin-RevId: 356868615
Change-Id: If19600c0101f40a9e840fe71abb848f386e32735
2021-02-10 17:51:16 -08:00
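
For context, a minimal sketch of how a ClusterCoordinator is typically paired with a ParameterServerStrategy, as in the entry above (cluster setup details omitted and assumed to come from TF_CONFIG):

    import tensorflow as tf

    # Assumes the cluster spec is provided via the TF_CONFIG environment variable.
    cluster_resolver = tf.distribute.cluster_resolver.TFConfigClusterResolver()
    strategy = tf.distribute.experimental.ParameterServerStrategy(cluster_resolver)

    # A single coordinator instance is associated with the strategy object.
    coordinator = tf.distribute.experimental.coordinator.ClusterCoordinator(strategy)
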
A. Unique TensorFlower
24493c8698 Fix the returned response for experimental_local_results in the case of MirroredStrategy for dict, list and tuple types.
PiperOrigin-RevId: 356663538
Change-Id: Ie2338c6dbb63ac0e129b9051dae56626e47f6450
2021-02-09 21:48:26 -08:00
Isha Arkatkar
21273f6e32 Fix the returned response for experimental_local_results in the case of MirroredStrategy for dict, list and tuple types.
PiperOrigin-RevId: 356645884
Change-Id: I0c5d5628e8bb88d661ecba41a90ddbe13aa59543
2021-02-09 19:29:21 -08:00
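
A short usage sketch of the API touched by the two entries above; this is a generic illustration, not the fixed code path itself:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()

    def step():
        # Return a dict of per-replica values; lists and tuples behave the same way.
        return {"loss": tf.constant(1.0), "accuracy": tf.constant(0.5)}

    per_replica = strategy.run(step)
    # experimental_local_results unwraps the result into a tuple with one entry
    # per local replica, preserving the dict/list/tuple structure inside.
    local_results = strategy.experimental_local_results(per_replica)
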
Chenkai Kuang
eb31d8660d Add all_reduce APIs that can be called in replica context to class CrossDeviceOps and StrategyExtended.
For `StrategyExtended`, it is a private API that will be used by `ReplicaContext.all_reduce`.

This is in preparation for deprecation of merge_call from user API.

PiperOrigin-RevId: 356604626
Change-Id: I2528b35b87db1b93907b17a246dbfbcfcb64ad33
2021-02-09 15:29:14 -08:00
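
The public-facing counterpart, ReplicaContext.all_reduce, can be used as in this small sketch (illustrative values):

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()

    def replica_fn(x):
        ctx = tf.distribute.get_replica_context()
        # Sums `x` across all replicas without leaving replica context,
        # i.e. without an explicit merge_call in user code.
        return ctx.all_reduce(tf.distribute.ReduceOp.SUM, x)

    total = strategy.run(replica_fn, args=(tf.constant(2.0),))
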
Rick Chao
d4c8c579e1 MultiProcessRunner: Enable multi_process_runner_no_init_test in OSS with MultiProcessRunner's availability.
PiperOrigin-RevId: 356560120
Change-Id: Id34aebde1e405bcca65f2d3d42ef439923f9b434
2021-02-09 12:29:40 -08:00
Ran Chen
beab125d24 [rollback] Use self.handle inside ResourceVariable to allow tf.distribute to customize handle behavior

PiperOrigin-RevId: 356541183
Change-Id: If4dbfc32a834c464bc94ce1c3ae71b3fb72e1e55
2021-02-09 11:01:03 -08:00
TensorFlower Gardener
9829af63fe Merge pull request from 8bitmp3:patch-1
PiperOrigin-RevId: 356526123
Change-Id: Ie4e5815ba6291dc7505bf08629f355085c50f3e4
2021-02-09 10:05:41 -08:00
8bitmp3
38784a85c3 Address Ubuntu Sanity error
2021-02-07 01:27:31 +00:00
Ran Chen
525524f99e Fix a silly mistake in update
PiperOrigin-RevId: 355902594
Change-Id: I02261440a6c3d42c57ee285e64cd5efbdc45bb35
2021-02-05 12:34:34 -08:00
Xinyi Wang
66e3bde76a Delete meaningless comment.
PiperOrigin-RevId: 355821017
Change-Id: Iadbaac2670ebf3456bdc939fd81b02d5947da97d
2021-02-05 03:58:15 -08:00
Ran Chen
0ac07a2fc4 Retire AutoPolicy
This is part of the effort to refactor distributed variables. Auto is somewhat confusing and adds implementation complexity.

PiperOrigin-RevId: 355733796
Change-Id: I7446c3ed706624178fcb26c9b992632a93b939f6
2021-02-04 16:16:59 -08:00
Revan Sopher
a0bd36e7f4 Support passing and returning Nones in TPUStrategy.
This is already supported in non-TPU strategies, and the discrepancy surprises users trying to migrate.
The lower-level input and output replication ops can't handle values that aren't convertible to Tensor, so we need to do some massaging around this. Nones in inputs are temporarily replaced with a constant, then restored after replication. Nones in outputs are simply not replicated, as output replication happens at a per-value granularity.

PiperOrigin-RevId: 355690080
Change-Id: I9d2435e953c8feb7818a882cb5280327f310c919
2021-02-04 13:07:39 -08:00
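
A sketch of the kind of user code this enables; a non-TPU strategy stands in here, since the change brings TPUStrategy in line with behavior the other strategies already support:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()  # stand-in for TPUStrategy

    def step(inputs):
        features, label = inputs          # label may legitimately be None
        loss = tf.reduce_sum(features)
        return loss, None                 # None outputs pass through unreplicated

    loss, _ = strategy.run(step, args=((tf.constant([1.0, 2.0]), None),))
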
Ran Chen
f4b06261c9 Use self.handle inside ResourceVariable to allow tf.distribute to customize handle behavior

I'm working on a new version of DistributedVariable which directly inherits from BaseResourceVariable. Its handle would return different resource tensors under different contexts, e.g. self.handle would be a replicated tensor under a TPU context. This avoids the need to use raw variable operations for special resource handles like the TPU replicate handle or the parallel device handle.

PiperOrigin-RevId: 355663353
Change-Id: I16201f94ef27a0dc7ac1491c616d7bd68397123a
2021-02-04 11:26:57 -08:00
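
A toy, framework-free sketch of the pattern described above (all names here are hypothetical): operations go through a handle property each time, so a subclass or context flag can decide which resource handle is exposed:

    class ToyVariable:
        """Illustration only: routing ops through self.handle lets a subclass
        decide which resource tensor to expose in a given context."""

        def __init__(self, plain_handle, replicated_handle=None):
            self._plain_handle = plain_handle
            self._replicated_handle = replicated_handle
            self.in_tpu_context = False  # stand-in for an enclosing-TPU-context check

        @property
        def handle(self):
            if self.in_tpu_context and self._replicated_handle is not None:
                return self._replicated_handle
            return self._plain_handle

        def read_value(self):
            # Uses self.handle rather than a cached raw handle, so the context
            # flag controls which handle the read targets.
            return ("read", self.handle)
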
Ran Chen
a58d44afc8 [retry] Move enclosing_tpu_context to a separate util file
This is part of the variable refactor work to avoid dependency cycles.

PiperOrigin-RevId: 355654271
Change-Id: I92f0d00ddff6655c174d999abd0290ae3e4c1849
2021-02-04 11:04:21 -08:00
A. Unique TensorFlower
c37243b055 Fix a condition used for collective ops.
PiperOrigin-RevId: 355497796
Change-Id: I297c1841569d81742006fc5470a311fd2317fd10
2021-02-03 15:55:24 -08:00
Chenkai Kuang
6ff048e951 Comment on why NCCL can't be ordered in tf1.
PiperOrigin-RevId: 355484203
Change-Id: I23606db9759b72355f58147cdf7ad61d3076641d
2021-02-03 14:39:20 -08:00
A. Unique TensorFlower
ace531b56d Move enclosing_tpu_context to a separate util file
This is part of the variable refactor work to avoid dependency cycles.

PiperOrigin-RevId: 355456079
Change-Id: I7a8afc89d17eee9372afb3fd4c6da8126791e499
2021-02-03 12:35:59 -08:00
Ran Chen
428ce93ee4 Move enclosing_tpu_context to a separate util file
This is part of the variable refactor work to avoid dependency cycles.

PiperOrigin-RevId: 355414142
Change-Id: I36651a7be6462c198aae477923bc2ef0f7e7d0fb
2021-02-03 09:36:08 -08:00
Yuefeng Zhou
40d5a5685f Fix strategy.run's docstring: Python literals are not supported in args or kwargs.
PiperOrigin-RevId: 355343919
Change-Id: I3bc8cf411d799a9ff36b722812e9600955278015
2021-02-03 01:18:09 -08:00
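
In line with the docstring fix above, arguments to strategy.run should be tensors (or tf.distribute values) rather than bare Python literals; a minimal sketch:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()

    def replica_fn(x):
        return x * 2.0

    # Wrap literals in tf.constant before passing them through args/kwargs.
    result = strategy.run(replica_fn, args=(tf.constant(3.0),))
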
Xinyi Wang
c4d1165b15 Disable collective ops for MS.
PiperOrigin-RevId: 355266199
Change-Id: I58e71e086542b39effb5c4f0f3d088169b8810b1
2021-02-02 15:34:13 -08:00
Chenkai Kuang
b4bf78ffec Support slicing in ShardedVariable. The slicing semantic is identical to Tensor/Variable.
PiperOrigin-RevId: 355249212
Change-Id: Ic9a14b5ae5cc0a446142eaa529f052c09c445396
2021-02-02 14:14:36 -08:00
Chen Chen
0add9081c1 Skip //tensorflow/python/distribute:strategy_gather_test_tpu in oss to save the build
PiperOrigin-RevId: 355241184
Change-Id: I4be22489929d8f80bbbaae69a7cb58c7c55610d8
2021-02-02 13:43:54 -08:00
A. Unique TensorFlower
f7d0a77b53 An internal change.
PiperOrigin-RevId: 355220346
Change-Id: I78e8d291cf2a6168ec5ba9b2679f8292fc1271e6
2021-02-02 12:07:52 -08:00
Ran Chen
e8262389c4 strategy.extended.update allows assigning non-mirrored values to non-mirrored variables

PiperOrigin-RevId: 355043692
Change-Id: I4b33e840636d77489358880ee506868f6dae787b
2021-02-01 15:50:31 -08:00
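
A small usage sketch of strategy.extended.update, the API the entry above loosens; the values here are illustrative:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()

    with strategy.scope():
        v = tf.Variable(0.0)

    def assign_fn(var, value):
        return var.assign(value)

    # update() runs assign_fn on each component of `v` in an update context;
    # per the change above, `value` no longer has to be a mirrored value.
    strategy.extended.update(v, assign_fn, args=(tf.constant(5.0),))
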
A. Unique TensorFlower
055896a275 Always enable get_next_as_optional unless the dataset is finite.
PiperOrigin-RevId: 354672864
Change-Id: I3a490952e8bd075bf035a0126e62b9cf5082104e
2021-01-29 23:11:37 -08:00
Ruoxin Sang
cf3d55222d Always enable get_next_as_optional unless the dataset is finite.
PiperOrigin-RevId: 354668482
Change-Id: I5af5fffa27bdda4b0774a231ca804995c78f9bde
2021-01-29 22:11:49 -08:00
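
For reference, a sketch of the get_next_as_optional pattern on a distributed iterator, which tolerates exhausted or partial batches (eager mode assumed):

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()
    dataset = tf.data.Dataset.range(8).batch(4)
    dist_dataset = strategy.experimental_distribute_dataset(dataset)

    iterator = iter(dist_dataset)
    optional = iterator.get_next_as_optional()
    if optional.has_value():          # eager: scalar bool tensor usable directly
        batch = optional.get_value()
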
Ran Chen
1a46fdc4a2 Remove collective v1 code path
PiperOrigin-RevId: 354577402
Change-Id: I200d98a6a80dfe1e463044f9dedef9291ff7d846
2021-01-29 11:47:05 -08:00
Ran Chen
8a356e8ca5 [retry] Use same var key in _create_slots/get_slot in V1 optimizer
We have special handling for distributed variables in get_slot, but not in create_slot, while these keys need to match. This change modifies get_slot to use _var_key as well to avoid confusion. It also prepares for an upcoming refactor in the dist strat code.

Note that we need to make sure the keys don't change, so existing checkpoints can still be used.

A bunch of build rules are modified to break cyclic dependencies.

PiperOrigin-RevId: 354341520
Change-Id: Ifd9786263024a11806ddde0c3bd1d36157ab8db7
2021-01-28 10:48:00 -08:00
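
A toy, framework-free sketch of the consistency requirement described above: slot creation and slot lookup must derive their keys through the same function, otherwise distributed variables can miss their slots. The names and key scheme here are hypothetical:

    def _var_key(var):
        # Hypothetical: a distributed variable may expose a primary component
        # whose name keys the slot; plain variables key on their own name.
        primary = getattr(var, "primary", var)
        return primary.name

    class SlotStore:
        def __init__(self):
            self._slots = {}

        def create_slot(self, var, slot_name, value):
            self._slots.setdefault(_var_key(var), {})[slot_name] = value

        def get_slot(self, var, slot_name):
            # Uses the same _var_key as create_slot, so keys always line up.
            return self._slots[_var_key(var)][slot_name]
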
8bitmp3
95930eeb24 Review tpu_strategy.py following feedback
2021-01-28 17:12:04 +00:00
Andrew Audibert
4b124a09df Add no_oss tag for flaky parameter_server_strategy_v2_test
PiperOrigin-RevId: 354262891
Change-Id: I5f14527aad5caa11dd52176201b2d813d9682b3c
2021-01-28 01:19:11 -08:00
8bitmp3
54a8ca01a0 Improve rendering for tf.distribute.cluster_resolver.TPUClusterResolver in tpu_strategy API docs
2021-01-28 00:55:35 +00:00
Ran Chen
3db793ee03 Remove the workaround that sets PerReplica spec to dynamic batch
It's no longer needed as we stopped reusing collective instance keys. Note that we still modify element_spec to have a dynamic batch for multi-worker strategies when partial batch is enabled, so that element_spec is compatible with the data produced.

PiperOrigin-RevId: 354132185
Change-Id: I3857b4bb25c825befdd1f7c667437dc3bbf4ba50
2021-01-27 11:29:31 -08:00
Christian Sigg
4838793e12 Comment out a number of google-internal targets when copybara-exporting instead of removing them.
PiperOrigin-RevId: 353848826
Change-Id: I0801c0e713a0c63597deb5aed31c8bdb37999c6a
2021-01-26 05:47:31 -08:00
Xinyi Wang
30fb80d468 Swap the use of NcclAllReduce for NCCL Collective Ops in MirroredStrategy.
Also remove the use of the async executor to launch collective ops in eager mode and use one thread per device instead. This resolves the issue of not being able to call numpy() on the result of the async executor. This change applies to MWMS too.

PiperOrigin-RevId: 353355403
Change-Id: I9c9f30dfe18dc830a4a8fa9bbaec042c7c2edd8f
2021-01-22 18:19:19 -08:00
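
If a specific cross-device reduction implementation is still wanted, it can be chosen explicitly when constructing the strategy; a minimal sketch:

    import tensorflow as tf

    # cross_device_ops can be supplied to pick a particular reduction
    # implementation explicitly; by default the strategy chooses one itself.
    strategy = tf.distribute.MirroredStrategy(
        cross_device_ops=tf.distribute.NcclAllReduce())
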
Rick Chao
140a2b2bcb PSv2: Merge cluster_coordinator_mpr_test into fault_tolerance_test: step 3: verifying 1) executing functions on workers after PS failure results in expected failure types.
PiperOrigin-RevId: 353334801
Change-Id: I4aeab0b088acec0c204d3aba7dd7a6bac84817e2
2021-01-22 16:01:31 -08:00
TensorFlower Gardener
b444969f2a Merge pull request from ROCmSoftwarePlatform:google_upstream_rocm_misc_update_210118
PiperOrigin-RevId: 353101032
Change-Id: I1250b4f0b23ae581d10f33a461986c1f31fc7372
2021-01-21 14:21:02 -08:00
Ruoxin Sang
5642c34f2e In dynamic padder, use xla.set_dynamic_dimension_size to set dimension upper bound rather than propagating padding_map to XLABuilder.
PiperOrigin-RevId: 352929263
Change-Id: Ie1b284536a0ca25abdb51fde9462034d0f835894
2021-01-20 20:03:11 -08:00
A. Unique TensorFlower
985ad0276a PY2 removal cleanup
PiperOrigin-RevId: 352907145
Change-Id: I82de30d92dc9c2b53215d6d5732c67afe339c23d
2021-01-20 17:11:44 -08:00
Deven Desai
fbf8a4a1f1 Adding no_rocm tag to unit tests that are FLAKY on the ROCm CI nodes. The cause of their flakiness has been identified and the fix will be in ROCm 4.1. See JIRA ticket SWDEV 263833 for details.
2021-01-20 03:32:35 +00:00
Revan Sopher
37d51318b0 Fix handling of TPUStrategy.run() when passing Variables to methods.
If the user function defines "self" as the first argument, we skip it.
Note that this will fail in the weird case of a user function that defines "self" without being a method, in which case the fix (and best practice) would be to name the arg something else.

PiperOrigin-RevId: 352576482
Change-Id: I82622536fd89ce77993bcfe1c65f5f172e8ebcd4
2021-01-19 08:49:06 -08:00
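
A generic illustration of passing a method to strategy.run, the pattern the fix above targets; a non-TPU strategy stands in here, and the class is hypothetical:

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()  # stand-in for TPUStrategy

    class Trainer:
        def step(self, x):
            # "self" is the method's first argument; per the fix above,
            # TPUStrategy.run skips a leading "self" when the callable is a method
            # instead of treating it as a replica input.
            return x + 1.0

    trainer = Trainer()
    result = strategy.run(trainer.step, args=(tf.constant(1.0),))
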
A. Unique TensorFlower
034633f23b PY2 removal cleanup
PiperOrigin-RevId: 352106691
Change-Id: I382d53c64f0d29da430b8cb6d2395a2cb281509e
2021-01-15 16:48:57 -08:00
RJ Skerry-Ryan
102e1f9855 Expand distribute_utils.regroup to work with collections.abc.Mapping-derived containers.
Motivation: This enables user-defined dict-like types inheriting from collections.abc.Mapping to work as return values of functions used with DistributionStrategy.run. Without this change, the entire collection is wrapped in a PerReplica, which breaks assumptions of downstream code.
PiperOrigin-RevId: 352064455
Change-Id: Iefda92654fa73d12ab213abe7ea13e0007201f95
2021-01-15 12:46:22 -08:00
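
A sketch of the user-side pattern this enables: a dict-like type derived from collections.abc.Mapping returned from a replica function. The container class here is hypothetical, and its constructor form is an assumption:

    import collections.abc
    import tensorflow as tf

    class Metrics(collections.abc.Mapping):
        """Hypothetical user-defined dict-like container."""
        def __init__(self, **values):
            self._values = dict(values)
        def __getitem__(self, key):
            return self._values[key]
        def __iter__(self):
            return iter(self._values)
        def __len__(self):
            return len(self._values)

    strategy = tf.distribute.MirroredStrategy()

    def step():
        return Metrics(loss=tf.constant(1.0))

    # With Mapping support in regroup, the values inside the container are
    # regrouped per replica instead of the whole container being wrapped
    # in a single PerReplica.
    per_replica = strategy.run(step)
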