Commit Graph

1423 Commits

Author SHA1 Message Date
Anna R
e1b13e9e55 Access stateful_random_ops algorithm value as int64. Algorithm is specified to have int64 type here: https://cs.opensource.google/tensorflow/tensorflow/+/master:tensorflow/core/ops/stateful_random_ops.cc;l=38?q=stateful_random_ops.cc and below in this file.
PiperOrigin-RevId: 338326682
Change-Id: I24ba7c43258d65cd690611769e715a2616a8130f
2020-10-21 13:29:40 -07:00
Marissa Ikonomidis
1914c410b2 Update context and tfe_wrapper to support mlir_bridge_rollout
Update eager/context.py and tfe_wrapper to support returning
the real value of mlir_bridge_rollout (enabled/disabled/unspecified)
instead of a bool. This gives users a clearer signal of whether
or not the mlir bridge is being used. At the moment, the mlir
bridge is only enabled when mlir_bridge_rollout is set to
enabled, but this will change in the future.

PiperOrigin-RevId: 338124102
Change-Id: I5c93cbdd2815a698e6b41244db8eed716f4988e6
2020-10-20 13:38:03 -07:00
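Note: a minimal illustration of the user-facing knobs tied to the rollout state above; this sketch is not part of the commit itself. Calling neither of the two experimental APIs below leaves the rollout state unspecified.

    import tensorflow as tf

    # Force the MLIR bridge on; the rollout state becomes "enabled".
    tf.config.experimental.enable_mlir_bridge()

    # Or force it off; the rollout state becomes "disabled".
    # tf.config.experimental.disable_mlir_bridge()

    # Calling neither leaves the state "unspecified", letting TensorFlow
    # decide per model during the rollout.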
Eugene Brevdo
84967b39fa [TF] [saved_model_cli] Add support for multithreaded cpu service. Off by default.
This change allows the linkage of multithreaded XLA AOT CPU backend objects,
such as multithreaded matmul, conv2d, etc.  These are not enabled by default.

New unit tests confirm that the objects are emitted and linked correctly,
and the resulting computations are numerically correct.

MKL service backend objects are not included.

Other changes:
* C++ unit tests now use arg_feed_{x,y} instead of arg0/arg1, since those names
  are not stable (they may be swapped relative to the signature)
* Add argument "multithreading=" to the bzl file and saved_model_cli.
* Add unit tests using "nm" to ensure that the proper symbols are used when
  enabling or disabling multithreading (it is unclear whether these are Windows-friendly).
* Use a simpler and more unique string for the entry_point string.

PiperOrigin-RevId: 338112208
Change-Id: Id734e75e63e72db93a743f451ddb7eb6f489c1c7
2020-10-20 12:36:58 -07:00
Marissa Ikonomidis
0c4416e3c2 Update tf_mlir_enable_mlir_bridge to support unspecified
The existing tf_mlir_enable_mlir_bridge flag allows models to
selectively enable or disable the MLIR bridge via TF_XLA_FLAGS. If the
flag is not set, it defaults to false.

In order to slowly and safely rollout the mlir_bridge, we will
need to distinguish between unspecified and forcibly disabled.
If the flag is unspecified, we can selectively choose when the
bridge is enabled. This will allow us to slowly ramp up the
number of models that use the new bridge.

This patch continues to support the existing TF_XLA_FLAG
interface (tf_mlir_enable_mlir_bridge can be set to true or false)
but internally, TensorFlow can now distinguish between false
(forcibly disabled) and unset (unspecified).

PiperOrigin-RevId: 337523318
Change-Id: I8ebb49da104663e12e5c1fa6399a1bf79239a44f
2020-10-16 09:55:12 -07:00
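Note: a hedged sketch of the three states described in the commit above. The flag and environment variable names come from the commit message; the internal enum representing the third state is not shown.

    import os

    # Forcibly enable the bridge:
    os.environ["TF_XLA_FLAGS"] = "--tf_mlir_enable_mlir_bridge=true"

    # Forcibly disable it:
    # os.environ["TF_XLA_FLAGS"] = "--tf_mlir_enable_mlir_bridge=false"

    # Leaving the flag out of TF_XLA_FLAGS entirely is the new third state,
    # "unspecified": TensorFlow may then choose per model whether to use the
    # bridge during the rollout.

    import tensorflow as tf  # must be imported after TF_XLA_FLAGS is set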
Gunhan Gulsoy
d79eb806c5 Remove the alias for tensorflow/core/framework:tensor_testutil
PiperOrigin-RevId: 337311689
Change-Id: Iab3197ffbc07cc1457f8f77af286c4c59674e351
2020-10-15 08:30:48 -07:00
Yuanzhong Xu
ff2b597e36 Resubmit constant folding change without the 1024-byte limit, which was causing tf.where to fail in tf2xla.
PiperOrigin-RevId: 337236352
Change-Id: I44b8a99c0e74f2d4814933e05149e8eab5b04aaa
2020-10-14 21:51:24 -07:00
A. Unique TensorFlower
c201ae4531 Skip computing nodes with known oversized outputs in constant folding.
The threshold on tensor size was previously applied only after the value had been computed, when replacing the old nodes. By that point, the computation could already have caused OOM in large models.

Changed the compilation-to-XLA path to limit TF constant folding to 1024 bytes, since it is only used for getting shapes, and XLA also performs its own constant folding internally.

PiperOrigin-RevId: 337226951
Change-Id: Ib7ebb91950e379cac6978027a7162438eb0a58d2
2020-10-14 20:19:22 -07:00
Yuanzhong Xu
c577eb1a3d Skip computing nodes with known oversized outputs in constant folding.
The threshold on tensor size was previously applied only after the value had been computed, when replacing the old nodes. By that point, the computation could already have caused OOM in large models.

Changed the compilation-to-XLA path to limit TF constant folding to 1024 bytes, since it is only used for getting shapes, and XLA also performs its own constant folding internally.

PiperOrigin-RevId: 337221696
Change-Id: I4cdca20d28141f34b2c85120298bffb89e6df85d
2020-10-14 19:16:25 -07:00
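Note: illustrative pseudologic for the two constant-folding commits above, not the actual Grappler/tf2xla code. The idea is to estimate a node's output size from its inferred shape and dtype before evaluating it, and skip folding when the result is provably oversized.

    import numpy as np

    MAX_CONSTANT_SIZE_BYTES = 1024  # the limit used for the compile-to-XLA path

    def should_fold(output_shape, dtype_size_bytes):
        """Return False for outputs whose size is provably over the limit."""
        if any(dim is None for dim in output_shape):
            return True  # unknown dims: cannot prove the output is oversized
        num_elements = int(np.prod(output_shape)) if output_shape else 1
        return num_elements * dtype_size_bytes <= MAX_CONSTANT_SIZE_BYTES

    print(should_fold([16, 16], 4))      # 1024 bytes -> fold
    print(should_fold([1024, 1024], 4))  # 4 MiB      -> skip before computing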
A. Unique TensorFlower
377ef73611 Support XlaGather and XlaScatter ops for bool.
The underlying XLA ops already work for bool.

PiperOrigin-RevId: 337106941
Change-Id: Ifd920287f1113b9d6140489e8518613547289977
2020-10-14 09:47:40 -07:00
Brian Patton
4e330bfbd1 Exposes variadic reduce to TF python via xla.py
PiperOrigin-RevId: 336936774
Change-Id: Iee8bf17a18594abe2a301802eade9e7c4f4c5e34
2020-10-13 13:16:03 -07:00
Brian Patton
dd21d56ae8 Adds double support for StatelessMultinomial (which is really a categorical).
Relevant to https://github.com/tensorflow/probability/issues/1127

PiperOrigin-RevId: 336903719
Change-Id: Id46908a9e57874387ef77208a21eb6ce34e6a9b1
2020-10-13 10:34:21 -07:00
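Note: a small sketch of the surface this touches, assuming the public tf.random.stateless_categorical wrapper over StatelessMultinomial; values are illustrative.

    import tensorflow as tf

    # float64 (double) logits are accepted after this change.
    logits = tf.constant([[0.1, 0.5, 0.4]], dtype=tf.float64)
    samples = tf.random.stateless_categorical(logits, num_samples=4, seed=[7, 42])
    print(samples)  # int64 class indices drawn from the categorical distribution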
Gunhan Gulsoy
0af213f96c Remove dependencies on aliases in tensorflow/core/BUILD
PiperOrigin-RevId: 336590418
Change-Id: I9f7207c64c73867a0bde9a801f26b3785f67864e
2020-10-11 21:03:09 -07:00
Lucy Fox
e87c9e1b85 Fix parameter name mismatch.
Making clang-tidy happy.

PiperOrigin-RevId: 336204904
Change-Id: Ife34458a3d2391f8cc386b0f0d938c5721ef0368
2020-10-08 18:28:28 -07:00
Andy Ly
c0da1d4092 Update CompileGraphToXlaHlo to populate target/control ret nodes.
This is in preparation for updating graph pruning to always prune imported function graphs.

PiperOrigin-RevId: 335944889
Change-Id: I3f6156aa08384883eee6227210f8fc8f1b7cc575
2020-10-07 14:07:41 -07:00
Marissa Ikonomidis
3708f73481 Support Session's ConfigProto in TF2 MLIR bridge
Some models are using "TF2" but also using Session and passing
a ConfigProto. This is TF1 code running on the TF2 MLIR bridge.
The TF2 MLIR bridge assumed that this was not possible. This CL
updates the TF2 version of the MLIR bridge to support a ConfigProto
passed in via Session.

Disable MLIR bridge presubmit testing of saved_model_test.py because this fix reveals that the test is actually broken. It is using TF2 control flow but loading a model with Session.

PiperOrigin-RevId: 335915669
Change-Id: Ib50bef389449ce0011878dd50b73856e9c520289
2020-10-07 11:44:49 -07:00
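Note: a hedged sketch of the pattern the commit above describes, i.e. Session-based code passing a ConfigProto inside an otherwise TF2 program; whether the TF2 MLIR bridge is active depends on the build and rollout settings.

    import tensorflow.compat.v1 as tf1

    tf1.disable_eager_execution()  # TF1-style graph/Session code

    config = tf1.ConfigProto()
    config.intra_op_parallelism_threads = 2  # any per-session option

    with tf1.Session(config=config) as sess:
        x = tf1.placeholder(tf1.float32, shape=[2])
        y = x * 2.0
        print(sess.run(y, feed_dict={x: [1.0, 2.0]}))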
A. Unique TensorFlower
c5d4acd09a Internal change
PiperOrigin-RevId: 335680049
Change-Id: I91e6edc767caf596d3cf1a28c075cc87388043e2
2020-10-06 12:14:02 -07:00
Marissa Ikonomidis
0eda09a3fb Update tf_mlir_enable_mlir_bridge to support unspecified
The existing tf_mlir_enable_mlir_bridge flag allows models to
selectively enable or disable the MLIR bridge via TF_XLA_FLAGS. If the
flag is not set, it defaults to false.

In order to slowly and safely rollout the mlir_bridge, we will
need to distinguish between unspecified and forcibly disabled.
If the flag is unspecified, we can selectively choose when the
bridge is enabled. This will allow us to slowly ramp up the
number of models that use the new bridge.

This patch continues to support the existing TF_XLA_FLAG
interface (tf_mlir_enable_mlir_bridge can be set to true or false)
but internally, TensorFlow can now distinguish between false
(forcibly disabled) and unset (unspecified).

PiperOrigin-RevId: 335662030
Change-Id: Iefc44436620e52ff21a72583d57ebf29124a2691
2020-10-06 10:37:32 -07:00
Russell Power
5b5aab7f63 Internal change
PiperOrigin-RevId: 335147548
Change-Id: Ib445cfbcb28421b4eb522d4d9524e4a64fe631df
2020-10-02 20:33:42 -07:00
A. Unique TensorFlower
4eb05c3014 Use macro helpers for TPU builds and clean up define flags.
PiperOrigin-RevId: 334732331
Change-Id: Ice5d240cf785d64d11d4f634ff8955933da26b4d
2020-09-30 20:13:49 -07:00
Russell Power
45d693198d Use macro helpers for TPU builds and clean up define flags.
PiperOrigin-RevId: 334725778
Change-Id: Ib0c04366bd9e460329775075d82e7cfd47ed6d4e
2020-09-30 19:10:21 -07:00
Gunhan Gulsoy
ac21f961f3 Remove portable_tf2xla_proto rule.
PiperOrigin-RevId: 334682641
Change-Id: I3def109b64ba10bca43bcd26cfe6a1cd6ec9f8b1
2020-09-30 14:52:46 -07:00
A. Unique TensorFlower
fa75523767 Add shape inference function for XlaScatter
PiperOrigin-RevId: 334603580
Change-Id: Idc193c32dd429cdf7f14a8496c6340c4e7b803b4
2020-09-30 08:31:04 -07:00
Scott Zhu
81a44fd840 Internal BUILD file change.
PiperOrigin-RevId: 334450244
Change-Id: Ia293bf03f063900ed65c367f1978ad286ddebb06
2020-09-29 13:44:38 -07:00
TensorFlower Gardener
ab55c62645 Merge pull request from kaixih:reduce_ops_layout
PiperOrigin-RevId: 333786383
Change-Id: Ifefb0a3d23cf7779858e7c011fd4195024ab9dc5
2020-09-25 12:56:20 -07:00
Wenhao Jia
45f9c4a4ea Add target environment constraints to more TensorFlow packages and targets.
PiperOrigin-RevId: 333668585
Change-Id: Id2312d95ff91c0ef662b750fd45bd95e8711753c
2020-09-24 22:38:24 -07:00
Gunhan Gulsoy
b5548734b9 Remove references to tf_proto_library in TF.
Merge all language specific proto libraries into just tf_proto_library.

PiperOrigin-RevId: 333400278
Change-Id: Ic891331668db3e562d42805295eade90fd017e91
2020-09-23 16:51:13 -07:00
Kaixi Hou
7a38d3fd96 update xla data format map ops
2020-09-23 11:51:26 -07:00
Peter Hawkins
3daf30f97d [XLA] Add support for complex numbers to Qr decomposition expander.
PiperOrigin-RevId: 333208193
Change-Id: Ic9adc699a11ffcc23a0ae518b54ee29cce8569ce
2020-09-22 19:36:07 -07:00
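Note: a minimal sketch exercising the capability above from TF, assuming an XLA-compiled tf.function (experimental_compile was the flag name in this era; it was later renamed jit_compile).

    import tensorflow as tf

    a = tf.complex(tf.random.normal([3, 3]), tf.random.normal([3, 3]))  # complex64

    @tf.function(experimental_compile=True)
    def qr(x):
        return tf.linalg.qr(x, full_matrices=True)

    q, r = qr(a)
    print(tf.reduce_max(tf.abs(tf.matmul(q, r) - a)))  # reconstruction error, ~0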
George Karpenkov
0e718f2b0a [TF2XLA] Remove the serialization of CustomKernelCreator, since there is only one, and we won't add new ones
Serialization adds a new surface area for bugs, as not all the callers
propagate the CustomKernelCreator correctly.  Moreover, the mechanism is quite
hacky and in the future we plan to potentially switch to a different one.

PiperOrigin-RevId: 333111910
Change-Id: I5a02200dfdffde657bd5d9e4547c470d8644d892
2020-09-22 10:59:59 -07:00
A. Unique TensorFlower
7d3979c5ce Add XLA implementation for tensor_scatter_nd_min and tensor_scatter_nd_max, and implement gradient for these functions.
PiperOrigin-RevId: 332948708
Change-Id: Ic5e3c138cd04a91a6d1fb1bccad464d146facadf
2020-09-21 15:40:42 -07:00
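Note: a sketch of the public surface for the commit above, assuming the tf.tensor_scatter_nd_min/max wrappers; values are illustrative.

    import tensorflow as tf

    t = tf.constant([5.0, 5.0, 5.0, 5.0])
    indices = tf.constant([[1], [3]])
    updates = tf.Variable([2.0, 9.0])

    with tf.GradientTape() as tape:
        out = tf.tensor_scatter_nd_min(t, indices, updates)  # -> [5., 2., 5., 5.]

    # Gradient flows only to updates that "win" the element-wise min.
    print(out, tape.gradient(out, updates))  # grads ~ [1., 0.]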
Wenhao Jia
8f1362de18 Add target environment constraints to a subset of TensorFlow packages and targets.
PiperOrigin-RevId: 332884872
Change-Id: I65691fa2021c065e6c2ab57815d5a2b342d30ee2
2020-09-21 10:57:01 -07:00
Srinivas Vasudevan
b946521465 Add XLA registration for Polygamma, to be used for Digamma gradients.
PiperOrigin-RevId: 332557647
Change-Id: I97fba661240412a49716544c5cf106a700ab89a4
2020-09-18 17:40:20 -07:00
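Note: a small sketch of what the registration above enables, assuming Digamma's gradient is computed via Polygamma inside an XLA-compiled function (experimental_compile was later renamed jit_compile).

    import tensorflow as tf

    @tf.function(experimental_compile=True)
    def digamma_and_grad(x):
        with tf.GradientTape() as tape:
            tape.watch(x)
            y = tf.math.digamma(x)
        return y, tape.gradient(y, x)  # d/dx digamma(x) = polygamma(1, x)

    print(digamma_and_grad(tf.constant([1.0, 2.0, 3.0])))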
Srinivas Vasudevan
c8ee679b8e Add XLA registration for Zeta/Zetac
PiperOrigin-RevId: 332509521
Change-Id: If93869e3e4732cb09e85ec9cc61b2fe085fdb1ce
2020-09-18 13:04:32 -07:00
A. Unique TensorFlower
6d605dde0a Internal change
PiperOrigin-RevId: 332379487
Change-Id: Ie43ff74a010bcc9893f6c358b2e56b27ef67fcac
2020-09-17 21:37:45 -07:00
A. Unique TensorFlower
b1c97a0bb2 Internal change
PiperOrigin-RevId: 332361104
Change-Id: I1f66d7fa0a7fa5e48656232278ae9e22f26f4747
2020-09-17 18:49:00 -07:00
Peng Wang
e922e10a0f Adds a new set of stateless RNG ops, and rebases existing stateless-RNG Python API and tf.random.Generator onto them.
The new ops have three differences from existing (old) stateless RNG ops:
* They take in `key` and `counter` instead of `seed` (thus no seed scrambling).
* They take in an `alg` argument to control which RNG algorithm to use, unlike the old ones, which pick the algorithm based on the device.
* They don't have `HostMemory` constraints on `key` and `counter` (the old ones have such constraints on `seed`).

Two new ops `StatelessRandomGetKeyCounterAlg` and `RngReadAndSkip` are also added to bridge the gaps between the new stateless ops and the Python API for stateless RNGs and tf.random.Generator, so that the Python API's behavior doesn't change.

Also adds set_soft_device_placement(False) to tests to control which kernels are tested.

PiperOrigin-RevId: 332346574
Change-Id: Ibe0e41cccce82e50b5581ea6298218efb163157a
2020-09-17 17:06:46 -07:00
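Note: a hedged sketch of the unchanged Python surface described above; tf.random.Generator and the stateless API keep their behavior, with the key/counter-based kernels as an implementation detail underneath (in older TF releases the Generator class lives under tf.random.experimental).

    import tensorflow as tf

    g = tf.random.Generator.from_seed(1234)   # state-backed generator
    print(g.normal(shape=[2, 3]))

    print(tf.random.stateless_uniform([2], seed=[1, 2]))  # seed-based API unchanged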
Yunxing Dai
6e71a34542 [XLA] Support strided slice grad in a while loop.
- The inputs to strided slice are stored in a stack for the backward pass.
- Popping items from the stack makes the input unknown in the backward pass.
- When the begins and ends are unknown, lower the strided slice grad into a dynamic update slice instead.

PiperOrigin-RevId: 330987081
Change-Id: I0116a02f2fd7d660b49757622afc9934bb4b37e6
2020-09-10 12:12:25 -07:00
Mehdi Amini
dfd9dedc17 Cleanup: remove remaining uses of Dialect registration from TensorFlow (NFC)
PiperOrigin-RevId: 330612300
Change-Id: I75abfceec5bbedc5a1c4404a4fbf43467bbe45a6
2020-09-08 16:51:40 -07:00
TensorFlower Gardener
b2f737e6ae Merge pull request from trentlo:parallel-reduce
PiperOrigin-RevId: 329867466
Change-Id: I899ad5926aa2379a302435cd894457f30efb7d15
2020-09-03 00:36:44 -07:00
Andy Ly
0651d1ac60 Update CompileGraphToXlaHlo to use llvm::ArrayRef<XlaArgument> instead of llvm::ArrayRef<const XlaArgument> (NFC).
This allows std::vector<XlaArgument> and llvm::SmallVector argument parameters to be passed to CompileGraphToXlaHlo under different builds.

PiperOrigin-RevId: 329757301
Change-Id: I1025f3106af21b2672e2157c3f5b80af07ef0d0f
2020-09-02 11:54:55 -07:00
Yunxing Dai
9c703cc790 Add xla.set_bound op.
For cases where we cannot infer the bound of a value, compilation would fail. This gives users an escape hatch.

PiperOrigin-RevId: 329626655
Change-Id: Ib5d71054088692697eaf5f2b21c0c5d1a097f1eb
2020-09-01 19:11:00 -07:00
Blake Hechtman
22f5d50f9a [TF2XLA] Make dequantize support dynamic range.
PiperOrigin-RevId: 329444190
Change-Id: Icac703969a95093dd7982820c1706887a50d1bce
2020-08-31 22:28:10 -07:00
George Karpenkov
c1c35ab0b8 [rollback of rollback] [TF2XLA] Do not copy in XLA device implementation; instead, request correct placement from the start.
PiperOrigin-RevId: 329411457
Change-Id: Icc349cbf3d7a3aec9e43fb92605cb468a759648f
2020-08-31 17:54:50 -07:00
Russell Power
1619f2f19f Implement TensorStridedSliceAssign XLA op.
PiperOrigin-RevId: 329315230
Change-Id: I5aca22493f5fa38fcd03a3f78f6d9e9afdaadb8b
2020-08-31 09:25:05 -07:00
A. Unique TensorFlower
16d2b56de2 Add op definition and bridge implementation for TensorStridedSliceUpdate.
PiperOrigin-RevId: 329122036
Change-Id: I616b81ec2b328eddc77f662c77ae956972cd265a
2020-08-29 14:32:54 -07:00
Russell Power
04eeb6d145 Add op definition and bridge implementation for TensorStridedSliceUpdate.
PiperOrigin-RevId: 329118405
Change-Id: Ib8ca243f37d2728efb327d4ef52c7b2a53790c3e
2020-08-29 13:28:58 -07:00
A. Unique TensorFlower
520cdbfaa6 [TF2XLA] Do not copy in XLA device implementation; instead, request correct placement from the start.
PiperOrigin-RevId: 329013120
Change-Id: I9c7a180658b8664c9b6ad689be26b2ac4709419f
2020-08-28 14:52:34 -07:00
George Karpenkov
c287ba8d5e [TF2XLA] Do not copy in XLA device implementation; instead, request correct placement from the start.
PiperOrigin-RevId: 328989681
Change-Id: Ia57d5cd510091e94f58081d58e775d97e8c5ba9e
2020-08-28 12:41:21 -07:00
Trent Lo
8daab75490 XLA Parallel reduce.
Extend the XLA codegen to generate parallel reductions when there are multiple
reduce instructions in a fusion computation.

We see ~3% e2e gain for NVIDIA JoC BERT.

For `ManyParallelReductions` with 128 reduce instructions in the unit test, the
execution time is reduced from 325us to 3.9us (83X), as reported by nvprof below.

Before:

            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
                   32.50%  325.54us         1  325.54us  325.54us  325.54us  fusion

After:

            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
                    0.59%  3.9030us         1  3.9030us  3.9030us  3.9030us  fusion
2020-08-27 15:21:33 -07:00
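Note: a loose illustration of the pattern the codegen change above targets: several independent reductions that XLA may place in a single fusion and now execute in parallel (experimental_compile was later renamed jit_compile).

    import tensorflow as tf

    @tf.function(experimental_compile=True)  # XLA compilation
    def many_reductions(x):
        # Multiple reduce instructions over the same operand; XLA may fuse them.
        return tf.reduce_sum(x, axis=1), tf.reduce_max(x, axis=1), tf.reduce_min(x, axis=1)

    out = many_reductions(tf.random.normal([128, 1024]))
    print([t.shape for t in out])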
George Karpenkov
880678767f [TF2XLA] [NFC] Remove misleading comment
PiperOrigin-RevId: 328650351
Change-Id: I6fcfa84da12e248116c270d9d70e440c7e70a5f7
2020-08-26 18:31:17 -07:00