[XLA] Update documentation on tf.function(experimental_compile=True)
PiperOrigin-RevId: 296532815
Change-Id: Ibacf4929fc689dce87433a4e063b923b24ef9773
commit 56607b6ea5 (parent 4e3e368ae6)
@@ -36,6 +36,5 @@ upper_tabs:
        path: /xla/tutorials/autoclustering_xla
      - title: Use XLA with tf.function
        path: /xla/tutorials/compile
        status: experimental

  - include: /_upper_tabs_right.yaml
@@ -47,27 +47,13 @@ removing memory operations is one of the best ways to improve performance.

The simplest way to start using XLA in TensorFlow models is to enable
_auto-clustering_, which automatically finds _clusters_ (connected subgraphs)
within the TensorFlow graph which can be compiled and executed using XLA.
Auto-clustering on GPU can be enabled by setting the `TF_XLA_FLAGS` environment
variable:

```
$ TF_XLA_FLAGS=--tf_xla_auto_jit=2 path/to/your/tf/program
```

Or by setting a configuration value within the program:

```
import tensorflow as tf

tf.config.optimizer.set_jit(True)

# ... the rest of your program ...
```

Note: The JIT level is cached for a session, and can only be set in the very
beginning of the program. In order to change it midway through, the session
needs to be cleared: `tf.keras.backend.clear_session()`
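
As a concrete sketch of the note above (both calls are the ones this doc
names), changing the JIT level midway through a program looks like:

```
import tensorflow as tf

# The JIT level is cached per session, so clear the session first,
# then set the new level before building any further models.
tf.keras.backend.clear_session()
tf.config.optimizer.set_jit(True)  # enable XLA auto-clustering from here on
```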

Auto-clustering is currently optimized for GPU workloads, but it can also be
enabled on CPU by additionally using the flag `--tf_xla_cpu_global_jit`:
@@ -75,27 +61,63 @@ enabled on CPU by additionally using the flag `--tf_xla_cpu_global_jit`:

```
$ TF_XLA_FLAGS="--tf_xla_auto_jit=2 --tf_xla_cpu_global_jit" path/to/your/program
```

Note: Auto-clustering support on CPU and on multi-GPU environments is
experimental.

For a detailed usage example, see the
[auto-clustering tutorial colab](./tutorials/autoclustering_xla.ipynb).

### Explicit compilation with tf.function

Auto-clustering is a great tool for making the model faster without any changes
to the code, but it may be hard to understand what changes have been performed.

The explicit compilation API offers more fine-grained control for choosing
which functions should be compiled.

Note: Using the explicit compilation API on functions which cannot be
represented in XLA results in an exception.

For example, the following TensorFlow function, which performs an MNIST
training step, is compiled with XLA:

```
@tf.function(experimental_compile=True)
def train_mnist(images, labels):
  images, labels = cast(images, labels)

  with tf.GradientTape() as tape:
    predicted_labels = layer(images)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=predicted_labels, labels=labels
    ))
  layer_variables = layer.trainable_variables
  grads = tape.gradient(loss, layer_variables)
  optimizer.apply_gradients(zip(grads, layer_variables))
```
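
The snippet above uses `cast`, `layer`, and `optimizer` without defining them;
the tutorial defines them elsewhere. A minimal sketch of plausible definitions
(the layer shape, optimizer choice, and preprocessing here are illustrative
assumptions, not the tutorial's exact code):

```
import tensorflow as tf

# Illustrative assumptions only; the tutorial defines its own versions.
layer = tf.keras.layers.Dense(10)           # one dense layer producing 10 logits
optimizer = tf.keras.optimizers.SGD(0.01)   # plain SGD, learning rate assumed

def cast(images, labels):
  # Flatten 28x28 images, scale to [0, 1], and cast labels for the sparse loss.
  images = tf.reshape(tf.cast(images, tf.float32) / 255.0, [-1, 784])
  labels = tf.cast(labels, tf.int64)
  return images, labels
```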

The `experimental_compile` API has _must-compile_ semantics: either the entire
function is compiled with XLA, or an `errors.InvalidArgumentError` exception is
thrown. XLA cannot currently compile functions where dimensions are not
_inferrable_: that is, when it is not possible to infer the dimensions of all
tensors without running the entire computation. For example, the following
function will not compile:

```
@tf.function(experimental_compile=True)
def not_compilable(x):
  return tf.unique(x)
```
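
To make the failure mode concrete, here is a hedged sketch that triggers and
catches the exception (assuming, per the text above, that
`errors.InvalidArgumentError` is raised when the compiled call is launched):

```
import tensorflow as tf

@tf.function(experimental_compile=True)
def not_compilable(x):
  return tf.unique(x)  # output size depends on the values in x

# Must-compile semantics: the call fails instead of silently falling back
# to the regular TensorFlow executor.
try:
  not_compilable(tf.constant([1, 2, 2, 3]))
except tf.errors.InvalidArgumentError as e:
  print("XLA refused to compile:", e)
```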

Shapes can vary across runs, though; each new input shape triggers a fresh
compilation:

```
@tf.function(experimental_compile=True)
def recompiled_on_launch(a, b):
  return a + b

recompiled_on_launch(tf.ones([1, 10]), tf.ones([1, 10]))
recompiled_on_launch(tf.ones([1, 100]), tf.ones([1, 100]))
```
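
One way to observe the recompilation (a sketch; the Python-side `print` runs
only when a new input signature forces `tf.function` to retrace, which is when
XLA compiles a new specialization):

```
import tensorflow as tf

@tf.function(experimental_compile=True)
def recompiled_on_launch(a, b):
  print("Tracing for shape:", a.shape)  # Python print: runs only on retrace
  return a + b

recompiled_on_launch(tf.ones([1, 10]), tf.ones([1, 10]))    # traces and compiles
recompiled_on_launch(tf.ones([1, 10]), tf.ones([1, 10]))    # reuses the program
recompiled_on_launch(tf.ones([1, 100]), tf.ones([1, 100]))  # new shape: compiles again
```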

See the [tutorial colab](./tutorials/compile.ipynb) for a more detailed usage
example.

### AOT (Ahead-of-time) compilation for CPU with `tfcompile`
@@ -45,9 +45,9 @@
   "source": [
    "# Classifying CIFAR-10 with XLA\n",
    "\n",
    "This tutorial trains a TensorFlow model to classify the [CIFAR-10](https://en.wikipedia.org/wiki/CIFAR-10) dataset, and compiles the model using XLA.\n",
    "\n",
    "Load and normalize the dataset using the Keras API:"
   ]
  },
  {
@@ -197,7 +197,8 @@
  },
   "outputs": [],
   "source": [
    "# We need to clear the session to enable JIT in the middle of the program.\n",
    "tf.keras.backend.clear_session()\n",
    "tf.config.optimizer.set_jit(True) # Enable XLA.\n",
    "model = compile_model(generate_model())\n",
    "(x_train, y_train), (x_test, y_test) = load_data()\n",
@@ -87,7 +87,6 @@
   "outputs": [],
   "source": [
    "import tensorflow as tf\n"
   ]
  },