[XLA] Update documentation on tf.function(experimental_compile=True)

PiperOrigin-RevId: 296532815
Change-Id: Ibacf4929fc689dce87433a4e063b923b24ef9773
George Karpenkov 2020-02-21 16:15:38 -08:00 committed by TensorFlower Gardener
parent 4e3e368ae6
commit 56607b6ea5
4 changed files with 57 additions and 36 deletions


@@ -36,6 +36,5 @@ upper_tabs:
path: /xla/tutorials/autoclustering_xla
- title: Use XLA with tf.function
path: /xla/tutorials/compile
status: experimental
- include: /_upper_tabs_right.yaml


@@ -47,27 +47,13 @@ removing memory operations is one of the best ways to improve performance.
The simplest way to start using XLA in TensorFlow models is to enable
_auto-clustering_, which automatically finds _clusters_ (connected subgraphs)
within the TensorFlow graph which can be compiled and executed using XLA.
Auto-clustering on GPU can be enabled either by setting the `TF_XLA_FLAGS`
environment variable:
```
$ TF_XLA_FLAGS=--tf_xla_auto_jit=2 path/to/your/tf/program
```
Or by setting a configuration value within the program:
```
import tensorflow as tf
tf.config.optimizer.set_jit(True)
# ... the rest of your program ...
```
Note: The JIT level is cached for a session, and can only be set at the very
beginning of the program. In order to change it midway through, the session
needs to be cleared: `tf.keras.backend.clear_session()`
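For instance, a minimal sketch of switching the JIT setting between two parts
of a program (the model-building code is elided and purely illustrative) could
look like:
```
import tensorflow as tf

# Enable XLA auto-clustering for the first part of the program.
tf.config.optimizer.set_jit(True)
# ... build and train the first model ...

# The JIT level is cached with the session, so clear the session
# before changing the setting for the rest of the program.
tf.keras.backend.clear_session()
tf.config.optimizer.set_jit(False)
# ... build and train the second model ...
```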
Auto-clustering is currently optimized for GPU workloads, but it can also be
enabled on CPU by additionally using the flag `--tf_xla_cpu_global_jit`:
@@ -75,27 +61,63 @@ enabled on CPU by additionally using the flag `--tf_xla_cpu_global_jit`:
```
$ TF_XLA_FLAGS="--tf_xla_auto_jit=2 --tf_xla_cpu_global_jit" path/to/your/program
```
Note: Auto-clustering support on CPU and on multi-GPU environments is
experimental.
For a detailed usage example, see the
[auto-clustering tutorial colab](./tutorials/autoclustering_xla.ipynb).
### Explicit compilation with tf.function
Auto-clustering is a great tool for making the model faster without any changes
to the code, but it may be hard to understand what changes have been performed.
Explicit compilation API offers a more fine-grained control for choosing which
functions should be compiled with XLA. However, it might require restructuring
of the source code, as not all TensorFlow operations can be represented in XLA.

Optimizing sections of the program using
[`tf.function`](https://www.tensorflow.org/api_docs/python/tf/function) is a
standard approach for [improving
performance](https://www.tensorflow.org/tutorials/customization/performance) of
TF2 programs. You can enable compilation with XLA by setting the
`experimental_compile` argument of `tf.function` to `True`.

Note: Using the explicit compilation API on functions which cannot be
represented in XLA results in an exception.

For example, the following TensorFlow function, which performs the MNIST
training step, is compiled with XLA:
```
@tf.function(experimental_compile=True)
def train_mnist(images, labels):
  images, labels = cast(images, labels)

  with tf.GradientTape() as tape:
    predicted_labels = layer(images)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=predicted_labels, labels=labels
    ))
  layer_variables = layer.trainable_variables
  grads = tape.gradient(loss, layer_variables)
  optimizer.apply_gradients(zip(grads, layer_variables))
```
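The snippet above assumes that `cast`, `layer` and `optimizer` are defined
elsewhere in the program (the tutorial colab defines similar helpers); a
minimal, purely illustrative set of stand-ins that makes it runnable might be:
```
import tensorflow as tf

# Hypothetical stand-ins for the helpers used by train_mnist above.
layer = tf.keras.layers.Dense(10)       # a single dense classification layer
optimizer = tf.keras.optimizers.Adam()

def cast(images, labels):
  # Flatten 28x28 images, scale to [0, 1] and use integer labels.
  images = tf.reshape(tf.cast(images, tf.float32) / 255.0, [-1, 784])
  labels = tf.cast(labels, tf.int64)
  return images, labels
```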
The `experimental_compile` API has _must-compile_ semantics: either the entire
function is compiled with XLA, or an `errors.InvalidArgumentError` exception is
thrown. XLA cannot currently compile functions where dimensions are not
_inferrable_: that is, if it's not possible to infer the dimensions of all
tensors without running the entire computation. For example, the following
function will not compile:
```
@tf.function
def not_compilable(x):
  return tf.unique(x)
```
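Under the must-compile semantics described above, calling such a function
raises an error at call time. A small sketch of what that can look like (the
exact error message will vary between versions):
```
import tensorflow as tf

@tf.function(experimental_compile=True)
def strict_unique(x):
  # The output size depends on the values in x, so its shape cannot be
  # inferred without running the computation and XLA refuses to compile it.
  return tf.unique(x)

try:
  strict_unique(tf.constant([1, 2, 2, 3]))
except tf.errors.InvalidArgumentError as e:
  print("Not compilable:", e)
```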
Shapes can, however, vary across runs; the function is recompiled for each new
shape:
```
@tf.function(experimental_compile=True)
def recompiled_on_launch(a, b):
  return a + b

recompiled_on_launch(tf.ones([1, 10]), tf.ones([1, 10]))
recompiled_on_launch(tf.ones([1, 100]), tf.ones([1, 100]))
```
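One rough way to observe this (illustrative only; the tracing cache is an
implementation detail) is to add a Python-level print, which only runs when
the function is traced and compiled for a new input shape:
```
import tensorflow as tf

@tf.function(experimental_compile=True)
def add(a, b):
  # A Python print only fires while tracing, i.e. when a new input shape is seen.
  print("Compiling for shape", a.shape)
  return a + b

add(tf.ones([1, 10]), tf.ones([1, 10]))    # traces and compiles, prints once
add(tf.ones([1, 10]), tf.ones([1, 10]))    # cached: no print
add(tf.ones([1, 100]), tf.ones([1, 100]))  # new shape: traced and compiled again
```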
See the [tutorial colab](./tutorials/compile.ipynb) for a more detailed usage
example.
### AOT (Ahead-of-time) compilation for CPU with `tfcompile`


@@ -45,9 +45,9 @@
"source": [
"# Classifying CIFAR-10 with XLA\n",
"\n",
"In this colab we train a TensorFlow model to classify the [CIFAR-10](https://en.wikipedia.org/wiki/CIFAR-10) dataset, and we compile it using XLA.\n",
"This tutorial trains a TensorFlow model to classify the [CIFAR-10](https://en.wikipedia.org/wiki/CIFAR-10) dataset, and we compile it using XLA.\n",
"\n",
"We start by loading and normalizing the dataset using the Keras API:"
"Load and normalize the dataset using the Keras API:"
]
},
{
@@ -197,7 +197,8 @@
},
"outputs": [],
"source": [
"tf.keras.backend.clear_session() # We need to clear the session to enable JIT in the middle of the program.\n",
"# We need to clear the session to enable JIT in the middle of the program.\n",
"tf.keras.backend.clear_session()\n",
"tf.config.optimizer.set_jit(True) # Enable XLA.\n",
"model = compile_model(generate_model())\n",
"(x_train, y_train), (x_test, y_test) = load_data()\n",


@@ -87,7 +87,6 @@
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"tf.compat.v1.enable_eager_execution()"
]
},