[XLA] Update documentation on tf.function(experimental_compile=True)

PiperOrigin-RevId: 296532815
Change-Id: Ibacf4929fc689dce87433a4e063b923b24ef9773
George Karpenkov 2020-02-21 16:15:38 -08:00 committed by TensorFlower Gardener
parent 4e3e368ae6
commit 56607b6ea5
4 changed files with 57 additions and 36 deletions


@@ -36,6 +36,5 @@ upper_tabs:
path: /xla/tutorials/autoclustering_xla
- title: Use XLA with tf.function
path: /xla/tutorials/compile
status: experimental
- include: /_upper_tabs_right.yaml


@@ -47,27 +47,13 @@ removing memory operations is one of the best ways to improve performance.
The simplest way to start using XLA in TensorFlow models is to enable
_auto-clustering_, which automatically finds _clusters_ (connected subgraphs)
within the TensorFlow graph which can be compiled and executed using XLA.
Auto-clustering on GPU can be enabled either by setting the `TF_XLA_FLAGS`
environment variable:
```
$ TF_XLA_FLAGS=--tf_xla_auto_jit=2 path/to/your/tf/program
```
Or by setting a configuration value within the program:
```
import tensorflow as tf
tf.config.optimizer.set_jit(True)
# ... the rest of your program ...
```
Note: The JIT level is cached for a session, and can only be set at the very
beginning of the program. In order to change it midway through, the session
needs to be cleared: `tf.keras.backend.clear_session()`
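For instance, a minimal sketch of switching the JIT setting between two parts
of a program (the model-building code is elided and purely illustrative) could
look like:
```
import tensorflow as tf

# Enable XLA auto-clustering for the first part of the program.
tf.config.optimizer.set_jit(True)
# ... build and train the first model ...

# The JIT level is cached with the session, so clear the session
# before changing the setting for the rest of the program.
tf.keras.backend.clear_session()
tf.config.optimizer.set_jit(False)
# ... build and train the second model ...
```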
Auto-clustering is currently optimized for GPU workloads, but it can also be
enabled on CPU by additionally using the flag `--tf_xla_cpu_global_jit`:
@@ -75,27 +61,63 @@ enabled on CPU by additionally using the flag `--tf_xla_cpu_global_jit`:
```
$ TF_XLA_FLAGS="--tf_xla_auto_jit=2 --tf_xla_cpu_global_jit" path/to/your/program
```
Note: Auto-clustering support on CPU and on multi-GPU environments is
experimental.
For a detailed usage example, see the
[auto-clustering tutorial colab](./tutorials/autoclustering_xla.ipynb).
### Explicit compilation with tf.function
Auto-clustering is a great tool for making the model faster without any changes
to the code, but it may be hard to understand what changes have been performed.
Explicit compilation API offers a more fine-grained control for choosing which
functions should be compiled with XLA. However, it might require restructuring
of the source code, as not all TensorFlow operations can be represented in XLA.

Optimizing sections of the program using
[`tf.function`](https://www.tensorflow.org/api_docs/python/tf/function) is a
standard approach for [improving
performance](https://www.tensorflow.org/tutorials/customization/performance) of
TF2 programs. You can enable compilation with XLA by setting the
`experimental_compile` argument of `tf.function` to `True`.

Note: Using the explicit compilation API on functions which cannot be
represented in XLA results in an exception.

For example, the following TensorFlow function, which performs the MNIST
training step, is compiled with XLA:
```
@tf.function(experimental_compile=True)
def train_mnist(images, labels):
  images, labels = cast(images, labels)

  with tf.GradientTape() as tape:
    predicted_labels = layer(images)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=predicted_labels, labels=labels
    ))
  layer_variables = layer.trainable_variables
  grads = tape.gradient(loss, layer_variables)
  optimizer.apply_gradients(zip(grads, layer_variables))
```
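The snippet above assumes that `cast`, `layer` and `optimizer` are defined
elsewhere in the program (the tutorial colab defines similar helpers); a
minimal, purely illustrative set of stand-ins that makes it runnable might be:
```
import tensorflow as tf

# Hypothetical stand-ins for the helpers used by train_mnist above.
layer = tf.keras.layers.Dense(10)       # a single dense classification layer
optimizer = tf.keras.optimizers.Adam()

def cast(images, labels):
  # Flatten 28x28 images, scale to [0, 1] and use integer labels.
  images = tf.reshape(tf.cast(images, tf.float32) / 255.0, [-1, 784])
  labels = tf.cast(labels, tf.int64)
  return images, labels
```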
The `experimental_compile` API has _must-compile_ semantics: either the entire
function is compiled with XLA, or an `errors.InvalidArgumentError` exception is
thrown. XLA cannot currently compile functions where dimensions are not
_inferrable_: that is, if it's not possible to infer the dimensions of all
tensors without running the entire computation. For example, the following
function will not compile:
```
@tf.function
def not_compilable(x):
  return tf.unique(x)
```
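Under the must-compile semantics described above, calling such a function
raises an error at call time. A small sketch of what that can look like (the
exact error message will vary between versions):
```
import tensorflow as tf

@tf.function(experimental_compile=True)
def strict_unique(x):
  # The output size depends on the values in x, so its shape cannot be
  # inferred without running the computation and XLA refuses to compile it.
  return tf.unique(x)

try:
  strict_unique(tf.constant([1, 2, 2, 3]))
except tf.errors.InvalidArgumentError as e:
  print("Not compilable:", e)
```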
Shapes can, however, vary across runs; the function is recompiled for each new
shape:
```
@tf.function(experimental_compile=True)
def recompiled_on_launch(a, b):
  return a + b

recompiled_on_launch(tf.ones([1, 10]), tf.ones([1, 10]))
recompiled_on_launch(tf.ones([1, 100]), tf.ones([1, 100]))
```
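One rough way to observe this (illustrative only; the tracing cache is an
implementation detail) is to add a Python-level print, which only runs when
the function is traced and compiled for a new input shape:
```
import tensorflow as tf

@tf.function(experimental_compile=True)
def add(a, b):
  # A Python print only fires while tracing, i.e. when a new input shape is seen.
  print("Compiling for shape", a.shape)
  return a + b

add(tf.ones([1, 10]), tf.ones([1, 10]))    # traces and compiles, prints once
add(tf.ones([1, 10]), tf.ones([1, 10]))    # cached: no print
add(tf.ones([1, 100]), tf.ones([1, 100]))  # new shape: traced and compiled again
```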
See the [tutorial colab](./tutorials/compile.ipynb) for a more detailed usage
example.
### AOT (Ahead-of-time) compilation for CPU with `tfcompile`


@@ -45,9 +45,9 @@
"source": [
"# Classifying CIFAR-10 with XLA\n",
"\n",
"In this colab we train a TensorFlow model to classify the [CIFAR-10](https://en.wikipedia.org/wiki/CIFAR-10) dataset, and we compile it using XLA.\n",
"This tutorial trains a TensorFlow model to classify the [CIFAR-10](https://en.wikipedia.org/wiki/CIFAR-10) dataset, and we compile it using XLA.\n",
"\n",
"We start by loading and normalizing the dataset using the Keras API:"
"Load and normalize the dataset using the Keras API:"
]
},
{
@@ -197,7 +197,8 @@
},
"outputs": [],
"source": [
"tf.keras.backend.clear_session() # We need to clear the session to enable JIT in the middle of the program.\n",
"# We need to clear the session to enable JIT in the middle of the program.\n",
"tf.keras.backend.clear_session()\n",
"tf.config.optimizer.set_jit(True) # Enable XLA.\n",
"model = compile_model(generate_model())\n",
"(x_train, y_train), (x_test, y_test) = load_data()\n",


@@ -87,7 +87,6 @@
"outputs": [],
"source": [
"import tensorflow as tf\n",
"\n",
"tf.compat.v1.enable_eager_execution()"
]
},