
Benchmarks for keras model examples

Keras benchmarks

These are benchmark tests running on Keras models from keras/examples. Benchmarks in the current folder (tensorflow/python/keras/benchmarks/keras_examples_benchmarks) use Keras built-in datasets. In addition, these benchmarks support different distribution strategies on multiple GPUs.
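
For context, the off and mirrored values in the result tables below correspond to training without a distribution strategy and under tf.distribute.MirroredStrategy, respectively. Below is a minimal sketch of the mirrored case; the model and data are placeholders, not the actual benchmark code in this folder:

    import tensorflow as tf

    # Illustrative only: replicate the model across all visible GPUs.
    strategy = tf.distribute.MirroredStrategy()

    with strategy.scope():
        # Variables created inside the scope are mirrored on each replica.
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    # fit() shards each batch across the available replicas.
    x = tf.random.normal((1024, 20))
    y = tf.random.normal((1024, 1))
    model.fit(x, y, batch_size=256, epochs=1, verbose=0)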

Available models

These examples are implemented using the Functional API and the Sequential API.
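
For reference, here is the same tiny model expressed with each API; this is an illustrative sketch, not one of the benchmarked models:

    import tensorflow as tf

    # Sequential API: a linear stack of layers.
    sequential_model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu", input_shape=(16,)),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

    # Functional API: explicit input/output tensors, which also supports
    # non-linear topologies (multiple inputs/outputs, shared layers).
    inputs = tf.keras.Input(shape=(16,))
    hidden = tf.keras.layers.Dense(32, activation="relu")(inputs)
    outputs = tf.keras.layers.Dense(10, activation="softmax")(hidden)
    functional_model = tf.keras.Model(inputs=inputs, outputs=outputs)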

Computer Vision examples

Text & Sequence examples

Other examples

Available benchmark results

The listed benchmark results are obtained by running on Google Cloud Platform (GCP) with the following setup:

  • GPU: 2 x Tesla V100
  • OS: Ubuntu 18.04
  • CPU: 8 x vCPUs, 30 GB memory
  • CUDA: 10.1
  • Bazel: 3.1.0

If you want to run benchmark tests on GPU, please make sure you have already installed CUDA and other dependencies by following the instructions in the official tutorial for GPU support.

Metrics for the following benchmarks:

  • Batch_size: Number of samples per batch of computation.
  • Wall_time: Total time to run the benchmark test, in seconds.
  • Avg_epoch_time: Average time per epoch, in seconds.
  • Exp_per_sec: Number of examples processed per second.
  • Distribution_Strategy: The distribution strategy used in the benchmark.
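
As a rough sanity check (assuming an epoch covers the full Keras built-in training split, e.g. 60,000 images for MNIST), Exp_per_sec is approximately the number of examples per epoch divided by Avg_epoch_time:

    # Illustrative arithmetic only; the exact number of examples per epoch
    # depends on the dataset and any validation split used by each benchmark.
    examples_per_epoch = 60000      # assumed MNIST training-set size
    avg_epoch_time_s = 12.19        # CPU row of the MNIST Conv table below
    print(examples_per_epoch / avg_epoch_time_s)  # ~4922, close to the reported 4915.26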

Cifar10 CNN benchmark

| Hardware | Batch_size | Wall_time | Avg_epoch_time | Exp_per_sec | Distribution_Strategy |
| :------: | :--------: | :-------: | :------------: | :---------: | :-------------------: |
| CPU | 256 | 1393.4896 | 3.21 | 15397.69 | off |
| GPU:2 | 256 | 76.49 | 2.59 | 18758.01 | mirrored |

MNIST Conv benchmark

| Hardware | Batch_size | Wall_time | Avg_epoch_time | Exp_per_sec | Distribution_Strategy |
| :------: | :--------: | :-------: | :------------: | :---------: | :-------------------: |
| CPU | 256 | 196.52 | 12.19 | 4915.26 | off |
| GPU:2 | 256 | 24.5794 | 1.21 | 47899.32 | mirrored |

MNIST Hierarchical RNN (HRNN) benchmark

| Hardware | Batch_size | Wall_time | Avg_epoch_time | Exp_per_sec | Distribution_Strategy |
| :------: | :--------: | :-------: | :------------: | :---------: | :-------------------: |
| CPU | 256 | 654.05 | 218.68 | 274.24 | off |
| GPU:2 | 256 | 20.77 | 3.73 | 15088.06 | mirrored |

Bidirectional LSTM benchmark

| Hardware | Batch_size | Wall_time | Avg_epoch_time | Exp_per_sec | Distribution_Strategy |
| :------: | :--------: | :-------: | :------------: | :---------: | :-------------------: |
| CPU | 512 | 225.57 | 72.55 | 344.70 | off |
| GPU:2 | 512 | 23.54 | 3.23 | 7532.53 | mirrored |

Text classification with transformer benchmark

| Hardware | Batch_size | Wall_time | Avg_epoch_time | Exp_per_sec | Distribution_Strategy |
| :------: | :--------: | :-------: | :------------: | :---------: | :-------------------: |
| CPU | 512 | 109.22 | 35.93 | 698.10 | off |
| GPU:2 | 512 | 9.28 | 0.83 | 26567.54 | mirrored |

MLP benchmark

| Hardware | Batch_size | Wall_time | Avg_epoch_time | Exp_per_sec | Distribution_Strategy |
| :------: | :--------: | :-------: | :------------: | :---------: | :-------------------: |
| CPU | 128 | 3.76 | 0.54 | 17678.54 | off |
| GPU:2 | 128 | 5.91 | 0.30 | 25435.14 | mirrored |

Antirectifier benchmark

| Hardware | Batch_size | Wall_time | Avg_epoch_time | Exp_per_sec | Distribution_Strategy |
| :------: | :--------: | :-------: | :------------: | :---------: | :-------------------: |
| CPU | 512 | 6.77 | 1.79 | 30916.39 | off |
| GPU:2 | 512 | 6.81 | 0.66 | 66563.17 | mirrored |

IRNN benchmark

| Hardware | Batch_size | Wall_time | Avg_epoch_time | Exp_per_sec | Distribution_Strategy |
| :------: | :--------: | :-------: | :------------: | :---------: | :-------------------: |
| CPU | 1024 | 213.00 | 69.01 | 868.08 | off |
| GPU:2 | 1024 | 92.71 | 29.12 | 2042.94 | mirrored |

Note: For small models, running on GPU might be even slower than on CPU. A likely reason is that training small models is not computation-dominant, and distributed training on GPUs adds overhead for model replication and data sharding.

Install Bazel

This step can be skipped if Bazel is already installed.

Bazel is used to build targets based on BUILD files. The first build will take a while because Bazel compiles all dependencies from your BUILD file; subsequent builds use the cache and are much faster. On Ubuntu, please use the following steps to install Bazel. For other platforms, you may follow the corresponding guide for installation.

  1. Add Bazel as a package source

    sudo apt install curl gnupg
    
    curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
    
    echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
    

    Before installing Bazel, check which Bazel version can build the specific TensorFlow version; you can check it from here. In addition, you can follow the instructions from the Bazel website.

  2. Install Bazel

    sudo apt update && sudo apt install bazel-`version`
    

Run benchmarks

To run benchmarks in keras/benchmarks, please take the following steps:

  1. Pull the latest tensorflow repo from GitHub.

  2. Install a Bazel version that works with TensorFlow; see the Install Bazel section above.

  3. To run benchmarks with Bazel, use the --benchmarks flag to specify which benchmarks to run.

    • To run all benchmarks on CPU

      bazel run -c opt benchmark_test -- --benchmarks=.
      
    • To run all benchmarks on GPU

      bazel run --config=cuda -c opt --copt="-mavx" benchmarks_test -- --benchmarks=.
      
    • To run a subset of benchmarks, use the --benchmarks flag: the specified value is interpreted as a regular expression, and any benchmark whose name contains a partial match to the regular expression is executed. For example, --benchmarks=".*lstm.*" will run all LSTM layer related benchmarks.

Add new benchmarks

To add a new benchmark, please take the following steps:

  1. Create your own benchmark test file, xxxx_benchmark_test.py.
  2. Import benchmark_util to measure and track performance if needed.
  3. Create a class that inherits from tf.test.Benchmark.
  4. Define and load the dataset in the __init__ method.
  5. Design and create a model in the _build_model method.
  6. Define the benchmark_xxx method to measure the performance of the benchmark with different hyperparameters, such as batch_size, run_iters, and distribution_strategy. You can check examples from here; a minimal sketch follows this list.
  7. Add the benchmark target to the BUILD file.
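
A minimal sketch of what such a benchmark file might look like. The class name, model, dataset, and hyperparameters below are illustrative assumptions; real benchmarks in this folder typically delegate measurement and metric aggregation to benchmark_util instead of timing fit() by hand:

    # my_mlp_benchmark_test.py (hypothetical example)
    import time

    import tensorflow as tf


    class MyMLPBenchmark(tf.test.Benchmark):
        """Benchmark for a small MLP on the Keras built-in MNIST dataset."""

        def __init__(self):
            super().__init__()
            # Step 4: define and load the dataset in __init__.
            (self.x_train, self.y_train), _ = tf.keras.datasets.mnist.load_data()
            self.x_train = self.x_train.reshape(-1, 784).astype("float32") / 255.0

        def _build_model(self):
            # Step 5: design and create the model under test.
            return tf.keras.Sequential([
                tf.keras.layers.Dense(256, activation="relu", input_shape=(784,)),
                tf.keras.layers.Dense(10, activation="softmax"),
            ])

        def benchmark_mlp_bs_128(self):
            # Step 6: measure performance for one hyperparameter setting.
            model = self._build_model()
            model.compile(optimizer="adam",
                          loss="sparse_categorical_crossentropy",
                          metrics=["accuracy"])
            start = time.time()
            model.fit(self.x_train, self.y_train, batch_size=128, epochs=2, verbose=0)
            wall_time = time.time() - start
            # report_benchmark is inherited from tf.test.Benchmark.
            self.report_benchmark(wall_time=wall_time,
                                  extras={"batch_size": 128, "epochs": 2})


    if __name__ == "__main__":
        tf.test.main()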

Troubleshooting

  1. tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:0 failed. Status: device kernel image is invalid

    • Make sure CUDA is installed on your machine.
    • Pull the latest tensorflow repo and run ./configure in the root folder of tensorflow. It will help you create the configuration file that describes your local environment. Please check this post for more details.