- Changes the backend APIs to allow an epilogue function (default, ReLU,
bias, or bias then ReLU) to be specified and a bias vector to be
provided.
- This is expected to be useful for XLA to perform fusions.
- This functionality is not currently tested, because the BatchMatMulOp
does not expose relu/bias fusion.
- Adds ThenBlasLtMatmul routines that behave similarly to
ThenBlasGemmWithAlgorithm but call into the cublasLt library and allow
separation of plan creation and execution.
- A list of heuristically-prioritized opaque algorithm objects can be
obtained via GetBlasLtMatmulAlgorithms.
- These routines are only supported when the CUDA version is >= 11.0.
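The epilogue variants above amount to an optional bias add and/or ReLU applied to the GEMM output. A minimal pure-Python sketch of the semantics (function name hypothetical; not the actual StreamExecutor API, which runs this fused on the GPU):

```python
def matmul_with_epilogue(a, b, bias=None, relu=False):
    """C = epilogue(A @ B): default, ReLU, bias, or bias-then-ReLU.

    a is m x k, b is k x n (lists of lists); bias is a length-n vector
    broadcast across rows, matching the cublasLt bias epilogue.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
         for i in range(m)]
    if bias is not None:                      # "bias" epilogue
        c = [[c[i][j] + bias[j] for j in range(n)] for i in range(m)]
    if relu:                                  # "ReLU" epilogue
        c = [[max(0.0, c[i][j]) for j in range(n)] for i in range(m)]
    return c
```

With both options set this computes ReLU(A @ B + bias), the "bias then ReLU" case.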
This is mostly a huge amount of plumbing just to call into the cublas functions.
blasGemmStridedBatched has been available since CUDA 8.0.
For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2,
so I didn't wire that up yet.
PiperOrigin-RevId: 207285707
* Add GPU support for float16 batched matmul
- Uses cublasGemmBatchedEx introduced in CUDA 9.1.
- Includes support for Tensor Op math.
- Falls back to a loop over non-batched gemm calls on older CUDA
versions or GPU architectures.
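The fallback path is conceptually a loop of one non-batched GEMM per batch element; a rough sketch of that shape (hypothetical names, plain Python standing in for the GPU calls):

```python
def gemm(a, b):
    """A single non-batched matmul: a is m x k, b is k x n."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def batched_matmul_fallback(batch_a, batch_b):
    """Emulates the fallback: one gemm call per batch element,
    instead of a single batched call like cublasGemmBatchedEx."""
    return [gemm(a, b) for a, b in zip(batch_a, batch_b)]
```

The batched entry point exists precisely to avoid this loop's per-call launch overhead when the hardware and CUDA version support it.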
* Refactor GPU batched gemm into one internal func
Step 1 of re-namespacing StreamExecutor into ::stream_executor.
This moves everything inside of stream_executor/..., and leaves a
namespace alias into ::perftools::gputools. The next steps will migrate
users to the new namespace.
This is mostly a mechanical change, but it also includes a bunch of
non-mechanical changes that ideally would be split out into separate
patches. Unfortunately they all sort of need to be shoved in here for
various reasons:
- forward declarations need to be in the same namespace as the actual
types, so we need to change all forward declarations of
StreamExecutor types in this one patch.
- Uses of these forward declarations need to be changed to the new
namespace (or otherwise we need to add a namespace alias to the
relevant header, but this is pretty ugly).
- Various initialization code needs to live in StreamExecutor's "real"
namespace, so all this needs to be changed.
PiperOrigin-RevId: 193256128
Extend the stream interface ThenBlasGemmWithAlgorithm to support F16 matrix
multiplication with computation type FP32.
Extend the stream executor interface DoBlasGemmWithAlgorithm to support F16
GEMM with computation type FP32.
Extend the CPU IR emitter to handle F16 Dot instruction, and add F16 matrix
multiplication implementation to the CPU runtime.
Extend the GPU backend to handle FP16 GEMM Thunk.
Replicate the existing matrix multiplication test cases in
matrix_ops_simple_test and dot_operation_test for FP16.
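The F16-storage / FP32-computation split can be illustrated in pure Python using the struct module's IEEE half-precision format: operands are rounded to fp16 storage precision, but products are accumulated at full precision. This is only an analogy for the GEMM configuration above (the rounding helper is illustrative; real hardware does this in the multiply-accumulate units):

```python
import struct

def to_f16(x):
    """Round a Python float to IEEE fp16 storage precision."""
    return struct.unpack('e', struct.pack('e', x))[0]

def gemm_f16_storage_f32_compute(a, b):
    """Operands are quantized to fp16 (the storage type), but the dot
    products are accumulated at full precision (the computation type)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(to_f16(a[i][p]) * to_f16(b[p][j]) for p in range(k))
             for j in range(n)] for i in range(m)]
```

Note that fp16 cannot represent values like 0.1 exactly, which is why accumulating in a wider type matters for long reductions.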
PiperOrigin-RevId: 187369731
cublas 8 adds the cublasGemmEx function, which lets you specify an
explicit "algorithm" for the computation. This functions as an opaque
tuning hint to cublas.
This patch adds support for cublasGemmEx to StreamExecutor, and wires up
XLA's GemmThunk to use the new function.
This patch does not add GEMM autotuning support in TensorFlow proper,
only XLA.
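Autotuning over such opaque algorithm hints boils down to timing each candidate and keeping the fastest. A hypothetical sketch of that loop (real code would time actual cublasGemmEx launches and synchronize the stream around each measurement):

```python
import time

def pick_best_algorithm(run_gemm_with_algorithm, algorithm_ids, repeats=3):
    """Times each opaque algorithm ID and returns the fastest one."""
    best_id, best_time = None, float('inf')
    for algo in algorithm_ids:
        start = time.perf_counter()
        for _ in range(repeats):
            run_gemm_with_algorithm(algo)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_id, best_time = algo, elapsed
    return best_id
```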
Change: 149068961
compilation with CUDA 7.5; fp16 convolutions via cuDNN will come soon.
This does not update any TensorFlow ops, but it is a dependency of doing
that.
Note: fp16 axpy and dot do not exist in CUDA 7.5 and have thus not been added.
CUDA 8.0 supports both (through the axpyEx and dotEx interfaces).
Change: 122069402
Change 109695551
Update FAQ
Change 109694725
Add a gradient for resize_bilinear op.
Change 109694505
Don't mention variables module in docs
variables.Variable should be tf.Variable.
Change 109658848
Adding an option to create a new thread-pool for each session.
Change 109640570
Take the snapshot of stream-executor.
+ Expose an interface for scratch space allocation.
Change 109638559
Let image_summary accept uint8 input
This allows users to do their own normalization / scaling if the default
(very weird) behavior of image_summary is undesired.
This required a slight tweak to fake_input.cc to make polymorphically typed
fake inputs infer their type if the type attr is not set but has a default.
Unfortunately, adding a second valid type to image_summary *disables* automatic
implicit conversion from np.float64 to tf.float32, so this change is slightly
backwards incompatible.
Change 109636969
Add serialization operations for SparseTensor.
Change 109636644
Update generated Op docs.
Change 109634899
TensorFlow: add a markdown file for producing release notes for our
releases. Seed with 0.5.0 with a boring but accurate description.
Change 109634502
Let histogram_summary take any realnumbertype
It used to take only floats; now it understands ints.
Change 109634434
TensorFlow: update locations where we mention Python 3 support to reflect
the current status.
Change 109632108
Move HSV <> RGB conversions, grayscale conversions, and adjust_* ops back to tensorflow
- make GPU-capable version of RGBToHSV and HSVToRGB, allows only float input/output
- change docs to reflect new size constraints
- change HSV format to be [0,1] for all components
- add automatic dtype conversion for all adjust_* and grayscale conversion ops
- fix up docs
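Python's stdlib colorsys module already uses this same all-components-in-[0,1] convention, so it serves as a handy reference for the expected values:

```python
import colorsys

# Pure red: hue 0, full saturation, full value; every component in [0, 1].
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
r, g, b = colorsys.hsv_to_rgb(h, s, v)   # round-trips back to (1, 0, 0)
```

The TF ops additionally run on GPU and operate on whole image tensors, but the per-pixel convention matches.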
Change 109631077
Improve optimizer exceptions
1. grads_and_vars is now a tuple, so must be wrapped when passed to format.
2. Use '%r' instead of '%s' for dtype formatting
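The tuple-wrapping fix matters because Python's % operator treats a bare tuple as multiple format arguments; a quick illustration (the pair variable is just a stand-in for a (gradient, variable) entry):

```python
pair = ('grad', 'var')          # stands in for one (gradient, variable) pair

# "Got: %s" % pair raises TypeError ("not all arguments converted"),
# because the tuple is unpacked into two format arguments for one %s.
msg = "Got: %s" % (pair,)       # wrapping in a 1-tuple formats it whole
```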
Base CL: 109697989
Changes:
* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command
Base CL: 108349164