Commit Graph

32 Commits

Author SHA1 Message Date
TensorFlower Gardener
6859f52a3f Merge pull request from benbarsdell:cublaslt
PiperOrigin-RevId: 337382541
Change-Id: I949698ec93cb3c15654857768fcfce53984a97be
2020-10-15 14:39:38 -07:00
Ben Barsdell
c491ca455c Refactor blasLt APIs to return Status, not bool
- This does not include DoBlasLtMatmul because the helpers in
  stream.cc require it to return bool.
2020-10-05 21:17:18 +11:00
Ben Barsdell
a87442f845 Add type checks in DoBlasLtMatmul 2020-10-05 21:09:51 +11:00
Ben Barsdell
2a34bf9cce Rename kF32FastTF32/BF16 to kTF32/BF16AsF32 2020-10-05 21:07:15 +11:00
Ben Barsdell
ec203dedad Add more detailed comment for kF32FastTF32/BF16 2020-09-29 21:17:57 +10:00
Ben Barsdell
f5fd56c99b Replace blasLt overloads with runtime types 2020-09-29 21:14:17 +10:00
Ben Barsdell
6186c09367 Use (+generalize) existing dnn::DataType in blas:: 2020-09-29 16:07:57 +10:00
Ben Barsdell
b03ae6de78 Replace CreateBlasLtMatmulPlan args with struct 2020-09-29 16:02:09 +10:00
Ben Barsdell
39bf03f083 Add support for blasLt epilogue fn and bias vector
- Changes the backend APIs to allow an epilogue function (default, ReLU,
  bias, or bias then ReLU) to be specified and a bias vector to be
  provided.
- This is expected to be useful for XLA to perform fusions.
- This functionality is not currently tested, because the BatchMatMulOp
  does not expose relu/bias fusion.
2020-09-17 19:55:55 +10:00
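The fused epilogue described in this commit (default, ReLU, bias, or bias then ReLU, applied to the matmul result) can be sketched as a CPU reference. The enum and function names below are illustrative stand-ins, not the actual stream_executor API:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical epilogue kinds mirroring the ones named in the commit
// message: default (none), ReLU, bias, or bias followed by ReLU.
enum class Epilogue { kDefault, kReLU, kBias, kBiasThenReLU };

// CPU reference of a matmul with a fused epilogue: C = epi(A * B [+ bias]).
// A is m x k, B is k x n, bias (if used) has length n; all row-major.
std::vector<float> MatmulWithEpilogue(const std::vector<float>& A,
                                      const std::vector<float>& B,
                                      const std::vector<float>& bias,
                                      int m, int n, int k, Epilogue epi) {
  std::vector<float> C(m * n, 0.0f);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (int p = 0; p < k; ++p) acc += A[i * k + p] * B[p * n + j];
      if (epi == Epilogue::kBias || epi == Epilogue::kBiasThenReLU)
        acc += bias[j];
      if (epi == Epilogue::kReLU || epi == Epilogue::kBiasThenReLU)
        acc = std::max(acc, 0.0f);
      C[i * n + j] = acc;
    }
  return C;
}
```

Fusing the bias add and activation into the matmul epilogue avoids separate kernel launches, which is why the commit expects it to be useful for XLA fusions.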
Ben Barsdell
aaea82e6bc Add cublasLt wrappers to stream_executor
- Adds ThenBlasLtMatmul routines that behave similarly to
  ThenBlasGemmWithAlgorithm but call into the cublasLt library and allow
  separation of plan creation and execution.
- A list of heuristically-prioritized opaque algorithm objects can be
  obtained via GetBlasLtMatmulAlgorithms.
- These routines are only supported when the CUDA version is >= 11.0.
2020-09-17 15:01:02 +10:00
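The plan/algorithm/execution split this commit introduces can be sketched with hypothetical stand-in types (the real cublasLt handles are opaque library objects, and the real routines run on the GPU):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-ins sketching the cublasLt-style split: a plan is built
// once from the problem shape, a prioritized list of opaque algorithms is
// queried for that plan, and execution pairs a plan with a chosen algorithm.
struct MatmulPlan {
  int m, n, k;  // problem shape captured at plan-creation time
};

struct MatmulAlgorithm {
  int64_t id;  // opaque handle; real cublasLt algorithms carry tuning data
};

MatmulPlan CreateMatmulPlan(int m, int n, int k) { return {m, n, k}; }

// Returns algorithms in heuristic priority order (here: just ids 0..count-1).
std::vector<MatmulAlgorithm> GetMatmulAlgorithms(const MatmulPlan&, int count) {
  std::vector<MatmulAlgorithm> algos;
  for (int i = 0; i < count; ++i) algos.push_back({i});
  return algos;
}

// Executes C = A * B (row-major) for one plan/algorithm pair. The algorithm
// does not change the result, only (in the real library) how it is computed.
std::vector<float> ExecuteMatmul(const MatmulPlan& plan, const MatmulAlgorithm&,
                                 const std::vector<float>& A,
                                 const std::vector<float>& B) {
  std::vector<float> C(plan.m * plan.n, 0.0f);
  for (int i = 0; i < plan.m; ++i)
    for (int j = 0; j < plan.n; ++j)
      for (int p = 0; p < plan.k; ++p)
        C[i * plan.n + j] += A[i * plan.k + p] * B[p * plan.n + j];
  return C;
}
```

Separating plan creation from execution lets the expensive shape-dependent setup happen once, with many cheap executions against the same plan.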
A. Unique TensorFlower
155ce6c067 Qualify uses of std::string
PiperOrigin-RevId: 297212802
Change-Id: Ic65150e7ab418be034f48d45ce25ef5d19105836
2020-02-25 15:07:45 -08:00
Kazuaki Ishizaki
27643b326c minor spelling tweaks 2020-01-16 14:36:52 +09:00
Tim Shen
10a42250f3 Add a function to export cuBLAS version.
PiperOrigin-RevId: 262438744
2019-08-08 16:10:23 -07:00
Benjamin Kramer
cdf986398f Alias tensorflow::gtl::InlinedVector to absl::InlinedVector
PiperOrigin-RevId: 211639440
2018-09-05 08:46:34 -07:00
Benjamin Kramer
e36c16c672 [XLA:GPU] Use strided batched gemm instead of building pointer tables.
This is mostly a huge amount of plumbing just to call into the cublas functions.
blasGemmStridedBatched has been available since CUDA 8.0.

For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2
so I didn't wire that up yet.

PiperOrigin-RevId: 207285707
2018-08-03 10:28:08 -07:00
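The strided-batched addressing this commit switches to (replacing per-batch pointer tables) can be sketched as a CPU reference; the signature loosely mirrors, but is not, `cublasGemmStridedBatched`:

```cpp
#include <vector>

// CPU reference for the strided-batched pattern: instead of a table of
// per-batch pointers, batch b reads A + b*strideA and B + b*strideB and
// writes C + b*strideC. Plain C = A * B per batch, row-major.
void GemmStridedBatched(const float* A, long strideA, const float* B,
                        long strideB, float* C, long strideC, int m, int n,
                        int k, int batch_count) {
  for (int b = 0; b < batch_count; ++b) {
    const float* Ab = A + b * strideA;
    const float* Bb = B + b * strideB;
    float* Cb = C + b * strideC;
    for (int i = 0; i < m; ++i)
      for (int j = 0; j < n; ++j) {
        float acc = 0.0f;
        for (int p = 0; p < k; ++p) acc += Ab[i * k + p] * Bb[p * n + j];
        Cb[i * n + j] = acc;
      }
  }
}
```

Because each batch's operands are located by a fixed stride, no device-side pointer table has to be built and uploaded before the call.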
Ben Barsdell
f08f24cd55 Add GPU support for float16 batched matmul ()
* Add GPU support for float16 batched matmul

- Uses cublasGemmBatchedEx introduced in CUDA 9.1.
- Includes support for Tensor Op math.
- Falls back to a loop over non-batched gemm calls on older CUDA
  versions or GPU architectures.

* Refactor GPU batched gemm into one internal func
2018-05-10 11:06:01 -07:00
A. Unique TensorFlower
9f38ab7416 Add variants of DoBlasGemmWithAlgorithm with alpha being on device.
This is in preparation for allowing XLA to fuse (A dot b) * alpha, where alpha
can be on the device instead of just a constant.

PiperOrigin-RevId: 194068597
2018-04-24 04:38:23 -07:00
Justin Lebar
4764bf2986 [StreamExecutor] Rename ::perftools::gputools -> ::stream_executor, part 1.
Step 1 of re-namespace'ing StreamExecutor into ::stream_executor.

This moves everything inside of stream_executor/..., and leaves a
namespace alias into ::perftools::gputools.  The next steps will clean
up users to use the new namespace.

This is mostly a mechanical change, but it also includes a bunch of
non-mechanical changes that ideally would be split out into separate
patches.  Unfortunately they all sort of need to be shoved in here for
various reasons:

 - forward declarations need to be in the same namespace as the actual
   types, so we need to change all forward declarations of
   StreamExecutor types in this one patch.

 - Uses of these forward declarations need to be changed to the new
   namespace (or otherwise we need to add a namespace alias to the
   relevant header, but this is pretty ugly).

 - Various initialization code needs to live in StreamExecutor's "real"
   namespace, so all this needs to be changed.

PiperOrigin-RevId: 193256128
2018-04-17 14:28:51 -07:00
Bixia Zheng
8a31fec675 [XLA] FP16 Dot support for the CPU and GPU backends.
Extend the stream interface ThenBlasGemmWithAlgorithm to support F16 matrix
multiplication with computation type FP32.

Extend the stream executor interface DoBlasGemmWithAlgorithm to support F16
GEMM with computation type FP32.

Extend the CPU IR emitter to handle F16 Dot instruction, and add F16 matrix
multiplication implementation to the CPU runtime.

Extend the GPU backend to handle FP16 GEMM Thunk.

Replicate the existing matrix multiplication test cases in
matrix_ops_simple_test and dot_operation_test for FP16.

RELNOTES:
PiperOrigin-RevId: 187369731
2018-02-28 12:59:55 -08:00
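The storage-type / computation-type split this commit adds (F16 matrices with an FP32 computation type) can be sketched with a template. Portable C++ has no half type here, so the sketch keeps the two types as parameters; the test exercises it with float storage and double compute as stand-ins:

```cpp
#include <vector>

// Sketch of a GEMM whose storage type and computation type differ: inputs
// are stored as InT, but every product is accumulated in ComputeT, mirroring
// F16 matrices multiplied with FP32 computation. Row-major, C = A * B.
template <typename InT, typename ComputeT>
std::vector<InT> GemmWithComputeType(const std::vector<InT>& A,
                                     const std::vector<InT>& B, int m, int n,
                                     int k) {
  std::vector<InT> C(m * n);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j) {
      ComputeT acc = ComputeT(0);
      for (int p = 0; p < k; ++p)
        acc += ComputeT(A[i * k + p]) * ComputeT(B[p * n + j]);
      C[i * n + j] = InT(acc);  // round back to the storage type
    }
  return C;
}
```

Accumulating in the wider type is what keeps long reduction dimensions from losing precision in the narrow storage format.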
A. Unique TensorFlower
553e8f14c8 Update Stream::BlockHostUntilDone examples and documentation.
The new Status return value must be explicitly handled or ignored.

PiperOrigin-RevId: 178950527
2017-12-13 13:42:40 -08:00
Yangzihao Wang
3e3306ef00 Let GetBlasGemmAlgorithms() always return true.
PiperOrigin-RevId: 162748507
2017-07-21 09:38:31 -07:00
A. Unique TensorFlower
491beb74cc Automated g4 rollback of changelist 162423171
PiperOrigin-RevId: 162437318
2017-07-18 19:40:33 -07:00
Yangzihao Wang
06acccabcb Add autotuning code for matmul operator.
Currently it is turned off by default.

PiperOrigin-RevId: 162423171
2017-07-18 16:52:54 -07:00
A. Unique TensorFlower
a2ee8bca3f Add support for int8 x int8 -> int32 matrix multiplication via cublasGemmEx to stream_executor.
PiperOrigin-RevId: 161137741
2017-07-06 15:21:23 -07:00
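The int8 x int8 -> int32 semantics this commit wires up through cublasGemmEx can be sketched as a CPU reference:

```cpp
#include <cstdint>
#include <vector>

// CPU reference for int8 x int8 -> int32 matrix multiplication: 8-bit
// operands, 32-bit accumulation and output, so even worst-case products
// like 127 * 127 summed over a long k cannot overflow the accumulator
// quickly. Row-major, C = A * B.
std::vector<int32_t> GemmInt8(const std::vector<int8_t>& A,
                              const std::vector<int8_t>& B, int m, int n,
                              int k) {
  std::vector<int32_t> C(m * n, 0);
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
      for (int p = 0; p < k; ++p)
        C[i * n + j] += int32_t(A[i * k + p]) * int32_t(B[p * n + j]);
  return C;
}
```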
Justin Lebar
0119469494 [XLA] [StreamExecutor] Tune GEMMs when possible.
cublas 8 adds the cublasGemmEx function, which lets you specify an
explicit "algorithm" for the computation.  This functions as an opaque
tuning hint to cublas.

This patch adds support for cublasGemmEx to StreamExecutor, and wires up
XLA's GemmThunk to use the new function.

This patch does not add GEMM autotuning support in TensorFlow proper,
only XLA.
Change: 149068961
2017-03-02 18:08:01 -08:00
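The tuning loop this commit enables, trying each opaque algorithm hint and keeping the fastest, can be sketched generically. The name and structure below are illustrative, not the actual GemmThunk code, and real autotuning would launch and synchronize GPU kernels:

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <vector>

// Generic autotuning sketch: run the same GEMM once per candidate
// "algorithm" (an opaque tuning hint to cublas), time each run, and
// return the index of the fastest candidate.
size_t PickFastestAlgorithm(const std::vector<std::function<void()>>& algos) {
  size_t best = 0;
  auto best_time = std::chrono::nanoseconds::max();
  for (size_t i = 0; i < algos.size(); ++i) {
    auto start = std::chrono::steady_clock::now();
    algos[i]();  // real code would launch and synchronize the GPU kernel
    auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start);
    if (elapsed < best_time) {
      best_time = elapsed;
      best = i;
    }
  }
  return best;
}
```

In practice each candidate would be timed over several runs after a warm-up, since a single launch is noisy.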
A. Unique TensorFlower
b4fa4ad138 Remove Eigen/Core includes from public SE headers
This gives the Eigen headers the chance to define the macro EIGEN_USE_GPU in
the proper places.
Change: 146055230
2017-01-30 16:46:54 -08:00
A. Unique TensorFlower
122cdce33e Update copyright for 3p/tf.
Change: 123901292
2016-06-02 13:41:12 -07:00
A. Unique TensorFlower
523055469c Add fp16 matrix multiplication (GEMM) support to StreamExecutor, gated on
compilation with CUDA 7.5; fp16 convolutions via cuDNN will come soon.
This does not update any TensorFlow ops, but it is a dependency of doing
that.

Note: fp16 axpy and dot do not exist in CUDA 7.5 and have thus not been added.
CUDA 8.0 supports both (through the axpyEx and dotEx interfaces).
Change: 122069402
2016-05-11 10:51:58 -07:00
A. Unique TensorFlower
05ea40f180 Support ScratchAllocator in BLAS Batched GEMM
Change: 117590857
2016-03-18 15:47:02 -07:00
Vijay Vasudevan
ddd4aaf528 TensorFlow: upstream changes to git.
Change 109695551
	Update FAQ
Change 109694725
	Add a gradient for resize_bilinear op.
Change 109694505
	Don't mention variables module in docs

	variables.Variable should be tf.Variable.
Change 109658848
	Adding an option to create a new thread-pool for each session.
Change 109640570

	Take the snapshot of stream-executor.
	+ Expose an interface for scratch space allocation in the interface.

Change 109638559
	Let image_summary accept uint8 input

	This allows users to do their own normalization / scaling if the default
	(very weird) behavior of image_summary is undesired.

	This required a slight tweak to fake_input.cc to make polymorphically typed
	fake inputs infer if their type attr is not set but has a default.

	Unfortunately, adding a second valid type to image_summary *disables* automatic
	implicit conversion from np.float64 to tf.float32, so this change is slightly
	backwards incompatible.
Change 109636969
	Add serialization operations for SparseTensor.
Change 109636644
	Update generated Op docs.
Change 109634899
	TensorFlow: add a markdown file for producing release notes for our
	releases.  Seed with 0.5.0 with a boring but accurate description.
Change 109634502
	Let histogram_summary take any realnumbertype

	It used to take only floats; now it understands ints.
Change 109634434
	TensorFlow: update locations where we mention python 3 support, update
	them to current truth.
Change 109632108
	Move HSV <> RGB conversions, grayscale conversions, and adjust_* ops back to tensorflow
	- make GPU-capable version of RGBToHSV and HSVToRGB, allows only float input/output
	- change docs to reflect new size constraints
	- change HSV format to be [0,1] for all components
	- add automatic dtype conversion for all adjust_* and grayscale conversion ops
	- fix up docs
Change 109631077
	Improve optimizer exceptions

	1. grads_and_vars is now a tuple, so must be wrapped when passed to format.
	2. Use '%r' instead of '%s' for dtype formatting

Base CL: 109697989
2015-12-08 09:58:59 -08:00
Manjunath Kudlur
9c3043ff3b TensorFlow: Improve performance of Alexnet
Changes:

* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command

Base CL: 108349164
2015-11-20 10:30:41 -08:00
Manjunath Kudlur
f41959ccb2 TensorFlow: Initial commit of TensorFlow library.
TensorFlow is an open source software library for numerical computation
using data flow graphs.

Base CL: 107276108
2015-11-06 16:27:58 -08:00