- Changes the backend APIs to allow an epilogue function (default, ReLU,
bias, or bias then ReLU) to be specified and a bias vector to be
provided.
- This is expected to be useful for XLA to perform fusions.
- This functionality is not currently tested, because the BatchMatMulOp
does not expose relu/bias fusion.
- Adds ThenBlasLtMatmul routines that behave similarly to
ThenBlasGemmWithAlgorithm but call into the cublasLt library and allow
separation of plan creation and execution.
- A list of heuristically-prioritized opaque algorithm objects can be
obtained via GetBlasLtMatmulAlgorithms.
- These routines are only supported when the CUDA version is >= 11.0.
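The epilogue variants above amount to an optional bias add and/or ReLU applied to the GEMM output. A minimal pure-Python sketch of the semantics (function name hypothetical; not the actual StreamExecutor API, which runs this fused on the GPU):

```python
def matmul_with_epilogue(a, b, bias=None, relu=False):
    """C = epilogue(A @ B): default, ReLU, bias, or bias-then-ReLU.

    a is m x k, b is k x n (lists of lists); bias is a length-n vector
    broadcast across rows, matching the cublasLt bias epilogue.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
         for i in range(m)]
    if bias is not None:                      # "bias" epilogue
        c = [[c[i][j] + bias[j] for j in range(n)] for i in range(m)]
    if relu:                                  # "ReLU" epilogue
        c = [[max(0.0, c[i][j]) for j in range(n)] for i in range(m)]
    return c
```

With both options set this computes ReLU(A @ B + bias), the "bias then ReLU" case.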
This is mostly a huge amount of plumbing just to call into the cublas functions.
blasGemmStridedBatched has been available since CUDA 8.0.
For autotuning we'd need cublasGemmStridedBatchedEx, which is new in CUDA 9.2,
so I didn't wire that up yet.
PiperOrigin-RevId: 207285707
* Add GPU support for float16 batched matmul
- Uses cublasGemmBatchedEx introduced in CUDA 9.1.
- Includes support for Tensor Op math.
- Falls back to a loop over non-batched gemm calls on older CUDA
versions or GPU architectures.
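The fallback path is conceptually a loop of one non-batched GEMM per batch element; a rough sketch of that shape (hypothetical names, plain Python standing in for the GPU calls):

```python
def gemm(a, b):
    """A single non-batched matmul: a is m x k, b is k x n."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def batched_matmul_fallback(batch_a, batch_b):
    """Emulates the fallback: one gemm call per batch element,
    instead of a single batched call like cublasGemmBatchedEx."""
    return [gemm(a, b) for a, b in zip(batch_a, batch_b)]
```

The batched entry point exists precisely to avoid this loop's per-call launch overhead when the hardware and CUDA version support it.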
* Refactor GPU batched gemm into one internal func
Step 1 of re-namespacing StreamExecutor into ::stream_executor.
This moves everything inside of stream_executor/..., and leaves a
namespace alias into ::perftools::gputools. The next steps will migrate
users to the new namespace.
This is mostly a mechanical change, but it also includes a bunch of
non-mechanical changes that ideally would be split out into separate
patches. Unfortunately they all sort of need to be shoved in here for
various reasons:
- forward declarations need to be in the same namespace as the actual
types, so we need to change all forward declarations of
StreamExecutor types in this one patch.
- Uses of these forward declarations need to be changed to the new
namespace (or otherwise we need to add a namespace alias to the
relevant header, but this is pretty ugly).
- Various initialization code needs to live in StreamExecutor's "real"
namespace, so all this needs to be changed.
PiperOrigin-RevId: 193256128
Extend the stream interface ThenBlasGemmWithAlgorithm to support F16 matrix
multiplication with computation type FP32.
Extend the stream executor interface DoBlasGemmWithAlgorithm to support F16
GEMM with computation type FP32.
Extend the CPU IR emitter to handle F16 Dot instruction, and add F16 matrix
multiplication implementation to the CPU runtime.
Extend the GPU backend to handle FP16 GEMM Thunk.
Replicate the existing matrix multiplication test cases in
matrix_ops_simple_test and dot_operation_test for FP16.
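The F16-storage / FP32-computation split can be illustrated in pure Python using the struct module's IEEE half-precision format: operands are rounded to fp16 storage precision, but products are accumulated at full precision. This is only an analogy for the GEMM configuration above (the rounding helper is illustrative; real hardware does this in the multiply-accumulate units):

```python
import struct

def to_f16(x):
    """Round a Python float to IEEE fp16 storage precision."""
    return struct.unpack('e', struct.pack('e', x))[0]

def gemm_f16_storage_f32_compute(a, b):
    """Operands are quantized to fp16 (the storage type), but the dot
    products are accumulated at full precision (the computation type)."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(to_f16(a[i][p]) * to_f16(b[p][j]) for p in range(k))
             for j in range(n)] for i in range(m)]
```

Note that fp16 cannot represent values like 0.1 exactly, which is why accumulating in a wider type matters for long reductions.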
PiperOrigin-RevId: 187369731
cublas 8 adds the cublasGemmEx function, which lets you specify an
explicit "algorithm" for the computation. This functions as an opaque
tuning hint to cublas.
This patch adds support for cublasGemmEx to StreamExecutor, and wires up
XLA's GemmThunk to use the new function.
This patch does not add GEMM autotuning support in TensorFlow proper,
only XLA.
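Autotuning over such opaque algorithm hints boils down to timing each candidate and keeping the fastest. A hypothetical sketch of that loop (real code would time actual cublasGemmEx launches and synchronize the stream around each measurement):

```python
import time

def pick_best_algorithm(run_gemm_with_algorithm, algorithm_ids, repeats=3):
    """Times each opaque algorithm ID and returns the fastest one."""
    best_id, best_time = None, float('inf')
    for algo in algorithm_ids:
        start = time.perf_counter()
        for _ in range(repeats):
            run_gemm_with_algorithm(algo)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_id, best_time = algo, elapsed
    return best_id
```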
Change: 149068961
compilation with CUDA 7.5; fp16 convolutions via cuDNN will come soon.
This does not update any TensorFlow ops, but it is a dependency of doing
that.
Note: fp16 axpy and dot do not exist in CUDA 7.5 and have thus not been added.
CUDA 8.0 supports both (through the axpyEx and dotEx interfaces).
Change: 122069402
Change 109695551
Update FAQ
Change 109694725
Add a gradient for resize_bilinear op.
Change 109694505
Don't mention variables module in docs
variables.Variable should be tf.Variable.
Change 109658848
Adding an option to create a new thread-pool for each session.
Change 109640570
Take the snapshot of stream-executor.
+ Expose an interface for scratch space allocation.
Change 109638559
Let image_summary accept uint8 input
This allows users to do their own normalization / scaling if the default
(very weird) behavior of image_summary is undesired.
This required a slight tweak to fake_input.cc to make polymorphically typed
fake inputs infer their type if the type attr is not set but has a default.
Unfortunately, adding a second valid type to image_summary *disables* automatic
implicit conversion from np.float64 to tf.float32, so this change is slightly
backwards incompatible.
Change 109636969
Add serialization operations for SparseTensor.
Change 109636644
Update generated Op docs.
Change 109634899
TensorFlow: add a markdown file for producing release notes for our
releases. Seed with 0.5.0 with a boring but accurate description.
Change 109634502
Let histogram_summary take any realnumbertype
It used to take only floats; now it understands ints.
Change 109634434
TensorFlow: update locations where we mention Python 3 support to reflect
the current status.
Change 109632108
Move HSV <> RGB conversions, grayscale conversions, and adjust_* ops back to tensorflow
- make GPU-capable version of RGBToHSV and HSVToRGB, allows only float input/output
- change docs to reflect new size constraints
- change HSV format to be [0,1] for all components
- add automatic dtype conversion for all adjust_* and grayscale conversion ops
- fix up docs
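Python's stdlib colorsys module already uses this same all-components-in-[0,1] convention, so it serves as a handy reference for the expected values:

```python
import colorsys

# Pure red: hue 0, full saturation, full value; every component in [0, 1].
h, s, v = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
r, g, b = colorsys.hsv_to_rgb(h, s, v)   # round-trips back to (1, 0, 0)
```

The TF ops additionally run on GPU and operate on whole image tensors, but the per-pixel convention matches.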
Change 109631077
Improve optimizer exceptions
1. grads_and_vars is now a tuple, so must be wrapped when passed to format.
2. Use '%r' instead of '%s' for dtype formatting
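The tuple-wrapping fix matters because Python's % operator treats a bare tuple as multiple format arguments; a quick illustration (the pair variable is just a stand-in for a (gradient, variable) entry):

```python
pair = ('grad', 'var')          # stands in for one (gradient, variable) pair

# "Got: %s" % pair raises TypeError ("not all arguments converted"),
# because the tuple is unpacked into two format arguments for one %s.
msg = "Got: %s" % (pair,)       # wrapping in a 1-tuple formats it whole
```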
Base CL: 109697989
Changes:
* error message that refers to removed `DefaultSession` method.
* -Wnull-conversion warnings
* the "_start_time" attr for recvs when the flag "--brain_enable_scheduling_for_recvs" is set.
* typo in tutorial data download progress message.
* a typo ("however their installing"=>"however installing").
* typo, rename "TensorFlow Mechanics" to "How To" to be consistent with the website.
* a typo ("subtact"=>"subtract").
* protobuf examples in comments in tensorflow::Example.proto.
* formula formatting in MNIST beginner tutorial
* negative fraction-of-queue-full stats
* protobuf inclusion path so that Android demo will build under Blaze.
* small typo (moderatly > moderately)
* Session.run() to check that tensor arguments come from the session's graph.
* another six import
* seq2seq typo in bazel command
Base CL: 108349164