Eigen has very poor performance when the output tensor has few elements. With this change, if the output has at most 1024 elements, a different implementation is used. The new implementation uses the functor::ReduceImpl function that is used for most other TF reductions. Eigen performs better than functor::ReduceImpl when there are many output elements, which is why Eigen is still used when the number of output elements is greater than 1024.
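As a minimal sketch of the dispatch rule described above (the function and backend names are illustrative, not TensorFlow's actual code):

```python
# Hypothetical sketch: route reductions with small outputs to the
# ReduceImpl-style path and keep Eigen for larger outputs.
SMALL_OUTPUT_THRESHOLD = 1024  # at most this many output elements

def choose_reduction_impl(num_output_elements):
    """Return which reduction backend the kernel would pick."""
    if num_output_elements <= SMALL_OUTPUT_THRESHOLD:
        return "functor::ReduceImpl"  # faster for few output elements
    return "Eigen"                    # faster for many output elements
```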
A benchmark was added. The results from running on my machine with two Xeon E5-2690 v4 CPUs and a Titan V GPU are shown below. All times are in seconds. The benchmarks were run in the internal version of TensorFlow. Only float32 benchmarks are shown, as the float16 and float64 results are similar. Likewise, only benchmarks where the new implementation is used are shown; when the old implementation is used, performance is the same as before this change.
Benchmark                     New time (s)   Old time (s)   New time as % of old
1d_float32_dim0                    0.00089        0.06431                    1.4%
rectangle1_2d_float32_dim1         0.00285        0.06736                    4.2%
rectangle2_2d_float32_dim0         0.00298        0.05501                    5.2%
rectangle1_3d_float32_dim0         0.07876        0.12668                   62.2%
rectangle2_3d_float32_dim1         0.07869        0.12757                   61.7%
rectangle3_3d_float32_dim2         0.07847        0.78461                   10.0%
PiperOrigin-RevId: 292206797
Change-Id: Ic586910e0935463190761dc3ec9e7122bba06bd6
Add Numpy-style broadcasting in the batch dimensions for tf.linalg.triangular_solve op. The last two dimensions of both operands constitute the matrix dimensions. The dimensions beyond these are broadcasted to form a common output shape with the standard NumPy broadcasting rules. (https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Note: This implementation differs from Numpy's behavior in that vectors (rank-1 Tensors) are not promoted to matrices (rank-2 Tensors) by appending/prepending dimensions.
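The batch-shape computation described above can be sketched in pure Python as follows (a simplified illustration of the standard broadcasting rules, not TensorFlow's implementation):

```python
def broadcast_batch_shape(shape_a, shape_b):
    """Compute the NumPy-broadcast shape of the batch dimensions.

    The last two dimensions of each operand are the matrix dimensions and
    are excluded; the dimensions before them are broadcast pairwise,
    right-aligned, per the standard NumPy rules.
    """
    batch_a, batch_b = list(shape_a[:-2]), list(shape_b[:-2])
    # Right-align the shorter batch shape by padding with 1s on the left.
    n = max(len(batch_a), len(batch_b))
    batch_a = [1] * (n - len(batch_a)) + batch_a
    batch_b = [1] * (n - len(batch_b)) + batch_b
    out = []
    for da, db in zip(batch_a, batch_b):
        if da != db and da != 1 and db != 1:
            raise ValueError(f"Incompatible batch dimensions: {da} vs {db}")
        out.append(max(da, db))
    return tuple(out)
```

For example, a [2, 1, 3, 3] matrix operand against a [4, 3, 1] right-hand side yields the common batch shape (2, 4).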
PiperOrigin-RevId: 291857632
Change-Id: Ifce8f1ae3e0e5b990b71cf468978e1cdc7663d1f
Defining TensorList outside list_kernels library will allow clients to use TensorList class without having to also include the kernels operating on it.
PiperOrigin-RevId: 291474941
Change-Id: Iaab9d6c077b6a6c6236896c80d53ac8196472a82
The test file tests two distinct headers, and in some contexts its complexity
leads to very long compile times. By splitting it in two, we should at least
enable parallel compilation, and we may also reduce the effect of whatever
behavior this is triggering in the compiler.
PiperOrigin-RevId: 291462169
Change-Id: I4df8934d8eaad1c93f986c074734eb31fa98e91a
This enables a few headers to be removed from implementations, which in turn
simplifies the build graph somewhat.
PiperOrigin-RevId: 291286881
Change-Id: I0b8c9d1419cf81ea8b3a48b422ea2dc0fd9187d9
1. Use DMAHelper to access the tensor base pointers without additional
alignment and type checks, and use these pointers to access the elements of
the tensors directly.
2. Add a special case for rank == 2 (which is the common case when batching
Example protos), to avoid a length-1 loop per element.
3. Use `memcpy` where possible (and otherwise, `std::copy_n`) instead of Eigen
assignment for the group values.
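Optimizations 2 and 3 can be illustrated with a small sketch (hypothetical names; the real kernel works on raw tensor buffers in C++): for rank-2 inputs, each group's values are contiguous, so one bulk copy per group (the analogue of `memcpy`) replaces a length-1 inner loop per element.

```python
def copy_groups_rank2(src, group_starts, group_lengths, dst):
    """Copy contiguous value groups from a flat src buffer into dst.

    One slice assignment per group stands in for memcpy; there is no
    per-element inner loop.
    """
    offset = 0
    for start, length in zip(group_starts, group_lengths):
        dst[offset:offset + length] = src[start:start + length]
        offset += length
    return dst
```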
PiperOrigin-RevId: 291176480
Change-Id: I331213c0ac1caadf620c87833759b8a6550f1752
Removes reliance on the assumption that tensorflow::int64 is long long. This is intended to eventually enable changing the definition to int64_t from <cstdint>.
PiperOrigin-RevId: 290872365
Change-Id: I18534aeabf153d65c3521599855f8cca279fce51
Add Numpy-style broadcasting in the batch dimensions for tf.linalg.triangular_solve op. The last two dimensions of both operands constitute the matrix dimensions. The dimensions beyond these are broadcasted to form a common output shape with the standard NumPy broadcasting rules. (https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Note: This implementation differs from Numpy's behavior in that vectors (rank-1 Tensors) are not promoted to matrices (rank-2 Tensors) by appending/prepending dimensions.
PiperOrigin-RevId: 289978628
Change-Id: I66e41e292e57e6df8111745cbe47ccffacb53edc
Add Numpy-style broadcasting in the batch dimensions for tf.linalg.triangular_solve op. The last two dimensions of both operands constitute the matrix dimensions. The dimensions beyond these are broadcasted to form a common output shape with the standard NumPy broadcasting rules. (https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html)
Note: This implementation differs from Numpy's behavior in that vectors (rank-1 Tensors) are not promoted to matrices (rank-2 Tensors) by appending/prepending dimensions.
PiperOrigin-RevId: 289966825
Change-Id: Ib276b9ed1f4b7d10c25617d7ba5f1564b2077610
The change omits necessary is_initialized() checks, which could lead to data races.
PiperOrigin-RevId: 286203946
Change-Id: I678d05ccc7c5220e2d30111a853fd20f505fe933
The change omits necessary is_initialized() checks, which could lead to data races.
PiperOrigin-RevId: 285986418
Change-Id: I12d40188473b855e398437514237b72eddb0443f
Accept the fact that old kernels will start with static linking.
Once the build restructuring is complete, revisit old kernels.
PiperOrigin-RevId: 285151131
Change-Id: I30fee8a789ff9733ea0573b1ce9f44bfd66a4923
This is part of the refactoring described in the Tensorflow Build Improvements RFC: https://github.com/tensorflow/community/pull/179
Subsequent changes will migrate targets from build_refactor.bzl into the new BUILD files.
PiperOrigin-RevId: 284712709
Change-Id: I650eb200ba0ea87e95b15263bad53b0243732ef5
V2 ops always align the diagonals to the left (LEFT_LEFT) in the compact format. V3 ops support 4 alignments: RIGHT_LEFT, LEFT_RIGHT, LEFT_LEFT, and RIGHT_RIGHT. We would like to use RIGHT_LEFT as the default alignment. This contradicts v2's behavior, so we need a new version.
V2 has never been exposed to the public APIs. We will skip V2 and go from V1 to V3 directly. V3 features are currently under forward compatibility guards and will be enabled automatically in ~3 weeks from now.
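The alignment options can be sketched as follows (a simplified illustration, assuming the convention that the first word of the alignment applies to superdiagonals and the second to subdiagonals, with 0 as the padding value):

```python
PAD = 0  # illustrative padding value

def pack_diagonal(values, max_diag_len, is_superdiag, alignment="RIGHT_LEFT"):
    """Pad one diagonal's values to max_diag_len in the compact format.

    alignment is one of RIGHT_LEFT, LEFT_RIGHT, LEFT_LEFT, RIGHT_RIGHT.
    LEFT-aligned diagonals are padded on the right; RIGHT-aligned ones
    are padded on the left.
    """
    side = alignment.split("_")[0 if is_superdiag else 1]
    pad = [PAD] * (max_diag_len - len(values))
    return pad + list(values) if side == "RIGHT" else list(values) + pad
```

Under the proposed RIGHT_LEFT default, a length-2 superdiagonal packed into a row of length 3 becomes [0, 1, 2], while a subdiagonal becomes [1, 2, 0]; under v2's LEFT_LEFT, both would be [1, 2, 0].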
This commit contains
- V3 API definitions.
- Modifications to C++ Matrix{Diag,SetDiag,DiagPart}Op kernels (CPU, GPU, XLA) and shape inference functions to support v3.
- Additional tests and gradient implementations in Python for v3.
- Pfor and TFLite TOCO converters for v3.
- The TFLite MLIR converter for MatrixDiagV3 is intentionally left out because of an MLIR test infrastructure issue and will be added in a separate commit.
Notes:
- Python changes cannot be in a separate follow-up commit because all kernel tests are in Python. (No C++ tests.)
- All three ops have to be in the same commit because their gradients call each other.
PiperOrigin-RevId: 280527550
Change-Id: I88e91abab5c4b50419204807ede4fa60657f048a