STT-tensorflow/tensorflow/compiler/xla/service/gpu
George Karpenkov d379154ca0 [XLA:GPU] Enforce collectives ordering to be their appearance in the module
The ordering is enforced using control edges. This avoids deadlocks in
multi-host launches, which arise when the ordering changes because of
non-determinism (coming from autotuning).

PiperOrigin-RevId: 361069409
Change-Id: I499639aafc74f4128226bccf55fe6d48ecce3f67
2021-03-04 21:47:17 -08:00
data Replace instances of "blacklist" with "denylist" where possible. See Google Developer guidelines at https://developers.google.com/style/word-list#blacklist for more information. 2020-07-20 16:05:25 -07:00
ir Integrate LLVM at llvm/llvm-project@a1a1d338e9 2021-02-05 14:26:21 -08:00
llvm_gpu_backend Fix a few ClangTidy warnings 2021-01-12 06:02:22 -08:00
tests [XLA:GPU] Fix all-reduce when it's degenerate and has arbitrary reduction computation. 2021-02-24 14:34:15 -08:00
alias_passthrough_params_test.cc [XLA:GPU][NFC] Prefer using * to access absl::optional<> values. 2021-02-04 12:29:09 -08:00
alias_passthrough_params.cc [XLA] Unify aliasing types 2020-07-07 16:07:00 -07:00
alias_passthrough_params.h
amdgpu_compiler_registration.cc
amdgpu_compiler.cc Changing "GpuVersion" datatype to include hipDeviceProp_t::gcnArchName 2021-01-11 22:57:46 +00:00
amdgpu_compiler.h [XLA/GPU] Make HloModule optional in CompileTargetBinary. 2020-12-30 14:32:07 -08:00
backend_configs.proto
buffer_allocations.cc [XLA/GPU] Remove uses of BufferAssignment in GpuExecutable. 2020-12-21 15:52:53 -08:00
buffer_allocations.h [XLA/GPU] Remove uses of BufferAssignment in GpuExecutable. 2020-12-21 15:52:53 -08:00
buffer_comparator_test.cc
buffer_comparator.cc Return a failed status (instead of crashing) when block count is too high 2021-02-19 01:49:52 -08:00
buffer_comparator.h
BUILD [XLA:GPU] Enforce collectives ordering to be their appearance in the module 2021-03-04 21:47:17 -08:00
cholesky_thunk.cc [XLA/GPU] Reduce the amount of cuSolver contexts created. 2021-02-22 17:25:34 -08:00
cholesky_thunk.h [XLA/GPU] Reduce the amount of cuSolver contexts created. 2021-02-22 17:25:34 -08:00
collective_permute_thunk.cc [XLA:GPU] Migrate CollectivePermute thunk generation to MLIR 2021-02-04 10:46:05 -08:00
collective_permute_thunk.h [XLA:GPU] Migrate CollectivePermute thunk generation to MLIR 2021-02-04 10:46:05 -08:00
conditional_thunk.cc [XLA:GPU][NFC] Avoid using rvalue references per style guide. 2021-01-13 09:18:47 -08:00
conditional_thunk.h [XLA:GPU][NFC] Avoid using rvalue references per style guide. 2021-01-13 09:18:47 -08:00
convolution_thunk.cc [XLA:GPU][NFC] Avoid using rvalue references per style guide. 2021-01-13 09:18:47 -08:00
convolution_thunk.h [XLA:GPU][NFC] Avoid using rvalue references per style guide. 2021-01-13 09:18:47 -08:00
copy_thunk.cc [XLA/GPU] Remove uses of Thunk::hlo_instruction() for profiling. 2020-07-10 15:31:01 -07:00
copy_thunk.h [XLA/GPU] Remove uses of Thunk::hlo_instruction() for profiling. 2020-07-10 15:31:01 -07:00
cublas_gemm_pad_for_tensor_cores_test.cc
cublas_gemm_pad_for_tensor_cores.cc
cublas_gemm_pad_for_tensor_cores.h
cudnn_batchnorm_rewriter.cc
cudnn_batchnorm_rewriter.h
cudnn_batchnorm_runner.cc [NFC] Eliminate references to HLO insts from CudnnBatchNorm Thunks. 2020-10-06 09:24:36 -07:00
cudnn_batchnorm_runner.h [XLA/GPU] Simplify reduction implementation. 2020-11-24 13:02:44 -08:00
cudnn_batchnorm_thunk.cc [XLA:GPU][NFC] Avoid using rvalue references per style guide. 2021-01-13 09:18:47 -08:00
cudnn_batchnorm_thunk.h [XLA:GPU][NFC] Avoid using rvalue references per style guide. 2021-01-13 09:18:47 -08:00
cudnn_fused_conv_rewriter_test.cc [XLA] Add creation_pass_id and dummy op_names to OpMetadata. 2020-12-11 17:48:13 -08:00
cudnn_fused_conv_rewriter.cc
cudnn_fused_conv_rewriter.h
cudnn_pad_for_convolutions_test.cc
cudnn_pad_for_convolutions.cc
cudnn_pad_for_convolutions.h
cusolver_context.cc
cusolver_context.h
cusolver_rewriter.cc
cusolver_rewriter.h
custom_call_test.cc [XLA:GPU] Fix Custom call implementation to correctly handle token inputs. 2021-02-22 09:12:56 -08:00
custom_call_thunk.cc [XLA:GPU] Fix Custom call implementation to correctly handle token inputs. 2021-02-22 09:12:56 -08:00
custom_call_thunk.h [XLA:GPU] Fix Custom call implementation to correctly handle token inputs. 2021-02-22 09:12:56 -08:00
elemental_ir_emitter.cc Merge pull request from nouiz:upstream-llvm_var_name 2021-02-17 06:12:19 -08:00
elemental_ir_emitter.h Merge pull request from nouiz:upstream-llvm_var_name 2021-02-17 06:12:19 -08:00
fft_thunk.cc [XLA:GPU] Don't share FFT plans across devices. 2020-11-05 17:10:09 -08:00
fft_thunk.h Prefix thread annotations with ABSL_. 2020-11-16 08:26:41 -08:00
for_thunk.cc [NFC] Eliminate references to HLO Inst from ForThunk 2020-10-06 12:56:52 -07:00
for_thunk.h [NFC] Mark Thunk subclass members const when possible 2020-10-07 10:13:20 -07:00
fusion_merger_test.cc Avoid fusing expensive ops with reusing ops. 2021-01-21 06:59:55 -08:00
fusion_merger.cc [XLA] Implement the fusion progress visualizer, which dumps out the HTML+JS page visualizing the fusion decisions for XLA:GPU 2021-01-25 17:35:11 -08:00
fusion_merger.h [XLA:GPU] Update and improve documentation of the FusionMerger HLO pass. 2020-12-09 07:44:04 -08:00
gemm_algorithm_picker.cc [XLA:GPU][NFC] Prefer using * to access absl::optional<> values. 2021-02-04 12:29:09 -08:00
gemm_algorithm_picker.h
gemm_rewriter.cc
gemm_rewriter.h
gemm_thunk.cc [XLA:GPU][NFC] Avoid using rvalue references per style guide. 2021-01-13 09:18:47 -08:00
gemm_thunk.h [XLA:GPU][NFC] Avoid using rvalue references per style guide. 2021-01-13 09:18:47 -08:00
gpu_autotuning.proto Replace instances of "blacklist" with "denylist" where possible. See Google Developer guidelines at https://developers.google.com/style/word-list#blacklist for more information. 2020-07-20 16:05:25 -07:00
gpu_compiler.cc [XLA:GPU] Enforce collectives ordering to be their appearance in the module 2021-03-04 21:47:17 -08:00
gpu_compiler.h [XLA/GPU] Add an LMHLO -> Execution result test. 2021-01-08 15:39:11 -08:00
gpu_constants.cc
gpu_constants.h
gpu_conv_algorithm_picker.cc [XLA:GPU] Do not autotune convolutions on ROCm if requiring determinism 2021-02-26 11:40:02 -08:00
gpu_conv_algorithm_picker.h
gpu_conv_padding_legalization_test.cc Fix spatial dimension reference in conv padding legalization. 2020-09-18 03:46:36 -07:00
gpu_conv_padding_legalization.cc Fix spatial dimension reference in conv padding legalization. 2020-09-18 03:46:36 -07:00
gpu_conv_padding_legalization.h
gpu_conv_rewriter_test.cc [XLA/GPU] Fix convolution rewriter heuristics for better pattern-matching backward filter convs. 2021-02-04 12:24:45 -08:00
gpu_conv_rewriter.cc [XLA] Split permutation utilities from xla/util.* into a new xla/permutation_util.* 2021-02-09 17:01:49 -08:00
gpu_conv_rewriter.h
gpu_conv_runner.cc [XLA:GPU] Migrate convolution thunk emission to MLIR 2020-12-17 11:15:21 -08:00
gpu_conv_runner.h [XLA:GPU] Migrate convolution thunk emission to MLIR 2020-12-17 11:15:21 -08:00
gpu_copy_insertion.cc
gpu_copy_insertion.h
gpu_device_info.h Return a failed status (instead of crashing) when block count is too high 2021-02-19 01:49:52 -08:00
gpu_executable_run_options.cc Split out common NCCL utils. 2020-12-03 05:25:58 -08:00
gpu_executable_run_options.h [NFC] Moved GlobalDeviceId into its own file as it is also used by the CPU runtime. 2020-11-24 04:21:38 -08:00
gpu_executable.cc Move singleton xla::gpu::GPUDebugInfoManager to xla::XlaDebugInfoManager, in the hope that it can be reused by XLA/CPU. 2021-02-09 10:28:57 -08:00
gpu_executable.h Merge pull request from nouiz:upstream_maybeowning 2021-01-11 01:30:30 -08:00
gpu_fusible_test.cc
gpu_fusible.cc Properly determine the shared memory requirements of a fusion node. 2020-11-03 00:15:54 -08:00
gpu_fusible.h [XLA/GPU] Address review comments. 2020-10-12 18:00:16 -07:00
gpu_hlo_schedule_test.cc Roll-forward with fixes 2021-02-22 16:27:43 -08:00
gpu_hlo_schedule.cc Roll-forward with fixes 2021-02-22 16:27:43 -08:00
gpu_hlo_schedule.h Roll-forward with fixes 2021-02-22 16:27:43 -08:00
gpu_layout_assignment_test.cc
gpu_layout_assignment.cc Add an option to override XLA GPU conv layouts to NHWC 2020-11-16 16:06:46 -08:00
gpu_layout_assignment.h [XLA:GPU] Allow using ChannelLayoutConstraints for XLA:GPU 2021-02-23 18:14:58 -08:00
gpu_sanitize_constant_names_test.cc
gpu_sanitize_constant_names.cc
gpu_sanitize_constant_names.h
gpu_scatter_expander.cc [XLA] Expand simple scatter operations into dynamic-update-slice. 2020-08-18 13:21:28 -07:00
gpu_scatter_expander.h [XLA] Expand simple scatter operations into dynamic-update-slice. 2020-08-18 13:21:28 -07:00
gpu_transfer_manager.cc [XLA] Drop useless shape argument from TransferManager::TransferLiteralFromOutfeed. 2021-01-25 17:29:36 -08:00
gpu_transfer_manager.h [XLA] Drop useless shape argument from TransferManager::TransferLiteralFromOutfeed. 2021-01-25 17:29:36 -08:00
gpu_types.h Fix a few ClangTidy warnings 2021-01-12 06:02:22 -08:00
hlo_algorithm_denylist_test.cc Rename hlo_algorithm_blacklist to hlo_algorithm_denylist 2020-07-23 11:13:39 -07:00
hlo_algorithm_denylist.cc Rename hlo_algorithm_blacklist to hlo_algorithm_denylist 2020-07-23 11:13:39 -07:00
hlo_algorithm_denylist.h Rename hlo_algorithm_blacklist to hlo_algorithm_denylist 2020-07-23 11:13:39 -07:00
hlo_execution_profiler.cc [XLA:GPU][NFC] Prefer using * to access absl::optional<> values. 2021-02-04 12:29:09 -08:00
hlo_execution_profiler.h [XLA/GPU] Make module parameter to GpuExecutable optional. 2020-12-29 14:33:16 -08:00
hlo_to_ir_bindings.cc [XLA/GPU] Emit constants computation by computation. This helps with incrementally transitioning to MLIR. 2020-09-14 19:16:24 -07:00
hlo_to_ir_bindings.h Roll forward XLA GPU LHLO sort op migration 2020-08-13 12:00:17 -07:00
horizontal_input_fusion_test.cc [XLA:GPU] Re-enable horizontal input fusion. Use a minimum of 64 threads per block for multi-output fusions. 2020-11-17 03:28:50 -08:00
horizontal_input_fusion.cc Merge pull request from trentlo:horizontal_input_fusion_again 2020-10-15 01:40:44 -07:00
horizontal_input_fusion.h Implement horizontal input fusion. 2020-10-12 17:59:55 -07:00
horizontal_loop_fusion_test.cc [XLA/GPU] Re-enable h-loop-fusion to share operands with users. 2021-02-24 15:34:18 -08:00
horizontal_loop_fusion.cc [XLA/GPU] Re-enable h-loop-fusion to share operands with users. 2021-02-24 15:34:18 -08:00
horizontal_loop_fusion.h Rename horizontal_fusion to horizontal_loop_fusion. 2020-09-16 17:58:22 -07:00
infeed_manager.cc
infeed_manager.h
infeed_thunk.cc [XLA:GPU][NFC] Avoid using rvalue references per style guide. 2021-01-13 09:18:47 -08:00
infeed_thunk.h [XLA:GPU][NFC] Avoid using rvalue references per style guide. 2021-01-13 09:18:47 -08:00
instruction_fusion_test.cc
instruction_fusion.cc Evaluate the maximum code duplication in fusion nodes instead of average. 2020-09-02 07:36:23 -07:00
instruction_fusion.h
ir_emission_utils_test.cc Integrate LLVM at llvm/llvm-project@418c218efa 2021-02-11 10:53:15 -08:00
ir_emission_utils.cc [XLA/GPU] Remove unintended debug line. 2021-02-04 14:52:01 -08:00
ir_emission_utils.h Roll-forward with fix: 2021-02-03 15:25:54 -08:00
ir_emitter_context.h [XLA/GPU] Add optional BufferAllocation fields to IrEmitterContext, and 2020-12-30 14:56:31 -08:00
ir_emitter_nested.cc [XLA/GPU] Emit constants computation by computation. This helps with incrementally transitioning to MLIR. 2020-09-14 19:16:24 -07:00
ir_emitter_nested.h [XLA/GPU] Emit constants computation by computation. This helps with incrementally transitioning to MLIR. 2020-09-14 19:16:24 -07:00
ir_emitter_unnested.cc [XLA/GPU] Cleanup all rest of the emitters for supporting LMHLO. 2021-02-23 14:17:11 -08:00
ir_emitter_unnested.h [XLA/GPU] Cleanup all rest of the emitters for supporting LMHLO. 2021-02-23 14:17:11 -08:00
ir_emitter.cc Integrate LLVM at llvm/llvm-project@bf6380c096 2021-02-26 04:08:05 -08:00
ir_emitter.h [XLA/GPU] Plumb through Bitcast op for LMHLO. 2021-02-10 19:49:41 -08:00
kernel_mapping_scheme.h Merge pull request from nouiz:upstream_copy_bug 2020-05-13 12:09:48 -07:00
kernel_thunk.cc XLA Parallel reduce. 2020-08-27 15:21:33 -07:00
kernel_thunk.h [XLA/GPU] Remove uses of Thunk::hlo_instruction() for profiling. 2020-07-10 15:31:01 -07:00
launch_dimensions.cc Return a failed status (instead of crashing) when block count is too high 2021-02-19 01:49:52 -08:00
launch_dimensions.h Return a failed status (instead of crashing) when block count is too high 2021-02-19 01:49:52 -08:00
memset_thunk.cc [XLA/GPU] Remove uses of Thunk::hlo_instruction() for profiling. 2020-07-10 15:31:01 -07:00
memset_thunk.h [NFC] Mark Thunk subclass members const when possible 2020-10-07 10:13:20 -07:00
multi_output_fusion_test.cc Properly determine the shared memory requirements of a fusion node. 2020-11-03 00:15:54 -08:00
multi_output_fusion.cc [XLA] Implement the fusion progress visualizer, which dumps out the HTML+JS page visualizing the fusion decisions for XLA:GPU 2021-01-25 17:35:11 -08:00
multi_output_fusion.h [XLA] Implement the fusion progress visualizer, which dumps out the HTML+JS page visualizing the fusion decisions for XLA:GPU 2021-01-25 17:35:11 -08:00
nccl_all_gather_thunk.cc [XLA:GPU] Fix collective communication ops to work correctly in SPMD mode. 2021-02-22 12:42:21 -08:00
nccl_all_gather_thunk.h [XLA:GPU] Fix collective communication ops to work correctly in SPMD mode. 2021-02-22 12:42:21 -08:00
nccl_all_reduce_thunk.cc [XLA:GPU] Fix collective communication ops to work correctly in SPMD mode. 2021-02-22 12:42:21 -08:00
nccl_all_reduce_thunk.h [XLA:GPU] Fix all-reduce when it's degenerate and has arbitrary reduction computation. 2021-02-24 14:34:15 -08:00
nccl_all_to_all_thunk.cc [XLA:GPU] Fix collective communication ops to work correctly in SPMD mode. 2021-02-22 12:42:21 -08:00
nccl_all_to_all_thunk.h [XLA:GPU] Fix collective communication ops to work correctly in SPMD mode. 2021-02-22 12:42:21 -08:00
nccl_collective_thunk.cc [XLA][NFC] Add a struct to describe the LogicalID of a device. 2021-02-22 19:40:16 -08:00
nccl_collective_thunk.h [XLA:GPU] Fix collective communication ops to work correctly in SPMD mode. 2021-02-22 12:42:21 -08:00
nccl_test_utils_dummy.cc [XLA-GPU] NFC: Rename dummy NCCL files, so they appear next to the real ones in the file list. 2020-12-08 04:36:45 -08:00
nccl_test_utils.cc [XLA:GPU] Fix build failure in nccl_test_utils.cc 2021-02-01 09:37:38 -08:00
nccl_test_utils.h Split out common NCCL utils. 2020-12-03 05:25:58 -08:00
nccl_utils.cc Store NcclCliques in new NcclCliqueMap. 2021-01-29 04:16:02 -08:00
nccl_utils.h [XLA:GPU][NFC] Avoid using fully qualified namespace. 2021-02-01 18:30:57 -08:00
nvptx_compiler_registration.cc Add a skeleton for the MLIR GPU backend. 2019-07-16 03:18:23 -07:00
nvptx_compiler.cc Merge pull request from nouiz:upstream-llvm_file 2021-02-26 02:44:59 -08:00
nvptx_compiler.h [XLA/GPU] Make HloModule optional in CompileTargetBinary. 2020-12-30 14:32:07 -08:00
outfeed_manager.cc
outfeed_manager.h
outfeed_thunk.cc [XLA:GPU] Migrate outfeed thunk emission to MLIR 2021-01-12 17:08:21 -08:00
outfeed_thunk.h [XLA:GPU] Migrate outfeed thunk emission to MLIR 2021-01-12 17:08:21 -08:00
parallel_loop_emitter.cc Merge pull request from nouiz:upstream_master_grid_size 2020-10-06 04:04:24 -07:00
parallel_loop_emitter.h Merge pull request from nouiz:upstream_master_grid_size 2020-10-06 04:04:24 -07:00
reduction_degenerate_dim_remover.cc Merge pull request from wangsiyu:master 2021-01-21 12:24:31 -08:00
reduction_degenerate_dim_remover.h
reduction_dimension_grouper.cc
reduction_dimension_grouper.h
reduction_layout_normalizer.cc Add op metadata to bitcasts inserted by ReductionLayoutNormalizer. 2020-06-16 10:36:57 -07:00
reduction_layout_normalizer.h
reduction_splitter_test.cc [XLA:GPU] Split reduce ops with large but non-consecutive reduction dimensions. 2020-06-19 10:35:29 -07:00
reduction_splitter.cc [XLA:GPU] Split reduce ops with large but non-consecutive reduction dimensions. 2020-06-19 10:35:29 -07:00
reduction_splitter.h [XLA:GPU] Split reduce ops with large but non-consecutive reduction dimensions. 2020-06-19 10:35:29 -07:00
replica_id_thunk.cc [XLA][NFC] Add a struct to describe the LogicalID of a device. 2021-02-22 19:40:16 -08:00
replica_id_thunk.h [XLA:GPU] Add support for PartitionId 2021-01-29 13:42:19 -08:00
sequential_thunk.cc [XLA/GPU] Remove Thunk::hlo_instruction(). 2020-07-13 15:32:19 -07:00
sequential_thunk.h [XLA/GPU] Remove Thunk::hlo_instruction(). 2020-07-13 15:32:19 -07:00
stream_assignment_test.cc
stream_assignment.cc
stream_assignment.h
stream_executor_util.cc [XLA:GPU] Migrate convolution thunk emission to MLIR 2020-12-14 20:55:56 -08:00
stream_executor_util.h [XLA:GPU] Migrate convolution thunk emission to MLIR 2020-12-14 20:55:56 -08:00
target_constants.h
target_util.cc [XLA] More readable emitted LLVM code. 2021-02-15 06:17:34 -08:00
target_util.h [XLA] More readable emitted LLVM code. 2021-02-15 06:17:34 -08:00
thunk_emitter.cc [XLA:GPU] Migrate TriangularSolve thunk emission to use MLIR 2021-02-05 11:30:21 -08:00
thunk_emitter.h [XLA:GPU] Migrate TriangularSolve thunk emission to use MLIR 2021-02-05 11:30:21 -08:00
thunk_schedule.cc [XLA/GPU] Add an LMHLO -> Execution result test. 2021-01-08 15:39:11 -08:00
thunk_schedule.h [XLA/GPU] Add an LMHLO -> Execution result test. 2021-01-08 15:39:11 -08:00
thunk.cc [XLA:GPU] Add support for PartitionId 2021-01-29 13:42:19 -08:00
thunk.h [XLA:GPU] Add support for PartitionId 2021-01-29 13:42:19 -08:00
tree_reduction_rewriter.cc
tree_reduction_rewriter.h
triangular_solve_thunk.cc [XLA/GPU] Remove uses of Thunk::hlo_instruction() for profiling. 2020-07-10 15:31:01 -07:00
triangular_solve_thunk.h [XLA/GPU] Remove uses of Thunk::hlo_instruction() for profiling. 2020-07-10 15:31:01 -07:00
variadic_op_splitter_test.cc
variadic_op_splitter.cc
variadic_op_splitter.h
while_thunk.cc [NFC] Eliminate references to HLO Inst from WhileThunk 2020-10-06 13:55:38 -07:00
while_thunk.h [NFC] Mark Thunk subclass members const when possible 2020-10-07 10:13:20 -07:00
while_transformer_test.cc
xfeed_queue.h