STT-tensorflow/tensorflow/core/nccl
Ayush Dubey 18eaf4e8f1 Ensure that CollectiveParams outlives all references to it.
Before this change, it was possible to access a `const CollectiveParams&` after
it was destroyed.  For example, the call to `UnblockDependencies` in
`NcclCommunicator::Enqueue` raced with the done_callback of the collective
participant.

This change makes `CollectiveParams` a refcounted object, and holds references
everywhere it may be accessed.
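
The lifetime fix described above can be illustrated with a minimal, self-contained sketch. It does not use TensorFlow's actual classes: `RefCountedParams` and `DemoParamsOutliveCallback` are hypothetical names invented here, and the hand-rolled counter only approximates the `Ref()`/`Unref()` pattern that a refcounted `CollectiveParams` would follow. The idea shown is that the code path that still needs the params holds its own reference, so a done callback that drops the participant's reference cannot destroy the object early.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a refcounted params object (not TensorFlow's class).
class RefCountedParams {
 public:
  RefCountedParams(std::string name, bool* destroyed)
      : name_(std::move(name)), destroyed_(destroyed) {}

  // Take an additional reference.
  void Ref() { count_.fetch_add(1, std::memory_order_relaxed); }

  // Drop one reference; the object deletes itself when the last one goes away.
  void Unref() {
    if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1) {
      delete this;
    }
  }

  const std::string& name() const { return name_; }

 private:
  // Private destructor: the object can only be destroyed through Unref().
  ~RefCountedParams() { *destroyed_ = true; }

  std::atomic<int> count_{1};  // The creator starts with one reference.
  std::string name_;
  bool* destroyed_;  // Test hook recording when destruction happens.
};

// Simulates an enqueue path that keeps using the params after the
// participant's done callback has already released its own reference.
bool DemoParamsOutliveCallback() {
  bool destroyed = false;
  auto* params = new RefCountedParams("all_reduce", &destroyed);  // count == 1

  params->Ref();  // count == 2: the enqueue path holds its own reference.

  // The participant's done callback releases the participant's reference.
  std::function<void()> done_callback = [params] { params->Unref(); };
  done_callback();  // count == 1: the object must still be alive here.

  const bool alive_after_done = !destroyed;
  const std::string seen = params->name();  // Safe: a reference is still held.

  params->Unref();  // count == 0: the object is destroyed now.
  return alive_after_done && destroyed && seen == "all_reduce";
}
```

Without the extra `Ref()` in the enqueue path, the callback's `Unref()` would delete the object and the later `params->name()` call would read freed memory, which is the kind of use-after-destroy access the commit message describes.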

PiperOrigin-RevId: 355646163
Change-Id: I7fd164afe8c1c9aa1c3b77a988930a0624977c7c
2021-02-04 10:07:20 -08:00
BUILD Disable flaky test nccl_manager_test in guitar 2021-01-05 23:02:19 -08:00
collective_communicator.cc Ensure that CollectiveParams outlives all references to it. 2021-02-04 10:07:20 -08:00
collective_communicator.h Make NcclManager part of CollectiveExecutorMgr 2020-09-17 14:35:16 -07:00
nccl_manager_test.cc fix typos in core directory 2020-10-29 02:52:55 +03:00
nccl_manager.cc Rely on cancellation in collective V2 kernels 2020-10-22 11:11:50 -07:00
nccl_manager.h Enable resetting NcclManager after a previous StartAbort. 2020-10-21 11:33:23 -07:00
nccl_rewrite.cc