STT-tensorflow/tensorflow/core/distributed_runtime
Chris Kennelly c04bf06bfc Optimize calls to std::string::find() and friends for a single char.
The character literal overload is more efficient.

PiperOrigin-RevId: 348124169
Change-Id: I55909265a8267017210eb0deff5091da20d8ed70
2020-12-17 17:48:51 -08:00
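
The commit message above can be illustrated with a minimal sketch. It assumes nothing about the actual call sites touched by the change: `HostOf` is a hypothetical helper, not a function from this directory, and the address string is made up. Passing `':'` selects the `find(char)` overload, which typically avoids treating the argument as a C string whose length must be determined before searching. Both overloads return the same position.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of the TensorFlow sources): returns the
// portion of `address` before the first ':', or the whole string if there
// is no ':'.
std::string HostOf(const std::string& address) {
  // Character-literal overload: find(char) rather than find(const char*).
  const std::string::size_type colon = address.find(':');

  // Debug-only check that the string-literal overload reports the same
  // position; only the argument type differs between the two calls.
  assert(colon == address.find(":"));

  return colon == std::string::npos ? address : address.substr(0, colon);
}
```

Calling `HostOf("worker0.example.com:2222")` yields `"worker0.example.com"`. The same substitution applies to `rfind`, `find_first_of`, and related members when the search argument is a single character.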
eager Fix an issue of out of order execution. For a multi-device function, don't send a packed input to the function device until all underlying remote handles are ready on remote devices. Otherwise, on a remote worker, a remote component function execution request could be enqueued before a request for producing a function input. 2020-12-11 14:56:03 -08:00
rpc Optimize calls to std::string::find() and friends for a single char. 2020-12-17 17:48:51 -08:00
base_rendezvous_mgr.cc Fix cancellation race condition in BaseRendezvousMgr::RegisterCall 2020-06-19 13:04:53 -07:00
base_rendezvous_mgr.h Fix cancellation race condition in BaseRendezvousMgr::RegisterCall 2020-06-19 13:04:53 -07:00
BUILD Support aborting RING communication in multi worker collectives 2020-10-21 17:03:09 -07:00
call_options_test.cc
call_options.cc A series of changes to significantly reduce the number of allocations 2016-06-27 13:32:57 -07:00
call_options.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
cancellable_call.cc Support aborting RING communication in multi worker collectives 2020-10-21 17:03:09 -07:00
cancellable_call.h Support aborting RING communication in multi worker collectives 2020-10-21 17:03:09 -07:00
cluster_function_library_runtime_test.cc Pass in GrpcWorkerEnv when creating GrpcWorkerCache. 2020-06-04 11:46:09 -07:00
cluster_function_library_runtime.cc Change the function output type, either a Tensor for a local output or a TensorShape for a remote output, preparing for the support of function outputs placed on remote workers. 2020-08-04 19:13:03 -07:00
cluster_function_library_runtime.h Use the original output indices when adding a component function output to RemoteMgr. 2020-08-19 14:41:05 -07:00
collective_param_resolver_distributed_test.cc Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00
collective_param_resolver_distributed.cc Support aborting param resolution in multi worker collectives 2020-10-21 18:40:58 -07:00
collective_param_resolver_distributed.h Support aborting param resolution in multi worker collectives 2020-10-21 18:40:58 -07:00
collective_rma_distributed_test.cc Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00
collective_rma_distributed.cc Support aborting RING communication in multi worker collectives 2020-10-21 17:03:09 -07:00
collective_rma_distributed.h Support aborting RING communication in multi worker collectives 2020-10-21 17:03:09 -07:00
device_resolver_distributed_test.cc Use device attributes from group resolution 2020-09-09 10:53:49 -07:00
device_resolver_distributed.cc Use device attributes from group resolution 2020-09-09 10:53:49 -07:00
device_resolver_distributed.h Use device attributes from group resolution 2020-09-09 10:53:49 -07:00
graph_mgr.cc [TF2XLA] Remove the serialization of CustomKernelCreator, since there is only one, and we won't add new ones 2020-09-22 10:59:59 -07:00
graph_mgr.h When calling connect_to_cluster, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
local_master.cc Add remote session support for the MakeCallable API. 2018-04-06 18:18:06 -07:00
local_master.h Address compiler warnings in tensorflow/core/distributed_runtime. 2018-06-05 08:23:35 -07:00
master_env.h Fix two memory leaks and enable asan for C API remote tests. 2020-07-17 10:17:20 -07:00
master_interface.h
master_session.cc fix typos in core directory 2020-10-29 02:52:55 +03:00
master_session.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
master_test.cc Fix the call to NewHostPortGrpcChannel in distributed_runtime/master_test 2019-09-23 15:03:29 -07:00
master.cc Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
master.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
message_wrappers_test.cc
message_wrappers.cc Support TensorProtos as Operation inputs, in order to support remote inputs passed as Tensors to EagerClusterFunctionLibraryRuntime::Run. 2020-03-09 17:10:38 -07:00
message_wrappers.h Support TensorProtos as Operation inputs, in order to support remote inputs passed as Tensors to EagerClusterFunctionLibraryRuntime::Run. 2020-03-09 17:10:38 -07:00
partial_run_mgr_test.cc
partial_run_mgr.cc
partial_run_mgr.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
README.md Fix how-to reference in distributed runtime README (#9772) 2017-05-12 06:35:31 -07:00
recent_request_ids_test.cc Internal tests cleanup. 2020-10-27 13:24:35 -07:00
recent_request_ids.cc
recent_request_ids.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
remote_device_test.cc Pass in GrpcWorkerEnv when creating GrpcWorkerCache. 2020-06-04 11:46:09 -07:00
remote_device.cc replace PFLR DeviceGetContext hardcode with Device::IsRemoteCallAllowed 2020-10-22 20:49:03 +02:00
remote_device.h The remote device manager in WorkerSession contains only RemoteDevice instance which has device->IsLocal() == false even if the device is on the local host. This patch ensures that device->IsLocal() should return true if and only if this device is on the local host. 2019-08-20 13:50:02 -07:00
rendezvous_mgr_interface.h [Cleanup] Remove unused method RendezvousMgrInterface::CleanupAll(). 2020-04-13 11:26:06 -07:00
request_id_test.cc Reject retried RecvTensor requests. 2018-01-22 17:30:59 -08:00
request_id.cc
request_id.h
rpc_collective_executor_mgr_test.cc Make NcclManager part of CollectiveExecutorMgr 2020-09-17 14:35:16 -07:00
rpc_collective_executor_mgr.cc Make NcclManager part of CollectiveExecutorMgr 2020-09-17 14:35:16 -07:00
rpc_collective_executor_mgr.h Make NcclManager part of CollectiveExecutorMgr 2020-09-17 14:35:16 -07:00
rpcbench_test.cc Internal tests cleanup. 2020-10-27 13:24:35 -07:00
scheduler.cc
scheduler.h
server_lib_test.cc When calling connect_to_cluster, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
server_lib.cc When calling connect_to_cluster, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
server_lib.h When calling connect_to_cluster, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
session_mgr_test.cc Garbage collect old WorkerSession when the restarted master task create new one. 2020-08-03 11:31:26 -07:00
session_mgr.cc Garbage collect old WorkerSession when the restarted master task create new one. 2020-08-03 11:31:26 -07:00
session_mgr.h Garbage collect old WorkerSession when the restarted master task create new one. 2020-08-03 11:31:26 -07:00
tensor_coding_test.cc Internal tests cleanup. 2020-10-27 13:24:35 -07:00
tensor_coding.cc
tensor_coding.h New Timestamped BFCAllocator and GPUKernelTracker. 2019-02-06 11:01:38 -08:00
test_utils.h Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00
worker_cache_logger.cc
worker_cache_logger.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
worker_cache_partial.cc
worker_cache_partial.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
worker_cache_wrapper.h Fix compiler warnings in worker_cache_wrapper.h. 2019-07-31 09:15:51 -07:00
worker_cache.h
worker_env.h Fix two memory leaks and enable asan for C API remote tests. 2020-07-17 10:17:20 -07:00
worker_interface.h Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00
worker_session.cc When calling connect_to_cluster, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker_session.h When calling connect_to_cluster, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker.cc Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00
worker.h Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00

Distributed TensorFlow

This directory contains the initial open-source implementation of the distributed TensorFlow runtime, using gRPC for inter-process communication.

To learn how to use the distributed runtime to create a TensorFlow cluster, see the Distributed TensorFlow How-To.