STT-tensorflow/tensorflow/core/distributed_runtime
Isha Arkatkar 6fd9628434 Add a protected getter for host_name field in GrpcServer
PiperOrigin-RevId: 356420826
Change-Id: I461be80631878ace47ca107fee304957df12837e
2021-02-08 21:24:14 -08:00
eager Fix a race between EagerContext::SetReuseRendezvousForFunctions and async op execution 2021-01-27 10:28:30 -08:00
rpc Add a protected getter for host_name field in GrpcServer 2021-02-08 21:24:14 -08:00
base_rendezvous_mgr.cc Fix cancellation race condition in BaseRendezvousMgr::RegisterCall 2020-06-19 13:04:53 -07:00
base_rendezvous_mgr.h Fix cancellation race condition in BaseRendezvousMgr::RegisterCall 2020-06-19 13:04:53 -07:00
BUILD Support aborting RING communication in multi worker collectives 2020-10-21 17:03:09 -07:00
call_options_test.cc
call_options.cc
call_options.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
cancellable_call.cc Support aborting RING communication in multi worker collectives 2020-10-21 17:03:09 -07:00
cancellable_call.h Support aborting RING communication in multi worker collectives 2020-10-21 17:03:09 -07:00
cluster_function_library_runtime_test.cc Pass in GrpcWorkerEnv when creating GrpcWorkerCache. 2020-06-04 11:46:09 -07:00
cluster_function_library_runtime.cc Change the function output type, either a Tensor for a local output or a TensorShape for a remote output, preparing for the support of function outputs placed on remote workers. 2020-08-04 19:13:03 -07:00
cluster_function_library_runtime.h Use the original output indices when adding a component function output to RemoteMgr. 2020-08-19 14:41:05 -07:00
collective_param_resolver_distributed_test.cc Ensure that CollectiveParams outlives all references to it. 2021-02-04 10:07:20 -08:00
collective_param_resolver_distributed.cc Allow cancellation of v2 collectives during param resolution 2021-02-05 10:03:57 -08:00
collective_param_resolver_distributed.h Support aborting param resolution in multi worker collectives 2020-10-21 18:40:58 -07:00
collective_rma_distributed_test.cc Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00
collective_rma_distributed.cc Support aborting RING communication in multi worker collectives 2020-10-21 17:03:09 -07:00
collective_rma_distributed.h Support aborting RING communication in multi worker collectives 2020-10-21 17:03:09 -07:00
device_resolver_distributed_test.cc Use device attributes from group resolution 2020-09-09 10:53:49 -07:00
device_resolver_distributed.cc Use device attributes from group resolution 2020-09-09 10:53:49 -07:00
device_resolver_distributed.h Use device attributes from group resolution 2020-09-09 10:53:49 -07:00
graph_mgr.cc [TF2XLA] Remove the serialization of CustomKernelCreator, since there is only one, and we won't add new ones 2020-09-22 10:59:59 -07:00
graph_mgr.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
local_master.cc
local_master.h
master_env.h Fix two memory leaks and enable asan for C API remote tests. 2020-07-17 10:17:20 -07:00
master_interface.h
master_session.cc fix typos in core directory 2020-10-29 02:52:55 +03:00
master_session.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
master_test.cc
master.cc Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
master.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
message_wrappers_test.cc
message_wrappers.cc Support TensorProtos as Operation inputs, in order to support remote inputs passed as Tensors to EagerClusterFunctionLibraryRuntime::Run. 2020-03-09 17:10:38 -07:00
message_wrappers.h Support TensorProtos as Operation inputs, in order to support remote inputs passed as Tensors to EagerClusterFunctionLibraryRuntime::Run. 2020-03-09 17:10:38 -07:00
partial_run_mgr_test.cc
partial_run_mgr.cc
partial_run_mgr.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
README.md
recent_request_ids_test.cc Internal tests cleanup. 2020-10-27 13:24:35 -07:00
recent_request_ids.cc
recent_request_ids.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
remote_device_test.cc Pass in GrpcWorkerEnv when creating GrpcWorkerCache. 2020-06-04 11:46:09 -07:00
remote_device.cc - Adds support to optionally adds replica as part of job name 2021-01-12 12:26:47 -08:00
remote_device.h - Adds support to optionally adds replica as part of job name 2021-01-12 12:26:47 -08:00
rendezvous_mgr_interface.h [Cleanup] Remove unused method RendezvousMgrInterface::CleanupAll(). 2020-04-13 11:26:06 -07:00
request_id_test.cc
request_id.cc
request_id.h
rpc_collective_executor_mgr_test.cc Make NcclManager part of CollectiveExecutorMgr 2020-09-17 14:35:16 -07:00
rpc_collective_executor_mgr.cc Make NcclManager part of CollectiveExecutorMgr 2020-09-17 14:35:16 -07:00
rpc_collective_executor_mgr.h Make NcclManager part of CollectiveExecutorMgr 2020-09-17 14:35:16 -07:00
rpcbench_test.cc Internal tests cleanup. 2020-10-27 13:24:35 -07:00
scheduler.cc
scheduler.h
server_lib_test.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
server_lib.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
server_lib.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
session_mgr_test.cc Garbage collect old WorkerSession when the restarted master task create new one. 2020-08-03 11:31:26 -07:00
session_mgr.cc Garbage collect old WorkerSession when the restarted master task create new one. 2020-08-03 11:31:26 -07:00
session_mgr.h Garbage collect old WorkerSession when the restarted master task create new one. 2020-08-03 11:31:26 -07:00
tensor_coding_test.cc Internal tests cleanup. 2020-10-27 13:24:35 -07:00
tensor_coding.cc
tensor_coding.h
test_utils.h Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00
worker_cache_logger.cc
worker_cache_logger.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
worker_cache_partial.cc
worker_cache_partial.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
worker_cache_wrapper.h
worker_cache.h
worker_env.h Fix two memory leaks and enable asan for C API remote tests. 2020-07-17 10:17:20 -07:00
worker_interface.h Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00
worker_session.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker_session.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker.cc Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00
worker.h Set a timeout to check health RPC 2020-10-21 13:02:25 -07:00

Distributed TensorFlow

This directory contains the initial open-source implementation of the distributed TensorFlow runtime, using gRPC for inter-process communication.

To learn how to use the distributed runtime to create a TensorFlow cluster, see the Distributed TensorFlow How-To.
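
As a rough illustration of what creating a cluster involves, here is a minimal sketch. It is not taken from this directory or from the How-To. The job names, the `localhost` ports, and the helpers `task_target` and `start_server` are hypothetical. The sketch assumes a TensorFlow 2.x install where `tf.train.ClusterSpec` and `tf.distribute.Server` are available. TensorFlow is imported lazily, so the address helper runs without it.

```python
# Hypothetical cluster layout: job name -> list of "host:port" task addresses.
CLUSTER = {
    "ps": ["localhost:2221"],
    "worker": ["localhost:2222", "localhost:2223"],
}


def task_target(cluster, job_name, task_index):
    """Return the gRPC target string for one task (illustrative helper)."""
    return "grpc://" + cluster[job_name][task_index]


def start_server(cluster, job_name, task_index):
    """Start an in-process gRPC server for one task. Requires TensorFlow."""
    # Imported here so the helper above stays usable without TensorFlow.
    import tensorflow as tf

    spec = tf.train.ClusterSpec(cluster)
    return tf.distribute.Server(spec, job_name=job_name, task_index=task_index)


if __name__ == "__main__":
    # Print the address a client would use to reach the second worker task.
    print(task_target(CLUSTER, "worker", 1))
```

In a real deployment each task would typically run in its own process and call something like `start_server(CLUSTER, "worker", 0).join()` to keep serving requests. The single-machine addresses above are only for experimentation.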