STT-tensorflow/tensorflow/core/distributed_runtime
Xiao Yu fe6e64b098 Refactor eager placement logic into three util methods:
- MaybePinSmallOpsToCpu
- MaybePinToResourceDevice
- MaybePinToCustomDevice

We are going to reuse MaybePinSmallOpsToCpu in TFRT but not the other two, because TFRT has neither a native Resource nor a Custom Device.

PiperOrigin-RevId: 317766813
Change-Id: I43241b5786120ddf39dc4bfff6071239afdfd785
2020-06-22 18:05:46 -07:00
eager Refactor eager placement logic into three util methods: 2020-06-22 18:05:46 -07:00
rpc Clear cancel callback when gRPC eager call returns with state. 2020-06-19 15:48:45 -07:00
base_rendezvous_mgr.cc Fix cancellation race condition in BaseRendezvousMgr::RegisterCall 2020-06-19 13:04:53 -07:00
base_rendezvous_mgr.h Fix cancellation race condition in BaseRendezvousMgr::RegisterCall 2020-06-19 13:04:53 -07:00
BUILD Fix FunctionRun's TraceMe and apply the new TraceMe APIs. 2020-06-10 21:47:11 -07:00
call_options_test.cc
call_options.cc
call_options.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
cancellable_call.h
cluster_function_library_runtime_test.cc Pass in GrpcWorkerEnv when creating GrpcWorkerCache. 2020-06-04 11:46:09 -07:00
cluster_function_library_runtime.cc Merge EagerPFLR and PFLR. 2020-04-06 16:46:20 -07:00
cluster_function_library_runtime.h Merge EagerPFLR and PFLR. 2020-04-06 16:46:20 -07:00
collective_param_resolver_distributed_test.cc [ROCm] Unit-test updates for the ROCm platform. 2020-03-16 15:54:51 +00:00
collective_param_resolver_distributed.cc
collective_param_resolver_distributed.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
collective_rma_distributed_test.cc Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
collective_rma_distributed.cc LSC: Replace cord.ToString() with std::string(cord) 2020-06-05 11:45:40 -07:00
collective_rma_distributed.h
device_resolver_distributed_test.cc Make DeviceMgr a pure virtual interface with StaticDeviceMgr as its only implementation. 2019-08-30 22:55:53 -07:00
device_resolver_distributed.cc RefreshRemoteAttributes() is used to initialize device attributes between workers when we use CollectiveOp. We need it to send GetStatus RPC with fail_fast = false so that each worker can block waiting for other workers to start up. 2019-08-29 13:49:43 -07:00
device_resolver_distributed.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
graph_mgr.cc Change the prefix of xprof arguments from "$" to "_". 2020-06-12 13:44:24 -07:00
graph_mgr.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
local_master.cc
local_master.h
master_env.h
master_interface.h
master_session.cc Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
master_session.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
master_test.cc Fix the call to NewHostPortGrpcChannel in distributed_runtime/master_test 2019-09-23 15:03:29 -07:00
master.cc Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
master.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
message_wrappers_test.cc
message_wrappers.cc Support TensorProtos as Operation inputs, in order to support remote inputs passed as Tensors to EagerClusterFunctionLibraryRuntime::Run. 2020-03-09 17:10:38 -07:00
message_wrappers.h Support TensorProtos as Operation inputs, in order to support remote inputs passed as Tensors to EagerClusterFunctionLibraryRuntime::Run. 2020-03-09 17:10:38 -07:00
partial_run_mgr_test.cc
partial_run_mgr.cc
partial_run_mgr.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
README.md
recent_request_ids_test.cc
recent_request_ids.cc
recent_request_ids.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
remote_device_test.cc Pass in GrpcWorkerEnv when creating GrpcWorkerCache. 2020-06-04 11:46:09 -07:00
remote_device.cc Update criteria for TPU job/worker experiments 2020-05-05 08:08:48 -07:00
remote_device.h
rendezvous_mgr_interface.h [Cleanup] Remove unused method RendezvousMgrInterface::CleanupAll(). 2020-04-13 11:26:06 -07:00
request_id_test.cc
request_id.cc
request_id.h
rpc_collective_executor_mgr_test.cc Make DeviceMgr a pure virtual interface with StaticDeviceMgr as its only implementation. 2019-08-30 22:55:53 -07:00
rpc_collective_executor_mgr.cc
rpc_collective_executor_mgr.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
rpcbench_test.cc
scheduler.cc
scheduler.h
server_lib_test.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
server_lib.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
server_lib.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
session_mgr_test.cc Automated rollback of commit 8c521f81b1 2019-09-20 10:33:51 -07:00
session_mgr.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
session_mgr.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
tensor_coding_test.cc
tensor_coding.cc
tensor_coding.h
test_utils.h Make sure the rendezvous abort check is finished before triggering the callback. 2020-05-26 09:31:52 -07:00
worker_cache_logger.cc
worker_cache_logger.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
worker_cache_partial.cc
worker_cache_partial.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
worker_cache_wrapper.h
worker_cache.h
worker_env.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker_interface.h minor spelling tweaks 2020-02-27 15:42:16 +09:00
worker_session.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker_session.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker.h Automated rollback of commit 8c521f81b1 2019-09-20 10:33:51 -07:00

Distributed TensorFlow

This directory contains the initial open-source implementation of the distributed TensorFlow runtime, using gRPC for inter-process communication.

To learn how to use the distributed runtime to create a TensorFlow cluster, see the Distributed TensorFlow How-To.
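
As a rough illustration of what creating a cluster involves, the sketch below describes a small cluster and starts one task from Python. It is a minimal sketch, not the How-To's own example: the job name `worker`, the `localhost` ports, and the helper names `grpc_target` and `start_server` are assumptions made for illustration. `tf.train.ClusterSpec` and `tf.distribute.Server` are the public Python entry points; the code in this directory sits beneath them.

```python
# Hypothetical two-task cluster on a single machine.
# The job name and ports are illustrative, not taken from this repository.
CLUSTER = {"worker": ["localhost:2222", "localhost:2223"]}


def grpc_target(cluster, job_name, task_index):
    """Return the gRPC session target of one task, e.g. 'grpc://localhost:2222'."""
    return "grpc://" + cluster[job_name][task_index]


def start_server(cluster, job_name, task_index):
    """Start an in-process server for one task (requires TensorFlow to be installed)."""
    # Imported lazily so that grpc_target() can be used without TensorFlow.
    import tensorflow as tf

    spec = tf.train.ClusterSpec(cluster)
    return tf.distribute.Server(spec, job_name=job_name, task_index=task_index)


if __name__ == "__main__":
    # Print the address that clients would use to reach task 0 of the job.
    print(grpc_target(CLUSTER, "worker", 0))
```

Each task in the cluster would call `start_server` with its own `task_index`, typically in a separate process, and then block on the returned server's `join()` method to keep serving requests.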