STT-tensorflow/tensorflow/core/distributed_runtime
Xiao Yu fe6e64b098 Refactor eager placement logic into three util methods:
- MaybePinSmallOpsToCpu
- MaybePinToResourceDevice
- MaybePinToCustomDevice

We are going to reuse MaybePinSmallOpsToCpu in TFRT but not the other two, because TFRT has neither a native Resource nor a Custom Device.

PiperOrigin-RevId: 317766813
Change-Id: I43241b5786120ddf39dc4bfff6071239afdfd785
2020-06-22 18:05:46 -07:00
eager Refactor eager placement logic into three util methods: 2020-06-22 18:05:46 -07:00
rpc Clear cancel callback when gRPC eager call returns with state. 2020-06-19 15:48:45 -07:00
base_rendezvous_mgr.cc Fix cancellation race condition in BaseRendezvousMgr::RegisterCall 2020-06-19 13:04:53 -07:00
base_rendezvous_mgr.h Fix cancellation race condition in BaseRendezvousMgr::RegisterCall 2020-06-19 13:04:53 -07:00
BUILD Fix FunctionRun's TraceMe and apply the new TraceMe APIs. 2020-06-10 21:47:11 -07:00
call_options_test.cc
call_options.cc A series of changes to significantly reduce the number of allocations 2016-06-27 13:32:57 -07:00
call_options.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
cancellable_call.h
cluster_function_library_runtime_test.cc Pass in GrpcWorkerEnv when creating GrpcWorkerCache. 2020-06-04 11:46:09 -07:00
cluster_function_library_runtime.cc Merge EagerPFLR and PFLR. 2020-04-06 16:46:20 -07:00
cluster_function_library_runtime.h Merge EagerPFLR and PFLR. 2020-04-06 16:46:20 -07:00
collective_param_resolver_distributed_test.cc [ROCm] Unit-test updates for the ROCm platform. 2020-03-16 15:54:51 +00:00
collective_param_resolver_distributed.cc
collective_param_resolver_distributed.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
collective_rma_distributed_test.cc Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
collective_rma_distributed.cc LSC: Replace cord.ToString() with std::string(cord) 2020-06-05 11:45:40 -07:00
collective_rma_distributed.h Share ownership of UnboundedWorkQueue between collective executor and 2019-08-05 15:31:03 -07:00
device_resolver_distributed_test.cc Make DeviceMgr a pure virtual interface with StaticDeviceMgr as its only implementation. 2019-08-30 22:55:53 -07:00
device_resolver_distributed.cc RefreshRemoteAttributes() is used to initialize device attributes between workers when we use CollectiveOp. We need it to send GetStatus RPC with fail_fast = false so that each worker can block waiting for other workers to start up. 2019-08-29 13:49:43 -07:00
device_resolver_distributed.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
graph_mgr.cc Change the prefix of xprof arguments from "$" to "_". 2020-06-12 13:44:24 -07:00
graph_mgr.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
local_master.cc
local_master.h Address compiler warnings in tensorflow/core/distributed_runtime. 2018-06-05 08:23:35 -07:00
master_env.h fix C++ header guards. 2018-08-21 16:22:05 -07:00
master_interface.h Implement duplicate checking on Master methods 2019-02-07 12:59:47 -08:00
master_session.cc Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
master_session.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
master_test.cc Fix the call to NewHostPortGrpcChannel in distributed_runtime/master_test 2019-09-23 15:03:29 -07:00
master.cc Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
master.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
message_wrappers_test.cc
message_wrappers.cc Support TensorProtos as Operation inputs, in order to support remote inputs passed as Tensors to EagerClusterFunctionLibraryRuntime::Run. 2020-03-09 17:10:38 -07:00
message_wrappers.h Support TensorProtos as Operation inputs, in order to support remote inputs passed as Tensors to EagerClusterFunctionLibraryRuntime::Run. 2020-03-09 17:10:38 -07:00
partial_run_mgr_test.cc Replace calls to deprecated googletest macros *TEST_CASE() with *TEST_SUITE() 2019-01-16 11:08:17 -08:00
partial_run_mgr.cc
partial_run_mgr.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
README.md Fix how-to reference in distributed runtime README (#9772) 2017-05-12 06:35:31 -07:00
recent_request_ids_test.cc
recent_request_ids.cc
recent_request_ids.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
remote_device_test.cc Pass in GrpcWorkerEnv when creating GrpcWorkerCache. 2020-06-04 11:46:09 -07:00
remote_device.cc Update criteria for TPU job/worker experiments 2020-05-05 08:08:48 -07:00
remote_device.h The remote device manager in WorkerSession contains only RemoteDevice instance which has device->IsLocal() == false even if the device is on the local host. This patch ensures that device->IsLocal() should return true if and only if this device is on the local host. 2019-08-20 13:50:02 -07:00
rendezvous_mgr_interface.h [Cleanup] Remove unused method RendezvousMgrInterface::CleanupAll(). 2020-04-13 11:26:06 -07:00
request_id_test.cc
request_id.cc
request_id.h Remove THIRD_PARTY_ from #include guards 2018-01-24 14:31:28 -08:00
rpc_collective_executor_mgr_test.cc Make DeviceMgr a pure virtual interface with StaticDeviceMgr as its only implementation. 2019-08-30 22:55:53 -07:00
rpc_collective_executor_mgr.cc Share ownership of UnboundedWorkQueue between collective executor and 2019-08-05 15:31:03 -07:00
rpc_collective_executor_mgr.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
rpcbench_test.cc Cleanup: Ran clang-format on files in tensorflow/core/.../*.{cc,h}. 2018-01-30 12:27:47 -08:00
scheduler.cc
scheduler.h Added const to Node* in various parts of the code base. 2018-02-27 14:33:33 -08:00
server_lib_test.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
server_lib.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
server_lib.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
session_mgr_test.cc Automated rollback of commit 8c521f81b1 2019-09-20 10:33:51 -07:00
session_mgr.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
session_mgr.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
tensor_coding_test.cc Update core/distributed_runtime to use tstring. 2019-08-22 21:31:18 -07:00
tensor_coding.cc
tensor_coding.h
test_utils.h Make sure the rendezvous abort check is finished before triggering the callback. 2020-05-26 09:31:52 -07:00
worker_cache_logger.cc Improve timeline logging for distributed execution. 2018-11-06 14:35:59 -08:00
worker_cache_logger.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
worker_cache_partial.cc
worker_cache_partial.h Prefixing TensorFlow thread annotation macros with TF_. 2020-03-05 08:42:01 -08:00
worker_cache_wrapper.h Fix compiler warnings in worker_cache_wrapper.h. 2019-07-31 09:15:51 -07:00
worker_cache.h Instead of creating EagerClientCache by a separate factory method, this change add GetEagerClientCache in WorkerCacheInterface to allow it create EagerClientCache. With this change, we don't need to keep channel_cache in grpc_server_lib anymore since all instance that needs channel_cache will be created by WorkerCacheInterface. 2019-06-24 16:53:44 -07:00
worker_env.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker_interface.h minor spelling tweaks 2020-02-27 15:42:16 +09:00
worker_session.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker_session.h When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker.cc When calling connect_to_cluser, if the options are identical and there is no renaming of local device, reuse existing local DeviceManager, otherwise we keep the old DeviceManager around to allow the old Tensor created to be usable. 2020-05-20 08:53:52 -07:00
worker.h Automated rollback of commit 8c521f81b1 2019-09-20 10:33:51 -07:00

Distributed TensorFlow

This directory contains the initial open-source implementation of the distributed TensorFlow runtime, using gRPC for inter-process communication.

To learn how to use the distributed runtime to create a TensorFlow cluster, see the Distributed TensorFlow How-To.
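
As a rough illustration of what "creating a cluster" involves, the sketch below describes a cluster as a mapping from job names to `host:port` task addresses and builds the `TF_CONFIG`-style JSON that each task would use to identify itself. The addresses and the helper names `make_tf_config` and `task_address` are hypothetical and are not part of this directory; the code uses only the Python standard library, so it does not start any gRPC server.

```python
import json

# Hypothetical cluster: one parameter server and two workers on localhost.
CLUSTER = {
    "ps": ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"],
}


def task_address(cluster, job_name, task_index):
    """Return the host:port string for one task, validating the lookup."""
    if job_name not in cluster:
        raise ValueError("unknown job: %r" % job_name)
    tasks = cluster[job_name]
    if not 0 <= task_index < len(tasks):
        raise ValueError("task index %d out of range for job %r"
                         % (task_index, job_name))
    return tasks[task_index]


def make_tf_config(cluster, job_name, task_index):
    """Build a TF_CONFIG-style JSON string for a single task."""
    # Validate the (job, index) pair before serializing.
    task_address(cluster, job_name, task_index)
    config = {
        "cluster": cluster,
        "task": {"type": job_name, "index": task_index},
    }
    return json.dumps(config, sort_keys=True)


if __name__ == "__main__":
    # Each process in the cluster would get its own job name and index.
    print(task_address(CLUSTER, "worker", 1))
    print(make_tf_config(CLUSTER, "worker", 1))
```

With TensorFlow installed, a dictionary of the same shape can be passed to `tf.train.ClusterSpec`, and each process can then start its in-process gRPC server with `tf.distribute.Server` (or `tf.train.Server` in TensorFlow 1.x), giving its own `job_name` and `task_index`.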