Commit Graph

63 Commits

Author SHA1 Message Date
A. Unique TensorFlower
e647a3b425 Add experimental C API to access EagerContext context ID.
PiperOrigin-RevId: 317476439
Change-Id: I9e97bce61cf526695f0c903b5f4f837116fef455
2020-06-20 11:32:20 -07:00
Gaurav Jain
d5b3ec27d1 Allow dynamically configuring device placement
Enable setting soft device placement as well as logging dynamically.
This required ensuring the device placement policy was part of the cache
key.

Further, we fix the logging to ensure in eager mode if a kernel is
retrieved from the kernel cache, then the execution is still logged. We
also log closer to the actual op execution to avoid logging before all
checks have been done.

PiperOrigin-RevId: 311271808
Change-Id: I9765228894f84a3447cc03332a2559f6d933165b
2020-05-12 23:17:39 -07:00
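The cache-key requirement in d5b3ec27d1 can be illustrated with a toy kernel cache. This is a minimal sketch, not TensorFlow's implementation: `ToyKernelCache`, `get_or_create`, the key layout, and the assumption that the GPU is unavailable are all hypothetical. It shows why a policy that can change at runtime has to be part of the key: otherwise a kernel built under the old policy would be reused after the policy changes.

```python
from typing import Dict, Tuple

# Hypothetical cache key: (op name, requested device, soft placement enabled).
CacheKey = Tuple[str, str, bool]


class ToyKernelCache:
    """Toy kernel cache for illustration; not TensorFlow's real data structure."""

    def __init__(self) -> None:
        self._kernels: Dict[CacheKey, str] = {}
        self.builds = 0  # number of cache misses

    def get_or_create(self, op: str, device: str, soft_placement: bool) -> str:
        key = (op, device, soft_placement)
        if key not in self._kernels:
            self.builds += 1
            # Assumption for this sketch: no GPU is available, so soft
            # placement falls back to the CPU and strict placement does not.
            if soft_placement and device.startswith("GPU"):
                placed = "CPU:0"
            else:
                placed = device
            self._kernels[key] = f"{op}@{placed}"
        return self._kernels[key]


cache = ToyKernelCache()
strict = cache.get_or_create("MatMul", "GPU:0", soft_placement=False)
soft = cache.get_or_create("MatMul", "GPU:0", soft_placement=True)
again = cache.get_or_create("MatMul", "GPU:0", soft_placement=True)
```

If `soft_placement` were left out of the key, the second lookup would return the kernel built under the strict policy.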
Yujing Zhang
7e6ea21148 Support running a function with packed input handles through C APIs.
Introduce a C API TFE_CreatePackedTensorHandle which creates a TFE_TensorHandle referring to multiple TFE_TensorHandles.

PiperOrigin-RevId: 310610230
Change-Id: Icc0ffd5c58ad7780eca38d552c1a2f4617f04891
2020-05-08 12:53:55 -07:00
Allen Lavoie
6e3bea20a1 Less pointer indirection for TFE_OpAttrs, add TFE_OpGetAttrs
We'll want this for implementing copy for `TF_AbstractOp`s backed by `TFE_Op`s (since we want to copy the type/attributes but not the inputs).

PiperOrigin-RevId: 309756974
Change-Id: I07a8c48f50ab6d3c8a7d7db972fb60202b86434d
2020-05-04 09:24:03 -07:00
Allen Lavoie
e0606af65f Small cleanups for experimental TFE attribute APIs
The op name was included twice, and TFE_OpGetAttrs is unusable without a way to allocate a TFE_OpAttrs on the heap (and so has no callers). I'm removing it for now.

PiperOrigin-RevId: 308859222
Change-Id: Ibb3901a1821ffc2e9ebc0efb26592e5b3d8bb88f
2020-04-28 11:20:43 -07:00
Allen Lavoie
d8c89c1cd7 Fix API exports for the experimental TFE_RegisterCustomDevice
Extern, plus it was missing TF_CAPI_EXPORT which is probably the main reason it wasn't in the Windows DLL

PiperOrigin-RevId: 305795200
Change-Id: I7ab3d847f3f60f71588f19bfa962a861d02bba44
2020-04-09 17:36:59 -07:00
Mihai Maruseac
3f70de8266 Automated rollback of 17b7bc01bc
PiperOrigin-RevId: 305779485
Change-Id: Ifa9eda9d594916b0e9ddb57d52cb69cb534ae56a
2020-04-09 16:07:38 -07:00
Allen Lavoie
17b7bc01bc Add a way to register custom devices with the Python TFE_Context
The API accepts TFE_RegisterCustomDevice arguments as PyCapsules, so each custom device will need some method to create those. Presumably most custom devices will end up wrapping the PyCapsule creation+registration rather than exposing it to the user.

No public API yet, but this is roughly what I have in mind at the moment.

This only works with --config=monolithic or when the custom device registration is bundled with pywrap_tensorflow.so right now since that has its own copy of the C API. Something like this could work if we switched pywrap_tensorflow.so to instead rely on libtensorflow.so for the C API, then custom device extensions could link against that.

PiperOrigin-RevId: 305762978
Change-Id: I4d2d9bd9c01ba22391e138244a3948bae8963c5c
2020-04-09 14:42:08 -07:00
Gaurav Jain
9b576164f1 Add Tensor & TensorHandle C APIs taking a Context
The existing TF_AllocateTensor & TFE_NewTensorHandle APIs do not take a
TFE_Context which is undesirable as the TFE_Context indicates ownership
of the tensor. Thus we add new APIs to supersede the existing ones.

PiperOrigin-RevId: 305126310
Change-Id: I9863ebc692d48875c61b79197ab418f29503a8c6
2020-04-06 15:10:09 -07:00
Gaurav Jain
907a55ebad Remove implicit mirroring toggle
Implicit mirroring is set to true by default already and is essential
for eager performance. This CL just removes dead code since there is no
API to disable mirroring for tensors.

We also shouldn't have this in the TensorHandleInterface class since
mirroring is a runtime-specific implementation detail.

PiperOrigin-RevId: 304421014
Change-Id: I383fa24da08a86028cabb3a4b1c5f2612d57336d
2020-04-02 09:58:19 -07:00
Gaurav Jain
857f0c9557 Add option to enable tfrt eager context
PiperOrigin-RevId: 303254195
Change-Id: Ibee9c3a9cb4f0abf2e1738ed09c7a9ec326b5b64
2020-03-26 21:12:29 -07:00
Allen Lavoie
1e59f1e54d Custom devices: devices take a TFE_Context explicitly
This will be useful for switching between graph building and eager execution (although that may need a different context type), but also gives us the option to pass a custom device representation into language bindings without requiring them to expose their TFE_Context directly (they still expose it to the custom device when executing operations).

PiperOrigin-RevId: 300630552
Change-Id: I41083c63db1b137af60f932114f1fcaae8ac2eb0
2020-03-12 15:15:03 -07:00
Allen Lavoie
28b4039a1a Custom devices: add a TF_Status return, disallow duplicate registrations
PiperOrigin-RevId: 300159589
Change-Id: I4e8cfcdc54999c04a41c7351f7b016a85234ac0d
2020-03-10 13:03:02 -07:00
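The behaviour named in 28b4039a1a (report a status, reject duplicate registrations) can be sketched as follows. Everything here is hypothetical: `ToyDeviceRegistry`, the `Code` enum, and the choice of an "already exists" code are illustrative assumptions, not the actual `TFE_RegisterCustomDevice` signature or the error code it reports.

```python
from enum import Enum
from typing import Dict, Tuple


class Code(Enum):
    OK = 0
    ALREADY_EXISTS = 1  # assumed code for a duplicate registration


class ToyDeviceRegistry:
    """Toy registry that returns a (code, message) status instead of raising."""

    def __init__(self) -> None:
        self._devices: Dict[str, object] = {}

    def register(self, name: str, device: object) -> Tuple[Code, str]:
        if name in self._devices:
            return Code.ALREADY_EXISTS, f"device {name!r} is already registered"
        self._devices[name] = device
        return Code.OK, ""


registry = ToyDeviceRegistry()
name = "/job:localhost/replica:0/task:0/device:CUSTOM:0"
first = registry.register(name, object())
second = registry.register(name, object())
```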
Haoyu Zhang
957792181b Introduce async_wait and async_clear_error primitives.
Add tests to demonstrate the usage of the primitives in handling exceptions thrown in remote async execution.

PiperOrigin-RevId: 297041596
Change-Id: Ibc9ffa7c5eaaa9b62c6849e815c0c933ff0ec86c
2020-02-24 22:07:20 -08:00
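A rough picture of how the two primitives in 957792181b might be used together is sketched below. `ToyAsyncExecutor` is a hypothetical stand-in, and the assumed semantics (the first error is remembered until it is cleared explicitly) are an illustration rather than a description of the real executor.

```python
from typing import Callable, List, Optional


class ToyAsyncExecutor:
    """Queues work and remembers the first error until it is cleared."""

    def __init__(self) -> None:
        self._pending: List[Callable[[], None]] = []
        self._error: Optional[Exception] = None

    def enqueue(self, fn: Callable[[], None]) -> None:
        if self._error is not None:
            raise self._error  # refuse new work while in an error state
        self._pending.append(fn)

    def async_wait(self) -> None:
        """Run everything queued so far; surface the first failure."""
        while self._pending:
            fn = self._pending.pop(0)
            try:
                fn()
            except Exception as exc:
                self._error = exc
                self._pending.clear()  # drop work queued behind the failure
        if self._error is not None:
            raise self._error

    def async_clear_error(self) -> None:
        self._error = None


def failing_op() -> None:
    raise ValueError("remote op failed")


executor = ToyAsyncExecutor()
executor.enqueue(failing_op)
try:
    executor.async_wait()
    caught = None
except ValueError as err:
    caught = str(err)
    executor.async_clear_error()  # make the executor usable again

results: List[str] = []
executor.enqueue(lambda: results.append("ok"))
executor.async_wait()
```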
Allen Lavoie
ed52a7c6f5 Add an eager C API for deserializing generic op attributes, TFE_OpSetAttrValueProto (like TF_SetAttrValueProto for graph building)
Also adds an experimental eager C API for serializing op attributes as generic name->value mappings

It's a bit sad that being generic requires serializing here, but I don't see a great way around it if the attributes will be used generically (e.g. to build a FunctionDef). We can add special cases that don't require serialization for fetching attributes when the type is known.

PiperOrigin-RevId: 297003316
Change-Id: Id6e65bc7a8178fbbb8a85a542bd31def08225fe6
2020-02-24 17:00:11 -08:00
Gaurav Jain
6bf2895298 Reduce overhead of protecting tensors for eager
The eager executor tried to prevent forwarding of any input tensors by
incrementing the reference count of any "non-consumed" inputs. This
involved highly delicate logic which first signaled "non-consumed"
inputs as those with a reference count greater than 1 (1 from python and
another from the EagerOperation class), which require "protecting" by
incrementing the reference count of the underlying tensor buffer. This
logic is heavyweight for the common case of synchronous execution. We
thus simplify the logic by having all TensorHandle Tensors protected at
construction and "unprotect" them if the reference count is 1.

- Hold 2 reference counts on a TensorHandle's backing Tensor. This protects
  the Tensor from being forwarded.
- Add the ability to unprotect a TensorHandle's backing Tensor when the
  reference count is 1.
- Split ExecuteNode into Async implementation. The sync ExecuteNode
  class can avoid various copies such as the list of inputs and the
  forwarding map.
- Remove the experimental TFE_OpConsumeInput API. Input forwarding can
  be achieved by releasing the handle after calling TFE_OpAddInput as
  demonstrated by the added tests.
- Fix TF_AllocateTensor to return a forwardable tensor; forwarding was
  previously disabled due to re-using the logic in TF_NewTensor.
- Save mirror tensor when calling TFE_TensorHandleResolve.

PiperOrigin-RevId: 296225251
Change-Id: I484cfccbef8b44e82757b8bda0981cd7fd2f8096
2020-02-20 09:19:23 -08:00
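The protect-at-construction scheme from 6bf2895298 can be pictured with a small model. This is a sketch under stated assumptions: `ToyBuffer`, `ToyProtectedHandle`, and `can_forward` are hypothetical names, and "forwardable" is modelled simply as "the buffer has exactly one reference".

```python
class ToyBuffer:
    """Stand-in for a tensor buffer with an explicit reference count."""

    def __init__(self, data):
        self.data = data
        self.refcount = 0


def can_forward(buffer: ToyBuffer) -> bool:
    # Assumption: a buffer may be reused in place only when a single
    # reference to it exists.
    return buffer.refcount == 1


class ToyProtectedHandle:
    """Holds two references to its buffer, so it starts out protected."""

    def __init__(self, data):
        self.buffer = ToyBuffer(data)
        self.buffer.refcount = 2  # protected at construction
        self.handle_refs = 1      # references to the handle itself

    def unprotect_if_sole_owner(self) -> bool:
        """Drop the extra buffer reference when nobody else holds the handle."""
        if self.handle_refs == 1 and self.buffer.refcount == 2:
            self.buffer.refcount = 1
            return True
        return False


shared = ToyProtectedHandle([1, 2, 3])
shared.handle_refs = 2  # e.g. still referenced by the caller
shared_unprotected = shared.unprotect_if_sole_owner()

released = ToyProtectedHandle([4, 5, 6])  # caller released its reference
released_unprotected = released.unprotect_if_sole_owner()
```

The common case pays nothing extra: a handle that is still shared stays protected without any per-execution bookkeeping.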
Allen Lavoie
1f5bc8a979 Add an experimental eager C API for generically fetching and setting op attributes.
Right now you can only fetch the whole attribute map and set it wholesale, but we can add more fine-grained attribute control in the future.

This allows the custom device API to pass in attributes, and custom devices to forward these to their own TFE_Execute calls. This is required for creating variables.

PiperOrigin-RevId: 296096192
Change-Id: I98c23bdcd13e479235b3e27850b1bb0bd7a53bba
2020-02-19 17:56:18 -08:00
Akshay Modi
fa5cdeae7e Add a functiondef getter to the context
PiperOrigin-RevId: 296002833
Change-Id: I238a2984a9320c084b7157e6eeb30b30aa132036
2020-02-19 10:48:38 -08:00
Jose Baiocchi
767e4d5dab Move profiler API implementation to _pywrap_profiler
PiperOrigin-RevId: 295240754
Change-Id: I3664efc053696a3c521d18527c04747688cac932
2020-02-14 15:50:39 -08:00
Gaurav Jain
b41b89dc75 Add local mirroring support to tensor handles
We allow a TensorHandle to reference multiple tensors on the local host.
This allows us to essentially cache any implicit copies that occur
before executing an op. This helps avoid repeated copies if a tensor is
constantly fed to an op on a different device.

Additional clean-ups:
- Move CustomDevice TensorHandle constructor to separate constructor
- If the TensorHandle is on the host CPU device, ensure that device_ is
  set to nullptr.
- Clean up CAPI test to use ASSERT_EQ instead of ASSERT_TRUE

PiperOrigin-RevId: 294180977
Change-Id: I26892e9058973eebac557fc529b46de793418e12
2020-02-10 02:47:37 -08:00
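The caching effect of local mirrors in b41b89dc75 can be sketched as a handle that remembers one copy per device. `ToyMirroredHandle` and `tensor_on` are hypothetical names, and a Python list copy stands in for a device-to-device transfer.

```python
from typing import Dict, List


class ToyMirroredHandle:
    """Keeps one tensor per device so repeated implicit copies are avoided."""

    def __init__(self, value: List[float], device: str) -> None:
        self.device = device
        self._tensors: Dict[str, List[float]] = {device: value}
        self.copies = 0  # number of simulated cross-device copies

    def tensor_on(self, device: str) -> List[float]:
        if device not in self._tensors:
            # First use on this device: copy once and remember the mirror.
            self.copies += 1
            self._tensors[device] = list(self._tensors[self.device])
        return self._tensors[device]


handle = ToyMirroredHandle([1.0, 2.0], device="CPU:0")
first_use = handle.tensor_on("GPU:0")
second_use = handle.tensor_on("GPU:0")  # served from the mirror, no new copy
```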
Jose Baiocchi
21bb9be2c1 Decouple ProfilerSession wrapper from pywrap_tfe
PiperOrigin-RevId: 293882403
Change-Id: I947e32807447460b6fc7ca1b19bf9ca276c3e994
2020-02-07 13:33:01 -08:00
Haoyu Zhang
bdba822d97 Adding barrier message to clear remote executors in order to support catching OutOfRangeErrors.
PiperOrigin-RevId: 293716720
Change-Id: I0768c99baf080f817e0985e188ffe330b3e15dcc
2020-02-06 17:44:52 -08:00
Allen Lavoie
a4064a389e Experimental API for custom devices in TFE.
Custom devices are an experimental hook into eager op execution, allowing experimentation outside the TensorFlow codebase. These devices do not work in traced code at the moment.

PiperOrigin-RevId: 293615055
Change-Id: I031da213e964caa7d4e11e0f491a3985d034b175
2020-02-06 10:00:55 -08:00
Maher Jendoubi
215dab52c6 Contributing: fix typos
2020-01-26 13:47:00 +01:00
Dong Lin
7bfb8380a7 Place all py_func op on the local host's address space if eager execution is enabled.
PiperOrigin-RevId: 290993424
Change-Id: I0c33cdf781fa4b3c401ea5e8649f606137e42862
2020-01-22 11:28:15 -08:00
Gaurav Jain
77aaa1ef2d Move functionality from TFE_Op to EagerOperation
A lot of functionality in TFE_Op was simply a pass-through to
EagerOperation. We instead want the TFE_Op to be a simple struct and
have the functionality defined in the operation member.

The following changes were made:

- Remove a pointer to the TFE_Context in TFE_Op as the context is stored
  in EagerOperation.
- Modify the constructor of EagerOperation to only take an EagerContext
  pointer and require the caller to call Reset. This allows callers to
  handle any errors from construction.
- We expect the context to not be null. We enforce this with references
  and clean up the code to ensure that an eager context is never reset
  with a different context. As a result the `ctx` parameter has been
  removed from TFE_OpReset.
- Move OpInferenceContext into EagerOperation

PiperOrigin-RevId: 290386452
Change-Id: I3ffb62b01dce230ddc555d84d6ae39fd4ec90b2f
2020-01-17 20:12:26 -08:00
A. Unique TensorFlower
f80f6c6056 Place all py_func op on the local host's address space.
PiperOrigin-RevId: 290008258
Change-Id: If68f84ed37f83ed0aac0689df70e8df69a2d256f
2020-01-15 23:35:10 -08:00
Dong Lin
24ceca6744 Place all py_func op on the local host's address space.
PiperOrigin-RevId: 290005443
Change-Id: I7294676d17d6e2f37fc939bd9d685d71aad8feeb
2020-01-15 23:00:23 -08:00
A. Unique TensorFlower
f18ffa8204 Place all py_func op on the local host's address space.
PiperOrigin-RevId: 289903686
Change-Id: I38f3b8020cea5b3eab1e5d9141c32350473dadfa
2020-01-15 11:44:57 -08:00
Dong Lin
30936d89ac Place all py_func op on the local host's address space.
PiperOrigin-RevId: 289883431
Change-Id: I5990df1fa6825729dcd843e708574451bc16111d
2020-01-15 10:15:16 -08:00
Amit Patankar
50fae6026e Decouple ProfilerSession wrapper from pywrap_tfe
PiperOrigin-RevId: 286641101
Change-Id: Ic5046b977d1b42ed6a1e9038e3b6ec40a0a82e2f
2019-12-20 14:37:32 -08:00
Jose Baiocchi
df6698712a Decouple ProfilerSession wrapper from pywrap_tfe
PiperOrigin-RevId: 286609127
Change-Id: Ic70e27ad3820a2f1399dc414ded40bef811e5653
2019-12-20 11:15:12 -08:00
Alexandre Passos
3f8a370b5a Allow accessing the GPU device memory from the TF C API.
Addresses ; might help with .

PiperOrigin-RevId: 284921061
Change-Id: I2b31e474bf961f731f67a85aad39bfeaefd3998a
2019-12-10 22:47:39 -08:00
A. Unique TensorFlower
495e179730 [Perf] Skip EagerOperation::SetDeviceName(...) call if input device name didn't change.
PiperOrigin-RevId: 284700133
Change-Id: I7716abe6968b0686df00ea15dec3d85bf16e8cf5
2019-12-09 22:04:13 -08:00
Yuefeng Zhou
33a2ba1b47 Add a check_alive to context to check whether a remote worker is alive.
PiperOrigin-RevId: 280080043
Change-Id: Id152c198ebf20256fc14b2ea1e16b8c5db71844c
2019-11-12 16:22:31 -08:00
Yujing Zhang
206d6af149 Add lazy_remote_inputs_copy to TFE_ContextOptions to control lazy remote tensor copy. Disable it by default.
PiperOrigin-RevId: 279212487
Change-Id: Ie46de71fd2902b79281e6257ff28c06d9aaa73d4
2019-11-07 18:35:44 -08:00
Haoyu Zhang
7349bf5e09 Support adding / removing servers when executing distributed ops and functions.
Introduce `update_server_def`, which supports running remote ops and functions with dynamic cluster membership. The client will register new contexts on the newly added workers, remove old contexts from the removed servers, and rebuild the connections between workers for proper communication.

PiperOrigin-RevId: 271234187
2019-09-25 19:46:26 -07:00
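The bookkeeping described for `update_server_def` in 7349bf5e09 (new contexts for added workers, old contexts dropped for removed ones) starts from a membership diff. The helper below is a hypothetical illustration of that diff only. `diff_workers` and the addresses are made up, and no real cluster API is involved.

```python
from typing import Iterable, List, Tuple


def diff_workers(
    old: Iterable[str], new: Iterable[str]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, kept) worker addresses, each sorted."""
    old_set, new_set = set(old), set(new)
    added = sorted(new_set - old_set)    # would need new contexts
    removed = sorted(old_set - new_set)  # old contexts would be dropped
    kept = sorted(old_set & new_set)     # connections would be rebuilt
    return added, removed, kept


added, removed, kept = diff_workers(
    old=["worker0:2222", "worker1:2222"],
    new=["worker1:2222", "worker2:2222"],
)
```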
Yujing Zhang
b1efc03535 Reuse the same tensorflow::EagerOperation object across multiple
ops in same thread by adding a Reset method

PiperOrigin-RevId: 268942974
2019-09-13 11:46:47 -07:00
Xiao Yu
4411b77626 Executor api clean up:
1. Remove async_wait() and async_clear_error() in EagerContext.
2. Allow getting current executor from EagerContext.
3. Remove StartAsync() method in EagerExecutor.

PiperOrigin-RevId: 262445965
2019-08-08 16:34:52 -07:00
Derek Murray
086bf1c5d1 Change name of TFE_NewExecutor() argument to is_async.
This fixes a breakage on Python 3.7+, where the SWIG wrapper uses the reserved keyword `async` as a parameter name. This was recently fixed in https://github.com/swig/swig/pull/1382.

PiperOrigin-RevId: 260301284
2019-07-27 08:36:27 -07:00
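The breakage behind 086bf1c5d1 is easy to reproduce in isolation: `async` became a reserved keyword in Python 3.7, so a generated wrapper that uses it as a parameter name no longer compiles. The function name `new_executor` below is a hypothetical stand-in for the generated wrapper.

```python
import keyword


def compiles(source: str) -> bool:
    """Return True if the source parses and compiles as a Python module."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


# On Python 3.7 and later, `async` is a reserved keyword.
async_is_reserved = keyword.iskeyword("async")

# A parameter named `async` is rejected; `is_async` is accepted.
old_wrapper_ok = compiles("def new_executor(async): pass")
new_wrapper_ok = compiles("def new_executor(is_async): pass")
```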
Xiao Yu
77cc4bcd61 Adds new Python APIs which allow specifying an eager executor for the current thread. This change also uses a new Executor to execute pyfunc, which can avoid pyfunc deadlock in async mode.
PiperOrigin-RevId: 260181580
2019-07-26 11:51:32 -07:00
A. Unique TensorFlower
4024cedbc1 Remove ProfilerContext (no longer used)
PiperOrigin-RevId: 259983256
2019-07-25 11:41:48 -07:00
Derek Murray
27fe47055e Add experimental implementation of cancelable eager function execution.
The experimental interface uses `cancellation.CancellationManager`:

```python
c_mgr = cancellation.CancellationManager()

@tf.function
def f(...):
  ...

cancelable_f = c_mgr.get_cancelable_function(f.get_concrete_function(...))

# Call a function that might run for a long time.
cancelable_f(...)

# Asynchronously:
c_mgr.start_cancel()
```

A subsequent change will add a publicly-accessible (probably experimental) API endpoint for `CancellationManager`.

PiperOrigin-RevId: 258648702
2019-07-17 15:11:54 -07:00
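The calling pattern in 27fe47055e can be mimicked without TensorFlow to show the control flow. This is a toy analogy only: `ToyCancellationManager` and `ToyCancelledError` are hypothetical, and unlike the real interface the wrapped function here receives an `is_cancelled` callback and polls it itself.

```python
import threading


class ToyCancelledError(Exception):
    """Raised when a cancelable call observes a cancellation request."""


class ToyCancellationManager:
    def __init__(self) -> None:
        self._cancelled = threading.Event()

    def start_cancel(self) -> None:
        self._cancelled.set()

    def is_cancelled(self) -> bool:
        return self._cancelled.is_set()

    def get_cancelable_function(self, fn):
        # The wrapper passes the manager's cancellation check to `fn`.
        def cancelable(*args, **kwargs):
            return fn(self.is_cancelled, *args, **kwargs)

        return cancelable


def count_steps(is_cancelled, steps: int) -> int:
    for step in range(steps):
        if is_cancelled():
            raise ToyCancelledError(f"cancelled before step {step}")
    return steps


# Without cancellation the call runs to completion.
completed = ToyCancellationManager().get_cancelable_function(count_steps)(3)

# Once start_cancel() has been called, the next check aborts the call.
c_mgr = ToyCancellationManager()
cancelable_f = c_mgr.get_cancelable_function(count_steps)
c_mgr.start_cancel()
try:
    cancelable_f(3)
    outcome = "finished"
except ToyCancelledError as err:
    outcome = str(err)
```

In real use `start_cancel()` would be called from another thread while the function is running; the sketch calls it up front so the result is deterministic.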
Gaurav Jain
1ea9f63103 Export TFE_ContextOptionsSetMirroringPolicy
Additionally
- Move functions to eager/c_api_experimental.cc
- Misc lint fixes

PiperOrigin-RevId: 256506413
2019-07-04 01:16:01 -07:00
Derek Murray
987046e078 Add a SWIG wrapper for the tensorflow::CancellationManager class.
This change is a step towards supporting user-driven cancellation for eager function calls. In a future change, I plan to add an experimental method for calling a `tf.function` and passing a `CancellationManager` argument, so that the caller can cancel execution asynchronously.

PiperOrigin-RevId: 256369003
2019-07-03 08:28:39 -07:00
Gaurav Jain
e75d8dc058 Add mirroring for remote tensor handles
When executing on a remote worker, we may have to copy the TensorHandle
for each executed op. To avoid duplicated work, we expand the
TensorHandle to keep track of mirrors which are tied to the lifetime of
the TensorHandle. If a mirror already exists on a remote worker, no
additional copy is needed.

The change consists of the following:
- Add map of remote mirrors in TensorHandle.
- Add `mirror` boolean argument to EagerCopyToDevice which indicates to try
  configuring a mirror if possible.
- Add Device argument to RemoteAddress to handle mirrors.
- Expose a ContextMirroringPolicy for the EagerContext. We plan to add
  additional policies in the future, such as local tensor mirroring.
- Rename ContextDevicePlacementPolicy variables to be consistent with
  ContextMirroringPolicy.

PiperOrigin-RevId: 253945140
2019-06-19 00:39:29 -07:00
Xiao Yu
6c5d79930c Fix an issue where start_profiler_server complains 'AssertionError: Context must be initialized first.'
PiperOrigin-RevId: 253093414
2019-06-13 13:31:32 -07:00
A. Unique TensorFlower
514004a234 Add a StartMonitoring Python API.
PiperOrigin-RevId: 253079116
2019-06-13 12:18:31 -07:00
A. Unique TensorFlower
80ad7b024a Move TF_Status to the last argument of StartTracing api
PiperOrigin-RevId: 251497647
2019-06-04 13:12:00 -07:00
A. Unique TensorFlower
74a9032d8c Add status to StartTracing API.
PiperOrigin-RevId: 251299535
2019-06-03 14:40:01 -07:00