Commit Graph

80 Commits

Author SHA1 Message Date
Peter Hawkins
e7f6b0c7ee [XLA] Don't pass on_host_shape to ShapedBuffer/ScopedShapedBuffer inside XLA.
PiperOrigin-RevId: 336133292
Change-Id: I47a6fa5a5f2c6a460bdaeb1acc5125ff20710230
2020-10-08 12:00:12 -07:00
Taré Gaskin
939db02ff5 xla directory resolutions
2020-07-26 22:14:33 +00:00
George Karpenkov
1a9b57d729 [XLA] Store host shape in ExecutionInput
Simplify the APIs explicitly passing the host shape

PiperOrigin-RevId: 321083080
Change-Id: I9e124dd4465ee4037f2d0cdbd33f04a43f35abc2
2020-07-13 20:03:34 -07:00
George Karpenkov
6116b7f911 [XLA] [client] Implement a RunAsync overload which does not need a vector of shapes
PiperOrigin-RevId: 317406952
Change-Id: I69d8cc8a68ffdfbf70e2969f5df5e6adba7d2e1d
2020-06-19 17:09:13 -07:00
George Karpenkov
62683d061c [XLA] Rollback of rollback of "Implement LocalClient::Run which supports buffer donation"
PiperOrigin-RevId: 317400695
Change-Id: I56f1f8df347d5a3b2bad9526c7315c63ad6ddadb
2020-06-19 16:33:47 -07:00
A. Unique TensorFlower
a82b75c82b [XLA/Client] Implement LocalClient::Run which supports buffer donation
PiperOrigin-RevId: 317195199
Change-Id: If4d35d0627fa068a0c2b522fdae52466abd21f51
2020-06-18 15:40:54 -07:00
George Karpenkov
ef52b4e088 [XLA/Client] Implement LocalClient::Run which supports buffer donation
PiperOrigin-RevId: 317173848
Change-Id: If92955ac5051376fbf0932b773b675459497c0c4
2020-06-18 14:03:29 -07:00
Peter Hawkins
a056967714 [XLA:Python] Move Compile() options into a CompileOptions struct.
When there are many options to a function, it's cleaner to use an options struct. This change is in preparation for adding another option.

Since ExecutableBuildOptions already has a DeviceAssignment field, we can use it rather than having our own. We can also sink the DeviceAssignment legality check into XLA proper, given it's now just testing that two parts of ExecutableBuildOptions agree.

PiperOrigin-RevId: 300407020
Change-Id: Idcd0acae6bb89eda34085ae2471b0eafd5cdef95
2020-03-11 14:31:35 -07:00
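The options-struct pattern this commit describes can be sketched in plain C++. All names below are illustrative stand-ins, not XLA's actual definitions:

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical sketch: instead of a long parameter list, Compile() takes a
// single struct whose fields all have sensible defaults.
struct DeviceAssignmentSketch {
  int replica_count = 1;
  int computation_count = 1;
};

struct CompileOptionsSketch {
  std::vector<std::string> argument_layouts;           // per-argument layouts
  std::optional<DeviceAssignmentSketch> device_assignment;  // where to run
  bool tuple_arguments = false;                        // pass args as one tuple
};

// Adding a new option later only touches the struct, not every call site.
inline std::string Compile(const std::string& computation,
                           const CompileOptionsSketch& options = {}) {
  return computation + (options.tuple_arguments ? "/tupled" : "");
}

// A caller that needs a non-default option sets just that field.
inline std::string CompileTupled(const std::string& computation) {
  CompileOptionsSketch options;
  options.tuple_arguments = true;
  return Compile(computation, options);
}
```

The payoff is exactly the one the commit message names: a new field can be added to the struct without changing any existing caller.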
Davide Libenzi
d1085a6e00 Make XLA's kUserAlias work together with XRT's swap/compaction.
The XRT tuple allocation owns the device memory, which, in order for the lower-level
aliasing to work, needs to be handed out as "owning" within the parameter's shape tree.
But if the parameter's shape tree gets destroyed (for an intermediate error before execute),
the memory will be released and the tuple allocation will be left pointing to freed memory.
This CL introduces an ExecutionInput data structure which wraps a maybe-owning shape tree
together with the indices which should be released before the shape tree gets destroyed.
This allows the data structure to travel down until the point where the buffers land
inside the ExecutionOutput, which uses similar logic (until finally the result gets consumed).
Unfortunately, the situation with the device memory data structures has gotten a bit messy:
Owning, MaybeOwning, ShapedBuffer, ScopedShapedBuffer, ... none of which works nicely
with buffer sharing.
Ideally we should have something like std::shared_ptr<OwningDeviceMemory> and
ShapeTree<std::shared_ptr<OwningDeviceMemory>> and be done with it.
Unfortunately, the change toward that goal (I started down that route first) is pretty major.

PiperOrigin-RevId: 298498866
Change-Id: I2e27c11b7187fa2992ae3b606ea95c18f312cb5a
2020-03-02 18:39:10 -08:00
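The "maybe-owning shape tree plus to-be-released indices" idea from the commit above can be sketched in isolation. This is an illustrative toy, not XLA's real ExecutionInput; "freeing" is modeled with a counter standing in for device-memory deallocation:

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Counter standing in for device-memory deallocation in this sketch.
inline int& FreeCount() { static int n = 0; return n; }

// The input records which buffer indices it still owns, and on destruction
// frees exactly those -- donated/aliased indices are left alone because the
// callee has taken ownership of them.
class ExecutionInputSketch {
 public:
  explicit ExecutionInputSketch(std::size_t n) : buffers_(n) {}
  ~ExecutionInputSketch() {
    for (std::size_t i = 0; i < buffers_.size(); ++i)
      if (!unowned_.count(i)) ++FreeCount();  // free only still-owned slots
  }
  // Mark an index as donated: ownership has moved to the callee.
  void SetUnownedIndex(std::size_t i) { unowned_.insert(i); }

 private:
  std::vector<int> buffers_;
  std::set<std::size_t> unowned_;
};

// Returns how many buffers a 3-slot input frees when one index is donated.
inline int FreedWhenOneDonated() {
  FreeCount() = 0;
  {
    ExecutionInputSketch in(3);
    in.SetUnownedIndex(1);  // slot 1 donated; slots 0 and 2 still owned
  }
  return FreeCount();
}
```

On an early error before execute, the destructor releases only the buffers the input still owns, which is precisely the dangling-pointer hazard the CL is closing.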
Chris Jones
9e944aa4fc Change LocalClient::Compile to support returning multiple executables (one per partition).
PiperOrigin-RevId: 292094485
Change-Id: Idaa4d14246478e5ec9b45d1d17d5610f35d35611
2020-01-29 01:03:57 -08:00
A. Unique TensorFlower
a3edaa7235 Change LocalClient::Compile to support returning multiple executables (one per partition).
PiperOrigin-RevId: 291343358
Change-Id: I0550040ddbb67e78e9e4078185e0af6b11b96e35
2020-01-24 03:35:03 -08:00
Chris Jones
c42a05f658 Change LocalClient::Compile to support returning multiple executables (one per partition).
PiperOrigin-RevId: 291341897
Change-Id: I8107c03ebb08b40e5e5cabed1002fecee38767ed
2020-01-24 03:20:38 -08:00
Peter Hawkins
4a08e00f6e [XLA] Add support for buffer donation to the XLA local client API.
PiperOrigin-RevId: 289847802
Change-Id: Ic25df197d6cdcea4ef08840ab2ac16d0c986cd06
2020-01-15 06:36:47 -08:00
TensorFlower Gardener
240497c2b2 Merge pull request from kiszk:spelling_tweaks_compiler
PiperOrigin-RevId: 285793343
Change-Id: I20bc2b8ca24d5fee4613f12abeba48957f80221f
2019-12-16 10:53:40 -08:00
Kazuaki Ishizaki
e664420b79 minor spelling tweaks
2019-12-09 18:21:12 +09:00
A. Unique TensorFlower
e4159c833e Update paths to LLVM.
PiperOrigin-RevId: 278455338
Change-Id: I4422a2de8365a8fa8cd5bcfe87b0f70e7dd352d6
2019-11-04 13:53:39 -08:00
Chen Chen
001c155f41 Internal change.
PiperOrigin-RevId: 278446480
Change-Id: I105e8b3841bba3cb6513a7840bac2bb70d1536d9
2019-11-04 13:49:21 -08:00
Peter Hawkins
efd304f40e [XLA] Add an Executable::ExecuteAsyncOnStreamWrapper.
Move logic from ExecuteOnStreamWrapper, make it callback-based, and make ExecuteOnStreamWrapper a thin wrapper around ExecuteAsyncOnStreamWrapper that blocks.

Helps solve https://github.com/google/jax/issues/774 by making the XLA profiling code work on the async path used by JAX.

PiperOrigin-RevId: 262416495
2019-08-08 14:06:31 -07:00
Peter Hawkins
1762bef938 [XLA] Consolidate Executable::ExecuteOnStream and ExecuteAsyncOnStream.
Remove ExecuteOnStream virtual method, make ExecuteOnStream a non-virtual wrapper around ExecuteAsyncOnStream.

This means that backend authors have one method to implement (ExecuteAsyncOnStream) rather than two, and reduces the number of code paths to running an executable.

Comment that ExecuteAsyncOnStream may in fact not be async. While undesirable, this is a quality-of-implementation issue, not a bug. Future changes can make implementations of ExecuteAsyncOnStream truly async.

PiperOrigin-RevId: 261922907
2019-08-06 08:56:57 -07:00
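The consolidation above follows a common pattern: one virtual async method for backends to implement, and a non-virtual sync method that is just "enqueue, then block". A minimal sketch (names are illustrative, not XLA's actual Executable interface):

```cpp
#include <future>

// The one method backends implement is the async one; the sync entry point
// is a thin non-virtual wrapper that blocks on its result.
class ExecutableSketch {
 public:
  virtual ~ExecutableSketch() = default;

  // Backend-provided: enqueue work, return a handle to the eventual result.
  virtual std::future<int> ExecuteAsyncOnStream(int input) = 0;

  // Non-virtual convenience wrapper: launch, then block.
  int ExecuteOnStream(int input) { return ExecuteAsyncOnStream(input).get(); }
};

// A toy backend whose "computation" doubles its input. std::launch::deferred
// keeps the sketch single-threaded; get() runs the work.
class DoublingExecutable : public ExecutableSketch {
 public:
  std::future<int> ExecuteAsyncOnStream(int input) override {
    return std::async(std::launch::deferred, [input] { return input * 2; });
  }
};
```

This matches the stated benefit: backend authors implement one method, and the sync path cannot drift out of sync with the async one.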
Peter Hawkins
ab8b627e0b [XLA] Clean up execution_profile usage and make it thread-safe.
Currently a mutable execution_profile is attached to xla::Executable. This isn't thread safe, since the same Executable may be invoked concurrently. Instead, clients already have the ability to pass in their own ExecutionProfile via the ExecutableRunOptions; update that one instead.

Simplify APIs that accepted both an ExecutionProfile and an ExecutableRunOptions given the latter contains a pointer to an instance of the former.

PiperOrigin-RevId: 261570194
2019-08-04 08:57:47 -07:00
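The thread-safety fix above amounts to moving mutable profiling state from the shared executable into per-call run options. A sketch of that pattern, with illustrative names and a fixed stand-in timing value:

```cpp
// Each caller supplies its own profile object through per-call run options,
// so concurrent runs with distinct profiles never race on shared state.
// (Names and the constant timing value here are illustrative.)
struct ExecutionProfileSketch {
  long compute_time_ns = 0;
};

struct RunOptionsSketch {
  ExecutionProfileSketch* execution_profile = nullptr;  // optional, per call
};

// The executable writes only through the pointer the caller passed in.
inline int Run(int input, const RunOptionsSketch& opts) {
  int result = input + 1;  // stand-in for the real computation
  if (opts.execution_profile) opts.execution_profile->compute_time_ns = 1000;
  return result;
}

// A caller that wants profiling passes its own profile and reads it back.
inline long ProfiledRunTime() {
  ExecutionProfileSketch profile;
  RunOptionsSketch opts{&profile};
  Run(7, opts);
  return profile.compute_time_ns;
}
```

Callers that do not care about profiling simply leave the pointer null, which is why the redundant separate ExecutionProfile parameters could be dropped.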
Peter Hawkins
18f1467496 [XLA] Make HLO snapshot dumping work on the LocalClient::RunAsync path.
PiperOrigin-RevId: 259956061
2019-07-25 09:07:24 -07:00
Peter Hawkins
3f266b1c8d Automated rollback of commit 53da0bc5ce
PiperOrigin-RevId: 259945774
2019-07-25 07:58:19 -07:00
A. Unique TensorFlower
53da0bc5ce Automated rollback of commit e8510ab01d
PiperOrigin-RevId: 259937937
2019-07-25 06:58:45 -07:00
Peter Hawkins
e8510ab01d [XLA] Improve thread-safety of HLO snapshot dumping.
Currently the code keeps a mutable HloSnapshot attached to the xla::Executable object. This cannot work correctly in the presence of concurrent executions. Instead, keep only an immutable HloProto attached to xla::Executable and construct ephemeral HloSnapshots during dumping.

This has the minor downside that it requires copying the HloProto each time we dump, but presumably if you are dumping HLO snapshots you don't particularly care about performance.

PiperOrigin-RevId: 259934176
2019-07-25 06:26:40 -07:00
George Karpenkov
0410cff073 Move DeviceMemoryAllocator and OwningDeviceMemory from XLA to StreamExecutor.
This change achieves three goals:

1. There are currently three different allocator abstractions in three different
places: XLA, stream executor, and tensorflow.
This change shrinks down the number of packages with allocator abstraction to
two.

2. Moving the allocator enables unifying ScopedDeviceMemory
and OwningDeviceMemory, which both have "owning pointer" semantics
but slightly different APIs.

3. Moving the allocator enables moving RedzoneAllocator into stream executor,
which we would like to use in tensorflow to catch out-of-bounds writes in
CUDNN convolutions during autotuning.

PiperOrigin-RevId: 247211996
2019-05-08 10:11:22 -07:00
Peter Hawkins
c7b255ae35 [JAX] Add support for asynchronous execution, but leave it disabled by default for now.
[XLA:Python] Add support for asynchronous execution in the Python client.

Python isn't famous for being the world's fastest language, so for high performance eager-style dispatch it is helpful to be able to hide Python latency behind device computations by having the Python code dispatch device operations asynchronously.

The design here closely follows the design of asynchronous execution in TensorFlow and the TensorFlow/XLA client. We use three main streams:
* a compute stream, for running XLA computations;
* a host-to-device stream, for transferring data onto the device; and
* a device-to-host stream, for transferring data off the device.

Both host-to-device transfers and compute are asynchronous, that is, they return control to Python as soon as any necessary error checking is complete, but before the operation completes. This allows the Python code to enqueue any subsequent operations while the previously enqueued operations complete. Device-to-host transfers are still blocking, in the sense that they stall the host until the host-side data is ready.

[XLA] Add LocalExecutable::RunAsync() to obtain async execution on a stream. There is currently no way to achieve this via the LocalClient API, only by using internal XLA APIs.
[XLA:GPU] Implement ExecuteAsyncOnStream. It turns out that ExecuteOnStream is already more or less async anyway.

PiperOrigin-RevId: 246650968
2019-05-04 07:36:09 -07:00
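The three-stream dispatch described above can be sketched with in-order FIFOs: enqueueing returns immediately, and only the device-to-host read drains its dependencies first. This is an illustrative single-threaded model, not the real client:

```cpp
#include <functional>
#include <queue>

// A "stream" here is just a FIFO of operations run in order when drained.
class FakeStream {
 public:
  void Enqueue(std::function<void()> op) { ops_.push(std::move(op)); }
  void Drain() {
    while (!ops_.empty()) {
      ops_.front()();
      ops_.pop();
    }
  }

 private:
  std::queue<std::function<void()>> ops_;
};

// Host-to-device transfer and compute are async (enqueue and return);
// only the device-to-host read blocks, by draining prior work first.
inline int RunDoublePlusOne(int host_value) {
  FakeStream h2d, compute;
  int device_buf = 0, result_buf = 0;
  h2d.Enqueue([&] { device_buf = host_value; });             // async copy-in
  compute.Enqueue([&] { result_buf = device_buf * 2 + 1; }); // async compute
  // Device-to-host: stall the "host" until enqueued work completes.
  h2d.Drain();
  compute.Drain();
  return result_buf;
}
```

The host-side code after the Enqueue calls could keep queuing further operations; that gap is exactly the Python latency the commit is hiding.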
Peter Hawkins
4fc7f70df7 [XLA] Change some xla::Literal arguments to xla::LiteralSlice so the functions accept other kinds of literals.
Update a stale comment in xla::ShapeUtil.

PiperOrigin-RevId: 240606826
2019-03-27 11:41:24 -07:00
Justin Lebar
39587aaeb7 [XLA] Rework debug flags for dumping HLO.
The following flags (usually passed via the XLA_FLAGS envvar) are removed:

  xla_dump_computations_to
  xla_dump_executions_to
  xla_dump_ir_to
  xla_dump_optimized_hlo_proto_to
  xla_dump_per_pass_hlo_proto_to
  xla_dump_unoptimized_hlo_proto_to
  xla_generate_hlo_graph
  xla_generate_hlo_text_to
  xla_hlo_dump_as_html
  xla_hlo_graph_path
  xla_log_hlo_text

The following new flags are added:

  xla_dump_to
  xla_dump_hlo_module_re
  xla_dump_hlo_pass_re
  xla_dump_hlo_as_text
  xla_dump_hlo_as_proto
  xla_dump_hlo_as_dot
  xla_dump_hlo_as_url
  xla_dump_hlo_as_html
  xla_dump_ir
  xla_dump_hlo_snapshots

The default is not to dump anything at all, but as soon as some dumping flag is
specified, we enable the following defaults (most of which can be overridden).

 * dump to stdout (overridden by --xla_dump_to)
 * dump HLO modules at the very beginning and end of the optimization pipeline
 * don't dump between any HLO passes (overridden by --xla_dump_hlo_pass_re)
 * dump all HLO modules (overridden by --xla_dump_hlo_module_re)
 * dump in textual format (overridden by
   --xla_dump_hlo_as_{text,proto,dot,url,html}).

For example, to dump optimized and unoptimized HLO text and protos to /tmp/foo,
pass

  --xla_dump_to=/tmp/foo --xla_dump_hlo_as_text --xla_dump_hlo_as_proto

For details on these flags' meanings, see xla.proto.

The intent of this change is to make dumping both simpler to use and more
powerful.

For example:

 * Previously there was no way to dump the HLO module during the pass pipeline
   in HLO text format; the only option was --xla_dump_per_pass_hlo_proto_to,
   which dumped in proto format.

   Now this is --xla_dump_hlo_pass_re=.* --xla_dump_hlo_as_text.  (In fact, the
   second flag is not necessary in this case, as dumping as text is the
   default.)

 * Previously there was no way to dump HLO as a graph before and after
   compilation; the only option was --xla_generate_hlo_graph, which would dump
   before/after every pass.

   Now this is --xla_dump_hlo_as_{dot,url,html} (depending on what format you
   want the graph in).

 * Previously, there was no coordination between the filenames written by the
   various flags, so info about one module might be dumped with various
   filename prefixes.  Now the filenames are consistent and all dumps from a
   particular module are next to each other.

If you only specify some of these flags, we try to figure out what you wanted.
For example:

 * --xla_dump_to implies --xla_dump_hlo_as_text unless you specify some
   other --xla_dump_hlo_as_* flag.

 * --xla_dump_hlo_as_text or --xla_dump_ir implies dumping to stdout unless you
   specify a different --xla_dump_to directory.  You can explicitly dump to
   stdout with --xla_dump_to=-.

As part of this change, I simplified the debugging code in the HLO passes for
dumping HLO modules.  Previously, many tests explicitly VLOG'ed the HLO module
before, after, and sometimes during the pass.  I removed these VLOGs.  If you
want dumps before/during/after an HLO pass, use --xla_dump_hlo_pass_re=<pass_name>.

PiperOrigin-RevId: 237920279
2019-03-11 17:35:39 -07:00
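As the commit notes, these flags are usually passed through the XLA_FLAGS environment variable rather than on the command line. A hypothetical invocation (the program name is illustrative):

```shell
# Dump optimized and unoptimized HLO as both text and proto to /tmp/foo.
# 'train.py' stands in for any XLA-using binary or script.
XLA_FLAGS="--xla_dump_to=/tmp/foo --xla_dump_hlo_as_text --xla_dump_hlo_as_proto" \
  python train.py
```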
Sanjoy Das
767a1fe746 Delete unused ServiceExecutableRunOptions::xla_intra_op_thread_pool; NFC
PiperOrigin-RevId: 228627868
2019-01-09 18:37:15 -08:00
Justin Lebar
7c7ed2b64c [XLA] Print out shapes' layouts in local_client's "shape/layout doesn't match" error.
Previously we'd say "shape/layout doesn't match" and then print out the shapes
without their layouts!

PiperOrigin-RevId: 224105237
2018-12-04 23:22:57 -08:00
A. Unique TensorFlower
0cd83fc0d8 Use low level API to transfer literal to device in local client.
PiperOrigin-RevId: 221471132
2018-11-14 10:52:09 -08:00
A. Unique TensorFlower
dd6d7c5c58 Global de-std::unique_ptr cleanup for xla::Literal.
PiperOrigin-RevId: 212313258
2018-09-10 12:38:19 -07:00
Tim Shen
6f879f891a [XLA] Rename all (Mutable)ArraySlice to absl::Span.
PiperOrigin-RevId: 210998142
2018-08-30 16:07:27 -07:00
Justin Lebar
d57f5a8202 [XLA] Switch to absl::StrFormat.
Unlike Printf, StrFormat does not require type-length qualifiers, e.g.
%z or %ll.  Nor does it require that you call c_str() to print strings.
So these are fixed up here as well.

PiperOrigin-RevId: 210435915
2018-08-27 14:55:29 -07:00
Justin Lebar
e924d67bff [XLA] Use absl::make_unique instead of xla::MakeUnique.
Same for WrapUnique.

PiperOrigin-RevId: 209531124
2018-08-20 20:23:24 -07:00
Kay Zhu
b67b3927a1 [XLA] Fix use-of-uninitialized-value msan failure in local_client as well.
PiperOrigin-RevId: 208004791
2018-08-09 00:24:49 -07:00
Kay Zhu
963ef37203 [TF:XLA] Introduce MutableBorrowingLiteral to enable interacting with a (tensor) buffer not owned by XLA/Literal class directly, without having to memcpy the Literal to a (Host)Tensor.
PiperOrigin-RevId: 207972410
2018-08-08 17:24:39 -07:00
A. Unique TensorFlower
abd645085b Use the correct device ordinal to check whether the device the executable was
built for is equivalent to the device it will run on.

Before this patch, if the device to run on was provided via a stream without
setting the device ordinal in the ExecutableRunOptions, we would check the
default device against the device the executable was built for.

PiperOrigin-RevId: 206892902
2018-08-01 01:06:22 -07:00
Todd Wang
15b155e929 Replace generic Pool with StreamPool, and discard failed streams.
We have a Pool in XLA that maintains a freelist of Streams, to avoid
the overhead of repeatedly allocating new Streams. Streams have a
monotonic state machine; if a stream encounters any error, it will
remain in an error state forever.

The functional change in this CL is to ensure that streams which have
encountered an error are deleted, rather than being put back on the
pool. Without this change, a previously failed stream will be put back
on the pool, only to cause the next usage of the stream to trivially
fail.

I've chosen to replace the generic templatized Pool with a concrete
StreamPool, since this makes the logic more straightforward to reason
about. Also note that the only existing usage of Pool is to hold
streams.

The functional change is in stream_pool.cc; nearly everything else is
mechanical updates.

PiperOrigin-RevId: 206100631
2018-07-25 21:02:08 -07:00
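The discard-on-error behavior described above is easy to sketch: a freelist-backed pool whose return path drops any object whose sticky error bit is set. Names here are illustrative, not the real StreamPool:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Streams have a monotonic state machine: once the error bit flips, it
// never recovers. (A bool stands in for the real state here.)
struct StreamSketch {
  bool ok = true;
};

class StreamPoolSketch {
 public:
  // Reuse a cached stream if one exists, otherwise allocate a fresh one.
  std::unique_ptr<StreamSketch> BorrowStream() {
    if (!freelist_.empty()) {
      auto s = std::move(freelist_.back());
      freelist_.pop_back();
      return s;
    }
    return std::make_unique<StreamSketch>();
  }

  // The fix from this CL: only healthy streams go back on the freelist;
  // an errored stream is destroyed instead of poisoning the next borrower.
  void ReturnStream(std::unique_ptr<StreamSketch> s) {
    if (s->ok) freelist_.push_back(std::move(s));
  }

  std::size_t size() const { return freelist_.size(); }

 private:
  std::vector<std::unique_ptr<StreamSketch>> freelist_;
};

// Freelist size after returning one healthy and one failed stream.
inline std::size_t SizeAfterMixedReturns() {
  StreamPoolSketch pool;
  auto good = pool.BorrowStream();
  auto bad = pool.BorrowStream();
  bad->ok = false;  // simulate a stream error
  pool.ReturnStream(std::move(good));
  pool.ReturnStream(std::move(bad));  // silently dropped
  return pool.size();
}
```

Without the error check in ReturnStream, the failed stream would be handed out again and the next usage would trivially fail, which is the bug the CL fixes.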
Mark Heffernan
438c8e2b0a Move xla_computation.* from xla/client/xla_client up to xla/client.
Plan is to move everything in xla/client/xla_client up to xla/client and remove
the directory.

No functional change.

PiperOrigin-RevId: 206055680
2018-07-25 14:33:14 -07:00
A. Unique TensorFlower
9ab04addfb Remove the ambiguity of device/host computation layouts within the HloModuleConfig.
PiperOrigin-RevId: 201284741
2018-06-19 19:42:57 -07:00
A. Unique TensorFlower
bae4a271c0 Internal change
PiperOrigin-RevId: 201161803
2018-06-19 06:04:39 -07:00
A. Unique TensorFlower
f0230735d1 [XLA] Redesign: delete SessionModule.
PiperOrigin-RevId: 199361402
2018-06-05 14:51:27 -07:00
Justin Lebar
49535c9da6 [XLA] Switch replay_computation to use LocalClient.
This lets replay_computation build an executable once and run it
multiple times.  This is particularly important because in XLA:GPU, the
first run of an executable does some autotuning and therefore is
unrepresentative.

This change removes --xla_hlo_profile_last_run, because I don't see how
to support it in LocalClient -- LocalClient wants the do-profile bit to
be set when we *compile*.  (There may not be an easy fix for this; it
worked with regular Client because we were recompiling every time we
ran.)

PiperOrigin-RevId: 198643577
2018-05-30 17:03:41 -07:00
A. Unique TensorFlower
f36c93505f [XLA] Redesign: delete xla::Computation.
PiperOrigin-RevId: 197069851
2018-05-17 16:26:35 -07:00
Justin Lebar
2f5f2cb425 [XLA] s/tensorflow::Status/Status/.
These are type aliases of one another; we'd like to be consistent and
use the shorter one.

PiperOrigin-RevId: 196322955
2018-05-11 16:07:35 -07:00
A. Unique TensorFlower
210abebd3f [TF:XLA] Separate on-host and on-device shape and layout in HloModule.
Previously, only one layout was stored with an HLO module. This CL allows
HLO passes to modify the on-device layouts without affecting the on-host
layout (provided by the client).

PiperOrigin-RevId: 195014875
2018-05-01 16:19:55 -07:00
Justin Lebar
56fd856425 [XLA] Make Executable return a ScopedShapedBuffer.
Previously, we returned a plain ShapedBuffer.  But this doesn't capture
our semantics: it's up to the caller to free this ShapedBuffer.

PiperOrigin-RevId: 193854051
2018-04-22 14:50:48 -07:00
Justin Lebar
4e17a3f149 [XLA] De-unique_ptr-ify ShapedBuffer and ScopedShapedBuffer.
These are already notionally equivalent to T* and unique_ptr<T>, so
having a unique_ptr of a {Scoped,}ShapedBuffer is pretty redundant.

Also clean up the ScopedShapedBuffer API a bit.

PiperOrigin-RevId: 193599773
2018-04-19 17:22:23 -07:00
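The T* vs unique_ptr<T> analogy in the commit above can be made concrete with a toy owning/non-owning pair; wrapping the owning type in yet another unique_ptr would be as redundant as unique_ptr<unique_ptr<T>>. All names are illustrative, not XLA's real buffer types:

```cpp
// Counter standing in for live device allocations in this sketch.
inline int& LiveBuffers() { static int n = 0; return n; }

// Like a T*: a non-owning view of the underlying memory.
struct ShapedBufferSketch {
  int* data = nullptr;
};

// Like a unique_ptr<T>: owns the memory and frees it on destruction.
class ScopedShapedBufferSketch {
 public:
  ScopedShapedBufferSketch() : data_(new int(0)) { ++LiveBuffers(); }
  ~ScopedShapedBufferSketch() {
    delete data_;
    --LiveBuffers();
  }
  ScopedShapedBufferSketch(const ScopedShapedBufferSketch&) = delete;
  ScopedShapedBufferSketch& operator=(const ScopedShapedBufferSketch&) = delete;

  // Hand out a non-owning view, like ShapedBuffer over ScopedShapedBuffer.
  ShapedBufferSketch view() const { return ShapedBufferSketch{data_}; }

 private:
  int* data_;
};

// Live allocations after a scoped buffer goes out of scope: zero, because
// the scoped type itself manages the lifetime -- no extra unique_ptr needed.
inline int LiveAfterScopeExit() {
  {
    ScopedShapedBufferSketch b;
    (void)b.view();
  }
  return LiveBuffers();
}
```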
Justin Lebar
d77a621a57 [XLA] Convert XLA to use xla::se as a namespace alias for ::stream_executor.
PiperOrigin-RevId: 193301997
2018-04-17 21:07:05 -07:00