New order:
MlirBridgePass: Phase 0, before all other passes
Passes that were originally at Phase 0 move to Phase 10
Passes that were originally at Phase 1 move to Phase 20
Passes that were originally at Phase 20+ move to Phase 30+
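Roughly, the reordering looks like the following at the registration sites. This is only a sketch: the grouping constant and SomeExistingPass are placeholders, not the actual registrations, and the pass class headers are omitted.

  // Sketch only; assumes REGISTER_OPTIMIZATION from
  // tensorflow/core/common_runtime/optimization_registry.h.
  #include "tensorflow/core/common_runtime/optimization_registry.h"

  // MlirBridgePass takes Phase 0, so it runs before every other pass in its
  // grouping.
  REGISTER_OPTIMIZATION(OptimizationPassRegistry::POST_REWRITE_FOR_EXEC, 0,
                        MlirBridgePass);

  // A pass that used to sit at Phase 0 is bumped to Phase 10 (Phase 1
  // becomes Phase 20, Phase 20+ becomes Phase 30+).
  REGISTER_OPTIMIZATION(OptimizationPassRegistry::POST_REWRITE_FOR_EXEC, 10,
                        SomeExistingPass);  // placeholder name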
PiperOrigin-RevId: 282394988
Change-Id: Ief93072c52fcc073ceb0998271e1e4d5ad2d1f74
This pass uses some heuristics to add scopes to nodes to guide the
clustering results. Currently, the only heuristic is to preserve the
parallelism between TensorFlow pipeline stages.
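As a rough illustration of the kind of annotation the pass adds (the attribute name and the per-stage naming below are assumptions, not the pass's actual code):

  #include <string>
  #include <vector>
  #include "tensorflow/core/graph/graph.h"

  // Tag every node of one pipeline stage with the same scope so that
  // clustering keeps different stages in separate clusters and preserves
  // their parallelism.
  void ScopeStage(const std::vector<tensorflow::Node*>& stage_nodes,
                  int stage_index) {
    const std::string scope = "stage_" + std::to_string(stage_index);
    for (tensorflow::Node* n : stage_nodes) {
      n->AddAttr("_XlaInternalScope", scope);  // attribute name assumed
    }
  }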
1. This is only required for XLA, so it makes sense to move it into XlaCompiler;
2. We need it inside the XLA compiler so that TPU eager mode works (TPU eager mode does not call graph rewrite passes).
PiperOrigin-RevId: 248432264
TF/XLA bridge expects FunctionDef to satisfy the following rules:
1. DT_RESOURCE arguments always come last;
2. DT_RESOURCE values are never returned.
But functions defined by TensorFlow might not satisfy these rules.
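Expressed as a predicate over the function's signature, the two invariants are (a sketch, not the bridge's actual code):

  #include "tensorflow/core/framework/function.pb.h"
  #include "tensorflow/core/framework/types.pb.h"

  bool SatisfiesBridgeResourceRules(const tensorflow::FunctionDef& fdef) {
    const auto& sig = fdef.signature();
    // Rule 1: once a DT_RESOURCE argument appears, every later argument must
    // also be DT_RESOURCE, i.e. resource arguments come last.
    bool seen_resource_arg = false;
    for (const auto& arg : sig.input_arg()) {
      if (arg.type() == tensorflow::DT_RESOURCE) {
        seen_resource_arg = true;
      } else if (seen_resource_arg) {
        return false;
      }
    }
    // Rule 2: no DT_RESOURCE return values.
    for (const auto& ret : sig.output_arg()) {
      if (ret.type() == tensorflow::DT_RESOURCE) return false;
    }
    return true;
  }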
PiperOrigin-RevId: 244714052
I don't particularly love this approach since IMO it is papering over a problem
in mark_for_compilation_pass -- mark_for_compilation_pass should instead
rematerialize constants as necessary to create larger clusters. But this is
what fits in best with the scheme we have today.
PiperOrigin-RevId: 236729916
It was reverted because the pass creates slices from DT_INT64 tensors, which
weren't supported on the GPU. We now support these slices, so the pass can be
re-enabled.
PiperOrigin-RevId: 220738294
Increases the amount of dynamism representable by XLA clusters by rewriting the
TensorFlow graph. See the header for a description.
This pass, combined with jit/partially_decluster_pass, reduces the number of
unnecessary cluster recompilations in some common cases.
The CL is organized as follows:
- cc/framework/scope* and core/graph/node_builder are modified so that new
nodes can now be automatically put in an XLA cluster using
Scope::WithXlaCluster (see the sketch after this list).
- The pass is implemented in jit/increase_dynamism_for_auto_jit_pass.
- In jit/jit_compilation_pass_registration, the new pass is registered to run
between MarkForCompilationPass and PartiallyDeclusterPass.
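A minimal usage sketch of the Scope change from the first bullet; the exact signature of WithXlaCluster and the cluster name used here are assumptions:

  #include "tensorflow/cc/framework/scope.h"
  #include "tensorflow/cc/ops/standard_ops.h"

  // Every node created under `in_cluster` carries the XLA cluster
  // assignment, so the rewrite can emit new nodes straight into an existing
  // cluster instead of patching attributes afterwards.
  void AddNodeIntoCluster(const tensorflow::Scope& root) {
    tensorflow::Scope in_cluster = root.WithXlaCluster("cluster_0");
    auto size =
        tensorflow::ops::Const(in_cluster.WithOpName("slice_size"), 42);
    (void)size;
  }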
PiperOrigin-RevId: 218907734
This CL splits the functionality in XlaLaunch into two separate operations:
- XlaCompile, responsible for compiling a TF function into a LocalExecutable
- XlaRun, responsible for executing a LocalExecutable created by XlaCompile
This CL is a stepping stone towards implementing lazy compilation for TF/XLA.
The XlaCompile op is spec'ed to return a boolean indicating whether the
compilation was successful. Right now that boolean is always set to true by
XlaCompile and its value is otherwise ignored, but in the future it will be used
to indicate whether the TF function was compiled or not, and thus whether we
should execute XlaRun or just directly call the TF function.
XlaLaunch still exists, and will be created by create_xla_launch_op.cc. In the
future we may consider removing it altogether. build_xla_launch_ops.cc, now
renamed to build_xla_ops.cc, creates an XlaCompile/XlaRun pair instead of
XlaLaunch.
This CL is organized as follows:
- jit/ops/xla_ops.cc gets two new XLA-specific operations, XlaCompile and
XlaRun, described above. XlaRun redundantly takes the must-be-constant
inputs to the TensorFlow cluster to keep the implementation simple (simple in
the sense of being similar to XlaLaunch), but I will remove this in a subsequent
cleanup CL.
- jit/kernels/xla_ops.cc implements XlaCompile and XlaRun in a fairly
straightforward manner. XlaCompile compiles the TF function, puts it in a
process-global store, XlaExecutableClosureStore, and produces an int64 key
(see the sketch after this list). XlaRun uses the key to read out the
LocalExecutable and execute it. I'm not sure if XlaExecutableClosureStore
should be a resource like XlaCompilationCache; I did not immediately see any
reason to make it so.
- There are changes to the various _device files to register XlaCompile and
XlaRun for the XLA_* devices.
- Finally, I had to fix some tests that were expecting XlaLaunch in the
execution timeline.
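To make the key hand-off concrete, here is a rough sketch of the kind of process-global store the second bullet describes; the member names and the payload struct are illustrative, not the actual jit/kernels/xla_ops.cc code:

  #include <cstdint>
  #include <unordered_map>
  #include <utility>
  #include "absl/synchronization/mutex.h"

  // Illustrative payload; the real closure wraps the xla::LocalExecutable
  // produced by XlaCompile plus whatever XlaRun needs to invoke it.
  struct ExecutableClosure { /* ... */ };

  class XlaExecutableClosureStore {
   public:
    using KeyT = int64_t;

    static XlaExecutableClosureStore* Global() {
      static XlaExecutableClosureStore* store = new XlaExecutableClosureStore;
      return store;
    }

    // XlaCompile side: stash the compiled closure and return the key that
    // flows through the graph as a tensor into XlaRun.
    KeyT Produce(ExecutableClosure closure) {
      absl::MutexLock lock(&mu_);
      KeyT key = next_key_++;
      closures_.emplace(key, std::move(closure));
      return key;
    }

    // XlaRun side: look the closure up by key and remove it so the store
    // does not grow without bound.
    ExecutableClosure Consume(KeyT key) {
      absl::MutexLock lock(&mu_);
      auto it = closures_.find(key);
      ExecutableClosure closure = std::move(it->second);
      closures_.erase(it);
      return closure;
    }

   private:
    absl::Mutex mu_;
    KeyT next_key_ = 0;  // guarded by mu_
    std::unordered_map<KeyT, ExecutableClosure> closures_;  // guarded by mu_
  };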
PiperOrigin-RevId: 213895405
"Partial declustering" is defined as cloning a clustered node outside its
cluster and transferring some of its outgoing edges to the cloned version.
Some TensorFlow operations expect their inputs in host memory and, because XLA
only produces device tensors, such nodes can incur a device-to-host copy if not
clustered along with their producers. Regular TensorFlow operations, on the
other hand, may produce their outputs in host memory, so cloning the producer
outside the cluster and pointing the host-memory-expecting consumers at the
cloned version lets us avoid the memcpy.
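A rough sketch of the core graph edit (the attribute name and the way host-memory consumers are identified are assumptions; the real logic lives in jit/partially_decluster_pass.cc):

  #include <vector>
  #include "tensorflow/core/graph/graph.h"

  void PartiallyDecluster(
      tensorflow::Graph* graph, tensorflow::Node* producer,
      const std::vector<const tensorflow::Edge*>& edges_to_host_consumers) {
    // Clone the producer and strip its cluster assignment so the clone runs
    // as a regular TensorFlow op, which may produce host-memory outputs.
    // (The real pass also gives the clone a fresh, unique name.)
    tensorflow::Node* clone = graph->CopyNode(producer);
    clone->ClearAttr("_XlaCluster");  // attribute name assumed

    // Mirror the original producer's incoming edges onto the clone.
    for (const tensorflow::Edge* in : producer->in_edges()) {
      graph->AddEdge(in->src(), in->src_output(), clone, in->dst_input());
    }

    // Move only the host-memory-expecting consumers over to the clone; all
    // other consumers keep reading from the clustered original.
    for (const tensorflow::Edge* e : edges_to_host_consumers) {
      tensorflow::Node* dst = e->dst();
      const int dst_input = e->dst_input();
      const int src_output = e->src_output();
      graph->RemoveEdge(e);
      graph->AddEdge(clone, src_output, dst, dst_input);
    }
  }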
PiperOrigin-RevId: 208710603
XLA is a compiler-based linear algebra execution engine that targets CPUs, GPUs and custom accelerators.
XLA is still experimental; we are releasing it early to get the community involved.
Change: 143990941