Commit Graph

831 Commits

Author SHA1 Message Date
Xiaoqiang Zheng
01a6f5e504 Multiple layout support for pooling operations.
Change: 115611259
2016-02-25 18:09:01 -08:00
Derek Murray
cdd0f2eeef Fix compilation error in argv parsing code... whoops.
Change: 115610448
2016-02-25 15:25:10 -08:00
A. Unique TensorFlower
aeae4825b3 Add symbolic gradient functions for Conv2D and MaxPool
Change: 115608522
2016-02-25 15:24:58 -08:00
Vijay Vasudevan
03fed366e4 TensorFlow: conv_ops uses gpu_device_context, needs to depend on the lib.
Change: 115607974
2016-02-25 15:00:06 -08:00
Derek Murray
f3ead2df04 Correct handling of argv in test utility.
Change: 115607801
2016-02-25 14:59:54 -08:00
Vijay Vasudevan
c38bbf42e8 Rollback of "TestReporter is back in. Maybe also fixed the Android build."
Test fails.
Change: 115602477
2016-02-25 14:13:33 -08:00
Vijay Vasudevan
90cf3e2eea Rollback of: Add native depthwise_convolution op (forward pass).
The current depthwise_conv is very inefficient: it calls slice() on each
input channel of the input and filters, followed by a conv() on each input
channel, and finally a concat().
Change: 115601904
2016-02-25 14:13:22 -08:00
Vijay Vasudevan
a82f7e6b55 Make gpu_lib for non-cuda deps that we use in public kernels.
Change: 115598732
2016-02-25 13:45:59 -08:00
Sherry Moore
a5f3979004 Clarify comments for max_to_keep.
Change: 115598592
2016-02-25 13:45:46 -08:00
Manjunath Kudlur
be64da9454 Remove endl at the end of VLOG.
Change: 115594986
2016-02-25 13:19:40 -08:00
Josh Levenberg
eec5477ab6 Execute TODO to rename io.* to save_restore_tensor.*. This will
hopefully reduce confusion since io.* is not the implementation of the
".../kernels:io" build target.
Change: 115593814
2016-02-25 13:19:30 -08:00
Eugene Brevdo
ad3ef4c05b TestReporter is back in. Maybe also fixed the Android build.
Change: 115589642
2016-02-25 13:19:20 -08:00
Vijay Vasudevan
356bf7f466 TensorFlow: add missing header file to posix/test.cc
Change: 115589382
2016-02-25 12:13:32 -08:00
Jianmin Chen
7b47c8b4a3 Add native depthwise_convolution op (forward pass).
The current depthwise_conv is very inefficient: it calls slice() on each
input channel of the input and filters, followed by a conv() on each input
channel, and finally a concat().
Change: 115583330
2016-02-25 12:13:20 -08:00
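The per-channel slice/conv/concat composition that the native op replaces can be sketched in NumPy (a 1-D, valid-padding illustration only; the actual TensorFlow kernels are not shown):

```python
import numpy as np

def conv1d_valid(x, k):
    """Plain 1-D valid correlation for a single channel."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

def depthwise_conv_slow(inputs, filters):
    """The pre-native composition: slice each channel, convolve, concat.

    inputs: shape (length, channels); filters: shape (width, channels).
    """
    outs = []
    for c in range(inputs.shape[1]):
        x_c = inputs[:, c]                   # slice() per input channel
        k_c = filters[:, c]                  # slice() per filter channel
        outs.append(conv1d_valid(x_c, k_c))  # conv() per channel
    return np.stack(outs, axis=1)            # concat() at the end

x = np.arange(10, dtype=np.float64).reshape(5, 2)
k = np.ones((2, 2))
y = depthwise_conv_slow(x, k)
```

A fused native kernel avoids materializing every per-channel slice and intermediate result, which is the inefficiency the commit message describes.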
Derek Murray
818644c2a9 Changed testing::SrcDir() to testing::TensorFlowSourceRoot() and fixed it.
Also fixed some compiler warnings.
Change: 115582482
2016-02-25 11:16:31 -08:00
Vijay Vasudevan
13d7f52034 TensorFlow: make split_op not use internal header library for callback,
since this breaks the build on GPU.
Change: 115582331
2016-02-25 11:16:20 -08:00
Vijay Vasudevan
86e93febaa TensorFlow: Fix scatter_op_test now that StringPiece::contains is fixed.
Change: 115580211
2016-02-25 11:16:08 -08:00
A. Unique TensorFlower
d1aed6505a Add contrib/testing.
Change: 115578243
2016-02-25 11:15:56 -08:00
A. Unique TensorFlower
9ccc4b6afe Avoid some over-inlined routines. Reduces code size of TensorFlow binaries
considerably.  Shrinks text size of example_trainer binary by ~1.5%.
Change: 115578002
2016-02-25 11:15:46 -08:00
Benoit Steiner
63bd3efc5c Made sure that the tracking allocator always counts the allocated sizes.
Made the corresponding unit test more robust.
Change: 115575179
2016-02-25 11:15:32 -08:00
Vijay Vasudevan
5c9f4f8973 TensorFlow: fix bug in StringPiece::contains which made it always
return true.  Add a unittest to catch this type of regression in
the future.
Change: 115573280
2016-02-25 11:15:20 -08:00
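The regression class is easy to see in a Python analogue (the real StringPiece::contains and its unittest are C++ and are not shown): a `contains` that unconditionally returns true passes any test suite that only checks positive cases, so the new test must include a negative case.

```python
def contains_buggy(haystack, needle):
    """Pre-fix behavior: ignores its arguments and always returns True."""
    return True

def contains_fixed(haystack, needle):
    """Correct substring containment check."""
    return needle in haystack

# A test with only positive cases passes even with the bug...
assert contains_buggy("tensorflow", "flow")
assert contains_fixed("tensorflow", "flow")
# ...so the regression test must also assert a negative case.
assert not contains_fixed("tensorflow", "torch")
```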
A. Unique TensorFlower
82ecfff7da Fix for constant folding where nodes with no inputs doesn't get constant folded.
Change: 115568214
2016-02-25 11:14:58 -08:00
A. Unique TensorFlower
e752109efb Fixes bug in accumulation of total-approximate-duality-gap.
Change: 115528686
2016-02-25 09:03:20 -08:00
A. Unique TensorFlower
73d557cc88 Fix an error message in tf.sparse_to_dense to include the possibility that indices are invalid because they are out of bounds.
Change: 115522264
2016-02-25 09:02:57 -08:00
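To illustrate the out-of-bounds case the improved message now mentions, here is a simplified 1-D Python sketch (not tf.sparse_to_dense itself; the function and message wording are illustrative):

```python
def sparse_to_dense(indices, output_size, values, default=0):
    """1-D sketch: scatter values into a dense list of length output_size."""
    dense = [default] * output_size
    for pos, (i, v) in enumerate(zip(indices, values)):
        if not 0 <= i < output_size:
            # The error message covers the out-of-bounds possibility.
            raise ValueError(
                "indices[%d] = %d is out of bounds: must be in [0, %d)"
                % (pos, i, output_size))
        dense[i] = v
    return dense

dense = sparse_to_dense([0, 3], 5, [7, 9])

try:
    sparse_to_dense([5], 5, [1])
    oob_caught = False
except ValueError:
    oob_caught = True
```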
Eugene Brevdo
fcfa866d67 Added TestReporter and test / benchmark reporting tools.
These tools are meant to allow recording of benchmark & unit test
structured output to pbtxt files in a directory only when the
environment variable TEST_REPORT_FILE_PREFIX is set.  For now,
only saving of C++  microbenchmark output is supported.
Change: 115518303
2016-02-25 09:02:41 -08:00
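The gating behavior described — write structured output only when TEST_REPORT_FILE_PREFIX is set — can be sketched in Python (the function name and output format here are illustrative, not TensorFlow's TestReporter API):

```python
import os
import tempfile

def maybe_report(name, metrics, env=os.environ):
    """Write a report file only if TEST_REPORT_FILE_PREFIX is set.

    Returns the path written, or None when reporting is disabled.
    """
    prefix = env.get("TEST_REPORT_FILE_PREFIX")
    if prefix is None:
        return None  # reporting disabled: do nothing
    path = prefix + name
    with open(path, "w") as f:
        for key, value in sorted(metrics.items()):
            f.write("%s: %r\n" % (key, value))
    return path

# With the variable unset, nothing is written.
disabled = maybe_report("bench_matmul", {"wall_time": 0.07}, env={})

# With the variable set, the report lands under the given prefix.
enabled = maybe_report("bench_matmul", {"wall_time": 0.07},
                       env={"TEST_REPORT_FILE_PREFIX": tempfile.mkdtemp() + "/"})
```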
Sherry Moore
4ecd2a70dd Added unit test for max_to_keep being None.
Change: 115516426
2016-02-25 09:02:28 -08:00
Kiril Gorovoy
77da168dbc Move all Tensorflow WORKSPACE rules to a skylark macro
Change: 115515678
2016-02-25 09:02:17 -08:00
Josh Levenberg
9ba55d8a75 Remove no-longer-needed RequireDefaultOps().
Change: 115511835
2016-02-25 09:01:55 -08:00
Josh Levenberg
ab286e0996 Remove no-longer-needed RequireDefaultOps().
Change: 115511794
2016-02-25 09:01:43 -08:00
Vincent Vanhoucke
bce6216610 Switch nn.moments() to using a one-pass stable algorithm.
Helps with: https://github.com/tensorflow/tensorflow/issues/917
Also fixes https://github.com/tensorflow/tensorflow/issues/1162

The main benefit is that the computation of the sufficient statistics is now decoupled of the aggregation of the moments, which means that if you want to perform the accumulation incrementally, you don't have to keep all the inputs around, and can instead keep the much more compact sum and sum-of-squares. Accumulation could also be performed locally if you aggregate across multiple devices.
Computing sum and sum-of-squares can also theoretically be performed in parallel now.

Tested running inception: same performance, same step time.
Batch normalization benchmark is a bit faster on CPU, a bit slower on GPU:

Before:
cpu shape:4/3 #layers:10 mode:py scale:True train:False - 1.139310 secs
gpu shape:4/3 #layers:10 mode:py scale:True train:False - 0.021970 secs
cpu shape:4/3 #layers:10 mode:py scale:True train:True - 2.767147 secs
gpu shape:4/3 #layers:10 mode:py scale:True train:True - 0.074531 secs
cpu shape:4/3 #layers:10 mode:py scale:True train:False - 0.742835 secs
gpu shape:4/3 #layers:10 mode:py scale:True train:False - 0.013473 secs
cpu shape:4/3 #layers:10 mode:py scale:True train:True - 1.738806 secs
gpu shape:4/3 #layers:10 mode:py scale:True train:True - 0.052777 secs
cpu shape:2/1 #layers:10 mode:py scale:True train:False - 0.119180 secs
gpu shape:2/1 #layers:10 mode:py scale:True train:False - 0.011201 secs
cpu shape:2/1 #layers:10 mode:py scale:True train:True - 0.218297 secs
gpu shape:2/1 #layers:10 mode:py scale:True train:True - 0.048526 secs

After:
cpu shape:4/3 #layers:10 mode:py scale:True train:False - 0.998944 secs
gpu shape:4/3 #layers:10 mode:py scale:True train:False - 0.025828 secs
cpu shape:4/3 #layers:10 mode:py scale:True train:True - 2.657428 secs
gpu shape:4/3 #layers:10 mode:py scale:True train:True - 0.086614 secs
cpu shape:4/3 #layers:10 mode:py scale:True train:False - 0.603137 secs
gpu shape:4/3 #layers:10 mode:py scale:True train:False - 0.017668 secs
cpu shape:4/3 #layers:10 mode:py scale:True train:True - 1.519533 secs
gpu shape:4/3 #layers:10 mode:py scale:True train:True - 0.055214 secs
cpu shape:2/1 #layers:10 mode:py scale:True train:False - 0.071344 secs
gpu shape:2/1 #layers:10 mode:py scale:True train:False - 0.016440 secs
cpu shape:2/1 #layers:10 mode:py scale:True train:True - 0.222093 secs
gpu shape:2/1 #layers:10 mode:py scale:True train:True - 0.039967 secs
Change: 115507032
2016-02-25 09:01:18 -08:00
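The decoupling described above can be sketched with plain NumPy: fold each batch into running sufficient statistics (count, sum, sum-of-squares), then derive the moments at the end, without keeping the inputs around. This is a simplified illustration, not nn.moments itself; a production version would also shift the data to improve numerical stability.

```python
import numpy as np

def accumulate(stats, batch):
    """Fold a batch into the running sufficient statistics."""
    n, s, ss = stats
    batch = np.asarray(batch, dtype=np.float64)
    return (n + batch.size, s + batch.sum(), ss + np.square(batch).sum())

def moments_from_stats(stats):
    """Derive mean and (biased) variance from (count, sum, sum_sq)."""
    n, s, ss = stats
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance

# Incremental accumulation over two batches: only the compact
# statistics are kept, and batches could come from different devices.
stats = (0, 0.0, 0.0)
stats = accumulate(stats, [1.0, 2.0, 3.0])
stats = accumulate(stats, [4.0, 5.0])
mean, variance = moments_from_stats(stats)
```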
Josh Levenberg
2cc5ed87e3 Execute TODO to explain graph-consumer usage of
RemoveNewDefaultAttrsFromGraphDef().
Change: 115506523
2016-02-25 09:01:06 -08:00
A. Unique TensorFlower
8041c546bb Switch sdca_ops to use tf.load_library mechanism.
Change: 115505008
2016-02-25 09:00:55 -08:00
Benoit Steiner
223794ee78 Avoid using initialization lists since the version of nvcc shipped with Tegra
X1 crashes when attempting to compile them
Change: 115500414
2016-02-24 15:36:22 -08:00
Eugene Brevdo
2861cc1d23 Surface control_flow_ops.case to public. Update docs. Add unit tests.
Change: 115496194
2016-02-24 15:36:11 -08:00
Geoffrey Irving
497606904b Fix build issue with safety fix to gather and scatter
Change: 115495726
2016-02-24 15:35:58 -08:00
Eugene Brevdo
746ccc842e Temporarily disable sdca_ops_test - it breaks the opensource build.
Change: 115494526
2016-02-24 15:35:47 -08:00
A. Unique TensorFlower
4afef14f02 Support leaving the offset (beta) parameter out in batch_normalization, in which case no offset will be added after normalization.
Change: 115489328
2016-02-24 15:35:25 -08:00
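What "leaving the offset out" means can be shown with a minimal NumPy sketch (a hypothetical helper mirroring the signature shape, not TensorFlow's actual batch_normalization kernel): when the offset (beta) is None, the normalized value is returned with no offset added.

```python
import numpy as np

def batch_normalize(x, mean, variance, offset=None, scale=None, eps=1e-3):
    """Normalize x; offset (beta) and scale (gamma) are both optional."""
    inv = 1.0 / np.sqrt(variance + eps)
    if scale is not None:
        inv = inv * scale
    y = (x - mean) * inv
    if offset is not None:  # with offset=None, no beta is added
        y = y + offset
    return y

x = np.array([1.0, 2.0, 3.0])
y_no_offset = batch_normalize(x, x.mean(), x.var())
y_offset = batch_normalize(x, x.mean(), x.var(), offset=10.0)
```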
A. Unique TensorFlower
87a289103f removing repeated hostcast lines
Change: 115472914
2016-02-24 15:35:03 -08:00
A. Unique TensorFlower
57df84c47e Rename map in control_flow_ops to map_fn, to avoid name conflict with Python's native 'map' function. This also fixes the bug with control_flow_ops.case
Change: 115472163
2016-02-24 15:34:53 -08:00
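The naming hazard is easy to reproduce in plain Python (illustrative only): a module-level function named `map` shadows the builtin for anyone who star-imports the module, whereas `map_fn` coexists with it cleanly.

```python
# A module exporting a function named `map` would shadow the builtin
# after `from module import *`; naming it map_fn avoids the collision.
def map_fn(fn, elems):
    """Apply fn to each element, akin to an elementwise graph op."""
    return [fn(e) for e in elems]

squares = map_fn(lambda x: x * x, [1, 2, 3])

# The builtin `map` is untouched and still usable alongside map_fn.
doubled = list(map(lambda x: 2 * x, [1, 2, 3]))
```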
David G. Andersen
6b2c0012d1 Eliminate unneded pylint disable
Change: 115470945
2016-02-24 15:34:42 -08:00
A. Unique TensorFlower
14a237beb0 Update TensorBoard README.md.
Describe how to load many runs.
Change: 115467346
2016-02-24 15:34:19 -08:00
Geoffrey Irving
26078dfaf2 Fix safety bug in gather and scatter
Both gather and scatter now unconditionally validate indices in the inner loop,
which prevents crashes if indices are changed asynchronously while the ops are
running.

For gather when validate_indices = true, the new code is within the noise of the
old code speedwise or possibly slightly faster (unsurprising since the new code
fuses two loops).  Specifically, the geometric mean of int32 gather benchmarks
goes from 4.05GB/s to 4.04-4.07GB/s.

For gather when validate_indices = false, the old code and a version of the old
code that supported validate_indices = false both get 1.5% slower.  Xiaoqiang
and I deem this difference insufficient to preserve the unsafe code path, so
poof: it's gone.

For scatter (which always validates), the new code is slightly faster than the
old code: the geometric mean goes from 546-559M items/s to 573M items/s.
Change: 115467091
2016-02-24 15:34:07 -08:00
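The safety property described — validation fused into the copy loop rather than done once up front — can be sketched in Python (a toy analogue, not the C++ gather kernel): an index that is invalid at the moment it is read is caught instead of causing an out-of-bounds access.

```python
def gather(params, indices):
    """Gather rows of params, validating each index in the inner loop."""
    out = []
    for i in indices:
        # Validation is fused with the copy, so an index that became
        # invalid after any up-front check would still be caught here.
        if not 0 <= i < len(params):
            raise IndexError("index %d out of range [0, %d)" % (i, len(params)))
        out.append(params[i])
    return out

result = gather(["a", "b", "c"], [2, 0])

try:
    gather(["a", "b", "c"], [3])
    caught = False
except IndexError:
    caught = True
```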
Yuan Yu
8804a486c7 Store only what is needed in the node name to node map.
Change: 115464489
2016-02-24 15:33:35 -08:00
Eugene Brevdo
8411effdee Add the OneHot op.
Change: 115464229
2016-02-24 15:33:24 -08:00
Vijay Vasudevan
9d84271a20 TensorFlow: Initial support in SimplePlacer for colocation groups,
to be used to colocate based on attributes rather than either
names of ops or devices (op names and devices aren't portable).

A follow up change will add an ops.colocate_with() to Python that adds
this attribute to nodes, and will be used to replace calls to 'with
tf.device(foo.device)' in TF library code, which assumes that devices
have been specified.
Change: 115463464
2016-02-24 15:33:10 -08:00
A. Unique TensorFlower
92383c8754 Fix minor typo in documentation in training_util.py
Change: 115462062
2016-02-24 15:32:59 -08:00
A. Unique TensorFlower
5f4ec004b8 Tests to check linear_optimizer in tf.contrib.
Change: 115419426
2016-02-24 15:32:48 -08:00
A. Unique TensorFlower
94a992cfc3 Add correct dependencies to sdca ops to fix build breakage.
Change: 115408162
2016-02-24 15:32:39 -08:00
Josh Levenberg
185cff7f41 Make core/framework/graph_def_util.h publicly accessible.
Change: 115384748
2016-02-24 15:31:32 -08:00
Josh Levenberg
2408e359cc Give tensorflow/core/kernels/ its own BUILD file.
Change: 115379524
2016-02-24 15:31:20 -08:00