It's backward compatible. Stats of a source code line
are aggregated from all ops created by that line.
An example:
_TFProfRoot (0us/22.44ms)
model_analyzer_test.py:149:run_filename_as_m...:none (0us/22.44ms)
model_analyzer_test.py:33:_run_code_in_main:none (0us/22.44ms)
model_analyzer_test.py:208:<module>:test.main() (0us/22.44ms)
model_analyzer_test.py:132:testComplexCodeView:x = lib.BuildFull... (0us/22.44ms)
model_analyzer_testlib.py:63:BuildFullModel:return sgd_op.min... (0us/21.83ms)
model_analyzer_testlib.py:54:BuildFullModel:seq.append(array_... (0us/254us)
model_analyzer_testlib.py:42:BuildSmallModel:x = nn_ops.conv2d... (0us/134us)
...
model_analyzer_testlib.py:61:BuildFullModel:loss = nn_ops.l2_... (0us/28us)
model_analyzer_test.py:134:testComplexCodeView:sess.run(variable... (0us/0us)
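The aggregation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not tfprof's implementation: the op records, file names, and timings below are hypothetical, and each node in the call tree is identified by the chain of source lines that leads to it.

```python
from collections import defaultdict

# Hypothetical op records: execution time in microseconds, plus the stack of
# source lines (outermost first) that was active when the op was created.
OPS = [
    {"name": "conv2d", "micros": 100,
     "stack": ["test.py:10", "lib.py:54", "lib.py:42"]},
    {"name": "relu", "micros": 34,
     "stack": ["test.py:10", "lib.py:54", "lib.py:42"]},
    {"name": "l2_loss", "micros": 28,
     "stack": ["test.py:10", "lib.py:61"]},
]


def aggregate_by_code_path(ops):
    """Sum op times into every call-tree node on the op's creation stack."""
    totals = defaultdict(int)
    for op in ops:
        stack = op["stack"]
        # Every prefix of the stack is a node in the code-view tree, so the
        # op's time is credited to the creating line and to all its callers.
        for depth in range(1, len(stack) + 1):
            totals[tuple(stack[:depth])] += op["micros"]
    return dict(totals)


totals = aggregate_by_code_path(OPS)
for path, micros in sorted(totals.items()):
    # Indent by depth to mimic the nested view shown above.
    print("  " * (len(path) - 1) + f"{path[-1]} ({micros}us)")
```

A line that creates no ops itself still reports the total of everything created beneath it, which is why the outer frames in the example show the full time.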
Change: 155258346
ClusterSpec propagation is a capability upgrade for TensorFlow that should make
it much easier to (1) build distributed TensorFlow clusters, and (2) handle
node failures. The ClusterSpec propagation capability allows TensorFlow workers
to be booted independently of each other, with no knowledge of one another.
The client can then construct a ClusterDef (ClusterSpec) and send it
to the TF master at session creation. The master in turn propagates the
ClusterDef to all of the workers.
Change: 155159972
Cleanups:
* Remove a deprecated entry point and update callers to use the Status-returning entry point. Rename DoConstantFoldingWithStatus to ConstantFold, now that we have removed the "without Status" API.
* Hide an internal function from the header.
* Move ConstantFoldingOptions into constant_folding.h
Change: 154462306
This is a numerically stable version of tf.log(tf.sigmoid(x)). It's just
-tf.nn.softplus(-x), but it's easy to add and the identity is easy to mistype.
RELNOTES: Add tf.log_sigmoid(x) = tf.log(tf.sigmoid(x)) = -tf.nn.softplus(-x).
Fixes #3719.
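To see why the identity matters numerically, here is a minimal pure-Python sketch (standard library only, no TensorFlow). The helper names `softplus`, `log_sigmoid`, and `naive_log_sigmoid` are illustrative and the softplus formulation is one common stable form, not necessarily the one TensorFlow uses.

```python
import math


def softplus(z):
    """log(1 + exp(z)), arranged so exp() never sees a large positive value."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_sigmoid(x):
    """Stable log(sigmoid(x)) via the identity -softplus(-x)."""
    return -softplus(-x)


def naive_log_sigmoid(x):
    """Direct translation of log(sigmoid(x)); breaks for very negative x."""
    return math.log(1.0 / (1.0 + math.exp(-x)))


# Both forms agree for moderate inputs.
print(log_sigmoid(2.0), naive_log_sigmoid(2.0))

# For very negative inputs the naive form overflows inside exp().
try:
    naive_log_sigmoid(-1000.0)
except OverflowError:
    print("naive form overflowed; stable form gives", log_sigmoid(-1000.0))
```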
Change: 154308666
This change reduces the overhead imposed by string processing and
rendezvous invocation in the DirectSession::Run() call by 1-2 microseconds
per value fed or fetched.
RELNOTES: Improved DirectSession::Run() overhead and error checking. Feeding a value of the wrong type will now synchronously raise an INVALID_ARGUMENT error instead of asynchronously raising an INTERNAL error. Code that depends on the (undefined) behavior when feeding a tensor of the wrong type may need to be updated.
Change: 153797943
This contains TensorFlowInferenceInterface and the Java API, as well as all of the native prebuilt libraries. This means that TF can be integrated into an Android Studio app simply by downloading the AAR file to e.g. aar/, and then adding the following into a gradle build file:
allprojects {
    repositories {
        jcenter()
        flatDir {
            dirs 'aar'
        }
    }
}
dependencies {
    compile(name:'tensorflow', ext:'aar')
}
Change: 153741338
- remove \ from within strings
- remove :0 from inputs and outputs, so fold_constants works
- make sure fold_(old_)batch_norms runs before quantize_weights
and round_weights.
Change: 153728959
Windows: Fix typo in C API binary release script.
(Not that the C API binary release for Windows is ready to release yet)
Linux/Mac: Fix typo in libtensorflow_proto.zip location
Change: 153656805
This adds:
* tf.train.sdca_optimizer
* tf.train.sdca_fprint
* tf.train.sdca_shrink_l1
which were previously documented and, prior to 1.0, lived in tf.sdca.
In 1.0, they were absent from tf.sdca, so this does not break
compatibility.
The module tf.sdca is removed.
Change: 153176548
Other optimizers which have only a single slot variable allow control through
the 'name' constructor parameter, but the FtrlOptimizer has two variables.
Because they both are created with the same 'name' parameter, one of them has
name as a suffix, and the other has name + "_1" as a suffix. This change
allows them to be specified in a more controllable way.
Change: 152802478
This method returns a Python callable that has the same semantics as
`tf.Session.run()`, but can cache some of the work that must be done
to map Tensor-like objects to the arguments of the underlying C API
function.
The initial implementation is optimized for single-`Tensor` and
single-`Operation` fetches, and delegates to `tf.Session.run()` for
handling feeds. Since most queue runners use a single-`Operation`
`run()` call, switch the `tf.train.QueueRunner` implementation to use
`make_callable()`.
Using this new interface can improve the latency of small steps (measurements from my workstation):
* The median time to fetch a 4-byte tensor decreases from 99us to 52us (-47us).
* The median time to run a trivial op decreases from 80us to 31us (-49us).
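The caching idea behind this method can be illustrated with a toy model in plain Python. `ToySession` and its `lookups` counter are hypothetical stand-ins invented for this sketch and are not the TensorFlow API; the point is only that the fetch is resolved once when the callable is built, rather than on every call.

```python
class ToySession:
    """Hypothetical stand-in for a session; not the TensorFlow API."""

    def __init__(self, graph):
        self._graph = graph  # maps a fetch name to a zero-argument function
        self.lookups = 0     # counts how often the mapping work is repeated

    def _resolve(self, fetch):
        # Stands in for mapping a Tensor-like object to low-level arguments.
        self.lookups += 1
        return self._graph[fetch]

    def run(self, fetch):
        # Resolves the fetch on every call.
        return self._resolve(fetch)()

    def make_callable(self, fetch):
        # Resolves the fetch once; the returned callable reuses the result.
        fn = self._resolve(fetch)

        def _call():
            return fn()

        return _call


sess = ToySession({"x": lambda: 42})

step = sess.make_callable("x")
results = [step() for _ in range(3)]
lookups_after_callable = sess.lookups  # one lookup, done at construction

for _ in range(3):
    sess.run("x")
lookups_after_run = sess.lookups  # one additional lookup per run() call
```

In a loop of many small steps, such as a queue runner, the per-call saving is paid back on every iteration.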
Change: 152757301