Merge changes from github.

PiperOrigin-RevId: 201011811
Akshay Modi 2018-06-18 09:57:19 -07:00 committed by TensorFlower Gardener
parent 8ecf506fb8
commit e80732c989
232 changed files with 3340 additions and 906 deletions

View File

@ -90,7 +90,7 @@ Bazel BUILD files also need to include a license section, e.g.,
Changes to TensorFlow C++ code should conform to
[Google C++ Style Guide](https://google.github.io/styleguide/cppguide.html).
Use `clang-tidy` to check your C/C++ changes. To install clang-tidy on ubuntu:16.04, do:
Use `clang-tidy` to check your C/C++ changes. To install `clang-tidy` on ubuntu:16.04, do:
```bash
apt-get install -y clang-tidy
```

View File

@ -56,6 +56,7 @@ $ python
42
>>> sess.close()
```
Learn more about how to do specific tasks in TensorFlow at the [tutorials page of tensorflow.org](https://www.tensorflow.org/tutorials/).
## Contribution guidelines

View File

@ -1,3 +1,62 @@
# Release 1.9.0
## Major Features And Improvements
* Update tf.keras to the Keras 2.1.6 API.
* `tfe.Network` is deprecated. Please inherit from `tf.keras.Model`.
* Adding support for core feature columns and losses to gradient boosted trees estimators.
* The distributions.Bijector API supports broadcasting for Bijectors with new API changes. See [here](https://www.tensorflow.org/versions/r1.9/api_docs/python/tf/distributions/bijectors/Bijector) for more details.
* Layered variable names have changed in the following conditions:
* Using `tf.keras.layers` with custom variable scopes.
* Using `tf.layers` in a subclassed `tf.keras.Model` class. See [here](https://www.tensorflow.org/versions/r1.9/api_docs/python/tf/layers) for more details.
## Breaking Changes
* If you're opening empty variable scopes, replace `variable_scope('', ...)` with `variable_scope(tf.get_variable_scope(), ...)`, as in the sketch below.
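A minimal sketch of the replacement (TF 1.x graph mode):

```python
import tensorflow as tf

v1 = tf.get_variable('v', shape=[])  # lives in the root scope

# Old style (now an error): tf.variable_scope('', reuse=True)
# New style: pass the current VariableScope object instead of ''.
with tf.variable_scope(tf.get_variable_scope(), reuse=True):
    v2 = tf.get_variable('v', shape=[])

assert v1 is v2  # reuse returns the same variable
```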
## Bug Fixes and Other Changes
* `tf.data`:
* The `DatasetBase::DebugString()` method is now `const`.
* Added the `tf.contrib.data.sample_from_datasets()` API for randomly sampling from multiple datasets.
* Eager Execution:
* `tf.keras`:
* Move Keras code out of _impl folder and remove API files.
* `tf.keras.Model.save_weights` now saves in TensorFlow format by default.
* Enable dataset iterators to be passed to `tf.keras.Model` training/eval methods.
* Accelerated Linear Algebra (XLA):
* TensorFlow Debugger (tfdbg): fix an issue in which the TensorBoard Debugger Plugin could not handle a total source file size exceeding the gRPC message size limit (4 MB).
* `tf.contrib`:
* Add `tf.contrib.data.choose_from_datasets()`.
* `tf.contrib.data.make_csv_dataset()` now supports line breaks in quoted strings. Two arguments were removed from `make_csv_dataset`.
* `tf.contrib.framework.zero_initializer` supports ResourceVariable.
* Adding "constrained_optimization" to tensorflow/contrib.
* Other:
* Add GCS Configuration Ops.
* Changing signature of `MakeIterator` to enable propagating error status.
* KL divergence for two Dirichlet distributions.
* More consistent GcsFileSystem behavior for certain reads past EOF.
* Update benchmark for tf.scan to match ranges across eager and graph modes.
* Fixed bug in the `tf.reduce_prod` gradient for complex dtypes.
* Add optional `args` argument to `Dataset.from_generator()`.
* Allow the use of '.' in variables (e.g. "hparams.parse('a.b=1.0')"), which previously raised an error. This corresponds to an attribute name with an embedded '.' (e.g. 'a.b'), which can only be accessed indirectly (e.g. through getattr and setattr). To set this up, the user first needs to explicitly add the variable to the hparam object (e.g. "hparams.add_hparam(name='a.b', value=0.0)"); see the sketch after this list.
* Benchmark for tf.scan in graph and eager modes.
* Added complex128 support to FFT, FFT2D, FFT3D, IFFT, IFFT2D, and IFFT3D.
* Making ids unique in `nn.embedding_lookup_sparse`. This helps to reduce RPC calls for looking up the embeddings when there are repeated ids in the batch.
* Support indicator column in boosted trees.
* Prevent `tf.gradients()` from backpropagating through integer tensors.
* LinearOperator[1D,2D,3D]Circulant added to `tensorflow.linalg`.
* Conv3D, Conv3DBackpropInput, and Conv3DBackpropFilter now support arbitrary.
* Added `tf.train.Checkpoint` for reading/writing object-based checkpoints.
* `Dataset.list_files()` now produces deterministic results when `shuffle=False` or a `seed` is passed.
* Added LinearOperatorKronecker, a dense-free implementation of the Kronecker Product.
* Allow LinearOperator to broadcast.
* SavedModelBuilder will now deduplicate asset names that point to files with the same basename and the same contents. Note that this may result in new asset files included in SavedModels in cases where assets with the same name but different contents were previously overwriting each other.
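As referenced in the hparams item above, a minimal sketch of the '.' support, assuming `tf.contrib.training.HParams` as the hparam object:

```python
from tensorflow.contrib.training import HParams

hparams = HParams()
# A name with an embedded '.' must be added explicitly before parsing.
hparams.add_hparam(name='a.b', value=0.0)
hparams.parse('a.b=1.0')
# Such names can only be accessed indirectly.
print(getattr(hparams, 'a.b'))  # 1.0
```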
## Thanks to our Contributors
This release contains contributions from many people at Google, as well as:
Abdullah Alrasheed, Achal Shah, Ad-530, ADiegoCAlonso, Aditya Yogi, Ag Ramesh, akindyakov, Andy Kernahan, Anya Petrova, Aurelien Geron, Ben, Ben Barsdell, Bhavani-Subramanian, braincodercn, Brett Koonce, Brian Nemsick, Brian Zier, Bryan Heden, candy.dc, cclauss, Clayne Robison, ctiijima, Dalmo Cirne, David Norman, David T.H. Kao, DosLin, ekelsen, Elson Rodriguez, Erik Smistad, Felix Abecassis, Fergal Cotter, fo40225, foo0x29a, Freedom" Koan-Sin Tan, FréDéRic Branchaud-Charron, gdh1995, Geoffrey Irving, Giuseppe, gracehoney, Guido Zuidhof, Guillaume Klein, Guozhong Zhuang, Haggai, Harald Husum, imsheridan, Ivan Zhang, Jan Zikes, Jayaram Bobba, Jesse Benson, Jesse Gumz, Jiajia Li, Jie, jinghuangintel, Jingwen, jjsjann123, Joe Yearsley, Joel Hestness, Joel Shor, josephyearsley, Junpeng Lao, Karol M. Langner, Kb Sriram, krantideep95, Krish Ravindranath, Letian Feng, Loo Rong Jie, Lukas Geiger, Maciej, Mahmoud Abuzaina, ManHyuk, Mark Ryan, mbhuiyan, Michal Turek, Mostafa Alaa, Myungsung Kwak, Nand Dalal, Nehal J Wani, Neil Tenenholtz, ngc92, Nicholas Nadeau, P.Eng., Avs, Niranjan Hasabnis, P-Hidringer, Paul Van Eck, Peng Yu, Qing Zhao, Qingying Chen, Quanlong, Rajendra Arora, Rholais Lii, rmanyari, Robin Richtsfeld, Russell Klopfer, Sagi, Sam Sendelbach, Sandeep N Gupta, Sandip Giri, Sarah Edkins, Scott Tseng, Sdalbsoo, Sergii Khomenko, Seungwoo Choi (Biggie), Seyed Majid Azimi, Shaoning Zeng, shengfuintel, Siu Kei, Muk, Smit Shilu, soonson, Stefan Schweter, Sukhwan Kim, Sunitha Kambhampati, Taehoon Lee, tamimaddari82, Tang, Wenyi, Ted Chang, u2takey, Utkarsh Upadhyay, Vadim Markovtsev, voegtlel, Wai Hon Law, wangsiyu, Wenhao Hu, wenhao.hu, William D. Irons, Yan Facai (颜发才), Yanbo Liang, Yihong Wang, Yilei (Dolee) Yang, Yong Tang, Yuan (Terry) Tang
# Release 1.8.0
## Major Features And Improvements
@ -404,14 +463,6 @@ answered questions, and were part of inspiring discussions.
# Release 1.4.0
## Major Features And Improvements
* `tf.keras` is now part of the core TensorFlow API.
* [`tf.data`](http://tensorflow.org/programmers_guide/datasets) is now part of
the core TensorFlow API.
* The API is now subject to backwards compatibility guarantees.
# Release 1.4.0
## Major Features And Improvements
* `tf.keras` is now part of the core TensorFlow API.
* [`tf.data`](http://tensorflow.org/programmers_guide/datasets) is now part of

View File

@ -1397,6 +1397,10 @@ def set_grpc_build_flags():
write_to_bazelrc('build --define grpc_no_ares=true')
def set_build_strip_flag():
write_to_bazelrc('build --strip=always')
def set_windows_build_flags():
if is_windows():
# The non-monolithic build is not supported yet
@ -1519,6 +1523,7 @@ def main():
set_grpc_build_flags()
set_cc_opt_flags(environ_cp)
set_build_strip_flag()
set_windows_build_flags()
if get_var(

View File

@ -475,7 +475,7 @@ tf_cc_shared_object(
# excludes all but a subset of function names.
# On MacOS, the linker does not support version_script, but has an
# "-exported_symbols_list" command. -z defs disallows undefined
# symbols in object files and -s strips the output.
# symbols in object files.
tf_cc_shared_object(
name = "libtensorflow.so",
@ -489,7 +489,6 @@ tf_cc_shared_object(
"//tensorflow:windows_msvc": [],
"//conditions:default": [
"-z defs",
"-s",
"-Wl,--version-script", # This line must be directly followed by the version_script.lds file
"$(location //tensorflow/c:version_script.lds)",
],
@ -515,7 +514,6 @@ tf_cc_shared_object(
"//tensorflow:windows_msvc": [],
"//conditions:default": [
"-z defs",
"-s",
"-Wl,--version-script", # This line must be directly followed by the version_script.lds file
"$(location //tensorflow:tf_version_script.lds)",
],

View File

@ -15,10 +15,12 @@
# ==============================================================================
TF_PREFIX='/usr/local'
LIBDIR='lib'
usage() {
echo "Usage: $0 OPTIONS"
echo -e "-p, --prefix\tset installation prefix (default: /usr/local)"
echo -e "-l, --libdir\tset lib directory (default: lib)"
echo -e "-v, --version\tset TensorFlow version"
echo -e "-h, --help\tdisplay this message"
}
@ -26,7 +28,7 @@ usage() {
[ $# == 0 ] && usage && exit 0
# read the options
ARGS=$(getopt -o p:v:h --long prefix:,version:,help -n $0 -- "$@")
ARGS=$(getopt -o p:l:v:h --long prefix:,libdir:,version:,help -n $0 -- "$@")
eval set -- "$ARGS"
# extract options and their arguments into variables.
@ -38,6 +40,11 @@ while true ; do
"") shift 2 ;;
*) TF_PREFIX=$2 ; shift 2 ;;
esac ;;
-l|--libdir)
case "$2" in
"") shift 2 ;;
*) LIBDIR=$2 ; shift 2 ;;
esac ;;
-v|--version)
case "$2" in
"") shift 2 ;;
@ -55,7 +62,7 @@ echo "Generating pkgconfig file for TensorFlow $TF_VERSION in $TF_PREFIX"
cat << EOF > tensorflow.pc
prefix=${TF_PREFIX}
exec_prefix=\${prefix}
libdir=\${exec_prefix}/lib
libdir=\${exec_prefix}/${LIBDIR}
includedir=\${prefix}/include
Name: TensorFlow

View File

@ -38,6 +38,7 @@ REGISTER_NO_GRADIENT_OP("NotEqual");
REGISTER_NO_GRADIENT_OP("LogicalAnd");
REGISTER_NO_GRADIENT_OP("LogicalOr");
REGISTER_NO_GRADIENT_OP("LogicalNot");
REGISTER_NO_GRADIENT_OP("Floor");
// Conjugate helper function returns the conjugate of an Output if it
// is complex valued.

View File

@ -255,6 +255,53 @@ Status LRNGradHelper(const Scope& scope, const Operation& op,
}
REGISTER_GRADIENT_OP("LRN", LRNGradHelper);
Status SoftplusGradHelper(const Scope& scope, const Operation& op,
const std::vector<Output>& grad_inputs,
std::vector<Output>* grad_outputs) {
auto dx = internal::SoftplusGrad(scope, grad_inputs[0], op.input(0));
grad_outputs->push_back(dx);
return scope.status();
}
REGISTER_GRADIENT_OP("Softplus", SoftplusGradHelper);
Status SoftsignGradHelper(const Scope& scope, const Operation& op,
const std::vector<Output>& grad_inputs,
std::vector<Output>* grad_outputs) {
auto dx = internal::SoftsignGrad(scope, grad_inputs[0], op.input(0));
grad_outputs->push_back(dx);
return scope.status();
}
REGISTER_GRADIENT_OP("Softsign", SoftsignGradHelper);
Status FractionalAvgPoolGradHelper(const Scope& scope, const Operation& op,
const std::vector<Output>& grad_inputs,
std::vector<Output>* grad_outputs) {
bool overlapping;
TF_RETURN_IF_ERROR(
GetNodeAttr(op.output(0).node()->attrs(), "overlapping", &overlapping));
auto dx = internal::FractionalAvgPoolGrad(
scope, Shape(scope, op.input(0), Shape::OutType(DT_INT64)),
grad_inputs[0], op.output(1), op.output(2),
internal::FractionalAvgPoolGrad::Overlapping(overlapping));
grad_outputs->push_back(dx);
return scope.status();
}
REGISTER_GRADIENT_OP("FractionalAvgPool", FractionalAvgPoolGradHelper);
Status FractionalMaxPoolGradHelper(const Scope& scope, const Operation& op,
const std::vector<Output>& grad_inputs,
std::vector<Output>* grad_outputs) {
bool overlapping;
TF_RETURN_IF_ERROR(
GetNodeAttr(op.output(0).node()->attrs(), "overlapping", &overlapping));
auto dx = internal::FractionalMaxPoolGrad(
scope, op.input(0), op.output(0), grad_inputs[0], op.output(1),
op.output(2), internal::FractionalMaxPoolGrad::Overlapping(overlapping));
grad_outputs->push_back(dx);
return scope.status();
}
REGISTER_GRADIENT_OP("FractionalMaxPool", FractionalMaxPoolGradHelper);
} // anonymous namespace
} // namespace ops
} // namespace tensorflow

View File

@ -28,6 +28,8 @@ namespace {
using ops::BiasAdd;
using ops::Conv2D;
using ops::Elu;
using ops::FractionalAvgPool;
using ops::FractionalMaxPool;
using ops::L2Loss;
using ops::LogSoftmax;
using ops::LRN;
@ -41,6 +43,8 @@ using ops::Relu;
using ops::Relu6;
using ops::Selu;
using ops::Softmax;
using ops::Softplus;
using ops::Softsign;
class NNGradTest : public ::testing::Test {
protected:
@ -71,22 +75,30 @@ class NNGradTest : public ::testing::Test {
EXPECT_LT(max_error, 1e-3);
}
// Sets tensor with random values, ensuring that the max value is largest by
// a reasonable amount.
// This is an issue for MaxPool, MaxPoolV2 and MaxPool3D, in which
// perturbations by the numeric gradient computation in the gradient checker
// can change the max value if values are too close together.
// Sets tensor with random values, ensuring that every pair of elements are at
// least a reasonable amount apart.
// This is an issue for max pooling operations, in which perturbations by the
// numeric gradient computation in the gradient checker can change the max
// value if a pool has values that are too close together.
template <typename T>
void SetRandomValuesWithBumpedMax(Tensor* tensor) {
void SetRandomValuesForMaxPooling(Tensor* tensor) {
auto tensor_flat = tensor->flat<T>();
tensor_flat.setRandom();
int32 max_index = 0;
for (size_t i = 1; i < tensor->NumElements(); i++) {
if (tensor_flat(i) > tensor_flat(max_index)) {
max_index = i;
}
// First set the array to an increasing sequence of values spaced
// a reasonable amount apart
T cur = 0;
for (size_t i = 0; i < tensor->NumElements(); i++) {
tensor_flat(i) = cur;
cur += 5e-2;
}
// Fisher-Yates shuffle the array
for (size_t i = tensor->NumElements() - 1; i >= 1; i--) {
// j <- random integer 0 <= j <= i
size_t j = random::New64() % (i + 1);
// swap values at i, j
T tmp = tensor_flat(i);
tensor_flat(i) = tensor_flat(j);
tensor_flat(j) = tmp;
}
tensor_flat(max_index) += 1e-2;
}
Scope scope_;
@ -189,7 +201,7 @@ TEST_F(NNGradTest, MaxPoolGradHelper) {
const std::vector<int> strides{1, 2, 2, 1};
auto y = MaxPool(scope_, x, ksize, strides, "VALID");
Tensor x_init_value = Tensor(DT_FLOAT, x_shape);
SetRandomValuesWithBumpedMax<float>(&x_init_value);
SetRandomValuesForMaxPooling<float>(&x_init_value);
RunTest(x, x_init_value, y, y_shape);
}
@ -202,7 +214,7 @@ TEST_F(NNGradTest, MaxPoolGradV2Helper) {
Tensor strides = test::AsTensor<int>({1, 2, 2, 1}, {4});
auto y = MaxPoolV2(scope_, x, ksize, strides, "VALID");
Tensor x_init_value = Tensor(DT_FLOAT, x_shape);
SetRandomValuesWithBumpedMax<float>(&x_init_value);
SetRandomValuesForMaxPooling<float>(&x_init_value);
RunTest(x, x_init_value, y, y_shape);
}
@ -215,7 +227,7 @@ TEST_F(NNGradTest, MaxPool3DGradHelper) {
const std::vector<int> strides{1, 3, 3, 3, 1};
auto y = MaxPool3D(scope_, x, ksize, strides, "VALID");
Tensor x_init_value = Tensor(DT_FLOAT, x_shape);
SetRandomValuesWithBumpedMax<float>(&x_init_value);
SetRandomValuesForMaxPooling<float>(&x_init_value);
RunTest(x, x_init_value, y, y_shape);
}
@ -248,5 +260,45 @@ TEST_F(NNGradTest, LRN){
RunTest(x, x_shape, y, x_shape);
}
TEST_F(NNGradTest, SoftplusGrad) {
TensorShape shape({3, 7});
auto x = Placeholder(scope_, DT_FLOAT, Placeholder::Shape(shape));
auto y = Softplus(scope_, x);
RunTest(x, shape, y, shape);
}
TEST_F(NNGradTest, SoftsignGrad) {
TensorShape shape({3, 7});
auto x = Placeholder(scope_, DT_FLOAT, Placeholder::Shape(shape));
auto y = Softsign(scope_, x);
RunTest(x, shape, y, shape);
}
TEST_F(NNGradTest, FractionalAvgPoolGradHelper) {
TensorShape x_shape({1, 3, 7, 1});
auto x = Placeholder(scope_, DT_FLOAT, Placeholder::Shape(x_shape));
// Force consistent pooling regions for unit testing.
auto y = FractionalAvgPool(
scope_, x, {1, 1.2, 1.9, 1},
FractionalAvgPool::Deterministic(true).Overlapping(true).Seed(1).Seed2(
2));
TensorShape y_shape({1, 2, 3, 1});
RunTest(x, x_shape, y.output, y_shape);
}
TEST_F(NNGradTest, FractionalMaxPoolGradHelper) {
TensorShape x_shape({1, 3, 7, 1});
auto x = Placeholder(scope_, DT_FLOAT, Placeholder::Shape(x_shape));
// Force consistent pooling regions for unit testing.
auto y = FractionalMaxPool(
scope_, x, {1, 1.2, 1.9, 1},
FractionalMaxPool::Deterministic(true).Overlapping(true).Seed(1).Seed2(
2));
Tensor x_init_value = Tensor(DT_FLOAT, x_shape);
SetRandomValuesForMaxPooling<float>(&x_init_value);
TensorShape y_shape({1, 2, 3, 1});
RunTest(x, x_init_value, y.output, y_shape);
}
} // namespace
} // namespace tensorflow

View File

@ -56,9 +56,9 @@ namespace bar {
//
// Memory stats:
// arg bytes total: 104
// arg bytes aligned: 128
// arg bytes aligned: 192
// temp bytes total: 126
// temp bytes aligned: 224
// temp bytes aligned: 320
class MyClass : public tensorflow::XlaCompiledCpuFunction {
public:
// Number of input arguments for the compiled computation.

View File

@ -71,7 +71,7 @@ struct ProtobufToEmbed {
const ::tensorflow::protobuf::MessageLite* message;
};
// Embeds a a sequence of protocol buffers into an object file.
// Embeds a sequence of protocol buffers into an object file.
//
// `target_triple` is the target triple for the target architecture for the
// generated object file.

View File

@ -25,8 +25,8 @@ namespace tensorflow {
namespace tfcompile {
namespace runtime {
// Align to 32-bytes, to mimic tensorflow::Allocator::kAllocatorAlignment.
static constexpr size_t kAlign = 32;
// Align to 64-bytes, to mimic tensorflow::Allocator::kAllocatorAlignment.
static constexpr size_t kAlign = 64;
// aligned_buffer_bytes returns the sum of each size in `sizes`, skipping -1
// values. There are `n` entries in `sizes`. Each buffer is aligned to kAlign

View File

@ -24,7 +24,7 @@ namespace runtime {
namespace {
TEST(Runtime, AlignmentValue) {
// We've chosen 32 byte alignment for the tfcompile runtime to mimic the
// We've chosen 64 byte alignment for the tfcompile runtime to mimic the
// regular tensorflow allocator, which was chosen to play nicely with Eigen.
// The tfcompile runtime also has a requirement that comes from the xla
// generated code, on the relation: buffer_size >= 16 ? 2 * sizeof(void*) : 8
@ -39,13 +39,13 @@ TEST(Runtime, AlignedBufferBytes) {
EXPECT_EQ(aligned_buffer_bytes(sizesA, 1), 0);
static constexpr intptr_t sizesB[1] = {3};
EXPECT_EQ(aligned_buffer_bytes(sizesB, 1), 32);
EXPECT_EQ(aligned_buffer_bytes(sizesB, 1), 64);
static constexpr intptr_t sizesC[1] = {32};
EXPECT_EQ(aligned_buffer_bytes(sizesC, 1), 32);
EXPECT_EQ(aligned_buffer_bytes(sizesC, 1), 64);
static constexpr intptr_t sizesD[7] = {1, -1, 32, -1, 64, 2, 3};
EXPECT_EQ(aligned_buffer_bytes(sizesD, 7), 192);
EXPECT_EQ(aligned_buffer_bytes(sizesD, 7), 320);
}
void* add_ptr(void* base, uintptr_t delta) {
@ -101,11 +101,11 @@ TEST(Runtime, MallocFreeContiguousBuffers) {
EXPECT_NE(base, nullptr);
EXPECT_EQ(bufD[0], add_ptr(base, 0));
EXPECT_EQ(bufD[1], nullptr);
EXPECT_EQ(bufD[2], add_ptr(base, 32));
EXPECT_EQ(bufD[2], add_ptr(base, 64));
EXPECT_EQ(bufD[3], nullptr);
EXPECT_EQ(bufD[4], add_ptr(base, 64));
EXPECT_EQ(bufD[5], add_ptr(base, 128));
EXPECT_EQ(bufD[6], add_ptr(base, 160));
EXPECT_EQ(bufD[4], add_ptr(base, 128));
EXPECT_EQ(bufD[5], add_ptr(base, 192));
EXPECT_EQ(bufD[6], add_ptr(base, 256));
for (int i = 0; i < 7; ++i) {
const intptr_t size = sizesD[i];
if (size != -1) {

View File

@ -178,6 +178,7 @@ cc_library(
":runtime_matmul",
":runtime_matmul_mkl",
":runtime_single_threaded_conv2d",
":runtime_single_threaded_fft",
":runtime_single_threaded_matmul",
"@llvm//:execution_engine",
"@llvm//:core",
@ -516,7 +517,6 @@ cc_library(
deps = [
"//tensorflow/compiler/xla:executable_run_options",
"//tensorflow/compiler/xla:xla_data_proto",
"//tensorflow/core:framework",
"//tensorflow/core:framework_lite",
"//third_party/eigen3",
],
@ -578,6 +578,22 @@ cc_library(
],
)
cc_library(
name = "runtime_single_threaded_fft",
srcs = [
"runtime_fft_impl.h",
"runtime_single_threaded_fft.cc",
],
hdrs = ["runtime_single_threaded_fft.h"],
copts = runtime_copts(),
visibility = ["//visibility:public"],
deps = [
"//tensorflow/compiler/xla:xla_data_proto",
"//tensorflow/core:framework_lite",
"//third_party/eigen3",
],
)
cc_library(
name = "runtime_single_threaded_matmul",
srcs = ["runtime_single_threaded_matmul.cc"],

View File

@ -51,6 +51,8 @@ extern const char* const kEigenConvF16SymbolName =
extern const char* const kEigenConvF32SymbolName =
"__xla_cpu_runtime_EigenConvF32";
extern const char* const kEigenFftSymbolName = "__xla_cpu_runtime_EigenFft";
extern const char* const kEigenSingleThreadedFftSymbolName =
"__xla_cpu_runtime_EigenSingleThreadedFft";
extern const char* const kEigenSingleThreadedMatMulF16SymbolName =
"__xla_cpu_runtime_EigenSingleThreadedMatMulF16";
extern const char* const kEigenSingleThreadedMatMulF32SymbolName =

View File

@ -52,6 +52,7 @@ extern const char* const kMKLSingleThreadedMatMulF64SymbolName;
extern const char* const kEigenConvF16SymbolName;
extern const char* const kEigenConvF32SymbolName;
extern const char* const kEigenFftSymbolName;
extern const char* const kEigenSingleThreadedFftSymbolName;
extern const char* const kEigenSingleThreadedMatMulF16SymbolName;
extern const char* const kEigenSingleThreadedMatMulF32SymbolName;
extern const char* const kEigenSingleThreadedMatMulF64SymbolName;

View File

@ -1172,7 +1172,13 @@ Status IrEmitter::HandleFft(HloInstruction* fft) {
{int8_ptr_type, int8_ptr_type, int8_ptr_type, int32_type, int32_type,
int64_type, int64_type, int64_type, int64_type},
/*isVarArg=*/false);
const char* fn_name = runtime::kEigenFftSymbolName;
bool multi_threaded_eigen =
hlo_module_config_.debug_options().xla_cpu_multi_thread_eigen();
const char* fn_name = multi_threaded_eigen
? runtime::kEigenFftSymbolName
: runtime::kEigenSingleThreadedFftSymbolName;
llvm::Function* fft_func = llvm::cast<llvm::Function>(
module_->getOrInsertFunction(fn_name, fft_type));
fft_func->setCallingConv(llvm::CallingConv::C);

View File

@ -21,8 +21,6 @@ limitations under the License.
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/compiler/xla/xla_data.pb.h"
#include "tensorflow/core/framework/numeric_types.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/platform/types.h"
// 'tensorflow' namespace is used so that int64 and other types don't require
@ -71,11 +69,9 @@ void EigenFftR2C(const EigenDevice& device, complex64* out, float* operand,
in_dims[0] = input_batch;
Eigen::DSizes<Eigen::DenseIndex, FFTRank + 1> out_dims;
out_dims[0] = input_batch;
TensorShape temp_shape{input_batch};
for (int i = 0; i < FFTRank; i++) {
in_dims[i + 1] = fft_shape[i];
out_dims[i + 1] = i == FFTRank - 1 ? fft_shape[i] / 2 + 1 : fft_shape[i];
temp_shape.AddDim(fft_shape[i]);
}
const Eigen::TensorMap<Eigen::Tensor<float, FFTRank + 1, Eigen::RowMajor>,
Eigen::Aligned>
@ -88,8 +84,8 @@ void EigenFftR2C(const EigenDevice& device, complex64* out, float* operand,
const auto axes = Eigen::ArrayXi::LinSpaced(FFTRank, 1, FFTRank);
// Compute the full FFT using a temporary tensor.
Tensor temp(DataTypeToEnum<complex64>::v(), temp_shape);
auto full_fft = temp.flat_inner_dims<complex64, FFTRank + 1>();
Eigen::Tensor<complex64, FFTRank + 1, Eigen::RowMajor> full_fft(in_dims);
const Eigen::DSizes<Eigen::DenseIndex, FFTRank + 1> zero_start_indices;
full_fft.device(device) =
input.template fft<Eigen::BothParts, Eigen::FFT_FORWARD>(axes);
@ -112,11 +108,9 @@ void EigenFftC2R(const EigenDevice& device, float* out, complex64* operand,
in_dims[0] = input_batch;
Eigen::DSizes<Eigen::DenseIndex, FFTRank + 1> out_dims;
out_dims[0] = input_batch;
TensorShape temp_shape{input_batch};
for (int i = 0; i < FFTRank; i++) {
in_dims[i + 1] = i == FFTRank - 1 ? fft_shape[i] / 2 + 1 : fft_shape[i];
out_dims[i + 1] = fft_shape[i];
temp_shape.AddDim(fft_shape[i]);
}
const Eigen::TensorMap<Eigen::Tensor<complex64, FFTRank + 1, Eigen::RowMajor>,
Eigen::Aligned>
@ -129,8 +123,7 @@ void EigenFftC2R(const EigenDevice& device, float* out, complex64* operand,
// region we will slice from input given fft_shape. We slice input to
// fft_shape on its inner-most dimensions, except the last (which we
// slice to fft_shape[-1] / 2 + 1).
Tensor temp(DataTypeToEnum<complex64>::v(), temp_shape);
auto full_fft = temp.flat_inner_dims<complex64, FFTRank + 1>();
Eigen::Tensor<complex64, FFTRank + 1, Eigen::RowMajor> full_fft(out_dims);
// Calculate the starting point and range of the source of
// negative frequency part.
@ -179,7 +172,6 @@ template <int FFTRank, typename EigenDevice>
void EigenFftWithRank(const EigenDevice& device, void* out, void* operand,
int32 fft_type, int64 input_batch, int64 fft_length0,
int64 fft_length1, int64 fft_length2) {
CHECK(::xla::FftType_IsValid(fft_type)) << fft_type;
switch (fft_type) {
case ::xla::FftType::FFT:
EigenFftC2C<true, FFTRank, EigenDevice>(
@ -204,7 +196,8 @@ void EigenFftWithRank(const EigenDevice& device, void* out, void* operand,
input_batch, fft_length0, fft_length1, fft_length2);
break;
default:
LOG(FATAL) << "Unsupported FFT type: " << fft_type;
// Unsupported FFT type
abort();
}
}
@ -230,7 +223,8 @@ void EigenFftImpl(const EigenDevice& device, void* out, void* operand,
fft_length1, fft_length2);
break;
default:
LOG(FATAL) << "Unsupported FFT rank " << fft_rank;
// Unsupported FFT rank
abort();
}
}

View File

@ -0,0 +1,32 @@
/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/compiler/xla/service/cpu/runtime_single_threaded_fft.h"
#include "tensorflow/compiler/xla/service/cpu/runtime_fft_impl.h"
#include "tensorflow/core/platform/dynamic_annotations.h"
#include "tensorflow/core/platform/types.h"
using tensorflow::int32;
using tensorflow::int64;
TF_ATTRIBUTE_NO_SANITIZE_MEMORY void __xla_cpu_runtime_EigenSingleThreadedFft(
const void* run_options_ptr, void* out, void* operand, int32 fft_type,
int32 fft_rank, int64 input_batch, int64 fft_length0, int64 fft_length1,
int64 fft_length2) {
tensorflow::xla::EigenFftImpl(Eigen::DefaultDevice(), out, operand, fft_type,
fft_rank, input_batch, fft_length0, fft_length1,
fft_length2);
}

View File

@ -0,0 +1,31 @@
/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef TENSORFLOW_COMPILER_XLA_SERVICE_CPU_RUNTIME_SINGLE_THREADED_FFT_H_
#define TENSORFLOW_COMPILER_XLA_SERVICE_CPU_RUNTIME_SINGLE_THREADED_FFT_H_
#include "tensorflow/core/platform/types.h"
extern "C" {
extern void __xla_cpu_runtime_EigenSingleThreadedFft(
const void* /* xla::ExecutableRunOptions* */ run_options_ptr, void* out,
void* operand, tensorflow::int32 fft_type, tensorflow::int32 fft_rank,
tensorflow::int64 input_batch, tensorflow::int64 fft_length0,
tensorflow::int64 fft_length1, tensorflow::int64 fft_length2);
} // extern "C"
#endif // TENSORFLOW_COMPILER_XLA_SERVICE_CPU_RUNTIME_SINGLE_THREADED_FFT_H_

View File

@ -38,6 +38,7 @@ limitations under the License.
#include "tensorflow/compiler/xla/service/cpu/runtime_matmul.h"
#include "tensorflow/compiler/xla/service/cpu/runtime_matmul_mkl.h"
#include "tensorflow/compiler/xla/service/cpu/runtime_single_threaded_conv2d.h"
#include "tensorflow/compiler/xla/service/cpu/runtime_single_threaded_fft.h"
#include "tensorflow/compiler/xla/service/cpu/runtime_single_threaded_matmul.h"
#include "tensorflow/compiler/xla/service/cpu/windows_compatibility.h"
#include "tensorflow/compiler/xla/types.h"
@ -202,6 +203,7 @@ bool RegisterKnownJITSymbols() {
REGISTER_CPU_RUNTIME_SYMBOL(MKLSingleThreadedMatMulF64);
REGISTER_CPU_RUNTIME_SYMBOL(EigenSingleThreadedConvF16);
REGISTER_CPU_RUNTIME_SYMBOL(EigenSingleThreadedConvF32);
REGISTER_CPU_RUNTIME_SYMBOL(EigenSingleThreadedFft);
REGISTER_CPU_RUNTIME_SYMBOL(EigenSingleThreadedMatMulF16);
REGISTER_CPU_RUNTIME_SYMBOL(EigenSingleThreadedMatMulF32);
REGISTER_CPU_RUNTIME_SYMBOL(EigenSingleThreadedMatMulF64);

View File

@ -204,7 +204,7 @@ class LayoutPattern {
// Modifies the pattern to match only if the layout equals the given proto.
// The layout must outlive the returned pattern.
constexpr LayoutPattern<LayoutType, LayoutPatternEqualImpl<Impl>> EqualTo(
const Layout* layout) const {
const ::xla::Layout* layout) const {
return LayoutPattern<LayoutType, LayoutPatternEqualImpl<Impl>>(
LayoutPatternEqualImpl<Impl>(impl_, layout), matched_layout_);
}

View File

@ -30,10 +30,17 @@ limitations under the License.
namespace xla {
TupleSimplifier::TupleSimplifier(bool exclude_entry_computation) :
exclude_entry_computation_(exclude_entry_computation) {}
StatusOr<bool> TupleSimplifier::Run(HloModule* module) {
// Initially add all GTE and Tuple instructions to the worklist.
std::queue<HloInstruction*> worklist;
for (auto* computation : module->computations()) {
if (exclude_entry_computation_ &&
computation == module->entry_computation()) {
continue;
}
for (auto* instruction : computation->instructions()) {
if (instruction->opcode() == HloOpcode::kTuple ||
instruction->opcode() == HloOpcode::kGetTupleElement) {

View File

@ -27,13 +27,20 @@ namespace xla {
// the module.
class TupleSimplifier : public HloPassInterface {
public:
TupleSimplifier() {}
TupleSimplifier() : TupleSimplifier(/*exclude_entry_computation=*/false) {}
explicit TupleSimplifier(bool exclude_entry_computation);
~TupleSimplifier() override {}
tensorflow::StringPiece name() const override { return "tuple-simplifier"; }
// Run tuple simplification on the given computation. Returns whether the
// computation was changed.
StatusOr<bool> Run(HloModule* module) override;
private:
// When set, this pipeline stage will perform optimization of all computations
// apart from the module's entry computation. This is used by Graphcore's
// backend.
bool exclude_entry_computation_;
};
} // namespace xla

View File

@ -42,6 +42,12 @@ class TupleSimplifierTest : public HloTestBase {
TF_ASSERT_OK(changed_status.status());
EXPECT_EQ(change_expected, changed_status.ValueOrDie());
}
void Run(HloModule* module, bool change_expected, bool exclude_entry) {
TupleSimplifier simplifier(exclude_entry);
auto changed_status = simplifier.Run(module);
TF_ASSERT_OK(changed_status.status());
EXPECT_EQ(change_expected, changed_status.ValueOrDie());
}
const Shape scalar_shape_ = ShapeUtil::MakeShape(F32, {});
const Shape tuple_shape_ = ShapeUtil::MakeTupleShape(
@ -211,5 +217,76 @@ TEST_F(TupleSimplifierTest, IncompatibleTuples) {
EXPECT_THAT(computation->root_instruction(), tuple);
}
TEST_F(TupleSimplifierTest, CanExcludeEntryComputation) {
// Verify that the root computation can be excluded
auto module = CreateNewModule();
HloInstruction* p0;
HloInstruction* p1;
HloComputation* c0;
HloComputation* c1;
HloComputation* entry;
{
HloComputation::Builder builder(TestName() + "_1");
p0 = builder.AddInstruction(
HloInstruction::CreateParameter(0, tuple_shape_, "param"));
HloInstruction* gte0 = builder.AddInstruction(
HloInstruction::CreateGetTupleElement(scalar_shape_, p0, 0));
HloInstruction* gte1 = builder.AddInstruction(
HloInstruction::CreateGetTupleElement(scalar_shape_, p0, 1));
HloInstruction* gte2 = builder.AddInstruction(
HloInstruction::CreateGetTupleElement(scalar_shape_, p0, 2));
builder.AddInstruction(HloInstruction::CreateTuple({gte0, gte1, gte2}));
c0 = module->AddEmbeddedComputation(builder.Build());
}
{
HloComputation::Builder builder(TestName() + "_2");
p1 = builder.AddInstruction(
HloInstruction::CreateParameter(0, tuple_shape_, "param"));
HloInstruction* gte0 = builder.AddInstruction(
HloInstruction::CreateGetTupleElement(scalar_shape_, p1, 0));
HloInstruction* gte1 = builder.AddInstruction(
HloInstruction::CreateGetTupleElement(scalar_shape_, p1, 1));
HloInstruction* gte2 = builder.AddInstruction(
HloInstruction::CreateGetTupleElement(scalar_shape_, p1, 2));
builder.AddInstruction(HloInstruction::CreateTuple({gte0, gte1, gte2}));
c1 = module->AddEmbeddedComputation(builder.Build());
}
{
HloComputation::Builder builder(TestName() + "_Entry");
HloInstruction* tuple_param = builder.AddInstruction(
HloInstruction::CreateParameter(0, tuple_shape_, "param"));
HloInstruction* call0 = builder.AddInstruction(
HloInstruction::CreateCall(tuple_shape_, {tuple_param}, c0));
HloInstruction* call1 = builder.AddInstruction(
HloInstruction::CreateCall(tuple_shape_, {tuple_param}, c1));
HloInstruction* gte0 = builder.AddInstruction(
HloInstruction::CreateGetTupleElement(scalar_shape_, call0, 0));
HloInstruction* gte1 = builder.AddInstruction(
HloInstruction::CreateGetTupleElement(scalar_shape_, call1, 1));
HloInstruction* tuple0 =
builder.AddInstruction(HloInstruction::CreateTuple({gte0, gte1}));
HloInstruction* gte2 = builder.AddInstruction(
HloInstruction::CreateGetTupleElement(scalar_shape_, tuple0, 0));
HloInstruction* gte3 = builder.AddInstruction(
HloInstruction::CreateGetTupleElement(scalar_shape_, tuple0, 1));
builder.AddInstruction(HloInstruction::CreateTuple({gte2, gte3}));
entry = module->AddEntryComputation(builder.Build());
}
Run(module.get(), /*change_expected=*/true, /*exclude_entry=*/ true);
EXPECT_THAT(c0->root_instruction(), p0);
EXPECT_THAT(c1->root_instruction(), p1);
EXPECT_THAT(entry->instruction_count(), 9);
}
} // namespace
} // namespace xla

View File

@ -23,6 +23,7 @@ from __future__ import print_function
# TODO(mdan): Bring only the relevant symbols to the top level.
from tensorflow.contrib.autograph import utils
from tensorflow.contrib.autograph import operators
from tensorflow.contrib.autograph.impl.api import convert
from tensorflow.contrib.autograph.impl.api import converted_call
from tensorflow.contrib.autograph.impl.api import do_not_convert
@ -43,6 +44,8 @@ _allowed_symbols = [
'do_not_convert',
'to_code',
'to_graph',
# Overloaded operators
'operators',
# Special functions and directives
'set_element_type',
'set_loop_options',

View File

@ -37,13 +37,15 @@ add_dependencies(
tf_core_lib
tf_protos_cc)
add_library(tf_c_python_api OBJECT
"${tensorflow_source_dir}/tensorflow/c/python_api.cc"
"${tensorflow_source_dir}/tensorflow/c/python_api.h"
)
add_dependencies(
tf_c_python_api
tf_c
tf_core_lib
tf_core_framework
tf_protos_cc)
if(tensorflow_BUILD_PYTHON_BINDINGS)
add_library(tf_c_python_api OBJECT
"${tensorflow_source_dir}/tensorflow/c/python_api.cc"
"${tensorflow_source_dir}/tensorflow/c/python_api.h"
)
add_dependencies(
tf_c_python_api
tf_c
tf_core_lib
tf_core_framework
tf_protos_cc)
endif()

View File

@ -155,7 +155,7 @@ if (WIN32)
set (pywrap_tensorflow_lib "${CMAKE_CURRENT_BINARY_DIR}/pywrap_tensorflow_internal.lib")
endif()
else (WIN32)
set (pywrap_tensorflow_lib "${CMAKE_CURRENT_BINARY_DIR}/libpywrap_tensorflow_internal.so")
set (pywrap_tensorflow_lib "${CMAKE_CURRENT_BINARY_DIR}/libpywrap_tensorflow_internal${CMAKE_SHARED_LIBRARY_SUFFIX}")
endif (WIN32)
add_custom_target(tf_extension_ops)

View File

@ -715,7 +715,7 @@ if(WIN32)
endif()
else()
add_custom_command(TARGET pywrap_tensorflow_internal POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_BINARY_DIR}/libpywrap_tensorflow_internal.so
COMMAND ${CMAKE_COMMAND} -E copy ${CMAKE_CURRENT_BINARY_DIR}/libpywrap_tensorflow_internal${CMAKE_SHARED_LIBRARY_SUFFIX}
${CMAKE_CURRENT_BINARY_DIR}/tf_python/tensorflow/python/_pywrap_tensorflow_internal.so)
endif()
@ -832,7 +832,6 @@ add_custom_command(TARGET tf_python_build_pip_package POST_BUILD
add_custom_command(TARGET tf_python_copy_scripts_to_destination PRE_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${tensorflow_source_dir}/tensorflow/contrib/testing/python/framework/util_test.py
${CMAKE_CURRENT_BINARY_DIR}/tf_python/tensorflow/contrib/testing/python/framework/)
add_custom_command(TARGET tf_python_build_pip_package POST_BUILD
COMMAND ${CMAKE_COMMAND} -E copy ${tensorflow_source_dir}/tensorflow/tools/pip_package/README
${CMAKE_CURRENT_BINARY_DIR}/tf_python/)

View File

@ -44,7 +44,8 @@ UNDNAME = "undname.exe"
DUMPBIN = "dumpbin.exe"
# Exclude if matched
EXCLUDE_RE = re.compile(r"RTTI|deleting destructor|::internal::")
EXCLUDE_RE = re.compile(r"RTTI|deleting destructor|::internal::|Internal|"
r"python_op_gen_internal|grappler")
# Include if matched before exclude
INCLUDEPRE_RE = re.compile(r"google::protobuf::internal::ExplicitlyConstructed|"
@ -56,6 +57,10 @@ INCLUDEPRE_RE = re.compile(r"google::protobuf::internal::ExplicitlyConstructed|"
r"tensorflow::ops::internal::Enter|"
r"tensorflow::strings::internal::AppendPieces|"
r"tensorflow::strings::internal::CatPieces|"
r"tensorflow::errors::Internal|"
r"tensorflow::Tensor::CopyFromInternal|"
r"tensorflow::kernel_factory::"
r"OpKernelRegistrar::InitInternal|"
r"tensorflow::io::internal::JoinPathImpl")
# Include if matched after exclude
@ -64,7 +69,7 @@ INCLUDE_RE = re.compile(r"^(TF_\w*)$|"
r"tensorflow::|"
r"functor::|"
r"\?nsync_|"
r"perftools::gputools")
r"stream_executor::")
# We want to identify data members explicitly in the DEF file, so that no one
# can implicitly link against the DLL if they use one of the variables exported

View File

@ -151,16 +151,24 @@ class SinhArcsinhBijectorTest(test.TestCase):
self.assertAllClose(y, bijector.forward(x).eval(), rtol=1e-4, atol=0.)
self.assertAllClose(x, bijector.inverse(y).eval(), rtol=1e-4, atol=0.)
# Do the numpy calculation in float128 to avoid inf/nan.
y_float128 = np.float128(y)
self.assertAllClose(
np.log(np.cosh(
np.arcsinh(y_float128) / tailweight - skewness) / np.sqrt(
y_float128**2 + 1)) -
np.log(tailweight),
bijector.inverse_log_det_jacobian(y, event_ndims=0).eval(),
rtol=1e-4,
atol=0.)
# On IBM PPC systems, longdouble (np.float128) is the same as double except
# that it can have more precision. A double is 8 bytes and cannot hold the
# square of the max of float64 (also 8 bytes), so the test below fails with
# an overflow error giving inf. This check avoids that error by skipping the
# square calculation and the corresponding assert.
if np.amax(y) <= np.sqrt(np.finfo(np.float128).max) and \
np.fabs(np.amin(y)) <= np.sqrt(np.fabs(np.finfo(np.float128).min)):
# Do the numpy calculation in float128 to avoid inf/nan.
y_float128 = np.float128(y)
self.assertAllClose(
np.log(np.cosh(
np.arcsinh(y_float128) / tailweight - skewness) / np.sqrt(
y_float128**2 + 1)) -
np.log(tailweight),
bijector.inverse_log_det_jacobian(y, event_ndims=0).eval(),
rtol=1e-4,
atol=0.)
self.assertAllClose(
-bijector.inverse_log_det_jacobian(y, event_ndims=0).eval(),
bijector.forward_log_det_jacobian(x, event_ndims=0).eval(),

View File

@ -106,7 +106,8 @@ class Iterator(iterator_ops.EagerIterator, checkpointable.CheckpointableBase):
target_device=target,
buffer_size=10,
container="",
shared_name=_generate_shared_name("function_buffer_resource"))
shared_name=_generate_shared_name(
"contrib_eager_iterator_function_buffer_resource"))
self._buffer_resource_deleter = resource_variable_ops.EagerResourceDeleter( # pylint: disable=line-too-long
handle=self._buffer_resource_handle,
handle_device=self._device)

View File

@ -68,7 +68,7 @@
"# simply construct the object. Most layers take as a first argument the number\n",
"# of output dimensions / channels.\n",
"layer = tf.keras.layers.Dense(100)\n",
"# The number of input dimensionss is often unnecessary, as it can be inferred\n",
"# The number of input dimensions is often unnecessary, as it can be inferred\n",
"# the first time the layer is used, but it can be provided if you want to \n",
"# specify it manually, which is useful in some complex models.\n",
"layer = tf.keras.layers.Dense(10, input_shape=(None, 5))"
@ -267,7 +267,7 @@
" * `build`, where you know the shapes of the input tensors and can do the rest of the initialization\n",
" * `call`, where you do the forward computation\n",
"\n",
"Note that you don't have to wait until `build` is called to create your variables, you can also create them in `__init__`. However, the advantage of creating them in `build` is that it enables late variable creation based on the shape of the inputs the layer will operate on. On the other hand, creating variables in `__init__` would mean that shapes requires to create the variables will need to be explicitly specified."
"Note that you don't have to wait until `build` is called to create your variables, you can also create them in `__init__`. However, the advantage of creating them in `build` is that it enables late variable creation based on the shape of the inputs the layer will operate on. On the other hand, creating variables in `__init__` would mean that shapes required to create the variables will need to be explicitly specified."
]
},
{

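The notebook text above contrasts creating variables in `__init__` versus `build`; a minimal sketch of the `build` approach (the `MyDense` layer is hypothetical):

```python
import tensorflow as tf

class MyDense(tf.keras.layers.Layer):
  """Creates its kernel in build(), so the input shape can be inferred."""

  def __init__(self, units):
    super(MyDense, self).__init__()
    self.units = units

  def build(self, input_shape):
    # The input shape is only known here, enabling late variable creation.
    self.kernel = self.add_weight(
        'kernel', shape=[int(input_shape[-1]), self.units])

  def call(self, inputs):
    return tf.matmul(inputs, self.kernel)
```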
View File

@ -346,7 +346,8 @@ def sequence_numeric_column(
key,
shape=(1,),
default_value=0.,
dtype=dtypes.float32):
dtype=dtypes.float32,
normalizer_fn=None):
"""Returns a feature column that represents sequences of numeric data.
Example:
@ -370,6 +371,12 @@ def sequence_numeric_column(
default_value: A single value compatible with `dtype` that is used for
padding the sparse data into a dense `Tensor`.
dtype: The type of values.
normalizer_fn: If not `None`, a function that can be used to normalize the
value of the tensor after `default_value` is applied for parsing.
The normalizer function takes the input `Tensor` as its argument and returns
the output `Tensor` (e.g. lambda x: (x - 3.0) / 4.2). Note that even though
the most common use case of this function is normalization, it can be used
for any kind of TensorFlow transformation.
Returns:
A `_SequenceNumericColumn`.
@ -383,12 +390,16 @@ def sequence_numeric_column(
if not (dtype.is_integer or dtype.is_floating):
raise ValueError('dtype must be convertible to float. '
'dtype: {}, key: {}'.format(dtype, key))
if normalizer_fn is not None and not callable(normalizer_fn):
raise TypeError(
'normalizer_fn must be a callable. Given: {}'.format(normalizer_fn))
return _SequenceNumericColumn(
key,
shape=shape,
default_value=default_value,
dtype=dtype)
dtype=dtype,
normalizer_fn=normalizer_fn)
def _assert_all_equal_and_return(tensors, name=None):
@ -407,7 +418,7 @@ class _SequenceNumericColumn(
fc._SequenceDenseColumn,
collections.namedtuple(
'_SequenceNumericColumn',
['key', 'shape', 'default_value', 'dtype'])):
['key', 'shape', 'default_value', 'dtype', 'normalizer_fn'])):
"""Represents sequences of numeric data."""
@property
@ -419,7 +430,10 @@ class _SequenceNumericColumn(
return {self.key: parsing_ops.VarLenFeature(self.dtype)}
def _transform_feature(self, inputs):
return inputs.get(self.key)
input_tensor = inputs.get(self.key)
if self.normalizer_fn is not None:
input_tensor = self.normalizer_fn(input_tensor)
return input_tensor
@property
def _variable_shape(self):

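For illustration, a minimal usage sketch of the new `normalizer_fn` argument, assuming the column is exported as `tf.contrib.feature_column.sequence_numeric_column` and reusing the docstring's example lambda:

```python
import tensorflow as tf

# Hypothetical per-timestep feature, rescaled after default_value padding.
price = tf.contrib.feature_column.sequence_numeric_column(
    'price', normalizer_fn=lambda x: (x - 3.0) / 4.2)
```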
View File

@ -28,6 +28,7 @@ from tensorflow.python.framework import dtypes
from tensorflow.python.framework import errors
from tensorflow.python.framework import ops
from tensorflow.python.framework import sparse_tensor
from tensorflow.python.ops import sparse_ops
from tensorflow.python.platform import test
from tensorflow.python.training import monitored_session
@ -947,6 +948,7 @@ class SequenceNumericColumnTest(test.TestCase):
self.assertEqual((1,), a.shape)
self.assertEqual(0., a.default_value)
self.assertEqual(dtypes.float32, a.dtype)
self.assertIsNone(a.normalizer_fn)
def test_shape_saved_as_tuple(self):
a = sfc.sequence_numeric_column('aaa', shape=[1, 2])
@ -965,6 +967,10 @@ class SequenceNumericColumnTest(test.TestCase):
ValueError, 'dtype must be convertible to float'):
sfc.sequence_numeric_column('aaa', dtype=dtypes.string)
def test_normalizer_fn_must_be_callable(self):
with self.assertRaisesRegexp(TypeError, 'must be a callable'):
sfc.sequence_numeric_column('aaa', normalizer_fn='NotACallable')
def test_get_sequence_dense_tensor(self):
sparse_input = sparse_tensor.SparseTensorValue(
# example 0, values [[0.], [1]]
@ -985,6 +991,41 @@ class SequenceNumericColumnTest(test.TestCase):
self.assertAllEqual(
expected_dense_tensor, dense_tensor.eval(session=sess))
def test_get_sequence_dense_tensor_with_normalizer_fn(self):
def _increment_two(input_sparse_tensor):
return sparse_ops.sparse_add(
input_sparse_tensor,
sparse_tensor.SparseTensor(((0, 0), (1, 1)), (2.0, 2.0), (2, 2))
)
sparse_input = sparse_tensor.SparseTensorValue(
# example 0, values [[0.], [1]]
# example 1, [[10.]]
indices=((0, 0), (0, 1), (1, 0)),
values=(0., 1., 10.),
dense_shape=(2, 2))
# Before _increment_two:
# [[0.], [1.]],
# [[10.], [0.]],
# After _increment_two:
# [[2.], [1.]],
# [[10.], [2.]],
expected_dense_tensor = [
[[2.], [1.]],
[[10.], [2.]],
]
numeric_column = sfc.sequence_numeric_column(
'aaa', normalizer_fn=_increment_two)
dense_tensor, _ = numeric_column._get_sequence_dense_tensor(
_LazyBuilder({'aaa': sparse_input}))
with monitored_session.MonitoredSession() as sess:
self.assertAllEqual(
expected_dense_tensor, dense_tensor.eval(session=sess))
def test_get_sequence_dense_tensor_with_shape(self):
"""Tests get_sequence_dense_tensor with shape !=(1,)."""
sparse_input = sparse_tensor.SparseTensorValue(

View File

@ -28,7 +28,6 @@ from __future__ import print_function
from tensorflow.contrib.ffmpeg.ffmpeg_ops import decode_audio
from tensorflow.contrib.ffmpeg.ffmpeg_ops import decode_video
from tensorflow.contrib.ffmpeg.ffmpeg_ops import encode_audio
from tensorflow.contrib.ffmpeg.ffmpeg_ops import decode_video
from tensorflow.python.util.all_util import remove_undocumented

View File

@ -21,7 +21,6 @@ from __future__ import print_function
from tensorflow.contrib.ffmpeg.ops import gen_decode_audio_op_py
from tensorflow.contrib.ffmpeg.ops import gen_decode_video_op_py
from tensorflow.contrib.ffmpeg.ops import gen_encode_audio_op_py
from tensorflow.contrib.ffmpeg.ops import gen_decode_video_op_py
from tensorflow.contrib.util import loader
from tensorflow.python.framework import ops
from tensorflow.python.platform import resource_loader

View File

@ -119,14 +119,13 @@ from tensorflow.python.framework.smart_cond import smart_cond
from tensorflow.python.framework.smart_cond import smart_constant_value
from tensorflow.python.framework.tensor_spec import BoundedTensorSpec
from tensorflow.python.framework.tensor_spec import TensorSpec
from tensorflow.python.ops.array_ops import broadcast_to
from tensorflow.python.ops.init_ops import convolutional_delta_orthogonal
from tensorflow.python.ops.init_ops import convolutional_orthogonal_1d
from tensorflow.python.ops.init_ops import convolutional_orthogonal_2d
from tensorflow.python.ops.init_ops import convolutional_orthogonal_3d
from tensorflow.python.util.all_util import remove_undocumented
_allowed_symbols = ['nest', 'broadcast_to']
_allowed_symbols = ['nest']
_nest_allowed_symbols = [
'assert_same_structure',
'is_sequence',

View File

@ -301,8 +301,8 @@ class FusedConv2DBiasActivationTest(test.TestCase):
conv = tensors[i]
value = values[i]
ref_value = ref_values[i]
print("expected = ", ref_value)
print("actual = ", value)
tf_logging.info("expected = ", ref_value)
tf_logging.info("actual = ", value)
tol = 1e-5
if value.dtype == np.float16:
tol = 1e-3
@ -843,7 +843,8 @@ class FusedConvInt8Tests(test.TestCase):
vertical_stride, padding_type)
output_width = CalculateConvolvedOutputDim(input_width, filter_width,
horizontal_stride, padding_type)
print("output_height=", output_height, ", output_width=", output_width)
tf_logging.info("output_height=", output_height, ", output_width=",
output_width)
side_input, _, _ = gen_array_ops.quantize_v2(
random_ops.random_uniform(
@ -880,8 +881,8 @@ class FusedConvInt8Tests(test.TestCase):
with self.test_session(
use_gpu=True, config=NoMemoryOptimizationConfig()) as sess:
actual_y, expected_y = sess.run([actual, expected])
print("actual_y = ", actual_y)
print("expected_y = ", expected_y)
tf_logging.info("actual_y = ", actual_y)
tf_logging.info("expected_y = ", expected_y)
self.assertTrue(np.array_equal(actual_y, expected_y))
def testFusedConvInt8(self):

View File

@ -19,7 +19,7 @@ limitations under the License.
#include "hexagon_controller.h"
#include <malloc.h>
#include <stdlib.h>
#include <stdio.h>
#include "adspmsgd.h"

View File

@ -30,9 +30,7 @@ if [ ! -f $BZL_FILE_PATH ]; then
fi
EIGEN_URL="$(grep -o 'http.*bitbucket.org/eigen/eigen/get/.*tar\.gz' "${BZL_FILE_PATH}" | grep -v mirror.bazel | head -n1)"
# TODO (yongtang): Replace the following with 'https://mirror.bazel.build/github.com/google/gemmlowp/.*zip' once
# the archive has been propagated in mirror.bazel.build.
GEMMLOWP_URL="$(grep -o 'https://github.com/google/gemmlowp/.*zip' "${BZL_FILE_PATH}" | head -n1)"
GEMMLOWP_URL="$(grep -o 'https://mirror.bazel.build/github.com/google/gemmlowp/.*zip' "${BZL_FILE_PATH}" | head -n1)"
GOOGLETEST_URL="https://github.com/google/googletest/archive/release-1.8.0.tar.gz"
ABSL_URL="$(grep -o 'https://github.com/abseil/abseil-cpp/.*tar.gz' "${BZL_FILE_PATH}" | head -n1)"
NEON_2_SSE_URL="https://github.com/intel/ARM_NEON_2_x86_SSE/archive/master.zip"

View File

@ -38,7 +38,7 @@ using namespace tflite;
int main(int argc, char *argv[]) {
if(argc != 2) {
fprintf(stderr, "Usage: %s <model>\n");
fprintf(stderr, "minimal <tflite model>\n");
return 1;
}
const char* filename = argv[1];

View File

@ -128,7 +128,6 @@ TensorFlow operation not listed above are likely unsupported. Notably, the
following common ops are not supported at the moment:
* [tf.depth_to_space](https://www.tensorflow.org/api_docs/python/tf/depth_to_space)
* [tf.gather](https://www.tensorflow.org/api_docs/python/tf/gather)
* [tf.image.resize_bilinear](https://www.tensorflow.org/api_docs/python/tf/image/resize_bilinear)
* [tf.tanh](https://www.tensorflow.org/api_docs/python/tf/tanh)
@ -306,6 +305,19 @@ Options {
}
```
**GATHER**
```
Inputs {
0: params tensor
1: indices tensor
2: axis tensor (optional)
}
Outputs {
0: a tensor with same type as the params tensor.
}
```
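For reference, a minimal sketch of the corresponding TensorFlow op (`tf.gather`) that converts to this kernel:

```python
import tensorflow as tf

params = tf.constant([[1, 2], [3, 4], [5, 6]])
indices = tf.constant([2, 0])
# Gathers rows 2 and 0 along axis 0 -> [[5, 6], [1, 2]]
out = tf.gather(params, indices, axis=0)
```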
**GREATER**
```

View File

@ -2,7 +2,7 @@
This folder contains building code for track one of the [Low Power ImageNet Recognition Challenge workshop at CVPR 2018.](https://rebootingcomputing.ieee.org/home/sitemap/14-lpirc/80-low-power-image-recognition-challenge-lpirc-2018)
## Pre-requesits
## Prerequisites
Follow the steps [here](https://www.tensorflow.org/mobile/tflite/demo_android) to install Tensorflow, Bazel, and the Android NDK and SDK.
@ -49,7 +49,7 @@ Once you have a submission that follows the instructions from the [competition s
You can call the validator binary below to verify that your model fits the format requirements. This often helps you to catch size mismatches (e.g. output should be [1, 1001] instead of [1,1,1,1001]). Let say the submission file is located at `/path/to/my_model.lite`, then call:
```sh
bazel build --cxxopt--std=c++11 //tensorflow/contrib/lite/java/ovic:ovic_validator --cxxopt=-Wno-all
bazel build --cxxopt=--std=c++11 //tensorflow/contrib/lite/java/ovic:ovic_validator --cxxopt=-Wno-all
bazel-bin/tensorflow/contrib/lite/java/ovic/ovic_validator /path/to/my_model.lite
```

View File

@ -1934,7 +1934,7 @@ inline void LstmCell(const float* input_data, const Dims<4>& input_dims,
// The quantization of the input, output arrays is as follows:
// - The input activations are quantized as uint8 on the interval
// [-1, 127/128].
// The rationale for that is that that is the natural interval for output
// The rationale for that is that it is the natural interval for output
// activations (see next point) and these need to be concatenated together.
// We could accommodate different ranges by re-scaling, but we empirically
// found that setting the input activations range to be [-1, 127/128] in the
@ -1999,7 +1999,7 @@ inline void LstmCell(const float* input_data, const Dims<4>& input_dims,
// However, for a fixed-point implementation in 16-bit integers, using 5
// integer bits to represent the [-16, 16] range would leave only 11
// fractional bits, giving an increment of 2^-11 = 4.9e-4 between consecutive
// representable values. Notice that that is higher than the
// representable values. Notice that this is higher than the
// worst-case clamping error with clamping to [-8, 8]: 3.4e-4 for Logistic.
// Using [-8, 8] thus seems like the better compromise overall, enjoying
// an increment of 2.4e-4 between representable values and a worst-case

View File

@ -55,7 +55,7 @@ class Interpreter(object):
elif model_content and not model_path:
self._interpreter = (
_interpreter_wrapper.InterpreterWrapper_CreateWrapperCPPFromBuffer(
model_content, len(model_content)))
model_content))
if not self._interpreter:
raise ValueError(
'Failed to create model from {} bytes'.format(len(model_content)))
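A minimal usage sketch for the buffer path, assuming a local `model.tflite` file (hypothetical path):

```python
from tensorflow.contrib.lite.python import interpreter as interpreter_wrapper

with open('model.tflite', 'rb') as f:  # hypothetical model file
  content = f.read()

# Builds the interpreter directly from the in-memory flatbuffer.
interpreter = interpreter_wrapper.Interpreter(model_content=content)
interpreter.allocate_tensors()
```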

View File

@ -397,9 +397,14 @@ InterpreterWrapper* InterpreterWrapper::CreateWrapperCPPFromFile(
}
InterpreterWrapper* InterpreterWrapper::CreateWrapperCPPFromBuffer(
const char* data, size_t len) {
PyObject* data) {
char * buf = nullptr;
Py_ssize_t length;
if (PY_TO_CPPSTRING(data, &buf, &length) == -1) {
return nullptr;
}
std::unique_ptr<tflite::FlatBufferModel> model =
tflite::FlatBufferModel::BuildFromBuffer(data, len);
tflite::FlatBufferModel::BuildFromBuffer(buf, length);
return model ? new InterpreterWrapper(std::move(model)) : nullptr;
}

View File

@ -40,8 +40,7 @@ class InterpreterWrapper {
static InterpreterWrapper* CreateWrapperCPPFromFile(const char* model_path);
// SWIG caller takes ownership of pointer.
static InterpreterWrapper* CreateWrapperCPPFromBuffer(const char* data,
size_t len);
static InterpreterWrapper* CreateWrapperCPPFromBuffer(PyObject* data);
~InterpreterWrapper();
bool AllocateTensors();

View File

@ -34,6 +34,8 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from six import PY3
from google.protobuf import text_format as _text_format
from google.protobuf.message import DecodeError
from tensorflow.contrib.lite.python import lite_constants as constants
@ -54,6 +56,7 @@ from tensorflow.python.framework.importer import import_graph_def
from tensorflow.python.ops.variables import global_variables_initializer
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants
# from tensorflow.python.util.all_util import remove_undocumented
class TocoConverter(object):
@ -203,6 +206,12 @@ class TocoConverter(object):
except (_text_format.ParseError, DecodeError):
try:
print("Ignore 'tcmalloc: large alloc' warnings.")
if not isinstance(file_content, str):
if PY3:
file_content = file_content.decode('utf-8')
else:
file_content = file_content.encode('utf-8')
_text_format.Merge(file_content, graph_def)
except (_text_format.ParseError, DecodeError):
raise ValueError(
@ -382,3 +391,5 @@ def _freeze_graph(sess, output_tensors):
output_arrays)
else:
return sess.graph_def
# remove_undocumented(__name__)

View File

@ -178,7 +178,7 @@ ArrayDataType ConvertDataType(tensorflow::DataType dtype) {
else if (dtype == DT_STRING)
return ArrayDataType::kString;
else
LOG(INFO) << "Unsupported data type in placehoder op: " << dtype;
LOG(INFO) << "Unsupported data type in placeholder op: " << dtype;
return ArrayDataType::kNone;
}

View File

@ -20,6 +20,12 @@ limitations under the License.
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/platform/logging.h"
#if defined(__ANDROID__) && defined(__ARM_ARCH_7A__)
namespace std {
double round(double x) { return ::round(x); }
} // namespace std
#endif
namespace toco {
namespace port {
void CopyToBuffer(const string& src, char* dest) {

View File

@ -34,6 +34,24 @@ limitations under the License.
#define TFLITE_PROTO_NS google::protobuf
#endif
#ifdef __ANDROID__
#include <sstream>
namespace std {
template <typename T>
std::string to_string(T value)
{
std::ostringstream os ;
os << value ;
return os.str() ;
}
#ifdef __ARM_ARCH_7A__
double round(double x);
#endif
}
#endif
namespace toco {
namespace port {

View File

@ -270,7 +270,7 @@ for arch in $archs; do
PLATFORM_LDFLAGS=-pthread
MKDEP=${CC} -M -std=c++11
PLATFORM_C=../../platform/c++11/src/nsync_semaphore_mutex.cc \
../../platform/c++11/src/per_thread_waiter.cc \
../../platform/posix/src/per_thread_waiter.c \
../../platform/c++11/src/yield.cc \
../../platform/c++11/src/time_rep_timespec.cc \
../../platform/c++11/src/nsync_panic.cc

View File

@ -27,9 +27,7 @@ if [ ! -f $BZL_FILE_PATH ]; then
fi
EIGEN_URL="$(grep -o 'http.*bitbucket.org/eigen/eigen/get/.*tar\.gz' "${BZL_FILE_PATH}" | grep -v mirror.bazel | head -n1)"
# TODO (yongtang): Replace the following with 'https://mirror.bazel.build/github.com/google/gemmlowp/.*zip' once
# the archive has been propagated in mirror.bazel.build.
GEMMLOWP_URL="$(grep -o 'https://github.com/google/gemmlowp/.*zip' "${BZL_FILE_PATH}" | head -n1)"
GEMMLOWP_URL="$(grep -o 'https://mirror.bazel.build/github.com/google/gemmlowp/.*zip' "${BZL_FILE_PATH}" | head -n1)"
GOOGLETEST_URL="https://github.com/google/googletest/archive/release-1.8.0.tar.gz"
NSYNC_URL="$(grep -o 'https://mirror.bazel.build/github.com/google/nsync/.*tar\.gz' "${BZL_FILE_PATH}" | head -n1)"
PROTOBUF_URL="$(grep -o 'https://mirror.bazel.build/github.com/google/protobuf/.*tar\.gz' "${BZL_FILE_PATH}" | head -n1)"

View File

@ -2503,7 +2503,7 @@ def _compute_recall_at_precision(tp, fp, fn, precision, name):
name: An optional variable_scope name.
Returns:
The recall at a the given `precision`.
The recall at a given `precision`.
"""
precisions = math_ops.div(tp, tp + fp + _EPSILON)
tf_index = math_ops.argmin(
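
For intuition, the computation this docstring describes can be modeled in plain numpy, assuming `tp`, `fp`, `fn` are per-threshold counts (a sketch, not the library code):

```python
import numpy as np

_EPSILON = 1e-7

def recall_at_precision(tp, fp, fn, precision):
    # Pick the threshold whose precision is closest to the target and
    # report the recall observed at that threshold.
    precisions = tp / (tp + fp + _EPSILON)
    idx = np.argmin(np.abs(precisions - precision))
    return tp[idx] / (tp[idx] + fn[idx] + _EPSILON)
```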

View File

@ -129,7 +129,7 @@ cudaStream_t CudaStreamForMPI();
* has the fully accumulated Segment 1; and so on. The scatter-reduce is
* complete.
*
* Next, the allgather distributes these fully accumululated chunks across all
* Next, the allgather distributes these fully accumulated chunks across all
* nodes. Communication proceeds in the same ring, once again in N-1 steps. At
* the ith step, node j will send chunk (j - i + 1) and receive chunk (j - i).
* For example, at the first iteration, the following transfers will occur:
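
The schedule in that comment is mechanical enough to enumerate. A small sketch, assuming chunks are indexed modulo N:

```python
def ring_allgather_schedule(num_nodes):
    """Per step, (node, chunk_sent, chunk_received) for the ring allgather."""
    steps = []
    for i in range(1, num_nodes):  # N - 1 steps
        steps.append([(j, (j - i + 1) % num_nodes, (j - i) % num_nodes)
                      for j in range(num_nodes)])
    return steps

# e.g. with 4 nodes, at step 1 node 0 sends chunk 0 and receives chunk 3.
```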

View File

@ -224,8 +224,10 @@ class AdaMaxOptimizerTest(test.TestCase):
var1_np, m1, v1 = adamax_update_numpy(var1_np, grads1_np, t, m1, v1)
# Validate updated params
self.assertAllCloseAccordingToType(var0_np, self.evaluate(var0))
self.assertAllCloseAccordingToType(var1_np, self.evaluate(var1))
self.assertAllCloseAccordingToType(var0_np, self.evaluate(var0),
rtol=1e-2)
self.assertAllCloseAccordingToType(var1_np, self.evaluate(var1),
rtol=1e-2)
if use_resource:
self.assertEqual("var0_%d/AdaMax:0" % (i,),
opt.get_slot(var=var0, name="m").name)
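
For reference, the `adamax_update_numpy` oracle this test compares against follows the AdaMax rule from the Adam paper; a hedged reconstruction with default hyperparameters assumed (not copied from the test file):

```python
import numpy as np

def adamax_update_numpy(var, grad, t, m, v,
                        lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # t is 1-based; m is the biased first moment, v the exponentially
    # weighted infinity norm.
    m = beta1 * m + (1.0 - beta1) * grad
    v = np.maximum(beta2 * v, np.abs(grad))
    var = var - (lr / (1.0 - beta1 ** t)) * (m / (v + eps))
    return var, m, v
```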

View File

@ -62,7 +62,7 @@ class ModelAverageCustomGetter(object):
"""
def __init__(self, worker_device):
"""Create a new `ElasticAverageCustomGetter`.
"""Create a new `ModelAverageCustomGetter`.
Args:
worker_device: String. Name of the `worker` job.

View File

@ -6,12 +6,13 @@ exports_files(["LICENSE"])
load(
"//tensorflow:tensorflow.bzl",
"py_test",
"tf_cc_test",
"tf_gen_op_libs",
"tf_custom_op_library",
"tf_custom_op_py_library",
"tf_gen_op_wrapper_py",
)
load("//tensorflow:tensorflow.bzl", "py_test")
cc_library(
name = "all_ops",
@ -84,6 +85,23 @@ py_test(
":init_py",
"//tensorflow/contrib/util:util_py",
"//tensorflow/python:framework_test_lib",
"//tensorflow/python:gradient_checker",
],
)
tf_cc_test(
name = "periodic_resample_op_cc_test",
size = "small",
srcs = [
"ops/array_ops_test.cc",
],
deps = [
":all_ops",
"//tensorflow/core:framework",
"//tensorflow/core:protos_all_proto",
"//tensorflow/core:test",
"//tensorflow/core:test_main",
"//tensorflow/core:testlib",
],
)

View File

@ -22,4 +22,9 @@ namespace tensorflow {
REGISTER_KERNEL_BUILDER(Name("PeriodicResample").Device(DEVICE_CPU),
PeriodicResampleOp);
REGISTER_KERNEL_BUILDER(Name("PeriodicResampleOpGrad")
.Device(DEVICE_CPU),
PeriodicResampleOpGrad);
} // namespace tensorflow

View File

@ -25,92 +25,202 @@
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/util/work_sharder.h"
namespace {
template <class IndexVecT, class IndexT>
IndexT compute_input_index(
IndexVecT* target_dimensions, const IndexT& output_index,
const IndexVecT& original_dimensions, const int& adjustable_dimension,
const std::vector<tensorflow::int64>& dimension_ceiling,
const std::vector<tensorflow::int64>& cumulative_dimensions, IndexT* result,
std::vector<IndexT>* output_indices, const int& rank) {
*result = 0;
output_indices->clear();
// Computes the input tensor index for a given output index during forward
// propagation through the periodic_resample operation.
class InputIndexer {
public:
InputIndexer(const std::vector<tensorflow::int64>& output_dimensions,
const tensorflow::TensorShape& input_shape,
int adjustable_dimension)
: output_dimensions_(output_dimensions),
adjustable_dimension_(adjustable_dimension),
rank_(input_shape.dims()),
linear_output_index_(0),
linear_input_index_(0),
adjustable_dimension_carriage_sum_(0) {
auto input_dimensions = TensorShapeToVector(input_shape);
// factors by which input_dimensions increases/decreases w.r.t.
// output_dimensions
dimension_ceiling_ =
ComputeDimensionCeiling(output_dimensions, input_dimensions);
cumulative_dimensions_ = ComputeCumulativeDimensions();
output_indices_.resize(output_dimensions_.size());
input_indices_.resize(output_dimensions_.size());
// Compute index_factors
index_factors_.resize(rank_);
tensorflow::int64 last_index_factor = 1;
for (auto r = rank_ - 1; r >= 0; --r) {
index_factors_[r] = last_index_factor;
last_index_factor *= input_dimensions[r];
}
}
tensorflow::int64 linear_input_index() const { return linear_input_index_; }
void MoveToOutputIndex(tensorflow::int64 output_index);
void IncrementOutputIndex();
private:
void RecomputeInputAdjustableDimensionIndex() {
tensorflow::int64 index = adjustable_dimension_carriage_sum_;
index *= output_dimensions_[adjustable_dimension_];
index += output_indices_[adjustable_dimension_];
input_indices_[adjustable_dimension_] = index;
}
std::vector<tensorflow::int64> TensorShapeToVector(
const tensorflow::TensorShape& tensor_shape);
std::vector<tensorflow::int64> ComputeDimensionCeiling(
const std::vector<tensorflow::int64>& output_dimensions,
const std::vector<tensorflow::int64>& input_dimensions);
std::vector<tensorflow::int64> ComputeCumulativeDimensions();
const std::vector<tensorflow::int64> output_dimensions_;
std::vector<tensorflow::int64> dimension_ceiling_;
std::vector<tensorflow::int64> index_factors_;
std::vector<tensorflow::int64> cumulative_dimensions_;
std::vector<tensorflow::int64> output_indices_;
std::vector<tensorflow::int64> input_indices_;
const int adjustable_dimension_;
const int rank_;
tensorflow::int64 linear_output_index_;
tensorflow::int64 linear_input_index_;
tensorflow::int64 adjustable_dimension_carriage_sum_;
};
void InputIndexer::MoveToOutputIndex(tensorflow::int64 output_index) {
linear_output_index_ = output_index;
linear_input_index_ = 0;
// un-rasterize the output index
auto last_reduced_i = output_index;
for (auto r = rank - 1; r >= 0; --r) {
(*output_indices)[r] = last_reduced_i % (*target_dimensions)[r];
for (auto r = rank_ - 1; r >= 0; --r) {
output_indices_[r] = last_reduced_i % output_dimensions_[r];
last_reduced_i =
(last_reduced_i - (*output_indices)[r]) / (*target_dimensions)[r];
(last_reduced_i - output_indices_[r]) / output_dimensions_[r];
}
tensorflow::int64 carriage_sum = 0;
for (int qi = 0; qi < rank_; ++qi) {
if (qi == adjustable_dimension_) continue;
carriage_sum += cumulative_dimensions_[qi] *
(output_indices_[qi] % dimension_ceiling_[qi]);
}
adjustable_dimension_carriage_sum_ = carriage_sum;
// rasterize the input index
IndexT last_index_factor = 1;
for (auto r = rank - 1; r >= 0; --r) {
IndexT index = 0;
if (r != adjustable_dimension)
index = (*output_indices)[r] / dimension_ceiling[r];
else {
for (int qi = 0; qi < rank; ++qi) {
if (qi == adjustable_dimension) continue;
index += cumulative_dimensions[qi] *
((*output_indices)[qi] % dimension_ceiling[qi]);
}
index *= (*target_dimensions)[adjustable_dimension];
index += (*output_indices)[r];
for (auto r = rank_ - 1; r >= 0; --r) {
if (r != adjustable_dimension_) {
input_indices_[r] = output_indices_[r] / dimension_ceiling_[r];
} else {
RecomputeInputAdjustableDimensionIndex();
}
*result += last_index_factor * index;
last_index_factor *= original_dimensions[r];
}
return *result;
for (auto r = rank_ - 1; r >= 0; --r) {
linear_input_index_ += index_factors_[r] * input_indices_[r];
}
}
template <class InputDataT,
class IndexVecT> // both types are needed here b/c IndexVecT and
// InputDataT are not related
void
fill_periodic_tensor(
tensorflow::OpKernelContext* context,
const IndexVecT& desired_shape,
const tensorflow::Tensor& input_tensor) {
// input is a strided array (last index is fastest, C-ordered)
auto input = input_tensor.flat<InputDataT>();
const int rank = input_tensor.dims();
// original and target dimensions
std::vector<tensorflow::int64> original_dimensions(rank),
target_dimensions(rank);
tensorflow::int64 total_size(input_tensor.NumElements()), new_sliced_size(1);
// factors by which original_dimensions increases/decreases w.r.t.
// target_dimensions
std::vector<tensorflow::int64> dimension_ceiling(rank),
cumulative_dimensions(rank);
// index of adjustable dimension
int adjustable_dimension;
tensorflow::TensorShape output_shape;
void InputIndexer::IncrementOutputIndex() {
linear_output_index_++;
for (auto r = rank_ - 1; r >= 0; --r) {
auto old_carriage_sum_increment =
cumulative_dimensions_[r] *
(output_indices_[r] % dimension_ceiling_[r]);
output_indices_[r] = (output_indices_[r] + 1) % output_dimensions_[r];
if (r != adjustable_dimension_) {
auto new_input_index = output_indices_[r] / dimension_ceiling_[r];
linear_input_index_ +=
(new_input_index - input_indices_[r]) * index_factors_[r];
// requires that the rank of the input tensor and length of the desired shape
// are equal
OP_REQUIRES(context, rank == desired_shape.size(),
tensorflow::errors::InvalidArgument(
"periodic_resample expects the rank of the input tensor, ",
rank, ", to be the same as the length of the desired shape, ",
desired_shape.size(), "."));
input_indices_[r] = new_input_index;
auto new_carriage_sum_increment =
cumulative_dimensions_[r] *
(output_indices_[r] % dimension_ceiling_[r]);
adjustable_dimension_carriage_sum_ = adjustable_dimension_carriage_sum_ -
old_carriage_sum_increment +
new_carriage_sum_increment;
}
if (output_indices_[r] != 0) {
// No more carries to higher indices.
break;
}
}
auto old_adjustable_dimension_input_index =
input_indices_[adjustable_dimension_];
RecomputeInputAdjustableDimensionIndex();
linear_input_index_ += (input_indices_[adjustable_dimension_] -
old_adjustable_dimension_input_index) *
index_factors_[adjustable_dimension_];
}
std::vector<tensorflow::int64> InputIndexer::TensorShapeToVector(
const tensorflow::TensorShape& tensor_shape) {
std::vector<tensorflow::int64> result(tensor_shape.dims());
int count = 0;
for (const auto dim_info : tensor_shape) {
result[count] = dim_info.size;
++count;
}
return result;
}
std::vector<tensorflow::int64> InputIndexer::ComputeDimensionCeiling(
const std::vector<tensorflow::int64>& output_dimensions,
const std::vector<tensorflow::int64>& input_dimensions) {
std::vector<tensorflow::int64> dimension_ceiling(input_dimensions.size());
for (size_t i = 0; i < input_dimensions.size(); ++i) {
dimension_ceiling[i] = (output_dimensions[i] + input_dimensions[i] - 1) /
input_dimensions[i];
}
return dimension_ceiling;
}
std::vector<tensorflow::int64> InputIndexer::ComputeCumulativeDimensions() {
std::vector<tensorflow::int64> cumulative_dimensions(rank_);
int count = 0;
for (int i = 0; i < rank_; ++i) {
if (count == 0) {
cumulative_dimensions[count] = 1;
} else {
cumulative_dimensions[count] =
cumulative_dimensions[count - 1] * dimension_ceiling_[count - 1];
}
++count;
}
return cumulative_dimensions;
}
template <typename IndexVecT>
void process_desired_shape(tensorflow::OpKernelContext* context,
const tensorflow::TensorShape& input_tensor_shape,
const IndexVecT& desired_shape,
int* adjustable_dimension,
std::vector<tensorflow::int64>* target_dimensions,
tensorflow::int64* output_size) {
tensorflow::int64 new_sliced_size = 1;
bool found = false;
const auto& input_tensor_shape = input_tensor.shape();
const int rank = input_tensor_shape.dims();
for (int i = 0; i < rank; ++i) {
// if (desired_shape(i) < 1) {
if (desired_shape[i] < 1) {
// only one index can be adjustable
OP_REQUIRES(context, !found,
tensorflow::errors::InvalidArgument(
"periodic_resample expects only "
"one index to be marked as adjustable."));
adjustable_dimension = i;
*adjustable_dimension = i;
found = true;
} else {
OP_REQUIRES(
@ -122,9 +232,8 @@ template <class InputDataT,
i, " input tensor has size ", input_tensor_shape.dim_size(i),
", desired shape has size ", desired_shape[i], "."));
// target_dimensions[i] = desired_shape(i);
target_dimensions[i] = desired_shape[i];
new_sliced_size *= target_dimensions[i];
(*target_dimensions)[i] = desired_shape[i];
new_sliced_size *= (*target_dimensions)[i];
}
}
// at least one index needs to be adjustable
@ -132,26 +241,50 @@ template <class InputDataT,
tensorflow::errors::InvalidArgument(
"periodic_resample expects at least "
"one index to be marked as adjustable."));
(*target_dimensions)[*adjustable_dimension] =
input_tensor_shape.num_elements() / new_sliced_size;
int count = 0;
for (const auto dim_info : input_tensor.shape()) {
original_dimensions[count] = dim_info.size;
++count;
}
*output_size = new_sliced_size * (*target_dimensions)[*adjustable_dimension];
}
target_dimensions[adjustable_dimension] = total_size / new_sliced_size;
// Heuristic number based on measurements on
// Intel(R) Core(TM) i7-4930K CPU @ 3.40GHz
const tensorflow::int64 costPerFillIndex = 35;
count = 0;
for (int i = 0; i < input_tensor.shape().dims(); ++i) {
dimension_ceiling[count] = tensorflow::int64(std::ceil(
float(target_dimensions[count]) / float(original_dimensions[count])));
if (count == 0)
cumulative_dimensions[count] = 1;
else
cumulative_dimensions[count] =
cumulative_dimensions[count - 1] * dimension_ceiling[count - 1];
++count;
}
enum class Mode {
kForward,
kGradient
};
// Computes either the periodic_resample output or its gradients, depending
// on |mode|.
// |original_shape| is always the shape of the input to periodic_resample.
// |source_tensor| is either the source tensor for periodic_resample (forward
// mode) or the gradients tensor.
// |desired_shape| is always the user-provided shape to which forward
// propagation attempts to resample the input tensor.
template <class InputDataT, Mode mode>
void
do_periodic_resample_op(tensorflow::OpKernelContext* context,
const tensorflow::TensorShape& original_shape,
const tensorflow::PartialTensorShape& desired_shape,
const tensorflow::Tensor& source_tensor) {
const int rank = source_tensor.dims();
// requires that the rank of the input tensor and length of the desired shape
// are equal
OP_REQUIRES(context, rank == desired_shape.dims(),
tensorflow::errors::InvalidArgument(
"periodic_resample expects the rank of the input tensor, ",
rank, ", to be the same as the length of the desired shape, ",
desired_shape.dims(), "."));
std::vector<tensorflow::int64> target_dimensions(rank);
tensorflow::int64 new_size = 0;
// index of adjustable dimension
int adjustable_dimension = 0;
process_desired_shape(context, original_shape, desired_shape.dim_sizes(),
&adjustable_dimension, &target_dimensions, &new_size);
// ensure that the new dimension is greater than zero
OP_REQUIRES(context, target_dimensions[adjustable_dimension] > 0,
@ -160,11 +293,14 @@ template <class InputDataT,
"adjustable dimension, ",
adjustable_dimension, ", isn't greater than zero, ",
target_dimensions[adjustable_dimension], "."));
for (int i = 0; i < rank; ++i) {
output_shape.AddDim(target_dimensions[i]);
tensorflow::TensorShape output_shape;
if (mode == Mode::kForward) {
for (int i = 0; i < rank; ++i) {
output_shape.AddDim(target_dimensions[i]);
}
} else {
output_shape = original_shape;
}
const auto new_size =
new_sliced_size * target_dimensions[adjustable_dimension];
// Create an output tensor and attach it to the current context
tensorflow::Tensor* output_tensor = nullptr;
@ -172,47 +308,73 @@ template <class InputDataT,
context->allocate_output(0, output_shape, &output_tensor));
auto output = output_tensor->flat<InputDataT>();
// memory is allocated for these variables outside the inner loop for
// efficiency (although, I could create a separate class scope for
// this purpose instead)
tensorflow::int64 result = 0;
std::vector<tensorflow::int64> output_indices(target_dimensions.size());
// input is a strided array (last index is fastest, C-ordered)
auto input = source_tensor.flat<InputDataT>();
// Fill output tensor with periodically resampled input tensor values
for (tensorflow::int64 output_index = 0; output_index < new_size;
++output_index) {
output(output_index) = input(compute_input_index(
&target_dimensions, output_index, original_dimensions,
adjustable_dimension, dimension_ceiling, cumulative_dimensions, &result,
&output_indices, rank));
}
InputIndexer input_indexer(target_dimensions, original_shape,
adjustable_dimension);
auto worker_threads = *(context->device()->tensorflow_cpu_worker_threads());
auto fill_output_tensor = [&input_indexer, &output, &input](
tensorflow::int64 start, tensorflow::int64 limit) {
InputIndexer local_indexer(input_indexer);
local_indexer.MoveToOutputIndex(start);
for (tensorflow::int64 output_index = start; output_index < limit;
++output_index) {
if (mode == Mode::kForward) {
output(output_index) = input(local_indexer.linear_input_index());
} else {
output(local_indexer.linear_input_index()) = input(output_index);
}
local_indexer.IncrementOutputIndex();
}
};
::tensorflow::Shard(worker_threads.num_threads, worker_threads.workers,
new_size, costPerFillIndex, fill_output_tensor);
}
#define DATA_TYPE_SWITCH(data_type, context, CASE) \
switch (data_type) { \
CASE(float) \
CASE(double) \
CASE(tensorflow::int32) \
CASE(tensorflow::int64) \
default: \
context->CtxFailure(__FILE__, __LINE__, \
tensorflow::errors::InvalidArgument( \
"Unsuppored tensor elements type")); \
break; \
}
void create_output_tensor(
tensorflow::OpKernelContext* context,
const tensorflow::Tensor& input_tensor,
const tensorflow::DataType& input_tensor_type,
const tensorflow::PartialTensorShape& desired_shape_tensor) {
auto desired_shape = desired_shape_tensor.dim_sizes();
const tensorflow::PartialTensorShape& desired_shape) {
#define CASE(type) \
case tensorflow::DataTypeToEnum<type>::value: \
do_periodic_resample_op<type, Mode::kForward>( \
context, input_tensor.shape(), desired_shape, input_tensor); \
break;
// obligatory type switch
switch (input_tensor_type) {
case tensorflow::DataTypeToEnum<float>::value:
fill_periodic_tensor<float>(context, desired_shape, input_tensor);
DATA_TYPE_SWITCH(input_tensor_type, context, CASE);
#undef CASE
}
void create_grad_tensor(tensorflow::OpKernelContext* context,
const tensorflow::Tensor& grad_tensor,
const tensorflow::DataType& grad_tensor_type,
const tensorflow::TensorShape& original_shape,
const tensorflow::PartialTensorShape& desired_shape) {
#define CASE(type) \
case tensorflow::DataTypeToEnum<type>::value: \
do_periodic_resample_op<type, Mode::kGradient>( \
context, original_shape, desired_shape, grad_tensor); \
break;
case tensorflow::DataTypeToEnum<double>::value:
fill_periodic_tensor<double>(context, desired_shape, input_tensor);
break;
case tensorflow::DataTypeToEnum<tensorflow::int32>::value:
fill_periodic_tensor<tensorflow::int32>(context, desired_shape,
input_tensor);
break;
case tensorflow::DataTypeToEnum<tensorflow::int64>::value:
fill_periodic_tensor<tensorflow::int64>(context, desired_shape,
input_tensor);
break;
default:;
}
DATA_TYPE_SWITCH(grad_tensor_type, context, CASE);
#undef CASE
}
} // namespace
@ -238,4 +400,25 @@ class PeriodicResampleOp : public tensorflow::OpKernel {
tensorflow::PartialTensorShape desired_shape;
};
class PeriodicResampleOpGrad : public tensorflow::OpKernel {
public:
explicit PeriodicResampleOpGrad(tensorflow::OpKernelConstruction* context)
: tensorflow::OpKernel(context) {
OP_REQUIRES_OK(context,
context->GetAttr("original_shape", &original_shape));
OP_REQUIRES_OK(context, context->GetAttr("desired_shape", &desired_shape));
}
void Compute(tensorflow::OpKernelContext* context) override {
const tensorflow::Tensor& grad_tensor = context->input(0);
const tensorflow::DataType grad_tensor_type = context->input_dtype(0);
create_grad_tensor(context, grad_tensor, grad_tensor_type, original_shape,
desired_shape);
}
private:
tensorflow::TensorShape original_shape;
tensorflow::PartialTensorShape desired_shape;
};
#endif // TENSORFLOW_KERNELS_PERIODICRESAMPLE_OP_H_
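
Because the hunk above interleaves the removed `compute_input_index` with the new `InputIndexer`, a compact pure-Python model of the forward index mapping may help; names mirror the kernel, and this is illustrative only:

```python
def input_index_for_output(output_index, output_dims, input_dims, adjustable_dim):
    """Flat input index whose value lands at `output_index` after resampling."""
    rank = len(input_dims)
    # ceil(output_dim / input_dim) per dimension.
    dim_ceiling = [-(-o // i) for o, i in zip(output_dims, input_dims)]
    cumulative = [1] * rank
    for r in range(1, rank):
        cumulative[r] = cumulative[r - 1] * dim_ceiling[r - 1]

    # Un-rasterize the flat output index (C order, last index fastest).
    out_idx, rem = [0] * rank, output_index
    for r in range(rank - 1, -1, -1):
        out_idx[r] = rem % output_dims[r]
        rem //= output_dims[r]

    # Rasterize the corresponding flat input index.
    result, factor = 0, 1
    for r in range(rank - 1, -1, -1):
        if r != adjustable_dim:
            idx = out_idx[r] // dim_ceiling[r]
        else:
            carriage = sum(cumulative[q] * (out_idx[q] % dim_ceiling[q])
                           for q in range(rank) if q != adjustable_dim)
            idx = carriage * output_dims[adjustable_dim] + out_idx[adjustable_dim]
        result += factor * idx
        factor *= input_dims[r]
    return result
```

The `InputIndexer` class computes the same mapping incrementally (`IncrementOutputIndex`) so the sharded fill loop avoids re-deriving every index from scratch.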

View File

@ -26,7 +26,42 @@ REGISTER_OP("PeriodicResample")
.Input("values: T")
.Attr("shape: shape")
.Output("output: T")
.SetShapeFn(shape_inference::ExplicitShape)
.SetShapeFn([](shape_inference::InferenceContext* c) {
tensorflow::PartialTensorShape desired_shape;
TF_RETURN_IF_ERROR(c->GetAttr("shape", &desired_shape));
shape_inference::ShapeHandle input_tensor_shape = c->input(0);
shape_inference::DimensionHandle num_input_elements =
c->NumElements(input_tensor_shape);
shape_inference::ShapeHandle result_shape_handle;
if (!shape_inference::InferenceContext::ValueKnown(num_input_elements)) {
TF_RETURN_IF_ERROR(c->MakeShapeFromPartialTensorShape(
desired_shape, &result_shape_handle));
} else {
const int rank = c->Rank(input_tensor_shape);
std::vector<tensorflow::int64> target_dimensions(rank);
tensorflow::int64 new_sliced_size = 1;
int adjustable_dimension = 0;
for (int i = 0; i < rank; ++i) {
if (desired_shape.dim_size(i) < 1) {
adjustable_dimension = i;
} else {
target_dimensions[i] = desired_shape.dim_size(i);
new_sliced_size *= target_dimensions[i];
}
}
target_dimensions[adjustable_dimension] =
shape_inference::InferenceContext::Value(
num_input_elements) / new_sliced_size;
tensorflow::TensorShape result_shape;
for (int i = 0; i < rank; ++i) {
result_shape.AddDim(target_dimensions[i]);
}
TF_RETURN_IF_ERROR(c->MakeShapeFromTensorShape(
result_shape, &result_shape_handle));
}
c->set_output(0, result_shape_handle);
return Status::OK();
})
.Doc(R"doc(
Periodically resample elements of a tensor to conform to `shape`.
@ -101,4 +136,20 @@ output: Periodically resampled tensor that has dimensions specified as in
)doc");
REGISTER_OP("PeriodicResampleOpGrad")
.Attr("T: numbertype")
.Input("grad: T")
.Attr("original_shape: shape")
.Attr("desired_shape: shape")
.Output("grad_values: T")
.SetShapeFn([](shape_inference::InferenceContext* c) {
tensorflow::TensorShape original_shape;
TF_RETURN_IF_ERROR(c->GetAttr("original_shape", &original_shape));
shape_inference::ShapeHandle s;
TF_RETURN_IF_ERROR(c->MakeShapeFromTensorShape(original_shape, &s));
c->set_output(0, s);
return Status::OK();
});
} // namespace tensorflow
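
In Python terms, the new shape function reduces to: known desired dimensions pass through, and the single adjustable dimension absorbs whatever element count remains. A sketch, with `None`/`-1` standing in for an unknown dimension:

```python
def infer_resample_shape(input_shape, desired_shape):
    if any(d is None for d in input_shape):
        return list(desired_shape)  # element count unknown: report desired shape
    num_elements = 1
    for d in input_shape:
        num_elements *= d
    sliced, adjustable, out = 1, None, list(desired_shape)
    for i, d in enumerate(desired_shape):
        if d is None or d < 1:
            adjustable = i
        else:
            sliced *= d
    out[adjustable] = num_elements // sliced
    return out

assert infer_resample_shape((2, 2, 4), (4, 4, -1)) == [4, 4, 1]
```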

View File

@ -0,0 +1,41 @@
/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include "tensorflow/core/framework/node_def_builder.h"
#include "tensorflow/core/framework/shape_inference_testutil.h"
#include "tensorflow/core/framework/tensor_shape.pb.h"
#include "tensorflow/core/framework/tensor_testutil.h"
#include "tensorflow/core/lib/core/status_test_util.h"
#include "tensorflow/core/platform/test.h"
namespace tensorflow {
TEST(ArrayOpsTest, PeriodicResample_ShapeFn) {
ShapeInferenceTestOp op("PeriodicResample");
// Case 1: output shape can be fully inferred.
PartialTensorShape shape({4, 4, -1});
TensorShapeProto shape_proto;
shape.AsProto(&shape_proto);
TF_ASSERT_OK(NodeDefBuilder("test", "PeriodicResample")
.Input({"values", 0, DT_INT32})
.Attr("shape", shape_proto)
.Finalize(&op.node_def));
INFER_OK(op, "[2,2,4]", "[4,4,1]");
// Case 2: output shape cannot be inferred - report desired shape.
INFER_OK(op, "[2,2,?]", "[4,4,?]");
}
} // end namespace tensorflow

View File

@ -21,8 +21,11 @@ from __future__ import print_function
import numpy
from tensorflow.contrib.periodic_resample import periodic_resample
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import errors_impl
from tensorflow.python.framework import test_util
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import gradient_checker
from tensorflow.python.ops import variables
from tensorflow.python.platform import googletest
@ -93,7 +96,6 @@ class PeriodicResampleTest(test_util.TensorFlowTestCase):
def testPeriodicResampleErrors(self):
input_tensor = numpy.zeros(shape=[1, 2, 2, 4])
with self.test_session():
variables.global_variables_initializer().run()
with self.assertRaisesWithPredicateMatch(
errors_impl.InvalidArgumentError,
'Dimension 3 input tensor has size 4, desired shape has size 1'):
@ -103,6 +105,29 @@ class PeriodicResampleTest(test_util.TensorFlowTestCase):
'4, to be the same as the length of the desired shape, 3'):
periodic_resample(input_tensor, [None, 4, 4]).eval()
def testPeriodicResampleGradient(self):
desired_shape = numpy.array([4, 4, None])
result_shape = (4, 4, 1)
input_shape = (2, 2, 4)
with self.test_session() as sess:
x = array_ops.placeholder(dtypes.float32, shape=input_shape)
output = periodic_resample(x, desired_shape)
error = gradient_checker.compute_gradient_error(
x, input_shape, output, result_shape)
self.assertLess(error, 1e-4)
def testPeriodicResampleShapeInference(self):
with self.test_session() as sess:
# Case 1: output shape can be fully inferred.
x = array_ops.placeholder(dtypes.float32, shape=(2, 2, 4))
output = periodic_resample(x, [4, 4, None])
self.assertEqual(output.shape, [4, 4, 1])
# Case 2: output shape cannot be inferred - report desired shape.
x = array_ops.placeholder(dtypes.float32, shape=(2, 2, None))
output = periodic_resample(x, [4, 4, None])
self.assertTrue(output.shape.is_compatible_with([4, 4, None]))
self.assertEqual(output.shape[2].value, None)
if __name__ == '__main__':
googletest.main()

View File

@ -21,11 +21,17 @@ from __future__ import print_function
# pylint: disable=unused-import
from tensorflow.contrib.periodic_resample.python.ops import gen_periodic_resample_op
from tensorflow.contrib.periodic_resample.python.ops.gen_periodic_resample_op import periodic_resample
from tensorflow.contrib.periodic_resample.python.ops.gen_periodic_resample_op import periodic_resample, periodic_resample_op_grad
from tensorflow.contrib.util import loader
from tensorflow.python.framework import ops
from tensorflow.python.platform import resource_loader
# pylint: enable=unused-import
_periodic_resample_op = loader.load_op_library(
resource_loader.get_path_to_datafile('_periodic_resample_op.so'))
@ops.RegisterGradient("PeriodicResample")
def _periodic_resample_grad_cc(op, grad):
return periodic_resample_op_grad(
grad, op.inputs[0].shape, op.get_attr('shape'))
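
With the gradient registered, the op composes with automatic differentiation. A hedged usage sketch (TF 1.x graph mode):

```python
import tensorflow as tf
from tensorflow.contrib.periodic_resample import periodic_resample

x = tf.placeholder(tf.float32, shape=(2, 2, 4))
y = periodic_resample(x, [4, 4, None])
# Backprop now routes through the PeriodicResampleOpGrad kernel.
dy_dx = tf.gradients(y, x)[0]
```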

View File

@ -34,7 +34,8 @@ class ContribEstimatorPredictor(predictor.Predictor):
prediction_input_fn,
input_alternative_key=None,
output_alternative_key=None,
graph=None):
graph=None,
config=None):
"""Initialize a `ContribEstimatorPredictor`.
Args:
@ -48,6 +49,7 @@ class ContribEstimatorPredictor(predictor.Predictor):
multi-headed models.
graph: Optional. The Tensorflow `graph` in which prediction should be
done.
config: `ConfigProto` proto used to configure the session.
"""
self._graph = graph or ops.Graph()
with self._graph.as_default():
@ -58,6 +60,7 @@ class ContribEstimatorPredictor(predictor.Predictor):
checkpoint_path = saver.latest_checkpoint(estimator.model_dir)
self._session = monitored_session.MonitoredSession(
session_creator=monitored_session.ChiefSessionCreator(
config=config,
checkpoint_filename_with_path=checkpoint_path))
input_alternative_key = (

View File

@ -51,7 +51,8 @@ class CoreEstimatorPredictor(predictor.Predictor):
estimator,
serving_input_receiver_fn,
output_key=None,
graph=None):
graph=None,
config=None):
"""Initialize a `CoreEstimatorPredictor`.
Args:
@ -62,6 +63,7 @@ class CoreEstimatorPredictor(predictor.Predictor):
`None`, then `DEFAULT_SERVING_SIGNATURE_DEF_KEY` is used.
graph: Optional. The Tensorflow `graph` in which prediction should be
done.
config: `ConfigProto` proto used to configure the session.
"""
self._graph = graph or ops.Graph()
with self._graph.as_default():
@ -71,6 +73,7 @@ class CoreEstimatorPredictor(predictor.Predictor):
checkpoint_dir = estimator.model_dir
self._session = monitored_session.MonitoredSession(
session_creator=monitored_session.ChiefSessionCreator(
config=config,
checkpoint_dir=checkpoint_dir))
feed_tensor_info = signature_def.inputs

View File

@ -30,7 +30,8 @@ def from_contrib_estimator(estimator,
prediction_input_fn,
input_alternative_key=None,
output_alternative_key=None,
graph=None):
graph=None,
config=None):
"""Constructs a `Predictor` from a `tf.contrib.learn.Estimator`.
Args:
@ -44,6 +45,7 @@ def from_contrib_estimator(estimator,
multi-headed models.
graph: Optional. The Tensorflow `graph` in which prediction should be
done.
config: `ConfigProto` proto used to configure the session.
Returns:
An initialized `Predictor`.
@ -62,13 +64,15 @@ def from_contrib_estimator(estimator,
prediction_input_fn,
input_alternative_key=input_alternative_key,
output_alternative_key=output_alternative_key,
graph=graph)
graph=graph,
config=config)
def from_estimator(estimator,
serving_input_receiver_fn,
output_key=None,
graph=None):
graph=None,
config=None):
"""Constructs a `Predictor` from a `tf.python.estimator.Estimator`.
Args:
@ -79,6 +83,7 @@ def from_estimator(estimator,
`None`, then `DEFAULT_SERVING_SIGNATURE_DEF_KEY` is used.
graph: Optional. The Tensorflow `graph` in which prediction should be
done.
config: `ConfigProto` proto used to configure the session.
Returns:
An initialized `Predictor`.
@ -93,14 +98,19 @@ def from_estimator(estimator,
'tf.contrib.learn.Estimator. You likely want to call '
'from_contrib_estimator.')
return core_estimator_predictor.CoreEstimatorPredictor(
estimator, serving_input_receiver_fn, output_key=output_key, graph=graph)
estimator,
serving_input_receiver_fn,
output_key=output_key,
graph=graph,
config=config)
def from_saved_model(export_dir,
signature_def_key=None,
signature_def=None,
tags=None,
graph=None):
graph=None,
config=None):
"""Constructs a `Predictor` from a `SavedModel` on disk.
Args:
@ -115,6 +125,7 @@ def from_saved_model(export_dir,
`SignatureDef`. Defaults to `DEFAULT_TAGS`.
graph: Optional. The Tensorflow `graph` in which prediction should be
done.
config: `ConfigProto` proto used to configure the session.
Returns:
An initialized `Predictor`.
@ -128,4 +139,5 @@ def from_saved_model(export_dir,
signature_def_key=signature_def_key,
signature_def=signature_def,
tags=tags,
graph=graph)
graph=graph,
config=config)
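
A small end-to-end sketch of the new `config` argument; the export path and memory fraction are illustrative, not from the commit:

```python
from tensorflow.contrib import predictor
from tensorflow.core.protobuf import config_pb2

session_config = config_pb2.ConfigProto()
session_config.gpu_options.per_process_gpu_memory_fraction = 0.5

pred = predictor.from_saved_model('/tmp/export_dir', config=session_config)
```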

View File

@ -20,6 +20,7 @@ from __future__ import print_function
from tensorflow.contrib.predictor import predictor_factories
from tensorflow.contrib.predictor import testing_common
from tensorflow.core.protobuf import config_pb2
from tensorflow.python.platform import test
MODEL_DIR_NAME = 'contrib/predictor/test_export_dir'
@ -41,6 +42,11 @@ class PredictorFactoriesTest(test.TestCase):
"""Test loading from_saved_model with tags."""
predictor_factories.from_saved_model(self._export_dir, tags='serve')
def testFromSavedModelWithSessionConfig(self):
"""Test loading from_saved_model with session config."""
predictor_factories.from_saved_model(
self._export_dir, config=config_pb2.ConfigProto())
def testFromSavedModelWithBadTags(self):
"""Test that loading fails for bad tags."""
bad_tags_regex = ('.*? could not be found in SavedModel')
@ -53,6 +59,13 @@ class PredictorFactoriesTest(test.TestCase):
predictor_factories.from_contrib_estimator(
estimator, input_fn, output_alternative_key='sum')
def testFromContribEstimatorWithSessionConfig(self):
estimator = testing_common.get_arithmetic_estimator(core=False)
input_fn = testing_common.get_arithmetic_input_fn(core=False)
predictor_factories.from_contrib_estimator(
estimator, input_fn, output_alternative_key='sum',
config=config_pb2.ConfigProto())
def testFromContribEstimatorWithCoreEstimatorRaises(self):
estimator = testing_common.get_arithmetic_estimator(core=True)
input_fn = testing_common.get_arithmetic_input_fn(core=True)
@ -64,6 +77,12 @@ class PredictorFactoriesTest(test.TestCase):
input_fn = testing_common.get_arithmetic_input_fn(core=True)
predictor_factories.from_estimator(estimator, input_fn)
def testFromCoreEstimatorWithSessionConfig(self):
estimator = testing_common.get_arithmetic_estimator(core=True)
input_fn = testing_common.get_arithmetic_input_fn(core=True)
predictor_factories.from_estimator(
estimator, input_fn, config=config_pb2.ConfigProto())
def testFromCoreEstimatorWithContribEstimatorRaises(self):
estimator = testing_common.get_arithmetic_estimator(core=False)
input_fn = testing_common.get_arithmetic_input_fn(core=False)

View File

@ -121,7 +121,8 @@ class SavedModelPredictor(predictor.Predictor):
input_names=None,
output_names=None,
tags=None,
graph=None):
graph=None,
config=None):
"""Initialize a `CoreEstimatorPredictor`.
Args:
@ -142,6 +143,7 @@ class SavedModelPredictor(predictor.Predictor):
the correct `SignatureDef`. Defaults to `DEFAULT_TAGS`.
graph: Optional. The Tensorflow `graph` in which prediction should be
done.
config: `ConfigProto` proto used to configure the session.
Raises:
ValueError: If more than one of signature_def_key OR signature_def OR
(input_names AND output_names) is specified.
@ -152,7 +154,7 @@ class SavedModelPredictor(predictor.Predictor):
self._graph = graph or ops.Graph()
with self._graph.as_default():
self._session = session.Session()
self._session = session.Session(config=config)
loader.load(self._session, tags.split(','), export_dir)
if input_names is None:

View File

@ -6,7 +6,7 @@ inference. The details of the transformation implemented in this package is
described here [1].
This is done using the
[fake quantization op](https://www.tensorflow.org/versions/r0.12/api_docs/python/array_ops/fake_quantization).
[fake quantization op](https://www.tensorflow.org/api_guides/python/array_ops#Fake_quantization).
Literature has shown that fixed point networks provide comparable performance to
floating point networks [2]. This is achieved by modeling the quantization

View File

@ -26,7 +26,6 @@ import time
import numpy as np
from tensorflow.contrib.framework.python.ops import variables as variables_lib
from tensorflow.contrib.metrics.python.ops import metric_ops
from tensorflow.contrib.slim.python.slim import evaluation
from tensorflow.contrib.training.python.training import evaluation as evaluation_lib
from tensorflow.core.protobuf import saver_pb2
@ -37,6 +36,7 @@ from tensorflow.python.framework import dtypes
from tensorflow.python.framework import errors
from tensorflow.python.ops import control_flow_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import metrics
from tensorflow.python.ops import variables
from tensorflow.python.platform import flags
from tensorflow.python.platform import gfile
@ -89,8 +89,8 @@ class EvaluationTest(test.TestCase):
self._predictions, self._scale = TestModel(self._inputs)
def testFinalOpsOnEvaluationLoop(self):
value_op, update_op = metric_ops.streaming_accuracy(self._predictions,
self._labels)
value_op, update_op = metrics.accuracy(
labels=self._labels, predictions=self._predictions)
init_op = control_flow_ops.group(variables.global_variables_initializer(),
variables.local_variables_initializer())
# Create checkpoint and log directories:
@ -136,9 +136,10 @@ class EvaluationTest(test.TestCase):
self.assertTrue(obj.hook_was_run)
def _create_names_to_metrics(self, predictions, labels):
accuracy0, update_op0 = metric_ops.streaming_accuracy(predictions, labels)
accuracy1, update_op1 = metric_ops.streaming_accuracy(predictions + 1,
labels)
accuracy0, update_op0 = metrics.accuracy(
labels=labels, predictions=predictions)
accuracy1, update_op1 = metrics.accuracy(
labels=labels, predictions=predictions + 1)
names_to_values = {'Accuracy': accuracy0, 'Another_accuracy': accuracy1}
names_to_updates = {'Accuracy': update_op0, 'Another_accuracy': update_op1}
@ -198,8 +199,8 @@ class EvaluationTest(test.TestCase):
predictions_limited = input.limit_epochs(self._predictions, num_epochs=1)
labels_limited = input.limit_epochs(self._labels, num_epochs=1)
value_op, update_op = metric_ops.streaming_accuracy(
predictions_limited, labels_limited)
value_op, update_op = metrics.accuracy(
labels=labels_limited, predictions=predictions_limited)
init_op = control_flow_ops.group(variables.global_variables_initializer(),
variables.local_variables_initializer())
@ -260,8 +261,8 @@ class SingleEvaluationTest(test.TestCase):
self._prepareCheckpoint(checkpoint_path)
# Next, determine the metric to evaluate:
value_op, update_op = metric_ops.streaming_accuracy(self._predictions,
self._labels)
value_op, update_op = metrics.accuracy(
labels=self._labels, predictions=self._predictions)
# Run the evaluation and verify the results:
accuracy_value = evaluation.evaluate_once(
@ -276,8 +277,8 @@ class SingleEvaluationTest(test.TestCase):
self._prepareCheckpoint(checkpoint_path)
# Next, determine the metric to evaluate:
value_op, update_op = metric_ops.streaming_accuracy(self._predictions,
self._labels)
value_op, update_op = metrics.accuracy(
labels=self._labels, predictions=self._predictions)
dumping_root = os.path.join(self.get_temp_dir(), 'tfdbg_dump_dir')
dumping_hook = hooks.DumpingDebugHook(dumping_root, log_usage=False)
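
Every change in this file follows the same substitution; stated once as a runnable sketch (TF 1.x graph mode, values illustrative):

```python
import tensorflow as tf

labels = tf.constant([1, 0, 1])
predictions = tf.constant([1, 1, 1])

# Core replacement for metric_ops.streaming_accuracy(predictions, labels);
# note the keyword arguments and the swapped order.
value_op, update_op = tf.metrics.accuracy(labels=labels, predictions=predictions)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())  # metric state lives in local vars
    sess.run(update_op)
    print(sess.run(value_op))  # 2/3
```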

View File

@ -21,6 +21,7 @@ from @{tf.summary.merge_all} to @{tf.summary.FileWriter}.
To use with eager execution enabled, write your code as follows:
```python
global_step = tf.train.get_or_create_global_step()
summary_writer = tf.contrib.summary.create_file_writer(
train_dir, flush_millis=10000)
@ -30,9 +31,11 @@ with summary_writer.as_default(), tf.contrib.summary.always_record_summaries():
tf.contrib.summary.scalar("loss", my_loss)
# In this case every call to tf.contrib.summary.scalar will generate a record
# ...
```
To use it with graph execution, write your code as follows:
```python
global_step = tf.train.get_or_create_global_step()
summary_writer = tf.contrib.summary.create_file_writer(
train_dir, flush_millis=10000)
@ -53,7 +56,7 @@ with tf.Session(...) as sess:
while not_done_training:
sess.run([train_op, tf.contrib.summary.all_summary_ops()])
# ...
```
"""
from __future__ import absolute_import

View File

@ -21,10 +21,10 @@ import numpy as np
from tensorflow.contrib import losses
from tensorflow.contrib.learn.python.learn.estimators import prediction_key
from tensorflow.contrib.metrics.python.ops import metric_ops
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import metrics
from tensorflow.python.ops import nn
INFERENCE_PROB_NAME = prediction_key.PredictionKey.PROBABILITIES
@ -38,12 +38,13 @@ def _top_k_generator(k):
targets = math_ops.to_int32(targets)
if targets.get_shape().ndims > 1:
targets = array_ops.squeeze(targets, axis=[1])
return metric_ops.streaming_mean(nn.in_top_k(probabilities, targets, k))
return metrics.mean(nn.in_top_k(probabilities, targets, k))
return _top_k
def _accuracy(predictions, targets, weights=None):
return metric_ops.streaming_accuracy(predictions, targets, weights=weights)
return metrics.accuracy(
labels=targets, predictions=predictions, weights=weights)
def _r2(probabilities, targets, weights=None):
@ -53,7 +54,7 @@ def _r2(probabilities, targets, weights=None):
squares_residuals = math_ops.reduce_sum(
math_ops.square(targets - probabilities), 0)
score = 1 - math_ops.reduce_sum(squares_residuals / squares_total)
return metric_ops.streaming_mean(score, weights=weights)
return metrics.mean(score, weights=weights)
def _squeeze_and_onehot(targets, depth):
@ -62,7 +63,7 @@ def _squeeze_and_onehot(targets, depth):
def _sigmoid_entropy(probabilities, targets, weights=None):
return metric_ops.streaming_mean(
return metrics.mean(
losses.sigmoid_cross_entropy(probabilities,
_squeeze_and_onehot(
targets,
@ -71,7 +72,7 @@ def _sigmoid_entropy(probabilities, targets, weights=None):
def _softmax_entropy(probabilities, targets, weights=None):
return metric_ops.streaming_mean(
return metrics.mean(
losses.sparse_softmax_cross_entropy(probabilities,
math_ops.to_int32(targets)),
weights=weights)
@ -82,7 +83,7 @@ def _predictions(predictions, unused_targets, **unused_kwargs):
def _class_log_loss(probabilities, targets, weights=None):
return metric_ops.streaming_mean(
return metrics.mean(
losses.log_loss(probabilities,
_squeeze_and_onehot(targets,
array_ops.shape(probabilities)[1])),
@ -90,34 +91,36 @@ def _class_log_loss(probabilities, targets, weights=None):
def _precision(predictions, targets, weights=None):
return metric_ops.streaming_precision(predictions, targets, weights=weights)
return metrics.precision(
labels=targets, predictions=predictions, weights=weights)
def _precision_at_thresholds(predictions, targets, weights=None):
return metric_ops.streaming_precision_at_thresholds(
array_ops.slice(predictions, [0, 1], [-1, 1]),
targets,
np.arange(
0, 1, 0.01, dtype=np.float32),
return metrics.precision_at_thresholds(
labels=targets,
predictions=array_ops.slice(predictions, [0, 1], [-1, 1]),
thresholds=np.arange(0, 1, 0.01, dtype=np.float32),
weights=weights)
def _recall(predictions, targets, weights=None):
return metric_ops.streaming_recall(predictions, targets, weights=weights)
return metrics.recall(
labels=targets, predictions=predictions, weights=weights)
def _recall_at_thresholds(predictions, targets, weights=None):
return metric_ops.streaming_recall_at_thresholds(
array_ops.slice(predictions, [0, 1], [-1, 1]),
targets,
np.arange(
0, 1, 0.01, dtype=np.float32),
return metrics.recall_at_thresholds(
labels=targets,
predictions=array_ops.slice(predictions, [0, 1], [-1, 1]),
thresholds=np.arange(0, 1, 0.01, dtype=np.float32),
weights=weights)
def _auc(probs, targets, weights=None):
return metric_ops.streaming_auc(array_ops.slice(probs, [0, 1], [-1, 1]),
targets, weights=weights)
return metrics.auc(
labels=targets,
predictions=array_ops.slice(probs, [0, 1], [-1, 1]),
weights=weights)
_EVAL_METRICS = {

View File

@ -295,7 +295,7 @@ def get_epoch_variable():
# A simple container to hold the training variables for a single tree.
class TreeTrainingVariables(object):
class TreeVariables(object):
"""Stores tf.Variables for training a single random tree.
Uses tf.get_variable to get tree-specific names so that this can be used
@ -303,7 +303,7 @@ class TreeTrainingVariables(object):
then relies on restoring that model to evaluate).
"""
def __init__(self, params, tree_num, training):
def __init__(self, params, tree_num, training, tree_config='', tree_stat=''):
if (not hasattr(params, 'params_proto') or
not isinstance(params.params_proto,
_params_proto.TensorForestParams)):
@ -315,27 +315,28 @@ class TreeTrainingVariables(object):
# TODO(gilberth): Manually shard this to be able to fit it on
# multiple machines.
self.stats = stats_ops.fertile_stats_variable(
params, '', self.get_tree_name('stats', tree_num))
params, tree_stat, self.get_tree_name('stats', tree_num))
self.tree = model_ops.tree_variable(
params, '', self.stats, self.get_tree_name('tree', tree_num))
params, tree_config, self.stats, self.get_tree_name('tree', tree_num))
def get_tree_name(self, name, num):
return '{0}-{1}'.format(name, num)
class ForestTrainingVariables(object):
class ForestVariables(object):
"""A container for a forests training data, consisting of multiple trees.
Instantiates a TreeTrainingVariables object for each tree. We override the
Instantiates a TreeVariables object for each tree. We override the
__getitem__ and __setitem__ function so that usage looks like this:
forest_variables = ForestTrainingVariables(params)
forest_variables = ForestVariables(params)
... forest_variables.tree ...
"""
def __init__(self, params, device_assigner, training=True,
tree_variables_class=TreeTrainingVariables):
tree_variables_class=TreeVariables,
tree_configs=None, tree_stats=None):
self.variables = []
# Set up some scalar variables to run through the device assigner, then
# we can use those to colocate everything related to a tree.
@ -347,7 +348,13 @@ class ForestTrainingVariables(object):
for i in range(params.num_trees):
with ops.device(self.device_dummies[i].device):
self.variables.append(tree_variables_class(params, i, training))
kwargs = {}
if tree_configs is not None:
kwargs.update(dict(tree_config=tree_configs[i]))
if tree_stats is not None:
kwargs.update(dict(tree_stat=tree_stats[i]))
self.variables.append(tree_variables_class(
params, i, training, **kwargs))
def __setitem__(self, t, val):
self.variables[t] = val
@ -361,9 +368,11 @@ class RandomForestGraphs(object):
def __init__(self,
params,
tree_configs=None,
tree_stats=None,
device_assigner=None,
variables=None,
tree_variables_class=TreeTrainingVariables,
tree_variables_class=TreeVariables,
tree_graphs=None,
training=True):
self.params = params
@ -371,9 +380,10 @@ class RandomForestGraphs(object):
device_assigner or framework_variables.VariableDeviceChooser())
logging.info('Constructing forest with params = ')
logging.info(self.params.__dict__)
self.variables = variables or ForestTrainingVariables(
self.variables = variables or ForestVariables(
self.params, device_assigner=self.device_assigner, training=training,
tree_variables_class=tree_variables_class)
tree_variables_class=tree_variables_class,
tree_configs=tree_configs, tree_stats=tree_stats)
tree_graph_class = tree_graphs or RandomTreeGraphs
self.trees = [
tree_graph_class(self.variables[i], self.params, i)

View File

@ -18,10 +18,14 @@ from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from google.protobuf.json_format import ParseDict
from tensorflow.contrib.decision_trees.proto import generic_tree_model_pb2 as _tree_proto
from tensorflow.contrib.tensor_forest.python import tensor_forest
from tensorflow.python.framework import ops
from tensorflow.python.framework import sparse_tensor
from tensorflow.python.framework import test_util
from tensorflow.python.ops import resources
from tensorflow.python.ops import variables
from tensorflow.python.platform import googletest
@ -110,6 +114,47 @@ class TensorForestTest(test_util.TensorFlowTestCase):
self.assertTrue(isinstance(paths, ops.Tensor))
self.assertTrue(isinstance(var, ops.Tensor))
def testInferenceFromRestoredModel(self):
input_data = [[-1., 0.], [-1., 2.], # node 1
[1., 0.], [1., -2.]] # node 2
expected_prediction = [[0.0, 1.0], [0.0, 1.0],
[0.0, 1.0], [0.0, 1.0]]
hparams = tensor_forest.ForestHParams(
num_classes=2,
num_features=2,
num_trees=1,
max_nodes=1000,
split_after_samples=25).fill()
tree_weight = {'decisionTree':
{'nodes':
[{'binaryNode':
{'rightChildId': 2,
'leftChildId': 1,
'inequalityLeftChildTest':
{'featureId': {'id': '0'},
'threshold': {'floatValue': 0}}}},
{'leaf': {'vector':
{'value': [{'floatValue': 0.0},
{'floatValue': 1.0}]}},
'nodeId': 1},
{'leaf': {'vector':
{'value': [{'floatValue': 0.0},
{'floatValue': 1.0}]}},
'nodeId': 2}]}}
restored_tree_param = ParseDict(tree_weight,
_tree_proto.Model()).SerializeToString()
graph_builder = tensor_forest.RandomForestGraphs(hparams,
[restored_tree_param])
probs, paths, var = graph_builder.inference_graph(input_data)
self.assertTrue(isinstance(probs, ops.Tensor))
self.assertTrue(isinstance(paths, ops.Tensor))
self.assertTrue(isinstance(var, ops.Tensor))
with self.test_session():
variables.global_variables_initializer().run()
resources.initialize_resources(resources.shared_resources()).run()
self.assertEquals(probs.eval().shape, (4, 2))
self.assertEquals(probs.eval().tolist(), expected_prediction)
def testTrainingConstructionClassificationSparse(self):
input_data = sparse_tensor.SparseTensor(
indices=[[0, 0], [0, 3], [1, 0], [1, 7], [2, 1], [3, 9]],

View File

@ -91,8 +91,11 @@ void GetSubGraphIncomingEdges(const tensorflow::Graph& graph,
if (!subgraph_node_ids.count(edge->src()->id()) &&
!edge->src()->IsSource() && !edge->IsControlEdge()) {
incoming_edges->insert(edge);
VLOG(2) << "INCOMING " << edge->src()->name() << " -> " << node->name()
<< " Y, ";
} else {
VLOG(2) << node->name() << " -> " << edge->src()->name() << " N, ";
VLOG(2) << "INCOMING " << edge->src()->name() << " -> " << node->name()
<< " N, ";
}
}
}
@ -106,10 +109,12 @@ void GetSubGraphOutgoingEdges(const tensorflow::Graph& graph,
for (const tensorflow::Edge* edge : node->out_edges()) {
if (!subgraph_node_ids.count(edge->dst()->id()) &&
!edge->dst()->IsSink() && !edge->IsControlEdge()) {
VLOG(2) << node->name() << " -> " << edge->dst()->name() << " Y, ";
VLOG(2) << "OUTGOING " << node->name() << " -> " << edge->dst()->name()
<< " Y, ";
outgoing_edges->insert(edge);
} else {
VLOG(2) << node->name() << " -> " << edge->dst()->name() << " N, ";
VLOG(2) << "OUTGOING " << node->name() << " -> " << edge->dst()->name()
<< " N, ";
}
}
}
@ -181,29 +186,27 @@ struct ConvertGraphParams {
static tensorflow::Status FillSubGraphEdgeSets(ConvertGraphParams* p) {
GetSubGraphIncomingEdges(p->graph, p->subgraph_node_ids,
&p->subgraph_incoming_edges);
std::set<std::pair<int, int>> unique_tensors;
// Add only unique input source nodes. If the output of an outside node is
// shared between multiple nodes inside the engine, only one edge should be
// created.
for (const tensorflow::Edge* edge : p->subgraph_incoming_edges) {
p->subgraph_inputs.push_back({edge->src()->id(), edge->src_output()});
}
auto output_name_to_index_map = BuildTensorNameMap(p->output_names);
std::set<std::pair<int, int>> subgraph_outputs_set;
// Collect outputs referenced from output_names
for (int node_id : p->subgraph_node_ids) {
tensorflow::Node* node = p->graph.FindNodeId(node_id);
if (output_name_to_index_map.count(node->name())) {
for (int index : output_name_to_index_map.at(node->name())) {
subgraph_outputs_set.insert({node_id, index});
}
}
unique_tensors.insert({edge->src()->id(), edge->src_output()});
}
p->subgraph_inputs.insert(p->subgraph_inputs.begin(), unique_tensors.begin(),
unique_tensors.end());
GetSubGraphOutgoingEdges(p->graph, p->subgraph_node_ids,
&p->subgraph_outgoing_edges);
unique_tensors.clear();
// Similar to the above: if multiple outside nodes share the output of an
// internal node, only one output port should be created and shared between
// the outputs.
for (const tensorflow::Edge* edge : p->subgraph_outgoing_edges) {
subgraph_outputs_set.insert({edge->src()->id(), edge->src_output()});
unique_tensors.insert({edge->src()->id(), edge->src_output()});
}
p->subgraph_outputs.reserve(subgraph_outputs_set.size());
p->subgraph_outputs.reserve(unique_tensors.size());
p->subgraph_outputs.insert(p->subgraph_outputs.begin(),
subgraph_outputs_set.begin(),
subgraph_outputs_set.end());
unique_tensors.begin(), unique_tensors.end());
return tensorflow::Status::OK();
}
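
The dedup rule amounts to: every distinct (source node, source port) pair becomes exactly one engine input or output, however many edges reference it. A toy illustration:

```python
# Three incoming edges, two of which read the same outside tensor,
# collapse to two engine inputs (values illustrative).
incoming_edges = [('conv0', 0), ('conv0', 0), ('relu1', 0)]
unique_inputs = sorted(set(incoming_edges))
assert unique_inputs == [('conv0', 0), ('relu1', 0)]
```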
@ -225,7 +228,6 @@ tensorflow::Status GetCalibNode(ConvertGraphParams* params) {
for (auto in_edge :
params->subgraph_incoming_edges) { // loop over incoming edges and
// attach them to calib node
// tensorflow::Node* src_node = in_edge->src();
auto src_output = in_edge->src_output();
auto dst_node = in_edge->dst();
auto dst_input = in_edge->dst_input();
@ -257,19 +259,24 @@ tensorflow::Status ConvertSubGraphToTensorRT(ConvertGraphParams* params) {
for (size_t i = 0; i < params->subgraph_inputs.size(); ++i) {
subgraph_edge_to_input_map.insert({params->subgraph_inputs.at(i), i});
}
std::set<std::pair<int, int>> unique_tensors;
for (const tensorflow::Edge* edge : params->subgraph_incoming_edges) {
std::pair<int, int> old_src = {edge->src()->id(), edge->src_output()};
if (unique_tensors.count(old_src)) continue;
unique_tensors.insert(old_src);
int new_src_output = subgraph_edge_to_input_map.at(old_src);
params->graph.AddEdge(edge->src(), edge->src_output(), trt_node,
new_src_output);
VLOG(1) << "Wire " << edge->src()->name() << ":" << edge->src_output()
<< " -> " << trt_node->name() << ":" << new_src_output;
params->graph.RemoveEdge(edge);
}
VLOG(2) << "new wiring edges: " << trt_node->in_edges().size();
for (const tensorflow::Edge* edge : trt_node->in_edges()) {
VLOG(2) << edge->src()->name() << " port: " << edge->src_output();
if (VLOG_IS_ON(2)) {
VLOG(2) << "new edge count: " << trt_node->in_edges().size();
for (const tensorflow::Edge* edge : trt_node->in_edges()) {
VLOG(2) << edge->src()->name() << " port: " << edge->src_output();
}
}
TF_RETURN_IF_ERROR(status);
// Re-map outgoing edges to use the new TRT node instead of the orig subgraph
@ -283,6 +290,8 @@ tensorflow::Status ConvertSubGraphToTensorRT(ConvertGraphParams* params) {
int new_src_output = subgraph_edge_to_output_map.at(old_src);
TF_RETURN_IF_ERROR(params->graph.UpdateEdge(
trt_node, new_src_output, edge->dst(), edge->dst_input()));
VLOG(1) << "Wire " << trt_node->name() << ":" << new_src_output << " -> "
<< edge->dst()->name() << ":" << edge->dst_input();
}
// Remove the original subgraph
for (int node_id : params->subgraph_node_ids) {
@ -317,9 +326,12 @@ tensorflow::Status ConvertCalibGraphToInferGraph(
tensorflow::GraphConstructorOptions(), graph_def, &graph));
// get calib nodes
std::vector<tensorflow::Node*> calib_nodes;
for (auto node : graph.op_nodes()) {
std::vector<tensorflow::Node*> topo_order;
tensorflow::GetPostOrder(graph, &topo_order);
for (auto rit = topo_order.rbegin(); rit != topo_order.rend(); ++rit) {
auto node = *rit;
if (node->type_string() == "TRTCalibOp") {
VLOG(1) << "Found Calib Node";
VLOG(1) << "Found Calib Node " << node->name();
calib_nodes.push_back(node);
}
}

View File

@ -362,10 +362,11 @@ void ReorderCKtoKC(const TRT_ShapedWeights& iweights,
break;
}
case tensorflow::DataType::DT_HALF: {
Reorder2({k, c}, static_cast<Eigen::half const*>(iweights.GetValues()),
istrides, static_cast<Eigen::half*>(
const_cast<void*>(oweights->GetValues())),
ostrides);
Reorder2(
{k, c}, static_cast<Eigen::half const*>(iweights.GetValues()),
istrides,
static_cast<Eigen::half*>(const_cast<void*>(oweights->GetValues())),
ostrides);
break;
}
default:
@ -1179,9 +1180,9 @@ tensorflow::Status BinaryTensorOpTensor(
CHECK_EQ_TYPE(tensor_r->getType(), dtype);
auto op_pair = ops.find(node_def.op());
if (op_pair == ops.end())
return tensorflow::errors::Unimplemented("binary op: " + node_def.op() +
" not supported at: " +
node_def.name());
return tensorflow::errors::Unimplemented(
"binary op: " + node_def.op() +
" not supported at: " + node_def.name());
nvinfer1::IElementWiseLayer* layer = ctx.network()->addElementWise(
*const_cast<nvinfer1::ITensor*>(tensor_l),
@ -2138,9 +2139,7 @@ void Converter::register_op_converters() {
}
} // namespace
tensorflow::Status GetTensorRTGraph(tensorrt::convert::SubGraphParams& s) {
return tensorflow::errors::Unimplemented("Not implemented yet");
}
tensorflow::Status ConvertCalibrationNodeToEngineNode(
tensorflow::Graph& graph, tensorflow::Node* c_node) {
const auto ndef = c_node->def();
@ -2164,9 +2163,23 @@ tensorflow::Status ConvertCalibrationNodeToEngineNode(
for (auto n : graph.op_nodes()) {
node_maps.insert({n->name(), n});
}
std::set<int> subgraph_ids;
for (const auto internal_node : segment_nodes) {
subgraph_ids.insert(node_maps.at(internal_node)->id());
}
if (VLOG_IS_ON(2)) {
string node_names = StrCat(c_node->name(), " segment nodes= ");
for (const auto& node_name : segment_nodes) {
StrAppend(&node_names, node_name, ", ");
}
VLOG(2) << node_names;
}
VLOG(1) << "Output Nodes:";
std::vector<tensorflow::DataType> out_types;
std::vector<const tensorflow::Edge*> out_edges;
for (auto& i : output_nodes) {
auto node_port = tensorflow::str_util::Split(i, ":");
VLOG(1) << " " << i << " in graph " << node_maps.count(i);
@ -2186,18 +2199,24 @@ tensorflow::Status ConvertCalibrationNodeToEngineNode(
out_types.push_back(out_node->output_type(0));
}
for (auto out_edge : out_node->out_edges()) {
if (subgraph_ids.count(out_edge->dst()->id()))
continue; // skip internal edges;
if (out_edge->src_output() == port) {
out_edges.push_back(out_edge);
break;
VLOG(1) << "OUTPUT EDGE " << out_edge->src()->name() << ":"
<< out_edge->src_output() << " -> " << out_edge->dst()->name()
<< ":" << out_edge->dst_input();
}
}
} else {
LOG(WARNING) << " couldn't find output node " << out_node_name;
}
}
VLOG(1) << "Input Nodes:";
for (auto& i : input_names) {
VLOG(1) << " " << i << " in graph " << node_maps.count(i);
if (VLOG_IS_ON(1)) {
VLOG(1) << c_node->name() << " Input Nodes:";
for (auto& i : input_names) {
VLOG(1) << " Input " << i << " in graph " << node_maps.count(i);
}
}
auto trt_rm = tensorflow::tensorrt::TRTResourceManager::instance();
auto resmgr = trt_rm->getManager("TRTCalibOps");
@ -2231,14 +2250,24 @@ tensorflow::Status ConvertCalibrationNodeToEngineNode(
calib_res->builder_ = nullptr;
tensorflow::NodeDefBuilder op_builder(engine_name, "TRTEngineOp");
std::vector<tensorflow::NodeDefBuilder::NodeOut> income_edges;
income_edges.resize(c_node->num_inputs());
for (const auto in_edge : c_node->in_edges()) {
auto src = in_edge->src();
int dest_port = in_edge->dst_input();
income_edges.emplace_back(src->name(), in_edge->src_output(),
c_node->input_type(dest_port));
VLOG(1) << "Incoming connection " << src->name() << ":"
<< in_edge->src_output() << " -> " << c_node->name() << ":"
<< dest_port;
income_edges.at(dest_port) = {src->name(), in_edge->src_output(),
c_node->input_type(dest_port)};
}
tensorflow::gtl::ArraySlice<tensorflow::NodeDefBuilder::NodeOut> input_list(
income_edges);
if (VLOG_IS_ON(2)) {
for (const auto& inp : input_list) {
VLOG(2) << " Input from inputlist " << inp.node << ":" << inp.index << " "
<< tensorflow::DataTypeString(inp.data_type);
}
}
op_builder.Input(input_list);
tensorflow::NodeDef engine_node;
const char* engine_plan_data = static_cast<const char*>(engine_plan->data());
@ -2255,13 +2284,26 @@ tensorflow::Status ConvertCalibrationNodeToEngineNode(
}
auto trt_engine_node = graph.AddNode(engine_node, &status);
TF_RETURN_IF_ERROR(status);
for (size_t i = 0; i < out_edges.size(); i++) {
VLOG(1) << "Connecting trt_engine_node output " << i << " with "
<< out_edges.at(i)->dst()->name() << " port "
<< out_edges.at(i)->dst_input();
TF_RETURN_IF_ERROR(graph.UpdateEdge(trt_engine_node, i,
out_edges.at(i)->dst(),
out_edges.at(i)->dst_input()));
std::map<string, int> port_map;
for (size_t t = 0; t < output_nodes.size(); t++) {
port_map.insert({output_nodes.at(t), t});
}
for (auto& i : out_edges) {
string s(i->src()->name());
if (i->src_output()) StrAppend(&s, ":", i->src_output());
int out_port = port_map.at(s);
VLOG(1) << "Connecting " << trt_engine_node->name() << ":" << out_port
<< " -> " << i->dst()->name() << ":" << i->dst_input();
TF_RETURN_IF_ERROR(
graph.UpdateEdge(trt_engine_node, out_port, i->dst(), i->dst_input()));
}
for (const auto ed : trt_engine_node->in_edges()) {
VLOG(1) << "In Edge " << ed->src()->name() << ":" << ed->src_output()
<< " -> " << ed->dst()->name() << ":" << ed->dst_input();
}
for (const auto ed : trt_engine_node->out_edges()) {
VLOG(1) << "Out Edge " << ed->src()->name() << ":" << ed->src_output()
<< " -> " << ed->dst()->name() << ":" << ed->dst_input();
}
VLOG(1) << "Segment nodes:";
for (auto& i : segment_nodes) {
@ -2332,6 +2374,7 @@ tensorflow::Status ConvertSubgraph(
std::vector<string>* output_names,
std::vector<tensorflow::DataType>* output_dtypes,
const string& engine_name) {
std::set<string> added_tensors;
for (const std::pair<int, int>& input : s.input_inds) {
VLOG(2) << "parsing input. Node id= " << input.first;
int node_id = input.first;
@ -2374,7 +2417,6 @@ tensorflow::Status ConvertSubgraph(
auto op_info = op_info_vec.at(shape_inference_output_idx);
tensorflow::DataType tf_dtype = op_info.dtype();
input_dtypes->push_back(tf_dtype);
nvinfer1::DataType dtype(nvinfer1::DataType::kFLOAT);
auto type_status = ConvertDType(tf_dtype, &dtype);
@ -2410,8 +2452,10 @@ tensorflow::Status ConvertSubgraph(
if (output_idx != 0) {
input_tensor_name = StrCat(node_name, ":", output_idx);
}
if (added_tensors.count(input_tensor_name)) continue;
added_tensors.insert(input_tensor_name);
input_names->push_back(input_tensor_name);
input_dtypes->push_back(tf_dtype);
nvinfer1::ITensor* input_tensor = converter.network()->addInput(
input_tensor_name.c_str(), dtype, input_dim_pseudo_chw);
@ -2435,6 +2479,7 @@ tensorflow::Status ConvertSubgraph(
// Gather output metadata
int trt_engine_op_output_idx = 0;
added_tensors.clear();
for (const std::pair<int, int>& output : s.output_inds) {
int node_id = output.first;
int output_idx = output.second;
@ -2451,6 +2496,8 @@ tensorflow::Status ConvertSubgraph(
if (output_idx != 0)
tensorflow::strings::StrAppend(&tensor_name, ":", output_idx);
VLOG(2) << "Output tensor name: " << tensor_name;
if (added_tensors.count(tensor_name)) continue;
added_tensors.insert(tensor_name);
output_names->push_back(tensor_name);
auto tensor_or_weights = converter.get_tensor(tensor_name);
if (!tensor_or_weights.is_tensor()) {

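In the engine-wiring change above, each out-edge is matched to an engine output port by rebuilding the `name` or `name:port` key (the `:0` suffix is omitted for port 0) and looking it up in `port_map`. A small Python sketch of that keying scheme, with made-up node names:

```python
# Key format used for output lookup: "name" for port 0, "name:port" otherwise.
def tensor_key(name, port):
    return name if port == 0 else "%s:%d" % (name, port)

# Hypothetical engine outputs, in the order the engine exposes them.
output_nodes = ["conv", "relu:1"]
port_map = {name: idx for idx, name in enumerate(output_nodes)}

assert port_map[tensor_key("conv", 0)] == 0   # edge from conv:0
assert port_map[tensor_key("relu", 1)] == 1   # edge from relu:1
```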
View File

@ -166,11 +166,21 @@ def StreamingFilesDataset(files,
return remote_iterator.get_next()
def MapFn(unused_input):
return functional_ops.remote_call(
if isinstance(source_dataset.output_types, dtypes.DType):
output_types = [source_dataset.output_types]
elif isinstance(source_dataset.output_types, (list, tuple)):
output_types = source_dataset.output_types
else:
raise ValueError('source dataset has invalid output types')
remote_calls = functional_ops.remote_call(
args=[source_handle],
Tout=[dtypes.string],
Tout=output_types,
f=LoadingFunc,
target='/job:%s/replica:0/task:0/cpu:0' % file_reader_job)[0]
target='/job:%s/replica:0/task:0/cpu:0' % file_reader_job)
if len(remote_calls) == 1:
return remote_calls[0]
else:
return remote_calls
with ops.device('/job:%s' % worker_job):
output_dataset = dataset_ops.Dataset.range(2).repeat().map(

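The `StreamingFilesDataset` change above generalizes the remote call from a single string tensor to whatever `output_types` the source dataset declares, unwrapping the result list when it has exactly one element. A standalone sketch of that normalization (plain Python, with strings standing in for dtypes):

```python
# Normalize a dataset's output_types to the list form remote_call expects.
def normalize_output_types(output_types):
    if isinstance(output_types, str):           # stand-in for a single DType
        return [output_types]
    if isinstance(output_types, (list, tuple)):
        return list(output_types)
    raise ValueError('source dataset has invalid output types')

# remote_call returns a list of tensors; a single component is unwrapped.
def unwrap(results):
    return results[0] if len(results) == 1 else results

assert normalize_output_types('string') == ['string']
assert normalize_output_types(('int64', 'int64')) == ['int64', 'int64']
assert unwrap(['t0']) == 't0'
assert unwrap(['t0', 't1']) == ['t0', 't1']
```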
View File

@ -26,6 +26,8 @@ from tensorflow.core.protobuf import config_pb2
from tensorflow.python.client import session
from tensorflow.python.data.ops import dataset_ops
from tensorflow.python.data.ops import readers
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import tensor_shape
from tensorflow.python.lib.io import python_io
from tensorflow.python.platform import test
from tensorflow.python.training import server_lib
@ -162,6 +164,30 @@ class DatasetsTest(test.TestCase):
self.assertEqual(set(all_contents), set(retrieved_values))
def testArbitraryReaderFuncFromDatasetGenerator(self):
def my_generator():
yield (1, [1] * 10)
def gen_dataset(dummy):
return dataset_ops.Dataset.from_generator(
my_generator, (dtypes.int64, dtypes.int64),
(tensor_shape.TensorShape([]), tensor_shape.TensorShape([10])))
dataset = datasets.StreamingFilesDataset(
dataset_ops.Dataset.range(10), filetype=gen_dataset)
iterator = dataset.make_initializable_iterator()
self._sess.run(iterator.initializer)
get_next = iterator.get_next()
retrieved_values = self._sess.run(get_next)
self.assertIsInstance(retrieved_values, (list, tuple))
self.assertEqual(len(retrieved_values), 2)
self.assertEqual(retrieved_values[0], 1)
self.assertItemsEqual(retrieved_values[1], [1] * 10)
def testUnexpectedFiletypeString(self):
with self.assertRaises(ValueError):
datasets.StreamingFilesDataset(

View File

@ -699,7 +699,9 @@ cc_library(
srcs = ["platform/stacktrace_handler.cc"],
hdrs = ["platform/stacktrace_handler.h"],
deps = [
":abi",
":lib_platform",
":stacktrace",
],
)
@ -3089,6 +3091,8 @@ cc_library(
# we now need at least "str_util".
":lib",
":lib_platform",
":stacktrace_handler",
":test_lite",
"//tensorflow/core/platform/default/build_config:test_lite_main",
],
alwayslink = 1,
@ -3569,7 +3573,10 @@ tf_cc_tests_gpu(
tf_cc_test_mkl(
name = "mkl_runtime_tests",
size = "small",
srcs = ["common_runtime/mkl_cpu_allocator_test.cc"],
srcs = [
"common_runtime/mkl_cpu_allocator_test.cc",
"common_runtime/mkl_threadpool_device_test.cc",
],
linkstatic = 1,
deps = [
":core",

View File

@ -4,6 +4,10 @@ op {
description: <<END
if < 0, `scale * features` otherwise.
To be used together with
`initializer = tf.variance_scaling_initializer(factor=1.0, mode='FAN_IN')`.
For correct dropout, use `tf.contrib.nn.alpha_dropout`.
See [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515)
END
}

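The note above pairs `selu` with variance-scaling initialization and `alpha_dropout`. A hedged TF 1.x-style sketch of that combination; the initializer call is quoted from the description above, and exact argument names may differ across TensorFlow versions:

```python
import tensorflow as tf

# Inputs are a placeholder here purely for illustration.
x = tf.placeholder(tf.float32, [None, 128])

# selu with the initializer recommended in the op description.
hidden = tf.layers.dense(
    x, 64,
    activation=tf.nn.selu,
    kernel_initializer=tf.variance_scaling_initializer(
        factor=1.0, mode='FAN_IN'))

# For dropout in SELU networks, use alpha_dropout rather than plain dropout.
hidden = tf.contrib.nn.alpha_dropout(hidden, keep_prob=0.95)
```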
View File

@ -0,0 +1,48 @@
op {
graph_op_name: "StringSplitV2"
in_arg {
name: "input"
description: <<END
`1-D` string `Tensor`, the strings to split.
END
}
in_arg {
name: "sep"
description: <<END
`0-D` string `Tensor`, the delimiter character.
END
}
attr {
name: "maxsplit"
description: <<END
An `int`. If `maxsplit > 0`, limits the number of splits in the result.
END
}
summary: "Split elements of `source` based on `sep` into a `SparseTensor`."
description: <<END
Let N be the size of source (typically N will be the batch size). Split each
element of `source` based on `sep` and return a `SparseTensor`
containing the split tokens. Empty tokens are ignored.
For example, if N = 2, source[0] is 'hello world' and source[1] is 'a b c',
then the output will be
```
st.indices = [0, 0;
0, 1;
1, 0;
1, 1;
1, 2]
st.shape = [2, 3]
st.values = ['hello', 'world', 'a', 'b', 'c']
```
If `sep` is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings. For example, source of `"1<>2<><>3"` and
sep of `"<>"` returns `["1", "2", "", "3"]`. If `sep` is None or an empty
string, consecutive whitespace is regarded as a single separator, and the
result will contain no empty strings at the start or end if the string has
leading or trailing whitespace.
Note that the above-mentioned behavior matches Python's str.split.
END
}

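Since the behavior above is specified to match Python's `str.split`, a pure-Python sketch can reproduce the `SparseTensor` triple (indices, values, dense shape) for the examples in the description; this is illustrative only, not the kernel implementation:

```python
# Reproduce StringSplitV2 semantics with str.split (sep=None drops empty
# tokens around whitespace; an explicit sep preserves empty tokens).
def string_split_v2(source, sep=None, maxsplit=-1):
    indices, values, max_cols = [], [], 0
    for row, s in enumerate(source):
        tokens = s.split(sep, maxsplit)
        max_cols = max(max_cols, len(tokens))
        for col, token in enumerate(tokens):
            indices.append((row, col))
            values.append(token)
    return indices, values, (len(source), max_cols)

indices, values, shape = string_split_v2(['hello world', 'a b c'])
assert values == ['hello', 'world', 'a', 'b', 'c']
assert shape == (2, 3)
assert string_split_v2(['1<>2<><>3'], sep='<>')[1] == ['1', '2', '', '3']
```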
View File

@ -0,0 +1,4 @@
op {
graph_op_name: "StringSplitV2"
visibility: HIDDEN
}

View File

@ -86,7 +86,7 @@ BFCAllocator::Chunk* BFCAllocator::ChunkFromHandle(ChunkHandle h) {
return &(chunks_[h]);
}
bool BFCAllocator::Extend(size_t rounded_bytes) {
bool BFCAllocator::Extend(size_t alignment, size_t rounded_bytes) {
size_t available_bytes = memory_limit_ - total_region_allocated_bytes_;
// Rounds available_bytes down to the nearest multiple of kMinAllocationSize.
available_bytes = (available_bytes / kMinAllocationSize) * kMinAllocationSize;
@ -108,7 +108,7 @@ bool BFCAllocator::Extend(size_t rounded_bytes) {
// Try allocating.
size_t bytes = std::min(curr_region_allocation_bytes_, available_bytes);
void* mem_addr = suballocator_->Alloc(32, bytes);
void* mem_addr = suballocator_->Alloc(alignment, bytes);
if (mem_addr == nullptr && !started_backpedal_) {
// Only backpedal once.
started_backpedal_ = true;
@ -119,7 +119,7 @@ bool BFCAllocator::Extend(size_t rounded_bytes) {
while (mem_addr == nullptr) {
bytes = RoundedBytes(bytes * kBackpedalFactor);
if (bytes < rounded_bytes) break;
mem_addr = suballocator_->Alloc(32, bytes);
mem_addr = suballocator_->Alloc(alignment, bytes);
}
}
@ -261,7 +261,7 @@ void* BFCAllocator::AllocateRawInternal(size_t unused_alignment,
}
// Try to extend
if (Extend(rounded_bytes)) {
if (Extend(unused_alignment, rounded_bytes)) {
ptr = FindChunkPtr(bin_num, rounded_bytes, num_bytes);
if (ptr != nullptr) {
return ptr;

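The `Extend` change above threads the caller's alignment through to the suballocator instead of the hard-coded 32, keeping the existing "backpedal" loop that geometrically shrinks a failed request. A schematic Python rendering of that control flow (the constants and the `alloc` callback are stand-ins, not the real kMinAllocationSize/kBackpedalFactor values):

```python
K_MIN_ALLOCATION = 256   # stand-in for kMinAllocationSize
BACKPEDAL_FACTOR = 0.9   # stand-in for kBackpedalFactor

def extend(alloc, alignment, rounded_bytes, curr_region_bytes, available):
    # Round the remaining budget down to a multiple of the minimum size.
    available = (available // K_MIN_ALLOCATION) * K_MIN_ALLOCATION
    nbytes = min(curr_region_bytes, available)
    mem = alloc(alignment, nbytes)          # alignment now comes from caller
    while mem is None:
        nbytes = int(nbytes * BACKPEDAL_FACTOR)
        if nbytes < rounded_bytes:          # shrunk below the request: give up
            return False
        mem = alloc(alignment, nbytes)
    return True

# Toy suballocator that only has 1 KiB to give out.
def toy_alloc(alignment, nbytes):
    return object() if nbytes <= 1024 else None

assert extend(toy_alloc, 64, rounded_bytes=512,
              curr_region_bytes=4096, available=4096)
```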
View File

@ -305,7 +305,8 @@ class BFCAllocator : public VisitableAllocator {
// Try to add a new memory region that can satisfy an allocation of
// 'rounded_bytes' bytes. Returns true on success and false on
// failure.
bool Extend(size_t rounded_bytes) EXCLUSIVE_LOCKS_REQUIRED(lock_);
bool Extend(size_t alignment, size_t rounded_bytes)
EXCLUSIVE_LOCKS_REQUIRED(lock_);
// Returns a pointer to an underlying allocated chunk of size
// 'rounded_bytes'.

View File

@ -102,9 +102,25 @@ TEST(DirectSessionWithTrackingAllocTest, CostModelTest) {
EXPECT_EQ(2, shape.dim(0).size());
EXPECT_EQ(1, shape.dim(1).size());
if (node->name() == y->name()) {
#ifdef INTEL_MKL
// If MKL is used, the graph goes through several additional
// graph rewrite passes. In TF, every time a graph pass
// runs, "constant" nodes are allocated and deallocated.
// Each allocation calls FindChunkPtr of BFCAllocator,
// which increments the AllocationId. Thus the AllocationIds
// are higher when MKL is used; here they are 19 and 20.
EXPECT_EQ(19, cm->AllocationId(node, 0));
#else
EXPECT_EQ(21, cm->AllocationId(node, 0));
#endif
} else {
#ifdef INTEL_MKL
EXPECT_EQ(20, cm->AllocationId(node, 0));
#else
EXPECT_EQ(22, cm->AllocationId(node, 0));
#endif
}
}
EXPECT_LE(0, cm->MaxExecutionTime(node));

View File

@ -0,0 +1,53 @@
/* Copyright 2018 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifdef INTEL_MKL
#include "tensorflow/core/common_runtime/threadpool_device.h"
#include "tensorflow/core/lib/core/status_test_util.h"
#include "tensorflow/core/platform/cpu_info.h"
#include "tensorflow/core/platform/logging.h"
#include "tensorflow/core/platform/test.h"
#include "tensorflow/core/public/session_options.h"
namespace tensorflow {
#ifdef _OPENMP
TEST(MKLThreadPoolDeviceTest, TestOmpDefaults) {
SessionOptions options;
unsetenv("OMP_NUM_THREADS");
ThreadPoolDevice* tp = new ThreadPoolDevice(
options, "/device:CPU:0", Bytes(256), DeviceLocality(), cpu_allocator());
const int ht = port::NumHyperthreadsPerCore();
EXPECT_EQ(omp_get_max_threads(), (port::NumSchedulableCPUs() + ht - 1) / ht);
}
TEST(MKLThreadPoolDeviceTest, TestOmpPreSets) {
SessionOptions options;
setenv("OMP_NUM_THREADS", "314", 1);
ThreadPoolDevice* tp = new ThreadPoolDevice(
options, "/device:CPU:0", Bytes(256), DeviceLocality(), cpu_allocator());
EXPECT_EQ(omp_get_max_threads(), 314);
}
#endif // _OPENMP
} // namespace tensorflow
#endif // INTEL_MKL

View File

@ -16,8 +16,10 @@ limitations under the License.
#include "tensorflow/core/common_runtime/process_util.h"
#ifdef INTEL_MKL
#ifdef _OPENMP
#include <omp.h>
#endif
#endif // _OPENMP
#endif // INTEL_MKL
#include <string.h>
#include "tensorflow/core/lib/core/threadpool.h"
@ -57,7 +59,10 @@ int32 NumInterOpThreadsFromSessionOptions(const SessionOptions& options) {
// MKL library executes ops in parallel using OMP threads
// Set inter_op conservatively to avoid thread oversubscription that could
// lead to severe perf degradations and OMP resource exhaustion
const int mkl_intra_op = omp_get_max_threads();
int mkl_intra_op = 1;
#ifdef _OPENMP
mkl_intra_op = omp_get_max_threads();
#endif // _OPENMP
CHECK_GE(mkl_intra_op, 1);
const int32 mkl_inter_op = std::max(
(port::NumSchedulableCPUs() + mkl_intra_op - 1) / mkl_intra_op, 2);
@ -68,7 +73,7 @@ int32 NumInterOpThreadsFromSessionOptions(const SessionOptions& options) {
#else
// Default to using the number of cores available in the process.
return port::NumSchedulableCPUs();
#endif
#endif // INTEL_MKL
}
thread::ThreadPool* NewThreadPoolFromSessionOptions(

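The MKL branch above sizes the inter-op pool conservatively: with `mkl_intra_op` OMP threads per op, the inter-op pool gets `max(ceil(cpus / mkl_intra_op), 2)` threads. A quick sketch of that arithmetic:

```python
# Inter-op pool size used when MKL provides intra-op parallelism via OMP.
def mkl_inter_op_threads(num_schedulable_cpus, mkl_intra_op):
    assert mkl_intra_op >= 1
    return max((num_schedulable_cpus + mkl_intra_op - 1) // mkl_intra_op, 2)

assert mkl_inter_op_threads(16, 8) == 2   # ceil(16/8) = 2
assert mkl_inter_op_threads(16, 4) == 4   # ceil(16/4) = 4
assert mkl_inter_op_threads(4, 8) == 2    # never below the floor of 2
```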
View File

@ -31,7 +31,11 @@ limitations under the License.
#include "tensorflow/core/public/session_options.h"
#ifdef INTEL_MKL
#ifdef _OPENMP
#include <omp.h>
#endif
#include "tensorflow/core/common_runtime/mkl_cpu_allocator.h"
#include "tensorflow/core/platform/cpu_info.h"
#endif
namespace tensorflow {
@ -43,7 +47,26 @@ ThreadPoolDevice::ThreadPoolDevice(const SessionOptions& options,
: LocalDevice(options, Device::BuildDeviceAttributes(
name, DEVICE_CPU, memory_limit, locality)),
allocator_(allocator),
scoped_allocator_mgr_(new ScopedAllocatorMgr(name)) {}
scoped_allocator_mgr_(new ScopedAllocatorMgr(name)) {
#ifdef INTEL_MKL
#ifdef _OPENMP
const char* user_omp_threads = getenv("OMP_NUM_THREADS");
if (user_omp_threads == nullptr) {
// OMP_NUM_THREADS controls MKL's intra-op parallelization
// Default to available physical cores
const int mkl_intra_op = port::NumSchedulableCPUs();
const int ht = port::NumHyperthreadsPerCore();
omp_set_num_threads((mkl_intra_op + ht - 1) / ht);
} else {
uint64 user_val = 0;
if (strings::safe_strtou64(user_omp_threads, &user_val)) {
// Superfluous, but triggers OpenMP loading
omp_set_num_threads(user_val);
}
}
#endif // _OPENMP
#endif // INTEL_MKL
}
ThreadPoolDevice::~ThreadPoolDevice() {}

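The constructor change above sets the OMP thread count only when the user has not: an `OMP_NUM_THREADS` value is honored verbatim, otherwise the default is one thread per physical core, i.e. `ceil(schedulable_cpus / hyperthreads_per_core)`. A sketch of that policy (the function name is hypothetical):

```python
# Decide how many OMP threads MKL should use for intra-op work.
def pick_omp_threads(num_schedulable_cpus, hyperthreads_per_core,
                     omp_num_threads_env=None):
    if omp_num_threads_env is not None and omp_num_threads_env.isdigit():
        return int(omp_num_threads_env)       # user setting wins
    ht = hyperthreads_per_core
    return (num_schedulable_cpus + ht - 1) // ht  # one thread per core

assert pick_omp_threads(16, 2) == 8                            # default
assert pick_omp_threads(16, 2, omp_num_threads_env="314") == 314
```

This mirrors the expectations in the `TestOmpDefaults` and `TestOmpPreSets` tests added earlier in this diff.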
View File

@ -147,7 +147,9 @@ MasterService::Stub::Stub(
}
MasterService::AsyncService::AsyncService() {
for (int i = 0; i < 10; ++i) {
int method_len = sizeof(grpcMasterService_method_names) /
sizeof(grpcMasterService_method_names[0]);
for (int i = 0; i < method_len; ++i) {
AddMethod(new ::grpc::internal::RpcServiceMethod(
grpcMasterService_method_names[i],
::grpc::internal::RpcMethod::NORMAL_RPC, nullptr));

View File

@ -17,6 +17,7 @@ limitations under the License.
#include "tensorflow/core/distributed_runtime/rpc/grpc_session.h"
#include "tensorflow/core/lib/strings/str_util.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/util/device_name_utils.h"
namespace tensorflow {
@ -50,9 +51,14 @@ Status TestCluster::MakeTestCluster(const SessionOptions& options, int n,
}
for (int i = 0; i < n; ++i) {
string server_file =
strings::StrCat(testing::TensorFlowSrcRoot(),
"/core/distributed_runtime/rpc/grpc_testlib_server");
if (!options.env->FileExists(server_file).ok()) {
return errors::Internal("Could not find grpc_testlib_server");
}
const std::vector<string> argv(
{strings::StrCat(testing::TensorFlowSrcRoot(),
"/core/distributed_runtime/rpc/grpc_testlib_server"),
{server_file,
/* see grpc_testlib_server.cc for flags */
tf_jobs, "--tf_job=localhost", strings::StrCat("--tf_task=", i),
strings::StrCat("--num_cpus=", num_cpus),

View File

@ -67,13 +67,8 @@ struct AllocatorStats {
// device memory.
class Allocator {
public:
#ifdef EIGEN_VECTORIZE_AVX512
// Align to 64 byte boundary.
static constexpr size_t kAllocatorAlignment = 64;
#else
// Align to 32 byte boundary.
static constexpr size_t kAllocatorAlignment = 32;
#endif
virtual ~Allocator();

View File

@ -15,6 +15,7 @@ limitations under the License.
#include "tensorflow/core/framework/op_gen_lib.h"
#include <algorithm>
#include <vector>
#include "tensorflow/core/framework/attr_value.pb.h"
#include "tensorflow/core/lib/core/errors.h"

View File

@ -5,7 +5,7 @@ option cc_enable_arenas = true;
option java_outer_classname = "RemoteFusedGraphExecuteInfoProto";
option java_multiple_files = true;
option java_package = "org.tensorflow.framework";
//add go_package externally
option go_package = "github.com/tensorflow/tensorflow/tensorflow/go/core/framework";
import "tensorflow/core/framework/graph.proto";
import "tensorflow/core/framework/tensor_shape.proto";
import "tensorflow/core/framework/types.proto";

View File

@ -1147,29 +1147,29 @@ TEST(Tensor, FailureToAllocate) {
// On the alignment.
//
// As of 2015/8, tensorflow::Tensor allocates its buffer with 32-byte
// As of 2018/5, tensorflow::Tensor allocates its buffer with 64-byte
// alignment. Tensor::tensor/flat/vec/matrix methods requires the
// buffer satisfies Eigen::Aligned (e.g., 16-bytes aligned usually,
// and 32-bytes for AVX). Tensor::Slice requires the caller to ensure
// its result is aligned if the caller intends to use those methods.
// In this test case, we simply make sure each slice is 32-byte
// aligned: sizeof(float) * 4 * 2 = 32.
// 32-bytes for AVX, and 64-bytes for AVX512). Tensor::Slice requires
// the caller to ensure its result is aligned if the caller intends
// to use those methods. In this test case, we simply make sure each
// slice is 64-byte aligned: sizeof(float) * 4 * 36 = 576. 576 % 64 = 0.
TEST(Tensor, Slice_Basic) {
Tensor saved;
{ // General
Tensor x(DT_FLOAT, TensorShape({10, 4, 34}));
Tensor x(DT_FLOAT, TensorShape({10, 4, 36}));
// Fills in known values.
for (int i = 0; i < 10; ++i) {
x.Slice(i, i + 1).flat<float>().setConstant(i * 1.f);
}
// A simple slice along dim0.
Tensor y = x.Slice(4, 8);
EXPECT_TRUE(y.shape().IsSameSize(TensorShape({4, 4, 34})));
EXPECT_TRUE(y.shape().IsSameSize(TensorShape({4, 4, 36})));
auto tx = x.tensor<float, 3>();
auto ty = y.tensor<float, 3>();
for (int i = 0; i < 4; ++i) {
for (int j = 0; j < 4; ++j) {
for (int k = 0; k < 34; ++k) {
for (int k = 0; k < 36; ++k) {
EXPECT_EQ(ty(i, j, k), 4.0 + i);
EXPECT_EQ(&tx(4 + i, j, k), &ty(i, j, k));
}
@ -1186,7 +1186,7 @@ TEST(Tensor, Slice_Basic) {
auto tz = z.tensor<float, 3>();
EXPECT_EQ(1, z.dim_size(0));
for (int j = 0; j < 4; ++j) {
for (int k = 0; k < 34; ++k) {
for (int k = 0; k < 36; ++k) {
EXPECT_EQ(tz(0, j, k), 6.0);
}
}
@ -1198,16 +1198,16 @@ TEST(Tensor, Slice_Basic) {
EXPECT_EQ(1, saved.dim_size(0));
auto tsaved = saved.tensor<float, 3>();
for (int j = 0; j < 4; ++j) {
for (int k = 0; k < 34; ++k) {
for (int k = 0; k < 36; ++k) {
EXPECT_EQ(tsaved(0, j, k), 6.0);
}
}
}
{ // Empty
Tensor x(DT_FLOAT, TensorShape({10, 0, 34}));
Tensor x(DT_FLOAT, TensorShape({10, 0, 36}));
x.flat<float>().setRandom();
Tensor y = x.Slice(4, 8);
EXPECT_TRUE(y.shape().IsSameSize(TensorShape({4, 0, 34})));
EXPECT_TRUE(y.shape().IsSameSize(TensorShape({4, 0, 36})));
}
{

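The shape change from `{10, 4, 34}` to `{10, 4, 36}` in the test above keeps each dim0 slice on the allocator's 64-byte boundary. The arithmetic, checked in Python:

```python
FLOAT_BYTES = 4

# Bytes in one dim0 "plane" must be a multiple of the 64-byte alignment
# for every Slice(i, i+1) result to stay aligned.
assert (FLOAT_BYTES * 4 * 36) % 64 == 0   # new shape: 576 bytes, aligned
assert (FLOAT_BYTES * 4 * 34) % 64 != 0   # old shape: 544 bytes, misaligned
```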
View File

@ -2691,14 +2691,14 @@ class MklLayoutRewritePass : public GraphOptimizationPass {
// If Op has been specifically assigned to a non-CPU device, then No.
if (!n->assigned_device_name().empty() &&
!str_util::StrContains(n->assigned_device_name(),kCPUDeviceSubStr)) {
!str_util::StrContains(n->assigned_device_name(), kCPUDeviceSubStr)) {
result = false;
reason = "Op has been assigned a runtime device that is not CPU.";
}
// If user has specifically assigned this op to a non-CPU device, then No.
if (!n->def().device().empty() &&
!str_util::StrContains(n->def().device(),kCPUDeviceSubStr)) {
!str_util::StrContains(n->def().device(), kCPUDeviceSubStr)) {
result = false;
reason = "User has assigned a device that is not CPU.";
}
@ -2865,9 +2865,9 @@ class MklLayoutRewritePass : public GraphOptimizationPass {
return false;
}
// If the depth_radius of LRN is not 2, then MKL DNN takes the unoptimized
// path. The unoptimized path is slow, so we don't rewrite the node
// and use the default Eigen op. But for depth_radius=2, the MKL DNN
// optimized path is taken, i.e., the Eigen node is rewritten as an MKL DNN node.
static bool LrnRewrite(const Node* n) {
CHECK_NOTNULL(n);
@ -2876,13 +2876,13 @@ class MklLayoutRewritePass : public GraphOptimizationPass {
CHECK_EQ(GetNodeAttr(n->def(), "depth_radius", &depth_radius).ok(), true);
// if the depth_radius of LRN is not 2, don't rewrite the node by MKL DNN
// and use the Eigen node instead
if (depth_radius == 2) {
return true;
}
VLOG(1) << "LrnRewrite: The model sets depth_radius as not 2 which"
<< "case is not optimized by Intel MKL, thus using Eigen op"
<< "for LRN " ;
<< "for LRN ";
return false;
}
@ -3015,6 +3015,35 @@ class MklLayoutRewritePass : public GraphOptimizationPass {
std::vector<NodeBuilder::NodeOut>* ws_tensors,
bool* are_ws_tensors_added);
// Helper function used by FixMklMetaDataEdges. Fixes the metadata edge
// pointed by 'e_metadata' corresponding to the data edge 'e_data' in graph
// 'g'. Returns true if a fixup was done; otherwise, it returns false.
bool FixMklMetaDataEdgeIfNeeded(std::unique_ptr<Graph>* g,
const Edge* e_data, const Edge* e_metadata);
// Are the input Mkl metadata edges for node 'n' in graph 'g' correctly
// connected? If not, then fix them. This is needed because a graph may have
// some input Mkl metadata edges incorrectly setup after node merge and
// rewrite passes. This could happen because GetReversePostOrder function may
// not provide topologically sorted order if a graph contains cycles. The
// function returns true if at least one Mkl metadata edge for node 'n' was
// fixed. Otherwise, it returns false.
//
// Example:
//
// X = MklConv2D(_, _, _)
// Y = MklConv2DWithBias(_, _, _, _, _, _)
// Z = MklAdd(X, Y, DummyMklTensor, Y:1)
//
// For a graph such as shown above, note that 3rd argument of MklAdd contains
// DummyMklTensor. Actually, it should be getting the Mkl metadata from
// MklConv2D op (specifically, X:2). This incorrect plumbing could be possible
// (although rare) if the Mkl NodeMerge + NodeRewrite passes visit Z before X
// (possible if X, Y, Z are part of a loop.) This function fixes the Mkl
// metadata edges only - it does not rewrite nodes nor does it modify the Mkl
// data edges (1st and 2nd arguments of MklAdd).
bool FixMklMetaDataEdges(std::unique_ptr<Graph>* g, Node* n);
// Functions specific to operators to copy attributes
// We need operator-specific functions to copy attributes because the framework
// does not provide any generic function for it.
@ -4241,6 +4270,92 @@ MklLayoutRewritePass::CheckForNodeRewrite(const Node* n) const {
return nullptr;
}
///////////////////////////////////////////////////////////////////////////////
// Post-rewrite Mkl metadata fixup pass
///////////////////////////////////////////////////////////////////////////////
bool MklLayoutRewritePass::FixMklMetaDataEdgeIfNeeded(std::unique_ptr<Graph>* g,
const Edge* e_data, const Edge* e_metadata) {
if (g == nullptr || e_data == nullptr || e_metadata == nullptr) {
return false;
}
Node* n_data = e_data->src();
int n_data_op_slot = e_data->src_output();
int n_metadata_op_slot = GetTensorMetaDataIndex(n_data_op_slot,
n_data->num_outputs());
// If the source of the meta edge is a constant node (producing a dummy Mkl
// metadata tensor), then we need to fix it.
if (IsConstant(e_metadata->src())) {
Node* e_metadata_dst = e_metadata->dst();
int e_metadata_in_slot = e_metadata->dst_input();
CHECK_NOTNULL((*g)->AddEdge(n_data, n_metadata_op_slot,
e_metadata_dst, e_metadata_in_slot));
(*g)->RemoveEdge(e_metadata);
return true;
}
return false;
}
bool MklLayoutRewritePass::FixMklMetaDataEdges(std::unique_ptr<Graph>* g,
Node* n) {
bool result = false;
// If graph node is not Mkl node, then return.
DataType T = DT_INVALID;
if (!GetNodeAttr(n->def(), "T", &T).ok() ||
!mkl_op_registry::IsMklOp(n->type_string(), T)) {
return result;
}
// If it is Mkl node, then check if the input edges to this node that carry
// Mkl metadata are linked up correctly with the source node.
// For Mkl nodes, we generate twice the number of input tensors (n for Mkl
// data tensors + n for Mkl metadata tensors). We need to check for correct
// connection of n metadata tensors only.
int num_data_inputs = n->num_inputs() / 2;
for (int idx = 0; idx < num_data_inputs; idx++) {
// Get the edge connecting input slot with index (idx).
const Edge* e = nullptr;
TF_CHECK_OK(n->input_edge(idx, &e));
// If e is control edge, then skip.
if (e->IsControlEdge()) {
continue;
}
// Check that the source node for edge 'e' is Mkl node. If it is not an Mkl
// node, then we don't need to do anything.
Node* e_src = e->src();
if (GetNodeAttr(e_src->def(), "T", &T).ok() &&
mkl_op_registry::IsMklOp(e_src->type_string(), T)) {
// Source node for edge 'e' is Mkl node.
// Destination node and destination input slot of e is node 'n' and 'idx'
// resp.
CHECK_EQ(e->dst(), n);
CHECK_EQ(e->dst_input(), idx);
// Let's get edge that carries Mkl metadata corresponding to Mkl data edge
// 'e'. For that, let's first get the input slot of 'n' where the meta
// edge will feed the value.
int e_meta_in_slot = GetTensorMetaDataIndex(e->dst_input(),
n->num_inputs());
const Edge* e_meta = nullptr;
TF_CHECK_OK(n->input_edge(e_meta_in_slot, &e_meta));
// Let's check if we need to fix this meta edge.
if (FixMklMetaDataEdgeIfNeeded(g, e, e_meta)) {
result = true;
}
}
}
return result;
}
///////////////////////////////////////////////////////////////////////////////
// Run function for the pass
///////////////////////////////////////////////////////////////////////////////
@ -4307,6 +4422,25 @@ bool MklLayoutRewritePass::RunPass(std::unique_ptr<Graph>* g) {
DumpGraph("After running MklLayoutRewritePass(NodeMerge+Rewrite)", &**g);
order.clear();
GetReversePostOrder(**g, &order); // This will give us topological sort.
for (Node* n : order) {
// If node is not an op or it cannot run on CPU device, then skip.
if (!n->IsOp() || !CanOpRunOnCPUDevice(n)) {
continue;
}
if (FixMklMetaDataEdges(g, n)) {
string node_name = n->name();
string op_name = n->type_string();
VLOG(1) << "MklLayoutRewritePass: fixed metadata edges for node "
<< node_name << " with op " << op_name;
result = true;
}
}
DumpGraph("After running MklLayoutRewritePass(NodeMerge+Rewrite+Fixup)",
&**g);
return result;
}

View File

@ -3518,6 +3518,37 @@ TEST_F(MklLayoutPassTest, NodeMerge_Conv2DWithBias_DeviceTest) {
"B->C:1;C->E;D->E:1;E->Z;M->C:2;N->C:3;Y->Z:1");
}
/////////////////////////////////////////////////////////////////////
// Post-rewrite fixup pass test
TEST_F(MklLayoutPassTest, PostRewriteFixUpPass) {
InitGraph(
"node { name: 'A' op: 'Input'}"
"node { name: 'B' op: 'Input'}"
"node { name: 'M' op: '_MklInput'}"
"node { name: 'N' op: '_MklInput'}"
"node { name: 'C' op: '_MklConv2D'"
" attr { key: 'T' value { type: DT_FLOAT } }"
" attr { key: 'data_format' value { s: 'NCHW' } }"
" attr { key: 'use_cudnn_on_gpu' value { b: false } }"
" attr { key: 'strides' value { list: {i: 1, i:1, i:1, i:1} } }"
" attr { key: 'padding' value { s: 'SAME' } }"
" attr { key: 'dilations' value { list: {i: 1, i:1, i:1, i:1} } }"
" input: ['A', 'B', 'M', 'N']}"
"node { name: 'D' op: 'Const' "
" attr { key: 'dtype' value { type: DT_UINT8 } }"
" attr { key: 'value' value { "
" tensor { dtype: DT_UINT8 tensor_shape { dim { size: 1 } } "
" int_val: 0 } } } }"
"node { name: 'E' op: '_MklAdd'"
" attr {key: 'T' value { type: DT_FLOAT } }"
" input: ['C', 'A', 'D', 'D']}");
EXPECT_EQ(DoMklLayoutOptimizationPass(),
"A(Input);B(Input);C(_MklConv2D);D(Const);E(_MklAdd);"
"M(_MklInput);N(_MklInput)|A->C;A->E:1;B->C:1;C->E;C:2->E:2;"
"D->E:3;M->C:2;N->C:3");
}
/////////////////////////////////////////////////////////////////////
static void BM_MklLayoutRewritePass(int iters, int op_nodes) {

View File

@ -610,7 +610,6 @@ class SymbolicShapeRefiner {
}
};
// Compute the shape of the tensors output by node 'node' at output port
// 'port_index' as the union of shape1 and shape2.
ShapeHandle OutputAsUnion(const NodeDef* node, int port_index,
ShapeHandle shape1, ShapeHandle shape2) {

View File

@ -679,6 +679,7 @@ cc_library(
deps = [
":constant_folding",
":graph_optimizer",
"//tensorflow/core:lib",
"//tensorflow/core:protos_all_cc",
"//tensorflow/core/grappler:graph_view",
"//tensorflow/core/grappler:grappler_item",
@ -780,7 +781,6 @@ cc_library(
"//tensorflow/core:lib",
"//tensorflow/core:lib_internal",
"//tensorflow/core:protos_all_cc",
"//tensorflow/core:scoped_allocator_ops_op_lib",
"//tensorflow/core/grappler:grappler_item",
"//tensorflow/core/grappler:op_types",
"//tensorflow/core/grappler:utils",

Some files were not shown because too many files have changed in this diff.