Branch 163121296 (#11767)

* Update ops-related pbtxt files.

PiperOrigin-RevId: 163014080

* Go: Update generated wrapper functions for TensorFlow ops.

PiperOrigin-RevId: 163014834

* Removing session reset since destroying the session object would delete its variables as well. Resetting the session might unintentionally close other sessions in the same process.

PiperOrigin-RevId: 163019166

* [XLA] Teach CPU and GPU compilers to optionally invoke the HLO insert-reduce-precision-operations pass.

This also required a few additions and fixups.  We add pieces to ReducePrecisionInsertion to translate between the protocol-buffer representation of the pass options and the predicate-function actually used in the pass.  To facilitate this translation, we also add a function to HloOpcode to return the number of opcodes so that we can iterate over the whole set easily.

PiperOrigin-RevId: 163037250

* Refactor HLO graph dumping.

This also makes a few minor cosmetic changes, like moving the fusion
type out of the fusion node and into the out-of-line computation and
adjusting the arrow labels that we use to indicate operand numbers.

PiperOrigin-RevId: 163038795

* Use correct order of arguments in call of valid_bitcast_callback_.

There are platforms where bitcasts are not symmetric, i.e. there are shapes A and B such that A->B is a bitcast but B->A is not. So we have to consider the correct order when calling valid_bitcast_callback_.

PiperOrigin-RevId: 163058665

* Two improvements to pip.sh

1. Distinguish between passed and skipped tests.
2. Allow skipping the smoke test of the TensorFlow install in a clean virtualenv by setting NO_TEST_ON_INSTALL=1.

PiperOrigin-RevId: 163065599

* [XLA] Update StatusOr implementation to use more nuanced type traits.

Previously we would evaluate the is_copy_constructible trait before template
parameters were fully defined; e.g. StatusOr<ThingIAmDefiningRightNow>,
which could lead to surprising effects.

Also, previously it was not possible to provide an error status to a
StatusOr<T> where T was not default-constructible.

PiperOrigin-RevId: 163073057

* [TF:XLA] Register a no-op kernel for ControlTrigger, but forbid the JIT marking pass from compiling ControlTrigger nodes.

CL in preparation for compiling dynamic RNN gradients via XLA.

PiperOrigin-RevId: 163073212

* Improve the HLO graph dumper's output.

 - Truncate long shapes.  It's not uncommon to have giant tuples, and
   displaying the whole thing makes the graph unreadable.

 - Don't traverse into the users of a node with > 16 users.  These are
   probably not interesting, and traversing into them can quickly blow
   up the graph, making it un-renderable.

 - Allow nodes which have multiple trivial subcomputations (e.g.
   select-and-scatter) to have those computations inlined.

 - Match additional patterns in MatchTrivialComputation

PiperOrigin-RevId: 163079329

* If the value to be forwarded from a loop to its gradient is a constant, clone the constant instead of repeatedly pushing it onto a stack on each iteration. This should never consume more memory than the stack approach (notwithstanding swapping), and frequently should be much better.

This change is in preparation for enabling XLA compilation of RNN gradients.

PiperOrigin-RevId: 163082165
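
For intuition, here is a minimal plain-Python sketch (not the actual TensorFlow control-flow machinery; names are illustrative) contrasting the two forwarding strategies described above:

def forward_via_stack(value, num_iterations):
    """Old behavior: push the value once per iteration for the backward pass."""
    stack = []
    for _ in range(num_iterations):
        stack.append(value)  # one saved entry per iteration, even if `value` is constant
    return stack

def forward_constant(value, num_iterations):
    """New behavior for constants: keep a single clone, independent of iteration count."""
    return [value]

print(len(forward_via_stack(3.14, 1000)))  # 1000 saved copies
print(len(forward_constant(3.14, 1000)))   # 1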

* [TF:XLA] Make the shape of a TensorArray flow value a scalar.

Previously we used an f32[0] value, since the exact flow value does not matter; however, this causes problems when a TensorArray computation is placed in a loop, since the shape of the flow value is no longer loop-invariant.

PiperOrigin-RevId: 163082452

* Automated g4 rollback of changelist 163019166

PiperOrigin-RevId: 163083436

* Automated g4 rollback of changelist 162769374

PiperOrigin-RevId: 163086518

* internal change

PiperOrigin-RevId: 163088509

* Clarify docstring for tf.rank.

PiperOrigin-RevId: 163089480
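
For reference, tf.rank returns the number of dimensions of a tensor (not the matrix rank); a NumPy sketch of the same quantity:

import numpy as np

# tf.rank(t) is the number of dimensions of t, i.e. len(t.shape),
# not the linear-algebra rank of a matrix.
t = np.zeros((3, 4, 5))
print(t.ndim)  # 3, analogous to tf.rank(t) evaluating to 3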

* Reduce gather_op_test timeouts by reducing the size of testHigherRank.

PiperOrigin-RevId: 163090428

* Add PopulationCount op (popcnt): element-wise counts the number of "on" bits.

PiperOrigin-RevId: 163090921
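
A quick NumPy sketch of what element-wise popcount computes (a plain-Python stand-in, not the registered kernel):

import numpy as np

x = np.array([0, 1, 2, 255], dtype=np.uint8)
# Count the set bits in each element's binary representation.
popcnt = np.array([bin(v).count('1') for v in x], dtype=np.uint8)
print(popcnt)  # [0 1 1 8]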

* Show fusion nodes inline in HLO graph dumper.

To make this work sanely I had to change NodeFilter so that it says to
dump all nodes inside subcomputations.  Previously, we passed an
explicit NodeFilter down to DumpSubcomputation, and used that to control
whether or not we dumped nodes in there.  But this becomes unwieldy with
inline fusion nodes, as sometimes you want to look at 'filter', and
other times you want to look at 'filter_', and there's no good way to
tell why.

I also had to remove the heuristic whereby we'd pull in operands of
nodes with just some operands shown.  With the much bigger nodes that
are generated by this change, the graph was becoming illegible.  I think
most of the confusion that heuristic was attempting to avoid is
addressed by the fact that we "gray out" incomplete nodes.

PiperOrigin-RevId: 163091423

* errors: Avoid stripping error details when converting POSIX errors to Status

This change is made out of a desire to have additional information reported when there are filesystem errors (e.g., see #11628).

PiperOrigin-RevId: 163091773

* C API: Fix a bug with TF_OperationGetAttrTensor when TF_STRING tensors are
involved.

The TensorBuffer owned by a TF_Tensor object has a different memory layout than
the TensorBuffer owned by the corresponding tensorflow::Tensor object.
This change consolidates conversions between the runtime's tensorflow::Tensor
and the C API's TF_Tensor objects into a pair of helper functions.

The added test: CApiAttributesTest.StringTensor fails without corresponding
changes to c_api.cc

PiperOrigin-RevId: 163091789

* Speed up tf.contrib.signal spectral_ops_test.py by reducing the size of the gradient test.

PiperOrigin-RevId: 163092423

* Add new CompareAndBitpackOp.

PiperOrigin-RevId: 163093146
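
The commit message does not spell out the op's semantics; assuming it compares each element against a threshold and packs each group of 8 boolean results into one uint8 (as the name suggests), a NumPy sketch might look like this (bit ordering is illustrative):

import numpy as np

def compare_and_bitpack(x, threshold):
    # Compare, then pack 8 booleans per output byte (MSB-first here).
    bits = (x > threshold).astype(np.uint8).reshape(-1, 8)
    weights = (2 ** np.arange(7, -1, -1)).astype(np.uint8)
    return (bits * weights).sum(axis=1).astype(np.uint8)

x = np.arange(16, dtype=np.float32)
print(compare_and_bitpack(x, threshold=7.5))  # [  0 255]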

* Update ops-related pbtxt files.

PiperOrigin-RevId: 163094455

* Minor tweaks to avoid unnecessary copies

PiperOrigin-RevId: 163101160

* [BatchNormGrad] Add end-to-end test for BatchNormGrad

RELNOTES: n/a
PiperOrigin-RevId: 163101568

* Go: Update generated wrapper functions for TensorFlow ops.

PiperOrigin-RevId: 163102070

* [XLA] Add more unit tests for DynamicSlice and DynamicUpdateSlice.

PiperOrigin-RevId: 163102445
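
For context, DynamicSlice extracts a fixed-size slice whose start indices are runtime values (clamped to keep the slice in bounds), and DynamicUpdateSlice writes a slice back at runtime indices. A rough NumPy analogue of the semantics these tests exercise:

import numpy as np

def dynamic_slice(operand, start, sizes):
    # Clamp runtime start indices so the fixed-size slice stays in bounds.
    start = [min(max(s, 0), d - sz) for s, d, sz in zip(start, operand.shape, sizes)]
    return operand[tuple(slice(s, s + sz) for s, sz in zip(start, sizes))]

def dynamic_update_slice(operand, update, start):
    out = operand.copy()
    start = [min(max(s, 0), d - u) for s, d, u in zip(start, operand.shape, update.shape)]
    out[tuple(slice(s, s + u) for s, u in zip(start, update.shape))] = update
    return out

a = np.arange(16).reshape(4, 4)
print(dynamic_slice(a, start=[1, 2], sizes=[2, 2]))
print(dynamic_update_slice(a, np.zeros((2, 2), dtype=a.dtype), start=[3, 3]))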

* Adding missing deps to targets in llvm.BUILD. This was only working in non-sandboxed builds.

PiperOrigin-RevId: 163103908

* Pass batch_size in params when use_tpu=False.

PiperOrigin-RevId: 163105673
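
For context, TPUEstimator hands the batch size to the model_fn through the params dict; with this change that happens for use_tpu=False as well. A hypothetical model_fn sketch (names illustrative):

def model_fn(features, labels, mode, params):
    # After this change, populated whether or not use_tpu is set.
    batch_size = params['batch_size']
    print('building graph for batch size', batch_size)

model_fn(features=None, labels=None, mode='train', params={'batch_size': 64})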

* Remove duplicate import.

PiperOrigin-RevId: 163108237

* Implementation of UnsortedSegmentSum in tf2xla bridge.

PiperOrigin-RevId: 163109769
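
A NumPy sketch of what unsorted_segment_sum computes (the bridge lowers this to XLA; the sketch only restates the op's semantics):

import numpy as np

def unsorted_segment_sum(data, segment_ids, num_segments):
    # Sum entries that share a segment id; ids need not be sorted,
    # and segments with no entries sum to zero.
    out = np.zeros((num_segments,) + data.shape[1:], dtype=data.dtype)
    for row, seg in zip(data, segment_ids):
        out[seg] += row
    return out

data = np.array([1.0, 2.0, 3.0, 4.0])
print(unsorted_segment_sum(data, segment_ids=[0, 2, 0, 2], num_segments=3))  # [4. 0. 6.]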

* Add gradient checking tests for nn.moments().

PiperOrigin-RevId: 163110994

* Improved the speed of constant folding

PiperOrigin-RevId: 163113085

* Convert configure to python.

PiperOrigin-RevId: 163114551

* [TF:XLA] Ignore control edges from Enter nodes to the graph sink during loop functionalization.

PiperOrigin-RevId: 163115904

* Support customized residual function in the residual wrapper.

PiperOrigin-RevId: 163121296
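
For context, a residual RNN-cell wrapper combines the wrapped cell's input and output; with this change the combination function is pluggable instead of being fixed to addition. A toy plain-Python sketch of the idea (not the actual tf.contrib.rnn classes; the residual_fn name is illustrative):

class ResidualWrapper:
    def __init__(self, cell, residual_fn=None):
        self._cell = cell
        # Default residual connection: input + output.
        self._residual_fn = residual_fn or (lambda inp, out: inp + out)

    def __call__(self, inputs, state):
        outputs, new_state = self._cell(inputs, state)
        return self._residual_fn(inputs, outputs), new_state

double_cell = lambda inputs, state: (2 * inputs, state)
wrapper = ResidualWrapper(double_cell, residual_fn=lambda inp, out: 0.5 * (inp + out))
print(wrapper(4.0, state=None))  # (6.0, None)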
Vijay Vasudevan 2017-07-25 18:41:40 -07:00 committed by GitHub
parent 245a5eeafc
commit 62de0b9479
81 changed files with 4579 additions and 2018 deletions

configure (vendored)

@@ -3,816 +3,12 @@
set -e
set -o pipefail
MIN_BAZEL_VERSION=0.4.5
# Find out the absolute path to where ./configure resides
pushd `dirname $0` > /dev/null
SOURCE_BASE_DIR=`pwd -P`
popd > /dev/null
PLATFORM="$(uname -s | tr 'A-Z' 'a-z')"
function is_linux() {
[[ "${PLATFORM}" == "linux" ]]
}
function is_macos() {
[[ "${PLATFORM}" == "darwin" ]]
}
function is_windows() {
# On windows, the shell script is actually running in msys
[[ "${PLATFORM}" =~ msys_nt*|mingw*|cygwin*|uwin* ]]
}
function is_ppc64le() {
[[ "$(uname -m)" == "ppc64le" ]]
}
function sed_in_place() {
sed -e $1 $2 > "$2.bak"
mv "$2.bak" $2
}
function write_to_bazelrc() {
echo "$1" >> .tf_configure.bazelrc
}
function write_action_env_to_bazelrc() {
write_to_bazelrc "build --action_env $1=\"$2\""
}
function python_path {
"$PYTHON_BIN_PATH" - <<END
from __future__ import print_function
import site
import os
try:
input = raw_input
except NameError:
pass
python_paths = []
if os.getenv('PYTHONPATH') is not None:
python_paths = os.getenv('PYTHONPATH').split(':')
try:
library_paths = site.getsitepackages()
except AttributeError:
from distutils.sysconfig import get_python_lib
library_paths = [get_python_lib()]
all_paths = set(python_paths + library_paths)
paths = []
for path in all_paths:
if os.path.isdir(path):
paths.append(path)
print(",".join(paths))
END
}
function setup_python {
## Set up python-related environment settings:
while true; do
fromuser=""
if [ -z "$PYTHON_BIN_PATH" ]; then if [ -z "$PYTHON_BIN_PATH" ]; then
default_python_bin_path=$(which python || which python3 || true) PYTHON_BIN_PATH=$(which python || which python3 || true)
read -p "Please specify the location of python. [Default is $default_python_bin_path]: " PYTHON_BIN_PATH
fromuser="1"
if [ -z "$PYTHON_BIN_PATH" ]; then
PYTHON_BIN_PATH=$default_python_bin_path
fi
fi
if [ -e "$PYTHON_BIN_PATH" ]; then
break
fi
echo "Invalid python path. ${PYTHON_BIN_PATH} cannot be found" 1>&2
if [ -z "$fromuser" ]; then
exit 1
fi
PYTHON_BIN_PATH=""
# Retry
done
if [ -z "$PYTHON_LIB_PATH" ]; then
# Split python_path into an array of paths, this allows path containing spaces
IFS=',' read -r -a python_lib_path <<< "$(python_path)"
if [ 1 = "$USE_DEFAULT_PYTHON_LIB_PATH" ]; then
PYTHON_LIB_PATH=${python_lib_path[0]}
echo "Using python library path: $PYTHON_LIB_PATH"
else
echo "Found possible Python library paths:"
for x in "${python_lib_path[@]}"; do
echo " $x"
done
set -- "${python_lib_path[@]}"
echo "Please input the desired Python library path to use. Default is [$1]"
read b || true
if [ "$b" == "" ]; then
PYTHON_LIB_PATH=${python_lib_path[0]}
echo "Using python library path: $PYTHON_LIB_PATH"
else
PYTHON_LIB_PATH="$b"
fi
fi
fi
if [ ! -x "$PYTHON_BIN_PATH" ] || [ -d "$PYTHON_BIN_PATH" ]; then
echo "PYTHON_BIN_PATH is not executable. Is it the python binary?"
exit 1
fi
local python_major_version
python_major_version=$("${PYTHON_BIN_PATH}" -c 'from __future__ import print_function; import sys; print(sys.version_info[0]);' | head -c1)
if [ -z "$python_major_version" ]; then
echo -e "\n\nERROR: Problem getting python version. Is $PYTHON_BIN_PATH the correct python binary?"
exit 1
fi
# Convert python path to Windows style before writing into bazel.rc
if is_windows; then
PYTHON_BIN_PATH="$(cygpath -m "$PYTHON_BIN_PATH")"
PYTHON_LIB_PATH="$(cygpath -m "$PYTHON_LIB_PATH")"
fi
# Set-up env variables used by python_configure.bzl
write_action_env_to_bazelrc "PYTHON_BIN_PATH" "$PYTHON_BIN_PATH"
write_action_env_to_bazelrc "PYTHON_LIB_PATH" "$PYTHON_LIB_PATH"
write_to_bazelrc "build --define PYTHON_BIN_PATH=\"$PYTHON_BIN_PATH\""
write_to_bazelrc "build --define PYTHON_LIB_PATH=\"$PYTHON_LIB_PATH\""
write_to_bazelrc "build --force_python=py$python_major_version"
write_to_bazelrc "build --host_force_python=py$python_major_version"
write_to_bazelrc "build --python${python_major_version}_path=\"$PYTHON_BIN_PATH\""
write_to_bazelrc "test --force_python=py$python_major_version"
write_to_bazelrc "test --host_force_python=py$python_major_version"
write_to_bazelrc "test --define PYTHON_BIN_PATH=\"$PYTHON_BIN_PATH\""
write_to_bazelrc "test --define PYTHON_LIB_PATH=\"$PYTHON_LIB_PATH\""
write_to_bazelrc "run --define PYTHON_BIN_PATH=\"$PYTHON_BIN_PATH\""
write_to_bazelrc "run --define PYTHON_LIB_PATH=\"$PYTHON_LIB_PATH\""
# Write tools/python_bin_path.sh
echo "export PYTHON_BIN_PATH=\"$PYTHON_BIN_PATH\"" > tools/python_bin_path.sh
}
function version {
echo "$@" | awk -F. '{ printf("%03d%03d%03d\n", $1,$2,$3); }';
}
bazel version > bazel.version
set +e
curr_bazel_version=$(grep -m 1 'Build label:' bazel.version | cut -d ' ' -f3)
set -e
rm -f bazel.version
echo "You have bazel $curr_bazel_version installed."
if [ -z "$curr_bazel_version" ]; then
echo "WARNING: current bazel installation is not a release version."
echo "Make sure you are running at least bazel $MIN_BAZEL_VERSION."
elif [ "$(version "$MIN_BAZEL_VERSION")" -gt "$(version "$curr_bazel_version")" ]; then
echo "Please upgrade your bazel installation to version $MIN_BAZEL_VERSION or higher to build TensorFlow!"
echo "Exiting..."
exit 1
fi
# This file contains customized config settings.
rm -f .tf_configure.bazelrc
touch .tf_configure.bazelrc
if [[ ! -e .bazelrc ]]; then
if [[ -e "${HOME}/.bazelrc" ]]; then
echo "import ${HOME}/.bazelrc" >.bazelrc
else
touch .bazelrc
fi
fi
sed_in_place "/tf_configure/d" .bazelrc
echo "import %workspace%/.tf_configure.bazelrc" >> .bazelrc
# Delete any leftover BUILD files from the Makefile build, which would interfere
# with Bazel parsing.
MAKEFILE_DOWNLOAD_DIR=tensorflow/contrib/makefile/downloads
if [ -d "${MAKEFILE_DOWNLOAD_DIR}" ]; then
find ${MAKEFILE_DOWNLOAD_DIR} -type f -name '*BUILD' -delete
fi
setup_python
## Set up MKL related environment settings
write_to_bazelrc 'build:mkl --define with_mkl_support=true'
write_to_bazelrc 'build:mkl --define using_mkl=true'
write_to_bazelrc 'build:mkl -c opt'
write_to_bazelrc 'build:mkl --copt="-DEIGEN_USE_VML"'
echo ""
echo "Add \"--config=mkl\" to your bazel command to build with MKL support."
echo "Please note that MKL on MacOS or windows is still not supported."
echo "If you would like to use a local MKL instead of downloading, please "
echo " set the environment variable \"TF_MKL_ROOT\" every time before build."
echo ""
## End MKL setup
## Set up architecture-dependent optimization flags.
if [ -z "$CC_OPT_FLAGS" ]; then
if is_ppc64le; then
# gcc on ppc64le does not support -march, use mcpu instead
default_cc_opt_flags="-mcpu=native"
else
default_cc_opt_flags="-march=native"
fi
read -p "Please specify optimization flags to use during compilation when bazel option "\
"\"--config=opt\" is specified [Default is $default_cc_opt_flags]: " CC_OPT_FLAGS
if [ -z "$CC_OPT_FLAGS" ]; then
CC_OPT_FLAGS=$default_cc_opt_flags
fi
fi
if is_windows; then
TF_NEED_GCP=0
TF_NEED_HDFS=0
TF_NEED_JEMALLOC=0
TF_NEED_OPENCL=0
TF_CUDA_CLANG=0
fi
if is_linux; then
while [ "$TF_NEED_JEMALLOC" == "" ]; do
read -p "Do you wish to use jemalloc as the malloc implementation? [Y/n] "\
INPUT
case $INPUT in
[Yy]* ) echo "jemalloc enabled"; TF_NEED_JEMALLOC=1;;
[Nn]* ) echo "jemalloc disabled"; TF_NEED_JEMALLOC=0;;
"" ) echo "jemalloc enabled"; TF_NEED_JEMALLOC=1;;
* ) echo "Invalid selection: " $INPUT;;
esac
done
else
TF_NEED_JEMALLOC=0
fi
if [[ "$TF_NEED_JEMALLOC" == "1" ]]; then
write_to_bazelrc 'build --define with_jemalloc=true'
fi
while [[ "$TF_NEED_GCP" == "" ]]; do
read -p "Do you wish to build TensorFlow with "\
"Google Cloud Platform support? [y/N] " INPUT
case $INPUT in
[Yy]* ) echo "Google Cloud Platform support will be enabled for "\
"TensorFlow"; TF_NEED_GCP=1;;
[Nn]* ) echo "No Google Cloud Platform support will be enabled for "\
"TensorFlow"; TF_NEED_GCP=0;;
"" ) echo "No Google Cloud Platform support will be enabled for "\
"TensorFlow"; TF_NEED_GCP=0;;
* ) echo "Invalid selection: " $INPUT;;
esac
done
if [[ "$TF_NEED_GCP" == "1" ]]; then
write_to_bazelrc 'build --define with_gcp_support=true'
fi
while [[ "$TF_NEED_HDFS" == "" ]]; do
read -p "Do you wish to build TensorFlow with "\
"Hadoop File System support? [y/N] " INPUT
case $INPUT in
[Yy]* ) echo "Hadoop File System support will be enabled for "\
"TensorFlow"; TF_NEED_HDFS=1;;
[Nn]* ) echo "No Hadoop File System support will be enabled for "\
"TensorFlow"; TF_NEED_HDFS=0;;
"" ) echo "No Hadoop File System support will be enabled for "\
"TensorFlow"; TF_NEED_HDFS=0;;
* ) echo "Invalid selection: " $INPUT;;
esac
done
if [[ "$TF_NEED_HDFS" == "1" ]]; then
write_to_bazelrc 'build --define with_hdfs_support=true'
fi
## Enable XLA.
while [[ "$TF_ENABLE_XLA" == "" ]]; do
read -p "Do you wish to build TensorFlow with the XLA just-in-time compiler (experimental)? [y/N] " INPUT
case $INPUT in
[Yy]* ) echo "XLA JIT support will be enabled for TensorFlow"; TF_ENABLE_XLA=1;;
[Nn]* ) echo "No XLA JIT support will be enabled for TensorFlow"; TF_ENABLE_XLA=0;;
"" ) echo "No XLA support will be enabled for TensorFlow"; TF_ENABLE_XLA=0;;
* ) echo "Invalid selection: " $INPUT;;
esac
done
if [[ "$TF_ENABLE_XLA" == "1" ]]; then
write_to_bazelrc 'build --define with_xla_support=true'
fi
# Verbs configuration
while [ "$TF_NEED_VERBS" == "" ]; do
read -p "Do you wish to build TensorFlow with "\
"VERBS support? [y/N] " INPUT
case $INPUT in
[Yy]* ) echo "VERBS support will be enabled for "\
"TensorFlow"; TF_NEED_VERBS=1;;
[Nn]* ) echo "No VERBS support will be enabled for "\
"TensorFlow"; TF_NEED_VERBS=0;;
"" ) echo "No VERBS support will be enabled for "\
"TensorFlow"; TF_NEED_VERBS=0;;
* ) echo "Invalid selection: " $INPUT;;
esac
done
if [[ "$TF_NEED_VERBS" == "1" ]]; then
write_to_bazelrc 'build --define with_verbs_support=true'
fi
# Append CC optimization flags to bazel.rc
for opt in $CC_OPT_FLAGS; do
write_to_bazelrc "build:opt --cxxopt=$opt --copt=$opt"
done
# Run the gen_git_source to create links where bazel can track dependencies for
# git hash propagation
GEN_GIT_SOURCE=tensorflow/tools/git/gen_git_source.py
chmod a+x ${GEN_GIT_SOURCE}
"${PYTHON_BIN_PATH}" ${GEN_GIT_SOURCE} --configure "${SOURCE_BASE_DIR}"
## Set up SYCL-related environment settings
while [ "$TF_NEED_OPENCL" == "" ]; do
read -p "Do you wish to build TensorFlow with OpenCL support? [y/N] " INPUT
case $INPUT in
[Yy]* ) echo "OpenCL support will be enabled for TensorFlow"; TF_NEED_OPENCL=1;;
[Nn]* ) echo "No OpenCL support will be enabled for TensorFlow"; TF_NEED_OPENCL=0;;
"" ) echo "No OpenCL support will be enabled for TensorFlow"; TF_NEED_OPENCL=0;;
* ) echo "Invalid selection: " $INPUT;;
esac
done
## Set up Cuda-related environment settings
while [ "$TF_NEED_CUDA" == "" ]; do
read -p "Do you wish to build TensorFlow with CUDA support? [y/N] " INPUT
case $INPUT in
[Yy]* ) echo "CUDA support will be enabled for TensorFlow"; TF_NEED_CUDA=1;;
[Nn]* ) echo "No CUDA support will be enabled for TensorFlow"; TF_NEED_CUDA=0;;
"" ) echo "No CUDA support will be enabled for TensorFlow"; TF_NEED_CUDA=0;;
* ) echo "Invalid selection: " $INPUT;;
esac
done
export TF_NEED_CUDA
write_action_env_to_bazelrc "TF_NEED_CUDA" "$TF_NEED_CUDA"
export TF_NEED_OPENCL
write_action_env_to_bazelrc "TF_NEED_OPENCL" "$TF_NEED_OPENCL"
if [ "$TF_NEED_CUDA" == "1" ]; then
while [[ "$TF_CUDA_CLANG" == "" ]]; do
read -p "Do you want to use clang as CUDA compiler? [y/N] " INPUT
case $INPUT in
[Yy]* ) echo "Clang will be used as CUDA compiler"; TF_CUDA_CLANG=1;;
[Nn]* ) echo "nvcc will be used as CUDA compiler"; TF_CUDA_CLANG=0;;
"" ) echo "nvcc will be used as CUDA compiler"; TF_CUDA_CLANG=0;;
* ) echo "Invalid selection: " $INPUT;;
esac
done
export TF_CUDA_CLANG
write_action_env_to_bazelrc "TF_CUDA_CLANG" "$TF_CUDA_CLANG"
# Set up which clang we should use as the cuda / host compiler.
while [[ "$TF_CUDA_CLANG" == "1" ]] && true; do
fromuser=""
if [ -z "$CLANG_CUDA_COMPILER_PATH" ]; then
default_clang_host_compiler_path=$(which clang || true)
read -p "Please specify which clang should be used as device and host compiler. [Default is $default_clang_host_compiler_path]: " CLANG_CUDA_COMPILER_PATH
fromuser="1"
if [ -z "$CLANG_CUDA_COMPILER_PATH" ]; then
CLANG_CUDA_COMPILER_PATH="$default_clang_host_compiler_path"
fi
fi
if [ -e "$CLANG_CUDA_COMPILER_PATH" ]; then
export CLANG_CUDA_COMPILER_PATH
write_action_env_to_bazelrc "CLANG_CUDA_COMPILER_PATH" "$CLANG_CUDA_COMPILER_PATH"
break
fi
echo "Invalid clang path. ${CLANG_CUDA_COMPILER_PATH} cannot be found" 1>&2
if [ -z "$fromuser" ]; then
exit 1
fi
CLANG_CUDA_COMPILER_PATH=""
# Retry
done
# Find out where the CUDA toolkit is installed
while true; do
# Configure the Cuda SDK version to use.
if [ -z "$TF_CUDA_VERSION" ]; then
read -p "Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: " TF_CUDA_VERSION
fi
# Set default CUDA version if not set
TF_CUDA_VERSION=${TF_CUDA_VERSION:-8.0}
fromuser=""
if [ -z "$CUDA_TOOLKIT_PATH" ]; then
default_cuda_path=/usr/local/cuda
if is_windows; then
if [ -z "$CUDA_PATH" ]; then
default_cuda_path="C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0"
else
default_cuda_path="$(cygpath -m "$CUDA_PATH")"
fi
elif is_linux; then
# If the default doesn't exist, try an alternative default.
if [ ! -d $default_cuda_path ] && [ -d /opt/cuda ]; then
default_cuda_path=/opt/cuda
fi
fi
read -p "Please specify the location where CUDA $TF_CUDA_VERSION toolkit is installed. Refer to README.md for more details. [Default is $default_cuda_path]: " CUDA_TOOLKIT_PATH
fromuser="1"
if [ -z "$CUDA_TOOLKIT_PATH" ]; then
CUDA_TOOLKIT_PATH="$default_cuda_path"
fi
fi
if [[ -z "$TF_CUDA_VERSION" ]]; then
TF_CUDA_EXT=""
else
TF_CUDA_EXT=".$TF_CUDA_VERSION"
fi
if is_windows; then
CUDA_RT_LIB_PATH="lib/x64/cudart.lib"
elif is_linux; then
CUDA_RT_LIB_PATH="lib64/libcudart.so${TF_CUDA_EXT}"
elif is_macos; then
CUDA_RT_LIB_PATH="lib/libcudart${TF_CUDA_EXT}.dylib"
fi
if [ -e "${CUDA_TOOLKIT_PATH}/${CUDA_RT_LIB_PATH}" ]; then
export CUDA_TOOLKIT_PATH
write_action_env_to_bazelrc "CUDA_TOOLKIT_PATH" "$CUDA_TOOLKIT_PATH"
export TF_CUDA_VERSION
break
fi
echo "Invalid path to CUDA $TF_CUDA_VERSION toolkit. ${CUDA_TOOLKIT_PATH}/${CUDA_RT_LIB_PATH} cannot be found"
if [ -z "$fromuser" ]; then
exit 1
fi
# Retry
TF_CUDA_VERSION=""
CUDA_TOOLKIT_PATH=""
done
export TF_CUDA_VERSION
write_action_env_to_bazelrc "TF_CUDA_VERSION" "$TF_CUDA_VERSION"
# Set up which gcc nvcc should use as the host compiler
# No need to set this on Windows
while [[ "$TF_CUDA_CLANG" != "1" ]] && ! is_windows && true; do
fromuser=""
if [ -z "$GCC_HOST_COMPILER_PATH" ]; then
default_gcc_host_compiler_path=$(which gcc || true)
cuda_bin_symlink="$CUDA_TOOLKIT_PATH/bin/gcc"
if [ -L "$cuda_bin_symlink" ]; then
default_gcc_host_compiler_path=$(readlink $cuda_bin_symlink)
fi
read -p "Please specify which gcc should be used by nvcc as the host compiler. [Default is $default_gcc_host_compiler_path]: " GCC_HOST_COMPILER_PATH
fromuser="1"
if [ -z "$GCC_HOST_COMPILER_PATH" ]; then
GCC_HOST_COMPILER_PATH="$default_gcc_host_compiler_path"
fi
fi
if [ -e "$GCC_HOST_COMPILER_PATH" ]; then
export GCC_HOST_COMPILER_PATH
write_action_env_to_bazelrc "GCC_HOST_COMPILER_PATH" "$GCC_HOST_COMPILER_PATH"
break
fi
echo "Invalid gcc path. ${GCC_HOST_COMPILER_PATH} cannot be found" 1>&2
if [ -z "$fromuser" ]; then
exit 1
fi
GCC_HOST_COMPILER_PATH=""
# Retry
done
# Find out where the cuDNN library is installed
while true; do
# Configure the cuDNN version to use.
if [ -z "$TF_CUDNN_VERSION" ]; then
read -p "Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: " TF_CUDNN_VERSION
fi
# Set default CUDNN version if not set
TF_CUDNN_VERSION=${TF_CUDNN_VERSION:-6}
fromuser=""
if [ -z "$CUDNN_INSTALL_PATH" ]; then
default_cudnn_path=${CUDA_TOOLKIT_PATH}
read -p "Please specify the location where cuDNN $TF_CUDNN_VERSION library is installed. Refer to README.md for more details. [Default is $default_cudnn_path]: " CUDNN_INSTALL_PATH
fromuser="1"
if [ -z "$CUDNN_INSTALL_PATH" ]; then
CUDNN_INSTALL_PATH=$default_cudnn_path
fi
# Result returned from "read" will be used unexpanded. That make "~" unusable.
# Going through one more level of expansion to handle that.
CUDNN_INSTALL_PATH=`"${PYTHON_BIN_PATH}" -c "import os; print(os.path.realpath(os.path.expanduser('${CUDNN_INSTALL_PATH}')))"`
if is_windows; then
CUDNN_INSTALL_PATH="$(cygpath -m "$CUDNN_INSTALL_PATH")"
fi
fi
if [[ -z "$TF_CUDNN_VERSION" ]]; then
TF_CUDNN_EXT=""
else
TF_CUDNN_EXT=".$TF_CUDNN_VERSION"
fi
if is_windows; then
CUDA_DNN_LIB_PATH="lib/x64/cudnn.lib"
CUDA_DNN_LIB_ALT_PATH="lib/x64/cudnn.lib"
elif is_linux; then
CUDA_DNN_LIB_PATH="lib64/libcudnn.so${TF_CUDNN_EXT}"
CUDA_DNN_LIB_ALT_PATH="libcudnn.so${TF_CUDNN_EXT}"
elif is_macos; then
CUDA_DNN_LIB_PATH="lib/libcudnn${TF_CUDNN_EXT}.dylib"
CUDA_DNN_LIB_ALT_PATH="libcudnn${TF_CUDNN_EXT}.dylib"
fi
if [ -e "$CUDNN_INSTALL_PATH/${CUDA_DNN_LIB_ALT_PATH}" ] || [ -e "$CUDNN_INSTALL_PATH/${CUDA_DNN_LIB_PATH}" ]; then
export TF_CUDNN_VERSION
write_action_env_to_bazelrc "TF_CUDNN_VERSION" "$TF_CUDNN_VERSION"
export CUDNN_INSTALL_PATH
write_action_env_to_bazelrc "CUDNN_INSTALL_PATH" "$CUDNN_INSTALL_PATH"
break
fi
if is_linux; then
if ! type ldconfig > /dev/null 2>&1; then
LDCONFIG_BIN=/sbin/ldconfig
else
LDCONFIG_BIN=ldconfig
fi
CUDNN_PATH_FROM_LDCONFIG="$($LDCONFIG_BIN -p | sed -n 's/.*libcudnn.so .* => \(.*\)/\1/p')"
if [ -e "${CUDNN_PATH_FROM_LDCONFIG}${TF_CUDNN_EXT}" ]; then
export TF_CUDNN_VERSION
export CUDNN_INSTALL_PATH
CUDNN_INSTALL_PATH="$(dirname ${CUDNN_PATH_FROM_LDCONFIG})"
write_action_env_to_bazelrc "CUDNN_INSTALL_PATH" "$CUDNN_INSTALL_PATH"
break
fi
fi
echo "Invalid path to cuDNN ${CUDNN_VERSION} toolkit. Neither of the following two files can be found:"
echo "${CUDNN_INSTALL_PATH}/${CUDA_DNN_LIB_PATH}"
echo "${CUDNN_INSTALL_PATH}/${CUDA_DNN_LIB_ALT_PATH}"
if is_linux; then
echo "${CUDNN_PATH_FROM_LDCONFIG}${TF_CUDNN_EXT}"
fi
if [ -z "$fromuser" ]; then
exit 1
fi
# Retry
TF_CUDNN_VERSION=""
CUDNN_INSTALL_PATH=""
done
export TF_CUDNN_VERSION
write_action_env_to_bazelrc "TF_CUDNN_VERSION" "$TF_CUDNN_VERSION"
# Configure the compute capabilities that TensorFlow builds for.
# Since Cuda toolkit is not backward-compatible, this is not guaranteed to work.
function get_native_cuda_compute_capabilities {
device_query_bin="$CUDA_TOOLKIT_PATH/extras/demo_suite/deviceQuery" # Also works on Windows without .exe
"$device_query_bin" | grep 'Capability' | grep -o '[0-9]*\.[0-9]*' | sed ':a;{N;s/\n/,/};ba'
exit 0 # ensure that this function always exit success even if device detection fails, to prevent the whole configure from aborting
}
while true; do
fromuser=""
native_cuda_compute_capabilities=$(get_native_cuda_compute_capabilities)
default_cuda_compute_capabilities=${native_cuda_compute_capabilities:-"3.5,5.2"}
if [ -z "$TF_CUDA_COMPUTE_CAPABILITIES" ]; then
cat << EOF
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
EOF
read -p "[Default is: \"$default_cuda_compute_capabilities\"]: " TF_CUDA_COMPUTE_CAPABILITIES
fromuser=1
fi
if [ -z "$TF_CUDA_COMPUTE_CAPABILITIES" ]; then
TF_CUDA_COMPUTE_CAPABILITIES=$default_cuda_compute_capabilities
fi
# Check whether all capabilities from the input is valid
COMPUTE_CAPABILITIES=${TF_CUDA_COMPUTE_CAPABILITIES//,/ }
ALL_VALID=1
for CAPABILITY in $COMPUTE_CAPABILITIES; do
if [[ ! "$CAPABILITY" =~ [0-9]+.[0-9]+ ]]; then
echo "Invalid compute capability: " $CAPABILITY
ALL_VALID=0
break
fi
done
if [ "$ALL_VALID" == "0" ]; then
if [ -z "$fromuser" ]; then
exit 1
fi
else
export TF_CUDA_COMPUTE_CAPABILITIES
write_action_env_to_bazelrc "TF_CUDA_COMPUTE_CAPABILITIES" "$TF_CUDA_COMPUTE_CAPABILITIES"
break
fi
TF_CUDA_COMPUTE_CAPABILITIES=""
done
if is_windows; then
# The following three variables are needed for MSVC toolchain configuration in Bazel
export CUDA_PATH="$CUDA_TOOLKIT_PATH"
export CUDA_COMPUTE_CAPABILITIES="$TF_CUDA_COMPUTE_CAPABILITIES"
export NO_WHOLE_ARCHIVE_OPTION=1
write_action_env_to_bazelrc "CUDA_PATH" "$CUDA_PATH"
write_action_env_to_bazelrc "CUDA_COMPUTE_CAPABILITIES" "$CUDA_COMPUTE_CAPABILITIES"
write_action_env_to_bazelrc "NO_WHOLE_ARCHIVE_OPTION" "1"
write_to_bazelrc "build --config=win-cuda"
write_to_bazelrc "test --config=win-cuda"
else
# If CUDA is enabled, always use GPU during build and test.
if [ "$TF_CUDA_CLANG" == "1" ]; then
write_to_bazelrc "build --config=cuda_clang"
write_to_bazelrc "test --config=cuda_clang"
else
write_to_bazelrc "build --config=cuda"
write_to_bazelrc "test --config=cuda"
fi
fi
# end of if "$TF_NEED_CUDA" == "1"
fi
# OpenCL configuration
if [ "$TF_NEED_OPENCL" == "1" ]; then
# Determine which C++ compiler should be used as the host compiler
while true; do
fromuser=""
if [ -z "$HOST_CXX_COMPILER" ]; then
default_cxx_host_compiler=$(which g++ || true)
read -p "Please specify which C++ compiler should be used as the host C++ compiler. [Default is $default_cxx_host_compiler]: " HOST_CXX_COMPILER
fromuser="1"
if [ -z "$HOST_CXX_COMPILER" ]; then
HOST_CXX_COMPILER=$default_cxx_host_compiler
fi
fi
if [ -e "$HOST_CXX_COMPILER" ]; then
export HOST_CXX_COMPILER
write_action_env_to_bazelrc "HOST_CXX_COMPILER" "$HOST_CXX_COMPILER"
break
fi
echo "Invalid C++ compiler path. ${HOST_CXX_COMPILER} cannot be found" 1>&2
if [ -z "$fromuser" ]; then
exit 1
fi
HOST_CXX_COMPILER=""
# Retry
done
# Determine which C compiler should be used as the host compiler
while true; do
fromuser=""
if [ -z "$HOST_C_COMPILER" ]; then
default_c_host_compiler=$(which gcc || true)
read -p "Please specify which C compiler should be used as the host C compiler. [Default is $default_c_host_compiler]: " HOST_C_COMPILER
fromuser="1"
if [ -z "$HOST_C_COMPILER" ]; then
HOST_C_COMPILER=$default_c_host_compiler
fi
fi
if [ -e "$HOST_C_COMPILER" ]; then
export HOST_C_COMPILER
write_action_env_to_bazelrc "HOST_C_COMPILER" "$HOST_C_COMPILER"
break
fi
echo "Invalid C compiler path. ${HOST_C_COMPILER} cannot be found" 1>&2
if [ -z "$fromuser" ]; then
exit 1
fi
HOST_C_COMPILER=""
# Retry
done
while true; do
# Configure the OPENCL version to use.
TF_OPENCL_VERSION="1.2"
# Point to ComputeCpp root
if [ -z "$COMPUTECPP_TOOLKIT_PATH" ]; then
default_computecpp_toolkit_path=/usr/local/computecpp
read -p "Please specify the location where ComputeCpp for SYCL $TF_OPENCL_VERSION is installed. [Default is $default_computecpp_toolkit_path]: " COMPUTECPP_TOOLKIT_PATH
fromuser="1"
if [ -z "$COMPUTECPP_TOOLKIT_PATH" ]; then
COMPUTECPP_TOOLKIT_PATH=$default_computecpp_toolkit_path
fi
fi
if is_linux; then
SYCL_RT_LIB_PATH="lib/libComputeCpp.so"
fi
if [ -e "${COMPUTECPP_TOOLKIT_PATH}/${SYCL_RT_LIB_PATH}" ]; then
export COMPUTECPP_TOOLKIT_PATH
write_action_env_to_bazelrc "COMPUTECPP_TOOLKIT_PATH" "$COMPUTECPP_TOOLKIT_PATH"
break
fi
echo "Invalid SYCL $TF_OPENCL_VERSION library path. ${COMPUTECPP_TOOLKIT_PATH}/${SYCL_RT_LIB_PATH} cannot be found"
if [ -z "$fromuser" ]; then
exit 1
fi
# Retry
TF_OPENCL_VERSION=""
COMPUTECPP_TOOLKIT_PATH=""
done
# end of if "$TF_NEED_OPENCL" == "1"
fi
while [ "$TF_NEED_MPI" == "" ]; do
read -p "Do you wish to build TensorFlow with "\
"MPI support? [y/N] " INPUT
case $INPUT in
[Yy]* ) echo "MPI support will be enabled for "\
"TensorFlow"; TF_NEED_MPI=1;;
[Nn]* ) echo "MPI support will not be enabled for "\
"TensorFlow"; TF_NEED_MPI=0;;
"" ) echo "MPI support will not be enabled for "\
"TensorFlow"; TF_NEED_MPI=0;;
* ) echo "Invalid selection: " $INPUT;;
esac
done
# Find out where the MPI toolkit is installed
while true; do
if [ "$TF_NEED_MPI" == "0" ]; then
break;
fi
fromuser=""
if [ -z "$MPI_HOME" ]; then
#Get the base folder by removing the bin path
default_mpi_path=$(dirname $(dirname $(which mpirun)) || dirname $(dirname $(which mpiexec)) || true)
read -p "Please specify the MPI toolkit folder. [Default is $default_mpi_path]: " MPI_HOME
fromuser="1"
if [ -z "$MPI_HOME" ]; then
MPI_HOME=$default_mpi_path
fi
fi
#Check that the include and library folders are where we expect them to be
if [ -e "$MPI_HOME/include" ] && [ -e "$MPI_HOME/lib" ]; then
break
fi
echo "Invalid path to the MPI Toolkit. ${MPI_HOME}/include or ${MPI_HOME}/lib cannot be found."
if [ -z "$fromuser" ]; then
exit 1
fi
# Retry
MPI_HOME=""
done
if [ "$TF_NEED_MPI" == "1" ]; then
write_to_bazelrc 'build --define with_mpi_support=true'
#Link the MPI header files
ln -sf "${MPI_HOME}/include/mpi.h" third_party/mpi/mpi.h
#Determine if we use OpenMPI or MVAPICH, these require different header files
#to be included here to make bazel dependency checker happy
if [ -e "${MPI_HOME}/include/mpi_portable_platform.h" ]; then
#OpenMPI
ln -sf "${MPI_HOME}/include/mpi_portable_platform.h" third_party/mpi/
sed -i -e "s/MPI_LIB_IS_OPENMPI=False/MPI_LIB_IS_OPENMPI=True/" third_party/mpi/mpi.bzl
else
#MVAPICH / MPICH
ln -sf "${MPI_HOME}/include/mpio.h" third_party/mpi/
ln -sf "${MPI_HOME}/include/mpicxx.h" third_party/mpi/
sed -i -e "s/MPI_LIB_IS_OPENMPI=True/MPI_LIB_IS_OPENMPI=False/" third_party/mpi/mpi.bzl
fi
if [ -e "${MPI_HOME}/lib/libmpi.so" ]; then
ln -sf "${MPI_HOME}/lib/libmpi.so" third_party/mpi/
else
echo "Cannot find the MPI library file in ${MPI_HOME}/lib "
exit 1
fi
fi
echo "Configuration finished" echo "Configuration finished"

configure.py (new file)

@@ -0,0 +1,950 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""configure script to get build parameters from user."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import errno
import os
import platform
import re
import site
import subprocess
import sys
_TF_BAZELRC = '.tf_configure.bazelrc'
_DEFAULT_CUDA_VERSION = '8.0'
_DEFAULT_CUDNN_VERSION = '6'
_DEFAULT_CUDA_COMPUTE_CAPABILITIES = '3.5,5.2'
_DEFAULT_CUDA_PATH = '/usr/local/cuda'
_DEFAULT_CUDA_PATH_LINUX = '/opt/cuda'
_DEFAULT_CUDA_PATH_WIN = ('C:/Program Files/NVIDIA GPU Computing '
'Toolkit/CUDA/v%s' % _DEFAULT_CUDA_VERSION)
_TF_OPENCL_VERSION = '1.2'
_DEFAULT_COMPUTECPP_TOOLKIT_PATH = '/usr/local/computecpp'
def is_windows():
return platform.system() == 'Windows'
def is_linux():
return platform.system() == 'Linux'
def is_macos():
return platform.system() == 'Darwin'
def is_ppc64le():
return platform.machine() == 'ppc64le'
def get_input(question):
try:
try:
answer = raw_input(question)
except NameError:
answer = input(question) # pylint: disable=bad-builtin
except EOFError:
answer = ''
return answer
def symlink_force(target, link_name):
"""Force symlink, equivalent of 'ln -sf'.
Args:
target: items to link to.
link_name: name of the link.
"""
try:
os.symlink(target, link_name)
except OSError as e:
if e.errno == errno.EEXIST:
os.remove(link_name)
os.symlink(target, link_name)
else:
raise e
def sed_in_place(filename, old, new):
"""Replace old string with new string in file.
Args:
filename: string for filename.
old: string to replace.
new: new string to replace to.
"""
with open(filename, 'r') as f:
filedata = f.read()
newdata = filedata.replace(old, new)
with open(filename, 'w') as f:
f.write(newdata)
def remove_line_with(filename, token):
"""Remove lines that contain token from file.
Args:
filename: string for filename.
token: string token to check if to remove a line from file or not.
"""
with open(filename, 'r') as f:
filedata = f.read()
with open(filename, 'w') as f:
for line in filedata.strip().split('\n'):
if token not in line:
f.write(line + '\n')
def write_to_bazelrc(line):
with open(_TF_BAZELRC, 'a') as f:
f.write(line + '\n')
def write_action_env_to_bazelrc(var_name, var):
write_to_bazelrc('build --action_env %s="%s"' % (var_name, str(var)))
def run_shell(cmd):
return subprocess.check_output(cmd, shell=True).decode('UTF-8').strip()
def cygpath(path):
"""Convert path from posix to windows."""
return run_shell('cygpath -m "%s"' % path)
def get_python_path(environ_cp):
"""Get the python site package paths."""
python_paths = []
if environ_cp.get('PYTHONPATH'):
python_paths = environ_cp.get('PYTHONPATH').split(':')
try:
library_paths = site.getsitepackages()
except AttributeError:
from distutils.sysconfig import get_python_lib # pylint: disable=g-import-not-at-top
library_paths = [get_python_lib()]
all_paths = set(python_paths + library_paths)
paths = []
for path in all_paths:
if os.path.isdir(path):
paths.append(path)
return paths
def setup_python(environ_cp):
"""Setup python related env variables."""
# Get PYTHON_BIN_PATH, default is the current running python.
default_python_bin_path = sys.executable
ask_python_bin_path = ('Please specify the location of python. [Default is '
'%s]: ') % default_python_bin_path
while True:
python_bin_path = get_from_env_or_user_or_default(
environ_cp, 'PYTHON_BIN_PATH', ask_python_bin_path,
default_python_bin_path)
# Check if the path is valid
if (os.path.isfile(python_bin_path) and os.access(
python_bin_path, os.X_OK)) or (os.path.isdir(python_bin_path)):
break
elif not os.path.exists(python_bin_path):
print('Invalid python path: %s cannot be found.' % python_bin_path)
else:
print('%s is not executable. Is it the python binary?' % python_bin_path)
environ_cp['PYTHON_BIN_PATH'] = ''
# Get PYTHON_LIB_PATH
python_lib_path = environ_cp.get('PYTHON_LIB_PATH')
if not python_lib_path:
python_lib_paths = get_python_path(environ_cp)
if environ_cp.get('USE_DEFAULT_PYTHON_LIB_PATH') == '1':
environ_cp['PYTHON_LIB_PATH'] = python_lib_paths[0]
else:
print('Found possible Python library paths:\n%s' %
'\n'.join(python_lib_paths))
default_python_lib_path = python_lib_paths[0]
python_lib_path = get_input(
'Please input the desired Python library path to use. Default is %s'
% python_lib_paths[0])
if not python_lib_path:
python_lib_path = default_python_lib_path
environ_cp['PYTHON_LIB_PATH'] = python_lib_path
python_major_version = sys.version_info[0]
# Convert python path to Windows style before writing into bazel.rc
if is_windows():
python_bin_path = cygpath(python_bin_path)
python_lib_path = cygpath(python_lib_path)
# Set-up env variables used by python_configure.bzl
write_action_env_to_bazelrc('PYTHON_BIN_PATH', python_bin_path)
write_action_env_to_bazelrc('PYTHON_LIB_PATH', python_lib_path)
write_to_bazelrc('build --define PYTHON_BIN_PATH="%s"' % python_bin_path)
write_to_bazelrc('build --define PYTHON_LIB_PATH="%s"' % python_lib_path)
write_to_bazelrc('build --force_python=py%s' % python_major_version)
write_to_bazelrc('build --host_force_python=py%s' % python_major_version)
write_to_bazelrc('build --python%s_path=\"%s"' % (python_major_version,
python_bin_path))
write_to_bazelrc('test --force_python=py%s' % python_major_version)
write_to_bazelrc('test --host_force_python=py%s' % python_major_version)
write_to_bazelrc('test --define PYTHON_BIN_PATH="%s"' % python_bin_path)
write_to_bazelrc('test --define PYTHON_LIB_PATH="%s"' % python_lib_path)
write_to_bazelrc('run --define PYTHON_BIN_PATH="%s"' % python_bin_path)
write_to_bazelrc('run --define PYTHON_LIB_PATH="%s"' % python_lib_path)
environ_cp['PYTHON_BIN_PATH'] = python_bin_path
# Write tools/python_bin_path.sh
with open('tools/python_bin_path.sh', 'w') as f:
f.write('export PYTHON_BIN_PATH="%s"' % python_bin_path)
def reset_tf_configure_bazelrc():
"""Reset file that contains customized config settings."""
open(_TF_BAZELRC, 'w').close()
home = os.path.expanduser('~')
if not os.path.exists('.bazelrc'):
if os.path.exists(os.path.join(home, '.bazelrc')):
with open('.bazelrc', 'a') as f:
f.write('import %s/.bazelrc\n' % home)
else:
open('.bazelrc', 'w').close()
remove_line_with('.bazelrc', 'tf_configure')
with open('.bazelrc', 'a') as f:
f.write('import %workspace%/.tf_configure.bazelrc\n')
def run_gen_git_source(environ_cp):
"""Run the gen_git_source to create links.
The links are for bazel to track dependencies for git hash propagation.
Args:
environ_cp: copy of the os.environ.
"""
cmd = '%s tensorflow/tools/git/gen_git_source.py --configure %s' % (
environ_cp.get('PYTHON_BIN_PATH'), os.getcwd())
os.system(cmd)
def cleanup_makefile():
"""Delete any leftover BUILD files from the Makefile build.
These files could interfere with Bazel parsing.
"""
makefile_download_dir = 'tensorflow/contrib/makefile/downloads'
if os.path.isdir(makefile_download_dir):
for root, _, filenames in os.walk(makefile_download_dir):
for f in filenames:
if f.endswith('BUILD'):
os.remove(os.path.join(root, f))
def get_var(environ_cp,
var_name,
query_item,
enabled_by_default,
question=None,
yes_reply=None,
no_reply=None):
"""Get boolean input from user.
If var_name is not set in env, ask user to enable query_item or not. If the
response is empty, use the default.
Args:
environ_cp: copy of the os.environ.
var_name: string for name of environment variable, e.g. "TF_NEED_HDFS".
query_item: string for feature related to the variable, e.g. "Hadoop File
System".
enabled_by_default: boolean for default behavior.
question: optional string for how to ask for user input.
yes_reply: optional string for reply when feature is enabled.
no_reply: optional string for reply when feature is disabled.
Returns:
boolean value of the variable.
"""
if not question:
question = 'Do you wish to build TensorFlow with %s support?' % query_item
if not yes_reply:
yes_reply = '%s support will be enabled for TensorFlow.' % query_item
if not no_reply:
no_reply = 'No %s' % yes_reply
yes_reply += '\n'
no_reply += '\n'
if enabled_by_default:
question += ' [Y/n]: '
else:
question += ' [y/N]: '
var = environ_cp.get(var_name)
while var is None:
user_input_origin = get_input(question)
user_input = user_input_origin.strip().lower()
if user_input == 'y':
print(yes_reply)
var = True
elif user_input == 'n':
print(no_reply)
var = False
elif not user_input:
if enabled_by_default:
print(yes_reply)
var = True
else:
print(no_reply)
var = False
else:
print('Invalid selection: %s' % user_input_origin)
return var
def set_build_var(environ_cp, var_name, query_item, option_name,
enabled_by_default):
"""Set if query_item will be enabled for the build.
Ask user if query_item will be enabled. Default is used if no input is given.
Set subprocess environment variable and write to .bazelrc if enabled.
Args:
environ_cp: copy of the os.environ.
var_name: string for name of environment variable, e.g. "TF_NEED_HDFS".
query_item: string for feature related to the variable, e.g. "Hadoop File
System".
option_name: string for option to define in .bazelrc.
enabled_by_default: boolean for default behavior.
"""
var = str(int(get_var(environ_cp, var_name, query_item, enabled_by_default)))
environ_cp[var_name] = var
if var == '1':
write_to_bazelrc('build --define %s=true' % option_name)
def set_action_env_var(environ_cp,
var_name,
query_item,
enabled_by_default,
question=None,
yes_reply=None,
no_reply=None):
"""Set boolean action_env variable.
Ask user if query_item will be enabled. Default is used if no input is given.
Set environment variable and write to .bazelrc.
Args:
environ_cp: copy of the os.environ.
var_name: string for name of environment variable, e.g. "TF_NEED_HDFS".
query_item: string for feature related to the variable, e.g. "Hadoop File
System".
enabled_by_default: boolean for default behavior.
question: optional string for how to ask for user input.
yes_reply: optional string for reply when feature is enabled.
no_reply: optional string for reply when feature is disabled.
"""
var = int(
get_var(environ_cp, var_name, query_item, enabled_by_default, question,
yes_reply, no_reply))
write_action_env_to_bazelrc(var_name, var)
environ_cp[var_name] = str(var)
def check_bazel_version(min_version):
"""Check installed bezel version is at least min_version.
Args:
min_version: string for minimum bazel version.
"""
try:
curr_version = run_shell('bazel version')
except subprocess.CalledProcessError:
print('Cannot find bazel. Please install bazel.')
sys.exit(0)
for line in curr_version.split('\n'):
if 'Build label: ' in line:
curr_version = line.split('Build label: ')[1]
break
min_version_segments = min_version.split('.')
curr_version_segments = curr_version.split('.')
# Check if current bazel version can be detected properly.
for seg in curr_version_segments:
if not seg.isdigit():
print('WARNING: current bazel installation is not a release version.')
print('Make sure you are running at least bazel %s' % min_version)
return
min_version_str = ''.join(['%03d' % int(seg) for seg in min_version_segments])
curr_version_str = ''.join(
['%03d' % int(seg) for seg in curr_version_segments])
if int(curr_version_str) < int(min_version_str):
print('Please upgrade your bazel installation to version %s or higher to '
'build TensorFlow!' % min_version)
sys.exit(0)
def set_cc_opt_flags(environ_cp):
"""Set up architecture-dependent optimization flags.
Also append CC optimization flags to bazel.rc.
Args:
environ_cp: copy of the os.environ.
"""
if is_ppc64le():
# gcc on ppc64le does not support -march, use mcpu instead
default_cc_opt_flags = '-mcpu=native'
else:
default_cc_opt_flags = '-march=native'
question = ('Please specify optimization flags to use during compilation when'
' bazel option "--config=opt" is specified [Default is %s]: '
) % default_cc_opt_flags
cc_opt_flags = get_from_env_or_user_or_default(environ_cp, 'CC_OPT_FLAGS',
question, default_cc_opt_flags)
for opt in cc_opt_flags.split():
write_to_bazelrc('build:opt --cxxopt=%s --copt=%s' % (opt, opt))
def set_tf_cuda_clang(environ_cp):
"""set TF_CUDA_CLANG action_env.
Args:
environ_cp: copy of the os.environ.
"""
question = 'Do you want to use clang as CUDA compiler?'
yes_reply = 'Clang will be used as CUDA compiler.'
no_reply = 'nvcc will be used as CUDA compiler.'
set_action_env_var(
environ_cp,
'TF_CUDA_CLANG',
None,
False,
question=question,
yes_reply=yes_reply,
no_reply=no_reply)
def get_from_env_or_user_or_default(environ_cp, var_name, ask_for_var,
var_default):
"""Get var_name either from env, or user or default.
If var_name has been set as environment variable, use the preset value, else
ask for user input. If no input is provided, the default is used.
Args:
environ_cp: copy of the os.environ.
var_name: string for name of environment variable, e.g. "TF_NEED_HDFS".
ask_for_var: string for how to ask for user input.
var_default: default value string.
Returns:
string value for var_name
"""
var = environ_cp.get(var_name)
if not var:
var = get_input(ask_for_var)
if not var:
var = var_default
return var
def set_clang_cuda_compiler_path(environ_cp):
"""Set CLANG_CUDA_COMPILER_PATH."""
default_clang_path = run_shell('which clang || true')
ask_clang_path = ('Please specify which clang should be used as device and '
'host compiler. [Default is %s]: ') % default_clang_path
while True:
clang_cuda_compiler_path = get_from_env_or_user_or_default(
environ_cp, 'CLANG_CUDA_COMPILER_PATH', ask_clang_path,
default_clang_path)
if os.path.exists(clang_cuda_compiler_path):
break
# Reset and retry
print('Invalid clang path: %s cannot be found.' % clang_cuda_compiler_path)
environ_cp['CLANG_CUDA_COMPILER_PATH'] = ''
# Set CLANG_CUDA_COMPILER_PATH
environ_cp['CLANG_CUDA_COMPILER_PATH'] = clang_cuda_compiler_path
write_action_env_to_bazelrc('CLANG_CUDA_COMPILER_PATH',
clang_cuda_compiler_path)
def set_gcc_host_compiler_path(environ_cp):
"""Set GCC_HOST_COMPILER_PATH."""
default_gcc_host_compiler_path = run_shell('which gcc || true')
cuda_bin_symlink = '%s/bin/gcc' % environ_cp.get('CUDA_TOOLKIT_PATH')
if os.path.islink(cuda_bin_symlink):
# os.readlink is only available in linux
default_gcc_host_compiler_path = run_shell('readlink %s' % cuda_bin_symlink)
ask_gcc_path = (
'Please specify which gcc should be used by nvcc as the '
'host compiler. [Default is %s]: ') % default_gcc_host_compiler_path
while True:
gcc_host_compiler_path = get_from_env_or_user_or_default(
environ_cp, 'GCC_HOST_COMPILER_PATH', ask_gcc_path,
default_gcc_host_compiler_path)
if os.path.exists(gcc_host_compiler_path):
break
# Reset and retry
print('Invalid gcc path. %s cannot be found' % gcc_host_compiler_path)
environ_cp['GCC_HOST_COMPILER_PATH'] = ''
# Set GCC_HOST_COMPILER_PATH
environ_cp['GCC_HOST_COMPILER_PATH'] = gcc_host_compiler_path
write_action_env_to_bazelrc('GCC_HOST_COMPILER_PATH', gcc_host_compiler_path)
def set_tf_cuda_version(environ_cp):
"""Set CUDA_TOOLKIT_PATH and TF_CUDA_VERSION."""
ask_cuda_version = (
'Please specify the CUDA SDK version you want to use, '
'e.g. 7.0. [Leave empty to default to CUDA %s]: ') % _DEFAULT_CUDA_VERSION
while True:
# Configure the Cuda SDK version to use.
tf_cuda_version = get_from_env_or_user_or_default(
environ_cp, 'TF_CUDA_VERSION', ask_cuda_version, _DEFAULT_CUDA_VERSION)
# Find out where the CUDA toolkit is installed
default_cuda_path = _DEFAULT_CUDA_PATH
if is_windows():
default_cuda_path = cygpath(
environ_cp.get('CUDA_PATH', _DEFAULT_CUDA_PATH_WIN))
elif is_linux():
# If the default doesn't exist, try an alternative default.
if (not os.path.exists(default_cuda_path)
) and os.path.exists(_DEFAULT_CUDA_PATH_LINUX):
default_cuda_path = _DEFAULT_CUDA_PATH_LINUX
ask_cuda_path = ('Please specify the location where CUDA %s toolkit is'
' installed. Refer to README.md for more details. '
'[Default is %s]: ') % (tf_cuda_version, default_cuda_path)
cuda_toolkit_path = get_from_env_or_user_or_default(
environ_cp, 'CUDA_TOOLKIT_PATH', ask_cuda_path, default_cuda_path)
if is_windows():
cuda_rt_lib_path = 'lib/x64/cudart.lib'
elif is_linux():
cuda_rt_lib_path = 'lib64/libcudart.so.%s' % tf_cuda_version
elif is_macos():
cuda_rt_lib_path = 'lib/libcudart.%s.dylib' % tf_cuda_version
cuda_toolkit_path_full = os.path.join(cuda_toolkit_path, cuda_rt_lib_path)
if os.path.exists(cuda_toolkit_path_full):
break
# Reset and retry
print('Invalid path to CUDA %s toolkit. %s cannot be found' %
(tf_cuda_version, cuda_toolkit_path_full))
environ_cp['TF_CUDA_VERSION'] = ''
environ_cp['CUDA_TOOLKIT_PATH'] = ''
# Set CUDA_TOOLKIT_PATH and TF_CUDA_VERSION
environ_cp['CUDA_TOOLKIT_PATH'] = cuda_toolkit_path
write_action_env_to_bazelrc('CUDA_TOOLKIT_PATH', cuda_toolkit_path)
environ_cp['TF_CUDA_VERSION'] = tf_cuda_version
write_action_env_to_bazelrc('TF_CUDA_VERSION', tf_cuda_version)
def set_tf_cunn_version(environ_cp):
"""Set CUDNN_INSTALL_PATH and TF_CUDNN_VERSION."""
ask_cudnn_version = (
'"Please specify the cuDNN version you want to use. '
'[Leave empty to default to cuDNN %s.0]: ') % _DEFAULT_CUDNN_VERSION
while True:
tf_cudnn_version = get_from_env_or_user_or_default(
environ_cp, 'TF_CUDNN_VERSION', ask_cudnn_version,
_DEFAULT_CUDNN_VERSION)
default_cudnn_path = environ_cp.get('CUDA_TOOLKIT_PATH')
ask_cudnn_path = (r'Please specify the location where cuDNN %s library is '
'installed. Refer to README.md for more details. [Default'
' is %s]:') % (tf_cudnn_version, default_cudnn_path)
cudnn_install_path = get_from_env_or_user_or_default(
environ_cp, 'CUDNN_INSTALL_PATH', ask_cudnn_path, default_cudnn_path)
# Result returned from "read" will be used unexpanded. That make "~"
# unusable. Going through one more level of expansion to handle that.
cudnn_install_path = os.path.realpath(
os.path.expanduser(cudnn_install_path))
if is_windows():
cudnn_install_path = cygpath(cudnn_install_path)
if is_windows():
cuda_dnn_lib_path = 'lib/x64/cudnn.lib'
cuda_dnn_lib_alt_path = 'lib/x64/cudnn.lib'
elif is_linux():
cuda_dnn_lib_path = 'lib64/libcudnn.so.%s' % tf_cudnn_version
cuda_dnn_lib_alt_path = 'libcudnn.so.%s' % tf_cudnn_version
elif is_macos():
cuda_dnn_lib_path = 'lib/libcudnn.%s.dylib' % tf_cudnn_version
cuda_dnn_lib_alt_path = 'libcudnn.%s.dylib' % tf_cudnn_version
cuda_dnn_lib_path_full = os.path.join(cudnn_install_path, cuda_dnn_lib_path)
cuda_dnn_lib_alt_path_full = os.path.join(cudnn_install_path,
cuda_dnn_lib_alt_path)
if os.path.exists(cuda_dnn_lib_path_full) or os.path.exists(
cuda_dnn_lib_alt_path_full):
break
# Try another alternative for Linux
if is_linux():
if subprocess.call(['which', 'ldconfig']):
ldconfig_bin = '/sbin/ldconfig'
else:
ldconfig_bin = 'ldconfig'
cudnn_path_from_ldconfig = run_shell(
r'%s -p | sed -n "s/.*libcudnn.so .* => \(.*\)/\\1/p"' % ldconfig_bin)
if os.path.exists('%s.%s' % (cudnn_path_from_ldconfig, tf_cudnn_version)):
cudnn_install_path = os.path.dirname(cudnn_path_from_ldconfig)
break
# Reset and Retry
print(
'Invalid path to cuDNN %s toolkit. None of the following files can be '
'found:' % tf_cudnn_version)
print(cuda_dnn_lib_path_full)
print(cuda_dnn_lib_alt_path_full)
if is_linux():
print('%s.%s' % (cudnn_path_from_ldconfig, tf_cudnn_version))
environ_cp['TF_CUDNN_VERSION'] = ''
# Set CUDNN_INSTALL_PATH and TF_CUDNN_VERSION
environ_cp['CUDNN_INSTALL_PATH'] = cudnn_install_path
write_action_env_to_bazelrc('CUDNN_INSTALL_PATH', cudnn_install_path)
environ_cp['TF_CUDNN_VERSION'] = tf_cudnn_version
write_action_env_to_bazelrc('TF_CUDNN_VERSION', tf_cudnn_version)
def get_native_cuda_compute_capabilities(environ_cp):
"""Get native cuda compute capabilities.
Args:
environ_cp: copy of the os.environ.
Returns:
string of native cuda compute capabilities, separated by comma.
"""
device_query_bin = os.path.join(
environ_cp.get('CUDA_TOOLKIT_PATH'), 'extras/demo_suite/deviceQuery')
cmd = (r'"%s" | grep "Capability" | grep -o "[0-9]*\.[0-9]*" | sed '
'":a;{N;s/\\n/,/};ba"') % device_query_bin
try:
output = run_shell(cmd)
except subprocess.CalledProcessError:
output = ''
return output
def set_tf_cuda_compute_capabilities(environ_cp):
"""Set TF_CUDA_COMPUTE_CAPABILITIES."""
while True:
native_cuda_compute_capabilities = get_native_cuda_compute_capabilities(
environ_cp)
if not native_cuda_compute_capabilities:
default_cuda_compute_capabilities = _DEFAULT_CUDA_COMPUTE_CAPABILITIES
else:
default_cuda_compute_capabilities = native_cuda_compute_capabilities
ask_cuda_compute_capabilities = (
'Please specify a list of comma-separated '
'Cuda compute capabilities you want to '
'build with.\nYou can find the compute '
'capability of your device at: '
'https://developer.nvidia.com/cuda-gpus.\nPlease'
' note that each additional compute '
'capability significantly increases your '
'build time and binary size. [Default is: %s]' %
default_cuda_compute_capabilities)
tf_cuda_compute_capabilities = get_from_env_or_user_or_default(
environ_cp, 'TF_CUDA_COMPUTE_CAPABILITIES',
ask_cuda_compute_capabilities, default_cuda_compute_capabilities)
# Check whether all capabilities from the input is valid
all_valid = True
for compute_capability in tf_cuda_compute_capabilities.split(','):
if not re.match('[0-9]+.[0-9]+', compute_capability):
print('Invalid compute capability: %s' % compute_capability)
all_valid = False
if all_valid:
break
# Reset and Retry
environ_cp['TF_CUDA_COMPUTE_CAPABILITIES'] = ''
# Set TF_CUDA_COMPUTE_CAPABILITIES
environ_cp['TF_CUDA_COMPUTE_CAPABILITIES'] = tf_cuda_compute_capabilities
write_action_env_to_bazelrc('TF_CUDA_COMPUTE_CAPABILITIES',
tf_cuda_compute_capabilities)
def set_other_cuda_vars(environ_cp):
"""Set other CUDA related variables."""
if is_windows():
# The following three variables are needed for MSVC toolchain configuration
# in Bazel
environ_cp['CUDA_PATH'] = environ_cp.get('CUDA_TOOLKIT_PATH')
environ_cp['CUDA_COMPUTE_CAPABILITIES'] = environ_cp.get(
'TF_CUDA_COMPUTE_CAPABILITIES')
environ_cp['NO_WHOLE_ARCHIVE_OPTION'] = 1
write_action_env_to_bazelrc('CUDA_PATH', environ_cp.get('CUDA_PATH'))
write_action_env_to_bazelrc('CUDA_COMPUTE_CAPABILITIES',
environ_cp.get('CUDA_COMPUTE_CAPABILITIES'))
write_action_env_to_bazelrc('NO_WHOLE_ARCHIVE_OPTION',
environ_cp.get('NO_WHOLE_ARCHIVE_OPTION'))
write_to_bazelrc('build --config=win-cuda')
write_to_bazelrc('test --config=win-cuda')
else:
# If CUDA is enabled, always use GPU during build and test.
if environ_cp.get('TF_CUDA_CLANG') == '1':
write_to_bazelrc('build --config=cuda_clang')
write_to_bazelrc('test --config=cuda_clang')
else:
write_to_bazelrc('build --config=cuda')
write_to_bazelrc('test --config=cuda')
def set_host_cxx_compiler(environ_cp):
"""Set HOST_CXX_COMPILER."""
default_cxx_host_compiler = run_shell('which g++ || true')
ask_cxx_host_compiler = (
'Please specify which C++ compiler should be used as'
' the host C++ compiler. [Default is %s]: ') % default_cxx_host_compiler
while True:
host_cxx_compiler = get_from_env_or_user_or_default(
environ_cp, 'HOST_CXX_COMPILER', ask_cxx_host_compiler,
default_cxx_host_compiler)
if os.path.exists(host_cxx_compiler):
break
# Reset and retry
print('Invalid C++ compiler path. %s cannot be found' % host_cxx_compiler)
environ_cp['HOST_CXX_COMPILER'] = ''
# Set HOST_CXX_COMPILER
environ_cp['HOST_CXX_COMPILER'] = host_cxx_compiler
write_action_env_to_bazelrc('HOST_CXX_COMPILER', host_cxx_compiler)
def set_host_c_compiler(environ_cp):
"""Set HOST_C_COMPILER."""
default_c_host_compiler = run_shell('which gcc || true')
ask_c_host_compiler = (
'Please specify which C compiler should be used as the'
' host C compiler. [Default is %s]: ') % default_c_host_compiler
while True:
host_c_compiler = get_from_env_or_user_or_default(
environ_cp, 'HOST_C_COMPILER', ask_c_host_compiler,
default_c_host_compiler)
if os.path.exists(host_c_compiler):
break
# Reset and retry
print('Invalid C compiler path. %s cannot be found' % host_c_compiler)
environ_cp['HOST_C_COMPILER'] = ''
# Set HOST_C_COMPILER
environ_cp['HOST_C_COMPILER'] = host_c_compiler
write_action_env_to_bazelrc('HOST_C_COMPILER', host_c_compiler)
def set_computecpp_toolkit_path(environ_cp):
"""Set COMPUTECPP_TOOLKIT_PATH."""
ask_computecpp_toolkit_path = ('Please specify the location where ComputeCpp '
'for SYCL %s is installed. [Default is %s]: '
) % (_TF_OPENCL_VERSION,
_DEFAULT_COMPUTECPP_TOOLKIT_PATH)
while True:
computecpp_toolkit_path = get_from_env_or_user_or_default(
environ_cp, 'COMPUTECPP_TOOLKIT_PATH', ask_computecpp_toolkit_path,
_DEFAULT_COMPUTECPP_TOOLKIT_PATH)
if is_linux():
sycl_rt_lib_path = 'lib/libComputeCpp.so'
else:
sycl_rt_lib_path = ''
sycl_rt_lib_path_full = os.path.join(computecpp_toolkit_path,
sycl_rt_lib_path)
if os.path.exists(sycl_rt_lib_path_full):
break
print('Invalid SYCL %s library path. %s cannot be found' %
(_TF_OPENCL_VERSION, sycl_rt_lib_path_full))
environ_cp['COMPUTECPP_TOOLKIT_PATH'] = ''
# Set COMPUTECPP_TOOLKIT_PATH
environ_cp['COMPUTECPP_TOOLKIT_PATH'] = computecpp_toolkit_path
write_action_env_to_bazelrc('COMPUTECPP_TOOLKIT_PATH',
computecpp_toolkit_path)
def set_mpi_home(environ_cp):
"""Set MPI_HOME."""
cmd = ('dirname $(dirname $(which mpirun)) || dirname $(dirname $(which '
'mpiexec)) || true')
default_mpi_home = run_shell(cmd)
ask_mpi_home = ('Please specify the MPI toolkit folder. [Default is %s]: '
) % default_mpi_home
while True:
mpi_home = get_from_env_or_user_or_default(environ_cp, 'MPI_HOME',
ask_mpi_home, default_mpi_home)
if os.path.exists(os.path.join(mpi_home, 'include')) and os.path.exists(
os.path.join(mpi_home, 'lib')):
break
    print('Invalid path to the MPI Toolkit. %s or %s cannot be found' %
          (os.path.join(mpi_home, 'include'),
           os.path.join(mpi_home, 'lib')))
environ_cp['MPI_HOME'] = ''
# Set MPI_HOME
environ_cp['MPI_HOME'] = str(mpi_home)
def set_other_mpi_vars(environ_cp):
"""Set other MPI related variables."""
# Link the MPI header files
mpi_home = environ_cp.get('MPI_HOME')
symlink_force('%s/include/mpi.h' % mpi_home, 'third_party/mpi/mpi.h')
  # Determine whether we are using OpenMPI or MVAPICH; these require different
  # header files to be included here to keep the bazel dependency checker happy.
if os.path.exists(os.path.join(mpi_home, 'include/mpi_portable_platform.h')):
symlink_force(
os.path.join(mpi_home, 'include/mpi_portable_platform.h'),
'third_party/mpi/mpi_portable_platform.h')
# TODO(gunan): avoid editing files in configure
sed_in_place('third_party/mpi/mpi.bzl', 'MPI_LIB_IS_OPENMPI=False',
'MPI_LIB_IS_OPENMPI=True')
else:
# MVAPICH / MPICH
symlink_force(
os.path.join(mpi_home, 'include/mpio.h'), 'third_party/mpi/mpio.h')
symlink_force(
os.path.join(mpi_home, 'include/mpicxx.h'), 'third_party/mpi/mpicxx.h')
# TODO(gunan): avoid editing files in configure
sed_in_place('third_party/mpi/mpi.bzl', 'MPI_LIB_IS_OPENMPI=True',
'MPI_LIB_IS_OPENMPI=False')
if os.path.exists(os.path.join(mpi_home, 'lib/libmpi.so')):
symlink_force(
os.path.join(mpi_home, 'lib/libmpi.so'), 'third_party/mpi/libmpi.so')
else:
raise ValueError('Cannot find the MPI library file in %s/lib' % mpi_home)
def set_mkl():
write_to_bazelrc('build:mkl --define with_mkl_support=true')
write_to_bazelrc('build:mkl --define using_mkl=true')
write_to_bazelrc('build:mkl -c opt')
write_to_bazelrc('build:mkl --copt="-DEIGEN_USE_VML"')
  print(
      'Add "--config=mkl" to your bazel command to build with MKL '
      'support.\nPlease note that MKL on macOS or Windows is not yet '
      'supported.\nIf you would like to use a local MKL instead of '
      'downloading, please set the environment variable "TF_MKL_ROOT" before '
      'each build.')
def main():
  # Make a copy of os.environ so that it is explicit when functions get and set
  # environment variables.
environ_cp = dict(os.environ)
check_bazel_version('0.4.5')
reset_tf_configure_bazelrc()
cleanup_makefile()
setup_python(environ_cp)
run_gen_git_source(environ_cp)
if is_windows():
environ_cp['TF_NEED_GCP'] = '0'
environ_cp['TF_NEED_HDFS'] = '0'
environ_cp['TF_NEED_JEMALLOC'] = '0'
environ_cp['TF_NEED_OPENCL'] = '0'
environ_cp['TF_CUDA_CLANG'] = '0'
if is_macos():
environ_cp['TF_NEED_JEMALLOC'] = '0'
set_build_var(environ_cp, 'TF_NEED_JEMALLOC', 'jemalloc as malloc',
'with_jemalloc', True)
set_build_var(environ_cp, 'TF_NEED_GCP', 'Google Cloud Platform',
'with_gcp_support', False)
set_build_var(environ_cp, 'TF_NEED_HDFS', 'Hadoop File System',
'with_hdfs_support', False)
set_build_var(environ_cp, 'TF_ENABLE_XLA', 'XLA JIT', 'with_xla_support',
False)
set_build_var(environ_cp, 'TF_NEED_VERBS', 'VERBS', 'with_verbs_support',
False)
set_action_env_var(environ_cp, 'TF_NEED_OPENCL', 'OpenCL', False)
if environ_cp.get('TF_NEED_OPENCL') == '1':
set_host_cxx_compiler(environ_cp)
set_host_c_compiler(environ_cp)
set_computecpp_toolkit_path(environ_cp)
set_action_env_var(environ_cp, 'TF_NEED_CUDA', 'CUDA', False)
if environ_cp.get('TF_NEED_CUDA') == '1':
set_tf_cuda_version(environ_cp)
set_tf_cunn_version(environ_cp)
set_tf_cuda_compute_capabilities(environ_cp)
set_tf_cuda_clang(environ_cp)
if environ_cp.get('TF_CUDA_CLANG') == '1':
# Set up which clang we should use as the cuda / host compiler.
set_clang_cuda_compiler_path(environ_cp)
else:
# Set up which gcc nvcc should use as the host compiler
# No need to set this on Windows
if not is_windows():
set_gcc_host_compiler_path(environ_cp)
set_other_cuda_vars(environ_cp)
set_build_var(environ_cp, 'TF_NEED_MPI', 'MPI', 'with_mpi_support', False)
if environ_cp.get('TF_NEED_MPI') == '1':
set_mpi_home(environ_cp)
set_other_mpi_vars(environ_cp)
set_cc_opt_flags(environ_cp)
set_mkl()
if __name__ == '__main__':
main()

View File

@ -56,21 +56,16 @@ limitations under the License.
// The implementation below is at the top level instead of the // The implementation below is at the top level instead of the
// brain namespace because we are defining 'extern "C"' functions. // brain namespace because we are defining 'extern "C"' functions.
using tensorflow::error::Code;
using tensorflow::errors::InvalidArgument;
using tensorflow::gtl::ArraySlice;
using tensorflow::strings::StrCat;
using tensorflow::AllocationDescription; using tensorflow::AllocationDescription;
using tensorflow::DataType; using tensorflow::DataType;
using tensorflow::Graph; using tensorflow::Graph;
using tensorflow::GraphDef; using tensorflow::GraphDef;
using tensorflow::mutex_lock;
using tensorflow::NameRangeMap; using tensorflow::NameRangeMap;
using tensorflow::NameRangesForNode; using tensorflow::NameRangesForNode;
using tensorflow::NewSession; using tensorflow::NewSession;
using tensorflow::Node; using tensorflow::Node;
using tensorflow::NodeDef;
using tensorflow::NodeBuilder; using tensorflow::NodeBuilder;
using tensorflow::NodeDef;
using tensorflow::OpDef; using tensorflow::OpDef;
using tensorflow::OpRegistry; using tensorflow::OpRegistry;
using tensorflow::PartialTensorShape; using tensorflow::PartialTensorShape;
@ -83,6 +78,11 @@ using tensorflow::TensorBuffer;
using tensorflow::TensorId; using tensorflow::TensorId;
using tensorflow::TensorShape; using tensorflow::TensorShape;
using tensorflow::TensorShapeProto; using tensorflow::TensorShapeProto;
using tensorflow::error::Code;
using tensorflow::errors::InvalidArgument;
using tensorflow::gtl::ArraySlice;
using tensorflow::mutex_lock;
using tensorflow::strings::StrCat;
extern "C" { extern "C" {
@ -258,24 +258,27 @@ size_t TF_StringEncode(const char* src, size_t src_len, char* dst,
return sz; return sz;
} }
size_t TF_StringDecode(const char* src, size_t src_len, const char** dst, static Status TF_StringDecode_Impl(const char* src, size_t src_len,
size_t* dst_len, TF_Status* status) { const char** dst, size_t* dst_len) {
tensorflow::uint64 len64 = 0; tensorflow::uint64 len64 = 0;
const char* p = tensorflow::core::GetVarint64Ptr(src, src + src_len, &len64); const char* p = tensorflow::core::GetVarint64Ptr(src, src + src_len, &len64);
if (p == nullptr) { if (p == nullptr) {
status->status = return InvalidArgument("invalid string encoding or truncated src buffer");
InvalidArgument("invalid string encoding or truncated src buffer");
return 0;
} }
if (len64 > std::numeric_limits<size_t>::max()) { if (len64 > std::numeric_limits<size_t>::max()) {
status->status = return InvalidArgument("encoded string is ", len64,
InvalidArgument("encoded string is ", len64,
"-bytes, which is too large for this architecture"); "-bytes, which is too large for this architecture");
return 0;
} }
*dst = p; *dst = p;
*dst_len = static_cast<size_t>(len64); *dst_len = static_cast<size_t>(len64);
return static_cast<size_t>(p - src) + *dst_len; return Status::OK();
}
size_t TF_StringDecode(const char* src, size_t src_len, const char** dst,
size_t* dst_len, TF_Status* status) {
status->status = TF_StringDecode_Impl(src, src_len, dst, dst_len);
if (!status->status.ok()) return 0;
return static_cast<size_t>(*dst - src) + *dst_len;
} }
size_t TF_StringEncodedSize(size_t len) { size_t TF_StringEncodedSize(size_t len) {
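For context, a minimal sketch of how a client drives this encode/decode pair through the public C API. Error checks via TF_GetCode(status) are omitted for brevity, and the fixed buffer size is an assumption that comfortably fits the example string:

  #include <stdio.h>
  #include <string.h>

  #include "tensorflow/c/c_api.h"

  int main(void) {
    const char* src = "hello";
    size_t src_len = strlen(src);
    char buf[64];  // Comfortably larger than TF_StringEncodedSize(src_len).

    TF_Status* status = TF_NewStatus();
    // Writes a varint length prefix followed by the string bytes.
    size_t written = TF_StringEncode(src, src_len, buf, sizeof(buf), status);

    // TF_StringDecode does not copy: 'decoded' points into 'buf'.
    const char* decoded = NULL;
    size_t decoded_len = 0;
    TF_StringDecode(buf, written, &decoded, &decoded_len, status);
    printf("decoded: %.*s\n", (int)decoded_len, decoded);

    TF_DeleteStatus(status);
    return 0;
  }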
@ -391,16 +394,20 @@ void TF_Reset(const TF_SessionOptions* opt, const char** containers,
namespace tensorflow { namespace tensorflow {
// Non-static for testing. Status TF_TensorToTensor(const TF_Tensor* src, Tensor* dst) {
bool TF_Tensor_DecodeStrings(TF_Tensor* src, Tensor* dst, TF_Status* status) { if (src->dtype != TF_STRING) {
*dst = TensorCApi::MakeTensor(src->dtype, src->shape, src->buffer);
return Status::OK();
}
// TF_STRING tensors require copying since Tensor class expects a sequence of
// string objects.
const tensorflow::int64 num_elements = src->shape.num_elements(); const tensorflow::int64 num_elements = src->shape.num_elements();
const char* input = reinterpret_cast<const char*>(TF_TensorData(src)); const char* input = reinterpret_cast<const char*>(TF_TensorData(src));
const size_t src_size = TF_TensorByteSize(src); const size_t src_size = TF_TensorByteSize(src);
if (static_cast<tensorflow::int64>(src_size / sizeof(tensorflow::uint64)) < if (static_cast<tensorflow::int64>(src_size / sizeof(tensorflow::uint64)) <
num_elements) { num_elements) {
status->status = InvalidArgument( return InvalidArgument(
"Malformed TF_STRING tensor; too short to hold number of elements"); "Malformed TF_STRING tensor; too short to hold number of elements");
return false;
} }
const char* data_start = input + sizeof(tensorflow::uint64) * num_elements; const char* data_start = input + sizeof(tensorflow::uint64) * num_elements;
const char* limit = input + src_size; const char* limit = input + src_size;
@ -411,24 +418,30 @@ bool TF_Tensor_DecodeStrings(TF_Tensor* src, Tensor* dst, TF_Status* status) {
tensorflow::uint64 offset = tensorflow::uint64 offset =
reinterpret_cast<const tensorflow::uint64*>(input)[i]; reinterpret_cast<const tensorflow::uint64*>(input)[i];
if (static_cast<ptrdiff_t>(offset) >= (limit - data_start)) { if (static_cast<ptrdiff_t>(offset) >= (limit - data_start)) {
status->status = InvalidArgument("Malformed TF_STRING tensor; element ", return InvalidArgument("Malformed TF_STRING tensor; element ", i,
i, " out of range"); " out of range");
return false;
} }
size_t len; size_t len;
const char* p; const char* p;
const char* srcp = data_start + offset; const char* srcp = data_start + offset;
TF_StringDecode(srcp, limit - srcp, &p, &len, status); Status status = TF_StringDecode_Impl(srcp, limit - srcp, &p, &len);
if (!status->status.ok()) { if (!status.ok()) return status;
return false;
}
dstarray(i).assign(p, len); dstarray(i).assign(p, len);
} }
return true; return Status::OK();
} }
// Non-static for testing. // Non-static for testing.
TF_Tensor* TF_Tensor_EncodeStrings(const Tensor& src) { TF_Tensor* TF_TensorFromTensor(const tensorflow::Tensor& src) {
if (src.dtype() != DT_STRING) {
TensorBuffer* buf = TensorCApi::Buffer(src);
buf->Ref();
return new TF_Tensor{static_cast<TF_DataType>(src.dtype()), src.shape(),
buf};
}
// DT_STRING tensors require a copying since TF_Tensor.buffer expects a flatly
// encoded sequence of strings.
// Compute bytes needed for encoding. // Compute bytes needed for encoding.
size_t size = 0; size_t size = 0;
const auto& srcarray = src.flat<tensorflow::string>(); const auto& srcarray = src.flat<tensorflow::string>();
@ -507,16 +520,8 @@ static bool TF_Run_Inputs(
TF_Status* status) { TF_Status* status) {
const int ninputs = input_pairs->size(); const int ninputs = input_pairs->size();
for (int i = 0; i < ninputs; ++i) { for (int i = 0; i < ninputs; ++i) {
TF_Tensor* src = c_inputs[i]; status->status = TF_TensorToTensor(c_inputs[i], &(*input_pairs)[i].second);
if (c_inputs[i]->dtype != TF_STRING) { if (!status->status.ok()) return false;
(*input_pairs)[i].second = tensorflow::TensorCApi::MakeTensor(
src->dtype, src->shape, src->buffer);
} else if (!tensorflow::TF_Tensor_DecodeStrings(
src, &(*input_pairs)[i].second, status)) {
// TF_STRING tensors require copying since Tensor class expects
// a sequence of string objects.
return false;
}
} }
return true; return true;
} }
@ -574,15 +579,7 @@ static void TF_Run_Helper(
static_cast<TF_DataType>(src.dtype()), src.shape()); static_cast<TF_DataType>(src.dtype()), src.shape());
continue; continue;
} }
if (src.dtype() != tensorflow::DT_STRING) { c_outputs[i] = TF_TensorFromTensor(src);
// Share the underlying buffer.
TensorBuffer* buf = tensorflow::TensorCApi::Buffer(src);
buf->Ref();
c_outputs[i] = new TF_Tensor{static_cast<TF_DataType>(src.dtype()),
src.shape(), buf};
} else {
c_outputs[i] = tensorflow::TF_Tensor_EncodeStrings(src);
}
} }
} }
@ -1062,20 +1059,9 @@ void TF_SetAttrTensorShapeProtoList(TF_OperationDescription* desc,
void TF_SetAttrTensor(TF_OperationDescription* desc, const char* attr_name, void TF_SetAttrTensor(TF_OperationDescription* desc, const char* attr_name,
TF_Tensor* value, TF_Status* status) { TF_Tensor* value, TF_Status* status) {
status->status = Status::OK();
Tensor t; Tensor t;
bool ok = true; status->status = TF_TensorToTensor(value, &t);
if (status->status.ok()) desc->node_builder.Attr(attr_name, t);
if (value->dtype != TF_STRING) {
t = tensorflow::TensorCApi::MakeTensor(value->dtype, value->shape,
value->buffer);
} else {
// TF_STRING tensors require copying since Tensor class expects
// a sequence of string objects.
ok = tensorflow::TF_Tensor_DecodeStrings(value, &t, status);
}
if (ok) desc->node_builder.Attr(attr_name, t);
} }
void TF_SetAttrTensorList(TF_OperationDescription* desc, const char* attr_name, void TF_SetAttrTensorList(TF_OperationDescription* desc, const char* attr_name,
@ -1084,21 +1070,14 @@ void TF_SetAttrTensorList(TF_OperationDescription* desc, const char* attr_name,
status->status = Status::OK(); status->status = Status::OK();
std::vector<Tensor> t; std::vector<Tensor> t;
t.reserve(num_values); t.reserve(num_values);
bool ok = true;
for (int i = 0; i < num_values && ok; ++i) { for (int i = 0; i < num_values && status->status.ok(); ++i) {
if (values[i]->dtype != TF_STRING) { Tensor v;
t.emplace_back(tensorflow::TensorCApi::MakeTensor( status->status = TF_TensorToTensor(values[i], &v);
values[i]->dtype, values[i]->shape, values[i]->buffer)); t.emplace_back(v);
} else {
t.emplace_back(::tensorflow::DT_STRING);
// TF_STRING tensors require copying since Tensor class expects
// a sequence of string objects.
ok = tensorflow::TF_Tensor_DecodeStrings(values[i], &t.back(), status);
}
} }
if (ok) desc->node_builder.Attr(attr_name, t); if (status->status.ok()) desc->node_builder.Attr(attr_name, t);
} }
void TF_SetAttrValueProto(TF_OperationDescription* desc, const char* attr_name, void TF_SetAttrValueProto(TF_OperationDescription* desc, const char* attr_name,
@ -1555,9 +1534,7 @@ void TF_OperationGetAttrTensor(TF_Operation* oper, const char* attr_name,
Tensor t; Tensor t;
status->status = tensorflow::GetNodeAttr(oper->node.attrs(), attr_name, &t); status->status = tensorflow::GetNodeAttr(oper->node.attrs(), attr_name, &t);
if (!status->status.ok()) return; if (!status->status.ok()) return;
*value = new TF_Tensor{static_cast<TF_DataType>(t.dtype()), t.shape(), *value = TF_TensorFromTensor(t);
tensorflow::TensorCApi::Buffer(t)};
(*value)->buffer->Ref();
} }
void TF_OperationGetAttrTensorList(TF_Operation* oper, const char* attr_name, void TF_OperationGetAttrTensorList(TF_Operation* oper, const char* attr_name,
@ -1568,10 +1545,7 @@ void TF_OperationGetAttrTensorList(TF_Operation* oper, const char* attr_name,
if (!status->status.ok()) return; if (!status->status.ok()) return;
const auto len = std::min(max_values, static_cast<int>(ts.size())); const auto len = std::min(max_values, static_cast<int>(ts.size()));
for (int i = 0; i < len; ++i) { for (int i = 0; i < len; ++i) {
const Tensor& t = ts[i]; values[i] = TF_TensorFromTensor(ts[i]);
values[i] = new TF_Tensor{static_cast<TF_DataType>(t.dtype()), t.shape(),
tensorflow::TensorCApi::Buffer(t)};
values[i]->buffer->Ref();
} }
} }

View File

@ -45,9 +45,8 @@ limitations under the License.
#include "tensorflow/core/util/equal_graph_def.h" #include "tensorflow/core/util/equal_graph_def.h"
namespace tensorflow { namespace tensorflow {
TF_Tensor* TF_TensorFromTensor(const Tensor& src);
bool TF_Tensor_DecodeStrings(TF_Tensor* src, Tensor* dst, TF_Status* status); Status TF_TensorToTensor(const TF_Tensor* src, Tensor* dst);
TF_Tensor* TF_Tensor_EncodeStrings(const Tensor& src);
namespace { namespace {
@ -146,19 +145,16 @@ void TestEncodeDecode(int line, const std::vector<string>& data) {
for (tensorflow::int64 i = 0; i < src.NumElements(); ++i) { for (tensorflow::int64 i = 0; i < src.NumElements(); ++i) {
src.flat<string>()(i) = data[i]; src.flat<string>()(i) = data[i];
} }
TF_Tensor* dst = TF_Tensor_EncodeStrings(src); TF_Tensor* dst = TF_TensorFromTensor(src);
// Convert back to a C++ Tensor and ensure we get expected output. // Convert back to a C++ Tensor and ensure we get expected output.
TF_Status* status = TF_NewStatus();
Tensor output; Tensor output;
ASSERT_TRUE(TF_Tensor_DecodeStrings(dst, &output, status)) << line; ASSERT_EQ(Status::OK(), TF_TensorToTensor(dst, &output)) << line;
ASSERT_EQ(TF_OK, TF_GetCode(status)) << line;
ASSERT_EQ(src.NumElements(), output.NumElements()) << line; ASSERT_EQ(src.NumElements(), output.NumElements()) << line;
for (tensorflow::int64 i = 0; i < src.NumElements(); ++i) { for (tensorflow::int64 i = 0; i < src.NumElements(); ++i) {
ASSERT_EQ(data[i], output.flat<string>()(i)) << line; ASSERT_EQ(data[i], output.flat<string>()(i)) << line;
} }
TF_DeleteStatus(status);
TF_DeleteTensor(dst); TF_DeleteTensor(dst);
} }
} }
@ -918,7 +914,7 @@ TEST(CAPI, SavedModel) {
TF_Operation* input_op = TF_Operation* input_op =
TF_GraphOperationByName(graph, input_op_name.c_str()); TF_GraphOperationByName(graph, input_op_name.c_str());
ASSERT_TRUE(input_op != nullptr); ASSERT_TRUE(input_op != nullptr);
csession.SetInputs({{input_op, TF_Tensor_EncodeStrings(input)}}); csession.SetInputs({{input_op, TF_TensorFromTensor(input)}});
const tensorflow::string output_op_name = const tensorflow::string output_op_name =
tensorflow::ParseTensorName(output_name).first.ToString(); tensorflow::ParseTensorName(output_name).first.ToString();
@ -1636,6 +1632,39 @@ TEST_F(CApiAttributesTest, Tensor) {
TF_DeleteTensor(value); TF_DeleteTensor(value);
} }
TEST_F(CApiAttributesTest, StringTensor) {
// Create the string-Tensor "attribute" value.
char encoded[] = {
0, 0, 0, 0, 0, 0, 0, 0, // array[uint64] offsets
1, // varint encoded string length
'A',
};
auto deallocator = [](void* data, size_t len, void* arg) {};
unique_tensor_ptr t_in(TF_NewTensor(TF_STRING, nullptr, 0, &encoded[0],
sizeof(encoded), deallocator, nullptr),
TF_DeleteTensor);
// Create a TF_Operation with the attribute t_in
auto desc = init("tensor");
TF_SetAttrTensor(desc, "v", t_in.get(), s_);
ASSERT_EQ(TF_OK, TF_GetCode(s_)) << TF_Message(s_);
auto oper = TF_FinishOperation(desc, s_);
ASSERT_EQ(TF_OK, TF_GetCode(s_)) << TF_Message(s_);
// Fetch the attribute back.
EXPECT_TF_META("v", -1, TF_ATTR_TENSOR, -1);
TF_Tensor* t_out = nullptr;
TF_OperationGetAttrTensor(oper, "v", &t_out, s_);
ASSERT_EQ(TF_OK, TF_GetCode(s_)) << TF_Message(s_);
EXPECT_EQ(TF_STRING, TF_TensorType(t_out));
EXPECT_EQ(0, TF_NumDims(t_out));
ASSERT_EQ(TF_TensorByteSize(t_in.get()), TF_TensorByteSize(t_out));
EXPECT_EQ(0, memcmp(TF_TensorData(t_in.get()), TF_TensorData(t_out),
TF_TensorByteSize(t_out)));
TF_DeleteTensor(t_out);
}
TEST_F(CApiAttributesTest, TensorList) { TEST_F(CApiAttributesTest, TensorList) {
const char tensor1[] = {5, 7}; const char tensor1[] = {5, 7};
const int64_t dims1[] = {1, 2}; const int64_t dims1[] = {1, 2};
@ -1647,7 +1676,8 @@ TEST_F(CApiAttributesTest, TensorList) {
auto desc = init("list(tensor)"); auto desc = init("list(tensor)");
TF_Tensor* tmp[] = { TF_Tensor* tmp[] = {
Int8Tensor(dims1, ndims1, tensor1), Int8Tensor(dims2, ndims2, tensor2), Int8Tensor(dims1, ndims1, tensor1),
Int8Tensor(dims2, ndims2, tensor2),
}; };
TF_SetAttrTensorList(desc, "v", tmp, TF_ARRAYSIZE(tmp), s_); TF_SetAttrTensorList(desc, "v", tmp, TF_ARRAYSIZE(tmp), s_);
for (int i = 0; i < TF_ARRAYSIZE(tmp); ++i) { for (int i = 0; i < TF_ARRAYSIZE(tmp); ++i) {

View File

@ -356,7 +356,7 @@ Status SymbolicGradientBuilder::AddGradients() {
// Check if any input nodes still have pending gradients and have not been // Check if any input nodes still have pending gradients and have not been
// processed yet. This happens if not all outputs of a node are in 'inputs_'. // processed yet. This happens if not all outputs of a node are in 'inputs_'.
std::unordered_map<Node*, int> requested_grads; std::unordered_map<Node*, int> requested_grads;
for (Output nout : inputs_) { for (const Output& nout : inputs_) {
if (pending_[nout.node()->id()] > 0) { if (pending_[nout.node()->id()] > 0) {
DCHECK_GT(nout.node()->num_outputs(), 1); DCHECK_GT(nout.node()->num_outputs(), 1);
int idx = input_nodes_[nout]; int idx = input_nodes_[nout];
@ -365,7 +365,7 @@ Status SymbolicGradientBuilder::AddGradients() {
++requested_grads[nout.node()]; ++requested_grads[nout.node()];
} }
} }
for (auto& p : requested_grads) { for (const auto& p : requested_grads) {
int num_requested_inputs = p.first->num_outputs() - pending_[p.first->id()]; int num_requested_inputs = p.first->num_outputs() - pending_[p.first->id()];
CHECK_EQ(num_requested_inputs, p.second); CHECK_EQ(num_requested_inputs, p.second);
} }

View File

@@ -257,6 +257,11 @@ Status MarkForCompilationPass::Run(
                              &registration)) {
       return false;
     }
+
+    // Don't compile control trigger nodes. We won't preserve their deadness
+    // semantics correctly, so it's safest not to compile them.
+    if (node->IsControlTrigger()) return false;
+
     // If this device requires a JIT, we must say yes.
     if (registration->requires_compilation) return true;

View File

@ -353,6 +353,20 @@ tf_xla_py_test(
], ],
) )
tf_xla_py_test(
name = "segment_reduction_ops_test",
size = "small",
srcs = ["segment_reduction_ops_test.py"],
deps = [
":xla_test",
"//tensorflow/python:array_ops",
"//tensorflow/python:framework_for_generated_wrappers",
"//tensorflow/python:math_ops",
"//tensorflow/python:math_ops_gen",
"//tensorflow/python:platform_test",
],
)
tf_xla_py_test( tf_xla_py_test(
name = "spacetobatch_op_test", name = "spacetobatch_op_test",
size = "medium", size = "medium",

View File

@ -0,0 +1,139 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Test cases for segment reduction ops."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import functools
import numpy as np
from tensorflow.compiler.tests.xla_test import XLATestCase
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.platform import googletest
class SegmentReductionOpsTest(XLATestCase):
"""Test cases for segment reduction ops."""
def UnsortedSegmentSum(self, data, indices, num_segments):
with self.test_session() as sess, self.test_scope():
d = array_ops.placeholder(data.dtype, shape=data.shape)
if isinstance(indices, int):
i = array_ops.placeholder(np.int32, shape=[])
else:
i = array_ops.placeholder(indices.dtype, shape=indices.shape)
return sess.run(
math_ops.unsorted_segment_sum(d, i, num_segments),
{d: data,
i: indices})
def testUnsortedSegmentSum0DIndices1DData(self):
for dtype in self.numeric_types:
self.assertAllClose(
np.array(
[[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 1, 2, 3, 4, 5],
[0, 0, 0, 0, 0, 0]],
dtype=dtype),
self.UnsortedSegmentSum(
np.array([0, 1, 2, 3, 4, 5], dtype=dtype), 2, 4))
def testUnsortedSegmentSum1DIndices1DData(self):
for dtype in self.numeric_types:
self.assertAllClose(
np.array([1, 3, 2, 9], dtype=dtype),
self.UnsortedSegmentSum(
np.array([0, 1, 2, 3, 4, 5], dtype=dtype),
np.array([3, 0, 2, 1, 3, 3], dtype=np.int32), 4))
def testUnsortedSegmentSum1DIndices2DDataDisjoint(self):
for dtype in self.numeric_types:
data = np.array(
[[0, 1, 2, 3], [20, 21, 22, 23], [30, 31, 32, 33], [40, 41, 42, 43],
[50, 51, 52, 53]],
dtype=dtype)
indices = np.array([8, 1, 0, 3, 7], dtype=np.int32)
num_segments = 10
y = self.UnsortedSegmentSum(data, indices, num_segments)
self.assertAllClose(
np.array(
[[30, 31, 32, 33], [20, 21, 22, 23], [0, 0, 0, 0],
[40, 41, 42, 43], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0],
[50, 51, 52, 53], [0, 1, 2, 3], [0, 0, 0, 0]],
dtype=dtype), y)
def testUnsortedSegmentSum1DIndices2DDataNonDisjoint(self):
for dtype in self.numeric_types:
data = np.array(
[[0, 1, 2, 3], [20, 21, 22, 23], [30, 31, 32, 33], [40, 41, 42, 43],
[50, 51, 52, 53]],
dtype=dtype)
indices = np.array([0, 1, 2, 0, 1], dtype=np.int32)
num_segments = 4
y = self.UnsortedSegmentSum(data, indices, num_segments)
self.assertAllClose(
np.array(
[[40, 42, 44, 46], [70, 72, 74, 76], [30, 31, 32, 33],
[0, 0, 0, 0]],
dtype=dtype), y)
def testUnsortedSegmentSum2DIndices3DData(self):
for dtype in self.numeric_types:
data = np.array(
[[[0, 1, 2], [10, 11, 12]], [[100, 101, 102], [110, 111, 112]],
[[200, 201, 202], [210, 211, 212]], [[300, 301, 302],
[310, 311, 312]]],
dtype=dtype)
indices = np.array([[3, 5], [3, 1], [5, 0], [6, 2]], dtype=np.int32)
num_segments = 8
y = self.UnsortedSegmentSum(data, indices, num_segments)
self.assertAllClose(
np.array(
[[210, 211, 212], [110, 111, 112], [310, 311, 312],
[100, 102, 104], [0, 0, 0.], [210, 212, 214], [300, 301,
302], [0, 0, 0]],
dtype=dtype), y)
def testUnsortedSegmentSum1DIndices3DData(self):
for dtype in self.numeric_types:
data = np.array(
[[[0, 1, 2], [10, 11, 12]], [[100, 101, 102], [110, 111, 112]],
[[200, 201, 202], [210, 211, 212]], [[300, 301, 302],
[310, 311, 312]]],
dtype=dtype)
indices = np.array([3, 0, 2, 5], dtype=np.int32)
num_segments = 6
y = self.UnsortedSegmentSum(data, indices, num_segments)
self.assertAllClose(
np.array(
[[[100, 101, 102.], [110, 111, 112]], [[0, 0, 0], [0, 0, 0]],
[[200, 201, 202], [210, 211, 212]], [[0, 1, 2.], [10, 11, 12]],
[[0, 0, 0], [0, 0, 0]], [[300, 301, 302], [310, 311, 312]]],
dtype=dtype), y)
def testUnsortedSegmentSumShapeError(self):
for dtype in self.numeric_types:
data = np.ones((4, 8, 7), dtype=dtype)
indices = np.ones((3, 2), dtype=np.int32)
num_segments = 4
self.assertRaises(ValueError,
functools.partial(self.UnsortedSegmentSum, data,
indices, num_segments))
if __name__ == '__main__':
googletest.main()

View File

@ -57,11 +57,13 @@ class TensorArrayTest(xla_test.XLATestCase):
r0 = w2.read(0) r0 = w2.read(0)
r1 = w2.read(1) r1 = w2.read(1)
r2 = w2.read(2) r2 = w2.read(2)
flow = w2.flow
d0, d1, d2 = session.run([r0, r1, r2]) d0, d1, d2, flow_val = session.run([r0, r1, r2, flow])
self.assertAllEqual([[4.0, 5.0]], d0) self.assertAllEqual([[4.0, 5.0]], d0)
self.assertAllEqual([[1.0, 3.0]], d1) self.assertAllEqual([[1.0, 3.0]], d1)
self.assertAllEqual([[7.0, -8.5]], d2) self.assertAllEqual([[7.0, -8.5]], d2)
self.assertAllEqual([], flow_val.shape)
def _testTensorArrayWritePack(self, tf_dtype): def _testTensorArrayWritePack(self, tf_dtype):
with self.test_session(), self.test_scope(): with self.test_session(), self.test_scope():

View File

@@ -323,12 +323,26 @@ Status FunctionalizeLoop(Graph* graph, Frame* frame,
   for (Arg& arg : frame->args) {
     if (!arg.is_loop_invariant) {
       // Follow the edge from the Enter to Merge.
-      if (arg.enter->out_edges().size() != 1) {
-        return errors::Internal("Enter node for loop-varying argument ",
-                                arg.enter->name(),
-                                " does not have exactly one successor");
+      const Edge* enter_merge = nullptr;
+      for (const Edge* e : arg.enter->out_edges()) {
+        // Ignore control-edges to the sink node. These are allowed by the
+        // graph invariants, although probably they should have been stripped
+        // off earlier.
+        if (e->IsControlEdge() && e->dst()->IsSink()) {
+          continue;
+        }
+        if (enter_merge != nullptr) {
+          return errors::Internal(
+              "Enter node for loop-varying argument ", arg.enter->name(),
+              " has multiple successors: ", enter_merge->dst()->name(), " and ",
+              e->dst()->name());
+        }
+        enter_merge = e;
+      }
+      if (enter_merge == nullptr) {
+        return errors::Internal("Enter node for loop-varying argument ",
+                                arg.enter->name(), " has zero successors");
       }
-      const Edge* enter_merge = *arg.enter->out_edges().begin();
       arg.merge = enter_merge->dst();
       if (!IsMerge(arg.merge)) {
         return errors::InvalidArgument(

View File

@ -96,6 +96,14 @@ TEST(FunctionalizeControlFlow, OneLoopVar) {
TF_EXPECT_OK(scope.ToGraph(&graph)); TF_EXPECT_OK(scope.ToGraph(&graph));
} }
// Regression test: control edges from an Enter node to the graph sink should
// be ignored.
for (Node* n : graph.nodes()) {
if (n->name() == "while/Enter") {
graph.AddControlEdge(n, graph.sink_node());
}
}
FunctionLibraryDefinition library(OpRegistry::Global(), {}); FunctionLibraryDefinition library(OpRegistry::Global(), {});
TF_ASSERT_OK(FunctionalizeControlFlow(&graph, &library)); TF_ASSERT_OK(FunctionalizeControlFlow(&graph, &library));

View File

@ -47,6 +47,7 @@ tf_kernel_library(
"reshape_op.cc", "reshape_op.cc",
"retval_op.cc", "retval_op.cc",
"reverse_op.cc", "reverse_op.cc",
"segment_reduction_ops.cc",
"select_op.cc", "select_op.cc",
"sequence_ops.cc", "sequence_ops.cc",
"shape_op.cc", "shape_op.cc",

View File

@@ -23,4 +23,9 @@ namespace tensorflow {
 // dummy operator using CompilationOnly().
 REGISTER_XLA_OP(Name("NoOp").CompilationOnly(), NoOp);
 
+// We register ControlTrigger as a no-op. This is correct since nodes seen
+// by the XLA compiler are never dead. This may need rethinking when we add
+// support for conditionals to XLA.
+REGISTER_XLA_OP(Name("ControlTrigger"), NoOp);
+
 }  // namespace tensorflow

View File

@ -0,0 +1,155 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#include <sstream>
#include "tensorflow/compiler/tf2xla/kernels/cwise_ops.h"
#include "tensorflow/compiler/tf2xla/shape_util.h"
#include "tensorflow/compiler/tf2xla/xla_helpers.h"
#include "tensorflow/compiler/tf2xla/xla_op_registry.h"
#include "tensorflow/compiler/xla/client/computation_builder.h"
#include "tensorflow/compiler/xla/literal_util.h"
#include "tensorflow/core/framework/kernel_def_builder.h"
#include "tensorflow/core/framework/types.h"
namespace tensorflow {
namespace {
class UnsortedSegmentSum : public XlaOpKernel {
public:
explicit UnsortedSegmentSum(OpKernelConstruction* ctx) : XlaOpKernel(ctx) {
OP_REQUIRES_OK(ctx, ctx->GetAttr("T", &dtype_));
}
void Compile(XlaOpKernelContext* ctx) override {
// output = unsorted_segment_sum(data, indices, num_segments)
// Compute a tensor such that:
// output[i] = sum over {j where indices[j] == i} of data[j]
// output[i] == 0 if i does not appear in indices
//
// Contrast with segment_sum(), which assumes indices are sorted and that
// max(indices)+1 is the desired size of the output.
//
// The returned output tensor has the same type as data, and the same shape
// as data with the first indices.rank dimensions are replaced
// by a single dimension with size num_segments.
xla::ComputationBuilder* builder = ctx->builder();
auto data = ctx->Input(0);
auto data_shape = ctx->InputShape(0);
auto indices = ctx->Input(1);
auto indices_shape = ctx->InputShape(1);
OP_REQUIRES(ctx, data_shape.dims() >= indices_shape.dims(),
errors::InvalidArgument(
"UnsortedSegmentSum requires that indices' rank be"
" less than or equal to data's rank."));
// Validate that indices.shape is a prefix of data.shape.
for (int d = 0; d < indices_shape.dims(); ++d) {
OP_REQUIRES(ctx, (data_shape.dim_size(d) == indices_shape.dim_size(d)),
errors::InvalidArgument(
"UnsortedSegmentSum requires indices shape to be prefix"
" of data_shape, but dimension ",
d, " differs ", data_shape.dim_size(d), " vs. ",
indices_shape.dim_size(d)));
}
int64 num_segments;
OP_REQUIRES_OK(ctx, ctx->ConstantInputAsIntScalar(2, &num_segments));
// Flatten the indices into 1-D.
auto indices_1d = builder->Reshape(indices, {indices_shape.num_elements()});
// flatten data for dynamic indexing.
int64 out_tensor_dims = data_shape.dims() - indices_shape.dims();
std::vector<int64> flat_shape(1 + out_tensor_dims);
flat_shape[0] = indices_shape.num_elements();
for (int64 k = 0; k < out_tensor_dims; ++k) {
flat_shape[1 + k] = data_shape.dim_size(indices_shape.dims() + k);
}
auto data_flat = builder->Reshape(data, flat_shape);
// output shape; same as data_shape, but dimension 0 is num_segments.
std::vector<int64> out_shape(flat_shape);
out_shape[0] = num_segments;
// Pad the output array dims to rank >= 3 to work around lowering issues.
// TODO(b/37575001) This is awkward, and could be improved.
int64 extra_dims = 0;
if (out_shape.size() < 3) {
extra_dims = 3u - out_shape.size();
}
std::vector<int64> rshape(extra_dims + out_shape.size(), 1);
for (unsigned k = 0; k < out_shape.size(); ++k) {
rshape[extra_dims + k] = out_shape[k];
}
auto output = builder->Broadcast(XlaHelpers::Zero(builder, dtype_), rshape);
auto zero = builder->ConstantR1<int32>({0});
for (int64 i = 0; i < indices_shape.num_elements(); ++i) {
// output[indices[i]] += data[i]
std::vector<int64> data_start_indices(flat_shape.size());
data_start_indices[0] = i;
for (unsigned d = 1; d < flat_shape.size(); ++d) {
data_start_indices[d] = 0;
}
std::vector<int64> data_limit_indices(flat_shape);
data_limit_indices[0] = i + 1;
std::vector<int64> stride(flat_shape.size(), 1);
auto data_slice = builder->Slice(data_flat, data_start_indices,
data_limit_indices, stride);
// Reshape the sliced data into the R3+ shape to match output array.
std::vector<int64> rdata_shape(extra_dims + flat_shape.size());
for (int64 k = 0; k <= extra_dims; ++k) {
rdata_shape[k] = 1;
}
for (unsigned k = 1; k < data_limit_indices.size(); ++k) {
rdata_shape[extra_dims + k] = data_limit_indices[k];
}
auto rdata_slice = builder->Reshape(data_slice, rdata_shape);
auto index = builder->Slice(indices_1d, {i}, {i + 1}, {1});
// Construct the index into the R3+ output array 0, ..., <index>, 0, ...
std::vector<xla::ComputationDataHandle> out_start_index_parts(
extra_dims + flat_shape.size(), zero);
out_start_index_parts[extra_dims] = builder->Reshape(index, {1});
auto out_start_indices = builder->ConcatInDim(out_start_index_parts, 0);
std::vector<int64> slice_size(rshape);
slice_size[extra_dims] = 1;
auto out_slice =
builder->DynamicSlice(output, out_start_indices, slice_size);
auto sumval = builder->Add(out_slice, rdata_slice);
output = builder->DynamicUpdateSlice(output, sumval, out_start_indices);
}
auto reshaped_output = builder->Reshape(output, out_shape);
ctx->SetOutput(0, reshaped_output);
}
private:
DataType dtype_;
};
REGISTER_XLA_OP(Name("UnsortedSegmentSum"), UnsortedSegmentSum);
} // namespace
} // namespace tensorflow

View File

@ -182,7 +182,10 @@ class TensorArrayOp : public XlaOpKernel {
dtype_, value, &var)); dtype_, value, &var));
var->tensor_array_size = size; var->tensor_array_size = size;
ctx->SetResourceOutput(0, var); ctx->SetResourceOutput(0, var);
ctx->SetConstantOutput(1, Tensor(DT_FLOAT));
Tensor flow(DT_FLOAT, TensorShape({}));
flow.scalar<float>()() = 0.0f;
ctx->SetConstantOutput(1, flow);
} }
private: private:
@ -216,6 +219,7 @@ class TensorArrayWriteOp : public XlaOpKernel {
xla::ComputationDataHandle ta = resource->value; xla::ComputationDataHandle ta = resource->value;
xla::ComputationDataHandle index = ctx->Input(1); xla::ComputationDataHandle index = ctx->Input(1);
xla::ComputationDataHandle value = ctx->Input(2); xla::ComputationDataHandle value = ctx->Input(2);
xla::ComputationDataHandle flow = ctx->Input(3);
// start_indices of the DynamicUpdateSlice are [index, 0, 0, ..., 0]. // start_indices of the DynamicUpdateSlice are [index, 0, 0, ..., 0].
auto start_indices = XlaHelpers::PadWithZeros(b, index, elem_shape.dims()); auto start_indices = XlaHelpers::PadWithZeros(b, index, elem_shape.dims());
@ -228,7 +232,7 @@ class TensorArrayWriteOp : public XlaOpKernel {
DynamicAddSlice(b, ta, update, slice_shape.dim_sizes(), start_indices); DynamicAddSlice(b, ta, update, slice_shape.dim_sizes(), start_indices);
resource->value = written; resource->value = written;
ctx->SetConstantOutput(0, Tensor(DT_FLOAT)); ctx->SetOutput(0, flow);
} }
private: private:
@ -369,6 +373,7 @@ class TensorArrayScatterOp : public XlaOpKernel {
xla::ComputationDataHandle ta = resource->value; xla::ComputationDataHandle ta = resource->value;
const xla::ComputationDataHandle value = ctx->Input(2); const xla::ComputationDataHandle value = ctx->Input(2);
const xla::ComputationDataHandle flow = ctx->Input(3);
auto slice_dims = value_shape.dim_sizes(); auto slice_dims = value_shape.dim_sizes();
slice_dims[0] = 1LL; slice_dims[0] = 1LL;
@ -394,7 +399,7 @@ class TensorArrayScatterOp : public XlaOpKernel {
} }
resource->value = ta; resource->value = ta;
ctx->SetConstantOutput(0, Tensor(DT_FLOAT)); ctx->SetOutput(0, flow);
} }
private: private:
@ -489,6 +494,7 @@ class TensorArraySplitOp : public XlaOpKernel {
lengths.size(), " vs. ", resource->tensor_array_size, ")")); lengths.size(), " vs. ", resource->tensor_array_size, ")"));
const xla::ComputationDataHandle value = ctx->Input(1); const xla::ComputationDataHandle value = ctx->Input(1);
const xla::ComputationDataHandle flow = ctx->Input(3);
OP_REQUIRES(ctx, value_shape.num_elements() == ta_shape.num_elements(), OP_REQUIRES(ctx, value_shape.num_elements() == ta_shape.num_elements(),
errors::InvalidArgument("mismatched element count ", errors::InvalidArgument("mismatched element count ",
@ -497,7 +503,7 @@ class TensorArraySplitOp : public XlaOpKernel {
resource->value = b->Add(ta, b->Reshape(value, ta_shape.dim_sizes())); resource->value = b->Add(ta, b->Reshape(value, ta_shape.dim_sizes()));
ctx->SetConstantOutput(0, Tensor(DT_FLOAT)); ctx->SetOutput(0, flow);
} }
private: private:

View File

@ -132,7 +132,10 @@ cc_library(
cc_library( cc_library(
name = "statusor", name = "statusor",
srcs = ["statusor.cc"], srcs = ["statusor.cc"],
hdrs = ["statusor.h"], hdrs = [
"statusor.h",
"statusor_internals.h",
],
visibility = ["//visibility:public"], visibility = ["//visibility:public"],
deps = [ deps = [
":status", ":status",

View File

@ -1943,6 +1943,7 @@ cc_library(
":buffer_liveness", ":buffer_liveness",
":hlo", ":hlo",
":hlo_pass", ":hlo_pass",
"//tensorflow/compiler/xla:shape_util",
"//tensorflow/core:lib", "//tensorflow/core:lib",
], ],
) )

View File

@@ -1488,9 +1488,9 @@ Status AlgebraicSimplifierVisitor::HandleConvolution(
   // We cannot insert bitcasts if the layouts will not be compatible.
   // TODO(b/33178038): Consider inserting a transpose if a bitcast would be
   // invalid.
-  if (!valid_bitcast_callback_(lhs->shape(), input_shape) ||
-      !valid_bitcast_callback_(rhs->shape(), new_filter_shape) ||
-      !valid_bitcast_callback_(dot_output_shape, convolution_shape)) {
+  if (!valid_bitcast_callback_(input_shape, lhs->shape()) ||
+      !valid_bitcast_callback_(new_filter_shape, rhs->shape()) ||
+      !valid_bitcast_callback_(convolution_shape, dot_output_shape)) {
     return Status::OK();
   }

View File

@@ -26,12 +26,13 @@ namespace xla {
 // A pass which performs AlgebraicSimplications.
 class AlgebraicSimplifier : public HloPassInterface {
  public:
-  // Given two shapes, determines if it is valid to bitcast between them after
-  // considering platform dependent effects on layout like alignment
-  // restrictions.
-  // Precondition: the two shapes have layouts, the same number of
-  // elements and ShapeUtil::ReshapeIsBitcast returns true.
-  using ValidBitcastCallback = std::function<bool(const Shape&, const Shape&)>;
+  // Given shapes 'from_shape' and 'to_shape', determines if it is valid to
+  // bitcast from 'from_shape' to 'to_shape' after considering platform
+  // dependent effects on layout like alignment restrictions. Precondition: the
+  // two shapes have layouts, the same number of elements and
+  // ShapeUtil::ReshapeIsBitcast returns true.
+  using ValidBitcastCallback =
+      std::function<bool(const Shape& from_shape, const Shape& to_shape)>;
 
   // If is_layout_sensitive is true, then the simplifier preserves layout during
   // transformation. Otherwise, layout is ignored. If valid_bitcast_callback
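A rough sketch of how a backend might supply such a callback. The acceptance policy below is hypothetical (stricter than any real backend needs), and the (is_layout_sensitive, callback) construction follows the comment above:

  // Hypothetical policy: only accept bitcasts between shapes whose layouts
  // already agree, so no data movement would ever be required.
  AlgebraicSimplifier::ValidBitcastCallback valid_bitcast =
      [](const Shape& from_shape, const Shape& to_shape) {
        return LayoutUtil::LayoutsInShapesEqual(from_shape, to_shape);
      };
  AlgebraicSimplifier simplifier(/*is_layout_sensitive=*/true, valid_bitcast);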

View File

@ -72,6 +72,7 @@ cc_library(
"//tensorflow/compiler/xla/service:hlo_subcomputation_unification", "//tensorflow/compiler/xla/service:hlo_subcomputation_unification",
"//tensorflow/compiler/xla/service:hlo_verifier", "//tensorflow/compiler/xla/service:hlo_verifier",
"//tensorflow/compiler/xla/service:inliner", "//tensorflow/compiler/xla/service:inliner",
"//tensorflow/compiler/xla/service:reduce_precision_insertion",
"//tensorflow/compiler/xla/service:reshape_mover", "//tensorflow/compiler/xla/service:reshape_mover",
"//tensorflow/compiler/xla/service:transpose_folding", "//tensorflow/compiler/xla/service:transpose_folding",
"//tensorflow/compiler/xla/service/llvm_ir:llvm_util", # fixdeps: keep "//tensorflow/compiler/xla/service/llvm_ir:llvm_util", # fixdeps: keep

View File

@ -74,6 +74,7 @@ limitations under the License.
#include "tensorflow/compiler/xla/service/hlo_verifier.h" #include "tensorflow/compiler/xla/service/hlo_verifier.h"
#include "tensorflow/compiler/xla/service/inliner.h" #include "tensorflow/compiler/xla/service/inliner.h"
#include "tensorflow/compiler/xla/service/llvm_ir/llvm_util.h" #include "tensorflow/compiler/xla/service/llvm_ir/llvm_util.h"
#include "tensorflow/compiler/xla/service/reduce_precision_insertion.h"
#include "tensorflow/compiler/xla/service/reshape_mover.h" #include "tensorflow/compiler/xla/service/reshape_mover.h"
#include "tensorflow/compiler/xla/service/transpose_folding.h" #include "tensorflow/compiler/xla/service/transpose_folding.h"
#include "tensorflow/compiler/xla/status_macros.h" #include "tensorflow/compiler/xla/status_macros.h"
@ -253,6 +254,14 @@ Status CpuCompiler::RunHloPasses(HloModule* module) {
HloPassPipeline pipeline("CPU"); HloPassPipeline pipeline("CPU");
pipeline.AddInvariantChecker<HloVerifier>(); pipeline.AddInvariantChecker<HloVerifier>();
for (const auto& reduce_precision_options :
module->config().debug_options().hlo_reduce_precision_options()) {
if (reduce_precision_options.pass_timing() ==
HloReducePrecisionOptions::BEFORE_OP_FUSION) {
pipeline.AddPass<ReducePrecisionInsertion>(reduce_precision_options);
}
}
// TODO(b/35786417): Re-enable inliner pass after fixing the bug and deciding // TODO(b/35786417): Re-enable inliner pass after fixing the bug and deciding
// where we will take this pass in future. // where we will take this pass in future.
// pipeline.AddPass<Inliner>(); // pipeline.AddPass<Inliner>();
@ -278,6 +287,15 @@ Status CpuCompiler::RunHloPasses(HloModule* module) {
TransposeFolding::NeverFoldTranspose); TransposeFolding::NeverFoldTranspose);
pipeline.AddPass<HloCSE>(/*is_layout_sensitive=*/false); pipeline.AddPass<HloCSE>(/*is_layout_sensitive=*/false);
pipeline.AddPass<CpuInstructionFusion>(); pipeline.AddPass<CpuInstructionFusion>();
for (const auto& reduce_precision_options :
module->config().debug_options().hlo_reduce_precision_options()) {
if (reduce_precision_options.pass_timing() ==
HloReducePrecisionOptions::AFTER_OP_FUSION) {
pipeline.AddPass<ReducePrecisionInsertion>(reduce_precision_options);
}
}
pipeline.AddPass<CpuLayoutAssignment>( pipeline.AddPass<CpuLayoutAssignment>(
module->mutable_entry_computation_layout()); module->mutable_entry_computation_layout());
// The LayoutAssignment pass may leave behind kCopy instructions which are // The LayoutAssignment pass may leave behind kCopy instructions which are

View File

@ -432,6 +432,7 @@ cc_library(
"//tensorflow/compiler/xla/service:hlo_proto_util", "//tensorflow/compiler/xla/service:hlo_proto_util",
"//tensorflow/compiler/xla/service:hlo_subcomputation_unification", "//tensorflow/compiler/xla/service:hlo_subcomputation_unification",
"//tensorflow/compiler/xla/service:hlo_verifier", "//tensorflow/compiler/xla/service:hlo_verifier",
"//tensorflow/compiler/xla/service:reduce_precision_insertion",
"//tensorflow/compiler/xla/service:reshape_mover", "//tensorflow/compiler/xla/service:reshape_mover",
"//tensorflow/compiler/xla/service:transpose_folding", "//tensorflow/compiler/xla/service:transpose_folding",
"//tensorflow/compiler/xla/service/gpu/llvm_gpu_backend", "//tensorflow/compiler/xla/service/gpu/llvm_gpu_backend",

View File

@ -56,6 +56,7 @@ limitations under the License.
#include "tensorflow/compiler/xla/service/hlo_subcomputation_unification.h" #include "tensorflow/compiler/xla/service/hlo_subcomputation_unification.h"
#include "tensorflow/compiler/xla/service/hlo_verifier.h" #include "tensorflow/compiler/xla/service/hlo_verifier.h"
#include "tensorflow/compiler/xla/service/llvm_ir/llvm_util.h" #include "tensorflow/compiler/xla/service/llvm_ir/llvm_util.h"
#include "tensorflow/compiler/xla/service/reduce_precision_insertion.h"
#include "tensorflow/compiler/xla/service/reshape_mover.h" #include "tensorflow/compiler/xla/service/reshape_mover.h"
#include "tensorflow/compiler/xla/service/transpose_folding.h" #include "tensorflow/compiler/xla/service/transpose_folding.h"
#include "tensorflow/compiler/xla/status_macros.h" #include "tensorflow/compiler/xla/status_macros.h"
@ -123,6 +124,15 @@ tensorflow::Status OptimizeHloModule(HloModule* hlo_module,
{ {
HloPassPipeline pipeline("optimization"); HloPassPipeline pipeline("optimization");
pipeline.AddInvariantChecker<HloVerifier>(); pipeline.AddInvariantChecker<HloVerifier>();
for (const auto& reduce_precision_options :
hlo_module->config().debug_options().hlo_reduce_precision_options()) {
if (reduce_precision_options.pass_timing() ==
HloReducePrecisionOptions::BEFORE_OP_FUSION) {
pipeline.AddPass<ReducePrecisionInsertion>(reduce_precision_options);
}
}
{ {
auto& pass = auto& pass =
pipeline.AddPass<HloPassFix<HloPassPipeline>>("simplification"); pipeline.AddPass<HloPassFix<HloPassPipeline>>("simplification");
@ -149,9 +159,28 @@ tensorflow::Status OptimizeHloModule(HloModule* hlo_module,
fusion.AddPass<GpuInstructionFusion>(/*may_duplicate=*/false); fusion.AddPass<GpuInstructionFusion>(/*may_duplicate=*/false);
fusion.AddPass<GpuInstructionFusion>(/*may_duplicate=*/true); fusion.AddPass<GpuInstructionFusion>(/*may_duplicate=*/true);
fusion.AddPass<FusionMerger>(); fusion.AddPass<FusionMerger>();
return fusion.Run(hlo_module).status(); TF_RETURN_IF_ERROR(fusion.Run(hlo_module).status());
HloPassPipeline reduce_pipeline("reduce-precision");
for (const auto& reduce_precision_options :
hlo_module->config().debug_options().hlo_reduce_precision_options()) {
if (reduce_precision_options.pass_timing() ==
HloReducePrecisionOptions::AFTER_OP_FUSION) {
reduce_pipeline.AddPass<ReducePrecisionInsertion>(
reduce_precision_options);
} }
} }
StatusOr<bool> reduce_result = reduce_pipeline.Run(hlo_module);
TF_RETURN_IF_ERROR(reduce_result.status());
if (reduce_result.ValueOrDie()) {
// Do another fusion pass, with the expectation that we may be able to
// fuse the new ReducePrecision operations.
TF_RETURN_IF_ERROR(fusion.Run(hlo_module).status());
}
}
return tensorflow::Status::OK();
}
// Modifies the given HLO module so that it will be accepted by IrEmitter. // Modifies the given HLO module so that it will be accepted by IrEmitter.
// Unlike optimization passes, the passes are necessary for correctness. // Unlike optimization passes, the passes are necessary for correctness.

File diff suppressed because it is too large.

View File

@@ -112,6 +112,11 @@ bool HloOpcodeIsComparison(HloOpcode opcode);
 // Returns true iff the given opcode has variadic operands.
 bool HloOpcodeIsVariadic(HloOpcode opcode);
 
+// Returns the number of HloOpcode values.
+inline const uint32_t HloOpcodeCount() {
+  return static_cast<uint32_t>(HloOpcode::kWhile) + 1;
+}
+
 }  // namespace xla
 
 #endif  // TENSORFLOW_COMPILER_XLA_SERVICE_HLO_OPCODE_H_
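A small illustration of the kind of loop this count enables, assuming the HloOpcodeString helper declared in the same header and <iostream>:

  // Walk every opcode value, e.g. to build a per-opcode lookup table.
  for (uint32_t i = 0; i < HloOpcodeCount(); ++i) {
    const HloOpcode opcode = static_cast<HloOpcode>(i);
    std::cout << HloOpcodeString(opcode) << "\n";
  }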

View File

@ -16,6 +16,7 @@ limitations under the License.
#include "tensorflow/compiler/xla/service/reduce_precision_insertion.h" #include "tensorflow/compiler/xla/service/reduce_precision_insertion.h"
#include "tensorflow/compiler/xla/service/hlo_module.h" #include "tensorflow/compiler/xla/service/hlo_module.h"
#include "tensorflow/compiler/xla/shape_util.h"
#include "tensorflow/core/platform/logging.h" #include "tensorflow/core/platform/logging.h"
namespace xla { namespace xla {
@ -30,14 +31,15 @@ StatusOr<bool> ReducePrecisionInsertion::Run(HloModule* module) {
for (auto& instruction : computation->instructions()) { for (auto& instruction : computation->instructions()) {
VLOG(3) << "Visited instruction: " << instruction->ToString(); VLOG(3) << "Visited instruction: " << instruction->ToString();
// For now, ReducePrecision is only implemented for F32 data, so this // For now, ReducePrecision is only implemented for F32 arrays, so this
// ignore instructions that produce other data. In particular, this // ignore instructions that produce other data. In particular, this
// currently ignores instructions producing tuples, even if those tuples // currently ignores instructions producing tuples, even if those tuples
// contain F32 data inside them. The assumption is that in most cases // contain F32 arrays inside them. The assumption is that in most cases
// equivalent behavior can be obtained by adding ReducePrecision // equivalent behavior can be obtained by adding ReducePrecision
// instructions after the instructions that pull the F32 data out of the // instructions after the instructions that pull the F32 arrays out of
// tuples. // the tuples.
if (instruction->shape().element_type() == PrimitiveType::F32 && if (instruction->shape().element_type() == PrimitiveType::F32 &&
!ShapeUtil::IsScalar(instruction->shape()) &&
should_reduce_output_precision_(instruction->opcode())) { should_reduce_output_precision_(instruction->opcode())) {
instructions_to_suffix.push_back(instruction.get()); instructions_to_suffix.push_back(instruction.get());
} }
@ -58,4 +60,33 @@ StatusOr<bool> ReducePrecisionInsertion::Run(HloModule* module) {
return changed; return changed;
} }
ReducePrecisionInsertion::OpcodeFilterFunction
ReducePrecisionInsertion::make_filter_function(
const HloReducePrecisionOptions& reduce_precision_options) {
// Implement the filter function with a lookup table.
std::vector<bool> filter(HloOpcodeCount(), false);
for (const auto& opcode : reduce_precision_options.opcodes_to_suffix()) {
filter[opcode] = true;
}
return [filter](const HloOpcode opcode) {
return filter[static_cast<unsigned int>(opcode)];
};
}
HloReducePrecisionOptions ReducePrecisionInsertion::make_options_proto(
const HloReducePrecisionOptions::PassTiming pass_timing,
const int exponent_bits, const int mantissa_bits,
const OpcodeFilterFunction& should_reduce_output_precision) {
HloReducePrecisionOptions options;
options.set_pass_timing(pass_timing);
options.set_exponent_bits(exponent_bits);
options.set_mantissa_bits(mantissa_bits);
for (uint32_t opcode = 0; opcode < HloOpcodeCount(); opcode++) {
if (should_reduce_output_precision(static_cast<HloOpcode>(opcode))) {
options.add_opcodes_to_suffix(opcode);
}
}
return options;
}
} // namespace xla } // namespace xla

View File

@ -42,6 +42,17 @@ class ReducePrecisionInsertion : public HloPassInterface {
: exponent_bits_(exponent_bits), : exponent_bits_(exponent_bits),
mantissa_bits_(mantissa_bits), mantissa_bits_(mantissa_bits),
should_reduce_output_precision_(should_reduce_output_precision) {} should_reduce_output_precision_(should_reduce_output_precision) {}
// Version of the constructor that takes an HloReducePrecisionOptions proto
// rather than explicitly-enumerated parameters, for convenience when
// creating passes based on DebugOptions.
explicit ReducePrecisionInsertion(
const HloReducePrecisionOptions& reduce_precision_options)
: exponent_bits_(reduce_precision_options.exponent_bits()),
mantissa_bits_(reduce_precision_options.mantissa_bits()),
should_reduce_output_precision_(
make_filter_function(reduce_precision_options)) {}
~ReducePrecisionInsertion() override{}; ~ReducePrecisionInsertion() override{};
tensorflow::StringPiece name() const override { tensorflow::StringPiece name() const override {
@ -52,6 +63,15 @@ class ReducePrecisionInsertion : public HloPassInterface {
// (reduce-precision instructions were inserted). // (reduce-precision instructions were inserted).
StatusOr<bool> Run(HloModule* module) override; StatusOr<bool> Run(HloModule* module) override;
// Convert between the (inconvenient) xla.proto HloReducePrecisionOptions
// representation and OpcodeFilterFunction functions.
static OpcodeFilterFunction make_filter_function(
const HloReducePrecisionOptions& reduce_precision_options);
static HloReducePrecisionOptions make_options_proto(
const HloReducePrecisionOptions::PassTiming pass_timing,
const int exponent_bits, const int mantissa_bits,
const OpcodeFilterFunction& should_reduce_output_precision);
private: private:
// Parameters for the precision reduction to be added. // Parameters for the precision reduction to be added.
const int exponent_bits_; const int exponent_bits_;
@ -59,7 +79,7 @@ class ReducePrecisionInsertion : public HloPassInterface {
// Function to determine (from the opcode) whether a given instruction should // Function to determine (from the opcode) whether a given instruction should
// have a reduce-precision instruction inserted in its output stream. // have a reduce-precision instruction inserted in its output stream.
const OpcodeFilterFunction& should_reduce_output_precision_; const OpcodeFilterFunction should_reduce_output_precision_;
}; };
} // namespace xla } // namespace xla
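A short sketch of how these helpers compose; the pipeline name, opcode choice, and bit widths below are illustrative only:

  // Build an options proto that requests reduced precision on the outputs of
  // all kAdd instructions, with a 5-bit exponent and 10-bit mantissa.
  HloReducePrecisionOptions options =
      ReducePrecisionInsertion::make_options_proto(
          HloReducePrecisionOptions::AFTER_OP_FUSION, /*exponent_bits=*/5,
          /*mantissa_bits=*/10,
          [](HloOpcode opcode) { return opcode == HloOpcode::kAdd; });

  // The same proto can later be turned back into a pass instance.
  HloPassPipeline pipeline("example-reduce-precision");
  pipeline.AddPass<ReducePrecisionInsertion>(options);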

View File

@@ -19,28 +19,20 @@ limitations under the License.
#include "tensorflow/core/platform/logging.h"

namespace xla {
namespace internal_statusor {

void Helper::HandleInvalidStatusCtorArg(Status* status) {
const char* kMessage =
"An OK status is not a valid constructor argument to StatusOr<T>";
LOG(ERROR) << kMessage;
// Fall back to tensorflow::error::INTERNAL.
*status = ::tensorflow::errors::Internal(kMessage);
}

void Helper::Crash(const Status& status) {
LOG(FATAL) << "Attempting to fetch value instead of handling error "
<< status;
}

}  // namespace internal_statusor
}  // namespace xla


@@ -72,216 +72,233 @@ limitations under the License.
#define TENSORFLOW_COMPILER_XLA_STATUSOR_H_

#include "tensorflow/compiler/xla/status.h"
#include "tensorflow/compiler/xla/statusor_internals.h"
#include "tensorflow/core/platform/macros.h"

namespace xla {

#if defined(__clang__)
// Only clang supports warn_unused_result as a type annotation.
template <typename T>
class TF_MUST_USE_RESULT StatusOr;
#endif

template <typename T>
class StatusOr : private internal_statusor::StatusOrData<T>,
private internal_statusor::TraitsBase<
std::is_copy_constructible<T>::value,
std::is_move_constructible<T>::value> {
template <typename U>
friend class StatusOr;

typedef internal_statusor::StatusOrData<T> Base;

public:
typedef T element_type;

// Constructs a new StatusOr with Status::UNKNOWN status. This is marked
// 'explicit' to try to catch cases like 'return {};', where people think
// StatusOr<std::vector<int>> will be initialized with an empty vector,
// instead of a Status::UNKNOWN status.
explicit StatusOr();

// StatusOr<T> will be copy constructuble/assignable if T is copy
// constructible.
StatusOr(const StatusOr&) = default;
StatusOr& operator=(const StatusOr&) = default;

// StatusOr<T> will be move constructuble/assignable if T is move
// constructible.
StatusOr(StatusOr&&) = default;
StatusOr& operator=(StatusOr&&) = default;

// Conversion copy/move constructor, T must be convertible from U.
// TODO(b/62186717): These should not participate in overload resolution if U
// is not convertible to T.
template <typename U>
StatusOr(const StatusOr<U>& other);
template <typename U>
StatusOr(StatusOr<U>&& other);

// Conversion copy/move assignment operator, T must be convertible from U.
template <typename U>
StatusOr& operator=(const StatusOr<U>& other);
template <typename U>
StatusOr& operator=(StatusOr<U>&& other);

// Constructs a new StatusOr with the given value. After calling this
// constructor, calls to ValueOrDie() will succeed, and calls to status() will
// return OK.
//
// NOTE: Not explicit - we want to use StatusOr<T> as a return type
// so it is convenient and sensible to be able to do 'return T()'
// when the return type is StatusOr<T>.
//
// REQUIRES: T is copy constructible.
StatusOr(const T& value);

// Constructs a new StatusOr with the given non-ok status. After calling
// this constructor, calls to ValueOrDie() will CHECK-fail.
//
// NOTE: Not explicit - we want to use StatusOr<T> as a return
// value, so it is convenient and sensible to be able to do 'return
// Status()' when the return type is StatusOr<T>.
//
// REQUIRES: !status.ok(). This requirement is DCHECKed.
// In optimized builds, passing Status::OK() here will have the effect
// of passing tensorflow::error::INTERNAL as a fallback.
StatusOr(const Status& status);
StatusOr& operator=(const Status& status);

// TODO(b/62186997): Add operator=(T) overloads.

// Similar to the `const T&` overload.
//
// REQUIRES: T is move constructible.
StatusOr(T&& value);

// RValue versions of the operations declared above.
StatusOr(Status&& status);
StatusOr& operator=(Status&& status);

// Returns this->status().ok()
bool ok() const { return this->status_.ok(); }

// Returns a reference to our status. If this contains a T, then
// returns Status::OK().
const Status& status() const &;
Status status() &&;

// Returns a reference to our current value, or CHECK-fails if !this->ok().
//
// Note: for value types that are cheap to copy, prefer simple code:
//
//   T value = statusor.ValueOrDie();
//
// Otherwise, if the value type is expensive to copy, but can be left
// in the StatusOr, simply assign to a reference:
//
//   T& value = statusor.ValueOrDie();  // or `const T&`
//
// Otherwise, if the value type supports an efficient move, it can be
// used as follows:
//
//   T value = std::move(statusor).ValueOrDie();
//
// The std::move on statusor instead of on the whole expression enables
// warnings about possible uses of the statusor object after the move.
// C++ style guide waiver for ref-qualified overloads granted in cl/143176389
// See go/ref-qualifiers for more details on such overloads.
const T& ValueOrDie() const &;
T& ValueOrDie() &;
const T&& ValueOrDie() const &&;
T&& ValueOrDie() &&;

// Moves our current value out of this object and returns it, or CHECK-fails
// if !this->ok().
// Use of this method is discouraged; prefer std::move(statusor.ValueOrDie())
// instead.
T ConsumeValueOrDie() { return std::move(ValueOrDie()); }

// Ignores any errors. This method does nothing except potentially suppress
// complaints from any tools that are checking that errors are not dropped on
// the floor.
void IgnoreError() const;
};

////////////////////////////////////////////////////////////////////////////////
// Implementation details for StatusOr<T>

template <typename T>
StatusOr<T>::StatusOr() : Base(Status(tensorflow::error::UNKNOWN, "")) {}

template <typename T>
StatusOr<T>::StatusOr(const T& value) : Base(value) {}

template <typename T>
StatusOr<T>::StatusOr(const Status& status) : Base(status) {}

template <typename T>
StatusOr<T>& StatusOr<T>::operator=(const Status& status) {
this->Assign(status);
return *this;
}

template <typename T>
StatusOr<T>::StatusOr(T&& value) : Base(std::move(value)) {}

template <typename T>
StatusOr<T>::StatusOr(Status&& status) : Base(std::move(status)) {}

template <typename T>
StatusOr<T>& StatusOr<T>::operator=(Status&& status) {
this->Assign(std::move(status));
return *this;
}

template <typename T>
template <typename U>
inline StatusOr<T>::StatusOr(const StatusOr<U>& other)
: Base(static_cast<const typename StatusOr<U>::Base&>(other)) {}

template <typename T>
template <typename U>
inline StatusOr<T>& StatusOr<T>::operator=(const StatusOr<U>& other) {
if (other.ok())
this->Assign(other.ValueOrDie());
else
this->Assign(other.status());
return *this;
}

template <typename T>
template <typename U>
inline StatusOr<T>::StatusOr(StatusOr<U>&& other)
: Base(static_cast<typename StatusOr<U>::Base&&>(other)) {}

template <typename T>
template <typename U>
inline StatusOr<T>& StatusOr<T>::operator=(StatusOr<U>&& other) {
if (other.ok()) {
this->Assign(std::move(other).ValueOrDie());
} else {
this->Assign(std::move(other).status());
}
return *this;
}

template <typename T>
const Status& StatusOr<T>::status() const & {
return this->status_;
}

template <typename T>
Status StatusOr<T>::status() && {
return ok() ? Status::OK() : std::move(this->status_);
}

template <typename T>
const T& StatusOr<T>::ValueOrDie() const & {
this->EnsureOk();
return this->data_;
}

template <typename T>
T& StatusOr<T>::ValueOrDie() & {
this->EnsureOk();
return this->data_;
}

template <typename T>
const T&& StatusOr<T>::ValueOrDie() const && {
this->EnsureOk();
return std::move(this->data_);
}

template <typename T>
T&& StatusOr<T>::ValueOrDie() && {
this->EnsureOk();
return std::move(this->data_);
}

template <typename T>
void StatusOr<T>::IgnoreError() const {
// no-op
}

}  // namespace xla
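The patterns recommended in the ValueOrDie() comment above look like this in a caller. This is a minimal sketch under stated assumptions: MakeBuffer() and UseBuffer() are made up for illustration and are not part of this change.

// Hypothetical example of producing and consuming a StatusOr<T>.
#include <utility>
#include <vector>

#include "tensorflow/compiler/xla/statusor.h"
#include "tensorflow/core/lib/core/errors.h"
#include "tensorflow/core/platform/logging.h"

xla::StatusOr<std::vector<float>> MakeBuffer(int n) {
  if (n < 0) {
    // The error path needs no value; this uses the implicit Status constructor.
    return tensorflow::errors::InvalidArgument("negative buffer size");
  }
  // Uses the implicit StatusOr(T&&) constructor.
  return std::vector<float>(n, 0.0f);
}

void UseBuffer() {
  xla::StatusOr<std::vector<float>> result = MakeBuffer(16);
  if (!result.ok()) {
    LOG(ERROR) << result.status();
    return;
  }
  // Cheap alias while the value stays inside the StatusOr.
  const std::vector<float>& ref = result.ValueOrDie();
  LOG(INFO) << "size: " << ref.size();
  // The rvalue-qualified overload moves the value out; `result` is not used
  // after this point.
  std::vector<float> owned = std::move(result).ValueOrDie();
  LOG(INFO) << "owned size: " << owned.size();
}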


@@ -0,0 +1,245 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef THIRD_PARTY_TENSORFLOW_COMPILER_XLA_STATUSOR_INTERNALS_H_
#define THIRD_PARTY_TENSORFLOW_COMPILER_XLA_STATUSOR_INTERNALS_H_
#include "tensorflow/compiler/xla/status.h"
#include "tensorflow/core/platform/macros.h"
namespace xla {
namespace internal_statusor {
class Helper {
public:
// Move type-agnostic error handling to the .cc.
static void HandleInvalidStatusCtorArg(Status*);
TF_ATTRIBUTE_NORETURN static void Crash(const Status& status);
};
// Construct an instance of T in `p` through placement new, passing Args... to
// the constructor.
// This abstraction is here mostly for the gcc performance fix.
template <typename T, typename... Args>
void PlacementNew(void* p, Args&&... args) {
#if defined(__GNUC__) && !defined(__clang__)
// Teach gcc that 'p' cannot be null, fixing code size issues.
if (p == nullptr) __builtin_unreachable();
#endif
new (p) T(std::forward<Args>(args)...);
}
// Helper base class to hold the data and all operations.
// We move all this to a base class to allow mixing with the appropriate
// TraitsBase specialization.
template <typename T>
class StatusOrData {
template <typename U>
friend class StatusOrData;
public:
StatusOrData() = delete;
StatusOrData(const StatusOrData& other) {
if (other.ok()) {
MakeValue(other.data_);
MakeStatus();
} else {
MakeStatus(other.status_);
}
}
StatusOrData(StatusOrData&& other) noexcept {
if (other.ok()) {
MakeValue(std::move(other.data_));
MakeStatus();
} else {
MakeStatus(std::move(other.status_));
}
}
template <typename U>
StatusOrData(const StatusOrData<U>& other) {
if (other.ok()) {
MakeValue(other.data_);
MakeStatus();
} else {
MakeStatus(other.status_);
}
}
template <typename U>
StatusOrData(StatusOrData<U>&& other) {
if (other.ok()) {
MakeValue(std::move(other.data_));
MakeStatus();
} else {
MakeStatus(std::move(other.status_));
}
}
explicit StatusOrData(const T& value) : data_(value) { MakeStatus(); }
explicit StatusOrData(T&& value) : data_(std::move(value)) { MakeStatus(); }
explicit StatusOrData(const Status& status) : status_(status) {
EnsureNotOk();
}
explicit StatusOrData(Status&& status) : status_(std::move(status)) {
EnsureNotOk();
}
StatusOrData& operator=(const StatusOrData& other) {
if (this == &other) return *this;
if (other.ok())
Assign(other.data_);
else
Assign(other.status_);
return *this;
}
StatusOrData& operator=(StatusOrData&& other) {
if (this == &other) return *this;
if (other.ok())
Assign(std::move(other.data_));
else
Assign(std::move(other.status_));
return *this;
}
~StatusOrData() {
if (ok()) {
status_.~Status();
data_.~T();
} else {
status_.~Status();
}
}
void Assign(const T& value) {
if (ok()) {
data_.~T();
MakeValue(value);
} else {
MakeValue(value);
status_ = Status::OK();
}
}
void Assign(T&& value) {
if (ok()) {
data_.~T();
MakeValue(std::move(value));
} else {
MakeValue(std::move(value));
status_ = Status::OK();
}
}
void Assign(const Status& status) {
Clear();
status_ = status;
EnsureNotOk();
}
void Assign(Status&& status) {
Clear();
status_ = std::move(status);
EnsureNotOk();
}
bool ok() const { return status_.ok(); }
protected:
// status_ will always be active after the constructor.
// We make it a union to be able to initialize exactly how we need without
// waste.
// Eg. in the copy constructor we use the default constructor of Status in
// the ok() path to avoid an extra Ref call.
union {
Status status_;
};
// data_ is active iff status_.ok()==true
struct Dummy {};
union {
// When T is const, we need some non-const object we can cast to void* for
// the placement new. dummy_ is that object.
Dummy dummy_;
T data_;
};
void Clear() {
if (ok()) data_.~T();
}
void EnsureOk() const {
if (!ok()) Helper::Crash(status_);
}
void EnsureNotOk() {
if (ok()) Helper::HandleInvalidStatusCtorArg(&status_);
}
// Construct the value (ie. data_) through placement new with the passed
// argument.
template <typename Arg>
void MakeValue(Arg&& arg) {
internal_statusor::PlacementNew<T>(&dummy_, std::forward<Arg>(arg));
}
// Construct the status (ie. status_) through placement new with the passed
// argument.
template <typename... Args>
void MakeStatus(Args&&... args) {
internal_statusor::PlacementNew<Status>(&status_,
std::forward<Args>(args)...);
}
};
// Helper base class to allow implicitly deleted constructors and assignment
// operations in StatusOr.
// TraitsBase will explicitly delete what it can't support and StatusOr will
// inherit that behavior implicitly.
template <bool Copy, bool Move>
struct TraitsBase {
TraitsBase() = default;
TraitsBase(const TraitsBase&) = default;
TraitsBase(TraitsBase&&) = default;
TraitsBase& operator=(const TraitsBase&) = default;
TraitsBase& operator=(TraitsBase&&) = default;
};
template <>
struct TraitsBase<false, true> {
TraitsBase() = default;
TraitsBase(const TraitsBase&) = delete;
TraitsBase(TraitsBase&&) = default;
TraitsBase& operator=(const TraitsBase&) = delete;
TraitsBase& operator=(TraitsBase&&) = default;
};
template <>
struct TraitsBase<false, false> {
TraitsBase() = default;
TraitsBase(const TraitsBase&) = delete;
TraitsBase(TraitsBase&&) = delete;
TraitsBase& operator=(const TraitsBase&) = delete;
TraitsBase& operator=(TraitsBase&&) = delete;
};
} // namespace internal_statusor
} // namespace xla
#endif // THIRD_PARTY_TENSORFLOW_COMPILER_XLA_STATUSOR_INTERNALS_H_
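The role of TraitsBase can be isolated in a few lines: because StatusOr declares its copy and move operations as "= default", deleting them in the mixed-in base class is enough to delete the corresponding defaulted operations of StatusOr itself. Below is a standalone sketch of just that mechanism; all names here are illustrative and not part of the XLA sources.

// Minimal demonstration of the "traits base" deletion technique.
#include <type_traits>

namespace demo {

template <bool Copy, bool Move>
struct TraitsBase {
  TraitsBase() = default;
  TraitsBase(const TraitsBase&) = default;
  TraitsBase(TraitsBase&&) = default;
  TraitsBase& operator=(const TraitsBase&) = default;
  TraitsBase& operator=(TraitsBase&&) = default;
};

template <>
struct TraitsBase<false, true> {
  TraitsBase() = default;
  TraitsBase(const TraitsBase&) = delete;
  TraitsBase(TraitsBase&&) = default;
  TraitsBase& operator=(const TraitsBase&) = delete;
  TraitsBase& operator=(TraitsBase&&) = default;
};

// Stand-in for StatusOr<T>: its copy/move constructors are declared
// "= default" and therefore track whatever the TraitsBase mixin allows.
template <bool Copy>
class MoveOnlyIfNotCopy : private TraitsBase<Copy, true> {
 public:
  MoveOnlyIfNotCopy() = default;
  MoveOnlyIfNotCopy(const MoveOnlyIfNotCopy&) = default;
  MoveOnlyIfNotCopy(MoveOnlyIfNotCopy&&) = default;
};

static_assert(std::is_copy_constructible<MoveOnlyIfNotCopy<true>>::value,
              "copyable when the traits base allows copying");
static_assert(!std::is_copy_constructible<MoveOnlyIfNotCopy<false>>::value,
              "defaulted copy constructor becomes deleted via the traits base");
static_assert(std::is_move_constructible<MoveOnlyIfNotCopy<false>>::value,
              "move constructor is still available");

}  // namespace demo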


@@ -29,8 +29,6 @@ limitations under the License.
namespace xla {
namespace {

class Base1 {
public:
virtual ~Base1() {}
@@ -59,6 +57,14 @@ class CopyNoAssign {
const CopyNoAssign& operator=(const CopyNoAssign&);
};
class NoDefaultConstructor {
public:
explicit NoDefaultConstructor(int foo);
};
static_assert(!std::is_default_constructible<NoDefaultConstructor>(),
"Should not be default-constructible.");
StatusOr<std::unique_ptr<int>> ReturnUniquePtr() {
// Uses implicit constructor from T&&
return std::unique_ptr<int>(new int(0));
@@ -69,6 +75,18 @@ TEST(StatusOr, ElementType) {
static_assert(std::is_same<StatusOr<char>::element_type, char>(), "");
}
TEST(StatusOr, TestNoDefaultConstructorInitialization) {
// Explicitly initialize it with an error code.
StatusOr<NoDefaultConstructor> statusor(tensorflow::errors::Cancelled(""));
EXPECT_FALSE(statusor.ok());
EXPECT_EQ(statusor.status().code(), tensorflow::error::CANCELLED);
// Default construction of StatusOr initializes it with an UNKNOWN error code.
StatusOr<NoDefaultConstructor> statusor2;
EXPECT_FALSE(statusor2.ok());
EXPECT_EQ(statusor2.status().code(), tensorflow::error::UNKNOWN);
}
TEST(StatusOr, TestMoveOnlyInitialization) {
StatusOr<std::unique_ptr<int>> thing(ReturnUniquePtr());
ASSERT_TRUE(thing.ok());


@@ -308,6 +308,137 @@ XLA_TEST_P(BatchNormTest, DISABLED_ON_GPU(RandomizedTests)) {
ErrorSpec(0.01, 1));
}
// TODO(b/62764704): Implement on GPU. Disabled on 2017-06-20.
XLA_TEST_P(BatchNormTest, DISABLED_ON_CPU_PARALLEL(DISABLED_ON_CPU(
DISABLED_ON_GPU(RandomizedGradTests)))) {
float epsilon = 0.001;
ComputationBuilder builder(client_, TestName());
const std::vector<int64>& bounds = GetParam().bounds;
Array4D<float> input_array(bounds[0], bounds[1], bounds[2], bounds[3]);
input_array.FillRandom(GetParam().random_value_var,
GetParam().random_value_mean);
Array4D<float> grad_output_array(bounds[0], bounds[1], bounds[2], bounds[3]);
grad_output_array.FillRandom(GetParam().random_value_var,
GetParam().random_value_mean);
const int64 feature_index = GetParam().feature_index;
const int64 num_elements_per_feature =
Product(bounds) / bounds[feature_index];
const int64 feature_bound = bounds[feature_index];
std::vector<float> scale(feature_bound, 2);
auto input_squared =
ReferenceUtil::MapArray4D(input_array, [](float a) { return a * a; });
std::vector<int64> reduce_dims;
for (int64 i = 0; i < bounds.size(); ++i) {
if (i != feature_index) {
reduce_dims.push_back(i);
}
}
auto sum =
ReferenceUtil::Reduce4DTo1D(input_array, /*init=*/0.0f, reduce_dims,
[](float a, float b) { return a + b; });
auto sum_squared =
ReferenceUtil::Reduce4DTo1D(*input_squared, /*init=*/0.0f, reduce_dims,
[](float a, float b) { return a + b; });
std::vector<float> mean(feature_bound);
for (int64 i = 0; i < feature_bound; ++i) {
mean[i] = sum[i] / num_elements_per_feature;
}
std::vector<float> mean_square(feature_bound);
for (int64 i = 0; i < feature_bound; ++i) {
mean_square[i] = mean[i] * mean[i];
}
std::vector<float> square_mean(feature_bound);
for (int64 i = 0; i < feature_bound; ++i) {
square_mean[i] = sum_squared[i] / num_elements_per_feature;
}
std::vector<float> var(feature_bound);
for (int64 i = 0; i < feature_bound; ++i) {
var[i] = square_mean[i] - mean_square[i];
}
Array4D<float> mean_4D =
*ReferenceUtil::Broadcast1DTo4D(mean, bounds, feature_index);
auto var_4D = *ReferenceUtil::Broadcast1DTo4D(var, bounds, feature_index);
auto scale_4D = *ReferenceUtil::Broadcast1DTo4D(scale, bounds, feature_index);
auto var_add_epsilon = *ReferenceUtil::MapArray4D(
var_4D, [epsilon](float a) { return std::sqrt(a + epsilon); });
auto grad_output_times_var =
*ReferenceUtil::MapArray4D(grad_output_array, var_add_epsilon,
[](float a, float b) { return a * b; });
auto grad_activation = *ReferenceUtil::MapArray4D(
grad_output_times_var, scale_4D, [](float a, float b) { return a * b; });
auto activation_shifted = *ReferenceUtil::MapArray4D(
input_array, mean_4D, [](float a, float b) { return a - b; });
auto grad_scale_before_reduction =
*ReferenceUtil::MapArray4D(grad_output_times_var, activation_shifted,
[](float a, float b) { return a * b; });
auto grad_scale = ReferenceUtil::Reduce4DTo1D(
grad_scale_before_reduction, /*init=*/0.0f, reduce_dims,
[](float a, float b) { return a + b; });
auto grad_offset =
ReferenceUtil::Reduce4DTo1D(grad_output_array, /*init=*/0.0f, reduce_dims,
[](float a, float b) { return a + b; });
auto expected_grad_activation =
Literal::CreateR4FromArray4D<float>(grad_activation);
auto input_literal = Literal::CreateR4FromArray4D<float>(input_array);
auto scale_literal = Literal::CreateR1<float>(scale);
auto mean_literal = Literal::CreateR1<float>(mean);
auto var_literal = Literal::CreateR1<float>(var);
auto grad_output_literal =
Literal::CreateR4FromArray4D<float>(grad_output_array);
auto input_parameter = builder.Parameter(0, input_literal->shape(), "input");
auto scale_parameter = builder.Parameter(1, scale_literal->shape(), "scale");
auto mean_parameter = builder.Parameter(2, mean_literal->shape(), "mean");
auto var_parameter = builder.Parameter(3, var_literal->shape(), "variance");
auto grad_output_parameter =
builder.Parameter(4, grad_output_literal->shape(), "grad_output");
std::unique_ptr<GlobalData> input_data =
client_->TransferToServer(*input_literal).ConsumeValueOrDie();
std::unique_ptr<GlobalData> scale_data =
client_->TransferToServer(*scale_literal).ConsumeValueOrDie();
std::unique_ptr<GlobalData> mean_data =
client_->TransferToServer(*mean_literal).ConsumeValueOrDie();
std::unique_ptr<GlobalData> var_data =
client_->TransferToServer(*var_literal).ConsumeValueOrDie();
std::unique_ptr<GlobalData> grad_output_data =
client_->TransferToServer(*grad_output_literal).ConsumeValueOrDie();
auto t = builder.BatchNormGrad(input_parameter, scale_parameter,
mean_parameter, var_parameter,
grad_output_parameter, epsilon, feature_index);
auto expected =
*Literal::MakeTuple({expected_grad_activation.get(),
Literal::CreateR1<float>(grad_scale).get(),
Literal::CreateR1<float>(grad_offset).get()});
ComputeAndCompareTuple(&builder, expected,
{input_data.get(), scale_data.get(), mean_data.get(),
var_data.get(), grad_output_data.get()},
ErrorSpec(0.01, 1));
}
INSTANTIATE_TEST_CASE_P(
BatchNormTest_Instantiation, BatchNormTest,
::testing::Values(BatchNormTestParam{{2, 2, 2, 2}, 0, 100.2f, 200.0f},
@@ -319,6 +450,7 @@ INSTANTIATE_TEST_CASE_P(
BatchNormTestParam{{10, 10, 10, 10}, 1, -666.6f, 777.7f},
BatchNormTestParam{{10, 10, 10, 10}, 2, 0.f, 777.7f},
BatchNormTestParam{{1, 1, 10, 130}, 2, 0.f, 777.7f},
BatchNormTestParam{{1, 1, 130, 11}, 2, 0.f, 777.7f},
BatchNormTestParam{{1, 1, 10, 1}, 3, 888.8f, 9.9f},
BatchNormTestParam{{24, 129, 1, 2}, 2, 10000, 10000},
@@ -446,6 +578,37 @@ XLA_TEST_F(BatchNormTest, DISABLED_ON_GPU(LargeEpsilonTest)) {
ErrorSpec(0.1));
}
// TODO(b/62764704): Implement on CPU and GPU. Disabled on 2017-07-11.
XLA_TEST_F(BatchNormTest, DISABLED_ON_CPU_PARALLEL(DISABLED_ON_CPU(
DISABLED_ON_GPU(BatchNormGradBasic)))) {
const int kFeatureIndex = 2;
ComputationBuilder builder(client_, TestName());
auto operand =
builder.ConstantR4FromArray4D<float>(Array4D<float>(2, 2, 2, 1, 0.0f));
auto scale = builder.ConstantR1<float>({1.0f, 1.0f});
auto mean = builder.ConstantR1<float>({0.0f, 0.0f});
auto var = builder.ConstantR1<float>({1.0f, 1.0f});
auto grad_output = builder.ConstantR4FromArray4D<float>(
{{{{1.f}, {2.f}}, {{3.f}, {4.f}}}, {{{5.f}, {6.f}}, {{7.f}, {8.f}}}});
builder.BatchNormGrad(operand, scale, mean, var, grad_output,
/*epsilon=*/0.0, kFeatureIndex);
auto expected = *Literal::MakeTuple(
{Literal::CreateR4<float>(
{{{{1.f}, {2.f}}, {{3.f}, {4.f}}}, {{{5.f}, {6.f}}, {{7.f}, {8.f}}}})
.get(),
Literal::CreateR1<float>({0, 0}).get(),
Literal::CreateR1<float>({16, 20}).get()});
ComputeAndCompareTuple(&builder, expected, {}, ErrorSpec(0.1));
}
}  // namespace
}  // namespace xla
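An informal cross-check of the expected tuple in BatchNormGradBasic above, derived only from the values in the test: with scale = 1, mean = 0, var = 1, epsilon = 0 and feature_index = 2, the reference formulas used in RandomizedGradTests reduce to

\[
\begin{aligned}
\text{grad\_activation} &= \text{grad\_output} \cdot \text{scale} \cdot \sqrt{\text{var} + \epsilon} = \text{grad\_output},\\
\text{grad\_scale}[c] &= \sum_{\text{non-feature dims}} \text{grad\_output} \cdot (x - \text{mean}) \cdot \sqrt{\text{var} + \epsilon} = 0 \quad (\text{since } x = 0),\\
\text{grad\_offset}[c] &= \sum_{\text{non-feature dims}} \text{grad\_output} = \{1{+}3{+}5{+}7,\; 2{+}4{+}6{+}8\} = \{16, 20\},
\end{aligned}
\]

which matches the expected tuple of (grad_output, {0, 0}, {16, 20}).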


@@ -44,295 +44,310 @@ namespace {
class DynamicSliceTest : public ClientLibraryTestBase {
protected:
template <typename IndexT, typename DataT>
void TestR1() {
// Slice at dimension start.
RunR1<IndexT, DataT>({0, 1, 2, 3, 4, 5, 6, 7}, {0}, {5}, {0, 1, 2, 3, 4});
// Slice in the middle.
RunR1<IndexT, DataT>({0, 1, 2, 3, 4, 5, 6, 7}, {2}, {3}, {2, 3, 4});
// Slice at dimension boundaries.
RunR1<IndexT, DataT>({0, 1, 2, 3, 4, 5, 6, 7}, {5}, {3}, {5, 6, 7});
// Slice at dimension boundaries, but with sizes that cause indices to wrap.
RunR1<IndexT, DataT>({0, 1, 2, 3, 4, 5, 6, 7}, {6}, {4}, {6, 7, 0, 1});
// Zero element slice.
RunR1<IndexT, DataT>({0, 1, 2, 3, 4, 5, 6, 7}, {2}, {0}, {});
}

template <typename IndexT, typename DataT>
void TestR2() {
// Slice at dimension start.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {0, 0}, {2, 2},
{{1, 2}, {4, 5}});
// Slice in the middle.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {1, 1}, {2, 1},
{{5}, {8}});
// Slice at dimension boundaries.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {1, 1}, {2, 1},
{{5}, {8}});
// Slice at dimension boundaries, but with sizes that cause indices to wrap.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {1, 1}, {3, 3},
{{5, 6, 4}, {8, 9, 7}, {2, 3, 1}});
// Zero element slice: 2x0.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {0, 0}, {2, 0},
{{}, {}});
// Zero element slice: 0x2.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {0, 0}, {0, 2},
Array2D<DataT>(0, 2));
}

template <typename IndexT, typename DataT>
void TestR3() {
// R3 Shape: [2, 3, 2]
// clang-format off
// Slice at dimension start.
RunR3<IndexT, DataT>(
{{{1, 2}, {3, 4}, {5, 6}},
{{7, 8}, {9, 10}, {11, 12}}},
{0, 0, 0}, {2, 1, 2},
{{{1, 2}}, {{7, 8}}});
// Slice in the middle.
RunR3<IndexT, DataT>(
{{{1, 2}, {3, 4}, {5, 6}},
{{7, 8}, {9, 10}, {11, 12}}},
{0, 1, 1}, {2, 2, 1},
{{{4}, {6}}, {{10}, {12}}});
// Slice at dimension boundaries, but with sizes that cause indices to wrap.
RunR3<IndexT, DataT>(
{{{1, 2}, {3, 4}, {5, 6}},
{{7, 8}, {9, 10}, {11, 12}}},
{0, 2, 1}, {2, 1, 2},
{{{6, 5}}, {{12, 11}}});
// clang-format on
}

template <typename IndexT, typename DataT>
void RunR1(tensorflow::gtl::ArraySlice<DataT> input_values,
const std::vector<IndexT> slice_starts,
const std::vector<int64>& slice_sizes,
tensorflow::gtl::ArraySlice<DataT> expected_values) {
ComputationBuilder builder(client_, TestName());
// Initialize and transfer dynamic slice start indices parameter.
ComputationDataHandle starts;
std::unique_ptr<GlobalData> start_data = CreateR1Parameter<IndexT>(
slice_starts, 0, "slice_starts", &builder, &starts);
// Build dynamic slice computation.
auto input = builder.ConstantR1<DataT>(input_values);
builder.DynamicSlice(input, starts, slice_sizes);
// Run computation and compare against expected values.
ComputeAndCompareR1<DataT>(&builder, expected_values, {start_data.get()});
}

template <typename IndexT, typename DataT>
void RunR2(const Array2D<DataT>& input_values,
const std::vector<IndexT> slice_starts,
const std::vector<int64>& slice_sizes,
const Array2D<DataT>& expected_values) {
ComputationBuilder builder(client_, TestName());
// Initialize and transfer dynamic slice start indices parameter.
ComputationDataHandle starts;
std::unique_ptr<GlobalData> start_data = CreateR1Parameter<IndexT>(
slice_starts, 0, "slice_starts", &builder, &starts);
// Build dynamic slice computation.
auto input = builder.ConstantR2FromArray2D<DataT>(input_values);
builder.DynamicSlice(input, starts, slice_sizes);
// Run computation and compare against expected values.
ComputeAndCompareR2<DataT>(&builder, expected_values, {start_data.get()});
}

template <typename IndexT, typename DataT>
void RunR3(const Array3D<DataT>& input_values,
const std::vector<IndexT> slice_starts,
const std::vector<int64>& slice_sizes,
const Array3D<DataT>& expected_values) {
ComputationBuilder builder(client_, TestName());
// Initialize and transfer dynamic slice start indices parameter.
ComputationDataHandle starts;
std::unique_ptr<GlobalData> start_data = CreateR1Parameter<IndexT>(
slice_starts, 0, "slice_starts", &builder, &starts);
// Build dynamic slice computation.
auto input = builder.ConstantR3FromArray3D<DataT>(input_values);
builder.DynamicSlice(input, starts, slice_sizes);
// Run computation and compare against expected values.
ComputeAndCompareR3<DataT>(&builder, expected_values, {start_data.get()});
}
};

XLA_TEST_F(DynamicSliceTest, Int32R1) { TestR1<int32, int32>(); }
XLA_TEST_F(DynamicSliceTest, Int64R1) { TestR1<int64, float>(); }
XLA_TEST_F(DynamicSliceTest, UInt64R1) { TestR1<uint64, double>(); }

XLA_TEST_F(DynamicSliceTest, Int32R2) { TestR2<int32, float>(); }
XLA_TEST_F(DynamicSliceTest, Int64R2) { TestR2<int64, double>(); }
XLA_TEST_F(DynamicSliceTest, UInt64R2) { TestR2<uint64, int32>(); }

XLA_TEST_F(DynamicSliceTest, Int32R3) { TestR3<int32, int32>(); }
XLA_TEST_F(DynamicSliceTest, Int64R3) { TestR3<int64, float>(); }
XLA_TEST_F(DynamicSliceTest, UInt64R3) { TestR3<uint64, double>(); }
XLA_TEST_F(DynamicSliceTest, Int32R1Pred) {
// Slice at dimension start.
RunR1<int32, bool>({true, false, false, true, false, true, true, false}, {0},
{5}, {true, false, false, true, false});
// Slice in the middle.
RunR1<int32, bool>({true, false, false, true, false, true, true, false}, {2},
{3}, {false, true, false});
// Slice at dimension boundaries.
RunR1<int32, bool>({true, false, false, true, false, true, true, false}, {5},
{3}, {true, true, false});
// Zero element slice.
RunR1<int32, bool>({true, false, false, true, false, true, true, false}, {2},
{0}, {});
}
XLA_TEST_F(DynamicSliceTest, Int32R2Pred) {
// Slice at dimension start.
RunR2<int32, bool>(
{{true, false, true}, {false, false, true}, {true, true, false}}, {0, 0},
{2, 2}, {{true, false}, {false, false}});
// Slice in the middle.
RunR2<int32, bool>(
{{true, false, true}, {false, false, true}, {true, true, false}}, {1, 1},
{2, 1}, {{false}, {true}});
// Slice at dimension boundaries.
RunR2<int32, bool>(
{{true, false, true}, {false, false, true}, {true, true, false}}, {1, 1},
{2, 1}, {{false}, {true}});
// Zero element slice: 2x0.
RunR2<int32, bool>(
{{true, false, true}, {false, false, true}, {true, true, false}}, {0, 0},
{2, 0}, {{}, {}});
// Zero element slice: 0x2.
RunR2<int32, bool>(
{{true, false, true}, {false, false, true}, {true, true, false}}, {0, 0},
{0, 2}, Array2D<bool>(0, 2));
}
XLA_TEST_F(DynamicSliceTest, Int32R3Pred) {
// R3 Shape: [2, 3, 2]
// clang-format off
// Slice at dimension start.
RunR3<int32, bool>(
{{{true, false}, {false, true}, {true, true}},
{{false, true}, {true, false}, {false, false}}},
{0, 0, 0}, {2, 1, 2},
{{{true, false}}, {{false, true}}});
// Slice in the middle.
RunR3<int32, bool>(
{{{true, false}, {false, true}, {true, true}},
{{false, true}, {true, false}, {false, false}}},
{0, 1, 1}, {2, 2, 1},
{{{true}, {true}}, {{false}, {false}}});
// clang-format on
}
class DynamicUpdateSliceTest : public ClientLibraryTestBase {
protected:
template <typename IndexT, typename DataT>
void TestR1() {
// Slice at dimension start.
RunR1<IndexT, DataT>({0, 1, 2, 3, 4, 5, 6, 7}, {8, 9, 10}, {0},
{8, 9, 10, 3, 4, 5, 6, 7});
// Slice in the middle.
RunR1<IndexT, DataT>({0, 1, 2, 3, 4, 5, 6, 7}, {8, 9, 10}, {2},
{0, 1, 8, 9, 10, 5, 6, 7});
// Slice at dimension boundaries.
RunR1<IndexT, DataT>({0, 1, 2, 3, 4, 5, 6, 7}, {8, 9, 10}, {5},
{0, 1, 2, 3, 4, 8, 9, 10});
// Slice at dimension boundaries, but with sizes that cause indices to wrap.
RunR1<IndexT, DataT>({0, 1, 2, 3, 4, 5, 6, 7}, {8, 9, 10}, {6},
{0, 1, 2, 3, 4, 5, 8, 9});
// Zero-sized update.
RunR1<IndexT, DataT>({0, 1, 2, 3, 4, 5, 6, 7}, {}, {2},
{0, 1, 2, 3, 4, 5, 6, 7});
}

template <typename IndexT, typename DataT>
void TestR2() {
// Slice at dimension start.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {{10, 11}}, {0, 0},
{{10, 11, 3}, {4, 5, 6}, {7, 8, 9}});
// Slice in the middle.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {{10, 11}}, {1, 1},
{{1, 2, 3}, {4, 10, 11}, {7, 8, 9}});
// Slice at dimension boundaries.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {{10, 11}}, {2, 1},
{{1, 2, 3}, {4, 5, 6}, {7, 10, 11}});
// Slice at dimension boundaries, but with sizes that cause indices to wrap.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {{10, 11}}, {2, 2},
{{1, 2, 3}, {4, 5, 6}, {7, 8, 10}});
// Zero-sized update.
RunR2<IndexT, DataT>({{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}, {{}}, {2, 1},
{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}});
}

template <typename IndexT, typename DataT>
void TestR3() {
// R3 Shape: [2, 3, 2]
// Slice at dimension start.
RunR3<IndexT, DataT>(
{{{1, 2}, {3, 4}, {5, 6}}, {{7, 8}, {9, 10}, {11, 12}}},
{{{13, 14}, {15, 16}}, {{17, 18}, {19, 20}}}, {0, 0, 0},
{{{13, 14}, {15, 16}, {5, 6}}, {{17, 18}, {19, 20}, {11, 12}}});
// Slice in the middle.
RunR3<IndexT, DataT>(
{{{1, 2}, {3, 4}, {5, 6}}, {{7, 8}, {9, 10}, {11, 12}}}, {{{13}, {15}}},
{1, 1, 1}, {{{1, 2}, {3, 4}, {5, 6}}, {{7, 8}, {9, 13}, {11, 15}}});
// Slice at dimension boundaries, but with sizes that cause indices to wrap.
RunR3<IndexT, DataT>(
{{{1, 2}, {3, 4}, {5, 6}}, {{7, 8}, {9, 10}, {11, 12}}}, {{{13}, {15}}},
{1, 2, 1}, {{{1, 2}, {3, 4}, {5, 6}}, {{7, 8}, {9, 10}, {11, 13}}});
}

template <typename IndexT, typename DataT>
void RunR1(tensorflow::gtl::ArraySlice<DataT> input_values,
tensorflow::gtl::ArraySlice<DataT> update_values,
const std::vector<IndexT> slice_starts,
tensorflow::gtl::ArraySlice<DataT> expected_values) {
ComputationBuilder builder(client_, TestName());
// Initialize and transfer dynamic slice start indices parameter.
ComputationDataHandle starts;
std::unique_ptr<GlobalData> start_data = CreateR1Parameter<IndexT>(
slice_starts, 0, "slice_starts", &builder, &starts);
// Build dynamic slice computation.
auto input = builder.ConstantR1<DataT>(input_values);
auto update = builder.ConstantR1<DataT>(update_values);
builder.DynamicUpdateSlice(input, update, starts);
// Run computation and compare against expected values.
ComputeAndCompareR1<DataT>(&builder, expected_values, {start_data.get()});
}

template <typename IndexT, typename DataT>
void RunR2(const Array2D<DataT>& input_values,
const Array2D<DataT>& update_values,
const std::vector<IndexT> slice_starts,
const Array2D<DataT>& expected_values) {
ComputationBuilder builder(client_, TestName());
// Initialize and transfer dynamic slice start indices parameter.
ComputationDataHandle starts;
std::unique_ptr<GlobalData> start_data = CreateR1Parameter<IndexT>(
slice_starts, 0, "slice_starts", &builder, &starts);
// Build dynamic slice computation.
auto input = builder.ConstantR2FromArray2D<DataT>(input_values);
auto update = builder.ConstantR2FromArray2D<DataT>(update_values);
builder.DynamicUpdateSlice(input, update, starts);
// Run computation and compare against expected values.
ComputeAndCompareR2<DataT>(&builder, expected_values, {start_data.get()});
}

template <typename IndexT, typename DataT>
void RunR3(const Array3D<DataT>& input_values,
const Array3D<DataT>& update_values,
const std::vector<IndexT> slice_starts,
const Array3D<DataT>& expected_values) {
ComputationBuilder builder(client_, TestName());
// Initialize and transfer dynamic slice start indices parameter.
ComputationDataHandle starts;
std::unique_ptr<GlobalData> start_data = CreateR1Parameter<IndexT>(
slice_starts, 0, "slice_starts", &builder, &starts);
// Build dynamic slice computation.
auto input = builder.ConstantR3FromArray3D<DataT>(input_values);
auto update = builder.ConstantR3FromArray3D<DataT>(update_values);
builder.DynamicUpdateSlice(input, update, starts);
// Run computation and compare against expected values.
ComputeAndCompareR3<DataT>(&builder, expected_values, {start_data.get()});
}

void RunR3Contiguous(std::vector<int32> operand_shape, int32 index,
@@ -393,23 +408,81 @@ class DynamicUpdateSliceTest : public ClientLibraryTestBase {
}
};

XLA_TEST_F(DynamicUpdateSliceTest, Int32R1) { TestR1<int32, float>(); }
XLA_TEST_F(DynamicUpdateSliceTest, Int64R1) { TestR1<int64, float>(); }
XLA_TEST_F(DynamicUpdateSliceTest, UInt64R1) { TestR1<uint64, double>(); }

XLA_TEST_F(DynamicUpdateSliceTest, Int32R2) { TestR2<int32, float>(); }
XLA_TEST_F(DynamicUpdateSliceTest, Int64R2) { TestR2<int64, int64>(); }
XLA_TEST_F(DynamicUpdateSliceTest, UInt64R2) { TestR2<uint64, int32>(); }

XLA_TEST_F(DynamicUpdateSliceTest, Int32R3) { TestR3<int32, float>(); }
XLA_TEST_F(DynamicUpdateSliceTest, Int64R3) { TestR3<int64, int64>(); }
XLA_TEST_F(DynamicUpdateSliceTest, UInt64R3) { TestR3<uint64, uint64>(); }
XLA_TEST_F(DynamicUpdateSliceTest, Int32R1Pred) {
// Slice at dimension start.
RunR1<int32, bool>({false, false, true, true, false, true, true, false},
{true, true, false}, {0},
{true, true, false, true, false, true, true, false});
// Slice in the middle.
RunR1<int32, bool>({false, false, true, true, false, true, true, false},
{false, true, true}, {2},
{false, false, false, true, true, true, true, false});
// Slice at dimension boundaries.
RunR1<int32, bool>({false, false, true, true, false, true, true, false},
{false, true, true}, {5},
{false, false, true, true, false, false, true, true});
// Zero-sized update.
RunR1<int32, bool>({false, false, true, true, false, true, true, false}, {},
{2}, {false, false, true, true, false, true, true, false});
}
XLA_TEST_F(DynamicUpdateSliceTest, Int32R2Pred) {
// Slice at dimension start.
RunR2<int32, bool>(
{{false, true, false}, {true, false, true}, {false, true, true}},
{{true, false}}, {0, 0},
{{true, false, false}, {true, false, true}, {false, true, true}});
// Slice in the middle.
RunR2<int32, bool>(
{{false, true, false}, {true, false, true}, {false, true, true}},
{{true, false}}, {1, 1},
{{false, true, false}, {true, true, false}, {false, true, true}});
// Slice at dimension boundaries.
RunR2<int32, bool>(
{{false, true, false}, {true, false, true}, {false, true, true}},
{{true, false}}, {2, 1},
{{false, true, false}, {true, false, true}, {false, true, false}});
// Zero-sized update.
RunR2<int32, bool>(
{{false, true, false}, {true, false, true}, {false, true, true}}, {{}},
{2, 1}, {{false, true, false}, {true, false, true}, {false, true, true}});
}
XLA_TEST_F(DynamicUpdateSliceTest, Int32R3Pred) {
// R3 Shape: [2, 3, 2]
// Slice at dimension start.
RunR3<int32, bool>(
{{{true, false}, {false, true}, {true, true}},
{{false, false}, {false, true}, {true, false}}},
{{{false, true}, {true, false}}, {{true, true}, {false, true}}},
{0, 0, 0},
{{{false, true}, {true, false}, {true, true}},
{{true, true}, {false, true}, {true, false}}});
// Slice in the middle.
RunR3<int32, bool>({{{true, false}, {false, true}, {true, true}},
{{false, false}, {false, true}, {true, false}}},
{{{false}, {true}}}, {1, 1, 1},
{{{true, false}, {false, true}, {true, true}},
{{false, false}, {false, false}, {true, true}}});
}
// Tests for simple R3 case where the update is contiguous (i.e. the minor
// two dimensions are not sliced).


@ -26,6 +26,7 @@ limitations under the License.
#include "tensorflow/compiler/xla/layout_util.h" #include "tensorflow/compiler/xla/layout_util.h"
#include "tensorflow/compiler/xla/legacy_flags/debug_options_flags.h" #include "tensorflow/compiler/xla/legacy_flags/debug_options_flags.h"
#include "tensorflow/compiler/xla/literal_util.h" #include "tensorflow/compiler/xla/literal_util.h"
#include "tensorflow/compiler/xla/service/reduce_precision_insertion.h"
#include "tensorflow/compiler/xla/statusor.h" #include "tensorflow/compiler/xla/statusor.h"
#include "tensorflow/compiler/xla/test.h" #include "tensorflow/compiler/xla/test.h"
#include "tensorflow/compiler/xla/tests/client_library_test_base.h" #include "tensorflow/compiler/xla/tests/client_library_test_base.h"
@ -39,8 +40,11 @@ limitations under the License.
namespace xla {
namespace {

// Tests to confirm that the ReducePrecision operation produces the expected
// numerical values.
class ReducePrecisionAccuracyTest : public ClientLibraryTestBase,
                                    public ::testing::WithParamInterface<int> {
};

// For reduction to IEEE-f16, we want to test the following cases, in both
// positive and negative variants. (Note: IEEE-f16 is 5 exponent bits and 10
@ -201,7 +205,7 @@ static const uint32_t test_values[][4] = {
FPVAL(11111111, 1111111111, 1111111111111) // NaN
}};

XLA_TEST_P(ReducePrecisionAccuracyTest, ReducePrecisionF32) {
int index = GetParam();
int exponent_bits = exponent_sizes[index];
int mantissa_bits = mantissa_sizes[index];
@ -238,9 +242,87 @@ XLA_TEST_P(ReducePrecisionTest, ReducePrecisionF32) {
ComputeAndCompareR1<float>(&builder, expected_values, {a_data.get()});
}

INSTANTIATE_TEST_CASE_P(ReducePrecisionAccuracyTest,
ReducePrecisionAccuracyTest,
::testing::Values(0, 1, 2, 3), TestDataToString);
// Tests to confirm that the compiler optimization functions add the expected
// ReducePrecisionInsertion passes.
class ReducePrecisionInsertionTest : public ClientLibraryTestBase {};
XLA_TEST_F(ReducePrecisionInsertionTest, ReducePrecisionBeforeFusion) {
ComputationBuilder builder(client_, TestName());
std::unique_ptr<Literal> a_literal = Literal::CreateR1<float>({1.00001});
std::unique_ptr<GlobalData> a_data =
client_->TransferToServer(*a_literal).ConsumeValueOrDie();
auto a = builder.Parameter(0, a_literal->shape(), "a");
// Abs doesn't affect resolution.
auto abs = builder.Abs(a);
// Near 1.0, Log(x) approximates x - 1; this lets us confirm that the
// reduce-precision operation showed up in the correct place in the
// graph.
auto log = builder.Log(abs);
// Insert precision-reduction after the Abs(x) operation, rounding that
// result to exactly 1.0f.
auto reduce_precision_pass = execution_options_.mutable_debug_options()
->add_hlo_reduce_precision_options();
*reduce_precision_pass = ReducePrecisionInsertion::make_options_proto(
HloReducePrecisionOptions::BEFORE_OP_FUSION, 5, 10,
[](const HloOpcode opcode) { return opcode == HloOpcode::kAbs; });
ComputeAndCompareR1<float>(&builder, {0.0f}, {a_data.get()});
}
XLA_TEST_F(ReducePrecisionInsertionTest, ReducePrecisionSkippedAfterFusion) {
ComputationBuilder builder(client_, TestName());
std::unique_ptr<Literal> a_literal = Literal::CreateR1<float>({1.00001});
std::unique_ptr<GlobalData> a_data =
client_->TransferToServer(*a_literal).ConsumeValueOrDie();
auto a = builder.Parameter(0, a_literal->shape(), "a");
// These two operations should be fused by any reasonable backend.
auto abs = builder.Abs(a);
auto neg = builder.Neg(abs);
// Add a pass after operation fusion, suffixing kAbs operations. This
// should not see into the fusion nodes and thus should not affect the
// result.
auto reduce_precision_pass = execution_options_.mutable_debug_options()
->add_hlo_reduce_precision_options();
*reduce_precision_pass = ReducePrecisionInsertion::make_options_proto(
HloReducePrecisionOptions::AFTER_OP_FUSION, 5, 10,
[](const HloOpcode opcode) { return opcode == HloOpcode::kAbs; });
ComputeAndCompareR1<float>(&builder, {-1.00001f}, {a_data.get()});
}
XLA_TEST_F(ReducePrecisionInsertionTest, ReducePrecisionAddedAfterFusion) {
ComputationBuilder builder(client_, TestName());
std::unique_ptr<Literal> a_literal = Literal::CreateR1<float>({1.00001});
std::unique_ptr<GlobalData> a_data =
client_->TransferToServer(*a_literal).ConsumeValueOrDie();
auto a = builder.Parameter(0, a_literal->shape(), "a");
// These two operations should be fused by any reasonable backend.
auto abs = builder.Abs(a);
auto neg = builder.Neg(abs);
// Add a pass after operation fusion, suffixing kFusion operations.
auto reduce_precision_pass = execution_options_.mutable_debug_options()
->add_hlo_reduce_precision_options();
*reduce_precision_pass = ReducePrecisionInsertion::make_options_proto(
HloReducePrecisionOptions::AFTER_OP_FUSION, 5, 10,
[](const HloOpcode opcode) { return opcode == HloOpcode::kFusion; });
ComputeAndCompareR1<float>(&builder, {-1.0f}, {a_data.get()});
}
} // namespace
} // namespace xla


@ -20,6 +20,24 @@ import "tensorflow/compiler/xla/service/session.proto";
package xla;
// Options for the HLO insert-reduce-precision-operations pass.
message HloReducePrecisionOptions {
// When to run the pass.
enum PassTiming {
BEFORE_OP_FUSION = 0;
AFTER_OP_FUSION = 1;
}
PassTiming pass_timing = 1;
// Exponent and mantissa bit counts for the reduced precision.
uint32 exponent_bits = 2;
uint32 mantissa_bits = 3;
// Opcodes for operations that should be suffixed with reduced-precision
// operations.
repeated uint32 opcodes_to_suffix = 4;
}
// Debugging options for XLA. These options may change at any time - there are
// no guarantees about backward or forward compatibility for these fields.
message DebugOptions {
@ -112,6 +130,11 @@ message DebugOptions {
// the generated IR.
bool xla_llvm_enable_invariant_load_metadata = 72;
// Options for inserting reduce-precision operations for numerical
// experimentation. This is a repeated field, as we may want to have
// multiple passes with different parameters.
repeated HloReducePrecisionOptions hlo_reduce_precision_options = 80;
// This is used by ClientLibraryTestBase::ComputeAndCompare*. If true, the
// computation will run n! times with all permutations of layouts for the
// output shape in rank n. For example, with a 3D shape, all permutations of


@ -199,6 +199,7 @@ tensorflow/core/kernels/aggregate_ops.cc
tensorflow/core/kernels/depthwise_conv_op.cc
tensorflow/core/kernels/dequantize_op.cc
tensorflow/core/kernels/meta_support.cc
tensorflow/core/kernels/population_count_op.cc
tensorflow/core/kernels/quantization_utils.cc
tensorflow/core/kernels/quantize_down_and_shrink_range.cc
tensorflow/core/kernels/quantize_op.cc


@ -406,6 +406,31 @@ class RNNCellTest(test.TestCase):
# States are left untouched
self.assertAllClose(res[2], res[3])
def testResidualWrapperWithSlice(self):
with self.test_session() as sess:
with variable_scope.variable_scope(
"root", initializer=init_ops.constant_initializer(0.5)):
x = array_ops.zeros([1, 5])
m = array_ops.zeros([1, 3])
base_cell = rnn_cell_impl.GRUCell(3)
g, m_new = base_cell(x, m)
variable_scope.get_variable_scope().reuse_variables()
def residual_with_slice_fn(inp, out):
inp_sliced = array_ops.slice(inp, [0, 0], [-1, 3])
return inp_sliced + out
g_res, m_new_res = rnn_cell_impl.ResidualWrapper(
base_cell, residual_with_slice_fn)(x, m)
sess.run([variables_lib.global_variables_initializer()])
res_g, res_g_res, res_m_new, res_m_new_res = sess.run(
[g, g_res, m_new, m_new_res], {
x: np.array([[1., 1., 1., 1., 1.]]),
m: np.array([[0.1, 0.1, 0.1]])
})
# Residual connections
self.assertAllClose(res_g_res, res_g + [1., 1., 1.])
# States are left untouched
self.assertAllClose(res_m_new, res_m_new_res)
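A hedged NumPy sketch of the residual connection checked above: the wrapper runs the base cell, then combines the cell's input and output with the user-supplied residual function (here, slicing the 5-wide input down to the 3-wide output before adding). The stand-in cell output below is illustrative only.

```python
import numpy as np

def residual_with_slice_fn(inp, out):
    # Slice the wide input down to the cell-output width, then add.
    inp_sliced = inp[:, :3]
    return inp_sliced + out

x = np.ones((1, 5))
cell_output = np.array([[0.2, 0.3, 0.4]])  # stand-in for the GRU cell output
print(residual_with_slice_fn(x, cell_output))  # [[1.2 1.3 1.4]]
```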
def testDeviceWrapper(self):
with variable_scope.variable_scope(
"root", initializer=init_ops.constant_initializer(0.5)):


@ -220,15 +220,14 @@ class SpectralOpsTest(test.TestCase):
# stft_bound, inverse_stft_bound).
# TODO(rjryan): Investigate why STFT gradient error is so high.
test_configs = [
(64, 16, 8, 16),
(64, 16, 16, 16),
(64, 16, 7, 16),
(64, 7, 4, 9),
(29, 5, 1, 10),
]
for (signal_length, frame_length, frame_step, fft_length) in test_configs:
signal_shape = [signal_length]
signal = random_ops.random_uniform(signal_shape)
stft_shape = [max(0, 1 + (signal_length - frame_length) // frame_step),
@ -242,8 +241,8 @@ class SpectralOpsTest(test.TestCase):
stft, stft_shape)
inverse_stft_error = test.compute_gradient_error(
stft, stft_shape, inverse_stft, inverse_stft_shape)
self.assertLess(stft_error, 2e-3)
self.assertLess(inverse_stft_error, 4e-5)

if __name__ == "__main__":


@ -54,9 +54,12 @@ def _tpu_job(run_config):
return None if run_config.master in ['', 'local'] else 'tpu_worker'

def _per_shard_batch_size(global_batch_size, run_config, use_tpu):
"""Returns the batch size for each shard."""
if use_tpu:
return global_batch_size // run_config.tpu_config.num_shards
else:
return global_batch_size

class _SIGNAL(object):
@ -470,7 +473,7 @@ class _ModelFnWrapper(object):
self._train_batch_size = train_batch_size

def call_without_tpu(self, features, labels):
return self._call_model_fn(features, labels, False)
def convert_to_single_tpu_train_step(self, dequeue_fn):
"""Converts the `model_fn` as a single train step on TPU."""
@ -481,8 +484,8 @@ class _ModelFnWrapper(object):
features, labels = dequeue_fn()
# Makes deep copy with `config` and params` in case user mutates them.
estimator_spec = self._verify_estimator_spec(
self._call_model_fn(features, labels, True))
loss, train_op = estimator_spec.loss, estimator_spec.train_op
with ops.control_dependencies([train_op]):
return array_ops.identity(loss)
@ -492,7 +495,7 @@ class _ModelFnWrapper(object):
def config(self):
return self._config

def _call_model_fn(self, features, labels, use_tpu):
"""Calls the model_fn with required parameters."""
model_fn_args = util.fn_args(self._model_fn)
kwargs = {}
@ -513,7 +516,6 @@ class _ModelFnWrapper(object):
if 'params' in model_fn_args:
kwargs['params'] = params

if 'params' not in model_fn_args:
raise ValueError(
'model_fn ({}) does not include params argument, '
@ -521,8 +523,8 @@ class _ModelFnWrapper(object):
'params[\'batch_size\']'.format(self._model_fn))

if self._mode == model_fn_lib.ModeKeys.TRAIN:
# For TPU training. `params` is never `None`.
params[_BATCH_SIZE_KEY] = _per_shard_batch_size(
self._train_batch_size, config, use_tpu)

return self._model_fn(features=features, **kwargs)
@ -609,16 +611,12 @@ class TPUEstimator(estimator_lib.Estimator):
'batch size {} must be divisible by number of shards {}'
.format(train_batch_size, config.tpu_config.num_shards))

# Verifies the model_fn signature according to Estimator framework.
estimator_lib._verify_model_fn_args(model_fn, params)  # pylint: disable=protected-access
# We cannot store config and params in this constructor as parent
# constructor might change them, such as assigning a temp dir for
# config.model_dir.
model_function = _augment_model_fn(model_fn, train_batch_size, use_tpu)

super(TPUEstimator, self).__init__(
model_fn=model_function,
@ -670,9 +668,6 @@ class TPUEstimator(estimator_lib.Estimator):
Raises:
ValueError: if input_fn takes invalid arguments or does not have `params`.
"""
input_fn_args = util.fn_args(input_fn)
config = self.config  # a deep copy.
kwargs = {}
@ -686,8 +681,13 @@ class TPUEstimator(estimator_lib.Estimator):
kwargs['config'] = config

# Now for TPU training.
if mode == model_fn_lib.ModeKeys.TRAIN:
kwargs['params'][_BATCH_SIZE_KEY] = (
_per_shard_batch_size(self._train_batch_size, config, self._use_tpu))

if not self._use_tpu or mode != model_fn_lib.ModeKeys.TRAIN:
with ops.device('/cpu:0'):
return input_fn(**kwargs)

job = _tpu_job(config)
def placement_function(index): def placement_function(index):
@ -746,7 +746,7 @@ def _create_infeed_enqueue_ops_and_dequeue_fn(inputs_holder):
return (dequeue_fn, enqueue_fn) return (dequeue_fn, enqueue_fn)
def _augment_model_fn(model_fn, train_batch_size, use_tpu):
"""Returns a new model_fn, which wraps the TPU support."""

def _model_fn(features, labels, mode, config, params):
@ -755,7 +755,7 @@ def augment_model_fn_with_tpu_support(model_fn, train_batch_size):
train_batch_size)
# TODO(jhseu): Move to EVAL and PREDICT to TPU.
if not use_tpu or mode != model_fn_lib.ModeKeys.TRAIN:
return model_fn_wrapper.call_without_tpu(features, labels)

inputs = _InputsHolder(sharded_features=features, sharded_labels=labels)


@ -15,7 +15,9 @@ limitations under the License.
#include "tensorflow/core/framework/attr_value_util.h" #include "tensorflow/core/framework/attr_value_util.h"
#include <string>
#include <vector> #include <vector>
#include "tensorflow/core/framework/attr_value.pb_text.h" #include "tensorflow/core/framework/attr_value.pb_text.h"
#include "tensorflow/core/framework/tensor.pb_text.h" #include "tensorflow/core/framework/tensor.pb_text.h"
#include "tensorflow/core/framework/tensor_shape.pb.h" #include "tensorflow/core/framework/tensor_shape.pb.h"
@ -27,7 +29,6 @@ limitations under the License.
#include "tensorflow/core/platform/protobuf.h" #include "tensorflow/core/platform/protobuf.h"
namespace tensorflow { namespace tensorflow {
namespace { namespace {
string SummarizeString(const string& str) { string SummarizeString(const string& str) {
@ -460,7 +461,8 @@ bool HasPlaceHolder(const AttrValue& val) {
return false;
}

bool SubstitutePlaceholders(const SubstituteFunc& substitute,
AttrValue* value) {
switch (value->value_case()) {
case AttrValue::kList: {
for (NameAttrList& func : *value->mutable_list()->mutable_func()) {


@ -16,8 +16,10 @@ limitations under the License.
#ifndef TENSORFLOW_FRAMEWORK_ATTR_VALUE_UTIL_H_
#define TENSORFLOW_FRAMEWORK_ATTR_VALUE_UTIL_H_

#include <functional>
#include <string>
#include <vector>

#include "tensorflow/core/framework/attr_value.pb.h"  // TODO(62899350): Remove
#include "tensorflow/core/framework/partial_tensor_shape.h"
#include "tensorflow/core/framework/tensor.h"
@ -100,8 +102,8 @@ bool HasPlaceHolder(const AttrValue& val);
// SubstituteFunc is given a placeholder string. If the placeholder is
// unknown, SubstituteFunc returns false. Otherwise, overwrites the
// attr value and returns true.
using SubstituteFunc = std::function<bool(const string&, AttrValue*)>;
bool SubstitutePlaceholders(const SubstituteFunc& substitute, AttrValue* value);

} // namespace tensorflow


@ -104,7 +104,8 @@ ConstantFolding::ConstantFolding() {
ops_to_preserve_ = std::regex(
"Placeholder.*|Const|.*Save.*|.*Restore.*|.*Reader|"
"Enter|RefEnter|Exit|RefExit|NextIteration|RefNextIteration|"
".*Quantized.*",
std::regex_constants::optimize);
}

string ConstantFolding::AddControlDependency(const string& input_name) {
@ -240,13 +241,18 @@ Status ConstantFolding::MaterializeShapes(const GrapplerItem& item,
}

bool ConstantFolding::IsFoldable(const NodeDef& node) const {
// Folding not applicable to ops with no inputs.
if (node.input().empty()) {
return false;
}
// Skips nodes that must be preserved, and op_types that don't benefit from
// folding
if (nodes_to_preserve_.find(node.name()) != nodes_to_preserve_.end()) {
return false;
}
if (std::regex_match(node.op().c_str(), ops_to_preserve_,
std::regex_constants::match_any)) {
return false;
}
@ -264,23 +270,6 @@ bool ConstantFolding::IsFoldable(const NodeDef& node) const {
return false;
}

// No need to (and don't) fold nodes that have no outgoing edges. Such nodes
// could be introduced by an earlier constant folding pass and are preserved
// in case users want to fetch their values; re-processing them would
@ -391,12 +380,15 @@ Status ConstantFolding::EvaluateOneFoldable(const NodeDef& node,
// Control dependency
break;
}
const NodeDef* input_node = node_map_->GetNode(input);
if (!IsConstant(*input_node)) {
return Status(error::INVALID_ARGUMENT,
strings::StrCat("Can't fold ", node.name(), ", its ", input,
" isn't constant"));
}
Tensor* value = new Tensor(input_node->attr().at("dtype").type());
CHECK(value->FromProto(input_node->attr().at("value").tensor()));
inputs.emplace_back(value);
}

TensorVector output_tensors;
@ -583,24 +575,31 @@ Status ConstantFolding::FoldNode(const NodeDef& node, GraphDef* output) {
Status ConstantFolding::FoldGraph(GraphDef* output) {
std::unordered_set<string> processed_nodes;
std::deque<const NodeDef*> queue;
for (const auto& node : graph_.node()) {
if (IsFoldable(node)) {
queue.push_back(&node);
}
}
while (!queue.empty()) {
const NodeDef* node = queue.front();
queue.pop_front();
if (processed_nodes.count(node->name())) {
continue;
}
Status s = FoldNode(*node, output);
processed_nodes.insert(node->name());
if (!s.ok()) {
VLOG(1) << "Failed to fold node " << node->name() << ": " << s;
} else {
auto outputs = node_map_->GetOutputs(node->name());
for (auto& output : outputs) {
if (IsFoldable(*output)) {
queue.push_back(output);
}
}
}
}

// Build the graph after constant folding. Note that we keep all processed
// nodes in the graph in case users need to fetch their values.
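A hedged Python sketch of the worklist strategy used in FoldGraph above: seed a queue with the nodes that are foldable right away and, after folding a node, re-examine only its consumers instead of rescanning the whole graph until a fixed point. The helper names (`is_foldable`, `consumers_of`, `fold`) are hypothetical stand-ins for the C++ methods.

```python
from collections import deque

def fold_graph(nodes, is_foldable, consumers_of, fold):
    # Seed the worklist with immediately foldable nodes.
    queue = deque(n for n in nodes if is_foldable(n))
    processed = set()
    while queue:
        node = queue.popleft()
        if node in processed:
            continue
        processed.add(node)
        if fold(node):  # returns True when the node was folded successfully
            # Only the folded node's consumers can become newly foldable.
            for consumer in consumers_of(node):
                if is_foldable(consumer):
                    queue.append(consumer)
    return processed
```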
@ -740,7 +739,6 @@ Status ConstantFolding::SimplifyGraph(GraphDef* output,
Status ConstantFolding::Optimize(Cluster* cluster, const GrapplerItem& item,
GraphDef* output) {
graph_ = item.graph;
node_map_.reset(new NodeMap(&graph_));
for (const auto& node : item.fetch) {
nodes_to_preserve_.insert(NodeName(node));
@ -761,7 +759,6 @@ Status ConstantFolding::Optimize(Cluster* cluster, const GrapplerItem& item,
TF_RETURN_IF_ERROR(FoldGraph(output));
TF_RETURN_IF_ERROR(SimplifyGraph(output, properties));

*output->mutable_library() = item.graph.library();
*output->mutable_versions() = item.graph.versions();


@ -701,6 +701,39 @@ tf_kernel_library(
deps = ARRAY_DEPS,
)
tf_kernel_library(
name = "compare_and_bitpack_op",
srcs = ["compare_and_bitpack_op.cc"],
hdrs = ["compare_and_bitpack_op.h"],
gpu_srcs = [
"compare_and_bitpack_op.h",
"compare_and_bitpack_op_gpu.cu.cc",
],
deps = ARRAY_DEPS,
)
# TODO(ebrevdo): Add benchmarks once the op is in the autogen array namespace.
# tf_cuda_cc_test(
# name = "compare_and_bitpack_op_test",
# srcs = ["compare_and_bitpack_op_test.cc"],
# deps = [
# ":array",
# ":ops_testutil",
# ":ops_util",
# "//third_party/eigen3",
# "//tensorflow/cc:cc_ops",
# "//tensorflow/cc:cc_ops_internal",
# "//tensorflow/core:core_cpu",
# "//tensorflow/core:core_cpu_internal",
# "//tensorflow/core:framework",
# "//tensorflow/core:lib",
# "//tensorflow/core:protos_all_cc",
# "//tensorflow/core:test",
# "//tensorflow/core:test_main",
# "//tensorflow/core:testlib",
# ],
# )
tf_kernel_library(
name = "reshape_op",
prefix = "reshape_op",
@ -2344,10 +2377,12 @@ cc_library(
":bucketize_op", ":bucketize_op",
":cast_op", ":cast_op",
":check_numerics_op", ":check_numerics_op",
":compare_and_bitpack_op",
":cross_op", ":cross_op",
":cwise_op", ":cwise_op",
":fft_ops", ":fft_ops",
":matmul_op", ":matmul_op",
":population_count_op",
":reduction_ops", ":reduction_ops",
":scan_ops", ":scan_ops",
":segment_reduction_ops", ":segment_reduction_ops",
@ -2409,6 +2444,12 @@ tf_kernel_library(
deps = MATH_DEPS + ["//tensorflow/core:bitwise_ops_op_lib"],
)
tf_kernel_library(
name = "population_count_op",
prefix = "population_count_op",
deps = MATH_DEPS,
)
tf_kernel_library(
name = "fft_ops",
prefix = "fft_ops",
@ -4292,6 +4333,8 @@ filegroup(
"fake_quant_ops.cc", "fake_quant_ops.cc",
"fifo_queue.cc", "fifo_queue.cc",
"fused_batch_norm_op.cc", "fused_batch_norm_op.cc",
"population_count_op.cc",
"population_count_op.h",
"winograd_transform.h", "winograd_transform.h",
":android_extended_ops_headers", ":android_extended_ops_headers",
] + select({ ] + select({


@ -0,0 +1,185 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
// See docs in ../ops/math_ops.cc
#define EIGEN_USE_THREADS
#include "tensorflow/core/kernels/compare_and_bitpack_op.h"
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/framework/types.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/util/work_sharder.h"
namespace tensorflow {
typedef Eigen::ThreadPoolDevice CPUDevice;
typedef Eigen::GpuDevice GPUDevice;
template <typename Device, typename T>
class CompareAndBitpackOp : public OpKernel {
public:
explicit CompareAndBitpackOp(OpKernelConstruction* context)
: OpKernel(context) {}
void Compute(OpKernelContext* c) override {
const Tensor& input_t = c->input(0);
const Tensor& threshold_t = c->input(1);
OP_REQUIRES(
c, TensorShapeUtils::IsScalar(threshold_t.shape()),
errors::InvalidArgument("Compare must be a scalar, but saw shape: ",
threshold_t.shape().DebugString()));
const TensorShape& input_shape = input_t.shape();
OP_REQUIRES(c, TensorShapeUtils::IsVectorOrHigher(input_shape),
errors::InvalidArgument(
"Input should be at least a vector, but saw a scalar."));
OP_REQUIRES(c, input_shape.dim_size(input_shape.dims() - 1) % 8 == 0,
errors::InvalidArgument(
"Inner dimension of input should be "
"divisible by ",
8, ", but saw shape: ", input_shape.DebugString()));
TensorShape output_shape = input_shape;
int rank = input_shape.dims();
output_shape.set_dim(rank - 1, input_shape.dim_size(rank - 1) / 8);
Tensor* output_t;
OP_REQUIRES_OK(c, c->allocate_output(0, output_shape, &output_t));
auto input = input_t.flat_inner_dims<T>();
auto threshold = threshold_t.scalar<T>();
auto output = output_t->flat_inner_dims<uint8>();
functor::CompareAndBitpack<Device, T> func;
func(c, input, threshold, output);
}
};
#define REGISTER_COMPARE_AND_BITPACK(type) \
REGISTER_KERNEL_BUILDER( \
Name("CompareAndBitpack").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
CompareAndBitpackOp<CPUDevice, type>);
TF_CALL_REAL_NUMBER_TYPES(REGISTER_COMPARE_AND_BITPACK);
TF_CALL_bool(REGISTER_COMPARE_AND_BITPACK);
#undef REGISTER_COMPARE_AND_BITPACK
namespace functor {
template <typename T, class = void, class = void>
struct ComputeShard {
static EIGEN_STRONG_INLINE void Compute(typename TTypes<T>::ConstMatrix input,
typename TTypes<uint8>::Matrix output,
const T& thresh, int64 start,
int64 limit) {
for (int64 i = start; i < limit; ++i) {
uint8* out = output.data() + i;
const T* block = input.data() + 8 * i;
*out = ((((block[0] > thresh) << 7)) | (((block[1] > thresh) << 6)) |
(((block[2] > thresh) << 5)) | (((block[3] > thresh) << 4)) |
(((block[4] > thresh) << 3)) | (((block[5] > thresh) << 2)) |
(((block[6] > thresh) << 1)) | (((block[7] > thresh))));
}
}
};
// Specialization for bool on systems where sizeof(bool) == 1.
template <typename T>
struct ComputeShard<T,
typename std::enable_if<std::is_same<T, bool>::value>::type,
typename std::enable_if<sizeof(T) == 1>::type> {
static EIGEN_STRONG_INLINE void Compute(
typename TTypes<bool>::ConstMatrix input,
typename TTypes<uint8>::Matrix output, bool /*thresh*/, int64 start,
int64 limit) {
// NOTE(ebrevdo): This assumes memory is little-endian.
for (int64 i = start; i < limit; ++i) {
uint8* out = output.data() + i;
const int64 block = *reinterpret_cast<const int64*>(input.data() + 8 * i);
*out =
((((block & (1LL << (7 * 8))) >> (7 * 8 - 0))) |
(((block & (1LL << (6 * 8))) >> (6 * 8 - 1))) |
(((block & (1LL << (5 * 8))) >> (5 * 8 - 2))) |
(((block & (1LL << (4 * 8))) >> (4 * 8 - 3))) |
(((block & (1LL << (3 * 8))) >> (3 * 8 - 4))) |
(((block & (1LL << (2 * 8))) >> (2 * 8 - 5))) |
(((block & (1LL << 8)) >> (1 * 8 - 6))) | (((block & (1LL)) << 7)));
}
}
};
template <typename T>
struct CompareAndBitpack<CPUDevice, T> {
void operator()(OpKernelContext* c, typename TTypes<T>::ConstMatrix input,
typename TTypes<T>::ConstScalar threshold,
TTypes<uint8>::Matrix output) {
const T thresh = threshold();
auto shard = [&, thresh](int64 start, int64 limit) {
ComputeShard<T>::Compute(input, output, thresh, start, limit);
};
int64 total_shards = output.size(); // Approximate cmp as an add and
// bitwise-or + shift as an add.
const double total_cost = 8 * (Eigen::TensorOpCost::AddCost<T>() +
Eigen::TensorOpCost::AddCost<uint8>());
const int64 shard_cost = (total_cost >= static_cast<double>(kint64max))
? kint64max
: static_cast<int64>(total_cost);
auto worker_threads = *(c->device()->tensorflow_cpu_worker_threads());
Shard(worker_threads.num_threads, worker_threads.workers, total_shards,
shard_cost, shard);
}
};
} // namespace functor
#if GOOGLE_CUDA
#define REGISTER_COMPARE_AND_BITPACK(type) \
REGISTER_KERNEL_BUILDER( \
Name("CompareAndBitpack").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
CompareAndBitpackOp<GPUDevice, type>);
TF_CALL_GPU_NUMBER_TYPES(REGISTER_COMPARE_AND_BITPACK);
TF_CALL_bool(REGISTER_COMPARE_AND_BITPACK);
#undef REGISTER_COMPARE_AND_BITPACK
namespace functor {
#define DECLARE_GPU_SPEC(T) \
template <> \
void CompareAndBitpack<GPUDevice, T>::operator()( \
OpKernelContext* c, typename TTypes<T>::ConstMatrix input, \
typename TTypes<T>::ConstScalar threshold, \
TTypes<uint8>::Matrix output); \
extern template struct CompareAndBitpack<GPUDevice, T>;
TF_CALL_GPU_NUMBER_TYPES(DECLARE_GPU_SPEC)
TF_CALL_bool(DECLARE_GPU_SPEC)
#undef DECLARE_GPU_SPEC
} // namespace functor
#endif // GOOGLE_CUDA
} // namespace tensorflow


@ -0,0 +1,42 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef THIRD_PARTY_TENSORFLOW_CORE_KERNELS_COMPARE_AND_BITPACK_OP_H_
#define THIRD_PARTY_TENSORFLOW_CORE_KERNELS_COMPARE_AND_BITPACK_OP_H_
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor_types.h"
#include "tensorflow/core/platform/types.h"
namespace tensorflow {
namespace functor {
typedef Eigen::ThreadPoolDevice CPUDevice;
typedef Eigen::GpuDevice GPUDevice;
template <typename Device, typename T>
struct CompareAndBitpack {
void operator()(OpKernelContext* c, typename TTypes<T>::ConstMatrix input,
typename TTypes<T>::ConstScalar threshold,
TTypes<uint8>::Matrix output);
};
} // namespace functor
} // namespace tensorflow
#endif // THIRD_PARTY_TENSORFLOW_CORE_KERNELS_COMPARE_AND_BITPACK_OP_H_


@ -0,0 +1,141 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#include "tensorflow/core/kernels/compare_and_bitpack_op.h"
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/framework/tensor_types.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/util/cuda_kernel_helper.h"
namespace tensorflow {
typedef Eigen::GpuDevice GPUDevice;
namespace functor {
template <typename T>
__global__ void CompareAndBitpackKernel(const int size, const T* threshold,
const T* input, uint8* output) {
// TODO(ebrevdo): Erich said: to get a better memory access pattern
// you could have 8 threads load this data and do a comparison, then
// use the ballot instruction to combine the values from each thread
// in the warp in one instruction (so each thread will have the
// result for 4 blocks) followed by an appropriate shift and mask to
// get the 8-bits of interest.
const T thresh = ldg(threshold);
CUDA_1D_KERNEL_LOOP(i, size) {
const T* block = input + 8 * i;
output[i] =
((((ldg(block) > thresh) << 7)) | (((ldg(block + 1) > thresh) << 6)) |
(((ldg(block + 2) > thresh) << 5)) |
(((ldg(block + 3) > thresh) << 4)) |
(((ldg(block + 4) > thresh) << 3)) |
(((ldg(block + 5) > thresh) << 2)) |
(((ldg(block + 6) > thresh) << 1)) | (((ldg(block + 7) > thresh))));
}
}
template <>
__global__ void CompareAndBitpackKernel<bool>(const int size,
const bool* threshold,
const bool* input,
uint8* output) {
// TODO(ebrevdo): Erich said: I think you could again have multiple
// threads work on one block and use the ballot instruction to the
// bit packing in one instruction.
CUDA_1D_KERNEL_LOOP(i, size) {
const int64 block = ldg(reinterpret_cast<const int64*>(input + 8 * i));
// NOTE(ebrevdo): This assumes memory is little-endian.
output[i] =
((((block & (1LL << (7 * 8))) >> (7 * 8 - 0))) |
(((block & (1LL << (6 * 8))) >> (6 * 8 - 1))) |
(((block & (1LL << (5 * 8))) >> (5 * 8 - 2))) |
(((block & (1LL << (4 * 8))) >> (4 * 8 - 3))) |
(((block & (1LL << (3 * 8))) >> (3 * 8 - 4))) |
(((block & (1LL << (2 * 8))) >> (2 * 8 - 5))) |
(((block & (1LL << 8)) >> (1 * 8 - 6))) | (((block & (1LL)) << 7)));
}
}
template <>
__global__ void CompareAndBitpackKernel<float>(const int size,
const float* threshold,
const float* input,
uint8* output) {
const float thresh = ldg(threshold);
CUDA_1D_KERNEL_LOOP(i, size) {
const float4 block0 = ldg(reinterpret_cast<const float4*>(input + 8 * i));
const float4 block1 =
ldg(reinterpret_cast<const float4*>(input + 8 * i + 4));
output[i] = ((((block0.x > thresh) << 7)) | (((block0.y > thresh) << 6)) |
(((block0.z > thresh) << 5)) | (((block0.w > thresh) << 4)) |
(((block1.x > thresh) << 3)) | (((block1.y > thresh) << 2)) |
(((block1.z > thresh) << 1)) | (((block1.w > thresh))));
}
}
template <>
__global__ void CompareAndBitpackKernel<double>(const int size,
const double* threshold,
const double* input,
uint8* output) {
const double thresh = ldg(threshold);
CUDA_1D_KERNEL_LOOP(i, size) {
const double2 block0 = ldg(reinterpret_cast<const double2*>(input + 8 * i));
const double2 block1 =
ldg(reinterpret_cast<const double2*>(input + 8 * i + 2));
const double2 block2 =
ldg(reinterpret_cast<const double2*>(input + 8 * i + 4));
const double2 block3 =
ldg(reinterpret_cast<const double2*>(input + 8 * i + 6));
output[i] = ((((block0.x > thresh) << 7)) | (((block0.y > thresh) << 6)) |
(((block1.x > thresh) << 5)) | (((block1.y > thresh) << 4)) |
(((block2.x > thresh) << 3)) | (((block2.y > thresh) << 2)) |
(((block3.x > thresh) << 1)) | (((block3.y > thresh))));
}
}
#define DEFINE_GPU_SPECS(T) \
template <> \
void CompareAndBitpack<GPUDevice, T>::operator()( \
OpKernelContext* c, typename TTypes<T>::ConstMatrix input, \
typename TTypes<T>::ConstScalar threshold, \
TTypes<uint8>::Matrix output) { \
const GPUDevice& d = c->eigen_device<GPUDevice>(); \
int64 total_count = output.size(); \
CudaLaunchConfig config = GetCudaLaunchConfig(total_count, d); \
\
CompareAndBitpackKernel<T> \
<<<config.block_count, config.thread_per_block, 0, d.stream()>>>( \
total_count, threshold.data(), input.data(), output.data()); \
}
TF_CALL_GPU_NUMBER_TYPES(DEFINE_GPU_SPECS)
TF_CALL_bool(DEFINE_GPU_SPECS)
#undef DEFINE_GPU_SPECS
} // namespace functor
} // namespace tensorflow
#endif // GOOGLE_CUDA


@ -0,0 +1,163 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
// See docs in ../ops/math_ops.cc
#define EIGEN_USE_THREADS
#include <bitset>
#include "tensorflow/core/kernels/population_count_op.h"
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/framework/types.h"
#include "tensorflow/core/lib/core/status.h"
#include "tensorflow/core/util/work_sharder.h"
namespace tensorflow {
typedef Eigen::ThreadPoolDevice CPUDevice;
typedef Eigen::GpuDevice GPUDevice;
template <typename Device, typename T>
class PopulationCountOp : public OpKernel {
public:
explicit PopulationCountOp(OpKernelConstruction* context)
: OpKernel(context) {}
void Compute(OpKernelContext* c) override {
const Tensor& input_t = c->input(0);
Tensor* output_t;
OP_REQUIRES_OK(c, c->allocate_output(0, input_t.shape(), &output_t));
auto input = input_t.flat<T>();
auto output = output_t->flat<uint8>();
functor::PopulationCount<Device, T> popcnt;
popcnt(c, input, output);
}
};
#define REGISTER_POPULATION_COUNT(type) \
REGISTER_KERNEL_BUILDER( \
Name("PopulationCount").Device(DEVICE_CPU).TypeConstraint<type>("T"), \
PopulationCountOp<CPUDevice, type>);
TF_CALL_uint8(REGISTER_POPULATION_COUNT);
TF_CALL_int8(REGISTER_POPULATION_COUNT);
TF_CALL_uint16(REGISTER_POPULATION_COUNT);
TF_CALL_int16(REGISTER_POPULATION_COUNT);
TF_CALL_int32(REGISTER_POPULATION_COUNT);
TF_CALL_int64(REGISTER_POPULATION_COUNT);
#undef REGISTER_POPULATION_COUNT
namespace functor {
namespace {
template <typename T>
inline uint8 PopCnt(const T v);
#define POPCNT(T, N) \
template <> \
uint8 PopCnt<T>(const T v) { \
return std::bitset<N>(v).count(); \
}
POPCNT(int8, 8);
POPCNT(uint8, 8);
POPCNT(int16, 16);
POPCNT(uint16, 16);
POPCNT(int32, 32);
POPCNT(int64, 64);
#undef POPCNT
} // namespace
template <typename T>
struct PopulationCount<CPUDevice, T> {
void operator()(OpKernelContext* c, typename TTypes<T>::ConstFlat input,
TTypes<uint8>::Flat output) {
const T* input_ptr = input.data();
uint8* output_ptr = output.data();
auto shard = [input_ptr, output_ptr](int64 start, int64 limit) {
for (int64 i = start; i < limit; ++i) {
output_ptr[i] = PopCnt<T>(input_ptr[i]);
}
};
int64 total_shards = input.size();
// Approximating cost of popcnt: convert T to int64
// (std::bitset constructor) and convert int64 to uint8
// (bitset.count() -> output). The .count() itself is relatively cheap.
const double total_cost = (Eigen::TensorOpCost::CastCost<T, uint8>() +
Eigen::TensorOpCost::CastCost<int64, uint8>());
const int64 shard_cost = (total_cost >= static_cast<double>(kint64max))
? kint64max
: static_cast<int64>(total_cost);
auto worker_threads = *(c->device()->tensorflow_cpu_worker_threads());
Shard(worker_threads.num_threads, worker_threads.workers, total_shards,
shard_cost, shard);
}
};
} // namespace functor
#if GOOGLE_CUDA
#define REGISTER_POPULATION_COUNT(type) \
REGISTER_KERNEL_BUILDER( \
Name("PopulationCount").Device(DEVICE_GPU).TypeConstraint<type>("T"), \
PopulationCountOp<GPUDevice, type>)
TF_CALL_uint8(REGISTER_POPULATION_COUNT);
TF_CALL_int8(REGISTER_POPULATION_COUNT);
TF_CALL_uint16(REGISTER_POPULATION_COUNT);
TF_CALL_int16(REGISTER_POPULATION_COUNT);
TF_CALL_int32(REGISTER_POPULATION_COUNT);
TF_CALL_int64(REGISTER_POPULATION_COUNT);
#undef REGISTER_POPULATION_COUNT
namespace functor {
#define DECLARE_GPU_SPEC(T) \
template <> \
void PopulationCount<GPUDevice, T>::operator()( \
OpKernelContext* c, typename TTypes<T>::ConstFlat input, \
TTypes<uint8>::Flat output); \
extern template struct PopulationCount<GPUDevice, T>
TF_CALL_uint8(DECLARE_GPU_SPEC);
TF_CALL_int8(DECLARE_GPU_SPEC);
TF_CALL_uint16(DECLARE_GPU_SPEC);
TF_CALL_int16(DECLARE_GPU_SPEC);
TF_CALL_int32(DECLARE_GPU_SPEC);
TF_CALL_int64(DECLARE_GPU_SPEC);
#undef DECLARE_GPU_SPEC
} // namespace functor
#endif // GOOGLE_CUDA
} // namespace tensorflow


@ -0,0 +1,38 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#ifndef THIRD_PARTY_TENSORFLOW_CORE_KERNELS_POPULATION_COUNT_OP_H_
#define THIRD_PARTY_TENSORFLOW_CORE_KERNELS_POPULATION_COUNT_OP_H_
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/tensor_types.h"
#include "tensorflow/core/platform/types.h"
namespace tensorflow {
namespace functor {
template <typename Device, typename T>
struct PopulationCount {
void operator()(OpKernelContext* c, typename TTypes<T>::ConstFlat input,
TTypes<uint8>::Flat output);
};
} // namespace functor
} // namespace tensorflow
#endif // THIRD_PARTY_TENSORFLOW_CORE_KERNELS_POPULATION_COUNT_OP_H_


@ -0,0 +1,92 @@
/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied.
See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/
#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#include "tensorflow/core/kernels/population_count_op.h"
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/framework/tensor_types.h"
#include "tensorflow/core/platform/types.h"
#include "tensorflow/core/util/cuda_kernel_helper.h"
namespace tensorflow {
typedef Eigen::GpuDevice GPUDevice;
namespace functor {
template <typename T>
__global__ void PopulationCountKernel(const int size, const T* input,
uint8* output) {
CUDA_1D_KERNEL_LOOP(i, size) { output[i] = __popc(ldg(input + i)); }
}
template <>
__global__ void PopulationCountKernel(const int size, const int8* input,
uint8* output) {
// For some reason, __popc on a negative int8 gets confused.
CUDA_1D_KERNEL_LOOP(i, size) {
output[i] = __popc(ldg(reinterpret_cast<const uint8*>(input + i)));
}
}
template <>
__global__ void PopulationCountKernel(const int size, const int16* input,
uint8* output) {
// For some reason, __popc on a negative int16 gets confused.
CUDA_1D_KERNEL_LOOP(i, size) {
output[i] = __popc(ldg(reinterpret_cast<const uint16*>(input + i)));
}
}
template <>
__global__ void PopulationCountKernel<int64>(const int size, const int64* input,
uint8* output) {
CUDA_1D_KERNEL_LOOP(i, size) { output[i] = __popcll(ldg(input + i)); }
}
#define DEFINE_GPU_SPECS(T) \
template <> \
void PopulationCount<GPUDevice, T>::operator()( \
OpKernelContext* c, typename TTypes<T>::ConstFlat input, \
TTypes<uint8>::Flat output) { \
const GPUDevice& d = c->eigen_device<GPUDevice>(); \
int64 total_count = input.size(); \
CudaLaunchConfig config = GetCudaLaunchConfig(total_count, d); \
PopulationCountKernel<T> \
<<<config.block_count, config.thread_per_block, 0, d.stream()>>>( \
total_count, input.data(), output.data()); \
}
TF_CALL_uint8(DEFINE_GPU_SPECS);
TF_CALL_int8(DEFINE_GPU_SPECS);
TF_CALL_uint16(DEFINE_GPU_SPECS);
TF_CALL_int16(DEFINE_GPU_SPECS);
TF_CALL_int32(DEFINE_GPU_SPECS);
TF_CALL_int64(DEFINE_GPU_SPECS);
#undef DEFINE_GPU_SPECS
} // namespace functor
} // namespace tensorflow
#endif // GOOGLE_CUDA


@ -40,6 +40,22 @@ computation is performed on the underlying representation of x.
.Attr("T: {int8, int16, int32, int64, uint8, uint16}") \ .Attr("T: {int8, int16, int32, int64, uint8, uint16}") \
.SetShapeFn(shape_inference::UnchangedShape) .SetShapeFn(shape_inference::UnchangedShape)
REGISTER_OP("PopulationCount")
.Input("x: T")
.Output("y: uint8")
.Attr("T: {int8, int16, int32, int64, uint8, uint16}")
.SetShapeFn(shape_inference::UnchangedShape)
.Doc(R"doc(
Computes element-wise population count (a.k.a. popcount, bitsum, bitcount).
For each entry in `x`, calculates the number of `1` (on) bits in the binary
representation of that entry.
**NOTE**: It is more efficient to first `tf.bitcast` your tensors into
`int32` or `int64` and perform the bitcount on the result, than to feed in
8- or 16-bit inputs and then aggregate the resulting counts.
)doc");
REGISTER_OP("BitwiseAnd").BINARY_BITWISE().Doc(R"doc( REGISTER_OP("BitwiseAnd").BINARY_BITWISE().Doc(R"doc(
Elementwise computes the bitwise AND of `x` and `y`. Elementwise computes the bitwise AND of `x` and `y`.


@ -4597,6 +4597,37 @@ op {
}
}
}
op {
name: "CompareAndBitpack"
input_arg {
name: "input"
type_attr: "T"
}
input_arg {
name: "threshold"
type_attr: "T"
}
output_arg {
name: "output"
type: DT_UINT8
}
attr {
name: "T"
type: "type"
allowed_values {
list {
type: DT_BOOL
type: DT_HALF
type: DT_FLOAT
type: DT_DOUBLE
type: DT_INT8
type: DT_INT16
type: DT_INT32
type: DT_INT64
}
}
}
}
op {
name: "Complex"
input_arg {
@ -16267,6 +16298,31 @@ op {
}
}
}
op {
name: "PopulationCount"
input_arg {
name: "x"
type_attr: "T"
}
output_arg {
name: "y"
type: DT_UINT8
}
attr {
name: "T"
type: "type"
allowed_values {
list {
type: DT_INT8
type: DT_INT16
type: DT_INT32
type: DT_INT64
type: DT_UINT8
type: DT_UINT16
}
}
}
}
op {
name: "Pow"
input_arg {


@ -2458,6 +2458,64 @@ out_type: The type of the output. Should be a lower bit depth than Tinput.
)doc"); )doc");
REGISTER_OP("CompareAndBitpack")
.Input("input: T")
.Input("threshold: T")
.Output("output: uint8")
.Attr("T: {bool, float16, float32, float64, int8, int16, int32, int64}")
.SetShapeFn([](InferenceContext* c) {
ShapeHandle input;
TF_RETURN_IF_ERROR(c->WithRankAtLeast(c->input(0), 1, &input));
ShapeHandle unused;
TF_RETURN_IF_ERROR(c->WithRank(c->input(1), 0, &unused));
ShapeHandle output = input;
if (c->RankKnown(input)) {
int rank = c->Rank(input);
auto inner_dim = c->Dim(input, rank - 1);
DimensionHandle inferred_dim;
TF_RETURN_IF_ERROR(c->Divide(inner_dim, 8,
/* evenly_divisible */ true,
&inferred_dim));
TF_RETURN_IF_ERROR(
c->ReplaceDim(output, rank - 1, inferred_dim, &output));
}
c->set_output(0, output);
return Status::OK();
})
.Doc(R"doc(
Compare values of `input` to `threshold` and pack resulting bits into a `uint8`.
Each comparison returns a boolean `true` (if `input_value > threshold`)
and `false` otherwise.
This operation is useful for Locality-Sensitive-Hashing (LSH) and other
algorithms that use hashing approximations of cosine and `L2` distances;
codes can be generated from an input via:
```python
codebook_size = 50
codebook_bits = codebook_size * 32
codebook = tf.get_variable('codebook', [x.shape[-1].value, codebook_bits],
dtype=x.dtype,
initializer=tf.orthogonal_initializer())
codes = compare_and_threshold(tf.matmul(x, codebook), threshold=0.)
codes = tf.bitcast(codes, tf.int32) # go from uint8 to int32
# now codes has shape x.shape[:-1] + [codebook_size]
```
**NOTE**: Currently, the innermost dimension of the tensor must be divisible
by 8.
Given an `input` shaped `[s0, s1, ..., s_n]`, the output is
a `uint8` tensor shaped `[s0, s1, ..., s_n / 8]`.
input: Values to compare against `threshold` and bitpack.
threshold: Threshold to compare against.
T: The type of the input and threshold.
output: The bitpacked comparisons.
)doc");
REGISTER_OP("RequantizationRange") REGISTER_OP("RequantizationRange")
.Input("input: Tinput") .Input("input: Tinput")
.Input("input_min: float") .Input("input_min: float")


@ -499,7 +499,7 @@ op {
}
input_arg {
name: "reduction_indices"
description: "The dimensions to reduce. Must be in the range\n`[-rank(input), rank(input))`."
type_attr: "Tidx"
}
output_arg {
@ -601,7 +601,7 @@ op {
}
input_arg {
name: "reduction_indices"
description: "The dimensions to reduce. Must be in the range\n`[-rank(input), rank(input))`."
type_attr: "Tidx"
}
output_arg {
@ -1691,7 +1691,7 @@ op {
}
input_arg {
name: "dimension"
description: "int32 or int64, must be in the range `[-rank(input), rank(input))`.\nDescribes which dimension of the input Tensor to reduce across. For vectors,\nuse dimension = 0."
type_attr: "Tidx"
}
output_arg {
@ -1757,7 +1757,7 @@ op {
}
input_arg {
name: "dimension"
description: "int32 or int64, must be in the range `[-rank(input), rank(input))`.\nDescribes which dimension of the input Tensor to reduce across. For vectors,\nuse dimension = 0."
type_attr: "Tidx"
}
output_arg {
@ -4406,6 +4406,43 @@ op {
summary: "Computes the reverse mode backpropagated gradient of the Cholesky algorithm." summary: "Computes the reverse mode backpropagated gradient of the Cholesky algorithm."
description: "For an explanation see \"Differentiation of the Cholesky algorithm\" by\nIain Murray http://arxiv.org/abs/1602.07527." description: "For an explanation see \"Differentiation of the Cholesky algorithm\" by\nIain Murray http://arxiv.org/abs/1602.07527."
} }
op {
name: "CompareAndBitpack"
input_arg {
name: "input"
description: "Values to compare against `threshold` and bitpack."
type_attr: "T"
}
input_arg {
name: "threshold"
description: "Threshold to compare against."
type_attr: "T"
}
output_arg {
name: "output"
description: "The bitpacked comparisons."
type: DT_UINT8
}
attr {
name: "T"
type: "type"
description: "The type of the input and threshold."
allowed_values {
list {
type: DT_BOOL
type: DT_HALF
type: DT_FLOAT
type: DT_DOUBLE
type: DT_INT8
type: DT_INT16
type: DT_INT32
type: DT_INT64
}
}
}
summary: "Compare values of `input` to `threshold` and pack resulting bits into a `uint8`."
description: "Each comparison returns a boolean `true` (if `input_value > threshold`)\nor and `false` otherwise.\n\nThis operation is useful for Locality-Sensitive-Hashing (LSH) and other\nalgorithms that use hashing approximations of cosine and `L2` distances;\ncodes can be generated from an input via:\n\n```python\ncodebook_size = 50\ncodebook_bits = codebook_size * 32\ncodebook = tf.get_variable(\'codebook\', [x.shape[-1].value, codebook_bits],\n dtype=x.dtype,\n initializer=tf.orthogonal_initializer())\ncodes = compare_and_threshold(tf.matmul(x, codebook), threshold=0.)\ncodes = tf.bitcast(codes, tf.int32) # go from uint8 to int32\n# now codes has shape x.shape[:-1] + [codebook_size]\n```\n\n**NOTE**: Currently, the innermost dimension of the tensor must be divisible\nby 8.\n\nGiven an `input` shaped `[s0, s1, ..., s_n]`, the output is\na `uint8` tensor shaped `[s0, s1, ..., s_n / 8]`."
}
op { op {
name: "Complex" name: "Complex"
input_arg { input_arg {
@ -5656,10 +5693,12 @@ op {
name: "Cumprod" name: "Cumprod"
input_arg { input_arg {
name: "x" name: "x"
description: "A `Tensor`. Must be one of the following types: `float32`, `float64`,\n`int64`, `int32`, `uint8`, `uint16`, `int16`, `int8`, `complex64`,\n`complex128`, `qint8`, `quint8`, `qint32`, `half`."
type_attr: "T" type_attr: "T"
} }
input_arg { input_arg {
name: "axis" name: "axis"
description: "A `Tensor` of type `int32` (default: 0). Must be in the range\n`[-rank(x), rank(x))`."
type_attr: "Tidx" type_attr: "Tidx"
} }
output_arg { output_arg {
@ -5672,6 +5711,7 @@ op {
default_value { default_value {
b: false b: false
} }
description: "If `True`, perform exclusive cumprod."
} }
attr { attr {
name: "reverse" name: "reverse"
@ -5679,6 +5719,7 @@ op {
default_value { default_value {
b: false b: false
} }
description: "A `bool` (default: False)."
} }
attr { attr {
name: "T" name: "T"
@ -5722,10 +5763,12 @@ op {
name: "Cumsum" name: "Cumsum"
input_arg { input_arg {
name: "x" name: "x"
description: "A `Tensor`. Must be one of the following types: `float32`, `float64`,\n`int64`, `int32`, `uint8`, `uint16`, `int16`, `int8`, `complex64`,\n`complex128`, `qint8`, `quint8`, `qint32`, `half`."
type_attr: "T" type_attr: "T"
} }
input_arg { input_arg {
name: "axis" name: "axis"
description: "A `Tensor` of type `int32` (default: 0). Must be in the range\n`[-rank(x), rank(x))`."
type_attr: "Tidx" type_attr: "Tidx"
} }
output_arg { output_arg {
@ -5738,6 +5781,7 @@ op {
default_value { default_value {
b: false b: false
} }
description: "If `True`, perform exclusive cumsum."
} }
attr { attr {
name: "reverse" name: "reverse"
@ -5745,6 +5789,7 @@ op {
default_value { default_value {
b: false b: false
} }
description: "A `bool` (default: False)."
} }
attr { attr {
name: "T" name: "T"
@ -7615,7 +7660,7 @@ op {
} }
input_arg { input_arg {
name: "dim" name: "dim"
description: "0-D (scalar). Specifies the dimension index at which to\nexpand the shape of `input`." description: "0-D (scalar). Specifies the dimension index at which to\nexpand the shape of `input`. Must be in the range\n`[-rank(input) - 1, rank(input)]`."
type_attr: "Tdim" type_attr: "Tdim"
} }
output_arg { output_arg {
@ -12325,7 +12370,7 @@ op {
} }
input_arg { input_arg {
name: "reduction_indices" name: "reduction_indices"
description: "The dimensions to reduce." description: "The dimensions to reduce. Must be in the range\n`[-rank(input), rank(input))`."
type_attr: "Tidx" type_attr: "Tidx"
} }
output_arg { output_arg {
@ -13102,7 +13147,7 @@ op {
} }
input_arg { input_arg {
name: "reduction_indices" name: "reduction_indices"
description: "The dimensions to reduce." description: "The dimensions to reduce. Must be in the range\n`[-rank(input), rank(input))`."
type_attr: "Tidx" type_attr: "Tidx"
} }
output_arg { output_arg {
@ -13293,7 +13338,7 @@ op {
} }
input_arg { input_arg {
name: "reduction_indices" name: "reduction_indices"
description: "The dimensions to reduce." description: "The dimensions to reduce. Must be in the range\n`[-rank(input), rank(input))`."
type_attr: "Tidx" type_attr: "Tidx"
} }
output_arg { output_arg {
@ -15436,6 +15481,33 @@ op {
summary: "Compute the polygamma function \\\\(\\psi^{(n)}(x)\\\\)." summary: "Compute the polygamma function \\\\(\\psi^{(n)}(x)\\\\)."
description: "The polygamma function is defined as:\n\n\n\\\\(\\psi^{(n)}(x) = \\frac{d^n}{dx^n} \\psi(x)\\\\)\n\nwhere \\\\(\\psi(x)\\\\) is the digamma function." description: "The polygamma function is defined as:\n\n\n\\\\(\\psi^{(n)}(x) = \\frac{d^n}{dx^n} \\psi(x)\\\\)\n\nwhere \\\\(\\psi(x)\\\\) is the digamma function."
} }
op {
name: "PopulationCount"
input_arg {
name: "x"
type_attr: "T"
}
output_arg {
name: "y"
type: DT_UINT8
}
attr {
name: "T"
type: "type"
allowed_values {
list {
type: DT_INT8
type: DT_INT16
type: DT_INT32
type: DT_INT64
type: DT_UINT8
type: DT_UINT16
}
}
}
summary: "Computes element-wise population count (a.k.a. popcount, bitsum, bitcount)."
description: "For each entry in `x`, calculates the number of `1` (on) bits in the binary\nrepresentation of that entry.\n\n**NOTE**: It is more efficient to first `tf.bitcast` your tensors into\n`int32` or `int64` and perform the bitcount on the result, than to feed in\n8- or 16-bit inputs and then aggregate the resulting counts."
}
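
As a rough, non-authoritative illustration of what this op computes (not the TensorFlow kernel itself), the per-element bit count can be reproduced in NumPy; the helper name below is made up for this sketch:

```python
import numpy as np

# Non-authoritative NumPy illustration of PopulationCount: reinterpret each
# element as raw bytes, expand to individual bits, and sum the set bits per
# original element.
def population_count_reference(x):
    x = np.asarray(x)
    raw = x.reshape(-1, 1).view(np.uint8)                  # [n, itemsize] bytes
    bits = np.unpackbits(raw.ravel()).reshape(x.size, -1)  # [n, itemsize * 8]
    return bits.sum(axis=1).astype(np.uint8).reshape(x.shape)

print(population_count_reference(np.array([0, 1, 3, 255, -1], dtype=np.int32)))
# [ 0  1  2  8 32]
```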
op { op {
name: "Pow" name: "Pow"
input_arg { input_arg {
@ -15662,7 +15734,7 @@ op {
} }
input_arg { input_arg {
name: "reduction_indices" name: "reduction_indices"
description: "The dimensions to reduce." description: "The dimensions to reduce. Must be in the range\n`[-rank(input), rank(input))`."
type_attr: "Tidx" type_attr: "Tidx"
} }
output_arg { output_arg {
@ -21708,7 +21780,7 @@ op {
} }
input_arg { input_arg {
name: "axis" name: "axis"
description: "1-D. The indices of the dimensions to reverse." description: "1-D. The indices of the dimensions to reverse. Must be in the range\n`[-rank(tensor), rank(tensor))`."
type_attr: "Tidx" type_attr: "Tidx"
} }
output_arg { output_arg {
@ -27260,7 +27332,7 @@ op {
list { list {
} }
} }
description: "If specified, only squeezes the dimensions listed. The dimension\nindex starts at 0. It is an error to squeeze a dimension that is not 1." description: "If specified, only squeezes the dimensions listed. The dimension\nindex starts at 0. It is an error to squeeze a dimension that is not 1. Must\nbe in the range `[-rank(input), rank(input))`."
has_minimum: true has_minimum: true
} }
summary: "Removes dimensions of size 1 from the shape of a tensor." summary: "Removes dimensions of size 1 from the shape of a tensor."
@ -28250,7 +28322,7 @@ op {
} }
input_arg { input_arg {
name: "reduction_indices" name: "reduction_indices"
description: "The dimensions to reduce." description: "The dimensions to reduce. Must be in the range\n`[-rank(input), rank(input))`."
type_attr: "Tidx" type_attr: "Tidx"
} }
output_arg { output_arg {

View File

@ -20,6 +20,7 @@ limitations under the License.
#if (defined(__GNUC__) || defined(__APPLE__)) && !defined(SWIG) #if (defined(__GNUC__) || defined(__APPLE__)) && !defined(SWIG)
// Compiler supports GCC-style attributes // Compiler supports GCC-style attributes
#define TF_ATTRIBUTE_NORETURN __attribute__((noreturn)) #define TF_ATTRIBUTE_NORETURN __attribute__((noreturn))
#define TF_ATTRIBUTE_ALWAYS_INLINE __attribute__((always_inline))
#define TF_ATTRIBUTE_NOINLINE __attribute__((noinline)) #define TF_ATTRIBUTE_NOINLINE __attribute__((noinline))
#define TF_ATTRIBUTE_UNUSED __attribute__((unused)) #define TF_ATTRIBUTE_UNUSED __attribute__((unused))
#define TF_ATTRIBUTE_COLD __attribute__((cold)) #define TF_ATTRIBUTE_COLD __attribute__((cold))
@ -33,6 +34,7 @@ limitations under the License.
#elif defined(COMPILER_MSVC) #elif defined(COMPILER_MSVC)
// Non-GCC equivalents // Non-GCC equivalents
#define TF_ATTRIBUTE_NORETURN __declspec(noreturn) #define TF_ATTRIBUTE_NORETURN __declspec(noreturn)
#define TF_ATTRIBUTE_ALWAYS_INLINE
#define TF_ATTRIBUTE_NOINLINE #define TF_ATTRIBUTE_NOINLINE
#define TF_ATTRIBUTE_UNUSED #define TF_ATTRIBUTE_UNUSED
#define TF_ATTRIBUTE_COLD #define TF_ATTRIBUTE_COLD
@ -43,6 +45,7 @@ limitations under the License.
#else #else
// Non-GCC equivalents // Non-GCC equivalents
#define TF_ATTRIBUTE_NORETURN #define TF_ATTRIBUTE_NORETURN
#define TF_ATTRIBUTE_ALWAYS_INLINE
#define TF_ATTRIBUTE_NOINLINE #define TF_ATTRIBUTE_NOINLINE
#define TF_ATTRIBUTE_UNUSED #define TF_ATTRIBUTE_UNUSED
#define TF_ATTRIBUTE_COLD #define TF_ATTRIBUTE_COLD

View File

@ -171,11 +171,7 @@ error::Code ErrnoToCode(int err_number) {
Status IOError(const string& context, int err_number) { Status IOError(const string& context, int err_number) {
auto code = ErrnoToCode(err_number); auto code = ErrnoToCode(err_number);
if (code == error::UNKNOWN) {
return Status(code, strings::StrCat(context, "; ", strerror(err_number))); return Status(code, strings::StrCat(context, "; ", strerror(err_number)));
} else {
return Status(code, context);
}
} }
} // namespace tensorflow } // namespace tensorflow

View File

@ -1130,7 +1130,8 @@ type SqueezeAttr func(optionalAttr)
// SqueezeSqueezeDims sets the optional squeeze_dims attribute to value. // SqueezeSqueezeDims sets the optional squeeze_dims attribute to value.
// //
// value: If specified, only squeezes the dimensions listed. The dimension // value: If specified, only squeezes the dimensions listed. The dimension
// index starts at 0. It is an error to squeeze a dimension that is not 1. // index starts at 0. It is an error to squeeze a dimension that is not 1. Must
// be in the range `[-rank(input), rank(input))`.
// If not specified, defaults to <> // If not specified, defaults to <>
// //
// REQUIRES: len(value) >= 0 // REQUIRES: len(value) >= 0
@ -7069,6 +7070,61 @@ func TFRecordReaderV2(scope *Scope, optional ...TFRecordReaderV2Attr) (reader_ha
return op.Output(0) return op.Output(0)
} }
// TextLineReaderV2Attr is an optional argument to TextLineReaderV2.
type TextLineReaderV2Attr func(optionalAttr)
// TextLineReaderV2SkipHeaderLines sets the optional skip_header_lines attribute to value.
//
// value: Number of lines to skip from the beginning of every file.
// If not specified, defaults to 0
func TextLineReaderV2SkipHeaderLines(value int64) TextLineReaderV2Attr {
return func(m optionalAttr) {
m["skip_header_lines"] = value
}
}
// TextLineReaderV2Container sets the optional container attribute to value.
//
// value: If non-empty, this reader is placed in the given container.
// Otherwise, a default container is used.
// If not specified, defaults to ""
func TextLineReaderV2Container(value string) TextLineReaderV2Attr {
return func(m optionalAttr) {
m["container"] = value
}
}
// TextLineReaderV2SharedName sets the optional shared_name attribute to value.
//
// value: If non-empty, this reader is named in the given bucket
// with this shared_name. Otherwise, the node name is used instead.
// If not specified, defaults to ""
func TextLineReaderV2SharedName(value string) TextLineReaderV2Attr {
return func(m optionalAttr) {
m["shared_name"] = value
}
}
// A Reader that outputs the lines of a file delimited by '\n'.
//
// Returns The handle to reference the Reader.
func TextLineReaderV2(scope *Scope, optional ...TextLineReaderV2Attr) (reader_handle tf.Output) {
if scope.Err() != nil {
return
}
attrs := map[string]interface{}{}
for _, a := range optional {
a(attrs)
}
opspec := tf.OpSpec{
Type: "TextLineReaderV2",
Attrs: attrs,
}
op := scope.AddOperation(opspec)
return op.Output(0)
}
// Computes rectified linear 6: `min(max(features, 0), 6)`. // Computes rectified linear 6: `min(max(features, 0), 6)`.
func Relu6(scope *Scope, features tf.Output) (activations tf.Output) { func Relu6(scope *Scope, features tf.Output) (activations tf.Output) {
if scope.Err() != nil { if scope.Err() != nil {
@ -12819,7 +12875,8 @@ func ReciprocalGrad(scope *Scope, x tf.Output, y tf.Output) (z tf.Output) {
// //
// Arguments: // Arguments:
// tensor: Up to 8-D. // tensor: Up to 8-D.
// axis: 1-D. The indices of the dimensions to reverse. // axis: 1-D. The indices of the dimensions to reverse. Must be in the range
// `[-rank(tensor), rank(tensor))`.
// //
// Returns The same shape as `tensor`. // Returns The same shape as `tensor`.
func ReverseV2(scope *Scope, tensor tf.Output, axis tf.Output) (output tf.Output) { func ReverseV2(scope *Scope, tensor tf.Output, axis tf.Output) (output tf.Output) {
@ -14493,61 +14550,6 @@ func Tanh(scope *Scope, x tf.Output) (y tf.Output) {
return op.Output(0) return op.Output(0)
} }
// TextLineReaderV2Attr is an optional argument to TextLineReaderV2.
type TextLineReaderV2Attr func(optionalAttr)
// TextLineReaderV2SkipHeaderLines sets the optional skip_header_lines attribute to value.
//
// value: Number of lines to skip from the beginning of every file.
// If not specified, defaults to 0
func TextLineReaderV2SkipHeaderLines(value int64) TextLineReaderV2Attr {
return func(m optionalAttr) {
m["skip_header_lines"] = value
}
}
// TextLineReaderV2Container sets the optional container attribute to value.
//
// value: If non-empty, this reader is placed in the given container.
// Otherwise, a default container is used.
// If not specified, defaults to ""
func TextLineReaderV2Container(value string) TextLineReaderV2Attr {
return func(m optionalAttr) {
m["container"] = value
}
}
// TextLineReaderV2SharedName sets the optional shared_name attribute to value.
//
// value: If non-empty, this reader is named in the given bucket
// with this shared_name. Otherwise, the node name is used instead.
// If not specified, defaults to ""
func TextLineReaderV2SharedName(value string) TextLineReaderV2Attr {
return func(m optionalAttr) {
m["shared_name"] = value
}
}
// A Reader that outputs the lines of a file delimited by '\n'.
//
// Returns The handle to reference the Reader.
func TextLineReaderV2(scope *Scope, optional ...TextLineReaderV2Attr) (reader_handle tf.Output) {
if scope.Err() != nil {
return
}
attrs := map[string]interface{}{}
for _, a := range optional {
a(attrs)
}
opspec := tf.OpSpec{
Type: "TextLineReaderV2",
Attrs: attrs,
}
op := scope.AddOperation(opspec)
return op.Output(0)
}
// Component-wise multiplies a SparseTensor by a dense Tensor. // Component-wise multiplies a SparseTensor by a dense Tensor.
// //
// The output locations corresponding to the implicitly zero elements in the sparse // The output locations corresponding to the implicitly zero elements in the sparse
@ -16147,6 +16149,8 @@ func SegmentMean(scope *Scope, data tf.Output, segment_ids tf.Output) (output tf
type CumprodAttr func(optionalAttr) type CumprodAttr func(optionalAttr)
// CumprodExclusive sets the optional exclusive attribute to value. // CumprodExclusive sets the optional exclusive attribute to value.
//
// value: If `True`, perform exclusive cumprod.
// If not specified, defaults to false // If not specified, defaults to false
func CumprodExclusive(value bool) CumprodAttr { func CumprodExclusive(value bool) CumprodAttr {
return func(m optionalAttr) { return func(m optionalAttr) {
@ -16155,6 +16159,8 @@ func CumprodExclusive(value bool) CumprodAttr {
} }
// CumprodReverse sets the optional reverse attribute to value. // CumprodReverse sets the optional reverse attribute to value.
//
// value: A `bool` (default: False).
// If not specified, defaults to false // If not specified, defaults to false
func CumprodReverse(value bool) CumprodAttr { func CumprodReverse(value bool) CumprodAttr {
return func(m optionalAttr) { return func(m optionalAttr) {
@ -16192,6 +16198,13 @@ func CumprodReverse(value bool) CumprodAttr {
// ```python // ```python
// tf.cumprod([a, b, c], exclusive=True, reverse=True) # => [b * c, c, 1] // tf.cumprod([a, b, c], exclusive=True, reverse=True) # => [b * c, c, 1]
// ``` // ```
//
// Arguments:
// x: A `Tensor`. Must be one of the following types: `float32`, `float64`,
// `int64`, `int32`, `uint8`, `uint16`, `int16`, `int8`, `complex64`,
// `complex128`, `qint8`, `quint8`, `qint32`, `half`.
// axis: A `Tensor` of type `int32` (default: 0). Must be in the range
// `[-rank(x), rank(x))`.
func Cumprod(scope *Scope, x tf.Output, axis tf.Output, optional ...CumprodAttr) (out tf.Output) { func Cumprod(scope *Scope, x tf.Output, axis tf.Output, optional ...CumprodAttr) (out tf.Output) {
if scope.Err() != nil { if scope.Err() != nil {
return return
@ -16420,6 +16433,8 @@ func QuantizedRelu6(scope *Scope, features tf.Output, min_features tf.Output, ma
type CumsumAttr func(optionalAttr) type CumsumAttr func(optionalAttr)
// CumsumExclusive sets the optional exclusive attribute to value. // CumsumExclusive sets the optional exclusive attribute to value.
//
// value: If `True`, perform exclusive cumsum.
// If not specified, defaults to false // If not specified, defaults to false
func CumsumExclusive(value bool) CumsumAttr { func CumsumExclusive(value bool) CumsumAttr {
return func(m optionalAttr) { return func(m optionalAttr) {
@ -16428,6 +16443,8 @@ func CumsumExclusive(value bool) CumsumAttr {
} }
// CumsumReverse sets the optional reverse attribute to value. // CumsumReverse sets the optional reverse attribute to value.
//
// value: A `bool` (default: False).
// If not specified, defaults to false // If not specified, defaults to false
func CumsumReverse(value bool) CumsumAttr { func CumsumReverse(value bool) CumsumAttr {
return func(m optionalAttr) { return func(m optionalAttr) {
@ -16465,6 +16482,13 @@ func CumsumReverse(value bool) CumsumAttr {
// ```python // ```python
// tf.cumsum([a, b, c], exclusive=True, reverse=True) # => [b + c, c, 0] // tf.cumsum([a, b, c], exclusive=True, reverse=True) # => [b + c, c, 0]
// ``` // ```
//
// Arguments:
// x: A `Tensor`. Must be one of the following types: `float32`, `float64`,
// `int64`, `int32`, `uint8`, `uint16`, `int16`, `int8`, `complex64`,
// `complex128`, `qint8`, `quint8`, `qint32`, `half`.
// axis: A `Tensor` of type `int32` (default: 0). Must be in the range
// `[-rank(x), rank(x))`.
func Cumsum(scope *Scope, x tf.Output, axis tf.Output, optional ...CumsumAttr) (out tf.Output) { func Cumsum(scope *Scope, x tf.Output, axis tf.Output, optional ...CumsumAttr) (out tf.Output) {
if scope.Err() != nil { if scope.Err() != nil {
return return
@ -17894,6 +17918,28 @@ func Svd(scope *Scope, input tf.Output, optional ...SvdAttr) (s tf.Output, u tf.
return op.Output(0), op.Output(1), op.Output(2) return op.Output(0), op.Output(1), op.Output(2)
} }
// Computes element-wise population count (a.k.a. popcount, bitsum, bitcount).
//
// For each entry in `x`, calculates the number of `1` (on) bits in the binary
// representation of that entry.
//
// **NOTE**: It is more efficient to first `tf.bitcast` your tensors into
// `int32` or `int64` and perform the bitcount on the result, than to feed in
// 8- or 16-bit inputs and then aggregate the resulting counts.
func PopulationCount(scope *Scope, x tf.Output) (y tf.Output) {
if scope.Err() != nil {
return
}
opspec := tf.OpSpec{
Type: "PopulationCount",
Input: []tf.Input{
x,
},
}
op := scope.AddOperation(opspec)
return op.Output(0)
}
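
A small, hypothetical sketch of the efficiency note above, assuming the generated `gen_bitwise_ops.population_count` wrapper used by the new unit test: pack four `int8` values into one `int32` with `tf.bitcast` before counting, rather than counting 8-bit elements directly.

```python
import tensorflow as tf
from tensorflow.python.ops import gen_bitwise_ops

# Hypothetical sketch only: bitcast groups of four int8 values into int32
# before counting bits, then aggregate, as the NOTE above suggests.
x8 = tf.constant([[1, 2, 3, 4], [5, 6, 7, 8]], dtype=tf.int8)   # shape [2, 4]
x32 = tf.bitcast(x8, tf.int32)                                  # shape [2]
counts = gen_bitwise_ops.population_count(x32)                  # uint8 per int32
total_bits = tf.reduce_sum(tf.cast(counts, tf.int32))

with tf.Session() as sess:
    print(sess.run(total_bits))  # total number of set bits across x8
```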
// AssertAttr is an optional argument to Assert. // AssertAttr is an optional argument to Assert.
type AssertAttr func(optionalAttr) type AssertAttr func(optionalAttr)
@ -18063,7 +18109,8 @@ func AnyKeepDims(value bool) AnyAttr {
// //
// Arguments: // Arguments:
// input: The tensor to reduce. // input: The tensor to reduce.
// reduction_indices: The dimensions to reduce. // reduction_indices: The dimensions to reduce. Must be in the range
// `[-rank(input), rank(input))`.
// //
// Returns The reduced tensor. // Returns The reduced tensor.
func Any(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...AnyAttr) (output tf.Output) { func Any(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...AnyAttr) (output tf.Output) {
@ -19213,7 +19260,8 @@ func ProdKeepDims(value bool) ProdAttr {
// //
// Arguments: // Arguments:
// input: The tensor to reduce. // input: The tensor to reduce.
// reduction_indices: The dimensions to reduce. // reduction_indices: The dimensions to reduce. Must be in the range
// `[-rank(input), rank(input))`.
// //
// Returns The reduced tensor. // Returns The reduced tensor.
func Prod(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...ProdAttr) (output tf.Output) { func Prod(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...ProdAttr) (output tf.Output) {
@ -20258,7 +20306,8 @@ func MaxKeepDims(value bool) MaxAttr {
// //
// Arguments: // Arguments:
// input: The tensor to reduce. // input: The tensor to reduce.
// reduction_indices: The dimensions to reduce. // reduction_indices: The dimensions to reduce. Must be in the range
// `[-rank(input), rank(input))`.
// //
// Returns The reduced tensor. // Returns The reduced tensor.
func Max(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...MaxAttr) (output tf.Output) { func Max(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...MaxAttr) (output tf.Output) {
@ -20583,7 +20632,8 @@ func Sqrt(scope *Scope, x tf.Output) (y tf.Output) {
// Arguments: // Arguments:
// //
// dim: 0-D (scalar). Specifies the dimension index at which to // dim: 0-D (scalar). Specifies the dimension index at which to
// expand the shape of `input`. // expand the shape of `input`. Must be in the range
// `[-rank(input) - 1, rank(input)]`.
// //
// Returns Contains the same data as `input`, but its shape has an additional // Returns Contains the same data as `input`, but its shape has an additional
// dimension of size 1 added. // dimension of size 1 added.
@ -20623,7 +20673,8 @@ func AllKeepDims(value bool) AllAttr {
// //
// Arguments: // Arguments:
// input: The tensor to reduce. // input: The tensor to reduce.
// reduction_indices: The dimensions to reduce. // reduction_indices: The dimensions to reduce. Must be in the range
// `[-rank(input), rank(input))`.
// //
// Returns The reduced tensor. // Returns The reduced tensor.
func All(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...AllAttr) (output tf.Output) { func All(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...AllAttr) (output tf.Output) {
@ -21665,8 +21716,8 @@ func ArgMinOutputType(value tf.DataType) ArgMinAttr {
// //
// Arguments: // Arguments:
// //
// dimension: int32 or int64, 0 <= dimension < rank(input). Describes // dimension: int32 or int64, must be in the range `[-rank(input), rank(input))`.
// which dimension of the input Tensor to reduce across. For vectors, // Describes which dimension of the input Tensor to reduce across. For vectors,
// use dimension = 0. // use dimension = 0.
func ArgMin(scope *Scope, input tf.Output, dimension tf.Output, optional ...ArgMinAttr) (output tf.Output) { func ArgMin(scope *Scope, input tf.Output, dimension tf.Output, optional ...ArgMinAttr) (output tf.Output) {
if scope.Err() != nil { if scope.Err() != nil {
@ -22716,7 +22767,8 @@ func MeanKeepDims(value bool) MeanAttr {
// //
// Arguments: // Arguments:
// input: The tensor to reduce. // input: The tensor to reduce.
// reduction_indices: The dimensions to reduce. // reduction_indices: The dimensions to reduce. Must be in the range
// `[-rank(input), rank(input))`.
// //
// Returns The reduced tensor. // Returns The reduced tensor.
func Mean(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...MeanAttr) (output tf.Output) { func Mean(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...MeanAttr) (output tf.Output) {
@ -22856,7 +22908,8 @@ func MinKeepDims(value bool) MinAttr {
// //
// Arguments: // Arguments:
// input: The tensor to reduce. // input: The tensor to reduce.
// reduction_indices: The dimensions to reduce. // reduction_indices: The dimensions to reduce. Must be in the range
// `[-rank(input), rank(input))`.
// //
// Returns The reduced tensor. // Returns The reduced tensor.
func Min(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...MinAttr) (output tf.Output) { func Min(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...MinAttr) (output tf.Output) {
@ -22914,8 +22967,8 @@ func ArgMaxOutputType(value tf.DataType) ArgMaxAttr {
// //
// Arguments: // Arguments:
// //
// dimension: int32 or int64, 0 <= dimension < rank(input). Describes // dimension: int32 or int64, must be in the range `[-rank(input), rank(input))`.
// which dimension of the input Tensor to reduce across. For vectors, // Describes which dimension of the input Tensor to reduce across. For vectors,
// use dimension = 0. // use dimension = 0.
func ArgMax(scope *Scope, input tf.Output, dimension tf.Output, optional ...ArgMaxAttr) (output tf.Output) { func ArgMax(scope *Scope, input tf.Output, dimension tf.Output, optional ...ArgMaxAttr) (output tf.Output) {
if scope.Err() != nil { if scope.Err() != nil {
@ -23888,6 +23941,51 @@ func QuantizeDownAndShrinkRange(scope *Scope, input tf.Output, input_min tf.Outp
return op.Output(0), op.Output(1), op.Output(2) return op.Output(0), op.Output(1), op.Output(2)
} }
// Compare values of `input` to `threshold` and pack resulting bits into a `uint8`.
//
// Each comparison returns a boolean `true` (if `input_value > threshold`)
// or `false` otherwise.
//
// This operation is useful for Locality-Sensitive-Hashing (LSH) and other
// algorithms that use hashing approximations of cosine and `L2` distances;
// codes can be generated from an input via:
//
// ```python
// codebook_size = 50
// codebook_bits = codebook_size * 32
// codebook = tf.get_variable('codebook', [x.shape[-1].value, codebook_bits],
//                            dtype=x.dtype,
//                            initializer=tf.orthogonal_initializer())
// codes = compare_and_threshold(tf.matmul(x, codebook), threshold=0.)
// codes = tf.bitcast(codes, tf.int32) # go from uint8 to int32
// # now codes has shape x.shape[:-1] + [codebook_size]
// ```
//
// **NOTE**: Currently, the innermost dimension of the tensor must be divisible
// by 8.
//
// Given an `input` shaped `[s0, s1, ..., s_n]`, the output is
// a `uint8` tensor shaped `[s0, s1, ..., s_n / 8]`.
//
// Arguments:
// input: Values to compare against `threshold` and bitpack.
// threshold: Threshold to compare against.
//
// Returns The bitpacked comparisons.
func CompareAndBitpack(scope *Scope, input tf.Output, threshold tf.Output) (output tf.Output) {
if scope.Err() != nil {
return
}
opspec := tf.OpSpec{
Type: "CompareAndBitpack",
Input: []tf.Input{
input, threshold,
},
}
op := scope.AddOperation(opspec)
return op.Output(0)
}
// Outputs a `Summary` protocol buffer with a tensor and per-plugin data. // Outputs a `Summary` protocol buffer with a tensor and per-plugin data.
// //
// Arguments: // Arguments:
@ -24724,7 +24822,8 @@ func SumKeepDims(value bool) SumAttr {
// //
// Arguments: // Arguments:
// input: The tensor to reduce. // input: The tensor to reduce.
// reduction_indices: The dimensions to reduce. // reduction_indices: The dimensions to reduce. Must be in the range
// `[-rank(input), rank(input))`.
// //
// Returns The reduced tensor. // Returns The reduced tensor.
func Sum(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...SumAttr) (output tf.Output) { func Sum(scope *Scope, input tf.Output, reduction_indices tf.Output, optional ...SumAttr) (output tf.Output) {

View File

@ -1671,6 +1671,18 @@ cuda_py_test(
], ],
) )
cuda_py_test(
name = "compare_and_bitpack_op_test",
size = "small",
srcs = ["compare_and_bitpack_op_test.py"],
additional_deps = [
"//third_party/py/numpy",
"//tensorflow/python:math_ops",
"//tensorflow/python:client_testlib",
"//tensorflow/python:framework_for_generated_wrappers",
],
)
cuda_py_test( cuda_py_test(
name = "scalar_test", name = "scalar_test",
size = "small", size = "small",

View File

@ -0,0 +1,83 @@
# Copyright 2017 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Tests for tensorflow.ops.compare_and_bitpack_op."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from tensorflow.python.ops import math_ops
from tensorflow.python.platform import test
class CompareAndBitpackTest(test.TestCase):
def _testCompareAndBitpack(self,
x, threshold,
truth,
expected_err_re=None):
with self.test_session(use_gpu=True):
ans = math_ops.compare_and_bitpack(x, threshold)
if expected_err_re is None:
tf_ans = ans.eval()
self.assertShapeEqual(truth, ans)
self.assertAllEqual(tf_ans, truth)
else:
with self.assertRaisesOpError(expected_err_re):
ans.eval()
def _testBasic(self, dtype):
rows = 371
cols = 294
x = np.random.randn(rows, cols * 8)
if dtype == np.bool:
x = x > 0
else:
x = x.astype(dtype)
threshold = dtype(0)
# np.packbits flattens the tensor, so we reshape it back to the
# expected dimensions.
truth = np.packbits(x > threshold).reshape(rows, cols)
self._testCompareAndBitpack(x, threshold, truth)
def testBasicFloat32(self):
self._testBasic(np.float32)
def testBasicFloat64(self):
self._testBasic(np.float64)
def testBasicFloat16(self):
self._testBasic(np.float16)
def testBasicBool(self):
self._testBasic(np.bool)
def testBasicInt8(self):
self._testBasic(np.int8)
def testBasicInt16(self):
self._testBasic(np.int16)
def testBasicInt32(self):
self._testBasic(np.int32)
def testBasicInt64(self):
self._testBasic(np.int64)
if __name__ == "__main__":
test.main()

View File

@ -88,8 +88,8 @@ class GatherTest(test.TestCase):
def testHigherRank(self): def testHigherRank(self):
# We check that scalar and empty indices shapes work as well # We check that scalar and empty indices shapes work as well
for shape in (4, 3, 2), (2, 1, 3, 2): shape = (2, 1, 3, 2)
for indices_shape in (), (0,), (3, 0), (3, 5), (5, 2, 3): for indices_shape in (), (0,), (2, 0), (2, 3):
for dtype in _TEST_TYPES: for dtype in _TEST_TYPES:
for axis in range(len(shape)): for axis in range(len(shape)):
params = self._buildParams(np.random.randn(*shape), dtype) params = self._buildParams(np.random.randn(*shape), dtype)

View File

@ -330,7 +330,7 @@ def rank(input, name=None):
# pylint: disable=redefined-builtin # pylint: disable=redefined-builtin
"""Returns the rank of a tensor. """Returns the rank of a tensor.
This operation returns an integer representing the rank of `input`. Returns a 0-D `int32` `Tensor` representing the rank of `input`.
For example: For example:

View File

@ -36,5 +36,6 @@ ops.NotDifferentiable("BitwiseAnd")
ops.NotDifferentiable("BitwiseOr") ops.NotDifferentiable("BitwiseOr")
ops.NotDifferentiable("BitwiseXor") ops.NotDifferentiable("BitwiseXor")
ops.NotDifferentiable("Invert") ops.NotDifferentiable("Invert")
ops.NotDifferentiable("PopulationCount")
remove_undocumented(__name__) remove_undocumented(__name__)

View File

@ -18,10 +18,14 @@ from __future__ import absolute_import
from __future__ import division from __future__ import division
from __future__ import print_function from __future__ import print_function
import numpy as np
import six
from tensorflow.python.framework import constant_op from tensorflow.python.framework import constant_op
from tensorflow.python.framework import dtypes from tensorflow.python.framework import dtypes
from tensorflow.python.framework import test_util from tensorflow.python.framework import test_util
from tensorflow.python.ops import bitwise_ops from tensorflow.python.ops import bitwise_ops
from tensorflow.python.ops import gen_bitwise_ops
from tensorflow.python.platform import googletest from tensorflow.python.platform import googletest
@ -46,6 +50,25 @@ class BitwiseOpTest(test_util.TensorFlowTestCase):
self.assertAllEqual(or_result, [5, 5, 7, 15]) self.assertAllEqual(or_result, [5, 5, 7, 15])
self.assertAllEqual(xor_result, [5, 5, 4, 5]) self.assertAllEqual(xor_result, [5, 5, 4, 5])
def testPopulationCountOp(self):
dtype_list = [dtypes.int8, dtypes.int16,
dtypes.int32, dtypes.int64,
dtypes.uint8, dtypes.uint16]
raw_inputs = [0, 1, -1, 3, -3, 5, -5, 14, -14,
127, 128, 255, 256, 65535, 65536,
2**31 - 1, 2**31, 2**32 - 1, 2**32, -2**32 + 1, -2**32,
-2**63 + 1, 2**63 - 1]
def count_bits(x):
return sum([bin(z).count("1") for z in six.iterbytes(x.tobytes())])
for dtype in dtype_list:
with self.test_session(use_gpu=True) as sess:
print("PopulationCount test: ", dtype)
inputs = np.array(raw_inputs, dtype=dtype.as_numpy_dtype)
truth = [count_bits(x) for x in inputs]
input_tensor = constant_op.constant(inputs, dtype=dtype)
popcnt_result = sess.run(gen_bitwise_ops.population_count(input_tensor))
self.assertAllEqual(truth, popcnt_result)
def testInvertOp(self): def testInvertOp(self):
dtype_list = [dtypes.int8, dtypes.int16, dtypes.int32, dtypes.int64, dtype_list = [dtypes.int8, dtypes.int16, dtypes.int32, dtypes.int64,
dtypes.uint8, dtypes.uint16] dtypes.uint8, dtypes.uint16]

View File

@ -61,6 +61,7 @@ from tensorflow.python.framework import dtypes
from tensorflow.python.framework import ops from tensorflow.python.framework import ops
from tensorflow.python.framework import sparse_tensor from tensorflow.python.framework import sparse_tensor
from tensorflow.python.framework import tensor_shape from tensorflow.python.framework import tensor_shape
from tensorflow.python.framework import tensor_util
from tensorflow.python.ops import array_ops from tensorflow.python.ops import array_ops
from tensorflow.python.ops import gen_array_ops from tensorflow.python.ops import gen_array_ops
from tensorflow.python.ops import gen_control_flow_ops from tensorflow.python.ops import gen_control_flow_ops
@ -983,9 +984,16 @@ class GradLoopState(object):
# the right control flow context. # the right control flow context.
real_value = self._grad_context.AddValue(cur_value) real_value = self._grad_context.AddValue(cur_value)
break break
elif constant_op.is_constant(cur_value):
# If the value to be forwarded is a constant, clone the constant in
# the gradient loop rather than using a stack.
# TODO(phawkins): consider hoisting the constant out of the loop
# instead.
real_value = constant_op.constant(
tensor_util.constant_value(cur_value), dtype=cur_value.dtype)
break
else: else:
# Record the history of this value in forward_ctxt. # Record the history of this value in forward_ctxt.
# TODO(yuanbyu): Avoid recording constants.
self._grad_context.Exit() self._grad_context.Exit()
history_value = cur_grad_state.AddForwardAccumulator(cur_value) history_value = cur_grad_state.AddForwardAccumulator(cur_value)
self._grad_context.Enter() self._grad_context.Enter()
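For context, `tensor_util.constant_value` is the folding primitive this change relies on; a minimal stand-alone sketch (TF 1.x graph mode assumed, illustrative only):

```python
import tensorflow as tf
from tensorflow.python.framework import tensor_util

# constant_value statically folds a constant tensor back into a NumPy array,
# which is what lets the gradient loop clone the constant instead of pushing
# the same value onto a stack on every forward iteration.
c = tf.constant([1.0, 2.0, 3.0])
value = tensor_util.constant_value(c)        # array([1., 2., 3.], dtype=float32)
clone = tf.constant(value, dtype=c.dtype)    # a fresh constant with equal value
```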

View File

@ -41,7 +41,6 @@ from tensorflow.python.ops import array_ops
from tensorflow.python.ops import linalg_ops from tensorflow.python.ops import linalg_ops
from tensorflow.python.ops import math_ops from tensorflow.python.ops import math_ops
from tensorflow.python.ops import random_ops from tensorflow.python.ops import random_ops
from tensorflow.python.ops import math_ops
class Initializer(object): class Initializer(object):

View File

@ -830,7 +830,8 @@ class ReluTest(test_lib.TestCase):
class MomentsTest(test_lib.TestCase): class MomentsTest(test_lib.TestCase):
def doOutputTest(self, input_shape, moments_axes, tol=1e-4): def doOutputTest(self, input_shape, moments_axes, tol=1e-4,
check_gradients=False):
for mu in [0.0, 1.0, 1e3]: for mu in [0.0, 1.0, 1e3]:
for sigma in [1.0, 0.1]: for sigma in [1.0, 0.1]:
for keep_dims in [True, False]: for keep_dims in [True, False]:
@ -846,6 +847,15 @@ class MomentsTest(test_lib.TestCase):
mean, variance = nn_impl.moments( mean, variance = nn_impl.moments(
inputs, moments_axes, keep_dims=keep_dims) inputs, moments_axes, keep_dims=keep_dims)
if check_gradients:
err = gradient_checker.compute_gradient_error(
inputs, input_shape, mean, mean.shape.as_list())
self.assertLess(err, 1e-3)
err = gradient_checker.compute_gradient_error(
inputs, input_shape, variance, variance.shape.as_list())
self.assertLess(err, 1e-3)
# Evaluate.
[mean, variance] = sess.run([mean, variance]) [mean, variance] = sess.run([mean, variance])
# Make sure that there are no NaNs # Make sure that there are no NaNs
self.assertFalse(np.isnan(mean).any()) self.assertFalse(np.isnan(mean).any())
@ -853,6 +863,12 @@ class MomentsTest(test_lib.TestCase):
self.assertAllClose(mean, expected_mean, rtol=tol, atol=tol) self.assertAllClose(mean, expected_mean, rtol=tol, atol=tol)
self.assertAllClose(variance, expected_var, rtol=tol, atol=tol) self.assertAllClose(variance, expected_var, rtol=tol, atol=tol)
def testOutputAndGradient2DInput0(self):
self.doOutputTest((10, 10), (0,), check_gradients=True)
def testOutputAndGradient2DInput01(self):
self.doOutputTest((10, 10), (0, 1), check_gradients=True)
def testOutput2DInput0(self): def testOutput2DInput0(self):
self.doOutputTest((10, 300), (0,)) self.doOutputTest((10, 300), (0,))

View File

@ -786,13 +786,18 @@ class DropoutWrapper(RNNCell):
class ResidualWrapper(RNNCell): class ResidualWrapper(RNNCell):
"""RNNCell wrapper that ensures cell inputs are added to the outputs.""" """RNNCell wrapper that ensures cell inputs are added to the outputs."""
def __init__(self, cell): def __init__(self, cell, residual_fn=None):
"""Constructs a `ResidualWrapper` for `cell`. """Constructs a `ResidualWrapper` for `cell`.
Args: Args:
cell: An instance of `RNNCell`. cell: An instance of `RNNCell`.
residual_fn: (Optional) The function to map raw cell inputs and raw cell
outputs to the actual cell outputs of the residual network.
Defaults to calling nest.map_structure on (lambda i, o: i + o), inputs
and outputs.
""" """
self._cell = cell self._cell = cell
self._residual_fn = residual_fn
@property @property
def state_size(self): def state_size(self):
@ -807,7 +812,7 @@ class ResidualWrapper(RNNCell):
return self._cell.zero_state(batch_size, dtype) return self._cell.zero_state(batch_size, dtype)
def __call__(self, inputs, state, scope=None): def __call__(self, inputs, state, scope=None):
"""Run the cell and add its inputs to its outputs. """Run the cell and then apply the residual_fn on its inputs to its outputs.
Args: Args:
inputs: cell inputs. inputs: cell inputs.
@ -822,13 +827,14 @@ class ResidualWrapper(RNNCell):
ValueError: If cell inputs and outputs have different structure (value). ValueError: If cell inputs and outputs have different structure (value).
""" """
outputs, new_state = self._cell(inputs, state, scope=scope) outputs, new_state = self._cell(inputs, state, scope=scope)
nest.assert_same_structure(inputs, outputs)
# Ensure shapes match # Ensure shapes match
def assert_shape_match(inp, out): def assert_shape_match(inp, out):
inp.get_shape().assert_is_compatible_with(out.get_shape()) inp.get_shape().assert_is_compatible_with(out.get_shape())
def default_residual_fn(inputs, outputs):
nest.assert_same_structure(inputs, outputs)
nest.map_structure(assert_shape_match, inputs, outputs) nest.map_structure(assert_shape_match, inputs, outputs)
res_outputs = nest.map_structure( return nest.map_structure(lambda inp, out: inp + out, inputs, outputs)
lambda inp, out: inp + out, inputs, outputs) res_outputs = (self._residual_fn or default_residual_fn)(inputs, outputs)
return (res_outputs, new_state) return (res_outputs, new_state)
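A hedged usage sketch of the new `residual_fn` hook, assuming the TF 1.x `tf.nn.rnn_cell` export of `ResidualWrapper` (it is also available via `tf.contrib.rnn`); the averaging lambda is just an illustration, not part of this change:

```python
import tensorflow as tf

# Hypothetical usage of the new residual_fn argument: combine cell inputs and
# outputs with a custom function instead of the default elementwise addition.
cell = tf.nn.rnn_cell.ResidualWrapper(
    tf.nn.rnn_cell.GRUCell(64),
    residual_fn=lambda inputs, outputs: 0.5 * (inputs + outputs))

x = tf.placeholder(tf.float32, [None, 64])
state = cell.zero_state(tf.shape(x)[0], tf.float32)
outputs, new_state = cell(x, state)
```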

View File

@ -54,7 +54,7 @@ tf_class {
} }
member_method { member_method {
name: "__init__" name: "__init__"
argspec: "args=[\'self\', \'cell\'], varargs=None, keywords=None, defaults=None" argspec: "args=[\'self\', \'cell\', \'residual_fn\'], varargs=None, keywords=None, defaults=[\'None\'], "
} }
member_method { member_method {
name: "add_loss" name: "add_loss"

View File

@ -17,6 +17,7 @@
# Common Bash functions used by build scripts # Common Bash functions used by build scripts
COLOR_NC='\033[0m' COLOR_NC='\033[0m'
COLOR_LIGHT_GRAY='\033[0;37m'
COLOR_GREEN='\033[0;32m' COLOR_GREEN='\033[0;32m'
COLOR_RED='\033[0;31m' COLOR_RED='\033[0;31m'

View File

@ -56,7 +56,7 @@ else
fi fi
pushd "${CI_TENSORFLOW_SUBMODULE_PATH:-.}" pushd "${CI_TENSORFLOW_SUBMODULE_PATH:-.}"
yes "" | ./configure $PYTHON_BIN_PATH configure.py
popd popd
# Gather and print build information # Gather and print build information

View File

@ -73,6 +73,9 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/builds_common.sh" source "${SCRIPT_DIR}/builds_common.sh"
SKIP_RETURN_CODE=112
# Get the command line arguments # Get the command line arguments
CONTAINER_TYPE=$( echo "$1" | tr '[:upper:]' '[:lower:]' ) CONTAINER_TYPE=$( echo "$1" | tr '[:upper:]' '[:lower:]' )
shift shift
@ -310,6 +313,13 @@ create_activate_virtualenv_and_install_tensorflow() {
# Smoke test of tensorflow install in clean virtualenv # Smoke test of tensorflow install in clean virtualenv
################################################################################ ################################################################################
do_clean_virtualenv_smoke_test() { do_clean_virtualenv_smoke_test() {
if [[ -n "${NO_TEST_ON_INSTALL}" ]] &&
[[ "${NO_TEST_ON_INSTALL}" != "0" ]]; then
echo "NO_TEST_ON_INSTALL=${NO_TEST_ON_INSTALL}:"
echo " Skipping smoke test of tensorflow install in clean virtualenv"
return ${SKIP_RETURN_CODE}
fi
CLEAN_VENV_DIR="${PIP_TEST_ROOT}/venv_clean" CLEAN_VENV_DIR="${PIP_TEST_ROOT}/venv_clean"
create_activate_virtualenv_and_install_tensorflow --clean \ create_activate_virtualenv_and_install_tensorflow --clean \
"${CLEAN_VENV_DIR}" "${WHL_PATH}" "${CLEAN_VENV_DIR}" "${WHL_PATH}"
@ -361,6 +371,7 @@ do_virtualenv_pip_test() {
[[ "${NO_TEST_ON_INSTALL}" != "0" ]]; then [[ "${NO_TEST_ON_INSTALL}" != "0" ]]; then
echo "NO_TEST_ON_INSTALL=${NO_TEST_ON_INSTALL}:" echo "NO_TEST_ON_INSTALL=${NO_TEST_ON_INSTALL}:"
echo " Skipping ALL Python unit tests on install" echo " Skipping ALL Python unit tests on install"
return ${SKIP_RETURN_CODE}
else else
# Call run_pip_tests.sh to perform test-on-install # Call run_pip_tests.sh to perform test-on-install
"${SCRIPT_DIR}/run_pip_tests.sh" --virtualenv ${GPU_FLAG} ${MAC_FLAG} "${SCRIPT_DIR}/run_pip_tests.sh" --virtualenv ${GPU_FLAG} ${MAC_FLAG}
@ -379,6 +390,7 @@ do_virtualenv_oss_serial_pip_test() {
[[ "${NO_TEST_ON_INSTALL}" != "0" ]]; then [[ "${NO_TEST_ON_INSTALL}" != "0" ]]; then
echo "NO_TEST_ON_INSTALL=${NO_TEST_ON_INSTALL}:" echo "NO_TEST_ON_INSTALL=${NO_TEST_ON_INSTALL}:"
echo " Skipping Python unit tests on install tagged with oss_serial" echo " Skipping Python unit tests on install tagged with oss_serial"
return ${SKIP_RETURN_CODE}
else else
# Call run_pip_tests.sh to perform test-on-install # Call run_pip_tests.sh to perform test-on-install
"${SCRIPT_DIR}/run_pip_tests.sh" \ "${SCRIPT_DIR}/run_pip_tests.sh" \
@ -402,6 +414,7 @@ do_test_user_ops() {
fi fi
else else
echo "Skipping user-op test-on-install due to DO_TEST_USER_OPS = ${DO_TEST_USER_OPS}" echo "Skipping user-op test-on-install due to DO_TEST_USER_OPS = ${DO_TEST_USER_OPS}"
return ${SKIP_RETURN_CODE}
fi fi
} }
@ -424,6 +437,7 @@ do_test_tfdbg_binaries() {
popd popd
else else
echo "Skipping test of tfdbg binaries due to DO_TEST_TFDBG_BINARIES = ${DO_TEST_TFDBG_BINARIES}" echo "Skipping test of tfdbg binaries due to DO_TEST_TFDBG_BINARIES = ${DO_TEST_TFDBG_BINARIES}"
return ${SKIP_RETURN_CODE}
fi fi
} }
@ -439,6 +453,7 @@ do_test_tutorials() {
fi fi
else else
echo "Skipping tutorial tests-on-install due to DO_TEST_TUTORIALS = ${DO_TEST_TUTORIALS}" echo "Skipping tutorial tests-on-install due to DO_TEST_TUTORIALS = ${DO_TEST_TUTORIALS}"
return ${SKIP_RETURN_CODE}
fi fi
} }
@ -455,6 +470,7 @@ do_ffmpeg_integration_test() {
fi fi
else else
echo "Skipping ffmpeg integration due to DO_INTEGRATION_TESTS = ${DO_INTEGRATION_TESTS}" echo "Skipping ffmpeg integration due to DO_INTEGRATION_TESTS = ${DO_INTEGRATION_TESTS}"
return ${SKIP_RETURN_CODE}
fi fi
} }
@ -468,6 +484,7 @@ PIP_TASKS_DESC=("Smoke test of pip install in clean virtualenv" "PIP tests in vi
COUNTER=0 COUNTER=0
FAIL_COUNTER=0 FAIL_COUNTER=0
PASS_COUNTER=0 PASS_COUNTER=0
SKIP_COUNTER=0
while [[ ${COUNTER} -lt "${#PIP_TASKS[@]}" ]]; do while [[ ${COUNTER} -lt "${#PIP_TASKS[@]}" ]]; do
INDEX=COUNTER INDEX=COUNTER
((INDEX++)) ((INDEX++))
@ -480,7 +497,9 @@ while [[ ${COUNTER} -lt "${#PIP_TASKS[@]}" ]]; do
${PIP_TASKS[COUNTER]} ${PIP_TASKS[COUNTER]}
RESULT=$? RESULT=$?
if [[ ${RESULT} != "0" ]]; then if [[ ${RESULT} == ${SKIP_RETURN_CODE} ]]; then
((SKIP_COUNTER++))
elif [[ ${RESULT} != "0" ]]; then
((FAIL_COUNTER++)) ((FAIL_COUNTER++))
else else
((PASS_COUNTER++)) ((PASS_COUNTER++))
@ -503,7 +522,9 @@ while [[ ${COUNTER} -lt "${#PIP_TASKS[@]}" ]]; do
((INDEX++)) ((INDEX++))
echo "${INDEX}. ${PIP_TASKS[COUNTER]}: ${PIP_TASKS_DESC[COUNTER]}" echo "${INDEX}. ${PIP_TASKS[COUNTER]}: ${PIP_TASKS_DESC[COUNTER]}"
if [[ ${STEP_EXIT_CODES[COUNTER]} == "0" ]]; then if [[ ${STEP_EXIT_CODES[COUNTER]} == ${SKIP_RETURN_CODE} ]]; then
printf " ${COLOR_LIGHT_GRAY}SKIP${COLOR_NC}\n"
elif [[ ${STEP_EXIT_CODES[COUNTER]} == "0" ]]; then
printf " ${COLOR_GREEN}PASS${COLOR_NC}\n" printf " ${COLOR_GREEN}PASS${COLOR_NC}\n"
else else
printf " ${COLOR_RED}FAIL${COLOR_NC}\n" printf " ${COLOR_RED}FAIL${COLOR_NC}\n"
@ -513,7 +534,7 @@ while [[ ${COUNTER} -lt "${#PIP_TASKS[@]}" ]]; do
done done
echo echo
echo "${FAIL_COUNTER} failed; ${PASS_COUNTER} passed." echo "${SKIP_COUNTER} skipped; ${FAIL_COUNTER} failed; ${PASS_COUNTER} passed."
echo echo
if [[ ${FAIL_COUNTER} == "0" ]]; then if [[ ${FAIL_COUNTER} == "0" ]]; then

View File

@ -120,7 +120,7 @@ else
fi fi
export TF_NEED_CUDA=$IS_GPU export TF_NEED_CUDA=$IS_GPU
yes "" | ./configure ${PYTHON_BIN_PATH} configure.py
# Figure out how many concurrent tests we can run and do run the tests. # Figure out how many concurrent tests we can run and do run the tests.
BAZEL_PARALLEL_TEST_FLAGS="" BAZEL_PARALLEL_TEST_FLAGS=""

View File

@ -30,7 +30,7 @@ export TF_NEED_HDFS=0
export TF_NEED_CUDA=0 export TF_NEED_CUDA=0
# Only running cc tests, python version does not matter. # Only running cc tests, python version does not matter.
export PYTHON_BIN_PATH=`which python` export PYTHON_BIN_PATH=`which python`
yes "" | ./configure $PYTHON_BIN_PATH configure.py
# Run bazel test command. Double test timeouts to avoid flakes. # Run bazel test command. Double test timeouts to avoid flakes.
bazel test --test_tag_filters=-no_oss,-gpu,-benchmark-test --test_lang_filters=cc -k \ bazel test --test_tag_filters=-no_oss,-gpu,-benchmark-test --test_lang_filters=cc -k \

View File

@ -29,7 +29,7 @@ export TF_NEED_GCP=0
export TF_NEED_HDFS=0 export TF_NEED_HDFS=0
export TF_NEED_CUDA=0 export TF_NEED_CUDA=0
export PYTHON_BIN_PATH=`which python2` export PYTHON_BIN_PATH=`which python2`
yes "" | ./configure $PYTHON_BIN_PATH configure.py
# Run bazel test command. Double test timeouts to avoid flakes. # Run bazel test command. Double test timeouts to avoid flakes.
bazel test --test_tag_filters=-no_oss,-oss_serial,-gpu,-benchmark-test --test_lang_filters=py -k \ bazel test --test_tag_filters=-no_oss,-oss_serial,-gpu,-benchmark-test --test_lang_filters=py -k \

View File

@ -29,7 +29,7 @@ export TF_NEED_GCP=0
export TF_NEED_HDFS=0 export TF_NEED_HDFS=0
export TF_NEED_CUDA=0 export TF_NEED_CUDA=0
export PYTHON_BIN_PATH=`which python3` export PYTHON_BIN_PATH=`which python3`
yes "" | ./configure $PYTHON_BIN_PATH configure.py
# Run bazel test command. Double test timeouts to avoid flakes. # Run bazel test command. Double test timeouts to avoid flakes.
bazel test --test_tag_filters=-no_oss,-oss_serial,-gpu,-benchmark-test -k \ bazel test --test_tag_filters=-no_oss,-oss_serial,-gpu,-benchmark-test -k \

View File

@ -29,7 +29,7 @@ export TF_NEED_GCP=0
export TF_NEED_HDFS=0 export TF_NEED_HDFS=0
export TF_NEED_CUDA=0 export TF_NEED_CUDA=0
export PYTHON_BIN_PATH=`which python3` export PYTHON_BIN_PATH=`which python3`
yes "" | ./configure $PYTHON_BIN_PATH configure.py
# Run bazel test command. Double test timeouts to avoid flakes. # Run bazel test command. Double test timeouts to avoid flakes.
bazel test --test_tag_filters=-no_oss,-oss_serial,-gpu,-benchmark-test --test_lang_filters=py -k \ bazel test --test_tag_filters=-no_oss,-oss_serial,-gpu,-benchmark-test --test_lang_filters=py -k \

View File

@ -32,7 +32,7 @@ export PYTHON_BIN_PATH=`which python3`
export TF_NEED_CUDA=1 export TF_NEED_CUDA=1
export TF_CUDA_COMPUTE_CAPABILITIES=3.7 export TF_CUDA_COMPUTE_CAPABILITIES=3.7
yes "" | ./configure $PYTHON_BIN_PATH configure.py
# Run bazel test command. Double test timeouts to avoid flakes. # Run bazel test command. Double test timeouts to avoid flakes.
bazel test --config=cuda --test_tag_filters=-no_oss,-oss_serial,-no_gpu,-benchmark-test -k \ bazel test --config=cuda --test_tag_filters=-no_oss,-oss_serial,-no_gpu,-benchmark-test -k \

View File

@ -32,7 +32,7 @@ export PYTHON_BIN_PATH=`which python3`
export TF_NEED_CUDA=1 export TF_NEED_CUDA=1
export TF_CUDA_COMPUTE_CAPABILITIES=3.7 export TF_CUDA_COMPUTE_CAPABILITIES=3.7
yes "" | ./configure $PYTHON_BIN_PATH configure.py
# Run bazel test command. Double test timeouts to avoid flakes. # Run bazel test command. Double test timeouts to avoid flakes.
bazel test --config=cuda --test_tag_filters=-no_oss,-oss_serial,-no_gpu,-benchmark-test -k \ bazel test --config=cuda --test_tag_filters=-no_oss,-oss_serial,-no_gpu,-benchmark-test -k \

View File

@ -30,7 +30,7 @@ export TF_NEED_GCP=0
export TF_NEED_HDFS=0 export TF_NEED_HDFS=0
export TF_NEED_CUDA=0 export TF_NEED_CUDA=0
export PYTHON_BIN_PATH=$(which python2) export PYTHON_BIN_PATH=$(which python2)
yes "" | ./configure $PYTHON_BIN_PATH configure.py
which bazel which bazel
bazel test --test_tag_filters=-no_oss,-gpu,-benchmark-test,-nomac \ bazel test --test_tag_filters=-no_oss,-gpu,-benchmark-test,-nomac \
--test_timeout 300,450,1200,3600 \ --test_timeout 300,450,1200,3600 \

View File

@ -33,7 +33,7 @@ export TF_NEED_CUDA=1
export TF_ENABLE_XLA=1 export TF_ENABLE_XLA=1
export TF_CUDA_COMPUTE_CAPABILITIES=3.7 export TF_CUDA_COMPUTE_CAPABILITIES=3.7
yes "" | ./configure $PYTHON_BIN_PATH configure.py
# Run bazel test command. Double test timeouts to avoid flakes. # Run bazel test command. Double test timeouts to avoid flakes.
bazel test --config=cuda --test_tag_filters=-no_gpu,-benchmark-test -k \ bazel test --config=cuda --test_tag_filters=-no_gpu,-benchmark-test -k \

View File

@ -899,6 +899,7 @@ cc_library(
"include/llvm/Target/ARM/InstPrinter/*.h", "include/llvm/Target/ARM/InstPrinter/*.h",
"include/llvm/Target/ARM/InstPrinter/*.def", "include/llvm/Target/ARM/InstPrinter/*.def",
"include/llvm/Target/ARM/InstPrinter/*.inc", "include/llvm/Target/ARM/InstPrinter/*.inc",
"lib/Target/ARM/*.h",
"lib/Target/ARM/InstPrinter/*.h", "lib/Target/ARM/InstPrinter/*.h",
]), ]),
copts = ["-Iexternal/llvm/lib/Target/ARM"], copts = ["-Iexternal/llvm/lib/Target/ARM"],
@ -1206,6 +1207,7 @@ cc_library(
"lib/IR/*.h", "lib/IR/*.h",
]), ]),
hdrs = glob([ hdrs = glob([
"include/llvm/Analysis/*.def",
"include/llvm/IR/*.h", "include/llvm/IR/*.h",
"include/llvm/IR/*.def", "include/llvm/IR/*.def",
"include/llvm/IR/*.inc", "include/llvm/IR/*.inc",
@ -2022,6 +2024,8 @@ cc_library(
"lib/Target/*.h", "lib/Target/*.h",
]), ]),
hdrs = glob([ hdrs = glob([
"include/llvm/CodeGen/*.h",
"include/llvm/CodeGen/*.def",
"include/llvm/Target/*.h", "include/llvm/Target/*.h",
"include/llvm/Target/*.def", "include/llvm/Target/*.def",
"include/llvm/Target/*.inc", "include/llvm/Target/*.inc",