END_PUBLIC

--- Commit d0f53f77f
    authored by Penghao Cen <scorpiocph@gmail.com>, committed by Shanqing Cai <cais@google.com>:
    Minor fix typo (#11323)

--- Commit 02fcf564e
    authored by Chris Song <sjhshy@gmail.com>, committed by Chris Song <sjhshy@gmail.com>:
    Fix misspells.

--- Commit 764c9b6b4
    authored by Louis Tiao <ltiao@users.noreply.github.com>, committed by GitHub <noreply@github.com>:
    Fixed typo in docstring

--- Commit f8cd1283e
    authored by Shanqing Cai <cais@google.com>, committed by Shanqing Cai <cais@google.com>:
    Chaser

--- Commit 01383b946
    authored by Shanqing Cai <cais@google.com>, committed by Shanqing Cai <cais@google.com>:
    Adapt TensorFlowTestCase.setUp() to new reset_default_graph() semantics.
    Avoid calling reset_default_graph() directly to prevent exceptions in cases
    where test methods error out from within nested graph contexts, which can
    leave _default_graph_stack non-empty in certain Python versions.

--- Commit 0ffc37890
    authored by Amit Patankar <amitpatankar@google.com>, committed by Amit Patankar <amitpatankar@google.com>:
    Removing second declaration of functions.

--- Commit f9c9cacb0
    authored by A. Unique TensorFlower <gardener@tensorflow.org>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Refactor ElementalIrEmitter's slice index finding code into IrArray::Index::SourceIndexOfSlice().
    PiperOrigin-RevId: 161140653

--- Commit ba297aec9
    authored by A. Unique TensorFlower <gardener@tensorflow.org>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Update ops-related pbtxt files.
    PiperOrigin-RevId: 161138258

--- Commit 68d666737
    authored by Alexandre Passos <apassos@google.com>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Fixes a reentrant lock issue with tensors using ndarray memory which uses tensor memory.
    PiperOrigin-RevId: 161137788

--- Commit a2ee8bca3
    authored by A. Unique TensorFlower <gardener@tensorflow.org>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Add support for int8 x int8 -> int32 matrix multiplication via cublasGemmEx to stream_executor.
    PiperOrigin-RevId: 161137741

--- Commit 755fa7b50
    authored by Mark Daoust <markdaoust@google.com>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Block generate_test and doc generation from running in python3.
    - Doc generation is currently unsupported in python3.
    - These both end in errors in python 3.5.1+.
    PiperOrigin-RevId: 161137467

--- Commit 97cbcac45
    authored by Peter Hawkins <phawkins@google.com>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    [TF:XLA] Fix failure in functionalize_control_flow rewrite for Enter nodes that are unused.
    Make sure we ignore such nodes without producing an error.
    PiperOrigin-RevId: 161136545

--- Commit dabcb60bc
    authored by A. Unique TensorFlower <gardener@tensorflow.org>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    [XLA] Add reasonable error messages to Builder::Build for bad parameter numbers.
    PiperOrigin-RevId: 161136262

--- Commit 0cbd249e8
    authored by A. Unique TensorFlower <gardener@tensorflow.org>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Add complex tensors support to `matrix_determinant`.
    PiperOrigin-RevId: 161132422

--- Commit 335f1f14d
    authored by A. Unique TensorFlower <gardener@tensorflow.org>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Extend static shape inference for SparseTensors with dense_shapes constructed using slicing.
    PiperOrigin-RevId: 161132391

--- Commit 53604916e
    authored by Jianwei Xie <xiejw@google.com>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Fixed the missing labels test in TPUEstimator.
    PiperOrigin-RevId: 161131282

--- Commit 9f57dc8dd
    authored by Bruno Rosa <bruno.rosa@eldorado.org.br>, committed by Bruno Rosa <bruno.rosa@eldorado.org.br>:
    Use mcpu instead of march for ppc64le; march is not supported by gcc on ppc64le.

--- Commit 7d5c74a9c
    authored by Skye Wanderman-Milne <skyewm@google.com>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Move duplicate detection logic from Graph to FunctionLibraryDefinition.
    Turns out this is more useful, since there are many function libraries that
    don't belong to a graph. This will be used in a future change. Note that this
    maintains the current behavior of Graph. In addition, updates
    FunctionDefsEqual() to handle unset attr entries (I ran into this when using
    this in said future change).
    PiperOrigin-RevId: 161126628

--- Commit 2caec3af1
    authored by Shanqing Cai <cais@google.com>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Disable more timeseries py tests failing in OSS PIP GPU builds.
    PiperOrigin-RevId: 161124799

--- Commit 0b5cce367
    authored by Eugene Brevdo <ebrevdo@google.com>, committed by TensorFlower Gardener <gardener@tensorflow.org>:
    Get TopK op working on GPU again. Extend using cub's radix sort.
    1. Undo rollback of Andreas Kirsch's initial implementation.
    2. Use cub segmented radix sort instead of Andreas' heap-based impl for large k
       and small num_cols (thresholds of k=100, n=1000 determined empirically).
    3. Use cub segmented radix sort if k == num_cols (this case is always faster).
    4. Added benchmarks.
    Benchmarks show that the GPU implementation is up to 3x slower for small k but
    can be 10x faster for large num_cols and k.

    Benchmarks (m = 128; use_gpu_False = CPU, use_gpu_True = GPU; wall_time in seconds, throughput in GB/s):

        n        k        CPU wall_time  CPU throughput  GPU wall_time  GPU throughput
        10       5        0.000166       0.0077          0.000796       0.00161
        10       9        0.00017        0.00751         0.000796       0.00161
        10       10       0.00017        0.00753         0.000775       0.00165
        100      1        0.000155       0.0826          0.000796       0.0161
        100      50       0.000247       0.0519          0.0008          0.016
        100      99       0.000261       0.049           0.000794       0.0161
        100      100      0.000239       0.0536          0.000777       0.0165
        1000     1        0.000324       0.395           0.000916       0.14
        1000     10       0.00042        0.305           0.000902       0.142
        1000     500      0.0011         0.116           0.00097        0.132
        1000     990      0.00133        0.0962          0.000993       0.129
        1000     1000     0.00102        0.126           0.000964       0.133
        10000    10       0.002          0.64            0.00288        0.445
        10000    100      0.00233        0.549           0.00325        0.394
        10000    5000     0.0127         0.101           0.00381        0.336
        10000    9900     0.015          0.0853          0.00438        0.292
        10000    10000    0.0104         0.123           0.00427        0.3
        100000   100      0.0148         0.865           0.0262         0.488
        100000   1000     0.0201         0.636           0.0263         0.486
        100000   50000    0.214          0.0599          0.0322         0.398
        100000   99000    0.262          0.0489          0.0377         0.34
        100000   100000   0.118          0.108           0.0365         0.351

END_PUBLIC

BEGIN_PUBLIC
Automated g4 rollback of changelist 157169178

PiperOrigin-RevId: 161476569
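As a rough illustration of the dispatch rule described in the TopK commit above, here is a minimal Python sketch; the helper name and the exact comparison operators are assumptions derived from the thresholds quoted in the commit message, not TensorFlow's actual kernel code:

# Illustrative sketch only: the k == num_cols case and the k=100 / n=1000
# thresholds come from the commit message; the function name and the exact
# comparisons are assumed.
def use_segmented_radix_sort(k, num_cols):
  if k == num_cols:
    return True  # sorting the full rows is always faster in this case
  # prefer cub's segmented radix sort for large k and small num_cols,
  # otherwise fall back to the heap-based implementation
  return k >= 100 and num_cols <= 1000

# e.g. top-5 of 10000-wide rows stays on the heap-based path
assert not use_segmented_radix_sort(5, 10000)
assert use_segmented_radix_sort(100, 1000)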
95 lines | 4.4 KiB | Smarty | Executable File
#!/usr/bin/env python

import os
import sys
import tempfile
from subprocess import call, Popen, PIPE

CPU_CXX_COMPILER = ('%{host_cxx_compiler}')
CPU_C_COMPILER = ('%{host_c_compiler}')

CURRENT_DIR = os.path.dirname(sys.argv[0])
COMPUTECPP_ROOT = CURRENT_DIR + '/../sycl/'
COMPUTECPP_DRIVER = COMPUTECPP_ROOT + 'bin/compute++'
COMPUTECPP_INCLUDE = COMPUTECPP_ROOT + 'include'


def main():
  remove_flags = ('-Wl,--no-undefined', '-Wno-unused-but-set-variable', '-Wignored-attributes')
  # remove -fsanitize-coverage flags when the host compiler is g++
  if 'g++' in CPU_CXX_COMPILER:
    remove_flags += ('-fsanitize-coverage',)
  compiler_flags = [flag for flag in sys.argv[1:] if not flag.startswith(remove_flags)]

  output_file_index = compiler_flags.index('-o') + 1
  output_file_name = compiler_flags[output_file_index]

  if output_file_index == 1:
    # we are linking
    return call([CPU_CXX_COMPILER] + compiler_flags + ['-Wl,--no-undefined'])

  # find what we compile
  compiling_cpp = False
  if '-c' in compiler_flags:
    compiled_file_index = compiler_flags.index('-c') + 1
    compiled_file_name = compiler_flags[compiled_file_index]
    compiling_cpp = compiled_file_name.endswith(('.cc', '.c++', '.cpp', '.CPP', '.C', '.cxx'))

  # add -D_GLIBCXX_USE_CXX11_ABI=0 to the command line if you have a custom installation of GCC/Clang
  compiler_flags = compiler_flags + ['-DEIGEN_USE_SYCL=1', '-DTENSORFLOW_USE_SYCL', '-DEIGEN_HAS_C99_MATH']

  if not compiling_cpp:
    # compile for C
    return call([CPU_C_COMPILER] + compiler_flags)

  # create a blacklist of folders that will be skipped when compiling with ComputeCpp
  skip_extensions = [".cu.cc"]
  skip_folders = ["tensorflow/compiler", "tensorflow/docs_src", "third_party", "external", "hexagon"]
  skip_folders = [(folder + '/') for folder in skip_folders]
  # if compiling an external project, skip ComputeCpp
  if any(compiled_file_name.endswith(_ext) for _ext in skip_extensions) or any(_folder in output_file_name for _folder in skip_folders):
    return call([CPU_CXX_COMPILER] + compiler_flags)

  # this is an optimisation that checks whether the compiled file has to be compiled with ComputeCpp
  flags_without_output = list(compiler_flags)
  del flags_without_output[output_file_index]      # remove output_file_name
  del flags_without_output[output_file_index - 1]  # remove '-o'
  # create a preprocessed version of the file and store it for later use
  pipe = Popen([CPU_CXX_COMPILER] + flags_without_output + ["-E"], stdout=PIPE)
  preprocessed_file_str = pipe.communicate()[0]
  if pipe.returncode != 0:
    return pipe.returncode

  # check if it has parallel_for in it
  if not '.parallel_for' in preprocessed_file_str:
    # call CXX compiler like usual
    with tempfile.NamedTemporaryFile(suffix=".ii") as preprocessed_file:  # Force '.ii' extension so that g++ does not preprocess the file again
      preprocessed_file.write(preprocessed_file_str)
      preprocessed_file.flush()
      compiler_flags[compiled_file_index] = preprocessed_file.name
      return call([CPU_CXX_COMPILER] + compiler_flags)
  del preprocessed_file_str  # save some memory as this string can be quite big

  filename, file_extension = os.path.splitext(output_file_name)
  bc_out = filename + '.sycl'

  # strip asan for the device
  computecpp_device_compiler_flags = ['-sycl-compress-name', '-Wno-unused-variable', '-Wno-c++11-narrowing',
                                      '-I', COMPUTECPP_INCLUDE, '-isystem', COMPUTECPP_INCLUDE,
                                      '-std=c++11', '-sycl', '-emit-llvm', '-no-serial-memop',
                                      '-Xclang', '-cl-denorms-are-zero', '-Xclang', '-cl-fp32-correctly-rounded-divide-sqrt']
  # disable flags enabling SIMD instructions
  computecpp_device_compiler_flags += [flag for flag in compiler_flags if
                                       not any(x in flag.lower() for x in ('-fsanitize', '-fno-canonical-system-headers', '=native', '=core2', 'msse', 'vectorize', 'mavx', 'mmmx', 'm3dnow', 'fma'))]

  x = call([COMPUTECPP_DRIVER] + computecpp_device_compiler_flags)
  if x == 0:
    # drop dependency-file flags (-MF/-MD/*.d); we don't want them when compiling with computecpp first
    host_compiler_flags = [flag for flag in compiler_flags if (not flag.startswith(('-MF', '-MD',)) and not '.d' in flag)]
    host_compiler_flags[host_compiler_flags.index('-c')] = "--include"
    host_compiler_flags = ['-xc++', '-Wno-unused-variable', '-I', COMPUTECPP_INCLUDE, '-c', bc_out] + host_compiler_flags
    x = call([CPU_CXX_COMPILER] + host_compiler_flags)
  return x


if __name__ == '__main__':
  sys.exit(main())
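For context, a hypothetical sketch of how a build system might invoke this wrapper; the wrapper filename, source paths, and extra flags below are invented for the example, and only the positions of '-o' and '-c' matter to the logic above:

# Hypothetical invocations of the crosstool wrapper; names and paths are examples only.
import subprocess

# Compile step: '-o' is not the first flag and '-c' is present, so the wrapper
# preprocesses the source and routes it through compute++ only if the
# preprocessed output contains '.parallel_for'.
subprocess.call(['./computecpp_wrapper', '-c', 'kernels/foo_sycl.cc',
                 '-o', 'obj/foo_sycl.o', '-std=c++11', '-O2'])

# Link step: '-o' is the first flag (output_file_index == 1), so all flags are
# forwarded to the host C++ compiler with '-Wl,--no-undefined' re-added.
subprocess.call(['./computecpp_wrapper', '-o', 'libfoo.so', 'obj/foo_sycl.o', '-shared'])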