STT-tensorflow/tensorflow/python/ops/image_ops.py
Vijay Vasudevan ddd4aaf528 TensorFlow: upstream changes to git.
Change 109695551
	Update FAQ
Change 109694725
	Add a gradient for resize_bilinear op.
Change 109694505
	Don't mention variables module in docs

	variables.Variable should be tf.Variable.
Change 109658848
	Adding an option to create a new thread-pool for each session.
Change 109640570

	Take the snapshot of stream-executor.
	+ Expose an interface for scratch space allocation in the interface.

Change 109638559
	Let image_summary accept uint8 input

	This allows users to do their own normalization / scaling if the default
	(very weird) behavior of image_summary is undesired.

	This required a slight tweak to fake_input.cc to make polymorphically typed
	fake inputs infer if their type attr is not set but has a default.

	Unfortunately, adding a second valid type to image_summary *disables* automatic
	implicit conversion from np.float64 to tf.float32, so this change is slightly
	backwards incompatible.
Change 109636969
	Add serialization operations for SparseTensor.
Change 109636644
	Update generated Op docs.
Change 109634899
	TensorFlow: add a markdown file for producing release notes for our
	releases.  Seed with 0.5.0 with a boring but accurate description.
Change 109634502
	Let histogram_summary take any realnumbertype

	It used to take only floats; now it understands ints.
Change 109634434
	TensorFlow: update locations where we mention python 3 support, update
	them to current truth.
Change 109632108
	Move HSV <> RGB conversions, grayscale conversions, and adjust_* ops back to tensorflow
	- make GPU-capable version of RGBToHSV and HSVToRGB, allows only float input/output
	- change docs to reflect new size constraints
	- change HSV format to be [0,1] for all components
	- add automatic dtype conversion for all adjust_* and grayscale conversion ops
	- fix up docs
Change 109631077
	Improve optimizer exceptions

	1. grads_and_vars is now a tuple, so must be wrapped when passed to format.
	2. Use '%r' instead of '%s' for dtype formatting

Base CL: 109697989
2015-12-08 09:58:59 -08:00


# Copyright 2015 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# pylint: disable=g-short-docstring-punctuation
"""## Encoding and Decoding
TensorFlow provides Ops to decode and encode JPEG and PNG formats. Encoded
images are represented by scalar string Tensors, decoded images by 3-D uint8
tensors of shape `[height, width, channels]`.
The encode and decode Ops apply to one image at a time. Their input and output
are all of variable size. If you need fixed size images, pass the output of
the decode Ops to one of the cropping and resizing Ops.
Note: The PNG encode and decode Ops support RGBA, but the conversion Ops
presently only support RGB, HSV, and grayscale. For now, the alpha channel has
to be stripped from the image and re-attached using slicing ops.
@@decode_jpeg
@@encode_jpeg
@@decode_png
@@encode_png
## Resizing
The resizing Ops accept input images as tensors of several types. They always
output resized images as float32 tensors.
The convenience function [`resize_images()`](#resize_images) supports both 4-D
and 3-D tensors as input and output. 4-D tensors are for batches of images,
3-D tensors for individual images.
Other resizing Ops only support 4-D batches of images as input:
[`resize_area`](#resize_area), [`resize_bicubic`](#resize_bicubic),
[`resize_bilinear`](#resize_bilinear),
[`resize_nearest_neighbor`](#resize_nearest_neighbor).
Example:
```python
# Decode a JPEG image and resize it to 299 by 299 using the default method.
image = tf.image.decode_jpeg(...)
resized_image = tf.image.resize_images(image, 299, 299)
```
@@resize_images
@@resize_area
@@resize_bicubic
@@resize_bilinear
@@resize_nearest_neighbor
## Cropping
@@resize_image_with_crop_or_pad
@@pad_to_bounding_box
@@crop_to_bounding_box
@@random_crop
@@extract_glimpse
## Flipping and Transposing
@@flip_up_down
@@random_flip_up_down
@@flip_left_right
@@random_flip_left_right
@@transpose_image
## Converting Between Colorspaces.
Image ops work either on individual images or on batches of images, depending on
the shape of their input Tensor.
If 3-D, the shape is `[height, width, channels]`, and the Tensor represents one
image. If 4-D, the shape is `[batch_size, height, width, channels]`, and the
Tensor represents `batch_size` images.
Currently, `channels` can usefully be 1, 2, 3, or 4. Single-channel images are
grayscale; images with 3 channels are encoded as either RGB or HSV. Images
with 2 or 4 channels include an alpha channel, which has to be stripped from the
image before passing the image to most image processing functions (and can be
re-attached later).
Internally, images are stored as either one `float32` per channel per pixel
(implicitly, values are assumed to lie in `[0,1)`) or one `uint8` per channel
per pixel (values are assumed to lie in `[0,255]`).
TensorFlow can convert between images in RGB and HSV. The conversion functions
work only on float images, so you need to convert images in other formats using
[`convert_image_dtype`](#convert_image_dtype).
Example:
```python
# Decode an image and convert it to HSV.
rgb_image = tf.image.decode_png(..., channels=3)
rgb_image_float = tf.image.convert_image_dtype(rgb_image, tf.float32)
hsv_image = tf.image.rgb_to_hsv(rgb_image_float)
```
@@rgb_to_grayscale
@@grayscale_to_rgb
@@hsv_to_rgb
@@rgb_to_hsv
@@convert_image_dtype
## Image Adjustments
TensorFlow provides functions to adjust images in various ways: brightness,
contrast, hue, and saturation. Each adjustment can be done with predefined
parameters or with random parameters picked from predefined intervals. Random
adjustments are often useful to expand a training set and reduce overfitting.
If several adjustments are chained it is advisable to minimize the number of
redundant conversions by first converting the images to the most natural data
type and representation (RGB or HSV).
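For example, a data-augmentation pipeline might chain a couple of random
adjustments on a decoded image (the bounds below are illustrative values, not
recommended defaults):
```python
# Randomly perturb the brightness and contrast of an image.
image = tf.image.decode_jpeg(...)
distorted = tf.image.random_brightness(image, max_delta=32, seed=42)
distorted = tf.image.random_contrast(distorted, lower=0.5, upper=1.5, seed=42)
```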
@@adjust_brightness
@@random_brightness
@@adjust_contrast
@@random_contrast
@@adjust_hue
@@random_hue
@@adjust_saturation
@@random_saturation
@@per_image_whitening
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import tensorflow.python.platform
from tensorflow.python.framework import dtypes
from tensorflow.python.framework import ops
from tensorflow.python.framework import random_seed
from tensorflow.python.framework import tensor_shape
from tensorflow.python.framework import tensor_util
from tensorflow.python.ops import array_ops
from tensorflow.python.ops import clip_ops
from tensorflow.python.ops import common_shapes
from tensorflow.python.ops import constant_op
from tensorflow.python.ops import gen_image_ops
from tensorflow.python.ops import math_ops
from tensorflow.python.ops import random_ops
# pylint: disable=wildcard-import
from tensorflow.python.ops.gen_image_ops import *
from tensorflow.python.ops.gen_attention_ops import *
# pylint: enable=wildcard-import
ops.NoGradient('RandomCrop')
ops.NoGradient('RGBToHSV')
ops.NoGradient('HSVToRGB')
def _ImageDimensions(images):
"""Returns the dimensions of an image tensor.
Args:
images: 4-D Tensor of shape [batch, height, width, channels]
Returns:
list of integers [batch, height, width, channels]
"""
# A simple abstraction to provide names for each dimension. This abstraction
# should make it simpler to switch dimensions in the future (e.g. if we ever
# want to switch height and width.)
return images.get_shape().as_list()
def _Check3DImage(image):
"""Assert that we are working with properly shaped image.
Args:
image: 3-D Tensor of shape [height, width, channels]
Raises:
ValueError: if image.shape is not a [3] vector.
"""
if not image.get_shape().is_fully_defined():
raise ValueError('\'image\' must be fully defined.')
if image.get_shape().ndims != 3:
raise ValueError('\'image\' must be three-dimensional.')
if not all(x > 0 for x in image.get_shape()):
raise ValueError('all dims of \'image.shape\' must be > 0: %s' %
image.get_shape())
def _CheckAtLeast3DImage(image):
"""Assert that we are working with properly shaped image.
Args:
image: >= 3-D Tensor of size [*, height, width, depth]
Raises:
ValueError: if image.shape is not a [>= 3] vector.
"""
if not image.get_shape().is_fully_defined():
raise ValueError('\'image\' must be fully defined.')
if image.get_shape().ndims < 3:
raise ValueError('\'image\' must be at least three-dimensional.')
if not all(x > 0 for x in image.get_shape()):
raise ValueError('all dims of \'image.shape\' must be > 0: %s' %
image.get_shape())
def random_flip_up_down(image, seed=None):
"""Randomly flips an image vertically (upside down).
With a 1 in 2 chance, outputs the contents of `image` flipped along the first
dimension, which is `height`. Otherwise, outputs the image as-is.
Args:
image: A 3-D tensor of shape `[height, width, channels].`
seed: A Python integer. Used to create a random seed. See
[`set_random_seed`](../../api_docs/python/constant_op.md#set_random_seed)
for behavior.
Returns:
A 3-D tensor of the same type and shape as `image`.
Raises:
ValueError: if the shape of `image` is not supported.
"""
_Check3DImage(image)
uniform_random = random_ops.random_uniform([], 0, 1.0, seed=seed)
mirror = math_ops.less(array_ops.pack([uniform_random, 1.0, 1.0]), 0.5)
return array_ops.reverse(image, mirror)
def random_flip_left_right(image, seed=None):
"""Randomly flip an image horizontally (left to right).
With a 1 in 2 chance, outputs the contents of `image` flipped along the
second dimension, which is `width`. Otherwise, outputs the image as-is.
Args:
image: A 3-D tensor of shape `[height, width, channels].`
seed: A Python integer. Used to create a random seed. See
[`set_random_seed`](../../api_docs/python/constant_op.md#set_random_seed)
for behavior.
Returns:
A 3-D tensor of the same type and shape as `image`.
Raises:
ValueError: if the shape of `image` is not supported.
"""
_Check3DImage(image)
uniform_random = random_ops.random_uniform([], 0, 1.0, seed=seed)
mirror = math_ops.less(array_ops.pack([1.0, uniform_random, 1.0]), 0.5)
return array_ops.reverse(image, mirror)
def flip_left_right(image):
"""Flip an image horizontally (left to right).
Outputs the contents of `image` flipped along the second dimension, which is
`width`.
See also `reverse()`.
Args:
image: A 3-D tensor of shape `[height, width, channels].`
Returns:
A 3-D tensor of the same type and shape as `image`.
Raises:
ValueError: if the shape of `image` is not supported.
"""
_Check3DImage(image)
return array_ops.reverse(image, [False, True, False])
def flip_up_down(image):
"""Flip an image horizontally (upside down).
Outputs the contents of `image` flipped along the first dimension, which is
`height`.
See also `reverse()`.
Args:
image: A 3-D tensor of shape `[height, width, channels].`
Returns:
A 3-D tensor of the same type and shape as `image`.
Raises:
ValueError: if the shape of `image` is not supported.
"""
_Check3DImage(image)
return array_ops.reverse(image, [True, False, False])
def transpose_image(image):
"""Transpose an image by swapping the first and second dimension.
See also `transpose()`.
Args:
image: 3-D tensor of shape `[height, width, channels]`
Returns:
A 3-D tensor of shape `[width, height, channels]`
Raises:
ValueError: if the shape of `image` is not supported.
"""
_Check3DImage(image)
return array_ops.transpose(image, [1, 0, 2], name='transpose_image')
def pad_to_bounding_box(image, offset_height, offset_width, target_height,
target_width):
"""Pad `image` with zeros to the specified `height` and `width`.
Adds `offset_height` rows of zeros on top, `offset_width` columns of
zeros on the left, and then pads the image on the bottom and right
with zeros until it has dimensions `target_height`, `target_width`.
This op does nothing if `offset_*` is zero and the image already has size
`target_height` by `target_width`.
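For example, with illustrative shapes (the input must have a fully defined
shape that fits inside the target):
```python
image = tf.zeros([40, 60, 3], dtype=tf.float32)
# Pad to 100x100, placing the input 10 rows down and 20 columns to the right.
padded = tf.image.pad_to_bounding_box(image, 10, 20, 100, 100)
# padded is a [100, 100, 3] tensor.
```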
Args:
image: 3-D tensor with shape `[height, width, channels]`
offset_height: Number of rows of zeros to add on top.
offset_width: Number of columns of zeros to add on the left.
target_height: Height of output image.
target_width: Width of output image.
Returns:
3-D tensor of shape `[target_height, target_width, channels]`
Raises:
ValueError: If the shape of `image` is incompatible with the `offset_*` or
`target_*` arguments
"""
_Check3DImage(image)
height, width, depth = _ImageDimensions(image)
if target_width < width:
raise ValueError('target_width must be >= width')
if target_height < height:
raise ValueError('target_height must be >= height')
after_padding_width = target_width - offset_width - width
after_padding_height = target_height - offset_height - height
if after_padding_width < 0:
raise ValueError('target_width not possible given '
'offset_width and image width')
if after_padding_height < 0:
raise ValueError('target_height not possible given '
'offset_height and image height')
# Do not pad on the depth dimensions.
if (offset_width or offset_height or after_padding_width or
after_padding_height):
paddings = [[offset_height, after_padding_height],
[offset_width, after_padding_width], [0, 0]]
padded = array_ops.pad(image, paddings)
padded.set_shape([target_height, target_width, depth])
else:
padded = image
return padded
def crop_to_bounding_box(image, offset_height, offset_width, target_height,
target_width):
"""Crops an image to a specified bounding box.
This op cuts a rectangular part out of `image`. The top-left corner of the
returned image is at `offset_height, offset_width` in `image`, and its
lower-right corner is at
`offset_height + target_height, offset_width + target_width`.
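For example, with illustrative shapes:
```python
image = tf.ones([100, 100, 3], dtype=tf.float32)
# Take the 50x60 rectangle whose top-left corner is at (10, 20) in the input.
cropped = tf.image.crop_to_bounding_box(image, 10, 20, 50, 60)
# cropped is a [50, 60, 3] tensor.
```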
Args:
image: 3-D tensor with shape `[height, width, channels]`
offset_height: Vertical coordinate of the top-left corner of the result in
the input.
offset_width: Horizontal coordinate of the top-left corner of the result in
the input.
target_height: Height of the result.
target_width: Width of the result.
Returns:
3-D tensor of image with shape `[target_height, target_width, channels]`
Raises:
ValueError: If the shape of `image` is incompatible with the `offset_*` or
`target_*` arguments
"""
_Check3DImage(image)
height, width, _ = _ImageDimensions(image)
if offset_width < 0:
raise ValueError('offset_width must be >= 0.')
if offset_height < 0:
raise ValueError('offset_height must be >= 0.')
if width < (target_width + offset_width):
raise ValueError('width must be >= target + offset.')
if height < (target_height + offset_height):
raise ValueError('height must be >= target + offset.')
cropped = array_ops.slice(image, [offset_height, offset_width, 0],
[target_height, target_width, -1])
return cropped
def resize_image_with_crop_or_pad(image, target_height, target_width):
"""Crops and/or pads an image to a target width and height.
Resizes an image to a target width and height by either centrally
cropping the image or padding it evenly with zeros.
If `width` or `height` is greater than the specified `target_width` or
`target_height` respectively, this op centrally crops along that dimension.
If `width` or `height` is smaller than the specified `target_width` or
`target_height` respectively, this op centrally pads with 0 along that
dimension.
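For example, with illustrative sizes:
```python
image = tf.zeros([200, 150, 3], dtype=tf.float32)
# The height is centrally cropped and the width is centrally padded.
out = tf.image.resize_image_with_crop_or_pad(image, 100, 180)
# out is a [100, 180, 3] tensor.
```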
Args:
image: 3-D tensor of shape [height, width, channels]
target_height: Target height.
target_width: Target width.
Raises:
ValueError: if `target_height` or `target_width` are zero or negative.
Returns:
Cropped and/or padded image of shape
`[target_height, target_width, channels]`
"""
_Check3DImage(image)
original_height, original_width, _ = _ImageDimensions(image)
if target_width <= 0:
raise ValueError('target_width must be > 0.')
if target_height <= 0:
raise ValueError('target_height must be > 0.')
offset_crop_width = 0
offset_pad_width = 0
if target_width < original_width:
offset_crop_width = (original_width - target_width) // 2
elif target_width > original_width:
offset_pad_width = (target_width - original_width) // 2
offset_crop_height = 0
offset_pad_height = 0
if target_height < original_height:
offset_crop_height = (original_height - target_height) // 2
elif target_height > original_height:
offset_pad_height = (target_height - original_height) // 2
# Maybe crop if needed.
cropped = crop_to_bounding_box(image, offset_crop_height, offset_crop_width,
min(target_height, original_height),
min(target_width, original_width))
# Maybe pad if needed.
resized = pad_to_bounding_box(cropped, offset_pad_height, offset_pad_width,
target_height, target_width)
if resized.get_shape().ndims is None:
raise ValueError('resized contains no shape.')
if not resized.get_shape()[0].is_compatible_with(target_height):
raise ValueError('resized height is not correct.')
if not resized.get_shape()[1].is_compatible_with(target_width):
raise ValueError('resized width is not correct.')
return resized
class ResizeMethod(object):
BILINEAR = 0
NEAREST_NEIGHBOR = 1
BICUBIC = 2
AREA = 3
def resize_images(images, new_height, new_width, method=ResizeMethod.BILINEAR):
"""Resize `images` to `new_width`, `new_height` using the specified `method`.
Resized images will be distorted if their original aspect ratio is not
the same as `new_width`, `new_height`. To avoid distortions see
[`resize_image_with_crop_or_pad`](#resize_image_with_crop_or_pad).
`method` can be one of:
* <b>`ResizeMethod.BILINEAR`</b>: [Bilinear interpolation.]
(https://en.wikipedia.org/wiki/Bilinear_interpolation)
* <b>`ResizeMethod.NEAREST_NEIGHBOR`</b>: [Nearest neighbor interpolation.]
(https://en.wikipedia.org/wiki/Nearest-neighbor_interpolation)
* <b>`ResizeMethod.BICUBIC`</b>: [Bicubic interpolation.]
(https://en.wikipedia.org/wiki/Bicubic_interpolation)
* <b>`ResizeMethod.AREA`</b>: Area interpolation.
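For example, to downscale a batch with nearest-neighbor interpolation (sizes
and method are illustrative):
```python
images = tf.zeros([8, 256, 256, 3], dtype=tf.float32)
small = tf.image.resize_images(images, 64, 64,
                               method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
# small is a float Tensor of shape [8, 64, 64, 3].
```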
Args:
images: 4-D Tensor of shape `[batch, height, width, channels]` or
3-D Tensor of shape `[height, width, channels]`.
new_height: integer.
new_width: integer.
method: ResizeMethod. Defaults to `ResizeMethod.BILINEAR`.
Raises:
ValueError: if the shape of `images` is incompatible with the
shape arguments to this function
ValueError: if an unsupported resize method is specified.
Returns:
If `images` was 4-D, a 4-D float Tensor of shape
`[batch, new_height, new_width, channels]`.
If `images` was 3-D, a 3-D float Tensor of shape
`[new_height, new_width, channels]`.
"""
if images.get_shape().ndims is None:
raise ValueError('\'images\' contains no shape.')
# TODO(shlens): Migrate this functionality to the underlying Op's.
is_batch = True
if len(images.get_shape()) == 3:
is_batch = False
images = array_ops.expand_dims(images, 0)
_, height, width, depth = _ImageDimensions(images)
if width == new_width and height == new_height:
  if not is_batch:
    images = array_ops.squeeze(images, squeeze_dims=[0])
  return images
if method == ResizeMethod.BILINEAR:
images = gen_image_ops.resize_bilinear(images, [new_height, new_width])
elif method == ResizeMethod.NEAREST_NEIGHBOR:
images = gen_image_ops.resize_nearest_neighbor(images, [new_height,
new_width])
elif method == ResizeMethod.BICUBIC:
images = gen_image_ops.resize_bicubic(images, [new_height, new_width])
elif method == ResizeMethod.AREA:
images = gen_image_ops.resize_area(images, [new_height, new_width])
else:
raise ValueError('Resize method is not implemented.')
if not is_batch:
images = array_ops.squeeze(images, squeeze_dims=[0])
return images
def per_image_whitening(image):
"""Linearly scales `image` to have zero mean and unit norm.
This op computes `(x - mean) / adjusted_stddev`, where `mean` is the average
of all values in image, and
`adjusted_stddev = max(stddev, 1.0/sqrt(image.NumElements()))`.
`stddev` is the standard deviation of all values in `image`. It is capped
away from zero to protect against division by 0 when handling uniform images.
Note that this implementation is limited:
* It only whitens based on the statistics of an individual image.
* It does not take into account the covariance structure.
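For example (the input values here are arbitrary):
```python
image = tf.random_uniform([32, 32, 3], maxval=255, dtype=tf.float32)
whitened = tf.image.per_image_whitening(image)
# whitened has approximately zero mean and unit variance per image.
```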
Args:
image: 3-D tensor of shape `[height, width, channels]`.
Returns:
The whitened image with same shape as `image`.
Raises:
ValueError: if the shape of 'image' is incompatible with this function.
"""
_Check3DImage(image)
height, width, depth = _ImageDimensions(image)
num_pixels = height * width * depth
image = math_ops.cast(image, dtype=dtypes.float32)
image_mean = math_ops.reduce_mean(image)
variance = (math_ops.reduce_mean(math_ops.square(image)) -
math_ops.square(image_mean))
stddev = math_ops.sqrt(variance)
# Apply a minimum normalization that protects us against uniform images.
min_stddev = constant_op.constant(1.0 / math.sqrt(num_pixels))
pixel_value_scale = math_ops.maximum(stddev, min_stddev)
pixel_value_offset = image_mean
image = math_ops.sub(image, pixel_value_offset)
image = math_ops.div(image, pixel_value_scale)
return image
def random_brightness(image, max_delta, seed=None):
"""Adjust the brightness of images by a random factor.
Equivalent to `adjust_brightness()` using a `delta` randomly picked in the
interval `[-max_delta, max_delta)`.
Note that `delta` is picked as a float. Because the brightness-adjusted result
is rounded before casting for integer-type images, integer images may end up
modified anywhere in the closed range `[-max_delta, max_delta]`.
Args:
image: 3-D tensor of shape `[height, width, channels]`.
max_delta: float, must be non-negative.
seed: A Python integer. Used to create a random seed. See
[`set_random_seed`](../../api_docs/python/constant_op.md#set_random_seed)
for behavior.
Returns:
3-D tensor of images of shape `[height, width, channels]`
Raises:
ValueError: if `max_delta` is negative.
"""
_Check3DImage(image)
if max_delta < 0:
raise ValueError('max_delta must be non-negative.')
delta = random_ops.random_uniform([], -max_delta, max_delta, seed=seed)
return adjust_brightness(image, delta)
def random_contrast(image, lower, upper, seed=None):
"""Adjust the contrase of an image by a random factor.
Equivalent to `adjust_constrast()` but uses a `contrast_factor` randomly
picked in the interval `[lower, upper]`.
Args:
image: 3-D tensor of shape `[height, width, channels]`.
lower: float. Lower bound for the random contrast factor.
upper: float. Upper bound for the random contrast factor.
seed: A Python integer. Used to create a random seed. See
[`set_random_seed`](../../api_docs/python/constant_op.md#set_random_seed)
for behavior.
Returns:
3-D tensor of shape `[height, width, channels]`.
Raises:
ValueError: if `upper <= lower` or if `lower < 0`.
"""
_Check3DImage(image)
if upper <= lower:
raise ValueError('upper must be > lower.')
if lower < 0:
raise ValueError('lower must be non-negative.')
# Generate a float in [lower, upper]
contrast_factor = random_ops.random_uniform([], lower, upper, seed=seed)
return adjust_contrast(image, contrast_factor)
def adjust_brightness(image, delta, min_value=None, max_value=None):
"""Adjust the brightness of RGB or Grayscale images.
The value `delta` is added to all components of the tensor `image`. `image`
and `delta` are cast to `float` before adding, and the resulting values are
clamped to `[min_value, max_value]`. Finally, the result is cast back to
`images.dtype`.
If `min_value` or `max_value` are not given, they are set to the minimum and
maximum allowed values for `image.dtype` respectively.
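For example, with a float image whose values lie in `[0, 1)` (the delta below
is illustrative):
```python
image = tf.zeros([64, 64, 3], dtype=tf.float32)
brighter = tf.image.adjust_brightness(image, delta=0.25)
# With the default min_value/max_value, float images are effectively unclipped.
```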
Args:
image: A tensor.
delta: A scalar. Amount to add to the pixel values.
min_value: Minimum value for output.
max_value: Maximum value for output.
Returns:
A tensor of the same shape and type as `image`.
"""
if min_value is None:
min_value = image.dtype.min
if max_value is None:
max_value = image.dtype.max
with ops.op_scope([image, delta, min_value, max_value], None,
'adjust_brightness') as name:
adjusted = math_ops.add(
math_ops.cast(image, dtypes.float32),
math_ops.cast(delta, dtypes.float32),
name=name)
if image.dtype.is_integer:
rounded = math_ops.round(adjusted)
else:
rounded = adjusted
clipped = clip_ops.clip_by_value(rounded, float(min_value),
float(max_value))
output = math_ops.cast(clipped, image.dtype)
return output
def adjust_contrast(images, contrast_factor, min_value=None, max_value=None):
"""Adjust contrast of RGB or grayscale images.
`images` is a tensor of at least 3 dimensions. The last 3 dimensions are
interpreted as `[height, width, channels]`. The other dimensions only
represent a collection of images, such as `[batch, height, width, channels].`
Contrast is adjusted independently for each channel of each image.
For each channel, this Op first computes the mean of the image pixels in the
channel and then adjusts each component `x` of each pixel to
`(x - mean) * contrast_factor + mean`.
The adjusted values are then clipped to fit in the `[min_value, max_value]`
interval. If `min_value` or `max_value` is not given, it is replaced with the
minimum and maximum values for the data type of `images` respectively.
The contrast-adjusted image is always computed as `float`, and it is
cast back to its original type after clipping.
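For example (the contrast factor is illustrative):
```python
images = tf.random_uniform([4, 28, 28, 1], dtype=tf.float32)
# Double each pixel's deviation from its channel mean.
adjusted = tf.image.adjust_contrast(images, contrast_factor=2.0)
```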
Args:
images: Images to adjust. At least 3-D.
contrast_factor: A float multiplier for adjusting contrast.
min_value: Minimum value for clipping the adjusted pixels.
max_value: Maximum value for clipping the adjusted pixels.
Returns:
The contrast-adjusted image or images.
Raises:
ValueError: if the arguments are invalid.
"""
_CheckAtLeast3DImage(images)
# If these are None, the min/max should be a nop, but still prevent overflows
# from the cast back to images.dtype at the end of adjust_contrast.
if min_value is None:
min_value = images.dtype.min
if max_value is None:
max_value = images.dtype.max
with ops.op_scope(
[images, contrast_factor, min_value,
max_value], None, 'adjust_contrast') as name:
adjusted = gen_image_ops.adjust_contrast(images,
contrast_factor=contrast_factor,
min_value=min_value,
max_value=max_value,
name=name)
if images.dtype.is_integer:
return math_ops.cast(math_ops.round(adjusted), images.dtype)
else:
return math_ops.cast(adjusted, images.dtype)
ops.RegisterShape('AdjustContrast')(
common_shapes.unchanged_shape_with_rank_at_least(3))
@ops.RegisterShape('ResizeBilinear')
@ops.RegisterShape('ResizeNearestNeighbor')
@ops.RegisterShape('ResizeBicubic')
@ops.RegisterShape('ResizeArea')
def _ResizeShape(op):
"""Shape function for the resize_bilinear and resize_nearest_neighbor ops."""
input_shape = op.inputs[0].get_shape().with_rank(4)
size = tensor_util.ConstantValue(op.inputs[1])
if size is not None:
height = size[0]
width = size[1]
else:
height = None
width = None
return [tensor_shape.TensorShape(
[input_shape[0], height, width, input_shape[3]])]
@ops.RegisterShape('DecodeJpeg')
@ops.RegisterShape('DecodePng')
def _ImageDecodeShape(op):
"""Shape function for image decoding ops."""
unused_input_shape = op.inputs[0].get_shape().merge_with(
tensor_shape.scalar())
channels = op.get_attr('channels') or None
return [tensor_shape.TensorShape([None, None, channels])]
@ops.RegisterShape('EncodeJpeg')
@ops.RegisterShape('EncodePng')
def _ImageEncodeShape(op):
"""Shape function for image encoding ops."""
unused_input_shape = op.inputs[0].get_shape().with_rank(3)
return [tensor_shape.scalar()]
@ops.RegisterShape('RandomCrop')
def _random_cropShape(op):
"""Shape function for the random_crop op."""
input_shape = op.inputs[0].get_shape().with_rank(3)
unused_size_shape = op.inputs[1].get_shape().merge_with(
tensor_shape.vector(2))
size = tensor_util.ConstantValue(op.inputs[1])
if size is not None:
height = size[0]
width = size[1]
else:
height = None
width = None
channels = input_shape[2]
return [tensor_shape.TensorShape([height, width, channels])]
def random_crop(image, size, seed=None, name=None):
"""Randomly crops `image` to size `[target_height, target_width]`.
The offset of the output within `image` is uniformly random. `image` always
fully contains the result.
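For example (shapes and seed are illustrative):
```python
image = tf.zeros([480, 640, 3], dtype=tf.uint8)
patch = tf.image.random_crop(image, [224, 224], seed=42)
# patch is a [224, 224, 3] crop taken at a uniformly random offset.
```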
Args:
image: 3-D tensor of shape `[height, width, channels]`
size: 1-D tensor with two elements, specifying target `[height, width]`
seed: A Python integer. Used to create a random seed. See
[`set_random_seed`](../../api_docs/python/constant_op.md#set_random_seed)
for behavior.
name: A name for this operation (optional).
Returns:
A cropped 3-D tensor of shape `[target_height, target_width, channels]`.
"""
seed1, seed2 = random_seed.get_seed(seed)
return gen_image_ops.random_crop(image, size, seed=seed1, seed2=seed2,
name=name)
def convert_image_dtype(image, dtype, name=None):
"""Convert `image` to `dtype`, scaling its values if needed.
Images that are represented using floating point values are expected to have
values in the range `[0,1)`. Image data stored in integer data types is
expected to have values in the range `[0,MAX]`, where `MAX` is the largest
positive representable number for the data type.
This op converts between data types, scaling the values appropriately before
casting.
Note that for floating point inputs, this op expects values to lie in `[0,1)`.
Conversion of an image containing values outside that range may lead to
overflow errors when converted to integer `DType`s.
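For example, converting a `uint8` image to `float32` rescales its values from
`[0, 255]` to `[0, 1]`:
```python
image_u8 = tf.zeros([100, 100, 3], dtype=tf.uint8)
image_f32 = tf.image.convert_image_dtype(image_u8, tf.float32)
```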
Args:
image: An image.
dtype: A `DType` to convert `image` to.
name: A name for this operation (optional).
Returns:
`image`, converted to `dtype`.
"""
if dtype == image.dtype:
return image
with ops.op_scope([image], name, 'convert_image') as name:
# Both integer: use integer multiplication in the larger range
if image.dtype.is_integer and dtype.is_integer:
scale_in = image.dtype.max
scale_out = dtype.max
if scale_in > scale_out:
# Scaling down, scale first, then cast. The scaling factor will
# cause in.max to be mapped to above out.max but below out.max+1,
# so that the output is safely in the supported range.
scale = (scale_in + 1) // (scale_out + 1)
scaled = math_ops.div(image, scale)
return math_ops.cast(scaled, dtype)
else:
# Scaling up, cast first, then scale. The scale will not map in.max to
# out.max, but converting back and forth should result in no change.
cast = math_ops.cast(image, dtype)
scale = (scale_out + 1) // (scale_in + 1)
return math_ops.mul(cast, scale)
elif image.dtype.is_floating and dtype.is_floating:
# Both float: Just cast, no possible overflows in the allowed ranges.
return math_ops.cast(image, dtype)
else:
if image.dtype.is_integer:
# Converting to float: first cast, then scale
cast = math_ops.cast(image, dtype)
scale = 1. / image.dtype.max
return math_ops.mul(cast, scale)
else:
# Converting from float: first scale, then cast
scale = dtype.max + 0.5 # avoid rounding problems in the cast
scaled = math_ops.mul(image, scale)
return math_ops.cast(scaled, dtype)
def rgb_to_grayscale(images):
"""Converts one or more images from RGB to Grayscale.
Outputs a tensor of the same `DType` and rank as `images`. The size of the
last dimension of the output is 1, containing the Grayscale value of the
pixels.
Args:
images: The RGB tensor to convert. Last dimension must have size 3 and
should contain RGB values.
Returns:
The converted grayscale image(s).
"""
with ops.op_scope([images], None, 'rgb_to_grayscale'):
# Remember the original dtype so we can convert back if needed
orig_dtype = images.dtype
flt_image = convert_image_dtype(images, dtypes.float32)
# Reference for converting between RGB and grayscale.
# https://en.wikipedia.org/wiki/Luma_%28video%29
rgb_weights = [0.2989, 0.5870, 0.1140]
rank_1 = array_ops.expand_dims(array_ops.rank(images) - 1, 0)
gray_float = math_ops.reduce_sum(flt_image * rgb_weights,
rank_1,
keep_dims=True)
return convert_image_dtype(gray_float, orig_dtype)
def grayscale_to_rgb(images):
"""Converts one or more images from Grayscale to RGB.
Outputs a tensor of the same `DType` and rank as `images`. The size of the
last dimension of the output is 3, containing the RGB value of the pixels.
Args:
images: The Grayscale tensor to convert. Last dimension must be size 1.
Returns:
The converted RGB image(s).
"""
with ops.op_scope([images], None, 'grayscale_to_rgb'):
rank_1 = array_ops.expand_dims(array_ops.rank(images) - 1, 0)
shape_list = (
[array_ops.ones(rank_1,
dtype=dtypes.int32)] + [array_ops.expand_dims(3, 0)])
multiples = array_ops.concat(0, shape_list)
return array_ops.tile(images, multiples)
# pylint: disable=invalid-name
@ops.RegisterShape('HSVToRGB')
@ops.RegisterShape('RGBToHSV')
def _ColorspaceShape(op):
"""Shape function for colorspace ops."""
input_shape = op.inputs[0].get_shape().with_rank_at_least(1)
input_rank = input_shape.ndims
if input_rank is not None:
input_shape = input_shape.merge_with([None] * (input_rank - 1) + [3])
return [input_shape]
# pylint: enable=invalid-name
def random_hue(image, max_delta, seed=None):
"""Adjust the hue of an RGB image by a random factor.
Equivalent to `adjust_hue()` but uses a `delta` randomly
picked in the interval `[-max_delta, max_delta]`.
`max_delta` must be in the interval `[0, 0.5]`.
Args:
image: RGB image or images. Size of the last dimension must be 3.
max_delta: float. Maximum value for the random delta.
seed: An operation-specific seed. It will be used in conjunction
with the graph-level seed to determine the real seeds that will be
used in this operation. Please see the documentation of
set_random_seed for its interaction with the graph-level random seed.
Returns:
3-D float tensor of shape `[height, width, channels]`.
Raises:
ValueError: if `max_delta` is invalid.
"""
if max_delta > 0.5:
raise ValueError('max_delta must be <= 0.5.')
if max_delta < 0:
raise ValueError('max_delta must be non-negative.')
delta = random_ops.random_uniform([], -max_delta, max_delta, seed=seed)
return adjust_hue(image, delta)
def adjust_hue(image, delta, name=None):
"""Adjust hue of an RGB image.
This is a convenience method that converts an RGB image to float
representation, converts it to HSV, adds an offset to the hue channel, converts
back to RGB and then back to the original data type. If several adjustments
are chained it is advisable to minimize the number of redundant conversions.
`image` is an RGB image. The image hue is adjusted by converting the
image to HSV and rotating the hue channel (H) by
`delta`. The image is then converted back to RGB.
`delta` must be in the interval `[-1, 1]`.
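For example (the delta is illustrative):
```python
image = tf.random_uniform([64, 64, 3], dtype=tf.float32)
# Rotate the hue channel by a quarter of the hue range.
rotated = tf.image.adjust_hue(image, delta=0.25)
```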
Args:
image: RGB image or images. Size of the last dimension must be 3.
delta: float. How much to add to the hue channel.
name: A name for this operation (optional).
Returns:
Adjusted image(s), same shape and DType as `image`.
"""
with ops.op_scope([image], name, 'adjust_hue') as name:
# Remember the original dtype so we can convert back if needed
orig_dtype = image.dtype
flt_image = convert_image_dtype(image, dtypes.float32)
hsv = gen_image_ops.rgb_to_hsv(flt_image)
hue = array_ops.slice(hsv, [0, 0, 0], [-1, -1, 1])
saturation = array_ops.slice(hsv, [0, 0, 1], [-1, -1, 1])
value = array_ops.slice(hsv, [0, 0, 2], [-1, -1, 1])
# Note that we add 1 to `delta` so that the argument to mod stays positive,
# since `delta` lies in [-1, 1] and `hue` lies in [0, 1].
hue = math_ops.mod(hue + (delta + 1.), 1.)
hsv_altered = array_ops.concat(2, [hue, saturation, value])
rgb_altered = gen_image_ops.hsv_to_rgb(hsv_altered)
return convert_image_dtype(rgb_altered, orig_dtype)
def random_saturation(image, lower, upper, seed=None):
"""Adjust the saturation of an RGB image by a random factor.
Equivalent to `adjust_saturation()` but uses a `saturation_factor` randomly
picked in the interval `[lower, upper]`.
Args:
image: RGB image or images. Size of the last dimension must be 3.
lower: float. Lower bound for the random saturation factor.
upper: float. Upper bound for the random saturation factor.
seed: An operation-specific seed. It will be used in conjunction
with the graph-level seed to determine the real seeds that will be
used in this operation. Please see the documentation of
set_random_seed for its interaction with the graph-level random seed.
Returns:
Adjusted image(s), same shape and DType as `image`.
Raises:
ValueError: if `upper <= lower` or if `lower < 0`.
"""
if upper <= lower:
raise ValueError('upper must be > lower.')
if lower < 0:
raise ValueError('lower must be non-negative.')
# Pick a float in [lower, upper]
saturation_factor = random_ops.random_uniform([], lower, upper, seed=seed)
return adjust_saturation(image, saturation_factor)
def adjust_saturation(image, saturation_factor, name=None):
"""Adjust staturation of an RGB image.
This is a convenience method that converts an RGB image to float
representation, converts it to HSV, scales the saturation channel,
converts back to RGB and then back to the original data type. If several
adjustments are chained it is advisable to minimize the number of redundant
conversions.
`image` is an RGB image. The image saturation is adjusted by converting the
image to HSV and multiplying the saturation (S) channel by
`saturation_factor` and clipping. The image is then converted back to RGB.
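For example (the factor is illustrative):
```python
image = tf.random_uniform([64, 64, 3], dtype=tf.float32)
# Halve the saturation, moving the image toward grayscale.
desaturated = tf.image.adjust_saturation(image, 0.5)
```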
Args:
image: RGB image or images. Size of the last dimension must be 3.
saturation_factor: float. Factor to multiply the saturation by.
name: A name for this operation (optional).
Returns:
Adjusted image(s), same shape and DType as `image`.
"""
with ops.op_scope([image], name, 'adjust_saturation') as name:
# Remember the original dtype so we can convert back if needed
orig_dtype = image.dtype
flt_image = convert_image_dtype(image, dtypes.float32)
hsv = gen_image_ops.rgb_to_hsv(flt_image)
hue = array_ops.slice(hsv, [0, 0, 0], [-1, -1, 1])
saturation = array_ops.slice(hsv, [0, 0, 1], [-1, -1, 1])
value = array_ops.slice(hsv, [0, 0, 2], [-1, -1, 1])
saturation *= saturation_factor
saturation = clip_ops.clip_by_value(saturation, 0.0, 1.0)
hsv_altered = array_ops.concat(2, [hue, saturation, value])
rgb_altered = gen_image_ops.hsv_to_rgb(hsv_altered)
return convert_image_dtype(rgb_altered, orig_dtype)