TensorFlow Spectrogram Example

This example shows how to load audio from a .wav file, convert it to a spectrogram, and save the result as a PNG image. A spectrogram is a visualization of how the frequency content of a sound changes over time, and it can be a useful input feature for neural networks that recognize noises or speech.
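
To make the idea concrete, here is a minimal C++ sketch of what a spectrogram contains. The NaiveSpectrogram function is hypothetical and not part of this example's sources: it uses a slow direct DFT over Hann-windowed frames, and it is not meant to reproduce the output of TensorFlow's AudioSpectrogram op exactly. Each output row holds the power in every frequency bin, from 0 Hz up to the Nyquist frequency, for one window of samples.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from this example): squared-magnitude spectrogram
// computed with a naive DFT. Windows of `window_size` samples start every
// `stride` samples; only windows that fit entirely in the input are used.
std::vector<std::vector<double>> NaiveSpectrogram(const std::vector<double>& samples,
                                                  std::size_t window_size,
                                                  std::size_t stride) {
  const double kPi = 3.14159265358979323846;
  std::vector<std::vector<double>> frames;
  if (window_size == 0 || stride == 0 || samples.size() < window_size) {
    return frames;
  }
  // Periodic Hann window: tapers each frame's edges to reduce spectral leakage.
  std::vector<double> hann(window_size);
  for (std::size_t n = 0; n < window_size; ++n) {
    hann[n] = 0.5 - 0.5 * std::cos(2.0 * kPi * n / window_size);
  }
  // Bins 0..window_size/2 cover 0 Hz up to the Nyquist frequency.
  const std::size_t num_bins = window_size / 2 + 1;
  for (std::size_t start = 0; start + window_size <= samples.size(); start += stride) {
    std::vector<double> power(num_bins, 0.0);
    for (std::size_t k = 0; k < num_bins; ++k) {
      double re = 0.0;
      double im = 0.0;
      for (std::size_t n = 0; n < window_size; ++n) {
        const double x = samples[start + n] * hann[n];
        const double angle = 2.0 * kPi * k * n / window_size;
        re += x * std::cos(angle);
        im -= x * std::sin(angle);
      }
      power[k] = re * re + im * im;  // squared magnitude of bin k
    }
    frames.push_back(power);
  }
  return frames;
}
```

A real implementation would use an FFT instead of the O(window_size²) inner loops; the direct DFT is used here only because it is short and easy to check.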

Building

To build it, run this command:

bazel build tensorflow/examples/wav_to_spectrogram/...

This builds a binary that you can then run like this:

bazel-bin/tensorflow/examples/wav_to_spectrogram/wav_to_spectrogram

With no flags, it uses a default test audio file from the TensorFlow source tree and writes the image to spectrogram.png in the current directory.

Options

To load your own audio, supply a .wav file in LIN16 format (16-bit linear PCM) and pass its path with the --input_wav flag.
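
LIN16 means uncompressed, signed 16-bit little-endian PCM samples. As a rough illustration of what that implies for the bytes in the file, the hypothetical Lin16ToFloat function below converts the payload of a WAV "data" chunk into floating-point samples, dividing by 32768 so values fall in [-1.0, 1.0). It assumes the RIFF header has already been parsed to locate that chunk and ignores channel interleaving; it is a sketch, not the decoder this example uses.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts 16-bit little-endian linear PCM bytes (LIN16)
// into floats in [-1.0, 1.0). A trailing odd byte, if any, is ignored.
std::vector<float> Lin16ToFloat(const std::vector<std::uint8_t>& pcm_bytes) {
  std::vector<float> samples;
  samples.reserve(pcm_bytes.size() / 2);
  for (std::size_t i = 0; i + 1 < pcm_bytes.size(); i += 2) {
    // Low byte first, then high byte; reinterpret the result as signed 16-bit.
    const std::uint16_t raw = static_cast<std::uint16_t>(
        pcm_bytes[i] | (static_cast<std::uint16_t>(pcm_bytes[i + 1]) << 8));
    const std::int16_t value = static_cast<std::int16_t>(raw);
    samples.push_back(static_cast<float>(value) / 32768.0f);
  }
  return samples;
}
```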

To control how the spectrogram is created, use --window_size to set how many samples each frequency-estimation window spans, and --stride to set how many samples apart the starts of adjacent windows are.
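
The two flags together determine the size of the spectrogram. The sketch below shows that arithmetic under two assumptions that this README does not state: one frame is produced for each window that fits entirely in the audio, and each window is zero-padded to the next power of two before its FFT. ComputeShape, NextPowerOfTwo, and SpectrogramShape are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical result type: dimensions of the spectrogram for one channel.
struct SpectrogramShape {
  std::int64_t time_steps;      // number of windows along the time axis
  std::int64_t frequency_bins;  // FFT bins from 0 Hz up to Nyquist
};

// Smallest power of two that is >= n (for n >= 1).
std::int64_t NextPowerOfTwo(std::int64_t n) {
  std::int64_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

// Assumes window_size > 0 and stride > 0. Under the assumptions stated above,
// a window starts every `stride` samples and only full windows count.
SpectrogramShape ComputeShape(std::int64_t sample_count, std::int64_t window_size,
                              std::int64_t stride) {
  SpectrogramShape shape;
  const std::int64_t remaining = sample_count - window_size;
  shape.time_steps = remaining < 0 ? 0 : 1 + remaining / stride;
  shape.frequency_bins = NextPowerOfTwo(window_size) / 2 + 1;
  return shape;
}
```

For one second of 16 kHz audio (16000 samples) with a window size of 1024 and a stride of 512, this gives 30 time steps of 513 frequency bins each. Halving the stride roughly doubles the number of time steps, while a wider window gives finer frequency resolution at the cost of time resolution.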

The --output_image flag sets the path for the image file. The image is always written in PNG format, even if the path has a different file extension.

If your result seems too dark, try using the --brightness flag to make the output image easier to see.
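
This README doesn't describe exactly how spectrogram values become pixels. A minimal sketch, assuming --brightness acts as a linear gain followed by clipping to the 8-bit range, looks like this; the ToPixels function is hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical helper: maps spectrogram values to 8-bit grayscale pixels by
// multiplying by a brightness gain and clipping to [0, 255]. This assumes a
// linear gain; the real tool may scale its values differently.
std::vector<std::uint8_t> ToPixels(const std::vector<float>& values, float brightness) {
  std::vector<std::uint8_t> pixels;
  pixels.reserve(values.size());
  for (float v : values) {
    const float scaled = std::min(std::max(v * brightness, 0.0f), 255.0f);
    pixels.push_back(static_cast<std::uint8_t>(scaled));  // truncates toward zero
  }
  return pixels;
}
```

Under that assumption, a larger brightness value makes faint frequency bins visible, at the cost of saturating loud bins to full white.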

Here's an example that combines several of these flags:

bazel-bin/tensorflow/examples/wav_to_spectrogram/wav_to_spectrogram \
--input_wav=/tmp/my_audio.wav \
--window_size=1024 \
--stride=512 \
--output_image=/tmp/my_spectrogram.png