Adds missing docs.

PiperOrigin-RevId: 300405855
Change-Id: Iaece469885db17d9a59627f790299358b57bc7d1
Raziel Alvarez 2020-03-11 14:21:57 -07:00 committed by TensorFlower Gardener
parent 604ebadb89
commit 819f1bd4e1
4 changed files with 141 additions and 9 deletions


@ -0,0 +1,75 @@
# Audio "frontend" TensorFlow operations for feature generation
Feature generation (also called the frontend) is the most common module in
audio processing pipelines. It receives raw audio input, and produces filter
banks (a vector of values).

More specifically, the audio signal optionally goes through a pre-emphasis
filter; it is then sliced into (overlapping) frames and a window function is
applied to each frame; afterwards, we take a Fourier transform of each frame
(more specifically a Short-Time Fourier Transform) and calculate the power
spectrum; and finally we compute the filter banks.
## Operations
Here we provide implementations of both TensorFlow and TensorFlow Lite
operations that encapsulate the functionality of the audio frontend.
Both frontend Ops receive audio data and produce as many unstacked frames
(filter banks) as the input audio allows, according to the configuration.
The processing uses a lightweight library to perform:
1. A slicing window function
2. Short-time FFTs
3. Filterbank calculations
4. Noise reduction
5. Auto Gain Control
6. Logarithmic scaling
Please refer to the Op's documentation for details on the different
configuration parameters.
However, it is important to clarify the contract of the Ops:
> *A frontend OP will produce as many unstacked frames as possible with the
> given audio input.*
This means:
1. The output is a rank-2 Tensor, where each row corresponds to the
sequence/time dimension, and each column to the feature dimension.
2. It is expected that the Op will receive the right input (in terms of
positioning in the audio stream, and the amount), as needed to produce the
expected output.
3. Thus, any logic to slice, cache, or otherwise rearrange the input and/or
output of the operation must be handled externally in the graph.
For example, a 200ms audio input will produce an output tensor of shape
`[18, num_channels]` when configured with a `window_size=25ms` and
`window_step=10ms`. The reason is that beyond the 180ms point in the audio
there's not enough audio left to construct a complete window.
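The frame arithmetic above can be sketched as a small helper
(`NumCompleteFrames` is illustrative only, not part of the Op's API):

```c++
// Number of complete windows that fit in the audio without padding:
// windows start every `window_step_ms` and must end within the audio.
static int NumCompleteFrames(int audio_ms, int window_size_ms,
                             int window_step_ms) {
  if (audio_ms < window_size_ms) return 0;
  return 1 + (audio_ms - window_size_ms) / window_step_ms;
}
```

With `audio_ms=200`, `window_size_ms=25`, and `window_step_ms=10`, this
yields the 18 frames quoted above.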
Due to both functional and efficiency reasons, we provide the following
functionality related to input processing:
**Padding.** A boolean flag `zero_padding` that indicates whether to pad the
audio with zeros so that output frames are generated based on the
`window_step` alone. In the example above, this would produce a tensor of
shape `[20, num_channels]` by adding enough zeros to step over all the
available audio while still creating complete windows of audio (some windows
will be partly zeros; in the example above, frames 19 and 20 will contain the
equivalent of 5ms and 15ms of zeros, respectively).
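The padded counts can be sketched the same way (again illustrative helpers,
not the Op's API):

```c++
// With zero padding we step over all available audio: one frame per window
// start that falls inside the audio, i.e. ceil(audio / step).
static int NumPaddedFrames(int audio_ms, int window_step_ms) {
  return (audio_ms + window_step_ms - 1) / window_step_ms;
}

// Milliseconds of zeros needed to complete the window starting at start_ms.
static int ZeroTailMs(int audio_ms, int start_ms, int window_size_ms) {
  int end_ms = start_ms + window_size_ms;
  return end_ms > audio_ms ? end_ms - audio_ms : 0;
}
```

For the example above, `NumPaddedFrames(200, 10)` gives 20 frames, and the
windows starting at 180ms and 190ms need 5ms and 15ms of zeros, respectively.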
<!-- TODO
Stacking. An integer that indicates how many contiguous frames to stack in the output tensor's first dimension, such that the tensor is shaped [-1, stack_size * num_channels]. For example, if the stack_size is 3, the example above would produce an output tensor shaped [18, 120] if padding is false, and [20, 120] if padding is set to true.
-->
**Striding.** An integer `frame_stride` that indicates the striding step used to
generate the output tensor, thus determining the second dimension. In the
example above, with a `frame_stride=3`, the output tensor would have a shape of
`[6, 120]` when `zero_padding` is set to false, and `[7, 120]` when
`zero_padding` is set to true.
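The shapes quoted above are consistent with one output row per `frame_stride`
input frames, each row holding `frame_stride` stacked frames of
`num_channels` features (here assuming `num_channels=40`, which is only
illustrative):

```c++
// Rows: one output row per stride step over the input frames, i.e.
// ceil(num_frames / frame_stride).
static int StridedRows(int num_frames, int frame_stride) {
  return (num_frames + frame_stride - 1) / frame_stride;
}

// Columns: frame_stride stacked frames of num_channels features each.
static int StridedCols(int frame_stride, int num_channels) {
  return frame_stride * num_channels;
}
```

With 18 unpadded frames this gives `[6, 120]`, and with 20 padded frames
`[7, 120]`, matching the example.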
<!-- TODO
Note we would not expect the striding step to be larger than the stack_size
(should we enforce that?).
-->


@ -1,9 +0,0 @@
The binary frontend_main shows sample usage of the frontend, printing out
coefficients when it has processed enough data.
The binary frontend_memmap_main shows a sample usage of how to avoid all the
init code in your runtime, by first running "frontend_generate_memmap" to
create a header/source file that uses a baked in frontend state. This command
could be automated as part of your build process, or you can just use the output
directly.


@ -0,0 +1,65 @@
# Audio "frontend" library for feature generation
A feature generation library (also called frontend) that receives raw audio
input, and produces filter banks (a vector of values).

The raw audio input is expected to be 16-bit PCM samples, with a configurable
sample rate. More specifically, the audio signal optionally goes through a
pre-emphasis filter; it is then sliced into (potentially overlapping) frames
and a window function is applied to each frame; afterwards, we take a Fourier
transform of each frame (more specifically a Short-Time Fourier Transform)
and calculate the power spectrum; and finally we compute the filter banks.
By default the library is configured with a set of defaults for the
different processing tasks, applied via the following function from
frontend_util.c:
```c++
void FrontendFillConfigWithDefaults(struct FrontendConfig* config)
```
A single invocation looks like:
```c++
struct FrontendConfig frontend_config;
FrontendFillConfigWithDefaults(&frontend_config);

int sample_rate = 16000;
struct FrontendState frontend_state;
FrontendPopulateState(&frontend_config, &frontend_state, sample_rate);

int16_t* audio_data = ...;  // PCM audio samples at 16KHz.
size_t audio_size = ...;    // Number of audio samples.
size_t num_samples_read;    // How many samples were processed.
struct FrontendOutput output = FrontendProcessSamples(
    &frontend_state, audio_data, audio_size, &num_samples_read);

for (size_t i = 0; i < output.size; ++i) {
  printf("%d ", output.values[i]);  // Print the feature vector.
}
```
Note that in the above example the frontend consumes as many samples as it
needs from the audio data to produce a single feature vector (according to
the frontend configuration). If not enough samples are available to generate
a feature vector, the returned size will be 0 and the values pointer will be
`NULL`.
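Because of this, callers typically feed audio in a loop, advancing by
`num_samples_read` after each call. The sketch below demonstrates only the
loop pattern: the structs are stand-ins and `StubProcessSamples` replaces the
real `FrontendProcessSamples` (it consumes one 10ms step of 16KHz audio per
call, which is not the real library's behavior):

```c++
#include <stddef.h>
#include <stdint.h>

// Stand-ins for the real frontend types, just to keep this sketch
// self-contained; the real definitions live in the frontend library headers.
struct FrontendOutput {
  const uint16_t* values;
  size_t size;
};
struct FrontendState {
  size_t samples_per_step;
};

// Stub with the same calling shape as FrontendProcessSamples(): consumes one
// window step's worth of samples per call and reports how many were read.
static struct FrontendOutput StubProcessSamples(struct FrontendState* state,
                                                const int16_t* audio,
                                                size_t num_samples,
                                                size_t* num_samples_read) {
  static const uint16_t kFakeFeatures[40] = {0};
  struct FrontendOutput output = {NULL, 0};
  if (num_samples >= state->samples_per_step) {
    *num_samples_read = state->samples_per_step;
    output.values = kFakeFeatures;
    output.size = 40;
  } else {
    *num_samples_read = num_samples;  // Remainder buffered for a later call.
  }
  return output;
}

// Streams `audio_size` samples through the stub and counts emitted frames.
int FramesProduced(size_t audio_size) {
  static int16_t audio[4000] = {0};
  struct FrontendState state = {160};  // 10ms step at a 16KHz sample rate.
  const int16_t* data = audio;
  int frames = 0;
  while (audio_size > 0) {
    size_t num_samples_read = 0;
    struct FrontendOutput output =
        StubProcessSamples(&state, data, audio_size, &num_samples_read);
    if (num_samples_read == 0) break;  // Nothing consumed; avoid spinning.
    data += num_samples_read;
    audio_size -= num_samples_read;
    if (output.values != NULL) ++frames;  // One complete feature vector.
  }
  return frames;
}
```

Against the real library, the same loop shape applies: advance the pointer by
`num_samples_read` and stop (or wait for more audio) when `values` is `NULL`.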
An example of how to use the frontend is provided in frontend_main.cc and its
binary frontend_main. This example expects a path to a file containing `int16`
PCM samples at a sample rate of 16KHz, and upon execution will print out the
coefficients according to the frontend default configuration.
## Extra features
Extra features of this frontend library include a noise reduction module, as
well as a gain control module.
**Noise cancellation.** Removes stationary noise from each channel of the
signal using a low pass filter.

**Gain control.** A dynamic compression based on a novel automatic gain
control, replacing the widely used static compression (such as log or root).
Disabled by default.
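Assuming the config struct mirrors the state structs in this library's
headers (`pcan_gain_control.enable_pcan` below is an assumption inferred from
the `PcanGainControlState` struct; verify the field names against frontend.h
in your checkout), enabling the gain control after filling defaults might
look like:

```c++
struct FrontendConfig config;
FrontendFillConfigWithDefaults(&config);
// Assumed field names; check struct FrontendConfig in frontend.h.
config.pcan_gain_control.enable_pcan = 1;  // Turn on the (default-off) AGC.
```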
## Memory map
The binary frontend_memmap_main shows a sample usage of how to avoid all the
initialization code in your application, by first running
"frontend_generate_memmap" to create a header/source file that uses a baked in
frontend state. This command could be automated as part of your build process,
or you can just use the output directly.


@ -25,6 +25,7 @@ limitations under the License.
extern "C" {
#endif
// Details at https://research.google/pubs/pub45911.pdf
struct PcanGainControlState {
int enable_pcan;
uint32_t* noise_estimate;