Adds missing docs.
PiperOrigin-RevId: 300405855 Change-Id: Iaece469885db17d9a59627f790299358b57bc7d1
parent 604ebadb89
commit 819f1bd4e1
75
tensorflow/lite/experimental/microfrontend/README.md
Normal file
@ -0,0 +1,75 @@
# Audio "frontend" TensorFlow operations for feature generation
The module most commonly shared across audio processing pipelines is feature
generation (also called the frontend). It receives raw audio input and produces
filter banks (a vector of values).

More specifically, the audio signal optionally goes through a pre-emphasis
filter; it is then sliced into (overlapping) frames, and a window function is
applied to each frame. Next, a Fourier transform is computed on each frame
(more precisely, a Short-Time Fourier Transform) and the power spectrum is
calculated, from which the filter banks are computed.

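As an illustration of the first stage only, here is a minimal sketch of a pre-emphasis filter of the common form `y[n] = x[n] - coeff * x[n - 1]`. The function name `pre_emphasis`, the use of `float` samples, and the choice to treat the sample before the first one as zero are assumptions made for this example; they are not taken from the frontend library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the frontend library.
 * Computes out[n] = in[n] - coeff * in[n - 1], treating in[-1] as 0. */
void pre_emphasis(const float* in, float* out, size_t n, float coeff) {
  assert(in != NULL && out != NULL);
  float prev = 0.0f; /* Sample preceding the current one. */
  for (size_t i = 0; i < n; ++i) {
    out[i] = in[i] - coeff * prev;
    prev = in[i];
  }
}
```

A constant input is attenuated after the first sample, which is the intended effect: slowly varying (low-frequency) content is reduced relative to fast changes.
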
## Operations
Here we provide implementations of both a TensorFlow and a TensorFlow Lite
operation that encapsulate the functionality of the audio frontend.

Both frontend Ops receive audio data and produce as many unstacked frames
(filterbanks) as the audio passed in allows, according to the configuration.

The processing uses a lightweight library to perform:

1. A slicing window function
2. Short-time FFTs
3. Filterbank calculations
4. Noise reduction
5. Auto Gain Control
6. Logarithmic scaling

Please refer to the Op's documentation for details on the different
configuration parameters.

However, it is important to clarify the contract of the Ops:

> *A frontend Op will produce as many unstacked frames as possible with the
> given audio input.*

This means:

1. The output is a rank-2 Tensor, where each row corresponds to a step in the
    sequence/time dimension, and each column to the feature dimension.
2. It is expected that the Op will receive the right input (in terms of
    positioning in the audio stream, and the amount), as needed to produce the
    expected output.
3. Thus, any logic to slice, cache, or otherwise rearrange the input and/or
    output of the operation must be handled externally in the graph.

For example, a 200ms audio input will produce an output tensor of shape
`[18, num_channels]` when configured with `window_size=25ms` and
`window_step=10ms`. The reason is that at the 180ms point in the audio there
is not enough audio left to construct a complete window.

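The arithmetic behind the `[18, num_channels]` example can be sketched as follows. The helper name `num_complete_frames` is hypothetical and only mirrors the rule stated above (a frame is produced only when a complete window fits); it is not a function of the Ops or the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of complete windows that fit in the audio.
 * All three arguments use the same unit (milliseconds or samples). */
size_t num_complete_frames(size_t audio_len, size_t window_size,
                           size_t window_step) {
  assert(window_step > 0);
  if (audio_len < window_size) {
    return 0; /* Not even one complete window. */
  }
  /* One frame for the first window, plus one per full step after it. */
  return 1 + (audio_len - window_size) / window_step;
}
```

With `audio_len=200`, `window_size=25` and `window_step=10`, the last complete window starts at 170ms and ends at 195ms; a window starting at 180ms would need audio up to 205ms.
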
For both functional and efficiency reasons, we provide the following
functionality related to input processing:

**Padding.** A boolean flag `zero_padding` that indicates whether to pad the
audio with zeros so that an output frame is generated for every `window_step`.
In the example above, this would generate a tensor of shape
`[20, num_channels]`, by adding enough zeros to step over all the available
audio while still creating complete windows (some of the windows will partly
contain zeros; in the example above, frames 19 and 20 will have the equivalent
of 5ms and 15ms of zeros respectively).

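The padded case can be checked with the same kind of arithmetic. This is a sketch under the assumption, taken from the example above, that with padding one frame is emitted for every step that starts inside the audio; the helper names `num_padded_frames` and `zeros_in_frame` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: frames produced when the tail is zero padded,
 * assuming one frame per window_step that starts inside the audio. */
size_t num_padded_frames(size_t audio_len, size_t window_step) {
  assert(window_step > 0);
  return (audio_len + window_step - 1) / window_step; /* Ceiling division. */
}

/* Hypothetical helper: amount of zero padding needed to complete the frame
 * with the given 1-based index. Units match the other arguments. */
size_t zeros_in_frame(size_t frame_index, size_t audio_len,
                      size_t window_size, size_t window_step) {
  assert(frame_index >= 1);
  size_t end = (frame_index - 1) * window_step + window_size;
  return end > audio_len ? end - audio_len : 0;
}
```

For the 200ms example, frame 19 spans 180ms to 205ms and frame 20 spans 190ms to 215ms, which gives the 5ms and 15ms of zeros mentioned above.
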
<!-- TODO
Stacking. An integer that indicates how many contiguous frames to stack in the output tensor’s first dimension, such that the tensor is shaped [-1, stack_size * num_channels]. For example, if the stack_size is 3, the example above would produce an output tensor shaped [18, 120] if padding is false, and [20, 120] if padding is set to true.
-->

**Striding.** An integer `frame_stride` that indicates the striding step used to
generate the output tensor, thus determining its first dimension (the number
of frames). In the example above, with `frame_stride=3`, the output tensor
would have a shape of `[6, 120]` when `zero_padding` is set to false, and
`[7, 120]` when `zero_padding` is set to true.

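The frame counts in the striding example follow from keeping every `frame_stride`-th frame, starting with the first. The sketch below assumes exactly that selection rule, which is consistent with the shapes quoted above; the helper name `num_strided_frames` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: frames kept when every frame_stride-th frame is
 * emitted, starting with the first one (indices 0, stride, 2*stride, ...). */
size_t num_strided_frames(size_t num_frames, size_t frame_stride) {
  assert(frame_stride > 0);
  return (num_frames + frame_stride - 1) / frame_stride; /* Ceiling division. */
}
```

With 18 unpadded frames and a stride of 3 this yields 6 rows, and with 20 padded frames it yields 7 rows.
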
<!-- TODO
Note we would not expect the striding step to be larger than the stack_size
(should we enforce that?).
-->
@ -1,9 +0,0 @@
The binary frontend_main shows sample usage of the frontend, printing out
coefficients when it has processed enough data.

The binary frontend_memmap_main shows a sample usage of how to avoid all the
init code in your runtime, by first running "frontend_generate_memmap" to
create a header/source file that uses a baked in frontend state. This command
could be automated as part of your build process, or you can just use the output
directly.
65
tensorflow/lite/experimental/microfrontend/lib/README.md
Normal file
@ -0,0 +1,65 @@
# Audio "frontend" library for feature generation

A feature generation library (also called frontend) that receives raw audio
input and produces filter banks (a vector of values).

The raw audio input is expected to be 16-bit PCM samples, with a configurable
sample rate. More specifically, the audio signal optionally goes through a
pre-emphasis filter; it is then sliced into (potentially overlapping) frames,
and a window function is applied to each frame. Next, a Fourier transform is
computed on each frame (more precisely, a Short-Time Fourier Transform) and the
power spectrum is calculated, from which the filter banks are computed.

The library provides a set of defaults for the different processing tasks.
They are applied with the following function from frontend_util.c:

```c++
void FrontendFillConfigWithDefaults(struct FrontendConfig* config);
```

A single invocation looks like:

```c++
struct FrontendConfig frontend_config;
FrontendFillConfigWithDefaults(&frontend_config);
int sample_rate = 16000;
struct FrontendState frontend_state;  // Holds the frontend's processing state.
FrontendPopulateState(&frontend_config, &frontend_state, sample_rate);
int16_t* audio_data = /* ... */;  // PCM audio samples at 16KHz.
size_t audio_size = /* ... */;    // Number of audio samples.
size_t num_samples_read;          // How many samples were processed.
struct FrontendOutput output =
    FrontendProcessSamples(
        &frontend_state, audio_data, audio_size, &num_samples_read);
for (size_t i = 0; i < output.size; ++i) {
  printf("%d ", output.values[i]);  // Print the feature vector.
}
```

Note that in the above example the frontend consumes only as many samples as
it needs from the audio data to produce a single feature vector (according to
the frontend configuration). If not enough samples were available to generate
a feature vector, the returned size will be 0 and the values pointer will be
`NULL`.

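Because each call yields at most one feature vector, a caller that wants every vector has to loop, advancing the input by the number of samples read. The self-contained sketch below illustrates that calling pattern with a toy stand-in: `ToyFrontend`, `toy_process` and `count_frames` are hypothetical names, and the buffering logic is an assumption made for illustration, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for the frontend state (hypothetical, for illustration). */
struct ToyFrontend {
  size_t window_size; /* Samples per window. */
  size_t window_step; /* Samples between window starts. */
  size_t buffered;    /* Samples already held from earlier calls. */
};

/* Consumes samples until one window is complete or the input runs out.
 * Returns 1 if a frame was produced, 0 otherwise; *num_read is set to the
 * number of samples consumed. The sample values are ignored by this toy. */
int toy_process(struct ToyFrontend* f, const int16_t* samples, size_t n,
                size_t* num_read) {
  (void)samples;
  assert(f->window_step > 0 && f->window_step <= f->window_size);
  size_t needed = f->window_size - f->buffered;
  if (n < needed) {
    f->buffered += n; /* Not enough for a frame yet: keep what we got. */
    *num_read = n;
    return 0;
  }
  *num_read = needed;
  f->buffered = f->window_size - f->window_step; /* Keep the overlap. */
  return 1;
}

/* Caller-side loop: keep calling until all the input has been consumed. */
size_t count_frames(struct ToyFrontend* f, const int16_t* audio, size_t n) {
  size_t frames = 0;
  while (n > 0) {
    size_t num_read = 0;
    frames += (size_t)toy_process(f, audio, n, &num_read);
    audio += num_read; /* Skip the samples that were consumed. */
    n -= num_read;
  }
  return frames;
}
```

With a 400-sample window and a 160-sample step (25ms and 10ms at 16KHz), 3200 samples (200ms) produce 18 frames in this toy model, and the trailing samples stay buffered without producing a frame.
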
An example of how to use the frontend is provided in frontend_main.cc and its
binary frontend_main. This example expects a path to a file containing `int16`
PCM samples at a sample rate of 16KHz, and upon execution prints out the
coefficients according to the frontend default configuration.

## Extra features
Extra features of this frontend library include a noise reduction module, as
well as a gain control module.

**Noise cancellation**. Removes stationary noise from each channel of the signal
using a low pass filter.

**Gain control**. A novel automatic gain control based on dynamic compression,
meant to replace the widely used static (such as log or root) compression.
Disabled by default.

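To make the noise cancellation idea concrete, here is a simplified sketch: a one-pole low-pass filter tracks a slowly varying noise estimate per channel, and that estimate is subtracted from the signal. The function `reduce_noise`, its `float` arithmetic and its `smoothing` parameter are assumptions for illustration; the library's own module may differ in arithmetic and parameters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the library's code. For each channel, updates a
 * low-pass noise estimate and subtracts it from the signal, clamping at 0.
 * smoothing in (0, 1]: larger values make the estimate follow faster. */
void reduce_noise(float* signal, float* noise_estimate, size_t num_channels,
                  float smoothing) {
  assert(smoothing > 0.0f && smoothing <= 1.0f);
  for (size_t c = 0; c < num_channels; ++c) {
    /* One-pole low-pass filter: move the estimate toward the signal. */
    noise_estimate[c] += smoothing * (signal[c] - noise_estimate[c]);
    float cleaned = signal[c] - noise_estimate[c];
    signal[c] = cleaned > 0.0f ? cleaned : 0.0f;
  }
}
```

Called once per frame with a persistent `noise_estimate` array, a constant (stationary) input is progressively suppressed, while sudden changes pass through before the estimate catches up.
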
## Memory map
The binary frontend_memmap_main shows a sample usage of how to avoid all the
initialization code in your application, by first running
"frontend_generate_memmap" to create a header/source file that uses a baked in
frontend state. This command could be automated as part of your build process,
or you can just use the output directly.
@ -25,6 +25,7 @@ limitations under the License.
extern "C" {
#endif

// Details at https://research.google/pubs/pub45911.pdf
struct PcanGainControlState {
  int enable_pcan;
  uint32_t* noise_estimate;