Adds missing docs.
PiperOrigin-RevId: 300405855 Change-Id: Iaece469885db17d9a59627f790299358b57bc7d1
This commit is contained in:
parent 604ebadb89
commit 819f1bd4e1
75 tensorflow/lite/experimental/microfrontend/README.md (Normal file)
@ -0,0 +1,75 @@
# Audio "frontend" TensorFlow operations for feature generation

The most common module in audio processing is the feature generation module
(also called the frontend). It receives raw audio input and produces
filter banks (a vector of values).

More specifically, the audio signal (optionally) goes through a pre-emphasis
filter; then gets sliced into (overlapping) frames, and a window function
is applied to each frame; afterwards, we take a Fourier transform of each frame
(or more specifically a Short-Time Fourier Transform) and calculate the power
spectrum; and subsequently compute the filter banks.

## Operations
Here we provide implementations of both TensorFlow and TensorFlow Lite
operations that encapsulate the functionality of the audio frontend.

Both frontend Ops receive audio data and produce as many unstacked frames
(filter banks) as the input audio allows, according to the configuration.

The processing uses a lightweight library to perform:

1. A sliding window function
2. Short-time FFTs
3. Filter bank calculations
4. Noise reduction
5. Automatic gain control
6. Logarithmic scaling

Please refer to the Ops' documentation for details on the different
configuration parameters.

However, it is important to clarify the contract of the Ops:

> *A frontend Op will produce as many unstacked frames as possible with the
> given audio input.*

This means:

1. The output is a rank-2 Tensor, where each row corresponds to the
   sequence/time dimension, and each column is the feature dimension.
2. It is expected that the Op will receive the right input (in terms of
   positioning in the audio stream, and the amount), as needed to produce the
   expected output.
3. Thus, any logic to slice, cache, or otherwise rearrange the input and/or
   output of the operation must be handled externally in the graph.

For example, a 200ms audio input will produce an output tensor of shape
`[18, num_channels]` when configured with `window_size=25ms` and
`window_step=10ms`. The reason is that upon reaching the 180ms point in the
audio there is not enough audio left to construct a complete window.

Due to both functional and efficiency reasons, we provide the following
functionality related to input processing:

**Padding.** A boolean flag `zero_padding` that indicates whether to pad the
audio with zeros such that we generate output frames based on the `window_step`.
This means that in the example above we would generate a tensor of shape
`[20, num_channels]` by adding enough zeros such that we step over all the
available audio and can still create complete windows of audio (some windows
will be partly filled with zeros; in the example above, frames 19 and 20 will
contain the equivalent of 5ms and 15ms of zeros respectively).

<!-- TODO
Stacking. An integer that indicates how many contiguous frames to stack in the
output tensor's first dimension, such that the tensor is shaped
[-1, stack_size * num_channels]. For example, if the stack_size is 3, the
example above would produce an output tensor shaped [18, 120] if padding is
false, and [20, 120] if padding is set to true.
-->

**Striding.** An integer `frame_stride` that indicates the striding step used to
generate the output tensor, thus determining its second dimension. In the
example above, with `frame_stride=3`, the output tensor would have a shape of
`[6, 120]` when `zero_padding` is set to false, and `[7, 120]` when
`zero_padding` is set to true.

<!-- TODO
Note we would not expect the striding step to be larger than the stack_size
(should we enforce that?).
-->
@ -1,9 +0,0 @@
The binary frontend_main shows sample usage of the frontend, printing out
coefficients when it has processed enough data.

The binary frontend_memmap_main shows a sample usage of how to avoid all the
init code in your runtime, by first running "frontend_generate_memmap" to
create a header/source file that uses a baked in frontend state. This command
could be automated as part of your build process, or you can just use the output
directly.

65 tensorflow/lite/experimental/microfrontend/lib/README.md (Normal file)
@ -0,0 +1,65 @@
# Audio "frontend" library for feature generation

A feature generation library (also called a frontend) that receives raw audio
input and produces filter banks (a vector of values).

The raw audio input is expected to be 16-bit PCM samples, with a configurable
sample rate. More specifically, the audio signal goes through a pre-emphasis
filter (optionally); then gets sliced into (potentially overlapping) frames, and
a window function is applied to each frame; afterwards, we take a Fourier
transform of each frame (or more specifically a Short-Time Fourier Transform)
and calculate the power spectrum; and subsequently compute the filter banks.

By default the library is configured with a set of defaults to perform the
different processing tasks. This configuration is applied with the following
function from frontend_util.c:

```c++
void FrontendFillConfigWithDefaults(struct FrontendConfig* config)
```

A single invocation looks like:

```c++
struct FrontendConfig frontend_config;
FrontendFillConfigWithDefaults(&frontend_config);
int sample_rate = 16000;
struct FrontendState frontend_state;
FrontendPopulateState(&frontend_config, &frontend_state, sample_rate);
int16_t* audio_data;      // PCM audio samples at 16KHz.
size_t audio_size;        // Number of audio samples.
size_t num_samples_read;  // How many samples were processed.
struct FrontendOutput output =
    FrontendProcessSamples(
        &frontend_state, audio_data, audio_size, &num_samples_read);
for (size_t i = 0; i < output.size; ++i) {
  printf("%d ", output.values[i]);  // Print the feature vector.
}
```

Something to note in the above example is that the frontend consumes as many
samples as needed from the audio data to produce a single feature vector
(according to the frontend configuration). If not enough samples are available
to generate a feature vector, the returned size will be 0 and the values
pointer will be `NULL`.

An example of how to use the frontend is provided in frontend_main.cc and its
binary frontend_main. This example expects a path to a file containing `int16`
PCM samples at a sample rate of 16KHz, and upon execution will print out
the coefficients according to the frontend default configuration.

## Extra features
Extra features of this frontend library include a noise reduction module, as
well as a gain control module.

**Noise reduction.** Removes stationary noise from each channel of the signal
using a low-pass filter.

**Gain control.** A novel automatic-gain-control-based dynamic compression
that replaces the widely used static (such as log or root) compression.
Disabled by default.

## Memory map
The binary frontend_memmap_main shows a sample usage of how to avoid all the
initialization code in your application, by first running
"frontend_generate_memmap" to create a header/source file that uses a baked-in
frontend state. This command could be automated as part of your build process,
or you can just use the output directly.
@ -25,6 +25,7 @@ limitations under the License.
extern "C" {
#endif

// Details at https://research.google/pubs/pub45911.pdf
struct PcanGainControlState {
  int enable_pcan;
  uint32_t* noise_estimate;