The RecordingMicroAllocator class currently doesn't track variable tensor allocations. This explains why the measured allocations for the keyword model had ~10KB of tail space unaccounted for. This change tracks variable tensor allocations for the keyword model (the test conv model does not have any variable tensors).
Total and tail allocation increase slightly here to handle the additional fields in RecordingMicroAllocator:
TestKeywordModelMemoryThreshold:
-------------------------------
[RecordingMicroAllocator] Arena allocation total 21472 bytes
[RecordingMicroAllocator] Arena allocation head 672 bytes
[RecordingMicroAllocator] Arena allocation tail 20800 bytes
[RecordingMicroAllocator] 'TfLiteTensor struct' used 6048 bytes with alignment overhead (requested 6048 bytes for 54 tensors)
[RecordingMicroAllocator] 'TfLiteTensor quantization data' used 2160 bytes with alignment overhead (requested 2160 bytes for 162 allocations)
[RecordingMicroAllocator] 'TfLiteTensor variable buffer data' used 10240 bytes with alignment overhead (requested 10240 bytes for 7 allocations)
[RecordingMicroAllocator] 'NodeAndRegistration struct' used 1200 bytes with alignment overhead (requested 1200 bytes for 15 NodeAndRegistration structs)
[RecordingMicroAllocator] 'Operator runtime data' used 148 bytes with alignment overhead (requested 148 bytes for 13 OpData structs)
TestConvModelMemoryThreshold:
-----------------------------
[RecordingMicroAllocator] Arena allocation total 12128 bytes
[RecordingMicroAllocator] Arena allocation head 7744 bytes
[RecordingMicroAllocator] Arena allocation tail 4384 bytes
[RecordingMicroAllocator] 'TfLiteTensor struct' used 1680 bytes with alignment overhead (requested 1680 bytes for 15 tensors)
[RecordingMicroAllocator] 'TfLiteTensor quantization data' used 1216 bytes with alignment overhead (requested 1216 bytes for 36 allocations)
[RecordingMicroAllocator] 'TfLiteTensor variable buffer data' used 0 bytes with alignment overhead (requested 0 bytes for 0 allocations)
[RecordingMicroAllocator] 'NodeAndRegistration struct' used 560 bytes with alignment overhead (requested 560 bytes for 7 NodeAndRegistration structs)
[RecordingMicroAllocator] 'Operator runtime data' used 136 bytes with alignment overhead (requested 136 bytes for 5 OpData structs)
PiperOrigin-RevId: 316166016
Change-Id: I7d806f901b39e5d6a73c3baaf11d85fa7f6e17b1
This new test ensures that TF Micro does not regress current allocations (on x86-64 systems) for a canonical model. As RAM reduction changes are introduced, the values in this test can be updated from its console log.
Current output for the keyword model:
Testing TestKeywordModelMemoryThreshold
[RecordingMicroAllocator] Arena allocation total 21440 bytes
[RecordingMicroAllocator] Arena allocation head 672 bytes
[RecordingMicroAllocator] Arena allocation tail 20768 bytes
[RecordingMicroAllocator] 'TfLiteTensor struct allocation' used 6048 bytes (requested 6048 bytes 54 times)
[RecordingMicroAllocator] 'TfLiteTensor quantization data allocations' used 2160 bytes (requested 2160 bytes 162 times)
[RecordingMicroAllocator] 'NodeAndRegistration struct allocation' used 1200 bytes (requested 1200 bytes 15 times)
[RecordingMicroAllocator] 'Operator runtime data allocation' used 148 bytes (requested 148 bytes 13 times)
PiperOrigin-RevId: 315958032
Change-Id: I226f6a01aa555970805388632559241a41ff8342
This change simplifies the interaction between the MicroInterpreter and MicroAllocator. All allocation for a given model is staged in MicroAllocator::StartModelAllocation() and MicroAllocator::FinishModelAllocation().
This change prepares for two upcoming features:
1.) Multi-tenant memory arena
2.) An easy-to-use RecordingMicroInterpreter to allow auditing of recorded memory arena allocations.
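The staged, two-phase flow can be sketched as follows. This is a simplified illustration of the start/finish pattern only; the class internals and signatures here are invented, not the real MicroAllocator API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of a two-phase model allocator: StartModelAllocation
// stages all per-model allocations, FinishModelAllocation commits the plan.
// No further model allocations are accepted after finishing.
class ModelAllocator {
 public:
  ModelAllocator(uint8_t* arena, size_t size)
      : arena_(arena), size_(size), used_(0), started_(false) {}

  // Phase 1: stage allocations for a model (tensors, node structs, ...).
  bool StartModelAllocation(size_t staged_bytes) {
    if (started_ || staged_bytes > size_) return false;
    used_ = staged_bytes;
    started_ = true;
    return true;
  }

  // Phase 2: finalize the plan for this model.
  bool FinishModelAllocation() {
    if (!started_) return false;
    started_ = false;
    return true;
  }

  size_t used_bytes() const { return used_; }

 private:
  uint8_t* arena_;
  size_t size_;
  size_t used_;
  bool started_;
};
```

Pairing start/finish like this is what lets a future multi-tenant arena stage several models against one allocator, one model at a time.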
PiperOrigin-RevId: 315736762
Change-Id: Ia9da1f6edcd1001e3aad975c117905054f172e18
This change is a stepping stone to:
1.) Enable users to use a single MicroAllocator/arena for multiple models.
2.) Enable users to use the new recording allocation APIs for auditing arena allocations.
PiperOrigin-RevId: 315414448
Change-Id: Ied1ea56deb73c09bb64b3e41fd3502b5a4cd5bb8
This new class enables TFLM to measure, audit, and report memory usage in the shared tensor arena. Users may opt in by passing an instance of this class to a MicroInterpreter.
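The recording pattern can be sketched as a subclass layered over a plain bump allocator. The class and method names below are simplified stand-ins, not the real RecordingMicroAllocator API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical base: a simple bump allocator over a fixed arena.
class SimpleArenaAllocator {
 public:
  SimpleArenaAllocator(uint8_t* arena, size_t size)
      : arena_(arena), size_(size), used_(0) {}
  virtual ~SimpleArenaAllocator() = default;

  virtual void* Allocate(size_t bytes) {
    if (used_ + bytes > size_) return nullptr;
    void* p = arena_ + used_;
    used_ += bytes;
    return p;
  }

 protected:
  uint8_t* arena_;
  size_t size_;
  size_t used_;
};

// Recording subclass: counts successful allocations and total requested
// bytes so usage can be audited and reported later.
class RecordingArenaAllocator : public SimpleArenaAllocator {
 public:
  using SimpleArenaAllocator::SimpleArenaAllocator;

  void* Allocate(size_t bytes) override {
    void* p = SimpleArenaAllocator::Allocate(bytes);
    if (p != nullptr) {
      requested_bytes_ += bytes;
      ++count_;
    }
    return p;
  }

  size_t requested_bytes() const { return requested_bytes_; }
  size_t count() const { return count_; }

 private:
  size_t requested_bytes_ = 0;
  size_t count_ = 0;
};
```

Because the recording subclass only wraps the virtual Allocate call, opting in costs nothing to callers that pass the base type around.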
PiperOrigin-RevId: 314995667
Change-Id: I6a451944d55b0498a98f1cfd54244f9008e578d2
This will allow us to implement selective registration of the builtin parse functions without changing the OpResolver base class in TFLite.
* MicroOpResolver is now an interface (matching the OpResolver name in TFLite).
* MicroMutableOpResolver is the implementation of the MicroOpResolver
interface that should be used by applications that do not want to use
AllOpsResolver.
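The interface/implementation split can be sketched as below. The types and registration scheme are deliberately simplified placeholders, not the actual MicroOpResolver/MicroMutableOpResolver classes:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical op function type standing in for a kernel registration.
using OpFunc = int (*)(int);

// Interface: what the interpreter needs to look up an op.
class OpResolverInterface {
 public:
  virtual ~OpResolverInterface() = default;
  virtual OpFunc Find(int builtin_code) const = 0;
};

// Mutable implementation: applications register only the ops their model
// uses, avoiding the code size of linking in every builtin.
class MutableOpResolver : public OpResolverInterface {
 public:
  bool AddBuiltin(int code, OpFunc f) {
    if (count_ >= kMax) return false;
    codes_[count_] = code;
    funcs_[count_] = f;
    ++count_;
    return true;
  }

  OpFunc Find(int code) const override {
    for (size_t i = 0; i < count_; ++i) {
      if (codes_[i] == code) return funcs_[i];
    }
    return nullptr;  // op was never registered
  }

 private:
  static constexpr size_t kMax = 16;
  std::array<int, kMax> codes_{};
  std::array<OpFunc, kMax> funcs_{};
  size_t count_ = 0;
};
```

Because lookups go through the interface, the interpreter never needs to know whether it was handed an all-ops resolver or a selectively populated one.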
PiperOrigin-RevId: 313691276
Change-Id: I0a9f51f6584326a3b3dd645cde083ba42116083d
This change is a precursor to adding a new memory logging MicroAllocator subclass that will enable TFLM to keep track of tensor arena tail allocations. In addition to moving all arena allocations into utility methods, I also cleaned up the organization of the methods inside the .cc file.
PiperOrigin-RevId: 313242666
Change-Id: Icddcc07187419fe314bc57708170cda8cd35690a
Head space is reusable while the tail space is persistent.
This gives some guidance on how much RAM could be saved by using multi-tenant TFLM.
See the test `TestFinishTensorAllocation` in micro_allocator_test.cc for usage.
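The head/tail split described above can be sketched as an arena that allocates from both ends. This is an illustrative toy, not the real TFLM arena code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical two-ended arena: head allocations grow upward and are
// reusable; tail allocations grow downward and persist for the lifetime
// of the model.
class TwoEndedArena {
 public:
  TwoEndedArena(uint8_t* buf, size_t size)
      : buf_(buf), size_(size), head_(0), tail_(size) {}

  // Head: temporary/scratch space, reclaimable between invocations.
  void* AllocateFromHead(size_t bytes) {
    if (head_ + bytes > tail_) return nullptr;  // would collide with tail
    void* p = buf_ + head_;
    head_ += bytes;
    return p;
  }

  // Tail: persistent structs (tensor metadata, op data, ...).
  void* AllocateFromTail(size_t bytes) {
    if (tail_ < head_ + bytes) return nullptr;  // would collide with head
    tail_ -= bytes;
    return buf_ + tail_;
  }

  // Head space is reusable; tail allocations are never released.
  void ResetHead() { head_ = 0; }

  size_t head_used() const { return head_; }
  size_t tail_used() const { return size_ - tail_; }

 private:
  uint8_t* buf_;
  size_t size_;
  size_t head_;
  size_t tail_;
};
```

The multi-tenant savings come from the head: several models can time-share the reusable head region, while each keeps its own persistent tail allocations.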
PiperOrigin-RevId: 310271606
Change-Id: I5ad75fb4a01504ba584d2af036f474869270bba1
This helps to choose the optimal arena size.
- I've also used this tool to adjust the arena size for a few test cases.
- This CL changes the GreedyMemoryPlanner to expose the per-buffer size requirements so that we can estimate whether the remaining arena is enough to plan all buffers.
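With per-buffer size requirements exposed, a quick feasibility check becomes possible. The sketch below is a conservative estimate (it assumes no buffer sharing, i.e. all buffers live at once) and is not the GreedyMemoryPlanner API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns true if the remaining arena is guaranteed to
// fit every buffer even with zero overlap between buffer lifetimes. A greedy
// planner that shares memory between non-overlapping buffers may fit plans
// that this conservative check rejects.
bool CanFitPlan(const std::vector<size_t>& per_buffer_bytes,
                size_t remaining_arena_bytes) {
  size_t total = 0;
  for (size_t b : per_buffer_bytes) total += b;
  return total <= remaining_arena_bytes;
}
```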
PiperOrigin-RevId: 307628733
Change-Id: Id47f578a0bd0b67a3bbbd2a2ef7103d2336b17aa
A few things to note:
- ContextHelper is a helper class that reduces the load on the interpreter. It forwards requests to the allocator while keeping track of the latest node ID.
- Buffers are located in different areas due to their different lifespans. Persistent buffers and scratch buffer handles need to be allocated from the persistent area (tail). Scratch buffers sit together with other tensors (head).
- Buffer handles are located in a "stack". The most recent buffer handle has the lowest address. This optimization saves us from reversing the order of the buffer handle list.
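The handle "stack" can be sketched as placing each new handle just below the previous one, so the newest handle always has the lowest address. The struct layout and helper below are invented for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scratch buffer handle record.
struct ScratchBufferHandle {
  int node_id;
  size_t bytes;
};

// Places a handle immediately below `tail` and returns its address, which
// becomes the new (lower) tail. Successive pushes therefore produce
// descending addresses: the most recent handle sits lowest, and the handles
// form a contiguous, correctly ordered array without a reversal pass.
// Assumes `tail` is suitably aligned for ScratchBufferHandle.
ScratchBufferHandle* PushHandle(void* tail, int node_id, size_t bytes) {
  auto* h = reinterpret_cast<ScratchBufferHandle*>(
      static_cast<uint8_t*>(tail) - sizeof(ScratchBufferHandle));
  h->node_id = node_id;
  h->bytes = bytes;
  return h;
}
```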
PiperOrigin-RevId: 298288221
Change-Id: I7fa55d2b9acf837eafa1c4eedc6d7d339100af95
- Split the FinishTensorAllocation method into several stateless functions.
- Rename TensorInfo to allocationInfo so that later we can use it for buffers as well.
- Move tensor initialization to the constructor
- Move InitializeRuntimeTensor out of MicroAllocator since it shouldn't be called by clients.
- Make MicroArena aligned by default.
PiperOrigin-RevId: 290262535
Change-Id: I2a4d06cb749368919038b17ba18727f7babdc322