A long debug log caused the program to crash.
This debug log is not of much use anyway.
PiperOrigin-RevId: 338264544
Change-Id: I940208380074e74a40e21cf22960fdc8f1c00c4d
All kernels have been ported aside from some optimized versions. The only open issues involve adding buffer/TfLiteEvalTensor API functionality to MicroInterpreter. This change simply cleans up references in the allocation and interpreter code.
PiperOrigin-RevId: 336274016
Change-Id: I0707739fa51b40a1621410639779e576b63bfcb7
The API for SimpleMemoryAllocator should simply set the size of the head buffer and return a pointer to the start of that buffer. The current APIs exposed on this class are confusing and easily mixed up. This change drops those getters and relies on public-facing methods to expose functionality. Additionally, the head APIs are renamed to better reflect what they do: manage the single head buffer allocation.
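The simplified head/tail contract can be sketched as follows. This is an illustrative toy, not the real SimpleMemoryAllocator; the class name and method signatures are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of the simplified API described above: one mutable
// head buffer at the front of the arena, persistent allocations at the back.
class HeadTailArena {
 public:
  HeadTailArena(uint8_t* buffer, size_t size)
      : buffer_(buffer), size_(size), head_used_(0), tail_used_(0) {}

  // Sets the size of the single head buffer and returns a pointer to its
  // start, or nullptr if it would collide with the tail.
  uint8_t* SetHeadBufferSize(size_t size) {
    if (size + tail_used_ > size_) return nullptr;
    head_used_ = size;
    return buffer_;
  }

  // Persistent allocations grow down from the end of the arena.
  uint8_t* AllocateFromTail(size_t size) {
    if (head_used_ + tail_used_ + size > size_) return nullptr;
    tail_used_ += size;
    return buffer_ + size_ - tail_used_;
  }

  size_t used_bytes() const { return head_used_ + tail_used_; }

 private:
  uint8_t* buffer_;
  size_t size_;
  size_t head_used_;
  size_t tail_used_;
};
```

With a single set-and-return-pointer entry point there is nothing to mix up: callers never see separate getters for the head start and head size.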
PiperOrigin-RevId: 336273280
Change-Id: Ibd4218c40962946b90633ca55169595057ea46c3
All allocations are now handled internally in ::CommitStaticMemoryPlan() to improve readability and tracking of allocations.
PiperOrigin-RevId: 335550726
Change-Id: Ia472939d216b950b234e9192fb60206f4a247c91
This change moves scratch buffer request logic from 3 classes (ContextHelper, MicroInterpreter, MicroAllocator) into the MicroAllocator alone. All member variable instances in ContextHelper are dropped in favor of storing temporary request structs in the head section. When a model finishes allocation, one final allocation of the ScratchBufferHandle struct is placed in the tail (one allocation per model). All temp request placeholders in the head section are dropped after the model finishes allocation.
PiperOrigin-RevId: 335009075
Change-Id: Ic8e4e821563dd00a65e85f416df791bba778588d
The current implementation allows EnsureHeadSize() to be called several times and ensures that the head is the size of the largest value specified. An upcoming change will reuse the head for storing kernel-requested scratch buffer allocations instead of member variables on a class (currently ContextHelper). This change allows TFLM to adjust the head size but always revert to the largest model head requirement once memory planning is complete for a model.
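The "largest head wins" behavior can be sketched like this. The class and method names here are illustrative assumptions, not the real TFLM API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical sketch: the head can shrink or grow while a model is being
// planned, but finishing a model's allocation reverts it to the maximum
// size any model has required so far.
class HeadSizeTracker {
 public:
  // Temporarily resize the head for in-flight work (e.g. scratch requests).
  void AdjustHead(size_t size) { current_ = size; }

  // When a model finishes planning, the head reverts to the largest size
  // any model has needed, so a shared arena satisfies all of them.
  void FinishModelAllocation() {
    max_ = std::max(max_, current_);
    current_ = max_;
  }

  size_t head_size() const { return current_; }

 private:
  size_t current_ = 0;
  size_t max_ = 0;
};
```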
PiperOrigin-RevId: 334492115
Change-Id: Ic03e0af7b61acaccd69b2d5aeaea352d201d4c0d
Major changes:
- Scratch buffers are placed in the head during the prepare stage, then moved to the tail once their length is known, before the static memory plan is committed.
- ContextHelper sends RequestScratchBuffer requests in a batch to work around a limitation with temp allocation during the Prepare stage.
PiperOrigin-RevId: 328945674
Change-Id: I09db5c1be0e225904f1c4bf3a5a4a2831a5db438
Currently, some platforms that require offsets during allocation (e.g. the Sparkfun at 32 bits) will fail to allocate. This is due to how the head was adjusted and how the allocation size was requested in MicroAllocator.cc during the memory planning phase (the part that gets committed to the head).
First, this change fixes the actual-bytes-available calculation by taking the requested offset into account. This bug was exposed by the new adjust-head API: all head space was requested as a temp buffer to plan memory usage, and that allocation did not account for offsets properly.
Second, I've simplified the API for head adjustment. The head is a value that can be set with a given requested size plus an offset. The watermark logic has been removed in favor of simplicity: callers (e.g. MicroAllocator) should check whether they need to increase the head size before adjusting it.
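The offset-aware check can be sketched as below. The function name and signature are assumptions for illustration; the point is that the padding consumed by aligning the start pointer must count against the available bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch of setting the head with a requested size plus an
// alignment, accounting for the bytes lost to aligning the start pointer.
// Returns the aligned head pointer, or nullptr if the arena is too small.
uint8_t* SetHeadBufferSize(uint8_t* arena, size_t arena_size,
                           size_t requested_size, size_t alignment) {
  uintptr_t start = reinterpret_cast<uintptr_t>(arena);
  uintptr_t aligned =
      (start + alignment - 1) & ~static_cast<uintptr_t>(alignment - 1);
  size_t offset = aligned - start;  // bytes consumed by alignment padding
  // The available-bytes check must include the offset; otherwise platforms
  // whose arena start is misaligned over-commit (the bug described above).
  if (offset + requested_size > arena_size) return nullptr;
  return reinterpret_cast<uint8_t*>(aligned);
}
```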
PiperOrigin-RevId: 324426138
Change-Id: Ifc683450ba32b9dd9fc5ba587855608a0bc6e311
The major change is in SimpleMemoryAllocator to allow the head space to be reused among different models.
PiperOrigin-RevId: 323470479
Change-Id: If709181da5e9b71222742b2850e6b08d25122a49
This change drastically modifies the way memory is used in TF Micro. Currently, large blocks of persistent memory are allocated for TfLiteTensor structs and any associated quantization data. Instead of this pattern, those TfLiteTensor structs and quantization data will be allocated from the "temp" section of the memory arena.
Instead of allocating a large block of TfLiteTensor structs, a minimal TfLiteEvalTensor struct is allocated. This new struct serves as the source of truth for all buffers in the graph.
Everything still works in the kernel implementations with this change - they are just temporarily slower. All TfLiteTensor structs fetched from GetInput()/GetOutput()/etc. are now allocated on the fly through the temp allocator. Each kernel should be updated to fetch the TfLiteEvalTensor struct in its Eval() block. Additionally, quantization data should be cached in those op kernels.
This CL saves up to 50% on the arena for larger conv-based models.
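The arithmetic behind the savings can be illustrated with stand-in structs. These are not the real TFLite definitions; the fields are assumptions chosen to show why a minimal per-tensor struct shrinks the persistent footprint.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-ins (not the real TFLite structs): the full struct
// carries fields only needed at Prepare time, while the eval struct keeps
// just what Eval() touches.
struct FullTensor {   // stand-in for TfLiteTensor
  void* data;
  int* dims;
  int type;
  int allocation_type;
  size_t bytes;
  void* quantization_params;
  const char* name;
};

struct EvalTensor {   // stand-in for TfLiteEvalTensor
  void* data;
  int* dims;
  int type;
};

// Persistent bytes needed for n tensors under each scheme.
size_t PersistentBytes(size_t per_tensor_bytes, size_t n) {
  return per_tensor_bytes * n;
}
```

With dozens of tensors per model (the benchmark keyword model has 54), the per-tensor difference compounds into the large persistent savings described above.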
PiperOrigin-RevId: 322224278
Change-Id: Id32509a75c9f68177f5bb6b850ea11907afcbb1d
Upcoming changes to memory allocations will remove the global TfLiteTensor allocation. This change prepares the allocator for internal adjustments to memory requests. When the class fully switches over to TfLiteEvalTensor, the TfLitePtrUnion data buffer will be used instead of the existing large allocation on TfLiteContext.
PiperOrigin-RevId: 321882339
Change-Id: Ia33fe5f3f5f10bb5fce3f4a78fbc4e97a4021dae
In the future, all TfLiteTensor structs should be allocated through this API. This allocation allows for a chain of TfLiteTensor objects that can be reset through "ResetTempAllocations()".
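The temp-allocation chain can be sketched as a bump allocator whose cursor rewinds in one shot. Class and method names other than ResetTempAllocations() are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch: temp allocations stack up sequentially and are all
// discarded together by ResetTempAllocations(), mirroring the chain of
// TfLiteTensor objects described above.
class TempAllocator {
 public:
  TempAllocator(uint8_t* buffer, size_t size)
      : buffer_(buffer), size_(size), temp_used_(0) {}

  uint8_t* AllocateTemp(size_t size) {
    if (temp_used_ + size > size_) return nullptr;
    uint8_t* result = buffer_ + temp_used_;
    temp_used_ += size;
    return result;
  }

  // Frees every temp allocation at once by rewinding the cursor.
  void ResetTempAllocations() { temp_used_ = 0; }

  size_t temp_used() const { return temp_used_; }

 private:
  uint8_t* buffer_;
  size_t size_;
  size_t temp_used_;
};
```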
PiperOrigin-RevId: 321211032
Change-Id: I6ab86b8749338590f1457486aa81a39e036534ec
Now that TFLM has completely switched over to the selective registration of
builtin parse functions, we can remove the unnecessary additional parameter.
PiperOrigin-RevId: 319072289
Change-Id: I4a43953e73c54e05b1d9f815bb8cf0605dc45bb8
This method is trivial, and the file is easier to trace by calling the two methods it wraps directly.
PiperOrigin-RevId: 318357169
Change-Id: Ib75aaaf67f4aa6908e5aabc7ce0fb7a84a87608e
It turns out that std::is_same() has dropped the non-string argument in C++17. This breaks internal users who are building against Qualcomm.
PiperOrigin-RevId: 317790812
Change-Id: If56a61d20426670251b55f370a6b5fa886a49e21
Currently, MicroAllocator manually maps TfLite*Array struct values directly to flatbuffer values. This change cleans up other instances inside MicroAllocator that are not endian-aware.
This works only on little-endian (LE) architecture systems because of the layout of TfLite*Array:
struct TfLiteIntArray {
  int size;
  int data[];
};
The compiler maintains this mapping, but |size| and |data| are laid out as follows in LE:
[lowest-order byte (e.g. data) .... highest-order byte (e.g. size)]
Casting and remapping work on LE because the vector is written lowest-order byte first. On BE systems, this memory-saving trick does not work; it requires a malloc from the arena and manually copying values from the flatbuffer.
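The trick can be demonstrated with a fixed-size array standing in for TfLiteIntArray's flexible array member (the raw bytes and helper names here are assumptions for the demo, not TFLM code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Demonstration of the layout trick described above, using a fixed-size
// array in place of TfLiteIntArray's flexible array member. Flatbuffers
// serialize a vector as a little-endian int32 length followed by the
// elements, which matches this struct byte-for-byte only on LE hosts.
struct IntArray3 {
  int32_t size;
  int32_t data[3];
};

bool IsLittleEndian() {
  const int32_t probe = 1;
  uint8_t first_byte;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;  // lowest-order byte is stored first
}

// Raw flatbuffer-style bytes for the vector [10, 20, 30]: LE length 3,
// then three LE int32 elements.
alignas(4) const uint8_t kVectorBytes[16] = {
    3, 0, 0, 0, 10, 0, 0, 0, 20, 0, 0, 0, 30, 0, 0, 0};

// On LE, the serialized bytes can be reinterpreted in place; on BE this
// would instead require copying and byte-swapping into the arena.
const IntArray3* CastVector(const uint8_t* bytes) {
  return reinterpret_cast<const IntArray3*>(bytes);
}
```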
PiperOrigin-RevId: 317730072
Change-Id: I1baff898356e3d82b2faed6468a50ae44acd3082
This change was introduced in cl/316533499 (PR: https://github.com/tensorflow/tensorflow/pull/38121). Lint was complaining about C-style casts; fixing them revealed that the casts were also hiding const usage.
PiperOrigin-RevId: 317680917
Change-Id: I4d874564875e58eb5f6905c7b75562f90588bb22
Currently, TFLM manually allocates a tail chunk to store "quantization" tensor data on TfLiteTensor objects. The size of these allocations varies based on the type of model - conv1d/2d models tend to have large allocations here since quantization data is stored per channel.
This change simply points the scale data at the existing value in the flatbuffer. The flatbuffer schema stores float values as flatbuffers::Vector<float>, and the TfLiteAffineQuantization struct can point its scale pointer at these values. Unfortunately, the zero point values are stored as flatbuffers::Vector<int64_t> and cannot be reused. That allocation will be addressed in a future change.
Keyword Model ~2% reduction in tail allocation:
-----------------------------------------------
[RecordingMicroAllocator] Arena allocation total 21040 bytes
[RecordingMicroAllocator] Arena allocation head 672 bytes
[RecordingMicroAllocator] Arena allocation tail 20368 bytes
[RecordingMicroAllocator] 'TfLiteTensor struct' used 6048 bytes with alignment overhead (requested 6048 bytes for 54 tensors)
[RecordingMicroAllocator] 'TfLiteTensor quantization data' used 1728 bytes with alignment overhead (requested 1728 bytes for 108 allocations)
[RecordingMicroAllocator] 'TfLiteTensor variable buffer data' used 10240 bytes with alignment overhead (requested 10240 bytes for 7 allocations)
[RecordingMicroAllocator] 'NodeAndRegistration struct' used 1200 bytes with alignment overhead (requested 1200 bytes for 15 NodeAndRegistration structs)
[RecordingMicroAllocator] 'Operator runtime data' used 148 bytes with alignment overhead (requested 148 bytes for 13 OpData structs)
Test Conv Model ~10% reduction in tail allocation:
-----------------------------------------------
[RecordingMicroAllocator] Arena allocation total 11680 bytes
[RecordingMicroAllocator] Arena allocation head 7744 bytes
[RecordingMicroAllocator] Arena allocation tail 3936 bytes
[RecordingMicroAllocator] 'TfLiteTensor struct' used 1680 bytes with alignment overhead (requested 1680 bytes for 15 tensors)
[RecordingMicroAllocator] 'TfLiteTensor quantization data' used 768 bytes with alignment overhead (requested 752 bytes for 24 allocations)
[RecordingMicroAllocator] 'TfLiteTensor variable buffer data' used 0 bytes with alignment overhead (requested 0 bytes for 0 allocations)
[RecordingMicroAllocator] 'NodeAndRegistration struct' used 560 bytes with alignment overhead (requested 560 bytes for 7 NodeAndRegistration structs)
[RecordingMicroAllocator] 'Operator runtime data' used 136 bytes with alignment overhead (requested 136 bytes for 5 OpData structs)
PiperOrigin-RevId: 316556393
Change-Id: Iadadab51019d2787d11af9713b3639f087afa7bc
The RecordingMicroAllocator class currently doesn't track variable tensor allocations. This was noticed when the measured allocations for the keyword model left ~10kb of tail space unaccounted for. This change tracks variable tensor allocations for the keyword model (the test conv model does not have any variable tensors).
Total and tail allocation creep up a bit here to handle the additional fields in RecordingMicroAllocator:
TestKeywordModelMemoryThreshold:
-------------------------------
[RecordingMicroAllocator] Arena allocation total 21472 bytes
[RecordingMicroAllocator] Arena allocation head 672 bytes
[RecordingMicroAllocator] Arena allocation tail 20800 bytes
[RecordingMicroAllocator] 'TfLiteTensor struct' used 6048 bytes with alignment overhead (requested 6048 bytes for 54 tensors)
[RecordingMicroAllocator] 'TfLiteTensor quantization data' used 2160 bytes with alignment overhead (requested 2160 bytes for 162 allocations)
[RecordingMicroAllocator] 'TfLiteTensor variable buffer data' used 10240 bytes with alignment overhead (requested 10240 bytes for 7 allocations)
[RecordingMicroAllocator] 'NodeAndRegistration struct' used 1200 bytes with alignment overhead (requested 1200 bytes for 15 NodeAndRegistration structs)
[RecordingMicroAllocator] 'Operator runtime data' used 148 bytes with alignment overhead (requested 148 bytes for 13 OpData structs)
TestConvModelMemoryThreshold:
-----------------------------
[RecordingMicroAllocator] Arena allocation total 12128 bytes
[RecordingMicroAllocator] Arena allocation head 7744 bytes
[RecordingMicroAllocator] Arena allocation tail 4384 bytes
[RecordingMicroAllocator] 'TfLiteTensor struct' used 1680 bytes with alignment overhead (requested 1680 bytes for 15 tensors)
[RecordingMicroAllocator] 'TfLiteTensor quantization data' used 1216 bytes with alignment overhead (requested 1216 bytes for 36 allocations)
[RecordingMicroAllocator] 'TfLiteTensor variable buffer data' used 0 bytes with alignment overhead (requested 0 bytes for 0 allocations)
[RecordingMicroAllocator] 'NodeAndRegistration struct' used 560 bytes with alignment overhead (requested 560 bytes for 7 NodeAndRegistration structs)
[RecordingMicroAllocator] 'Operator runtime data' used 136 bytes with alignment overhead (requested 136 bytes for 5 OpData structs)
PiperOrigin-RevId: 316166016
Change-Id: I7d806f901b39e5d6a73c3baaf11d85fa7f6e17b1
The TFLM team is preparing to provide an "optimized" memory build option. This build option will eliminate non-essential fields from core TFLite structs. The first big change is to reduce the number of pointers on TfLiteTensor. Many models have multiple tensors (e.g. benchmark keyword has 54), and each pointer adds up for TFLM. This cleanup pass removes the soon-to-be-unused 'name' field from TfLiteTensor.
PiperOrigin-RevId: 316000388
Change-Id: I230865014d5a59b78c1c1c9f5eda784f6d611e77
This new test ensures that TF Micro does not regress current allocations (on x86-64 systems) for a canonical model. As RAM reduction changes are introduced, the values in this test can be updated from the console log of this test.
Current output for the keyword model:
Testing TestKeywordModelMemoryThreshold
[RecordingMicroAllocator] Arena allocation total 21440 bytes
[RecordingMicroAllocator] Arena allocation head 672 bytes
[RecordingMicroAllocator] Arena allocation tail 20768 bytes
[RecordingMicroAllocator] 'TfLiteTensor struct allocation' used 6048 bytes (requested 6048 bytes 54 times)
[RecordingMicroAllocator] 'TfLiteTensor quantization data allocations' used 2160 bytes (requested 2160 bytes 162 times)
[RecordingMicroAllocator] 'NodeAndRegistration struct allocation' used 1200 bytes (requested 1200 bytes 15 times)
[RecordingMicroAllocator] 'Operator runtime data allocation' used 148 bytes (requested 148 bytes 13 times)
PiperOrigin-RevId: 315958032
Change-Id: I226f6a01aa555970805388632559241a41ff8342
This change simplifies the interaction between the MicroInterpreter and MicroAllocator. All allocation for a given model is staged in MicroAllocator.StartModelAllocation() and MicroAllocator.FinishModelAllocation().
This change prepares for two upcoming features:
1.) Multi-tenant memory arena
2.) An easy-to-use RecordingMicroInterpreter to allow auditing of recorded memory arena allocations.
PiperOrigin-RevId: 315736762
Change-Id: Ia9da1f6edcd1001e3aad975c117905054f172e18
This change is a stepping stone to:
1.) Enable users to use a single MicroAllocator/arena for multiple models.
2.) Enable users to use the new recording allocation APIs for auditing arena allocations.
PiperOrigin-RevId: 315414448
Change-Id: Ied1ea56deb73c09bb64b3e41fd3502b5a4cd5bb8
This new class enables TFLM to measure, audit, and report memory usage in the shared tensor arena. Users may opt in by simply passing an instance of this class into a MicroInterpreter.
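The recording idea behind the per-category report lines shown in later commits can be sketched as a small tally helper. This is an assumption-laden toy, not the real RecordingMicroAllocator.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch: wrap allocation calls and tally requested vs.
// actually-used (alignment-padded) bytes per category, the way each
// "[RecordingMicroAllocator] '...' used X bytes (requested Y bytes N times)"
// report line groups its numbers.
class AllocationRecorder {
 public:
  void Record(size_t requested, size_t alignment) {
    requested_bytes_ += requested;
    // Round up to the alignment to account for padding overhead.
    used_bytes_ += (requested + alignment - 1) / alignment * alignment;
    ++count_;
  }
  size_t requested_bytes() const { return requested_bytes_; }
  size_t used_bytes() const { return used_bytes_; }
  size_t count() const { return count_; }

 private:
  size_t requested_bytes_ = 0;
  size_t used_bytes_ = 0;
  size_t count_ = 0;
};
```

One recorder per category (tensor structs, quantization data, op data, etc.) is enough to reproduce a report of the shape shown above.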
PiperOrigin-RevId: 314995667
Change-Id: I6a451944d55b0498a98f1cfd54244f9008e578d2
With this CL:
* We have the hooks needed to register an operator-specific parse function with
MicroMutableOpResolver and then retrieve it without ParseOpData being used.
* This CL still passes ParseOpData as the operator-specific parse function;
that will be changed in a follow-on CL.
PiperOrigin-RevId: 314982707
Change-Id: I174259aabd66e97184a8a282832f6c71580366c9
This new helper class will enable TFLM to log and record where the allocations in the shared arena are going. A future change will use this new class in a special "recording" MicroAllocator subclass. All of these logging mechanisms will be opt-in.
PiperOrigin-RevId: 313843072
Change-Id: I3fc9205e475e89b4a3795c3cc79c31d2166da2c8
This will allow us to implement selective registration of the builtin parse
functions without changing the OpResolver base class in TFLite.
* MicroOpResolver is now an interface (matching the OpResolver name in TFLite).
* MicroMutableOpResolver is the implementation of the MicroOpResolver
interface that should be used by applications that do not want to use
AllOpsResolver.
PiperOrigin-RevId: 313691276
Change-Id: I0a9f51f6584326a3b3dd645cde083ba42116083d