AI on a Coin Battery. Forget GPUs. Forget cloud servers. TinyML is the art of running Artificial Intelligence on hardware that costs less than $5 and runs for a year on a CR2032 battery.
We are talking about Microcontrollers (MCUs) like the Arduino Nano 33 BLE, the ESP32, or the STM32 series. These chips often have less than 256KB of RAM.
The Constraint Triangle
When deploying TinyML, you are juggling three competing variables in a constant trade-off. You can usually optimize for two, but you sacrifice the third:
Accuracy: How well does the model detect the wake word or anomaly?
Latency: How fast does it react? (Crucial for vibration monitoring).
Memory Size: Does it actually fit on the flash storage and run within the RAM limits?
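Before touching hardware, it helps to sanity-check the memory leg of the triangle on paper. The function below is a hypothetical back-of-the-envelope check (the names, default limits, and headroom factors are assumptions, not from any SDK), illustrating that the model binary lives in flash while the runtime tensor arena lives in RAM:

```python
def fits_on_mcu(model_kb: float, arena_kb: float,
                flash_kb: float = 1024, ram_kb: float = 256) -> bool:
    """Rough go/no-go check for a TinyML deployment.

    model_kb: size of the quantized model file (stored in flash)
    arena_kb: tensor arena needed at runtime (lives in RAM)
    flash_kb / ram_kb: board limits (hypothetical defaults)
    """
    # Leave headroom for the application itself: firmware, stacks, buffers.
    flash_budget = flash_kb * 0.8
    ram_budget = ram_kb * 0.5
    return model_kb <= flash_budget and arena_kb <= ram_budget

# A 90 KB keyword-spotting model with a 40 KB arena fits a typical board.
print(fits_on_mcu(model_kb=90, arena_kb=40))  # → True
```

The 80%/50% headroom figures are illustrative; on a real project you would read the linker map to see what your firmware actually consumes.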
Sizing Guide: Cortex-M vs. Ethos-U
Not all MCUs are created equal. Knowing your hardware is step one.
1. Standard Arm Cortex-M4 / M7. These are general-purpose microcontrollers.
Typical RAM: 128KB - 1MB.
Model Capacity: Suitable for Keyword Spotting (KWS) like "Ok Google", or simple accelerometer gesture recognition.
Max Model Size: ~50KB to 100KB (int8 quantized).
2. Arm Ethos-U (NPU). These are microNPU accelerators designed specifically for ML on the edge.
Capabilities: Can run Vision models (Person Detection) at 30 FPS.
Efficiency: Offloads the math from the Cortex-M CPU, saving massive power.
Max Model Size: Can efficiently handle significantly larger models (up to several MB, depending on external flash).
Techniques for Shrinking Models
Rule of Thumb: If your model is in 32-bit Float (FP32), it is 4x bigger than it needs to be.
Quantization (Post-Training). This is mandatory. By converting your model's weights from 32-bit floats to 8-bit integers (int8), you reduce the model size by 75% instantly. On MCUs without a Floating Point Unit (FPU), this also speeds up inference by 10x-50x.
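In practice a converter toolchain (such as TensorFlow Lite's) does this for you, but the underlying arithmetic is simple affine quantization. The numpy sketch below is an illustration of that math under the standard int8 scheme, not any library's actual implementation:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Affine (asymmetric) quantization: map [min, max] onto [-128, 127]."""
    scale = (w.max() - w.min()) / 255.0
    # Choose zero_point so that w.min maps to -128.
    zero_point = int(np.round(-128 - w.min() / scale))
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.1, size=1000).astype(np.float32)  # fake layer weights
q, scale, zp = quantize_int8(w)

print(w.nbytes, q.nbytes)  # 4000 1000 -> the 75% size reduction
# Round-trip error is bounded by one quantization step:
print(np.abs(dequantize(q, scale, zp) - w).max() <= scale)  # → True
```

Note that the size drops by exactly 4x while the reconstruction error stays within one quantization step, which is why post-training quantization usually costs only a fraction of a percent of accuracy.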
Pruning. Neural networks are over-parameterized: many connections (weights) are close to zero and contribute almost nothing to the output. Pruning sets these to exactly zero, allowing compression algorithms and sparse storage formats to skip them.
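The simplest variant is magnitude pruning: pick a target sparsity and zero out the smallest-magnitude weights. The numpy sketch below is a one-shot illustration; real toolchains (e.g., the TensorFlow Model Optimization Toolkit) prune gradually during training so accuracy can recover:

```python
import numpy as np

def prune_by_magnitude(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w).astype(w.dtype)

rng = np.random.default_rng(1)
w = rng.normal(0, 0.1, size=(64, 64)).astype(np.float32)  # fake weight matrix
pruned = prune_by_magnitude(w, sparsity=0.5)

# Roughly half the weights are now exactly zero and compress to almost nothing.
print(float(np.mean(pruned == 0)))
```

The dense array is the same size in RAM until you pair pruning with a sparse format or a compressor, which is why pruning is usually combined with quantization rather than used alone.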
Conclusion
TinyML is not about being "state of the art" in accuracy. It is about being "good enough" to solve a problem for $2 in hardware. Start with TensorFlow Lite for Microcontrollers or Edge Impulse, and always assume you have less RAM than you think.