AI/ML Model Development Platform for Embedded MCU IoT Edge Gateway

This post was originally published by Prasan Dutt Raju at Medium [AI]


With billions of devices now connecting to the Internet, more and more intelligent gateways are needed to act as the backbone of the complex infrastructure behind the Internet of Things (IoT). Demand is growing in particular for edge gateways: gateways that run on low power in constrained environments while still processing as much data as possible locally.

An edge gateway processes sensor data, runs inference and, in some cases, detects anomalies before sending data to the cloud. This capability is very important for avoiding the higher charges incurred by heavy use of cloud infrastructure. Further, to reduce hardware cost, there is a trend towards designing micro-controller (MCU) based edge gateways. An MCU edge gateway can run a Real-Time Operating System (RTOS) and perform a fixed set of tasks, which saves power and lets it operate within limited RAM and flash memory.

For example, a simple weather station for an agricultural field needs to process temperature and humidity data. This application requires a controller with an ADC (Analog-to-Digital Converter) to sample analog values and GPRS capability over a serial interface. There is no need for a gateway built on a high-end, multi-purpose processor. However, there is a need to detect patterns in the sensor data and send only valuable data or anomalies to the cloud. Achieving this at the lowest possible power is not possible without adding Artificial Intelligence / Machine Learning (AI/ML) capability at the edge.

Embedding trained models into the firmware adds AI/ML capability and makes the edge intelligent. In the weather station example, humidity is more likely to fluctuate in tropical climates, and controlled irrigation plays a major role in yield. Getting timely data and spotting any anomaly in the humidity readings is therefore key to making predictive decisions. Thus, deploying machine learning inference models at the edge, along with edge analytics, results in refined data at the edge.
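To make the idea of "send only anomalies" concrete, here is a minimal sketch in Python. It is not taken from any of the platforms discussed in this article: it assumes a simple running mean and variance (Welford's method) with a z-score test, and the class name, threshold and sample humidity values are all made up for illustration. Real firmware would typically be written in C and would use a trained model rather than this basic statistic.

```python
import math


class OnlineAnomalyDetector:
    """Running mean/variance (Welford's method) with a z-score test."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # z-score above which a reading is anomalous
        self.warmup = warmup        # readings to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def learn(self, value):
        """Fold one reading into the running statistics."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def is_anomaly(self, value):
        """Return True when the reading is far from what has been seen so far."""
        if self.count < self.warmup:
            return False
        std = math.sqrt(self.m2 / (self.count - 1))
        if std == 0.0:
            return value != self.mean
        return abs(value - self.mean) / std > self.threshold


def filter_for_upload(readings, detector):
    """Return only the readings worth sending to the cloud."""
    to_upload = []
    for value in readings:
        if detector.is_anomaly(value):
            to_upload.append(value)
        else:
            detector.learn(value)  # only normal readings update the baseline
    return to_upload


# Hypothetical relative humidity readings (%), with one spike.
humidity = [61.0, 62.5, 60.8, 61.7, 62.1, 61.3, 95.0, 61.9, 62.4]
print(filter_for_upload(humidity, OnlineAnomalyDetector()))
```

With these sample readings, only the 95.0 spike is selected for upload; the other eight values stay on the gateway.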

AI/ML models are built with particular frameworks such as Keras, TFLite (TensorFlow Lite), ONNX, Lasagne, Caffe, ConvNetJS etc. Written in C++ 11, TFLite officially supports the ARM Cortex-M series micro-controller architecture, and there is direct support for some popular development boards. In spite of this support, there are some limitations:

  • Support for a limited subset of TensorFlow operations
  • Support for a limited set of devices
  • Low-level C++ API requiring manual memory management
  • On-device training is not supported

Further, developing trained models from raw sensor data using these individual frameworks from scratch requires knowledge of data science, algorithms and other ML practices. The resulting increase in time-to-market is the biggest challenge in implementing AI/ML at the edge.

Since ARM is the leading architecture choice for MCU based gateways, we looked for partners providing AI/ML support. We found a comprehensive list of solution providers who are partners of the STM32 ecosystem. These providers not only address the limitations mentioned above, but also offer an all-in-one solution for end-to-end firmware development.

We looked into the nuts and bolts of two major solutions: Cartesium AI and Edge Impulse. Below is a detailed walk-through of each platform.

Cartesium.AI: NanoEdge AI Studio

Cartesium is one of the leading partners of the STM ecosystem. With its expertise in AI, Cartesium developed NanoEdge AI Studio, a technology that enables machine learning and AI analysis on small-footprint, low-energy devices such as STM32 micro-controllers. NanoEdge AI Studio takes away the complexity of building a pipeline from the data: it automatically finds the model that fits your need and generates the code for it. The code can then be integrated into your application, which can act on the AI/ML decisions.

It allows complex AI models to be loaded onto standard ARM processors for advanced analysis of data from multiple sensors. It not only performs analysis but also supports learning inside the micro-controller, without the help of complex cloud infrastructure.

There are three pricing tiers: Entry, Pro and Corp. The Entry level comes with a 2-month trial account with limited functionality. Developers need to download the NanoEdge AI Studio software in order to start working with it. A screenshot of the software is attached below:

Image for post

Device On-boarding

One key feature of this platform is the flexibility to choose the target ARM Cortex-M architecture, such as M0, M0+, M1, M3, M4, M23, M33, M7 etc. The latest updates to the platform add direct support for STM Nucleo boards.

Data Acquisition

Raw sensor data can be captured in only two ways: from a plain text file or directly from a serial port. The process of data acquisition and library generation is shown in the flow chart below:

Image for post

Model Parameter Design

Model parameters cannot be set in the trial account, and developers cannot see which ML algorithm is chosen for model training. With a Pro user account, there is support for new algorithms and use case validation.

Training Model with Anomalies

Abnormal signals are introduced at this stage so that the model learns to recognise unusual signal behavior under actual field conditions. This step is followed by an optimize-and-benchmark step, where the model is retrained with some optional parameters until balanced accuracy and confidence reach 90% or above, as shown in the flowchart.
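The article does not say exactly how NanoEdge AI Studio computes balanced accuracy, so the sketch below assumes the common definition: the average of per-class recall. It shows why this metric matters when abnormal signals are rare. The labels and counts are hypothetical.

```python
def balanced_accuracy(true_labels, predicted_labels):
    """Mean of per-class recall, so a rare class counts as much as a common one."""
    classes = sorted(set(true_labels))
    recalls = []
    for cls in classes:
        total = sum(1 for t in true_labels if t == cls)
        correct = sum(
            1 for t, p in zip(true_labels, predicted_labels) if t == cls and p == cls
        )
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


# Hypothetical benchmark: 8 nominal signals and 2 abnormal ones,
# where the model misses one of the two abnormal signals.
truth = ["nominal"] * 8 + ["abnormal"] * 2
guess = ["nominal"] * 8 + ["nominal", "abnormal"]

plain = sum(t == g for t, g in zip(truth, guess)) / len(truth)
print(plain, balanced_accuracy(truth, guess))
```

In this example plain accuracy is 0.9 while balanced accuracy is only 0.75, because missing half of the abnormal signals is weighted as heavily as the nominal class.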

Live Test (Emulator) / Model Testing

NanoEdge AI Studio allows developers to test the model on a hardware emulator, where it should behave as it would on an actual MCU. This gives confidence in the model before releasing production-ready firmware.


The trained model is delivered as a static .a file, the library used by the AI core components. It provides smart functions (learn, detect, …) as building blocks for implementing smart features in your C code, to be embedded in micro-controllers. The output file can be used in STM32 CubeEdge IDE for firmware development. However, an entry level account only allows library files for supported boards to be downloaded. With a Pro user account, an unlimited number of prototypes is available on any ARM Cortex-M core.


The demo account comes with a 3-month trial, and the platform is not fully open source. It doesn’t allow time series model training. Documentation is aggregated, but the platform still seems to be evolving.


Edge Impulse

Edge Impulse is an advanced AI/ML edge platform with a wide range of device integration support. With its Edge Optimized Neural (EON)™ technology, it enables valuable use of the 99% of sensor data that is discarded today due to cost, bandwidth or power constraints. Edge Impulse aims for maximum efficiency on a wide range of hardware, from MCUs to CPUs. It is free for developers and trusted by enterprises.

As shown below, it works through a web based portal called Edge Impulse Studio.

Image for post

Device On-boarding

Edge Impulse Studio provides direct support for acquiring sensor data from the following boards: ST IoT Discovery Kit, Arduino Nano 33 BLE Sense, Eta Compute ECM3532 AI Sensor and OpenMV Cam H7 Plus. Apart from these directly supported boards, it can acquire smartphone sensor data in 3 simple steps. Developers can easily train models using phone sensor data, and Edge Impulse allows easy integration to build AI/ML based applications.

To acquire data from any other device, there are options to use the data forwarder over a serial interface or to directly upload existing datasets in formats such as WAV, JPG, PNG, CBOR or JSON.
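As a rough illustration of what forwarding serial data involves, the sketch below turns text lines, as a board might print them over a serial port, into a payload-like dictionary. It assumes one sample per line with comma-separated values; the function name, axis names and device type are hypothetical, and this is not the Edge Impulse data forwarder itself. Lines are passed in as plain strings so the example runs without any serial hardware.

```python
def parse_serial_lines(lines, axis_names, units, interval_ms, device_type):
    """Convert comma-separated text lines into a payload-like dictionary."""
    values = []
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != len(axis_names):
            continue  # skip partial lines or unrelated output such as boot messages
        try:
            values.append([float(field) for field in fields])
        except ValueError:
            continue  # skip lines that contain non-numeric text
    return {
        "device_type": device_type,
        "interval_ms": interval_ms,
        "sensors": [{"name": name, "units": units} for name in axis_names],
        "values": values,
    }


# Hypothetical serial output, including some noise that should be ignored.
raw_lines = [
    "boot ok\n",
    "-9.81,0.03,1.21\n",
    "-9.83,0.04\n",
    "x,y,z\n",
    "-9.12,0.03,1.23\r\n",
]
payload = parse_serial_lines(
    raw_lines, ["accX", "accY", "accZ"], "m/s2", 10, "CUSTOM-BOARD"
)
print(payload["values"])
```

Skipping malformed lines rather than raising an error is a design choice for this sketch: serial links on prototypes often carry stray debug output, and dropping a sample is usually preferable to aborting a capture.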

Data Acquisition/ Test and Training data

Data can be captured from the list of on-boarded devices. There is an option to acquire test data and then training data. Also, users can manually define the number of samples needed to be captured.

Model Parameter Design

The Edge Impulse Data Acquisition format is an optimized binary format for time-series data. It allows cryptographic signing of the data when sampling is complete to prove authenticity of data. Data is encoded using CBOR (Concise Binary Object Representation) or JSON and signed according to the JSON Web Signature specification.
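The signing step can be sketched with Python's standard library. This example assumes HMAC-SHA256 (the "HS256" algorithm) computed over the JSON-encoded document while the signature field holds a placeholder of 64 zeros, which is then replaced by the hex digest. That procedure and the function names are assumptions for illustration; the official format specification should be consulted for the exact byte-level rules.

```python
import hashlib
import hmac
import json

# Placeholder with the same length as a hex-encoded SHA-256 digest (assumption).
EMPTY_SIGNATURE = "0" * 64


def sign_sample(payload, hmac_key, created_at=None):
    """Wrap a payload in a signed envelope using HMAC-SHA256."""
    message = {
        "protected": {"ver": "v1", "alg": "HS256"},
        "signature": EMPTY_SIGNATURE,
        "payload": payload,
    }
    if created_at is not None:
        message["protected"]["iat"] = int(created_at)  # seconds since epoch
    encoded = json.dumps(message).encode()
    digest = hmac.new(hmac_key.encode(), encoded, hashlib.sha256).hexdigest()
    message["signature"] = digest
    return message


def verify_sample(message, hmac_key):
    """Recompute the digest with the placeholder restored and compare."""
    unsigned = dict(message, signature=EMPTY_SIGNATURE)
    encoded = json.dumps(unsigned).encode()
    expected = hmac.new(hmac_key.encode(), encoded, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["signature"])


sample = {"interval_ms": 10, "values": [[-9.81, 0.03, 1.21]]}
signed = sign_sample(sample, "hypothetical-hmac-key", created_at=1564128599)
print(verify_sample(signed, "hypothetical-hmac-key"))
```

Verification only succeeds when the receiver encodes the document in exactly the same way as the sender, which is why a real implementation has to follow the specified encoding rather than an arbitrary JSON serialisation.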

Known as impulse design, this stage offers a range of parameter settings and thus allows precise control over model training. There are two major segments in impulse design: the processing block and the learning block.

The processing block contains the most common signal processing options, such as spectral analysis for repetitive motion data, flattening an axis for slow-moving data like temperature, and image and audio processing. If there is a requirement to detect unusual sensor data, the option to add a custom processing block is helpful.
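To give a feel for what a flatten-style processing step does, the sketch below reduces one window of slow-moving data to a handful of summary statistics. The article does not list which statistics the platform computes, so the set chosen here (mean, min, max, RMS, standard deviation) is an assumption, as are the function name and the temperature values.

```python
import math


def flatten_features(samples):
    """Summarise one window of a single axis with a few statistics."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / n  # population variance
    return {
        "mean": mean,
        "min": min(samples),
        "max": max(samples),
        "rms": math.sqrt(sum(x * x for x in samples) / n),
        "std": math.sqrt(variance),
    }


# Hypothetical temperature window in degrees Celsius.
temperature_window = [24.1, 24.3, 24.2, 24.6, 24.8]
print(flatten_features(temperature_window))
```

The learning block then sees five numbers per window instead of every raw sample, which keeps the model small enough for an MCU.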

Similar to processing blocks, there are different types of machine learning blocks, each with their own unique benefits and drawbacks. Edge Impulse gives the ability to choose learning blocks such as Neural network (Keras), K-Means anomaly detection, Transfer Image Learning etc.
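The idea behind K-means anomaly detection can be sketched in a few lines: cluster feature vectors from normal operation, then score a new vector by its distance to the nearest cluster centre. This is a plain Lloyd's algorithm with a deterministic start, written for illustration only; it is not Edge Impulse's implementation, and the feature values are invented.

```python
import math


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit_kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm, starting from the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def anomaly_score(point, centroids):
    """Distance to the nearest centroid; larger means more unusual."""
    return min(distance(point, c) for c in centroids)


# Hypothetical 2-D feature vectors from normal operation (two operating modes).
normal = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.9), (5.1, 4.9), (0.9, 1.1), (4.9, 5.2)]
centroids = fit_kmeans(normal, k=2)

print(anomaly_score((1.1, 1.0), centroids))  # close to a known cluster
print(anomaly_score((3.0, 3.0), centroids))  # far from both clusters
```

A threshold on the score, chosen from validation data, then separates normal readings from anomalies.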

It allows time series data capture by default, where the window size and window increase can be set manually. A typical captured sample, shown here in JSON notation, looks as follows:


{
    "protected": {
        "ver": "v1",
        "alg": "HS256",
        "iat": 1564128599
    },
    "signature": "b0ee0572a1984b93b6bc56e6576e2cbbd6bccd65d0c356e26b31bbc9a48210c6",
    "payload": {
        "device_name": "ac:87:a3:0a:2d:1b",
        "device_type": "DISCO-L475VG-IOT01A",
        "interval_ms": 10,
        "sensors": [
            { "name": "accX", "units": "m/s2" },
            { "name": "accY", "units": "m/s2" },
            { "name": "accZ", "units": "m/s2" }
        ],
        "values": [
            [ -9.81, 0.03, 1.21 ],
            [ -9.83, 0.04, 1.27 ],
            [ -9.12, 0.03, 1.23 ],
            [ -9.14, 0.01, 1.25 ]
        ]
    }
}

Here, iat stands for the date when the file was created, in seconds since epoch. It is optional; only set it when the device creating the file has an accurate clock.
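The window size and window increase mentioned earlier determine how a captured series is cut into training examples. A minimal sketch follows, assuming both settings are given in milliseconds and converted to sample counts using interval_ms; the function name and the chosen settings are hypothetical.

```python
def sliding_windows(values, interval_ms, window_size_ms, window_increase_ms):
    """Split a time series into overlapping windows of equal length."""
    per_window = window_size_ms // interval_ms   # samples in each window
    step = window_increase_ms // interval_ms     # samples to advance each time
    if per_window <= 0 or step <= 0:
        raise ValueError("window size and increase must cover at least one sample")
    windows = []
    start = 0
    while start + per_window <= len(values):
        windows.append(values[start:start + per_window])
        start += step
    return windows


# The four accelerometer samples from the example above, captured every 10 ms.
values = [
    [-9.81, 0.03, 1.21],
    [-9.83, 0.04, 1.27],
    [-9.12, 0.03, 1.23],
    [-9.14, 0.01, 1.25],
]
windows = sliding_windows(values, interval_ms=10, window_size_ms=20, window_increase_ms=10)
print(len(windows))
```

With a 20 ms window and a 10 ms increase, the four samples yield three overlapping windows of two samples each. A smaller increase produces more training windows from the same recording, at the cost of more overlap between them.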

Retraining Model with Anomalies

Once impulse design is complete, the model can be retrained with known parameters. This step gives a complete picture of all parameters taken into consideration for retraining the model.

Live Test (Emulator) / Model Testing

Live testing is not as real-time as in NanoEdge AI Studio, but the results are similar. The test results appear after a defined time period, as shown below:

Image for post


There is direct integration with version control for the generated model output. Direct support for generating Arduino libraries gives the flexibility of quick prototyping. Direct support for generating an STM32Cube.AI pack allows industrial grade firmware development on the trusted STM32 ARM ecosystem. Direct availability of C++ libraries allows integration of trained models on any type of ARM core architecture. It is also possible to run the models on micro-controllers running Zephyr RTOS.


One significant advantage of Edge Impulse is the availability of an SDK and API. Almost all services and steps discussed in this article can be automated through API integration. It is free to use, and the SDK is open-source. The documentation is well organized, which supports a seamless development cycle.

Comparative Decision

This article presented a detailed description of Cartesium NanoEdge AI and Edge Impulse. Edge Impulse clearly has more capabilities and is more developer friendly, and its open source SDKs and APIs allow wider adoption. Cartesium NanoEdge AI still seems to be evolving, but it promises full support and more possibilities in the Pro and Corp accounts.

Finally, we at the Techolution IoT practice recommend Edge Impulse for adding AI/ML capabilities at the edge gateway. It offers end-to-end capability with precise control over AI/ML model design. However, integrating all the firmware code pieces in an automated way and at scale requires significant effort from a pool of experts.
