Rust on Embedded Linux
We made the pragmatic decision early on to build our libraries on top of Tokio, which supports common operating systems like Linux, Windows, and macOS. Although Rust can target bare metal, the async and library ecosystem for no_std is still evolving rapidly.
Embedded Linux
Despite this limitation, our libraries are still a good fit for resource-constrained environments when deployed on embedded Linux. Every release of our software includes pre-compiled libraries for popular embedded Linux architectures such as ARMv6/v7 and ARM64.
These devices have limited RAM and CPU, so running native code is desirable. Fortunately, Rust's performance is on par with C and C++. Like them, it compiles ahead of time to native machine code and runs without a garbage collector.
Library Size
Embedded systems usually have less storage than servers and workstations. Flash memory used to be a lot more expensive than it is today, but footprint can still matter. I compiled the C bindings for DNP3 for ARMv6 under various configurations to illustrate some tradeoffs. The binary artifacts we build on release are not optimized for size:
- They include a full TLS stack by default.
- Debug symbols are not stripped.
- The compiler is configured to optimize for speed, not size.
Size (KB) | Remove TLS | Stripped | Optimize for size |
---|---|---|---|
9260 | | | |
6668 | ✓ | | |
3496 | | ✓ | |
2236 | ✓ | ✓ | |
1900 | ✓ | ✓ | ✓ |
The biggest savings come from stripping debug symbols, followed by removing TLS support. In the future, it would be nice to also be able to remove client or server support when only one is needed.
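To make the comparison concrete, the short Rust sketch below computes each configuration's savings relative to the default build. The sizes come from the table; the labels and the `saved_percent` helper are names made up for this illustration.

```rust
/// Sizes in KB, copied from the table above. The labels are shorthand for this sketch.
const BUILDS: [(&str, u32); 5] = [
    ("default release build", 9260),
    ("TLS removed", 6668),
    ("stripped", 3496),
    ("TLS removed + stripped", 2236),
    ("TLS removed + stripped + optimized for size", 1900),
];

/// Percentage saved relative to a baseline size (hypothetical helper).
fn saved_percent(baseline_kb: u32, size_kb: u32) -> f64 {
    100.0 * f64::from(baseline_kb - size_kb) / f64::from(baseline_kb)
}

fn main() {
    let baseline_kb = BUILDS[0].1;
    for &(label, size_kb) in BUILDS.iter() {
        println!(
            "{}: {} KB ({:.1}% smaller)",
            label,
            size_kb,
            saved_percent(baseline_kb, size_kb)
        );
    }
}
```

Stripping alone removes roughly 62% of the default build, removing TLS alone roughly 28%, and all three options together roughly 79%.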
While our default release builds are not optimized for size, customers may produce their own builds based on their needs.
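As a rough sketch of what such a custom build might look like, Cargo's release profile has settings for both stripping and size optimization. The keys below are standard Cargo profile settings, but the combination is an assumption about how one could approach this, not necessarily the configuration used to produce the numbers in the table. How TLS is removed is specific to the library's build options and is not shown here.

```toml
# Cargo.toml (illustrative only; not the project's actual release settings)
[profile.release]
opt-level = "z"     # optimize for size rather than speed ("s" is a milder option)
strip = "symbols"   # strip symbols and debug info from the output
lto = true          # link-time optimization can remove more unused code
codegen-units = 1   # slower build, but the optimizer sees the whole crate at once
```

Results will vary with the toolchain, target, and artifact type, so it is worth measuring each setting separately as in the table above.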
Memory Usage
Performance has many aspects, but one important dimension for embedded systems is low (and predictable) memory usage. I used Valgrind's Massif tool to profile heap usage on two targets:
- OpenDNP3’s C++ outstation demo
- The C example program for our new Rust library
Both programs were configured to use a thread pool with one worker thread. Both ran under load, polled by a client every 10 milliseconds.
Command: ./cpp/examples/outstation/outstation-demo (OpenDNP3 C++)
Massif arguments: (none)
ms_print arguments: massif.out.27100
--------------------------------------------------------------------------------
KB
224.3^ @
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| #@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| :#@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| ::#@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| ::#@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| ::#@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| ::#@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| ::#@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
| ::#@:::::@@@@@::::::::::@:::@::::@:::@:::@:::@:::@::
0 +----------------------------------------------------------------------->Mi
0 26.07
OpenDNP3 peaked at 224 KB of heap usage, and its usage stayed constant under load.
--------------------------------------------------------------------------------
Command: ./outstation_example tcp (C linked against Rust library)
Massif arguments: (none)
ms_print arguments: massif.out.25037
--------------------------------------------------------------------------------
KB
134.1^ #
| #::::@:::@::::::::@:::::::@:::::::@::::::@:::::::@:::::::@::::::
| ::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| :::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| ::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| :::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| :::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| :::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
| :::::#::::@:::@::::::::@:::::::@:::::::@::::: @:::::: @:::::: @::::::
0 +----------------------------------------------------------------------->Mi
0 8.805
The C program using the FFI bindings to the Rust library peaked at 134 KB, about 40% less than the pure C++ application (134.1 KB versus 224.3 KB). Its memory usage was similarly stable under load.
This is by no means an exhaustive analysis, but it does show that, in this test, our new Rust library used roughly 40% less heap memory than an equivalent C++ library.
Scalable Performance
Since our libraries are asynchronous, each additional communication session costs far less memory than it would in a thread-per-session design. Each session doesn’t require its own thread: they share a pool of threads. This is where our libraries really shine for embedded environments: you can get a lot of performance out of a small hardware footprint. The result is lower memory usage and less time spent context switching, because the various communication sessions multitask cooperatively.
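The idea of many sessions sharing a thread can be illustrated with a toy example that uses only the Rust standard library. This is not how Tokio or our libraries are implemented: `Session`, `NoopWaker`, and `run_all` are hypothetical names, and the "executor" simply re-polls every task in a loop. It only shows that 100 independent tasks can make progress on a single OS thread by yielding to one another.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread;

/// A waker that does nothing: the toy executor below re-polls every task anyway.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for a communication session. It "handles" a fixed
/// number of requests, yielding back to the executor after each one.
struct Session {
    id: usize,
    remaining: u32,
}

impl Future for Session {
    type Output = (usize, thread::ThreadId);

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        if self.remaining == 0 {
            // Report which OS thread the session finished on.
            return Poll::Ready((self.id, thread::current().id()));
        }
        self.remaining -= 1;
        // Ask to be polled again, then yield so other sessions can run.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Drive all sessions to completion on the calling thread, round-robin.
fn run_all(sessions: Vec<Session>) -> Vec<(usize, thread::ThreadId)> {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut pending: Vec<Pin<Box<Session>>> = sessions.into_iter().map(Box::pin).collect();
    let mut finished = Vec::new();

    while !pending.is_empty() {
        let mut still_pending = Vec::new();
        for mut task in pending {
            match task.as_mut().poll(&mut cx) {
                Poll::Ready(result) => finished.push(result),
                Poll::Pending => still_pending.push(task),
            }
        }
        pending = still_pending;
    }
    finished
}

fn main() {
    let sessions: Vec<Session> = (0..100).map(|id| Session { id, remaining: 3 }).collect();
    let results = run_all(sessions);

    // Every session completed, and all of them ran on the calling thread.
    let this_thread = thread::current().id();
    assert_eq!(results.len(), 100);
    assert!(results.iter().all(|(_, t)| *t == this_thread));
    println!("{} sessions completed on one thread", results.len());
}
```

In this sketch each session costs one small heap allocation rather than a thread with its own stack. A real runtime adds proper wakeups driven by I/O readiness instead of busy polling.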
Measuring exactly how memory usage scales while adding communication sessions will be the subject of another blog post.