Booting Embedded Linux in One Second !

  • 0

Booting Embedded Linux in One Second !

Booting a device as fast as possible is not only a requirement for time critical applications but also an important facet for improving the usability and user experience.
Most of the Embedded Linux distribution are designed to be generic and flexible to support variety of devices and use cases, therefore the boot-time aspect is not an important focus.
Thanks to its modularity and open source nature it is possible to reduce the boot-time and and achieve some spectacular results just using optimization techniques which does not require any considerable engineering effort.
we will cover in this article an ARM based systems and show a practical example of those tweaks applied to boot a Yocto based Linux on a Beaglebone Black in the blink of an eye:

Before starting any optimizations let’s get a closer look at a typical Embedded Linux boot-up sequence on an ARM processor and analyze how time is spent on each stage :

Boot Sequence on Sitara AM335x


Initial measurements

For our example on Beaglebone black the application is an In-vehicle infotainment (IVI) QT5 based connected application and the goal is to reduce the time from Power-On of the device (Cold-Boot) until the application shows up on the display and fully operable by a user.

To measure the time taken by the application to show its availability, we will use grabserial running on a Host (Ubuntu Linux) to measure time-stamps coming from the target on the serial console:

Important to note that grabserial cannot measure from power-on but starts counting time-stamps upon getting the first character on serial console. In the measurement above we set the time base to SPL using -m option.

Our application needed more than 12 seconds to start-up, from special markers present in the serial logs, we can deduce the time spent on each stage:

such time to start an Infotainment system in the car are unacceptable for an impatient end user.


As a recommendation, do not optimize things that reduce the ability to make measurements and hinder implementing further optimizations.

We start then from the last stage of the boot process, by optimizing the user-space and application start-up, then reduce kernel boot-time. Finally optimize the boot-loader(s).

User Space

Init Process:

One obvious optimization is to configure the Start of the critical application as soon as possible, of course after starting dependencies. In our case we use Systemd so we change the default target from multi-user to basic and remove dependencies to other services as follow:

As Systemd has an overhead, specially if not running on a multi-core CPU, we can start our application before Systemd initialization by creating a wrapper to init:

and instruct the kernel to use it instead of the default /sbin/init, by adding it to kernel command line:  init=/sbin/preinit

A drawback of this Setup is that your application loses some benefits of Systemd such as auto-restart after crash.

If there are many interdependent processes in play, systemd-analyze can be used to inspect those dependencies and reorder their priorities.


In our example, Qt application alone took almost 0,7s to run!

That could be definitely improved by:

  • choosing toolchains and compiler flags wisely, a new gcc build a faster code, compiler flags set with optimization flags: for example -O2 instead of -Os
  • compiling statically if possible. This will remove the overhead of using shared libraries
  • use prelink which reduce the time needed by dynamic linker to perform relocations
  • in case of a Qt QML based application, using QtQuickCompiler allows to compile QML source code into a binary that runs faster



Before running the init process, the Kernel needs first to mount the root Filesystem, therefore size and choice of the Filesystem have impact on startup time.

Filesystem Size

Size matters but in this case a smaller footprint will have less mount time. Here are some tweaks to reduce the footprint of a Yocto based Root Filesystem :

  • remove DISTRO features that are not used in local.conf:
  • remove unnecessary packages and dependencies from image recipes
  • finally use a lightweight C-library such as musl instead of default glibc:


Filesystem Type

Depending on the storage type an appropriate Filesystem can be used:

In case of eMMC/MMC, EXT3 or EXT4 are widely used but they have an overhead in compared to other Filesystems such as SquashFS (Read-only):

In Yocto this could be easily generated by selecting:

or if using wic kickstart :

The kernel cmdline need to include:



This is an important part of the optimization since a big part of our boot process was spent at this stage.

here are few steps we performed to speed-up kernel loading and execution:

  • build everything that is not needed at boot time as a kernel module
  • reduce Kernel configuration to strict minimum drivers and features that the application need, this implies a lot of trial and error
  • remove from device tree redundant devices or set their status to disabled
  • avoid calibration of loop delay by presetting the value to kernel command line lpj=1990656
  • turn off console output by setting quiet option to command line or disabling  completely printk, which also significantly reduces the kernel size
  • benchmark compressed versus non-compressed Kernel, on our board the decompression went faster than loading an uncompressed image



We enabled falcon-mode to bypass u-boot and focused only on optimizing SPL startup: See our Article about how to enable falcon mode

We disabled in SPL all features that are not required for production such as Networking, USB, YModem, Environment, EFI and Filesystems support:

As we disabled Filesystems support to have less overhead, Boot-Rom code is loading SPL from Raw MMC partition using specific offsets.

We aslo avoided slow bus initialization such as I2C, for example in the board file we removed the code responsible for board detection using I2C/EEPROM and hard-coded the board type to beaglebone black:

All changes made for SPL can be found here.



Last but not least, hardware settings can have an impact on boot time. For example the Boot Rom may lose precious time by trying to fetch software from wrong media if the bootstrap pins configuration is not correctly set .

On our board, we also noticed that boot up from internal eMMC configured in SLC Mode is a bit faster than default MLC mode configuration, and even faster than using a fast SD-Card(Class 10).



we succeeded in reducing the boot time from 12 second to one second with optimizing different components of the software. The startup time could be further shortened but at cost of the system flexibility.