Booting Embedded Linux in One Second!


Booting a device as fast as possible is not only a requirement for time-critical applications but also an important part of usability and user experience.
Most Embedded Linux distributions are designed to be generic and flexible enough to support a variety of devices and use cases, so boot time is not their main focus.
Thanks to the modularity and open source nature of Linux, it is possible to reduce the boot time and achieve some spectacular results using optimization techniques that do not require considerable engineering effort.
This article covers ARM-based systems and shows a practical example of those tweaks applied to boot a Yocto-based Linux on a BeagleBone Black in the blink of an eye.

Before starting any optimization, let's take a closer look at a typical Embedded Linux boot-up sequence on an ARM processor and analyze how time is spent in each stage:

Boot Sequence on Sitara AM335x

 

Initial measurements

For our example on the BeagleBone Black, the application is a connected, Qt5-based in-vehicle infotainment (IVI) application, and the goal is to reduce the time from power-on of the device (cold boot) until the application shows up on the display and is fully operable by a user.

To measure the time the application takes to become available, we use grabserial running on a host (Ubuntu Linux) to timestamp the output coming from the target on the serial console:

$ grabserial -d "/dev/ttyUSB0" -e 30 -t -m "U-Boot SPL*"

It is important to note that grabserial cannot measure from power-on: it starts counting timestamps when it receives the first character on the serial console. In the measurement above, we set the time base to the SPL banner using the -m option.

Our application needed more than 12 seconds to start up. From special markers present in the serial logs, we can deduce the time spent in each stage:

Such a start-up time for an infotainment system in a car is unacceptable for an impatient end user.
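The per-stage figures can be read off a saved grabserial log. The helper below is a minimal sketch: it assumes the log was captured with -t, so that each line starts with a bracketed pair of absolute and delta timestamps. The function name stage_time and the marker strings in the usage note are illustrative, not taken from our setup.

```shell
#!/bin/sh
# stage_time LOGFILE MARKER
# Print the absolute timestamp (in seconds) of the first log line that
# contains MARKER. Assumes grabserial -t output, where each line looks like
# "[<absolute> <delta>] <text>".
stage_time() {
    awk -v marker="$2" '
        index($0, marker) {
            ts = $1
            sub(/^\[/, "", ts)   # "[1.250000" -> "1.250000"
            print ts
            exit
        }' "$1"
}
```

For example, if the capture was saved to boot.log (a hypothetical file name), stage_time boot.log "Starting kernel" and stage_time boot.log "Application started" give two points in time, and their difference is the time spent between those two markers. Use whichever marker strings actually appear in your own log.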

Optimizations

As a recommendation, do not start with optimizations that reduce the ability to make measurements and hinder implementing further optimizations.

We therefore start from the last stage of the boot process by optimizing the user space and application start-up, then reduce the kernel boot time, and finally optimize the boot loader(s).

User Space

Init Process:

One obvious optimization is to start the critical application as soon as possible, after its dependencies of course. In our case we use systemd, so we change the default target from multi-user to basic and remove dependencies on other services as follows:

[Unit]
Description=ConnectedCarIVI service
DefaultDependencies=no

[Service]
ExecStart=/usr/bin/ConnectedCarIVI -plugin Tslib

[Install]
WantedBy=basic.target

As systemd has an overhead, especially when not running on a multi-core CPU, we can start our application before systemd initialization by creating a wrapper around init:

#!/bin/busybox sh

echo "-> Start Application..."

#Initialize your time-critical application here !
/usr/bin/ConnectedCarIVI -plugin Tslib &

echo "-> Application started !"

# start real init (systemd/SysVinit)
exec /sbin/init

We install this wrapper as /sbin/preinit and instruct the kernel to use it instead of the default /sbin/init by adding it to the kernel command line: init=/sbin/preinit

A drawback of this setup is that your application loses some benefits of systemd, such as automatic restart after a crash.

If there are many interdependent processes in play, systemd-analyze can be used to inspect those dependencies and reorder their priorities.
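The systemd-analyze subcommands most useful for this purpose are time, blame and critical-chain. A minimal sketch of a helper to run on the target follows. The function name boot_report and the unit name ConnectedCarIVI.service used in the usage note are assumptions made for illustration, so substitute the name of your own unit file.

```shell
#!/bin/sh
# boot_report [UNIT]
# Summarize where systemd spent the boot time. Run on the target.
# UNIT defaults to basic.target; the unit name in the usage note is hypothetical.
boot_report() {
    unit="${1:-basic.target}"
    systemd-analyze time                    # kernel vs. user-space totals
    systemd-analyze blame | head -n 10      # ten slowest units to initialize
    systemd-analyze critical-chain "$unit"  # chain of units that delayed UNIT
}
```

Calling boot_report ConnectedCarIVI.service then shows which units sit on the critical path of the application, which are the first candidates to be removed or reordered.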

Application:

In our example, the Qt application alone took almost 0.7 s to start!

That can definitely be improved by:

  • choosing the toolchain and compiler flags wisely: a newer gcc generally produces faster code, and optimizing for speed (for example -O2 instead of -Os) can help
  • compiling statically if possible, which removes the overhead of loading shared libraries
  • using prelink, which reduces the time needed by the dynamic linker to perform relocations
  • in the case of a Qt QML-based application, using QtQuickCompiler, which compiles QML source code into a binary that runs faster

 

Root-Filesystem:

Before running the init process, the kernel first needs to mount the root filesystem, so the size and type of the filesystem have an impact on start-up time.

Filesystem Size

Size matters: in this case, a smaller footprint means a shorter mount time. Here are some tweaks to reduce the footprint of a Yocto-based root filesystem:

  • in local.conf, remove DISTRO features that are not used:
    DISTRO_FEATURES_remove = "bluetooth"
    DISTRO_FEATURES_remove = "3g"
    DISTRO_FEATURES_remove = "opengl"
    DISTRO_FEATURES_remove = "wayland"
    DISTRO_FEATURES_remove = "x11"
    DISTRO_FEATURES_remove = "nfc"
    DISTRO_FEATURES_remove = "nfs"
    DISTRO_FEATURES_remove = "ext2"
  • remove unnecessary packages and dependencies from image recipes
  • finally, use a lightweight C library such as musl instead of the default glibc:
    TCLIBC=musl MACHINE=my-machine bitbake my-image

 

Filesystem Type

Depending on the storage type, an appropriate filesystem can be used.

In the case of eMMC/MMC, EXT3 or EXT4 are widely used, but they have an overhead compared to other filesystems such as SquashFS (read-only).

In Yocto, a SquashFS image can easily be generated by selecting:

IMAGE_FSTYPES += "squashfs"

or, if using a wic kickstart file:

part / --source rootfs --ondisk mmcblk --fstype=squashfs  --label root --size 150M

The kernel command line needs to include:

rootfstype=squashfs
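After changing boot arguments, it is worth confirming on the target that the kernel really received them. The following sketch reads /proc/cmdline. The helper name cmdline_value is made up for this example, and it assumes simple key=value arguments without quoted spaces.

```shell
#!/bin/sh
# cmdline_value KEY [FILE]
# Print the value of the first KEY=value argument on the kernel command line.
# FILE defaults to /proc/cmdline; passing another file is handy for testing.
cmdline_value() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | sed -n "s/^$1=//p" | head -n 1
}
```

On a correctly configured target, cmdline_value rootfstype should print squashfs. The same helper works for the other arguments mentioned in this article, such as init and lpj.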

 

KERNEL

This is an important part of the optimization, since a large part of our boot time was spent in this stage.

Here are a few steps we performed to speed up kernel loading and execution:

  • build everything that is not needed at boot time as a kernel module
  • reduce the kernel configuration to the strict minimum of drivers and features that the application needs; this implies a lot of trial and error
  • remove redundant devices from the device tree or set their status to disabled
  • avoid calibration of the delay loop by presetting the value on the kernel command line: lpj=1990656
  • turn off console output by adding the quiet option to the command line, or disable printk completely, which also significantly reduces the kernel size
  • benchmark a compressed versus an uncompressed kernel; on our board, loading and decompressing a compressed image was faster than loading an uncompressed one
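The lpj value depends on the board and kernel configuration, so it should be taken from a boot of the same hardware rather than copied from this article. As a sketch, the helper below extracts it from a kernel log. The name lpj_from_log is ours, and it assumes the log contains the usual delay-loop message with a "(lpj=N)" annotation.

```shell
#!/bin/sh
# lpj_from_log
# Read a kernel log on stdin and print the first loops-per-jiffy value found
# in a "(lpj=<number>)" annotation.
lpj_from_log() {
    sed -n 's/.*(lpj=\([0-9][0-9]*\)).*/\1/p' | head -n 1
}
```

On the target, dmesg | lpj_from_log prints the number to append to the kernel command line as lpj=<value>. If the message has already scrolled out of the kernel log buffer, a serial capture of the boot can be piped in instead.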

 

BOOTLOADER

We enabled Falcon mode to bypass U-Boot and focused only on optimizing SPL start-up (see our article about how to enable Falcon mode).

In SPL, we disabled all features that are not required for production, such as networking, USB, YModem, environment, EFI and filesystem support:

CONFIG_SPL_MUSB_NEW_SUPPORT=n
CONFIG_SPL_EXT_SUPPORT=n
CONFIG_SPL_FAT_SUPPORT=n
CONFIG_SPL_ETH_SUPPORT=n
CONFIG_SPL_LIBDISK_SUPPORT=n
CONFIG_DRIVER_TI_CPSW=n
CONFIG_SPL_USBETH_SUPPORT=n
CONFIG_SPL_YMODEM_SUPPORT=n
CONFIG_SPL_EFI_PARTITION=n
CONFIG_SPL_DOS_PARTITION=n
CONFIG_SPL_ENV_SUPPORT=n

As we disabled filesystem support to reduce overhead, the boot ROM code loads SPL from the MMC device in raw mode, using specific offsets.
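Installing SPL for raw-mode boot comes down to copying the image to a fixed byte offset on the device without touching anything else. The helper below is a sketch of that step only. The name write_raw is ours, and the 128 KiB offset in the usage note is an assumption based on the raw-mode offsets commonly documented for the AM335x boot ROM, so check it against the reference manual for your device before use.

```shell
#!/bin/sh
# write_raw IMAGE DEVICE OFFSET_KIB
# Copy IMAGE into DEVICE starting at OFFSET_KIB kibibytes, leaving the rest
# of DEVICE untouched (conv=notrunc). DEVICE may be a block device or a file.
write_raw() {
    dd if="$1" of="$2" bs=1024 seek="$3" conv=notrunc 2>/dev/null
}
```

A call such as write_raw MLO /dev/mmcblk0 128 would place the SPL image 128 KiB into the card. The device name here is hypothetical, and since the command overwrites data at that offset, double-check the target device before running it.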

We also avoided slow bus initialization such as I2C. For example, in the board file we removed the code responsible for board detection using the I2C EEPROM and hard-coded the board type to BeagleBone Black:

index 48c139a..18c7942 100644
--- a/board/ti/am335x/board.h
+++ b/board/ti/am335x/board.h
@@ -26,27 +26,27 @@
 
 static inline int board_is_bone(void)
 {
-       return board_ti_is("A335BONE");
+       return 0;
 }
 
 static inline int board_is_bone_lt(void)
 {
-       return board_ti_is("A335BNLT");
+       return 1;
 }
 
 static inline int board_is_bbg1(void)
 {
-       return board_is_bone_lt() && !strncmp(board_ti_get_rev(), "BBG1", 4);
+       return 0;
 }
 
 static inline int board_is_evm_sk(void)
 {
-       return board_ti_is("A335X_SK");
+       return 0;
 }
 
 static inline int board_is_idk(void)
 {
-       return !strncmp(board_ti_get_config(), "SKU#02", 6);
+       return 0;
 }
 
 static inline int board_is_gp_evm(void)
@@ -56,13 +56,12 @@ static inline int board_is_gp_evm(void)
 
 static inline int board_is_evm_15_or_later(void)
 {
-       return (board_is_gp_evm() &&
-               strncmp("1.5", board_ti_get_rev(), 3) <= 0);
+       return 0;
 }
 
 static inline int board_is_icev2(void)
 {
-       return board_ti_is("A335_ICE") && !strncmp("2", board_ti_get_rev(), 1);
+       return 0;
 }
 
 /*

All changes made for SPL can be found here.

 

HARDWARE CONSIDERATIONS

Last but not least, hardware settings can have an impact on boot time. For example, the boot ROM may lose precious time trying to fetch software from the wrong media if the bootstrap pins are not configured correctly.

On our board, we also noticed that booting from the internal eMMC configured in SLC mode is a bit faster than with the default MLC mode configuration, and even faster than using a fast SD card (Class 10).

 

CONCLUSION

We succeeded in reducing the boot time from 12 seconds to one second by optimizing different components of the software. The start-up time could be shortened further, but at the cost of system flexibility.