Category Archives: FastBoot

  • 0

Booting Embedded Linux in One Second !

Booting a device as fast as possible is not only a requirement for time critical applications but also an important facet for improving the usability and user experience.
Most of the Embedded Linux distribution are designed to be generic and flexible to support variety of devices and use cases, therefore the boot-time aspect is not an important focus.
Thanks to its modularity and open source nature it is possible to reduce the boot-time and and achieve some spectacular results just using optimization techniques which does not require any considerable engineering effort.
we will cover in this article an ARM based systems and show a practical example of those tweaks applied to boot a Yocto based Linux on a Beaglebone Black in the blink of an eye:

Before starting any optimizations let’s get a closer look at a typical Embedded Linux boot-up sequence on an ARM processor and analyze how time is spent on each stage :

Boot Sequence on Sitara AM335x

 

Initial measurements

For our example on Beaglebone black the application is an In-vehicle infotainment (IVI) QT5 based connected application and the goal is to reduce the time from Power-On of the device (Cold-Boot) until the application shows up on the display and fully operable by a user.

To measure the time taken by the application to show its availability, we will use grabserial running on a Host (Ubuntu Linux) to measure time-stamps coming from the target on the serial console:

Important to note that grabserial cannot measure from power-on but starts counting time-stamps upon getting the first character on serial console. In the measurement above we set the time base to SPL using -m option.

Our application needed more than 12 seconds to start-up, from special markers present in the serial logs, we can deduce the time spent on each stage:

such time to start an Infotainment system in the car are unacceptable for an impatient end user.

Optimizations

As a recommendation, do not optimize things that reduce the ability to make measurements and hinder implementing further optimizations.

We start then from the last stage of the boot process, by optimizing the user-space and application start-up, then reduce kernel boot-time. Finally optimize the boot-loader(s).

User Space

Init Process:

One obvious optimization is to configure the Start of the critical application as soon as possible, of course after starting dependencies. In our case we use Systemd so we change the default target from multi-user to basic and remove dependencies to other services as follow:

As Systemd has an overhead, specially if not running on a multi-core CPU, we can start our application before Systemd initialization by creating a wrapper to init:

and instruct the kernel to use it instead of the default /sbin/init, by adding it to kernel command line:  init=/sbin/preinit

A drawback of this Setup is that your application loses some benefits of Systemd such as auto-restart after crash.

If there are many interdependent processes in play, systemd-analyze can be used to inspect those dependencies and reorder their priorities.

Application:

In our example, Qt application alone took almost 0,7s to run!

That could be definitely improved by:

  • choosing toolchains and compiler flags wisely, a new gcc build a faster code, compiler flags set with optimization flags: for example -O2 instead of -Os
  • compiling statically if possible. This will remove the overhead of using shared libraries
  • use prelink which reduce the time needed by dynamic linker to perform relocations
  • in case of a Qt QML based application, using QtQuickCompiler allows to compile QML source code into a binary that runs faster

 

Root-Filesystem:

Before running the init process, the Kernel needs first to mount the root Filesystem, therefore size and choice of the Filesystem have impact on startup time.

Filesystem Size

Size matters but in this case a smaller footprint will have less mount time. Here are some tweaks to reduce the footprint of a Yocto based Root Filesystem :

  • remove DISTRO features that are not used in local.conf:
  • remove unnecessary packages and dependencies from image recipes
  • finally use a lightweight C-library such as musl instead of default glibc:

 

Filesystem Type

Depending on the storage type an appropriate Filesystem can be used:

In case of eMMC/MMC, EXT3 or EXT4 are widely used but they have an overhead in compared to other Filesystems such as SquashFS (Read-only):

In Yocto this could be easily generated by selecting:

or if using wic kickstart :

The kernel cmdline need to include:

 

KERNEL

This is an important part of the optimization since a big part of our boot process was spent at this stage.

here are few steps we performed to speed-up kernel loading and execution:

  • build everything that is not needed at boot time as a kernel module
  • reduce Kernel configuration to strict minimum drivers and features that the application need, this implies a lot of trial and error
  • remove from device tree redundant devices or set their status to disabled
  • avoid calibration of loop delay by presetting the value to kernel command line lpj=1990656
  • turn off console output by setting quiet option to command line or disabling  completely printk, which also significantly reduces the kernel size
  • benchmark compressed versus non-compressed Kernel, on our board the decompression went faster than loading an uncompressed image

 

BOOTLOADER

We enabled falcon-mode to bypass u-boot and focused only on optimizing SPL startup: See our Article about how to enable falcon mode

We disabled in SPL all features that are not required for production such as Networking, USB, YModem, Environment, EFI and Filesystems support:

As we disabled Filesystems support to have less overhead, Boot-Rom code is loading SPL from Raw MMC partition using specific offsets.

We aslo avoided slow bus initialization such as I2C, for example in the board file we removed the code responsible for board detection using I2C/EEPROM and hard-coded the board type to beaglebone black:

All changes made for SPL can be found here.

 

HARDWARE CONSIDERATIONS

Last but not least, hardware settings can have an impact on boot time. For example the Boot Rom may lose precious time by trying to fetch software from wrong media if the bootstrap pins configuration is not correctly set .

On our board, we also noticed that boot up from internal eMMC configured in SLC Mode is a bit faster than default MLC mode configuration, and even faster than using a fast SD-Card(Class 10).

 

CONCLUSION

we succeeded in reducing the boot time from 12 second to one second with optimizing different components of the software. The startup time could be further shortened but at cost of the system flexibility.


  • 0

Fast Boot Linux with u-Boot Falcon Mode

Falcon mode is a feature in u-Boot that enables fast booting by allowing SPL directly to start Linux kernel and skip completely u-boot loading and initialization.

To understand how Falcon mode works let’s first have a quick look at a typical Linux boot-up sequence on an ARM processor:

Standard Linux Boot Process

1. First stage – Boot ROM

This is the primary program loader residing on a read-only flash memory (ROM) integrated directly into the processor chip.
It contains the very first code which is executed on power-on or reset.
Depending on the configuration of the bootstrap pins or internal fuses it may decide from which media to load and run the next piece of software. In case of a Secure Boot processor it will also verify the code authenticity before its execution.
At this stage, Boot ROM code is not aware about memory type and different interconnected peripherals.
The main goal here is to perform basic peripherals initialization such as PLLs, system clocks setup then find a boot device from which load a bootloader such as u-Boot.

2. Second stage – SPL

A typical u-Boot image is around few hundreds KB size (~300KB) which does not fit inside internal SRAM of most ARM processor. They are typically less than 100KB.
To handle this limitation, u-Boot adopted the SPL (Secondary Program Loader) approach which consists of creating a very small pre-loader that after configuring and initializing peripherals and the main system memory can load the full blown u-Boot.
It shares the same u-Boot’s sources but with a minimal set of code.
So when u-Boot is built for a platform that requires SPL, it generate two binaries : SPL (MLO file) and u-Boot image.

3. Third stage – u-Boot

Das u-Boot aims to offer a flexibel way to load and start the Linux Kernel from a different type of devices, it also provides rich features for a bootloader, such as a command line interface, Shell Scripting, Support of a variety of Filesystems, networking and other options that are very helpful during initial Hardware Bring-Up and development process, but can be bypassed for the production by enabling the Falcon-Mode and save by the way some precious seconds of the boot time !

Falcon Mode

Configure and enable Falcon-Mode

We will use a Beaglebone Black  as hardware example to showcase the setup, booting either from an eMMC or SD Card. Nevertheless the procedure should be almost identical to other ARM based boards supporting the SPL framework.

If Boot Rom Code support it, we recommend to store and boot the SPL from raw partition and by this mean also u-Boot and Linux Kernel to skip the overhead of using a Filesystem. As result the boot is even  faster !

Partition # Name Description Offset range (Bytes) Offset range (Blocks*) Size
1 MBR Master Boot Record 0x000000 – 0x010000 0x0000 – 0x007F 64KB
2 FDT Device Tree + ARGS 0x010000 – 0x040000 0x0080 – 0x01FF 192KB
4 SPL*   SPL 0x040000 – 0x060000 0x0200 – 0x02FF 128KB
5 U-Boot Full Bootloader 0x060000 – 0x0e0000 0x0300 – 0x06FF 512KB
6 U-Boot Env U-Boot environment 0x0e0000 – 0x120000 0x0700 – 0x08FF 256KB
7 Kernel Linux Kernel 0x120000 -0x1000000 0x0900 – 0x28FF 14MB

 

SPL* offset is the address from which the Boot ROM can fetch bootloader. This address is hard coded in the Boot ROM and specific to processor.

In case of AM335x there are 4 possibilities at 0x0, 0x20000,0x40000, 0x60000 [chapter 26.1.7.5.5 in the technical refrence manual ].

[1 x Block is 512 Bytes]

We are going to use u-boot  v2017.05-rc3:

U-boot offset location is defined in the sources by the following config :

Kernel offset location is defined in the sources by the following config :

Environment offset location and size are defined by the following configs:

Configs above are default in u-Boot apart from environment config which can be set in the board config file as follow:

Make sure that Falcon Mode config is enabled :

Let’s configure now u-boot for the beaglebone black :

Build it using an arm cross-compiler for example yocto toolchains:

If everything went well, MLO and u-boot.img files should generated in the top directory.

Now we are ready to write them adding the Kernel and Device Tree to SD card using the address table above:

In case of using Yocto we can easily generate an image to flash on SD Card/eMMC using the following wic kickstart file:

Note that Falcon-Mode supports only uImage Kernel format !

Now let’s start our board using the previous image and configure it to use falcon-mode:

we set first the bootargs:

overwrite loadfdt and loadimage macros to use raw partitions:

then change the bootcmd to reflect our changes:

we run spl export command from u-Boot to prepare for the SPL everything that should be in place as if bootm to be executed:

The spl export command does not persist to media so we have to overwrite
the prepared FDT from RAM offsets 8ffd9000 to 8ffed310 into the FDT partition:

Finally we are ready to switch-on the falcon mode :

On the next boot we see that SPL jumped directly to Linux kernel :

In other articles we will cover further techniques used to achieve a Faster Linux Boot, stay tuned !