Getting Started with Automotive Smart Virtual Instrument Clusters 2

Reposting an article on booting Linux in one second. It is based on TI's DM6446 and is useful when building a smart instrument cluster.


All This For 1 Second Boot


Results

Let me give the Results first

Sequence                                       | Time in seconds (uncompressed Kernel) | Time in seconds (compressed Kernel)
U-boot load and start                          | 0.08                                  | 0.08
Kernel load                                    | 0.52 (0.24 crc32 and 0.28 for copy)   | 0.35 (0.16 crc32 and 0.19 for copy)
Kernel start and uncompress                    | 0.0                                   | 0.66 (for unzip)
Kernel Initialization                          | 0.37                                  | 0.38
Init                                           | 0 (bypassed init=/bin/ash)            | 0 (bypassed init=/bin/ash)
Switching Run level and executing init scripts | 0 (bypassed init=/bin/ash)            | 0 (bypassed init=/bin/ash)
Starting Shell                                 | 0 (init=/bin/ash)                     | 0 (init=/bin/ash)
Total                                          | 0.97                                  | 1.47

Message Log with uncompressed kernel (refer Measuring Boot Time)

$ ./tstamp.exe < /dev/ttyS0

       column1 is elapsed time since first message
       column2 is elapsed time since previous message
       column3 is the message
0.000 0.000:
0.000 0.000:
0.000 0.000: U-Boot 1.2.0 (Jun 23 2008 - 14:53:30)
0.000 0.000:
0.000 0.000: I2C:   ready
0.000 0.000: DRAM:  256 MB
0.000 0.000: MY AMD Flash: 16 MB
0.060 0.060: In:    serial
0.060 0.000: Out:   serial
0.060 0.000: Err:   serial
0.070 0.010: RM Clock :- 297MHz DDR Clock :- 162MHz
1.071 1.001: Hit any key to stop autoboot:   0
1.351 0.280: # Booting image at 80007fc0 ...
1.351 0.000:    Verifying Checksum ...
1.502 0.151: OK
1.502 0.000: OK
1.502 0.000: ## Loading Ramdisk Image at 80900000 ...
1.502 0.000:    Verifying Checksum ...
1.592 0.090: OK
1.602 0.010:
1.602 0.000: Starting kernel ...
1.602 0.000:
1.972 0.370:bin/ash: can't access tty; job control turned off

Note: remember to subtract 1 second bootdelay

Message Log with compressed kernel (refer Measuring Boot Time)

$ ./tstamp.exe < /dev/ttyS0

       column1 is elapsed time since first message
       column2 is elapsed time since previous message
       column3 is the message
0.000 0.000:
0.000 0.000: U-Boot 1.2.0 (Jun 23 2008 - 14:53:30)
0.000 0.000:
0.000 0.000: I2C:   ready
0.010 0.010: DRAM:  256 MB
0.010 0.000: MY AMD Flash: 16 MB
0.060 0.050: In:    serial
0.060 0.000: Out:   serial
0.070 0.010: Err:   serial
0.070 0.000: ARM Clock :- 297MHz DDR Clock :- 162MHz
1.071 1.001: Hit any key to stop autoboot:   0
1.261 0.190: # Booting image at 80007fc0 ...
1.261 0.000:    Verifying Checksum ...
1.331 0.070: OK
1.331 0.000: OK
1.341 0.010: ## Loading Ramdisk Image at 80900000 ...
1.341 0.000:    Verifying Checksum ...
1.432 0.091: OK
1.432 0.000:
1.432 0.000: Starting kernel ...
1.432 0.000:
2.093 0.661: Uncompressing Linux.................................................
......................... done, booting the kernel.
2.473 0.380:/bin/ash: can't access tty; job control turned off

Note: remember to subtract 1 second bootdelay

Hardware setup

Board: DM6446 DVEVM
RS232 port connected to PC

Software setup

Linux: MontaVista Pro 5.0 installed on Linux Box
LSP: REL_LSP_PSP_02_00_00_010

Linux configuration for boot time reduction

U-Boot, Kernel and ramdisk(cramfs) in NOR flash.
Rootfilesystem is ramdisk (cramfs)

Optimize kernel size

  • Remove unused components from Kernel
  • Use loadable modules option to defer initialization of components to after-boot. Example: network initialization.
This gave a 0x107650-byte (~1MB) compressed Kernel and a 0x238FA0-byte (~2.2MB) uncompressed Kernel.
I have used 0x107650 and 0x238FA0 in this article; please replace them with your own Kernel sizes.

Optimize filesystem size

  • Rebuild Rootfilesystem with minimal components
  • Use cramfs as rootfilesystem
  • Recipe to make cramfs from existing ext2 filesystem
Make ramdisk
 Host# mkdir <tempdir>
 Host# cd <tempdir>
 Host# cp /opt/montavista/pro/devkit/arm/v5t_le_uclibc/images/ramdisk.gz .
 Host# gzip -d ramdisk.gz
loop mount
 Host# mkdir disk
 Host# mount -o loop -t ext2 ramdisk disk
copy modules (created during kernel size optimization)
 Host# cp /opt/montavista/pro/devkit/lsp/ti-davinci/linux-2.6.18_pro500/drivers/net/davinci_emac_driver.ko disk/home
make cramfs 
 Host# mkcramfs -n ramdisk disk rootfs.cramfs
make it U-boot compatible. (place U-Boot header)
 Host# mkimage -A arm -T ramdisk -n 'Ramdisk' -a 0x80900000 -e 0x80900000 -d rootfs.cramfs uCramfsdisk
This gave 0x164040 (~1.4MB) filesystem(cramfs). I have used 0x164040 in this article. Please replace it with your filesystem size.

Burn u-boot, Kernel and filesystem to NOR flash

  • Burn u-boot at 0x02000000
  • Burn Kernel at 0x02050000
  • Burn filesystem at 0x02300000
  • see
    • Put CRAMFS Image to Flash
    • Burn any image to NOR flash
  • get working system for boot time reduction
  • Boot time at this stage probably is below 10 seconds
boot parameters at this stage:
 setenv bootargs mem=256M console=ttyS0,115200n8 root=/dev/ram0 ro
 setenv bootcmd 'cp.b 0x2300000 0x80900000 <your filesystem size in hex>; bootm 0x2050000 0x80900000'

bootarg parameter changes

  • Use quiet bootargs parameter to avoid printing kernel messages.
  • The Linux kernel runs a calibration loop that takes 200 milliseconds to check how fast the CPU is running, and prints the resulting value during boot. Plug that value into bootargs (the lpj parameter) to save 200ms.
  • Switch off network initialization during boot (do it as part of your system or application initialization, on a need basis).
  • Set a memory limit in the bootargs parameter that is just enough to support your system; 16MB is used here. Linux allocates and pre-initializes all the DDR memory in its heap, so the smaller the heap, the quicker the pre-initialization.
  • provide the shell to run as part of bootargs
  • Boot time is not very impressive yet
boot parameters at this stage:
setenv bootargs mem=16M console=ttyS0,115200n8 root=/dev/ram0 ro quiet lpj=741376 ip=off init=/bin/ash
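
Before hard-coding an lpj value it can be worth checking that it matches the BogoMIPS figure the kernel printed on the same calibration line. The sketch below reproduces that arithmetic. The function names are made up for illustration, and HZ=100 is an assumption about this kernel configuration; it is the value for which lpj=741376 corresponds to about 148 BogoMIPS.

```c
/* BogoMIPS arithmetic as printed on the kernel's calibration line.
 * hz is the kernel tick rate; 100 is an assumption for this LSP. */

/* Whole part of BogoMIPS: lpj / (500000 / HZ). */
static unsigned long bogomips_whole(unsigned long lpj, unsigned long hz)
{
    return lpj / (500000UL / hz);
}

/* Two-digit fractional part: (lpj / (5000 / HZ)) % 100. */
static unsigned long bogomips_frac(unsigned long lpj, unsigned long hz)
{
    return (lpj / (5000UL / hz)) % 100UL;
}
```

With lpj=741376 and HZ=100 the two parts come out as 148 and 27. If the BogoMIPS value in your own boot log does not match what these functions return for your lpj, the HZ assumption (or the copied lpj) is probably wrong.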

Optimize u-boot

NOR EMIFA CS2 settings

U-Boot is configured for the slowest possible NOR speed by default, so copying from NOR is very slow. The board uses an AM29LV256MH connected over a 16-bit bus. The NOR data sheet gives 120ns as the access time (read setup time + read strobe time), and the read strobe time has to be at least 40ns. EMIFA runs at 100 MHz (1/6th of the 600MHz PLL1), so one EMIFA clock is 10ns and, theoretically, 12 EMIFA clocks are needed to fetch one short (16 bits) over the bus.

Convention to calculate EMIFA cycles needed
 EMIFA cycles = ceil of ("calculated cycles as per data sheets") + 1
                = 12 + 1
One additional clock (+ 1) is to account for crystal/oscillator accuracy.
The EMIFA CS2 setting can be any of the following (refer to the DM6446 EMIF user guide for the EMIFA CS2 register)
EMIFA CS2 0x3FFE058D: read setup of 1 clock (register value of 0) + read strobe of 12 clocks (register value of 11).
EMIFA CS2 0x3FFEC20D: read setup of 7 clocks (register value of 6) + read strobe of 5 clocks (register value of 4).
Read hold time is not needed, but the minimum is 1 clock on DM6446 (register value of 0).
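
The cycle convention and the two register values above can be cross-checked with a few lines of C. This is only a sketch: the function names are hypothetical, and the bit positions of the read setup, read strobe and read hold fields are an assumption that should be confirmed against the DM6446 EMIF user guide. They do reproduce the field values quoted above.

```c
#include <stdint.h>

/* Cycles per the convention above: ceil(access time / clock period) + 1.
 * access_ns * emifa_mhz / 1000 is the raw cycle count; +999 rounds up. */
static unsigned emifa_cycles(unsigned access_ns, unsigned emifa_mhz)
{
    return (access_ns * emifa_mhz + 999u) / 1000u + 1u;
}

/* Assumed field positions in the EMIFA CS2 configuration register.
 * Each field stores (number of clocks - 1). */
static unsigned r_setup(uint32_t cfg)  { return (cfg >> 13) & 0xFu;  }
static unsigned r_strobe(uint32_t cfg) { return (cfg >> 7)  & 0x3Fu; }
static unsigned r_hold(uint32_t cfg)   { return (cfg >> 4)  & 0x7u;  }

/* Read setup + read strobe, in EMIFA clocks. */
static unsigned read_access_clocks(uint32_t cfg)
{
    return (r_setup(cfg) + 1u) + (r_strobe(cfg) + 1u);
}
```

For a 120ns access time at 100 MHz, emifa_cycles gives 13. Decoding 0x3FFE058D gives setup 0 and strobe 11, and decoding 0x3FFEC20D gives setup 6 and strobe 4, both with hold 0.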

Changes go into AEMIF NOR initialization part of board/davinci/lowlevel_init.S

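ACFG2:                  .word 0x01E00010	/* address of the EMIFA CS2 configuration register */
ACFG2_VAL:              .word 0x3FFE058D	/* CS2 timing value chosen above */
..
LDR R0, ACFG2		/* R0 = register address */
LDR R1, ACFG2_VAL	/* R1 = new timing value */
LDR R2, [R0]		/* R2 = current register contents */
AND R1, R2, R1		/* R1 = current contents AND new value */
STR R1, [R0]		/* write the result back to the register */
...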

Parameter Space

Reduce the U-Boot Parameter Space from 128kB to 2kB.

Changes go into include/configs/davinci.h

#define CFG_ENV_SIZE            0x800


Remove/Relocate I2C communication code

I2C communication in U-Boot is slow and can be removed or relocated out of the u-boot init code. Instead of running as part of init, it can run as part of the appropriate command (when u-boot is in interactive mode). On this board, for example, the MAC address is read over I2C in board/davinci/davinci.c. I moved that read to cmd_net.c.

..
netboot_common(...)
..
if(readset_ethaddr_first) {
   /* do I2C communication to get MAC address here */
   readset_ethaddr_first = 0;
}
....

Make NOR to DDR Copy Faster

C Optimization

The u-boot memory copy code is not optimal. The CPU does a byte by byte copy.

Two functions where the copy happens

memmove of lib_generic/string.c
do_mem_cp of common/cmd_mem.c

By writing an optimized C routine to copy data (when source and destination are both aligned on a double, i.e. 8-byte, boundary), I got 0.8MB per 100 milliseconds, i.e. a 2.4MB (Kernel+filesystem) copy takes 300 milliseconds.

    if( (((uint)dst|(uint)src)&0x7) == 0 )
    { // both dst and src are aligned on double boundary
       double *dDbl;
       const double *sDbl;
       
       loop = len >> 3;
       dDbl = (double *)dst;
       sDbl = (const double*)src;
       
       for (i = 0; i < loop; i++)
          *dDbl++ = *sDbl++;
       
       d = (uchar*)dDbl;
       s = (const uchar*)sDbl;
       if (len & 4)    { *d++ = *s++; *d++ = *s++; *d++ = *s++; *d++ = *s++;}
       if (len & 2)    { *d++ = *s++; *d++ = *s++; }
       if (len & 1)      *d++ = *s++;
    }

Optimization Using EDMA

By moving to EDMA for the copy, I got 1.26 MB per 100 milliseconds, i.e. a 2.4MB (Kernel+filesystem) copy takes 190 milliseconds. Theoretical calculations (based on the AM29LV256MH's 120ns access time per two bytes) give 144 milliseconds for a 2.4MB NOR to DDR copy.
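
The copy-time figures can be reproduced with simple integer arithmetic. In the sketch below the function names are hypothetical, and 1MB is taken as 1,000,000 bytes, which is an assumption (it is the reading under which the 144 millisecond figure comes out exactly).

```c
/* Theoretical time, in milliseconds, to read `bytes` from NOR when each
 * bus access returns `bus_bytes` bytes and takes `access_ns` nanoseconds. */
static unsigned long long nor_read_ms(unsigned long long bytes,
                                      unsigned bus_bytes, unsigned access_ns)
{
    unsigned long long accesses = (bytes + bus_bytes - 1u) / bus_bytes;
    return accesses * access_ns / 1000000ULL;
}

/* Copy time, in milliseconds, from a measured rate given in bytes per 100 ms. */
static unsigned long long copy_ms_at_rate(unsigned long long bytes,
                                          unsigned long long bytes_per_100ms)
{
    return bytes * 100ULL / bytes_per_100ms;
}
```

For 2,400,000 bytes over a 2-byte bus at 120ns per access, nor_read_ms gives 144. The measured rates give 300 milliseconds for the C routine (0.8MB per 100 ms) and 190 milliseconds for EDMA (1.26MB per 100 ms), so the EDMA copy is already fairly close to the NOR limit.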

Access Kernel from NOR once

The following bootcmd has one issue: the Kernel is accessed from NOR twice, the first time for the crc32 check and the second time for kernel relocation.

setenv bootcmd 'cp.b 0x2300000 0x80900000 <your filesystem size in hex>; bootm 0x2050000 0x80900000'

Change bootcmd to

setenv bootcmd 'cp.b 0x2050000 0x80700000 <your kernel size in hex>;cp.b 0x2300000 0x80900000 <your filesystem size in hex>; bootm 0x80700000 0x80900000'

Now NOR is accessed only once, for the copy. Crc32 and relocation happen in DDR.

Kernel Relocation

u-boot relocates the Kernel to 0x80008000 using the memmove function. This step happens after the crc32 check passes. Relocation can be avoided by choosing the destination of the first copy carefully. uImage (the Kernel image with header) has a 0x40 byte header, so copying it to 0x80007FC0 puts the actual Kernel at 0x80008000.
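
The 0x80007FC0 address is simply the Kernel load address minus the header size. A minimal sketch of that calculation follows. The struct mirrors the legacy U-Boot image header as I understand it (seven 32-bit fields, four 8-bit fields and a 32-byte name); treat the field list as an assumption and check it against include/image.h in your U-Boot tree. The names uimage_header and uimage_copy_address are made up for this example.

```c
#include <stdint.h>

/* Assumed layout of the legacy uImage header: 28 + 4 + 32 = 64 (0x40) bytes. */
struct uimage_header {
    uint32_t ih_magic;     /* image header magic number */
    uint32_t ih_hcrc;      /* header checksum */
    uint32_t ih_time;      /* creation timestamp */
    uint32_t ih_size;      /* image data size */
    uint32_t ih_load;      /* data load address */
    uint32_t ih_ep;        /* entry point address */
    uint32_t ih_dcrc;      /* image data checksum */
    uint8_t  ih_os;        /* operating system */
    uint8_t  ih_arch;      /* CPU architecture */
    uint8_t  ih_type;      /* image type */
    uint8_t  ih_comp;      /* compression type */
    uint8_t  ih_name[32];  /* image name */
};

/* Where to copy the uImage so that its payload lands at kernel_load_addr. */
static uint32_t uimage_copy_address(uint32_t kernel_load_addr)
{
    return kernel_load_addr - (uint32_t)sizeof(struct uimage_header);
}
```

For a load address of 0x80008000 this returns 0x80007FC0, the destination used in the bootcmd that follows.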

Change the bootcmd

setenv bootcmd 'cp.b 0x2050000 0x80007FC0 0x107650;cp.b 0x2300000 0x80900000 0x164040; bootm 0x80007FC0 0x80900000'

This is good, but u-boot still calls memmove to copy the Kernel onto itself (relocating from 0x80008000 to 0x80008000 now).

Add a check in the memmove code to "do nothing" if destination and source point to the same address.
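
A minimal sketch of such a check is shown below. It is a generic byte-wise memmove written for illustration, not the actual lib_generic/string.c code, and the name memmove_skip_same is hypothetical. The only point of interest is the early return.

```c
#include <stddef.h>

/* memmove with an early exit when source and destination coincide. */
static void *memmove_skip_same(void *dest, const void *src, size_t count)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    if (d == s || count == 0)
        return dest;              /* same address: nothing to move */

    if (d < s) {                  /* destination below source: copy forwards */
        while (count--)
            *d++ = *s++;
    } else {                      /* destination above source: copy backwards */
        d += count;
        s += count;
        while (count--)
            *--d = *--s;
    }
    return dest;
}
```

With the Kernel already sitting at 0x80008000, the relocation call returns immediately instead of rewriting more than 2MB in place.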

Optimize crc32

Run it on DDR

In the steps above, the Kernel is copied from NOR to DDR. Therefore, crc32 works on DDR instead of NOR. The first step of crc32 optimization is already done.

Wider input data access

The input buffer is accessed byte by byte in the crc32 code. Change it to read 4 bytes (as an integer) each time. It takes 500 milliseconds to do the crc32 check of 2.4MB (Kernel + filesystem) at this stage.
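
One way to read the input as integers while keeping the usual 256-entry table is sketched below. This is not the modified u-boot source; the names crc32_build_table and crc32_wordwise are made up, the table is built at run time for self-containment, and the word path assumes a little-endian CPU (as on this board).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t crc_table[256];

/* Build the standard reflected CRC-32 table (polynomial 0xEDB88320). */
static void crc32_build_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        crc_table[n] = c;
    }
}

/* One table step: consume the low byte of the running crc. */
#define CRC_STEP(crc) ((crc) = crc_table[(crc) & 0xFFu] ^ ((crc) >> 8))

/* CRC-32 that fetches the input 4 bytes at a time. */
static uint32_t crc32_wordwise(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc ^= 0xFFFFFFFFu;
    while (len >= 4) {
        uint32_t w;
        memcpy(&w, buf, 4);       /* one 32-bit fetch; little-endian assumed */
        crc ^= w;
        CRC_STEP(crc); CRC_STEP(crc); CRC_STEP(crc); CRC_STEP(crc);
        buf += 4;
        len -= 4;
    }
    while (len--) {               /* remaining 0 to 3 bytes */
        crc ^= *buf++;
        CRC_STEP(crc);
    }
    return crc ^ 0xFFFFFFFFu;
}
```

After calling crc32_build_table() once, crc32_wordwise(0, data, len) should return the same value as the byte-wise routine; the standard check string "123456789" gives 0xCBF43926.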

Place crc table onchip

Place the 1kByte crc table (crc_table) on-chip. DM6446 has 8kB of on-chip memory at 0xA000.

#ifdef CRC_TABLE_ONCHIP
       unsigned int * pTOnchip=(unsigned int*)0x0000A000;
       if(crc_table_onchip_first)
       {
          for(i=0;i<256;i++) // copy to onchip
            pTOnchip[i] = crc_table[i];
          crc_table_onchip_first = 0;
       }
       pT = pTOnchip;
#endif

It takes ~200 milliseconds to do crc32 check of 2.4MB (Kernel + filesystem) at this stage.

Edma input data onchip

DM6446 has 8kB of on-chip memory at 0xA000. In the step above, 1kB is used for the crc table. That leaves 7kB. Use EDMA to fetch the input data in chunks of 7kB and run crc32 on the on-chip data.

uInt curLen;
uInt *pDOnchip=(unsigned int*)0x0000A400;

crc = crc ^ 0xffffffffL;
for(i=0;i<len;i+=curLen)
{
  curLen = ( (len-i) < (7*1024) )? (len-i):(7*1024);
  your_memcpy_using_edma(pDOnchip,buf+i,curLen);
  crc = your_crc32_with_no_compliment(crc,pDOnchip,curLen); // ^0xffffffff is already done
}
return crc ^ 0xffffffffL;

your_crc32_with_no_compliment is the crc32 function without the initial and final complement (^0xffffffff)

It takes 160 milliseconds to do the crc32 check of 2.4MB (compressed Kernel + filesystem) and 240 milliseconds to do the crc32 check of 3.6MB (uncompressed Kernel + filesystem).
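
The chunked scheme relies on the running crc being carried from one chunk to the next, with the complement applied only once at the start and once at the end. The host-side sketch below illustrates that property. It is not the board code: memcpy stands in for the EDMA transfer, a static array stands in for the on-chip buffer at 0xA400, the crc update is a simple bitwise version, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAGE_BYTES (7u * 1024u)   /* 7kB staging area, as in the text */

/* CRC-32 update with no initial or final complement (bitwise version). */
static uint32_t crc32_raw(uint32_t crc, const unsigned char *buf, size_t len)
{
    while (len--) {
        crc ^= *buf++;
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (0xEDB88320u ^ (crc >> 1)) : (crc >> 1);
    }
    return crc;
}

/* Stand-in for the EDMA copy into on-chip memory. */
static void stage_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
}

/* CRC-32 computed over 7kB chunks staged through a small buffer. */
static uint32_t crc32_chunked(uint32_t crc, const unsigned char *buf, size_t len)
{
    static unsigned char stage[STAGE_BYTES];   /* stand-in for on-chip RAM */
    size_t i = 0;

    crc ^= 0xFFFFFFFFu;                        /* initial complement, once */
    while (i < len) {
        size_t cur = (len - i < STAGE_BYTES) ? (len - i) : STAGE_BYTES;
        stage_copy(stage, buf + i, cur);
        crc = crc32_raw(crc, stage, cur);      /* carry crc across chunks */
        i += cur;
    }
    return crc ^ 0xFFFFFFFFu;                  /* final complement, once */
}
```

The result is independent of the chunk size, so the 7kB figure can be changed to whatever on-chip space is free without affecting the checksum.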

Use Uncompressed Kernel

Until this point a compressed kernel has been used. Now that copy and crc32 are optimized, the 0.66 seconds spent unzipping the kernel is the largest single step, so the choice is a trade-off between Kernel size (flash size) and boot time.

To use an uncompressed kernel (Image) with u-boot, a header has to be placed on Image.

mkimage -A arm -O linux -T kernel -C none -a 0x80008000 -e 0x80008000 -n 'Linux-2.6.18_pro500' -d Image uImage
mkimage here places a 0x40 bytes header on Image to produce uImage

Now burn the Kernel at 0x02050000. See Burn any image to NOR flash.

My boot parameters

bootdelay=1
bootcmd=cp.b 0x2050000 0x80007FC0 0x238FA0;cp.b 0x2300000 0x80900000 0x164040; bootm 0x80007FC0 0x80900000
bootargs=mem=16M console=ttyS0,115200n8 root=/dev/ram0 ro quiet lpj=741376 ip=off init=/bin/ash

All this gives 1 second Boot time (0.97 to be precise)

Further work

Not sure I will be able to spend further time on this. If I end up spending time on it, I would do the following

  • Kernel Unzip optimization
Kernel Unzip takes a lot of time. see results above.
Move Kernel unzipping to u-boot and optimize
  • Get more detailed splitup of "Kernel Initialization" step which is taking 0.38 and optimize.
  • Try initramfs


see also

  1. Boot Time Optimization
  2. Measuring Boot Time
  3. http://elinux.org/Boot_Time
  4. kernel and initramfs initialization in 0.5 sec sample
  5. Booting Linux Network Camera on dm365 in 3.2 (2.5) seconds
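There is also a one-second boot solution from abroad. It appears to be a paid service; they have published a Qt-based demo on an FS platform.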

1 second Linux boot to Qt!

At the end of last year, to demonstrate my company’s swiftBoot service, I put together a rather impressive demo. Using a Renesas MS7724 development board I was able to achieve a one second cold Linux boot to a Qt application. Here’s the demo…

Many people see a demo like this and assume there are ‘smoke and mirrors’ or that we’ve implemented a suspend to disk solution. This is genuinely a cold boot including UBoot (2009-01), Linux kernel (2.6.31-rc7) and Qt Embedded Open Source 4.6.2. We’ve not applied any specific intellectual property but instead spent time analysing where boot delays are coming from and simply optimising them away. The majority of the modifications we make usually fall into the category of ‘removing things that aren’t required’, ‘optimising things that are required’, or ‘taking a new approach to solving problems’ and are tailored very precisely to the needs of the ‘product’.

If you’re interested in exactly what modification I made and a little more about the approach taken – you may be interested in these slides which I presented at ELC-E 2010 – I’m also expecting a video of this presentation to appear on Free Electrons in the near future.

You may also remember my last demo based on an OMAP3530 EVM. [© 2011 embedded-bits.co.uk]

