DSPs are, in general, very I/O-balanced processors: they offer a variety of high-speed serial and parallel peripheral interfaces. Ideally, these interfaces are designed so that they can be operated with little or no overhead on the processor core, leaving enough CPU time for running the OS and processing the incoming or outgoing data.
A Blackfin processor, for example, has multiple flexible and independent Direct Memory Access (DMA) controllers. DMA transfers can occur between the processor’s internal memories and any of its DMA-capable peripherals. Additionally, DMA transfers can be performed between any of the DMA-capable peripherals and external devices connected to the external memory interfaces, including the SDRAM controller and the asynchronous memory controller.
Among other interfaces, the Blackfin processor provides a Parallel Peripheral Interface (PPI) that can connect directly to parallel D/A and A/D converters, ITU-R-601/656 video encoders and decoders, and other general-purpose peripherals, such as CMOS camera sensors. The PPI consists of a dedicated input clock pin, up to 3 frame synchronization pins, and up to 16 data pins.
Figure 1 below shows how easily a CMOS imaging sensor can be wired to a Blackfin processor, without the need for additional active hardware components.
Figure 1: Micron CMOS Camera Sensor wiring diagram
Below is example code for a simple program that reads from a CMOS camera sensor, assuming a PPI driver is compiled into the kernel or loaded as a kernel module. Two different PPI drivers are available: a generic, full-featured driver supporting various PPI operation modes (ppi.c), and a simple PPI Frame Capture Driver (adsp-ppifcd.c). The latter is used here.
The application opens the PPI device driver and performs some I/O controls (ioctls) to set the number of pixels per line and the number of lines to be captured. After the application invokes the read system call, the driver arms the DMA transfer. The PPI peripheral detects the start of a new frame by monitoring the Line-Valid and Frame-Valid strobes.
A particular correlation between the two signals indicates the start of a frame and kicks off the DMA transfer, which captures (pixels per line × number of lines) samples. The DMA engine stores the incoming samples at the address allocated by the application. After the transfer is finished, execution returns to the application.
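The buffer sizing and the blocking read described above can be sketched in C as follows. This is only an illustration under stated assumptions: the file descriptor is assumed to be already opened and configured through the driver's ioctls (the request codes are driver-specific and not shown), the read count is assumed to be in bytes, and the helper names `frame_bytes` and `capture_frame` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Bytes needed for one frame: pixels per line x lines x bytes per sample.
 * Returns 0 if an argument is zero or the product would overflow size_t. */
size_t frame_bytes(size_t pixels_per_line, size_t lines, size_t bytes_per_sample)
{
    if (pixels_per_line == 0 || lines == 0 || bytes_per_sample == 0)
        return 0;
    if (pixels_per_line > SIZE_MAX / lines)
        return 0;
    size_t samples = pixels_per_line * lines;
    if (samples > SIZE_MAX / bytes_per_sample)
        return 0;
    return samples * bytes_per_sample;
}

/* Allocate a frame buffer and issue one blocking read() on an already
 * configured descriptor. With a frame capture driver, this single call is
 * where the DMA transfer would be armed and completed.
 * Assumption: the count passed to read() is in bytes. The unit of the
 * return value is driver-specific, so only errors are checked here.
 * Returns a malloc'ed buffer (caller frees) or NULL on failure. */
void *capture_frame(int fd, size_t pixels_per_line, size_t lines,
                    size_t bytes_per_sample, size_t *out_len)
{
    assert(out_len != NULL);

    size_t len = frame_bytes(pixels_per_line, lines, bytes_per_sample);
    if (len == 0)
        return NULL;

    void *buf = malloc(len);
    if (buf == NULL)
        return NULL;

    if (read(fd, buf, len) < 0) {
        free(buf);
        return NULL;
    }

    *out_len = len;
    return buf;
}
```

For example, a 640 x 480 frame with 2 bytes per sample needs `frame_bytes(640, 480, 2)` = 614400 bytes; the resolution and sample width here are illustrative values, not taken from the sensor in Figure 1.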
The image is then converted into the PNG (Portable Network Graphics) format, using libpng, which is included in the uClinux distribution. The converted image is then written to stdout. Assuming the compiled executable is called readimg, the program can be run with its output redirected to a file, for example: readimg > image.png
Audio, video and still-image silicon products widely use an I2C-compatible Two Wire Interface (TWI) as a system configuration bus. The configuration bus allows a system master to access device-internal configuration registers, such as brightness. Usually, I2C devices are controlled by a kernel driver, but it is also possible to access all devices on an adapter from user space through the /dev interface. The following example shows how to write a value of 0x248 into register 9 of an I2C slave device identified by I2C_DEVID:
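A minimal sketch of such a register write through the Linux i2c-dev interface is given here. It makes several assumptions that are not stated in the text: the register is 16 bits wide and is sent most significant byte first (the byte order is device-specific), the device address is passed in as a parameter rather than hard-coded, and the helper names `pack_reg16` and `i2c_write_reg16` are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/i2c-dev.h>   /* I2C_SLAVE */
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Build the 3-byte message: register index, then the 16-bit value.
 * Assumption: the device expects the high byte first. */
size_t pack_reg16(uint8_t reg, uint16_t value, uint8_t out[3])
{
    assert(out != NULL);
    out[0] = reg;
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value & 0xFF);
    return 3;
}

/* Write a 16-bit value to a register of the slave at 7-bit address devid,
 * using the adapter device node given in adapter_path.
 * Returns 0 on success, -1 on any failure. */
int i2c_write_reg16(const char *adapter_path, int devid,
                    uint8_t reg, uint16_t value)
{
    uint8_t msg[3];
    size_t len = pack_reg16(reg, value, msg);

    int fd = open(adapter_path, O_RDWR);
    if (fd < 0)
        return -1;

    /* Select the slave that subsequent read()/write() calls talk to. */
    if (ioctl(fd, I2C_SLAVE, (long)devid) < 0) {
        close(fd);
        return -1;
    }

    /* One write() becomes one I2C write transaction. */
    ssize_t n = write(fd, msg, len);
    close(fd);
    return (n == (ssize_t)len) ? 0 : -1;
}
```

With these helpers, the write described above would be `i2c_write_reg16("/dev/i2c-0", I2C_DEVID, 9, 0x248)`, where the adapter path `/dev/i2c-0` is an assumed example and I2C_DEVID must be defined for the actual device.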
The power of Linux lies in the vast number of applications released under various open source licenses that can be cross compiled to run on the embedded uClinux system. Cross compiling can sometimes be a little tricky, which is why it is discussed here.
Cross compiling
Linux or UNIX is not a single platform; there is a wide range of choices. Most programs distributed as source code come with a so-called 'configure' script. This is a shell script that must be run to detect the current system configuration, so that the correct compiler switches, library paths and tools are used.
When there isn’t a configure script, the developer can manually modify the Makefile to add target-processor-specific changes, or can integrate the package into the uClinux distribution. Detailed instructions can be found in [18]. The configure script is usually large and takes quite a while to execute. When this script has been created with a recent autoconf release, it will work for Blackfin/uClinux with minor or no modifications.
The configure shell script inside a source package can be executed for cross compilation using the following command line:
CC='bfin-uclinux-gcc -O2 -Wl,-elf2flt' ./configure --host=bfin-uclinux --build=i686-linux
Alternatively:
./configure --host=bfin-uclinux --build=i686-linux LDFLAGS='-Wl,-elf2flt' CFLAGS=-O2
At least two conditions can stop the running script:
(1) some of the files used by the script are too old, or
(2) tools or libraries are missing.
If the supplied scripts are too old to execute properly for bfin-uclinux, or they don't recognize bfin-uclinux as a possible target, the developer needs to replace config.sub with a more recent version (e.g. from an up-to-date gcc source directory). Only in very few cases is cross compiling not supported by the configure.in script, which is written manually by the author and used by autoconf. In this case, the latter file can be modified to remove or change the failing test case.
Network Oscilloscope Demo
The Network Oscilloscope Demo shown in Figure 2 below is one of the sample applications included in the Blackfin/uClinux distribution, alongside the VoIP Linphone application and the Networked Audio Player. The purpose of the Network Oscilloscope project is to demonstrate a simple remote GUI (Graphical User Interface) mechanism for sharing access and data over a TCP/IP network. Furthermore, it demonstrates the integration of several open source projects and libraries as building blocks into a single application.
For instance, gnuplot, a portable, command-line-driven interactive data file and function plotting utility, is used to generate graphical data plots, while thttpd, a CGI (Common Gateway Interface) capable web server, services incoming HTTP requests. CGI is typically used to generate dynamic web pages. It is a simple protocol for communication between web forms and a specified program. A CGI script can be written in any language, including C/C++, that can read stdin, write to stdout, and read environment variables.
The Network Oscilloscope works as follows. A remote web browser contacts the HTTP server running on uClinux, where the CGI script resides, and asks it to run the program. Parameters from the HTML form, such as sample frequency, trigger settings and display options, are passed to the program through the environment. The called program samples data from an externally connected Analog to Digital Converter (ADC) using a Linux device driver (adsp-spiadc.c).
Incoming samples are preprocessed and stored in a file. The CGI program then starts gnuplot as a process and requests that it generate a PNG or JPEG image based on the sampled data and form settings. The web server takes the output of the CGI program and passes it through to the web browser. The web browser displays the output as an HTML page, including the generated image plot.
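How a CGI program written in C picks up form parameters from the environment can be sketched as follows. This is a generic CGI illustration, not the demo's actual source: it assumes the form is submitted with GET, so the parameters arrive in the standard `QUERY_STRING` variable, it does not URL-decode values, and the function names as well as the parameter names `freq` and `trigger` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find "name=value" in a query string such as "freq=1000&trigger=rising"
 * and copy the raw (not URL-decoded) value into out.
 * Returns 0 on success, -1 if the name is absent or the value does not fit. */
int query_param(const char *query, const char *name, char *out, size_t out_len)
{
    assert(name != NULL && out != NULL && out_len > 0);
    if (query == NULL)
        return -1;

    size_t name_len = strlen(name);
    const char *p = query;
    while (*p != '\0') {
        const char *end = strchr(p, '&');
        size_t field_len = end ? (size_t)(end - p) : strlen(p);

        if (field_len > name_len && strncmp(p, name, name_len) == 0 &&
            p[name_len] == '=') {
            size_t val_len = field_len - name_len - 1;
            if (val_len >= out_len)
                return -1;
            memcpy(out, p + name_len + 1, val_len);
            out[val_len] = '\0';
            return 0;
        }

        p += field_len;      /* skip this field ... */
        if (*p == '&')
            p++;             /* ... and its separator */
    }
    return -1;
}

/* Look a parameter up in the environment the web server set for this request. */
int cgi_param(const char *name, char *out, size_t out_len)
{
    return query_param(getenv("QUERY_STRING"), name, out, out_len);
}

/* A CGI response starts with a header block terminated by an empty line. */
int write_cgi_header(FILE *stream, const char *content_type)
{
    return fprintf(stream, "Content-Type: %s\r\n\r\n", content_type);
}
```

A program built on this would typically call `cgi_param("freq", ...)` to read a setting, emit `write_cgi_header(stdout, "text/html")`, and then write the page body to stdout for the web server to forward.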
Figure 2: Network Oscilloscope Demo
Real-time capabilities of uClinux
Since Linux was originally developed for server and desktop usage, it has, like most other operating systems of comparable complexity and size, no hard real-time capabilities. Nevertheless, Linux, and uClinux in particular, has excellent so-called “soft real-time” capabilities. This means that while Linux or uClinux cannot guarantee a specific interrupt or scheduler latency, they show very favorable performance characteristics compared with other operating systems of similar complexity. If one needs a so-called “hard real-time” system that can guarantee scheduler or interrupt latency, there are a few ways to achieve such a goal:
1) Provide the real-time capabilities in the form of an underlying minimal real-time kernel such as RT-Linux (http://www.rtlinux.org) or RTAI (http://www.rtai.org). Both solutions use a small real-time kernel that runs Linux as a real-time task with lower priority. Programs that need predictable real-time behavior are designed to run on the real-time kernel and are specially coded to do so. All other tasks and services run on top of the Linux kernel and can utilize everything that Linux can provide. This approach can guarantee deterministic interrupt latency while preserving the flexibility that Linux provides.
2) Provide the real-time capabilities using Xenomai [19]. Xenomai is a real-time development framework that cooperates with the Linux kernel to provide pervasive, interface-agnostic, hard real-time support to user-space applications, seamlessly integrated into the GNU/Linux environment. It is based on an abstract RTOS core, usable for building any kind of real-time interface, over a nucleus that exports a set of generic RTOS services. Any number of RTOS personalities, called "skins", can then be built over the nucleus, each providing its own specific interface to applications while using the services of a single generic core to implement it. Aside from its own native and POSIX interfaces, Xenomai also provides emulators for the VxWorks, VRTX, pSOS+ and uITRON personalities. People interested in learning more about this project can refer to the online documentation [21].
For the initial Blackfin port, included in Xenomai v2.1 [20], the worst-case scheduling latency observed so far with user-space Xenomai threads on a Blackfin BF533 is slightly lower than 50 us under load, with an expected margin of improvement of 10-20 us in the future.
Xenomai and RTAI use Adeos [22] as an underlying Hardware Abstraction Layer (HAL). Adeos is a real-time enabler for the Linux kernel. To this end, it enables multiple prioritized O/S domains to exist simultaneously on the same hardware, connected through an interrupt pipeline.
Both Xenomai and Adeos have been ported to the Blackfin architecture by Philippe Gerum, who leads both projects. This development has been significantly sponsored by Openwide, a specialist in embedded and real-time solutions for Linux [23].
Nevertheless, in most cases hard real time is not needed, particularly for consumer multimedia applications, in which the time constraints are dictated by the ability of the user to recognize glitches in audio and video. The perceptible constraints that have to be met normally lie in the range of milliseconds, which is no big problem on fast chips like the Blackfin processor. In Linux kernel 2.6.x, the new stable kernel release, those qualities have even been improved with the introduction of the new O(1) scheduler.
Figures 3 and 4 below show the context switch time for a default Linux 2.6.x kernel running on Blackfin/uClinux:
Figure 3
Figure 4
Context switch time was measured with lat_ctx from lmbench. The processes are connected in a ring of Unix pipes. Each process reads a token from its pipe, possibly does some work, and then writes the token to the next process. As the number of processes increases, the effect of the cache diminishes. For 10 processes, the average context switch time is 16.2 us with a standard deviation of 0.58 us; 95% of the time, it is under 17 us.
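The ring-of-pipes topology described above can be illustrated with a short C sketch. It is not lmbench's code and performs no timing; it only shows how a token travels around the ring, with each hop forcing a switch to the next process. It is written for a standard Linux host: it relies on fork(), which is not available on MMU-less uClinux, where vfork() followed by exec would be needed instead. The function names and the increment used as "work" are hypothetical, and error paths are kept minimal.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PROCS 64

/* Child loop: read the token from our pipe, do some "work" (increment it),
 * and write it to the next process in the ring. Never returns. */
static void relay(int in_fd, int out_fd, int rounds)
{
    long token;
    for (int r = 0; r < rounds; r++) {
        if (read(in_fd, &token, sizeof token) != (ssize_t)sizeof token)
            _exit(1);
        token++;
        if (write(out_fd, &token, sizeof token) != (ssize_t)sizeof token)
            _exit(1);
    }
    _exit(0);
}

/* Pass a token `rounds` times around a ring of `nprocs` processes.
 * Process i reads from pipe i and writes to pipe (i + 1) % nprocs; the
 * caller acts as process 0. Returns the final token value, which equals
 * rounds * (nprocs - 1) because every child increments it once per lap,
 * or -1 on failure. */
long ring_pass(int nprocs, int rounds)
{
    assert(nprocs >= 1 && nprocs <= MAX_PROCS && rounds >= 0);

    int fds[MAX_PROCS][2];
    pid_t pids[MAX_PROCS];
    for (int i = 0; i < nprocs; i++)
        if (pipe(fds[i]) != 0)
            return -1;

    for (int i = 1; i < nprocs; i++) {
        pids[i] = fork();
        if (pids[i] < 0)
            return -1;
        if (pids[i] == 0) {
            int in_fd = fds[i][0], out_fd = fds[(i + 1) % nprocs][1];
            for (int j = 0; j < nprocs; j++) {   /* drop unused pipe ends */
                if (fds[j][0] != in_fd)  close(fds[j][0]);
                if (fds[j][1] != out_fd) close(fds[j][1]);
            }
            relay(in_fd, out_fd, rounds);
        }
    }

    int in_fd = fds[0][0], out_fd = fds[1 % nprocs][1];
    for (int j = 0; j < nprocs; j++) {
        if (fds[j][0] != in_fd)  close(fds[j][0]);
        if (fds[j][1] != out_fd) close(fds[j][1]);
    }

    long token = 0, result = 0;
    for (int r = 0; r < rounds; r++) {
        if (write(out_fd, &token, sizeof token) != (ssize_t)sizeof token ||
            read(in_fd, &token, sizeof token) != (ssize_t)sizeof token) {
            result = -1;
            break;
        }
    }
    close(in_fd);
    close(out_fd);
    for (int i = 1; i < nprocs; i++)
        waitpid(pids[i], NULL, 0);

    return result == 0 ? token : -1;
}
```

A benchmark in this style would time many laps of the ring and divide by the number of hops to estimate the cost of a single context switch; that measurement step is deliberately left out here.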
Conclusion
Blackfin processors offer a good price/performance ratio (800 MMAC @ 400 MHz for less than $5/unit in quantities), advanced power management functions, and small mini-BGA packages. This represents a very low-power, cost- and space-efficient solution. The Blackfin’s advanced DSP and multimedia capabilities qualify it not only for audio and video appliances, but also for all kinds of industrial, automotive, and communication devices.
Development tools are well tested and documented, and include everything necessary to get started and finish successfully on time. Another advantage of the Blackfin processor in combination with uClinux is the availability of a wide range of applications, drivers, libraries and protocols, often as open source or free software. In most cases, only basic cross compilation is necessary to get that software up and running.
Combine this with such invaluable tools as Perl, Python, MySQL and PHP, and developers have the opportunity to develop even the most demanding feature-rich applications in a very short time frame, often with enough processing power left for future improvements and new features.
Since obtaining his MSc (Computer Based Engineering) and Dipl-Ing.(FH) (Electronics and Information Technologies) degrees from Reutlingen University, Michael Hennerich has worked as a design engineer on a variety of DSP-based applications. Michael now works as a DSP Applications and Systems Engineer at Analog Devices Inc. in Munich.
This article is excerpted from a paper of the same name presented at the Embedded Systems Conference Silicon Valley 2006. Used with permission of the Embedded Systems Conference. For more information, please visit www.embedded.com/esc/sv.
References
[1] Analog Devices, Inc. Blackfin Processors
[2] uClinux Project Page
[3] The Linux Kernel Archives
[4] The Blackfin/uClinux Project Page
[5] Busybox Project Page
[6] Linuxdevices
[7] Context Switching and IPC Performance Comparison between uClinux and Linux on the ARM9 based Processor, by Hyok-Sung Choi, Hee-Chul Yun
[8] Linux Test Project (LTP)
[9] DejaGnu - GNU Project - Free Software Foundation (FSF)
[10] Blackfin/uClinux Documentation DokuWiki
[11] ADSP-BF537 STAMP Board Support Package (BSP)
[12] GCC Home Page - GNU Project - Free Software Foundation (FSF)
[15] Cooperative Linux Project Page
[16] Das U-Boot - Universal Bootloader Project Page
[17] GCC Code-Size Benchmark Environment (CSiBE), Department of Software Engineering, University of Szeged
[18] Blackfin/uClinux Documentation DokuWiki
[19] Xenomai Project Page
[20] Xenomai Download
[21] Xenomai Documentation
[22] Adeos Project Page
[23] Openwide