Mapping memory efficiently

Dan Saks

10/20/2004 3:00 PM EDT

How you define pointers to memory-mapped device registers can have an impact on the efficiency of your device drivers.

Many processors use memory-mapped I/O, which maps device registers to fixed addresses in the conventional memory space. To a C or C++ programmer, a memory-mapped device register looks very much like an ordinary data object. Programs can use ordinary assignment operators to move values to or from memory-mapped device registers.

Unfortunately, neither Standard C nor C++ lets you declare a variable to reside at a specified address. (A few compilers have extensions that will let you do it, but most don't.) In Standard C and C++, you typically access a memory-mapped device register by dereferencing a pointer whose value is the register's address.

As I discussed in my last column, you can define a pointer to a memory-mapped device register either as a macro or as a constant object. These alternative definitions are often interchangeable, but they actually have slightly different behavior. After briefly reviewing those differences, I'll explain why they also might generate slightly different machine code.

A brief recap with a correction
In my previous column, I sketched a small example of memory-mapped I/O using the ARM Evaluator-7T single-board computer. The board's documentation refers to the device registers as special registers, so I do, too.

The Evaluator-7T's memory is byte-addressable, but each special register occupies a four-byte word. Special registers are also volatile, so I defined the type for special registers as:

typedef unsigned int volatile special_register;

The Evaluator-7T uses five special registers to control the two integrated timers:

  • TMOD: timer mode register
  • TDATA0: timer 0 data register
  • TDATA1: timer 1 data register
  • TCNT0: timer 0 count register
  • TCNT1: timer 1 count register
which I represented as a struct defined as:

 

typedef struct dual_timers dual_timers;
struct dual_timers
    {
    special_register TMOD;
    special_register TDATA0;
    special_register TDATA1;
    special_register TCNT0;
    special_register TCNT1;
    };

The timer registers on the Evaluator-7T reside at address 0x03FF6000. A program can access the timer registers via a pointer defined as a macro, as in:

#define timers ((dual_timers *)0x03FF6000)

or as a constant object, as in:

 

dual_timers *const timers = (dual_timers *)0x03FF6000;

The TMOD register contains bits that you can set to enable a timer and clear to disable a timer. You can define the masks for those bits as enumeration constants:

enum { TE0 = 0x01, TE1 = 0x08 };

Then you can disable both timers using:

timers->TMOD &= ~(TE0 | TE1);

In my last column, I inadvertently wrote this expression with & instead of |, as in:

timers->TMOD &= ~(TE0 & TE1); //incorrect

I apologize for the error. Thanks to Ralf Holly for being the first to bring this to my attention.
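The difference between the two masks is easy to check on a host machine. TE0 and TE1 share no bits, so TE0 & TE1 is 0, ~0 is all ones, and the &= leaves TMOD unchanged. The following is a minimal sketch, assuming an ordinary struct in RAM stands in for the hardware, since 0x03FF6000 is not mapped on a desktop system. The names fake_timers, disable_with_or, and disable_with_and are hypothetical and exist only for this illustration.

```c
#include <assert.h>

typedef unsigned int volatile special_register;

typedef struct dual_timers dual_timers;
struct dual_timers
    {
    special_register TMOD;
    special_register TDATA0;
    special_register TDATA1;
    special_register TCNT0;
    special_register TCNT1;
    };

enum { TE0 = 0x01, TE1 = 0x08 };

/* Hypothetical stand-in for the hardware: an ordinary object in RAM,
   so the code can run where 0x03FF6000 isn't mapped. */
static dual_timers fake_timers;

/* Correct: the mask is ~0x09, so both enable bits are cleared. */
static unsigned int disable_with_or(dual_timers *t)
    {
    assert(t != 0);
    t->TMOD &= ~(TE0 | TE1);
    return t->TMOD;
    }

/* Incorrect: TE0 & TE1 is 0, so the mask is ~0 and TMOD is unchanged. */
static unsigned int disable_with_and(dual_timers *t)
    {
    assert(t != 0);
    t->TMOD &= ~(TE0 & TE1);
    return t->TMOD;
    }
```

Starting from a TMOD value of 0x0F, disable_with_and(&fake_timers) leaves 0x0F in place, while disable_with_or(&fake_timers) clears bits 0 and 3 and leaves 0x06.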

Defining the pointer as a macro has a couple of drawbacks:

  • With many development platforms, macro names are invisible to the debugger.
  • Macro names don't observe the scope rules that apply to other names, and so they might substitute in places where you don't expect them to.
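The scope problem can be sketched with a small example that has nothing to do with timers. The names scale, factor, first, second, third, and check_macro_scope are hypothetical. The macro is written inside one function body, yet it still expands inside the next function, whereas a constant object declared in a block is visible only in that block.

```c
#include <assert.h>

static int first(void)
    {
    /* Written inside a function body, but a macro has no scope: it stays
       defined until an #undef or the end of the translation unit. */
    #define scale 10
    return scale;
    }

static int second(void)
    {
    return scale * 2;       /* still expands to 10 * 2 here */
    }

static int third(void)
    {
    int const factor = 10;  /* an object: its name ends with this block */
    return factor;
    }

/* Exercises the three functions; call it from main. */
static void check_macro_scope(void)
    {
    assert(first() == 10);
    assert(second() == 20);
    assert(third() == 10);
    }
```

By the same rule, once timers is defined as a macro, a later declaration such as int timers; expands to int ((dual_timers *)0x03FF6000); and fails to compile, even if it appears in an unrelated function or struct.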

Declaring timers as a constant pointer avoids both of these problems. On the other hand, with some compilers on some platforms, declaring timers as a constant pointer might produce slightly larger and/or slower code. Here's why.

The theory
When timers is a macro, the compiler transforms:

timers->TMOD &= ~(TE0 | TE1);

into:

((dual_timers *)0x03FF6000)->TMOD &= ~(TE0 | TE1);

Member TMOD is a special_register at offset zero within the dual_timers struct, so this expression stores a value into location 0x03FF6000, just as if you had written:

(*(special_register *)0x03FF6000) &= ~(TE0 | TE1);
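The claim that TMOD sits at offset zero can be checked with the standard offsetof macro. This is a small sketch, with hypothetical helper names tmod_address, tdata0_address, and check_offsets. It computes the address that a member access would touch for a given base address, without dereferencing anything, so it is safe to run on a host.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int volatile special_register;

typedef struct dual_timers dual_timers;
struct dual_timers
    {
    special_register TMOD;
    special_register TDATA0;
    special_register TDATA1;
    special_register TCNT0;
    special_register TCNT1;
    };

/* Address that base->TMOD would touch for a pointer whose value is base. */
static unsigned long tmod_address(unsigned long base)
    {
    return base + (unsigned long)offsetof(dual_timers, TMOD);
    }

/* Address that base->TDATA0 would touch. */
static unsigned long tdata0_address(unsigned long base)
    {
    return base + (unsigned long)offsetof(dual_timers, TDATA0);
    }

/* Call from main to confirm the layout. */
static void check_offsets(void)
    {
    /* C guarantees that there is no padding before the first member. */
    assert(offsetof(dual_timers, TMOD) == 0);
    assert(tmod_address(0x03FF6000ul) == 0x03FF6000ul);
    }
```

The offset of TMOD is guaranteed by the language. The offset of TDATA0 equals sizeof(special_register) only if the compiler inserts no padding between members, which is the usual case for a struct of same-sized words but is an assumption here.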

The subexpression TE0 | TE1 yields the value 9, so the entire expression simplifies to:

(*(special_register *)0x03FF6000) &= ~9;

The expression (special_register *)0x03FF6000 is an rvalue. So is ~9. As I explained in an earlier column, an lvalue is an expression that designates an object. An rvalue is an expression that's not an lvalue; it doesn't refer to an object. In practice, it's not that an rvalue can't refer to an object—it's just that an rvalue doesn't necessarily refer to an object. By assuming that rvalues do not refer to objects, C and C++ compilers gain considerable freedom in generating code for rvalue expressions.

For example, a compiler might generate data storage with compiler-generated names to hold the values 0x03FF6000 and ~9 as if they were lvalues, as in:

T1: word 0x03FF6000
T2: word ~9

It would then generate code that uses T1 and T2 to evaluate the expression. In some make-believe assembly language, this might look like:

mov r0, T1 ; r0 points to TMOD
mov r1, *r0 ; read TMOD into r1
and r1, T2 ; bitwise-and ~9 into r1
mov *r0, r1 ; write r1 back to TMOD

Many machines provide instructions with immediate operand addressing, in which the operand can be part of the instruction rather than separate data. The compiler might use immediate mode operands for both of the rvalues, in which case the assembly code might look like:

mov r0, #0x03FF6000
mov r1, *r0
and r1, #-10
mov *r0, r1

or maybe even just:

and 0x03FF6000, #-10

In this case, the rvalues never appear as objects in the data space. Rather, they appear as part of the instruction(s) in the code space.
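The immediate operand #-10 in the assembly sketches above is simply ~9 as a signed value on a two's-complement machine such as the ARM. A short check, with hypothetical function names mask_as_int, mask_low_byte, and check_mask, confirms the arithmetic:

```c
#include <assert.h>

enum { TE0 = 0x01, TE1 = 0x08 };

/* Enumeration constants have type int, so the mask is a signed value. */
static int mask_as_int(void)
    {
    return ~(TE0 | TE1);
    }

/* The same mask converted to unsigned, reduced to its low eight bits. */
static unsigned int mask_low_byte(void)
    {
    return (unsigned int)~(TE0 | TE1) & 0xFFu;
    }

/* Call from main to confirm the values. */
static void check_mask(void)
    {
    assert((TE0 | TE1) == 9);
    assert(mask_as_int() == -10);       /* two's complement: ~9 == -10 */
    assert(mask_low_byte() == 0xF6u);   /* every bit set except 0 and 3 */
    }
```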

When timers is a constant object, the expression timers is an lvalue. That is, it designates an object. In C, constant objects—in fact, all objects—declared at global scope have external linkage by default. That is, they behave as if they had been declared with the keyword extern, as in:

 

extern dual_timers *const timers = (dual_timers *)0x03FF6000;

This means that references to timers may appear in other translation units and a C compiler must generate storage for timers just in case such external references exist. In theory, the linker might be able to determine that no external references exist and eliminate the storage for timers, but I don't know of a linker that does.
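One way to see that the constant pointer really is an object is to take its address. The sketch below uses hypothetical helper names where_timers_lives and check_timers_is_an_object. It never dereferences the register address itself, so it runs safely on a host; it only reads the pointer object.

```c
#include <assert.h>

typedef unsigned int volatile special_register;

typedef struct dual_timers dual_timers;
struct dual_timers
    {
    special_register TMOD;
    special_register TDATA0;
    special_register TDATA1;
    special_register TCNT0;
    special_register TCNT1;
    };

/* Constant pointer object at global scope: external linkage in C. */
dual_timers *const timers = (dual_timers *)0x03FF6000;

/* Legal only because timers is an lvalue with storage of its own.
   With the macro definition, &timers would become
   &((dual_timers *)0x03FF6000), which applies & to an rvalue
   and does not compile. */
static dual_timers *const *where_timers_lives(void)
    {
    return &timers;
    }

/* Call from main; reads the pointer object, never the registers. */
static void check_timers_is_an_object(void)
    {
    dual_timers *const *p = where_timers_lives();
    assert(p == &timers);
    assert(*p == (dual_timers *)0x03FF6000);
    }
```

Code in this or any other translation unit can form such an address, which is one reason the compiler has to reserve storage for the pointer.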

In short, if you declare timers as a global constant object, a C compiler will almost certainly generate storage for that pointer. However, this doesn't mean that the code that evaluates an expression such as:

timers->TMOD &= ~(TE0 | TE1);

must use the constant pointer object. It might just use the pointer's value as an immediate operand and ignore the generated constant. In that case, the generated constant is just wasted space.

In C++, constant objects declared at global scope (or any namespace scope) have internal linkage by default. That is, they behave as if they had been declared with the keyword static, as in:

 

static dual_timers *const timers = (dual_timers *)0x03FF6000;

This means that all references to timers must appear in the same translation unit as the definition for timers. In that case, the compiler might be able to determine that it doesn't need to generate the storage for the constant pointer. A C compiler should also be able to eliminate the storage for the constant pointer if you define it with the keyword static. Both C and C++ can also eliminate the constant pointer's storage if you define it local to a function.
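For completeness, here is a sketch of the function-local form mentioned above. The function names disable_both_timers, timers_address, and check_local_pointer are hypothetical. Only timers_address is safe to call on a host, because disable_both_timers dereferences the fixed address and is meaningful only on the target board.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned int volatile special_register;

typedef struct dual_timers dual_timers;
struct dual_timers
    {
    special_register TMOD;
    special_register TDATA0;
    special_register TDATA1;
    special_register TCNT0;
    special_register TCNT1;
    };

enum { TE0 = 0x01, TE1 = 0x08 };

/* Target only: writes to the memory-mapped TMOD register. */
void disable_both_timers(void)
    {
    /* Local constant pointer: it has no linkage, so nothing outside this
       function can refer to it, and the compiler is free to fold its
       value into the instructions instead of allocating storage. */
    dual_timers *const timers = (dual_timers *)0x03FF6000;
    timers->TMOD &= ~(TE0 | TE1);
    }

/* Host-safe: reports the pointer's value without dereferencing it. */
uintptr_t timers_address(void)
    {
    dual_timers *const timers = (dual_timers *)0x03FF6000;
    return (uintptr_t)timers;
    }

/* Call from main on a host; it never touches the registers. */
void check_local_pointer(void)
    {
    assert(timers_address() == (uintptr_t)0x03FF6000);
    }
```

Whether a particular compiler actually eliminates the storage is something to confirm in its generated assembly; the sketch only shows where the definition goes.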

The practice
I wrote a number of small programs to test if real compilers generated code for memory-mapped I/O as I just described. I looked at the generated assembly code and found that, for the most part, the compilers behaved as I expected. However, I did encounter a few surprises.

Listing 1: A little test to see how the compiler generates code to access memory-mapped device registers

typedef unsigned int volatile special_register;

typedef struct dual_timers dual_timers;
struct dual_timers
    {
    special_register TMOD;
    special_register TDATA0;
    special_register TDATA1;
    special_register TCNT0;
    special_register TCNT1;
    };

enum { TE0 = 0x01, TE1 = 0x08 };

#define timers ((dual_timers *)0x03FF6000)


int main()
	{
	timers->TMOD &= ~(TE0 | TE1);
	timers->TDATA0 = 50000;
	return 0;
	}

The first test program appears in Listing 1. It defines timers as a macro and uses it to access a couple of the timer registers. I compiled it as both C and C++ using four different compilers. In all cases, the compiler generated code that used immediate operands for the pointer values, and did not generate a copy of the constant pointer in the data space. No surprises there.

I then replaced the macro with a constant pointer:

 

dual_timers *const timers = (dual_timers *)0x03FF6000;

First, I compiled the code as C. As I expected, all of the compilers generated a copy of the constant pointer in the data space. Two of the compilers generated code that used the constant pointer. The other two compilers generated code that ignored the constant pointer and used immediate operands instead. When I compiled the same code as C++, every compiler generated code that eliminated the constant pointer and used immediate operands. Again, no surprises.

Next, I added the keyword static to the constant pointer definition:

 

static dual_timers *const timers = (dual_timers *)0x03FF6000;

Since static is the default for global constant pointers in C++, I expected this change to have no impact on the generated code. Indeed, for all but one of the C++ compilers, this had no impact. However, one compiler generated data storage for the constant pointer, with internal linkage, and used that pointer in the code that accesses the memory-mapped device registers.

I compiled that code as C expecting each compiler to produce the same code as it did when compiling as C++. Only one compiler did. The others generated slightly poorer code. One compiler generated code that used immediate operands but generated an unused copy of the constant pointer with internal linkage. The other two compilers generated code that used a constant pointer rather than immediate operands. I was surprised that so many C compilers failed to eliminate the data storage for constant pointers that they really didn't need.

So what's the bottom line? All other things being equal, you should avoid macros and use constant pointer objects. Unfortunately, other things aren't equal. My little study suggests that, while C++ compilers can produce equally good code when using constant pointer objects instead of macros, C compilers struggle with constant pointers and often produce non-optimal code. In the grand scheme of things, the loss of storage economy is probably too small to worry about, but you should be aware of it.

I encourage you to run these tests on your compiler(s). I'd be interested to see if you get different results.

Finding past columns
Most of my columns, including this one, contain references to earlier columns. You can find my recent past columns at http://www.embedded.com/columns/pp, but only back about three or four years. I recently brought my website, www.dansaks.com, on line. As I find time, I will post older columns on my site. If you go there and can't find what you're looking for, send me your request by e-mail and I'll see what I can do.

Dan Saks is president of Saks & Associates, a C/C++ training and consulting company.
