Windows CE 6 Memory Architecture

Written by Henrik Viklund

http://www.addlogic.se/articles/articles/windows-ce-6-memory-architecture.html

In short, the new memory model paves the way for a practically unlimited number of running processes and 2 GB of per-process virtual address space. To accommodate this, the old "slot-style" architecture has been dropped in favor of a more full-grown model where only one process is mapped into user space at any one time. The upside of this approach is that since user space is no longer packed with dormant processes, the running process can use a lot more virtual memory than before. The downside is primarily inter-process communication, where mapping memory across processes becomes much more complex and carries a performance penalty.

The Legacy

Before we go into the details of the new architecture, let's recap what CE 5's memory model looks like. I won't go into great detail, since there's plenty of information available on CE 5's memory architecture already:

Windows CE 5 Virtual Memory Model

The user space of CE 5 is divided into 32 process slots, each occupying 32 MB of virtual memory, and a shared memory area in which any process can allocate virtual memory. This architecture actually has its roots in the very first versions of Windows CE, designed well over 10 years ago. By restricting the size and maximum number of processes, all processes can be mapped in user space all the time, greatly reducing the complexity and overhead of the kernel.
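To make the "32/32" numbers concrete, here is a minimal sketch of the slot arithmetic. The helper names are made up for illustration, and the sketch assumes the slots are packed back to back starting at address 0, which is a simplification of the real CE 5 layout.

```c
#include <assert.h>
#include <stdint.h>

/* Figures taken from the text: 32 slots of 32 MB each. */
#define CE5_SLOT_SIZE   (32u * 1024u * 1024u)   /* 32 MB = 0x02000000 */
#define CE5_SLOT_COUNT  32u

/* Hypothetical helper: base address of slot `index`, assuming the slots
 * are laid out contiguously from address 0 (a simplification). */
static uint32_t ce5_slot_base(uint32_t index)
{
    assert(index < CE5_SLOT_COUNT);
    return index * CE5_SLOT_SIZE;
}

/* Total address range consumed by all process slots together. */
static uint32_t ce5_total_slot_space(void)
{
    return CE5_SLOT_COUNT * CE5_SLOT_SIZE;
}
```

With these numbers, ce5_total_slot_space() comes out at 0x40000000, so the slots alone account for 1 GB of user space no matter how few processes are actually running.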

Back in those days, 32 MB was considered plenty of virtual memory for a process, and the ability to run 32 processes simultaneously on an embedded device was more than enough for most device manufacturers. In short, it was a fair compromise between performance and flexibility.

Although this architecture bordered on overkill at the time the "32/32" kernel was first designed and the first CE devices were rolled out, the strain today's typical Windows Mobile devices put on the OS in terms of virtual memory is really at the very limit of what this architecture can handle. So, with the release of Windows CE 6, this memory model has been replaced with one that's better suited to handle the ever-increasing complexity of tomorrow's smart devices.

The CE 6 Memory Model in detail

OK, so what about this new memory model? Well, while the memory is still divided into 2 GB of kernel space and 2 GB of user space, that's about the only thing the CE 5 and CE 6 memory models have in common.

In CE 6, each process is given 1 GB of virtual memory for itself. This is possible because the whole 1 GB process memory space is switched out when switching processes (as opposed to CE 5 and earlier, where all the processes are accessible in user space at all times). Since processes no longer share the user space with each other, the maximum number of running processes is no longer limited by how many process slots you can cram into the user space. While CE 6 marketing talks about being able to run over 32,000 processes, the real limit is set by the available physical memory. The amount of physical memory that needs to be allocated for each running process will cause us to run out of memory long before we reach 32k processes. But let's just say the new kernel and memory model are no longer the limiting factor when it comes to running processes.

User Space

Now, let’s take a closer look at how the virtual memory is partitioned in CE 6:

Windows CE 6 User Space

Starting at the bottom and reaching up to the 1 GB mark, we find the process space mentioned above. At the 1 GB mark, we find 512 MB for use by DLLs, and above that, we have 256 MB for RAM-backed memory-mapped files. At the top, we have 255 MB for a shared system heap, leaving 1 MB of protective unmapped space between user space and kernel space.
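The layout is easy to check with a bit of arithmetic. Below is a rough sketch that stacks the regions bottom-up using the sizes given above and looks up which region an address falls in. The type, table, and function names are invented for this example, and the process space is treated as starting at address 0, as in the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MB 0x00100000u

typedef struct {
    const char *name;   /* illustrative label, not an official name */
    uint32_t    size;   /* size in bytes */
} region_t;

/* Region sizes as given in the text, listed bottom-up. */
static const region_t ce6_user_regions[] = {
    { "process space",           1024u * MB },
    { "user DLLs",                512u * MB },
    { "RAM-backed mapped files",  256u * MB },
    { "shared system heap",       255u * MB },
    { "unmapped guard",             1u * MB },
};

#define CE6_USER_REGION_COUNT \
    (sizeof ce6_user_regions / sizeof ce6_user_regions[0])

/* Returns the region containing addr, or NULL if addr lies above
 * user space (that is, at or beyond the 2 GB mark). */
static const region_t *ce6_user_region_of(uint32_t addr)
{
    uint32_t base = 0;
    for (size_t i = 0; i < CE6_USER_REGION_COUNT; ++i) {
        uint32_t size = ce6_user_regions[i].size;
        if (addr - base < size)      /* addr >= base always holds here */
            return &ce6_user_regions[i];
        base += size;
    }
    return NULL;
}
```

The five sizes add up to exactly 2048 MB, so the guard region ends right where kernel space begins. For example, ce6_user_region_of(0x40000000) returns the "user DLLs" entry, since that address is the 1 GB mark.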

Now, the 1 GB process space will map the process's executable code and data, its VM allocations, and any file-backed memory-mapped files. In the DLL area, the DLLs will have the same mapping across processes for optimization reasons. While the code pages are shared across processes, the data pages in this area will be unique physical pages for each process. The DLLs are now loaded from the bottom and grow up, as opposed to the stack-style, top-down loading utilized in CE 5 and before.

The RAM-backed memory-mapped file area is mapped at a fixed location in user space. This is done for backwards-compatibility reasons. Finally, the shared system heap is an area where OS components and processes can exchange data in a safe way. While OS components have read and write permissions to this area, processes running in user space can be given read-only or read-write access.

Kernel Space

The memory model for kernel space has also changed:

Windows CE 6 Kernel Space

While it looks pretty familiar (the two 512 MB areas for cached and uncached access to physical memory haven't changed, and the trap space is still up there on top of the pile), there are some important changes that need to be pointed out. First of all, there is now a 128 MB area for kernel-space XIP DLLs directly above the 1 GB mark of kernel space, and above that, 128 MB is reserved for use by the Object Store. Immediately above this, there are two 256 MB slots for kernel virtual memory.
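If you want absolute addresses rather than sizes, they fall out of a running sum. Here is a small sketch under a couple of assumptions: kernel space is taken to start at the 2 GB mark (0x80000000), and the cached area is placed below the uncached one. The enum and function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MB          0x00100000u
#define KERNEL_BASE 0x80000000u   /* assumed: kernel space starts at 2 GB */

/* Regions bottom-up, in the order described in the text.
 * Cached-below-uncached is an assumption of this sketch. */
enum {
    K_CACHED, K_UNCACHED, K_XIP_DLLS, K_OBJECT_STORE,
    K_VM_0, K_VM_1, K_REGION_COUNT
};

static const uint32_t ce6_kernel_sizes[K_REGION_COUNT] = {
    512u * MB, 512u * MB, 128u * MB, 128u * MB, 256u * MB, 256u * MB
};

/* Base address of a region: the kernel base plus the sizes of all
 * regions below it. Passing K_REGION_COUNT yields the first address
 * above the regions listed here. */
static uint32_t ce6_kernel_region_base(int region)
{
    assert(region >= 0 && region <= K_REGION_COUNT);
    uint32_t base = KERNEL_BASE;
    for (int i = 0; i < region; ++i)
        base += ce6_kernel_sizes[i];
    return base;
}
```

Under these assumptions the XIP DLL area starts at 0xC0000000 and the Object Store at 0xC8000000. The listed regions end at 0xF0000000, which leaves the top 256 MB of the address space for whatever sits at the very top, the trap space included.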

Now, what on earth are XIP DLLs and the Object Store doing in kernel space? Well, MS didn't stop with just changing the virtual memory model; they have also revised the OS layout. In CE 6, most drivers, the file system, and GWES run in kernel space! I say most drivers, because you can actually still choose to run drivers in user space. There are a couple of reasons for moving the three servers into the kernel. One reason is to minimize the overhead involved in making inter-process calls: if they run in kernel space, there is less process switching to do than if they ran in user space. We will look closer at the new kernel structure and OS layout in another article.

Pointer Marshalling in CE 6

Having 1 GB of virtual memory for each process and 512 MB for DLLs is certainly not a bad thing, but abandoning the slot architecture also means greater overhead for certain operations. While switching the process's virtual memory map in and out might seem like a pretty resource-intensive task for the kernel, this switch is actually a pretty simple operation. The real performance penalty comes when data needs to be accessed across process boundaries.

Let's say you want to pass a pointer to a chunk of data from your application to a service. The service runs in another process in user space. If you pass a pointer to a service in CE 5, the kernel automatically checks that the calling process has permissions to the memory pointed to, and then remaps the pointer to the calling process's slot. This is really done with just some trivial arithmetic on the pointer, since all processes are accessible in user space at all times in CE 5.
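To give a feel for how trivial that arithmetic is, here is a simplified model of the remapping. It assumes that a process sees its own memory through a low alias window one slot in size, so a pointer inside that window only needs the owning slot's base added to become valid from any process. Both that assumption and the function name are mine, not part of any CE 5 API.

```c
#include <assert.h>
#include <stdint.h>

#define CE5_SLOT_SIZE 0x02000000u   /* 32 MB per slot, from the text */

/* Hypothetical model of CE 5 pointer remapping.
 * Addresses below CE5_SLOT_SIZE are treated as relative to whichever
 * process is current, so they are rebased onto the owner's slot.
 * Addresses outside that window are assumed to be absolute already. */
static uint32_t ce5_map_ptr(uint32_t ptr, uint32_t owner_slot_base)
{
    assert(owner_slot_base % CE5_SLOT_SIZE == 0);   /* slot-aligned base */
    if (ptr < CE5_SLOT_SIZE)
        return ptr + owner_slot_base;
    return ptr;
}
```

In this model, a caller living in the slot at 0x06000000 that passes the pointer 0x00012345 ends up handing the service 0x06012345: one comparison and one addition, with no page tables touched.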

In CE 6, however, things get more complicated because processes are shifted in and out of user space by the kernel. Only the currently running process's memory is mapped in user space.

Consider the service example above, but applied to CE 6. Here, the pointer passed to the service is only valid while the process that passed it is actually executing, and thus is loaded in user space. As soon as another process is switched in, the pointer becomes invalid. To handle this, the CE 6 kernel implements a mechanism called marshalling. When marshalling, the kernel may actually need to map part of the caller process's memory into the memory space of the server process. So, because there's work involved in figuring out how to marshal a pointer, it has an impact on performance. But that's the price you pay for having 1 GB of process space and support for a practically unlimited number of processes.
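The extra work can be illustrated with a toy model in portable C. This is not how the CE 6 kernel does it: two byte arrays stand in for the two processes' private address spaces, and the buffer is duplicated with a copy-in/copy-out step instead of being mapped. All names here are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SPACE_SIZE 256u

/* Toy stand-in for a process: a private address space of SPACE_SIZE bytes. */
typedef struct {
    unsigned char mem[SPACE_SIZE];
} toy_process_t;

typedef void (*toy_service_fn)(unsigned char *buf, size_t len);

/* Calls `service` on the caller's buffer at [off, off + len).
 * The server cannot see caller memory, so the data is copied into the
 * server's space, processed there, and copied back afterwards.
 * Returns 0 on success, -1 if the range is not inside the caller's space. */
static int toy_call_with_marshalling(toy_process_t *caller, size_t off,
                                     size_t len, toy_process_t *server,
                                     toy_service_fn service)
{
    assert(service != NULL);
    if (len > SPACE_SIZE || off > SPACE_SIZE - len)
        return -1;                                  /* access check failed */
    memcpy(server->mem, caller->mem + off, len);    /* copy in */
    service(server->mem, len);                      /* server uses its copy */
    memcpy(caller->mem + off, server->mem, len);    /* copy out */
    return 0;
}

/* Example service: upper-cases ASCII letters in place. */
static void toy_upper(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        if (buf[i] >= 'a' && buf[i] <= 'z')
            buf[i] = (unsigned char)(buf[i] - 'a' + 'A');
}
```

Compared with the single addition that sufficed in CE 5, every call here pays for a range check plus two passes over the buffer, which is the kind of overhead the marshalling step introduces.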

While a lot of the tricky stuff is automatically handled by the kernel, there are situations where you need to do the marshalling yourself. I’ll take a closer look at pointer marshalling and memory security in another article, so I won’t go into any details here.
