SMP

Introduction
Booting an x86 SMP system
Booting a PowerPC SMP system
How the SMP microkernel works
Critical sections

Introduction

SMP (Symmetrical Multi-Processing) is typically associated with high-end operating systems such as UNIX and NT running on high-end servers. These large monolithic systems tend to be quite complex, the result of many man-years of development. Since these large kernels contain the bulk of all OS services, the changes to support SMP are extensive, usually requiring large numbers of modifications and the use of specialized spinlocks throughout the code.

QNX Neutrino, on the other hand, contains a very small microkernel surrounded by processes that act as resource managers, providing services such as filesystems, character I/O, and networking. By modifying the microkernel alone, all other OS services will gain full advantage of SMP without the need for coding changes. If these service-providing processes are multi-threaded, their many threads will be scheduled among the available processors. Even a single-threaded server would also benefit from an SMP system, because its thread would be scheduled on the available processors beside other servers and client processes.

SMP versions of `procnto*`

As a testament to this microkernel approach, the SMP-enabled QNX Neutrino kernel/process manager adds only a few kilobytes of additional code. The SMP versions are designed for these main processor families:

PowerPC (e.g. procnto-600-smp).
MIPS (procnto-smp)
x86 (procnto-smp)

The x86 version can boot on any system that conforms to the Intel MultiProcessor Specification (MP Spec) with up to eight Pentium (or better) processors. QNX Neutrino also supports Intel's new Hyper-Threading Technology found in P4 and Xeon processors.

The procnto-smp manager will also function on a single non-SMP system. With the cost of building a dual-processor Pentium motherboard very nearly the same as a single-processor motherboard, it's possible to deliver cost-effective solutions that can be scaled in the field by the simple addition of a second CPU. The fact that the OS itself is only a few kilobytes larger also allows SMP to be seriously considered for small CPU-intensive embedded systems, not just high-end servers.

The PowerPC and MIPS versions of the SMP-enabled kernel deliver full SMP support (e.g. cache-coherency, inter-processor interrupts, etc.) on appropriate PPC and MIPS hardware. The PPC version supports any SMP system with 7xx or 74xx series processors, as in such reference design platforms as the Motorola MVP or the Marvell EV-64260-2XMPC7450 SMP Development System. The MIPS version supports such systems as the dual-core Broadcom BCM1250 processor.

Booting an x86 SMP system

The microkernel itself contains very little hardware- or system-specific code. The code that determines the capabilities of the system is isolated in a startup program, which is responsible for initializing the system, determining available memory, etc. Information gathered is placed into a memory table available to the microkernel and to all processes (on a read-only basis).

The startup-bios program is designed to work on systems compatible with the Intel MP Spec (version 1.4 or later). This startup program is responsible for:

determining the number of processors
determining the address of the local and I/O APIC
initializing each additional processor.

After reset, only one processor will be executing the reset code. This processor is called the boot processor (BP). For each additional processor found, the BP running the startup-bios code will:

initialize the processor
switch it to 32-bit protected mode
allocate the processor its own page directory
set the processor spinning with interrupts disabled, waiting to be released by the kernel.

Booting a PowerPC or MIPS SMP system

On a PPC or MIPS SMP system, the boot sequence is similar to that of an x86, but a specific startup program (e.g. startup-mvp, startup-bcm1250) will be used instead. Specifically, the PPC-specific startup is responsible for:

determining the number of processors
initializing each additional processor
initializing the IRQ, IPI, system controller, etc.

For each additional processor found, the startup code will:

initialize the processor
initialize the MMU
initialize the caches
set the processor spinning with interrupts disabled, waiting to be released by the kernel.

How the SMP microkernel works

Once the additional processors have been released and are running, all processors are considered peers for the scheduling of threads.

Scheduling

The scheduling algorithm follows the same rules as on a uniprocessor system. That is, the highest-priority thread will be running on the available processor. If a new thread becomes ready to run as the highest-priority thread in the system, it will be dispatched to the appropriate processor. If more than one processor is selected as a potential target, then the microkernel will try to dispatch the thread to the processor where it last ran. This affinity is used as an attempt to reduce thread migration from one processor to another, which can affect cache performance.

In an SMP system, the scheduler has some flexibility in deciding exactly how to schedule low-priority threads, with an eye towards optimizing cache usage and minimizing thread migration. In any case, the realtime scheduling rules that were in place on a uniprocessor system are guaranteed to be upheld on an SMP system.

Hard processor affinity

QNX Neutrino also supports the concept of hard processor affinity through the kernel call ThreadCtl(_NTO_TCTL_RUNMASK, runmask). Each set bit in runmask represents a processor that a thread can run on. By default, a thread's runmask is set to all ones, allowing it to run on any processor. A value of 0x01 would allow a thread to execute only on the first processor. By careful use of this primitive, a systems designer can further optimize the runtime performance of a system (e.g. by relegating nonrealtime processes to a specific processor). In general, however, this shouldn't be necessary, because our realtime scheduler will always preempt a lower-priority thread immediately when a higher-priority thread becomes ready. Processor locking will likely affect only the efficiency of the cache, since threads can be prevented from migrating.

Kernel locking

In a uniprocessor system, only one thread is allowed to execute within the microkernel at a time. Most kernel operations are short in duration (typically a few microseconds on a Pentium-class processor). The microkernel is also designed to be completely preemptable and restartable for those operations that take more time. This design keeps the microkernel lean and fast without the need for large numbers of fine-grained locks. It is interesting to note that placing many locks in the main code path through a kernel will noticeably slow the kernel down. Each lock typically involves processor bus transactions, which can cause processor stalls.

In an SMP system, QNX Neutrino maintains this philosophy of only one thread in a preemptable and restartable kernel. The microkernel may be entered on any processor, but only one processor will be granted access at a time.

For most systems, the time spent in the microkernel represents only a small fraction of the processor's workload. Therefore, while conflicts will occur, they should be more the exception than the norm. This is especially true for a microkernel where traditional OS services like filesystems are separate processes and not part of the kernel itself.

Inter-processor interrupts (IPIs)

The processors communicate with each other through IPIs (inter-processor interrupts). IPIs can effectively schedule and control threads over multiple processors. For example, an IPI to another processor is often needed when:

a higher-priority thread becomes ready
a thread running on another processor is hit with a signal
a thread running on another processor is canceled
a thread running on another processor is destroyed.

Critical sections

To control access to data structures that are shared between them, threads and processes use the standard POSIX primitives of mutexes, condvars, and semaphores. These work without change in an SMP system.

Many realtime systems also need to protect access to shared data structures between an interrupt handler and the thread that owns the handler. The traditional POSIX primitives used between threads aren't available for use by an interrupt handler. There are two solutions here:

One is to remove all work from the interrupt handler and do all the work at thread time instead. Given our fast thread scheduling, this is a very viable solution.
In a uniprocessor system running QNX Neutrino, an interrupt handler may preempt a thread, but a thread will never preempt an interrupt handler. This allows the thread to protect itself from the interrupt handler by disabling and enabling interrupts for very brief periods of time.

The thread on a non-SMP system protects itself with code of the form:

InterruptDisable()
// critical section
InterruptEnable()

Or:

InterruptMask(intr)
// critical section
InterruptUnmask(intr)

Unfortunately, this code will fail on an SMP system since the thread may be running on one processor while the interrupt handler is concurrently running on another processor!

One solution would be to lock the thread to a particular processor (by setting the processor affinity to 1 via the ThreadCtl() function).

A better solution would be to use a new exclusion lock available to both the thread and the interrupt handler. This is provided by the following primitives, which work on both uniprocessor and SMP machines:

InterruptLock(intrspin_t* spinlock): Attempt to acquire spinlock, a variable shared between the interrupt handler and thread. The code will spin in a tight loop until the lock is acquired. After disabling interrupts, the code will acquire the lock (if it was acquired by a thread). The lock must be released as soon as possible (typically within a few lines of C code without any loops).
InterruptUnlock(intrspin_t* spinlock): Release a lock and reenable interrupts.

On a non-SMP system, there's no need for a spinlock.