Google
 

Tuesday, July 17, 2007

Processes, Threads, and Jobs

Although programs and processes appear similar on the surface, they are fundamentally different. A program is a static sequence of instructions, whereas a process is a container for a set of resources used when executing the instance of the program. At the highest level of abstraction, a Windows process comprises the following:

  • A private virtual address space, which is a set of virtual memory addresses that the process can use

  • An executable program, which defines initial code and data and is mapped into the process's virtual address space

  • A list of open handles to various system resources, such as semaphores, communication ports, and files, that are accessible to all threads in the process

  • A security context called an access token that identifies the user, security groups, and privileges associated with the process

  • A unique identifier called a process ID (internally called a client ID)

  • At least one thread of execution

Each process also points to its parent or creator process. However, if the parent exits, this information is not updated. Therefore, it is possible for a process to point to a nonexistent parent. This is not a problem, as nothing relies on this information being present.


A thread is the entity within a process that Windows schedules for execution. Without it, the process's program can't run. A thread includes the following essential components:

  • The contents of a set of CPU registers representing the state of the processor.

  • Two stacks, one for the thread to use while executing in kernel mode and one for executing in user mode.

  • A private storage area called thread-local storage (TLS) for use by subsystems, run-time libraries, and DLLs.

  • A unique identifier called a thread ID (also internally called a client ID—process IDs and thread IDs are generated out of the same namespace, so they never overlap).

  • Threads sometimes have their own security context that is often used by multithreaded server applications that impersonate the security context of the clients that they serve.

The volatile registers, stacks, and private storage area are called the thread's context. Because this information is different for each machine architecture that Windows runs on, this structure, by necessity, is architecture-specific. The Windows GetThreadContext function provides access to this architecture-specific information (called the CONTEXT block).

Fibers vs. Threads

Fibers allow an application to schedule its own "threads" of execution rather than rely on the priority-based scheduling mechanism built into Windows. Fibers are often called "lightweight" threads, and in terms of scheduling, they're invisible to the kernel because they're implemented in user mode in Kernel32.dll. To use fibers, a call is first made to the Windows ConvertThreadToFiber function. This function converts the thread to a running fiber. Afterward, the newly converted fiber can create additional fibers with the CreateFiber function. (Each fiber can have its own set of fibers.) Unlike a thread, however, a fiber doesn't begin execution until it's manually selected through a call to the SwitchToFiber function. The new fiber runs until it exits or until it calls SwitchToFiber, again selecting another fiber to run. For more information, see the Platform SDK documentation on fiber functions.


Although threads have their own execution context, every thread within a process shares the process's virtual address space (in addition to the rest of the resources belonging to the process), meaning that all the threads in a process can write to and read from each other's memory. Threads cannot accidentally reference the address space of another process, however, unless the other process makes available part of its private address space as a shared memory section (called a file mapping object in the Windows API) or unless one process has the right to open another process to use cross-process memory functions such as ReadProcessMemory and WriteProcessMemory.

In addition to a private address space and one or more threads, each process has a security identification and a list of open handles to objects such as files, shared memory sections, or one of the synchronization objects such as mutexes, events, or semaphores.


Every process has a security context that is stored in an object called an access token. The process access token contains the security identification and credentials for the process. By default, threads don't have their own access token, but they can obtain one, thus allowing individual threads to impersonate the security context of another process—including processes running on a remote Windows system—without affecting other threads in the process.

The virtual address descriptors (VADs) are data structures that the memory manager uses to keep track of the virtual addresses the process is using.

Windows provides an extension to the process model called a job. A job object's main function is to allow groups of processes to be managed and manipulated as a unit. A job object allows control of certain attributes and provides limits for the process or processes associated with the job. It also records basic accounting information for all processes associated with the job and for all processes that were associated with the job but have since terminated. In some ways, the job object compensates for the lack of a structured process tree in Windows—yet in many ways it is more powerful than a UNIX-style process tree.

No comments: